CN112464269A - Data selection method in federated learning scene - Google Patents

Data selection method in federated learning scene

Info

Publication number
CN112464269A
CN112464269A
Authority
CN
China
Prior art keywords
user
data
users
server
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011464915.XA
Other languages
Chinese (zh)
Inventor
张兰
李向阳
李安然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Deqing Alpha Innovation Research Institute
Original Assignee
Deqing Alpha Innovation Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Deqing Alpha Innovation Research Institute filed Critical Deqing Alpha Innovation Research Institute
Priority to CN202011464915.XA priority Critical patent/CN112464269A/en
Publication of CN112464269A publication Critical patent/CN112464269A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 - Protecting data
    • G06F 21/602 - Providing cryptographic facilities or services
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 - Protecting data
    • G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 - Protecting access to data via a platform, to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 - Protecting personal data, e.g. for financial or medical purposes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data selection method in a federated learning scenario comprises the steps of filtering out task-related users and data, selecting users before training, selecting users and data during training, and training the model. Because server-side log information is used to dynamically select users, and data are selected by a gradient upper-bound value that takes into account the impact of erroneous data on the gradient, the data selection strategy is efficient and accurate.

Description

Data selection method in federated learning scene
Technical Field
The invention relates to a data selection method in a federated learning scenario, and belongs to the field of data analysis and data quality evaluation.
Background
How to acquire large, high-quality data sets has become a common bottleneck for many machine learning models and AI applications. This is not only because collecting and labeling large numbers of samples is very expensive, but also because privacy issues prevent data sharing in many fields (e.g., medicine and economics). The advent of federated learning has made it possible for end users to jointly train network models using local data. In the federated learning process, the quality of each user's local data affects the performance of the global model, and low-quality data (e.g., mislabeled data and non-uniformly distributed data) seriously hinder the global model from achieving good results.
The invention aims to select a group of high-quality training samples for a given federated learning task, in a privacy-preserving manner and under a given budget, so as to improve the accuracy of the model and accelerate its convergence.
There has been a series of work on data selection in deep learning: 1) some methods define quality metrics such as task relevance and content diversity, score data samples against these metrics, and select high-scoring data for training; 2) other methods dynamically select training samples important to the model to compose each data batch during training and so speed up convergence, typically quantifying importance by gradient norm or loss value. These methods cannot be used directly in federated learning: 1) they require direct access to all training samples, whereas in a federated system data cannot be accessed directly by third parties; 2) directly computing the importance of each sample incurs unacceptable overhead for resource-limited participants; 3) they do not take into account the impact of non-IID or erroneous samples on the selection strategy and may assign more importance to erroneous samples, thereby degrading model performance.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to select, in a privacy-preserving manner and under a given budget, a group of high-quality training samples for a given federated learning task, so as to improve model accuracy and accelerate model convergence. The method comprises the steps of filtering out task-related users and data, selecting users before training, selecting users and data during training, and training the model.
Preferably, task-related user and data filtering proceeds as follows: when an FL task arrives, the server first computes, for each user C_k, k ∈ [K], the intersection of its label set Y_k = {y_k | (x_k, y_k) ∈ D_k} with the target label set Y, i.e. the sample set {(x_k, y_k) | y_k ∈ Y_k ∩ Y}, in order to identify users holding data of the target categories. If the number of samples in this intersection exceeds the minimum number v required by the target model, |{(x_k, y_k) | y_k ∈ Y_k ∩ Y}| > v, the user is considered relevant. To meet the privacy-protection requirement, we use private set intersection (PSI).
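As a non-private illustration of this filtering step, the sketch below counts each user's samples whose labels fall in the target set Y and keeps users above the threshold v. A real deployment would compute the intersection under a PSI protocol; the labels here are compared in the clear, and all names and values are hypothetical.

```python
def filter_relevant_users(user_labels, target_labels, v):
    """Keep users whose datasets contain more than v samples
    with labels in the target label set Y."""
    relevant = []
    for user_id, labels in user_labels.items():
        # |{(x_k, y_k) | y_k in Y_k ∩ Y}|: count samples whose
        # label falls in the target set
        n_matching = sum(1 for y in labels if y in target_labels)
        if n_matching > v:
            relevant.append(user_id)
    return relevant

users = {"C1": [0, 0, 1, 2, 2], "C2": [7, 8], "C3": [1, 1, 1, 1]}
print(filter_relevant_users(users, {0, 1, 2}, v=3))  # → ['C1', 'C3']
```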
Preferably, user selection before training proceeds as follows: the server further selects a set of high-quality users (with index set Q) from the relevant users using a determinantal point process (DPP) based algorithm, so as to maximize homogeneity and content diversity under a budget constraint B: max V(Q), s.t. Σ_{k∈Q, Q⊆[N′]} b_k ≤ B, where V(Q) is the quality value of the selected users. The server then coordinates the selected users to begin training the model. This module mainly comprises the following steps:
a) Homogeneity-based user selection: the server preferentially selects users whose data are uniformly distributed over the categories, with no category missing. When homogeneity is used as the selection criterion, V_μ(Q) = Σ_{k∈Q} μ_k, where μ_k is defined as the difference between user k's data distribution and the uniform distribution (its defining equation appears as an image in the source). To compute μ_k with privacy protection, the server and each user jointly evaluate it using an efficient and secure two-party computation protocol based on BGN homomorphic encryption under the server's public key. The server then greedily selects the user with the largest marginal homogeneity gain per unit cost until the budget B is exhausted, yielding the best set of users.
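The budgeted greedy selection above can be sketched as follows. The concrete score (one minus the total-variation distance to the uniform label distribution) and the score-per-cost greedy rule are illustrative assumptions, since the patent gives the exact μ_k formula only as an equation image, and the secure two-party computation is omitted.

```python
import numpy as np

def homogeneity(label_counts, n_classes):
    """Assumed homogeneity score: 1 minus the total-variation
    distance between the user's label distribution and uniform."""
    p = np.zeros(n_classes)
    for y, c in label_counts.items():
        p[y] = c
    p = p / p.sum()
    uniform = np.full(n_classes, 1.0 / n_classes)
    return 1.0 - 0.5 * np.abs(p - uniform).sum()

def greedy_select(users, budget, n_classes):
    """users: {id: (label_counts, cost)}; repeatedly pick the user
    with the largest score-per-cost ratio until budget B runs out."""
    remaining, chosen, spent = dict(users), [], 0.0
    while remaining:
        best = max(remaining,
                   key=lambda k: homogeneity(remaining[k][0], n_classes)
                   / remaining[k][1])
        cost = remaining.pop(best)[1]
        if spent + cost > budget:
            break
        chosen.append(best)
        spent += cost
    return chosen

users = {"A": ({0: 1, 1: 1}, 1.0), "B": ({0: 4}, 1.0)}
print(greedy_select(users, budget=1.5, n_classes=2))  # → ['A']
```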
b) Diversity-based user selection: the server selects users with diverse data content to participate in model training. When content diversity is used as the selection criterion, V(Q) = ρ(D) with D = ∪_{k∈Q} D_k, where the diversity function ρ is built from pairwise similarities (its defining equations appear as images in the source) and S(v_i, v_j) is a similarity function between users C_i and C_j, such as the Euclidean distance. The server greedily selects the next user with the lowest similarity to the currently selected set.
To compute the content diversity of a data set, a feature-vector representation of the data must first be extracted. We extract features with a deep learning model, for example extracting content feature vectors of pictures with a VGG-16 network, and then compute the content diversity over all of the user's data. When the user's data volume M is large and the feature dimension l is high, the cost of computing content diversity, O(M²l), is large; moreover, existing computation methods need direct access to the raw data. We therefore propose an efficient privacy-preserving content-diversity computation that represents the features of each user's data set with low-dimensional vectors based on the Johnson-Lindenstrauss (JL) transform and protects the privacy of each sample with a randomized response mechanism. It mainly comprises the following steps:
i. Constructing a data-set content sketch: user C_k locally generates content feature vectors Φ_k = {φ_{k,i} | i ∈ [U_k]}; the server then selects a mapping matrix w, and each φ_{k,i} is mapped to the low-dimensional vector h(φ_{k,i}) = sign(w·φ_{k,i}). The distortion caused by the mapping reduces the accuracy of the diversity estimate but protects the privacy of the users' feature vectors to a certain extent; the content-vector sketch of data set D_k is the collection of the vectors h(φ_{k,i}).
ii. Randomized response mechanism: to further protect the privacy of the presence of each datum, we use a randomized response mechanism to generate a perturbed representation h̃(φ_{k,i}) of each sketch vector h(φ_{k,i}): each bit of h̃(φ_{k,i}) is set to 1 with probability f/2, set to 0 with probability f/2, and kept equal to the corresponding bit of h(φ_{k,i}) with probability 1 − f, where f is a parameter controlling the degree of privacy. The user then assembles the perturbed vectors into a perturbed sketch and sends it to the server; the server uses the perturbed sketch vectors to compute pairwise similarities and, from them, the content diversity. In this way the server reduces the overhead of computing content diversity by several orders of magnitude while protecting the users' data privacy.
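A minimal sketch of the two steps above: a JL-style sign random projection followed by bitwise randomized response with privacy parameter f. The dimensions, the shared projection matrix, and the use of {0,1} bits are illustrative assumptions; a real system would also transmit the sketch over a secure channel.

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_sketch(features, w):
    """Map each l-dim feature vector to a d-dim {0,1} sketch:
    h(phi) = sign(w . phi), stored as bits."""
    return (features @ w.T > 0).astype(np.int8)

def randomized_response(bits, f, rng):
    """Per bit: output 1 with prob. f/2, 0 with prob. f/2,
    and keep the true bit with prob. 1 - f."""
    r = rng.random(bits.shape)
    out = bits.copy()
    out[r < f / 2] = 1
    out[(r >= f / 2) & (r < f)] = 0
    return out

l, d = 128, 16                       # feature dim, sketch dim (assumed)
w = rng.standard_normal((d, l))      # projection matrix chosen by the server
phi = rng.standard_normal((100, l))  # one user's content feature vectors
sketch = randomized_response(sign_sketch(phi, w), f=0.2, rng=rng)
# The server estimates pairwise similarity from Hamming distances
# between perturbed sketches instead of touching raw features.
```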
c) User selection based on the determinantal point process: when homogeneity and diversity are considered jointly, the user-selection problem is converted into a DPP problem. Given user C_i's homogeneity μ_i and the similarity S_ij between users C_i and C_j, we define a positive semi-definite matrix A_{[N′]} = [A_ij], i, j ∈ [N′], with A_ij = μ_i μ_j S_ij; the probability that the user set Q is selected is then P_A(Q) = det(A_Q), where A_Q is the submatrix of A indexed by Q (with similarity block S_Q = [S_ij], i, j ∈ Q). As homogeneity increases and similarity decreases, the determinant increases, so DPP-based selection tends to select users with evenly distributed categories while avoiding users with highly similar content. With the value function V_d(Q) = det(A_Q) we transform the user-selection problem into a log-submodular maximization and iteratively select the user C_k that maximizes P_A(Q ∪ {k}).
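The iterative selection can be sketched with a naive greedy determinant maximization over the kernel A_ij = μ_i μ_j S_ij. The homogeneity scores and similarity matrix below are toy values; efficient DPP MAP solvers would replace the brute-force determinant loop in practice.

```python
import numpy as np

def greedy_dpp(mu, S, budget_size):
    """Greedily add the user k that maximizes det(A_{Q ∪ {k}})
    for the kernel A_ij = mu_i * mu_j * S_ij."""
    n = len(mu)
    A = np.outer(mu, mu) * S
    chosen = []
    for _ in range(budget_size):
        best, best_det = None, -np.inf
        for k in range(n):
            if k in chosen:
                continue
            idx = chosen + [k]
            det = np.linalg.det(A[np.ix_(idx, idx)])
            if det > best_det:
                best, best_det = k, det
        chosen.append(best)
    return chosen

mu = np.array([0.9, 0.8, 0.85])       # toy homogeneity scores
S = np.array([[1.0, 0.95, 0.2],       # users 0 and 1 are near-duplicates
              [0.95, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
print(greedy_dpp(mu, S, 2))           # picks two dissimilar users
```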
Preferably, user and data selection during training proceeds as follows: given the selected set of high-quality users with index set Q ⊆ [N′], in order to further improve model performance and reduce training overhead, a fraction ζ of the users is selected in each training iteration, and each selected user locally selects important data samples to participate in model training. An intuitive way to quantify the importance λ(z_{k,i}, t) of user C_k's sample z_{k,i} in round t is an upper bound on its gradient norm computed from the input and output of the last layer (layer L) of the model; data are then selected by this gradient-based upper-bound value. However, when the model is complex and the data volume is large, the computation cost of this method, O(n·s), is high, where n is the total data volume and s is the number of model parameters, θ ∈ R^s. We therefore propose a policy for dynamically selecting users based on server-side log information. Specifically, in round t the server selects m users according to each user's selection probability (given as an equation image in the source), so that users with a large impact on the model are selected with higher probability. Each selected user C_k then locally computes the importance λ(z_{k,i}, t−1) of each datum and selects data z_{k,i} ∈ D_k with a probability determined by this importance (its exact expression appears as an equation image in the source), taking into account that the L2 norm of the gradient of erroneous data is far larger than that of correct data.
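The round-t sampling logic can be sketched as follows. Sampling proportional to the scores and capping suspiciously large importance values are illustrative assumptions, since the patent's probability expressions survive only as equation images; the cap reflects the observation that erroneous samples can carry disproportionately large gradient norms.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_users(user_scores, m, rng):
    """Server side: draw m distinct users with probability
    proportional to their logged influence scores."""
    p = user_scores / user_scores.sum()
    return rng.choice(len(user_scores), size=m, replace=False, p=p)

def sample_data(importance, frac, cap, rng):
    """User side: clip very large importance scores (likely
    erroneous samples), then sample a fraction of the data with
    probability proportional to the clipped score."""
    clipped = np.minimum(importance, cap)
    p = clipped / clipped.sum()
    n_pick = max(1, int(frac * len(importance)))
    return rng.choice(len(importance), size=n_pick, replace=False, p=p)

scores = np.array([1.0, 3.0, 0.5, 2.5])     # per-user influence (toy)
picked_users = sample_users(scores, m=2, rng=rng)
imp = np.array([0.1, 0.2, 9.0, 0.3, 0.25])  # 9.0 is suspiciously large
picked_data = sample_data(imp, frac=0.4, cap=1.0, rng=rng)
```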
Preferably, model training proceeds as follows: in each iteration, all selected users train their local models on the selected samples, and the server aggregates the users' model updates to update the global model. The server repeats this process until the globally optimal model θ* is obtained.
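The aggregation step follows the familiar federated-averaging pattern, sketched below on a toy linear-regression task. The squared-loss local objective, learning rate, and weighting by dataset size are illustrative assumptions, not the patent's specific aggregation rule.

```python
import numpy as np

def local_update(theta, X, y, lr=0.1):
    """One local gradient step on a squared-loss linear model."""
    grad = 2 * X.T @ (X @ theta - y) / len(y)
    return theta - lr * grad

def server_round(theta, users):
    """users: list of (X, y); aggregate local updates weighted
    by each user's number of samples (FedAvg-style)."""
    updates, weights = [], []
    for X, y in users:
        updates.append(local_update(theta, X, y))
        weights.append(len(y))
    w = np.array(weights, dtype=float)
    return np.average(updates, axis=0, weights=w)

rng = np.random.default_rng(2)
theta_true = np.array([1.0, -2.0])
users = []
for _ in range(3):
    X = rng.standard_normal((20, 2))
    users.append((X, X @ theta_true))    # noiseless toy data

theta = np.zeros(2)
for _ in range(200):                     # repeat until convergence
    theta = server_round(theta, users)
# theta converges toward theta_true
```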
The invention designs an efficient data selection method for federated learning that improves model accuracy and accelerates model convergence. Because a vector sketch and a randomized response mechanism are adopted, user selection is efficient and privacy-preserving; and because server-side log information is used to dynamically select users, and data are selected by a gradient upper-bound value that takes into account the impact of erroneous data on the gradient, the data selection strategy is efficient and accurate.
Drawings
FIG. 1 is a flow diagram of an efficient data selection system in a federated learning scenario.
Detailed Description
The invention will be described in detail below with reference to the accompanying figure. As shown in FIG. 1, the data selection method in the federated learning scenario proposed by the present invention is mainly divided into the following modules: filtering out task-related users and data, selecting users before training, selecting users and data during training, and training the model.
(1) Task-related user and data filtering: when an FL task arrives, the server first computes, for each user C_k, k ∈ [K], the intersection of its label set Y_k = {y_k | (x_k, y_k) ∈ D_k} with the target label set Y, i.e. the sample set {(x_k, y_k) | y_k ∈ Y_k ∩ Y}, in order to identify users holding data of the target categories. If the number of samples in this intersection exceeds the minimum number v required by the target model, |{(x_k, y_k) | y_k ∈ Y_k ∩ Y}| > v, the user is relevant. To meet the privacy-protection requirement, we use private set intersection (PSI).
(2) User selection before training: the server further selects a set of high-quality users (with index set Q) from the relevant users using a determinantal point process (DPP) based algorithm, so as to maximize homogeneity and content diversity under a budget constraint B: max V(Q), s.t. Σ_{k∈Q, Q⊆[N′]} b_k ≤ B, where V(Q) is the quality value of the selected users. The server then coordinates the selected users to begin training the model. This module mainly comprises the following steps:
a) Homogeneity-based user selection: the server preferentially selects users whose data are uniformly distributed over the categories, with no category missing. When homogeneity is used as the selection criterion, V_μ(Q) = Σ_{k∈Q} μ_k, where μ_k is defined as the difference between user k's data distribution and the uniform distribution (its defining equation appears as an image in the source). To compute μ_k with privacy protection, the server and each user jointly evaluate it using an efficient and secure two-party computation protocol based on BGN homomorphic encryption under the server's public key. The server then greedily selects the user with the largest marginal homogeneity gain per unit cost until the budget B is exhausted, yielding the best set of users.
b) Diversity-based user selection: the server selects users with diverse data content to participate in model training. When content diversity is used as the selection criterion, V(Q) = ρ(D) with D = ∪_{k∈Q} D_k, where the diversity function ρ is built from pairwise similarities (its defining equations appear as images in the source) and S(v_i, v_j) is a similarity function between users C_i and C_j, such as the Euclidean distance. The server greedily selects the next user with the lowest similarity to the currently selected set.
To compute the content diversity of a data set, a feature-vector representation of the data must first be extracted. We extract features with a deep learning model, for example extracting content feature vectors of pictures with a VGG-16 network, and then compute the content diversity over all of the user's data. When the user's data volume M is large and the feature dimension l is high, the cost of computing content diversity, O(M²l), is large; moreover, existing computation methods need direct access to the raw data. We therefore propose an efficient privacy-preserving content-diversity computation that represents the features of each user's data set with low-dimensional vectors based on the Johnson-Lindenstrauss (JL) transform and protects the privacy of each sample with a randomized response mechanism. It mainly comprises the following steps:
i. Constructing a data-set content sketch: user C_k locally generates content feature vectors Φ_k = {φ_{k,i} | i ∈ [U_k]}; the server then selects a mapping matrix w, and each φ_{k,i} is mapped to the low-dimensional vector h(φ_{k,i}) = sign(w·φ_{k,i}). The distortion caused by the mapping reduces the accuracy of the diversity estimate but protects the privacy of the users' feature vectors to a certain extent; the content-vector sketch of data set D_k is the collection of the vectors h(φ_{k,i}).
ii. Randomized response mechanism: to further protect the privacy of the presence of each datum, we use a randomized response mechanism to generate a perturbed representation h̃(φ_{k,i}) of each sketch vector h(φ_{k,i}): each bit of h̃(φ_{k,i}) is set to 1 with probability f/2, set to 0 with probability f/2, and kept equal to the corresponding bit of h(φ_{k,i}) with probability 1 − f, where f is a parameter controlling the degree of privacy. The user then assembles the perturbed vectors into a perturbed sketch and sends it to the server; the server uses the perturbed sketch vectors to compute pairwise similarities and, from them, the content diversity. In this way the server reduces the overhead of computing content diversity by several orders of magnitude while protecting the users' data privacy.
(3) User selection based on the determinantal point process: when homogeneity and diversity are considered jointly, the user-selection problem is converted into a DPP problem. Given user C_i's homogeneity μ_i and the similarity S_ij between users C_i and C_j, we define a positive semi-definite matrix A_{[N′]} = [A_ij], i, j ∈ [N′], with A_ij = μ_i μ_j S_ij; the probability that the user set Q is selected is then P_A(Q) = det(A_Q), where A_Q is the submatrix of A indexed by Q (with similarity block S_Q = [S_ij], i, j ∈ Q). As homogeneity increases and similarity decreases, the determinant increases, so DPP-based selection tends to select users with evenly distributed categories while avoiding users with highly similar content. With the value function V_d(Q) = det(A_Q) we transform the user-selection problem into a log-submodular maximization and iteratively select the user C_k that maximizes P_A(Q ∪ {k}).
(4) User and data selection during training: given the selected set of high-quality users with index set Q ⊆ [N′], in order to further improve model performance and reduce training overhead, a fraction ζ of the users is selected in each training iteration, and each selected user locally selects important data samples to participate in model training. An intuitive way to quantify the importance λ(z_{k,i}, t) of user C_k's sample z_{k,i} in round t is an upper bound on its gradient norm computed from the input and output of the last layer (layer L) of the model; data are then selected by this gradient-based upper-bound value. However, when the model is complex and the data volume is large, the computation cost of this method, O(n·s), is high, where n is the total data volume and s is the number of model parameters, θ ∈ R^s. We therefore propose a policy for dynamically selecting users based on server-side log information. Specifically, in round t the server selects m users according to each user's selection probability (given as an equation image in the source), so that users with a large impact on the model are selected with higher probability. Each selected user C_k then locally computes the importance λ(z_{k,i}, t−1) of each datum and selects data z_{k,i} ∈ D_k with a probability determined by this importance (its exact expression appears as an equation image in the source), taking into account that the L2 norm of the gradient of erroneous data is far larger than that of correct data.
(5) Model training: in each iteration, all selected users train their local models on the selected samples, and the server aggregates the users' model updates to update the global model. The server repeats this process until the globally optimal model θ* is obtained.

Claims (5)

1. A data selection method in a federated learning scenario, characterized by comprising: filtering out task-related users and data, selecting users before training, selecting users and data during training, and training a model.
2. The data selection method in a federated learning scenario according to claim 1, characterized in that the task-related users and data are filtered as follows: when an FL task arrives, the server first computes, for each user C_k, k ∈ [K], the intersection of its label set Y_k = {y_k | (x_k, y_k) ∈ D_k} with the target label set Y, i.e. {(x_k, y_k) | y_k ∈ Y_k ∩ Y}, in order to identify users holding data of the target categories; if the number of samples in this intersection exceeds the minimum number v required by the target model, |{(x_k, y_k) | y_k ∈ Y_k ∩ Y}| > v, the user is relevant; to meet the privacy-protection requirement, private set intersection (PSI) is used.
3. The data selection method in a federated learning scenario according to claim 1, characterized in that user selection before training proceeds as follows: the server further selects a set of high-quality users (with index set Q) from the relevant users using a determinantal point process (DPP) based algorithm, so as to maximize homogeneity and content diversity under a budget constraint B: max V(Q), s.t. Σ_{k∈Q, Q⊆[N′]} b_k ≤ B, where V(Q) is the quality value of the selected users; the server then coordinates the selected users to begin training the model; this module mainly comprises the following steps:
a) Homogeneity-based user selection: the server preferentially selects users whose data are uniformly distributed over the categories, with no category missing; when homogeneity is used as the selection criterion, V_μ(Q) = Σ_{k∈Q} μ_k, where μ_k is defined as the difference between user k's data distribution and the uniform distribution (its defining equation appears as an image in the source); to compute μ_k with privacy protection, the server and each user jointly evaluate it using an efficient and secure two-party computation protocol based on BGN homomorphic encryption under the server's public key; the server then greedily selects the user with the largest marginal homogeneity gain per unit cost until the budget B is exhausted, yielding the best set of users.
b) Diversity-based user selection: the server selects users with diverse data content to participate in model training; when content diversity is used as the selection criterion, V(Q) = ρ(D) with D = ∪_{k∈Q} D_k, where the diversity function ρ is built from pairwise similarities (its defining equations appear as images in the source) and S(v_i, v_j) is a similarity function between users C_i and C_j, such as the Euclidean distance; the server greedily selects the next user with the lowest similarity to the currently selected set.
To compute the content diversity of a data set, a feature-vector representation of the data must first be extracted; we extract features with a deep learning model, for example extracting content feature vectors of pictures with a VGG-16 network, and then compute the content diversity over all of the user's data; when the user's data volume M is large and the feature dimension l is high, the cost of computing content diversity, O(M²l), is large, and existing computation methods need direct access to the raw data; we therefore propose an efficient privacy-preserving content-diversity computation that represents the features of each user's data set with low-dimensional vectors based on the Johnson-Lindenstrauss (JL) transform and protects the privacy of each sample with a randomized response mechanism, mainly comprising the following steps:
i. Constructing a data-set content sketch: user C_k locally generates content feature vectors Φ_k = {φ_{k,i} | i ∈ [U_k]}; the server then selects a mapping matrix w, and each φ_{k,i} is mapped to the low-dimensional vector h(φ_{k,i}) = sign(w·φ_{k,i}); the distortion caused by the mapping reduces the accuracy of the diversity estimate but protects the privacy of the users' feature vectors to a certain extent; the content-vector sketch of data set D_k is the collection of the vectors h(φ_{k,i}).
ii. Randomized response mechanism: to further protect the privacy of the presence of each datum, a randomized response mechanism is used to generate a perturbed representation h̃(φ_{k,i}) of each sketch vector h(φ_{k,i}): each bit of h̃(φ_{k,i}) is set to 1 with probability f/2, set to 0 with probability f/2, and kept equal to the corresponding bit of h(φ_{k,i}) with probability 1 − f, where f is a user-defined parameter controlling the degree of privacy; the user then assembles the perturbed vectors into a perturbed sketch and sends it to the server; the server uses the perturbed sketch vectors to compute pairwise similarities and, from them, the content diversity; in this way the server reduces the overhead of computing content diversity by several orders of magnitude while protecting the users' data privacy.
c) User selection based on the determinantal point process: when homogeneity and diversity are considered jointly, the user-selection problem is converted into a DPP problem; given user C_i's homogeneity μ_i and the similarity S_ij between users C_i and C_j, a positive semi-definite matrix A_{[N′]} = [A_ij], i, j ∈ [N′], is defined with A_ij = μ_i μ_j S_ij, and the probability that the user set Q is selected is P_A(Q) = det(A_Q), where A_Q is the submatrix of A indexed by Q (with similarity block S_Q = [S_ij], i, j ∈ Q); as homogeneity increases and similarity decreases, the determinant increases, so DPP-based selection tends to select users with evenly distributed categories while avoiding users with highly similar content; with the value function V_d(Q) = det(A_Q), the user-selection problem is transformed into a log-submodular maximization, and the user C_k that maximizes P_A(Q ∪ {k}) is selected iteratively.
4. The data selection method in a federated learning scenario according to claim 1, characterized in that user and data selection during training proceeds as follows: given the selected set of high-quality users with index set Q ⊆ [N′], in order to further improve model performance and reduce training overhead, a fraction ζ of the users is selected in each training iteration, and each selected user locally selects important data samples to participate in model training; an intuitive way to quantify the importance λ(z_{k,i}, t) of user C_k's sample z_{k,i} in round t is an upper bound on its gradient norm computed from the input and output of the last layer (layer L) of the model, and data are then selected by this gradient-based upper-bound value; however, when the model is complex and the data volume is large, the computation cost of this method, O(n·s), is high, where n is the total data volume and s is the number of model parameters, θ ∈ R^s; a policy for dynamically selecting users based on server-side log information is therefore proposed: in round t the server selects m users according to each user's selection probability (given as an equation image in the source), so that users with a large impact on the model are selected with higher probability; each selected user C_k then locally computes the importance λ(z_{k,i}, t−1) of each datum and selects data z_{k,i} ∈ D_k with a probability determined by this importance (its exact expression appears as an equation image in the source), taking into account that the L2 norm of the gradient of erroneous data is far larger than that of correct data.
5. The data selection method in a federated learning scenario according to claim 1, characterized in that in each iteration all selected users train their local models on the selected samples, and the server aggregates the users' model updates to update the global model; the server repeats this process until the globally optimal model θ* is obtained.
CN202011464915.XA 2020-12-14 2020-12-14 Data selection method in federated learning scene Pending CN112464269A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011464915.XA CN112464269A (en) 2020-12-14 2020-12-14 Data selection method in federated learning scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011464915.XA CN112464269A (en) 2020-12-14 2020-12-14 Data selection method in federated learning scene

Publications (1)

Publication Number Publication Date
CN112464269A true CN112464269A (en) 2021-03-09

Family

ID=74804713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011464915.XA Pending CN112464269A (en) 2020-12-14 2020-12-14 Data selection method in federated learning scene

Country Status (1)

Country Link
CN (1) CN112464269A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326947A (en) * 2021-05-28 2021-08-31 山东师范大学 Joint learning model training method and system
CN114189899A (en) * 2021-12-10 2022-03-15 东南大学 User equipment selection method based on random aggregation beam forming
CN114219147A (en) * 2021-12-13 2022-03-22 南京富尔登科技发展有限公司 Power distribution station fault prediction method based on federal learning
CN114841016A (en) * 2022-05-26 2022-08-02 北京交通大学 Multi-model federal learning method, system and storage medium
CN115391734A (en) * 2022-10-11 2022-11-25 广州天维信息技术股份有限公司 Client satisfaction analysis system based on federal learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200242466A1 (en) * 2017-03-22 2020-07-30 Visa International Service Association Privacy-preserving machine learning
CN111866954A (en) * 2020-07-21 2020-10-30 重庆邮电大学 User selection and resource allocation method based on federal learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200242466A1 (en) * 2017-03-22 2020-07-30 Visa International Service Association Privacy-preserving machine learning
CN111866954A (en) * 2020-07-21 2020-10-30 重庆邮电大学 User selection and resource allocation method based on federated learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIFFANY TUOR, ET AL.: "Data Selection for Federated Learning with Relevant and Irrelevant Data at Clients", arXiv:2001.08300, pages 1 - 14 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326947A (en) * 2021-05-28 2021-08-31 山东师范大学 Joint learning model training method and system
CN113326947B (en) * 2021-05-28 2023-06-16 山东师范大学 Training method and system for joint learning model
CN114189899A (en) * 2021-12-10 2022-03-15 东南大学 User equipment selection method based on random aggregation beam forming
CN114219147A (en) * 2021-12-13 2022-03-22 南京富尔登科技发展有限公司 Power distribution station fault prediction method based on federated learning
CN114219147B (en) * 2021-12-13 2024-06-07 南京富尔登科技发展有限公司 Power distribution station fault prediction method based on federated learning
CN114841016A (en) * 2022-05-26 2022-08-02 北京交通大学 Multi-model federated learning method, system and storage medium
CN115391734A (en) * 2022-10-11 2022-11-25 广州天维信息技术股份有限公司 Client satisfaction analysis system based on federated learning
CN115391734B (en) * 2022-10-11 2023-03-10 广州天维信息技术股份有限公司 Client satisfaction analysis system based on federated learning

Similar Documents

Publication Publication Date Title
CN112464269A (en) Data selection method in federated learning scene
Cao et al. Multi-marginal Wasserstein GAN
CN112446423B (en) Fast hybrid high-order attention domain adversarial network method based on transfer learning
Xu et al. Unsupervised domain adaptation via importance sampling
CN104008174A (en) Privacy-protection index generation method for mass image retrieval
Liu et al. Intelligent and secure content-based image retrieval for mobile users
CN109726195B (en) Data enhancement method and device
CN112883070B (en) Generative adversarial network recommendation method with differential privacy
US11874866B2 (en) Multiscale quantization for fast similarity search
CN113378938A (en) Edge transform graph neural network-based small sample image classification method and system
CN112668482A (en) Face recognition training method and device, computer equipment and storage medium
CN116229552A (en) Face recognition method for embedded hardware based on YOLOV7 model
CN116450877A (en) Image text matching method based on semantic selection and hierarchical alignment
CN114401229A (en) Encrypted traffic identification method based on Transformer deep learning model
Chapel et al. Partial Gromov-Wasserstein with applications on positive-unlabeled learning
CN116630726B (en) Multi-mode-based bird classification method and system
CN114003744A (en) Image retrieval method and system based on convolutional neural network and vector homomorphic encryption
CN113935396A (en) Manifold theory-based method and related device for defending against adversarial example attacks
CN116383470B (en) Image searching method with privacy protection function
CN117456267A (en) Class-incremental learning method based on similarity prototype replay
CN116796038A (en) Remote sensing data retrieval method, remote sensing data retrieval device, edge processing equipment and storage medium
CN114911967B (en) Three-dimensional model sketch retrieval method based on self-adaptive domain enhancement
CN106529601A (en) Image classification prediction method based on multi-task learning in sparse subspace
CN112906829B (en) Method and device for constructing digit recognition model based on the MNIST dataset
CN115481415A (en) Communication cost optimization method, system, device and medium based on vertical federated learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination