CN115238908A - Data generation method based on a variational autoencoder, an unsupervised clustering algorithm, and federated learning - Google Patents

Data generation method based on a variational autoencoder, an unsupervised clustering algorithm, and federated learning

Info

Publication number
CN115238908A
CN115238908A (Application CN202210251482.2A)
Authority
CN
China
Prior art keywords
encoder
local
group
data
central server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210251482.2A
Other languages
Chinese (zh)
Inventor
魏森辉
高明
蔡文渊
杜蓓
刘翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hipu Intelligent Information Technology Co ltd
East China Normal University
Original Assignee
Shanghai Hipu Intelligent Information Technology Co ltd
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hipu Intelligent Information Technology Co ltd, East China Normal University filed Critical Shanghai Hipu Intelligent Information Technology Co ltd
Priority to CN202210251482.2A
Publication of CN115238908A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06N 20/20: Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data generation method based on a variational autoencoder (VAE), an unsupervised clustering algorithm, and federated learning. The variational autoencoders of the local clients are trained jointly under a federated learning framework, and an unsupervised clustering algorithm is proposed to group the clients according to the differences between their data domains; federated model training is then carried out independently within each cluster. This greatly mitigates the harm that data-domain differences cause to federated training, and each group is ultimately trained into a global generative model. In the prediction stage, the trained global generative models can generate additional data that is safe to share, providing effective data support for further machine learning and deep learning tasks.

Description

Data generation method based on a variational autoencoder, an unsupervised clustering algorithm, and federated learning
Technical Field
The invention belongs to the technical fields of data privacy security and deep-learning-based data generation, and in particular relates to a data generation method based on a variational autoencoder, an unsupervised clustering algorithm, and federated learning.
Background
Data as the new energy source
With the vigorous development of information technologies such as big data, cloud computing, the Internet of Things, and the Internet, artificial intelligence technologies represented by machine learning and deep learning have entered a period of rapid development and opened a new technological revolution. Machine learning and deep learning are processes of mining rules from data, and two factors are crucial to them: algorithms and data. Algorithms answer the question of "how to learn", while data answers the question of "what to learn from". Thanks to the rapid development of deep learning, researchers have proposed many algorithms to solve the "how to learn" problem in a variety of scenarios. However, a saying circulates in the machine learning community: data and features determine the upper limit of machine learning, and models and algorithms merely approach that limit. In other words, no matter how exquisite the algorithm, a model cannot perform well on practical problems without good data support. Data seems ubiquitous in modern times: with the rapid growth of the Internet, huge amounts of data are constantly being produced and stored.
In the era of the digital economy, data is viewed as a new energy source with almost unlimited value, and unlike petroleum it is reusable. Data today is by no means scarce, but it is scattered across different companies, different people, and different devices. The openness of data sharing between different systems and organizations is generally low, which gives rise to the problem of information islands: massive amounts of data are isolated from each other and can hardly be fused or combined to release their potential. Yet sharing data freely raises the problem of privacy security.
Data privacy security
Recent years have seen a string of negative events involving privacy leakage and the illegal disclosure of user data. In 2018, for example, a third-party company harvested the personal information of nearly 50 million users through an application, a number amounting to roughly a quarter of a country's electorate, so the scope involved was very large. In the same year, a software bug leaked the private photos of 68 million users. This series of events raised users' concerns about data privacy, and the relevant privacy regulators imposed huge fines. With the public paying ever more attention to data security and privacy protection, countries have begun to legislate on data security, enacting data security laws and personal information protection laws that protect personal data privacy at the legal level.
Federated learning
Against this background, it is difficult to collect enough data for machine learning and deep learning model training. Compared with the traditional approach of gathering all parties' data in one place for centralized training, having each party train independently on its own data means the data volume is small and training a good model becomes much harder. How to effectively integrate and exploit data scattered everywhere without violating user privacy is a problem researchers must consider. The concept of federated learning was proposed in 2016. Unlike traditional machine learning algorithms, which require all data to be gathered in one place for training, federated learning sends the model to each data owner, learns on the data locally, and then integrates the parties' learning results into a final model. Federated learning allows users to form a consortium that trains a global model while the data never leaves the local clients, effectively addressing the problem of data privacy security.
Federated learning aims to build a federated model on distributed datasets. During model training, model-related information can be exchanged between the parties (possibly in encrypted form), but the raw data cannot; the exchange therefore exposes no protected private portion of any party's data. The trained federated model can be deployed on each participant of the federated learning system or shared among multiple parties.
Horizontal federated learning refers to the setting where the data of different participants overlap heavily in the feature dimension (horizontally) but overlap little in the sample dimension (vertically), i.e., the samples that carry those features are largely different. For example, the participants may be two banks serving different regional markets: their customer populations barely overlap, but because their business models are similar, their customer features overlap considerably.
Variational autoencoder
Against the background of massive, mutually isolated data islands, the hope is to combine the variational autoencoder, a generative model, with a federated learning framework to generate more data that is safe to share. Variational autoencoders are widely used in the real world, for instance in image generation and style transfer. Their drawback is that, like other deep learning models, they require large amounts of data for training; otherwise the quality of the generated data is poor.
With the development of information technologies such as big data, cloud computing, and the Internet of Things, deep-learning-based generative models such as the variational autoencoder achieve good data generation results, but deep learning generally needs large amounts of data for model training. Such data does exist in real life, yet it is siloed. For example, once an enterprise develops to a certain stage, multiple business units inevitably appear, each with its own data; each unit is like an isolated island, and the data of different units cannot be connected, i.e., data islands form.
Disclosure of Invention
To address these problems, the invention innovatively provides a data generation method based on a variational autoencoder, an unsupervised clustering algorithm, and federated learning. Different clients are grouped by an unsupervised clustering algorithm according to the differences between their data domains, and federated model training is then carried out independently within each cluster, finally yielding a federated generative model with good data generation performance. In the inference stage this federated generative model can generate more data that is safe to share, providing effective data support for further machine learning and deep learning tasks.
The specific technical solution achieving the aim of the invention is as follows:
a data generation method based on a variational autoencoder, an unsupervised clustering algorithm, and federated learning comprises the following steps:
model training phase
Step S1: in each communication round of federated learning, the central server randomly selects a proportion K_1 of all local clients, where K_1 ranges from 10% to 50%, and then sends its encoder parameters to the selected local clients, which use them to update their own encoder parameters;
Step S2: the selected local clients train the generative model, a variational autoencoder, on their local training sets, taking a mean-square-error loss function plus the KL divergence as the optimization objective and using gradient descent as the optimization method to train the local model iteratively;
Step S3: after local training finishes, the selected clients upload the encoder parameters of their local variational autoencoders to the central server via network communication;
Step S4: the central server aggregates the encoder parameters uploaded by the local clients and updates its own encoder parameters;
Step S5: steps S1-S4 are repeated until every local client has been selected by the central server at least 3-5 times; the current encoder parameters of the central server are then sent to all clients, whose encoders update their parameters accordingly;
Step S6: each local model maps its original data to a low-dimensional space through its encoder and clusters that space with the unsupervised clustering algorithm K-means++ to obtain G_1 groups, where G_1 ranges from 3 to 5; the low-dimensional vectors of each group are averaged, and the resulting vectors are uploaded to the central server;
Step S7: after receiving all the low-dimensional vectors sent by the local clients, the central server clusters them with the unsupervised clustering algorithm K-means++ into G_2 groups, where G_2 ranges from 4 to 8; each client is assigned to the group that contains the most of its low-dimensional vectors;
Step S8: after the local clients have been grouped, federated model training is carried out independently within each group;
Step S9: in each communication round within a group, the group's central server randomly selects a proportion K_2 of the group's local clients, where K_2 ranges from 40% to 80%, and then sends the group server's encoder and decoder parameters to the selected local clients, which update their encoder and decoder parameters;
Step S10: the selected clients in each group perform local model training as in step S2;
Step S11: the selected clients in each group upload the encoder and decoder parameters of their local variational autoencoders to the group's central server via network communication;
Step S12: the central server of each group aggregates the encoder and decoder parameters uploaded by the local clients and updates its own encoder and decoder parameters;
Step S13: steps S9-S12 are repeated until each group's model converges or a fixed number of communication rounds is reached; training then stops, and each group obtains a final global generative model;
model prediction phase
Step S14: N_s random samples are drawn from a standard normal distribution, where N_s is adjusted according to the specific business scenario;
Step S15: the clients of each group map the random samples into realistic, safely shareable data using the decoder of the global generative model.
Advantages of the invention
(1) In steps S1-S5, with the client data protected and never leaving the local site, the encoders of the client models are trained locally and model-related information is exchanged among the parties via the federated learning framework; this exchange exposes none of the protected raw data. Because more data can be reached in this particular way, the encoder of the variational autoencoder acquires a stronger information compression capability than it would from purely local training.
(2) In step S6, the encoder, with its strong information compression capability, maps the original data to a low-dimensional space while carrying noise sampled from a standard normal distribution. Low-dimensional vectors that reflect the data's information can thus be obtained while guaranteeing that the central server cannot reconstruct the original data from them, so the privacy security of the data is well protected.
(3) In step S7, considering that the data of different clients are very likely not independent and identically distributed, which markedly degrades the performance of the overall federated model, the proposed data generation method based on a variational autoencoder, an unsupervised clustering algorithm, and federated learning uses the information in the low-dimensional vectors extracted by the local clients to group the clients with the unsupervised clustering algorithm K-means++: clients with similar data distributions are placed in the same group, and clients whose data distributions differ greatly are placed in different groups.
(4) In steps S8-S13, federated model training is carried out independently within each group. Because the clients in each group perform federated training under similar data distributions, the harm that distribution differences cause to federated training is mitigated to a large extent. The encoder and decoder parameters of the variational autoencoders within each group can be optimized more effectively: the encoder's information compression capability improves while the decoder, drawing on more data information, acquires a stronger data generation capability. This helps improve the performance and generalization ability of each group's central generative model.
(5) In steps S14-S15, according to different common needs, the decoder of a particular group's central server can be used to generate large amounts of realistic, safely shareable data.
(6) The invention provides a data generation method based on a variational autoencoder, an unsupervised clustering algorithm, and federated learning; no prior work combines the variational autoencoder with federated learning in this way. The proposed training method skillfully exploits data scattered across different sites and effectively improves the data generation capability of the variational autoencoder.
(7) At present, the generally low openness of data sharing between different systems and organizations causes the information island problem, and carelessly sharing data for model training can compromise user privacy and seriously violate laws and regulations. To solve these problems, the proposed method integrates the federated learning architecture into the variational autoencoder model so that large amounts of safe, shareable, valuable data can be generated, providing effective data support for further machine learning and deep learning tasks.
(8) The data collected by different devices may vary significantly; for example, collectors may have different preferences, and different geographical locations may yield photos of different styles. Federated learning is strongly limited by such distribution differences: when the participants' data distributions differ greatly, the performance of the federated model drops sharply. The method skillfully exploits the characteristics of the variational autoencoder, runs K-means++ clustering on the safe low-dimensional vectors extracted by the participants, which leak no original data, groups the participants accordingly, and then trains federated models within the separate groups. This mitigates the optimization problems that inconsistent data distributions cause for federated training and finally yields a federated generative model with good data generation performance. In the inference stage this model can generate more data that is safe to share, providing effective data support for further machine learning and deep learning tasks.
Drawings
FIG. 1 is a diagram of the variational autoencoder model used by the central server and the local clients in the invention;
FIG. 2 is a framework diagram of the invention;
FIG. 3 is a flow chart of the prediction phase of the invention.
Detailed Description
The invention is described in further detail below with reference to the specific embodiments and the accompanying drawings. Conditions, procedures, training methods, and the like for practicing the invention are, except where specifically mentioned below, common knowledge in the art, and the invention is not particularly limited in those respects.
The invention provides a data generation method based on a variational self-encoder, an unsupervised clustering algorithm and federal learning, which specifically comprises the following steps:
Assume there are N local clients {P_1, P_2, …, P_N} holding training data {D_1, D_2, …, D_N}; federated learning also requires a central server C. The central server C holds no dataset and mainly cooperates with the clients to complete the model training task. At the model level, participant P_i holds a variational autoencoder M_i, and the central server C holds a variational autoencoder M_g. All variational autoencoders involved in the invention have the same structure, consisting of an encoder and a decoder, each a multilayer convolutional neural network. The encoder parameters on a local client are denoted θ_e and the decoder parameters θ_d; the encoder parameters on the central server C are denoted θ_ge and the decoder parameters θ_gd. In fact, the invention places no strong restriction on the specific network structures of the encoder and the decoder; they merely need to satisfy the variational autoencoder model architecture. The concrete model structure of the variational autoencoder is shown in FIG. 1, and the framework of the method is shown in FIG. 2.
The aim of the invention is to train a federated data generation model with good data generation performance. In the prediction stage this model can generate more data that is safe to share, providing effective data support for further machine learning and deep learning tasks.
Model training phase
The encoder parameters of the central server's variational autoencoder are randomly initialized to θ_ge and the decoder parameters to θ_gd.
Step S1: in each communication round of federated learning, the central server randomly selects a proportion K_1 of all local clients, where K_1 ranges from 10% to 50%. Denote the selected set of local clients P = {P_1, P_5, …, P_{N-2}}. The central server's encoder parameters θ_ge are sent to the clients in P; upon receiving them, each local client updates the encoder parameters θ_e of its local variational autoencoder to θ_ge.
Step S2: the clients in the set P utilize the local data to train the variational self-encoder. The training process is a parameter optimization process, in which a certain client P k For example, the encoder parameter of its variational self-encoder model is θ e And decoder parameter θ d On the basis of minimizing KL divergence and reconstruction loss, the corresponding optimization targets are as follows:
Figure BDA0003546851500000091
wherein x is the input of the variational self-encoder, x gets the mean and variance through the encoder, then gets z through this normal distribution sampling, z gets the output through the decoder
Figure BDA0003546851500000092
Representing the loss of mean square error of the input and output,
Figure BDA0003546851500000093
representing a distribution
Figure BDA0003546851500000094
KL divergence distance from the standard normal distribution N (0, I);
the invention adopts a Stochastic Gradient Descent (SGD) method to optimize the objective function, the learning rate is 0.01, the batch size is 64, and the dimension of the low-dimensional vector z is 32. Training 5 epochs locally, during which the auto-encoder model encoder parameters θ are differentiated e And decoder parameter θ d The information compression capability of the encoder is improved and the data generation capability of the decoder is improved by continuous updating and optimization;
and step S3: after the client in the set P is trained by the local model in the step S2, the parameter theta of the encoder in the local variational decoder model is transmitted in a network communication mode e Uploading to the central server C;
and step S4: the central server aggregates the encoder parameters uploaded from the local client and updates the encoder parameters theta of the central server ge
Figure BDA0003546851500000101
Wherein
Figure BDA0003546851500000102
Representing clients s i Is derived from the encoder parameters of the encoder,
Figure BDA0003546851500000103
representing a client s i The number of data set samples;
step S5: repeating steps S1-S4 until all local clients are selected by the central server at least 3-5 times. Because each round of clients is randomly selected according to a certain proportion, in order to accurately group N clients in a subsequent process, each client needs to be ensured to be selected at least 3-5 times. The encoder parameter theta of the central server at the moment ge Sending the parameters to all clients, and updating the parameter theta of the encoder of the local client e =θ ge
Step S6: each client maps its local original data x to low-dimensional vectors z through its encoder. The set of all locally obtained low-dimensional vectors is denoted SetZ and is clustered with the unsupervised clustering algorithm K-means++ as follows:
(1) Randomly select one sample point from the set SetZ as the first initial cluster center;
(2) Compute each sample's shortest distance to the existing cluster centers, denoted D(z), and select the sample point in SetZ with the largest D(z) as the next cluster center;
(3) Repeat (2) until G_1 cluster centers have been selected;
(4) Assign every sample point to the class of its nearest center, then compute the mean of all sample points in each of the G_1 classes to serve as the G_1 center points for the next iteration;
(5) Repeat (4) until the center points no longer change or the specified number of iterations is reached, completing the clustering;
By K-means++ clustering, the SetZ on each client is divided into G_1 groups; all low-dimensional vectors in each group are averaged, and the resulting G_1 average low-dimensional vectors are uploaded to the central server, where G_1 ranges from 3 to 5.
Step S7: after receiving all the low-dimensional vectors sent by the local clients, the central server clusters them with the unsupervised clustering algorithm K-means++ into G_2 groups, where G_2 ranges from 4 to 8. Each client P_i is assigned to the group containing the most of its low-dimensional vectors. In the operations of S6-S7, the encoder, with its strong information compression capability, maps the original data to low-dimensional vectors that carry noise sampled from a standard normal distribution. Low-dimensional vectors reflecting the data's information are thus obtained while guaranteeing that the central server cannot reconstruct the original data from them, so the privacy security of the data is well protected;
Step S8: after the local clients have been grouped, federated model training is carried out independently within each group;
Step S9: in each communication round within a group, the group's central server C randomly selects a proportion K_2 of the group's local clients, where K_2 ranges from 40% to 80%, and then sends the encoder parameters θ_ge and decoder parameters θ_gd of the group server's variational autoencoder to the selected local clients, which use them to update their encoder parameters θ_e and decoder parameters θ_d;
Step S10: the selected clients in each group perform local model training as in step S2;
Step S11: the selected clients in each group upload the trained encoder parameters θ_e and decoder parameters θ_d of their local variational autoencoders to the group's central server via network communication;
Step S12: the central server of each group aggregates the encoder parameters θ_e and decoder parameters θ_d uploaded by the local clients, using the same aggregation scheme as step S4, and updates the group server's encoder parameters θ_ge and decoder parameters θ_gd;
Step S13: steps S9-S12 are repeated until each group's model converges or a fixed number of communication rounds is reached. Because each client undergoes federated training together with clients whose data distributions are similar to its own, the harm that data-domain differences cause to federated training is mitigated to a large extent, and each group can be trained into a final global generative model;
Model prediction phase
Step S14: N_s random samples are drawn from a standard normal distribution, where N_s is adjusted according to the specific business scenario;
Step S15: the clients of each group use the global generative model M_g to map the sample set Z into a realistic, safely shareable dataset X. The prediction process is shown in FIG. 3.

Claims (2)

1. A data generation method based on a variational autoencoder, an unsupervised clustering algorithm, and federated learning, characterized by comprising the following steps:
model training phase
step S1: in each communication round of federated learning, the central server randomly selects a proportion K_1 of all local clients and then sends its encoder parameters to the selected local clients, which use them to update their own encoder parameters; wherein K_1 ranges from 10% to 50%;
step S2: the selected local clients train the generative model, a variational autoencoder, on their local training sets, taking a mean-square-error loss function plus the KL divergence as the optimization objective and using gradient descent as the optimization method to train the local model iteratively;
step S3: after local training finishes, the selected clients upload the encoder parameters of their local variational autoencoders to the central server via network communication;
step S4: the central server aggregates the encoder parameters uploaded by the local clients and updates its own encoder parameters;
step S5: steps S1 to S4 are repeated until every local client has been selected by the central server at least 3-5 times; the current encoder parameters of the central server are then sent to all clients, whose encoders update their parameters accordingly;
step S6: each local model maps its original data to a low-dimensional space through its encoder and clusters that space with the unsupervised clustering algorithm K-means++ to obtain G_1 groups; the low-dimensional vectors of each group are averaged, and the resulting vectors are uploaded to the central server; wherein G_1 ranges from 3 to 5;
step S7: after receiving all the low-dimensional vectors sent by the local clients, the central server clusters them with the unsupervised clustering algorithm K-means++ into G_2 groups, and each client is assigned to the group containing the most of its low-dimensional vectors; wherein G_2 ranges from 4 to 8;
step S8: after the local clients have been grouped, federated model training is carried out independently within each group;
step S9: in each communication round within a group, the group's central server randomly selects a proportion K_2 of the group's local clients and then sends the group server's encoder and decoder parameters to the selected local clients, which update their encoder and decoder parameters; wherein K_2 ranges from 40% to 80%;
step S10: the selected clients in each group perform local model training as in step S2;
step S11: the selected clients in each group upload the encoder and decoder parameters of their local variational autoencoders to the group's central server via network communication;
step S12: the central server of each group aggregates the encoder and decoder parameters uploaded by the local clients and updates its own encoder and decoder parameters;
step S13: steps S9 to S12 are repeated until each group's model converges or a fixed number of communication rounds is reached; training then stops, and each group obtains a final global generative model;
model prediction phase
step S14: N_s random samples are drawn from a standard normal distribution, where N_s is adjusted according to the specific business scenario;
step S15: the clients of each group map the random samples into realistic, safely shareable data using the decoder of the global generative model.
2. The data generation method based on a variational autoencoder, an unsupervised clustering algorithm, and federated learning according to claim 1, characterized in that step S6 specifically comprises: each client maps its local original data x to low-dimensional vectors z through its encoder; the set of all locally obtained low-dimensional vectors is denoted SetZ and is clustered with the unsupervised clustering algorithm K-means++ as follows:
(1) Randomly select one sample point from the set SetZ as the first initial cluster center;
(2) Compute each sample's shortest distance to the existing cluster centers, denoted D(z), and select the sample point in SetZ with the largest D(z) as the next cluster center;
(3) Repeat (2) until G_1 cluster centers have been selected;
(4) Assign every sample point to the class of its nearest center, then compute the mean of all sample points in each of the G_1 classes to serve as the G_1 center points for the next iteration;
(5) Repeat (4) until the center points no longer change or the specified number of iterations is reached, completing the clustering;
by K-means++ clustering, the SetZ on each client is divided into G_1 groups; all low-dimensional vectors in each group are averaged, and the resulting G_1 average low-dimensional vectors are uploaded to the central server, where G_1 ranges from 3 to 5.
CN202210251482.2A 2022-03-15 2022-03-15 Data generation method based on a variational autoencoder, an unsupervised clustering algorithm and federated learning Pending CN115238908A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210251482.2A CN115238908A (en) 2022-03-15 2022-03-15 Data generation method based on a variational autoencoder, an unsupervised clustering algorithm and federated learning


Publications (1)

Publication Number Publication Date
CN115238908A true CN115238908A (en) 2022-10-25

Family

ID=83667889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210251482.2A 2022-03-15 2022-03-15 Data generation method based on a variational autoencoder, an unsupervised clustering algorithm and federated learning

Country Status (1)

Country Link
CN (1) CN115238908A (en)


Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115860116A (en) * 2022-12-02 2023-03-28 广州图灵科技有限公司 Federal learning method based on generative model and deep transfer learning
CN115881306A (en) * 2023-02-22 2023-03-31 中国科学技术大学 Networked ICU intelligent medical decision-making method based on federal learning and storage medium
CN115881306B (en) * 2023-02-22 2023-06-16 中国科学技术大学 Networked ICU intelligent medical decision-making method based on federal learning and storage medium
CN116108491A (en) * 2023-04-04 2023-05-12 杭州海康威视数字技术股份有限公司 Data leakage early warning method, device and system based on semi-supervised federal learning
CN116108491B (en) * 2023-04-04 2024-03-22 杭州海康威视数字技术股份有限公司 Data leakage early warning method, device and system based on semi-supervised federal learning
CN116522228B (en) * 2023-04-28 2024-02-06 哈尔滨工程大学 Radio frequency fingerprint identification method based on feature imitation federal learning
CN116522228A (en) * 2023-04-28 2023-08-01 哈尔滨工程大学 Radio frequency fingerprint identification method based on feature imitation federal learning
CN116578674A (en) * 2023-07-07 2023-08-11 北京邮电大学 Federal variation self-coding theme model training method, theme prediction method and device
CN116578674B (en) * 2023-07-07 2023-10-31 北京邮电大学 Federal variation self-coding theme model training method, theme prediction method and device
CN116741388A (en) * 2023-08-14 2023-09-12 中国人民解放军总医院 Method for constructing cardiovascular critical severe disease large model based on federal learning
CN116741388B (en) * 2023-08-14 2023-11-21 中国人民解放军总医院 Method for constructing cardiovascular critical severe disease large model based on federal learning
CN117094412A (en) * 2023-08-18 2023-11-21 之江实验室 Federal learning method and device aiming at non-independent co-distributed medical scene
CN117077817B (en) * 2023-10-13 2024-01-30 之江实验室 Personalized federal learning model training method and device based on label distribution
CN117077817A (en) * 2023-10-13 2023-11-17 之江实验室 Personalized federal learning model training method and device based on label distribution

Similar Documents

Publication Publication Date Title
CN115238908A (en) Data generation method based on a variational autoencoder, an unsupervised clustering algorithm and federated learning
CN109523463A (en) A kind of face aging method generating confrontation network based on condition
CN112770291B (en) Distributed intrusion detection method and system based on federal learning and trust evaluation
CN112329940A (en) Personalized model training method and system combining federal learning and user portrait
CN115510494B (en) Multiparty safety data sharing method based on block chain and federal learning
CN114092769B (en) Transformer substation multi-scene inspection analysis method based on federal learning
CN110263928A (en) Protect the mobile device-based distributed deep learning training method of data-privacy
CN113518007B (en) Multi-internet-of-things equipment heterogeneous model efficient mutual learning method based on federal learning
CN113806735A (en) Execution and evaluation dual-network personalized federal learning intrusion detection method and system
CN113191530B (en) Block link point reliability prediction method and system with privacy protection function
CN114997420B (en) Federal learning system and method based on segmentation learning and differential privacy fusion
CN115905978A (en) Fault diagnosis method and system based on layered federal learning
CN111192206A (en) Method for improving image definition
Mei et al. Fedvf: Personalized federated learning based on layer-wise parameter updates with variable frequency
CN113972012A (en) Infectious disease prevention and control cooperative system based on alliance chain and public chain
CN116862022A (en) Efficient privacy protection personalized federal learning method for communication
CN117371555A (en) Federal learning model training method based on domain generalization technology and unsupervised clustering algorithm
CN114048838A (en) Knowledge migration-based hybrid federal learning method
CN116502709A (en) Heterogeneous federal learning method and device
CN116719607A (en) Model updating method and system based on federal learning
CN116843069A (en) Commuting flow estimation method and system based on crowd activity intensity characteristics
CN115730267A (en) Multi-source unbalanced credit data fusion method and system based on federal distillation learning
CN115908600A (en) Massive image reconstruction method based on prior regularization
CN116010832A (en) Federal clustering method, federal clustering device, central server, federal clustering system and electronic equipment
CN115345320A (en) Method for realizing personalized model under layered federal learning framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination