CN115563519A - Federated contrastive clustering learning method and system for non-independent identically distributed data - Google Patents


Info

Publication number
CN115563519A
CN115563519A (application CN202211267754.4A)
Authority
CN
China
Prior art keywords
prototype
data
vector
matrix
feature
Prior art date
Legal status
Pending
Application number
CN202211267754.4A
Other languages
Chinese (zh)
Inventor
李瑞轩
王号召
徐子珺
李玉华
辜希武
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202211267754.4A
Publication of CN115563519A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a federated contrastive clustering learning method and system for non-independent identically distributed (Non-IID) data, belonging to the technical field of federated learning. A central server issues a shared model and a prototype matrix to each selected client for cluster-contrast training. During this training the model does not attend to the overall data distribution, only to the contrastive relations among data cluster information; because self-supervised contrastive learning focuses on local features, the dependence on the global distribution is relieved. This mitigates the model bias caused by unbalanced data distribution, represents the class distribution of the data well, alleviates the Non-IID problem caused by class imbalance, and removes the dependence of federated learning on labeled data. The method thereby solves two technical problems of existing federated learning: low model accuracy due to the Non-IID data distribution problem, and inapplicability in real production due to reliance on labeled data.

Description

Federated contrastive clustering learning method and system for non-independent identically distributed data
Technical Field
The invention belongs to the technical field of federated learning, and particularly relates to a federated contrastive clustering learning method and system for non-independent identically distributed data.
Background
In the current Internet+ era, people generate and store huge amounts of data through edge devices such as smartphones, Internet of Things (IoT) devices, and wearables, and these data provide abundant training samples for the further development and application of deep learning. Deep learning in turn provides the algorithmic basis for intelligent applications that serve billions of people every day. However, as deep learning develops rapidly, the architecture of uploading personal data to a cloud server for centralized storage and model training can hardly dispel the growing concerns about the privacy and security of personal data, and the high cost and high latency it causes are also unacceptable. The Google team therefore proposed Federated Learning (FL) in 2016, a distributed method that decouples training data from the devices used to train the model. The raw data stays on the user's device, and a joint model is trained through cooperation between the user devices and a central server. The central server only receives and aggregates the updates and analysis results of local computations to enhance the global model, and then shares the new model with the clients so that knowledge is shared among them, giving user data higher security and privacy.
Although federated learning solves the security and privacy problems to some extent, it also introduces new problems. First, the Non-IID data distribution problem: overly heterogeneous and disordered data distributions prevent conventional federated learning algorithms from effectively extracting globally consistent data features from data following Non-IID distributions, and too little data per user, too few data classes, or overly discrete data distributions among users further aggravate the Non-IID degree. To reduce the damage Non-IID data does to a model, existing federated learning algorithms usually improve the overall distribution at the data level, for example by sharing user data and features, or by augmenting and expanding the data of users with small datasets, rather than addressing the Non-IID problem at the algorithmic level. In addition, conventional federated learning algorithms are usually trained with supervised machine learning methods and therefore depend on labeled data; in practical application scenarios, however, users hold large amounts of unlabeled data, so federated learning cannot be applied in real production.
Disclosure of Invention
Aiming at the above defects or improvement requirements of the prior art, the invention provides a federated contrastive clustering learning method and system for non-independent identically distributed data, which solve the technical problems that existing federated learning methods suffer low model accuracy due to the Non-IID data distribution problem and cannot be applied in real production due to their dependence on labeled data.
In order to achieve the above object, in a first aspect, the invention provides a federated contrastive clustering learning method for non-independent identically distributed data, including:
The following steps are executed at the central server side:
A11, initializing vector representations of K cluster centers to obtain a prototype matrix; initializing a shared model;
A12, randomly selecting m clients from the client set, and issuing the shared model and the prototype matrix to each of the m selected clients for training;
A13, after the trained shared models and prototype matrices returned by the m clients are collected, taking each client's share of the total data volume as its weight, computing a weighted sum of the returned shared models to obtain an aggregated model and a weighted sum of the returned prototype matrices to obtain an aggregated matrix;
A14, updating the shared model to the aggregated model, updating the prototype matrix to the aggregated matrix, then normalizing the prototype matrix, and repeating steps A12-A13 until a preset number of iterations is reached; the shared model at that point is the trained model;
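A minimal server-side sketch of steps A11-A14 in PyTorch-style Python follows; the client interface (`c.train` and the returned `num_samples`, `model_state`, and `prototypes` fields) and all names are illustrative assumptions, not part of the patent:

```python
import copy
import random
import torch
import torch.nn.functional as F

def server_round(model, prototypes, clients, m):
    """One communication round (steps A12-A13): sample m clients, collect
    their trained models and prototype matrices, and average both with
    weights equal to each client's share of the total data volume."""
    selected = random.sample(clients, m)
    results = [c.train(copy.deepcopy(model), prototypes.clone()) for c in selected]
    total = sum(r["num_samples"] for r in results)

    agg_state = {k: torch.zeros_like(v, dtype=torch.float32)
                 for k, v in model.state_dict().items()}
    agg_protos = torch.zeros_like(prototypes)
    for r in results:
        w = r["num_samples"] / total            # weight = data share
        for k, v in r["model_state"].items():
            agg_state[k] += w * v.float()
        agg_protos += w * r["prototypes"]

    model.load_state_dict(agg_state)            # step A14: adopt aggregated model
    # Normalize prototype rows only after aggregation, so the matrix still
    # reflects a global set of cluster centers (see the discussion below).
    return model, F.normalize(agg_protos, dim=1)
```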
After receiving the shared model and the prototype matrix issued by the central server, the client inputs its local dataset into the shared model for cluster-contrast training, which specifically includes performing the following for each local data sample in the client's local dataset:
B11, applying random data augmentation to the local data sample twice to obtain a first contrast sample and a second contrast sample;
B12, inputting the first contrast sample and the second contrast sample into the received shared model to obtain a first sample feature and a second sample feature;
B13, mapping and matching the first sample feature and the second sample feature against the vector representations of the cluster centers in the received prototype matrix to obtain a first feature encoding vector and a second feature encoding vector;
B14, updating the parameters of the received shared model and its prototype matrix by minimizing the cross-entropy loss between the first sample feature and the second feature encoding vector and the cross-entropy loss between the second sample feature and the first feature encoding vector.
Further preferably, the loss function L(z_s, z_t) used in the cluster-contrast training of the shared model in step B14 is:

$$L(z_s,z_t)=\ell(z_s,q_t)+\ell(z_t,q_s)$$

$$\ell(z_s,q_t)=-\sum_{k} q_t^{(k)}\log\frac{\exp(z_s^{\top}c_k/\tau)}{\sum_{k'}\exp(z_s^{\top}c_{k'}/\tau)},\qquad \ell(z_t,q_s)=-\sum_{k} q_s^{(k)}\log\frac{\exp(z_t^{\top}c_k/\tau)}{\sum_{k'}\exp(z_t^{\top}c_{k'}/\tau)}$$

where z_s is the first sample feature; z_t is the second sample feature; q_s is the first feature encoding vector; q_t is the second feature encoding vector; q_t^(k) is the k-th entry of the second feature encoding vector; c_k is the vector representation of the k-th cluster center in the prototype matrix; τ is the temperature coefficient; and q_s^(k) is the k-th entry of the first feature encoding vector.
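A hedged PyTorch sketch of steps B11-B14 together with the loss above; the encoder, the augmentation function, and the use of a plain softmax to produce the codes q are simplifying assumptions (the embodiment only requires that the encoding measure similarity to the cluster centers):

```python
import torch
import torch.nn.functional as F

def swapped_loss(z_s, z_t, q_s, q_t, prototypes, tau=0.1):
    """L(z_s, z_t) = l(z_s, q_t) + l(z_t, q_s): each view's feature must
    predict the code assigned to the other view (cross-entropy between the
    code and the softmax over dot products with all cluster centers)."""
    def l(z, q):
        logits = z @ prototypes.t() / tau                # (B, K)
        return -(q * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    return l(z_s, q_t) + l(z_t, q_s)

def client_step(encoder, prototypes, batch, augment, tau=0.1):
    x_s, x_t = augment(batch), augment(batch)            # B11: two random views
    z_s = F.normalize(encoder(x_s), dim=1)               # B12: sample features
    z_t = F.normalize(encoder(x_t), dim=1)
    with torch.no_grad():                                # B13: codes are targets
        q_s = torch.softmax(z_s @ prototypes.t() / tau, dim=1)
        q_t = torch.softmax(z_t @ prototypes.t() / tau, dim=1)
    return swapped_loss(z_s, z_t, q_s, q_t, prototypes, tau)  # B14
```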
Further preferably, the clients perform the cluster-contrast training in parallel.
In a second aspect, the invention provides a federated contrastive clustering learning method for non-independent identically distributed data, including:
The following steps are executed at the central server side:
A21, initializing vector representations of K cluster centers to obtain the initial prototype matrix C_0; constructing a prototype array of length m in which the prototype matrix returned by each client is stored under that client's index, and initializing every entry of the array to C_0; initializing a shared model;
A22, randomly selecting m clients from the client set, and issuing the shared model and the prototype matrix to each of the m selected clients for training; when sending the shared model and the prototype matrix to the t-th client, randomly selecting from the prototype array one prototype matrix other than the one corresponding to the t-th client, denoting it as matrix C_P, and sending matrix C_P to the t-th client at the same time for training; t = 1, 2, ..., m;
A23, after the trained shared models and prototype matrices returned by the m clients are collected, storing each returned prototype matrix in the corresponding position of the prototype array, then taking each client's share of the total data volume as its weight, computing a weighted sum of the returned shared models to obtain an aggregated model and a weighted sum of the returned prototype matrices to obtain an aggregated matrix;
A24, updating the shared model to the aggregated model, updating the prototype matrix to the aggregated matrix, then normalizing the prototype matrix, and repeating steps A22-A23 until a preset number of iterations is reached; the shared model at that point is the trained model;
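A short sketch of the prototype-array bookkeeping in steps A21-A23 (all names hypothetical):

```python
import random

def init_proto_array(C0, m):
    # Step A21: one slot per selected client, every slot starts from C0
    return [C0.clone() for _ in range(m)]

def pick_shared_prototype(proto_array, t):
    # Step A22: for client t, draw C_P from any slot except client t's own
    candidates = [i for i in range(len(proto_array)) if i != t]
    return proto_array[random.choice(candidates)]

# Step A23 then overwrites slot t with the (un-normalized) prototype matrix
# that client t returns: proto_array[t] = returned_prototypes[t]
```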
When the client receives the shared model, the prototype matrix, and the matrix C_P issued by the central server, it inputs its local dataset into the shared model for cluster-contrast training, which specifically includes performing the following for each local data sample in the client's local dataset:
B21, applying random data augmentation to the local data sample twice to obtain a first contrast sample and a second contrast sample;
B22, inputting the first contrast sample and the second contrast sample into the received shared model to obtain a first sample feature and a second sample feature;
B23, mapping and matching the first sample feature and the second sample feature against the vector representations of the cluster centers in the received prototype matrix to obtain a first feature encoding vector and a second feature encoding vector;
mapping and matching the first sample feature and the second sample feature against the vector representations of the cluster centers in the received matrix C_P to obtain a third feature encoding vector and a fourth feature encoding vector;
B24, updating the parameters of the shared model and the prototype matrix by minimizing the cross-entropy loss between the first sample feature and the second feature encoding vector, the cross-entropy loss between the second sample feature and the first feature encoding vector, the cross-entropy loss between the first sample feature and the fourth feature encoding vector, and the cross-entropy loss between the second sample feature and the third feature encoding vector.
Further preferably, the loss function L(z_s, z_t) used in the cluster-contrast training of the shared model in step B24 is:

$$L(z_s,z_t)=\ell(z_s,q_t)+\ell(z_t,q_s)+\ell'(z_s,q'_t)+\ell'(z_t,q'_s)$$

$$\ell(z_s,q_t)=-\sum_{k} q_t^{(k)}\log\frac{\exp(z_s^{\top}c_k/\tau)}{\sum_{k'}\exp(z_s^{\top}c_{k'}/\tau)},\qquad \ell(z_t,q_s)=-\sum_{k} q_s^{(k)}\log\frac{\exp(z_t^{\top}c_k/\tau)}{\sum_{k'}\exp(z_t^{\top}c_{k'}/\tau)}$$

$$\ell'(z_s,q'_t)=-\sum_{k} {q'_t}^{(k)}\log\frac{\exp(z_s^{\top}d_k/\tau)}{\sum_{k'}\exp(z_s^{\top}d_{k'}/\tau)},\qquad \ell'(z_t,q'_s)=-\sum_{k} {q'_s}^{(k)}\log\frac{\exp(z_t^{\top}d_k/\tau)}{\sum_{k'}\exp(z_t^{\top}d_{k'}/\tau)}$$

where z_s is the first sample feature; z_t is the second sample feature; q_s is the first feature encoding vector; q_t is the second feature encoding vector; q_s' is the third feature encoding vector; q_t' is the fourth feature encoding vector; q_t^(k) is the k-th entry of the second feature encoding vector; c_k is the vector representation of the k-th cluster center in the prototype matrix; τ is the temperature coefficient; q_s^(k) is the k-th entry of the first feature encoding vector; q_t'^(k) is the k-th entry of the fourth feature encoding vector; d_k is the vector representation of the k-th cluster center in matrix C_P; and q_s'^(k) is the k-th entry of the third feature encoding vector.
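A hedged sketch of the step B24 loss, reusing the cross-entropy term from the earlier sketch; the l' terms differ from l only in being computed against the shared matrix C_P:

```python
import torch.nn.functional as F

def shared_center_loss(z_s, z_t, q_s, q_t, q_s_p, q_t_p, C, C_P, tau=0.1):
    """L = l(z_s,q_t) + l(z_t,q_s) + l'(z_s,q_t') + l'(z_t,q_s')."""
    def l(z, q, protos):
        logits = z @ protos.t() / tau
        return -(q * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    return (l(z_s, q_t, C) + l(z_t, q_s, C)             # local prototype terms
            + l(z_s, q_t_p, C_P) + l(z_t, q_s_p, C_P))  # shared C_P terms
```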
Further preferably, the temperature coefficient is initialized by the central server and then transmitted to each client.
Further preferably, the loss function L(z_s, z_t) used in the cluster-contrast training of the shared model in step B24 may instead be:

$$L(z_s,z_t)=\ell(z_s,q_t)+\ell(z_t,q_s)+\ell''(z_s,q'_t)+\ell''(z_t,q'_s)$$

$$\ell(z_s,q_t)=-\sum_{k} q_t^{(k)}\log\frac{\exp(z_s^{\top}c_k/\tau)}{\sum_{k'}\exp(z_s^{\top}c_{k'}/\tau)},\qquad \ell(z_t,q_s)=-\sum_{k} q_s^{(k)}\log\frac{\exp(z_t^{\top}c_k/\tau)}{\sum_{k'}\exp(z_t^{\top}c_{k'}/\tau)}$$

$$\ell''(z_s,q'_t)=-\sum_{k} {q'_t}^{(k)}\log\frac{\exp(z_s^{\top}d_k/\tau')}{\sum_{k'}\exp(z_s^{\top}d_{k'}/\tau')},\qquad \ell''(z_t,q'_s)=-\sum_{k} {q'_s}^{(k)}\log\frac{\exp(z_t^{\top}d_k/\tau')}{\sum_{k'}\exp(z_t^{\top}d_{k'}/\tau')}$$

where z_s is the first sample feature; z_t is the second sample feature; q_s is the first feature encoding vector; q_t is the second feature encoding vector; q_s' is the third feature encoding vector; q_t' is the fourth feature encoding vector; q_t^(k) is the k-th entry of the second feature encoding vector; c_k is the vector representation of the k-th cluster center in the prototype matrix; τ is the temperature coefficient; q_s^(k) is the k-th entry of the first feature encoding vector; q_t'^(k) is the k-th entry of the fourth feature encoding vector; τ' is the temperature difference coefficient; d_k is the vector representation of the k-th cluster center in matrix C_P; and q_s'^(k) is the k-th entry of the third feature encoding vector.
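The temperature-difference variant changes only the temperature of the C_P terms; a sketch under the assumption τ' < τ (per the preference below):

```python
import torch.nn.functional as F

def temp_diff_loss(z_s, z_t, q_s, q_t, q_s_p, q_t_p, C, C_P,
                   tau=0.1, tau_prime=0.05):
    """The l'' terms use the smaller temperature tau', sharpening the softmax
    so losses (and gradients) toward non-local distributions are larger."""
    def l(z, q, protos, temp):
        logits = z @ protos.t() / temp
        return -(q * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    return (l(z_s, q_t, C, tau) + l(z_t, q_s, C, tau)
            + l(z_s, q_t_p, C_P, tau_prime) + l(z_t, q_s_p, C_P, tau_prime))
```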
Further preferably, the temperature coefficient and the temperature difference coefficient are initialized by the central server and then sent to each client.
Further preferably, the temperature difference coefficient is smaller than the temperature coefficient.
Further preferably, the clients perform the cluster-contrast training in parallel.
In a third aspect, the present invention provides a federated contrastive clustering learning system for non-independent identically distributed data, including a memory storing a computer program and a processor; when the computer program is executed by the processor, the federated contrastive clustering learning method provided in the first aspect of the invention and/or the federated contrastive clustering learning method provided in the second aspect of the invention is performed.
In a fourth aspect, the present invention further provides a computer-readable storage medium, which includes a stored computer program; when the computer program is executed by a processor, it controls a device on which the storage medium is located to perform the federated contrastive clustering learning method provided in the first aspect of the invention and/or the federated contrastive clustering learning method provided in the second aspect of the invention.
Generally, the above technical solutions conceived by the present invention can achieve the following beneficial effects:
1. The invention provides a federated contrastive clustering learning method in which the central server issues a shared model and a prototype matrix to each selected client for cluster-contrast training. During this training the model does not attend to the distribution of the data, only to the contrastive relations among data cluster information; because self-supervised contrastive learning focuses on local features, the dependence on the global distribution is greatly relieved. Model drift caused by uneven data distribution over the global range is therefore effectively mitigated, the class distribution of the data is well represented, the Non-IID problem caused by class imbalance is alleviated, and the dependence of federated learning on labeled data is eliminated. After the central server collects the trained shared models and prototype matrices returned by all clients, it averages the prototype matrices and only then normalizes the result, so that the matrix fuses the local cluster centers of all clients before normalization; normalizing on that basis ensures the prototypes still reflect the global cluster centers. On this basis, the invention solves the technical problems that existing federated learning methods suffer low model accuracy due to the Non-IID data distribution problem and cannot be applied in real production due to their dependence on labeled data.
2. The second aspect of the invention provides a federated contrastive clustering learning method with shared center vectors, which improves the method of the first aspect at a higher level by sharing center vectors: it starts directly from the data distribution and uses the prototype matrices of different users to correct each user's own data drift. Each user shares its prototype matrix globally; during training, local data are mapped simultaneously with the local prototype matrix and a random global prototype matrix, and the loss function computes the loss from the deviation of the local data with respect to both. The local prototype matrix therefore covers, as far as possible, both the local data distribution and the data distributions of other users represented by the global prototype matrix, instead of merely reflecting the local distribution, which improves the model. Meanwhile, the generated prototype matrix reflects a mixture of the local data distribution in high-dimensional space and other users' data distributions; it usually cannot be semantically interpreted, and an attacker cannot recover the user's raw data from it, so the security of user data is effectively guaranteed.
3. The third aspect of the invention provides a federated contrastive clustering learning method based on temperature difference and shared center vectors. On top of the shared-center-vector method of the second aspect, it introduces a temperature difference coefficient so that part of the cluster centers migrate faster toward non-local data distribution areas during training. By setting different temperature parameters for the local and global prototypes, different loss functions are designed for different prototypes, so the local prototype matrix can effectively learn the features of different data distributions at the same time, the shared model is guaranteed to correctly recognize features belonging to different distributions, the resulting global feature space is represented more uniformly, and the accuracy of the data features learned by the model improves. Meanwhile, the temperature-difference loss exploits the fact that user data distributions are far apart, which reduces the probability of overfitting, stabilizes the gradient direction during training, lets the model converge more accurately toward a local optimum, and accelerates convergence.
Drawings
FIG. 1 is a schematic diagram of the framework of the federated contrastive clustering learning method for non-independent identically distributed data provided by the present invention;
FIG. 2 is a flowchart of the cluster-contrast training performed at the client according to Embodiment 1 of the present invention;
FIG. 3 is a flowchart of the federated contrastive clustering learning method provided in Embodiment 1 of the present invention;
FIG. 4 is a schematic diagram of the relationship between user data distribution and cluster-center distribution according to Embodiment 1 of the present invention, in which (a) shows the distribution of cluster centers in the ideal state, (b) shows the distribution and movement trend of cluster centers under Non-IID data distribution before training, and (c) shows the distribution of cluster centers after training;
FIG. 5 is a schematic diagram of the client update process according to Embodiment 3 of the present invention;
FIG. 6 is a flowchart of the federated contrastive clustering learning method based on temperature difference and shared center vectors according to Embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The theory of federated learning was proposed to solve the data-island problem formed in the context of privacy protection. It is a framework that trains a model following the idea of distributed machine learning, and the participants are divided into a central server and clients according to their roles in training.
The clients can be low-performance mobile devices. Each client holds local private data for training the model, and these data are generally never transmitted or exchanged directly during training, so as to guarantee their privacy and security. The clients are independent of one another and can train in parallel, which greatly improves training efficiency. The central server is generally a high-performance server; it is mainly responsible for scheduling the clients in federated training, collecting the client-trained models for aggregation, and distributing the aggregated model to every client participating in training.
In federated training, the participants that make up the client side can be heterogeneous devices, so federated learning can make extensive use of idle devices. Since the hardware requirements on clients are low, training can run on mobile and Internet of Things devices, which lowers the threshold for participating in federated learning and effectively integrates inefficient, scattered devices. During training, a participant can join or quit a federated learning task at any time, which also greatly improves the flexibility of federated learning.
With the continuous development of the information age, ever larger data volumes and ever more users keep raising the requirements placed on federated learning. This is both an opportunity and a challenge: the massive data brought by huge numbers of users from different regions, professions, and interests are a valuable asset for federated learning, but they also bring many problems, first among them the Non-IID data distribution problem. Overly heterogeneous and disordered data distributions prevent conventional federated learning algorithms from effectively extracting globally consistent data features from data following Non-IID distributions, and too little data per user, too few data classes, or overly discrete data distributions among users further aggravate the Non-IID degree. Meanwhile, conventional federated learning algorithms are usually trained with supervised machine learning methods based on labeled data, but the massive unlabeled data held by users in practical application scenarios make such algorithms difficult to run.
In order to solve the above problems, as shown in FIG. 1, the present invention provides a federated contrastive clustering learning method for non-independent identically distributed data, applied to a distributed system; specific embodiments follow:
Embodiment 1
A federated contrastive clustering learning method for non-independent identically distributed data includes:
The following steps are executed at the central server side:
A11, initializing vector representations of K cluster centers to obtain a prototype matrix; initializing a shared model;
A12, randomly selecting m clients from the client set, and issuing the shared model and the prototype matrix to each of the m selected clients for training;
A13, after the trained shared models and prototype matrices returned by the m clients are collected, taking each client's share of the total data volume as its weight, computing a weighted sum of the returned shared models to obtain an aggregated model and a weighted sum of the returned prototype matrices to obtain an aggregated matrix;
A14, updating the shared model to the aggregated model, updating the prototype matrix to the aggregated matrix, then normalizing the prototype matrix, and repeating steps A12-A13 until a preset number of iterations is reached (500 in this embodiment); the shared model at that point is the trained model;
After receiving the shared model and the prototype matrix issued by the central server, the client inputs its local dataset into the shared model for cluster-contrast training, as shown in FIG. 2, which specifically includes performing the following for each local data sample in the client's local dataset:
B11, applying random data augmentation to the local data sample twice to obtain a first contrast sample and a second contrast sample;
B12, inputting the first contrast sample and the second contrast sample into the received shared model to obtain a first sample feature and a second sample feature;
B13, mapping and matching the first sample feature and the second sample feature against the vector representations of the cluster centers in the received prototype matrix to obtain a first feature encoding vector and a second feature encoding vector;
Specifically, the feature encoding vector represents how similar a sample feature is to the vector of each cluster center in the prototype matrix; the similarity can be measured with cosine similarity, or directly as the inner product of the sample feature and the cluster-center vector.
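The two measures mentioned above coincide when features and centers are unit-normalized; a small illustrative sketch (names are assumptions):

```python
import torch
import torch.nn.functional as F

def similarity_to_centers(z, prototypes, cosine=True):
    """(B, K) matrix of similarities between sample features and the K
    cluster-center vectors; with normalized rows, the inner product is
    exactly the cosine similarity."""
    if cosine:
        z = F.normalize(z, dim=1)
        prototypes = F.normalize(prototypes, dim=1)
    return z @ prototypes.t()
```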
B14, updating the parameters of the received shared model and its prototype matrix by minimizing the cross-entropy loss between the first sample feature and the second feature encoding vector and the cross-entropy loss between the second sample feature and the first feature encoding vector.
Preferably, the clients perform the cluster-contrast training in parallel.
In this embodiment, step B14 solves a swapped prediction problem, and the corresponding loss function expresses how well a sample feature fits a feature encoding. Specifically, the loss function L(z_s, z_t) for the cluster-contrast training of the shared model in step B14 is:

$$L(z_s,z_t)=\ell(z_s,q_t)+\ell(z_t,q_s)$$

$$\ell(z_s,q_t)=-\sum_{k} q_t^{(k)}\log\frac{\exp(z_s^{\top}c_k/\tau)}{\sum_{k'}\exp(z_s^{\top}c_{k'}/\tau)},\qquad \ell(z_t,q_s)=-\sum_{k} q_s^{(k)}\log\frac{\exp(z_t^{\top}c_k/\tau)}{\sum_{k'}\exp(z_t^{\top}c_{k'}/\tau)}$$

where z_s is the first sample feature; z_t is the second sample feature; q_s is the first feature encoding vector; q_t is the second feature encoding vector; q_t^(k) is the k-th entry of the second feature encoding vector; c_k is the vector representation of the k-th cluster center in the prototype matrix; τ is the temperature coefficient; and q_s^(k) is the k-th entry of the first feature encoding vector.
In particular, the loss function L consists of two terms: predicting the feature encoding q_s from the sample feature z_t, and predicting the feature encoding q_t from the sample feature z_s. Together these two terms form a swapped prediction problem, and finding a solution that minimizes L means solving it. Each term in L is the cross-entropy between a feature encoding and the probability distribution obtained by applying softmax to the dot products of the sample feature with all vectors in the prototype matrix.
During training, the loss function is minimized by updating the parameters of the received shared model and the vector representations in the received prototype matrix. Feature encodings are computed using only the sample features and the prototype matrix of the current batch, i.e., the prototype matrix is updated according to the current data distribution in each training step, and the cluster distribution never has to be recomputed from scratch when new data arrive. When computing encodings with the prototype matrix, the features of all samples in the current batch (image samples in this embodiment) are required to be distributed uniformly over the cluster centers; this equipartition constraint ensures that the feature encodings of different images in a batch differ, without which the model would collapse to producing the same encoding for every sample.
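The text does not spell out how the equipartition constraint is enforced; in SwAV, the self-supervised method this swapped-prediction setup follows, it is imposed with a few Sinkhorn-Knopp iterations. A sketch under that assumption (not necessarily the patent's exact procedure):

```python
import torch

@torch.no_grad()
def sinkhorn(scores, eps=0.05, n_iter=3):
    """Turn a (B, K) similarity matrix into codes whose K columns are used
    (approximately) equally often, preventing collapse to a single center."""
    Q = torch.exp(scores / eps).t()               # (K, B)
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iter):
        Q /= Q.sum(dim=1, keepdim=True); Q /= K   # rows: equal center usage
        Q /= Q.sum(dim=0, keepdim=True); Q /= B   # columns: one code per sample
    return (Q * B).t()                            # (B, K), each row sums to 1
```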
It should be noted that, to solve the technical problem that conventional federated learning depends on labeled data, the invention builds its federated learning method on unsupervised deep learning, since unsupervised learning lets a model train from samples alone without corresponding labels. Among unsupervised deep learning methods, contrastive learning does not depend on data labels and focuses on the contrast between local samples, making it insensitive to the global distribution of the data; meanwhile, clustering-based deep learning algorithms can better represent the class distribution of the data and alleviate the Non-IID problem caused by class imbalance. Therefore, to better reduce, at the algorithmic level, the damage Non-IID data distributions do to a model, and to solve the dependence on labeled data, the invention proposes a federated learning method based on clustering contrastive learning; the clustering-based component and the contrastive learning component, both insensitive to global data distribution, respectively alleviate the Non-IID problem caused by class imbalance among user data and the influence of Non-IID data distribution on the model.
Specifically, when a client performs cluster-contrast training, the prototype matrix must be normalized after being updated by back-propagation to keep the cluster centers properly distributed. If this normalization were done at the client, however, the distribution of cluster centers would change drastically: the client's prototype matrix is trained on the user's local data and therefore reflects only local cluster centers, not global ones. To ensure that the prototype matrix effectively reflects the global cluster centers, this embodiment moves the normalization step to the central server: after the central server receives and averages the prototype matrices from all clients, it normalizes the result. The prototype matrix thus fuses the local cluster centers of all clients before normalization, and normalizing on that basis keeps the prototypes reflecting the global cluster centers without drifting too far.
Further, the local dataset is a local private dataset determined by the specific task of the trained model, such as image classification, or other visual tasks such as object detection and image segmentation. Taking image classification as an example: in contrastive learning, image cropping is commonly used to extend the number of positive or negative samples, but in a federated environment user devices are usually resource-constrained. To extend the number of samples while reducing the memory and computation demands of training, a Multi-crop strategy is introduced: it uses two standard-resolution crops and samples V additional low-resolution crops that each cover only a small part of the image. Using low-resolution crops effectively reduces the memory and computation consumed by conventional cropping.
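A hedged torchvision sketch of the Multi-crop strategy; the crop resolutions and scale ranges are assumptions, and only the "two standard crops plus V small low-resolution crops" structure comes from the text:

```python
from torchvision import transforms

def make_multi_crop(v=4):
    """Two standard-resolution crops plus v low-resolution crops that each
    cover only a small part of the image, reducing memory and compute."""
    standard = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.14, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    low_res = transforms.Compose([
        transforms.RandomResizedCrop(96, scale=(0.05, 0.14)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    return lambda img: ([standard(img) for _ in range(2)] +
                        [low_res(img) for _ in range(v)])
```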
Next, the workflow of the federated contrastive clustering learning method provided in this embodiment is described in detail. As shown in FIG. 3, the central server first determines the number of prototype centers and initializes the prototype matrix C_0 accordingly, then initializes the model w_0; the model and the prototypes are initialized with normalized random values. Meanwhile, the clients participating in training are selected and data are assigned to them according to the data partitioning method. Once the model and prototypes are prepared and data distribution is complete, the central server distributes the model and the prototype matrix to every participating client. On receiving model w_i and prototype matrix C_i, client k reads its local dataset and starts cluster-contrast training. After finishing local training, client k obtains the model w_{i+1}^k and prototype matrix C_{i+1}^k and transmits them back to the central server for aggregation. After obtaining the updated models and prototypes of all clients, the central server aggregates the model parameters and the prototype matrix by weighted averaging and normalizes the prototype matrix, obtaining model w_{i+1} and prototypes C_{i+1}; it then increments the round counter i and checks whether the model has converged. If not, the model and prototypes are distributed to the clients for training again; if so, the final model is returned. Pseudocode of the federated contrastive clustering learning method provided in this embodiment is shown in Table 1.
TABLE 1
(The pseudocode listing is reproduced only as an image in the original publication.)
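Since the Table 1 listing survives only as an image, the following is a hedged reconstruction of the client side of the loop, reusing `client_step` from the earlier sketch; the optimizer choice, learning rate, and returned dictionary layout are assumptions:

```python
import torch

def local_train(model, prototypes, loader, augment, epochs=1, lr=0.01, tau=0.1):
    """Client k: train the received model w_i and prototype matrix C_i on the
    local dataset and return both; the prototype matrix is deliberately NOT
    normalized here, since normalization happens on the central server."""
    prototypes = prototypes.clone().requires_grad_(True)
    opt = torch.optim.SGD(list(model.parameters()) + [prototypes], lr=lr)
    for _ in range(epochs):
        for batch in loader:
            loss = client_step(model, prototypes, batch, augment, tau)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return {"model_state": model.state_dict(),
            "prototypes": prototypes.detach(),
            "num_samples": len(loader.dataset)}
```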
Embodiment 2
Although the federated contrastive clustering learning method provided in Embodiment 1 alleviates the Non-IID problem and the dependence on labeled data faced by federated learning, it attends only to individual samples (e.g., images) and is insensitive to data distribution. It can correct the prototype matrix while correcting the shared model so that the model does not drift far, but under a high degree of Non-IID a user with a large data volume can still pull the prototype matrix strongly toward that user's local distribution, so that the prototype matrix no longer effectively reflects the global feature distribution. Moreover, the introduced contrastive learning emphasizes feature contrast among samples and the distribution of local samples, so in a federated environment the model produced by a single client is biased too far toward its local data distribution, causing overfitting.
In order to solve these problems, many existing frameworks train with more global information by sharing data or pre-trained models, thereby enhancing the model and improving the effect of federated learning. For example, the FedCA algorithm shares data features and prepares an additional dataset, which can expand users with too little data or too few data classes while pulling the user data distributions together. But this raises the problem of privacy leakage: user data are transmitted to other users' devices for training, where they can be intercepted by hackers in transit or exploited by malicious users during training, violating the privacy-protection purpose of federated learning.
In order to solve the drift problem of the method of Embodiment 1 while avoiding the potential privacy-leakage risks of sharing user information, this embodiment provides a federated contrastive clustering learning method with shared center vectors: each user's local prototype is shared globally, and when a user maps data through its local prototype, the prototype matrices of other users are used for mapping at the same time to correct the drift of the prototype matrix. Compared with existing approaches that relieve the model damage caused by dominant large-data users under Non-IID distributions by sharing data and features or by introducing pre-trained models, the shared-center-vector method avoids data leakage and needs no additional pre-trained model to enhance the model.
Specifically, the federated contrastive clustering learning method with shared center vectors provided by this embodiment includes:
The following steps are executed at the central server side:
A21, initializing vector representations of K cluster centers to obtain the initial prototype matrix C_0; constructing a prototype array of length m in which the prototype matrix returned by each client is stored under that client's index, and initializing every entry of the array to C_0; initializing a shared model;
A22, randomly selecting m clients from the client set, and issuing the shared model and the prototype matrix to each of the m selected clients for training; when sending the shared model and the prototype matrix to the t-th client, randomly selecting from the prototype array one prototype matrix other than the one corresponding to the t-th client, denoting it as matrix C_P, and sending matrix C_P to the t-th client at the same time for training; t = 1, 2, ..., m;
A23, after the trained shared models and prototype matrices returned by the m clients are collected, storing each returned prototype matrix in the corresponding position of the prototype array, then taking each client's share of the total data volume as its weight, computing a weighted sum of the returned shared models to obtain an aggregated model and a weighted sum of the returned prototype matrices to obtain an aggregated matrix;
A24, updating the shared model to the aggregated model, updating the prototype matrix to the aggregated matrix, then normalizing the prototype matrix, and repeating steps A22-A23 until a preset number of iterations is reached (500 in this embodiment); the shared model at that point is the trained model;
When the client receives the sharing model, the prototype matrix and the matrix C issued by the central server P Then, inputting the local data set in the client into the shared model for cluster comparison training, specifically comprising: performing the following for each local data sample in the local dataset in the client:
b21, random data enhancement is respectively carried out on the local data samples twice to obtain a first comparison sample and a second comparison sample;
b22, inputting the first comparison sample and the second comparison sample into the received shared model respectively to obtain a first sample characteristic and a second sample characteristic;
b23, mapping and matching the first sample characteristic and the second sample characteristic with the received vector representation of each clustering center in the prototype matrix to obtain a first characteristic coding vector and a second characteristic coding vector;
respectively combining the first sample characteristic and the second sample characteristic with the received matrix C P Carrying out mapping matching on the vector representation of each cluster center to obtain a third feature coding vector and a fourth feature coding vector;
specifically, the feature coding vector represents the similarity degree of the sample feature and the vector of each cluster center in the prototype matrix; the cosine similarity can be used for measurement, or the inner product of the sample features and the cluster center vector can be directly calculated for measurement.
And B24, updating the parameters in the shared model and the prototype matrix by minimizing the cross entropy loss between the first sample characteristic and the second characteristic coding vector, the cross entropy loss between the second sample characteristic and the first characteristic coding vector, the cross entropy loss between the first sample characteristic and the fourth characteristic coding vector and the cross entropy loss between the second sample characteristic and the third characteristic coding vector.
Preferably, the temperature coefficient is initialized by the central server and then sent to each client.
It should be noted that conventional approaches of sharing data or feature vectors counter the drift caused by the data themselves by adding user data, relieving the inability of the model to reflect the global data distribution when user data are extreme. The method of this embodiment instead improves the algorithm at a higher level: it starts directly from the data distribution and uses the prototype matrices of different users to correct each user's own data drift. Each user shares its prototype matrix globally; during training, local data are mapped simultaneously with the local prototype matrix and a random global prototype matrix, and the loss function computes the loss from the deviation of the local data with respect to both. The local prototype matrix therefore comes to cover, as far as possible, both the local data distribution and the data distributions of other users represented by the global prototype matrix, rather than merely reflecting the local distribution, which improves the model. On this basis, the prototype matrix generated by a client reflects a mixture of the local data distribution in high-dimensional space with other users' data distributions; it usually cannot be semantically interpreted, and an attacker cannot recover the user's raw data from it. The shared-center-vector method provided by this embodiment can therefore effectively share global information with every user while guaranteeing security.
Specifically, the shared cluster-center vector is the prototype matrix shared after the client's local training, before normalization; the un-normalized matrix is shared because normalization would alter the original distribution of the data. To implement sharing of cluster-center vectors, this embodiment maintains a prototype array whose length is the number of users participating in training; the latest prototype matrix of each user is stored under the user's index, and the array is initialized with the original prototype matrix. During training, the central server randomly selects the prototype matrix of another client and sends it to the designated client; the local client then computes the swapped-prediction loss both against its own prototype matrix received from the central server and against the additionally sent prototype matrix.
Specifically, the loss function L(z_s, z_t) in step B24 above is:

$$L(z_s,z_t)=\ell(z_s,q_t)+\ell(z_t,q_s)+\ell'(z_s,q'_t)+\ell'(z_t,q'_s)$$

$$\ell(z_s,q_t)=-\sum_{k} q_t^{(k)}\log\frac{\exp(z_s^{\top}c_k/\tau)}{\sum_{k'}\exp(z_s^{\top}c_{k'}/\tau)},\qquad \ell(z_t,q_s)=-\sum_{k} q_s^{(k)}\log\frac{\exp(z_t^{\top}c_k/\tau)}{\sum_{k'}\exp(z_t^{\top}c_{k'}/\tau)}$$

$$\ell'(z_s,q'_t)=-\sum_{k} {q'_t}^{(k)}\log\frac{\exp(z_s^{\top}d_k/\tau)}{\sum_{k'}\exp(z_s^{\top}d_{k'}/\tau)},\qquad \ell'(z_t,q'_s)=-\sum_{k} {q'_s}^{(k)}\log\frac{\exp(z_t^{\top}d_k/\tau)}{\sum_{k'}\exp(z_t^{\top}d_{k'}/\tau)}$$

where z_s is the first sample feature; z_t is the second sample feature; q_s is the first feature encoding vector; q_t is the second feature encoding vector; q_s' is the third feature encoding vector; q_t' is the fourth feature encoding vector; q_t^(k) is the k-th entry of the second feature encoding vector; c_k is the vector representation of the k-th cluster center in the prototype matrix; τ is the temperature coefficient; q_s^(k) is the k-th entry of the first feature encoding vector; q_t'^(k) is the k-th entry of the fourth feature encoding vector; d_k is the vector representation of the k-th cluster center in matrix C_P; and q_s'^(k) is the k-th entry of the third feature encoding vector. Pseudocode of the shared-center-vector federated contrastive clustering learning method provided by this embodiment is shown in Table 2.
TABLE 2
(The pseudocode listing is reproduced only as an image in the original publication.)
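As with Table 1, the listing survives only as an image; the following hedged sketch shows what distinguishes the Table 2 round, namely the extra C_P dispatch and the prototype-array refresh (names and interfaces are assumptions):

```python
import copy
import random

def server_round_shared(model, C, proto_array, clients, m):
    """Each selected client t receives the shared model, the global prototype
    matrix C, and a matrix C_P drawn from another client's slot."""
    selected = random.sample(range(len(clients)), m)
    results = {}
    for t in selected:
        others = [i for i in range(len(proto_array)) if i != t]
        C_P = proto_array[random.choice(others)]
        results[t] = clients[t].local_train(copy.deepcopy(model),
                                            C.clone(), C_P.clone())
    for t, r in results.items():
        proto_array[t] = r["prototypes"]   # store the un-normalized matrix
    # Weighted aggregation of models/prototypes and the server-side
    # normalization then proceed exactly as in the Table 1 loop.
    return results
```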
Embodiment 3
The shared-center-vector method of Embodiment 2 trains with a clustering-based contrastive learning algorithm and improves the model by sharing cluster centers, but it still has a shortcoming: merely sharing global information cannot effectively resolve the model drift caused by training on users' local data if the characteristics of Non-IID data distributions are not exploited. Considering the differences in data distribution between users under Non-IID conditions, the method of Embodiment 2 therefore still leaves room for improvement. During training, for the prototype matrix to better reflect the global data distribution, the local prototype matrix should simultaneously cover, as far as possible, both the local data distribution and the data distributions of the other users represented by the global prototype matrix: part of the cluster centers in the local prototype matrix move toward the local data distribution while the rest move toward the distribution represented by the global prototype matrix, eventually covering both. When computing the loss, however, the method of Embodiment 2 applies the same loss function to the user's local prototype and to the global prototype, which makes the cluster centers approach the two data distributions at the same speed. Under Non-IID data distributions the user's local data are close to the local prototype matrix and far from the global prototype matrix, so the cluster centers approaching the distribution represented by the global prototype matrix need a higher moving speed to reach their target region at the same time as those moving toward the local data distribution. To solve this problem, this embodiment proposes, on the basis of Embodiment 2, a federated contrastive clustering learning method based on temperature difference and shared center vectors, which introduces a temperature difference coefficient τ' that the user applies when computing the loss against the shared global prototype matrix, for the following reasons:
for the learning of the cluster center, the algorithm often encounters some troublesome problems, such as empty cluster clustering and collapsed parameter problems. When the discriminant model learns the boundaries between different classes, all inputs are assigned to a class when the discriminant model has the best decision boundary, so that there is no sample distribution around the center of other clusters, which is called a null clustering problem. In this embodiment, this problem is implicitly avoided because the cluster center is constantly changing with user local training and global aggregation. The parametric problem of collapse is that if most images in a dataset are assigned to a few cluster centers, the model parameters θ will specifically distinguish only these few classes resulting in an overfitting. In a real federal learning environment, different user devices have different data sets, the data come from various behaviors and habits of users, and particularly, the data sets are sample data sets (such as picture data sets), the data sets come from aspects of personal preferences, work and the like of the users, so the data sets usually have obvious preferences, the data sets in the user devices are more and more pictures of some types over time, meanwhile, due to the fact that the personal preferences and other influence factors of each user are different in a large probability, the types of pictures locally gathered by each user are often inconsistent, the picture types are reflected on the overall data distribution as high Non-IID data distribution, and the collapsed parameter problem is very easy to occur when the number of images of each type under the Non-IID data distribution is highly unbalanced. In the invention, images are sampled based on uniform distribution on class or pseudo labels, consistency constraint is utilized to solve the problem, but in addition to the parameter problem of collapse, data distribution in a federal environment also gives researchers a very important hint that in the federal environment, due to user interests and other factors, user local data are highly cohesive in categories (tend to possess a few kinds of data in specific categories rather than uniformly possess each category), while in the user-to-user environment, the data clustering center distribution is far away from each other (from the statistical viewpoint, when the number of users is large enough, two users are randomly selected, and the data categories owned by the two users are far apart due to personal preference, habits and environment inconsistency), and under the condition that the algorithm has good parameters for avoiding empty clusters and collapse, reasonable utilization of the data distribution characteristic of the Non-IID is beneficial for the model to better learn overall data distribution, and a better model effect is achieved.
Based on the above conclusion, we further analyze the relationship between user data distribution and cluster-center distribution, as shown in FIG. 4, in which graph (a) shows the distribution of cluster centers in the ideal state, graph (b) shows the distribution and movement trend of cluster centers under Non-IID data distribution before training, and graph (c) shows the distribution of cluster centers after training. In FIG. 4, the circle represents the global class space and the triangles represent the cluster-center vectors of a user's local prototype, whose distribution represents the class space the model can recognize; the darker gray areas represent the user's local data distribution, and the lighter gray areas represent the data distributions of other users represented by the global prototype matrix. The lighter and darker gray areas do not overlap, indicating that the data distributions of different users are far apart, and the areas are narrow, indicating that each user owns only a few classes of data. Graph (a) of FIG. 4 shows that, ideally, the cluster centers of the user's local prototype matrix are uniformly distributed over the global range, meaning they can correctly recognize data of all classes. Graph (b) shows that under Non-IID data distribution the cluster centers of the local prototype matrix lie closer to the user's local data distribution; the arrows indicate the directions in which cluster centers move during training: for the local prototype matrix to recognize the classes of both distributions simultaneously, part of the cluster centers move toward the local data distribution and the others move toward the data distributions of the remaining users. Graph (c) shows that if all cluster centers migrate at the same rate, some will cover the local data distribution early while the rest fail to cover the other users' data distributions. From this analysis we may reasonably conjecture: if part of the cluster centers could move faster toward the non-local data distribution areas during training, the model effect would improve.
Specifically, the federated contrast cluster learning method with a shared center vector provided in Embodiment 2 applies the same loss function to the user's local prototype and the global prototype when computing the loss, so the cluster centers approach the two different data distributions at the same speed. However, under a Non-IID data distribution the class distribution recognizable by the user's local prototype matrix is close to the local data distribution and far from the other users' data distributions represented by the global prototype matrix. With equal movement speeds, some cluster centers in the local prototype have already converged inside the local data distribution while the remaining centers heading toward other users' data distributions have not yet reached their target regions, or they reach those regions only after the model has already overfitted to the centers of the local data distribution. Therefore, the cluster centers approaching the data distributions represented by the global prototype matrix need a higher movement speed, so that they reach their target regions at the same time as the centers moving toward the local data distribution.
In conventional contrastive learning algorithms, a temperature coefficient τ is generally used in the contrastive loss function. Its role is to adjust how much attention is paid to borderline samples: within a reasonable range, the smaller the temperature, the more the model focuses on separating a sample from the samples most similar to it, i.e., on the samples not yet pushed far apart. A reasonable value range for τ is typically 0.1 to 1. The temperature coefficient gives the contrastive loss the property of automatically discovering hard negative samples at the margin, a property important for learning high-quality feature representations: samples already at the margin need not be pushed further away, and the main effort instead goes into separating, as far as possible, the samples that are not yet far apart, so that the resulting feature representation space is more uniform.
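The effect of τ is easy to see numerically. Below is a toy example (illustrative only, not from the patent) showing how shrinking the temperature concentrates the softmax on the hardest, most similar competitors:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Cosine similarities of one sample to five cluster centers.
sims = np.array([0.9, 0.7, 0.3, 0.1, -0.2])

for tau in (1.0, 0.5, 0.1):
    print(f"tau={tau}:", softmax(sims / tau).round(3))
# As tau shrinks, probability mass concentrates on the most similar
# competitors, so the gradient focuses on samples that are not yet
# well separated -- the "self-discovered hard negatives" effect.
```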
Combining the above analysis, to let the cluster centers move toward other users' data distributions at a higher rate than toward the local data distribution, so that after training the cluster centers cover the user's local and non-local data distributions simultaneously, this embodiment adopts a temperature difference on top of the shared center vector and proposes a federated cluster contrast algorithm based on the center vector and the temperature difference. It further extends the role of the temperature coefficient in cluster-based contrastive learning. Unlike Embodiment 2, where the client uses the same temperature parameter in the loss functions for the local user prototype and the other users' prototypes, this embodiment redesigns the contrastive loss so that the loss computed by the local data against other users' data distributions is larger, which gives the cluster centers a higher speed when moving toward those distributions and helps them cover both local and non-local data distributions after training. Specifically, the client uses a new temperature parameter when computing the loss of the exchange prediction problem for the local data against the prototype matrices of other users. By setting different temperature parameters for the local prototype and the global prototype, the newly introduced temperature coefficient lets the local prototype matrix learn the characteristics of different data distributions simultaneously and ensures the model encoder can correctly recognize features belonging to different distributions, so that the resulting global feature space is represented more uniformly. After introducing the new temperature coefficient, the new loss function $L(z_s, z_t)$ is:
$$L(z_s,z_t)=\ell(z_s,q_t)+\ell(z_t,q_s)+\ell''(z_s,q_t')+\ell''(z_t,q_s')$$

$$\ell(z_s,q_t)=-\sum_{k} q_t^{(k)}\log\frac{\exp(z_s^{\top}c_k/\tau)}{\sum_{k'}\exp(z_s^{\top}c_{k'}/\tau)}$$

$$\ell(z_t,q_s)=-\sum_{k} q_s^{(k)}\log\frac{\exp(z_t^{\top}c_k/\tau)}{\sum_{k'}\exp(z_t^{\top}c_{k'}/\tau)}$$

$$\ell''(z_s,q_t')=-\sum_{k} q_t'^{(k)}\log\frac{\exp(z_s^{\top}d_k/\tau')}{\sum_{k'}\exp(z_s^{\top}d_{k'}/\tau')}$$

$$\ell''(z_t,q_s')=-\sum_{k} q_s'^{(k)}\log\frac{\exp(z_t^{\top}d_k/\tau')}{\sum_{k'}\exp(z_t^{\top}d_{k'}/\tau')}$$

where $\ell''$ is the loss function used for the non-local prototype; $z_s$ is the first sample feature; $z_t$ is the second sample feature; $q_s$ is the first feature encoding vector; $q_t$ is the second feature encoding vector; $q_s'$ is the third feature encoding vector; $q_t'$ is the fourth feature encoding vector; $q_t^{(k)}$ is the $k$-th feature code in the second feature encoding vector; $c_k$ is the vector representation of the $k$-th cluster center in the prototype matrix; $\tau$ is the temperature coefficient; $q_s^{(k)}$ is the $k$-th feature code in the first feature encoding vector; $q_t'^{(k)}$ is the $k$-th feature code in the fourth feature encoding vector; $\tau'$ is the temperature difference coefficient for the non-local prototype loss function, used to adjust the speed at which cluster centers in the local prototype matrix migrate toward other users' data distributions; $d_k$ is the vector representation of the $k$-th cluster center in the matrix $C_P$; and $q_s'^{(k)}$ is the $k$-th feature code in the third feature encoding vector. In this embodiment, the temperature coefficient and the temperature difference coefficient are both initialized by the central server and then sent to each client.
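To make the two-temperature loss concrete, here is a minimal PyTorch sketch of the formulas above, assuming L2-normalized sample features, prototype matrices with cluster centers as columns, and dot-product similarities; the function name and default values are illustrative, not taken from the patent.

```python
import torch
import torch.nn.functional as F

def swapped_prediction_loss(z_s, z_t, q_s, q_t, q_s_p, q_t_p,
                            C, C_P, tau=0.5, tau_prime=0.1):
    """Sketch of L(z_s, z_t) = l(z_s,q_t) + l(z_t,q_s)
                             + l''(z_s,q_t') + l''(z_t,q_s').

    z_s, z_t     : (B, D) L2-normalized sample features
    q_s, q_t     : (B, K) first/second feature encoding vectors (local C)
    q_s_p, q_t_p : (B, K) third/fourth feature encoding vectors (C_P)
    C, C_P       : (D, K) local and non-local prototype matrices
    tau_prime    : temperature difference coefficient, applied only to
                   the non-local-prototype terms l''.
    """
    def l(z, q, prototypes, temperature):
        # -sum_k q^(k) * log softmax_k(z . c_k / temperature)
        logits = z @ prototypes / temperature            # (B, K)
        return -(q * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

    return (l(z_s, q_t,   C,   tau)          # l(z_s, q_t)
          + l(z_t, q_s,   C,   tau)          # l(z_t, q_s)
          + l(z_s, q_t_p, C_P, tau_prime)    # l''(z_s, q_t')
          + l(z_t, q_s_p, C_P, tau_prime))   # l''(z_t, q_s')
```

With `tau_prime < tau`, the logits against $C_P$ are scaled more aggressively, so, as the embodiment argues, a given mismatch between the local features and the non-local codes yields a larger loss term and hence a larger gradient pulling the cluster centers toward the other users' distributions.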
Preferably, the temperature difference coefficient τ′ is smaller than the temperature coefficient τ, so as to accelerate the migration of cluster centers toward other users' data distributions. Note that the temperature parameter used to compute the exchange prediction loss of the data against the non-local prototype matrix should be smaller than the one used for the local prototype matrix. This makes the exchange prediction loss against the non-local prototype matrix larger, which semantically tells the model: compared with the local data distribution, the cluster-center distribution of the current data is far from the data distributions represented by the global prototype matrix, so to cover both distributions simultaneously the model should migrate cluster centers toward the non-local data distributions faster. This agrees with the foregoing analysis of Non-IID cluster-center distributions. By the same analysis, the method should work better as the degree of Non-IID in the data distribution increases. This does not mean, however, that the method is ineffective under an IID data distribution: IID does not imply that the distributions are identical, and the method takes effect as long as the data distributions deviate between users. Because the loss function based on the temperature difference coefficient exploits the fact that users' data distributions are far from one another, it reduces the probability of overfitting, stabilizes the gradient direction during training, and lets the model converge more accurately toward the local optimum. The federated contrast clustering learning method based on temperature difference and shared center vector provided by this embodiment can therefore, in theory, improve the convergence rate of the model to a certain extent, which benefits training on edge devices. Specifically, as shown in fig. 5, during local training on the user side the data is augmented into $X_s$ and $X_t$, the features $Z_s$ and $Z_t$ are obtained through the encoder and mapped through the local matrix $C$ to obtain the feature codes $Q_s$ and $Q_t$; exchange prediction is then performed on $(Z_s, Q_t)$ and $(Z_t, Q_s)$ using $\tau, C$ and $\tau', C_g$ respectively to obtain the loss $L$, after which the model and the local prototype $C$ are updated according to $L$.
Fig. 6 is a flowchart of the federated contrast cluster learning method based on temperature difference and shared center vector provided in this embodiment. First, the center number K of the prototype is determined at the central server and the prototype matrix $C_0$ is initialized accordingly; the model $w_0$ is then initialized, the global prototype array $C_g$ is initialized and filled according to $C_0$, and the temperature difference coefficient τ′ is initialized. The model and prototypes are initialized with normalized random values; meanwhile, the clients participating in training are selected and data is distributed to each client according to the data partitioning method. After model and prototype preparation and data distribution are finished, a prototype $C_g'$ other than the user's own prototype is randomly selected from the global prototype array and distributed from the central server to each participating client together with the model and prototype matrix. After receiving the global prototype $C_g'$, model $w_i$, and prototype matrix $C_i$, client k computes loss values of the local data against the global prototype matrix and the local prototype matrix according to the flow in fig. 6, adds them as the final loss for back-propagation, updates the model and local prototype matrix to obtain a new model $w_{i+1}^k$ and prototype matrix $C_{i+1}^k$, and returns them to the central server to await aggregation. After receiving the updated models and prototypes of all clients, the central server updates the global prototype array with the un-normalized prototype matrices, aggregates the model parameters and prototype matrices by weighted averaging, and normalizes the prototype matrix to obtain the model $w_{i+1}$ and prototype $C_{i+1}$. The round counter i is then incremented and convergence is checked: if the model has not converged, the model and prototype are distributed to the clients for another round of training; if it has, the final model is returned.
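The server-side aggregation step described above can be sketched as follows. This is a minimal illustration, assuming model parameters arrive as state dicts of torch tensors and prototype matrices have shape (D, K) with cluster centers as columns; the function and variable names are illustrative, not the patent's code.

```python
import copy

def aggregate_round(client_models, client_prototypes, client_sizes):
    """Weighted averaging by local data volume, then column-wise
    normalization of the aggregated prototype matrix. The global
    prototype array is assumed to have been updated with the
    un-normalized client prototypes before this call."""
    total = float(sum(client_sizes))
    weights = [n / total for n in client_sizes]

    # FedAvg-style weighted average of the model parameters.
    global_model = copy.deepcopy(client_models[0])
    for key in global_model:
        global_model[key] = sum(w * m[key] for w, m in zip(weights, client_models))

    # Weighted average of the prototype matrices, then L2-normalize each
    # cluster-center column so the prototypes stay on the unit sphere.
    proto = sum(w * C for w, C in zip(weights, client_prototypes))
    proto = proto / proto.norm(dim=0, keepdim=True)
    return global_model, proto
```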
The pseudocode of the federated contrast cluster learning method based on temperature difference and shared center vector provided by this embodiment is shown in Table 3.
TABLE 3 (the pseudocode table is rendered as an image in the original publication and is not recoverable from the text)
Note that, apart from the loss function, the other technical means are the same as those in Embodiment 2 and are not repeated here.
To illustrate the performance of the federated contrast cluster learning methods provided in Embodiments 1-3 of the present invention, the MNIST, CIFAR-10, and CIFAR-100 datasets are used as training and test datasets during federated learning. MNIST is a black-and-white dataset of handwritten digits, while CIFAR-10 and CIFAR-100 are color image datasets closer to common real-world objects. The MNIST dataset contains pictures of the ten handwritten digits 0 to 9, comprising 60000 pictures of size 28 x 28. The CIFAR-10 dataset contains color pictures in 10 categories (airplane, car, bird, cat, deer, dog, frog, horse, boat, and truck), with 50000 training pictures and 10000 test pictures, 6000 pictures per category, each of size 32 x 32. CIFAR-100 is similar in composition to CIFAR-10, except that it contains 100 categories with 600 pictures each. The experiments use the three datasets for image classification training and for validating model performance. Specifically, each experimental dataset is divided into two forms, one conforming to an IID data distribution and the other to a Non-IID data distribution: the IID partition uses random sampling with equal allocation, while the Non-IID partition uses a Dirichlet distribution with distribution coefficient α = 0.5 to randomly allocate the number of pictures of each class to each user, as sketched below. Table 4 compares the accuracy of the shared model (a ResNet-18 network in this experiment) obtained by the methods of Embodiments 1-3 of the present invention against the existing SwAV and FedCA algorithms under different data distributions on the different datasets.
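The Non-IID partition described here is commonly realized per class: draw each client's share of a class from a Dirichlet distribution and deal out that class's samples accordingly. A minimal sketch under that assumption (the helper name is illustrative, not the patent's code):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices across clients so that each class's samples
    are allocated according to Dirichlet(alpha) proportions; smaller
    alpha yields a more skewed (more Non-IID) split."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.where(labels == cls)[0])
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in zip(range(num_clients), np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```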
TABLE 4 (accuracy comparison of the methods of Embodiments 1-3, SwAV, and FedCA under IID and Non-IID data distributions; rendered as an image in the original publication and not recoverable from the text)
As can be seen from Table 4, owing to the simultaneous introduction of center vector sharing and the temperature difference, the accuracy of the method of Embodiment 3 on every dataset greatly exceeds that of the method of Embodiment 1. The relative accuracy rankings of the methods are essentially consistent across the three datasets. Taking CIFAR-10 as an example, the method of Embodiment 3 achieves 72.34% accuracy under the IID data distribution, 3.1% higher than FedCA and almost on par with SwAV (72.5%) in the non-federated setting; under the Non-IID data distribution its accuracy is 3.9% higher than FedCA, showing that the proposed method improves federated contrastive learning and that the improvement is more pronounced under Non-IID data distributions. Compared with the baseline method of Embodiment 1, the method of Embodiment 2 loses 0.2% accuracy under the IID distribution but gains 0.9% under the Non-IID distribution, while the method of Embodiment 3 gains 2.7% under the IID distribution and 5.7% under the Non-IID distribution.
In summary, the invention combines the characteristics of non-independent same-distribution data with federated learning and contrastive learning algorithms, and, through shared center vectors and a temperature-coefficient difference, simultaneously mines the user-local data characteristics and the cluster-center distribution across the global data from the information contained in the local data and the global data distribution, so as to obtain a more effective feature extraction model. Specifically, the invention discloses three novel federated contrast clustering learning methods oriented to non-independent same-distribution data, obtaining a more effective network model from the characteristics of such data while preserving privacy. The methods: define a model structure and a contrast clustering method that retain the category information of the data; use user center-vector sharing to mitigate the model's bias toward users with large data volumes under non-independent same-distribution data; and further introduce a temperature difference coefficient so that, during training, the cluster centers migrate at different rates toward each user's data distribution and, after training, cover the global data distributions of all users as far as possible, yielding a federated learning method well suited to non-independent same-distribution data.
Embodiment 4
A federated contrast clustering learning system oriented to non-independent same-distribution data, comprising: a storage device and a processor, wherein the storage device stores a computer program, and the processor executes the computer program to perform the federated contrast clustering learning method provided in Embodiment 1, Embodiment 2, and/or Embodiment 3 of the present invention.
The related technical solutions are the same as those of Embodiments 1-3 and are not repeated here.
Embodiment 5
A computer-readable storage medium comprising a stored computer program, wherein, when the computer program is executed by a processor, it controls the device on which the storage medium resides to perform the federated contrast clustering learning method provided in Embodiment 1, Embodiment 2, and/or Embodiment 3 of the present invention.
The related technical solutions are the same as those of Embodiments 1-3 and are not repeated here.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A federated contrast clustering learning method for non-independent same-distribution data is characterized by comprising the following steps:
The following steps are executed at the central server side:
A11, initializing the vector representations of K cluster centers to obtain a prototype matrix; initializing a shared model;
A12, randomly selecting m clients from the client set, and issuing the shared model and the prototype matrix to each of the selected m clients for training;
A13, after the trained shared models and prototype matrices returned by the m clients are collected, taking the proportion of each client's data volume in the total data volume as its weight, weighting and summing the trained shared models returned by the clients to obtain an aggregation model, and weighting and summing the trained prototype matrices returned by the clients to obtain an aggregation matrix;
A14, updating the shared model to the aggregation model, updating the prototype matrix to the aggregation matrix, then normalizing the prototype matrix, and repeating steps A12-A13 iteratively until the preset number of iterations is reached; the shared model at this point is the trained model;
after receiving the shared model and the prototype matrix issued by the central server, the client inputs its local dataset into the shared model for cluster contrast training, which specifically includes performing the following for each local data sample in the client's local dataset:
B11, performing random data augmentation twice on the local data sample to obtain a first comparison sample and a second comparison sample;
B12, inputting the first comparison sample and the second comparison sample into the received shared model respectively to obtain a first sample feature and a second sample feature;
B13, mapping and matching the first sample feature and the second sample feature with the received vector representations of the cluster centers in the prototype matrix to obtain a first feature encoding vector and a second feature encoding vector;
and B14, updating the parameters of the received shared model and its prototype matrix by minimizing the cross-entropy loss between the first sample feature and the second feature encoding vector and the cross-entropy loss between the second sample feature and the first feature encoding vector.
2. The federated contrast cluster learning method of claim 1, wherein each client performs the cluster contrast training in parallel.
3. A federated contrast clustering learning method for non-independent same-distribution data is characterized by comprising the following steps:
the following steps are executed at the central server side:
A21, initializing the vector representations of K cluster centers to obtain the initial representation C_0 of the prototype matrix; constructing a prototype array of length m that stores the prototype matrices returned by the clients according to their client numbers, with each entry initialized to C_0; initializing a shared model;
A22, randomly selecting m clients from the client set, and issuing the shared model and the prototype matrix to each of the selected m clients for training; in the process of sending the shared model and the prototype matrix to the t-th client, randomly selecting from the prototype array one prototype matrix other than the one corresponding to the t-th client, denoting it as matrix C_P, and sending matrix C_P to the t-th client for training at the same time; t = 1, 2, …, m;
A23, after the trained shared models and prototype matrices returned by the m clients are collected, storing the prototype matrix returned by each client into the corresponding position of the prototype array, taking the proportion of each client's data volume in the total data volume as its weight, weighting and summing the trained shared models returned by the clients to obtain an aggregation model, and weighting and summing the trained prototype matrices returned by the clients to obtain an aggregation matrix;
A24, updating the shared model to the aggregation model, updating the prototype matrix to the aggregation matrix, then normalizing the prototype matrix, and repeating steps A22-A23 iteratively until the preset number of iterations is reached; the shared model at this point is the trained model;
after receiving the shared model, the prototype matrix, and the matrix C_P issued by the central server, the client inputs its local dataset into the shared model for cluster contrast training, which specifically includes performing the following for each local data sample in the client's local dataset:
B21, performing random data augmentation twice on the local data sample to obtain a first comparison sample and a second comparison sample;
B22, inputting the first comparison sample and the second comparison sample into the received shared model respectively to obtain a first sample feature and a second sample feature;
B23, mapping and matching the first sample feature and the second sample feature with the received vector representations of the cluster centers in the prototype matrix to obtain a first feature encoding vector and a second feature encoding vector;
mapping and matching the first sample feature and the second sample feature respectively with the vector representations of the cluster centers in the received matrix C_P to obtain a third feature encoding vector and a fourth feature encoding vector;
and B24, updating the parameters of the shared model and the prototype matrix by minimizing the cross-entropy loss between the first sample feature and the second feature encoding vector, the cross-entropy loss between the second sample feature and the first feature encoding vector, the cross-entropy loss between the first sample feature and the fourth feature encoding vector, and the cross-entropy loss between the second sample feature and the third feature encoding vector.
4. The federated contrast cluster learning method of claim 3, wherein the loss function $L(z_s, z_t)$ used in step B24 when performing cluster contrast training on the shared model is:

$$L(z_s,z_t)=\ell(z_s,q_t)+\ell(z_t,q_s)+\ell'(z_s,q_t')+\ell'(z_t,q_s')$$

$$\ell(z_s,q_t)=-\sum_{k} q_t^{(k)}\log\frac{\exp(z_s^{\top}c_k/\tau)}{\sum_{k'}\exp(z_s^{\top}c_{k'}/\tau)}$$

$$\ell(z_t,q_s)=-\sum_{k} q_s^{(k)}\log\frac{\exp(z_t^{\top}c_k/\tau)}{\sum_{k'}\exp(z_t^{\top}c_{k'}/\tau)}$$

$$\ell'(z_s,q_t')=-\sum_{k} q_t'^{(k)}\log\frac{\exp(z_s^{\top}d_k/\tau)}{\sum_{k'}\exp(z_s^{\top}d_{k'}/\tau)}$$

$$\ell'(z_t,q_s')=-\sum_{k} q_s'^{(k)}\log\frac{\exp(z_t^{\top}d_k/\tau)}{\sum_{k'}\exp(z_t^{\top}d_{k'}/\tau)}$$

wherein $z_s$ is the first sample feature; $z_t$ is the second sample feature; $q_s$ is the first feature encoding vector; $q_t$ is the second feature encoding vector; $q_s'$ is the third feature encoding vector; $q_t'$ is the fourth feature encoding vector; $q_t^{(k)}$ is the $k$-th feature code in the second feature encoding vector; $c_k$ is the vector representation of the $k$-th cluster center in the prototype matrix; $\tau$ is the temperature coefficient; $q_s^{(k)}$ is the $k$-th feature code in the first feature encoding vector; $q_t'^{(k)}$ is the $k$-th feature code in the fourth feature encoding vector; $d_k$ is the vector representation of the $k$-th cluster center in the matrix $C_P$; and $q_s'^{(k)}$ is the $k$-th feature code in the third feature encoding vector.
5. The federated contrast cluster learning method of claim 4, wherein the temperature coefficient is initialized by the central server and then sent to each client.
6. The federated contrast cluster learning method of claim 3, wherein the loss function $L(z_s, z_t)$ used in step B24 when performing cluster contrast training on the shared model is:

$$L(z_s,z_t)=\ell(z_s,q_t)+\ell(z_t,q_s)+\ell''(z_s,q_t')+\ell''(z_t,q_s')$$

$$\ell(z_s,q_t)=-\sum_{k} q_t^{(k)}\log\frac{\exp(z_s^{\top}c_k/\tau)}{\sum_{k'}\exp(z_s^{\top}c_{k'}/\tau)}$$

$$\ell(z_t,q_s)=-\sum_{k} q_s^{(k)}\log\frac{\exp(z_t^{\top}c_k/\tau)}{\sum_{k'}\exp(z_t^{\top}c_{k'}/\tau)}$$

$$\ell''(z_s,q_t')=-\sum_{k} q_t'^{(k)}\log\frac{\exp(z_s^{\top}d_k/\tau')}{\sum_{k'}\exp(z_s^{\top}d_{k'}/\tau')}$$

$$\ell''(z_t,q_s')=-\sum_{k} q_s'^{(k)}\log\frac{\exp(z_t^{\top}d_k/\tau')}{\sum_{k'}\exp(z_t^{\top}d_{k'}/\tau')}$$

wherein $z_s$ is the first sample feature; $z_t$ is the second sample feature; $q_s$ is the first feature encoding vector; $q_t$ is the second feature encoding vector; $q_s'$ is the third feature encoding vector; $q_t'$ is the fourth feature encoding vector; $q_t^{(k)}$ is the $k$-th feature code in the second feature encoding vector; $c_k$ is the vector representation of the $k$-th cluster center in the prototype matrix; $\tau$ is the temperature coefficient; $q_s^{(k)}$ is the $k$-th feature code in the first feature encoding vector; $q_t'^{(k)}$ is the $k$-th feature code in the fourth feature encoding vector; $\tau'$ is the temperature difference coefficient; $d_k$ is the vector representation of the $k$-th cluster center in the matrix $C_P$; and $q_s'^{(k)}$ is the $k$-th feature code in the third feature encoding vector.
7. The federated contrast cluster learning method of claim 6, wherein the temperature coefficient and the temperature difference coefficient are both initialized by the central server and then sent to each client.
8. The federated contrast cluster learning method of claim 6, wherein the temperature difference coefficient is less than the temperature coefficient.
9. The federated contrast cluster learning method of any one of claims 3-8, wherein each client performs the cluster contrast training in parallel.
10. A federated contrast clustering learning system oriented to non-independent same distribution data is characterized by comprising: a memory storing a computer program and a processor executing the computer program to perform the federated contrast cluster learning method of any one of claims 1-2 and/or the federated contrast cluster learning method of any one of claims 3-9.
CN202211267754.4A 2022-10-17 2022-10-17 Federal contrast clustering learning method and system for non-independent same-distribution data Pending CN115563519A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211267754.4A CN115563519A (en) 2022-10-17 2022-10-17 Federal contrast clustering learning method and system for non-independent same-distribution data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211267754.4A CN115563519A (en) 2022-10-17 2022-10-17 Federal contrast clustering learning method and system for non-independent same-distribution data

Publications (1)

Publication Number Publication Date
CN115563519A true CN115563519A (en) 2023-01-03

Family

ID=84767534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211267754.4A Pending CN115563519A (en) 2022-10-17 2022-10-17 Federal contrast clustering learning method and system for non-independent same-distribution data

Country Status (1)

Country Link
CN (1) CN115563519A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115994226A (en) * 2023-03-21 2023-04-21 杭州金智塔科技有限公司 Clustering model training system and method based on federal learning
CN115994226B (en) * 2023-03-21 2023-10-20 杭州金智塔科技有限公司 Clustering model training system and method based on federal learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination