CN112364913A - Federated learning communication traffic optimization method and system based on a core data set - Google Patents

Federated learning communication traffic optimization method and system based on a core data set

Info

Publication number
CN112364913A
Authority
CN
China
Prior art keywords
model
global model
layer
global
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011240064.0A
Other languages
Chinese (zh)
Inventor
肖春华
李开菊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202011240064.0A priority Critical patent/CN112364913A/en
Publication of CN112364913A publication Critical patent/CN112364913A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Neurology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the field of federated machine learning and discloses a core-data-set-based federated learning communication traffic optimization method and system. In the method, each end user first screens core data out of its local training data in parallel, the cloud center constructs a sparse global model according to a set sparsification proportion, and each end user trains a local model on its screened core data to obtain a local model update. Then, to make the global model better fit the local core data, the cloud center adjusts the network structure of the global model according to the global model update obtained by aggregating the local model updates; this adjustment consists of two steps, removing unimportant connections and adding important connections. Finally, the cloud center distributes the adjusted global model to each end user, and the above steps are iterated until the global model converges. By screening core data at the end users and deploying an adaptive sparse network model, the method reduces the model parameters uploaded between the end users and the cloud center, and thereby addresses at its root the problem of high communication cost caused by frequent transmission of high-dimensional update parameters between end users and the cloud center in federated learning.

Description

Federated learning communication traffic optimization method and system based on a core data set
Technical Field
The invention relates to the field of federated machine learning, and in particular to a core-data-set-based federated learning communication traffic optimization method and system, used to address the problem of high communication cost caused by frequent transmission of high-dimensional update parameters between end users/devices and the cloud center in federated learning.
Background
Machine learning, as an important branch of artificial intelligence, has been applied successfully and widely in fields such as pattern recognition, data mining and computer vision. Because terminal devices have limited computing resources, machine learning models are currently trained mostly in a cloud-based mode: the data collected by the terminal devices, such as pictures, videos or personal location information, must be uploaded in full to a cloud center, which trains the model centrally to obtain an inference model. However, uploading users' sensitive data can leak their private information, and as users' privacy awareness grows, more and more of them are unwilling to share private data to participate in model training. In the long run this severely hampers the development and application of machine learning techniques.
Federated learning therefore emerged to protect users' sensitive data without affecting the training of machine learning models. In a federated learning setting, an end user does not upload local sensitive data to the cloud center but only shares local model update parameters; the cloud center interacts with the end users over many rounds until the global model converges, so that the users' sensitive data are protected while a final usable model is still obtained.
In a federated learning environment, many rounds of interaction between the end users and the cloud center are needed to reach a global model of the target accuracy. However, for complex models such as deep learning models, each model update may contain millions of parameters, and this high dimensionality consumes a great deal of communication cost and can even become the bottleneck of model training. In addition, because of the heterogeneity of end users/devices, the unreliability of each device's network state, and the asymmetry of Internet connection speeds (upload speed is typically much lower than download speed), the delay incurred when end users upload local update parameters further worsens this bottleneck. To improve the training performance of federated learning, communication efficiency must therefore be improved.
To improve the communication efficiency of federated learning, researchers at home and abroad have carried out extensive studies and proposed many effective communication optimization methods. These methods mainly exploit the redundancy of the model parameters: they apply model compression operations such as sparsification, lightweight modeling or knowledge distillation to the local model update parameters, reduce the uploading of redundant parameters, and make the uploaded model updates more compact, thereby reducing communication traffic. However, all of these methods reduce traffic only from the viewpoint of the model parameters and do not solve the problem at its root. Model parameters are obtained by training on local data, and the data themselves are redundant; extracting the important data for model training can therefore reduce the uploading of redundant parameters at the source. Moreover, although existing methods sparsify or lighten the model parameters, they do not operate on the network model itself, which is also redundant. Hence, a method for reducing communication traffic should consider both the redundancy of the data and the redundancy of the model in order to substantially reduce the uploading of redundant parameters; existing methods consider neither of the two jointly.
Federated learning emerged to solve the problems of user sensitive-data leakage and model availability caused by cloud-based training. However, because of the high dimensionality of the model training parameters and the unreliability of the network in a federated learning environment, communication cost becomes a fundamental and important problem. Although existing research proposes many effective optimization methods for reducing communication traffic, they all work at the level of the model parameters and do not substantially reduce the uploading of redundant parameters from the perspective of the training data and the model. To better address the high communication cost of federated learning, the redundancy of both the training data and the model must be fully considered so that redundant parameter uploading is substantially reduced and communication traffic falls.
Against this background, the invention provides a simple and easy-to-implement federated learning communication traffic optimization method and system based on a core data set, laying a foundation for solving the problem of high communication cost in federated learning.
Disclosure of Invention
To effectively address the high communication cost of federated learning, the invention provides a core-data-set-based federated learning communication traffic optimization method and system. First, each end user screens core data out of its local training data in parallel, the cloud center constructs a sparse global model according to a set sparsification proportion, and each end user trains a local model on its screened core data to obtain a local model update. Then, the cloud center aggregates the local model updates into a global model update and adjusts the network structure of the global model accordingly. Finally, the cloud center distributes the adjusted global model to each end user, and these steps are iterated until the global model converges. The method screens core data from each end user's training data according to the redundancy of the training data and, from the model perspective, sparsifies the model structure according to the redundancy of the model, deploying a sparse network model adapted to the core data; uploading of model parameters between the end users and the cloud center is thereby substantially reduced, achieving the goal of reducing communication traffic. The core-data-set-based federated learning communication traffic optimization method provided by the invention comprises the following steps.
Step S1, core data set construction. Given a data set D, a weight set of all data points in D, a loss-function tolerance ε (0 < ε < 1), an error rate δ (0 < δ < 1) and a parameter R, a core data set M is selected from D so that the loss function satisfies |f(D) - f(M)| ≤ ε|f(D)|. This step comprises the following sub-steps:
Step S1-1, select K clustering centers from D and divide D into K clusters using the K-means clustering algorithm, where G_k denotes the clustering result of the k-th cluster and the weight of the j-th data point in the k-th cluster is taken from the given weight set;
Step S1-2, calculate the redundancy m_i of each data point i in D according to formula (1) (given as a figure in the original publication), which is expressed in terms of the weighted sum of all data points other than i in the k-th cluster and the weighted average distance between data point i and all data points in the k-th cluster.
Step S1-3, calculate the normalized redundancy P_i of each data point i in D by normalizing its redundancy m_i over all data points in D (formula (2), given as a figure in the original publication).
Step S1-4, calculate the average redundancy of all data points in D (formula (3), given as a figure in the original publication).
Step S1-5, calculate the size of the core data set M from the average redundancy, the loss-function tolerance ε and the error rate δ (formula (4), given as a figure in the original publication), where c is a fixed constant.
Step S1-6, screen the core data set M from D according to formula (2) and formula (4).
Step S1 is repeated: for a given set of end users {C_1, C_2, ..., C_r, ..., C_n} and corresponding data sets {D_1, D_2, ..., D_r, ..., D_n}, each end user screens out its core data set in parallel, yielding {M_1, M_2, ..., M_r, ..., M_n}.
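As an illustration of step S1, the following sketch (in Python, using NumPy and scikit-learn) shows one way the core data set could be screened. The function name select_core_dataset, the default core data set size and the concrete redundancy score are illustrative assumptions: formulas (1) and (4) are given only as figures in the original publication, so the redundancy is approximated here by the weighted average distance of a point to the other points of its cluster.

    # Sketch of step S1 (core data set construction); the redundancy and |M| below are
    # stand-ins, NOT the patented formulas (1) and (4), which are figures in the original.
    import numpy as np
    from sklearn.cluster import KMeans

    def select_core_dataset(D, weights, K=3, coreset_size=None, seed=0):
        """D: (N, d) data matrix; weights: (N,) point weights; returns the core data set."""
        rng = np.random.default_rng(seed)
        labels = KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(D)

        m = np.empty(len(D))                          # step S1-2: redundancy m_i (stand-in)
        for k in range(K):
            idx = np.where(labels == k)[0]
            Xk, wk = D[idx], weights[idx]
            dists = np.linalg.norm(Xk[:, None, :] - Xk[None, :, :], axis=-1)
            m[idx] = (dists * wk[None, :]).sum(axis=1) / wk.sum()

        P = m / m.sum()                               # step S1-3: normalized redundancy P_i
        size = coreset_size or max(1, len(D) // 10)   # step S1-5: |M| (formula (4) is a figure)
        chosen = rng.choice(len(D), size=size, replace=False, p=P)   # step S1-6: sampling by P_i
        return D[chosen], weights[chosen]

Each end user C_r would run such a routine in parallel on its own data set D_r to obtain M_r.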
Step S2, sparse model construction. A sparse global model is constructed according to a given model sparsification proportion α, comprising the following sub-steps:
Step S2-1, construct a fully connected network model with L layers, where layer l (l ≤ L) contains n_l neurons;
Step S2-2, sparsify the fully connected network model according to the model sparsification proportion α and the network connection probability (formula (5), given as a figure in the original publication), where n_l and n_{l-1} respectively denote the numbers of neurons in layers l and l-1 of the fully connected network model, and the connection probability refers to the probability that neuron i of layer l is connected to neuron j of layer l-1.
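A minimal sketch of step S2, assuming formula (5) follows an Erdős–Rényi-style connection probability of the kind used in sparse evolutionary training, p = α(n_l + n_{l-1})/(n_l · n_{l-1}) capped at 1; the actual formula (5) is given only as a figure in the original publication, so this expression is an assumption.

    # Sketch of step S2 (sparse global model construction) under the assumed formula (5).
    import numpy as np

    def build_sparse_masks(layer_sizes, alpha, seed=0):
        """Return one boolean connection mask per fully connected layer."""
        rng = np.random.default_rng(seed)
        masks = []
        for n_prev, n_cur in zip(layer_sizes[:-1], layer_sizes[1:]):
            p = min(1.0, alpha * (n_prev + n_cur) / (n_prev * n_cur))   # assumed formula (5)
            masks.append(rng.random((n_cur, n_prev)) < p)               # keep a connection with prob. p
        return masks

    # Example: the 784-200-10 perceptron of the embodiment, sparsification proportion alpha = 20.
    masks = build_sparse_masks([784, 200, 10], alpha=20)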
Step S3, cloud center initialization. According to the constructed sparse global model, initialize the global model parameters W_0, the global model update parameters U_0, the number of iteration rounds T and the total number of uploaded parameters Ω.
Step S4, local model training. Each end user C_r (r = 1, 2, ..., n) carries out local model training in parallel to obtain its local model update parameters H_r (r = 1, 2, ..., n) and uploads them to the cloud center, comprising the following sub-steps:
Step S4-1, each end user C_r (r = 1, 2, ..., n) performs local model training in parallel on its local core data set M_r, obtaining local model update parameters H_r (r = 1, 2, ..., n);
Step S4-2, each end user C_r (r = 1, 2, ..., n) uploads its local model update parameters H_r to the cloud center, and the total number of uploaded parameters Ω is updated accordingly (the update expression is given as a figure in the original publication).
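A minimal sketch of step S4 for a single end user C_r. It assumes the uploaded update H_r is the difference between the received global parameters and the locally trained parameters, which is consistent with W_t = W_{t-1} - U_t in step S5; the local optimizer and the number of local epochs are not specified in the text, so train_fn is left abstract.

    # Sketch of step S4 (local model training) for one end user; train_fn is a placeholder
    # for a few epochs of local training (e.g. SGD) on the core data set M_r.
    import numpy as np

    def local_update(global_params, core_dataset, train_fn):
        local_params = train_fn(np.copy(global_params), core_dataset)
        return global_params - local_params   # H_r, uploaded to the cloud center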
Step S5, global model update. According to the local model update parameters uploaded by the end users, the cloud center calculates the global model update U_t and the global model parameters W_t of the current iteration round t (t ≤ T), comprising the following sub-steps:
Step S5-1, calculate the global model update U_t of the current iteration round t (t ≤ T) by aggregating the uploaded local model update parameters (formula (6), given as a figure in the original publication), where U_t denotes the global model update parameters of the current iteration round t (t ≤ T).
Step S5-2, updating U according to the global modeltCalculating the global model parameter W of the current iteration round number T (T is less than or equal to T)tThe calculation formula is as follows:
Wt=Wt-1-Ut (7)
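A minimal sketch of step S5, assuming formula (6) is a plain (FedAvg-style) average of the uploaded local updates; the exact aggregation rule is given only as a figure in the original publication.

    # Sketch of step S5 (global model update) under the assumed formula (6).
    import numpy as np

    def global_update(global_params, local_updates):
        U_t = np.mean(local_updates, axis=0)   # assumed formula (6): average of the H_r
        W_t = global_params - U_t              # formula (7): W_t = W_{t-1} - U_t
        return W_t, U_t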
Step S6, global model structure adjustment. According to the global model parameters W_t and a set removal proportion β of unimportant connections, unimportant network connections in the global model are removed while important connections are added in the same proportion β, comprising the following sub-steps:
Step S6-1, evaluate the influence I_i of removing any connection i in the global model on the model loss function, calculated as follows:
I_i = |f(W_t) - f(W_t | w_i = 0)| (8)
where w_i denotes the i-th weight parameter of W_t.
Since evaluating the effect of removing every connection in the network one by one according to equation (8) is very time-consuming, equation (8) is approximated, via a first-order Taylor expansion of the loss function, as:
I_i = |g_i w_i| (9)
where g_i denotes the i-th component of the global model gradient.
Step S6-2, remove, layer by layer, the proportion β of neuron connections with the smallest I_i from the global model;
Step S6-3, screen important connection points layer by layer according to the neuron connection importance I_i. Suppose that in step S6-2 the connection formed by the i-th node of layer l and the j-th node of layer l-1 was removed; then screen out the set S of nodes in layer l-1 from which no unimportant connection was removed;
Step S6-4, randomly select a node from the set S and connect it with the i-th node of layer l.
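A minimal per-layer sketch of step S6. The importance score follows formula (9), I_i = |g_i w_i|; the reading of the set S (nodes of layer l-1 that lost no connection) and the rewiring rule are interpretations of steps S6-3 and S6-4, not a verbatim implementation.

    # Sketch of step S6 (global model structure adjustment) for one layer.
    import numpy as np

    def adjust_layer(weights, grads, mask, beta, seed=0):
        """weights, grads, mask: (n_l, n_{l-1}) arrays of one layer; returns the new mask."""
        rng = np.random.default_rng(seed)
        importance = np.abs(grads * weights) * mask          # formula (9) on existing connections
        rows, cols = np.nonzero(mask)
        drop = np.argsort(importance[rows, cols])[:int(beta * len(rows))]   # step S6-2
        new_mask = mask.copy()
        new_mask[rows[drop], cols[drop]] = False

        # Steps S6-3/S6-4 (assumed reading): S = nodes of layer l-1 that kept all connections;
        # every neuron of layer l that lost a connection is rewired to a random node of S.
        kept = np.where(new_mask.sum(axis=0) == mask.sum(axis=0))[0]
        if len(kept):
            for i in rows[drop]:
                new_mask[i, rng.choice(kept)] = True
        return new_mask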
Step S7, global model distribution. The adjusted global model and model parameters are distributed to each end user, and the next iteration begins.
Steps S4-S7 are repeated until T rounds of iteration are completed; model training then ends and the final total communication traffic Ω is obtained.
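The outer loop of steps S4-S7, together with the accounting of the total uploaded parameter count Ω, could then be organized as in the sketch below; it reuses the hypothetical helpers local_update and global_update sketched above, and the end-user objects with a core_dataset attribute are likewise assumptions.

    # Orchestration sketch of steps S4-S7 over T rounds; structure adjustment (step S6)
    # and redistribution (step S7) are indicated by comments only.
    def run_federated_training(W, users, T, train_fn):
        omega = 0
        for t in range(T):
            updates = [local_update(W, u.core_dataset, train_fn) for u in users]   # step S4
            omega += sum(h.size for h in updates)        # parameters uploaded this round
            W, U = global_update(W, updates)             # step S5
            # step S6: adjust the sparse structure of W;  step S7: distribute W to the users
        return W, omega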
Meanwhile, the present invention also provides a federated learning communication traffic optimization system based on a core data set, as shown in Fig. 3, comprising:
A core data set construction module, in which, given a set of end users {C_1, C_2, ..., C_r, ..., C_n} and corresponding data sets {D_1, D_2, ..., D_r, ..., D_n}, each end user screens out its core data set in parallel, yielding {M_1, M_2, ..., M_r, ..., M_n}; it contains the following sub-modules:
A clustering submodule for dividing a given data set D into K clusters, where G_k denotes the clustering result of the k-th cluster and the weight of the j-th data point in the k-th cluster is taken from the given weight set;
A weight-sum calculation submodule for calculating the weighted sum of all data points other than data point i in the k-th cluster;
A weighted-average-distance calculation submodule for calculating the weighted average distance between data point i and all data points in the k-th cluster;
A redundancy calculation submodule for calculating the redundancy of each data point i in D;
A normalization submodule for calculating the normalized redundancy of each data point i in D;
An average-redundancy calculation submodule for calculating the average redundancy of all data points in D;
A core data set size calculation submodule for calculating the size of the core data set;
A probability sampling submodule for screening the core data set M from D according to the core data set size |M| and the normalized redundancy P_i;
A parallel submodule for screening, in parallel, the core data set M_r from the data set D_r of each end user C_r (r = 1, 2, ..., n).
A sparse model construction module for sparsifying the constructed fully connected network model to obtain a sparse global model, comprising the following sub-modules:
A fully connected network construction submodule for constructing a fully connected network model according to the set number of network layers and the number of neurons in each layer;
A sparsification sampling probability calculation submodule for calculating, for each layer of the fully connected network model, the connection probability between neurons of adjacent layers;
A sparsification sampling submodule for sparsifying each layer of the fully connected network model so that the connections between neurons of adjacent layers satisfy the calculated connection probability.
A cloud center initialization module for initializing the global model parameters W_0, the global model update parameters U_0, the number of model iteration rounds T and the total number of uploaded parameters Ω.
A local model training module in which each end user C_r (r = 1, 2, ..., n) carries out local model training in parallel to obtain its local model update parameters H_r (r = 1, 2, ..., n), containing the following sub-modules:
A global model parameter acquisition submodule with which each end user C_r (r = 1, 2, ..., n) obtains the global model parameters W_t of the current iteration round t (t ≤ T) from the cloud center;
A local model update parameter calculation submodule with which each end user C_r (r = 1, 2, ..., n) trains the local model on its local core data set M_r and obtains the model update parameters H_r (r = 1, 2, ..., n) of the current iteration round t (t ≤ T).
A global model update module for calculating the global model update parameters and global model parameters of the current iteration round, comprising the following sub-modules:
A local model update parameter acquisition submodule for acquiring the model update parameters H_r (r = 1, 2, ..., n) of each end user C_r for the current iteration round t (t ≤ T);
A global model parameter calculation submodule for calculating the aggregation in formula (6) and W_t = W_{t-1} - U_t, obtaining the global model update parameters U_t and global model parameters W_t of the current iteration round t (t ≤ T).
A global model structure adjustment module for adjusting the structure of the global model according to the global model parameters W_t, comprising the following sub-modules:
A model connection importance calculation submodule for calculating I_i = |g_i w_i|, the influence of removing connection i from the global model on the model loss function;
An importance ranking submodule for sorting the values of I_i in ascending order;
An unimportant connection removal submodule for removing, layer by layer, the proportion β of neuron connections with the smallest I_i from the global model;
An important node screening submodule for screening the important connection node set S layer by layer according to the neuron connection importance I_i;
A randomization submodule for randomly choosing a node from the set S and connecting it with node i.
By jointly considering the redundancy of the training data and of the model, important training data are screened from the locally redundant data, the network model is sparsified according to these important data, and a sparse network model adapted to the core data is deployed. As a result, each time an end user interacts with the cloud center, only the training parameters of the sparse model need to be uploaded, substantially reducing the uploading of redundant parameters and addressing the problem of high communication cost caused by frequent transmission of high-dimensional update parameters between end users and the cloud center in federated learning. Compared with the prior art, the method has the following beneficial effects:
(1) The method and system provided by the invention consider the redundancy of both the data and the model, opening a new way of addressing the high communication cost of federated learning: the uploading of redundant parameters is substantially reduced, and communication traffic is therefore reduced;
(2) In the prior art, the model training stage requires complex network models to be deployed on the terminal devices, yet most terminals are resource-limited, so deploying complex models on them is unreasonable in a real federated learning scenario; the sparse global model deployed by the invention is better suited to such resource-limited devices;
(3) The global model structure adjustment uses a first-order Taylor expansion of the loss function to evaluate network connection importance, which simplifies the importance evaluation, reduces the computational complexity and allows an efficient implementation of the algorithm.
Drawings
Fig. 1 is a flowchart of an overall method provided by an embodiment of the invention.
Fig. 2 is a flowchart illustrating specific steps provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of the modules of the federated learning communication traffic optimization system based on a core data set according to an embodiment of the present invention.
Detailed Description
The conception, specific structure and technical effects of the present invention will be further described in conjunction with the accompanying drawings and embodiments, so that the objects, features and effects of the present invention can be fully understood.
The specific implementation steps of the invention are described below using the example of 100 end users jointly training a multi-layer perceptron (MLP) model on the MNIST data set, the goal being to reduce the total number of parameters uploaded by local users to the cloud center and thereby reduce communication traffic. The expression of the MLP model is given as a figure in the original publication, where N denotes the total number of samples, X_i is the feature vector of a sample, W_i are the model parameters, b is the bias, σ is the activation function, and y is the output of the model.
The method provided by the invention can be implemented as an automated process using computer software. Fig. 1 is the overall method flowchart of the embodiment; referring to Fig. 1 in combination with the detailed step flowchart of Fig. 2, the specific steps of the core-data-set-based embodiment are as follows:
Step S1, core data set construction. Given a data set D, a weight set of all data points in D, a loss-function tolerance ε (0 < ε < 1), an error rate δ (0 < δ < 1) and a parameter R, a core data set M is screened from D so that the loss function satisfies |f(D) - f(M)| ≤ ε|f(D)|.
In the embodiment, the MNIST data set is partitioned among the end users C_r (r = 1, 2, ..., 100) in a non-IID (not independent and identically distributed) manner, giving each end user a training data set D_r (r = 1, 2, ..., 100) together with a weight set over all of its data points. Given a loss-function tolerance ε = 0.08, an error rate δ = 0.05 and a parameter R = 3, the core data set M_r is screened in parallel from each data set D_r (r = 1, 2, ..., 100) according to these parameters, so that the loss function satisfies |f(D_r) - f(M_r)| ≤ 0.08|f(D_r)| (r = 1, 2, ..., 100). This is implemented as follows.
Step S1-1, select K clustering centers from D and divide D into K clusters using the K-means clustering algorithm, where G_k denotes the clustering result of the k-th cluster and the weight of the j-th data point in the k-th cluster is taken from the given weight set;
In the embodiment, 3 clustering centers are selected from each D_r (r = 1, 2, ..., 100), the K-means clustering algorithm divides D_r into 3 clusters, G_k (k = 1, 2, 3) denotes the clustering result of the k-th cluster, and the weight of the j-th data point in the k-th cluster is taken from the given weight set;
Step S1-2, calculate the redundancy m_i of each data point i in D according to formula (1) (given as a figure in the original publication), which is expressed in terms of the weighted sum of all data points other than i in the k-th cluster and the weighted average distance between data point i and all data points in the k-th cluster.
In the embodiment, for each data point i in D_r (r = 1, 2, ..., 100), the weighted sum of all other data points in its cluster is first computed; then the weighted average distance between data point i and all data points in the k-th cluster (k = 1, 2, 3) is computed; finally the redundancy m_i (i = 1, 2, ..., |D_r|) of data point i is computed according to formula (1).
Step S1-3, calculate the normalized redundancy P_i of each data point i in D by normalizing its redundancy m_i over all data points in D (formula (2), given as a figure in the original publication).
In the embodiment, this calculation yields the normalized redundancy P_i (i = 1, 2, ..., |D_r|) of all data points in D_r (r = 1, 2, ..., 100).
Step S1-4, calculating the redundancy average value of all data points in D
Figure BDA0002768072640000103
The calculation formula is as follows:
Figure BDA0002768072640000104
In the examples, the calculation
Figure BDA0002768072640000105
To obtain DrMean value of redundancy of all data in
Figure BDA0002768072640000106
Step S1-5, averaging according to the redundancy
Figure BDA0002768072640000107
The loss function tolerance epsilon and the error rate delta are calculated, and the size of the core data set M is calculated according to the following formula:
Figure BDA0002768072640000108
where c is a fixed constant.
In the embodiment, the redundancy average value is calculated according to the known parameters of 0.08 epsilon and 0.05 delta
Figure BDA0002768072640000109
Computing
Figure BDA00027680726400001010
Get size | M of core datasetr|。
Step S1-6, screen the core data set M from D according to formula (2) and formula (4).
In the embodiment, according to the core data set size |M_r| and the normalized redundancy P_i (i = 1, 2, ..., |D_r|), the |M_r| data points with the largest redundancy are screened from D_r to form the core data set M_r.
Step S1 is repeated: for a given set of end users {C_1, C_2, ..., C_r, ..., C_n} and corresponding data sets {D_1, D_2, ..., D_r, ..., D_n}, each end user screens out its core data set in parallel, yielding {M_1, M_2, ..., M_r, ..., M_n}.
In the embodiment, given the set of end users {C_1, C_2, ..., C_r, ..., C_100} and the corresponding data sets {D_1, D_2, ..., D_r, ..., D_100}, step S1 is repeated and each end user screens out its core data set in parallel, yielding {M_1, M_2, ..., M_r, ..., M_100}.
Step S2, sparse model construction. A sparse global model is constructed according to the given model sparsification proportion α.
In the embodiment, the fully connected global network model is sparsified with a model sparsification proportion α = 20 to obtain the sparse global model structure, realized as follows.
Step S2-1, construct a fully connected network model with L layers, where layer l (l ≤ L) contains n_l neurons;
In the embodiment, a multilayer perceptron (MLP) model consisting of an input layer, a hidden layer and an output layer is constructed, with 784, 200 and 10 neurons respectively, and each neuron of a layer is connected to the neurons of the previous layer.
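The exact expression of the MLP is given as a figure in the original publication; the sketch below shows an assumed sigmoid forward pass for the 784-200-10 perceptron of the embodiment, with the sparse masks of step S2 zeroing the pruned weights.

    # Sketch of the embodiment's 784-200-10 MLP forward pass (activation assumed sigmoid).
    import numpy as np

    def mlp_forward(x, params, masks):
        """x: (784,) input; params: [(W1, b1), (W2, b2)]; masks: per-layer boolean masks."""
        h = x
        for (W, b), m in zip(params, masks):
            h = 1.0 / (1.0 + np.exp(-((W * m) @ h + b)))   # sigma((W * mask) h + b)
        return h   # 10-dimensional output y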
Step S2-2, sparsify the fully connected network model according to the model sparsification proportion α and the network connection probability (formula (5), given as a figure in the original publication), where n_l and n_{l-1} respectively denote the numbers of neurons in layers l and l-1, and the connection probability refers to the probability that neuron i of layer l is connected to neuron j of layer l-1.
In the embodiment, each fully connected layer is sparsified so that the connection between neuron i of layer l (l ≤ 3) and neuron j of layer l-1 satisfies this probability, yielding the sparse global model.
Step S3, cloud center initialization. According to the constructed sparse global model, initialize the global model parameters W_0, the global model update parameters U_0, the number of iteration rounds T and the total number of uploaded parameters Ω.
In the embodiment, the global model parameters W_0 and global model update parameters U_0 are initialized according to the sparse global model, the number of iteration rounds is set to T = 10, and the total number of parameters is set to Ω = 0.
Step S4, local model training. Each end user C_r (r = 1, 2, ..., n) carries out local model training in parallel to obtain its local model update parameters H_r (r = 1, 2, ..., n) and uploads them to the cloud center.
In the embodiment, each end user trains its local model in parallel on its local core data set to obtain its local model update and then uploads it to the cloud center.
Step S4-1, each end user C_r (r = 1, 2, ..., n) performs local model training in parallel on its local core data set M_r, obtaining local model update parameters H_r (r = 1, 2, ..., n);
In the embodiment, each end user C_r (r = 1, 2, ..., 100) performs local model training in parallel on its local core data set M_r (r = 1, 2, ..., 100), obtaining its local model update parameters H_r (r = 1, 2, ..., 100);
Step S4-2, each end user C_r (r = 1, 2, ..., n) uploads its local model update parameters H_r to the cloud center, and the total number of uploaded parameters Ω is updated accordingly (the update expression is given as a figure in the original publication).
In the embodiment, each end user C_r (r = 1, 2, ..., 100) uploads its trained local model update parameters H_r (r = 1, 2, ..., 100) to the cloud center, and Ω is updated accordingly.
Step S5, global model update. According to the local model update parameters uploaded by the end users, the cloud center calculates the global model update U_t and the global model parameters W_t of the current iteration round t (t ≤ T).
In the embodiment, the cloud center gathers the uploaded local model update parameters and calculates the global model update U_t and global model parameters W_t of the current iteration round t (t ≤ T), realized as follows.
Step S5-1, calculate the global model update U_t of the current iteration round t (t ≤ T) by aggregating the uploaded local model update parameters (formula (6), given as a figure in the original publication), where U_t denotes the global model update parameters of the current iteration round t (t ≤ T).
In the embodiment, the cloud center aggregates the uploaded local model update parameters H_r (r = 1, 2, ..., 100) according to formula (6), obtaining the global model update U_t of the current iteration round t (t ≤ T).
Step S5-2, according to the global model update U_t, calculate the global model parameters W_t of the current iteration round t (t ≤ T) as follows:
W_t = W_{t-1} - U_t (7)
In the embodiment, the global model parameters W_t of the current iteration round t (t ≤ T) are calculated from the global model update parameters U_t according to formula (7).
Step S6, global model structure adjustment. According to the global model parameters W_t and a set removal proportion β of unimportant connections, unimportant network connections in the global model are removed while important connections are added in the same proportion β.
In the embodiment, based on the global model parameters W_t, the unimportant connections are removed from each layer of the global model with β = 0.3 while important connections are added in the same proportion β = 0.3, implemented as follows.
Step S6-1, evaluate the influence I_i of removing any connection i in the global model on the model loss function, calculated as follows:
I_i = |f(W_t) - f(W_t | w_i = 0)| (8)
where w_i denotes the i-th weight parameter of W_t.
Since evaluating the effect of removing every connection in the network one by one according to equation (8) is very time-consuming, equation (8) is approximated, via a first-order Taylor expansion of the loss function, as:
I_i = |g_i w_i| (9)
where g_i denotes the i-th component of the global model gradient.
In the embodiment, I_i = |g_i w_i| is computed to obtain the influence I_i of removing any connection i in the global model on the model loss function.
Step S6-2, remove, layer by layer, the proportion β of neuron connections with the smallest I_i from the global model;
In the embodiment, the 30% of neuron connections with the smallest I_i values are removed from each layer of the global model;
Step S6-3, screen important connection points layer by layer according to the neuron connection importance I_i. Suppose that in step S6-2 the connection formed by the i-th node of layer l and the j-th node of layer l-1 was removed; then screen out the set S of nodes in layer l-1 from which no unimportant connection was removed;
In the embodiment, according to the neuron connection importance I_i, the node set S from which no unimportant connection was removed is screened out in layer l-1 for each layer l (l ≤ 3) of the global model. Suppose that in step S6-2 the connection between node i = 8 of layer l = 3 and node j = 50 of layer l-1 = 2 was removed; then the node set S from which no unimportant connection was removed is screened out in layer l-1 = 2;
Step S6-4, randomly select a node from the set S and connect it with the i-th node of layer l.
In the embodiment, for each layer l (l ≤ 3) of the global model, a node is randomly selected from the set S and connected with the i-th node of layer l. Following the example of step S6-3, a node is randomly selected from the set S and connected with node i = 8 of layer l = 3.
Step S7, global model distribution. The adjusted global model and model parameters are distributed to each end user, and the next iteration begins.
In the embodiment, the adjusted global model and model parameters are distributed to each end user C_r (r = 1, 2, ..., 100), and the next iteration begins.
Steps S4-S7 are repeated until T rounds of iteration are completed; model training then ends and the final total communication traffic Ω is obtained.
In the embodiment, model training ends after T = 10 iteration rounds, and the final total communication traffic Ω is obtained.
The present invention provides a technical solution that can be implemented by those skilled in the art. The above embodiments are provided only for illustrating the present invention and not for limiting the present invention, and those skilled in the art can make various changes or modifications without departing from the spirit and scope of the present invention, and therefore all equivalent technical solutions are within the scope of the present invention.

Claims (5)

1. A federated learning communication traffic optimization method based on a core data set, characterized in that it comprises the following steps:
Step S1, core data set construction. Given a data set D, a weight set of all data points in D, a loss-function tolerance ε (0 < ε < 1), an error rate δ (0 < δ < 1) and a parameter R, a core data set M is screened from D so that the loss function satisfies |f(D) - f(M)| ≤ ε|f(D)|.
Step S2, sparse model construction. A sparse global model is constructed according to the given model sparsification proportion α.
Step S3, cloud center initialization. According to the constructed sparse global model, initialize the global model parameters W_0, the global model update parameters U_0, the number of iteration rounds T and the total number of uploaded parameters Ω.
Step S4, local model training. Each end user C_r (r = 1, 2, ..., n) carries out local model training in parallel to obtain its local model update parameters H_r (r = 1, 2, ..., n) and uploads them to the cloud center.
Step S5, global model update. According to the local model update parameters uploaded by the end users, the cloud center calculates the global model update U_t and the global model parameters W_t of the current iteration round t (t ≤ T).
Step S6, global model structure adjustment. According to the global model parameters W_t and a set removal proportion β of unimportant connections, unimportant network connections in the global model are removed while important connections are added in the same proportion β.
Step S7, global model distribution. The adjusted global model and model parameters are distributed to each end user, and the next iteration begins.
2. The method of claim 1, wherein step S1 comprises the following sub-steps:
Step S1-2, calculate the redundancy m_i of each data point i in D according to formula (1) (given as a figure in the original publication), which is expressed in terms of the weighted sum of all data points other than i in the k-th cluster and the weighted average distance between data point i and all data points in the k-th cluster.
Step S1-3, calculate the normalized redundancy P_i of each data point i in D by normalizing its redundancy m_i over all data points in D (formula (2), given as a figure in the original publication).
Step S1-4, calculate the average redundancy of all data points in D (formula (3), given as a figure in the original publication).
Step S1-5, calculate the size of the core data set M from the average redundancy, the loss-function tolerance ε and the error rate δ (formula (4), given as a figure in the original publication), where c is a fixed constant.
Step S1-6, screen the core data set M from D according to formula (2) and formula (4).
3. The core-data-set-based federated learning communication traffic optimization method of claim 1, wherein step S2 comprises the following sub-step:
Step S2-2, sparsify the fully connected network model according to the model sparsification proportion α and the network connection probability (formula (5), given as a figure in the original publication), where n_l and n_{l-1} respectively denote the numbers of neurons in layers l and l-1 of the fully connected network model, and the connection probability refers to the probability that neuron i of layer l is connected to neuron j of layer l-1.
4. The method of claim 1, wherein step S6 comprises the following sub-steps:
Step S6-1, evaluate the influence I_i of removing any connection i in the global model on the model loss function, calculated as follows:
I_i = |f(W_t) - f(W_t | w_i = 0)| (8)
where w_i denotes the i-th weight parameter of W_t.
Since evaluating the effect of removing every connection in the network one by one according to equation (8) is very time-consuming, equation (8) is approximated as:
I_i = |g_i w_i| (9)
where g_i denotes the i-th component of the global model gradient.
Step S6-2, remove, layer by layer, the proportion β of neuron connections with the smallest I_i from the global model;
Step S6-3, screen important connection points layer by layer according to the neuron connection importance I_i. Suppose that in step S6-2 the connection formed by the i-th node of layer l and the j-th node of layer l-1 was removed; then screen out the set S of nodes in layer l-1 from which no unimportant connection was removed;
Step S6-4, randomly select a node from the set S and connect it with the i-th node of layer l.
5. A federated learning communication traffic optimization system based on a core data set, characterized in that it comprises the following modules:
A core data set construction module, in which, given a set of end users {C_1, C_2, ..., C_r, ..., C_n} and corresponding data sets {D_1, D_2, ..., D_r, ..., D_n}, each end user screens out its core data set in parallel, yielding {M_1, M_2, ..., M_r, ..., M_n}; it contains the following sub-modules:
A clustering submodule for dividing a given data set D into K clusters, where G_k denotes the clustering result of the k-th cluster and the weight of the j-th data point in the k-th cluster is taken from the given weight set;
A weight-sum calculation submodule for calculating the weighted sum of all data points other than data point i in the k-th cluster;
A weighted-average-distance calculation submodule for calculating the weighted average distance between data point i and all data points in the k-th cluster;
A redundancy calculation submodule for calculating the redundancy of each data point i in D;
A normalization submodule for calculating the normalized redundancy of each data point i in D;
An average-redundancy calculation submodule for calculating the average redundancy of all data points in D;
A core data set size calculation submodule for calculating the size of the core data set;
A probability sampling submodule for screening the core data set M from D according to the core data set size |M| and the normalized redundancy P_i;
A parallel submodule for screening, in parallel, the core data set M_r from the data set D_r of each end user C_r (r = 1, 2, ..., n).
A sparse model construction module for sparsifying the constructed fully connected network model to obtain a sparse global model, comprising the following sub-modules:
A fully connected network construction submodule for constructing a fully connected network model according to the set number of network layers and the number of neurons in each layer;
A sparsification sampling probability calculation submodule for calculating, for each layer of the fully connected network model, the connection probability between neurons of adjacent layers;
A sparsification sampling submodule for sparsifying each layer of the fully connected network model so that the connections between neurons of adjacent layers satisfy the calculated connection probability.
A cloud center initialization module for initializing the global model parameters W_0, the global model update parameters U_0, the number of model iteration rounds T and the total number of uploaded parameters Ω.
A local model training module in which each end user C_r (r = 1, 2, ..., n) carries out local model training in parallel to obtain its local model update parameters H_r (r = 1, 2, ..., n), containing the following sub-modules:
A global model parameter acquisition submodule with which each end user C_r (r = 1, 2, ..., n) obtains the global model parameters W_t of the current iteration round t (t ≤ T) from the cloud center;
A local model update parameter calculation submodule with which each end user C_r (r = 1, 2, ..., n) trains the local model on its local core data set M_r and obtains the model update parameters H_r (r = 1, 2, ..., n) of the current iteration round t (t ≤ T).
A global model update module for calculating the global model update parameters and global model parameters of the current iteration round, comprising the following sub-modules:
A local model update parameter acquisition submodule for acquiring the model update parameters H_r (r = 1, 2, ..., n) of each end user C_r for the current iteration round t (t ≤ T);
A global model parameter calculation submodule for calculating the aggregation in formula (6) and W_t = W_{t-1} - U_t, obtaining the global model update parameters U_t and global model parameters W_t of the current iteration round t (t ≤ T).
A global model structure adjustment module for adjusting the structure of the global model according to the global model parameters W_t, comprising the following sub-modules:
A model connection importance calculation submodule for calculating I_i = |g_i w_i|, the influence of removing connection i from the global model on the model loss function;
An importance ranking submodule for sorting the values of I_i in ascending order;
An unimportant connection removal submodule for removing, layer by layer, the proportion β of neuron connections with the smallest I_i from the global model;
An important node screening submodule for screening the important connection node set S layer by layer according to the neuron connection importance I_i;
A randomization submodule for randomly choosing a node from the set S and connecting it with node i.
CN202011240064.0A 2020-11-09 2020-11-09 Federal learning communication traffic optimization method and system based on core data set Pending CN112364913A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011240064.0A CN112364913A (en) 2020-11-09 2020-11-09 Federal learning communication traffic optimization method and system based on core data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011240064.0A CN112364913A (en) 2020-11-09 2020-11-09 Federal learning communication traffic optimization method and system based on core data set

Publications (1)

Publication Number Publication Date
CN112364913A true CN112364913A (en) 2021-02-12

Family

ID=74509330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011240064.0A Pending CN112364913A (en) 2020-11-09 2020-11-09 Federal learning communication traffic optimization method and system based on core data set

Country Status (1)

Country Link
CN (1) CN112364913A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113258935A (en) * 2021-05-25 2021-08-13 山东大学 Communication compression method based on model weight distribution in federated learning
CN113420888A (en) * 2021-06-03 2021-09-21 中国石油大学(华东) Unsupervised federal learning method based on generalization domain self-adaptation
CN113537509A (en) * 2021-06-28 2021-10-22 南方科技大学 Collaborative model training method and device
CN114819321A (en) * 2022-04-18 2022-07-29 郑州大学 Distributed machine learning-oriented parameter transmission communication optimization method
US11468370B1 (en) 2022-03-07 2022-10-11 Shandong University Communication compression method based on model weight distribution in federated learning
CN115913413A (en) * 2023-02-22 2023-04-04 西安电子科技大学 Intelligent spatial millimeter wave propagation characteristic analysis method
CN117149527A (en) * 2023-10-31 2023-12-01 江苏华鲲振宇智能科技有限责任公司 System and method for backing up and recovering server data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766138A (en) * 2019-10-21 2020-02-07 中国科学院自动化研究所 Method and system for constructing self-adaptive neural network model based on brain development mechanism
CN111768457A (en) * 2020-05-14 2020-10-13 北京航空航天大学 Image data compression method, device, electronic equipment and storage medium
CN111882133A (en) * 2020-08-03 2020-11-03 重庆大学 Prediction-based federated learning communication optimization method and system
CN111901829A (en) * 2020-07-10 2020-11-06 江苏智能交通及智能驾驶研究院 Wireless federal learning method based on compressed sensing and quantitative coding

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766138A (en) * 2019-10-21 2020-02-07 中国科学院自动化研究所 Method and system for constructing self-adaptive neural network model based on brain development mechanism
CN111768457A (en) * 2020-05-14 2020-10-13 北京航空航天大学 Image data compression method, device, electronic equipment and storage medium
CN111901829A (en) * 2020-07-10 2020-11-06 江苏智能交通及智能驾驶研究院 Wireless federal learning method based on compressed sensing and quantitative coding
CN111882133A (en) * 2020-08-03 2020-11-03 重庆大学 Prediction-based federated learning communication optimization method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SOLMAZ NIKNAM ET AL: "Federated Learning for Wireless Communications: Motivation, Opportunities and Challenges", https://arxiv.org/pdf/1908.06847.pdf *
杨庚 et al.: "Research progress on privacy protection in federated learning", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113258935A (en) * 2021-05-25 2021-08-13 山东大学 Communication compression method based on model weight distribution in federated learning
CN113258935B (en) * 2021-05-25 2022-03-04 山东大学 Communication compression method based on model weight distribution in federated learning
CN113420888A (en) * 2021-06-03 2021-09-21 中国石油大学(华东) Unsupervised federal learning method based on generalization domain self-adaptation
CN113537509A (en) * 2021-06-28 2021-10-22 南方科技大学 Collaborative model training method and device
US11468370B1 (en) 2022-03-07 2022-10-11 Shandong University Communication compression method based on model weight distribution in federated learning
CN114819321A (en) * 2022-04-18 2022-07-29 郑州大学 Distributed machine learning-oriented parameter transmission communication optimization method
CN114819321B (en) * 2022-04-18 2023-04-07 郑州大学 Distributed machine learning-oriented parameter transmission communication optimization method
CN115913413A (en) * 2023-02-22 2023-04-04 西安电子科技大学 Intelligent spatial millimeter wave propagation characteristic analysis method
CN115913413B (en) * 2023-02-22 2023-07-14 西安电子科技大学 Intelligent space millimeter wave propagation characteristic analysis method
CN117149527A (en) * 2023-10-31 2023-12-01 江苏华鲲振宇智能科技有限责任公司 System and method for backing up and recovering server data
CN117149527B (en) * 2023-10-31 2024-03-08 江苏华鲲振宇智能科技有限责任公司 System and method for backing up and recovering server data

Similar Documents

Publication Publication Date Title
CN112364913A (en) Federal learning communication traffic optimization method and system based on core data set
CN109948029B (en) Neural network self-adaptive depth Hash image searching method
CN113053115B (en) Traffic prediction method based on multi-scale graph convolution network model
Kim et al. SplitNet: Learning to semantically split deep networks for parameter reduction and model parallelization
CN110263280B (en) Multi-view-based dynamic link prediction depth model and application
CN110175628A (en) A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation
CN114912705A (en) Optimization method for heterogeneous model fusion in federated learning
CN112215353B (en) Channel pruning method based on variational structure optimization network
CN112100514B (en) Friend recommendation method based on global attention mechanism representation learning
Navgaran et al. Evolutionary based matrix factorization method for collaborative filtering systems
Liu et al. A survey on computationally efficient neural architecture search
CN114091667A (en) Federal mutual learning model training method oriented to non-independent same distribution data
CN116362325A (en) Electric power image recognition model lightweight application method based on model compression
CN113469891A (en) Neural network architecture searching method, training method and image completion method
Napoli et al. A mathematical model for file fragment diffusion and a neural predictor to manage priority queues over BitTorrent
CN111898316A (en) Construction method and application of super-surface structure design model
Qi et al. FedAGCN: A traffic flow prediction framework based on federated learning and Asynchronous Graph Convolutional Network
CN111832637A (en) Distributed deep learning classification method based on alternative direction multiplier method ADMM
CN115587633A (en) Personalized federal learning method based on parameter layering
CN114065033A (en) Training method of graph neural network model for recommending Web service combination
Tanghatari et al. Federated learning by employing knowledge distillation on edge devices with limited hardware resources
CN117523291A (en) Image classification method based on federal knowledge distillation and ensemble learning
Zhang et al. Federated multi-task learning with non-stationary heterogeneous data
CN114662658A (en) On-chip optical network hot spot prediction method based on LSTM neural network
CN110381540B (en) Dynamic cache updating method for responding popularity of time-varying file in real time based on DNN

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210212

WD01 Invention patent application deemed withdrawn after publication