CN114580661A - Data processing method and device based on federated learning and computer equipment - Google Patents

Data processing method and device based on federated learning and computer equipment

Info

Publication number
CN114580661A
CN114580661A (application CN202210181539.6A)
Authority
CN
China
Prior art keywords
parameter
parameter server
gradient
data
response time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210181539.6A
Other languages
Chinese (zh)
Other versions
CN114580661B (en)
Inventor
郭清宇
蓝利君
李超
周义朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210181539.6A priority Critical patent/CN114580661B/en
Publication of CN114580661A publication Critical patent/CN114580661A/en
Application granted granted Critical
Publication of CN114580661B publication Critical patent/CN114580661B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G06N20/20 - Ensemble learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 - Integrating or interfacing systems involving database management systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer And Data Communications (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application relates to a data processing method, apparatus, computer device, storage medium and computer program product based on federated learning. The method can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation and assisted driving. Applied to each parameter server in a federated learning architecture, the method comprises: receiving gradient data obtained by participants in the corresponding participant cluster through training with local data; aggregating the gradient data of the participant cluster to obtain aggregated gradient data; obtaining a parameter server topology; exchanging the aggregated gradient data with adjacent parameter servers based on the parameter server topology; and updating the parameters of the joint model according to the aggregated gradient data of all parameter servers obtained through the exchange. The method improves system stability, and because the parameter server topology is constructed based on the response times between the parameter servers rather than being fixed, it provides a basis for improving the efficiency of data exchange between parameter servers.

Description

Data processing method, device and computer equipment based on federated learning

Technical Field

The present application relates to the field of artificial intelligence technology, and in particular to a data processing method, apparatus, computer device, storage medium and computer program product based on federated learning.

Background Art

Federated Learning is an emerging fundamental artificial intelligence technology and a privacy-preserving distributed machine learning training approach. Its goal is to train a high-quality prediction model while protecting client data privacy, even when the training data is scattered across a large number of unreliable clients with high network latency.

One federated learning architecture, shown in Figure 1, is a single-parameter-server, multi-participant structure. As the name suggests, the parameter server is a single node that exchanges model parameters with multiple participants. The specific process is shown in Figure 1. In a single-parameter-server, multi-participant structure, the parameter server easily becomes the performance bottleneck of the whole training process. Moreover, because the parameter server is a single point, the robustness of the whole training process is low: if the parameter server suffers a failure or a network problem, the entire federated learning training process breaks down.

Summary of the Invention

In view of this, it is necessary to provide, for the above technical problems, a federated learning-based data processing method, apparatus, computer device, computer-readable storage medium and computer program product that can improve stability.

In a first aspect, the present application provides a data processing method based on federated learning. The method includes:

receiving gradient data obtained by participants in the corresponding participant cluster through training with local data;

aggregating the gradient data of the participants in the participant cluster to obtain aggregated gradient data;

obtaining a parameter server topology, wherein the parameter server topology is constructed based on the response times between the parameter servers when the federated learning model is initialized;

exchanging the aggregated gradient data with adjacent parameter servers based on the parameter server topology;

updating parameters of the joint model according to the aggregated gradient data of all parameter servers obtained through the exchange.

In a second aspect, the present application further provides a data processing apparatus based on federated learning, applied to each parameter server in a federated learning architecture. The apparatus includes:

a receiving module, configured to receive gradient data obtained by participants in the corresponding participant cluster through training with local data;

an aggregation module, configured to aggregate the gradient data of the participants in the participant cluster to obtain aggregated gradient data;

a structure acquisition module, configured to obtain a parameter server topology, wherein the parameter server topology is constructed based on the response times between the parameter servers when the federated learning model is initialized;

an exchange module, configured to exchange the aggregated gradient data with adjacent parameter servers based on the parameter server topology;

an update module, configured to update parameters of the joint model according to the aggregated gradient data of all parameter servers obtained through the exchange.

In a third aspect, the present application further provides a computer device. The computer device includes a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:

receiving gradient data obtained by participants in the corresponding participant cluster through training with local data;

aggregating the gradient data of the participants in the participant cluster to obtain aggregated gradient data;

obtaining a parameter server topology, wherein the parameter server topology is constructed based on the response times between the parameter servers when the federated learning model is initialized;

exchanging the aggregated gradient data with adjacent parameter servers based on the parameter server topology;

updating parameters of the joint model according to the aggregated gradient data of all parameter servers obtained through the exchange.

In a fourth aspect, the present application further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the following steps are implemented:

receiving gradient data obtained by participants in the corresponding participant cluster through training with local data;

aggregating the gradient data of the participants in the participant cluster to obtain aggregated gradient data;

obtaining a parameter server topology, wherein the parameter server topology is constructed based on the response times between the parameter servers when the federated learning model is initialized;

exchanging the aggregated gradient data with adjacent parameter servers based on the parameter server topology;

updating parameters of the joint model according to the aggregated gradient data of all parameter servers obtained through the exchange.

In a fifth aspect, the present application further provides a computer program product. The computer program product includes a computer program that, when executed by a processor, implements the following steps:

receiving gradient data obtained by participants in the corresponding participant cluster through training with local data;

aggregating the gradient data of the participants in the participant cluster to obtain aggregated gradient data;

obtaining a parameter server topology, wherein the parameter server topology is constructed based on the response times between the parameter servers when the federated learning model is initialized;

exchanging the aggregated gradient data with adjacent parameter servers based on the parameter server topology;

updating parameters of the joint model according to the aggregated gradient data of all parameter servers obtained through the exchange.

With the above federated learning-based data processing method, apparatus, computer device, storage medium and computer program product, each parameter server corresponds to a participant cluster and receives the gradient data trained by that cluster. This is a multi-participant, multi-parameter-server architecture in which each parameter server exchanges data with its corresponding participant cluster, so even if a parameter server fails, the training of federated learning is not affected, which improves system stability. Meanwhile, each parameter server constructs the parameter server topology according to the response times between the parameter servers; after obtaining the aggregated gradient data of its corresponding participant cluster, each parameter server exchanges the aggregated gradient data based on the parameter server topology. Since the parameter server topology is constructed based on the response times between the parameter servers rather than being fixed, it provides a basis for improving the efficiency of data exchange between parameter servers.

Brief Description of the Drawings

Figure 1 is a schematic diagram of a single-parameter-server, multi-participant federated learning architecture in one embodiment;

Figure 2 is a schematic diagram of a multi-parameter-server, multi-participant federated learning architecture in one embodiment;

Figure 3 is a schematic diagram of a directly connected topology of multiple parameter servers in one embodiment;

Figure 4 is a schematic diagram of a star topology of multiple parameter servers in one embodiment;

Figure 5 is a diagram of an application environment of a federated learning-based data processing method in one embodiment;

Figure 6 is a schematic flowchart of a federated learning-based data processing method in one embodiment;

Figure 7 is a schematic diagram of the participant clusters corresponding to the parameter servers in one embodiment;

Figure 8 is a schematic diagram of a parameter server topology in one embodiment;

Figure 9 is a schematic diagram of the response times between parameter servers in one embodiment;

Figure 10 is a schematic diagram of a directly connected parameter server topology in one embodiment;

Figure 11 is a schematic diagram of a directly connected parameter server topology in another embodiment;

Figure 12 is a schematic diagram of the construction of a parameter server topology in one embodiment;

Figure 13 is a schematic diagram of a federated learning-based data processing method in another embodiment;

Figure 14 shows comparative experimental results on the MNIST image classification data set in one embodiment;

Figure 15 shows comparative experimental results on the Synthetic data set in one embodiment;

Figure 16 shows a comparison of the complexity and robustness of different topologies in one embodiment;

Figure 17 is a structural block diagram of a federated learning-based data processing apparatus in one embodiment;

Figure 18 is a diagram of the internal structure of a computer device in one embodiment.

Detailed Description of the Embodiments

In order to make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application and are not intended to limit it.

With the research and progress of artificial intelligence technology, artificial intelligence has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare and smart customer service. It is believed that, with the development of technology, artificial intelligence will be applied in more fields and deliver increasingly important value.

The solutions provided in the embodiments of this application involve technologies such as federated learning in artificial intelligence. Federated Learning is an emerging fundamental artificial intelligence technology and a privacy-preserving distributed machine learning training approach. Its goal is to train a high-quality prediction model while protecting client data privacy, even when the training data is scattered across a large number of unreliable clients with high network latency. It enables efficient machine learning among multiple participants or computing nodes while guaranteeing information security during big data exchange, protecting the privacy of terminal data and personal data, and ensuring legal compliance.

In application scenarios where multiple participants take part in federated training, the data is distributed across different clients (participants) of the same organization, and a parameter server is used as the intermediate server for model parameter exchange. To address the instability of the single-parameter-server, multi-participant federated learning structure, a multi-parameter-server, multi-participant federated learning structure is proposed.

In the multi-parameter-server, multi-participant federated learning structure there are multiple parameter servers, each of which exchanges parameters with its own corresponding group of participants. The specific training process is shown in Figure 2:

Step 21: The participant clients (e.g., users' mobile phones or multiple banks, i.e. the product users) download the shared prediction model from their corresponding parameter servers.

Step 22: The participant clients use local data to run training iterations on the model.

Step 23: The participant clients encrypt the gradient updates obtained from model training and upload them to their corresponding parameter servers.

Step 24: Each parameter server waits to collect the gradient updates from all of its participants, then exchanges and integrates gradients with its neighbouring servers, and then updates the shared model on its own side.

The above steps 21 to 24 are repeated: the participants keep downloading the model updated on the server side to the local clients for further update iterations until the model converges, at which point the updates stop.

In the multi-parameter-server, multi-participant federated learning structure, the structures connecting the parameter servers are mainly the directly connected type and the star type. The directly connected structure between parameter servers is shown in Figure 3, and the star structure between parameter servers is shown in Figure 4.

The data processing method based on federated learning provided by the embodiments of the present application can be applied to the multi-parameter-server, multi-participant federated learning structure shown in Figure 5. The participants communicate with the parameter servers over a network. Each parameter server receives the gradient data obtained by the participants of its corresponding participant cluster through training with local data; aggregates the gradient data of the participant cluster to obtain aggregated gradient data; obtains the parameter server topology constructed based on the response times between the parameter servers; exchanges the aggregated gradient data with adjacent parameter servers based on the parameter server topology; and updates the parameters of the joint model according to the aggregated gradient data of all parameter servers obtained through the exchange. The participant 102 may be a user terminal, including but not limited to a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft and the like. The parameter server 104 may be implemented by an independent server or by a server cluster composed of multiple servers. The embodiments of the present invention can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation, assisted driving and the like.

In one embodiment, as shown in Figure 6, a data processing method based on federated learning is provided. The method is described by taking its application to one of the parameter servers in Figure 5 as an example and includes the following steps 602 to 610. It can be understood that when multiple parameter servers perform federated learning, each parameter server performs the following steps 602 to 610 respectively:

Step 602: Receive gradient data obtained by the participants in the corresponding participant cluster through training with local data.

In federated learning, a global model is jointly trained by the participants. Each participant trains the model based on its own local data, and the results are then exchanged and aggregated through the parameter servers to obtain the global model. In this process, user data always remains local and is never sent out, which satisfies data security and privacy protection requirements. The participants of federated learning generally include the data party, the algorithm party, the coordinator, the computing party, the result party, the task initiator and so on. Taking the application of federated learning to joint risk-control modelling between different banks as an example, the participants include banks, users and credit institutions.

A participant cluster is a group of participants. Each participant cluster corresponds to one parameter server and reports to that parameter server the gradient data obtained through training with local data.

The participant cluster corresponding to each parameter server can be initialized when federated learning is initialized. In one implementation, the participant clusters can be divided by region: the participant cluster corresponding to each parameter server is determined so that participants in the same or nearby regions form the cluster of the local parameter server. In another implementation, each server can be taken as a central node and the participants clustered by the response time between the server and the participants, forming multiple participant clusters (one per server). The clustering algorithm as a whole minimizes the average response time between the participants falling into the same cluster and the corresponding server. In one approach, the parameter server sends a gradient update request to the participants and, according to the participants' response times, selects participants to build its corresponding participant cluster. In another approach, the parameter server sends a test request to the participants and, according to the participants' response times, selects participants to build its corresponding participant cluster.
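As an illustration of the response-time-based clustering described above, the following is a minimal, self-contained Python sketch (not taken from the patent; the data layout and names are assumptions). It simply assigns each participant to the parameter server with the shortest measured response time, which keeps the average in-cluster response time low:

# Minimal sketch (assumed names and data layout): assign each participant to the
# parameter server that responds to it fastest, so that the average response
# time inside each cluster stays small.
def build_participant_clusters(response_times):
    """response_times: {(server_id, participant_id): seconds}."""
    clusters = {}
    participants = {p for (_, p) in response_times}
    for p in participants:
        # Pick the server with the smallest measured response time to participant p.
        best_server = min(
            (s for (s, q) in response_times if q == p),
            key=lambda s: response_times[(s, p)],
        )
        clusters.setdefault(best_server, []).append(p)
    return clusters

# Example with two servers and three participants.
rt = {("S1", "p1"): 0.05, ("S2", "p1"): 0.20,
      ("S1", "p2"): 0.30, ("S2", "p2"): 0.10,
      ("S1", "p3"): 0.08, ("S2", "p3"): 0.40}
print(build_participant_clusters(rt))   # e.g. {'S1': ['p1', 'p3'], 'S2': ['p2']} (list order may vary)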

Through this initialization, the participant cluster corresponding to each parameter server is constructed, as shown in Figure 7. In federated learning, the parameter server delivers the training program to the participants; the participants use local data to compute the descent gradients and losses, obtain the gradient data of one round of training, encrypt it and upload it to the corresponding parameter server. Therefore, what a parameter server receives is the gradient data obtained by one training iteration of each participant in its corresponding participant cluster. As shown in Figure 7, parameter server 1 receives the gradient data obtained in this training iteration by each participant in participant cluster 1.

It can be understood that, in order to ensure the confidentiality of the data during training, the participants encrypt the gradient data before sending it to the corresponding parameter server.

Step 604: Aggregate the gradient data of the participant cluster to obtain aggregated gradient data.

Specifically, each parameter server aggregates the received gradient data of its participant cluster to obtain aggregated gradient data, where the aggregation formula is as follows:

(The aggregation formula appears in the original publication only as an equation image, BDA0003521291980000071, and is not reproduced here.)
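Since the equation image is unavailable, one plausible form of such an aggregation, stated here only as an assumption consistent with common federated learning practice and not as the patent's exact formula, is a data-size-weighted average of the gradients uploaded by the cluster's participants:

g^{(c)} = \sum_{k \in S_c} \frac{n_k}{\sum_{j \in S_c} n_j} \, g_k

where S_c denotes the set of participants reporting to parameter server c, g_k the gradient uploaded by participant k, and n_k the size of participant k's local data set. A simple unweighted mean, g^{(c)} = \frac{1}{|S_c|} \sum_{k \in S_c} g_k, is another common choice.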

Step 606: Obtain the parameter server topology, where the parameter server topology is constructed based on the response times between the parameter servers when the federated learning model is initialized.

The data processing method based on federated learning of the present application is suitable for federated learning application scenarios with multiple participants and multiple parameter servers. Each parameter server receives the gradient data of one participant cluster and aggregates it. After the aggregation, the parameter servers need to exchange the aggregated gradient data among themselves.

Usually the parameter servers are connected in a directly connected topology as shown in Figure 3, or in a star topology as shown in Figure 4. The main disadvantage of the directly connected multi-parameter-server topology is that model convergence is slow: the parameter servers have to exchange parameters many times before the model converges, and a single point of failure also leads to poor system stability. With the star topology of multiple parameter servers, the model converges quickly, but the central node is still a single point of failure that leads to poor system stability.

In this embodiment, the parameter server topology is constructed based on the response times between the parameter servers. The topology between the parameter servers is therefore not fixed, but flexibly constructed based on the response times between the parameter servers, which truly reflects, during federated learning, the influence that the length of the response delay between parameter servers has on the efficiency of exchanging aggregated gradient data between them.

During the initialization of the federated learning model, each parameter server can send a request to the other parameter servers and obtain the response time of each of the other parameter servers to that request. For the C parameter servers shown in Figure 7, each server obtains the response times between itself and the other servers.
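A minimal Python sketch of this measurement step, assuming a hypothetical ping(src, dst) callable that stands in for whatever request the servers actually exchange:

# Minimal sketch (assumed API): time a round trip from every server to every
# other server to build the response-time table used for topology construction.
import time

def measure_response_times(server_ids, ping):
    """ping(src, dst) performs one request from server src to server dst
    (a hypothetical stand-in). Returns {(src, dst): seconds}."""
    times = {}
    for src in server_ids:
        for dst in server_ids:
            if src == dst:
                continue
            start = time.monotonic()
            ping(src, dst)
            times[(src, dst)] = time.monotonic() - start
    return times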

In one approach, after the response times between all parameter servers have been obtained, a directly connected parameter server topology is constructed with the shortest exchange time as the objective.

The directly connected parameter server topology is shown in Figure 3, with the parameter servers connected one after another. The directly connected parameter server topology constructed with the shortest exchange time as the objective is built from the response times between the parameter servers, unlike the existing fixed directly connected parameter server topology. By minimizing the data exchange time between parameter servers, the efficiency of data exchange between the parameter servers is improved, which in turn improves the efficiency of federated learning.

In another approach, after the response times between all parameter servers have been obtained, the parameter server topology is constructed with the shortest exchange time and a maximum degree constraint as the objectives.

The maximum degree constraint means that the edges between parameter servers must satisfy a maximum-number requirement. Because a maximum degree constraint is set, each parameter server has connections to multiple parameter servers, which avoids the single-point-of-failure problem that easily arises with the directly connected type. The parameter server topology constructed with the shortest exchange time and the maximum degree constraint as objectives takes into account not only the efficiency of data exchange but also stability, avoiding the problem of poor system stability caused by a single point of failure.

Step 608: Exchange the aggregated gradient data with adjacent parameter servers based on the parameter server topology.

Specifically, using the parameter server topology, each parameter server exchanges aggregated gradient data with the adjacent parameter servers to which it is connected. Taking the parameter server topology shown in Figure 8 as an example, parameter server A sends its aggregated gradient data to parameter server B, parameter server C and parameter server E; correspondingly, through a round of exchange, parameter server A obtains the aggregated gradient data of parameter server B, parameter server E and parameter server C. In the same first round of exchange, the other parameter servers likewise obtain the aggregates of their adjacent parameter servers; for example, parameter server B obtains the aggregated gradient data of parameter server A, parameter server C and parameter server E.

After each parameter server has exchanged and obtained the aggregated gradient data of its adjacent parameter servers, it again aggregates the data obtained from the adjacent parameter servers together with its own aggregated gradient data; at this point, its aggregated gradient data includes the aggregates of itself and of the adjacent parameter servers. For example, after the first round of exchange, parameter server A aggregates the aggregated gradient data of itself, parameter server B, parameter server E and parameter server C.

Parameter servers that are not directly connected cannot obtain each other's aggregated gradient data in the first round of exchange, so further rounds of exchange are carried out. For example, there is no direct connection between parameter server B and parameter server D, but parameter server C obtained parameter server D's aggregated gradient data in the first round of exchange; through the second round of exchange, parameter server C passes aggregated gradient data that includes parameter server D's to parameter server B. It can be understood that, through multiple rounds of exchange and gradient aggregation, every parameter server obtains the aggregated gradient data of all parameter servers.
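This multi-round exchange can be illustrated with the following minimal, self-contained Python sketch (illustrative only: the adjacency list, the merge-by-union of contributions and the final summation are assumptions made for the sketch, encryption is omitted, and the topology is assumed to be connected):

# Minimal sketch (assumed data layout): each server starts with its own cluster
# aggregate; in every round it sends everything it has collected so far to its
# neighbours, until all servers hold the contributions of all servers.
def exchange_rounds(adjacency, own_aggregates):
    """adjacency: {server: [neighbour, ...]} (assumed connected);
    own_aggregates: {server: gradient_vector}."""
    known = {s: {s: own_aggregates[s]} for s in adjacency}       # contributions seen so far
    while any(len(k) < len(adjacency) for k in known.values()):
        updates = {s: dict(known[s]) for s in adjacency}
        for s, neighbours in adjacency.items():
            for nb in neighbours:
                updates[nb].update(known[s])                     # s sends its collection to nb
        known = updates
    # Every server can now sum (or average) the contributions of all servers.
    return {s: [sum(vals) for vals in zip(*known[s].values())] for s in adjacency}

# Example: chain A - B - C with two-dimensional aggregates.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
aggs = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [2.0, 2.0]}
print(exchange_rounds(adj, aggs))    # every server ends with [3.0, 3.0]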

It can be understood that, in order to ensure the confidentiality of the data during training, the parameter servers encrypt the aggregated gradient data before exchanging it with the adjacent parameter servers; that is, what is exchanged between the parameter servers is homomorphically encrypted aggregated gradient data.

Step 610: Update parameters of the joint model according to the aggregated gradient data of all parameter servers obtained through the exchange.

Specifically, through multiple rounds of exchange and gradient aggregation, every parameter server obtains the aggregated gradient data of all parameter servers, updates the parameters of the joint model, and thereby completes one iteration of federated learning training.

Based on the updated joint model, each parameter server continues to distribute the updated joint model to its corresponding participant cluster, the participants continue training with their local data, and the parameter servers keep updating the joint model by repeating steps 602 to 610 until the training ends.

With the above data processing method based on federated learning, each parameter server corresponds to a participant cluster and receives the gradient data trained by that cluster; this is a multi-participant, multi-parameter-server architecture in which each parameter server exchanges data with its corresponding participant cluster, so even if a parameter server fails, the training of federated learning is not affected, which improves system stability. Meanwhile, each parameter server constructs the parameter server topology according to the response times between the parameter servers, and after obtaining the aggregated gradient data of its corresponding participant cluster, exchanges the aggregated gradient data based on the parameter server topology. Since the parameter server topology is constructed based on the response times between the parameter servers rather than being fixed, it provides a basis for improving the efficiency of data exchange between parameter servers.

In another embodiment, receiving the gradient data obtained by the participants in the corresponding participant cluster through training with local data includes: sending a gradient update request to the corresponding participant cluster, and receiving the gradient data obtained through training with local data from the first N participants in the participant cluster that respond fastest to the gradient update request.

Specifically, the participants of federated learning have different response times due to differences in region, network environment and so on, and participants with slow response times in the same federation seriously drag down the overall training time. To address this problem, the participants are first clustered based on their response times to the parameter servers, so as to construct the participant cluster of each parameter server. Specifically, participants whose response times to a parameter server are close are grouped into one cluster, which serves as that parameter server's participant cluster, so that all participants of federated learning are clustered into different clusters and participants falling into the same cluster have similar response times.

The factors that usually affect a participant's response speed are the distance between the participant and the parameter server and the participant's own processing capability. If the participant and the parameter server are in the same region, the participant can respond to the parameter server's requests faster; therefore, a participant cluster selected by a parameter server based on response time usually consists of geographically close participants. Each parameter server, as an intermediate server for model parameter exchange, receives the gradient data obtained by the participants through training with local data, so even if a participant is selected by several parameter servers, the training result is not affected. Therefore, in this embodiment the participant clusters can be divided based on response time alone.

Further, in each training iteration of federated learning, the server selects only the first N participants within the same cluster for the joint model iteration, which greatly reduces the drag that slowly responding participants place on quickly responding ones.

Specifically, the participant clusters can be constructed during the initialization of federated learning modelling, which includes the initialization of the participant clusters. With each parameter server as a centre, each parameter server sends a request to all participants and obtains the response time of each participant to the request. Clustering is performed on the response times, and the objective of the clustering is to minimize the average response time between the participants falling into the same cluster and the corresponding server. Specifically, according to the response times, the fastest responding participants are selected to form the corresponding participant cluster, thereby forming the participant cluster corresponding to each server.

Based on the participant clusters, in each round of iterative training each parameter server sends a gradient update request to its corresponding participant cluster and receives the gradient data obtained through training with local data from the first N participants in the cluster that respond fastest to the gradient update request.

Specifically, each parameter server sends a gradient update request to its participants and only accepts the gradient data sent by the first N participants that respond fastest. For example, suppose N is 50, there are 5 parameter servers, 1000 participants are distributed across different regions, and each parameter server corresponds to one participant cluster, giving 5 participant clusters in total. Then each parameter server selects only the gradient data of the first 50 participants that respond first.
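A minimal Python illustration of this per-round selection (names and data layout are assumptions; in practice the server would simply stop waiting once the first N uploads have arrived):

# Minimal sketch (assumed data layout): keep only the N fastest-responding
# participants' gradient uploads for this round.
def select_fastest_n(reports, n):
    """reports: {participant_id: (response_time_seconds, gradient_vector)}."""
    fastest = sorted(reports.items(), key=lambda item: item[1][0])[:n]
    return {pid: grad for pid, (_, grad) in fastest}

reports = {"p1": (0.12, [0.1, 0.2]), "p2": (0.05, [0.3, 0.1]), "p3": (0.40, [0.2, 0.2])}
print(select_fastest_n(reports, n=2))   # {'p2': [0.3, 0.1], 'p1': [0.1, 0.2]}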

It is foreseeable that, since the parameter server selects, according to response time, the gradient data of the first N fastest-responding participants in its participant cluster, the quickly responding participants are not dragged down by the slowly responding ones, which reduces the total waiting time of the parameter server and improves the efficiency of joint modelling.

In another embodiment, the way of constructing the parameter server topology based on the response times between the parameter servers includes: obtaining the response times between the parameter servers; and, based on the response times, constructing a directly connected parameter server topology with the shortest exchange time as the objective.

The directly connected parameter server topology is shown in Figure 3, with the participating servers connected one after another. Different from the traditional directly connected parameter server topology, the directly connected parameter server topology in this embodiment takes the exchange time into account based on the response times between the servers. By minimizing the exchange time, the constructed directly connected parameter server topology improves data exchange efficiency.

Specifically, constructing the parameter server topology based on the response times with the shortest exchange time as the objective includes: taking each parameter server in turn as a starting point and iterating the step of selecting, according to the response times between the parameter servers, the not-yet-connected parameter server with the shortest response time to the current point as the next point, until all parameter servers are connected, thereby obtaining multiple candidate parameter server topologies; and taking the candidate parameter server topology with the shortest total response time as the final parameter server topology.

The parameter server topology can be constructed at the initialization of federated learning joint modelling. Each parameter server sends a request to the other parameter servers and obtains the response time of each parameter server to its own request. In one embodiment, the response times between the parameter servers are as shown in Figure 9.

On this basis, the following steps are performed:

S10: Take each parameter server in turn as a starting point and select the parameter server with the shortest response time to the current point as the next point. Taking Figure 9 as an example, parameter server A, parameter server B, parameter server C, parameter server D and parameter server E are each taken as a starting point, and the parameter server with the shortest response time to the starting point is selected as the next point. Taking parameter server A as the starting point as an example, parameter server B, which has the shortest response time to parameter server A, is selected as the next node.

S11: Repeat the above step S10 until all parameter servers are connected, obtaining multiple candidate parameter server topologies.

Specifically, according to the response times between the parameter servers, the not-yet-connected parameter server with the shortest response time to the current point is selected as the next point. For example, taking parameter server A as the starting point, the second node is parameter server B, the third node is parameter server C and the fourth node is parameter server D; when choosing the fifth node, parameter server B and parameter server E have the same response time, but parameter server B is already connected, so parameter server E is selected as the fifth node. The parameter server topology constructed with parameter server A as the starting point is then as shown in Figure 10, and the parameter server topology constructed with parameter server E as the starting point is as shown in Figure 11.

By taking each parameter server in turn as a starting point, multiple candidate parameter server topologies can be constructed.

S12: Take the candidate parameter server topology with the shortest total response time as the final parameter server topology.

Here, the total response time is the sum of the response times between the connected parameter servers. Taking the parameter server topology shown in Figure 10 as an example, the total response time is 10; taking the parameter server topology shown in Figure 11 as an example, the total response time is 12. When the total response time is shortest, the time for data exchange between the parameter servers based on that topology is also the shortest. After comparison, the directly connected parameter server topology starting from A has the shortest total response time, so the structure shown in Figure 10 is taken as the final parameter server topology. It can be seen by comparison that, because the total response time is shortest, exchanging gradient data over this directly connected parameter server topology also takes the least time, which improves data exchange efficiency.
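The chain construction of steps S10 to S12 can be sketched in Python as follows (a self-contained illustration, not the patent's implementation; the symmetric response-time table is an assumption):

# Minimal sketch (assumed symmetric response times): build a greedy chain from
# every possible starting server, then keep the chain whose total response time
# along its edges is smallest (steps S10-S12).
def build_direct_chain(servers, rt):
    """rt[(a, b)] == rt[(b, a)] is the response time between servers a and b."""
    best_chain, best_total = None, float("inf")
    for start in servers:
        chain, total = [start], 0.0
        while len(chain) < len(servers):
            current = chain[-1]
            # Next node: the not-yet-connected server with the shortest
            # response time to the current end of the chain.
            nxt = min((s for s in servers if s not in chain),
                      key=lambda s: rt[(current, s)])
            total += rt[(current, nxt)]
            chain.append(nxt)
        if total < best_total:
            best_chain, best_total = chain, total
    return best_chain, best_total

servers = ["A", "B", "C", "D"]
rt = {}
for a, b, t in [("A", "B", 1), ("A", "C", 4), ("A", "D", 5),
                ("B", "C", 2), ("B", "D", 6), ("C", "D", 3)]:
    rt[(a, b)] = rt[(b, a)] = t
print(build_direct_chain(servers, rt))   # (['A', 'B', 'C', 'D'], 6.0)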

A directly connected parameter server topology requires many interactions before the model converges, and a single point of failure also leads to poor system stability. On this basis, another way of constructing the parameter server topology is provided.

In another embodiment, the way of constructing the parameter server topology based on the response times between the parameter servers includes: obtaining the response times between the parameter servers; and, based on the response times, constructing the parameter server topology with the shortest exchange time and a maximum degree constraint as the objectives.

The maximum degree constraint refers to the maximum number of edges of each node in the parameter server topology. By setting the maximum degree to at least 2, that is, each node has at least 2 edges, there can be multiple interaction relationships between the parameter servers, which avoids single points of failure without having so many connections that efficiency suffers. The shortest exchange time and the maximum degree constraint together balance exchange efficiency and system stability.

Specifically, constructing the parameter server topology based on the response times with the shortest exchange time and the maximum degree constraint as objectives includes:

S121: Take any parameter server as a starting point and, according to the response times between the parameter servers, select the not-yet-connected parameter server with the shortest response time to the current point as the next point.

As shown in Figure 12, based on the response times among all parameter servers, any parameter server is taken as the starting point. For example, with parameter server B as the starting point, parameter server A, which has the shortest response time to parameter server B, is selected as the next point. If nodes with the same response time are encountered, one of them is selected at random as the next point.

S122: Iterate the above step S121 until all parameter servers are connected, obtaining a parameter server connectivity structure.

Taking Figure 12 as an example, with parameter server B as the starting point, parameter server A, which has the shortest response time to parameter server B, is selected as the second node, parameter server C is selected as the third node, parameter server D as the fourth node and parameter server E as the fifth node; the resulting parameter server connectivity structure is shown in Figure 12.

S123: Sort the response times between the parameter servers in ascending order and, subject to the maximum degree constraint, successively add the response relationships that have not yet been added to the parameter server connectivity structure, obtaining the final parameter server topology.

Specifically, a response relationship is a connection between servers that represents a response time and can be an edge in the topology. The response times between the parameter servers can be sorted in ascending order, and the response relationships (edges) that have not yet been added to the parameter server connectivity structure are added to it one by one. During the iteration, if the degree of either node at the two ends of the edge to be added equals the maximum degree, adding that edge is abandoned. This continues until all edges that can be added have been added to the parameter server connectivity structure, finally forming the network topology of the server nodes. As shown in Figure 12, with the maximum degree set to 3, after sorting by response time the edges (B->C, B->E, A->E) are added to the graph in turn, forming the final parameter server topology.
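Steps S121 to S123 can likewise be sketched in Python (a self-contained illustration under the same assumptions as above: the greedy chain is built first, and the remaining edges are then added in ascending order of response time while respecting the maximum degree):

# Minimal sketch (assumed symmetric response times): greedy chain first
# (S121-S122), then add the remaining shortest edges as long as neither
# endpoint would exceed the maximum degree (S123).
def build_degree_constrained_topology(servers, rt, max_degree=3):
    degree = {s: 0 for s in servers}
    edges = set()
    # S121-S122: greedy chain starting from an arbitrary server.
    chain = [servers[0]]
    while len(chain) < len(servers):
        current = chain[-1]
        nxt = min((s for s in servers if s not in chain),
                  key=lambda s: rt[(current, s)])
        edges.add(frozenset((current, nxt)))
        degree[current] += 1
        degree[nxt] += 1
        chain.append(nxt)
    # S123: remaining edges in ascending response-time order, degree permitting.
    candidates = sorted(
        (frozenset((a, b)) for a in servers for b in servers if a < b),
        key=lambda e: rt[tuple(e)],
    )
    for e in candidates:
        a, b = tuple(e)
        if e not in edges and degree[a] < max_degree and degree[b] < max_degree:
            edges.add(e)
            degree[a] += 1
            degree[b] += 1
    return [tuple(sorted(e)) for e in edges]

servers = ["A", "B", "C", "D"]
rt = {}
for a, b, t in [("A", "B", 1), ("A", "C", 4), ("A", "D", 5),
                ("B", "C", 2), ("B", "D", 6), ("C", "D", 3)]:
    rt[(a, b)] = rt[(b, a)] = t
print(sorted(build_degree_constrained_topology(servers, rt, max_degree=2)))
# a ring: [('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'D')]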

In this embodiment, by using the response times between the servers to construct the parameter server topology, the influence on federated learning of factors such as the length of node response delays and hardware location is taken into account. In the parameter server topology constructed under the maximum degree constraint, the parameter servers are not connected one after another in a single chain but have connections in multiple directions, so even if one node suffers a single-point failure, data can still be exchanged with the other parameter servers through the other connections. This method takes into account both transfer efficiency and system stability.

The data processing method based on federated learning of the present application can be applied to joint modelling by any number of institutions without exposing their respective private data, such as joint risk-control modelling between different banks or credit institutions, joint case-diagnosis modelling between different medical institutions, and joint product-recommendation modelling between different e-commerce platforms. In the joint model training process based on federated learning, a self-adaptively adjusted decentralized algorithm can be implemented, which improves the overall efficiency of federated learning and the robustness of the system without losing joint modelling performance, and improves the product experience.

Specifically, in one embodiment the data processing method based on federated learning, as shown in Figure 13, includes the following steps:

S131: Federated learning initialization. In this step the parameter server topology, the participant clusters and the joint model parameters are initialized.

Specifically, the initialization of the parameter server topology includes:

1) Record the response times between all servers.

2) Take any parameter server as a starting point and iterate the step of selecting, according to the response times between the parameter servers, the not-yet-connected parameter server with the shortest response time to the current point as the next point, until all parameter servers are connected, obtaining the parameter server connectivity structure.

3) Sort the response relationships by the parameter servers' response times in ascending order, and add the response relationships not yet in the parameter-server connected structure one by one. During this iteration, if either endpoint of the edge (response relationship) to be added already has a degree equal to the maximum degree, the edge is discarded. The process continues until every edge that can be added has been added, which yields the final network topology of the server nodes.
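The following minimal Python sketch illustrates step 2); the function and variable names are assumptions for illustration, and response times are again assumed to be stored in a dictionary keyed by unordered server pairs. Step 3) can then pass the returned edges, as the base connected structure, to the degree-capped edge addition sketched earlier after the Figure 12 discussion.

```python
def build_connected_chain(servers, response_time, start):
    # Step 2): greedy nearest-neighbour chain over pairwise response times.
    visited = [start]
    edges = []
    current = start
    while len(visited) < len(servers):
        remaining = [s for s in servers if s not in visited]
        # Pick the not-yet-connected server closest (fastest) to the current one.
        nxt = min(remaining, key=lambda s: response_time[frozenset((current, s))])
        edges.append(frozenset((current, nxt)))
        visited.append(nxt)
        current = nxt
    return edges
```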

Participant cluster initialization specifically includes: each parameter server sends a request to each participant; with each server as a central node, the participants are clustered according to the server-to-participant response times, forming a number of participant clusters equal to the number of servers. The clustering makes the average response time between the participants falling into the same cluster and their corresponding server as short as possible, thereby constructing the participant cluster corresponding to each parameter server.
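A minimal sketch of this assignment, assuming the server-to-participant response times are available in a dictionary; the names are illustrative and assigning each participant to its fastest server is one simple way to keep within-cluster response times short:

```python
def cluster_participants(servers, participants, response_time_sp):
    # response_time_sp[(server, participant)] holds the measured response time.
    clusters = {s: [] for s in servers}
    for p in participants:
        # Assign the participant to the server that responds to it fastest.
        nearest = min(servers, key=lambda s: response_time_sp[(s, p)])
        clusters[nearest].append(p)
    return clusters
```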

Joint model parameter initialization ensures that the model parameters on every server are initialized to the same set of values.

S132: The participant clients (for example, users' mobile phones or multiple banks, i.e., the product users) download the joint model from the server side; the joint model is initialized by the server cluster at the start of training (all servers hold the same joint model).

S133: Each participant client performs training iterations on the model using its local data.

S134: With each parameter server as a central node, each participant cluster selects the n fastest-responding participants, which upload their gradients to the corresponding server for aggregation. The value of n is set by the server; federated model training uses n to control the maximum number of participants taking part in each round of iteration.

S135: Each server in the server cluster waits to collect the gradient updates reported by its corresponding participants. After a server has received the gradient updates from the first n participants, it aggregates all collected gradients and then exchanges gradients with its neighboring servers; after the exchange, each server updates the joint model. The next round of iteration then begins (each server again invites its corresponding participant cluster to download the joint model and perform local updates).

In each round of training, a server requests gradient reports from n participants in its corresponding participant cluster; on receiving the request, the cluster returns the gradients of the n fastest-responding participants. Because the participants receiving the request have similar response times and only the fastest n replies are used, fast participants are not held back by slow ones, which reduces the server's total waiting time and improves joint-modeling efficiency.
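A minimal sketch of this selection, assuming a hypothetical `request_gradient` call that returns a participant's response time together with its gradient:

```python
def collect_fastest_gradients(cluster, n, request_gradient):
    # request_gradient(participant) is assumed to return (response_time, gradient).
    replies = [request_gradient(p) for p in cluster]
    replies.sort(key=lambda r: r[0])           # order replies by response time
    return [grad for _, grad in replies[:n]]   # keep only the fastest n gradients
```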

Each server waits for the gradients reported by its corresponding participant cluster and then aggregates them; the aggregation is given by Equation 1.

(Equation 1: gradient aggregation over the fastest n participants; the formula is provided in the original publication as an image and is not reproduced here.)
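As an assumed illustration only (the original formula is not reproduced in this text), a common form for this kind of aggregation is a sample-count-weighted average of the gradients reported by the n fastest participants in server j's cluster:

```latex
% Assumed form of Equation 1: server j averages the gradients g_i of the n
% fastest participants in its cluster, weighted by local sample counts m_i.
g_j^{(t)} \;=\; \frac{\sum_{i=1}^{n} m_i \, g_i^{(t)}}{\sum_{i=1}^{n} m_i}
```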

After aggregating the gradients, each server exchanges its own gradient summary with its neighbor nodes according to the topology; after the exchange it performs gradient aggregation and parameter updating again, completing this update iteration of the model.
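A minimal Python sketch of this exchange-and-update step, under the assumption that gradient summaries and model parameters are plain lists of floats and that `topology[server]` lists the neighboring servers (all names are illustrative):

```python
def exchange_and_update(server, topology, local_summary, fetch_summary,
                        model_params, lr=0.01):
    # Gather the server's own summary plus the summaries of its neighbours.
    summaries = [local_summary] + [fetch_summary(nb) for nb in topology[server]]
    # Second aggregation: element-wise mean over own and neighbouring summaries.
    merged = [sum(vals) / len(vals) for vals in zip(*summaries)]
    # Apply a plain gradient step to the joint-model parameters.
    return [w - lr * g for w, g in zip(model_params, merged)]
```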

Steps S132 to S135 are repeated, with the model updated on the server side downloaded to the participants' local clients each time, until the model converges, at which point the updates stop.
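The outer loop can be summarized by the following sketch; `run_round` and `has_converged` are hypothetical placeholders for one pass of S132 to S135 and for a convergence test.

```python
def train_until_convergence(run_round, has_converged, max_rounds=1000):
    # run_round() performs one round (S132-S135) and returns round metrics;
    # has_converged(metrics) decides whether to stop updating.
    for round_idx in range(1, max_rounds + 1):
        metrics = run_round()
        if has_converged(metrics):
            return round_idx   # model converged in this round
    return max_rounds          # training-time budget exhausted
```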

The federated-learning-based data processing method of this application can serve as a general self-adaptive decentralized method. In scenarios such as federated learning or distributed training, it can effectively improve model training efficiency within a limited training-time budget while maintaining model accuracy and network-topology stability. In terms of application, the scheme is not limited to joint financial anti-fraud modeling involving multiple banks or fintech enterprises; it applies to many multi-client joint-modeling scenarios under privacy-protection constraints. In terms of the model, the model architecture used by the servers in joint modeling can be varied flexibly according to the actual application scenario.

The self-adaptive decentralized federated training method proposed in this scheme can, under a limited model-training budget, adaptively generate an optimal parameter-server topology and select participants by response-time clustering, improving the model-training efficiency and stability of the federated learning system and achieving the highest model accuracy within the limited training time. FIG. 14 shows comparative results on the MNIST image classification dataset with 1000 participants, 10 servers, and 3 different topologies; FIG. 15 shows comparative results on the Synthetic dataset under the same setting. The latency results show that the adaptive topology converges faster than the star-shaped and directly connected models, with no degradation in model accuracy. FIG. 16 shows the complexity and robustness of the different topologies; these results also demonstrate the advantages of the method in enhancing system stability and robustness, further verifying the effectiveness of the scheme.

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

Based on the same inventive concept, an embodiment of the present application also provides a federated-learning-based data processing apparatus for implementing the federated-learning-based data processing method described above. The solution provided by the apparatus is similar to that described in the method above, so for the specific limitations in the one or more apparatus embodiments below, reference may be made to the limitations of the federated-learning-based data processing method described above, which are not repeated here.

In one embodiment, as shown in FIG. 17, a federated-learning-based data processing apparatus is provided, applied to each parameter server in a federated learning architecture, and includes:

a receiving module 1702, configured to receive gradient data obtained by participants in the corresponding participant cluster through training with local data;

an aggregation module 1704, configured to aggregate the gradient data of the participants in the participant cluster to obtain gradient summary data;

a structure obtaining module 1706, configured to obtain the parameter-server topology, wherein the parameter-server topology is constructed based on the response times between the parameter servers when the federated learning model is initialized;

an exchange module 1708, configured to exchange the gradient summary data with adjacent parameter servers based on the parameter-server topology; and

an update module 1710, configured to update the parameters of the joint model according to the gradient summary data of all parameter servers obtained through the exchange.

In another embodiment, the receiving module is configured to send a gradient update request to the corresponding participant cluster and to receive the gradient data obtained through local training by the first N participants in the cluster that respond fastest to the gradient update request.

In another embodiment, the structure obtaining module includes:

a time obtaining module, configured to obtain the response times between the parameter servers; and

a construction module, configured to construct, based on the response times and with the shortest exchange time as the objective, a directly connected parameter-server topology.

In another embodiment, the construction module is configured to take each parameter server in turn as a starting point and iteratively select, according to the response times between the parameter servers, the not-yet-connected parameter server with the shortest response time to the current starting point as the next starting point, until all parameter servers are connected, yielding multiple candidate parameter-server topologies; the candidate with the shortest total response time is taken as the final parameter-server topology.
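A minimal sketch of this multi-start construction, under the same assumptions as the earlier sketches (response times keyed by unordered server pairs; all names are illustrative):

```python
def best_direct_chain(servers, response_time):
    def chain_from(start):
        # Nearest-neighbour chain starting from a given parameter server.
        visited, edges, cur = [start], [], start
        while len(visited) < len(servers):
            nxt = min((s for s in servers if s not in visited),
                      key=lambda s: response_time[frozenset((cur, s))])
            edges.append(frozenset((cur, nxt)))
            visited.append(nxt)
            cur = nxt
        return edges

    # One candidate chain per starting server; keep the one whose total
    # response time over its edges is smallest.
    candidates = [chain_from(start) for start in servers]
    return min(candidates,
               key=lambda edges: sum(response_time[e] for e in edges))
```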

In another embodiment, the construction module is configured to construct, based on the response times and with the shortest exchange time and a maximum-degree constraint as objectives, a parameter-server topology in which the maximum degree is at least 2.

In another embodiment, the construction module is configured to take an arbitrary parameter server as the starting point and iteratively select, according to the response times between the parameter servers, the not-yet-connected parameter server with the shortest response time to the current starting point as the next starting point, until all parameter servers are connected, yielding the parameter-server connected structure; and then, with the response relationships sorted by response time in ascending order, to add the response relationships not yet in the connected structure one by one under the maximum-degree constraint, obtaining the final parameter-server topology.

Each module in the above federated-learning-based data processing apparatus may be implemented wholly or partly by software, by hardware, or by a combination of the two. The modules may be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.

In one embodiment, a computer device is provided; the computer device may be a server, and its internal structure may be as shown in FIG. 18. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores the training data for federated learning. The input/output interface of the computer device is used for exchanging information between the processor and external devices. The communication interface of the computer device is used for communicating with an external terminal over a network connection. When executed by the processor, the computer program implements a federated-learning-based data processing method.

Those skilled in the art can understand that the structure shown in FIG. 18 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, including a memory and a processor, where the memory stores a computer program and the processor, when executing the computer program, implements the steps of the federated-learning-based data processing method of the above embodiments.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the steps of the federated-learning-based data processing method of the above embodiments.

In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps of the federated-learning-based data processing method of the above embodiments.

It should be noted that the user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, database, or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive RAM (ReRAM), magnetoresistive RAM (MRAM), ferroelectric RAM (FRAM), phase-change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take various forms, such as static RAM (SRAM) or dynamic RAM (DRAM). The databases involved in the embodiments provided in this application may include at least one of a relational database and a non-relational database; non-relational databases may include, without limitation, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be, without limitation, general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, or data processing logic devices based on quantum computing.

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.

The above embodiments express only several implementations of this application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, several modifications and improvements can be made without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the appended claims.

Claims (10)

1. A federated-learning-based data processing method, wherein the method is applied to each parameter server in a federated learning architecture, and the method comprises:

receiving gradient data obtained by participants in a corresponding participant cluster through training with local data;

aggregating the gradient data of the participant cluster to obtain gradient summary data;

obtaining a parameter-server topology, wherein the parameter-server topology is constructed based on response times between the parameter servers when the federated learning model is initialized;

exchanging the gradient summary data with adjacent parameter servers based on the parameter-server topology; and

updating parameters of a joint model according to the gradient summary data of all parameter servers obtained through the exchange.

2. The method according to claim 1, wherein receiving the gradient data obtained by the participants in the corresponding participant cluster through training with local data comprises:

sending a gradient update request to the corresponding participant cluster, and receiving gradient data obtained through local training by the first N participants in the cluster that respond fastest to the gradient update request.

3. The method according to claim 1, wherein constructing the parameter-server topology based on the response times between the parameter servers comprises:

obtaining the response times between the parameter servers; and

constructing, based on the response times and with the shortest exchange time as the objective, a directly connected parameter-server topology.

4. The method according to claim 3, wherein constructing, based on the response times and with the shortest exchange time as the objective, the parameter-server topology comprises:

taking each parameter server in turn as a starting point and iteratively selecting, according to the response times between the parameter servers, the not-yet-connected parameter server with the shortest response time to the starting point as the next starting point, until all parameter servers are connected, to obtain multiple candidate parameter-server topologies; and

taking the candidate parameter-server topology with the shortest total response time as the final parameter-server topology.

5. The method according to claim 1, wherein constructing the parameter-server topology based on the response times between the parameter servers comprises:

obtaining the response times between the parameter servers; and

constructing, based on the response times and with the shortest exchange time and a maximum-degree constraint as objectives, a parameter-server topology, the maximum degree being at least 2.

6. The method according to claim 5, wherein constructing, based on the response times and with the shortest exchange time and the maximum-degree constraint as objectives, the parameter-server topology comprises:

taking an arbitrary parameter server as a starting point and iteratively selecting, according to the response times between the parameter servers, the not-yet-connected parameter server with the shortest response time to the starting point as the next starting point, until all parameter servers are connected, to obtain a parameter-server connected structure; and

sorting the response relationships by the parameter servers' response times in ascending order and, under the maximum-degree constraint, adding the response relationships not yet in the parameter-server connected structure one by one to the connected structure, to obtain the final parameter-server topology.

7. A federated-learning-based data processing apparatus, applied to each parameter server in a federated learning architecture, the apparatus comprising:

a receiving module, configured to receive gradient data obtained by participants in a corresponding participant cluster through training with local data;

an aggregation module, configured to aggregate the gradient data of the participants in the participant cluster to obtain gradient summary data;

a structure obtaining module, configured to obtain a parameter-server topology, wherein the parameter-server topology is constructed based on response times between the parameter servers when the federated learning model is initialized;

an exchange module, configured to exchange the gradient summary data with adjacent parameter servers based on the parameter-server topology; and

an update module, configured to update parameters of a joint model according to the gradient summary data of all parameter servers obtained through the exchange.

8. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.

9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.

10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202210181539.6A 2022-02-25 2022-02-25 Data processing method, device and computer equipment based on federated learning Active CN114580661B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210181539.6A CN114580661B (en) 2022-02-25 2022-02-25 Data processing method, device and computer equipment based on federated learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210181539.6A CN114580661B (en) 2022-02-25 2022-02-25 Data processing method, device and computer equipment based on federated learning

Publications (2)

Publication Number Publication Date
CN114580661A true CN114580661A (en) 2022-06-03
CN114580661B CN114580661B (en) 2025-04-18

Family

ID=81770234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210181539.6A Active CN114580661B (en) 2022-02-25 2022-02-25 Data processing method, device and computer equipment based on federated learning

Country Status (1)

Country Link
CN (1) CN114580661B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021115480A1 (en) * 2020-06-30 2021-06-17 平安科技(深圳)有限公司 Federated learning method, device, equipment, and storage medium
CN113033082A (en) * 2021-03-10 2021-06-25 中国科学技术大学苏州高等研究院 Decentralized federated learning framework based on heterogeneous computational power perception and modeling method
CN113139662A (en) * 2021-04-23 2021-07-20 深圳市大数据研究院 Global and local gradient processing method, device, equipment and medium for federal learning
CN113469373A (en) * 2021-08-17 2021-10-01 北京神州新桥科技有限公司 Model training method, system, equipment and storage medium based on federal learning

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115150068A (en) * 2022-06-10 2022-10-04 上海大学 Secure federated learning system and method in quantum autonomous vehicle networking
CN116484922A (en) * 2023-04-23 2023-07-25 深圳大学 Federal learning method, system, equipment and storage medium
CN116484922B (en) * 2023-04-23 2024-02-06 深圳大学 Federal learning method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN114580661B (en) 2025-04-18

Similar Documents

Publication Publication Date Title
Li et al. Lotteryfl: Personalized and communication-efficient federated learning with lottery ticket hypothesis on non-iid datasets
CN113010305B (en) Federal learning system deployed in edge computing network and learning method thereof
CN110674869B (en) Classification processing and graph convolution neural network model training method and device
WO2023124296A1 (en) Knowledge distillation-based joint learning training method and apparatus, device and medium
CN112257873A (en) Training method, device, system, equipment and storage medium of machine learning model
WO2022057433A1 (en) Machine learning model training method and related device
CN110417558A (en) Verification method and device, the storage medium and electronic device of signature
CN114580661B (en) Data processing method, device and computer equipment based on federated learning
CN111935724B (en) Wireless sensor network topology optimization method based on asynchronous deep reinforcement learning
CN115344883A (en) A personalized federated learning method and device for dealing with imbalanced data
CN112600697B (en) QoS prediction method and system based on federal learning, client and server
CN114742240A (en) Transverse federated learning method, device and storage medium
CN117875454B (en) Multistage intelligent linkage-based data heterogeneous federation learning method and storage medium
CN115622777A (en) A multi-center federated learning data sharing method based on alliance chain
CN116595094A (en) Federal learning incentive method, device, equipment and storage medium based on block chain
CN117893807A (en) Federated self-supervised contrastive learning image classification system and method based on knowledge distillation
US20230401452A1 (en) Systems and methods for weight-agnostic federated neural architecture search
CN114116705A (en) Method and device for determining contribution value of participants in joint learning
CN117852745A (en) A distributed decision-making method, device, equipment and storage medium for enterprise groups
CN114611721A (en) Federal learning method, device, equipment and medium based on partitioned block chain
CN117557870A (en) Classification model training method and system based on federal learning client selection
CN114707634B (en) Federated learning convolutional neural network model aggregation method, device and storage medium
CN117474120A (en) A decentralized federated learning method based on feature similarity aggregation model
CN116800671A (en) Data transmission method, apparatus, computer device, storage medium, and program product
CN116431915A (en) Cross-domain recommendation method and device based on federal learning and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant