CN118301114A - Switch resource management method and system for intra-edge stateless network aggregation - Google Patents


Info

Publication number
CN118301114A
Authority
CN
China
Prior art keywords
message, aggregation, aggregator, switch, network
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410330731.6A
Other languages
Chinese (zh)
Inventor
郭得科
夏俊旭
罗来龙
程葛瑶
顾舜贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Publication of CN118301114A


Abstract

The application relates to a switch resource management method and system for stateless in-network aggregation at the edge. By maintaining a sliding window at each source node, the method adapts dynamically to different network conditions and prevents messages with different sequence numbers from being erroneously matched to the same aggregator. Meanwhile, an aggregator pool twice the size of the maximum sliding window, addressed through a first aggregator index, resolves the window desynchronization problem; when an aggregator completes its aggregation, a second aggregator index initializes another, currently unmatched aggregator so as to reclaim switch memory and resynchronize the windows. The aggregation process can therefore automatically return to the normal aggregation state when windows fall out of sync, packet loss is handled effectively while switch memory is reclaimed, and the memory utilization of the switch is effectively improved.

Description

Switch resource management method and system for intra-edge stateless network aggregation
Technical Field
The invention belongs to the technical field of model data processing, and relates to a switch resource management method and system for stateless in-network aggregation at the edge.
Background
Federated Learning (FL) is a promising framework that allows collaborative training of a global model without sharing participants' personal data. Typically, a central server manages the training process and gathers model updates, while multiple clients perform local model training on their respective data. Despite its wide range of applications and strong performance, the framework can be further improved by accelerating model aggregation and strengthening data privacy. On the one hand, multiple clients send their local models to the central server, forming a "many-to-one" incast over the network that may overwhelm the central server and slow down the overall training process. On the other hand, after collecting the local models, the central server performs stateful aggregation, which requires preserving each client's local update in every round. From such historical update data, the central server can infer sensitive attributes and even recover each client's original data. Thus, the privacy-preserving advantage claimed for federated learning may be compromised. However, current research focuses on communication optimization or data privacy, and the technical problem of low switch memory utilization remains.
Disclosure of Invention
In view of the above problems in conventional methods, the present invention proposes a switch resource management method for stateless in-network aggregation at the edge, a corresponding switch resource management system, a computer device, and a computer-readable storage medium, which can effectively improve the memory utilization of the switch.
In order to achieve the above object, the embodiment of the present invention adopts the following technical scheme:
in one aspect, a method for managing switch resources in an edge stateless network is provided, including the steps of:
Acquiring a batch of messages sent by different connected clients according to the maintained sliding window; each message carries a sequence number and a fixed number of weighted model update values, the sequence number corresponds to the position of the carried values in the weighted model update array, and the maximum sliding window size does not exceed the threshold on the number of messages the aggregation switch can buffer;
Finding the matched aggregator using the first aggregator index according to the sequence number of each message;
If any aggregator reaches the aggregation forwarding condition after receiving a matched message, forwarding the aggregation result of that aggregator to a subsequent node, and initializing another, currently unmatched aggregator according to the second aggregator index.
In another aspect, there is also provided a switch resource management system for edge stateless intra-network aggregation, including:
the message acquisition module is used for acquiring a batch of messages sent by different connected clients according to the maintained sliding window; each message carries a sequence number and a fixed number of weighted model update values, the sequence number corresponds to the position of the carried values in the weighted model update array, and the maximum sliding window size does not exceed the threshold on the number of messages the aggregation switch can buffer;
the index matching module is used for finding the matched aggregator using the first aggregator index according to the sequence number of each message;
and the forwarding and initialization module is used for, when any aggregator reaches the aggregation forwarding condition after receiving a matched message, forwarding the aggregation result of that aggregator to a subsequent node and initializing another, currently unmatched aggregator according to the second aggregator index.
In yet another aspect, there is also provided a computer device comprising a memory storing a computer program and a processor that, when executing the computer program, implements the steps of any of the above switch resource management methods for edge stateless in-network aggregation.
In yet another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the above switch resource management methods for edge stateless in-network aggregation.
One of the above technical solutions has the following advantages and beneficial effects:
According to the above method and system for switch resource management for edge stateless in-network aggregation, a sliding window is maintained at each source node, covering both sent-but-unacknowledged messages and messages that may be sent in the future. The window size can be adjusted dynamically to adapt to different network conditions, but its maximum does not exceed the threshold on the number of messages the aggregation switch can buffer, which prevents switch memory overflow and keeps messages with different sequence numbers from being erroneously matched to the same aggregator. Meanwhile, an aggregator pool twice the size of the maximum sliding window, addressed through the first aggregator index, resolves the window desynchronization problem; when an aggregator completes its aggregation, the second aggregator index initializes another, currently unmatched aggregator so as to reclaim switch memory and resynchronize the windows. The aggregation process can therefore automatically return to the normal aggregation state when windows fall out of sync, packet loss is handled effectively while switch memory is reclaimed, and the memory utilization of the switch is effectively improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments or of the conventional techniques of the present application, the drawings required for describing the embodiments or the conventional techniques are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a comparative schematic diagram of the federated learning training process in one embodiment, wherein (a) is the conventional training process and (b) is the training process under GAIN;
FIG. 2 is a schematic diagram of the aggregation process in the ideal case in one embodiment;
FIG. 3 is a flow diagram of a switch resource management method for edge stateless in-network aggregation in one embodiment;
FIG. 4 is a schematic diagram of an aggregation process in which a packet loss causes window asynchronization in one embodiment;
FIG. 5 is a schematic diagram of the aggregation logic process of GAIN in one embodiment;
FIG. 6 is a schematic diagram of a network topology of an experimental example in one embodiment;
FIG. 7 is a schematic diagram showing training accuracy over time in one embodiment, wherein (a), (b), (c), and (d) give the evaluation results for the AlexNet, ResNet, MobileNetV3-S, and VGG11 models, respectively;
FIG. 8 is a schematic diagram of experimental results on traffic, updates, and convergence in one embodiment, where (a) is the traffic received by the central server (CS) when 70% accuracy is reached, (b) evaluates the policy of deferring global model updates, (c) evaluates the policy of selecting a subset of clients for updates, and (d) compares model convergence when different numbers of clients are selected for each round's global model update;
FIG. 9 is a diagram of training throughput under different models in one embodiment;
FIG. 10 is a diagram of the impact of the number of aggregators and the packet loss ratio on throughput in one embodiment, where (a) is the impact of different numbers of aggregators and (b) is the impact of different packet loss ratios;
FIG. 11 is a graph of performance when executing multiple tasks in one embodiment, where (a) is accuracy over training time and (b) is dynamic task joining and exiting;
fig. 12 is a schematic block diagram of a switch resource management system for edge stateless intra-network aggregation in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
It is noted that reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will appreciate that the embodiments described herein may be combined with other embodiments. The term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Federated learning is a privacy-preserving training method for multi-device collaborative training of a global model. In each round of training, the central server updates the global model weights (i.e., model parameters) by aggregating local model updates from a large number of clients using federated averaging (FedAvg) or a derivative algorithm. The updated global model is then distributed back to the clients for the next round of training. However, model updates typically range in size from megabytes to gigabytes, and there may be hundreds of clients. Thus, the training process typically generates significant communication overhead and greatly affects training duration.
In conventional research, to reduce communication overhead, clients tend to quantize/compress model data (or perform more rounds of local computation) prior to uploading, thereby reducing the volume and number of data transmissions. However, these methods may hinder model convergence or require additional training iterations. In terms of data privacy, homomorphic encryption, differential-privacy-based methods, and the like have been introduced into federated learning to prevent data from being revealed to the central server. Homomorphic encryption, however, increases computational and communication overhead, while differential privacy compromises the accuracy of the final model.
To improve training efficiency, many studies have shown that communication can be deferred until more rounds of local training are completed. In addition, the central server can collect model updates from only a subset of clients in each round, further reducing communication overhead. These improvements, while effective in reducing communication frequency and enhancing model convergence, still cannot avoid excessive traffic and communication time per round. For this reason, researchers have further proposed techniques such as model-update quantization and compression to alleviate communication overhead by reducing the amount of data transmitted. However, these techniques may incur some information loss; they must therefore trade off between reducing communication overhead and maintaining the fidelity of model updates.
Regarding privacy in federated learning: data privacy is another challenge faced by federated learning, and many secure training schemes have been explored for enhancing the privacy of local updates, such as homomorphic encryption, differential-privacy-based methods, and secure multi-party computation. Homomorphic encryption allows the central server to compute on encrypted data without decrypting it, preventing the central server from accessing private data; however, this method incurs large computational overhead and increases the amount of communicated data. Differential-privacy-based methods conceal individual participants' data by adding noise to local updates, which, although computationally efficient, can significantly reduce the accuracy of the final model. Secure multi-party computation allows multiple parties to jointly compute a function while keeping their inputs private; however, it also introduces significant communication and computational overhead, making the scalability of large federated learning systems challenging. Furthermore, many distributed-training frameworks are being explored to reduce reliance on a central server, with clients communicating directly with each other to synchronize the global model. While these methods eliminate the need for a central server to collect data from individual participants (clients), they still present a security risk because model updates are exchanged directly between clients.
In the studies underlying the present invention, the above conventional methods are considered limited because they do not touch the most fundamental problem of the federated learning framework, i.e., where and how to perform model aggregation. Ideally, once local model data meet in the network, the aggregation operation should be completed as soon as possible to reduce bandwidth occupation and achieve faster convergence. Furthermore, the aggregation operation should be stateless and avoid caching any model parameters from the clients; then, wherever aggregation is performed, the original data is unrecoverable. Based on this analysis, the present invention statelessly aggregates local model data on the ingress switch before the model data reaches the central server. In this approach, the switches involved have to run an aggregation protocol.
Fortunately, the advent of programmable switches enables legacy network devices to provide a higher level of flexibility and customization. Numerous studies advocate deploying programmable switches at the edge of a network to achieve autonomy and adaptivity, ultimately improving the business performance of the edge network. When the traffic of multiple clients converges at the same switch, the switch can actively aggregate it and forward the aggregated traffic downstream, reducing traffic and communication delay. This process is referred to herein as in-network aggregation. In addition, the switch can aggregate received messages immediately and only temporarily store partial aggregation results, thereby realizing stateless aggregation.
Despite its advantages, deployment of this approach has met with several challenges in practical applications. For example, model updates involve complex computational processes, particularly time-consuming multiply and divide operations, that require a large number of clock cycles to execute on the switch, which can result in considerable network delays and reduced transmission throughput.
In summary, the present invention proposes GAIN, a secure aggregation acceleration service designed for federated learning. Its design concept is to aggregate model updates on the ingress programmable switch in a stateless manner (temporarily rather than permanently storing model parameters from clients) and then forward the aggregate on to the central server. To address the above challenges, the present invention optimizes the model-update workflow so that the switch performs only lightweight addition operations for data aggregation. The invention also designs a new memory reclamation mechanism to use switch memory efficiently, and provides a secure message recovery mechanism that handles packet loss while avoiding acknowledgement (ACK) spoofing attacks. The switch memory layout is further optimized to support runtime memory allocation, which allows GAIN to provide runtime resource allocation and training acceleration for multiple tasks. To verify the efficiency and performance of the new method, a GAIN prototype system was implemented on an FPGA platform; the analysis results further show that GAIN obtains even more considerable performance benefits with larger numbers of clients while promoting privacy protection.
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
First, a comparison of the conventional training process with GAIN is shown in fig. 1. Assume there is one federated learning task in the edge network involving s clients. Each client has a local dataset whose size is denoted n_i (1 ≤ i ≤ s). The total size of these datasets is denoted n, i.e., n = Σ_{i=1}^{s} n_i. These clients cooperate with a central server to train the global model. The central server may be located in the edge network or elsewhere, such as a cloud data center. Each client and the central server are assumed to be semi-trusted entities affiliated with different organizations or individuals: they follow the training procedure (otherwise the federated learning task cannot proceed) but are curious about the data of others. The edge network belongs to one or more network providers that have full management capabilities over their respective devices.
As shown in fig. 1 (a), which illustrates the conventional training process: at the beginning of each round, the central server randomly selects m clients and sends them the current global model parameters w_t. Each selected client then performs local computation based on its own local dataset and w_t, and sends the updated local model parameters w^i_{t+1} back to the central server. After receiving these parameters, the central server updates its recorded status of each client from w^i_t to w^i_{t+1}, and then calculates the new global model parameters w_{t+1}. The update process can be seen as a weighted average over these model updates (for an unselected client, the stored w^i_{t+1} simply equals its previous update w^i_t), namely:
w_{t+1} = Σ_{i=1}^{s} (n_i / n) · w^i_{t+1} (1)
Since the central server maintains each client's update in every round, the present invention refers to this as a stateful aggregation mode, which may lead to privacy leakage. Furthermore, such a training procedure results in large amounts of network traffic and poor communication performance, as the central server must frequently collect updates from a large number of clients.
It will be appreciated that the present invention contemplates a generic edge network architecture in which clients connect to an edge network through an ingress switch. These ingress switches are interconnected with the backbone network by switches at the edge aggregation or core layer. These ingress switches are assumed to be programmable switches because such deployment may facilitate active identification, detection, classification, and other operations of incoming traffic.
In order to improve communication performance and data privacy, the invention provides a secure aggregation acceleration service (namely GAIN), which utilizes an entry programmable switch to realize stateless intra-network aggregation, as shown in fig. 1 (b), which is a training process corresponding to GAIN. The GAIN may be provided by a network provider and may be used by the user in a fee-based manner. Notably, even though some ingress switches are traditional switches, they do not have aggregation capability, in which case the GAIN is still valid. In addition, GAIN has scalability to accommodate future deployments across more layers of programmable switches.
In each round of training, the GAIN aggregates local model updates in a stateless manner using programmable switches at the ingress. The aggregated model data is then transmitted to a subsequent node. If the subsequent node is also a programmable switch, it can aggregate model data from its sub-clients as well as model data of upstream switches, thereby further reducing traffic. For traditional switches on paths, they simply forward data according to the routing rules. The aggregated data is finally sent to a central server via a core network. Since model updates from a single client are already aggregated in the edge network, the traffic on the backbone network can be significantly reduced. The central server then updates the global model with these aggregated model data and selects a new set of clients for the next round of training.
Next, the design of GAIN is described in detail; it comprises federated-learning workflow orchestration, stateless aggregation, secure packet-loss recovery, switch memory reclamation, and runtime multi-task management. Workflow orchestration: in the conventional training process, after each round of training, the selected clients upload their local model parameters to the central server, which then updates the global model by weighted averaging. Such a workflow is incompatible with GAIN, because the switch is unaware of each client's weight. A straightforward improvement is to pre-cache the weights n_i/n on the switch, where n_i (1 ≤ i ≤ s) is the local dataset size of the i-th client and n is the total size of all clients' local datasets; the switch then multiplies incoming data to perform the weighted average. However, multiplication consumes a significant number of clock cycles and increases message-processing delay, resulting in reduced throughput. To avoid this, the present invention reorganizes the training process of GAIN so that clients upload their weighted model updates u^i_{t+1} rather than the updated local model parameters w^i_{t+1}, namely:
u^i_{t+1} = (n_i / n) · w^i_{t+1} (2)
The weight n_i/n may be obtained from the central server and delivered to each client when the training task is orchestrated.
It is assumed that one programmable switch is connected to d selected clients. When performing a global model update, the programmable switch aggregates the local model update data of all child nodes by addition, i.e., Σ_{i=1}^{d} u^i_{t+1}, and then forwards the result to a subsequent node. For convenience of description, a programmable switch that performs the aggregation operation is referred to as an aggregation switch. In an actual deployment, an aggregation switch may also need to aggregate results from other switches, or help nearby switches aggregate data for their children (e.g., when those switches are legacy switches or lack the resources required for aggregation). Without loss of generality, these nodes may all be regarded as children of the aggregation switch, and the aggregation operation on aggregation switch j can be expressed as:
v_j = Σ_{i=1}^{d_j} v^i_j (3)
where v^i_j denotes the data of the i-th child node of aggregation switch j, and d_j denotes the number of child nodes of aggregation switch j. In the data flows of different nodes, v^i_j may equal a client's weighted update u^i_{t+1} or the aggregated result of a child switch; v is used here to distinguish general child-node data from the client data u defined above. This adjustment lets local model updates be aggregated by switches. However, a new challenge is that the central server may not be able to update the global model with these aggregated results. As before, in each round the central server randomly selects m of the s clients to train. Let C denote the set of clients involved in the federated learning task. In round t+1, the central server randomly selects m clients from C, denoted C_{t+1}; the unselected clients form the set C - C_{t+1}. According to equation (1), the global model parameters for round t+1 can be expressed as:
w_{t+1} = Σ_{i∈C_{t+1}} (n_i / n) · w^i_{t+1} + Σ_{i∈C-C_{t+1}} (n_i / n) · w^i_t (4)
The above can be further written as:
w_{t+1} = v_{t+1} + Σ_{i∈C-C_{t+1}} (n_i / n) · w^i_t (5)
where v_{t+1} = Σ_{i∈C_{t+1}} (n_i / n) · w^i_{t+1} is the result of the switch aggregation. Because of GAIN, the central server no longer keeps track of each client's local model update w^i_t, so the value of Σ_{i∈C-C_{t+1}} (n_i / n) · w^i_t cannot be obtained, and the central server cannot update the global model according to equation (1). To solve this problem, an asynchronous federated-learning model update scheme is employed, in which new global model parameters are computed from a combination of the old global model parameters and the new partial updates. Specifically, if w_t denotes the global model parameters of round t, and w_{t+1} denotes the global model parameters to be computed for the current round, then:
w_{t+1} = β · w_t + Σ_{i∈C_{t+1}} (n_i / n) · w^i_{t+1} (6)
In the above equation, Σ_{i∈C_{t+1}} (n_i / n) · w^i_{t+1} is the weighted average of the local model updates of the m selected clients in round t+1, which equals the aggregated result v_{t+1} from the switch. Therefore, equation (6) can be rewritten as:
w_{t+1} = β · w_t + v_{t+1} (7)
where β = Σ_{i∈C-C_{t+1}} (n_i / n) indicates the proportion of the dataset size not updated in round t+1 to the total dataset size. Thus, the central server need only store the previous round's global model parameters and compute the proportion of the current round's unselected dataset. On this basis, it can update the global model parameters using the aggregated results from the switches; the overall process is shown in fig. 1 (b). Because the central server no longer needs to collect and store local model updates from individual clients, retaining only the updated global model parameters, this approach facilitates privacy protection while correspondingly optimizing the training process to reduce the computational burden on the switches and the central server.
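As an illustration of this orchestration, the following minimal Python sketch (an assumption of this description rather than the patent's reference implementation; function names such as client_weighted_update are hypothetical) shows the client-side weighting of equation (2) and the server-side asynchronous update of equation (7):

```python
import numpy as np

def client_weighted_update(w_local, n_i, n_total):
    # Equation (2): the client uploads u_i = (n_i / n) * w_i instead of w_i,
    # so the switch only needs additions to aggregate.
    return (n_i / n_total) * w_local

def server_global_update(w_t, v_aggregated, n_unselected, n_total):
    # Equation (7): w_{t+1} = beta * w_t + v_{t+1}, where beta is the share
    # of data held by clients that did NOT update in this round.
    beta = n_unselected / n_total
    return beta * w_t + v_aggregated

# Example round: 3 clients with dataset sizes 100/200/300; clients 0 and 2 selected.
sizes = [100, 200, 300]
n = sum(sizes)
w_t = np.zeros(4)                                    # previous global model
locals_ = [np.full(4, i + 1.0) for i in range(3)]    # fresh local models
v = sum(client_weighted_update(locals_[i], sizes[i], n) for i in (0, 2))
w_next = server_global_update(w_t, v, n_unselected=sizes[1], n_total=n)
print(w_next)
```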
Stateless aggregation: since the message is the basic processing unit of a switch, each switch can perform the aggregation operation at the message level. Specifically, a local model update u^i_{t+1} is an array (vector) of value elements. Thus, one natural approach is for each client to pack the values of its weighted local model update u^i_{t+1} into messages in order, each carrying a fixed number of values. The sequence number (SN) assigned to each message then corresponds to the position of its values in the u^i_{t+1} array. The switch can aggregate messages with the same SN and send only one result message to the following node.
However, messages from different clients cannot arrive at the same switch simultaneously. The switch must buffer early messages and wait for the messages associated with them (i.e., messages from other clients with the same SN) to complete the aggregation operation. While memory space could be pre-allocated to retain message data for each child node, this would make the aggregation stateful, affecting data security. Moreover, this approach requires substantial memory for message buffering, which creates new problems because memory resources in a switch are limited and scarce.
To address this problem, the present invention pre-compiles switch memory into a plurality of slots, each slot holding a message with a particular SN. For ease of description, each such slot is referred to as an aggregator, as shown in fig. 2, where the switch contains 2 aggregators. Further, a CN field is incorporated in the header and in each aggregator to indicate whether the aggregation operation has completed. The CN field in the header indicates the number of child nodes connected to the aggregation switch, while the CN field in each aggregator tracks how many child nodes' packets have been aggregated so far.
When a new message arrives at the aggregation switch, it is matched to an aggregator using the index idx = pkt.SN % L, where pkt.SN denotes the message's sequence number and L denotes the number of aggregators on the switch. Each value in its payload (i.e., a portion of the local model update data) is then added to the corresponding value cached in the aggregator (if the aggregator is empty, the message payload is simply cached), and the CN value in the aggregator is incremented by 1. Then, by comparing the CN value in the aggregator with that in the message, the switch judges whether the aggregation operation is complete: if they are equal, all relevant messages from the child nodes have been aggregated, and the switch writes the aggregation result into the message and forwards it; otherwise, the message is discarded after aggregation.
It should be noted that this process does not require the aggregation switch to cache each client's data; it only temporarily caches their aggregation results. Thus, the aggregation operation is stateless. In contrast to the stateful aggregation described above, stateless aggregation avoids collecting local model updates from individual clients, thereby enhancing data privacy.
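The per-message aggregation logic above can be sketched in host-side Python as follows (a simplified model under assumptions: the plain Aggregator class stands in for a switch memory slot, and field names SN, CN, and payload follow the text):

```python
class Aggregator:
    def __init__(self):
        self.cn = 0            # how many child messages aggregated so far
        self.values = None     # partial aggregation result

def on_packet(aggregators, pkt_sn, pkt_cn, payload):
    idx = pkt_sn % len(aggregators)        # match message to an aggregator
    agg = aggregators[idx]
    if agg.values is None:                 # empty slot: just cache the payload
        agg.values = list(payload)
    else:                                  # otherwise add element-wise
        agg.values = [a + b for a, b in zip(agg.values, payload)]
    agg.cn += 1
    if agg.cn == pkt_cn:                   # all child nodes seen: forward result
        result = agg.values
        agg.cn, agg.values = 0, None       # slot can be reused
        return result                      # forward to the subsequent node
    return None                            # drop message, keep waiting

aggs = [Aggregator() for _ in range(2)]
print(on_packet(aggs, pkt_sn=0, pkt_cn=2, payload=[1.0, 2.0]))  # None (waits)
print(on_packet(aggs, pkt_sn=0, pkt_cn=2, payload=[3.0, 4.0]))  # [4.0, 6.0]
```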
Further, the concept of bandwidth-delay product (BDP) is introduced to estimate the number of aggregators required for a task. The BDP represents the maximum amount of data that can be in flight in the network at any given time and is calculated as the network bandwidth times the round-trip time (RTT). If the network bandwidth is denoted B, the RTT is denoted R, and the amount of in-flight data is denoted A, then A = B × R. On this basis, to reach network bandwidth B under RTT R, data of size A must be buffered on the aggregation switch. In particular, assuming the message length is fixed (because the number of values carried by each message is fixed) and denoted S, then L = A/S messages are buffered at the switch. Thus, the following relationship holds:
L = B × R / S (8)
Equation (8) allows estimating the memory overhead required by the aggregation switch to achieve the desired transport throughput. However, GAIN must retain a certain number of aggregation results on the aggregation switch to handle message loss, so the actual memory overhead is 2L.
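As a quick numeric check of equation (8) (the 1 Gbps bandwidth and 1024 B message size match the prototype settings described later; the 1 ms RTT is an assumed value for illustration):

```python
B = 1e9 / 8          # bandwidth in bytes/s (1 Gbps)
R = 1e-3             # RTT in seconds (assumed)
S = 1024             # message length in bytes
L = B * R / S        # equation (8)
print(round(L))      # ~122 aggregators for line-rate throughput
print(round(2 * L))  # with loss recovery the switch provisions 2L slots
```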
For switch j connected to d_j clients, GAIN can raise the transport throughput from B/d_j to B. In other words, the obtained speedup ratio can be expressed as:
B / (B / d_j) = d_j (9)
When the switch memory is sufficient, GAIN thus increases throughput to d_j times the baseline, i.e., an improvement of d_j - 1 times. This shows that in scenarios with a large number of clients, GAIN can achieve greater acceleration.
Secure packet-loss recovery: since packet loss may occur in the edge network, an efficient mechanism is needed for packet-loss recovery. The present invention uses a timeout as the indicator of message loss. Specifically, the source node (i.e., client) starts a timer after transmitting each packet (called a data message) and waits for an acknowledgement packet (called an ACK message) from the destination node (i.e., central server). To distinguish these packets, a 1-bit ACK field is included in the packet header, where pkt.ACK = 0 denotes a data packet and pkt.ACK = 1 denotes an ACK packet.
If an ACK message (with the same sequence number SN as the data message) is not received before the timer expires, the transmitted data message may have been lost and must be retransmitted to prevent interruption of the aggregation process. However, retransmissions can harm correctness, because the aggregation switch may aggregate a retransmitted message twice, yielding incorrect results. One alternative is to degrade GAIN to the traditional transmission scheme, i.e., the client retransmits the timed-out message directly to the central server for aggregation. However, this creates a security problem, because it degrades GAIN's stateless aggregation to traditional stateful aggregation, affecting data privacy. For example, the central server could intentionally discard received data messages, or withhold the corresponding ACK messages, to induce clients to retransmit data messages directly to it; the central server would then easily obtain local updates from each client, causing privacy disclosure. The present invention refers to this problem as an "ACK spoofing attack".
To avoid this situation, the invention has retransmitted data messages re-aggregated on the aggregation switch instead of forwarded directly to the central server, and proposes a secure packet-loss recovery mechanism. Specifically, a Bitmap field is added to each aggregator, and an Identifier field is added to the message header, to distinguish retransmitted data messages from normal ones. Bitmap is a bitmap of p bits (initialized to all 0s), and Identifier is an integer. With a p-bit Bitmap, the aggregation switch can identify up to p different child nodes (e.g., the field is set to 256 bits in the experiments). Each child node of the aggregation switch is assigned a unique Identifier value. When a message arrives at the aggregation switch, it is first matched to an aggregator according to its sequence number SN. The aggregation switch then judges whether the message is a retransmission according to the matched aggregator's Bitmap field: if the bit at position pkt.Identifier in the Bitmap field is 0, the aggregation switch treats the message as arriving for the first time, so it can be aggregated; the switch then sets the bit at position pkt.Identifier in the Bitmap field to 1 to record the message's arrival. Conversely, if the bit is 1, the message may be a retransmission.
For a retransmitted message, the aggregation switch cannot simply discard it, because the result message may also have been lost between the aggregation switch and subsequent nodes. In that case, if the aggregation switch discarded all retransmitted messages, the downstream node would never receive the result, causing a transmission interruption. Therefore, before discarding a retransmitted message, the aggregation switch must verify whether the CN value in the matched aggregator equals the CN value in the header. If they are equal, the aggregation switch should update the message's payload with the aggregation result and forward it to the next node rather than simply discard it. In this way, packet loss is handled effectively while "ACK spoofing attacks" are avoided.
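The retransmission check can be sketched as follows (a simplified host-side model; the dict-based aggregator and the helper names are assumptions, and the Bitmap is modeled as a Python integer rather than a fixed 256-bit register):

```python
def add_payload(agg, payload):
    agg["values"] = [x + y for x, y in zip(agg["values"], payload)]

def handle_data_packet(agg, pkt_identifier, pkt_cn, payload):
    mask = 1 << pkt_identifier
    if not agg["bitmap"] & mask:
        add_payload(agg, payload)      # first arrival from this child: aggregate
        agg["bitmap"] |= mask          # record the arrival in the Bitmap field
        agg["cn"] += 1
        return "aggregated"
    # Possible retransmission. It is never forwarded straight to the central
    # server (that would enable the ACK-spoofing attack); instead, if the slot
    # already holds the complete result, the aggregate is resent downstream.
    if agg["cn"] == pkt_cn:
        return ("forward_result", agg["values"])
    return "drop"

agg = {"bitmap": 0, "cn": 0, "values": [0.0]}
print(handle_data_packet(agg, pkt_identifier=0, pkt_cn=2, payload=[1.0]))  # aggregated
print(handle_data_packet(agg, pkt_identifier=0, pkt_cn=2, payload=[1.0]))  # drop
```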
It should be noted that GAIN can also be extended to support more general multi-level aggregation, where aggregation switches aggregate results from other aggregation switches, further reducing backbone traffic. To achieve this, an aggregation switch may update the Identifier field before forwarding the result message, ensuring that subsequent aggregation switches can still distinguish messages from each of their child nodes.
Referring to fig. 3, in one embodiment, a method for managing switch resources for edge stateless intra-network aggregation is provided, including the following processing steps S12 to S16:
S12, acquiring a batch of messages sent by different connected clients according to the maintained sliding window; each message carries a sequence number and a fixed number of weighted model update values, the sequence number corresponds to the position of the carried values in the weighted model update array, and the maximum sliding window size does not exceed the threshold on the number of messages the aggregation switch can buffer;
S14, finding the matched aggregator using the first aggregator index according to the sequence number of each message;
S16, if any aggregator reaches the aggregation forwarding condition after receiving a matched message, forwarding the aggregation result of that aggregator to a subsequent node, and initializing another, currently unmatched aggregator according to the second aggregator index.
It can be appreciated that, regarding switch memory reclamation: the limited and scarce memory of a switch prevents source nodes from injecting an unlimited number of messages into the network. To prevent switch memory overflow, a limit must be imposed on the maximum number of in-flight messages. To achieve transmission bandwidth B, the switch must buffer at least L = B × R / S messages, where S is the message length. If the number of messages in the network exceeds L (i.e., the message-count threshold), messages with different sequence numbers may be assigned to the same aggregator, producing incorrect results.
The present invention therefore has each source node maintain a sliding window covering both sent-but-unacknowledged messages and messages that may be sent in the future. The sliding window size can be adjusted dynamically to accommodate different network conditions (similar to TCP's congestion window), but its maximum cannot exceed L messages. This prevents messages with different sequence numbers from being erroneously matched to the same aggregator.
To reclaim the switch's memory resources, an aggregator can be initialized once the switch has aggregated all corresponding messages from its child nodes. However, when an ACK message is lost, some source nodes may advance their sliding windows while others retransmit timed-out messages. This condition is referred to herein as "window desynchronization" and can result in false aggregator matches.
Fig. 4 illustrates a case of window desynchronization caused by the loss of the ACK message sent to client 2. In this example, the switch has 2 aggregators and each source node maintains a sliding window of 2 messages. If no packet is lost, both client 1 and client 2 advance their windows, send messages with SN = 2 and SN = 3 respectively, and the switch aggregates them. However, if the ACK message with SN = 1 is lost on its way to client 2, client 1 advances its sliding window by 2 messages and sends messages with SN = 2 and SN = 3, while client 2 advances its window by only 1 message, sends the SN = 2 message, and then retransmits the timed-out SN = 1 message. As a result, the SN = 3 message from client 1 is aggregated with the SN = 1 message from client 2, producing an erroneous result.
In this embodiment, the problem is solved by using an aggregator pool twice the maximum sliding-window size. Specifically, when a packet with sequence number pkt.SN arrives, the aggregation switch finds the corresponding aggregator using the following first aggregator index:
idx = pkt.SN % 2L (10)
To reclaim switch memory, when an aggregator completes its aggregation, the aggregation switch computes the following index (i.e., the second aggregator index) and initializes the corresponding other aggregator:
idx = ((pkt.SN % 2L) + L) % 2L (11)
This process can be illustrated with the example in fig. 5, where client 1 and client 2 advance their sliding windows and transmit messages SN = 2 and SN = 3. According to equation (10), the SN = 2 messages are matched to the 3rd aggregator for aggregation. Since this aggregator has then received a sufficient number of messages (i.e., the aggregation forwarding condition is met), the switch forwards the result and initializes the first aggregator (the leftmost one in the aggregation switch's memory) according to equation (11). The fourth aggregator (the rightmost one) continues to wait, because it has not yet received the SN = 3 message from client 2.
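A minimal sketch of the two indices in equations (10) and (11), using the fig. 5 setting (maximum window L = 2, hence 2L = 4 aggregators); the function names are hypothetical:

```python
L = 2                              # max sliding-window size (as in FIG. 5)

def match_index(sn):
    return sn % (2 * L)            # equation (10): where the packet aggregates

def reclaim_index(sn):
    return ((sn % (2 * L)) + L) % (2 * L)   # equation (11): slot to re-initialize

# Walking through the FIG. 5 example: SN = 2 matches the 3rd slot (index 2);
# once it finishes, the switch initializes slot 0 (the leftmost aggregator),
# while SN = 3 keeps waiting in slot 3.
print(match_index(2), reclaim_index(2))   # 2 0
print(match_index(3), reclaim_index(3))   # 3 1
```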
According to the above switch resource management method for edge stateless in-network aggregation, a sliding window is maintained at each source node, covering both sent-but-unacknowledged messages and messages that may be sent in the future. The window size can be adjusted dynamically to adapt to different network conditions, but its maximum does not exceed the threshold on the number of messages the aggregation switch can buffer, which prevents switch memory overflow and keeps messages with different sequence numbers from being erroneously matched to the same aggregator. Meanwhile, an aggregator pool twice the size of the maximum sliding window, addressed through the first aggregator index, resolves the window desynchronization problem; when an aggregator completes its aggregation, the second aggregator index initializes another, currently unmatched aggregator so as to reclaim switch memory and resynchronize the windows. The aggregation process can therefore automatically return to the normal aggregation state when windows fall out of sync, packet loss is handled effectively while switch memory is reclaimed, and the memory utilization of the switch is effectively improved.
In one embodiment, the method for managing switch resources for intra-edge stateless network aggregation may further include the following processing steps:
If an aggregator has not reached the aggregation forwarding condition after receiving a matched message, it continues to wait for new messages to arrive. The message may be a first-sent message or a retransmitted message, where a retransmitted message is one with the same sequence number that the client re-sends to the aggregation switch after failing to receive the acknowledgement message with that sequence number broadcast by the central server, thereby triggering retransmission.
It will be appreciated that once the central server receives all messages with the same sequence number SN (e.g., SN = 2), it broadcasts to all source nodes an ACK message carrying that SN. If both clients receive this ACK message (SN = 2), their sliding windows each move forward by one message. However, considering the example in fig. 4, if client 2 does not receive the ACK message with SN = 1, it retransmits the timed-out message (SN = 1) and sends a new message with SN = 2, while client 1 sends new messages with SN = 2 and SN = 3. According to equation (10), the SN = 1 message from client 2 is matched to the second aggregator. Since that aggregator has already completed aggregation (i.e., aggregator.SN = pkt.SN), the switch forwards the aggregation result to the downstream node, and the central server eventually returns the ACK message with SN = 1 again. Meanwhile, client 1's SN = 2 and SN = 3 messages are matched to the third and fourth aggregators, respectively. The third aggregator receives the SN = 2 messages from both clients, so the switch forwards the aggregated result once aggregation completes. The fourth aggregator should continue waiting for new messages to arrive, because its SN value does not equal the message's SN value.
After receiving the ACK message with SN = 1, client 2 advances its sliding window and sends the SN = 3 message, resynchronizing the windows. In this way, the aggregation process returns to its normal state. Otherwise, if client 2 still receives no ACK message, retransmission is triggered again and the above procedure repeats. Packet loss is thus handled effectively while switch memory is reclaimed.
In one embodiment, when obtaining a batch of messages sent by different connected clients according to the maintained sliding window, the method further comprises the following processing steps:
searching an aggregator table according to the task identifier carried in the message header, and determining the starting position, in the aggregator table, of the training task corresponding to the task identifier; the task identifier is used to specify the training task of the federated learning task.
It can be appreciated that, for runtime multi-task management: to support multiple tasks at runtime, the present invention proposes pre-compiling a set of aggregators, called an aggregator table, on each programmable switch, and dynamically allocating these aggregator resources among different federated learning tasks. Specifically, a task identifier (TaskID) in the header specifies the training task. When a message arrives at the aggregation switch, the switch looks up the aggregator table according to the message's TaskID and obtains the corresponding Offset, i.e., the starting position of the task in the aggregator table. The aggregation switch then aggregates the message into an aggregator whose index is:
idx = Offset + pkt.SN % 2L (12)
where pkt.SN denotes the sequence number of the arriving message and L is determined for the task by equation (8). As described above, if an aggregator completes its aggregation, the aggregation switch should initialize another corresponding aggregator; equation (11) can accordingly be rewritten as:
idx = Offset + ((pkt.SN % 2L) + L) % 2L (13)
The initiator of a federated learning task obtains a TaskID by applying for the GAIN service from the network provider, which determines each task's transmission bandwidth and assigns aggregators (the actual number being twice the value given by equation (8)) to accelerate it. After a task completes, the network provider can reclaim the assigned aggregators at runtime based on each task's TaskID and Offset.
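The table lookup of equations (12) and (13) can be sketched as follows (the TaskID values, offsets, and per-task window sizes are hypothetical; the text only specifies the index arithmetic and that each task occupies 2L contiguous slots starting at its Offset):

```python
aggregator_table = {                 # TaskID -> (Offset, L); entries assumed
    7: (0,  24),                     # task 7: slots [0, 48)
    9: (48, 12),                     # task 9: slots [48, 72)
}

def match_index(task_id, sn):
    offset, L = aggregator_table[task_id]
    return offset + sn % (2 * L)                       # equation (12)

def reclaim_index(task_id, sn):
    offset, L = aggregator_table[task_id]
    return offset + ((sn % (2 * L)) + L) % (2 * L)     # equation (13)

print(match_index(9, sn=5), reclaim_index(9, sn=5))    # 53 65
```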
In one embodiment, in the process of forwarding the aggregation result corresponding to any aggregator to the subsequent node, the method may further include the following processing steps:
When the egress port of the aggregation switch is congested, the congestion field of the message carrying the aggregation result is set to 1; the congestion field indicates to the client that the egress port of the aggregation switch is congested and instructs the client to reduce its sliding window.
In addition to the above packet-processing logic of GAIN on the switches and terminal (client) hosts, a 1-bit congestion field (i.e., an ECN field) is added to the header to indicate congestion. When forwarding a message, if the switch's egress port is congested, the message's ECN bit is set to 1; otherwise it keeps its default value of 0, indicating that the egress port is not congested. For the client, if several messages with consecutive sequence numbers time out, or ACK messages with ECN = 1 are received, the network is considered congested and the sliding window is reduced; otherwise the window size is increased. Furthermore, the client must ensure that the maximum window size does not exceed L.
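A minimal sketch of the client-side window adjustment just described (the halving and +1 increments are assumptions in the spirit of TCP-like control; the text only states that the window shrinks on congestion, grows otherwise, and is capped at L):

```python
def adjust_window(window, L, ecn_seen, timeouts_seen):
    if ecn_seen or timeouts_seen:
        return max(1, window // 2)     # congestion: shrink the sliding window
    return min(L, window + 1)          # otherwise grow, capped at L

w = 8
w = adjust_window(w, L=24, ecn_seen=True, timeouts_seen=False)   # -> 4
w = adjust_window(w, L=24, ecn_seen=False, timeouts_seen=False)  # -> 5
print(w)
```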
In one embodiment, to illustrate more intuitively the performance and effect of the above switch resource management method for edge stateless in-network aggregation, an experimental example is presented here. Those skilled in the art will appreciate that this experimental example is illustrative only and does not limit implementations of the above switch resource management method.
The experiments on GAIN were conducted on an FPGA-based test platform. Since GAIN performs stateless aggregation without storing private data, data privacy is ensured; the evaluation therefore focuses on GAIN's communication performance.
Experiment setup: the GAIN prototype was implemented on the FPGA-based test platform. The server is equipped with 2 CPUs, 128 GB of RAM, a 500 GB SSD, 1 GPU, two 4-port network cards, and three 4-port FPGA devices. Eight virtual hosts were created on the server, one acting as the central server and the others as federated learning clients. Each virtual host is bound to one port of a physical network card. Notably, considering the computational resource limitations of edge clients, these virtual clients are configured by default to use only 4 CPU cores for model training.
The switch processing logic of GAIN is implemented on the FPGA devices, and the virtual hosts and FPGA switches are connected by 1 Gbps links. The experimental topology is shown in fig. 6. Because current commercial programmable switches lack support for floating-point computation, an existing scaling method is adopted to enable the switch to aggregate floating-point numbers. Specifically, before aggregation, each floating-point number to be sent is multiplied by an amplification factor and converted to a 32-bit integer. Each receiver then divides the aggregated data by the same amplification factor, restoring floating-point values. This allows the switch to perform data aggregation using integer arithmetic only. This example implements the GAIN prototype on the PyTorch framework and sets GAIN's message payload size to 1024 B. On this test platform, with 24 aggregators (consuming about 25 KB of switch memory), GAIN achieves an aggregate throughput at the 1 Gbps line rate.
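The scaling method can be illustrated as follows (a sketch; the amplification factor 2**16 is an assumed choice, not specified in the text):

```python
SCALE = 2 ** 16                     # assumed amplification factor

def to_wire(x):                     # sender side, before transmission
    return int(round(x * SCALE))    # typical gradient values fit in int32

def from_wire(total):               # receiver side, after aggregation
    return total / SCALE

a, b = 0.1234, -0.5678
aggregated = to_wire(a) + to_wire(b)   # what the switch computes (integer add)
print(from_wire(aggregated))           # ~ -0.4444, close to a + b
```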
Experiments used a variety of representative deep learning models, including AlexNet, three ResNet variants, MobileNetV3-S, MobileNetV3-L, VGG11, VGG13, and VGG16. The sizes of these models range from approximately 11 MB (MobileNetV3-S) to 528 MB (VGG16). Model training uses the CIFAR dataset, with the data pre-distributed to the clients; the clients then perform the federated learning task under the coordination of the central server. Unless otherwise noted, experiments use a ResNet model as the example. The evaluation metrics include training accuracy over time, traffic overhead, and training throughput.
GAIN was compared with the following methods. Conv: the conventional model-update method based on the existing Gloo communication library, in which each client transmits its local model update to the central server, which aggregates the updates and returns the global model; by default, all clients update the global model after each round of training. Conv-Quant: update data are quantized to float16, reducing the amount of communicated data; this method is also implemented on the Gloo communication library, and by default all clients participate in the model update after each training round. Defer: the global model update is deferred, i.e., performed only after a number of batches of local computation; in the experiments, a deferral of 10 batches, denoted Defer10, is used by default. Select: a subset of clients is selected for the model update; by default, 5 clients are selected for the global model update in each round, denoted Select5.
Experimental results. Training accuracy over time: four representative models were used to evaluate the training accuracy of the various methods over time, namely AlexNet, ResNet, MobileNetV3-S, and VGG11. To ensure comparability, GAIN lets all clients participate in the global model update after each batch of computation, as in the Conv and Conv-Quant methods. The Defer and Select methods are also compared; since these two can be combined with the above three methods, only Conv-Defer10 and Conv-Select5 are considered here (both transmit data using the Conv method). The experimental results are shown in fig. 7; the different methods share the same training settings, such as batch size and learning rate, in each group of experiments.
Among all compared methods, GAIN converges fastest, followed by Conv-Quant. In contrast, Conv requires longer to reach the same accuracy because of its significant data-transmission delay. For ease of comparison, 70% accuracy was set as the target and the training time of each method evaluated. Owing to its much faster transmission, GAIN reaches the target accuracy sooner than the other methods: compared with Conv, GAIN reduces training time by 54.5%, 72.6%, 49.7%, and 78.7% on the 4 models, respectively; compared with Conv-Quant, GAIN still reduces training time by 33.8%, 53.1%, 14.9%, and 62.3%, respectively. This is mainly because GAIN aggregates local updates at switches on the path, effectively alleviating congestion and improving communication performance.
Furthermore, GAIN shows the most significant improvement on VGG11, because this model is communication-intensive: GAIN reduces its training time from 11.33 hours under Conv to about 2.42 hours. GAIN retains an advantage even for the lightweight MobileNetV3-S model, which requires less traffic, reducing its training time from 147 minutes under Conv to around 74 minutes.
Communication overhead: besides shortening training time, another significant advantage of GAIN is reduced traffic overhead at the network and the central server (CS). Fig. 8(a) shows the traffic received by the central server at a target accuracy of 70%. Compared with Conv, GAIN reduces the central server's traffic by 83.8%-86.5%; compared with Conv-Quant, GAIN reduces it by 70.1%-75.3%. Notably, the advantage of GAIN becomes more pronounced as the number of participating clients increases, since the switch can then aggregate more client traffic and achieve a larger input-to-output traffic ratio.
Deferring communication and selecting a subset of clients: fig. 8(b) shows the training throughput of ResNet34 when clients update the global model after locally computing different numbers of batches. As the number of batches increases, all methods show increased training throughput, since communication is the main bottleneck in model training. Notably, the throughput of Conv and Conv-Quant increases significantly with the number of batches, because they ordinarily face longer communication times. In contrast, the training throughput of GAIN increases only slightly with the batch count, since GAIN already reduces communication time effectively, so the marginal benefit of deferring communication over more rounds diminishes.
Fig. 8(c) shows the relationship between throughput and the number of clients selected per round of global model update. The throughput of both Conv and Conv-Quant drops significantly as the number of participating clients increases, because the more clients are involved, the more severe the network congestion and the worse the transmission performance. In contrast, GAIN uses the switch to aggregate the related traffic, effectively relieving congestion at the switch egress port and improving communication performance. Specifically, with 7 clients participating per round, GAIN achieves throughput improvements of 1.28x and 2.90x compared with Conv and Conv-Quant, respectively.
Fig. 8(d) evaluates the validity of GAIN's global model update scheme. Compared with the conventional update scheme (using equation (1)), GAIN demonstrates the same or even better model convergence under the same experimental settings. For example, after 3000 training rounds with 2 clients randomly selected for the global model update per round, GAIN achieves 76.64% accuracy (GAIN-Select2), whereas the conventional method (Conv-Select2) achieves only 68.02%.
Training throughput under different models: since the Defer and Select methods can be combined with GAIN, Conv, and Conv-Quant, the following experiments focus on the latter three methods. Fig. 9 shows the training throughput of these three methods on different training models. GAIN exhibits significant acceleration on communication-intensive models, especially VGG11, VGG13, and VGG16, achieving throughput increases of more than 4.11x over Conv and more than 1.75x over Conv-Quant. On lower-traffic lightweight models such as MobileNetV3-S and MobileNetV3-L, GAIN achieves 82.4% and 82.34% throughput improvements over Conv, respectively, while still showing about a 10% improvement over Conv-Quant.
Aggregators and packet loss rate: taking the ResNet training task as an example, the relationship between the number of allocated aggregators and the training throughput was evaluated. As shown in fig. 10(a), increasing the number of aggregators increases training throughput, because more aggregators achieve higher aggregation throughput, as analyzed by equation (8). In a configuration with 24 aggregators, GAIN-CPU (training on the CPU) achieves a peak throughput of approximately 38.8 images per second. Further increasing the number of aggregators yields no additional gain, since the aggregation capacity of GAIN has already reached the line rate. A similar trend is observed when clients train on GPUs (GAIN-GPU), but the peak throughput rises further to about 44.5 images per second, because stronger client computing power increases the overlap between communication and computation, thereby raising bandwidth utilization.
The effect of packet loss on training throughput is shown in fig. 10(b). To simulate network packet loss, received messages are discarded at the network card of each node with a given probability. When the packet loss rate is below 0.01%, the throughput of GAIN is only slightly affected, dropping from 37.9 images/second to 35.2 images/second. When the packet loss rate is large (above 0.1%), frequent window resynchronization leads to a more noticeable throughput decrease. Nevertheless, even at 1% packet loss, the throughput of GAIN is still 107.6% and 23.4% higher than that of the Conv and Conv-Quant methods, respectively.
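The loss-injection step may be sketched as follows; the receive-loop wrapper is an illustrative assumption about where the drop is applied, and in practice a standard tool such as Linux tc netem with a probabilistic loss parameter can achieve a similar effect:

```python
# A minimal sketch of loss injection: each node drops incoming messages with
# a fixed probability. The wrapper below is an illustrative assumption.
import random

def lossy_receive(recv_fn, loss_rate: float):
    """Wrap a blocking receive function so that each message is discarded
    with probability `loss_rate`, emulating network packet loss."""
    def wrapped():
        while True:
            msg = recv_fn()
            if random.random() >= loss_rate:
                return msg        # message delivered to the upper layer
            # else: silently dropped; the sender's retransmission recovers it
    return wrapped

# Usage: a receiver that loses 1% of messages.
recv_1pct_loss = lossy_receive(lambda: b"update", loss_rate=0.01)
```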
Multitasking: performance under multitasking is shown in fig. 11, where task 1 trains the ResNet model and task 2 trains the MobileNetV3-S model. Each task ends after a fixed number of rounds and is trained with 2 CPU cores (each client has 4 CPU cores). As shown in fig. 11(a), which plots accuracy over training time, GAIN still achieves a significant performance improvement when multiple tasks run simultaneously. For task 1, Conv-Quant and Conv require 184 min and 338 min to complete, respectively, while GAIN requires only 101 min, reducing training time by 45.1% and 70.1%. Likewise, the training time of task 2 drops from 187 minutes under Conv-Quant and 328 minutes under Conv to only 138 minutes.
Fig. 11(b) illustrates the variation of GAIN's training throughput under dynamic task joining and exiting. Initially, 48 aggregators were compiled on the switch (the FPGA can deploy at least 512 aggregators), and 24 aggregators were allocated to task 1 to achieve peak aggregation performance. At 80 seconds, task 2 was allocated another 24 aggregators at runtime, and it ended at 176 seconds. After task 2 joined, the throughput of task 1 dropped from 29 images/second to about 18 images/second; after task 2 exited, task 1's throughput recovered to its original level. This procedure demonstrates that GAIN can quickly adapt to multitasking and be deployed at runtime for training tasks, and that GAIN's aggregators can be dynamically assigned, providing flexible rate allocation for different training tasks.
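The runtime allocation of aggregators to tasks may be sketched as follows; the pool size, the AggregatorPool interface, and all names are illustrative assumptions, not an interface defined herein:

```python
# A hedged sketch of runtime aggregator allocation across tasks; the pool
# size, the AggregatorPool interface, and all names are assumptions.
class AggregatorPool:
    def __init__(self, total: int = 48):
        self.free = set(range(total))              # aggregator slots compiled on the switch
        self.assigned: dict[str, list[int]] = {}   # task id -> allocated slots

    def allocate(self, task_id: str, count: int) -> list[int]:
        if count > len(self.free):
            raise RuntimeError("not enough free aggregators")
        slots = sorted(self.free)[:count]          # hand out the lowest free slots
        self.free -= set(slots)
        self.assigned[task_id] = slots
        return slots

    def release(self, task_id: str) -> None:
        self.free |= set(self.assigned.pop(task_id, []))  # reclaim on task exit

pool = AggregatorPool(48)
pool.allocate("task1", 24)   # task 1 gets peak aggregation capacity
pool.allocate("task2", 24)   # task 2 joins at runtime (e.g., at 80 s)
pool.release("task2")        # slots recovered when task 2 ends
```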
In summary, a secure aggregation acceleration service, GAIN, for edge federated learning is presented herein. GAIN uses a programmable switch to aggregate model updates from numerous clients in a stateless manner, improving transmission speed while preserving data privacy. An effective aggregation mechanism is designed that ensures the security and robustness of GAIN and realizes efficient use of switch memory together with multi-task runtime management. To evaluate its performance, a prototype of GAIN was implemented on an FPGA-based test platform; experimental results show that GAIN can increase training throughput by up to 4.11x and reduce traffic by up to 86.5%. In addition, theoretical analysis shows that as the number of clients increases, GAIN can obtain an even greater performance improvement while guaranteeing data privacy.
It should be understood that, although the steps in the flowchart of fig. 3 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least a portion of the steps in fig. 3 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and whose execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
Referring to fig. 12, in one embodiment, a switch resource management system 100 for aggregation in an edge stateless network is provided, including a message acquisition module 11, an index matching module 13, and a forwarding initiation module 15. The message acquisition module 11 is configured to acquire a batch of messages sent by the different connected clients according to the maintained sliding window; each message carries a sequence number and a fixed number of weighted model update values, the sequence number corresponds to the position of the weighted model update values in the message's array, and the maximum value of the sliding window does not exceed the threshold number of messages buffered by the aggregation switch. The index matching module 13 is configured to find the matching aggregator according to the sequence number of each message using the first index of the aggregators. The forwarding initiation module 15 is configured to, when the aggregation forwarding condition has been reached after any aggregator receives its matched messages, forward the aggregation result of that aggregator to the subsequent node and initialize another aggregator that is not currently matched according to the second index of the aggregators.
The switch resource management system 100 for aggregation in an edge stateless network maintains a sliding window at each source node, covering the messages that have been sent but not acknowledged and the messages that may be sent in the future. The size of the sliding window can be dynamically adjusted to suit different network conditions, but its maximum value does not exceed the threshold number of messages buffered by the aggregation switch, preventing switch memory overflow and avoiding messages with different sequence numbers being wrongly matched to the same aggregator. Meanwhile, window desynchronization is handled by using, via the first index of the aggregators, an aggregator pool twice the size of the maximum sliding window; when an aggregator completes aggregation, the second index of the aggregators initializes another aggregator that cannot currently be matched, reclaiming switch memory and resynchronizing the windows. The aggregation process can thus automatically return to the normal aggregation state even when the windows become desynchronized, effectively tolerating packet loss while reclaiming switch memory and effectively improving the memory utilization of the switch.
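The double-index mechanism may be sketched as follows; the index arithmetic (first index: seq mod 2W; second index: (seq + W) mod 2W) and the use of "all clients' contributions received" as the forwarding condition are illustrative assumptions made for exposition, and the embodiment may define them differently:

```python
# A hedged sketch of the double-index mechanism. W, the index formulas, and
# the forwarding condition "all n_clients contributions received" are
# illustrative assumptions, not formulas fixed by this embodiment.
W = 4                       # maximum sliding-window size (illustrative)
POOL = 2 * W                # aggregator pool is twice the maximum window
aggregators = [{"seq": None, "sum": 0, "count": 0} for _ in range(POOL)]

def on_packet(seq: int, value: int, n_clients: int):
    slot = aggregators[seq % POOL]             # first index: match by sequence number
    if slot["seq"] != seq:                     # slot not yet bound to this number
        slot.update(seq=seq, sum=0, count=0)   # bind it and start a fresh sum
    slot["sum"] += value
    slot["count"] += 1
    if slot["count"] == n_clients:             # aggregation-forwarding condition met
        other = aggregators[(seq + W) % POOL]  # second index: the slot that cannot
        other.update(seq=None, sum=0, count=0) # be matched now; reset it to reclaim
        return slot["sum"]                     # memory, then forward the result
    return None
```

Because at most W consecutive sequence numbers are in flight, mapping them into 2W slots keeps a completed sequence number and the slot W positions ahead from contending for the same aggregator, which is what allows the reset in the final step to resynchronize the windows.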
In one embodiment, the forwarding initiation module 15 is further configured to continue waiting for new messages to arrive when the aggregation forwarding condition has not been reached after the aggregator receives a matched message. The message may be a first-sent message or a retransmitted message, where a retransmitted message is a message with the same sequence number that the client retransmits to the aggregation switch after failing to receive the acknowledgment message with that sequence number broadcast by the central server, which triggers the retransmission.
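The client-side retransmission behavior may be sketched as follows; the timeout value and the function signatures are illustrative assumptions:

```python
# A hedged sketch of the client's retransmission rule: if no acknowledgment
# for sequence number `seq` is broadcast by the central server before a
# timeout, the client resends the same message. Timeout and APIs are assumed.
import time

def send_with_retransmit(send_fn, acked: set, seq: int, payload, timeout: float = 0.2):
    """Send `payload` with sequence number `seq`; retransmit whenever the
    acknowledgment has not arrived in time."""
    send_fn(seq, payload)                     # first transmission
    deadline = time.monotonic() + timeout
    while seq not in acked:                   # `acked` is filled by an ACK listener
        if time.monotonic() >= deadline:
            send_fn(seq, payload)             # resend the same sequence number
            deadline = time.monotonic() + timeout
        time.sleep(0.01)

# Usage stub: an ACK that arrives immediately, so no retransmission occurs.
acks = set()
send_with_retransmit(lambda s, p: acks.add(s), acks, seq=7, payload=b"...")
```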
In one embodiment, the message acquisition module 11 is further configured to search the aggregator table according to the task identifier carried in the header of the message and determine the starting position, in the aggregator table, of the training task corresponding to the task identifier; the task identifier is used to specify the training task of the federated learning task.
In one embodiment, the forwarding initiation module 15 is further configured to set the value of the congestion field of the message carrying the aggregation result to 1 when the egress port of the aggregation switch is congested while forwarding the aggregation result of any aggregator to the subsequent node; the congestion field indicates to the client that the egress port of the aggregation switch is congested and instructs the client to reduce its sliding window.
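The client's reaction to the congestion field may be sketched as follows; halving on congestion and additive increase otherwise are assumptions made for illustration, since the embodiment only specifies that the window is reduced:

```python
# A minimal sketch of congestion signaling on the client side. The halving
# and additive-increase policy is an assumption; the embodiment only states
# that the sliding window is reduced when the congestion field is 1.
MAX_WINDOW = 8   # must not exceed the switch's buffered-message threshold

def adjust_window(message: dict, window: int) -> int:
    if message.get("congestion", 0) == 1:    # switch egress port is congested
        return max(1, window // 2)           # shrink the sliding window
    return min(MAX_WINDOW, window + 1)       # otherwise probe upward cautiously
```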
For specific limitations of the switch resource management system 100 for edge stateless intra-network aggregation, reference may be made to the corresponding limitations of the switch resource management method for edge stateless intra-network aggregation above, which will not be repeated here. The various modules in the above switch resource management system 100 may be implemented in whole or in part by software, hardware, or combinations thereof. The modules may be embedded in hardware in, or independent of, a device with data processing capability, or stored in software in the memory of the device, so that the processor can invoke and execute the operations corresponding to the modules; the device may be, but is not limited to, any of the model training devices existing in the art.
In one embodiment, there is further provided a computer device, including a memory and a processor, where the memory stores a computer program, and the processor executes the computer program to implement the processing steps of the method for managing switch resources in an edge stateless network in any of the above embodiments.
It will be appreciated that the above-mentioned computer device may include other software and hardware components not listed in the specification besides the above-mentioned memory and processor, and may be specifically determined according to the model of the specific training device in different application scenarios, which will not be listed in detail in the specification.
In one embodiment, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the processing steps of the method for intra-edge stateless intra-network aggregation switch resource management in any of the embodiments described above.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program, which may be stored on a non-volatile computer-readable storage medium and which, when executed, may include the flows of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus DRAM (RDRAM), and Direct Rambus DRAM (DRDRAM), among others.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it is possible for those skilled in the art to make several variations and modifications without departing from the spirit of the present application, which fall within the protection scope of the present application. The scope of the application is therefore intended to be covered by the appended claims.

Claims (10)

1. A method for intra-edge stateless intra-network aggregation switch resource management, comprising the steps of:
acquiring a batch of messages sent by different connected clients according to the maintained sliding window; each message carries a sequence number and a fixed number of weighted model update values, the sequence number corresponds to the position of the weighted model update values in the array of the message, and the maximum value of the sliding window does not exceed the threshold number of messages buffered by the aggregation switch;
finding a matched aggregator using the first index of the aggregators according to the sequence number of each message;
If any aggregator receives the matched message and then reaches an aggregation forwarding condition, forwarding an aggregation result corresponding to any aggregator to a subsequent node, and initializing another aggregator which is not matched currently according to a second index of the aggregator.
2. The method for management of switch resources for intra-edge stateless network aggregation of claim 1, further comprising the steps of:
If the aggregator has not reached the aggregation forwarding condition after receiving the matched message, continuing to wait for new messages to arrive; the message comprises a first-sent message or a retransmitted message, wherein the retransmitted message is a message with the same sequence number that the client retransmits to the aggregation switch after failing to receive the acknowledgment message with that sequence number broadcast by the central server, which triggers the retransmission.
3. The method for managing switch resources in an edge stateless network according to claim 1 or 2, wherein when obtaining a batch of messages sent by different clients connected according to a maintained sliding window, further comprising the steps of:
searching an aggregator table according to a task identifier carried in the message header of the message, and determining the starting position, in the aggregator table, of the training task corresponding to the task identifier; the task identifier is used for specifying the training task of the federated learning task.
4. A method for managing switch resources for aggregation in an edge stateless network according to claim 3, wherein forwarding the aggregation result corresponding to any one of the aggregators to a subsequent node further comprises the steps of:
when the egress port of the aggregation switch is congested, setting the value of the congestion field of the message carrying the aggregation result to 1; the congestion field is used to indicate to the client that the egress port of the aggregation switch is congested and to instruct the client to reduce the sliding window.
5. A switch resource management system for edge stateless intra-network aggregation, comprising:
The message acquisition module is used for acquiring a batch of messages sent by different connected clients according to the maintained sliding window; each message carries a sequence number and a fixed number of weighted model update values, the sequence number corresponds to the position of the weighted model update values in the array of the message, and the maximum value of the sliding window does not exceed the threshold number of messages buffered by the aggregation switch;
The index matching module is used for finding a matched aggregator by using a first index of the aggregator according to the sequence number of each message;
And the forwarding initial module is used for forwarding an aggregation result corresponding to any one of the aggregators to a subsequent node when the aggregation forwarding condition is reached after any one of the aggregators receives the matched message, and initializing another aggregator which is not matched currently according to a second index of the aggregator.
6. The system for switch resource management for edge stateless in-network aggregation of claim 5, wherein the forwarding initiation module is further configured to continue waiting for new messages to arrive when the aggregation forwarding condition has not been reached after the aggregator receives a matched message; the message comprises a first-sent message or a retransmitted message, wherein the retransmitted message is a message with the same sequence number that the client retransmits to the aggregation switch after failing to receive the acknowledgment message with that sequence number broadcast by the central server, which triggers the retransmission.
7. The switch resource management system for edge stateless intra-network aggregation according to claim 5 or 6, wherein the message acquisition module is further configured to search an aggregator table according to a task identifier carried in the header of the message, and determine the starting position, in the aggregator table, of the training task corresponding to the task identifier; the task identifier is used for specifying the training task of the federated learning task.
8. The system for managing switch resources for aggregation in an edge stateless network according to claim 7, wherein the forwarding initiation module is further configured to set the value of the congestion field of the message carrying the aggregation result to 1 when the egress port of the aggregation switch is congested while forwarding the aggregation result corresponding to any one of the aggregators to a subsequent node; the congestion field is used to indicate to the client that the egress port of the aggregation switch is congested and to instruct the client to reduce the sliding window.
9. A computer device comprising a memory and a processor, characterized in that the memory stores a computer program, the processor implementing the steps of the method for intra-edge stateless intra-network aggregation switch resource management of any one of claims 1 to 4 when the computer program is executed.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the steps of the method for switch resource management in an edge stateless network of any of claims 1 to 4.