CN115714937A - All-optical switching distributed reinforcement learning system and method based on array waveguide grating - Google Patents
Publication number: CN115714937A · Application: CN202211372521.0A · Legal status: Pending (CN)
Description
Technical Field
The present invention relates to the field of optical switching technology, and in particular to an arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning system and method.
Background
With the rapid development and application of big data across many fields, machine learning has become an important tool for processing it. The core idea of machine learning is to train a model to fit input training data, deploy the trained model in the corresponding application, and expect it to classify or predict accurately on the data generated while the application runs. Today's machine learning applications often require complex models trained on very large datasets. Single-machine deployment can no longer meet the demands of such large-scale training, so distributed machine learning emerged: multiple computing nodes train collaboratively in a distributed deployment, greatly increasing training speed. However, in the face of massive data, transmitting more data per unit of time raises the communication frequency between distributed nodes; if some computing nodes fail to receive data because of network congestion, the system as a whole cannot enter the next iteration on time, and the bottleneck for training-task completion shifts from computation to network communication.
Reinforcement learning is one of the paradigms and methodologies of machine learning; it describes and solves the problem of an agent learning a policy, through interaction with its environment, that maximizes cumulative reward or achieves a specific goal. Compared with the algorithms commonly used in distributed machine learning, such as deep convolutional neural networks, recurrent neural networks, and deep graph convolutional networks, distributed reinforcement learning training performs far more iterations, each with a smaller gradient aggregation. A typical reinforcement learning run produces 150,000 to 20 million iterations, so the gradient-communication latency of each iteration is a key factor in the performance of distributed reinforcement learning training.
One existing approach is a switch architecture with a centralized parameter server: the parameter server aggregates the gradients of the worker servers through an electrical switch to update the weights. This limits the scalability of the distributed reinforcement learning system; moreover, given iterations on the order of tens of millions, the bandwidth-limited and power-hungry electrical switching adds latency of hundreds of microseconds to ten milliseconds. Another approach is an All-reduce-based architecture: each worker server divides its task into N subtasks, and the subtasks of the same index on each worker aggregate gradients cyclically in turn. Although this gradient aggregation is decentralized, the number of network hops per communication round grows linearly with network size, again producing millisecond-level latency.
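The linear growth described for the All-reduce approach can be made concrete with a small simulation. The following is an illustrative sketch of ring all-reduce (our own code, not part of the patent): with N workers, a full aggregation takes N-1 reduce-scatter steps plus N-1 all-gather steps, i.e. 2(N-1) neighbor-to-neighbor rounds, so communication latency grows linearly with network size.

```python
# Illustrative ring all-reduce over n workers (not the patent's code).
# Each worker's gradient is split into n chunks; in every round, worker i
# passes one chunk to worker (i + 1) % n. The step count 2(n - 1) is the
# linear-in-n cost the background section criticizes.

def ring_allreduce(grads):
    """grads: list of n equal-length gradient lists; returns (summed, steps)."""
    n = len(grads)
    c = len(grads[0]) // n                      # chunk size (assume divisible)
    chunks = [[list(g[j * c:(j + 1) * c]) for j in range(n)] for g in grads]
    steps = 0
    for s in range(n - 1):                      # reduce-scatter phase
        for i in range(n):
            j = (i - s) % n                     # chunk worker i forwards
            dst = (i + 1) % n
            chunks[dst][j] = [a + b for a, b in zip(chunks[dst][j], chunks[i][j])]
        steps += 1
    for s in range(n - 1):                      # all-gather phase
        for i in range(n):
            j = (i + 1 - s) % n                 # fully reduced chunk to pass on
            chunks[(i + 1) % n][j] = list(chunks[i][j])
        steps += 1
    return [x for ch in chunks[0] for x in ch], steps
```

After the two phases every worker holds the full elementwise sum; the returned step count shows the 2(N-1) scaling.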
Summary of the Invention
In view of this, embodiments of the present invention provide an arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning system and method, so as to eliminate or mitigate one or more defects of the prior art, and in particular to solve the prior art's limits on the scalability of distributed reinforcement learning systems and the high latency of model-training parameter iterations.
In one aspect, the present invention provides an arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning system, comprising:
A plurality of clusters, each comprising a parameter-server rack, an intra-cluster arrayed waveguide grating router, and a plurality of worker-server racks. The parameter-server rack comprises a first top-of-rack switch and a plurality of parameter servers, the network port of each parameter server being connected to the first top-of-rack switch; each worker-server rack comprises a second top-of-rack switch and a plurality of worker servers, the network port of each worker server being connected to that second top-of-rack switch. The intra-cluster arrayed waveguide grating router is interconnected with the first top-of-rack switch and each second top-of-rack switch in a fully connected manner. Within each cluster, the parameter servers and worker servers communicate through the first top-of-rack switch, the second top-of-rack switches, and the intra-cluster arrayed waveguide grating router;
An inter-cluster arrayed waveguide grating router, interconnected in a fully connected manner with the first top-of-rack switch of the parameter-server rack in each cluster; the parameter servers of the clusters communicate with one another through their respective first top-of-rack switches and the inter-cluster arrayed waveguide grating router.
Within each cluster, the parameter servers in the parameter-server rack distribute the parameters of a preset reinforcement learning model to the worker servers in the worker-server racks according to different subtasks; the worker servers in each worker-server rack feed back the parameters obtained by training the preset reinforcement learning model on the corresponding subtasks to the parameter servers, which perform gradient aggregation. Between clusters, the parameter servers in each parameter-server rack exchange the post-aggregation parameters of the preset reinforcement learning model through the inter-cluster arrayed waveguide grating router.
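As a rough sizing sketch (our own illustration; the patent does not state these formulas), the port counts of the two AWGR tiers follow directly from the topology just described: the intra-cluster router needs one port per worker-rack ToR plus one for the parameter-server-rack ToR, the inter-cluster router needs one port per cluster, and an N-port AWGR carrying N wavelengths offers N² simultaneous channels.

```python
# Back-of-the-envelope AWGR sizing for the two-level topology (assumption:
# one ToR uplink per rack into each AWGR tier, as the full-mesh wording suggests).

def awgr_sizes(clusters, worker_racks_per_cluster):
    """Port and channel counts for the intra- and inter-cluster AWGR tiers."""
    intra = worker_racks_per_cluster + 1   # m worker-rack ToRs + 1 PS-rack ToR
    inter = clusters                       # one first-ToR uplink per cluster
    # An N x N AWGR with N wavelengths provides N * N simultaneous channels.
    return {"intra_ports": intra, "intra_channels": intra * intra,
            "inter_ports": inter, "inter_channels": inter * inter}
```

With 4 worker racks per cluster and 5 clusters, this reproduces the 5×5-port, 25-channel device of the embodiments (Fig. 3).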
In some embodiments of the present invention, the intra-cluster and inter-cluster arrayed waveguide grating routers use wavelength division multiplexing to establish multi-channel communication links.
In some embodiments of the present invention, both the intra-cluster and inter-cluster arrayed waveguide grating routers route optical signals to their corresponding output ports by cyclic wavelength routing.
In some embodiments of the present invention, the first and second top-of-rack switches each comprise a switching module, a plurality of receiving modules, and a plurality of sending modules. The switching module comprises a packet processor, a scheduler, a broadcast module, and a selector, and it also builds and maintains a flow table mapping LAN addresses to sending ports.
In some embodiments of the present invention, the packet processor determines the destination of a packet to be forwarded from the packet's header.
When the packet's destination lies within the switch's own rack, the selector forwards the packet directly to the corresponding server in that rack.
When the packet's destination lies in another rack, the scheduler extracts the LAN address from the packet header and queries the flow table for the sending port corresponding to the destination; the selector then forwards the packet to that sending port.
In some embodiments of the present invention, forwarding the packet to the obtained sending port further comprises:
when the destination and the packet are within the same cluster, forwarding the packet to the corresponding destination server via the intra-cluster arrayed waveguide grating router;
when the destination and the packet are in different clusters, forwarding the packet to the corresponding destination server via the inter-cluster arrayed waveguide grating router.
In some embodiments of the present invention, extracting the LAN address from the packet header and querying the flow table for the sending port further comprises:
when the flow table contains no entry for the LAN address and its corresponding sending port, flooding the packet and raising an alarm.
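The forwarding behaviour in the claims above can be sketched as follows. This is our own modelling, not an implementation from the patent: the field names (`dst_rack`, `lan_addr`) and return conventions are assumptions, while the decision sequence (in-rack delivery, flow-table lookup, flood-and-alarm on a miss) follows the text.

```python
# Hedged sketch of the ToR switching-module decision logic.
# packet: dict with hypothetical fields "dst_rack", "dst_server", "lan_addr".

def forward(packet, local_rack, flow_table, uplink_ports):
    """Return the list of (kind, port) pairs the packet is sent out on."""
    if packet["dst_rack"] == local_rack:
        # Selector: destination is inside this rack, deliver directly.
        return [("server", packet["dst_server"])]
    # Scheduler: extract LAN address from the header, look up the flow table.
    lan = packet["lan_addr"]
    port = flow_table.get(lan)
    if port is None:
        # Unknown destination: flood on all uplinks and raise an alarm.
        print("ALARM: no flow-table entry for", lan)
        return [("uplink", p) for p in uplink_ports]
    return [("uplink", port)]   # reaches the intra- or inter-cluster AWGR
```

A hit forwards on exactly one port; a miss floods, mirroring classic learning-switch behaviour plus the claimed alarm.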
In another aspect, the present invention provides an arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning method, which runs on the arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning system described above. In one cycle, the method comprises:
in each cluster, the parameter servers in the parameter-server rack distribute the parameters of the preset reinforcement learning model to the worker servers in the worker-server racks according to different subtasks;
in each cluster, the worker servers in each worker-server rack feed back the parameters obtained by training the preset reinforcement learning model on the corresponding subtasks to the parameter servers, which perform gradient aggregation.
In some embodiments of the present invention, the feedback and gradient aggregation further comprise:
in each cluster, the parameter servers in the parameter-server rack synchronously perform gradient aggregation and update the parameters of the preset reinforcement learning model; between clusters, the post-aggregation parameters of each cluster are exchanged through the inter-cluster arrayed waveguide grating router;
in each cluster, the parameter servers in the parameter-server rack re-establish subtasks and distribute the new parameters to the worker servers in the worker-server racks, executing the next cycle.
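One such cycle can be simulated numerically as below. This is our own sketch: the patent fixes the communication pattern (intra-cluster aggregation, then inter-cluster exchange), while the averaging rule used here is an illustrative assumption.

```python
# One synchronous training cycle of the two-level scheme, with plain numbers.
# cluster_grads: {cluster_name: {worker_name: gradient vector}} for one iteration.

def training_cycle(cluster_grads):
    # Steps 1-2 (intra-cluster): each cluster's parameter servers aggregate
    # the gradients reported by their worker racks (averaging is assumed).
    per_cluster = {
        c: [sum(vals) / len(ws) for vals in zip(*ws.values())]
        for c, ws in cluster_grads.items()
    }
    # Step 3 (inter-cluster): parameter servers exchange the per-cluster
    # aggregates through the inter-cluster AWGR and combine them.
    agg = [sum(vals) / len(per_cluster) for vals in zip(*per_cluster.values())]
    # Step 4: the combined result is what each cluster redistributes as the
    # new model parameters for the next cycle.
    return agg
```

With two clusters of two workers each, the result is the global average of all four gradient vectors.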
In another aspect, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning methods described above.
The beneficial effects of the present invention include at least the following:
The present invention provides an arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning system and method. Each cluster of the system contains a parameter-server rack and worker-server racks: the parameter servers are housed in the parameter-server rack and the worker servers in the worker-server racks, establishing the notions of a parameter-server pool and a worker-server pool, with communication carried out by the first top-of-rack switch, the second top-of-rack switches, and the intra-cluster arrayed waveguide grating router. This resolves the scalability problem caused by a centralized server and greatly improves the network's flexibility to expand.
The intra-cluster arrayed waveguide grating router is interconnected with the first top-of-rack switch and each second top-of-rack switch in a fully connected manner; the inter-cluster arrayed waveguide grating router is interconnected in a fully connected manner with the first top-of-rack switch in each cluster's parameter-server rack. The intra-cluster and inter-cluster arrayed waveguide grating routers, together with the lasers, realize all-optical switching, avoiding the extra latency and power consumption caused by optical-electrical-optical conversion. At the same time, the fully connected topology drastically reduces the number of network hops: a single hop suffices for gradient communication, achieving nanosecond-scale low-latency communication and eliminating the prior art's linear growth of hop count with network size.
Additional advantages, objects, and features of the present invention will be set forth in part in the description that follows, will in part become apparent to those of ordinary skill in the art upon study of the following, or may be learned from practice of the present invention. The objects and other advantages of the invention may be realized and attained by the structures particularly pointed out in the description and the drawings.
Those skilled in the art will appreciate that the objects and advantages attainable with the present invention are not limited to those specifically described above, and that these and other attainable objects will be understood more clearly from the detailed description below.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present invention and form a part of this application; they do not limit the invention. In the drawings:
Fig. 1 is a schematic diagram of the overall structure of the arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning system in an embodiment of the present invention.
Fig. 2 is a schematic diagram of the intra-cluster structure of the arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning system in an embodiment of the present invention.
Fig. 3 is a schematic diagram of the structure of the intra-cluster and inter-cluster arrayed waveguide grating routers in an embodiment of the present invention.
Fig. 4 is a schematic diagram of the structure of the first and second top-of-rack switches in an embodiment of the present invention.
Fig. 5 is a schematic diagram of the steps of the arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning method in an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and drawings. The exemplary embodiments and their descriptions serve to explain the invention, not to limit it.
It should also be noted that, to avoid obscuring the invention with unnecessary detail, the drawings show only the structures and/or processing steps closely related to the solution according to the invention, omitting other details of little relevance.
It should be emphasized that the term "comprising/including", as used herein, denotes the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.
Unless otherwise specified, the term "connected" herein may denote not only a direct connection but also an indirect connection through an intermediary.
Embodiments of the present invention are described below with reference to the drawings, in which the same reference numerals denote the same or similar components or steps.
It should be stressed that the step numbering used below does not constrain the order of the steps: the steps may be performed in the order given in an embodiment, in a different order, or with several steps performed simultaneously.
To solve the prior art's limits on the scalability of distributed reinforcement learning systems and its high iteration latency, the present invention provides an arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning system. As shown in Fig. 1, the system includes multiple clusters and one inter-cluster arrayed waveguide grating router. Specifically:
As shown in Fig. 2, each cluster includes a parameter-server rack, an intra-cluster arrayed waveguide grating router, and multiple worker-server racks. The parameter-server rack includes a first top-of-rack switch and multiple parameter servers, the network port of each parameter server being connected to the first top-of-rack switch; each worker-server rack includes a second top-of-rack switch and multiple worker servers, the network port of each worker server being connected to that second top-of-rack switch. The intra-cluster arrayed waveguide grating router is interconnected with the first top-of-rack switch and each second top-of-rack switch in a fully connected manner. Within each cluster, the parameter servers and worker servers communicate through the first top-of-rack switch, the second top-of-rack switches, and the intra-cluster arrayed waveguide grating router.
The inter-cluster arrayed waveguide grating router is interconnected in a fully connected manner with the first top-of-rack switch of the parameter-server rack in each cluster; the parameter servers of the clusters communicate through their respective first top-of-rack switches and the inter-cluster arrayed waveguide grating router.
In each cluster, the parameter servers in the parameter-server rack distribute the parameters of the preset reinforcement learning model to the worker servers in the worker-server racks according to different subtasks; the worker servers in each worker-server rack feed back the parameters obtained by training the preset reinforcement learning model on the corresponding subtasks to the parameter servers, which perform gradient aggregation. Between clusters, the parameter servers in each parameter-server rack exchange the post-aggregation parameters of the preset reinforcement learning model through the inter-cluster arrayed waveguide grating router.
The present invention uses the intra-cluster and inter-cluster arrayed waveguide grating routers to switch gradient and weight traffic both within and between clusters.
The intra-cluster and inter-cluster arrayed waveguide grating routers have the same structure; the device's full English name is Arrayed Waveguide Grating Router (AWGR).
The arrayed waveguide grating (AWG) is the technology of choice in dense wavelength division multiplexing (DWDM) systems. An AWG is a passive planar waveguide device, fabricated on a chip substrate using planar lightwave circuit (PLC) technology. Compared with fiber Bragg gratings (FBG) and thin-film filters (TFF), AWGs offer high integration density, a large channel count, low insertion loss, and easy automated volume production. AWGs are commonly used as optical multiplexers in wavelength division multiplexing systems, where they combine light at many wavelengths into a single optical fiber and thereby improve the transmission efficiency of the fiber network.
In some embodiments, the intra-cluster and inter-cluster arrayed waveguide grating routers have 5×5 ports, with the structure shown in Fig. 3: input ports 1 through 5 on the left, output ports a through e on the right, and the arrayed waveguide grating in the middle. Each input port carries optical signals at different wavelengths, and each signal is routed to its corresponding output port by wavelength routing, meaning that an optical signal's route through a network node is selected according to its wavelength. For example, in Fig. 3, λ1 through λ5 denote optical signals at five wavelengths. Taking input port 3 as an example, signals at λ1 through λ5 entering it are routed by wavelength as follows: the λ1 signal exits at output port e, λ2 at d, λ3 at c, λ4 at b, and λ5 at a. The signals at the other input ports behave analogously.
In some embodiments, both the intra-cluster and inter-cluster arrayed waveguide grating routers route optical signals to their corresponding output ports by cyclic wavelength routing. As shown in Fig. 3, taking a signal at λ3 as an example: from input port 1 it exits at output port a; from input port 2, at output port b; from input port 3, at output port c; from input port 4, at output port d; and from input port 5, at output port e, forming a cycle. Signals at other wavelengths behave analogously.
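The cyclic routing rule can be written as a single modular expression. We assume the common convention out = (in + wavelength) mod N; the concrete port labels in the embodiments may correspond to a shifted or reversed variant of this mapping, so only the cyclic property itself is taken from the text.

```python
# Cyclic wavelength routing in an N x N AWGR (illustrative convention; the
# exact input/wavelength-to-output mapping depends on the device design).

def awgr_output(in_port, wavelength, n=5):
    """0-based input port and wavelength index -> 0-based output port."""
    return (in_port + wavelength) % n
```

This captures the two properties the embodiments rely on: for a fixed wavelength, the N input ports map to N distinct outputs (no collisions), and for a fixed input port, the N wavelengths fan out to all N outputs.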
In some embodiments, the intra-cluster and inter-cluster arrayed waveguide grating routers use wavelength division multiplexing to establish multi-channel communication links. For example, as shown in Fig. 3, 25 channel links are established from five wavelengths.
The first and second top-of-rack switches have the same structure and are both implemented on a field-programmable gate array (FPGA). They are mainly responsible for interconnecting the worker or parameter servers within a rack, and for routing and forwarding within and between clusters.
The FPGA is a further development of programmable devices such as programmable array logic (PAL) and generic array logic (GAL). It emerged as a semi-custom circuit in the application-specific integrated circuit (ASIC) field, addressing the shortcomings of fully custom circuits while overcoming the limited gate counts of earlier programmable devices.
Both top-of-rack switches adopt top-of-rack (ToR) cabling. ToR is an extension of the EoR (End of Row) and MoR (Middle of Row) approaches; all three are data-center architectural designs. Traditional racks mainly use EoR or MoR with similar centralized cabling, the two differing chiefly in where the switch sits within the row of network racks. In the ToR approach, the switch is placed at the top of the rack; every server in the rack connects to it directly over short jumpers, and the switch's uplink ports connect to the core switch over optical fiber. ToR turns centralized cabling into point-to-point cabling and greatly reduces the amount of cabling required.
Meanwhile, in the present invention, the ToR approach places the first top-of-rack switch at the top of the parameter-server rack, with every parameter server's network port connected to it, and the second top-of-rack switch at the top of each worker-server rack, with every worker server's network port connected to it. This achieves full connectivity between the first top-of-rack switch and the parameter servers, and between each second top-of-rack switch and its worker servers. All parameter servers in a cluster are housed in the parameter-server rack and all worker servers in the worker-server racks, with data communication handled by the first and second top-of-rack switches together with the intra-cluster and inter-cluster arrayed waveguide grating routers. This realizes the notions of a parameter-server pool and a worker-server pool and effectively improves network scalability.
FIG. 4 shows the structure of the first and second top-of-rack switches. Since the two have the same structure, they are collectively referred to as top-of-rack switches in the description that follows. A top-of-rack switch comprises a switching module, a plurality of receiving modules, and a plurality of sending modules. The switching module, which is shared by all receiving and sending modules, comprises a packet processor, a scheduler, a broadcast module, and a selector. The switching module also builds and maintains a flow table that maps local area network (LAN) addresses to sending ports.
After a receiving module receives a data packet to be forwarded, it stores the packet in a packet buffer. The packet buffer is a FIFO (First In, First Out) data buffer, meaning that data can only be written sequentially and read out sequentially. In some embodiments, the data packets are Ethernet packets.
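The FIFO discipline just described can be sketched in a few lines; a minimal illustration (the class and method names are ours, not the patent's):

```python
from collections import deque

class PacketBuffer:
    """FIFO packet buffer: packets are written and read strictly in order."""

    def __init__(self):
        self._queue = deque()  # deque gives O(1) append and popleft

    def write(self, packet):
        # Sequential write: new packets always join the tail.
        self._queue.append(packet)

    def read(self):
        # Sequential read: the oldest packet leaves first.
        return self._queue.popleft() if self._queue else None

buf = PacketBuffer()
buf.write("pkt-1")
buf.write("pkt-2")
print(buf.read())  # the first packet written is the first read out
```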
In some embodiments, the packet processor determines a packet's destination from its Ethernet header. If the destination lies within the switch's own rack, the selector forwards the packet directly to the corresponding server in that rack. For example, the first top-of-rack switch in the parameter server rack collects a packet from the packet buffer; the packet processor determines that its destination is parameter server 3 within the same parameter server rack, so the selector of the first top-of-rack switch forwards the packet directly to parameter server 3.
If the destination lies in another rack, the scheduler extracts the LAN address from the packet header and queries the flow table to obtain the sending port corresponding to the destination; the selector then forwards the packet to that sending port. For example, the first top-of-rack switch in the parameter server rack collects a packet from the packet buffer, and the packet processor determines that its destination is working server 3 in working server rack 1, which lies in the same cluster as the first top-of-rack switch. The scheduler extracts the LAN address from the packet header and compares it against the addresses recorded in the flow table to obtain the corresponding sending port; the selector of the first top-of-rack switch sends the packet to that port, from which it is transmitted via the intra-cluster arrayed waveguide grating router to working server 3 in working server rack 1.
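The two-way forwarding decision described above (direct delivery for in-rack destinations, a flow-table lookup for inter-rack ones) can be sketched as follows. The table contents, MAC addresses, and function name are illustrative assumptions, not the patent's implementation:

```python
# Hypothetical flow table learned by the switch: LAN (MAC) address -> sending port.
flow_table = {
    "aa:aa:aa:aa:aa:01": 1,  # e.g. reaches working server rack 1 via the AWGR
    "aa:aa:aa:aa:aa:02": 2,
}

# Servers attached directly inside this switch's own rack.
LOCAL_RACK_PORTS = {"bb:bb:bb:bb:bb:03": 0}

def select_port(dst_mac):
    """Mimic the selector/scheduler: in-rack addresses are forwarded
    directly; inter-rack addresses are resolved through the flow table."""
    if dst_mac in LOCAL_RACK_PORTS:       # destination inside this rack
        return LOCAL_RACK_PORTS[dst_mac]
    return flow_table.get(dst_mac)        # None models a flow-table miss

print(select_port("aa:aa:aa:aa:aa:01"))  # resolved to port 1 via the flow table
```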
In some embodiments, when the destination lies in the same cluster as the packet's source, the packet is forwarded to the destination server via the intra-cluster arrayed waveguide grating router; when the destination lies in a different cluster, the packet is forwarded to the destination server via the inter-cluster arrayed waveguide grating router.
In the initialization phase, the first and second top-of-rack switches have just started up and their flow tables contain no entries. Once the parameter servers and working servers are connected, the corresponding first or second top-of-rack switch begins to learn LAN addresses. For example, the first top-of-rack switch associates the source address MAC A of a packet sent by parameter server 1 in the parameter server rack with port A, on which the frame was received, and records the association MAC A-A in the flow table. After the first and second top-of-rack switches have associated the source addresses of all packets with the corresponding ports, flow table learning is complete and packet forwarding can begin.
In some embodiments, when a packet's destination lies in another rack, the scheduler extracts the LAN address from the packet header but finds neither that address nor a corresponding sending port recorded in the flow table, so the first and/or second top-of-rack switch does not know which port to send the packet to. In that case, the switch sends the packet to all sending ports and raises an alarm prompting manual review.
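The address learning of the initialization phase and the flood-on-miss fallback just described combine into the classic learning-switch behavior; a simplified model under assumed names (the alarm is printed rather than signaled to an operator):

```python
class LearningSwitch:
    """Simplified model of the flow-table learning and flood-on-miss
    behavior described above (class and port names are illustrative)."""

    def __init__(self, ports):
        self.ports = ports
        self.flow_table = {}  # learned: source MAC -> ingress port

    def receive(self, src_mac, dst_mac, in_port):
        # Learning: associate the frame's source address with its ingress port.
        self.flow_table[src_mac] = in_port
        if dst_mac in self.flow_table:
            return [self.flow_table[dst_mac]]  # known destination: one port
        # Flow-table miss: flood to every other port and raise an alarm.
        print("alarm: unknown destination, manual review required")
        return [p for p in self.ports if p != in_port]

sw = LearningSwitch(ports=[0, 1, 2, 3])
sw.receive("MAC_A", "MAC_B", 0)        # learns MAC_A -> 0; MAC_B unknown, floods
out = sw.receive("MAC_B", "MAC_A", 1)  # learns MAC_B -> 1; MAC_A now known
print(out)  # [0]
```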
In some embodiments, both the first and second top-of-rack switches are 25GbE top-of-rack switches. A 25GbE top-of-rack switch achieves nanosecond-level data distribution, speeding the flow of data toward the optical switch while avoiding the drawbacks of a many-to-one architecture.
The arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning system provided by the present invention is further described below with reference to a specific embodiment.
By way of example, the arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning system provided by the present invention can be divided into four working modes: a communication establishment mode, a parameter delivery mode, a gradient aggregation mode, and an inter-cluster communication mode.
Communication establishment mode: in each cluster, the parameter server and the working servers establish connections using optical circuit switching technology and set up a communication channel through a handshake. Here, a handshake is the process by which the receiving end and the sending end establish communication parameters after the communication circuit has been set up but before information transfer begins; exemplary parameters include the information transfer rate, the alphabet, parity checking, the interrupt procedure, and other protocol characteristics. Once the communication channel is established, the parameter server and the working servers use it for the parameter delivery and gradient aggregation operations of the parameter delivery mode and the gradient aggregation mode.
Parameter delivery mode: in each cluster, the parameter server delivers the parameters of a preset reinforcement learning model to the working servers according to their different subtasks. The parameter server sends the parameters as Ethernet packets to its first top-of-rack switch, whose receiving module receives each packet and stores it in the packet buffer. The packet processor of the first top-of-rack switch inspects the packet header and determines that the destination lies in another rack. Within three FPGA working cycles, the packet is switched into the cluster according to the address in its Ethernet header; the scheduler of the first top-of-rack switch extracts the LAN address from the packet header and queries the flow table for the sending port corresponding to the destination. The first top-of-rack switch passes the packet through the obtained sending port, where a fixed-wavelength laser converts it into an optical signal of the corresponding wavelength; the intra-cluster arrayed waveguide grating router then completes the optical switching by cyclic wavelength routing, and finally the packet is demodulated by the second top-of-rack switch of the destination working server's rack and delivered to the corresponding working server.
Gradient aggregation mode: in each cluster, the working servers feed the parameters obtained by training the preset reinforcement learning model on their respective subtasks back to the parameter server, where gradient aggregation is performed. The gradient results are sent as Ethernet packets to the second top-of-rack switch, whose receiving module receives each packet and stores it in the packet buffer. The packet processor of the second top-of-rack switch inspects the packet header and determines that the destination lies in another rack. Within three FPGA working cycles, the packet is switched into the cluster according to the address in its Ethernet header; the scheduler of the second top-of-rack switch extracts the LAN address from the packet header and queries the flow table for the sending port corresponding to the destination. The second top-of-rack switch passes the packet through the obtained sending port, where a fixed-wavelength laser converts it into an optical signal of the corresponding wavelength; the intra-cluster arrayed waveguide grating router then completes the optical switching by cyclic wavelength routing, and finally the packet is demodulated by the first top-of-rack switch of the destination parameter server's rack and delivered to the corresponding parameter server.
Inter-cluster communication mode: each parameter server updates the parameters of the training model according to the gradient results fed back by the working servers and sends the updated parameters as Ethernet packets to its first top-of-rack switch, whose receiving module receives each packet and stores it in the packet buffer. The packet processor of the first top-of-rack switch inspects the packet header and determines that the destination lies in another rack. Within three FPGA working cycles, the packet is switched between clusters according to the address in its Ethernet header; the scheduler of the first top-of-rack switch extracts the LAN address from the packet header and queries the flow table for the sending port corresponding to the destination. The first top-of-rack switch passes the packet through the obtained sending port, where a fixed-wavelength laser converts it into an optical signal of the corresponding wavelength; the inter-cluster arrayed waveguide grating router then completes the optical switching by cyclic wavelength routing, and finally the packet is demodulated by the first top-of-rack switch of the destination parameter server's rack and delivered to the corresponding parameter server.
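All three modes above rely on the cyclic wavelength routing of the arrayed waveguide grating router: which output port a signal exits depends only on its input port and its wavelength. The patent does not specify the device's port/wavelength convention, so the modular formula below is one common convention used for illustration:

```python
def awgr_output_port(in_port, wavelength_index, n_ports):
    """Cyclic wavelength routing of an N x N AWGR: a signal entering
    port i on wavelength k exits at port (i + k) mod N (one common
    convention; the actual device mapping may differ)."""
    return (in_port + wavelength_index) % n_ports

N = 4
# From a fixed input port, each wavelength reaches a distinct output port,
# so choosing the fixed-wavelength laser selects the destination rack.
print([awgr_output_port(0, k, N) for k in range(N)])
```

Because the mapping is a permutation for every wavelength, every top-of-rack switch can reach any other rack in a single optical hop purely by wavelength choice, which is what lets a passive grating act as a contention-free router.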
In operation, each parameter server delivers parameters to the working servers, and each working server returns its gradients to the parameter servers for aggregation. After the parameter servers have synchronously aggregated the gradients, they begin the next task slice as required by the training model, that is, a new round of parameter delivery. This cycle repeats until the training model reaches the preset performance.
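The delivery/aggregation cycle above is a synchronous parameter-server loop. A toy numerical sketch (the "gradient" rule, model, and learning rate are invented purely for illustration; real workers would train the reinforcement learning model on their subtasks):

```python
def worker_gradient(params, subtask):
    # Stand-in for local RL training on one subtask: returns a gradient.
    return [subtask * p for p in params]  # invented toy rule

def train(params, n_workers, n_rounds, lr=0.1):
    for _ in range(n_rounds):
        # Parameter delivery: every worker receives the current parameters.
        grads = [worker_gradient(params, w + 1) for w in range(n_workers)]
        # Synchronous gradient aggregation: element-wise mean across workers.
        agg = [sum(g[i] for g in grads) / n_workers for i in range(len(params))]
        # Update, then the next round (a new task slice) begins.
        params = [p - lr * a for p, a in zip(params, agg)]
    return params

print(train([1.0, 2.0], n_workers=2, n_rounds=1))
```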
In some embodiments, when both the first and second top-of-rack switches are 25GbE switches, the system completes one parameter delivery and one gradient aggregation (that is, one iteration) in only 975 nanoseconds, effectively accelerating the communication phase of distributed reinforcement learning compared with traditional architectures.
The present invention also provides an arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning method, which runs on the arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning system described above. As shown in FIG. 5, within one cycle the method comprises the following steps S101 to S102:
Step S101: in each cluster, the parameter server in the parameter server rack delivers the parameters of the preset reinforcement learning model to the working servers in each working server rack according to their different subtasks.
Step S102: in each cluster, the working servers in each working server rack feed the parameters obtained by training the preset reinforcement learning model on their respective subtasks back to the parameter server, where gradient aggregation is performed.
In some embodiments, after the working servers in each working server rack have fed the trained parameters of the preset reinforcement learning model back to the parameter server and gradient aggregation has been performed, the method further comprises steps S103 to S104:
Step S103: in each cluster, the parameter servers in the parameter server rack synchronously perform gradient aggregation and update the parameters of the preset reinforcement learning model; between clusters, the parameters of the preset reinforcement learning model obtained after each cluster's gradient aggregation are exchanged via the inter-cluster arrayed waveguide grating router.
Step S104: in each cluster, the parameter server in the parameter server rack re-establishes the subtasks and delivers the new parameters to the working servers in each working server rack, and the next cycle is executed.
Corresponding to the above method, the present invention also provides a device comprising computer equipment, the computer equipment comprising a processor and a memory, the memory storing computer instructions, and the processor being configured to execute the computer instructions stored in the memory; when the computer instructions are executed by the processor, the device implements the steps of the method described above.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the method described above are implemented. The computer-readable storage medium may be a tangible storage medium such as random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a floppy disk, a hard disk, a removable storage disk, a CD-ROM, or any other form of storage medium known in the art.
In summary, the present invention provides an arrayed-waveguide-grating-based all-optical switching distributed reinforcement learning system and method. In each cluster of the system, a parameter server rack and a working server rack are provided; the parameter servers are installed in the parameter server rack and the working servers in the working server rack, establishing the concepts of a parameter server pool and a working server pool, and communication is carried out through the first top-of-rack switch, the second top-of-rack switch, and the intra-cluster arrayed waveguide grating router. This solves the scalability problem of centralized servers and greatly improves the flexibility with which the network can be expanded.
The intra-cluster arrayed waveguide grating router is interconnected in a fully connected manner with the first top-of-rack switch and each second top-of-rack switch; the inter-cluster arrayed waveguide grating router is interconnected in a fully connected manner with the first top-of-rack switch in the parameter server rack of each cluster. The intra-cluster and inter-cluster arrayed waveguide grating routers, working with the lasers, realize all-optical switching and avoid the additional delay and power consumption caused by optical-electrical-optical conversion. At the same time, the fully connected topology greatly reduces the number of network hops: a single hop suffices for gradient communication, meeting nanosecond-level low-latency requirements and overcoming the prior-art problem of hop count growing linearly with network scale.
Those of ordinary skill in the art should understand that the exemplary components, systems, and methods described in conjunction with the embodiments disclosed herein can be implemented in hardware, software, or a combination of the two. Whether an implementation uses hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be regarded as exceeding the scope of the present invention. When implemented in hardware, the implementation may be, for example, an electronic circuit, an application-specific integrated circuit (ASIC), suitable firmware, a plug-in, or a function card. When implemented in software, the elements of the present invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium, or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave.
It should be understood that the present invention is not limited to the specific configurations and processing described above and shown in the figures. For brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples; however, the method process of the present invention is not limited to these specific steps, and those skilled in the art may, after grasping the spirit of the present invention, make various changes, modifications, and additions, or change the order of the steps.
In the present invention, features described and/or illustrated for one embodiment may be used in the same or a similar way in one or more other embodiments, and/or combined with or substituted for features of other embodiments.
The above are only preferred embodiments of the present invention and are not intended to limit it; those skilled in the art may make various modifications and changes to the embodiments of the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211372521.0A CN115714937A (en) | 2022-11-03 | 2022-11-03 | All-optical switching distributed reinforcement learning system and method based on array waveguide grating |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115714937A true CN115714937A (en) | 2023-02-24 |
Family
ID=85232194
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106803218A (en) * | 2017-03-17 | 2017-06-06 | 西安优盛信息技术有限公司 | A kind of big data tutoring system based on virtualization and cloud computing |
CN111193971A (en) * | 2019-11-15 | 2020-05-22 | 西安电子科技大学 | Machine learning-oriented distributed computing interconnection network system and communication method |
WO2020132169A1 (en) * | 2018-12-21 | 2020-06-25 | Nec Laboratories America, Inc. | Optical fiber sensing systems, methods, structures and applications |
CN112188325A (en) * | 2019-07-01 | 2021-01-05 | 谷歌有限责任公司 | Reconfigurable computing platform using optical network with one-to-many optical switch |
US20210081776A1 (en) * | 2019-09-12 | 2021-03-18 | International Business Machines Corporation | Configuration of an optical switch fabric using machine learning |
CN115175027A (en) * | 2022-06-30 | 2022-10-11 | 苏州大学 | Ring service deployment method for all-optical switching data center based on span architecture |
Non-Patent Citations (4)
Title |
---|
YOUJIE LI et al.: "Accelerating Distributed Reinforcement Learning with In-Switch Computing," 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA) |
YUANZHI GUO et al.: "Experimental Assessments of a AWGR Based Optical Switching Network for Distributed Reinforcement Learning Training," 2022 20th International Conference on Optical Communications and Networks (ICOCN), pages 1-3 |
DING Jingjie et al.: "Design of an Optical Shuffle Network Based on Arrayed Waveguide Gratings," Optical Communication Technology |
LIAO Guicheng et al.: "New Communication Technologies and Information System Applications," University of Electronic Science and Technology of China Press |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20230224 |