WO2019214128A1 - Dynamically reconfigurable intelligent computing cluster and configuration method thereof - Google Patents

Info

Publication number
WO2019214128A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2018/106105
Inventor
姜凯
于治楼
王子彤
Original Assignee
济南浪潮高新科技投资发展有限公司
Priority date
2018-05-08
Filing date
2018-09-18
Publication date
2019-11-14
Application filed by 济南浪潮高新科技投资发展有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multi Processors (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

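A dynamically reconfigurable intelligent computing cluster and a configuration method thereof. The cluster comprises a general computing plane, which performs computing tasks including clustering, scheduling or parameter passing and consists of parameter server nodes and intelligent computing nodes, each intelligent computing node comprising a processor, memory connected to the processor, a switch chip connected to the processor, an intelligent computing card connected to the switch chip, a PCIe bridge chip, a BMC, a hard disk and a network card; and an intelligent computing plane, which performs computing tasks including parallel and pipelined computation and consists of a number of intelligent computing nodes, all interconnected via SRIO. The intelligent computing plane and the general computing plane are interconnected through a PCIe interface. Compared with the prior art, the cluster and its configuration method can flexibly hand FPGA workloads over to the CPU, improving system resource utilization; they provide resource scheduling and management for both CPUs and FPGAs, and allow server nodes to be added or removed dynamically.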

Description

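Dynamically reconfigurable intelligent computing cluster and configuration method thereof
Technical Field
The invention relates to the field of artificial intelligence computing, and in particular to a dynamically reconfigurable intelligent computing cluster and a configuration method thereof.
Background
An FPGA (Field Programmable Gate Array) is a further development of programmable devices such as PAL, GAL and CPLD. It emerged as a semi-custom circuit in the application-specific integrated circuit (ASIC) field, both addressing the shortcomings of fully custom circuits and overcoming the limited gate count of earlier programmable devices.
In the current state of the art, heterogeneous computing on a reconfigurable CPU+FPGA architecture offers many advantages, such as higher performance, greater flexibility, lower power consumption, inherent fault tolerance, and a much shorter product development cycle. Using FPGAs instead of GPUs as the accelerators for future high-performance computing is expected to be the main direction of FPGA-based heterogeneous intelligent computing at this stage. On this basis, a dynamically reconfigurable intelligent computing cluster and a configuration method thereof are proposed, to dynamically configure FPGAs and the serial/parallel modes among FPGAs.
Technical Problem
The technical task of the invention is to address the above shortcomings by providing a dynamically reconfigurable intelligent computing cluster and a configuration method thereof.
Technical Solution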
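A dynamically reconfigurable intelligent computing cluster, comprising:
a general computing plane, used to perform computing tasks including clustering, scheduling or parameter passing and consisting of parameter server nodes and intelligent computing nodes, where each intelligent computing node comprises a processor, memory connected to the processor, a switch chip connected to the processor, an intelligent computing card connected to the switch chip, a PCIe bridge chip, a BMC, a hard disk and a network card; on the general computing plane, the parameter server nodes are networked with one another, the intelligent computing nodes are networked with one another, and the parameter server nodes are networked with the intelligent computing nodes;
an intelligent computing plane, used to perform computing tasks including parallel and pipelined computation and consisting of a number of intelligent computing nodes, all of which are interconnected via SRIO; the intelligent computing plane and the general computing plane are interconnected through a PCIe interface.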
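The two-plane structure above can be summarized as a small data model. The following Python is a minimal sketch, not the patent's implementation: the classes `Node` and `Card` and the function `link_type` are hypothetical, and the link assignment assumes, as the description of Figure 1 states, that nodes talk over the 40G network, cards talk to each other over SRIO, and a host reaches its own cards over PCIe x8.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Card:
    """An intelligent computing card (FPGA) installed in a given host node."""
    node: str
    slot: int


@dataclass
class Node:
    """A cluster node; parameter server nodes carry no intelligent computing cards."""
    name: str
    is_parameter_server: bool = False
    cards: List[Card] = field(default_factory=list)


def link_type(a: Union[Node, Card], b: Union[Node, Card]) -> str:
    """Return the interconnect assumed to join two endpoints.

    Assumption (from the Figure 1 description):
    Node <-> Node: 40G network (general computing plane)
    Card <-> Card: SRIO        (intelligent computing plane)
    Node <-> Card: PCIe x8     (between the planes, host to its own card only)
    """
    if isinstance(a, Node) and isinstance(b, Node):
        return "40G"
    if isinstance(a, Card) and isinstance(b, Card):
        return "SRIO"
    # Mixed case: put the node first, the card second.
    node, card = (a, b) if isinstance(a, Node) else (b, a)
    if card.node != node.name:
        raise ValueError("a card reaches only its own host over PCIe")
    return "PCIe x8"
```

For example, `link_type(Node("ps0", is_parameter_server=True), Node("n1"))` returns `"40G"`, while two cards on different nodes are joined by `"SRIO"`.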
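On the intelligent computing plane, the SRIO interconnection path between intelligent computing nodes is formed as follows: each intelligent computing node is first provided with an SRIO bus that connects its intelligent computing card to a QSFP interface, and the QSFP interface is then connected to an SRIO switch, so that all intelligent computing nodes are interconnected.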
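To make the SRIO path concrete, the sketch below lists the devices a card-to-card transfer passes through, next to a hypothetical alternative that relays the same data through both host CPUs and the 40G network. Device names such as `srio-switch` and the exact hop sequence of the relayed route are illustrative assumptions; the patent only states that SRIO exchange avoids the server node.

```python
from typing import List


def srio_path(src: str, dst: str) -> List[str]:
    """Devices traversed by a card-to-card transfer on the SRIO plane (assumed layout):
    card -> its QSFP port -> SRIO switch -> peer QSFP port -> peer card."""
    return [f"{src}.card", f"{src}.qsfp", "srio-switch", f"{dst}.qsfp", f"{dst}.card"]


def host_relayed_path(src: str, dst: str) -> List[str]:
    """Devices traversed if the same transfer were relayed through both host CPUs
    and the 40G network instead (hypothetical comparison route)."""
    return [f"{src}.card", f"{src}.pcie", f"{src}.cpu", f"{src}.nic", "40g-switch",
            f"{dst}.nic", f"{dst}.cpu", f"{dst}.pcie", f"{dst}.card"]


def cpus_involved(path: List[str]) -> int:
    """Number of host CPUs that must handle the data along a path."""
    return sum(1 for hop in path if hop.endswith(".cpu"))
```

Under these assumptions the SRIO route between nodes `n1` and `n2` has 4 links and involves no CPU, against 8 links and two CPUs for the relayed route, which matches the reduction in CPU load and path length that the description attributes to direct SRIO exchange.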
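In each intelligent computing node, the BMC is connected to a management network port via SGMII signals, and the management network port is connected to a Gigabit switch, interconnecting the BMCs of multiple nodes.
The intelligent computing nodes are interconnected for expansion as follows:
All intelligent computing nodes are first organized into three expansion planes: an SRIO expansion plane, a PCIe expansion plane and a 40G expansion plane. In the SRIO expansion plane, the intelligent computing nodes are connected and expanded through the SRIO bus; in the PCIe expansion plane, they are connected and expanded through PCIe interfaces; and in the 40G expansion plane, they are connected and expanded through network cables attached to their processors.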
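The three expansion planes differ in scope. Combining the scopes stated later in the description (network plane between nodes, PCIe plane within a node, SRIO plane within and across nodes), a minimal sketch of which planes can join two cards might look like this; `Plane` and `usable_planes` are hypothetical names, and listing SRIO first is an assumed preference based on its stated lower latency.

```python
from enum import Enum
from typing import List


class Plane(Enum):
    SRIO = "SRIO expansion plane"     # within a node and across nodes
    PCIE = "PCIe expansion plane"     # within a node
    NET_40G = "40G expansion plane"   # between nodes, through the host processors


def usable_planes(src_node: str, dst_node: str) -> List[Plane]:
    """Expansion planes that can join two cards hosted on src_node and dst_node.

    Assumption: SRIO is listed first because the description credits it with
    lower communication latency; the patent does not define a preference order.
    """
    if src_node == dst_node:
        return [Plane.SRIO, Plane.PCIE]
    return [Plane.SRIO, Plane.NET_40G]
```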
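When the intelligent computing nodes are interconnected for expansion, all of them can operate in a serial, parallel or combined serial-parallel working mode.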
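One way to read the serial, parallel and combined serial-parallel working modes is as a chain of stages, each stage holding one or more cards. The sketch below classifies a layout and counts the cross-node hand-offs that, in the serial and combined modes, would travel card to card over SRIO. The stage model, the all-to-all hand-off between consecutive stages and the function names are assumptions for illustration.

```python
from typing import List

Stage = List[str]  # host-node names of the cards that work in parallel in one stage


def working_mode(stages: List[Stage]) -> str:
    """Classify a card layout: a single stage is parallel, a chain of
    single-card stages is serial, anything else is combined serial-parallel."""
    if len(stages) == 1:
        return "parallel"
    if all(len(stage) == 1 for stage in stages):
        return "serial"
    return "serial-parallel"


def cross_node_transfers(stages: List[Stage]) -> int:
    """Count producer->consumer card pairs in consecutive stages that sit on
    different nodes (assumes every card hands data to every card of the next
    stage); per the description, such data goes card to card over SRIO."""
    count = 0
    for producers, consumers in zip(stages, stages[1:]):
        count += sum(1 for p in producers for c in consumers if p != c)
    return count
```

For instance, the serial chain `[["n1"], ["n2"], ["n2"]]` has one cross-node hand-off (n1 to n2); the second hand-off stays inside n2.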
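A configuration method for a dynamically reconfigurable intelligent computing cluster, based on the above cluster, with the following configuration process:
1. A software scheduling module and a file pool are first configured in the cluster; the software scheduling module provides a system task manager and issues system task commands, and the file pool stores configuration files;
2. The intelligent computing plane is invoked through the software scheduling module, and computing-task scheduling begins;
3. After the computation is completed, the corresponding configuration file is retrieved from the configuration file pool inside the cluster, resources are scheduled through the intelligent computing nodes, and the FPGA resources in the cluster are dynamically reconfigured on demand.
In step 1, the configuration files include a neural network configuration file, a linear regression configuration file, a decision tree configuration file and a reinforcement learning configuration file.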
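Steps 1 to 3 amount to a lookup in a configuration-file pool followed by reconfiguration of a card. In the following sketch the pool keys mirror the four file types listed above, while the file names, `IntelligentNode.reconfigure` (a stand-in for real FPGA image loading) and the skip-if-already-loaded check are hypothetical additions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FilePool:
    """Configuration file pool keyed by algorithm type (step 1).
    The file names are hypothetical placeholders."""
    files: Dict[str, str] = field(default_factory=lambda: {
        "neural_network": "nn_config.bin",
        "linear_regression": "linreg_config.bin",
        "decision_tree": "dtree_config.bin",
        "reinforcement_learning": "rl_config.bin",
    })

    def fetch(self, algorithm: str) -> str:
        if algorithm not in self.files:
            raise KeyError(f"no configuration file for {algorithm!r}")
        return self.files[algorithm]


@dataclass
class IntelligentNode:
    """An intelligent computing node and the configuration loaded on each card slot."""
    name: str
    loaded: Dict[int, str] = field(default_factory=dict)

    def reconfigure(self, slot: int, config: str) -> None:
        # Hypothetical stand-in: a real node would load the FPGA image with vendor tooling.
        self.loaded[slot] = config


def reconfigure_on_demand(pool: FilePool, node: IntelligentNode, slot: int, algorithm: str) -> bool:
    """Step 3: fetch the matching file from the pool and reconfigure the card.
    Returns True if a reconfiguration happened; skipping an already-loaded
    image is an assumption, not something the patent specifies."""
    config = pool.fetch(algorithm)
    if node.loaded.get(slot) == config:
        return False
    node.reconfigure(slot, config)
    return True
```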
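In step 2, tasks involving clustering, scheduling or parameter passing are completed on the general computing plane, and parallel and pipelined computing tasks are completed on the intelligent computing plane.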
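The plane-selection rule of step 2 reduces to a lookup by task kind. The task-kind labels and `select_plane` below are hypothetical; the mapping itself follows the text.

```python
# Hypothetical task-kind labels; the grouping follows step 2 of the method.
GENERAL_PLANE_TASKS = {"clustering", "scheduling", "parameter_passing"}
INTELLIGENT_PLANE_TASKS = {"parallel", "pipeline"}


def select_plane(task_kind: str) -> str:
    """Return the computing plane that step 2 assigns to a task kind."""
    if task_kind in GENERAL_PLANE_TASKS:
        return "general computing plane"
    if task_kind in INTELLIGENT_PLANE_TASKS:
        return "intelligent computing plane"
    raise ValueError(f"task kind not covered by the rule: {task_kind!r}")
```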
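Resource scheduling by the intelligent computing nodes is achieved by configuring a serial, parallel or combined serial-parallel working mode; in the serial or combined serial-parallel mode, all cross-node data is exchanged over SRIO by the associated intelligent computing nodes.
Beneficial Effects
Compared with the prior art, the dynamically reconfigurable intelligent computing cluster and configuration method of the invention have the following beneficial effects:
The resources of the intelligent computing nodes can be scheduled rationally, and the nodes can be flexibly configured in serial, parallel or combined serial-parallel working modes. In the serial and combined modes, cross-node data can be exchanged directly between intelligent computing nodes over SRIO without passing through a server node, which reduces CPU load and shortens the communication path. When intelligent computing node resources are scarce, or an algorithm is unsuited to FPGA execution, the work can be flexibly reassigned to the CPU, improving system resource utilization. Cluster management and scheduling software provides resource scheduling and management for both CPUs and FPGAs, allows server nodes to be added or removed dynamically, and configures FPGAs and the serial/parallel modes among FPGAs according to the needs of each computing task. The invention is highly practical, widely applicable and well worth promoting.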
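The fallback described above, handing work to the CPU when cards are scarce or an algorithm suits FPGAs poorly, can be sketched as a greedy placement loop. The `fpga_suited` set, the first-come order and the name `place_tasks` are assumptions; the patent does not say how suitability is decided.

```python
from typing import Dict, Iterable, List, Set


def place_tasks(tasks: Iterable[str], free_cards: int, fpga_suited: Set[str]) -> Dict[str, List[str]]:
    """Greedily place tasks in arrival order: use a free FPGA card when the
    algorithm is in the (assumed) FPGA-suited set, otherwise, or once the
    cards run out, fall back to the CPU."""
    placement: Dict[str, List[str]] = {"FPGA": [], "CPU": []}
    for algorithm in tasks:
        if free_cards > 0 and algorithm in fpga_suited:
            placement["FPGA"].append(algorithm)
            free_cards -= 1  # one card is now busy
        else:
            placement["CPU"].append(algorithm)
    return placement
```

With two free cards, the task list `["cnn", "cnn", "sort", "cnn"]` and `{"cnn"}` as the FPGA-suited set, the first two `cnn` tasks go to FPGAs, while `sort` (unsuited) and the last `cnn` (no card left) fall back to the CPU.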
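Description of Drawings
To explain the technical solutions of the embodiments or of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below are merely embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Figure 1 is an example of the dual computing plane structure of the invention.
Figure 2 is an architecture diagram of an intelligent computing node.
Figure 3 is an architecture diagram of a parameter server node.
Figure 4 shows the cluster interconnection topology.
Figure 5 is a schematic diagram of layered cluster expansion.
Figure 6 is a schematic diagram of dynamic reconfiguration.
Embodiments of the Invention
To help those skilled in the art better understand the solution of the invention, it is described in further detail below with reference to specific embodiments. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.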
As shown in Figure 1, the dynamically reconfigurable intelligent computing cluster of this embodiment comprises the general computing plane and the intelligent computing plane described above, with the same SRIO interconnection path between intelligent computing nodes (SRIO bus to QSFP interface to SRIO switch), the same BMC management interconnection through SGMII and a Gigabit switch, and the same three expansion planes (SRIO, PCIe and 40G), in which the intelligent computing nodes can work in serial, parallel or combined serial-parallel mode.
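The following description refers to the schematic drawings.
As shown in Figure 1, the general computing plane consists of parameter servers and intelligent computing nodes (each containing a general-purpose server plus intelligent computing hardware), and the nodes in this plane are interconnected over a 40G network; the intelligent computing plane consists of intelligent computing nodes interconnected via SRIO; and the two planes are interconnected via PCIe x8. Each plane can complete computing tasks independently: the intelligent computing plane mainly handles large volumes of parallel and pipelined computation, while the general computing plane mainly handles clustering, scheduling, parameter passing and similar tasks.
As shown in Figure 2, the intelligent computing node architecture mainly comprises a general-purpose processor, memory, a PCIe bridge chip, a BMC, a hard disk, a high-speed network and intelligent computing units; the intelligent computing unit in the figure is the intelligent computing card.
As shown in Figure 3, the parameter server node architecture differs from that of Figure 2 in that it has no intelligent computing units but provides more high-speed network interfaces.
As shown in Figure 4, the cluster interconnection topology comprises an SRIO path, in which the intelligent computing nodes are interconnected through an SRIO switch; a management path, in which the BMCs of all nodes are interconnected through a Gigabit switch; a storage path, in which intelligent computing nodes and storage nodes are interconnected through a 40G network switch; and a computing path, in which parameter servers and intelligent computing nodes are interconnected through a 40G network switch.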
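The four paths of Figure 4 can be modelled as separate switches with their attached endpoints. This sketch assumes the storage and computing paths use distinct 40G switches, in line with the later statement that the computing, storage and management networks are independent; the endpoint naming (`n1.card`, `n1.bmc`) and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Topology = Dict[str, Tuple[str, Set[str]]]  # path name -> (switch, attached endpoints)


def build_topology(ps_nodes: List[str], ic_nodes: List[str], storage_nodes: List[str]) -> Topology:
    """Model the four Figure 4 paths, each on its own switch (assumed)."""
    all_nodes = ps_nodes + ic_nodes + storage_nodes
    return {
        "srio": ("SRIO switch", {f"{n}.card" for n in ic_nodes}),
        "management": ("Gigabit switch", {f"{n}.bmc" for n in all_nodes}),
        "storage": ("40G storage switch", set(ic_nodes) | set(storage_nodes)),
        "compute": ("40G compute switch", set(ps_nodes) | set(ic_nodes)),
    }


def paths_between(topology: Topology, a: str, b: str) -> List[str]:
    """Names of the paths whose switch directly connects endpoints a and b."""
    return [name for name, (_, ends) in topology.items() if a in ends and b in ends]


def paths_are_independent(topology: Topology) -> bool:
    """True if no two paths share a switch."""
    switches = [switch for switch, _ in topology.values()]
    return len(switches) == len(set(switches))
```

In a cluster with one parameter server `ps0`, intelligent computing nodes `n1` and `n2` and storage node `s0`, `paths_between(topo, "ps0", "s0")` is empty: under this model a parameter server has no direct path to storage.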
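As shown in Figure 5 (cluster expansion), the intelligent computing nodes use the 40G network, the PCIe bus and SRIO to achieve inter-node, intra-node and cross-node expansion interconnection.
In an intelligent computing cluster of general-purpose servers (CPU) plus intelligent computing nodes (FPGA), the general-purpose servers interconnected by a high-speed network and the intelligent computing nodes interconnected by a high-speed serial bus form a large-scale, scalable dual computing plane, and efficient cluster management and scheduling software provides large-scale expansion and dynamic reconfigurability. By node function, the heterogeneous computing cluster is divided into parameter server nodes, which have higher network bandwidth, and intelligent computing nodes, which use a heterogeneous CPU+FPGA architecture. Nodes are interconnected over a 40G network, intelligent computing nodes are interconnected via SRIO, and the general computing plane and intelligent computing plane are interconnected via PCIe x8. This arrangement yields the benefits described above: flexible serial, parallel or combined serial-parallel scheduling of the intelligent computing nodes, direct cross-node SRIO exchange that bypasses the server nodes, fallback to the CPU when FPGA resources are scarce or unsuitable, and dynamic addition or removal of server nodes with the FPGA serial/parallel modes configured per task.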
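As shown in Figure 6, the configuration method for the dynamically reconfigurable intelligent computing cluster is based on the cluster above: in a heterogeneous computing cluster of general-purpose servers (CPU) plus intelligent computing nodes (FPGA), general-purpose servers interconnected by a high-speed network and intelligent computing nodes interconnected by a high-speed serial bus form a large-scale, scalable dual computing plane, and efficient cluster management and scheduling software provides large-scale expansion and dynamic reconfigurability of the cluster.
The configuration process is the three-step process described above, with the same configuration file types (neural network, linear regression, decision tree and reinforcement learning), the same rule for choosing between the general and intelligent computing planes, and the same serial, parallel or combined serial-parallel working modes, in which cross-node data is exchanged over SRIO by the associated intelligent computing nodes.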
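In the invention, the cluster's computing, storage and management networks are independent of one another, and the heterogeneous computing cards are interconnected via SRIO for lower communication latency. Compute-to-storage interconnection spans nodes, interconnection between a heterogeneous computing card and the compute side is within a node, and interconnection between heterogeneous computing cards exists both within and across nodes. These different bus interconnections fuse heterogeneous protocols, so that the whole cluster forms a dual computing plane of a computing-node cluster and a heterogeneous-computing-card cluster, with the two planes interconnected via PCIe. The cluster can also be expanded at three levels: the network expansion plane (between nodes), the PCIe expansion plane (within a node) and the SRIO expansion plane (within and across nodes). Tasks across the whole system can therefore be allocated dynamically, greatly improving cluster efficiency. For example, the management of a computing task can be handled by one node while the computation is distributed to heterogeneous computing cards on multiple nodes for cross-node computing, without any cross-node scheduling.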
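The closing example above, one node managing a task while cards on several nodes compute it, can be illustrated with a simple allocation routine. `allocate_cards` and its local-cards-first policy are hypothetical; the patent does not specify how cards are chosen.

```python
from typing import Dict, List, Tuple


def allocate_cards(manager: str, free: Dict[str, int], needed: int) -> Tuple[str, List[Tuple[str, int]]]:
    """Grant one task `needed` cards, possibly spread over several nodes, while
    its management stays on `manager`.

    Assumed policy: cards on the manager's own node come first (reachable over
    PCIe), then other nodes in name order (reachable card to card over SRIO).
    """
    order = sorted(free, key=lambda n: (n != manager, n))  # manager's node sorts first
    grant: List[Tuple[str, int]] = []
    for node in order:
        if needed == 0:
            break
        take = min(free[node], needed)
        if take:
            grant.append((node, take))
            needed -= take
    if needed:
        raise RuntimeError("not enough free cards in the cluster")
    return manager, grant
```

For a task managed by `n2` that needs four cards, with free cards `{"n1": 2, "n2": 1, "n3": 4}`, the sketch grants one card on `n2`, two on `n1` and one on `n3`, while `n2` stays the only managing node.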
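With the specific embodiments above, those skilled in the art can readily implement the invention. It should be understood, however, that the invention is not limited to these embodiments; on the basis of the disclosed embodiments, those skilled in the art can freely combine different technical features to obtain different technical solutions.
Technical features other than those described in the specification are known to those skilled in the art.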

Claims (9)

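  1. A dynamically reconfigurable intelligent computing cluster, characterized by comprising:
    a general computing plane, used to perform computing tasks including clustering, scheduling or parameter passing and consisting of parameter server nodes and intelligent computing nodes, each intelligent computing node comprising a processor, memory connected to the processor, a switch chip connected to the processor, an intelligent computing card connected to the switch chip, a PCIe bridge chip, a BMC, a hard disk and a network card, wherein on the general computing plane the parameter server nodes are networked with one another, the intelligent computing nodes are networked with one another, and the parameter server nodes are networked with the intelligent computing nodes;
    an intelligent computing plane, used to perform computing tasks including parallel and pipelined computation and consisting of a number of intelligent computing nodes, all of which are interconnected via SRIO; the intelligent computing plane and the general computing plane are interconnected through a PCIe interface.
  2. The dynamically reconfigurable intelligent computing cluster according to claim 1, characterized in that, on the intelligent computing plane, the SRIO interconnection path between intelligent computing nodes is formed by first providing each intelligent computing node with an SRIO bus that connects the intelligent computing card to a QSFP interface, and then connecting the QSFP interface to an SRIO switch, thereby interconnecting all intelligent computing nodes.
  3. The dynamically reconfigurable intelligent computing cluster according to claim 1, characterized in that, in each intelligent computing node, the BMC is connected to a management network port via SGMII signals, and the management network port is connected to a Gigabit switch to interconnect multiple BMCs.
  4. The dynamically reconfigurable intelligent computing cluster according to any one of claims 1 to 3, characterized in that the intelligent computing nodes are interconnected for expansion as follows:
    all intelligent computing nodes are first organized into three expansion planes: an SRIO expansion plane, a PCIe expansion plane and a 40G expansion plane, wherein, in the SRIO expansion plane, all intelligent computing nodes are connected and expanded through the SRIO bus; in the PCIe expansion plane, all intelligent computing nodes are connected and expanded through PCIe interfaces; and in the 40G expansion plane, all intelligent computing nodes are connected and expanded through network cables attached to their processors.
  5. The dynamically reconfigurable intelligent computing cluster according to claim 1, characterized in that, when the intelligent computing nodes are interconnected for expansion, all intelligent computing nodes can adopt a serial, parallel or combined serial-parallel working mode.
  6. A configuration method for a dynamically reconfigurable intelligent computing cluster, characterized in that, based on the above cluster, the configuration process is as follows:
    1. a software scheduling module and a file pool are first configured in the cluster, wherein the software scheduling module provides a system task manager and issues system task commands, and the file pool stores configuration files;
    2. the intelligent computing plane is invoked through the software scheduling module, and computing-task scheduling begins;
    3. after the computation is completed, the corresponding configuration file is retrieved from the configuration file pool inside the cluster, resources are scheduled through the intelligent computing nodes, and the FPGA resources in the cluster are dynamically reconfigured on demand.
  7. The configuration method for a dynamically reconfigurable intelligent computing cluster according to claim 6, characterized in that, in step 1, the configuration files include a neural network configuration file, a linear regression configuration file, a decision tree configuration file and a reinforcement learning configuration file.
  8. The configuration method for a dynamically reconfigurable intelligent computing cluster according to claim 6, characterized in that, in step 2, tasks involving clustering, scheduling or parameter passing are completed on the general computing plane, and parallel and pipelined computing tasks are completed on the intelligent computing plane.
  9. The configuration method for a dynamically reconfigurable intelligent computing cluster according to claim 6, characterized in that resource scheduling by the intelligent computing nodes is achieved by configuring a serial, parallel or combined serial-parallel working mode, and when the serial or combined serial-parallel working mode is adopted, all cross-node data is exchanged over SRIO by the associated intelligent computing nodes.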

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810431792.6A CN108628800A (zh) 2018-05-08 2018-05-08 Dynamically reconfigurable intelligent computing cluster and configuration method thereof
CN201810431792.6 2018-05-08

Publications (1)

Publication Number Publication Date
WO2019214128A1 (zh) 2019-11-14

Family

ID=63695820

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/106105 WO2019214128A1 (zh) 2018-05-08 2018-09-18 Dynamically reconfigurable intelligent computing cluster and configuration method thereof

Country Status (2)

Country Link
CN (1) CN108628800A (zh)
WO (1) WO2019214128A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111338907A (zh) * 2020-03-09 2020-06-26 山东超越数控电子股份有限公司 Remote state monitoring system and method for PCIe devices
CN115809685A (zh) * 2023-02-09 2023-03-17 鹏城实验室 NPU cluster network structure and network interconnection method

Families Citing this family (5)

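Publication number Priority date Publication date Assignee Title
CN110995856B (zh) * 2019-12-16 2022-09-13 上海米哈游天命科技有限公司 Server expansion method, apparatus, device and storage medium
CN113032329B (zh) * 2021-05-21 2021-09-14 千芯半导体科技(北京)有限公司 Computing structure, hardware architecture and computing method based on a reconfigurable compute-in-memory chip
CN113553031B (zh) * 2021-06-04 2023-02-24 中国人民解放军战略支援部队信息工程大学 Software-defined variable-structure computing architecture and integrated left-right-brain joint resource allocation method implemented with it
CN113392065A (zh) * 2021-07-14 2021-09-14 中科晶锐(苏州)科技有限公司 Heterogeneous computing system and computing method
CN114428757B (zh) * 2021-12-06 2024-05-17 中国船舶集团有限公司第七一六研究所 Architecture-reconfigurable computing device and reconfiguration method thereof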

Citations (4)

Publication number Priority date Publication date Assignee Title
US20140355478A1 (en) * 2013-06-04 2014-12-04 Electronics & Telecommunications Research Institute Method of providing a dynamic node service and device using the same
CN105703940A (zh) * 2015-12-10 2016-06-22 中国电力科学研究院 Monitoring system and monitoring method for multi-level-scheduling distributed parallel computing
CN105933219A (zh) * 2016-04-06 2016-09-07 中国科学院自动化研究所 Heterogeneous multi-source high-speed data exchange adaptation device
CN105933154A (zh) * 2016-04-28 2016-09-07 安徽四创电子股份有限公司 Cloud computing resource management method

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US6920545B2 (en) * 2002-01-17 2005-07-19 Raytheon Company Reconfigurable processor with alternately interconnected arithmetic and memory nodes of crossbar switched cluster
CN102053945B (zh) * 2009-11-09 2012-11-21 中国科学院过程工程研究所 Parallel computing system for multi-scale discrete simulation
KR102130813B1 (ko) * 2013-10-08 2020-07-06 삼성전자주식회사 Reconfigurable processor and method of operating the reconfigurable processor
CN104657330A (zh) * 2015-03-05 2015-05-27 浪潮电子信息产业股份有限公司 High-performance heterogeneous computing platform based on an x86-architecture processor and FPGA
US10423892B2 (en) * 2016-04-05 2019-09-24 Omni Ai, Inc. Trajectory cluster model for learning trajectory patterns in video data
CN106339351B (zh) * 2016-08-30 2019-05-10 浪潮(北京)电子信息产业有限公司 SGD algorithm optimization system and method
CN106598738A (zh) * 2016-12-13 2017-04-26 郑州云海信息技术有限公司 Computer cluster system and parallel computing method thereof
CN107678752B (zh) * 2017-08-31 2021-09-21 北京百度网讯科技有限公司 Task processing method and apparatus for heterogeneous clusters

Also Published As

Publication number Publication date
CN108628800A (zh) 2018-10-09

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18917948

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18917948

Country of ref document: EP

Kind code of ref document: A1