CN111614581A - Collective communication system and methods - Google Patents

Collective communication system and methods

Info

Publication number
CN111614581A
Authority
CN
China
Prior art keywords
data
subgroup
processes
given
destination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010117006.2A
Other languages
English (en)
Other versions
CN111614581B (zh)
Inventor
理查德·格雷汉姆
利龙·莱维
吉尔·布洛赫
丹尼尔·马可维奇
诺姆·布洛赫
秦勇
亚尼夫·布卢门菲尔德
埃坦·扎哈维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mellanox Technologies Ltd
Original Assignee
Tel Aviv Melos Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tel Aviv Melos Technology Co ltd filed Critical Tel Aviv Melos Technology Co ltd
Publication of CN111614581A publication Critical patent/CN111614581A/zh
Application granted granted Critical
Publication of CN111614581B publication Critical patent/CN111614581B/zh
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/542 Event management; Broadcasting; Multicasting; Notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 Bus networks
    • H04L12/40169 Flexible bus arrangements
    • H04L12/40176 Flexible bus arrangements involving redundancy
    • H04L12/40182 Flexible bus arrangements involving redundancy by using a plurality of communication lines
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/9057 Arrangements for supporting packet reassembly or resequencing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G06F12/0238 Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G06F12/0246 Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413 MIMO systems
    • H04B7/0456 Selection of precoding matrices or codebooks, e.g. using matrices antenna weighting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/44 Star or tree networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/10 Scheduling measurement reports; Arrangements for measurement reports
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W88/00 Devices specially adapted for wireless communication networks, e.g. terminals, base stations or access point devices
    • H04W88/02 Terminal devices
    • H04W88/06 Terminal devices adapted for operation in multiple networks or having at least two operational modes, e.g. multi-mode terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

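A method in which a plurality of processes are configured to hold data blocks destined for other processes, with data repacking circuitry including: receiving circuitry configured to receive at least one data block from a source process of the plurality of processes; repacking circuitry configured to repack the received data in accordance with at least one destination process of the plurality of processes; and sending circuitry configured to send the repacked data to the at least one destination process of the plurality of processes. The method includes receiving a set of data for all-to-all data exchange, the set of data being configured as a matrix distributed among the plurality of processes, and transposing the data by: each of the plurality of processes sending matrix data from that process to the repacking circuitry, and the repacking circuitry receiving, repacking, and sending the resulting matrix data to destination processes.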

Description

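Collective communication system and methods
Technical Field
The present invention, in exemplary embodiments thereof, relates to collective communication systems and methods, particularly but not exclusively to message passing operations, and also particularly but not exclusively to all-to-all operations.
Priority Claim
The present application claims priority from US Provisional Patent Application S/N 62/809,786 of Graham et al., filed 25 February 2019.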
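Summary of the Invention
The present invention, in certain embodiments thereof, seeks to provide improved systems and methods for collective communication, particularly but not exclusively for message passing operations, including all-to-all operations.
There is thus provided, in accordance with an exemplary embodiment of the present invention, a method including: providing a plurality of processes, each of the plurality of processes being configured to hold a block of data destined for other ones of the plurality of processes; providing at least one instance of data repacking circuitry including receiving circuitry configured to receive at least one block of data from at least one source process of the plurality of processes, repacking circuitry configured to repack the received data in accordance with at least one destination process of the plurality of processes, and sending circuitry configured to send the repacked data to the at least one destination process of the plurality of processes; receiving a set of data for all-to-all data exchange, the set of data being configured as a matrix, the matrix being distributed among the plurality of processes; and transposing the data by: each of the plurality of processes sending matrix data from the process to the repacking circuitry, and the repacking circuitry receiving, repacking, and sending the resulting matrix data to destination processes.
Further in accordance with an exemplary embodiment of the present invention, the method also includes providing a control tree configured to control the plurality of processes and the repacking circuitry.
Still further in accordance with an exemplary embodiment of the present invention, the control tree is further configured to: receive registration messages from each of the plurality of processes; mark a given subgroup of the plurality of processes as ready for operation when registration messages have been received from all members of the given subgroup; when a given subgroup which is a source subgroup and a corresponding subgroup which is a destination subgroup are ready for operation, pair the given source subgroup and the given destination subgroup and assign the given source subgroup and the given destination subgroup to an instance of repacking circuitry; and notify each source subgroup and each destination subgroup when operations relating to each source subgroup and each destination subgroup have completed.
Additionally in accordance with an exemplary embodiment of the present invention, the control tree is configured, in addition to pairing the given source subgroup and the given destination subgroup, to assign the given source subgroup and the given destination subgroup to an instance of data repacking circuitry.
Moreover in accordance with an exemplary embodiment of the present invention, the method also includes providing assignment circuitry other than the control tree, the assignment circuitry being configured to assign the given source subgroup and the given destination subgroup to an instance of data repacking circuitry.
Further in accordance with an exemplary embodiment of the present invention, the control tree includes a reduction tree.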
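There is also provided, in accordance with another exemplary embodiment of the present invention, apparatus including: receiving circuitry configured to receive at least one block of data from at least one source process of a plurality of processes, each of the plurality of processes being configured to hold a block of data destined for other ones of the plurality of processes; at least one instance of data repacking circuitry configured to repack the received data in accordance with at least one destination process of the plurality of processes; and sending circuitry configured to send the repacked data to the at least one destination process of the plurality of processes, the apparatus being configured to receive a set of data for all-to-all data exchange, the set of data being configured as a matrix, the matrix being distributed among the plurality of processes, and the apparatus being further configured to transpose the data by: receiving, at the repacking circuitry, from each of the plurality of processes, matrix data from the process, and the data repacking circuitry receiving, repacking, and sending the resulting matrix data to destination processes.
Further in accordance with an exemplary embodiment of the present invention, the apparatus also includes a control tree configured to control the plurality of processes and the repacking circuitry.
Still further in accordance with an exemplary embodiment of the present invention, the control tree is further configured to: receive registration messages from each of the plurality of processes; mark a given subgroup of the plurality of processes as ready for operation when registration messages have been received from all members of the given subgroup; when a given subgroup which is a source subgroup and a corresponding subgroup which is a destination subgroup are ready for operation, pair the given source subgroup and the given destination subgroup and assign the given source subgroup and the given destination subgroup to an instance of data repacking circuitry; and notify each source subgroup and each destination subgroup when operations relating to each source subgroup and each destination subgroup have completed.
Additionally in accordance with an exemplary embodiment of the present invention, the control tree is configured, in addition to pairing the given source subgroup and the given destination subgroup, to assign the given source subgroup and the given destination subgroup to a given instance of data repacking circuitry.
Moreover in accordance with an exemplary embodiment of the present invention, the apparatus also includes assignment circuitry other than the control tree, the assignment circuitry being configured to assign the given source subgroup and the given destination subgroup to a given instance of data repacking circuitry.
Further in accordance with an exemplary embodiment of the present invention, the control tree includes a reduction tree.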
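Brief Description of the Drawings
The present invention will be understood and appreciated more fully from the following detailed description, taken in conjunction with the drawings, in which:
Fig. 1A is a simplified illustration of an exemplary computer system constructed and operative in accordance with an exemplary embodiment of the present invention;
Fig. 1B is a simplified illustration of an exemplary data block layout;
Fig. 2 is a simplified illustration of another exemplary data block layout;
Fig. 3 is a simplified illustration depicting the initial and final stages of all-to-all-v;
Fig. 4 is a simplified illustration depicting direct pairwise exchange;
Fig. 5 is a simplified illustration depicting an aggregation algorithm;
Fig. 6 is a simplified illustration depicting the initial block distribution for an all-to-all operation, in accordance with an exemplary embodiment of the present invention;
Fig. 7 is a simplified illustration depicting the final block distribution for an all-to-all operation, in accordance with an exemplary embodiment of the present invention;
Fig. 8 is a simplified illustration depicting an all-to-all submatrix distribution, in accordance with another exemplary embodiment of the present invention; and
Fig. 9 is a simplified illustration depicting the transposition of a sub-block, in accordance with an exemplary embodiment of the present invention.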
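Detailed Description
The all-to-all operation, defined in communication standards such as the Message Passing Interface (MPI) (MPI Forum, 2015), is a collective data operation in which each process sends data to every other process in the collective group and receives the same amount of data from each process in the group. The data sent to each process is of the same length, a, and is unique, originating from distinct memory locations. In communication standards such as MPI, the concept of operations on processes is decoupled from any particular hardware infrastructure. A collective group, as discussed herein, refers to a group of processes over which a (collective) operation is defined. In the MPI specification a collective group is called a "communicator", while in OpenSHMEM (see, for example, www.openshmem.org/site/) a collective group is called a "team".
Reference is now made to Fig. 1A, which is a simplified illustration of an exemplary computer system constructed and operative in accordance with an exemplary embodiment of the present invention. The system of Fig. 1A, generally designated 110, includes a plurality of processes 120, with data (typically data blocks) 130 flowing between them. The term "data block" (in its various grammatical forms), as used herein, refers to the data sent from a member (process, rank, ...) i to a member j within a collective group. As explained elsewhere herein, for all-to-all the size of all blocks is the same (and can be 0), whereas for all-to-all-v/w no uniformity in the size of the data blocks is assumed, and some or all of the blocks may be 0.
An exemplary method of operation of the system of Fig. 1A is described below. In Fig. 1A, by way of non-limiting example, a plurality of CPUs (including CPU 1, CPU 2, and CPU N) interconnected in a system-on-chip are shown running the plurality of processes 120. Other examples of systems include, by way of non-limiting example: a single CPU; a plurality of systems or servers connected by a network; or any other appropriate system. As described above, the concept of operations on processes, as described herein, is decoupled from any particular hardware infrastructure, although it is appreciated that in any actual implementation some hardware infrastructure (as shown in Fig. 1A or as otherwise described above) would be used.
Reference is now made to Fig. 1B, which is a simplified illustration of an exemplary data block layout 175 including a plurality of data blocks 180, and to Fig. 2, which is a simplified illustration of another exemplary data block layout 210 including a plurality of data blocks 220. Fig. 1B shows the exemplary data block layout 175 before an all-to-all operation is applied, while Fig. 2 shows the corresponding data block layout 210 after the all-to-all operation is applied. Each data block 180 in Fig. 1B and each data block 220 in Fig. 2 corresponds to a vector of length a.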
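Algorithms used to implement the all-to-all operation generally fall into two categories: direct exchange algorithms and aggregation algorithms.
All-to-all aggregation algorithms aim to reduce latency costs, which dominate short data transfers. They use data forwarding to reduce the number of messages sent, thereby reducing latency costs. Such approaches gather/scatter data from/to multiple sources, producing fewer and larger data transfers, but send a given piece of data multiple times. When the number of communication contexts participating in the collective operation becomes too large, aggregation techniques become less efficient than a direct data exchange, because of the growing cost of transferring a given piece of data multiple times. All-to-all algorithms take advantage of the fact that the data length a is a constant of the algorithm, which provides sufficient global knowledge to coordinate data exchange through intermediate processes.
Direct exchange algorithms are typically used for all-to-all instances in which the length a of the data being transferred exceeds the threshold above which bandwidth contributions dominate, or in which the aggregation techniques would aggregate data from so many processes that they become inefficient.
With growing system sizes, the need to support efficient implementations of small-data all-to-all exchanges is increasing, as this is a data exchange pattern used by many high-performance computing (HPC) applications. The present invention, in exemplary embodiments thereof, presents a new all-to-all algorithm designed to increase the efficiency of small data exchanges over the full range of communicator sizes. This includes a new aggregation-based algorithm suitable for small-data individualized all-to-all data exchange, which may be viewed as transposing a distributed matrix. While transposition, in its various grammatical forms, is used throughout the present specification and claims, it is appreciated that transposition comprises a way of conceptualizing algorithms in accordance with exemplary embodiments of the present invention; for example, and without limiting the generality of the foregoing, there may be no such conceptualization at the level of (for example) the MPI standard. In exemplary embodiments, such transposition comprises changing the position of blocks relative to other blocks, without changing the structure within any block. The algorithms described herein with reference to exemplary embodiments of the present invention benefit from the large amount of concurrency available in the network, and are designed to be simple and efficient to implement in network hardware. In exemplary embodiments, both switching hardware and host channel adapter implementations are targeted by this new design.
The individualized all-to-all-v/w algorithm is similar in some respects to the individualized all-to-all data exchange. The individualized all-to-all-w algorithm differs from the all-to-all-v algorithm in that the data type of each individual transfer may be unique across the function. A change is made to the all-to-all algorithm to support this collective operation. More specifically regarding data types: data transferred using the MPI standard interface specifies a data type for all data, such as MPI_DOUBLE for double-precision words. The all-to-all-v interface specifies that all data elements have the same data type. All-to-all-w allows a different data type to be specified for each block of data, such as, for example, a data type for the data going from process i to process j.
The all-to-all-v/w operation is used by each process to exchange unique data with every other process in the group of processes participating in this collective operation. The size of the data exchanged between two given processes may be asymmetric, and each pair of processes may have a different data pattern than other pairs, with large variations in the data sizes being exchanged. A given rank need only have local, API-level information on the data exchanges in which it participates.
The individualized all-to-all-v/w algorithm targeted at hardware implementation is somewhat similar to the individualized all-to-all algorithm, but requires more metadata describing the detailed data lengths. In addition, the algorithm handles only messages below a pre-specified threshold. For larger messages, direct data exchange is used.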
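Previously, the algorithms used to implement the all-to-all function have fallen into two broad categories:
-direct data exchange
-aggregation algorithms
The base algorithm definition describes data exchange between all pairs of processes in the collective group, or MPI communicator in the MPI definition. The term "base algorithm" refers to an algorithm definition at the interface level: logically, what the function is and does, not how the function result is achieved. Thus, by way of particular non-limiting example, the base description of all-to-all-v is that each process sends a block of data to all processes in the group. In certain exemplary embodiments of the present invention, by way of particular non-limiting example, methods are described for implementing particular functions by aggregating data and using the communication patterns described herein. In general, the algorithm definition conceptually requires O(N^2) data exchanges, where N is the group size.
Reference is now made to Fig. 3, which is a simplified illustration depicting the initial and final stages of all-to-all-v.
Fig. 3 provides an example of individualized all-to-all-v, showing the initial (reference numeral 310) and final (reference numeral 320) stages. In Fig. 3, the notation (i,j) denotes a data segment that starts out at rank i in position j and should be delivered to rank j at position i. The data sizes of the segments may all differ (and may even be of zero length). The offsets of the send and receive locations may also differ.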
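The variable-size exchange of Fig. 3 can be illustrated with a minimal single-address-space sketch. The function name `alltoallv` and the parameter names `send_counts` and `send_displs` are assumptions made for illustration (they loosely mirror the counts-and-displacements style of MPI interfaces); this is a simulation, not the patented hardware path.

```python
def alltoallv(send_bufs, send_counts, send_displs):
    """Simulate an individualized all-to-all-v exchange in one address space.

    send_bufs[i]      -- flat list owned by rank i
    send_counts[i][j] -- number of elements rank i sends to rank j (may be 0)
    send_displs[i][j] -- offset in send_bufs[i] where that segment starts
    Returns recv, where recv[j][i] is the segment rank j received from rank i.
    """
    n = len(send_bufs)
    recv = [[None] * n for _ in range(n)]
    for i in range(n):          # source rank
        for j in range(n):      # destination rank
            start = send_displs[i][j]
            recv[j][i] = send_bufs[i][start:start + send_counts[i][j]]
    return recv


# Three ranks; segment sizes differ per (source, destination) pair.
send_bufs = [
    ["a0", "a1", "a2"],
    ["b0"],
    ["c0", "c1", "c2", "c3"],
]
send_counts = [[1, 2, 0], [0, 0, 1], [2, 1, 1]]
send_displs = [[0, 1, 3], [0, 0, 0], [0, 2, 3]]
received = alltoallv(send_bufs, send_counts, send_displs)
```

Zero-length segments simply show up as empty lists at the receiver, which matches the statement that segments may be of zero length.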
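The direct data exchange implementation of the function is the simplest implementation of the all-to-all function. A naive implementation places many messages on the network and can severely degrade network utilization by causing congestion and endpoint n→1 contention. (The term "endpoint", as used herein, denotes an entity, such as a process or thread, which contributes data to a collective operation.) As a result, algorithms that implement direct data exchange use communication patterns, such as the pairwise exchange shown in Fig. 4 (Jelena Pjevsivac-Grbovic, 2007), to reduce network load and endpoint contention. For large-message exchanges, which are bandwidth limited, direct data exchange algorithms tend to make good use of network resources. However, when data exchanges are small, latency and message-rate costs dominate the overall algorithm cost, scale linearly with N, and do not make good use of system resources. Specifically, Fig. 4 depicts a non-limiting example of a direct pairwise exchange pattern for the exchanges involving process 0. Each exchange is of length a, with bi-directional data exchange.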
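One common way to build a bidirectional pairwise schedule is sketched below. It pairs ranks with an XOR rule and assumes a power-of-two group size; both are assumptions of this sketch and are not stated in the source, whose Fig. 4 may use a different pairing.

```python
def pairwise_partners(n):
    """Return partners[s][r]: the peer of rank r in step s of a pairwise exchange.

    Uses the XOR pairing r ^ step, so every step is a perfect matching and
    each exchange is bidirectional. Assumes n is a power of two.
    """
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two group size")
    return [[r ^ step for r in range(n)] for step in range(1, n)]


partners = pairwise_partners(4)
# Peers of process 0 over the N-1 steps, one exchange of length a per step.
peers_of_rank0 = [step[0] for step in partners]
```

Because each rank talks to exactly one peer per step, no endpoint receives from several senders at once, which is the n→1 contention the text describes; the price is N-1 steps per process.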
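Aggregation algorithms (Ana Gainaru, 2016) have been used to implement small-data aggregation, with the Bruck (J. Bruck, 1997) algorithm being perhaps the best known algorithm in this class. With this approach, the number of data exchanges in which each process is involved is O((k-1)*log_k(N)), where N is the collective group size and k is the algorithm radix. Fig. 5 shows the communication pattern for one possible aggregation pattern. Specifically, Fig. 5 depicts a non-limiting example of the send-side data pattern of an aggregation algorithm for arbitrary radix k, assuming N is an integer power of the algorithm radix k. The aggregation algorithms provide better scalability characteristics than direct exchange. The reduction in the number of messages reduces the latency and message-rate costs of the all-to-all operation, but increases bandwidth-related costs. If the group does not get too large, the aggregation algorithms outperform direct exchange algorithms. The message size of each data exchange in the aggregation algorithms scales as O(a*N/k), where a is the all-to-all function message size. As a result, when groups become large, aggregation algorithms are ineffective at reducing the latency of the all-to-all data exchange, and their latency will exceed that of the direct data exchange algorithms.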
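The trade-off described above can be made concrete with a rough per-process cost model. The function names are hypothetical, the constants hidden by the O(...) notation are taken as 1, and N is assumed to be an exact power of k, so the numbers are order-of-magnitude illustrations only.

```python
def direct_cost(n, a):
    """Messages and bytes sent per process by a direct exchange."""
    return {"messages": n - 1, "bytes": (n - 1) * a}


def aggregation_cost(n, k, a):
    """Messages and bytes sent per process by a radix-k aggregation pattern.

    Assumes n is an exact integer power of k, as in the text.
    """
    rounds, size = 0, 1
    while n > size:
        size *= k
        rounds += 1
    if size != n:
        raise ValueError("n must be an integer power of k")
    messages = (k - 1) * rounds          # (k-1) * log_k(n) exchanges
    per_message = a * n // k             # each message aggregates n/k blocks
    return {"messages": messages, "bytes": messages * per_message}


small_direct = direct_cost(64, 8)
small_aggregated = aggregation_cost(64, 2, 8)
```

For 64 processes and 8-byte blocks, aggregation sends roughly a tenth as many messages but about three times as many bytes; as N grows, the byte count grows as N*log(N) per process and eventually outweighs the latency savings.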
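In exemplary embodiments of the present invention, the all-to-all and all-to-all-v/w algorithms are intended to optimize small data exchanges by:
1. Defining multiple aggregation points in the network, either switches or host channel adapters (HCAs).
2. Assigning aggregation points, for data going from a sub-block of the processes to the same or another sub-block of the processes, to individual aggregators within the network infrastructure. This data may be viewed as a sub-matrix of the distributed matrix. A single aggregator may handle multiple blocks of the sub-matrices from a single individualized all-to-all or all-to-all-v/w algorithm.
3. The sub-blocks may be formed of non-contiguous groups of processes, which in certain exemplary embodiments are formed on the fly to handle load imbalance in the calling application. In such a case, the matrix sub-blocks may be non-contiguous.
4. The term "aggregator", as used herein, refers to an entity that aggregates a sub-matrix, transposes it, and then sends the result to its final destinations. In certain exemplary embodiments of the present invention, the aggregator is a logical block within an HCA. The present step 4 may then include having the aggregator:
a. Gather data from all the sources
b. Shuffle the data so that data destined for a particular process can be sent to that destination as a single message. In the present context, the term "shuffle" refers to re-ordering incoming data from different source processes so that data destined for a given destination can be conveniently handled. In certain exemplary embodiments of the present invention, data destined for a single destination may be copied to one contiguous block of memory.
c. Send the data to the destinations
5. In certain preferred embodiments, data discontinuity, and data sources and/or destinations, are handled at the network edge, so that the aggregators handle only contiguous packed data. In other words, data sent by or received by a user need not be contiguous in the user's virtual memory space; this can be pictured as the faces of a cube, in which 2 of the 6 faces will not be in contiguous memory addresses. The hardware sends streams of contiguous data. The "packing", which turns non-contiguous data into contiguous data, is done in a first step (either by using the CPU to pack the data into a contiguous buffer, or by using HCA gather capabilities). Similarly, unpacking non-contiguous data into user buffers can be done either by the HCA delivering the data to a contiguous destination buffer and then using the CPU to unpack, or by using the HCA scatter capabilities. Thus, algorithmic data manipulation in the intermediate steps may operate on contiguous, packed data.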
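The edge handling in item 5 amounts to a gather before sending and a scatter after receiving. The sketch below shows the CPU variant on a small strided example; the helper names `pack` and `unpack` and the 3x4 array are illustrative assumptions, and real HCA gather/scatter would replace these loops.

```python
def pack(buffer, offsets):
    """Gather the elements at the given offsets into one contiguous list."""
    return [buffer[o] for o in offsets]


def unpack(packed, buffer, offsets):
    """Scatter a contiguous list back to the given offsets of buffer."""
    for value, o in zip(packed, offsets):
        buffer[o] = value


# A 3x4 row-major array; its last column is not contiguous in memory.
rows, cols = 3, 4
user_send = list(range(rows * cols))
column_offsets = [r * cols + (cols - 1) for r in range(rows)]
wire = pack(user_send, column_offsets)        # contiguous payload on the wire

user_recv = [None] * (rows * cols)
unpack(wire, user_recv, column_offsets)       # endpoint restores the layout
```

Everything between `pack` and `unpack` sees only the contiguous `wire` list, which is why the intermediate aggregation steps never need to know the user's memory layout.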
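The present invention, in exemplary embodiments thereof, may be viewed as using aggregation points within the network to collect data from non-contiguous portions of a distributed matrix, transpose the data, and send the data to its destinations.
In exemplary embodiments, the invention may be summarized as follows:
1. The data layout is viewed as a distributed matrix, with each process holding a block of data destined for each other process. For the all-to-all algorithm, the data block size is the same for all source data blocks; for all-to-all-v/w, the data block sizes may be of different lengths, including length zero. In the notation used herein, the horizontal index indicates the data source and the vertical index indicates its destination.
2. The collective operation performs a transposition of the data blocks.
3. To transpose the distributed matrix, the matrix is subdivided into rectangular submatrices of dimension dh × dv, where dh is the size in the horizontal dimension and dv is the size in the vertical dimension. Sub-blocks need not be logically contiguous. The submatrices may be predefined, or may be determined at run time based on some criteria such as, by way of non-limiting example, the order of entry into the all-to-all operation.
4. A data repacking unit is provided, which accepts data from a specified set of sources, the data being destined for a specified set of destinations, repacks the data by destination, and sends the data to the specified destinations. In exemplary embodiments, the data repacking unit has a subunit for each of the operations described. In certain exemplary embodiments of the present invention, an aggregator as described herein would include or make use of a data repacking unit.
5. The transposition of a submatrix is assigned to a given data repacking unit, with each unit being assigned multiple submatrices to transpose. In certain exemplary embodiments of the present invention, the assignment may be carried out by the control tree mentioned in item 7 below; alternatively, another component (such as, by way of non-limiting example, a software component) may be provided to carry out the assignment.
6. The data repacking unit may be implemented as appropriate within the system. For example, it may be implemented in a switch ASIC, a host channel adapter (HCA) unit, a CPU, or other suitable hardware, and may be implemented in hardware, firmware, software, or any appropriate combination thereof.
7. A reduction tree is used as a control tree to control the collective operation, as follows:
7.1. Each process in the group registers itself with the control tree by passing an arrival notification to the control tree.
7.2. Once all members of a subgroup have arrived, the subgroup is marked as ready for operation (ready for send/receive).
7.3. When the source and destination groups of a given submatrix are ready, the relevant data repacking unit schedules the data movement.
7.4. Data is transferred from the source processes to the data repacking unit. The unit repacks the data and sends it to the appropriate destinations.
7.5. Each source process is notified of completion, as is each destination process. In certain exemplary embodiments of the present invention, this is accomplished by the aggregator notifying the source and destination blocks of completion; by way of particular non-limiting example, this may be accomplished using the control tree.
7.6. The operation completes locally at each process once all expected data has been received and transfer of all source data is complete.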
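The registration and pairing logic of steps 7.1 to 7.3 can be modeled as a small state machine. The class below is a toy software model only: the name `ControlTree`, the round-robin choice of repacking unit, and the tuple format of a scheduled task are all assumptions of this sketch, and a real reduction tree would be distributed across the network rather than held in one object.

```python
class ControlTree:
    """Toy model of the registration and pairing logic (steps 7.1 to 7.3)."""

    def __init__(self, subgroups, num_units):
        self.subgroups = [frozenset(g) for g in subgroups]
        self.arrived = set()          # ranks that have registered
        self.ready = set()            # indices of sub-groups marked ready
        self.scheduled = []           # (source_idx, dest_idx, unit) in order
        self.num_units = num_units

    def register(self, rank):
        """Record an arrival; return the sub-matrices that became schedulable."""
        self.arrived.add(rank)
        newly = []
        for idx, group in enumerate(self.subgroups):
            if idx not in self.ready and group.issubset(self.arrived):
                self.ready.add(idx)
                # Pair the new sub-group with every ready sub-group, both ways.
                for other in sorted(self.ready):
                    for src, dst in sorted({(idx, other), (other, idx)}):
                        # Round-robin unit assignment (assumption of this sketch).
                        unit = len(self.scheduled) % self.num_units
                        task = (src, dst, unit)
                        self.scheduled.append(task)
                        newly.append(task)
        return newly


tree = ControlTree([{1, 2}, {0, 3}, {4, 5}], num_units=2)
first = [tree.register(rank) for rank in (1, 0, 3)]
```

Because a sub-matrix is scheduled as soon as both of its sub-groups are ready, late arrivals in one sub-group do not delay transposition of sub-matrices that do not involve them.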
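In exemplary embodiments, a more detailed explanation is as follows:
In both the all-to-all and all-to-all-v/w algorithms, each process has a unique block of data destined for each other process in the group. The primary way in which all-to-all differs from all-to-all-v is the data layout pattern. All-to-all data blocks are all of the same size, whereas the all-to-all-v/w algorithms support data blocks of differing sizes, and the data blocks need not be ordered in monotonically increasing order within the user buffer.
The layout of the data blocks for the all-to-all algorithm may be viewed as a distributed matrix, with the all-to-all algorithm transposing this block distribution. It is important to note that, in exemplary embodiments of the present invention, the data within each block is not rearranged in the transposition; only the order of the data blocks themselves is rearranged.
Fig. 6 shows an exemplary all-to-all source data block layout for a group of size six, showing an exemplary initial distribution for the all-to-all operation. Each column represents the data blocks that each process holds for all other processes. Each block is labeled with a two-index label, with the first index indicating the process from which the data originates and the second index being the rank of the block's destination process (the term "rank" is used in accordance with the MPI standard, in which each member of a communicator, that is, the group of processes that defines the collective, is given a rank, or ID).
After the all-to-all operation is applied to the data in the example of Fig. 6, with the data blocks being transposed, the data block layout shown in Fig. 7 results.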
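The relationship between the layouts of Fig. 6 and Fig. 7 can be reproduced with a short simulation. Blocks are stood in for by their (source, destination) labels; the function name `alltoall` and the list-of-lists representation are assumptions of this sketch.

```python
def alltoall(blocks):
    """blocks[i][j] is the block rank i holds for rank j.

    Returns the result layout, where result[j][i] is the block rank j
    received from rank i. Blocks are moved whole; their contents are untouched.
    """
    n = len(blocks)
    return [[blocks[src][dst] for src in range(n)] for dst in range(n)]


n = 6
# Label each block (source, destination), as in the figure description.
initial = [[(i, j) for j in range(n)] for i in range(n)]
final = alltoall(initial)
```

Applying the operation twice returns the original layout, which is the defining property of a transpose.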
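The all-to-all-v/w algorithms perform a similar data transposition. The differences in such a transform are as follows:
1. The data size may differ between blocks, and may even be of length zero.
2. Blocks of data at both the source and destination buffers need not be arranged in increasing order of destination (source buffer) or source (result buffer). The actual block ordering is specified as part of the all-to-all-v/w operation.
Therefore, similar communication patterns may be used to implement all-to-all and all-to-all-v/w.
The actual matrix transform is performed over sub-blocks of data. The term "actual matrix transform" is used herein because the blocks of data transfer defined by the operation can be viewed as a matrix transform when each element in the matrix is a block of data. The columns of the matrix are the blocks of data owned by each process. Each process has a block of data associated with every process in the group, so the matrix can be viewed as a square matrix. For all-to-all the size of all the blocks is identical; for all-to-all-v and all-to-all-w the sizes of the blocks may differ. From a block-wise view of the data layout (not the actual size of each block), all-to-all-v and all-to-all-w are still square.
For the purpose of the transform, a horizontal submatrix dimension dh and a vertical submatrix dimension dv are defined. The sub-block dimensions need not be an integer divisor of the full matrix dimension, and dh and dv need not be equal. Incomplete sub-blocks are allowed; that is, for a given group size, there are subgroups for which the ratio of the group size to the sub-block size is not an integer. This situation gives "leftover" blocks at the edges. By way of particular non-limiting example, such "leftover" blocks would be present in a matrix of size 11 with sub-blocks of size 3. Finally, the vertical and horizontal ranges of values in the full matrix need not be contiguous; for example, when mapped onto the full matrix, such a submatrix may be distributed over several different contiguous blocks of data in the matrix.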
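The "leftover" case can be checked with a small tiling helper. The sketch assumes contiguous index ranges for simplicity, although the text allows non-contiguous ones, and the function names are hypothetical.

```python
def tile_ranges(n, d):
    """Split indices 0..n-1 into consecutive tiles of width d; the last may be short."""
    return [range(start, min(start + d, n)) for start in range(0, n, d)]


def submatrices(n, dh, dv):
    """All (horizontal range, vertical range) pairs that tile an n x n block matrix."""
    return [(h, v) for h in tile_ranges(n, dh) for v in tile_ranges(n, dv)]


tiles = tile_ranges(11, 3)
tile_widths = [len(t) for t in tiles]
covered_blocks = sum(len(h) * len(v) for h, v in submatrices(11, 3, 3))
```

For a matrix of size 11 with sub-blocks of size 3, the last tile in each dimension has width 2, so the edge submatrices are 3x2, 2x3, and 2x2 while the tiling still covers all 121 blocks.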
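As an example, if we take dh = dv = 2, and we use process groups {1,2}, {0,3}, and {4,5} to sub-block the matrix, Fig. 8 uses coding [a] through [i] to show how the full matrix may be subdivided into 2×2 sub-blocks in one non-limiting example. Note that there are three distributed sub-blocks in the example: 1) data blocks (0,0)(0,3)(3,0)(3,3), denoted [a]; 2) data blocks (0,1)(0,2)(3,1)(3,2), denoted [c]; and 3) data blocks (1,0)(2,0)(1,3)(2,3), denoted [b].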
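The three distributed sub-blocks named above can be enumerated directly from the process groups. The helper name `sub_block` and the dictionary keyed by (source group index, destination group index) are assumptions of this sketch; block labels follow the (source, destination) convention used in the text.

```python
from itertools import product


def sub_block(source_group, dest_group):
    """Sorted (source, destination) labels of the blocks in one sub-matrix."""
    return sorted(product(sorted(source_group), sorted(dest_group)))


groups = [{1, 2}, {0, 3}, {4, 5}]
all_sub_blocks = {
    (si, di): sub_block(s, d)
    for (si, s), (di, d) in product(enumerate(groups), repeat=2)
}
block_a = sub_block({0, 3}, {0, 3})
block_c = sub_block({0, 3}, {1, 2})
block_b = sub_block({1, 2}, {0, 3})
```

Three groups give nine 2×2 sub-blocks in total; the three that involve group {0,3} are spread over non-adjacent rows or columns of the full matrix, which is what makes them "distributed".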
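In exemplary embodiments of the present invention, the full end-to-end all-to-all is orchestrated using a reduction tree. As processes make calls to the collective operation, each process registers the collective operation with the reduction tree. When all the members of a subgroup have registered the operation, the subgroup is marked as active. When both the source and destination subgroups are active, the submatrix they define may be transposed.
In certain exemplary embodiments of the present invention, the collective operation is executed in the following manner:
1. Each process in the group registers itself with the control tree by passing an arrival notification to the controller.
2. Once all members of a subgroup have arrived, the subgroup is marked as ready for operation.
3. When a source group and a destination group are ready, they are paired and assigned to a data repacking unit.
4. Data is transferred from the source processes to the data repacking unit. The unit repacks the data and sends it to the appropriate destinations.
5. Each source process is notified of completion, as is each destination process.
6. The operation completes locally at each process once all expected data has been received and transfer of all source data is complete.
Fig. 9 shows how, in a non-limiting exemplary embodiment, one of the data repacking units 910 in the system is used to transpose the submatrix defined by the horizontal subgroup {0,3} and the vertical subgroup {1,2}. Processes 0 and 3 each send their portion of the submatrix to the data repacking unit, which rearranges the data and sends it to processes 1 and 2. In the specific non-limiting example shown in Fig. 9, process 0 has data elements (0,1) and (0,2), and process 3 has data elements (3,1) and (3,2). This data is sent to the controller, which sends (0,1) and (3,1) to process 1 and sends (0,2) and (3,2) to process 2. Final data placement in the result buffer is handled by the endpoint. In general, in exemplary embodiments, the repacking unit 910 treats all data that it processes as a contiguous "blob" of data; the repacking unit 910 does not recognize any structure in the data. The final data distribution at the endpoints within each block may be contiguous, in which case the repacking unit and the destination process will have the same view of the data. However, the final data layout at the destination process may be non-contiguous, in which case it is the endpoint that distributes the data appropriately at the destination. It is appreciated that the endpoint or any other appropriate system component may distribute the data appropriately.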
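The Fig. 9 example can be traced with a minimal software stand-in for a repacking unit. The function name `repack`, the dictionary-based message format, and the byte payloads are assumptions of this sketch; the unit treats each payload as an opaque blob, in line with the description above.

```python
from collections import defaultdict


def repack(incoming):
    """Regroup blocks by destination.

    incoming: {source_rank: [((src, dst), payload), ...]}
    Returns {dst: [((src, dst), payload), ...]} ordered by source rank,
    that is, one outgoing message per destination.
    """
    by_dest = defaultdict(list)
    for src in sorted(incoming):
        for (block_src, dst), payload in incoming[src]:
            by_dest[dst].append(((block_src, dst), payload))
    return dict(by_dest)


# Processes 0 and 3 send their parts of the {0,3} x {1,2} submatrix.
incoming = {
    0: [((0, 1), b"A"), ((0, 2), b"B")],
    3: [((3, 1), b"C"), ((3, 2), b"D")],
}
outgoing = repack(incoming)
```

Two incoming messages (one per source) become two outgoing messages (one per destination), each carrying both blocks for that destination; placement of the payloads in the final result buffer is left to the endpoint.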
References
Ana Gainaru, R. L. Graham, Artem Polyakov, Gilad Shainer (2016). Using InfiniBand Hardware Gather-Scatter Capabilities to Optimize MPI All-to-All (Vol. Proceedings of the 23rd European MPI Users' Group Meeting). Edinburgh, United Kingdom: ACM.
MPI Forum (2015). Message Passing Interface. Knoxville: University of Tennessee.
J. Bruck, Ching-Tien Ho, Shlomo Kipnis, Derrick Weathersby (1997). Efficient algorithms for all-to-all communications in multi-port message-passing systems. In IEEE Transactions on Parallel and Distributed Systems, pages 298–309.
Jelena Pjevsivac-Grbovic, Thara Angskun, George Bosilca, Graham Fagg, Edgar Gabriel, Jack Dongarra (2007). Performance analysis of MPI collective operations. Cluster Computing.
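It is appreciated that software components of the present invention may, if desired, be implemented in ROM (read only memory) form. The software components may, generally, be implemented in hardware, if desired, using conventional techniques. It is further appreciated that the software components may be instantiated, for example, as a computer program product or on a tangible medium. In some cases, it may be possible to instantiate the software components as a signal interpretable by an appropriate computer, although such an instantiation may be excluded in certain embodiments of the present invention.
It is appreciated that various features of the invention which are, for clarity, described in the contexts of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable subcombination.
It will be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described hereinabove. Rather, the scope of the invention is defined by the appended claims and equivalents thereof.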

Claims (14)

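1. A method comprising:
providing a plurality of processes, each of the plurality of processes being configured to hold a block of data destined for other ones of the plurality of processes;
providing at least one instance of data repacking circuitry comprising:
receiving circuitry configured to receive at least one block of data from at least one source process of the plurality of processes;
repacking circuitry configured to repack the received data in accordance with at least one destination process of the plurality of processes; and
sending circuitry configured to send the repacked data to the at least one destination process of the plurality of processes;
receiving a set of data for all-to-all data exchange, the set of data being configured as a matrix, the matrix being distributed among the plurality of processes; and
transposing the data by: each of the plurality of processes sending matrix data from the process to the data repacking circuitry; and the data repacking circuitry receiving, repacking, and sending the resulting matrix data to destination processes.
2. The method according to claim 1, further comprising providing a control tree configured to control the plurality of processes and the repacking circuitry.
3. The method according to claim 2, wherein the control tree is further configured to:
receive registration messages from each of the plurality of processes;
mark a given subgroup of the plurality of processes as ready for operation when registration messages have been received from all members of the given subgroup;
when a given subgroup which is a source subgroup and a corresponding subgroup which is a destination subgroup are ready for operation, pair the given source subgroup and the given destination subgroup and assign the given source subgroup and the given destination subgroup to an instance of data repacking circuitry; and
notify each said source subgroup and each said destination subgroup when operations relating to each said source subgroup and each said destination subgroup have completed.
4. The method according to claim 3, wherein the control tree is configured, in addition to pairing the given source subgroup and the given destination subgroup, to assign the given source subgroup and the given destination subgroup to an instance of data repacking circuitry.
5. The method according to claim 3, further comprising providing assignment circuitry other than the control tree, the assignment circuitry being configured to assign the given source subgroup and the given destination subgroup to an instance of data repacking circuitry.
6. The method according to claim 2, wherein the control tree comprises a reduction tree.
7. The method according to claim 6, further comprising providing assignment circuitry other than the control tree, the assignment circuitry being configured to assign the given source subgroup and the given destination subgroup to an instance of data repacking circuitry.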
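8. Apparatus comprising:
receiving circuitry configured to receive at least one block of data from at least one source process of a plurality of processes, each of the plurality of processes being configured to hold a block of data destined for other ones of the plurality of processes;
at least one instance of data repacking circuitry configured to repack the received data in accordance with at least one destination process of the plurality of processes; and
sending circuitry configured to send the repacked data to the at least one destination process of the plurality of processes,
the apparatus being configured to receive a set of data for all-to-all data exchange, the set of data being configured as a matrix, the matrix being distributed among the plurality of processes; and
the apparatus being further configured to transpose the data by: receiving, at the repacking circuitry, from each of the plurality of processes, matrix data from the process; and the data repacking circuitry receiving, repacking, and sending the resulting matrix data to destination processes.
9. The apparatus according to claim 8, further comprising a control tree configured to control the plurality of processes and the repacking circuitry.
10. The apparatus according to claim 9, wherein the control tree is further configured to:
receive registration messages from each of the plurality of processes;
mark a given subgroup of the plurality of processes as ready for operation when registration messages have been received from all members of the given subgroup;
when a given subgroup which is a source subgroup and a corresponding subgroup which is a destination subgroup are ready for operation, pair the given source subgroup and the given destination subgroup and assign the given source subgroup and the given destination subgroup to an instance of data repacking circuitry; and
notify each said source subgroup and each said destination subgroup when operations relating to each said source subgroup and each said destination subgroup have completed.
11. The apparatus according to claim 10, wherein the control tree is configured, in addition to pairing the given source subgroup and the given destination subgroup, to assign the given source subgroup and the given destination subgroup to a given instance of data repacking circuitry.
12. The apparatus according to claim 10, further comprising assignment circuitry other than the control tree, the assignment circuitry being configured to assign the given source subgroup and the given destination subgroup to a given instance of data repacking circuitry.
13. The apparatus according to claim 12, wherein the control tree comprises a reduction tree.
14. The apparatus according to claim 9, wherein the control tree comprises a reduction tree.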
CN202010117006.2A 2019-02-25 2020-02-25 Collective communication system and methods Active CN111614581B (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201962809786P 2019-02-25 2019-02-25
US62/809,786 2019-02-25
EP20156490.3 2020-02-10
EP20156490.3A EP3699770A1 (en) 2019-02-25 2020-02-10 Collective communication system and methods

Publications (2)

Publication Number Publication Date
CN111614581A true CN111614581A (zh) 2020-09-01
CN111614581B CN111614581B (zh) 2022-07-05

Family

ID=69645874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010117006.2A Active CN111614581B (zh) 2019-02-25 2020-02-25 Collective communication system and methods

Country Status (3)

Country Link
US (3) US11196586B2 (zh)
EP (1) EP3699770A1 (zh)
CN (1) CN111614581B (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3699770A1 (en) 2019-02-25 2020-08-26 Mellanox Technologies TLV Ltd. Collective communication system and methods
US11750699B2 (en) 2020-01-15 2023-09-05 Mellanox Technologies, Ltd. Small message aggregation
US11876885B2 (en) 2020-07-02 2024-01-16 Mellanox Technologies, Ltd. Clock queue with arming and/or self-arming features
US11836549B2 (en) * 2020-10-15 2023-12-05 Advanced Micro Devices, Inc. Fast block-based parallel message passing interface transpose
US11556378B2 (en) 2020-12-14 2023-01-17 Mellanox Technologies, Ltd. Offloading execution of a multi-task parameter-dependent operation to a network device
US11934332B2 (en) 2022-02-01 2024-03-19 Mellanox Technologies, Ltd. Data shuffle offload
US11922237B1 (en) 2022-09-12 2024-03-05 Mellanox Technologies, Ltd. Single-step collective operations

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101267448A (zh) * 2008-05-09 2008-09-17 东北大学 Intelligent protocol conversion device and method based on an embedded QNX operating system
US20100017420A1 (en) * 2008-07-21 2010-01-21 International Business Machines Corporation Performing An All-To-All Data Exchange On A Plurality Of Data Buffers By Performing Swap Operations
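CN101854556A (zh) * 2009-03-30 2010-10-06 索尼公司 Information processing apparatus and method
CN102915031A (zh) * 2012-10-25 2013-02-06 中国科学技术大学 Intelligent self-calibration system for kinematic parameters of a parallel robot
CN104662855A (zh) * 2012-06-25 2015-05-27 科希尔技术股份有限公司 Modulation and equalization in an orthonormal time-frequency shifting communications system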
US20150193269A1 (en) * 2014-01-06 2015-07-09 International Business Machines Corporation Executing an all-to-allv operation on a parallel computer that includes a plurality of compute nodes

Family Cites Families (276)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB8704883D0 (en) 1987-03-03 1987-04-08 Hewlett Packard Co Secure information storage
US5068877A (en) 1990-04-02 1991-11-26 At&T Bell Laboratories Method for synchronizing interconnected digital equipment
US5353412A (en) 1990-10-03 1994-10-04 Thinking Machines Corporation Partition control circuit for separately controlling message sending of nodes of tree-shaped routing network to divide the network into a number of partitions
US5325500A (en) 1990-12-14 1994-06-28 Xerox Corporation Parallel processing units on a substrate, each including a column of memory
WO1993007691A1 (en) 1991-10-01 1993-04-15 Norand Corporation A radio frequency local area network
JPH0752437B2 (ja) 1991-08-07 1995-06-05 インターナショナル・ビジネス・マシーンズ・コーポレイション メッセージの進行を追跡する複数ノード・ネットワーク
US5408469A (en) 1993-07-22 1995-04-18 Synoptics Communications, Inc. Routing device utilizing an ATM switch as a multi-channel backplane in a communication network
US6072796A (en) 1995-06-14 2000-06-06 Avid Technology, Inc. Apparatus and method for accessing memory in a TDM network
US5606703A (en) 1995-12-06 1997-02-25 International Business Machines Corporation Interrupt protocol system and method using priority-arranged queues of interrupt status block control data structures
US5944779A (en) 1996-07-02 1999-08-31 Compbionics, Inc. Cluster of workstations for solving compute-intensive applications by exchanging interim computation results using a two phase communication protocol
US6041049A (en) 1997-05-06 2000-03-21 International Business Machines Corporation Method and apparatus for determining a routing table for each node in a distributed nodal system
US6434620B1 (en) 1998-08-27 2002-08-13 Alacritech, Inc. TCP/IP offload network interface device
US6381682B2 (en) 1998-06-10 2002-04-30 Compaq Information Technologies Group, L.P. Method and apparatus for dynamically sharing memory in a multiprocessor system
US6438137B1 (en) 1997-12-22 2002-08-20 Nms Communications Corporation Packet-based trunking
US6115394A (en) 1998-03-04 2000-09-05 Ericsson Inc. Methods, apparatus and computer program products for packet transport over wireless communication links
US6507562B1 (en) 1998-06-30 2003-01-14 Sun Microsystems, Inc. Dynamic optimization for receivers using distance between a repair head and a member station in a repair group for receivers having a closely knit topological arrangement to locate repair heads near the member stations which they serve in tree based repair in reliable multicast protocol
US20190116159A9 (en) 1998-10-30 2019-04-18 Virnetx, Inc. Agile protocol for secure communications with assured system availability
US7418504B2 (en) 1998-10-30 2008-08-26 Virnetx, Inc. Agile network protocol for secure communications using secure domain names
US10511573B2 (en) 1998-10-30 2019-12-17 Virnetx, Inc. Agile network protocol for secure communications using secure domain names
US6483804B1 (en) 1999-03-01 2002-11-19 Sun Microsystems, Inc. Method and apparatus for dynamic packet batching with a high performance network interface
US7102998B1 (en) 1999-03-22 2006-09-05 Lucent Technologies Inc. Scaleable congestion control method for multicast communications over a data network
US6370502B1 (en) 1999-05-27 2002-04-09 America Online, Inc. Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
JP2003503787A (ja) 1999-06-25 2003-01-28 マッシブリー パラレル コンピューティング, インコーポレイテッド 大規模集合ネットワークによる処理システムおよびその方法
US7124180B1 (en) 2000-04-27 2006-10-17 Hewlett-Packard Development Company, L.P. Internet usage data recording system and method employing a configurable rule engine for the processing and correlation of network data
JP2001313639A (ja) 2000-04-27 2001-11-09 Nec Corp ネットワーク構成データ管理システム及び方法並びに記録媒体
US6728862B1 (en) 2000-05-22 2004-04-27 Gazelle Technology Corporation Processor array and parallel data processing methods
US7171484B1 (en) 2000-05-24 2007-01-30 Krause Michael R Reliable datagram transport service
US7418470B2 (en) 2000-06-26 2008-08-26 Massively Parallel Technologies, Inc. Parallel processing systems and method
US7164422B1 (en) 2000-07-28 2007-01-16 Ab Initio Software Corporation Parameterized graphs with conditional components
US6816492B1 (en) 2000-07-31 2004-11-09 Cisco Technology, Inc. Resequencing packets at output ports without errors using packet timestamps and timestamp floors
US6937576B1 (en) 2000-10-17 2005-08-30 Cisco Technology, Inc. Multiple instance spanning tree protocol
US20020150094A1 (en) 2000-10-27 2002-10-17 Matthew Cheng Hierarchical level-based internet protocol multicasting
US7346698B2 (en) 2000-12-20 2008-03-18 G. W. Hannaway & Associates Webcasting method and system for time-based synchronization of multiple, independent media streams
CA2437629A1 (en) 2001-02-24 2002-09-06 International Business Machines Corporation Arithmetic functions in torus and tree networks
EP1381959A4 (en) 2001-02-24 2008-10-29 Ibm GLOBAL ARBORESCENT NETWORK FOR CALCULATION STRUCTURES
US20020152328A1 (en) 2001-04-11 2002-10-17 Mellanox Technologies, Ltd. Network adapter with shared database for message context information
US8051212B2 (en) 2001-04-11 2011-11-01 Mellanox Technologies Ltd. Network interface adapter with shared data send resources
EP1265124B1 (de) 2001-06-07 2004-05-19 Siemens Aktiengesellschaft Verfahren zum Übermitteln von Zeitinformation über ein Datenpaketnetz
US20030018828A1 (en) 2001-06-29 2003-01-23 International Business Machines Corporation Infiniband mixed semantic ethernet I/O path
US7383421B2 (en) 2002-12-05 2008-06-03 Brightscale, Inc. Cellular engine for a data processing system
US6789143B2 (en) 2001-09-24 2004-09-07 International Business Machines Corporation Infiniband work and completion queue management via head and tail circular buffers with indirect work queue entries
US20030065856A1 (en) 2001-10-03 2003-04-03 Mellanox Technologies Ltd. Network adapter with multiple event queues
US6754735B2 (en) 2001-12-21 2004-06-22 Agere Systems Inc. Single descriptor scatter gather data transfer to or from a host processor
US7224669B2 (en) 2002-01-22 2007-05-29 Mellandx Technologies Ltd. Static flow rate control
US7245627B2 (en) 2002-04-23 2007-07-17 Mellanox Technologies Ltd. Sharing a network interface card among multiple hosts
US7370117B2 (en) 2002-09-26 2008-05-06 Intel Corporation Communication system and method for communicating frames of management information in a multi-station network
US7167850B2 (en) 2002-10-10 2007-01-23 Ab Initio Software Corporation Startup and control of graph-based computation
US7310343B2 (en) 2002-12-20 2007-12-18 Hewlett-Packard Development Company, L.P. Systems and methods for rapid selection of devices in a tree topology network
US7584303B2 (en) 2002-12-20 2009-09-01 Forte 10 Networks, Inc. Lossless, stateful, real-time pattern matching with deterministic memory resources
US20040252685A1 (en) 2003-06-13 2004-12-16 Mellanox Technologies Ltd. Channel adapter with integrated switch
US20040260683A1 (en) 2003-06-20 2004-12-23 Chee-Yong Chan Techniques for information dissemination using tree pattern subscriptions and aggregation thereof
US20050097300A1 (en) 2003-10-30 2005-05-05 International Business Machines Corporation Processing system and method including a dedicated collective offload engine providing collective processing in a distributed computing environment
US7810093B2 (en) 2003-11-14 2010-10-05 Lawrence Livermore National Security, Llc Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes
US7219170B2 (en) 2003-12-04 2007-05-15 Intel Corporation Burst transfer register arrangement
US20050129039A1 (en) 2003-12-11 2005-06-16 International Business Machines Corporation RDMA network interface controller with cut-through implementation for aligned DDP segments
US7680670B2 (en) 2004-01-30 2010-03-16 France Telecom Dimensional vector and variable resolution quantization
US7327693B1 (en) 2004-03-30 2008-02-05 Cisco Technology, Inc. Method and apparatus for precisely measuring a packet transmission time
US20050223118A1 (en) 2004-04-05 2005-10-06 Ammasso, Inc. System and method for placement of sharing physical buffer lists in RDMA communication
US8041799B1 (en) 2004-04-30 2011-10-18 Sprint Communications Company L.P. Method and system for managing alarms in a communications network
JP4156568B2 (ja) 2004-06-21 2008-09-24 富士通株式会社 通信システムの制御方法、通信制御装置、プログラム
US7624163B2 (en) 2004-10-21 2009-11-24 Apple Inc. Automatic configuration information generation for distributed computing environment
US7336646B2 (en) 2004-10-26 2008-02-26 Nokia Corporation System and method for synchronizing a transport stream in a single frequency network
US7356625B2 (en) 2004-10-29 2008-04-08 International Business Machines Corporation Moving, resizing, and memory management for producer-consumer queues by consuming and storing any queue entries from an old queue before entries from a new queue
US7555549B1 (en) 2004-11-07 2009-06-30 Qlogic, Corporation Clustered computing model and display
US8698817B2 (en) 2004-11-15 2014-04-15 Nvidia Corporation Video processor having scalar and vector components
US7620071B2 (en) 2004-11-16 2009-11-17 Intel Corporation Packet coalescing
US7613774B1 (en) 2005-03-01 2009-11-03 Sun Microsystems, Inc. Chaperones in a distributed system
US20060282838A1 (en) 2005-06-08 2006-12-14 Rinku Gupta MPI-aware networking infrastructure
US7770088B2 (en) 2005-12-02 2010-08-03 Intel Corporation Techniques to transmit network protocol units
US7817580B2 (en) 2005-12-07 2010-10-19 Cisco Technology, Inc. Preventing transient loops in broadcast/multicast trees during distribution of link state information
US7760743B2 (en) 2006-03-06 2010-07-20 Oracle America, Inc. Effective high availability cluster management and effective state propagation for failure recovery in high availability clusters
US7743087B1 (en) 2006-03-22 2010-06-22 The Math Works, Inc. Partitioning distributed arrays according to criterion and functions applied to the distributed arrays
US8074026B2 (en) 2006-05-10 2011-12-06 Intel Corporation Scatter-gather intelligent memory architecture for unstructured streaming data on multiprocessor systems
US7996583B2 (en) 2006-08-31 2011-08-09 Cisco Technology, Inc. Multiple context single logic virtual host channel adapter supporting multiple transport protocols
CN101163240A (zh) 2006-10-13 2008-04-16 国际商业机器公司 Filtering device and method thereof
US8094585B2 (en) 2006-10-31 2012-01-10 International Business Machines Corporation Membership management of network nodes
US7895601B2 (en) 2007-01-10 2011-02-22 International Business Machines Corporation Collective send operations on a system area network
US7949890B2 (en) 2007-01-31 2011-05-24 Net Power And Light, Inc. Method and system for precise synchronization of audio and video streams during a distributed communication session with multiple participants
US8380880B2 (en) 2007-02-02 2013-02-19 The Mathworks, Inc. Scalable architecture
US7913077B2 (en) 2007-02-13 2011-03-22 International Business Machines Corporation Preventing IP spoofing and facilitating parsing of private data areas in system area network connection requests
US7835391B2 (en) 2007-03-07 2010-11-16 Texas Instruments Incorporated Protocol DMA engine
CN101282276B (zh) 2007-04-03 2011-11-09 Huawei Technologies Co., Ltd. Method and device for protecting an Ethernet tree service
US7752421B2 (en) 2007-04-19 2010-07-06 International Business Machines Corporation Parallel-prefix broadcast for a parallel-prefix operation on a parallel computer
US8768898B1 (en) * 2007-04-26 2014-07-01 Netapp, Inc. Performing direct data manipulation on a storage device
US8539498B2 (en) 2007-05-17 2013-09-17 Alcatel Lucent Interprocess resource-based dynamic scheduling system and method
US8068429B2 (en) 2007-05-31 2011-11-29 Ixia Transmit scheduling
US7856551B2 (en) 2007-06-05 2010-12-21 Intel Corporation Dynamically discovering a system topology
US7738443B2 (en) 2007-06-26 2010-06-15 International Business Machines Corporation Asynchronous broadcast for ordered delivery between compute nodes in a parallel computing system where packet header space is limited
US8090704B2 (en) 2007-07-30 2012-01-03 International Business Machines Corporation Database retrieval with a non-unique key on a parallel computer system
US8108545B2 (en) 2007-08-27 2012-01-31 International Business Machines Corporation Packet coalescing in virtual channels of a data processing system in a multi-tiered full-graph interconnect architecture
US7793158B2 (en) 2007-08-27 2010-09-07 International Business Machines Corporation Providing reliability of communication between supernodes of a multi-tiered full-graph interconnect architecture
US7958183B2 (en) 2007-08-27 2011-06-07 International Business Machines Corporation Performing collective operations using software setup and partial software execution at leaf nodes in a multi-tiered full-graph interconnect architecture
US8085686B2 (en) 2007-09-27 2011-12-27 Cisco Technology, Inc. Aggregation and propagation of sensor data within neighbor discovery messages in a tree-based ad hoc network
US8527590B2 (en) 2008-01-16 2013-09-03 Janos Tapolcai Solving mixed integer programs with peer-to-peer applications
US7801024B2 (en) 2008-02-12 2010-09-21 At&T Intellectual Property Ii, L.P. Restoring aggregated circuits with circuit integrity checks in a hierarchical network
US7991857B2 (en) 2008-03-24 2011-08-02 International Business Machines Corporation Broadcasting a message in a parallel computer
US8375197B2 (en) 2008-05-21 2013-02-12 International Business Machines Corporation Performing an allreduce operation on a plurality of compute nodes of a parallel computer
US7948979B2 (en) 2008-05-28 2011-05-24 Intel Corporation Programmable network interface card
US7944946B2 (en) 2008-06-09 2011-05-17 Fortinet, Inc. Virtual memory protocol segmentation offloading
US7797445B2 (en) 2008-06-26 2010-09-14 International Business Machines Corporation Dynamic network link selection for transmitting a message between compute nodes of a parallel computer
US7865693B2 (en) 2008-10-14 2011-01-04 International Business Machines Corporation Aligning precision converted vector data using mask indicating offset relative to element boundary corresponding to precision type
US20190377580A1 (en) 2008-10-15 2019-12-12 Hyperion Core Inc. Execution of instructions based on processor and data availability
US9853922B2 (en) 2012-02-24 2017-12-26 Sococo, Inc. Virtual area communications
US8370675B2 (en) 2009-01-28 2013-02-05 Mellanox Technologies Ltd. Precise clock synchronization
US8239847B2 (en) 2009-03-18 2012-08-07 Microsoft Corporation General distributed reduction for data parallel computing
US8255475B2 (en) 2009-04-28 2012-08-28 Mellanox Technologies Ltd. Network interface device with memory management capabilities
US9596186B2 (en) 2009-06-30 2017-03-14 Oracle America, Inc. Multiple processes sharing a single infiniband connection
US8447954B2 (en) 2009-09-04 2013-05-21 International Business Machines Corporation Parallel pipelined vector reduction in a data processing system
US8321454B2 (en) 2009-09-14 2012-11-27 Myspace Llc Double map reduce distributed computing framework
US8838907B2 (en) 2009-10-07 2014-09-16 Hewlett-Packard Development Company, L.P. Notification protocol based endpoint caching of host memory
EP2488963A1 (en) 2009-10-15 2012-08-22 Rogers Communications Inc. System and method for phrase identification
US9110860B2 (en) 2009-11-11 2015-08-18 Mellanox Technologies Tlv Ltd. Topology-aware fabric-based offloading of collective functions
US8571834B2 (en) 2010-01-08 2013-10-29 International Business Machines Corporation Opcode counting for performance measurement
US10158702B2 (en) 2009-11-15 2018-12-18 Mellanox Technologies, Ltd. Network operation offloading for collective operations
US8811417B2 (en) 2009-11-15 2014-08-19 Mellanox Technologies Ltd. Cross-channel network operation offloading for collective operations
US8213315B2 (en) 2009-11-19 2012-07-03 Mellanox Technologies Ltd. Dynamically-connected transport service
US9081501B2 (en) 2010-01-08 2015-07-14 International Business Machines Corporation Multi-petascale highly efficient parallel supercomputer
US8751655B2 (en) 2010-03-29 2014-06-10 International Business Machines Corporation Collective acceleration unit tree structure
US8332460B2 (en) 2010-04-14 2012-12-11 International Business Machines Corporation Performing a local reduction operation on a parallel computer
US8555265B2 (en) 2010-05-04 2013-10-08 Google Inc. Parallel processing of data
US9406336B2 (en) 2010-08-26 2016-08-02 Blast Motion Inc. Multi-sensor event detection system
US9253248B2 (en) 2010-11-15 2016-02-02 Interactic Holdings, Llc Parallel information system utilizing flow control and virtual channels
US9552206B2 (en) 2010-11-18 2017-01-24 Texas Instruments Incorporated Integrated circuit with control node circuitry and processing circuitry
US8490112B2 (en) 2010-12-03 2013-07-16 International Business Machines Corporation Data communications for a collective operation in a parallel active messaging interface of a parallel computer
US9258390B2 (en) 2011-07-29 2016-02-09 Solarflare Communications, Inc. Reducing network latency
JP5776267B2 (ja) 2011-03-29 2015-09-09 NEC Corporation Distributed file system
US9619301B2 (en) 2011-04-06 2017-04-11 Telefonaktiebolaget L M Ericsson (Publ) Multi-core memory model and speculative mode processor management
FR2979719B1 (fr) 2011-09-02 2014-07-25 Thales Sa Communications system for transmitting signals between terminal equipment connected to intermediate equipment linked to an Ethernet network
US8645663B2 (en) 2011-09-12 2014-02-04 Mellanox Technologies Ltd. Network interface controller with flexible memory handling
US9009686B2 (en) 2011-11-07 2015-04-14 Nvidia Corporation Algorithm for 64-bit address mode optimization
US9397960B2 (en) 2011-11-08 2016-07-19 Mellanox Technologies Ltd. Packet steering
US8694701B2 (en) 2011-12-15 2014-04-08 Mellanox Technologies Ltd. Recovering dropped instructions in a network interface controller
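KR20130068849A (ko) 2011-12-16 2013-06-26 Electronics and Telecommunications Research Institute System and method for hierarchical message transmission between devices in a network environment composed of heterogeneous networks
CN103297172B (zh) 2012-02-24 2016-12-21 Huawei Technologies Co., Ltd. Data transmission method for packet aggregation, access point, relay node, and data node
JP2015511074A (ja) 2012-03-23 2015-04-13 NEC Corporation System and method for communication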
US10387448B2 (en) 2012-05-15 2019-08-20 Splunk Inc. Replication of summary data in a clustered computing environment
US9158602B2 (en) 2012-05-21 2015-10-13 International Business Machines Corporation Processing posted receive commands in a parallel computer
US8972986B2 (en) 2012-05-25 2015-03-03 International Business Machines Corporation Locality-aware resource allocation for cloud computing
WO2013180738A1 (en) 2012-06-02 2013-12-05 Intel Corporation Scatter using index array and finite state machine
US9123219B2 (en) 2012-06-19 2015-09-01 Honeywell International Inc. Wireless fire system based on open standard wireless protocols
US8761189B2 (en) 2012-06-28 2014-06-24 Mellanox Technologies Ltd. Responding to dynamically-connected transport requests
US9002970B2 (en) 2012-07-12 2015-04-07 International Business Machines Corporation Remote direct memory access socket aggregation
US11403317B2 (en) 2012-07-26 2022-08-02 Mongodb, Inc. Aggregation framework system architecture and method
US8887056B2 (en) 2012-08-07 2014-11-11 Advanced Micro Devices, Inc. System and method for configuring cloud computing systems
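JP5939305B2 (ja) 2012-09-07 2016-06-22 Fujitsu Limited Information processing apparatus, parallel computer system, and method for controlling an information processing apparatus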
US9842046B2 (en) * 2012-09-28 2017-12-12 Intel Corporation Processing memory access instructions that have duplicate memory indices
US9424214B2 (en) 2012-09-28 2016-08-23 Mellanox Technologies Ltd. Network interface controller with direct connection to host memory
US9606961B2 (en) 2012-10-30 2017-03-28 Intel Corporation Instruction and logic to provide vector compress and rotate functionality
US9160607B1 (en) 2012-11-09 2015-10-13 Cray Inc. Method and apparatus for deadlock avoidance
US10049061B2 (en) 2012-11-12 2018-08-14 International Business Machines Corporation Active memory device gather, scatter, and filter
US9411584B2 (en) 2012-12-29 2016-08-09 Intel Corporation Methods, apparatus, instructions, and logic to provide vector address conflict detection functionality
US10218808B2 (en) 2014-10-20 2019-02-26 PlaceIQ, Inc. Scripting distributed, parallel programs
US10275375B2 (en) 2013-03-10 2019-04-30 Mellanox Technologies, Ltd. Network interface controller with compression capabilities
US11966355B2 (en) 2013-03-10 2024-04-23 Mellanox Technologies, Ltd. Network adapter with a common queue for both networking and data manipulation work requests
US9275014B2 (en) 2013-03-13 2016-03-01 Qualcomm Incorporated Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods
US9495154B2 (en) 2013-03-13 2016-11-15 Qualcomm Incorporated Vector processing engines having programmable data path configurations for providing multi-mode vector processing, and related vector processors, systems, and methods
US9952975B2 (en) 2013-04-30 2018-04-24 Hewlett Packard Enterprise Development Lp Memory network to route memory traffic and I/O traffic
US9384168B2 (en) 2013-06-11 2016-07-05 Analog Devices Global Vector matrix product accelerator for microprocessor integration
US9817742B2 (en) 2013-06-25 2017-11-14 Dell International L.L.C. Detecting hardware and software problems in remote systems
US9541947B2 (en) 2013-08-07 2017-01-10 General Electric Company Time protocol based timing system for time-of-flight instruments
GB2518425A (en) 2013-09-20 2015-03-25 Tcs John Huxley Europ Ltd Messaging system
WO2015051387A1 (en) 2013-10-11 2015-04-16 Fts Computertechnik Gmbh Method for executing tasks in a computer network
CA2867589A1 (en) 2013-10-15 2015-04-15 Coho Data Inc. Systems, methods and devices for implementing data management in a distributed data storage system
US9977676B2 (en) 2013-11-15 2018-05-22 Qualcomm Incorporated Vector processing engines (VPEs) employing reordering circuitry in data flow paths between execution units and vector data memory to provide in-flight reordering of output vector data stored to vector data memory, and related vector processor systems and methods
US20150143076A1 (en) 2013-11-15 2015-05-21 Qualcomm Incorporated VECTOR PROCESSING ENGINES (VPEs) EMPLOYING DESPREADING CIRCUITRY IN DATA FLOW PATHS BETWEEN EXECUTION UNITS AND VECTOR DATA MEMORY TO PROVIDE IN-FLIGHT DESPREADING OF SPREAD-SPECTRUM SEQUENCES, AND RELATED VECTOR PROCESSING INSTRUCTIONS, SYSTEMS, AND METHODS
US9792118B2 (en) 2013-11-15 2017-10-17 Qualcomm Incorporated Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision filter vector processing operations with reduced sample re-fetching and power consumption, and related vector processor systems and methods
US9684509B2 (en) 2013-11-15 2017-06-20 Qualcomm Incorporated Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory, and related vector processing instructions, systems, and methods
US9880845B2 (en) 2013-11-15 2018-01-30 Qualcomm Incorporated Vector processing engines (VPEs) employing format conversion circuitry in data flow paths between vector data memory and execution units to provide in-flight format-converting of input vector data to execution units for vector processing operations, and related vector processor systems and methods
US9619227B2 (en) 2013-11-15 2017-04-11 Qualcomm Incorporated Vector processing engines (VPEs) employing tapped-delay line(s) for providing precision correlation / covariance vector processing operations with reduced sample re-fetching and power consumption, and related vector processor systems and methods
US10305980B1 (en) 2013-11-27 2019-05-28 Intellectual Property Systems, LLC Arrangements for communicating data in a computing system using multiple processors
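JP6152786B2 (ja) 2013-11-29 2017-06-28 Fujitsu Limited Communication control device, information processing device, parallel computer system, control program, and method for controlling a parallel computer system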
GB2521441B (en) 2013-12-20 2016-04-20 Imagination Tech Ltd Packet loss mitigation
US9563426B1 (en) 2013-12-30 2017-02-07 EMC IP Holding Company LLC Partitioned key-value store with atomic memory operations
US9355061B2 (en) 2014-01-28 2016-05-31 Arm Limited Data processing apparatus and method for performing scan operations
US9696942B2 (en) 2014-03-17 2017-07-04 Mellanox Technologies, Ltd. Accessing remote storage devices using a local bus protocol
US9925492B2 (en) 2014-03-24 2018-03-27 Mellanox Technologies, Ltd. Remote transactional memory
US9442968B2 (en) 2014-03-31 2016-09-13 Sap Se Evaluation of variant configuration using in-memory technology
US10339079B2 (en) 2014-06-02 2019-07-02 Western Digital Technologies, Inc. System and method of interleaving data retrieved from first and second buffers
US9350825B2 (en) 2014-06-16 2016-05-24 International Business Machines Corporation Optimizing network communications
US20150379022A1 (en) 2014-06-27 2015-12-31 General Electric Company Integrating Execution of Computing Analytics within a Mapreduce Processing Environment
WO2016057783A1 (en) 2014-10-08 2016-04-14 Interactic Holdings, Llc Fast fourier transform using a distributed computing system
US9756154B1 (en) 2014-10-13 2017-09-05 Xilinx, Inc. High throughput packet state processing
US10331595B2 (en) 2014-10-23 2019-06-25 Mellanox Technologies, Ltd. Collaborative hardware interaction by multiple entities using a shared queue
US10904122B2 (en) 2014-10-28 2021-01-26 Salesforce.Com, Inc. Facilitating workload-aware shuffling and management of message types in message queues in an on-demand services environment
GB2549883A (en) 2014-12-15 2017-11-01 Hyperion Core Inc Advanced processor architecture
US9851970B2 (en) 2014-12-23 2017-12-26 Intel Corporation Method and apparatus for performing reduction operations on a set of vector elements
US9674071B2 (en) 2015-02-20 2017-06-06 Telefonaktiebolaget Lm Ericsson (Publ) High-precision packet train generation
US20160295426A1 (en) 2015-03-30 2016-10-06 Nokia Solutions And Networks Oy Method and system for communication networks
US10425350B1 (en) 2015-04-06 2019-09-24 EMC IP Holding Company LLC Distributed catalog service for data processing platform
US10541938B1 (en) 2015-04-06 2020-01-21 EMC IP Holding Company LLC Integration of distributed data processing platform with one or more distinct supporting platforms
US10277668B1 (en) 2015-04-06 2019-04-30 EMC IP Holding Company LLC Beacon-based distributed data processing platform
US10282347B2 (en) 2015-04-08 2019-05-07 Louisiana State University Research & Technology Foundation Architecture for configuration of a reconfigurable integrated circuit
ES2929626T3 (es) 2015-05-21 2022-11-30 Goldman Sachs & Co Llc General-purpose parallel computing architecture
US10210134B2 (en) 2015-05-21 2019-02-19 Goldman Sachs & Co. LLC General-purpose parallel computing architecture
US10320695B2 (en) 2015-05-29 2019-06-11 Advanced Micro Devices, Inc. Message aggregation, combining and compression for efficient data communications in GPU-based clusters
US10027601B2 (en) 2015-06-03 2018-07-17 Mellanox Technologies, Ltd. Flow-based packet modification
US10042794B2 (en) 2015-06-12 2018-08-07 Apple Inc. Methods and apparatus for synchronizing uplink and downlink transactions on an inter-device communication link
US10284383B2 (en) 2015-08-31 2019-05-07 Mellanox Technologies, Ltd. Aggregation protocol
US20170072876A1 (en) 2015-09-14 2017-03-16 Broadcom Corporation Hardware-Accelerated Protocol Conversion in an Automotive Gateway Controller
WO2017053468A1 (en) * 2015-09-21 2017-03-30 Dolby Laboratories Licensing Corporation Efficient delivery of customized content over intelligent network
US10063474B2 (en) 2015-09-29 2018-08-28 Keysight Technologies Singapore (Holdings) Pte Ltd Parallel match processing of network packets to identify packet data for masking or other actions
US20170116154A1 (en) 2015-10-23 2017-04-27 The Intellisis Corporation Register communication in a network-on-a-chip architecture
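CN105528191B (zh) 2015-12-01 2017-04-12 Institute of Computing Technology, Chinese Academy of Sciences Data accumulation device and method, and digital signal processing device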
US10498654B2 (en) 2015-12-28 2019-12-03 Amazon Technologies, Inc. Multi-path transport design
US9985903B2 (en) 2015-12-29 2018-05-29 Amazon Technologies, Inc. Reliable, out-of-order receipt of packets
US9985904B2 (en) 2015-12-29 2018-05-29 Amazon Technologies, Inc. Reliable, out-of-order transmission of packets
US11044183B2 (en) 2015-12-29 2021-06-22 Xilinx, Inc. Network interface device
US20170192782A1 (en) 2015-12-30 2017-07-06 Robert Valentine Systems, Apparatuses, and Methods for Aggregate Gather and Stride
US10187400B1 (en) 2016-02-23 2019-01-22 Area 1 Security, Inc. Packet filters in security appliances with modes and intervals
US10521283B2 (en) 2016-03-07 2019-12-31 Mellanox Technologies, Ltd. In-node aggregation and disaggregation of MPI alltoall and alltoallv collectives
WO2017155545A1 (en) 2016-03-11 2017-09-14 Tektronix Texas, Llc. Timestamping data received by monitoring system in nfv
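CN107181724B (zh) 2016-03-11 2021-02-12 Huawei Technologies Co., Ltd. Method and system for identifying coflows, and server using the method
KR102564165B1 (ko) 2016-04-25 2023-08-04 Samsung Electronics Co., Ltd. Method of managing input/output queues by a non-volatile memory express controller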
US20190339688A1 (en) 2016-05-09 2019-11-07 Strong Force Iot Portfolio 2016, Llc Methods and systems for data collection, learning, and streaming of machine signals for analytics and maintenance using the industrial internet of things
US10320952B2 (en) 2016-05-16 2019-06-11 Mellanox Technologies Tlv Ltd. System-wide synchronized switch-over of multicast flows
US20170344589A1 (en) 2016-05-26 2017-11-30 Hewlett Packard Enterprise Development Lp Output vector generation from feature vectors representing data objects of a physical system
US10748210B2 (en) 2016-08-09 2020-08-18 Chicago Mercantile Exchange Inc. Systems and methods for coordinating processing of scheduled instructions across multiple components
US10810484B2 (en) 2016-08-12 2020-10-20 Xilinx, Inc. Hardware accelerator for compressed GRU on FPGA
US10528518B2 (en) 2016-08-21 2020-01-07 Mellanox Technologies, Ltd. Using hardware gather-scatter capabilities to optimize MPI all-to-all
JP6820586B2 (ja) 2016-08-31 2021-01-27 Media Links Co., Ltd. Time synchronization system
US10977260B2 (en) 2016-09-26 2021-04-13 Splunk Inc. Task distribution in an execution node of a distributed execution environment
US11461334B2 (en) 2016-09-26 2022-10-04 Splunk Inc. Data conditioning for dataset destination
US11243963B2 (en) 2016-09-26 2022-02-08 Splunk Inc. Distributing partial results to worker nodes from an external data system
US10425358B2 (en) 2016-09-29 2019-09-24 International Business Machines Corporation Network switch architecture supporting multiple simultaneous collective operations
CN107896238B (zh) 2016-10-04 2020-09-18 Toyota Motor Corporation In-vehicle network system
US10929174B2 (en) 2016-12-15 2021-02-23 Ecole Polytechnique Federale De Lausanne (Epfl) Atomic object reads for in-memory rack-scale computing
US10296351B1 (en) 2017-03-15 2019-05-21 Ambarella, Inc. Computer vision processing in hardware data paths
US10296473B2 (en) 2017-03-24 2019-05-21 Western Digital Technologies, Inc. System and method for fast execution of in-capsule commands
US10218642B2 (en) 2017-03-27 2019-02-26 Mellanox Technologies Tlv Ltd. Switch arbitration based on distinct-flow counts
US10419329B2 (en) 2017-03-30 2019-09-17 Mellanox Technologies Tlv Ltd. Switch-based reliable multicast service
US10108581B1 (en) 2017-04-03 2018-10-23 Google Llc Vector reduction processor
US20180302324A1 (en) 2017-04-18 2018-10-18 Atsushi Kasuya Packet forwarding mechanism
US10318306B1 (en) 2017-05-03 2019-06-11 Ambarella, Inc. Multidimensional vectors in a coprocessor
US10338919B2 (en) 2017-05-08 2019-07-02 Nvidia Corporation Generalized acceleration of matrix multiply accumulate operations
US10628236B2 (en) * 2017-06-06 2020-04-21 Huawei Technologies Canada Co., Ltd. System and method for inter-datacenter communication
US10367750B2 (en) 2017-06-15 2019-07-30 Mellanox Technologies, Ltd. Transmission and reception of raw video using scalable frame rate
US11409692B2 (en) 2017-07-24 2022-08-09 Tesla, Inc. Vector computational unit
US11397428B2 (en) 2017-08-02 2022-07-26 Strong Force Iot Portfolio 2016, Llc Self-organizing systems and methods for data collection
US10693787B2 (en) 2017-08-25 2020-06-23 Intel Corporation Throttling for bandwidth imbalanced data transfers
US10727966B1 (en) 2017-08-30 2020-07-28 Amazon Technologies, Inc. Time synchronization with distributed grand master
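CN110245751B (zh) 2017-08-31 2020-10-09 Cambricon Technologies Corporation Limited GEMM operation method and device
CN109426574B (zh) * 2017-08-31 2022-04-05 Huawei Technologies Co., Ltd. Distributed computing system, and data transmission method and device in a distributed computing system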
US10547553B2 (en) 2017-09-17 2020-01-28 Mellanox Technologies, Ltd. Stateful connection tracking
US10482337B2 (en) 2017-09-29 2019-11-19 Infineon Technologies Ag Accelerating convolutional neural network computation throughput
US10445098B2 (en) 2017-09-30 2019-10-15 Intel Corporation Processors and methods for privileged configuration in a spatial array
US10380063B2 (en) 2017-09-30 2019-08-13 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator having a sequencer dataflow operator
US11694066B2 (en) 2017-10-17 2023-07-04 Xilinx, Inc. Machine learning runtime library for neural network acceleration
GB2569276B (en) * 2017-10-20 2020-10-14 Graphcore Ltd Compiler method
US10771406B2 (en) 2017-11-11 2020-09-08 Microsoft Technology Licensing, Llc Providing and leveraging implicit signals reflecting user-to-BOT interaction
US10887252B2 (en) 2017-11-14 2021-01-05 Mellanox Technologies, Ltd. Efficient scatter-gather over an uplink
US11277350B2 (en) 2018-01-09 2022-03-15 Intel Corporation Communication of a large message using multiple network interface controllers
US11561791B2 (en) 2018-02-01 2023-01-24 Tesla, Inc. Vector computational unit receiving data elements in parallel from a last row of a computational array
US10747708B2 (en) 2018-03-08 2020-08-18 Allegro Microsystems, Llc Communication system between electronic devices
US10621489B2 (en) 2018-03-30 2020-04-14 International Business Machines Corporation Massively parallel neural inference computing elements
US20190303263A1 (en) 2018-03-30 2019-10-03 Kermin E. Fleming, JR. Apparatus, methods, and systems for integrated performance monitoring in a configurable spatial accelerator
US20190044827A1 (en) 2018-03-30 2019-02-07 Intel Corporation Communication of a message using a network interface controller on a subnet
US10564980B2 (en) 2018-04-03 2020-02-18 Intel Corporation Apparatus, methods, and systems for conditional queues in a configurable spatial accelerator
EP3791236A4 (en) 2018-05-07 2022-06-08 Strong Force Iot Portfolio 2016, LLC METHODS AND SYSTEMS FOR DATA COLLECTION, LEARNING AND STREAMING MACHINE SIGNALS FOR ANALYSIS AND MAINTENANCE USING THE INDUSTRIAL INTERNET OF THINGS
US10678540B2 (en) 2018-05-08 2020-06-09 Arm Limited Arithmetic operation with shift
US11048509B2 (en) 2018-06-05 2021-06-29 Qualcomm Incorporated Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices
US11277455B2 (en) 2018-06-07 2022-03-15 Mellanox Technologies, Ltd. Streaming system
US10839894B2 (en) 2018-06-29 2020-11-17 Taiwan Semiconductor Manufacturing Company Ltd. Memory computation circuit and method
US20190044889A1 (en) 2018-06-29 2019-02-07 Intel Corporation Coalescing small payloads
US10754649B2 (en) 2018-07-24 2020-08-25 Apple Inc. Computation engine that operates in matrix and vector modes
US10915324B2 (en) 2018-08-16 2021-02-09 Tachyum Ltd. System and method for creating and executing an instruction word for simultaneous execution of instruction operations
US20200106828A1 (en) 2018-10-02 2020-04-02 Mellanox Technologies, Ltd. Parallel Computation Network Device
US11019016B2 (en) * 2018-10-27 2021-05-25 International Business Machines Corporation Subgroup messaging within a group-based messaging interface
US11625393B2 (en) 2019-02-19 2023-04-11 Mellanox Technologies, Ltd. High performance computing system
EP3699770A1 (en) 2019-02-25 2020-08-26 Mellanox Technologies TLV Ltd. Collective communication system and methods
US11296807B2 (en) 2019-06-25 2022-04-05 Intel Corporation Techniques to operate a time division multiplexing(TDM) media access control (MAC)
US11516151B2 (en) 2019-12-31 2022-11-29 Infinera Oy Dynamically switching queueing systems for network switches
US11750699B2 (en) 2020-01-15 2023-09-05 Mellanox Technologies, Ltd. Small message aggregation
US11252027B2 (en) 2020-01-23 2022-02-15 Mellanox Technologies, Ltd. Network element supporting flexible data reduction operations
US11271874B2 (en) 2020-02-05 2022-03-08 Mellanox Technologies, Ltd. Network adapter with time-aware packet-processing pipeline
US11476928B2 (en) 2020-03-18 2022-10-18 Mellanox Technologies, Ltd. TDMA networking using commodity NIC/switch
US20220201103A1 (en) 2022-03-09 2022-06-23 Intel Corporation Metadata compaction in packet coalescing

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101267448A (zh) * 2008-05-09 2008-09-17 Northeastern University Intelligent protocol conversion device and method based on the embedded QNX operating system
US20100017420A1 (en) * 2008-07-21 2010-01-21 International Business Machines Corporation Performing An All-To-All Data Exchange On A Plurality Of Data Buffers By Performing Swap Operations
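CN101854556A (zh) * 2009-03-30 2010-10-06 Sony Corporation Information processing device and method
CN104662855A (zh) * 2012-06-25 2015-05-27 Cohere Technologies, Inc. Modulation and equalization in an orthonormal time-frequency shifting communications system
CN102915031A (zh) * 2012-10-25 2013-02-06 University of Science and Technology of China Intelligent self-calibration system for kinematic parameters of parallel robots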
US20150193269A1 (en) * 2014-01-06 2015-07-09 International Business Machines Corporation Executing an all-to-allv operation on a parallel computer that includes a plurality of compute nodes

Also Published As

Publication number Publication date
US20240089147A1 (en) 2024-03-14
US20220029854A1 (en) 2022-01-27
US11196586B2 (en) 2021-12-07
CN111614581B (zh) 2022-07-05
EP3699770A1 (en) 2020-08-26
US11876642B2 (en) 2024-01-16
US20200274733A1 (en) 2020-08-27

Similar Documents

Publication Publication Date Title
CN111614581B (zh) Collective communication system and method
Cheng et al. Using high-bandwidth networks efficiently for fast graph computation
Husbands et al. MPI-StarT: Delivering network performance to numerical applications
Petrini et al. Performance evaluation of the quadrics interconnection network
Liu et al. High performance RDMA-based MPI implementation over InfiniBand
Krishna et al. Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication
Buntinas et al. Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem
Chiang et al. Multi-address encoding for multicast
Almási et al. Design and implementation of message-passing services for the Blue Gene/L supercomputer
US20050038918A1 (en) Method and apparatus for implementing work request lists
US20120151292A1 (en) Supporting Distributed Key-Based Processes
Stunkel et al. The high-speed networks of the Summit and Sierra supercomputers
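CN103348641A (zh) Method and system for improved multi-cell support on a single modem board
JP2007249810A (ja) Reduction processing method for a parallel computer, and parallel computer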
US20180052803A1 (en) Using Hardware Gather-Scatter Capabilities to Optimize MPI All-to-All
Daneshtalab et al. Low-distance path-based multicast routing algorithm for network-on-chips
Kumar et al. Scaling alltoall collective on multi-core systems
Fei et al. FlexNFV: Flexible network service chaining with dynamic scaling
Suh et al. All-to-all personalized communication in multidimensional torus and mesh networks
US7929439B1 (en) Multiple network interface core apparatus and method
KR20140096587A (ko) Apparatus and method for sharing function logic between functional units, and reconfigurable processor
Petrini et al. Scalable collective communication on the ASCI Q machine
Vishnu et al. Topology agnostic hot‐spot avoidance with InfiniBand
Woodside et al. Alternative software architectures for parallel protocol execution with synchronous IPC
CN102710772A (zh) Massive data communication system based on a cloud platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220507

Address after: Yokneam, Israel

Applicant after: Mellanox Technologies, Ltd.

Address before: Ra'anana

Applicant before: Mellanox Technologies TLV Ltd.

GR01 Patent grant