CN111614581A - Collective communication system and methods - Google Patents
- Publication number
- CN111614581A (application number CN202010117006.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- subgroup
- processes
- given
- destination
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/542—Event management; Broadcasting; Multicasting; Notifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/40—Bus networks
- H04L12/40169—Flexible bus arrangements
- H04L12/40176—Flexible bus arrangements involving redundancy
- H04L12/40182—Flexible bus arrangements involving redundancy by using a plurality of communication lines
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/90—Buffering arrangements
- H04L49/9057—Arrangements for supporting packet reassembly or resequencing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G06F12/0238—Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
- G06F12/0246—Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/0413—MIMO systems
- H04B7/0456—Selection of precoding matrices or codebooks, e.g. using matrices antenna weighting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/44—Star or tree networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/10—Scheduling measurement reports ; Arrangements for measurement reports
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W88/00—Devices specially adapted for wireless communication networks, e.g. terminals, base stations or access point devices
- H04W88/02—Terminal devices
- H04W88/06—Terminal devices adapted for operation in multiple networks or having at least two operational modes, e.g. multi-mode terminals
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Mobile Radio Communication Systems (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
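A method in which a plurality of processes are configured to hold blocks of data destined for other processes, with data repacking circuitry including: receiving circuitry configured to receive at least one block of data from a source process of the plurality of processes; repacking circuitry configured to repack received data in accordance with at least one destination process of the plurality of processes; and sending circuitry configured to send the repacked data to the at least one destination process of the plurality of processes. The method includes receiving a set of data for all-to-all data exchange, the set of data being configured as a matrix distributed among the plurality of processes, and transposing the data by: each of the plurality of processes sending matrix data from that process to the repacking circuitry, and the repacking circuitry receiving, repacking, and sending the resulting matrix data to destination processes.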
Description
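Technical Field
The present invention, in exemplary embodiments thereof, relates to collective communication systems and methods, particularly but not exclusively to message passing operations, and also particularly but not exclusively to all-to-all operations.
Priority Claim
The present application claims priority from US Provisional Patent Application S/N 62/809,786 of Graham et al., filed 25 February 2019.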
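Summary of the Invention
The present invention, in certain embodiments thereof, seeks to provide improved systems and methods for collective communication, in particular but not only for message passing operations, including all-to-all operations.
There is thus provided, in accordance with an exemplary embodiment of the present invention, a method including providing a plurality of processes, each of the plurality of processes being configured to hold a block of data destined for other processes of the plurality of processes; providing at least one instance of data repacking circuitry including receiving circuitry configured to receive at least one block of data from at least one source process of the plurality of processes, repacking circuitry configured to repack received data in accordance with at least one destination process of the plurality of processes, and sending circuitry configured to send the repacked data to the at least one destination process of the plurality of processes; receiving a set of data for all-to-all data exchange, the set of data being configured as a matrix, the matrix being distributed among the plurality of processes; and transposing the data by: each of the plurality of processes sending matrix data from the process to the repacking circuitry, and the repacking circuitry receiving, repacking, and sending the resulting matrix data to destination processes.
Further in accordance with an exemplary embodiment of the present invention, the method also includes providing a control tree configured to control the plurality of processes and the repacking circuitry.
Still further in accordance with an exemplary embodiment of the present invention, the control tree is further configured to receive registration messages from each of the plurality of processes; to mark a given subgroup of the plurality of processes as ready for operation when registration messages have been received from all members of the given subgroup; when a given subgroup which is a source subgroup and a corresponding subgroup which is a destination subgroup are ready for operation, to pair the given source subgroup and the given destination subgroup and assign the given source subgroup and the given destination subgroup to an instance of repacking circuitry; and to notify each source subgroup and each destination subgroup when operations relating to each source subgroup and each destination subgroup have completed.
Additionally in accordance with an exemplary embodiment of the present invention, the control tree is configured, in addition to pairing the given source subgroup and the given destination subgroup, to assign the given source subgroup and the given destination subgroup to an instance of data repacking circuitry.
Moreover in accordance with an exemplary embodiment of the present invention, the method also includes assignment circuitry other than the control tree, the assignment circuitry being configured to assign the given source subgroup and the given destination subgroup to an instance of data repacking circuitry.
Further in accordance with an exemplary embodiment of the present invention, the control tree includes a reduction tree.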
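There is also provided, in accordance with another exemplary embodiment of the present invention, apparatus including: receiving circuitry configured to receive at least one block of data from at least one source process of a plurality of processes, each of the plurality of processes being configured to hold a block of data destined for other processes of the plurality of processes; at least one instance of data repacking circuitry configured to repack received data in accordance with at least one destination process of the plurality of processes; and sending circuitry configured to send the repacked data to the at least one destination process of the plurality of processes, the apparatus being configured to receive a set of data for all-to-all data exchange, the set of data being configured as a matrix, the matrix being distributed among the plurality of processes, and the apparatus being further configured to transpose the data by: receiving, at the repacking circuitry, matrix data from each of the plurality of processes, and the data repacking circuitry receiving, repacking, and sending the resulting matrix data to destination processes.
Further in accordance with an exemplary embodiment of the present invention, the apparatus also includes a control tree configured to control the plurality of processes and the repacking circuitry.
Still further in accordance with an exemplary embodiment of the present invention, the control tree is further configured to receive registration messages from each of the plurality of processes; to mark a given subgroup of the plurality of processes as ready for operation when registration messages have been received from all members of the given subgroup; when a given subgroup which is a source subgroup and a corresponding subgroup which is a destination subgroup are ready for operation, to pair the given source subgroup and the given destination subgroup and assign the given source subgroup and the given destination subgroup to an instance of data repacking circuitry; and to notify each source subgroup and each destination subgroup when operations relating to each source subgroup and each destination subgroup have completed.
Additionally in accordance with an exemplary embodiment of the present invention, the control tree is configured, in addition to pairing the given source subgroup and the given destination subgroup, to assign the given source subgroup and the given destination subgroup to a given instance of data repacking circuitry.
Moreover in accordance with an exemplary embodiment of the present invention, the apparatus also includes assignment circuitry other than the control tree, the assignment circuitry being configured to assign the given source subgroup and the given destination subgroup to a given instance of data repacking circuitry.
Further in accordance with an exemplary embodiment of the present invention, the control tree includes a reduction tree.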
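Brief Description of the Drawings
The present invention will be understood and appreciated more fully from the following detailed description, taken in conjunction with the drawings, in which:
FIG. 1A is a simplified pictorial illustration of an exemplary computer system, constructed and operative in accordance with an exemplary embodiment of the present invention;
FIG. 1B is a simplified pictorial illustration of an exemplary data block layout;
FIG. 2 is a simplified pictorial illustration of another exemplary data block layout;
FIG. 3 is a simplified pictorial illustration depicting all-to-all-v initial and final stages;
FIG. 4 is a simplified pictorial illustration depicting direct pairwise exchange;
FIG. 5 is a simplified pictorial illustration depicting an aggregation algorithm;
FIG. 6 is a simplified pictorial illustration depicting an initial block distribution for an all-to-all operation, in accordance with an exemplary embodiment of the present invention;
FIG. 7 is a simplified pictorial illustration depicting a final block distribution for an all-to-all operation, in accordance with an exemplary embodiment of the present invention;
FIG. 8 is a simplified pictorial illustration depicting all-to-all submatrix distribution, in accordance with another exemplary embodiment of the present invention; and
FIG. 9 is a simplified pictorial illustration depicting transposition of a sub-block, in accordance with an exemplary embodiment of the present invention.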
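Detailed Description
The all-to-all operation, defined in communication standards such as the Message Passing Interface (MPI) (MPI Forum, 2015), is a collective data operation in which each process sends data to every other process in the collective group and receives the same amount of data from each process in the group. The data sent to each process is of the same length, a, and is unique, originating from distinct memory locations. In communication standards such as MPI, the concept of operations on processes is decoupled from any particular hardware infrastructure. A collective group, as discussed herein, refers to a group of processes over which a (collective) operation is defined. In the MPI specification a collective group is called a "communicator", while in OpenSHMEM (see, for example, www.openshmem.org/site/) a collective group is called a "team".
Reference is now made to FIG. 1A, which is a simplified pictorial illustration of an exemplary computer system, constructed and operative in accordance with an exemplary embodiment of the present invention. The system of FIG. 1A, generally designated 110, comprises a plurality of processes 120, with data (typically blocks of data) 130 flowing between them. The term "blocks of data" (in its various grammatical forms), as used herein, refers to the data sent from a member (process, rank, ...) i to a member j within a collective group. It is appreciated, as also explained elsewhere herein, that for all-to-all the size of all blocks is the same (and can be 0), while for all-to-all-v/w no uniformity in the size of the data blocks is assumed, and some or all of the blocks may be 0.
An exemplary method of operation of the system of FIG. 1A is described below. In FIG. 1A, by way of non-limiting example, a plurality of CPUs (comprising CPU 1, CPU 2, and CPU N) interconnected in a system-on-chip are shown running the plurality of processes 120. Other system examples, by way of non-limiting example, include: a single CPU; a plurality of systems or servers connected by a network; or any other appropriate system. As described above, the concept of operations on processes, as described herein, is decoupled from any particular hardware infrastructure, although it is appreciated that in any actual implementation some hardware infrastructure (as shown in FIG. 1A or as otherwise described above) would be used.
Reference is now made to FIG. 1B, which is a simplified pictorial illustration of an exemplary data block layout 175 comprising a plurality of data blocks 180, and to FIG. 2, which is a simplified pictorial illustration of another exemplary data block layout 210 comprising a plurality of data blocks 220. FIG. 1B shows the exemplary data block layout 175 before an all-to-all operation is applied, while FIG. 2 shows the corresponding data block layout 210 after the all-to-all operation is applied. Each data block 180 in FIG. 1B and each data block 220 in FIG. 2 corresponds to a vector of length a.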
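The algorithms used to implement the all-to-all operation generally fall into two categories: direct exchange algorithms and aggregation algorithms.
All-to-all aggregation algorithms aim to reduce latency costs, which dominate short data transfers. They employ a data forwarding approach in order to cut down on the number of messages sent, thus reducing latency costs. Such approaches gather/scatter data from/to multiple sources, producing fewer, larger data transfers, but send a given piece of data multiple times. As the number of communication contexts participating in the collective operation becomes too large, aggregation techniques become less efficient than a direct data exchange, because of the growing cost of transferring a given piece of data multiple times. The all-to-all algorithms take advantage of the fact that the data length a is a constant of the algorithm, which provides sufficient global knowledge to coordinate data exchange through intermediate processes.
Direct exchange algorithms are typically used for all-to-all instances in which the length of data being transferred, a, is above a threshold where bandwidth contributions dominate, or when the aggregation techniques aggregate data from too many processes, causing the aggregation techniques to be inefficient.
With growing system sizes, the need to support efficient implementations of small-data all-to-all exchanges is increasing, as this is a data exchange pattern used by many high-performance computing (HPC) applications. The present invention, in exemplary embodiments thereof, presents a new all-to-all algorithm designed to increase the efficiency of small data exchanges over the full range of communicator sizes. This includes a new aggregation-based algorithm suitable for small-data individualized all-to-all data exchange, which may be viewed as transposing a distributed matrix. While "transpose", in various grammatical forms, is used throughout the present specification and claims, it is appreciated that transposing comprises a way of conceptualizing algorithms in accordance with exemplary embodiments of the present invention; for example, and without limiting the generality of the foregoing, there may be no such conceptualization at the level of (for example) the MPI standard. In exemplary embodiments, such a transpose comprises changing the position of blocks relative to other blocks, without changing the structure within any block. The algorithms described herein with reference to exemplary embodiments of the present invention benefit from the large amount of concurrency available in the network, and are designed to be simple and efficient to implement in network hardware. In exemplary embodiments, both switching hardware and host channel adapter implementations are targeted by this new design.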
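The individualized all-to-all-v/w algorithm is in some respects similar to the individualized all-to-all data exchange. The individualized all-to-all-w algorithm differs from the all-to-all-v algorithm in that the data type of each individual transfer may be unique across the function. A change is made to the all-to-all algorithm to support this collective operation. More specifically regarding data type: data transferred using the MPI standard's interface is given a data type, such as MPI_DOUBLE for a double-precision word. The all-to-all-v interface specifies that all data elements are of the same data type. All-to-all-w allows a different data type to be specified for each block of data, for example a data type for the data going from process i to process j.
The all-to-all-v/w operation is used by each process to exchange unique data with every other process in the group of processes participating in the collective operation. The size of the data exchanged between two given processes may be asymmetric, each pair of processes may have a data pattern different from that of other pairs, and the sizes of the data exchanged may vary widely. A given rank need only have local, API-level information on the data exchanges in which it participates.
The individualized all-to-all-v/w algorithm aimed at hardware implementation is somewhat similar to the individualized all-to-all algorithm, but requires more metadata describing the detailed data lengths in order to be implemented. Furthermore, this algorithm handles only messages below a pre-specified threshold. For larger messages, direct data exchange is used.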
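The threshold rule in the preceding paragraph can be illustrated with a minimal sketch. The cut-off value `THRESHOLD_BYTES` and the helper name `split_by_threshold` are hypothetical; the text states only that a pre-specified threshold exists, not its value or how it is chosen.

```python
# Hypothetical cut-off; the text only says the threshold is "pre-specified".
THRESHOLD_BYTES = 256

def split_by_threshold(block_sizes, threshold=THRESHOLD_BYTES):
    """Split (source, destination) pairs into those handled by the
    aggregation path (size below threshold) and those left to direct exchange."""
    aggregated, direct = [], []
    for pair, size in sorted(block_sizes.items()):
        if size < threshold:
            aggregated.append(pair)
        else:
            direct.append(pair)
    return aggregated, direct

# Block sizes in bytes for a few (source, destination) pairs; zero length is allowed.
sizes = {(0, 1): 8, (1, 0): 4096, (0, 2): 0, (2, 0): 256}
small, large = split_by_threshold(sizes)
print("aggregated:", small)
print("direct exchange:", large)
```

A block whose size equals the threshold is sent by direct exchange here, since the text speaks of messages "below" the threshold; that boundary choice is an assumption.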
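Previously, the algorithms used to implement the all-to-all function have fallen into two broad categories:
- Direct data exchange
- Aggregation algorithms
The basic algorithm definition describes data exchange between all pairs of processes in the collective group, or MPI communicator in the MPI definition. The term "basic algorithm" refers to an algorithm definition at the interface level: logically, what the function is or does, not how the function result is accomplished. Thus, by way of particular non-limiting example, the basic description of all-to-all-v is that each process sends a block of data to all processes in the group. In certain exemplary embodiments of the present invention, by way of particular non-limiting example, methods are described for carrying out particular functions by aggregating data and using the communication patterns described herein. In general, the algorithm definition conceptually requires O(N^2) data exchanges, where N is the group size.
Reference is now made to FIG. 3, which is a simplified pictorial illustration depicting all-to-all-v initial and final stages.
FIG. 3 provides an example of an individualized all-to-all-v, showing the initial (reference numeral 310) and final (reference numeral 320) stages. In FIG. 3, the notation (i,j) indicates a data segment that started on rank i at position j and should be transferred to rank j at position i. The data sizes of the segments may all differ (and may even be of zero length). The offsets of the send and receive locations may also differ.
The direct data exchange implementation of the function is the simplest implementation of the all-to-all function. A naive implementation puts many messages on the network and has the potential to severely degrade network utilization by causing congestion and end-point n → 1 contention. (The term "end-point", as used herein, denotes an entity, such as a process or thread, which contributes data to a collective operation.) As a result, algorithms that implement the direct data exchange use a communication pattern such as pairwise exchange, as shown in FIG. 4 (Jelena Pjevsivac-Grbovic, 2007), to reduce network load and end-point contention. For large message exchanges, which are bandwidth limited, direct data exchange algorithms tend to make good use of network resources. However, when data exchanges are small, latency and message rate costs dominate the overall algorithm cost and scale linearly with N, and system resources are not used well. Specifically, FIG. 4 depicts a non-limiting example of a direct pairwise exchange pattern for exchanges involving process 0. The length of each exchange is a, with a bidirectional data exchange.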
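As an illustration of a pairwise exchange schedule, the sketch below uses one commonly used pattern, in which at step s a rank sends to (rank + s) mod N and receives from (rank - s) mod N. This particular schedule is an assumption; the exact pattern drawn in FIG. 4 is not recoverable from the text, and the function name `pairwise_schedule` is hypothetical.

```python
def pairwise_schedule(rank, n):
    """Return the (send_to, recv_from) partners of `rank` for steps 1..n-1.

    Assumed pattern: at step s, send to (rank + s) % n and receive from
    (rank - s) % n, so each end-point has one sender per step.
    """
    return [((rank + s) % n, (rank - s) % n) for s in range(1, n)]

N = 4
schedule_for_rank0 = pairwise_schedule(0, N)
total_messages = sum(len(pairwise_schedule(r, N)) for r in range(N))

print(schedule_for_rank0)
print("messages on the network:", total_messages)
```

Each rank still sends N - 1 messages, so the total grows as N(N - 1), which matches the O(N^2) exchange count stated for the basic algorithm; the schedule only spreads those messages so that no end-point receives from many senders in the same step.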
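Aggregation algorithms (Ana Gainaru, 2016) have been used to implement small-data aggregation, with the Bruck (J. Bruck, 1997) algorithm being perhaps the best-known algorithm in this class. The number of data exchanges in which each process is involved using this approach is O((k-1)*log_k(N)), where N is the collective group size and k is the algorithm radix. FIG. 5 shows the communication pattern of one possible aggregation pattern. Specifically, FIG. 5 depicts a non-limiting example of an aggregation algorithm send-side data pattern for arbitrary radix k, assuming that N is an integer power of the algorithm radix k. Aggregation algorithms provide better scalability characteristics than direct exchange. The reduction in the number of messages reduces the latency and message rate costs of the all-to-all operation, but increases bandwidth-related costs. If the group does not get too large, the aggregation algorithms outperform direct exchange algorithms. The message size of each data exchange in the aggregation algorithms scales as O(a*N/k), where a is the all-to-all function message size. As a result, when groups become large, aggregation algorithms are ineffective at reducing the latency of the all-to-all data exchange, and will cause the latency to exceed that of the direct data exchange algorithms.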
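The trade-off described above can be made concrete by evaluating the stated scaling expressions directly. The sketch below treats the O(...) expressions as exact counts with constant factor 1, which is an assumption made purely for illustration; the function names are hypothetical.

```python
def direct_exchanges(n):
    """Exchanges per process for direct exchange: one per other process."""
    return n - 1

def aggregated_exchanges(n, k):
    """Exchanges per process for a radix-k aggregation: (k - 1) * ceil(log_k n).

    The number of rounds is computed with integers to avoid floating-point
    rounding in the logarithm.
    """
    rounds, reach = 0, 1
    while reach < n:
        reach *= k
        rounds += 1
    return (k - 1) * rounds

def aggregated_message_size(a, n, k):
    """Per-exchange message size, taking the O(a*N/k) scaling literally."""
    return a * n // k

for n in (64, 4096):
    print(n, direct_exchanges(n), aggregated_exchanges(n, 2),
          aggregated_message_size(8, n, 2))
```

With 8-byte blocks and radix 2, the number of exchanges per process grows only logarithmically, while the size of each aggregated message grows linearly with N, which is the bandwidth cost the paragraph refers to.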
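In exemplary embodiments of the present invention, the all-to-all and all-to-all-v/w algorithms aim to optimize small data exchange by:
1. Defining multiple aggregation points in the network, either switches or host channel adapters (HCAs).
2. Assigning aggregation points to individual aggregators within the network infrastructure for data going from sub-blocks of processes to the same or other sub-blocks of processes. This data may be viewed as a submatrix of the distributed matrix. A single aggregator may handle multiple blocks of the submatrix from a single individualized all-to-all or all-to-all-v/w algorithm.
3. Sub-blocks may be formed of non-contiguous groups of processes, which in certain exemplary embodiments are formed on the fly to handle load imbalance in the calling application. In such a case, the matrix sub-blocks may be non-contiguous.
4. The term "aggregator" is used herein to refer to an entity which aggregates a submatrix, transposes it, and then sends the results to their final destination. In certain exemplary embodiments of the present invention, the aggregator is a logical block within an HCA. The present step 4 may then comprise having the aggregator:
a. Gather data from all the sources.
b. Shuffle the data in preparation, so that data destined for a particular process can be sent to that destination as a single message. In the present context, the term "shuffle" refers to reordering incoming data from different source processes so that data destined for a given destination can be conveniently handled. In certain exemplary embodiments of the present invention, data destined for a single destination may be copied to one contiguous block of memory.
c. Send the data to the destinations.
5. In certain preferred embodiments, data discontinuity, and the data sources and/or destinations, are handled at the network edge, so that the aggregators handle only contiguous, packed data. In other words, data sent from or received by a user need not be contiguous in the user's virtual memory space; this situation can be likened to the faces of a cube, in which 2 of the 6 faces will not be in contiguous memory addresses. The hardware sends streams of contiguous data. The "packing" that turns non-contiguous data into contiguous data is done at the first step (either by using the CPU to pack the data into a contiguous buffer, or by using HCA gather capabilities). Similarly, unpacking non-contiguous data into user buffers can be done either by the HCA delivering the data to a contiguous destination buffer and then using the CPU to unpack, or by using HCA scatter capabilities. Thus, algorithmic data manipulation in the intermediate steps may handle contiguous, packed data.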
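The packing and unpacking described in item 5 can be sketched in software. This is a CPU-side illustration only, assuming fixed-length pieces at known byte offsets; the function names `pack` and `unpack` are hypothetical and do not represent an HCA gather/scatter interface.

```python
def pack(buffer, offsets, length):
    """Gather `length`-byte pieces found at `offsets` into one contiguous bytes object."""
    return b"".join(bytes(buffer[o:o + length]) for o in offsets)

def unpack(packed, buffer, offsets, length):
    """Scatter a contiguous packed stream back into `buffer` at `offsets` (in place)."""
    for i, o in enumerate(offsets):
        buffer[o:o + length] = packed[i * length:(i + 1) * length]

# A user buffer whose payload bytes sit at a stride of 3, like one face of a cube.
user_send = bytearray(b"a..b..c..")
offsets = [0, 3, 6]

wire = pack(user_send, offsets, 1)      # what the intermediate steps would see
user_recv = bytearray(b"-" * 9)
unpack(wire, user_recv, offsets, 1)     # edge-side placement at the destination

print(wire, bytes(user_recv))
```

Because the packed stream is contiguous, any intermediate step can treat it as an opaque sequence of bytes, leaving layout knowledge at the edges.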
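The present invention, in exemplary embodiments thereof, may be viewed as using aggregation points within the network to collect data from non-contiguous portions of a distributed matrix, transpose the data, and send the data to its destinations.
In exemplary embodiments, the invention may be summarized as follows:
1. The data layout is viewed as a distributed matrix, with each process holding a block of data destined for each other process. For the all-to-all algorithm the data block size is the same for all source data blocks, while for all-to-all-v/w the data block sizes may be of different lengths, including length zero. In the notation used herein, the horizontal index indicates the data source and the vertical index indicates its destination.
2. The collective operation performs a transpose of the data blocks.
3. To transpose the distributed matrix, the matrix is subdivided into rectangular submatrices of dimension dh × dv, where dh is the size in the horizontal dimension and dv is the size in the vertical dimension. Sub-blocks need not be logically contiguous. The submatrices may be predefined, or may be determined at run time based on some criteria, such as, by way of non-limiting example, the order of entry into the all-to-all operation.
4. A data repacking unit is provided, which accepts data from a specified set of sources, the data being destined for a specified set of destinations, repacks the data by destination, and sends the data to the specified destinations. In exemplary embodiments, the data repacking unit has subunits for each of the operations described. In certain exemplary embodiments of the present invention, an aggregator, as described herein, would comprise or make use of a data repacking unit.
5. The transposition of a submatrix is assigned to a given data repacking unit, with each unit being assigned multiple submatrices to transpose. In certain exemplary embodiments of the present invention, the assignment may be carried out by the control tree mentioned below in point 7; alternatively, another component (such as, by way of non-limiting example, a software component) may be provided to carry out the assignment.
6. The data repacking unit may be implemented at an appropriate place within the system. For example, it may be implemented in a switch ASIC, a host channel adapter (HCA) unit, a CPU, or other appropriate hardware, and may be implemented in hardware, firmware, software, or any appropriate combination thereof.
7. A reduction tree is used as a control tree to control the collective operation, in the following manner:
7.1. Each process in the group registers itself with the control tree by passing an arrival notification to the control tree.
7.2. Once all members of a subgroup have arrived, the subgroup is marked as ready for operation (ready for send/receive).
7.3. When the source and destination groups of a given submatrix are ready, the relevant data repacking unit schedules the data movement.
7.4. Data is transferred from the source processes to the data repacking unit. This unit repacks the data and sends it to the appropriate destinations.
7.5. Each source process is notified of completion, as is each destination process. In certain exemplary embodiments of the present invention, this is accomplished by the aggregator notifying the source and destination blocks of completion; by way of particular non-limiting example, this may be accomplished using the control tree.
7.6. The operation completes locally at each process once all expected data has been received and the transfer of all source data is complete.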
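Steps 7.1 to 7.3 can be sketched as a small bookkeeping model. The class below is a flat, single-object stand-in for the reduction tree, used only to show when subgroups and submatrices become ready; the class name `ControlTree`, its methods, and the example subgroups are assumptions, and nothing here models the tree topology or the hardware.

```python
class ControlTree:
    """Flat stand-in for the control tree: tracks arrivals and subgroup readiness."""

    def __init__(self, subgroups):
        self.subgroups = [frozenset(g) for g in subgroups]
        self.arrived = set()

    def register(self, rank):
        """Step 7.1: a process passes its arrival notification."""
        self.arrived.add(rank)

    def ready_subgroups(self):
        """Step 7.2: a subgroup is ready once all of its members have arrived."""
        return [g for g in self.subgroups if g <= self.arrived]

    def ready_submatrices(self):
        """Step 7.3: a submatrix can be scheduled when both its source and
        destination subgroups are ready."""
        ready = self.ready_subgroups()
        return [(src, dst) for src in ready for dst in ready]

tree = ControlTree([{1, 2}, {0, 3}, {4, 5}])
for rank in (0, 3, 1):
    tree.register(rank)
pairs_before = tree.ready_submatrices()   # only subgroup {0,3} is complete
tree.register(2)
pairs_after = tree.ready_submatrices()    # {1,2} and {0,3} are both complete

print(len(pairs_before), len(pairs_after))
```

Submatrices become schedulable incrementally as processes arrive, which is what allows the sub-blocks to be formed in order of entry into the operation.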
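In exemplary embodiments, a more detailed explanation is as follows:
In both the all-to-all algorithm and the all-to-all-v/w algorithm, each process has a unique block of data destined for each other process in the group. The primary way in which all-to-all differs from all-to-all-v is in the data layout pattern. All-to-all data blocks are all of the same size, whereas the all-to-all-v/w algorithms support data blocks of differing sizes, and the data blocks need not be ordered in monotonically increasing order within the user buffer.
The layout of the data blocks for the all-to-all algorithm may be viewed as a distributed matrix, with the all-to-all algorithm transposing this block distribution. It is important to note that, in exemplary embodiments of the present invention, the data within each block is not rearranged in the transposition; only the order of the data blocks themselves is rearranged.
FIG. 6 shows an exemplary all-to-all source data block layout for a group of size six, showing an exemplary initial distribution for an all-to-all operation. Each column represents the data blocks that each process holds for all the other processes. Each block is labeled with a two-index label, with the first index indicating the process from which the data originates and the second index being the rank of the block's destination process (the term "rank" being used in accordance with the MPI standard, in which each member of the communicator, that is, the group of processes that defines the collective, is given a rank, or ID).
After the all-to-all operation is applied to the data in the example of FIG. 6, with the data blocks being transposed, the data block layout shown in FIG. 7 results.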
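The block transposition for a group of size six can be written out as a short sketch. Each block is represented as an opaque byte string labeled (source,destination), which is an illustrative choice; the function name `transpose_blocks` is hypothetical.

```python
def transpose_blocks(columns):
    """columns[src][dst] is the block that process `src` holds for process `dst`.

    Returns result[dst][src]: the blocks process `dst` holds after the
    all-to-all. Blocks are moved whole; their contents are never touched.
    """
    n = len(columns)
    return [[columns[src][dst] for src in range(n)] for dst in range(n)]

N = 6
initial = [[f"({src},{dst})".encode() for dst in range(N)] for src in range(N)]
final = transpose_blocks(initial)

print(initial[2])   # what process 2 holds before the operation
print(final[2])     # what process 2 holds afterwards
```

Applying `transpose_blocks` twice returns the initial layout, which is a quick check that only block positions change.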
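The all-to-all-v/w algorithm performs a similar data transposition. Such a transformation differs as follows:
1. The data size may differ between blocks, and may even be of length zero.
2. Blocks of data in both the source and destination buffers need not be arranged in increasing order of destination (source buffer) or source (result buffer). The actual block ordering is specified as part of the all-to-all-v/w operation.
Therefore, similar communication patterns may be used to implement all-to-all and all-to-all-v/w.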
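The two differences listed above can be illustrated with a sketch in which each block is described by a count and a displacement into the sender's buffer, in the spirit of the counts and displacements arguments of MPI_Alltoallv. The function name and the in-memory representation are assumptions; this is not an MPI call.

```python
def all_to_all_v(send_bufs, send_counts, send_displs):
    """Return recv[dst][src], the block that `src` sends to `dst`.

    send_counts[src][dst] is the block length and send_displs[src][dst] its
    offset in send_bufs[src]. Lengths may be zero and offsets need not be
    monotonically increasing.
    """
    n = len(send_bufs)
    recv = []
    for dst in range(n):
        row = []
        for src in range(n):
            start = send_displs[src][dst]
            row.append(send_bufs[src][start:start + send_counts[src][dst]])
        recv.append(row)
    return recv

# Process 0 stores its block for process 1 *before* its block for itself;
# process 1 sends a zero-length block to process 0.
send_bufs = [b"bAAA", b"DD"]
send_counts = [[3, 1], [0, 2]]
send_displs = [[1, 0], [0, 0]]

recv = all_to_all_v(send_bufs, send_counts, send_displs)
print(recv)
```

The communication pattern is the same block transposition as in the fixed-size case; only the per-block metadata differs.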
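The actual matrix transform is performed over sub-blocks of data. The term "actual matrix transform" is used herein because the blocks of data transfer defined by the operation can be viewed as a matrix transform when each element of the matrix is a block of data. The columns of the matrix are the blocks of data owned by each process. Each process has a block of data associated with every process in the group, so the matrix can be viewed as a square matrix. For all-to-all the size of all the blocks is identical; for all-to-all-v and all-to-all-w the sizes of the blocks may differ. From a block-wise view of the data layout (not the actual size of each block), all-to-all-v and all-to-all-w are still square.
For the purpose of the transform, a horizontal submatrix dimension dh and a vertical submatrix dimension dv are defined. The sub-block dimensions need not be an integer factor of the full matrix dimension, and dh and dv need not be equal. Incomplete sub-blocks are permitted; that is, for a given group size, there are subgroups for which the ratio of the group size to the sub-block size is not an integer. This situation gives "leftover" blocks at the edges. By way of particular non-limiting example, such "leftover" blocks would be present in a matrix of size 11 with sub-blocks of size 3. Finally, the vertical and horizontal ranges of values in the full matrix need not be contiguous; for example, when mapped onto the full matrix, such a submatrix may be distributed into several different contiguous blocks of data over the matrix.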
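The "leftover" case mentioned above (a matrix of size 11 with sub-blocks of size 3) can be checked with a short sketch. It assumes contiguous subgroups for simplicity, although the text allows non-contiguous ones; the function name `partition` is hypothetical.

```python
def partition(n, d):
    """Split ranks 0..n-1 into consecutive subgroups of size d.

    The last subgroup is smaller when d does not divide n (the "leftover").
    """
    return [list(range(start, min(start + d, n))) for start in range(0, n, d)]

groups = partition(11, 3)
print(groups)
# Sub-block shapes along one edge of the 11 x 11 block matrix:
print([len(g) for g in groups])
```

With dh = dv = 3 this gives three full subgroups and one of size 2, so the sub-blocks along the last row and column of the subdivision are 3×2, 2×3, or 2×2 rather than 3×3.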
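As an example, if we take dh = dv = 2 and use the process groups {1,2}, {0,3}, and {4,5} to sub-block the matrix, FIG. 8 uses the coding [a] through [i] to show how the full matrix may be subdivided into 2×2 sub-blocks, in one non-limiting example. Note that there are three distributed sub-blocks in the example: 1) data blocks (0,0)(0,3)(3,0)(3,3), denoted [a]; 2) data blocks (0,1)(0,2)(3,1)(3,2), denoted [c]; and 3) data blocks (1,0)(2,0)(1,3)(2,3), denoted [b].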
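The sub-blocks listed above can be reproduced by taking the Cartesian product of a source group and a destination group. The sketch below does this for the three process groups from the example; the mapping of products to the letters [a] through [i] in FIG. 8 is taken from the text only for [a], [b], and [c], and the names `GROUPS` and `submatrix_blocks` are hypothetical.

```python
from itertools import product

GROUPS = [(1, 2), (0, 3), (4, 5)]

def submatrix_blocks(src_group, dst_group):
    """All (source, destination) block labels in the sub-block defined by
    a source subgroup and a destination subgroup."""
    return [(s, d) for s in sorted(src_group) for d in sorted(dst_group)]

sub_blocks = {
    (src, dst): submatrix_blocks(src, dst)
    for src, dst in product(GROUPS, repeat=2)
}

print(sub_blocks[((0, 3), (0, 3))])   # the blocks denoted [a] in the text
print(sub_blocks[((0, 3), (1, 2))])   # the blocks denoted [c]
print(sub_blocks[((1, 2), (0, 3))])   # the blocks denoted [b]
```

Three groups give nine 2×2 sub-blocks covering all 36 blocks of the 6×6 matrix, consistent with the nine codes [a] through [i].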
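In exemplary embodiments of the present invention, the full end-to-end all-to-all is orchestrated using a reduction tree. As processes make a call to the collective operation, each process registers the collective operation using the reduction tree. When all members of a subgroup have registered the operation, the subgroup is marked as active. When both the source and destination subgroups are active, the sub-block they define may be transposed.
In certain exemplary embodiments of the present invention, the collective operation is executed in the following manner:
1. Each process in the group registers itself with the control tree, by passing an arrival notification to the controller.
2. Once all members of a subgroup have arrived, the subgroup is marked as ready for operation.
3. As source and destination groups become ready, they are paired and assigned to a data repacking unit.
4. Data is transferred from the source processes to the data repacking unit. This unit repacks the data and sends it to the appropriate destinations.
5. Each source process is notified of completion, as is each destination process.
6. The operation completes locally at each process once all expected data has been received and the transfer of all source data is complete.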
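Step 3 leaves open how paired subgroups are distributed over data repacking units. The sketch below assumes a simple round-robin policy, which is purely illustrative; the text does not specify an assignment policy, and `assign_submatrices` is a hypothetical name.

```python
from itertools import product

def assign_submatrices(subgroups, num_units):
    """Assign every (source subgroup, destination subgroup) pair to a
    repacking unit, round-robin (an assumed policy)."""
    assignment = {unit: [] for unit in range(num_units)}
    for idx, (src, dst) in enumerate(product(subgroups, repeat=2)):
        assignment[idx % num_units].append((src, dst))
    return assignment

units = assign_submatrices([(1, 2), (0, 3), (4, 5)], num_units=2)
for unit, pairs in units.items():
    print(unit, pairs)
```

With nine submatrices and two units, each unit receives several submatrices to transpose, in line with the statement that each unit is assigned multiple submatrices.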
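FIG. 9 shows how, in a non-limiting exemplary embodiment, one of the data repacking units 910 in the system is used to transpose the submatrix defined by the horizontal subgroup {0,3} and the vertical subgroup {1,2}. Processes 0 and 3 each send their portion of the submatrix to the data repacking unit, which rearranges the data and sends it to processes 1 and 2. In the specific non-limiting example shown in FIG. 9, process 0 has data elements (0,1) and (0,2), and process 3 has data elements (3,1) and (3,2). This data is sent to the controller, which sends (0,1) and (3,1) to process 1 and sends (0,2) and (3,2) to process 2. Final data placement in the result buffer is handled by the end-point. In general, in exemplary embodiments, the repacking unit 910 treats all data it processes as a contiguous "blob" of data; the repacking unit 910 does not recognize any structure in the data. The final data distribution at the end-points within each block may be contiguous, in which case the repacking unit and the destination process will have the same view of the data. However, the final data layout at the destination process may be non-contiguous, in which case it is the end-point which distributes the data appropriately at the destination. It is appreciated that the end-point or any other appropriate system component may distribute the data appropriately.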
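The FIG. 9 example can be followed step by step in a short software sketch of the repacking step. The dictionary-based representation and the name `repack` are assumptions made for illustration; the payloads are opaque byte strings, reflecting that the repacking unit does not interpret block contents.

```python
def repack(received):
    """received[src][dst] is the payload that source `src` sent for `dst`.

    Returns one contiguous message per destination, with payloads
    concatenated in increasing source order.
    """
    by_dest = {}
    for src in sorted(received):
        for dst, payload in received[src].items():
            by_dest.setdefault(dst, []).append(payload)
    return {dst: b"".join(parts) for dst, parts in by_dest.items()}

# Processes 0 and 3 send their portions of the {0,3} x {1,2} submatrix.
received = {
    0: {1: b"(0,1)", 2: b"(0,2)"},
    3: {1: b"(3,1)", 2: b"(3,2)"},
}
outgoing = repack(received)
print(outgoing)
```

Each destination receives a single message, and any further placement into a non-contiguous result buffer is left to the end-point, as described above.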
References
Ana Gainaru, R. L. Graham, Artem Polyakov, Gilad Shainer (2016). Using InfiniBand Hardware Gather-Scatter Capabilities to Optimize MPI All-to-All. In Proceedings of the 23rd European MPI Users' Group Meeting. Edinburgh, United Kingdom: ACM.
MPI Forum (2015). Message Passing Interface. Knoxville: University of Tennessee.
J. Bruck, Ching-Tien Ho, Shlomo Kipnis, Derrick Weathersby (1997). Efficient algorithms for all-to-all communications in multi-port message-passing systems. In IEEE Transactions on Parallel and Distributed Systems, pages 298–309.
Jelena Pjevsivac-Grbovic, Thara Angskun, George Bosilca, Graham Fagg, Edgar Gabriel, Jack Dongarra (2007). Performance analysis of MPI collective operations. Cluster Computing.
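It is appreciated that software components of the present invention may, if desired, be implemented in ROM (read only memory) form. The software components may, generally, be implemented in hardware, if desired, using conventional techniques. It is further appreciated that the software components may be instantiated, for example, as a computer program product or on a tangible medium. In some cases, it may be possible to instantiate the software components as a signal interpretable by an appropriate computer, although such an instantiation may be excluded in certain embodiments of the present invention.
It is appreciated that various features of the invention which are, for clarity, described in the contexts of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable subcombination.
It will be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described hereinabove. Rather, the scope of the invention is defined by the appended claims and equivalents thereof.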
Claims (14)
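1. A method comprising:
providing a plurality of processes, each of said plurality of processes being configured to hold a block of data destined for other processes of said plurality of processes;
providing at least one instance of data repacking circuitry comprising:
receiving circuitry configured to receive at least one block of data from at least one source process of said plurality of processes;
repacking circuitry configured to repack received data in accordance with at least one destination process of said plurality of processes; and
sending circuitry configured to send the repacked data to said at least one destination process of said plurality of processes;
receiving a set of data for all-to-all data exchange, the set of data being configured as a matrix, the matrix being distributed among said plurality of processes; and
transposing the data by: each of said plurality of processes sending matrix data from said process to said data repacking circuitry; and said data repacking circuitry receiving, repacking, and sending the resulting matrix data to destination processes.
2. The method according to claim 1, also comprising providing a control tree configured to control said plurality of processes and said repacking circuitry.
3. The method according to claim 2, wherein said control tree is further configured to:
receive registration messages from each of said plurality of processes;
mark a given subgroup of said plurality of processes as ready for operation when registration messages have been received from all members of said given subgroup;
when a given subgroup which is a source subgroup and a corresponding subgroup which is a destination subgroup are ready for operation, pair the given source subgroup and the given destination subgroup and assign said given source subgroup and said given destination subgroup to an instance of data repacking circuitry; and
notify each said source subgroup and each said destination subgroup when operations relating to each said source subgroup and each said destination subgroup have completed.
4. The method according to claim 3, wherein said control tree is configured, in addition to pairing said given source subgroup and said given destination subgroup, to assign said given source subgroup and said given destination subgroup to an instance of data repacking circuitry.
5. The method according to claim 3, also comprising providing assignment circuitry other than said control tree, said assignment circuitry being configured to assign said given source subgroup and said given destination subgroup to an instance of data repacking circuitry.
6. The method according to claim 2, wherein said control tree comprises a reduction tree.
7. The method according to claim 6, also comprising providing assignment circuitry other than said control tree, said assignment circuitry being configured to assign said given source subgroup and said given destination subgroup to an instance of data repacking circuitry.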
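8. Apparatus comprising:
receiving circuitry configured to receive at least one block of data from at least one source process of a plurality of processes, each of said plurality of processes being configured to hold a block of data destined for other processes of said plurality of processes;
at least one instance of data repacking circuitry configured to repack received data in accordance with at least one destination process of said plurality of processes; and
sending circuitry configured to send said repacked data to said at least one destination process of said plurality of processes,
the apparatus being configured to receive a set of data for all-to-all data exchange, the set of data being configured as a matrix, the matrix being distributed among said plurality of processes; and
the apparatus being further configured to transpose the data by: receiving, at said repacking circuitry, matrix data from each of said plurality of processes; and said data repacking circuitry receiving, repacking, and sending the resulting matrix data to destination processes.
9. The apparatus according to claim 8, also comprising a control tree configured to control said plurality of processes and said repacking circuitry.
10. The apparatus according to claim 9, wherein said control tree is further configured to:
receive registration messages from each of said plurality of processes;
mark a given subgroup of said plurality of processes as ready for operation when registration messages have been received from all members of said given subgroup;
when a given subgroup which is a source subgroup and a corresponding subgroup which is a destination subgroup are ready for operation, pair the given source subgroup and the given destination subgroup and assign said given source subgroup and said given destination subgroup to an instance of data repacking circuitry; and
notify each said source subgroup and each said destination subgroup when operations relating to each said source subgroup and each said destination subgroup have completed.
11. The apparatus according to claim 10, wherein said control tree is configured, in addition to pairing said given source subgroup and said given destination subgroup, to assign said given source subgroup and said given destination subgroup to a given instance of data repacking circuitry.
12. The apparatus according to claim 10, also comprising assignment circuitry other than said control tree, said assignment circuitry being configured to assign said given source subgroup and said given destination subgroup to a given instance of data repacking circuitry.
13. The apparatus according to claim 12, wherein said control tree comprises a reduction tree.
14. The apparatus according to claim 9, wherein said control tree comprises a reduction tree.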
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962809786P | 2019-02-25 | 2019-02-25 | |
US62/809,786 | 2019-02-25 | ||
EP20156490.3 | 2020-02-10 | ||
EP20156490.3A EP3699770A1 (en) | 2019-02-25 | 2020-02-10 | Collective communication system and methods |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111614581A (zh) | 2020-09-01 |
CN111614581B (zh) | 2022-07-05 |
Family
ID=69645874
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010117006.2A Active CN111614581B (zh) | 2019-02-25 | 2020-02-25 | Collective communication system and methods |
Country Status (3)
Country | Link |
---|---|
US (3) | US11196586B2 (zh) |
EP (1) | EP3699770A1 (zh) |
CN (1) | CN111614581B (zh) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3699770A1 (en) | 2019-02-25 | 2020-08-26 | Mellanox Technologies TLV Ltd. | Collective communication system and methods |
US11750699B2 (en) | 2020-01-15 | 2023-09-05 | Mellanox Technologies, Ltd. | Small message aggregation |
US11876885B2 (en) | 2020-07-02 | 2024-01-16 | Mellanox Technologies, Ltd. | Clock queue with arming and/or self-arming features |
US11836549B2 (en) * | 2020-10-15 | 2023-12-05 | Advanced Micro Devices, Inc. | Fast block-based parallel message passing interface transpose |
US11556378B2 (en) | 2020-12-14 | 2023-01-17 | Mellanox Technologies, Ltd. | Offloading execution of a multi-task parameter-dependent operation to a network device |
US11934332B2 (en) | 2022-02-01 | 2024-03-19 | Mellanox Technologies, Ltd. | Data shuffle offload |
US11922237B1 (en) | 2022-09-12 | 2024-03-05 | Mellanox Technologies, Ltd. | Single-step collective operations |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101267448A (zh) * | 2008-05-09 | 2008-09-17 | Northeastern University | Intelligent protocol conversion device and method based on the embedded QNX operating system |
US20100017420A1 (en) * | 2008-07-21 | 2010-01-21 | International Business Machines Corporation | Performing An All-To-All Data Exchange On A Plurality Of Data Buffers By Performing Swap Operations |
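CN101854556A (zh) * | 2009-03-30 | 2010-10-06 | Sony Corporation | Information processing apparatus and method |
CN102915031A (zh) * | 2012-10-25 | 2013-02-06 | University of Science and Technology of China | Intelligent self-calibration system for kinematic parameters of a parallel robot |
CN104662855A (zh) * | 2012-06-25 | 2015-05-27 | Cohere Technologies, Inc. | Modulation and equalization in an orthogonal time-frequency shifting communication system |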
US20150193269A1 (en) * | 2014-01-06 | 2015-07-09 | International Business Machines Corporation | Executing an all-to-allv operation on a parallel computer that includes a plurality of compute nodes |
Family Cites Families (276)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB8704883D0 (en) | 1987-03-03 | 1987-04-08 | Hewlett Packard Co | Secure information storage |
US5068877A (en) | 1990-04-02 | 1991-11-26 | At&T Bell Laboratories | Method for synchronizing interconnected digital equipment |
US5353412A (en) | 1990-10-03 | 1994-10-04 | Thinking Machines Corporation | Partition control circuit for separately controlling message sending of nodes of tree-shaped routing network to divide the network into a number of partitions |
US5325500A (en) | 1990-12-14 | 1994-06-28 | Xerox Corporation | Parallel processing units on a substrate, each including a column of memory |
WO1993007691A1 (en) | 1991-10-01 | 1993-04-15 | Norand Corporation | A radio frequency local area network |
JPH0752437B2 (ja) | 1991-08-07 | 1995-06-05 | International Business Machines Corporation | Multi-node network that tracks the progress of messages |
US5408469A (en) | 1993-07-22 | 1995-04-18 | Synoptics Communications, Inc. | Routing device utilizing an ATM switch as a multi-channel backplane in a communication network |
US6072796A (en) | 1995-06-14 | 2000-06-06 | Avid Technology, Inc. | Apparatus and method for accessing memory in a TDM network |
US5606703A (en) | 1995-12-06 | 1997-02-25 | International Business Machines Corporation | Interrupt protocol system and method using priority-arranged queues of interrupt status block control data structures |
US5944779A (en) | 1996-07-02 | 1999-08-31 | Compbionics, Inc. | Cluster of workstations for solving compute-intensive applications by exchanging interim computation results using a two phase communication protocol |
US6041049A (en) | 1997-05-06 | 2000-03-21 | International Business Machines Corporation | Method and apparatus for determining a routing table for each node in a distributed nodal system |
US6434620B1 (en) | 1998-08-27 | 2002-08-13 | Alacritech, Inc. | TCP/IP offload network interface device |
US6381682B2 (en) | 1998-06-10 | 2002-04-30 | Compaq Information Technologies Group, L.P. | Method and apparatus for dynamically sharing memory in a multiprocessor system |
US6438137B1 (en) | 1997-12-22 | 2002-08-20 | Nms Communications Corporation | Packet-based trunking |
US6115394A (en) | 1998-03-04 | 2000-09-05 | Ericsson Inc. | Methods, apparatus and computer program products for packet transport over wireless communication links |
US6507562B1 (en) | 1998-06-30 | 2003-01-14 | Sun Microsystems, Inc. | Dynamic optimization for receivers using distance between a repair head and a member station in a repair group for receivers having a closely knit topological arrangement to locate repair heads near the member stations which they serve in tree based repair in reliable multicast protocol |
US20190116159A9 (en) | 1998-10-30 | 2019-04-18 | Virnetx, Inc. | Agile protocol for secure communications with assured system availability |
US7418504B2 (en) | 1998-10-30 | 2008-08-26 | Virnetx, Inc. | Agile network protocol for secure communications using secure domain names |
US10511573B2 (en) | 1998-10-30 | 2019-12-17 | Virnetx, Inc. | Agile network protocol for secure communications using secure domain names |
US6483804B1 (en) | 1999-03-01 | 2002-11-19 | Sun Microsystems, Inc. | Method and apparatus for dynamic packet batching with a high performance network interface |
US7102998B1 (en) | 1999-03-22 | 2006-09-05 | Lucent Technologies Inc. | Scaleable congestion control method for multicast communications over a data network |
US6370502B1 (en) | 1999-05-27 | 2002-04-09 | America Online, Inc. | Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec |
JP2003503787A (ja) | 1999-06-25 | 2003-01-28 | Massively Parallel Computing, Inc. | Processing system using a massive collective network and method therefor |
US7124180B1 (en) | 2000-04-27 | 2006-10-17 | Hewlett-Packard Development Company, L.P. | Internet usage data recording system and method employing a configurable rule engine for the processing and correlation of network data |
JP2001313639A (ja) | 2000-04-27 | 2001-11-09 | Nec Corp | Network configuration data management system and method, and recording medium |
US6728862B1 (en) | 2000-05-22 | 2004-04-27 | Gazelle Technology Corporation | Processor array and parallel data processing methods |
US7171484B1 (en) | 2000-05-24 | 2007-01-30 | Krause Michael R | Reliable datagram transport service |
US7418470B2 (en) | 2000-06-26 | 2008-08-26 | Massively Parallel Technologies, Inc. | Parallel processing systems and method |
US7164422B1 (en) | 2000-07-28 | 2007-01-16 | Ab Initio Software Corporation | Parameterized graphs with conditional components |
US6816492B1 (en) | 2000-07-31 | 2004-11-09 | Cisco Technology, Inc. | Resequencing packets at output ports without errors using packet timestamps and timestamp floors |
US6937576B1 (en) | 2000-10-17 | 2005-08-30 | Cisco Technology, Inc. | Multiple instance spanning tree protocol |
US20020150094A1 (en) | 2000-10-27 | 2002-10-17 | Matthew Cheng | Hierarchical level-based internet protocol multicasting |
US7346698B2 (en) | 2000-12-20 | 2008-03-18 | G. W. Hannaway & Associates | Webcasting method and system for time-based synchronization of multiple, independent media streams |
CA2437629A1 (en) | 2001-02-24 | 2002-09-06 | International Business Machines Corporation | Arithmetic functions in torus and tree networks |
EP1381959A4 (en) | 2001-02-24 | 2008-10-29 | Ibm | GLOBAL ARBORESCENT NETWORK FOR CALCULATION STRUCTURES |
US20020152328A1 (en) | 2001-04-11 | 2002-10-17 | Mellanox Technologies, Ltd. | Network adapter with shared database for message context information |
US8051212B2 (en) | 2001-04-11 | 2011-11-01 | Mellanox Technologies Ltd. | Network interface adapter with shared data send resources |
EP1265124B1 (de) | 2001-06-07 | 2004-05-19 | Siemens Aktiengesellschaft | Method for transmitting time information over a data packet network |
US20030018828A1 (en) | 2001-06-29 | 2003-01-23 | International Business Machines Corporation | Infiniband mixed semantic ethernet I/O path |
US7383421B2 (en) | 2002-12-05 | 2008-06-03 | Brightscale, Inc. | Cellular engine for a data processing system |
US6789143B2 (en) | 2001-09-24 | 2004-09-07 | International Business Machines Corporation | Infiniband work and completion queue management via head and tail circular buffers with indirect work queue entries |
US20030065856A1 (en) | 2001-10-03 | 2003-04-03 | Mellanox Technologies Ltd. | Network adapter with multiple event queues |
US6754735B2 (en) | 2001-12-21 | 2004-06-22 | Agere Systems Inc. | Single descriptor scatter gather data transfer to or from a host processor |
US7224669B2 (en) | 2002-01-22 | 2007-05-29 | Mellanox Technologies Ltd. | Static flow rate control |
US7245627B2 (en) | 2002-04-23 | 2007-07-17 | Mellanox Technologies Ltd. | Sharing a network interface card among multiple hosts |
US7370117B2 (en) | 2002-09-26 | 2008-05-06 | Intel Corporation | Communication system and method for communicating frames of management information in a multi-station network |
US7167850B2 (en) | 2002-10-10 | 2007-01-23 | Ab Initio Software Corporation | Startup and control of graph-based computation |
US7310343B2 (en) | 2002-12-20 | 2007-12-18 | Hewlett-Packard Development Company, L.P. | Systems and methods for rapid selection of devices in a tree topology network |
US7584303B2 (en) | 2002-12-20 | 2009-09-01 | Forte 10 Networks, Inc. | Lossless, stateful, real-time pattern matching with deterministic memory resources |
US20040252685A1 (en) | 2003-06-13 | 2004-12-16 | Mellanox Technologies Ltd. | Channel adapter with integrated switch |
US20040260683A1 (en) | 2003-06-20 | 2004-12-23 | Chee-Yong Chan | Techniques for information dissemination using tree pattern subscriptions and aggregation thereof |
US20050097300A1 (en) | 2003-10-30 | 2005-05-05 | International Business Machines Corporation | Processing system and method including a dedicated collective offload engine providing collective processing in a distributed computing environment |
US7810093B2 (en) | 2003-11-14 | 2010-10-05 | Lawrence Livermore National Security, Llc | Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes |
US7219170B2 (en) | 2003-12-04 | 2007-05-15 | Intel Corporation | Burst transfer register arrangement |
US20050129039A1 (en) | 2003-12-11 | 2005-06-16 | International Business Machines Corporation | RDMA network interface controller with cut-through implementation for aligned DDP segments |
US7680670B2 (en) | 2004-01-30 | 2010-03-16 | France Telecom | Dimensional vector and variable resolution quantization |
US7327693B1 (en) | 2004-03-30 | 2008-02-05 | Cisco Technology, Inc. | Method and apparatus for precisely measuring a packet transmission time |
US20050223118A1 (en) | 2004-04-05 | 2005-10-06 | Ammasso, Inc. | System and method for placement of sharing physical buffer lists in RDMA communication |
US8041799B1 (en) | 2004-04-30 | 2011-10-18 | Sprint Communications Company L.P. | Method and system for managing alarms in a communications network |
JP4156568B2 (ja) | 2004-06-21 | 2008-09-24 | Fujitsu Limited | Communication system control method, communication control device, and program |
US7624163B2 (en) | 2004-10-21 | 2009-11-24 | Apple Inc. | Automatic configuration information generation for distributed computing environment |
US7336646B2 (en) | 2004-10-26 | 2008-02-26 | Nokia Corporation | System and method for synchronizing a transport stream in a single frequency network |
US7356625B2 (en) | 2004-10-29 | 2008-04-08 | International Business Machines Corporation | Moving, resizing, and memory management for producer-consumer queues by consuming and storing any queue entries from an old queue before entries from a new queue |
US7555549B1 (en) | 2004-11-07 | 2009-06-30 | Qlogic, Corporation | Clustered computing model and display |
US8698817B2 (en) | 2004-11-15 | 2014-04-15 | Nvidia Corporation | Video processor having scalar and vector components |
US7620071B2 (en) | 2004-11-16 | 2009-11-17 | Intel Corporation | Packet coalescing |
US7613774B1 (en) | 2005-03-01 | 2009-11-03 | Sun Microsystems, Inc. | Chaperones in a distributed system |
US20060282838A1 (en) | 2005-06-08 | 2006-12-14 | Rinku Gupta | MPI-aware networking infrastructure |
US7770088B2 (en) | 2005-12-02 | 2010-08-03 | Intel Corporation | Techniques to transmit network protocol units |
US7817580B2 (en) | 2005-12-07 | 2010-10-19 | Cisco Technology, Inc. | Preventing transient loops in broadcast/multicast trees during distribution of link state information |
US7760743B2 (en) | 2006-03-06 | 2010-07-20 | Oracle America, Inc. | Effective high availability cluster management and effective state propagation for failure recovery in high availability clusters |
US7743087B1 (en) | 2006-03-22 | 2010-06-22 | The MathWorks, Inc. | Partitioning distributed arrays according to criterion and functions applied to the distributed arrays |
US8074026B2 (en) | 2006-05-10 | 2011-12-06 | Intel Corporation | Scatter-gather intelligent memory architecture for unstructured streaming data on multiprocessor systems |
US7996583B2 (en) | 2006-08-31 | 2011-08-09 | Cisco Technology, Inc. | Multiple context single logic virtual host channel adapter supporting multiple transport protocols |
CN101163240A (zh) | 2006-10-13 | 2008-04-16 | 国际商业机器公司 | Filtering device and method thereof |
US8094585B2 (en) | 2006-10-31 | 2012-01-10 | International Business Machines Corporation | Membership management of network nodes |
US7895601B2 (en) | 2007-01-10 | 2011-02-22 | International Business Machines Corporation | Collective send operations on a system area network |
US7949890B2 (en) | 2007-01-31 | 2011-05-24 | Net Power And Light, Inc. | Method and system for precise synchronization of audio and video streams during a distributed communication session with multiple participants |
US8380880B2 (en) | 2007-02-02 | 2013-02-19 | The MathWorks, Inc. | Scalable architecture |
US7913077B2 (en) | 2007-02-13 | 2011-03-22 | International Business Machines Corporation | Preventing IP spoofing and facilitating parsing of private data areas in system area network connection requests |
US7835391B2 (en) | 2007-03-07 | 2010-11-16 | Texas Instruments Incorporated | Protocol DMA engine |
CN101282276B (zh) | 2007-04-03 | 2011-11-09 | 华为技术有限公司 | Protection method and device for Ethernet tree service |
US7752421B2 (en) | 2007-04-19 | 2010-07-06 | International Business Machines Corporation | Parallel-prefix broadcast for a parallel-prefix operation on a parallel computer |
US8768898B1 (en) * | 2007-04-26 | 2014-07-01 | Netapp, Inc. | Performing direct data manipulation on a storage device |
US8539498B2 (en) | 2007-05-17 | 2013-09-17 | Alcatel Lucent | Interprocess resource-based dynamic scheduling system and method |
US8068429B2 (en) | 2007-05-31 | 2011-11-29 | Ixia | Transmit scheduling |
US7856551B2 (en) | 2007-06-05 | 2010-12-21 | Intel Corporation | Dynamically discovering a system topology |
US7738443B2 (en) | 2007-06-26 | 2010-06-15 | International Business Machines Corporation | Asynchronous broadcast for ordered delivery between compute nodes in a parallel computing system where packet header space is limited |
US8090704B2 (en) | 2007-07-30 | 2012-01-03 | International Business Machines Corporation | Database retrieval with a non-unique key on a parallel computer system |
US8108545B2 (en) | 2007-08-27 | 2012-01-31 | International Business Machines Corporation | Packet coalescing in virtual channels of a data processing system in a multi-tiered full-graph interconnect architecture |
US7793158B2 (en) | 2007-08-27 | 2010-09-07 | International Business Machines Corporation | Providing reliability of communication between supernodes of a multi-tiered full-graph interconnect architecture |
US7958183B2 (en) | 2007-08-27 | 2011-06-07 | International Business Machines Corporation | Performing collective operations using software setup and partial software execution at leaf nodes in a multi-tiered full-graph interconnect architecture |
US8085686B2 (en) | 2007-09-27 | 2011-12-27 | Cisco Technology, Inc. | Aggregation and propagation of sensor data within neighbor discovery messages in a tree-based ad hoc network |
US8527590B2 (en) | 2008-01-16 | 2013-09-03 | Janos Tapolcai | Solving mixed integer programs with peer-to-peer applications |
US7801024B2 (en) | 2008-02-12 | 2010-09-21 | At&T Intellectual Property Ii, L.P. | Restoring aggregated circuits with circuit integrity checks in a hierarchical network |
US7991857B2 (en) | 2008-03-24 | 2011-08-02 | International Business Machines Corporation | Broadcasting a message in a parallel computer |
US8375197B2 (en) | 2008-05-21 | 2013-02-12 | International Business Machines Corporation | Performing an allreduce operation on a plurality of compute nodes of a parallel computer |
US7948979B2 (en) | 2008-05-28 | 2011-05-24 | Intel Corporation | Programmable network interface card |
US7944946B2 (en) | 2008-06-09 | 2011-05-17 | Fortinet, Inc. | Virtual memory protocol segmentation offloading |
US7797445B2 (en) | 2008-06-26 | 2010-09-14 | International Business Machines Corporation | Dynamic network link selection for transmitting a message between compute nodes of a parallel computer |
US7865693B2 (en) | 2008-10-14 | 2011-01-04 | International Business Machines Corporation | Aligning precision converted vector data using mask indicating offset relative to element boundary corresponding to precision type |
US20190377580A1 (en) | 2008-10-15 | 2019-12-12 | Hyperion Core Inc. | Execution of instructions based on processor and data availability |
US9853922B2 (en) | 2012-02-24 | 2017-12-26 | Sococo, Inc. | Virtual area communications |
US8370675B2 (en) | 2009-01-28 | 2013-02-05 | Mellanox Technologies Ltd. | Precise clock synchronization |
US8239847B2 (en) | 2009-03-18 | 2012-08-07 | Microsoft Corporation | General distributed reduction for data parallel computing |
US8255475B2 (en) | 2009-04-28 | 2012-08-28 | Mellanox Technologies Ltd. | Network interface device with memory management capabilities |
US9596186B2 (en) | 2009-06-30 | 2017-03-14 | Oracle America, Inc. | Multiple processes sharing a single infiniband connection |
US8447954B2 (en) | 2009-09-04 | 2013-05-21 | International Business Machines Corporation | Parallel pipelined vector reduction in a data processing system |
US8321454B2 (en) | 2009-09-14 | 2012-11-27 | Myspace Llc | Double map reduce distributed computing framework |
US8838907B2 (en) | 2009-10-07 | 2014-09-16 | Hewlett-Packard Development Company, L.P. | Notification protocol based endpoint caching of host memory |
EP2488963A1 (en) | 2009-10-15 | 2012-08-22 | Rogers Communications Inc. | System and method for phrase identification |
US9110860B2 (en) | 2009-11-11 | 2015-08-18 | Mellanox Technologies Tlv Ltd. | Topology-aware fabric-based offloading of collective functions |
US8571834B2 (en) | 2010-01-08 | 2013-10-29 | International Business Machines Corporation | Opcode counting for performance measurement |
US10158702B2 (en) | 2009-11-15 | 2018-12-18 | Mellanox Technologies, Ltd. | Network operation offloading for collective operations |
US8811417B2 (en) | 2009-11-15 | 2014-08-19 | Mellanox Technologies Ltd. | Cross-channel network operation offloading for collective operations |
US8213315B2 (en) | 2009-11-19 | 2012-07-03 | Mellanox Technologies Ltd. | Dynamically-connected transport service |
US9081501B2 (en) | 2010-01-08 | 2015-07-14 | International Business Machines Corporation | Multi-petascale highly efficient parallel supercomputer |
US8751655B2 (en) | 2010-03-29 | 2014-06-10 | International Business Machines Corporation | Collective acceleration unit tree structure |
US8332460B2 (en) | 2010-04-14 | 2012-12-11 | International Business Machines Corporation | Performing a local reduction operation on a parallel computer |
US8555265B2 (en) | 2010-05-04 | 2013-10-08 | Google Inc. | Parallel processing of data |
US9406336B2 (en) | 2010-08-26 | 2016-08-02 | Blast Motion Inc. | Multi-sensor event detection system |
US9253248B2 (en) | 2010-11-15 | 2016-02-02 | Interactic Holdings, Llc | Parallel information system utilizing flow control and virtual channels |
US9552206B2 (en) | 2010-11-18 | 2017-01-24 | Texas Instruments Incorporated | Integrated circuit with control node circuitry and processing circuitry |
US8490112B2 (en) | 2010-12-03 | 2013-07-16 | International Business Machines Corporation | Data communications for a collective operation in a parallel active messaging interface of a parallel computer |
US9258390B2 (en) | 2011-07-29 | 2016-02-09 | Solarflare Communications, Inc. | Reducing network latency |
JP5776267B2 (ja) | 2011-03-29 | 2015-09-09 | 日本電気株式会社 | Distributed file system |
US9619301B2 (en) | 2011-04-06 | 2017-04-11 | Telefonaktiebolaget L M Ericsson (Publ) | Multi-core memory model and speculative mode processor management |
FR2979719B1 (fr) | 2011-09-02 | 2014-07-25 | Thales Sa | Communications system enabling the transmission of signals between terminal equipment connected to intermediate equipment linked to an Ethernet network |
US8645663B2 (en) | 2011-09-12 | 2014-02-04 | Mellanox Technologies Ltd. | Network interface controller with flexible memory handling |
US9009686B2 (en) | 2011-11-07 | 2015-04-14 | Nvidia Corporation | Algorithm for 64-bit address mode optimization |
US9397960B2 (en) | 2011-11-08 | 2016-07-19 | Mellanox Technologies Ltd. | Packet steering |
US8694701B2 (en) | 2011-12-15 | 2014-04-08 | Mellanox Technologies Ltd. | Recovering dropped instructions in a network interface controller |
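KR20130068849A (ko) | 2011-12-16 | 2013-06-26 | 한국전자통신연구원 | System and method for hierarchical message transmission between devices in a network environment composed of heterogeneous networks |
CN103297172B (zh) | 2012-02-24 | 2016-12-21 | 华为技术有限公司 | Packet-aggregation data transmission method, access point, relay node, and data node |
JP2015511074A (ja) | 2012-03-23 | 2015-04-13 | 日本電気株式会社 | System and method for communication |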
US10387448B2 (en) | 2012-05-15 | 2019-08-20 | Splunk Inc. | Replication of summary data in a clustered computing environment |
US9158602B2 (en) | 2012-05-21 | 2015-10-13 | International Business Machines Corporation | Processing posted receive commands in a parallel computer |
US8972986B2 (en) | 2012-05-25 | 2015-03-03 | International Business Machines Corporation | Locality-aware resource allocation for cloud computing |
WO2013180738A1 (en) | 2012-06-02 | 2013-12-05 | Intel Corporation | Scatter using index array and finite state machine |
US9123219B2 (en) | 2012-06-19 | 2015-09-01 | Honeywell International Inc. | Wireless fire system based on open standard wireless protocols |
US8761189B2 (en) | 2012-06-28 | 2014-06-24 | Mellanox Technologies Ltd. | Responding to dynamically-connected transport requests |
US9002970B2 (en) | 2012-07-12 | 2015-04-07 | International Business Machines Corporation | Remote direct memory access socket aggregation |
US11403317B2 (en) | 2012-07-26 | 2022-08-02 | Mongodb, Inc. | Aggregation framework system architecture and method |
US8887056B2 (en) | 2012-08-07 | 2014-11-11 | Advanced Micro Devices, Inc. | System and method for configuring cloud computing systems |
JP5939305B2 (ja) | 2012-09-07 | 2016-06-22 | 富士通株式会社 | Information processing device, parallel computer system, and control method for information processing device |
US9842046B2 (en) * | 2012-09-28 | 2017-12-12 | Intel Corporation | Processing memory access instructions that have duplicate memory indices |
US9424214B2 (en) | 2012-09-28 | 2016-08-23 | Mellanox Technologies Ltd. | Network interface controller with direct connection to host memory |
US9606961B2 (en) | 2012-10-30 | 2017-03-28 | Intel Corporation | Instruction and logic to provide vector compress and rotate functionality |
US9160607B1 (en) | 2012-11-09 | 2015-10-13 | Cray Inc. | Method and apparatus for deadlock avoidance |
US10049061B2 (en) | 2012-11-12 | 2018-08-14 | International Business Machines Corporation | Active memory device gather, scatter, and filter |
US9411584B2 (en) | 2012-12-29 | 2016-08-09 | Intel Corporation | Methods, apparatus, instructions, and logic to provide vector address conflict detection functionality |
US10218808B2 (en) | 2014-10-20 | 2019-02-26 | PlaceIQ, Inc. | Scripting distributed, parallel programs |
US10275375B2 (en) | 2013-03-10 | 2019-04-30 | Mellanox Technologies, Ltd. | Network interface controller with compression capabilities |
US11966355B2 (en) | 2013-03-10 | 2024-04-23 | Mellanox Technologies, Ltd. | Network adapter with a common queue for both networking and data manipulation work requests |
US9275014B2 (en) | 2013-03-13 | 2016-03-01 | Qualcomm Incorporated | Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods |
US9495154B2 (en) | 2013-03-13 | 2016-11-15 | Qualcomm Incorporated | Vector processing engines having programmable data path configurations for providing multi-mode vector processing, and related vector processors, systems, and methods |
US9952975B2 (en) | 2013-04-30 | 2018-04-24 | Hewlett Packard Enterprise Development Lp | Memory network to route memory traffic and I/O traffic |
US9384168B2 (en) | 2013-06-11 | 2016-07-05 | Analog Devices Global | Vector matrix product accelerator for microprocessor integration |
US9817742B2 (en) | 2013-06-25 | 2017-11-14 | Dell International L.L.C. | Detecting hardware and software problems in remote systems |
US9541947B2 (en) | 2013-08-07 | 2017-01-10 | General Electric Company | Time protocol based timing system for time-of-flight instruments |
GB2518425A (en) | 2013-09-20 | 2015-03-25 | Tcs John Huxley Europ Ltd | Messaging system |
WO2015051387A1 (en) | 2013-10-11 | 2015-04-16 | Fts Computertechnik Gmbh | Method for executing tasks in a computer network |
CA2867589A1 (en) | 2013-10-15 | 2015-04-15 | Coho Data Inc. | Systems, methods and devices for implementing data management in a distributed data storage system |
US9977676B2 (en) | 2013-11-15 | 2018-05-22 | Qualcomm Incorporated | Vector processing engines (VPEs) employing reordering circuitry in data flow paths between execution units and vector data memory to provide in-flight reordering of output vector data stored to vector data memory, and related vector processor systems and methods |
US20150143076A1 (en) | 2013-11-15 | 2015-05-21 | Qualcomm Incorporated | VECTOR PROCESSING ENGINES (VPEs) EMPLOYING DESPREADING CIRCUITRY IN DATA FLOW PATHS BETWEEN EXECUTION UNITS AND VECTOR DATA MEMORY TO PROVIDE IN-FLIGHT DESPREADING OF SPREAD-SPECTRUM SEQUENCES, AND RELATED VECTOR PROCESSING INSTRUCTIONS, SYSTEMS, AND METHODS |
US9792118B2 (en) | 2013-11-15 | 2017-10-17 | Qualcomm Incorporated | Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision filter vector processing operations with reduced sample re-fetching and power consumption, and related vector processor systems and methods |
US9684509B2 (en) | 2013-11-15 | 2017-06-20 | Qualcomm Incorporated | Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory, and related vector processing instructions, systems, and methods |
US9880845B2 (en) | 2013-11-15 | 2018-01-30 | Qualcomm Incorporated | Vector processing engines (VPEs) employing format conversion circuitry in data flow paths between vector data memory and execution units to provide in-flight format-converting of input vector data to execution units for vector processing operations, and related vector processor systems and methods |
US9619227B2 (en) | 2013-11-15 | 2017-04-11 | Qualcomm Incorporated | Vector processing engines (VPEs) employing tapped-delay line(s) for providing precision correlation / covariance vector processing operations with reduced sample re-fetching and power consumption, and related vector processor systems and methods |
US10305980B1 (en) | 2013-11-27 | 2019-05-28 | Intellectual Property Systems, LLC | Arrangements for communicating data in a computing system using multiple processors |
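JP6152786B2 (ja) | 2013-11-29 | 2017-06-28 | 富士通株式会社 | Communication control device, information processing device, parallel computer system, control program, and control method for parallel computer system |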
GB2521441B (en) | 2013-12-20 | 2016-04-20 | Imagination Tech Ltd | Packet loss mitigation |
US9563426B1 (en) | 2013-12-30 | 2017-02-07 | EMC IP Holding Company LLC | Partitioned key-value store with atomic memory operations |
US9355061B2 (en) | 2014-01-28 | 2016-05-31 | Arm Limited | Data processing apparatus and method for performing scan operations |
US9696942B2 (en) | 2014-03-17 | 2017-07-04 | Mellanox Technologies, Ltd. | Accessing remote storage devices using a local bus protocol |
US9925492B2 (en) | 2014-03-24 | 2018-03-27 | Mellanox Technologies, Ltd. | Remote transactional memory |
US9442968B2 (en) | 2014-03-31 | 2016-09-13 | Sap Se | Evaluation of variant configuration using in-memory technology |
US10339079B2 (en) | 2014-06-02 | 2019-07-02 | Western Digital Technologies, Inc. | System and method of interleaving data retrieved from first and second buffers |
US9350825B2 (en) | 2014-06-16 | 2016-05-24 | International Business Machines Corporation | Optimizing network communications |
US20150379022A1 (en) | 2014-06-27 | 2015-12-31 | General Electric Company | Integrating Execution of Computing Analytics within a Mapreduce Processing Environment |
WO2016057783A1 (en) | 2014-10-08 | 2016-04-14 | Interactic Holdings, Llc | Fast fourier transform using a distributed computing system |
US9756154B1 (en) | 2014-10-13 | 2017-09-05 | Xilinx, Inc. | High throughput packet state processing |
US10331595B2 (en) | 2014-10-23 | 2019-06-25 | Mellanox Technologies, Ltd. | Collaborative hardware interaction by multiple entities using a shared queue |
US10904122B2 (en) | 2014-10-28 | 2021-01-26 | Salesforce.Com, Inc. | Facilitating workload-aware shuffling and management of message types in message queues in an on-demand services environment |
GB2549883A (en) | 2014-12-15 | 2017-11-01 | Hyperion Core Inc | Advanced processor architecture |
US9851970B2 (en) | 2014-12-23 | 2017-12-26 | Intel Corporation | Method and apparatus for performing reduction operations on a set of vector elements |
US9674071B2 (en) | 2015-02-20 | 2017-06-06 | Telefonaktiebolaget Lm Ericsson (Publ) | High-precision packet train generation |
US20160295426A1 (en) | 2015-03-30 | 2016-10-06 | Nokia Solutions And Networks Oy | Method and system for communication networks |
US10425350B1 (en) | 2015-04-06 | 2019-09-24 | EMC IP Holding Company LLC | Distributed catalog service for data processing platform |
US10541938B1 (en) | 2015-04-06 | 2020-01-21 | EMC IP Holding Company LLC | Integration of distributed data processing platform with one or more distinct supporting platforms |
US10277668B1 (en) | 2015-04-06 | 2019-04-30 | EMC IP Holding Company LLC | Beacon-based distributed data processing platform |
US10282347B2 (en) | 2015-04-08 | 2019-05-07 | Louisiana State University Research & Technology Foundation | Architecture for configuration of a reconfigurable integrated circuit |
ES2929626T3 (es) | 2015-05-21 | 2022-11-30 | Goldman Sachs & Co Llc | General-purpose parallel computing architecture |
US10210134B2 (en) | 2015-05-21 | 2019-02-19 | Goldman Sachs & Co. LLC | General-purpose parallel computing architecture |
US10320695B2 (en) | 2015-05-29 | 2019-06-11 | Advanced Micro Devices, Inc. | Message aggregation, combining and compression for efficient data communications in GPU-based clusters |
US10027601B2 (en) | 2015-06-03 | 2018-07-17 | Mellanox Technologies, Ltd. | Flow-based packet modification |
US10042794B2 (en) | 2015-06-12 | 2018-08-07 | Apple Inc. | Methods and apparatus for synchronizing uplink and downlink transactions on an inter-device communication link |
US10284383B2 (en) | 2015-08-31 | 2019-05-07 | Mellanox Technologies, Ltd. | Aggregation protocol |
US20170072876A1 (en) | 2015-09-14 | 2017-03-16 | Broadcom Corporation | Hardware-Accelerated Protocol Conversion in an Automotive Gateway Controller |
WO2017053468A1 (en) * | 2015-09-21 | 2017-03-30 | Dolby Laboratories Licensing Corporation | Efficient delivery of customized content over intelligent network |
US10063474B2 (en) | 2015-09-29 | 2018-08-28 | Keysight Technologies Singapore (Holdings) Pte Ltd | Parallel match processing of network packets to identify packet data for masking or other actions |
US20170116154A1 (en) | 2015-10-23 | 2017-04-27 | The Intellisis Corporation | Register communication in a network-on-a-chip architecture |
CN105528191B (zh) | 2015-12-01 | 2017-04-12 | 中国科学院计算技术研究所 | Data accumulation device and method, and digital signal processing device |
US10498654B2 (en) | 2015-12-28 | 2019-12-03 | Amazon Technologies, Inc. | Multi-path transport design |
US9985903B2 (en) | 2015-12-29 | 2018-05-29 | Amazon Technologies, Inc. | Reliable, out-of-order receipt of packets |
US9985904B2 (en) | 2015-12-29 | 2018-05-29 | Amazon Technologies, Inc. | Reliable, out-of-order transmission of packets |
US11044183B2 (en) | 2015-12-29 | 2021-06-22 | Xilinx, Inc. | Network interface device |
US20170192782A1 (en) | 2015-12-30 | 2017-07-06 | Robert Valentine | Systems, Apparatuses, and Methods for Aggregate Gather and Stride |
US10187400B1 (en) | 2016-02-23 | 2019-01-22 | Area 1 Security, Inc. | Packet filters in security appliances with modes and intervals |
US10521283B2 (en) | 2016-03-07 | 2019-12-31 | Mellanox Technologies, Ltd. | In-node aggregation and disaggregation of MPI alltoall and alltoallv collectives |
WO2017155545A1 (en) | 2016-03-11 | 2017-09-14 | Tektronix Texas, Llc. | Timestamping data received by monitoring system in nfv |
CN107181724B (zh) | 2016-03-11 | 2021-02-12 | 华为技术有限公司 | Coflow identification method and system, and server using the method |
KR102564165B1 (ko) | 2016-04-25 | 2023-08-04 | 삼성전자주식회사 | Method for managing input/output queues by a non-volatile memory express controller |
US20190339688A1 (en) | 2016-05-09 | 2019-11-07 | Strong Force Iot Portfolio 2016, Llc | Methods and systems for data collection, learning, and streaming of machine signals for analytics and maintenance using the industrial internet of things |
US10320952B2 (en) | 2016-05-16 | 2019-06-11 | Mellanox Technologies Tlv Ltd. | System-wide synchronized switch-over of multicast flows |
US20170344589A1 (en) | 2016-05-26 | 2017-11-30 | Hewlett Packard Enterprise Development Lp | Output vector generation from feature vectors representing data objects of a physical system |
US10748210B2 (en) | 2016-08-09 | 2020-08-18 | Chicago Mercantile Exchange Inc. | Systems and methods for coordinating processing of scheduled instructions across multiple components |
US10810484B2 (en) | 2016-08-12 | 2020-10-20 | Xilinx, Inc. | Hardware accelerator for compressed GRU on FPGA |
US10528518B2 (en) | 2016-08-21 | 2020-01-07 | Mellanox Technologies, Ltd. | Using hardware gather-scatter capabilities to optimize MPI all-to-all |
JP6820586B2 (ja) | 2016-08-31 | 2021-01-27 | 株式会社メディアリンクス | Time synchronization system |
US10977260B2 (en) | 2016-09-26 | 2021-04-13 | Splunk Inc. | Task distribution in an execution node of a distributed execution environment |
US11461334B2 (en) | 2016-09-26 | 2022-10-04 | Splunk Inc. | Data conditioning for dataset destination |
US11243963B2 (en) | 2016-09-26 | 2022-02-08 | Splunk Inc. | Distributing partial results to worker nodes from an external data system |
US10425358B2 (en) | 2016-09-29 | 2019-09-24 | International Business Machines Corporation | Network switch architecture supporting multiple simultaneous collective operations |
CN107896238B (zh) | 2016-10-04 | 2020-09-18 | 丰田自动车株式会社 | In-vehicle network system |
US10929174B2 (en) | 2016-12-15 | 2021-02-23 | Ecole Polytechnique Federale De Lausanne (Epfl) | Atomic object reads for in-memory rack-scale computing |
US10296351B1 (en) | 2017-03-15 | 2019-05-21 | Ambarella, Inc. | Computer vision processing in hardware data paths |
US10296473B2 (en) | 2017-03-24 | 2019-05-21 | Western Digital Technologies, Inc. | System and method for fast execution of in-capsule commands |
US10218642B2 (en) | 2017-03-27 | 2019-02-26 | Mellanox Technologies Tlv Ltd. | Switch arbitration based on distinct-flow counts |
US10419329B2 (en) | 2017-03-30 | 2019-09-17 | Mellanox Technologies Tlv Ltd. | Switch-based reliable multicast service |
US10108581B1 (en) | 2017-04-03 | 2018-10-23 | Google Llc | Vector reduction processor |
US20180302324A1 (en) | 2017-04-18 | 2018-10-18 | Atsushi Kasuya | Packet forwarding mechanism |
US10318306B1 (en) | 2017-05-03 | 2019-06-11 | Ambarella, Inc. | Multidimensional vectors in a coprocessor |
US10338919B2 (en) | 2017-05-08 | 2019-07-02 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
US10628236B2 (en) * | 2017-06-06 | 2020-04-21 | Huawei Technologies Canada Co., Ltd. | System and method for inter-datacenter communication |
US10367750B2 (en) | 2017-06-15 | 2019-07-30 | Mellanox Technologies, Ltd. | Transmission and reception of raw video using scalable frame rate |
US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
US11397428B2 (en) | 2017-08-02 | 2022-07-26 | Strong Force Iot Portfolio 2016, Llc | Self-organizing systems and methods for data collection |
US10693787B2 (en) | 2017-08-25 | 2020-06-23 | Intel Corporation | Throttling for bandwidth imbalanced data transfers |
US10727966B1 (en) | 2017-08-30 | 2020-07-28 | Amazon Technologies, Inc. | Time synchronization with distributed grand master |
CN110245751B (zh) | 2017-08-31 | 2020-10-09 | 中科寒武纪科技股份有限公司 | GEMM operation method and device |
CN109426574B (zh) * | 2017-08-31 | 2022-04-05 | 华为技术有限公司 | Distributed computing system, and data transmission method and device in a distributed computing system |
US10547553B2 (en) | 2017-09-17 | 2020-01-28 | Mellanox Technologies, Ltd. | Stateful connection tracking |
US10482337B2 (en) | 2017-09-29 | 2019-11-19 | Infineon Technologies Ag | Accelerating convolutional neural network computation throughput |
US10445098B2 (en) | 2017-09-30 | 2019-10-15 | Intel Corporation | Processors and methods for privileged configuration in a spatial array |
US10380063B2 (en) | 2017-09-30 | 2019-08-13 | Intel Corporation | Processors, methods, and systems with a configurable spatial accelerator having a sequencer dataflow operator |
US11694066B2 (en) | 2017-10-17 | 2023-07-04 | Xilinx, Inc. | Machine learning runtime library for neural network acceleration |
GB2569276B (en) * | 2017-10-20 | 2020-10-14 | Graphcore Ltd | Compiler method |
US10771406B2 (en) | 2017-11-11 | 2020-09-08 | Microsoft Technology Licensing, Llc | Providing and leveraging implicit signals reflecting user-to-BOT interaction |
US10887252B2 (en) | 2017-11-14 | 2021-01-05 | Mellanox Technologies, Ltd. | Efficient scatter-gather over an uplink |
US11277350B2 (en) | 2018-01-09 | 2022-03-15 | Intel Corporation | Communication of a large message using multiple network interface controllers |
US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
US10747708B2 (en) | 2018-03-08 | 2020-08-18 | Allegro Microsystems, Llc | Communication system between electronic devices |
US10621489B2 (en) | 2018-03-30 | 2020-04-14 | International Business Machines Corporation | Massively parallel neural inference computing elements |
US20190303263A1 (en) | 2018-03-30 | 2019-10-03 | Kermin E. Fleming, JR. | Apparatus, methods, and systems for integrated performance monitoring in a configurable spatial accelerator |
US20190044827A1 (en) | 2018-03-30 | 2019-02-07 | Intel Corporation | Communication of a message using a network interface controller on a subnet |
US10564980B2 (en) | 2018-04-03 | 2020-02-18 | Intel Corporation | Apparatus, methods, and systems for conditional queues in a configurable spatial accelerator |
EP3791236A4 (en) | 2018-05-07 | 2022-06-08 | Strong Force Iot Portfolio 2016, LLC | METHODS AND SYSTEMS FOR DATA COLLECTION, LEARNING AND STREAMING MACHINE SIGNALS FOR ANALYSIS AND MAINTENANCE USING THE INDUSTRIAL INTERNET OF THINGS |
US10678540B2 (en) | 2018-05-08 | 2020-06-09 | Arm Limited | Arithmetic operation with shift |
US11048509B2 (en) | 2018-06-05 | 2021-06-29 | Qualcomm Incorporated | Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices |
US11277455B2 (en) | 2018-06-07 | 2022-03-15 | Mellanox Technologies, Ltd. | Streaming system |
US10839894B2 (en) | 2018-06-29 | 2020-11-17 | Taiwan Semiconductor Manufacturing Company Ltd. | Memory computation circuit and method |
US20190044889A1 (en) | 2018-06-29 | 2019-02-07 | Intel Corporation | Coalescing small payloads |
US10754649B2 (en) | 2018-07-24 | 2020-08-25 | Apple Inc. | Computation engine that operates in matrix and vector modes |
US10915324B2 (en) | 2018-08-16 | 2021-02-09 | Tachyum Ltd. | System and method for creating and executing an instruction word for simultaneous execution of instruction operations |
US20200106828A1 (en) | 2018-10-02 | 2020-04-02 | Mellanox Technologies, Ltd. | Parallel Computation Network Device |
US11019016B2 (en) * | 2018-10-27 | 2021-05-25 | International Business Machines Corporation | Subgroup messaging within a group-based messaging interface |
US11625393B2 (en) | 2019-02-19 | 2023-04-11 | Mellanox Technologies, Ltd. | High performance computing system |
EP3699770A1 (en) | 2019-02-25 | 2020-08-26 | Mellanox Technologies TLV Ltd. | Collective communication system and methods |
US11296807B2 (en) | 2019-06-25 | 2022-04-05 | Intel Corporation | Techniques to operate a time division multiplexing(TDM) media access control (MAC) |
US11516151B2 (en) | 2019-12-31 | 2022-11-29 | Infinera Oy | Dynamically switching queueing systems for network switches |
US11750699B2 (en) | 2020-01-15 | 2023-09-05 | Mellanox Technologies, Ltd. | Small message aggregation |
US11252027B2 (en) | 2020-01-23 | 2022-02-15 | Mellanox Technologies, Ltd. | Network element supporting flexible data reduction operations |
US11271874B2 (en) | 2020-02-05 | 2022-03-08 | Mellanox Technologies, Ltd. | Network adapter with time-aware packet-processing pipeline |
US11476928B2 (en) | 2020-03-18 | 2022-10-18 | Mellanox Technologies, Ltd. | TDMA networking using commodity NIC/switch |
US20220201103A1 (en) | 2022-03-09 | 2022-06-23 | Intel Corporation | Metadata compaction in packet coalescing |
2020
- 2020-02-10 EP EP20156490.3A (EP3699770A1): Pending
- 2020-02-13 US US16/789,458 (US11196586B2): Active
- 2020-02-25 CN CN202010117006.2A (CN111614581B): Active
2021
- 2021-10-07 US US17/495,824 (US11876642B2): Active
2023
- 2023-11-19 US US18/513,565 (US20240089147A1): Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101267448A (zh) * | 2008-05-09 | 2008-09-17 | 东北大学 | Intelligent protocol conversion device and method based on the embedded QNX operating system |
US20100017420A1 (en) * | 2008-07-21 | 2010-01-21 | International Business Machines Corporation | Performing An All-To-All Data Exchange On A Plurality Of Data Buffers By Performing Swap Operations |
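CN101854556A (zh) * | 2009-03-30 | 2010-10-06 | 索尼公司 | Information processing device and method |
CN104662855A (zh) * | 2012-06-25 | 2015-05-27 | 科希尔技术股份有限公司 | Modulation and equalization in an orthogonal time-frequency shifting communication system |
CN102915031A (zh) * | 2012-10-25 | 2013-02-06 | 中国科学技术大学 | Intelligent self-calibration system for kinematic parameters of a parallel robot |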
US20150193269A1 (en) * | 2014-01-06 | 2015-07-09 | International Business Machines Corporation | Executing an all-to-allv operation on a parallel computer that includes a plurality of compute nodes |
Also Published As
Publication number | Publication date |
---|---|
US20240089147A1 (en) | 2024-03-14 |
US20220029854A1 (en) | 2022-01-27 |
US11196586B2 (en) | 2021-12-07 |
CN111614581B (zh) | 2022-07-05 |
EP3699770A1 (en) | 2020-08-26 |
US11876642B2 (en) | 2024-01-16 |
US20200274733A1 (en) | 2020-08-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111614581B (zh) | Collective communication system and method | |
Cheng et al. | Using high-bandwidth networks efficiently for fast graph computation | |
Husbands et al. | MPI-StarT: Delivering network performance to numerical applications | |
Petrini et al. | Performance evaluation of the quadrics interconnection network | |
Liu et al. | High performance RDMA-based MPI implementation over InfiniBand | |
Krishna et al. | Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication | |
Buntinas et al. | Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem | |
Chiang et al. | Multi-address encoding for multicast | |
Almási et al. | Design and implementation of message-passing services for the Blue Gene/L supercomputer | |
US20050038918A1 (en) | Method and apparatus for implementing work request lists | |
US20120151292A1 (en) | Supporting Distributed Key-Based Processes | |
Stunkel et al. | The high-speed networks of the Summit and Sierra supercomputers | |
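CN103348641A (zh) | Method and system for improved multi-cell support on a single modem board | |
JP2007249810A (ja) | Reduction processing method for a parallel computer, and parallel computer | |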
US20180052803A1 (en) | Using Hardware Gather-Scatter Capabilities to Optimize MPI All-to-All | |
Daneshtalab et al. | Low-distance path-based multicast routing algorithm for network-on-chips | |
Kumar et al. | Scaling alltoall collective on multi-core systems | |
Fei et al. | FlexNFV: Flexible network service chaining with dynamic scaling | |
Suh et al. | All-to-all personalized communication in multidimensional torus and mesh networks | |
US7929439B1 (en) | Multiple network interface core apparatus and method | |
KR20140096587A (ko) | Apparatus and method for sharing function logic between functional units, and reconfigurable processor | |
Petrini et al. | Scalable collective communication on the ASCI Q machine | |
Vishnu et al. | Topology agnostic hot‐spot avoidance with InfiniBand | |
Woodside et al. | Alternative software architectures for parallel protocol execution with synchronous IPC | |
CN102710772A (zh) | Mass data communication system based on a cloud platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20220507; Address after: Israel Yuekeni Mourinho; Applicant after: Mellanox Technologies, Ltd.; Address before: Lai ananna; Applicant before: Mellanox Technologies TLV Ltd. |
GR01 | Patent grant | ||