CN109690512B - GPU remote communication with triggered operations - Google Patents

GPU remote communication with triggered operations

Info

Publication number
CN109690512B
Authority
CN
China
Prior art keywords
command
gpu
network
data
generated network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201780056487.7A
Other languages
English (en)
Chinese (zh)
Other versions
CN109690512A (zh)
Inventor
Michael W. LeBeane
Steven K. Reinhardt
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Micro Devices Inc
Original Assignee
Advanced Micro Devices Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Micro Devices Inc
Publication of CN109690512A
Application granted
Publication of CN109690512B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306 Intercommunication techniques
    • G06F15/17331 Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/9063 Intermediate storage in different physical parts of a node or terminal
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/382 Information transfer, e.g. on bus using universal interface adapter
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer And Data Communications (AREA)
  • Information Transfer Systems (AREA)
  • Bus Control (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
CN201780056487.7A 2016-10-18 2017-09-19 GPU remote communication with triggered operations Active CN109690512B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US15/297,079 US10936533B2 (en) 2016-10-18 2016-10-18 GPU remote communication with triggered operations
US15/297,079 2016-10-18
PCT/US2017/052250 WO2018075182A1 (en) 2016-10-18 2017-09-19 GPU remote communication with triggered operations

Publications (2)

Publication Number Publication Date
CN109690512A (zh) 2019-04-26
CN109690512B (zh) 2023-07-18

Family

ID=61904564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780056487.7A Active CN109690512B (zh) 2016-10-18 2017-09-19 GPU remote communication with triggered operations

Country Status (6)

Country Link
US (1) US10936533B2 (en)
EP (1) EP3529706B1 (en)
JP (1) JP6961686B2 (en)
KR (1) KR102245247B1 (en)
CN (1) CN109690512B (en)
WO (1) WO2018075182A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10534606B2 (en) 2011-12-08 2020-01-14 Oracle International Corporation Run-length encoding decompression
US11113054B2 (en) 2013-09-10 2021-09-07 Oracle International Corporation Efficient hardware instructions for single instruction multiple data processors: fast fixed-length value compression
US10599488B2 (en) 2016-06-29 2020-03-24 Oracle International Corporation Multi-purpose events for notification and sequence control in multi-core processor systems
US10380058B2 (en) 2016-09-06 2019-08-13 Oracle International Corporation Processor core to coprocessor interface with FIFO semantics
US10783102B2 (en) 2016-10-11 2020-09-22 Oracle International Corporation Dynamically configurable high performance database-aware hash engine
US10459859B2 (en) 2016-11-28 2019-10-29 Oracle International Corporation Multicast copy ring for database direct memory access filtering engine
US10725947B2 (en) 2016-11-29 2020-07-28 Oracle International Corporation Bit vector gather row count calculation and handling in direct memory access engine
US20190044809A1 (en) * 2017-08-30 2019-02-07 Intel Corporation Technologies for managing a flexible host interface of a network interface controller
US11429413B2 (en) * 2018-03-30 2022-08-30 Intel Corporation Method and apparatus to manage counter sets in a network interface controller
US10740163B2 (en) * 2018-06-28 2020-08-11 Advanced Micro Devices, Inc. Network packet templating for GPU-initiated communication
US10795840B2 (en) 2018-11-12 2020-10-06 At&T Intellectual Property I, L.P. Persistent kernel for graphics processing unit direct memory access network packet processing
US12267229B2 (en) * 2019-05-23 2025-04-01 Hewlett Packard Enterprise Development Lp System and method for facilitating data-driven intelligent network with endpoint congestion detection and control
US11182221B1 (en) * 2020-12-18 2021-11-23 SambaNova Systems, Inc. Inter-node buffer-based streaming for reconfigurable processor-as-a-service (RPaaS)
US11665113B2 (en) * 2021-07-28 2023-05-30 Hewlett Packard Enterprise Development Lp System and method for facilitating dynamic triggered operation management in a network interface controller (NIC)
US11960813B2 (en) 2021-08-02 2024-04-16 Advanced Micro Devices, Inc. Automatic redistribution layer via generation
US12418906B2 (en) 2022-02-27 2025-09-16 Nvidia Corporation System and method for GPU-initiated communication
US20230276301A1 (en) * 2022-02-27 2023-08-31 Nvidia Corporation System and method for gpu-initiated communication
US12229057B2 (en) 2023-01-19 2025-02-18 SambaNova Systems, Inc. Method and apparatus for selecting data access method in a heterogeneous processing system with multiple processors
US12210468B2 (en) 2023-01-19 2025-01-28 SambaNova Systems, Inc. Data transfer between accessible memories of multiple processors incorporated in coarse-grained reconfigurable (CGR) architecture within heterogeneous processing system using one memory to memory transfer operation
US12380041B2 (en) 2023-01-19 2025-08-05 SambaNova Systems, Inc. Method and apparatus for data transfer between accessible memories of multiple processors in a heterogeneous processing system using two memory to memory transfer operations
US20250254137A1 (en) * 2024-02-05 2025-08-07 Mellanox Technologies, Ltd. Low latency communication channel over a communications bus using a host channel adapter

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
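JP2007316859A (ja) * 2006-05-24 2007-12-06 Sony Computer Entertainment Inc Multi-graphics processor system, graphics processor and data transfer method
CN101539902A (zh) * 2009-05-05 2009-09-23 Institute of Computing Technology, Chinese Academy of Sciences DMA device and communication method for nodes in a multi-computer system
US8131814B1 (en) * 2008-07-11 2012-03-06 Hewlett-Packard Development Company, L.P. Dynamic pinning remote direct memory access
CN102804156A (zh) * 2009-06-17 2012-11-28 Advanced Micro Devices Inc Parallel training of dynamic random access memory channel controllers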

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5278956A (en) 1990-01-22 1994-01-11 Vlsi Technology, Inc. Variable sized FIFO memory and programmable trigger level therefor for use in a UART or the like
US8766993B1 (en) 2005-04-06 2014-07-01 Teradici Corporation Methods and apparatus for enabling multiple remote displays
US8269782B2 (en) 2006-11-10 2012-09-18 Sony Computer Entertainment Inc. Graphics processing apparatus
US20100013839A1 (en) * 2008-07-21 2010-01-21 Rawson Andrew R Integrated GPU, NIC and Compression Hardware for Hosted Graphics
US9645866B2 (en) 2010-09-20 2017-05-09 Qualcomm Incorporated Inter-processor communication techniques in a multiple-processor computing platform
US8902228B2 (en) * 2011-09-19 2014-12-02 Qualcomm Incorporated Optimizing resolve performance with tiling graphics architectures
US9830288B2 (en) 2011-12-19 2017-11-28 Nvidia Corporation System and method for transmitting graphics rendered on a primary computer to a secondary computer
CN104025065B (zh) * 2011-12-21 2018-04-06 Intel Corporation Apparatus and method for memory-hierarchy aware producer-consumer instructions
US9171348B2 (en) * 2012-01-23 2015-10-27 Google Inc. Rendering content on computing systems
ITRM20120094A1 (it) * 2012-03-14 2013-09-14 Istituto Naz Di Fisica Nucleare Network interface card for a GPU parallel computing network node, and related inter-node communication method
US9602437B1 (en) * 2012-10-03 2017-03-21 Tracey M. Bernath System and method for accelerating network applications using an enhanced network interface and massively parallel distributed processing
US9582402B2 (en) 2013-05-01 2017-02-28 Advanced Micro Devices, Inc. Remote task queuing by networked computing devices
US10134102B2 (en) * 2013-06-10 2018-11-20 Sony Interactive Entertainment Inc. Graphics processing hardware for using compute shaders as front end for vertex shaders
WO2015130282A1 (en) * 2014-02-27 2015-09-03 Hewlett-Packard Development Company, L. P. Communication between integrated graphics processing units
US10218645B2 (en) * 2014-04-08 2019-02-26 Mellanox Technologies, Ltd. Low-latency processing in a network node
US10331595B2 (en) * 2014-10-23 2019-06-25 Mellanox Technologies, Ltd. Collaborative hardware interaction by multiple entities using a shared queue
US9582463B2 (en) 2014-12-09 2017-02-28 Intel Corporation Heterogeneous input/output (I/O) using remote direct memory access (RDMA) and active message
US9779466B2 (en) * 2015-05-07 2017-10-03 Microsoft Technology Licensing, Llc GPU operation
US10248610B2 (en) * 2015-06-23 2019-04-02 Mellanox Technologies, Ltd. Enforcing transaction order in peer-to-peer interactions
US10445850B2 (en) * 2015-08-26 2019-10-15 Intel Corporation Technologies for offloading network packet processing to a GPU
US10210593B2 (en) * 2016-01-28 2019-02-19 Qualcomm Incorporated Adaptive context switching
US10331590B2 (en) * 2016-06-30 2019-06-25 Intel Corporation Graphics processing unit (GPU) as a programmable packet transfer mechanism
US10410313B2 (en) * 2016-08-05 2019-09-10 Qualcomm Incorporated Dynamic foveation adjustment

Also Published As

Publication number Publication date
EP3529706A1 (en) 2019-08-28
US10936533B2 (en) 2021-03-02
WO2018075182A1 (en) 2018-04-26
JP6961686B2 (ja) 2021-11-05
KR20190058483A (ko) 2019-05-29
KR102245247B1 (ko) 2021-04-27
CN109690512A (zh) 2019-04-26
EP3529706B1 (en) 2023-03-22
EP3529706A4 (en) 2020-03-25
US20180107627A1 (en) 2018-04-19
JP2019532427A (ja) 2019-11-07

Similar Documents

Publication Publication Date Title
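CN109690512B (zh) GPU remote communication with triggered operations
CN102906726B (zh) Co-processing acceleration method, apparatus and system
CN107077441B (zh) Method and apparatus for providing heterogeneous I/O using RDMA and active messages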
US9582402B2 (en) Remote task queuing by networked computing devices
US20210294292A1 (en) Method and apparatus for remote field programmable gate array processing
US9881680B2 (en) Multi-host power controller (MHPC) of a flash-memory-based storage device
US10534737B2 (en) Accelerating distributed stream processing
CN115934625B (zh) Doorbell-ringing method, device and medium for remote direct memory access
US20250060912A1 (en) Method of submitting work to fabric attached memory
US10769092B2 (en) Apparatus and method for reducing latency of input/output transactions in an information handling system using no-response commands
US10284501B2 (en) Technologies for multi-core wireless network data transmission
US10951537B1 (en) Adjustable receive queue for processing packets in a network device
JP2008547139A (ja) Method, apparatus and system for a posted write buffer of a memory having a unidirectional full-duplex interface
US12468575B2 (en) System and method for scheduling resources of distributed systems to perform workloads
US12468542B2 (en) State management with distributed control plane
US9880748B2 (en) Bifurcated memory management for memory elements
CN115297169B (zh) Data processing method, apparatus, electronic device and medium
US12001370B2 (en) Multi-node memory address space for PCIe devices
US9111039B2 (en) Limiting bandwidth for write transactions across networks of components in computer systems
US11102150B2 (en) Communication apparatus and control method for communication apparatus
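CN119473958A (zh) Interrupt control device and method, storage medium and electronic device
TW202236103A (zh) Enabling peripheral device messaging via application portals in processor-based devices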

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant