WO2021119072A1 - Gpu packet aggregation system - Google Patents

Gpu packet aggregation system

Info

Publication number
WO2021119072A1
Authority
WO
WIPO (PCT)
Prior art keywords
packet
input
output
input packet
gpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2020/063923
Other languages
English (en)
French (fr)
Inventor
Todd Martin
Tad Litwiller
Nishank Pathak
Mangesh P. NIJARSURE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Micro Devices Inc
Original Assignee
Advanced Micro Devices Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Micro Devices Inc
Priority to JP2022534186A (patent JP7528217B2)
Priority to EP20899498.8A (patent EP4073639B1)
Priority to KR1020227019998A (patent KR102709341B1)
Priority to CN202080085569.6A (patent CN114902181A)
Publication of WO2021119072A1
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17337Direct connection machines, e.g. completely connected computers, point to point communication networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/50Queue scheduling
    • H04L47/62Queue scheduling characterised by scheduling criteria
    • H04L47/625Queue scheduling characterised by scheduling criteria for service slots or service orders
    • H04L47/6255Queue scheduling characterised by scheduling criteria for service slots or service orders queue load conditions, e.g. longest queue first
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/90Buffering arrangements
    • H04L49/9057Arrangements for supporting packet reassembly or resequencing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/28Indexing scheme for image data processing or generation, in general involving image processing hardware
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2212/00Encapsulation of packets

Definitions

  • processors often employ multiple modules, referred to as compute units (CUs), to execute operations in parallel.
  • a processor employs a graphics processing unit (GPU) to carry out a variety of image processing or other general-purpose processing applications.
  • the GPU includes multiple CUs to execute the operations in parallel.
  • communication of data used to perform these operations impacts the overall efficiency of the processor.
  • indices for the graphics and vector processing operations are sent to the CUs via a communication fabric, such as a bus.
  • the communication traffic supporting these data transfers consumes an undesirably large portion of the communication fabric’s available bandwidth, thereby reducing overall processing efficiency at the GPU.
  • FIG. 1 is a block diagram of a graphics processing unit including hardware that automatically aggregates data from input packets in accordance with some embodiments.
  • FIG. 3 is a block diagram illustrating an example packet management component processing an example timeline of input and output packets in accordance with some embodiments.
  • FIG. 4 is a block diagram illustrating an example packet management component aggregating indices from input packets and sending the indices in an output packet in accordance with some embodiments.
  • a packet management component of a packet aggregation system of a processing unit such as a graphics processing unit (GPU) aggregates data from incoming packets in response to detecting that an output wavefront will be smaller than an output size threshold.
  • a send condition (e.g., an incoming packet indicates a context switch, or data has been stored or held at the packet management component for a particular amount of time)
  • the packet management component outputs the aggregated data as a wavefront.
  • the output conditions are difficult for software systems (e.g., drivers) to detect in a timely manner because of the number of input packets and because of a time lag due to software processing.
  • because the described systems detect output conditions at the hardware level as input packets are received, they detect output conditions more easily than a system where a software driver aggregates the data.
  • GPUs and other multithreaded processing units typically implement multiple processing elements (which are also referred to as processor cores or compute units) that concurrently execute sets of instructions or operations on multiple data sets.
  • the sets of instructions or operations are referred to as threads.
  • Operations and program data for the threads are sent to the processing elements by a command processor via communications referred to as packets.
  • packets are collections of graphics data referred to as wavefronts.
  • communication of wavefronts is hardware inefficient. For example, if a program calls for a large number of wavefronts (e.g., draws) that each have only a few indices (e.g., one or five indices), the resulting wavefronts would each inefficiently utilize communication infrastructure designed to send wavefronts that include more indices (e.g., a 32-wide infrastructure or a 256-wide communication infrastructure).
  • indices refer to values generated by a user that provide locations of vertex coordinates.
  • an incoming packet includes more data than can be communicated in a single wavefront but not enough data for a last wavefront generated based on the packet to efficiently use communication infrastructure.
  • output conditions (e.g., register state updates, pipeline flushes, or context switches)
  • GPU 100 is part of a device such as a desktop or laptop computer, server, smartphone, tablet, game console, or other electronic device.
  • the device includes a central processing unit (CPU) that sends various commands or instructions (e.g., draw commands) to GPU 100.
  • GPU 100 executes various operations. These operations include program calls that call for data to be processed.
  • command processor 102 sends various input packets 130 to packet management component 104 where input packets indicate various sets of commands from GPU 100 based on the program calls.
  • input packets 130 include various types of data including draw indices or indications of send conditions (e.g., indications of events that would cause packet management component 104 to output data to the one or more compute units).
  • input packets 130 are sent sequentially over a period of time. Packet management component 104 generates output packet 132 based on data from input packets 130.
  • In response to an output packet being smaller than an output size threshold (e.g., because an input packet includes less data than an amount used to generate a packet of the output size threshold or because an input packet includes enough data that multiple output packets are generated and a last output packet would be smaller than the output size threshold), packet management component 104 holds and aggregates data corresponding to one or more input packets of input packets 130 and outputs the aggregated data as output packet 132 to one or more compute units such as compute unit 106. In some embodiments, output packet 132 is sent to each of the plurality of compute units or different output packets are sent to respective compute units.
  • the aggregated output packet is sent as if it were the first received input packet included in the aggregated output packet (e.g., including various headers and other data corresponding to the first received input packet). In other embodiments, the aggregated output packet is sent as if it were a different received input packet, the aggregated output packet is indicative of multiple received input packets, or the aggregated output packet is indicative of none of the received input packets.
  • packet management component 104 analyzes the received input packets. In response to detecting, based on send condition detection component 120, that an input packet does not indicate a send condition, packet management component 104, using packet aggregation component 112, aggregates data corresponding to the input packet in packet buffer 110. For example, packet aggregation component 112 aggregates data corresponding to an incoming input packet with previously stored data in packet buffer 110. In some embodiments, the data is the entire input packet. In other embodiments, the data is a portion of the input packet, data indicated by the input packet (e.g., data generated as a result of one or more computations indicated by the input packet), or both.
  • In response to detecting, using send condition detection component 120, that an input packet indicates a send condition, packet management component 104 sends the aggregated data to one or more compute units such as compute unit 106 in output packet 132. Accordingly, fewer output packets 132 are sent to compute unit 106, as compared to a system where an output packet is sent for each input packet.
  • input packets 130 are indices of draw commands and output packet 132 is a wavefront including indices corresponding to multiple input packets of input packets 130.
  • send condition detection component 120 only determines whether the input packet indicates a send condition in response to detecting that the output packet would be smaller than an output size threshold. In other embodiments, send condition detection component 120 detects various send conditions in parallel with detecting whether the output packet would be smaller than the output size threshold.
  • output conditions include send conditions (e.g., conditions indicated by incoming packets), timeout conditions, and size conditions.
  • output condition detection component 114 includes various hardware such as buffers and read enable logic to detect various output conditions. In the illustrated embodiment, some output conditions are send conditions indicated by an input packet of input packets 130.
  • state information of an input packet indicates a register state update (e.g., a packet specifying a draw topology, controlling a distribution of a draw workload, or specifying a number of bits of an index type) or an event (e.g., a pipeline flush (a process where instructions in a pipeline are removed, for example, due to an incorrect branch prediction) or a context switch (a switch between two applications, tasks, or programs)).
  • output conditions include changing a draw source (e.g., from direct memory access to auto index or vice versa), changing virtual reality control fields, or changing an index size between draws.
  • other output conditions including those detected by various other means, are also contemplated.
  • an output condition includes timeout detection component 122 indicating that a packet storage timer of timeout detection component 122 exceeds a timeout threshold.
  • the packet storage timer tracks an amount of time at least some data has been stored at packet buffer 110 (e.g., the data stored the longest).
  • timeout detection component 122 indicates an output condition.
  • the timeout threshold is user-specified. In other cases, the timeout threshold is specified by another entity such as an application running on GPU 100.
  • an output condition includes determining that an amount of the aggregated data stored at packet buffer 110 exceeds an output size threshold.
  • the output size threshold is user specified.
  • the output size threshold corresponds to a size of a communication infrastructure used to send output packet 132 to compute unit 106. To illustrate, if the communication infrastructure is 32-wide, then detecting that packet buffer 110 stores more than 31 indices causes output condition detection component 114 to indicate that an output condition is satisfied.
  • packet aggregation component 112 causes packet buffer 110 to store the data of input packets 130 separated by respective delimiters.
  • aggregating the data of input packets 130 includes updating a header file stored at packet buffer 110 to indicate addresses corresponding to respective input packets of input packets 130.
  • input packets 130 in their entirety are stored or otherwise held at packet buffer 110. In other embodiments, only a portion of input packets 130 are stored or otherwise held at packet buffer 110.
  • a system where packets (e.g., draw indices) are automatically aggregated (e.g., without specific software instructions with regard to the packets) by hardware components.
  • the system aggregates the packets without software management.
  • the system detects various output conditions (e.g., register state updates and events) and sends aggregated packets in response to the output conditions.
  • FIG. 2 is a flow diagram illustrating a method 200 of aggregating data from input packets in accordance with some embodiments.
  • the method 200 is implemented, in some embodiments, by packet management component 104 of GPU 100 of FIG. 1.
  • method 200 is initiated by one or more processors in response to one or more instructions stored by a computer-readable storage medium.
  • method 200 includes receiving an input packet from a command processor.
  • packet management component 104 receives input packet 130 from command processor 102.
  • method 200 includes determining whether the input packet indicates a send condition. For example, in some cases, packet management component 104 determines whether the received input packet 130 indicates (e.g., via state information) a send condition (e.g., a register state update or an event). In response to determining that the input packet indicates a send condition, method 200 proceeds to 216. In response to determining that the input packet does not indicate a send condition, method 200 proceeds to 206.
  • in response to determining that the input packet does not indicate a send condition, method 200 includes determining whether an output packet is open. For example, in some cases, packet management component 104 determines whether packet buffer 110 includes an open output packet. In response to determining that an output packet is open, method 200 proceeds to 210. In response to determining that no output packet is open, method 200 proceeds to 208.
  • method 200 includes creating a new output packet. For example, in some cases, packet management component 104 creates a new output packet in packet buffer 110.
  • in response to determining that an output packet is open or subsequent to creating the new output packet, method 200 includes adding contents of the input packet to the output packet. For example, in some cases, packet management component 104 aggregates, in packet buffer 110, data corresponding to input packet 130 with data corresponding to one or more previously stored or otherwise held input packets. As another example, in some cases, packet management component 104 adds data corresponding to input packet 130 to the newly created output packet in packet buffer 110.
  • method 200 includes determining whether a timeout condition is satisfied. For example, in some cases, timeout detection component 122 checks a timeout storage tracker that indicates an amount of time at least a portion of the output packet has been stored or otherwise held at packet buffer 110. In response to the timeout storage tracker exceeding a timeout threshold, timeout detection component 122 determines that a timeout condition is satisfied. In response to determining that the timeout condition is satisfied, method 200 proceeds to 216. In response to the timeout storage tracker failing to exceed the timeout threshold, timeout detection component 122 determines that the timeout condition is not satisfied.
  • in response to determining that the timeout condition is not satisfied, method 200 proceeds to 214.
  • 212 further includes determining whether a size of the output packet exceeds an output size threshold, and, in response to determining that the size of the output packet exceeds the output size threshold, proceeding to 216. In some embodiments, determining that the size of the output packet exceeds the output size threshold and proceeding to 216 is performed additionally or alternatively in other portions of method 200 including, for example, between 202 and 204.
  • method 200 includes determining whether an incoming input packet is indicated. For example, in some cases, packet management component 104 determines whether command processor 102 is sending an input packet. In response to detecting an incoming input packet, method 200 proceeds to 202. In response to failing to detect an input packet, method 200 proceeds to 212.
  • in response to determining that the input packet indicates a send condition or in response to determining that the timeout condition is satisfied, method 200 includes sending the output packet to a compute unit. For example, in some cases, in response to input packet 130 indicating a send condition, packet management component 104 closes the output packet and sends the output packet to compute unit 106 as output packet 132. As another example, in some cases, in response to timeout detection component 122 detecting that a timeout condition is satisfied, packet management component 104 closes the output packet and sends the output packet to compute unit 106 as output packet 132.
  • subsequent to sending the output packet to compute unit 106, method 200 includes performing a send condition if it is indicated (e.g., at 204). For example, in response to input packet 130 indicating a send condition, packet management component 104 sends output packet 132 to compute unit 106 and then performs the indicated send condition. Accordingly, a method of aggregating data from input packets is depicted.
  • FIG. 3 is a block diagram depicting a timeline 300 that illustrates an example packet management component processing input and output packets in accordance with some embodiments.
  • input packets 302-312 and 316 are received at a packet management component (e.g., packet management component 104). Further, event 314 is detected at the packet management component.
  • Input packet 308 indicates a context switch (a send condition).
  • input packet 308 indicates that input packets 302-306 correspond to a different context than subsequently received input packets 310 and 312. Accordingly, in response to detecting the send condition, the output packet including the draw data indicated by input packets 302-306 is sent and then the context switch is performed.
  • a new output packet is created and draw data (draw4) indicated by input packet 310 is added to the output packet.
  • draw data (draw5) indicated by input packet 312 is added to the output packet.
  • a timeout detection component detects that a packet storage timer indicates that at least a portion of the data in the output packet (e.g., the draw data indicated by input packet 310) has been stored for longer than a timeout threshold. Accordingly, at event 314, a timeout condition is satisfied and the output packet including the draw data indicated by input packets 310 and 312 is sent.
  • a new output packet is created and draw data (draw6) indicated by input packet 316 is added to the output packet. Accordingly, an example timeline 300 of input and output packets is illustrated.
  • FIG. 4 is a block diagram illustrating an example GPU 400 that includes packet management component 104, which includes packet buffer 110 in accordance with some embodiments.
  • packet management component 104 aggregates indices 410-414 from input packets 402-406 in packet buffer 110. As a result, indices 410-414 are aggregated and stored together in packet buffer 110.
  • in response to a register state update indication 416 from input packet 408, packet management component 104 sends indices 410-414 in an output packet 420.
  • input packets 402-408 correspond to input packets 130 of FIG. 1 and output packet 420 corresponds to output packet 132 of FIG. 1.
  • a method includes: receiving, by a packet management component from a command processor of a graphics processing unit (GPU), a first input packet indicating a first set of commands; in response to determining that the first input packet does not indicate a send condition, automatically aggregating data corresponding to the first input packet with previously received packet data stored at a packet buffer of the packet management component.
  • the method includes receiving a second input packet indicating a second set of commands received from the GPU; in response to determining that the second input packet indicates a send condition, sending the aggregated data to a compute unit in an output packet; and performing an operation indicated by the send condition.
  • the first input packet includes a first plurality of draw indices
  • the previously received packet data includes a second plurality of draw indices
  • the aggregated data includes the first plurality of draw indices and the second plurality of draw indices.
  • the output packet is a wavefront including a set of operations to be performed by the compute unit of the GPU.
  • the second input packet indicates at least one of a register state update, a context switch, or a pipeline flush.
  • the method includes: subsequent to performing the operation, receiving a third input packet indicating a third set of commands received from the GPU; storing data corresponding to the third input packet at the packet buffer; and in response to detecting that a timeout condition has been satisfied, sending the third input packet to the compute unit in a second output packet.
  • the method includes: subsequent to performing the operation, receiving a third input packet indicating a third set of commands received from the GPU; storing data corresponding to the third input packet at the packet buffer; and in response to detecting that an amount of second aggregated data stored at the packet buffer exceeds an output size threshold, sending the third input packet to the compute unit in a second output packet.
  • the output size threshold is user programmable.
  • a graphics processing unit includes: a command processor configured to send input packets indicating commands received from the GPU; a packet management component, including: a packet buffer configured to store data corresponding to the input packets received from the command processor; a packet aggregation component configured to: identify state information of an incoming first input packet; in response to the state information indicating an aggregation condition, aggregate data corresponding to the first input packet with data corresponding to a second input packet stored at the packet buffer; and in response to the state information indicating a send condition, send an output packet for processing by a compute unit, wherein the output packet includes aggregated data stored at the packet buffer.
  • the packet aggregation component comprises a timeout detection component configured to cause the output packet to be sent in response to an amount of time at least a portion of the data corresponding to the second input packet has been stored exceeding a timeout threshold.
  • the timeout threshold is user-specified.
  • the output packet is a wavefront.
  • the aggregated data includes a portion of the first input packet and a portion of the second input packet.
  • the aggregated data includes the first input packet and the second input packet.
  • a method includes: receiving, by a packet management component from a command processor, a first input packet indicating a first set of commands received from a graphics processing unit (GPU); storing data corresponding to the first input packet at a packet buffer of the packet management component; receiving a second input packet indicating a second set of commands received from the GPU; in response to determining that an output condition has not been satisfied, automatically aggregating data corresponding to the second input packet with the data corresponding to the first input packet; and in response to determining that an output condition has been satisfied, sending the aggregated data to one or more compute units in one or more output packets.
  • determining that the output condition has been satisfied is performed in response to determining that an amount of the aggregated data stored at the packet buffer exceeds an output size threshold.
  • determining that the output condition has been satisfied comprises determining that a third input packet indicates a send condition.
  • the method includes: in response to receiving the first input packet, starting, at a timeout detection component of the packet management component, a packet storage timer.
  • determining that the output condition has been satisfied comprises determining that the packet storage timer exceeds a timeout threshold.
  • the timeout threshold is user-specified.
  • a computer readable storage medium includes any non-transitory storage medium, or combination of non-transitory storage media, accessible by a computer system during use to provide instructions and/or data to the computer system.
  • such storage media includes, but is not limited to, optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-Ray disc), magnetic media (e.g., floppy disc, magnetic tape, or magnetic hard drive), volatile memory (e.g., random access memory (RAM) or cache), non-volatile memory (e.g., read-only memory (ROM) or Flash memory), or microelectromechanical systems (MEMS)-based storage media.
  • the computer readable storage medium is embedded in the computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB)-based Flash memory), or coupled to the computer system via a wired or wireless network (e.g., network accessible storage (NAS)).
  • certain aspects of the techniques described above are implemented by one or more processors of a processing system executing software.
  • the software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium.
  • the software includes the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above.
  • the non-transitory computer readable storage medium includes a magnetic or optical disk storage device, solid state storage devices such as Flash memory, a cache, random access memory (RAM) or other non-volatile memory device or devices, and the like.
  • the executable instructions stored on the non-transitory computer readable storage medium are in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executable by one or more processors.
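
The aggregation flow of method 200 (FIG. 2) and the timeline of FIG. 3 described above can be approximated in software as a rough sketch. The patent describes hardware, so everything here is an illustrative assumption rather than the disclosed implementation: the class and field names (`InputPacket`, `PacketManager`, `receive`, `tick`), the integer "tick" clock standing in for the packet storage timer, and the choice to send the whole buffer as one output packet without splitting oversized data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InputPacket:
    """Hypothetical input packet: draw indices and/or a send condition."""
    indices: List[int] = field(default_factory=list)
    # Illustrative labels, e.g. "context_switch" or "register_state_update".
    send_condition: Optional[str] = None


class PacketManager:
    """Toy software model of the packet management component (assumed names)."""

    def __init__(self, output_size_threshold: int = 31, timeout_threshold: int = 4):
        self.output_size_threshold = output_size_threshold
        self.timeout_threshold = timeout_threshold
        self.buffer: Optional[List[int]] = None  # open output packet, if any
        self.opened_at = 0                       # tick when the oldest data was stored
        self.sent: List[List[int]] = []          # output packets sent to compute units

    def _send(self) -> None:
        # Close the open output packet (if it holds data) and send it.
        if self.buffer:
            self.sent.append(self.buffer)
        self.buffer = None

    def receive(self, packet: InputPacket, now: int) -> None:
        # Send condition: flush aggregated data first, then the indicated
        # operation (e.g. the context switch) would be performed.
        if packet.send_condition is not None:
            self._send()
            return
        # No open output packet: create one and start the storage timer.
        if self.buffer is None:
            self.buffer = []
            self.opened_at = now
        # Aggregate the input packet's data with previously held data.
        self.buffer.extend(packet.indices)
        # Size condition: more than 31 indices on a 32-wide infrastructure.
        if len(self.buffer) > self.output_size_threshold:
            self._send()

    def tick(self, now: int) -> None:
        # Timeout condition: the oldest data has been held longer than the threshold.
        if self.buffer is not None and now - self.opened_at > self.timeout_threshold:
            self._send()


# Replay of the FIG. 3 timeline with made-up tick values.
pm = PacketManager(output_size_threshold=31, timeout_threshold=4)
pm.receive(InputPacket([1]), now=0)                               # draw1 (302)
pm.receive(InputPacket([2]), now=1)                               # draw2 (304)
pm.receive(InputPacket([3]), now=2)                               # draw3 (306)
pm.receive(InputPacket(send_condition="context_switch"), now=3)   # 308
pm.receive(InputPacket([4]), now=4)                               # draw4 (310)
pm.receive(InputPacket([5]), now=5)                               # draw5 (312)
pm.tick(now=9)                                                    # timeout (314)
pm.receive(InputPacket([6]), now=10)                              # draw6 (316)
print(pm.sent)    # [[1, 2, 3], [4, 5]]
print(pm.buffer)  # [6]
```

In this replay, six single-index draws produce two output packets plus one still-open packet, instead of six separate output packets, which is the bandwidth saving the description attributes to aggregation.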

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
PCT/US2020/063923 2019-12-13 2020-12-09 Gpu packet aggregation system Ceased WO2021119072A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2022534186A JP7528217B2 (ja) 2019-12-13 2020-12-09 Gpuパケット集約システム
EP20899498.8A EP4073639B1 (en) 2019-12-13 2020-12-09 Gpu packet aggregation system
KR1020227019998A KR102709341B1 (ko) 2019-12-13 2020-12-09 Gpu 패킷 집계 시스템
CN202080085569.6A CN114902181A (zh) 2019-12-13 2020-12-09 Gpu包聚合系统

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/713,472 2019-12-13
US16/713,472 US11210757B2 (en) 2019-12-13 2019-12-13 GPU packet aggregation system

Publications (1)

Publication Number Publication Date
WO2021119072A1 (en) 2021-06-17

Family

ID=76316977

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/063923 Ceased WO2021119072A1 (en) 2019-12-13 2020-12-09 Gpu packet aggregation system

Country Status (6)

Country Link
US (1) US11210757B2
EP (1) EP4073639B1
JP (1) JP7528217B2
KR (1) KR102709341B1
CN (1) CN114902181A
WO (1) WO2021119072A1

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12106112B2 (en) * 2020-12-03 2024-10-01 Intel Corporation Methods and apparatus to generate graphics processing unit long instruction traces
CN113626369B (zh) * 2021-08-14 2023-05-26 苏州浪潮智能科技有限公司 Method, apparatus, device, and readable medium for ring communication in a multi-node cluster

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7839876B1 (en) * 2006-01-25 2010-11-23 Marvell International Ltd. Packet aggregation
US20160283416A1 (en) * 2015-03-23 2016-09-29 Samsung Electronics Co., Ltd. Bus interface device, semiconductor integrated circuit device including the same, and method of operating the same
US20160352598A1 (en) * 2015-05-29 2016-12-01 Advanced Micro Devices, Inc. Message aggregation, combining and compression for efficient data communications in gpu-based clusters
US20160379336A1 (en) * 2015-04-01 2016-12-29 Mediatek Inc. Methods of a graphics-processing unit for tile-based rendering of a display area and graphics-processing apparatus
US20170123866A1 (en) * 2008-05-15 2017-05-04 Ip Reservoir, Llc Method and System for Accelerated Stream Processing

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100628619B1 (ko) * 2000-07-10 2006-09-26 마쯔시다덴기산교 가부시키가이샤 Multiple decode apparatus and method
WO2005053216A2 (en) * 2003-11-25 2005-06-09 Dg2L Technologies Methods and systems for reliable distribution of media over a network
US7209139B1 (en) * 2005-01-07 2007-04-24 Electronic Arts Efficient rendering of similar objects in a three-dimensional graphics engine
CN101471826B (zh) * 2007-12-27 2012-12-12 华为技术有限公司 Method and apparatus for testing a command-line interface
JP2010055214A (ja) 2008-08-26 2010-03-11 Sanyo Electric Co Ltd Data processing device
EP2596470A1 (en) * 2010-07-19 2013-05-29 Advanced Micro Devices, Inc. Data processing using on-chip memory in multiple processing units
CN102323917B (zh) * 2011-09-06 2013-05-15 中国人民解放军国防科学技术大学 Method for multi-process GPU sharing based on shared memory
US20130155077A1 (en) * 2011-12-14 2013-06-20 Advanced Micro Devices, Inc. Policies for Shader Resource Allocation in a Shader Core
US20130162661A1 (en) * 2011-12-21 2013-06-27 Nvidia Corporation System and method for long running compute using buffers as timeslices
US9509616B1 (en) * 2014-11-24 2016-11-29 Amazon Technologies, Inc. Congestion sensitive path-balancing
US20170300361A1 (en) * 2016-04-15 2017-10-19 Intel Corporation Employing out of order queues for better gpu utilization
JP7100624B2 (ja) * 2016-08-29 2022-07-13 アドバンスト・マイクロ・ディバイシズ・インコーポレイテッド Hybrid rendering with binning and sorting of preferred primitive batches
US10572258B2 (en) * 2017-04-01 2020-02-25 Intel Corporation Transitionary pre-emption for virtual reality related contexts
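CN110223216B (zh) * 2019-06-11 2023-01-17 西安芯瞳半导体技术有限公司 Data processing method and apparatus based on parallel PLB, and computer storage medium
CN110415161B (zh) * 2019-07-19 2023-06-27 龙芯中科(合肥)技术有限公司 Graphics processing method, apparatus, device, and storage medium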

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4073639A4 *

Also Published As

Publication number Publication date
JP7528217B2 (ja) 2024-08-05
EP4073639A4 (en) 2024-01-10
US20210183004A1 (en) 2021-06-17
CN114902181A (zh) 2022-08-12
EP4073639A1 (en) 2022-10-19
JP2023505783A (ja) 2023-02-13
US11210757B2 (en) 2021-12-28
KR102709341B1 (ko) 2024-09-25
KR20220113710A (ko) 2022-08-16
EP4073639B1 (en) 2026-03-04

Similar Documents

Publication Title
US10198377B2 (en) Virtual machine state replication using DMA write records
US9413683B2 (en) Managing resources in a distributed system using dynamic clusters
US9146682B2 (en) Method and apparatus for storing data
WO2019226355A1 (en) Embedded scheduling of hardware resources for hardware acceleration
CN103019962A (zh) Data cache processing method, apparatus, and system
US11831410B2 (en) Intelligent serverless function scaling
US10289418B2 (en) Cooperative thread array granularity context switch during trap handling
US11210757B2 (en) GPU packet aggregation system
CN109753338B (zh) Method and apparatus for detecting virtual GPU utilization
US10754783B2 (en) Techniques to manage cache resource allocations for a processor cache
US20170010914A1 (en) Cooperative thread array granularity context switch during trap handling
KR20230025464A (ko) Core selection based on usage policy and core constraints
US11977907B2 (en) Hybrid push and pull event source broker for serverless function scaling
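CN108139938A (zh) Apparatus, method, and computer program for utilizing secondary threads to assist primary threads in performing application tasks
CN112783652A (zh) Method, apparatus, device, and storage medium for acquiring the running state of a current task
KR102407781B1 (ko) Graphics context scheduling based on flip queue management
CN104331322B (zh) Process migration method and apparatus
CN110515729A (zh) Graph computing node vector load balancing method and apparatus based on a graphics processor
CN102947803B (zh) Method, system, and processor for counting the number of instruction executions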
US20240160451A1 (en) Dynamic thread count optimizations
US11023274B2 (en) Method and system for processing data
WO2024102236A1 (en) Dynamic thread count optimizations

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20899498; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2022534186; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2020899498; Country of ref document: EP; Effective date: 20220713)
WWG WIPO information: grant in national office (Ref document number: 2020899498; Country of ref document: EP)