WO2021119072A1 - Gpu packet aggregation system - Google Patents
- Publication number
- WO2021119072A1 (PCT/US2020/063923)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- packet
- input
- output
- input packet
- gpu
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17337—Direct connection machines, e.g. completely connected computers, point to point communication networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/50—Queue scheduling
- H04L47/62—Queue scheduling characterised by scheduling criteria
- H04L47/625—Queue scheduling characterised by scheduling criteria for service slots or service orders
- H04L47/6255—Queue scheduling characterised by scheduling criteria for service slots or service orders queue load conditions, e.g. longest queue first
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/90—Buffering arrangements
- H04L49/9057—Arrangements for supporting packet reassembly or resequencing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/28—Indexing scheme for image data processing or generation, in general involving image processing hardware
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2212/00—Encapsulation of packets
Definitions
- processors often employ multiple modules, referred to as compute units (CUs), to execute operations in parallel.
- a processor employs a graphics processing unit (GPU) to carry out a variety of image processing or other general-purpose processing applications.
- the GPU includes multiple CUs to execute the operations in parallel.
- communication of data used to perform these operations impacts the overall efficiency of the processor.
- indices for the graphics and vector processing operations are sent to the CUs via a communication fabric, such as a bus.
- the communication traffic supporting these data transfers consumes an undesirably large portion of the communication fabric’s available bandwidth, thereby reducing overall processing efficiency at the GPU.
- FIG. 1 is a block diagram of a graphics processing unit including hardware that automatically aggregates data from input packets in accordance with some embodiments.
- FIG. 3 is a block diagram illustrating an example packet management component processing an example timeline of input and output packets in accordance with some embodiments.
- FIG. 4 is a block diagram illustrating an example packet management component aggregating indices from input packets and sending the indices in an output packet in accordance with some embodiments.
- a packet management component of a packet aggregation system of a processing unit such as a graphics processing unit (GPU) aggregates data from incoming packets in response to detecting that an output wavefront will be smaller than an output size threshold.
- in response to a send condition (e.g., an incoming packet indicates a context switch, or data has been stored or held at the packet management component for a particular amount of time), the packet management component outputs the aggregated data as a wavefront.
- the output conditions are difficult for software systems (e.g., drivers) to detect in a timely manner because of the number of input packets and because of a time lag due to software processing.
- in the described systems, output conditions are detected at the hardware level as input packets are received, so the system detects output conditions more easily than a system in which a software driver aggregates the data.
- GPUs and other multithreaded processing units typically implement multiple processing elements (which are also referred to as processor cores or compute units) that concurrently execute sets of instructions or operations on multiple data sets.
- the sets of instructions or operations are referred to as threads.
- Operations and program data for the threads are sent to the processing elements by a command processor via communications referred to as packets.
- packets are collections of graphics data referred to as wavefronts.
- communication of wavefronts is hardware inefficient. For example, if a program calls for a large number of wavefronts (e.g., draws) that each have only a few indices (e.g., one or five indices), the resulting wavefronts would each inefficiently utilize communication infrastructure designed to send wavefronts that include more indices (e.g., a 32-wide infrastructure or a 256-wide communication infrastructure).
- indices refer to values generated by a user that provide locations of vertex coordinates.
- an incoming packet includes more data than can be communicated in a single wavefront but not enough data for a last wavefront generated based on the packet to efficiently use communication infrastructure.
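The inefficiency described above can be made concrete with some arithmetic. The following Python sketch is illustrative only: the helper names `wavefronts_needed` and `utilization` are hypothetical, and it assumes a 32-wide infrastructure in which one wavefront carries at most 32 indices and a draw sent on its own cannot share a wavefront with another draw.

```python
import math

def wavefronts_needed(index_counts, width=32, aggregate=False):
    """Number of wavefronts needed to send draws with the given index counts.

    width: indices one wavefront can carry (32 assumed, as in a 32-wide fabric).
    aggregate: if True, indices from different draws may share a wavefront.
    """
    if aggregate:
        # Pack all indices back to back, filling each wavefront completely.
        return math.ceil(sum(index_counts) / width)
    # Without aggregation each draw gets its own wavefront(s), so a partially
    # filled last wavefront cannot be topped up with another draw's indices.
    return sum(math.ceil(n / width) for n in index_counts)

def utilization(index_counts, width=32, aggregate=False):
    """Fraction of wavefront index slots that actually carry an index."""
    slots = wavefronts_needed(index_counts, width, aggregate) * width
    return sum(index_counts) / slots if slots else 0.0

draws = [1] * 64                                 # 64 draws of one index each
print(wavefronts_needed(draws))                  # 64
print(wavefronts_needed(draws, aggregate=True))  # 2
print(utilization(draws))                        # 0.03125
```

With 64 single-index draws, sending each draw separately fills only 1 of 32 slots per wavefront, while packing the same indices needs just 2 full wavefronts.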
- examples of output conditions include register state updates, pipeline flushes, and context switches.
- GPU 100 is part of a device such as a desktop or laptop computer, server, smartphone, tablet, game console, or other electronic device.
- the device includes a central processing unit (CPU) that sends various commands or instructions (e.g., draw commands) to GPU 100.
- GPU 100 executes various operations. These operations include program calls that call for data to be processed.
- command processor 102 sends various input packets 130 to packet management component 104 where input packets indicate various sets of commands from GPU 100 based on the program calls.
- input packets 130 include various types of data including draw indices or indications of send conditions (e.g., indications of events that would cause packet management component 104 to output data to the one or more compute units).
- input packets 130 are sent sequentially over a period of time. Packet management component 104 generates output packet 132 based on data from input packets 130.
- in response to an output packet being smaller than an output size threshold (e.g., because an input packet includes less data than an amount used to generate a packet of the output size threshold or because an input packet includes enough data that multiple output packets are generated and a last output packet would be smaller than the output size threshold), packet management component 104 holds and aggregates data corresponding to one or more input packets of input packets 130 and outputs the aggregated data as output packet 132 to one or more compute units such as compute unit 106. In some embodiments, output packet 132 is sent to each of a plurality of compute units, or different output packets are sent to respective compute units.
- the aggregated output packet is sent as if it were the first received input packet included in the aggregated output packet (e.g., including various headers and other data corresponding to the first received input packet). In other embodiments, the aggregated output packet is sent as if it were a different received input packet, the aggregated output packet is indicative of multiple received input packets, or the aggregated output packet is indicative of none of the received input packets.
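The case where one input packet yields several output packets can be sketched as a split into full-width chunks plus a held remainder. `split_input` is a hypothetical helper that assumes an output packet counts as full when it carries `width` indices; it is not the hardware's actual partitioning logic.

```python
def split_input(indices, width=32):
    """Split one input packet's indices into full output packets and a remainder.

    Full chunks of `width` indices can be sent immediately; the trailing chunk,
    if smaller than `width`, is returned separately so it can be held and
    aggregated with data from later input packets.
    """
    full_count = len(indices) // width * width
    ready = [indices[i:i + width] for i in range(0, full_count, width)]
    held = indices[full_count:]
    return ready, held

ready, held = split_input(list(range(70)))
print(len(ready), held)  # 2 [64, 65, 66, 67, 68, 69]
```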
- packet management component 104 analyzes the received input packets. In response to detecting, based on send condition detection component 120, that an input packet does not indicate a send condition, packet management component 104, using packet aggregation component 112, aggregates data corresponding to the input packet in packet buffer 110. For example, packet aggregation component 112 aggregates data corresponding to an incoming input packet with previously stored data in packet buffer 110. In some embodiments, the data is the entire input packet. In other embodiments, the data is a portion of the input packet, data indicated by the input packet (e.g., data generated as a result of one or more computations indicated by the input packet), or both.
- in response to detecting, using send condition detection component 120, that an input packet indicates a send condition, packet management component 104 sends the aggregated data to one or more compute units such as compute unit 106 in output packet 132. Accordingly, fewer output packets 132 are sent to compute unit 106, as compared to a system where an output packet is sent for each input packet.
- input packets 130 are indices of draw commands and output packet 132 is a wavefront including indices corresponding to multiple input packets of input packets 130.
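The aggregate-or-send decision can be modeled in software as follows. This is a minimal sketch under stated assumptions: the `InputPacket` and `PacketAggregator` names are hypothetical, a boolean flag stands in for the state information that signals a send condition, and a send-condition packet is assumed to carry no draw indices of its own.

```python
from dataclasses import dataclass, field

@dataclass
class InputPacket:
    # Hypothetical representation: draw indices plus a flag standing in for
    # state information that indicates a send condition.
    indices: list = field(default_factory=list)
    send_condition: bool = False

class PacketAggregator:
    """Sketch of the packet management component's aggregate-or-send decision."""

    def __init__(self):
        self.buffer = []  # aggregated indices (stand-in for the packet buffer)
        self.sent = []    # output packets delivered to the compute unit

    def receive(self, packet):
        if packet.send_condition:
            # Send everything aggregated so far as one output packet.
            if self.buffer:
                self.sent.append(list(self.buffer))
                self.buffer.clear()
        else:
            # No send condition: aggregate this packet's data with held data.
            self.buffer.extend(packet.indices)

agg = PacketAggregator()
for p in [InputPacket([0, 1, 2]), InputPacket([3]), InputPacket([4, 5]),
          InputPacket(send_condition=True)]:
    agg.receive(p)
print(agg.sent)  # [[0, 1, 2, 3, 4, 5]]
```

Three draw packets produce a single output packet here, whereas sending one output packet per input packet would have produced three.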
- send condition detection component 120 only determines whether the input packet indicates a send condition in response to detecting that the output packet would be smaller than an output size threshold. In other embodiments, send condition detection component 120 detects various send conditions in parallel with detecting whether the output packet would be smaller than the output size threshold.
- output conditions include send conditions (e.g., conditions indicated by incoming packets), timeout conditions, and size conditions.
- output condition detection component 114 includes various hardware such as buffers and read enable logic to detect various output conditions. In the illustrated embodiment, some output conditions are send conditions indicated by an input packet of input packets 130.
- state information of an input packet indicates a register state update (e.g., a packet specifying a draw topology, controlling a distribution of a draw workload, or specifying a number of bits of an index type) or an event (e.g., a pipeline flush (a process where instructions in a pipeline are removed, for example, due to an incorrect branch prediction) or a context switch (a switch between two applications, tasks, or programs)).
- output conditions include changing a draw source (e.g., from direct memory access to auto index or vice versa), changing virtual reality control fields, or changing an index size between draws.
- other output conditions including those detected by various other means, are also contemplated.
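The send and output conditions listed above can be expressed as a predicate over a packet's event and the attributes of consecutive draws. The `Event` and `DrawState` types and their field names are hypothetical stand-ins for state information; real hardware would decode these from packet contents and registers.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class Event(Enum):
    # Hypothetical labels for events and state updates carried by a packet.
    NONE = auto()
    REGISTER_STATE_UPDATE = auto()
    PIPELINE_FLUSH = auto()
    CONTEXT_SWITCH = auto()

@dataclass(frozen=True)
class DrawState:
    # Hypothetical per-draw attributes; a change between draws is treated
    # as an output condition.
    draw_source: str = "dma"  # e.g. "dma" or "auto_index"
    vr_control: int = 0
    index_size_bits: int = 16

def is_output_condition(event: Event, prev: Optional[DrawState], cur: DrawState) -> bool:
    """Return True if buffered data must be sent before this packet is handled."""
    if event is not Event.NONE:
        return True
    if prev is None:
        return False  # nothing buffered to compare against
    return (prev.draw_source != cur.draw_source
            or prev.vr_control != cur.vr_control
            or prev.index_size_bits != cur.index_size_bits)
```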
- an output condition includes timeout detection component 122 indicating that a packet storage timer of timeout detection component 122 exceeds a timeout threshold.
- the packet storage timer tracks an amount of time at least some data has been stored at packet buffer 110 (e.g., the data stored the longest).
- timeout detection component 122 indicates an output condition.
- the timeout threshold is user-specified. In other cases, the timeout threshold is specified by another entity such as an application running on GPU 100.
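One way to model the packet storage timer is to record when the oldest data entered the buffer and compare its age against the threshold. The class name and the explicit `now` argument (for example, a cycle count) are assumptions made so that the sketch is deterministic.

```python
class PacketStorageTimer:
    """Tracks how long the oldest buffered data has been held (hypothetical model)."""

    def __init__(self, timeout_threshold):
        self.timeout_threshold = timeout_threshold  # user- or application-specified
        self.oldest_store_time = None               # None while the buffer is empty

    def on_store(self, now):
        # Only the first store after the buffer empties starts the timer, so
        # the timer measures the age of the data stored the longest.
        if self.oldest_store_time is None:
            self.oldest_store_time = now

    def on_flush(self):
        # Sending the aggregated data empties the buffer and resets the timer.
        self.oldest_store_time = None

    def timed_out(self, now):
        return (self.oldest_store_time is not None
                and now - self.oldest_store_time > self.timeout_threshold)

timer = PacketStorageTimer(timeout_threshold=100)
timer.on_store(10)
timer.on_store(50)            # does not restart the timer
print(timer.timed_out(110))   # False: age 100 does not exceed 100
print(timer.timed_out(111))   # True
```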
- an output condition includes determining that an amount of the aggregated data stored at packet buffer 110 exceeds an output size threshold.
- the output size threshold is user specified.
- the output size threshold corresponds to a size of a communication infrastructure used to send output packet 132 to compute unit 106. To illustrate, if the communication infrastructure is 32-wide, then detecting that packet buffer 110 stores more than 31 indices causes output condition detection component 114 to indicate that an output condition is satisfied.
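The 32-wide example implies a threshold of one less than the infrastructure width, so the condition fires once a full wavefront's worth of indices is buffered. The helper below assumes that relationship holds for other widths too, which the text states only for the 32-wide case; both function names are hypothetical.

```python
def output_size_threshold(infrastructure_width):
    # Assumption generalized from the 32-wide example (threshold of 31).
    return infrastructure_width - 1

def size_condition_met(buffered_indices, infrastructure_width=32):
    """True once the buffer holds more indices than the output size threshold."""
    return buffered_indices > output_size_threshold(infrastructure_width)

print(size_condition_met(31))  # False
print(size_condition_met(32))  # True
```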
- packet aggregation component 112 causes packet buffer 110 to store the data of input packets 130 separated by respective delimiters.
- aggregating the data of input packets 130 includes updating a header file stored at packet buffer 110 to indicate addresses corresponding to respective input packets of input packets 130.
- input packets 130 in their entirety are stored or otherwise held at packet buffer 110. In other embodiments, only a portion of input packets 130 are stored or otherwise held at packet buffer 110.
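Storing several input packets contiguously and recording where each one begins can be sketched as follows. The `PacketBuffer` name and the offset layout are hypothetical; a delimiter-separated layout, the other option mentioned above, would instead place a marker value between payloads.

```python
class PacketBuffer:
    """Concatenated packet data plus a header of start offsets (hypothetical layout)."""

    def __init__(self):
        self.data = []    # concatenated payload words
        self.header = []  # start offset of each stored input packet

    def append(self, payload):
        # Record where this input packet's data begins, then store the data.
        self.header.append(len(self.data))
        self.data.extend(payload)

    def packet(self, i):
        """Recover the i-th stored packet's data using the header offsets."""
        start = self.header[i]
        end = self.header[i + 1] if i + 1 < len(self.header) else len(self.data)
        return self.data[start:end]

buf = PacketBuffer()
for payload in ([7, 8], [9], [10, 11, 12]):
    buf.append(payload)
print(buf.header)     # [0, 2, 3]
print(buf.packet(2))  # [10, 11, 12]
```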
- a system where packets (e.g., draw indices) are automatically aggregated (e.g., without specific software instructions with regard to the packets) by hardware components.
- the system aggregates the packets without software management.
- the system detects various output conditions (e.g., register state updates and events) and sends aggregated packets in response to the output conditions.
- FIG. 2 is a flow diagram illustrating a method 200 of aggregating data from input packets in accordance with some embodiments.
- the method 200 is implemented, in some embodiments, by packet management component 104 of GPU 100 of FIG. 1.
- method 200 is initiated by one or more processors in response to one or more instructions stored by a computer-readable storage medium.
- method 200 includes receiving an input packet from a command processor.
- packet management component 104 receives input packet 130 from command processor 102.
- method 200 includes determining whether the input packet indicates a send condition. For example, in some cases, packet management component 104 determines whether the received input packet 130 indicates (e.g., via state information) a send condition (e.g., a register state update or an event). In response to determining that the input packet indicates a send condition, method 200 proceeds to 216. In response to determining that the input packet does not indicate a send condition, method 200 proceeds to 206.
- method 200, in response to determining that the input packet does not indicate a send condition, includes determining whether an output packet is open. For example, in some cases, packet management component 104 determines whether packet buffer 110 includes an open output packet. In response to determining that an output packet is open, method 200 proceeds to 210. In response to determining that no output packet is open, method 200 proceeds to 208.
- method 200 includes creating a new output packet. For example, in some cases, packet management component 104 creates a new output packet in packet buffer 110.
- method 200, in response to determining that an output packet is open or subsequent to creating the new output packet, includes adding contents of the input packet to the output packet. For example, in some cases, packet management component 104 aggregates, in packet buffer 110, data corresponding to input packet 130 with data corresponding to one or more previously stored or otherwise held input packets. As another example, in some cases, packet management component 104 adds data corresponding to input packet 130 to the newly created output packet in packet buffer 110.
- method 200 includes determining whether a timeout condition is satisfied. For example, in some cases, timeout detection component 122 checks a timeout storage tracker that indicates an amount of time at least a portion of the output packet has been stored or otherwise held at packet buffer 110. In response to the timeout storage tracker exceeding a timeout threshold, timeout detection component 122 determines that a timeout condition is satisfied. In response to determining that the timeout condition is satisfied, method 200 proceeds to 216. In response to the timeout storage tracker failing to exceed the timeout threshold, timeout detection component 122 determines that the timeout condition is not satisfied, and method 200 proceeds to 214.
- 212 further includes determining whether a size of the output packet exceeds an output size threshold, and, in response to determining that the size of the output packet exceeds the output size threshold, proceeding to 216. In some embodiments, determining that the size of the output packet exceeds the output size threshold and proceeding to 216 is performed additionally or alternatively in other portions of method 200 including, for example, between 202 and 204.
- method 200 includes determining whether an incoming input packet is indicated. For example, in some cases, packet management component 104 determines whether command processor 102 is sending an input packet. In response to detecting an incoming input packet, method 200 proceeds to 202. In response to failing to detect an input packet, method 200 proceeds to 212.
- method 200, in response to determining that the input packet indicates a send condition or in response to determining that the timeout condition is satisfied, includes sending the output packet to a compute unit. For example, in some cases, in response to input packet 130 indicating a send condition, packet management component 104 closes the output packet and sends the output packet to compute unit 106 as output packet 132. As another example, in some cases, in response to timeout detection component 122 detecting that a timeout condition is satisfied, packet management component 104 closes the output packet and sends the output packet to compute unit 106 as output packet 132.
- method 200, subsequent to sending the output packet to compute unit 106, includes performing the send condition if one is indicated (e.g., at 204). For example, in response to input packet 130 indicating a send condition, packet management component 104 sends output packet 132 to compute unit 106 and then performs the indicated send condition. Accordingly, a method of aggregating data from input packets is depicted.
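The flow of method 200 can be summarized as a small state machine. This Python sketch is an illustration under assumptions, not the hardware design: time is passed explicitly as `now`, the `Packet` and `PacketManager` names are hypothetical, the optional size check is placed at 212, and calling `check` with no new packet stands in for the path from 214 back to 212. Comments give the corresponding FIG. 2 block numbers.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Packet:
    # Hypothetical input packet: draw data, or a send condition such as
    # "context_switch" or "register_state_update".
    data: List[int] = field(default_factory=list)
    send_condition: Optional[str] = None

class PacketManager:
    def __init__(self, timeout_threshold, size_threshold):
        self.timeout_threshold = timeout_threshold
        self.size_threshold = size_threshold
        self.open_packet = None  # output packet being built in the buffer
        self.opened_at = None    # time the oldest data in it was stored
        self.outputs = []        # output packets sent to the compute unit
        self.performed = []      # send conditions carried out after sending

    def _send(self):
        # 216: close the open output packet and send it to the compute unit.
        if self.open_packet:
            self.outputs.append(self.open_packet)
        self.open_packet, self.opened_at = None, None

    def receive(self, packet, now):
        # 202: receive an input packet from the command processor.
        if packet.send_condition:                         # 204
            self._send()                                  # 216
            self.performed.append(packet.send_condition)  # then perform it
            return
        if self.open_packet is None:                      # 206
            self.open_packet, self.opened_at = [], now    # 208
        self.open_packet.extend(packet.data)              # 210
        self.check(now)                                   # 212

    def check(self, now):
        # 212: timeout condition, plus the optional output size check.
        if self.open_packet is None:
            return
        timed_out = now - self.opened_at > self.timeout_threshold
        too_big = len(self.open_packet) > self.size_threshold
        if timed_out or too_big:
            self._send()

pm = PacketManager(timeout_threshold=50, size_threshold=31)
pm.receive(Packet([0, 1]), now=0)
pm.receive(Packet([2]), now=10)
pm.receive(Packet(send_condition="context_switch"), now=20)
pm.receive(Packet([3]), now=30)
pm.check(now=90)     # no input arriving; the held data has timed out
print(pm.outputs)    # [[0, 1, 2], [3]]
print(pm.performed)  # ['context_switch']
```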
- FIG. 3 is a block diagram depicting a timeline 300 that illustrates an example packet management component processing input and output packets in accordance with some embodiments.
- input packets 302-312 and 316 are received at a packet management component (e.g., packet management component 104). Further, event 314 is detected at the packet management component.
- Input packet 308 indicates a context switch (a send condition).
- input packet 308 indicates that input packets 302-306 correspond to a different context than subsequently received input packets 310 and 312. Accordingly, in response to detecting the send condition, the output packet including the draw data indicated by input packets 302-306 is sent and then the context switch is performed.
- a new output packet is created and draw data (draw4) indicated by input packet 310 is added to the output packet.
- draw data (draw5) indicated by input packet 312 is added to the output packet.
- a timeout detection component detects that a packet storage timer indicates that at least a portion of the data in the output packet (e.g., the draw data indicated by input packet 310) has been stored for longer than a timeout threshold. Accordingly, at event 314, a timeout condition is satisfied and the output packet including the draw data indicated by input packets 310 and 312 is sent.
- a new output packet is created and draw data (draw6) indicated by input packet 316 is added to the output packet. Accordingly, an example timeline 300 of input and output packets is illustrated.
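Timeline 300 can be replayed with a compact event loop. The times, the timeout value of 10, and the labels draw1 to draw3 for the data of input packets 302-306 are assumptions chosen for illustration, since the description gives no concrete timings. A `tick` event (a hypothetical name) represents time passing with no input packet arriving, which is when the timeout at event 314 is detected.

```python
def replay(events, timeout):
    """Replay (time, kind, payload) events; return (sent packets, held data).

    kind is "draw" (payload = draw label), "send" (payload = send condition),
    or "tick" (no input arriving; only the timeout is checked).
    """
    outputs, open_packet, opened_at = [], [], None

    def flush():
        nonlocal open_packet, opened_at
        if open_packet:
            outputs.append(open_packet)
        open_packet, opened_at = [], None

    for time, kind, payload in events:
        if kind == "send":
            flush()  # send aggregated data; the send condition would run next
            continue
        if kind == "draw":
            if opened_at is None:
                opened_at = time  # first data stored in a new output packet
            open_packet.append(payload)
        if opened_at is not None and time - opened_at > timeout:
            flush()
    return outputs, open_packet

timeline = [
    (0, "draw", "draw1"),           # input packet 302
    (1, "draw", "draw2"),           # input packet 304
    (2, "draw", "draw3"),           # input packet 306
    (3, "send", "context_switch"),  # input packet 308
    (4, "draw", "draw4"),           # input packet 310
    (5, "draw", "draw5"),           # input packet 312
    (15, "tick", None),             # event 314: timeout detected
    (16, "draw", "draw6"),          # input packet 316
]
sent, held = replay(timeline, timeout=10)
print(sent)  # [['draw1', 'draw2', 'draw3'], ['draw4', 'draw5']]
print(held)  # ['draw6']
```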
- FIG. 4 is a block diagram illustrating an example GPU 400 that includes packet management component 104, which includes packet buffer 110 in accordance with some embodiments.
- packet management component 104 aggregates indices 410-414 from input packets 402-406 in packet buffer 110. As a result, indices 410-414 are aggregated and stored together in packet buffer 110.
- packet management component 104 in response to a register state update indication 416 from input packet 408, sends indices 410-414 in an output packet 420.
- input packets 402-408 correspond to input packets 130 of FIG. 1 and output packet 420 corresponds to output packet 132 of FIG. 1.
- a method includes: receiving, by a packet management component from a command processor of a graphics processing unit (GPU), a first input packet indicating a first set of commands; in response to determining that the first input packet does not indicate a send condition, automatically aggregating data corresponding to the first input packet with previously received packet data stored at a packet buffer of the packet management component.
- the method includes receiving a second input packet indicating a second set of commands received from the GPU; in response to determining that the second input packet indicates a send condition, sending the aggregated data to a compute unit in an output packet; and performing an operation indicated by the send condition.
- the first input packet includes a first plurality of draw indices
- the previously received packet data includes a second plurality of draw indices
- the aggregated data includes the first plurality of draw indices and the second plurality of draw indices.
- the output packet is a wavefront including a set of operations to be performed by the compute unit of the GPU.
- the second input packet indicates at least one of a register state update, a context switch, or a pipeline flush.
- the method includes: subsequent to performing the operation, receiving a third input packet indicating a third set of commands received from the GPU; storing data corresponding to the third input packet at the packet buffer; and in response to detecting that a timeout condition has been satisfied, sending the third input packet to the compute unit in a second output packet.
- the method includes: subsequent to performing the operation, receiving a third input packet indicating a third set of commands received from the GPU; storing data corresponding to the third input packet at the packet buffer; and in response to detecting that an amount of second aggregated data stored at the packet buffer exceeds an output size threshold, sending the third input packet to the compute unit in a second output packet.
- the output size threshold is user programmable.
- a graphics processing unit includes: a command processor configured to send input packets indicating commands received from the GPU; a packet management component, including: a packet buffer configured to store data corresponding to the input packets received from the command processor; a packet aggregation component configured to: identify state information of an incoming first input packet; in response to the state information indicating an aggregation condition, aggregate data corresponding to the first input packet with data corresponding to a second input packet stored at the packet buffer; and in response to the state information indicating a send condition, send an output packet for processing by a compute unit, wherein the output packet includes aggregated data stored at the packet buffer.
- the packet aggregation component comprises a timeout detection component configured to cause the output packet to be sent in response to an amount of time at least a portion of the data corresponding to the second input packet has been stored exceeding a timeout threshold.
- the timeout threshold is user-specified.
- the output packet is a wavefront.
- the aggregated data includes a portion of the first input packet and a portion of the second input packet.
- the aggregated data includes the first input packet and the second input packet.
- a method includes: receiving, by a packet management component from a command processor, a first input packet indicating a first set of commands received from a graphics processing unit (GPU); storing data corresponding to the first input packet at a packet buffer of the packet management component; receiving a second input packet indicating a second set of commands received from the GPU; in response to determining that an output condition has not been satisfied, automatically aggregating data corresponding to the second input packet with the data corresponding to the first input packet; and in response to determining that an output condition has been satisfied, sending the aggregated data to one or more compute units in one or more output packets.
- determining that the output condition has been satisfied is performed in response to determining that an amount of the aggregated data stored at the packet buffer exceeds an output size threshold.
- determining that the output condition has been satisfied comprises determining that a third input packet indicates a send condition.
- the method includes: in response to receiving the first input packet, starting, at a timeout detection component of the packet management component, a packet storage timer.
- determining that the output condition has been satisfied comprises determining that the packet storage timer exceeds a timeout threshold.
- the timeout threshold is user-specified.
- a computer readable storage medium includes any non-transitory storage medium, or combination of non-transitory storage media, accessible by a computer system during use to provide instructions and/or data to the computer system.
- such storage media include, but are not limited to, optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-ray disc), magnetic media (e.g., floppy disc, magnetic tape, or magnetic hard drive), volatile memory (e.g., random access memory (RAM) or cache), non-volatile memory (e.g., read-only memory (ROM) or Flash memory), or microelectromechanical systems (MEMS)-based storage media.
- the computer readable storage medium is embedded in the computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB)-based Flash memory), or coupled to the computer system via a wired or wireless network (e.g., network accessible storage (NAS)).
- certain aspects of the techniques described above are implemented by one or more processors of a processing system executing software.
- the software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium.
- the software includes the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above.
- the non-transitory computer readable storage medium includes a magnetic or optical disk storage device, solid state storage devices such as Flash memory, a cache, random access memory (RAM) or other non-volatile memory device or devices, and the like.
- the executable instructions stored on the non-transitory computer readable storage medium are in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executable by one or more processors.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022534186A JP7528217B2 (ja) | 2019-12-13 | 2020-12-09 | Gpuパケット集約システム |
| EP20899498.8A EP4073639B1 (en) | 2019-12-13 | 2020-12-09 | Gpu packet aggregation system |
| KR1020227019998A KR102709341B1 (ko) | 2019-12-13 | 2020-12-09 | Gpu 패킷 집계 시스템 |
| CN202080085569.6A CN114902181A (zh) | 2019-12-13 | 2020-12-09 | Gpu包聚合系统 |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/713,472 | 2019-12-13 | ||
| US16/713,472 US11210757B2 (en) | 2019-12-13 | 2019-12-13 | GPU packet aggregation system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021119072A1 true WO2021119072A1 (en) | 2021-06-17 |
Family
ID=76316977
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2020/063923 Ceased WO2021119072A1 (en) | 2019-12-13 | 2020-12-09 | Gpu packet aggregation system |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US11210757B2 |
| EP (1) | EP4073639B1 |
| JP (1) | JP7528217B2 |
| KR (1) | KR102709341B1 |
| CN (1) | CN114902181A |
| WO (1) | WO2021119072A1 |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12106112B2 (en) * | 2020-12-03 | 2024-10-01 | Intel Corporation | Methods and apparatus to generate graphics processing unit long instruction traces |
| CN113626369B (zh) * | 2021-08-14 | 2023-05-26 | 苏州浪潮智能科技有限公司 | 一种多节点集群环形通信的方法、装置、设备及可读介质 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7839876B1 (en) * | 2006-01-25 | 2010-11-23 | Marvell International Ltd. | Packet aggregation |
| US20160283416A1 (en) * | 2015-03-23 | 2016-09-29 | Samsung Electronics Co., Ltd. | Bus interface device, semiconductor integrated circuit device including the same, and method of operating the same |
| US20160352598A1 (en) * | 2015-05-29 | 2016-12-01 | Advanced Micro Devices, Inc. | Message aggregation, combining and compression for efficient data communications in gpu-based clusters |
| US20160379336A1 (en) * | 2015-04-01 | 2016-12-29 | Mediatek Inc. | Methods of a graphics-processing unit for tile-based rendering of a display area and graphics-processing apparatus |
| US20170123866A1 (en) * | 2008-05-15 | 2017-05-04 | Ip Reservoir, Llc | Method and System for Accelerated Stream Processing |
Family Cites Families (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR100628619B1 (ko) * | 2000-07-10 | 2006-09-26 | Matsushita Electric Industrial Co., Ltd. (마쯔시다덴기산교 가부시키가이샤) | Multiple decoding apparatus and method |
| WO2005053216A2 (en) * | 2003-11-25 | 2005-06-09 | Dg2L Technologies | Methods and systems for reliable distribution of media over a network |
| US7209139B1 (en) * | 2005-01-07 | 2007-04-24 | Electronic Arts | Efficient rendering of similar objects in a three-dimensional graphics engine |
| CN101471826B (zh) * | 2007-12-27 | 2012-12-12 | Huawei Technologies Co., Ltd. (华为技术有限公司) | Method and device for testing a command-line interface |
| JP2010055214A (ja) | 2008-08-26 | 2010-03-11 | Sanyo Electric Co Ltd | Data processing device |
| EP2596470A1 (en) * | 2010-07-19 | 2013-05-29 | Advanced Micro Devices, Inc. | Data processing using on-chip memory in multiple processing units |
| CN102323917B (zh) * | 2011-09-06 | 2013-05-15 | National University of Defense Technology (中国人民解放军国防科学技术大学) | Method for sharing a GPU among multiple processes based on shared memory |
| US20130155077A1 (en) * | 2011-12-14 | 2013-06-20 | Advanced Micro Devices, Inc. | Policies for Shader Resource Allocation in a Shader Core |
| US20130162661A1 (en) * | 2011-12-21 | 2013-06-27 | Nvidia Corporation | System and method for long running compute using buffers as timeslices |
| US9509616B1 (en) * | 2014-11-24 | 2016-11-29 | Amazon Technologies, Inc. | Congestion sensitive path-balancing |
| US20170300361A1 (en) * | 2016-04-15 | 2017-10-19 | Intel Corporation | Employing out of order queues for better gpu utilization |
| JP7100624B2 (ja) * | 2016-08-29 | 2022-07-13 | Advanced Micro Devices, Inc. (アドバンスト・マイクロ・ディバイシズ・インコーポレイテッド) | Hybrid rendering with binning and sorting of preferred primitive batches |
| US10572258B2 (en) * | 2017-04-01 | 2020-02-25 | Intel Corporation | Transitionary pre-emption for virtual reality related contexts |
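| CN110223216B (zh) * | 2019-06-11 | 2023-01-17 | Xi'an Xintong Semiconductor Technology Co., Ltd. (西安芯瞳半导体技术有限公司) | Data processing method and apparatus based on parallel PLB, and computer storage medium |
| CN110415161B (zh) * | 2019-07-19 | 2023-06-27 | Loongson Technology (Hefei) Co., Ltd. (龙芯中科(合肥)技术有限公司) | Graphics processing method, apparatus, device and storage medium |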
- 2019
  - 2019-12-13 US US16/713,472: US11210757B2 (Active)
- 2020
  - 2020-12-09 KR KR1020227019998A: KR102709341B1 (Active)
  - 2020-12-09 WO PCT/US2020/063923: WO2021119072A1 (Ceased)
  - 2020-12-09 CN CN202080085569.6A: CN114902181A (Pending)
  - 2020-12-09 EP EP20899498.8A: EP4073639B1 (Active)
  - 2020-12-09 JP JP2022534186A: JP7528217B2 (Active)
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4073639A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| JP7528217B2 (ja) | 2024-08-05 |
| EP4073639A4 (en) | 2024-01-10 |
| US20210183004A1 (en) | 2021-06-17 |
| CN114902181A (zh) | 2022-08-12 |
| EP4073639A1 (en) | 2022-10-19 |
| JP2023505783A (ja) | 2023-02-13 |
| US11210757B2 (en) | 2021-12-28 |
| KR102709341B1 (ko) | 2024-09-25 |
| KR20220113710A (ko) | 2022-08-16 |
| EP4073639B1 (en) | 2026-03-04 |
Similar Documents
| Publication | Title |
|---|---|
| US10198377B2 (en) | Virtual machine state replication using DMA write records |
| US9413683B2 (en) | Managing resources in a distributed system using dynamic clusters |
| US9146682B2 (en) | Method and apparatus for storing data |
| WO2019226355A1 (en) | Embedded scheduling of hardware resources for hardware acceleration |
| CN103019962A (zh) | Data cache processing method, apparatus and system |
| US11831410B2 (en) | Intelligent serverless function scaling |
| US10289418B2 (en) | Cooperative thread array granularity context switch during trap handling |
| US11210757B2 (en) | GPU packet aggregation system |
| CN109753338B (zh) | Method and apparatus for detecting virtual GPU utilization |
| US10754783B2 (en) | Techniques to manage cache resource allocations for a processor cache |
| US20170010914A1 (en) | Cooperative thread array granularity context switch during trap handling |
| KR20230025464A (ko) | Core selection based on usage policies and core constraints |
| US11977907B2 (en) | Hybrid push and pull event source broker for serverless function scaling |
| CN108139938A (zh) | Apparatus, method and computer program for using a secondary thread to assist a main thread in executing application tasks |
| CN112783652A (zh) | Method, apparatus, device and storage medium for obtaining the running state of a current task |
| KR102407781B1 (ko) | Graphics context scheduling based on flip queue management |
| CN104331322B (zh) | Process migration method and apparatus |
| CN110515729A (zh) | GPU-based method and apparatus for vector load balancing of graph-computing nodes |
| CN102947803B (zh) | Method, system and processor for counting the number of instruction executions |
| US20240160451A1 | Dynamic thread count optimizations |
| US11023274B2 (en) | Method and system for processing data |
| WO2024102236A1 (en) | Dynamic thread count optimizations |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20899498; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2022534186; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2020899498; Country of ref document: EP; Effective date: 2022-07-13 |
| | WWG | WIPO information: grant in national office | Ref document number: 2020899498; Country of ref document: EP |