EP4073639A1 - Gpu packet aggregation system - Google Patents
GPU packet aggregation system
Info
- Publication number
- EP4073639A1 (application EP20899498.8A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- packet
- input
- output
- input packet
- gpu
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17337—Direct connection machines, e.g. completely connected computers, point to point communication networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/28—Indexing scheme for image data processing or generation, in general involving image processing hardware
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2212/00—Encapsulation of packets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/50—Queue scheduling
- H04L47/62—Queue scheduling characterised by scheduling criteria
- H04L47/625—Queue scheduling characterised by scheduling criteria for service slots or service orders
- H04L47/6255—Queue scheduling characterised by scheduling criteria for service slots or service orders queue load conditions, e.g. longest queue first
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/90—Buffering arrangements
- H04L49/9057—Arrangements for supporting packet reassembly or resequencing
Definitions
- processors often employ multiple modules, referred to as compute units (CUs), to execute operations in parallel.
- a processor employs a graphics processing unit (GPU) to carry out a variety of image processing or other general-purpose processing applications.
- the GPU includes multiple CUs to execute the operations in parallel.
- communication of data used to perform these operations impacts the overall efficiency of the processor.
- indices for the graphics and vector processing operations are sent to the CUs via a communication fabric, such as a bus.
- the communication traffic supporting these data transfers consumes an undesirably large portion of the communication fabric’s available bandwidth, thereby reducing overall processing efficiency at the GPU.
- FIG. 1 is a block diagram of a graphics processing unit including hardware that automatically aggregates data from input packets in accordance with some embodiments.
- FIG. 2 is a flow diagram illustrating a method of aggregating data from input packets in accordance with some embodiments.
- FIG. 3 is a block diagram illustrating an example packet management component processing an example timeline of input and output packets in accordance with some embodiments.
- FIG. 4 is a block diagram illustrating an example packet management component aggregating indices from input packets and sending the indices in an output packet in accordance with some embodiments.
- a packet management component of a packet aggregation system of a processing unit such as a graphics processing unit (GPU) aggregates data from incoming packets in response to detecting that an output wavefront will be smaller than an output size threshold.
- in response to detecting a send condition (e.g., an incoming packet indicates a context switch, or data has been stored or held at the packet management component for a particular amount of time), the packet management component outputs the aggregated data as a wavefront.
- the output conditions are difficult for software systems (e.g., drivers) to detect in a timely manner because of the number of input packets and because of a time lag due to software processing.
- because the described systems detect output conditions at the hardware level as input packets are received, they detect output conditions more easily than a system where a software driver aggregates the data.
- GPUs and other multithreaded processing units typically implement multiple processing elements (which are also referred to as processor cores or compute units) that concurrently execute sets of instructions or operations on multiple data sets.
- the sets of instructions or operations are referred to as threads.
- Operations and program data for the threads are sent to the processing elements by a command processor via communications referred to as packets.
- packets are collections of graphics data referred to as wavefronts.
- communication of wavefronts is hardware inefficient. For example, if a program calls for a large number of wavefronts (e.g., draws) that each have only a few indices (e.g., one or five indices), the resulting wavefronts would each inefficiently utilize communication infrastructure designed to send wavefronts that include more indices (e.g., a 32-wide infrastructure or a 256-wide communication infrastructure).
- indices refer to values generated by a user that provide locations of vertex coordinates.
- an incoming packet includes more data than can be communicated in a single wavefront but not enough data for a last wavefront generated based on the packet to efficiently use communication infrastructure.
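The inefficiency described above can be made concrete with a little arithmetic. The following Python sketch is an illustration only, not part of the patent: the function names and the sample draw sizes are hypothetical, and it assumes an idealized case in which no send condition interrupts aggregation and aggregated indices pack densely into wavefronts of a fixed width.

```python
import math

def wavefronts_needed(index_counts, width, aggregate):
    """Count wavefronts sent for draws with the given numbers of indices.

    aggregate=False: every draw is sent on its own (partial wavefronts allowed).
    aggregate=True:  indices from all draws are packed together first
                     (idealized: no send condition interrupts aggregation).
    """
    if aggregate:
        return math.ceil(sum(index_counts) / width)
    return sum(math.ceil(n / width) for n in index_counts)

def utilization(index_counts, width, aggregate):
    """Fraction of transmitted wavefront slots that carry a real index."""
    sent = wavefronts_needed(index_counts, width, aggregate)
    return sum(index_counts) / (sent * width) if sent else 0.0

# Hypothetical workload: four tiny draws plus one draw larger than the fabric width.
draws = [1, 5, 1, 5, 40]
separate = wavefronts_needed(draws, 32, aggregate=False)
combined = wavefronts_needed(draws, 32, aggregate=True)
print(separate, combined)
print(utilization(draws, 32, aggregate=False), utilization(draws, 32, aggregate=True))
```

Under these assumptions the same 52 indices occupy fewer wavefronts when aggregated, so a larger share of each transfer carries useful data.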
- output conditions include, for example, register state updates, pipeline flushes, or context switches.
- FIG. 1 is a block diagram of a graphics processing unit (GPU) 100 including hardware that automatically aggregates data from input packets in accordance with some embodiments.
- GPU 100 includes command processor 102, packet management component 104, and compute unit 106.
- Packet management component 104 includes packet buffer 110, packet aggregation component 112, and output condition detection component 114.
- Output condition detection component 114 includes send condition detection component 120 and timeout detection component 122.
- in other embodiments, send condition detection component 120 includes timeout detection component 122.
- in still other embodiments, timeout detection component 122 is separate from output condition detection component 114.
- in some embodiments, GPU 100 includes a plurality of compute units.
- GPU 100 is part of a device such as a desktop or laptop computer, server, smartphone, tablet, game console, or other electronic device.
- the device includes a central processing unit (CPU) that sends various commands or instructions (e.g., draw commands) to GPU 100.
- GPU 100 executes various operations. These operations include program calls that call for data to be processed.
- command processor 102 sends various input packets 130 to packet management component 104 where input packets indicate various sets of commands from GPU 100 based on the program calls.
- input packets 130 include various types of data including draw indices or indications of send conditions (e.g., indications of events that would cause packet management component 104 to output data to the one or more compute units).
- input packets 130 are sent sequentially over a period of time. Packet management component 104 generates output packet 132 based on data from input packets 130.
- in response to an output packet being smaller than an output size threshold (e.g., because an input packet includes less data than an amount used to generate a packet of the output size threshold or because an input packet includes enough data that multiple output packets are generated and a last output packet would be smaller than the output size threshold), packet management component 104 holds and aggregates data corresponding to one or more input packets of input packets 130 and outputs the aggregated data as output packet 132 to one or more compute units such as compute unit 106. In some embodiments, output packet 132 is sent to each of the plurality of compute units or different output packets are sent to respective compute units.
- in some embodiments, the aggregated output packet is sent as if it were the first received input packet included in the aggregated output packet (e.g., including various headers and other data corresponding to the first received input packet). In other embodiments, the aggregated output packet is sent as if it were a different received input packet, the aggregated output packet is indicative of multiple received input packets, or the aggregated output packet is indicative of none of the received input packets.
- packet management component 104 analyzes the received input packets. In response to detecting, based on send condition detection component 120, that an input packet does not indicate a send condition, packet management component 104, using packet aggregation component 112, aggregates data corresponding to the input packet in packet buffer 110. For example, packet aggregation component 112 aggregates data corresponding to an incoming input packet with previously stored data in packet buffer 110. In some embodiments, the data is the entire input packet. In other embodiments, the data is a portion of the input packet, data indicated by the input packet (e.g., data generated as a result of one or more computations indicated by the input packet), or both.
- in response to detecting, using send condition detection component 120, that an input packet indicates a send condition, packet management component 104 sends the aggregated data to one or more compute units such as compute unit 106 in output packet 132. Accordingly, fewer output packets 132 are sent to compute unit 106, as compared to a system where an output packet is sent for each input packet.
- input packets 130 are indices of draw commands and output packet 132 is a wavefront including indices corresponding to multiple input packets of input packets 130.
- in some embodiments, send condition detection component 120 only determines whether the input packet indicates a send condition in response to detecting that the output packet would be smaller than an output size threshold. In other embodiments, send condition detection component 120 detects various send conditions in parallel with detecting whether the output packet would be smaller than the output size threshold.
- output conditions include send conditions (e.g., conditions indicated by incoming packets), timeout conditions, and size conditions.
- output condition detection component 114 includes various hardware such as buffers and read enable logic to detect various output conditions. In the illustrated embodiment, some output conditions are send conditions indicated by an input packet of input packets 130.
- state information of an input packet indicates a register state update (e.g., a packet specifying a draw topology, controlling a distribution of a draw workload, or specifying a number of bits of an index type) or an event (e.g., a pipeline flush (a process where instructions in a pipeline are removed, for example, due to an incorrect branch prediction) or a context switch (a switch between two applications, tasks, or programs)).
- output conditions include changing a draw source (e.g., from direct memory access to auto index or vice versa), changing virtual reality control fields, or changing an index size between draws.
- other output conditions, including those detected by various other means, are also contemplated.
- an output condition includes timeout detection component 122 indicating that a packet storage timer of timeout detection component 122 exceeds a timeout threshold.
- the packet storage timer tracks an amount of time at least some data has been stored at packet buffer 110 (e.g., the data stored the longest).
- when the packet storage timer exceeds the timeout threshold, timeout detection component 122 indicates an output condition.
- the timeout threshold is user-specified. In other cases, the timeout threshold is specified by another entity such as an application running on GPU 100.
- an output condition includes determining that an amount of the aggregated data stored at packet buffer 110 exceeds an output size threshold.
- the output size threshold is user-specified.
- the output size threshold corresponds to a size of a communication infrastructure used to send output packet 132 to compute unit 106. To illustrate, if the communication infrastructure is 32-wide, then detecting that packet buffer 110 stores more than 31 indices causes output condition detection component 114 to indicate that an output condition is satisfied.
- packet aggregation component 112 causes packet buffer 110 to store the data of input packets 130 separated by respective delimiters.
- aggregating the data of input packets 130 includes updating a header file stored at packet buffer 110 to indicate addresses corresponding to respective input packets of input packets 130.
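The two bookkeeping options mentioned above (delimiters between packets, or a header that records where each packet starts) can be compared with a small Python sketch. It is an illustration only: the class names, the `None` sentinel, and the use of list offsets in place of hardware addresses are assumptions made for the example, not details taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DELIMITER = None  # hypothetical sentinel marking the end of one input packet

@dataclass
class DelimitedBuffer:
    """Stores packet data back to back, with a delimiter after each packet."""
    slots: List[Optional[int]] = field(default_factory=list)

    def append_packet(self, indices: List[int]) -> None:
        self.slots.extend(indices)
        self.slots.append(DELIMITER)

    def packets(self) -> List[List[int]]:
        out, current = [], []
        for slot in self.slots:
            if slot is DELIMITER:
                out.append(current)
                current = []
            else:
                current.append(slot)
        return out

@dataclass
class HeaderBuffer:
    """Stores packet data contiguously and records each packet's start offset."""
    header: List[int] = field(default_factory=list)  # start offset per packet
    data: List[int] = field(default_factory=list)

    def append_packet(self, indices: List[int]) -> None:
        self.header.append(len(self.data))
        self.data.extend(indices)

    def packets(self) -> List[List[int]]:
        bounds = self.header + [len(self.data)]
        return [self.data[bounds[i]:bounds[i + 1]] for i in range(len(self.header))]

delimited, with_header = DelimitedBuffer(), HeaderBuffer()
for indices in ([7, 8], [9], [10, 11, 12]):
    delimited.append_packet(indices)
    with_header.append_packet(indices)
print(delimited.packets())
print(with_header.packets())
```

Both layouts let the original packet boundaries be recovered from the aggregated data. The header variant keeps the index data contiguous, while the delimiter variant spends one extra slot per packet.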
- input packets 130 in their entirety are stored or otherwise held at packet buffer 110. In other embodiments, only a portion of input packets 130 are stored or otherwise held at packet buffer 110.
- a system where packets (e.g., draw indices) are automatically aggregated (e.g., without specific software instructions with regard to the packets) by hardware components.
- the system aggregates the packets without software management.
- the system detects various output conditions (e.g., register state updates and events) and sends aggregated packets in response to the output conditions.
- FIG. 2 is a flow diagram illustrating a method 200 of aggregating data from input packets in accordance with some embodiments.
- the method 200 is implemented, in some embodiments, by packet management component 104 of GPU 100 of FIG. 1.
- method 200 is initiated by one or more processors in response to one or more instructions stored by a computer-readable storage medium.
- at 202, method 200 includes receiving an input packet from a command processor.
- for example, in some cases, packet management component 104 receives input packet 130 from command processor 102.
- at 204, method 200 includes determining whether the input packet indicates a send condition. For example, in some cases, packet management component 104 determines whether the received input packet 130 indicates (e.g., via state information) a send condition (e.g., a register state update or an event). In response to determining that the input packet indicates a send condition, method 200 proceeds to 216. In response to determining that the input packet does not indicate a send condition, method 200 proceeds to 206.
- at 206, in response to determining that the input packet does not indicate a send condition, method 200 includes determining whether an output packet is open. For example, in some cases, packet management component 104 determines whether packet buffer 110 includes an open output packet. In response to determining that an output packet is open, method 200 proceeds to 210. In response to determining that no output packet is open, method 200 proceeds to 208.
- at 208, method 200 includes creating a new output packet. For example, in some cases, packet management component 104 creates a new output packet in packet buffer 110.
- at 210, in response to determining that an output packet is open or subsequent to creating the new output packet, method 200 includes adding contents of the input packet to the output packet. For example, in some cases, packet management component 104 aggregates, in packet buffer 110, data corresponding to input packet 130 with data corresponding to one or more previously stored or otherwise held input packets. As another example, in some cases, packet management component 104 adds data corresponding to input packet 130 to the newly created output packet in packet buffer 110.
- at 212, method 200 includes determining whether a timeout condition is satisfied. For example, in some cases, timeout detection component 122 checks a timeout storage tracker that indicates an amount of time at least a portion of the output packet has been stored or otherwise held at packet buffer 110. In response to the timeout storage tracker exceeding a timeout threshold, timeout detection component 122 determines that a timeout condition is satisfied. In response to determining that the timeout condition is satisfied, method 200 proceeds to 216. In response to the timeout storage tracker failing to exceed the timeout threshold, timeout detection component 122 determines that the timeout condition is not satisfied. In response to determining that the timeout condition is not satisfied, method 200 proceeds to 214.
- in some embodiments, 212 further includes determining whether a size of the output packet exceeds an output size threshold, and, in response to determining that the size of the output packet exceeds the output size threshold, proceeding to 216. In some embodiments, determining that the size of the output packet exceeds the output size threshold and proceeding to 216 is performed additionally or alternatively in other portions of method 200 including, for example, between 202 and 204.
- at 214, method 200 includes determining whether an incoming input packet is indicated. For example, in some cases, packet management component 104 determines whether command processor 102 is sending an input packet. In response to detecting an incoming input packet, method 200 proceeds to 202. In response to failing to detect an input packet, method 200 proceeds to 212.
- at 216, in response to determining that the input packet indicates a send condition or in response to determining that the timeout condition is satisfied, method 200 includes sending the output packet to a compute unit. For example, in some cases, in response to input packet 130 indicating a send condition, packet management component 104 closes the output packet and sends the output packet to compute unit 106 as output packet 132. As another example, in some cases, in response to timeout detection component 122 detecting that a timeout condition is satisfied, packet management component 104 closes the output packet and sends the output packet to compute unit 106 as output packet 132.
- subsequent to sending the output packet to compute unit 106, method 200 includes performing a send condition if it is indicated (e.g., at 204). For example, in response to input packet 130 indicating a send condition, packet management component 104 sends output packet 132 to compute unit 106 and then performs the indicated send condition. Accordingly, a method of aggregating data from input packets is depicted.
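The control flow of method 200 can be modeled in software to make the branching easier to follow. The Python sketch below is a simplified model, not the hardware implementation the patent describes: the class and field names are hypothetical, time is passed in explicitly as a number, the size check uses the "more than 31 indices" example for a 32-wide fabric, and oversized packets are not split into multiple wavefronts.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class InputPacket:
    indices: List[int] = field(default_factory=list)
    send_condition: Optional[str] = None  # e.g. "context_switch" (hypothetical label)

class PacketManager:
    """Software model of method 200; step numbers refer to FIG. 2."""

    def __init__(self, timeout_threshold: float, size_threshold: int = 31):
        self.timeout_threshold = timeout_threshold
        self.size_threshold = size_threshold
        self.open_packet: Optional[List[int]] = None
        self.opened_at: Optional[float] = None
        self.sent: List[List[int]] = []   # output packets sent to the compute unit
        self.performed: List[str] = []    # send conditions performed after sending

    def _send(self) -> None:              # 216: close and send the open output packet
        if self.open_packet:
            self.sent.append(self.open_packet)
        self.open_packet = None
        self.opened_at = None

    def receive(self, packet: InputPacket, now: float) -> None:   # 202
        if packet.send_condition is not None:                     # 204
            self._send()                                          # 216
            self.performed.append(packet.send_condition)          # perform the send condition
            return
        if self.open_packet is None:                              # 206
            self.open_packet = []                                 # 208
            self.opened_at = now
        self.open_packet.extend(packet.indices)                   # 210
        self.poll(now)                                            # 212

    def poll(self, now: float) -> None:   # 212, repeated while no packet arrives (214)
        if self.open_packet is None:
            return
        timed_out = now - self.opened_at > self.timeout_threshold
        too_big = len(self.open_packet) > self.size_threshold
        if timed_out or too_big:
            self._send()                                          # 216

manager = PacketManager(timeout_threshold=10.0)
manager.receive(InputPacket([1, 2]), now=0.0)
manager.receive(InputPacket([3]), now=1.0)
manager.receive(InputPacket(send_condition="context_switch"), now=2.0)
manager.receive(InputPacket([4]), now=3.0)
manager.poll(now=20.0)  # held longer than the timeout threshold
print(manager.sent, manager.performed)
```

In this model the timer starts when an output packet is opened, which corresponds to tracking the data stored the longest. A hardware implementation would evaluate these conditions as packets arrive rather than through explicit `poll` calls.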
- FIG. 3 is a block diagram depicting a timeline 300 that illustrates an example packet management component processing input and output packets in accordance with some embodiments.
- input packets 302-312 and 316 are received at a packet management component (e.g., packet management component 104). Further, event 314 is detected at the packet management component.
- Input packet 308 indicates a context switch (a send condition).
- input packet 308 indicates that input packets 302-306 correspond to a different context than subsequently received input packets 310 and 312. Accordingly, in response to detecting the send condition, the output packet including the draw data indicated by input packets 302-306 is sent and then the context switch is performed.
- a new output packet is created and draw data (draw4) indicated by input packet 310 is added to the output packet.
- draw data (draw5) indicated by input packet 312 is added to the output packet.
- a timeout detection component detects that a packet storage timer indicates that at least a portion of the data in the output packet (e.g., the draw data indicated by input packet 310) has been stored for longer than a timeout threshold. Accordingly, at event 314, a timeout condition is satisfied and the output packet including the draw data indicated by input packets 310 and 312 is sent.
- a new output packet is created and draw data (draw6) indicated by input packet 316 is added to the output packet. Accordingly, an example timeline 300 of input and output packets is illustrated.
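Timeline 300 can be replayed as a short event list to check which draws end up in which output packet. This Python sketch is illustrative only: the event encoding and the `replay` function are hypothetical, and the labels draw1 through draw3 for input packets 302-306 are assumed by analogy with draw4 through draw6, which the text names explicitly.

```python
def replay(events):
    """Group draw data into output packets; any non-draw event flushes held data."""
    sent, held = [], []
    for kind, payload in events:
        if kind == "draw":
            held.append(payload)      # aggregate into the open output packet
        elif held:                    # send condition or timeout: send what is held
            sent.append(held)
            held = []
    return sent, held

timeline = [
    ("draw", "draw1"),                      # input packet 302 (label assumed)
    ("draw", "draw2"),                      # input packet 304 (label assumed)
    ("draw", "draw3"),                      # input packet 306 (label assumed)
    ("send_condition", "context switch"),   # input packet 308
    ("draw", "draw4"),                      # input packet 310
    ("draw", "draw5"),                      # input packet 312
    ("timeout", None),                      # event 314
    ("draw", "draw6"),                      # input packet 316
]
sent, held = replay(timeline)
print(sent)   # output packets, in the order they were sent
print(held)   # data still held in the open output packet
```

Two output packets are sent, one before the context switch and one at the timeout, while draw6 remains in a newly opened output packet.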
- FIG. 4 is a block diagram illustrating an example GPU 400 that includes packet management component 104, which includes packet buffer 110 in accordance with some embodiments.
- packet management component 104 aggregates indices 410-414 from input packets 402-406 in packet buffer 110. As a result, indices 410-414 are aggregated and stored together in packet buffer 110.
- in response to a register state update indication 416 from input packet 408, packet management component 104 sends indices 410-414 in an output packet 420.
- input packets 402-408 correspond to input packets 130 of FIG. 1 and output packet 420 corresponds to output packet 132 of FIG. 1.
- a method includes: receiving, by a packet management component from a command processor of a graphics processing unit (GPU), a first input packet indicating a first set of commands; in response to determining that the first input packet does not indicate a send condition, automatically aggregating data corresponding to the first input packet with previously received packet data stored at a packet buffer of the packet management component.
- the method includes receiving a second input packet indicating a second set of commands received from the GPU; in response to determining that the second input packet indicates a send condition, sending the aggregated data to a compute unit in an output packet; and performing an operation indicated by the send condition.
- the first input packet includes a first plurality of draw indices
- the previously received packet data includes a second plurality of draw indices
- the aggregated data includes the first plurality of draw indices and the second plurality of draw indices.
- the output packet is a wavefront including a set of operations to be performed by the compute unit of the GPU.
- the second input packet indicates at least one of a register state update, a context switch, or a pipeline flush.
- the method includes: subsequent to performing the operation, receiving a third input packet indicating a third set of commands received from the GPU; storing data corresponding to the third input packet at the packet buffer; and in response to detecting that a timeout condition has been satisfied, sending the third input packet to the compute unit in a second output packet.
- the method includes: subsequent to performing the operation, receiving a third input packet indicating a third set of commands received from the GPU; storing data corresponding to the third input packet at the packet buffer; and in response to detecting that an amount of second aggregated data stored at the packet buffer exceeds an output size threshold, sending the third input packet to the compute unit in a second output packet.
- the output size threshold is user programmable.
- a graphics processing unit includes: a command processor configured to send input packets indicating commands received from the GPU; a packet management component, including: a packet buffer configured to store data corresponding to the input packets received from the command processor; a packet aggregation component configured to: identify state information of an incoming first input packet; in response to the state information indicating an aggregation condition, aggregate data corresponding to the first input packet with data corresponding to a second input packet stored at the packet buffer; and in response to the state information indicating a send condition, send an output packet for processing by a compute unit, wherein the output packet includes aggregated data stored at the packet buffer.
- the packet aggregation component comprises a timeout detection component configured to cause the output packet to be sent in response to an amount of time at least a portion of the data corresponding to the second input packet has been stored exceeding a timeout threshold.
- the timeout threshold is user-specified.
- the output packet is a wavefront.
- the aggregated data includes a portion of the first input packet and a portion of the second input packet.
- the aggregated data includes the first input packet and the second input packet.
- a method includes: receiving, by a packet management component from a command processor, a first input packet indicating a first set of commands received from a graphics processing unit (GPU); storing data corresponding to the first input packet at a packet buffer of the packet management component; receiving a second input packet indicating a second set of commands received from the GPU; in response to determining that an output condition has not been satisfied, automatically aggregating data corresponding to the second input packet with the data corresponding to the first input packet; and in response to determining that an output condition has been satisfied, sending the aggregated data to one or more compute units in one or more output packets.
- determining that the output condition has been satisfied is performed in response to determining that an amount of the aggregated data stored at the packet buffer exceeds an output size threshold.
- determining that the output condition has been satisfied comprises determining that a third input packet indicates a send condition.
- the method includes: in response to receiving the first input packet, starting, at a timeout detection component of the packet management component, a packet storage timer.
- determining that the output condition has been satisfied comprises determining that the packet storage timer exceeds a timeout threshold.
- the timeout threshold is user-specified.
- a computer readable storage medium includes any non-transitory storage medium, or combination of non-transitory storage media, accessible by a computer system during use to provide instructions and/or data to the computer system.
- such storage media includes, but is not limited to, optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-Ray disc), magnetic media (e.g., floppy disc, magnetic tape, or magnetic hard drive), volatile memory (e.g., random access memory (RAM) or cache), non-volatile memory (e.g., read-only memory (ROM) or Flash memory), or microelectromechanical systems (MEMS)-based storage media.
- the computer readable storage medium is embedded in the computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB)-based Flash memory), or coupled to the computer system via a wired or wireless network (e.g., network accessible storage (NAS)).
- certain aspects of the techniques described above are implemented by one or more processors of a processing system executing software.
- the software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium.
- the software includes the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above.
- the non-transitory computer readable storage medium includes a magnetic or optical disk storage device, solid state storage devices such as Flash memory, a cache, random access memory (RAM) or other non-volatile memory device or devices, and the like.
- the executable instructions stored on the non-transitory computer readable storage medium are in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executable by one or more processors.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/713,472 US11210757B2 (en) | 2019-12-13 | 2019-12-13 | GPU packet aggregation system |
PCT/US2020/063923 WO2021119072A1 (en) | 2019-12-13 | 2020-12-09 | Gpu packet aggregation system |
Publications (2)
Publication Number | Publication Date |
---|---|
EP4073639A1 (en) | 2022-10-19 |
EP4073639A4 (en) | 2024-01-10 |
Family
ID=76316977
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20899498.8A Pending EP4073639A4 (en) | 2019-12-13 | 2020-12-09 | Gpu packet aggregation system |
Country Status (6)
Country | Link |
---|---|
US (1) | US11210757B2 (en) |
EP (1) | EP4073639A4 (en) |
JP (1) | JP7528217B2 (en) |
KR (1) | KR102709341B1 (en) |
CN (1) | CN114902181A (en) |
WO (1) | WO2021119072A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12106112B2 (en) * | 2020-12-03 | 2024-10-01 | Intel Corporation | Methods and apparatus to generate graphics processing unit long instruction traces |
CN113626369B (en) * | 2021-08-14 | 2023-05-26 | 苏州浪潮智能科技有限公司 | Method, device, equipment and readable medium for multi-node cluster ring communication |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100628619B1 (en) * | 2000-07-10 | 2006-09-26 | 마쯔시다덴기산교 가부시키가이샤 | Apparatus and method of multiple decoding |
WO2005053216A2 (en) * | 2003-11-25 | 2005-06-09 | Dg2L Technologies | Methods and systems for reliable distribution of media over a network |
US7209139B1 (en) * | 2005-01-07 | 2007-04-24 | Electronic Arts | Efficient rendering of similar objects in a three-dimensional graphics engine |
US7839876B1 (en) | 2006-01-25 | 2010-11-23 | Marvell International Ltd. | Packet aggregation |
CN101471826B (en) * | 2007-12-27 | 2012-12-12 | 华为技术有限公司 | Test method and device for command line interface |
US8374986B2 (en) | 2008-05-15 | 2013-02-12 | Exegy Incorporated | Method and system for accelerated stream processing |
JP2010055214A (en) | 2008-08-26 | 2010-03-11 | Sanyo Electric Co Ltd | Data processor |
EP2596470A1 (en) * | 2010-07-19 | 2013-05-29 | Advanced Micro Devices, Inc. | Data processing using on-chip memory in multiple processing units |
CN102323917B (en) * | 2011-09-06 | 2013-05-15 | 中国人民解放军国防科学技术大学 | A method to realize multi-process sharing GPU based on shared memory |
US20130155077A1 (en) | 2011-12-14 | 2013-06-20 | Advanced Micro Devices, Inc. | Policies for Shader Resource Allocation in a Shader Core |
US20130162661A1 (en) * | 2011-12-21 | 2013-06-27 | Nvidia Corporation | System and method for long running compute using buffers as timeslices |
US9509616B1 (en) * | 2014-11-24 | 2016-11-29 | Amazon Technologies, Inc. | Congestion sensitive path-balancing |
KR102287402B1 (en) | 2015-03-23 | 2021-08-06 | 삼성전자주식회사 | Bus Interface Device and Semiconductor Integrated Circuit including the same, and Method of operating the same |
US9830731B2 (en) | 2015-04-01 | 2017-11-28 | Mediatek Inc. | Methods of a graphics-processing unit for tile-based rendering of a display area and graphics-processing apparatus |
US10320695B2 (en) | 2015-05-29 | 2019-06-11 | Advanced Micro Devices, Inc. | Message aggregation, combining and compression for efficient data communications in GPU-based clusters |
US20170300361A1 (en) * | 2016-04-15 | 2017-10-19 | Intel Corporation | Employing out of order queues for better gpu utilization |
KR102479395B1 (en) * | 2016-08-29 | 2022-12-20 | 어드밴스드 마이크로 디바이시즈, 인코포레이티드 | Hybrid render with preferred batch binning and classification of primitives |
US10572258B2 (en) * | 2017-04-01 | 2020-02-25 | Intel Corporation | Transitionary pre-emption for virtual reality related contexts |
CN110223216B (en) * | 2019-06-11 | 2023-01-17 | 西安芯瞳半导体技术有限公司 | A data processing method, device and computer storage medium based on parallel PLB |
CN110415161B (en) * | 2019-07-19 | 2023-06-27 | 龙芯中科(合肥)技术有限公司 | Graphics processing method, device, equipment and storage medium |
2019
- 2019-12-13: US application US16/713,472, publication US11210757B2 (en), status Active
2020
- 2020-12-09: EP application EP20899498.8A, publication EP4073639A4 (en), status Pending
- 2020-12-09: KR application KR1020227019998A, publication KR102709341B1 (en), status Active
- 2020-12-09: WO application PCT/US2020/063923, publication WO2021119072A1 (en), status unknown
- 2020-12-09: CN application CN202080085569.6A, publication CN114902181A (en), status Pending
- 2020-12-09: JP application JP2022534186A, publication JP7528217B2 (en), status Active
Also Published As
Publication number | Publication date |
---|---|
EP4073639A4 (en) | 2024-01-10 |
KR20220113710A (en) | 2022-08-16 |
JP2023505783A (en) | 2023-02-13 |
JP7528217B2 (en) | 2024-08-05 |
US11210757B2 (en) | 2021-12-28 |
US20210183004A1 (en) | 2021-06-17 |
KR102709341B1 (en) | 2024-09-25 |
WO2021119072A1 (en) | 2021-06-17 |
CN114902181A (en) | 2022-08-12 |
Similar Documents
Publication | Title |
---|---|
US10198377B2 (en) | Virtual machine state replication using DMA write records |
US9413683B2 (en) | Managing resources in a distributed system using dynamic clusters |
US10198189B2 (en) | Data allocation among devices with different data rates |
EP3803588A1 (en) | Embedded scheduling of hardware resources for hardware acceleration |
US11831410B2 (en) | Intelligent serverless function scaling |
US9996349B2 (en) | Clearing specified blocks of main storage |
WO2020157599A1 (en) | Engine pre-emption and restoration |
US11210757B2 (en) | GPU packet aggregation system |
US10289418B2 (en) | Cooperative thread array granularity context switch during trap handling |
US20190042454A1 (en) | Techniques to manage cache resource allocations for a processor cache |
CN108139938A (en) | For assisting the device of main thread executing application task, method and computer program using secondary thread |
KR20230025464A (en) | Core selection based on usage policy and core constraints |
CN112783652A (en) | Method, device and equipment for acquiring running state of current task and storage medium |
US20220300312A1 (en) | Hybrid push and pull event source broker for serverless function scaling |
CN104331322B (en) | A kind of process migration method and apparatus |
KR102407781B1 (en) | Graphics context scheduling based on flip queue management |
CN110515729A (en) | Graphical processor-based graph computing node vector load balancing method and device |
US12147814B2 (en) | Dynamic thread count optimizations |
CN102947803B (en) | Instruction is carried out to method, system and the processor that number of times is counted |
US11023274B2 (en) | Method and system for processing data |
WO2024102236A1 (en) | Dynamic thread count optimizations |
CN114428673A (en) | Method for concurrently requesting resource status |
EP4433898A1 (en) | Hierarchical asymmetric core attribute detection |
Legal Events
Code | Title | Description |
---|---|---|
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
17P | Request for examination filed | Effective date: 20220609 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
DAV | Request for validation of the European patent (deleted) | |
DAX | Request for extension of the European patent (deleted) | |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R079; Free format text: PREVIOUS MAIN CLASS: G06F0009380000; Ipc: G06T0001200000 |
A4 | Supplementary search report drawn up and despatched | Effective date: 20231211 |
RIC1 | Information provided on IPC code assigned before grant | Ipc: H04L 49/9057 20220101ALN20231205BHEP; Ipc: H04L 47/625 20220101ALN20231205BHEP; Ipc: G06F 9/48 20060101ALI20231205BHEP; Ipc: G06T 1/20 20060101AFI20231205BHEP |