US12265735B2 - Approach for processing near-memory processing commands using near-memory register definition data - Google Patents
- Publication number
- US12265735B2 (application US17/845,263)
- Authority
- US
- United States
- Prior art keywords
- pim
- command
- register
- execution unit
- source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/7821—Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
Definitions
- Processing In Memory incorporates processing capability within memory modules so that tasks can be processed directly within the memory modules.
- DRAM Dynamic Random-Access Memory
- an example PIM configuration includes vector compute elements and local registers. The vector compute elements and the local registers allow a memory module to perform some computations locally, such as arithmetic computations. This allows a memory controller to trigger local computations at multiple memory modules in parallel without requiring data movement across the memory module interface, which can greatly improve performance, particularly for data-intensive workloads. Examples of data-intensive workloads include machine learning, genomics, and graph analytics.
- One of the challenges with PIM is that when the information required for a complete PIM command needs more bits than the command bus width, multiple command cycles are needed for each PIM command to convey the required information. For example, suppose that N number of bits is needed to specify a command, a source register, and a destination register. If the command bus width is only K number of bits and N is greater than K, then multiple command cycles are needed for each PIM command to convey the N number of bits of command information. Requiring multiple command cycles for each PIM command increases command bus congestion, which reduces throughput and increases power consumption.
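The cycle count in the example above is a ceiling division of the command size N by the bus width K. The Python sketch below is illustrative only; the function name `command_cycles` and the 40-bit and 24-bit sizes are hypothetical values, not figures from the patent.

```python
def command_cycles(info_bits: int, bus_width_bits: int) -> int:
    """Return the number of command-bus cycles needed to convey `info_bits`
    of command information over a bus carrying `bus_width_bits` per cycle."""
    if info_bits <= 0 or bus_width_bits <= 0:
        raise ValueError("bit counts must be positive")
    # Integer ceiling division: ceil(N / K).
    return (info_bits + bus_width_bits - 1) // bus_width_bits


# Hypothetical sizes: a 40-bit command (N) on a 24-bit command bus (K).
print(command_cycles(40, 24))  # 2: N > K, so a second cycle is required
print(command_cycles(20, 24))  # 1: the command fits in a single cycle
```

Any reduction in the bits needed per command that brings N down to K or below removes the extra cycle entirely.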
- Another solution for addressing this problem uses processor instructions that can repeat a single instruction with incrementing operand addresses until a specified condition is satisfied, such as a count threshold, a zero/non-zero result, etc.
- One disadvantage to this solution is that it is only applicable to memory-to-memory string operations where a memory address is incremented, and is not applicable to incrementing source and destination registers.
- Yet another similar solution is vector computing architectures that increment register and operand IDs while repeating an operation, but this solution is only applicable to operations performed on entire vectors of a specified length at a host, and does not allow for fine-grained interleaving of instructions, especially in PIM. There is, therefore, a need for an approach for implementing PIM that addresses the foregoing limitations.
- FIG. 1 is a flow diagram that depicts an approach for processing PIM commands using PIM register definition data.
- FIG. 2 A is a block diagram that depicts an example computing architecture upon which the approach for processing PIM commands using PIM register definition data is implemented.
- FIG. 2 B depicts an example implementation of the memory module in the context of a PIM-enabled DRAM memory module.
- FIG. 2 C is a block diagram that depicts an example implementation of a PIM execution unit.
- FIG. 3 depicts two example PIM code segments that include similar computations with varied register operands.
- FIG. 4 A depicts a table of PIM register definition data that specifies two pre-defined combinations of source and/or destination registers for each of the four PIM commands.
- FIG. 4 B depicts a table of PIM register definition data that specifies, for each PIM command, a combination of source and/or destination registers and corresponding update functions.
- FIG. 5 is a flow diagram that depicts an approach for processing near-memory processing commands, e.g., PIM commands, using PIM register definition data that specifies pre-defined combinations of source and/or destination registers.
- FIG. 6 depicts an example of dynamically updating the destination register for a pim-load command using PIM register definition data over three iterations.
- FIG. 7 is a flow diagram that depicts an approach for processing PIM commands using PIM register definition data that specifies update functions for dynamically determining source and/or destination registers for PIM commands.
- PIM register definition data defines multiple combinations of source and/or destination registers to be used to process PIM commands.
- a particular combination of source and/or destination registers to be used to process a PIM command is specified by the PIM command or determined by a near-memory processing element processing the PIM command.
- the PIM register definition data specifies initial source and/or destination registers and one or more update functions for each PIM command.
- a near-memory processing element processes a PIM command using the initial source and/or destination registers and uses the one or more update functions to update the source and/or destination registers to be used the next time the PIM command is processed, e.g., by changing a source register value, a destination register value, or both the source register value and the destination register value. Applying an update function may, for example, increment or decrement a source or destination register value by a specified amount.
- the approach harnesses commonality in source and/or destination registers among PIM commands to reduce the amount of data in PIM commands, e.g., bits, which need to be allocated to specify source and destination registers in PIM commands, and makes those bits available for other purposes.
- the approach eliminates the need for multiple command cycles to provide all of the information needed for a PIM command. This reduces command bus traffic and power consumption, while maintaining fine-grained control.
- the approach is particularly beneficial for code segments that repeat similar computations with varied operands that specify different source and/or destination registers.
- Implementations are described herein in the context of PIM and PIM commands for purposes of explanation, but implementations are applicable to any type of near-memory processing technology. Implementations are also described herein in the context of near-memory registers for purposes of explanation, but implementations are applicable to any type of near-memory local storage, such as buffers, etc. As used herein, the term “near-memory” refers to anywhere within or near a memory module, such as at caches, memory controllers, etc.
- FIG. 1 is a flow diagram 100 that depicts an approach for processing PIM commands using PIM register definition data.
- a near-memory processing element receives a PIM command.
- a PIM execution unit in or near a DRAM memory module receives a PIM command from a memory controller.
- the PIM command specifies a particular PIM command and variable information, such as a memory location, but does not specify one or more source and/or destination registers that would ordinarily be specified by a PIM command.
- the near-memory processing element determines one or more source and/or destination registers for the PIM command using PIM register definition data.
- the PIM register definition data defines multiple combinations of source and/or destination registers
- the near-memory processing element uses a particular combination of source and/or destination registers as specified by the PIM command or determined by the near-memory processing element, as described in more detail hereinafter.
- the PIM register definition data specifies how to dynamically determine PIM registers
- the near-memory processing element determines the initial combination of source and/or destination registers to be used to process the PIM command.
- In step 106, the near-memory processing element processes the PIM command using the source and/or destination registers determined using the PIM register definition data.
- In step 108, in implementations where the PIM register definition data specifies how to dynamically determine the source and/or destination registers, the near-memory processing element updates the source and/or destination registers for the next invocation of the PIM command using the PIM register definition data, as described in more detail hereinafter.
- FIG. 2 A is a block diagram that depicts an example computing architecture 200 upon which the approach for processing PIM commands using PIM register definition data is implemented.
- the computing architecture 200 includes a processor 210 , a memory controller 220 , and a memory module 230 .
- the computing architecture 200 includes fewer, additional, and/or different elements depending upon a particular implementation.
- implementations are applicable to computing architecture 200 with any number of processors, memory controllers and memory modules.
- the processor 210 is any type of processor, such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), an accelerator, a Digital Signal Processor (DSP), etc.
- the processor 210 includes the capability, e.g., via memory command logic, to issue near-memory processing commands, such as PIM commands.
- the memory module 230 is any type of memory module, such as a Dynamic Random Access Memory (DRAM) module, a Static Random Access Memory (SRAM) module, etc. According to an implementation the memory module 230 is a PIM-enabled memory module.
- the memory controller 220 manages the flow of data between the processor 210 and the memory module 230 and is implemented as a stand-alone element or in the processor 210 , for example on a separate die from the processor 210 , on the same die but separate from the processor, or integrated into the processor circuitry as an integrated memory controller.
- the memory controller 220 is depicted in the figures and described herein as a separate element for explanation purposes.
- FIG. 2 B depicts an example implementation of the memory module 230 in the context of a PIM-enabled DRAM memory module communicatively coupled to the memory controller 220 via a command bus 240 and a data bus 250 .
- the PIM-enabled DRAM memory module includes N number of banks, where each bank includes a corresponding PIM execution unit.
- the PIM execution units include processing logic and local storage in the form of registers for performing local computations.
- the memory module 230 includes fewer or additional elements that vary depending upon a particular implementation.
- FIG. 2 C is a block diagram that depicts an example implementation of a PIM execution unit 260 that includes processing logic 262 , local storage 264 and PIM register definition data 266 .
- the PIM execution unit 260 includes other elements and functionality that vary depending upon a particular implementation.
- Although the processing logic 262, the local storage 264 and the PIM register definition data 266 are depicted in FIG. 2 C as separate elements, their respective functionality may be combined in any manner depending upon a particular implementation.
- the processing logic 262 processes PIM commands using the PIM register definition data 266 and is implemented by computer hardware elements, computer software, or any combination of computer hardware elements and computer software.
- the local storage 264 is used by the processing logic 262 for performing computations and is implemented, for example, by one or more registers, although any type of local storage may be used.
- the PIM register definition data 266 generally specifies combinations of source and/or destination registers to be used to process PIM commands. As described in more detail hereinafter, in one implementation the PIM register definition data 266 defines pre-defined combinations of source and/or destination registers that are selectable for use with each PIM command.
- the PIM register definition data 266 defines an initial combination of source and/or destination registers to be used with each PIM command and one or more update functions to update the combination of source and/or destination registers to be used to process subsequent invocations of each PIM command.
- the PIM register definition data 266 is stored, for example, in a command buffer in the PIM execution unit 260 and is configurable. Although implementations are depicted in the figures and described herein in the context of the PIM register definition data 266 being stored within the PIM execution unit 260 , implementations are not limited to this example, and the PIM register definition data may be stored external to the PIM execution unit 260 , within the memory module 230 or external to the memory module 230 .
- FIG. 3 depicts two example PIM code segments that include similar computations with varied register operands.
- the registers are local to the near-memory processing element that is processing the PIM commands, for example, in the local storage 264 .
- the first pim-load command loads a value from memory at column-address 0 into register reg 0 .
- the second pim-load command loads a value from memory at column-address 1 into register reg 2 .
- the pim-multiply commands multiply the values in the first and second registers and store the result in the third register.
- the first pim-multiply command multiplies the value in register reg 0 and register reg 0 and stores the result in register reg 1 .
- the pim-add (reg 0 , reg 1 , reg 0 ) command adds the values stored in registers reg 0 and reg 1 , and stores the result in register reg 0 .
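To make the FIG. 3 segments concrete, the Python sketch below executes them with fully explicit register operands. It is a hypothetical software model: the tuple layout (opcode, then operands with the destination last), the memory values, and the 8-register file are assumptions. The first segment follows the commands described above; the second is reconstructed from Combination 2 of table 400 (load into reg 2, add reg 2 and reg 3 into reg 2), and its pim-multiply operands are an assumption. pim-store is omitted.

```python
from typing import List, Sequence


def run_segment(segment: Sequence[tuple], memory: List[int], regs: List[int]) -> None:
    """Execute a PIM code segment whose commands name every register explicitly."""
    for opcode, *operands in segment:
        if opcode == "pim-load":
            column, dst = operands
            regs[dst] = memory[column]
        elif opcode == "pim-multiply":
            src1, src2, dst = operands
            regs[dst] = regs[src1] * regs[src2]
        elif opcode == "pim-add":
            src1, src2, dst = operands
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError(f"unsupported opcode: {opcode}")


# Segment 1: the commands described for FIG. 3.
SEGMENT_1 = [("pim-load", 0, 0), ("pim-multiply", 0, 0, 1), ("pim-add", 0, 1, 0)]
# Segment 2: same computation, registers shifted by two (multiply operands assumed).
SEGMENT_2 = [("pim-load", 1, 2), ("pim-multiply", 2, 2, 3), ("pim-add", 2, 3, 2)]

memory = [3, 5]  # hypothetical values at column addresses 0 and 1
regs = [0] * 8   # hypothetical 8-register local storage
run_segment(SEGMENT_1, memory, regs)
run_segment(SEGMENT_2, memory, regs)
print(regs[0], regs[2])  # 12 30
```

The two segments differ only in their register operands, which is the commonality that the PIM register definition data is meant to exploit.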
- the PIM register definition data 266 specifies multiple pre-defined combinations of source and/or destination registers.
- FIG. 4 A depicts a table 400 of PIM register definition data 266 that specifies two pre-defined combinations of source and/or destination registers for each of the four PIM commands of FIG. 3, i.e., two pre-defined combinations each for the pim-load, pim-multiply, pim-add, and pim-store commands. Each combination specifies a particular source and/or destination register for the corresponding PIM command.
- the first command in the table 400 is a pim-load command for which the first combination (Combination 1 ) specifies that data from location X, where “X” represents, for example, a location in memory, is to be stored in register reg 0 .
- the second combination (Combination 2 ) of the pim-load command specifies that the data from location X is to be stored in register reg 2 .
- the first combination (Combination 1 ) for the pim-add command specifies that the value stored in register reg 0 is added to the value stored in register reg 1 , and then the sum is stored in register reg 0 .
- the second combination (Combination 2) for the pim-add command specifies that the value stored in register reg 2 is added to the value stored in register reg 3, and the sum is stored in register reg 2.
- the particular register combinations depicted in FIG. 4 A are for example purposes only and any number and type of register combinations may be used.
- the combination of source and/or destination registers to be used is specified by the PIM command.
- PIM commands include an operand that specifies the combination of source and/or destination registers to be used for a particular PIM command.
- an operand of zero corresponds to Combination 1 while an operand of one corresponds to Combination 2 .
- the combination is specified by other information in the PIM command, such as low order bits of a DRAM column index in the PIM command.
- the PIM command specifies the particular command and the combination of source and/or destination registers using fewer bits than approaches that specify the source and destination registers explicitly in the PIM command.
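The bit savings can be estimated by comparing one register field per operand against a single combination-index field. The Python sketch below assumes a hypothetical 16-register local storage and a three-register pim-add; neither number comes from the patent, and real command encodings may carry other fields.

```python
def bits_for(n: int) -> int:
    """Minimum number of bits needed to encode n distinct choices."""
    if n < 1:
        raise ValueError("need at least one choice")
    return (n - 1).bit_length()


def explicit_register_bits(num_registers: int, num_register_operands: int) -> int:
    # Each source/destination register is named by its own field.
    return num_register_operands * bits_for(num_registers)


def combination_bits(num_combinations: int) -> int:
    # One field selects a pre-defined combination from the definition data.
    return bits_for(num_combinations)


# Hypothetical: 16 local registers, pim-add naming two sources and one destination.
print(explicit_register_bits(16, 3))  # 12
print(combination_bits(2))            # 1
```

Under these assumptions, two pre-defined combinations replace 12 bits of register fields with a single bit.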
- the current combination of source and/or destination registers is tracked and automatically incremented on each invocation of a PIM command. For example, the first time that a pim-load command is executed, the source and/or destination register combination specified by Combination 1 is used. The second time that the pim-load command is executed, the source and/or destination register combination specified by Combination 2 is used. This continues until the last combination of source and/or destination registers has been used; on the next invocation of the pim-load command, the current combination “rolls over” to the first combination and Combination 1 is used again.
- the processing logic 262 tracks the current combination of source and/or destination registers for each PIM command and advances it to the next combination upon each invocation of the corresponding PIM command.
- This implementation provides the additional technical benefit that PIM commands do not even need to specify the combination of source and/or destination registers to be used, which further reduces the amount of data required for a complete PIM command.
- Although FIG. 4 A depicts two combinations of source and/or destination registers for each PIM command, implementations are not limited to two combinations, and the PIM register definition data 266 may specify any number of combinations of source and/or destination registers.
- the number of combinations of source and/or destination registers may be different for each PIM command.
- For example, one PIM command has two combinations of source and/or destination registers, as depicted in the table 400 of FIG. 4 A, while another PIM command has N number of combinations of source and/or destination registers. This provides the capability and flexibility for software developers to configure the PIM register definition data 266 in a manner that is best suited for particular code regions.
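The tracked, rolling selection of combinations described above can be modeled with a per-command counter taken modulo the number of combinations for that command. The Python sketch below is a hypothetical software model, not the hardware in the processing logic 262: `CombinationTracker` and the register tuples are illustrative, and the three-entry "pim-store" row is invented only to show that commands may have different numbers of combinations.

```python
from typing import Dict, List, Tuple

Registers = Tuple[int, ...]


class CombinationTracker:
    """Tracks the current pre-defined combination for each PIM command and
    advances it, with roll-over, on every invocation."""

    def __init__(self, table: Dict[str, List[Registers]]):
        self.table = table
        self.current = {command: 0 for command in table}

    def next_combination(self, command: str) -> Registers:
        combos = self.table[command]
        index = self.current[command]
        # Advance for the next invocation, rolling over to Combination 1.
        self.current[command] = (index + 1) % len(combos)
        return combos[index]


TABLE = {
    "pim-load": [(0,), (2,)],           # destination register (table 400)
    "pim-add": [(0, 1, 0), (2, 3, 2)],  # (source 1, source 2, destination)
    "pim-store": [(0,), (2,), (4,)],    # hypothetical: three combinations
}

tracker = CombinationTracker(TABLE)
print([tracker.next_combination("pim-load") for _ in range(3)])   # [(0,), (2,), (0,)]
print([tracker.next_combination("pim-store") for _ in range(4)])  # [(0,), (2,), (4,), (0,)]
```

Because each command keeps its own counter, invocations of different commands can be interleaved freely without disturbing one another's position in the table.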
- FIG. 5 is a flow diagram 500 that depicts an approach for processing near-memory processing commands, e.g., PIM commands, using PIM register definition data that specifies pre-defined combinations of source and/or destination registers.
- a near-memory processing element receives a PIM command.
- the PIM execution unit 260 receives a PIM command that specifies a command and a particular combination of source and/or destination registers to be used to process the PIM command, e.g., by an operand of the PIM command.
- Alternatively, the particular combination of source and/or destination registers is not specified by the PIM command and is instead determined by the near-memory processing element, as previously described herein.
- the near-memory processing element determines one or more source and/or destination registers for the PIM command using PIM register definition data.
- the processing logic 262 in the PIM execution unit 260 uses the PIM register definition data 266 to determine the particular source and/or destination registers for the combination of source and/or destination registers specified by the PIM command or determined by the processing logic 262, e.g., based upon the current combination of source and/or destination registers.
- the near-memory processing element processes the PIM command using the determined source and/or destination registers.
- the processing logic 262 processes the PIM command using the source and/or destination registers determined in step 504.
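The FIG. 5 flow can be sketched end to end as a small table-driven execution unit. This Python model is hypothetical: the class `PimExecutionUnit`, its method names, the operand layout, and the power-of-two restriction on low-order-bit selection are assumptions for illustration. It resolves the combination from an explicit operand, from the low-order bits of the column index, or from a tracked per-command counter, and then executes pim-load or pim-add with the resulting registers.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Registers = Tuple[int, ...]

# Combination 1 and Combination 2 for the two table 400 rows described in the text.
TABLE_400: Dict[str, List[Registers]] = {
    "pim-load": [(0,), (2,)],           # (destination,)
    "pim-add": [(0, 1, 0), (2, 3, 2)],  # (source 1, source 2, destination)
}


class PimExecutionUnit:
    def __init__(self, table: Dict[str, List[Registers]], num_registers: int = 16):
        self.table = table
        self.regs = [0] * num_registers
        self.current = {command: 0 for command in table}  # tracked combination

    def determine_registers(self, command: str, combo: Optional[int] = None,
                            column: Optional[int] = None) -> Registers:
        combos = self.table[command]
        if combo is not None:
            index = combo  # operand in the command: 0 -> Combination 1, 1 -> Combination 2
        elif column is not None:
            n = len(combos)
            if n & (n - 1):
                raise ValueError("low-order-bit selection assumes a power-of-two count")
            index = column & (n - 1)  # low-order bits of the DRAM column index
        else:
            index = self.current[command]  # tracked by the execution unit
            self.current[command] = (index + 1) % len(combos)
        return combos[index]

    def process(self, command: str, memory: Sequence[int], column: int = 0,
                combo: Optional[int] = None, select_by_column: bool = False) -> Registers:
        regs = self.determine_registers(
            command, combo, column if select_by_column else None)
        if command == "pim-load":
            (dst,) = regs
            self.regs[dst] = memory[column]
        elif command == "pim-add":
            src1, src2, dst = regs
            self.regs[dst] = self.regs[src1] + self.regs[src2]
        else:
            raise ValueError(f"unsupported command: {command}")
        return regs


unit = PimExecutionUnit(TABLE_400)
memory = [7, 11]                                     # hypothetical memory contents
unit.process("pim-load", memory, column=0, combo=0)  # Combination 1: reg 0 <- 7
unit.process("pim-load", memory, column=1, combo=1)  # Combination 2: reg 2 <- 11
print(unit.regs[0], unit.regs[2])  # 7 11
```

In this sketch an explicit operand does not advance the tracked counter; whether a real implementation couples the two is left open by the text.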
- combinations of source/and or destination registers are dynamically determined using the PIM register definition data 266 . This includes dynamically determining the source and/or destination registers for PIM commands using update functions.
- FIG. 4 B depicts a table 410 of PIM register definition data 266 that specifies, for each PIM command, a combination of source and/or destination registers and corresponding update functions.
- the initial values for the source and/or destination registers in the table 410 are used the first time that a PIM command is processed and then the update functions are used to update the source and/or destination register values for the next time that the PIM command is processed, and so on.
- the processing logic 262 uses register reg 0 as the destination for the value stored at location X.
- the processing logic 262 uses the “Add 2” function in the “Destination Update Function” column to increment the destination register value by two for the next time that the pim-load command is processed.
- the processing logic 262 stores the updated destination value of register reg 2 in the Destination column for the pim-load command in the table 410 .
- the updated destination value is stored elsewhere, such as in the local storage 264 .
- the source and destination register values for other PIM commands are not updated.
- the Source 1 Update Function and the Source 2 Update Function are indicated in the table 410 as not applicable (n/a) since the source for a pim-load command is specified as an operand in the command.
- FIG. 6 depicts an example of dynamically updating the destination register value for the pim-load command using the table 410 of PIM register definition data 266 over three iterations.
- a set of instructions 600 includes three pim-load commands with the respective sources of locations L 1 , L 2 , and L 3 in a memory 610 .
- the memory 610 is depicted as a two-dimensional array for purposes of explanation only and implementations are applicable to any type of memory arrangement.
- a set of registers 620 includes registers Reg 0 -RegN implemented, for example, in the local storage 264 .
- the processing logic 262 determines the initial value for the destination register of register reg 0 from table 410 .
- the value from location L 1 in the memory 610 is loaded into register reg 0 .
- the processing logic 262 then updates the destination register value using the Destination Update Function from table 410, adding two to the destination register value to obtain register reg 2.
- the value from location L 2 in the memory 610 is stored in reg 2 and the destination register value is again incremented by two to register reg 4 .
- the value from location L 3 in the memory 610 is stored in register reg 4 .
- the current value in register reg 0 is added to the value in register reg 1 and the sum is stored in register reg 0 .
- the source and destination register values are each incremented by two so the next time a pim-add command is processed, the current value in register reg 2 is added to the value in register reg 3 and the sum is stored in register reg 2 , and so on.
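The table 410 mechanics and the FIG. 6 trace can be reproduced with a small model in which each row holds the current register values and optional update functions. The Python sketch below is hypothetical: `Entry`, `DynamicPimUnit`, `add_n`, the memory contents at L1 to L3, and the preset values in reg 1 and reg 3 are illustrative assumptions. Like the implementation described above, it writes each updated value back into the table row after the command is processed; None stands for an update function marked n/a.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Optional

Update = Callable[[int], int]


def add_n(n: int) -> Update:
    """Update function that adds n to a register value (e.g., "Add 2")."""
    return lambda reg: reg + n


@dataclass
class Entry:
    """One row of a table-410-style register definition (None means n/a)."""
    dst: int
    src1: Optional[int] = None
    src2: Optional[int] = None
    dst_update: Optional[Update] = None
    src1_update: Optional[Update] = None
    src2_update: Optional[Update] = None


class DynamicPimUnit:
    def __init__(self, table: Dict[str, Entry], num_registers: int = 16):
        self.table = table
        self.regs = [0] * num_registers

    def process(self, command: str, memory: Mapping[str, int],
                location: Optional[str] = None) -> tuple:
        e = self.table[command]
        if command == "pim-load":
            self.regs[e.dst] = memory[location]  # source is an operand of the command
        elif command == "pim-add":
            self.regs[e.dst] = self.regs[e.src1] + self.regs[e.src2]
        else:
            raise ValueError(f"unsupported command: {command}")
        used = (e.src1, e.src2, e.dst)
        # After processing, store updated register values for the next invocation.
        if e.src1_update is not None:
            e.src1 = e.src1_update(e.src1)
        if e.src2_update is not None:
            e.src2 = e.src2_update(e.src2)
        if e.dst_update is not None:
            e.dst = e.dst_update(e.dst)
        return used


TABLE_410 = {
    "pim-load": Entry(dst=0, dst_update=add_n(2)),
    "pim-add": Entry(dst=0, src1=0, src2=1, dst_update=add_n(2),
                     src1_update=add_n(2), src2_update=add_n(2)),
}
memory = {"L1": 10, "L2": 20, "L3": 30}  # hypothetical contents of memory 610
unit = DynamicPimUnit(TABLE_410)
for loc in ("L1", "L2", "L3"):
    unit.process("pim-load", memory, loc)
print(unit.regs[0], unit.regs[2], unit.regs[4])  # 10 20 30
```

After the three loads, the pim-load row points at reg 6, while the pim-add row is untouched until a pim-add command is processed, matching the statement that other commands' register values are not updated.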
- When an update would move a source or destination register value past the last register (by incrementing) or below the first register (by decrementing), the register value instead rolls over to the other end of the register range to avoid invalid register values.
- For example, if the current register value is the last register, e.g., register reg 9 in a 10-register implementation, and the update function specifies that the register value is to be incremented by one, the next register value rolls over to register reg 0.
- Similarly, if the current register value is zero and the update function specifies that the register value is to be decremented by one, the next register value rolls over to register reg 9.
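Modulo arithmetic over the number of registers is one straightforward way to get the roll-over behavior described above; the text does not prescribe a mechanism, so the Python function below (`update_register`, a hypothetical name) is a sketch under that assumption, using the 10-register example.

```python
def update_register(current: int, delta: int, num_registers: int) -> int:
    """Apply an increment (delta > 0) or decrement (delta < 0) update and roll
    over so the result is always a valid register index."""
    if not 0 <= current < num_registers:
        raise ValueError("current register out of range")
    # Python's % returns a value in [0, num_registers) for a positive modulus,
    # so a result of -1 wraps to num_registers - 1.
    return (current + delta) % num_registers


print(update_register(9, 1, 10))   # 0: reg 9 rolls over to reg 0
print(update_register(0, -1, 10))  # 9: reg 0 rolls over to reg 9
print(update_register(8, 2, 10))   # 0: "Add 2" from reg 8 rolls over to reg 0
```

In C or C++, `%` can return a negative result for a negative left operand, so an equivalent implementation there would need to add `num_registers` before taking the remainder.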
- Implementations are not limited to the example update functions depicted in FIG. 4 B and source and destination register values may be decremented and incremented by different amounts.
- the update functions include various types of logic to update the source and destination register values. For example, update logic specifies that register values are incremented until a specified register value is reached and then the register value is reset to the first register, or a specified register. As another example, update logic specifies that if the value stored in a particular register or memory location satisfies one or more criteria, then the register value is incremented or decremented by a specified amount, or updated to a specified value.
- update functions can include any arithmetic operation, such as addition, subtraction, multiplication, division, etc.
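The kinds of update logic listed above can be expressed as small functions that map a current register value to the next one. The Python sketch below is illustrative; the factory names `increment_until`, `conditional_step`, and `scale`, and the way the monitored value is read through a callback, are assumptions rather than anything specified in the text.

```python
from typing import Callable

Update = Callable[[int], int]


def increment_until(step: int, limit: int, reset_to: int = 0) -> Update:
    """Increment by `step` until `limit` is reached, then reset to `reset_to`."""
    def update(reg: int) -> int:
        nxt = reg + step
        return reset_to if nxt > limit else nxt
    return update


def conditional_step(read_value: Callable[[], int],
                     predicate: Callable[[int], bool], step: int) -> Update:
    """Move the register value by `step` only when the monitored value (e.g., the
    contents of a particular register or memory location) satisfies `predicate`;
    otherwise leave the register value unchanged."""
    def update(reg: int) -> int:
        return reg + step if predicate(read_value()) else reg
    return update


def scale(factor: int) -> Update:
    """An update built from another arithmetic operation (multiplication)."""
    return lambda reg: reg * factor


wrap6 = increment_until(step=2, limit=6)
seq = [0]
for _ in range(4):
    seq.append(wrap6(seq[-1]))
print(seq)  # [0, 2, 4, 6, 0]
```

Representing each update as a function keeps the per-command definition data uniform: a row only needs to store which function to apply to each register field.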
- Although implementations are depicted in FIG. 4 B in the context of the update functions being the same, i.e., Add 2, implementations are not limited to this example, and in some implementations different update functions are used for source and destination registers. Also, implementations are not limited to updating source and/or destination registers after a PIM command is processed; according to another implementation, the update functions are applied before a PIM command is processed. For example, the first time that the pim-load command is processed, the destination register value is incremented by two and register reg 2 is used.
- the data in tables 400 , 410 is presented in the figures in table format for explanation purposes only and the data in tables 400 , 410 is stored in any manner and/or format that may vary depending upon a particular implementation.
- FIG. 7 is a flow diagram 700 that depicts an approach for processing PIM commands using PIM register definition data that specifies update functions for dynamically determining source and/or destination registers for PIM commands.
- a near-memory processing element receives a PIM command.
- the PIM execution unit 260 receives a PIM command that specifies a particular PIM command.
- the near-memory processing element determines one or more source and/or destination registers for the PIM command using PIM register definition data. For example, the processing logic 262 in the PIM execution unit 260 identifies the current source and/or destination register values specified in table 410 .
- the near-memory processing element processes the PIM command using the determined source and/or destination registers.
- the processing logic 262 processes the PIM command using the source and/or destination registers determined in step 704.
- the near-memory processing element uses the update functions to update the source and/or destination register values for the next time that the PIM command is processed.
- the processing logic 262 applies the update function(s) to determine new source and/or destination register values to be used the next time that the PIM command is processed.
- Alternatively, as previously described, the update functions are applied before a PIM command is processed.
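The FIG. 7 flow (determine the current registers, process the command, then apply the update functions) can be sketched with the update timing as a switch. In this hypothetical Python model, `DynamicFlowUnit`, the table layout (current register values plus a per-register step, with None for n/a), the 10-register roll-over, and the omission of the actual computation are all assumptions for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple


class DynamicFlowUnit:
    """Determine registers, process, then update; with update_before=True the
    update functions run before the command is processed instead."""

    def __init__(self, table: Dict[str, Tuple[Sequence[int], Sequence[Optional[int]]]],
                 num_registers: int = 10, update_before: bool = False):
        # Copy the initial values so each unit keeps its own state.
        self.current = {cmd: list(values) for cmd, (values, _) in table.items()}
        self.steps = {cmd: list(steps) for cmd, (_, steps) in table.items()}
        self.num_registers = num_registers
        self.update_before = update_before

    def _apply_updates(self, command: str) -> None:
        values, steps = self.current[command], self.steps[command]
        for i, step in enumerate(steps):
            if step is not None:
                values[i] = (values[i] + step) % self.num_registers  # roll over

    def process(self, command: str) -> Tuple[int, ...]:
        if self.update_before:
            self._apply_updates(command)
        used = tuple(self.current[command])  # registers determined from the data
        # ... the PIM command itself would execute here using `used` ...
        if not self.update_before:
            self._apply_updates(command)
        return used


# pim-load: destination starts at reg 0 with an "Add 2" destination update.
TABLE = {"pim-load": ([0], [2])}
after = DynamicFlowUnit(TABLE)
before = DynamicFlowUnit(TABLE, update_before=True)
print([after.process("pim-load") for _ in range(3)])   # [(0,), (2,), (4,)]
print([before.process("pim-load") for _ in range(2)])  # [(2,), (4,)]
```

With pre-processing updates, the first pim-load uses reg 2, as in the example given for FIG. 4 B.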
- the PIM register definition data 266 includes both predefined combinations of source and destination registers for some PIM commands, and dynamically determined combinations using update functions for other PIM commands. This provides great flexibility for software developers to optimize the use of these solutions for particular implementations.
- software support is provided for configuring and updating the PIM register definition data 266 .
- the software support includes the capability to configure and update the data contained in the tables 400 , 410 .
- the software support includes, for the pre-defined combinations implementation of FIG. 4 A, the capability to explicitly specify the current combination of source and/or destination registers, increment or decrement the current combination, or otherwise manipulate the current combination of source and/or destination registers for each PIM command.
- the software support includes, for the dynamically-determined combinations implementation of FIG. 4 B, the capability to explicitly specify the current combination of source and/or destination registers for each PIM command and the update functions used to update the source and destination registers.
- the update functions may be contingent upon address bits specified by a PIM command.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Executing Machine-Instructions (AREA)
Abstract
Description
-
- I. Overview
- II. Architecture
- III. Processing PIM Commands Using PIM Command Definition Data
- A. Introduction
- B. Pre-Defined Combinations of PIM Registers
- C. Dynamically-Determined Combinations of PIM Registers
- IV. Alternatives, Extensions and Software Support
Claims (18)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/845,263 US12265735B2 (en) | 2022-06-21 | 2022-06-21 | Approach for processing near-memory processing commands using near-memory register definition data |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20230409238A1 | 2023-12-21 |
| US12265735B2 | 2025-04-01 |
Family ID: 89169895
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/845,263 (US12265735B2, active) | 2022-06-21 | 2022-06-21 | Approach for processing near-memory processing commands using near-memory register definition data |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US12265735B2 |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12517669B2 | 2022-09-28 | 2026-01-06 | Advanced Micro Devices, Inc. | Scheduling processing-in-memory requests and memory requests |
| US12131026B2 * | 2022-12-29 | 2024-10-29 | Advanced Micro Devices, Inc. | Adaptive scheduling of memory and processing-in-memory requests |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5613132A * | 1993-09-30 | 1997-03-18 | Intel Corporation | Integer and floating point register alias table within processor device |
| US5881280A * | 1997-07-25 | 1999-03-09 | Hewlett-Packard Company | Method and system for selecting instructions for re-execution for in-line exception recovery in a speculative execution processor |
| US5923883A * | 1996-03-12 | 1999-07-13 | Matsushita Electric Industrial Co., Ltd. | Optimization apparatus which removes transfer instructions by a global analysis of equivalence relations |
| US20140281321A1 * | 2013-03-15 | 2014-09-18 | Intel Corporation | Register access white listing |
| US20150095623A1 * | 2013-09-27 | 2015-04-02 | Intel Corporation | Vector indexed memory access plus arithmetic and/or logical operation processors, methods, systems, and instructions |
| US20150261590A1 * | 2014-03-15 | 2015-09-17 | Zeev Sperber | Conditional memory fault assist suppression |
| US10691435B1 * | 2018-11-26 | 2020-06-23 | Parallels International Gmbh | Processor register assignment for binary translation |
| US20220365726A1 * | 2021-05-17 | 2022-11-17 | Samsung Electronics Co., Ltd. | Near memory processing dual in-line memory module and method for operating the same |
Similar Documents
| Publication | Title |
|---|---|
| US8984256B2 | Thread optimized multiprocessor architecture |
| US20110320765A1 | Variable width vector instruction processor |
| JP4987882B2 | Thread-optimized multiprocessor architecture |
| US20210201439A1 | Low power and low latency gpu coprocessor for persistent computing |
| US9304775B1 | Dispatching of instructions for execution by heterogeneous processing engines |
| CN114341802B | Method for performing in-memory processing operations and related memory devices and systems |
| US10761851B2 | Memory apparatus and method for controlling the same |
| US12265735B2 | Approach for processing near-memory processing commands using near-memory register definition data |
| US20240168639A1 | Efficient reduce-scatter via near-memory computation |
| US20240070223A1 | Increased computation efficiency with multi-stage 8-bit floating point matrix multiplication with format conversion |
| CN112241290A | Techniques for efficient execution of data reduction in parallel processing units |
| US20250278318A1 | Data processing method and apparatus, electronic device, and computer-readable storage medium |
| CN112506468A | RISC-V general processor supporting high throughput multi-precision multiplication |
| EP1586991A2 | Processor with plurality of register banks |
| CN106716346A | Cross-coupled level shifter with transition tracking circuits |
| US10754818B2 | Multiprocessor device for executing vector processing commands |
| US11977782B2 | Approach for enabling concurrent execution of host memory commands and near-memory processing commands |
| US5109497A | Arithmetic element controller for controlling data, control and micro store memories |
| US11966328B2 | Near-memory determination of registers |
| WO2019141160A1 | Data processing method and apparatus |
| US12333307B2 | Approach for managing near-memory processing commands from multiple processor threads to prevent interference at near-memory processing elements |
| EP3933605A1 | Memory device for performing in-memory processing |
| US20230359558A1 | Approach for skipping near-memory processing commands |
| US20250306928A1 | Load instruction division |
| US20250238141A1 | Memory device and operating method of memory device |
Legal Events
| Code | Title | Description |
|---|---|---|
| AS | Assignment | Owner name: ADVANCED MICRO DEVICES, INC., CALIFORNIA. ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: AGA, SHAIZEEN; JAYASENA, NUWAN; SIGNING DATES FROM 20220620 TO 20220621; REEL/FRAME: 060263/0567 |
| FEPP | Fee payment procedure | ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| STPP | Information on status: patent application and granting procedure in general | FINAL REJECTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | FINAL REJECTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| STPP | Information on status: patent application and granting procedure in general | PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| STCF | Information on status: patent grant | PATENTED CASE |