EP4229572A1 - Parallel processing architecture with background loads - Google Patents
Parallel processing architecture with background loads
- Publication number
- EP4229572A1 (application EP21881045.5A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- array
- compute elements
- data
- compute
- elements
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
- G06F15/8023—Two dimensional arrays, e.g. mesh, torus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1605—Handling requests for interconnection or transfer for access to memory bus based on arbitration
- G06F13/1642—Handling requests for interconnection or transfer for access to memory bus based on arbitration with request queuing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/453—Data distribution
Definitions
- Fig. 2 is a flow diagram for data tagging.
- the other elements to which the CEs can be coupled can include storage elements such as scratchpad memories; multiplier units; address generator units for generating load (LD) and store (ST) addresses; load queues; and so on.
- the compiler to which each compute element is known can include a general-purpose compiler such as a C, C++, or Python compiler; a hardware-oriented compiler such as a VHDL or Verilog compiler; a compiler written for the array of compute elements; and so on.
- the coupling of each CE to its neighboring CEs enables communication between or among neighboring CEs and the like.
- tasks can be processed on an array of compute elements.
- the tasks can include general operations such as arithmetic, vector, or matrix operations; operations based on applications such as neural network or deep learning operations; and so on.
- In order for the tasks to be processed correctly, the tasks must be scheduled on the array of compute elements, and data must be accessed that will be operated on by the tasks.
- the data can be provided to the tasks by using background loads.
- the background loads can transfer data to compute elements from load queues, from a memory system, from local or remote storage, etc. Since the data that is loaded can be intended for one or more compute elements within the array of compute elements, the data can be tagged.
- the data tagging enables a parallel processing architecture with background loads.
- a two-dimensional (2D) array of compute elements is accessed, wherein each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Operation of the array of compute elements is paused, wherein the pausing occurs while a memory system continues operation.
- a bus coupling the array of compute elements to the memory system is repurposed for operation during the pausing. Data is transferred from the memory system to the array of compute elements, using the bus that was repurposed.
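The four-step method above can be sketched as a small software model. This is an illustrative behavioral sketch only, not the hardware implementation; all class and function names are assumptions introduced for the example.

```python
# Behavioral sketch of the method: access the 2D array, pause it while the
# memory system keeps running, repurpose the bus, and transfer data over it.

class Bus:
    def __init__(self):
        self.repurposed = False

    def repurpose(self):
        self.repurposed = True   # bus now carries memory-to-array load traffic

    def restore(self):
        self.repurposed = False


class Array2D:
    def __init__(self, rows, cols):
        # One data slot per compute element, addressed by (row, col).
        self.data = {(r, c): None for r in range(rows) for c in range(cols)}
        self.paused = False


def background_load(array, bus, memory, transfers):
    array.paused = True          # array pauses; memory system continues operation
    bus.repurpose()              # bus is repurposed for the duration of the pause
    for (row, col), addr in transfers:
        array.data[(row, col)] = memory[addr]   # transfer over the repurposed bus
    bus.restore()
    array.paused = False         # on resume, the loads appear already complete


mem = {0: 10, 1: 20}
arr, bus = Array2D(2, 2), Bus()
background_load(arr, bus, mem, [((0, 0), 0), ((1, 1), 1)])
print(arr.data[(0, 0)], arr.data[(1, 1)])  # 10 20
```

From the compiler's perspective, the array state before and after `background_load` differs only in the newly delivered data, which is what lets the statically scheduled model hold.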
- a system block diagram 300 for a highly parallel architecture with a shallow pipeline is shown.
- the system block diagram can include a compute element array 310.
- the compute element array 310 can be based on compute elements, where the compute elements can include processors, central processing units (CPUs), graphics processing units (GPUs), coprocessors, and so on.
- the compute elements can be based on processing cores configured within chips such as application specific integrated circuits (ASICs), processing cores programmed into programmable chips such as field programmable gate arrays (FPGAs), and so on.
- the compute elements can comprise a homogeneous array of compute elements.
- the system block diagram 300 can include translation and look-aside buffers such as translation and look-aside buffers 312 and 338.
- the memory systems can be free running and can continue to operate while the array is paused. Because multicycle latency can occur due to control signal transport, which results in additional "dead time", it can be beneficial to allow the memory system to "reach into" the array and deliver load data to appropriate scratchpad memories while the array is paused. This mechanism can operate such that the array state is known, as far as the compiler is concerned. When array operation resumes after a pause, new load data will have arrived at a scratchpad, as required for the compiler to maintain the statically scheduled model.
- Wall time, which can include system clock ticks, system processing cycles, and the like, can occur continuously. That is, while compiler time can suspend while the array is paused, wall time can proceed. Using this technique, background loads can appear to occur during a single, virtual compiler cycle, while the actual accessing of load queues, a memory system, etc., can be performed under wall time.
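The two clocks described above can be illustrated with a pair of counters. This is a sketch under assumed semantics (wall time always advances; compiler time freezes during a pause), not text from the specification.

```python
# Model of the two time bases: wall time never stops, while compiler time
# suspends during a pause, so a multicycle background load occupies zero
# compiler ticks and appears to fit in a single virtual compiler cycle.

class Clocks:
    def __init__(self):
        self.wall_ticks = 0
        self.compiler_ticks = 0
        self.paused = False

    def tick(self):
        self.wall_ticks += 1          # wall time always advances
        if not self.paused:
            self.compiler_ticks += 1  # compiler time suspends while paused


clocks = Clocks()
clocks.tick()                 # normal execution: both clocks advance
clocks.paused = True
for _ in range(5):            # a 5-cycle background load measured in wall time...
    clocks.tick()
clocks.paused = False
clocks.tick()                 # ...then array execution resumes
# wall time saw 7 ticks, compiler time only 2: the load cost no compiler cycles
print(clocks.wall_ticks, clocks.compiler_ticks)  # 7 2
```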
- the accesses can also be associated with a second or further column such as column 7 530.
- the accesses that originate within column 7 can include access 4 532 and access 5 534.
- the accesses 4 and 5 can also be offset.
- the accesses can be performed.
- the accesses to load queues, the memory system, etc. can be performed based on wall time. Since compiler time suspends while the array is paused, as opposed to wall time, which never stops, the accesses occur within one virtual compiler clock tick or cycle. When the accesses are complete, the array can be resumed, and compiler time can continue.
- Fig. 6 shows virtual single cycle load latency.
- An array of compute elements can be known to a compiler, where the compiler can generate or compile code for the compute elements.
- the compiler can also direct communications to or from, between, and among compute elements, where the communications are used for data transfers.
- the data that is transferred can include one or more operands.
- the compiler can pause the compute elements, resume the compute elements, and the like. Since data can be transferred between a memory system and the compute elements of the array while the compute elements within the array are paused, and since pausing the compute elements can comprise a single compiler time step, the data transfers can appear to the compiler to have taken place in as few as one compiler time step.
- a virtual single cycle load latency enables a parallel processing architecture with background loads.
- a two-dimensional (2D) array of compute elements is accessed, where each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements. Operation of the array of compute elements is paused, and a bus coupling the array of compute elements to the memory system is repurposed. The repurposing couples one or more compute elements in the array of compute elements to the memory system, and a memory system operation is enabled during the pausing. Data is transferred from the memory system to the array of compute elements, using the bus that was repurposed.
- Fig. 7 illustrates logic for controlling background loads.
- a background load can be used to transfer or load data from a memory system into an array of compute elements for processing by the compute elements.
- a background load can occur while the array of compute elements is paused. Background loads enable a parallel processing architecture.
- a two-dimensional (2D) array of compute elements is accessed, where each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Operation of the array of compute elements is paused, wherein the pausing occurs while a memory system continues operation.
- a background load can be based on or controlled by a data “packet” 710.
- the packet can include data, where the data can be available on a bus.
- the data can include 64-bit data and can be available on a bus such as a column data bus.
- the packet can further include a target ID 712.
- the target ID can include a 4-bit target ID, where the target ID can be associated with a target row of compute elements within an array of compute elements.
- the packet can also include one or more control signals.
- a control signal can include a background load data valid signal 714.
- the data available on the 64-bit column data bus can be stored in one or more scratchpad memories.
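The control "packet" described above can be modeled as a small record. The exact field encoding is not specified in the text, so the layout below (width checks, delivery rule) is an assumption for illustration.

```python
# Sketch of a background-load packet: 64-bit data on the column data bus,
# a 4-bit target ID selecting the target row, and a data-valid signal.

from dataclasses import dataclass


@dataclass
class BackgroundLoadPacket:
    data: int         # 64-bit payload carried on the column data bus
    target_id: int    # 4-bit ID naming the target row in the array
    data_valid: bool  # background load data valid control signal

    def __post_init__(self):
        assert 0 <= self.data < 2 ** 64, "data must fit in 64 bits"
        assert 0 <= self.target_id < 2 ** 4, "target ID must fit in 4 bits"


def deliver(packet, scratchpads):
    """Store valid packet data into the scratchpad of the target row."""
    if packet.data_valid:
        scratchpads[packet.target_id].append(packet.data)


pads = {row: [] for row in range(16)}   # one scratchpad per addressable row
deliver(BackgroundLoadPacket(data=0xDEADBEEF, target_id=3, data_valid=True), pads)
deliver(BackgroundLoadPacket(data=42, target_id=3, data_valid=False), pads)
print(pads[3])  # the second packet was ignored because its valid bit was low
```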
- Fig. 8 is a system diagram for a parallel processing architecture with background loads.
- the parallel processing architecture with background loads enables task processing.
- the system 800 can include one or more processors 810, which are attached to a memory 812 which stores instructions.
- the system 800 can further include a display 814 coupled to the one or more processors 810 for displaying data; intermediate steps; control words; control words implementing Very Long Instruction Word (VLIW) functionality; topologies including systolic, vector, cyclic, spatial, streaming, or VLIW topologies; and so on.
- the compute elements can include compute elements within one or more integrated circuits or chips, compute elements or cores configured within one or more programmable chips such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs), processors configured as a mesh, standalone processors, etc.
- the system 800 can include one or more scratchpad memories 820.
- the one or more scratchpad memories 820 can be used to store data, control words, intermediate results, microcode, and so on.
- the scratchpad memory can be used for data transfer.
- the data from the memory system is transferred to a scratchpad memory in one or more compute elements within the two-dimensional array.
- a scratchpad memory can comprise a small, local, easily accessible memory available to a compute element.
- the scratchpad memory provides operand storage. Since a scratchpad memory is associated with a particular compute element, the compute element for which the contents of the scratchpad memory are intended can be identified. Further embodiments include tagging the data before it is transferred.
- the tagging can include a flag, an address, a code, and so on. In embodiments, the tagging can guide the transferring to a particular compute element within the array of compute elements. The tagging can be based on a location within the array. In embodiments, the tagging can include a target row location within the array of compute elements. The tagging can further include a target column location within the array of compute elements.
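The tagging scheme described above can be sketched in a few lines. The tag format here (a target row and column attached to each datum) follows the embodiments in the text; the helper names are illustrative assumptions.

```python
# Minimal sketch of data tagging: each datum carries a target row and column
# tag that guides the transfer to one particular compute element's scratchpad.

def tag(value, row, col):
    """Attach a target location tag to a datum before it is transferred."""
    return {"row": row, "col": col, "value": value}


def route(tagged, scratchpads):
    """Use the tag to place the value in the addressed element's scratchpad."""
    scratchpads[(tagged["row"], tagged["col"])].append(tagged["value"])


# A 4x4 array with one scratchpad per compute element.
scratchpads = {(r, c): [] for r in range(4) for c in range(4)}
route(tag(1.5, row=2, col=3), scratchpads)
print(scratchpads[(2, 3)])  # [1.5]
```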
- the scratchpad memory can be accessible to one or more compute elements. In embodiments, the scratchpad memory can include a dual read, single write (2R1W) scratchpad memory. That is, the 2R1W scratchpad memory can enable two contemporaneous read operations and one write operation without the read and write operations interfering with one another.
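A 2R1W scratchpad as described above can be modeled behaviorally. Whether reads observe the pre-write or post-write value in the same cycle is not stated in the text, so the reads-before-write ordering below is an assumption.

```python
# Behavioral model of a dual read, single write (2R1W) scratchpad: each cycle
# services two contemporaneous reads and one write without interference.

class Scratchpad2R1W:
    def __init__(self, size):
        self.mem = [0] * size

    def cycle(self, read_a, read_b, write_addr=None, write_data=None):
        """One cycle: two reads and at most one write (reads-before-write)."""
        out_a, out_b = self.mem[read_a], self.mem[read_b]  # reads sample first
        if write_addr is not None:
            self.mem[write_addr] = write_data              # then the write lands
        return out_a, out_b


pad = Scratchpad2R1W(size=8)
a, b = pad.cycle(read_a=0, read_b=1, write_addr=0, write_data=99)
print(a, b)        # 0 0  -- the reads saw the old value
print(pad.mem[0])  # 99   -- the write committed in the same cycle
```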
- Communication between and among compute elements can be accomplished using a bus such as an industry standard bus, an on-chip bus such as a ring bus, a network such as a computer network, etc.
- the ring bus is implemented as a distributed multiplexor (MUX).
- the ring bus can be used to support various communication geometries within the array of compute elements such as a Manhattan communication geometry.
- the bus can include a bus, such as a ring bus, along a row or column of the array of compute elements.
- the system 800 can include a pausing component 840.
- the pausing component 840 can include control and functions for pausing operation of the array of compute elements, wherein the pausing occurs while a memory system continues operation.
- the pausing operation can occur due to waiting for data such as operands to be processed by the compute elements.
- the pausing operation can be necessitated by an exception.
- An exception can include an arithmetic exception, waiting for data, waiting for an acknowledgement that data has been received, and the like.
- An exception can occur due to a data cache “miss”, where data needed for a computation by a compute element is neither available within a scratchpad associated with that compute element nor available in the data cache, which necessitates seeking the data from the memory system.
- the pausing operation can be necessitated by data congestion. That is, one or more buses within the array of compute elements can become congested while trying to move data between the memory system and the compute elements, between or among compute elements, etc.
- the data congestion can be due to access jitter.
- the data congestion can be due to a cache miss.
- the pausing operation of the array of compute elements can include storing a state of the compute elements within the array. Other components within the array of compute elements can continue operation during the pausing.
- the bus can continue operation during the pausing.
- the bus operation can include transferring data to one or more compute elements within the array of compute elements. The data can be transferred from the memory system to one or more compute elements.
- the system 800 can include a repurposing component 850.
- the repurposing component 850 can include control logic and functions for repurposing a bus coupling the array of compute elements to the memory system for operation during the pausing.
- the repurposing of the bus can include placing the bus into a “pass through” mode in which the bus can continue operation during the pausing. Pass through mode may include saving the state currently on the bus to allow background load data to pass, and then restoring that saved data when the array resumes from the pause.
- a bus in a pass-through mode can be used for passing data between the memory system and one or more scratchpad memories, one or more queues, and so on. Further embodiments include load queues coupled between the memory system and the bus.
- the buffers can be filled and emptied during a pause of the array of compute elements.
- the load queues can be emptied of the data that was buffered before a resume occurs.
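The pass-through behavior described above, saving the in-flight bus state, letting background-load data pass, draining the load queues before resume, and then restoring the saved state, can be sketched as follows. The class and function names are illustrative assumptions.

```python
# Model of "pass through" mode: save the state currently on the bus, drain
# the load queue through the bus while the array is paused, then restore the
# saved bus state when the array resumes.

from collections import deque


class PassThroughBus:
    def __init__(self):
        self.current = None      # value presently on the bus
        self._saved = None

    def enter_pass_through(self):
        self._saved = self.current       # save in-flight bus state

    def exit_pass_through(self):
        self.current = self._saved       # restore saved state on resume


def drain_load_queue(bus, load_queue, scratchpad):
    """Empty the load queue over the pass-through bus before a resume occurs."""
    bus.enter_pass_through()
    while load_queue:                    # queues must be empty before resuming
        bus.current = load_queue.popleft()
        scratchpad.append(bus.current)
    bus.exit_pass_through()


bus = PassThroughBus()
bus.current = "array-traffic"
queue, scratchpad = deque([11, 22, 33]), []
drain_load_queue(bus, queue, scratchpad)
print(scratchpad)   # [11, 22, 33]
print(bus.current)  # array-traffic -- restored after the pause
```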
- the data can be tagged before it is transferred between the memory system and the array of compute elements.
- the tagging can guide the transferring to a particular compute element within the array of compute elements.
- the tagging can serve as a compute element address, an identifier, and the like.
- the pausing, the repurposing, and the transferring can comprise a background data load.
- a background data load can be used to provide data such as operands to one or more compute elements before other data arrives at the compute elements.
- the background data load can be used to anticipate outcomes of a branch or other control transfer operation.
- the system 800 can include a computer program product embodied in a computer readable medium for task processing, the computer program product comprising code which causes one or more processors to perform operations of: accessing a two-dimensional (2D) array of compute elements, wherein each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements; pausing operation of the array of compute elements, wherein the pausing occurs while a memory system continues operation; repurposing a bus coupling the array of compute elements, wherein the repurposing couples one or more compute elements in the array of compute elements to the memory system, and wherein a memory system operation is enabled during the pausing; and transferring data from the memory system to the array of compute elements, using the bus that was repurposed.
- Each of the above methods may be executed on one or more processors on one or more computer systems.
- Embodiments may include various forms of distributed computing, client/server computing, and cloud-based computing.
- the depicted steps or boxes contained in this disclosure’s flow charts are solely illustrative and explanatory. The steps may be modified, omitted, repeated, or reordered without departing from the scope of this disclosure. Further, each step may contain one or more sub-steps. While the foregoing drawings and description set forth functional aspects of the disclosed systems, no particular implementation or arrangement of software and/or hardware should be inferred from these descriptions unless explicitly stated or otherwise clear from the context. All such arrangements of software and/or hardware are intended to fall within the scope of this disclosure.
- FIG. 1 The block diagrams and flowchart illustrations depict methods, apparatus, systems, and computer program products.
- the elements and combinations of elements in the block diagrams and flow diagrams show functions, steps, or groups of steps of the methods, apparatus, systems, computer program products and/or computer-implemented methods. Any and all such functions — generally referred to herein as a “circuit,” “module,” or “system” — may be implemented by computer program instructions, by special-purpose hardware-based computer systems, by combinations of special purpose hardware and computer instructions, by combinations of general-purpose hardware and computer instructions, and so on.
- a programmable apparatus which executes any of the above-mentioned computer program products or computer-implemented methods may include one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors, programmable devices, programmable gate arrays, programmable array logic, memory devices, application specific integrated circuits, or the like. Each may be suitably employed or configured to process computer program instructions, execute computer logic, store computer data, and so on.
- a computer may include a computer program product from a computer-readable storage medium and that this medium may be internal or external, removable and replaceable, or fixed.
- a computer may include a Basic Input/Output System (BIOS), firmware, an operating system, a database, or the like that may include, interface with, or support the software and hardware described herein.
- Embodiments of the present invention are limited to neither conventional computer applications nor the programmable apparatus that runs them. To illustrate: the embodiments of the presently claimed invention could include an optical computer, quantum computer, analog computer, or the like.
- a computer program may be loaded onto a computer to produce a particular machine that may perform any and all of the depicted functions. This particular machine provides a means for carrying out any and all of the depicted functions.
- any combination of one or more computer readable media may be utilized including but not limited to: a non-transitory computer readable medium for storage; an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor computer readable storage medium or any suitable combination of the foregoing; a portable computer diskette; a hard disk; a random access memory (RAM); a read-only memory (ROM), an erasable programmable read-only memory (EPROM, Flash, MRAM, FeRAM, or phase change memory); an optical fiber; a portable compact disc; an optical storage device; a magnetic storage device; or any suitable combination of the foregoing.
- a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- computer program instructions may include computer executable code.
- languages for expressing computer program instructions may include without limitation C, C++, Java, JavaScript™, ActionScript™, assembly language, Lisp, Perl, Tcl, Python, Ruby, hardware description languages, database programming languages, functional programming languages, imperative programming languages, and so on.
- computer program instructions may be stored, compiled, or interpreted to run on a computer, a programmable data processing apparatus, a heterogeneous combination of processors or processor architectures, and so on.
- embodiments of the present invention may take the form of web-based computer software, which includes client/server software, software-as-a-service, peer-to-peer software, or the like.
- a computer may enable execution of computer program instructions including multiple programs or threads.
- the multiple programs or threads may be processed approximately simultaneously to enhance utilization of the processor and to facilitate substantially simultaneous functions.
- any and all methods, program codes, program instructions, and the like described herein may be implemented in one or more threads which may in turn spawn other threads, which may themselves have priorities associated with them.
- a computer may process these threads based on priority or other order.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202063091947P | 2020-10-15 | 2020-10-15 | |
PCT/US2021/054889 WO2022081784A1 (en) | 2020-10-15 | 2021-10-14 | Parallel processing architecture with background loads |
Publications (1)
Publication Number | Publication Date |
---|---|
EP4229572A1 (de) | 2023-08-23 |
Family
ID=81208770
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP21881045.5A Pending EP4229572A1 (de) | 2020-10-15 | 2021-10-14 | Parallelverarbeitungsarchitektur mit hintergrundlasten |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP4229572A1 (de) |
KR (1) | KR20230087553A (de) |
WO (1) | WO2022081784A1 (de) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU3829500A (en) * | 1999-04-09 | 2000-11-14 | Clearspeed Technology Limited | Parallel data processing apparatus |
EP2996035A1 (de) * | 2008-10-15 | 2016-03-16 | Hyperion Core, Inc. | Datenverarbeitungsvorrichtung |
US9329834B2 (en) * | 2012-01-10 | 2016-05-03 | Intel Corporation | Intelligent parametric scratchpad memory architecture |
US11029949B2 (en) * | 2015-10-08 | 2021-06-08 | Shanghai Zhaoxin Semiconductor Co., Ltd. | Neural network unit |
US11347477B2 (en) * | 2019-09-27 | 2022-05-31 | Intel Corporation | Compute in/near memory (CIM) circuit architecture for unified matrix-matrix and matrix-vector computations |
-
2021
- 2021-10-14 KR KR1020237016022A patent/KR20230087553A/ko unknown
- 2021-10-14 EP EP21881045.5A patent/EP4229572A1/de active Pending
- 2021-10-14 WO PCT/US2021/054889 patent/WO2022081784A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
KR20230087553A (ko) | 2023-06-16 |
WO2022081784A1 (en) | 2022-04-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220107812A1 (en) | Highly parallel processing architecture using dual branch execution | |
US20220075627A1 (en) | Highly parallel processing architecture with shallow pipeline | |
EP4384902A1 (de) | Parallelverarbeitungsarchitektur mit verteilten registerspeichern | |
WO2022055792A1 (en) | Highly parallel processing architecture with shallow pipeline | |
US20220075740A1 (en) | Parallel processing architecture with background loads | |
EP4229572A1 (de) | Parallelverarbeitungsarchitektur mit hintergrundlasten | |
EP4244726A1 (de) | Hochparallele verarbeitungsarchitektur mit compiler | |
US20230350713A1 (en) | Parallel processing architecture with countdown tagging | |
US20220291957A1 (en) | Parallel processing architecture with distributed register files | |
US20220308872A1 (en) | Parallel processing architecture using distributed register files | |
US20230273818A1 (en) | Highly parallel processing architecture with out-of-order resolution | |
US20230031902A1 (en) | Load latency amelioration using bunch buffers | |
US20240168802A1 (en) | Parallel processing with hazard detection and store probes | |
US20240070076A1 (en) | Parallel processing using hazard detection and mitigation | |
US20220374286A1 (en) | Parallel processing architecture for atomic operations | |
US20220214885A1 (en) | Parallel processing architecture using speculative encoding | |
US20230409328A1 (en) | Parallel processing architecture with memory block transfers | |
US20230342152A1 (en) | Parallel processing architecture with split control word caches | |
US20240264974A1 (en) | Parallel processing hazard mitigation avoidance | |
US20240078182A1 (en) | Parallel processing with switch block execution | |
US20230221931A1 (en) | Autonomous compute element operation using buffers | |
US20230281014A1 (en) | Parallel processing of multiple loops with loads and stores | |
WO2024015318A1 (en) | Parallel processing architecture with countdown tagging | |
WO2022251272A1 (en) | Parallel processing architecture with distributed register files | |
US20240193009A1 (en) | Parallel processing architecture for branch path suppression |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20230404 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) |