EP4254176A1 - System for managing a group of rotating registers defined arbitrarily in a processor register file

Info

Publication number
EP4254176A1
Authority
EP
European Patent Office
Prior art keywords
register
registers
buffer area
vector
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP23163156.5A
Other languages
English (en)
French (fr)
Inventor
Benoît Dupont de Dinechin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kalray SA
Original Assignee
Kalray SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kalray SA
Publication of EP4254176A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3816 Instruction alignment, e.g. cache line crossing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043 LOAD or STORE instructions; Clear instruction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30032 Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/30105 Register structure
    • G06F9/30112 Register structure comprising data of variable length
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/30134 Register stacks; shift registers

Definitions

  • the invention relates to loading data blocks from a memory into registers of a processor, where the data blocks may begin at addresses that are not aligned with a data bus of the memory.
  • one or more streams of data may be stored in a memory and read back by a processor for further processing.
  • the memory read operations to retrieve the stream data do not use a cache, so each read operation has a latency of several cycles of a system clock.
  • Read operations are often executed in a loop to create a processing pipeline in which read instructions can be chained without latency.
  • the core of such a loop is an instruction that loads a number of bits of the width of the data bus into processor registers explicitly designated by the load instruction.
  • a first difficulty lies in the management of the destination registers, considering the memory latency. Indeed, the registers designated in an iteration of the loop do not receive their data until several cycles later, so that the next iteration cannot designate the same registers.
  • A second difficulty is that the memory access bus operates at maximum performance only when the memory read addresses are aligned, i.e., are multiples of the data bus width. However, the organization of the application software does not allow this alignment to be respected in most cases.
  • A modulo variable expansion (MVE) technique is used.
  • a macro-loop is designed in which several read iterations are unrolled to use different destination registers.
  • the number of unrolled iterations is chosen to have an execution time greater than the maximum memory latency.
  • This technique is difficult to apply when the memory latency is high, which is typically the case in streaming mode, because the number of available registers could become insufficient considering that other operations of the loop also consume registers.
  • Some processor architectures have a group of rotating registers that allow a loop to be unrolled transparently. Successive writes or reads to the same address in the rotating register group result in accesses to successive registers in the register group. When the last register is reached, the succession starts again from the first register. A loop can then be written in a traditional way with a single access operation that uses an address assigned to the rotating register group as its destination.
  • Patent US7594102 refers to such a group of registers.
  • The Hewlett-Packard document "HPL-PD Architecture Specification", available at https://www.hpl.hp.com/techreports/93/HPL-93-80R1.pdf, describes a processor architecture including a group of rotating registers.
  • A processor core including an N-bit system memory interface; a register file comprising a plurality of general purpose registers of capacity less than N bits; a set of N-bit vector registers; in its instruction set, a register manipulation instruction executable with the following parameters: a) a value defining in the set of vector registers a buffer area formed by a plurality of consecutive vector registers, and b) a reference to a first general purpose register, the first general purpose register containing an index identifying a vector register within the buffer area; and an execution unit configured to, upon execution of a register manipulation instruction, read or write, in one cycle, N bits in a vector register identified from the value defining the buffer area and the index contained in the first general purpose register.
  • The register manipulation instruction may be a vector load instruction executable with the following parameters: a) the value defining the buffer area, b) the reference to the first general purpose register containing the index, and c) a reference to a second general purpose register containing a source memory address; and the execution unit may be configured to, upon execution of a vector load instruction, transfer data from the memory at the address contained in the second general purpose register to the vector register identified by the index.
  • The register manipulation instruction may be an alignment instruction executable with the following parameters: a) the value defining the buffer area, b) the reference to the first general purpose register, the first general purpose register containing a value combining the index identifying the vector register within the buffer area and a right shift count, and c) a destination defining a vector register or a plurality of consecutive general purpose registers having together a capacity of N bits; and the execution unit may be configured to, upon execution of an alignment instruction, simultaneously read two consecutive vector registers at the index, shift the concatenated contents of the two vector registers to the right by the right shift count, and write the N least significant bits of the shifted contents at the destination.
  • the value defining the buffer area may encode the rank of the initial vector register of the buffer area and the size of the buffer area, and the execution unit may be configured to produce the index modulo the size of the buffer area, whereby the buffer area is used in a rotating manner.
  • a method of aligning data read from a memory comprising the following steps implemented at low level in a processor core: providing a rotating buffer area of a plurality of registers of the processor core; executing a series of load instructions to transfer blocks of data from the memory to first successive registers of the buffer, the number of instructions in the series being selected based on a memory read latency; and executing a loop including: i) a load instruction to transfer a memory block to a successive register of the buffer area, ii) an alignment instruction to simultaneously access two previously loaded successive registers of the buffer area and extract a data block overlapping the two successive registers, and iii) instructions processing the extracted data block.
  • the load instruction and the alignment instruction may each be executed with a first parameter defining the start and size of the buffer, and a second parameter referencing an index that identifies a position in the buffer, the method comprising steps of updating the indexes to designate successive registers in the buffer.
  • conventional rotating register groups are not designed to receive data read in streaming mode, nor to support the realignment operations required at each iteration of a loop.
  • Such an alignment involves, for example, concatenating the contents of two successive registers to extract a block of data that overlaps the two registers.
  • memory is often designed to allow data to be accessed with a granularity of one byte, which in principle allows correctly aligned data to be retrieved on the data bus.
  • fine-grained accesses cost at least two cycles in aligning the data on the bus.
  • A particular structure is proposed hereafter that allows operating on an arbitrary set of registers forming a group of rotating registers, or more generally a buffer area, manageable by dedicated low-level instructions of the processor instruction set.
  • one of these instructions operates simultaneously (in one cycle) on two rotating registers that have been loaded at different iterations.
  • FIG. 1 shows a block diagram of a number of elements of a central processing unit CPU of a processor core used to execute a special instruction to load a vector register from data in memory, called VLOAD. Only those elements that are useful for understanding the execution of the instruction are illustrated; many other conventional elements of a processor core are not described.
  • the CPU is connected to a shared memory MEM by a data bus D.
  • the width N of the bus D is equal to 256 bits, as an example, i.e., 32 bytes.
  • the memory MEM is controlled by an address bus A, which may have a size of 64 bits to access the memory with a granularity of one byte.
  • the CPU includes a set of general-purpose registers GPRF (General-Purpose Register File), which are used to store addresses, operands of usual instructions and results of usual instructions.
  • Each register is denoted by $rX (where X is the register rank) and has a capacity of, say, 64 bits.
  • the registers may be organized into banks of four, with each register in a bank connected to the data bus D by a respective group of 64 lines.
  • the registers are also wired, which is not shown in detail, to be individually accessible by hardware execution units implementing the various instructions in the processor instruction set.
  • Such a register organization allows execution units to operate on 64-bit data by designating individual registers, or to perform 256-bit data block transfers by designating register quadruplets.
  • the CPU also includes a set of vector registers VRF (Vector Register File), typically used to store several data to be processed in SIMD mode (Single Instruction - Multiple Data).
  • Each vector register is designated by $aX (where X is the rank of the register) and has a capacity of 256 bits, i.e., the width of the data bus to which it is connected.
  • the vector registers $a may be a superset of the general purpose registers $r, each vector register then corresponding to a bank of four general purpose registers.
  • a plurality of consecutive vector registers $aB to $a(B+s-1) are interpreted by the VLOAD instruction as a buffer area BUF.
  • the buffer area may start at an arbitrary vector register of rank B and have an arbitrary size s, usually an even number.
  • An execution unit 10 is designed to implement the execution of a vector register load instruction VLOAD.
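  • The VLOAD instruction conveys three parameters, namely: the value BUF defining the buffer area, a reference to a general purpose register $rV containing the index idx, and a reference to a general purpose register $rS containing the source memory address @src.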
  • the execution unit retrieves the index idx contained in the referenced register $rV and the source memory address @src contained in the referenced register $rS.
  • the index as stored in the register $rV is a number of bytes.
  • the content of the register $rV is divided by 32, as shown, which is equivalent to shifting the register content to the right by 5 positions.
  • the address @src is presented to the memory to read 256 corresponding bits through the bus D, which will be loaded in a vector register $a(B+idx) corresponding to the index idx.
  • the selection of the vector register may be achieved by adding the index idx to the base B contained in the first parameter BUF of the VLOAD instruction.
  • the index idx is adjusted modulo s, the size of the buffer area also contained in the parameter BUF.
  • the base B and the size s may be encoded in two respective fields of the parameter BUF.
  • An instruction parameter, which may in some cases be used as an immediate value, typically has the same size as the general purpose registers, here 64 bits.
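The register selection performed by VLOAD, as described above, can be sketched in a few lines of Python. This is only an illustrative model, not the hardware implementation: the function name `vload`, the representation of BUF as a `(B, s)` pair, the size of the register file, and the little-endian byte order are assumptions that the text does not specify.

```python
N_BYTES = 32  # bus width N = 256 bits, as in the example above

def vload(vrf, mem, buf, rv, rs):
    """Model of VLOAD: load 32 bytes at address rs into a buffer register.

    vrf : list of ints, one 256-bit value per vector register $aX
    mem : bytes-like object standing in for the memory MEM
    buf : (B, s) pair, base rank and size of the buffer area (assumed encoding)
    rv  : content of $rV, the index expressed as a number of bytes
    rs  : content of $rS, the source memory address @src
    """
    base, size = buf
    idx = (rv >> 5) % size          # divide by 32, then adjust modulo s
    block = mem[rs:rs + N_BYTES]    # 256 bits read through the bus D
    # Assumption: the byte at the lowest address becomes the least
    # significant byte of the register (little-endian).
    vrf[base + idx] = int.from_bytes(block, "little")
    return base + idx               # rank of the register that was written

vrf = [0] * 64                      # a file of 64 registers is an arbitrary choice
mem = bytes(range(256))
rank = vload(vrf, mem, buf=(8, 4), rv=5 * 32, rs=64)
print(rank)                         # index 5 wraps to 1 in a 4-register buffer at base 8
```

Passing B and s as a pair sidesteps the question of how the two fields are laid out inside BUF, which the text leaves open.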
  • FIG. 2 shows a block diagram of additional CPU elements used to execute a special alignment instruction, denoted VALIGN.
  • the odd and even vector registers are accessible to the CPU execution units via two separate 256-bit buses, allowing an even register (e.g., $a0) and an odd register (e.g., $a1) to be read or written simultaneously.
  • An execution unit 20 is designed to implement the execution of a VALIGN instruction.
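  • The VALIGN instruction conveys three parameters, namely: a destination D designating a vector register or a bank of general purpose registers, the value BUF defining the buffer area, and a reference to a general purpose register $rV whose content combines the index idx and the right shift count rsc.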
  • the execution unit retrieves the content of the referenced register $rV.
  • the index idx encoded in this register is used, as for the VLOAD instruction, in combination with the parameter BUF, to designate a vector register in the buffer area, for example $a0.
  • the immediately following vector register $a1 is implicitly designated as well.
  • the contents of the designated vector registers $a0 and $a1 are concatenated, with the register $a0 on the right side, i.e., as the least significant bits. In other words, the weights increase from right to left.
  • the concatenated contents are simultaneously presented to a 512-bit input right-shift circuit SHIFT-R.
  • This circuit performs a right shift of the 512 inputs by the count rsc taken from the referenced register $rV and presents at its output the 256 least significant bits of the shifted concatenated contents.
  • the output of the shifter is loaded into the registers designated by the destination parameter D, here the register bank $r4 to $r7.
  • the output of the shifter could be loaded into another vector register, designated by the D parameter.
  • The right shift count rsc has the granularity of the memory addressing, here one byte. Since the width of the memory bus D is 32 bytes, the shift has a maximum value of 31 × 8 bits or 31 bytes, a value that can be encoded by 5 bits. Thus, the count rsc may be encoded in the 5 least significant bits of the register $rV, and the index idx in the remaining bits. As illustrated in Figure 2, the index idx and the count rsc may be formalized as the quotient Q and the remainder R of the division by 32 of the content of the register $rV.
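The VALIGN data path described above (split $rV into Q and R, concatenate two registers, shift right, keep 256 bits) can be modelled as follows. This is a sketch under stated assumptions: the name `valign`, the `(B, s)` pair standing for BUF, the little-endian byte order, and the wrap-around of the "immediately following" register at the end of the buffer are illustrative choices, not details confirmed by the text.

```python
N_BITS = 256
MASK = (1 << N_BITS) - 1

def valign(vrf, buf, rv):
    """Model of VALIGN: return the 256-bit block that straddles two registers."""
    base, size = buf
    idx, rsc = divmod(rv, 32)             # quotient Q = index, remainder R = shift count in bytes
    lo = vrf[base + idx % size]           # designated register, least significant half
    hi = vrf[base + (idx + 1) % size]     # following register (assumed to wrap modulo s)
    concat = (hi << N_BITS) | lo          # 512 bits, weights increasing from right to left
    return (concat >> (8 * rsc)) & MASK   # the 256 least significant bits after the shift

# Two registers holding bytes 0..31 and 32..63, little-endian (assumed byte order).
vrf = [int.from_bytes(bytes(range(32 * k, 32 * k + 32)), "little") for k in range(4)]
out = valign(vrf, (0, 4), rv=8)           # idx = 0, rsc = 8
print(out.to_bytes(32, "little") == bytes(range(8, 40)))
```

With `rv = 40` the same call would select index 1 with the same shift of 8 bytes, since 40 = 1 × 32 + 8.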
  • The VLOAD and VALIGN instructions show their value when used jointly in a loop to process one or more data streams, each stream being associated with a data buffer.
  • An example of a loop to process a single stream is shown below in Table 1. This loop is designed to copy a misaligned data block stored in memory at an address @src to an address @dst in the same memory. The fact that the block is misaligned is reflected by an address @src whose 5 least significant bits convey a non-zero value, e.g., 8.
  • the register $r0 is the register $rV referenced by the VALIGN instructions, encoding the buffer index and the shift count.
  • the register $r1 is the register $rS referenced by the VLOAD instructions, containing the memory read address.
  • the register $r2 is the register $rV referenced by the VLOAD instructions, containing the buffer index.
  • the register $r0 receives a bitwise AND between the address @src and the value 31. In other words, all the bits of the address @src are cancelled except the 5 least significant bits. Thus, the register $r0 receives the value 8 in this example.
  • the register $r1 receives a bitwise AND between the @src address and the value -32, encoded in a two's complement format to represent negative numbers. This cancels the five least significant bits, and stores in the register $r1 an address aligned on the data bus.
  • the registers $r2 and $r3 are initialized to the values 0 and the address @dst, respectively.
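The initialization of the four registers described above reduces to two bit masks. The following sketch assumes a hypothetical helper name `init_registers` and example addresses chosen for illustration; Python integers behave like two's complement values of unlimited width, so `-32` clears the five least significant bits as described.

```python
def init_registers(src, dst):
    """Return the initial values of $r0, $r1, $r2 and $r3 for the copy loop."""
    r0 = src & 31     # keep only the 5 least significant bits: index 0 plus the shift count
    r1 = src & -32    # clear the 5 least significant bits: read address aligned on the bus
    r2 = 0            # VLOAD buffer index, expressed in bytes
    r3 = dst          # destination address
    return r0, r1, r2, r3

# Example addresses are arbitrary; @src is misaligned by 8 bytes.
print(init_registers(0x1008, 0x2000))   # (8, 4096, 0, 8192)
```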
  • a first VLOAD instruction is executed using: (i) a buffer area formed by the registers $a0 to $a3, (ii) the index (equal to 0) contained in the register $r2, designating the register $a0, and (iii) a data block of 32 bytes (256 bits) starting at the address contained in the register $r1. This address is aligned to the 32-byte boundary preceding the position (+8) where the block of useful data starts.
  • each of the registers $r2 and $r1 is incremented by the value 32 to designate, respectively, the next register $a1 of the buffer area and the next block of 32 bytes to be read from memory.
  • a VALIGN instruction is executed, using (i) the general purpose register bank $r8 through $r11 to receive the aligned data block, (ii) the buffer area $a0 through $a3, and (iii) the index and count contained in the register $r0.
  • the register $r0 contains the value 8, according to the initialization on line 1.
  • the index idx is 0 and the right shift rsc is 8.
  • the concatenated contents of registers $a0 and $a1 are shifted right by 8 bytes, and the shifted result is written to registers $r8 to $r11.
  • The VALIGN instruction is executed at a time when the memory data has had time to arrive in the registers $a0 and $a1 given the memory latency.
  • the VLOAD instruction of line 8 is the one that loads data into the second register $a1, and it was executed 10 cycles earlier.
  • this loop allows a memory latency of up to 10 cycles. If a larger latency is to be compensated, a larger buffer size will be used, which increases the number of cycles of the loop "preamble" for pre-filling the buffer.
  • At line 18, the content of register $r0 is incremented by 32.
  • At the first iteration, the register $r0 is updated to the value 40, encoding an index idx equal to 1 and a right shift rsc still equal to 8 (this shift usually remains constant in the loop).
  • the VLOAD instruction loads the corresponding memory contents into the vector register, $a0, of index 4 modulo 4, overwriting the value that was used by the VALIGN instruction in the previous iteration.
  • the VALIGN instruction in the second iteration uses the next vector registers $a1 and $a2. And so on.
  • The VLOAD and VALIGN instructions define and manage a set of vector registers as a group of rotating registers by, in particular, using an explicit index idx and handling this index in software.
  • the hardware is adapted to simplify some of the details of rotating register management (such as the modulo s operation to produce the effective index idx) and to implement an alignment of the contents of two consecutive vector registers (512 bits) in one cycle (the separate access lines for even and odd registers).
  • These instructions also allow the buffer area to be defined arbitrarily (position and size) in the set of vector registers.
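The copy loop walked through above can be simulated end to end. The sketch below is not the listing of Table 1; it is a sequential Python model that ignores memory latency and instruction scheduling. The helper names, the `(B, s)` pair for BUF, the little-endian byte order, and the choice of pre-filling all s registers in the preamble are assumptions made for illustration.

```python
N = 32                                    # bytes per vector register and per bus word
MASK = (1 << (8 * N)) - 1

def vload(vrf, mem, buf, rv, rs):
    base, size = buf
    vrf[base + (rv // N) % size] = int.from_bytes(mem[rs:rs + N], "little")

def valign(vrf, buf, rv):
    base, size = buf
    idx, rsc = divmod(rv, N)
    lo = vrf[base + idx % size]
    hi = vrf[base + (idx + 1) % size]
    return (((hi << (8 * N)) | lo) >> (8 * rsc)) & MASK

def copy_misaligned(mem, src, dst, nblocks, buf=(0, 4)):
    """Copy nblocks 32-byte blocks from a possibly misaligned src to dst."""
    vrf = [0] * (buf[0] + buf[1])
    r0 = src & 31            # VALIGN operand: index 0 and shift count
    r1 = src & -32           # aligned read address
    r2 = 0                   # VLOAD index, in bytes
    r3 = dst                 # write address
    # Preamble: pre-fill the buffer (in hardware the count depends on the latency).
    for _ in range(buf[1]):
        vload(vrf, mem, buf, r2, r1)
        r2 += N
        r1 += N
    for _ in range(nblocks):
        block = valign(vrf, buf, r0)          # uses two previously loaded registers
        mem[r3:r3 + N] = block.to_bytes(N, "little")
        r3 += N
        r0 += N                               # next register pair, same shift count
        vload(vrf, mem, buf, r2, r1)          # refill the register just consumed
        r2 += N
        r1 += N
    return mem

mem = bytearray(512)
mem[0:256] = bytes(range(256))
copy_misaligned(mem, src=8, dst=256, nblocks=3)
print(mem[256:352] == bytes(range(8, 104)))
```

The indexes in `r0` and `r2` grow without bound and are only reduced modulo s inside the two helpers, which mirrors the division of labour described above: software updates the explicit index, hardware performs the modulo. Note that the loop reads ahead of the data it consumes, so the source region must be followed by readable memory.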
  • the VLOAD instruction described so far reads a 256-bit block to transfer it completely into a 256-bit register.
  • a typical memory structure usually also allows reading smaller blocks, such as 128, 64, 32, 16 or 8 bits.
  • Extensions of this VLOAD instruction may thus be envisaged that read smaller blocks from memory to transfer them at a specified position of the destination register, for example read 64 bits that can be written at position 0, 64, 128 or 192 of the 256-bit destination register.
  • the destination position may be encoded in the five least significant bits of the register $rV referenced by the VLOAD instruction.
  • Such a VLOAD instruction, executed several times with different destination positions, allows data blocks that are disjoint in memory to be gathered together in the same vector register.
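The gathering behaviour of such an extended VLOAD can be illustrated with a small model. The text only says that the destination position may be encoded in the five least significant bits of $rV; interpreting that field as a byte offset, as well as the name `vload_partial`, the `nbytes` parameter, the `(B, s)` pair and the little-endian byte order, are assumptions of this sketch.

```python
def vload_partial(vrf, mem, buf, rv, rs, nbytes=8):
    """Read nbytes from memory and insert them at a byte position of a buffer register.

    The caller is expected to keep pos + nbytes within the 32-byte register.
    """
    base, size = buf
    idx, pos = divmod(rv, 32)        # 5 LSBs: destination position (assumed to be in bytes)
    reg = base + idx % size
    field = int.from_bytes(mem[rs:rs + nbytes], "little")
    field_mask = ((1 << (8 * nbytes)) - 1) << (8 * pos)
    # Replace only the selected field; the other bits of the register are kept.
    vrf[reg] = (vrf[reg] & ~field_mask) | (field << (8 * pos))

# Gather four 64-bit words that are disjoint in memory into one vector register.
mem = bytes(range(128))
vrf = [0] * 4
for k, addr in enumerate((0, 40, 80, 120)):
    vload_partial(vrf, mem, (0, 4), rv=8 * k, rs=addr)
gathered = vrf[0].to_bytes(32, "little")
print(gathered == mem[0:8] + mem[40:48] + mem[80:88] + mem[120:128])
```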
  • an alignment functionality may nevertheless be realized based on a conventional rotating buffer structure, e.g., as described in the document "HPL-93-80 HPL-PD Architecture Specification", called RRB (Rotating Register Buffer).
  • a feature of such a rotating register buffer is that the address range used to access the registers is a slot on a circle of registers that rotates by one register at each execution of the branch that iterates the loop, which implies that all the available rotating buffers evolve in the same way.
  • the loop preamble executes three VLOAD instructions with the same address in the rotating buffer, say 4, and three corresponding updates of the read address in memory.
  • a new VLOAD instruction is executed, also with address 4.
  • the rotating buffer is rotated to present a new register behind address 4 and the four blocks read from memory are available at addresses 0 to 3 of the rotating buffer.
  • the corresponding execution unit may be configured to simultaneously access the registers behind addresses 0 and 1 without causing a rotation of the registers. To this end, the registers may be wired, as before, so that the odd and even registers are accessible individually and simultaneously.
  • This embodiment allows more compact code to be written, because the instructions for updating the explicit index are omitted.
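The behaviour of the conventional rotating buffer described in this last embodiment can be sketched as follows: every load targets the same address, and each rotation makes previously loaded blocks visible at lower addresses. The class `RotatingBuffer`, its attribute `rrb`, the explicit `rotate()` call after each load, and the buffer size of 8 are hypothetical; the text does not state how rotation is triggered during the preamble.

```python
class RotatingBuffer:
    """Registers addressed through a window that rotates by one register at a time."""

    def __init__(self, size):
        self.regs = [None] * size
        self.rrb = 0                          # rotation offset (hypothetical name)

    def _phys(self, addr):
        return (addr + self.rrb) % len(self.regs)

    def write(self, addr, value):
        self.regs[self._phys(addr)] = value

    def read(self, addr):
        return self.regs[self._phys(addr)]

    def rotate(self):
        # After a rotation, the register seen at address k is seen at address k - 1.
        self.rrb = (self.rrb + 1) % len(self.regs)

rb = RotatingBuffer(8)
for block in ("b0", "b1", "b2", "b3"):        # three preamble loads plus the first loop load
    rb.write(4, block)                        # always the same address, here 4
    rb.rotate()
print([rb.read(a) for a in range(4)])         # ['b0', 'b1', 'b2', 'b3']

pair = (rb.read(0), rb.read(1))               # what an alignment step would access
rb.write(4, "b4")
rb.rotate()
print(pair, (rb.read(0), rb.read(1)))         # ('b0', 'b1') ('b1', 'b2')
```

No index register is updated in this model, which is the source of the more compact code mentioned above; the price is that every rotating buffer advances in lockstep with the loop.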

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)
EP23163156.5A 2022-03-31 2023-03-21 System for managing a group of rotating registers defined arbitrarily in a processor register file Pending EP4254176A1 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
FR2202950A FR3134206A1 (fr) 2022-03-31 2022-03-31 System for managing a group of rotating registers defined arbitrarily in processor registers

Publications (1)

Publication Number Publication Date
EP4254176A1 (de) 2023-10-04

Family

ID=82694033

Family Applications (1)

Application Number Title Priority Date Filing Date
EP23163156.5A Pending EP4254176A1 (de) System for managing a group of rotating registers defined arbitrarily in a processor register file

Country Status (4)

Country Link
US (1) US20230315472A1 (de)
EP (1) EP4254176A1 (de)
CN (1) CN116893989A (de)
FR (1) FR3134206A1 (de)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0918290A1 (de) * 1997-11-19 1999-05-26 Interuniversitair Micro-Elektronica Centrum Vzw Method for transferring data structures to and from vector registers of a processor
GB2338094A (en) * 1998-05-27 1999-12-08 Advanced Risc Mach Ltd Vector register addressing
US6266758B1 (en) * 1997-10-09 2001-07-24 Mips Technologies, Inc. Alignment and ordering of vector elements for single instruction multiple data processing
US20040073773A1 (en) * 2002-02-06 2004-04-15 Victor Demjanenko Vector processor architecture and methods performed therein
US20050125624A1 (en) * 2003-12-09 2005-06-09 Arm Limited Data processing apparatus and method for moving data between registers and memory
US20050219422A1 (en) * 2004-03-31 2005-10-06 Mikhail Dorojevets Parallel vector processing
US7197625B1 (en) 1997-10-09 2007-03-27 Mips Technologies, Inc. Alignment and ordering of vector elements for single instruction multiple data processing
US7594102B2 (en) 2004-12-15 2009-09-22 Stmicroelectronics, Inc. Method and apparatus for vector execution on a scalar machine

Also Published As

Publication number Publication date
US20230315472A1 (en) 2023-10-05
FR3134206A1 (fr) 2023-10-06
CN116893989A (zh) 2023-10-17

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20240328

RBV Designated contracting states (corrected)

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR