EP1527385A1 - Method and apparatus for accessing multiple vector elements in parallel - Google Patents

Method and apparatus for accessing multiple vector elements in parallel (Verfahren und Vorrichtung zum Parallelzugriff auf Vektorelemente)

Info

Publication number
EP1527385A1
EP1527385A1 EP03741006A EP03741006A EP1527385A1 EP 1527385 A1 EP1527385 A1 EP 1527385A1 EP 03741006 A EP03741006 A EP 03741006A EP 03741006 A EP03741006 A EP 03741006A EP 1527385 A1 EP1527385 A1 EP 1527385A1
Authority
EP
European Patent Office
Prior art keywords
memory
vector
address
elements
port
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP03741006A
Other languages
English (en)
French (fr)
Inventor
Antonius A. M. Van Wel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NXP BV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2002-07-26
Filing date
2003-07-10
Publication date
2005-05-04
Application filed by Koninklijke Philips Electronics NV
Priority to EP03741006A
Publication of EP1527385A1
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043 LOAD or STORE instructions; Clear instruction
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34 Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes
    • G06F9/345 Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes of multiple operands or results
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units

Definitions

  • The present invention relates to a computer system comprising a processor and a multi-port memory that is accessible by the processor.
  • The present invention further relates to a method for transmitting a vector in said computer system.
  • The present invention also relates to a computer program for implementing said method.
  • Vector processing is a suitable technique for processing applications that have large computational demands.
  • Vector processors provide high-level operations that work on vectors, i.e. linear arrays of numbers.
  • Vector processors pipeline the operations on the individual elements of a vector.
  • The pipeline includes not only the arithmetic operations but also memory accesses and effective address calculations. In addition, most high-end vector processors allow multiple operations to be done at the same time, creating parallelism among the operations on different elements.
  • Vector instructions have several important properties. First, the computations of each result are independent of the computations of previous results, allowing a very deep pipeline without generating any data hazards. Second, a vector instruction is equivalent to executing an entire loop, reducing the instruction bandwidth requirement.
  • A vector memory system has a large data width, which allows a complete vector of data elements to be retrieved in one memory access using a single memory address. Subsequently, these data elements can be processed in parallel.
  • Several problems can occur when retrieving data from a vector memory system.
  • The problem of vector alignment arises when data read from a vector memory system cross vector boundaries. In that case the data can be retrieved by requesting the contents of two memory addresses, i.e. two vectors, and subsequently transferring the requested data to a new vector.
  • The extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for vector processing.
  • A first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register.
  • A subset of elements is selected from the first register and the second register.
  • The elements from the subset are then replicated into the elements in the third register in a particular order suitable for subsequent vector processing.
  • An object of the invention is to provide an improved method for vector alignment and ordering of vector elements, resulting in a better performance of vector processors.
  • This object is achieved with a method for transmitting a vector, characterized in that the method comprises the steps of: passing a base memory address to an address configuration means; defining a set of memory addresses by the address configuration means using the base memory address and a configuration instruction for configuring the address configuration means; transmitting the vector to/from the multi-port memory using the set of memory addresses.
  • The method allows transmitting a complete vector to or from a multi-port memory, using a single base memory address.
  • The data elements of a vector can be transmitted to or from arbitrary positions within the memory, improving flexibility and avoiding problems related to vector alignment and ordering of vector elements.
  • The use of a multi-port memory in combination with said address configuration means reduces the instruction width.
  • A complete vector can be transmitted using a single base memory address, whereas otherwise each memory address used by the multi-port memory would have to be present in the instruction. For certain types of processors, such as very large instruction word processors, reducing the code size is an important issue.
  • A computer system according to the invention is characterized in that it further comprises an address configuration means, wherein the address configuration means is conceived to define a set of memory addresses using a base memory address and a configuration instruction for configuring the address configuration means, and wherein the multi-port memory is conceived to use the set of memory addresses.
  • Complete vectors can be transmitted to or from the multi-port memory using one base memory address, which reduces memory overhead and increases the performance of the computer system.
  • The address configuration means comprises a plurality of register files arranged to be configured by the configuration instruction, and a plurality of address calculation units for calculating the set of memory addresses. The register files are accessible by the address calculation units, and the address calculation units are coupled to the multi-port memory.
  • The configuration instruction configures the plurality of register files, and these register files can hold this configuration until the next configuration instruction is executed. In between, this configuration can be used repeatedly, for example during execution of a loop of instructions.
  • An embodiment of the computer system according to the invention is characterized in that the configuration instruction comprises a set of offsets, each offset in combination with the base memory address defining a second memory address.
  • The set of offsets can be directly loaded in the plurality of register files and used by the plurality of address calculation units, improving the performance of the address configuration means.
  • Fig. 1 shows a schematic diagram of a computer system according to the invention.
  • Fig. 2 shows a schematic diagram of a memory system having a multi-port memory and an address configuration means.
  • Fig. 1 shows a block diagram of a computer system comprising a processor PROC, an address configuration unit ACU, a multi-port memory MEM and a system bus SB.
  • The processor PROC, the address configuration unit ACU and the multi-port memory MEM are coupled via the system bus SB.
  • The processor PROC may issue operations to access the multi-port memory MEM in order to read or write a vector with data elements.
  • Prior to reading or writing a set of data elements from or to the multi-port memory MEM, the address configuration unit ACU should be configured by means of a configuration instruction issued by the processor PROC.
  • The configuration instruction configures the address configuration unit ACU so that it is capable of calculating a set of memory addresses specific for the set of data elements to be retrieved from the multi-port memory MEM, using a base memory address.
  • The configuration of the address configuration unit ACU remains unchanged until the next configuration instruction is issued.
  • The processor issues a read operation comprising a base memory address, and the latter is sent to the address configuration unit ACU.
  • The address configuration unit ACU calculates a set of memory addresses. These memory addresses are sent to the multi-port memory MEM via the system bus SB, followed by reading the data elements from the multi-port memory MEM. These data elements are sent as a single vector to the processor PROC and used for further processing.
  • For a write operation, a base memory address is sent to the address configuration unit ACU.
  • The address configuration unit ACU calculates a set of memory addresses, which are sent to the multi-port memory MEM via the system bus SB.
  • The data elements are also sent to the multi-port memory MEM via the system bus SB.
  • The data elements are written to the multi-port memory MEM.
  • Fig. 2 shows a block diagram of a memory system MS, comprising a multi-port memory MEM and an embodiment of an address configuration unit ACU.
  • The multi-port memory MEM comprises a RAM memory, four data input ports DatIn, four address ports Addr and four data output ports DatOut.
  • The address configuration unit ACU comprises an address port AddrIn, four address calculation units AU, four register files RF and four data input ports DatIn.
  • The data input ports DatIn are shared by the address configuration unit ACU and the multi-port memory MEM.
  • The address input port AddrIn is coupled to the address calculation units AU, and the address calculation units AU are coupled to their corresponding address ports Addr of the multi-port memory MEM.
  • The data input ports DatIn are coupled to the register files RF.
  • The register files RF are accessible by the address calculation units AU.
  • The multi-port memory MEM supports commands for reading and writing of data.
  • Data can be read from the RAM memory via the data output ports DatOut.
  • The four data elements read from the data output ports DatOut can be combined into one vector.
  • A set of four data elements can be written to the multi-port memory via the data input ports DatIn, using the address ports Addr for memory addressing.
  • The address configuration unit ACU supports a configuration instruction, which specifies a set of offsets relative to a base memory address.
  • An offset value is written to each of the register files RF via the corresponding data input port DatIn.
  • The address calculation units AU fetch the offset value from their corresponding register file RF and store this value internally.
  • When the processor PROC issues a read operation to the memory system MS, a base memory address is provided at the address port AddrIn.
  • The address calculation units AU take the value of the base memory address from the address input port AddrIn and add their corresponding offset value.
  • The address calculation units AU send the resulting set of memory addresses to the corresponding address ports Addr, and subsequently a read command is issued to the multi-port memory MEM.
  • The resulting set of data elements is provided at the data output ports DatOut of the multi-port memory MEM.
  • The processor PROC may also issue a write operation to the memory system MS in order to write a set of data elements to the RAM memory.
  • The address port AddrIn receives a base memory address.
  • The address calculation units AU calculate a set of memory addresses, using the base memory address and their corresponding offset value.
  • The resulting set of memory addresses is sent to the corresponding address ports Addr of the multi-port memory MEM.
  • The data elements are sent to the data input ports DatIn of the multi-port memory MEM. Subsequently, a write command is issued to the multi-port memory MEM and the data elements are written to the RAM memory.
  • The configuration instruction may comprise a set of commands issued to the address calculation units AU for calculating a set of offsets.
  • The set of offsets received by the register files RF will be such that, in combination with a base memory address, the address calculation units AU are capable of defining an arbitrary set of memory addresses.
  • A set of data elements can be simultaneously written to or retrieved from the multi-port memory MEM.
  • The memory system MS therefore behaves as a vector memory system, with the advantage that a set of data elements can be retrieved from arbitrary memory locations using one base memory address.
  • The memory system MS has the advantage that a set of data elements can be addressed using one memory address, instead of requiring a set of memory addresses from an external source.
  • The instruction width can be reduced, which is especially of interest for very large instruction word processors, where reduction of code size is an important issue.
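
The read and write sequences described for Fig. 2 can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the patented hardware: the class and method names (MultiPortMemory, AddressConfigurationUnit, MemorySystem, configure, read_vector, write_vector) are hypothetical, the four ports are modelled as list positions, data elements are plain integers, and no address range checking or write-conflict arbitration is modelled because the text does not specify any.

```python
from typing import List, Sequence


class MultiPortMemory:
    """Toy model of MEM: a RAM with one address port per data port."""

    def __init__(self, size: int, ports: int = 4) -> None:
        self.ports = ports
        self.ram: List[int] = [0] * size

    def read(self, addresses: Sequence[int]) -> List[int]:
        # One element per port; the DatOut values together form one vector.
        assert len(addresses) == self.ports
        return [self.ram[a] for a in addresses]

    def write(self, addresses: Sequence[int], data: Sequence[int]) -> None:
        # Assumption: if two ports target the same address, the later port wins.
        assert len(addresses) == len(data) == self.ports
        for a, d in zip(addresses, data):
            self.ram[a] = d


class AddressConfigurationUnit:
    """Toy model of the ACU: one register file RF and one adder AU per port."""

    def __init__(self, ports: int = 4) -> None:
        self.offsets: List[int] = [0] * ports  # contents of the register files

    def configure(self, offsets: Sequence[int]) -> None:
        # The configuration instruction loads one offset per register file.
        assert len(offsets) == len(self.offsets)
        self.offsets = list(offsets)

    def addresses(self, base: int) -> List[int]:
        # Each address calculation unit adds its own offset to the base address.
        return [base + off for off in self.offsets]


class MemorySystem:
    """Toy model of MS: the ACU feeds the address ports of MEM."""

    def __init__(self, size: int, ports: int = 4) -> None:
        self.mem = MultiPortMemory(size, ports)
        self.acu = AddressConfigurationUnit(ports)

    def configure(self, offsets: Sequence[int]) -> None:
        self.acu.configure(offsets)

    def read_vector(self, base: int) -> List[int]:
        return self.mem.read(self.acu.addresses(base))

    def write_vector(self, base: int, data: Sequence[int]) -> None:
        self.mem.write(self.acu.addresses(base), data)


ms = MemorySystem(size=16)
ms.mem.ram = [10 * i for i in range(16)]  # ram[i] == 10 * i

ms.configure([0, 1, 2, 3])        # contiguous elements
unaligned = ms.read_vector(3)     # addresses 3, 4, 5, 6

ms.configure([0, 3, 5, 6])        # arbitrary positions
gathered = ms.read_vector(2)      # addresses 2, 5, 7, 8
ms.write_vector(8, [1, 2, 3, 4])  # same configuration reused: 8, 11, 13, 14
```

In this sketch each operation carries only the base address, while the offsets stay in the model's register files until the next call to configure. That mirrors the point made above that a configuration can be reused across many accesses, for example inside a loop, without widening the instruction.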

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)
  • Executing Machine-Instructions (AREA)
EP03741006A 2002-07-26 2003-07-10 Verfahren und vorrichtung zum parallelzugriff auf vektorelemente Withdrawn EP1527385A1 (de)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP03741006A EP1527385A1 (de) 2002-07-26 2003-07-10 Verfahren und vorrichtung zum parallelzugriff auf vektorelemente

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP02078074 2002-07-26
EP03741006A EP1527385A1 (de) 2002-07-26 2003-07-10 Verfahren und vorrichtung zum parallelzugriff auf vektorelemente
PCT/IB2003/003150 WO2004013752A1 (en) 2002-07-26 2003-07-10 Method and apparatus for accessing multiple vector elements in parallel

Publications (1)

Publication Number Publication Date
EP1527385A1 (de) 2005-05-04

Family

ID=31197898

Family Applications (1)

Application Number Title Priority Date Filing Date
EP03741006A Withdrawn EP1527385A1 (de) 2002-07-26 2003-07-10 Verfahren und vorrichtung zum parallelzugriff auf vektorelemente

Country Status (5)

Country Link
EP (1) EP1527385A1 (de)
JP (1) JP2005534120A (de)
CN (1) CN1672128A (de)
AU (1) AU2003281792A1 (de)
WO (1) WO2004013752A1 (de)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100349122C (zh) * 2005-08-19 2007-11-14 华为技术有限公司 一种实现多引擎并行处理器中数据包排序的方法
CN100417142C (zh) * 2005-12-22 2008-09-03 华为技术有限公司 将接口流量在多个网络处理器引擎中均担的方法
JP2009527188A (ja) 2006-02-15 2009-07-23 トムソン ライセンシング 非線形ディジタル編集用下見フィルム
US9021233B2 (en) * 2011-09-28 2015-04-28 Arm Limited Interleaving data accesses issued in response to vector access instructions
CN102930008B (zh) * 2012-10-29 2015-10-07 无锡江南计算技术研究所 向量查表方法
US9606803B2 (en) 2013-07-15 2017-03-28 Texas Instruments Incorporated Highly integrated scalable, flexible DSP megamodule architecture
CN109284822B (zh) * 2017-07-20 2021-09-21 上海寒武纪信息科技有限公司 一种神经网络运算装置及方法

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01265347A (ja) * 1988-04-18 1989-10-23 Matsushita Electric Ind Co Ltd アドレス生成装置
JPH0728786A (ja) * 1993-07-15 1995-01-31 Hitachi Ltd ベクトルプロセッサ
US6463518B1 (en) * 2000-06-19 2002-10-08 Philips Electronics No. America Corp. Generation of memory addresses for accessing a memory utilizing scheme registers

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2004013752A1 *

Also Published As

Publication number Publication date
JP2005534120A (ja) 2005-11-10
AU2003281792A1 (en) 2004-02-23
WO2004013752A1 (en) 2004-02-12
CN1672128A (zh) 2005-09-21

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20050228

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK

DAX Request for extension of the european patent (deleted)
RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: NXP B.V.

17Q First examination report despatched

Effective date: 20110120

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20110531