CN1672128A - Method and apparatus for accessing multiple vector elements in parallel - Google Patents

Method and apparatus for accessing multiple vector elements in parallel

Info

Publication number
CN1672128A
CN1672128A CN03817860A CN 03817860
Authority
CN
China
Prior art keywords
address
memory
vector
configuration
computer system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 03817860
Other languages
Chinese (zh)
Inventor
A·A·M·范维尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV filed Critical Koninklijke Philips Electronics NV
Publication of CN1672128A
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043LOAD or STORE instructions; Clear instruction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
    • G06F9/345Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units

Abstract

Vector processing is a suitable technique for applications that have large computational demands. Vector processors provide high-level operations that work on vectors, i.e. linear arrays of numbers. Vector operations can be made faster than a sequence of scalar operations on the same number of data items. Typical applications where vector processing can be used lie in the domain of audio and video processing. A vector memory system has a large data width, which allows a complete vector of data elements to be retrieved in one memory access using a single memory address. Subsequently, these data elements can be processed in parallel. However, when vector memory systems are used, problems of vector alignment and of the ordering of a set of vector elements can occur. The present invention provides an improved method for vector alignment and ordering of vector elements in a computer system comprising a processor (PROC) and a multi-port memory (MEM), resulting in a better performance. The first step comprises passing a base memory address to an address configuration unit (ACU). Next, the address configuration unit (ACU) defines a set of memory addresses using the base memory address and a configuration instruction for configuring the address configuration unit. Finally, a vector is transferred to or from the multi-port memory (MEM) using the set of memory addresses.

Description

Method and apparatus for accessing multiple vector elements in parallel
Technical field
The present invention relates to a computer system comprising:
a processor;
a multi-port memory, the multi-port memory being accessible by the processor.
The invention further relates to a method of transferring a vector in such a computer system.
The invention further relates to a computer program for implementing said method.
Background art
Vector processing is a suitable technique for handling applications that have large computational demands. Vector processors provide high-level operations that work on vectors, i.e. linear arrays of numbers. A vector processor pipelines the operations on the individual elements of a vector. The pipeline includes not only the arithmetic operations but also the memory accesses and the effective address calculations. In addition, most high-end vector processors allow multiple operations to be executed at the same time, creating parallelism between the operations on different elements. Vector instructions have several important properties. First, the computation of each result is independent of the computation of previous results, which allows very deep pipelines without generating data hazards. Second, a single vector instruction is equivalent to executing an entire loop, which reduces the instruction bandwidth requirement. Third, because a complete vector is retrieved in a single reference instead of retrieving the data elements one at a time, the overhead of memory accesses is reduced. For these reasons, vector operations can be made faster than a sequence of scalar operations on the same number of data items. Typical applications where vector processing can be used lie in the domain of audio and video processing.
A vector memory system has a large data width, which allows a complete vector of data elements to be retrieved in one memory access using a single memory address. Subsequently, these data elements can be processed in parallel. However, several problems can occur when data are retrieved from a vector memory system. First, the problem of vector alignment arises when data that cross a vector boundary are read from the vector memory system. In that case, the data may have to be retrieved by requesting the contents of two memory addresses, i.e. two vectors, after which the requested data are transferred into a new vector. Second, a problem occurs when the order in which a set of vector elements is needed differs from the order in which they are stored. If a vector is needed that contains an ordered set of elements which are stored in different vectors, the contents of those vectors have to be retrieved, requiring at least two memory accesses, followed by the selection of the appropriate data elements. United States Patent 5,933,650 describes a method for the alignment and ordering of vector elements. For the alignment of vector elements, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. A starting byte that specifies the first byte of the aligned vector is determined. Next, a vector is extracted continuously from the first register and the second register, starting at the first bit of the starting byte in the first register and continuing through the bits in the second register. Finally, the extracted vector is copied into a third register, so that the third register contains the plurality of aligned elements for further vector processing. For the ordering of vector elements, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. A subset of elements is then selected from the first register and the second register. The elements from the subset are then copied into a third register in an order that is suitable for subsequent vector processing.
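As an illustration of the alignment problem described above, the following C sketch (not part of the patent; the row width, array contents and function name are purely illustrative) shows how reading a vector that straddles a vector boundary forces two row-wide memory accesses, i.e. two memory addresses, before the requested elements can be merged into a new vector:

#include <stdio.h>

#define VLEN 4                         /* data elements per memory row                   */

static int ram[4][VLEN] = {            /* toy vector memory: 4 rows of 4 words           */
    { 0,  1,  2,  3 },
    { 4,  5,  6,  7 },
    { 8,  9, 10, 11 },
    {12, 13, 14, 15 }
};

/* Load VLEN consecutive elements starting at flat element index 'start'.
 * When the requested vector crosses a row boundary, two different rows
 * (two memory addresses) have to be read before the result can be merged. */
static void load_unaligned(int start, int out[VLEN])
{
    int row = start / VLEN;
    int off = start % VLEN;
    for (int i = 0; i < VLEN; i++) {
        int idx = off + i;
        out[i] = ram[row + idx / VLEN][idx % VLEN];   /* second row whenever idx >= VLEN */
    }
}

int main(void)
{
    int v[VLEN];
    load_unaligned(6, v);              /* crosses the boundary between rows 1 and 2      */
    for (int i = 0; i < VLEN; i++)
        printf("%d ", v[i]);           /* prints: 6 7 8 9                                */
    printf("\n");
    return 0;
}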
A drawback of this prior-art method for the alignment and ordering of vector elements is that more than one read access to the vector memory system is required, which increases the overhead of retrieving vector data. In addition, extra hardware is needed, for example additional hardware for temporarily storing the vectors from which elements have to be selected for vector alignment or vector ordering.
Summary of the invention
It is an object of the invention to provide an improved method for vector alignment and the ordering of vector elements, resulting in a better performance of the vector processor.
This object is achieved with a method of transferring a vector, characterized in that the method comprises the following steps:
passing a base memory address to an address configuration unit;
defining a set of memory addresses by the address configuration unit, using the base memory address and a configuration instruction for configuring the address configuration unit;
transferring a vector to or from the multi-port memory using the set of memory addresses.
The method allows a complete vector to be transferred to or from the multi-port memory using a single base memory address. The data elements of a vector can be transferred to, or retrieved from, arbitrary positions in the memory, which improves flexibility and avoids the problems related to vector alignment and the ordering of vector elements. In addition, using a multi-port memory in combination with the address configuration unit reduces the instruction width: a complete vector can be transferred using a single base memory address, whereas otherwise each memory address used by the multi-port memory would have to be present in the instruction. For some types of processors, such as very long instruction word (VLIW) processors, reducing the code size is an important issue.
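A minimal sketch, in C, of the address-generation idea described above; the port count, the function names acu_configure and acu_addresses, and the offset values are assumptions made for illustration and are not taken from the patent. It shows that, once the offsets have been configured, every subsequent vector access only has to supply a single base memory address:

#include <stdio.h>

enum { PORTS = 4 };                        /* one memory address per memory port          */

static unsigned offsets[PORTS];            /* written by the configuration instruction    */

/* Configuration instruction: store one offset per port; the configuration is
 * kept until the next configuration instruction is issued.                      */
static void acu_configure(const unsigned off[PORTS])
{
    for (int p = 0; p < PORTS; p++)
        offsets[p] = off[p];
}

/* Each access: derive the full set of memory addresses from one base address.   */
static void acu_addresses(unsigned base, unsigned addr[PORTS])
{
    for (int p = 0; p < PORTS; p++)
        addr[p] = base + offsets[p];
}

int main(void)
{
    unsigned cfg[PORTS] = {0, 16, 32, 48};    /* e.g. a column of a 16-word-wide array    */
    unsigned addr[PORTS];

    acu_configure(cfg);
    acu_addresses(100, addr);                 /* the load instruction only carries "100"  */
    for (int p = 0; p < PORTS; p++)
        printf("port %d -> address %u\n", p, addr[p]);
    return 0;
}

Because only the base address appears in each load or store, the instruction encoding stays narrow even though several memory ports are addressed, which is the code-size benefit mentioned above for VLIW processors.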
According to the invention, a computer system is characterized in that the computer system further comprises an address configuration unit, wherein the address configuration unit is designed to define a set of memory addresses using a base memory address and a configuration instruction for configuring the address configuration unit, and wherein the multi-port memory is designed to use this set of memory addresses. A complete vector can be transferred to or from the multi-port memory using a single base memory address, which reduces the memory overhead and improves the performance of the computer system.
Preferred embodiments of the invention are defined in the dependent claims. A computer program for implementing the method of transferring a vector according to the invention is defined in claim 8.
An embodiment of the computer system according to the invention is characterized in that:
the address configuration unit comprises a plurality of register files arranged to be configured by the configuration instruction and a plurality of address calculation units for calculating the set of memory addresses;
the register files can be accessed by the address calculation units;
the address calculation units are coupled to the multi-port memory.
The configuration instruction configures the plurality of register files, and the register files retain this configuration until a next configuration instruction is executed. In the meantime, the configuration can be used repeatedly, for example while instructions are executed in a loop.
An embodiment of the computer system according to the invention is characterized in that the configuration instruction comprises a set of offsets, each offset being combined with the base memory address to define a second memory address. The set of offsets can be loaded directly into the plurality of register files and used by the plurality of address calculation units, which improves the performance of the address configuration unit.
Description of drawings
The features of the described embodiments are further illustrated and described with reference to the accompanying drawings, in which:
Fig. 1 shows a schematic diagram of a computer system according to the invention;
Fig. 2 shows a schematic diagram of a memory system comprising a multi-port memory and an address configuration unit.
Embodiment
Fig. 1 shows a block diagram of a computer system comprising a processor PROC, an address configuration unit ACU, a multi-port memory MEM and a system bus SB. The processor PROC, the address configuration unit ACU and the multi-port memory MEM are coupled via the system bus SB. During the execution of instructions, the processor PROC can issue operations to access the multi-port memory MEM in order to read or write a vector of data elements. Before a set of data elements is read from or written to the multi-port memory MEM, the address configuration unit ACU has to be configured by a configuration instruction sent by the processor PROC. The configuration instruction configures the address configuration unit ACU so that, for the set of data elements to be retrieved from the multi-port memory MEM, it can calculate a specific set of memory addresses from a base memory address. The configuration of the address configuration unit ACU remains unchanged until a next configuration instruction is sent. After the address configuration unit ACU has been configured, the processor issues a read operation comprising a base memory address, and this base memory address is passed to the address configuration unit ACU. Subsequently, the address configuration unit ACU calculates the set of memory addresses. These memory addresses are sent to the multi-port memory MEM via the system bus SB, followed by reading the data elements from the multi-port memory MEM. The data elements are transferred to the processor PROC as a single vector for further processing. If the processor PROC issues a write operation, a base memory address is likewise sent to the address configuration unit ACU. The address configuration unit ACU calculates a set of memory addresses and sends this set of memory addresses to the multi-port memory MEM via the system bus SB. The data elements are also sent to the multi-port memory MEM via the system bus SB. In a next step, the data elements are written into the multi-port memory MEM. Before a next write or read operation, a new configuration instruction may or may not have to be issued, depending on the required set of memory addresses. For example, if a set of data elements to be read requires the same set of memory addresses and the same base memory address applies, the configuration instruction does not have to be repeated. A new configuration instruction also does not have to be sent when a different base memory address is used but the required configuration of the address configuration unit ACU remains the same.
Fig. 2 shows a block diagram of a memory system MS of an embodiment, comprising a multi-port memory MEM and an address configuration unit ACU. The multi-port memory MEM comprises a RAM memory, four data input ports DatIn, four address ports Addr and four data output ports DatOut. The address configuration unit ACU comprises an address input port AddrIn, four address calculation units AU, four register files RF and four data input ports DatIn. In this embodiment, the data input ports DatIn are shared between the address configuration unit ACU and the multi-port memory MEM. The address input port AddrIn is coupled to the address calculation units AU, and each address calculation unit AU is coupled to its corresponding address port Addr of the multi-port memory MEM. The data input ports DatIn are coupled to the register files RF. The register files RF can be accessed by the address calculation units AU.
The multi-port memory MEM supports commands for reading and writing data. Using the address ports Addr, data can be read from the RAM memory via the data output ports DatOut. The four data elements that have been read can be merged from the data output ports DatOut into a single vector. A set of four data elements can be written into the multi-port memory via the data input ports DatIn and the address ports Addr used for addressing the memory.
The address configuration unit ACU supports a configuration instruction that specifies a set of offsets with respect to a base memory address. When the configuration instruction is executed, the offset values are written into the respective register files RF via the corresponding data input ports DatIn. Subsequently, the address calculation units AU fetch the offset values from their corresponding register files RF and store these values internally.
If the processor PROC issues a read operation to the memory system MS, a base memory address is provided at the address input port AddrIn. The address calculation units AU obtain the value of the base memory address from the address input port AddrIn and add their respective offset values. The address calculation units AU send the resulting set of memory addresses to the corresponding address ports Addr, after which a read command is sent to the multi-port memory MEM. The resulting set of data elements is provided at the data output ports DatOut of the multi-port memory MEM. The processor PROC can also issue a write operation to the memory system MS in order to write a set of data elements into the RAM memory. The address input port AddrIn receives the base memory address. The address calculation units AU calculate a set of memory addresses using the base memory address and their respective offset values. The resulting set of memory addresses is sent to the corresponding address ports Addr of the multi-port memory MEM. The data elements are sent to the data input ports DatIn of the multi-port memory MEM. Subsequently, a write command is sent to the multi-port memory MEM and the data elements are written into the RAM memory.
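The following C sketch models the behaviour of the memory system MS just described, purely as an illustration: the flat RAM, the fixed four-port count and the names MemorySystem, ms_configure, ms_read and ms_write are assumptions, not taken from the patent. A configuration instruction writes one offset into each register file RF; a read operation gathers four data elements into a vector from a single base address, and a write operation scatters a vector using the same address set:

#include <stdio.h>

enum { PORTS = 4, RAM_WORDS = 64 };

typedef struct {
    int ram[RAM_WORDS];            /* the RAM memory behind the four ports          */
    unsigned rf[PORTS];            /* register files RF: one offset per port        */
} MemorySystem;

/* Configuration instruction: write one offset into each register file RF.         */
static void ms_configure(MemorySystem *ms, const unsigned off[PORTS])
{
    for (int p = 0; p < PORTS; p++)
        ms->rf[p] = off[p];
}

/* Read operation: each address unit AU adds its offset to the base address, and
 * the four data output ports together deliver one vector.                         */
static void ms_read(const MemorySystem *ms, unsigned base, int vec[PORTS])
{
    for (int p = 0; p < PORTS; p++)
        vec[p] = ms->ram[base + ms->rf[p]];
}

/* Write operation: the same address set is used to scatter one vector.            */
static void ms_write(MemorySystem *ms, unsigned base, const int vec[PORTS])
{
    for (int p = 0; p < PORTS; p++)
        ms->ram[base + ms->rf[p]] = vec[p];
}

int main(void)
{
    MemorySystem ms = {{0}, {0}};
    unsigned stride_cfg[PORTS] = {0, 8, 16, 24};   /* e.g. a column of an 8-word-wide array */
    int v[PORTS] = {10, 20, 30, 40}, r[PORTS];

    ms_configure(&ms, stride_cfg);
    ms_write(&ms, 3, v);           /* scatter the vector to addresses 3, 11, 19, 27 */
    ms_read(&ms, 3, r);            /* gather it back using the same single base     */
    for (int p = 0; p < PORTS; p++)
        printf("%d ", r[p]);       /* prints: 10 20 30 40                           */
    printf("\n");
    return 0;
}

In this model a new configuration is only needed when the offset pattern changes; as described above for Fig. 1, issuing a different base address alone does not require a new configuration instruction.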
In other embodiments, the configuration instruction can comprise a set of commands that is sent to the address configuration unit ACU and that is used to calculate the set of offsets.
With a suitable configuration instruction, the set of offsets received by the register files RF is combined with the base memory address in such a way that the address calculation units AU can define an arbitrary set of memory addresses. Using this set of memory addresses, a set of data elements can be written into, or retrieved from, the multi-port memory MEM simultaneously. The memory system MS therefore acts as a vector memory system, with the advantage that it allows a set of data elements to be retrieved from arbitrary memory locations using a single base memory address. Furthermore, compared with a plain multi-port memory, the memory system MS has the advantage that a set of data elements can be addressed using a single memory address, without requiring a set of memory addresses from an external source. As a result, the instruction width can be reduced, which is of particular interest for very long instruction word processors, in which reducing the code size is an important issue.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims (8)

1. A method of transferring a vector in a computer system, the computer system comprising:
a processor;
a multi-port memory which can be accessed by the processor,
characterized in that the method comprises the following steps:
passing a base memory address to an address configuration unit;
defining a set of memory addresses by the address configuration unit, using the base memory address and a configuration instruction for configuring the address configuration unit;
transferring a vector to or from the multi-port memory using said set of memory addresses.
2. A method as claimed in claim 1, wherein:
the address configuration unit comprises a plurality of register files configured by the configuration instruction and a plurality of address calculation units for calculating said set of memory addresses;
the register files can be accessed by the address calculation units;
the address calculation units are coupled to the multi-port memory.
3. A method as claimed in claim 1, wherein:
the configuration instruction comprises a set of offsets, each offset in combination with the base memory address defining a second memory address.
4. A computer system comprising:
a processor;
a multi-port memory, the multi-port memory being accessible by the processor,
characterized in that the computer system further comprises an address configuration unit, wherein the address configuration unit is designed to define a set of memory addresses using a base memory address and a configuration instruction for configuring the address configuration unit, and wherein the multi-port memory is designed to use said set of memory addresses.
5. A computer system as claimed in claim 4, wherein:
the address configuration unit comprises a plurality of register files arranged to be configured by the configuration instruction and a plurality of address calculation units for calculating said set of memory addresses;
the register files can be accessed by the address calculation units;
the address calculation units are coupled to the multi-port memory.
6. A computer system as claimed in claim 4, wherein:
the configuration instruction comprises a set of offsets, each offset in combination with the base memory address defining a second memory address.
7. A computer system as claimed in claim 4, wherein the multi-port memory and the address configuration unit are comprised in a memory system.
8. A computer program comprising computer program code means for instructing a computer system to carry out the steps of the method as claimed in claim 1.
CN 03817860 2002-07-26 2003-07-10 Method and apparatus for accessing multiple vector elements in parallel Pending CN1672128A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP02078074.8 2002-07-26
EP02078074 2002-07-26

Publications (1)

Publication Number Publication Date
CN1672128A true CN1672128A (en) 2005-09-21

Family

ID=31197898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 03817860 Pending CN1672128A (en) 2002-07-26 2003-07-10 Method and apparatus for accessing multiple vector elements in parallel

Country Status (5)

Country Link
EP (1) EP1527385A1 (en)
JP (1) JP2005534120A (en)
CN (1) CN1672128A (en)
AU (1) AU2003281792A1 (en)
WO (1) WO2004013752A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8184260B2 (en) 2006-02-15 2012-05-22 Thomson Licensing Non-linear, digital dailies
CN102930008A (en) * 2012-10-29 2013-02-13 无锡江南计算技术研究所 Vector table looking up method and processor
CN103930883A (en) * 2011-09-28 2014-07-16 Arm有限公司 Interleaving data accesses issued in response to vector access instructions
CN110597558A (en) * 2017-07-20 2019-12-20 上海寒武纪信息科技有限公司 Neural network task processing system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100349122C (en) * 2005-08-19 2007-11-14 华为技术有限公司 Method for realizing data packet sequencing for multi engine paralled processor
CN100417142C (en) * 2005-12-22 2008-09-03 华为技术有限公司 Method for average distributing interface flow at multi network processor engines

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01265347A (en) * 1988-04-18 1989-10-23 Matsushita Electric Ind Co Ltd Address generating device
JPH0728786A (en) * 1993-07-15 1995-01-31 Hitachi Ltd Vector processor
US6463518B1 (en) * 2000-06-19 2002-10-08 Philips Electronics No. America Corp. Generation of memory addresses for accessing a memory utilizing scheme registers

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8184260B2 (en) 2006-02-15 2012-05-22 Thomson Licensing Non-linear, digital dailies
CN103930883A (en) * 2011-09-28 2014-07-16 Arm有限公司 Interleaving data accesses issued in response to vector access instructions
CN103930883B (en) * 2011-09-28 2017-02-15 Arm 有限公司 Interleaving data accesses method and device in response to vector access instructions
CN102930008A (en) * 2012-10-29 2013-02-13 无锡江南计算技术研究所 Vector table looking up method and processor
CN102930008B (en) * 2012-10-29 2015-10-07 无锡江南计算技术研究所 Vector look-up method
CN110597558A (en) * 2017-07-20 2019-12-20 上海寒武纪信息科技有限公司 Neural network task processing system
CN110597558B (en) * 2017-07-20 2021-11-12 上海寒武纪信息科技有限公司 Neural network task processing system

Also Published As

Publication number Publication date
WO2004013752A1 (en) 2004-02-12
JP2005534120A (en) 2005-11-10
AU2003281792A1 (en) 2004-02-23
EP1527385A1 (en) 2005-05-04

Similar Documents

Publication Publication Date Title
US5832290A (en) Apparatus, systems and method for improving memory bandwidth utilization in vector processing systems
US20200371888A1 (en) Streaming engine with deferred exception reporting
JP3098071B2 (en) Computer system for efficient execution of programs with conditional branches
JP4987882B2 (en) Thread-optimized multiprocessor architecture
US5586256A (en) Computer system using multidimensional addressing between multiple processors having independently addressable internal memory for efficient reordering and redistribution of data arrays between the processors
JP2010532905A (en) Thread-optimized multiprocessor architecture
US20050273576A1 (en) Microprocessor with integrated high speed memory
CN1702773A (en) Programmable parallel lookup memory
US20190187903A1 (en) Streaming engine with fetch ahead hysteresis
US11709778B2 (en) Streaming engine with early and late address and loop count registers to track architectural state
US20230028372A1 (en) Memory shapes
US20060155953A1 (en) Method and apparatus for accessing multiple vector elements in parallel
CN108268596A (en) Search for the method and system of data stored in memory
CN1672128A (en) Method and apparatus for accessing multiple vector elements in parallel
US11640397B2 (en) Acceleration of data queries in memory
US8656133B2 (en) Managing storage extents and the obtaining of storage blocks within the extents
KR20220053017A (en) Space-time fusion-multiplication-addition and related systems, methods and devices
US6829691B2 (en) System for compressing/decompressing data
US5752271A (en) Method and apparatus for using double precision addressable registers for single precision data
GB2571352A (en) An apparatus and method for accessing metadata when debugging a device
CN1269026C (en) register move operation
US11782718B2 (en) Implied fence on stream open
Melnyk Parallel Ordered-Access Machine Computational Model and Architecture
TW202331713A (en) Method for storing and accessing a data operand in a memory unit
TW202340947A (en) Technique for handling data elements stored in an array storage

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: NXP CO., LTD.

Free format text: FORMER OWNER: KONINKLIJKE PHILIPS ELECTRONICS N.V.

Effective date: 20070914

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20070914

Address after: Eindhoven, Netherlands

Applicant after: Koninkl Philips Electronics NV

Address before: Eindhoven, Netherlands

Applicant before: Koninklijke Philips Electronics N.V.

C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication