CN1672128A - Method and apparatus for accessing multiple vector elements in parallel - Google Patents

Method and apparatus for accessing multiple vector elements in parallel

Info

Publication number
CN1672128A
CN1672128A CN03817860A CN 03817860
Authority
CN
China
Prior art keywords
address
memory
vector
configuration
computer system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 03817860
Other languages
Chinese (zh)
Inventor
A·A·M·范维尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV filed Critical Koninklijke Philips Electronics NV
Publication of CN1672128A
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043LOAD or STORE instructions; Clear instruction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
    • G06F9/345Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units

Abstract

Vector processing is a suitable technique for applications that have large computational demands. Vector processors provide high-level operations that work on vectors, i.e. linear arrays of numbers. Vector operations can be made faster than a sequence of scalar operations on the same number of data items. Typical applications where vector processing can be used lie in the domain of audio and video processing. A vector memory system has a large data width, which allows a complete vector of data elements to be retrieved in one memory access using a single memory address. Subsequently, these data elements can be processed in parallel. However, when vector memory systems are used, problems of vector alignment and of the ordering of a set of vector elements can occur. The present invention provides an improved method for vector alignment and ordering of vector elements in a computer system comprising a processor (PROC) and a multi-port memory (MEM), resulting in a better performance. The first step comprises passing a base memory address to an address configuration unit (ACU). Next, the address configuration unit (ACU) defines a set of memory addresses using the base memory address and a configuration instruction for configuring the address configuration unit. Finally, a vector is transferred to or from the multi-port memory (MEM) using the set of memory addresses.

Description

Method and apparatus for accessing multiple vector elements in parallel
Technical field
The present invention relates to a computer system comprising:
a processor;
a multi-port memory, the multi-port memory being accessible by the processor.
The invention further relates to a method of transferring a vector in such a computer system.
The invention further relates to a computer program for implementing said method.
Background art
Vector processing is a suitable technique for handling applications that have large computational demands. Vector processors provide high-level operations that work on vectors, i.e. linear arrays of numbers. A vector processor pipelines the operations on the individual elements of a vector. The pipeline includes not only the arithmetic operations but also the memory accesses and the effective address calculations. In addition, most high-end vector processors allow multiple operations to be executed at the same time, creating parallelism between the operations on different elements. Vector instructions have several important properties. First, the computation of each result is independent of the computation of previous results, which allows very deep pipelines without generating data hazards. Second, a single vector instruction is equivalent to executing an entire loop, which reduces the instruction bandwidth requirement. Third, because a complete vector is retrieved in a single reference instead of retrieving the data elements one at a time, the overhead of memory accesses is reduced. For these reasons, vector operations can be made faster than a sequence of scalar operations on the same number of data items. Typical applications where vector processing can be used lie in the domain of audio and video processing.
A vector memory system has a large data width, which allows a complete vector of data elements to be retrieved in one memory access using a single memory address. Subsequently, these data elements can be processed in parallel. However, several problems can occur when data are retrieved from a vector memory system. First, the problem of vector alignment arises when data that cross a vector boundary are read from the vector memory system. In that case, the data may have to be retrieved by requesting the contents of two memory addresses, i.e. two vectors, after which the requested data are transferred into a new vector. Second, a problem occurs when the order in which a set of vector elements is needed differs from the order in which they are stored. If a vector is needed that contains an ordered set of elements which are stored in different vectors, the contents of those vectors have to be retrieved, requiring at least two memory accesses, followed by the selection of the appropriate data elements. United States Patent 5,933,650 describes a method for the alignment and ordering of vector elements. For the alignment of vector elements, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. A starting byte that specifies the first byte of the aligned vector is determined. Next, a vector is extracted continuously from the first register and the second register, starting at the first bit of the starting byte in the first register and continuing through the bits in the second register. Finally, the extracted vector is copied into a third register, so that the third register contains the plurality of aligned elements for further vector processing. For the ordering of vector elements, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. A subset of elements is then selected from the first register and the second register. The elements from the subset are then copied into a third register in an order that is suitable for subsequent vector processing.
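As an illustration of the alignment problem described above, the following C sketch (not part of the patent; the row width, array contents and function name are purely illustrative) shows how reading a vector that straddles a vector boundary forces two row-wide memory accesses, i.e. two memory addresses, before the requested elements can be merged into a new vector:

#include <stdio.h>

#define VLEN 4                         /* data elements per memory row                   */

static int ram[4][VLEN] = {            /* toy vector memory: 4 rows of 4 words           */
    { 0,  1,  2,  3 },
    { 4,  5,  6,  7 },
    { 8,  9, 10, 11 },
    {12, 13, 14, 15 }
};

/* Load VLEN consecutive elements starting at flat element index 'start'.
 * When the requested vector crosses a row boundary, two different rows
 * (two memory addresses) have to be read before the result can be merged. */
static void load_unaligned(int start, int out[VLEN])
{
    int row = start / VLEN;
    int off = start % VLEN;
    for (int i = 0; i < VLEN; i++) {
        int idx = off + i;
        out[i] = ram[row + idx / VLEN][idx % VLEN];   /* second row whenever idx >= VLEN */
    }
}

int main(void)
{
    int v[VLEN];
    load_unaligned(6, v);              /* crosses the boundary between rows 1 and 2      */
    for (int i = 0; i < VLEN; i++)
        printf("%d ", v[i]);           /* prints: 6 7 8 9                                */
    printf("\n");
    return 0;
}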
A drawback of this prior-art method for the alignment and ordering of vector elements is that more than one read access to the vector memory system is required, which increases the overhead of retrieving vector data. In addition, extra hardware is needed, for example additional hardware for temporarily storing the vectors from which elements have to be selected for vector alignment or vector ordering.
Summary of the invention
It is an object of the invention to provide an improved method for vector alignment and the ordering of vector elements, resulting in a better performance of the vector processor.
This object is achieved with a method of transferring a vector, characterized in that the method comprises the following steps:
passing a base memory address to an address configuration unit;
defining a set of memory addresses by the address configuration unit, using the base memory address and a configuration instruction for configuring the address configuration unit;
transferring a vector to or from the multi-port memory using the set of memory addresses.
The method allows a complete vector to be transferred to or from the multi-port memory using a single base memory address. The data elements of a vector can be transferred to, or retrieved from, arbitrary positions in the memory, which improves flexibility and avoids the problems related to vector alignment and the ordering of vector elements. In addition, using a multi-port memory in combination with the address configuration unit reduces the instruction width: a complete vector can be transferred using a single base memory address, whereas otherwise each memory address used by the multi-port memory would have to be present in the instruction. For some types of processors, such as very long instruction word (VLIW) processors, reducing the code size is an important issue.
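A minimal sketch, in C, of the address-generation idea described above; the port count, the function names acu_configure and acu_addresses, and the offset values are assumptions made for illustration and are not taken from the patent. It shows that, once the offsets have been configured, every subsequent vector access only has to supply a single base memory address:

#include <stdio.h>

enum { PORTS = 4 };                        /* one memory address per memory port          */

static unsigned offsets[PORTS];            /* written by the configuration instruction    */

/* Configuration instruction: store one offset per port; the configuration is
 * kept until the next configuration instruction is issued.                      */
static void acu_configure(const unsigned off[PORTS])
{
    for (int p = 0; p < PORTS; p++)
        offsets[p] = off[p];
}

/* Each access: derive the full set of memory addresses from one base address.   */
static void acu_addresses(unsigned base, unsigned addr[PORTS])
{
    for (int p = 0; p < PORTS; p++)
        addr[p] = base + offsets[p];
}

int main(void)
{
    unsigned cfg[PORTS] = {0, 16, 32, 48};    /* e.g. a column of a 16-word-wide array    */
    unsigned addr[PORTS];

    acu_configure(cfg);
    acu_addresses(100, addr);                 /* the load instruction only carries "100"  */
    for (int p = 0; p < PORTS; p++)
        printf("port %d -> address %u\n", p, addr[p]);
    return 0;
}

Because only the base address appears in each load or store, the instruction encoding stays narrow even though several memory ports are addressed, which is the code-size benefit mentioned above for VLIW processors.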
According to the invention, a computer system is characterized in that the computer system further comprises an address configuration unit, wherein the address configuration unit is designed to define a set of memory addresses using a base memory address and a configuration instruction for configuring the address configuration unit, and wherein the multi-port memory is designed to use this set of memory addresses. A complete vector can be transferred to or from the multi-port memory using a single base memory address, which reduces the memory overhead and improves the performance of the computer system.
Preferred embodiments of the invention are defined in the dependent claims. A computer program for implementing the method of transferring a vector according to the invention is defined in claim 8.
An embodiment of the computer system according to the invention is characterized in that:
the address configuration unit comprises a plurality of register files arranged to be configured by the configuration instruction and a plurality of address calculation units for calculating the set of memory addresses;
the register files can be accessed by the address calculation units;
the address calculation units are coupled to the multi-port memory.
The configuration instruction configures the plurality of register files, and the register files retain this configuration until a next configuration instruction is executed. In the meantime, the configuration can be used repeatedly, for example while instructions are executed in a loop.
An embodiment of the computer system according to the invention is characterized in that the configuration instruction comprises a set of offsets, each offset being combined with the base memory address to define a second memory address. The set of offsets can be loaded directly into the plurality of register files and used by the plurality of address calculation units, which improves the performance of the address configuration unit.
Description of drawings
The features of the described embodiments are further illustrated and described with reference to the accompanying drawings, in which:
Fig. 1 shows a schematic diagram of a computer system according to the invention;
Fig. 2 shows a schematic diagram of a memory system comprising a multi-port memory and an address configuration unit.
Embodiment
Fig. 1 shows a block diagram of a computer system comprising a processor PROC, an address configuration unit ACU, a multi-port memory MEM and a system bus SB. The processor PROC, the address configuration unit ACU and the multi-port memory MEM are coupled via the system bus SB. During the execution of instructions, the processor PROC can issue operations to access the multi-port memory MEM in order to read or write a vector of data elements. Before a set of data elements is read from or written to the multi-port memory MEM, the address configuration unit ACU has to be configured by a configuration instruction sent by the processor PROC. The configuration instruction configures the address configuration unit ACU so that, for the set of data elements to be retrieved from the multi-port memory MEM, it can calculate a specific set of memory addresses from a base memory address. The configuration of the address configuration unit ACU remains unchanged until a next configuration instruction is sent. After the address configuration unit ACU has been configured, the processor issues a read operation comprising a base memory address, and this base memory address is passed to the address configuration unit ACU. Subsequently, the address configuration unit ACU calculates the set of memory addresses. These memory addresses are sent to the multi-port memory MEM via the system bus SB, followed by reading the data elements from the multi-port memory MEM. The data elements are transferred to the processor PROC as a single vector for further processing. If the processor PROC issues a write operation, a base memory address is likewise sent to the address configuration unit ACU. The address configuration unit ACU calculates a set of memory addresses and sends this set of memory addresses to the multi-port memory MEM via the system bus SB. The data elements are also sent to the multi-port memory MEM via the system bus SB. In a next step, the data elements are written into the multi-port memory MEM. Before a next write or read operation, a new configuration instruction may or may not have to be issued, depending on the required set of memory addresses. For example, if a set of data elements to be read requires the same set of memory addresses and the same base memory address applies, the configuration instruction does not have to be repeated. A new configuration instruction also does not have to be sent when a different base memory address is used but the required configuration of the address configuration unit ACU remains the same.
Fig. 2 shows a block diagram of a memory system MS of an embodiment, comprising a multi-port memory MEM and an address configuration unit ACU. The multi-port memory MEM comprises a RAM memory, four data input ports DatIn, four address ports Addr and four data output ports DatOut. The address configuration unit ACU comprises an address input port AddrIn, four address calculation units AU, four register files RF and four data input ports DatIn. In this embodiment, the data input ports DatIn are shared between the address configuration unit ACU and the multi-port memory MEM. The address input port AddrIn is coupled to the address calculation units AU, and each address calculation unit AU is coupled to its corresponding address port Addr of the multi-port memory MEM. The data input ports DatIn are coupled to the register files RF. The register files RF can be accessed by the address calculation units AU.
The multi-port memory MEM supports commands for reading and writing data. Using the address ports Addr, data can be read from the RAM memory via the data output ports DatOut. The four data elements that have been read can be merged from the data output ports DatOut into a single vector. A set of four data elements can be written into the multi-port memory via the data input ports DatIn and the address ports Addr used for addressing the memory.
The address configuration unit ACU supports a configuration instruction that specifies a set of offsets with respect to a base memory address. When the configuration instruction is executed, the offset values are written into the respective register files RF via the corresponding data input ports DatIn. Subsequently, the address calculation units AU fetch the offset values from their corresponding register files RF and store these values internally.
If the processor PROC issues a read operation to the memory system MS, a base memory address is provided at the address input port AddrIn. The address calculation units AU obtain the value of the base memory address from the address input port AddrIn and add their respective offset values. The address calculation units AU send the resulting set of memory addresses to the corresponding address ports Addr, after which a read command is sent to the multi-port memory MEM. The resulting set of data elements is provided at the data output ports DatOut of the multi-port memory MEM. The processor PROC can also issue a write operation to the memory system MS in order to write a set of data elements into the RAM memory. The address input port AddrIn receives the base memory address. The address calculation units AU calculate a set of memory addresses using the base memory address and their respective offset values. The resulting set of memory addresses is sent to the corresponding address ports Addr of the multi-port memory MEM. The data elements are sent to the data input ports DatIn of the multi-port memory MEM. Subsequently, a write command is sent to the multi-port memory MEM and the data elements are written into the RAM memory.
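The following C sketch models the behaviour of the memory system MS just described, purely as an illustration: the flat RAM, the fixed four-port count and the names MemorySystem, ms_configure, ms_read and ms_write are assumptions, not taken from the patent. A configuration instruction writes one offset into each register file RF; a read operation gathers four data elements into a vector from a single base address, and a write operation scatters a vector using the same address set:

#include <stdio.h>

enum { PORTS = 4, RAM_WORDS = 64 };

typedef struct {
    int ram[RAM_WORDS];            /* the RAM memory behind the four ports          */
    unsigned rf[PORTS];            /* register files RF: one offset per port        */
} MemorySystem;

/* Configuration instruction: write one offset into each register file RF.         */
static void ms_configure(MemorySystem *ms, const unsigned off[PORTS])
{
    for (int p = 0; p < PORTS; p++)
        ms->rf[p] = off[p];
}

/* Read operation: each address unit AU adds its offset to the base address, and
 * the four data output ports together deliver one vector.                         */
static void ms_read(const MemorySystem *ms, unsigned base, int vec[PORTS])
{
    for (int p = 0; p < PORTS; p++)
        vec[p] = ms->ram[base + ms->rf[p]];
}

/* Write operation: the same address set is used to scatter one vector.            */
static void ms_write(MemorySystem *ms, unsigned base, const int vec[PORTS])
{
    for (int p = 0; p < PORTS; p++)
        ms->ram[base + ms->rf[p]] = vec[p];
}

int main(void)
{
    MemorySystem ms = {{0}, {0}};
    unsigned stride_cfg[PORTS] = {0, 8, 16, 24};   /* e.g. a column of an 8-word-wide array */
    int v[PORTS] = {10, 20, 30, 40}, r[PORTS];

    ms_configure(&ms, stride_cfg);
    ms_write(&ms, 3, v);           /* scatter the vector to addresses 3, 11, 19, 27 */
    ms_read(&ms, 3, r);            /* gather it back using the same single base     */
    for (int p = 0; p < PORTS; p++)
        printf("%d ", r[p]);       /* prints: 10 20 30 40                           */
    printf("\n");
    return 0;
}

In this model a new configuration is only needed when the offset pattern changes; as described above for Fig. 1, issuing a different base address alone does not require a new configuration instruction.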
In other embodiments, the configuration instruction can comprise a set of commands that is sent to the address configuration unit ACU and that is used to calculate the set of offsets.
With a suitable configuration instruction, the set of offsets received by the register files RF is combined with the base memory address in such a way that the address calculation units AU can define an arbitrary set of memory addresses. Using this set of memory addresses, a set of data elements can be written into, or retrieved from, the multi-port memory MEM simultaneously. The memory system MS therefore acts as a vector memory system, with the advantage that it allows a set of data elements to be retrieved from arbitrary memory locations using a single base memory address. Furthermore, compared with a plain multi-port memory, the memory system MS has the advantage that a set of data elements can be addressed using a single memory address, without requiring a set of memory addresses from an external source. As a result, the instruction width can be reduced, which is of particular interest for very long instruction word processors, in which reducing the code size is an important issue.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims (8)

1. A method of transferring a vector in a computer system, the computer system comprising:
a processor;
a multi-port memory which can be accessed by the processor,
characterized in that the method comprises the following steps:
passing a base memory address to an address configuration unit;
defining a set of memory addresses by the address configuration unit, using the base memory address and a configuration instruction for configuring the address configuration unit;
transferring a vector to or from the multi-port memory using said set of memory addresses.
2. A method as claimed in claim 1, wherein:
the address configuration unit comprises a plurality of register files configured by the configuration instruction and a plurality of address calculation units for calculating said set of memory addresses;
the register files can be accessed by the address calculation units;
the address calculation units are coupled to the multi-port memory.
3. A method as claimed in claim 1, wherein:
the configuration instruction comprises a set of offsets, each offset in combination with the base memory address defining a second memory address.
4. A computer system comprising:
a processor;
a multi-port memory, the multi-port memory being accessible by the processor,
characterized in that the computer system further comprises an address configuration unit, wherein the address configuration unit is designed to define a set of memory addresses using a base memory address and a configuration instruction for configuring the address configuration unit, and wherein the multi-port memory is designed to use said set of memory addresses.
5. A computer system as claimed in claim 4, wherein:
the address configuration unit comprises a plurality of register files arranged to be configured by the configuration instruction and a plurality of address calculation units for calculating said set of memory addresses;
the register files can be accessed by the address calculation units;
the address calculation units are coupled to the multi-port memory.
6. A computer system as claimed in claim 4, wherein:
the configuration instruction comprises a set of offsets, each offset in combination with the base memory address defining a second memory address.
7. A computer system as claimed in claim 4, wherein the multi-port memory and the address configuration unit are comprised in a memory system.
8. A computer program comprising computer program code means for instructing a computer system to carry out the steps of the method as claimed in claim 1.
CN 03817860 2002-07-26 2003-07-10 Method and apparatus for accessing multiple vector elements in parallel Pending CN1672128A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP02078074.8 2002-07-26
EP02078074 2002-07-26

Publications (1)

Publication Number Publication Date
CN1672128A true CN1672128A (en) 2005-09-21

Family

ID=31197898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 03817860 Pending CN1672128A (en) 2002-07-26 2003-07-10 Method and apparatus for accessing multiple vector elements in parallel

Country Status (5)

Country Link
EP (1) EP1527385A1 (en)
JP (1) JP2005534120A (en)
CN (1) CN1672128A (en)
AU (1) AU2003281792A1 (en)
WO (1) WO2004013752A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8184260B2 (en) 2006-02-15 2012-05-22 Thomson Licensing Non-linear, digital dailies
CN102930008A (en) * 2012-10-29 2013-02-13 无锡江南计算技术研究所 Vector table looking up method and processor
CN103930883A (en) * 2011-09-28 2014-07-16 Arm有限公司 Interleaving data accesses issued in response to vector access instructions
CN110597558A (en) * 2017-07-20 2019-12-20 上海寒武纪信息科技有限公司 Neural network task processing system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100349122C (en) * 2005-08-19 2007-11-14 华为技术有限公司 Method for realizing data packet sequencing for multi engine paralled processor
CN100417142C (en) * 2005-12-22 2008-09-03 华为技术有限公司 Method for average distributing interface flow at multi network processor engines

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01265347A (en) * 1988-04-18 1989-10-23 Matsushita Electric Ind Co Ltd Address generating device
JPH0728786A (en) * 1993-07-15 1995-01-31 Hitachi Ltd Vector processor
US6463518B1 (en) * 2000-06-19 2002-10-08 Philips Electronics No. America Corp. Generation of memory addresses for accessing a memory utilizing scheme registers

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8184260B2 (en) 2006-02-15 2012-05-22 Thomson Licensing Non-linear, digital dailies
CN103930883A (en) * 2011-09-28 2014-07-16 Arm有限公司 Interleaving data accesses issued in response to vector access instructions
CN103930883B (en) * 2011-09-28 2017-02-15 Arm 有限公司 Interleaving data accesses method and device in response to vector access instructions
CN102930008A (en) * 2012-10-29 2013-02-13 无锡江南计算技术研究所 Vector table looking up method and processor
CN102930008B (en) * 2012-10-29 2015-10-07 无锡江南计算技术研究所 Vector look-up method
CN110597558A (en) * 2017-07-20 2019-12-20 上海寒武纪信息科技有限公司 Neural network task processing system
CN110597558B (en) * 2017-07-20 2021-11-12 上海寒武纪信息科技有限公司 Neural network task processing system

Also Published As

Publication number Publication date
WO2004013752A1 (en) 2004-02-12
JP2005534120A (en) 2005-11-10
AU2003281792A1 (en) 2004-02-23
EP1527385A1 (en) 2005-05-04

Similar Documents

Publication Publication Date Title
US5832290A (en) Apparatus, systems and method for improving memory bandwidth utilization in vector processing systems
US20200371888A1 (en) Streaming engine with deferred exception reporting
JP3098071B2 (en) Computer system for efficient execution of programs with conditional branches
JP4987882B2 (en) Thread-optimized multiprocessor architecture
US5586256A (en) Computer system using multidimensional addressing between multiple processors having independently addressable internal memory for efficient reordering and redistribution of data arrays between the processors
JP2010532905A (en) Thread-optimized multiprocessor architecture
US20050273576A1 (en) Microprocessor with integrated high speed memory
CN1702773A (en) Programmable parallel lookup memory
US20190187903A1 (en) Streaming engine with fetch ahead hysteresis
US11709778B2 (en) Streaming engine with early and late address and loop count registers to track architectural state
US20230028372A1 (en) Memory shapes
US20060155953A1 (en) Method and apparatus for accessing multiple vector elements in parallel
CN108268596A (en) Search for the method and system of data stored in memory
CN1672128A (en) Method and apparatus for accessing multiple vector elements in parallel
US11640397B2 (en) Acceleration of data queries in memory
US8656133B2 (en) Managing storage extents and the obtaining of storage blocks within the extents
KR20220053017A (en) Space-time fusion-multiplication-addition and related systems, methods and devices
US6829691B2 (en) System for compressing/decompressing data
US5752271A (en) Method and apparatus for using double precision addressable registers for single precision data
GB2571352A (en) An apparatus and method for accessing metadata when debugging a device
CN1269026C (en) register move operation
US11782718B2 (en) Implied fence on stream open
Melnyk Parallel Ordered-Access Machine Computational Model and Architecture
TW202331713A (en) Method for storing and accessing a data operand in a memory unit
TW202340947A (en) Technique for handling data elements stored in an array storage

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: NXP CO., LTD.

Free format text: FORMER OWNER: KONINKLIJKE PHILIPS ELECTRONICS N.V.

Effective date: 20070914

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20070914

Address after: Eindhoven, Netherlands

Applicant after: Koninkl Philips Electronics NV

Address before: Eindhoven, Netherlands

Applicant before: Koninklijke Philips Electronics N.V.

C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication