CN1672128A - Method and apparatus for accessing multiple vector elements in parallel - Google Patents
- Publication number
- CN1672128A (application CN03817860A, CN 03817860)
- Authority
- CN
- China
- Prior art keywords
- address
- memory
- vector
- configuration
- computer system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30043—LOAD or STORE instructions; Clear instruction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes
- G06F9/345—Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes of multiple operands or results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
Abstract
Vector processing is a suitable technique for applications that have large computational demands. Vector processors provide high-level operations that work on vectors, i.e. linear arrays of numbers. Vector operations can be executed faster than a sequence of scalar operations on the same number of data items. Typical applications in which vector processing can be used lie in the domain of audio and video processing. A vector memory system has a large data width, which allows a complete vector of data elements to be retrieved in one memory access using a single memory address. Subsequently, these data elements can be processed in parallel. However, when vector memory systems are used, problems of vector alignment and of the ordering of a set of vector elements can occur. The present invention provides an improved method for vector alignment and the ordering of vector elements in a computer system comprising a processor (PROC) and a multi-port memory (MEM), resulting in a better performance. The first step comprises passing a base memory address to an address configuration unit (ACU). Next, a set of memory addresses is defined by the address configuration unit (ACU) using the base memory address and a configuration instruction for configuring the address configuration unit. Finally, a vector is transmitted to or from the multi-port memory (MEM) using the set of memory addresses.
Description
Technical field
The present invention relates to a computer system, comprising:
a processor;
a multi-port memory, which can be accessed by the processor.
The invention further relates to a method of transferring a vector in said computer system.
Further, the present invention relates to a computer program for implementing said method.
Background technology
Vector processing is a suitable technique for applications with large computational demands. Vector processors provide high-level operations that work on vectors, i.e. linear arrays of numbers. A vector processor pipelines the operations on the individual elements of a vector. The pipeline includes not only the arithmetic operations but also the memory accesses and the effective-address calculation. In addition, most high-end vector processors allow multiple operations to be performed simultaneously, creating parallelism between the operations on different elements. Vector instructions have several important properties. First, the computation of each result is independent of the computation of previous results, which permits a very deep pipeline without generating any data hazards. Second, a single vector instruction is equivalent to executing an entire loop, which reduces the instruction-bandwidth requirement. Third, because a complete vector is retrieved in a single reference instead of one data element at a time, the overhead of memory access is reduced. For these reasons, a vector operation can be executed faster than a sequence of scalar operations on the same number of data items. Typical applications in which vector processing can be used lie in the domain of audio and video processing.
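The second property above, a single vector instruction standing in for an entire loop, can be illustrated with a small sketch (plain Python for illustration only; no particular vector instruction set is assumed):

```python
# A scalar processor executes one element per loop iteration,
# paying instruction-fetch and loop overhead for every element.
def scalar_add(a, b):
    result = []
    for i in range(len(a)):          # one pass of the instruction stream per element
        result.append(a[i] + b[i])
    return result

# A vector processor expresses the same work as a single operation on
# whole vectors; here one list comprehension stands in for one
# vector-add instruction covering all elements at once.
def vector_add(a, b):
    return [x + y for x, y in zip(a, b)]

print(scalar_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
print(vector_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

Both produce the same result; the difference lies in how many instructions must be fetched and decoded to produce it.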
A vector memory system has a large data width, which allows a complete vector of data elements to be retrieved in a single memory access using a single memory address. Subsequently, these data elements can be processed in parallel. However, several problems may occur when data is retrieved from a vector memory system. First, the problem of vector alignment arises when data that crosses a vector boundary is read from the vector memory system. In that case, the data may have to be retrieved by requesting the contents of two memory addresses, i.e. two vectors, and subsequently transferring the requested data elements to a new vector. Second, a problem occurs when the order in which a set of vector elements is needed differs from the order in which they are stored. If a vector is required to hold an ordered set of elements that are stored in different vectors, the contents of those vectors must be retrieved, requiring at least two memory accesses, followed by the selection of the appropriate data elements. United States Patent 5,933,650 describes a method for the alignment and ordering of vector elements. For the alignment of vector elements, one vector is loaded from a memory location into a first register and another vector is loaded from a memory location into a second register. The starting byte that specifies the first byte of the aligned vector is determined. Next, a vector is extracted from the first and second registers, starting at the first bit of the starting byte in the first register and continuing through the second register. Finally, the extracted vector is copied into a third register, so that the third register contains the aligned elements for vector processing. For the ordering of vector elements, a first vector is loaded from a memory location into a first register and a second vector is loaded from a memory location into a second register. Then a subset of elements is selected from the first and second registers. The elements of the subset are then copied into a third register in an order suitable for subsequent vector processing.
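The alignment scheme of US 5,933,650 described above can be sketched as follows. This is a simplified behavioural model in Python; the vector width, element granularity and function names are illustrative assumptions, not the patent's actual implementation:

```python
VECTOR_SIZE = 4  # illustrative: 4 elements per aligned vector/register

# Aligned vector memory: each row is one addressable vector.
memory = [[10, 11, 12, 13],
          [14, 15, 16, 17]]

def prior_art_aligned_load(start):
    """Load an unaligned vector beginning at element index `start`,
    using two aligned loads plus an extract step, as in the prior art."""
    reg1 = memory[start // VECTOR_SIZE]        # first aligned load
    reg2 = memory[start // VECTOR_SIZE + 1]    # second aligned load
    offset = start % VECTOR_SIZE               # starting element inside reg1
    # Extract continuously from reg1 into reg2, then copy to a third register.
    reg3 = (reg1 + reg2)[offset:offset + VECTOR_SIZE]
    return reg3

# A vector starting at element 2 spans the boundary between the two rows:
print(prior_art_aligned_load(2))  # [12, 13, 14, 15]
```

Note that every unaligned access costs two aligned loads plus an extraction, which is precisely the overhead the invention aims to eliminate.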
A disadvantage of this prior-art method for the alignment and ordering of vector elements is that more than one read access to the vector memory system is required, which increases the overhead of fetching vector data. Furthermore, additional hardware is required, for example hardware for temporarily storing the vectors from which elements must be selected for vector alignment or vector ordering.
Summary of the invention
It is an object of the invention to provide an improved method for vector alignment and the ordering of vector elements, resulting in a better performance of the vector processor.
This object is achieved with a method of transferring a vector, characterized in that the method comprises the following steps:
passing a base memory address to an address configuration unit;
defining a set of memory addresses by the address configuration unit, using the base memory address and a configuration instruction for configuring the address configuration unit;
transferring a vector to or from the multi-port memory using said set of memory addresses.
The method allows a complete vector to be transferred to or from the multi-port memory using a single base memory address. The data elements of a vector can be transferred to or from arbitrary positions in the memory, which improves flexibility and avoids the problems associated with vector alignment and the ordering of vector elements. Furthermore, using a multi-port memory in combination with the address configuration unit reduces the instruction width. A complete vector can be transferred using a single base memory address, whereas otherwise each memory address used by the multi-port memory would have to be present in the instruction. For some types of processor, such as very large instruction word processors, reducing the code size is an important issue.
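The three steps of the method can be sketched as follows. This is a behavioural model in Python; the class and method names are illustrative assumptions and not claim language:

```python
class AddressConfigurationUnit:
    """Sketch of the ACU: a configuration instruction loads a set of
    offsets; thereafter each base address yields a full address set."""
    def __init__(self):
        self.offsets = []

    def configure(self, offsets):
        # A configuration instruction defines the offsets to be used.
        self.offsets = list(offsets)

    def addresses(self, base):
        # Steps 1-2: a single base memory address passed to the ACU
        # defines a whole set of memory addresses.
        return [base + off for off in self.offsets]

def transfer_vector(memory, acu, base):
    # Step 3: transfer a vector from the multi-port memory using the
    # set of addresses; all elements are fetched in one access.
    return [memory[addr] for addr in acu.addresses(base)]

memory = list(range(100, 132))          # flat memory: value = 100 + address
acu = AddressConfigurationUnit()
acu.configure([0, 1, 2, 3])             # contiguous vector
print(transfer_vector(memory, acu, 6))  # [106, 107, 108, 109]
acu.configure([3, 2, 1, 0])             # reordered elements, same base address
print(transfer_vector(memory, acu, 6))  # [109, 108, 107, 106]
```

Only the single base address travels with the load instruction; the reordering lives in the ACU configuration, which is why the instruction width does not grow with the number of memory ports.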
According to the invention, a computer system is characterized in that the computer system further comprises an address configuration unit, wherein the address configuration unit is designed to define a set of memory addresses using a base memory address and a configuration instruction for configuring the address configuration unit, and wherein the multi-port memory is designed to use this set of memory addresses. A complete vector can be transferred to or from the multi-port memory using a single base memory address, which reduces the memory overhead and improves the performance of the computer system.
Preferred embodiments of the invention are defined in the dependent claims. Claim 8 defines a computer program for implementing the method of transferring a vector according to the invention.
An embodiment of the computer system according to the invention is characterized in that:
the address configuration unit comprises a plurality of register files arranged to be configured by the configuration instruction, and a plurality of address calculation units for calculating the set of memory addresses;
the register files can be accessed by the address calculation units;
the address calculation units are coupled to the multi-port memory.
The configuration instruction configures the plurality of register files, and the register files retain this configuration until the next configuration instruction is executed. In between, the configuration can be used repeatedly, for example during the execution of a program loop.
An embodiment of the computer system according to the invention is characterized in that the configuration instruction comprises a set of offsets, each offset being combined with the base memory address to define a second memory address. The set of offsets can be loaded directly into the plurality of register files and used by the plurality of address calculation units, which improves the performance of the address configuration unit.
Description of drawings
The features of the embodiments are further illustrated and described with reference to the drawings:
Fig. 1 shows a schematic diagram of a computer system according to the invention.
Fig. 2 shows a schematic diagram of a memory system with a multi-port memory and an address configuration unit.
Embodiment
Fig. 1 shows a block diagram of a computer system comprising a processor PROC, an address configuration unit ACU, a multi-port memory MEM and a system bus SB. The processor PROC, the address configuration unit ACU and the multi-port memory MEM are coupled via the system bus SB. During the execution of instructions, the processor PROC can issue operations to access the multi-port memory MEM in order to read or write a vector of data elements. Before a set of data elements is read from or written to the multi-port memory MEM, the address configuration unit ACU has to be configured by a configuration instruction sent by the processor PROC. The configuration instruction configures the address configuration unit ACU so that it can use a base memory address to calculate the particular set of memory addresses for the set of data elements to be retrieved from the multi-port memory MEM. The configuration of the address configuration unit ACU remains unchanged until the next configuration instruction is sent. After the address configuration unit ACU has been configured, the processor issues a read operation comprising a base memory address, which is passed to the address configuration unit ACU. Subsequently, the address configuration unit ACU calculates a set of memory addresses. These memory addresses are sent to the multi-port memory MEM via the system bus SB, followed by the reading of the data elements from the multi-port memory MEM. The data elements are sent to the processor PROC as a single vector for further processing. If the processor PROC issues a write operation, a base memory address is likewise sent to the address configuration unit ACU. The address configuration unit ACU calculates a set of memory addresses and sends this set of memory addresses to the multi-port memory MEM via the system bus SB. The data elements are also sent to the multi-port memory MEM via the system bus SB. In the next step, the data elements are written into the multi-port memory MEM. Before the next write or read operation, a new configuration instruction may have to be issued, depending on the required set of memory addresses. For example, if a set of data elements to be read requires the same set of memory addresses and the same base memory address is applied, the configuration instruction need not be repeated. A new configuration instruction is also unnecessary when a different base memory address is used but the required configuration of the address configuration unit ACU remains the same.
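The sequencing described for Fig. 1, in particular the fact that the ACU configuration persists across operations so that a new base address needs no new configuration instruction, can be modelled in a short sketch (Python; the class name and the error handling are illustrative assumptions):

```python
class MemorySystemModel:
    """Sketch of the PROC <-> ACU <-> MEM interaction over the system bus."""
    def __init__(self, size=64):
        self.mem = [0] * size   # stands in for the multi-port RAM
        self.offsets = None     # ACU state; persists across operations

    def config(self, offsets):
        # Configuration instruction from PROC: reprogram the ACU.
        self.offsets = list(offsets)

    def read(self, base):
        # Read operation: base address -> ACU -> address set -> vector.
        assert self.offsets is not None, "ACU must be configured first"
        return [self.mem[base + off] for off in self.offsets]

    def write(self, base, elems):
        # Write operation: the same address set steers each element.
        for off, e in zip(self.offsets, elems):
            self.mem[base + off] = e

ms = MemorySystemModel()
ms.config([0, 1, 2, 3])
ms.write(8, [5, 6, 7, 8])
print(ms.read(8))    # [5, 6, 7, 8]
# Different base address, same configuration: no new config instruction.
ms.write(16, [1, 2, 3, 4])
print(ms.read(16))   # [1, 2, 3, 4]
```

The configuration instruction is issued once; every subsequent read or write carries only a base address, as in the Fig. 1 flow.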
Fig. 2 shows a block diagram of an embodiment of a memory system MS comprising a multi-port memory MEM and an address configuration unit ACU. The multi-port memory MEM comprises a RAM memory, four data input ports DatIn, four address ports Addr and four data output ports DatOut. The address configuration unit ACU comprises an address input port AddrIn, four address calculation units AU, four register files RF and four data input ports DatIn. In this embodiment, the data input ports DatIn are shared between the address configuration unit ACU and the multi-port memory MEM. The address input port AddrIn is coupled to the address calculation units AU, and each address calculation unit AU is coupled to its corresponding address port Addr of the multi-port memory MEM. The data input ports DatIn are coupled to the register files RF. The register files RF can be accessed by the address calculation units AU.
The multi-port memory MEM supports commands for reading and writing data. Using the address ports Addr, data can be read from the RAM memory via the data output ports DatOut. The four data elements read from the data output ports DatOut can be merged into a single vector. A set of four data elements can be written to the multi-port memory via the data input ports DatIn and the address ports Addr used for memory addressing.
The address configuration unit ACU supports a configuration instruction that specifies a set of offsets relative to a base memory address. When the configuration instruction is executed, the offset values are written to the register files RF via the corresponding data input ports DatIn. Subsequently, the address calculation units AU fetch the offset values from their corresponding register files RF and store these values internally.
If the processor PROC issues a read operation to the memory system MS, a base memory address is provided at the address input port AddrIn. The address calculation units AU obtain the value of the base memory address from the address input port AddrIn and add their corresponding offset values. The address calculation units AU send the resulting set of memory addresses to the corresponding address ports Addr, and a read command is subsequently sent to the multi-port memory MEM. The resulting set of data elements is provided at the data output ports DatOut of the multi-port memory MEM. The processor PROC can also issue a write operation to the memory system MS in order to write a set of data elements into the RAM memory. The address input port AddrIn receives the base memory address. The address calculation units AU calculate a set of memory addresses using the base memory address and their corresponding offset values. The resulting set of memory addresses is sent to the corresponding address ports Addr of the multi-port memory MEM. The data elements are sent to the data input ports DatIn of the multi-port memory MEM. Subsequently, a write command is sent to the multi-port memory MEM and the data elements are written into the RAM memory.
In other embodiments, the configuration instruction may comprise a set of commands sent to the address configuration unit ACU for calculating the set of offsets.
With a suitable configuration instruction, the set of offsets received by the register files RF is combined with the base memory address in such a way that the address calculation units AU can define an arbitrary set of memory addresses. Using this set of memory addresses, a set of data elements can be written to or retrieved from the multi-port memory MEM simultaneously. The memory system MS therefore acts as a vector memory system, with the advantage that a set of data elements can be retrieved from arbitrary memory locations using a single base memory address. Furthermore, compared with a plain multi-port memory, the memory system MS has the advantage that a set of data elements can be addressed using a single memory address, without requiring a set of memory addresses from an external source. As a result, the instruction width can be reduced, which is of particular interest for very long instruction word processors, in which the reduction of code size is an important issue.
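The "arbitrary set of memory addresses" made possible here covers access patterns beyond contiguous vectors, for example strided accesses such as fetching a matrix column. A small sketch (Python; the offset values and memory layout are illustrative assumptions):

```python
ram = list(range(64))  # flat RAM; ram[a] == a, for easy checking

def gather(base, offsets):
    # One ACU-style access: each of the four address calculation units
    # adds its stored offset to the single base address.
    return [ram[base + off] for off in offsets]

# A strided pattern, e.g. one column of a 4x4 matrix stored row-major
# starting at address 16 (stride = row length = 4):
print(gather(16, [0, 4, 8, 12]))   # [16, 20, 24, 28]

# A completely irregular pattern is equally possible in one access:
print(gather(16, [7, 0, 13, 2]))   # [23, 16, 29, 18]
```

Either pattern is fetched in a single simultaneous access through the four ports, whereas a conventional vector memory would need one access per aligned vector touched.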
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Claims (8)
1. A method of transferring a vector in a computer system, the computer system comprising:
a processor;
a multi-port memory which can be accessed by the processor,
characterized in that the method comprises the following steps:
passing a base memory address to an address configuration unit;
defining a set of memory addresses by the address configuration unit, using the base memory address and a configuration instruction for configuring the address configuration unit;
transferring a vector to or from the multi-port memory using said set of memory addresses.
2. A method as claimed in claim 1, wherein:
the address configuration unit comprises a plurality of register files configured by the configuration instruction, and a plurality of address calculation units for calculating said set of memory addresses;
the register files can be accessed by the address calculation units;
the address calculation units are coupled to the multi-port memory.
3. A method as claimed in claim 1, wherein:
the configuration instruction comprises a set of offsets, each offset being combined with the base memory address to define a second memory address.
4. A computer system comprising:
a processor;
a multi-port memory which can be accessed by the processor,
characterized in that the computer system further comprises an address configuration unit, wherein the address configuration unit is designed to define a set of memory addresses using a base memory address and a configuration instruction for configuring the address configuration unit, and wherein the multi-port memory is designed to use said set of memory addresses.
5. A computer system as claimed in claim 4, wherein:
the address configuration unit comprises a plurality of register files arranged to be configured by the configuration instruction, and a plurality of address calculation units for calculating said set of memory addresses;
the register files can be accessed by the address calculation units;
the address calculation units are coupled to the multi-port memory.
6. A computer system as claimed in claim 4, wherein:
the configuration instruction comprises a set of offsets, each offset being combined with the base memory address to define a second memory address.
7. A computer system as claimed in claim 4, wherein the multi-port memory and the address configuration unit are comprised in a memory system.
8. A computer program comprising computer program code means for instructing a computer system to perform the steps of the method of claim 1.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02078074.8 | 2002-07-26 | ||
EP02078074 | 2002-07-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1672128A true CN1672128A (en) | 2005-09-21 |
Family
ID=31197898
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 03817860 Pending CN1672128A (en) | 2002-07-26 | 2003-07-10 | Method and apparatus for accessing multiple vector elements in parallel |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1527385A1 (en) |
JP (1) | JP2005534120A (en) |
CN (1) | CN1672128A (en) |
AU (1) | AU2003281792A1 (en) |
WO (1) | WO2004013752A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8184260B2 (en) | 2006-02-15 | 2012-05-22 | Thomson Licensing | Non-linear, digital dailies |
CN102930008A (en) * | 2012-10-29 | 2013-02-13 | 无锡江南计算技术研究所 | Vector table looking up method and processor |
CN103930883A (en) * | 2011-09-28 | 2014-07-16 | Arm有限公司 | Interleaving data accesses issued in response to vector access instructions |
CN110597558A (en) * | 2017-07-20 | 2019-12-20 | 上海寒武纪信息科技有限公司 | Neural network task processing system |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100349122C (en) * | 2005-08-19 | 2007-11-14 | 华为技术有限公司 | Method for realizing data packet sequencing for multi engine paralled processor |
CN100417142C (en) * | 2005-12-22 | 2008-09-03 | 华为技术有限公司 | Method for average distributing interface flow at multi network processor engines |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH01265347A (en) * | 1988-04-18 | 1989-10-23 | Matsushita Electric Ind Co Ltd | Address generating device |
JPH0728786A (en) * | 1993-07-15 | 1995-01-31 | Hitachi Ltd | Vector processor |
US6463518B1 (en) * | 2000-06-19 | 2002-10-08 | Philips Electronics No. America Corp. | Generation of memory addresses for accessing a memory utilizing scheme registers |
-
2003
- 2003-07-10 WO PCT/IB2003/003150 patent/WO2004013752A1/en active Application Filing
- 2003-07-10 AU AU2003281792A patent/AU2003281792A1/en not_active Abandoned
- 2003-07-10 JP JP2004525660A patent/JP2005534120A/en active Pending
- 2003-07-10 CN CN 03817860 patent/CN1672128A/en active Pending
- 2003-07-10 EP EP03741006A patent/EP1527385A1/en not_active Withdrawn
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8184260B2 (en) | 2006-02-15 | 2012-05-22 | Thomson Licensing | Non-linear, digital dailies |
CN103930883A (en) * | 2011-09-28 | 2014-07-16 | Arm有限公司 | Interleaving data accesses issued in response to vector access instructions |
CN103930883B (en) * | 2011-09-28 | 2017-02-15 | Arm 有限公司 | Interleaving data accesses method and device in response to vector access instructions |
CN102930008A (en) * | 2012-10-29 | 2013-02-13 | 无锡江南计算技术研究所 | Vector table looking up method and processor |
CN102930008B (en) * | 2012-10-29 | 2015-10-07 | 无锡江南计算技术研究所 | Vector look-up method |
CN110597558A (en) * | 2017-07-20 | 2019-12-20 | 上海寒武纪信息科技有限公司 | Neural network task processing system |
CN110597558B (en) * | 2017-07-20 | 2021-11-12 | 上海寒武纪信息科技有限公司 | Neural network task processing system |
Also Published As
Publication number | Publication date |
---|---|
WO2004013752A1 (en) | 2004-02-12 |
JP2005534120A (en) | 2005-11-10 |
AU2003281792A1 (en) | 2004-02-23 |
EP1527385A1 (en) | 2005-05-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5832290A (en) | Apparatus, systems and method for improving memory bandwidth utilization in vector processing systems | |
US20200371888A1 (en) | Streaming engine with deferred exception reporting | |
JP3098071B2 (en) | Computer system for efficient execution of programs with conditional branches | |
JP4987882B2 (en) | Thread-optimized multiprocessor architecture | |
US5586256A (en) | Computer system using multidimensional addressing between multiple processors having independently addressable internal memory for efficient reordering and redistribution of data arrays between the processors | |
JP2010532905A (en) | Thread-optimized multiprocessor architecture | |
US20050273576A1 (en) | Microprocessor with integrated high speed memory | |
CN1702773A (en) | Programmable parallel lookup memory | |
US20190187903A1 (en) | Streaming engine with fetch ahead hysteresis | |
US11709778B2 (en) | Streaming engine with early and late address and loop count registers to track architectural state | |
US20230028372A1 (en) | Memory shapes | |
US20060155953A1 (en) | Method and apparatus for accessing multiple vector elements in parallel | |
CN108268596A (en) | Search for the method and system of data stored in memory | |
CN1672128A (en) | Method and apparatus for accessing multiple vector elements in parallel | |
US11640397B2 (en) | Acceleration of data queries in memory | |
US8656133B2 (en) | Managing storage extents and the obtaining of storage blocks within the extents | |
KR20220053017A (en) | Space-time fusion-multiplication-addition and related systems, methods and devices | |
US6829691B2 (en) | System for compressing/decompressing data | |
US5752271A (en) | Method and apparatus for using double precision addressable registers for single precision data | |
GB2571352A (en) | An apparatus and method for accessing metadata when debugging a device | |
CN1269026C (en) | register move operation | |
US11782718B2 (en) | Implied fence on stream open | |
Melnyk | Parallel Ordered-Access Machine Computational Model and Architecture | |
TW202331713A (en) | Method for storing and accessing a data operand in a memory unit | |
TW202340947A (en) | Technique for handling data elements stored in an array storage |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
ASS | Succession or assignment of patent right |
Owner name: NXP CO., LTD. Free format text: FORMER OWNER: KONINKLIJKE PHILIPS ELECTRONICS N.V. Effective date: 20070914 |
|
C41 | Transfer of patent application or patent right or utility model | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20070914 Address after: Holland Ian Deho Finn Applicant after: Koninkl Philips Electronics NV Address before: Holland Ian Deho Finn Applicant before: Koninklijke Philips Electronics N.V. |
|
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |