CN1622031A

CN1622031A - Device and method for cross-boundary alignment of data in multiple registers

Info

Publication number: CN1622031A
Application number: CN 200310118814
Authority: CN
Inventors: 梁伯嵩
Original assignee: Sunplus Technology Co Ltd
Current assignee: Sunplus Technology Co Ltd
Priority date: 2003-11-28
Filing date: 2003-11-28
Publication date: 2005-06-01
Anticipated expiration: 2023-11-28
Also published as: CN1297887C

Abstract

The present invention provides a processor and a method for aligning data held in multiple registers across boundaries. The processor comprises: a decoder for decoding a multiple shift instruction; a register file having a plurality of N-bit registers; a shifter that concatenates a first output and a second output of the register file into a 2N-bit word, shifts the 2N-bit word by w bits, and outputs its first N bits; and a controller that, according to the decoded multiple shift instruction, sets the addresses of the register file, reads the contents of the corresponding registers, and writes the output of the shifter, shifted by w bits, back into the register file.

Description

Device and method for cross-boundary alignment of data in multiple registers

Technical field

The invention relates to the technical field of data processing, and in particular to a device and method for cross-boundary alignment of data in multiple registers.

Background technology

When a processor performs data processing, whether the data is aligned affects the performance of many key operations, such as string and array operations. As shown in Figure 1, the data to be processed (ABCDEFGHIJKL) often crosses memory storage boundaries. When a processor performs a string or array operation on this data, it must first carry out many extra operations to restore the data to an aligned form before it can use the data.

To address the problem of processing unaligned data, one known method loads the data into the processor and then uses various processor instructions to obtain the required data. As shown in Figure 2, the data (ZABC) at address 100h is first loaded into register R16, and R16 is shifted left by 8 bits to remove the unwanted data (Z). The data (DEFG) at address 104h is then loaded into register R17, and R17 is shifted right by 24 bits to remove the unwanted data (EFG). Finally, an OR operation is performed on R16 and R17 and the result is stored in R16, whose content is now the required data (ABCD). By the same steps, the data EFGH and IJKL are loaded into registers R17 and R18 in turn.
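The load, shift, load, shift, OR sequence described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, memory is modeled as a Python bytes object whose index 0 stands in for address 100h, words are treated as big-endian (so that a left shift drops the leading byte, as in the text), and the bytes after "L" are placeholders that do not come from the source.

```python
MASK32 = 0xFFFFFFFF

# Bytes laid out as in Fig. 1; index 0 stands in for address 100h.
# The trailing "..." bytes are placeholders, not taken from the source.
MEMORY = b"ZABCDEFGHIJKL..."

def load_word(memory: bytes, addr: int) -> int:
    """Aligned 32-bit load (assumed big-endian byte order)."""
    if addr % 4 != 0:
        raise ValueError("load_word requires a 4-byte aligned address")
    return int.from_bytes(memory[addr:addr + 4], "big")

def load_unaligned_shift_or(memory: bytes, addr: int) -> int:
    """Assemble the 32-bit word starting at an unaligned addr in five steps."""
    offset = addr % 4
    base = addr - offset
    r16 = load_word(memory, base)            # 1. load the first aligned word (ZABC)
    r16 = (r16 << (8 * offset)) & MASK32     # 2. shift left to drop leading bytes (Z)
    r17 = load_word(memory, base + 4)        # 3. load the next aligned word (DEFG)
    r17 = r17 >> (32 - 8 * offset)           # 4. shift right to drop trailing bytes (EFG)
    return r16 | r17                         # 5. OR the two halves together

word = load_unaligned_shift_or(MEMORY, 1)
print(word.to_bytes(4, "big"))  # b'ABCD'
```

Each aligned result costs five operations here, which is where the 5n instruction count discussed next comes from.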

As the above description shows, if the unaligned data to be loaded is n words long (a word is 32 bits), the known method needs 5n instructions to describe the read and at least 5n instruction cycles to complete it. This makes the program code lengthy, occupies storage space, and increases the load on the processor, degrading its performance.

To address the lengthy program code and poor efficiency that result from using processor instructions to handle unaligned data, U.S. Patent No. 4,814,976 performs the alignment while the unaligned data is being loaded, reading data that crosses a boundary in two passes. As shown in Figure 3, the data (ABC) at addresses 101h to 103h is first loaded into bytes 0, 1, and 2 of register R16; at this point the data in byte 3 of R16 is X (don't care). The data (D) at address 104h is then loaded into byte 3 of R16, and the content of R16 is now the required data (ABCD). By the same steps, the data EFGH and IJKL are loaded into registers R17 and R18 in turn.
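The two-pass load described above can be modeled as two partial loads into the same register. The sketch below is an assumption-laden illustration of the behavior as the text describes it, not the instruction semantics of the cited U.S. patent: the function names are hypothetical, a register is modeled as a 4-byte bytearray, and index 0 of the memory object stands in for address 100h.

```python
# Bytes laid out as in Fig. 1; index 0 stands in for address 100h.
# The trailing "..." bytes are placeholders, not taken from the source.
MEMORY = b"ZABCDEFGHIJKL..."

def load_leading_part(reg: bytearray, memory: bytes, addr: int) -> None:
    """First pass: copy bytes from addr to the end of its aligned word
    into the leading bytes of reg (bytes 0, 1, 2 in the Fig. 3 example)."""
    end = (addr // 4 + 1) * 4
    chunk = memory[addr:end]
    reg[0:len(chunk)] = chunk

def load_trailing_part(reg: bytearray, memory: bytes, addr: int) -> None:
    """Second pass: copy bytes from the start of addr's aligned word up to
    and including addr into the trailing bytes of reg (byte 3 in Fig. 3)."""
    start = (addr // 4) * 4
    chunk = memory[start:addr + 1]
    reg[4 - len(chunk):4] = chunk

r16 = bytearray(b"XXXX")             # X marks don't-care bytes
load_leading_part(r16, MEMORY, 1)    # ABC -> bytes 0..2, byte 3 still X
load_trailing_part(r16, MEMORY, 4)   # D   -> byte 3
print(bytes(r16))  # b'ABCD'
```

Both passes write the same register, and consecutive results re-read the same aligned memory word, which is the repeated access the following paragraph criticizes.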

As the above description shows, if the unaligned data to be loaded is n words long, 2n instructions are needed to describe the read, and at least 2n instruction cycles are needed to complete it. Moreover, because the same memory and register locations are read and written repeatedly, the likelihood of a processor pipeline stall increases. Reading the same memory location repeatedly also wastes bus bandwidth; in systems without a cache, the resulting delay is especially pronounced.

Summary of the invention

The object of the present invention is to provide a device and method for cross-boundary alignment of data in multiple registers that avoid the lengthy program code and storage-space consumption of the known techniques, and that also avoid wasting bus bandwidth by reading the same memory location repeatedly.

According to one feature of the present invention, a device for cross-boundary alignment of data in multiple registers is proposed, which mainly comprises:

a decoder for decoding a multiple shift instruction;

a register file having a plurality of registers, each of N bits (N being a positive integer), wherein the register file can read registers according to a first address and a second address and output their contents through a first output and a second output, respectively, and can write one of the registers through an input according to a third address;

a shifter, coupled to the first output and the second output of the register file, which concatenates the contents of the first output and the second output into a 2N-bit word, shifts the 2N-bit word by w bits according to a shift value w (w being a positive integer), and outputs the first N bits of the 2N-bit word; and

a controller, coupled to the decoder and the register file, which sets the first address, the second address, the third address, and the shift value w according to the decoded multiple shift instruction, reads the contents of the corresponding registers so that the shifter shifts the read contents by w bits, and writes the output of the shifter into the register file according to the third address.

The device as described, wherein N is 32.

The device as described, wherein w is one of 8, 16, and 24.

The device as described, wherein the shifter can shift by w bits to the left or to the right.

The device as described, wherein the third address is set equal to the first address.

The device as described, wherein the second address is set to the address following the first address.

According to another feature of the present invention, a method for cross-boundary alignment of data in a plurality of registers is proposed. The plurality of registers form a register file, each register being of N bits (N being a positive integer); the register file can read registers according to a first address and a second address and output their contents through a first output and a second output, respectively, and can write one of the registers through an input according to a third address. The method mainly comprises the following steps:

(A) setting the first address, the second address, the third address, and a shift value w according to a multiple shift instruction;

(B) reading the contents of the corresponding registers according to the first address and the second address; and

(C) concatenating the register contents read in step (B) into a 2N-bit word, shifting the 2N-bit word by w bits, and writing the first N bits of the shifted 2N-bit word into one of the plurality of registers according to the third address.
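One pass through steps (A) to (C) can be sketched as follows. This is a minimal illustration under assumptions: the function name and the list-based register file are hypothetical, only the left-shift case is modeled, and "first N bits" is read as the most significant N bits of the shifted 2N-bit word.

```python
def align_step(regs: list, first_addr: int, second_addr: int,
               third_addr: int, w: int, n: int = 32) -> None:
    """One iteration of steps (A)-(C), left shift only (an assumption)."""
    mask_n = (1 << n) - 1
    mask_2n = (1 << (2 * n)) - 1
    # (A) the addresses and shift value w arrive as arguments.
    # (B) read the two registers selected by the first and second address.
    first = regs[first_addr] & mask_n
    second = regs[second_addr] & mask_n
    # (C) concatenate into a 2N-bit word, shift by w, keep the top N bits,
    #     and write them to the register selected by the third address.
    word_2n = (first << n) | second
    shifted = (word_2n << w) & mask_2n
    regs[third_addr] = shifted >> n

regs = [0x11223344, 0x55667788]
align_step(regs, first_addr=0, second_addr=1, third_addr=0, w=8)
print(hex(regs[0]))  # 0x22334455
```

Setting the third address equal to the first, as done here, overwrites the source register in place; the second register is left untouched so that it can serve as the first operand of the next iteration.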

The method as described, wherein steps (A) to (C) are executed repeatedly until a predetermined number of registers have all been shifted.

The method as described, wherein N is 32.

The method as described, wherein w is one of 8, 16, and 24.

The method as described, wherein the shift by w bits in step (C) can be to the left or to the right.

The method as described, wherein the third address is set equal to the first address.

The method as described, wherein the second address is set to the address following the first address.

Description of drawings

Fig. 1 is a schematic diagram of a group of unaligned data as arranged in memory.

Fig. 2 shows the program code with which a known technique loads a group of unaligned data.

Fig. 3 shows the program code and a register diagram with which another known technique loads a group of unaligned data.

Fig. 4 is a block diagram of the device for cross-boundary alignment of data in multiple registers according to the present invention.

Fig. 5 is a detailed circuit diagram of the controller 200 of the present invention.

Fig. 6 is a schematic diagram of the operation of the present invention.

Fig. 7 shows an exemplary application of the present invention.

Embodiment

Fig. 4 shows a block diagram of the device for cross-boundary alignment of data in multiple registers according to the present invention, which includes a decoder 100, a controller 200, a register file 300, and a shifter 400. The register file 300 has a plurality of registers 3001, each of N bits (N being a positive integer); in the present embodiment, N is preferably 32. The register file 300 can read registers 3001 according to a first address 301 and a second address 302 and output their contents through a first output 310 and a second output 320, respectively, and can write one of the registers 3001 through an input 330 according to a third address 303.

The decoder 100 decodes a multiple shift instruction, which may be either a multiple left shift instruction (MLSI) or a multiple right shift instruction (MRSI). The multiple left shift instruction has the form MLSI Rx, Ry, w, which shifts the contents of the registers in the range x to y left by w bits as a whole. The multiple right shift instruction has the form MRSI Rx, Ry, w, which shifts the contents of the registers in the range x to y right by w bits as a whole. After decoding a multiple shift instruction, the decoder 100 produces the signals x, y, L_R* and w and outputs them to the controller 200. The L_R* signal indicates whether to shift left or right by w bits: when L_R* is 1, the shift is to the left by w bits; when L_R* is 0, the shift is to the right by w bits.
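The mapping from an instruction to the four decoded signals can be illustrated with a small parser. This is a sketch under assumptions: the source gives only the assembly-level form (MLSI Rx, Ry, w and MRSI Rx, Ry, w), not a binary encoding, so the text-based parsing, the DecodedShift type, and the decode function are all hypothetical stand-ins for the hardware decoder.

```python
import re
from typing import NamedTuple

class DecodedShift(NamedTuple):
    x: int     # first register number in the range
    y: int     # last register number in the range
    l_r: int   # L_R* signal: 1 = shift left, 0 = shift right
    w: int     # shift amount in bits

# Hypothetical textual syntax, taken from the assembly form given in the text.
_PATTERN = re.compile(r"^\s*(MLSI|MRSI)\s+R(\d+)\s*,\s*R(\d+)\s*,\s*(\d+)\s*$")

def decode(instruction: str) -> DecodedShift:
    """Turn 'MLSI Rx, Ry, w' or 'MRSI Rx, Ry, w' into (x, y, L_R*, w)."""
    match = _PATTERN.match(instruction)
    if match is None:
        raise ValueError(f"not a multiple shift instruction: {instruction!r}")
    mnemonic = match.group(1)
    return DecodedShift(
        x=int(match.group(2)),
        y=int(match.group(3)),
        l_r=1 if mnemonic == "MLSI" else 0,
        w=int(match.group(4)),
    )

print(decode("MLSI R16, R19, 8"))
```

For the instruction used later in the embodiment, this yields x=16, y=19, L_R*=1, and w=8.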

The shifter 400 is coupled to the first output 310 and the second output 320 of the register file 300. It concatenates the contents of the first output 310 and the second output 320 into a 64-bit word, shifts the 64-bit word left or right by w bits (w being a positive integer) according to the shift value w and the L_R* signal, and outputs the first 32 bits of the shifted 64-bit word.

The controller 200 is coupled to the decoder 100 and the register file 300. According to the decoded x, y, L_R* and w signals, it sets the first address 301, the second address 302, and the third address 303 of the register file 300 and the shift value w, and reads the contents of the corresponding registers in the register file 300 through the first output 310 and the second output 320 of the register file 300.

Fig. 5 is a detailed circuit diagram of the controller 200, which mainly comprises a multiplexer 210, a comparator 220, a first address register 230, an adder 240, and a second address register 250. The multiplexer 210 selects either the x signal produced by the decoder 100 or the content of the second address register 250. The output of the multiplexer 210 is written into the first address register 230, whose content is output as the first address 301 of the register file 300 to access the register 3001 that the first address 301 indicates. The adder 240 adds 1 to the content of the first address register 230 and writes the result into the second address register 250, whose content serves as the second address 302 to access the register 3001 that it indicates. The comparator 220 compares the content of the first address register 230 with the y signal produced by the decoder 100; if the content of the first address register 230 is greater than or equal to the y signal, the comparator produces a stop signal.
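The address sequencing performed by the multiplexer, adder, and comparator can be sketched as a generator. This is a behavioral illustration only: the function name is hypothetical, and modeling one loop iteration per execution cycle is an assumption about timing that the text does not spell out at circuit level.

```python
def address_sequence(x: int, y: int):
    """Yield (first_address, second_address) for each execution cycle."""
    first = x                  # multiplexer 210 selects x in the first cycle
    while True:
        if first >= y:         # comparator 220: stop when first address >= y
            return
        second = first + 1     # adder 240 fills the second address register
        yield first, second
        first = second         # later cycles: multiplexer selects the
                               # content of the second address register

print(list(address_sequence(16, 19)))  # [(16, 17), (17, 18), (18, 19)]
```

With x=16 and y=19 the generator produces three address pairs and then stops, matching the three execution cycles traced in the Fig. 6 walkthrough.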

Fig. 6 shows a schematic diagram of the operation of the present invention, in which an MLSI R16, R19, 8 instruction is executed. This instruction shifts the contents of registers R16, R17, R18, and R19 left by 8 bits. At the start of the first execution cycle, the decoder 100 decodes the instruction and produces the signals x=16, y=19, L_R*=1, and w=8. The multiplexer 210 selects the x signal (=16) produced by the decoder 100; the controller 200 then sets the first address register 230 to 16 and, via the adder 240, sets the second address register 250 to 17. Because the first address register 230 holds 16, which is less than 19, the comparator 220 does not produce the stop signal. The register file 300 therefore reads the content of register R16 (=ZABC) and the content of register R17 (=DEFG) according to the first address 301 (=16) and the second address 302 (=17), respectively, and outputs them to the shifter 400 through the first output 310 and the second output 320.

The shifter 400 concatenates the content of the first output 310 (=ZABC) and the content of the second output 320 (=DEFG) into a 64-bit word (=ZABCDEFG), shifts the 64-bit word left by 8 bits (=ABCDEFG0) according to the shift value w=8 and the signal L_R*=1, and outputs the first 32 bits (=ABCD) of the shifted 64-bit word (=ABCDEFG0). The controller 200 then writes the output of the shifter 400 (=ABCD) into register R16 of the register file 300 according to the third address 303.

At the start of the second execution cycle, the multiplexer 210 selects the content (=17) of the second address register 250; the controller 200 then sets the first address register 230 to 17 and, via the adder 240, sets the second address register 250 to 18. Execution proceeds as in the first execution cycle, so at the end of the second execution cycle the content of register R17 is EFGH. Likewise, at the end of the third execution cycle the content of register R18 is IJKL.

At the start of the fourth execution cycle, the multiplexer 210 selects the content (=19) of the second address register 250, and the controller 200 sets the first address register 230 to 19. Because the first address register 230 now holds 19, the comparator 220 produces the stop signal and execution of the instruction stops; that is, only three execution cycles are needed.
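The whole MLSI R16, R19, 8 walkthrough can be reproduced with a short simulation. This is a sketch under assumptions: the mlsi function and the dictionary-based register file are hypothetical, bytes are packed big-endian so that Z is the most significant byte of R16, and the three bytes after "L" in R19 are placeholders that the source does not specify.

```python
def mlsi(regs: dict, x: int, y: int, w: int, n: int = 32) -> int:
    """Shift registers Rx..Ry left by w bits as one block.
    Returns the number of execution cycles used."""
    mask_n = (1 << n) - 1
    first, cycles = x, 0
    while first < y:                       # comparator stops at first >= y
        second = first + 1                 # adder output
        word_2n = (regs[first] << n) | regs[second]
        # Top N bits of the 2N-bit word after shifting left by w bits.
        regs[first] = ((word_2n << w) >> n) & mask_n
        first = second                     # next cycle starts at old second
        cycles += 1
    return cycles

def pack(text: bytes) -> int:
    return int.from_bytes(text, "big")

# Register contents after the four aligned loads; "..." is placeholder data.
regs = {16: pack(b"ZABC"), 17: pack(b"DEFG"), 18: pack(b"HIJK"), 19: pack(b"L...")}
cycles = mlsi(regs, x=16, y=19, w=8)
for number in (16, 17, 18):
    print(f"R{number} =", regs[number].to_bytes(4, "big"))
print("cycles:", cycles)
```

After the call, R16, R17, and R18 hold ABCD, EFGH, and IJKL, and R19 is never written because the stop signal is raised before a fourth shift takes place.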

Fig. 7 shows an application of the present invention. To load a group of unaligned data, load instructions (LW) are first used to load the unaligned data into registers R16, R17, R18, and R19, and a multiple left shift instruction (MLSI) of the present invention then completes the alignment. As shown in Fig. 7, the program code requires only 5 words.
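The code-size figures quoted in the text can be put side by side. The 5n and 2n counts come directly from the background section; the count for the present approach is an extrapolation from the single Fig. 7 example (four LW instructions plus one MLSI for n = 3 aligned words), so the general formula n + 2 and the function name below are assumptions rather than statements from the source.

```python
def instruction_counts(n: int) -> dict:
    """Instruction words needed to obtain n aligned words of unaligned data."""
    return {
        "load_shift_or": 5 * n,            # first known method (Fig. 2)
        "two_partial_loads": 2 * n,        # second known method (Fig. 3)
        # Assumed generalization of Fig. 7: n + 1 aligned loads cover the
        # unaligned span, followed by one multiple shift instruction.
        "loads_plus_mlsi": (n + 1) + 1,
    }

print(instruction_counts(3))
# {'load_shift_or': 15, 'two_partial_loads': 6, 'loads_plus_mlsi': 5}
```

These are counts of instruction words, not of cycles: the multiple shift instruction itself occupies several execution cycles (three in the Fig. 6 example).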

As the above description shows, the technique of the present invention solves the problems of lengthy program code and storage-space consumption in the known techniques, and also avoids wasting bus bandwidth by reading the same memory location repeatedly.

It should be noted that the above embodiments are given only as examples for convenience of explanation. The scope of rights claimed by the present invention shall be determined by the claims and is not limited to the above embodiments.

Claims

1. A device for cross-boundary alignment of data in multiple registers, mainly comprising:

a decoder for decoding a multiple shift instruction;

2. The device as claimed in claim 1, characterized in that N is 32.

3. The device as claimed in claim 1, characterized in that w is one of 8, 16, and 24.

4. The device as claimed in claim 1, characterized in that the shifter can shift by w bits to the left or to the right.

5. The device as claimed in claim 1, characterized in that the third address is set equal to the first address.

6. The device as claimed in claim 1, characterized in that the second address is set to the address following the first address.

7. A method for cross-boundary alignment of data in a plurality of registers, the plurality of registers forming a register file, each register being of N bits (N being a positive integer), wherein the register file can read registers according to a first address and a second address and output their contents through a first output and a second output, respectively, and can write one of the registers through an input according to a third address, the method mainly comprising the following steps:

8. The method as claimed in claim 7, characterized in that steps (A) to (C) are executed repeatedly until a predetermined number of registers have all been shifted.

9. The method as claimed in claim 7, characterized in that N is 32.

10. The method as claimed in claim 7, characterized in that w is one of 8, 16, and 24.

11. The method as claimed in claim 7, characterized in that the shift by w bits in step (C) can be to the left or to the right.

12. The method as claimed in claim 7, characterized in that the third address is set equal to the first address.

13. The method as claimed in claim 7, characterized in that the second address is set to the address following the first address.