In the traditional mapping mode between the CACHE and the memory address, the memory address is divided into three bit sections, as shown in Figure 1. If the memory address is m bits wide, its lowest w bits (a CACHE line can hold 2^w bytes of data) form the in-line displacement field d; the middle n bits (a CACHE bank has a capacity of 2^n lines) form the line field l; and its highest t bits form the tag field tag, where t = m - (n + w).
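As an informal illustration only (the field widths w = 4 and n = 4 and the helper name are ours, chosen for the example, not part of the invention), the address split above can be sketched in Python:

```python
# Sketch of the traditional CACHE address split: low w bits are the in-line
# displacement d, the middle n bits are the line field l, the rest is the tag.
def split_address(addr, w=4, n=4):
    d = addr & ((1 << w) - 1)           # in-line displacement field d
    l = (addr >> w) & ((1 << n) - 1)    # line field l
    tag = addr >> (w + n)               # tag field, t = m - (n + w) bits
    return tag, l, d

tag, l, d = split_address(0xABCD)
print(tag, l, d)  # 171 12 13
```

In the traditional mapping, the line field l alone selects the CACHE line, which is exactly what the XOR scattering mechanism below changes.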
In our XOR mapping mode, the tag field and the in-line displacement field of the CACHE are generated exactly as in the traditional mapping mode, while the CACHE line field is formed by a bitwise XOR between certain bits of the traditional line field l and certain bits of the tag field tag. The structure of such a CACHE system is shown in Figure 2.
In principle, the mapping function of the scattering mechanism in Figure 2 should establish a one-to-one correspondence between l and s, so as to improve the utilization of the CACHE space. Depending on the performance requirements of the system, the mechanism may implement a single mapping function, or implement several mapping functions in the same mechanism, letting the user (or the compiler) select different mapping functions for different application problems. The choice of mapping function can be made according to the data access pattern (selecting among different mapping functions), or according to the size of the data structure (choosing which bit sections of l and tag take part in the operation that forms the bit sections of s).
In what follows, j denotes the l field and i denotes the tag field of the memory address in Figure 2, and s denotes the CACHE line number (set number) formed by the mapping function. We call the scattering mechanism we propose the "XOR scattering mechanism", and the proposed functions "XOR mapping functions". The term "XOR mapping function" derives from the general description given by Frailong et al. of a class of skewing schemes for parallel memory banks (reference 2). Replacing the number of banks in such a skewing scheme by the number of CACHE lines in the CACHE system, we give the following description.
An XOR mapping function is described by the following formula:

s^T = (A × i^T) ⊕ (B × j^T)    (formula 1)

In formula 1, i and j are n-bit vectors, A and B are both n × n matrices, all operations in the formula are carried out over GF(2) (the Galois field of two elements), and s is the CACHE line number produced by the mapping function.
Two remarks are in order. First, i and j are both required here to be n bits wide, but in a real system the tag section and the l section of the memory address do not necessarily have the same width (or, owing to the machine organization, may not have a width convenient for the transformation); in that case i and j should simply be understood as the bit sections of tag and l that participate in the transformation. Second, to satisfy the requirement of making full use of the CACHE space, A and B should be nonsingular matrices; only then do l and s stand in one-to-one correspondence.
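As an informal sketch of formula 1 (the representation of an n-bit field as an integer and of a matrix as a tuple of row bit masks is our own illustrative choice, not part of the invention):

```python
# Formula 1 over GF(2): s^T = (A x i^T) XOR (B x j^T).
# A matrix is given as n row masks, row n-1 first; a field is an int.

def gf2_matvec(rows, x, n):
    # Each output bit is the parity of (row AND x), i.e. a GF(2) dot product.
    s = 0
    for k, row in enumerate(rows):
        bit = bin(row & x).count("1") & 1
        s |= bit << (n - 1 - k)
    return s

def xor_map(A, B, i, j, n):
    return gf2_matvec(A, i, n) ^ gf2_matvec(B, j, n)

n = 4
I = [1 << (n - 1 - k) for k in range(n)]   # identity matrix
R = [1 << k for k in range(n)]             # back-diagonal matrix (used later by EE)
print(xor_map(R, I, 0b0011, 0b0101, n))    # 9
```

With A and B nonsingular, for a fixed i the map j -> s is a bijection, which is the one-to-one correspondence between l and s required above.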
The two mapping functions we propose are described below.
The EE function is one of several skewing schemes for parallel memory banks; an intuitive and detailed description of it can be found in reference 3. It must be pointed out that in reference 3 the EE function was proposed as a skewing scheme for a parallel memory system, and had nothing to do with the structure or mapping mode of a CACHE. The EE function is expressed by the following formula:
s^T = (R × i^T) ⊕ (I × j^T)    (formula 2)

In formula 2, R is an n × n back-diagonal matrix and I is an n × n identity matrix, that is:
writing r_{u,v} for the element of R in row u and column v (0 ≤ u, v ≤ n-1), r_{u,v} = 1 when u = n-1-v, and r_{u,v} = 0 otherwise.
An outstanding advantage of the EE function is that it is very simple to realize. From its structure it can be seen that its hardware needs only a few XOR gate circuits. Considering the development of device technology in the foreseeable future, this function needs only a few to a few dozen XOR gates, so its cost can be called very small indeed.
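The simplicity comes from the shape of R: multiplying the back-diagonal matrix R by i over GF(2) merely reverses the bit order of i, so EE reduces to n two-input XOR gates. A minimal sketch (the function name is ours):

```python
# EE mapping as bit operations: s = bit_reverse(tag) XOR line.
def ee_line(tag, line, n=4):
    rev = int(format(tag, f"0{n}b")[::-1], 2)  # R x i^T = bit-reversed tag
    return rev ^ line                          # one XOR gate per bit

# e.g. tag 0b0001 reverses to 0b1000; XOR with line 0b0010 gives 0b1010
print(ee_line(0b0001, 0b0010))  # 10
```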
Another outstanding advantage of the EE function is its power: it covers many of the most frequently used data access patterns.
For a 2^n × 2^n matrix whose first element has address 0, it guarantees conflict-free CACHE access to the following frequently used data patterns:
any row or column of the matrix;
square and rectangular main blocks of various aspect ratios;
square or rectangular main blocks that have been shifted horizontally or vertically;
scattered blocks, i.e. the sets of 2^n elements occupying the same position within each of 2^n square or rectangular main blocks;
the sets of elements lying on two rows (or two columns) according to a certain rule, called "partial row pairs" (or "partial column pairs").
These are the data access patterns commonly used in the core algorithms of applications such as scientific and engineering computation, numerical analysis, image processing, and signal processing. Detailed formal definitions and proofs are given below in the section "Properties of the mapping functions EE and LR".
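As an illustrative check for n = 4 (a 16 × 16 matrix, element (u, v) having tag u and line field v under row-major storage starting at address 0; the helper names are ours), several of the listed patterns can be verified to map onto 16 distinct CACHE lines under EE:

```python
# Each pattern below yields 16 pairwise-distinct CACHE line numbers.
def ee(u, v, n=4):
    return int(format(u, f"0{n}b")[::-1], 2) ^ v

N = 16
row0      = [ee(0, v) for v in range(N)]                    # one row
col5      = [ee(u, 5) for u in range(N)]                    # one column
mblk_4x4  = [ee(a, b) for a in range(4) for b in range(4)]  # main 4x4 block
scattered = [ee(u, v) for u in range(0, N, 4)               # scattered block
                      for v in range(0, N, 4)]
for pat in (row0, col5, mblk_4x4, scattered):
    print(sorted(pat) == list(range(N)))  # True for every pattern
```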
It should be noted that the property of the EE mapping function of guaranteeing conflict-free access to horizontally or vertically shifted square and rectangular data blocks is very significant for a CACHE system. Because of this property, even the simplest CACHE, with associativity 1, guarantees that for a data block of 2^n consecutive elements with an arbitrary starting point, any CACHE line suffers at most one conflict. In real CACHE systems, the first-level CACHE mostly has associativity 2 or more. We can therefore say that an N-line CACHE system with associativity at least 2, mapped with the EE function, completely avoids CACHE conflicts for any access to a data block of N elements within an N × N matrix.
Considering the importance of block data access to a large number of practical algorithms, this property of the EE mapping function is obviously significant for CACHE systems with associativity at least 2. This is the main reason why we propose adopting the EE function as the address mapping mode between the CACHE and main memory.
Figures 3 and 4 illustrate how, under the EE mapping, the elements of a 16 × 16 matrix are mapped into a CACHE with 16 lines. Suppose the matrix occupies data units 0 to 255 of main memory and is stored row-major (the storage order adopted by the C language); the numbers in the figures are then the CACHE line numbers to which the respective data elements are mapped after the EE mapping. Several data patterns accessed conflict-free under the EE mapping (square and rectangular data blocks) are marked.
Figure 5 gives the circuit realizing the EE mapping for a CACHE system with only 16 lines. By formula 2 of the EE function, each row of the matrix R (the back-diagonal matrix) and of the matrix I (the identity matrix) contains exactly one 1, so each bit of s is formed by XORing exactly one bit of the tag field of the CPU address with one bit of the original line field l. Here the tag field and the line field of the CPU data address are each 4 bits. As the figure shows, the EE function needs only n (here n = 4) XOR gates to realize, and its delay is only that of one level of XOR gate circuits, independent of n.
In Figure 5, one input of each XOR gate is connected to the line field of the CPU data address register, the other input to the tag field tag of the CPU data address register, and the outputs of the XOR gates are fed to the tag bank and the data bank respectively.
Although the EE function introduced above realizes conflict-free access to a large number of commonly used data access patterns, and has special significance for block accesses, it cannot guarantee conflict-free access to vectors of arbitrary power-of-2 stride. The method proposed in reference 4 can realize conflict-free access to vectors of any power-of-2 stride, but its structure is complex, and it cannot at the same time realize conflict-free access to the other frequently used data patterns. Considering the practical importance of power-of-2 stride vector patterns, it is significant to study a function that simultaneously accommodates vectors of any power-of-2 stride and the other commonly used data patterns while remaining simple to implement. The LR function is the mapping function we constructed for this purpose. The LR function is constructed as follows:
s^T = (H × i^T) ⊕ (I × j^T)    (formula 3)

In formula 3, I is still the identity matrix, and the matrix H is constructed as follows:
In the address mapping technique and device of the present invention for a CACHE system, the improved CACHE system adopts an XOR address scattering mechanism to improve the distribution of main-memory data across the CACHE data bank. The inputs of this address scattering mechanism are the CACHE tag field (or a partial field of it) and the CACHE line field (or a partial field of it) of a data address, and its output is the CACHE line number (or a partial field of it). The newly formed CACHE line number is used as the actual access address (line number) of the CACHE data bank and the CACHE tag bank. The address scattering mechanism is realized by the mapping function EE defined by the formula

s^T = (R × i^T) ⊕ (I × j^T)

or by the mapping function LR defined by the formula

s^T = (H × i^T) ⊕ (I × j^T)

where i and j in the formulas are the inputs of the address scattering mechanism and s is its output.
The LR mapping function satisfies conflict-free CACHE access to the following data patterns of an N × N matrix:
the rows and columns of the matrix;
square and rectangular blocks of various aspect ratios;
scattered blocks, i.e. the sets of N elements occupying the same position within each of N square or rectangular main blocks;
equally spaced main vectors with stride 2^i (i an arbitrary integer, i < n);
shifted equally spaced main vectors (with stride 2^i, where i is an integer less than n);
the main diagonal (when n is even).
The equally spaced main vectors and shifted equally spaced main vectors above are exactly the data patterns formed by N data elements visited in a loop of stride 2^i. This access pattern is among the most important ones in the core algorithms of digital image processing and signal processing (for example the FFT). Solving conflict-free access for this class of data patterns therefore has important practical meaning for application problems such as digital image processing and signal processing.
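A small sketch shows why such stride loops are hostile to the conventional mapping, in which the line number is simply the line field (address mod N); the numbers here are illustrative, for a 16-line CACHE holding one element per line:

```python
# Under the conventional mapping s = addr mod N, an N-element loop of
# stride 2^i reuses only N / 2^i distinct CACHE lines.
N = 16
for stride in (1, 2, 4, 8):
    addrs = [q * stride for q in range(N)]   # the N addresses of the loop
    lines = {a % N for a in addrs}           # conventional line numbers
    print(stride, len(lines))                # 1->16, 2->8, 4->4, 8->2
```

At stride 8, sixteen elements pile onto only two lines; the LR mapping is designed to spread exactly these patterns over all N lines.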
Figures 6 and 7 give the CACHE line numbers of each element of a 16 × 16 and a 32 × 32 matrix under the LR mapping. Figure 7 illustrates continuous blocks and scattered blocks. Figure 8 gives the CACHE line numbers onto which vector elements with different starting points and different strides are mapped.
Figure 8 gives the circuit realizing the LR mapping function in a 16-line (2^n = 2^4) CACHE system. By formula 3 of the LR function, each row of the matrix H contains at most two 1s, and each row of the matrix I (the identity matrix) contains exactly one 1, so each bit of s is formed by XORing at most two bits of the tag field tag with one bit of the original line field l. As can be seen from the figure, this function needs only 6 XOR gates to realize, and its delay is only that of two levels of XOR gate circuits.
Input bits 7, 6, 5, 4 in Figures 5 and 8 of this section correspond to the CACHE tag field tag of Figure 2 above; input bits 3, 2, 1, 0 in Figures 5 and 8 correspond to the original CACHE line field l of Figure 2; and the newly formed CACHE line field in Figures 5 and 8 corresponds to the CACHE line s of Figure 2.
The hardware cost of the EE and LR mapping functions proposed in this section is logarithmic: if the CACHE capacity is N = 2^n lines, the hardware cost is O(log2 N). Furthermore, whether for the EE function or the LR function, the delay is constant, that is O(1), independent of the size of the CACHE. These properties are a very important advantage for hardware realization.
This part gives the precise definitions and formal proofs of the properties of the mapping functions EE and LR. In the following description we always assume that the CACHE bank contains N = 2^n lines and that each CACHE line stores one data element; when we speak of a matrix, we assume it is of size N × N, that the address of its first element is 0, and that the matrix is stored row-major (as compiled by the C language). If the matrix is stored column-major (as compiled by Fortran), it is easy to see from the proof procedure that the properties below still hold. In the proofs below, the rows and columns of every n × n matrix are numbered in the order n-1, n-2, ..., 1, 0: the top row is row n-1 and the leftmost column is column n-1, so the top-left element of the matrix has index (n-1, n-1) and the bottom-right element has index (0, 0).
The properties of the EE function are proved first.
Theorem 1: under the EE mapping, no two elements in any row of the matrix produce a CACHE conflict; hence the N elements of any row of the matrix can reside simultaneously in a CACHE bank of N CACHE lines.
Proof: let two elements in an arbitrary row of matrix A be a_{u,v} and a_{x,y}, and let their CACHE line numbers be S1 and S2. Since the two elements lie in the same row of the matrix, u = x and v ≠ y. Thus

S1 ⊕ S2 = ((R×u) ⊕ (I×v)) ⊕ ((R×x) ⊕ (I×y))
        = (R×(u⊕x)) ⊕ (I×(v⊕y)) = I×(v⊕y)

Since I is the identity matrix, it is nonsingular, so S1 ⊕ S2 ≠ 0; that is, the two matrix elements are stored in different lines of the CACHE.
Theorem 2: under the EE mapping, no two elements in any column of the matrix produce a CACHE conflict; hence the N elements of any column of the matrix can reside simultaneously in a CACHE bank of N CACHE lines.
Noting that R is a nonsingular matrix, this property is easily proved along the same lines as Theorem 1.
The outstanding advantage of the EE function is its guaranteed conflict-free access to data blocks of various patterns containing N elements. We now give the formal definitions and proofs for data blocks.

Definition 1: the continuous PQ block BLK(P, Q: k, l) of an N × N matrix X is defined as

BLK(P, Q: k, l) = { x_{k+a, l+b} | (0 ≤ a ≤ P-1) ∧ (0 ≤ b ≤ Q-1) }

where P = 2^p, Q = 2^q, p and q are both integers, and p + q = n.
Definition 2: the main PQ block MBLK(P, Q: k, l) of an N × N matrix X is defined as a continuous PQ block satisfying (k mod P = 0) ∧ (l mod Q = 0).
Figures 3 and 4 mark the main PQ blocks MBLK(4, 4: 0, 0) and MBLK(2, 8: 2, 8) of a 16 × 16 matrix.
In more complex matrix operations the matrices are very large, so in order to exploit the cache systems of the hierarchical memory structure and raise computation speed, programmers often adopt blocked matrix algorithms to accelerate their applications. A further very typical application of matrix-block access is the image filtering algorithms widely used in image processing. Because of the importance of blocked matrix algorithms in numerical computation and other applications, parallel storage schemes generally list block access among the most important parallel access patterns. Of still greater significance, however, are the main P × Q blocks whose row and column starting addresses are positive integer powers of 2, because in blocked matrix computation the programmer usually partitions the matrix into blocks whose height and width are positive integer powers of 2, and the resulting blocks are exactly the main PQ blocks of the matrix.
Theorem 3: under the EE mapping, no two elements in any main PQ block of the matrix produce a CACHE conflict; hence the N elements of any main PQ block of the matrix can reside simultaneously in a CACHE bank of N CACHE lines.
Proof: in the generation of the CACHE line number, the multiplications and additions are all carried out over GF(2), so the expression can be rewritten as follows:

S_{u,v} = (R×u) ⊕ (I×v)
        = (C1×l1) ⊕ (C2×l2) ⊕ (C3×l3) ⊕ (C4×l4)

where:
matrix C1 is formed by columns n-1 to p of matrix R, and l1 = (u_{n-1}, u_{n-2}, ..., u_p)^T;
matrix C2 is formed by columns p-1 to 0 of matrix R, and l2 = (u_{p-1}, u_{p-2}, ..., u_0)^T;
matrix C3 is formed by columns n-1 to q of matrix I, and l3 = (v_{n-1}, v_{n-2}, ..., v_q)^T;
matrix C4 is formed by columns q-1 to 0 of matrix I, and l4 = (v_{q-1}, v_{q-2}, ..., v_0)^T.
Consider the matrix [C2 C4] formed from C2 and C4. Since C2 consists of columns p-1 to 0 of R (p columns in all) and C4 consists of columns q-1 to 0 of I (q columns in all), and p + q = n, it is easy to see that [C2 C4] is an n × n matrix of rank n.
For any two elements x_{u,v} and x_{u',v'} in the same main PQ block of the N × N matrix X, we have l1 = l1' and l3 = l3', while l2 = l2' and l4 = l4' cannot hold simultaneously. Hence

S_{u,v} ⊕ S_{u',v'} = (C2×(l2⊕l2')) ⊕ (C4×(l4⊕l4')) ≠ 0

that is, the CACHE line numbers onto which any two such elements are mapped are all distinct. The EE function therefore guarantees that any two elements of a main PQ block of the matrix are stored in different CACHE lines, i.e. that the N elements of a main PQ block of the matrix reside in the CACHE simultaneously.
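Theorem 3 can be checked exhaustively for n = 4 under the illustrative encoding used earlier (element (u, v) of the row-major matrix has tag u and line field v; the helper names are ours):

```python
# Brute-force check of Theorem 3 for N = 16: every main PQ block of a
# 16 x 16 matrix maps onto 16 pairwise-distinct CACHE lines under EE.
def ee(u, v, n=4):
    return int(format(u, f"0{n}b")[::-1], 2) ^ v

N, ok = 16, True
for p in range(5):                       # P = 2^p, Q = 2^(4-p), P*Q = N
    P, Q = 1 << p, 1 << (4 - p)
    for k in range(0, N, P):             # k mod P = 0
        for l in range(0, N, Q):         # l mod Q = 0
            lines = {ee(k + a, l + b) for a in range(P) for b in range(Q)}
            ok &= (len(lines) == N)
print(ok)  # True
```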
Definition 3: the shifted PQ block SHBLK(P, Q: k, l) of an N × N matrix X is defined as a continuous PQ block satisfying (k mod P = 0) ∨ (l mod Q = 0).
Theorem 4: under the EE mapping, no two elements in any shifted PQ block of the matrix produce a CACHE conflict; hence the N elements of any shifted PQ block of the matrix can reside simultaneously in a CACHE bank of N CACHE lines.
Proof: similar to the proof of Theorem 3 and omitted here.
A shifted PQ block of the matrix is obtained from a main PQ block by a displacement in the horizontal or the vertical direction. Theorem 4 is thus a stronger statement than Theorem 3: it guarantees not only that the N elements of a main block can reside simultaneously in a CACHE bank of N CACHE lines, but also that its N elements still reside in the CACHE simultaneously as long as the main block is displaced in only one of the horizontal and vertical directions.
Theorem 5: in a CACHE system whose associativity is at least 2 and whose CACHE bank has N lines, the EE function makes any data block of N continuous elements of an N × N matrix reside in the CACHE bank simultaneously.
Proof: as shown in Figure 9, take an arbitrary P × Q continuous block mnop containing N elements. It can be regarded as obtained from a main block abcd by moving X element positions in the horizontal direction and Y element positions in the vertical direction. If moving the main block abcd by X positions horizontally yields the shifted block efgh, then mnop is obtained from efgh by moving Y positions vertically; if moving abcd by Y positions vertically yields the shifted block ijkl, then mnop is obtained from ijkl by moving X positions horizontally. By Theorem 4, the CACHE line numbers of the N elements of efgh are conflict-free, so the CACHE line numbers of any two elements of mngh are also conflict-free. By the same reasoning, the CACHE line numbers of any two elements of hckp are conflict-free.
Moreover, cgok belongs to another main PQ block, so the CACHE line numbers of any two of its elements are likewise conflict-free.
In addition, by Theorem 4 the elements of cgok have no CACHE line number conflicts with the elements of jngc, and no line number conflicts with the elements of hckp.
Summing up, a CACHE line conflict can arise only between one element of cgok and one element of mjch, or between one element of jngc and one element of hckp. Several such conflicting pairs may exist at the same time, but each conflict is only one-to-one. This proves the conclusion of the theorem.
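Under the same illustrative encoding used earlier, Theorem 5 can be checked exhaustively for n = 4 (helper names ours): every 16-element block at every starting point uses each CACHE line at most twice, so associativity 2 suffices.

```python
# Worst-case line multiplicity over ALL P x Q blocks of 16 elements at
# arbitrary starting points under EE; Theorem 5 says it never exceeds 2.
from collections import Counter

def ee(u, v, n=4):
    return int(format(u, f"0{n}b")[::-1], 2) ^ v

N, worst = 16, 0
for p in range(5):
    P, Q = 1 << p, 1 << (4 - p)
    for k in range(N - P + 1):           # arbitrary starting row
        for l in range(N - Q + 1):       # arbitrary starting column
            c = Counter(ee(k + a, l + b) for a in range(P) for b in range(Q))
            worst = max(worst, max(c.values()))
print(worst)  # 2
```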
Given the importance of block algorithms in applications such as scientific and engineering computation, numerical analysis, image processing and pattern recognition, and signal processing, Theorem 5 expresses an extremely important and superior property of the EE function.
In some image processing algorithms, the pattern formed by visiting the points occupying the same position in each continuous block of the picture matrix is used; we call this pattern a "scattered block" (SCBLK), formally defined as follows.
Definition 4: let a and b be integers with 0 ≤ a ≤ P-1 and 0 ≤ b ≤ Q-1. The discrete PQ block SCBLK(P, Q: a, b) of an N × N matrix X is defined as the set of all elements x_{u,v} with u ≡ a (mod P) and v ≡ b (mod Q).
This pattern is mainly used in image processing and pattern recognition algorithms.
Theorem 6: under the EE mapping, no two elements in any discrete PQ block of the matrix produce a CACHE conflict; hence the N elements of any discrete PQ block of the matrix can reside simultaneously in a CACHE bank of N CACHE lines.
Proof: as in the proof of Theorem 3, rewrite the CACHE line expression of a matrix element as

S_{u,v} = (R×u) ⊕ (I×v)
        = (C1×l1) ⊕ (C2×l2) ⊕ (C3×l3) ⊕ (C4×l4)

It is easy to prove that the matrix [C1 C3] has rank n. For any two elements x_{u,v} and x_{u',v'} of a discrete P × Q block SCBLK(P, Q: a, b) of the N × N matrix, we have l2 = l2' and l4 = l4', while l1 = l1' and l3 = l3' cannot hold simultaneously. Suppose that under the EE method x_{u,v} and x_{u',v'} are stored in CACHE lines S_{u,v} and S_{u',v'} respectively; then

S_{u,v} ⊕ S_{u',v'} = (C1×(l1⊕l1')) ⊕ (C3×(l3⊕l3')) ≠ 0

that is, the CACHE lines of any two such elements differ, so the N elements of any discrete block can all reside in the CACHE simultaneously.
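Theorem 6 can likewise be checked exhaustively for n = 4 under the illustrative encoding used earlier (helper names ours):

```python
# Brute-force check of Theorem 6 for N = 16: every discrete PQ block
# SCBLK(P, Q: a, b) maps onto 16 pairwise-distinct CACHE lines under EE.
def ee(u, v, n=4):
    return int(format(u, f"0{n}b")[::-1], 2) ^ v

N, ok = 16, True
for p in range(5):
    P, Q = 1 << p, 1 << (4 - p)
    for a in range(P):
        for b in range(Q):
            lines = {ee(u, v) for u in range(a, N, P)   # u = a mod P
                              for v in range(b, N, Q)}  # v = b mod Q
            ok &= (len(lines) == N)
print(ok)  # True
```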
Image processing algorithms sometimes use the patterns "partial row pair" and "partial column pair". They are defined as follows.
Definition 5: the following N-1 elements of an N × N matrix X are called a partial row pair PRP of X:

PRP(k) = { x_{k,0}, x_{k,1}, ..., x_{k,k-1}, x_{N-1-k,0}, x_{N-1-k,1}, ..., x_{N-1-k,N-2-k} }

where 0 ≤ k ≤ N-1.
Definition 6: the following N-1 elements of an N × N matrix X are called a partial column pair PCP of X:

PCP(k) = { x_{0,k}, x_{1,k}, ..., x_{k-1,k}, x_{0,N-1-k}, x_{1,N-1-k}, ..., x_{N-2-k,N-1-k} }

where 0 ≤ k ≤ N-1.
Lemma 1: for any u, v (0 ≤ u, v ≤ N-1), S_{u,v} = S_{N-1-u,N-1-v}, where S_{u,v} (respectively S_{N-1-u,N-1-v}) is the CACHE line number of element x_{u,v} (respectively x_{N-1-u,N-1-v}) under the EE mapping.
Proof: every bit of u (or v) is the ones' complement of the corresponding bit of N-1-u (or N-1-v), so every bit of S_{u,v} is the corresponding bit of S_{N-1-u,N-1-v} negated twice, and the lemma follows.
Theorem 7: under the EE mapping, the N-1 elements of any partial row pair of an N × N matrix X can all reside simultaneously in the N lines of the CACHE bank.
Proof: by Definition 5, for any u, if x_{k,u} ∈ PRP(k) then x_{N-1-k,N-1-u} ∉ PRP(k), and vice versa. By Lemma 1, the CACHE line numbers occupied by the elements of row k that belong to PRP(k) are exactly the line numbers occupied by those elements of row N-1-k that do not belong to PRP(k). Again, by Theorem 1, the N elements of any one row have no CACHE line number conflicts. Hence the N-1 elements of any partial row pair have no CACHE line number conflicts.
Theorem 8: under the EE mapping, the N-1 elements of any partial column pair of an N × N matrix X can all reside simultaneously in N-1 lines of the CACHE bank.
The proof of Theorem 8 is similar to that of Theorem 7 and is omitted here.
The properties of the LR function are proved below.
Theorem 9: under the LR mapping, no two elements in any row of the matrix produce a CACHE conflict; hence the N elements of any row of the matrix can reside simultaneously in a CACHE bank of N CACHE lines.
The proof of this theorem is similar to that of Theorem 1 and is omitted here.
Theorem 10: under the LR mapping, no two elements in any column of the matrix produce a CACHE conflict; hence the N elements of any column of the matrix can reside simultaneously in a CACHE bank of N CACHE lines.
The proof of this theorem is similar to that of Theorem 2 and is omitted here.
Theorem 11: under the LR mapping, no two elements in any main PQ block of the matrix produce a CACHE conflict; hence the N elements of any main PQ block of the matrix can reside simultaneously in a CACHE bank of N CACHE lines.
The proof of this theorem is similar to that of Theorem 3 and is omitted here.
Theorem 12: under the LR mapping, no two elements in any discrete PQ block of the matrix produce a CACHE conflict; hence the N elements of any discrete PQ block of the matrix can reside simultaneously in a CACHE bank of N CACHE lines.
The proof of this theorem is similar to that of Theorem 6 and is omitted here.
Definition 4: the continuous equally spaced vector SEQ(N, S: k, l) of an N × N matrix X is defined as

SEQ(N, S: k, l) = { X(⌊(kN + l + qS)/N⌋, (kN + l + qS) mod N) | q = 0, 1, ..., N-1 }

where S, a positive integer, is the stride of the vector and X(k, l) is its first element.
Definition 5: the equally spaced main vector MSEQ(N, S: k, l) of an N × N matrix X is defined as a continuous equally spaced vector satisfying the condition (k × N + l) mod (S × N) = 0.
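The element lists and membership conditions of the equally spaced vector definitions can be sketched as follows for a row-major N × N matrix (helper names ours; the shifted-main-vector condition anticipates the definition given further below):

```python
# Continuous equally spaced vector: N elements from X(k, l) at stride S
# in the row-major linearization of the matrix.
def seq(N, S, k, l):
    return [((k * N + l + q * S) // N, (k * N + l + q * S) % N)
            for q in range(N)]

def is_mseq(N, S, k, l):                          # equally spaced MAIN vector
    return (k * N + l) % (S * N) == 0

def is_shmseq(N, S, k, l):                        # shifted main vector
    return 0 <= (k * N + l) % (S * N) <= S - 1

# The examples marked in Figure 6 for N = 32 satisfy the main condition:
print(is_mseq(32, 4, 4, 0), is_mseq(32, 16, 16, 0))  # True True
print(seq(16, 4, 0, 0)[:5])  # [(0, 0), (0, 4), (0, 8), (0, 12), (1, 0)]
```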
Figure 6 marks, for N = 32, the equally spaced main vectors MSEQ(32, 4: 4, 0) and MSEQ(32, 16: 16, 0) of a 32 × 32 matrix.
Theorem 13: under the LR mapping, the N elements of an equally spaced main vector MSEQ(N, S: k, l) with stride S = 2^s can reside in the CACHE simultaneously.
Proof: for S = 2^s (that is, when the stride is an integer power of 2), the CACHE line expression of an element can be rewritten as follows:

S_{u,v} = (H×u) ⊕ (I×v)
        = (C1×l1) ⊕ (C2×l2) ⊕ (C3×l3) ⊕ (C4×l4)

where:
matrix C1 is formed by columns n-1 to s of matrix H, and l1 = (u_{n-1}, u_{n-2}, ..., u_s)^T;
matrix C2 is formed by columns s-1 to 0 of matrix H, and l2 = (u_{s-1}, u_{s-2}, ..., u_0)^T;
matrix C3 is formed by columns n-1 to s of matrix I, and l3 = (v_{n-1}, v_{n-2}, ..., v_s)^T;
matrix C4 is formed by columns s-1 to 0 of matrix I, and l4 = (v_{s-1}, v_{s-2}, ..., v_0)^T.
It can be proved that the matrix [C2 C3] has rank n. For any two elements x_{u,v} and x_{u',v'} in an equally spaced main vector MSEQ(N, S: k, l) with stride S = 2^s of an N × N matrix, we have l1 = l1' and l4 = l4', while l2 = l2' and l3 = l3' cannot hold simultaneously. Hence

S_{u,v} ⊕ S_{u',v'} = (C2×(l2⊕l2')) ⊕ (C3×(l3⊕l3')) ≠ 0

So under the LR mapping, any two elements of an equally spaced main vector of stride S = 2^s in an N × N matrix are mapped into different CACHE lines; that is, the N elements of an equally spaced main vector of stride S = 2^s can all reside in the CACHE simultaneously.
Definition 6: the shifted equally spaced main vector SHMSEQ(N, S: k, l) of an N × N matrix X is defined as a continuous equally spaced vector satisfying the condition 0 ≤ (k × N + l) mod (S × N) ≤ S-1.
Figure 7 marks, for N = 32, the shifted equally spaced main vectors SHMSEQ(32, 2: 0, 1) and SHMSEQ(32, 8: 8, 2) of a 32 × 32 matrix.
Theorem 14: under the LR mapping, the N elements of a shifted equally spaced main vector SHMSEQ(N, S: k, l) with stride 2^s can reside in the CACHE simultaneously. The proof of this theorem is similar to that of Theorem 13 and is omitted here.
Theorem 15: under the LR mapping, the N elements of an arbitrary equally spaced vector SEQ(N, S: k, l) with stride 2^s can reside simultaneously in a CACHE whose associativity is at least 2.
The correctness of Theorem 15 follows from the idea used in the proof of Theorem 5 together with the results of Theorems 13 and 14; the proof is omitted here.
Parallel access to equally spaced vectors is very significant for solving scientific computation and engineering problems; in particular, the computation of the fast Fourier transform (FFT) uses equally spaced vectors whose strides are positive integer powers of 2. This matters because the FFT is widely applied in many fields of scientific and engineering computation, such as image processing, digital signal processing, and pattern recognition. Since the FFT algorithm repeatedly uses loops whose strides are different integer powers of 2, while the number of lines of the CACHE bank is itself an integer power of 2, performing the CACHE address mapping in the conventional manner usually produces a large number of CACHE conflicts and makes efficient computation difficult. Performing the CACHE mapping with the LR function solves this problem effectively.
Theorem 16: under the LR mapping, when n is even, the N elements of the main diagonal of an N × N matrix can reside in the CACHE simultaneously.
Proof: let any two elements x_{u,u} and x_{v,v} of the main diagonal of matrix X have CACHE line numbers S_{u,u} and S_{v,v}. Then

S_{u,u} ⊕ S_{v,v} = ((H×u) ⊕ (I×u)) ⊕ ((H×v) ⊕ (I×v))
                  = ((H⊕I)×u) ⊕ ((H⊕I)×v)
                  = (H⊕I)×(u⊕v)

As long as we prove that the matrix H⊕I has rank n, it follows from u ≠ v that (H⊕I)×(u⊕v) ≠ 0, and hence that S_{u,u} ≠ S_{v,v}.
Let C = H⊕I, and write c_{x,y} for the element of C with index (x, y); C can then be written out explicitly.
When n is even, C is an upper triangular matrix whose diagonal elements are all 1, so the rank of C is n.
Theorem 17: under the LR mapping, when n is even, the N elements on the back diagonal of an N × N matrix can reside in the CACHE simultaneously.
Proof: for any two elements x_{u,N-1-u} and x_{v,N-1-v} on the back diagonal of matrix X, suppose they are stored in CACHE lines S_{u,N-1-u} and S_{v,N-1-v} respectively. Then

S_{u,N-1-u} ⊕ S_{v,N-1-v} = ((H×u) ⊕ (I×(N-1-u))) ⊕ ((H×v) ⊕ (I×(N-1-v)))
                          = (H×(u⊕v)) ⊕ (I×((N-1-u)⊕(N-1-v)))
                          = (H⊕I)×(u⊕v)

where the last step uses the fact that N-1-u is the bitwise complement of u, so (N-1-u) ⊕ (N-1-v) = u ⊕ v.
By the proof of Theorem 16, when n is even the matrix H⊕I has rank n, and u ≠ v, so the theorem follows.
As the above proofs show, in the present invention the memory address of main-memory data in the CACHE is generated by a bitwise XOR transformation between bit sections of the memory address (the function EE or LR), thereby reducing CACHE data access conflicts in practical core algorithms, raising the effective memory access speed of the system, and thus raising the computation speed of the whole computing system. The technique has been realized in the design of the second-level CACHE of an experimental system; even when applied only to the design of a two-level CACHE system, it raises the speed of some commonly used algorithms of the whole computer system by 30% to 60%.