WO2001048610A1 - Multi-bank, fault-tolerant, high-performance memory addressing system and method - Google Patents

Multi-bank, fault-tolerant, high-performance memory addressing system and method

Info

Publication number
WO2001048610A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
address
bank
addresses
memory device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2000/035209
Other languages
English (en)
French (fr)
Other versions
WO2001048610A8 (en)
Inventor
Gregory V. Chudnovsky
David V. Chudnovsky
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to EP00988335A (EP1247185A4)
Priority to AU24552/01A (AU2455201A)
Priority to JP2001549196A (JP5089842B2)
Publication of WO2001048610A1
Publication of WO2001048610A8
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0607 Interleaved addressing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/0292 User address space allocation, e.g. contiguous or non contiguous base addressing using tables or multilevel address translation means

Definitions

  • This invention relates generally to a memory addressing system and method and, in particular, to a memory addressing system and method that provides high-performance access to a multi-bank memory having an arbitrary number of banks.
  • the memory space is composed of individual components, typically called banks, whose number is typically a power of 2.
  • the memory space is "interleaved" among the banks, meaning that consecutive addresses are typically mapped to different banks.
  • This approach has been used in high performance systems using as many as 512 banks of memory. Increasing the number of memory banks generally increases the throughput of memory and thus the bandwidth from the memory system to the processing unit. This throughput has traditionally been the weakest point in computer operations.
  • A known problem with this memory representation lies in the performance degradation it incurs when accessing arrays, or other data structures, with a stride that is even or divisible by a higher power of 2. For example, in a 16-bank system, accesses of stride 16 result in the worst performance, since only one of the 16 banks is accessed. In many practical applications, array accesses have strides divisible by a high power of 2. For example, in matrices of size $2^m \times 2^m$ in a system with $N = 2^b$ banks, for $m \ge b$, column accesses give only $1/N$ of the peak performance, since whole columns reside in the same memory bank.
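The stride problem described above can be illustrated with a minimal sketch. It assumes the simplest interleaving rule (bank = address mod number of banks); the function name `banks_touched` and the 64-access window are illustrative choices, not taken from the source.

```python
from collections import Counter


def banks_touched(stride, num_banks=16, accesses=64):
    """Count the distinct banks hit by a fixed-stride walk.

    Assumes simple interleaving: the bank is the address modulo num_banks.
    """
    hits = Counter((i * stride) % num_banks for i in range(accesses))
    return len(hits)


# Strides that share a large power of 2 with the bank count use few banks.
for stride in (1, 2, 4, 8, 16):
    print(stride, banks_touched(stride))
```

Under this assumed rule, a stride of 16 in a 16-bank system always lands in the same bank, which is the worst case the text mentions.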
  • Such a chip may, for example, comprise $2^b$ microprocessors and $2^b$ memory units (with, for example, 1 to 8 Mbits of DRAM per unit), communicating with each other over, ideally, a full $2^b \times 2^b$ crossbar switching network.
  • the memories in such chips may be treated in a shared memory model as a flat address space of $2^b \cdot 2^m$ memory locations, where $2^m$ is the size of each individual memory unit.
  • Embedded memory chips are much more complex than ordinary memory units; accordingly, the cost of discarding or downgrading such chips is correspondingly greater than the cost of doing so for ordinary memory units. Attempts to solve these problems have not been entirely successful.
  • RAMBUS and other similar technologies attempt to alleviate the processor-memory bottleneck by providing faster memory operations to a non-banked memory or a simply interleaved multi-bank memory.
  • Improvements are seen primarily for contiguous memory requests.
  • As the speed of processing units increases dramatically, the bottleneck remains.
  • Another remedy to the bank conflict problem is to use a pseudo-random number generator to generate a mapping between a logical address A and a corresponding bank.
  • a pseudo-random generator generates a random sequence of output values for an ordered sequence of input values, but will always produce the same output value for a given input value.
  • One problem with this technique is that it produces bank conflicts for stride 1 accesses. Stride 1 accesses are the most common access patterns in most computer applications (occurring for example when reading an instruction stream) and any significant degradation in memory performance for such accesses is therefore unacceptable.
  • The general problem is that a pseudo-random, or truly random, mapping produces, on average, bank conflicts in not less than $1/e$ (i.e., 36.78...%) of accesses (where e is the base of the natural logarithm), even for large N. This tends to substantially reduce peak performance. Additionally, certain known pseudo-random number generators may not uniformly map the address space across all banks (i.e., some banks may have more addresses mapped to them than others), which in turn increases bank conflicts and reduces performance.
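One plausible way to see where a figure near $1/e$ can come from is sketched below. This is an interpretation, not something the text states: it assumes N requests are each sent to an independently and uniformly chosen bank, and computes the expected fraction of banks that receive no request. The function name is hypothetical.

```python
import math


def idle_bank_fraction(num_banks):
    """Expected fraction of banks that receive no request when num_banks
    requests each go to an independently, uniformly chosen bank
    (an assumed model, not the source's definition of a conflict)."""
    return (1.0 - 1.0 / num_banks) ** num_banks


# The value approaches 1/e as the number of banks grows.
for n in (4, 16, 64, 512):
    print(n, round(idle_bank_fraction(n), 4), round(1 / math.e, 4))
```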
  • a memory device having a plurality, N, of memory banks comprising a plurality of addressable memory locations. Each memory location has a logical address and a corresponding physical address, the physical address comprising a memory bank number and a local address within the memory bank.
  • the memory device comprises an address mapping system, including an address translation unit, that derives, for each logical address, the corresponding physical address.
  • the address translation unit operates such that, for at least one explicit access sequence of logical addresses (for example, a sequence in which each logical address in the sequence is separated from another address in the sequence by a stride value), the derived physical addresses in the sequence of corresponding physical addresses have memory bank numbers that do not form a repetitive pattern having a period less than N + 1 (or even a period less than the size of the address space) and do not on average repeat a bank number within approximately N addresses in the sequence of corresponding physical addresses.
  • the mapping performed by the address translation unit is referred to herein as "finite quasi-crystal mapping."
  • the term derives from the fact that a translation unit in accordance with a preferred embodiment of the present invention produces, for most strides, a bank access pattern that is almost periodic (i.e., quasi-crystal-like); for example, the banks selected may generally be separated by a fixed value but occasionally separated by a different value.
  • An example of a quasi-crystal mapping for a given stride in a 16-bank memory system, where the banks are numbered 0 to 15, is 0, 2, 4, 6, 8, 10, 13, 15, 1, 3, 5, 7, 9, 12, 14, . . .
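The "almost periodic" character of this example sequence can be checked by listing the gaps between consecutive bank numbers, taken modulo 16. The helper name `bank_gaps` is illustrative only.

```python
def bank_gaps(sequence, num_banks):
    """Differences between consecutive bank numbers, modulo num_banks."""
    return [(b - a) % num_banks for a, b in zip(sequence, sequence[1:])]


# The bank sequence quoted in the text for a 16-bank system.
example = [0, 2, 4, 6, 8, 10, 13, 15, 1, 3, 5, 7, 9, 12, 14]
gaps = bank_gaps(example, 16)
print(gaps)
```

Most gaps in the quoted sequence share one value and a few differ, and no bank number repeats within these 15 accesses.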
  • bank numbers in the sequence are generally separated by
  • a preferred quasi-crystal mapping for a particular explicit access pattern is one in which each memory bank is accessed approximately the same number of times.
  • the discrepancy (here this term means the deviation of a given distribution of bank accesses from the uniform one)
  • This discrepancy per bank here is only O(1)
  • the bank number in this example is derived from the top bits of scrambled address A.
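A minimal sketch of deriving a bank number from the top bits of a scrambled address follows. It assumes the scrambling is multiplication by an odd constant modulo $2^K$, as the later modular-multiplication passages suggest; the width `K`, the bank count and the constant `MULTIPLIER` are hypothetical values chosen for illustration, not values from the source.

```python
from collections import Counter

K = 16              # address width in bits (illustrative)
BANK_BITS = 4       # 16 banks (illustrative)
MULTIPLIER = 40503  # hypothetical odd multiplier, not taken from the source


def scramble(address):
    """Multiply by an odd constant modulo 2**K (a bijection on K-bit values)."""
    return (address * MULTIPLIER) % (1 << K)


def bank_of(address):
    """Bank number: the top BANK_BITS bits of the scrambled address."""
    return scramble(address) >> (K - BANK_BITS)


def local_of(address):
    """Local address: the remaining low bits of the scrambled address."""
    return scramble(address) & ((1 << (K - BANK_BITS)) - 1)


# Bank sequence for the first few stride-1 accesses.
print([bank_of(a) for a in range(16)])
```

Because the multiplier is odd, every scrambled value occurs exactly once, so each bank receives the same number of addresses over the whole space.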
  • Λ is selected so as to minimize the deviation from a uniform distribution of bank numbers occurring in explicit access patterns of interest (such as various fixed stride or linear sequences of accesses in a two- or multi-dimensional table, including diagonal access patterns) over the $2^K$ address space.
  • The range of potentially suitable Λs may be narrowed using a variety of techniques. For example, minimizing the deviation from a uniform distribution of bank numbers is similar to the problem of minimizing the deviation from a uniform distribution of fractional parts
  • the range of potentially suitable Λs may be narrowed through the optimization of continued fraction expansion algorithms for rational numbers of the form $\Lambda/2^K$.
  • the optimization algorithm tries to find potentially suitable integer multipliers Λ such that two conditions hold at the same time: (a) the initial terms $a_i$ in the continued fraction expansion $(a_0, a_1, a_2, \ldots)$ of $\Lambda/2^M$ for $M \le K$ are all small (for example, 1 or 2); and (b) the number of non-zero bits in the binary (or Booth-encoded binary) expansion of Λ is minimal among multipliers satisfying condition (a).
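The two quantities named in conditions (a) and (b) can be computed as in the sketch below. How many leading terms count as "initial" and what counts as "small" are assumptions here (`leading_terms=4`, `max_term=2`), and all function names are illustrative.

```python
def continued_fraction(numerator, denominator):
    """Terms (a0, a1, a2, ...) of the continued fraction of a rational number."""
    terms = []
    while denominator:
        q, r = divmod(numerator, denominator)
        terms.append(q)
        numerator, denominator = denominator, r
    return terms


def nonzero_bits(value):
    """Number of 1 bits in the plain binary expansion (condition (b))."""
    return bin(value).count("1")


def passes_condition_a(multiplier, m_bits, leading_terms=4, max_term=2):
    """Check that the leading terms after a0 are all small (assumed thresholds)."""
    terms = continued_fraction(multiplier, 1 << m_bits)[1:1 + leading_terms]
    return all(t <= max_term for t in terms)


print(continued_fraction(171, 256), nonzero_bits(171))
```

For instance, 171/256 expands to terms 0, 1, 2, 85, so under these assumed thresholds it would fail condition (a) because of the large term 85.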
  • This non-linear optimization provides the best multiplier Λ needed both for scrambling and for the minimal circuit implementation of the scrambler.
  • the final choice of Λ is based solely on the minimization of the deviation from the uniform distribution of bank access for various explicit access patterns over the address space.
  • the deviation is computed through exhaustive simulation of bank access patterns for various strides, or other explicit access patterns, over the entire address space.
  • Suitable Λs can be selected by exhaustive computation of deviations for all possible values of Λ (i.e., odd, and in the range $1 \le \Lambda < 2^K$).
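The exhaustive selection described above can be sketched on a deliberately tiny address space. Everything concrete here is an assumption made for illustration: the width `K = 8`, four banks, the set of strides, the 16-access window, and the use of the worst per-bank deviation as the score.

```python
K = 8                    # small address width so the search is quick (assumed)
BANK_BITS = 2            # 4 banks (assumed)
N = 1 << BANK_BITS
WINDOW = 4 * N           # accesses examined per stride (assumed)
STRIDES = range(1, 17)   # access patterns of interest (assumed)


def bank(address, multiplier):
    """Top bits of (address * multiplier) mod 2**K."""
    return ((address * multiplier) % (1 << K)) >> (K - BANK_BITS)


def deviation(multiplier):
    """Worst deviation of per-bank access counts from uniform over all strides."""
    worst = 0
    for stride in STRIDES:
        counts = [0] * N
        for i in range(WINDOW):
            counts[bank((i * stride) % (1 << K), multiplier)] += 1
        worst = max(worst, max(abs(c - WINDOW // N) for c in counts))
    return worst


def best_multipliers():
    """Score every odd multiplier below 2**K and return the best score and ties."""
    scores = {m: deviation(m) for m in range(1, 1 << K, 2)}
    best = min(scores.values())
    return best, [m for m, s in scores.items() if s == best]


best, winners = best_multipliers()
print(best, winners[:8])
```

A real selection would cover the full address space and the access patterns that matter to the target application; this sketch only shows the shape of the search.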
  • the address space shrinks to $N \cdot 2^m$ memory locations, where $N < 2^b$.
  • the complexity of the hardware logic that performs the translation is crucial.
  • Such a switching network could be a full $2^b \times 2^b$ crossbar switch.
  • the total memories in this chip are treated in a shared memory model as a flat address space of $N \cdot 2^m$ memory locations. Since these translation units are needed for all multiprocessors inside the part, the ease of the hardware implementation of the address translation logic is crucial.
  • N is a short constant
  • $2^K / N$ is a (longer) constant.
  • this approach requires only 2 multiplications by short (6-bit) numbers and addition/subtraction.
  • Fig. 1 is a block diagram of one embodiment of the memory addressing system of the present invention.
  • Fig. 2 is a block diagram of the memory section of one embodiment of the present invention.
  • Fig. 3 is a block diagram of the translation unit of one embodiment of the present invention.
  • Fig. 4 is a flowchart illustrating the operation of S-Box.
  • Fig. 5 is a flowchart illustrating the operation of M-Box.
  • Fig. 6 is a flowchart illustrating the operation of N-Box.
  • Fig. 7 illustrates an embodiment of the present invention connected to a single memory array.
  • Fig. 8 illustrates an embodiment of the present invention connected to a single memory array in a switchable bus architecture.
  • Fig. 9 illustrates a multi-processor and memory system.
  • Fig. 10 illustrates an embodiment of the present invention connected to a local memory unit in a multi-processor and memory system.
  • Fig. 11 is a flowchart illustrating an alternative embodiment of the present invention.
  • Figs. 12 - 37 are hierarchical schematics of a hardware implementation of an embodiment of the present invention.
  • Bits_N_K_L: Takes in N-bit bus A[N-1:0] and leaves only a sub-bus A[L:K].
  • INS_N_M_L: Takes in A[N-1:0] (N inputs) and pads them at the bottom with L grounds, inserts on the top as many A[J:0] bits as fit in M outputs, and, if N+L < M, adds grounds (0) at the top of the result Q[M-1:0].
  • N_K_M: This is the hardwired decimal constant K with M bits of the output Q[M-1:0].
  • PDMuxN_M: Predecoded Mux. Gets in N individual select lines S[N-1:0] and vector buses A[0][M-1:0], ..., A[N-1][M-1:0] to get out
  • Recode_{6,8}: Serially recodes 6 or 8 bits using Recode block.
  • depends on user requirements and the invention is not limited to any specific criteria. Suitable Λs can be selected by, for example, exhaustive computation of deviations for all possible values of Λ (i.e., $1 \le \Lambda < 2^K$), via direct computer simulation, and selecting Λs that minimize the deviation from uniform distribution of bank accesses for specific classes of explicit one- and multi-dimension memory access patterns that are of interest.
  • Analytic techniques based on number-theoretic properties of Λs can be used to pre-select classes of Λs with the needed properties of minimal deviation from the uniform distribution, thus reducing the time needed for the exhaustive simulations.
  • Modulus $2^K$ is preferred because multiplication by Λ mod $2^K$ can be implemented relatively simply in hardware, as those skilled in the art would appreciate. This scheme requires only a few gate delays in its implementation and thus adds at most one additional pipeline stage to the memory access.
  • Modular multiplication mod, say, $2^{16}$ occupies only half of the chip area (i.e., requires only half the gates) of a 16-bit fixed point multiplier. It is also sufficiently fast, because the multiplier Λ is known in advance and can be Booth encoded to reduce the number of stages in a (Wallace) multiplication tree. The complexity of the circuitry can be further reduced by fixing the value of Λ in hardware.
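To illustrate why Booth encoding reduces the number of partial products, here is a sketch of textbook radix-4 Booth recoding applied to a known multiplier. This is the standard technique, not necessarily the exact recoding used by the Recode blocks in the schematics; the function names are illustrative.

```python
def booth_radix4(value, bits):
    """Recode an unsigned `bits`-bit value into radix-4 digits in {-2, ..., 2}.

    Returns the digits least-significant first, so that
    value == sum(d * 4**i for i, d in enumerate(digits)).
    """
    digits = []
    previous = 0  # implicit bit to the right of bit 0
    for i in range(bits // 2 + 1):
        b0 = (value >> (2 * i)) & 1
        b1 = (value >> (2 * i + 1)) & 1
        digits.append(-2 * b1 + b0 + previous)
        previous = b1
    return digits


def multiply_mod(a, multiplier, k):
    """Compute a * multiplier mod 2**k from the recoded digits of the multiplier."""
    total = 0
    for i, d in enumerate(booth_radix4(multiplier, k)):
        total += (d * a) << (2 * i)  # each digit selects 0, +/-a or +/-2a
    return total % (1 << k)


print(booth_radix4(171, 8))
```

Each recoded digit contributes one partial product, so an 8-bit multiplier needs about half as many partial products as a bit-by-bit scheme, and partial products whose bits lie above bit k-1 can be dropped when working mod $2^k$.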
  • a specific example of a quasi-crystal address mapping scheme in the case of a fairly typical multi-bank memory subsystem on-a-chip is presented below.
  • In this specific example there are 16 memory banks on a chip, and the stream of accesses is buffered by a FIFO on the input and a FIFO on the output of each memory bank.
  • the buffering ensures proper in-order sequencing of memory accesses.
  • the definition of bank conflict is based solely on the cycle time of each memory bank.
  • the system cycle time is T ns. (T ⁇ 2.5), and each bank of memory has a cycle time of 10 T ns. (or even 8 T ns. in the next generation of the technology).
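A cycle-time-based notion of bank conflict can be sketched with a simple in-order model: one access is issued per system cycle, and a bank stays busy for 10 cycles after it starts an access. The model ignores the per-bank FIFOs mentioned in the text, and the function name and stall accounting are assumptions made for illustration.

```python
def stall_cycles(bank_sequence, bank_busy=10):
    """Total cycles an in-order issuer waits for busy banks.

    Assumed model: one access may start per cycle, and a bank cannot start
    a new access until bank_busy cycles after its previous one started.
    """
    free_at = {}   # bank number -> first cycle at which it is free again
    now = 0
    stalls = 0
    for bank in bank_sequence:
        start = max(now, free_at.get(bank, 0))
        stalls += start - now
        free_at[bank] = start + bank_busy
        now = start + 1
    return stalls


print(stall_cycles([i % 16 for i in range(160)]))  # round-robin over 16 banks
print(stall_cycles([0] * 5))                        # every access hits bank 0
```

In this model, a sequence that revisits a bank only after at least 10 other accesses never stalls, while a sequence that stays on one bank waits 9 cycles before every access after the first.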
  • may be selected, for example, to generate conflict-free bank accesses, and thus minimal latency, in memory accesses for all strides of sizes up to $O(2^M)$ that are not multiples of 13.
  • the number 13 is the largest (and therefore best) number with this property, but other numbers, such as 11 or some other smaller prime, can be used.
  • This specific example preserves conflict-free bank accesses for most strides and arbitrary numbers of memory banks (including, but not limited to, power of two number of banks), while providing the randomization of other accesses.
  • This scrambler has a minimal complexity (its size is only linear in input/output) for a fixed multiplier Λ, which is important for a practical implementation of this addressing scheme since it reduces the number of gates in the circuit.
  • the patterns of bank accesses in this scrambling scheme for fixed stride arrays resemble finite quasi-crystal tilings.
  • This example of a memory translation unit is characteristic of the address scrambling schemes in all of the preferred embodiments of the system disclosed herein.
  • Similar optimization in the choice of Λ can be used to minimize the deviation from uniform distribution of bank accesses for other sequences of logical addresses of strides greater than one and other explicit one- and multi-dimensional patterns of memory accesses.
  • may be selected so as to provide conflict-free bank accesses for stride one (contiguous) arrays, 100% bandwidth for all higher strides (up to $O(2^M)$) not divisible by 89, but a higher latency of bank accesses than the minimal one for some strides under 89.
  • the memory address translation (and scrambling) unit is designed so that it will work when the number N of memory banks and the number S of sub-banks can be set (dynamically) to any value less than or equal to the maximal one(s) available in the memory system.
  • logical address refers to an address that a device external to the memory system uses to identify an item in memory.
  • physical address refers to a physical memory location and comprises a bank number and a local address within the bank. There is a one-to-one mapping of logical addresses to physical addresses over the entire address space.
  • each addressable memory location comprises a predetermined number of bytes, e.g., 32 bytes.
  • The address space is $N \cdot S \cdot 2^{10}$, for a maximum of $2^{19}$ valid words.
  • All valid memory locations may accordingly be represented by logical 19-bit addresses, A, in the range $0 \le A < N \cdot S \cdot 2^{10}$.
  • Each logical address corresponds to a physical memory location, where a physical memory location is identified by a bank number and a local address within the bank; i.e., A → (Bank, Local), where $0 \le \text{Bank} < N$, $0 \le \text{Local} < S \cdot 2^{10}$.
  • the present invention is not limited to a memory having the above-described structure.
  • The present invention may be applied to a memory having more or fewer than 64 banks, more than $8 \cdot 2^{10}$ words per bank, or fewer than $1 \cdot 2^{10}$ words per bank.
  • Modular multiplication is used here as a means to construct a finite quasi-crystal mapping for the memory translation unit (one of many possible means, but the preferred one in this embodiment).
  • Fig. 1 is a block diagram of a memory addressing system for an arbitrary number of banks in accordance with the present invention.
  • Translation unit 1 receives, in this embodiment, a 19-bit logical address, A, the number of banks, N, the number of sub-banks, S, and multiplier Λ, and translates logical address A into a 6-bit bank number and a 13-bit local address, which are then output.
  • the bank number is then used to address non-defective bank table 6 in memory section 2, which in turn maps the bank number to a physical bank number of a non-defective bank.
  • The physical bank number and local address are used to address an attached memory device.
  • Non-defective bank table 6 is preferably a writable memory (such as a RAM).
  • the non-defective bank table 6 is a 64x6 bit table in which the row number corresponds to a logical bank number and the contents of the table at each row provides the corresponding physical bank number of a non-defective bank. If there are fewer than 64 banks, not all rows in the table will be used. Memory table 6 is shown as a 64x6 bit memory since there is a maximum of 64 valid banks in this example. Of course, a larger memory is needed if the memory system has more banks and a smaller one is needed if the memory system has fewer banks.
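The role of the non-defective bank table can be sketched as a simple lookup list: row index is the logical bank number, row content is a physical bank that works. The helper names and the example set of defective banks are hypothetical.

```python
def build_bank_table(total_banks, defective):
    """Rows are logical bank numbers; contents are physical non-defective banks."""
    return [b for b in range(total_banks) if b not in defective]


def physical_bank(table, logical_bank):
    """Look up the physical bank for a logical bank number."""
    if not 0 <= logical_bank < len(table):
        raise ValueError("logical bank number out of range")
    return table[logical_bank]


# Hypothetical part with 8 physical banks, of which banks 3 and 7 are defective.
table = build_bank_table(8, {3, 7})
N = len(table)  # the number of usable banks seen by the translation unit
print(N, table)
```

With this arrangement the translation unit only ever produces logical bank numbers below N, and defective banks are never addressed.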
  • N, S and Λ are stored in registers 3, 4 and 5, respectively, in memory section 2. Alternatively, these values may be stored in read-only memory or hardwired.
  • Registers 3, 4 and 5 and non-defective bank table 6 are configured so that they can be updated using, for example, scan path loading, as illustrated in Fig. 2. (N.B., identical numbers in different figures refer to identical components.)
  • Values for N, S and Λ are entered in register 3; each bit entered in register 3 right shifts the contents of registers 3, 4 and 5, with the last bit of register 3 being shifted to register 4 and the last bit of register 4 being shifted to register 5, until all three registers are filled with the desired values.
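Scan path loading of the three registers can be modeled as one long shift chain. The register widths (7, 4 and 8 bits), the most-significant-bit-first layout inside each register, and the function names are assumptions made for this sketch; the text does not give them.

```python
def scan_load(widths, bitstream):
    """Shift bits through a chain of registers and return the register values.

    Each new bit enters the first register; everything else moves one place
    to the right, spilling from one register into the next.
    """
    chain = [0] * sum(widths)
    for bit in bitstream:
        chain = [bit] + chain[:-1]
    values, pos = [], 0
    for w in widths:
        bits = chain[pos:pos + w]
        values.append(int("".join(map(str, bits)), 2))
        pos += w
    return values


def bitstream_for(widths, values):
    """Bit order to feed so that the registers end up holding `values`."""
    bits = []
    for w, v in zip(widths, values):
        bits.extend(int(c) for c in format(v, f"0{w}b"))
    return bits[::-1]  # the bit destined for the far end must go in first


widths = (7, 4, 8)       # assumed widths for the N, S and multiplier registers
wanted = [13, 3, 165]    # illustrative values
print(scan_load(widths, bitstream_for(widths, wanted)))
```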
  • the values in non-defective bank table 6 are similarly set.
  • Fig. 3 is a block diagram of translation unit 1.
  • L(S) and C(S) may be stored, for example, in registers or implemented in hardware.
  • R is then $(A_{top} - Q \cdot S) \bmod 8$; i.e., the 3 lowest bits of $A_{top} - S \cdot Q$.
  • the values for L(S) and C(S), $1 \le S \le 8$, for bit range [15:10] are as follows:
  • Q may, for example, be determined as follows.
  • Fig. 4 illustrates the above process for determining Q and R.
  • In step 20, the values of $A_{top}$ and S are input and B is set equal to $A_{top}$.
  • In steps 21, 22 and 23, the values of L(S), C and the range [upper:lower], respectively, are determined based on the value of S.
  • In step 24, quotient Q is set to (B • L(S) + C)[upper:lower].
  • In step 25, R is set to (B - S • Q) mod 8.
  • Q and R are output.
  • Suitable Λs are again determined, for example, by exhaustive computation; i.e., by using various values in the translation unit described herein and determining the Λ values that produce optimal bank access patterns for particular values of N and S.
  • a conflict occurs when two logical addresses are mapped to the same bank number.
  • a conflict must occur at least once every N + 1 accesses.
  • an optimal bank access pattern is one in which bank conflicts are minimized for explicit access patterns of interest, such as fixed stride patterns and linear two- and multi-dimensional patterns of access (including diagonal patterns of access in matrices).
  • stride 1 conflicts and conflicts for other explicit access patterns of interest should occur on average no more than approximately every N accesses.
  • N-Box 16 receives the Q output of S-Box 12 on its Q input; the output of M-Box 14 on its D input, and the number of banks, N, on its N input.
  • N-Box 16 computes and outputs the bank number and lower order 10 bits of the local address, LA[9:0], as described in Fig. 6.
  • the lower order 10 bits of the local address output from N-Box 16 are combined with the high order 3 bits, LA[12:10], from the R output of S-Box to make the entire local address.
  • the process performed by N-Box 16 is illustrated in Fig. 6.
  • Q, D and N are input.
  • Step 42 sets X to D • N + Q.
  • Step 44 sets bits [9:0] of the local address to X[9:0]; i.e.,
  • Step 45 outputs the logical bank number and the low order bits of the local address.
  • the logical bank number is then sent to and used to address the non-defective bank table 6, as described above in connection with Fig. 1.
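The S-, M- and N-Box steps described above can be combined into one end-to-end sketch. Two points are assumptions: the S-Box is modeled as plain division of the top address bits by S, and the bank number is taken as the bits of X above bit 9 (that step is not spelled out in this text). The parameter values in the usage line are illustrative.

```python
def translate(address, n_banks, s_sub, multiplier):
    """Map a logical address to (logical bank, local address).

    Follows the outline in the text: S-Box (Q, R), M-Box (D), N-Box (X).
    """
    if not 0 <= address < n_banks * s_sub * 1024:
        raise ValueError("address outside the valid range")
    a_top, a_bot = address >> 10, address & 0x3FF
    q, r = divmod(a_top, s_sub)        # S-Box: quotient and remainder by S
    d = (a_bot * multiplier) % 1024    # M-Box: scramble the low 10 bits
    x = d * n_banks + q                # N-Box, step 42: X = D * N + Q
    bank = x >> 10                     # assumed: bank is X above bit 9
    local = (r << 10) | (x & 0x3FF)    # R gives LA[12:10], X gives LA[9:0]
    return bank, local


# Illustrative configuration: 13 banks, 3 sub-banks, hypothetical multiplier 165.
print([translate(a, 13, 3, 165)[0] for a in range(16)])
```

With an odd multiplier this mapping is one-to-one over the whole address space, which matches the one-to-one requirement stated for logical and physical addresses.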
  • The local address can be derived in this case by, for example, appending the local address derived from the 19-bit subset to the unused bits of the K-bit address.
  • the technique described in this embodiment can be easily adapted for any of the following ranges of values: K, either larger or smaller than 19; N, larger or smaller than 64; and S, larger or smaller than 8.
  • The choice of the parameter Λ is made according to the principles of quasi-crystal mappings described above.
  • The performance of the address translation unit, built from appropriately modified S-, M-, and N-Boxes with a proper choice of Λ, improves as K increases (for K > 19).
  • A hardware implementation for the embodiment shown in Figs. 1-6 is depicted in the hierarchical schematics shown in Figs. 12 - 37.
  • the implementations of low level blocks in these schematics are presented for illustrative purposes and in production will be implemented in library and technology specific fashion.
  • One skilled in the art will understand the range of specific implementations and can choose an appropriate one that is library and process specific. For example, in newer technology one will use buffers to minimize wire length, while in older technology longer wires with fewer gate delays are preferable.
  • Fig. 12 depicts a hardware implementation of S-Box 12.
  • Sub3 101 receives the four bit constant S[3:0] and the value 1, from hard-wired constant 114 (or, alternatively, from a register), and subtracts 1 from S. This converts S from the range 1 to 8, inclusive, to 0 to 7, inclusive, for use as an index.
  • the result is sent to the S input of multiplexer MUX8x8 104, for selecting the corresponding L(S), and to decoder DEC8 110, for determining the corresponding values of C and the range [upper :lower].
  • Multiplexer MUX8x8 104 selects and outputs one of eight input values A-H based on input value S. Those input values (corresponding to L(S) in Fig. 4) are received from constant bank 102. As shown, constant bank 102 contains the following hard-coded constants in positions 0 through 7, respectively: 128, 64, 171, 32, 205, 171, 146 and 128. Alternatively, the values in bank 102 can be stored in registers.
  • Decoder DEC8 110 also receives the output of box Sub3 101 and sets one of its outputs, Q0-Q7, high based on the received value (e.g., Q0 is set high if the received value is
  • Elements 106, 107 and 108, PDMux3_6 109, and OR gates 111 and 112 select a range of bits (i.e., [upper: lower]) from the output of Sbox_Mult 105. The range depends on the output of decoder DEC8 110.
  • Element 106 directs bits Q[12:7], where Q here is the output of Sbox_Mult 105, to input A of PDMux3_6 109; element 107 directs bits Q[14:9] to input B of PDMux3_6 109; and element 108 directs bits Q[15:10] to input C of PDMux3_6 109.
  • PDMux3_6 109 is a predecoded multiplexer having three individual select lines, S[2:0], only one of which will be logic 1, which select and output one of three corresponding inputs, A, B or C.
  • Input A (range [12:7]) is selected if S equals 1, 2 or 4 (i.e., output Q0, Q1 or Q3 from decoder DEC8 110 to OR gate 111 is logic 1).
  • Input B (range [14:9]) is selected if S equals 3 (i.e., output Q2 from decoder DEC8 110 is logic 1).
  • input C range
  • SBox_BMSQ 113 computes $(B - S \cdot Q) \bmod 2^3$; in particular, it receives B[2:0], S[2:0], and the three lower order bits of the output of multiplexer 109 on its B[2:0], S[2:0] and Q[2:0] inputs, respectively, and outputs the result on its R output.
  • The inputs are only 3 bits each because the calculation only determines the 3 low order bits of the result (i.e., it is mod $2^3$).
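The S-Box computation can be sketched as division by a small constant via multiplication by a precomputed reciprocal. The L(S) constants and the bit ranges for S = 1, 2, 3 and 4 are the ones quoted in the hardware description; the range [15:10] for S = 5 to 8 and every C(S) value are assumptions, since the C(S) table is not present in this text. The placeholder C values were chosen only so that the result agrees with integer division for inputs below 64 · S.

```python
# L(S) for S = 1..8, as listed for constant bank 102.
L_CONST = {1: 128, 2: 64, 3: 171, 4: 32, 5: 205, 6: 171, 7: 146, 8: 128}


def bit_range(s):
    """Bit range [upper:lower] selected from the product."""
    if s in (1, 2, 4):
        return 12, 7
    if s == 3:
        return 14, 9
    return 15, 10  # assumed for the remaining values of S


def c_const(s):
    """Placeholder correction term; NOT the source's C(S) table."""
    return 146 if s == 7 else 0


def s_box(b, s):
    """Return (Q, R): Q = (B*L(S) + C)[upper:lower], R = (B - S*Q) mod 8."""
    upper, lower = bit_range(s)
    product = b * L_CONST[s] + c_const(s)
    q = (product >> lower) & ((1 << (upper - lower + 1)) - 1)
    r = (b - s * q) % 8
    return q, r


print(s_box(100, 7), divmod(100, 7))
```

The design point this illustrates is that a divide by S in 1..8 reduces to one short multiplication, one addition and a bit-field selection.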
  • An implementation of Sbox_Mult 105 is shown in Fig. 14.
  • Recode 250 recodes the bits in 8-bit input A to facilitate efficient multiplication.
  • An implementation of Recode 250 is shown in Fig. 15, comprising four Recode 300 blocks.
  • An implementation of a Recode 300 block is shown in Fig. 16.
  • ProdMux_9 251, 252, 253 and 254 each compute the simple signed product of the 9-bit B input by 0, 1, 2 or -1 depending on whether the A input is 0, 1, 2 or 3, respectively, and produce an 11-bit output, Q, where Q[10] is set to 1 only if input A equals 3 and Q[11] is always set to 1.
  • ProdMux_9 can be implemented in a similar manner to
  • Elements 256, 257, 258, 259 and 260 are bus exchanges.
  • Elements 261 and 262 are circuits of the general form Pad_N_M; each pads its input, which is of length N, with M-N grounds (0's) to produce an output of length M.
  • An example of Pad_N_M, Pad_6_10, is shown in Fig. 18; it pads its 6-bit input with four 0's to produce a 10 bit output.
  • Element Gndx4 340 in Fig. 18 is an element of the general form GndxN, which returns N grounds (0's); in this case Gndx4 340 returns 4 grounds.
  • Up_14_2 263 is a circuit of the general form Up_N_M; it receives an N bit input and pads it on the bottom with M 0's, producing a result that is N+M long.
  • Elements 264, 265, 269, 271 and 272 are circuits of the general form Ins_N_M_L; each takes an N length input, pads it on the bottom with L grounds, inserts above as many bits of its input as fit in M outputs, and, if N + L < M, adds grounds (0's) at the top of the result.
  • CSA_6 266, 270 and 273 and CSA_14 268 are circuits of the general form CSA_N; each is an N-long array of carry-save (full) adders, CSAs.
  • An example of a CSA_N circuit, CSA_10, is shown in Fig. 21, and an implementation of a constituent CSA circuit is shown in Fig. 22.
  • CSA computes the sum and carry bits, S and C, respectively, of its 3 inputs, X, Y, Z.
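The behavior of a single carry-save adder and of an N-long array of them can be sketched as follows. The function names are illustrative; the logic is the standard full-adder definition (sum = XOR of the inputs, carry = majority).

```python
def csa(x, y, z):
    """One full adder: sum bit S and carry bit C of three input bits."""
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (y & z)
    return s, c


def csa_n(x, y, z, n):
    """N-long array of independent full adders acting on n-bit words.

    Reduces three addends to two, with x + y + z == s + 2 * c.
    """
    mask = (1 << n) - 1
    s = (x ^ y ^ z) & mask
    c = ((x & y) | (x & z) | (y & z)) & mask
    return s, c


print(csa_n(0b101101, 0b011011, 0b110110, 6))
```

No carry propagates between bit positions inside the array, which is why such arrays are used to compress partial products before a final carry-propagating adder.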
  • Add16 274, in Fig. 14, is a circuit of the general form AddN, which is an N-bit adder.
  • An example of an AddN circuit, Add16, is shown in Fig. 23; it comprises four TrAdd4 290 circuits, which are examples of TrAddN circuits.
  • TrAddN circuits are N-bit adders having a carry-in (CI) input and a carry-out (CO) output.
  • An implementation of Sbox_BMSQ 113 is shown in Fig. 13.
  • AND gates 200, 201 and 202, Up_2_1 203 and Up_1_2 204, CSA_3 205, Ins_3_3_1 206 and Add3 207 compute S • Q, which is output from Add3 207.
  • Sub3 208 takes the output of Add3 207 and subtracts it from B (where B is the lower order three bits of $A_{top}$).
  • The circuitry in Up_2_1 203 and Up_1_2 204, CSA_3 205, Ins_3_3_1 206 and Add3 207 is described above.
  • Sub3 208 is a circuit of the general form SubN, which subtracts N-bit inputs producing an N-bit output.
  • Fig. 25 shows a circuit that subtracts 3-bit input B from 3-bit input A and outputs 3-bit result Q.
  • NOT gate 130 outputs the complement of B, designated Y.
  • The input A is designated X.
  • CSA 131, 132 and 133 are carry-save (full) adders, each of which outputs a result bit on its S output and a carry bit on its C output.
  • CSA 131 computes the low order bit of the result, Q[0].
  • CSA 132 computes the middle bit of the result, Q[1].
  • CSA 133 computes the high order bit of the result, Q[2].
  • CSA 131 sums X[0], Y[0] and hardwired 1 (i.e., VCC) and outputs the first bit of the result, Q[0], on its S output and a carry value on its C output.
  • CSA 132 receives and sums X[1], Y[1] and the C output from CSA 131 and outputs the second bit of the result, Q[1], on its S output and a carry value on its C output.
  • CSA 133 receives and sums X[2], Y[2] and the C output from CSA 132 and outputs the third bit of the result, Q[2], on its S output.
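The subtractor described above computes A - B as A plus the complement of B plus 1, with the carry rippling through three full adders. A bit-level sketch follows; the function names are illustrative.

```python
def full_adder(x, y, z):
    """Sum and carry bits of three input bits."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def sub3(a, b):
    """3-bit subtraction (a - b) mod 8 built from a NOT gate and three adders."""
    y = ~b & 0b111   # NOT gate: complement of B
    carry = 1        # hardwired 1 feeding the first adder
    q = 0
    for i in range(3):
        s, carry = full_adder((a >> i) & 1, (y >> i) & 1, carry)
        q |= s << i  # each adder contributes one result bit
    return q         # the final carry out is discarded


print(sub3(2, 5))
```

Adding the complement plus 1 is two's-complement negation, so the 3-bit result equals (A - B) mod 8 for every pair of inputs.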
  • An implementation of MUX8x8 104 in Fig. 12 is shown in Figs. 26 - 28.
  • multiplexer 104 is a hardware array of 8 MUX8 150 units, one unit for each bit of the 8 bit input values A-H.
  • a MUX8 150 unit is depicted in Fig. 27 and comprises 7 MUX units 170-176. Each MUX unit selects and outputs one of its inputs, A or B, based on its S input.
  • MUX units 170-173 select outputs based on the lower order bit of S (i.e., S[0]); MUX units 174-175 select outputs based on S[2]; and MUX unit 176 selects an output based on
  • Constant bank 102 and N_l_3 114 in Fig. 12 are hardwired constants of the general form N_K_M, where K is the value of the constant and M is the number of bits of the output.
  • N_K_M, N_171_8, is shown in Fig. 29. It outputs the value 171 (binary
  • Elements 106, 107 and 108 in Fig. 12 are sub-bus junctions of the general form Bits_N_K_L; the junction takes in an N-bit bus, A[N-1:0] and outputs the sub-bus A[L:K].
  • PDMux3_6 109 in Fig. 12 is a predecoded multiplexer of the general form PDMuxN_M; such multiplexers receive N individual select lines and output one of N vector buses of width M.
  • An example of PDMuxN_M, PDMux3_10, is shown in Fig. 31. It has three select lines S that select one of three 10-bit wide inputs, A, B, C, and outputs the selected input.
  • decoder DEC8 110 is shown in Fig. 36. DEC8 110 sets one of its outputs, Q0-Q7, high based on the value received on its 3-bit input A.
  • DEC2E 281 and DEC4E 283 and 285 are examples of circuits of the general form DECNE; each takes an input that is log2(N) bits wide and sets one of its N outputs high if its enable input E is also high.
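The decoder-with-enable behavior can be sketched in software as follows. The way DEC8 is composed here (one DEC2E on the top input bit enabling one of two DEC4E units on the low two bits) is an assumption made for illustration; the text names the parts but does not spell out the wiring. The names `dec_ne` and `dec8` are hypothetical.

```python
def dec_ne(n, a, e):
    """DECNE sketch: return n output bits; output a is 1 only when enable e is 1."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if not 0 <= a < n:
        raise ValueError("input out of range for this decoder")
    return [1 if (e and i == a) else 0 for i in range(n)]


def dec8(a):
    """DEC8 sketch (assumed wiring): returns outputs Q0..Q7 for a 3-bit input."""
    hi = dec_ne(2, (a >> 2) & 1, 1)   # DEC2E decodes the top bit, always enabled
    low = a & 0b11                    # both DEC4E units see the low two bits
    return dec_ne(4, low, hi[0]) + dec_ne(4, low, hi[1])
```

For any 3-bit input exactly one of the eight outputs is high, because only one of the two DEC4E units is enabled at a time.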
  • Fig. 32 depicts a hardware implementation of M-Box 14. It receives the 8-bit value ⁇ on its A input and the lower 10 bits of address A (i.e., A[9:0], or A_Bot) on its B input, computes A_Bot • ⁇ mod 2^10, and outputs the 10-bit result. All the components of Fig. 32 have been discussed above, except CPM_10 301, CPM_8 302, CPM_6 303 and CPM_4 304, each of which is a chopped product multiplexer of the form CPM_N.
  • a CPM_N multiplexer receives an N-bit input, B[N-1:0], and outputs B[N-1:0] (i.e., 1 • B), B[N-1:0] left-shifted one bit (i.e., 2 • B), or the complement of B[N-1:0] (i.e., -1 • B), depending on which select line from input A[2:0] is logic 1.
  • An example of CPM_N, CPM_10 301, is shown in Fig. 33. In this example, input is 10 bits wide.
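A rough software model of a chopped product multiplexer is sketched below. Several details are assumptions and not taken from the source: the output is taken to be truncated ("chopped") to N bits, the select lines are assumed to map A[0] to 1 • B, A[1] to 2 • B and A[2] to the complement, and an all-zero select is assumed to output 0. The function name `cpm` is hypothetical.

```python
def cpm(n, b, sel):
    """CPM_N sketch: pick 1*B, 2*B, or the complement of B, chopped to n bits."""
    mask = (1 << n) - 1
    b &= mask
    if sel == 0b001:          # assumed: A[0] selects 1 * B
        return b
    if sel == 0b010:          # assumed: A[1] selects 2 * B (left shift by one)
        return (b << 1) & mask
    if sel == 0b100:          # assumed: A[2] selects the bitwise complement
        return ~b & mask
    if sel == 0:              # assumed: no line high contributes zero
        return 0
    raise ValueError("select lines must be one-hot")
```

Note that the bitwise complement equals -B - 1 modulo 2^N, so a circuit that uses it as -1 • B needs a +1 correction somewhere else in the adder tree; where that happens is not stated in this excerpt.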
  • Fig 34 depicts a hardware implementation of N-Box 16. It receives the output of M-
  • N = 64 when there are no defective banks.
  • a hardware implementation of Nbox_Mult 350 is shown in Fig. 35. All the components of Fig. 35 have been described above.
  • a preferred hardware implementation has a single-cycle operation, typical for conventional systems.
  • pipelined operation of "S-, M-, N- Boxes" is advantageous — it permits significantly shorter cycle time at the cost of adding several short registers for keeping intermediate results.
  • Another possibility is to use an asynchronous implementation of all short multiplications in the S-, M-, N- Boxes described above. For asynchronous accesses to memory arrays, this approach removes setup/hold constraints and provides the fastest time for most data patterns. Depending on the implementation of the asynchronous multiplier arrays, this approach can result in a data-dependent timing.
  • Fig. 11 illustrates an alternative embodiment of a translation unit in accordance with the present invention.
  • the number of banks, N, does not exceed 64.
  • the translation unit receives an address A and generates a corresponding bank number and local address.
  • A_Top is set to A[18:13]
  • A_Mid is set to A[12:7]
  • A_Bot is set to A[6:0].
  • A_S is set to A_Top • 2^7 + A_Bot.
  • Steps 51 and 52 select 12 bits from A for the purpose of determining a bank number. Bits other than the ones specified in these steps may also be used.
  • Middle_Bits is set to (A_S • ⁇)[18:13].
  • Middle_Bits is mapped to a bank number via a non-defective bank table, or similar translation mechanism.
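The bank-selection steps of this alternative embodiment can be sketched as below. This is a minimal illustration under stated assumptions: the multiplier constant is passed in as a parameter because its value is not given in this excerpt, the bank table is modeled as a plain 64-entry list, and the local-address computation is omitted because it is not described here. The function names are hypothetical.

```python
def split_address(a):
    """Split a 19-bit address into the three fields named in the text."""
    a_top = (a >> 13) & 0x3F   # A_Top = A[18:13]
    a_mid = (a >> 7) & 0x3F    # A_Mid = A[12:7]
    a_bot = a & 0x7F           # A_Bot = A[6:0]
    return a_top, a_mid, a_bot


def middle_bits(a, multiplier):
    """Form A_S and return bits [18:13] of the product A_S * multiplier."""
    a_top, _, a_bot = split_address(a)
    a_s = (a_top << 7) + a_bot              # A_S = A_Top * 2^7 + A_Bot
    return ((a_s * multiplier) >> 13) & 0x3F


def bank_number(a, multiplier, bank_table):
    """Map Middle_Bits to a bank through a table (assumed: 64-entry list)."""
    return bank_table[middle_bits(a, multiplier)]
```

The 6-bit mask guarantees that the table index always falls in the range 0 to 63, whatever multiplier is supplied.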
  • A is first multiplied by a 19-bit constant LL, looking at bits [36:18] of the product A • LL.
  • the following modular multiplication (transformation) method is used:
  • the prerequisite to any fault-tolerant operation is the memory testing that determines the failures of individual memory banks or modules, and/or processing units controlling these banks or modules (units).
  • This testing can happen at any stage of memory and/or processor use - at the initial testing after the manufacturing of the component, or at any time during the life and operation of the component.
  • This testing can be external (for example, by means of exercising memory accesses through an external memory bus), or internal, using various state of the art approaches, such as a serial or parallel scan path or paths, BIST (built-in-self-test), or special on-the-chip circuitry that generates test patterns for comprehensive testing.
  • defect information must be stored, so it can be efficiently used by the remapping circuitry.
  • the defect information can be hardwired into the chip if testing is done at the time of manufacture - however, no further changes will generally be possible.
  • it can be written, or downloaded, into a special RAM area or areas of the chip after the testing. This requires separate storage of the bad element numbers. Such storage can be done externally in a PROM or other machine-readable form (bar-code, magnetic code, system storage, etc.).
  • a special non-volatile area of the chip can be dedicated for such storage and then accessed or reprogrammed after further testing.
  • both the testing and reprogramming can be done entirely in software, when, for example, following boot procedures the software tests the memory and downloads the list of bad elements into RAM (or even register) areas of the chip.
  • the non-defective bank table 6, described above in connection with Figs. 1 and 2 is especially suited for the third and fourth techniques.
  • Those skilled in the art will appreciate that other techniques for storing defect information may also be used and the present invention is not limited to the specific techniques described above.
  • the information about the defective elements e.g., memory banks, units or processing units
  • the defective element numbers can be stored, with the remapping ensuring that these element numbers are not used.
  • Non-defective bank table 6 in Figure 1 is an example of such a RAM. As described above, it comprises a 64 x 6 RAM (or ROM) array that stores, for each valid 6-bit bank number Bank (1 ≤ Bank ≤ N), the actual number of one of the N non-defective banks on the chip. This array provides on the output of the "Bank" bus the (binary) value of the non-defective memory bank.
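As an illustration of how such a table might be filled after testing, the sketch below builds a 64-entry table from a set of defective bank numbers. The zero-based logical index, the use of `None` for unused entries, and the function name `build_bank_table` are assumptions made for this example and are not taken from the source.

```python
def build_bank_table(defective, total_banks=64):
    """Return (table, n): table[i] is the physical number of the i-th good bank."""
    bad = set(defective)
    good = [b for b in range(total_banks) if b not in bad]
    n = len(good)                                # N, the number of usable banks
    table = good + [None] * (total_banks - n)    # entries past N are unused
    return table, n
```

For example, with 8 banks of which banks 3 and 5 are defective, the first six entries hold 0, 1, 2, 4, 6 and 7, and N is 6. Every stored value is below 64 and therefore fits in the 6 bits per entry described above.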
  • the list of defective or non-defective banks can be compressed using various compression techniques for a RAM array that significantly reduce the number of bits required to store the defect information.
  • the number can be reduced from 64 x 6 bits (i.e., the maximal number of bits needed without compression if almost all banks are defective) to at most W x 6 bits where W is the smallest of the number of defective or non-defective banks.
  • This compression comes at a cost of additional decompression circuitry (of about O(W) gates), and an additional time delay to translate the bank number. It might be useful only in cases when small RAM (ROM) blocks cannot be efficiently used on the chip, and the storage array is implemented instead using registers.
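One possible compression scheme consistent with the W x 6 bound is sketched below; the source does not specify a technique, so this is only an assumed example. It stores whichever list is shorter (defective or non-defective bank numbers). When the defective list is stored, a logical bank number is translated by stepping past each defective bank at or below the running position, which costs work proportional to W. The names `compress` and `lookup` are hypothetical.

```python
def compress(defective, total_banks=64):
    """Keep the shorter of the defective and non-defective bank lists."""
    bad = sorted(set(defective))
    bad_set = set(bad)
    good = [b for b in range(total_banks) if b not in bad_set]
    if len(bad) <= len(good):
        return ("defective", bad)
    return ("non_defective", good)


def lookup(compressed, logical):
    """Translate a zero-based logical bank number to a physical bank number."""
    kind, entries = compressed
    if kind == "non_defective":
        return entries[logical]        # direct index into the short good list
    physical = logical
    for d in entries:                  # entries are sorted in ascending order
        if d <= physical:
            physical += 1              # skip over a defective bank
    return physical
```

The caller is expected to keep the logical number below N; the sketch does not check that bound in the defective-list case.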
  • the values of other parameters may also need to be adjusted in order to provide fault-tolerant operations, such as the values of N and S - i.e., the number of memory banks and sub-banks (units), respectively, the constants L, and multipliers ⁇ or LL.
  • the values of N, S, L and/or ⁇ can be stored or downloaded together with the list of non-defective units. They should be kept in fast registers with buffered output signals, or hardwired directly (if testing and modification is done at manufacturing time).
  • the values of constants L, ⁇ , or LL can be downloaded, or fixed values for L, ⁇ , or LL, can be simply hardwired.
  • Hardwiring L, ⁇ , or LL decreases performance if a significant number of blocks are defective (above 50%), but also significantly reduces the number of gates in S-Box 12 and in M-Box 14 in Figure 3.
  • the place or places where the list of defective or non-defective elements is stored depends on the type of memory system placed on a chip.
  • Non-defective banks in Fig. 7 are labeled 60 and defective banks are labeled 61.
  • In a system-on-a-chip having possibly multiple processing units accessing multiple memory units and banks, as shown in Figs. 9 and 10, the list of defective, or non-defective, units is stored in a distributed fashion with individual processing units (or clusters thereof). Additionally, re-mapping and scrambling circuitry is placed together with individual processing units (or their clusters). This significantly increases the need for a minimal gate implementation of the re-mapping and scrambling circuitry and makes the implementation of Figures 1-6 the preferred one.
  • Fig. 9 shows a multiprocessor and memory system comprising units 70.
  • Fig. 10 shows memory-related portions of an individual unit 70.
  • Translation unit 81 and memory area 82 control access to local memory units (sub-banks) B1 - B8 83.
  • Switch 80 routes local and global addresses and control bits and routing information of the memory data in and out of the individual units (70) from and to the communication switch of the complete system-on-a-chip.
  • the purpose of the proposed re-mapping circuitry is to allow for fault-tolerant operation of large systems with many memory and processing elements where a large number of failures of individual memory or processing components has to be tolerated without degradation of system performance. The only degradation is the graceful decrease in available storage (or processor performance).
  • the address translation and scrambling unit guarantees the same quality of memory access and high bandwidth to the usable (non-defective) memory system.
  • the proposed fault tolerant solution allows for a specific number of additional (so called spare or reserved) memory banks and/or processing units to be added to the chip. The number of such spare banks or units is determined by yield and process factors and can be variable.
  • a system may be configured so that some of the memory banks are ignored for other reasons. The system disclosed herein allows for such variability.
  • circuitry shown in the embodiments above can readily be changed and optimized for particular hardware and may comprise greater or fewer circuits and components.
  • the present invention may be implemented in software that remaps virtual address accesses to physical memory or reorganizes accesses to various memory arrays available to a program.

PCT/US2000/035209 1999-12-27 2000-12-26 Multi-bank, fault-tolerant, high-performance memory addressing system and method Ceased WO2001048610A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP00988335A EP1247185A4 (en) 1999-12-27 2000-12-26 SYSTEM AND METHOD FOR MULTI-BANK FAULT TOLERANT HIGH-PERFORMANCE MEMORY ADDRESSING
AU24552/01A AU2455201A (en) 1999-12-27 2000-12-26 Multi-bank, fault-tolerant, high-performance memory addressing system and method
JP2001549196A JP5089842B2 (ja) 1999-12-27 2000-12-26 マルチバンク、フォルトトレラント、高性能メモリアドレス指定のシステム及び方法

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/472,930 1999-12-27
US09/472,930 US6381669B1 (en) 1999-12-27 1999-12-27 Multi-bank, fault-tolerant, high-performance memory addressing system and method

Publications (2)

Publication Number Publication Date
WO2001048610A1 true WO2001048610A1 (en) 2001-07-05
WO2001048610A8 WO2001048610A8 (en) 2001-11-29

Family

ID=23877479

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/035209 Ceased WO2001048610A1 (en) 1999-12-27 2000-12-26 Multi-bank, fault-tolerant, high-performance memory addressing system and method

Country Status (7)

Country Link
US (2) US6381669B1 (en)
EP (1) EP1247185A4 (en)
JP (1) JP5089842B2 (en)
KR (1) KR100781132B1 (en)
CN (1) CN1437728A (en)
AU (1) AU2455201A (en)
WO (1) WO2001048610A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5043874A (en) * 1989-02-03 1991-08-27 Digital Equipment Corporation Memory configuration for use with means for interfacing a system control unit for a multi-processor system with the system main memory
US5479624A (en) * 1992-10-14 1995-12-26 Lee Research, Inc. High-performance interleaved memory system comprising a prime number of memory modules
US5497478A (en) * 1991-03-20 1996-03-05 Hewlett-Packard Company Memory access system and method modifying a memory interleaving scheme so that data can be read in any sequence without inserting wait cycles
US5530837A (en) * 1994-03-28 1996-06-25 Hewlett-Packard Co. Methods and apparatus for interleaving memory transactions into an arbitrary number of banks

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6265148A (ja) * 1985-09-17 1987-03-24 Fujitsu Ltd メモリアクセス制御方式
JPS63225837A (ja) * 1987-03-13 1988-09-20 Fujitsu Ltd 距離付きベクトルアクセス方式
US5063526A (en) * 1987-06-03 1991-11-05 Advanced Micro Devices, Inc. Bit map rotation processor
JPH063589B2 (ja) * 1987-10-29 1994-01-12 インターナシヨナル・ビジネス・マシーンズ・コーポレーシヨン アドレス置換装置
US5111389A (en) * 1987-10-29 1992-05-05 International Business Machines Corporation Aperiodic mapping system using power-of-two stride access to interleaved devices
JP3532932B2 (ja) 1991-05-20 2004-05-31 モトローラ・インコーポレイテッド 時間重複メモリ・アクセスを有するランダムにアクセス可能なメモリ
US5526507A (en) * 1992-01-06 1996-06-11 Hill; Andrew J. W. Computer memory array control for accessing different memory banks simullaneously
EP0615190A1 (en) 1993-03-11 1994-09-14 Data General Corporation Expandable memory for a digital computer
JP3304531B2 (ja) 1993-08-24 2002-07-22 富士通株式会社 半導体記憶装置
US6021482A (en) * 1997-07-22 2000-02-01 Seagate Technology, Inc. Extended page mode with a skipped logical addressing for an embedded longitudinal redundancy check scheme

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1247185A4 *

Also Published As

Publication number Publication date
WO2001048610A8 (en) 2001-11-29
JP2003520368A (ja) 2003-07-02
US6519673B1 (en) 2003-02-11
KR100781132B1 (ko) 2007-12-03
AU2455201A (en) 2001-07-09
CN1437728A (zh) 2003-08-20
EP1247185A1 (en) 2002-10-09
KR20020079764A (ko) 2002-10-19
US6381669B1 (en) 2002-04-30
EP1247185A4 (en) 2008-01-02
JP5089842B2 (ja) 2012-12-05

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
AK Designated states

Kind code of ref document: C1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: C1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

CFP Corrected version of a pamphlet front page

Free format text: REVISED ABSTRACT RECEIVED BY THE INTERNATIONAL BUREAU AFTER COMPLETION OF THE TECHNICAL PREPARATIONS FOR INTERNATIONAL PUBLICATION

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
ENP Entry into the national phase

Ref country code: JP

Ref document number: 2001 549196

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1020027008395

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 2000988335

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 008192162

Country of ref document: CN

WWP Wipo information: published in national office

Ref document number: 2000988335

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1020027008395

Country of ref document: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642