WO2013121776A1 - A marching memory, a bidirectional marching memory, a complex marching memory and a computer system, without the memory bottleneck


Info

Publication number
WO2013121776A1
Authority
WO
WIPO (PCT)
Prior art keywords
transistor
bit
backward
main
electrode
Application number
PCT/JP2013/000760
Other languages
French (fr)
Inventor
Tadao Nakamura
Michael J. Flynn
Original Assignee
Tadao Nakamura
Flynn Michael J
Application filed by Tadao Nakamura, Flynn Michael J
Priority to EP21207151.8A (patent EP3982366A1)
Priority to JP2014556207A (patent JP6093379B2)
Priority to EP13748879.7A (patent EP2815403B1)
Priority to KR1020167026590A (patent KR101747619B1)
Priority to EP18000889.8A (patent EP3477645B1)
Priority to CN201380005030.5A (patent CN104040635B)
Priority to KR1020147019139A (patent KR101689939B1)
Publication of WO2013121776A1
Priority to US14/450,705 (patent US10573359B2)
Priority to US16/744,849 (patent US11164612B2)

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C5/00 Details of stores covered by group G11C11/00
    • G11C5/06 Arrangements for interconnecting storage elements electrically, e.g. by wiring
    • G11C5/063 Voltage and signal distribution in integrated semi-conductor memory access lines, e.g. word-line, bit-line, cross-over resistance, propagation delay
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/22 Read-write [R-W] timing or clocking circuits; Read-write [R-W] control signal generators or management
    • G11C19/00 Digital stores in which the information is moved stepwise, e.g. shift registers
    • G11C19/18 Digital stores in which the information is moved stepwise, e.g. shift registers using capacitors as main elements of the stages
    • G11C19/182 Digital stores in which the information is moved stepwise, e.g. shift registers using capacitors as main elements of the stages in combination with semiconductor elements, e.g. bipolar transistors, diodes
    • G11C19/184 Digital stores in which the information is moved stepwise, e.g. shift registers using capacitors as main elements of the stages in combination with semiconductor elements, e.g. bipolar transistors, diodes with field-effect transistors, e.g. MOS-FET
    • G11C19/186 Digital stores in which the information is moved stepwise, e.g. shift registers using capacitors as main elements of the stages in combination with semiconductor elements, e.g. bipolar transistors, diodes with field-effect transistors, e.g. MOS-FET using only one transistor per capacitor, e.g. bucket brigade shift register
    • G11C19/188 Organisation of a multiplicity of shift registers, e.g. regeneration, timing or input-output circuits
    • G11C19/28 Digital stores in which the information is moved stepwise, e.g. shift registers using semiconductor elements
    • G11C19/287 Organisation of a multiplicity of shift registers
    • G11C5/02 Disposition of storage elements, e.g. in the form of a matrix array
    • G11C5/025 Geometric lay-out considerations of storage- and peripheral-blocks in a semiconductor storage device
    • G11C8/00 Arrangements for selecting an address in a digital store
    • G11C8/12 Group selection circuits, e.g. for memory block selection, chip selection, array selection

Definitions

  • the instant invention relates to new memories and new computer systems using the new memories, which operate at a low energy consumption and high speed.
  • the computer system illustrated in Fig. 1 includes a processor 11, a cache memory (321a, 321b) and a main memory 331.
  • the processor 11 includes a control unit 111 having a clock generator 113 configured to generate a clock signal, an arithmetic logic unit (ALU) 112 configured to execute arithmetic and logic operations synchronized with the clock signal, an instruction register file (RF) 322a connected to the control unit 111 and a data register file (RF) 322b connected to the ALU 112.
  • the cache memory (321a, 321b) has an instruction cache memory 321a and a data cache memory 321b.
  • a portion of the main memory 331 and the instruction cache memory 321a are electrically connected by wires and/or buses, which limit the memory access time and form the von Neumann bottleneck 351.
  • the remaining portion of the main memory 331 and the data cache memory 321b are electrically connected to enable a similar memory access 351.
  • wires and/or buses, which implement memory access 352, electrically connect the data cache memory 321b and the instruction cache memory 321a to the instruction register file 322a and the data register file 322b.
  • the bottlenecks 351, 352 are ascribable to the wiring between the processor 11 and the main memory 331, because the wire length delays memory access and the stray capacitance between wires causes additional delay. Such capacitance also increases power consumption in proportion to the clock frequency of the processor 11.
  • HPC processors are implemented using several vector arithmetic pipelines. This vector processor makes better use of memory bandwidth and is a superior machine for HPC applications that can be expressed in vector notation.
  • the vector instructions are made from loops in a source program and each of these vector instructions is executed in an arithmetic pipeline in a vector processor or corresponding units in a parallel processor.
  • these processing schemes give the same results.
  • the vector processor based system has the memory bottleneck 351, 352 between all the units. Even in a single system with a wide memory and large bandwidth, the same bottleneck 351, 352 appears, and if the system consists of many of the same units, as in a parallel processor, the bottleneck 351, 352 is unavoidable.
  • the first problem is wiring, which lies not only between memory chips and caches, or between these two units even on a single chip, but also inside memory systems. Between chips, the wiring results in more dynamic power consumption due to capacitance, and in wire signal time delay. The same problem extends to the internal wires within a memory chip, namely the access lines and the remaining read/write lines. Thus, in both inter-chip and intra-chip wiring of memory chips, energy is consumed by the capacitance of these wires.
  • the second problem is the memory bottleneck 351, 352 between the processor chip, cache and memory chips. Since the ALU can access any part of cache or memory, the access path 351, 352 consists of long global wires. These paths are also limited in the number of wires available. Such a bottleneck seems to be due to hardware such as buses. Especially when there is a high-speed CPU and a large-capacity memory, the apparent bottleneck is basically between these two.
  • the key to removing the bottleneck is to have the same memory clock cycle as the CPU's.
  • an addressing procedure must be created to improve memory access.
  • time delay due to longer wires must be significantly reduced both inside memory and outside memory.
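The link between wiring capacitance, clock frequency and power consumption described above can be illustrated with the standard CMOS switching-power relation P = a · C · V² · f. The Python sketch below is only an illustration: the relation is textbook material rather than a formula quoted from this publication, and the function name, the activity factor and all numeric values are hypothetical.

```python
def dynamic_power(capacitance_f, voltage_v, clock_hz, activity=1.0):
    """Switching power in watts of a wire: activity * C * V^2 * f.

    All parameters are illustrative assumptions, not values from the patent.
    """
    return activity * capacitance_f * voltage_v ** 2 * clock_hz


# Hypothetical wire: 1 pF of stray capacitance driven at 1 V.
p_1ghz = dynamic_power(1e-12, 1.0, 1e9)  # clocked at 1 GHz
p_2ghz = dynamic_power(1e-12, 1.0, 2e9)  # same wire clocked at 2 GHz

# Doubling the clock frequency doubles the switching power of the same wire.
ratio = p_2ghz / p_1ghz
```

Under these assumptions, shortening a wire (lower capacitance) reduces power at a given clock frequency, which is the motivation stated in the text for reducing wire length inside and outside the memory.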
  • An aspect of the present invention inheres in a marching memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising: (a) a transfer-transistor having a first main-electrode connected to a clock signal supply line through a first delay element and a control-electrode connected to an output terminal of a first neighboring bit-level cell disposed at input side of the array of the memory units, through a second delay element; (b) a reset-transistor having a first main-electrode connected to a second main-electrode of the transfer-transistor, a control-electrode connected to the clock signal supply line, and a second main-electrode connected to the ground potential; and (c) a capacitor configured to store the information of the bit-level cell, connected in parallel with the reset-transistor, wherein an output node connecting the second main-electrode
  • the first main-electrode shall be assigned as a source electrode or a drain electrode for a field effect transistor (FET), a static induction transistor (SIT), a high electron mobility transistor (HEMT), and others
  • the second main-electrode is the drain electrode if the first main-electrode is assigned as the source electrode.
  • the second main-electrode is the source electrode if the first main-electrode is assigned as the drain electrode for FET, SIT, and HEMT etc.
  • the first main-electrode shall be assigned as an emitter electrode or a collector electrode for a bipolar junction transistor (BJT), and the second main-electrode is the collector electrode if the first main-electrode is assigned as the emitter electrode.
  • the second main-electrode is the emitter electrode if the first main-electrode is assigned as the collector electrode for BJT.
  • the control-electrode is a gate electrode for FET, SIT, and HEMT, etc., and a base electrode for BJT.
  • Another aspect of the present invention inheres in a bidirectional-marching memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising: (a) a forward transfer-transistor having a first main-electrode connected to a first clock signal supply line through a first forward delay element and a control-electrode connected to a forward output terminal of a first neighboring bit-level cell disposed at one side of the array of the memory units, through a second forward delay element; (b) a forward reset-transistor having a first main-electrode connected to a second main-electrode of the forward transfer-transistor, a control-electrode connected to the first clock signal supply line, and a second main-electrode connected to the ground potential; (c) a backward transfer-transistor having a first main-electrode connected to a second clock signal
  • Still another aspect of the present invention inheres in a bidirectional-marching memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising: (a) a forward transfer-transistor having a first main-electrode connected to a first clock signal supply line through a first forward delay element and a control-electrode connected to a forward output terminal of a first neighboring bit-level cell disposed at one side of the array of the memory units, through a second forward delay element; (b) a forward reset-transistor having a first main-electrode connected to a second main-electrode of the forward transfer-transistor, a control-electrode connected to the first clock signal supply line, and a second main-electrode connected to the ground potential; (c) a backward transfer-transistor having a first main-electrode connected to a second clock signal supply line
  • Still another aspect of the present invention inheres in a complex marching memory encompassing a plurality of marching memory blocks being deployed spatially, each of the marching memory blocks including an array of memory units, each of the memory units having a sequence of bit-level cells configured to store information of byte size or word size.
  • each of the memory units transfers synchronously with a clock signal synchronized with the CPU's clock signal, step by step, toward an output side of corresponding marching memory block from an input side of the corresponding marching memory block, and each of the marching memory blocks is randomly accessed so that each of the memory units in a subject marching memory block can be randomly accessed.
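As a behavioural illustration of this aspect, the following Python sketch models a complex marching memory as a list of blocks: a block is selected by index (random access at block level), and on each clock every memory unit in the selected block moves one step from the input side toward the output side. It is a software analogy under stated assumptions, not the transistor-level circuit of the publication; the class names, method names and the use of None for an empty unit are hypothetical.

```python
from collections import deque


class MarchingBlock:
    """One marching memory block: index 0 is the input side, the last index the output side."""

    def __init__(self, length):
        # A bounded deque drops the item at the opposite end when a new one is added.
        self.units = deque([None] * length, maxlen=length)

    def clock(self, incoming=None):
        """One clock tick: every unit marches one step toward the output side."""
        outgoing = self.units[-1]         # unit leaving at the output side
        self.units.appendleft(incoming)   # new unit enters at the input side
        return outgoing


class ComplexMarchingMemory:
    """Several blocks deployed side by side; a block is chosen by index."""

    def __init__(self, n_blocks, block_length):
        self.blocks = [MarchingBlock(block_length) for _ in range(n_blocks)]

    def clock(self, block_index, incoming=None):
        return self.blocks[block_index].clock(incoming)


mem = ComplexMarchingMemory(n_blocks=2, block_length=3)
# Feed three units into block 1, then clock three more times to stream them out.
outputs = [mem.clock(1, word) for word in ("a", "b", "c", None, None, None)]
```

In this sketch the three units fed into block 1 leave in first-in, first-out order after three further clocks, while block 0 is left untouched.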
  • Still another aspect of the present invention inheres in a complex marching memory encompassing a plurality of marching memory blocks being deployed spatially, each of the marching memory blocks including an array of memory units, each of the memory units having a sequence of bit-level cells configured to store information of byte size or word size.
  • each of the memory units transfers synchronously with a first clock signal, step by step, toward a first edge side of corresponding marching memory block from a second edge side of the corresponding marching memory block opposing to the first edge side, and further, each of the memory units transfers synchronously with a second clock signal, step by step, toward the second edge side from the first edge side, and each of the marching memory blocks is randomly accessed so that each of the memory units in a subject marching memory block can be randomly accessed.
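The bidirectional behaviour described here can be sketched in Python as a block with two independent clocks, one moving every memory unit toward the first edge and the other moving every unit back toward the second edge. This is a minimal behavioural analogy, not the circuit of the publication; the class name, the method names and the convention that index 0 is the second edge are hypothetical.

```python
class BidirectionalMarchingBlock:
    """Index 0 is the second edge side; the last index is the first edge side."""

    def __init__(self, length):
        self.units = [None] * length

    def forward_clock(self, incoming=None):
        """First clock: every unit moves one step toward the first edge."""
        outgoing = self.units[-1]
        self.units = [incoming] + self.units[:-1]
        return outgoing

    def backward_clock(self, incoming=None):
        """Second clock: every unit moves one step toward the second edge."""
        outgoing = self.units[0]
        self.units = self.units[1:] + [incoming]
        return outgoing


block = BidirectionalMarchingBlock(3)
block.forward_clock("x")
block.forward_clock("y")
state_after_forward = list(block.units)
# Reversing direction returns the most recently entered unit first.
first_back = block.backward_clock()
second_back = block.backward_clock()
```

With this convention, units entered by the first clock can be retrieved in reverse order by the second clock without being marched through the whole block.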
  • Still another aspect of the present invention inheres in a computer system comprising a processor and a marching main memory configured to provide the processor with stored information actively and sequentially so that the processor can execute arithmetic and logic operations with the stored information, in addition results of processing in the processor are sent out to the marching main memory, except that in case of instructions movement, there is only one way of instructions flow from the marching main memory to the processor, the marching main memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising: (a) a transfer-transistor having a first main-electrode connected to a clock signal supply line through a first delay element and a control-electrode connected to an output terminal of a first neighboring bit-level cell disposed at input side of the array of the memory units, through a second delay element; (b) a reset-transistor having a first main-electrode connected to a
  • Still another aspect of the present invention inheres in a computer system comprising a processor and a bidirectional marching main memory configured to provide the processor with stored information actively and sequentially so that the processor can execute arithmetic and logic operations with the stored information, in addition results of processing in the processor are sent out to the bidirectional marching main memory, except that in case of instructions movement, there is only one way of instructions flow from the bidirectional marching main memory to the processor, the bidirectional marching main memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising: (a) a forward transfer-transistor having a first main-electrode connected to a first clock signal supply line through a first forward delay element and a control-electrode connected to a forward output terminal of a first neighboring bit-level cell disposed at one side of the array of the memory units, through a second forward delay element; (b) a forward reset-trans
  • Still another aspect of the present invention inheres in a computer system encompassing a processor and a marching main memory configured to provide the processor with stored information actively and sequentially so that the processor can execute arithmetic and logic operations with the stored information, in addition results of processing in the processor are sent out to the marching main memory, except that in case of instructions movement, there is only one way of instructions flow from the marching main memory to the processor, the marching main memory encompassing a plurality of marching memory blocks being deployed spatially, each of the marching memory blocks including an array of memory units, each of the memory units having a sequence of bit-level cells configured to store information of byte size or word size.
  • each of the marching memory blocks is randomly accessed so that each of the memory units in a subject marching memory block can be randomly accessed.
  • Still another aspect of the present invention inheres in a computer system encompassing a processor and a bidirectional marching main memory configured to provide the processor with stored information actively and sequentially so that the processor can execute arithmetic and logic operations with the stored information, in addition results of processing in the processor are sent out to the bidirectional marching main memory, except that in case of instructions movement, there is only one way of instructions flow from the bidirectional marching main memory to the processor, the bidirectional marching main memory encompassing a plurality of bidirectional marching memory blocks being deployed spatially, each of the bidirectional marching memory blocks including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size.
  • each of the memory units transfers synchronously with a first clock signal, step by step, toward a first edge side of corresponding marching memory block from a second edge side of the corresponding marching memory block opposing to the first edge side, and further, each of the memory units transfers synchronously with a second clock signal, step by step, toward the second edge side from the first edge side, and each of the marching memory blocks is randomly accessed so that each of the memory units in a subject marching memory block can be randomly accessed.
  • Fig. 1 illustrates a schematic block diagram illustrating an organization of a conventional computer system
  • Fig. 2 illustrates a schematic block diagram illustrating a fundamental organization of a computer system pertaining to a first embodiment of the present invention
  • Fig. 3 illustrates an array of memory units implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention, and a transfer of information in the marching main memory
  • Fig. 4 illustrates an example of a transistor-level representation of the cell-array in the marching main memory used in the computer system pertaining to the first embodiment of the present invention
  • Fig. 5 illustrates an enlarged transistor-level representation of the cell-array in the marching main memory used in the computer system pertaining to the first embodiment of the present invention, focusing on four neighboring bit-level cells;
  • Fig. 6 illustrates a further enlarged transistor-level representation of a single bit-level cell in the marching main memory used in the computer system pertaining to the first embodiment of the present invention;
  • Fig. 7A illustrates a schematic example of the response of the transistor to the waveform of a clock signal configured to be applied to the marching main memory used in the computer system pertaining to the first embodiment of the present invention, illustrating a case when a signal "1" is transferred from the previous stage;
  • FIG. 7B illustrates another schematic example of the response of the transistor to the waveform of the clock signal configured to be applied to the marching main memory used in the computer system pertaining to the first embodiment of the present invention, illustrating another case when a signal "0" is transferred from the previous stage;
  • Fig. 7C illustrates an actual example of the responses of the transistors to the waveform of a clock signal configured to be applied to the marching main memory used in the computer system pertaining to the first embodiment of the present invention;
  • Fig. 8 illustrates a detailed example of a bit-level cell used in the marching main memory for the computer system pertaining to the first embodiment of the present invention;
  • Fig. 9 illustrates an example of an actual plan view implementing the bit-level cell illustrated in Fig. 8;
  • FIG. 10 illustrates a cross-sectional view taken on line A-A in the plan view illustrated in Fig. 9;
  • Fig. 11 illustrates another enlarged transistor-level representation of the single bit-level cell in combination with an inter-unit cell adapted for a marching main memory used in the computer system pertaining to a modification of the first embodiment of the present invention;
  • Fig. 12 illustrates an example of an actual plan view implementing the bit-level cell illustrated in Fig. 11;
  • Fig. 13 illustrates an enlarged transistor-level representation of the cell-array in combination with corresponding inter-unit cells, in the marching main memory used in the computer system pertaining to the modification of the first embodiment of the present invention, focusing on two neighboring bit-level cells;
  • Fig. 14(a) illustrates a timing diagram of the response of the bit-level cell illustrated in Fig. 13 to a waveform of a clock signal, and Fig. 14(b) illustrates a timing diagram of the following response of the next bit-level cell illustrated in Fig. 13 to the waveform of the clock signal.
  • Fig. 15 illustrates an actual example of the responses of the transistors to the waveform of a clock signal configured to be applied to the marching main memory used in the computer system pertaining to the modification of the first embodiment of the present invention;
  • Figs. 16 (a)-(d) illustrate four modes of signal-transferring operations, respectively, focusing on the bit-level cell illustrated in Figs. 11 and 13, in the marching main memory used in the computer system pertaining to the modification of the first embodiment of the present invention;
  • Fig. 17 illustrates still another enlarged transistor-level representation of the single bit-level cell in combination with an inter-unit cell adapted for a marching main memory used in the computer system pertaining to another modification (second modification) of the first embodiment of the present invention
  • Fig. 18 illustrates an enlarged transistor-level representation of the cell-array in combination with corresponding inter-unit cells, in the marching main memory used in the computer system pertaining to the second modification of the first embodiment of the present invention, focusing on two neighboring bit-level cells;
  • Fig. 19 illustrates yet another enlarged transistor-level representation of the single bit-level cell in combination with an inter-unit cell adapted for a marching main memory used in the computer system pertaining to still another modification (third modification) of the first embodiment of the present invention
  • Fig. 20 illustrates an enlarged transistor-level representation of the cell-array in combination with corresponding inter-unit cells, in the marching main memory used in the computer system pertaining to the third modification of the first embodiment of the present invention, focusing on two neighboring bit-level cells
  • Fig. 21 illustrates an actual example of the responses of the transistors to the waveform of a clock signal configured to be applied to the marching main memory used in the computer system pertaining to the third modification of the first embodiment of the present invention
  • Figs. 22 (a)-(d) illustrate four modes of signal-transferring operations, respectively, focusing on the bit-level cell illustrated in Figs. 20 and 21, in the marching main memory used in the computer system pertaining to the third modification of the first embodiment of the present invention
  • Fig. 23 illustrates a gate-level representation of the cell-array illustrated in Fig. 4
  • Fig. 24 illustrates an array of memory units implementing a reverse directional marching main memory used in the computer system pertaining to the first embodiment of the present invention, and a reverse directional transfer of information in the reverse directional marching main memory
  • Fig. 25(a) illustrates an example of a transistor-level circuit configuration of the cell array implementing the i-th row of the reverse directional marching main memory illustrated in Fig. 24;
  • FIG. 25(b) illustrates an example of the response of the transistor to the waveform of a clock signal configured to be applied to the reverse directional marching main memory illustrated in Fig. 24;
  • Fig. 26 illustrates a gate-level representation of the cell-array implementing i-th row in the reverse directional marching main memory illustrated in Fig. 25 (a);
  • Fig. 27 illustrates a time-domain relationship between the memory unit streaming time in a marching main memory and the clock cycle in a processor (CPU) in the computer system pertaining to the first embodiment of the present invention;
  • Fig. 28 schematically illustrates an organization of the computer system pertaining to the first embodiment of the present invention, in which the memory bottleneck between the processor (CPU) and the marching memory structure including the marching main memory has disappeared
  • Fig. 29(a) illustrates a forward data stream flowing from the marching memory structure, which includes the marching main memory, to the processor (CPU) and backward data stream flowing from the processor (CPU) to the marching memory structure in the computer system pertaining to the first embodiment of the present invention
  • FIG. 29(b) illustrates bandwidths established between the marching memory structure and the processor (CPU) under an ideal condition that the memory unit streaming time of the marching memory structure is equal to the clock cycle of the processor (CPU);
  • Fig. 30(a) schematically illustrates an extremely high-speed magnetic tape system, for comparison with the computer system illustrated in Fig. 30(b), which corresponds to the computer system pertaining to the first embodiment of the present invention;
  • Fig. 31 (a) illustrates a concrete image of a marching behavior of information (a forward marching behavior), in which information marches (shifts) side by side toward right-hand direction in a one-dimensional marching main memory,
  • Fig. 31 (b) illustrates a staying state of the one-dimensional marching main memory,
  • FIG. 31 (c) illustrates a concrete image of a reverse-marching behavior of information (a backward marching behavior), in which information marches (shifts) side by side toward left-hand direction in the one-dimensional marching main memory in the computer system pertaining to the first embodiment of the present invention
  • Fig. 32 illustrates an example of a transistor-level circuit configuration of the one-dimensional marching main memory, which can achieve the bidirectional transferring behavior illustrated in Figs. 31 (a)-(c), configured to store and transfer bi-directionally instructions or scalar data in the computer system pertaining to the first embodiment of the present invention
  • FIG. 33 illustrates another example of a transistor-level circuit configuration of the one-dimensional marching main memory, incorporating isolation transistors between memory units, which can achieve the bidirectional transferring behavior illustrated in Figs. 31 (a)-(c), configured to store and transfer bi-directionally instructions or scalar data in the computer system pertaining to the first embodiment of the present invention
  • Fig. 34 illustrates a generic representation of the gate-level circuit configuration of the one-dimensional marching main memory illustrated in Fig. 32
  • Fig. 35(a) illustrates a bidirectional transferring mode of instructions in a one-dimensional marching main memory adjacent to a processor, in which the instructions move toward the processor, and move from / to the next memory arranged at the left-hand side
  • Fig. 35(b) illustrates a bidirectional transferring mode of scalar data in a one-dimensional marching main memory adjacent to an ALU, the scalar data moves toward the ALU and moves from / to the next memory
  • Fig. 35(c) illustrates a uni-directional transferring mode of vector/streaming data in a one-dimensional marching main memory adjacent to a pipeline, the vector/streaming data moves toward the pipeline, and moves from the next memory
  • Fig. 36(a) compares with Fig. 36(b), illustrating an inner configuration of existing memory, in which each memory unit is labeled by an address
  • FIG. 36(b) illustrates an inner configuration of present one-dimensional marching main memory, in which the positioning of individual memory unit is at least necessary to identify the starting point and ending point of a set of successive memory units in vector/streaming data.
  • Fig. 37(a) illustrates an inner configuration of present one-dimensional marching main memory, in which the positioning of individual memory unit is at least necessary to identify the starting point and ending point of a set of successive memory units in vector instruction
  • Fig. 37(b) illustrates an inner configuration of present one-dimensional marching main memory for scalar data.
  • FIG. 37(c) illustrates an inner configuration of present one-dimensional marching main memory, in which position indexes are at least necessary to identify the starting point and ending point of a set of successive memory units in vector/streaming data;
  • Fig. 38(a) illustrates schematically an example of an overall configuration of present marching main memory implemented by a plurality of pages for vector/streaming data case
  • Fig. 38(b) illustrates schematically an example of a configuration of one of the pages, each of the page is implemented by a plurality of files for vector/streaming data case
  • FIG. 38(c) illustrates schematically an example of a configuration of one of the files, each of the file is implemented by a plurality of memory units for vector/streaming data case, in the computer system pertaining to the first embodiment of the present invention
  • Fig. 39(a) illustrates schematically an example of an overall configuration of present marching main memory implemented by a plurality of pages for programs/scalar data case, where each page has its own position index as an address
  • Fig. 39(b) illustrates schematically an example of a configuration of one of the pages and the driving positions of the page, using digits in the binary system, where each of the pages is implemented by a plurality of files for programs/scalar data case, and each file has its own position index as an address
  • FIG. 39(c) illustrates schematically an example of a configuration of one of the files and the driving positions of the file, using digits in the binary system, where each of the files is implemented by a plurality of memory units for programs/scalar data case, and each memory unit has its own position index as an address, in the computer system pertaining to the first embodiment of the present invention
  • Fig. 40(a) illustrates schematically the speed/capability of the existing memory compared with that of the marching main memory used in the computer system pertaining to the first embodiment of the present invention
  • Fig. 40(b) illustrates schematically the speed/capability of the marching main memory compared with that of the existing memory illustrated in Fig. 40(a);
  • FIG. 41(a) illustrates schematically the speed/capability of the worst case of the existing memory for scalar instructions compared with that of the marching main memory used in the computer system pertaining to the first embodiment of the present invention
  • Fig. 41(b) illustrates schematically the speed/capability of the marching main memory compared with that of the worst case of the existing memory illustrated in Fig. 41(a);
  • Fig. 42(a) illustrates schematically the speed/capability of the typical case of the existing memory for scalar instructions compared with that of the marching main memory used in the computer system pertaining to the first embodiment of the present invention
  • FIG. 42(b) illustrates schematically the speed/capability of the marching main memory compared with that of the typical case of the existing memory illustrated in Fig. 42(a);
  • Fig. 43(a) illustrates schematically the speed/capability of the typical case of the existing memory for scalar data case compared with that of the marching main memory used in the computer system pertaining to the first embodiment of the present invention
  • Fig. 43(b) illustrates schematically the speed/capability of the marching main memory compared with that of the existing memory illustrated in Fig. 43(a);
  • Fig. 44(a) illustrates schematically the speed/capability of the best case of the existing memory for streaming data and data parallel case compared with that of the marching main memory used in the computer system pertaining to the first embodiment of the present invention
  • Fig. 44(b) illustrates schematically the speed/capability of the marching main memory compared with that of the best case of the existing memory illustrated in Fig. 44(a)
  • Fig. 45 illustrates an example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention
  • FIG. 46 illustrates another example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention
  • Fig. 47 illustrates a still another example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention
  • Fig. 48 illustrates a yet still another example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention
  • FIG. 49 illustrates a further another example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention
  • Fig. 50 illustrates a further another example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention
  • Fig. 51 illustrates a further another example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention
  • FIG. 52(a) illustrates device level energy consumption in current microprocessors, decomposing into static and dynamic energy consumptions
  • Fig. 52(b) illustrates net and overhead of the power consumption in the dynamic energy consumption illustrated in Fig. 52(a)
  • Fig. 52(c) illustrates the net energy consumption in the current microprocessors
  • Fig. 53 illustrates an actual energy consumption distribution over a processor including registers and caches in the conventional architecture, estimated by Dally
  • Fig. 54(a) illustrates energy consumption in the conventional cache-based architecture, decomposing the energy consumption in the cache memory into static and dynamic energy consumptions
  • Fig. 54(b) illustrates energy consumption in the computer system according to a third embodiment of the present invention, decomposing the energy consumption in the marching cache memory into static and dynamic energy consumption.
  • Fig. 55 illustrates a schematic block diagram illustrating an organization of a computer system pertaining to a second embodiment of the present invention
  • Fig. 56 illustrates a schematic block diagram illustrating an organization of a computer system pertaining to a third embodiment of the present invention
  • Fig. 57(a) illustrates a combination of arithmetic pipelines and marching register units in the computer system pertaining to the third embodiment of the present invention
  • Fig. 57(b) illustrates an array of marching cache units in the computer system pertaining to the third embodiment of the present invention
  • Fig. 58 illustrates a schematic block diagram of an organization of a computer system implemented by a combination of a single processor core, a marching-cache memory and a marching-register file in accordance with a modification of the third embodiment of the present invention
  • Fig. 59 illustrates a schematic block diagram of an organization of a computer system implemented by a combination of a single arithmetic pipeline, a marching-cache memory and a marching-vector register file in accordance with another modification of the third embodiment of the present invention
  • FIG. 60 illustrates a schematic block diagram of an organization of a computer system implemented by a combination of a plurality of processor cores, a marching-cache memory and a marching-register file in accordance with a still another modification of the third embodiment of the present invention
  • Fig. 61 illustrates a schematic block diagram of an organization of a computer system implemented by a combination of a plurality of arithmetic pipelines, a marching-cache memory and a marching-vector register file in accordance with a yet still another modification of the third embodiment of the present invention
  • Fig. 62(a) illustrates a schematic block diagram of an organization of a conventional computer system implemented by a combination of a plurality of arithmetic pipelines, a plurality of conventional cache memories, a plurality of conventional-vector register files (RFs) and a conventional main memory, in which bottleneck is established between the conventional cache memories and the conventional main memory
  • Fig. 62(b) illustrates a schematic block diagram of an organization of a computer system implemented by a combination of a plurality of arithmetic pipelines, a plurality of marching cache memories, a plurality of marching-vector register files and a marching main memory, in which no bottleneck is established, in accordance with a yet still another modification of the third embodiment of the present invention
  • Fig. 63 illustrates a schematic block diagram illustrating an organization of a high performance computing (HPC) system pertaining to a fourth embodiment of the present invention
  • Fig. 64 illustrates a schematic block diagram illustrating an organization of a computer system pertaining to a fifth embodiment of the present invention
  • Fig. 65(a) illustrates a cross-sectional view of a three-dimensional marching main memory used in the computer system pertaining to the fifth embodiment of the present invention
  • Fig. 65(b) illustrates a cross-sectional view of a three-dimensional marching-cache used in the computer system pertaining to the fifth embodiment of the present invention
  • FIG. 65(c) illustrates a cross-sectional view of a three-dimensional marching-register file used in the computer system pertaining to the fifth embodiment of the present invention
  • Fig. 66 illustrates a perspective view of a three-dimensional configuration used in the computer system pertaining to the fifth embodiment of the present invention
  • Fig. 67 illustrates a perspective view of another three-dimensional configuration used in the computer system pertaining to the fifth embodiment of the present invention
  • Fig. 68 illustrates a cross-sectional view of the three-dimensional configuration illustrated in Fig. 67
  • Fig. 69 illustrates a cross-sectional view of another three-dimensional configuration used in the computer system pertaining to the fifth embodiment of the present invention
  • Fig. 70 illustrates schematically a cross-sectional view of the three-dimensional configuration of a fundamental core of the computer system for executing the control processing, by representing control paths in the computer system pertaining to the fifth embodiment of the present invention
  • Fig. 71 illustrates schematically a cross-sectional view of the three-dimensional configuration of a fundamental core of the computer system for executing the scalar data processing, by representing data-paths for scalar data in the computer system pertaining to the fifth embodiment of the present invention
  • Fig. 72 illustrates schematically a cross-sectional view of the three-dimensional configuration of a fundamental core of the computer system for executing the vector/streaming data processing, by representing data-paths for vector/streaming data in the computer system pertaining to the fifth embodiment of the present invention
  • Fig. 73 illustrates schematically a cross-sectional view of the three-dimensional configuration of a fundamental core of the computer system, configured to execute the scalar data part of the computer system, where a plurality of processing units (CPUs) execute not only scalar data but also vector/streaming data, and pipelined ALUs are included in the processing units, by representing the combination of scalar data-path and the control path for the computer system pertaining to the fifth embodiment of the present invention;
  • Fig. 74 illustrates a bit-level parallel processing of scalar/vector data in MISD architecture
  • Fig. 75 illustrates a parallel processing of vector data in SIMD architecture
  • Fig. 76 illustrates a typical chaining in vector processing
  • Fig. 77 illustrates a parallel processing of scalar/vector data in MISD architecture
  • Fig. 78 illustrates a parallel processing of scalar/vector data in MISD architecture
  • Fig. 79(a) illustrates a plan view of a representative conventional DRAM delineated on a single semiconductor chip
  • Fig. 79(b) illustrates a corresponding plan view of a schematic inner layout of a complex marching memory, which is delineated on the same single semiconductor chip of the conventional DRAM
  • Fig. 80 (a) illustrates an outer shape of a single marching memory block
  • Fig. 80 (b) illustrates a partial plan view of the marching memory block illustrated in Fig. 80 (a), which has one thousand columns, where the marching memory's access time (cycle time) is defined for a single column
  • Fig. 80(c) illustrates the conventional DRAM's memory cycle for writing in or reading out the content of the conventional DRAM's one memory element
  • Fig. 81 illustrates a schematic plan view of a complex marching memory module.
  • nMOS transistors are illustrated as transfer-transistors and reset-transistors in transistor-level representations of bit-level cells in Figs. 4, 5, 6, 8, 11, 13, 16-20, 22, 25 and 32, etc.
  • pMOS transistors can be used as the transfer-transistors and the reset-transistors, if the opposite polarity of the clock signal is employed.
  • a computer system pertaining to a first embodiment of the present invention encompasses a processor 11 and a marching main memory 31.
  • the processor 11 includes a control unit 111 having a clock generator 113 configured to generate a clock signal, and an arithmetic logic unit (ALU) 112 configured to execute arithmetic and logic operations synchronized with the clock signal.
  • the marching main memory 31 encompasses an array of memory units U 1 , U 2 , U 3 , ..., U n-1 , U n , each of memory units U 1 , U 2 , U 3 , ..., U n-1 , U n having a unit of information including word size of data or instructions, input terminals of the array and output terminals of the array.
  • the marching main memory 31 stores the information in each of memory units U 1 , U 2 , U 3 ,........., U n-1 , U n and transfers the information synchronously with the clock signal, step by step, toward the output terminals, so as to provide the processor 11 with the stored information actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
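The step-by-step transfer described above can be pictured with a small behavioral sketch. This is only an illustrative software model under stated assumptions, not the circuit of the embodiment: the class name `MarchingMemory`, the method `clock`, the helper `stream`, and the use of `None` to mark an empty memory unit are all hypothetical names introduced here.

```python
class MarchingMemory:
    """Toy model of n memory units U_1..U_n, each holding one word.

    Assumption: units[0] models U_1 (input side) and units[-1] models U_n
    (output side); None marks an empty unit.
    """

    def __init__(self, n):
        self.units = [None] * n

    def clock(self, word_in=None):
        """One clock tick: every stored word marches one unit toward the output."""
        word_out = self.units[-1]                 # word leaving through the output terminals
        self.units = [word_in] + self.units[:-1]  # all other words shift by one unit
        return word_out


def stream(words, n):
    """Feed words into an n-unit marching memory and collect what the processor receives."""
    mem = MarchingMemory(n)
    received = []
    # After the last word is fed, n further ticks drain the array.
    for word in list(words) + [None] * n:
        out = mem.clock(word)
        if out is not None:
            received.append(out)
    return received
```

For example, `stream([10, 20, 30], 4)` returns `[10, 20, 30]`: the words arrive at the output side in the order they were stored, one per clock tick once the array has filled, without any address being supplied by the consumer.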
  • each of joint members 54 may be implemented by a first terminal pin attached to the marching main memory 31, a second terminal pin attached to the processor 11, and an electrical conductive bump interposed between the first and second terminal pins.
  • as the electrical conductive bumps, solder balls, gold (Au) bumps, silver (Ag) bumps, copper (Cu) bumps, nickel-gold (Ni-Au) alloy bumps or nickel-gold-indium (Ni-Au-In) alloy bumps, etc. are acceptable.
  • the resultant data of the processing in the ALU 112 are sent out to the marching main memory 31 through the joint members 54.
  • as represented by bidirectional arrow PHI 12 , data are transferred bi-directionally between the marching main memory 31 and the processor 11 through the joint members 54.
  • as represented by uni-directional arrow ETA 11 , as to the movement of instructions, there is only a one-way instruction flow from the marching main memory 31 to the processor 11.
  • the organization of the computer system pertaining to the first embodiment of the present invention further encompasses an external secondary memory 41 such as disk, an input unit 61, an output unit 62 and input/output (I/O) interface circuit 63.
  • the signals or data are received by the input unit 61, and the signals or data are sent from the output unit 62.
  • known keyboards and known mice can be considered as the input unit 61, while known monitors and printers can be considered as the output unit 62.
  • Known devices for communication between computers, such as modems and network cards typically serve for both the input unit 61 and the output unit 62. Note that the designation of a device as either the input unit 61 or the output unit 62 depends on the perspective.
  • the input unit 61 takes as input physical movement that the human user provides and converts it into signals that the computer system pertaining to the first embodiment can understand. For example, the input unit 61 converts incoming data and instructions into a pattern of electrical signals in binary code that are comprehensible to the computer system pertaining to the first embodiment, and the output from the input unit 61 is fed to the marching main memory 31 through the I/O interface circuit 63.
  • the output unit 62 takes as input signals that the marching main memory 31 provides through the I/O interface circuit 63. The output unit 62 then converts these signals into representations that human users can see or read, reversing the process of the input unit 61, translating the digitized signals into a form intelligible to the user.
  • the I/O interface circuit 63 is required whenever the processor 11 drives the input unit 61 and the output unit 62.
  • the processor 11 can communicate with the input unit 61 and the output unit 62 through the I/O interface circuit 63. In the case where data of different formats are exchanged, the I/O interface circuit 63 converts serial data to parallel form and vice-versa. There is provision for generating interrupts and the corresponding type numbers for further processing by the processor 11 if required.
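The serial-to-parallel conversion mentioned above can be sketched as follows. This is a minimal software illustration only: the function names, the 8-bit default width and the most-significant-bit-first ordering are assumptions made here and are not specified for the I/O interface circuit 63.

```python
def serial_to_parallel(bits, width=8):
    """Pack a serial bit stream into parallel words of `width` bits.

    Assumption: bits arrive most-significant bit first.
    """
    if len(bits) % width:
        raise ValueError("bit stream length must be a multiple of width")
    words = []
    for i in range(0, len(bits), width):
        word = 0
        for b in bits[i:i + width]:
            word = (word << 1) | (b & 1)  # shift in one serial bit at a time
        words.append(word)
    return words


def parallel_to_serial(words, width=8):
    """Unpack parallel words back into a serial bit stream (MSB first)."""
    return [(w >> shift) & 1 for w in words for shift in range(width - 1, -1, -1)]
```

Under these assumptions the two functions are inverses of each other, so a stream converted to parallel form and back is unchanged.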
  • the secondary memory 41 stores data and information on a more long-term basis than the marching main memory 31. While the marching main memory 31 is concerned mainly with storing programs currently executing and data currently being employed, the secondary memory 41 is generally intended for storing anything that needs to be kept even if the computer is switched off or no programs are currently executing.
  • the examples of the secondary memory 41 are known hard disks (or hard drives) and known external media drives (such as CD-ROM drives). These storage methods are most commonly used to store the computer's operating system, the user's collection of software and any other data the user wishes.
  • the processor 11 may include a plurality of arithmetic pipelines configured to receive the stored information through the output terminals from the marching main memory 31, and as represented by bidirectional arrow PHI 12 , data are transferred bi-directionally between the marching main memory 31 and the plurality of arithmetic pipelines through the joint members 54.
  • the unit of address resolution is either a character (e.g. a byte) or a word. If the unit is a word, then a larger amount of memory can be accessed using an address of a given size. On the other hand, if the unit is a byte, then individual characters can be addressed (i.e. selected during the memory operation).
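The trade-off between word and byte address resolution can be made concrete with a short calculation. The sketch below is illustrative only; the function name and the example figures (a 16-bit address and a 4-byte word) are assumptions chosen for the arithmetic, not values taken from the embodiment.

```python
def addressable_bytes(address_bits, unit_bytes):
    """Total bytes reachable when each address selects one unit of `unit_bytes` bytes."""
    return (2 ** address_bits) * unit_bytes


# Assumed example: a 16-bit address.
byte_addressed = addressable_bytes(16, 1)  # each address selects one byte
word_addressed = addressable_bytes(16, 4)  # each address selects one 4-byte word
```

With these assumed figures the byte-addressed memory spans 65,536 bytes and the word-addressed memory spans 262,144 bytes, four times as much for the same address size, at the cost of no longer being able to select an individual byte.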
  • Machine instructions are normally fractions or multiples of the architecture's word size. This is a natural choice since instructions and data usually share the same memory subsystem.
  • Figs. 4 and 5 correspond to transistor-level representations of the cell array implementing the marching main memory 31 illustrated in Fig. 3
  • Fig. 23 corresponds to a gate-level representation of the cell array implementing marching main memory 31 illustrated in Fig. 3.
  • the first column of the m * n matrix which is implemented by a vertical array of cell M 11 , M 21 , M 31 , ........, M m-1,1 , M m1 , represents the first memory unit U 1 illustrated in Fig.3.
  • "m" is an integer determined by word size. Although the choice of a word size is of substantial importance, when computer architecture is designed, word sizes are naturally multiples of eight bits, with 16, 32, and 64 bits being commonly used.
  • the second column of the m * n matrix which is implemented by a vertical array of cell M 12 , M 22 , M 32 , ........, M m-1,2 , M m2 , represents the second memory unit U 2
  • the third column of the m * n matrix which is implemented by a vertical array of cell M 13 , M 23 , M 33 , ........, M m-1,3 , M m3 , represents the third memory unit U 3
  • the (n-1)-th column of the m * n matrix which is implemented by a vertical array of cell M 1,n-1 , M 2,n-1 , M 3,n-1 , ........, M m-1,n-1 , M m,n-1 , represents the (n-1)-th memory unit U n-1
  • the n-th column of the m * n matrix which is implemented by a vertical array of cell M 1,n , M 2,n , M 3,n , ........, M m-1,n , M m,n , represents the n-th memory unit U n
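The correspondence between the columns of the m * n matrix and the memory units can be sketched in software. This is an illustrative data-structure model only: the function names are hypothetical, and placing the most-significant bit of each word on the first row is an assumption made here, since the bit order within a column is not stated in the passage above.

```python
def word_to_column(word, m):
    """Split an m-bit word into m bit-level cells (row 1 = most-significant bit, assumed)."""
    return [(word >> (m - 1 - i)) & 1 for i in range(m)]


def build_matrix(words, m):
    """Build the m x n cell matrix whose j-th column stores the j-th word."""
    columns = [word_to_column(w, m) for w in words]
    n = len(columns)
    # matrix[i][j] models bit-level cell M_(i+1),(j+1)
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def memory_unit(matrix, j):
    """Return memory unit U_j (1-indexed): the vertical array of cells in column j."""
    return [row[j - 1] for row in matrix]
```

For instance, with m = 4 and the two words 0b1010 and 0b0111, `memory_unit(matrix, 1)` yields `[1, 0, 1, 0]` and `memory_unit(matrix, 2)` yields `[0, 1, 1, 1]`, so each word-size memory unit is read out as one whole column of bit-level cells.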
  • the first memory unit U 1 of word-size level is implemented by a vertical array of bit-level cell M 11 , M 21 , M 31 , ........, M m-1,1 , M m1 in the first column of the m * n matrix.
  • the first-column cell M 11 on the first row encompasses a first nMOS transistor Q 111 having a drain electrode connected to a clock signal supply line through a first delay element D 111 and a gate electrode connected to the output terminal of a first bit-level input terminal through a second delay element D 112 ; a second nMOS transistor Q 112 having a drain electrode connected to a source electrode of the first nMOS transistor Q 111 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C 11 configured to store the information of the cell M 11 , connected in parallel with the second nMOS transistor Q 112 ,wherein an output node connecting the source electrode of the first nMOS transistor Q 111 and the drain electrode of the second nMOS transistor Q 112 serves as an output terminal of the cell M 11 , configured to deliver the signal stored in the capacitor C 11 to the next bit-level cell M 12 .
  • the first-column cell M 21 on the second row encompasses a first nMOS transistor Q 211 having a drain electrode connected to the clock signal supply line through a first delay element D 211 and a gate electrode connected to the output terminal of a second bit-level input terminal through a second delay element D 212 ; a second nMOS transistor Q 212 having a drain electrode connected to a source electrode of the first nMOS transistor Q 211 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C 21 configured to store the information of the cell M 21 , connected in parallel with the second nMOS transistor Q 212 ,wherein an output node connecting the source electrode of the first nMOS transistor Q 211 and the drain electrode of the second nMOS transistor Q 212 serves as an output terminal of the cell M 21 , configured to deliver the signal stored in the capacitor C 21 to the next bit-level cell M 22 .
  • the first-column cell M 31 on the third row encompasses a first nMOS transistor Q 311 having a drain electrode connected to the clock signal supply line through a first delay element D 311 and a gate electrode connected to the output terminal of a third bit-level input terminal through a second delay element D 312 ; a second nMOS transistor Q 312 having a drain electrode connected to a source electrode of the first nMOS transistor Q 311 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C 31 configured to store the information of the cell M 31 , connected in parallel with the second nMOS transistor Q 312 , wherein an output node connecting the source electrode of the first nMOS transistor Q 311 and the drain electrode of the second nMOS transistor Q 312 serves as an output terminal of the cell M 31 , configured to deliver the signal stored in the capacitor C 31 to the next bit-level cell M 32 .
  • the first-column cell M (m-1)1 on the (m-1)-th row encompasses a first nMOS transistor Q (m-1)11 having a drain electrode connected to the clock signal supply line through a first delay element D (m-1)11 and a gate electrode connected to the output terminal of a (m-1)-th bit-level input terminal through a second delay element D (m-1)12 ; a second nMOS transistor Q (m-1)12 having a drain electrode connected to a source electrode of the first nMOS transistor Q (m-1)11 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C (m-1)1 configured to store the information of the cell M (m-1)1 , connected in parallel with the second nMOS transistor Q (m-1)12 , wherein an output node connecting the source electrode of the first nMOS transistor Q (m-1)11 and the drain electrode of the second nMOS transistor Q (m-1)12 serves as an output terminal of the cell M (m-1)1 , configured to deliver the signal stored in the capacitor C (m-1)1 to the next bit-level cell M (m-1)2 .
  • the first-column cell M m1 on the m-th row encompasses a first nMOS transistor Q m11 having a drain electrode connected to the clock signal supply line through a first delay element D m11 and a gate electrode connected to the output terminal of a m-th bit-level input terminal through a second delay element D m12 ; a second nMOS transistor Q m12 having a drain electrode connected to a source electrode of the first nMOS transistor Q m11 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C m1 configured to store the information of the cell M m1 , connected in parallel with the second nMOS transistor Q m12 ,wherein an output node connecting the source electrode of the first nMOS transistor Q m11 and the drain electrode of the second nMOS transistor Q m12 serves as an output terminal of the cell M m1 , configured to deliver the signal stored in the capacitor C m1 to
  • the second memory unit U 2 of word-size level is implemented by a vertical array of bit-level cells M 12 , M 22 , M 32 , ........, M m-1,2 , M m2 in the second column of the m * n matrix.
  • the second column cell M 12 on the first row encompasses a first nMOS transistor Q 121 having a drain electrode connected to the clock signal supply line through a first delay element D 121 and a gate electrode connected to the output terminal of the previous bit-level cell M 11 through a second delay element D 122 ; a second nMOS transistor Q 122 having a drain electrode connected to a source electrode of the first nMOS transistor Q 121 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C 12 configured to store the information of the cell M 12 , connected in parallel with the second nMOS transistor Q 122 , wherein an output node connecting the source electrode of the first nMOS transistor Q 121 and the drain electrode of the second nMOS transistor Q 122 serves as an output terminal of the cell M 12 , configured to deliver the signal stored in the capacitor C 12 to the next bit-level cell M 13 .
  • the second column cell M 22 on the second row encompasses a first nMOS transistor Q 221 having a drain electrode connected to the clock signal supply line through a first delay element D 221 and a gate electrode connected to the output terminal of the previous bit-level cell M 21 through a second delay element D 222 ; a second nMOS transistor Q 222 having a drain electrode connected to a source electrode of the first nMOS transistor Q 221 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C 22 configured to store the information of the cell M 22 , connected in parallel with the second nMOS transistor Q 222 , wherein an output node connecting the source electrode of the first nMOS transistor Q 221 and the drain electrode of the second nMOS transistor Q 222 serves as an output terminal of the cell M 22 , configured to deliver the signal stored in the capacitor C 22 to the next bit-level cell M 23 .
  • the second column cell M 32 on the third row encompasses a first nMOS transistor Q 321 having a drain electrode connected to the clock signal supply line through a first delay element D 321 and a gate electrode connected to the output terminal of the previous bit-level cell M 31 through a second delay element D 322 ; a second nMOS transistor Q 322 having a drain electrode connected to a source electrode of the first nMOS transistor Q 321 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C 32 configured to store the information of the cell M 32 , connected in parallel with the second nMOS transistor Q 322 , wherein an output node connecting the source electrode of the first nMOS transistor Q 321 and the drain electrode of the second nMOS transistor Q 322 serves as an output terminal of the cell M 32 , configured to deliver the signal stored in the capacitor C 32 to the next bit-level cell M 33 .
  • the second column cell M (m-1)2 on the (m-1)-th row encompasses a first nMOS transistor Q (m-1)21 having a drain electrode connected to the clock signal supply line through a first delay element D (m-1)21 and a gate electrode connected to the output terminal of the previous bit-level cell M (m-1)1 through a second delay element D (m-1)22 ; a second nMOS transistor Q (m-1)22 having a drain electrode connected to a source electrode of the first nMOS transistor Q (m-1)21 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C (m-1)2 configured to store the information of the cell M (m-1)2 , connected in parallel with the second nMOS transistor Q (m-1)22 , wherein an output node connecting the source electrode of the first nMOS transistor Q (m-1)21 and the drain electrode of the second nMOS transistor Q (m-1)22 serves as an output
  • the second column cell M m2 on the m-th row encompasses a first nMOS transistor Q m21 having a drain electrode connected to the clock signal supply line through a first delay element D m21 and a gate electrode connected to the output terminal of the previous bit-level cell M m1 through a second delay element D m22 ; a second nMOS transistor Q m22 having a drain electrode connected to a source electrode of the first nMOS transistor Q m21 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C m2 configured to store the information of the cell M m2 , connected in parallel with the second nMOS transistor Q m22 ,wherein an output node connecting the source electrode of the first nMOS transistor Q m21 and the drain electrode of the second nMOS transistor Q m22 serves as an output terminal of the cell M m2 , configured to deliver the signal stored in the capacitor C m2 to the next bit-
  • the third memory unit U 3 of word-size level is implemented by a vertical array of bit-level cells M 13 , M 23 , M 33 , ........, M m-1,3 , M m3 in the third column of the m * n matrix.
  • the third-column cell M 13 on the first row encompasses a first nMOS transistor Q 131 having a drain electrode connected to the clock signal supply line through a first delay element D 131 and a gate electrode connected to the output terminal of the previous bit-level cell M 12 through a second delay element D 132 ; a second nMOS transistor Q 132 having a drain electrode connected to a source electrode of the first nMOS transistor Q 131 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C 13 configured to store the information of the cell M 13 , connected in parallel with the second nMOS transistor Q 132 , wherein an output node connecting the source electrode of the first nMOS transistor Q 131 and the drain electrode of the second nMOS transistor Q 132 serves as an output terminal of the cell M 13 , configured to deliver the signal stored in the capacitor C 13 to the next bit-level cell.
  • the third-column cell M 23 on the second row encompasses a first nMOS transistor Q 231 having a drain electrode connected to the clock signal supply line through a first delay element D 231 and a gate electrode connected to the output terminal of the previous bit-level cell M 22 through a second delay element D 232 ; a second nMOS transistor Q 232 having a drain electrode connected to a source electrode of the first nMOS transistor Q 231 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C 23 configured to store the information of the cell M 23 , connected in parallel with the second nMOS transistor Q 232 , wherein an output node connecting the source electrode of the first nMOS transistor Q 231 and the drain electrode of the second nMOS transistor Q 232 serves as an output terminal of the cell M 23 , configured to deliver the signal stored in the capacitor C 23 to the next bit-level cell.
  • the third-column cell M 33 on the third row encompasses a first nMOS transistor Q 331 having a drain electrode connected to the clock signal supply line through a first delay element D 331 and a gate electrode connected to the output terminal of the previous bit-level cell M 32 through a second delay element D 332 ; a second nMOS transistor Q 332 having a drain electrode connected to a source electrode of the first nMOS transistor Q 331 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C 33 configured to store the information of the cell M 33 , connected in parallel with the second nMOS transistor Q 332 , wherein an output node connecting the source electrode of the first nMOS transistor Q 331 and the drain electrode of the second nMOS transistor Q 332 serves as an output terminal of the cell M 33 , configured to deliver the signal stored in the capacitor C 33 to the next bit-level cell.
  • the third-column cell M (m-1)3 on the (m-1)-th row encompasses a first nMOS transistor Q (m-1)31 having a drain electrode connected to the clock signal supply line through a first delay element D (m-1)31 and a gate electrode connected to the output terminal of the previous bit-level cell M (m-1)2 through a second delay element D (m-1)32 ; a second nMOS transistor Q (m-1)32 having a drain electrode connected to a source electrode of the first nMOS transistor Q (m-1)31 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C (m-1)3 configured to store the information of the cell M (m-1)3 , connected in parallel with the second nMOS transistor Q (m-1)32 , wherein an output node connecting the source electrode of the first nMOS transistor Q (m-1)31 and the drain electrode of the second nMOS transistor Q (m-1)32 serves
  • the third-column cell M m3 on the m-th row encompasses a first nMOS transistor Q m31 having a drain electrode connected to the clock signal supply line through a first delay element D m31 and a gate electrode connected to the output terminal of the previous bit-level cell M m2 through a second delay element D m32 ; a second nMOS transistor Q m32 having a drain electrode connected to a source electrode of the first nMOS transistor Q m31 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C m3 configured to store the information of the cell M m3 , connected in parallel with the second nMOS transistor Q m32 ,wherein an output node connecting the source electrode of the first nMOS transistor Q m31 and the drain electrode of the second nMOS transistor Q m32 serves as an output terminal of the cell M m3 , configured to deliver the signal stored in the capacitor C m3 to the
  • the n-th memory unit of word-size level is implemented by a vertical array of bit-level cells M 1n , M 2n , M 3n , ........, M m-1,n , M mn in the n-th column of the m * n matrix.
  • the n-th-column cell M 1n on the first row encompasses a first nMOS transistor Q 1n1 having a drain electrode connected to the clock signal supply line through a first delay element D 1n1 and a gate electrode connected to the bit-level output terminal of the previous bit-level cell M 1(n-1) through a second delay element D 1n2 ; a second nMOS transistor Q 1n2 having a drain electrode connected to a source electrode of the first nMOS transistor Q 1n1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C 1n configured to store the information of the cell M 1n , connected in parallel with the second nMOS transistor Q 1n2 ,wherein an output node connecting the source electrode of the first nMOS transistor Q 1n1 and the drain electrode of the second nMOS transistor Q 1n2 serves as a bit-level output terminal of the cell M 1n , configured to deliver the signal stored in the capacitor C
  • the n-th-column cell M 2n on the second row encompasses a first nMOS transistor Q 2n1 having a drain electrode connected to the clock signal supply line through a first delay element D 2n1 and a gate electrode connected to the bit-level output terminal of the previous bit-level cell M 2(n-1) through a second delay element D 2n2 ; a second nMOS transistor Q 2n2 having a drain electrode connected to a source electrode of the first nMOS transistor Q 2n1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C 2n configured to store the information of the cell M 2n , connected in parallel with the second nMOS transistor Q 2n2 ,wherein an output node connecting the source electrode of the first nMOS transistor Q 2n1 and the drain electrode of the second nMOS transistor Q 2n2 serves as a bit-level output terminal of the cell M 2n , configured to deliver the signal stored in the capacitor C
  • the n-th-column cell M 3n on the third row encompasses a first nMOS transistor Q 3n1 having a drain electrode connected to the clock signal supply line through a first delay element D 3n1 and a gate electrode connected to the bit-level output terminal of the previous bit-level cell M 3(n-1) through a second delay element D 3n2 ; a second nMOS transistor Q 3n2 having a drain electrode connected to a source electrode of the first nMOS transistor Q 3n1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C 3n configured to store the information of the cell M 3n , connected in parallel with the second nMOS transistor Q 3n2 , wherein an output node connecting the source electrode of the first nMOS transistor Q 3n1 and the drain electrode of the second nMOS transistor Q 3n2 serves as a bit-level output terminal of the cell M 3n , configured to deliver the signal stored in the capacitor C
  • the n-th-column cell M (m-1)n on the (m-1)-th row encompasses a first nMOS transistor Q (m-1)n1 having a drain electrode connected to the clock signal supply line through a first delay element D (m-1)n1 and a gate electrode connected to the bit-level output terminal of the previous bit-level cell M (m-1) (n-1) through a second delay element D (m-1)n2 ; a second nMOS transistor Q (m-1)n2 having a drain electrode connected to a source electrode of the first nMOS transistor Q (m-1)n1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C (m-1)n configured to store the information of the cell M (m-1)n , connected in parallel with the second nMOS transistor Q (m-1)n2 , wherein an output node connecting the source electrode of the first nMOS transistor Q (m-1)n1 and the drain
  • the n-th-column cell M mn on the m-th row encompasses a first nMOS transistor Q mn1 having a drain electrode connected to the clock signal supply line through a first delay element D mn1 and a gate electrode connected to the bit-level output terminal of the previous bit-level cell M m(n-1) through a second delay element D mn2 ; a second nMOS transistor Q mn2 having a drain electrode connected to a source electrode of the first nMOS transistor Q mn1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C mn configured to store the information of the cell M mn , connected in parallel with the second nMOS transistor Q mn2 ,wherein an output node connecting the source electrode of the first nMOS transistor Q mn1 and the drain electrode of the second nMOS transistor Q mn2 serves as a bit-level output terminal of the cell
  • a bit-level cell M ij of the j-th column and on the i-th row, in the representative 2 * 2 cell-array of the marching main memory used in the computer system pertaining to the first embodiment of the present invention encompasses a first nMOS transistor Q ij1 having a drain electrode connected to a clock signal supply line through a first delay element D ij1 and a gate electrode connected to the output terminal of the previous bit-level cell through a second delay element D ij2 ; a second nMOS transistor Q ij2 having a drain electrode connected to a source electrode of the first nMOS transistor Q ij1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C ij configured to store the information of the bit-level cell M ij , connected in parallel with the second nMOS transistor Q ij2 , wherein an output node connecting the source electrode of the first n
  • a column bit-level cell M i(j+1) of the (j+1)-th column and on the i-th row encompasses a first nMOS transistor Q i(j+1)1 having a drain electrode connected to clock signal supply line through a first delay element D i(j+1)1 and a gate electrode connected to the output terminal of the previous bit-level cell M ij through a second delay element D i(j+1)2 ; a second nMOS transistor Q i(j+1)2 having a drain electrode connected to a source electrode of the first nMOS transistor Q i(j+1)1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C i(j+1) configured to store the information of the bit-level cell M i(j+1) , connected in parallel with the second nMOS transistor Q i(j+1)2 , wherein an output node connecting the source electrode of the first nMOS transistor Q i(j
  • a bit-level cell M (i+1)j of the j-th column and on the (i+1)-th row encompasses a first nMOS transistor Q (i+1)j1 having a drain electrode connected to the clock signal supply line through a first delay element D (i+1)j1 and a gate electrode connected to the output terminal of the previous bit-level cell through a second delay element D (i+1)j2 ; a second nMOS transistor Q (i+1)j2 having a drain electrode connected to a source electrode of the first nMOS transistor Q (i+1)j1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C (i+1)j configured to store the information of the bit-level cell M (i+1)j , connected in parallel with the second nMOS transistor Q (i+1)j2 , wherein an output node connecting the source electrode of the first nMOS transistor Q (i+1)j1 and the drain electrode of the second n
  • a bit-level cell M (i+1)(j+1) of the (j+1)-th column and on the (i+1)-th row encompasses a first nMOS transistor Q (i+1)(j+1)1 having a drain electrode connected to the clock signal supply line through a first delay element D (i+1)(j+1)1 and a gate electrode connected to the output terminal of the previous bit-level cell M (i+1)j through a second delay element D (i+1)(j+1)2 ; a second nMOS transistor Q (i+1)(j+1)2 having a drain electrode connected to a source electrode of the first nMOS transistor Q (i+1)(j+1)1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C (i+1)(j+1) configured to store the information of the bit-level cell M (i+1)(j+1) , connected in parallel with the second nMOS transistor Q (i+1)(j+1)2
  • the j-th bit-level cell M ij on the i-th row encompasses a first nMOS transistor Q ij1 having a drain electrode connected to a clock signal supply line through a first delay element D ij1 and a gate electrode connected to the output terminal of the previous cell through a second delay element D ij2 ; a second nMOS transistor Q ij2 having a drain electrode connected to a source electrode of the first nMOS transistor Q ij1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C ij configured to store the information of the bit-level cell M ij , connected in parallel with the second nMOS transistor Q ij2 .
  • the second nMOS transistor Q ij2 serves as a reset-transistor configured to reset the signal charge stored in the capacitor C ij , when a clock signal of high-level (or a logical level of "1") is applied to the gate electrode of the second nMOS transistor Q ij2 , discharging the signal charge already stored in the capacitor C ij .
  • Figs. 7A and 7B illustrate a schematic example of the transistor-level responses of the bit-level cell M ij illustrated in Fig. 6, which is one of the bit-level cells used in the computer system pertaining to the first embodiment of the present invention, to a waveform of a clock signal illustrated by broken line.
  • the clock signal illustrated by broken line swings periodically between the logical levels of "1" and "0" with the clock period TAU(Greek-letter) clock .
  • the second ideal delay element D ij2 can achieve a delay of TAU clock /2 with a very sharp leading edge, so that the rise time can be neglected.
  • the capacitor C ij can begin discharging at time "t 0 ", because the second nMOS transistor Q ij2 can become active as the reset-transistor with the clock signal of the high-level illustrated by the broken line applied to the gate electrode of the second nMOS transistor Q ij2 at time "t 0 ", if the operation of the second nMOS transistor Q ij2 has no delay.
  • the output node N out connecting the source electrode of the first nMOS transistor Q ij1 and the drain electrode of the second nMOS transistor Q ij2 serves as an output terminal of the bit-level cell M ij , and the output terminal of the bit-level cell M ij delivers the signal stored in the capacitor C ij to the next bit-level cell on the i-th row.
  • Fig. 7C illustrates an actual example of the response to the waveform of the clock signal, for the case in which both the first delay element D ij1 and the second delay element D ij2 are implemented by R-C delay circuits, as illustrated in Fig. 8.
  • the signal charge stored in the capacitor C ij is actually either of the logical level of "0" or "1", and if the signal charge stored in the capacitor C ij is of the logical level of "1", although the first nMOS transistor Q ij1 still remains in the off-state, the capacitor C ij can begin discharging at time "t 0 ", because the second nMOS transistor Q ij2 can become active when the clock signal of the high-level is applied to the gate electrode of the second nMOS transistor Q ij2 , if an ideal operation of the second nMOS transistor Q ij2 with no delay can be approximated.
  • the first nMOS transistor Q ij1 becomes active as a transfer-transistor, delayed by a predetermined delay time t d1 determined by the first delay element D ij1 implemented by the R-C delay circuit.
  • the first nMOS transistor Q ij1 transfers the signal stored in the previous bit-level cell M i(j-1) , further delayed by a predetermined delay time t d2 determined by the second delay element D ij2 to the capacitor C ij .
  • An output node N out connecting the source electrode of the first nMOS transistor Q ij1 and the drain electrode of the second nMOS transistor Q ij2 serves as an output terminal of the bit-level cell M ij , and the output terminal of the bit-level cell M ij delivers the signal stored in the capacitor C ij to the next bit-level cell on the i-th row.
  • the clock signal swings periodically between the logical levels of "1" and "0", with a predetermined clock period (clock cycle time) TAU clock , and when the clock signal becomes the logical level of "1", the second nMOS transistor Q ij2 begins to discharge the signal charge, which is already stored in the capacitor C ij at a previous clock cycle. And, after the clock signal of the logical level of "1" is applied and the signal charge stored in the capacitor C ij is completely discharged to the potential of the logical level of "0", the first nMOS transistor Q ij1 becomes active as the transfer-transistor, delayed by the predetermined delay time t d1 determined by the first delay element D ij1 .
  • the delay time t d1 may be set to be equal to 1/4TAU clock preferably. Thereafter, when the signal stored in the previous bit-level cell M i(j-1) on the i-th row is fed from the previous bit-level cell M i(j-1) to the gate electrode of the first nMOS transistor Q ij1 , the first nMOS transistor Q ij1 transfers the signal stored in the previous bit-level cell M i(j-1) , further delayed by the predetermined delay time t d2 determined by the second delay element D ij2 implemented by the R-C delay circuit to the capacitor C ij .
  • the first nMOS transistor Q ij1 enters the conductive state, and the logical level of "1" is stored in the capacitor C ij .
  • the bit-level cell M ij can establish "a marching AND-gate" operation.
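The "marching AND-gate" behaviour described above can be sketched at the logic level. This is a minimal illustration under idealized assumptions (instantaneous reset, no leakage, no analog effects), not the circuit itself; the function name `step_cell` and its parameters are hypothetical and do not appear in the source.

```python
def step_cell(clock: int, prev_output: int, stored: int) -> int:
    """Return the logic value held on capacitor C_ij after one clock phase.

    Idealized model (an assumption, not the transistor-level circuit):
    - clock == 1: Q_ij2 resets C_ij to 0, then Q_ij1 charges C_ij only if
      the previous cell's output is 1, so the result is clock AND prev_output.
    - clock == 0: both transistors are treated as off and C_ij holds its value.
    """
    if clock == 1:
        return clock & prev_output
    return stored


# Truth table of the idealized cell for every input combination.
for clock in (0, 1):
    for prev_output in (0, 1):
        for stored in (0, 1):
            print(clock, prev_output, stored, "->",
                  step_cell(clock, prev_output, stored))
```

With the clock high, the old content is always overwritten, which matches the reset-then-transfer order given in the text.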
  • the delay time t d2 shall be longer than the delay time t d1 , and the delay time t d2 may be set to be equal to 1/2TAU clock preferably.
  • the output node N out connecting the source electrode of the first nMOS transistor Q ij1 and the drain electrode of the second nMOS transistor Q ij2 , which is serving as the output terminal of the bit-level cell M ij can deliver the signal stored in the capacitor C ij to the next bit-level cell M i(j+1) at the next clock cycle.
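The timing relationships stated above (reset at the clock's rising edge, transfer transistor active after t d1, previous cell's signal reaching the gate after t d2, with t d2 longer than t d1) can be laid out as a small calculation. This sketch uses the preferred values t d1 = TAU clock /4 and t d2 = TAU clock /2 from the text; the function name and dictionary keys are hypothetical.

```python
def cell_event_times(tau_clock: float, t0: float = 0.0) -> dict:
    """Return the idealized event times of one bit-level cell in one cycle.

    Uses the preferred settings from the text: t_d1 = tau/4, t_d2 = tau/2.
    """
    t_d1 = tau_clock / 4.0  # delay of the first delay element D_ij1
    t_d2 = tau_clock / 2.0  # delay of the second delay element D_ij2
    if not t_d2 > t_d1:
        raise ValueError("t_d2 must be longer than t_d1")
    return {
        "reset_starts": t0,                      # Q_ij2 begins discharging C_ij
        "transfer_transistor_active": t0 + t_d1, # delayed clock reaches Q_ij1 drain
        "previous_signal_at_gate": t0 + t_d2,    # delayed signal reaches Q_ij1 gate
        "next_clock_edge": t0 + tau_clock,       # stored value marches onward
    }


times = cell_event_times(tau_clock=4.0)
for name, t in times.items():
    print(f"{name}: {t}")
```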
  • each of the output nodes connecting the source electrodes of the first nMOS transistors Q 111 , Q 211 , Q 311 , ........, Q m-1,11 , Q m11 and the drain electrodes of the second nMOS transistors Q 112 , Q 212 , Q 312 , ........, Q m-1,12 , Q m12 cannot deliver the signals, which are entered to the gate electrodes of the first nMOS transistors Q 111 , Q 211 , Q 311 , ........, Q m-1,11 , Q m11 , further to the next bit-level cell M 12 , M 22 , M 32 , ........, M m-1,2 , M m2 at a time when time proceeds 1/2TAU clock , as each of the signals is blocked to be transferred to the gate electrodes of the next first nMOS transistors Q 121 , Q 221
  • each of the output nodes connecting the source electrodes of the first nMOS transistors Q 121 , Q 221 , Q 321 , ........, Q m-1,21 , Q m21 and the drain electrodes of the second nMOS transistors Q 122 , Q 222 , Q 322 , ........, Q m-1,22 , Q m22 cannot deliver the signals stored in the previous bit-level cell M 11 , M 21 , M 31 , ........, M m-1,1 , M m1 further to the next bit-level cell M 12 , M 22 , M 32 , ........, M m-1,2 , M m2 at a time when time proceeds (1+1/2)TAU clock , as each of the signals is blocked to be transferred to the gate electrode of the next first nMOS transistor Q 131 , Q 231 , Q 331 , ........,
  • a sequence of the second nMOS transistors Q 132 , Q 232 , Q 332 , ........, Q m-1,32 , Q m32 in the third memory unit U 3 begin to discharge the signal charges, respectively, which are already stored in the capacitors C 13 , C 23 , C 33 , ........, C m-1,3 , C m3 , respectively, in the third memory unit U 3 at the previous clock cycle.
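Because each signal is blocked from racing past the next column within the same clock cycle, the array as a whole behaves like a word-wide shift structure: every clock cycle, each memory unit takes over the word held by the unit before it. The following sketch models only that column-to-column marching; the names `march` and `Word` are hypothetical, and the list-of-lists representation is an assumption made for illustration.

```python
from typing import List

Word = List[int]  # one m-bit word, i.e. one column (memory unit) of the matrix


def march(columns: List[Word], input_word: Word) -> List[Word]:
    """Advance every stored word by exactly one column in one clock cycle.

    columns[0] is the first memory unit U_1 and columns[-1] the n-th unit.
    The new state is built only from the old state, mirroring the text's
    point that a signal cannot pass through two columns in one cycle.
    """
    return [list(input_word)] + [list(word) for word in columns[:-1]]


# Three memory units of two bits each, initially empty.
state = [[0, 0], [0, 0], [0, 0]]
for word in ([1, 0], [0, 1], [1, 1]):
    state = march(state, word)
    print(state)
```

After three cycles the first word entered has reached the last column, while the most recent word sits in the first column.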
  • each of the first delay element D ij1 and the second delay element D ij2 can be implemented by a known "resistive-capacitive delay" or "R-C delay".
  • the R-C circuit is a mere example, and the first delay element D ij1 and the second delay element D ij2 can be implemented by other passive delay elements, or by various active delay elements, which may include active elements such as transistors.
  • Fig. 9 illustrates an example of the top view of the actual planar pattern of the bit-level cell M ij of the j-th column and on the i-th row illustrated in Fig. 8, which has the first delay element D ij1 and the second delay element D ij2 implemented by the R-C delay circuit.
  • Fig. 10 illustrates the corresponding cross-sectional view taken on the line A-A of Fig. 9.
  • the first delay element D ij1 is implemented by a first meandering line 91 of conductive wire.
  • the second delay element D ij2 is implemented by a second meandering line 97 of conductive wire.
  • the first nMOS transistor Q ij1 has a drain electrode region 93 connected to the first meandering line 91 via a contact plug 96a.
  • the other end of the first meandering line 91 opposite to the end connected to the drain electrode region 93 of the first nMOS transistor Q ij1 is connected to the clock signal supply line.
  • the drain electrode region 93 is implemented by an n + semiconductor region.
  • a gate electrode of the first nMOS transistor Q ij1 is implemented by the second meandering line 97.
  • the other end of the second meandering line 97 opposite to the end serving as the gate electrode of the first nMOS transistor Q ij1 is connected to the output terminal of the previous cell.
  • the second nMOS transistor Q ij2 has a drain electrode region implemented by a common n + semiconductor region 94, which also serves as the source electrode region of the first nMOS transistor Q ij1 , a gate electrode 98 connected to the clock signal supply line via a contact plug 96a, and a source electrode region 95 connected to the ground potential via a contact plug 96a.
  • the source electrode region 95 is implemented by an n + semiconductor region.
  • the common n + semiconductor region 94 is the output node connecting the source electrode region of the first nMOS transistor Q ij1 and the drain electrode region of the second nMOS transistor Q ij2 , the common n + semiconductor region 94 is connected to a surface wiring 92b via a contact plug 96d.
  • the common n + semiconductor region 94 serves as the output terminal of the bit-level cell M ij , and delivers the signal stored in the capacitor C ij to the next bit-level cell through the surface wiring 92b.
  • the drain electrode region 93, the common n + semiconductor region 94, and the source electrode region 95 are provided at the surface of and in the upper portion of the p-type semiconductor substrate 81.
  • the drain electrode region 93, the common n + semiconductor region 94, and the source electrode region 95 can be provided in the upper portion of the p-well, or p-type epitaxial layer grown on a semiconductor substrate.
  • an element isolation insulator 82 is provided so as to define an active area of the p-type semiconductor substrate 81 as a window provided in the element isolation insulator 82.
  • the drain electrode region 93, the common n + semiconductor region 94, and the source electrode region 95 are provided in the active area, surrounded by the element isolation insulator 82.
  • a gate insulating film 83 is provided at the surface of and on the active area.
  • the gate electrode of the first nMOS transistor Q ij1 implemented by the second meandering line 97 and the gate electrode 98 of the second nMOS transistor Q ij2 are provided on the gate insulating film 83.
  • a first interlayer dielectric film 84 is provided on the second meandering line 97 and the gate electrode 98.
  • a bottom electrode 85 of the capacitor C ij configured to store the information of the bit-level cell M ij is provided.
  • the bottom electrode 85 is made of a conducting film, and a contact plug 96c is provided in the first interlayer dielectric film 84 so as to connect the bottom electrode 85 and the source electrode region 95.
  • a capacitor insulating film 86 is provided on the bottom electrode 85.
  • a top electrode 87 of the capacitor C ij is provided so as to occupy an upper portion of the bottom electrode 85.
  • the top electrode 87 is made of a conducting film. Although the illustration is omitted in the cross-sectional view illustrated in Fig. 10, the top electrode 87 is electrically connected to the common n + semiconductor region 94 so as to establish an electric circuit topology in which the capacitor C ij is connected in parallel with the second nMOS transistor Q ij2 .
  • a variety of insulator films may be used as the capacitor insulating film 86.
  • the miniaturized marching main memory may be required to occupy a small area of the bottom electrode 85 opposing the top electrode 87.
  • the capacitance between the bottom electrode 85 and the top electrode 87 via the capacitor insulating film 86 needs to maintain a constant value.
  • usage of a material with a dielectric constant ε r greater than that of a silicon oxide (SiO 2 ) film is preferred, considering the storage capacitance between the bottom electrode 85 and the top electrode 87.
  • the ratio in thickness of the upper layer silicon oxide film, the middle layer silicon nitride film, and the underlayer silicon oxide film is selectable; however, a dielectric constant ε r of approximately 5 to 5.5 can be provided.
  • a strontium oxide (SrO) film with ε r = 6,
  • Ta 2 O 5 and Bi 2 O 3 show disadvantages in lacking thermal stability at the interface with the polysilicon.
  • it may be a composite film made from a silicon oxide film and these films.
  • the composite film may have a stacked structure of triple-levels or more.
  • it should be an insulating film containing a material with the relative dielectric constant ε r of 5 to 6 or greater in at least a portion thereof.
  • selecting a combination that results in having an effective relative dielectric constant ε reff of 5 to 6 or greater measured for the entire film is preferred.
  • it may also be an insulating film made from an oxide film of a ternary compound such as a hafnium aluminate (HfAlO) film.
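As a rough illustration of why a higher-permittivity capacitor insulating film helps, the sketch below treats a stacked film as parallel-plate capacitors in series and computes an effective relative dielectric constant. The permittivity values (3.9 for silicon oxide, 7.5 for silicon nitride), the 1:2:1 thickness ratio, the capacitance and thickness figures, and all function names are assumptions chosen for illustration; none of them is taken from this description.

```python
EPS0 = 8.854e-12  # vacuum permittivity in F/m


def effective_permittivity(layers):
    """Effective relative permittivity of a stack of dielectric layers.

    layers is a list of (thickness, relative_permittivity) pairs. The
    layers are modeled as parallel-plate capacitors connected in series.
    """
    total_thickness = sum(t for t, _ in layers)
    return total_thickness / sum(t / eps for t, eps in layers)


def area_for_capacitance(capacitance, eps_r, thickness):
    """Plate area (m^2) needed to reach a given capacitance (F)."""
    return capacitance * thickness / (EPS0 * eps_r)


# Assumed oxide/nitride/oxide stack with a 1:2:1 thickness ratio.
ono_stack = [(1.0, 3.9), (2.0, 7.5), (1.0, 3.9)]
eps_eff = effective_permittivity(ono_stack)

# For a fixed capacitance and film thickness, the required electrode
# area shrinks in proportion to 1 / eps_r (assumed 10 fF, 10 nm film).
area_oxide = area_for_capacitance(10e-15, 3.9, 10e-9)
area_stack = area_for_capacitance(10e-15, eps_eff, 10e-9)
```

Under these assumed values the stack falls inside the range of approximately 5 to 5.5 mentioned above, and the electrode area needed for the same capacitance is smaller than for a plain silicon oxide film of the same thickness.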
  • a second interlayer dielectric film 87 is provided on the top electrode 87.
  • the first meandering line 91 is provided on the second interlayer dielectric film 87.
  • the contact plug 96a is provided, penetrating the first interlayer dielectric film 84, the capacitor insulating film 86 and the second interlayer dielectric film 87 so as to connect between the first meandering line 91 and the drain electrode region 93.
  • the capacitance C of the R-C delay is implemented by the stray capacitance associated with the first meandering line 91 and the second meandering line 97. Because both R and C are proportional to the wire lengths of the first meandering line 91 and the second meandering line 97, the delay times t d1 , t d2 can be easily designed by selecting the wire lengths of the first meandering line 91 and the second meandering line 97. Furthermore, we can design the thickness, the cross section, or the resistivity of the first meandering line 91 and the second meandering line 97 so as to achieve the desired values of the delay times t d1 , t d2 .
  • the wire lengths of the first meandering line 91 and the second meandering line 97 are determined depending on the resistivities of the first meandering line 91 and the second meandering line 97 so as to achieve the required values of the delay times t d1 , t d2 .
  • although the first meandering line 91 and the second meandering line 97 are illustrated in Fig. 9, the illustrated meandering topology for the resistor R is a mere example, and other topologies such as a straight line configuration can be used depending upon the required values of the resistor R and the capacitance C.
  • the delineation of extrinsic resistor elements R can be omitted, if parasitic resistance (stray resistance) and parasitic capacitance (stray capacitance) can achieve the required delay times t d1 , t d2 .
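The dependence of the delay times on wire length can be sketched with a simple lumped model, in which the resistance and the stray capacitance of a line both scale linearly with its length, so that the R-C time constant scales with the square of the length. The lumped model itself, the function names, and every numeric parameter below are hypothetical and serve only to illustrate the design procedure described above.

```python
import math


def wire_rc_delay(length, resistivity, width, thickness, cap_per_length):
    """Lumped R-C time constant (s) of a wire of the given length (m)."""
    resistance = resistivity * length / (width * thickness)
    capacitance = cap_per_length * length
    return resistance * capacitance


def length_for_delay(target_delay, resistivity, width, thickness, cap_per_length):
    """Wire length (m) whose lumped R-C time constant equals target_delay."""
    return math.sqrt(
        target_delay * width * thickness / (resistivity * cap_per_length)
    )


# Hypothetical wire parameters, for illustration only.
RHO = 1.0e-5        # resistivity in ohm*m
WIDTH = 0.1e-6      # line width in m
THICK = 0.1e-6      # line thickness in m
C_PER_M = 2.0e-10   # stray capacitance per unit length in F/m

# Length that would give an assumed target delay of 0.1 ns.
length = length_for_delay(1.0e-10, RHO, WIDTH, THICK, C_PER_M)
```

In this model, doubling the wire length quadruples the delay, while changing the cross section or the resistivity rescales the delay linearly, which is why either knob can be used to reach a required t d1 or t d2 .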
  • because the propagation delay is mainly ascribable to the value of the second delay element D ij2 , it is preferable to insert an inter-unit cell B ij between the (j-1)-th bit-level cell M ij-1 and the j-th bit-level cell M ij , as illustrated in Figs. 11 and 13.
  • the inter-unit cell B ij is provided so as to isolate the signal-storage state of the j-th bit-level cell M ij in the j-th memory unit U j from the signal-storage state of the (j-1)-th bit-level cell M ij-1 in the (j-1)-th memory unit U j-1 . Furthermore, the inter-unit cell B ij transfers a signal from the (j-1)-th bit-level cell M ij-1 to the j-th bit-level cell M ij at a required timing determined by a clock signal, which is supplied through the clock signal supply line.
  • a sequence of inter-unit cells arrayed in parallel with the memory units U j-1 and U j transfers the information of byte size or word size, controlled by the clock signal supplied through the clock signal supply line so that the information of byte size or word size can march along a predetermined direction, pari passu. As illustrated in Figs.
  • the inter-unit cell B ij which encompasses a single isolation transistor Q ij3 having a first main-electrode connected to the output terminal of the (j-1)-th bit-level cell M ij-1 , a second main-electrode connected to the input terminal of the j-th bit-level cell M ij and a control electrode connected to the clock signal supply line
  • the structure of the inter-unit cell B ij is not limited to the configuration illustrated in Figs. 11 and 13.
  • the inter-unit cell B ij may be implemented by a clocked-circuit having a plurality of transistors, which can transfer the signal from the (j-1)-th bit-level cell M ij-1 to the j-th bit-level cell M ij at the required timing determined by the clock signal.
  • the j-th bit-level cell M ij encompasses the first nMOS transistor Q ij1 having the drain electrode connected to the clock signal supply line through the first delay element D ij1 and the gate electrode connected to the inter-unit cell B ij through the second delay element D ij2 ; the second nMOS transistor Q ij2 having the drain electrode connected to the source electrode of the first nMOS transistor Q ij1 , the gate electrode connected to the clock signal supply line, and the source electrode connected to the ground potential; and the capacitor C ij configured to store the information of the bit-level cell M ij , connected in parallel with the second nMOS transistor Q ij2 .
  • an example of the planar structure of the inter-unit cell B ij , encompassing a single isolation transistor Q ij3 implemented by an nMOS transistor, is illustrated in Fig. 12, in addition to the configuration of the bit-level cell M ij , which is already illustrated in Fig. 9.
  • the first nMOS transistor Q ij1 having the drain electrode region 93, the first meandering line 91 connected to the drain electrode region 93 via a contact plug 96a, the second meandering line 97 implementing the gate electrode of the first nMOS transistor Q ij1 , and the second nMOS transistor Q ij2 having the drain electrode region implemented by the common n + semiconductor region 94, serving as the output terminal of the bit-level cell M ij , are illustrated.
  • the isolation transistor Q ij3 of the inter-unit cell B ij has a first main-electrode region implemented by a left side of an n + semiconductor region 90, a gate electrode 99 connected to the clock signal supply line, and a second main-electrode region implemented by a right side of the n + semiconductor region 90.
  • the second main-electrode region is connected via a contact plug 96e to one end of the second meandering line 97, opposite to the other end of the second meandering line 97, which serves as the gate electrode of the first nMOS transistor Q ij1 , and the first main-electrode region is connected to the output terminal of the previous cell M ij-1 via a contact plug 96f.
  • a parallel plate structure of the capacitor C ij configured to store the information of the bit-level cell M ij may be provided, being connected in parallel with the second nMOS transistor Q ij2 .
  • another inter-unit cell B i(j-1) is provided between the (j-2)-th bit-level cell M i(j-2) and the (j-1)-th bit-level cell M i(j-1) , configured to isolate the signal-storage state of the (j-1)-th bit-level cell M i(j-1) in the (j-1)-th memory unit U j-1 from the signal-storage state of the (j-2)-th bit-level cell M i(j-2) in the (j-2)-th memory unit U j-2 , and to transfer a signal from the (j-2)-th bit-level cell M i(j-2) to the (j-1)-th bit-level cell M i(j-1) at the required timing determined by the clock signal, which is supplied through the clock signal supply line.
  • the inter-unit cell B i(j-1) which encompasses a single isolation transistor Q i(j-1)3 having a first main-electrode connected to the output terminal of the (j-2)-th bit-level cell M i(j-2) , a second main-electrode connected to the input terminal of the (j-1)-th bit-level cell M i(j-1) and a control electrode connected to the clock signal supply line
  • the structure of the inter-unit cell B i(j-1) is not limited to the configuration illustrated in Fig.
  • the inter-unit cell B i(j-1) may be implemented by a clocked-circuit having a plurality of transistors, which can transfer the signal from the (j-2)-th bit-level cell M i(j-2) to the (j-1)-th bit-level cell M i(j-1) at the required timing determined by the clock signal.
  • the (j-1)-th bit-level cell M i(j-1) encompasses a first nMOS transistor Q i(j-1)1 having a drain electrode connected to the clock signal supply line through a first delay element D i(j-1)1 and a gate electrode connected to the inter-unit cell B i(j-1) through a second delay element D i(j-1)2 ; a second nMOS transistor Q i(j-1)2 having a drain electrode connected to the source electrode of the first nMOS transistor Q i(j-1)1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C i(j-1) configured to store the information of the bit-level cell M i(j-1) , connected in parallel with the second nMOS transistor Q i(j-1)2 .
  • the second nMOS transistor Q ij2 of the bit-level cell M ij serves as a reset-transistor configured to reset the signal charge stored in the capacitor C ij , when the clock signal of high-level (or a logical level of "1") is applied to the gate electrode of the second nMOS transistor Q ij2 , discharging the signal charge already stored in the capacitor C ij
  • the second nMOS transistor Q i(j-1)2 of the bit-level cell M i(j-1) serves as a reset-transistor configured to reset the signal charge stored in the capacitor C i(j-1) , when the clock signal of high-level (or a logical level of "1") is applied to the gate electrode of the second nMOS transistor Q i(j-1)2 , discharging the signal charge already stored in the capacitor C i(j-1) .
  • the isolation transistors Q i(j-1)3 and Q ij3 may be pMOS transistors, which can operate complementarily to the second nMOS transistors Q i(j-1)2 and Q ij2 , although Figs. 11 and 13 represent the isolation transistors Q i(j-1)3 and Q ij3 by the transistor symbol of an nMOS transistor.
  • the isolation transistors Q i(j-1)3 and Q ij3 shall be in the cut-off state so as to establish the isolation between the memory units, and when the second nMOS transistors Q i(j-1)2 and Q ij2 are in the cut-off state, the isolation transistors Q i(j-1)3 and Q ij3 shall be in the conductive state so as to transfer the signal charges between the memory units.
  • if the isolation transistors Q i(j-1)3 and Q ij3 are nMOS transistors, as the transistor symbol illustrates in Figs. 11 and 13, the isolation transistors Q i(j-1)3 and Q ij3 shall be high-speed transistors having a shorter rise time, a shorter period of conductive state, and a shorter fall time than the second nMOS transistors Q i(j-1)2 and Q ij2 , which have larger stray capacitances and larger stray resistances associated with gate circuits and gate structures, so that, when the second nMOS transistors Q i(j-1)2 and Q ij2 are still in the cut-off state, the isolation transistors Q i(j-1)3 and Q ij3 become conductive very rapidly so as to transfer the signal charges between the memory units, and when the second nMOS transistors Q i(j-1)2 and Q ij2 start slowly toward the conductive state for discharging the signal charge stored in the
  • a normally off type MOS static induction transistor can be used, which represents triode-like I-V characteristic.
  • the n-channel MOSSIT can be considered as an extreme ultimate structure of the short channel nMOSFET. Owing to the triode-like I-V characteristic, because the on-state of the MOSSIT depends both on a gate voltage and a potential difference between the first and second main-electrodes, a very short time interval of the on-state can be achieved.
  • any normally off type switching device such as a tunneling SIT, which exhibits a very short on-state period like a Dirac delta function, can be used.
  • Fig. 14(a) illustrates a timing diagram of a response of the bit-level cell M i(j-1) illustrated in Fig. 13, and Fig. 14(b) illustrates a next timing diagram of a next response of the next bit-level cell M ij illustrated in Fig. 13, to a waveform of a clock signal.
  • the clock signal is supposed to swing periodically between the logical levels of "1" and "0" with the clock period TAU clock .
  • the shaded rectangular area with backward diagonals illustrates a regime for a reset timing of the signal charges stored in the capacitors C i(j-1) and C ij , respectively
  • the shaded rectangular area with forward diagonals illustrates a regime for a charge-transfer timing of the signal charges to the capacitors C i(j-1) and C ij , respectively.
  • the signal charge stored in the capacitor C i(j-1) is of the logical level of "1"
  • the first nMOS transistor Q i(j-1)1 still keeps the off-state
  • the signal charge stored in the capacitor C i(j-1) is being discharged, in the shaded rectangular area with backward diagonals.
  • the first nMOS transistor Q i(j-1)1 becomes active as a transfer-transistor, delayed by a predetermined delay time t d1 determined by the first delay element D i(j-1)1 implemented by the R-C delay circuit.
  • the first nMOS transistor Q i(j-1)1 transfers the signal stored in the previous bit-level cell M i(j-2) , further delayed by a predetermined delay time t d2 determined by the second delay element D i(j-1)2 to the capacitor C i(j-1) in the shaded rectangular area with forward diagonals.
  • the signal charge stored in the capacitor C ij is of the logical level of "1"
  • the first nMOS transistor Q ij1 still keeps the off-state
  • the signal charge stored in the capacitor C ij is being discharged, in the shaded rectangular area with backward diagonals.
  • the first nMOS transistor Q ij1 becomes active as a transfer-transistor, delayed by a predetermined delay time t d1 determined by the first delay element D ij1 implemented by the R-C delay circuit.
  • the first nMOS transistor Q ij1 transfers the signal stored in the previous bit-level cell M i(j-1) , further delayed by a predetermined delay time t d2 determined by the second delay element D ij2 to the capacitor C ij in the shaded rectangular area with forward diagonals
  • Fig. 15 illustrates a more detailed response of the bit-level cell M i(j-1) illustrated in Fig. 13, which is one of the bit-level cells used in the computer system pertaining to the first embodiment of the present invention, to the waveform of the clock signal illustrated by thin solid line, for a case that both of the first delay element D i(j-1)1 and the second delay element D i(j-1)2 are implemented by R-C delay circuit, as illustrated in Fig. 12.
  • the clock signal illustrated by thin solid line swings periodically between the logical levels of "1" and "0" with the clock period TAU clock .
  • the signal charge stored in the capacitor C i(j-1) is actually either of the logical level of "0" or "1", as illustrated in Figs. 16 (a)-(d). If the signal charge stored in the capacitor C i(j-1) is of the logical level of "1", as illustrated in Figs.
  • the capacitor C i(j-1) can begin discharging at the beginning of the time interval TAU 1 , because the second nMOS transistor Q i(j-1)2 becomes active when the clock signal of the high-level is applied to the gate electrode of the second nMOS transistor Q i(j-1)2 , under the assumption that an ideal operation of the second nMOS transistor Q i(j-1)2 with no delay can be approximated.
  • the signal charge stored in the capacitor C i(j-1) is actually of the logical level of "1"
  • the clock signal of high-level has been applied to the gate electrode of the second nMOS transistor Q i(j-1)2 , as illustrated by the thin solid line in Fig. 15, and the signal charge stored in the capacitor C i(j-1) will be discharged, and thereafter, the first nMOS transistor Q i(j-1)1 becomes active as a transfer-transistor, delayed by a predetermined delay time t d1 determined by the first delay element D i(j-1)1 implemented by the R-C delay circuit.
  • the change of the potential at the drain electrode of the first nMOS transistor Q i(j-1)1 is illustrated by dash-dotted line.
  • An output node N out connecting the source electrode of the first nMOS transistor Q i(j-1)1 and the drain electrode of the second nMOS transistor Q i(j-1)2 serves as an output terminal of the bit-level cell M i(j-1) , and the output terminal delivers the signal stored in the capacitor C i(j-1) to the next bit-level cell on the i-th row.
  • the second nMOS transistor Q i(j-1)2 begins to discharge the signal charge, which is already stored in the capacitor C i(j-1) at a previous clock cycle. And, after the clock signal of the logical level of "1" is applied and the signal charge stored in the capacitor C i(j-1) is completely discharged to the potential of the logical level of "0", the first nMOS transistor Q i(j-1)1 becomes active as the transfer-transistor, delayed by the predetermined delay time t d1 determined by the first delay element D i(j-1)1 .
  • the first nMOS transistor Q i(j-1)1 transfers the signal stored in the previous bit-level cell M i(j-2) , further delayed by the predetermined delay time t d2 determined by the second delay element D i(j-1)2 implemented by the R-C delay circuit to the capacitor C i(j-1) .
  • the first nMOS transistor Q i(j-1)1 enters the conductive state at the beginning of the time interval TAU 3 , and the logical level of "1" is stored in the capacitor C i(j-1) .
  • the bit-level cell M i(j-1) can establish "a marching AND-gate" operation.
  • the delay time t d2 shall be longer than the delay time t d1 , and the delay time t d2 may preferably be set equal to 1/2 TAU clock .
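The "marching AND-gate" behavior can be pictured with a small logic-level sketch. Since the first nMOS transistor has its drain fed by the clock signal and its gate fed by the signal from the previous bit-level cell, a "1" reaches the capacitor only when both are at the high level. The function names and the purely Boolean abstraction (which ignores rise times, fall times, and analog levels) are assumptions made for illustration.

```python
def marching_and(clock_level, previous_cell_signal):
    """Level written into the capacitor: clock AND previous cell's signal."""
    return clock_level & previous_cell_signal


def delays_satisfy_constraint(t_d1, t_d2):
    """True when the second delay time is longer than the first."""
    return t_d2 > t_d1


def preferred_t_d2(tau_clock):
    """Preferred value of t_d2 stated above: half of the clock period."""
    return tau_clock / 2


# Truth table over all (clock, previous-cell signal) combinations.
truth_table = {
    (clk, sig): marching_and(clk, sig) for clk in (0, 1) for sig in (0, 1)
}
```

A hypothetical design check would call `delays_satisfy_constraint(t_d1, preferred_t_d2(tau_clock))` with the delay values extracted from the layout.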
  • the clock signal swings periodically between the logical levels of "1" and "0" with the clock period TAU clock , as illustrated by the thin solid line; then, when the clock signal becomes the logical level of "0" as time proceeds by 1/2 TAU clock , that is, at the beginning of the time interval TAU 3 , the potential at the drain electrode of the first nMOS transistor Q i(j-1)1 begins to decay as illustrated by the dash-dotted line.
  • the path between the output terminal of the current bit-level cell M i(j-1) and the gate electrode of the first nMOS transistor Q ij1 of the next bit-level cell M ij becomes the cut-off state by the logical level of "0" of the clock signal being applied to the gate electrode of the nMOS transistor, and therefore, the output node N out connecting the source electrode of the first nMOS transistor Q i(j-1)1 and the drain electrode of the second nMOS transistor Q i(j-1)2 cannot deliver the signal transferred from the previous bit-level cell M i(j-2) further to the next bit-level cell M ij like duckpins in the time intervals TAU 3 and TAU 4 , and the signal is blocked from being domino-transferred to the gate electrode of the next first nMOS transistor
  • the potential at the output node N out is kept in a floating state, and the signal states stored in the capacitor C i(j-1) are held.
  • the output node N out connecting the source electrode of the first nMOS transistor Q i(j-1)1 and the drain electrode of the second nMOS transistor Q i(j-1)2 , which serves as the output terminal of the bit-level cell M i(j-1) , can deliver the signal stored in the capacitor C i(j-1) to the next bit-level cell M ij at the next clock cycle because the inter-unit cell B ij enters the conductive state, and the potential at the drain electrode of the first nMOS transistor Q i(j-1)1 increases as illustrated by the dash-dotted line.
  • Figs. 16 (a)-(d) illustrate four modes of signal-transferring operations, respectively, focusing on the bit-level cell M ij illustrated in Figs. 11 and 13. The bit-level cell M ij is one of the bit-level cells arrayed sequentially in the j-th memory unit U j , and the j-th memory unit U j stores information of byte size or word size by the sequence of bit-level cells arrayed sequentially in the j-th memory unit U j .
  • the information of byte size or word size arrayed sequentially marches side by side from a previous memory unit to a next memory unit, pari passu.
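At the level of whole memory units, the marching behavior resembles a word-wide shift: on each clock cycle every memory unit takes over the word held by the previous unit, and all bits of a word move together. The sketch below models this with Python integers standing in for byte-size words. The function names, the three-unit array, and the sample byte values are hypothetical, and the model deliberately ignores the reset and charge-transfer phases inside each clock period.

```python
def march_step(units, incoming_word):
    """Contents of the memory units after one clock cycle.

    Unit j takes the word previously held by unit j-1, and the first
    unit takes the incoming word. All bits of a word move together.
    """
    return [incoming_word] + units[:-1]


def march(units, input_stream):
    """Feed a stream of words in; collect what leaves the last unit."""
    outputs = []
    for word in input_stream:
        outputs.append(units[-1])          # word leaving the last unit
        units = march_step(units, word)    # every word advances one unit
    return units, outputs


# Three initially empty memory units, then three bytes followed by zeros.
final_units, outputs = march([0, 0, 0], [0xA1, 0xB2, 0xC3, 0, 0, 0])
```

In this toy run each byte emerges from the last unit three clock cycles after it entered, in the same order in which it was fed in.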
  • the clock signal is supplied by the clock signal supply line CLOCK so as to swing periodically between the logical levels of "1" and "0" with the clock period TAU clock , while the clock signal supply line CLOCK serves as a power supply line.
  • Figs. 16 (a) and (b) illustrate the cases when the logical level of "0" is stored by the previous clock signal into the capacitor C ij
  • Figs. 16 (c) and (d) illustrate the cases when the logical level of "1" is stored by the previous clock signal into the capacitor C ij as one of the signals in the information of byte size or word size. As illustrated in Fig.
  • the output node N out connecting the source electrode of the first nMOS transistor Q ij1 and the drain electrode of the second nMOS transistor Q ij2 delivers the signal level of "0", which is maintained in the capacitor C ij , to the next bit-level cell on the i-th
  • the j-th bit-level cell M ij encompasses the first nMOS transistor Q ij1 having the drain electrode connected to the clock signal supply line through the first delay element D ij1 and the gate electrode connected to the inter-unit cell B ij through the second delay element D ij2 ; the second nMOS transistor Q ij2 having the drain electrode connected to the source electrode of the first nMOS transistor Q ij1 , the gate electrode connected to the clock signal supply line, and the source electrode connected to the ground potential; and the capacitor C ij configured to store the information of the bit-level cell M ij , connected in parallel with the second nMOS transistor Q ij2 , the features such that the first delay element D ij1 is implemented by a first diode D
  • any p-n junction diode can be represented by an equivalent circuit encompassing resistors including the series resistance such as the diffusion resistance, the lead resistance, the ohmic contact resistance and the spreading resistance, etc., and capacitors including the diode capacitance such as the junction capacitance or the diffusion capacitance, and a single diode or a tandem connection of diodes can serve as "resistive-capacitive delay" or "R-C delay", because the value of "R-C delay" can be made much smaller than the values achieved by the specialized and dedicated R-C elements such as the first meandering line 91 and the second meandering line 97 illustrated in Figs.
  • the operation of the j-th bit-level cell M ij with the inter-unit cell B ij illustrated in Fig. 17 can be preferable to the operation achieved by the configuration illustrated in Fig. 12. That is, the operation of the j-th bit-level cell M ij with the inter-unit cell B ij illustrated in Fig. 17 can approach the ideal delay performance illustrated in Figs. 7A and 7B, in which no rise time or fall time is illustrated, and the waveforms of the pulses are illustrated by ideal rectangular shapes. In addition to the performance by the configuration illustrated in Figs.
  • the configuration implemented by a combination of the j-th bit-level cell M ij with the inter-unit cell B ij illustrated in Fig. 17 can achieve a better isolation between the signal-storage state of the (j-1)-th bit-level cell M i(j-1) and the signal-storage state of the j-th bit-level cell M ij , even if the signal of the lower logical level of "0" stored in the previous bit-level cell M i(j-1) is fed to the gate electrode of the first nMOS transistor Q ij1 through the inter-unit cell B ij .
  • another inter-unit cell B i(j-1) is provided between the (j-2)-th bit-level cell M i(j-2) and the (j-1)-th bit-level cell M i(j-1) , configured to isolate the signal-storage state of the (j-1)-th bit-level cell M i(j-1) in the (j-1)-th memory unit U j-1 from the signal-storage state of the (j-2)-th bit-level cell M i(j-2) in the (j-2)-th memory unit U j-2 , and to transfer a signal from the (j-2)-th bit-level cell M i(j-2) to the (j-1)-th bit-level cell M i(j-1) at the required timing determined by the clock signal, which is supplied through the clock signal supply line.
  • the (j-1)-th bit-level cell M i(j-1) encompasses a first nMOS transistor Q i(j-1)1 having a drain electrode connected to the clock signal supply line through a first delay element D i(j-1)1 and a gate electrode connected to the inter-unit cell B i(j-1) through a second delay element D i(j-1)2 ; a second nMOS transistor Q i(j-1)2 having a drain electrode connected to the source electrode of the first nMOS transistor Q i(j-1)1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C i(j-1) configured to store the information of the bit-level cell M i(j-1) , connected in parallel with the second nMOS transistor Q i(j-1)2 .
  • the first delay element D i(j-1)1 is implemented by a first diode D 1b
  • the second delay element D i(j-1)2 is implemented by a tandem connection of a second diode D 2b and a third diode D 3b .
  • the operation of the (j-1)-th bit-level cell M i(j-1) with the inter-unit cell B i(j-1) illustrated in Fig. 18 is substantially the same as the operation of the configuration illustrated in Fig. 13.
  • the tandem connection of the second diode D 2b and the third diode D 3b can efficiently block the flow of the reverse-directional current, the configuration implemented by a combination of the (j-1)-th bit-level cell M i(j-1) with the inter-unit cell B i(j-1) illustrated in Fig.
  • the delineation of extrinsic resistor elements and capacitor elements can be omitted, if the parasitic resistances and the parasitic capacitances can achieve the required delay times t d1 , t d2 compared with operation speed of the marching main memory. Therefore, in the configuration illustrated in Figs. 11-13 and 16, the first delay elements D i(j-1)1 and D ij1 can be omitted, as illustrated in Figs. 19, 20 and 22.
  • in the bit-level cells used in the computer system pertaining to the first embodiment of the present invention illustrated in Fig. 19, although the j-th bit-level cell M ij encompasses a first nMOS transistor Q ij1 , similar to the configuration illustrated in Fig. 11, the first nMOS transistor Q ij1 has a drain electrode directly connected to the clock signal supply line, and the first delay element D ij1 employed in the configuration illustrated in Fig. 11 is omitted.
  • the feature that the first nMOS transistor Q ij1 has a gate electrode connected to the inter-unit cell B ij through a signal-delay element D ij , which corresponds to the second delay element D ij2 illustrated in Fig.
  • the features that the second nMOS transistor Q ij2 has a drain electrode connected to a source electrode of the first nMOS transistor Q ij1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential, and that a capacitor C ij configured to store the information of the bit-level cell M ij is connected in parallel with the second nMOS transistor Q ij2 , are substantially the same as in the configuration illustrated in Fig. 11.
  • the inter-unit cell B ij is further provided so as to isolate the signal-storage state of the j-th bit-level cell M ij in the j-th memory unit U j from the signal-storage state of the (j-1)-th bit-level cell M ij-1 in the (j-1)-th memory unit U j-1 . Furthermore, the inter-unit cell B ij transfers a signal from the (j-1)-th bit-level cell M ij-1 to the j-th bit-level cell M ij at a required timing determined by a clock signal, which is supplied through the clock signal supply line.
  • the j-th memory unit U j stores information of byte size or word size by the sequence of bit-level cells arrayed in the j-th memory unit U j
  • the (j-1)-th memory unit U j-1 stores information of byte size or word size by the sequence of bit-level cells arrayed in the (j-1)-th memory unit U j-1
  • a sequence of inter-unit cells arrayed in parallel with the memory units U j-1 and U j transfers the information of byte size or word size, controlled by the clock signal supplied through the clock signal supply line so that the information of byte size or word size can march along a predetermined direction, pari passu.
  • another inter-unit cell B i(j-1) is provided between the (j-2)-th bit-level cell M i(j-2) and the (j-1)-th bit-level cell M i(j-1) , configured to isolate the signal-storage state of the (j-1)-th bit-level cell M i(j-1) in the (j-1)-th memory unit U j-1 from the signal-storage state of the (j-2)-th bit-level cell M i(j-2) in the (j-2)-th memory unit U j-2 , and to transfer a signal from the (j-2)-th bit-level cell M i(j-2) to the (j-1)-th bit-level cell M i(j-1) at the required timing determined by the clock signal, which is supplied through the clock signal supply line.
  • the (j-1)-th bit-level cell M i(j-1) encompasses a first nMOS transistor Q i(j-1)1 having a drain electrode directly connected to the clock signal supply line and a gate electrode connected to the inter-unit cell B i(j-1) through a signal-delay element D i(j-1) ; a second nMOS transistor Q i(j-1)2 having a drain electrode connected to the source electrode of the first nMOS transistor Q i(j-1)1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C i(j-1) configured to store the information of the bit-level cell M i(j-1) , connected in parallel with the second nMOS transistor Q i(j-1)2 .
  • the second nMOS transistor Q ij2 of the bit-level cell M ij serves as a reset-transistor configured to reset the signal charge stored in the capacitor C ij , when the clock signal of high-level (or a logical level of "1") is applied to the gate electrode of the second nMOS transistor Q ij2 , discharging the signal charge already stored in the capacitor C ij
  • the second nMOS transistor Q i(j-1)2 of the bit-level cell M i(j-1) serves as a reset-transistor configured to reset the signal charge stored in the capacitor C i(j-1) , when the clock signal of high-level (or a logical level of "1") is applied to the gate electrode of the second nMOS transistor Q i(j-1)2 , discharging the signal charge already stored in the capacitor C i(j-
  • the isolation transistors Q i(j-1)3 and Q ij3 shall be high-speed transistors having a shorter rise time, a shorter period of conductive state, and a shorter fall time than the second nMOS transistors Q i(j-1)2 and Q ij2 , which have larger stray capacitances and larger stray resistances associated with gate circuits and gate structures, so that, when the second nMOS transistors Q i(j-1)2 and Q ij2 are still in the cut-off state, the isolation transistors Q i(j-1)3 and Q ij3 become conductive very rapidly so as to transfer the signal charges between the memory units, and when the second nMOS transistors Q i(j-1)2 and Q ij2 start slowly toward the conductive state for discharging the signal charge stored in the capacitors C i(j-1) and C ij , the isolation transistors Q i(j-1)3 and Q ij3 proceed to become
  • Fig. 21 illustrates a detailed response of the bit-level cell M i(j-1) illustrated in Fig. 20, which is one of other examples of the bit-level cells used in the computer system pertaining to the first embodiment of the present invention, to the waveform of the clock signal illustrated by thin solid line, for a case that the signal-delay element D i(j-1) is implemented by R-C delay circuit.
  • the clock signal illustrated by thin solid line swings periodically between the logical levels of "1" and "0" with the clock period TAU clock .
  • the signal charge stored in the capacitor C i(j-1) is actually either of the logical level of "0" or "1", as illustrated in Figs. 22 (a)-(d). If the signal charge stored in the capacitor C i(j-1) is of the logical level of "1", as illustrated in Figs.
  • the capacitor C i(j-1) can begin discharging at the beginning of the time interval TAU 1 , because the second nMOS transistor Q i(j-1)2 becomes active rapidly when the clock signal of the high-level is applied to the gate electrode of the second nMOS transistor Q i(j-1)2 , under the assumption that an ideal operation of the second nMOS transistor Q i(j-1)2 with no delay can be approximated.
  • the signal charge stored in the capacitor C i(j-1) is actually of the logical level of "1"
  • the clock signal of high-level has been applied to the gate electrode of the second nMOS transistor Q i(j-1)2 , as illustrated by the thin solid line in Fig. 21, and the signal charge stored in the capacitor C i(j-1) will be discharged to the logical level of "0", and at the same time approximately, the first nMOS transistor Q i(j-1)1 is prepared to be active as a transfer-transistor, delayed by a negligibly-short delay time determined by parasitic elements implemented by stray resistance and stray capacitance.
  • the change of the potential at the drain electrode of the first nMOS transistor Q i(j-1)1 is illustrated in exaggerated form by the dash-dotted line.
  • the clock signal swings periodically between the logical levels of "1" and "0", with the clock period TAU clock as illustrated by the thin solid line; then, when the clock signal becomes the logical level of "0" as time proceeds by 1/2 TAU clock , that is, at the beginning of the time interval TAU 3 , the potential at the drain electrode of the first nMOS transistor Q i(j-1)1 begins to decay rapidly as illustrated in exaggerated form by the dash-dotted line.
  • the path between the output terminal of the current bit-level cell M i(j-1) and the gate electrode of the first nMOS transistor Q ij1 of the next bit-level cell M ij becomes the cut-off state by the logical level of "0" of the clock signal being applied to the gate electrode of the nMOS transistor, and therefore, in the time intervals TAU 3 and TAU 4 , the output node N out cannot deliver the signal transferred from the previous bit-level cell M i(j-2) further to the next bit-level cell M ij like duckpins, and the signal is blocked from being domino-transferred to the gate electrode of the next first nMOS transistor Q ij1 .
  • the potential at the output node N out is kept in a floating state, and the signal states stored in the capacitor C i(j-1) are held.
  • the output node N out connecting the source electrode of the first nMOS transistor Q i(j-1)1 and the drain electrode of the second nMOS transistor Q i(j-1)2 , which serves as the output terminal of the bit-level cell M i(j-1) , can deliver the signal stored in the capacitor C i(j-1) to the next bit-level cell M ij at the next clock cycle because the inter-unit cell B ij becomes conductive, and the potential at the drain electrode of the first nMOS transistor Q i(j-1)1 increases as illustrated in exaggerated form by the dash-dotted line.
  • Figs. 22 (a)-(d) illustrate four modes of signal-transferring operations, respectively, focusing on the bit-level cell M ij illustrated in Figs. 19 and 20. The bit-level cell M ij is one of the bit-level cells arrayed sequentially in the j-th memory unit U j , and the j-th memory unit U j stores information of byte size or word size by the sequence of bit-level cells arrayed sequentially in the j-th memory unit U j .
  • the information of byte size or word size arrayed sequentially marches side by side from a previous memory unit to a next memory unit, pari passu.
  • the clock signal is supplied by the clock signal supply line CLOCK so as to swing periodically between the logical levels of "1" and "0" with the clock period TAU clock , while the clock signal supply line CLOCK serves as a power supply line.
  • Figs. 22(a) and (b) illustrate the cases when the logical level of "0" is stored by previous clock signal into the capacitor C ij
  • Figs. 22 (c) and (d) illustrate the cases when the logical level of "1" is stored by previous clock signal into the capacitor C ij as one of the signal in the information of byte size or word size. As illustrated in Fig.
  • the first nMOS transistor Q i(j-1)1 transfers equivalently the logical level of "0" to the capacitor C ij . Then, the output node N out delivers the signal level of "0", which is maintained in the capacitor C ij , to the next bit-level cell as illustrated in Fig. 22(a).
  • a first cell M 11 allocated at the leftmost side on a first row and connected to an input terminal I 1 encompasses a capacitor C 11 configured to store the information, and a marching AND-gate G 11 having one input terminal connected to the capacitor C 11 , the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate G 12 assigned to the adjacent second cell M 12 on the first row.
  • An example of the response to the waveform of the clock signal is illustrated in Fig. 7C.
  • the second cell M 12 on the first row of the gate-level representation of cell array implementing the marching main memory 31 encompasses the capacitor C 12 and a marching AND-gate G 12 , which has one input terminal connected to the capacitor C 12 , the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate G 13 assigned to the adjacent third cell M 13 on the first row.
  • the third cell M 13 on the first row of the gate-level representation of cell array implementing the marching main memory 31 encompasses a capacitor C 13 configured to store the information, and a marching AND-gate G 13 having one input terminal connected to the capacitor C 13 , the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate assigned to the adjacent fourth cell, although the illustration of the fourth cell is omitted.
  • the information stored in the capacitor C 12 is transferred to the capacitor C 13 , assigned to the third cell M 13 , and the capacitor C 13 stores the information, and when the logical value of "1" is fed to the other input terminal of the marching AND-gate G 13 , the information stored in the capacitor C 13 is transferred to the capacitor assigned to the fourth cell.
  • a (n-1)-th cell M 1, n-1 on the first row of the gate-level representation of cell array implementing the marching main memory 31 encompasses a capacitor C 1, n-1 configured to store the information, and a marching AND-gate G 1, n-1 having one input terminal connected to the capacitor C 1, n-1 , the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate G 1n assigned to the adjacent n-th cell M 1n , which is allocated at the rightmost side on the first row and connected to an output terminal O 1 .
  • each of the cells M 11 , M 12 , M 13 ,........., M 1, n-1 , M 1n stores the information, and transfers the information synchronously with the clock signal, step by step, toward the output terminals O 1 , so as to provide the processor 11 with the stored information actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
  • a first cell M 21 allocated at the leftmost side on a second row and connected to an input terminal I 2 encompasses a capacitor C 21 , and a marching AND-gate G 21 having one input terminal connected to the capacitor C 21 , the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate G 22 assigned to the adjacent second cell M 22 on the second row.
  • the second cell M 22 on the second row of the gate-level representation of cell array implementing the marching main memory 31 encompasses the capacitor C 22 and a marching AND-gate G 22 , which has one input terminal connected to the capacitor C 22 , the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate G 23 assigned to the adjacent third cell M 23 on the second row.
  • the third cell M 23 on the second row of the gate-level representation of cell array implementing the marching main memory 31 encompasses a capacitor C 23, and a marching AND-gate G 23 having one input terminal connected to the capacitor C 23 , the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate assigned to the adjacent fourth cell.
  • an (n-1)-th cell M 2, n-1 on the second row of the gate-level representation of cell array implementing the marching main memory 31 encompasses a capacitor C 2, n-1 , and a marching AND-gate G 2, n-1 having one input terminal connected to the capacitor C 2, n-1 , the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate G 2n assigned to the adjacent n-th cell M 2n , which is allocated at the rightmost side on the second row and connected to an output terminal O 2 .
  • each of the cells M 21 , M 22 , M 23 ,........., M 2, n-1 , M 2n on the second row stores the information, and transfers the information synchronously with the clock signal, step by step, toward the output terminal O 2 , so as to provide the processor 11 with the stored information actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
  • each of the cells M 31 , M 32 , M 33 ,........., M 3, n-1 , M 3n on the third row stores the information, and transfers the information synchronously with the clock signal, step by step, toward the output terminals O 3 , so as to provide the processor 11 with the stored information actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
  • a first cell M (m-1),1 allocated at the leftmost side and connected to an input terminal I m-1 , a second cell M (m-1),2 adjacent to the first cell M (m-1),1 , a third cell M (m-1),3 adjacent to the second cell M (m-1),2 , .
  • an (n-1)-th cell M (m-1), n-1 and an n-th cell M (m-1),n , which is allocated at the rightmost side on the (m-1)-th row and connected to an output terminal O m-1 , are aligned.
  • each of the cells M (m-1),1 , M (m-1),2 , M (m-1),3 ,........., M (m-1), n-1 , M (m-1),n on the (m-1)-th row stores the information, and transfers the information synchronously with the clock signal, step by step, toward the output terminal O m-1 , so as to provide the processor 11 with the stored information actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
  • each of the cells M m1 , M m2 , M m3 ,........., M m(n-1) , M mn on the m-th row stores the information, and transfers the information synchronously with the clock signal, step by step, toward the output terminals O m , so as to provide the processor 11 with the stored information actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
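The row-by-row behavior described above can be pictured with a small behavioral model. The following is a minimal sketch, not the circuit of the embodiment: it assumes that each column of cells forms one memory unit holding an m-bit word, and that one clock pulse moves every word one unit toward the output terminals. The name `march_forward` and the use of `None` for an empty unit are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Word = List[int]  # one bit per row, m bits in total (assumed representation)


def march_forward(
    units: List[Optional[Word]], new_word: Optional[Word]
) -> Tuple[List[Optional[Word]], Optional[Word]]:
    """Advance every memory unit by one clock pulse.

    units[0] models U 1 (next to the input terminals) and units[-1]
    models U n (next to the output terminals). Returns the new state
    and the word delivered at the output terminals on this pulse.
    """
    delivered = units[-1]             # word leaving through the output terminals
    shifted = [new_word] + units[:-1]  # every other word moves one unit to the right
    return shifted, delivered


# Feed three 2-bit words into a memory of three units, then drain it.
state: List[Optional[Word]] = [None, None, None]
outputs = []
for word in ([1, 0], [0, 1], [1, 1], None, None, None):
    state, out = march_forward(state, word)
    outputs.append(out)
```

In this model the words leave the memory in the order in which they entered, one per clock pulse, without any address being supplied.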
  • Although one of the examples of the transistor-level configurations of the marching AND-gate G ij is illustrated in Fig. 6, there are various circuit configurations to implement the marching AND-gate, which can be applied to the cell array implementing the marching main memory 31 in the computer system pertaining to the first embodiment of the present invention.
  • Another example of the marching AND-gate G ij which can be applied to the cell array implementing the marching main memory 31, may be a configuration encompassing a CMOS NAND gate and a CMOS inverter connected to the output terminal of the CMOS NAND gate.
  • the configuration encompassing the CMOS NAND gate and the CMOS inverter requires six transistors.
  • the marching AND-gate G ij can be implemented by other circuit configurations such as resistor-transistor logic, or by various semiconductor elements, magnetic elements, superconductor elements, or single quantum elements, etc., which have a function of AND logic.
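The NAND-plus-inverter alternative can be checked at the logic level. This is a hedged sketch that models only Boolean behavior, not transistor timing; the function names are illustrative, and the per-gate transistor counts (four for a two-input static CMOS NAND, two for a CMOS inverter) are the standard figures that add up to the six transistors mentioned above.

```python
def nand(a: int, b: int) -> int:
    """Two-input NAND on logical levels 0 and 1."""
    return 0 if (a == 1 and b == 1) else 1


def inverter(x: int) -> int:
    """Logical inversion of a 0/1 level."""
    return 1 - x


def marching_and(stored: int, clock: int) -> int:
    """AND of the stored bit and the clock, built as NAND followed by an inverter."""
    return inverter(nand(stored, clock))


# Standard static CMOS transistor counts (assumed gate style).
NAND_TRANSISTORS = 4
INVERTER_TRANSISTORS = 2
TOTAL_TRANSISTORS = NAND_TRANSISTORS + INVERTER_TRANSISTORS

truth_table = {(s, c): marching_and(s, c) for s in (0, 1) for c in (0, 1)}
```

The stored bit reaches the output only while the clock is at the logical level of "1", which is the gating behavior the marching AND-gate relies on.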
  • Each marching AND-gate in the vertical sequence G 11 , G 21 , G 31 , ........, G m-1,1 , G m1 implementing the first memory unit U 1 shifts the sequence of signals from input terminals I 1 , I 2 , I 3 ,........., I m-1 , I m to the right along the row direction, or horizontal direction, based on clocks as illustrated in Fig. 7C.
  • each of the vertical sequence of marching AND-gates G 12 , G 22 , G 32 , ........, G m-1,2 , G m2 implementing the second memory unit U 2 shifts the sequence of signals of word size from left to right along row-direction based on clocks
  • each of the vertical sequence of marching AND-gates G 13 , G 23 , G 33 , ........, G m-1,3 , G m3 implementing the third memory unit U 3 shifts the sequence of signals of word size from left to right along row-direction based on clocks, ............,each of the vertical sequence of marching AND-gates G 1,n-1 , G 2,n-1 , G 3,n-1 , ........, G m-1,n-1 , G m,n-1 implementing the (n-1)-th memory unit U n-1 shifts the sequence of signals of word size from left to right along row-direction based on clocks, and each of the vertical sequence of marching AND-gates G 1,n , G
  • FIGs. 3-23 illustrate the marching main memory which stores the information in each of memory units U 1 , U 2 , U 3 ,........., U n-1 , U n and transfers the information synchronously with the clock signal, step by step, from input terminal toward the output terminal
  • Fig. 24 illustrates another marching main memory.
  • each of the memory units U 1 , U 2 , U 3 ,........., U n-1 , U n stores the information including word size of data or instructions, and transfers the information in the reverse direction, synchronously with the clock signal, step by step, toward the output terminals, the information being the resultant data provided from the processor 11 after execution in the ALU 112.
  • Fig. 25(a) illustrates an array of i-th row of the m * n matrix (here, "m" is an integer determined by word size) in a cell-level representation of the other marching main memory illustrated in Fig. 24, which stores the information of bit level in each of cells M i1 , M i2 , M i3 ,........., M i,n-1 , M i,n and transfers the information synchronously with the clock signal, step by step in the reverse direction to the marching main memory illustrated in Figs. 3-23, namely from the output terminal OUT toward the input terminal IN.
  • a bit-level cell M in of the n-th column and on the i-th row, allocated at the rightmost side on the i-th row and connected to an input terminal IN encompasses a first nMOS transistor Q in1 having a drain electrode connected to a clock signal supply line through a first delay element D in1 and a gate electrode connected to the input terminal IN through a second delay element D in2 ; a second nMOS transistor Q in2 having a drain electrode connected to a source electrode of the first nMOS transistor Q in1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C in configured to store the information of the bit-level cell M in , connected in parallel with the second nMOS transistor Q in2 , wherein an output node connecting the source electrode of the first nMOS transistor Q in1 and the drain electrode of the second nMOS transistor Q in2 serves
  • the clock signal swings periodically between the logical levels of "1" and "0", with a predetermined clock period TAU clock , and when the clock signal becomes the logical level of "1", the second nMOS transistor Q in2 begins to discharge the signal charge, which is already stored in the capacitor C in at a previous clock cycle. Then, after the clock signal of the logical level of "1" is applied and the signal charge stored in the capacitor C in is completely discharged to become the logical level of "0", the first nMOS transistor Q in1 becomes active as the transfer transistor, delayed by the predetermined delay time t d1 determined by the first delay element D in1 .
  • the delay time t d1 is preferably set equal to 1/4 TAU clock .
  • the first nMOS transistor Q in1 transfers the signal fed from the input terminal IN, further delayed by the predetermined delay time t d2 determined by the second delay element D in2 , to the capacitor C in .
  • the first nMOS transistor Q in1 becomes conductive state, and the logical level of "1" is stored in the capacitor C in .
  • the bit-level cell M in can establish "a marching AND-gate" operation.
  • the delay time t d2 shall be longer than the delay time t d1 , and the delay time t d2 is preferably set equal to 1/2 TAU clock .
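The relation between the two delay times can be written down as a small timing calculation. This is a minimal sketch under stated assumptions: time is measured from the rising edge of the clock, the preferred values of 1/4 and 1/2 of the clock period are used as defaults, and the function name `cell_timing` and the dictionary keys are hypothetical labels, not terms from the embodiment.

```python
from fractions import Fraction


def cell_timing(tau_clock, td1_fraction=Fraction(1, 4), td2_fraction=Fraction(1, 2)):
    """Return the event times within one clock period, measured from the rising edge.

    tau_clock     : clock period TAU clock (any numeric unit)
    td1_fraction  : t d1 as a fraction of the clock period (preferred value 1/4)
    td2_fraction  : t d2 as a fraction of the clock period (preferred value 1/2)
    """
    t_d1 = td1_fraction * tau_clock
    t_d2 = td2_fraction * tau_clock
    if not t_d2 > t_d1:
        # The text requires t d2 to be longer than t d1.
        raise ValueError("t_d2 must be longer than t_d1")
    return {
        "reset_starts": 0 * tau_clock,       # reset transistor starts discharging the capacitor
        "transfer_transistor_active": t_d1,  # delayed clock reaches the drain of the first transistor
        "previous_signal_arrives": t_d2,     # delayed signal reaches the gate of the first transistor
    }


timing = cell_timing(4)  # a clock period of 4 time units
```

With a clock period of 4 units, the reset starts at 0, the transfer transistor becomes active at 1, and the incoming signal arrives at 2, so the capacitor has been cleared before the new value is written.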
  • a bit-level cell M i(n-1) of the (n-1)-th column and on the i-th row, allocated at the second right side on the i-th row encompasses a first nMOS transistor Q i(n-1)1 having a drain electrode connected to the clock signal supply line through a first delay element D i(n-1)1 and a gate electrode connected to the output terminal of the bit-level cell M in through a second delay element D i(n-1)2 ; a second nMOS transistor Q i(n-1)2 having a drain electrode connected to a source electrode of the first nMOS transistor Q i(n-1)1 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C i(n-1) configured to store the information of the bit-level cell M i(n-1) , connected in parallel with the second nMOS transistor
  • the second nMOS transistor Q i(n-1)2 begins to discharge the signal charge, which is already stored in the capacitor C i(n-1) at a previous clock cycle. As illustrated in Fig. 25(b), the logical value of "1" is kept from time "t" to time "t+1" in the capacitor C i(n-1) .
  • the first nMOS transistor Q i(n-1)1 becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first delay element D i(n-1)1 .
  • the first nMOS transistor Q i(n-1)1 transfers the signal stored in the previous bit-level cell M in , further delayed by the delay time t d2 determined by the second delay element D i(n-1)2 to the capacitor C i(n-1) .
  • the third cell M i3 from the left, on the i-th row, of the reverse-directional marching main memory encompasses a first nMOS transistor Q i31 having a drain electrode connected to the clock signal supply line through a first delay element D i31 and a gate electrode connected to the output terminal of the bit-level cell M i4 (illustration is omitted) through a second delay element D i32 ; a second nMOS transistor Q i32 having a drain electrode connected to a source electrode of the first nMOS transistor Q i31 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C i3 configured to store the information of the bit-level cell M i3 , connected in parallel with the second nMOS transistor Q i32 .
  • the second nMOS transistor Q i32 begins to discharge the signal charge, which is already stored in the capacitor C i3 at a previous clock cycle.
  • the first nMOS transistor Q i31 becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first delay element D i31 .
  • the first nMOS transistor Q i31 transfers the signal stored in the previous bit-level cell M i4 , further delayed by the delay time t d2 determined by the second delay element D i32 , to the capacitor C i3 .
  • a bit-level cell M i2 of the second column from the left, and on the i-th row encompasses a first nMOS transistor Q i21 having a drain electrode connected to the clock signal supply line through a first delay element D i21 and a gate electrode connected to the output terminal of the bit-level cell M i3 through a second delay element D i22 ; a second nMOS transistor Q i22 having a drain electrode connected to a source electrode of the first nMOS transistor Q i21 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C i2 configured to store the information of the bit-level cell M i2 , connected in parallel with the second nMOS transistor Q i22 .
  • the second nMOS transistor Q i22 begins to discharge the signal charge, which is already stored in the capacitor C i2 at a previous clock cycle.
  • the first nMOS transistor Q i21 becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first delay element D i21 .
  • the first nMOS transistor Q i21 transfers the signal stored in the previous bit-level cell M i3 , further delayed by the delay time t d2 determined by the second delay element D i22 to the capacitor C i2 .
  • a bit-level cell M i1 of the first column and on the i-th row which is allocated at the leftmost side on the i-th row and connected to an output terminal OUT, encompasses a first nMOS transistor Q i11 having a drain electrode connected to the clock signal supply line through a first delay element D i11 and a gate electrode connected to the output terminal of the bit-level cell M i2 through a second delay element D i12 ; a second nMOS transistor Q i12 having a drain electrode connected to a source electrode of the first nMOS transistor Q i11 , a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C i1 configured to store the information of the bit-level cell M i1 , connected in parallel with the second nMOS transistor Q i12 .
  • the second nMOS transistor Q i12 begins to discharge the signal charge, which is already stored in the capacitor C i1 at a previous clock cycle.
  • the first nMOS transistor Q i11 becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first delay element D i11 .
  • the first nMOS transistor Q i11 transfers the signal stored in the previous bit-level cell M i2 , further delayed by the delay time t d2 determined by the second delay element D i12 to the capacitor C i1 .
  • the output node connecting the source electrode of the first nMOS transistor Q i11 and the drain electrode of the second nMOS transistor Q i12 delivers the signal stored in the capacitor C i1 to the output terminal OUT.
  • the need to address each of the memory units U 1 , U 2 , U 3 ,........., U n-1 , U n disappears, and the required information heads for its destination unit connected to the edge of the memory.
  • the mechanism of accessing the reverse-directional one-dimensional marching main memory 31 of the first embodiment is a true alternative to existing memory schemes, which start from the addressing mode to read/write information. Therefore, with the reverse-directional one-dimensional marching main memory 31 of the first embodiment, memory access without an addressing mode is considerably simpler than in existing memory schemes.
  • the bit-level cell M ij can establish "a marching AND-gate" operation. Therefore, as illustrated in Fig. 26, in a gate-level representation of the cell array corresponding to the reverse-directional marching main memory 31 illustrated in Fig. 25(a), the n-th bit-level cell M i,n allocated at the rightmost side on the i-th row and connected to an input terminal IN encompasses a capacitor C in configured to store the information, and a marching AND-gate G in having one input terminal connected to the capacitor C in , the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the preceding marching AND-gate G i,n-1 assigned to the adjacent (n-1)-th bit-level cell M i,n-1 on the i-th row.
  • the information stored in the capacitor C in is transferred to a capacitor C i,n-1 , assigned to the adjacent (n-1)-th bit-level cell M i,n-1 on the i-th row, and the capacitor C i,n-1 stores the information.
  • the (n-1)-th bit-level cell M i,n-1 on the i-th row of the reverse-directional marching main memory encompasses the capacitor C i,n-1 and a marching AND-gate G i,n-1 , which has one input terminal connected to the capacitor C i,n-1 , the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the preceding marching AND-gate G i,n-2 assigned to the adjacent (n-2)-th bit-level cell M i,n-2 (illustration is omitted).
  • the third bit-level cell M i3 on the i-th row of the reverse-directional marching main memory encompasses a capacitor C i3 configured to store the information, and a marching AND-gate G i3 having one input terminal connected to the capacitor C i3 , the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the preceding marching AND-gate G i2 assigned to the adjacent second bit-level cell M i2 .
  • the information stored in the capacitor C i3 is transferred to the capacitor C i2 , assigned to the second bit-level cell M i2 , and the capacitor C i2 stores the information.
  • the second bit-level cell M i2 on the i-th row of the reverse-directional marching main memory encompasses the capacitor C i2 configured to store the information, and the marching AND-gate G i2 having one input terminal connected to the capacitor C i2 , the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the preceding marching AND-gate G i1 assigned to the adjacent first bit-level cell M i1 , which is allocated at the leftmost side on the i-th row and connected to an output terminal OUT.
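The reverse-directional row can be modeled in the same spirit as a chain of clock-gated cells. The sketch below is an assumption-laden illustration rather than the circuit of Fig. 26: `row[-1]` stands for the cell M i,n next to the input terminal IN, `row[0]` stands for the cell M i1 next to the output terminal OUT, and a clock level of "0" is modeled as every cell holding its bit. The name `march_reverse` is hypothetical.

```python
from typing import List, Optional, Tuple


def march_reverse(row: List[int], in_bit: int, clock: int) -> Tuple[List[int], Optional[int]]:
    """Apply one clock level to a reverse-directional row of bit-level cells.

    With clock == 1 every stored bit moves one cell to the left, the bit in
    the leftmost cell is delivered to OUT, and in_bit enters from IN on the
    right. With clock == 0 the AND-gates block the transfer and the row holds.
    """
    if clock == 0:
        return list(row), None      # nothing is transferred while the clock is low
    out_bit = row[0]                # leftmost cell drives the output terminal OUT
    return row[1:] + [in_bit], out_bit


# Push a single "1" through a row of three cells.
row = [0, 0, 0]
trace = []
for in_bit, clock in ((1, 1), (0, 0), (0, 1), (0, 1), (0, 1)):
    row, out = march_reverse(row, in_bit, clock)
    trace.append((list(row), out))
```

The "1" enters at the right, moves one cell to the left on each high clock level, is held during the low level, and finally appears at OUT.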
  • the marching main memory 31 used in the computer system pertaining to the first embodiment of the present invention is illustrated in Fig. 27. It differs from existing computer memory, because the marching main memory 31 is purposely designed with the functionality of storage and conveyance of information/data through all of the memory units U 1 , U 2 , U 3 ,........., U n-1 , U n in the marching main memory 31.
  • Marching memory supplies information/data to the processor (CPU) 11 at the same speed as the processor 11.
  • the memory unit streaming time T mus required for transferring information/data through one of the memory units U 1 , U 2 , U 3 ,........., U n-1 , U n in the marching main memory 31 is equal to the clock cycle T cc in the processor 11.
  • the marching main memory 31 stores information/data in each of the memory units U 1 , U 2 , U 3 ,........., U n-1 , U n , and transfers synchronously with the clock signal, step by step, toward the output terminals, so as to provide the processor 11 with the stored information/data so that the arithmetic logic unit 112 can execute the arithmetic and logic operations with the stored information/data.
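The consequence of setting the memory unit streaming time T mus equal to the processor clock cycle T cc can be illustrated with simple arithmetic. This is a back-of-the-envelope sketch only: it assumes that exactly one word of `word_bits` bits leaves the memory per clock cycle, and the function name and the 64-bit, 1 GHz figures are illustrative values, not numbers from the source.

```python
def streaming_bandwidth(word_bits: int, clock_hz: int) -> int:
    """Bits per second delivered when one word is streamed out per clock cycle.

    Assumes T mus == T cc, so the word rate equals the clock frequency.
    """
    words_per_second = clock_hz  # one memory unit advances per processor cycle
    return word_bits * words_per_second


# Illustrative values only: a 64-bit word and a 1 GHz processor clock.
bits_per_second = streaming_bandwidth(64, 1_000_000_000)
bytes_per_second = bits_per_second // 8
```

Under these assumptions the memory delivers 8 gigabytes per second, and the figure scales linearly with both the word size and the clock frequency.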
  • marching memory structure 3 includes the marching main memory 31 of the first embodiment of the present invention.
  • the term of "the marching memory structure 3" means a generic concept of the memory structure including a marching-instruction register file (RF) 22a and a marching-data register file (RF) 22b connected to the ALU 112, which will be explained further in the following second embodiment, and a marching-instruction cache memory 21a and a marching-data cache memory 21b, which will be explained further in the following third embodiment, in addition to the marching main memory 31 used in the computer system pertaining to the first embodiment of the present invention.
  • Fig. 29(a) illustrates a forward data-stream S f flowing from the marching memory structure 3 to the processor 11 and a backward data-stream (reverse data-stream) S b flowing from the processor 11 to the marching memory structure 3, and Fig. 29(b) illustrates bandwidths established between the marching memory structure 3 and the processor 11 assuming that the memory unit streaming time T mus in the marching memory structure 3 is equal to the clock cycle T cc of the processor 11.
  • the scheme of the marching main memory 31 may be considered to be analogous to a magnetic tape system illustrated in Fig. 30 (a), which encompasses a magnetic tape 503, a take-up reel 502 for winding the magnetic tape 503, a supply reel 501 for rewinding and releasing the magnetic tape 503, a read/write header 504 for reading information/data from the magnetic tape 503 or writing information/data to the magnetic tape 503, and a processor 11 connected to the read/write header 504.
  • the magnetic tape 503 moves at high speed from the supply reel 501 toward the take-up reel 502, and information/data stored on the magnetic tape 503, being transferred with the movement of the magnetic tape 503 at high speed, are read by the read/write header 504.
  • the processor 11 connected to the read/write header 504 can execute arithmetic and logic operations with information/data read from the magnetic tape 503. Alternatively, the results of the processing in the processor 11 are sent out to the magnetic tape 503 through the read/write header 504.
  • the extremely high-speed magnetic tape system illustrated in Fig. 30 (a) may correspond to a net marching memory structure 3, including the marching main memory 31 of the first embodiment of the present invention.
  • As illustrated in Figs. 31 (a)-(c), the marching main memory 31 of the first embodiment of the present invention can achieve bidirectional transferring of information/data. That is, Fig. 31 (a) illustrates a forward marching behavior of information/data, in which information/data marches (shifts) side by side toward the right-hand direction (forward direction) in a one-dimensional marching main memory 31, Fig. 31 (b) illustrates a staying state of the one-dimensional marching main memory 31, and Fig. 31 (c) illustrates a reverse-marching behavior of information/data (a backward marching behavior), in which information/data marches (shifts) side by side toward the left-hand direction (reverse direction) in the one-dimensional marching main memory 31.
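The three behaviors of Figs. 31 (a)-(c) can be summarized as one step function with a mode argument. This is a minimal behavioral sketch under assumptions: the memory is a Python list whose index 0 is the leftmost memory unit, and the names `Mode` and `step` are illustrative, not taken from the embodiment.

```python
from enum import Enum
from typing import Any, List, Optional, Tuple


class Mode(Enum):
    FORWARD = "forward"    # march toward the right-hand side
    STAY = "stay"          # hold all information/data in place
    BACKWARD = "backward"  # march toward the left-hand side


def step(units: List[Any], mode: Mode, incoming: Optional[Any] = None) -> Tuple[List[Any], Optional[Any]]:
    """Apply one clock step to a one-dimensional marching memory.

    Returns the new contents and the item that left the memory (None if nothing left).
    """
    if mode is Mode.FORWARD:
        return [incoming] + units[:-1], units[-1]  # rightmost unit is pushed out
    if mode is Mode.BACKWARD:
        return units[1:] + [incoming], units[0]    # leftmost unit is pushed out
    return list(units), None                       # staying state


memory = ["a", "b", "c"]
memory, out_fwd = step(memory, Mode.FORWARD, "x")
memory, out_stay = step(memory, Mode.STAY)
memory, out_bwd = step(memory, Mode.BACKWARD, "y")
```

Keeping the direction as an explicit mode mirrors the idea that the same array of memory units serves the forward data-stream, the staying state, and the backward data-stream.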
  • Figs. 32 and 33 illustrate two examples of the representative arrays of i-th row of the m * n matrix (here, "m" is an integer determined by word size) in a transistor-level representation of the cell array for the bidirectional marching main memory 31, respectively, which can achieve the bidirectional behavior illustrated in Figs. 31 (a)-(c).
  • the bidirectional marching main memory 31 stores the information/data of bit level in each of cells M i1 , M i2 , M i3 ,........., M i,n-1 , M i,n and transfers bi-directionally the information/data synchronously with the clock signal, step by step in the forward direction and/or reverse direction (backward direction) between a first I/O selector 512 and a second I/O selector 513.
  • each of the cells M i1 , M i2 , M i3 ,........., M i,n-1 , M i,n is assigned to memory units U 1 , U 2 , U 3 ,........., U n-1 , U n , respectively. That is, the cell M i1 is assigned as the first bit-level cell in the first memory unit U 1 , and the first memory unit U 1 stores information of byte size or word size by the sequence of bit-level cells arrayed in the first memory unit U 1 .
  • the cell M i2 is assigned as the second bit-level cell in the second memory unit U 2
  • the cell M i3 is assigned as the third bit-level cell in the third memory unit U 3
  • the cell M i,n-1 is assigned as the (n-1)-th bit-level cell in the (n-1)-th memory unit U n-1
  • the cell M i,n is assigned as the n-th bit-level cell in the n-th memory unit U n .
  • the memory units U 2 , U 3 ,........., U n-1 , U n store information of byte size or word size by the sequence of bit-level cells arrayed in the memory units U 2 , U 3 ,........., U n-1 , U n , respectively. Therefore, the bidirectional marching main memory 31 stores the information/data of byte size or word size in each of the memory units U 1 , U 2 , U 3 ,........., U n-1 , U n and transfers bi-directionally the information/data of byte size or word size synchronously with the clock signal, pari passu, in the forward direction and/or reverse direction (backward direction) between a first I/O selector 512 and a second I/O selector 513.
  • a clock selector 511 selects a first clock signal supply line CL1 and a second clock signal supply line CL2.
  • the first clock signal supply line CL1 drives the forward data-stream
  • the second clock signal supply line CL2 drives the backward data-stream
  • each of the first clock signal supply line CL1 and the second clock signal supply line CL2 has logical values of "1" and "0".
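As a rough illustration of how word-sized memory units march pari passu under the selected clock line, the following sketch models the m * n cell matrix as m rows of n bits and shifts every row in lockstep. The function names `march` and `unit`, and the string labels `"CL1"` and `"CL2"` used to represent the selected line, are assumptions made for this example only.

```python
from typing import List


def march(matrix: List[List[int]], selected_line: str, incoming_word: List[int]) -> List[int]:
    """Shift all m rows by one column and return the word that leaves the array.

    "CL1" drives the forward (rightward) stream, "CL2" the backward (leftward) one.
    """
    if selected_line not in ("CL1", "CL2"):
        raise ValueError("selected_line must be 'CL1' or 'CL2'")
    outgoing: List[int] = []
    for i, row in enumerate(matrix):
        if selected_line == "CL1":
            outgoing.append(row[-1])
            matrix[i] = [incoming_word[i]] + row[:-1]
        else:
            outgoing.append(row[0])
            matrix[i] = row[1:] + [incoming_word[i]]
    return outgoing


def unit(matrix: List[List[int]], j: int) -> List[int]:
    """Return the word held in memory unit U_(j+1), i.e. column j of the matrix."""
    return [row[j] for row in matrix]


m, n = 4, 3                                   # 4-bit words, 3 memory units
cells = [[0] * n for _ in range(m)]

march(cells, "CL1", [1, 0, 1, 1])             # first word enters from the left
march(cells, "CL1", [0, 1, 1, 0])             # second word pushes it one unit right
first_unit = unit(cells, 0)
second_unit = unit(cells, 1)

returned = march(cells, "CL2", [0, 0, 0, 0])  # backward step: leftmost word leaves
first_unit_after = unit(cells, 0)
```

Because every row shifts on the same step, the bits of a word stay aligned in one column, which is the sense in which a memory unit moves as a whole.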
  • a first bit-level cell M i1 allocated at the leftmost side on i-th row, being connected to a first I/O selector 512 encompasses a first forward nMOS transistor Q i11f having a drain electrode connected to a first clock signal supply line CL1 through a first forward delay element D i11f and a gate electrode connected to the first I/O selector 512 through a second forward delay element D i12f ; a second forward nMOS transistor Q i12f having a drain electrode connected to a source electrode of the first forward nMOS transistor Q i11f , a gate electrode connected to the first clock signal supply line, and a source electrode connected to the ground potential; and a forward capacitor C i1f configured to store the forward information/data of the cell M i1 , connected in parallel with the second forward nMOS transistor Q i12f , wherein an output node
  • the first bit-level cell M i1 further encompasses a first backward nMOS transistor Q i11g having a drain electrode connected to a second clock signal supply line through a first backward delay element D i11g and a gate electrode connected to the backward output terminal of the bit-level cell M i2 through a second backward delay element D i12g ; a second backward nMOS transistor Q i12g having a drain electrode connected to a source electrode of the first backward nMOS transistor Q i11g , a gate electrode connected to the second clock signal supply line, and a source electrode connected to the ground potential; and a backward capacitor C i1g configured to store the backward information/data of the cell M i1 , connected in parallel with the second backward nMOS transistor Q i12g , wherein an output node connecting the source electrode of the first backward nMOS transistor Q i11g and the drain electrode of the second backward nMOS transistor Q i12g serves
  • a second bit-level cell M i2 allocated at the second from the left side on i-th row, being connected to the bit-level cell M i1 encompasses a first forward nMOS transistor Q i21f having a drain electrode connected to the first clock signal supply line CL1 through a first forward delay element D i21f and a gate electrode connected to the forward output terminal of the bit-level cell M i1 through a second forward delay element D i22f ; a second forward nMOS transistor Q i22f having a drain electrode connected to a source electrode of the first forward nMOS transistor Q i21f , a gate electrode connected to the first clock signal supply line CL1, and a source electrode connected to the ground potential; and a forward capacitor C i2f configured to store the forward information/data of the cell M i2 , connected in parallel with the second forward nMOS transistor Q i22f ,wherein an output node connecting the source electrode of the first forward nMOS transistor Q i21f and
  • the second bit-level cell M i2 further encompasses a first backward nMOS transistor Q i21g having a drain electrode connected to the second clock signal supply line CL2 through a first backward delay element D i21g and a gate electrode connected to the backward output terminal of the bit-level cell M i3 through a second backward delay element D i22g ; a second backward nMOS transistor Q i22g having a drain electrode connected to a source electrode of the first backward nMOS transistor Q i21g , a gate electrode connected to the second clock signal supply line CL2, and a source electrode connected to the ground potential; and a backward capacitor C i2g configured to store the backward information/data of the cell M i2 , connected in parallel with the second backward nMOS transistor Q i22g ,wherein an output node connecting the source electrode of the first backward nMOS transistor Q i21g and the drain electrode of the second backward nMOS transistor Q i22
  • the third bit-level cell M i3 further encompasses a first backward nMOS transistor Q i31g having a drain electrode connected to the second clock signal supply line CL2 through a first backward delay element D i31g and a gate electrode connected to the backward output terminal of the bit-level cell M i4 through a second backward delay element D i32g ; a second backward nMOS transistor Q i32g having a drain electrode connected to a source electrode of the first backward nMOS transistor Q i31g , a gate electrode connected to the second clock signal supply line CL2, and a source electrode connected to the ground potential; and a backward capacitor C i3g configured to store the backward information/data of the cell M i3 , connected in parallel with the second backward nMOS transistor Q i32g ,wherein an output node connecting the source electrode of the first backward nMOS transistor Q i31g and the drain electrode of the second backward nMOS transistor Q i32
  • an (n-1)-th bit-level cell M i(n-1) allocated at the second from the right side on i-th row encompasses a first forward nMOS transistor Q i(n-1)1f having a drain electrode connected to the first clock signal supply line CL1 through a first forward delay element D i(n-1)1f and a gate electrode connected to the forward output terminal of the bit-level cell M i(n-2) (illustration is omitted) through a second forward delay element D i(n-1)2f ; a second forward nMOS transistor Q i(n-1)2f having a drain electrode connected to a source electrode of the first forward nMOS transistor Q i(n-1)1f , a gate electrode connected to the first clock signal supply line CL1, and a source electrode connected to the ground potential; and a forward capacitor C i(n-1)f configured to store the forward information/data of the cell M i(n-1) , connected in parallel with the second forward nMOS transistor
  • the (n-1)-th bit-level cell M i(n-1) further encompasses a first backward nMOS transistor Q i(n-1)1g having a drain electrode connected to the second clock signal supply line CL2 through a first backward delay element D i(n-1)1g and a gate electrode connected to the backward output terminal of next bit-level cell M in through a second backward delay element D i(n-1)2g ; a second backward nMOS transistor Q i(n-1)2g having a drain electrode connected to a source electrode of the first backward nMOS transistor Q i(n-1)1g , a gate electrode connected to the second clock signal supply line CL2, and a source electrode connected to the ground potential; and a backward capacitor C i(n-1)g configured to store the backward information/data of the cell M i(n-1) , connected in parallel with the second backward nMOS transistor Q i(n-1)2g ,wherein an output node connecting
  • a n-th bit-level cell M in allocated at the rightmost side on i-th row encompasses a first forward nMOS transistor Q in1f having a drain electrode connected to the first clock signal supply line CL1 through a first forward delay element D in1f and a gate electrode connected to the forward output terminal of the bit-level cell M i(n-1) through a second forward delay element D in2f ; a second forward nMOS transistor Q in2f having a drain electrode connected to a source electrode of the first forward nMOS transistor Q in1f , a gate electrode connected to the first clock signal supply line CL1, and a source electrode connected to the ground potential; and a forward capacitor C inf configured to store the forward information/data of the cell M in , connected in parallel with the second forward nMOS transistor Q in2f , wherein an output node connecting the source electrode of the first forward nMOS transistor Q in1f and the drain electrode of the second forward nMOS transistor Q in2f serves as a forward output terminal
  • the n-th bit-level cell M in further encompasses a first backward nMOS transistor Q in1g having a drain electrode connected to the second clock signal supply line CL2 through a first backward delay element D in1g and a gate electrode connected to the second I/O selector 513 through a second backward delay element D in2g ; a second backward nMOS transistor Q in2g having a drain electrode connected to a source electrode of the first backward nMOS transistor Q in1g , a gate electrode connected to the second clock signal supply line CL2, and a source electrode connected to the ground potential; and a backward capacitor C ing configured to store the backward information/data of the cell M in , connected in parallel with the second backward nMOS transistor Q in2g ,wherein an output node connecting the source electrode of the first backward nMOS transistor Q in1g and the drain electrode of the second backward nMOS transistor Q in2g serves as a backward output terminal of the cell M in , configured
  • the second forward nMOS transistor Q i12f in the first memory unit U 1 begins to discharge the signal charge, which is already stored in the forward capacitor C i1f in the first memory unit U 1 at a previous clock cycle.
  • the first forward nMOS transistor Q i11f becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first forward delay element D i11f .
  • the first forward nMOS transistor Q i11f transfers the information/data to the forward capacitor C i1f , delayed by the delay time t d2 determined by the second forward delay element D i12f .
  • When the clock signal supplied from the second clock signal supply line CL2 becomes the logical level of "1", the second backward nMOS transistor Q i12b begins to discharge the signal charge, which is already stored in the backward capacitor C i1b at a previous clock cycle. After the clock signal of the logical level of "1", supplied from the second clock signal supply line CL2, is applied and the signal charge stored in the backward capacitor C i1b is completely discharged to become the logical level of "0", the first backward nMOS transistor Q i11b becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first backward delay element D i11b .
  • the first backward nMOS transistor Q i11b transfers the information/data stored in the previous bit-level cell M i2 , further delayed by the delay time t d2 determined by the second backward delay element D i12b to the backward capacitor C i1b .
  • the output node connecting the source electrode of the first backward nMOS transistor Q i11b and the drain electrode of the second backward nMOS transistor Q i12b delivers the information/data stored in the backward capacitor C i1b to the first I/O selector 512.
  • the first forward nMOS transistor Q i21f becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first forward delay element D i21f .
  • the second backward nMOS transistor Q i22b begins to discharge the signal charge, which is already stored in the backward capacitor C i2b at a previous clock cycle.
  • the first backward nMOS transistor Q i21b becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first backward delay element D i21b .
  • the first backward nMOS transistor Q i21b transfers the information/data stored in the previous bit-level cell M i3 , further delayed by the delay time t d2 determined by the second backward delay element D i22b to the backward capacitor C i2b .
  • the second forward nMOS transistor Q i32f in the third memory unit U 3 begins to discharge the signal charge, which is already stored in the forward capacitor C i3f in the third memory unit U 3 at the previous clock cycle.
  • the first forward nMOS transistor Q i31f becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first forward delay element D i31f .
  • the second backward nMOS transistor Q i32b begins to discharge the signal charge, which is already stored in the backward capacitor C i3b at a previous clock cycle.
  • the first backward nMOS transistor Q i31b becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first backward delay element D i31b .
  • the first backward nMOS transistor Q i31b transfers the information/data stored in the previous bit-level cell M i4 , further delayed by the delay time t d2 determined by the second backward delay element D i32b to the backward capacitor C i3b .
  • the second forward nMOS transistor Q i(n-1)2f in the (n-1)-th memory unit U (n-1) begins to discharge the signal charge, which is already stored in the forward capacitor C i(n-1)f in the (n-1)-th memory unit U (n-1) at the previous clock cycle.
  • the first forward nMOS transistor Q i(n-1)1f becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first forward delay element D i(n-1)1f .
  • the second backward nMOS transistor Q i(n-1)2b begins to discharge the signal charge, which is already stored in the backward capacitor C i(n-1)b at a previous clock cycle.
  • the first backward nMOS transistor Q i(n-1)1b becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first backward delay element D i(n-1)1b .
  • the first backward nMOS transistor Q i(n-1)1b transfers the information/data stored in the previous bit-level cell M in , further delayed by the delay time t d2 determined by the second backward delay element D i(n-1)2b to the backward capacitor C i(n-1)b .
  • the second forward nMOS transistor Q in2f in the n-th memory unit U n begins to discharge the signal charge, which is already stored in the forward capacitor C inf in the n-th memory unit U n at the previous clock cycle.
  • the clock signal of the logical level of "1", supplied from the first clock signal supply line CL1, is applied to the second forward nMOS transistor Q in2f , and the signal charge stored in the forward capacitor C inf is completely discharged to become the logical level of "0"
  • the first forward nMOS transistor Q in1f becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first forward delay element D in1f .
  • the first forward nMOS transistor Q in1f transfers the information/data, delayed by the delay time t d2 determined by the second forward delay element D in2f to the forward capacitor C inf .
  • the output node connecting the source electrode of the first forward nMOS transistor Q in1f and the drain electrode of the second forward nMOS transistor Q in2f delivers the information/data, which is entered to the gate electrode of the first forward nMOS transistor Q in1f to the second I/O selector 513.
  • When the clock signal supplied from the second clock signal supply line CL2 becomes the logical level of "1", the second backward nMOS transistor Q in2b begins to discharge the signal charge, which is already stored in the backward capacitor C inb at a previous clock cycle. After the clock signal of the logical level of "1", supplied from the second clock signal supply line CL2, is applied and the signal charge stored in the backward capacitor C inb is completely discharged to become the logical level of "0", the first backward nMOS transistor Q in1b becomes active as the transfer transistor, delayed by the delay time t d1 determined by the first backward delay element D in1b .
  • the first backward nMOS transistor Q in1b transfers the information/data received from the second I/O selector 513, further delayed by the delay time t d2 determined by the second backward delay element D in2b to the backward capacitor C inb .
  • each of the cells M i1 , M i2 , M i3 ,........., M i,(n-1) , M i,n on the i-th row of the bidirectional marching main memory stores the information/data, and transfers bi-directionally the information/data, synchronously with the clock signals supplied respectively from the first clock signal supply line CL1 and the second clock signal supply line CL2, step by step, between the first I/O selector 512 and the second I/O selector 513.
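The per-cell sequence described above (discharge the capacitor on the clock, activate the transfer transistor after t d1, then store the neighbor's data after t d2) can be sketched behaviorally. This is not a circuit simulation. It assumes that t d2 adds to t d1, which is one reading of "further delayed", and it keeps one capacitor list per direction. The names `CellDelays`, `cell_events`, `clock_forward` and `clock_backward` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CellDelays:
    t_d1: float  # delay set by the first delay element
    t_d2: float  # delay set by the second delay element


def cell_events(delays: CellDelays) -> List[Tuple[float, str]]:
    """Ordered events in one cell after its clock line goes to "1" (assumed additive delays)."""
    return [
        (0.0, "second transistor starts discharging the capacitor"),
        (delays.t_d1, "first transistor becomes active as the transfer transistor"),
        (delays.t_d1 + delays.t_d2, "data from the neighbouring cell is stored"),
    ]


def clock_forward(capacitors: List[int], from_selector_512: int) -> int:
    """One CL1 cycle: each forward capacitor is cleared, then takes its left neighbor's bit."""
    held_inputs = [from_selector_512] + capacitors[:-1]  # values held by the delay elements
    to_selector_513 = capacitors[-1]
    for j, incoming in enumerate(held_inputs):
        capacitors[j] = 0          # discharge to logical "0"
        capacitors[j] = incoming   # transfer after the delays
    return to_selector_513


def clock_backward(capacitors: List[int], from_selector_513: int) -> int:
    """One CL2 cycle: each backward capacitor is cleared, then takes its right neighbor's bit."""
    held_inputs = capacitors[1:] + [from_selector_513]
    to_selector_512 = capacitors[0]
    for j, incoming in enumerate(held_inputs):
        capacitors[j] = 0
        capacitors[j] = incoming
    return to_selector_512


event_times = [t for t, _ in cell_events(CellDelays(t_d1=1.0, t_d2=2.0))]

forward_caps = [1, 0, 1]
forward_out = clock_forward(forward_caps, from_selector_512=0)

backward_caps = [1, 1, 0]
backward_out = clock_backward(backward_caps, from_selector_513=1)
```

Capturing `held_inputs` before any capacitor is cleared stands in for the role of the delay elements: each cell receives its neighbor's previous value even though every cell is discharged on the same clock edge.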
  • the bidirectional marching main memory 31 illustrated in Fig.32 stores the information/data of byte size or word size in each of cells U 1 , U 2 , U 3 ,........., U n-1 , U n and transfers bi-directionally the information/data of byte size or word size synchronously with the clock signal, pari passu, in the forward direction and/or reverse direction (backward direction) between a first I/O selector
  • a forward isolation transistor Q i23f is provided so as to isolate the signal-storage state of the second bit-level cell M i2 in the second memory unit U 2 from the signal-storage state of the first bit-level cell M i1 in the first memory unit U 1 ; the forward isolation transistor Q i23f transfers forward a signal from the first bit-level cell M i1 to the second bit-level cell M i2 at a required timing determined by a clock signal, which is supplied through the first clock signal supply line CL1.
  • a backward isolation transistor Q i13b is provided so as to isolate the signal-storage state of the first bit-level cell M i1 in the first memory unit U 1 from the signal-storage state of the second bit-level cell M i2 in the second memory unit U 2 ; the backward isolation transistor Q i13b transfers backward a signal from the second bit-level cell M i2 to the first bit-level cell M i1 at a required timing determined by a clock signal, which is supplied through the second clock signal supply line CL2.
  • a backward isolation transistor Q i23b is provided so as to isolate the signal-storage state of the second bit-level cell M i2 in the second memory unit U 2 from the signal-storage state of the third bit-level cell M i3 (the illustration is omitted) in the third memory unit U 3 ; the backward isolation transistor Q i23b transfers backward a signal from the third bit-level cell M i3 to the second bit-level cell M i2 at a required timing determined by a clock signal, which is supplied through the second clock signal supply line CL2.
  • a forward isolation transistor Q i(n-1)3f is provided so as to isolate the signal-storage state of the (n-1)-th bit-level cell M i(n-1) in the (n-1)-th memory unit U n-1 from the signal-storage state of the (n-2)-th bit-level cell M i(n-2) (the illustration is omitted) in the (n-2)-th memory unit U n-2 (the illustration is omitted), the forward isolation transistor Q i(n-1)3f transfers forward a signal from the (n-2)-th bit-level cell M i(n-2) to the (n-1)-th bit-level cell M i(n-1) at a required timing determined by a clock signal, which is supplied through the first clock signal supply line CL1.
  • a forward isolation transistor Q in3f is provided so as to isolate the signal-storage state of the n-th bit-level cell M in in the n-th memory unit U n from the signal-storage state of the (n-1)-th bit-level cell M in-1 in the (n-1)-th memory unit U n-1 , the forward isolation transistor Q in3f transfers forward a signal from the (n-1)-th bit-level cell M in-1 to the n-th bit-level cell M in at a required timing determined by a clock signal, which is supplied through the first clock signal supply line CL1.
  • a backward isolation transistor Q in3b is provided so as to isolate the signal-storage state of the (n-1)-th bit-level cell M in-1 in the (n-1)-th memory unit U n-1 from the signal-storage state of the n-th bit-level cell M in in the n-th memory unit U n ; the backward isolation transistor Q in3b transfers backward a signal from the n-th bit-level cell M in to the (n-1)-th bit-level cell M in-1 at a required timing determined by a clock signal, which is supplied through the second clock signal supply line CL2.
  • Fig. 34 illustrates an array of i-th row of the m * n matrix (here, "m" is an integer determined by word size) in a gate-level representation of the bidirectional marching main memory 31, which can achieve the random access mode in the bidirectional behavior illustrated in Figs. 31 (a)-(c).
  • the bidirectional marching main memory 31 stores the information/data of bit level in each of cells M i1 , M i2 , M i3 ,........., M i,n-1 , M i,n and transfers bi-directionally the information/data synchronously with the clock signal, step by step in the forward direction and/or reverse direction (backward direction) between a first I/O selector 512 and a second I/O selector 513.
  • a first bit-level cell M i1 allocated at the leftmost side on i-th row and connected to the first I/O selector 512 encompasses a common capacitor C i1 configured to store the information/data, and a forward marching AND-gate G i1f having one input terminal connected to the common capacitor C i1 , the other input supplied with the first clock signal supply line CL1, and an output terminal connected to one input terminal of the next forward marching AND-gate G i2f assigned to the adjacent second bit-level cell M i2 on the i-th row, and a backward marching AND-gate G i1b having one input terminal connected to the common capacitor C i1 , the other input supplied with the second clock signal supply line CL2, and an output terminal connected to the first I/O selector 512.
  • the first clock signal supply line CL1, configured to drive the forward data-stream, and the second clock signal supply line CL2, configured to drive the backward data-stream, are respectively selected by a clock selector 511, and each of the first clock signal supply line CL1 and the second clock signal supply line CL2 has logical values of "1" and "0".
  • when the logical value of "1" of the first clock signal supply line CL1 is fed to the other input terminal of the forward marching AND-gate G i1f , the information/data stored in the common capacitor C i1 is transferred to a common capacitor C i2 , assigned to the adjacent second bit-level cell M i2 , and the common capacitor C i2 stores the information/data.
  • the second bit-level cell M i2 on the i-th row of the bidirectional marching main memory 31 encompasses the common capacitor C i2 configured to store the information/data, a forward marching AND-gate G i2f , which has one input terminal connected to the common capacitor C i2 , the other input supplied with the first clock signal supply line CL1, and an output terminal connected to one input terminal of the next forward marching AND-gate G i3f assigned to the adjacent third bit-level cell M i3 on the i-th row, and the backward marching AND-gate G i2b having one input terminal connected to the common capacitor C i2 , the other input supplied with the second clock signal supply line CL2, and an output terminal connected to one input terminal of the preceding backward marching AND-gate G i1b .
  • the third bit-level cell M i3 on the i-th row encompasses a common capacitor C i3 configured to store the information/data, a forward marching AND-gate G i3f having one input terminal connected to the common capacitor C i3 , the other input supplied with the first clock signal supply line CL1, and an output terminal connected to one input terminal of the next forward marching AND-gate assigned to the adjacent fourth cell, although the illustration of the fourth cell is omitted, and a backward marching AND-gate G i3b having one input terminal connected to the common capacitor C i3 , the other input supplied with the second clock signal supply line CL2, and an output terminal connected to one input terminal of the preceding backward marching AND-gate G i2b assigned to the adjacent second bit-level cell M i2 .
  • the information/data stored in the common capacitor C i2 is transferred to the common capacitor C i3 , assigned to the third bit-level cell M i3 , and the common capacitor C i3 stores the information/data, and when the logical value of "1" of the first clock signal supply line CL1 is fed to the other input terminal of the forward marching AND-gate G i3f , the information/data stored in the common capacitor C i3 is transferred to the capacitor, assigned to the fourth cell.
  • an (n-1)-th bit-level cell M i,(n-1) on the i-th row encompasses a common capacitor C i,(n-1) , configured to store the information/data, and a forward marching AND-gate G i,(n-1)f having one input terminal connected to the common capacitor C i,(n-1) , the other input supplied with the first clock signal supply line CL1, and an output terminal connected to one input terminal of the next forward marching AND-gate G i,nf assigned to the adjacent n-th bit-level cell M i,n , which is allocated at the rightmost side on the i-th row and connected to the second I/O selector 513, and a backward marching AND-gate G i,(n-1)b , which has one input terminal connected to the common capacitor C i,(n-1) , the other input supplied with the second clock signal supply line CL2, and an output terminal connected to one input terminal of the preceding backward marching AND-gate G i,(n-2)
  • an n-th bit-level cell M i,n allocated at the rightmost side on the i-th row and connected to the second I/O selector 513 encompasses a common capacitor C i,n configured to store the information/data, a backward marching AND-gate G inb having one input terminal connected to the common capacitor C in , the other input terminal configured to be supplied with the second clock signal supply line CL2, and an output terminal connected to one input terminal of the preceding backward marching AND-gate G i(n-1)b assigned to the adjacent (n-1)-th bit-level cell M i,n-1 on the i-th row, and a forward marching AND-gate G i,nf having one input terminal connected to the common capacitor C i,n , the other input terminal configured to be supplied with the first clock signal supply line CL1, and an output terminal connected to the second I/O selector 513.
  • the information/data stored in the common capacitor C i2 is transferred to the common capacitor C i1 , assigned to the first bit-level cell M i1 , and the common capacitor C i1 stores the information/data, and when the logical value of "1" of the second clock signal supply line CL2 is fed to the other input terminal of the backward marching AND-gate G i1b , the information/data stored in the common capacitor C i1 is transferred to the first I/O selector 512.
  • each of the cells M i1 , M i2 , M i3 ,........., M i,(n-1) , M i,n on the i-th row of the bidirectional marching main memory stores the information/data, and transfers bi-directionally the information/data, synchronously with the clock signals supplied respectively from the first clock signal supply line CL1 and the second clock signal supply line CL2, step by step, between the first I/O selector 512 and the second I/O selector 513.
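The gate-level row can be sketched with one common capacitor per cell and two AND gates that gate its value with CL1 (forward) or CL2 (backward). The function names `and_gate` and `step`, the return tuple layout, and the rule that only one clock line is high at a time are assumptions of this example, not statements from the specification.

```python
from typing import List, Tuple


def and_gate(a: int, b: int) -> int:
    """Two-input AND on single bits."""
    return a & b


def step(
    caps: List[int], cl1: int, cl2: int, from_512: int = 0, from_513: int = 0
) -> Tuple[List[int], int, int]:
    """Advance the row by one clock step.

    Returns (new capacitor values, bit sent to I/O selector 512, bit sent to I/O selector 513).
    """
    if cl1 and cl2:
        raise ValueError("this sketch assumes only one clock line is high at a time")
    forward_out = [and_gate(c, cl1) for c in caps]   # outputs of the forward marching gates
    backward_out = [and_gate(c, cl2) for c in caps]  # outputs of the backward marching gates
    if cl1:
        new_caps = [from_512] + forward_out[:-1]     # each cell takes its left neighbor's bit
    elif cl2:
        new_caps = backward_out[1:] + [from_513]     # each cell takes its right neighbor's bit
    else:
        new_caps = list(caps)                        # both lines low: staying state
    return new_caps, backward_out[0], forward_out[-1]


row = [1, 0, 1, 1]
after_forward, to_512_a, to_513_a = step(row, cl1=1, cl2=0)
after_backward, to_512_b, to_513_b = step(after_forward, cl1=0, cl2=1)
held, _, _ = step(after_backward, cl1=0, cl2=0)
```

With both clock lines low, every AND gate outputs 0 and the capacitors keep their values, which corresponds to the staying state of Fig. 31 (b).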
  • the bidirectional marching main memory 31 illustrated in Fig.34 stores the information/data of byte size or word size in each of cells U 1 , U 2 , U 3 ,........., U n-1 , U n and transfers bi-directionally the information/data of byte size or word size synchronously with the clock signal, pari passu, in the forward direction and/or reverse direction (backward direction) between a first I/O selector 512 and
  • Fig. 35(a) illustrates a bidirectional transferring mode of instructions in a one-dimensional marching main memory adjacent to a processor, where the instructions move toward the processor, and move from / to the next memory.
  • Fig. 35(b) illustrates a bidirectional transferring mode of scalar data in a one-dimensional marching main memory adjacent to an ALU 112, where the scalar data moves toward the ALU and moves from / to the next memory.
  • Fig. 35(c) illustrates a uni-directional transferring mode of vector/streaming data in a one-dimensional marching main memory adjacent to a pipeline 117, which will be explained in the following third embodiment, where the vector/streaming data moves toward the pipeline 117, and moves from the next memory.
  • the marching main memory 31 used in the computer system pertaining to the first embodiment uses positioning to identify the starting point and ending point of a set of successive memory units U 1 , U 2 , U 3 ,........., U n-1 , U n in vector/streaming data.
  • each item must have a position index similar to conventional address.
  • Fig. 36(a) illustrates a configuration of conventional main memory, in which every memory unit U 1 , U 2 , U 3 ,........., U n-1 , U n is labeled by an address A 1 , A 2 , A 3 ,........., A n-1 , A n , respectively.
  • Fig. 36(b) illustrates a configuration of one-dimensional marching main memory, in which the positioning of individual memory units U 1 , U 2 , U 3 ,........., U n-1 , U n is not always necessary, but is at least necessary to identify the starting point and ending point of a set of successive memory units in vector/streaming data.
  • Fig. 37(a) illustrates an inner configuration of the present one-dimensional marching main memory, in which position indexes like existing addresses are not necessary for scalar instruction I s , but the positioning of individual memory units is at least necessary to identify the starting point and ending point of a set of successive memory units in vector instruction I v , as indicated by the hatched circle.
  • Fig. 37(b) illustrates an inner configuration of the present one-dimensional marching main memory, in which position indexes are not necessary for scalar data "b" and "a". However, as illustrated in Fig. 37(c), position indexes are at least necessary to identify the starting point and ending point of a set of successive memory units in vector/streaming data "o", "p", "q", "r", "s", "t",.........., as indicated by the hatched circle.
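The difference between per-unit addresses and start/end position indexes can be illustrated with a short sketch. The `VectorSpan` type, the `stream` function, the zero-based inclusive indexes and the sample layout of the memory units are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class VectorSpan:
    """Hypothetical pair of position indexes delimiting a run of successive memory units."""
    start: int  # position index of the first unit of the vector/streaming data
    end: int    # position index of the last unit (inclusive)


def stream(units: List[str], span: VectorSpan) -> Iterator[str]:
    """Yield the units of a vector in marching order, using only its start and end indexes."""
    position = span.start
    while position <= span.end:
        yield units[position]
        position += 1


# Scalar data "b" and "a" followed by vector/streaming data "o".."t" (illustrative layout).
units = ["b", "a", "o", "p", "q", "r", "s", "t"]

addresses_in_conventional_memory = len(units)  # one address per memory unit
indexes_for_the_vector = 2                     # only a start index and an end index

streamed = list(stream(units, VectorSpan(start=2, end=7)))
```

In this picture the six vector elements are identified by two indexes rather than six addresses, while the scalar items need no index for the purpose of streaming.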
  • a marching memory family which includes a marching-instruction register file 22a and a marching-data register file 22b connected to the ALU 112, which will be explained in the following second embodiment, and a marching-instruction cache memory 21a and a marching-data cache memory 21b, which will be explained in the following third embodiment
  • each of the main memory, the register file and the cache memory has its own position pointing strategy based on the property of locality of reference.
  • Fig. 38(a) illustrates schematically an example of an overall configuration of present marching main memory implemented by a plurality of pages P i-1,j-1 , P i,j-1 , P i+1,j-1 , P i+2,j-1 , P i-1,j , P i,j , P i+1,j , P i+2,j for vector/streaming data case.
  • FIG. 38(b) illustrates schematically an example of a configuration of the hatched page P i,j , which is implemented by a plurality of files F 1 , F 2 , F 3 , F 4 for vector/streaming data case, and each of the pages P i-1,j-1 , P i,j-1 , P i+1,j-1 , P i+2,j-1 , P i-1,j , P i,j , P i+1,j , P i+2,j can be used for marching cache memories 21a and 21b in the third embodiment.
  • each of the files F 1 , F 2 , F 3 , F 4 is implemented by a plurality of memory units U 1 , U 2 , U 3 ,........., U n-1 , U n for the vector/streaming data case, and each of the files F 1 , F 2 , F 3 , F 4 can be used for the marching register files 22a and 22b in the second embodiment.
  • Fig. 39(a) illustrates schematically an example of an overall configuration of the present marching main memory implemented by a plurality of pages P r-1,s-1 , P r,s-1 , P r+1,s-1 , P r+2,s-1 , P r-1,s , P r,s , P r+1,s , P r+2,s for the programs/scalar data case, where each page has its own position index as an address.
  • each of the pages P r-1,s-1 , P r,s-1 , P r+1,s-1 , P r+2,s-1 , P r-1,s , P r,s , P r+1,s , P r+2,s is implemented by a plurality of files F 1 , F 2 , F 3 , F 4 for the programs/scalar data case.
  • Each of the pages P r-1,s-1 , P r,s-1 , P r+1,s-1 , P r+2,s-1 , P r-1,s , P r,s , P r+1,s , P r+2,s can be used for the marching cache memories 21a and 21b in the third embodiment, where each of the files F 1 , F 2 , F 3 , F 4 has its own position index as an address.
  • each of the files F 1 , F 2 , F 3 , F 4 is implemented by a plurality of memory units U 1 , U 2 , U 3 ,........., U n , U n+1 , U n+2 , U n+3 , U n+4 , U n+5 for the programs/scalar data case.
  • Each of the files F 1 , F 2 , F 3 , F 4 can be used for the marching register files 22a and 22b in the second embodiment, where each of the memory units U 1 , U 2 , U 3 ,........., U n , U n+1 , U n+2 , U n+3 , U n+4 , U n+5 has its own position index n+4, n+3, n+2,........., 5, 4, 3, 2, 1, 0 as an address.
  • Fig. 39(c) represents position pointing strategy for all of the cases by digits in the binary system.
  • the n binary digits identify a single memory unit among 2^n memory units in a memory structure having an equivalent size corresponding to the size of a marching register file.
  • the structure of one page has an equivalent size corresponding to the size of a marching cache memory, which is represented by two digits which identify four files F 1 , F 2 , F 3 , F 4 , while the structure of one marching main memory is represented by three digits which identify eight pages P r-1,s-1 , P r,s-1 , P r+1,s-1 , P r+2,s-1 , P r-1,s , P r,s , P r+1,s , P r+2,s in the marching main memory as illustrated in Fig. 39(a).
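The position pointing strategy described for Fig. 39(c) can be sketched as a split of one binary position index into page, file and unit fields. The following Python sketch is illustrative only: the function names, the concatenation order (page digits first, then file digits, then unit digits) and the example width of four unit digits are assumptions, not details given in the text. Only the digit counts (n digits per unit, two digits for four files, three digits for eight pages) come from the description.

```python
FILE_BITS = 2  # two binary digits identify four files F1..F4 (from the text)
PAGE_BITS = 3  # three binary digits identify eight pages (from the text)


def split_position(index: int, unit_bits: int) -> tuple:
    """Split a flat position index into (page, file, unit) fields.

    Assumed layout (hypothetical): [page | file | unit], unit in the low bits.
    """
    total_bits = PAGE_BITS + FILE_BITS + unit_bits
    if not 0 <= index < (1 << total_bits):
        raise ValueError("position index out of range")
    unit = index & ((1 << unit_bits) - 1)                # n digits -> 2^n units
    file_ = (index >> unit_bits) & ((1 << FILE_BITS) - 1)
    page = index >> (unit_bits + FILE_BITS)
    return page, file_, unit


def join_position(page: int, file_: int, unit: int, unit_bits: int) -> int:
    """Inverse of split_position under the same assumed layout."""
    return (page << (unit_bits + FILE_BITS)) | (file_ << unit_bits) | unit


# Example with n = 4 unit digits: page 5, file 2, unit 3.
print(split_position(0b101_10_0011, unit_bits=4))
```

With this layout, one register-file-sized structure spans 2^n consecutive indexes, one page spans four such structures, and the main memory spans eight pages.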
  • Fig. 40 compares the speed/capability of the conventional computer system without cache with that of the marching main memory 31, configured to be used in the computer system pertaining to the first embodiment of the present invention. That is, Fig. 40(b) illustrates schematically the speed/capability of the marching main memory 31, implemented by one hundred memory units U 1 , U 2 , U 3 ,........., U 100 , and compares it with the speed/capability of the existing memory illustrated in Fig. 40(a).
  • one memory unit time T mue in the conventional computer system is estimated to be equal to one hundred times the memory unit streaming time T mus of the marching main memory 31 pertaining to the first embodiment of the present invention.
  • Fig. 41 compares the speed/capability of the worst case of the existing memory for scalar data or program instructions with that of the marching main memory 31, configured to be used in the computer system pertaining to the first embodiment of the present invention. That is, the hatched portion of Fig. 41(b) illustrates schematically the speed/capability of the marching main memory 31, implemented by one hundred memory units U 1 , U 2 , U 3 ,........., U 100 , and compares it with the speed/capability of the worst case of the existing memory illustrated in Fig. 41(a). In the worst case, we can read out 99 memory units of the marching main memory 31, but they are not available due to a scalar program's requirement.
  • Fig. 42 compares the speed/capability of the typical case of the existing memory for scalar data or program instructions with that of the marching main memory 31, configured to be used in the computer system pertaining to the first embodiment of the present invention. That is, Fig. 42(b) illustrates schematically the speed/capability of the marching main memory 31, implemented by one hundred memory units U 1 , U 2 , U 3 ,........., U 100 , and compares it with the speed/capability of the typical case of the existing memory illustrated in Fig. 42(a). In the typical case, we can read out 99 memory units, but only several memory units are available, as illustrated by the hatched memory units in the existing memory, by speculative data preparation in a scalar program.
  • Fig. 43 compares the speed/capability of the typical case of the existing memory for the scalar data case with that of the marching main memory 31, configured to be used in the computer system pertaining to the first embodiment of the present invention. That is, Fig. 43(b) illustrates schematically the speed/capability of the marching main memory 31, implemented by one hundred memory units U 1 , U 2 , U 3 ,........., U 100 , and compares it with the speed/capability of the existing memory illustrated in Fig. 43(a). Similar to the case illustrated in Figs.
  • Fig. 44 compares the speed/capability of the best case of the existing memory for the streaming data, vector data or program instructions case with that of the marching main memory 31, configured to be used in the computer system pertaining to the first embodiment of the present invention. That is, Fig. 44(b) illustrates schematically the speed/capability of the marching main memory 31, implemented by one hundred memory units U 1 , U 2 , U 3 ,........., U 100 , and compares it with the speed/capability of the best case of the existing memory illustrated in Fig. 44(a). In the best case, we can understand that one hundred memory units of the marching main memory 31 are usable for streaming data and data parallel.
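The worst, typical and best cases compared in Figs. 41 to 44 can be put into rough numbers. The Python sketch below is a simplified model under stated assumptions: the conventional memory is taken to deliver one memory unit per T mue, the marching main memory is taken to stream one hundred units in the same time (T mue = 100 x T mus, as estimated in the text), and only the units a program can actually use are counted. The function name and the value of 5 used for the "several" useful units of the typical case are hypothetical.

```python
from fractions import Fraction


def relative_throughput(useful_units: int, units: int = 100) -> Fraction:
    """Useful units delivered by the marching memory per conventional access.

    Assumed model: T_mue = units * T_mus; the conventional memory delivers
    one unit per T_mue, while the marching memory streams `units` units in
    the same time, of which only `useful_units` are usable by the program.
    """
    if not 0 <= useful_units <= units:
        raise ValueError("useful_units must lie between 0 and units")
    t_mus = Fraction(1)                   # streaming time, arbitrary time unit
    t_mue = units * t_mus                 # conventional memory unit time
    conventional_rate = 1 / t_mue         # units per time, conventional memory
    marching_rate = useful_units / t_mue  # useful units per time, marching
    return marching_rate / conventional_rate


print(relative_throughput(1))    # worst case: 99 of 100 units are not usable
print(relative_throughput(5))    # typical case: "several" units (assumed 5)
print(relative_throughput(100))  # best case: all 100 units are usable
```

In this model the gain over the conventional memory equals the number of usable units per sweep, so the worst case is on a par with the conventional memory and the best case approaches a factor of one hundred.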
  • the memory units can be arranged two-dimensionally on a chip as illustrated in Figs. 45-51 so that various mode of operation can be achieved without a switch/network. According to the two-dimensional marching main memory 31 of the first embodiment illustrated in Figs.
  • the memory units U 11 , U 12 , U 13 ,........., U 1, v-1 , U 1v ; U 21 , U 22 , U 23 ,........., U 2, v-1 , U 2v ; ..........; U u1 , U u2 , U u3 ,........., U u, v-1 , U uv do not require refreshment, because all of these memory units are usually refreshed automatically due to the information-moving scheme (information-marching scheme).
  • Fig. 52(a) shows that the energy consumption in microprocessors can be decomposed into static power consumption and dynamic power consumption.
  • within the dynamic power consumption illustrated in Fig. 52(a), the net and overhead portions of the power consumption are illustrated separately in Fig. 52(b).
  • as illustrated in Fig. 52(c), only the net energy portions are practically necessary to operate a given job in a computer system, so that these pure energy parts represent the least energy consumption needed to run the computer system. This means that the shortest processing time is achieved by the net energy consumption illustrated in Fig. 52(c).
  • the access of instructions in the marching main memory 31 is not necessary, because the instructions move actively by themselves to the processor 11.
  • the access of data in the marching main memory 31 is not necessary, because the data move actively by themselves to the processor 11.
  • Fig. 53 shows an actual energy consumption distribution over a processor including registers and caches in the conventional architecture, estimated by William J. Dally, et al., in "Efficient Embedded Computing", Computer, vol. 41, no. 7, 2008, pp. 27-32.
  • Fig. 53 discloses an estimation of the power consumption distribution over the whole chip only, excluding the wires between chips.
  • the instruction supply power consumption is estimated to be 42%
  • the data supply power consumption is estimated to be 28%
  • the clock and control logic power consumption is estimated to be 24%
  • the arithmetic power consumption is estimated to be 6%.
  • the instruction supply and data supply power consumptions are relatively larger than the clock/control logic power consumption and the arithmetic power consumption, which is ascribable to the inefficiency of cache/register accessing with lots of wires and some software overhead due to the access ways of these caches and registers, in addition to non-refreshment of all the memories, caches and registers.
  • the ratio of the instruction supply power consumption to the data supply power consumption is 3:2, and the ratio of the clock and control logic power consumption to the arithmetic power consumption is 4:1. In accordance with the computer system pertaining to the first embodiment of the present invention illustrated in Fig. 2, we can easily reduce the data supply power consumption to 20% by using the marching main memory 31 at least partly, so that the instruction supply power consumption becomes 30%, while we can increase the arithmetic power consumption to 10%, so that the clock and control logic power consumption becomes 40%. This means that the sum of the instruction supply power consumption and the data supply power consumption can be made 50%, and the sum of the clock and control logic power consumption and the arithmetic power consumption can be made 50%.
  • the instruction supply power consumption becomes 15%
  • the clock and control logic power consumption will become 60%, which means that the sum of the instruction supply power consumption and the data supply power consumption can be made 25%, while the sum of the clock and control logic power consumption and the arithmetic power consumption can be made 75%.
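The percentages in this estimate follow from the two fixed ratios stated above (instruction supply to data supply 3:2, clock and control logic to arithmetic 4:1). The Python sketch below simply recomputes them; the function names are hypothetical, and the data supply share of 10% in the last scenario is not quoted in the text but derived from the stated instruction supply share of 15% via the 3:2 ratio.

```python
from fractions import Fraction

INSTR_TO_DATA = Fraction(3, 2)   # instruction supply : data supply = 3 : 2
CLOCK_TO_ARITH = Fraction(4, 1)  # clock/control logic : arithmetic = 4 : 1


def supply_side(data_pct):
    """Return (instruction supply %, instruction + data supply %)."""
    instr = INSTR_TO_DATA * data_pct
    return instr, instr + data_pct


def logic_side(arith_pct):
    """Return (clock/control %, clock/control + arithmetic %)."""
    clock = CLOCK_TO_ARITH * arith_pct
    return clock, clock + arith_pct


# (data supply %, arithmetic %): the conventional estimate of Fig. 53 and
# the two marching-memory scenarios; the 10% data share is derived (see above).
for data_pct, arith_pct in [(28, 6), (20, 10), (10, 15)]:
    instr, supply_sum = supply_side(data_pct)
    clock, logic_sum = logic_side(arith_pct)
    print(f"instr={instr}% data={data_pct}% clock={clock}% arith={arith_pct}% "
          f"sums={supply_sum}%+{logic_sum}%={supply_sum + logic_sum}%")
```

In each scenario the two sums add up to 100%, which serves as a quick consistency check on the quoted figures.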
  • the conventional computer system dissipates energy as illustrated in the Fig. 54(a) with a relatively large average active time for addressing and read/writing memory units, accompanied by wire delay time, while the present computer system dissipates smaller energy as illustrated in the Fig. 54(b), because the present computer system has a shorter average active smooth time through marching memory, and we could process the same data faster than the conventional computer system with less energy.
  • a computer system pertaining to a second embodiment of the present invention encompasses a processor 11 and a marching main memory 31.
  • the processor 11 includes a control unit 111 having a clock generator 113 configured to generate a clock signal, an arithmetic logic unit (ALU) 112 configured to execute arithmetic and logic operations synchronized with the clock signal, a marching-instruction register file (RF) 22a connected to the control unit 111 and a marching-data register file (RF) 22b connected to the ALU 112.
  • the marching-instruction register file 22a has an array of instruction register units, instruction-register input terminals of the third array configured to receive the stored instruction from the marching main memory 31, and instruction-register output terminals of the third array, configured to store instruction in each of instruction register units and to transfer successively and periodically the stored instruction in each of instruction register units to an adjacent instruction register unit being synchronized with the clock signal from the instruction register units adjacent to the instruction-register input terminals toward the instruction register units adjacent to the instruction-register output terminals, so as to provide actively and sequentially instruction implemented by the stored instruction to the control unit 111 through the instruction-register output terminals so that the control unit 111 can execute operations with the instruction.
  • the marching-data register file 22b has an array of data register units, data-register input terminals of the fourth array configured to receive the stored data from the marching main memory 31, and data-register output terminals of the fourth array, configured to store data in each of the data register units and to transfer successively and periodically the stored data in each of the data register units to an adjacent data register unit, synchronized with the clock signal, from the data register units adjacent to the data-register input terminals toward the data register units adjacent to the data-register output terminals, so as to provide actively and sequentially the data to the ALU 112 through the data-register output terminals so that the ALU 112 can execute operations with the data, although the detailed illustration of the marching-data register file 22b is omitted.
  • a portion of the marching main memory 31 and the marching-instruction register file 22a are electrically connected by a plurality of joint members 54, and remaining portion of the marching main memory 31 and the marching-data register file 22b are electrically connected by another plurality of joint members 54.
  • the resultant data of the processing in the ALU 112 are sent out to the marching-data register file 22b. Therefore, as represented by the bidirectional arrow Φ 24 , data are transferred bi-directionally between the marching-data register file 22b and the ALU 112. Furthermore, the data stored in the marching-data register file 22b are sent out to the marching main memory 31 through the joint members 54. Therefore, as represented by the bidirectional arrow Φ 23 , data are transferred bi-directionally between the marching main memory 31 and the marching-data register file 22b through the joint members 54.
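The marching behaviour of the register files described above (each clock, every register unit hands its content to the adjacent unit, from the input terminals toward the output terminals) can be modelled in software as a simple shift structure. The Python sketch below is a behavioural illustration only, not the circuit of the embodiment; the class name, the method name and the use of `None` for an empty unit are assumptions.

```python
from typing import List, Optional


class MarchingRegisterFile:
    """Behavioural model: a row of register units shifted once per clock."""

    def __init__(self, n_units: int) -> None:
        # units[0] is adjacent to the input terminals,
        # units[-1] is adjacent to the output terminals.
        self.units: List[Optional[str]] = [None] * n_units

    def clock(self, incoming: Optional[str] = None) -> Optional[str]:
        """Advance one clock: emit the last unit, shift, accept new input."""
        outgoing = self.units[-1]
        self.units = [incoming] + self.units[:-1]
        return outgoing


# Three items march in over three clocks, then march out in the same order.
rf = MarchingRegisterFile(3)
outputs = [rf.clock(item) for item in ["a", "b", "c", None, None, None]]
print(outputs)
```

The consumer never issues an address: it only receives whatever reaches the output side on each clock, which is the sense in which the information is provided "actively and sequentially".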
  • a computer system pertaining to a third embodiment of the present invention encompasses a processor 11, a marching-cache memory (21a, 21b) and a marching main memory 31.
  • the processor 11 includes a control unit 111 having a clock generator 113 configured to generate a clock signal, an arithmetic logic unit (ALU) 112 configured to execute arithmetic and logic operations synchronized with the clock signal, a marching-instruction register file (RF) 22a connected to the control unit 111 and a marching-data register file (RF) 22b connected to the ALU 112.
  • the marching-cache memory (21a, 21b) embraces a marching-instruction cache memory 21a and a marching-data cache memory 21b.
  • each of the marching-instruction cache memory 21a and the marching-data cache memory 21b has an array of cache memory units at locations corresponding to a unit of information, cache input terminals of the array configured to receive the stored information from the marching main memory 31, and cache output terminals of the array, configured to store information in each of the cache memory units and to transfer, synchronously with the clock signal, step by step, the information to an adjacent cache memory unit, so as to provide actively and sequentially the stored information to the processor 11 so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
  • a portion of the marching main memory 31 and the marching-instruction cache memory 21a are electrically connected by a plurality of joint members 52, and remaining portion of the marching main memory 31 and the marching-data cache memory 21b are electrically connected by another plurality of joint members 52. Furthermore, the marching-instruction cache memory 21a and the marching-instruction register file 22a are electrically connected by a plurality of joint members 51, and the marching-data cache memory 21b and the marching-data register file 22b are electrically connected by another plurality of joint members 51.
  • the resultant data of the processing in the ALU 112 are sent out to the marching-data register file 22b, and, as represented by the bidirectional arrow Φ 34 , data are transferred bi-directionally between the marching-data register file 22b and the ALU 112. Furthermore, the data stored in the marching-data register file 22b are sent out to the marching-data cache memory 21b through the joint members 51, and, as represented by the bidirectional arrow Φ 33 , data are transferred bi-directionally between the marching-data cache memory 21b and the marching-data register file 22b through the joint members 51.
  • the data stored in the marching-data cache memory 21b are sent out to the marching main memory 31 through the joint members 52, and, as represented by the bidirectional arrow Φ 32 , data are transferred bi-directionally between the marching main memory 31 and the marching-data cache memory 21b through the joint members 52.
  • the ALU 112 in the computer system of the third embodiment may include a plurality of arithmetic pipelines P 1 , P 2 , P 3 ,........., P n configured to receive the stored information through marching register units R 11 , R 12 , R 13 ,........., R 1n ; R 21 , R 22 , R 23 ,........., R 2n , in which data move in parallel with the alignment direction of the arithmetic pipelines P 1 , P 2 , P 3 ,........., P n .
  • marching-vector register units R 11 , R 12 , R 13 ,........., R 1n ; R 21 , R 22 , R 23 ,........., R 2n can be
  • a plurality of marching cache units C 11 , C 12 , C 13 ,........., C 1n ; C 21 , C 22 , C 23 ,........., C 2n ; C 31 , C 32 , C 33 ,........., C 3n can be aligned in parallel.
  • the ALU 112 in the computer system of the third embodiment may include a single processor core 116, and as represented by cross-directional arrows, the information can move from the marching-cache memory 21 to the marching-register file 22, and from the marching-register file 22 to the processor core 116.
  • the resultant data of the processing in the processor core 116 are sent out to the marching-register file 22 so that data are transferred bi-directionally between the marching-register file 22 and the processor core 116.
  • the data stored in the marching-register file 22 are sent out to the marching-cache memory 21 so that data are transferred bi-directionally between the marching-cache memory 21 and the marching-register file 22. In the case of instruction movement, there is no flow in the direction opposite to that of the information to be processed.
  • the ALU 112 in the computer system of the third embodiment may include a single arithmetic pipeline 117, and as represented by cross-directional arrows, the information can move from the marching-cache memory 21 to the marching-vector register file 22v, and from the marching-vector register file 22v to the arithmetic pipeline 117.
  • the resultant data of the processing in the arithmetic pipeline 117 are sent out to the marching-vector register file 22v so that data are transferred bi-directionally between the marching-vector register file 22v and the arithmetic pipeline 117.
  • the data stored in the marching-vector register file 22v are sent out to the marching-cache memory 21 so that data are transferred bi-directionally between the marching-cache memory 21 and the marching-vector register file 22v. In the case of instruction movement, there is no flow in the direction opposite to that of the information to be processed.
  • the ALU 112 in the computer system of the third embodiment may include a plurality of processor cores 116 -1 , 116 -2 , 116 -3 , 116 -4 ,........., 116 -m , and as represented by cross-directional arrows, the information can move from the marching-cache memory 21 to the marching-register file 22, and from the marching-register file 22 to the processor cores 116 -1 , 116 -2 , 116 -3 , 116 -4 ,........., 116 -m .
  • the resultant data of the processing in the processor cores 116 -1 , 116 -2 , 116 -3 , 116 -4 ,........., 116 -m are sent out to the marching-register file 22 so that data are transferred bi-directionally between the marching-register file 22 and the processor cores 116 -1 , 116 -2 , 116 -3 , 116 -4 ,........., 116 -m .
  • the data stored in the marching-register file 22 are sent out to the marching-cache memory 21 so that data are transferred bi-directionally between the marching-cache memory 21 and the marching-register file 22. In the case of instruction movement, there is no flow in the direction opposite to that of the information to be processed.
  • the ALU 112 in the computer system of the third embodiment may include a plurality of arithmetic pipelines 117 -1 , 117 -2 , 117 -3 , 117 -4 ,........., 117 -m , and as represented by cross-directional arrows, the information can move from the marching-cache memory 21 to the marching-vector register file 22v, and from the marching-vector register file 22v to the arithmetic pipelines 117 -1 , 117 -2 , 117 -3 , 117 -4 ,........., 117 -m .
  • the resultant data of the processing in the arithmetic pipelines 117 -1 , 117 -2 , 117 -3 , 117 -4 ,........., 117 -m are sent out to the marching-vector register file 22v so that data are transferred bi-directionally between the marching-vector register file 22v and the arithmetic pipelines 117 -1 , 117 -2 , 117 -3 , 117 -4 ,........., 117 -m .
  • the data stored in the marching-vector register file 22v are sent out to the marching-cache memory 21 so that data are transferred bi-directionally between the marching-cache memory 21 and the marching-vector register file 22v. In the case of instruction movement, there is no flow in the direction opposite to that of the information to be processed.
  • the ALU 112 in the computer system of the third embodiment may include a plurality of arithmetic pipelines 117 -1 , 117 -2 , 117 -3 , 117 -4 ,........., 117 -m , and a plurality of marching cache memories 21 -1 , 21 -2 , 21 -3 , 21 -4 ,........., 21 -m are electrically connected to the marching main memory 31.
  • a first marching-vector register file 22v -1 is connected to the first marching-cache memory 21 -1 .
  • a first arithmetic pipeline 117 -1 is connected to the first marching-vector register file 22v -1 .
  • a second marching-vector register file 22v -2 is connected to the second marching-cache memory 21 -2 .
  • a second arithmetic pipeline 117 -2 is connected to the second marching-vector register file 22v -2 .
  • a third marching-vector register file 22v -3 is connected to the third marching-cache memory 21 -3 .
  • a third arithmetic pipeline 117 -3 is connected to the third marching-vector register file 22v -3 .
  • an m-th marching-vector register file 22v -m is connected to the m-th marching-cache memory 21 -m .
  • an m-th arithmetic pipeline 117 -m is connected to the m-th marching-vector register file 22v -m .
  • the information moves from the marching main memory 31 to the marching cache memories 21 -1 , 21 -2 , 21 -3 , 21 -4 ,........., 21 -m in parallel, from marching cache memories 21 -1 , 21 -2 , 21 -3 , 21 -4 ,........., 21 -m to the marching-vector register files 22v -1 , 22v -2 , 22v -3 , 22v -4 ,........., 22v -m in parallel, and from the marching-vector register files 22v -1 , 22v -2 , 22v -3 , 22v -4 ,........., 22v -m to the arithmetic pipelines 117 -1 , 117 -2 , 117 -3 , 117 -4 ,........., 117 -m in parallel.
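The parallel movement described above, from the marching main memory through m marching cache memories and m marching-vector register files to m arithmetic pipelines, can be sketched as m independent lanes that are all clocked together. The Python model below is a behavioural illustration under assumptions: the function names, the stage lengths of two units each and the use of `None` for an empty unit are hypothetical, and each lane is fed one item per clock from its own stream.

```python
from typing import List, Optional, Tuple


def march(stages: List[List[Optional[int]]], incoming: Optional[int]) -> Optional[int]:
    """Advance one clock: each stage passes its last unit to the next stage."""
    carry = incoming
    for stage in stages:
        stage.insert(0, carry)  # accept the item from the previous stage
        carry = stage.pop()     # hand the oldest item to the next stage
    return carry                # item reaching the arithmetic pipeline


def run_lanes(streams: List[List[int]], cache_len: int = 2,
              reg_len: int = 2) -> List[List[Tuple[int, int]]]:
    """Feed one stream per lane; return (clock, item) arrivals per lane."""
    # One lane = [marching cache, marching-vector register file].
    lanes = [[[None] * cache_len, [None] * reg_len] for _ in streams]
    n_clocks = max(len(s) for s in streams) + cache_len + reg_len
    arrivals: List[List[Tuple[int, int]]] = [[] for _ in streams]
    for t in range(n_clocks):
        for k, stream in enumerate(streams):  # all lanes step on the same clock
            item = stream[t] if t < len(stream) else None
            out = march(lanes[k], item)
            if out is not None:
                arrivals[k].append((t, out))
    return arrivals


print(run_lanes([[1, 2], [3, 4], [5, 6]]))
```

Every lane shows the same latency of cache_len + reg_len clocks, and items arrive in the order in which they left the main memory, so the m pipelines receive their operands in lockstep.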
  • the resultant data of the processing in the arithmetic pipelines 117 -1 , 117 -2 , 117 -3 , 117 -4 ,........., 117 -m are sent out to the marching-vector register files 22v -1 , 22v -2 , 22v -3 , 22v -4 ,........., 22v -m so that data are transferred bi-directionally between the marching-vector register files 22v -1 , 22v -2 , 22v -3 , 22v -4 ,........., 22v -m and the arithmetic pipelines 117 -1 , 117 -2 , 117 -3 , 117 -4 ,........., 117 -m .
  • the data stored in the marching-vector register files 22v -1 , 22v -2 , 22v -3 , 22v -4 ,........., 22v -m are sent out to the marching cache memories 21 -1 , 21 -2 , 21 -3 , 21 -4 ,........., 21 -m so that data are transferred bi-directionally between the marching cache memories 21 -1 , 21 -2 , 21 -3 , 21 -4 ,........., 21 -m and the marching-vector register files 22v -1 , 22v -2 , 22v -3 , 22v -4 ,........., 22v -m , and the data stored in the marching cache memories 21 -1 , 21 -2 , 21 -3 , 21 -4 ,........., 21 -m are sent out to the marching main memory 31 so that data are transferred bi-directionally between the marching main memory 31 and the marching cache memories 21 -1 , 21 -2 , 21 -3 , 21 -4 ,
  • a plurality of conventional cache memories 321 -1 , 321 -2 , 321 -3 , 321 -4 ,........., 321 -m are electrically connected to the conventional main memory 331 through wires and/or buses which implement the von Neumann bottleneck 325.
  • a computer system of a fourth embodiment encompasses a conventional main memory 31s, a mother marching main memory 31 -0 connected to the conventional main memory 31s, and a plurality of processing units 12 -1 , 12 -2 , 12 -3 ,........, configured to communicate with the mother marching main memory 31 -0 so as to implement a high performance computing (HPC) system, which can be used for graphics processing unit (GPU)-based general-purpose computing.
  • the HPC system of the fourth embodiment further includes a control unit 111 having a clock generator 113 configured to generate a clock signal, and a field programmable gate array (FPGA) configured to switch-control operations of the plurality of processing units 12 -1 , 12 -2 , 12 -3 ,.........., optimizing the flow of crunching calculations by running them in parallel, and helping to manage and organize bandwidth consumption.
  • An FPGA is, in essence, a computer chip that can rewire itself for a given task.
  • An FPGA can be programmed with hardware description languages such as VHDL or Verilog.
  • the first processing unit 12 -1 encompasses a first branched-marching main memory 31 -1 , a plurality of first marching cache memories 21 -11 , 21 -12 ,........., 21 -1p electrically connected respectively to the first branched-marching main memory 31 -1 , a plurality of first marching-vector register files 22v -11 , 22v -12 ,........., 22v -1p electrically connected respectively to the first marching cache memories 21 -11 , 21 -12 ,........., 21 -1p , a plurality of first arithmetic pipelines 117 -11 , 117 -12 ,........., 117 -1p electrically connected respectively to the first marching-vector register files 22v -11 , 22v -12 ,........., 22v -1p .
  • each of the mother marching main memory 31 -0 , the first branched-marching main memory 31 -1 , the first marching cache memories 21 -11 , 21 -12 ,........., 21 -1p , and the first marching-vector register files 22v -11 , 22v -12 ,........., 22v -1p encompasses an array of memory units, input terminals of the array and output terminals of the array, configured to store information in each of memory units and to transfer synchronously with the clock signal, step by step, from a side of input terminals toward the output terminals.
  • the information moves from the mother marching main memory 31 -0 to the first branched-marching main memory 31 -1 , from the first branched-marching main memory 31 -1 to the first marching cache memories 21 -11 , 21 -12 ,........., 21 -1p in parallel, from first marching cache memories 21 -11 , 21 -12 ,........., 21 -1p to the first marching-vector register files 22v -11 , 22v -12 ,........., 22v -1p in parallel, and from the first marching-vector register files 22v -11 , 22v -12 ,........., 22v -1p to the first a
  • the resultant data of the processing in the first arithmetic pipelines 117 -11 , 117 -12 ,........., 117 -1p are sent out to the first marching-vector register files 22v -11 , 22v -12 ,........., 22v -1p so that data are transferred bi-directionally between the first marching-vector register files 22v -11 , 22v -12 ,........., 22v -1p and the first arithmetic pipelines 117 -11 , 117 -12 , ........., 117 -1p .
  • the data stored in the first marching-vector register files 22v -11 , 22v -12 , ........., 22v -1p are sent out to the first marching cache memories 21 -11 , 21 -12 ,........., 21 -1p so that data are transferred bi-directionally between the first marching cache memories 21 -11 , 21 -12 , ........., 21 -1p and the first marching-vector register files 22v -11 , 22v -12 , ........., 22v -1p , and the data stored in the first marching cache memories 21 -11 , 21 -12 , ........., 21 -1p are sent out to the first branched-marching main memory 31 -1 so that data are transferred bi-directionally between the first branched-marching main memory 31 -1 and the first marching cache memories 21 -11 , 21 -12 , ........., 21 -1p .
  • the FPGA controls the movement of instructions such that there is no flow along the opposite direction of the information to be processed in the first march
  • the second processing unit 12 -2 encompasses a second branched-marching main memory 31 -2 , a plurality of second marching cache memories 21 -21 , 21 -22 ,........., 21 -2q electrically connected respectively to the second branched-marching main memory 31 -2 , a plurality of second marching-vector register files 22v -21 , 22v -22 ,........., 22v -2q electrically connected respectively to the second marching cache memories 21 -21 , 21 -22 ,........., 21 -2q , and a plurality of second arithmetic pipelines 117 -21 , 117 -22 ,........., 117 -2q electrically connected respectively to the second marching-vector register files 22v -21 , 22v -22 ,........., 22v -2q .
  • each of the mother marching main memory 31 -0 , the second branched-marching main memory 31 -2 , the second marching cache memories 21 -21 , 21 -22 ,........., 21 -2q , and the second marching-vector register files 22v -21 , 22v -22 ,........., 22v -2q encompasses an array of memory units, input terminals of the array and output terminals of the array, configured to store information in each of the memory units and to transfer it synchronously with the clock signal, step by step, from a side of the input terminals toward the output terminals.
  • the information moves from the mother marching main memory 31 -0 to the second branched-marching main memory 31 -2 , from the second branched-marching main memory 31 -2 to the second marching cache memories 21 -21 , 21 -22 ,........., 21 -2q in parallel, from the second marching cache memories 21 -21 , 21 -22 ,........., 21 -2q to the second marching-vector register files 22v -21 , 22v -22 ,........., 22v -2q in parallel, and from the second marching-vector register files 22v -21 , 22v -22 ,........., 22v -2q to the
  • the resultant data of the processing in the second arithmetic pipelines 117 -21 , 117 -22 ,........., 117 -2q are sent out to the second marching-vector register files 22v -21 , 22v -22 ,........., 22v -2q so that data are transferred bi-directionally between the second marching-vector register files 22v -21 , 22v -22 ,........., 22v -2q and the second arithmetic pipelines 117 -21 , 117 -22 , ........., 117 -2q .
  • the data stored in the second marching-vector register files 22v -21 , 22v -22 , ........., 22v -2q are sent out to the second marching cache memories 21 -21 , 21 -22 ,........., 21 -2q so that data are transferred bi-directionally between the second marching cache memories 21 -21 , 21 -22 , ........., 21 -2q and the second marching-vector register files 22v -21 , 22v -22 , ........., 22v -2q , and the data stored in the second marching cache memories 21 -21 , 21 -22 , ........., 21 -2q are sent out to the second branched-marching main memory 31 -2 so that data are transferred bi-directionally between the second branched-marching main memory 31 -2 and the second marching cache memories 21 -21 , 21 -22 , ........., 21 -2q .
  • the FPGA controls the movement of instructions such that there is no flow along the opposite direction of the information to be processed in the second march
  • vector instructions generated from loops in a source program are transferred from the mother marching main memory 31 -0 to the first processing unit 12 -1 , the second processing unit 12 -2 , the third processing unit 12 -3 , .......... in parallel, so that parallel processing of these vector instructions can be executed by arithmetic pipelines 117 -11 , 117 -12 , ........., 117 -1p , 117 -21 , 117 -22 , ........., 117 -2q , .......... in each of the first processing unit 12 -1 , the second processing unit 12 -2 , the third processing unit 12 -3 ,........
  • the FPGA-controlled HPC system pertaining to the fourth embodiment can execute, for example, thousands of threads or more simultaneously at very high speed, enabling high computational throughput across large amounts of data.
  • a computer system pertaining to a fifth embodiment of the present invention encompasses a processor 11, a stack of marching-register files 22 -1 , 22 -2 , 22 -3 , ........, implementing a three-dimensional marching-register file connected to the processor 11, a stack of marching-cache memories 21 -1 , 21 -2 , 21 -3 , ........, implementing a three-dimensional marching-cache memory connected to the three-dimensional marching-register file (22 -1 , 22 -2 , 22 -3 , ........), and a stack of marching main memories 31 -1 , 31 -2 , 31 -3 , ........., implementing a three-dimensional marching main memory connected to the three-dimensional marching-cache (21 -1 , 21 -2 , 21 -3 , ........).
  • the processor 11 includes a control unit 111 having a clock generator 113 configured to generate a clock signal, an arithm
  • a first marching-register file 22 -1 includes a first marching-instruction register file 22a -1 connected to the control unit 111 and a first marching-data register file 22b -1 connected to the ALU 112
  • a second marching-register file 22 -2 includes a second marching-instruction register file connected to the control unit 111 and a second marching-data register file connected to the ALU 112
  • a third marching-register file 22 -3 includes a third marching-instruction register file connected to the control unit 111 and a third marching-data register file connected to the ALU 112, and,.........
  • the first marching-cache memory 21 -1 includes a first marching-instruction cache memory 21a -1 and a first marching-data cache memory 21b -1
  • the second marching-cache memory 21 -2 includes a second marching-instruction cache memory and a second marching-data cache memory
  • the third marching-cache memory 21 -3 includes a third marching-instruction cache memory and a third marching-data cache memory
  • each of the marching main memories 31 -1 , 31 -2 , 31 -3 , ........ has a two-dimensional array of memory units each having a unit of information, input terminals of the main memory array and output terminals of the main memory array
  • each of the marching main memories 31 -1 , 31 -2 , 31 -3 , ........ stores the information in each of the memory units and transfers the information synchronously with the clock signal, step by step, toward the output terminals of the main memory array, so as to provide the three-dimensional marching-cache (21 -1 , 21 -2 , 21 -3 , ........) with the stored information actively and sequentially
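The step-by-step transfer described above can be pictured with a toy software model. The following is only a minimal sketch, assuming that each column of memory units holds one word and that every clock cycle moves each word exactly one column toward the output terminals; the class name `MarchingMemory` and the helper `march` are hypothetical names introduced for illustration and do not appear in the source.

```python
from collections import deque
from typing import Any, List, Optional, Sequence


class MarchingMemory:
    """Toy model of one marching memory: a row of columns that shift on each clock."""

    def __init__(self, n_columns: int) -> None:
        # Index 0 is the input-terminal side, index -1 is the output-terminal side.
        self.columns: deque = deque([None] * n_columns, maxlen=n_columns)

    def tick(self, incoming: Optional[Any] = None) -> Optional[Any]:
        """One clock cycle: emit the output-side word and accept a new input word."""
        outgoing = self.columns[-1]
        # appendleft on a full bounded deque drops the rightmost (output-side) item,
        # so every stored word moves one column toward the output.
        self.columns.appendleft(incoming)
        return outgoing


def march(words: Sequence[Any], n_columns: int) -> List[Optional[Any]]:
    """Feed the words in, then keep clocking until the memory is empty again."""
    memory = MarchingMemory(n_columns)
    feed = list(words) + [None] * n_columns
    return [memory.tick(word) for word in feed]
```

In this model no address is ever presented to the memory: words leave in the order they entered, one per clock cycle, after a latency equal to the number of columns.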
  • each of the marching-cache memories 21 -1 , 21 -2 , 21 -3 , ........ has a two-dimensional array of cache memory units, cache input terminals of the marching-cache array configured to receive the stored information from the three-dimensional marching main memory
  • Each of the marching main memories 31 -1 , 31 -2 , 31 -3 , ........ is implemented by the two-dimensional array of memory units delineated at a surface of a semiconductor chip, and a plurality of the semiconductor chips are stacked vertically as illustrated in 27A, sandwiching heat dissipating plates 58m -1 , 58m -2 , 58m -3 , ......... between the plurality of the semiconductor chips so as to implement the three-dimensional marching main memory (31 -1 , 31 -2 , 31 -3 , .........).
  • each of the marching-cache memories 21 -1 , 21 -2 , 21 -3 , ........, is implemented by the two-dimensional array of memory units delineated at a surface of a semiconductor chip, and a plurality of the semiconductor chips are stacked vertically as illustrated in 27B, sandwiching heat dissipating plates 58c -1 , 58c -2 , 58c -3 , ........., between the plurality of the semiconductor chips so as to implement the three-dimensional marching-cache (21 -1 , 21 -2 , 21 -3 , ........), and each of the marching-register files 22 -1 , 22 -2 , 22 -3 ,........, is implemented by the two-dimensional array of memory units delineated at a surface of a semiconductor chip, and a plurality of
  • the heat dissipating plates 58c -1 , 58c -2 , 58c -3 , ........, 58r -1 , 58r -2 , 58r -3 , ........., are made of materials having high thermal conductivity such as diamond. Because there are no interconnects inside the surfaces of the semiconductor chips in the three-dimensional configuration illustrated in Figs.
  • 65(a)-(c) and 66 is suitable for establishing the thermal flow from active computing semiconductor chips through the heat dissipating plates 58m -1 , 58m -2 , 58m -3 , ........., 58c -1 , 58c -2 , 58c -3 , ........, 58r -1 , 58r -2 , 58r -3 , ........., to outside the system more effectively. Therefore, in the computer system of the fifth embodiment, these semiconductor chips can be stacked proportionally to the scale of the system, and as illustrated in Figs.
  • the three-dimensional marching main memory (31 -1 , 31 -2 , 31 -3 , .........) and the three-dimensional marching-cache (21 -1 , 21 -2 , 21 -3 , ........) are electrically connected by a plurality of joint members
  • the three-dimensional marching-cache (21 -1 , 21 -2 , 21 -3 , ........) and the three-dimensional marching-register file (22 -1 , 22 -2 , 22 -3 , ........) are electrically connected by a plurality of joint members
  • the three-dimensional marching-register file (22 -1 , 22 -2 , 22 -3 , ........) and processor 11 are electrically connected by another plurality of joint members.
  • the resultant data of the processing in the ALU 112 are sent out to the three-dimensional marching-register file (22 -1 , 22 -2 , 22 -3 , ........) through the joint members so that data are transferred bi-directionally between the three-dimensional marching-register file (22 -1 , 22 -2 , 22 -3 , ........) and the ALU 112.
  • the data stored in the three-dimensional marching-register file (22 -1 , 22 -2 , 22 -3 , ........) are sent out to the three-dimensional marching-cache (21 -1 , 21 -2 , 21 -3 , ........) through the joint members so that data are transferred bi-directionally between the three-dimensional marching-cache (21 -1 , 21 -2 , 21 -3 , ........) and the three-dimensional marching-register file (22 -1 , 22 -2 , 22 -3 , ........).
  • the data stored in the three-dimensional marching-cache (21 -1 , 21 -2 , 21 -3 , ........) are sent out to the three-dimensional marching main memory (31 -1 , 31 -2 , 31 -3 , .........) through the joint members so that data are transferred bi-directionally between the three-dimensional marching main memory (31 -1 , 31 -2 , 31 -3 , .........) and the three-dimensional marching-cache (21 -1 , 21 -2 , 21 -3 , ........).
  • vector instructions generated from loops in a source program are transferred from the three-dimensional marching main memory (31 -1 , 31 -2 , 31 -3 , .........) to the control unit 111 through the three-dimensional marching-cache (21 -1 , 21 -2 , 21 -3 , ........) and the three-dimensional marching-register file (22 -1 , 22 -2 , 22 -3 , ........) so that each of these vector instructions can be executed by arithmetic pipelines in the control unit 111.
  • the computer system of the fifth embodiment can achieve much higher processing speed and lower power consumption than the conventional computer system, keeping the computer system at a lower temperature than the conventional computer system so as to establish "a cool computer", by employing the heat dissipating plates 58m -1 , 58m -2 , 58m -3 , ........., 58c -1 , 58c -2 , 58c -3 , ........, 58r -1 , 58r -2 , 58r -3 , ........., which are made of materials having high thermal conductivity such as diamond and disposed between the semiconductor chips.
  • the cool computer pertaining to the fifth embodiment is different from existing computers because the cool computer is purposely architected and designed with an average of 30% less energy consumption and 10000% less size to obtain 100 times higher speed, for example.
  • the configurations illustrated in Figs. 64, 65(a), 65(b) and 65(c) are mere examples, and there are various ways and combinations to implement three-dimensional configurations so as to facilitate the organization of a scalable computer system.
  • a first chip (top chip) merging a plurality of arithmetic pipelines 117 and a plurality of marching-register files 22, a second chip (middle chip) merging a marching-cache memory 21 and a third chip (bottom chip) merging a marching main memory 31 can be stacked vertically.
  • Each of the arithmetic pipelines 117 may include a vector-processing unit, and each of the marching-register files 22 may include marching-vector registers.
  • between the first and second chips, a plurality of joint members 55a are inserted, and between the second and third chips, a plurality of joint members 55b are inserted.
  • each of joint members 55a and 55b may be implemented by an electrical conductive bump such as a solder ball, a gold (Au) bump, a silver (Ag) bump, a copper (Cu) bump, a nickel-gold (Ni-Au) alloy bump or a nickel-gold-indium (Ni-Au-In) alloy bump.
  • a first three-dimensional (3D)-stack embracing a first top chip, a first middle chip and first bottom chip and a second 3D-stack embracing a second top chip, a second middle chip and second bottom chip may be disposed two dimensionally on a same substrate or a same circuit board so as to implement a parallel computing with multiple processors, in which the first 3D-stack and the second 3D-stack are connected by bridges 59a and 59b.
  • a first top chip merging a plurality of first arithmetic pipelines 117 -1 and a plurality of first marching-register files 22 -1 , a first middle chip merging a first marching-cache memory 21 -1 and a first bottom chip merging a first marching main memory 31 -1 are 3D-stacked vertically.
  • Each of the first arithmetic pipelines 117 -1 may include a vector-processing unit, and each of the first marching-register files 22 -1 may include marching-vector registers.
  • each of joint members 55a -1 and 55b -1 may be implemented by an electrical conductive bump such as a solder ball, a gold (Au) bump, a silver (Ag) bump, a copper (Cu) bump, a nickel-gold (Ni-Au) alloy bump or a nickel-gold-indium (Ni-Au-In) alloy bump.
  • a second top chip merging a plurality of second arithmetic pipelines 117 -2 and a plurality of second marching-register files 22 -2 , a second middle chip merging a second marching-cache memory 21 -2 and a second bottom chip merging a second marching main memory 31 -2 are 3D-stacked vertically.
  • Each of the second arithmetic pipelines 117 -2 may include a vector-processing unit, and each of the second marching-register files 22 -2 may include marching-vector registers.
  • each of joint members 55a -2 and 55b -2 may be implemented by an electrical conductive bump such as a solder ball, a gold (Au) bump, a silver (Ag) bump, a copper (Cu) bump, a nickel-gold (Ni-Au) alloy bump or a nickel-gold-indium (Ni-Au-In) alloy bump.
  • heat-dissipating plates can be inserted between the first top and first middle chips, between the first middle and first bottom chips, between the second top and second middle chips and between the second middle and second bottom chips similar to the configuration illustrated in Figs. 65(a)-(c) and 66 so as to achieve "cool chips".
  • a field programmable gate array may switch-control the operations of the first and second 3D-stacks, by traveling a thread or chaining of vector processing on the first arithmetic pipelines 117 -1 and the second arithmetic pipelines 117 -2 , implementing an HPC system, which can be used for GPU-based general-purpose computing.
  • a first chip (top chip) merging a plurality of arithmetic pipelines 117, a second chip merging a plurality of marching-register files 22, a third chip merging a marching-cache memory 21, a fourth chip merging a first marching main memory 31 -1 , a fifth chip merging a second marching main memory 31 -2 and a sixth chip (bottom chip) merging a third marching main memory 31 -3 can be stacked vertically.
  • Each of the arithmetic pipelines 117 may include a vector-processing unit, and each of the marching-register files 22 may include marching-vector registers so that vector instructions generated from loops in a source program can be executed in the vector-processing unit.
  • a first heat dissipating plate 58 -1 is inserted between the first and second chips
  • a second heat dissipating plate 58 -2 is between the second and third chips
  • a third heat dissipating plate 58 -3 is between the third and fourth chips
  • a fourth heat dissipating plate 58 -4 is between the fourth and fifth chips
  • a fifth heat dissipating plate 58 -5 is between the fifth and sixth chips so as to achieve "cool chips".
  • the cool-chip-configuration illustrated in Fig. 69 is not limited to a case of six chips, but expandable to three-dimensional stacking structures with any number of chips, because the sandwich structure illustrated in Fig. 69 is suitable for establishing the thermal flow from active computing chips through the heat dissipating plates 58 -1 , 58 -2 , 58 -3 , 58 -4 , 58 -5 to outside of the cool computer system more effectively. Therefore, the number of cool chips in the computer system of the fifth embodiment can be increased in proportion to the scale of the computer system.
  • each of the 3D-stacks includes cooling technology with a heat dissipating plate 58, such as a diamond plate, inserted between the semiconductor memory chips 3a and 3b, in which at least one of the marching memories classified in the marching memory family is merged. The term "the marching memory family" includes the marching-instruction register file 22a and the marching-data register file 22b connected to the ALU 112 explained in the second embodiment, and the marching-instruction cache memory 21a and the marching-data cache memory 21b explained in the third embodiment, in addition to the marching main memory 31 explained in the first embodiment of the present invention.
  • a 3D-stack implementing a part of the fundamental core of the computer system pertaining to the fifth embodiment of the present invention, embraces a first semiconductor memory chip 3a merging at least one of the marching memory in the marching memory family, a heat dissipating plate 58 disposed under the first semiconductor memory chip 3a, a second semiconductor memory chip 3b disposed under the heat dissipating plate 58, which merges at least one of the marching memory in the marching memory family, and a processor 11 disposed at a side of the heat dissipating plate 58.
  • Fig. 70 a 3D-stack, implementing a part of the fundamental core of the computer system pertaining to the fifth embodiment of the present invention, embraces a first semiconductor memory chip 3a merging at least one of the marching memory in the marching memory family, a heat dissipating plate 58 disposed under the first semiconductor memory chip 3a, a second semiconductor memory chip 3b disposed under the heat dissipating plate 58,
  • the processor 11 can be disposed at any required or appropriate site in the configuration of the 3D-stack or external of the 3D-stack, depending on the design choice of the 3D-stack.
  • the processor 11 can be allocated at the same horizontal level of the first semiconductor memory chip 3a or at the level of the second semiconductor memory chip 3b.
  • the marching memory merged on the first semiconductor memory chip 3a and the marching memory merged on the second semiconductor memory chip 3b store program instructions, respectively. In the 3D configuration illustrated in Fig.
  • a first control path is provided between the first semiconductor memory chip 3a and the processor 11
  • a second control path is provided between the second semiconductor memory chip 3b and the processor 11 so as to facilitate the execution of the control processing with the processor 11.
  • a further data-path may be provided between the first semiconductor memory chip 3a and the second semiconductor memory chip 3b so as to facilitate direct communication of the program instruction between the first semiconductor memory chip 3a and the second semiconductor memory chip 3b.
  • another 3D-stack implementing a part of the fundamental core of the computer system pertaining to the fifth embodiment of the present invention, embraces a first semiconductor memory chip 3a merging at least one of the marching memory in the marching memory family, a heat dissipating plate 58 disposed under the first semiconductor memory chip 3a, a second semiconductor memory chip 3b disposed under the heat dissipating plate 58, which merges at least one of the marching memory in the marching memory family, and an ALU 112 disposed at a side of the heat dissipating plate 58.
  • the location of the ALU 112 is not limited to the site illustrated in Fig.
  • the ALU 112 can be disposed at any required or appropriate site in the configuration of the 3D-stack or external of the 3D-stack, such as a site allocated at the same horizontal level of the first semiconductor memory chip 3a or at the level of the second semiconductor memory chip 3b, depending on the design choice of the 3D-stack.
  • the marching memory merged on the first semiconductor memory chip 3a and the marching memory merged on the second semiconductor memory chip 3b read/write scalar data, respectively. In the 3D configuration illustrated in Fig.
  • a first data-path is provided between the first semiconductor memory chip 3a and the ALU 112
  • a second data-path is provided between the second semiconductor memory chip 3b and the ALU 112 so as to facilitate the execution of the scalar data processing with the ALU 112.
  • a further data-path may be provided between the first semiconductor memory chip 3a and the second semiconductor memory chip 3b so as to facilitate direct communication of the scalar data between the first semiconductor memory chip 3a and the second semiconductor memory chip 3b.
  • still another 3D-stack implementing a part of the fundamental core of the computer system pertaining to the fifth embodiment of the present invention, embraces a first semiconductor memory chip 3a merging at least one of the marching memory in the marching memory family, a heat dissipating plate 58 disposed under the first semiconductor memory chip 3a, a second semiconductor memory chip 3b disposed under the heat dissipating plate 58, which merges at least one of the marching memory in the marching memory family, and arithmetic pipelines 117 disposed at a side of the heat dissipating plate 58. Similar to the topologies illustrated in Figs.
  • the location of the arithmetic pipelines 117 is not limited to the site illustrated in Fig. 72, and the arithmetic pipelines 117 can be disposed at any required or appropriate site.
  • the marching memory merged on the first semiconductor memory chip 3a and the marching memory merged on the second semiconductor memory chip 3b read/write vector/streaming data, respectively. In the 3D configuration illustrated in Fig.
  • a first data-path is provided between the first semiconductor memory chip 3a and the arithmetic pipelines 117
  • a second data-path is provided between the second semiconductor memory chip 3b and the arithmetic pipelines 117 so as to facilitate the execution of the vector/streaming data processing with the arithmetic pipelines 117.
  • a further data-path may be provided between the first semiconductor memory chip 3a and the second semiconductor memory chip 3b so as to facilitate direct communication of the vector/streaming data between the first semiconductor memory chip 3a and the second semiconductor memory chip 3b.
  • the 3D hybrid computer system encompasses a first left chip (top left chip) 3p -1 merging at least one of the marching memory in the marching memory family, a second left chip 3p -2 merging at least one of the marching memory in the marching memory family, a third left chip 3p -3 merging at least one of the marching memory in the marching memory family, a fourth left chip 3p -4 merging at least one of the marching memory in the marching memory family, a fifth left chip 3p -5 merging at least one of the marching memory in the marching memory family and a sixth left chip (bottom left chip) 3p -6 merging at least one of the marching memory in the marching memory family, which are stacked vertically.
  • a first left heat dissipating plate 58a -1 is inserted between the first left chip 3p -1 and second left chip 3p -2
  • a second left heat dissipating plate 58a -2 is inserted between the second left chip 3p -2 and third left chip 3p -3
  • a third left heat dissipating plate 58a -3 is inserted between the third left chip 3p -3 and fourth left chip 3p -4
  • a fourth left heat dissipating plate 58a -4 is inserted between the fourth left chip 3p -4 and fifth left chip 3p -5
  • a fifth left heat dissipating plate 58a -5 is inserted between the fifth left chip 3p -5 and sixth left chip 3p -6 so as to achieve "cool left chips".
  • a first right heat dissipating plate 58b -1 is inserted between the first right chip 3q -1 and second right chip 3q -2
  • a second right heat dissipating plate 58b -2 is inserted between the second right chip 3q -2 and third right chip 3q -3
  • a third right heat dissipating plate 58b -3 is inserted between the third right chip 3q -3 and fourth right chip 3q -4
  • a fourth right heat dissipating plate 58b -4 is inserted between the fourth right chip 3q -4 and fifth right chip 3q -5
  • a fifth right heat dissipating plate 58b -5 is inserted between the fifth right chip 3q -5 and sixth right chip 3q -6 so as to achieve "cool right chips".
  • a first processing unit 11a is provided between the first left heat dissipating plate 58a -1 and the first right heat dissipating plate 58b -1
  • a second processing unit 11b is provided between the third left heat dissipating plate 58a -3 and the third right heat dissipating plate 58b -3
  • a third processing unit 11c is provided between the fifth left heat dissipating plate 58a -5 and the fifth right heat dissipating plate 58b -5
  • pipelined ALUs are respectively included in the processing units 11a, 11b, 11c.
  • the scalar data-path and control path are established between the first left chip 3p -1 and second left chip 3p -2 , the scalar data-path and control path are established between the second left chip 3p -2 and third left chip 3p -3 , the scalar data-path and control path are established between the third left chip 3p -3 and fourth left chip 3p -4 , the scalar data-path and control path are established between the fourth left chip 3p -4 and fifth left chip 3p -5 , and the scalar data-path and control path are established between the fifth left chip 3p -5 and sixth left chip 3p -6, the scalar data-path and control path are established between the first right chip 3q -1 and second right chip 3q -2 , the scalar data-path and control path are established between the second right chip 3q -2 and third right chip 3q -3 , the scalar data-path and control path are established between the third right chip 3q -3 and fourth right chip 3q -4
  • although nMOS transistors are assigned respectively as the transfer-transistors and the reset-transistors in the transistor-level representations of the bit-level cells, the illustrations in Figs. 4, 5, 6, 8, 11, 13, 16-20, 22, 25 and 32 are mere schematic examples, and pMOS transistors can be used as the transfer-transistors and the reset-transistors if the opposite polarity of the clock signal is employed.
  • MIS transistors or insulated-gate transistors having gate-insulation films made of silicon nitride film, ONO film, SrO film, Al2O3 film, MgO film, Y2O3 film, HfO2 film, ZrO2 film, Ta2O5 film, Bi2O3 film, HfAlO film, and others can be used for the transfer-transistors and the reset-transistors.
  • a marching memory which may include the marching-register file, the marching-cache memory, and the marching main memory already discussed in the first to fifth embodiments can implement a bit-level parallel processing of scalar/vector data in a multiple-instruction-single-data (MISD) architecture, by which many independent instruction streams provided vertically to a first processor 11 -1 , a second processor 11 -2 , a third processor 11 -3 , a fourth processor 11 -4 , .........., in parallel operate on a single horizontal stream of data at a time with a systolic array of processors 11 -1 , 11 -2 , 11 -3 , 11 -4 .
  • arithmetic-level parallelism can be established by a marching memory, which may include the marching-register file, the marching-cache memory, and the marching main memory already discussed in the first to fifth embodiments, with a single-instruction-multiple-data (SIMD) architecture, by which a single instruction stream is provided to a first processor 11 -1 , a second processor 11 -2 , a third processor 11 -3 , and a fourth processor 11 -4 , so that the single instruction stream can operate on multiple vertical streams of data at a time with the array of processors 11 -1 , 11 -2 , 11 -3 , 11 -4 .
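The SIMD arrangement described above can be illustrated in software. This is a minimal sketch, assuming that each of the four processors reads one element of its own vertical data stream per clock cycle and that the same instruction is broadcast to all processors in that cycle; the function name `simd_run` and the sample instructions are hypothetical and are not taken from the source.

```python
from typing import Callable, List, Sequence


def simd_run(
    instruction_stream: Sequence[Callable[[int], int]],
    data_streams: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Apply one instruction per time step to the current element of every data stream."""
    results: List[List[int]] = [[] for _ in data_streams]
    for step, instruction in enumerate(instruction_stream):
        # The same instruction is broadcast to every processor in this time step.
        for processor, stream in enumerate(data_streams):
            results[processor].append(instruction(stream[step]))
    return results


def increment(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return x * 2
```

For example, `simd_run([increment, double], [[1, 2], [3, 4], [5, 6], [7, 8]])` models four processors that each increment their first operand and double their second.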
  • a marching memory which may include the marching-register file, the marching-cache memory, and the marching main memory already discussed in the first to fifth embodiments, can implement a typical chaining in vector processing with a first processor 11 -1 , a second processor 11 -2 , a third processor 11 -3 , and a fourth processor 11 -4 to which a first instruction I 1 , a second instruction I 2 , a third instruction I 3 , and a fourth instruction I 4 are provided respectively.
  • a marching memory which may include the marching-register file, the marching-cache memory, and the marching main memory already discussed in the first to fifth embodiments, can implement a parallel processing of a single horizontal stream of scalar/vector data in a MISD architecture with a first processor 11 -1 , a second processor 11 -2 , a third processor 11 -3 , and a fourth processor 11 -4 .
  • a marching memory which may include the marching-register file, the marching-cache memory, and the marching main memory already discussed in the first to fifth embodiments, can implement a parallel processing of a single horizontal stream of scalar/vector data in a MISD architecture with a first processor 11 -1 configured to execute multiplication, a second processor 11 -2 configured to execute addition, a third processor 11 -3 configured to execute multiplication, and a fourth processor 11 -4 configured to execute addition.
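The multiply, add, multiply, add arrangement above behaves like a four-stage systolic pipeline acting on one data stream. The sketch below is only an illustration under stated assumptions: each processor holds one result register, data advances one processor per clock cycle, and the operand constants (2, 3, 4, 5) are arbitrary values chosen for the example. The names `systolic_run` and `STAGES` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical stage functions: multiply, add, multiply, add (constants are assumptions).
STAGES: List[Callable[[int], int]] = [
    lambda x: x * 2,  # first processor: multiplication
    lambda x: x + 3,  # second processor: addition
    lambda x: x * 4,  # third processor: multiplication
    lambda x: x + 5,  # fourth processor: addition
]


def systolic_run(stream: Sequence[int], stages: Sequence[Callable[[int], int]]) -> List[int]:
    """Clock a single data stream through a chain of processors, one stage per cycle."""
    n = len(stages)
    registers: List[Optional[int]] = [None] * n  # result register of each processor
    outputs: List[int] = []
    feed: List[Optional[int]] = list(stream) + [None] * n  # extra cycles flush the pipeline
    for incoming in feed:
        # The value computed by the last processor in the previous cycle leaves the array.
        if registers[-1] is not None:
            outputs.append(registers[-1])
        # Update from the last stage backwards so each stage reads last cycle's value.
        for i in range(n - 1, 0, -1):
            previous = registers[i - 1]
            registers[i] = stages[i](previous) if previous is not None else None
        registers[0] = stages[0](incoming) if incoming is not None else None
    return outputs
```

While the pipeline is full, all four processors execute different instructions in the same cycle, each on a different element of the single stream, which is the MISD behavior the text describes.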
  • a single-thread-stream and single-data-stream architecture can be achieved with a marching memory, which may include the marching-register file, the marching-cache memory, and the marching main memory already discussed in the first to fifth embodiments.
  • in Fig. 41, we have compared the speed/capability of the worst case of the existing memory for scalar data or program instructions with that of the marching main memory 31, and the hatched portion of Fig. 41(b) has illustrated schematically the speed/capability of the marching main memory 31, implemented by one hundred memory units U 1 , U 2 , U 3 ,........., U 100 , and compared with the speed/capability of the worst case of the existing memory shown in Fig. 41(a).
  • in the worst case, we have discussed that we can read out 99 memory units of the marching main memory 31, but they are not available due to a scalar program's requirement.
  • a memory array area 661, peripheral circuitry for a row decoder 662, peripheral circuitry for sense amplifiers 663, and peripheral circuitry for a column decoder 664 are merged on a single semiconductor chip 66.
  • a plurality of memory cells are arranged in an array of rows and columns in the memory array area 661 so that each row of memory cells share a common 'word' line, while each column of cells share a common 'bit' line, and the location of a memory cell in the array is determined as the intersection of its 'word' and 'bit' lines.
  • the data to be written ('1' or '0') is provided at the 'bit' line from the column decoder 664, while the 'word' line is asserted from the row decoder 662, so as to turn on the access transistor of the memory cell and allow the capacitor to charge up or discharge, depending on the state of the bit line.
  • the 'word' line is also asserted from the row decoder 662, which turns on the access transistor.
  • the enabled transistor allows the voltage on the capacitor to be read by a sense amplifier 663 through the 'bit' line.
  • the sense amplifier 663 can determine whether a '1' or '0' is stored in the memory cell by comparing the sensed capacitor voltage against a threshold.
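The conventional DRAM write and read sequence described above can be summarized in a small behavioral model. This is a sketch under simplifying assumptions that are not in the source: the capacitor is ideal (no leakage, no refresh, non-destructive read), a stored '1' corresponds to the full supply voltage, and the sense threshold is half of that voltage. The class name `DramArray` and its parameters are hypothetical.

```python
from typing import List


class DramArray:
    """Behavioral model of a one-transistor, one-capacitor DRAM array."""

    def __init__(self, rows: int, cols: int, v_dd: float = 1.0) -> None:
        self.v_dd = v_dd
        # One capacitor voltage per cell, addressed by (word line, bit line).
        self.cap_voltage: List[List[float]] = [[0.0] * cols for _ in range(rows)]

    def write(self, word_line: int, bit_line: int, bit: int) -> None:
        """Row decoder asserts the word line; the bit line charges or discharges the capacitor."""
        self.cap_voltage[word_line][bit_line] = self.v_dd if bit else 0.0

    def read(self, word_line: int, bit_line: int) -> int:
        """Word line turns on the access transistor; the sense amplifier compares to a threshold."""
        sensed = self.cap_voltage[word_line][bit_line]
        threshold = self.v_dd / 2  # assumed threshold, not specified in the source
        return 1 if sensed > threshold else 0
```

Each access in this model needs both a row and a column address, which is the point of contrast with the marching memory, where stored information moves toward the output terminals on every clock cycle.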
  • each of the unidirectional marching memory blocks is implemented by a bit-level cell consisting of two transistors and one capacitor, while the DRAM memory cell consists of only a single transistor that is paired with a capacitor.
  • one thousand marching memory blocks MM ij with 128kbits capacity can be deployed on the same semiconductor chip 66 for the 512 Mbits DRAM chip.
  • a bidirectional marching memory block is implemented by a bit-level cell consisting of four transistors and two capacitors, while the DRAM memory cell consists of only a single transistor and a single capacitor. If one Gbit DRAM chip technology is assumed, one thousand bidirectional marching memory blocks MM ij with 256kbits capacity can be deployed on the same DRAM chip 66 so as to implement a 256 Mbits marching memory chip.
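The capacity figures for the bidirectional case can be checked with simple arithmetic. The sketch below assumes, as a simplification that the source does not state explicitly, that cell area scales with the number of transistors per bit, so a four-transistor cell occupies roughly four times the area of a one-transistor DRAM cell. The function names are hypothetical.

```python
def chip_capacity_kbits(n_blocks: int, block_kbits: int) -> int:
    """Total capacity of a chip built from identical marching memory blocks, in kbits."""
    return n_blocks * block_kbits


def area_equivalent_bits(
    dram_bits: int,
    marching_transistors_per_bit: int,
    dram_transistors_per_bit: int = 1,
) -> int:
    """Bits that fit in the same area, assuming area is proportional to transistor count."""
    return dram_bits * dram_transistors_per_bit // marching_transistors_per_bit


# One thousand blocks of 256 kbits each give 256,000 kbits, i.e. about 256 Mbits.
total_kbits = chip_capacity_kbits(1000, 256)

# A 1 Gbit (2**30 bit) DRAM area divided among four-transistor cells gives 2**28 bits.
bidirectional_bits = area_equivalent_bits(2**30, 4)
```

Both routes land on roughly 256 Mbits, which agrees with the figure quoted for the bidirectional marching memory chip.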
  • the clock period (the clock cycle time) τ clock illustrated in Fig. 7C is recited as "the marching memory's memory cycle t M ".
  • the plurality of 256 kbits marching memory blocks MM ij may be arranged in a two dimensional matrix form on the semiconductor chip 66 so that each horizontal array of the marching memory blocks MM ij shares a common horizontal-core line, while each vertical array of marching memory blocks MM ij shares a common vertical-core line, and the location of a specified marching memory block MM ij in the two dimensional matrix is accessed as the intersection of its horizontal-core line and vertical-core line, with a double-level hierarchy.
  • every column of a subject marching memory block MM ij is accessed with an address at the lower level, and every marching memory block MM ij is directly accessed with its own address at the higher level.
  • a virtual storage mechanism can be used for the access methodology of the complex marching memory.
  • the scheduling is decided at compilation run if any.
  • the multi-level caches generally operate by checking the smallest Level1 (L1) cache first, and if the L1 cache hits, the processor proceeds at high speed. If the smaller L1 cache misses, the next larger cache (L2) is checked, and so on, before external memory is checked.
  • L1 Level1
  • L2 next larger cache
  • the L2 cache-like memory can support the virtual indexing mechanism, because the size of L2 cache corresponds to the size of the complex marching memory, and the size of a marching memory block MM ij corresponds to the size of smallest L1 cache.
  • a plurality of complex marching memory chips, or a plurality of macro complex marching memory blocks MMM 1 , MMM 2 , ........, MMM k can be mounted on a first circuit board having external-connection pins P 1 , P 2 , ........, P s-1 , P s ("s" may be any integer determined by unit of byte, or word size) so as to implement a multichip module of the complex marching memory, or "a complex marching memory module" as illustrated in Fig. 81, although the illustration of the circuit board is omitted.
  • the first macro complex marching memory block MMM 1 may monolithically integrate one thousand of marching memory blocks MM 111 , MM 121 , MM 131 , ........, MM 1(t-1)1 , MM 1t1 ; MM 211 ,........,; MM (s-1)11 ..................; MM s11 , MM s21 , , ........, MM s(t-1)1 , MM st1 on a first semiconductor chip
  • the second macro complex marching memory block MMM 2 may monolithically integrate one thousand of marching memory blocks MM 112 , MM 122 , MM 132 , ........, MM 1(t-1)2 , MM 1t2 ; MM 212 ,........,; MM (s-1)12 ..................; MM s12 , MM s22 , ........, MM s(t-1)2 , MM st2 on a second semiconductor chip
  • the first complex marching memory module hybridly assembling the macro complex marching memory blocks MMM 1 , MMM 2 , ........, MMM k can be connected to a second complex marching memory module hybridly assembling the macro complex marching memory block MMM k+1 and others on a second circuit board through the external-connection pins P 1 , P 2 , ........, P s-1 , P s .
  • the macro complex marching memory block MMM k+1 may monolithically integrate one thousand of marching memory blocks MM 11(k+1) , MM 12(k+1) , MM 13(k+1) , ........, MM 1(t-1)(k+1) , MM 1t(k+1) ; MM 21(k+1) ,........,; MM (s-1)1(k+1) ..................; MM s1(k+1) , MM s2(k+1) , , ........, MM s(t-1)(k+1) , MM st(k+1) on a semiconductor chip, for example.
  • if we implement dual lines of the hybrid assembly of macro complex marching memory blocks, we can establish a dual in-line module of complex marching memory.
  • a virtual storage mechanism can be used for the access methodology of the complex marching memory, in which the marching memory cores to be used are scheduled just like pages in the virtual memory.
  • the scheduling can be decided at compilation run if any.
  • a marching-data cache memory 21b implemented by the complex marching memory scheme can be used with smaller marching memory blocks, or smaller marching memory cores.
  • a one-dimensional array of marching memory blocks, or marching memory cores, deployed vertically on a semiconductor chip can implement a marching cache memory.
  • each of the marching memory cores includes a single horizontal array of memory units, and the number of memory units deployed horizontally is smaller than the number of memory units employed in the marching memory cores for the marching main memory 31.
  • each of the marching memory cores can be randomly accessed.
  • each of the marching memory blocks consists of a single memory unit, each of the memory units having a sequence of bit-level cells configured to store information of byte size or word size so as to implement a marching register file by the complex marching memory scheme.
  • a marching-data register file 22b implemented by one-bit marching memory cores can be connected to the ALU 112, similar to the organizations illustrated in Figs. 55 and 56. Then, very similar to the operation of SRAM, each of the one-bit marching memory cores can be randomly accessed.
  • the instant invention can be applied to industrial fields of various computer systems, which require higher speed and lower power consumption.
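
The double-level hierarchy mentioned in the list above (a higher-level address selecting a marching memory block MM ij at the intersection of its horizontal-core and vertical-core lines, and a lower-level address selecting a column inside that block) can be sketched as follows. This is only an illustrative sketch: the 32 x 32 grid, the number of columns per block, and the names `BlockAddress`, `decode` and `encode` are assumptions and do not appear in the source.

```python
from typing import NamedTuple

# Hypothetical geometry: the text speaks of "one thousand" blocks;
# a 32 x 32 grid (1024 blocks) is assumed here purely for illustration.
GRID_ROWS = 32
GRID_COLS = 32
COLUMNS_PER_BLOCK = 1024  # assumed number of columns per block


class BlockAddress(NamedTuple):
    row: int     # selects the horizontal-core line (index i of MM ij)
    col: int     # selects the vertical-core line (index j of MM ij)
    column: int  # lower-level address of a column inside the block


def decode(flat: int) -> BlockAddress:
    """Split a flat column index into higher-level and lower-level parts."""
    total = GRID_ROWS * GRID_COLS * COLUMNS_PER_BLOCK
    if not 0 <= flat < total:
        raise ValueError("address out of range")
    block, column = divmod(flat, COLUMNS_PER_BLOCK)
    row, col = divmod(block, GRID_COLS)
    return BlockAddress(row, col, column)


def encode(addr: BlockAddress) -> int:
    """Inverse of decode: rebuild the flat column index."""
    return (addr.row * GRID_COLS + addr.col) * COLUMNS_PER_BLOCK + addr.column
```

Under these assumed sizes, `decode(1024)` selects block row 0, block column 1, column 0, and `encode(decode(x))` returns `x` for any valid index.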

Landscapes

  • Engineering & Computer Science (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Power Engineering (AREA)
  • Dram (AREA)
  • Memory System (AREA)
  • Shift Register Type Memory (AREA)
  • Static Random-Access Memory (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Read Only Memory (AREA)
  • Semiconductor Memories (AREA)

Abstract

A marching memory includes an array of memory units (U1, U2, U3,........., Un-1, Un), each of the memory units having a sequence of bit-level cells (M11, M21, M31,........, Mm-1,1, Mm1) so as to store information of byte size or word size. Each of the bit-level cells encompasses a transfer-transistor (Q111) having a first main-electrode connected to a clock signal supply line (CLOCK) through a first delay element (D 111) and a control-electrode connected to an output terminal of a first neighboring bit-level cell disposed at input side of the array, through a second delay element (D 112), a reset-transistor (Q112) having a control-electrode connected to the clock signal supply line, and a capacitor (C11) connected in parallel with the reset-transistor.

Description

A MARCHING MEMORY, A BIDIRECTIONAL MARCHING MEMORY, A COMPLEX MARCHING MEMORY AND A COMPUTER SYSTEM, WITHOUT THE MEMORY BOTTLENECK
The instant invention relates to new memories and new computer systems using the new memories, which operate at a low energy consumption and high speed.
Since von Neumann and others developed a stored-program electronic computer more than 60 years ago, the fundamental memory accessing principle has not changed. While the processing speeds of computers have increased significantly over the years for the whole range of high performance computing (HPC) applications, this has been accomplished either by device technology or by schemes that avoid memory accessing (such as using a cache). However, the memory accessing time still limits performance. Current computer systems use many processors 11 and many large-scale main memories 331, as illustrated in Fig. 1.
The computer system illustrated in Fig. 1 includes a processor 11, a cache memory (321a, 321b) and a main memory 331. The processor 11 includes a control unit 111 having a clock generator 113 configured to generate a clock signal, an arithmetic logic unit (ALU) 112 configured to execute arithmetic and logic operations synchronized with the clock signal, an instruction register file (RF) 322a connected to the control unit 111 and a data register file (RF) 322b connected to the ALU 112. The cache memory (321a, 321b) has an instruction cache memory 321a and a data cache memory 321b. A portion of the main memory 331 and the instruction cache memory 321a are electrically connected by wires and/or buses, which limit the memory access time (that is, they form the von Neumann bottleneck 351). The remaining portion of the main memory 331 and the data cache memory 321b are electrically connected to enable a similar memory access 351. Furthermore, wires and/or buses, which implement memory access 352, electrically connect the data cache memory 321b and the instruction cache memory 321a to the instruction register file 322a and the data register file 322b.
Even though HPC systems are expected to operate at high speed and low energy consumption, there are speed limitations due to the memory accessing bottlenecks 351, 352. The bottlenecks 351, 352 are ascribable to the wiring between the processors 11 and the main memory 331, because the wire length delays access and the stray capacitance between wires causes additional delay. Such capacitance also increases power consumption in proportion to the clock frequency of the processor 11.
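As a rough reminder of why wire capacitance costs power in proportion to clock frequency, the standard first-order expression for dynamic (switching) power is given below. The symbols are assumptions introduced for illustration and are not named in the text: \alpha is the switching activity factor, C_{\mathrm{wire}} is the lumped capacitance of the wires, V_{DD} is the supply voltage, and f is the clock frequency.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C_{\mathrm{wire}} \, V_{DD}^{2} \, f
```

Under this approximation, shortening or removing wires lowers C_{\mathrm{wire}} and therefore lowers the power drawn at a given clock frequency.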
Currently some HPC processors are implemented using several vector arithmetic pipelines. Such a vector processor makes better use of memory bandwidth and is a superior machine for HPC applications that can be expressed in vector notation. The vector instructions are made from loops in a source program, and each of these vector instructions is executed in an arithmetic pipeline in a vector processor or in corresponding units in a parallel processor. Both processing schemes give the same results.
However, even a vector-processor-based system has the memory bottleneck 351, 352 between all the units. Even in a single system with a wide memory and large bandwidth, the same bottleneck 351, 352 appears, and if the system consists of many of the same units, as in a parallel processor, the bottleneck 351, 352 is unavoidable.
There are two essential memory access problems in conventional computer systems. The first problem is the wiring, which lies not only between memory chips and caches, or between these two units even on a single chip, but also inside memory systems. Between chips, the wiring between these two chips/units results in more dynamic power consumption due to capacitance, and in wire signal time delay. This extends to the internal wiring within a memory chip, related to access lines and the remaining read/write lines. Thus, in both the inter-chip and intra-chip wiring of memory chips, energy is consumed because of the capacitance of these wires.
The second problem is the memory bottleneck 351, 352 between processor chip, cache and memory chips. Since the ALU can access any part of cache or memory, the access path 351, 352 consists of global wires of long length. These paths are also limited in the number of wires available. Such a bottleneck seems to be due to hardware such as busses. Especially when there is a high speed CPU and a large capacity of memory, the apparent bottleneck is basically between these two.
The key to removing the bottleneck is to have the same memory clock cycle as the CPU's. First, an addressing procedure must be created to improve memory access. Second, the time delay due to longer wires must be significantly reduced both inside and outside the memory.
By solving these two issues, a fast direct coupling between the memory and the CPU is made, which enables a computer without the memory bottleneck. The processor and the periphery of the processor consume 70% of the total energy because of these problems, divided into 42% for supplying instructions and 28% for supplying data, as illustrated in Fig. 53. The wiring problems cause not only power consumption but also signal time delay. Overcoming the wiring problems implies the elimination of the bottlenecks 351, 352 that limit the flow of data and instructions. If we could remove the intra-chip and inter-chip wiring, the problems of power consumption, time delay and the memory bottlenecks 351, 352 would be solved.
An aspect of the present invention inheres in a marching memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising: (a) a transfer-transistor having a first main-electrode connected to a clock signal supply line through a first delay element and a control-electrode connected to an output terminal of a first neighboring bit-level cell disposed at input side of the array of the memory units, through a second delay element; (b) a reset-transistor having a first main-electrode connected to a second main-electrode of the transfer-transistor, a control-electrode connected to the clock signal supply line, and a second main-electrode connected to the ground potential; and (c) a capacitor configured to store the information of the bit-level cell, connected in parallel with the reset-transistor, wherein an output node connecting the second main-electrode of the transfer-transistor and the first main-electrode of the reset-transistor serves as an output terminal of the bit-level cell, and the output terminal of the bit-level cell delivers the signal stored in the capacitor to a second neighboring bit-level cell disposed at output side of the array of the memory units.
Here, the first main-electrode shall be assigned as a source electrode or a drain electrode for a field effect transistor (FET), a static induction transistor (SIT), a high electron mobility transistor (HEMT), and others, and the second main-electrode is the drain electrode if the first main-electrode is assigned as the source electrode. Alternatively, the second main-electrode is the source electrode if the first main-electrode is assigned as the drain electrode for FET, SIT, and HEMT etc. Similarly, the first main-electrode shall be assigned as an emitter electrode or a collector electrode for a bipolar junction transistor (BJT), and the second main-electrode is the collector electrode if the first main-electrode is assigned as the emitter electrode. Alternatively, the second main-electrode is the emitter electrode if the first main-electrode is assigned as the collector electrode for BJT. And, the control-electrode is a gate electrode for FET, SIT, and HEMT, etc., and a base electrode for BJT.
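The marching behavior that results from this cell structure, in which each bit-level cell hands its stored signal to the neighbor on the output side at every clock, can be illustrated with a purely behavioral sketch. It is not a circuit model: the delay elements, the reset phase and the capacitor are abstracted away, one call to `step` stands for one full clock period, and the class and method names are hypothetical.

```python
from typing import List, Optional

Unit = List[int]  # one memory unit: a byte- or word-sized column of bits


class MarchingMemory:
    """Behavioral model only: n memory units, each holding m bits."""

    def __init__(self, n_units: int, bits_per_unit: int) -> None:
        self.bits = bits_per_unit
        self.units: List[Unit] = [[0] * bits_per_unit for _ in range(n_units)]

    def step(self, incoming: Optional[Unit] = None) -> Unit:
        """Advance one clock: every unit moves one place toward the output."""
        if incoming is None:
            incoming = [0] * self.bits
        if len(incoming) != self.bits:
            raise ValueError("unit width mismatch")
        outgoing = self.units[-1]
        # Each cell takes the value held by its input-side neighbor.
        self.units = [list(incoming)] + self.units[:-1]
        return outgoing
```

In this model, a unit fed into an array of n units appears at the output on the (n+1)-th call to `step`, which mirrors the step-by-step transfer toward the output side described above.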
Another aspect of the present invention inheres in a bidirectional-marching memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising: (a) a forward transfer-transistor having a first main-electrode connected to a first clock signal supply line through a first forward delay element and a control-electrode connected to a forward output terminal of a first neighboring bit-level cell disposed at one side of the array of the memory units, through a second forward delay element; (b) a forward reset-transistor having a first main-electrode connected to a second main-electrode of the forward transfer-transistor, a control-electrode connected to the first clock signal supply line, and a second main-electrode connected to the ground potential; (c) a backward transfer-transistor having a first main-electrode connected to a second clock signal supply line through a first backward delay element and a control-electrode connected to a backward output terminal of the second neighboring bit-level cell through a second backward delay element; (d) a backward reset-transistor having a first main-electrode connected to a second main-electrode of the backward transfer-transistor, a control-electrode connected to the second clock signal supply line, and a second main-electrode connected to the ground potential; (e) a forward capacitor configured to store the information of the bit-level cell, connected in parallel with the forward reset-transistor; and (f) a backward capacitor configured to store the information of the bit-level cell, connected in parallel with the backward reset-transistor, wherein an output node connecting the second main-electrode of the forward transfer-transistor and the first main-electrode of the forward reset-transistor serves as a forward output terminal of the bit-level cell, the forward output terminal of the bit-level cell delivers the signal stored in the forward capacitor to a second neighboring bit-level cell disposed at another side of the array of the memory units, an output node connecting the second main-electrode of the backward transfer-transistor and the first main-electrode of the backward reset-transistor serves as a backward output terminal of the bit-level cell, and the backward output terminal of the bit-level cell delivers the signal stored in the backward capacitor to the first neighboring bit-level cell.
Still another aspect of the present invention inheres in a bidirectional-marching memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising: (a) a forward transfer-transistor having a first main-electrode connected to a first clock signal supply line through a first forward delay element and a control-electrode connected to a forward output terminal of a first neighboring bit-level cell disposed at one side of the array of the memory units, through a second forward delay element; (b) a forward reset-transistor having a first main-electrode connected to a second main-electrode of the forward transfer-transistor, a control-electrode connected to the first clock signal supply line, and a second main-electrode connected to the ground potential; (c) a backward transfer-transistor having a first main-electrode connected to a second clock signal supply line through a first backward delay element and a control-electrode connected to a backward output terminal of the second neighboring bit-level cell through a second backward delay element; (d) a backward reset-transistor having a first main-electrode connected to a second main-electrode of the backward transfer-transistor, a control-electrode connected to the second clock signal supply line, and a second main-electrode connected to the ground potential; and (e) a common capacitor configured to store the information of the bit-level cell, connected in parallel with the forward reset-transistor and the backward reset-transistor, wherein an output node connecting the second main-electrode of the forward transfer-transistor and the first main-electrode of the forward reset-transistor serves as a forward output terminal of the bit-level cell, the forward output terminal of the bit-level cell delivers the signal stored in the common capacitor to a second neighboring bit-level cell disposed at another side of the array of the memory units, an output node connecting the second main-electrode of the backward transfer-transistor and the first main-electrode of the backward reset-transistor serves as a backward output terminal of the bit-level cell, and the backward output terminal of the bit-level cell delivers the signal stored in the common capacitor to the first neighboring bit-level cell.
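The bidirectional behavior described in the two aspects above, where a first clock marches the stored units one way and a second clock marches them the other way, can be illustrated with a behavioral sketch. It follows the common-capacitor reading (one stored value per cell) and assumes that only one of the two clocks is driven in a given step; the transistor-level details are abstracted away, and the class and method names are hypothetical.

```python
from typing import List, Optional

Unit = List[int]  # one memory unit: a byte- or word-sized column of bits


class BidirectionalMarchingMemory:
    """Behavioral model only; names and API are hypothetical."""

    def __init__(self, n_units: int, bits_per_unit: int) -> None:
        self.bits = bits_per_unit
        self.units: List[Unit] = [[0] * bits_per_unit for _ in range(n_units)]

    def _normalize(self, incoming: Optional[Unit]) -> Unit:
        """Return a copy of the incoming unit, or an all-zero unit."""
        unit = [0] * self.bits if incoming is None else list(incoming)
        if len(unit) != self.bits:
            raise ValueError("unit width mismatch")
        return unit

    def step_forward(self, incoming: Optional[Unit] = None) -> Unit:
        """First clock: units move from index 0 toward the last index."""
        new_unit = self._normalize(incoming)
        outgoing = self.units[-1]
        self.units = [new_unit] + self.units[:-1]
        return outgoing

    def step_backward(self, incoming: Optional[Unit] = None) -> Unit:
        """Second clock: units move from the last index toward index 0."""
        new_unit = self._normalize(incoming)
        outgoing = self.units[0]
        self.units = self.units[1:] + [new_unit]
        return outgoing
```

In this model, two forward steps followed by two backward steps return the two loaded units in reverse order of entry, since the most recently loaded unit sits closest to the side it entered from.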
Still another aspect of the present invention inheres in a complex marching memory encompassing a plurality of marching memory blocks being deployed spatially, each of the marching memory blocks including an array of memory units, each of the memory units having a sequence of bit-level cells configured to store information of byte size or word size. Here, each of the memory units transfers synchronously with a clock signal synchronized with the CPU's clock signal, step by step, toward an output side of the corresponding marching memory block from an input side of the corresponding marching memory block, and each of the marching memory blocks is randomly accessed so that each of the memory units in a subject marching memory block can be randomly accessed.
Still another aspect of the present invention inheres in a complex marching memory encompassing a plurality of marching memory blocks being deployed spatially, each of the marching memory blocks including an array of memory units, each of the memory units having a sequence of bit-level cells configured to store information of byte size or word size. Here, each of the memory units transfers synchronously with a first clock signal, step by step, toward a first edge side of corresponding marching memory block from a second edge side of the corresponding marching memory block opposing to the first edge side, and further, each of the memory units transfers synchronously with a second clock signal, step by step, toward the second edge side from the first edge side, and each of the marching memory blocks is randomly accessed so that each of the memory units in a subject marching memory block can be randomly accessed.
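One possible reading of the complex marching memory described above is that a block is first selected by random access and its memory units are then delivered step by step. The sketch below models only that reading, in one direction; the classes `Block` and `ComplexMarchingMemory` and their methods are hypothetical names, and the zero-filling of vacated positions is an assumption.

```python
from collections import deque
from typing import Deque, List

Unit = List[int]  # one memory unit: a byte- or word-sized column of bits


class Block:
    """One marching memory block holding a fixed number of units."""

    def __init__(self, units: List[Unit]) -> None:
        if not units:
            raise ValueError("a block needs at least one unit")
        self.width = len(units[0])
        self.units: Deque[Unit] = deque(list(u) for u in units)

    def march_out(self) -> Unit:
        """One clock step: the output-side unit leaves, zeros enter."""
        out = self.units.pop()
        self.units.appendleft([0] * self.width)
        return out


class ComplexMarchingMemory:
    """A set of blocks; a block is chosen by index, then read by marching."""

    def __init__(self, blocks: List[Block]) -> None:
        self.blocks = blocks

    def stream(self, block_index: int, count: int) -> List[Unit]:
        """Randomly select one block, then read `count` units in sequence."""
        block = self.blocks[block_index]
        return [block.march_out() for _ in range(count)]
```

Selecting block 1 and streaming two units returns the two units nearest that block's output side, while the other blocks remain untouched until they are selected.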
Still another aspect of the present invention inheres in a computer system comprising a processor and a marching main memory configured to provide the processor with stored information actively and sequentially so that the processor can execute arithmetic and logic operations with the stored information, in addition results of processing in the processor are sent out to the marching main memory, except that in case of instructions movement, there is only one way of instructions flow from the marching main memory to the processor, the marching main memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising: (a) a transfer-transistor having a first main-electrode connected to a clock signal supply line through a first delay element and a control-electrode connected to an output terminal of a first neighboring bit-level cell disposed at input side of the array of the memory units, through a second delay element; (b) a reset-transistor having a first main-electrode connected to a second main-electrode of the transfer-transistor, a control-electrode connected to the clock signal supply line, and a second main-electrode connected to the ground potential; and (c) a capacitor configured to store the information of the bit-level cell, connected in parallel with the reset-transistor, wherein an output node connecting the second main-electrode of the transfer-transistor and the first main-electrode of the reset-transistor serves as an output terminal of the bit-level cell, and the output terminal of the bit-level cell delivers the signal stored in the capacitor to a second neighboring bit-level cell disposed at output side of the array of the memory units.
Still another aspect of the present invention inheres in a computer system comprising a processor and a bidirectional marching main memory configured to provide the processor with stored information actively and sequentially so that the processor can execute arithmetic and logic operations with the stored information, in addition results of processing in the processor are sent out to the bidirectional marching main memory, except that in case of instructions movement, there is only one way of instructions flow from the bidirectional marching main memory to the processor, the bidirectional marching main memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising: (a) a forward transfer-transistor having a first main-electrode connected to a first clock signal supply line through a first forward delay element and a control-electrode connected to a forward output terminal of a first neighboring bit-level cell disposed at one side of the array of the memory units, through a second forward delay element; (b) a forward reset-transistor having a first main-electrode connected to a second main-electrode of the forward transfer-transistor, a control-electrode connected to the first clock signal supply line, and a second main-electrode connected to the ground potential; (c) a backward transfer-transistor having a first main-electrode connected to a second clock signal supply line through a first backward delay element and a control-electrode connected to a backward output terminal of the second neighboring bit-level cell through a second backward delay element; (d) a backward reset-transistor having a first main-electrode connected to a second main-electrode of the backward transfer-transistor, a control-electrode connected to the second clock signal supply line, and a second main-electrode connected to the ground potential; and (e) a common capacitor configured to store the information of the bit-level cell, connected in parallel with the forward reset-transistor and the backward reset-transistor, wherein an output node connecting the second main-electrode of the forward transfer-transistor and the first main-electrode of the forward reset-transistor serves as a forward output terminal of the bit-level cell, the forward output terminal of the bit-level cell delivers the signal stored in the common capacitor to a second neighboring bit-level cell disposed at another side of the array of the memory units, an output node connecting the second main-electrode of the backward transfer-transistor and the first main-electrode of the backward reset-transistor serves as a backward output terminal of the bit-level cell, and the backward output terminal of the bit-level cell delivers the signal stored in the common capacitor to the first neighboring bit-level cell.
Still another aspect of the present invention inheres in a computer system encompassing a processor and a marching main memory configured to provide the processor with stored information actively and sequentially so that the processor can execute arithmetic and logic operations with the stored information, in addition results of processing in the processor are sent out to the marching main memory, except that in case of instructions movement, there is only one way of instructions flow from the marching main memory to the processor, the marching main memory encompassing a plurality of marching memory blocks being deployed spatially, each of the marching memory blocks including an array of memory units, each of the memory units having a sequence of bit-level cells configured to store information of byte size or word size. Here, each of the marching memory blocks is randomly accessed so that each of the memory units in a subject marching memory block can be randomly accessed.
Still another aspect of the present invention inheres in a computer system encompassing a processor and a bidirectional marching main memory configured to provide the processor with stored information actively and sequentially so that the processor can execute arithmetic and logic operations with the stored information, in addition results of processing in the processor are sent out to the bidirectional marching main memory, except that in case of instructions movement, there is only one way of instructions flow from the bidirectional marching main memory to the processor, the bidirectional marching main memory encompassing a plurality of bidirectional marching memory blocks being deployed spatially, each of the bidirectional marching memory blocks including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size. Here, each of the memory units transfers synchronously with a first clock signal, step by step, toward a first edge side of corresponding marching memory block from a second edge side of the corresponding marching memory block opposing to the first edge side, and further, each of the memory units transfers synchronously with a second clock signal, step by step, toward the second edge side from the first edge side, and each of the marching memory blocks is randomly accessed so that each of the memory units in a subject marching memory block can be randomly accessed.
Fig. 1 illustrates a schematic block diagram illustrating an organization of a conventional computer system;
Fig. 2 illustrates a schematic block diagram illustrating a fundamental organization of a computer system pertaining to a first embodiment of the present invention;
Fig. 3 illustrates an array of memory units implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention, and a transfer of information in the marching main memory;
Fig. 4 illustrates an example of a transistor-level representation of the cell-array in the marching main memory used in the computer system pertaining to the first embodiment of the present invention;
Fig. 5 illustrates an enlarged transistor-level representation of the cell-array in the marching main memory used in the computer system pertaining to the first embodiment of the present invention, focusing on four neighboring bit-level cells;
Fig. 6 illustrates a further enlarged transistor-level representation of a single bit-level cell in the marching main memory used in the computer system pertaining to the first embodiment of the present invention;
Fig. 7A illustrates a schematic example of the response of the transistor to the waveform of a clock signal configured to be applied to the marching main memory used in the computer system pertaining to the first embodiment of the present invention, illustrating a case when a signal "1" is transferred from the previous stage;
Fig. 7B illustrates another schematic example of the response of the transistor to the waveform of the clock signal configured to be applied to the marching main memory used in the computer system pertaining to the first embodiment of the present invention, illustrating another case when a signal "0" is transferred from the previous stage;
Fig. 7C illustrates an actual example of the responses of the transistors to the waveform of a clock signal configured to be applied to the marching main memory used in the computer system pertaining to the first embodiment of the present invention;
Fig. 8 illustrates a detailed example of a bit-level cell used in the marching main memory for the computer system pertaining to the first embodiment of the present invention;
Fig. 9 illustrates an example of an actual plan view implementing the bit-level cell illustrated in Fig. 8;
Fig. 10 illustrates a cross-sectional view taken on line A-A in the plan view illustrated in Fig. 9;
Fig. 11 illustrates another enlarged transistor-level representation of the single bit-level cell in combination with an inter-unit cell adapted for a marching main memory used in the computer system pertaining to a modification of the first embodiment of the present invention;
Fig. 12 illustrates an example of an actual plan view implementing the bit-level cell illustrated in Fig. 11;
Fig. 13 illustrates an enlarged transistor-level representation of the cell-array in combination with corresponding inter-unit cells, in the marching main memory used in the computer system pertaining to the modification of the first embodiment of the present invention, focusing on two neighboring bit-level cells;
Fig. 14(a) illustrates a timing diagram of a response of the bit-level cell illustrated in Fig. 13, and Fig. 14(b) illustrates a next timing diagram of a next response of the next bit-level cell illustrated in Fig. 13, to a waveform of a clock signal;
Fig. 15 illustrates an actual example of the responses of the transistors to the waveform of a clock signal configured to be applied to the marching main memory used in the computer system pertaining to the modification of the first embodiment of the present invention;
Figs. 16 (a)-(d) illustrate four modes of signal-transferring operations, respectively, focusing on the bit-level cell illustrated in Figs. 11 and 13, in the marching main memory used in the computer system pertaining to the modification of the first embodiment of the present invention;
Fig. 17 illustrates a still another enlarged transistor-level representation of the single bit-level cell in combination with an inter-unit cell adapted for a marching main memory used in the computer system pertaining to another modification (second modification) of the first embodiment of the present invention;
Fig. 18 illustrates an enlarged transistor-level representation of the cell-array in combination with corresponding inter-unit cells, in the marching main memory used in the computer system pertaining to the second modification of the first embodiment of the present invention, focusing on two neighboring bit-level cells;
Fig. 19 illustrates a yet still another enlarged transistor-level representation of the single bit-level cell in combination with an inter-unit cell adapted for a marching main memory used in the computer system pertaining to a still another modification (third modification) of the first embodiment of the present invention;
Fig. 20 illustrates an enlarged transistor-level representation of the cell-array in combination with corresponding inter-unit cells, in the marching main memory used in the computer system pertaining to the third modification of the first embodiment of the present invention, focusing on two neighboring bit-level cells;
Fig. 21 illustrates an actual example of the responses of the transistors to the waveform of a clock signal configured to be applied to the marching main memory used in the computer system pertaining to the third modification of the first embodiment of the present invention;
Figs. 22 (a)-(d) illustrate four modes of signal-transferring operations, respectively, focusing on the bit-level cell illustrated in Figs. 20 and 21, in the marching main memory used in the computer system pertaining to the third modification of the first embodiment of the present invention;
Fig. 23 illustrates a gate-level representation of the cell-array illustrated in Fig. 4;
Fig. 24 illustrates an array of memory units implementing a reverse directional marching main memory used in the computer system pertaining to the first embodiment of the present invention, and a reverse directional transfer of information in the reverse directional marching main memory;
Fig. 25(a) illustrates an example of a transistor-level circuit configuration of cell array implementing i-th row of the reverse directional marching main memory illustrated in Fig. 24, and Fig. 25(b) illustrates an example of the response of the transistor to the waveform of a clock signal configured to be applied to the reverse directional marching main memory illustrated in Fig. 24;
Fig. 26 illustrates a gate-level representation of the cell-array implementing i-th row in the reverse directional marching main memory illustrated in Fig. 25 (a);
Fig. 27 illustrates a time-domain relationship between the memory unit streaming time in a marching main memory and the clock cycle in a processor (CPU) in the computer system pertaining to the first embodiment of the present invention;
Fig. 28 illustrates schematically an organization of the computer system pertaining to the first embodiment of the present invention, in which the memory bottleneck has disappeared between the processor (CPU) and the marching memory structure including the marching main memory;
Fig. 29(a) illustrates a forward data stream flowing from the marching memory structure, which includes the marching main memory, to the processor (CPU) and backward data stream flowing from the processor (CPU) to the marching memory structure in the computer system pertaining to the first embodiment of the present invention, and Fig. 29(b) illustrates bandwidths established between the marching memory structure and the processor (CPU) under an ideal condition that the memory unit streaming time of the marching memory structure is equal to the clock cycle of the processor (CPU);
Fig. 30(a) schematically illustrates an extremely high-speed magnetic tape system, comparing with the computer system illustrated in Fig. 30(b), which corresponds to the computer system pertaining to the first embodiment of the present invention;
Fig. 31 (a) illustrates a concrete image of a marching behavior of information (a forward marching behavior), in which information marches (shifts) side by side toward right-hand direction in a one-dimensional marching main memory, Fig. 31 (b) illustrates a staying state of the one-dimensional marching main memory, and Fig. 31 (c) illustrates a concrete image of a reverse-marching behavior of information (a backward marching behavior), in which information marches (shifts) side by side toward left-hand direction in the one-dimensional marching main memory in the computer system pertaining to the first embodiment of the present invention;
Fig. 32 illustrates an example of a transistor-level circuit configuration of the one-dimensional marching main memory, which can achieve the bidirectional transferring behavior illustrated in Figs. 31 (a)-(c), configured to store and transfer bi-directionally instructions or scalar data in the computer system pertaining to the first embodiment of the present invention;
Fig. 33 illustrates another example of a transistor-level circuit configuration of the one-dimensional marching main memory, incorporating isolation transistors between memory units, which can achieve the bidirectional transferring behavior illustrated in Figs. 31 (a)-(c), configured to store and transfer bi-directionally instructions or scalar data in the computer system pertaining to the first embodiment of the present invention;
Fig. 34 illustrates a generic representation of the gate-level circuit configuration of the one-dimensional marching main memory illustrated in Fig. 32;
Fig. 35(a) illustrates a bidirectional transferring mode of instructions in a one-dimensional marching main memory adjacent to a processor, in which the instructions move toward the processor, and move from / to the next memory arranged at left-hand side, Fig. 35(b) illustrates a bidirectional transferring mode of scalar data in a one-dimensional marching main memory adjacent to an ALU, in which the scalar data move toward the ALU and move from / to the next memory, and Fig. 35(c) illustrates a uni-directional transferring mode of vector/streaming data in a one-dimensional marching main memory adjacent to a pipeline, in which the vector/streaming data move toward the pipeline, and move from the next memory;
Fig. 36(a) compares with Fig. 36(b), illustrating an inner configuration of existing memory, in which each memory unit is labeled by an address, and Fig. 36(b) illustrates an inner configuration of present one-dimensional marching main memory, in which the positioning of individual memory unit is at least necessary to identify the starting point and ending point of a set of successive memory units in vector/streaming data;
Fig. 37(a) illustrates an inner configuration of present one-dimensional marching main memory, in which the positioning of individual memory unit is at least necessary to identify the starting point and ending point of a set of successive memory units in vector instruction, Fig. 37(b) illustrates an inner configuration of present one-dimensional marching main memory for scalar data, and Fig. 37(c) illustrates an inner configuration of present one-dimensional marching main memory, in which position indexes are at least necessary to identify the starting point and ending point of a set of successive memory units in vector/streaming data;
Fig. 38(a) illustrates schematically an example of an overall configuration of present marching main memory implemented by a plurality of pages for vector/streaming data case, Fig. 38(b) illustrates schematically an example of a configuration of one of the pages, each of the pages being implemented by a plurality of files for vector/streaming data case, and Fig. 38(c) illustrates schematically an example of a configuration of one of the files, each of the files being implemented by a plurality of memory units for vector/streaming data case, in the computer system pertaining to the first embodiment of the present invention;
Fig. 39(a) illustrates schematically an example of an overall configuration of present marching main memory implemented by a plurality of pages for programs/scalar data case, where each page has its own position index as address, Fig. 39(b) illustrates schematically an example of a configuration of one of the pages and the driving positions of the page, using digits in the binary system, each of the pages being implemented by a plurality of files for programs/scalar data case, and each file having its own position index as address, and Fig. 39(c) illustrates schematically an example of a configuration of one of the files and the driving positions of the file, using digits in the binary system, each of the files being implemented by a plurality of memory units for programs/scalar data case, where each memory unit has its own position index as address, in the computer system pertaining to the first embodiment of the present invention;
Fig. 40(a) illustrates schematically the speed/capability of the existing memory compared with that of the marching main memory used in the computer system pertaining to the first embodiment of the present invention, and Fig. 40(b) illustrates schematically the speed/capability of the marching main memory compared with that of the existing memory illustrated in Fig. 40(a);
Fig. 41(a) illustrates schematically the speed/capability of the worst case of the existing memory for scalar instructions compared with that of the marching main memory used in the computer system pertaining to the first embodiment of the present invention, and Fig. 41(b) illustrates schematically the speed/capability of the marching main memory compared with that of the worst case of the existing memory illustrated in Fig. 41(a);
Fig. 42(a) illustrates schematically the speed/capability of the typical case of the existing memory for scalar instructions compared with that of the marching main memory used in the computer system pertaining to the first embodiment of the present invention, and Fig. 42(b) illustrates schematically the speed/capability of the marching main memory compared with that of the typical case of the existing memory illustrated in Fig. 42(a);
Fig. 43(a) illustrates schematically the speed/capability of the typical case of the existing memory for scalar data case compared with that of the marching main memory used in the computer system pertaining to the first embodiment of the present invention, and Fig. 43(b) illustrates schematically the speed/capability of the marching main memory compared with that of the existing memory illustrated in Fig. 43(a);
Fig. 44(a) illustrates schematically the speed/capability of the best case of the existing memory for streaming data and data parallel case compared with that of the marching main memory used in the computer system pertaining to the first embodiment of the present invention, and Fig. 44(b) illustrates schematically the speed/capability of the marching main memory compared with that of the best case of the existing memory illustrated in Fig. 44(a);
Fig. 45 illustrates an example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention;
Fig. 46 illustrates another example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention;
Fig. 47 illustrates a still another example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention;
Fig. 48 illustrates a yet still another example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention;
Fig. 49 illustrates a further another example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention;
Fig. 50 illustrates a further another example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention;
Fig. 51 illustrates a further another example of the array of two-dimensional memory units, each of the memory units storing and transferring data or instructions, implementing a marching main memory used in the computer system pertaining to the first embodiment of the present invention;
Fig. 52(a) illustrates device level energy consumption in current microprocessors, decomposing into static and dynamic energy consumptions, Fig. 52(b) illustrates net and overhead of the power consumption in the dynamic energy consumption illustrated in Fig. 52(a), and Fig. 52(c) illustrates the net energy consumption in the current microprocessors;
Fig. 53 illustrates an actual energy consumption distribution over a processor including registers and caches in the conventional architecture, estimated by Dally;
Fig. 54(a) illustrates energy consumption in the conventional cache-based architecture, decomposing the energy consumption in the cache memory into static and dynamic energy consumptions, and Fig. 54(b) illustrates energy consumption in the computer system according to a third embodiment of the present invention, decomposing the energy consumption in the marching cache memory into static and dynamic energy consumption;
Fig. 55 illustrates a schematic block diagram illustrating an organization of a computer system pertaining to a second embodiment of the present invention;
Fig. 56 illustrates a schematic block diagram illustrating an organization of a computer system pertaining to a third embodiment of the present invention;
Fig. 57(a) illustrates a combination of arithmetic pipelines and marching register units in the computer system pertaining to the third embodiment of the present invention, and Fig. 57(b) illustrates an array of marching cache units in the computer system pertaining to the third embodiment of the present invention;
Fig. 58 illustrates a schematic block diagram of an organization of a computer system implemented by a combination of a single processor core, a marching-cache memory and a marching-register file in accordance with a modification of the third embodiment of the present invention;
Fig. 59 illustrates a schematic block diagram of an organization of a computer system implemented by a combination of a single arithmetic pipeline, a marching-cache memory and a marching-vector register file in accordance with another modification of the third embodiment of the present invention;
Fig. 60 illustrates a schematic block diagram of an organization of a computer system implemented by a combination of a plurality of processor cores, a marching-cache memory and a marching-register file in accordance with a still another modification of the third embodiment of the present invention;
Fig. 61 illustrates a schematic block diagram of an organization of a computer system implemented by a combination of a plurality of arithmetic pipelines, a marching-cache memory and a marching-vector register file in accordance with a yet still another modification of the third embodiment of the present invention;
Fig. 62(a) illustrates a schematic block diagram of an organization of a conventional computer system implemented by a combination of a plurality of arithmetic pipelines, a plurality of conventional cache memories, a plurality of conventional-vector register files (RFs) and a conventional main memory, in which bottleneck is established between the conventional cache memories and the conventional main memory, and Fig. 62(b) illustrates a schematic block diagram of an organization of a computer system implemented by a combination of a plurality of arithmetic pipelines, a plurality of marching cache memories, a plurality of marching-vector register files and a marching main memory, in which no bottleneck is established, in accordance with a yet still another modification of the third embodiment of the present invention;
Fig. 63 illustrates a schematic block diagram illustrating an organization of a high performance computing (HPC) system pertaining to a fourth embodiment of the present invention;
Fig. 64 illustrates a schematic block diagram illustrating an organization of a computer system pertaining to a fifth embodiment of the present invention;
Fig. 65(a) illustrates a cross-sectional view of a three-dimensional marching main memory used in the computer system pertaining to the fifth embodiment of the present invention, Fig. 65(b) illustrates a cross-sectional view of a three-dimensional marching-cache used in the computer system pertaining to the fifth embodiment of the present invention, and Fig. 65(c) illustrates a cross-sectional view of a three-dimensional marching-register file used in the computer system pertaining to the fifth embodiment of the present invention;
Fig. 66 illustrates a perspective view of a three-dimensional configuration used in the computer system pertaining to the fifth embodiment of the present invention;
Fig. 67 illustrates a perspective view of another three-dimensional configuration used in the computer system pertaining to the fifth embodiment of the present invention;
Fig. 68 illustrates a cross-sectional view of the three-dimensional configuration illustrated in Fig. 67;
Fig. 69 illustrates a cross-sectional view of another three-dimensional configuration used in the computer system pertaining to the fifth embodiment of the present invention;
Fig. 70 illustrates schematically a cross-sectional view of the three-dimensional configuration of a fundamental core of the computer system for executing the control processing, by representing control paths in the computer system pertaining to the fifth embodiment of the present invention;
Fig. 71 illustrates schematically a cross-sectional view of the three-dimensional configuration of a fundamental core of the computer system for executing the scalar data processing, by representing data-paths for scalar data in the computer system pertaining to the fifth embodiment of the present invention;
Fig. 72 illustrates schematically a cross-sectional view of the three-dimensional configuration of a fundamental core of the computer system for executing the vector/streaming data processing, by representing data-paths for vector/streaming data in the computer system pertaining to the fifth embodiment of the present invention;
Fig. 73 illustrates schematically a cross-sectional view of the three-dimensional configuration of a fundamental core of the computer system, configured to execute the scalar data part of the computer system, where a plurality of processing units (CPUs) execute not only scalar data but also vector/streaming data, and pipelined ALUs are included in the processing units, by representing the combination of scalar data-path and the control path for the computer system pertaining to the fifth embodiment of the present invention;
Fig. 74 illustrates a bit-level parallel processing of scalar/vector data in MISD architecture;
Fig. 75 illustrates a parallel processing of vector data in SIMD architecture;
Fig. 76 illustrates a typical chaining in vector processing;
Fig. 77 illustrates a parallel processing of scalar/vector data in MISD architecture;
Fig. 78 illustrates a parallel processing of scalar/vector data in MISD architecture;
Fig. 79(a) illustrates a plan view of a representative conventional DRAM delineated on a single semiconductor chip, and Fig. 79(b) illustrates a corresponding plan view of a schematic inner layout of a complex marching memory, which is delineated on the same single semiconductor chip of the conventional DRAM;
Fig. 80 (a) illustrates an outer shape of a single marching memory block, Fig. 80 (b) illustrates a partial plan view of the marching memory block illustrated in Fig. 80 (a), which has one thousand of columns, where the marching memory's access time (cycle time) is defined to a single column, and Fig. 80(c) illustrates the conventional DRAM's memory cycle for writing in or reading out the content of the conventional DRAM's one memory element; and
Fig. 81 illustrates a schematic plan view of a complex marching memory module.
Various embodiments of the present invention will be described with reference to the accompanying drawings. It is to be noted that the same or similar reference numerals are applied to the same or similar parts and elements throughout the drawings, and the description of the same or similar parts and elements will be omitted or simplified. Generally and as it is conventional in the representation of semiconductor devices, it will be appreciated that the various drawings are not drawn to scale from one figure to another nor inside a given figure, and in particular that the layer thicknesses are arbitrarily drawn for facilitating the reading of the drawings. In the following description specific details are set forth, such as specific materials, processes and equipment in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known manufacturing materials, processes and equipment are not set forth in detail in order not to unnecessarily obscure the present invention. Prepositions, such as "on", "over", "under", "beneath", and "normal" are defined with respect to a planar surface of the substrate, regardless of the orientation in which the substrate is actually held. A layer is on another layer even if there are intervening layers.
Although nMOS transistors are illustrated as transfer-transistors and reset-transistors in transistor-level representations of bit-level cells in Figs. 4, 5, 6, 8, 11, 13, 16-20, 22, 25 and 32, etc., pMOS transistors can be used as the transfer-transistors and the reset-transistors, if the opposite polarity of the clock signal is employed.
--FIRST EMBODIMENT--
(FUNDAMENTAL ORGANIZATION OF COMPUTER SYSTEM)
As illustrated in Fig. 2, a computer system pertaining to a first embodiment of the present invention encompasses a processor 11 and a marching main memory 31. The processor 11 includes a control unit 111 having a clock generator 113 configured to generate a clock signal, and an arithmetic logic unit (ALU) 112 configured to execute arithmetic and logic operations synchronized with the clock signal. As illustrated in Fig. 3, the marching main memory 31 encompasses an array of memory units U1, U2, U3, ..., Un-1, Un, input terminals of the array and output terminals of the array, each of the memory units U1, U2, U3, ..., Un-1, Un having a unit of information including word size of data or instructions. As illustrated in Fig. 3, the marching main memory 31 stores the information in each of the memory units U1, U2, U3, ..., Un-1, Un and transfers the information synchronously with the clock signal, step by step, toward the output terminals, so as to provide the processor 11 with the stored information actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
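The step-by-step transfer toward the output terminals can be sketched behaviorally as follows. This is a minimal illustration and not the circuit of the embodiment: the names `MarchingMemory`, `clock` and `stream` are hypothetical, an empty memory unit is modeled as `None`, and one call to `clock` stands for one period of the clock signal.

```python
class MarchingMemory:
    """Toy model of memory units U1..Un that march one step per clock (hypothetical)."""

    def __init__(self, n_units):
        # units[0] models U1 (input side); units[-1] models Un (output side).
        self.units = [None] * n_units

    def clock(self, word_in=None):
        """One clock period: Un delivers its word, every other word moves one
        unit toward the output terminals, and word_in (if any) enters U1."""
        word_out = self.units[-1]
        self.units = [word_in] + self.units[:-1]
        return word_out


def stream(memory, words):
    """Feed words into U1 one per clock, then keep clocking until the array is
    empty; return the words in the order they reach the output terminals."""
    delivered = [memory.clock(w) for w in words]
    delivered += [memory.clock() for _ in range(len(memory.units))]
    return [w for w in delivered if w is not None]


print(stream(MarchingMemory(4), [10, 20, 30]))  # words arrive in input order
```

In this model the processor never issues an address: it simply consumes whatever word appears at the output side on each clock, which is the sense in which the stored information is provided "actively and sequentially".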
As illustrated in Fig. 2, the marching main memory 31 and the processor 11 are electrically connected by a plurality of joint members 54. For example, each of the joint members 54 may be implemented by a first terminal pin attached to the marching main memory 31, a second terminal pin attached to the processor 11, and an electrically conductive bump interposed between the first and second terminal pins. For the material of the electrically conductive bumps, solder balls, gold (Au) bumps, silver (Ag) bumps, copper (Cu) bumps, nickel-gold (Ni-Au) alloy bumps or nickel-gold-indium (Ni-Au-In) alloy bumps, etc. are acceptable. The resultant data of the processing in the ALU 112 are sent out to the marching main memory 31 through the joint members 54. Therefore, as represented by bidirectional arrow Φ12, data are transferred bi-directionally between the marching main memory 31 and the processor 11 through the joint members 54. By contrast, as represented by uni-directional arrow η11, as to the instructions movement, there is only one way of instruction-flow from the marching main memory 31 to the processor 11.
As illustrated in Fig. 2, the organization of the computer system pertaining to the first embodiment of the present invention further encompasses an external secondary memory 41 such as a disk, an input unit 61, an output unit 62 and an input/output (I/O) interface circuit 63. Similar to a conventional von Neumann computer, the signals or data are received by the input unit 61, and the signals or data are sent from the output unit 62. For instance, known keyboards and known mice can be considered as the input unit 61, while known monitors and printers can be considered as the output unit 62. Known devices for communication between computers, such as modems and network cards, typically serve as both the input unit 61 and the output unit 62. Note that the designation of a device as either the input unit 61 or the output unit 62 depends on the perspective. The input unit 61 takes as input physical movement that the human user provides and converts it into signals that the computer system pertaining to the first embodiment can understand. For example, the input unit 61 converts incoming data and instructions into a pattern of electrical signals in binary code that are comprehensible to the computer system pertaining to the first embodiment, and the output from the input unit 61 is fed to the marching main memory 31 through the I/O interface circuit 63. The output unit 62 takes as input signals that the marching main memory 31 provides through the I/O interface circuit 63. The output unit 62 then converts these signals into representations that human users can see or read, reversing the process of the input unit 61, translating the digitized signals into a form intelligible to the user. The I/O interface circuit 63 is required whenever the processor 11 drives the input unit 61 and the output unit 62. The processor 11 can communicate with the input unit 61 and the output unit 62 through the I/O interface circuit 63. When data of different formats are exchanged, the I/O interface circuit 63 converts serial data to parallel form and vice versa. There is provision for generating interrupts and the corresponding type numbers for further processing by the processor 11 if required.
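The serial-to-parallel conversion attributed to the I/O interface circuit 63 can be illustrated with a short sketch. The function names, the most-significant-bit-first ordering and the fixed word width are assumptions made only for this example; the source does not specify how the conversion is implemented.

```python
def serial_to_parallel(bits, width):
    """Group a serial bit stream into words of `width` bits (MSB first, assumed)."""
    if len(bits) % width != 0:
        raise ValueError("bit stream length must be a multiple of the word width")
    words = []
    for start in range(0, len(bits), width):
        value = 0
        for bit in bits[start:start + width]:
            value = (value << 1) | bit   # shift in one serial bit at a time
        words.append(value)
    return words


def parallel_to_serial(words, width):
    """Inverse conversion: emit each word as `width` bits, MSB first (assumed)."""
    return [(word >> (width - 1 - i)) & 1 for word in words for i in range(width)]


serial = [1, 0, 1, 0, 0, 1, 1, 1]
parallel = serial_to_parallel(serial, 4)          # two 4-bit words
restored = parallel_to_serial(parallel, 4)        # back to the serial stream
```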
The secondary memory 41 stores data and information on a more long-term basis than the marching main memory 31. While the marching main memory 31 is concerned mainly with storing programs currently executing and data currently being employed, the secondary memory 41 is generally intended for storing anything that needs to be kept even if the computer is switched off or no programs are currently executing. Examples of the secondary memory 41 are known hard disks (or hard drives) and known external media drives (such as CD-ROM drives). These storage methods are most commonly used to store the computer's operating system, the user's collection of software and any other data the user wishes. While the hard drive is used to store data and software on a semi-permanent basis and the external media drives are used to hold other data, this setup varies widely depending on the different forms of storage available and the convenience of using each. As represented by bidirectional arrow Φ1, data are transferred bi-directionally between the secondary memory 41 and the marching main memory 31 and the processor 11 through an existing wire connection 53.
Although the illustration is omitted, in the computer system of the first embodiment illustrated in Fig. 2, the processor 11 may include a plurality of arithmetic pipelines configured to receive the stored information through the output terminals from the marching main memory 31, and as represented by bidirectional arrow Φ12, data are transferred bi-directionally between the marching main memory 31 and the plurality of arithmetic pipelines through the joint members 54.
In the computer system of the first embodiment illustrated in Fig. 2, there are no buses consisting of a data bus and an address bus, because the whole computer system has no global wires, even for data exchange between the processor 11 and the marching main memory 31, whereas such wires or buses cause the bottleneck in the conventional computer system. There are only short local wires within the marching main memory 31 or connecting portions of the marching main memory 31 with a corresponding ALU 112. As there are no global wires, which would generate time delay and stray capacitances between the wires, the computer system of the first embodiment can achieve much higher processing speed and lower power consumption.
(DETAILED CONFIGURATION OF CELL ARRAY IMPLEMENTING THE MARCHING MAIN MEMORY)
In most conventional computers, the unit of address resolution is either a character (e.g. a byte) or a word. If the unit is a word, then a larger amount of memory can be accessed using an address of a given size. On the other hand, if the unit is a byte, then individual characters can be addressed (i.e. selected during the memory operation). Machine instructions are normally fractions or multiples of the architecture's word size. This is a natural choice since instructions and data usually share the same memory subsystem. Figs. 4 and 5 correspond to transistor-level representations of the cell array implementing the marching main memory 31 illustrated in Fig. 3, and Fig. 23 corresponds to a gate-level representation of the cell array implementing marching main memory 31 illustrated in Fig. 3.
In Fig. 4, the first column of the m * n matrix, which is implemented by a vertical array of cell M11, M21, M31, ........, Mm-1,1, Mm1, represents the first memory unit U1 illustrated in Fig.3. Here, "m" is an integer determined by word size. Although the choice of a word size is of substantial importance, when computer architecture is designed, word sizes are naturally multiples of eight bits, with 16, 32, and 64 bits being commonly used. Similarly, the second column of the m * n matrix, which is implemented by a vertical array of cell M12, M22, M32, ........, Mm-1,2, Mm2, represents the second memory unit U2, the third column of the m * n matrix, which is implemented by a vertical array of cell M13, M23, M33, ........, Mm-1,3, Mm3, represents the third memory unit U3, ........, the (n-1)-th column of the m * n matrix, which is implemented by a vertical array of cell M1,n-1, M2,n-1, M3,n-1, ........, Mm-1,n-1, Mm,n-1, represents the (n-1)-th memory unit Un-1, and the n-th column of the m * n matrix, which is implemented by a vertical array of cell M1,n, M2,n, M3,n, ........, Mm-1,n, Mm,n, represents the n-th memory unit Un.
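The column-to-memory-unit correspondence described above can be sketched as follows. This is only an illustrative indexing model, not part of the disclosed circuit: the function names `make_cell_array` and `memory_unit`, and the example sizes m = 4 and n = 3, are hypothetical.

```python
# Illustrative sketch only: an m x n matrix of cell labels in which
# column j corresponds to the word-size memory unit U_j.
# The function names and the sizes below are hypothetical.

def make_cell_array(m, n):
    """Return an m-row by n-column matrix of labels 'Mi,j' (1-based)."""
    return [[f"M{i},{j}" for j in range(1, n + 1)] for i in range(1, m + 1)]

def memory_unit(cells, j):
    """Return memory unit U_j (1-based): the j-th column, one cell per bit."""
    return [row[j - 1] for row in cells]

cells = make_cell_array(m=4, n=3)
print(memory_unit(cells, 2))  # ['M1,2', 'M2,2', 'M3,2', 'M4,2']
```

Here each row index i selects one bit position of the word and each column index j selects one memory unit, so a word of m bits occupies exactly one column.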
Namely, as illustrated in Fig. 4, the first memory unit U1 of word-size level is implemented by a vertical array of bit-level cell M11, M21, M31, ........, Mm-1,1, Mm1 in the first column of the m * n matrix. The first-column cell M11 on the first row encompasses a first nMOS transistor Q111 having a drain electrode connected to a clock signal supply line through a first delay element D 111 and a gate electrode connected to the output terminal of a first bit-level input terminal through a second delay element D 112; a second nMOS transistor Q112 having a drain electrode connected to a source electrode of the first nMOS transistor Q111, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C11 configured to store the information of the cell M11, connected in parallel with the second nMOS transistor Q112 ,wherein an output node connecting the source electrode of the first nMOS transistor Q111 and the drain electrode of the second nMOS transistor Q112 serves as an output terminal of the cell M11, configured to deliver the signal stored in the capacitor C11 to the next bit-level cell M12. 
The first-column cell M21 on the second row encompasses a first nMOS transistor Q211 having a drain electrode connected to the clock signal supply line through a first delay element D 211 and a gate electrode connected to the output terminal of a second bit-level input terminal through a second delay element D 212; a second nMOS transistor Q212 having a drain electrode connected to a source electrode of the first nMOS transistor Q211, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C21 configured to store the information of the cell M21, connected in parallel with the second nMOS transistor Q212, wherein an output node connecting the source electrode of the first nMOS transistor Q211 and the drain electrode of the second nMOS transistor Q212 serves as an output terminal of the cell M21, configured to deliver the signal stored in the capacitor C21 to the next bit-level cell M22. The first-column cell M31 on the third row encompasses a first nMOS transistor Q311 having a drain electrode connected to the clock signal supply line through a first delay element D 311 and a gate electrode connected to the output terminal of a third bit-level input terminal through a second delay element D 312; a second nMOS transistor Q312 having a drain electrode connected to a source electrode of the first nMOS transistor Q311, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C31 configured to store the information of the cell M31, connected in parallel with the second nMOS transistor Q312, wherein an output node connecting the source electrode of the first nMOS transistor Q311 and the drain electrode of the second nMOS transistor Q312 serves as an output terminal of the cell M31, configured to deliver the signal stored in the capacitor C31 to the next bit-level cell M32. ............ 
The first-column cell M(m-1)1 on the (m-1)-th row encompasses a first nMOS transistor Q(m-1)11 having a drain electrode connected to the clock signal supply line through a first delay element D (m-1)11 and a gate electrode connected to the output terminal of a (m-1)-th bit-level input terminal through a second delay element D (m-1)12; a second nMOS transistor Q(m-1)12 having a drain electrode connected to a source electrode of the first nMOS transistor Q(m-1)11, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C(m-1)1 configured to store the information of the cell M(m-1)1, connected in parallel with the second nMOS transistor Q(m-1)12, wherein an output node connecting the source electrode of the first nMOS transistor Q(m-1)11 and the drain electrode of the second nMOS transistor Q(m-1)12 serves as an output terminal of the cell M(m-1)1, configured to deliver the signal stored in the capacitor C(m-1)1 to the next bit-level cell M(m-1)2. 
The first-column cell Mm1 on the m-th row encompasses a first nMOS transistor Qm11 having a drain electrode connected to the clock signal supply line through a first delay element D m11 and a gate electrode connected to the output terminal of a m-th bit-level input terminal through a second delay element D m12; a second nMOS transistor Qm12 having a drain electrode connected to a source electrode of the first nMOS transistor Qm11, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Cm1 configured to store the information of the cell Mm1, connected in parallel with the second nMOS transistor Qm12 ,wherein an output node connecting the source electrode of the first nMOS transistor Qm11 and the drain electrode of the second nMOS transistor Qm12 serves as an output terminal of the cell Mm1, configured to deliver the signal stored in the capacitor Cm1 to the next bit-level cell Mm2.
And, as illustrated in Fig.4, the second memory unit U2 of word-size level is implemented by a vertical array of bit-level cell M12, M22, M32, ........, Mm-1,2, Mm2 in the second column of the m * n matrix. The second column cell M12 on the first row encompasses a first nMOS transistor Q121 having a drain electrode connected to the clock signal supply line through a first delay element D 121 and a gate electrode connected to the output terminal of the previous bit-level cell M11 through a second delay element D 122; a second nMOS transistor Q122 having a drain electrode connected to a source electrode of the first nMOS transistor Q121, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C12 configured to store the information of the cell M12, connected in parallel with the second nMOS transistor Q122 ,wherein an output node connecting the source electrode of the first nMOS transistor Q121 and the drain electrode of the second nMOS transistor Q122 serves as an output terminal of the cell M12, configured to deliver the signal stored in the capacitor C12 to the next bit-level cell M13. 
The second column cell M22 on the second row encompasses a first nMOS transistor Q221 having a drain electrode connected to the clock signal supply line through a first delay element D 221 and a gate electrode connected to the output terminal of the previous bit-level cell M21 through a second delay element D 222; a second nMOS transistor Q222 having a drain electrode connected to a source electrode of the first nMOS transistor Q221, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C22 configured to store the information of the cell M22, connected in parallel with the second nMOS transistor Q222 ,wherein an output node connecting the source electrode of the first nMOS transistor Q221 and the drain electrode of the second nMOS transistor Q222 serves as an output terminal of the cell M22, configured to deliver the signal stored in the capacitor C22 to the next bit-level cell M23. The second column cell M32 on the third row encompasses a first nMOS transistor Q321 having a drain electrode connected to the clock signal supply line through a first delay element D 321 and a gate electrode connected to the output terminal of the previous bit-level cell M31 through a second delay element D 322; a second nMOS transistor Q322 having a drain electrode connected to a source electrode of the first nMOS transistor Q321, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C32 configured to store the information of the cell M32, connected in parallel with the second nMOS transistor Q322 , wherein an output node connecting the source electrode of the first nMOS transistor Q321 and the drain electrode of the second nMOS transistor Q322 serves as an output terminal of the cell M32, configured to deliver the signal stored in the capacitor C32 to the next bit-level cell M33. ............ 
The second column cell M(m-1)2 on the (m-1)-th row encompasses a first nMOS transistor Q(m-1)21 having a drain electrode connected to the clock signal supply line through a first delay element D (m-1)21 and a gate electrode connected to the output terminal of the previous bit-level cell M(m-1)1 through a second delay element D (m-1)22; a second nMOS transistor Q(m-1)22 having a drain electrode connected to a source electrode of the first nMOS transistor Q(m-1)21, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C(m-1)2 configured to store the information of the cell M(m-1)2, connected in parallel with the second nMOS transistor Q(m-1)22 , wherein an output node connecting the source electrode of the first nMOS transistor Q(m-1)21 and the drain electrode of the second nMOS transistor Q(m-1)22 serves as an output terminal of the cell M(m-1)2, configured to deliver the signal stored in the capacitor C(m-1)2 to the next bit-level cell M(m-1)3. 
The second column cell Mm2 on the m-th row encompasses a first nMOS transistor Qm21 having a drain electrode connected to the clock signal supply line through a first delay element D m21 and a gate electrode connected to the output terminal of the previous bit-level cell Mm1 through a second delay element D m22; a second nMOS transistor Qm22 having a drain electrode connected to a source electrode of the first nMOS transistor Qm21, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Cm2 configured to store the information of the cell Mm2, connected in parallel with the second nMOS transistor Qm22 ,wherein an output node connecting the source electrode of the first nMOS transistor Qm21 and the drain electrode of the second nMOS transistor Qm22 serves as an output terminal of the cell Mm2, configured to deliver the signal stored in the capacitor Cm2 to the next bit-level cell Mm3.
Furthermore, as illustrated in Fig.4, the third memory unit U3 of word-size level is implemented by a vertical array of bit-level cell M13, M23, M33, ........, Mm-1,3, Mm3 in the third column of the m * n matrix. The third-column cell M13 on the first row encompasses a first nMOS transistor Q131 having a drain electrode connected to the clock signal supply line through a first delay element D 131 and a gate electrode connected to the output terminal of the previous bit-level cell M12 through a second delay element D 132; a second nMOS transistor Q132 having a drain electrode connected to a source electrode of the first nMOS transistor Q131, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C13 configured to store the information of the cell M13, connected in parallel with the second nMOS transistor Q132 ,wherein an output node connecting the source electrode of the first nMOS transistor Q131 and the drain electrode of the second nMOS transistor Q132 serves as an output terminal of the cell M13, configured to deliver the signal stored in the capacitor C13 to the next bit-level cell. 
The third-column cell M23 on the second row encompasses a first nMOS transistor Q231 having a drain electrode connected to the clock signal supply line through a first delay element D 231 and a gate electrode connected to the output terminal of the previous bit-level cell M22 through a second delay element D 232; a second nMOS transistor Q232 having a drain electrode connected to a source electrode of the first nMOS transistor Q231, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C23 configured to store the information of the cell M23, connected in parallel with the second nMOS transistor Q232 ,wherein an output node connecting the source electrode of the first nMOS transistor Q231 and the drain electrode of the second nMOS transistor Q232 serves as an output terminal of the cell M23, configured to deliver the signal stored in the capacitor C23 to the next bit-level cell. The third-column cell M33 on the third row encompasses a first nMOS transistor Q331 having a drain electrode connected to the clock signal supply line through a first delay element D 331 and a gate electrode connected to the output terminal of the previous bit-level cell M32 through a second delay element D 332; a second nMOS transistor Q332 having a drain electrode connected to a source electrode of the first nMOS transistor Q331, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C33 configured to store the information of the cell M33, connected in parallel with the second nMOS transistor Q332 , wherein an output node connecting the source electrode of the first nMOS transistor Q331 and the drain electrode of the second nMOS transistor Q332 serves as an output terminal of the cell M33, configured to deliver the signal stored in the capacitor C33 to the next bit-level cell. ............ 
The third-column cell M(m-1)3 on the (m-1)-th row encompasses a first nMOS transistor Q(m-1)31 having a drain electrode connected to the clock signal supply line through a first delay element D (m-1)31 and a gate electrode connected to the output terminal of the previous bit-level cell M(m-1)2 through a second delay element D (m-1)32; a second nMOS transistor Q(m-1)32 having a drain electrode connected to a source electrode of the first nMOS transistor Q(m-1)31, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C(m-1)3 configured to store the information of the cell M(m-1)3, connected in parallel with the second nMOS transistor Q(m-1)32 , wherein an output node connecting the source electrode of the first nMOS transistor Q(m-1)31 and the drain electrode of the second nMOS transistor Q(m-1)32 serves as an output terminal of the cell M(m-1)3, configured to deliver the signal stored in the capacitor C(m-1)3 to the next bit-level cell. The third-column cell Mm3 on the m-th row encompasses a first nMOS transistor Qm31 having a drain electrode connected to the clock signal supply line through a first delay element D m31 and a gate electrode connected to the output terminal of the previous bit-level cell Mm2 through a second delay element D m32; a second nMOS transistor Qm32 having a drain electrode connected to a source electrode of the first nMOS transistor Qm31, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Cm3 configured to store the information of the cell Mm3, connected in parallel with the second nMOS transistor Qm32 ,wherein an output node connecting the source electrode of the first nMOS transistor Qm31 and the drain electrode of the second nMOS transistor Qm32 serves as an output terminal of the cell Mm3, configured to deliver the signal stored in the capacitor Cm3 to the next bit-level cell.
Still furthermore, as illustrated in Fig.4, the n-th memory unit of word-size level is implemented by a vertical array of bit-level cell M1n, M2n, M3n, ........, Mm-1,n, Mmn in the n-th column of the m * n matrix. The n-th-column cell M1n on the first row encompasses a first nMOS transistor Q1n1 having a drain electrode connected to the clock signal supply line through a first delay element D 1n1 and a gate electrode connected to the bit-level output terminal of the previous bit-level cell M1(n-1) through a second delay element D 1n2; a second nMOS transistor Q1n2 having a drain electrode connected to a source electrode of the first nMOS transistor Q1n1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C1n configured to store the information of the cell M1n, connected in parallel with the second nMOS transistor Q1n2 ,wherein an output node connecting the source electrode of the first nMOS transistor Q1n1 and the drain electrode of the second nMOS transistor Q1n2 serves as a bit-level output terminal of the cell M1n, configured to deliver the signal stored in the capacitor C1n to a first bit-level output terminal. 
The n-th-column cell M2n on the second row encompasses a first nMOS transistor Q2n1 having a drain electrode connected to the clock signal supply line through a first delay element D 2n1 and a gate electrode connected to the bit-level output terminal of the previous bit-level cell M2(n-1) through a second delay element D 2n2; a second nMOS transistor Q2n2 having a drain electrode connected to a source electrode of the first nMOS transistor Q2n1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C2n configured to store the information of the cell M2n, connected in parallel with the second nMOS transistor Q2n2 ,wherein an output node connecting the source electrode of the first nMOS transistor Q2n1 and the drain electrode of the second nMOS transistor Q2n2 serves as a bit-level output terminal of the cell M2n, configured to deliver the signal stored in the capacitor C2n to a second bit-level output terminal. 
The n-th-column cell M3n on the third row encompasses a first nMOS transistor Q3n1 having a drain electrode connected to the clock signal supply line through a first delay element D 3n1 and a gate electrode connected to the bit-level output terminal of the previous bit-level cell M3(n-1) through a second delay element D 3n2; a second nMOS transistor Q3n2 having a drain electrode connected to a source electrode of the first nMOS transistor Q3n1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C3n configured to store the information of the cell M3n, connected in parallel with the second nMOS transistor Q3n2 , wherein an output node connecting the source electrode of the first nMOS transistor Q3n1 and the drain electrode of the second nMOS transistor Q3n2 serves as a bit-level output terminal of the cell M3n, configured to deliver the signal stored in the capacitor C3n to a third bit-level output terminal. ............ 
The n-th-column cell M(m-1)n on the (m-1)-th row encompasses a first nMOS transistor Q(m-1)n1 having a drain electrode connected to the clock signal supply line through a first delay element D (m-1)n1 and a gate electrode connected to the bit-level output terminal of the previous bit-level cell M(m-1) (n-1) through a second delay element D (m-1)n2; a second nMOS transistor Q(m-1)n2 having a drain electrode connected to a source electrode of the first nMOS transistor Q(m-1)n1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C(m-1)n configured to store the information of the cell M(m-1)n, connected in parallel with the second nMOS transistor Q(m-1)n2 , wherein an output node connecting the source electrode of the first nMOS transistor Q(m-1)n1 and the drain electrode of the second nMOS transistor Q(m-1)n2 serves as a bit-level output terminal of the cell M(m-1)n, configured to deliver the signal stored in the capacitor C(m-1)n to a (m-1)-th bit-level output terminal. 
The n-th-column cell Mmn on the m-th row encompasses a first nMOS transistor Qmn1 having a drain electrode connected to the clock signal supply line through a first delay element D mn1 and a gate electrode connected to the bit-level output terminal of the previous bit-level cell Mm(n-1) through a second delay element D mn2; a second nMOS transistor Qmn2 having a drain electrode connected to a source electrode of the first nMOS transistor Qmn1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Cmn configured to store the information of the cell Mmn, connected in parallel with the second nMOS transistor Qmn2 ,wherein an output node connecting the source electrode of the first nMOS transistor Qmn1 and the drain electrode of the second nMOS transistor Qmn2 serves as a bit-level output terminal of the cell Mmn, configured to deliver the signal stored in the capacitor Cmn to a m-th bit-level output terminal.
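At a purely behavioural level, the row-wise chaining described above (each cell's output terminal feeding the cell in the next column of the same row, with the n-th column driving the bit-level output terminals) can be sketched as a word-wide shift. The sketch below is an assumption-laden abstraction: it ignores the transistor-level timing entirely and assumes that one clock period moves every stored word exactly one column; the name `march_step` and the example sizes are hypothetical.

```python
# Behavioural sketch only (hypothetical helper, not the disclosed circuit).
# Assumption: one clock period advances every stored word by one column.

def march_step(units, input_word):
    """Advance all words one column toward the output side.

    units      -- list of n words; units[0] models U1, units[-1] models Un
    input_word -- list of m bits presented at the bit-level input terminals
    Returns (new_units, output_word), where output_word is the word that
    Un held before the step.
    """
    if any(len(word) != len(input_word) for word in units):
        raise ValueError("every word must have the same number of bits")
    output_word = list(units[-1])
    new_units = [list(input_word)] + [list(word) for word in units[:-1]]
    return new_units, output_word

units = [[0, 0], [0, 0], [0, 0]]            # n = 3 units of m = 2 bits
for word in ([1, 0], [0, 1], [1, 1]):
    units, out = march_step(units, word)
print(units)  # [[1, 1], [0, 1], [1, 0]]
```

After three steps the first word fed in has reached the third unit, and a fourth step would deliver it at the output terminals.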
As illustrated in Fig. 5, a bit-level cell Mij of the j-th column and on the i-th row, in the representative 2 * 2 cell-array of the marching main memory used in the computer system pertaining to the first embodiment of the present invention, encompasses a first nMOS transistor Qij1 having a drain electrode connected to a clock signal supply line through a first delay element D ij1 and a gate electrode connected to the output terminal of the previous bit-level cell through a second delay element D ij2; a second nMOS transistor Qij2 having a drain electrode connected to a source electrode of the first nMOS transistor Qij1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Cij configured to store the information of the bit-level cell Mij, connected in parallel with the second nMOS transistor Qij2 , wherein an output node connecting the source electrode of the first nMOS transistor Qij1 and the drain electrode of the second nMOS transistor Qij2 serves as an output terminal of the bit-level cell Mij, configured to deliver the signal stored in the capacitor Cij to the next bit-level cell Mi(j+1).
A column bit-level cell Mi(j+1) of the (j+1)-th column and on the i-th row encompasses a first nMOS transistor Qi(j+1)1 having a drain electrode connected to clock signal supply line through a first delay element D i(j+1)1 and a gate electrode connected to the output terminal of the previous bit-level cell Mij through a second delay element D i(j+1)2; a second nMOS transistor Qi(j+1)2 having a drain electrode connected to a source electrode of the first nMOS transistor Qi(j+1)1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Ci(j+1) configured to store the information of the bit-level cell Mi(j+1), connected in parallel with the second nMOS transistor Qi(j+1)2, wherein an output node connecting the source electrode of the first nMOS transistor Qi(j+1)1 and the drain electrode of the second nMOS transistor Qi(j+1)2 serves as an output terminal of the bit-level cell Mi(j+1), configured to deliver the signal stored in the capacitor Ci(j+1) to the next cell.
And, a bit-level cell M(i+1)j of the j-th column and on the (i+1)-th row encompasses a first nMOS transistor Q(i+1)j1 having a drain electrode connected to the clock signal supply line through a first delay element D (i+1)j1 and a gate electrode connected to the output terminal of the previous bit-level cell through a second delay element D (i+1)j2; a second nMOS transistor Q(i+1)j2 having a drain electrode connected to a source electrode of the first nMOS transistor Q(i+1)j1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C(i+1)j configured to store the information of the bit-level cell M(i+1)j, connected in parallel with the second nMOS transistor Q(i+1)j2 , wherein an output node connecting the source electrode of the first nMOS transistor Q(i+1)j1 and the drain electrode of the second nMOS transistor Q(i+1)j2 serves as an output terminal of the bit-level cell M(i+1)j, configured to deliver the signal stored in the capacitor C(i+1)j to the next bit-level cell M(i+1)(j+1).
Furthermore, a bit-level cell M(i+1)(j+1) of the (j+1)-th column and on the (i+1)-th row encompasses a first nMOS transistor Q(i+1)(j+1)1 having a drain electrode connected to the clock signal supply line through a first delay element D (i+1)(j+1)1 and a gate electrode connected to the output terminal of the previous bit-level cell M(i+1)j through a second delay element D (i+1)(j+1)2; a second nMOS transistor Q(i+1)(j+1)2 having a drain electrode connected to a source electrode of the first nMOS transistor Q(i+1)(j+1)1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor C(i+1)(j+1) configured to store the information of the bit-level cell M(i+1)(j+1), connected in parallel with the second nMOS transistor Q(i+1)(j+1)2, wherein an output node connecting the source electrode of the first nMOS transistor Q(i+1)(j+1)1 and the drain electrode of the second nMOS transistor Q(i+1)(j+1)2 serves as an output terminal of the bit-level cell M(i+1)(j+1), configured to deliver the signal stored in the capacitor C(i+1)(j+1) to the next cell.
As illustrated in Fig. 6, the j-th bit-level cell Mij on the i-th row encompasses a first nMOS transistor Qij1 having a drain electrode connected to a clock signal supply line through a first delay element D ij1 and a gate electrode connected to the output terminal of the previous cell through a second delay element D ij2; a second nMOS transistor Qij2 having a drain electrode connected to a source electrode of the first nMOS transistor Qij1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Cij configured to store the information of the bit-level cell Mij, connected in parallel with the second nMOS transistor Qij2 .
In the circuit configuration illustrated in Fig. 6, the second nMOS transistor Qij2 serves as a reset-transistor configured to reset the signal charge stored in the capacitor Cij, when a clock signal of high-level (or a logical level of "1") is applied to the gate electrode of the second nMOS transistor Qij2, discharging the signal charge already stored in the capacitor Cij.
Figs. 7A and 7B illustrate a schematic example of the transistor-level responses of the bit-level cell Mij illustrated in Fig. 6, which is one of the bit-level cells used in the computer system pertaining to the first embodiment of the present invention, to a waveform of a clock signal illustrated by broken line. The clock signal illustrated by broken line swings periodically between the logical levels of "1" and "0" with the clock period TAU(Greek-letter) clock. In Figs. 7A and 7B, t1-t0 (= t2-t1 = t3-t2 = t4-t3) is defined to be a quarter of the clock period TAUclock (=TAUclock /4).
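The quarter-period time marks defined above can be written out explicitly. The following is a minimal sketch, assuming an arbitrary time unit and a hypothetical helper name `quarter_period_marks`; exact fractions are used so that the marks do not pick up floating-point rounding.

```python
# Illustrative sketch: time marks t0..t4 spaced by TAUclock / 4.
# The helper name and the example period are hypothetical.
from fractions import Fraction

def quarter_period_marks(t0, tau_clock, count=5):
    """Return [t0, t1, ..., t(count-1)], each spaced by tau_clock / 4."""
    quarter = Fraction(tau_clock) / 4
    return [t0 + k * quarter for k in range(count)]

marks = quarter_period_marks(t0=0, tau_clock=4)  # period of 4 time units
print([int(t) for t in marks])  # [0, 1, 2, 3, 4]
```

With these marks, t4 - t0 equals one full clock period, and t2 - t0 equals half a period.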
(a) As illustrated in Fig. 7A(a), at time "t0", although the high-level clock signal illustrated by the broken line is applied both to the drain electrode of the first nMOS transistor Qij1 through the first ideal delay element D ij1 and to the gate electrode of the second nMOS transistor Qij2, the second nMOS transistor Qij2 remains in the off-state until the first nMOS transistor Qij1 turns on at time "t1", because the potential of the output node Nout, which connects the source electrode of the first nMOS transistor Qij1 and the drain electrode of the second nMOS transistor Qij2, is assumed to be floating, lying between the logical levels of "0" and "1", between the time "t0" and the time "t1".
(b) Owing to the first ideal delay element D ij1, the turn-on of the first nMOS transistor Qij1 is delayed by t1-t0 =TAUclock /4, so that the first nMOS transistor Qij1 becomes active as a transfer-transistor at time "t1", and the potential of the output node Nout becomes the logical level "1". Here, it is assumed that the first ideal delay element D ij1 can achieve a delay of TAUclock /4 with a very sharp leading edge, so that the rise time can be neglected. That is, as illustrated by the solid line with very sharp leading edge and very sharp trailing edge in Fig. 7A(a), the clock signal applied at time "t0" is delayed by t1-t0 =TAUclock /4. Then, as illustrated in Fig. 7A(c)-(d), if the signal stored in the previous bit-level cell Mi(j-1) is the logical level of "1", the second nMOS transistor Qij2 becomes active as a reset-transistor, and any signal charge stored in the capacitor Cij is driven to be discharged.
(c) The first nMOS transistor Qij1 becomes completely active as the transfer-transistor at time "t2", delayed by a predetermined delay time td2 = t2-t0 =TAUclock /2, determined by the second ideal delay element D ij2. Here, it is assumed that the second ideal delay element D ij2 can achieve a delay of TAUclock /2 with very sharp leading edge, by which the rise time can be neglected. Then, if the signal of the logical level of "1" stored in a previous bit-level cell Mi(j-1) is fed from the previous bit-level cell Mi(j-1) on the i-th row to the gate electrode of the first nMOS transistor Qij1, at time "t2", the signal charge stored in the capacitor Cij is completely discharged to establish the logical level of "0", as illustrated in Fig. 7A(b), and the first nMOS transistor Qij1 begins transferring the signal of the logical level of "1" stored in the previous bit-level cell Mi(j-1), to the capacitor Cij so as to execute marching AND-gate operation as illustrated in Fig. 7A(c)-(d). That is, with an input signal of "1" provided by the clock signal and another input signal of "1" provided by the previous bit-level cell Mi(j-1), the conventional 2-input AND operation of:

1 · 1 = 1

can be executed. By the way, if the signal charge stored in the capacitor Cij is of the logical level of "1", the capacitor Cij can begin discharging at time "t0", because the second nMOS transistor Qij2 can become active as the reset-transistor with the clock signal of the high-level illustrated by the broken line applied to the gate electrode of the second nMOS transistor Qij2 at time "t0", if the operation of the second nMOS transistor Qij2 has no delay.
(d) Alternatively, as illustrated in Fig. 7B(c)-(d), if the signal stored in the previous bit-level cell Mi(j-1) is the logical level of "0", the first nMOS transistor Qij1 keeps the off-state at any time "t0", "t1", "t2" and "t3". As mentioned above, if the signal charge stored in the capacitor Cij is of the logical level of "1", although the first nMOS transistor Qij1 keeps the off-state, the capacitor Cij can begin discharging at time "t0", because the second nMOS transistor Qij2 can become active as the reset-transistor with the clock signal of the high-level illustrated by the broken line applied to the gate electrode of the second nMOS transistor Qij2 at time "t0", and the marching AND-gate operation of:

1 · 0 = 0

is executed as illustrated in Fig. 7A(c)-(d), with an input signal of "1" provided by the clock signal and another input signal of "0" provided by the previous bit-level cell Mi(j-1). However, if the signal charge stored in the capacitor Cij is of the logical level of "0", because both the first nMOS transistor Qij1 and the second nMOS transistor Qij2 keep the off-state, the capacitor Cij keeps the logical level of "0" at any time "t0", "t1", "t2" and "t3", and the marching AND-gate operation is executed as illustrated in Fig. 7A(c)-(d). The output node Nout connecting the source electrode of the first nMOS transistor Qij1 and the drain electrode of the second nMOS transistor Qij2 serves as an output terminal of the bit-level cell Mij, and the output terminal of the bit-level cell Mij delivers the signal stored in the capacitor Cij to the next bit-level cell on the i-th row.
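The clock-high behavior described in items (b) to (d) can be summarized by a small truth-table model. The following is only a minimal sketch under idealized assumptions (instant reset, ideal delays, no analog effects); the function name and the integer encoding of logical levels are hypothetical and do not appear in the specification.

```python
CLOCK_HIGH = 1  # logical level "1" of the clock signal


def marching_and_gate(prev_bit: int, stored_bit: int) -> int:
    """Idealized result held by capacitor Cij after one clock-high phase.

    prev_bit   -- logical level held by the previous bit-level cell Mi(j-1)
    stored_bit -- logical level already held by capacitor Cij
    """
    # Step 1 (assumed instantaneous): the reset-transistor Qij2 discharges
    # Cij, so whatever was stored before is replaced by "0".
    after_reset = stored_bit & 0
    # Step 2: the transfer-transistor Qij1 passes the delayed clock level
    # only when its gate is driven by a "1" from the previous cell,
    # which is the 2-input AND of the clock and the previous bit.
    return after_reset | (CLOCK_HIGH & prev_bit)


if __name__ == "__main__":
    for prev_bit in (0, 1):
        for stored_bit in (0, 1):
            result = marching_and_gate(prev_bit, stored_bit)
            print(f"prev={prev_bit} stored={stored_bit} -> Cij={result}")
```

In this model the new content of the capacitor depends only on the previous cell, which is why the cell behaves as an AND gate between the clock and the incoming bit.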
Furthermore, Fig. 7C illustrates an actual example of the response to the waveform of the clock signal, for a case that both of the first delay element D ij1 and the second delay element D ij2 are implemented by an R-C delay circuit, as illustrated in Fig. 8. In a normal operation of the marching memory, the signal charge stored in the capacitor Cij is actually either of the logical level of "0" or "1", and if the signal charge stored in the capacitor Cij is of the logical level of "1", although the first nMOS transistor Qij1 still keeps the off-state, the capacitor Cij can begin discharging at time "t0", because the second nMOS transistor Qij2 can become active when the clock signal of the high-level is applied to the gate electrode of the second nMOS transistor Qij2, if an ideal operation of the second nMOS transistor Qij2 with no delay can be approximated. Therefore, if the signal charge stored in the capacitor Cij is actually of the logical level of "1", after the clock signal of high-level has been applied to the gate electrode of the second nMOS transistor Qij2 and the signal charge stored in the capacitor Cij has been discharged, the first nMOS transistor Qij1 becomes active as a transfer-transistor, delayed by a predetermined delay time td1 determined by the first delay element D ij1 implemented by the R-C delay circuit. And when the signal stored in a previous bit-level cell Mi(j-1) is fed from the previous bit-level cell Mi(j-1) on the i-th row to the gate electrode of the first nMOS transistor Qij1, the first nMOS transistor Qij1 transfers the signal stored in the previous bit-level cell Mi(j-1), further delayed by a predetermined delay time td2 determined by the second delay element D ij2, to the capacitor Cij.
An output node Nout connecting the source electrode of the first nMOS transistor Qij1 and the drain electrode of the second nMOS transistor Qij2 serves as an output terminal of the bit-level cell Mij, and the output terminal of the bit-level cell Mij delivers the signal stored in the capacitor Cij to the next bit-level cell on the i-th row.
As illustrated in Fig. 7C, the clock signal swings periodically between the logical levels of "1" and "0", with a predetermined clock period (clock cycle time) TAUclock, and when the clock signal becomes the logical level of "1", the second nMOS transistor Qij2 begins to discharge the signal charge, which is already stored in the capacitor Cij at a previous clock cycle. And, after the clock signal of the logical level of "1" is applied and the signal charge stored in the capacitor Cij is completely discharged to the potential of the logical level of "0", the first nMOS transistor Qij1 becomes active as the transfer-transistor, delayed by the predetermined delay time td1 determined by the first delay element D ij1. The delay time td1 may be set to be equal to 1/4TAUclock preferably. Thereafter, when the signal stored in the previous bit-level cell Mi(j-1) on the i-th row is fed from the previous bit-level cell Mi(j-1) to the gate electrode of the first nMOS transistor Qij1, the first nMOS transistor Qij1 transfers the signal stored in the previous bit-level cell Mi(j-1), further delayed by the predetermined delay time td2 determined by the second delay element D ij2 implemented by the R-C delay circuit to the capacitor Cij.
For example, if the logical level of "1" stored in the previous bit-level cell Mi(j-1) on the i-th row is fed from the previous bit-level cell Mi(j-1) to the gate electrode of the first nMOS transistor Qij1, the first nMOS transistor Qij1 becomes conductive state, and the logical level of "1" is stored in the capacitor Cij. On the other hand, if the logical level of "0" stored in the previous bit-level cell Mi(j-1) is fed from the previous bit-level cell Mi(j-1) to the gate electrode of the first nMOS transistor Qij1, the first nMOS transistor Qij1 keeps cut-off state, and the logical level of "0" is maintained in the capacitor Cij. Therefore, the bit-level cell Mij can establish "a marching AND-gate" operation. The delay time td2 shall be longer than the delay time td1, and the delay time td2 may be set to be equal to 1/2TAUclock preferably.
Because the clock signal swings periodically between the logical levels of "1" and "0", with the clock period TAUclock, then, the clock signal becomes the logical level of "0" at a time when time proceeds 1/2TAUclock, and the output node Nout connecting the source electrode of the first nMOS transistor Qij1 and the drain electrode of the second nMOS transistor Qij2 cannot deliver the signal transferred from the previous bit-level cell Mi(j-1) further to the next bit-level cell Mi(j+1) at a time when time proceeds 1/2TAUclock, as the signal is blocked to be transferred to the gate electrode of the next first nMOS transistor Qi(j+1)1 delayed by the delay time td2 = 1/2TAUclock determined by the second delay element D i(j+1)2. When the clock signal becomes the logical level of "1" again at a time when time proceeds TAUclock, the output node Nout connecting the source electrode of the first nMOS transistor Qij1 and the drain electrode of the second nMOS transistor Qij2, which is serving as the output terminal of the bit-level cell Mij, can deliver the signal stored in the capacitor Cij to the next bit-level cell Mi(j+1) at the next clock cycle.
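The timing argument above relies on the clock being high during the first half of each period and low during the second half. As a rough illustration, the sketch below models an ideal square clock with a 50% duty cycle; the function name and the normalized time unit are assumptions made for the example only.

```python
def clock_level(t: float, tau_clock: float) -> int:
    """Logical level of an ideal square clock at time t.

    Assumes the clock rises at t = 0, stays at "1" for the first half of
    each period tau_clock and at "0" for the second half.
    """
    phase = t % tau_clock  # position inside the current clock period
    return 1 if phase < tau_clock / 2 else 0


if __name__ == "__main__":
    tau = 1.0  # normalized clock period (assumed)
    for t in (0.0, 0.25, 0.5, 1.0, 1.5, 2.0):
        print(f"t = {t:4.2f} TAUclock -> clock = {clock_level(t, tau)}")
```

With this model the clock is "0" at 1/2TAUclock and at (1+1/2)TAUclock, when forwarding to the next cell is blocked, and "1" again at TAUclock and 2TAUclock, when the next transfer starts.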
Turning back to Fig. 4, when the clock signal illustrated in Fig. 7A(a) or Fig. 7C becomes the logical level of "1", a sequence of the second nMOS transistors Q112, Q212, Q312, ........, Qm-1,12, Qm12 in the first memory unit U1 begin to discharge the signal charges, respectively, which are already stored in the capacitors C11, C21, C31, ........, Cm-1,1, Cm1, respectively, in the first memory unit U1 at a previous clock cycle. And, after the clock signal of the logical level of "1" is applied to the gate electrodes of the sequence of the second nMOS transistors Q112, Q212, Q312, ........, Qm-1,12, Qm12, respectively, and the signal charges stored in the capacitors C11, C21, C31, ........, Cm-1,1, Cm1 are completely discharged to the potential of the logical level of "0", a sequence of the first nMOS transistors Q111, Q211, Q311, ........, Qm-1,11, Qm11 become active as the transfer-transistors, delayed by the delay time td1 determined by the first delay elements D111, D211, D311, ........, Dm-1,11, Dm11, respectively. Thereafter, when a sequence of signals of word size, which is a multiple of eight bits, such as 16, 32, or 64 bits, is entered to the gate electrodes of the sequence of the first nMOS transistors Q111, Q211, Q311, ........, Qm-1,11, Qm11, the sequence of the first nMOS transistors Q111, Q211, Q311, ........, Qm-1,11, Qm11 transfer the sequence of signals of word size to the capacitors C11, C21, C31, ........, Cm-1,1, Cm1, delayed by the delay time td2 determined by the second delay elements D112, D212, D312, ........, Dm-1,12, Dm12, respectively.
When the clock signal becomes the logical level of "0" at a time when time proceeds 1/2TAUclock, each of the output nodes connecting the source electrodes of the first nMOS transistors Q111, Q211, Q311, ........, Qm-1,11, Qm11 and the drain electrodes of the second nMOS transistors Q112, Q212, Q312, ........, Qm-1,12, Qm12 cannot deliver the signals, which are entered to the gate electrodes of the first nMOS transistors Q111, Q211, Q311, ........, Qm-1,11, Qm11, further to the next bit-level cell M12, M22, M32, ........, Mm-1,2, Mm2 at a time when time proceeds 1/2TAUclock, as each of the signals is blocked to be transferred to the gate electrodes of the next first nMOS transistors Q121, Q221, Q321, ........, Qm-1,21, Qm21 delayed by the delay time td2 = 1/2TAUclock determined by the second delay element D122, D222, D322, ........, Dm-1,22, Dm22.
And, at a time when time proceeds TAUclock, when the next clock signal becomes the logical level of "1" again, a sequence of the second nMOS transistors Q122, Q222, Q322, ........, Qm-1,22, Qm22 in the second memory unit U2 begin to discharge the signal charges, respectively, which are already stored in the capacitors C12, C22, C32, ........, Cm-1,2, Cm2, respectively, in the second memory unit U2 at the previous clock cycle. And, after the clock signal of the logical level of "1" is applied to the gate electrodes of the sequence of the second nMOS transistors Q122, Q222, Q322, ........, Qm-1,22, Qm22, respectively, and the signal charges stored in the capacitors C12, C22, C32, ........, Cm-1,2, Cm2, are completely discharged to the potential of the logical level of "0", a sequence of the first nMOS transistors Q121, Q221, Q321, ........, Qm-1,21, Qm21 become active as the transfer-transistors, delayed by the delay time td1 determined by the first delay elements D121, D221, D321, ........, Dm-1,21, Dm21, respectively. Thereafter, when the sequence of signals of word size stored in the previous capacitors C11, C21, C31, ........, Cm-1,1, Cm1 are fed to the gate electrodes of the sequence of the first nMOS transistors Q121, Q221, Q321, ........, Qm-1,21, Qm21, the first nMOS transistors Q121, Q221, Q321, ........, Qm-1,21, Qm21 transfer the sequence of signals of word size, delayed by the delay time td2 determined by the second delay elements D122, D222, D322, ........, Dm-1,22, Dm22, to the capacitors C12, C22, C32, ........, Cm-1,2, Cm2.
When the clock signal becomes the logical level of "0" at a time when time further proceeds (1+1/2)TAUclock, each of the output nodes connecting the source electrodes of the first nMOS transistors Q121, Q221, Q321, ........, Qm-1,21, Qm21 and the drain electrodes of the second nMOS transistors Q122, Q222, Q322, ........, Qm-1,22, Qm22 cannot deliver the signals transferred from the previous bit-level cells M11, M21, M31, ........, Mm-1,1, Mm1 further to the next bit-level cells M13, M23, M33, ........, Mm-1,3, Mm3 at a time when time proceeds (1+1/2)TAUclock, as each of the signals is blocked to be transferred to the gate electrodes of the next first nMOS transistors Q131, Q231, Q331, ........, Qm-1,31, Qm31 delayed by the delay time td2 = 1/2TAUclock determined by the second delay elements D132, D232, D332, ........, Dm-1,32, Dm32.
And, at a time when time further proceeds 2TAUclock, when the next clock signal becomes the logical level of "1" again, a sequence of the second nMOS transistors Q132, Q232, Q332, ........, Qm-1,32, Qm32 in the third memory unit U3 begin to discharge the signal charges, respectively, which are already stored in the capacitors C13, C23, C33, ........, Cm-1,3, Cm3, respectively, in the third memory unit U3 at the previous clock cycle. And, after the clock signal of the logical level of "1" is applied to the gate electrodes of the sequence of the second nMOS transistors Q132, Q232, Q332, ........, Qm-1,32, Qm32, respectively, and the signal charges stored in the capacitors C13, C23, C33, ........, Cm-1,3, Cm3, are completely discharged to the potential of the logical level of "0", a sequence of the first nMOS transistors Q131, Q231, Q331, ........, Qm-1,31, Qm31 become active as the transfer-transistors, delayed by the delay time td1 determined by the first delay elements D131, D231, D331, ........, Dm-1,31, Dm31, respectively. Thereafter, when the sequence of signals of word size stored in the previous capacitors C12, C22, C32, ........, Cm-1,2, Cm2 are fed to the gate electrodes of the sequence of the first nMOS transistors Q131, Q231, Q331, ........, Qm-1,31, Qm31, the first nMOS transistors Q131, Q231, Q331, ........, Qm-1,31, Qm31 transfer the sequence of signals of word size, delayed by the delay time td2 determined by the second delay elements D132, D232, D332, ........, Dm-1,32, Dm32, to the capacitors C13, C23, C33, ........, Cm-1,3, Cm3.
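The unit-by-unit progression described for the memory units U1, U2 and U3 can be pictured as a word-wide shift performed once per clock period. The sketch below is a behavioral abstraction only: it ignores the delay elements and analog levels, and the names `march_one_cycle` and `Word` are hypothetical.

```python
from typing import List

Word = List[int]  # one bit per row i = 1..m of a memory unit


def march_one_cycle(units: List[Word], incoming: Word) -> List[Word]:
    """Advance every stored word by one memory unit (one clock period).

    units[0] models U1, units[1] models U2, and so on. After the cycle,
    unit U(j) holds what U(j-1) held before, U1 holds the incoming word,
    and the word that was in the last unit leaves the array.
    """
    if any(len(word) != len(incoming) for word in units):
        raise ValueError("all words must have the same number of bits")
    return [list(incoming)] + [list(word) for word in units[:-1]]


if __name__ == "__main__":
    state = [[0, 0, 0, 0] for _ in range(3)]  # U1, U2, U3, four rows each
    for word in ([1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]):
        state = march_one_cycle(state, word)
        print(state)
```

Each call corresponds to one full clock period TAUclock, so a word entered into U1 reaches U3 after two further cycles in this model.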
As illustrated in Fig. 8, each of the first delay element D ij1 and the second delay element D ij2 can be implemented by a known "resistive-capacitive delay" or "R-C delay". In the RC circuit, the value of the time constant (in seconds) is equal to the product of the circuit resistance (in ohms) and the circuit capacitance (in farads), i.e. td1, td2 = R * C. Because the structure of the RC circuit is very simple, it is preferable to use the RC circuit for the first delay element D ij1 and the second delay element D ij2. However, the RC circuit is a mere example, and the first delay element D ij1 and the second delay element D ij2 can be implemented by other passive delay elements, or by various active delay elements, which may include an active element such as a transistor.
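As a numerical illustration of the relation td = R * C, the sketch below computes the resistance needed to reach the preferred delays td1 = 1/4TAUclock and td2 = 1/2TAUclock. The clock period of 1 ns and the capacitance of 1 fF are assumed example values, not figures from the specification, and the helper names are hypothetical.

```python
def rc_delay(resistance_ohm: float, capacitance_farad: float) -> float:
    """Time constant of an R-C delay element, in seconds (td = R * C)."""
    return resistance_ohm * capacitance_farad


def resistance_for_delay(target_delay_s: float, capacitance_farad: float) -> float:
    """Resistance in ohms that yields the target delay for a given capacitance."""
    return target_delay_s / capacitance_farad


if __name__ == "__main__":
    tau_clock = 1e-9  # assumed clock period: 1 ns
    c_stray = 1e-15   # assumed capacitance of each delay element: 1 fF

    td1 = tau_clock / 4  # preferred delay of the first delay element
    td2 = tau_clock / 2  # preferred delay of the second delay element

    r1 = resistance_for_delay(td1, c_stray)
    r2 = resistance_for_delay(td2, c_stray)
    print(f"R for td1: {r1:.3e} ohm, check td1 = {rc_delay(r1, c_stray):.3e} s")
    print(f"R for td2: {r2:.3e} ohm, check td2 = {rc_delay(r2, c_stray):.3e} s")
```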
Fig.9 illustrates an example of the top view of the actual planar pattern of the bit-level cell Mij of the j-th column and on the i-th row illustrated in Fig. 8, which has the first delay element D ij1 and the second delay element D ij2 implemented by the R-C delay circuit, and Fig.10 illustrates the corresponding cross-sectional view taken on the line A-A of Fig.9. As illustrated in Fig.9, the first delay element D ij1 is implemented by a first meandering line 91 of conductive wire, and the second delay element D ij2 is implemented by a second meandering line 97 of conductive wire.
In Fig. 9, the first nMOS transistor Qij1 has a drain electrode region 93 connected to the first meandering line 91 via a contact plug 96a. The other end of the first meandering line 91 opposite to the end connected to the drain electrode region 93 of the first nMOS transistor Qij1 is connected to the clock signal supply line. The drain electrode region 93 is implemented by an n+ semiconductor region. A gate electrode of the first nMOS transistor Qij1 is implemented by the second meandering line 97. The other end of the second meandering line 97 opposite to the end serving as the gate electrode of the first nMOS transistor Qij1 is connected to the output terminal of the previous cell.
The second nMOS transistor Qij2 has a drain electrode region implemented by a common n+ semiconductor region 94, which also serves as the source electrode region of the first nMOS transistor Qij1, a gate electrode 98 connected to the clock signal supply line via a contact plug 96a, and a source electrode region 95 connected to the ground potential via a contact plug 96a. The source electrode region 95 is implemented by an n+ semiconductor region. Because the common n+ semiconductor region 94 is the output node connecting the source electrode region of the first nMOS transistor Qij1 and the drain electrode region of the second nMOS transistor Qij2, the common n+ semiconductor region 94 is connected to a surface wiring 92b via a contact plug 96d. The common n+ semiconductor region 94 serves as the output terminal of the bit-level cell Mij, and delivers the signal stored in the capacitor Cij to the next bit-level cell through the surface wiring 92b.
As illustrated in Fig. 10, the drain electrode region 93, the common n+ semiconductor region 94, and the source electrode region 95 are provided at the surface of and in the upper portion of the p-type semiconductor substrate 81. Instead of the p-type semiconductor substrate 81, the drain electrode region 93, the common n+ semiconductor region 94, and the source electrode region 95 can be provided in the upper portion of a p-well, or a p-type epitaxial layer grown on a semiconductor substrate. On the p-type semiconductor substrate 81, an element isolation insulator 82 is provided so as to define an active area of the p-type semiconductor substrate 81 as a window provided in the element isolation insulator 82. And the drain electrode region 93, the common n+ semiconductor region 94, and the source electrode region 95 are provided in the active area, surrounded by the element isolation insulator 82. At the surface of and on the active area, a gate insulating film 83 is provided. And the gate electrode of the first nMOS transistor Qij1 implemented by the second meandering line 97 and the gate electrode 98 of the second nMOS transistor Qij2 are provided on the gate insulating film 83.
As illustrated in Fig. 10, a first interlayer dielectric film 84 is provided on the second meandering line 97 and the gate electrode 98. On a part of the first interlayer dielectric film 84, a bottom electrode 85 of the capacitor Cij configured to store the information of the bit-level cell Mij is provided. The bottom electrode 85 is made of conducting film, and a contact plug 96c is provided in the first interlayer dielectric film 84 so as to connect between the bottom electrode 85 and the source electrode region 95. And, on the bottom electrode 85, a capacitor insulating film 86 is provided.
Furthermore, on the capacitor insulating film 86, a top electrode 87 of the capacitor Cij is provided so as to occupy an upper portion of the bottom electrode 85. The top electrode 87 is made of conducting film. Although the illustration is omitted in the cross-sectional view illustrated in Fig. 10, the top electrode 87 is electrically connected to the common n+ semiconductor region 94 so as to establish an electric circuit topology that the capacitor Cij is connected in parallel with the second nMOS transistor Qij2. A variety of insulator films may be used as the capacitor insulating film 86. The miniaturized marching main memory may be required to occupy a small area of the bottom electrode 85 opposing the top electrode 87. However, to allow the marching main memory to function successfully, the capacitance between the bottom electrode 85 and the top electrode 87 via the capacitor insulating film 86 needs to maintain a constant value. In particular, with a miniaturized marching main memory with a minimum line width of approximately 100 nm or less, usage of a material with a dielectric constant er greater than that of a silicon oxide (SiO2) film is preferred, considering the storage capacitance between the bottom electrode 85 and the top electrode 87. With an ONO film, for example, the ratio in thickness of the upper layer silicon oxide film, the middle layer silicon nitride film, and the underlayer silicon oxide film is selectable, however, a dielectric constant er of approximately 5 to 5.5 can be provided. 
Alternatively, a single layer film made from any one of a strontium oxide (SrO) film with er = 6, a silicon nitride (Si3N4) film with er = 7, an aluminum oxide (Al2O3) film where er = 8 - 11, a magnesium oxide (MgO) film where er = 10, an yttrium oxide (Y2O3) film where er = 16-17, a hafnium oxide (HfO2) film where er = 22-23, a zirconium oxide (ZrO2) film where er = 22-23, a tantalum oxide (Ta2O5) film where er = 25- 27, or a bismuth oxide (Bi2O3) film where er = 40, or a composite film embracing at least two of these plural layers thereof may be used. Ta2O5 and Bi2O3 show disadvantages in lacking thermal stability at the interface with the polysilicon. Furthermore, it may be a composite film made from a silicon oxide film and these films. The composite film may have a stacked structure of triple-levels or more. In other words, it should be an insulating film containing a material with the relative dielectric constant er of 5 to 6 or greater in at least a portion thereof. However, in the case of a composite film, selecting a combination that results in having an effective relative dielectric constant ereff of 5 to 6 or greater measured for the entire film is preferred. Moreover, it may also be an insulating film made from an oxide film of a ternary compound such as a hafnium aluminate (HfAlO) film.
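The effective relative dielectric constant of a stacked (composite) film can be estimated by treating the layers as capacitors in series. The sketch below applies that standard parallel-plate relation to an ONO stack; the value er = 3.9 for silicon oxide is a commonly quoted figure assumed here (it is not stated in the text), er = 7 for silicon nitride is taken from the text, and the thickness ratios are arbitrary examples.

```python
from typing import Sequence, Tuple

Layer = Tuple[float, float]  # (thickness, relative dielectric constant)


def effective_dielectric_constant(layers: Sequence[Layer]) -> float:
    """Effective er of a stack of dielectric layers in series.

    For layers of equal area, er_eff = total thickness / sum(t_k / er_k).
    Thicknesses may be in any unit as long as it is the same for all layers.
    """
    total_thickness = sum(thickness for thickness, _ in layers)
    series_term = sum(thickness / er for thickness, er in layers)
    return total_thickness / series_term


if __name__ == "__main__":
    ER_SIO2 = 3.9   # assumed value for silicon oxide
    ER_SI3N4 = 7.0  # value given in the text for silicon nitride

    # ONO stacks with oxide : nitride : oxide thickness ratios 1:2:1 and 1:3:1
    for nitride in (2.0, 3.0):
        stack = [(1.0, ER_SIO2), (nitride, ER_SI3N4), (1.0, ER_SIO2)]
        print(f"1:{nitride:g}:1 ONO -> er_eff = "
              f"{effective_dielectric_constant(stack):.2f}")
```

Under these assumptions the two example ratios give values of roughly 5.0 and 5.3, which is in line with the range of approximately 5 to 5.5 mentioned for the ONO film.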
Furthermore, a second interlayer dielectric film 87 is provided on the top electrode 87. And the first meandering line 91 is provided on the second interlayer dielectric film 87. As illustrated in Fig. 10, the contact plug 96a is provided, penetrating the first interlayer dielectric film 84, the capacitor insulating film 86 and the second interlayer dielectric film 87 so as to connect between the first meandering line 91 and the drain electrode region 93.
In a topology illustrated in Figs. 9 and 10, the capacitance C of the R-C delay is implemented by the stray capacitance associated with the first meandering line 91 and the second meandering line 97. Because both R and C are proportional to the wire lengths of the first meandering line 91 and the second meandering line 97, the delay times td1, td2 can be easily designed by selecting the wire lengths of the first meandering line 91 and the second meandering line 97. Furthermore, we can design the thickness, the cross section, or the resistivity of the first meandering line 91 and the second meandering line 97 so as to achieve the desired values of the delay times td1, td2.
For example, because the delay time td2 shall be twice the delay time td1, the wire length of the second meandering line 97 can be designed as 2^(1/2) times the wire length of the first meandering line 91, if we use the same thickness, the same cross section, and a material having the same specific resistivity for the first meandering line 91 and the second meandering line 97, and further the same effective thickness and the same effective dielectric constant for the insulating film implementing the stray capacitance for the R-C delay (= R * C). However, if we use different materials for the first meandering line 91 and the second meandering line 97, the wire lengths of the first meandering line 91 and the second meandering line 97 shall be determined depending on the resistivities of the first meandering line 91 and the second meandering line 97 so as to achieve the required values of the delay times td1, td2. For example, in a case that the second meandering line 97 is formed of polycrystalline silicon, and the first meandering line 91 is formed of refractory material such as tungsten (W), molybdenum (Mo), platinum (Pt), having a higher resistivity than the polycrystalline silicon, the wire lengths of the first meandering line 91 and the second meandering line 97 are determined depending on the resistivities of the first meandering line 91 and the second meandering line 97 so as to achieve the required values of the delay times td1, td2.
Furthermore, although the first meandering line 91 and the second meandering line 97 are illustrated in Fig. 9, the illustrated meandering topology for the resistor R is a mere example, and other topologies such as a straight line configuration can be used depending upon the required values of resistor R and capacitance C. In a very high speed operation of the marching main memory 31, the delineation of extrinsic resistor elements R can be omitted, if parasitic resistance (stray resistance) and parasitic capacitance (stray capacitance) can achieve the required delay times td1, td2.
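Since both R and C grow in proportion to the wire length L, the lumped delay R * C grows with the square of L, so doubling the delay requires a wire longer by a factor of the square root of two. The sketch below works this out numerically; the per-length resistance and capacitance and the clock period are assumed example values, and the function name is hypothetical.

```python
import math


def wire_length_for_delay(target_delay_s: float,
                          resistance_per_m: float,
                          capacitance_per_m: float) -> float:
    """Wire length in meters giving the target lumped R-C delay.

    With R = r * L and C = c * L, the delay is td = r * c * L**2,
    hence L = sqrt(td / (r * c)).
    """
    return math.sqrt(target_delay_s / (resistance_per_m * capacitance_per_m))


if __name__ == "__main__":
    tau_clock = 1e-9         # assumed clock period: 1 ns
    r_per_m = 1e9            # assumed wire resistance: 1 kilo-ohm per micrometer
    c_per_m = 1e-10          # assumed stray capacitance: 0.1 fF per micrometer

    length_1 = wire_length_for_delay(tau_clock / 4, r_per_m, c_per_m)  # td1
    length_2 = wire_length_for_delay(tau_clock / 2, r_per_m, c_per_m)  # td2
    print(f"line 91 length: {length_1 * 1e6:.1f} um")
    print(f"line 97 length: {length_2 * 1e6:.1f} um")
    print(f"length ratio:   {length_2 / length_1:.4f}")
```

If the two lines use different materials, the same function can be called with a different `resistance_per_m` for each line, which is the case the text describes for polycrystalline silicon and refractory materials.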
In the configuration illustrated in Figs. 4-6, an isolation between a signal-storage state of the (j-1)-th bit-level cell Mij-1 on the i-th row and a signal-storage state of the j-th bit-level cell Mij on the i-th row can be established by a propagation delay accompanying the signal propagation path between the output terminal of the (j-1)-th bit-level cell Mij-1 and the gate electrode of the first nMOS transistor Qij1 of the j-th bit-level cell Mij, the propagation delay being mainly ascribable to the value of the second delay element D ij2. Nevertheless, it is preferable to insert an inter-unit cell Bij between the (j-1)-th bit-level cell Mij-1 and the j-th bit-level cell Mij, as illustrated in Figs. 11 and 13.
Although the inter-unit cell Bij is provided so as to isolate the signal-storage state of the j-th bit-level cell Mij in the j-th memory unit Uj from the signal-storage state of the (j-1)-th bit-level cell Mij-1 in the (j-1)-th memory unit Uj-1, the inter-unit cell Bij transfers a signal from the (j-1)-th bit-level cell Mij-1 to the j-th bit-level cell Mij at a required timing determined by a clock signal, which is supplied through the clock signal supply line. Because the j-th memory unit Uj stores information of byte size or word size by the sequence of bit-level cells arrayed in the j-th memory unit Uj, and the (j-1)-th memory unit Uj-1 stores information of byte size or word size by the sequence of bit-level cells arrayed in the (j-1)-th memory unit Uj-1, a sequence of inter-unit cells arrayed in parallel with the memory units Uj-1 and Uj transfers the information of byte size or word size, controlled by the clock signal supplied through the clock signal supply line so that the information of byte size or word size can march along a predetermined direction, pari passu. As illustrated in Figs. 11 and 13, because the input terminal of the j-th bit-level cell Mij on the i-th row is connected to the inter-unit cell Bij, the signal charge stored in the (j-1)-th bit-level cell Mij-1 is fed to the second delay element D ij2 through the inter-unit cell Bij at the required timing, and the transfer operation of the signal charge is cut off at periods other than the required timing.
Figs. 11 and 13 illustrate an example of the inter-unit cell Bij, which encompasses a single isolation transistor Qij3 having a first main-electrode connected to the output terminal of the (j-1)-th bit-level cell Mij-1, a second main-electrode connected to the input terminal of the j-th bit-level cell Mij and a control electrode connected to the clock signal supply line. However, the structure of the inter-unit cell Bij is not limited to the configuration illustrated in Figs. 11 and 13. For example, the inter-unit cell Bij may be implemented by a clocked-circuit having a plurality of transistors, which can transfer the signal from the (j-1)-th bit-level cell Mij-1 to the j-th bit-level cell Mij at the required timing determined by the clock signal.
Similar to the configuration illustrated in Fig. 5, the j-th bit-level cell Mij encompasses the first nMOS transistor Qij1 having the drain electrode connected to the clock signal supply line through the first delay element D ij1 and the gate electrode connected to the inter-unit cell Bij through the second delay element D ij2; the second nMOS transistor Qij2 having the drain electrode connected to the source electrode of the first nMOS transistor Qij1, the gate electrode connected to the clock signal supply line, and the source electrode connected to the ground potential; and the capacitor Cij configured to store the information of the bit-level cell Mij, connected in parallel with the second nMOS transistor Qij2.
An example of the planar structure of the inter-unit cell Bij, encompassing a single isolation transistor Qij3 implemented by an nMOS transistor, is illustrated in Fig. 12, in addition to the configuration of the bit-level cell Mij, which is already illustrated in Fig. 9. In the bit-level cell Mij, the first nMOS transistor Qij1 having the drain electrode region 93, the first meandering line 91 connected to the drain electrode region 93 via a contact plug 96a, the second meandering line 97 implementing the gate electrode of the first nMOS transistor Qij1, and the second nMOS transistor Qij2 having the drain electrode region implemented by the common n+ semiconductor region 94, serving as the output terminal of the bit-level cell Mij, are illustrated.
In Fig. 12, the isolation transistor Qij3 of the inter-unit cell Bij has a first main-electrode region implemented by a left side of an n+ semiconductor region 90, a gate electrode 99 connected to the clock signal supply line, and a second main-electrode region implemented by a right side of the n+ semiconductor region 90. The second main-electrode region is connected, via a contact plug 96e, to one end of the second meandering line 97 opposite to the other end of the second meandering line 97, which serves as the gate electrode of the first nMOS transistor Qij1, and the first main-electrode region is connected to the output terminal of the previous cell Mij-1 via a contact plug 96f. Although the illustration is omitted, similar to the structure illustrated in Fig. 10, on an interlayer dielectric film provided on the second meandering line 97, for example, a parallel plate structure of the capacitor Cij configured to store the information of the bit-level cell Mij may be provided, being connected in parallel with the second nMOS transistor Qij2.
In Fig. 13, in addition to the configuration illustrated in Fig. 11, another inter-unit cell Bi(j-1) is provided between the (j-2)-th bit-level cell Mi(j-2) and the (j-1)-th bit-level cell Mi(j-1), configured to isolate the signal-storage state of the (j-1)-th bit-level cell Mi(j-1) in the (j-1)-th memory unit Uj-1 from the signal-storage state of the (j-2)-th bit-level cell Mi(j-2) in the (j-2)-th memory unit Uj-2, and to transfer a signal from the (j-2)-th bit-level cell Mi(j-2) to the (j-1)-th bit-level cell Mi(j-1) at the required timing determined by the clock signal, which is supplied through the clock signal supply line. In Fig. 13, because the input terminal of the (j-1)-th bit-level cell Mi(j-1) on the i-th row is connected to the inter-unit cell Bi(j-1), the signal charge stored in the (j-2)-th bit-level cell Mi(j-2) is fed to the second delay element D i(j-1)2 through the inter-unit cell Bi(j-1) at the required timing, and the transfer operation of the signal charge is cut off thereafter.
Fig. 13 illustrates an example of the inter-unit cell Bi(j-1), which encompasses a single isolation transistor Qi(j-1)3 having a first main-electrode connected to the output terminal of the (j-2)-th bit-level cell Mi(j-2), a second main-electrode connected to the input terminal of the (j-1)-th bit-level cell Mi(j-1) and a control electrode connected to the clock signal supply line. However, the structure of the inter-unit cell Bi(j-1) is not limited to the configuration illustrated in Fig. 13, and the inter-unit cell Bi(j-1) may be implemented by a clocked-circuit having a plurality of transistors, which can transfer the signal from the (j-2)-th bit-level cell Mi(j-2) to the (j-1)-th bit-level cell Mi(j-1) at the required timing determined by the clock signal.
Similar to the configuration of the j-th bit-level cell Mij, the (j-1)-th bit-level cell Mi(j-1) encompasses a first nMOS transistor Qi(j-1)1 having a drain electrode connected to the clock signal supply line through a first delay element D i(j-1)1 and a gate electrode connected to the inter-unit cell Bi(j-1) through a second delay element D i(j-1)2; a second nMOS transistor Qi(j-1)2 having a drain electrode connected to the source electrode of the first nMOS transistor Qi(j-1)1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Ci(j-1) configured to store the information of the bit-level cell Mi(j-1), connected in parallel with the second nMOS transistor Qi(j-1)2 .
In the circuit configuration illustrated in Figs. 11 and 13, the second nMOS transistor Qij2 of the bit-level cell Mij serves as a reset-transistor configured to reset the signal charge stored in the capacitor Cij: when the clock signal of high-level (or a logical level of "1") is applied to the gate electrode of the second nMOS transistor Qij2, the signal charge already stored in the capacitor Cij is discharged. Likewise, the second nMOS transistor Qi(j-1)2 of the bit-level cell Mi(j-1) serves as a reset-transistor configured to reset the signal charge stored in the capacitor Ci(j-1): when the clock signal of high-level (or a logical level of "1") is applied to the gate electrode of the second nMOS transistor Qi(j-1)2, the signal charge already stored in the capacitor Ci(j-1) is discharged. Therefore, the isolation transistors Qi(j-1)3 and Qij3 may be pMOS transistors, which can operate complementarily with the second nMOS transistors Qi(j-1)2 and Qij2, although Figs. 11 and 13 represent the isolation transistors Qi(j-1)3 and Qij3 by the transistor symbol of an nMOS transistor. That is, when the second nMOS transistors Qi(j-1)2 and Qij2 are in the conductive state for discharging the signal charge stored in the capacitors Ci(j-1) and Cij, the isolation transistors Qi(j-1)3 and Qij3 shall be in the cut-off state so as to establish the isolation between the memory units, and when the second nMOS transistors Qi(j-1)2 and Qij2 are in the cut-off state, the isolation transistors Qi(j-1)3 and Qij3 shall be in the conductive state so as to transfer the signal charges between the memory units.
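The complementary operation described above can be sketched at the logic level. The following is a minimal illustration only, assuming a pMOS isolation transistor that shares the clock line with the nMOS reset-transistor and assuming instantaneous switching; the function name `switch_states` is hypothetical and does not appear in the specification.

```python
def switch_states(clock_level):
    """Idealized conduction states for one clock level (1 = high, 0 = low).

    Assumption: the isolation transistor is a pMOS device driven by the same
    clock line as the nMOS reset-transistor, and both switch with no delay.
    """
    if clock_level not in (0, 1):
        raise ValueError("clock_level must be 0 or 1")
    reset_conductive = clock_level == 1       # nMOS reset-transistor conducts on a high gate
    isolation_conductive = clock_level == 0   # pMOS isolation transistor conducts on a low gate
    return reset_conductive, isolation_conductive


for level in (0, 1):
    reset_on, isolation_on = switch_states(level)
    print(f"clock={level}: reset conductive={reset_on}, isolation conductive={isolation_on}")
```

In this idealized model the two devices are never conductive at the same time, which is the isolation condition stated in the paragraph above.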
Alternatively, if the isolation transistors Qi(j-1)3 and Qij3 are nMOS transistors, as the transistor symbol illustrates in Figs. 11 and 13, the isolation transistors Qi(j-1)3 and Qij3 shall be high-speed transistors having a shorter rise time, a shorter period of conductive state, and a shorter fall time than the second nMOS transistors Qi(j-1)2 and Qij2, which have larger stray capacitances and larger stray resistances associated with gate circuits and gate structures. With this arrangement, when the second nMOS transistors Qi(j-1)2 and Qij2 are still in the cut-off state, the isolation transistors Qi(j-1)3 and Qij3 become conductive very rapidly so as to transfer the signal charges between the memory units, and when the second nMOS transistors Qi(j-1)2 and Qij2 start slowly toward the conductive state for discharging the signal charge stored in the capacitors Ci(j-1) and Cij, the isolation transistors Qi(j-1)3 and Qij3 return to the cut-off state very rapidly so as to establish the isolation between the memory units. As a candidate for such high-speed transistors, a normally-off type MOS static induction transistor (SIT), which exhibits a triode-like I-V characteristic, can be used. An n-channel MOSSIT can be considered as the ultimate limit of the short-channel nMOSFET. Owing to the triode-like I-V characteristic, because the on-state of the MOSSIT depends both on a gate voltage and on a potential difference between the first and second main-electrodes, a very short time interval of the on-state can be achieved. Instead of the MOSSIT, any normally-off type switching device, such as a tunneling SIT, which exhibits a very short on-state period like a Dirac delta function, can be used.
Fig. 14(a) illustrates a timing diagram of a response of the bit-level cell Mi(j-1) illustrated in Fig. 13, and Fig. 14(b) illustrates a timing diagram of the subsequent response of the next bit-level cell Mij illustrated in Fig. 13, to a waveform of a clock signal. In Figs. 14(a) and (b), the clock signal is supposed to swing periodically between the logical levels of "1" and "0" with the clock period TAUclock, and the shaded rectangular area with backward diagonals illustrates a regime for a reset timing of the signal charges stored in the capacitors Ci(j-1) and Cij, respectively, and further, the shaded rectangular area with forward diagonals illustrates a regime for a charge-transfer timing of the signal charges to the capacitors Ci(j-1) and Cij, respectively.
That is, as shown in Fig. 14(a), if the signal charge stored in the capacitor Ci(j-1) is of the logical level of "1", although the first nMOS transistor Qi(j-1)1 still keeps the off-state, the signal charge stored in the capacitor Ci(j-1) is driven to discharge in the shaded rectangular area with backward diagonals. After the capacitor Ci(j-1) begins discharging, in the shaded rectangular area with forward diagonals, the first nMOS transistor Qi(j-1)1 becomes active as a transfer-transistor, delayed by a predetermined delay time td1 determined by the first delay element D i(j-1)1 implemented by the R-C delay circuit. And, when the signal stored in a previous bit-level cell Mi(j-2) is fed through the inter-unit cell Bi(j-1) to the gate electrode of the first nMOS transistor Qi(j-1)1, the first nMOS transistor Qi(j-1)1 transfers the signal stored in the previous bit-level cell Mi(j-2), further delayed by a predetermined delay time td2 determined by the second delay element D i(j-1)2, to the capacitor Ci(j-1) in the shaded rectangular area with forward diagonals.
Similarly, as shown in Fig. 14(b), if the signal charge stored in the capacitor Cij is of the logical level of "1", although the first nMOS transistor Qij1 still keeps the off-state, the signal charge stored in the capacitor Cij is driven to discharge in the shaded rectangular area with backward diagonals. After the capacitor Cij begins discharging, in the shaded rectangular area with forward diagonals, the first nMOS transistor Qij1 becomes active as a transfer-transistor, delayed by a predetermined delay time td1 determined by the first delay element D ij1 implemented by the R-C delay circuit. And, when the signal stored in a previous bit-level cell Mi(j-1) is fed through the inter-unit cell Bij to the gate electrode of the first nMOS transistor Qij1, the first nMOS transistor Qij1 transfers the signal stored in the previous bit-level cell Mi(j-1), further delayed by a predetermined delay time td2 determined by the second delay element D ij2, to the capacitor Cij in the shaded rectangular area with forward diagonals.
Fig. 15 illustrates a more detailed response of the bit-level cell Mi(j-1) illustrated in Fig. 13, which is one of the bit-level cells used in the computer system pertaining to the first embodiment of the present invention, to the waveform of the clock signal illustrated by a thin solid line, for a case in which both the first delay element D i(j-1)1 and the second delay element D i(j-1)2 are implemented by R-C delay circuits, as illustrated in Fig. 12. The clock signal illustrated by the thin solid line swings periodically between the logical levels of "1" and "0" with the clock period TAUclock. In Fig. 15, the time intervals TAU1 = TAU2 = TAU3 = TAU4 are each defined to be a quarter of the clock period TAUclock (= TAUclock/4).
In a normal operation of the marching memory, the signal charge stored in the capacitor Ci(j-1) is actually of either the logical level of "0" or "1", as illustrated in Figs. 16 (a)-(d). If the signal charge stored in the capacitor Ci(j-1) is of the logical level of "1", as illustrated in Figs. 16 (c) and (d), although the first nMOS transistor Qi(j-1)1 still keeps the off-state, the capacitor Ci(j-1) can begin discharging at the beginning of the time interval TAU1, because the second nMOS transistor Qi(j-1)2 becomes active when the clock signal of the high-level is applied to the gate electrode of the second nMOS transistor Qi(j-1)2, under the assumption that an ideal operation of the second nMOS transistor Qi(j-1)2 with no delay can be approximated. Therefore, if the signal charge stored in the capacitor Ci(j-1) is actually of the logical level of "1", after the clock signal of high-level has been applied to the gate electrode of the second nMOS transistor Qi(j-1)2, as illustrated by the thin solid line in Fig. 15, the signal charge stored in the capacitor Ci(j-1) will be discharged, and thereafter, the first nMOS transistor Qi(j-1)1 becomes active as a transfer-transistor, delayed by a predetermined delay time td1 determined by the first delay element D i(j-1)1 implemented by the R-C delay circuit. In Fig. 15, the change of the potential at the drain electrode of the first nMOS transistor Qi(j-1)1 is illustrated by a dash-dotted line.
And, as illustrated by a thick solid line in Fig. 15, when the signal level of "1" stored in a previous bit-level cell Mi(j-2) is fed from the previous bit-level cell Mi(j-2) on the i-th row through the inter-unit cell Bi(j-1) to the gate electrode of the first nMOS transistor Qi(j-1)1, the first nMOS transistor Qi(j-1)1 transfers the signal level of "1" stored in the previous bit-level cell Mi(j-2), further delayed by a predetermined delay time td2 determined by the second delay element D i(j-1)2 to the capacitor Ci(j-1). Alternatively, as illustrated by a broken line in Fig. 15, when the signal level of "0" stored in a previous bit-level cell Mi(j-2) is fed from the previous bit-level cell Mi(j-2) to the gate electrode of the first nMOS transistor Qi(j-1)1, the first nMOS transistor Qi(j-1)1 transfers the signal level of "0" stored in the previous bit-level cell Mi(j-2), further delayed by the predetermined delay time td2 to the capacitor Ci(j-1). An output node Nout connecting the source electrode of the first nMOS transistor Qi(j-1)1 and the drain electrode of the second nMOS transistor Qi(j-1)2 serves as an output terminal of the bit-level cell Mi(j-1), and the output terminal delivers the signal stored in the capacitor Ci(j-1) to the next bit-level cell on the i-th row.
As illustrated by the thin solid line in Fig. 15, when the clock signal becomes the logical level of "1", the second nMOS transistor Qi(j-1)2 begins to discharge the signal charge, which is already stored in the capacitor Ci(j-1) at a previous clock cycle. And, after the clock signal of the logical level of "1" is applied and the signal charge stored in the capacitor Ci(j-1) is completely discharged to the potential of the logical level of "0", the first nMOS transistor Qi(j-1)1 becomes active as the transfer-transistor, delayed by the predetermined delay time td1 determined by the first delay element D i(j-1)1. The delay time td1 may be set to be equal to 1/4TAUclock = TAU1 preferably.
Thereafter, when the signal stored in the previous bit-level cell Mi(j-2) is fed from the previous bit-level cell Mi(j-2) to the gate electrode of the first nMOS transistor Qi(j-1)1 through the inter-unit cell Bi(j-1), as illustrated by thick solid line and broken line, the first nMOS transistor Qi(j-1)1 transfers the signal stored in the previous bit-level cell Mi(j-2), further delayed by the predetermined delay time td2 determined by the second delay element D i(j-1)2 implemented by the R-C delay circuit to the capacitor Ci(j-1).
For example, if the logical level of "1" stored in the previous bit-level cell Mi(j-2) is fed from the previous bit-level cell Mi(j-2) to the gate electrode of the first nMOS transistor Qi(j-1)1 as illustrated by the thick solid line, the first nMOS transistor Qi(j-1)1 becomes conductive at the beginning of the time interval TAU3, and the logical level of "1" is stored in the capacitor Ci(j-1). On the other hand, if the logical level of "0" stored in the previous bit-level cell Mi(j-2) is fed from the previous bit-level cell Mi(j-2) to the gate electrode of the first nMOS transistor Qi(j-1)1 as illustrated by the broken line, the first nMOS transistor Qi(j-1)1 keeps the cut-off state, and the logical level of "0" is maintained in the capacitor Ci(j-1). Therefore, the bit-level cell Mi(j-1) can establish "a marching AND-gate" operation. The delay time td2 shall be longer than the delay time td1, and the delay time td2 may preferably be set to be equal to 1/2 TAUclock.
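The preferred delay settings stated above (td1 equal to a quarter of the clock period, td2 equal to half of the clock period) can be laid out on a time axis. The following is a minimal sketch under the idealized assumption of instantaneous switching; the function name `timing_schedule` and the dictionary keys are hypothetical labels introduced only for illustration.

```python
def timing_schedule(tau_clock):
    """Return idealized event times within one clock period.

    Assumptions: the clock goes high at t = 0, td1 = tau_clock / 4 and
    td2 = tau_clock / 2 (the preferred values in the text), and every
    transistor switches with no delay of its own.
    """
    if tau_clock <= 0:
        raise ValueError("tau_clock must be positive")
    quarter = tau_clock / 4.0
    td1 = tau_clock / 4.0  # delay of the first delay element
    td2 = tau_clock / 2.0  # delay of the second delay element, must exceed td1
    intervals = {
        f"TAU{k + 1}": (k * quarter, (k + 1) * quarter) for k in range(4)
    }
    return {
        "intervals": intervals,
        "reset_starts": 0.0,               # reset-transistor turns on with the clock
        "transfer_transistor_ready": td1,  # drain of the first nMOS transistor rises
        "previous_bit_reaches_gate": td2,  # gate of the first nMOS transistor is driven
    }


schedule = timing_schedule(4.0)
for name, (start, end) in schedule["intervals"].items():
    print(name, start, end)
print("gate driven at", schedule["previous_bit_reaches_gate"])
```

With these settings the previous bit reaches the gate exactly at the start of the interval TAU3, which matches the statement that the first nMOS transistor becomes conductive at the beginning of TAU3.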
Because the clock signal swings periodically between the logical levels of "1" and "0" with the clock period TAUclock as illustrated by the thin solid line, when the clock signal becomes the logical level of "0" as time proceeds by 1/2 TAUclock, that is, at the beginning of the time interval TAU3, the potential at the drain electrode of the first nMOS transistor Qi(j-1)1 begins to decay as illustrated by the dash-dotted line. If the inter-unit cell Bij, inserted between the current bit-level cell Mi(j-1) and the next bit-level cell Mij, is implemented by an nMOS transistor, the path between the output terminal of the current bit-level cell Mi(j-1) and the gate electrode of the first nMOS transistor Qij1 of the next bit-level cell Mij is cut off by the logical level of "0" of the clock signal being applied to the gate electrode of the nMOS transistor. Therefore, the output node Nout connecting the source electrode of the first nMOS transistor Qi(j-1)1 and the drain electrode of the second nMOS transistor Qi(j-1)2 cannot deliver the signal transferred from the previous bit-level cell Mi(j-2) further to the next bit-level cell Mij like duckpins in the time intervals TAU3 and TAU4, and the signal is blocked from being domino-transferred to the gate electrode of the next first nMOS transistor Qij1. Because the first nMOS transistor Qi(j-1)1 enters the cut-off state in the time intervals TAU3 and TAU4, the potential at the output node Nout is kept in a floating state, and the signal states stored in the capacitor Ci(j-1) are held.
When the clock signal becomes the logical level of "1" again, as illustrated by the thin solid line in a next column of Fig. 15, the output node Nout connecting the source electrode of the first nMOS transistor Qi(j-1)1 and the drain electrode of the second nMOS transistor Qi(j-1)2, which serves as the output terminal of the bit-level cell Mi(j-1), can deliver the signal stored in the capacitor Ci(j-1) to the next bit-level cell Mij at the next clock cycle because the inter-unit cell Bij becomes conductive, and the potential at the drain electrode of the first nMOS transistor Qi(j-1)1 increases as illustrated by the dash-dotted line.
Figs. 16 (a)-(d) illustrate four modes of signal-transferring operations, respectively, focusing on the bit-level cell Mij illustrated in Figs. 11 and 13. The bit-level cell Mij is one of the bit-level cells arrayed sequentially in the j-th memory unit Uj, and the j-th memory unit Uj stores information of byte size or word size by the sequence of bit-level cells arrayed sequentially in the j-th memory unit Uj. In the computer system pertaining to the first embodiment of the present invention, the information of byte size or word size arrayed sequentially marches side by side from a previous memory unit to a next memory unit, pari passu. In Figs. 16 (a)-(d), the clock signal is supplied by the clock signal supply line CLOCK so as to swing periodically between the logical levels of "1" and "0" with the clock period TAUclock, while the clock signal supply line CLOCK serves as a power supply line.
Figs. 16(a) and (b) illustrate the cases when the logical level of "0" is stored by the previous clock signal into the capacitor Cij, and Figs. 16 (c) and (d) illustrate the cases when the logical level of "1" is stored by the previous clock signal into the capacitor Cij as one of the signals in the information of byte size or word size. As illustrated in Fig. 16(a), in a case when the signal charge previously stored in the capacitor Cij is of the logical level of "0", if the signal of the logical level of "0", which is stored in a previous bit-level cell Mi(j-1) as one of the signals in the information of byte size or word size to be transferred in a cooperative way, is fed from the previous bit-level cell Mi(j-1) through the inter-unit cell Bij (the illustration is omitted) to the gate electrode of the first nMOS transistor Qij1, at the timing when the signal charge stored in the capacitor Cij keeps the logical level of "0", then, because the first nMOS transistor Qij1 keeps the off-state, the output node Nout connecting the source electrode of the first nMOS transistor Qij1 and the drain electrode of the second nMOS transistor Qij2 delivers the signal level of "0", which is maintained in the capacitor Cij, to the next bit-level cell on the i-th row, so as to execute the marching AND-gate operation of 0 AND 1 = 0 with an input signal of "1" provided by the clock signal.
Similarly, as illustrated in Fig. 16(b), in a case when the signal charge previously stored in the capacitor Cij is of the logical level of "0", if the signal of the logical level of "1" stored in a previous bit-level cell Mi(j-1) is fed from the previous bit-level cell Mi(j-1) through the inter-unit cell Bij to the gate electrode of the first nMOS transistor Qij1, at the timing when the signal charge stored in the capacitor Cij keeps the logical level of "0", the first nMOS transistor Qij1 begins turning on for transferring the signal of the logical level of "1" stored in the previous bit-level cell Mi(j-1) to the capacitor Cij so that the logical level of "1" can be stored in the capacitor Cij, and the output node Nout delivers the signal level of "1" stored in the capacitor Cij to the next bit-level cell on the i-th row, so as to execute the marching AND-gate operation of 1 AND 1 = 1 with an input signal of "1" provided by the clock signal.
On the contrary, as illustrated in Fig. 16(c), in a case when the signal charge previously stored in the capacitor Cij is of the logical level of "1", if the signal of the logical level of "0", which is stored in a previous bit-level cell Mi(j-1), is fed from the previous bit-level cell Mi(j-1) through the inter-unit cell Bij to the gate electrode of the first nMOS transistor Qij1, after the timing when the signal charge stored in the capacitor Cij is completely discharged to establish the logical level of "0", then, because the first nMOS transistor Qij1 keeps the off-state, the output node Nout delivers the signal level of "0" stored in the capacitor Cij to the next bit-level cell on the i-th row, so as to execute the marching AND-gate operation of 0 AND 1 = 0 with an input signal of "1" provided by the clock signal.
Similarly, as illustrated in Fig. 16(d), in a case when the signal charge previously stored in the capacitor Cij is of the logical level of "1", if the signal of the logical level of "1" stored in a previous bit-level cell Mi(j-1) is fed from the previous bit-level cell Mi(j-1) through the inter-unit cell Bij to the gate electrode of the first nMOS transistor Qij1, after the timing when the signal charge stored in the capacitor Cij is completely discharged to establish the logical level of "0", the first nMOS transistor Qij1 begins turning on for transferring the signal of the logical level of "1" stored in the previous bit-level cell Mi(j-1) to the capacitor Cij so that the logical level of "1" can be stored in the capacitor Cij, and the output node Nout delivers the signal level of "1" stored in the capacitor Cij to the next bit-level cell on the i-th row, so as to execute the marching AND-gate operation of 1 AND 1 = 1 with an input signal of "1" provided by the clock signal.
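The four modes of Figs. 16(a)-(d) can be tabulated with a short logic-level sketch. It is an idealized illustration only: the reset is assumed to complete before the transfer, analog effects are ignored, and the names `next_stored_bit` and `MODES` are hypothetical.

```python
def next_stored_bit(stored_bit, previous_bit, clock_input=1):
    """Logic-level model of one bit-level cell over one clock cycle.

    Assumption: the capacitor is fully reset before the transfer phase, so the
    previously stored bit has no influence on the result.
    """
    for bit in (stored_bit, previous_bit, clock_input):
        if bit not in (0, 1):
            raise ValueError("all inputs must be 0 or 1")
    capacitor = stored_bit
    capacitor = 0                            # reset phase: clock high discharges the capacitor
    capacitor = previous_bit & clock_input   # transfer phase: marching AND-gate
    return capacitor


# (previously stored bit, bit fed from the previous cell) for each figure panel
MODES = {
    "Fig. 16(a)": (0, 0),
    "Fig. 16(b)": (0, 1),
    "Fig. 16(c)": (1, 0),
    "Fig. 16(d)": (1, 1),
}

for panel, (stored, previous) in MODES.items():
    print(panel, "stored:", stored, "previous:", previous,
          "-> new:", next_stored_bit(stored, previous))
```

In every mode the new content of the capacitor equals the bit fed from the previous cell, regardless of what the capacitor held before the reset.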
Similar to the configuration illustrated in Fig. 11, an inter-unit cell Bij is inserted between the (j-1)-th bit-level cell Mi(j-1) and the j-th bit-level cell Mij, and the j-th bit-level cell Mij encompasses the first nMOS transistor Qij1 having the drain electrode connected to the clock signal supply line through the first delay element D ij1 and the gate electrode connected to the inter-unit cell Bij through the second delay element D ij2; the second nMOS transistor Qij2 having the drain electrode connected to the source electrode of the first nMOS transistor Qij1, the gate electrode connected to the clock signal supply line, and the source electrode connected to the ground potential; and the capacitor Cij configured to store the information of the bit-level cell Mij, connected in parallel with the second nMOS transistor Qij2. However, the features that the first delay element D ij1 is implemented by a first diode D1a, and that the second delay element D ij2 is implemented by a tandem connection of a second diode D2a and a third diode D3a, are distinguishable from the configuration illustrated in Fig. 11.
Any p-n junction diode can be represented by an equivalent circuit encompassing resistors, including the series resistance such as the diffusion resistance, the lead resistance, the ohmic contact resistance and the spreading resistance, and capacitors, including the diode capacitance such as the junction capacitance or the diffusion capacitance, so that a single diode or a tandem connection of diodes can serve as a "resistive-capacitive delay" or "R-C delay". Because the value of the R-C delay can be made much smaller than the values achieved by the specialized and dedicated R-C elements such as the first meandering line 91 and the second meandering line 97 illustrated in Figs. 9 and 12, the operation of the j-th bit-level cell Mij with the inter-unit cell Bij illustrated in Fig. 17 can be preferable to the operation achieved by the configuration illustrated in Fig. 12. That is, the operation of the j-th bit-level cell Mij with the inter-unit cell Bij illustrated in Fig. 17 can approach the ideal delay performance illustrated in Figs. 7A and 7B, in which rise time and fall time are not illustrated and the waveforms of the pulses are illustrated by ideal rectangular shapes. In addition to the performance by the configuration illustrated in Figs. 11 and 12, because the tandem connection of the second diode D2a and the third diode D3a can efficiently block the flow of the reverse-directional current, the configuration implemented by a combination of the j-th bit-level cell Mij with the inter-unit cell Bij illustrated in Fig. 17 can achieve a better isolation between the signal-storage state of the (j-1)-th bit-level cell Mi(j-1) and the signal-storage state of the j-th bit-level cell Mij, even if the signal of the lower logical level of "0" stored in the previous bit-level cell Mi(j-1) is fed to the gate electrode of the first nMOS transistor Qij1 through the inter-unit cell Bij.
In Fig. 18, in addition to the configuration illustrated in Fig. 17, another inter-unit cell Bi(j-1) is provided between the (j-2)-th bit-level cell Mi(j-2) and the (j-1)-th bit-level cell Mi(j-1), configured to isolate the signal-storage state of the (j-1)-th bit-level cell Mi(j-1) in the (j-1)-th memory unit Uj-1 from the signal-storage state of the (j-2)-th bit-level cell Mi(j-2) in the (j-2)-th memory unit Uj-2, and to transfer a signal from the (j-2)-th bit-level cell Mi(j-2) to the (j-1)-th bit-level cell Mi(j-1) at the required timing determined by the clock signal, which is supplied through the clock signal supply line. In Fig. 18, because the input terminal of the (j-1)-th bit-level cell Mi(j-1) is connected to the inter-unit cell Bi(j-1), the signal charge stored in the (j-2)-th bit-level cell Mi(j-2) is fed to the second delay element D i(j-1)2 through the inter-unit cell Bi(j-1) at the required timing, and the transfer of the signal charge is cut off thereafter.
Similar to the configuration of the j-th bit-level cell Mij, the (j-1)-th bit-level cell Mi(j-1) encompasses a first nMOS transistor Qi(j-1)1 having a drain electrode connected to the clock signal supply line through a first delay element D i(j-1)1 and a gate electrode connected to the inter-unit cell Bi(j-1) through a second delay element D i(j-1)2; a second nMOS transistor Qi(j-1)2 having a drain electrode connected to the source electrode of the first nMOS transistor Qi(j-1)1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Ci(j-1) configured to store the information of the bit-level cell Mi(j-1), connected in parallel with the second nMOS transistor Qi(j-1)2. Here, the first delay element D i(j-1)1 is implemented by a first diode D1b, and the second delay element D i(j-1)2 is implemented by a tandem connection of a second diode D2b and a third diode D3b.
As explained above, because a single diode or a tandem connection of diodes can serve as a "resistive-capacitive delay" or "R-C delay", the operation of the (j-1)-th bit-level cell Mi(j-1) with the inter-unit cell Bi(j-1) illustrated in Fig. 18 is substantially the same as the operation of the configuration illustrated in Fig. 13. In addition to the performance by the configuration illustrated in Fig. 13, because the tandem connection of the second diode D2b and the third diode D3b can efficiently block the flow of the reverse-directional current, the configuration implemented by a combination of the (j-1)-th bit-level cell Mi(j-1) with the inter-unit cell Bi(j-1) illustrated in Fig. 18 can achieve a better isolation between the signal-storage state of the (j-2)-th bit-level cell Mi(j-2) and the signal-storage state of the (j-1)-th bit-level cell Mi(j-1), even if the signal of the lower logical level of "0" stored in the previous bit-level cell Mi(j-2) is fed to the gate electrode of the first nMOS transistor Qi(j-1)1 through the inter-unit cell Bi(j-1).
In actual semiconductor devices, many parasitic resistances (stray resistances) and many parasitic capacitances (stray capacitances) associated with wirings, gate structures, electrode structures, and junction structures are inherent. Therefore, in a very high speed operation of the marching main memory, the delineation of extrinsic resistor elements and capacitor elements can be omitted, if the parasitic resistances and the parasitic capacitances can achieve the required delay times td1, td2 compared with the operation speed of the marching main memory. Accordingly, in the configuration illustrated in Figs. 11-13 and 16, the first delay elements D i(j-1)1 and D ij1 can be omitted, as illustrated in Figs. 19, 20 and 22.
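Whether a given stray resistance and stray capacitance can provide a required delay may be estimated roughly with a lumped first-order R-C model. The sketch below is an assumption-laden illustration, not a method stated in the specification: it treats the delay element as a single resistor and capacitor driven by an ideal step, and it defines the delay as the time to reach a chosen fraction of the final voltage (0.5 by default, an arbitrary choice). The names `rc_delay` and `required_rc` are hypothetical.

```python
import math


def rc_delay(resistance_ohm, capacitance_farad, threshold_fraction=0.5):
    """Time for a first-order R-C node to reach a fraction of its final value.

    Assumption: ideal step input, lumped R and C, so
    v(t) = V * (1 - exp(-t / (R * C))).
    """
    if not 0.0 < threshold_fraction < 1.0:
        raise ValueError("threshold_fraction must be between 0 and 1")
    time_constant = resistance_ohm * capacitance_farad
    return -time_constant * math.log(1.0 - threshold_fraction)


def required_rc(delay_seconds, threshold_fraction=0.5):
    """R * C product needed to obtain the given delay under the same model."""
    if not 0.0 < threshold_fraction < 1.0:
        raise ValueError("threshold_fraction must be between 0 and 1")
    return -delay_seconds / math.log(1.0 - threshold_fraction)


tau_clock = 1.0e-9                    # hypothetical 1 ns clock period
td1 = tau_clock / 4.0                 # preferred value of td1 from the text
print("R*C needed for td1:", required_rc(td1))
print("delay of 1 kOhm and 1 pF:", rc_delay(1.0e3, 1.0e-12))
```

Real delay elements such as meandering lines or diodes are distributed or nonlinear, so this estimate only indicates the order of magnitude.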
In one of the other examples of the bit-level cells used in the computer system pertaining to the first embodiment of the present invention, illustrated in Fig. 19, the j-th bit-level cell Mij encompasses a first nMOS transistor Qij1, similar to the configuration illustrated in Fig. 11, but the first nMOS transistor Qij1 has a drain electrode directly connected to the clock signal supply line, and the first delay element D ij1 employed in the configuration illustrated in Fig. 11 is omitted. The remaining features are substantially the same as the configuration illustrated in Fig. 11: the first nMOS transistor Qij1 has a gate electrode connected to the inter-unit cell Bij through a signal-delay element D ij, which corresponds to the second delay element D ij2 illustrated in Fig. 11; the second nMOS transistor Qij2 has a drain electrode connected to a source electrode of the first nMOS transistor Qij1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Cij configured to store the information of the bit-level cell Mij is connected in parallel with the second nMOS transistor Qij2.
In the other example of the bit-level cell pertaining to the first embodiment illustrated in Fig. 19, similar to the configuration illustrated in Figs. 11-13 and 16, the inter-unit cell Bij is further provided so as to isolate the signal-storage state of the j-th bit-level cell Mij in the j-th memory unit Uj from the signal-storage state of the (j-1)-th bit-level cell Mi(j-1) in the (j-1)-th memory unit Uj-1. Furthermore, the inter-unit cell Bij transfers a signal from the (j-1)-th bit-level cell Mi(j-1) to the j-th bit-level cell Mij at a required timing determined by a clock signal, which is supplied through the clock signal supply line. Because the j-th memory unit Uj stores information of byte size or word size by the sequence of bit-level cells arrayed in the j-th memory unit Uj, and the (j-1)-th memory unit Uj-1 stores information of byte size or word size by the sequence of bit-level cells arrayed in the (j-1)-th memory unit Uj-1, a sequence of inter-unit cells arrayed in parallel with the memory units Uj-1 and Uj transfers the information of byte size or word size, controlled by the clock signal supplied through the clock signal supply line, so that the information of byte size or word size can march along a predetermined direction, pari passu.
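The word-wide marching described above can be pictured with a small logic-level sketch in which each memory unit holds one word and every word advances by exactly one unit per clock cycle. This is an illustrative assumption-based model only: the list-of-lists representation and the name `march_one_clock` are hypothetical, and all analog behavior of the cells is ignored.

```python
def march_one_clock(units, incoming_word):
    """Advance every stored word by one memory unit.

    units: list of words, units[0] being the unit nearest the input side;
           each word is a list of bits with one bit per row.
    incoming_word: word written into the first unit during this clock cycle.
    Returns (new_units, outgoing_word), where outgoing_word leaves the last unit.
    """
    if not units:
        raise ValueError("at least one memory unit is required")
    width = len(units[0])
    if any(len(word) != width for word in units) or len(incoming_word) != width:
        raise ValueError("all words must have the same width")
    outgoing_word = list(units[-1])
    # All bits of a word move together, so the word stays aligned column by column.
    new_units = [list(incoming_word)] + [list(word) for word in units[:-1]]
    return new_units, outgoing_word


units = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
units, out = march_one_clock(units, [0, 1, 0, 1])
print("units after one clock:", units)
print("word delivered at the output:", out)
```

Because the isolation provided by the inter-unit cells limits the transfer to one unit per clock cycle, the model shifts each word by a single position and never lets a bit skip ahead of the other bits in its word.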
As illustrated in Fig. 19, because the input terminal of the j-th bit-level cell Mij on the i-th row is connected to the inter-unit cell Bij, the signal charge stored in the (j-1)-th bit-level cell Mi(j-1) is fed to the signal-delay element D ij through the inter-unit cell Bij at the required timing, and the transfer operation of the signal charge is cut off at periods other than the required timing.
In Fig. 20, in addition to the configuration illustrated in Fig. 19, another inter-unit cell Bi(j-1) is provided between the (j-2)-th bit-level cell Mi(j-2) and the (j-1)-th bit-level cell Mi(j-1), configured to isolate the signal-storage state of the (j-1)-th bit-level cell Mi(j-1) in the (j-1)-th memory unit Uj-1 from the signal-storage state of the (j-2)-th bit-level cell Mi(j-2) in the (j-2)-th memory unit Uj-2, and to transfer a signal from the (j-2)-th bit-level cell Mi(j-2) to the (j-1)-th bit-level cell Mi(j-1) at the required timing determined by the clock signal, which is supplied through the clock signal supply line. In Fig. 20, because the input terminal of the (j-1)-th bit-level cell Mi(j-1) on the i-th row is connected to the inter-unit cell Bi(j-1), the signal charge stored in the (j-2)-th bit-level cell Mi(j-2) is fed to the signal-delay element D i(j-1) through the inter-unit cell Bi(j-1) at the required timing, and the transfer operation of the signal charge is cut off thereafter.
Similar to the configuration of the j-th bit-level cell Mij, the (j-1)-th bit-level cell Mi(j-1) encompasses a first nMOS transistor Qi(j-1)1 having a drain electrode directly connected to the clock signal supply line and a gate electrode connected to the inter-unit cell Bi(j-1) through a signal-delay element D i(j-1); a second nMOS transistor Qi(j-1)2 having a drain electrode connected to the source electrode of the first nMOS transistor Qi(j-1)1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Ci(j-1) configured to store the information of the bit-level cell Mi(j-1), connected in parallel with the second nMOS transistor Qi(j-1)2 .
In the circuit configuration, as one of other examples of the bit-level cells pertaining to the first embodiment, illustrated in Figs. 19 and 20, the second nMOS transistor Qij2 of the bit-level cell Mij, serves as a reset-transistor configured to reset the signal charge stored in the capacitor Cij, when the clock signal of high-level (or a logical level of "1") is applied to the gate electrode of the second nMOS transistor Qij2, discharging the signal charge already stored in the capacitor Cij, and the second nMOS transistor Qi(j-1)2 of the bit-level cell Mi(j-1) serves as a reset-transistor configured to reset the signal charge stored in the capacitor Ci(j-1), when the clock signal of high-level (or a logical level of "1") is applied to the gate electrode of the second nMOS transistor Qi(j-1)2, discharging the signal charge already stored in the capacitor Ci(j-1).
In Figs. 19 and 20, the isolation transistors Qi(j-1)3 and Qij3 shall be high-speed transistors having a shorter rise time, a shorter period of conductive state, and a shorter fall time than the second nMOS transistors Qi(j-1)2 and Qij2, which have larger stray capacitances and larger stray resistances associated with gate circuits and gate structures so that, when the second nMOS transistors Qi(j-1)2 and Qij2 are still in the cut-off state, the isolation transistors Qi(j-1)3 and Qij3 becomes the conductive state very rapidly so as to transfer the signal charges between the memory units, and when the second nMOS transistors Qi(j-1)2 and Qij2 start slowly toward the conductive state for discharging the signal charge stored in the capacitors Ci(j-1) and Cij, the isolation transistors Qi(j-1)3 and Qij3 proceeds to become the cut-off state very rapidly so as to establish the isolation between the memory units.
Fig. 21 illustrates a detailed response of the bit-level cell Mi(j-1) illustrated in Fig. 20, which is one of other examples of the bit-level cells used in the computer system pertaining to the first embodiment of the present invention, to the waveform of the clock signal illustrated by thin solid line, for a case that the signal-delay element D i(j-1) is implemented by an R-C delay circuit. The clock signal illustrated by thin solid line swings periodically between the logical levels of "1" and "0" with the clock period TAUclock. In Fig. 21, each of the time intervals TAU1, TAU2, TAU3 and TAU4 is defined to be a quarter of the clock period TAUclock (TAU1 = TAU2 = TAU3 = TAU4 = TAUclock/4).
In a normal operation of the marching memory, the signal charge stored in the capacitor Ci(j-1) is actually of either the logical level of "0" or "1", as illustrated in Figs. 22 (a)-(d). If the signal charge stored in the capacitor Ci(j-1) is of the logical level of "1", as illustrated in Figs. 22 (c) and (d), although the first nMOS transistor Qi(j-1)1 still keeps the off-state because the potential of the gate electrode of the first nMOS transistor Qi(j-1)1 is delayed by the signal-delay element D i(j-1), the capacitor Ci(j-1) can begin discharging at the beginning of the time interval TAU1, because the second nMOS transistor Qi(j-1)2 becomes active rapidly when the clock signal of the high-level is applied to the gate electrode of the second nMOS transistor Qi(j-1)2, under the assumption that an ideal operation of the second nMOS transistor Qi(j-1)2 with no delay can be approximated. Therefore, if the signal charge stored in the capacitor Ci(j-1) is actually of the logical level of "1", after the clock signal of high-level has been applied to the gate electrode of the second nMOS transistor Qi(j-1)2, as illustrated by the thin solid line in Fig. 21, the signal charge stored in the capacitor Ci(j-1) will be discharged to the logical level of "0", and at approximately the same time, the first nMOS transistor Qi(j-1)1 is prepared to be active as a transfer-transistor, delayed by a negligibly-short delay time determined by parasitic elements implemented by stray resistance and stray capacitance. In Fig. 21, the change of the potential at the drain electrode of the first nMOS transistor Qi(j-1)1 is illustrated in exaggerated form by the dash-dotted line.
And, as illustrated by a thick solid line in Fig. 21, when the signal level of "1" stored in a previous bit-level cell Mi(j-2) is fed from the previous bit-level cell Mi(j-2) through the inter-unit cell Bi(j-1) to the gate electrode of the first nMOS transistor Qi(j-1)1, the first nMOS transistor Qi(j-1)1 turns on, and the first nMOS transistor Qi(j-1)1 transfers the signal level of "1" stored in the previous bit-level cell Mi(j-2), delayed by a predetermined delay time td2 determined by the signal-delay element D i(j-1), to the capacitor Ci(j-1). Alternatively, as illustrated by a broken line in Fig. 21, when the signal level of "0" stored in a previous bit-level cell Mi(j-2) is fed from the previous bit-level cell Mi(j-2) to the gate electrode of the first nMOS transistor Qi(j-1)1, the first nMOS transistor Qi(j-1)1 keeps the off-state. At this instant of time, because the capacitor Ci(j-1) still keeps the logical level of "0", the first nMOS transistor Qi(j-1)1 equivalently transfers the signal level of "0" stored in the previous bit-level cell Mi(j-2). An output node Nout serving as an output terminal of the bit-level cell Mi(j-1) delivers the signal stored in the capacitor Ci(j-1) to the next bit-level cell on the i-th row.
Because the clock signal swings periodically between the logical levels of "1" and "0" with the clock period TAUclock, as illustrated by the thin solid line, the clock signal becomes the logical level of "0" as time proceeds by 1/2 TAUclock, that is, at the beginning of the time interval TAU3, and the potential at the drain electrode of the first nMOS transistor Qi(j-1)1 begins to decay rapidly, as illustrated in exaggerated form by the dash-dotted line. If the inter-unit cell Bij, inserted between the current bit-level cell Mi(j-1) and the next bit-level cell Mij, is implemented by an nMOS transistor, the path between the output terminal of the current bit-level cell Mi(j-1) and the gate electrode of the first nMOS transistor Qij1 of the next bit-level cell Mij becomes the cut-off state by the logical level of "0" of the clock signal being applied to the gate electrode of the nMOS transistor. Therefore, the output node Nout cannot deliver the signal transferred from the previous bit-level cell Mi(j-2) further to the next bit-level cell Mij like duckpins in the time intervals TAU3 and TAU4, and the signal is blocked from being domino-transferred to the gate electrode of the next first nMOS transistor Qij1. Because the first nMOS transistor Qi(j-1)1 becomes the cut-off state in the time intervals TAU3 and TAU4, the potential at the output node Nout is kept in a floating state, and the signal states stored in the capacitor Ci(j-1) are held.
When the clock signal becomes the logical level of "1" again, as illustrated by the thin solid line in a next column of Fig. 21, the output node Nout connecting the source electrode of the first nMOS transistor Qi(j-1)1 and the drain electrode of the second nMOS transistor Qi(j-1)2, which serves as the output terminal of the bit-level cell Mi(j-1), can deliver the signal stored in the capacitor Ci(j-1) to the next bit-level cell Mij at the next clock cycle because the inter-unit cell Bij becomes the conductive state, and the potential at the drain electrode of the first nMOS transistor Qi(j-1)1 increases, as illustrated in exaggerated form by the dash-dotted line.
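The quarter-period timing described above can be summarized in a small sketch. The following Python fragment is only an illustration under stated assumptions: the function names `phase_of`, `clock_level` and `cell_activity` are hypothetical and do not appear in the specification, the clock is idealized as a square wave with no rise or fall time, and the clock period TAUclock is given as an arbitrary number.

```python
def phase_of(t, tau_clock):
    """Return which quarter interval (1 for TAU1 ... 4 for TAU4) contains time t."""
    frac = (t % tau_clock) / tau_clock   # position inside the current clock period
    return int(frac * 4) + 1


def clock_level(t, tau_clock):
    """Idealized clock: logical "1" during TAU1 and TAU2, "0" during TAU3 and TAU4."""
    return 1 if phase_of(t, tau_clock) <= 2 else 0


def cell_activity(t, tau_clock):
    """What a bit-level cell does at time t in this simplified picture.

    Clock high: the capacitor is reset and the incoming signal is transferred.
    Clock low : the inter-unit cell is cut off and the stored signal is held.
    """
    return "transfer" if clock_level(t, tau_clock) == 1 else "hold"


if __name__ == "__main__":
    tau = 4.0  # hypothetical clock period, arbitrary units
    for t in (0.5, 1.5, 2.5, 3.5, 4.5):
        print(t, phase_of(t, tau), clock_level(t, tau), cell_activity(t, tau))
```

In this sketch the sample at t = 4.5 falls in TAU1 of the next clock period, which corresponds to the "next column" of Fig. 21 where the transfer resumes.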
Figs. 22 (a)-(d) illustrate four modes of signal-transferring operations, respectively, focusing on the bit-level cell Mij illustrated in Figs. 19 and 20. The bit-level cell Mij is one of the bit-level cells arrayed sequentially in the j-th memory unit Uj, and the j-th memory unit Uj stores information of byte size or word size by the sequence of bit-level cells arrayed sequentially in the j-th memory unit Uj. In the computer system pertaining to the first embodiment of the present invention, the information of byte size or word size arrayed sequentially marches side by side from a previous memory unit to a next memory unit, pari passu. In Figs. 22 (a)-(d), the clock signal is supplied by the clock signal supply line CLOCK so as to swing periodically between the logical levels of "1" and "0" with the clock period TAUclock, while the clock signal supply line CLOCK serves as a power supply line.
Figs. 22(a) and (b) illustrate the cases when the logical level of "0" is stored by the previous clock signal into the capacitor Cij, and Figs. 22 (c) and (d) illustrate the cases when the logical level of "1" is stored by the previous clock signal into the capacitor Cij as one of the signals in the information of byte size or word size. As illustrated in Fig. 22(a), in a case when the signal charge previously stored in the capacitor Cij is of the logical level of "0", if the signal of the logical level of "0", which is stored in a previous bit-level cell Mi(j-1), as one of the signals in the information of byte size or word size to be transferred in a cooperative way, is fed from the previous bit-level cell Mi(j-1) through the inter-unit cell Bij (the illustration is omitted) to the gate electrode of the first nMOS transistor Qij1, the first nMOS transistor Qij1 keeps the off-state. At this instant of time, because the capacitor Cij still keeps the logical level of "0", the first nMOS transistor Qij1 equivalently transfers the logical level of "0" to the capacitor Cij. Then, the output node Nout delivers the signal level of "0", which is maintained in the capacitor Cij, to the next bit-level cell as illustrated in Fig. 22(a).
Similarly, as illustrated in Fig. 22(b), in a case when the signal charge previously stored in the capacitor Cij is of the logical level of "0", if the signal of the logical level of "1" stored in a previous bit-level cell Mi(j-1) is fed from the previous bit-level cell Mi(j-1) through the inter-unit cell Bij to the gate electrode of the first nMOS transistor Qij1, at the timing when the signal charge stored in the capacitor Cij keeps the logical level of "0", the first nMOS transistor Qij1 begins turning on for transferring the signal of the logical level of "1" stored in the previous bit-level cell Mi(j-1) to the capacitor Cij, so that the logical level of "1" can be stored in the capacitor Cij, and the output node Nout delivers the signal level of "1" stored in the capacitor Cij to the next bit-level cell as illustrated in Fig. 22(b).
On the contrary, as illustrated in Fig. 22(c), in a case when the signal charge previously stored in the capacitor Cij is of the logical level of "1", if the signal of the logical level of "0", which is stored in a previous bit-level cell Mi(j-1), is fed from the previous bit-level cell Mi(j-1) through the inter-unit cell Bij to the gate electrode of the first nMOS transistor Qij1, after the timing when the signal charge stored in the capacitor Cij is completely discharged to establish the logical level of "0", the first nMOS transistor Qij1 keeps off-state. Then, the output node Nout delivers the signal level of "0" stored in the capacitor Cij to the next bit-level cell as illustrated in Fig. 22(c).
Similarly, as illustrated in Fig. 22(d), in a case when the signal charge previously stored in the capacitor Cij is of the logical level of "1", if the signal of the logical level of "1" stored in a previous bit-level cell Mi(j-1) is fed from the previous bit-level cell Mi(j-1) through the inter-unit cell Bij to the gate electrode of the first nMOS transistor Qij1, after the timing when the signal charge stored in the capacitor Cij is completely discharged to establish the logical level of "0", the first nMOS transistor Qij1 turns on, and the first nMOS transistor Qij1 transfers the signal of the logical level of "1" stored in the previous bit-level cell Mi(j-1) to the capacitor Cij. Then, the output node Nout delivers the signal level of "1" stored in the capacitor Cij to the next bit-level cell as illustrated in Fig. 22(d).
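The four modes of Figs. 22(a)-(d) can be condensed into a reset-then-transfer rule. The sketch below is a purely logical illustration, not a circuit model: the function name `clock_cycle` and the dictionary `modes` are hypothetical, and analog effects such as delay times and stray elements are ignored.

```python
def clock_cycle(stored, incoming):
    """Idealized behavior of the bit-level cell Mij during one clock-high phase.

    stored   -- logical level held in the capacitor Cij from the previous cycle
    incoming -- logical level fed from the previous bit-level cell Mi(j-1)
    Returns the level held in Cij (and seen at the output node Nout) afterwards.
    """
    assert stored in (0, 1) and incoming in (0, 1)
    # Step 1: the reset-transistor Qij2 discharges Cij, whatever it held before.
    capacitor = 0
    # Step 2: the transfer-transistor Qij1 turns on only for an incoming "1"
    # and charges Cij from the clock line; for an incoming "0" it stays off.
    if incoming == 1:
        capacitor = 1
    return capacitor


# (previously stored level, incoming level) for each panel of Fig. 22
modes = {
    "Fig. 22(a)": (0, 0),
    "Fig. 22(b)": (0, 1),
    "Fig. 22(c)": (1, 0),
    "Fig. 22(d)": (1, 1),
}
results = {name: clock_cycle(s, i) for name, (s, i) in modes.items()}

if __name__ == "__main__":
    for name, level in results.items():
        print(name, "->", level)
```

In this simplified picture the level delivered at Nout depends only on the incoming signal, because the previously stored charge is always cleared first.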
As mentioned above, with an input signal of "1" provided by the clock signal and another input signal of "1" or "0" provided by the previous bit-level cell Mi(j-1), the bit-level cell Mij can establish the "marching AND-gate" operations of:

1 AND 1 = 1
1 AND 0 = 0,

and with an input signal of "0" provided by the clock signal and another input signal of "1" or "0" provided by the previous bit-level cell Mi(j-1), the bit-level cell Mij can establish the "marching AND-gate" operations of:

0 AND 1 = 0
0 AND 0 = 0.


Therefore, in a gate-level representation of the cell array corresponding to the marching main memory 31 illustrated in Fig. 4, as illustrated in Fig. 23, a first cell M11 allocated at the leftmost side on a first row and connected to an input terminal I1 encompasses a capacitor C11 configured to store the information, and a marching AND-gate G11 having one input terminal connected to the capacitor C11, the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate G12 assigned to the adjacent second cell M12 on the first row. An example of the response to the waveform of the clock signal is illustrated in Fig. 7C. When the logical value of "1" of the clock signal is fed to the other input terminal of the marching AND-gate G11, the information stored in the capacitor C11 is transferred to a capacitor C12, assigned to the adjacent second cell M12, and the capacitor C12 stores the information. Namely, the second cell M12 on the first row of the gate-level representation of the cell array implementing the marching main memory 31 encompasses the capacitor C12 and a marching AND-gate G12, which has one input terminal connected to the capacitor C12, the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate G13 assigned to the adjacent third cell M13 on the first row.
Similarly, the third cell M13 on the first row of the gate-level representation of the cell array implementing the marching main memory 31 encompasses a capacitor C13 configured to store the information, and a marching AND-gate G13 having one input terminal connected to the capacitor C13, the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate assigned to the adjacent fourth cell, although the illustration of the fourth cell is omitted. Therefore, when the logical value of "1" is fed to the other input terminal of the marching AND-gate G12, the information stored in the capacitor C12 is transferred to the capacitor C13, assigned to the third cell M13, and the capacitor C13 stores the information, and when the logical value of "1" is fed to the other input terminal of the marching AND-gate G13, the information stored in the capacitor C13 is transferred to the capacitor assigned to the fourth cell. Furthermore, an (n-1)-th cell M1,n-1 on the first row of the gate-level representation of the cell array implementing the marching main memory 31 encompasses a capacitor C1,n-1 configured to store the information, and a marching AND-gate G1,n-1 having one input terminal connected to the capacitor C1,n-1, the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate G1n assigned to the adjacent n-th cell M1n, which is allocated at the rightmost side on the first row and connected to an output terminal O1.
Therefore, each of the cells M11, M12, M13, ..., M1,n-1, M1n stores the information, and transfers the information synchronously with the clock signal, step by step, toward the output terminal O1, so as to provide the processor 11 with the stored information actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
Similarly, in the gate-level representation of the cell array implementing the marching main memory 31 illustrated in Fig. 23, a first cell M21 allocated at the leftmost side on a second row and connected to an input terminal I2 encompasses a capacitor C21, and a marching AND-gate G21 having one input terminal connected to the capacitor C21, the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate G22 assigned to the adjacent second cell M22 on the second row. The second cell M22 on the second row of the gate-level representation of the cell array implementing the marching main memory 31 encompasses the capacitor C22 and a marching AND-gate G22, which has one input terminal connected to the capacitor C22, the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate G23 assigned to the adjacent third cell M23 on the second row. Similarly, the third cell M23 on the second row of the gate-level representation of the cell array implementing the marching main memory 31 encompasses a capacitor C23, and a marching AND-gate G23 having one input terminal connected to the capacitor C23, the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate assigned to the adjacent fourth cell.
Furthermore, an (n-1)-th cell M2,n-1 on the second row of the gate-level representation of the cell array implementing the marching main memory 31 encompasses a capacitor C2,n-1, and a marching AND-gate G2,n-1 having one input terminal connected to the capacitor C2,n-1, the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the next marching AND-gate G2n assigned to the adjacent n-th cell M2n, which is allocated at the rightmost side on the second row and connected to an output terminal O2. Therefore, each of the cells M21, M22, M23, ..., M2,n-1, M2n on the second row stores the information, and transfers the information synchronously with the clock signal, step by step, toward the output terminal O2, so as to provide the processor 11 with the stored information actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
On a third row, a first cell M31 allocated at the leftmost side and connected to an input terminal I3, a second cell M32 adjacent to the first cell M31, a third cell M33 adjacent to the second cell M32, ..., an (n-1)-th cell M3,n-1, and an n-th cell M3n, which is allocated at the rightmost side on the third row and connected to an output terminal O3, are aligned. And, each of the cells M31, M32, M33, ..., M3,n-1, M3n on the third row stores the information, and transfers the information synchronously with the clock signal, step by step, toward the output terminal O3, so as to provide the processor 11 with the stored information actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
On an (m-1)-th row, a first cell M(m-1),1 allocated at the leftmost side and connected to an input terminal Im-1, a second cell M(m-1),2 adjacent to the first cell M(m-1),1, a third cell M(m-1),3 adjacent to the second cell M(m-1),2, ..., an (n-1)-th cell M(m-1),n-1, and an n-th cell M(m-1),n, which is allocated at the rightmost side on the (m-1)-th row and connected to an output terminal Om-1, are aligned. And, each of the cells M(m-1),1, M(m-1),2, M(m-1),3, ..., M(m-1),n-1, M(m-1),n on the (m-1)-th row stores the information, and transfers the information synchronously with the clock signal, step by step, toward the output terminal Om-1, so as to provide the processor 11 with the stored information actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
On an m-th row, a first cell Mm1 allocated at the leftmost side and connected to an input terminal Im, a second cell Mm2 adjacent to the first cell Mm1, a third cell Mm3 adjacent to the second cell Mm2, ..., an (n-1)-th cell Mm(n-1), and an n-th cell Mmn, which is allocated at the rightmost side on the m-th row and connected to an output terminal Om, are aligned. And, each of the cells Mm1, Mm2, Mm3, ..., Mm(n-1), Mmn on the m-th row stores the information, and transfers the information synchronously with the clock signal, step by step, toward the output terminal Om, so as to provide the processor 11 with the stored information actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
Although one of the examples of the transistor-level configurations of the marching AND-gate Gij is illustrated in Fig. 6, there are various circuit configurations to implement the marching AND-gate, which can be applied to the cell array implementing the marching main memory 31 in the computer system pertaining to the first embodiment of the present invention. Another example of the marching AND-gate Gij, which can be applied to the cell array implementing the marching main memory 31, may be a configuration encompassing a CMOS NAND gate and a CMOS inverter connected to the output terminal of the CMOS NAND gate. Because the CMOS NAND gate requires two nMOS transistors and two pMOS transistors, and the CMOS inverter requires one nMOS transistor and one pMOS transistor, the configuration encompassing the CMOS NAND gate and the CMOS inverter requires six transistors. Furthermore, the marching AND-gate Gij can be implemented by other circuit configurations such as resistor-transistor logics, or by various semiconductor elements, magnetic elements, superconductor elements, or single quantum elements, etc., which have a function of AND logic.
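The NAND-plus-inverter alternative and its transistor count can be checked with a short logic-level sketch. The Python names below (`nand`, `inverter`, `marching_and`, `TRANSISTORS`) are hypothetical labels chosen for illustration; the sketch models only Boolean behavior and the transistor tally stated above, not any electrical characteristic.

```python
def nand(a, b):
    """Two-input NAND: output is 0 only when both inputs are 1."""
    return 0 if (a == 1 and b == 1) else 1


def inverter(x):
    """Logical inversion of a single bit."""
    return 1 - x


def marching_and(clock, stored):
    """AND function built as a NAND gate followed by an inverter.

    clock  -- logical level of the clock signal input
    stored -- logical level at the input connected to the capacitor
    """
    return inverter(nand(clock, stored))


# Transistor tally for the CMOS realization described in the text.
TRANSISTORS = {
    "CMOS NAND gate": 2 + 2,   # two nMOS + two pMOS
    "CMOS inverter": 1 + 1,    # one nMOS + one pMOS
}
total_transistors = sum(TRANSISTORS.values())

# Truth table as (clock, stored, output) triples.
truth_table = [(c, s, marching_and(c, s)) for c in (0, 1) for s in (0, 1)]

if __name__ == "__main__":
    print("total transistors:", total_transistors)
    for row in truth_table:
        print(row)
```

The output is "1" only when both the clock input and the stored level are "1", which is the AND function required of the gate Gij.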
As illustrated in Fig. 23, the gate-level representation of the cell array implementing the marching main memory 31 is as simple as the configuration of DRAM, where each of the bit-level cells Mij (i = 1 to m; j = 1 to n) is represented by one capacitor and one marching AND-gate. The vertical sequence of marching AND-gates G11, G21, G31, ..., Gm-1,1, Gm1 implementing the first memory unit U1 shifts the sequence of signals from the input terminals I1, I2, I3, ..., Im-1, Im to the right along the row-direction, or horizontal direction, based on clocks as illustrated in Fig. 7C. And, the vertical sequence of marching AND-gates G12, G22, G32, ..., Gm-1,2, Gm2 implementing the second memory unit U2 shifts the sequence of signals of word size from left to right along the row-direction based on clocks; the vertical sequence of marching AND-gates G13, G23, G33, ..., Gm-1,3, Gm3 implementing the third memory unit U3 shifts the sequence of signals of word size from left to right along the row-direction based on clocks; ...; the vertical sequence of marching AND-gates G1,n-1, G2,n-1, G3,n-1, ..., Gm-1,n-1, Gm,n-1 implementing the (n-1)-th memory unit Un-1 shifts the sequence of signals of word size from left to right along the row-direction based on clocks; and the vertical sequence of marching AND-gates G1,n, G2,n, G3,n, ..., Gm-1,n, Gm,n implementing the n-th memory unit Un shifts the sequence of signals of word size from left to right to the output terminals O1, O2, O3, ..., Om-1, Om based on clocks as illustrated in Fig. 7C. In particular, the time delays td1 and td2 in each marching AND-gate Gij (i = 1 to m; j = 1 to n) are significant for correctly performing the marching-shift actions in every memory unit in the marching main memory 31 successively.
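The word-by-word marching of the m * n cell array can be pictured with a small behavioral sketch. It is a simplification under stated assumptions: the function name `march_array`, the list-of-lists layout, and the example sizes are hypothetical, each clock is treated as one atomic step, and the delay times td1 and td2 are not modeled.

```python
def march_array(array, input_word):
    """Advance the whole cell array by one clock.

    array      -- list of m rows, each a list of n bits; array[i][j] is the bit
                  held in the cell on row i+1 of memory unit U(j+1)
    input_word -- list of m bits presented at the input terminals I1..Im
    Returns (new_array, output_word), where output_word is the word that
    leaves the n-th memory unit Un toward the output terminals O1..Om.
    """
    assert len(array) == len(input_word)
    output_word = [row[-1] for row in array]
    # Every row shifts one column to the right in lockstep, so each column
    # (memory unit) hands its whole word to the next column.
    new_array = [[bit_in] + row[:-1] for row, bit_in in zip(array, input_word)]
    return new_array, output_word


if __name__ == "__main__":
    m, n = 3, 4                                   # hypothetical word size and depth
    array = [[0] * n for _ in range(m)]
    words = [[1, 0, 1], [0, 1, 1]] + [[0, 0, 0]] * 4
    outputs = []
    for word in words:
        array, out = march_array(array, word)
        outputs.append(out)
    print(outputs)
```

In this model a word presented at the input terminals emerges at the output terminals n clocks later, and after that initial latency one word is delivered per clock.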
(REVERSE-DIRECTIONAL MARCHING MAIN MEMORY)
Although Figs. 3-23 illustrate the marching main memory which stores the information in each of memory units U1, U2, U3, ..., Un-1, Un and transfers the information synchronously with the clock signal, step by step, from the input terminals toward the output terminals, Fig. 24 illustrates another marching main memory.
In Fig. 24, each of the memory units U1, U2, U3, ..., Un-1, Un stores the information including word size of data or instructions, and transfers the information in the reverse direction, synchronously with the clock signal, step by step, toward the output terminals, the information being provided from the processor 11 as the resultant data executed in the ALU 112.
Fig. 25(a) illustrates an array of the i-th row of the m * n matrix (here, "m" is an integer determined by word size) in a cell-level representation of the other marching main memory illustrated in Fig. 24, which stores the information of bit level in each of cells Mi1, Mi2, Mi3, ..., Mi,n-1, Mi,n and transfers the information synchronously with the clock signal, step by step, in the direction reverse to that of the marching main memory illustrated in Figs. 3-23, namely from the input terminal IN at the rightmost side toward the output terminal OUT at the leftmost side.
As illustrated in Fig. 25(a), in a reverse-directional marching main memory, a bit-level cell Min of the n-th column and on the i-th row, allocated at the rightmost side on the i-th row and connected to an input terminal IN, encompasses a first nMOS transistor Qin1 having a drain electrode connected to a clock signal supply line through a first delay element D in1 and a gate electrode connected to the input terminal IN through a second delay element D in2; a second nMOS transistor Qin2 having a drain electrode connected to a source electrode of the first nMOS transistor Qin1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Cin configured to store the information of the bit-level cell Min, connected in parallel with the second nMOS transistor Qin2, wherein an output node connecting the source electrode of the first nMOS transistor Qin1 and the drain electrode of the second nMOS transistor Qin2 serves as an output terminal of the bit-level cell Min, configured to transfer the signal stored in the capacitor Cin to the next bit-level cell Mi(n-1).
As illustrated in Fig. 25(b), the clock signal swings periodically between the logical levels of "1" and "0", with a predetermined clock period TAUclock, and when the clock signal becomes the logical level of "1", the second nMOS transistor Qin2 begins to discharge the signal charge, which is already stored in the capacitor Cin at a previous clock cycle. And, after the clock signal of the logical level of "1" is applied and the signal charge stored in the capacitor Cin is completely discharged to become the logical level of "0", the first nMOS transistor Qin1 becomes active as the transfer transistor, delayed by the predetermined delay time td1 determined by the first delay element D in1. The delay time td1 may preferably be set equal to 1/4 TAUclock. Thereafter, when the signal is fed from the input terminal IN to the gate electrode of the first nMOS transistor Qin1, the first nMOS transistor Qin1 transfers the signal fed from the input terminal IN, further delayed by the predetermined delay time td2 determined by the second delay element D in2, to the capacitor Cin. For example, if the logical level of "1" is fed from the input terminal IN to the gate electrode of the first nMOS transistor Qin1, the first nMOS transistor Qin1 becomes the conductive state, and the logical level of "1" is stored in the capacitor Cin. On the other hand, if the logical level of "0" is fed from the input terminal IN to the gate electrode of the first nMOS transistor Qin1, the first nMOS transistor Qin1 keeps the cut-off state, and the logical level of "0" is maintained in the capacitor Cin. Therefore, the bit-level cell Min can establish "a marching AND-gate" operation. The delay time td2 shall be longer than the delay time td1, and the delay time td2 may preferably be set equal to 1/2 TAUclock.
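The preferred delay settings and the requirement that td2 exceed td1 can be written down as a small consistency check. The sketch below is hypothetical in several respects: the function name `cell_timeline` and the event labels are invented for illustration, and measuring both delays from the rising edge of the clock is an assumption made here for simplicity, since the text does not fix the reference instant of td2 numerically.

```python
from fractions import Fraction


def cell_timeline(tau_clock, td1=None, td2=None):
    """Return nominal event times for one clock period of the cell Min.

    tau_clock -- clock period TAUclock (any positive number of time units)
    td1       -- delay of the first delay element; defaults to TAUclock/4
    td2       -- delay of the second delay element; defaults to TAUclock/2
    All times are measured from the rising clock edge (an assumption).
    """
    tau = Fraction(tau_clock)
    td1 = tau / 4 if td1 is None else Fraction(td1)
    td2 = tau / 2 if td2 is None else Fraction(td2)
    if not td2 > td1:
        # The text requires the delay time td2 to be longer than td1.
        raise ValueError("td2 must be longer than td1")
    return {
        "clock_rise_discharge_starts": Fraction(0),
        "transfer_transistor_active": td1,
        "input_reaches_gate": td2,
        "clock_fall": tau / 2,
        "next_clock_rise": tau,
    }


if __name__ == "__main__":
    for name, t in cell_timeline(8).items():
        print(f"{name:30s} t = {t}")
```

Exact fractions are used instead of floating-point numbers so that quarter-period and half-period values compare exactly.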
When the clock signal becomes the logical level of "0" at a time when time proceeds 1/2 TAUclock, the output node connecting the source electrode of the first nMOS transistor Qin1 and the drain electrode of the second nMOS transistor Qin2 cannot deliver the signal, which is entered to the gate electrode of the first nMOS transistor Qin1, further to the next bit-level cell Mi(n-1), at a time when time proceeds 1/2 TAUclock, as the signal is blocked from being transferred to the gate electrode of the next first nMOS transistor Qi(n-1)1 delayed by the delay time td2 = 1/2 TAUclock determined by the second delay element D i(n-1)2. And, at a time when time proceeds TAUclock, when the next clock signal becomes the logical level of "1" again, a sequence of the second nMOS transistors
As illustrated in Fig. 25(a), in a reverse-directional marching main memory, a bit-level cell Mi(n-1) of the (n-1)-th column and on the i-th row, allocated at the second position from the right on the i-th row, encompasses a first nMOS transistor Qi(n-1)1 having a drain electrode connected to the clock signal supply line through a first delay element D i(n-1)1 and a gate electrode connected to the output terminal of the bit-level cell Min through a second delay element D i(n-1)2; a second nMOS transistor Qi(n-1)2 having a drain electrode connected to a source electrode of the first nMOS transistor Qi(n-1)1, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Ci(n-1) configured to store the information of the bit-level cell Mi(n-1), connected in parallel with the second nMOS transistor Qi(n-1)2. When the clock signal becomes the logical level of "1", the second nMOS transistor Qi(n-1)2 begins to discharge the signal charge, which is already stored in the capacitor Ci(n-1) at a previous clock cycle. And, as illustrated in Fig. 25(b), the logical value of "1" is kept from time "t" to time "t+1" in the capacitor Ci(n-1). After the clock signal of the logical level of "1" is applied and the signal charge stored in the capacitor Ci(n-1) is completely discharged to become the logical level of "0", the first nMOS transistor Qi(n-1)1 becomes active as the transfer transistor, delayed by the delay time td1 determined by the first delay element D i(n-1)1. Thereafter, when the signal is fed from the output terminal of the bit-level cell Min to the gate electrode of the first nMOS transistor Qi(n-1)1, the first nMOS transistor Qi(n-1)1 transfers the signal stored in the previous bit-level cell Min, further delayed by the delay time td2 determined by the second delay element D i(n-1)2, to the capacitor Ci(n-1).
When the clock signal becomes the logical level of "0" at a time when time proceeds 1/2 TAUclock, the output node connecting the source electrode of the first nMOS transistor Qi(n-1)1 and the drain electrode of the second nMOS transistor Qi(n-1)2 cannot deliver the signal, which is entered to the gate electrode of the first nMOS transistor Qi(n-1)1, further to the next bit-level cell Mi(n-2), at a time when time proceeds 1/2 TAUclock, as the signal is blocked from being transferred to the gate electrode of the next first nMOS transistor Qi(n-2)1 (illustration is omitted) delayed by the delay time td2 = 1/2 TAUclock determined by the second delay element D i(n-2)2 (illustration is omitted).
Similarly, the third cell Mi3 from the left, on the i-th row, of the reverse-directional marching main memory encompasses a first nMOS transistor Qi31 having a drain electrode connected to the clock signal supply line through a first delay element D i31 and a gate electrode connected to the output terminal of the bit-level cell Mi4 (illustration is omitted) through a second delay element D i32; a second nMOS transistor Qi32 having a drain electrode connected to a source electrode of the first nMOS transistor Qi31, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Ci3 configured to store the information of the bit-level cell Mi3, connected in parallel with the second nMOS transistor Qi32. When the clock signal becomes the logical level of "1", the second nMOS transistor Qi32 begins to discharge the signal charge, which is already stored in the capacitor Ci3 at a previous clock cycle. After the clock signal of the logical level of "1" is applied and the signal charge stored in the capacitor Ci3 is completely discharged to become the logical level of "0", the first nMOS transistor Qi31 becomes active as the transfer transistor, delayed by the delay time td1 determined by the first delay element D i31. Thereafter, when the signal is fed from the output terminal of the bit-level cell Mi4 to the gate electrode of the first nMOS transistor Qi31, the first nMOS transistor Qi31 transfers the signal stored in the previous bit-level cell Mi4, further delayed by the delay time td2 determined by the second delay element D i32, to the capacitor Ci3.
When the clock signal becomes the logical level of "0" at a time when time proceeds 1/2 TAUclock, the output node connecting the source electrode of the first nMOS transistor Qi31 and the drain electrode of the second nMOS transistor Qi32 cannot deliver the signal, which is entered to the gate electrode of the first nMOS transistor Qi31, further to the next bit-level cell Mi2, at a time when time proceeds 1/2 TAUclock, as the signal is blocked from being transferred to the gate electrode of the next first nMOS transistor Qi21 delayed by the delay time td2 = 1/2 TAUclock determined by the second delay element D i22.
And, as illustrated in Fig. 25(a), in a reverse-directional marching main memory, a bit-level cell Mi2 of the second column from the left, and on the i-th row, encompasses a first nMOS transistor Qi21 having a drain electrode connected to the clock signal supply line through a first delay element D i21 and a gate electrode connected to the output terminal of the bit-level cell Mi3 through a second delay element D i22; a second nMOS transistor Qi22 having a drain electrode connected to a source electrode of the first nMOS transistor Qi21, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Ci2 configured to store the information of the bit-level cell Mi2, connected in parallel with the second nMOS transistor Qi22. When the clock signal becomes the logical level of "1", the second nMOS transistor Qi22 begins to discharge the signal charge, which is already stored in the capacitor Ci2 at a previous clock cycle. After the clock signal of the logical level of "1" is applied and the signal charge stored in the capacitor Ci2 is completely discharged to become the logical level of "0", the first nMOS transistor Qi21 becomes active as the transfer transistor, delayed by the delay time td1 determined by the first delay element D i21. Thereafter, when the signal is fed from the output terminal of the bit-level cell Mi3 to the gate electrode of the first nMOS transistor Qi21, the first nMOS transistor Qi21 transfers the signal stored in the previous bit-level cell Mi3, further delayed by the delay time td2 determined by the second delay element D i22, to the capacitor Ci2.
When the clock signal becomes the logical level of "0" after a time of 1/2TAUclock has elapsed, the output node connecting the source electrode of the first nMOS transistor Qi21 and the drain electrode of the second nMOS transistor Qi22 cannot deliver the signal, which is entered to the gate electrode of the first nMOS transistor Qi21, further to the next bit-level cell Mi1, because the signal is blocked from being transferred to the gate electrode of the next first nMOS transistor Qi11, delayed by the delay time td2 = 1/2TAUclock determined by the second delay element Di12.
As illustrated in Fig. 25(a), in a reverse-directional marching main memory, a bit-level cell Mi1 of the first column and on the i-th row, which is allocated at the leftmost side on the i-th row and connected to an output terminal OUT, encompasses a first nMOS transistor Qi11 having a drain electrode connected to the clock signal supply line through a first delay element Di11 and a gate electrode connected to the output terminal of the bit-level cell Mi2 through a second delay element Di12; a second nMOS transistor Qi12 having a drain electrode connected to a source electrode of the first nMOS transistor Qi11, a gate electrode connected to the clock signal supply line, and a source electrode connected to the ground potential; and a capacitor Ci1 configured to store the information of the bit-level cell Mi1, connected in parallel with the second nMOS transistor Qi12. When the clock signal becomes the logical level of "1", the second nMOS transistor Qi12 begins to discharge the signal charge, which is already stored in the capacitor Ci1 at a previous clock cycle. After the clock signal of the logical level of "1" is applied and the signal charge stored in the capacitor Ci1 is completely discharged to become the logical level of "0", the first nMOS transistor Qi11 becomes active as the transfer transistor, delayed by the delay time td1 determined by the first delay element Di11. Thereafter, when the signal is fed from the output terminal of the bit-level cell Mi2 to the gate electrode of the first nMOS transistor Qi11, the first nMOS transistor Qi11 transfers the signal stored in the previous bit-level cell Mi2 to the capacitor Ci1, further delayed by the delay time td2 determined by the second delay element Di12. The output node connecting the source electrode of the first nMOS transistor Qi11 and the drain electrode of the second nMOS transistor Qi12 delivers the signal stored in the capacitor Ci1 to the output terminal OUT.
According to the reverse-directional one-dimensional marching main memory 31 of the first embodiment illustrated in Figs. 24, 25(a) and 25(b), addressing of each of the memory units U1, U2, U3,........., Un-1, Un disappears, and the required information heads for its destination unit connected to the edge of the memory. The mechanism of accessing the reverse-directional one-dimensional marching main memory 31 of the first embodiment is a true alternative to existing memory schemes, which start from the addressing mode to read/write information. Therefore, according to the reverse-directional one-dimensional marching main memory 31 of the first embodiment, memory access without an addressing mode is considerably simpler than in existing memory schemes.
As mentioned above, the bit-level cell Mij can establish "a marching AND-gate" operation. Therefore, as illustrated in Fig. 26, in a gate-level representation of the cell array corresponding to the reverse-directional marching main memory 31 illustrated in Fig. 25(a), the n-th bit-level cell Mi,n allocated at the rightmost side on the i-th row and connected to an input terminal IN encompasses a capacitor Cin configured to store the information, and a marching AND-gate Gin having one input terminal connected to the capacitor Cin, the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the preceding marching AND-gate Gi,n-1 assigned to the adjacent (n-1)-th bit-level cell Mi,n-1 on the i-th row. When the logical value of "1" is fed to the other input terminal of the marching AND-gate Gin, the information stored in the capacitor Cin is transferred to a capacitor Ci,n-1, assigned to the adjacent (n-1)-th bit-level cell Mi,n-1 on the i-th row, and the capacitor Ci,n-1 stores the information. Namely, the (n-1)-th bit-level cell Mi,n-1 on the i-th row of the reverse-directional marching main memory encompasses the capacitor Ci,n-1 and a marching AND-gate Gi,n-1, which has one input terminal connected to the capacitor Ci,n-1, the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the preceding marching AND-gate Gi,n-2 assigned to the adjacent (n-2)-th bit-level cell Mi,n-2 (illustration is omitted).
Similarly, the third bit-level cell Mi3 on the i-th row of the reverse-directional marching main memory encompasses a capacitor Ci3 configured to store the information, and a marching AND-gate Gi3 having one input terminal connected to the capacitor Ci3, the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the preceding marching AND-gate Gi2 assigned to the adjacent second bit-level cell Mi2. Therefore, when the logical value of "1" is fed to the other input terminal of the marching AND-gate Gi3, the information stored in the capacitor Ci3 is transferred to the capacitor Ci2, assigned to the second bit-level cell Mi2, and the capacitor Ci2 stores the information.
Furthermore, the second bit-level cell Mi2 on the i-th row of the reverse-directional marching main memory encompasses the capacitor Ci2 configured to store the information, and the marching AND-gate Gi2 having one input terminal connected to the capacitor Ci2, the other input terminal configured to be supplied with the clock signal, and an output terminal connected to one input terminal of the preceding marching AND-gate Gi1 assigned to the adjacent first bit-level cell Mi1, which is allocated at the leftmost side on the i-th row and connected to an output terminal OUT.
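The gate-level behavior of one row of the reverse-directional marching main memory can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function names `marching_and` and `march_reverse_step` are hypothetical, each capacitor is reduced to a single stored bit, and the delay elements and discharge phase are ignored.

```python
def marching_and(stored_bit: int, clock: int) -> int:
    """Marching AND-gate: passes the stored bit only while the clock is 1."""
    return stored_bit & clock


def march_reverse_step(cells, in_bit, clock):
    """Advance one row by one clock level (simplified, assumed model).

    cells[0] stands for Mi1 (leftmost, next to OUT) and cells[-1] for Min
    (rightmost, next to IN). Returns (new_cells, out_bit).
    """
    if clock == 0:
        # Clock at "0": every AND-gate blocks, so each capacitor keeps its bit.
        return list(cells), None
    # Clock at "1": Mi1 delivers its bit to OUT, every other cell hands its
    # bit to the left neighbor, and Min takes the new bit from IN.
    out_bit = marching_and(cells[0], clock)
    shifted = [marching_and(bit, clock) for bit in cells[1:]]
    shifted.append(in_bit)
    return shifted, out_bit


row = [1, 0, 1, 1]                      # Mi1 .. Mi4
row, out = march_reverse_step(row, in_bit=0, clock=1)
print(row, out)
```

Calling the function repeatedly with the clock at "1" drains the row toward OUT one bit per step, with no address ever being supplied.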
The concept of the marching main memory 31 used in the computer system pertaining to the first embodiment of the present invention is illustrated in Fig. 27. This is different from existing computer memory, because the marching main memory 31 is purposely designed with the functionality of storage and conveyance of information/data through all of the memory units U1, U2, U3,........., Un-1, Un in the marching main memory 31. The marching memory supplies information/data to the processor (CPU) 11 at the same speed as the processor 11. As illustrated in the time-domain relationship of Fig. 9, the memory unit streaming time Tmus required for transferring information/data through one of the memory units U1, U2, U3,........., Un-1, Un in the marching main memory 31 is equal to the clock cycle Tcc in the processor 11. The marching main memory 31 stores information/data in each of the memory units U1, U2, U3,........., Un-1, Un, and transfers the information/data synchronously with the clock signal, step by step, toward the output terminals, so as to provide the processor 11 with the stored information/data so that the arithmetic logic unit 112 can execute the arithmetic and logic operations with the stored information/data.
Therefore, as illustrated in Fig. 28, marching memory structure 3 includes the marching main memory 31 of the first embodiment of the present invention. The term of "the marching memory structure 3" means a generic concept of the memory structure including a marching-instruction register file (RF) 22a and a marching-data register file (RF) 22b connected to the ALU 112, which will be explained further in the following second embodiment, and a marching-instruction cache memory 21a and a marching-data cache memory 21b, which will be explained further in the following third embodiment, in addition to the marching main memory 31 used in the computer system pertaining to the first embodiment of the present invention.
Fig. 29(a) illustrates a forward data-stream Sf flowing from the marching memory structure 3 to the processor 11 and a backward data-stream (reverse data-stream) Sb flowing from the processor 11 to the marching memory structure 3, and Fig. 29(b) illustrates bandwidths established between the marching memory structure 3 and the processor 11 assuming that the memory unit streaming time Tmus in the marching memory structure 3 is equal to the clock cycle Tcc of the processor 11.
The scheme of the marching main memory 31 may be considered to be analogous to a magnetic tape system illustrated in Fig. 30 (a), which encompasses a magnetic tape 503, a take-up reel 502 for winding the magnetic tape 503, a supply reel 501 for rewinding and releasing the magnetic tape 503, a read/write header 504 for reading information/data from the magnetic tape 503 or writing information/data to the magnetic tape 503, and a processor 11 connected to the read/write header 504. As the take-up reel 502 winds the magnetic tape 503, which is released from the supply reel 501, the magnetic tape 503 moves at high speed from the supply reel 501 toward the take-up reel 502, and information/data stored on the magnetic tape 503, being transferred with the movement of the magnetic tape 503 at high speed, are read by the read/write header 504. And the processor 11 connected to the read/write header 504 can execute arithmetic and logic operations with information/data read from the magnetic tape 503. Alternatively, the results of the processing in the processor 11 are sent out to the magnetic tape 503 through the read/write header 504.
If we suppose that the architecture of the magnetic tape system illustrated in Fig. 30 (a) is implemented by semiconductor technology, that is, if we imagine that an extremely high-speed magnetic tape system is virtually established on a semiconductor silicon chip as illustrated in Fig. 30 (b), the extremely high-speed magnetic tape system illustrated in Fig. 30 (a) may correspond to a net marching memory structure 3, including the marching main memory 31 of the first embodiment of the present invention. The net marching memory structure 3 illustrated in Fig. 30 (b) stores information/data in each of the memory units on the silicon chip and transfers the information/data synchronously with the clock signal, step by step, toward the take-up reel 502, so as to provide the processor 11 with the stored information/data actively and sequentially so that the processor 11 can execute the arithmetic and logic operations with the stored information/data, and the results of the processing in the processor 11 are sent out to the net marching memory structure 3.
(BIDIRECTIONAL MARCHING MAIN MEMORY)
As illustrated in Figs. 31 (a)-(c), the marching main memory 31 of the first embodiment of the present invention can achieve bidirectional transferring of information/data. That is, Fig. 31 (a) illustrates a forward marching behavior of information/data, in which information/data marches (shifts) side by side toward the right-hand direction (forward direction) in a one-dimensional marching main memory 31, Fig. 31 (b) illustrates a staying state of the one-dimensional marching main memory 31, and Fig. 31 (c) illustrates a reverse-marching behavior of information/data (a backward marching behavior), in which information/data marches (shifts) side by side toward the left-hand direction (reverse direction) in the one-dimensional marching main memory 31.
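The three behaviors of Figs. 31 (a)-(c) can be pictured with a short Python sketch. It is a simplified illustration only: the names `March` and `march` are hypothetical, each memory unit is reduced to one opaque value, and the circuit-level details are deliberately abstracted away.

```python
from enum import Enum


class March(Enum):
    FORWARD = 1    # Fig. 31 (a): contents shift toward the right
    STAY = 0       # Fig. 31 (b): contents are held in place
    BACKWARD = -1  # Fig. 31 (c): contents shift toward the left


def march(units, mode, incoming=None):
    """Apply one marching step to a one-dimensional array of memory units.

    Returns (new_units, outgoing), where outgoing is the value that leaves
    the array at the far end, or None when the array is staying.
    """
    units = list(units)
    if mode is March.STAY:
        return units, None
    if mode is March.FORWARD:
        # New value enters at the left end, oldest value leaves on the right.
        return [incoming] + units[:-1], units[-1]
    # BACKWARD: new value enters at the right end, leftmost value leaves.
    return units[1:] + [incoming], units[0]


state = ["a", "b", "c"]
state, out = march(state, March.FORWARD, incoming="x")
print(state, out)
```

A forward step followed by a backward step that re-injects the outgoing value restores the original contents, which is one way to check the model.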
Figs. 32 and 33 illustrate two examples of the representative arrays of i-th row of the m * n matrix (here, "m" is an integer determined by word size) in a transistor-level representation of the cell array for the bidirectional marching main memory 31, respectively, which can achieve the bidirectional behavior illustrated in Figs. 31 (a)-(c). The bidirectional marching main memory 31 stores the information/data of bit level in each of cells Mi1, Mi2, Mi3,........., Mi,n-1, Mi,n and transfers bi-directionally the information/data synchronously with the clock signal, step by step in the forward direction and/or reverse direction (backward direction) between a first I/O selector 512 and a second I/O selector 513.
In Figs. 32 and 33, each of the cells Mi1, Mi2, Mi3,........., Mi,n-1, Mi,n is assigned to the memory unit U1, U2, U3,........., Un-1, Un, respectively. That is, the cell Mi1 is assigned as the first bit-level cell in the first memory unit U1, and the first memory unit U1 stores information of byte size or word size by the sequence of bit-level cells arrayed in the first memory unit U1. Similarly, the cell Mi2 is assigned as the second bit-level cell in the second memory unit U2, the cell Mi3 is assigned as the third bit-level cell in the third memory unit U3, ........., the cell Mi,n-1 is assigned as the (n-1)-th bit-level cell in the (n-1)-th memory unit Un-1, and the cell Mi,n is assigned as the n-th bit-level cell in the n-th memory unit Un. And the memory units U2, U3,........., Un-1, Un store information of byte size or word size by the sequence of bit-level cells arrayed in the memory units U2, U3,........., Un-1, Un, respectively. Therefore, the bidirectional marching main memory 31 stores the information/data of byte size or word size in each of the memory units U1, U2, U3,........., Un-1, Un and transfers bi-directionally the information/data of byte size or word size synchronously with the clock signal, pari passu, in the forward direction and/or reverse direction (backward direction) between a first I/O selector 512 and a second I/O selector 513.
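The relation between bit-level cells and memory units can be illustrated with a small Python sketch of the m * n cell matrix. The helper names `march_words_forward` and `unit` are hypothetical, and the model assumes that all m rows advance in lock step on the same clock, so that a whole word moves from one memory unit to the next.

```python
def march_words_forward(cell_rows, incoming_word):
    """Shift every row one cell to the right in lock step.

    cell_rows is a list of m rows, each holding n bits; column j of the
    matrix plays the role of memory unit U(j+1). incoming_word supplies one
    bit per row for the leftmost unit. Returns (new_rows, outgoing_word).
    """
    if len(incoming_word) != len(cell_rows):
        raise ValueError("incoming word must supply one bit per row")
    # The rightmost column leaves the array as a complete word.
    outgoing_word = [row[-1] for row in cell_rows]
    new_rows = [[bit] + row[:-1] for row, bit in zip(cell_rows, incoming_word)]
    return new_rows, outgoing_word


def unit(cell_rows, j):
    """Read memory unit j (0-based) as the word stored in column j."""
    return [row[j] for row in cell_rows]


rows = [[1, 0, 0],
        [0, 1, 0]]          # m = 2 rows, n = 3 memory units
before = unit(rows, 0)
rows, out = march_words_forward(rows, incoming_word=[1, 1])
print(unit(rows, 1) == before, out)
```

After one step, the word that was in the first memory unit is found intact in the second, which is the word-level marching described above.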
A clock selector 511 selects between a first clock signal supply line CL1 and a second clock signal supply line CL2. The first clock signal supply line CL1 drives the forward data-stream, and the second clock signal supply line CL2 drives the backward data-stream. Each of the first clock signal supply line CL1 and the second clock signal supply line CL2 has logical values of "1" and "0".
In the transistor-level representation of the cell array implementing the marching main memory 31 illustrated in Fig. 32, a first bit-level cell Mi1 allocated at the leftmost side on i-th row, being connected to a first I/O selector 512, encompasses a first forward nMOS transistor Qi11f having a drain electrode connected to a first clock signal supply line CL1 through a first forward delay element D i11f and a gate electrode connected to the first I/O selector 512 through a second forward delay element D i12f; a second forward nMOS transistor Qi12f having a drain electrode connected to a source electrode of the first forward nMOS transistor Qi11f, a gate electrode connected to the first clock signal supply line, and a source electrode connected to the ground potential; and a forward capacitor Ci1f configured to store the forward information/data of the cell Mi1, connected in parallel with the second forward nMOS transistor Qi12f, wherein an output node connecting the source electrode of the first forward nMOS transistor Qi11f and the drain electrode of the second forward nMOS transistor Qi12f serves as a forward output terminal of the cell Mi1, configured to transfer the signal stored in the forward capacitor Ci1f to the next bit-level cell Mi2. 
The first bit-level cell Mi1 further encompasses a first backward nMOS transistor Qi11g having a drain electrode connected to a second clock signal supply line through a first backward delay element D i11g and a gate electrode connected to the backward output terminal of the bit-level cell Mi2 through a second backward delay element D i12g; a second backward nMOS transistor Qi12g having a drain electrode connected to a source electrode of the first backward nMOS transistor Qi11g, a gate electrode connected to the second clock signal supply line, and a source electrode connected to the ground potential; and a backward capacitor Ci1g configured to store the backward information/data of the cell Mi1, connected in parallel with the second backward nMOS transistor Qi12g, wherein an output node connecting the source electrode of the first backward nMOS transistor Qi11g and the drain electrode of the second backward nMOS transistor Qi12g serves as a backward output terminal of the cell Mi1, configured to transfer the signal stored in the backward capacitor Ci1g to the first I/O selector 512.
A second bit-level cell Mi2 allocated at the second from the left side on i-th row, being connected to the bit-level cell Mi1, encompasses a first forward nMOS transistor Qi21f having a drain electrode connected to the first clock signal supply line CL1 through a first forward delay element D i21f and a gate electrode connected to the forward output terminal of the bit-level cell Mi1 through a second forward delay element D i22f; a second forward nMOS transistor Qi22f having a drain electrode connected to a source electrode of the first forward nMOS transistor Qi21f, a gate electrode connected to the first clock signal supply line CL1, and a source electrode connected to the ground potential; and a forward capacitor Ci2f configured to store the forward information/data of the cell Mi2, connected in parallel with the second forward nMOS transistor Qi22f ,wherein an output node connecting the source electrode of the first forward nMOS transistor Qi21f and the drain electrode of the second forward nMOS transistor Qi22f serves as a forward output terminal of the cell Mi2, configured to transfer the signal stored in the forward capacitor Ci2f to the next bit-level cell Mi3. 
The second bit-level cell Mi2 further encompasses a first backward nMOS transistor Qi21g having a drain electrode connected to the second clock signal supply line CL2 through a first backward delay element D i21g and a gate electrode connected to the backward output terminal of the bit-level cell Mi3 through a second backward delay element D i22g; a second backward nMOS transistor Qi22g having a drain electrode connected to a source electrode of the first backward nMOS transistor Qi21g, a gate electrode connected to the second clock signal supply line CL2, and a source electrode connected to the ground potential; and a backward capacitor Ci2g configured to store the backward information/data of the cell Mi2, connected in parallel with the second backward nMOS transistor Qi22g ,wherein an output node connecting the source electrode of the first backward nMOS transistor Qi21g and the drain electrode of the second backward nMOS transistor Qi22g serves as a backward output terminal of the cell Mi2, configured to transfer the signal stored in the backward capacitor Ci2g to the next bit-level cell Mi1.
A third bit-level cell Mi3 allocated at the third from the left side on i-th row, being connected to the bit-level cell Mi2, encompasses a first forward nMOS transistor Qi31f having a drain electrode connected to the first clock signal supply line CL1 through a first forward delay element Di31f and a gate electrode connected to the forward output terminal of the bit-level cell Mi2 through a second forward delay element Di32f; a second forward nMOS transistor Qi32f having a drain electrode connected to a source electrode of the first forward nMOS transistor Qi31f, a gate electrode connected to the first clock signal supply line CL1, and a source electrode connected to the ground potential; and a forward capacitor Ci3f configured to store the forward information/data of the cell Mi3, connected in parallel with the second forward nMOS transistor Qi32f, wherein an output node connecting the source electrode of the first forward nMOS transistor Qi31f and the drain electrode of the second forward nMOS transistor Qi32f serves as a forward output terminal of the cell Mi3, configured to transfer the signal stored in the forward capacitor Ci3f to the next bit-level cell Mi4 (illustration is omitted).
The third bit-level cell Mi3 further encompasses a first backward nMOS transistor Qi31g having a drain electrode connected to the second clock signal supply line CL2 through a first backward delay element D i31g and a gate electrode connected to the backward output terminal of the bit-level cell Mi4 through a second backward delay element D i32g; a second backward nMOS transistor Qi32g having a drain electrode connected to a source electrode of the first backward nMOS transistor Qi31g, a gate electrode connected to the second clock signal supply line CL2, and a source electrode connected to the ground potential; and a backward capacitor Ci3g configured to store the backward information/data of the cell Mi3, connected in parallel with the second backward nMOS transistor Qi32g ,wherein an output node connecting the source electrode of the first backward nMOS transistor Qi31g and the drain electrode of the second backward nMOS transistor Qi32g serves as a backward output terminal of the cell Mi3, configured to transfer the signal stored in the backward capacitor Ci3g to the next bit-level cell Mi2.
An (n-1)-th bit-level cell Mi(n-1) allocated at the second from the right side on i-th row, encompasses a first forward nMOS transistor Qi(n-1)1f having a drain electrode connected to the first clock signal supply line CL1 through a first forward delay element Di(n-1)1f and a gate electrode connected to the forward output terminal of the bit-level cell Mi(n-2) (illustration is omitted) through a second forward delay element Di(n-1)2f; a second forward nMOS transistor Qi(n-1)2f having a drain electrode connected to a source electrode of the first forward nMOS transistor Qi(n-1)1f, a gate electrode connected to the first clock signal supply line CL1, and a source electrode connected to the ground potential; and a forward capacitor Ci(n-1)f configured to store the forward information/data of the cell Mi(n-1), connected in parallel with the second forward nMOS transistor Qi(n-1)2f, wherein an output node connecting the source electrode of the first forward nMOS transistor Qi(n-1)1f and the drain electrode of the second forward nMOS transistor Qi(n-1)2f serves as a forward output terminal of the cell Mi(n-1), configured to transfer the signal stored in the forward capacitor Ci(n-1)f to the next bit-level cell Min.
The (n-1)-th bit-level cell Mi(n-1) further encompasses a first backward nMOS transistor Qi(n-1)1g having a drain electrode connected to the second clock signal supply line CL2 through a first backward delay element D i(n-1)1g and a gate electrode connected to the backward output terminal of next bit-level cell Min through a second backward delay element D i(n-1)2g; a second backward nMOS transistor Qi(n-1)2g having a drain electrode connected to a source electrode of the first backward nMOS transistor Qi(n-1)1g, a gate electrode connected to the second clock signal supply line CL2, and a source electrode connected to the ground potential; and a backward capacitor Ci(n-1)g configured to store the backward information/data of the cell Mi(n-1), connected in parallel with the second backward nMOS transistor Qi(n-1)2g ,wherein an output node connecting the source electrode of the first backward nMOS transistor Qi(n-1)1g and the drain electrode of the second backward nMOS transistor Qi(n-1)2g serves as a backward output terminal of the cell Mi(n-1), configured to transfer the signal stored in the backward capacitor Ci(n-1)g to the next bit-level cell Mi(n-2) (illustration is omitted).
A n-th bit-level cell Min allocated at the rightmost side on i-th row, encompasses a first forward nMOS transistor Qin1f having a drain electrode connected to the first clock signal supply line CL1 through a first forward delay element D in1f and a gate electrode connected to the forward output terminal of the bit-level cell Mi(n-1) through a second forward delay element D in2f; a second forward nMOS transistor Qin2f having a drain electrode connected to a source electrode of the first forward nMOS transistor Qin1f, a gate electrode connected to the first clock signal supply line CL1, and a source electrode connected to the ground potential; and a forward capacitor Cinf configured to store the forward information/data of the cell Min, connected in parallel with the second forward nMOS transistor Qin2f, wherein an output node connecting the source electrode of the first forward nMOS transistor Qin1f and the drain electrode of the second forward nMOS transistor Qin2f serves as a forward output terminal of the cell Min, configured to transfer the signal stored in the forward capacitor Cinf to the second I/O selector 513. 
The n-th bit-level cell Min further encompasses a first backward nMOS transistor Qin1g having a drain electrode connected to the second clock signal supply line CL2 through a first backward delay element D in1g and a gate electrode connected to the second I/O selector 513 through a second backward delay element D in2g; a second backward nMOS transistor Qin2g having a drain electrode connected to a source electrode of the first backward nMOS transistor Qin1g, a gate electrode connected to the second clock signal supply line CL2, and a source electrode connected to the ground potential; and a backward capacitor Cing configured to store the backward information/data of the cell Min, connected in parallel with the second backward nMOS transistor Qin2g ,wherein an output node connecting the source electrode of the first backward nMOS transistor Qin1g and the drain electrode of the second backward nMOS transistor Qin2g serves as a backward output terminal of the cell Min, configured to transfer the signal stored in the backward capacitor Cing to the next bit-level cell Mi(n-1).
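The two independent chains inside each bidirectional bit-level cell of Fig. 32 can be modeled roughly as follows. This is a behavioral sketch under stated assumptions: the names `BidirectionalCell`, `pulse_cl1` and `pulse_cl2` are hypothetical, each capacitor is reduced to one bit, and a full clock pulse on CL1 or CL2 is treated as a single atomic shift, so the delay elements and the discharge phase are not modeled.

```python
from dataclasses import dataclass


@dataclass
class BidirectionalCell:
    forward: int = 0    # bit held on the forward capacitor (suffix f)
    backward: int = 0   # bit held on the backward capacitor (suffix g)


def pulse_cl1(row, bit_from_512):
    """One pulse on CL1: the forward chain moves one cell to the right.

    Returns the bit that the rightmost cell hands to the second I/O
    selector 513. The backward capacitors are left untouched.
    """
    out = row[-1].forward
    for j in range(len(row) - 1, 0, -1):
        row[j].forward = row[j - 1].forward
    row[0].forward = bit_from_512
    return out


def pulse_cl2(row, bit_from_513):
    """One pulse on CL2: the backward chain moves one cell to the left.

    Returns the bit that the leftmost cell hands to the first I/O
    selector 512. The forward capacitors are left untouched.
    """
    out = row[0].backward
    for j in range(len(row) - 1):
        row[j].backward = row[j + 1].backward
    row[-1].backward = bit_from_513
    return out


row = [BidirectionalCell() for _ in range(3)]
pulse_cl1(row, 1)            # a "1" enters from the first I/O selector
pulse_cl2(row, 1)            # a "1" enters from the second I/O selector
print([(c.forward, c.backward) for c in row])
```

Keeping the forward and backward bits in separate fields mirrors the separate forward and backward capacitors of the cell, so a pulse on one clock line never disturbs the other data-stream.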
When the clock signal supplied from the first clock signal supply line CL1 becomes the logical level of "1", the second forward nMOS transistor Qi12f in the first memory unit U1 begins to discharge the signal charge, which is already stored in the forward capacitor Ci1f in the first memory unit U1 at a previous clock cycle. And, after the clock signal of the logical level of "1", supplied from the first clock signal supply line CL1, is applied to the second forward nMOS transistor Qi12f, and the signal charge stored in the forward capacitor Ci1f is completely discharged to become the logical level of "0", the first forward nMOS transistor Qi11f becomes active as the transfer transistor, delayed by the delay time td1 determined by the first forward delay element Di11f. Thereafter, when the information/data of bit level is entered from the first I/O selector 512 to the gate electrode of the first forward nMOS transistor Qi11f, the first forward nMOS transistor Qi11f transfers the information/data to the forward capacitor Ci1f, delayed by the delay time td2 determined by the second forward delay element Di12f. When the clock signal supplied from the first clock signal supply line CL1 becomes the logical level of "0" after a time of 1/2TAUclock has elapsed, the output node connecting the source electrode of the first forward nMOS transistor Qi11f and the drain electrode of the second forward nMOS transistor Qi12f cannot deliver the information/data, which is entered from the first I/O selector 512 to the gate electrode of the first forward nMOS transistor Qi11f, further to the next bit-level cell Mi2, because the information/data is blocked from being transferred to the gate electrode of the next first forward nMOS transistor Qi21f, delayed by the delay time td2 = 1/2TAUclock determined by the second forward delay element Di22f.
When the clock signal supplied from the second clock signal supply line CL2 becomes the logical level of "1", the second backward nMOS transistor Qi12b begins to discharge the signal charge, which is already stored in the backward capacitor Ci1b at a previous clock cycle. After the clock signal of the logical level of "1", supplied from the second clock signal supply line CL2, is applied and the signal charge stored in the backward capacitor Ci1b is completely discharged to become the logical level of "0", the first backward nMOS transistor Qi11b becomes active as the transfer transistor, delayed by the delay time td1 determined by the first backward delay element Di11b. Thereafter, when the information/data is fed from the backward output terminal of the bit-level cell Mi2 to the gate electrode of the first backward nMOS transistor Qi11b, the first backward nMOS transistor Qi11b transfers the information/data stored in the previous bit-level cell Mi2 to the backward capacitor Ci1b, further delayed by the delay time td2 determined by the second backward delay element Di12b. The output node connecting the source electrode of the first backward nMOS transistor Qi11b and the drain electrode of the second backward nMOS transistor Qi12b delivers the information/data stored in the backward capacitor Ci1b to the first I/O selector 512.
And, when the next clock signal supplied from the first clock signal supply line CL1 becomes the logical level of "1", the second forward nMOS transistor Qi22f in the second memory unit U2 begins to discharge the signal charge, which is already stored in the forward capacitor Ci2f in the second memory unit U2 at the previous clock cycle. And, after the clock signal of the logical level of "1", supplied from the first clock signal supply line CL1, is applied to the second forward nMOS transistor Qi22f, and the signal charge stored in the forward capacitor Ci2f is completely discharged to become the logical level of "0", the first forward nMOS transistor Qi21f becomes active as the transfer transistor, delayed by the delay time td1 determined by the first forward delay element Di21f. Thereafter, when the information/data of bit level stored in the previous forward capacitor Ci1f is fed to the gate electrode of the first forward nMOS transistor Qi21f, the first forward nMOS transistor Qi21f transfers the information/data to the forward capacitor Ci2f, delayed by the delay time td2 determined by the second forward delay element Di22f. When the clock signal supplied from the first clock signal supply line CL1 becomes the logical level of "0" after a time of 1/2TAUclock has elapsed, the output node connecting the source electrode of the first forward nMOS transistor Qi21f and the drain electrode of the second forward nMOS transistor Qi22f cannot deliver the information/data, which is entered to the gate electrode of the first forward nMOS transistor Qi21f, further to the next bit-level cell Mi3, because the information/data is blocked from being transferred to the gate electrode of the next first forward nMOS transistor Qi31f, delayed by the delay time td2 = 1/2TAUclock determined by the second forward delay element Di32f.
When the clock signal supplied from the second clock signal supply line CL2 becomes the logical level of "1", the second backward nMOS transistor Qi22b begins to discharge the signal charge, which is already stored in the backward capacitor Ci2b at a previous clock cycle. After the clock signal of the logical level of "1", supplied from the second clock signal supply line CL2, is applied and the signal charge stored in the backward capacitor Ci2b is completely discharged to become the logical level of "0", the first backward nMOS transistor Qi21b becomes active as the transfer transistor, delayed by the delay time td1 determined by the first backward delay element Di21b. Thereafter, when the information/data is fed from the backward output terminal of the bit-level cell Mi3 to the gate electrode of the first backward nMOS transistor Qi21b, the first backward nMOS transistor Qi21b transfers the information/data stored in the previous bit-level cell Mi3 to the backward capacitor Ci2b, further delayed by the delay time td2 determined by the second backward delay element Di22b. When the clock signal supplied from the second clock signal supply line CL2 becomes the logical level of "0" after a time of 1/2TAUclock has elapsed, the output node connecting the source electrode of the first backward nMOS transistor Qi21b and the drain electrode of the second backward nMOS transistor Qi22b cannot deliver the information/data, which is entered to the gate electrode of the first backward nMOS transistor Qi21b, further to the next bit-level cell Mi1, because the information/data is blocked from being transferred to the gate electrode of the next first backward nMOS transistor Qi11b, delayed by the delay time td2 = 1/2TAUclock determined by the second backward delay element Di12b.
And, when the next clock signal supplied from the first clock signal supply line CL1 becomes the logical level of "1", the second forward nMOS transistor Qi32f in the third memory unit U3 begins to discharge the signal charge, which is already stored in the forward capacitor Ci3f in the third memory unit U3 at the previous clock cycle. And, after the clock signal of the logical level of "1", supplied from the first clock signal supply line CL1, is applied to the second forward nMOS transistor Qi32f, and the signal charge stored in the forward capacitor Ci3f is completely discharged to the logical level of "0", the first forward nMOS transistor Qi31f becomes active as the transfer transistor, delayed by the delay time td1 determined by the first forward delay element Di31f. Thereafter, when the information/data stored in the previous forward capacitor Ci2f is fed to the gate electrode of the first forward nMOS transistor Qi31f, the first forward nMOS transistor Qi31f transfers the information/data to the forward capacitor Ci3f, delayed by the delay time td2 determined by the second forward delay element Di32f. When the clock signal supplied from the first clock signal supply line CL1 becomes the logical level of "0" after a time of 1/2TAUclock has elapsed, the output node connecting the source electrode of the first forward nMOS transistor Qi31f and the drain electrode of the second forward nMOS transistor Qi32f cannot deliver the information/data, which was entered to the gate electrode of the first forward nMOS transistor Qi31f, further to the next bit-level cell Mi4 (illustration is omitted), because the transfer of the information/data to the gate electrode of the next first forward nMOS transistor Qi41f (illustration is omitted) is blocked, being delayed by the delay time td2 = 1/2TAUclock determined by the second forward delay element Di42f (illustration is omitted).
When the clock signal supplied from the second clock signal supply line CL2 becomes the logical level of "1", the second backward nMOS transistor Qi32b begins to discharge the signal charge, which is already stored in the backward capacitor Ci3b at a previous clock cycle. After the clock signal of the logical level of "1", supplied from the second clock signal supply line CL2, is applied and the signal charge stored in the backward capacitor Ci3b is completely discharged to the logical level of "0", the first backward nMOS transistor Qi31b becomes active as the transfer transistor, delayed by the delay time td1 determined by the first backward delay element Di31b. Thereafter, when the information/data is fed from the backward output terminal of the bit-level cell Mi4 (illustration is omitted) to the gate electrode of the first backward nMOS transistor Qi31b, the first backward nMOS transistor Qi31b transfers the information/data stored in the previous bit-level cell Mi4 to the backward capacitor Ci3b, further delayed by the delay time td2 determined by the second backward delay element Di32b. When the clock signal supplied from the second clock signal supply line CL2 becomes the logical level of "0" after a time of 1/2TAUclock has elapsed, the output node connecting the source electrode of the first backward nMOS transistor Qi31b and the drain electrode of the second backward nMOS transistor Qi32b cannot deliver the information/data, which was entered to the gate electrode of the first backward nMOS transistor Qi31b, further to the next bit-level cell Mi2, because the transfer of the information/data to the gate electrode of the next first backward nMOS transistor Qi21b is blocked, being delayed by the delay time td2 = 1/2TAUclock determined by the second backward delay element Di22b.
And, when the next clock signal supplied from the first clock signal supply line CL1 becomes the logical level of "1", the second forward nMOS transistor Qi(n-1)2f in the (n-1)-th memory unit U(n-1) begins to discharge the signal charge, which is already stored in the forward capacitor Ci(n-1)f in the (n-1)-th memory unit U(n-1) at the previous clock cycle. And, after the clock signal of the logical level of "1", supplied from the first clock signal supply line CL1, is applied to the second forward nMOS transistor Qi(n-1)2f, and the signal charge stored in the forward capacitor Ci(n-1)f is completely discharged to the logical level of "0", the first forward nMOS transistor Qi(n-1)1f becomes active as the transfer transistor, delayed by the delay time td1 determined by the first forward delay element Di(n-1)1f. Thereafter, when the information/data stored in the previous forward capacitor Ci(n-2)f (illustration is omitted) is fed to the gate electrode of the first forward nMOS transistor Qi(n-1)1f, the first forward nMOS transistor Qi(n-1)1f transfers the information/data to the forward capacitor Ci(n-1)f, delayed by the delay time td2 determined by the second forward delay element Di(n-1)2f. When the clock signal supplied from the first clock signal supply line CL1 becomes the logical level of "0" after a time of 1/2TAUclock has elapsed, the output node connecting the source electrode of the first forward nMOS transistor Qi(n-1)1f and the drain electrode of the second forward nMOS transistor Qi(n-1)2f cannot deliver the information/data, which was entered to the gate electrode of the first forward nMOS transistor Qi(n-1)1f, further to the next bit-level cell Min, because the transfer of the information/data to the gate electrode of the next first forward nMOS transistor Qin1f is blocked, being delayed by the delay time td2 = 1/2TAUclock determined by the second forward delay element Din2f.
When the clock signal supplied from the second clock signal supply line CL2 becomes the logical level of "1", the second backward nMOS transistor Qi(n-1)2b begins to discharge the signal charge, which is already stored in the backward capacitor Ci(n-1)b at a previous clock cycle. After the clock signal of the logical level of "1", supplied from the second clock signal supply line CL2, is applied and the signal charge stored in the backward capacitor Ci(n-1)b is completely discharged to the logical level of "0", the first backward nMOS transistor Qi(n-1)1b becomes active as the transfer transistor, delayed by the delay time td1 determined by the first backward delay element Di(n-1)1b. Thereafter, when the information/data is fed from the backward output terminal of the bit-level cell Min to the gate electrode of the first backward nMOS transistor Qi(n-1)1b, the first backward nMOS transistor Qi(n-1)1b transfers the information/data stored in the previous bit-level cell Min to the backward capacitor Ci(n-1)b, further delayed by the delay time td2 determined by the second backward delay element Di(n-1)2b. When the clock signal supplied from the second clock signal supply line CL2 becomes the logical level of "0" after a time of 1/2TAUclock has elapsed, the output node connecting the source electrode of the first backward nMOS transistor Qi(n-1)1b and the drain electrode of the second backward nMOS transistor Qi(n-1)2b cannot deliver the information/data, which was entered to the gate electrode of the first backward nMOS transistor Qi(n-1)1b, further to the next bit-level cell Mi(n-2) (illustration is omitted), because the transfer of the information/data to the gate electrode of the next first backward nMOS transistor Qi(n-2)1b (illustration is omitted) is blocked, being delayed by the delay time td2 = 1/2TAUclock determined by the second backward delay element Di(n-2)2b (illustration is omitted).
And, when the next clock signal supplied from the first clock signal supply line CL1 becomes the logical level of "1", the second forward nMOS transistor Qin2f in the n-th memory unit Un begins to discharge the signal charge, which is already stored in the forward capacitor Cinf in the n-th memory unit Un at the previous clock cycle. And, after the clock signal of the logical level of "1", supplied from the first clock signal supply line CL1, is applied to the second forward nMOS transistor Qin2f, and the signal charge stored in the forward capacitor Cinf is completely discharged to the logical level of "0", the first forward nMOS transistor Qin1f becomes active as the transfer transistor, delayed by the delay time td1 determined by the first forward delay element Din1f. Thereafter, when the information/data stored in the previous forward capacitor Ci(n-1)f is fed to the gate electrode of the first forward nMOS transistor Qin1f, the first forward nMOS transistor Qin1f transfers the information/data to the forward capacitor Cinf, delayed by the delay time td2 determined by the second forward delay element Din2f. The output node connecting the source electrode of the first forward nMOS transistor Qin1f and the drain electrode of the second forward nMOS transistor Qin2f delivers the information/data, which was entered to the gate electrode of the first forward nMOS transistor Qin1f, to the second I/O selector 513.
When the clock signal supplied from the second clock signal supply line CL2 becomes the logical level of "1", the second backward nMOS transistor Qin2b begins to discharge the signal charge, which is already stored in the backward capacitor Cinb at a previous clock cycle. After the clock signal of the logical level of "1", supplied from the second clock signal supply line CL2, is applied and the signal charge stored in the backward capacitor Cinb is completely discharged to the logical level of "0", the first backward nMOS transistor Qin1b becomes active as the transfer transistor, delayed by the delay time td1 determined by the first backward delay element Din1b. Thereafter, when the information/data is fed from the second I/O selector 513 to the gate electrode of the first backward nMOS transistor Qin1b, the first backward nMOS transistor Qin1b transfers the information/data received from the second I/O selector 513 to the backward capacitor Cinb, further delayed by the delay time td2 determined by the second backward delay element Din2b. When the clock signal supplied from the second clock signal supply line CL2 becomes the logical level of "0" after a time of 1/2TAUclock has elapsed, the output node connecting the source electrode of the first backward nMOS transistor Qin1b and the drain electrode of the second backward nMOS transistor Qin2b cannot deliver the information/data, which was entered to the gate electrode of the first backward nMOS transistor Qin1b, further to the next bit-level cell Mi(n-1), because the transfer of the information/data to the gate electrode of the next first backward nMOS transistor Qi(n-1)1b is blocked, being delayed by the delay time td2 = 1/2TAUclock determined by the second backward delay element Di(n-1)2b.
Therefore, in the bidirectional marching main memory illustrated in Fig. 32, each of the cells Mi1, Mi2, Mi3,........., Mi,(n-1), Mi,n on the i-th row of the bidirectional marching main memory stores the information/data, and transfers the information/data bi-directionally, synchronously with the clock signals supplied respectively from the first clock signal supply line CL1 and the second clock signal supply line CL2, step by step, between the first I/O selector 512 and the second I/O selector 513. As already explained, the cells Mi1, Mi2, Mi3,........., Mi,n-1, Mi,n are assigned to the memory units U1, U2, U3,........., Un-1, Un, respectively, and each of the memory units U2, U3,........., Un-1, Un stores information of byte size or word size by the sequence of bit-level cells arrayed in that memory unit. Consequently, the bidirectional marching main memory 31 illustrated in Fig. 32 stores the information/data of byte size or word size in each of the memory units U1, U2, U3,........., Un-1, Un and transfers the information/data of byte size or word size bi-directionally, synchronously with the clock signal, pari passu, in the forward direction and/or reverse direction (backward direction) between the first I/O selector 512 and the second I/O selector 513, so as to provide the processor 11 with the stored information/data of byte size or word size actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information/data.
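The net effect of the clocked transfers described above can be sketched behaviourally. The following Python fragment is only a minimal sketch under stated assumptions: it treats every transfer as ideal, ignores the delay times td1 and td2 and the capacitor discharge step, and models each memory unit as one word of m bits. The function names and the `incoming` argument (the word presented by an I/O selector) are hypothetical and do not appear in the drawings.

```python
from typing import List, Tuple

Word = Tuple[int, ...]  # one memory unit: m bits held by m bit-level cells


def march_forward(units: List[Word], incoming: Word) -> Tuple[List[Word], Word]:
    """One CL1 cycle: every unit takes the word held by its left neighbour.

    `incoming` stands for the word presented at the first I/O selector side;
    the word held by the rightmost unit leaves toward the second I/O selector.
    """
    outgoing = units[-1]
    return [incoming] + units[:-1], outgoing


def march_backward(units: List[Word], incoming: Word) -> Tuple[List[Word], Word]:
    """One CL2 cycle: every unit takes the word held by its right neighbour.

    `incoming` stands for the word presented at the second I/O selector side;
    the word held by the leftmost unit leaves toward the first I/O selector.
    """
    outgoing = units[0]
    return units[1:] + [incoming], outgoing


if __name__ == "__main__":
    memory = [(1, 0), (0, 1), (1, 1)]  # three units of two bits each
    memory, out = march_forward(memory, (0, 0))
    print(memory, out)
```

All m bits of a word move in the same step, which is the "pari passu" behaviour of the text; the separate forward and backward capacitors of Fig. 32 are collapsed into a single list here purely for brevity.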
As illustrated in Fig. 33, a forward isolation transistor Qi23f is provided so as to isolate the signal-storage state of the second bit-level cell Mi2 in the second memory unit U2 from the signal-storage state of the first bit-level cell Mi1 in the first memory unit U1; the forward isolation transistor Qi23f transfers forward a signal from the first bit-level cell Mi1 to the second bit-level cell Mi2 at a required timing determined by a clock signal, which is supplied through the first clock signal supply line CL1. And, a backward isolation transistor Qi13b is provided so as to isolate the signal-storage state of the first bit-level cell Mi1 in the first memory unit U1 from the signal-storage state of the second bit-level cell Mi2 in the second memory unit U2; the backward isolation transistor Qi13b transfers backward a signal from the second bit-level cell Mi2 to the first bit-level cell Mi1 at a required timing determined by a clock signal, which is supplied through the second clock signal supply line CL2. Then, a sequence of the forward isolation transistors Qi23f (i=1 to m; "m" is an integer corresponding to the byte size or the word size) arrayed in parallel with the memory units U1 and U2 transfers forward the information of byte size or word size, controlled by the clock signal supplied through the clock signal supply line CL1, so that the information of byte size or word size can march along the forward direction, pari passu. And, a sequence of the backward isolation transistors Qi13b (i=1 to m) arrayed in parallel with the memory units U1 and U2 transfers backward the information of byte size or word size, controlled by the clock signal supplied through the clock signal supply line CL2, so that the information of byte size or word size can march along the backward direction, pari passu.
And, similarly, a backward isolation transistor Qi23b is provided so as to isolate the signal-storage state of the second bit-level cell Mi2 in the second memory unit U2 from the signal-storage state of the third bit-level cell Mi3 (the illustration is omitted) in the third memory unit U3; the backward isolation transistor Qi23b transfers backward a signal from the third bit-level cell Mi3 to the second bit-level cell Mi2 at a required timing determined by a clock signal, which is supplied through the second clock signal supply line CL2. And, a sequence of the backward isolation transistors Qi23b (i=1 to m) arrayed in parallel with the memory units U2 and U3 transfers backward the information of byte size or word size, controlled by the clock signal supplied through the clock signal supply line CL2, so that the information of byte size or word size can march along the backward direction, pari passu.
Furthermore, as illustrated in Fig. 33, a forward isolation transistor Qi(n-1)3f is provided so as to isolate the signal-storage state of the (n-1)-th bit-level cell Mi(n-1) in the (n-1)-th memory unit Un-1 from the signal-storage state of the (n-2)-th bit-level cell Mi(n-2) (the illustration is omitted) in the (n-2)-th memory unit Un-2 (the illustration is omitted); the forward isolation transistor Qi(n-1)3f transfers forward a signal from the (n-2)-th bit-level cell Mi(n-2) to the (n-1)-th bit-level cell Mi(n-1) at a required timing determined by a clock signal, which is supplied through the first clock signal supply line CL1. Then, a sequence of the forward isolation transistors Qi(n-1)3f (i=1 to m) arrayed in parallel with the memory units Un-2 and Un-1 transfers forward the information of byte size or word size, controlled by the clock signal supplied through the clock signal supply line CL1, so that the information of byte size or word size can march along the forward direction, pari passu.
And, a forward isolation transistor Qin3f is provided so as to isolate the signal-storage state of the n-th bit-level cell Min in the n-th memory unit Un from the signal-storage state of the (n-1)-th bit-level cell Min-1 in the (n-1)-th memory unit Un-1; the forward isolation transistor Qin3f transfers forward a signal from the (n-1)-th bit-level cell Min-1 to the n-th bit-level cell Min at a required timing determined by a clock signal, which is supplied through the first clock signal supply line CL1. And, a backward isolation transistor Qin3b is provided so as to isolate the signal-storage state of the (n-1)-th bit-level cell Min-1 in the (n-1)-th memory unit Un-1 from the signal-storage state of the n-th bit-level cell Min in the n-th memory unit Un; the backward isolation transistor Qin3b transfers backward a signal from the n-th bit-level cell Min to the (n-1)-th bit-level cell Min-1 at a required timing determined by a clock signal, which is supplied through the second clock signal supply line CL2. Then, a sequence of the forward isolation transistors Qin3f (i=1 to m) arrayed in parallel with the memory units Un-1 and Un transfers forward the information of byte size or word size, controlled by the clock signal supplied through the clock signal supply line CL1, so that the information of byte size or word size can march along the forward direction, pari passu. And, a sequence of the backward isolation transistors Qin3b (i=1 to m) arrayed in parallel with the memory units Un-1 and Un transfers backward the information of byte size or word size, controlled by the clock signal supplied through the clock signal supply line CL2, so that the information of byte size or word size can march along the backward direction, pari passu.
In the configuration of the bidirectional marching main memory illustrated in Figs. 32 and 33, the forward capacitor Cijf and the backward capacitor Cijb shall preferably be merged into a single common capacitor so as to implement a random access mode with high locality. Fig. 34 illustrates an array of the i-th row of the m * n matrix (here, "m" is an integer determined by word size) in a gate-level representation of the bidirectional marching main memory 31, which can achieve the random access mode in the bidirectional behavior illustrated in Figs. 31 (a)-(c).
As illustrated in Fig. 34, two kinds of marching AND-gates are assigned to each of the cells Mi1, Mi2, Mi3,........., Mi,(n-1), Mi,n on the i-th row so as to establish a bidirectional transfer of information/data with random access mode. The bidirectional marching main memory 31 stores the information/data of bit level in each of cells Mi1, Mi2, Mi3,........., Mi,n-1, Mi,n and transfers bi-directionally the information/data synchronously with the clock signal, step by step in the forward direction and/or reverse direction (backward direction) between a first I/O selector 512 and a second I/O selector 513.
In the gate-level representation of the cell array implementing the marching main memory 31 illustrated in Fig. 34, a first bit-level cell Mi1 allocated at the leftmost side on the i-th row and connected to the first I/O selector 512 encompasses a common capacitor Ci1 configured to store the information/data, a forward marching AND-gate Gi1f having one input terminal connected to the common capacitor Ci1, the other input terminal supplied with the first clock signal supply line CL1, and an output terminal connected to one input terminal of the next forward marching AND-gate Gi2f assigned to the adjacent second bit-level cell Mi2 on the i-th row, and a backward marching AND-gate Gi1b having one input terminal connected to the common capacitor Ci1, the other input terminal supplied with the second clock signal supply line CL2, and an output terminal connected to the first I/O selector 512.
The first clock signal supply line CL1, configured to drive the forward data-stream, and the second clock signal supply line CL2, configured to drive the backward data-stream, are respectively selected by a clock selector 511, and each of the first clock signal supply line CL1 and the second clock signal supply line CL2 has logical values of "1" and "0". When the logical value of "1" of the first clock signal supply line CL1 is fed to the other input terminal of the forward marching AND-gate Gi1f, the information/data stored in the common capacitor Ci1 is transferred to a common capacitor Ci2, assigned to the adjacent second bit-level cell Mi2, and the common capacitor Ci2 stores the information/data.
The second bit-level cell Mi2 on the i-th row of the bidirectional marching main memory 31 encompasses the common capacitor Ci2 configured to store the information/data, a forward marching AND-gate Gi2f, which has one input terminal connected to the common capacitor Ci2, the other input terminal supplied with the first clock signal supply line CL1, and an output terminal connected to one input terminal of the next forward marching AND-gate Gi3f assigned to the adjacent third bit-level cell Mi3 on the i-th row, and a backward marching AND-gate Gi2b having one input terminal connected to the common capacitor Ci2, the other input terminal supplied with the second clock signal supply line CL2, and an output terminal connected to one input terminal of the preceding backward marching AND-gate Gi1b.
Similarly, the third bit-level cell Mi3 on the i-th row encompasses a common capacitor Ci3 configured to store the information/data, a forward marching AND-gate Gi3f having one input terminal connected to the common capacitor Ci3, the other input terminal supplied with the first clock signal supply line CL1, and an output terminal connected to one input terminal of the next forward marching AND-gate assigned to the adjacent fourth cell, although the illustration of the fourth cell is omitted, and a backward marching AND-gate Gi3b having one input terminal connected to the common capacitor Ci3, the other input terminal supplied with the second clock signal supply line CL2, and an output terminal connected to one input terminal of the preceding backward marching AND-gate Gi2b assigned to the adjacent second bit-level cell Mi2. Therefore, when the logical value of "1" of the first clock signal supply line CL1 is fed to the other input terminal of the forward marching AND-gate Gi2f, the information/data stored in the common capacitor Ci2 is transferred to the common capacitor Ci3, assigned to the third bit-level cell Mi3, and the common capacitor Ci3 stores the information/data, and when the logical value of "1" of the first clock signal supply line CL1 is fed to the other input terminal of the forward marching AND-gate Gi3f, the information/data stored in the common capacitor Ci3 is transferred to the capacitor assigned to the fourth cell.
Furthermore, an (n-1)-th bit-level cell Mi,(n-1) on the i-th row encompasses a common capacitor Ci,(n-1), configured to store the information/data, a forward marching AND-gate Gi,(n-1)f having one input terminal connected to the common capacitor Ci,(n-1), the other input terminal supplied with the first clock signal supply line CL1, and an output terminal connected to one input terminal of the next forward marching AND-gate Gi,nf assigned to the adjacent n-th bit-level cell Mi,n, which is allocated at the rightmost side on the i-th row and connected to the second I/O selector 513, and a backward marching AND-gate Gi,(n-1)b, which has one input terminal connected to the common capacitor Ci,(n-1), the other input terminal supplied with the second clock signal supply line CL2, and an output terminal connected to one input terminal of the preceding backward marching AND-gate Gi,(n-2)b assigned to the adjacent (n-2)-th bit-level cell Mi,(n-2) (illustration is omitted).
Finally, an n-th bit-level cell Mi,n allocated at the rightmost side on the i-th row and connected to the second I/O selector 513 encompasses a common capacitor Ci,n configured to store the information/data, a backward marching AND-gate Ginb having one input terminal connected to the common capacitor Cin, the other input terminal configured to be supplied with the second clock signal supply line CL2, and an output terminal connected to one input terminal of the preceding backward marching AND-gate Gi(n-1)b assigned to the adjacent (n-1)-th bit-level cell Mi,n-1 on the i-th row, and a forward marching AND-gate Gi,nf having one input terminal connected to the common capacitor Ci,n, the other input terminal configured to be supplied with the first clock signal supply line CL1, and an output terminal connected to the second I/O selector 513.
When the logical value of "1" of the second clock signal supply line CL2 is fed to the other input terminal of the backward marching AND-gate Ginb, the information/data stored in the common capacitor Cin is transferred to a common capacitor Ci,(n-1), assigned to the adjacent (n-1)-th bit-level cell Mi,(n-1) on the i-th row, and the common capacitor Ci,(n-1) stores the information/data. Then, when the logical value of "1" of the second clock signal supply line CL2 is fed to the other input terminal of the backward marching AND-gate Gi3b, the information/data stored in the common capacitor Ci3 is transferred to the common capacitor Ci2, assigned to the second bit-level cell Mi2, and the common capacitor Ci2 stores the information/data. Furthermore, when the logical value of "1" of the second clock signal supply line CL2 is fed to the other input terminal of the backward marching AND-gate Gi2b, the information/data stored in the common capacitor Ci2 is transferred to the common capacitor Ci1, assigned to the first bit-level cell Mi1, and the common capacitor Ci1 stores the information/data, and when the logical value of "1" of the second clock signal supply line CL2 is fed to the other input terminal of the backward marching AND-gate Gi1b, the information/data stored in the common capacitor Ci1 is transferred to the first I/O selector 512.
Therefore, each of the cells Mi1, Mi2, Mi3,........., Mi,(n-1), Mi,n on the i-th row of the bidirectional marching main memory stores the information/data, and transfers the information/data bi-directionally, synchronously with the clock signals supplied respectively from the first clock signal supply line CL1 and the second clock signal supply line CL2, step by step, between the first I/O selector 512 and the second I/O selector 513. The cells Mi1, Mi2, Mi3,........., Mi,n-1, Mi,n are assigned to the memory units U1, U2, U3,........., Un-1, Un, respectively, and each of the memory units U2, U3,........., Un-1, Un stores information of byte size or word size by the sequence of bit-level cells arrayed in that memory unit. Consequently, the bidirectional marching main memory 31 illustrated in Fig. 34 stores the information/data of byte size or word size in each of the memory units U1, U2, U3,........., Un-1, Un and transfers the information/data of byte size or word size bi-directionally, synchronously with the clock signal, pari passu, in the forward direction and/or reverse direction (backward direction) between the first I/O selector 512 and the second I/O selector 513, so as to provide the processor 11 with the stored information/data of byte size or word size actively and sequentially so that the ALU 112 can execute the arithmetic and logic operations with the stored information/data.
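The gate-level behaviour of one row in Fig. 34 can be illustrated by a short Python sketch. It is an idealized model under assumptions that are not taken from the specification: each common capacitor is a single bit, every marching AND-gate is a bitwise AND of the stored bit and its clock line, all cells update simultaneously, and the clock selector never drives CL1 and CL2 high at the same time. The names `and_gate`, `step`, `left_in` and `right_in` are hypothetical.

```python
from typing import List, Optional, Tuple


def and_gate(stored_bit: int, clock: int) -> int:
    """Marching AND-gate: passes the stored bit only while its clock is 1."""
    return stored_bit & clock


def step(caps: List[int], cl1: int, cl2: int,
         left_in: int = 0, right_in: int = 0) -> Tuple[List[int], Optional[int]]:
    """Advance one row of common capacitors by one clock event.

    caps     -- bits held by the common capacitors, leftmost cell first
    cl1, cl2 -- levels of the forward and backward clock lines
    left_in  -- bit offered at the first I/O selector side (used when cl1 = 1)
    right_in -- bit offered at the second I/O selector side (used when cl2 = 1)
    Returns the new capacitor contents and the bit delivered to an I/O
    selector, or None when neither clock is high and the row simply holds.
    """
    if cl1 and cl2:
        raise ValueError("this sketch assumes only one clock line is high at a time")
    if cl1:
        outputs = [and_gate(c, cl1) for c in caps]  # forward gate outputs
        return [left_in] + outputs[:-1], outputs[-1]
    if cl2:
        outputs = [and_gate(c, cl2) for c in caps]  # backward gate outputs
        return outputs[1:] + [right_in], outputs[0]
    return list(caps), None


if __name__ == "__main__":
    row = [1, 0, 1]
    row, to_second_selector = step(row, cl1=1, cl2=0, left_in=0)
    print(row, to_second_selector)
```

Because both gates of a cell read the same common capacitor, the direction of the stream is decided solely by which clock line is raised, which is what allows the row to reverse direction from one cycle to the next.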
(POSITION POINTING STRATEGY)
Fig. 35(a) illustrates a bidirectional transferring mode of instructions in a one-dimensional marching main memory adjacent to a processor, where the instructions move toward the processor, and move from / to the next memory. Fig. 35(b) illustrates a bidirectional transferring mode of scalar data in a one-dimensional marching main memory adjacent to an ALU 112, where the scalar data move toward the ALU and move from / to the next memory. Fig. 35(c) illustrates a uni-directional transferring mode of vector/streaming data in a one-dimensional marching main memory adjacent to a pipeline 117, which will be explained in the following third embodiment, where the vector/streaming data move toward the pipeline 117, and move from the next memory.
The marching main memory 31 used in the computer system pertaining to the first embodiment uses positioning to identify the starting point and ending point of a set of successive memory units U1, U2, U3,........., Un-1, Un in vector/streaming data. On the other hand, for programs and scalar data, each item must have a position index similar to a conventional address. Fig. 36(a) illustrates a configuration of a conventional main memory, in which all of the memory units U1, U2, U3,........., Un-1, Un are labeled by addresses A1, A2, A3,........., An-1, An. Fig. 36(b) illustrates a configuration of a one-dimensional marching main memory, in which the positioning of each individual memory unit U1, U2, U3,........., Un-1, Un is not always necessary, but positioning is at least necessary to identify the starting point and ending point of a set of successive memory units in vector/streaming data.
Fig. 37(a) illustrates an inner configuration of present one-dimensional marching main memory, in which the position indexes like existing addresses are not necessary for scalar instruction Is, but the positioning of individual memory unit is at least necessary to identify the starting point and ending point of a set of successive memory units in vector instruction Iv, as indicated by hatched circle. Fig. 37(b) illustrates an inner configuration of present one-dimensional marching main memory, in which the position indexes are not necessary for scalar data "b" and "a". However, as illustrated in Fig. 37(c), position indexes are at least necessary to identify the starting point and ending point of a set of successive memory units in vector/streaming data "o", "p", "q", "r", "s", "t",.......... as indicated by hatched circle.
The marching memory family includes, in addition to the marching main memory 31 used in the computer system pertaining to the first embodiment of the present invention, a marching-instruction register file 22a and a marching-data register file 22b connected to the ALU 112, which will be explained in the following second embodiment, and a marching-instruction cache memory 21a and a marching-data cache memory 21b, which will be explained in the following third embodiment. In this marching memory family, the main memory, the register file and the cache memory each have their own position pointing strategy based on the property of locality of reference.
Fig. 38(a) illustrates schematically an example of an overall configuration of the present marching main memory implemented by a plurality of pages Pi-1,j-1, Pi,j-1, Pi+1,j-1, Pi+2,j-1, Pi-1,j, Pi,j, Pi+1,j, Pi+2,j for the vector/streaming data case. Fig. 38(b) illustrates schematically an example of a configuration of the hatched page Pi,j, which is implemented by a plurality of files F1, F2, F3, F4 for the vector/streaming data case, and each of the pages Pi-1,j-1, Pi,j-1, Pi+1,j-1, Pi+2,j-1, Pi-1,j, Pi,j, Pi+1,j, Pi+2,j can be used for marching cache memories 21a and 21b in the third embodiment. Fig. 38(c) illustrates schematically an example of a configuration of the hatched file F3; each of the files F1, F2, F3, F4 is implemented by a plurality of memory units U1, U2, U3,........., Un-1, Un for the vector/streaming data case, and each of the files F1, F2, F3, F4 can be used for marching register files 22a and 22b in the second embodiment.
Similarly, Fig. 39(a) illustrates schematically an example of an overall configuration of the present marching main memory implemented by a plurality of pages Pr-1,s-1, Pr,s-1, Pr+1,s-1, Pr+2,s-1, Pr-1,s, Pr,s, Pr+1,s, Pr+2,s for the programs/scalar data case, where each page has its own position index as an address. Fig. 39(b) illustrates schematically an example of a configuration of the hatched page Pr-1,s and the driving positions of the page Pr-1,s, using digits in the binary system; each of the pages Pr-1,s-1, Pr,s-1, Pr+1,s-1, Pr+2,s-1, Pr-1,s, Pr,s, Pr+1,s, Pr+2,s is implemented by a plurality of files F1, F2, F3, F4 for the programs/scalar data case. Each of the pages Pr-1,s-1, Pr,s-1, Pr+1,s-1, Pr+2,s-1, Pr-1,s, Pr,s, Pr+1,s, Pr+2,s can be used for marching cache memories 21a and 21b in the third embodiment, where each of the files F1, F2, F3, F4 has its own position index as an address. Fig. 39(c) illustrates schematically an example of a configuration of the hatched file F3 and the driving positions of the file F3, using digits 0, 1, 2, 3 in the binary system; each of the files F1, F2, F3, F4 is implemented by a plurality of memory units U1, U2, U3,........., Un, Un+1, Un+2, Un+3, Un+4, Un+5 for the programs/scalar data case. Each of the files F1, F2, F3, F4 can be used for marching register files 22a and 22b in the second embodiment, where each of the memory units U1, U2, U3,........., Un, Un+1, Un+2, Un+3, Un+4, Un+5 has its own position index n+4, n+3, n+2,........., 5, 4, 3, 2, 1, 0 as an address. Fig. 39(c) represents the position pointing strategy for all of the cases by digits in the binary system.
As illustrated in Fig. 39(c), n binary digits identify a single memory unit among 2^n memory units in a memory structure having a size equivalent to that of a marching register file. As illustrated in Fig. 39(b), the structure of one page has a size equivalent to that of a marching cache memory and is represented by two digits, which identify the four files F1, F2, F3, F4, while the structure of one marching main memory is represented by three digits, which identify the eight pages Pr-1,s-1, Pr,s-1, Pr+1,s-1, Pr+2,s-1, Pr-1,s, Pr,s, Pr+1,s, Pr+2,s in the marching main memory, as illustrated in Fig. 39(a).
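The position-pointing strategy of Figs. 39(a)-(c) can be sketched as a simple bit-field decomposition. The following Python sketch is only an illustration: it assumes that the three page digits, the two file digits and the n unit digits are concatenated into one flat position index, which the description does not state explicitly. The function name split_position and its parameters are hypothetical.

```python
def split_position(index: int, unit_bits: int,
                   file_bits: int = 2, page_bits: int = 3):
    """Split a flat position index into (page, file, unit) fields.

    Assumed layout (not stated in the description): the page digits are
    the most significant, followed by the file digits, then the unit digits.
    """
    total_bits = page_bits + file_bits + unit_bits
    if not 0 <= index < (1 << total_bits):
        raise ValueError("index does not fit in the available digits")
    unit = index & ((1 << unit_bits) - 1)                  # n digits -> 2^n units
    file_ = (index >> unit_bits) & ((1 << file_bits) - 1)  # 2 digits -> 4 files
    page = index >> (unit_bits + file_bits)                # 3 digits -> 8 pages
    return page, file_, unit


if __name__ == "__main__":
    # Page 5, file 2, unit 3 with four unit digits (16 units per file).
    print(split_position(0b101_10_0011, unit_bits=4))
```

With unit_bits = 4, each file holds 2^4 = 16 memory units, each page holds four files, and the main memory holds eight pages, matching the digit counts given for Figs. 39(a)-(c).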
(SPEED/CAPABILITY)
The speed gap between the memory access time and the CPU cycle time in a conventional computer system is, for example, 1:100. However, the marching memory access time is equal to the CPU cycle time in the computer system of the first embodiment. Fig. 40 compares the speed/capability of the conventional computer system without a cache with that of the marching main memory 31, configured to be used in the computer system pertaining to the first embodiment of the present invention. That is, Fig. 40(b) schematically illustrates the speed/capability of the marching main memory 31, implemented by one hundred memory units U1, U2, U3,........., U100, and compares it with the speed/capability of the existing memory illustrated in Fig. 40(a). We can also support 99 additional simultaneous memory units of the marching main memory 31, on the condition that we have the necessary processing units to use the data from the marching main memory 31. Therefore, one memory unit time Tmue in the conventional computer system is estimated to be equal to one hundred memory unit streaming times Tmus of the marching main memory 31 pertaining to the first embodiment of the present invention.
Fig. 41 compares the speed/capability of the worst case of the existing memory for scalar data or program instructions with that of the marching main memory 31, configured to be used in the computer system pertaining to the first embodiment of the present invention. That is, the hatched portion of Fig. 41(b) schematically illustrates the speed/capability of the marching main memory 31, implemented by one hundred memory units U1, U2, U3,........., U100, and compares it with the speed/capability of the worst case of the existing memory illustrated in Fig. 41(a). In the worst case, we can read out 99 memory units of the marching main memory 31, but they are not available due to a scalar program's requirement.
Further, Fig. 42 compares the speed/capability of the typical case of the existing memory for scalar data or program instructions with that of the marching main memory 31, configured to be used in the computer system pertaining to the first embodiment of the present invention. That is, Fig. 42(b) schematically illustrates the speed/capability of the marching main memory 31, implemented by one hundred memory units U1, U2, U3,........., U100, and compares it with the speed/capability of the typical case of the existing memory illustrated in Fig. 42(a). In the typical case, we can read out 99 memory units, but only several memory units are available, as illustrated by the hatched memory units in the existing memory, by speculative data preparation in a scalar program.
Fig. 43 compares the speed/capability of the typical case of the existing memory for the scalar data case with that of the marching main memory 31, configured to be used in the computer system pertaining to the first embodiment of the present invention. That is, Fig. 43(b) schematically illustrates the speed/capability of the marching main memory 31, implemented by one hundred memory units U1, U2, U3,........., U100, and compares it with the speed/capability of the existing memory illustrated in Fig. 43(a). Similar to the case illustrated in Figs. 34 (a)-(b), in the typical case, we can read out 99 memory units, but only several memory units are available, as illustrated by the hatched memory units in the existing memory, by speculative data preparation for scalar data or program instructions in multi-thread parallel processing.
Fig. 44 compares the speed/capability of the best case of the existing memory for the streaming data, vector data or program instructions case with that of the marching main memory 31, configured to be used in the computer system pertaining to the first embodiment of the present invention. That is, Fig. 44(b) schematically illustrates the speed/capability of the marching main memory 31, implemented by one hundred memory units U1, U2, U3,........., U100, and compares it with the speed/capability of the best case of the existing memory illustrated in Fig. 44(a). In the best case, all one hundred memory units of the marching main memory 31 are usable for streaming data and data parallel.
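The worst, typical and best cases of Figs. 41-44 can be put into numbers with a minimal sketch. It assumes, as estimated above, that one conventional memory unit time Tmue equals one hundred marching streaming times Tmus, and that the conventional memory delivers one usable unit per Tmue. The function name effective_speedup and the figure of five usable units for the typical case are hypothetical; the description only says "several".

```python
def effective_speedup(usable_units: int, total_units: int = 100,
                      t_mus: float = 1.0) -> float:
    """Ratio of conventional time to marching time for the usable units."""
    if not 1 <= usable_units <= total_units:
        raise ValueError("usable_units must be between 1 and total_units")
    t_mue = total_units * t_mus                # one conventional access time
    conventional_time = usable_units * t_mue   # one Tmue per usable unit
    marching_time = total_units * t_mus        # all units stream past in one Tmue
    return conventional_time / marching_time


if __name__ == "__main__":
    cases = (("worst case (scalar)", 1),
             ("typical case (hypothetical: 5 usable units)", 5),
             ("best case (streaming/vector)", 100))
    for label, usable in cases:
        print(label, effective_speedup(usable))
```

Under these assumptions the gain equals the number of streamed units that the program can actually use: no gain in the worst case, and a factor of one hundred in the best case.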
(TWO-DIMENSIONAL MARCHING MAIN MEMORY)
The memory units can be arranged two-dimensionally on a chip as illustrated in Figs. 45-51 so that various modes of operation can be achieved without a switch/network. According to the two-dimensional marching main memory 31 of the first embodiment illustrated in Figs. 45-51, the memory units U11, U12, U13,........., U1, v-1, U1v; U21, U22, U23,........., U2, v-1, U2v; ..........; Uu1, Uu2, Uu3,........., Uu, v-1, Uuv do not require refreshing, because all of these memory units are usually refreshed automatically by the information-moving scheme (information-marching scheme). Addressing of the individual memory units therefore disappears, and the required information heads for its destination unit connected to the edge of the memory. The mechanism of accessing the two-dimensional marching main memory 31 of the first embodiment is a true alternative to existing memory schemes, which start from the addressing mode to read/write information in the conventional computer system. Therefore, according to the two-dimensional marching main memory 31 of the first embodiment, the memory-accessing process without an addressing mode in the computer system of the first embodiment is considerably simpler than the existing memory schemes of the conventional computer system.
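The address-free access described above can be illustrated by a behavioural sketch in Python. It models only the information-marching scheme, in which every column of memory units hands its contents to the neighbouring column on each clock. It is not a model of the transistor-level circuit, and the class name MarchingMemory and its methods are hypothetical.

```python
from collections import deque


class MarchingMemory:
    """Behavioural model: columns of units march toward the output edge."""

    def __init__(self, rows: int, columns: int):
        self.rows = rows
        # Leftmost entry is the input edge, rightmost entry is the output edge.
        self.columns = deque([None] * columns)

    def clock(self, incoming=None):
        """Advance one clock; return the column leaving the output edge."""
        if incoming is not None and len(incoming) != self.rows:
            raise ValueError("incoming column has the wrong height")
        outgoing = self.columns.pop()       # column at the output edge leaves
        self.columns.appendleft(incoming)   # every other column moves one step
        return outgoing


if __name__ == "__main__":
    mem = MarchingMemory(rows=2, columns=3)
    for column in (["a0", "a1"], ["b0", "b1"], ["c0", "c1"]):
        mem.clock(column)                   # fill the array; no address is given
    print(mem.clock(), mem.clock(), mem.clock())
```

The information leaves in the order in which it entered, and no unit is ever selected by an address; every stored column is rewritten on each clock as it moves, which corresponds to the automatic refreshing noted above.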
(ENERGY CONSUMPTION)
To clarify the improvement in the architecture, design and implementation of the computer system pertaining to the first embodiment of the present invention, the improvement in energy consumption will be explained. Fig. 52(a) shows that the energy consumption in microprocessors can be decomposed into static power consumption and dynamic power consumption. Within the dynamic power consumption illustrated in Fig. 52(a), the net portion and the overhead portion of the power consumption are highlighted in Fig. 52(b). As illustrated in Fig. 52(c), only the net energy portions are practically necessary to execute a given job in a computer system, so these net energy portions represent the least energy consumption needed to operate the computer system. This means that the shortest processing time is achieved when only the net energy illustrated in Fig. 52(c) is consumed.
Even though efforts have been made in architecting, designing and implementing processors, bottlenecks remain in the conventional architecture as illustrated in Fig. 1. In the conventional architecture, there are various issues in the von Neumann computer, as follows:

1) Programs are stored like data in memory;
2) All processing is basically sequential in a uni-processor;
3) The operation of programs is the sequential execution of instructions;
4) Vector data is sequentially processed by the CPU with vector instructions;
5) Streaming data is sequentially processed with threads;
6) Programs, and then threads, are arranged sequentially;
7) Data parallel consists of an arrangement of data as a vector; and
8) Streaming data is a flow of data.
From these properties of a conventional computer, we conclude that programs and data are stored in a basically sequential arrangement. This means that a regular arrangement exists among the instructions in a program and among the corresponding data.
In the computer system pertaining to the first embodiment of the present invention illustrated in Fig. 2, accessing instructions in the marching main memory 31 is not necessary, because the instructions actively move by themselves to the processor 11. Similarly, accessing data in the marching main memory 31 is not necessary, because the data actively move by themselves to the processor 11.
Fig. 53 shows an actual energy consumption distribution over a processor including registers and caches in the conventional architecture, estimated by William J. Dally, et al., in "Efficient Embedded Computing", Computer, vol. 41, no. 7, 2008, pp. 27-32. Fig. 53 discloses an estimate of the power consumption distribution over the whole chip only, excluding the wires between chips. According to Dally et al., the instruction supply power consumption is estimated to be 42%, the data supply power consumption is estimated to be 28%, the clock and control logic power consumption is estimated to be 24%, and the arithmetic power consumption is estimated to be 6%. Therefore, the instruction supply and data supply power consumptions are relatively larger than the clock/control logic power consumption and the arithmetic power consumption. This is ascribable to the inefficiency of cache/register accessing, with many wires and some software overhead due to the ways in which these caches and registers are accessed, in addition to the non-refreshment of all the memories, caches and registers.
The ratio of the instruction supply power consumption to the data supply power consumption is 3:2, and the ratio of the clock and control logic power consumption to the arithmetic power consumption is 4:1. Therefore, in accordance with the computer system pertaining to the first embodiment of the present invention illustrated in Fig. 2, we can easily reduce the data supply power consumption to 20% by using the marching main memory 31 at least partly, so that the instruction supply power consumption becomes 30%, while we can increase the arithmetic power consumption to 10%, so that the clock and control logic power consumption becomes 40%. This means that the sum of the instruction supply power consumption and the data supply power consumption can be made 50%, and the sum of the clock and control logic power consumption and the arithmetic power consumption can be made 50%.
If we reduce the data supply power consumption to 10%, the instruction supply power consumption becomes 15%, and if we increase the arithmetic power consumption to 15%, the clock and control logic power consumption will become 60%, which means that the sum of the instruction supply power consumption and the data supply power consumption can be made 25%, while the sum of the clock and control logic power consumption and the arithmetic power consumption can be made 75%.
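The arithmetic behind these two scenarios can be checked with a short sketch. It assumes only what is stated above: the 3:2 and 4:1 ratios derived from the percentages of Dally et al. are held fixed while the data supply share and the arithmetic share are chosen. The names DALLY_2008 and rebalance are hypothetical.

```python
from fractions import Fraction

# Percentages attributed to Dally et al. in the description of Fig. 53.
DALLY_2008 = {"instruction_supply": 42, "data_supply": 28,
              "clock_control": 24, "arithmetic": 6}


def rebalance(data_supply: int, arithmetic: int, baseline=DALLY_2008):
    """Derive the other two shares from the fixed 3:2 and 4:1 ratios."""
    instr_per_data = Fraction(baseline["instruction_supply"],
                              baseline["data_supply"])     # 42:28 = 3:2
    clock_per_arith = Fraction(baseline["clock_control"],
                               baseline["arithmetic"])     # 24:6 = 4:1
    instruction_supply = instr_per_data * data_supply
    clock_control = clock_per_arith * arithmetic
    return {
        "instruction_supply": instruction_supply,
        "data_supply": Fraction(data_supply),
        "clock_control": clock_control,
        "arithmetic": Fraction(arithmetic),
        "supply_sum": instruction_supply + data_supply,
        "logic_sum": clock_control + arithmetic,
    }


if __name__ == "__main__":
    for data, arith in ((20, 10), (10, 15)):
        shares = rebalance(data, arith)
        print({name: int(value) for name, value in shares.items()})
```

In both scenarios the four shares add up to 100%, which is a quick consistency check on the percentages.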
The conventional computer system dissipates energy as illustrated in Fig. 54(a), with a relatively long average active time for addressing and reading/writing memory units, accompanied by wire delay time. The present computer system dissipates less energy, as illustrated in Fig. 54(b), because it has a shorter, smooth average active time through the marching memory, and it can process the same data faster than the conventional computer system with less energy.
--SECOND EMBODIMENT--
As illustrated in Fig. 55, a computer system pertaining to a second embodiment of the present invention encompasses a processor 11 and a marching main memory 31. The processor 11 includes a control unit 111 having a clock generator 113 configured to generate a clock signal, an arithmetic logic unit (ALU) 112 configured to execute arithmetic and logic operations synchronized with the clock signal, a marching-instruction register file (RF) 22a connected to the control unit 111 and a marching-data register file (RF) 22b connected to the ALU 112.
Although the illustration is omitted, very similar to the marching main memory 31 illustrated in Figs. 3-24, 25(a), 25(b), 26 and 45-51, the marching-instruction register file 22a has an array of instruction register units, instruction-register input terminals of the third array configured to receive the stored instruction from the marching main memory 31, and instruction-register output terminals of the third array, configured to store instruction in each of instruction register units and to transfer successively and periodically the stored instruction in each of instruction register units to an adjacent instruction register unit being synchronized with the clock signal from the instruction register units adjacent to the instruction-register input terminals toward the instruction register units adjacent to the instruction-register output terminals, so as to provide actively and sequentially instruction implemented by the stored instruction to the control unit 111 through the instruction-register output terminals so that the control unit 111 can execute operations with the instruction.
Further, although the detailed illustration is omitted, similar to the marching main memory 31 illustrated in Figs. 3-24, 25(a), 25(b), 26 and 45-51, the marching-data register file 22b has an array of data register units, data-register input terminals of the fourth array configured to receive the stored data from the marching main memory 31, and data-register output terminals of the fourth array, configured to store data in each of data register units and to transfer successively and periodically the stored data in each of data register units to an adjacent data register unit being synchronized with the clock signal from the data register units adjacent to the data-register input terminals toward the data register units adjacent to the data-register output terminals, so as to provide actively and sequentially the data to the ALU 112 through the data-register output terminals so that the ALU 112 can execute operations with the data.
As illustrated in Fig. 55, a portion of the marching main memory 31 and the marching-instruction register file 22a are electrically connected by a plurality of joint members 54, and remaining portion of the marching main memory 31 and the marching-data register file 22b are electrically connected by another plurality of joint members 54.
The resultant data of the processing in the ALU 112 are sent out to the marching-data register file 22b. Therefore, as represented by bidirectional arrow PHI(Greek-letter)24, data are transferred bi-directionally between the marching-data register file 22b and the ALU 112. Furthermore, the data stored in the marching-data register file 22b are sent out to the marching main memory 31 through the joint members 54. Therefore, as represented by bidirectional arrow PHI23, data are transferred bi-directionally between the marching main memory 31 and the marching-data register file 22b through the joint members 54.
In contrast, as represented by uni-directional arrows ETA(Greek-letter)22 and ETA23, instructions flow in only one direction: from the marching main memory 31 to the marching-instruction register file 22a, and from the marching-instruction register file 22a to the control unit 111.
In the computer system of the second embodiment illustrated in Fig. 55, there are no buses consisting of the data bus and address bus, because the whole computer system has no global wires even in any data exchange between the marching main memory 31 and the marching-instruction register file 22a, between the marching main memory 31 and the marching-data register file 22b, between the marching-instruction register file 22a and the control unit 111 and between the marching-data register file 22b and the ALU 112, whereas the wires or the buses constitute the bottleneck in the conventional computer system. As there are no global wires, which generate time delay and stray capacitances between these wires, the computer system of the second embodiment can achieve much higher processing speed and lower power consumption.
Since the other functions, configurations and ways of operation of the computer system pertaining to the second embodiment are substantially similar to the functions, configurations and ways of operation already explained in the first embodiment, overlapping or redundant description may be omitted.
--THIRD EMBODIMENT--
As illustrated in Fig. 56, a computer system pertaining to a third embodiment of the present invention encompasses a processor 11, a marching-cache memory (21a, 21b) and a marching main memory 31. Similar to the second embodiment, the processor 11 includes a control unit 111 having a clock generator 113 configured to generate a clock signal, an arithmetic logic unit (ALU) 112 configured to execute arithmetic and logic operations synchronized with the clock signal, a marching-instruction register file (RF) 22a connected to the control unit 111 and a marching-data register file (RF) 22b connected to the ALU 112.
The marching-cache memory (21a, 21b) embraces a marching-instruction cache memory 21a and a marching-data cache memory 21b. Although the illustration is omitted, very similar to the marching main memory 31 illustrated in Figs. 3-24, 25(a), 25(b), 26 and 45-51, each of the marching-instruction cache memory 21a and the marching-data cache memory 21b has an array of cache memory units at locations corresponding to a unit of information, cache input terminals of the array configured to receive the stored information from the marching main memory 31, and cache output terminals of the array, configured to store information in each of the cache memory units and to transfer, synchronously with the clock signal, step by step, the information to an adjacent cache memory unit, so as to provide actively and sequentially the stored information to the processor 11 so that the ALU 112 can execute the arithmetic and logic operations with the stored information.
As illustrated in Fig. 56, a portion of the marching main memory 31 and the marching-instruction cache memory 21a are electrically connected by a plurality of joint members 52, and remaining portion of the marching main memory 31 and the marching-data cache memory 21b are electrically connected by another plurality of joint members 52. Furthermore, the marching-instruction cache memory 21a and the marching-instruction register file 22a are electrically connected by a plurality of joint members 51, and the marching-data cache memory 21b and the marching-data register file 22b are electrically connected by another plurality of joint members 51.
The resultant data of the processing in the ALU 112 are sent out to the marching-data register file 22b, and, as represented by bidirectional arrow PHI(Greek-letter)34, data are transferred bi-directionally between the marching-data register file 22b and the ALU 112. Furthermore, the data stored in the marching-data register file 22b are sent out to the marching-data cache memory 21b through the joint members 51, and, as represented by bidirectional arrow PHI33, data are transferred bi-directionally between the marching-data cache memory 21b and the marching-data register file 22b through the joint members 51. Furthermore, the data stored in the marching-data cache memory 21b are sent out to the marching main memory 31 through the joint members 52, and, as represented by bidirectional arrow PHI32, data are transferred bi-directionally between the marching main memory 31 and the marching-data cache memory 21b through the joint members 52.
In contrast, as represented by uni-directional arrows ETA(Greek-letter)31, ETA32 and ETA33, instructions flow in only one direction: from the marching main memory 31 to the marching-instruction cache memory 21a, from the marching-instruction cache memory 21a to the marching-instruction register file 22a, and from the marching-instruction register file 22a to the control unit 111.
In the computer system of the third embodiment illustrated in Fig. 56, there are no buses consisting of the data bus and address bus, because the whole computer system has no global wires even in any data exchange between the marching main memory 31 and the marching-instruction cache memory 21a, between the marching-instruction cache memory 21a and the marching-instruction register file 22a, between the marching main memory 31 and the marching-data cache memory 21b, between the marching-data cache memory 21b and the marching-data register file 22b, between the marching-instruction register file 22a and the control unit 111 and between the marching-data register file 22b and the ALU 112, whereas the wires or the buses constitute the bottleneck in the conventional computer system. As there are no global wires, which generate time delay and stray capacitances between these wires, the computer system of the third embodiment can achieve much higher processing speed and lower power consumption.
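The one-way instruction flow of the third embodiment, from the marching main memory through the marching-instruction cache memory and the marching-instruction register file to the control unit, can be sketched as a chain of marching stages joined end to end. This is a behavioural illustration only: the stage depths are hypothetical, the joint members 51 and 52 are modelled as direct hand-over of one unit per clock, and the function name run_instruction_flow is not taken from the description.

```python
from collections import deque


def run_instruction_flow(stream, depths=(4, 2, 1)):
    """March a stream through main memory, cache and register file.

    depths gives the hypothetical number of units in each stage.
    Returns the instructions in the order they reach the control unit.
    """
    stages = [deque([None] * depth) for depth in depths]
    delivered = []
    # Extra empty clocks let the last instruction drain through all stages.
    for item in list(stream) + [None] * sum(depths):
        carry = item
        for stage in stages:
            stage.appendleft(carry)   # unit handed over from the previous stage
            carry = stage.pop()       # unit leaving this stage on the same clock
        if carry is not None:
            delivered.append(carry)   # reaches the control unit
    return delivered


if __name__ == "__main__":
    print(run_instruction_flow(["i0", "i1", "i2"]))
```

In this sketch each stage only exchanges information with its immediate neighbour, so the latency is the sum of the stage depths and no shared bus or address is involved.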
Since the other functions, configurations and ways of operation of the computer system pertaining to the third embodiment are substantially similar to the functions, configurations and ways of operation already explained in the first and second embodiments, overlapping or redundant description may be omitted.
As illustrated in Fig. 57(a), the ALU 112 in the computer system of the third embodiment may include a plurality of arithmetic pipelines P1, P2, P3,........., Pn configured to receive the stored information through marching register units R11, R12, R13,........., R1n; R21, R22, R23,........., R2n, in which data move in parallel with the alignment direction of the arithmetic pipelines P1, P2, P3,........., Pn. In the case that vector data are stored, marching-vector register units R11, R12, R13,........., R1n; R21, R22, R23,........., R2n can be used.
Furthermore, as illustrated in Fig. 57(b), a plurality of marching cache units C11, C12, C13,........., C1n; C21, C22, C23,........., C2n; C31, C32, C33,........., C3n can be aligned in parallel.
As illustrated in Fig. 58, the ALU 112 in the computer system of the third embodiment may include a single processor core 116, and as represented by cross-directional arrows, the information can move from the marching-cache memory 21 to the marching-register file 22, and from the marching-register file 22 to the processor core 116. The resultant data of the processing in the processor core 116 are sent out to the marching-register file 22 so that data are transferred bi-directionally between the marching-register file 22 and the processor core 116. Furthermore, the data stored in the marching-register file 22 are sent out to the marching-cache memory 21 so that data are transferred bi-directionally between the marching-cache memory 21 and the marching-register file 22. In the case of instruction movement, there is no flow along the opposite direction of the information to be processed.
As illustrated in Fig. 59, the ALU 112 in the computer system of the third embodiment may include a single arithmetic pipeline 117, and as represented by cross-directional arrows, the information can move from the marching-cache memory 21 to the marching-vector register file 22v, and from the marching-vector register file 22v to the arithmetic pipeline 117. The resultant data of the processing in the arithmetic pipeline 117 are sent out to the marching-vector register file 22v so that data are transferred bi-directionally between the marching-vector register file 22v and the arithmetic pipeline 117. Furthermore, the data stored in the marching-vector register file 22v are sent out to the marching-cache memory 21 so that data are transferred bi-directionally between the marching-cache memory 21 and the marching-vector register file 22v. In the case of instruction movement, there is no flow along the opposite direction of the information to be processed.
As illustrated in Fig. 60, the ALU 112 in the computer system of the third embodiment may include a plurality of processor cores 116-1, 116-2, 116-3, 116-4,........., 116-m, and as represented by cross-directional arrows, the information can move from the marching-cache memory 21 to the marching-register file 22, and from the marching-register file 22 to the processor cores 116-1, 116-2, 116-3, 116-4,........., 116-m. The resultant data of the processing in the processor cores 116-1, 116-2, 116-3, 116-4,........., 116-m are sent out to the marching-register file 22 so that data are transferred bi-directionally between the marching-register file 22 and the processor cores 116-1, 116-2, 116-3, 116-4,........., 116-m. Furthermore, the data stored in the marching-register file 22 are sent out to the marching-cache memory 21 so that data are transferred bi-directionally between the marching-cache memory 21 and the marching-register file 22. In the case of instruction movement, there is no flow along the opposite direction of the information to be processed.
As illustrated in Fig. 61, the ALU 112 in the computer system of the third embodiment may include a plurality of arithmetic pipelines 117-1, 117-2, 117-3, 117-4,........., 117-m, and as represented by cross-directional arrows, the information can move from the marching-cache memory 21 to the marching-vector register file 22v, and from the marching-vector register file 22v to the arithmetic pipelines 117-1, 117-2, 117-3, 117-4,........., 117-m. The resultant data of the processing in the arithmetic pipelines 117-1, 117-2, 117-3, 117-4,........., 117-m are sent out to the marching-vector register file 22v so that data are transferred bi-directionally between the marching-vector register file 22v and the arithmetic pipelines 117-1, 117-2, 117-3, 117-4,........., 117-m. Furthermore, the data stored in the marching-vector register file 22v are sent out to the marching-cache memory 21 so that data are transferred bi-directionally between the marching-cache memory 21 and the marching-vector register file 22v. In the case of instruction movement, there is no flow along the opposite direction of the information to be processed.
As illustrated in Fig. 62(b), the ALU 112 in the computer system of the third embodiment may include a plurality of arithmetic pipelines 117-1, 117-2, 117-3, 117-4,........., 117-m, and a plurality of marching cache memories 21-1, 21-2, 21-3, 21-4,........., 21-m are electrically connected to the marching main memory 31. Here, a first marching-vector register file 22v-1 is connected to the first marching-cache memory 21-1, and a first arithmetic pipeline 117-1 is connected to the first marching-vector register file 22v-1. A second marching-vector register file 22v-2 is connected to the second marching-cache memory 21-2, and a second arithmetic pipeline 117-2 is connected to the second marching-vector register file 22v-2; a third marching-vector register file 22v-3 is connected to the third marching-cache memory 21-3, and a third arithmetic pipeline 117-3 is connected to the third marching-vector register file 22v-3;........; and an m-th marching-vector register file 22v-m is connected to the m-th marching-cache memory 21-m, and an m-th arithmetic pipeline 117-m is connected to the m-th marching-vector register file 22v-m.
The information moves from the marching main memory 31 to the marching cache memories 21-1, 21-2, 21-3, 21-4,........., 21-m in parallel, from the marching cache memories 21-1, 21-2, 21-3, 21-4,........., 21-m to the marching-vector register files 22v-1, 22v-2, 22v-3, 22v-4,........., 22v-m in parallel, and from the marching-vector register files 22v-1, 22v-2, 22v-3, 22v-4,........., 22v-m to the arithmetic pipelines 117-1, 117-2, 117-3, 117-4,........., 117-m in parallel. The resultant data of the processing in the arithmetic pipelines 117-1, 117-2, 117-3, 117-4,........., 117-m are sent out to the marching-vector register files 22v-1, 22v-2, 22v-3, 22v-4,........., 22v-m so that data are transferred bi-directionally between the marching-vector register files 22v-1, 22v-2, 22v-3, 22v-4,........., 22v-m and the arithmetic pipelines 117-1, 117-2, 117-3, 117-4,........., 117-m. Furthermore, the data stored in the marching-vector register files 22v-1, 22v-2, 22v-3, 22v-4,........., 22v-m are sent out to the marching cache memories 21-1, 21-2, 21-3, 21-4,........., 21-m so that data are transferred bi-directionally between the marching cache memories 21-1, 21-2, 21-3, 21-4,........., 21-m and the marching-vector register files 22v-1, 22v-2, 22v-3, 22v-4,........., 22v-m, and the data stored in the marching cache memories 21-1, 21-2, 21-3, 21-4,........., 21-m are sent out to the marching main memory 31 so that data are transferred bi-directionally between the marching main memory 31 and the marching cache memories 21-1, 21-2, 21-3, 21-4,........., 21-m. In the case of instruction movement, there is no flow along the opposite direction of the information to be processed.
In contrast, as illustrated in Fig. 62(a), in the ALU 112 of the conventional computer system including a plurality of arithmetic pipelines 117-1, 117-2, 117-3, 117-4,........., 117-m, a plurality of conventional cache memories 321-1, 321-2, 321-3, 321-4,........., 321-m are electrically connected to the conventional main memory 331 through wires and/or buses, which constitute the von Neumann bottleneck 325. Then, information moves from the conventional main memory 331 to the conventional cache memories 321-1, 321-2, 321-3, 321-4,........., 321-m in parallel through the von Neumann bottleneck 325, from the conventional cache memories 321-1, 321-2, 321-3, 321-4,........., 321-m to the conventional-vector register files (RFs) 322v-1, 322v-2, 322v-3, 322v-4,........., 322v-m in parallel, and from the conventional-vector register files 322v-1, 322v-2, 322v-3, 322v-4,........., 322v-m to the arithmetic pipelines 117-1, 117-2, 117-3, 117-4,........., 117-m in parallel.
In the computer system of the third embodiment illustrated in Fig. 62(b), there are no buses consisting of the data bus and address bus, because the whole system has no global wires even in any data exchange between the arithmetic pipelines 117-1, 117-2, 117-3, 117-4,........., 117-m and the marching main memory 31, whereas the wires or the buses constitute the bottleneck in the conventional computer system as illustrated in Fig. 62(a). As there are no global wires, which generate time delay and stray capacitances between these wires, the computer system illustrated in Fig. 62(b) can achieve much higher processing speed and lower power consumption.
--FOURTH EMBODIMENT--
As illustrated in Fig. 63, a computer system of a fourth embodiment encompasses a conventional main memory 31s, a mother marching main memory 31-0 connected to the conventional main memory 31s, and a plurality of processing units 12-1, 12-2, 12-3,........, configured to communicate with the mother marching main memory 31-0 so as to implement a high performance computing (HPC) system, which can be used for graphics processing unit (GPU)-based general-purpose computing. Although the illustration is omitted, the HPC system of the fourth embodiment further includes a control unit 111 having a clock generator 113 configured to generate a clock signal, and a field programmable gate array (FPGA) configured to switch-control the operations of the plurality of processing units 12-1, 12-2, 12-3,.........., optimizing the flow of number-crunching calculations by running them in parallel and helping to manage and organize bandwidth consumption. An FPGA is, in essence, a computer chip that can rewire itself for a given task. An FPGA can be programmed with hardware description languages such as VHDL or Verilog.
The first processing unit 12-1 encompasses a first branched-marching main memory 31-1, a plurality of first marching cache memories 21-11, 21-12,........., 21-1p electrically connected respectively to the first branched-marching main memory 31-1, a plurality of first marching-vector register files 22v-11, 22v-12,........., 22v-1p electrically connected respectively to the first marching cache memories 21-11, 21-12,........., 21-1p, and a plurality of first arithmetic pipelines 117-11, 117-12,........., 117-1p electrically connected respectively to the first marching-vector register files 22v-11, 22v-12,........., 22v-1p.
Similar to the configurations illustrated in Figs. 3-24, 25(a), 25(b), 26 and 45-51 etc., each of the mother marching main memory 31-0, the first branched-marching main memory 31-1, the first marching cache memories 21-11, 21-12,........., 21-1p, and the first marching-vector register files 22v-11, 22v-12,........., 22v-1p encompasses an array of memory units, input terminals of the array and output terminals of the array, and is configured to store information in each of the memory units and to transfer the information synchronously with the clock signal, step by step, from the side of the input terminals toward the output terminals.
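The step-by-step, clock-synchronous transfer described above can be pictured with a minimal behavioral sketch. The sketch is only an illustration under simplifying assumptions: a one-dimensional chain of memory units stands in for the array, one call to `tick` stands in for one clock period, and the names `MarchingMemory`, `tick` and `run_stream` are hypothetical and do not appear in the embodiments.

```python
from typing import Iterable, List, Optional


class MarchingMemory:
    """Toy model: a chain of memory units between input and output terminals.

    Index 0 is the unit next to the input terminal; the last index is the
    unit next to the output terminal (assumed layout for this sketch).
    """

    def __init__(self, length: int) -> None:
        self.units: List[Optional[int]] = [None] * length

    def tick(self, incoming: Optional[int] = None) -> Optional[int]:
        """One clock period: every stored unit of information marches one step."""
        outgoing = self.units[-1]                  # value leaving at the output terminal
        self.units = [incoming] + self.units[:-1]  # shift all units toward the output
        return outgoing


def run_stream(mem: MarchingMemory, stream: Iterable[int], extra_ticks: int) -> List[int]:
    """Feed a stream in, then keep clocking so the stored values drain out."""
    received: List[int] = []
    for word in list(stream) + [None] * extra_ticks:
        value = mem.tick(word)
        if value is not None:
            received.append(value)
    return received
```

In this model the values leave the output terminal in the order in which they entered, without any address being supplied, which is the behavior the marching memories of the embodiments rely on.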
Because the operations of the mother marching main memory 31-0, the first branched-marching main memory 31-1, the first marching cache memories 21-11, 21-12,........., 21-1p, and the first marching-vector register files 22v-11, 22v-12,........., 22v-1p are controlled by the FPGA, the information moves from the mother marching main memory 31-0 to the first branched-marching main memory 31-1, from the first branched-marching main memory 31-1 to the first marching cache memories 21-11, 21-12,........., 21-1p in parallel, from the first marching cache memories 21-11, 21-12,........., 21-1p to the first marching-vector register files 22v-11, 22v-12,........., 22v-1p in parallel, and from the first marching-vector register files 22v-11, 22v-12,........., 22v-1p to the first arithmetic pipelines 117-11, 117-12,........., 117-1p in parallel. The resultant data of the processing in the first arithmetic pipelines 117-11, 117-12,........., 117-1p are sent out to the first marching-vector register files 22v-11, 22v-12,........., 22v-1p so that data are transferred bi-directionally between the first marching-vector register files 22v-11, 22v-12,........., 22v-1p and the first arithmetic pipelines 117-11, 117-12,........., 117-1p. Furthermore, the data stored in the first marching-vector register files 22v-11, 22v-12,........., 22v-1p are sent out to the first marching cache memories 21-11, 21-12,........., 21-1p so that data are transferred bi-directionally between the first marching cache memories 21-11, 21-12,........., 21-1p and the first marching-vector register files 22v-11, 22v-12,........., 22v-1p, and the data stored in the first marching cache memories 21-11, 21-12,........., 21-1p are sent out to the first branched-marching main memory 31-1 so that data are transferred bi-directionally between the first branched-marching main memory 31-1 and the first marching cache memories 21-11, 21-12,........., 21-1p.
However, the FPGA controls the movement of instructions such that there is no flow along the opposite direction of the information to be processed in the first processing unit 12-1.
The second processing unit 12-2 encompasses a second branched-marching main memory 31-2, a plurality of second marching cache memories 21-21, 21-22,........., 21-2q electrically connected respectively to the second branched-marching main memory 31-2, a plurality of second marching-vector register files 22v-21, 22v-22,........., 22v-2q electrically connected respectively to the second marching cache memories 21-21, 21-22,........., 21-2q, and a plurality of second arithmetic pipelines 117-21, 117-22,........., 117-2q electrically connected respectively to the second marching-vector register files 22v-21, 22v-22,........., 22v-2q. Similar to the first processing unit 12-1, each of the mother marching main memory 31-0, the second branched-marching main memory 31-2, the second marching cache memories 21-21, 21-22,........., 21-2q, and the second marching-vector register files 22v-21, 22v-22,........., 22v-2q encompasses an array of memory units, input terminals of the array and output terminals of the array, and is configured to store information in each of the memory units and to transfer the information synchronously with the clock signal, step by step, from the side of the input terminals toward the output terminals.
Because the operations of the mother marching main memory 31-0, the second branched-marching main memory 31-2, the second marching cache memories 21-21, 21-22,........., 21-2q, and the second marching-vector register files 22v-21, 22v-22,........., 22v-2q are controlled by the FPGA, the information moves from the mother marching main memory 31-0 to the second branched-marching main memory 31-2, from the second branched-marching main memory 31-2 to the second marching cache memories 21-21, 21-22,........., 21-2q in parallel, from the second marching cache memories 21-21, 21-22,........., 21-2q to the second marching-vector register files 22v-21, 22v-22,........., 22v-2q in parallel, and from the second marching-vector register files 22v-21, 22v-22,........., 22v-2q to the second arithmetic pipelines 117-21, 117-22,........., 117-2q in parallel. The resultant data of the processing in the second arithmetic pipelines 117-21, 117-22,........., 117-2q are sent out to the second marching-vector register files 22v-21, 22v-22,........., 22v-2q so that data are transferred bi-directionally between the second marching-vector register files 22v-21, 22v-22,........., 22v-2q and the second arithmetic pipelines 117-21, 117-22,........., 117-2q. Furthermore, the data stored in the second marching-vector register files 22v-21, 22v-22,........., 22v-2q are sent out to the second marching cache memories 21-21, 21-22,........., 21-2q so that data are transferred bi-directionally between the second marching cache memories 21-21, 21-22,........., 21-2q and the second marching-vector register files 22v-21, 22v-22,........., 22v-2q, and the data stored in the second marching cache memories 21-21, 21-22,........., 21-2q are sent out to the second branched-marching main memory 31-2 so that data are transferred bi-directionally between the second branched-marching main memory 31-2 and the second marching cache memories 21-21, 21-22,........., 21-2q.
However, the FPGA controls the movement of instructions such that there is no flow along the opposite direction of the information to be processed in the second processing unit 12-2.
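The transfer rule stated for each processing unit, namely that data may move in both directions between adjacent levels while instructions flow only toward the arithmetic pipelines, can be summarized in a short sketch. This is merely an illustrative model and not the FPGA control logic itself; the level names in `LEVELS` and the function `transfer_allowed` are hypothetical labels introduced here.

```python
# Levels of one processing unit, ordered from the memory side to the
# pipeline side. The string labels are assumptions made for this sketch.
LEVELS = ["branched_main_memory", "cache", "vector_register_file", "arithmetic_pipeline"]


def transfer_allowed(kind: str, src: str, dst: str) -> bool:
    """Return True if a transfer of the given kind from src to dst is permitted.

    Only adjacent levels exchange information. Data are transferred
    bi-directionally; instructions never flow back toward the main memory.
    """
    step = LEVELS.index(dst) - LEVELS.index(src)
    if abs(step) != 1:
        return False      # no level is skipped and no self-transfer occurs
    if kind == "data":
        return True       # bi-directional data transfer
    if kind == "instruction":
        return step == 1  # one-way flow toward the pipelines
    raise ValueError(f"unknown kind of information: {kind!r}")
```

For example, a data transfer from the cache level back to the branched main memory level is permitted by this rule, whereas the same transfer of an instruction is rejected.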
For example, vector instructions generated from loops in a source program are transferred from the mother marching main memory 31-0 to the first processing unit 12-1, the second processing unit 12-2, the third processing unit 12-3,.......... in parallel, so that parallel processing of these vector instructions can be executed by the arithmetic pipelines 117-11, 117-12,........., 117-1p, 117-21, 117-22,........., 117-2q,.......... in each of the first processing unit 12-1, the second processing unit 12-2, the third processing unit 12-3,........
The current FPGA-controlled HPC system requires a large amount of wiring resources, which generate time delay and stray capacitances between these wires, contributing to the bottleneck. In contrast, in the HPC system of the fourth embodiment illustrated in Fig. 63, there are no buses such as a data bus and an address bus for any data exchange between the first marching-vector register files 22v-11, 22v-12,........., 22v-1p and the first arithmetic pipelines 117-11, 117-12,........., 117-1p, between the first marching cache memories 21-11, 21-12,........., 21-1p and the first marching-vector register files 22v-11, 22v-12,........., 22v-1p, between the first branched-marching main memory 31-1 and the first marching cache memories 21-11, 21-12,........., 21-1p, between the second marching-vector register files 22v-21, 22v-22,........., 22v-2q and the second arithmetic pipelines 117-21, 117-22,........., 117-2q, between the second marching cache memories 21-21, 21-22,........., 21-2q and the second marching-vector register files 22v-21, 22v-22,........., 22v-2q, between the second branched-marching main memory 31-2 and the second marching cache memories 21-21, 21-22,........., 21-2q, between the mother marching main memory 31-0 and the first branched-marching main memory 31-1, and between the mother marching main memory 31-0 and the second branched-marching main memory 31-2. Therefore, the FPGA-controlled HPC system illustrated in Fig. 63 can achieve much higher processing speed and lower power consumption than the current FPGA-controlled HPC system. By increasing the number of processing units 12-1, 12-2, 12-3,........., the FPGA-controlled HPC system pertaining to the fourth embodiment can execute, for example, thousands of threads or more simultaneously at very high speed, enabling high computational throughput across large amounts of data.
--FIFTH EMBODIMENT--
As illustrated in Fig. 64, a computer system pertaining to a fifth embodiment of the present invention encompasses a processor 11, a stack of marching-register files 22-1, 22-2, 22-3, ........, implementing a three-dimensional marching-register file connected to the processor 11, a stack of marching-cache memories 21-1, 21-2, 21-3, ........, implementing a three-dimensional marching-cache memory connected to the three-dimensional marching-register file (22-1, 22-2, 22-3, ........), and a stack of marching main memories 31-1, 31-2, 31-3, ........., implementing a three-dimensional marching main memory connected to the three-dimensional marching-cache (21-1, 21-2, 21-3, ........). The processor 11 includes a control unit 111 having a clock generator 113 configured to generate a clock signal, and an arithmetic logic unit (ALU) 112 configured to execute arithmetic and logic operations synchronized with the clock signal.
In the three-dimensional marching-register file (22-1, 22-2, 22-3, ........), a first marching-register file 22-1 includes a first marching-instruction register file 22a-1 connected to the control unit 111 and a first marching-data register file 22b-1 connected to the ALU 112, a second marching-register file 22-2 includes a second marching-instruction register file connected to the control unit 111 and a second marching-data register file connected to the ALU 112, a third marching-register file 22-3 includes a third marching-instruction register file connected to the control unit 111 and a third marching-data register file connected to the ALU 112, and,......... In the three-dimensional marching-cache (21-1, 21-2, 21-3, ........), the first marching-cache memory 21-1 includes a first marching-instruction cache memory 21a-1 and a first marching-data cache memory 21b-1, the second marching-cache memory 21-2 includes a second marching-instruction cache memory and a second marching-data cache memory, the third marching-cache memory 21-3 includes a third marching-instruction cache memory and a third marching-data cache memory, and.........
Although the illustration is omitted, very similar to the marching main memory 31 illustrated in Figs. 45-51, each of the marching main memories 31-1, 31-2, 31-3, ........, has a two-dimensional array of memory units each having a unit of information, input terminals of the main memory array and output terminals of the main memory array. Each of the marching main memories 31-1, 31-2, 31-3, ........, stores the information in each of the memory units and transfers the information synchronously with the clock signal, step by step, toward the output terminals of the main memory array, so as to provide the three-dimensional marching-cache (21-1, 21-2, 21-3, ........) with the stored information actively and sequentially. Each of the marching-cache memories 21-1, 21-2, 21-3, ........, has a two-dimensional array of cache memory units, cache input terminals of the marching-cache array configured to receive the stored information from the three-dimensional marching main memory (31-1, 31-2, 31-3, ........), and cache output terminals of the marching-cache array. Each of the marching-cache memories 21-1, 21-2, 21-3, ........, stores the information in each of the cache memory units and transfers, synchronously with the clock signal, step by step, the information to an adjacent cache memory unit, so as to provide actively and sequentially the stored information to the three-dimensional marching-register file (22-1, 22-2, 22-3, ........). Each of the marching-register files 22-1, 22-2, 22-3, ........, has a two-dimensional array of register units each having a unit of information, input terminals of the register array configured to receive the stored information from the three-dimensional marching-cache (21-1, 21-2, 21-3, ........), and output terminals of the register array. Each of the marching-register files 22-1, 22-2, 22-3, ........, stores the information in each of the register units and transfers the information synchronously with the clock signal, step by step, toward the output terminals of the register array, so as to provide the processor 11 with the stored information actively and sequentially so that the processor 11 can execute the arithmetic and logic operations with the stored information.
Each of the marching main memories 31-1, 31-2, 31-3, ........, is implemented by the two-dimensional array of memory units delineated at a surface of a semiconductor chip, and a plurality of the semiconductor chips are stacked vertically as illustrated in Fig. 65(a), sandwiching heat dissipating plates 58m-1, 58m-2, 58m-3, ......... between the plurality of the semiconductor chips so as to implement the three-dimensional marching main memory (31-1, 31-2, 31-3, .........). It is preferable that the heat dissipating plates 58m-1, 58m-2, 58m-3, ........., are made of materials having high thermal conductivity such as diamond. Similarly, each of the marching-cache memories 21-1, 21-2, 21-3, ........, is implemented by the two-dimensional array of memory units delineated at a surface of a semiconductor chip, and a plurality of the semiconductor chips are stacked vertically as illustrated in Fig. 65(b), sandwiching heat dissipating plates 58c-1, 58c-2, 58c-3, ........., between the plurality of the semiconductor chips so as to implement the three-dimensional marching-cache (21-1, 21-2, 21-3, ........), and each of the marching-register files 22-1, 22-2, 22-3,........, is implemented by the two-dimensional array of memory units delineated at a surface of a semiconductor chip, and a plurality of the semiconductor chips are stacked vertically as illustrated in Fig. 65(c), sandwiching heat dissipating plates 58r-1, 58r-2, 58r-3, ........., between the plurality of the semiconductor chips so as to implement the three-dimensional marching-register file (22-1, 22-2, 22-3, ........). It is preferable that the heat dissipating plates 58c-1, 58c-2, 58c-3, ........, 58r-1, 58r-2, 58r-3, ........., are made of materials having high thermal conductivity such as diamond. Because there are no interconnects inside the surfaces of the semiconductor chips in the three-dimensional configuration illustrated in Figs. 65(a)-(c) and 66, it is easy to insert the heat dissipating plates 58m-1, 58m-2, 58m-3, ........., 58c-1, 58c-2, 58c-3, ........, 58r-1, 58r-2, 58r-3, ........., between the semiconductor chips, and the configuration illustrated in Figs. 65(a)-(c) and 66 is expandable to stacking structures with any number of the semiconductor chips. In the conventional architecture, there is basically a limit on the number of stacked semiconductor chips because of thermal issues when the conventional semiconductor chips are directly stacked. In the computer system of the fifth embodiment, the sandwich structure illustrated in Figs. 65(a)-(c) and 66 is suitable for establishing the thermal flow from the active computing semiconductor chips through the heat dissipating plates 58m-1, 58m-2, 58m-3, ........., 58c-1, 58c-2, 58c-3, ........, 58r-1, 58r-2, 58r-3, ........., to outside the system more effectively. Therefore, in the computer system of the fifth embodiment, these semiconductor chips can be stacked proportionally to the scale of the system, and as illustrated in Figs. 65(a)-(c) and 66, because a plurality of the semiconductor chips merging the marching main memories 31-1, 31-2, 31-3, ........, the marching-cache memories 21-1, 21-2, 21-3, ........, and the marching-register files 22-1, 22-2, 22-3,........, can easily be stacked to implement the three-dimensional configuration, a scalable computer system can be easily organized while keeping the temperature of the system lower.
Although the illustration is omitted, the three-dimensional marching main memory (31-1, 31-2, 31-3, .........) and the three-dimensional marching-cache (21-1, 21-2, 21-3, ........) are electrically connected by a plurality of joint members, the three-dimensional marching-cache (21-1, 21-2, 21-3, ........) and the three-dimensional marching-register file (22-1, 22-2, 22-3, ........) are electrically connected by a plurality of joint members, and the three-dimensional marching-register file (22-1, 22-2, 22-3, ........) and processor 11 are electrically connected by another plurality of joint members.
The resultant data of the processing in the ALU 112 are sent out to the three-dimensional marching-register file (22-1, 22-2, 22-3, ........) through the joint members so that data are transferred bi-directionally between the three-dimensional marching-register file (22-1, 22-2, 22-3, ........) and the ALU 112. Furthermore, the data stored in the three-dimensional marching-register file (22-1, 22-2, 22-3, ........) are sent out to the three-dimensional marching-cache (21-1, 21-2, 21-3, ........) through the joint members so that data are transferred bi-directionally between the three-dimensional marching-cache (21-1, 21-2, 21-3, ........) and the three-dimensional marching-register file (22-1, 22-2, 22-3, ........). Furthermore, the data stored in the three-dimensional marching-cache (21-1, 21-2, 21-3, ........) are sent out to the three-dimensional marching main memory (31-1, 31-2, 31-3, .........) through the joint members so that data are transferred bi-directionally between the three-dimensional marching main memory (31-1, 31-2, 31-3, .........) and the three-dimensional marching-cache (21-1, 21-2, 21-3, ........).
On the contrary, there is only one way of instruction-flow from the three-dimensional marching main memory (31-1, 31-2, 31-3, .........) to the three-dimensional marching-cache (21-1, 21-2, 21-3, ........), from the three-dimensional marching-cache (21-1, 21-2, 21-3, ........) to the three-dimensional marching-register file (22-1, 22-2, 22-3, ........), and from the three-dimensional marching-register file (22-1, 22-2, 22-3, ........) to the control unit 111. For example, vector instructions generated from loops in a source program are transferred from the three-dimensional marching main memory (31-1, 31-2, 31-3, .........) to the control unit 111 through the three-dimensional marching-cache (21-1, 21-2, 21-3, ........) and the three-dimensional marching-register file (22-1, 22-2, 22-3, ........) so that each of these vector instructions can be executed by arithmetic pipelines in the control unit 111.
In the computer system of the fifth embodiment illustrated in Fig. 64, there are no buses such as the data bus and address bus in any data exchange between the three-dimensional marching main memory (31-1, 31-2, 31-3, .........) and the three-dimensional marching-cache (21-1, 21-2, 21-3, ........), between the three-dimensional marching-cache (21-1, 21-2, 21-3, ........) and the three-dimensional marching-register file (22-1, 22-2, 22-3, ........), and between the three-dimensional marching-register file (22-1, 22-2, 22-3, ........) and the processor 11, while the wires or the buses implement the bottleneck in the conventional computer system. As there are no global wires, which generate time delay and stray capacitances between these wires, the computer system of the fifth embodiment can achieve much higher processing speed and lower power consumption than the conventional computer system. In addition, the temperature of the computer system is kept lower than that of the conventional computer system so as to establish "a cool computer", by employing the heat dissipating plates 58m-1, 58m-2, 58m-3, ........., 58c-1, 58c-2, 58c-3, ........, 58r-1, 58r-2, 58r-3, ........., which are made of materials having high thermal conductivity such as diamond and disposed between the semiconductor chips. The cool computer pertaining to the fifth embodiment is different from existing computers because the cool computer is purposely architected and designed with an average of 30% less energy consumption and 10000% less size to obtain 100 times higher speed, for example.
Since other functions, configurations, way of operation of the computer system pertaining to the fifth embodiment are substantially similar to the functions, configurations, way of operation already explained in the first to third embodiments, overlapping or redundant description may be omitted.
(MISCELLANEOUS THREE-DIMENSIONAL CONFIGURATIONS)
The three-dimensional configurations illustrated in Figs. 64, 65(a), 65(b) and 65(c) are mere examples, and there are various ways and combinations of implementing three-dimensional configurations so as to facilitate the organization of a scalable computer system.
For example, as illustrated in Fig. 66, a first chip (top chip) merging a plurality of arithmetic pipelines 117 and a plurality of marching-register files 22, a second chip (middle chip) merging a marching-cache memory 21 and a third chip (bottom chip) merging a marching main memory 31 can be stacked vertically. Each of the arithmetic pipelines 117 may include a vector-processing unit, and each of the marching-register files 22 may include marching-vector registers. Between the first and second chips, a plurality of joint members 55a are inserted, and between the second and third chips, a plurality of joint members 55b are inserted. For example, each of joint members 55a and 55b may be implemented by an electrical conductive bump such as a solder ball, a gold (Au) bump, a silver (Ag) bump, a copper (Cu) bump, a nickel-gold (Ni-Au) alloy bump or a nickel-gold-indium (Ni-Au-In) alloy bump. Although the illustration is omitted, heat-dissipating plates can be inserted between the first and second chips and between the second and third chips so as to achieve "cool chips", similar to the configuration illustrated in Figs. 65(a)-(c) and 66.
Alternatively, as illustrated in Figs. 67 and 68, a first three-dimensional (3D)-stack embracing a first top chip, a first middle chip and first bottom chip and a second 3D-stack embracing a second top chip, a second middle chip and second bottom chip may be disposed two dimensionally on a same substrate or a same circuit board so as to implement a parallel computing with multiple processors, in which the first 3D-stack and the second 3D-stack are connected by bridges 59a and 59b.
In the first 3D-stack, a first top chip merging a plurality of first arithmetic pipelines 117-1 and a plurality of first marching-register files 22-1, a first middle chip merging a first marching-cache memory 21-1 and a first bottom chip merging a first marching main memory 31-1 are 3D-stacked vertically. Each of the first arithmetic pipelines 117-1 may include a vector-processing unit, and each of the first marching-register files 22-1 may include marching-vector registers. Between the first top and first middle chips, a plurality of joint members 55a-1 are inserted, and between the first middle and first bottom chips, a plurality of joint members 55b-1 are inserted. For example, each of joint members 55a-1 and 55b-1 may be implemented by an electrical conductive bump such as a solder ball, a gold (Au) bump, a silver (Ag) bump, a copper (Cu) bump, a nickel-gold (Ni-Au) alloy bump or a nickel-gold-indium (Ni-Au-In) alloy bump. Similarly, in the second 3D-stack, a second top chip merging a plurality of second arithmetic pipelines 117-2 and a plurality of second marching-register files 22-2, a second middle chip merging a second marching-cache memory 21-2 and a second bottom chip merging a second marching main memory 31-2 are 3D-stacked vertically. Each of the second arithmetic pipelines 117-2 may include a vector-processing unit, and each of the second marching-register files 22-2 may include marching-vector registers. Between the second top and second middle chips, a plurality of joint members 55a-2 are inserted, and between the second middle and second bottom chips, a plurality of joint members 55b-2 are inserted. For example, each of joint members 55a-2 and 55b-2 may be implemented by an electrical conductive bump such as a solder ball, a gold (Au) bump, a silver (Ag) bump, a copper (Cu) bump, a nickel-gold (Ni-Au) alloy bump or a nickel-gold-indium (Ni-Au-In) alloy bump.
Although the illustration is omitted, heat-dissipating plates can be inserted between the first top and first middle chips, between the first middle and first bottom chips, between the second top and second middle chips and between the second middle and second bottom chips similar to the configuration illustrated in Figs. 65(a)-(c) and 66 so as to achieve "cool chips".
Similar to the computer system of the fourth embodiment, a field programmable gate array (FPGA) may switch-control the operations of the first and second 3D-stacks, by running a thread or a chain of vector processing across the first arithmetic pipelines 117-1 and the second arithmetic pipelines 117-2, implementing an HPC system, which can be used for GPU-based general-purpose computing.
Still alternatively, as illustrated in Fig. 69, a first chip (top chip) merging a plurality of arithmetic pipelines 117, a second chip merging a plurality of marching-register files 22, a third chip merging a marching-cache memory 21, a fourth chip merging a first marching main memory 31-1, a fifth chip merging a second marching main memory 31-2 and a sixth chip (bottom chip) merging a third marching main memory 31-3 can be stacked vertically. Each of the arithmetic pipelines 117 may include a vector-processing unit, and each of the marching-register files 22 may include marching-vector registers so that vector instructions generated from loops in a source program can be executed in the vector-processing unit. A first heat dissipating plate 58-1 is inserted between the first and second chips, a second heat dissipating plate 58-2 is inserted between the second and third chips, a third heat dissipating plate 58-3 is inserted between the third and fourth chips, a fourth heat dissipating plate 58-4 is inserted between the fourth and fifth chips, and a fifth heat dissipating plate 58-5 is inserted between the fifth and sixth chips so as to achieve "cool chips". Because there are no interconnects inside the surfaces of these cool chips in the three-dimensional configuration illustrated in Fig. 69, it is easy to insert the heat dissipating plates 58-1, 58-2, 58-3, 58-4, 58-5, such as diamond chips, between these six chips alternately.
The cool-chip-configuration illustrated in Fig. 69 is not limited to a case of six chips, but expandable to three-dimensional stacking structures with any number of chips, because the sandwich structure illustrated in Fig. 69 is suitable for establishing the thermal flow from active computing chips through the heat dissipating plates 58-1, 58-2, 58-3, 58-4, 58-5 to outside of the cool computer system more effectively. Therefore, the number of cool chips in the computer system of the fifth embodiment can be increased in proportion to the scale of the computer system.
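The alternating sandwich of chips and heat dissipating plates, and the fact that it extends to any number of chips, can be shown with a minimal sketch. The sketch only models the ordering of the layers; the function name `interleave_plates` and the string labels are hypothetical and are introduced solely for illustration.

```python
from typing import List


def interleave_plates(chips: List[str], plate_prefix: str = "58-") -> List[str]:
    """Return the layers from top to bottom, with one plate between adjacent chips.

    A stack of n chips therefore contains n - 1 heat dissipating plates,
    whatever the value of n.
    """
    layers: List[str] = []
    for index, chip in enumerate(chips):
        layers.append(chip)
        if index < len(chips) - 1:  # no plate below the bottom chip
            layers.append(f"plate {plate_prefix}{index + 1}")
    return layers


# The six-chip arrangement of Fig. 69, from the top chip to the bottom chip.
six_chip_stack = interleave_plates([
    "arithmetic pipelines 117",
    "marching-register files 22",
    "marching-cache memory 21",
    "marching main memory 31-1",
    "marching main memory 31-2",
    "marching main memory 31-3",
])
```

Calling the same function with a longer list of chips yields a taller stack with the same alternating structure, which reflects the expandability described above.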
Figs. 70-72 show various examples of the three-dimensional (3D) stack, each implementing a part of the fundamental cores of the computer systems according to the fifth embodiment of the present invention. Each of the 3D-stacks includes cooling technology with a heat dissipating plate 58, such as a diamond plate, inserted between the semiconductor memory chips 3a and 3b, in each of which at least one of the marching memories classified in the marching memory family is merged. The term "the marching memory family" includes the marching-instruction register file 22a and the marching-data register file 22b connected to the ALU 112 explained in the second embodiment, and the marching-instruction cache memory 21a and the marching-data cache memory 21b explained in the third embodiment, in addition to the marching main memory 31 explained in the first embodiment of the present invention.
That is, as illustrated in Fig. 70, a 3D-stack, implementing a part of the fundamental core of the computer system pertaining to the fifth embodiment of the present invention, embraces a first semiconductor memory chip 3a merging at least one of the marching memories in the marching memory family, a heat dissipating plate 58 disposed under the first semiconductor memory chip 3a, a second semiconductor memory chip 3b disposed under the heat dissipating plate 58, which merges at least one of the marching memories in the marching memory family, and a processor 11 disposed at a side of the heat dissipating plate 58. Here, because the location of the processor 11 in Fig. 70 is illustrated merely as one example, the processor 11 can be disposed at any required or appropriate site in the configuration of the 3D-stack or external to the 3D-stack, depending on the design choice of the 3D-stack. For example, the processor 11 can be allocated at the same horizontal level as the first semiconductor memory chip 3a or at the level of the second semiconductor memory chip 3b. The marching memory merged on the first semiconductor memory chip 3a and the marching memory merged on the second semiconductor memory chip 3b each store program instructions. In the 3D configuration illustrated in Fig. 70, in which the first semiconductor memory chip 3a, the heat dissipating plate 58 and the second semiconductor memory chip 3b are stacked vertically, a first control path is provided between the first semiconductor memory chip 3a and the processor 11, and a second control path is provided between the second semiconductor memory chip 3b and the processor 11 so as to facilitate the execution of the control processing with the processor 11.
A further data-path may be provided between the first semiconductor memory chip 3a and the second semiconductor memory chip 3b so as to facilitate direct communication of the program instruction between the first semiconductor memory chip 3a and the second semiconductor memory chip 3b.
And, as illustrated in Fig. 71, another 3D-stack, implementing a part of the fundamental core of the computer system pertaining to the fifth embodiment of the present invention, embraces a first semiconductor memory chip 3a merging at least one of the marching memory in the marching memory family, a heat dissipating plate 58 disposed under the first semiconductor memory chip 3a, a second semiconductor memory chip 3b disposed under the heat dissipating plate 58, which merges at least one of the marching memory in the marching memory family, and a ALU 112 disposed at a side of the heat dissipating plate 58. The location of the ALU 112 is not limited to the site illustrated in Fig. 71, and the ALU 112 can be disposed at any required or appropriate site in the configuration of the 3D-stack or external of the 3D-stack, such as a site allocated at the same horizontal level of the first semiconductor memory chip 3a or at the level of the second semiconductor memory chip 3b, depending on the design choice of the 3D-stack. The marching memory merged on the first semiconductor memory chip 3a and the marching memory merged on the second semiconductor memory chip 3b read/write scalar data, respectively. In the 3D configuration illustrated in Fig. 71, in which the first semiconductor memory chip 3a, the heat dissipating plate 58 and the second semiconductor memory chip 3b are stacked vertically, a first data-path is provided between the first semiconductor memory chip 3a and the ALU 112, and a second data-path is provided between the second semiconductor memory chip 3b and the ALU 112 so as to facilitate the execution of the scalar data processing with the ALU 112. A further data-path may be provided between the first semiconductor memory chip 3a and the second semiconductor memory chip 3b so as to facilitate direct communication of the scalar data between the first semiconductor memory chip 3a and the second semiconductor memory chip 3b.
Further, as illustrated in Fig. 72, still another 3D-stack, implementing a part of the fundamental core of the computer system pertaining to the fifth embodiment of the present invention, embraces a first semiconductor memory chip 3a merging at least one of the marching memory in the marching memory family, a heat dissipating plate 58 disposed under the first semiconductor memory chip 3a, a second semiconductor memory chip 3b disposed under the heat dissipating plate 58, which merges at least one of the marching memory in the marching memory family, and arithmetic pipelines 117 disposed at a side of the heat dissipating plate 58. Similar to the topologies illustrated in Figs. 62 and 63, the location of the arithmetic pipelines 117 is not limited to the site illustrated in Fig. 72, and the arithmetic pipelines 117 can be disposed at any required or appropriate site. The marching memory merged on the first semiconductor memory chip 3a and the marching memory merged on the second semiconductor memory chip 3b read/write vector/streaming data, respectively. In the 3D configuration illustrated in Fig. 72, in which the first semiconductor memory chip 3a, the heat dissipating plate 58 and the second semiconductor memory chip 3b are stacked vertically, a first data-path is provided between the first semiconductor memory chip 3a and the arithmetic pipelines 117, and a second data-path is provided between the second semiconductor memory chip 3b and the arithmetic pipelines 117 so as to facilitate the execution of the vector/streaming data processing with the arithmetic pipelines 117. A further data-path may be provided between the first semiconductor memory chip 3a and the second semiconductor memory chip 3b so as to facilitate direct communication of the vector/streaming data between the first semiconductor memory chip 3a and the second semiconductor memory chip 3b.
As illustrated in Fig. 73, the 3D hybrid computer system according to the fifth embodiment encompasses a first left chip (top left chip) 3p-1 merging at least one of the marching memory in the marching memory family, a second left chip 3p-2 merging at least one of the marching memory in the marching memory family, a third left chip 3p-3 merging at least one of the marching memory in the marching memory family, a fourth left chip 3p-4 merging at least one of the marching memory in the marching memory family, a fifth left chip 3p-5 merging at least one of the marching memory in the marching memory family and a sixth left chip (bottom left chip) 3p-6 merging at least one of the marching memory in the marching memory family, which are stacked vertically. A first left heat dissipating plate 58a-1 is inserted between the first left chip 3p-1 and second left chip 3p-2, a second left heat dissipating plate 58a-2 is inserted between the second left chip 3p-2 and third left chip 3p-3, a third left heat dissipating plate 58a-3 is inserted between the third left chip 3p-3 and fourth left chip 3p-4, a fourth left heat dissipating plate 58a-4 is inserted between the fourth left chip 3p-4 and fifth left chip 3p-5, and a fifth left heat dissipating plate 58a-5 is inserted between the fifth left chip 3p-5 and sixth left chip 3p-6 so as to achieve "cool left chips".
And, a first right chip (top right chip) 3q-1 merging at least one of the marching memory in the marching memory family, a second right chip 3q-2 merging at least one of the marching memory in the marching memory family, a third right chip 3q-3 merging at least one of the marching memory in the marching memory family, a fourth right chip 3q-4 merging at least one of the marching memory in the marching memory family, a fifth right chip 3q-5 merging at least one of the marching memory in the marching memory family and a sixth right chip (bottom right chip) 3q-6 merging at least one of the marching memory in the marching memory family are stacked vertically. A first right heat dissipating plate 58b-1 is inserted between the first right chip 3q-1 and second right chip 3q-2, a second right heat dissipating plate 58b-2 is inserted between the second right chip 3q-2 and third right chip 3q-3, a third right heat dissipating plate 58b-3 is inserted between the third right chip 3q-3 and fourth right chip 3q-4, a fourth right heat dissipating plate 58b-4 is inserted between the fourth right chip 3q-4 and fifth right chip 3q-5, and a fifth right heat dissipating plate 58b-5 is inserted between the fifth right chip 3q-5 and sixth right chip 3q-6 so as to achieve "cool right chips".
A first processing unit 11a is provided between the first left heat dissipating plate 58a-1 and the first right heat dissipating plate 58b-1, a second processing unit 11b is provided between the third left heat dissipating plate 58a-3 and the third right heat dissipating plate 58b-3, and a third processing unit 11c is provided between the fifth left heat dissipating plate 58a-5 and the fifth right heat dissipating plate 58b-5, and pipelined ALUs are respectively included in the processing units 11a, 11b, 11c.
The scalar data-path and control path are established between the first left chip 3p-1 and second left chip 3p-2, the scalar data-path and control path are established between the second left chip 3p-2 and third left chip 3p-3, the scalar data-path and control path are established between the third left chip 3p-3 and fourth left chip 3p-4, the scalar data-path and control path are established between the fourth left chip 3p-4 and fifth left chip 3p-5, and the scalar data-path and control path are established between the fifth left chip 3p-5 and sixth left chip 3p-6, the scalar data-path and control path are established between the first right chip 3q-1 and second right chip 3q-2, the scalar data-path and control path are established between the second right chip 3q-2 and third right chip 3q-3, the scalar data-path and control path are established between the third right chip 3q-3 and fourth right chip 3q-4, the scalar data-path and control path are established between the fourth right chip 3q-4 and fifth right chip 3q-5, and the scalar data-path and control path are established between the fifth right chip 3q-5 and sixth right chip 3q-6. The 3D computer system illustrated in Fig. 73 can execute not only scalar data but also vector/streaming data through the combination of scalar data-path and control path for the computer system.
Because there are no interconnects inside the surfaces of these cool chips in the 3D configuration illustrated in Fig. 73, it is easy to insert the heat dissipating plates 58a-1, 58a-2, 58a-3, 58a-4, 58a-5 such as diamond left chips between these six left chips alternately, and to insert the heat dissipating plates 58b-1, 58b-2, 58b-3, 58b-4, 58b-5 such as diamond right chips between these six right chips alternately.
--OTHER EMBODIMENTS--
Various modifications will become possible for those skilled in the art after receiving the teaching of the present disclosure without departing from the scope thereof.
In Figs. 4, 5, 6, 8, 11, 13, 16-20, 22, 25 and 32, although nMOS transistors are assigned respectively as the transfer-transistors and the reset-transistors in the transistor-level representations of the bit-level cells, because the illustrations in Figs. 4, 5, 6, 8, 11, 13, 16-20, 22, 25 and 32 are mere schematic examples, pMOS transistors can be used as the transfer-transistors and the reset-transistors, if the opposite polarity of the clock signal is employed. Furthermore, MIS transistors, or insulated-gate transistors having gate-insulation films made of silicon nitride film, ONO film, SrO film, Al2O3 film, MgO film, Y2O3 film, HfO2 film, ZrO2 film, Ta2O5 film, Bi2O3 film, HfAlO film, and others can be used for the transfer-transistors and the reset-transistors.
There are several different forms of parallel computing, such as bit-level, instruction-level, data, and task parallelism. As is well known from "Flynn's taxonomy", programs and computers are classified according to whether they operate using a single set or multiple sets of instructions, and whether those instructions use a single set or multiple sets of data.
For example, as illustrated in Fig. 74, a marching memory, which may include the marching-register file, the marching-cache memory, and the marching main memory already discussed in the first to fifth embodiments, can implement a bit-level parallel processing of scalar/vector data in a multiple-instruction-single-data (MISD) architecture, by which many independent instruction streams provided vertically to a first processor 11-1, a second processor 11-2, a third processor 11-3, a fourth processor 11-4, ........, operate in parallel on a single horizontal stream of data at a time with a systolic array of processors 11-1, 11-2, 11-3, 11-4.
Alternatively, as illustrated in Fig. 75, arithmetic-level parallelism can be established by a marching memory, which may include the marching-register file, the marching-cache memory, and the marching main memory already discussed in the first to fifth embodiments, with a single-instruction-multiple-data (SIMD) architecture, by which a single instruction stream is provided to a first processor 11-1, a second processor 11-2, a third processor 11-3, and a fourth processor 11-4, so that the single instruction stream can operate on multiple vertical streams of data at a time with the array of processors 11-1, 11-2, 11-3, 11-4.
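The SIMD arrangement of Fig. 75 may be pictured with the following minimal Python sketch. The function name `simd_step`, the doubling instruction, and the data values are illustrative assumptions and do not appear in the disclosure; the sketch only shows one instruction being applied to the current element of several data streams in the same step, and it does not model the marching memory hardware.

```python
from typing import Callable, List

def simd_step(instruction: Callable[[int], int], lanes: List[int]) -> List[int]:
    """Apply one instruction to the current element of every data stream."""
    return [instruction(x) for x in lanes]

# Hypothetical data: one vertical stream per processor 11-1 .. 11-4.
streams = [[1, 2, 3], [10, 20, 30], [100, 200, 300], [7, 8, 9]]

# The single instruction stream (an assumed operation: doubling).
double = lambda x: 2 * x

# At each time step t, all four processors execute the same instruction.
results = [simd_step(double, [s[t] for s in streams]) for t in range(3)]
```

Each entry of `results` holds the four outputs produced in one step, one per processor.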
Still alternatively, as illustrated in Fig. 76, a marching memory, which may include the marching-register file, the marching-cache memory, and the marching main memory already discussed in the first to fifth embodiments, can implement a typical chaining in vector processing with a first processor 11-1, a second processor 11-2, a third processor 11-3, and a fourth processor 11-4 to which a first instruction I1, a second instruction I2, a third instruction I3, and a fourth instruction I4 are provided respectively.
Furthermore, as illustrated in Fig. 77, a marching memory, which may include the marching-register file, the marching-cache memory, and the marching main memory already discussed in the first to fifth embodiments, can implement a parallel processing of a single horizontal stream of scalar/vector data in a MISD architecture with a first processor 11-1, a second processor 11-2, a third processor 11-3, and a fourth processor 11-4.
Furthermore, as illustrated in Fig. 78, a marching memory, which may include the marching-register file, the marching-cache memory, and the marching main memory already discussed in the first to fifth embodiments, can implement a parallel processing of a single horizontal stream of scalar/vector data in a MISD architecture with a first processor 11-1 configured to execute multiplication, a second processor 11-2 configured to execute addition, a third processor 11-3 configured to execute multiplication, and a fourth processor 11-4 configured to execute addition.
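The multiply, add, multiply, add arrangement of Fig. 78 may be sketched functionally as follows. The operand constants, the function name `run_chain`, and the input stream are assumptions chosen for illustration; the sketch reproduces only the arithmetic result of passing one data stream through four processors in order, not the cycle-level overlap of a systolic array.

```python
from typing import Callable, Iterable, List

def run_chain(stages: List[Callable[[float], float]],
              stream: Iterable[float]) -> List[float]:
    """Pass each element of a single data stream through every processor in order."""
    out = []
    for x in stream:
        for stage in stages:
            x = stage(x)
        out.append(x)
    return out

# Assumed operands; only the operation types come from the text.
stages = [
    lambda x: x * 2,  # processor 11-1: multiplication
    lambda x: x + 3,  # processor 11-2: addition
    lambda x: x * 4,  # processor 11-3: multiplication
    lambda x: x + 5,  # processor 11-4: addition
]

result = run_chain(stages, [1, 2, 3])
```

In hardware, the four processors would work on different elements of the stream at the same time; the sequential loop above is only a convenient way to check the values.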
Furthermore, as to process-level parallelism, a single-thread-stream and single-data-stream architecture, a single-thread-stream and multiple-data-streams architecture, a multiple-thread-streams and single-data-stream architecture, and a multiple-thread-streams and multiple-data-streams architecture can be achieved with a marching memory, which may include the marching-register file, the marching-cache memory, and the marching main memory already discussed in the first to fifth embodiments.
Referring to Fig. 41, we have compared the speed/capability of the worst case of the existing memory for scalar data or program instructions with that of the marching main memory 31, and the hatched portion of Fig. 41(b) has illustrated schematically the speed/capability of the marching main memory 31, implemented by one hundred memory units U1, U2, U3,........., U100, and compared with the speed/capability of the worst case of the existing memory shown in Fig. 41(a). In the worst case, we have discussed that we can read out 99 memory units of the marching main memory 31, but they are not available due to a scalar program's requirement. However, with "a complex marching memory" scheme illustrated in Fig. 79(b), we can improve the speed/capability of the marching memory for scalar data or program instructions, in which a plurality of marching memory blocks MM11, MM12, MM13, ........, MM16;MM21, MM22, MM23, ........, MM26;MM31, MM32, MM33, ........, MM36;..................;MM51, MM52, MM53, ........, MM56 are deployed two dimensionally and merged on a single semiconductor chip 66, and a specified marching memory block MMij (i=1 to 5; j=1 to 6) can be randomly accessed from the plurality of marching memory blocks MM11, MM12, MM13, ........, MM16;MM21, MM22, MM23, ........, MM26;MM31, MM32, MM33, ........, MM36;..................;MM51, MM52, MM53, ........, MM56, similar to the random-access methodology employed in a dynamic random access memory (DRAM) architecture.
As illustrated in Fig. 79(a), in a conventional DRAM, a memory array area 661, peripheral circuitry for a row decoder 662, peripheral circuitry for sense amplifiers 663, and peripheral circuitry for a column decoder 664 are merged on a single semiconductor chip 66. A plurality of memory cells are arranged in an array of rows and columns in the memory array area 661 so that each row of memory cells share a common 'word' line, while each column of cells share a common 'bit' line, and the location of a memory cell in the array is determined as the intersection of its 'word' and 'bit' lines. During a 'write' operation, the data to be written ('1' or '0') is provided at the 'bit' line from the column decoder 664, while the 'word line' is asserted from the row decoder 662, so as to turn on the access transistor of the memory cell and allows the capacitor to charge up or discharge, depending on the state of the bit line. During a 'read' operation, the 'word' line is also asserted from the row decoder 662, which turns on the access transistor. The enabled transistor allows the voltage on the capacitor to be read by a sense amplifier 663 through the 'bit' line. The sense amplifier 663 can determine whether a '1' or '0' is stored in the memory cell by comparing the sensed capacitor voltage against a threshold.
Although 6*5=30 marching memory blocks MM11, MM12, MM13, ........, MM16;MM21, MM22, MM23, ........, MM26;MM31, MM32, MM33, ........, MM36;..................;MM51, MM52, MM53, ........, MM56 are deployed on the semiconductor chip 66 for avoiding cluttering up the drawings, the illustration is schematic, and actually one thousand marching memory blocks MMij (i=1 to s; j=1 to t; and s*t =1000) with 256kbits capacity can be deployed on the same semiconductor chip 66, if unidirectional marching memories are arrayed, and if 512 Mbits DRAM chip technology is assumed as the manufacturing technology of the complex marching memory scheme illustrated in Fig. 79(b). That is, as an area for monolithically integrating each of the marching memory blocks MMij having 256kbits capacity on a semiconductor chip 66, an equivalent area for 512kbits DRAM block is required, because, as illustrated in Figs. 4-6, each of unidirectional marching memory blocks is implemented by a bit-level cell consisting of two transistors and one capacitor, while the DRAM memory cell consists of only a single transistor that is paired with a capacitor. Alternatively, as to an array of bidirectional marching memories, one thousand marching memory blocks MMij with 128kbits capacity can be deployed on the same semiconductor chip 66 for the 512 Mbits DRAM chip. That is, as an area for monolithically integrating each of the marching memory blocks MMij having 128kbits capacity, an equivalent area for the 512kbits DRAM block is required, because, as illustrated in Fig. 32, a bidirectional marching memory block is implemented by a bit-level cell consisting of four transistors and two capacitors, while the DRAM memory cell consists of only a single transistor and a single capacitor. If one Gbit DRAM chip technology is assumed, one thousand bidirectional marching memory blocks MMij with 256kbits capacity can be deployed on the same DRAM chip 66 so as to implement a 256 Mbits marching memory chip.
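The capacity figures above follow from a simple area argument, which can be checked with the sketch below. It assumes that cell area scales with the transistor count relative to a one-transistor DRAM cell, that 1 Mbit is treated as 1000 kbits in line with the round numbers of the text, and that one Gbit is taken as 1024 Mbits; the function name `marching_block_kbits` is hypothetical.

```python
def marching_block_kbits(dram_chip_mbits: int,
                         transistors_per_cell: int,
                         blocks: int = 1000) -> float:
    """Capacity in kbits of one marching memory block on a given DRAM technology.

    Assumption: a marching bit-level cell needs `transistors_per_cell` times
    the area of a one-transistor, one-capacitor DRAM cell.
    """
    dram_block_kbits = dram_chip_mbits * 1000 / blocks  # DRAM-equivalent area per block
    return dram_block_kbits / transistors_per_cell

unidirectional = marching_block_kbits(512, 2)     # two transistors, one capacitor
bidirectional = marching_block_kbits(512, 4)      # four transistors, two capacitors
bidirectional_1g = marching_block_kbits(1024, 4)  # one Gbit DRAM technology
```

Under these assumptions the three values are 256, 128 and 256 kbits per block, matching the figures stated in the text.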
Therefore, one thousand of marching memory blocks MMij, or one thousand of marching memory cores can be monolithically integrated on the semiconductor chip 66, as illustrated in Fig. 79(b). A single marching memory block MMij, or "a single marching memory core" may encompass, for example, one thousand of marching memory columns, or one thousand of marching memory units Uk (k=1 to 1000), which have 1000*32 byte-based addresses, where one memory unit Uk has 256 bit-level cells. That is, with a complex marching memory chip having one thousand of marching memory blocks MMij, one thousand of marching memory units Uk (k=1 to 1000) of 32 bytes (or 256 bits) are allowable to access within one cycle of the conventional DRAM access.
Figs. 80 (a) and (b) illustrate an example of a single 256kbits marching memory block MMij, which has one thousand of marching memory units Uk (k=1 to n; n=1000) of 32 bytes (or 256 bits). In the complex marching memory schemes, as illustrated in Fig. 80 (b), position indexes Tk (k=1 to 1000), or position tags, are labeled, respectively, on each of the marching memory units Uk as the token of each of the columns Uk, meaning the first address of the column bytes. In Fig. 80 (b), the clock period (the clock cycle time) τclock, illustrated in Fig. 7C, is recited as "the marching memory's memory cycle tM".
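The relation between a position tag and the byte addresses of a column may be sketched as follows. The zero-based byte numbering inside a block and the function names `position_tag` and `locate` are assumptions made for illustration; the text states only that each tag denotes the first address of the column bytes.

```python
from typing import Tuple

UNIT_BYTES = 32         # one memory unit Uk holds 32 bytes (256 bit-level cells)
UNITS_PER_BLOCK = 1000  # one block MMij holds units U1 .. U1000

def position_tag(k: int) -> int:
    """First byte address of unit Uk, assuming addresses start at 0 in the block."""
    if not 1 <= k <= UNITS_PER_BLOCK:
        raise ValueError("k must be in 1..1000")
    return (k - 1) * UNIT_BYTES

def locate(byte_address: int) -> Tuple[int, int]:
    """Map a byte address to (unit index k, byte offset inside that unit)."""
    if not 0 <= byte_address < UNIT_BYTES * UNITS_PER_BLOCK:
        raise ValueError("address outside the block")
    k_zero_based, offset = divmod(byte_address, UNIT_BYTES)
    return k_zero_based + 1, offset
```

With this numbering, a block spans 1000*32 = 32000 byte addresses, consistent with the count given in the text.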
In light of the discussions in the first to fifth embodiments, because there is a large speed difference between the conventional DRAM and the marching memory, as illustrated in Fig. 80 (c), the conventional DRAM's memory cycle tC for writing in or reading out the content of one memory element of the conventional DRAM can be estimated as:

tC = 1000 tM    (1)
Therefore, with the complex marching memory scheme illustrated in Fig. 79(b), we can improve the speed/capability of the marching memory for scalar data or program instructions, by which a specified marching memory block MMij (i=1 to s; j=1 to t; and s*t =1000) can be randomly accessed from one thousand of marching memory blocks, similar to the random-access methodology employed in the DRAM architecture.
Although the illustration is omitted in Fig. 79(b), the plurality of 256 kbits marching memory blocks MMij may be arranged in the two dimensional matrix form on the semiconductor chip 66 so that each horizontal array of the marching memory blocks MMij shares a common horizontal-core line, while each vertical array of marching memory blocks MMij shares a common vertical-core line, and a location of a specified marching memory block MMij in the two dimensional matrix is accessed as the intersection of its horizontal-core line and vertical-core line, with double-level hierarchy. In the double-level hierarchy, every column of a subject marching memory block MMij is accessed with an address at the lower level, and every marching memory block MMij is directly accessed with its own address at the higher level.
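The double-level hierarchy may be illustrated by splitting a flat column address into a block selection (i, j) and a column index k. The grid shape s=25, t=40, the row-major ordering of the blocks, and the function name `decode_two_level` are assumptions; the text fixes only s*t=1000 and one thousand columns per block.

```python
from typing import Tuple

S_ROWS, T_COLS = 25, 40   # assumed grid with s*t = 1000 blocks
UNITS_PER_BLOCK = 1000    # columns (memory units) per block

def decode_two_level(column_address: int) -> Tuple[int, int, int]:
    """Split a flat column address into (i, j, k).

    Higher level: (i, j) selects block MMij through its horizontal-core
    and vertical-core lines. Lower level: k selects unit Uk in that block.
    """
    total = S_ROWS * T_COLS * UNITS_PER_BLOCK
    if not 0 <= column_address < total:
        raise ValueError("address outside the chip")
    block, k = divmod(column_address, UNITS_PER_BLOCK)
    i, j = divmod(block, T_COLS)
    return i + 1, j + 1, k + 1
```

Other address layouts, such as interleaving consecutive columns across blocks, would fit the same two-level description equally well.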
Alternatively, a virtual storage mechanism can be used for the access methodology of the complex marching memory. In the virtual storage mechanism, the marching memory blocks MMij (i=1 to s; j=1 to t), or the marching memory cores to be used, are scheduled just like pages in a virtual memory. The scheduling, if any, is decided at compile time. For example, in the multi-level cache architecture, the multi-level caches generally operate by checking the smallest Level 1 (L1) cache first, and if the L1 cache hits, the processor proceeds at high speed. If the smaller L1 cache misses, the next larger cache (L2) is checked, and so on, before external memory is checked. For the access methodology of the complex marching memory, the L2 cache-like memory can support the virtual indexing mechanism, because the size of the L2 cache corresponds to the size of the complex marching memory, and the size of a marching memory block MMij corresponds to the size of the smallest L1 cache.
Then, because a complex marching memory encompassing one thousand of marching memory blocks, or one thousand of cores, is relatively easy to achieve as stated above, and because access to any column in the complex marching memory is basically available at the CPU's clock rate, the speed of the complex marching memory does not fall below the speed of the conventional DRAM even in the worst case.
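The worst-case argument can be checked numerically against Eq. (1). The sketch below assumes a simple model in which the stored columns advance by one memory unit per marching memory cycle tM, so that the farthest of n units reaches the output of its block after n cycles; this model, the function name `worst_case_march_time`, and the arbitrary time unit are illustrative assumptions.

```python
def worst_case_march_time(units: int, t_m: float) -> float:
    """Time for the farthest of `units` columns to march to the block output,
    assuming one unit of advance per marching memory cycle tM."""
    return units * t_m

t_m = 1.0          # marching memory cycle tM, arbitrary time unit
t_c = 1000 * t_m   # conventional DRAM memory cycle tC from Eq. (1)

worst = worst_case_march_time(1000, t_m)
within_one_dram_cycle = worst <= t_c
```

Under this model, the most remote column of a one-thousand-unit block becomes available within one conventional DRAM cycle, while nearer columns arrive sooner.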
Furthermore, a plurality of complex marching memory chips, or a plurality of macro complex marching memory blocks MMM1, MMM2, ........, MMMk, can be mounted on a first circuit board having external-connection pins P1, P2, ........, Ps-1, Ps ("s" may be any integer determined by unit of byte, or word size) so as to implement a multichip module of the complex marching memory, or "a complex marching memory module" as illustrated in Fig. 81, although the illustration of the circuit board is omitted. In the hybrid assembly of macro complex marching memory blocks MMM1, MMM2, ........, MMMk, the first macro complex marching memory block MMM1 may monolithically integrate one thousand of marching memory blocks MM111, MM121, MM131, ........, MM1(t-1)1, MM1t1; MM211,........,; MM(s-1)11..................; MMs11, MMs21, , ........, MMs(t-1)1, MMst1 on a first semiconductor chip, the second macro complex marching memory block MMM2 may monolithically integrate one thousand of marching memory blocks MM112, MM122, MM132, ........, MM1(t-1)2, MM1t2; MM212,........,; MM(s-1)12..................; MMs12, MMs22, , ........, MMs(t-1)2, MMst2 on a second semiconductor chip, ............., and the k-th macro complex marching memory block MMMk may monolithically integrate one thousand of marching memory blocks MM11k, MM12k, MM13k, ........, MM1(t-1)k, MM1tk; MM21k,........,; MM(s-1)1k..................; MMs1k, MMs2k, , ........, MMs(t-1)k, MMstk on a k-th semiconductor chip, for example. And the first complex marching memory module hybridly assembling the macro complex marching memory blocks MMM1, MMM2, ........, MMMk can be connected to a second complex marching memory module hybridly assembling the macro complex marching memory block MMMk+1 and others on a second circuit board through the external-connection pins P1, P2, ........, Ps-1, Ps. 
Here, the macro complex marching memory block MMMk+1 may monolithically integrate one thousand of marching memory blocks MM11(k+1), MM12(k+1), MM13(k+1), ........, MM1(t-1)(k+1), MM1t(k+1); MM21(k+1),........,; MM(s-1)1(k+1)..................; MMs1(k+1), MMs2(k+1), , ........, MMs(t-1)(k+1), MMst(k+1) on a semiconductor chip, for example. In addition, if we implement dual lines of the hybrid assembly of macro complex marching memory blocks, we can establish a dual in-line module of complex marching memory.
In the configuration of the complex marching memory modules illustrated in Fig. 81, by using triple-level hierarchy, every column of a subject marching memory block MMiju (u=1 to k; "k" is any integer greater than or equal to two) is accessed with an address at the lowest level, every marching memory block MMiju is accessed with its own address at the middle level, and every macro marching memory block MMMu (u=1 to k) may be directly accessed with its own address at the highest level, which facilitates access to a remote column of the marching memory for scalar data or program instructions.
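The triple-level hierarchy may be sketched by adding a chip-selection field on top of the block and column fields. The number of chips, the grid shape, the contiguous address layout, and the function name `decode_three_level` are assumptions made for illustration only.

```python
from typing import Tuple

def decode_three_level(column_address: int,
                       chips: int = 4,      # assumed number of macro blocks MMMu
                       s_rows: int = 25,    # assumed grid, s*t = 1000
                       t_cols: int = 40,
                       units: int = 1000) -> Tuple[int, int, int, int]:
    """Split a flat column address into (u, i, j, k).

    Highest level: u selects macro block MMMu (one chip).
    Middle level: (i, j) selects block MMiju on that chip.
    Lowest level: k selects the column (memory unit) inside the block.
    """
    per_chip = s_rows * t_cols * units
    if not 0 <= column_address < chips * per_chip:
        raise ValueError("address outside the module")
    u, rest = divmod(column_address, per_chip)
    block, k = divmod(rest, units)
    i, j = divmod(block, t_cols)
    return u + 1, i + 1, j + 1, k + 1
```

For example, with the default parameters, column address 1000000 is the first column of the first block on the second chip.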
Alternatively, very similar to a DRAM rank architecture encompassing a set of DRAM chips that operate in lockstep in response to a command, in which the DRAM chips inside the same rank are accessed simultaneously, the plurality of macro complex marching memory blocks MMM1, MMM2, ........, MMMk can be randomly accessed simultaneously, and with the above-mentioned double-level hierarchy methodology, every column of a subject marching memory block MMiju (u=1 to k) is accessed with an address at the lower level, and every marching memory block MMiju is directly accessed with its own address at the higher level.
Still alternatively, a virtual storage mechanism can be used for the access methodology of the complex marching memory, in which the marching memory cores to be used are scheduled just like pages in the virtual memory. The scheduling, if any, can be decided at compile time.
Because the data transfer between the marching main memory 31 and the processor 11 is achieved at a very high speed, the cache memory employed in the conventional computer system is not required and can be omitted. However, similar to the organization illustrated in Fig. 56, a marching-data cache memory 21b implemented by the complex marching memory scheme can be used with smaller marching memory blocks, or smaller marching memory cores. For example, a plurality of marching memory cores with 1 kbits, 512 bits, or 256 bits capacity can be deployed on a semiconductor chip so as to implement the marching-data cache memory 21b, while a plurality of marching memory cores MMij (i=1 to s; j=1 to t; and s*t =1000) with 256 kbits capacity are deployed on the semiconductor chip 66 so as to implement the marching main memory 31. And, for example, with the virtual storage mechanism, each of the marching memory cores can be randomly accessed.
Alternatively, a one-dimensional array of marching memory blocks, or marching memory cores, deployed vertically on a semiconductor chip, can implement a marching cache memory. Here, each of the marching memory cores includes a single horizontal array of memory units, and the number of memory units deployed horizontally is smaller than the number of memory units employed in the marching memory cores for the marching main memory 31. And, for example, with the virtual storage mechanism, each of the marching memory cores can be randomly accessed.
Furthermore, a plurality of marching memory blocks, or a plurality of marching memory cores, can be deployed vertically on a semiconductor chip, each of the marching memory blocks consisting of a single memory unit, and each of the memory units having a sequence of bit-level cells configured to store information of byte size or word size, so as to implement a marching register file by the complex marching memory scheme.
In the ultimate case of scaling the marching memory core, it can be considered that a plurality of marching memory cores with minimized size, or one bit capacity can be deployed on a semiconductor chip by the complex marching memory scheme, which may correspond to the structure of conventional SRAM. Therefore, marching-data register file 22b implemented by one-bit marching memory cores can be connected to the ALU 112, similar to the organizations illustrated in Figs. 55 and 56. Then, very similar to the operation of SRAM, each of the one-bit marching memory cores can be randomly accessed.
Thus, the present invention of course includes various embodiments and modifications and the like, which are not detailed above. Therefore, the scope of the present invention will be defined in the following claims.
The instant invention can be applied to industrial fields of various computer systems, which require higher speed and lower power consumption.

Claims (41)

  1. A marching memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising:
    a transfer-transistor having a first main-electrode connected to a clock signal supply line through a first delay element and a control-electrode connected to an output terminal of a first neighboring bit-level cell disposed at input side of the array of the memory units, through a second delay element;
    a reset-transistor having a first main-electrode connected to a second main-electrode of the transfer-transistor, a control-electrode connected to the clock signal supply line, and a second main-electrode connected to the ground potential; and
    a capacitor configured to store the information of the bit-level cell, connected in parallel with the reset-transistor,
    wherein an output node connecting the second main-electrode of the transfer-transistor and the first main-electrode of the reset-transistor serves as an output terminal of the bit-level cell, and the output terminal of the bit-level cell delivers the signal stored in the capacitor to a second neighboring bit-level cell disposed at output side of the array of the memory units.
  2. The marching memory of claim 1, wherein in each of the bit-level cells, when a clock signal is applied to the control-electrode of the reset-transistor, the reset-transistor discharges the signal charge already stored in the capacitor.
  3. The marching memory of claim 1, wherein in each of the bit-level cells, after the signal charge stored in the capacitor has been discharged, the transfer-transistor becomes active delayed by a first delay time determined by the first delay element, and when the signal stored in the first neighboring bit-level cell is fed to the control-electrode of the transfer-transistor, the transfer-transistor transfers the signal stored in the first neighboring bit-level cell, further delayed by a second delay time determined by the second delay element to the capacitor.
  4. The marching memory of claim 3, wherein the first delay time is a quarter of clock period of the clock signal, and the second delay time is a half of the clock period.
  5. The marching memory of claim 1, wherein in the transfer-transistor, the control-electrode controls a current flowing between the first main-electrode and the second main-electrode electro-statically.
  6. The marching memory of claim 1, wherein in the reset-transistor, the control-electrode controls a current flowing between the first main-electrode and the second main-electrode electro-statically.
  7. The marching memory of claim 1, wherein the transfer-transistor and the reset-transistor are made of an insulated-gate transistor, respectively, including a MOS transistor, a MIS transistor and a high electron mobility transistor.
  8. The marching memory of claim 7, wherein the transfer-transistor and the reset-transistor are made of an nMOS transistor, respectively, and the clock signal of positively high-level is applied to the control electrode of the nMOS transistor so as to achieve a conductive state.
  9. The marching memory of claim 7, wherein the transfer-transistor and the reset-transistor are made of a pMOS transistor, respectively, and the clock signal of negatively high-level is applied to the control electrode of the pMOS transistor so as to achieve a conductive state.
  10. A bidirectional-marching memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising:
    a forward transfer-transistor having a first main-electrode connected to a first clock signal supply line through a first forward delay element and a control-electrode connected to a forward output terminal of a first neighboring bit-level cell disposed at one side of the array of the memory units, through a second forward delay element;
    a forward reset-transistor having a first main-electrode connected to a second main-electrode of the forward transfer-transistor, a control-electrode connected to the first clock signal supply line, and a second main-electrode connected to the ground potential;
    a backward transfer-transistor having a first main-electrode connected to a second clock signal supply line through a first backward delay element and a control-electrode connected to a backward output terminal of the second neighboring bit-level cell through a second backward delay element;
    a backward reset-transistor having a first main-electrode connected to a second main-electrode of the backward transfer-transistor, a control-electrode connected to the second clock signal supply line, and a second main-electrode connected to the ground potential;
    a forward capacitor configured to store the information of the bit-level cell, connected in parallel with the forward reset-transistor; and
    a backward capacitor configured to store the information of the bit-level cell, connected in parallel with the backward reset-transistor,
    wherein an output node connecting the second main-electrode of the forward transfer-transistor and the first main-electrode of the forward reset-transistor serves as a forward output terminal of the bit-level cell, the forward output terminal of the bit-level cell delivers the signal stored in the forward capacitor to a second neighboring bit-level cell disposed at another side of the array of the memory units, an output node connecting the second main-electrode of the backward transfer-transistor and the first main-electrode of the backward reset-transistor serves as a backward output terminal of the bit-level cell, and the backward output terminal of the bit-level cell delivers the signal stored in the backward capacitor to the first neighboring bit-level cell.
  11. The bidirectional-marching memory of claim 10, wherein in each of the bit-level cells, when a first clock signal from the first clock signal supply line is applied to the control-electrode of the forward reset-transistor, the forward reset-transistor discharges the signal charge already stored in the forward capacitor, and when a second clock signal from the second clock signal supply line is applied to the control-electrode of the backward reset-transistor, the backward reset-transistor discharges the signal charge already stored in the backward capacitor.
  12. The bidirectional-marching memory of claim 10, wherein in each of the bit-level cells, after the signal charge stored in the forward capacitor has been discharged, the forward transfer-transistor becomes active delayed by a first forward delay time determined by the first forward delay element, and when the signal stored in the first neighboring bit-level cell is fed to the control-electrode of the forward transfer-transistor, the forward transfer-transistor transfers the signal stored in the first neighboring bit-level cell, further delayed by a second forward delay time determined by the second forward delay element to the forward capacitor, and, after the signal charge stored in the backward capacitor has been discharged, the backward transfer-transistor becomes active delayed by a first backward delay time determined by the first backward delay element, and when the signal stored in the first neighboring bit-level cell is fed to the control-electrode of the backward transfer-transistor, the backward transfer-transistor transfers the signal stored in the first neighboring bit-level cell, further delayed by a second backward delay time determined by the second backward delay element to the backward capacitor.
  13. The bidirectional-marching memory of claim 12, wherein the first forward delay time and the first backward delay time are a quarter of clock period of the first clock signal, respectively, and the second forward delay time and the second backward delay time are a half of the clock period, respectively.
  14. The bidirectional-marching memory of claim 10, wherein in the forward transfer-transistor and the backward transfer-transistor, the control-electrode controls a current flowing between the first main-electrode and the second main-electrode electro-statically.
  15. The bidirectional-marching memory of claim 10, wherein in the forward reset-transistor and backward reset-transistor, the control-electrode controls a current flowing between the first main-electrode and the second main-electrode electro-statically.
  16. The bidirectional-marching memory of claim 10, wherein the forward transfer-transistor, the forward reset-transistor, the backward transfer-transistor and the backward reset-transistor are made of an insulated-gate transistor, respectively, including a MOS transistor, a MIS transistor and a high electron mobility transistor.
  17. The bidirectional-marching memory of claim 16, wherein the forward transfer-transistor, the forward reset-transistor, the backward transfer-transistor and the backward reset-transistor are made of an nMOS transistor, respectively, and the first clock signal of positively high-level is applied to the control electrode of the nMOS transistor so as to achieve a conductive state.
  18. The bidirectional-marching memory of claim 16, wherein the forward transfer-transistor, the forward reset-transistor, the backward transfer-transistor and the backward reset-transistor are made of a pMOS transistor, respectively, and the first clock signal of negatively high-level is applied to the control electrode of the pMOS transistor so as to achieve a conductive state.
  19. A bidirectional-marching memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising:
    a forward transfer-transistor having a first main-electrode connected to a first clock signal supply line through a first forward delay element and a control-electrode connected to a forward output terminal of a first neighboring bit-level cell disposed at one side of the array of the memory units, through a second forward delay element;
    a forward reset-transistor having a first main-electrode connected to a second main-electrode of the forward transfer-transistor, a control-electrode connected to the first clock signal supply line, and a second main-electrode connected to the ground potential;
    a backward transfer-transistor having a first main-electrode connected to a second clock signal supply line through a first backward delay element and a control-electrode connected to a backward output terminal of the second neighboring bit-level cell through a second backward delay element;
    a backward reset-transistor having a first main-electrode connected to a second main-electrode of the backward transfer-transistor, a control-electrode connected to the second clock signal supply line, and a second main-electrode connected to the ground potential; and
    a common capacitor configured to store the information of the bit-level cell, connected in parallel with the forward reset-transistor, and the backward reset-transistor,
    wherein an output node connecting the second main-electrode of the forward transfer-transistor and the first main-electrode of the forward reset-transistor serves as a forward output terminal of the bit-level cell, the forward output terminal of the bit-level cell delivers the signal stored in the common capacitor to a second neighboring bit-level cell disposed at another side of the array of the memory units, an output node connecting the second main-electrode of the backward transfer-transistor and the first main-electrode of the backward reset-transistor serves as a backward output terminal of the bit-level cell, and the backward output terminal of the bit-level cell delivers the signal stored in the common capacitor to the first neighboring bit-level cell.
  20. The bidirectional-marching memory of claim 19, wherein in each of the bit-level cells, when a first clock signal from the first clock signal supply line is applied to the control-electrode of the forward reset-transistor, the forward reset-transistor discharges the signal charge already stored in the common capacitor, and when a second clock signal from the second clock signal supply line is applied to the control-electrode of the backward reset-transistor, the backward reset-transistor discharges the signal charge already stored in the common capacitor.
  21. The bidirectional-marching memory of claim 19, wherein in each of the bit-level cells, after the signal charge stored in the common capacitor has been discharged, the forward transfer-transistor becomes active delayed by a first forward delay time determined by the first forward delay element, and when the signal stored in the first neighboring bit-level cell is fed to the control-electrode of the forward transfer-transistor, the forward transfer-transistor transfers the signal stored in the first neighboring bit-level cell, further delayed by a second forward delay time determined by the second forward delay element to the common capacitor, and, after the signal charge stored in the common capacitor has been discharged, the backward transfer-transistor becomes active delayed by a first backward delay time determined by the first backward delay element, and when the signal stored in the first neighboring bit-level cell is fed to the control-electrode of the backward transfer-transistor, the backward transfer-transistor transfers the signal stored in the first neighboring bit-level cell, further delayed by a second backward delay time determined by the second backward delay element to the common capacitor.
  22. A complex marching memory comprising a plurality of marching memory blocks being deployed spatially, each of the marching memory blocks including an array of memory units, each of the memory units having a sequence of bit-level cells configured to store information of byte size or word size,
    wherein each of the memory units transfers synchronously with a clock signal, step by step, toward an output side of corresponding marching memory block from an input side of the corresponding marching memory block, and each of the marching memory blocks is randomly accessed so that each of the memory units in a subject marching memory block can be randomly accessed.
  23. The complex marching memory of claim 22, wherein each of the bit-level cells comprises:
    a transfer-transistor having a first main-electrode connected to a clock signal supply line, configured to supply the clock signal, through a first delay element and a control-electrode connected to an output terminal of a first neighboring bit-level cell disposed at input side of the array of memory units, through a second delay element;
    a reset-transistor having a first main-electrode connected to a second main-electrode of the transfer-transistor, a control-electrode connected to the clock signal supply line, and a second main-electrode connected to the ground potential; and
    a capacitor configured to store the information of the bit-level cell, connected in parallel with the reset-transistor,
    wherein an output node connecting the second main-electrode of the transfer-transistor and the first main-electrode of the reset-transistor serves as an output terminal of the bit-level cell, and the output terminal of the bit-level cell delivers the signal stored in the capacitor to a second neighboring bit-level cell disposed at output side of the array of memory units.
  24. The complex marching memory of claim 23, wherein in each of the bit-level cells, when the clock signal is applied to the control-electrode of the reset-transistor, the reset-transistor discharges the signal charge already stored in the capacitor.
  25. The complex marching memory of claim 23, wherein in each of the bit-level cells, after the signal charge stored in the capacitor has been discharged, the transfer-transistor becomes active delayed by a first delay time determined by the first delay element, and when the signal stored in the first neighboring bit-level cell is fed to the control-electrode of the transfer-transistor, the transfer-transistor transfers the signal stored in the first neighboring bit-level cell, further delayed by a second delay time determined by the second delay element to the capacitor.
  26. A complex marching memory comprising a plurality of marching memory blocks being deployed spatially, each of the marching memory blocks including an array of memory units, each of the memory units having a sequence of bit-level cells configured to store information of byte size or word size,
    wherein each of the memory units transfers synchronously with a first clock signal, step by step, toward a first edge side of corresponding marching memory block from a second edge side of the corresponding marching memory block opposite to the first edge side, and further, each of the memory units transfers synchronously with a second clock signal, step by step, toward the second edge side from the first edge side, and each of the marching memory blocks is randomly accessed so that each of the memory units in a subject marching memory block can be randomly accessed.
  27. The complex marching memory of claim 26, wherein each of the bit-level cells comprises:
    a forward transfer-transistor having a first main-electrode connected to a first clock signal supply line, configured to supply the first clock signal, through a first forward delay element and a control-electrode connected to a forward output terminal of a first neighboring bit-level cell disposed at one side of the array of memory units, through a second forward delay element;
    a forward reset-transistor having a first main-electrode connected to a second main-electrode of the forward transfer-transistor, a control-electrode connected to the first clock signal supply line, and a second main-electrode connected to the ground potential;
    a backward transfer-transistor having a first main-electrode connected to a second clock signal supply line, configured to supply the second clock signal, through a first backward delay element and a control-electrode connected to a backward output terminal of the second neighboring bit-level cell through a second backward delay element;
    a backward reset-transistor having a first main-electrode connected to a second main-electrode of the backward transfer-transistor, a control-electrode connected to the second clock signal supply line, and a second main-electrode connected to the ground potential; and
    a common capacitor configured to store the information of the bit-level cell, connected in parallel with the forward reset-transistor, and the backward reset-transistor,
    wherein an output node connecting the second main-electrode of the forward transfer-transistor and the first main-electrode of the forward reset-transistor serves as a forward output terminal of the bit-level cell, the forward output terminal of the bit-level cell delivers the signal stored in the common capacitor to a second neighboring bit-level cell disposed at another side of the array of memory units, an output node connecting the second main-electrode of the backward transfer-transistor and the first main-electrode of the backward reset-transistor serves as a backward output terminal of the bit-level cell, and the backward output terminal of the bit-level cell delivers the signal stored in the common capacitor to the first neighboring bit-level cell.
  28. The complex marching memory of claim 27, wherein in each of the bit-level cells, when the first clock signal is applied to the control-electrode of the forward reset-transistor, the forward reset-transistor discharges the signal charge already stored in the common capacitor, and when the second clock signal is applied to the control-electrode of the backward reset-transistor, the backward reset-transistor discharges the signal charge already stored in the common capacitor.
  29. The complex marching memory of claim 27, wherein in each of the bit-level cells, after the signal charge stored in the common capacitor has been discharged, the forward transfer-transistor becomes active delayed by a first forward delay time determined by the first forward delay element, and when the signal stored in the first neighboring bit-level cell is fed to the control-electrode of the forward transfer-transistor, the forward transfer-transistor transfers the signal stored in the first neighboring bit-level cell, further delayed by a second forward delay time determined by the second forward delay element to the common capacitor, and, after the signal charge stored in the common capacitor has been discharged, the backward transfer-transistor becomes active delayed by a first backward delay time determined by the first backward delay element, and when the signal stored in the first neighboring bit-level cell is fed to the control-electrode of the backward transfer-transistor, the backward transfer-transistor transfers the signal stored in the first neighboring bit-level cell, further delayed by a second backward delay time determined by the second backward delay element to the common capacitor.
  30. A computer system comprising a processor and a marching main memory configured to provide the processor with stored information actively and sequentially so that the processor can execute arithmetic and logic operations with the stored information, in addition results of processing in the processor are sent out to the marching main memory, except that in case of instructions movement, there is only one way of instructions flow from the marching main memory to the processor, the marching main memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising:
    a transfer-transistor having a first main-electrode connected to a clock signal supply line through a first delay element and a control-electrode connected to an output terminal of a first neighboring bit-level cell disposed at input side of the array of the memory units, through a second delay element;
    a reset-transistor having a first main-electrode connected to a second main-electrode of the transfer-transistor, a control-electrode connected to the clock signal supply line, and a second main-electrode connected to the ground potential; and
    a capacitor configured to store the information of the bit-level cell, connected in parallel with the reset-transistor,
    wherein an output node connecting the second main-electrode of the transfer-transistor and the first main-electrode of the reset-transistor serves as an output terminal of the bit-level cell, and the output terminal of the bit-level cell delivers the signal stored in the capacitor to a second neighboring bit-level cell disposed at output side of the array of the memory units.
  31. A computer system comprising a processor and a bidirectional marching main memory configured to provide the processor with stored information actively and sequentially so that the processor can execute arithmetic and logic operations with the stored information, in addition results of processing in the processor are sent out to the bidirectional marching main memory, except that in case of instructions movement, there is only one way of instructions flow from the bidirectional marching main memory to the processor, the bidirectional marching main memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising:
    a forward transfer-transistor having a first main-electrode connected to a forward clock signal supply line through a first forward delay element and a control-electrode connected to a forward output terminal of a first neighboring bit-level cell disposed at one side of the array of the memory units, through a second forward delay element;
    a forward reset-transistor having a first main-electrode connected to a second main-electrode of the forward transfer-transistor, a control-electrode connected to the forward clock signal supply line, and a second main-electrode connected to the ground potential;
    a backward transfer-transistor having a first main-electrode connected to a backward clock signal supply line through a first backward delay element and a control-electrode connected to a backward output terminal of the second neighboring bit-level cell through a second backward delay element;
    a backward reset-transistor having a first main-electrode connected to a second main-electrode of the backward transfer-transistor, a control-electrode connected to the backward clock signal supply line, and a second main-electrode connected to the ground potential; and
    a common capacitor configured to store the information of the bit-level cell, connected in parallel with the forward reset-transistor, and the backward reset-transistor,
    wherein an output node connecting the second main-electrode of the forward transfer-transistor and the first main-electrode of the forward reset-transistor serves as a forward output terminal of the bit-level cell, the forward output terminal of the bit-level cell delivers the signal stored in the common capacitor to a second neighboring bit-level cell disposed at another side of the array of the memory units, an output node connecting the second main-electrode of the backward transfer-transistor and the first main-electrode of the backward reset-transistor serves as a backward output terminal of the bit-level cell, and the backward output terminal of the bit-level cell delivers the signal stored in the common capacitor to the first neighboring bit-level cell.
  32. A computer system comprising a processor and a bidirectional marching main memory configured to provide the processor with stored information actively and sequentially so that the processor can execute arithmetic and logic operations with the stored information, in addition results of processing in the processor are sent out to the bidirectional marching main memory, except that in case of instructions movement, there is only one way of instructions flow from the bidirectional marching main memory to the processor, the bidirectional marching main memory including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size, each of the bit-level cells comprising:
    a forward transfer-transistor having a first main-electrode connected to a forward clock signal supply line through a first forward delay element and a control-electrode connected to a forward output terminal of a first neighboring bit-level cell disposed at one side of the array of the memory units, through a second forward delay element;
    a forward reset-transistor having a first main-electrode connected to a second main-electrode of the forward transfer-transistor, a control-electrode connected to the forward clock signal supply line, and a second main-electrode connected to the ground potential;
    a backward transfer-transistor having a first main-electrode connected to a backward clock signal supply line through a first backward delay element and a control-electrode connected to a backward output terminal of the second neighboring bit-level cell through a second backward delay element;
    a backward reset-transistor having a first main-electrode connected to a second main-electrode of the backward transfer-transistor, a control-electrode connected to the backward clock signal supply line, and a second main-electrode connected to the ground potential; and
    a common capacitor configured to store the information of the bit-level cell, connected in parallel with the forward reset-transistor, and the backward reset-transistor,
    wherein an output node connecting the second main-electrode of the forward transfer-transistor and the first main-electrode of the forward reset-transistor serves as a forward output terminal of the bit-level cell, the forward output terminal of the bit-level cell delivers the signal stored in the common capacitor to a second neighboring bit-level cell disposed at another side of the array of the memory units, an output node connecting the second main-electrode of the backward transfer-transistor and the first main-electrode of the backward reset-transistor serves as a backward output terminal of the bit-level cell, and the backward output terminal of the bit-level cell delivers the signal stored in the common capacitor to the first neighboring bit-level cell.
  33. A computer system comprising a processor and a marching main memory configured to provide the processor with stored information actively and sequentially so that the processor can execute arithmetic and logic operations with the stored information, in addition results of processing in the processor are sent out to the marching main memory, except that in case of instructions movement, there is only one way of instructions flow from the marching main memory to the processor, the marching main memory comprising a plurality of marching memory blocks being deployed spatially, each of the marching memory blocks including an array of memory units, each of the memory units having a sequence of bit-level cells configured to store information of byte size or word size,
    wherein each of the marching memory blocks is randomly accessed so that each of the memory units in a subject marching memory block can be randomly accessed.
  34. The computer system of claim 33, wherein each of the bit-level cells comprises:
    a transfer-transistor having a first main-electrode connected to a clock signal supply line through a first delay element and a control-electrode connected to an output terminal of a first neighboring bit-level cell disposed at input side of the array of memory units, through a second delay element;
    a reset-transistor having a first main-electrode connected to a second main-electrode of the transfer-transistor, a control-electrode connected to the clock signal supply line, and a second main-electrode connected to the ground potential; and
    a capacitor configured to store the information of the bit-level cell, connected in parallel with the reset-transistor,
    wherein an output node connecting the second main-electrode of the transfer-transistor and the first main-electrode of the reset-transistor serves as an output terminal of the bit-level cell, and the output terminal of the bit-level cell delivers the signal stored in the capacitor to a second neighboring bit-level cell disposed at output side of the array of memory units.
  35. A computer system comprising a processor and a bidirectional marching main memory configured to provide the processor with stored information actively and sequentially so that the processor can execute arithmetic and logic operations with the stored information, in addition results of processing in the processor are sent out to the bidirectional marching main memory, except that in case of instructions movement, there is only one way of instructions flow from the bidirectional marching main memory to the processor, the bidirectional marching main memory comprising a plurality of bidirectional marching memory blocks being deployed spatially, each of the bidirectional marching memory blocks including an array of memory units, each of the memory units having a sequence of bit-level cells so as to store information of byte size or word size,
    wherein each of the memory units transfers synchronously with a first clock signal, step by step, toward a first edge side of corresponding marching memory block from a second edge side of the corresponding marching memory block opposite to the first edge side, and further, each of the memory units transfers synchronously with a second clock signal, step by step, toward the second edge side from the first edge side, and each of the marching memory blocks is randomly accessed so that each of the memory units in a subject marching memory block can be randomly accessed.
  36. The computer system of claim 35, wherein each of the bit-level cells comprises:
    a forward transfer-transistor having a first main-electrode connected to a forward clock signal supply line, configured to supply the first clock signal, through a first forward delay element and a control-electrode connected to a forward output terminal of a first neighboring bit-level cell disposed at one side of the array of memory units, through a second forward delay element;
    a forward reset-transistor having a first main-electrode connected to a second main-electrode of the forward transfer-transistor, a control-electrode connected to the forward clock signal supply line, and a second main-electrode connected to the ground potential;
    a backward transfer-transistor having a first main-electrode connected to a backward clock signal supply line, configured to supply the second clock signal, through a first backward delay element and a control-electrode connected to a backward output terminal of the second neighboring bit-level cell through a second backward delay element;
    a backward reset-transistor having a first main-electrode connected to a second main-electrode of the backward transfer-transistor, a control-electrode connected to the backward clock signal supply line, and a second main-electrode connected to the ground potential; and
    a common capacitor configured to store the information of the bit-level cell, connected in parallel with the forward reset-transistor, and the backward reset-transistor,
    wherein an output node connecting the second main-electrode of the forward transfer-transistor and the first main-electrode of the forward reset-transistor serves as a forward output terminal of the bit-level cell, the forward output terminal of the bit-level cell delivers the signal stored in the common capacitor to a second neighboring bit-level cell disposed at another side of the array of memory units, an output node connecting the second main-electrode of the backward transfer-transistor and the first main-electrode of the backward reset-transistor serves as a backward output terminal of the bit-level cell, and the backward output terminal of the bit-level cell delivers the signal stored in the common capacitor to the first neighboring bit-level cell.
  37. The computer system of claim 36, wherein in each of the bit-level cells, when the first clock signal is applied to the control-electrode of the forward reset-transistor, the forward reset-transistor discharges the signal charge already stored in the common capacitor, and when the second clock signal is applied to the control-electrode of the backward reset-transistor, the backward reset-transistor discharges the signal charge already stored in the common capacitor.
  38. The computer system as in any of claims 30-32, 34 or 36, wherein the processor further includes a plurality of arithmetic pipelines configured to receive the stored information from the bidirectional marching main memory.
  39. The computer system as in any of claims 30-32, 34 or 36, further comprising a marching-cache memory having an array of cache memory units, located at locations corresponding to each for a unit of information, cache input terminals of the array configured to receive the stored information from the bidirectional marching main memory, and cache output terminals of the array, configured to store information in each of cache memory units and to transfer, synchronously with the clock signal, step by step, the information each to an adjacent cache memory unit, so as to provide actively and sequentially the stored information to the processor so that the arithmetic logic unit can execute the arithmetic and logic operations with the stored information, the results of the processing in the arithmetic logic unit are sent out to the bidirectional marching main memory, except that in case of instructions movement, there is not the opposite direction to the information flow to be processed.
  40. The computer system of claim 39, wherein the plurality of arithmetic pipelines includes either a plurality of vector processing units or a plurality of scalar function units.
  41. The computer system as in any of claims 30-32, 34 or 36, further comprising a marching-cache memory having an array of cache memory units, cache input terminals of the second array configured to receive the stored information from the bidirectional marching main memory, and cache output terminals of the second array, configured to store information in each of cache memory units and to transfer successively, step by step, the stored information in each of cache memory units to an adjacent cache memory unit, synchronized with the clock signal from the cache memory units adjacent to the cache input terminals toward the cache memory units adjacent to the cache output terminals, so as to provide actively and sequentially the stored information to the processor through the cache output terminals so that the processor cores can operate with the stored information.
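
The step-by-step, clock-synchronized transfer recited in the claims above can be illustrated with a minimal behavioural sketch in Python. It models only the data movement: one unit of information per memory unit, shifted to the adjacent unit on each clock tick, in either the forward or the backward direction. The transistor-level details of the bit-level cell (transfer-transistors, reset-transistors, common capacitor) are deliberately not modelled, and the class, method and function names are hypothetical and do not appear in the patent.

```python
from typing import List, Optional


class BidirectionalMarchingMemory:
    """Behavioural model of a row of memory units (hypothetical names).

    Each unit holds one unit of information (here an int) or None if empty.
    Index 0 is the input side; the last index is the forward output side.
    """

    def __init__(self, n_units: int) -> None:
        self.units: List[Optional[int]] = [None] * n_units

    def forward_tick(self, incoming: Optional[int] = None) -> Optional[int]:
        """One forward clock step: every unit passes its content to the
        next unit; the content of the last unit leaves the array."""
        out = self.units[-1]
        self.units = [incoming] + self.units[:-1]
        return out

    def backward_tick(self, incoming: Optional[int] = None) -> Optional[int]:
        """One backward clock step: every unit passes its content to the
        previous unit; the content of the first unit leaves the array."""
        out = self.units[0]
        self.units = self.units[1:] + [incoming]
        return out


def stream_to_processor(mem: BidirectionalMarchingMemory,
                        words: List[int]) -> List[int]:
    """Feed words in at the input side and collect whatever marches out
    of the forward output side, then drain the remaining contents."""
    received: List[int] = []
    for word in words:
        out = mem.forward_tick(word)
        if out is not None:
            received.append(out)
    for _ in range(len(mem.units)):  # drain: no new input
        out = mem.forward_tick()
        if out is not None:
            received.append(out)
    return received
```

In this sketch, `stream_to_processor(BidirectionalMarchingMemory(3), [1, 2, 3, 4, 5])` delivers the words in their original order without any address being supplied, which is the property the claims rely on when they describe information being provided "actively and sequentially" to the processor. A `backward_tick` moves contents in the opposite direction, as when results are sent back toward the main memory.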
PCT/JP2013/000760 2012-02-13 2013-02-13 A marching memory, a bidirectional marching memory, a complex marching memory and a computer system, without the memory bottleneck WO2013121776A1 (en)

Priority Applications (9)

Application Number Priority Date Filing Date Title
EP21207151.8A EP3982366A1 (en) 2012-02-13 2013-02-13 A marching memory, a bidirectional marching memory, a complex marching memory and a computer system, without the memory bottleneck
JP2014556207A JP6093379B2 (en) 2012-02-13 2013-02-13 Marching memory, interactive marching memory and computer system
EP13748879.7A EP2815403B1 (en) 2012-02-13 2013-02-13 A marching memory, a bidirectional marching memory, a complex marching memory and a computer system, without the memory bottleneck
KR1020167026590A KR101747619B1 (en) 2012-02-13 2013-02-13 A marching memory, a bidirectional marching memory, a complex marching memory and a computer system, without the memory bottleneck
EP18000889.8A EP3477645B1 (en) 2012-02-13 2013-02-13 A marching memory, a bidirectional marching memory, a complex marching memory and a computer system, without the memory bottleneck
CN201380005030.5A CN104040635B (en) 2012-02-13 2013-02-13 A marching memory, a bidirectional marching memory, a complex marching memory and a computer system, without the memory bottleneck
KR1020147019139A KR101689939B1 (en) 2012-02-13 2013-02-13 A marching memory, a bidirectional marching memory, a complex marching memory and a computer system, without the memory bottleneck
US14/450,705 US10573359B2 (en) 2012-02-13 2014-08-04 Marching memory, a bidirectional marching memory, a complex marching memory and a computer system, without the memory bottleneck
US16/744,849 US11164612B2 (en) 2012-02-13 2020-01-16 Marching memory, a bidirectional marching memory, a complex marching memory and a computer system, without the memory bottleneck

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261597945P 2012-02-13 2012-02-13
US61/597,945 2012-02-13

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/450,705 Continuation US10573359B2 (en) 2012-02-13 2014-08-04 Marching memory, a bidirectional marching memory, a complex marching memory and a computer system, without the memory bottleneck

Publications (1)

Publication Number Publication Date
WO2013121776A1 (en) 2013-08-22

Family

ID=48983909

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/000760 WO2013121776A1 (en) 2012-02-13 2013-02-13 A marching memory, a bidirectional marching memory, a complex marching memory and a computer system, without the memory bottleneck

Country Status (7)

Country Link
US (2) US10573359B2 (en)
EP (3) EP3982366A1 (en)
JP (2) JP6093379B2 (en)
KR (2) KR101689939B1 (en)
CN (2) CN104040635B (en)
TW (2) TWI607454B (en)
WO (1) WO2013121776A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016186832A (en) * 2015-03-27 2016-10-27 Nikon Corporation Rank progression type storage device and calculator system

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9985791B2 (en) 2015-08-13 2018-05-29 Arizona Board Of Regents Acting For And On Behalf Of Northern Arizona University Physically unclonable function generating systems and related methods
US9971566B2 (en) 2015-08-13 2018-05-15 Arizona Board Of Regents Acting For And On Behalf Of Northern Arizona University Random number generating systems and related methods
US10275356B2 (en) * 2015-12-11 2019-04-30 Quanta Computer Inc. Component carrier with converter board
JP2017123208A (en) * 2016-01-06 2017-07-13 Renesas Electronics Corporation Semiconductor storage device
KR102294108B1 (en) 2018-01-23 2021-08-26 Tadao Nakamura Marching memory and computer systems
KR102603399B1 (en) * 2018-08-09 2023-11-17 Samsung Display Co., Ltd. Display apparatus and manufacturing the same
WO2020229620A1 (en) * 2019-05-16 2020-11-19 Xenergic Ab Shiftable memory and method of operating a shiftable memory
CN111344790B (en) * 2020-01-17 2021-01-29 Yangtze Memory Technologies Co., Ltd. Advanced memory structure and apparatus
WO2023049132A1 (en) * 2021-09-21 2023-03-30 Monolithic 3D Inc. A 3d semiconductor device and structure with heat spreader

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011010445A1 (en) * 2009-07-21 2011-01-27 Tadao Nakamura A lower energy comsumption and high speed computer without the memory bottleneck

Family Cites Families (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4017741A (en) * 1975-11-13 1977-04-12 Rca Corporation Dynamic shift register cell
US4698788A (en) * 1985-07-01 1987-10-06 Motorola, Inc. Memory architecture with sub-arrays
US4745393A (en) 1985-09-25 1988-05-17 Hitachi, Ltd Analog-to-digital converter
JP2536567B2 (en) * 1987-12-17 1996-09-18 Hitachi, Ltd. High-speed processing method of bidirectional inference
EP0463721A3 (en) * 1990-04-30 1993-06-16 Gennum Corporation Digital signal processing device
US5159616A (en) * 1990-10-25 1992-10-27 Digital Equipment Corporation CMOS shift register with complementary refresh pass gates and buffer
JPH05128898A (en) 1991-11-05 1993-05-25 Oki Electric Ind Co Ltd Semiconductor memory
DE4307177C2 (en) * 1993-03-08 1996-02-08 Lueder Ernst Circuit arrangement as part of a shift register for controlling chain or matrix-shaped switching elements
JP3272888B2 (en) * 1993-12-28 2002-04-08 Toshiba Corporation Semiconductor storage device
JP3243920B2 (en) 1994-03-07 2002-01-07 Hitachi, Ltd. Semiconductor device
US20050036363A1 (en) * 1996-05-24 2005-02-17 Jeng-Jye Shau High performance embedded semiconductor memory devices with multiple dimension first-level bit-lines
US5728960A (en) * 1996-07-10 1998-03-17 Sitrick; David H. Multi-dimensional transformation systems and display communication architecture for musical compositions
JP3280867B2 (en) * 1996-10-03 2002-05-13 Sharp Corporation Semiconductor storage device
JPH11195766A (en) * 1997-10-31 1999-07-21 Mitsubishi Electric Corp Semiconductor integrated circuit device
EP1084496A4 (en) 1998-05-06 2002-07-24 Fed Corp Method and apparatus for sequential memory addressing
EP1252619A1 (en) * 2000-01-27 2002-10-30 David Sitrick Device for displaying music using a single or several linked workstations
EP1398786B1 (en) * 2002-09-12 2010-04-07 STMicroelectronics Asia Pacific Pte Ltd. Pseudo bidimensional randomly accessible memory
US7617356B2 (en) 2002-12-31 2009-11-10 Intel Corporation Refresh port for a dynamic memory
EP1606823B1 (en) * 2003-03-14 2009-11-18 Nxp B.V. Two-dimensional data memory
KR100585227B1 (en) 2004-03-12 2006-06-01 Samsung Electronics Co., Ltd. Semiconductor stack package with improved heat dissipation property and memory module using the stack packages
JP2005285168A (en) 2004-03-29 2005-10-13 Alps Electric Co Ltd Shift register and liquid crystal driving circuit using the same
CN100502277C (en) * 2004-04-10 2009-06-17 Hongfujin Precision Industry (Shenzhen) Co., Ltd. Correctness checking system and method of data transmission
JP2006127460A (en) * 2004-06-09 2006-05-18 Renesas Technology Corp Semiconductor device, semiconductor signal processing apparatus and crossbar switch
JP4654389B2 (en) 2006-01-16 2011-03-16 Musashino Engineering Co., Ltd. Room temperature bonding method for diamond heat spreader and heat dissipation part of semiconductor device
US7493439B2 (en) * 2006-08-01 2009-02-17 International Business Machines Corporation Systems and methods for providing performance monitoring in a memory system
WO2008122919A1 (en) * 2007-04-05 2008-10-16 Nxp B.V. A memory cell, a memory array and a method of programming a memory cell
KR101414774B1 (en) 2007-08-29 2014-08-07 Samsung Electronics Co., Ltd. Multi-port semiconductor memory device
US8000156B2 (en) * 2008-10-24 2011-08-16 Arm Limited Memory device with propagation circuitry in each sub-array and method thereof
US8508460B2 (en) * 2009-12-15 2013-08-13 Sharp Kabushiki Kaisha Scanning signal line drive circuit and display device including the same
CN103907157B (en) * 2011-10-28 2017-10-17 Hewlett Packard Enterprise Development LP Row shifting shiftable memory

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TADAO NAKAMURA ET AL.: "Marching Memory: designing computers to avoid the Memory Bottleneck.", PROCEEDINGS OF THE SIXTH INTERNATIONAL WORKSHOP ON UNIQUE CHIPS AND SYSTEMS, 4 December 2010 (2010-12-04), pages 44 - 47, XP055122938, Retrieved from the Internet <URL:http://ispass.org/ucas6/ucas6.pdf> [retrieved on 20130304] *
TADAO NAKAMURA: "Marching Memory.", MARCHING MEMORY., 30 September 2011 (2011-09-30), XP055159254, Retrieved from the Internet <URL:http://www.eda-express.com/verify2011/pdf/Verify2011_04.pdf> [retrieved on 20130226] *

Also Published As

Publication number Publication date
CN104040635A (en) 2014-09-10
JP2015510216A (en) 2015-04-02
CN107093463A (en) 2017-08-25
CN104040635B (en) 2017-09-08
JP6093379B2 (en) 2017-03-15
TW201737251A (en) 2017-10-16
JP6218294B2 (en) 2017-10-25
US20140344544A1 (en) 2014-11-20
US11164612B2 (en) 2021-11-02
TWI607454B (en) 2017-12-01
KR20140102745A (en) 2014-08-22
EP2815403B1 (en) 2019-01-02
EP2815403A4 (en) 2015-12-23
EP3477645A1 (en) 2019-05-01
EP3982366A1 (en) 2022-04-13
CN107093463B (en) 2021-01-01
KR101689939B1 (en) 2017-01-02
TWI637396B (en) 2018-10-01
KR20160116040A (en) 2016-10-06
US10573359B2 (en) 2020-02-25
KR101747619B1 (en) 2017-06-16
EP3477645B1 (en) 2021-12-15
TW201337943A (en) 2013-09-16
US20200152247A1 (en) 2020-05-14
EP2815403A1 (en) 2014-12-24
JP2017062865A (en) 2017-03-30

Similar Documents

Publication Publication Date Title
US11164612B2 (en) Marching memory, a bidirectional marching memory, a complex marching memory and a computer system, without the memory bottleneck
EP2457155B1 (en) A lower energy comsumption and high speed computer without the memory bottleneck
Talati et al. mmpu—a real processing-in-memory architecture to combat the von neumann bottleneck
US20190385674A1 (en) Page buffer and memory device including the same
US11004484B2 (en) Page buffer and memory device including the same
US10867647B2 (en) Marching memory and computer system
JP2004327029A (en) Magnetic memory array

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13748879

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2013748879

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 20147019139

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2014556207

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE