US20140013129A1 - Hybrid computing module - Google Patents
- Publication number
- US20140013129A1 (application US 13/917,601)
- Authority
- US
- United States
- Prior art keywords
- memory
- stack
- computing module
- module
- hybrid computing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G06F1/26 — Power supply means, e.g. regulation thereof
- G06F1/28 — Supervision thereof, e.g. detecting power-supply failure by out-of-limits supervision
- G06F1/3203 — Power management, i.e. event-based initiation of a power-saving mode
- G06F1/324 — Power saving by lowering clock frequency
- G06F12/0815 — Cache consistency protocols
- G06F12/0862 — Caches with prefetch
- G06F12/0875 — Dedicated caches, e.g. instruction or stack
- G06F12/1009 — Address translation using page tables
- G06F13/1605 — Access to memory bus based on arbitration
- G06F13/1673 — Memory controller using buffers
- G06F13/1689 — Memory controller synchronisation and timing concerns
- G06F13/24 — Access to input/output bus using interrupt
- G06F13/36 — Access to common bus or bus system
- G06F13/42 — Bus transfer protocol, e.g. handshake; synchronisation
- G06F15/80 — Architectures comprising an array of processing units with common control, e.g. SIMD
- G06F3/0619 — Improving storage-system reliability in relation to data integrity
- G06F3/0625 — Power saving in storage systems
- G06F3/065 — Replication mechanisms
- G06F3/0685 — Hybrid storage combining heterogeneous device types
- G06F9/3001 — Arithmetic instructions
- G06F9/30043 — LOAD or STORE instructions; Clear instruction
- G06F9/30098 — Register arrangements
- G06F9/3802 — Instruction prefetching
- G11C7/1072 — I/O data interface arrangements for memories synchronised on clock signal pulse trains
- H01L21/00 — Processes or apparatus for the manufacture or treatment of semiconductor devices
- H01L21/76229 — Concurrent filling of a plurality of trenches having different trench shapes or dimensions
- H01L21/84 — Plural-component devices on a substrate other than a semiconductor body, e.g. an insulating body
- H01L25/0652 — Assemblies with devices arranged next to and on each other, i.e. mixed assemblies
- H01L25/16 — Assemblies of devices from two or more different main groups, e.g. forming hybrid circuits
- H01L27/0207 — Geometrical layout of the components, e.g. computer-aided design
- G06F2212/1024 — Latency reduction
- G06F2212/452 — Caching of instruction code
- G06F2212/602 — Cache prefetching
- G06F2212/621 — Coherency control relating to peripheral accessing, e.g. from DMA or I/O device
- G06F2212/65 — Details of virtual memory and virtual address translation
- G06F2213/0038 — System on Chip
- H01L2924/0002 — Not covered by groups H01L24/00 and H01L2224/00
- H01L2924/14 — Integrated circuits
- H01L2924/3011 — Impedance
- Y02D10/00 — Energy efficient computing, e.g. low-power processors, power management or thermal management
- Y10S257/00 — Active solid-state devices, e.g. transistors, solid-state diodes
Definitions
- the present invention relates generally to the construction of customized system-on-chip computing modules and specifically to the application of semiconductor carriers comprising a fully integrated power management system that transfers data between at least one memory die mounted on the semiconductor carrier at speeds that are synchronous or comparable with at least one general purpose processor die co-located on the semiconductor carrier.
- the present invention relates generally to methods and means that reduce the physical size and cost of high-speed computing modules. More specifically, the present invention instructs methods and means to flexibly form a hybrid computing module designed for specialized purposes that serve low market volume applications while using lower cost general purpose multi-core microprocessor chips having functional design capabilities that are generally restricted to high-volume market applications.
- the invention teaches the use of methods to switch high current (power) levels at high (GHz frequency) speeds by means of a semiconductor carrier comprising a fully integrated power management system to maximize utilization rates of multi-core microprocessor chips having considerably more stack-based cache memory and little or no need for on-board heap-based cache memory, thereby enabling higher performance, smaller overall system size and reduced system cost in specialized low volume market applications.
- the large thermal loads generated by conventional power management systems further reduce system efficiency by requiring power management to be located significant distances from the processor and memory die, thereby adding loss through the power distribution network. Therefore, methods that reduce system losses by providing means to fabricate a hybrid computing module comprising power management systems that generate sufficiently low thermal loads to be situated in close proximity to the memory and microprocessor die are desirable.
- Switching speed/frequency is increased by minimizing gate capacitance (C OX ) and gate electrode surface area (W×L).
- C OX gate capacitance
- W ⁇ L gate electrode surface area
- the high thermal loads require complex thermal management devices to be designed into the assembled system and usually require the power management and processor systems to be physically separated from one another for optimal thermal management. Therefore, methods and means to produce a hybrid computing module that embeds power management devices in close proximity to the processor cores to reduce loss and contain power FETs that switch large currents comprising several 10's to 100's of amperes at high speeds without generating large thermal loads are desirable.
- The large loss of available processor real estate to cache memory in multi-core x86 processor chips is illustrated in FIGS. 1 A, 1 B, 1 C.
- FIG. 1A presents a scaled representation of a Nehalem quad-core microprocessor chip 1 fabricated using the 45 nm technology node.
- the chip's surface area is allocated for 4 microprocessor cores 2 A, 2 B, 2 C, 2 D, an integrated 3 Ch DDR3 memory controller 3 , and shared L3 cache memory 4 .
- L3 cache memory 4 occupies roughly 40% of the surface area not allocated to system interconnect circuits 5 A, 5 B, or approximately 30% of the total die surface area.
- the Westmere dual-core microprocessor chip 6 ( FIG. 1B ) fabricated using the 32 nm technology node allocates approximately 35% of its total available surface area to L3 cache memory 7 to serve its 2 microprocessor cores 8 A, 8 B.
- the Westmere-EP 6 core microprocessor chip 9 ( FIG.
- FIG. 2A shows the average costs of masks used to photolithographically pattern an individual material layer embedded within an integrated circuit assembly as a function of the manufacturing technology nodes.
- a key technology objective has been to integrate entire electronic systems on a chip.
- the significantly higher mask costs cause design and lithography costs to skyrocket at the more advanced technology nodes (45 nm & 32 nm).
- FIG. 2B shows the variation of design and lithography costs per function (memory, processor, controller, etc.) among the different technology nodes (65 nm, 45 nm, 32 nm) normalized to the fabrication cost at the 90 nm technology node for system-on-chip (“SoC”) devices serving low volume 20 , medium volume 22 , and high volume (general purpose) 24 technology applications.
- SoC system-on-chip
- the increasing design and lithography costs cause SoC applications fabricated to the more advanced technology nodes (45 nm and 32 nm) to be more expensive in low-volume 20 and medium-volume 22 markets than they would be when fabricated to the less advanced technology nodes (90 nm and 65 nm).
- active component or “active element” is herein understood to refer to its conventional definition as an element of an electrical circuit that requires electrical power to operate and is capable of producing power gain.
- atomicity is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as an indivisible block of programming code that defines an operation that either does not happen at all or is fully completed when used.
- cache memory herein refers to its conventional meaning as an electrical bit-based memory system that is physically located on the microprocessor die and used to store stack variables and main memory pointers or addresses.
- compositional complexity is herein understood to refer to a material, such as a metal or superalloy, compound semiconductor, or ceramic that consists of three (3) or more elements from the periodic table.
- chip carrier is herein understood to refer to an interconnect structure built into a semiconductor substrate that contains wiring elements and active components that route electrical signals between one or more integrated circuits mounted on the chip carrier's surface and a larger electrical system that they may be connected to.
- coherency or “memory coherence” is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as an issue that affects the design of computer systems in which two or more processors or cores share a common area of memory and the processors are notified of changes to shared data values in the common memory location when it is updated by one of the processing elements.
- concurrency or “memory consistency” is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as a model for distributed shared memory or distributed data stores (file systems, web caching, databases, replication systems) that specifies rules that allow memory to be consistent and the results of memory operations to be predictable.
- computing system is herein understood to mean any microprocessor-based system comprising a register compatible with 32, 64, 128 (or any integral multiple thereof) bit architectures that is used to electrically process data or render computational analysis that delivers useful information to an end-user.
- critical performance tolerances is herein understood to refer to the ability for all passive components in an electrical circuit to hold performance values within ±1% of the desired values at all operating temperatures over which the circuit was designed to function.
- die is herein understood to refer to its conventional meaning as a sectioned slide of semiconductor material that comprises a fully functioning integrated circuit.
- DMA Direct Memory Access
- devices, either external or internal to the system's chassis, having a means to bypass normal processor functionality, update or read main memory and signal the processor(s) that the operation is complete. This is usually done to avoid slow memory controller functionality and/or in cases where normal processor functionality is not needed.
- electroceramic is herein understood to refer to its conventional meaning as being a complex ceramic material that has robust dielectric properties that augment the field densities of applied electrical or magnetic stimulus.
- FET field effect transistor
- heap memory is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as a large pool of memory, generally located in RAM, that has divisible portions dynamically allocated for current and future memory requests.
- Hybrid Memory Cube is herein understood to refer to a DRAM memory architecture that combines high-speed logic processing within a stack of through-silicon-via bonded memory die and is under development through the Hybrid Memory Cube Consortium.
- integrated circuit is herein understood to mean a semiconductor chip into which a large, very large, or ultra-large number of transistor elements have been embedded.
- kernel is herein understood to refer to its conventional meaning in computer operating systems as the communications interface between the computing applications and the data processing hardware and manages the system's lowest-level abstraction layer controlling basic processor and I/O device resources.
- the “latency” or “column address strobe (CAS) latency” is the delay time between the moment a memory controller tells the memory module to access a particular memory column on a random-access memory (RAM) module and the moment the data from the given memory location is available on the module's output pins.
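To make the latency definition concrete, the cycle count can be converted to an absolute delay by dividing by the memory module's I/O clock rate. The module speed and CL value below are hypothetical examples, not figures from this disclosure.

```python
# Illustrative sketch: converting a CAS latency counted in I/O clock
# cycles into nanoseconds. CL11 at 800 MHz is a hypothetical example.
def cas_latency_ns(cl_cycles: float, io_clock_mhz: float) -> float:
    """Delay between the column-address strobe and valid data on the pins."""
    return cl_cycles / io_clock_mhz * 1e3  # cycles / MHz -> us, * 1e3 -> ns

print(round(cas_latency_ns(11, 800.0), 2))  # 13.75
```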
- LCD or “liquid chemical deposition” is herein understood to mean a method that uses liquid precursor solutions to fabricate materials of arbitrary compositional or chemical complexity as an amorphous laminate or free-standing body or as a crystalline laminate or free-standing body that has atomic-scale chemical uniformity and a microstructure that is controllable down to nanoscale dimensions.
- main memory or “physical memory” are herein understood to refer to their conventional definitions as memory that is not part of the microprocessor die and is physically located in separate electronic modules that are linked to the microprocessor through input/output (I/O) controllers that are usually integrated into the processor die.
- I/O input/output
- ordering is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as a system of special instructions, such as memory fences or barriers, which prevent a multi-threaded program from running out of sequence.
- passive component is herein understood to refer to its conventional definition as an element of an electrical circuit that modulates the phase or amplitude of an electrical signal without producing power gain.
- pipeline or “instruction pipeline” is herein understood to refer to a technique used in the design of computers to increase their instruction throughput, (the number of instructions that can be executed in a unit of time), by running multiple operations in parallel.
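A toy cycle-count model shows why an instruction pipeline raises throughput: once the pipeline is full, one instruction completes per cycle instead of one per S cycles. The 5-stage depth is an illustrative assumption, not a parameter of this disclosure.

```python
# Minimal sketch of pipelined vs. non-pipelined instruction throughput.
def cycles_to_run(n_instructions: int, stages: int, pipelined: bool) -> int:
    if pipelined:
        # 'stages' cycles to fill the pipe, then one completion per cycle
        return stages + (n_instructions - 1)
    # otherwise each instruction occupies the datapath start-to-finish
    return stages * n_instructions

assert cycles_to_run(100, 5, pipelined=False) == 500
assert cycles_to_run(100, 5, pipelined=True) == 104
```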
- processor is herein understood to be interchangeable with the conventional definition of a microprocessor integrated circuit.
- RISC is herein understood to refer to its conventional meaning with regards to computing systems as a microprocessor designed to perform a smaller number of computer instruction types, wherein each type of computer instruction utilizes a dedicated set of transistors so the lower number of instruction types reduces the microprocessor's overall transistor count.
- resonant gate transistor is herein understood to refer to any of the transistor architectures disclosed in de Rochemont, U.S. Ser. No. 13/216,192, “POWER FET WITH A RESONANT TRANSISTOR GATE”, wherein the transistor switching speed is not limited by the capacitance of the transistor gate, but operates at frequencies that cause the gate capacitance to resonate with inductive elements embedded within the gate structure.
- shared data is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as data elements that are simultaneously used by two or more microprocessor cores.
- stack or “stack-based memory allocation” is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as regions of memory reserved for a thread where data is added or removed in a last-in-first-out protocol.
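The last-in-first-out protocol named above can be sketched in a few lines; this is a generic illustration of LIFO behavior, not code from the disclosure.

```python
# Generic last-in-first-out (LIFO) behavior of stack-based allocation:
# the most recently pushed frame is always the first one removed.
stack = []
stack.append("frame_a")           # push on entering a routine
stack.append("frame_b")           # push a nested call
assert stack.pop() == "frame_b"   # last in, first out
assert stack.pop() == "frame_a"
```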
- stack-based computing is herein understood to describe a computational system that primarily uses a stack-based memory allocation and retrieval protocol in preference to conventional register-cache computational models.
- standard operating temperatures is herein understood to mean the range of temperatures between −40° C. and +125° C.
- thermoelectric effect is herein understood to refer to its conventional definition as the physical phenomenon wherein a temperature differential applied across a material induces a voltage differential within that material, and/or an applied voltage differential across the material induces a temperature differential within that material.
- thermoelectric material is herein understood to refer to its conventional definition as a solid material that exhibits the “thermoelectric effect”.
- tight tolerance or “critical tolerance” are herein understood to mean a performance value, such as a capacitance, inductance, or resistance, that varies less than ±1% over standard operating temperatures.
- II-VI compound semiconductor is herein understood to refer to its conventional meaning describing a compound semiconductor comprising at least one element from column IIB of the periodic table including: zinc (Zn), cadmium (Cd), or mercury (Hg); and, at least one element from column VI of the periodic table consisting of: oxygen (O), sulfur (S), selenium (Se), or tellurium (Te).
- III-V compound semiconductor is herein understood to refer to its conventional meaning describing a compound semiconductor comprising at least one semi-metallic element from column III of the periodic table including: boron (B), aluminum (Al), gallium (Ga), and indium (In); and, at least one gaseous or semi-metallic element from the column V of the periodic table consisting of: nitrogen (N), phosphorous (P), arsenic (As), antimony (Sb), or bismuth (Bi).
- IV-IV compound semiconductor is herein understood to refer to its conventional meaning describing a compound semiconductor comprising a plurality of elements from column IV of the periodic table including: carbon (C), silicon (Si), germanium (Ge), tin (Sn), or lead (Pb).
- IV-VI compound semiconductor is herein understood to refer to its conventional meaning describing a compound semiconductor comprising at least one element from column IV of the periodic table including: carbon (C), silicon (Si), germanium (Ge), tin (Sn), or lead (Pb); and, at least one element from column VI of the periodic table consisting of: sulfur (S), selenium (Se), or tellurium (Te).
- the present invention generally relates to a hybrid system-on-chip that comprises a plurality of memory and processor die mounted on a semiconductor carrier chip that contains a fully integrated power management system that switches DC power at speeds that match or approach processor core clock speeds, thereby allowing the efficient transfer of data between off-chip physical memory and processor die.
- the present invention relates to methods and means to reduce the size and cost of computing systems, while increasing performance.
- the present invention relates to methods and means to provide a factor increase in computing performance per processor die surface area while only fractionally increasing power consumption.
- One embodiment of the present invention provides a hybrid computing module, comprising: a semiconductor carrier including a substrate adapted to provide electrical communication, through electrically conducting traces and passive circuit network filtering elements formed upon the carrier substrate, between a fully integrated power management circuit module, having a resonant gate transistor to switch electrical power that drives the transfer of data and digital process instruction sets, and a plurality of discrete semiconductor die mounted upon the semiconductor carrier, wherein the plurality of discrete semiconductor die include: at least one microprocessor die forming a central processing unit (CPU), and a memory bank having at least one memory die.
- CPU central processing unit
- the plurality of semiconductor die may include a field programmable gate array (FPGA) or provide memory controller functionality.
- the memory controller functionality may be field programmable or be provided by a static address memory controller.
- the plurality of semiconductor die may additionally include a graphics processing unit (GPU) or an application-specific integrated circuit (ASIC).
- the plurality of semiconductor die may be mounted as a stack on the semiconductor carrier.
- the module may further comprise a plurality of semiconductor die mounted upon the hybrid computing module that provide GPU and field programmability.
- the CPU and GPU semiconductor die may comprise multiple processing cores.
- the substrate forming the semiconductor carrier may be a semiconductor. Active circuitry may be embedded in the semiconductor substrate that manages USB, audio, video and other communications bus interface protocols.
- the microprocessor die may contain multiple processing cores or may have cache memory that occupies less than 15% or even 10% of the microprocessor die footprint.
- the plurality of discrete semiconductor die may be configured as a chip stack.
- the hybrid computing module may contain a plurality of central processing units, each functioning as distributed processing cores or a plurality of central processing units that are configured to function as a fault-tolerant computing system.
- the hybrid computing module may be in thermal contact with a thermoelectric device.
- the passive circuit network filtering elements formed upon the semiconductor carrier may have performance values that maintain critical performance tolerances.
- the memory die may be mounted within a stack comprising additional semiconductor die.
- the fully integrated power management module may be mounted on the semiconductor carrier and may switch power at speeds greater than 250 MHz.
- the fully integrated power management module may switch power at speeds in the range of 600 MHz to 60 GHz.
- the fully integrated power management module may be formed upon the semiconductor carrier.
- the semiconductor carrier may be in electrical communication with electro-optic drivers that interface the hybrid computing module with other systems by means of a fiber-optic network.
- the electro-optical interface may contain an active layer that forms a 3D electron gas.
- a real-time memory access computing architecture comprising: a hybrid computer module comprising a plurality of discrete semiconductor die mounted upon a semiconductor carrier, which hybrid computer module further comprises: a fully integrated power management module having a resonant gate transistor, wherein the fully integrated power management module is adapted to synchronously switch power at speeds that match a clock speed of a microprocessor on an adjacent microprocessor die mounted within the hybrid computer module to provide real-time memory access; a look-up table adapted to select a pointer to reference addresses in a main memory where data and/or processes are physically located; a memory management variable that uses the look-up table to select the next set of data and/or processes called by the microprocessor; and a memory bank forming the main memory, wherein at least 50% of the cache memory of the microprocessor die is allocated to stack-based memory functionality.
- the resonant transistor gate may switch power at speeds between 600 MHz and 60 GHz.
- the fully integrated power management module may have an efficiency greater than 98%.
- the computing architecture may have 70%-100% of the microprocessor die cache memory allocated to stack-based memory functionality.
- the look-up table may be located in cache memory or in main memory.
- the main memory resources may provide both stack-based and heap-based memory functionality.
- the memory management variable may be adapted to instruct the look-up table to reassign and/or reallocate main memory addresses.
- the computing architecture need not include a memory management algorithm that predictively manages the inflow of stack-based memory functions into the cache memory of a processor die within the hybrid computer module.
- the computing architecture processor die may have no cache memory.
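The look-up-table mechanism recited above can be sketched in a few lines. All identifiers, addresses, and data values below are hypothetical placeholders chosen for illustration; the disclosure does not specify an implementation.

```python
# Hypothetical sketch of the claimed scheme: a look-up table selects a
# pointer to the main-memory address where a process's data physically
# resides, and a memory-management variable names the next process the
# microprocessor will call. Addresses and names are illustrative only.
lookup_table = {"proc_a": 0x1000, "proc_b": 0x2000}   # id -> main-memory address
main_memory = {0x1000: "data for proc_a", 0x2000: "data for proc_b"}

def fetch(memory_management_variable: str) -> str:
    pointer = lookup_table[memory_management_variable]  # select the pointer
    return main_memory[pointer]                         # dereference into main memory

assert fetch("proc_b") == "data for proc_b"
```

Reassigning or reallocating a main-memory address, as the claims permit, would amount to updating an entry in `lookup_table` without touching the calling code.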
- FIGS. 1 A, 1 B, 1 C depict the scaled surface areas distributed to cache memory and processor functions in modern microprocessor systems.
- FIGS. 2 A, 2 B depict the higher design and lithography costs of advanced semiconductor technology nodes and their impact on the cost of SoC systems as a function of varying market volumes.
- FIGS. 3 A, 3 B depict the hybrid computing module.
- FIGS. 4 A, 4 B illustrate multi-core microprocessor die with reduced cache memory used in the hybrid computing module.
- FIGS. 5 A, 5 B, 5 C depict the use of semiconductor layers that form 3-D electron gases.
- FIG. 6 illustrates the use of a thermoelectric device in the hybrid computing module.
- FIGS. 7 A, 7 B, 7 C, 7 D, 7 E, 7 F illustrate the invention's methods and embodiments that enable minimal instruction set computing suitable for general purpose applications.
- FIGS. 8 A, 8 B depict the prior art related to stack machines.
- FIGS. 9 A, 9 B illustrate characteristic features of a general purpose stack machine enabled by this invention.
- the '698 application instructs on methods and embodiments that provide meta-material dielectrics that have dielectric inclusion(s) with performance values that remain stable as a function of operating temperature. This is achieved by controlling the dielectric inclusion(s)' microstructure to nanoscale dimensions less than or equal to 50 nm.
- de Rochemont '159 and '042 instruct the integration of passive components that hold performance values that remain stable with temperature in printed circuit boards, semiconductor chip packages, wafer-scale SoC die, and power management systems.
- de Rochemont '159 instructs on how LCD is applied to form passive filtering networks and quarter wave transformers in radio frequency or wireless applications that are integrated into a printed circuit board, ceramic package, or semiconductor component.
- de Rochemont '042 instructs methods to form an adaptive inductor coil that can be integrated into a printed circuit board, ceramic package, or semiconductor device.
- de Rochemont et al. '112 discloses the liquid chemical deposition (LCD) process and apparatus used to produce macroscopically large compositionally complex materials, that consist of a theoretically dense network of polycrystalline microstructures comprising uniformly distributed grains with maximum dimensions less than 50 nm. Complex materials are defined to include semiconductors, metals or super alloys, and metal oxide ceramics.
- de Rochemont '222 and '922A instruct on methods and embodiments related to a fully integrated low EMI, high power density inductor coil and/or high power density power management module.
- de Rochemont '192 instructs on methods to integrate a field effect transistor that switches arbitrarily large currents at arbitrarily high speeds with minimal On-resistance into a fully integrated silicon chip carrier.
- de Rochemont '922B instructs methods and embodiments to integrate semiconductor layers that produce a 3-dimensional electron gas within semiconductor chip carriers and monolithically integrated microelectronic modules.
- de Rochemont '302 instructs methods and embodiments to optimize thermoelectric device performance by integrating chemically complex semiconductor material having nanoscale microstructure.
- a hybrid system-on-chip (“SoC”) computing module 100 is shown in a perspective view in FIG. 3A and a top view in FIG. 3B .
- the hybrid computing module 100 is formed by mounting at least one microprocessor die 102 A,B with at least one memory bank 104 A,B on a semiconductor chip carrier 106 .
- the semiconductor chip carrier 106 consists of a substrate, preferably a semiconducting substrate, upon which electrically conducting traces and passive circuit network filtering elements have been formed, and a plurality of semiconductor die and circuit modules have been mounted or monolithically integrated.
- the substrate may alternatively comprise an electrically insulating material that has high thermal conductivity, such as the MAX-phase materials referenced in de Rochemont '405, which enable substrate materials having electrical resistivity greater than 10¹⁰ ohm-cm and thermal conductivity greater than 100 W·m⁻¹·K⁻¹.
- the at least one microprocessor die 102 A,B is preferably a multi-core processor, which may be assigned logic, graphic, central processing, or math functions.
- the at least one memory bank 104 A,B is preferably configured as a stack of memory die and may be a Hybrid Memory CubeTM currently under development.
- the memory bank 104 A,B may optionally comprise an integrated circuit within the stack that provides memory controller functionality that arbitrates management issues and protocols with the microprocessor die 102 A,B.
- the controller chip stacked within the memory bank 104 A,B may comprise a field programmable gate array (FPGA), but is preferably a static address memory controller.
- FPGA field programmable gate array
- the semiconductor chip carrier 106 consists of a power management module 108 that is either mounted on to or monolithically integrated into the semiconductor chip carrier 106 , passive circuit networks 110 as needed to properly regulate the power bus 112 and interconnect bus 114 networks, ground planes 115 , input/output pads 116 , and timing circuitry that are fully integrated on to the semiconductor chip carrier using LCD methods described in de Rochemont and Kovacs '112 and de Rochemont '159.
- the semiconductor chip carrier 106 may additionally comprise standard bus functionality (not shown for clarity) in the form of circuitry that is integrated within its body to manage processing buffers, audio, video, parallel bus or universal serial bus (USB) functionality.
- the power management module 108 incorporates a resonant gate power transistor configured to reduce loss within the power management module 108 to levels less than 2% and to switch power regulating currents greater than 0.005 A at speeds greater than 250 MHz, preferably at speeds in the range of 600 MHz to 60 GHz, that can be tuned to match or support clock speed(s) of the microprocessor die 102 A,B, or to transfer data from main memory to the processor die at speeds that range from the processor clock speed to 1/10th the processor clock speed using methods and means instructed in de Rochemont '922A and '192.
- although FIGS. 3 A, 3 B only depict a single power management module for convenience, a plurality of power management modules 108 may be integrated into the semiconductor chip carrier 106 as may be needed to serve a particular design objective for the hybrid computing module 100 .
- digital radio systems incorporate baseband-processors to manage radio control functions (signal modulation, encoding/decoding, radio frequency shifting, etc.).
- Baseband processors manage lower frequency processes, but are often separated from the main CPU because they are highly dependent on timing and require certification of their software stack by government regulatory bodies.
- although the current invention enables the real-time processing needed to integrate the baseband processors with the CPU (see “stack-based computing” below), it might be advantageous to mount a certified baseband processor ( 102 B) separately from the main CPU ( 102 A) to avoid system certification delays.
- the design might also include an additional “off-stepped” power management module (not shown) that regulates power at lower switching speeds that are in-step with the baseband processing unit.
- the hybrid computing module may also comprise one or more electro-optic signal drivers 118 that interface the module with a larger computing or communications system by means of an optical waveguide or fiber-optic network through input/output ports 120 A, 120 B. Additionally, the hybrid computing module may also comprise application-specific integrated circuitry (ASIC) semiconductor die 122 that coordinate interactions between microprocessor die 102 A,B and memory banks 104 A,B. Although the ASIC semiconductor die 122 may have specific processor functions described below, it can also be used to customize memory management protocols to achieve improved coherency in low-volume to mid-volume applications, or to serve a specific functional need, such as radio signal modulation/de-modulation, or to respond to specific data/sensory inputs for which the computing module 100 was uniquely designed. Multiple cost, performance, footprint and power management benefits are enabled as a result of the module configuration defined by this invention.
- ASIC application-specific integrated circuitry
- the high efficiency (98+%) of the low-loss power management module 108 allows it to be placed in close proximity to the microprocessor die 102 A,B and memory banks 104 A,B.
- This ability to integrate low loss passive components operating at critical performance tolerances with active elements embedded within the semiconductor chip carrier 106 , or within semiconductor layers deposited thereupon, is used to resolve many of the technical constraints outlined above that lead to on-chip and off-chip data bottlenecks that compromise system performance in system-on-chip (“SoC”) product offerings.
- the efficient switching of large currents at speeds that match the processor clock(s) are achieved by integrating a resonant gate transistor into the monolithically integrated power management module 108 using the means and methods described in de Rochemont '922A and '192.
- the resonant response of the resonant gate transistor modulating the power management module's power FET is tuned to match core clock speeds in the microprocessor die 102 A,B. Designing the power management module to synchronously match off-chip memory latency and bandwidth to the needs of computing system cores allows data from physical memory banks 104 A,B to be efficiently transferred to and from processor cores, thereby mitigating the need for large on-chip cache memory in the microprocessor die 102 A,B.
- while prior reference is made to x86 microprocessor core architecture to establish visual clarity in FIGS. 1 A, 1 B, 1 C, the generic value of this invention applies to computing systems of any known or unknown 32-bit, 64-bit, 128-bit (or larger) microprocessor architecture.
- a preferred embodiment of the hybrid computer module utilizes multi-core processors 150 / 160 ( 102 A,B) that have less than 15%, preferably less than 10%, of their surface areas allocated to cache memory 152 / 162 as shown in FIGS. 4 A, 4 B.
- Multi-core processor die 150 that minimize the fractional percentage of semiconductor surface area allocated to cache memory 152 A, 152 B, 152 C, 152 D/ 162 A, 162 B, 162 C, 162 D, 162 E, 162 F and maximize real estate dedicated to processor core 154 functionality have smaller footprints, resulting in higher productivity yields and lower production costs.
- microprocessor die 150 wherein the ratio of processor cores 154 to cache memory 152 functionality is greater than 90% increases computing performance by more than 30%-50% per square millimeter (mm 2 ) of processor integrated circuitry.
- FIG. 4A illustrates the relative size of a scaled representation of a Nehalem quad-core microprocessor chip 150 fabricated using the 45 nm technology node if it were designed to have 10% of its surface area allocated to cache memory for comparison with FIG. 1A .
- the chip's surface area is allocated for 4 microprocessor cores 152 A, 152 B, 152 C, 152 D, and shared L3 cache memory 164 that has been reduced in size.
- the L3 cache memory 164 occupies roughly 10% of the surface area not allocated to system interconnect circuits.
- FIG. 4B illustrates a modified Westmere-EP 6 core microprocessor chip 160 fabricated using the 32 nm technology node that allocates less than 10% of its available surface area to L3 cache memory 164 to serve its 6 microprocessor cores 162 A, 162 B, 162 C, 162 D, 162 E, 162 F for comparison with FIG. 1C .
- the smaller size of the processor die's cache memory directly reflects smaller cache memory capacity.
- an alternative embodiment of the invention claims a computing system comprising a hybrid computing module 100 consisting of processor functionality 102 A,B and physical memory utility (memory banks) 104 A,B that is segregated onto discrete semiconductor die mounted upon a monolithically integrated semiconductor chip carrier 106 , wherein the processor die 102 A,B have on-board cache memory capacities less than 16 Mb/core, preferably less than 128 Kb/core.
- a subsequent embodiment of the invention, enabled by mounting microprocessor die 102 A,B and memory banks 104 A,B upon a semiconductor chip carrier 106 comprising a monolithically integrated, high-speed power management module 108 that synchronously switches power at processor clock speeds, provides real-time memory access by removing the need for direct-memory access updates from cache memory.
- main memory resources located in memory banks 104 A,B serve all stack-based and heap-based memory functionality for microprocessor die 102 A,B.
- the microprocessor die 102 A,B may be organized as distributed computing cells or serve as a fault-tolerant computing platform.
- An additional embodiment of the hybrid computer module 100 further reduces cost through the use of ASIC semiconductor die 122 A, 122 B to customize the performance of general purpose microprocessor systems for broader application to low- and mid-volume market sectors.
- the higher design and masking costs of the more advanced technology nodes cause SoC semiconductor die to be more expensive in low-volume 20 and mid-volume 22 market segments.
- An SoC device will integrate a plurality of functions into a single die.
- this enables the hybrid computing module 100 to incorporate general purpose microprocessor die 102 A,B and memory banks 104 A,B fabricated to the highest technology node and use ASIC semiconductor die 122 A, 122 B to tailor functions for a specific application.
- Semiconductor die adjacent to the microprocessor die 102 A, 102 B may provide any functional process to the hybrid computing module, including analog-to-digital or digital-to-analog functionality.
- Functionality provided by the ASIC semiconductor die 122 A, 122 B (or other die) and bus management circuitry embedded within the semiconductor chip carrier 106 may be fabricated using a lower technology node whenever it is possible to do so.
- a further embodiment of the hybrid computing module 100 uses methods described in de Rochemont '192, incorporated herein by reference, to integrate a semiconductor layer 130 , 132 , 134 that forms a 3D electron gas to maximize switching speeds of active components embedded within the semiconductor chip carrier 106 , the power management module 108 , or the electro-optic driver 118 , respectively, to further improve switching speeds within those devices.
- a thermoelectric module 140 in thermal communication with the unpopulated major surface 142 of the semiconductor chip carrier 106 may be used to pump heat generated by the active components mounted on or integrated into the chip carrier 106 to a thermal reservoir 144.
- a preferred embodiment of the thermoelectric module 140 utilizes methods and means described by de Rochemont '302, incorporated herein by reference, to integrate the thermoelectric module 140 into the hybrid computing module 100 .
- Thermoelectric modules may also be mounted onto a free surface of the various semiconductor die mounted onto the semiconductor chip carrier 106 .
- Pulsed power is required to access (read or write) and to refresh data stored within arrays of physical and cache memory.
- Larger memory banks require larger currents to strobe and transfer data from physical memory to the processor cores.
- Large latency, driven by the inability of alternative power management solutions to pulse sufficiently large currents at duty cycles close to processor core clock speeds, has necessitated the move to integrate larger cache memory 4 , 7 , 10 on conventional multi-core processor die 1 , 6 , 9 (see FIGS. 1 A, 1 B, 1 C).
- the larger cache memories mask the data transfer deficiencies and mitigate associated problems with memory coherence in computing platforms. These problems are resolved by improving the speed and efficiency of power management modules supplying the computing platform and providing means to maintain signal integrity within passive circuit and interconnect networks used to route high-speed digital signals within the system.
- Latency in asynchronous dynamic random access memory remains constant, so the time delay between presenting a column address and receiving the data on the output pins is fixed by the internal configuration of the DRAM array.
- Synchronous DRAM (SDRAM) modules organize a plurality of DRAM arrays in a single module.
- the column address strobe (CAS) latency in SDRAM modules is dependent upon the clock rate and is specified in clock ticks instead of real time. Therefore, computing systems that reduce latency in SDRAM modules by enabling large currents to be strobed at gigahertz clock speeds improve overall system performance through efficient, high-speed data transfers between physical memory and the processor cores.
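The relationship between CAS latency in clock ticks and real-time delay can be sketched numerically. A minimal Python illustration, where the tick count and clock rates are illustrative values, not figures from this specification:

```python
# CAS latency in SDRAM is specified in clock ticks, so the real-time delay
# scales inversely with the clock rate (illustrative values).

def cas_delay_ns(cas_ticks: int, clock_ghz: float) -> float:
    """Real-time column-access delay for a given CAS latency and clock rate."""
    return cas_ticks / clock_ghz  # ticks / (ticks per ns) = delay in ns

# The same CAS latency of 9 ticks at two different clock rates:
print(cas_delay_ns(9, 1.0))  # 9.0 ns at 1 GHz
print(cas_delay_ns(9, 3.0))  # 3.0 ns at 3 GHz
```

Raising the clock rate at which large currents can be strobed therefore directly shortens the real-time latency for a fixed tick count.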
- An embodiment of hybrid computing module 100 designs the power management module 108 to regulate currents greater than 50 A, preferably greater than 100 A.
- the hybrid computing module 100 situates the memory banks 104 A,B in close proximity to the microprocessor cores 102 A,B to reduce delay times and minimize deleterious noise influences.
- Tight tolerance passive elements enabled by LCD manufacturing methods integrated into the passive circuit networks 110 are used to improve signal integrity and control leakage currents by maintaining stable transmission line and filtering characteristics over standard operating temperatures. Methods that minimize loss in the magnetic cores of inductor and transformer components described in de Rochemont '222, incorporated herein by reference, are used to maximize the efficiency and signal integrity of passive circuit networks 110 and power management modules 108 .
- Matching off-chip memory latency and bandwidth to meet the needs of the computing systems' cores removes the need for large on-chip cache memories and improves coherence by maintaining all shared data in physical memory where it is simultaneously available to all processor cores. Removing on-chip memory constraints leads to roughly 35%-50% increase in performance per square millimeter (mm 2 ) of microprocessor real estate.
- a typical 6 core-Westmere-EP cpu 9 (see FIG. 1C ) operating at voltages between 0.75 V and 1.35 V and a switching speed of 3.0 GHz consumes 95 Watts.
- the same cpu driven at 4.6 GHz (a 54% increase in switching frequency) will consume 45% more power due to a combination of higher voltage and larger switching currents, assuming leakage is tightly controlled.
- the system will consume 150 W of supplied power when it is supplied by a power management device that has a 92% conversion efficiency.
- a hybrid computing module 100 comprising a high efficiency power management module 108 having a 98+% efficiency that is capable of driving large currents at switching speeds that match processor core clock speeds (2-50 GHz) improves performance and power consumption through superior conversion efficiencies and lower cpu operating voltages.
- a 9-core version of the same processor, reconfigured by eliminating on-chip L3 cache memory 10 , would consume 45% more power when operated at 3.0 GHz while occupying roughly the same footprint as the 6-core Westmere-EP cpu 9 .
- the hybrid computing module 100 provides a 2.3× (230%) increase in performance while decreasing CPU power consumption 17%, simply by eliminating power consumed in cache memory from the processor die.
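The supplied-power figure quoted above can be checked arithmetically from the stated values (95 W at 3.0 GHz, 45% more power at 4.6 GHz, 92% conversion efficiency):

```python
# Worked check of the power figures quoted in the text.
base_power_w = 95.0          # 6-core Westmere-EP cpu at 3.0 GHz
overclock_factor = 1.45      # 45% more power when driven at 4.6 GHz
converter_efficiency = 0.92  # conventional power management device

cpu_power = base_power_w * overclock_factor  # ~137.8 W consumed at the die
supplied = cpu_power / converter_efficiency  # power drawn from the supply
print(round(supplied))  # 150, matching the 150 W figure in the text
```

A 98% efficient power management module 108 would reduce the same supplied power to roughly 140.6 W before accounting for the lower operating voltages it also enables.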
- System-level performance comparisons are provided in Table I immediately below.
- By mitigating and greatly minimizing the need for cache-based heap memory, this invention results in smaller processor die than conventional chip designs and enables processor die cache memories that can be tasked primarily for stack-based resources. It is therefore another preferred embodiment of the invention to enable a direct memory access computing system wherein ≥50% of the cache memory, preferably 70% to 100% of the cache memory, is allocated to stack-based, rather than heap-based, memory functions. Therefore, a principal embodiment of the invention is a computing system wherein heap-based memory functionality (i.e. pointers which map cache memory to RAM) is removed entirely from cache memory and placed in main memory.
- a further embodiment of the invention provides for the management of stack-based and heap-based memory functions directly from physical or main memory. Additionally, changes in operational architectures would be possible due to synchronization between the system processor(s) and main memory. Further benefits include the removal of expensive control algorithms providing cache and memory coherency functionality as well as cache hit-miss prediction. Much flatter memory designs can be achieved removing the need for multiple layers of cache memory.
- The improved computer architectures and operating systems enabled by the hybrid computer module 100 are depicted in FIGS. 7A-7F .
- Computing systems that utilize cache memory to achieve higher speed require a memory management architecture 200 that employs predictive algorithms 202 located in cache memory 204 to manage the flow of data and instruction sets in and out of cache memory 204 .
- Memory coherence is maintained through invalidation-based or update-based arbitration protocols.
- the algorithms 202 reference a look-up table (register or directory) 206 , which may be located in cache memory 204 or physical memory 208 that contains a list of pointers 210 .
- the pointers reference addresses 212 where program stacks 214 comprising sequenced lists of data and process instructions that define a computational process are located in physical memory 208 .
- When the processor core 215 calls a selected program stack 214 , a copy of the called program stack 216 listing data and/or processes needed to serve a computational objective is loaded into the cache memory for subsequent processing by the processor unit 215 .
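The conventional cache-based flow of FIG. 7A can be sketched as follows; the dictionary layout and names are hypothetical stand-ins for the look-up table 206 , pointers 210 , program stacks 214 , and the cache-resident copy 216 :

```python
# Sketch of the conventional flow: a look-up table of pointers locates a
# program stack in physical memory, and a full COPY of that stack is
# loaded into cache before processing (names are illustrative).

physical_memory = {
    0x100: ["load a", "load b", "add", "store result"],  # program stack
}
lookup_table = {"task_A": 0x100}  # pointers referencing stack addresses

def call_program_stack(name):
    address = lookup_table[name]                 # pointer -> address
    cache_copy = list(physical_memory[address])  # stack copied into cache
    return cache_copy

cache = call_program_stack("task_A")
# Items in the cache copy are no longer independently addressable, so any
# change must be written back to physical memory wholesale for coherency:
physical_memory[lookup_table["task_A"]] = cache
```

The wholesale copy-in/copy-back at the end is precisely the "fetch"/"store" overhead the invention seeks to eliminate.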
- An additional deficiency of cache-based computing is the need to dedicate roughly 45% of the transistors in the processor 215 and 30%-70% of the code instructions to managing the “fetch”/“store” routines used to maintain coherency when copying a stack and returning the computed result back to main memory. Therefore, memory management architectures and computer operating systems that increase computational efficiencies by substantially reducing processor transistor counts and instruction sets are highly desirable for their ability to reduce processor size, cost, and power consumption while increasing computational speeds.
- FIG. 7B depicts the memory management architecture 220 that is another preferred embodiment of the invention.
- This embodiment overcomes the stack overflow limitations of conventional computing architectures 200 and eliminates the need for complex predictive memory management algorithms 202 by running program stacks directly from main memory 222 .
- the algorithms 202 are mitigated or eliminated in a hybrid computing module 100 when the resonant gate transistor in the fully integrated power management module 108 is tuned to switch power at speeds that enable the physical memory 222 to operate in-step with the clock speed of the processor unit 224 .
- While the look-up table 226 can be located in an optional cache memory 228 on-board the processor unit 224 , it is a preferred embodiment of the invention to locate the look-up table 226 in physical memory 222 .
- the invented architecture subsequently enables the processing unit 224 to render a memory management variable 230 to the look-up table 226 that selects the pointer 232 referencing the address 234 of the next set of data and/or processes in a program stack 236 needed by the processor unit 224 to complete its computational task.
- the availability of essentially unlimited bit-space in physical memory allows the variable 230 to instruct the look-up table 226 to reassign and reallocate addresses 234 to match the requirements of processed data and/or updated processes as they are loaded 238 in and out of the processing unit 224 .
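The memory management architecture 220 can be sketched minimally, assuming hypothetical addresses and item names: the look-up table resides in main memory, returns the next stack item selected by the memory management variable 230 , and can reassign addresses as results are written back:

```python
# Sketch of architecture 220: no stack copy into cache; the look-up table
# in main memory dereferences one pointer per request (names illustrative).

main_memory = {0x10: "step-1 data", 0x20: "step-2 instructions"}
lookup_table = {0: 0x10, 1: 0x20}  # pointers kept in main memory

def next_item(mgmt_variable):
    """Processor renders a management variable; the table returns the item."""
    return main_memory[lookup_table[mgmt_variable]]

def store_result(mgmt_variable, result, new_address):
    """The table reassigns/reallocates addresses as results are written back,
    exploiting the effectively unlimited bit-space of physical memory."""
    main_memory[new_address] = result
    lookup_table[mgmt_variable] = new_address  # pointer reallocated

assert next_item(0) == "step-1 data"
store_result(0, "updated data", 0x30)
assert next_item(0) == "updated data"
```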
- FIGS. 7 C, 7 D further illustrate the intrinsic benefits of a computer operating system enabled by the invention's memory management architecture 220 when it is applied to processing program stacks 240 through a single-threaded CPU processor 242 .
- a modern general purpose operating system 243 loads all declared program items comprising variables (global and local), data structures, and called functions, etc. (not shown in their entirety for clarity), contained within a program stack 240 directly from the computer's main memory 244 into the CPU cache memory 246 .
- the operating system 243 copies these items and organizes them as sequenced code blocks into a collection of program stacks 240 that are collectively stored as heap memory within main memory 244 .
- the operating system 243 organizes the items within the program stacks 240 stored in main memory 244 (or optionally loaded into cache memory 246 ) to be operated upon as a last-in-first-out (“LIFO”) series of variables and instruction sets.
- a computational process defined within a first selected program stack 240 A heaped in main memory 244 is copied and transferred 248 into the CPU cache memory 246 .
- the program stack copy 250 is then worked through item by item within the processor 242 , until it gets to the bottom of the program stack copy 250 . Since items within a stack copied into cache memory 246 are not independently addressable while in cache memory 246 , any changes made to a global variable 252 within the program stack copy 250 are reported 253 back to the look-up table 254 before the next program stack 240 is called and loaded into cache memory 246 . Items organized in program stacks 240 are independently addressable when they are heaped together in main memory 244 .
- the power reduction enabled by the hybrid computer module 100 that is cited for 6-core and 9-core processors in Table 1 can be further reduced by an additional 30%-75% through a more efficient operating system.
- a very meaningful embodiment of the invention shown in FIG. 7D is a computational operating system 265 enabled by the hybrid computing module 100 that uses the memory management architecture 220 to minimize power loss and wasted operational cycles.
- the operating system 265 compiles a collection of program stacks 266 heaped into main memory 267 , wherein the series of sequenced items 268 within each of the program stacks 266 are not copies of process-defining instruction sets and data 269 , but pointers 270 to the memory addresses 271 of the desired process-defining instruction sets and data 269 , which remain statically stored in main memory 267 .
- the top item 268 A of the first selected program stack 266 A is copied 273 into the memory controller 274 , which then uses the pointer 270 copied from the top item 268 A to load a copy 275 of the corresponding process-defining instruction set or data 269 A into the processor 272 .
- the operating system 265 executes the desired computational process by working its way through the first selected program stack 266 A by copying the next pointer 270 listed in the next item 268 of the first selected program stack 266 A and loading 275 its corresponding process-defining instruction sets and data 269 in the order their pointers 270 are organized in the first selected program stack 266 A.
- the loading process 273 is halted to allow the memory management variable 230 to notify the look-up table 277 of a change to the global variable 276 .
- the look-up table 277 in turn updates 278 A the global variable 276 at the address 271 where it is stored statically at its primary location in main memory 267 . There is no need to consume power and waste operational cycles updating the global variable 276 at multiple locations in main memory 267 , since the program stacks 266 never store copies of the global variable 276 ; they only comprise pointing items 268 B that store the pointer to global variable 270 A. This allows all program stacks 266 containing pointing items 268 B to remain unchanged and still operate as intended when called into the processor 272 following an update to the global variable 276 .
- the computational operating system 265 enables similar reductions in power consumption and wasted operational cycles during program jumps.
- the memory management variable 230 halts the loading process 273 before the discarded items 280 are copied and loaded into the controller 274 .
- the memory management variable 230 in turn uses the look-up table 277 to instruct the controller 274 to address the top item 281 on the new program stack 266 B.
- the memory management variable 230 may also be used to store new instruction sets and/or data 269 B defined by processes completed in the processor 272 at a new address 271 A in main memory 267 . While this embodiment achieves maximal efficiencies by maintaining stack-based and heap-based memory functions in main memory 222 , 244 , that does not preclude this computational operating system 265 from fully loading program stacks into an optional cache memory 228 and still falling within the scope of the invention.
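The pointer-based program stacks of the operating system 265 can be sketched as follows (addresses and item names are illustrative): because stacks store pointers 270 rather than value copies, a single update at the global variable's primary address is visible to every stack that points at it:

```python
# Sketch of operating system 265: program stacks hold POINTERS to
# statically stored instruction sets and data, never value copies.

main_memory = {0xA0: 5,           # global variable at its primary address
               0xB0: "add-step",  # process-defining instruction sets/data
               0xC0: "mul-step"}

# Two program stacks comprising pointers only:
stack_1 = [0xB0, 0xA0]
stack_2 = [0xC0, 0xA0]

def run(stack):
    """Controller dereferences each pointer and loads the stored item."""
    return [main_memory[ptr] for ptr in stack]

print(run(stack_1))    # ['add-step', 5]
main_memory[0xA0] = 7  # single update at the primary address
print(run(stack_2))    # ['mul-step', 7] -- no stack was rewritten
```

Neither stack is touched by the update, so no operational cycles are spent rewriting copies at multiple locations.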
- FIGS. 7E&7F illustrate the inherent benefits of the present invention when applied to resolving major operational inefficiencies in conventional multi-core microprocessor architectures 283 .
- a collection of code items for a program stack 284 (variables and instruction sets) is stored in main memory 285 .
- a program stack 286 is generated with stack subdivisions 286 A, 286 B, 286 C, 286 D and stored within the heap (not shown) located in main memory 285 .
- the stack subdivisions 286 A, 286 B, 286 C, 286 D are code blocks (“short stacks”) structured to be threaded between multiple processor cores 287 A, 287 B, 287 C, 287 D operating on a single multi-core microprocessor die 287 .
- When the program stack 286 is called by the processor 287 , the subdivisions 286 A, 286 B, 286 C, 286 D in the program stack 286 are copied and mapped 288 A, 288 B, 288 C, 288 D into the processor cores' 287 A, 287 B, 287 C, 287 D cache memory banks 289 A, 289 B, 289 C, 289 D where they are subsequently processed.
- the code blocks contain data, branching, iterative, nested loop, and recursive functions that operate on local and global variables.
- Each of the subdivisions 286 A, 286 B, 286 C, 286 D maintains a register 290 of the shared global variables that are simultaneously processed among the multiple processor cores 287 A, 287 B, 287 C, 287 D. Once an alert to a change in a global variable has been flagged by a register 290 , all of the processors have to be halted, since none of the items in the running code blocks within subdivisions 286 A, 286 B, 286 C, 286 D are independently addressable in the cache memory banks 289 A, 289 B, 289 C, 289 D.
- This requires a swap memory stack 291 to be created in main memory 285 where the uncompleted stack subdivisions 291 A, 291 B, 291 C, 291 D are copied and mapped 292 A, 292 B, 292 C, 292 D from the cache memory banks 289 A, 289 B, 289 C, 289 D in the multiple processor cores 287 A, 287 B, 287 C, 287 D.
- the swap stack registers 290 ′A, 290 ′B, 290 ′C, 290 ′D can update 293 the addressable items within the uncompleted stack subdivisions 291 A, 291 B, 291 C, 291 D.
- the uncompleted stack subdivisions 291 A, 291 B, 291 C, 291 D can be reloaded 294 A, 294 B, 294 C, 294 D back into their respective processor cores 287 A, 287 B, 287 C, 287 D so the computational process defined by the program stack 286 can be completed.
- this process (described with great simplification herein) requires intensive code executions to complete the mapping process and relies heavily upon “fetch”/“store” commands that are very wasteful of power budgeted to main memory 285 . Therefore, methods that sharply reduce the code complexity and minimize the usage of “fetch”/“store” commands while updating a global variable processed within a multi-core microprocessor die 287 are very desirable.
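The coherency penalty described above can be caricatured in a few lines of Python (all names hypothetical): one global-variable change forces every core's non-addressable cache copy to be swapped out to main memory, updated there, and reloaded:

```python
# Sketch of the FIG. 7E penalty: items in cache-resident code blocks are
# not independently addressable, so ALL cores halt and swap their
# uncompleted subdivisions to a swap stack in main memory for updating.

cache_banks = {core: ["op", ("global", 1), "op"] for core in range(4)}

def update_global(new_value):
    swap_stack = {}
    for core, block in cache_banks.items():
        swap_stack[core] = list(block)  # copy/map every cache bank out
    for core, block in swap_stack.items():
        # swap stack registers update the now-addressable items
        swap_stack[core] = [("global", new_value) if isinstance(i, tuple)
                            else i for i in block]
    for core in cache_banks:            # reload into each processor core
        cache_banks[core] = swap_stack[core]

update_global(2)
print(cache_banks[0])  # ['op', ('global', 2), 'op']
```

Every update thus costs a full round trip of copies for all four cores, which is the overhead the pointer-based scheme of FIG. 7F avoids.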
- the intrinsic efficiency of the disclosed multi-core operating system 295 is illustrated in FIG. 7F .
- the multi-core operating system 295 compiles and heaps a subdivided program stack 296 into main memory 267 , wherein the series of sequenced items 268 within each of the program stack subdivisions 296 A, 296 B, 296 C, 296 D are not copies of process-defining instruction sets and data 269 , but pointers 270 to the memory addresses 271 of the desired process-defining instruction sets and data 269 , which remain statically stored at their primary locations in main memory 267 .
- the memory controllers 274 A, 274 B, 274 C, 274 D then use the pointers 270 copied from the top items 268 W, 268 X, 268 Y, 268 Z to load copies 275 A, 275 B, 275 C, 275 D of the process-defining instruction set or data 269 A corresponding to the loaded pointers 270 into the processor cores 297 A, 297 B, 297 C, 297 D.
- FIGS. 8 A, 8 B, 9 A, 9 B illustrate embodiments of the invention that relate to a general purpose stack-machine computing module.
- Stack-machine computing architectures were used on many early minicomputers and mainframe computing platforms.
- the Burroughs B5000 remains the most famous mainframe platform to use this architecture.
- RISC eventually enabled register-based cache computing architectures to displace stack-machine computing in broader applications as general purpose computing grew in complexity and hardware limitations imposed stricter requirements on memory management.
- advances in software and hardware combined to make it difficult for stack-machine systems to operate High-Level Languages, such as ALGOL and the suite of C-languages derived from it.
- This rendered stack-machine computing inefficient in general purpose applications, though it remains an attractive option in limited-use/specific-purpose embedded processors.
- Stack machine architectures are also implemented in certain software applications (JAVA and Adobe POSTSCRIPT) by configuring the processor and cache memory as a virtual stack machine.
- a stack 300 (see FIG. 8A ) is an abstract data structure that exists as a restricted linear or sequential collection of items 302 that have some shared significance to the desired computational objective.
- the items are loaded into the stack 300 in a Last-In-First-Out (“LIFO”) structure, which is very useful for block-oriented languages.
- the stack contains a list of “operands” 304 a , 304 b , 304 c , 304 d , 304 e sequenced in the linear collection 302 near the top of the stack.
- operands 304 a , 304 b , 304 c , 304 d , 304 e are operated upon together in a controlled fashion through another linear series 306 of operations (“operators”) 308 a , 308 b , 308 c , 308 d .
- the individual operators 308 a , 308 b , 308 c , 308 d comprise primitive elements of a more complex algorithm encoded within the linear series 306 .
- Each of the individual operators 308 a , 308 b , 308 c , 308 d is applied using post-fix notation to the top of the stack 300 by means of push 310 and pop 312 commands that add and remove the operators 308 a , 308 b , 308 c , 308 d in their coded sequential order.
- Each of the operators 308 a , 308 b , 308 c , 308 d applies its primitive operation to the top two items in the sequential collection 302 .
- the first operator 308 a is applied to the top two operands 304 a , 304 b in the stack.
- the resultant value is then returned to the top of the stack as the operator is popped 312 off the top of the stack.
- the sequential process continues in post-fix notation until all of the remaining operators 308 b , 308 c , 308 d are applied to all of the remaining operands contained within the stack 300 to complete the algorithmic calculation.
- the stack will then comprise the resultant of 308 a applied to 304 a , 304 b , inserted at the top of the stack, followed by 304 c , 304 d , 304 e .
- the second operator 308 b is then applied to the resultant of 308 a applied to 304 a , 304 b and item 304 c , which now occupies the second position in the stack 300 .
- the process continues until the last operator 308 d is applied to the resultant of the preceding operations and the last operand item 304 e in the stack 300 .
- the final resultant is then inserted into the top of the stack 300 to be dispatched and used in the next step of the program.
- the stack 300 will typically contain non-operand items in the stack, such as addresses, function calls, records, pointers (stack, current program and frame), or other descriptors needed elsewhere in the computational process.
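The post-fix evaluation walked through above is the classic reverse-Polish scheme and can be sketched directly; the operator set and token list are illustrative, not from the specification:

```python
# Post-fix (RPN) evaluation as described for stack 300: each operator pops
# the top two operands, applies its primitive operation, and pushes the
# resultant back onto the top of the stack.

def evaluate_postfix(tokens):
    stack = []
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    for token in tokens:
        if token in ops:
            b = stack.pop()  # pop the top two items
            a = stack.pop()
            stack.append(ops[token](a, b))  # resultant returned to the top
        else:
            stack.append(token)  # push an operand
    return stack[0]  # final resultant at the top of the stack

# (2 + 3) * 4 written in post-fix notation:
print(evaluate_postfix([2, 3, "+", 4, "*"]))  # 20
```

Note that no interim value ever leaves the stack, which is why no fetch/store cycles to separate registers are needed mid-calculation.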
- FIG. 8B depicts how stacks are implemented in the most generic (simplest) conventional stack machine 320 .
- FIG. 8B also illustrates how stack machine computing is ideal for recursive computations, which progressively update and operate on the first two elements of a series, or nested functions that run a local variable through a series of operations until the desired output is generated.
- the data stack 322 , return stack 324 , program counter 326 , and the top-of-the-stack (“TOS”) register 328 are embedded in cache memory 330 integrated into the processor core 332 .
- the data stack 322 loads the top item of the stack into the top-of-the-stack (“TOS”) register or buffer 328 .
- the second item (now moved to the top) in the data stack 322 is simultaneously loaded through the data bus 334 as a pair with the item stored in the TOS register 328 into the arithmetic and logic computational unit (“ALU”) 336 where the primitive element operator (logical or arithmetic) is applied to the two operands.
- the resultant value of the ALU 336 is then placed in the TOS register 328 to be loaded back into the ALU 336 with the next item that has moved to the top of the data stack 322 .
- the program counter 326 stores the address within the ALU 336 of the next instruction to be executed.
- the program counter 326 may be loaded from the bus when implementing program branches, or may be incremented to fetch the next sequential instruction from program memory 338 located in main memory 340 .
- the ALU 336 and the control logic and instruction register (CLIR) 342 are located in the processor core 332 .
- the ALU 336 comprises a plurality of addresses consisting of transistor banks configured to perform a primitive arithmetic element that functions as the operator applied to the pair of items sent through the ALU 336 .
- the return stack is a LIFO stack used to store subroutine return addresses instead of instruction operands.
- Program memory 338 comprises a fair amount of random access memory and operates with the memory address register 344 , which records the addresses of the items to be read onto or written from the data bus 334 on the next system cycle.
- the data bus 334 is also connected to an I/O port 346 used to communicate with peripheral devices.
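The FIG. 8B data path (data stack 322 , TOS register 328 , ALU 336 ) can be sketched as a minimal virtual stack machine; the class and method names are hypothetical simplifications:

```python
# Minimal virtual stack machine in the FIG. 8B style: the top item is held
# in a TOS register, and each primitive pairs the TOS register with the
# next item popped from the data stack before running it through the ALU.

class StackMachine:
    def __init__(self):
        self.data_stack = []  # data stack
        self.tos = None       # top-of-the-stack (TOS) register

    def push(self, value):
        if self.tos is not None:
            self.data_stack.append(self.tos)  # old TOS moves onto the stack
        self.tos = value

    def alu(self, op):
        operand = self.data_stack.pop()   # next item, loaded as a pair
        self.tos = op(operand, self.tos)  # resultant placed back in TOS

m = StackMachine()
m.push(10)
m.push(4)
m.alu(lambda a, b: a - b)  # 10 - 4
print(m.tos)  # 6
```

Keeping the running value in the TOS register is what lets the resultant be re-paired with the next stack item on every cycle without a memory round trip.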
- the number of instructions needed in stack-based computing can be reduced by as much as 50% compared to the number of instructions needed by register-based systems because interim values are recorded within the stack 300 . This obviates the need to use additional processor cycles for multiple memory calls (fetch and restore) when manipulating a “local variable”.
- the code density of stack machines can be very compact since no operand fields and memory fetching instructions are required until the computational objective is completed. There is no need to allocate registers for temporary values or local variables, which are implicitly stored within the stack 300 .
- the LIFO structure also facilitates maintenance and storage of activation records within the stack 300 during the transfer of programmatic control to subroutines.
- the utility of stack machines has become limited in more complex operations that require pipelining and multi-threading, or the maintenance of real-time consistency of global values over a broader network such as a computing cloud.
- In early stack machines, stacks 300 were processed entirely in main memory. While this approach made the system slow, it allowed all items in the stack 300 to be independently addressable. However, as microprocessor speeds increased beyond the ability of physical memories to keep pace, stacks had to be loaded into cache memory where the items are not independently addressable. This limitation amplified the intrinsic inflexibility of working with restricted sequential collections of operand items 302 and linear instruction sets 308 . Consequently, modern stack machines started losing their competitive edge as general purpose applications required larger numbers of global variables to maintain their consistency as they are being simultaneously processed in various program branches within a plurality of stacks that could be located across a multiplicity of processor cores. Additionally, some computational problems require conditional problem solving where it is advantageous to modify a sequence of instructions based upon the conditional response of an earlier computation.
- stack architectures remain a preferred computing mode in limited small-scale and/or embedded applications that require high computational efficiencies because of their ability to be configured in ways that make computational use of every single available CPU cycle.
- This intrinsic advantage to stack architectures further enables fast subroutine linkage and interrupt response.
- These architectures are also emulated in virtual stack machines that make a less than efficient use of memory bandwidth and processing power. It is therefore desirable to provide a general purpose stack machine and operating system that processes computational problems with minimal instruction sets and transistor counts to minimize power consumption.
- FIGS. 9 A, 9 B illustrate the general purpose stack machine module 350 that applies the memory management architecture 220 and computational operating system 265 to a conventional stack machine processor architecture.
- the general purpose stack-machine computing module 350 incorporates a hybrid computing module 100 wherein the module's main memory bank 352 has been allocated into multiple groupings comprising a stack memory group 354 , a CPU/GPU memory group 356 , a global memory group 358 , a redundant memory group 360 , and a general utility memory group 362 .
- Each of the memory groupings 354 , 356 , 358 , 360 , 362 has its own memory address register/look-up table 355 , 357 , 359 , 361 , 363 and internal program counter 364 , 366 , 368 , 370 , 372 to administer program blocks assigned to the grouping.
- the general purpose stack machine computing module's 350 operating system segregates its functional blocks to maximize the efficiencies enabled by the invention.
- Instruction sets and associated variables within nested functions and recursive processes are organized and stored in the stack memory group 354 , which interfaces with the general purpose stack processor 374 designed to run with optimal code, power, and physical size efficiencies.
- Block program elements that have an iterative code structure have their instruction sets and associated variables stored and organized in the CPU/GPU memory group 356 .
- Global variables, master instruction sets, and the master program counter are stored in the global memory group 358 , which interfaces with a master processor.
- the master processor could be either the CPU/GPU processor(s) 376 or the general purpose stack processor 374 and administers the primary iterative code blocks.
- the redundant memory management group 360 is used to interface the general purpose stack machine computing module 350 with redundant systems or backup memory systems connected to the module through its I/O system interface 378 .
- the general utility memory management group 362 can be subdivided into a plurality of subgroupings and used to manage any purpose not delegated to the other groups, such as system buffering, or memory overflows.
- a master controller and instruction register 380 coordinate data and process transfers and function calls between the main memory bank 352 , the CPU/GPU processor(s) 376 , the general purpose stack processor 374 , and the I/O system interface 378 .
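The memory-bank allocation described above can be sketched as a simple data structure (a hedged sketch; the group names, lookup-table layout, and methods below are illustrative assumptions, not the patent's data layout): each grouping carries its own address look-up table and internal program counter.

```python
# Sketch of the main memory bank 352 allocated into groupings, each with
# its own memory address register/look-up table and program counter.
class MemoryGroup:
    def __init__(self, name):
        self.name = name
        self.lookup = {}          # memory address register / look-up table
        self.pc = 0               # internal program counter for the group

    def assign(self, label, address):
        # record where a program block assigned to this grouping lives
        self.lookup[label] = address

# the five groupings named in the text, as one bank
bank = {g: MemoryGroup(g) for g in
        ("stack", "cpu_gpu", "global", "redundant", "utility")}
bank["stack"].assign("nested_fn_block", 0x1000)   # nested/recursive code
bank["global"].assign("master_pc", 0x2000)        # master program counter
```

Because each group keeps its own counter and table, program blocks in one grouping can be administered without touching the address state of the others.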
- Stack machine computers have demonstrated clear efficiency gains, measured in terms of processing speed, transistor count (size), power efficiency, and code density minimization, when applied to nested and recursive functions.
- While conventional processors using register-based architectures can be configured as virtual stack machines, considerable power and transistor count savings are only achieved by applying structured programming languages (FORTH and POSTSCRIPT) to processors having matching machine code.
- the Computer Cowboys MuP21 processor, which had machine code structured to match FORTH, managed 100 million instructions per second (“MIPS”) with only 7,000 transistors consuming 50 mW. This represented a 1,000-fold decrease in transistor count, with associated benefits to component size/cost and power consumption, over equivalent processors utilizing conventional register architectures.
- a specific embodiment of the general purpose stack machine computing module 350 incorporates an ASIC semiconductor die 122 to function as the module's stack processor 374 , wherein the ASIC die 122 is designed with machine code that matches and supports a structured programming language, preferably the FORTH or POSTSCRIPT programming languages. Since the primary objective of the invention is to develop a general purpose stack machine computing module, and an FPGA can be encoded with machine code that matches a structured programming language, a preferred embodiment of the invention comprises a general purpose stack machine computing module 350 that incorporates an FPGA as its stack processor 374 , or an FPGA configured as a stack processor 374 comprising multiple processing cores (not shown to avoid redundancy).
- the general purpose stack machine computing module's 350 operating system organizes the stack memory group 354 (see FIG. 9A ) to have a data stack register 382 , a return stack register 384 , and one or more instruction stack registers 386 .
- the one or more instruction registers 386 are used to store functions or subroutines as operator sequences, and can also be used to store instructions used by the stack processor 374 for retrieval at a later time.
- Each stack register 382 , 384 , 386 comprises memory cells 388 that contain the memory address (or pointer) of the item to be loaded into the program stack.
- the flexibility to change from one program stack to another, or change a global variable that is buried within a program stack, further allows program stacks to be manipulated using Last-In-First-Out (“LIFO”) or First-In-First-Out (“FIFO”) stack structures.
- the stack buffer utility 390 loads the first desired operand from the stack main memory 392 into the top-of-the-stack (TOS) buffer 394 through the data bus 395 , while the next item-address listed in the data stack register 382 is loaded into the stack buffer utility 390 to configure it to load the second item in the data stack into the ALU operand buffer 396 during the subsequent operational cycle.
- the buffer utility 390 may also store a plurality of items that are address-mapped into its local register in exact sequence with the LIFO structure of the data stack register 382 .
- This process of using the stack buffer utility 390 to translate a LIFO structure of item-addresses into a self-consistent list of items at processor clock-speeds allows a pre-determined sequence of operands to be loaded into the ALU operand buffer 396 as though the sequence was loaded directly from the data stack register.
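The pointer-dereferencing behavior of the stack buffer utility 390 can be sketched as follows (a hedged sketch of the assumed behavior; the addresses, values, and function name are illustrative): the data stack register holds item addresses, and the buffer utility dereferences them against stack main memory in LIFO order so the ALU sees a stream of operand values, as though the items had been loaded directly from the register.

```python
# The data stack register holds *addresses* (pointers); the buffer
# utility dereferences each one against stack main memory so items
# remain independently addressable while still arriving in LIFO order.
stack_main_memory = {0x10: 7, 0x11: 5, 0x12: 3}   # address -> item
data_stack_register = [0x10, 0x11, 0x12]          # LIFO list of pointers

def next_operand():
    address = data_stack_register.pop()           # top-of-stack pointer
    return stack_main_memory[address]             # dereference to item

tos = next_operand()          # item at 0x12, destined for the TOS buffer
alu_operand = next_operand()  # item at 0x11, for the ALU operand buffer
```

Because only addresses live in the register, an item buried in the stack can still be reached (and, as described later, updated) directly in main memory without unwinding the stack.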
- the return register 384 comprises the list of addresses used to permanently store a block of instructional code so that control can be returned when the stack processor 374 has completed the block calculation. Similarly, the return register 384 is also used to list the address used to temporarily house a block of code that was interrupted, so it can be retrieved following a status interrupt and reinstated to complete its original task. These lists are also formatted in LIFO structure to more easily maintain programmatic integrity.
- the instruction stack register 386 comprises a LIFO list of pointers to locations within the ALU 398 that represent specific machine-coded logical operations to be used as primitive element operators as described in FIG. 8A .
- the ALU address pointers in the instruction stack register 386 are sequenced to match the primitive element algorithmic series to be applied to an associated set of operands that will be loaded in tandem into the ALU 398 .
- the LIFO sequence of operator addresses is compiled as a list of operators to complete any recursive or nested loop calculation desired with the stack processor 374 .
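The operator-pointer sequencing can be illustrated with a short sketch (illustrative assumptions throughout: the primitive-operator table, addresses, and function are not from the patent): the instruction stack register holds pointers to ALU primitives, and popping them in LIFO order applies the operator series to operands loaded in tandem.

```python
# The instruction stack holds *pointers* to ALU primitive operators;
# popping them in LIFO order yields the operator series for a nested
# calculation, mirroring the operand stack's LIFO discipline.
alu_primitives = {0xA0: lambda a, b: a + b,   # machine-coded "add"
                  0xA1: lambda a, b: a * b}   # machine-coded "multiply"
instruction_stack = [0xA1, 0xA0]              # LIFO: "+" applied first

def evaluate(operands):
    stack = list(operands)
    while instruction_stack:
        op = alu_primitives[instruction_stack.pop()]  # dereference pointer
        b, a = stack.pop(), stack.pop()
        stack.append(op(a, b))                # result feeds next operator
    return stack.pop()

value = evaluate([4, 2, 3])   # computes 4 * (2 + 3)
```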
- the mathematical operators in the instruction register 386 are loaded into the ALU 398 by means of an instruction set utility 400 .
- the instruction set utility 400 activates input paths within the ALU 398 that load the operands stored in the TOS 394 and ALU operand 396 buffers into the prescribed logical operator.
- the general purpose stack processor 374 allows all of the items specified in the data and instruction “stacks” ( 382 , 386 ) to be processed in a manner consistent with a conventional stack machine using minimal instruction sets, transistor counts, chip size, and power consumption.
- the instruction set utility 400 can also be configured to record and copy a programmable fixed number of operand pairs and operators so they can be played back again through the ALU 398 in proper sequence without affecting the instruction register 386 .
- a principal benefit of the stack processor 374 over the prior art, and its major distinction from it, is its ability to use the memory management architecture 220 and computational operating system 265 to modify any global variable buried within a data stack 300 “on-the-fly”, without a need to transfer the sequenced items in and out of cache to main memory to effectuate the global variable update, or to waste operational cycles when making a program jump.
- This aspect of the invention couples a stack machine's inherent ability to execute fast subroutine linkages and interrupt responses with the invention's ability to load addressable items directly from main memory at speeds in step with the processors' operational cycle.
- This embodiment further enables the stack processor 374 to respond to a conditional logic interrupt triggered outside the stack or elsewhere in the system so it can operate alongside pipelined and multi-threaded CPU/GPU processor cores.
- This aspect of the invention allows the general purpose stack machine computing module 350 to support pipelined or multi-threaded general purpose architectures, which are additional embodiments of this invention.
- An update to a buried global value is effectuated when an alert from the master controller and instruction register 380 signals that a global variable has been changed somewhere in the system.
- the global variable could be changed in additional cores within the stack processor 374 , a neighboring CPU/GPU core 376 , or another general purpose stack machine computing module 350 configured as a distributed or fault-tolerant computing element, or a networked system connected to the module 350 through the I/O system 378 .
- the master controller and instruction register 380 activates commands over the status interrupt bus 402 to temporarily halt traffic over the data bus 395 . While data traffic is temporarily halted, the addressable item stored in stack main memory 392 that corresponds to the address pointer of the global variable loaded into the data stack register 382 is refreshed with the updated value from the global variable register 404 . Once the updated global variable is confirmed, the global variable register 404 signals the master controller and instruction register 380 to resume traffic over the data bus 395 .
- the updated value is loaded into the instruction set utility 400 during the system interrupt.
- the instruction set utility 400 then overrides the previously loaded operand with the updated global value during the cycle it is scheduled to be operated upon within the ALU 398 .
- the instruction set utility 400 is instructed to playback in reverse order the operands and operators it has copied and recorded, and then substitute the updated global value for the obsolete value before the interrupt is released.
- the instruction set utility 400 can use a series of operands and operators stored in the instruction stack register 386 to re-calculate the function with the updated global variable, if desired.
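The halt/patch/resume sequence for a buried global can be sketched as follows (a hedged sketch: bus signalling is modeled as a simple flag, and all names, addresses, and values are illustrative assumptions): traffic is halted, the item in stack main memory at the pointed-to address is refreshed in place, and traffic resumes.

```python
# Sketch of the buried-global update: because the data stack register
# holds pointers, the item can be patched in place in main memory --
# no cache flush, no re-pushing of the sequenced stack items.
stack_main_memory = {0x20: 10, 0x21: 99}   # 0x21 holds the global value
data_stack_register = [0x20, 0x21]         # the global is "buried" here
bus_halted = False

def update_buried_global(address, new_value):
    global bus_halted
    bus_halted = True                       # status interrupt: halt the bus
    stack_main_memory[address] = new_value  # refresh the item in place
    bus_halted = False                      # confirm update, resume traffic

update_buried_global(0x21, 42)              # global changed elsewhere
```

The stack register itself is untouched; the next dereference of pointer 0x21 simply yields the updated value.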
- the memory management flexibility enabled by the invention further provides a general purpose stack machine computing module 350 comprising a general purpose stack processor 374 that can be halted by a logical interrupt command to accommodate instructions that re-orient the computational program to a block stored within the module main memory bank 352 , or to an entirely new set of instructions that are pipelined in or threaded with other processors within or in communication with the module 350 .
- an interrupt flag originating from an internal logical process alerts the master controller and instruction register 380 to change the direction of the program based upon a pre-specified logical condition using any of the embodiments specified above, such as giving priority access to certain processes scheduled to run in the stack processor 374 or updating a global variable across main memory bank 352 , or any peripheral memory (not shown) networked to main memory bank 352 .
- the master controller and instruction register 380 issues commands to halt traffic on the data bus 395 until the logical interrupt register 408 has loaded the high priority program blocks into the data stack 382 , return stack 384 , and instruction stack 386 registers, with all associated items placed in the stack memory group's 354 main memory 392 .
- the pointers previously loaded into the registers can either be pushed further down the register, or redirected to other locations within the module main memory bank 352 . Traffic is then restored to the data bus 395 , allowing the higher priority process to run through to completion so the lower priority process can then be restored.
- the logical interrupt register 408 alerts the master controller and instruction register 380 to halt traffic on the data bus 395 .
- the stack program controller 406 coordinates with the instruction set utility 400 to record and store the state of the existing process so it can be restored at a later instance, while the logical interrupt register 408 pipelines the items from the external processor core(s) (not shown) through the status interrupt bus 402 .
- Additional data stack 382 , return stack 384 , and instruction set 386 registers may be allocated during the process and the imported items could be stored in any reliable location in main memory bank 352 .
- Pointers related to the threaded or pipelined processes reference address locations accessed through the I/O interface system 378 . Traffic over the data bus is reinitiated to activate computational processes in the stack processor 374 , and the threaded processes/data may be interleaved to run continually with the internal processes.
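The save/run/restore flow for a priority interrupt can be sketched as follows (illustrative assumptions throughout: the saved-state list, register contents, and step callback stand in for the stack program controller, registers, and ALU): the running process state is recorded, the high-priority block is loaded into the registers and run to completion, and the saved state is then reinstated.

```python
# Sketch of the priority-interrupt flow: record the state of the
# existing process, load and run the high-priority block, then
# restore the interrupted process so it completes its original task.
saved_states = []            # contexts saved by the stack program controller

def run_with_priority(registers, high_priority_block, step):
    saved_states.append(list(registers))   # record current process state
    registers[:] = high_priority_block     # load high-priority pointers
    while registers:
        step(registers.pop())              # run through to completion (LIFO)
    registers[:] = saved_states.pop()      # restore the interrupted process

trace = []
regs = [0x30, 0x31]                        # pointers of the running process
run_with_priority(regs, [0x40, 0x41], trace.append)
```

After the call, the high-priority pointers have been consumed in LIFO order and the original register contents are back in place.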
Abstract
A hybrid system-on-chip provides a plurality of memory and processor die mounted on a semiconductor carrier chip that contains a fully integrated power management system that switches DC power at speeds that match or approach processor core clock speeds, thereby allowing the efficient transfer of data between off-chip physical memory and processor die.
Description
- This application claims priority from U.S. Provisional Patent Application Ser. No. 61/669,557, filed Jul. 9, 2012 and from U.S. Provisional Patent Application Ser. No. 61/776,333, filed Mar. 11, 2013, both of which are incorporated herein by reference in their entirety.
- The present invention relates generally to the construction of customized system-on-chip computing modules and specifically to the application of semiconductor carriers comprising a fully integrated power management system that transfers data between at least one memory die mounted on the semiconductor carrier at speeds that are synchronous or comparable with at least one general purpose processor die co-located on the semiconductor carrier.
- The present invention relates generally to methods and means that reduce the physical size and cost of high-speed computing modules. More specifically, the present invention instructs methods and means to flexibly form a hybrid computing module designed for specialized purposes that serve low market volume applications while using lower cost general purpose multi-core microprocessor chips having functional design capabilities that are generally restricted to high-volume market applications. In particular, the invention teaches the use of methods to switch high current (power) levels at high (GHz frequency) speeds by means of a semiconductor carrier comprising a fully integrated power management system to maximize utilization rates of multi-core microprocessor chips having considerably more stack-based cache memory and little or no need for on-board heap-based cache memory, thereby enabling higher performance, smaller overall system size, and reduced system cost in specialized low volume market applications.
- Until recently, gains in computer performance have tracked with Moore's law, which states that transistor integration densities will double every 18 months. Although the ability to shrink the size of the transistor has led to higher switching speeds and lower operating voltages, the ultra-large scale integration densities achievable through modern manufacturing methods have led to a leveling off in corresponding improvements in computer performance due to the large currents needed to power the ultra-large numbers of transistors. Silicon chips manufactured at the 22 nm manufacturing node will draw 700 W per square inch of semiconductor die. This large current draw, needed to refresh and move data between die and across the surface of a single die, has pushed the limits of conventional power management circuits, which are restricted to significantly lower switching speeds. The large thermal loads generated by conventional power management systems further reduce system efficiency by requiring power management to be located significant distances from the processor and memory die, thereby adding loss through the power distribution network. Therefore, methods that reduce system losses by providing means to fabricate a hybrid computing module comprising power management systems that generate sufficiently low thermal loads to be situated in close proximity to the memory and microprocessor die are desirable.
- As is typically the case with transistors, higher power switching speeds are achieved in conventional power management by shrinking the surface area of the transistor gate electrode in power FETs. In conventional transistor architectures switching speeds are limited by gate capacitance, according to the following:
- f = I_ON / (C_OX × W × L × V_dd)  (1)
- where,
- f ≡ limiting switch frequency (1a)
- I_ON ≡ source current (1b)
- C_OX ≡ gate capacitance (1c)
- W ≡ gate width (1d)
- L ≡ gate length (1e)
- V_dd ≡ drain voltage (1f)
- Switching speed/frequency is increased by minimizing the gate capacitance (C_OX) and the gate electrode surface area (W×L). However, minimizing gate electrode surface areas to achieve high switching speeds imposes self-limiting constraints in high power systems (>100 Watts) when managing large low voltage currents, as the large switched current is forced through small semiconductor volumes. The resultant high current densities generate higher On-resistance, which becomes a principal source of undesirable high thermal loads. Modern computing platforms require very large supply currents to operate due to the ultra-large number of transistors assembled into the processor cores. Higher speed processor cores require power management systems to function at higher speeds. Achieving higher speeds in the power management system's power FET by minimizing gate electrode surface areas creates very high current densities, which in turn generate high thermal loads. The high thermal loads require complex thermal management devices to be designed into the assembled system and usually require the power management and processor systems to be physically separated from one another for optimal thermal management. Therefore, methods and means to produce a hybrid computing module that embeds power management devices in close proximity to the processor cores to reduce loss, and that contains power FETs that switch large currents comprising several tens to hundreds of amperes at high speeds without generating large thermal loads, are desirable.
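As a quick numeric check of Eq. (1), the sketch below evaluates the limiting switch frequency for a set of illustrative device values (assumptions for illustration only, not values taken from this disclosure), and confirms the text's point that shrinking the gate area W×L raises the switching frequency:

```python
# Numeric check of Eq. (1): f = I_ON / (C_OX * W * L * V_dd),
# with C_OX taken as gate capacitance per unit area (F/m^2).
# All device values below are illustrative assumptions.
I_on = 1e-3      # source current, A
C_ox = 0.01      # gate capacitance per unit area, F/m^2 (~1 uF/cm^2)
W = 1e-6         # gate width, m
L = 1e-7         # gate length, m
V_dd = 1.0       # drain voltage, V

f_limit = I_on / (C_ox * W * L * V_dd)        # limiting switch frequency, Hz

# Halving the gate length (and hence the gate area W*L) doubles f_limit,
# matching the trade-off described in the text.
f_half_area = I_on / (C_ox * W * (L / 2) * V_dd)
```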
- The inability of modern power management to switch large currents at speeds that keep pace with ultra-large scale integration (“ULSI”) transistor switching speeds has led to on-chip and off-chip data bottlenecks, as there is insufficient power to transfer data from random-access memory stacks into the processor cores. These bottlenecks leave the individual cores in multi-core microprocessor systems under-utilized as they wait for data to be delivered. Low core utilization rates (<25%) in multi-core microprocessors (quad core and greater) with minimal cache memory have forced manufacturers to add large cache memory banks to the processor die. The popular solution to this problem has been to allocate 30% or more of the modern microprocessor chip to cache memory circuits. In essence, this approach only masks the “data bottleneck” problem caused by having insufficient power to switch data stored nearby in physical random-access memory banks. This requirement weakens the economic impact of Moore's Law by reducing the processor die yield per wafer, as the microprocessor die must allocate a substantial surface area to transistor banks that serve non-processor functions compared to the surface area reserved exclusively for logic functionality. The large loss of available processor real estate to cache memory in multi-core x86 processor chips is illustrated in FIGS. 1A, 1B, 1C.
FIG. 1A presents a scaled representation of a Nehalem quad-core microprocessor chip 1 fabricated using the 45 nm technology node. A portion of the chip's surface area, approximately 30% of the total die surface area, is allocated to cache memory and system interconnect circuits 5A, 5B serving the 4 microprocessor cores. Similarly, the Westmere dual-core microprocessor chip 6 (FIG. 1B) fabricated using the 32 nm technology node allocates approximately 35% of its total available surface area to L3 cache memory 7 to serve its 2 microprocessor cores 8A, 8B. The Westmere-EP 6-core microprocessor chip 9 (FIG. 1C), also fabricated using the 32 nm technology node, allocates approximately 35% of its total available surface area to L3 cache memory 10 to serve its 6 microprocessor cores.
- Another major drawback to Moore's Law is the extremely high manufacturing costs at the smaller technology nodes. These extreme costs have the potential to greatly restrict the scope of low-cost computing applications in all but the largest applications.
FIG. 2A shows the average costs of masks used to photolithographically pattern an individual material layer embedded within an integrated circuit assembly as a function of the manufacturing technology nodes. A key technology objective has been to integrate entire electronic systems on a chip. However, the significantly higher mask costs cause design and lithography costs to skyrocket at the more advanced technology nodes (45 nm & 32 nm). FIG. 2B shows the variation of design and lithography costs per function (memory, processor, controller, etc.) among the different technology nodes (65 nm, 45 nm, 32 nm), normalized to the fabrication cost at the 90 nm technology node, for system-on-chip (“SoC”) devices serving low volume 20, medium volume 22, and high volume (general purpose) 24 technology applications. The increasing design and lithography costs cause SoC applications fabricated to the more advanced technology nodes (45 nm and 32 nm) to be more expensive in low-volume 20 and medium-volume 22 markets than they would be when fabricated to the less advanced technology nodes (90 nm and 65 nm). These cost constraints cause general purpose SoC applications 24 to be the only instance in which cost, size, and power benefits can be simultaneously achieved with the more advanced technology nodes. Markets are not monolithic, which causes low- and medium-volume applications to dominate overall market volumes in the aggregate. Therefore, methods and means that allow the cost, size, and power savings achieved with general purpose semiconductor systems made through the more advanced technology nodes (45 nm, 32 nm, and beyond) to be integrated into hybridized SoC designs serving the wider utility low-volume and medium-volume market applications are desirable.
- The ability of the semiconductor industry to shrink the size of individual transistors, so that the number of transistors that can be integrated into a square unit of a silicon chip's surface doubles every year, has propelled computing performance on a path of exponential growth. While this path has led to exponential growth in computing performance and substantial reductions in chip unit costs, Moore's Law has had some consequences that have started restricting the industry's options. First, the design, mask, and fabrication costs have grown exponentially. Secondly, limitations related to the long design times and extremely high foundry costs have thinned the number of chip producers in the marketplace. Lastly, as emphasized below, the inadequacy of signal routing through printed circuit boards has forced more circuit functionality to be integrated onto a single chip.
- Current industry roadmaps envision a complete System-on-Chip (“SoC”), which places all circuit functionality (processors, memory, field programmability, etc.) on a single semiconductor chip. This perception has emerged from recent history. As signal routing through printed circuit boards inhibited the ability to transfer data from main memory at microprocessor clock speeds, cache memory banks became a requirement for all CPUs. As cache memory management caused a single-threaded CPU to generate more heat than can reasonably be transferred using market-acceptable thermal management solutions, multi-core processors were developed to drive large numbers of transistors at higher speeds in parallel to keep pace with the exponential growth in performance demanded by the marketplace. It is now generally accepted that in 2015 it will no longer be possible to supply sufficient power to multi-core microprocessors to drive all the transistors at higher speeds. The current solution being proposed by the industry is to integrate the full functionality of all circuitry onto a single SoC. While this will not allow all transistors to operate simultaneously, nor will it allow them all to operate at higher speeds, this proposed solution will keep pace with the exponential growth curve the industry is accustomed to.
- The problem with this solution will be marketplace acceptance. In 1996, National Semiconductor had acquired all the intellectual property needed to integrate a laptop computer onto a single chip. While elegant, this solution failed for several reasons. First, the marketplace was too fragmented for a one-size-fits-all solution. Secondly, the marketplace was changing too fast to digest the 2-year minimum design cycles needed to produce the one-size-fits-all solution. Economic history has clearly demonstrated that flexible hybrid solutions are much preferred solutions to system consolidation in the broader marketplace.
- The terms “active component” or “active element” are herein understood to refer to their conventional definition as an element of an electrical circuit that does require electrical power to operate and is capable of producing power gain.
- The term “atomicity” is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as an indivisible block of programming code that defines an operation that either does not happen at all or is fully completed when used.
- The term “cache memory” herein refers to its conventional meaning as an electrical bit-based memory system that is physically located on the microprocessor die and used to store stack variables and main memory pointers or addresses.
- The terms “chemical complexity”, “compositional complexity”, “chemically complex”, or “compositionally complex” are herein understood to refer to a material, such as a metal or superalloy, compound semiconductor, or ceramic that consists of three (3) or more elements from the periodic table.
- The term “chip carrier” is herein understood to refer to an interconnect structure built into a semiconductor substrate that contains wiring elements and active components that route electrical signals between one or more integrated circuits mounted on the chip carrier's surface and a larger electrical system that they may be connected to.
- The term “coherency” or “memory coherence” is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as an issue that affects the design of computer systems in which two or more processors or cores share a common area of memory and the processors are notified of changes to shared data values in the common memory location when it is updated by one of the processing elements.
- The term “consistency” or “memory consistency” is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as a model for distributed shared memory or distributed data stores (file systems, web caching, databases, replication systems) that specifies rules that allow memory to be consistent and the results of memory operations to be predictable.
- The term “computing system” is herein understood to mean any microprocessor-based system comprising a register compatible with 32, 64, 128 (or any integral multiple thereof) bit architectures that is used to electrically process data or render computational analysis that delivers useful information to an end-user.
- The term “critical performance tolerances” is herein understood to refer to the ability for all passive components in an electrical circuit to hold performance values within ±1% of the desired values at all operating temperatures over which the circuit was designed to function.
- The term “die” is herein understood to refer to its conventional meaning as a sectioned slide of semiconductor material that comprises a fully functioning integrated circuit.
- The term “DMA” or Direct Memory Access is herein understood to mean a method by which devices either external or internal to the system's chassis, having a means to bypass normal processor functionality, update or read main memory and signal the processor(s) that the operation is complete. This is usually done to avoid slow memory controller functionality and/or in cases where normal processor functionality is not needed.
- The term “electroceramic” is herein understood to refer to its conventional meaning as being a complex ceramic material that has robust dielectric properties that augment the field densities of applied electrical or magnetic stimulus.
- The term “FET” is herein understood to refer to its generally accepted definition of a field effect transistor wherein a voltage applied to an insulated gate electrode induces an electrical field through insulator that is used to modulate a current between a source electrode and a drain electrode.
- The term “heap memory” is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as a large pool of memory, generally located in RAM, that has divisible portions dynamically allocated for current and future memory requests.
- The term “Hybrid Memory Cube” is herein understood to refer to a DRAM memory architecture that combines high-speed logic processing within a stack of through-silicon-via bonded memory die and is under development through the Hybrid Memory Cube Consortium.
- The term “integrated circuit” is herein understood to mean a semiconductor chip into which a large, very large, or ultra-large number of transistor elements have been embedded.
- The term “kernel” is herein understood to refer to its conventional meaning in computer operating systems as the communications interface between computing applications and the data processing hardware; it manages the system's lowest-level abstraction layer, controlling basic processor and I/O device resources.
- The terms “latency” or “column address strobe (CAS) latency” are herein understood to mean the delay time between the moment a memory controller tells the memory module to access a particular memory column on a random-access memory (RAM) module and the moment the data from the given memory location is available on the module's output pins.
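For illustration only, CAS latency specified in clock ticks converts to real time as t = ticks / f_clock, which is why (as discussed later in this disclosure) strobing memory at higher clock rates reduces real-time delay for a fixed tick count. The module speeds below are hypothetical, not drawn from the application.

```python
def cas_latency_seconds(cas_ticks, clock_hz):
    """Real-time CAS delay of a synchronous module: ticks / clock rate."""
    return cas_ticks / clock_hz

# Two hypothetical SDRAM modules with the same CAS tick count: the one
# strobed at a higher clock rate delivers its data sooner in real time.
slow = cas_latency_seconds(9, 400e6)   # 9 ticks at 400 MHz -> 22.5 ns
fast = cas_latency_seconds(9, 1.6e9)   # 9 ticks at 1.6 GHz -> 5.625 ns
print(slow, fast)
```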
- The term “LCD” is herein understood to mean a method that uses liquid precursor solutions to fabricate materials of arbitrary compositional or chemical complexity as an amorphous laminate or free-standing body or as a crystalline laminate or free-standing body that has atomic-scale chemical uniformity and a microstructure that is controllable down to nanoscale dimensions.
- The terms “main memory” or “physical memory” are herein understood to refer to their conventional definitions as memory that is not part of the microprocessor die and is physically located in separate electronic modules that are linked to the microprocessor through input/output (I/O) controllers that are usually integrated into the processor die.
- The term “ordering” is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as a system of special instructions, such as memory fences or barriers, which prevent a multi-threaded program from running out of sequence.
- The term “passive component” is herein understood to refer to its conventional definition as an element of an electrical circuit that modulates the phase or amplitude of an electrical signal without producing power gain.
- The term “pipeline” or “instruction pipeline” is herein understood to refer to a technique used in the design of computers to increase their instruction throughput (the number of instructions that can be executed in a unit of time) by running multiple operations in parallel.
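For illustration only, the throughput gain from pipelining can be shown with an idealized cycle-count model that ignores hazards and stalls (the stage and instruction counts are arbitrary examples):

```python
def total_cycles(n_instructions, n_stages, pipelined):
    """Ideal cycle count for n_instructions on an n_stages-deep datapath,
    ignoring hazards and stalls."""
    if pipelined:
        # The first instruction takes n_stages cycles to fill the pipeline;
        # thereafter one instruction completes per cycle.
        return n_stages + (n_instructions - 1)
    # Unpipelined: every instruction occupies the whole datapath serially.
    return n_stages * n_instructions

print(total_cycles(100, 5, pipelined=False))  # -> 500 cycles
print(total_cycles(100, 5, pipelined=True))   # -> 104 cycles
```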
- The term “processor” is herein understood to be interchangeable with the conventional definition of a microprocessor integrated circuit.
- The term “RISC” is herein understood to refer to its conventional meaning with regards to computing systems as a microprocessor designed to perform a smaller number of computer instruction types, wherein each type of computer instruction utilizes a dedicated set of transistors so the lower number of instruction types reduces the microprocessor's overall transistor count.
- The term “resonant gate transistor” is herein understood to refer to any of the transistor architectures disclosed in de Rochemont, U.S. Ser. No. 13/216,192, “POWER FET WITH A RESONANT TRANSISTOR GATE”, wherein the transistor switching speed is not limited by the capacitance of the transistor gate, but operates at frequencies that cause the gate capacitance to resonate with inductive elements embedded within the gate structure.
- The term “shared data” is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as data elements that are simultaneously used by two or more microprocessor cores.
- The term “stack” or “stack-based memory allocation” is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as regions of memory reserved for a thread where data is added or removed in a last-in-first-out protocol.
- The term “stack-based computing” is herein understood to describe a computational system that primarily uses a stack-based memory allocation and retrieval protocol in preference to conventional register-cache computational models.
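For illustration only, the last-in-first-out protocol underlying stack-based computing can be modeled with a toy stack machine that evaluates (2 + 3) × 4 using only push/pop operations; this is a conceptual sketch, not any specific architecture disclosed herein:

```python
# Toy last-in-first-out (LIFO) stack machine: operands are pushed onto a
# stack and operators consume the top entries, replacing them with a result.
class StackMachine:
    def __init__(self):
        self.stack = []

    def push(self, value):
        self.stack.append(value)

    def pop(self):
        return self.stack.pop()          # removes the last-in item first

    def add(self):
        b, a = self.pop(), self.pop()
        self.push(a + b)

    def mul(self):
        b, a = self.pop(), self.pop()
        self.push(a * b)

m = StackMachine()
m.push(2)
m.push(3)
m.add()         # stack is now [5]
m.push(4)
m.mul()         # stack is now [20]
print(m.pop())  # -> 20
```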
- The term “standard operating temperatures” is herein understood to mean the range of temperatures between −40° C. and +125° C.
- The term “thermoelectric effect” is herein understood to refer to its conventional definition as the physical phenomenon wherein a temperature differential applied across a material induces a voltage differential within that material, and/or an applied voltage differential across the material induces a temperature differential within that material.
- The term “thermoelectric material” is herein understood to refer to its conventional definition as a solid material that exhibits the “thermoelectric effect”.
- The terms “tight tolerance” or “critical tolerance” are herein understood to mean a performance value, such as a capacitance, inductance, or resistance that varies less than ±1% over standard operating temperatures.
- The term “visibility” is herein understood to refer to its conventional meaning with regards to computing and programmatic memory usage as the ability of, or timeliness with which, other threads are notified of changes made to a current programming thread.
- The term “II-VI compound semiconductor” is herein understood to refer to its conventional meaning describing a compound semiconductor comprising at least one element from column IIB of the periodic table including: zinc (Zn), cadmium (Cd), or mercury (Hg); and, at least one element from column VI of the periodic table consisting of: oxygen (O), sulfur (S), selenium (Se), or tellurium (Te).
- The term “III-V compound semiconductor” is herein understood to refer to its conventional meaning describing a compound semiconductor comprising at least one semi-metallic element from column III of the periodic table including: boron (B), aluminum (Al), gallium (Ga), and indium (In); and, at least one gaseous or semi-metallic element from the column V of the periodic table consisting of: nitrogen (N), phosphorous (P), arsenic (As), antimony (Sb), or bismuth (Bi).
- The term “IV-IV compound semiconductor” is herein understood to refer to its conventional meaning describing a compound semiconductor comprising a plurality of elements from column IV of the periodic table including: carbon (C), silicon (Si), germanium (Ge), tin (Sn), or lead (Pb).
- The term “IV-VI compound semiconductor” is herein understood to refer to its conventional meaning describing a compound semiconductor comprising at least one element from column IV of the periodic table including: carbon (C), silicon (Si), germanium (Ge), tin (Sn), or lead (Pb); and, at least one element from column VI of the periodic table consisting of: sulfur (S), selenium (Se), or tellurium (Te).
- The present invention generally relates to a hybrid system-on-chip that comprises a plurality of memory and processor die mounted on a semiconductor carrier chip that contains a fully integrated power management system that switches DC power at speeds that match or approach processor core clock speeds, thereby allowing the efficient transfer of data between off-chip physical memory and processor die. The present invention relates to methods and means to reduce the size and cost of computing systems, while increasing performance. The present invention relates to methods and means to provide a factor increase in computing performance per processor die surface area while only fractionally increasing power consumption.
- One embodiment of the present invention provides a hybrid computing module, comprising: a semiconductor carrier including a substrate adapted to provide electrical communication, through electrically conducting traces and passive circuit network filtering elements formed upon the carrier substrate, between a fully integrated power management circuit module having a resonant gate transistor to switch electrical power to drive the transfer of data and digital process instruction sets between a plurality of discrete semiconductor die mounted upon the semiconductor carrier, wherein the plurality of discrete semiconductor die include: at least one microprocessor die forming a central processing unit (CPU), and a memory bank having at least one memory die.
- The plurality of semiconductor die may include a field programmable gate array (FPGA) or provide memory controller functionality. The memory controller functionality may be field programmable or be provided by a static address memory controller. The plurality of semiconductor die may additionally include a graphics processing unit (GPU) or an application-specific integrated circuit (ASIC). The plurality of semiconductor die may be mounted as a stack on the semiconductor carrier. The module may further comprise a plurality of semiconductor die mounted upon the hybrid computing module that provide GPU and field programmability. The CPU and GPU semiconductor die may comprise multiple processing cores. The substrate forming the semiconductor carrier may be a semiconductor. Active circuitry that manages USB, audio, video, and other communications bus interface protocols may be embedded in the semiconductor substrate. The microprocessor die may contain multiple processing cores or may have cache memory that occupies less than 15% or even 10% of the microprocessor die footprint. The plurality of discrete semiconductor die may be configured as a chip stack. The hybrid computing module may contain a plurality of central processing units, each functioning as distributed processing cores, or a plurality of central processing units that are configured to function as a fault-tolerant computing system. The hybrid computing module may be in thermal contact with a thermoelectric device. The passive circuit network filtering elements formed upon the semiconductor carrier may have performance values that maintain critical performance tolerances. The memory die may be mounted within a stack comprising additional semiconductor die.
- The fully integrated power management module may be mounted on the semiconductor carrier and may switch power at speeds greater than 250 MHz. The fully integrated power management module may switch power at speeds in the range of 600 MHz to 60 GHz. The fully integrated power management module may be formed upon the semiconductor carrier.
- The semiconductor carrier may be in electrical communication with electro-optic drivers that interface the hybrid computing module with other systems by means of a fiber-optic network. The electro-optical interface may contain an active layer that forms a 3D electron gas.
- Another embodiment of the present invention provides a real-time memory access computing architecture, comprising: a hybrid computer module comprising a plurality of discrete semiconductor die mounted upon a semiconductor carrier, which hybrid computer module further comprises: a fully integrated power management module having a resonant gate transistor, wherein the fully integrated power management module is adapted to synchronously switch power at speeds that match a clock speed of a microprocessor on an adjacent microprocessor die mounted within the hybrid computer module to provide real-time memory access; a look-up table adapted to select a pointer to reference addresses in a main memory where data and/or processes are physically located; a memory management variable that uses the look-up table to select the next set of data and/or processes called by the microprocessor; and a memory bank forming the main memory, wherein ≥50% of cache memory of the microprocessor die is allocated to stack-based memory functionality.
- The resonant transistor gate may switch power at speeds between 600 MHz and 60 GHz. The fully integrated power management module may have an efficiency greater than 98%. The computing architecture may have 70%-100% of the microprocessor die cache memory allocated to stack-based memory functionality. The look-up table may be located in cache memory or in main memory. The main memory resources may provide both stack-based and heap-based memory functionality. The memory management variable may be adapted to instruct the look-up table to reassign and/or reallocate main memory addresses.
- The computing architecture does not have to include a memory management algorithm that predictively manages the inflow of stack-based memory functions into the cache memory of a processor die within the hybrid computer module. The computing architecture processor die may have no cache memory.
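For illustration only, the look-up table and memory-management-variable scheme recited in this embodiment can be modeled in a few lines of software. The addresses, identifiers, and `fetch` helper below are invented for the sketch, not drawn from the application.

```python
# Toy model: a look-up table maps process identifiers to pointers (main
# memory addresses), and a "memory management variable" selects which
# entry is dereferenced next. Addresses and names are invented.
main_memory = {0x1000: "process A data", 0x2000: "process B data"}
lookup_table = {"A": 0x1000, "B": 0x2000}   # pointer table

def fetch(mgmt_variable):
    """Select a pointer via the look-up table, then dereference it."""
    address = lookup_table[mgmt_variable]
    return main_memory[address]

print(fetch("B"))   # -> process B data

# The table can also reassign/reallocate a main-memory address without
# changing how the caller asks for the data:
main_memory[0x3000] = "relocated process A data"
lookup_table["A"] = 0x3000
print(fetch("A"))   # -> relocated process A data
```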
- The present invention is illustratively shown and described in reference to the accompanying drawings, in which:
- FIGS. 1A,1B,1C depict the scaled surface areas distributed to cache memory and processor functions in modern microprocessor systems.
- FIGS. 2A,2B depict the higher design and lithography costs of advanced semiconductor technology nodes and their impact on the cost of SoC systems as a function of varying market volumes.
- FIGS. 3A,3B depict the hybrid computing module.
- FIGS. 4A,4B illustrate multi-core microprocessor die with reduced cache memory used in the hybrid computing module.
- FIGS. 5A,5B,5C depict the use of semiconductor layers that form 3-D electron gases.
- FIG. 6 illustrates the use of a thermoelectric device in the hybrid computing module.
- FIGS. 7A,7B,7C,7D,7E,7F illustrate the invention's methods and embodiments that enable minimal instruction set computing suitable for general purpose applications.
- FIGS. 8A,8B depict the prior art related to stack machines.
- FIGS. 9A,9B illustrate characteristic features of a general purpose stack machine enabled by this invention.
- The present invention is illustratively described above in reference to the disclosed embodiments. Various modifications and changes may be made to the disclosed embodiments by persons skilled in the art without departing from the scope of the present invention as defined in the appended claims.
- This application incorporates by reference all matter contained in de Rochemont U.S. No. 7,405,698 entitled “CERAMIC ANTENNA MODULE AND METHODS OF MANUFACTURE THEREOF” (the '698 application), de Rochemont U.S. Ser. No. 11/479,159, filed Jun. 30, 2006, entitled “ELECTRICAL COMPONENT AND METHOD OF MANUFACTURE” (the '159 application), U.S. Ser. No. 11/620,042 (the '042 application), filed Jan. 6, 2007 entitled “POWER MANAGEMENT MODULES”, de Rochemont and Kovacs, “LIQUID CHEMICAL DEPOSITION PROCESS APPARATUS AND EMBODIMENTS”, U.S. Ser. No. 12/843,112, ('112), de Rochemont, “MONOLITHIC DC/DC POWER MANAGEMENT MODULE WITH SURFACE FET”, U.S. Ser. No. 13/152,222 ('222), de Rochemont, “SEMICONDUCTOR CARRIER WITH VERTICAL POWER FET MODULE”, U.S. Ser. No. 13/168,922 ('922A), de Rochemont “CUTTING TOOL AND METHOD OF MANUFACTURE”, U.S. Ser. No. 13/182,405, ('405), “POWER FET WITH A RESONANT TRANSISTOR GATE”, U.S. Ser. No. 13/216,192 ('192), de Rochemont, “SEMICONDUCTOR CHIP CARRIERS WITH MONOLITHICALLY INTEGRATED QUANTUM DOT DEVICES AND METHOD OF MANUFACTURE THEREOF”, U.S. Ser. No. 13/288,922 ('922B), and, de Rochemont, “FULLY INTEGRATED THERMOELECTRIC DEVICES AND THEIR APPLICATION TO AEROSPACE DE-ICING SYSTEMS”, U.S. Application No. 61/529,302 ('302).
- The '698 application instructs on methods and embodiments that provide meta-material dielectrics that have dielectric inclusion(s) with performance values that remain stable as a function of operating temperature. This is achieved by controlling the dielectric inclusion(s)' microstructure to nanoscale dimensions less than or equal to 50 nm. de Rochemont '159 and '042 instruct the integration of passive components that hold performance values that remain stable with temperature in printed circuit boards, semiconductor chip packages, wafer-scale SoC die, and power management systems. de Rochemont '159 instructs on how LCD is applied to form passive filtering networks and quarter wave transformers in radio frequency or wireless applications that are integrated into a printed circuit board, ceramic package, or semiconductor component. de Rochemont '042 instructs methods to form an adaptive inductor coil that can be integrated into a printed circuit board, ceramic package, or semiconductor device. de Rochemont et al. '112 discloses the liquid chemical deposition (LCD) process and apparatus used to produce macroscopically large, compositionally complex materials that consist of a theoretically dense network of polycrystalline microstructures comprising uniformly distributed grains with maximum dimensions less than 50 nm. Complex materials are defined to include semiconductors, metals or super alloys, and metal oxide ceramics. de Rochemont '222 and '922A instruct on methods and embodiments related to a fully integrated low EMI, high power density inductor coil and/or high power density power management module. de Rochemont '192 instructs on methods to integrate a field effect transistor that switches arbitrarily large currents at arbitrarily high speeds with minimal On-resistance into a fully integrated silicon chip carrier.
de Rochemont '922B instructs methods and embodiments to integrate semiconductor layers that produce a 3-dimensional electron gas within semiconductor chip carriers and monolithically integrated microelectronic modules. de Rochemont '302 instructs methods and embodiments to optimize thermoelectric device performance by integrating chemically complex semiconductor material having nanoscale microstructure.
- Reference is now made to FIGS. 3-6 to illustrate various embodiments and means pertaining to the present invention. A hybrid system-on-chip (“SoC”) computing module 100 is shown in a perspective view in FIG. 3A and a top view in FIG. 3B. The hybrid computing module 100 is formed by mounting at least one microprocessor die 102A,B with at least one memory bank 104A,B on a semiconductor chip carrier 106. The semiconductor chip carrier 106 consists of a substrate, preferably a semiconducting substrate, upon which electrically conducting traces and passive circuit network filtering elements have been formed, and a plurality of semiconductor die and circuit modules have been mounted or monolithically integrated. Although a semiconducting substrate is preferred because it enables the further integration of active circuitry within the semiconductor chip carrier's 106 base support structure, the substrate may alternatively comprise an electrically insulating material that has high thermal conductivity, such as the MAX-phase materials referenced in de Rochemont '405, which enable substrate materials having electrical resistivity greater than 10¹⁰ ohm-cm and thermal conductivity greater than 100 W·m⁻¹·K⁻¹. - The at least one microprocessor die 102A,B is preferably a multi-core processor, which may be assigned logic, graphic, central processing, or math functions. The at least one
memory bank 104A,B is preferably configured as a stack of memory die and may be a Hybrid Memory Cube™ currently under development. The memory bank 104A,B may optionally comprise an integrated circuit within the stack that provides memory controller functionality that arbitrates management issues and protocols with the microprocessor die 102A,B. The controller chip stacked within the memory bank 104A,B may comprise a field programmable gate array (FPGA), but is preferably a static address memory controller. It may alternatively provide application-specific functionality that supports kernel management utilities unique to the low-volume or mid-volume application for which the hybrid computing module 100 was designed, which improves computing performance over general purpose solutions. Various embodiments of the semiconductor chip carrier 106 useful to the present applications, as well as methods of their construction, are described in greater detail in de Rochemont '222, '922A, and '192, which are incorporated herein by reference. For the purposes of illustrating this invention, the semiconductor chip carrier 106 consists of a power management module 108 that is either mounted onto or monolithically integrated into the semiconductor chip carrier 106, passive circuit networks 110 as needed to properly regulate the power bus 112 and interconnect bus 114 networks, ground planes 115, input/output pads 116, and timing circuitry that are fully integrated onto the semiconductor chip carrier using LCD methods described in de Rochemont and Kovacs '112 and de Rochemont '159. The semiconductor chip carrier 106 may additionally comprise standard bus functionality (not shown for clarity) in the form of circuitry that is integrated within its body to manage processing buffers, audio, video, parallel bus, or universal serial bus (USB) functionality.
The power management module 108 incorporates a resonant gate power transistor configured to reduce loss within the power management module 108 to levels less than 2% and to switch power regulating currents greater than 0.005 A at speeds greater than 250 MHz, preferably at speeds in the range of 600 MHz to 60 GHz, that can be tuned to match or support clock speed(s) of the microprocessor die 102A,B, or to transfer data from main memory to the processor die at speeds that range from the processor clock speed to 1/10th the processor clock speed using methods and means instructed in de Rochemont '922A and '192. Although FIGS. 3A,3B only depict a single power management module for convenience, a plurality of power management modules 108 may be integrated into the semiconductor chip carrier 106 as may be needed to serve a particular design objective for the hybrid computing module 100. For instance, digital radio systems incorporate baseband processors to manage radio control functions (signal modulation, encoding/decoding, radio frequency shifting, etc.). Baseband processors manage lower frequency processes, but are often separated from the main CPU because they are highly dependent on timing and require certification of their software stack by government regulatory bodies. Although the current invention enables the real-time processing needed to integrate the baseband processors with the CPU (see “stack-based computing” below), it might be advantageous to mount a certified baseband processor (102B) separately from the main CPU (102A) to avoid system certification delays. In this instance, the design might also include an additional “off-stepped” power management module (not shown) that regulates power at lower switching speeds that are in-step with the baseband processing unit. - The hybrid computing module may also comprise one or more electro-
optic signal drivers 118 that interface the module with a larger computing or communications system by means of an optical waveguide or fiber-optic network through input/output ports 120A,120B. Additionally, the hybrid computing module may also comprise application-specific integrated circuit (ASIC) semiconductor die 122 that coordinate interactions between the microprocessor die 102A,B and memory banks 104A,B. Although the ASIC semiconductor die 122 may have specific processor functions described below, it can also be used to customize memory management protocols to achieve improved coherency in low-volume to mid-volume applications, or to serve a specific functional need, such as radio signal modulation/de-modulation, or to respond to specific data/sensory inputs for which the computing module 100 was uniquely designed. Multiple cost, performance, footprint, and power management benefits are enabled as a result of the module configuration defined by this invention. - The high efficiency (98+%) of the low-loss
power management module 108 allows it to be placed in close proximity to the microprocessor die 102A,B and memory banks 104A,B. This ability to integrate low loss passive components operating at critical performance tolerances with active elements embedded within the semiconductor chip carrier 106, or within semiconductor layers deposited thereupon, is used to resolve many of the technical constraints outlined above that lead to on-chip and off-chip data bottlenecks that compromise system performance in system-on-chip (“SoC”) product offerings. The efficient switching of large currents at speeds that match the processor clock(s) is achieved by integrating a resonant gate transistor into the monolithically integrated power management module 108 using the means and methods described in de Rochemont '922A and '192. The resonant response of the resonant gate transistor modulating the power management module's power FET is tuned to match core clock speeds in the microprocessor die 102A,B. Designing the power management module to synchronously match off-chip memory latency and bandwidth to the needs of computing system cores allows data from physical memory banks 104A,B to be efficiently transferred to and from processor cores, thereby mitigating the need for large on-chip cache memory in the microprocessor die 102A,B. Although prior reference is made to x86 microprocessor core architecture to establish visual clarity in FIGS. 1A,1B,1C, the generic value of this invention applies to computing systems of any known or unknown 32-bit, 64-bit, 128-bit (or larger) microprocessor architecture. Therefore, a preferred embodiment of the hybrid computer module utilizes multi-core processors 150/160 (102A,B) that have less than 15%, preferably less than 10%, of their surface areas allocated to cache memory 152/160 as shown in FIGS. 4A,4B.
Multi-core processor die 150 that minimize the fractional percentage of semiconductor surface area allocated to cache memory 152, relative to processor core 154 functionality, have a smaller footprint, resulting in higher productivity yields and lower production costs. The use of microprocessor die 150 wherein the ratio of processor cores 154 to cache memory 152 functionality is greater than 90% increases computing performance by more than 30%-50% per square millimeter (mm²) of processor integrated circuitry. Reduced cache memory 152 requirements within the processor die 150 (102A,B) boost productivity yields per wafer, which lowers chip and system costs for the hybrid computing module 100.
FIG. 4A illustrates the relative size of a scaled representation of a Nehalem quad-core microprocessor chip 150 fabricated using the 45 nm technology node if it were designed to have 10% of its surface area allocated to cache memory, for comparison with FIG. 1A. The chip's surface area is allocated to 4 microprocessor cores and an L3 cache memory 164 that has been reduced in size. In this instance, the L3 cache memory 164 occupies roughly 10% of the surface area not allocated to system interconnect circuits. Similarly, FIG. 4B illustrates a modified Westmere-EP 6-core microprocessor chip 160 fabricated using the 32 nm technology node that allocates less than 10% of its available surface area to L3 cache memory 164 to serve its 6 microprocessor cores, for comparison with FIG. 1C. The smaller size of the processor die's cache memory directly reflects smaller cache memory capacity. Therefore, an alternative embodiment of the invention claims a computing system comprising a hybrid computing module 100 consisting of processor functionality 102A,B and physical memory utility (memory banks) 104A,B that is segregated onto discrete semiconductor die mounted upon a monolithically integrated semiconductor chip carrier 106, wherein the processor die 102A,B have on-board cache memory capacities less than 16 Mb/core, preferably less than 128 Kb/core. - A subsequent embodiment of the invention enabled by mounting microprocessor die 102A,B and
memory banks 104A,B upon a semiconductor chip carrier 106 comprising a monolithically integrated, high-speed power management module 108 that synchronously switches power at processor clock speeds provides real-time memory access by removing the need for direct-memory access updates from cache memory. In this configuration of the hybrid computing module 100, main memory resources located in memory banks 104A,B serve all stack-based and heap-based memory functionality for microprocessor die 102A,B. The microprocessor die 102A,B may be organized as distributed computing cells or serve as a fault-tolerant computing platform. - An additional embodiment of the
hybrid computer module 100 further reduces cost through the use of ASIC semiconductor die 122A,122B to customize the performance of general purpose microprocessor systems for broader application to low- and mid-volume market sectors. As illustrated in FIGS. 2A,2B, the higher design and masking costs of the more advanced technology nodes (45 nm & 32 nm) cause SoC semiconductor die to be more expensive in low-volume 20 and mid-volume 22 market segments. An SoC device will integrate a plurality of functions into a single die. Therefore, a fully integrated system-on-chip device fabricated at the 45 nm or 32 nm technology nodes for low-volume 20 and mid-volume 22 applications will be more than 2-3× more expensive than the same device fabricated at the 90 nm node after the normalized cost per function is figured into the total cost. SoC cost savings only achieve greater than marginal benefit at the 32 nm node and beyond in large volume markets 24. Historically, low-volume and mid-volume applications comprise the majority of market applications in the aggregate. As a result of these trends, the more advanced technology nodes (32 nm and beyond) will ultimately impose higher or unacceptable costs upon applications serving the larger aggregate market, or force those applications to go unserved. Most system applications need to customize performance by optimizing memory management functions to a specific application. Therefore, it is a specific embodiment of the hybrid computing module 100 to incorporate general purpose microprocessor die 102A,B and memory banks 104A,B fabricated to the highest technology node and use ASIC semiconductor die 122A,122B to tailor functions for a specific application. Semiconductor die adjacent to the microprocessor die 102A,102B may provide any functional process to the hybrid computing module, including analog-to-digital or digital-to-analog functionality.
Functionality provided by the ASIC semiconductor die 122A,122B (or other die) and bus management circuitry embedded within the semiconductor chip carrier 106 may be fabricated using a lower technology node whenever it is possible to do so. - As shown in FIGS. 5A,5B,5C, a further embodiment of the
hybrid computing module 100 uses methods described in de Rochemont '192, incorporated herein by reference, to integrate a semiconductor layer forming a 3-D electron gas into the semiconductor chip carrier 106, the power management module 108, or the electro-optic driver 118, respectively, to further improve switching speeds within those devices. - An additional embodiment of the invention (see
FIG. 6) utilizes a thermoelectric module 140 in thermal communication with the unpopulated major surface 142 of the semiconductor chip carrier 106 to pump heat generated by the active components mounted on or integrated into the chip carrier 106 to a thermal reservoir 144. A preferred embodiment of the thermoelectric module 140 utilizes methods and means described by de Rochemont '302, incorporated herein by reference, to integrate the thermoelectric module 140 into the hybrid computing module 100. Thermoelectric modules may also be mounted onto a free surface of the various semiconductor die mounted onto the semiconductor chip carrier 106. - As described in the Background to the Invention above, larger cache memories on multi-core processor die have been required due to an inability to supply sufficient levels of power pulsed at high enough clock speeds to efficiently transfer data from physical memory to the processor cores. This has resulted in problems with latency and memory coherence in SoC computing and processor designs. Without the larger cache memories, underutilized multi-core processors clock “zeros” while waiting for data to be input to the system.
- Pulsed power is required to access (read or write) and to refresh data stored within arrays of physical and cache memory. Larger memory banks require larger currents to strobe and transfer data from physical memory to the processor cores. Large latency, driven by the inability of alternative power management solutions to pulse sufficiently large currents at duty cycles close to processor core clock speeds, has necessitated the move to integrate
larger cache memory 4,7,10 on conventional multi-core processor die 1,6,9 (see FIGS. 1A,1B,1C). The larger cache memories mask the data transfer deficiencies and mitigate associated problems with memory coherence in computing platforms. These problems are resolved by improving the speed and efficiency of power management modules supplying the computing platform and by providing means to maintain signal integrity within passive circuit and interconnect networks used to route high-speed digital signals within the system. - Latency in asynchronous dynamic random access memory (DRAM) remains constant, so the time delay between presenting a column address and receiving the data on the output pins is fixed by the internal configuration of the DRAM array. Synchronous DRAM (SDRAM) modules organize a plurality of DRAM arrays in a single module. The column address strobe (CAS) latency in SDRAM modules is dependent upon the clock rate and is specified in clock ticks instead of real time. Therefore, computing systems that reduce latency in SDRAM modules by enabling large currents to be strobed at gigahertz clock speeds improve overall system performance through efficient, high-speed data transfers between physical memory and the processor cores. An embodiment of
hybrid computing module 100 designs the power management module 108 to regulate currents greater than 50 A, preferably greater than 100 A. As is known to engineers skilled in the art of high-power circuits, care needs to be taken in laying out metallization patterns in passive circuit networks 110, power bus 112, interconnect bus 114, and ground planes 115 to minimize problems associated with electromigration in conducting elements integrated within the module. - The
hybrid computing module 100 situates the memory banks 104A,B in close proximity to the microprocessor cores 102A,B to reduce delay times and minimize deleterious noise influences. Tight tolerance passive elements enabled by LCD manufacturing methods integrated into the passive circuit networks 110 are used to improve signal integrity and control leakage currents by maintaining stable transmission line and filtering characteristics over standard operating temperatures. Methods that minimize loss in the magnetic cores of inductor and transformer components described in de Rochemont '222, incorporated herein by reference, are used to maximize the efficiency and signal integrity of passive circuit networks 110 and power management modules 108. Large currents (>50 A) regulated at microprocessor clock speeds by power management modules 108 operating at 98+% efficiencies supply the processor die 102A,B (150) and memory banks 104A,B to reduce latency while boosting core utilization rates above 50% even though on-chip cache memory is reduced in the processor die 102A,B. - Matching off-chip memory latency and bandwidth to meet the needs of the computing systems' cores removes the need for large on-chip cache memories and improves coherence by maintaining all shared data in physical memory where it is simultaneously available to all processor cores. Removing on-chip memory constraints leads to a roughly 35%-50% increase in performance per square millimeter (mm2) of microprocessor real estate. A typical 6-core Westmere-EP cpu 9 (see
FIG. 1C ) operating at voltages between 0.75 V and 1.35 V and a switching speed of 3.0 GHz consumes 95 Watts. The same cpu driven at 4.6 GHz (a 54% increase in switching frequency) will consume 45% more power due to a combination of higher voltage and larger switching currents, assuming leakage is tightly controlled. The system will consume 150 W of supplied power when it is supplied by a power management device that has a 92% conversion efficiency. - A
hybrid computing module 100 comprising a high efficiency power management module 108 having a 98+% efficiency that is capable of driving large currents at switching speeds that match processor core clock speeds (2-50 GHz) improves performance and power consumption through superior conversion efficiencies and lower cpu operating voltages. A 9-core version of the same processor, reconfigured by eliminating on-chip L3 cache memory 10, would consume 45% more power when operated at 3.0 GHz while occupying roughly the same footprint as the 6-core Westmere-EP cpu 9. As a general rule, the hybrid computing module 100 provides a 2.3× (230%) increase in performance while decreasing CPU power consumption by 17% simply by eliminating power consumed in cache memory from the processor die. System-level performance comparisons are provided in Table I immediately below. -
TABLE I

Cores  Clock Speed (GHz)  Operating Voltage  Conversion Efficiency  Power Consumption
6      4.6                1.35 V             92%                    150 W
6      4.6                0.75 V             98%                     84 W
9      4.6                0.75 V             98%                    121 W

- It has long been a desired function to have real-time, low latency main memory updates generated by the processor die. This invention allows for such functionality that mitigates and greatly minimizes the need for cache-based heap memory, resulting in smaller-sized processor dies when compared to conventional chip designs, and it enables processor die cache memories that can be tasked primarily for stack-based resources. It is therefore another preferred embodiment of the invention to enable a direct memory access computing system wherein ≥50% of the cache memory, preferably 70% to 100% of the cache memory, is allocated to stack-based, rather than heap-based, memory functions. Therefore, a principal embodiment of the invention is a computing system wherein heap-based memory functionality (i.e., pointers which map cache memory to RAM) is removed entirely from cache memory and placed in main memory. A further embodiment of the invention provides for the management of stack-based and heap-based memory functions directly from physical or main memory. Additionally, changes in operational architectures would be possible due to synchronization between the system processor(s) and main memory. Further benefits include the removal of expensive control algorithms providing cache and memory coherency functionality as well as cache hit-miss prediction. Much flatter memory designs can be achieved, removing the need for multiple layers of cache memory.
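The Table I figures can be reproduced from the relationships stated in the text (95 W of chip power at 3.0 GHz, 45% more chip power at 4.6 GHz, and system draw equal to chip power divided by conversion efficiency); the sketch below assumes only those relationships and nothing more.

```python
# Reproducing the Table I arithmetic from figures stated in the text:
# chip power rises 45% when the 6-core part is driven at 4.6 GHz, and
# wall-plug draw is chip power divided by conversion efficiency.
def system_power_w(chip_power_w: float, conversion_efficiency: float) -> float:
    return chip_power_w / conversion_efficiency

chip_power_46ghz = 95.0 * 1.45                    # ~137.8 W at 4.6 GHz
row_1 = system_power_w(chip_power_46ghz, 0.92)    # ~149.7 W -> Table I: 150 W
# The 9-core, reduced-cache variant draws 45% more than the 6-core
# low-voltage row (84 W):
row_3 = 84.0 * 1.45                               # ~121.8 W -> Table I: 121 W
```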
- The improved computer architectures and operating systems enabled by the
hybrid computer module 100 are depicted in FIGS. 7A-7F. Computing systems that utilize cache memory to achieve higher speed require a memory management architecture 200 that employs predictive algorithms 202 located in cache memory 204 to manage the flow of data and instruction sets in and out of cache memory 204. Memory coherence is maintained through invalidation-based or update-based arbitration protocols. The algorithms 202 reference a look-up table (register or directory) 206, which may be located in cache memory 204 or physical memory 208, and which contains a list of pointers 210. The pointers reference addresses 212 where program stacks 214 comprising sequenced lists of data and process instructions that define a computational process are located in physical memory 208. When the processor core 215 calls a selected program stack 214, a copy of the called program stack 216 listing data and/or processes needed to serve a computational objective is then loaded into the cache memory for subsequent processing by the processor unit 215. - Conventional computing systems crash or freeze when the
predictive algorithms 202 fail to properly estimate cache memory requirements of the called program stack 216. When this occurs, the copied data and/or processes in the called program stack 216 have a bit-load that overflows the bit-space available in cache memory. The subsequent "stack overflow" usually requires the entire system to be re-booted because it can no longer find the next steps in the desired computational process. Therefore, a higher efficiency computing platform that is invulnerable to cache memory stack overflows and does not require a predictive algorithm 202 or a cache memory 204 to complete complex or general purpose computations is highly desirable. - An additional deficiency of cache-based computing is the need to dedicate roughly 45% of the transistors in the
processor 215 and 30%-70% of the code instructions to manage "fetch"/"store" routines used to maintain coherency when copying a stack and returning the computed result back to main memory. Therefore, memory management architectures and computer operating systems that increase computational efficiencies by substantially reducing processor transistor counts and instruction sets are equally desirable for their ability to reduce processor size, cost, and power consumption while increasing computational speeds. -
FIG. 7B depicts the memory management architecture 220 that is another preferred embodiment of the invention. This embodiment overcomes the stack overflow limitations of conventional computing architectures 200 and eliminates the need for complex predictive memory management algorithms 202 by running program stacks directly from main memory 222. The algorithms 202 are mitigated or eliminated in a hybrid computing module 100 when the resonant gate transistor in the fully integrated power management module 108 is tuned to switch power at speeds that enable the physical memory 222 to operate in-step with the clock speed of the processor unit 224. Although the look-up table 226 can be located in an optional cache memory 228 on-board the processor unit 224, it is a preferred embodiment of the invention to locate the look-up table 226 in physical memory 222. The invented architecture subsequently enables the processing unit 224 to render a memory management variable 230 to the look-up table 226 that selects the pointer 232 referencing the address 234 of the next set of data and/or processes in a program stack 236 needed by the processor unit 224 to complete its computational task. The availability of essentially unlimited bit-space in physical memory allows the variable 230 to instruct the look-up table 226 to reassign and reallocate addresses 234 to match the requirements of processed data and/or updated processes as they are loaded 238 in and out of the processing unit 224. - FIGS. 7C,7D further illustrate the intrinsic benefits of a computer operating system enabled by the invention's
memory management architecture 220 when it is applied to processing program stacks 240 through a single-threaded CPU processor 242. As illustrated in FIG. 7C, a modern general purpose operating system 243 loads all declared program items comprising variables (global and local), data structures, and called functions, etc. (not shown in their entirety for clarity), contained within a program stack 240 directly from the computer's main memory 244 into the CPU cache memory 246. During the compiling process the operating system 243 copies these items and organizes them as sequenced code blocks into a collection of program stacks 240 that are collectively stored as heap memory within main memory 244. The operating system 243 organizes the items within the program stacks 240 stored in main memory 244 (or optionally loaded into cache memory 246) to be operated upon as a last-in-first-out ("LIFO") series of variables and instruction sets. - When called, a computational process defined within a first selected
program stack 240A heaped in main memory 244 is copied and transferred 248 into the CPU cache memory 246. The program stack copy 250 is then worked through item by item within the processor 242, until it gets to the bottom of the program stack copy 250. Since items within a stack copied into cache memory 246 are not independently addressable while in cache memory 246, any changes made to a global variable 252 within the program stack copy 250 are reported 253 back to the look-up table 254 before the next program stack 240 is called and loaded into cache memory 246. Items organized in program stacks 240 are independently addressable when they are heaped together in main memory 244. This allows the look-up table 254 to update 256 (4×) the global variable 252 at all the locations within all the program stacks 240 before the next program stack 240 is called into cache memory 246 for subsequent processing. Similarly, if the program stack copy 250 encounters a logical function 258 that calls for a program jump, the program stack copy 250 is halted, and any changes previously made to a global variable 252 are updated 256 (4×) through the look-up table 254. The remaining items 260 in the original program stack copy 250 are discarded before the "jump-to" program stack copy 262 is transferred 263 into cache memory 246 and placed at the top 264 of its operational stack. - Although this operating system represents the most efficient general purpose computational architecture currently available, it does contain several inefficiencies that are circumvented by this invention. First, it should be noted that low powers are needed to store data bytes in "static" memory. Maximum power loss occurs during the dynamic-access processes needed to copy, transfer, and restore (update) a given data byte that is already stored at a specific address in
main memory 244. Larger power inefficiencies result when the same data structure has to be updated 256 (4×) in multiple locations within a plurality of program stacks 240 heaped into main memory 244. It is therefore desirable to enable a general purpose computational operating system that minimizes power loss by updating a global variable that exists only at one address in main memory, or by eliminating the need to replicate data structures and function blocks within multiple program stacks 240. Similarly, a significant number of operational cycles are wasted when loading and discarding the remaining items 260 of a program stack copy 250 following a program jump. It is therefore desirable to enable a general purpose computational operating system that minimizes operational cycles by never having to copy, load, and discard the remaining items 260 within a program stack copy 250 following a program jump. By eliminating the additional transistors and instruction sets needed to manage wasteful operational cycles and memory swaps, the power consumption cited for 6-core and 9-core processors in Table I can be further reduced by an additional 30%-75% through a more efficient operating system enabled by the hybrid computer module 100. - A very meaningful embodiment of the invention shown in
FIG. 7D is acomputational operating system 265 enabled by thehybrid computing module 100 that uses thememory management architecture 220 to minimize power loss and wasted operational cycles. Theoperating system 265 compiles a collection of program stacks 266 heaped intomain memory 267, wherein the series of sequenceditems 268 within each of the program stacks 266 are not copies of process-defining instruction sets anddata 269, butpointers 270 to the memory addresses 271 of the desired process-defining instruction sets anddata 269, which remain statically stored inmain memory 267. When a first selectedprogram stack 266A is called by theprocessor 272, thetop item 268A of the first selectedprogram stack 266A is copied 273 into thememory controller 274, which then uses thepointer 270 copied from thetop item 268A to load acopy 275 of the corresponding process-defining instruction set ordata 269A into theprocessor 272. Following this protocol, theoperating system 265 executes the desired computational process by working its way through the first selectedprogram stack 266A by copying thenext pointer 270 listed in thenext item 268 of the first selectedprogram stack 266A and loading 275 its corresponding process-defining instruction sets anddata 269 in the order theirpointers 270 are organized in the first selectedprogram stack 266A. When a change is made to aglobal variable 276 after it has been loaded into theprocessor 272, theloading process 273 is halted to allow thememory management variable 230 to notify the look-up table 277. The look-up table 277 in-turn updates 278A theglobal variable 276 at theaddress 271 it is stored statically at its primary location inmain memory 267. There is no need to consume power and waste operational cycles updating theglobal variable 276 at multiple locations inmain memory 267, since the program stacks 266 never store copies of theglobal variable 276, they only comprise pointingitems 268B that store the pointer toglobal variable 270A. 
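The single-write discipline just described can be sketched with an illustrative data layout; all names and addresses below are hypothetical and serve only to show the mechanism.

```python
# Illustrative sketch (hypothetical addresses and layout) of the FIG. 7D
# discipline: program stacks hold only pointers, so a global variable
# lives at exactly one main-memory address and one write updates it for
# every stack that points at it.
main_memory = {0x300: 1, 0x310: "op-a", 0x320: "op-b"}   # global at 0x300
program_stacks = {
    "stack_a": [0x310, 0x300],        # pointing items, never copies
    "stack_b": [0x300, 0x320],
}

def update_global(address: int, value) -> int:
    """Look-up-table update: one static location, one write."""
    main_memory[address] = value
    return 1                          # writes needed, regardless of fan-out

writes = update_global(0x300, 7)
# the stacks themselves are untouched; they resolve the new value:
resolved = [main_memory[p] for p in program_stacks["stack_a"]]
```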
This allows all program stacks 266 containing pointing items 268B to remain unchanged and still operate as intended when called into the processor 272 following an update to the global variable 276. - The
computational operating system 265 enables similar reductions in power consumption and wasted operational cycles during program jumps. When an item that maps a logical function 279 embedded within the first selected program stack 266A calls for a jump to a new program stack 266B, the memory management variable 230 halts the loading process 273 before the discarded items 280 are copied and loaded into the controller 274. The memory management variable 230 in-turn uses the look-up table 277 to instruct the controller 274 to address the top item 281 on new program stack 266B. This starts the process of copying 282 the pointing items 268 in the new program stack 266B into the controller 274, which, in-turn, loads 275 the instruction sets and data 269 that execute the computational process defined within new program stack 266B into the processor 272. - The
memory management variable 230 may also be used to store new instruction sets and/or data 269B defined by processes completed in the processor 272 at a new address 271A in main memory 267. While this embodiment achieves maximal efficiencies by maintaining stack-based and heap-based memory functions in main memory 267, nothing prevents the computational operating system 265 from fully loading program stacks into an optional cache memory 228 and still falling within the scope of the invention. - Reference is now made to
FIGS. 7E&7F to illustrate the inherent benefits of the present invention when applied to resolving major operational inefficiencies in conventional multi-core microprocessor architectures 283. In this instance, a collection of code items for a program stack 284 (variables and instruction sets) is stored in main memory 285. A program stack 286 is generated with stack subdivisions in main memory 285. The stack subdivisions are assigned to multiple processor cores. When the program stack 286 is called by the processor 287, the subdivisions of the program stack 286 are copied and mapped 288A,288B,288C,288D into the processor cores' 287A,287B,287C,287D cache memory banks. The subdivisions share a register 290 of the shared global variables that are simultaneously processed among the multiple processor cores. When a change is made to the register 290, all of the processors have to be halted, since none of the items in the running code blocks within the subdivisions held in the cache memory banks are independently addressable. This requires a swap memory stack 291 to be created in main memory 285; once the uncompleted stack subdivisions held in the cache memory banks of the multiple processor cores are swapped back into main memory 285, the swap stack registers 290′A,290′B,290′C,290′D can update 293 the addressable items within the uncompleted stack subdivisions. The uncompleted stack subdivisions are then reloaded into their respective processor cores so the program stack 286 can be completed. As is evident from the complexity of FIG. 7E, this process (described with great simplification herein) requires intensive code executions to complete the mapping process and relies heavily upon "fetch"/"store" commands that are very wasteful of power budgeted to main memory 285. Therefore, methods that sharply reduce the code complexity and minimize the usage of "fetch"/"store" commands while updating a global variable processed within a multi-core microprocessor die 287 are very desirable.
multi-core operating system 295 is illustrated in FIG. 7F. As is the case with the single-threaded computational operating system 265, the multi-core operating system 295 compiles and heaps a subdivided program stack 296 into main memory 267, wherein the series of sequenced items 268 within each of the program stack subdivisions are not copies of process-defining instruction sets and data 269, but pointers 270 to the memory addresses 271 of the desired process-defining instruction sets and data 269, which remain statically stored at their primary locations in main memory 267. When the program stack subdivisions are called by their respective processor cores, the top items are copied into the memory controllers of the respective processor cores, which use the pointers 270 copied from the top items to load copies of the instruction sets and data 269A corresponding to the loaded pointers 270 into the processor cores. When a change is made to a first global variable 298A because the item 268AA that records its pointer 270B is positioned closer to the top within its own subdivided stack 296D than any other global variable is positioned in any of the other subdivided stacks, the memory management variable 230 is communicated over the interrupt bus 299 to the look-up table 277, which in-turn updates 278B the first global variable 298A statically stored at the address 271 mapped with pointer 270B. Similarly, when a change is made to a second global variable 298B because the item 268BB that records its pointer 270C is now closer to the top within its own subdivided stack 296A than any other global variable is positioned in any of the other subdivided stacks, the memory management variable 230 is communicated over the interrupt bus 299 to the look-up table 277, which in-turn updates 278B the second global variable 298B statically stored at the address 271 mapped with pointer 270C. Any of the processor cores may operate as the CPU 272 illustrated in FIG. 7D when managing program jumps with higher efficiency. - In conclusion, reference is now made to FIGS. 8A,8B,9A,9B to illustrate embodiments of the invention that relate to a general purpose stack-machine computing module.
Stack-machine computing architectures were used on many early minicomputers and mainframe computing platforms. The Burroughs B5000 remains the most famous mainframe platform to use this architecture. RISC eventually enabled register-based cache computing architectures to displace stack-machine computing in broader applications as general purpose computing grew in complexity and hardware limitations imposed stricter requirements on memory management. Furthermore, advances in software and hardware combined to make it difficult for stack-machine systems to run high-level languages, such as ALGOL and the suite of C-languages derived from it. These developments made stack-machine computing inefficient in general purpose applications, though it remains an attractive option in limited-use/specific-purpose embedded processors. Stack machine architectures are also implemented in certain software applications (JAVA and Adobe POSTSCRIPT) by configuring the processor and cache memory as a virtual stack machine.
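The post-fix evaluation discipline that such virtual stack machines emulate can be sketched as follows; this is a minimal illustration, not any product's implementation.

```python
# A minimal LIFO evaluator of the kind emulated by the virtual stack
# machines mentioned above: each operator pops the top two operands and
# pushes its resultant back to the top of the stack.
def evaluate_postfix(items):
    stack = []                        # Last-In-First-Out collection
    for item in items:
        if callable(item):            # an "operator"
            b, a = stack.pop(), stack.pop()
            stack.append(item(a, b))
        else:                         # an "operand"
            stack.append(item)
    return stack.pop()

# Post-fix "A B + C -" with A=2, B=3, C=4 evaluates (2 + 3) - 4:
result = evaluate_postfix([2, 3, lambda a, b: a + b, 4, lambda a, b: a - b])
```

Because every intermediate resultant lives implicitly at the top of the stack, no operand fields or register allocation are needed, which is the code-density advantage discussed below in connection with Table II.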
- In the context of a stack machine, a stack 300 (see
FIG. 8A ) is an abstract data structure that exists as a restricted linear or sequential collection of items 302 that have some shared significance to the desired computational objective. The items are loaded into the stack 300 in a Last-In-First-Out ("LIFO") structure, which is very useful for block-oriented languages. The stack contains a list of "operands" 304a,304b,304c,304d,304e sequenced in the linear collection 302 near the top of the stack. These operands are operated upon by a linear series 306 of operations ("operators") 308a,308b,308c,308d. In a generic stack machine the individual operators are applied in the order they are sequenced in the linear series 306. Each of the individual operators is applied to the stack 300 by means of push 310 and pop 312 commands that add and remove operators and operands from the sequential collection 302. The first operator 308a is applied to the top two operands 304a,304b, and the remaining operators work their way down the stack 300 to complete the algorithmic calculation. After the first operation is completed, the stack will comprise the resultant of 308a applied to 304a,304b inserted at the top of the stack, followed by 304c,304d,304e. The second operator 308b is then applied to the resultant of 308a applied to 304a,304b and item 304c, which now occupies the second position in the stack 300. The process continues until the last operator 308d is applied to the resultant of the two operands 304c,304d immediately before the last operand item 304e in the stack 300. The final resultant is then inserted into the top of the stack 300 to be dispatched and used in the next step of the program. - The
stack 300 will typically contain non-operand items in the stack, such as addresses, function calls, records, pointers (stack, current program and frame), or other descriptors needed elsewhere in the computational process. The process depicted inFIG. 8B depicts how stacks are implemented in the most generic (simplest)conventional stack machine 320.FIG. 8B also illustrates how stack machine computing is ideal for recursive computations, which progressively update and operate on the first two elements of a series, or nested functions that run a local variable through a series of operations until the desired output is generated. Inconventional stack machines 320, thedata stack 322, returnstack 324,program counter 326, and the top-of-the-stack (“TOS”)register 328 are embedded incache memory 330 integrated into theprocessor core 332. The data stack 322 loads the top item of the stack into the top-of-the-stack (“TOS”) register orbuffer 328. The second item (now moved to the top) in the data stack 322 is simultaneously loaded through thedata bus 334 as a pair with the item stored in the TOS register 328 into the arithmetic and logic computational unit (“ALU”) 336 where the primitive element operator (logical or arithmetic) is applied to the two operands. The resultant value of theALU 336 is then placed in the TOS register 328 to be loaded back into theALU 336 with the next item that has moved to the top of thedata stack 322. Theprogram counter 326 stores the address within theALU 336 of the next instruction to be executed. Theprogram counter 326 may be loaded from the bus when implementing program branches, or may be incremented to fetch the next sequential instruction fromprogram memory 338 located inmain memory 340. - The
ALU 336 and the control logic and instruction register (CLIR) 342 are located in theprocessor core 332. TheALU 336 comprises a plurality of addresses consisting of transistor banks configured to perform a primitive arithmetic element that functions as the operator applied to the pair of items sent through theALU 336. The return stack is a LIFO stack used to store subroutine return addresses instead of instruction operands.Program memory 338 comprises a fair amount of random access memory and operates with thememory address register 344, which records the addresses of the items to be read onto or written from thedata bus 334 on the next system cycle. Thedata bus 334 is also connected to an I/O port 346 used to communicate with peripheral devices. - In many instances, the number of instructions needed in stack-based computing can be reduced by as much as 50% compared to the number of instructions needed by register-based systems because interim values are recorded within the
stack 300. This obviates the need to use additional processor cycles for multiple memory calls (fetch and restore) when manipulating a “local variable”. Table II contrasts the processor cycles and code density needed to process simple A+B−C and D=E instruction sets in stack-based and register-based computing systems to illustrate the minimal instruction set computing (“MISC”) potential of stack machines. -
TABLE II

            Stack                            Register
Operation   A B + C − (post-fix notation)    A + B − C
Code        push val A                       load r0, A
            push val B                       load r1, B
            add                              add r0, r1 ;; r0 + r1 -> r0
            push val C                       load r2, C
            sub                              sub r0, r2 ;; r0 − r2 -> r0
Operation   D E = (post-fix notation)        D = E
Code        push val D                       load r0, ads D
            push val E                       load r1, val E
            store                            store r1, (r0) ;; r1 -> (r0)

- The code density of stack machines can be very compact since no operand fields and memory fetching instructions are required until the computational objective is completed. There is no need to allocate registers for temporary values or local variables, which are implicitly stored within the
stack 300. The LIFO structure also facilitates maintenance and storage of activation records within thestack 300 during the transfer of programmatic control to subroutines. However, the utility of stack machines has become limited in more complex operations that require pipelining and multi-threading, or the maintenance of real-time consistency of global values over a broader network such as a computing cloud. - In early computing embodiments,
stacks 300 were processed entirely in main memory. While this approach made the system slow, it allowed all items in thestack 300 to be independently addressable. However, as microprocessor speeds increased beyond the ability of physical memories to keep pace, stacks had to be loaded into cache memory where the items are not independently addressable. This limitation amplified the intrinsic inflexibility of working with restricted sequential collections ofoperand items 302 and linear instruction sets 308. Consequently, modern stack machines started losing their competitive edge as general purpose applications required larger numbers of global variables to maintain their consistency as they are being simultaneously processed in various program branches within a plurality of stacks that could be located across a multiplicity of processor cores. Additionally, some computational problems require conditional problem solving where it is advantageous to modify a sequence of instructions based upon the conditional response of an earlier computation. - The inability to address global variables or instructions buried within a stack in a timely manner generated additional high-density micro-coding needed to unload the stack, update the global variable or instruction sequence buried within it, and reload all the items back into the stack(s). This complexity and code density undermined the intrinsic efficiency of stack machines and allowed register machines to run far faster on less code. The efficiencies of higher-level language requirements enabled by compiler optimizations further restricted stack machines, which require structured languages, like FORTH or POSTSCRIPT, to achieve optimal efficiencies.
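The unload/update/reload penalty just described can be sketched as follows; the stack layout is illustrative only, but it shows why the operation count grows with the depth of the buried item.

```python
# Sketch of the unload/update/reload penalty described above: because a
# stack's interior is not addressable, touching one buried item costs
# pops and pushes proportional to its depth below the top.
def update_buried(stack, depth, new_value):
    """Update the item `depth` positions below the top; return op count."""
    held = [stack.pop() for _ in range(depth)]   # unload everything above
    stack.pop()                                  # remove the stale item
    stack.append(new_value)                      # swap in the new value
    while held:
        stack.append(held.pop())                 # reload in original order
    return 2 * depth + 2                         # pops + pushes performed

stack = ["d", "c", "GLOBAL", "b", "a"]           # end of list = top of stack
ops = update_buried(stack, 2, "UPDATED")         # 6 operations for depth 2
```

A register machine (or the pointer-based architecture 220 above) would update the same item with a single addressed write, which is the efficiency gap this micro-coding created.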
- Despite these current disadvantages, stack architectures remain a preferred computing mode in limited small-scale and/or embedded applications that require high computational efficiencies because of their ability to be configured in ways that make computational use of every single available CPU cycle. This intrinsic advantage of stack architectures further enables fast subroutine linkage and interrupt response. These architectures are also emulated in virtual stack machines that make less than efficient use of memory bandwidth and processing power. It is therefore desirable to provide a general purpose stack machine and operating system that processes computational problems with minimal instruction sets and transistor counts to minimize power consumption.
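The every-cycle utilization claimed above can be seen in a toy model of a stack-machine datapath; the model below is hypothetical and far simpler than the FIG. 8B hardware, but it follows the same discipline of a data stack, a top-of-the-stack buffer, and a program counter stepping through primitive operators.

```python
# A toy model (hypothetical, far simpler than the FIG. 8B hardware) of a
# stack-machine datapath: every cycle fetches one primitive, pairs the
# TOS buffer with the next stack item, and writes the ALU resultant back
# to the TOS buffer -- no register allocation, no idle fetch cycles.
class TinyStackMachine:
    def __init__(self, program, data):
        self.program = program            # program memory of operators
        self.pc = 0                       # program counter
        self.stack = list(data)           # data stack (end of list = top)
        self.tos = self.stack.pop()       # top-of-the-stack (TOS) buffer

    def run(self):
        while self.pc < len(self.program):
            op = self.program[self.pc]    # fetch the next primitive
            self.pc += 1
            second = self.stack.pop()     # pairs with the TOS buffer
            self.tos = op(second, self.tos)   # ALU resultant back to TOS
        return self.tos

# (5 - 3) * 2 computed with data [2, 5, 3] and operators [sub, mul]:
machine = TinyStackMachine([lambda a, b: a - b, lambda a, b: a * b], [2, 5, 3])
result = machine.run()
```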
- Reference is now made to FIGS. 9A,9B to illustrate the general purpose
stack machine module 350 that applies the memory management architecture 220 and computational operating system 265 to a conventional stack machine processor architecture. These enabling methods and embodiments overcome all the known limitations of conventional stack machines by simultaneously allowing global variables or instruction sets buried within multiple threaded stacks to be independently addressed and updated following a system interrupt. The general purpose stack-machine computing module 350 incorporates a hybrid computing module 100 wherein the module's main memory bank 352 has been allocated into multiple groupings comprising a stack memory group 354, a CPU/GPU memory group 356, a global memory group 358, a redundant memory group 360, and a general utility memory group 362. Each of the memory groupings operates with its own internal program counter. - The general purpose stack machine computing module's 350 operating system segregates its functional blocks to maximize efficiencies enabled by the invention. Instruction sets and associated variables within nested functions and recursive processes are organized and stored in the
stack memory group 354, which interfaces with the general purpose stack processor 374 designed to run with optimal code, power, and physical size efficiencies. Block program elements that have an iterative code structure have their instruction sets and associated variables stored and organized in the CPU/GPU memory group 356. Global variables, master instruction sets, and the master program counter are stored in the global memory group 358, which interfaces with a master processor. The master processor could be either the CPU/GPU processor(s) 376 or the general purpose stack processor 374, and administers the primary iterative code blocks. The redundant memory management group 360 is used to interface the general purpose stack machine computing module 350 with redundant systems or backup memory systems connected to the module through its I/O system interface 378. The general utility memory management group 362 can be subdivided into a plurality of subgroupings and used to manage any purpose not delegated to the other groups, such as system buffering or memory overflows. A master controller and instruction register 380 coordinates data and process transfers and function calls between the main memory bank 352, the CPU/GPU processor(s) 376, the general purpose stack processor 374, and the I/O system interface 378. - Stack machine computers have demonstrated clear efficiency gains, measured in terms of processing speed, transistor count (size), power efficiency, and code density minimization, when applied to nested and recursive functions. Although conventional processors using register-based architectures can be configured as a virtual stack machine, considerable power and transistor count savings are only achieved by applying structured programming languages (FORTH and POSTSCRIPT) to processors having matching machine code.
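The segregation of functional blocks among the memory groupings can be sketched as a simple routing rule. The group names follow the text; the routing function and block kinds are illustrative assumptions, not the patent's mechanism.

```python
# Illustrative sketch of how an operating system might segregate program
# blocks among the module's main memory bank groupings. The routing rule
# is an assumption for illustration only.

MEMORY_GROUPS = {
    "stack": [],      # stack memory group 354: nested/recursive functions
    "cpu_gpu": [],    # CPU/GPU memory group 356: iterative code blocks
    "global": [],     # global memory group 358: globals, master instruction sets
    "redundant": [],  # redundant memory group 360: backup/redundant systems
    "utility": [],    # general utility group 362: buffering, overflows, misc.
}

def place_block(block_kind, block):
    """Route a program block to its memory group by structural kind."""
    route = {
        "recursive": "stack",
        "nested": "stack",
        "iterative": "cpu_gpu",
        "global_variable": "global",
        "backup": "redundant",
    }
    # Anything not delegated to another group lands in the utility group.
    MEMORY_GROUPS[route.get(block_kind, "utility")].append(block)

place_block("recursive", "fib(n)")        # -> stack memory group
place_block("iterative", "matmul loop")   # -> CPU/GPU memory group
place_block("buffering", "io_buffer")     # -> general utility group
```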
For example, the Computer Cowboys MuP21 processor, which had machine code structured to match FORTH, managed 100 million instructions per second (“MIPS”) with only 7,000 transistors consuming 50 mW. This represented a 1,000-fold decrease in transistor count, with associated benefits to component size/cost and power consumption, over equivalent processors utilizing conventional register architectures. However, the intrinsic programmatic inflexibility of stack machines, inherent in the imposition of a fixed-depth stack that is not directly accessible, has forced leading stack machines (the Computer Cowboys MuP21, Harris RTX, and Novix NC4016) to be withdrawn from the marketplace. These limitations have relegated modern stack machines to peripheral-interface-controller (PIC) devices.
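The match between FORTH and stack-machine code can be illustrated with a minimal evaluator. This is an assumed sketch, not the MuP21's instruction set: each word maps to a single stack operation, so there is no register allocation or operand addressing to encode, which is why so few primitives (and transistors) suffice.

```python
# A minimal FORTH-like evaluator (illustrative only): each word is one
# stack operation, mirroring how a stack machine's machine code can
# match a structured stack language directly.

def forth_eval(program):
    """Evaluate a whitespace-separated RPN program on a data stack."""
    stack = []
    words = {
        "+":    lambda: stack.append(stack.pop() + stack.pop()),
        "*":    lambda: stack.append(stack.pop() * stack.pop()),
        "DUP":  lambda: stack.append(stack[-1]),
        "DROP": lambda: stack.pop(),
        "SWAP": lambda: stack.extend([stack.pop(), stack.pop()]),
    }
    for token in program.split():
        if token in words:
            words[token]()          # one word -> one stack operation
        else:
            stack.append(int(token))  # literal pushed onto the stack
    return stack

forth_eval("3 4 + DUP *")   # (3 + 4) squared -> [49]
```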
- Therefore, a specific embodiment of the general purpose stack
machine computing module 350 incorporates an ASIC semiconductor die 122 to function as the module's stack processor 374, wherein the ASIC die 122 is designed with machine code that matches and supports a structured programming language, preferably the FORTH or POSTSCRIPT programming languages. Since the primary objective of the invention is to develop a general purpose stack machine computing module, and an FPGA can be encoded with machine code that matches a structured programming language, a preferred embodiment of the invention comprises a general purpose stack machine computing module 350 that incorporates an FPGA as its stack processor 374, or an FPGA configured as a stack processor 374 comprising multiple processing cores (not shown to avoid redundancy). Additionally, since the same efficiencies that enable minimum instruction set computing and maximum use of every operational cycle further enable efficient branching in main memory, by changing a linear series 308 of operators applied to a linear collection 306 of operands before they are loaded into a stack processor 374, it is a meaningful preferred embodiment of the invention to use the stack processor to manage iterative code blocks. - The general purpose stack machine computing module's 350 operating system organizes the stack memory group 354 (see
FIG. 9A ) to have a data stack register 382, a return stack register 384, and one or more instruction stack registers 386. The one or more instruction registers 386 are used to store functions or subroutines as operator sequences, and can also be used to store instructions used by the stack processor 374 for retrieval at a later time. The first item-address listed in the data stack register 382 is loaded into a stack buffer utility 390 in the stack processor 374 during the first operational cycle. On the second operational cycle, the stack buffer utility 390 loads the first desired operand from the stack main memory 392 into the top-of-the-stack (TOS) buffer 394 through the data bus 395, while the next item-address listed in the data stack register 382 is loaded into the stack buffer utility 390 to configure it to load the second item in the data stack into the ALU operand buffer 396 during the subsequent operational cycle. To maximize operational efficiencies, the buffer utility 390 may also store a plurality of items that are address-mapped into its local register in exact sequence with the LIFO structure of the data stack register 382. This process of using the stack buffer utility 390 to translate a LIFO structure of item-addresses into a self-consistent list of items at processor clock-speeds allows a pre-determined sequence of operands to be loaded into the ALU operand buffer 396 as though the sequence was loaded directly from the data stack register. Once the TOS 394 and ALU operand 396 buffers are loaded with the first two items in the data stack, subsequent operational cycles simultaneously call the next operand(s) into the ALU operand buffer 396 in matching LIFO sequence with the list of corresponding address pointers originally loaded into the data stack register 382, while the resultant of the applied operation emerging from the ALU 398 is reloaded back into the TOS buffer 394.
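The data path just described can be sketched abstractly. This is a hedged model (the function, memory layout, and addresses are assumptions, not the patent's circuit): the data stack register holds a LIFO list of item-addresses, each address is translated into the operand held in stack main memory, and on each cycle the ALU result is reloaded into the TOS buffer.

```python
# Abstract model of the data stack register / stack buffer utility /
# TOS buffer cycle: addresses are popped LIFO, translated into operands
# from stack main memory, and the ALU result becomes the new TOS.

def run_cycles(stack_main_memory, data_stack_register, alu_op):
    """data_stack_register: LIFO list of address pointers (top is last)."""
    fetch = lambda: stack_main_memory[data_stack_register.pop()]
    tos = fetch()                     # first operand -> TOS buffer
    while data_stack_register:
        operand = fetch()             # next operand -> ALU operand buffer
        tos = alu_op(tos, operand)    # ALU resultant back into TOS buffer
    return tos

memory = {0x10: 2, 0x11: 3, 0x12: 4}   # item-address -> operand
result = run_cycles(memory, [0x10, 0x11, 0x12], lambda a, b: a + b)
# result == 9; because the register holds addresses rather than values,
# memory[0x10] can be refreshed in place and the stack sees the update
# when that pointer is consumed.
```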
Although a list of address pointers loaded LIFO into the data stack register 382 is a preferred embodiment of the invention, it is inherent within the invention to load the items themselves into the data stack register 382 and still maintain fundamental item-addressability. - The
return register 384 comprises the list of addresses that are used to permanently store a block of instructional code so it can be returned to when the stack processor 374 has completed the block calculation. Similarly, the return register 384 is also used to list the address used to temporarily house a block of code that was interrupted, so it can be retrieved following a status interrupt and reinstated to complete its original task. These lists are also formatted in LIFO structure to more easily maintain programmatic integrity. - The
instruction stack register 386 comprises a LIFO list of pointers to locations within the ALU 398 that represent specific machine-coded logical operations to be used as primitive element operators, as described in FIG. 8A . The ALU address pointers in the instruction stack register 386 are sequenced to match the primitive element algorithmic series to be applied to an associated set of operands that will be loaded in tandem into the ALU 398. The LIFO sequence of operator addresses is compiled as a list of operators to complete any recursive or nested loop calculation desired with the stack processor 374. - The mathematical operators in the
instruction register 386 are loaded into the ALU 398 by means of an instruction set utility 400. The instruction set utility 400 activates input paths within the ALU 398 that load the operands stored in the TOS 394 and ALU operand 396 buffers into the prescribed logical operator. Left uninterrupted, the general purpose stack processor 374 allows all of the items specified in the data and instruction “stacks” (382, 386) to be processed in a manner consistent with a conventional stack machine using minimal instruction sets, transistor counts, chip size, and power consumption. - The
instruction set utility 400 can also be configured to record and copy a programmable fixed number of operand pairs and operators so they can be played back again through the ALU 398 in proper sequence without affecting the instruction register 386. - A principal benefit of the
stack processor 374 over, and its major distinction from, the prior art is its ability to use the memory management architecture 220 and computational operating system 265 to modify any global variable buried within a data stack 300 “on-the-fly”, without a need to transfer the sequenced items in and out of cache to main memory to effectuate the global variable update, or to waste operational cycles when making a program jump. This aspect of the invention couples a stack machine's inherent ability to execute fast subroutine linkages and interrupt responses with the invention's ability to load addressable items directly from main memory at speeds in step with the processors' operational cycle. This embodiment further enables the stack processor 374 to respond to a conditional logic interrupt triggered outside the stack or elsewhere in the system, so it can operate alongside pipelined and multi-threaded CPU/GPU processor cores. This aspect of the invention allows the general purpose stack machine computing module 350 to support pipelined or multi-threaded general purpose architectures, which are additional embodiments of this invention. - An update to a buried global value is effectuated when an alert from the master controller and
instruction register 380 signals that a global variable has been changed somewhere in the system. The global variable could be changed in additional cores within the stack processor 374, in a neighboring CPU/GPU core 376, in another general purpose stack machine computing module 350 configured as a distributed or fault-tolerant computing element, or in a networked system connected to the module 350 through the I/O system 378. - The master controller and
instruction register 380 activates commands over the status interrupt bus 402 to temporarily halt traffic over the data bus 395. While data traffic is temporarily halted, the addressable item stored in stack main memory 392 that corresponds to the address pointer of the global variable loaded into the data stack register 382 is refreshed with the updated value from the global variable register 404. Once the updated global variable is confirmed, the global variable register 404 signals the master controller and instruction register 380 to resume traffic over the data bus 395. - In situations where the stack
processor program counter 406 registers that the global variable recorded within the data stack register 382 has already been loaded into the stack buffer utility 390 or the ALU operand buffer 396, the updated value is loaded into the instruction set utility 400 during the system interrupt. The instruction set utility 400 then overrides the previously loaded operand with the updated global value during the cycle it is scheduled to be operated upon within the ALU 398. - In the event the global value to be updated was recently used to produce the value stored in the
TOS buffer 394, the instruction set utility 400 is instructed to play back in reverse order the operands and operators it has copied and recorded, and then substitute the updated global value for the obsolete value before the interrupt is released. Alternatively, the instruction set utility 400 can use a series of operands and operators stored in the instruction stack register 386 to re-calculate the function with the updated global variable, if desired. - The memory management flexibility enabled by the invention further provides a general purpose stack
machine computing module 350 comprising a general purpose stack processor 374 that can be halted by a logical interrupt command to accommodate instructions that re-orient the computational program to a block stored within the module main memory bank 352, or to an entirely new set of instructions that are pipelined in or threaded with other processors within, or in communication with, the module 350. - In the case of a locally generated program change, an interrupt flag originating from an internal logical process alerts the master controller and
instruction register 380 to change the direction of the program based upon a pre-specified logical condition using any of the embodiments specified above, such as giving priority access to certain processes scheduled to run in the stack processor 374, or updating a global variable across the main memory bank 352 or any peripheral memory (not shown) networked to the main memory bank 352. The master controller and instruction register 380 issues commands to halt traffic on the data bus 395 until the logical interrupt register 408 has loaded the high priority program blocks into the data stack 382, return stack 384, and instruction stack 386 registers, with all associated items placed in the stack memory group's 354 main memory 392. The pointers previously loaded into the registers can either be pushed further down the register or redirected to other locations within the module main memory bank 352. Traffic is then restored to the data bus 395, allowing the higher priority process to run through to completion so the lower priority process can then be restored. - In situations where it is desirable to thread the
stack processor 374 with other stack processing cores located elsewhere in the system (not shown), the logical interrupt register 408 alerts the master controller and instruction register 380 to halt traffic on the data bus 395. The stack program controller 406 coordinates with the instruction set utility 400 to record and store the state of the existing process so it can be restored at a later instance, while the logical interrupt register 408 pipelines the items from the external processor core(s) (not shown) through the status interrupt bus 402. Additional data stack 382, return stack 384, and instruction set 386 registers may be allocated during the process, and the imported items could be stored in any reliable location in main memory bank 352. Pointers related to the threaded or pipelined processes address locations accessed through the I/O interface system 378. Traffic over the data bus is reinitiated to activate computational processes in the stack processor 374, and the threaded processes/data may be interleaved to run continually with the internal processes. - While the invention is described herein with reference to the preferred embodiments, it is to be understood that it is not intended to limit the invention to the specific forms disclosed. On the contrary, it is intended to cover all modifications and alternative forms falling within the spirit and scope of the appended claims.
Claims (30)
1. A hybrid computing module, comprising:
a semiconductor carrier including a substrate adapted to provide electrical communication, through electrically conducting traces and passive circuit network filtering elements formed upon the carrier substrate, between a fully integrated power management circuit module having a resonant gate transistor to switch electrical power to drive the transfer of data and digital process instruction sets between a plurality of discrete semiconductor die mounted upon the semiconductor carrier, wherein the plurality of discrete semiconductor die include:
at least one microprocessor die forming a central processing unit (CPU), and
a memory bank having at least one memory die.
2. The hybrid computing module of claim 1 , wherein the plurality of semiconductor die include a field programmable gate array (FPGA).
3. The hybrid computing module of claim 1 , wherein the plurality of semiconductor die additionally provide memory controller functionality.
4. The hybrid computing module of claim 3 , wherein the memory controller functionality is field programmable.
5. The hybrid computing module of claim 3 , wherein the memory controller functionality is provided by a static address memory controller.
6. The hybrid computing module of claim 1 , wherein the plurality of semiconductor die additionally include a graphics processing unit (GPU).
7. The hybrid computing module of claim 1 , wherein the plurality of semiconductor die additionally include an application-specific integrated circuit (ASIC).
8. The hybrid computing module of claim 1 , wherein some of the plurality of semiconductor die are mounted as a stack on the semiconductor carrier.
9. The hybrid computing module of claim 1 , further comprising a plurality of semiconductor die mounted upon the hybrid computing module that provide GPU and field programmability.
10. The hybrid computing module of claim 9 , wherein the CPU and GPU semiconductor die comprise multiple processing cores.
11. The hybrid computing module of claim 1 , wherein the fully integrated power management module is mounted on the semiconductor carrier.
12. The hybrid computing module of claim 1 , wherein the fully integrated power management module switches power at speeds greater than 250 MHz.
13. The hybrid computing module of claim 1 , wherein the fully integrated power management module is formed upon the semiconductor carrier.
14. The hybrid computing module of claim 1 , wherein the substrate forming the semiconductor carrier is a semiconductor.
15. The hybrid computing module of claim 14 , wherein active circuitry is embedded in the semiconductor substrate that manages USB, audio, video and other communications bus interface protocols.
16. The hybrid computing module of claim 1 , wherein the microprocessor die contains multiple processing cores.
17. The hybrid computing module of claim 1 , wherein the microprocessor die has cache memory that occupies less than 15% of the microprocessor die footprint.
18. The hybrid computing module of claim 1 , wherein the plurality of discrete semiconductor die are configured as a chip stack.
19. The hybrid computing module of claim 1 , wherein the semiconductor carrier is in electrical communication with electro-optic drivers that interface the hybrid computing module with other systems by means of a fiber-optic network.
20. The hybrid computing module of claim 19 , wherein the electro-optical interface contains an active layer that forms a 3D electron gas.
21. The hybrid computing module of claim 1 , wherein the hybrid computing module contains a plurality of central processing units, each functioning as distributed processing cores.
22. The hybrid computing module of claim 1 , wherein the hybrid computing module contains a plurality of central processing units that are configured to function as a fault-tolerant computing system.
23. The hybrid computing module of claim 1 , wherein the hybrid computing module is in thermal contact with a thermoelectric device.
24. A real-time memory access computing architecture, comprising:
a hybrid computer module comprising a plurality of discrete semiconductor die mounted upon a semiconductor carrier, which hybrid computer module further comprises:
a fully integrated power management module having a resonant gate transistor, wherein the fully integrated power management module is adapted to synchronously switch power at speeds that match a clock speed of a microprocessor on an adjacent microprocessor die mounted within the hybrid computer module to provide real-time memory access;
a look-up table adapted to select a pointer to reference addresses in a main memory where data and/or processes are physically located;
a memory management variable that uses the look-up table to select the next set of data and/or processes called by the microprocessor;
a memory bank forming the main memory,
wherein, ≧50% of cache memory of the microprocessor die is allocated to stack-based memory functionality.
25. The computing architecture of claim 24 , wherein the resonant transistor gate switches power at speeds between 600 MHz and 60 GHz.
26. The computing architecture of claim 24 , wherein the fully integrated power management module has an efficiency greater than 98%.
27. The computing architecture of claim 24 , wherein 70%-100% of the microprocessor die cache memory is allocated to stack-based memory functionality.
28. The computing architecture of claim 24 , wherein the look-up table is located in cache memory or in main memory.
29. The computing architecture of claim 24 , wherein main memory resources provide both stack-based and heap-based memory functionality.
30. The computing architecture of claim 24 , wherein the memory management variable is adapted to instruct the look-up table to reassign and/or reallocate main memory addresses.
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/917,601 US20140013129A1 (en) | 2012-07-09 | 2013-06-13 | Hybrid computing module |
CN201380046854.7A CN104603944B (en) | 2012-07-09 | 2013-07-09 | Mix computing module |
BR112015000525A BR112015000525A2 (en) | 2012-07-09 | 2013-07-09 | hybrid computational module |
PCT/US2013/049636 WO2014011579A2 (en) | 2012-07-09 | 2013-07-09 | Hybrid computing module |
CA2917932A CA2917932A1 (en) | 2012-07-09 | 2013-07-09 | Hybrid computing module |
EP13817338.0A EP2870630B1 (en) | 2012-07-09 | 2013-07-09 | Real-time memory access computer architecture |
US15/845,259 US20180224916A1 (en) | 2012-07-09 | 2017-12-18 | Hybrid computing module |
US16/525,318 US11061459B2 (en) | 2010-08-23 | 2019-07-29 | Hybrid computing module |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261669557P | 2012-07-09 | 2012-07-09 | |
US201361776333P | 2013-03-11 | 2013-03-11 | |
US13/917,601 US20140013129A1 (en) | 2012-07-09 | 2013-06-13 | Hybrid computing module |
Related Parent Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/216,192 Continuation US8779489B2 (en) | 2010-08-23 | 2011-08-23 | Power FET with a resonant transistor gate |
US15/881,164 Continuation-In-Part US10651167B2 (en) | 2010-08-23 | 2018-01-26 | Power FET with a resonant transistor gate |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/325,129 Continuation US9153532B2 (en) | 2010-08-23 | 2014-07-07 | Power FET with a resonant transistor gate |
US15/845,259 Continuation US20180224916A1 (en) | 2010-08-23 | 2017-12-18 | Hybrid computing module |
Publications (1)
Publication Number | Publication Date |
---|---|
US20140013129A1 true US20140013129A1 (en) | 2014-01-09 |
Family
ID=49879449
Family Applications (9)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/917,601 Abandoned US20140013129A1 (en) | 2010-08-23 | 2013-06-13 | Hybrid computing module |
US13/917,607 Active 2033-09-24 US9348385B2 (en) | 2012-07-09 | 2013-06-13 | Hybrid computing module |
US15/162,285 Expired - Fee Related US10620680B2 (en) | 2012-07-09 | 2016-05-23 | Hybrid computing module |
US15/161,815 Active US9710181B2 (en) | 2012-07-09 | 2016-05-23 | Hybrid computing module |
US15/162,739 Active US9791909B2 (en) | 2012-07-09 | 2016-05-24 | Hybrid computing module |
US15/162,745 Active US9766680B2 (en) | 2012-07-09 | 2016-05-24 | Hybrid computing module |
US15/162,759 Abandoned US20170031847A1 (en) | 2012-07-09 | 2016-05-24 | Hybrid computing module |
US15/845,259 Abandoned US20180224916A1 (en) | 2010-08-23 | 2017-12-18 | Hybrid computing module |
US16/729,630 Active US11199892B2 (en) | 2012-07-09 | 2019-12-30 | Hybrid computing module |
Family Applications After (8)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/917,607 Active 2033-09-24 US9348385B2 (en) | 2012-07-09 | 2013-06-13 | Hybrid computing module |
US15/162,285 Expired - Fee Related US10620680B2 (en) | 2012-07-09 | 2016-05-23 | Hybrid computing module |
US15/161,815 Active US9710181B2 (en) | 2012-07-09 | 2016-05-23 | Hybrid computing module |
US15/162,739 Active US9791909B2 (en) | 2012-07-09 | 2016-05-24 | Hybrid computing module |
US15/162,745 Active US9766680B2 (en) | 2012-07-09 | 2016-05-24 | Hybrid computing module |
US15/162,759 Abandoned US20170031847A1 (en) | 2012-07-09 | 2016-05-24 | Hybrid computing module |
US15/845,259 Abandoned US20180224916A1 (en) | 2010-08-23 | 2017-12-18 | Hybrid computing module |
US16/729,630 Active US11199892B2 (en) | 2012-07-09 | 2019-12-30 | Hybrid computing module |
Country Status (6)
Country | Link |
---|---|
US (9) | US20140013129A1 (en) |
EP (1) | EP2870630B1 (en) |
CN (1) | CN104603944B (en) |
BR (1) | BR112015000525A2 (en) |
CA (1) | CA2917932A1 (en) |
WO (1) | WO2014011579A2 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150097221A1 (en) * | 2010-08-23 | 2015-04-09 | L. Pierre de Rochemont | Power fet with a resonant transistor gate |
US9501222B2 (en) | 2014-05-09 | 2016-11-22 | Micron Technology, Inc. | Protection zones in virtualized physical addresses for reconfigurable memory systems using a memory abstraction |
US9558143B2 (en) | 2014-05-09 | 2017-01-31 | Micron Technology, Inc. | Interconnect systems and methods using hybrid memory cube links to send packetized data over different endpoints of a data handling device |
WO2019213625A1 (en) | 2018-05-03 | 2019-11-07 | De Rochemont Pierre L | High speed / low power server farms and server networks |
US11239922B2 (en) | 2018-06-05 | 2022-02-01 | L. Pierre de Rochemont | Module with high peak bandwidth I/O channels |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140013129A1 (en) * | 2012-07-09 | 2014-01-09 | L. Pierre de Rochemont | Hybrid computing module |
US9047196B2 (en) * | 2012-06-19 | 2015-06-02 | Concurix Corporation | Usage aware NUMA process scheduling |
US9575813B2 (en) | 2012-07-17 | 2017-02-21 | Microsoft Technology Licensing, Llc | Pattern matching process scheduler with upstream optimization |
US9703335B2 (en) * | 2013-01-14 | 2017-07-11 | Dell Products L.P. | Information handling system chassis with anisotropic conductance |
US9342443B2 (en) | 2013-03-15 | 2016-05-17 | Micron Technology, Inc. | Systems and methods for memory system management based on thermal information of a memory system |
CN106133700A (en) | 2014-03-29 | 2016-11-16 | 英派尔科技开发有限公司 | Energy-conservation dynamic dram caching adjusts |
US9990293B2 (en) * | 2014-08-12 | 2018-06-05 | Empire Technology Development Llc | Energy-efficient dynamic dram cache sizing via selective refresh of a cache in a dram |
CN105743808B (en) * | 2014-12-08 | 2017-09-19 | 华为技术有限公司 | A kind of adaptation QoS method and apparatus |
US20160204497A1 (en) * | 2015-01-08 | 2016-07-14 | Young Max Enterprises Co., Ltd. | Inductive proximity antenna module |
WO2018020299A1 (en) * | 2016-07-29 | 2018-02-01 | Chan Kam Fu | Lossless compression and decompression methods |
US10068879B2 (en) | 2016-09-19 | 2018-09-04 | General Electric Company | Three-dimensional stacked integrated circuit devices and methods of assembling the same |
US11397687B2 (en) * | 2017-01-25 | 2022-07-26 | Samsung Electronics Co., Ltd. | Flash-integrated high bandwidth memory appliance |
US10489204B2 (en) | 2017-01-31 | 2019-11-26 | Samsung Electronics Co., Ltd. | Flexible in-order and out-of-order resource allocation |
US11030126B2 (en) * | 2017-07-14 | 2021-06-08 | Intel Corporation | Techniques for managing access to hardware accelerator memory |
CN113918481A (en) | 2017-07-30 | 2022-01-11 | 纽罗布拉德有限公司 | Memory chip |
CN110118581B (en) * | 2019-06-05 | 2024-01-09 | 上海一旻成锋电子科技有限公司 | Embedded composite sensor |
US20210173784A1 (en) * | 2019-12-06 | 2021-06-10 | Alibaba Group Holding Limited | Memory control method and system |
CN113032015B (en) * | 2019-12-24 | 2022-02-18 | 中国科学院沈阳自动化研究所 | Communication method for precision motion control |
US11507414B2 (en) * | 2020-11-25 | 2022-11-22 | Cadence Design Systems, Inc. | Circuit for fast interrupt handling |
US11842226B2 (en) | 2022-04-04 | 2023-12-12 | Ambiq Micro, Inc. | System for generating power profile in low power processor |
CN115237475B (en) * | 2022-06-23 | 2023-04-07 | 云南大学 | Forth multi-core stack processor and instruction set |
US20240046776A1 (en) * | 2022-08-07 | 2024-02-08 | Andrew Magdy Kamal | Computing Method |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5781040A (en) * | 1996-10-31 | 1998-07-14 | Hewlett-Packard Company | Transformer isolated driver for power transistor using frequency switching as the control signal |
US6333895B1 (en) * | 1999-10-07 | 2001-12-25 | Mitsubishi Denki Kabushiki Kaisha | Clock synchronous semiconductor device having a reduced clock access time |
US20030102495A1 (en) * | 2001-12-05 | 2003-06-05 | Huppenthal Jon M. | Reconfigurable processor module comprising hybrid stacked integrated circuit die elements |
US20050268185A1 (en) * | 2004-05-26 | 2005-12-01 | David Vinke | Method and apparatus for high speed testing of latch based random access memory |
US20090304389A1 (en) * | 2008-06-05 | 2009-12-10 | Samsung Electronics Co., Ltd. | Semiconductor apparatuses having optical connections between memory controller and memory module |
US7701252B1 (en) * | 2007-11-06 | 2010-04-20 | Altera Corporation | Stacked die network-on-chip for FPGA |
US20120043598A1 (en) * | 2010-08-23 | 2012-02-23 | De Rochemont L Pierre | Power fet with a resonant transistor gate |
US20120198266A1 (en) * | 2011-01-28 | 2012-08-02 | Qualcomm Incorporated | Bus Clock Frequency Scaling for a Bus Interconnect and Related Devices, Systems, and Methods |
US20130157639A1 (en) * | 2011-12-16 | 2013-06-20 | SRC Computers, LLC | Mobile electronic devices utilizing reconfigurable processing techniques to enable higher speed applications with lowered power consumption |
US20130205089A1 (en) * | 2012-02-08 | 2013-08-08 | Mediatek Singapore Pte. Ltd. | Cache Device and Methods Thereof |
US20130257481A1 (en) * | 2012-03-28 | 2013-10-03 | Sophocles R. Metsis | Tree based adaptive die enumeration |
US8635492B2 (en) * | 2011-02-15 | 2014-01-21 | International Business Machines Corporation | State recovery and lockstep execution restart in a system with multiprocessor pairing |
Family Cites Families (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5264736A (en) * | 1992-04-28 | 1993-11-23 | Raytheon Company | High frequency resonant gate drive for a power MOSFET |
US6446867B1 (en) | 1995-11-22 | 2002-09-10 | Jorge Sanchez | Electro-optic interface system and method of operation |
US5707715A (en) | 1996-08-29 | 1998-01-13 | L. Pierre deRochemont | Metal ceramic composites with improved interfacial properties and methods to make such composites |
US6143432A (en) | 1998-01-09 | 2000-11-07 | L. Pierre deRochemont | Ceramic composites with improved interfacial properties and methods to make such composites |
US6323549B1 (en) | 1996-08-29 | 2001-11-27 | L. Pierre deRochemont | Ceramic composite wiring structures for semiconductor devices and method of manufacture |
WO1998019234A1 (en) | 1996-10-28 | 1998-05-07 | Macronix International Co., Ltd. | Processor with embedded in-circuit programming structures |
US6148391A (en) * | 1998-03-26 | 2000-11-14 | Sun Microsystems, Inc. | System for simultaneously accessing one or more stack elements by multiple functional units using real stack addresses |
US6526491B2 (en) | 2001-03-22 | 2003-02-25 | Sony Corporation Entertainment Inc. | Memory protection system and method for computer architecture for broadband networks |
DE10201124A1 (en) * | 2002-01-09 | 2003-07-24 | Infineon Technologies Ag | Opto-electronic component for raising data transmission rates has a quantum point structure for making a functional link between monolithically integrated components. |
US8749054B2 (en) | 2010-06-24 | 2014-06-10 | L. Pierre de Rochemont | Semiconductor carrier with vertical power FET module |
US8037224B2 (en) | 2002-10-08 | 2011-10-11 | Netlogic Microsystems, Inc. | Delegating network processor operations to star topology serial bus interfaces |
US7191291B2 (en) | 2003-01-16 | 2007-03-13 | Ip-First, Llc | Microprocessor with variable latency stack cache |
FR2851349A1 (en) | 2003-02-17 | 2004-08-20 | St Microelectronics Sa | METHOD FOR MANAGING A MICROPROCESSOR STACK FOR BACKING UP CONTEXTUAL DATA |
US7325097B1 (en) | 2003-06-26 | 2008-01-29 | Emc Corporation | Method and apparatus for distributing a logical volume of storage for shared access by multiple host computers |
US7064973B2 (en) | 2004-02-03 | 2006-06-20 | Klp International, Ltd. | Combination field programmable gate array allowing dynamic reprogrammability |
US7405698B2 (en) | 2004-10-01 | 2008-07-29 | De Rochemont L Pierre | Ceramic antenna module and methods of manufacture thereof |
CN101213638B (en) | 2005-06-30 | 2011-07-06 | L. Pierre de Rochemont | Electronic component and method of manufacture |
US8350657B2 (en) | 2005-06-30 | 2013-01-08 | Derochemont L Pierre | Power management module and method of manufacture |
US8354294B2 (en) | 2006-01-24 | 2013-01-15 | De Rochemont L Pierre | Liquid chemical deposition apparatus and process and products therefrom |
US7763917B2 (en) | 2006-01-24 | 2010-07-27 | De Rochemont L Pierre | Photovoltaic devices with silicon dioxide encapsulation layer and method to make same |
US7452766B2 (en) | 2006-08-31 | 2008-11-18 | Micron Technology, Inc. | Finned memory cells and the fabrication thereof |
US8934741B2 (en) | 2007-11-16 | 2015-01-13 | Brphotonics Produtos Optoelectronicos LTDA | Integrated circuit with optical data communication |
US20120137108A1 (en) | 2008-02-19 | 2012-05-31 | Koch Iii Kenneth Elmon | Systems and methods integrating boolean processing and memory |
US8527974B2 (en) | 2008-03-28 | 2013-09-03 | International Business Machines Corporation | Data transfer optimized software cache for regular memory references |
US8561044B2 (en) | 2008-10-07 | 2013-10-15 | International Business Machines Corporation | Optimized code generation targeting a high locality software cache |
US7816945B2 (en) | 2009-01-22 | 2010-10-19 | International Business Machines Corporation | 3D chip-stack with fuse-type through silicon via |
US8922347B1 (en) | 2009-06-17 | 2014-12-30 | L. Pierre de Rochemont | R.F. energy collection circuit for wireless devices |
US8952858B2 (en) | 2009-06-17 | 2015-02-10 | L. Pierre de Rochemont | Frequency-selective dipole antennas |
US20110114146A1 (en) | 2009-11-13 | 2011-05-19 | Alphabet Energy, Inc. | Uniwafer thermoelectric modules |
US8552708B2 (en) | 2010-06-02 | 2013-10-08 | L. Pierre de Rochemont | Monolithic DC/DC power management module with surface FET |
US9023493B2 (en) | 2010-07-13 | 2015-05-05 | L. Pierre de Rochemont | Chemically complex ablative max-phase material and method of manufacture |
US8705909B2 (en) * | 2010-07-16 | 2014-04-22 | Ibiden Co., Ltd. | Optical interconnect |
US20140013129A1 (en) * | 2012-07-09 | 2014-01-09 | L. Pierre de Rochemont | Hybrid computing module |
US9123768B2 (en) | 2010-11-03 | 2015-09-01 | L. Pierre de Rochemont | Semiconductor chip carriers with monolithically integrated quantum dot devices and method of manufacture thereof |
US8466559B2 (en) | 2010-12-17 | 2013-06-18 | Intel Corporation | Forming die backside coating structures with coreless packages |
US9490414B2 (en) | 2011-08-31 | 2016-11-08 | L. Pierre de Rochemont | Fully integrated thermoelectric devices and their application to aerospace de-icing systems |
2013
- 2013-06-13 US US13/917,601 patent/US20140013129A1/en not_active Abandoned
- 2013-06-13 US US13/917,607 patent/US9348385B2/en active Active
- 2013-07-09 CN CN201380046854.7A patent/CN104603944B/en not_active Expired - Fee Related
- 2013-07-09 BR BR112015000525A patent/BR112015000525A2/en not_active Application Discontinuation
- 2013-07-09 EP EP13817338.0A patent/EP2870630B1/en active Active
- 2013-07-09 WO PCT/US2013/049636 patent/WO2014011579A2/en active Application Filing
- 2013-07-09 CA CA2917932A patent/CA2917932A1/en not_active Abandoned
2016
- 2016-05-23 US US15/162,285 patent/US10620680B2/en not_active Expired - Fee Related
- 2016-05-23 US US15/161,815 patent/US9710181B2/en active Active
- 2016-05-24 US US15/162,739 patent/US9791909B2/en active Active
- 2016-05-24 US US15/162,745 patent/US9766680B2/en active Active
- 2016-05-24 US US15/162,759 patent/US20170031847A1/en not_active Abandoned
2017
- 2017-12-18 US US15/845,259 patent/US20180224916A1/en not_active Abandoned
2019
- 2019-12-30 US US16/729,630 patent/US11199892B2/en active Active
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5781040A (en) * | 1996-10-31 | 1998-07-14 | Hewlett-Packard Company | Transformer isolated driver for power transistor using frequency switching as the control signal |
US6333895B1 (en) * | 1999-10-07 | 2001-12-25 | Mitsubishi Denki Kabushiki Kaisha | Clock synchronous semiconductor device having a reduced clock access time |
US20030102495A1 (en) * | 2001-12-05 | 2003-06-05 | Huppenthal Jon M. | Reconfigurable processor module comprising hybrid stacked integrated circuit die elements |
US20050268185A1 (en) * | 2004-05-26 | 2005-12-01 | David Vinke | Method and apparatus for high speed testing of latch based random access memory |
US7701252B1 (en) * | 2007-11-06 | 2010-04-20 | Altera Corporation | Stacked die network-on-chip for FPGA |
US20090304389A1 (en) * | 2008-06-05 | 2009-12-10 | Samsung Electronics Co., Ltd. | Semiconductor apparatuses having optical connections between memory controller and memory module |
US20120043598A1 (en) * | 2010-08-23 | 2012-02-23 | De Rochemont L Pierre | Power fet with a resonant transistor gate |
US20120198266A1 (en) * | 2011-01-28 | 2012-08-02 | Qualcomm Incorporated | Bus Clock Frequency Scaling for a Bus Interconnect and Related Devices, Systems, and Methods |
US8635492B2 (en) * | 2011-02-15 | 2014-01-21 | International Business Machines Corporation | State recovery and lockstep execution restart in a system with multiprocessor pairing |
US20130157639A1 (en) * | 2011-12-16 | 2013-06-20 | SRC Computers, LLC | Mobile electronic devices utilizing reconfigurable processing techniques to enable higher speed applications with lowered power consumption |
US20130205089A1 (en) * | 2012-02-08 | 2013-08-08 | Mediatek Singapore Pte. Ltd. | Cache Device and Methods Thereof |
US20130257481A1 (en) * | 2012-03-28 | 2013-10-03 | Sophocles R. Metsis | Tree based adaptive die enumeration |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9881915B2 (en) * | 2010-08-23 | 2018-01-30 | L. Pierre de Rochemont | Power FET with a resonant transistor gate |
US9153532B2 (en) * | 2010-08-23 | 2015-10-06 | L. Pierre de Rochemont | Power FET with a resonant transistor gate |
US20160225759A1 (en) * | 2010-08-23 | 2016-08-04 | L. Pierre de Rochemont | Power fet with a resonant transistor gate |
US10651167B2 (en) * | 2010-08-23 | 2020-05-12 | L. Pierre de Rochemont | Power FET with a resonant transistor gate |
US20150097221A1 (en) * | 2010-08-23 | 2015-04-09 | L. Pierre de Rochemont | Power fet with a resonant transistor gate |
US9558143B2 (en) | 2014-05-09 | 2017-01-31 | Micron Technology, Inc. | Interconnect systems and methods using hybrid memory cube links to send packetized data over different endpoints of a data handling device |
US10126947B2 (en) | 2014-05-09 | 2018-11-13 | Micron Technology, Inc. | Interconnect systems and methods using hybrid memory cube links to send packetized data over different endpoints of a data handling device |
US9501222B2 (en) | 2014-05-09 | 2016-11-22 | Micron Technology, Inc. | Protection zones in virtualized physical addresses for reconfigurable memory systems using a memory abstraction |
US11132127B2 (en) | 2014-05-09 | 2021-09-28 | Micron Technology, Inc. | Interconnect systems and methods using memory links to send packetized data between different data handling devices of different memory domains |
US11947798B2 (en) | 2014-05-09 | 2024-04-02 | Micron Technology, Inc. | Packet routing between memory devices and related apparatuses, methods, and memory systems |
WO2019213625A1 (en) | 2018-05-03 | 2019-11-07 | De Rochemont Pierre L | High speed / low power server farms and server networks |
CN112106156A (en) * | 2018-05-03 | 2020-12-18 | L. Pierre de Rochemont | High speed/low power server farm and server network
EP3788644A4 (en) * | 2018-05-03 | 2023-03-01 | L. Pierre De Rochemont | High speed / low power server farms and server networks |
US11681348B2 (en) | 2018-05-03 | 2023-06-20 | L. Pierre de Rochemont | High speed / low power server farms and server networks |
US20230367384A1 (en) * | 2018-05-03 | 2023-11-16 | L. Pierre de Rochemont | Server farm with at least one hybrid computing module operating at clock speed optimally matching intrinsic clock speed of a related semiconductor die related thereto |
JP7398117B2 (en) | 2018-05-03 | 2023-12-14 | L. Pierre de Rochemont | High Speed / Low Power Server Farms and Server Networks
US11239922B2 (en) | 2018-06-05 | 2022-02-01 | L. Pierre de Rochemont | Module with high peak bandwidth I/O channels |
US11901956B2 (en) | 2018-06-05 | 2024-02-13 | L. Pierre de Rochemont | Module with high peak bandwidth I/O channels |
Also Published As
Publication number | Publication date |
---|---|
US20180224916A1 (en) | 2018-08-09 |
CA2917932A1 (en) | 2014-01-16 |
CN104603944A (en) | 2015-05-06 |
US20170139624A1 (en) | 2017-05-18 |
EP2870630A4 (en) | 2016-07-27 |
US11199892B2 (en) | 2021-12-14 |
BR112015000525A2 (en) | 2017-06-27 |
US9348385B2 (en) | 2016-05-24 |
US20170031844A1 (en) | 2017-02-02 |
EP2870630A2 (en) | 2015-05-13 |
US10620680B2 (en) | 2020-04-14 |
US9791909B2 (en) | 2017-10-17 |
US9766680B2 (en) | 2017-09-19 |
US20140013132A1 (en) | 2014-01-09 |
WO2014011579A3 (en) | 2014-03-27 |
US9710181B2 (en) | 2017-07-18 |
WO2014011579A2 (en) | 2014-01-16 |
CN104603944B (en) | 2018-01-02 |
EP2870630B1 (en) | 2023-02-22 |
US20200387206A1 (en) | 2020-12-10 |
US20170031413A1 (en) | 2017-02-02 |
US20170031847A1 (en) | 2017-02-02 |
US20170031843A1 (en) | 2017-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11199892B2 (en) | Hybrid computing module | |
EP3506511B1 (en) | Integrated circuit device with separate die for programmable fabric and programmable fabric support circuitry | |
US11928177B2 (en) | Methods and apparatus for performing video processing matrix operations within a memory array | |
US20130135008A1 (en) | Method and system for a run-time reconfigurable computer architecture | |
US8042082B2 (en) | Three dimensional memory in a system on a chip | |
US11061459B2 (en) | Hybrid computing module | |
EP3512101A1 (en) | Sector-aligned memory accessible to programmable logic fabric of programmable logic device | |
CN109219848A (en) | It is combined with the storage system of high density low bandwidth and low-density high bandwidth memory | |
EP1696318A1 (en) | Methods and apparatus for segmented stack management in a processor system | |
WO2005088443A2 (en) | Methods and apparatus for reducing power dissipation in a multi-processor system | |
WO2006098499A1 (en) | Methods and apparatus for dynamic linking program overlay | |
Pala et al. | Logic-in-memory architecture made real | |
JP2021057570A (en) | Packaged device with chiplet comprising memory resources | |
CN117616426A (en) | Methods, apparatus, and articles of manufacture for increasing utilization of a Neural Network (NN) accelerator circuit for a shallow layer of NN by reformatting one or more tensors | |
CN117751367A (en) | Method and apparatus for performing machine learning operations using storage element pointers | |
Hazarika et al. | Survey on memory management techniques in heterogeneous computing systems | |
US10489288B2 (en) | Algorithm methodologies for efficient compaction of overprovisioned memory systems | |
Vinçon et al. | Moving processing to data: On the influence of processing in memory on data management | |
EP3696664B1 (en) | Allocation of memory | |
US20220067524A1 (en) | Sparsity-aware datastore for inference processing in deep neural network architectures | |
Yang et al. | High-Performance Architecture Using Fast Dynamic Reconfigurable Accelerators | |
Elmegreen | Future trends in computing | |
Sterling et al. | First draft of a report on the continuum computer architecture | |
Alves et al. | Future Trends in Computing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: DEROCHEMONT, L. PIERRE, TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KOVACS, ALEXANDER J.;REEL/FRAME:031872/0809
Effective date: 20131214
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |