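WO2013080426A1 - Integrated circuit device having a structure that takes heat into consideration, three-dimensional integrated circuit, three-dimensional processor device, and process scheduler - Google Patents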
- Publication number
- WO2013080426A1 (international application PCT/JP2012/006744)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- processor
- circuit
- processor core
- chip
- integrated circuit
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7839—Architectures of general purpose stored program computers comprising a single central processing unit with memory
- G06F15/7864—Architectures of general purpose stored program computers comprising a single central processing unit with memory on more than one IC chip
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05F—SYSTEMS FOR REGULATING ELECTRIC OR MAGNETIC VARIABLES
- G05F5/00—Systems for regulating electric variables by detecting deviations in the electric input to the system and thereby controlling a device within the system to obtain a regulated output
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/16—Constructional details or arrangements
- G06F1/20—Cooling means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0893—Caches characterised by their organisation or structure
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L23/00—Details of semiconductor or other solid state devices
- H01L23/34—Arrangements for cooling, heating, ventilating or temperature compensation ; Temperature sensing arrangements
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L25/00—Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof
- H01L25/03—Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof all the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes
- H01L25/04—Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof all the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes the devices not having separate containers
- H01L25/065—Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof all the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group H01L27/00
- H01L25/0657—Stacked arrangements of devices
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L27/00—Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate
- H01L27/02—Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including semiconductor components specially adapted for rectifying, oscillating, amplifying or switching and having potential barriers; including integrated passive circuit elements having potential barriers
- H01L27/0203—Particular design considerations for integrated circuits
- H01L27/0207—Geometrical layout of the components, e.g. computer aided design; custom LSI, semi-custom LSI, standard cell technique
-
- H—ELECTRICITY
- H10—SEMICONDUCTOR DEVICES; ELECTRIC SOLID-STATE DEVICES NOT OTHERWISE PROVIDED FOR
- H10B—ELECTRONIC MEMORY DEVICES
- H10B10/00—Static random access memory [SRAM] devices
- H10B10/18—Peripheral circuit regions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/601—Reconfiguration of cache memory
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L2224/00—Indexing scheme for arrangements for connecting or disconnecting semiconductor or solid-state bodies and methods related thereto as covered by H01L24/00
- H01L2224/01—Means for bonding being attached to, or being formed on, the surface to be connected, e.g. chip-to-package, die-attach, "first-level" interconnects; Manufacturing methods related thereto
- H01L2224/10—Bump connectors; Manufacturing methods related thereto
- H01L2224/12—Structure, shape, material or disposition of the bump connectors prior to the connecting process
- H01L2224/13—Structure, shape, material or disposition of the bump connectors prior to the connecting process of an individual bump connector
- H01L2224/13001—Core members of the bump connector
- H01L2224/1302—Disposition
- H01L2224/13025—Disposition the bump connector being disposed on a via connection of the semiconductor or solid-state body
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L2224/00—Indexing scheme for arrangements for connecting or disconnecting semiconductor or solid-state bodies and methods related thereto as covered by H01L24/00
- H01L2224/01—Means for bonding being attached to, or being formed on, the surface to be connected, e.g. chip-to-package, die-attach, "first-level" interconnects; Manufacturing methods related thereto
- H01L2224/10—Bump connectors; Manufacturing methods related thereto
- H01L2224/15—Structure, shape, material or disposition of the bump connectors after the connecting process
- H01L2224/16—Structure, shape, material or disposition of the bump connectors after the connecting process of an individual bump connector
- H01L2224/161—Disposition
- H01L2224/16135—Disposition the bump connector connecting between different semiconductor or solid-state bodies, i.e. chip-to-chip
- H01L2224/16145—Disposition the bump connector connecting between different semiconductor or solid-state bodies, i.e. chip-to-chip the bodies being stacked
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L2225/00—Details relating to assemblies covered by the group H01L25/00 but not provided for in its subgroups
- H01L2225/03—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00
- H01L2225/04—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers
- H01L2225/065—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers the devices being of a type provided for in group H01L27/00
- H01L2225/06503—Stacked arrangements of devices
- H01L2225/06513—Bump or bump-like direct electrical connections between devices, e.g. flip-chip connection, solder bumps
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L2225/00—Details relating to assemblies covered by the group H01L25/00 but not provided for in its subgroups
- H01L2225/03—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00
- H01L2225/04—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers
- H01L2225/065—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers the devices being of a type provided for in group H01L27/00
- H01L2225/06503—Stacked arrangements of devices
- H01L2225/06541—Conductive via connections through the device, e.g. vertical interconnects, through silicon via [TSV]
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L2225/00—Details relating to assemblies covered by the group H01L25/00 but not provided for in its subgroups
- H01L2225/03—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00
- H01L2225/04—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers
- H01L2225/065—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers the devices being of a type provided for in group H01L27/00
- H01L2225/06503—Stacked arrangements of devices
- H01L2225/06555—Geometry of the stack, e.g. form of the devices, geometry to facilitate stacking
- H01L2225/06562—Geometry of the stack, e.g. form of the devices, geometry to facilitate stacking at least one device in the stack being rotated or offset
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L2225/00—Details relating to assemblies covered by the group H01L25/00 but not provided for in its subgroups
- H01L2225/03—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00
- H01L2225/04—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers
- H01L2225/065—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers the devices being of a type provided for in group H01L27/00
- H01L2225/06503—Stacked arrangements of devices
- H01L2225/06555—Geometry of the stack, e.g. form of the devices, geometry to facilitate stacking
- H01L2225/06565—Geometry of the stack, e.g. form of the devices, geometry to facilitate stacking the devices having the same size and there being no auxiliary carrier between the devices
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L2225/00—Details relating to assemblies covered by the group H01L25/00 but not provided for in its subgroups
- H01L2225/03—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00
- H01L2225/04—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers
- H01L2225/065—All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers the devices being of a type provided for in group H01L27/00
- H01L2225/06503—Stacked arrangements of devices
- H01L2225/06589—Thermal management, e.g. cooling
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L24/00—Arrangements for connecting or disconnecting semiconductor or solid-state bodies; Methods or apparatus related thereto
- H01L24/01—Means for bonding being attached to, or being formed on, the surface to be connected, e.g. chip-to-package, die-attach, "first-level" interconnects; Manufacturing methods related thereto
- H01L24/10—Bump connectors ; Manufacturing methods related thereto
- H01L24/12—Structure, shape, material or disposition of the bump connectors prior to the connecting process
- H01L24/13—Structure, shape, material or disposition of the bump connectors prior to the connecting process of an individual bump connector
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L24/00—Arrangements for connecting or disconnecting semiconductor or solid-state bodies; Methods or apparatus related thereto
- H01L24/01—Means for bonding being attached to, or being formed on, the surface to be connected, e.g. chip-to-package, die-attach, "first-level" interconnects; Manufacturing methods related thereto
- H01L24/10—Bump connectors ; Manufacturing methods related thereto
- H01L24/15—Structure, shape, material or disposition of the bump connectors after the connecting process
- H01L24/16—Structure, shape, material or disposition of the bump connectors after the connecting process of an individual bump connector
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L25/00—Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof
- H01L25/18—Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof the devices being of types provided for in two or more different subgroups of the same main group of groups H01L27/00 - H01L33/00, or in a single subclass of H10K, H10N
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L2924/00—Indexing scheme for arrangements or methods for connecting or disconnecting semiconductor or solid-state bodies as covered by H01L24/00
- H01L2924/10—Details of semiconductor or other solid state devices to be connected
- H01L2924/11—Device type
- H01L2924/14—Integrated circuits
- H01L2924/143—Digital devices
- H01L2924/1432—Central processing unit [CPU]
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L2924/00—Indexing scheme for arrangements or methods for connecting or disconnecting semiconductor or solid-state bodies as covered by H01L24/00
- H01L2924/10—Details of semiconductor or other solid state devices to be connected
- H01L2924/11—Device type
- H01L2924/14—Integrated circuits
- H01L2924/143—Digital devices
- H01L2924/1434—Memory
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01L—SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
- H01L2924/00—Indexing scheme for arrangements or methods for connecting or disconnecting semiconductor or solid-state bodies as covered by H01L24/00
- H01L2924/15—Details of package parts other than the semiconductor or other solid state devices to be connected
- H01L2924/151—Die mounting substrate
- H01L2924/153—Connection portion
- H01L2924/1531—Connection portion the connection portion being formed only on the surface of the substrate opposite to the die mounting surface
- H01L2924/15311—Connection portion the connection portion being formed only on the surface of the substrate opposite to the die mounting surface being a ball array, e.g. BGA
Definitions
- the present invention relates to an integrated circuit device, a three-dimensional integrated circuit, a three-dimensional processor device, and a process scheduler that control heat generation.
- the three-dimensional integrated circuit is configured by stacking a plurality of chips, and the plurality of chips are connected by through silicon vias (Through Silicon Via; hereinafter referred to as “TSV”), micro bumps, or the like.
- TSV Through Silicon Via
- Three-dimensional integrated circuits are attracting attention as high-performance integrated circuits that realize high-speed circuits, wideband data communication, low power consumption, and the like.
- an integrated circuit without a three-dimensional stack is referred to as a “two-dimensional integrated circuit”.
- In a three-dimensional integrated circuit, circuits are arranged not only in the two-dimensional plane but also in the stacking (third) direction. Compared with a two-dimensional integrated circuit, the generated heat therefore tends to be trapped (it is difficult for it to escape). If the high-temperature state caused by the generated heat persists, the integrated circuit is very likely to malfunction. There are two main reasons why the generated heat tends to be trapped.
- The first reason relates to the heat sources.
- Heat sources may overlap in the stacking direction.
- Not only the heat generated by a single circuit (chip) but also the heat generated by the chips stacked above and below it may act as a heat source.
- a three-dimensional processor in which the same chips are stacked will be described.
- the processor basically has a high temperature around the arithmetic unit.
- the number of arithmetic units is 10 compared to the other regions. The result was that the temperature rose more than 1 degree.
- the second is an event related to cooling.
- the distance from the heat source to a cooling device such as a heat sink may be large, which may make it difficult to cool the generated heat.
- Silicon and metal wiring constituting an integrated circuit have high thermal conductivity.
- In contrast, a material such as the insulating film disposed between metal wirings has low thermal conductivity. For this reason, the greater the distance to the heat sink, that is, the larger the number of stacked chips, the more easily heat is trapped.
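The dependence on the number of stacked chips can be illustrated with a minimal one-dimensional, steady-state sketch. It assumes a single heat sink on one side of the stack and treats each interface as a thermal resistance in series, which is a textbook simplification rather than anything specified in this document. The function name `layer_temperatures` and every numeric value are hypothetical.

```python
def layer_temperatures(powers_w, resistances_k_per_w, sink_temp_c):
    """Steady-state temperature of each layer in a 1-D stack.

    powers_w[i]            : heat generated by layer i (layer 0 touches the heat sink)
    resistances_k_per_w[i] : thermal resistance between layer i and the layer
                             (or heat sink) directly on its sink side
    All values are hypothetical; this is a simple series-resistance model.
    """
    if len(powers_w) != len(resistances_k_per_w):
        raise ValueError("one resistance per layer is required")
    temps = []
    temp = sink_temp_c
    for i, r in enumerate(resistances_k_per_w):
        # Heat crossing interface i is everything generated at layer i and beyond.
        heat_through = sum(powers_w[i:])
        temp += heat_through * r
        temps.append(temp)
    return temps


# Two identical chips versus four identical chips (illustrative numbers only).
two = layer_temperatures([10.0, 10.0], [0.5, 0.5], sink_temp_c=40.0)
four = layer_temperatures([10.0] * 4, [0.5] * 4, sink_temp_c=40.0)
print(two)   # [50.0, 55.0]
print(four)  # [60.0, 75.0, 85.0, 90.0]
```

Under these assumed values, the layer farthest from the heat sink is always the hottest, and adding layers raises the temperature of every layer, including the one touching the sink.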
- Patent Document 1 discloses a SiP (System in Package) in which a driver chip and a microcomputer chip are stacked.
- SiP System in Package
- In this SiP, the driver chip, which tends to become hot, and the circuit blocks that are vulnerable to heat are arranged so as not to overlap each other.
- However, this SiP has a driver chip floor plan designed specifically to match the layout of the microcomputer chip. That is, it is not a general-purpose technology that can be applied to a variety of three-dimensional integrated circuits.
- An integrated circuit device includes a first circuit configured by a memory circuit, a second circuit configured by an arithmetic circuit, and a control circuit. The first circuit is divided into a plurality of circuit blocks according to the distance from the position where the second circuit is arranged, and the control circuit controls each of the divided circuit blocks independently.
- With the integrated circuit device of the present invention, it is possible to stop only the area of the memory circuit that cannot operate because of the generated heat while continuing to operate the area of the memory circuit that can still operate. The performance degradation of the processor chip due to heat is thereby minimized.
- Further, with the present invention, it is possible in a three-dimensional integrated circuit to suppress the generation of hot spots, that is, sites that reach a high temperature because heat is generated there intensively.
- FIG. 1A is a plan view of a processor chip (circuit) c1 according to the first embodiment of the present invention.
- FIG. 1B is a plan view of a conventional processor chip c1′.
- A) And (b) is a figure which shows the example of distribution of the part of 85 degree
- A) (b) is a figure which shows another example of the division
- FIG. 1 It is a figure which shows the example of the layout of the circuit which comprises the inside of the processor core in the conventional processor chip. It is a top view of a processor chip concerning a 2nd embodiment of the present invention. It is a schematic diagram of the level 2 cache memory comprised by 4 ways. It is a figure which shows the example which allocated each way to the divided
- (a) is a figure which shows another example of the circuit layout of the processor chip c1 which concerns on 6th Embodiment, respectively. It is a figure which shows another example of the circuit layout of the processor chip which concerns on 6th Embodiment.
- (A) is a side view of a three-dimensional integrated circuit in which two processor chips shown in (b) are stacked.
- (B) is a circuit diagram of the processor chip. It is a side view of a general three-dimensional integrated circuit.
- (A) is a side view of a three-dimensional integrated circuit in which two processor chips are stacked.
- (B) is a schematic diagram in the case of stacking two processor chips.
- (A) is a side view of a three-dimensional integrated circuit according to a seventh embodiment of the present invention in which two processor chips are stacked.
- (B) is a schematic diagram in the case of stacking two processor chips in the three-dimensional integrated circuit according to the seventh embodiment.
- (A) is a side view of another example of the three-dimensional integrated circuit according to the seventh embodiment in which two processor chips are stacked.
- (B) is a schematic diagram in the case of stacking two processor chips in another example of the three-dimensional integrated circuit according to the seventh embodiment. It is a figure which shows another example of the three-dimensional laminated circuit which concerns on 7th Embodiment. It is a figure which shows another example of the three-dimensional laminated circuit which concerns on 7th Embodiment.
- FIG. 1 is a side view of the three-dimensional integrated circuit which laminated
- FIG. 1 is a block diagram showing a relationship between a processor chip and an allocation control circuit in a three-dimensional integrated circuit according to an eighth embodiment of the present invention.
- FIG. B is a block diagram showing a relationship between a processor chip and an allocation control circuit in another example of the three-dimensional integrated circuit according to the eighth embodiment of the present invention.
- C is a side view which shows the relationship between a processor chip and a heat sink in the three-dimensional integrated circuit which concerns on the 8th Embodiment of this invention.
- A) is the side view of another example of the three-dimensional integrated circuit which laminated
- (B) is a schematic diagram of two processor chips according to the eighth embodiment. It is a figure which shows the relationship between the block diagram of the conventional process scheduler part, and each processor chip in the three-dimensional integrated circuit comprised by laminating
- a preferred embodiment relates to a three-dimensional integrated circuit that controls so as not to generate hot spots that tend to generate heat and become high temperature.
- Embodiments relating to a circuit structure and a control method for cooling a high temperature portion of a processor chip.
- Embodiments relating to a chip layout and a circuit layout in which chips serving as heat sources are not overlapped between chips of different layers.
- Embodiments relating to a method for restricting the operation and process allocation of each circuit so that a high temperature place (hot spot) is not formed on a chip.
- FIG. 1A is a plan view of a processor chip (circuit) c1 according to the first embodiment.
- FIG. 1B is a plan view of a conventional processor chip c1 ′.
- The processor chip c1 is roughly divided into circuit blocks that perform computation, called processor cores, and a storage area called a level 2 cache memory. In many cases, a plurality of processor cores are provided in one processor chip.
- the processor chip c1 illustrated in FIG. 1A includes two processor cores (processor core 0 and processor core 1). Furthermore, the processor core includes a level 1 cache, a register file, an integer arithmetic unit, a decimal arithmetic unit, a SIMD (Single Instruction Multiple Data) arithmetic unit, a load store unit, and the like.
- the level 2 cache memory is a storage area composed of a plurality of SRAM subarrays.
- the processor chip c1 further includes a peripheral circuit 4, and an operation control circuit 6 that controls the operation of the processor core and the level 2 cache memory.
- the peripheral circuit 4 includes a clock control unit, a power supply control unit, an external memory interface unit, a PCI-Express interface unit, and the like. The control operation of the operation control circuit 6 will be described later.
- In the processor chip c1′ shown in FIG. 1, the processor core generates more heat than the level 2 cache memory. One reason is that the level 2 cache memory has a large capacity and operates more slowly than the processor core. Another reason is that the processor core operates in a pipelined manner, so many parts of its circuit are active simultaneously, whereas a storage element such as the level 2 cache memory does not activate all of its storage area at the same time.
- the processor core and a part of the level 2 cache memory adjacent thereto may exceed the allowable temperature at which the processor chip operates.
- In that case, it is conceivable to purge the data stored in the level 2 cache memory to the external memory and then cut off power to the heated processor core and level 2 cache memory, thereby cooling the heated portion of the processor chip.
- However, the portion of the level 2 cache memory that is heated to an inoperable level is only the portion in the vicinity of the processor core. If the entire level 2 cache memory is shut down, as in the conventional configuration, even the portion that could still operate becomes unusable.
- a level 2 cache memory is often shared by a plurality of processor cores. For this reason, if the entire power supply of the level 2 cache memory is cut off, other processor cores that do not generate heat cannot operate.
- the processor chip c1 of the present embodiment solves the above-described problem.
- The processor chip c1 includes circuit blocks configured by processor cores and a storage area configured by a level 2 cache memory.
- The storage area is divided into three blocks (that is, level 2 cache memory (1), level 2 cache memory (2), and level 2 cache memory (3)).
- The three memory blocks are arranged according to their distance from the two processor cores, and each of the three memory blocks (level 2 cache memory (1), level 2 cache memory (2), and level 2 cache memory (3)) is configured to be controlled independently.
- With the processor chip c1 of this embodiment having such a configuration, even when the temperature of the storage area (level 2 cache memory) exceeds the operable range because of heat received from the processor core, it is not necessary to stop the entire level 2 cache memory for heat dissipation. Only the memory in the area that cannot operate because of heat is stopped, while the memory in the area that can still operate continues to run. In this way, the performance degradation of the processor chip c1 due to heat can be minimized.
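The difference between shutting down the whole cache and controlling each block independently can be sketched as follows. This is only an illustration under stated assumptions: the `MemoryBlock` class, both function names, the temperature readings, and the 85-degree limit are all hypothetical and do not describe the actual operation control circuit 6.

```python
from dataclasses import dataclass


@dataclass
class MemoryBlock:
    name: str
    temp_c: float        # hypothetical temperature reading for the block
    running: bool = True


def stop_whole_cache(blocks, limit_c):
    """Conventional behaviour: one hot block shuts down the entire cache."""
    overheated = any(b.temp_c > limit_c for b in blocks)
    for b in blocks:
        b.running = not overheated
    return [b.name for b in blocks if b.running]


def stop_hot_blocks_only(blocks, limit_c):
    """Per-block control: stop only the blocks above the limit."""
    for b in blocks:
        b.running = b.temp_c <= limit_c
    return [b.name for b in blocks if b.running]


def make_blocks():
    # Illustrative temperatures: the block nearest the cores is the hottest.
    return [
        MemoryBlock("L2(1)", 92.0),
        MemoryBlock("L2(2)", 74.0),
        MemoryBlock("L2(3)", 61.0),
    ]


LIMIT_C = 85.0  # assumed allowable temperature
print(stop_whole_cache(make_blocks(), LIMIT_C))      # []
print(stop_hot_blocks_only(make_blocks(), LIMIT_C))  # ['L2(2)', 'L2(3)']
```

With the assumed readings, the all-or-nothing policy leaves no cache capacity available, while per-block control keeps two of the three blocks in service.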
- FIGS. 2A and 2B show examples of the distribution of portions at 85 degrees or more in the processor chip c1 when the processor chip c1 shown in FIG. 1A is operating.
- the heat distribution in the processor chip including the processor core varies depending on the processor architecture, the circuit block layout, the cache capacity, the device structure, the contents of the execution program, and the like.
- FIG. 2A shows an example of the distribution of portions at 85 degrees or more on the processor chip c1 when a heavy-load program is executed on processor core 0 and a light-load program is executed on processor core 1. The region at 85 degrees or more extends into the region of level 2 cache memory (1). If the processor chip c1 continues to operate in this state, the temperature can be expected to rise further in the regions of processor core 0 and level 2 cache memory (1) and to exceed the allowable operating temperature of the integrated circuit.
- In this case, operation is stopped for only the minimum set of circuit blocks (here, processor core 0 and level 2 cache memory (1)) so that the allowable operating temperature of the integrated circuit is not exceeded.
- the processor core 1, the level 2 cache memory (2), and the level 2 cache memory (3) can be continuously operated as they are.
- FIG. 2B shows an example of distribution of a portion of 85 degrees or more on the processor chip c1 when a heavy program is executed on both the processor core 0 and the processor core 1.
- an area where heat of 85 degrees or more is generated extends to the range of the level 2 cache memory (1) and the level 2 cache memory (2).
- it can be assumed that the allowable operating temperature as an integrated circuit will be exceeded in the areas of the processor core 0, the processor core 1, the level 2 cache memory (1), and the level 2 cache memory (2).
- the processor core 0, the processor core 1, the level 2 cache memory (1), and the level 2 cache memory (2) are stopped so as not to exceed the allowable operating temperature as an integrated circuit.
- the level 2 cache memory (3) can be continuously operated as it is.
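The block-by-block stop decision described for FIGS. 2A and 2B can be sketched as follows. This is only an illustration: the function names, the block labels, and the temperature values are hypothetical, and the 85-degree threshold is borrowed from the shading in the figures rather than from a stated allowable operating temperature.

```python
# Hypothetical threshold: 85 degrees is the level shaded in FIGS. 2A and 2B.
# The actual allowable operating temperature is not given in the text.
LIMIT_C = 85.0

def blocks_to_stop(temps_c, limit_c=LIMIT_C):
    """Return the circuit blocks whose temperature is at or above the limit."""
    return sorted(name for name, t in temps_c.items() if t >= limit_c)

def blocks_to_keep(temps_c, limit_c=LIMIT_C):
    """Return the circuit blocks that can continue to operate."""
    return sorted(name for name, t in temps_c.items() if t < limit_c)

# Made-up readings resembling FIG. 2A: processor core 0 is heavily loaded.
fig_2a = {"core0": 92.0, "core1": 60.0, "l2_1": 86.0, "l2_2": 70.0, "l2_3": 55.0}
stopped = blocks_to_stop(fig_2a)
running = blocks_to_keep(fig_2a)
```

With these readings only the processor core 0 and the level 2 cache memory (1) are stopped, while the remaining blocks keep running.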
- FIGS. 3A and 3B are diagrams showing another example of the divided layout of the level 2 cache memory.
- the level 2 cache memory shown in FIG. 1A is divided into three areas, but it may instead be divided, for example, into two areas as shown in FIG. It may also be divided into a larger number of areas.
- a region divided in correspondence with the individual processor cores is provided in the region near the processor cores.
- One region may be provided in the remote region.
- the level 2 cache memory may be divided non-homogeneously.
- the idea of this embodiment is applied to the processor core and the level 2 cache memory.
- the idea of the present embodiment can be applied to any combination of a circuit that generates a large amount of heat, such as an arithmetic unit, and a cache memory.
- the present invention may be applied to a level 1 cache memory in a processor core.
- FIG. 5 shows an example of a layout of the circuits constituting the inside of the processor core in the conventional processor chip c1'.
- the processor core in the processor chip c1' is roughly composed of a level 1 instruction cache memory, a level 1 data cache memory, an instruction processing unit, a data processing unit, an integer arithmetic unit, and a decimal arithmetic unit.
- the level 1 instruction cache memory is arranged at the top of the processor core, and instructions are transferred to the instruction processing unit immediately below. Thereafter, in accordance with this instruction, data processing is performed in the level 1 data cache memory, the data processing unit, the integer arithmetic unit, and the decimal arithmetic unit.
- FIG. 4 is an example of a circuit layout of the processor core in the processor chip c1 of the present embodiment.
- the decimal arithmetic unit may be used frequently or not at all.
- the level 1 instruction cache memory is divided into a portion close to the decimal arithmetic unit (level 1 instruction cache memory (2)) and a portion close to the instruction processing unit (level 1 instruction cache memory (1)).
- the divided level 1 instruction cache memory (1) and the level 1 instruction cache memory (2) are controlled independently. For example, when the temperature of the decimal arithmetic unit becomes high, the processor core 0 switches to a program that does not require decimal arithmetic and operates. Thereby, the decimal arithmetic unit and the level 1 instruction cache memory (2) do not exceed the allowable operating temperature as an integrated circuit. At this time, an instruction code using decimal arithmetic may be placed on the level 1 instruction cache memory (2) side.
- the processor core may use the integer arithmetic unit frequently or not at all depending on the program. Therefore, the level 1 data cache memory may be divided into a portion close to the integer arithmetic unit (level 1 data cache memory (2)) and a portion close to the data processing unit (level 1 data cache memory (1)). At this time, the divided level 1 data cache memory (1) and the level 1 data cache memory (2) are controlled independently.
- in a chip that combines a circuit that generates a large amount of heat, such as an arithmetic unit, with a circuit that generates less heat and has the same configuration uniformly distributed (such as a cache memory), the memory circuit is appropriately divided, and the divided circuit blocks are controlled independently. With this configuration, it is possible to stop only the memory circuit area that cannot be operated due to the influence of the generated heat while the memory circuit area that can still be operated continues to operate. Therefore, the performance degradation of the processor chip due to the influence of the generated heat can be minimized.
- in this embodiment, a circuit including an arithmetic unit and a cache memory is taken as an example.
- the present invention is not limited to this, and this embodiment can also be applied to any circuit that includes a circuit block with a large amount of heat generation and a circuit block with a small amount of heat generation.
- FIG. 6 is a plan view of a processor chip c1 according to the second embodiment.
- the level 2 cache memory is divided into a plurality of blocks, similarly to the processor chip according to the first embodiment shown in FIG.
- Each block of the level 2 cache memory shown in FIG. 6 is configured to be used in units of ways.
- the level 2 cache memory is composed of 4 ways (ways 0 to 3): “way 0” uses the area of the level 2 cache memory (1), “way 1” uses the area of the level 2 cache memory (2), and “way 2” and “way 3” use the area of the level 2 cache memory (3).
- FIG. 7 is a schematic diagram of a level 2 cache memory composed of 4 ways.
- the level 2 cache memory includes a memory array 11a that holds tags and a memory array 11b that holds data. Memory arrays 11a and 11b are allocated to each of the four ways.
- the tag of each way is read out to the cache memory control circuit 8 using the lower bits of the address. The value stored in each tag is then compared with the higher-order bits of the address; a match is a hit, and a mismatch is a miss.
- the data selection circuit 13 selects the way of the hit tag, reads the data, and outputs it to the processor core. In the case of a miss, data is acquired from an external memory (not shown).
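The hit and miss decision for the 4-way configuration can be sketched as below. This is a minimal model, not the circuit of FIG. 7: the class and method names are hypothetical, the 2-bit index width is chosen only to keep the example small, and addresses are treated as cache-line addresses (the byte offset within a line is ignored).

```python
NUM_WAYS = 4
INDEX_BITS = 2  # assumed: 4 entries per way, chosen only for illustration

class Cache:
    def __init__(self):
        # tags[way][index] holds the stored tag, or None when the entry is empty
        self.tags = [[None] * (1 << INDEX_BITS) for _ in range(NUM_WAYS)]
        self.data = [[None] * (1 << INDEX_BITS) for _ in range(NUM_WAYS)]

    def split(self, addr):
        index = addr & ((1 << INDEX_BITS) - 1)  # lower bits select the entry
        tag = addr >> INDEX_BITS                # upper bits are compared with the tag
        return tag, index

    def fill(self, way, addr, value):
        tag, index = self.split(addr)
        self.tags[way][index] = tag
        self.data[way][index] = value

    def lookup(self, addr):
        tag, index = self.split(addr)
        for way in range(NUM_WAYS):
            if self.tags[way][index] == tag:
                # Hit: the data of the matching way is selected and returned.
                return ("hit", way, self.data[way][index])
        # Miss: the caller would fetch the data from external memory.
        return ("miss", None, None)
```

A lookup compares the same index position in all four ways, which mirrors the parallel tag comparison described above.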
- FIG. 8 shows an example in which each way is assigned to a divided block (area) of the level 2 cache memory.
- a bit flag 10 indicating validity / invalidity of ways 0 to 3 is provided.
- the bit flag 10 is used to invalidate each way.
- the bit flag 10 is written from the processor core or the operation control circuit 6 of the processor core and the cache memory.
- the bit flag 10 is prepared for each way. When the flag value is ON, the whole of the corresponding way is invalid. For example, as shown in FIG.
- the areas where the control such as the power shutdown is performed independently may be separated from each other as shown in the level 2 cache memories (1) and (3) shown in FIG. 9B.
- when the level 2 cache memory (2), which is separated from the processor core, is used, a cache memory to which heat from the processor core is not easily transmitted is in use.
- as shown in FIG. 9 (c), the power of the two consecutive areas (way 0: level 2 cache memory (1), way 1: level 2 cache memory (2)) may be shut off and, at the same time, the power supply of one processor core may be cut off, so that the chip is used as a single-core processor chip.
- the 4-way set associative method has been described.
- other ways may be used.
- a 2-way set associative method or an 8-way set associative method may be used.
- the divided level 2 cache memory circuit blocks are used in units of ways, and their use in units of ways is controlled by the valid / invalid flag 10.
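The relationship between powered-off blocks and the per-way flag can be sketched as follows. The way-to-block mapping follows the description of FIG. 6 (way 0 in block (1), way 1 in block (2), ways 2 and 3 in block (3)); the function names are hypothetical, and the convention that an ON flag means "invalid" follows the text.

```python
# Mapping taken from the description: way number -> level 2 cache memory block.
WAY_TO_BLOCK = {0: 1, 1: 2, 2: 3, 3: 3}

def invalid_flags_for(powered_off_blocks):
    """Return one flag per way; True (ON) marks the way as invalid."""
    return [WAY_TO_BLOCK[way] in powered_off_blocks for way in range(4)]

def usable_ways(flags):
    """Return the ways that may still be searched on a cache access."""
    return [way for way, invalid in enumerate(flags) if not invalid]
```

For example, powering off blocks (1) and (2) leaves only ways 2 and 3 usable, so the cache behaves as a 2-way cache located in the block farthest from the processor cores.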
- FIG. 10 is a plan view of a processor chip c1 according to the third embodiment.
- the level 2 cache memory is divided into a plurality of blocks, similarly to the processor chip according to the first embodiment shown in FIG.
- Each block of the level 2 cache memory shown in FIG. 10 is configured to be used in units of sets.
- the level 2 cache memory includes 4 sets (sets 0 to 3). “Set 0” and “Set 1” use the area of the level 2 cache memory (3), “Set 2” uses the area of the level 2 cache memory (2), and “Set 3” uses the area of the level 2 cache memory (1).
- FIG. 10 shows a state where the power is cut off for the set 2 and the set 3.
- FIG. 11A is a schematic diagram of a level 2 cache memory composed of 4 ways and 4 sets according to the present embodiment.
- a tag mask circuit 12 is provided in the input portion of the address, and the capacity (number of sets) of the memory array (data) can be changed.
- the set is a parameter related to the capacity per way.
- the cache memory is mounted separately in a subarray 14 as shown in FIG.
- a group of the subarrays 14 is referred to as a “set”.
- the subarray 14 includes one or a plurality of cache lines. When the number of sets is changed, the bits used for the index and the tag change within the bit field of the address. In order to adjust for this, the tag mask circuit 12 is mounted.
- for example, each set may have two subarrays 14 with one cache line in each subarray 14, or each set may have one subarray 14.
- each subarray 14 is provided with two cache lines. At this time, there are only 2 lines per set. The value of the least significant bit of the address specifies one of the two lines. The value of the remaining upper bits of the address is compared with the value stored in the tag to identify the way.
- the set and cache line are searched for and identified by the value of the 2 bits from the least significant bit of the address, and the way is identified by comparing the value of the remaining higher bits of the address with the value stored in the tag.
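How the index and tag fields move when the usable capacity changes can be sketched as below. This is one reading of the role of the tag mask circuit, not a description of the actual circuit: the function name is hypothetical and addresses are treated as cache-line addresses.

```python
def split_address(addr, index_bits):
    """Split a line address into (tag, index) for a way with 2**index_bits lines.

    Fewer index bits means fewer lines are addressable, so more of the
    address has to be kept and compared as the tag.
    """
    index = addr & ((1 << index_bits) - 1)
    tag = addr >> index_bits
    return tag, index

# Same address, two capacities: 4 lines per way, then 2 lines per way.
full = split_address(0b1101, index_bits=2)
reduced = split_address(0b1101, index_bits=1)
```

With 2 index bits the address 0b1101 splits into tag 0b11 and index 0b01; with 1 index bit the tag grows to 0b110 and the index shrinks to 1.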
- the SRAM sub-array 14 shown in FIG. 12 includes an SRAM array 16 serving as an array of storage elements.
- a control system circuit is constituted by a row driver 18 for selecting a row in the array, a column driver 24 for selecting a column, and a decoder circuit 22 for generating these selection signals.
- a data system path is configured by a data buffer 26 that buffers external data and a write data driver / sense amplifier 26 that drives write data to the inside of the array and amplifies read data from the array.
- a CLK control / memory control circuit 30 that distributes control signals and clock signals to these circuits controls the entire circuit.
- the cache memory is divided based on each of the way and the set, but may be divided based on other criteria.
- the cache memory area may be divided by combining ways and sets.
- the circuit blocks (memory blocks) of the divided level 2 cache memory are used in units of sets, and their use is controlled in units of sets. With this configuration, it is possible to stop only the circuit blocks of the sets that cannot be operated due to the influence of the generated heat and to keep the operable range running. Therefore, the performance degradation of the processor chip due to the influence of the generated heat can be minimized.
- the fourth embodiment relates to the heat dissipation control of the divided level 2 cache memory.
- circuit blocks that are affected by heat are separated and fine control is performed for heat dissipation.
- FIG. 13 is a first processing flow for powering off a level 2 cache memory (hereinafter simply referred to as “cache memory”) according to the fourth embodiment.
- in the cache memory controlled by the processing flow shown in FIG. 13, a temperature detection circuit that monitors the temperature is provided for each divided circuit block (memory block).
- for example, the temperature detection circuit 34 shown in FIG. 20, described later, can be used. The same applies to the cache memory controlled by the processing flows shown in FIGS.
- the process proceeds to cooling processing.
- data in the cache memory that does not match the contents of the external memory needs to be written back to the external memory (write back).
- where write-back is not necessary, this processing may be omitted.
- the contents of the level 2 cache memory in the area where the power is to be shut off are purged (written back) (S12), and each purged cache line is invalidated (S13). Thereafter, the power is shut off (S14).
- the temperature of the processor chip decreases.
- the monitoring circuit monitors until the temperature becomes lower than the startable temperature (S15).
- the startable temperature needs to be determined in consideration of errors in the monitoring circuit and the like.
- when the cache memory becomes lower than the startable temperature (S15, YES), power is supplied to the cache memory and the cache memory is initialized (S16). At this time, since all the values of the cache memory immediately after the start of power supply are indefinite, they are invalidated by the initialization. Thereafter, the cache memory is used as a normal cache memory while its temperature is monitored (S11).
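The first processing flow can be sketched as a small state machine that replays a sequence of temperature samples. Several details are assumptions: the function name and threshold values are hypothetical, the trigger is taken to be a temperature above a limit, and initialization is labeled S16 on the basis that the flow is described elsewhere as steps S11 to S16.

```python
def power_off_flow(temps, limit, startable):
    """Replay temperature samples and return the sequence of steps taken."""
    steps = []
    powered = True
    for t in temps:
        if powered:
            steps.append("S11 monitor")
            if t > limit:
                # Cooling: write back dirty data, invalidate, then cut power.
                steps += ["S12 purge", "S13 invalidate", "S14 power off"]
                powered = False
        else:
            steps.append("S15 wait")
            if t < startable:
                # Contents are indefinite after power-on, so initialize.
                steps.append("S16 power on + initialize")
                powered = True
    return steps

trace = power_off_flow([70, 90, 80, 60], limit=85, startable=65)
```

Choosing the startable temperature below the limit leaves a margin for errors in the monitoring circuit, as the description requires.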
- FIG. 14 is a second process flow according to the fourth embodiment.
- the processing in steps S21 to S26 shown in FIG. 14 is substantially the same as the processing in steps S11 to S16 shown in FIG. 13, but in the processing flow shown in FIG. 14, the power supply voltage is changed (decreased) without shutting off the power. (S24).
- Power consumption of the integrated circuit, such as that due to leakage current, is proportional to the power supply voltage. For this reason, merely lowering the power supply voltage reduces the amount of heat generated and provides a cooling effect. Further, when the voltage is lowered only within a range where data can be held, the cache memory does not need to be purged (S22) or invalidated (S23).
- a memory circuit under a high temperature condition has a low data retention capability. Therefore, it is preferable to decide, according to the application to be run, which of the processing flow for shutting off the power supply shown in FIG. 13 and the processing flow for changing (decreasing) the power supply voltage shown in FIG. 14 is suitable.
- FIG. 15 is a third process flow according to the fourth embodiment.
- the processing in steps S31 to S36 shown in FIG. 15 is substantially the same as the processing in steps S11 to S16 shown in FIG. 13, but in the processing flow shown in FIG. 15, clock gating is performed without shutting off the power (S34).
- with clock gating, unlike the power shutdown process, data is not volatilized. For this reason, when the cache memory becomes lower than the startable temperature (S35, YES) and the gating of the cache memory is released, it is not necessary to initialize the cache memory (S36).
- the gating of the cache memory is performed by gating the input clock to the CLK / memory control circuit 30 in the block diagram of the SRAM sub-array shown in FIG.
- FIG. 16 is a fourth process flow according to the fourth embodiment.
- the processing in steps S41 to S46 shown in FIG. 16 is substantially the same as the processing in steps S11 to S16 shown in FIG. 13.
- when the cache memory becomes lower than the startable temperature (S45, YES) and the memory clock of the cache memory, whose frequency has been lowered, is returned to the original frequency, it is not necessary to initialize the cache memory (S46).
- the frequency of the memory clock to the cache memory is changed by changing the input clock to the CLK / memory control circuit 30 in the SRAM subarray block diagram shown in FIG. Note that the frequency change (decrease) of the memory clock of the cache memory may be performed simultaneously with the reduction of the power supply voltage, as in DVFS (Dynamic Voltage and Frequency Scaling).
- FIG. 17 is a fifth process flow according to the fourth embodiment.
- the processing in steps S51 to S56 shown in FIG. 17 is substantially the same as the processing in steps S11 to S16 shown in FIG. 13, but in the processing flow shown in FIG. 17, the power supply is not shut down and the duty ratio of the memory clock is changed (decreased) instead (S54).
- when the cache memory becomes lower than the startable temperature (S55, YES) and the duty ratio of the memory clock is restored, it is not necessary to initialize the cache memory (S56).
- FIG. 18 is a sixth process flow according to the fourth embodiment.
- the processing of steps S61 to S66 shown in FIG. 18 is substantially the same as the processing of steps S11 to S16 shown in FIG. 13, but in the processing flow shown in FIG. 18, the chip select signal of the subarray (see FIG. 12) is fixed to “disabled” (here, “1”) without shutting off the power (S64).
- the process of fixing the chip select signal of the subarray to disabled also differs from the process of shutting down the power in that the data is not volatilized. For this reason, when the cache memory becomes lower than the startable temperature (S65, YES) and the fixation of the chip select signal of the subarray is released, initialization of the cache memory is not necessary (S66). Fixing the chip select signal of the subarray to disabled is performed by changing the input control signal to the CLK / memory control circuit 30 in the block diagram of the SRAM subarray shown in FIG.
- the control methods shown in FIGS. 13 to 18 may be combined to control the heat dissipation of the divided cache memory.
- a change in power supply voltage and clock gating may be combined, or a change in power supply voltage and a change in clock frequency may be combined.
- the program may be executed with the power supply voltage and the operating frequency lowered.
- the processor core 1 may perform memory clock gating, while the level 2 cache memories (1) and (2) may perform heat dissipation by power-off.
- control for heat dissipation is performed on the divided circuit blocks of the cache memory based on the generated heat to be monitored.
- Controls for heat dissipation include power cutoff, power supply voltage drop, clock gating, memory clock frequency reduction, memory clock duty ratio reduction, and SRAM subarray chip select signal fixation.
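The six control methods differ mainly in whether the cache contents survive the cooling period. The table below is a summary sketch with hypothetical key and function names. The entries follow the statements made for each processing flow; the voltage-drop entry is inferred from the remark that purge and invalidation are skipped when the voltage stays within the data-holding range.

```python
# True: data is retained during cooling, so no initialization on resume.
DATA_RETAINED = {
    "power_cutoff": False,
    "voltage_drop_within_retention_range": True,  # inferred, see lead-in
    "clock_gating": True,
    "clock_frequency_reduction": True,
    "clock_duty_ratio_reduction": True,
    "chip_select_fixed_disabled": True,
}

def resume_actions(method):
    """Return the extra work needed when the block is put back into use."""
    if DATA_RETAINED[method]:
        return []
    return ["initialize"]
```

Only the power cutoff loses data, which is why it is the only method that requires initialization before the block is reused.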
- the fifth embodiment relates to a specific configuration for controlling heat dissipation of a split cache memory.
- in the fourth embodiment, the trigger for starting the heat dissipation control is always temperature, but in the device according to the fifth embodiment, the trigger for starting the heat dissipation control can also be something other than temperature.
- in the following examples, the control target is the power supply voltage. However, as shown in the processing flows of the fourth embodiment, the control target may be different (for example, the memory clock frequency).
- FIG. 20 is a diagram illustrating a first example of a circuit configuration of a processor chip c1 according to the fifth embodiment.
- the power supply voltage of each of the level 2 cache memories (1), (2), and (3) is controlled based on the temperature information of each block constituting the level 2 cache memory, that is, the level 2 cache memories (1), (2), and (3).
- a temperature detection circuit 34 is arranged at the center of each of the level 2 cache memories (1), (2) and (3).
- the temperature detection circuit 34 is configured by, for example, a thermal diode.
- a thermal diode is an element whose temperature is determined by measuring a voltage when a current is passed.
- the temperature information detected by the temperature detection circuit 34 is sent to the operation control circuit 6 arranged outside the cache memory.
- the operation control circuit 6 is provided with a cache block power supply control circuit 36 for controlling the power supply voltage to each block of the level 2 cache memory.
- the cache memory needs to be purged. Therefore, the start of the power supply voltage drop (or shutdown) is notified to the processor core side, and purge processing and invalidation processing are performed. After these processes are completed on the processor core side, the cache block power supply control circuit 36 performs a process of lowering (or shutting down) the power supply voltage.
- FIG. 21 is a diagram illustrating a second example of the circuit configuration of the processor chip c1 according to the fifth embodiment.
- the operation control circuit 6 includes a cache block power supply control circuit 40 for controlling the power supply voltage to each block of the level 2 cache memory, and each cache block power supply control circuit 40 is provided with a timer circuit 38.
- the cache block power supply control circuit 40 starts decreasing (or shutting down) the power supply voltage after a lapse of a fixed time measured by the timer circuit 38.
- it may also be configured such that the timer circuit 38 measures a predetermined time so that the cache block power supply control circuit 40 shuts off the power to the block of the level 2 cache memory.
- the external temperature may be input to the timer circuit 38 and the cycle of the timer circuit 38 may be changed according to the external temperature.
- FIG. 22 is a diagram illustrating a third example of the circuit configuration of the processor chip c1 according to the fifth embodiment.
- the cache block power supply control circuit 44 included in the operation control circuit 6 of the processor chip c1 shown in FIG. 22 varies (or cuts off) the power supply voltage based on the clock signal supplied to the processor core.
- the clock gear switching circuit 46 included in the peripheral circuit 4 switches the frequency of the clock signal supplied to the processor core (processor core 0, processor core 1).
- the clock monitor circuit 42 provided in each cache block power supply control circuit 44 monitors the frequency of the clock signal supplied to the processor core.
- the cache block power supply control circuit 44 performs control to vary (or cut off) the power supply voltage based on the monitor information of the clock monitor circuit 42.
- a mechanism called DVFS (Dynamic Voltage and Frequency Scaling), in which the supplied frequency and voltage vary according to the load on the processor, is used in processor chips. Therefore, by monitoring the frequency of the clock signal to the processor core, the tendency of heat generation in the processor core can be grasped. For example, suppose that there is a processor whose frequency is 1.8 GHz when the load is small and whose frequency is 3.8 GHz when the load is large. In this case, for example, it is conceivable to assume three stages of frequencies (1.8 GHz, 3.0 GHz, and 4.5 GHz) and change the area of the cache memory used at each clock frequency.
- in the case of 1.8 GHz, the level 2 cache memory (3) is used; in the case of 3.0 GHz, the level 2 cache memories (1), (2), and (3) are used; and in the case of 4.5 GHz, the level 2 cache memories (2) and (3) are used.
- the case of 4.5 GHz is a case where the frequency is increased instantaneously.
- in that case, the cache memory near the processor core, that is, the level 2 cache memory (1), is not used.
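The frequency-dependent choice of cache blocks can be sketched as a lookup table. The names are hypothetical, and the 1.8 GHz entry is read from a truncated sentence in the description, so it should be treated as an assumption.

```python
ALL_BLOCKS = {1, 2, 3}

# Clock stage in GHz -> level 2 cache memory blocks in use.
# The 1.8 GHz entry is an assumption (see lead-in).
BLOCKS_BY_GHZ = {
    1.8: {3},
    3.0: {1, 2, 3},
    4.5: {2, 3},
}

def blocks_to_power_down(freq_ghz):
    """Return the blocks whose supply voltage would be lowered or cut off."""
    return ALL_BLOCKS - BLOCKS_BY_GHZ[freq_ghz]
```

At the instantaneous 4.5 GHz stage only block (1), the one nearest the processor cores, is taken out of use.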
- the clock monitor circuit 42 monitors the clock frequency.
- the clock monitor circuit 42 includes a timer circuit according to the fourth embodiment shown in FIG. The power supply voltage may be reduced (or cut off).
- FIG. 23 is a diagram illustrating a fourth example of the circuit configuration of the processor chip c1 according to the fifth embodiment.
- the cache block power supply control circuit 50 included in the operation control circuit 6 of the processor chip c1 shown in FIG. 23 changes (or cuts off) the power supply voltage based on the operation rate of the arithmetic unit in the processor core, which is calculated by the operation rate calculation circuit 48 also included in the operation control circuit 6.
- the heat generation inside the processor core is greatly influenced by the frequency of use of the arithmetic unit. For example, depending on the processor, integer arithmetic units and decimal arithmetic units tend to generate heat.
- the operation rate of the decimal or integer arithmetic unit is calculated from the instruction in the processor core.
- the operation rate calculation circuit 48 shown in FIG. 23 calculates the operation rate of the decimal / integer arithmetic circuit (arithmetic unit) based on the instruction given to the decimal / integer arithmetic circuit 54 by the instruction decoding unit 52 included in the processor core.
- the cache block power supply control circuit 50 starts to reduce (or cut off) the power supply voltage of the cache memory.
- the operation rate calculation circuit 48 may be configured to calculate the operation rate based not only on the instructions given by the instruction decoding unit 52 to the decimal / integer arithmetic circuit 54 but also on the operating frequency of the processor core.
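One way to picture the operation rate is as the fraction of decoded instructions that are dispatched to a given arithmetic unit. The sketch below is only an illustration: the function names, the instruction labels, and the 0.5 threshold are hypothetical, and the rule that a high rate triggers the voltage reduction is an assumption drawn from the statement that heat generation depends on how often the arithmetic unit is used.

```python
def operation_rate(decoded_ops, unit="fp"):
    """Fraction of decoded instructions dispatched to the given arithmetic unit."""
    if not decoded_ops:
        return 0.0
    hits = sum(1 for op in decoded_ops if op == unit)
    return hits / len(decoded_ops)

def should_reduce_voltage(rate, threshold=0.5):
    """Assumed policy: a busy arithmetic unit implies heat, so act on the cache."""
    return rate >= threshold

window = ["fp", "int", "fp", "load"]
rate = operation_rate(window)
```

A real circuit would count over a sliding window of cycles; the list here stands in for such a window.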
- FIG. 24 is a diagram illustrating a fifth example of the circuit configuration of the processor chip c1 according to the fifth embodiment.
- the cache block power supply control circuit 58 included in the operation control circuit 6 of the processor chip c1 shown in FIG. 24 varies (or cuts off) the power supply voltage based on the cache memory miss rate calculated by the cache miss rate calculation circuit 56, which is also provided in the operation control circuit 6.
- when a cache miss occurs, the processor core normally stops processing without performing an operation. In other words, a processor core with many cache misses is often paused without performing much computation, and therefore does not generate much heat. Using this fact, the power supply voltage of each block of the level 2 cache memory is changed depending on the size of the cache miss rate.
- the cache miss rate calculation circuit 56 calculates a cache miss rate based on an access signal to the external memory via the BCU 60.
- the cache block power supply control circuit 58 reduces (or shuts off) the power supply voltage of the cache memory.
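The miss rate itself can be estimated by comparing external memory accesses with total cache accesses. The sketch below uses hypothetical names and an illustrative threshold, and it assumes that every miss causes exactly one external access (write-backs and prefetches are ignored). How the miss rate is then mapped to a supply voltage is not modeled here.

```python
def miss_rate(cache_accesses, external_memory_accesses):
    """Estimate the miss rate from access counts observed over some interval."""
    if cache_accesses == 0:
        return 0.0
    return external_memory_accesses / cache_accesses

def core_likely_cool(rate, threshold=0.2):
    """Per the description, a core with many misses stalls often and runs cooler."""
    return rate >= threshold

observed = miss_rate(cache_accesses=200, external_memory_accesses=50)
```

In this example a quarter of the accesses go to external memory, which the assumed threshold classifies as a frequently stalled, cooler core.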
- control for heat dissipation is started on the divided circuit block of the cache memory based on a trigger for starting control of heat dissipation.
- Triggers for controlling heat dissipation include temperature information in a divided circuit block of the cache memory, the elapse of a fixed period, the frequency of the clock signal given to the processor core, the operation rate of the arithmetic unit in the processor core, and the cache memory miss rate.
- 6.1. Arrangement of Operation Control Circuit in Processor Chip
- the sixth embodiment relates to the arrangement position of the operation control circuit 6 that controls the heat dissipation of the cache memory and the processor core. If the operation control circuit 6 becomes higher than the allowable operating temperature due to the influence of ambient heat, the heat dissipation of the cache memory and the processor core is no longer controlled properly. Accordingly, it is necessary to consider the generation of ambient heat when deciding the arrangement position of the operation control circuit 6.
- as shown in FIG. 25A, when the operation control circuit 6 is arranged in the vicinity of a processor core (processor core 0, processor core 1), the operation control circuit 6 itself may become higher than the allowable operating temperature due to heat. Therefore, in a processor chip that easily generates heat, it is preferable that the operation control circuit 6 be arranged away from the processor core.
- FIG. 25B is a diagram illustrating a first example of a circuit layout of the processor chip c1 according to the sixth embodiment. The shaded portion indicates a region where heat of 85 degrees or more is likely to be generated. As shown in FIG.
- the operation control circuit 6 is arranged in the vicinity of the level 2 cache memory but at the position farthest from the processor core. With this arrangement, the operation control circuit 6 is hardly affected by heat generated from other parts, and the operation control circuit 6 itself does not reach a temperature exceeding the allowable range. Further, since the cache memory lies between the operation control circuit 6 and the processor core, a cooling operation such as power-off in the cache memory can be started before the heat from the processor core reaches the operation control circuit 6. Therefore, the operation control circuit 6 can easily maintain a low temperature.
- FIG. 26 is a diagram illustrating a second example of the circuit layout of the processor chip c1 according to the sixth embodiment.
- the operation control circuit 6 shown in FIG. 26 is arranged on another chip c2.
- when the operation control circuit 6 is arranged on the separate chip c2 in this way, heat is not conducted via the silicon substrate or the metal wiring. That is, the heat transmitted to the operation control circuit 6 can be cut off more reliably than in the case where the operation control circuit 6 is formed on the same chip as the processor core and the cache memory.
- FIG. 27A is a diagram illustrating a third example of the circuit layout of the processor chip c1 according to the sixth embodiment.
- the operation control circuit 6 is arranged on the outer periphery of the processor chip c1.
- heat of the operation control circuit 6 is easily released.
- cooled air or liquid may be sent not only to the upper and lower surfaces of the processor chip c1 but also from the side surface, so that the outer periphery can be brought to a lower temperature than the center of the processor chip c1.
- the operation control circuit 6 is disposed inside the IO cell 64 as shown in FIG.
- FIG. 28 is a diagram illustrating a fourth example of the circuit layout of the processor chip c1 according to the sixth embodiment.
- the circuit shown in FIG. 28 is obtained by three-dimensionally stacking three processor chips c1, c2, and c3, and a heat sink 66 is provided on the upper surface of the uppermost processor chip c1.
- the operation control circuit 6 is disposed on the uppermost processor chip c1 closest to the heat sink 66.
- the heat generated in the three-dimensional multilayer circuit shown in FIG. 28 is dissipated from the heat sink 66 at the top of the three-dimensional multilayer circuit and from the printed circuit board at the bottom, so the central processor chip c2 tends to have the highest temperature. The next highest temperature tends to occur in the processor chip c3 on the printed circuit board side, followed by the processor chip c1 on the heat sink 66 side. In the case of such a structure, the processor chip c1 close to the heat sink 66 is likely to remain stable at the lowest temperature, and therefore it is preferable to mount the operation control circuit 6 in this portion. By mounting it in this way, it is possible to prevent the operation control circuit 6 itself from exceeding the allowable operating temperature.
- the operation control circuit 6 may be arranged in the processor chip c2 in contact with the cooling liquid, or on a processor chip adjacent to the cooling mechanism. Since a considerable amount of heat also escapes to the printed circuit board side, the operation control circuit 6 may be arranged on the processor chip c3 closest to the printed circuit board.
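The placement rule for the stacked structure amounts to choosing the chip expected to stay coolest. A minimal sketch with a hypothetical function name and made-up temperatures that merely respect the ordering described above (c2 hottest, then c3, then c1):

```python
def coolest_chip(temps_c):
    """Return the name of the chip with the lowest expected temperature."""
    return min(temps_c, key=temps_c.get)

# Illustrative values only; the description gives an ordering, not numbers.
stack = {"c1": 70.0, "c2": 90.0, "c3": 80.0}
placement = coolest_chip(stack)
```

With this ordering the operation control circuit would be placed on the processor chip c1 next to the heat sink.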
- an operation control circuit that controls heat dissipation of the processor core and the cache memory is arranged at a position that maintains a low temperature. By doing so, it is avoided that the operation control circuit becomes higher than the allowable operation temperature.
- the seventh embodiment relates to a three-dimensional integrated circuit formed by stacking a plurality of processor chips. First, a general three-dimensional integrated circuit composed of a plurality of processor chips will be described.
- FIG. 29A is a side view of a three-dimensional integrated circuit in which two processor chips including the processor core, the level 1 cache memory, and the level 2 cache memory shown in FIG. 29B are stacked.
- Each processor chip is designed to operate alone, and four processor cores, a level 1 cache memory and a level 2 cache memory are mounted in the same processor chip.
- a peripheral circuit 4 for accessing a graphics circuit and an external memory is mounted on the periphery.
- a three-dimensional multiprocessor is realized by connecting the processor chip c1 having the above basic configuration to another processor chip c2 via the bumps 68 arranged in the central portion of the processor chip.
- the number of processor cores can be changed depending on the product grade.
- one chip has four cores for the low end, and two chips have eight cores for the middle range.
- a multi-core processor system can be constructed as a 16-core configuration with 4 chips. Since such a three-dimensional integrated circuit can be manufactured by stacking a large number of identical chips, the cost of chip masks can be suppressed, and the production line in the production factory can be used effectively.
- the three-dimensional integrated circuit is advantageous in terms of yield cost.
- as a measure against the low yield of a large chip exceeding several hundred mm², such as a processor chip, it is effective to divide it into small chips and stack them three-dimensionally.
- FIG. 31A is a side view of a three-dimensional integrated circuit in which two processor chips c1 and c2 are stacked.
- FIG. 31B is a schematic diagram in the case of stacking two processor chips c1 and c2.
- Each of the processor chips shown in FIGS. 31A and 31B includes two areas of processor cores and two areas of level 2 cache memory.
- the temperature inside the processor core tends to rise more than level 2 cache memory.
- the level 2 cache memory is a storage element, and since all the cells constituting the memory are not activated simultaneously, the amount of heat generated is small.
- In FIG. 31B, the processor core portions of the two processor chips c1 and c2 overlap each other. In this case, since the heat sources overlap in the vertical direction, the temperature of the processor core portions of the two processor chips c1 and c2 becomes very high.
- the inventor has confirmed by simulation that the temperature rises by 10 degrees or more compared with the case where the processor chip is operated alone. In the configuration shown in FIG. 31B, the performance is therefore lower than that of a single processor chip operated alone.
- the seventh embodiment solves the above-mentioned problems.
- the three-dimensional stacked circuit according to this embodiment is a three-dimensional integrated circuit configured by stacking two or more processor chips, and at least two of the processor chips have the same circuit block layout. In addition, these two processor chips are stacked with their arrangement changed between layers.
- the same circuit block layout means that the transistor layers other than the wiring layer are the same in the mask of the processor chip. In other words, the masks used in the FEOL process (Front End of Line) match.
- FIG. 32A is a side view of a three-dimensional integrated circuit according to the seventh embodiment in which two processor chips c1 and c2 are stacked.
- FIG. 32B is a schematic diagram of the case where two processor chips c1 and c2 are stacked in the three-dimensional integrated circuit according to the seventh embodiment.
- the processor chip c1 and the processor chip c2 are stacked by rotating 180 degrees. In this way, by stacking the processor chips c1 and c2 by rotating them 180 degrees, the processor core portion with a large amount of heat generation and the cache memory portion with a small amount of heat generation are superimposed.
- the two processor chips c1 and c2 are stacked with a rotation of 180 degrees, but the rotation angle between the stacked processor chips need not be 180 degrees.
- the rotation may be 45 degrees or 90 degrees.
- as long as portions with a large amount of heat generation, such as processor cores, do not overlap each other, the processor chips may be stacked with an offset (that is, shifted) instead of being rotated.
- as shown in FIGS. 33A and 33B, the processor chips may also be stacked with both a rotation and an offset.
- the idea of the invention according to the seventh embodiment is not limited to two processor chips having the same circuit block layout.
- the idea of the invention according to the seventh embodiment can be realized.
- the idea of the seventh embodiment is realized by a three-dimensional integrated circuit having the following configuration.
- the three-dimensional integrated circuit includes a first chip and a second chip that is directly stacked on the first chip.
- the first chip includes a circuit block having a relatively large amount of heat generation and a circuit block having a relatively small amount of heat generation.
- the second chip also includes a circuit block with a relatively large amount of heat generation and a circuit block with a relatively small amount of heat generation.
- the first chip and the second chip may be arranged relative to each other so that the circuit block having a relatively large amount of heat generation in the first chip does not overlap, in the vertical direction, the circuit block having a relatively large amount of heat generation in the second chip.
- the first chip and the second chip may be arranged so that the area in which the circuit block having a relatively large amount of heat generation in the first chip and the circuit block having a relatively large amount of heat generation in the second chip overlap each other is minimized.
- “a circuit block having a relatively large amount of heat generation” and “a circuit block having a relatively small amount of heat generation” are, for example, the following circuit blocks (1) to (3).
- (2) A circuit block that generates the largest amount of heat and other circuit blocks.
- the portions that generate a large amount of heat hardly overlap each other, so that hot spots do not occur.
- the cost of the cooling mechanism in the three-dimensional integrated circuit can be reduced, and improved performance of the three-dimensional integrated circuit can be expected.
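- The effect of rotating one of two identical chips can be illustrated with a minimal sketch. The grid floorplan, the function names `rotate180` and `hot_overlap`, and the example layout are assumptions made for illustration only and are not taken from the specification. The sketch counts how many high-heat cells sit directly above one another for a given stacking orientation.

```python
from typing import List

# 1 = high-heat block (e.g. processor core), 0 = low-heat block (e.g. cache).
Grid = List[List[int]]


def rotate180(grid: Grid) -> Grid:
    """Return the layout rotated by 180 degrees in the chip plane."""
    return [row[::-1] for row in grid[::-1]]


def hot_overlap(lower: Grid, upper: Grid) -> int:
    """Count cells where a high-heat block sits directly above another."""
    return sum(
        a & b
        for row_lower, row_upper in zip(lower, upper)
        for a, b in zip(row_lower, row_upper)
    )


# Hypothetical floorplan: cores on the left half, level 2 cache on the right.
chip = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

same_orientation = hot_overlap(chip, chip)        # cores stacked on cores
rotated = hot_overlap(chip, rotate180(chip))      # cores stacked on cache
```

With this hypothetical floorplan, stacking in the same orientation places every core cell over another core cell, while the rotated stack has no core-over-core cells. The same function could be evaluated for other rotations or offsets to search for the orientation with the smallest overlap.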
- FIG. 34 shows a second example of the three-dimensional multilayer circuit according to the seventh embodiment, in which the regions where portions with small amounts of generated heat overlap are further divided. That is, in the processor chip c1 and the processor chip c2, the level 2 cache memory is divided into an area where a processor core and the level 2 cache memory overlap and an area where the level 2 cache memories overlap each other. With this configuration, even if the level 2 cache memory (2) of one processor chip exceeds the allowable operating temperature due to heat conducted from the processor core of the other processor chip, it is assumed that relatively little heat is conducted to the remaining level 2 cache memory (1) in the same processor chip. In this case, performance degradation due to a partial stop of the cache memory can be minimized.
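- The partial stop of a divided cache can be sketched as follows. The class `CacheRegion`, the region sizes, the temperatures, and the 85 degree limit are hypothetical values chosen for illustration; the specification does not give concrete figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheRegion:
    name: str
    size_kib: int      # hypothetical capacity of the region
    temp_c: float      # hypothetical measured temperature


def usable_regions(regions: List[CacheRegion], max_temp_c: float) -> List[CacheRegion]:
    """Keep only the regions that are within the allowable operating temperature."""
    return [r for r in regions if r.temp_c <= max_temp_c]


regions = [
    CacheRegion("L2 (1): overlaps the other chip's cache", 512, 70.0),
    CacheRegion("L2 (2): overlaps the other chip's core", 512, 98.0),
]

usable = usable_regions(regions, max_temp_c=85.0)
remaining_kib = sum(r.size_kib for r in usable)
```

In this example only region (2) is stopped, so half of the assumed capacity remains available instead of the whole level 2 cache being lost.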
- each of the processor chips c1 and c2 may be further modified so that the central portion of each of the processor chips c1 and c2 is a level 3 cache memory as shown in FIG. Since the central area of each processor chip is not easily affected by heat, it may be configured as a cache memory that can be shared by each core of each processor chip.
- FIG. 36 is a diagram illustrating a third example of the three-dimensional multilayer circuit according to the seventh embodiment.
- the region where the portions with small heat generation overlap each other has, on average, a lower temperature than the other portions.
- a region in which portions with small heat generation overlap each other may be mounted with a high-speed cache memory as in the three-dimensional stacked circuit shown in FIG.
- High-speed cache memory operates at high speed, but tends to increase current consumption and generate heat.
- the thermal problem does not become so great even if a high-speed cache memory is arranged.
- the arrangement shown in FIG. 36 allows the performance of the high-speed cache memory to be exploited.
- a low-power cache memory may be mounted in an area overlapping with a portion where the heat generation amount is large, as shown in FIG. If a memory with high power consumption is arranged in the area of the level 2 cache memory that overlaps a portion with a large amount of heat generation (that is, a processor core portion, for example), the amount of heat generation also increases. For this reason, if a cache memory with low power consumption such as a low power cache memory is arranged, heat generation can be suppressed.
- FIG. 38 is a diagram illustrating a fourth example of the three-dimensional multilayer circuit according to the seventh embodiment. It is preferable not to arrange a control circuit or the like of the entire processor in a portion that overlaps a portion that generates a large amount of heat (for example, a processor core) when stacked.
- the power supply control circuit 36a for controlling the power supply of the entire processor chip is arranged in an area where the cache memories overlap. Note that the power supply control circuit 36a in FIG. 38 controls the power supply of the entire processor chip based on the temperature detected by the temperature sensor 34a provided in the processor core on the same processor chip.
- the three-dimensional stacked circuit according to the seventh embodiment is a three-dimensional integrated circuit configured by stacking two or more processor chips, and at least two of the processor chips have the same circuit block layout.
- the two processor chips are stacked with their arrangement being changed between the layers. This makes it easier to avoid the occurrence of hot spots in the three-dimensional integrated circuit.
- FIG. 39A is a diagram illustrating a configuration of a first example of a three-dimensional integrated circuit according to the eighth embodiment.
- the three-dimensional integrated circuit of the first example is a three-dimensional integrated circuit in which three processor chips c1, c2, and c3 are stacked.
- FIG. 39B is a schematic diagram in the case of stacking three processor chips c1, c2, and c3 according to the eighth embodiment.
- the three-dimensional integrated circuit according to the eighth embodiment constructs a multiprocessor system. This system (three-dimensional integrated circuit) is configured by stacking three processor chips having the same circuit layout.
- a multiprocessor system having six processor cores as a whole is obtained.
- the software recognizes it as a processor chip in which six processor cores are arranged on one chip, that is, a six-core multiprocessor.
- a three-dimensional integrated circuit formed by stacking processor chips has a problem of heat dissipation. For example, when the portions of the circuit blocks that generate heat overlap each other by stacking and operate simultaneously, heat may be generated more than a single-layer processor chip. For this reason, in a multiprocessor system, it is preferable to execute a program in consideration of heat generation.
- the assignment control unit provided in the three-dimensional integrated circuit performs control so that processor cores that execute programs do not overlap in the upper and lower layers. That is, as shown in FIG. 39C, processes (that is, programs) are assigned to the processor cores by the assignment control unit so that the processor cores operating on the respective processor chips do not overlap in the three-dimensional direction.
- FIG. 40A is a block diagram showing the relationship between the three processor chips c1, c2, and c3 and the assignment control unit 77 in the three-dimensional integrated circuit according to the eighth embodiment.
- the assignment control unit 77 includes a processor core position storage unit 88.
- the processor core position storage unit 88 stores, for each processor core (processor core 1-0, processor core 1-1, processor core 2-0, processor core 2-1, processor core 3-0, processor core 3-1), its position data in the three-dimensional integrated circuit.
- the assignment control unit 77 is included in the peripheral circuit 4 of each of the processor chips c1, c2, c3, for example. Further, one of the processor cores may operate as the assignment control unit 77. That is, one processor core may include the assignment control unit 77.
- FIG. 40B shows a configuration in which the processor core 1-0 of the processor chip c1 includes the assignment control unit 77 in the three-dimensional integrated circuit according to the eighth embodiment including three processor chips c1, c2, and c3.
- FIG. 39C is a table showing an operation example in which the three-dimensional integrated circuit according to the eighth embodiment operates the processor core in each processor chip under the control of the assignment control unit 77.
- the processor core 1-1 of the processor chip c1, the processor core 2-0 of the processor chip c2, and the processor core 3-1 of the processor chip c3 operate. That is, control is performed so that the processor cores that overlap vertically between adjacent processor chips do not operate.
- because the processor cores in vertically adjacent, overlapping positions do not operate at the same time, they do not overlap as heat generation sources, and the occurrence of hot spots is suppressed.
- the assignment control unit 77 assigns processes (programs) based on the position data, in the three-dimensional integrated circuit, of each processor core stored in the processor core position storage unit 88, so that the processor cores that execute programs do not overlap in the upper and lower layers.
- based on the position data in the three-dimensional integrated circuit of each processor core stored in the processor core position storage unit 88, the assignment control unit 77 can perform various kinds of process (program) allocation control that take heat generation into account.
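- The position-based assignment described above can be sketched as a simple greedy procedure. The position encoding (layer index, column index), the assumption that cores in the same column are vertically aligned, and the names `POSITIONS`, `overlaps_vertically`, and `assign` are all hypothetical; the specification does not define the data format of the processor core position storage unit 88.

```python
from typing import Dict, List, Set, Tuple

# Position data: core name -> (layer index, column index). Hypothetical values;
# cores that share a column are assumed to be vertically aligned.
POSITIONS: Dict[str, Tuple[int, int]] = {
    "1-0": (0, 0), "1-1": (0, 1),
    "2-0": (1, 0), "2-1": (1, 1),
    "3-0": (2, 0), "3-1": (2, 1),
}


def overlaps_vertically(a: str, b: str) -> bool:
    """True if the two cores sit in adjacent layers in the same column."""
    (layer_a, col_a), (layer_b, col_b) = POSITIONS[a], POSITIONS[b]
    return abs(layer_a - layer_b) == 1 and col_a == col_b


def assign(processes: List[str], preferred_order: List[str]) -> Dict[str, str]:
    """Greedily map each process to a free core that does not overlap an
    active core. Processes that cannot be placed are left out of the result."""
    active: Set[str] = set()
    mapping: Dict[str, str] = {}
    for proc in processes:
        for core in preferred_order:
            if core in active:
                continue
            if any(overlaps_vertically(core, other) for other in active):
                continue
            active.add(core)
            mapping[proc] = core
            break
    return mapping


order = ["1-1", "2-0", "3-1", "1-0", "2-1", "3-0"]
mapping = assign(["p0", "p1", "p2", "p3"], order)
```

With this preference order, the first three processes land on processor cores 1-1, 2-0, and 3-1, matching the operation example of FIG. 39C. The fourth process is not placed, because every remaining core overlaps an active one; a real scheduler would have to queue it or relax the rule.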
- the assignment control unit 77 may control program assignment to each processor core so that the processor cores arranged adjacent to the left and right do not simultaneously operate the program. In the example of FIGS. 40A and 40B, for example, the program assignment is controlled so that the processor core 2-0 and the processor core 2-1 do not operate the program at the same time.
- the assignment control unit 77 may control program allocation to each processor core so that, when a certain processor core is operating a program, the processor core farthest from that processor core is the next to operate a program.
- for example, program allocation is controlled so that the processor core 3-0, which is the processor core farthest from the processor core 1-1, is the next to operate the program.
- the assignment control unit 77 may control the assignment of the program to each processor core so that the processor cores in the vicinity of the heat sink operate with priority.
- for example, program allocation is controlled so that the processor core 1-0 and the processor core 1-1 in the processor chip c1 are allocated programs with priority over the other processor cores.
- the assignment control unit 77 may control the program assignment to each processor core so that the processor core including the assignment control unit 77 is avoided and the other processor cores operate the program.
- for example, since the processor core 1-0 includes the assignment control unit 77, program allocation is controlled so that the processor cores other than the processor core 1-0 (processor core 1-1, processor core 2-0, processor core 2-1, processor core 3-0, and processor core 3-1) operate the program.
- the assignment control unit 77 may control the program assignment to each processor core so that a processor core in the vicinity of the assignment control unit 77 is avoided and the other processor cores operate the program.
- for example, when the assignment control unit 77 is arranged in the peripheral circuit 4 portion in the vicinity of the processor core 2-1 in the processor chip c2, program allocation is controlled so that the processor cores other than the processor core 2-1 (processor core 1-0, processor core 1-1, processor core 2-0, processor core 3-0, and processor core 3-1) operate the program.
- the assignment control unit 77 controls the program assignment to each processor core. By assigning a program to each processor core in consideration of position data, the occurrence of hot spots is suppressed.
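- Two of the preferences described above (favoring cores near the heat sink and avoiding the core that hosts the assignment control unit) can be combined into a single ordering. This is a sketch under stated assumptions: layer index 0 is taken to be the chip nearest the heat sink, and the function name `preference_order` and its parameters are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# core name -> (layer index, column index); layer 0 is assumed to be the chip
# nearest the heat sink. Values are hypothetical.
positions: Dict[str, Tuple[int, int]] = {
    "1-0": (0, 0), "1-1": (0, 1),
    "2-0": (1, 0), "2-1": (1, 1),
    "3-0": (2, 0), "3-1": (2, 1),
}


def preference_order(
    positions: Dict[str, Tuple[int, int]],
    controller_core: Optional[str] = None,
) -> List[str]:
    """List candidate cores, nearest the heat sink first, excluding the core
    that hosts the assignment control unit (if any)."""
    candidates = [core for core in positions if core != controller_core]
    return sorted(candidates, key=lambda core: (positions[core][0], core))


order_all = preference_order(positions)
order_without_controller = preference_order(positions, controller_core="1-0")
```

The resulting list could be passed as the preference order of an assignment routine, so that the placement rules and the priority rules remain separate.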
- FIG. 41A is a diagram showing a configuration of a second example of the three-dimensional integrated circuit in the eighth embodiment.
- the three-dimensional integrated circuit of the second example is a three-dimensional integrated circuit in which two processor chips c1 and c2 are stacked.
- FIG. 41B is a schematic diagram of two processor chips c1 and c2 according to the eighth embodiment.
- processor chips having the same circuit layout are stacked.
- processor chips c1 and c2 having different circuit layouts are stacked.
- the three-dimensional integrated circuit shown in FIG. 41B is a three-dimensional integrated circuit with six processor cores, configured by stacking a processor chip c1 having four processor cores and a processor chip c2 having two processor cores.
- the processor cores of the processor chip c1 and the processor cores of the processor chip c2 do not completely overlap. In such a case, if a part of a processor core of one processor chip that generates more heat than its surroundings overlaps a processor core of the other processor chip, it may be judged that the portions that generate more heat overlap vertically. For example, since a decimal arithmetic unit, an integer arithmetic unit, and the like are likely to become hot, when such a circuit block of a processor core overlaps a processor core of the other processor chip, the processor cores are regarded as overlapping each other vertically.
- when the program is executed by the processor cores 1 and 2 of the processor chip c1, as in operation example 1 of the table shown in FIG. 41C, the assignment control unit 77 provided in the three-dimensional integrated circuit performs control so that the processor core 0 is used and the processor core 1 of the processor chip c2 is not. With this control, the processor cores that generate heat do not overlap vertically, so that hot spots can be prevented from forming.
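- The overlap judgment for chips with different layouts can be sketched as a rectangle intersection test viewed from above. The coordinates, the `Rect` type, and the function names are hypothetical and serve only to illustrate the rule that two cores count as overlapping when the hot block of one lies over the other core.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in chip-plane coordinates (arbitrary units)."""
    x0: float
    y0: float
    x1: float
    y1: float


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share a non-zero area when viewed from above."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def counts_as_overlapping(hot_block: Rect, other_core: Rect) -> bool:
    """Treat two cores as vertically overlapping only if the hot block of one
    core lies over (or under) the other core."""
    return intersects(hot_block, other_core)


# Hypothetical coordinates for one core on chip c1 and two cores on chip c2.
c1_core = Rect(0, 0, 4, 4)
c1_hot_block = Rect(3, 3, 4, 4)     # e.g. the arithmetic units of c1_core
c2_core_a = Rect(3.5, 3.5, 8, 8)    # covers part of the hot block
c2_core_b = Rect(-2, -2, 1, 1)      # overlaps c1_core but not its hot block

over_a = counts_as_overlapping(c1_hot_block, c2_core_a)
over_b = counts_as_overlapping(c1_hot_block, c2_core_b)
```

Here `c2_core_b` shares some area with the outline of `c1_core`, yet it is not treated as overlapping because it does not cover the hot block.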
- the allocation control unit 77 controls program allocation to each processor core as shown in the following (1) to (4).
- (1) Control is performed so that processor cores arranged adjacent to the left and right do not simultaneously operate programs.
- (3) Control so that many processor cores close to the heat sink operate programs.
- a processor core in the vicinity of the allocation control unit 77 is avoided, and control is performed so that other processor cores operate the program.
- the eighth embodiment relates to a three-dimensional integrated circuit in which a plurality of processor chips are stacked.
- process (program) allocation is controlled in consideration of the positional relationship of the individual processor cores in the three-dimensional integrated circuit. For example, process (program) allocation is controlled so that processor cores in overlapping positions on processor chips in adjacent layers do not operate at the same time. With such process (program) allocation, adjacent and overlapping processor cores do not overlap or concentrate as heat generation sources, so that the generation of hot spots can be suppressed.
- FIG. 43 is a block diagram of a first example of the process scheduler 78a according to the ninth embodiment.
- FIG. 42 shows the relationship between the block diagram of the conventional process scheduler 78 ′ as a premise thereof and the processor chips c1 and c2 in the three-dimensional integrated circuit formed by stacking two processor chips c1 and c2.
- a process flow of process scheduling in a three-dimensional integrated circuit formed by stacking processor chips will be described first.
- a plurality of processes are accepted on the operating system. These processes are scheduled by the process scheduling unit 80 ′ and correspond to each processor core (processor core 1-0, processor core 1-1, processor core 2-0, processor core 2-1). Processes are accumulated in the process queue units 84a, 84b, 84c, and 84d.
- the process schedule unit 80 ′ performs scheduling based on the priority assigned to each process. However, in a multi-core processor environment, scheduling is performed based on the load balance of each processor core. Therefore, the process schedule unit 80 ′ performs scheduling using the load amount of each processor core acquired by the processor core load acquisition unit 82. The scheduling algorithm is not described here.
- the process schedule unit 80a allocates processes to the process queue units 84a, 84b, 84c, and 84d according to the load amount of each processor core held by the processor core load acquisition unit 82. That is, a large number of tasks (processes) are allocated to processor cores with a small load, and a small number of tasks (processes) are allocated to processor cores with a large load.
- the data in the processor core load acquisition unit 82 is rewritten so as to virtually increase the load amount of a processor core that is stacked on and overlaps a processor core whose temperature is higher than a predetermined value, that is, so that its load amount appears larger than it is.
- the processor core position storage unit 88 stores the position of each processor core in the three-dimensional integrated circuit.
- the processor core temperature acquisition unit 90 constantly acquires the temperature of each processor core. From these two types of information, the data of the processor core load acquisition unit 82 is rewritten so that the load amount of a processor core that overlaps, in the stacking direction, a processor core whose temperature is higher than the predetermined value becomes larger than the actual load amount. This rewriting process is performed by the processor core load correcting unit 86.
- the processor core load correction unit 86 may be configured to rewrite the load amount of each processor core in the processor core load acquisition unit 82 so that few processes are assigned to the processor cores at the center of the stack and, conversely, processes are preferentially assigned to the processor cores close to the heat sink.
- the processor core load correction unit 86 shown in FIG. 43 rewrites the load amount of the target processor core to the maximum value and suppresses the allocation of processes to the processor core.
- the processor core load correcting unit 86 may rewrite the load amount of the target processor core to a slightly higher value to reduce the number of processes to be assigned. By doing in this way, since the load of the target processor core can be reduced, the amount of generated heat can be suppressed.
- the load state information used for rewriting may be calculated as a larger value based on the data in the processor core temperature acquisition unit 90 and the data in the processor core position storage unit 88. By doing so, the load amount of a processor core that is stacked on and overlaps a processor core having a high temperature can be made to appear high, and the amount of processes allocated to it is reduced, so that the amount of heat generation is reduced.
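- The load correction can be sketched as follows. The load scale (0.0 to 1.0), the `penalty` parameter, the position encoding, and all example values are assumptions made for illustration; the specification states that the scheduling algorithm itself is not described. A penalty of 1.0 corresponds to rewriting the load to the maximum value, and a smaller penalty corresponds to rewriting it to a slightly higher value.

```python
from typing import Dict, Tuple

Position = Tuple[int, int]  # (layer index, column index), hypothetical encoding


def stacked(a: Position, b: Position) -> bool:
    """True if two cores are in adjacent layers and the same column."""
    return abs(a[0] - b[0]) == 1 and a[1] == b[1]


def corrected_loads(
    loads: Dict[str, float],
    temps: Dict[str, float],
    positions: Dict[str, Position],
    temp_limit: float,
    penalty: float,
) -> Dict[str, float]:
    """Return a copy of the loads in which cores stacked on a hot core
    appear more heavily loaded (capped at 1.0)."""
    corrected = dict(loads)
    for hot_core, temp in temps.items():
        if temp <= temp_limit:
            continue
        for core in corrected:
            if stacked(positions[hot_core], positions[core]):
                corrected[core] = min(1.0, corrected[core] + penalty)
    return corrected


def least_loaded(loads: Dict[str, float]) -> str:
    """Pick the core a load-balancing scheduler would choose next."""
    return min(loads, key=loads.get)


positions = {"1-0": (0, 0), "1-1": (0, 1), "2-0": (1, 0), "2-1": (1, 1)}
loads = {"1-0": 0.6, "1-1": 0.5, "2-0": 0.2, "2-1": 0.4}
temps = {"1-0": 95.0, "1-1": 60.0, "2-0": 60.0, "2-1": 60.0}

before = least_loaded(loads)
after = least_loaded(corrected_loads(loads, temps, positions, 85.0, 1.0))
```

Without the correction the scheduler would choose processor core 2-0, which sits directly under the hot processor core 1-0. After the correction it chooses processor core 2-1 instead, without any change to the scheduler's own selection logic.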
- FIG. 44 is a block diagram of a second example of the process scheduler 78b according to the ninth embodiment.
- a process queue invalidation enabling control unit 92 including a processor core position storage unit 88 and a processor core temperature acquisition unit 90 is connected to the process schedule unit 80b in the process scheduler 78b shown in FIG.
- the process queue invalidation validation control unit 92 invalidates and validates the process queue units 84a, 84b, 84c, and 84d.
- the process queue invalidation validation control unit 92 invalidates (or validates) the process queue units 84a, 84b, 84c, and 84d corresponding to each processor core.
- the process queue invalidation validation control unit 92 uses information on the position of each processor core and information on the temperature of each processor core.
- the processor core temperature acquisition unit 90 may acquire temperature information of each processor core from a circuit such as a thermal diode mounted on the processor chip, or may estimate it by a predetermined algorithm from the load state and the outside air temperature.
- the temperature information of each processor core obtained by the processor core temperature acquisition unit 90 is given to the process queue invalidation validation control unit 92. Further, the process queue invalidation validation control unit 92 uses the information on the position of each processor core stored in the processor core position storage unit 88 to grasp the temperature of each processor core and the positional relationship between adjacent processor cores.
- the process queue invalidation validation control unit 92 determines a process queue unit to be invalidated and validated based on the temperature and positional relationship of each processor core.
- An example of the determination procedure is to invalidate (stop) a process queue unit for a processor core that is stacked with a processor core having a temperature higher than a predetermined value and overlaps with the processor core.
- any one of the following control rules (1) to (11) may be used.
- the process queue portion of the processor core that is in contact with the hottest one among the currently operating processor cores in the vertical and horizontal directions is invalidated (stopped).
- Invalidate (stop) the process queue unit of a processor core that is stacked above or below a currently operating processor core that has a temperature higher than the threshold value.
- Invalidate (stop) the process queue unit of a processor core that is in contact with a currently operating processor core that exhibits a temperature equal to or higher than the threshold value.
- a processor core that is currently operating and exhibits a temperature equal to or higher than the threshold value is stopped, and at the same time, the process queue unit of a processor core that is stacked above or below that processor core and overlaps it is enabled.
- a processor core that is currently operating and exhibits a temperature equal to or higher than the threshold value is stopped, and at the same time, the process queue unit of a processor core in contact with that processor core in the vertical and horizontal directions (including adjacency in diagonal directions such as upper left and upper right) is enabled.
- the process queue unit is enabled or disabled so that adjacent processor cores do not operate simultaneously between adjacent processor chips regardless of the temperature.
- the process queue unit is enabled or disabled so that adjacent processor cores do not operate simultaneously between adjacent processor chips in the vertical and horizontal directions regardless of the temperature.
- All processor cores can be used in the first processor chip, which is in contact with the heat sink, and the above-described procedures (1) to (10) are performed for the processor cores in the other processor chips.
- the process queue invalidation validation control unit 92 controls the validation and invalidation of each process queue unit.
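- One of the rules above, invalidating the process queue of any core stacked directly above or below a core whose temperature is at or above a threshold, can be sketched as follows. The position encoding, the threshold, and the function names are hypothetical, and the other rules in the list are not covered by this sketch.

```python
from typing import Dict, Tuple

Position = Tuple[int, int]  # (layer index, column index), hypothetical encoding


def stacked(a: Position, b: Position) -> bool:
    """True if two cores are in adjacent layers and the same column."""
    return abs(a[0] - b[0]) == 1 and a[1] == b[1]


def queue_states(
    temps: Dict[str, float],
    positions: Dict[str, Position],
    threshold: float,
) -> Dict[str, bool]:
    """Map each core to True (queue enabled) or False (queue invalidated)."""
    enabled = {core: True for core in positions}
    for hot_core, temp in temps.items():
        if temp < threshold:
            continue
        for core in positions:
            if stacked(positions[hot_core], positions[core]):
                enabled[core] = False
    return enabled


positions = {"1-0": (0, 0), "1-1": (0, 1), "2-0": (1, 0), "2-1": (1, 1)}
temps = {"1-0": 90.0, "1-1": 55.0, "2-0": 60.0, "2-1": 58.0}

states = queue_states(temps, positions, threshold=85.0)
```

In this example only the queue of processor core 2-0 is invalidated, because it lies directly under the hot processor core 1-0. The hot core's own queue stays enabled under this particular rule.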
- the processes (1) to (8) described above may be performed by the process schedule unit 80a shown in FIG.
- the processor core load acquisition unit 82 performs control of validation and invalidation of the process queue unit.
- the process queue invalidation enabling control unit 92 may control the number of processes that can be executed by the processor core by changing the size of the queue without invalidating the process queue unit. By doing so, the operation load of the processor core is reduced, and the temperature of the hot spot in the processor core can be suppressed.
- the number of adjacent processor chips is up to 1 (processor) chip, but the number of adjacent processor chips may be up to 2 chips or 3 chips. Also, adjacent processor cores in the same processor chip may be two adjacent cores (for example, processor core 1 and processor core 2 are adjacent to processor core 1) or three adjacent cores.
- the heat dissipation performance (that is, the degree of heat accumulation) differs between the processor chip close to the heat sink and the other processor chips. For this reason, all the processor cores are operated in the processor chips (first to third) near the heat sink, and the adjacent one processor core is stopped in the fourth to fifth processor chips.
- the adjacent two processor cores may be stopped in the first processor chip, and the adjacent one processor core may be stopped in the eighth to tenth processor chips.
- in place of an eleventh processor chip there is a printed circuit board, and heat is radiated from the printed circuit board. Therefore, in the last three chips, one adjacent processor core is stopped.
- the numbers “1 to 3”, “4 to 5”, “5 to 8”, and “8 to 10” are merely examples.
- the heat sink includes not only a metal heat sink but also water cooling for liquid cooling and air cooling for flowing air.
- “adjacent to the heat sink” is in a portion in contact with the cooling medium.
- the processor chip in contact with the liquid flow path is a processor chip adjacent to the heat sink.
- process scheduler 78 shown in FIGS. 43 and 44 is premised on being implemented as software on the operating system for the CPU, but may be a hardware process scheduler having a similar mechanism. Further, in this specification, they are called processes, but they may be tasks or programs.
- the ninth embodiment relates to a process scheduler for a three-dimensional integrated circuit in which a plurality of processor chips are stacked. The process scheduler includes a process schedule unit that controls scheduling of processes to a process queue unit for each processor core, and the data input to the process schedule unit is controlled. By doing so, the occurrence of local hot spots in the processor cores is avoided.
- the level 2 cache memory may be a level 3 cache memory or a level 4 cache memory, and does not depend on the hierarchy of the cache memory.
- An integrated circuit device includes a first circuit configured by a memory circuit, a second circuit configured by an arithmetic circuit, and a control circuit. The first circuit is divided into a plurality of circuit blocks according to the distance between the arrangement positions of the first circuit and the second circuit. The control circuit controls each of the divided circuit blocks independently.
- An integrated circuit device is the integrated circuit device according to the first aspect, If there is no control by the control circuit, the first circuit exceeds the operable temperature range due to the influence of heat generated by the operation of the second circuit.
- An integrated circuit device is the integrated circuit device according to the second aspect, in which the memory circuit is a cache memory, and the arithmetic circuit is a processor core.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit is configured to independently control supply and cut-off of a power supply voltage for each of the divided circuit blocks.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit controls a change in power supply voltage independently for each of the divided circuit blocks.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit controls clock gating independently for each of the divided circuit blocks.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit controls the frequency change of the memory clock independently for each of the divided circuit blocks.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit controls the change of the duty ratio of the memory clock independently for each of the divided circuit blocks.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit controls a chip select signal of a subarray in each of the divided circuit blocks independently.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit controls the circuit block independently based on the temperature in each of the divided circuit blocks.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit controls the circuit block independently based on a time measured by a timer provided for each of the divided circuit blocks.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit controls the circuit block independently based on a frequency of a clock applied to the second circuit.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit controls the circuit block independently based on an operation rate of an arithmetic circuit in the second circuit.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit controls the circuit block independently based on a cache miss ratio to the cache memory.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit is arranged at a position adjacent to the first circuit farthest from the second circuit.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit is arranged in a separate chip.
- An integrated circuit device is the integrated circuit device according to the third aspect, The control circuit is installed on the outer periphery of the same chip farthest from the second circuit.
- An integrated circuit device is the integrated circuit device according to the third aspect, In addition, including a heat sink, The control circuit is arranged in a layer of a chip closest to the heat sink.
- a three-dimensional integrated circuit according to a further aspect includes a first chip and a second chip stacked directly on the first chip,
- the first chip includes a circuit block that generates a relatively large amount of heat and a circuit block that generates a relatively small amount of heat,
- the second chip includes a circuit block that generates a relatively large amount of heat and a circuit block that generates a relatively small amount of heat,
- the first chip and the second chip are positioned relative to each other and then stacked so that the area over which the high-heat circuit block of the first chip and the high-heat circuit block of the second chip overlap between layers is minimized.
- a three-dimensional integrated circuit according to a twentieth aspect of the present invention includes: A three-dimensional integrated circuit configured by stacking two or more chips, Among them, at least two chips are chips having the same circuit block layout, The at least two chips are arranged differently between layers.
- a three-dimensional integrated circuit according to a twenty-first aspect of the present invention is the three-dimensional integrated circuit according to the twentieth aspect,
- the at least two chips having the same circuit block layout are characterized in that one chip is stacked by being rotated 90 degrees or 180 degrees with respect to the other chip.
- a three-dimensional integrated circuit according to a twenty-second aspect of the present invention is the three-dimensional integrated circuit according to the twentieth aspect,
- the at least two chips having the same circuit block layout are processor chips and constitute a multi-core system.
- a three-dimensional integrated circuit according to a twenty-third aspect of the present invention is the three-dimensional integrated circuit according to the twenty-second aspect, In the at least two processor chips, one processor chip is rotated and stacked by 90 degrees or 180 degrees with respect to the other processor chip.
- a three-dimensional integrated circuit according to a twenty-fourth aspect of the present invention is the three-dimensional integrated circuit according to the twenty-third aspect, wherein a first region, in which level 2 cache memories are vertically adjacent to each other when stacked, is partitioned off in each of the at least two processor chips, and the partitioned first region is controlled independently in each processor chip.
- a three-dimensional integrated circuit according to a twenty-fifth aspect of the present invention is the three-dimensional integrated circuit according to the twenty-fourth aspect,
- the divided first area is constituted by a level 3 cache memory.
- a three-dimensional integrated circuit according to a twenty-sixth aspect of the present invention is the three-dimensional integrated circuit according to the twenty-fifth aspect, wherein the divided first region is formed of a high-speed cache memory.
- a three-dimensional integrated circuit according to a twenty-seventh aspect of the present invention is the three-dimensional integrated circuit according to the twenty-fourth aspect, In each of the at least two processor chips, the second area adjacent to the processor core in the vertical direction when stacked is constituted by a low power consumption cache memory.
- the three-dimensional integrated circuit of the twenty-eighth aspect of the present invention is the three-dimensional integrated circuit of the twenty-fifth aspect, In addition, including a control circuit, The control circuit is arranged in the first region.
- a three-dimensional integrated circuit according to a further aspect includes a first chip and a second chip stacked directly on the first chip,
- the first chip includes a circuit block that generates a relatively large amount of heat and a circuit block that generates a relatively small amount of heat,
- the second chip includes a circuit block that generates a relatively large amount of heat and a circuit block that generates a relatively small amount of heat,
- the first chip and the second chip are positioned relative to each other and then stacked so that the high-heat circuit block of the first chip does not overlap the high-heat circuit block of the second chip in the vertical direction.
- a three-dimensional processor device according to a thirtieth aspect includes a plurality of stacked processor chips and an allocation control unit, each processor chip having one or more processor cores,
- the allocation control unit includes a processor core position storage unit that stores data on the position of each processor core in the three-dimensional processor device, and controls the allocation of programs to each processor core on the basis of the position data stored in the processor core position storage unit.
- a three-dimensional processor device according to a further aspect is the three-dimensional processor device according to the thirtieth aspect,
- the allocation control unit controls the allocation of programs to each processor core so that processor cores placed vertically adjacent to each other between the stacked processor chips do not run programs at the same time.
- a three-dimensional processor device is the three-dimensional processor device according to the thirtieth aspect,
- the allocation control unit controls program allocation to each processor core so that processor cores arranged adjacent to each other on the left and right do not simultaneously operate between the stacked processor chips.
- a three-dimensional processor device is the three-dimensional processor device according to the thirtieth aspect,
- the allocation control unit controls program allocation to each processor core so that processor cores arranged adjacent to each other on the left and right do not simultaneously operate between the stacked processor chips.
- a three-dimensional processor device according to a further aspect is the three-dimensional processor device according to the thirtieth aspect,
- the allocation control unit controls the allocation of programs to each processor core so that, while one processor core is running a program, the processor core farthest from it is the next to run a program.
- a three-dimensional processor device is the three-dimensional processor device according to the thirtieth aspect, In addition, including a heat sink, The allocation control unit controls program allocation to each processor core so that a processor core near the heat sink operates with priority.
- the three-dimensional processor device of the thirty-sixth aspect of the present invention is the three-dimensional processor device of the thirtieth aspect,
- the allocation control unit is included in one of the processor cores, The allocation control unit controls the program allocation to each processor core so that the processor core including the allocation control unit is avoided and the other processor cores operate the program.
- the three-dimensional processor device of the thirty-seventh aspect of the present invention is the three-dimensional processor device of the thirtieth aspect,
- the plurality of processor chips have the same circuit block layout.
- the process scheduler of the thirty-eighth aspect of the present invention is a process scheduler for a plurality of processor cores in a three-dimensional multi-core processor device configured by stacking a plurality of processor chips. It comprises: a load acquisition unit that acquires the load of each processor core; a scheduling unit that schedules processes into the process queue units corresponding to the individual processor cores, on the basis of the load of each processor core; a load correction unit that corrects the load of each processor core held in the load acquisition unit; a position storage unit that stores the position of each processor core; and a temperature acquisition unit that acquires the temperature of each processor core. The load correction unit corrects the load of each processor core at the time of load acquisition, using the position information stored in the position storage unit and the temperature information acquired by the temperature acquisition unit.
- a process scheduler according to a further aspect is the process scheduler according to the thirty-eighth aspect, wherein the load correction unit uses the position information stored in the position storage unit and the temperature information acquired by the temperature acquisition unit to correct the load of any processor core placed vertically adjacent to a processor core hotter than a predetermined value, so that the scheduling unit stops scheduling processes to it.
- a process scheduler according to a fortieth aspect is a process scheduler for a plurality of processor cores in a three-dimensional multi-core processor device configured by stacking a plurality of processor chips. It comprises: process queue units that queue processes for each processor core and have each processor core execute them in turn; a queue invalidation/validation control unit that controls the invalidation and validation of each process queue unit; a position storage unit that stores the position of each processor core; and a temperature acquisition unit that acquires the temperature of each processor core. The queue invalidation/validation control unit controls the invalidation and validation of the process queue units using the position information stored in the position storage unit and the temperature information acquired by the temperature acquisition unit.
- the process scheduler according to the forty-first aspect of the present invention is the process scheduler according to the fortieth aspect, wherein the queue invalidation/validation control unit uses the position information stored in the position storage unit and the temperature information acquired by the temperature acquisition unit to invalidate the process queue unit for any processor core placed vertically adjacent to a processor core hotter than a predetermined value.
- the circuit structure and control method for cooling the high-temperature portions of a processor chip, the chip layout and circuit layout arranged so that heat-generating circuits do not overlap between chips in different layers, and the method of restricting the operation of each circuit and the allocation of processes so as to prevent hot spots on a chip, all according to the present disclosure, are well suited to three-dimensional integrated circuits.
- 4 ... peripheral circuit, 6 ... operation control circuit, 12 ... tag mask circuit, 14 ... SRAM sub-array, 16 ... SRAM array, 66 ... heat sink, 78a, 78b ... process scheduler, 80a, 80b ... process scheduling unit, 82 ... processor core load acquisition unit, 86 ... processor core load correction unit, 88 ... processor core position storage unit, 90 ... processor core temperature acquisition unit, 92 ... process queue invalidation/validation control unit, c1, c2, c3 ... processor chip.
Abstract
Description
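[First to sixth embodiments] Embodiments relating to a circuit structure and a control method for cooling the high-temperature portions of a processor chip.
[Seventh embodiment] An embodiment relating to a chip layout and a circuit layout arranged so that heat-generating circuits on chips in different layers do not overlap.
[Eighth and ninth embodiments] Embodiments relating to methods of restricting the operation of each circuit and the allocation of processes so that no high-temperature location (hot spot) forms on a chip.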
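1.1. Configuration of the processor chip
FIG. 1(a) is a plan view of a processor chip (circuit) c1 according to the first embodiment. FIG. 1(b) is a plan view of a conventional processor chip c1'.
The first embodiment concerns a circuit (chip) that combines a circuit generating a large amount of heat, such as an arithmetic unit, with a circuit that generates little heat and consists of a uniformly distributed identical structure, such as a cache memory. In such a circuit, the memory circuit is divided as appropriate and the divided circuit blocks are controlled independently. With this configuration, only the regions of the memory circuit that cannot operate because of the generated heat are stopped, while the regions that can still operate keep running. The performance degradation of the processor chip caused by the generated heat can therefore be kept to a minimum. Although this embodiment is described for a circuit containing an arithmetic unit and a cache memory, it is not limited to that case and also applies to any circuit containing circuit blocks that generate much heat and circuit blocks that generate little.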
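2.1. Configuration of the processor chip
FIG. 6 is a plan view of a processor chip c1 according to the second embodiment. In the processor chip c1 of FIG. 6, as in the processor chip of the first embodiment shown in FIG. 1(a), the level 2 cache memory is divided into a plurality of blocks; here, however, each block of the level 2 cache memory is configured to be used in units of ways.
In the second embodiment, in a processor chip that has processor cores and a level 2 cache memory divided as appropriate, the circuit blocks of the divided level 2 cache memory are used in units of ways, and the use of each way is controlled by a valid/invalid flag 10. With this configuration, only the circuit blocks for the ways that cannot operate because of the generated heat are stopped, while the rest continue to operate. The performance degradation of the processor chip caused by the generated heat can therefore be kept to a minimum.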
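As a rough illustration of way-unit control with a valid/invalid flag, the following Python sketch models a toy set-associative cache whose ways can be switched off one at a time. The class name `WayMaskedCache`, the address split, and the replacement choice are assumptions made for this example and are not taken from the specification; a real design would also write back dirty lines before disabling a way.

```python
class WayMaskedCache:
    """Toy set-associative cache whose ways can be switched off individually."""

    def __init__(self, num_sets, num_ways):
        self.num_sets = num_sets
        self.num_ways = num_ways
        # One valid/invalid flag per way; True means the way may be used.
        self.way_enabled = [True] * num_ways
        # tags[set][way] holds the stored tag, or None if the line is empty.
        self.tags = [[None] * num_ways for _ in range(num_sets)]

    def _split(self, address):
        # Low part selects the set, high part is the tag (hypothetical mapping).
        return address % self.num_sets, address // self.num_sets

    def disable_way(self, way):
        """Mark a way invalid (e.g. its block is too hot) and drop its lines."""
        self.way_enabled[way] = False
        for set_index in range(self.num_sets):
            self.tags[set_index][way] = None

    def enable_way(self, way):
        """Mark a way valid again; it comes back empty."""
        self.way_enabled[way] = True

    def lookup(self, address):
        """Return True on a hit in any enabled way."""
        set_index, tag = self._split(address)
        return any(
            self.way_enabled[way] and self.tags[set_index][way] == tag
            for way in range(self.num_ways)
        )

    def fill(self, address):
        """Store a line in an enabled way; return the way, or None if none is enabled."""
        set_index, tag = self._split(address)
        enabled = [w for w in range(self.num_ways) if self.way_enabled[w]]
        if not enabled:
            return None
        # Prefer an empty enabled way; otherwise overwrite the first enabled one.
        for way in enabled:
            if self.tags[set_index][way] is None:
                self.tags[set_index][way] = tag
                return way
        self.tags[set_index][enabled[0]] = tag
        return enabled[0]
```

With one way disabled the cache keeps serving hits from the remaining ways, which is the point of dividing the memory: capacity shrinks, but operation continues.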
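3.1. Configuration of the processor chip
FIG. 10 is a plan view of a processor chip c1 according to the third embodiment. In the processor chip c1 of FIG. 10, as in the processor chip of the first embodiment shown in FIG. 1(a), the level 2 cache memory is divided into a plurality of blocks; here, however, each block of the level 2 cache memory is configured to be used in units of sets.
In the third embodiment, in a processor chip that has processor cores and a level 2 cache memory divided as appropriate, the circuit blocks (memory blocks) of the divided level 2 cache memory are used in units of sets, and their use is controlled per set. With this configuration, only the circuit blocks for the sets that cannot operate because of the generated heat are stopped, while the rest continue to operate. The performance degradation of the processor chip caused by the generated heat can therefore be kept to a minimum.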
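4.1. Control flow for heat dissipation of the divided level 2 cache memory
The first to third embodiments described the divided structure of the cache memory. The fourth embodiment concerns control of heat dissipation in the divided level 2 cache memory. In this embodiment, the circuit blocks affected by heat are separated and controlled at a fine granularity for heat dissipation. To let a heated circuit dissipate heat, it is preferable to cut off its power supply voltage.
FIG. 13 shows a first processing flow, for power cutoff of the level 2 cache memory (hereinafter simply “cache memory”) of the fourth embodiment. In the cache memory controlled by the flow of FIG. 13, a temperature detection circuit that monitors temperature is provided for each divided circuit block (memory block). The temperature detection circuit 34 shown in FIG. 20, described later, can be used for this purpose. The same applies to the cache memories controlled by the processing flows of FIGS. 14 to 18 below.
FIG. 14 shows a second processing flow according to the fourth embodiment. Steps S21 to S26 in FIG. 14 are substantially the same as steps S11 to S16 in FIG. 13, except that the flow of FIG. 14 does not cut off power but instead changes (lowers) the power supply voltage (S24). Power consumption in an integrated circuit, such as that due to leakage current, scales with the power supply voltage. Lowering the supply voltage therefore also reduces the heat generated and provides a cooling effect. Furthermore, if the voltage is lowered only within the range in which data can be retained, purging (S22) and invalidating (S23) the cache memory are unnecessary. Because a memory circuit's ability to retain data normally falls at high temperature, the choice between the power-cutoff flow of FIG. 13 and the voltage-lowering flow of FIG. 14 is preferably made with the applications to be run taken into account.
FIG. 15 shows a third processing flow according to the fourth embodiment. Steps S31 to S36 in FIG. 15 are substantially the same as steps S11 to S16 in FIG. 13, except that the flow of FIG. 15 performs clock gating instead of power cutoff (S34). Unlike power cutoff, clock gating does not cause data to be lost. Consequently, when the cache memory has cooled below the restartable temperature (S35: YES) and the gated cache memory is restored, that is, when its gating is released, the cache memory does not need to be initialized (S36). The cache memory is gated by gating the input clock to the CLK/memory control circuit 30 in the block diagram of the SRAM sub-array shown in FIG. 12.
FIG. 16 shows a fourth processing flow according to the fourth embodiment. Steps S41 to S46 in FIG. 16 are substantially the same as steps S11 to S16 in FIG. 13, except that the flow of FIG. 16 changes (reduces) the memory clock frequency instead of cutting off power (S44). Changing the memory clock frequency, unlike power cutoff, does not cause data to be lost. Consequently, when the cache memory has cooled below the restartable temperature (S45: YES) and the frequency of a cache memory whose memory clock had been lowered is restored, the cache memory does not need to be initialized (S46). The memory clock frequency is changed by changing the input clock to the CLK/memory control circuit 30 in the block diagram of the SRAM sub-array shown in FIG. 12. The change (reduction) in memory clock frequency may be carried out together with a reduction in power supply voltage, as in DVFS (Dynamic Voltage and Frequency Scaling).
FIG. 17 shows a fifth processing flow according to the fourth embodiment. Steps S51 to S56 in FIG. 17 are substantially the same as steps S11 to S16 in FIG. 13, except that the flow of FIG. 17 changes (reduces) the duty ratio of the memory clock instead of cutting off power (S54). Changing the memory clock duty ratio, unlike power cutoff, does not cause data to be lost. Consequently, when the cache memory has cooled below the restartable temperature (S55: YES) and the duty ratio of the memory clock is restored, the cache memory does not need to be initialized (S56).
FIG. 18 shows a sixth processing flow according to the fourth embodiment. Steps S61 to S66 in FIG. 18 are substantially the same as steps S11 to S16 in FIG. 13, except that the flow of FIG. 18 does not cut off power but instead holds the chip select signal of the sub-arrays in the cache memory (see FIG. 12) fixed at the disabled level (here, “1”) (S64). Holding the sub-array chip select signal disabled, unlike power cutoff, does not cause data to be lost. Consequently, when the cache memory has cooled below the restartable temperature (S65: YES) and the chip select signal is released, the cache memory does not need to be initialized (S66). The sub-array chip select signal is held disabled by changing the input control signal to the CLK/memory control circuit 30 in the block diagram of the SRAM sub-array shown in FIG. 12.
In the fourth embodiment, heat-dissipation control is applied to the divided circuit blocks of the cache memory on the basis of the monitored heat. The available controls include power cutoff, lowering the power supply voltage, clock gating, reducing the memory clock frequency, reducing the memory clock duty ratio, and holding the chip select signals of the SRAM sub-arrays fixed.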
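The six processing flows share one shape: throttle a cache block when it reaches its allowable operating temperature and restore it once it has cooled below the restartable temperature. The Python sketch below models that shape for a single block. The class and operation names are hypothetical, the thresholds are arbitrary inputs, and the assumption that only power cutoff loses data (so only it needs a purge and invalidation beforehand and an initialization afterwards) follows the description of FIGS. 13 to 18 rather than any stated implementation.

```python
from enum import Enum


class Action(Enum):
    """Heat-dissipation controls named in the fourth embodiment."""
    POWER_OFF = "power_off"
    LOWER_VOLTAGE = "lower_voltage"      # assumed to stay within the data-retention range
    CLOCK_GATE = "clock_gate"
    LOWER_FREQUENCY = "lower_frequency"
    LOWER_DUTY = "lower_duty"
    HOLD_CHIP_SELECT = "hold_chip_select"


class CacheBlockThermalControl:
    """Per-block controller: throttle when hot, restore when cool again."""

    def __init__(self, action, max_operating_temp, restart_temp):
        if restart_temp > max_operating_temp:
            raise ValueError("restart_temp must not exceed max_operating_temp")
        self.action = action
        self.max_operating_temp = max_operating_temp
        self.restart_temp = restart_temp
        self.throttled = False

    def step(self, temperature):
        """Return the ordered operations to perform for one temperature sample."""
        data_lost = self.action is Action.POWER_OFF
        if not self.throttled and temperature >= self.max_operating_temp:
            self.throttled = True
            # Only a control that loses data needs the purge and invalidation first.
            before = ["purge", "invalidate"] if data_lost else []
            return before + [self.action.value]
        if self.throttled and temperature < self.restart_temp:
            self.throttled = False
            # Contents survived unless power was cut, so initialize only in that case.
            return ["restore", "initialize"] if data_lost else ["restore"]
        return []
```

Using two thresholds gives hysteresis: a block sitting between the restartable and allowable temperatures keeps whatever state it is in, so the controller does not oscillate.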
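5.1. Configuration for controlling heat dissipation of the divided level 2 cache memory
The fifth embodiment concerns specific configurations for controlling heat dissipation in the divided cache memory. In the fourth embodiment, the trigger for starting heat-dissipation control was always temperature; the devices of the fifth embodiment also use triggers other than temperature. In the devices described below the controlled quantity is the power supply voltage, but, as shown in the processing flows of the fourth embodiment, it may be something else (for example, the memory clock frequency).
FIG. 20 shows a first example of the circuit configuration of the processor chip c1 according to the fifth embodiment. In the processor chip c1 of FIG. 20, the power supply voltage of each block making up the level 2 cache memory, that is, of level 2 cache memories (1), (2), and (3), is controlled on the basis of that block's temperature information. To determine the temperature inside each block, a temperature detection circuit 34 is placed at the center of each of level 2 cache memories (1), (2), and (3). The temperature detection circuit 34 consists of, for example, a thermal diode, an element whose local temperature is determined by measuring the voltage across it while a current flows. The temperature information detected by the temperature detection circuit 34 is sent to the operation control circuit 6 located outside the cache memory. The operation control circuit 6 contains cache block power control circuits 36, each of which controls the power supply voltage to one block of the level 2 cache memory. When the temperature of a block reaches or exceeds the allowable operating temperature, the cache memory must be purged, so the processor core is notified that lowering (or cutting off) the power supply voltage is about to begin, and the purge and invalidation are carried out. After the processor core has completed these steps, the cache block power control circuit 36 lowers (or cuts off) the power supply voltage.
FIG. 21 shows a second example of the circuit configuration of the processor chip c1 according to the fifth embodiment. In the processor chip c1 of FIG. 21, the operation control circuit 6 includes cache block power control circuits 40, each of which controls the power supply voltage to one block of the level 2 cache memory, and each cache block power control circuit 40 has a timer circuit 38. In this configuration, the cache block power control circuit 40 starts lowering (or cutting off) the power supply voltage once a fixed time measured by the timer circuit 38 has elapsed. For example, when an embedded processor performs a given process periodically, the timer circuit 38 may measure a fixed time and have the cache block power control circuit 40 cut off power to a block of the level 2 cache memory. Where the period varies with the external temperature, the external temperature may be input to the timer circuit 38 so that the period of the timer circuit 38 changes accordingly.
FIG. 22 shows a third example of the circuit configuration of the processor chip c1 according to the fifth embodiment. The cache block power control circuit 44 included in the operation control circuit 6 of the processor chip c1 in FIG. 22 varies (or cuts off) the power supply voltage on the basis of the clock signal supplied to the processor cores. A clock gear switching circuit 46 in the peripheral circuit 4 switches the frequency of the clock signal supplied to the processor cores (processor core 0 and processor core 1). A clock monitor circuit 42 provided in each cache block power control circuit 44 monitors the frequency of that clock signal, and the cache block power control circuit 44 varies (or cuts off) the power supply voltage according to the monitored information.
FIG. 23 shows a fourth example of the circuit configuration of the processor chip c1 according to the fifth embodiment. The cache block power control circuit 50 included in the operation control circuit 6 of the processor chip c1 in FIG. 23 varies (or cuts off) the power supply voltage on the basis of the utilization of the arithmetic units in the processor core, as calculated by a utilization calculation circuit 48 that is also part of the operation control circuit 6. Heat generation inside a processor core is normally strongly affected by how often its arithmetic units are used. In some processors, for example, the integer arithmetic unit and the fractional arithmetic unit tend to generate heat easily. Exploiting this, the utilization of the fractional and integer arithmetic units is calculated from the instructions in the processor core. The utilization calculation circuit 48 of FIG. 23 calculates the utilization of the fractional/integer arithmetic circuit (arithmetic unit) from the instructions that the instruction decode unit 52 in the processor core issues to the fractional/integer arithmetic circuit 54. When the calculated utilization exceeds a given reference value, the cache block power control circuit 50 starts lowering (or cutting off) the power supply voltage of the cache memory. Because the operating frequency of the fractional/integer arithmetic circuit also affects heat generation, the utilization calculation circuit 48 may be configured to calculate utilization from the operating frequency of the processor core as well as from those instructions.
FIG. 24 shows a fifth example of the circuit configuration of the processor chip c1 according to the fifth embodiment. The cache block power control circuit 58 included in the operation control circuit 6 of the processor chip c1 in FIG. 24 varies (or cuts off) the power supply voltage on the basis of the cache miss rate calculated by a cache miss rate calculation circuit 56 that is also part of the operation control circuit 6. When many cache misses occur, the processor core normally stalls without performing computation. In other words, a processor core with many cache misses is often idle, doing little computation, and therefore generates little heat. Exploiting this, the power supply voltage of each block of the level 2 cache memory is changed according to the cache miss rate. In the processor chip c1 of FIG. 24, a cache miss causes an access to external memory from the BCU (Bus Control Unit) 60 inside the processor core, via the external DRAM control unit 62 of the peripheral circuit 4. The cache miss rate calculation circuit 56 calculates the cache miss rate from the external memory access signals passing through the BCU 60. When the calculated cache miss rate has remained below a given threshold for a predetermined time, the cache block power control circuit 58 starts lowering (or cutting off) the power supply voltage of the cache memory.
In the fifth embodiment, heat-dissipation control of the divided circuit blocks of the cache memory is started by a trigger. Possible triggers include the temperature information of the divided circuit blocks, a period, the frequency of the clock signal supplied to the processor cores, the utilization of the arithmetic units in the processor cores, and the cache miss rate.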
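The cache-miss trigger can be sketched in a few lines of Python. The sketch assumes the miss rate is sampled at regular intervals and that "a predetermined time" is expressed as a number of consecutive samples; `MissRateTrigger`, `threshold`, and `hold_samples` are hypothetical names, and treating an interval with no accesses as "not low" is a choice made only for this example.

```python
class MissRateTrigger:
    """Fires when the cache miss rate stays below a threshold for several samples."""

    def __init__(self, threshold, hold_samples):
        self.threshold = threshold        # miss rate below this suggests a busy, hot core
        self.hold_samples = hold_samples  # consecutive low samples required before firing
        self._consecutive_low = 0

    def update(self, accesses, misses):
        """Feed one sampling interval; return True if throttling should start."""
        if accesses > 0 and misses / accesses < self.threshold:
            self._consecutive_low += 1
        else:
            # A high miss rate (or no traffic at all) resets the run.
            self._consecutive_low = 0
        return self._consecutive_low >= self.hold_samples
```

A low miss rate means the core is computing rather than waiting on external memory, so it is used here as an indirect indicator of heat, without reading a temperature sensor.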
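6.1. Placement of the operation control circuit in the processor chip
The sixth embodiment concerns where to place the operation control circuit 6 that controls heat dissipation of the cache memory and the processor cores. If the operation control circuit 6 itself rises above its allowable operating temperature because of surrounding heat, heat dissipation of the cache memory and the processor cores can no longer be carried out properly. The placement of the operation control circuit 6 must therefore take surrounding heat generation into account.
For example, if the operation control circuit 6 is placed close to the processor cores (processor core 0, processor core 1) as shown in FIG. 25(a), the operation control circuit 6 itself may exceed its allowable operating temperature because of their heat. In a processor chip whose processor cores heat up easily, it is therefore preferable to place the operation control circuit 6 away from the processor cores. FIG. 25(b) shows a first example of the circuit layout of the processor chip c1 according to the sixth embodiment. The hatched portion indicates the region that easily reaches 85 degrees or more. As FIG. 25(b) shows, the operation control circuit 6 in the processor chip c1 is placed next to the level 2 cache memory and at the position farthest from the processor cores. Placed this way, the operation control circuit 6 is hardly affected by heat from other parts and does not itself exceed the allowable temperature range. In addition, because the cache memory lies between it and the processor cores, cooling actions such as cache power cutoff can begin before heat from the processor cores reaches the operation control circuit 6. The operation control circuit 6 therefore stays cool more easily.
FIG. 26 shows a second example of the circuit layout of the processor chip c1 according to the sixth embodiment. The operation control circuit 6 in FIG. 26 is placed on a separate chip c2. With the operation control circuit 6 on a separate chip c2, heat is not conducted to it through the silicon substrate or the metal wiring. That is, heat reaching the operation control circuit 6 is blocked more reliably than when the circuit is formed on the same chip as the processor cores and the cache memory.
FIG. 27(a) shows a third example of the circuit layout of the processor chip c1 according to the sixth embodiment. In this layout the operation control circuit 6 is placed at the periphery of the processor chip c1, which makes it easier for its heat to escape. In three-dimensional stacking in particular, cooled air or liquid may be supplied from the side faces of the processor chip c1 as well as from its top and bottom faces, so the periphery can be cooler than the center. If IO cells 64 are present along the periphery of the processor chip c1, however, the operation control circuit 6 is placed inside the IO cells 64, as shown in FIG. 27(b).
FIG. 28 shows a fourth example of the circuit layout of the processor chip c1 according to the sixth embodiment. The circuit in FIG. 28 is a three-dimensional stack of three processor chips c1, c2, and c3, with a heat sink 66 on the upper surface of the topmost processor chip c1. In this three-dimensional stacked circuit, the operation control circuit 6 is placed on the topmost processor chip c1, the one closest to the heat sink 66.
In the sixth embodiment, in a processor chip that has processor cores and a level 2 cache memory divided as appropriate, the operation control circuit that controls heat dissipation of the processor cores and the cache memory is placed where it stays cool. This prevents the operation control circuit from exceeding its allowable operating temperature.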
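7.1. Configuration of a conventional three-dimensional integrated circuit
The seventh embodiment concerns a three-dimensional integrated circuit formed by stacking a plurality of processor chips. A typical three-dimensional integrated circuit made of multiple processor chips is described first.
When multiple identical processor chips are stacked, heat generation becomes a problem. FIG. 31(a) is a side view of a three-dimensional integrated circuit in which two processor chips c1 and c2 are stacked, and FIG. 31(b) is a schematic view of stacking the two processor chips c1 and c2. Each processor chip shown in FIGS. 31(a) and 31(b) has two processor core regions and two level 2 cache memory regions.
The seventh embodiment solves the problem described above. The three-dimensional stacked circuit of this embodiment is a three-dimensional integrated circuit formed by stacking two or more processor chips, in which at least two of the processor chips have the same circuit block layout and those two chips are stacked with their placement changed between layers. Here, “the same circuit block layout” means that, in the masks of the processor chips, the transistor layers, as opposed to the wiring layers, are identical. In other words, the masks used in the FEOL (Front End of Line) process are the same.
FIG. 32(a) is a side view of a three-dimensional integrated circuit according to the seventh embodiment in which two processor chips c1 and c2 are stacked, and FIG. 32(b) is a schematic view of stacking the two processor chips c1 and c2 in that circuit. In the seventh embodiment, processor chip c1 and processor chip c2 are stacked with one rotated 180 degrees relative to the other. Stacking the chips this way places the high-heat processor core portions over the low-heat cache memory portions. High-heat portions then no longer overlap each other, and the hot spots that arise in the configuration of FIG. 31 from overlapping processor cores do not occur. Using the configuration of FIG. 32 can therefore be expected to reduce the cost of the cooling mechanism and improve circuit performance.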
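The effect of rotating one of two identical chips can be checked with a small Python sketch that counts how many high-heat cells end up on top of each other. The floorplan below (a 2x2 grid with two processor-core cells, `"P"`, and two cache cells, `"C"`) is a made-up example that only loosely mirrors the two-core, two-cache chip described above, and the function names are hypothetical.

```python
def rotate90(grid):
    """Rotate a square floorplan 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate(grid, degrees):
    """Rotate a square floorplan by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("degrees must be a multiple of 90")
    for _ in range((degrees // 90) % 4):
        grid = rotate90(grid)
    return [list(row) for row in grid]


def hot_overlap(lower, upper, hot="P"):
    """Count grid cells where a hot block sits directly above a hot block."""
    return sum(
        a == hot and b == hot
        for row_lower, row_upper in zip(lower, upper)
        for a, b in zip(row_lower, row_upper)
    )


def best_rotation(grid, angles=(0, 90, 180, 270)):
    """Pick the rotation of the upper chip that minimizes hot-on-hot overlap."""
    return min(angles, key=lambda angle: hot_overlap(grid, rotate(grid, angle)))


# Hypothetical floorplan: cores along one edge, cache along the other.
chip = [["P", "P"],
        ["C", "C"]]
```

For this floorplan, stacking without rotation puts both core cells on top of each other, while a 180-degree rotation places every core over cache, which is the arrangement FIG. 32 is described as achieving.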
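(1) Blocks that generate more heat than a predetermined threshold, and blocks that generate less heat than that threshold.
(2) The circuit block that generates the most heat, and all other circuit blocks.
(3) Blocks that generate more heat than the average over all circuit blocks on the chip, and blocks that generate less heat than that average.
FIG. 34 shows a second example of the three-dimensional stacked circuit according to the seventh embodiment, in which the region where low-heat portions overlap is subdivided further. That is, in processor chips c1 and c2 the level 2 cache memory is divided into a region that overlaps a processor core and a region where level 2 cache memories overlap each other. With this configuration, even if heat conducted from the processor cores of one processor chip pushes level 2 cache memory (2) of the other processor chip above its allowable operating temperature, relatively little heat is expected to have reached the remaining level 2 cache memory (1) of that other chip. The performance degradation from partially stopping the cache memory can thus be kept to a minimum.
FIG. 36 shows a third example of the three-dimensional stacked circuit according to the seventh embodiment. The region where low-heat portions overlap is, on average, cooler than the other parts. Exploiting this, the region where low-heat portions overlap may be implemented as high-speed cache memory, as in the three-dimensional stacked circuit of FIG. 36. High-speed cache memory operates fast but tends to draw more current and generate more heat. However, because the central region in which cache memories overlap each other stays relatively cool, placing high-speed cache memory there does not create much of a thermal problem. In particular, since the region where cache memories overlap is often close to the processor cores, the arrangement of FIG. 36 ends up making good use of the high-speed cache memory's performance.
FIG. 38 shows a fourth example of the three-dimensional stacked circuit according to the seventh embodiment. Control circuits for the processor as a whole are preferably not placed in a part that overlaps a high-heat part (for example, a processor core) when stacked. In the circuit of FIG. 38, the power control circuit 36a, which controls the power supply of the entire processor chip, is placed within the region where cache memories overlap each other. The power control circuit 36a of FIG. 38 controls the power supply of the entire processor chip according to the temperature detected by temperature sensors 34a provided in the processor cores on the same processor chip.
The three-dimensional stacked circuit according to the seventh embodiment is a three-dimensional integrated circuit formed by stacking two or more processor chips, in which at least two of the processor chips have the same circuit block layout and those two chips are stacked with their placement changed between layers. This makes hot spots easier to avoid in the three-dimensional integrated circuit.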
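8.1. Control of processor core operation
8.1.(1) First example
FIG. 39(a) shows the configuration of a first example of the three-dimensional integrated circuit of the eighth embodiment, in which three processor chips c1, c2, and c3 are stacked. FIG. 39(b) is a schematic view of stacking the three processor chips c1, c2, and c3 according to the eighth embodiment. The three-dimensional integrated circuit of the eighth embodiment forms a multiprocessor system. This system (three-dimensional integrated circuit) consists of three stacked processor chips with the same circuit layout. Because each chip carries two processor cores, the whole is a six-core multiprocessor system. Software sees such a three-dimensional multi-core processor as a processor chip with six processor cores on a single chip, that is, as a six-core multiprocessor.
FIG. 41(a) shows the configuration of a second example of the three-dimensional integrated circuit of the eighth embodiment, in which two processor chips c1 and c2 are stacked. FIG. 41(b) is a schematic view of the two processor chips c1 and c2 according to the eighth embodiment. In the three-dimensional integrated circuit of FIG. 39, processor chips with the same circuit layout are stacked, whereas in the three-dimensional integrated circuit of FIG. 41, processor chips c1 and c2 with different circuit layouts are stacked. The three-dimensional integrated circuit of FIG. 41(b) is a six-core three-dimensional integrated circuit formed by stacking processor chip c1, which has four processor cores, and processor chip c2, which has two.
(1) Control so that processor cores placed adjacent to each other on the left and right do not run programs at the same time.
(2) Control so that, while one processor core is running a program, the processor core farthest from it is the next to run a program.
(3) Control so that processor cores close to the heat sink run more programs.
(4) Control so that processor cores near the allocation control unit 77 are avoided and other processor cores run programs.
The eighth embodiment concerns a three-dimensional integrated circuit in which multiple processor chips are stacked, and controls the allocation of processes (programs) with the positional relationships of the individual processor cores inside the three-dimensional integrated circuit taken into account. For example, allocation is controlled so that processor cores in overlapping parts of processor chips adjacent between layers do not operate at the same time. With this kind of allocation, processor cores in adjacent, overlapping parts do not stack up or concentrate as heat sources, so hot spots can be suppressed.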
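Position-aware allocation can be sketched as a function that picks the next core from stored position data. The Python example below combines two of the policies described above: never run a core directly above or below a running core, and prefer cores nearer the heat sink. The `Core` record, the convention that layer 0 is the chip nearest the heat sink, and the six-core layout are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Core(NamedTuple):
    core_id: int
    layer: int  # 0 = chip nearest the heat sink (assumed convention)
    x: int
    y: int


def vertically_adjacent(a: Core, b: Core) -> bool:
    """True if the two cores overlap in plan view on neighbouring layers."""
    return (a.x, a.y) == (b.x, b.y) and abs(a.layer - b.layer) == 1


def pick_core(cores, running_ids) -> Optional[int]:
    """Choose an idle core that is not stacked on a running core.

    Among the eligible cores, the one closest to the heat sink wins;
    None is returned when every idle core overlaps a running one.
    """
    running = [c for c in cores if c.core_id in running_ids]
    candidates = [
        c for c in cores
        if c.core_id not in running_ids
        and not any(vertically_adjacent(c, r) for r in running)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c.layer, c.core_id)).core_id


# Hypothetical stack: three identical chips with two cores each, unrotated.
cores = [Core(core_id=i, layer=i // 2, x=i % 2, y=0) for i in range(6)]
```

Returning `None` rather than falling back to a stacked core makes the thermal constraint strict; a real allocator would have to decide whether to queue the program or relax the rule.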
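9.1. Process scheduling
9.1.(1) First example
FIG. 43 is a block diagram of a first example, process scheduler 78a, according to the ninth embodiment. FIG. 42 shows the relationship between a conventional process scheduler 78', on which it is based, and the processor chips c1 and c2 of a three-dimensional integrated circuit formed by stacking the two processor chips c1 and c2.
FIG. 44 is a block diagram of a second example, process scheduler 78b, according to the ninth embodiment. In the process scheduler 78b of FIG. 44, the process scheduling unit 80b is connected to a process queue invalidation/validation control unit 92, which has a processor core position storage unit 88 and a processor core temperature acquisition unit 90. The process queue invalidation/validation control unit 92 invalidates and validates the process queue units 84a, 84b, 84c, and 84d.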
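(1) Invalidate (stop) the process queue unit of the processor core stacked directly above or below the hottest of the currently operating processor cores.
(2) Invalidate (stop) the process queue units of the processor cores adjoining the hottest of the currently operating processor cores above, below, to the left, or to the right.
(3) Stop the hottest of the currently operating processor cores and, at the same time, validate the process queue unit of the processor core stacked directly above or below it.
(4) Stop the hottest of the currently operating processor cores and, at the same time, validate the process queue units of the processor cores adjoining it above, below, to the left, or to the right (including diagonal neighbours such as upper left and upper right).
(5) Invalidate (stop) the process queue unit of any processor core stacked directly above or below a currently operating processor core whose temperature is at or above a threshold.
(6) Invalidate (stop) the process queue units of the processor cores adjoining, above, below, to the left, or to the right, a currently operating processor core whose temperature is at or above a threshold.
(7) Stop any currently operating processor core whose temperature is at or above a threshold and, at the same time, validate the process queue unit of the processor core stacked directly above or below it.
(8) Stop any currently operating processor core whose temperature is at or above a threshold and, at the same time, validate the process queue units of the processor cores adjoining it above, below, to the left, or to the right (including diagonal neighbours such as upper left and upper right).
(9) Regardless of temperature, validate or invalidate the process queue units so that neighbouring processor cores on adjacent processor chips do not operate at the same time.
(10) Regardless of temperature, validate or invalidate the process queue units so that processor cores neighbouring each other above, below, to the left, or to the right across adjacent processor chips do not operate at the same time.
(11) For the processor chip in contact with the heat sink (the first chip), make all processor cores available, and apply procedures (1) to (10) above to the processor cores on the other processor chips.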
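Policy (5) in the list above can be sketched as a function from position and temperature data to the set of process queues to invalidate. In this Python example, positions are hypothetical `(layer, x, y)` tuples, the core identifiers and temperatures are invented, and the function name `queues_to_invalidate` is not from the specification.

```python
def queues_to_invalidate(positions, temps, threshold):
    """Return the cores whose process queues should be invalidated.

    positions: dict core_id -> (layer, x, y), as a position storage unit might hold
    temps:     dict core_id -> temperature, as a temperature acquisition unit might report
    A queue is invalidated when its core sits directly above or below a core
    whose temperature is at or above the threshold.
    """
    invalidated = set()
    for hot_id, temperature in temps.items():
        if temperature < threshold:
            continue
        hot_layer, hot_x, hot_y = positions[hot_id]
        for core_id, (layer, x, y) in positions.items():
            same_footprint = (x, y) == (hot_x, hot_y)
            if core_id != hot_id and same_footprint and abs(layer - hot_layer) == 1:
                invalidated.add(core_id)
    return invalidated


# Hypothetical stack of two chips with two cores each.
positions = {0: (0, 0, 0), 1: (0, 1, 0), 2: (1, 0, 0), 3: (1, 1, 0)}
```

The left/right and diagonal variants in the list differ only in the neighbourhood test, so they could be obtained by swapping the `same_footprint` condition for a wider adjacency check.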
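The ninth embodiment concerns a process scheduler for a three-dimensional integrated circuit in which multiple processor chips are stacked. It controls the input data to the process scheduling unit, which schedules processes into the process queue unit for each processor core. This avoids hot spots in local processor cores.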
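One way to read "controlling the input data to the process scheduling unit" is the load correction described in the aspects: the scheduler keeps choosing the least-loaded core, but the loads it sees are adjusted first. The Python sketch below assumes that setting a core's load to infinity is an acceptable way to make the scheduler stop choosing it; that encoding, the function names, and the sample data are assumptions for illustration.

```python
import math


def vertical_neighbors(core_id, positions):
    """Cores directly above or below core_id; positions map id -> (layer, x, y)."""
    layer, x, y = positions[core_id]
    return [
        other for other, (other_layer, ox, oy) in positions.items()
        if (ox, oy) == (x, y) and abs(other_layer - layer) == 1
    ]


def corrected_loads(loads, positions, temps, limit):
    """Copy the loads, inflating those of cores stacked on a core hotter than limit."""
    corrected = dict(loads)
    for core_id, temperature in temps.items():
        if temperature > limit:
            for neighbor in vertical_neighbors(core_id, positions):
                # An infinite load keeps a least-loaded scheduler away from this core.
                corrected[neighbor] = math.inf
    return corrected


def least_loaded(loads):
    """The scheduling rule assumed here: pick the core with the smallest load."""
    return min(loads, key=loads.get)


# Hypothetical two-chip stack with two cores per chip.
positions = {0: (0, 0, 0), 1: (0, 1, 0), 2: (1, 0, 0), 3: (1, 1, 0)}
loads = {0: 0.9, 1: 0.5, 2: 0.1, 3: 0.4}
temps = {0: 95.0, 1: 60.0, 2: 70.0, 3: 65.0}
```

Because only the scheduler's input changes, the scheduling rule itself stays untouched: with the raw loads core 2 would be chosen, but after correction the choice moves to core 3, away from the hot core 0.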
Wherever the text above refers to a level 2 cache memory, the memory may instead be a level 3 or level 4 cache memory; the description does not depend on the level of the cache hierarchy.
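(1) An integrated circuit device according to a first aspect of the present invention
includes a first circuit formed of a memory circuit, a second circuit formed of an arithmetic circuit, and a control circuit;
the first circuit is divided into a plurality of circuit blocks according to the distance of its placement from the second circuit; and
the control circuit controls each of the divided circuit blocks independently.
Without control by the control circuit, the first circuit would exceed its operable temperature range because of heat generated by the operation of the second circuit.
The memory circuit is a cache memory, and the arithmetic circuit is a processor core.
The control circuit independently controls the supply and cutoff of the power supply voltage for each of the divided circuit blocks.
The control circuit independently controls a change in the power supply voltage for each of the divided circuit blocks.
The control circuit independently controls clock gating for each of the divided circuit blocks.
The control circuit independently controls a change in the memory clock frequency for each of the divided circuit blocks.
The control circuit independently controls a change in the memory clock duty ratio for each of the divided circuit blocks.
The control circuit independently controls the chip select signal of the sub-arrays in each of the divided circuit blocks.
The control circuit controls each circuit block independently on the basis of the temperature inside that divided circuit block.
The control circuit controls each circuit block independently on the basis of the time measured by a timer provided for that divided circuit block.
The control circuit controls the circuit blocks independently on the basis of the frequency of the clock supplied to the second circuit.
The control circuit controls the circuit blocks independently on the basis of the utilization of the arithmetic circuit in the second circuit.
The control circuit controls the circuit blocks independently on the basis of the cache miss rate of the cache memory.
The control circuit is placed adjacent to the part of the first circuit on the side farthest from the second circuit.
The control circuit is placed on a separate chip.
The control circuit is placed at the periphery of the same chip, on the side farthest from the second circuit.
The device further includes a heat sink,
and the control circuit is placed in the chip layer closest to the heat sink.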
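A three-dimensional integrated circuit including a first chip and a second chip stacked directly on the first chip, wherein
the first chip includes a circuit block that generates a relatively large amount of heat and a circuit block that generates a relatively small amount of heat,
the second chip includes a circuit block that generates a relatively large amount of heat and a circuit block that generates a relatively small amount of heat, and
the first chip and the second chip are positioned relative to each other and then stacked so that the area over which the high-heat circuit block of the first chip and the high-heat circuit block of the second chip overlap between layers is minimized.
A three-dimensional integrated circuit formed by stacking two or more chips, wherein
at least two of the chips have the same circuit block layout, and
the at least two chips are placed differently between layers.
Of the at least two chips with the same circuit block layout, one chip is stacked rotated 90 degrees or 180 degrees relative to the other.
The at least two chips with the same circuit block layout are processor chips and form a multi-core system.
Of the at least two processor chips, one processor chip is stacked rotated 90 degrees or 180 degrees relative to the other.
A first region, in which level 2 cache memories are vertically adjacent to each other when stacked, is partitioned off in each of the at least two processor chips, and
the partitioned first region is controlled independently in each processor chip.
The partitioned first region is formed of a level 3 cache memory.
The partitioned first region is formed of a high-speed cache memory.
In each of the at least two processor chips, a second region that is vertically adjacent to a processor core when stacked is formed of a low-power cache memory.
The circuit further includes a control circuit,
and the control circuit is placed in the first region.
A three-dimensional integrated circuit including a first chip and a second chip stacked directly on the first chip, wherein
the first chip includes a circuit block that generates a relatively large amount of heat and a circuit block that generates a relatively small amount of heat,
the second chip includes a circuit block that generates a relatively large amount of heat and a circuit block that generates a relatively small amount of heat, and
the first chip and the second chip are positioned relative to each other and then stacked so that the high-heat circuit block of the first chip does not overlap the high-heat circuit block of the second chip in the vertical direction.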
積層する複数のプロセッサチップと、割り当て制御部とを含む三次元プロセッサ装置であって、
各プロセッサチップは、一つ若しくは複数のプロセッサコアを備え、
前記割り当て制御部は、前記三次元プロセッサ装置における各プロセッサコアの位置のデータを記憶しているプロセッサコア位置記憶部を含み、
前記割り当て制御部は、前記プロセッサコア位置記憶部に記憶される各プロセッサコアの位置のデータに基づき、各プロセッサコアに対するプログラムの割り当てを制御する。
前記割り当て制御部は、積層するプロセッサチップ間において、上下に隣接して配置されたプロセッサコアが同時にプログラムを動作しないように各プロセッサコアに対するプログラムの割り当てを制御する。
前記割り当て制御部は、積層するプロセッサチップ間において、左右に隣接して配置されたプロセッサコアが同時にプログラムを動作しないように各プロセッサコアに対するプログラムの割り当てを制御する。
前記割り当て制御部は、積層するプロセッサチップ間において、左右に隣接して配置されたプロセッサコアが同時にプログラムを動作しないように各プロセッサコアに対するプログラムの割り当てを制御する。
前記割り当て制御部は、一つのプロセッサコアがプログラムを動作しているときに、そのプロセッサコアから最も遠いプロセッサコアが続いてプログラムを動作するように、各プロセッサコアに対するプログラムの割り当てを制御する。
更に、ヒートシンクを含み、
前記割り当て制御部は、前記ヒートシンクの近傍のプロセッサコアが優先してプログラムを動作するように各プロセッサコアに対するプログラムの割り当てを制御する。
前記割り当て制御部は、前記プロセッサコアのうちの一つに含まれており、
割り当て制御部は、その割り当て制御部を含むプロセッサコアを回避して、他のプロセッサコアがプログラムを動作するように、各プロセッサコアに対するプログラムの割り当てを制御する。
The plurality of processor chips have the same circuit block layout.
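A process scheduler for a plurality of processor cores in a three-dimensional multi-core processor device formed by stacking a plurality of processor chips, the process scheduler including:
a load acquisition unit that acquires the load of each processor core;
a scheduling unit that schedules processes, based on the load of each processor core, to process queue units each corresponding to an individual processor core in the three-dimensional multi-core processor device;
a load correction unit that corrects the load of each processor core held by the load acquisition unit;
a position storage unit that stores the position of each processor core; and
a temperature acquisition unit that acquires the temperature of each processor core,
wherein the load correction unit corrects the load of each processor core at the time of load acquisition, using the position information for each processor core stored in the position storage unit and the temperature information for each processor core acquired by the temperature acquisition unit.
Using the position information for each processor core stored in the position storage unit and the temperature information for each processor core acquired by the temperature acquisition unit, the load correction unit corrects the load of any processor core placed vertically adjacent to a processor core whose temperature is higher than a predetermined value, so that the scheduling unit stops scheduling.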
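One way to picture the load correction is to overwrite the reported load of a core with an "infinitely busy" value whenever it sits directly above or below an overheated core, so that a least-loaded scheduler never picks it. This is a sketch under assumptions: the use of `math.inf` as the corrected value, the `(layer, x, y)` position encoding, and the names `corrected_loads` and `schedule` are all illustrative, and "stops scheduling" is read here as "stops scheduling to that core".

```python
import math
from typing import Dict, Optional, Tuple

Position = Tuple[int, int, int]  # (layer, x, y), assumed encoding


def vertically_adjacent(a: Position, b: Position) -> bool:
    """True when two cores share (x, y) and sit in neighbouring layers."""
    return a[1:] == b[1:] and abs(a[0] - b[0]) == 1


def corrected_loads(loads: Dict[str, float], temps: Dict[str, float],
                    positions: Dict[str, Position],
                    threshold: float) -> Dict[str, float]:
    """Mark cores stacked on an overheated core as unavailable to the scheduler."""
    hot = [c for c, t in temps.items() if t > threshold]
    result = dict(loads)
    for core, pos in positions.items():
        if any(vertically_adjacent(pos, positions[h]) for h in hot):
            result[core] = math.inf  # assumed sentinel meaning "do not schedule"
    return result


def schedule(loads: Dict[str, float]) -> Optional[str]:
    """Least-loaded scheduling; returns None when no core may accept work."""
    if not loads:
        return None
    core = min(sorted(loads), key=loads.get)
    return None if math.isinf(loads[core]) else core
```

The scheduler itself stays unchanged in this arrangement: it only ever sees load figures, and the thermal policy is applied entirely by adjusting those figures.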
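A process scheduler for a plurality of processor cores in a three-dimensional multi-core processor device formed by stacking a plurality of processor chips, the process scheduler including:
process queue units that queue processes for each processor core and have each processor core execute the processes in order;
a queue disable/enable control unit that controls the disabling and enabling of each of the process queue units;
a position storage unit that stores the position of each processor core; and
a temperature acquisition unit that acquires the temperature of each processor core,
wherein the queue disable/enable control unit controls the disabling and enabling of the process queue units using the position information for each processor core stored in the position storage unit and the temperature information for each processor core acquired by the temperature acquisition unit.
Using the position information for each processor core stored in the position storage unit and the temperature information for each processor core acquired by the temperature acquisition unit, the queue disable/enable control unit disables the process queue unit for any processor core placed vertically adjacent to a processor core whose temperature is higher than a predetermined value.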
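The queue-based variant can be sketched with a per-core queue that carries an enabled flag. The class `ProcessQueue`, the function `update_queues`, and the `(layer, x, y)` position encoding are hypothetical; what happens to processes already waiting in a disabled queue is not stated above, so this sketch simply leaves them in place.

```python
from collections import deque
from typing import Dict, Tuple

Position = Tuple[int, int, int]  # (layer, x, y), assumed encoding


def vertically_adjacent(a: Position, b: Position) -> bool:
    """True when two cores share (x, y) and sit in neighbouring layers."""
    return a[1:] == b[1:] and abs(a[0] - b[0]) == 1


class ProcessQueue:
    """Per-core run queue that can be switched off by the thermal policy."""

    def __init__(self) -> None:
        self.items: deque = deque()
        self.enabled = True

    def push(self, process: str) -> bool:
        """Queue a process; refuse it while the queue is disabled."""
        if not self.enabled:
            return False
        self.items.append(process)
        return True


def update_queues(queues: Dict[str, ProcessQueue], temps: Dict[str, float],
                  positions: Dict[str, Position], threshold: float) -> None:
    """Disable queues of cores stacked on an overheated core; re-enable the rest."""
    hot = [c for c, t in temps.items() if t > threshold]
    for core, queue in queues.items():
        queue.enabled = not any(
            vertically_adjacent(positions[core], positions[h]) for h in hot
        )
```

Calling `update_queues` periodically with fresh temperature readings re-enables a queue once the neighbouring core has cooled below the threshold.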
Claims (20)
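- An integrated circuit device including a first circuit formed of a memory circuit, a second circuit formed of an arithmetic circuit, and a control circuit, wherein
the first circuit is divided into a plurality of circuit blocks according to the distance of their placement positions from the second circuit, and
the control circuit controls each of the divided circuit blocks independently.
- The integrated circuit device according to claim 1, wherein, without the control by the control circuit, the first circuit would exceed its operable temperature range under the influence of heat generated by operation of the second circuit.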
- The integrated circuit device according to claim 2, wherein the memory circuit is a cache memory and the arithmetic circuit is a processor core.
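- The integrated circuit device according to claim 3, wherein the control circuit independently controls the supply and cutoff of the power supply voltage to each of the divided circuit blocks.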
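- The integrated circuit device according to claim 3, wherein the control circuit independently controls changes to the power supply voltage of each of the divided circuit blocks.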
- The integrated circuit device according to claim 3, wherein the control circuit is placed on a separate chip.
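- The integrated circuit device according to claim 3, further including a heat sink, wherein
the control circuit is placed in the chip layer closest to the heat sink.
- A three-dimensional integrated circuit including a first chip and a second chip stacked directly on the first chip, wherein
the first chip includes a circuit block that generates a relatively large amount of heat and a circuit block that generates a relatively small amount of heat,
the second chip includes a circuit block that generates a relatively large amount of heat and a circuit block that generates a relatively small amount of heat, and
the first chip and the second chip are positioned relative to each other and then stacked so that the area over which the relatively high-heat circuit block of the first chip and the relatively high-heat circuit block of the second chip overlap between layers is minimized.
- A three-dimensional integrated circuit formed by stacking two or more chips, wherein
at least two of the chips have the same circuit block layout, and
the at least two chips are arranged so that their placement differs between layers.
- The three-dimensional integrated circuit according to claim 9, wherein, of the at least two chips having the same circuit block layout, one chip is stacked rotated by 90 degrees or 180 degrees relative to the other chip.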
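- The three-dimensional integrated circuit according to claim 9, wherein the at least two chips having the same circuit block layout are processor chips and constitute a multi-core system.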
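- The three-dimensional integrated circuit according to claim 11, wherein, of the at least two processor chips, one processor chip is stacked rotated by 90 degrees or 180 degrees relative to the other processor chip.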
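- The three-dimensional integrated circuit according to claim 12, wherein a first region, in which level-2 cache memories are vertically adjacent to each other when the chips are stacked, is divided in each of the at least two processor chips, and
the divided first region is controlled independently in each processor chip.
- A three-dimensional processor device including a plurality of stacked processor chips and an allocation control unit, wherein
each processor chip includes one or more processor cores,
the allocation control unit includes a processor core position storage unit that stores position data for each processor core in the three-dimensional processor device, and
the allocation control unit controls the allocation of programs to the processor cores based on the position data for each processor core stored in the processor core position storage unit.
- The three-dimensional processor device according to claim 14, wherein the allocation control unit controls the allocation of programs to the processor cores so that, among the stacked processor chips, processor cores placed vertically adjacent to each other do not execute programs at the same time.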
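- The three-dimensional processor device according to claim 14, wherein the allocation control unit controls the allocation of programs to the processor cores so that, among the stacked processor chips, processor cores placed laterally adjacent to each other do not execute programs at the same time.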
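- A process scheduler for a plurality of processor cores in a three-dimensional multi-core processor device formed by stacking a plurality of processor chips, the process scheduler including:
a load acquisition unit that acquires the load of each processor core;
a scheduling unit that schedules processes, based on the load of each processor core, to process queue units each corresponding to an individual processor core in the three-dimensional multi-core processor device;
a load correction unit that corrects the load of each processor core held by the load acquisition unit;
a position storage unit that stores the position of each processor core; and
a temperature acquisition unit that acquires the temperature of each processor core,
wherein the load correction unit corrects the load of each processor core at the time of load acquisition, using the position information for each processor core stored in the position storage unit and the temperature information for each processor core acquired by the temperature acquisition unit.
- The process scheduler according to claim 17, wherein, using the position information for each processor core stored in the position storage unit and the temperature information for each processor core acquired by the temperature acquisition unit, the load correction unit corrects the load of any processor core placed vertically adjacent to a processor core whose temperature is higher than a predetermined value, so that the scheduling unit stops scheduling.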
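- A process scheduler for a plurality of processor cores in a three-dimensional multi-core processor device formed by stacking a plurality of processor chips, the process scheduler including:
process queue units that queue processes for each processor core and have each processor core execute the processes in order;
a queue disable/enable control unit that controls the disabling and enabling of each of the process queue units;
a position storage unit that stores the position of each processor core; and
a temperature acquisition unit that acquires the temperature of each processor core,
wherein the queue disable/enable control unit controls the disabling and enabling of the process queue units using the position information for each processor core stored in the position storage unit and the temperature information for each processor core acquired by the temperature acquisition unit.
- The process scheduler according to claim 19, wherein, using the position information for each processor core stored in the position storage unit and the temperature information for each processor core acquired by the temperature acquisition unit, the queue disable/enable control unit disables the process queue unit for any processor core placed vertically adjacent to a processor core whose temperature is higher than a predetermined value.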
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/996,160 US9122286B2 (en) | 2011-12-01 | 2012-10-22 | Integrated circuit apparatus, three-dimensional integrated circuit, three-dimensional processor device, and process scheduler, with configuration taking account of heat |
US14/800,979 US20150370754A1 (en) | 2011-12-01 | 2015-07-16 | Integrated circuit apparatus, three-dimensional integrated circuit, three-dimensional processor device, and process scheduler, with configuration taking account of heat |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011-263919 | 2011-12-01 | ||
JP2011-263921 | 2011-12-01 | ||
JP2011263921 | 2011-12-01 | ||
JP2011263919 | 2011-12-01 | ||
JP2011-263913 | 2011-12-01 | ||
JP2011263913 | 2011-12-01 |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/996,160 A-371-Of-International US9122286B2 (en) | 2011-12-01 | 2012-10-22 | Integrated circuit apparatus, three-dimensional integrated circuit, three-dimensional processor device, and process scheduler, with configuration taking account of heat |
US14/800,979 Division US20150370754A1 (en) | 2011-12-01 | 2015-07-16 | Integrated circuit apparatus, three-dimensional integrated circuit, three-dimensional processor device, and process scheduler, with configuration taking account of heat |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013080426A1 true WO2013080426A1 (ja) | 2013-06-06 |
Family
ID=48534933
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
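PCT/JP2012/006744 WO2013080426A1 (ja) | 2011-12-01 | 2012-10-22 | Integrated circuit apparatus, three-dimensional integrated circuit, three-dimensional processor device, and process scheduler, with configuration taking account of heat |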
Country Status (3)
Country | Link |
---|---|
US (2) | US9122286B2 (ja) |
JP (1) | JPWO2013080426A1 (ja) |
WO (1) | WO2013080426A1 (ja) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2015019030A (ja) * | 2013-07-12 | 2015-01-29 | キヤノン株式会社 | 半導体装置 |
CN104865895A (zh) * | 2014-02-24 | 2015-08-26 | 发那科株式会社 | 具备cpu的异常检测功能的控制装置 |
JP2017028085A (ja) * | 2015-07-22 | 2017-02-02 | 富士通株式会社 | 半導体装置および半導体装置の制御方法 |
JP2017532686A (ja) * | 2014-10-16 | 2017-11-02 | ホアウェイ・テクノロジーズ・カンパニー・リミテッド | 新規な低コスト、低電力高性能smp/asmpマルチプロセッサシステム |
US10354715B2 (en) | 2016-12-14 | 2019-07-16 | Fujitsu Limited | Semiconductor device and control method for semiconductor device |
US10948969B2 (en) | 2014-10-16 | 2021-03-16 | Futurewei Technologies, Inc. | Fast SMP/ASMP mode-switching hardware apparatus for a low-cost low-power high performance multiple processor system |
WO2023199182A1 (ja) * | 2022-04-15 | 2023-10-19 | 株式会社半導体エネルギー研究所 | 半導体装置 |
WO2023203435A1 (ja) * | 2022-04-22 | 2023-10-26 | 株式会社半導体エネルギー研究所 | 半導体装置 |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI508099B (zh) * | 2013-01-28 | 2015-11-11 | Phison Electronics Corp | 工作時脈切換方法、記憶體控制器與記憶體儲存裝置 |
CN106164810B (zh) * | 2014-04-04 | 2019-09-03 | 英派尔科技开发有限公司 | 使用基于电压的功能的性能变化的指令优化 |
US10371583B1 (en) * | 2014-11-11 | 2019-08-06 | Ansys, Inc. | Systems and methods for estimating temperatures of wires in an integrated circuit chip |
US10074417B2 (en) * | 2014-11-20 | 2018-09-11 | Rambus Inc. | Memory systems and methods for improved power management |
US9778868B1 (en) * | 2016-06-01 | 2017-10-03 | Ge Aviation Systems Llc | Data recorder for permanently storing pre-event data |
US10672745B2 (en) * | 2016-10-07 | 2020-06-02 | Xcelsis Corporation | 3D processor |
US10176147B2 (en) * | 2017-03-07 | 2019-01-08 | Qualcomm Incorporated | Multi-processor core three-dimensional (3D) integrated circuits (ICs) (3DICs), and related methods |
US10248558B2 (en) * | 2017-08-29 | 2019-04-02 | Qualcomm Incorporated | Memory leakage power savings |
US10755201B2 (en) | 2018-02-14 | 2020-08-25 | Lucid Circuit, Inc. | Systems and methods for data collection and analysis at the edge |
US10573630B2 (en) * | 2018-04-20 | 2020-02-25 | Advanced Micro Devices, Inc. | Offset-aligned three-dimensional integrated circuit |
US10901493B2 (en) * | 2018-06-11 | 2021-01-26 | Lucid Circuit, Inc. | Systems and methods for autonomous hardware compute resiliency |
KR102641520B1 (ko) * | 2018-11-09 | 2024-02-28 | 삼성전자주식회사 | 멀티-코어 프로세서를 포함하는 시스템 온 칩 및 그것의 태스크 스케줄링 방법 |
CN111339026A (zh) * | 2020-02-24 | 2020-06-26 | 电子科技大学 | 一种三维微处理芯片的实时性能优化技术 |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH1083347A (ja) * | 1996-09-06 | 1998-03-31 | Fujitsu Ltd | キャッシュメモリ装置 |
JP2001291837A (ja) * | 2000-02-18 | 2001-10-19 | Hewlett Packard Co <Hp> | メモリ・アーキテクチャの実装方法 |
JP2002510085A (ja) * | 1998-03-31 | 2002-04-02 | インテル・コーポレーション | テンポラリ命令及び非テンポラリ命令用の共用キャッシュ構造 |
JP2004240669A (ja) * | 2003-02-05 | 2004-08-26 | Sharp Corp | ジョブスケジューラおよびマルチプロセッサシステム |
JP2005167159A (ja) * | 2003-12-05 | 2005-06-23 | Toshiba Corp | 積層型半導体装置 |
JP2005328026A (ja) * | 2004-04-16 | 2005-11-24 | Seiko Epson Corp | 薄膜デバイス、集積回路、電気光学装置、電子機器 |
WO2006117950A1 (ja) * | 2005-04-27 | 2006-11-09 | Matsushita Electric Industrial Co., Ltd. | 情報処理装置における電力制御装置 |
JP2007317213A (ja) * | 2000-10-25 | 2007-12-06 | Agere Systems Guardian Corp | キャッシュメモリにおける漏洩電力の低減方法及び装置 |
JP2009134716A (ja) * | 2007-11-28 | 2009-06-18 | Internatl Business Mach Corp <Ibm> | マルチプロセッサ・データ処理システムにおいて共有キャッシュ・ラインを与える方法、コンピュータ読み取り可能な記録媒体及びマルチプロセッサ・データ処理システム |
WO2010035426A1 (ja) * | 2008-09-25 | 2010-04-01 | パナソニック株式会社 | バッファメモリ装置、メモリシステム及びデータ転送方法 |
JP2011216806A (ja) * | 2010-04-02 | 2011-10-27 | Denso Corp | 電子回路装置 |
JP2011233842A (ja) * | 2010-04-30 | 2011-11-17 | Toshiba Corp | 不揮発性半導体記憶装置 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030145241A1 (en) | 2002-01-30 | 2003-07-31 | Zhigang Hu | Method and apparatus for reducing leakage power in a cache memory using adaptive time-based decay |
DE102005056907B3 (de) * | 2005-11-29 | 2007-08-16 | Infineon Technologies Ag | 3-dimensionales Mehrchip-Modul |
US20080091974A1 (en) * | 2006-10-11 | 2008-04-17 | Denso Corporation | Device for controlling a multi-core CPU for mobile body, and operating system for the same |
JP4940064B2 (ja) | 2007-08-28 | 2012-05-30 | ルネサスエレクトロニクス株式会社 | 半導体装置 |
US8335434B2 (en) * | 2007-10-23 | 2012-12-18 | Hewlett-Packard Development Company, L.P. | All optical fast distributed arbitration in a computer system device |
KR101642909B1 (ko) * | 2010-05-19 | 2016-08-11 | 삼성전자주식회사 | 불휘발성 메모리 장치, 그것의 프로그램 방법, 그리고 그것을 포함하는 메모리 시스템 |
2012
- 2012-10-22 WO PCT/JP2012/006744 patent/WO2013080426A1/ja active Application Filing
- 2012-10-22 US US13/996,160 patent/US9122286B2/en not_active Expired - Fee Related
- 2012-10-22 JP JP2013526232A patent/JPWO2013080426A1/ja active Pending
2015
- 2015-07-16 US US14/800,979 patent/US20150370754A1/en not_active Abandoned
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH1083347A (ja) * | 1996-09-06 | 1998-03-31 | Fujitsu Ltd | キャッシュメモリ装置 |
JP2002510085A (ja) * | 1998-03-31 | 2002-04-02 | インテル・コーポレーション | テンポラリ命令及び非テンポラリ命令用の共用キャッシュ構造 |
JP2001291837A (ja) * | 2000-02-18 | 2001-10-19 | Hewlett Packard Co <Hp> | メモリ・アーキテクチャの実装方法 |
JP2007317213A (ja) * | 2000-10-25 | 2007-12-06 | Agere Systems Guardian Corp | キャッシュメモリにおける漏洩電力の低減方法及び装置 |
JP2004240669A (ja) * | 2003-02-05 | 2004-08-26 | Sharp Corp | ジョブスケジューラおよびマルチプロセッサシステム |
JP2005167159A (ja) * | 2003-12-05 | 2005-06-23 | Toshiba Corp | 積層型半導体装置 |
JP2005328026A (ja) * | 2004-04-16 | 2005-11-24 | Seiko Epson Corp | 薄膜デバイス、集積回路、電気光学装置、電子機器 |
WO2006117950A1 (ja) * | 2005-04-27 | 2006-11-09 | Matsushita Electric Industrial Co., Ltd. | 情報処理装置における電力制御装置 |
JP2009134716A (ja) * | 2007-11-28 | 2009-06-18 | Internatl Business Mach Corp <Ibm> | マルチプロセッサ・データ処理システムにおいて共有キャッシュ・ラインを与える方法、コンピュータ読み取り可能な記録媒体及びマルチプロセッサ・データ処理システム |
WO2010035426A1 (ja) * | 2008-09-25 | 2010-04-01 | パナソニック株式会社 | バッファメモリ装置、メモリシステム及びデータ転送方法 |
JP2011216806A (ja) * | 2010-04-02 | 2011-10-27 | Denso Corp | 電子回路装置 |
JP2011233842A (ja) * | 2010-04-30 | 2011-11-17 | Toshiba Corp | 不揮発性半導体記憶装置 |
Non-Patent Citations (1)
Title |
---|
KRISZTIAN FLAUTNER ET AL.: "Drowsy Caches: Simple Techniques for Reducing Leakage Power", COMPUTER ARCHITECTURE, 2002. PROCEEDINGS. 29TH ANNUAL INTERNATIONAL SYMPOSIUM ON, 2002, pages 148 - 157, XP001110054 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2015019030A (ja) * | 2013-07-12 | 2015-01-29 | キヤノン株式会社 | 半導体装置 |
CN104865895A (zh) * | 2014-02-24 | 2015-08-26 | 发那科株式会社 | 具备cpu的异常检测功能的控制装置 |
CN104865895B (zh) * | 2014-02-24 | 2017-08-29 | 发那科株式会社 | 具备cpu的异常检测功能的控制装置 |
US10126715B2 (en) | 2014-02-24 | 2018-11-13 | Fanuc Corporation | Controller having CPU abnormality detection function |
JP2017532686A (ja) * | 2014-10-16 | 2017-11-02 | ホアウェイ・テクノロジーズ・カンパニー・リミテッド | 新規な低コスト、低電力高性能smp/asmpマルチプロセッサシステム |
US10928882B2 (en) | 2014-10-16 | 2021-02-23 | Futurewei Technologies, Inc. | Low cost, low power high performance SMP/ASMP multiple-processor system |
US10948969B2 (en) | 2014-10-16 | 2021-03-16 | Futurewei Technologies, Inc. | Fast SMP/ASMP mode-switching hardware apparatus for a low-cost low-power high performance multiple processor system |
JP2017028085A (ja) * | 2015-07-22 | 2017-02-02 | 富士通株式会社 | 半導体装置および半導体装置の制御方法 |
US10354715B2 (en) | 2016-12-14 | 2019-07-16 | Fujitsu Limited | Semiconductor device and control method for semiconductor device |
WO2023199182A1 (ja) * | 2022-04-15 | 2023-10-19 | 株式会社半導体エネルギー研究所 | 半導体装置 |
WO2023203435A1 (ja) * | 2022-04-22 | 2023-10-26 | 株式会社半導体エネルギー研究所 | 半導体装置 |
Also Published As
Publication number | Publication date |
---|---|
JPWO2013080426A1 (ja) | 2015-04-27 |
US20150370754A1 (en) | 2015-12-24 |
US9122286B2 (en) | 2015-09-01 |
US20140059325A1 (en) | 2014-02-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
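WO2013080426A1 (ja) | Integrated circuit apparatus, three-dimensional integrated circuit, three-dimensional processor device, and process scheduler, with configuration taking account of heat | |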
US9311245B2 (en) | Dynamic cache sharing based on power state | |
US7076609B2 (en) | Cache sharing for a chip multiprocessor or multiprocessing system | |
JP3983250B2 (ja) | 演算処理方法および演算処理装置 | |
US8209989B2 (en) | Microarchitecture control for thermoelectric cooling | |
US8566539B2 (en) | Managing thermal condition of a memory | |
US11294808B2 (en) | Adaptive cache | |
US20060171244A1 (en) | Chip layout for multiple cpu core microprocessor | |
US9355035B2 (en) | Dynamic write priority based on virtual write queue high water mark for set associative cache using cache cleaner when modified sets exceed threshold | |
Wu et al. | Design exploration of hybrid caches with disparate memory technologies | |
JP2009157775A (ja) | プロセッサ | |
JP2023543778A (ja) | ディスアグリゲーテッド・コンピューター・システム | |
Park et al. | Power-aware memory management for hybrid main memory | |
EP4449245A1 (en) | Method to reduce register access latency in split-die soc designs | |
Lee et al. | Runtime thermal management for 3-D chip-multiprocessors with hybrid SRAM/MRAM L2 cache | |
JP6060770B2 (ja) | 情報処理装置、情報処理装置の制御方法及び情報処理装置の制御プログラム | |
US11989135B2 (en) | Programmable address range engine for larger region sizes | |
Kumar et al. | Fighting dark silicon: Toward realizing efficient thermal-aware 3-D stacked multiprocessors | |
US9195630B2 (en) | Three-dimensional computer processor systems having multiple local power and cooling layers and a global interconnection structure | |
Zhou et al. | Temperature-aware dram cache management—relaxing thermal constraints in 3-d systems | |
Sun et al. | Performance/thermal-aware design of 3D-stacked L2 caches for CMPs | |
Furat et al. | Reconfigurable hybrid cache hierarchy in 3D chip-multi processors based on a convex optimization method | |
Niknam et al. | Energy efficient 3D Hybrid processor-memory architecture for the dark silicon age | |
US20230041508A1 (en) | Selective allocation of memory storage elements for operation according to a selected one of multiple cache functions | |
Ofori-Attah et al. | A survey of system level power management schemes in the dark-silicon era for many-core architectures |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| ENP | Entry into the national phase | Ref document number: 2013526232; Country of ref document: JP; Kind code of ref document: A |
| WWE | Wipo information: entry into national phase | Ref document number: 13996160; Country of ref document: US |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 12853833; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 12853833; Country of ref document: EP; Kind code of ref document: A1 |