CN104216861A - Microprocessor and method of synchronously processing core in same - Google Patents

Microprocessor and method of synchronously processing core in same


Publication number
CN104216861A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410431514.2A
Other languages
Chinese (zh)
Other versions
CN104216861B (en
Inventor
G. Glenn Henry
Terry Parks
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority claimed from US14/281,488 (US9513687B2)
Application filed by Via Technologies Inc
Publication of CN104216861A
Application granted
Publication of CN104216861B
Legal status: Active


Abstract

The invention provides a microprocessor and a method of synchronizing processing cores in the microprocessor. The microprocessor comprises a plurality of semiconductor dies, a bus coupled to the plurality of semiconductor dies, and a plurality of processing cores, wherein a different subset of the plurality of processing cores is located on each of the semiconductor dies. Each die comprises a control unit configured to selectively control a respective clock signal to each core of the die's subset. For each processing core of the die's subset, in response to the core writing a value to the control unit, the control unit turns off the respective clock signal to the core and writes the value over the bus to the control units of the other dies. After the clock signals to all of the processing cores have been turned off, all of the control units collectively turn on the clock signals to all of the processing cores simultaneously. The microprocessor consumes less power.

Description

Microprocessor and method of synchronizing processing cores in the microprocessor
Technical field
The present invention relates to a microprocessor, and in particular to a core synchronization mechanism in a multi-die, multi-core microprocessor.
Background
Multi-core microprocessors have proliferated, primarily because of the performance advantages they offer. This has been made possible mainly by the rapid reduction in semiconductor device geometry dimensions, which has increased transistor density. The presence of multiple cores in a microprocessor has created the need for one core to communicate with the other cores in order to accomplish various functions, such as power management, cache management, debugging, and configuration that involves more than one core.
Traditionally, architectural programs (for example, operating systems or application programs) running on multi-core processors have communicated using semaphores located in a system memory that is architecturally addressable by all the cores. This may suffice for many purposes, but it may not provide the speed, precision, and/or system-level transparency needed for others.
Summary of the invention
The invention provides a microprocessor. The microprocessor comprises a plurality of semiconductor dies, a bus coupled to the plurality of semiconductor dies, and a plurality of processing cores, wherein a different subset of the plurality of processing cores is located on each of the semiconductor dies. Each die comprises a control unit configured to selectively control a respective clock signal to each core of the die's subset. For each processing core of the die's subset, in response to the processing core writing a value to the control unit, the control unit is configured to turn off the respective clock signal to the processing core and to write the value over the bus to the control units of the other dies. After the clock signals to all of the processing cores have been turned off, all of the control units are collectively configured to turn on the clock signals to all of the processing cores simultaneously.
The invention provides a method for synchronizing processing cores in a microprocessor, wherein the microprocessor has a plurality of semiconductor dies, a bus coupled to the plurality of semiconductor dies, and a plurality of processing cores, wherein a different subset of the plurality of processing cores is located on each of the semiconductor dies, and wherein each die comprises a control unit configured to selectively control a respective clock signal to each core of the die's subset. The method comprises performing the following operations for each processing core of the die's subset: writing, by the processing core, a value to the control unit; turning off, by the control unit, the respective clock signal to the processing core; and writing, by the control unit, the value over the bus to the control units of the other dies. The method further comprises, after the clock signals to all of the processing cores have been turned off, turning on, collectively by all of the control units, the clock signals to all of the processing cores simultaneously.
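The method above can be sketched as a small software model. This is only an illustration under stated assumptions, not the patented hardware: the class `DieControlUnit`, the helpers `connect` and `tick`, and the dictionaries used as registers are all hypothetical names, and the bus is modeled as a direct write into each peer's table.

```python
class DieControlUnit:
    """Toy model of the control unit on one die (all names are illustrative)."""

    def __init__(self, die_id, local_cores):
        self.die_id = die_id
        self.local_cores = list(local_cores)
        # Per-core clock gate: True means the clock is running.
        self.clock_on = {core: True for core in self.local_cores}
        self.sync_regs = {}    # values written by cores on this die
        self.shadow_regs = {}  # values forwarded over the bus from other dies
        self.peers = []        # control units on the other dies

    def core_write(self, core, value):
        """A local core writes a value: stop its clock, then forward the value."""
        if core not in self.clock_on:
            raise ValueError(f"core {core} is not on die {self.die_id}")
        self.sync_regs[core] = value
        self.clock_on[core] = False
        for peer in self.peers:
            peer.shadow_regs[core] = value

    def sees_all_stopped(self, total_cores):
        """True once this unit has a value from every core in the processor."""
        return len(self.sync_regs) + len(self.shadow_regs) == total_cores

    def release(self):
        """Turn the clocks of all local cores back on and forget the requests."""
        for core in self.local_cores:
            self.clock_on[core] = True
        self.sync_regs.clear()
        self.shadow_regs.clear()


def connect(units):
    """Model the inter-die bus: every unit can reach every other unit."""
    for unit in units:
        unit.peers = [other for other in units if other is not unit]


def tick(units, total_cores):
    """One simulated step: all units release together, or none does."""
    if all(unit.sees_all_stopped(total_cores) for unit in units):
        for unit in units:
            unit.release()
        return True
    return False
```

With two dies of two cores each, `tick` keeps returning `False` until the fourth core has written its value, at which point every clock is turned back on in the same step.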
The invention provides a computer program product encoded in at least one non-transitory computer usable medium for use with a computing device, the computer program product comprising computer usable program code that specifies a microprocessor. The computer usable program code comprises first program code that specifies a plurality of semiconductor dies. The computer usable program code also comprises second program code that specifies a bus coupled to the plurality of semiconductor dies. The computer usable program code also comprises third program code that specifies a plurality of processing cores, wherein a different subset of the plurality of processing cores is located on each of the semiconductor dies. The computer usable program code also comprises fourth program code that specifies, for each of the semiconductor dies, a control unit configured to selectively control a respective clock signal to each core of the die's subset, wherein for each processing core of the die's subset, in response to the processing core writing a value to the control unit, the control unit is configured to turn off the respective clock signal to the processing core and to write the value over the bus to the control units of the other dies. After the clock signals to all of the processing cores have been turned off, all of the control units are collectively configured to turn on the clock signals to all of the processing cores simultaneously.
The present invention provides reduced power consumption.
Brief description of the drawings
Fig. 1 is a block diagram illustrating a multi-core microprocessor.
Fig. 2 is a block diagram illustrating a control word, a status word, and a configuration word.
Fig. 3 is a flowchart illustrating operation of a control unit.
Fig. 4 is a block diagram illustrating a microprocessor according to another embodiment.
Fig. 5 is a flowchart illustrating operation of a microprocessor to dump debug information.
Fig. 6 is a timing diagram illustrating an example of operation of the microprocessor according to the flowchart of Fig. 5.
Figs. 7A-7B are a flowchart illustrating operation of a microprocessor to perform a cross-core cache control operation.
Fig. 8 is a timing diagram illustrating an example of operation of the microprocessor according to the flowchart of Figs. 7A-7B.
Fig. 9 is a flowchart illustrating operation of a microprocessor to enter a low-power package C-state.
Fig. 10 is a timing diagram illustrating an example of operation of the microprocessor according to the flowchart of Fig. 9.
Fig. 11 is a flowchart illustrating operation of a microprocessor to enter a low-power package C-state according to another embodiment of the present invention.
Fig. 12 is a timing diagram illustrating an example of operation of the microprocessor according to the flowchart of Fig. 11.
Fig. 13 is a timing diagram illustrating another example of operation of the microprocessor according to the flowchart of Fig. 11.
Fig. 14 is a flowchart illustrating dynamic reconfiguration of a microprocessor.
Fig. 15 is a flowchart illustrating dynamic reconfiguration of a microprocessor according to another embodiment.
Fig. 16 is a timing diagram illustrating an example of operation of the microprocessor according to the flowchart of Fig. 15.
Fig. 17 is a block diagram illustrating the hardware semaphore 118 of Fig. 1.
Fig. 18 is a flowchart illustrating operation when a core 102 reads the hardware semaphore 118.
Fig. 19 is a flowchart illustrating operation when a core writes the hardware semaphore.
Fig. 20 is a flowchart illustrating operation of the microprocessor when it uses the hardware semaphore to perform an action that requires exclusive ownership of a resource.
Fig. 21 is a timing diagram illustrating an example of cores issuing non-sleeping sync requests according to the flowchart of Fig. 3.
Fig. 22 is a flowchart illustrating a process for configuring a microprocessor.
Fig. 23 is a flowchart illustrating a process for configuring a microprocessor according to another embodiment.
Fig. 24 is a block diagram illustrating a multi-core microprocessor according to another embodiment.
Fig. 25 is a block diagram illustrating the structure of a microcode patch.
Figs. 26A-26B are a flowchart illustrating operation of the microprocessor of Fig. 24 to propagate the microcode patch of Fig. 25 to multiple cores of the microprocessor.
Fig. 27 is a timing diagram illustrating an example of operation of a microprocessor according to the flowchart of Figs. 26A-26B.
Fig. 28 is a block diagram illustrating a multi-core microprocessor according to another embodiment.
Figs. 29A-29B are a flowchart illustrating operation of the microprocessor of Fig. 28 to propagate a microcode patch to multiple cores of the microprocessor according to another embodiment.
Fig. 30 is a flowchart illustrating operation of the microprocessor of Fig. 24 to patch service processor code.
Fig. 31 is a block diagram illustrating a multi-core microprocessor according to another embodiment.
Fig. 32 is a flowchart illustrating operation of the microprocessor of Fig. 31 to propagate an MTRR update to multiple cores of the microprocessor.
The reference numerals in the drawings are briefly described as follows:
100: multi-core microprocessor; 102A, 102B, 102N: core A, core B, core N; 103: uncore; 104: control unit; 106: status register; 108A, 108B, 108C, 108D, 108N: sync registers; 108E, 108F, 108G, 108H: shadow sync registers; 114: fuses; 116: private random access memory (PRAM); 118: hardware semaphore; 119: shared cache memory; 122A, 122B, 122N: clock signals; 124A, 124B, 124N: interrupt signals; 126A, 126B, 126N: data signals; 128A, 128B, 128N: power control signals; 202: control word; 204: wake events; 206: sync control; 208: power gate; 212: sleep; 214: selective wake; 222: S; 224: C; 226: sync condition or C-state; 228: core set; 232: force sync; 234: selective sync kill; 236: core disable; 242: status word; 244: wake events; 246: lowest common C-state; 248: error code; 252: configuration word; 254-0 to 254-7: enable; 256: local core number; 258: die number; 302, 304, 305, 306, 312, 314, 316, 318, 322, 326, 328, 332, 334, 336: steps; 402A, 402B: inter-die bus unit A, inter-die bus unit B; 404: inter-die bus; 406A, 406B: die A, die B; 502, 504, 505, 508, 514, 516, 518, 524, 526, 528, 532: steps; 702, 704, 706, 708, 714, 716, 717, 718, 724, 726, 727, 728, 744, 746, 747, 748, 749, 752: steps; 902, 904, 906, 907, 908, 909, 914, 916, 919, 921, 924: steps; 1102, 1104, 1106, 1108, 1109, 1121, 1124, 1132, 1134, 1136, 1137: steps; 1402, 1404, 1406, 1408, 1412, 1414, 1416, 1417, 1418, 1422, 1424, 1426: steps; 1502, 1504, 1506, 1508, 1517, 1518, 1522, 1524, 1526, 1532: steps; 1702: owned bit; 1704: owner bits; 1706: state machine; 1802, 1804, 1806, 1808: steps; 1902, 1904, 1906, 1908, 1912, 1914, 1916, 1918: steps; 2002, 2004, 2006, 2008: steps; 2202, 2203, 2204, 2205, 2206, 2208, 2212, 2214, 2216, 2218, 2222, 2224: steps; 2302, 2304, 2305, 2306, 2312, 2315, 2318, 2324: steps; 2404: core microcode read-only memory (ROM); 2408: uncore microcode patch RAM; 2423: service processing unit; 2425: uncore microcode ROM; 2439: patch content-addressable memory (CAM); 2497: service processing unit start address register; 2499: core RAM; 2500: microcode patch; 2502: header; 2504: immediate patch; 2506: checksum; 2508: CAM data; 2512: core PRAM patch; 2514: checksum; 2516: RAM patch; 2518: uncore PRAM patch; 2522: checksum; 2602, 2604, 2606, 2608, 2611, 2612, 2614, 2616, 2618, 2621, 2622, 2624, 2626, 2628, 2631, 2632, 2634, 2652: steps; 2808: core patch RAM; 2912, 2916, 2922, 2932: steps; 3002, 3004, 3006: steps; 3102: memory type range registers (MTRRs); 3202, 3204, 3206, 3208, 3211, 3212, 3214, 3216, 3218, 3252: steps.
Detailed description
The preferred embodiments of the present invention are described below. Each embodiment serves to illustrate the principles of the invention and is not intended to limit it. The scope of the invention is defined by the claims.
Referring to Fig. 1, a block diagram illustrating a multi-core microprocessor 100 is shown. The microprocessor 100 includes a plurality of processing cores, denoted 102A, 102B through 102N, which are referred to collectively as processing cores 102, or simply cores 102, and individually as a processing core 102, or simply a core 102. Preferably, each core 102 includes one or more pipelines of functional units (not shown), including an instruction cache, an instruction translation unit or instruction decoder that preferably includes a microcode unit, a register renaming unit, reservation stations, cache memories, execution units, a memory subsystem, and a retire unit that includes a reorder buffer. Preferably, the cores 102 have a superscalar, out-of-order execution microarchitecture. In one embodiment, the microprocessor 100 is an x86 architecture microprocessor, although in other embodiments the microprocessor 100 conforms to another instruction set architecture.
The microprocessor 100 also includes an uncore 103 that is coupled to the cores 102 and is distinct from them. The uncore 103 includes a control unit 104, fuses 114, a private random access memory (PRAM) 116, and a shared cache memory 119, for example a level-2 (L2) and/or level-3 (L3) cache memory shared by the cores 102. Each core 102 is configured to read data from and write data to the uncore 103 via a respective address/data bus 126, which provides a non-architectural address space (also referred to as a private or micro-architectural address space) to the shared resources of the uncore 103. The PRAM 116 is private, or non-architectural, in the sense that it is not in the architectural user program address space of the microprocessor 100. In one embodiment, the uncore 103 includes arbitration logic that arbitrates requests by the cores 102 for access to the resources of the uncore 103.
Each fuse 114 is an electrical device that may be blown or not blown. When not blown, the fuse 114 has low impedance and readily conducts current; when blown, the fuse 114 has high impedance and does not readily conduct current. A sense circuit is associated with each fuse 114 to evaluate the fuse 114, that is, to sense whether the fuse 114 conducts a high current or low voltage (not blown, for example logical zero, or clear) or a low current or high voltage (blown, for example logical one, or set). A fuse 114 may be blown during manufacture of the microprocessor 100 and, in some embodiments, an unblown fuse 114 may be blown after manufacture of the microprocessor 100. Preferably, blowing a fuse 114 is irreversible. One example of a fuse 114 is a polysilicon fuse, which may be blown by applying a sufficiently high voltage across the device. Another example of a fuse 114 is a nickel-chromium fuse, which may be blown using a laser. Preferably, the sense circuit senses the fuse 114 at power-up and provides its evaluation to a corresponding bit in a holding register of the microprocessor 100. When the microprocessor 100 is released from reset, the cores 102 (for example, their microcode) read the holding registers to determine the sensed values of the fuses 114. In one embodiment, before the microprocessor 100 is released from reset, updated values may be scanned into the holding registers via a boundary scan input, such as a Joint Test Action Group (JTAG) input, to effectively update the values of the fuses 114. This is particularly useful for testing and/or debugging purposes, as in the embodiments described below with respect to Figs. 22 and 23.
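The relationship between the physical fuses, the holding register, and a scanned-in update can be illustrated with a minimal sketch. The function names and the `scan_override` parameter are hypothetical, and the bit ordering (fuse index i maps to register bit i) is an assumption made only for this example.

```python
def sense_fuses(blown):
    """Sense circuit: a blown fuse reads as logical one, an unblown fuse as zero."""
    return [1 if is_blown else 0 for is_blown in blown]


def load_holding_register(blown, scan_override=None):
    """Build the holding register value that microcode reads after reset.

    scan_override maps a bit index to a replacement value. It stands in for
    values scanned in through a boundary scan input before reset is released.
    It changes the register contents only, never the physical fuses.
    """
    bits = sense_fuses(blown)
    for index, value in (scan_override or {}).items():
        bits[index] = value
    register = 0
    for index, bit in enumerate(bits):
        register |= bit << index  # assumed mapping: fuse i -> register bit i
    return register
```

For example, with fuses 0 and 3 blown the register reads `0b1001`, and an override of bit 0 to zero yields `0b1000` even though the fuse itself stays blown.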
In addition, in one embodiment, the microprocessor 100 includes a different local Advanced Programmable Interrupt Controller (APIC) (not shown) associated with each core 102. In one embodiment, the local APICs conform architecturally to the description of a local APIC in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, May 2012, by Intel Corporation of Santa Clara, California, particularly in Section 10.4. In particular, the local APIC includes an APIC ID and an APIC base register that includes a bootstrap processor (BSP) flag, whose generation and use are described in more detail below, particularly with respect to the embodiments of Figs. 14 through 16.
The control unit 104 comprises hardware, software, or a combination of hardware and software. The control unit 104 includes a hardware semaphore 118 (described in detail below with respect to Figs. 17 through 20), a status register 106, a configuration register 112, and a respective sync register 108 for each core 102. Preferably, each entity of the uncore 103 is addressable by each core 102 at a distinct address within the non-architectural address space, which enables the microcode of the cores 102 to read and write it.
Each sync register 108 is writable by its respective core 102. The status register 106 is readable by each core 102. The configuration register 112 is readable and indirectly writable (via the core disable bit 236 of Fig. 2, as described below) by each core 102. The control unit 104 also includes interrupt logic (not shown) that generates a respective interrupt signal (INTR) 124 to each core 102, which the control unit 104 asserts to interrupt the respective core 102. The interrupt sources in response to which the control unit 104 generates an interrupt 124 to a core 102 may include external interrupt sources (for example, the x86 architecture INTR, SMI, and NMI interrupt sources) or bus events (for example, the assertion or de-assertion of the x86 architecture-style bus signal STPCLK). In addition, each core 102 may send an inter-core interrupt 124 to each of the other cores 102 by writing to the control unit 104. Preferably, unless otherwise indicated, the inter-core interrupts described herein are non-architectural inter-core interrupts requested by the microcode of a core 102 via a microinstruction, which are distinct from conventional architectural inter-core interrupts requested by system software via an architectural instruction. Finally, the control unit 104 may generate an interrupt 124 to a core 102 (a sync interrupt) when a synchronization condition has occurred, as described below (see, for example, Fig. 21 and block 334 of Fig. 3). The control unit 104 also generates a respective clock signal (CLOCK) 122 to each core 102, which the control unit 104 may selectively turn off, effectively putting the respective core 102 to sleep, and turn on, waking the core 102 back up. The control unit 104 also generates a respective core power control signal (PWR) 128 to each core 102, which selectively controls whether or not the respective core 102 receives power. Thus, the control unit 104 may selectively put a core 102 into an even deeper sleep state by turning off power to the core via the respective power control signal 128, and may turn power back on to the core 102 to wake it up.
A core 102 may write to its respective sync register 108 with the synchronization bit set (see the S bit 222 of Fig. 2); this operation is referred to as a synchronization request (sync request). As described in more detail below, in one embodiment the sync request asks the control unit 104 to put the core 102 to sleep and to wake it when a synchronization condition (sync condition) occurs and/or when a specified wake event occurs. A sync condition occurs when all the enabled cores 102 in the microprocessor 100 (see the enable bits 254 of Fig. 2), or a specified subset of the enabled cores 102 (see the core set field 228 of Fig. 2), have written the same sync condition to their respective sync registers 108. The sync condition is specified by a combination of the C bit 224, the sync condition or C-state field 226, and the core set field 228 of Fig. 2, together with the S bit 222, all described in more detail below. In response to the occurrence of a sync condition, the control unit 104 simultaneously wakes up all the cores 102 that are waiting on the sync condition, that is, the cores that requested it. In an alternative embodiment described below, the cores 102 may request that only the last core 102 to write the sync request be awakened (see the selective wake bit 214 of Fig. 2). In another embodiment, the sync request does not ask for the core 102 to be put to sleep; instead, it asks the control unit 104 to interrupt the core 102 when the sync condition occurs, as described in more detail below, particularly with respect to Figs. 3 and 21.
Preferably, when the control unit 104 detects that a sync condition has occurred (because the last core 102 has written its sync request to a sync register 108), the control unit 104 puts the last core 102 to sleep, for example by turning off the clock signal 122 to the last-writing core 102, and then wakes up all the cores 102 simultaneously, for example by turning on the clock signals 122 to all the cores 102. In this manner, all the cores 102 are awakened, that is, have their clock signals 122 turned on, on precisely the same clock cycle. This is particularly useful for certain operations, such as debugging (see the embodiment of Fig. 5), in which it is beneficial to wake the cores 102 on exactly the same clock cycle. In one embodiment, the uncore 103 includes a single phase-locked loop (PLL) that generates the clock signals 122 provided to the cores 102. In other embodiments, the microprocessor 100 includes multiple PLLs that generate the clock signals 122 provided to the cores 102.
Control, status, and configuration words
Referring to Fig. 2, a block diagram illustrating a control word 202, a status word 242, and a configuration word 252 is shown. A core 102 writes a value of the control word 202 to the sync register 108 of the control unit 104 of Fig. 1 to make an atomic request to sleep and/or to synchronize (sync) with all the other cores 102 of the microprocessor 100 or with a specified subset of them. A core 102 reads a value of the status word 242 from the status register 106 of the control unit 104 to determine the status information described herein. A core 102 reads a value of the configuration word 252 from the configuration register 112 of the control unit 104 and uses the value as described below.
The control word 202 includes a wake events field 204, a sync control field 206, and a power gate (PG) bit 208. The sync control field 206 includes various bits or subfields that control the sleeping of the core 102 and/or the syncing of the core 102 with other cores 102. The sync control field 206 includes a sleep bit 212, a selective wake (SEL WAKE) bit 214, an S bit 222, a C bit 224, a sync condition or C-state field 226, a core set field 228, a force sync bit 232, a selective sync kill bit 234, and a core disable bit 236. The status word 242 includes a wake events field 244, a lowest common C-state field 246, and an error code field 248. The configuration word 252 includes one enable bit 254 for each core 102 of the microprocessor 100, a local core number field 256, and a die number field 258.
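The control word fields listed above can be pictured as a packed bit layout. The sketch below is illustrative only: the text gives field names and, for one embodiment, a four-bit width for the sync condition or C-state field 226, but every bit position and every other width in `FIELDS` is an assumption, as are the function names `pack` and `unpack`.

```python
# name: (lowest bit, width). Positions and widths are assumptions for this
# sketch, except the 4-bit sync condition / C-state field of one embodiment.
FIELDS = {
    "wake_events": (0, 8),     # wake events field 204 (width assumed)
    "sleep": (8, 1),           # sleep bit 212
    "sel_wake": (9, 1),        # selective wake bit 214
    "s": (10, 1),              # S bit 222
    "c": (11, 1),              # C bit 224
    "sync_cond": (12, 4),      # sync condition or C-state field 226
    "core_set": (16, 2),       # core set field 228 (width assumed)
    "force_sync": (18, 1),     # force sync bit 232
    "sel_sync_kill": (19, 1),  # selective sync kill bit 234
    "core_disable": (20, 1),   # core disable bit 236
    "pg": (21, 1),             # power gate bit 208
}


def pack(**values):
    """Pack named field values into one control word integer."""
    if values.get("sleep") and values.get("sel_wake"):
        raise ValueError("sleep and sel_wake are mutually exclusive")
    word = 0
    for name, value in values.items():
        low, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        word |= value << low
    return word


def unpack(word):
    """Split a control word integer back into its named fields."""
    return {
        name: (word >> low) & ((1 << width) - 1)
        for name, (low, width) in FIELDS.items()
    }
```

A core requesting a sleeping sync on condition 7 would, under this assumed layout, write `pack(sleep=1, s=1, c=0, sync_cond=7)` to its sync register in a single store, which is what makes the request atomic from the core's point of view.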
The wake events field 204 of the control word 202 comprises a plurality of bits corresponding to different events. If the core 102 sets a bit in the wake events field 204, the control unit 104 will wake up the core 102 (for example, turn on the clock signal 122 to the core 102) when the event corresponding to that bit occurs. One wake event occurs when the core 102 has synced with all the other cores specified in the core set field 228. In one embodiment, the core set field 228 may specify all the cores 102 in the microprocessor 100; all the cores 102 that share a cache memory (for example, an L2 cache and/or an L3 cache) with the instant core 102, that is, the core making the request; all the cores 102 on the same semiconductor die as the instant core 102 (see Fig. 4 for an example of an embodiment of a multi-die, multi-core microprocessor 100); or all the cores 102 on the semiconductor die other than that of the instant core 102. A set of cores 102 that share a cache memory may be referred to as a slice. Other examples of wake events include, but are not limited to, an x86 INTR, SMI, or NMI, the assertion or de-assertion of STPCLK, and an inter-core interrupt. When a core 102 is awakened, it may read the wake events field 244 in the status word 242 to determine which wake events are active.
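The four core set choices described above can be sketched as a lookup over a hypothetical topology table. The selector names (`"all"`, `"slice"`, `"same_die"`, `"other_die"`), the function `resolve_core_set`, and the `topology` mapping from a core to its (die, slice) pair are all assumptions introduced for illustration; the text does not specify how the field is encoded.

```python
def resolve_core_set(selector, instant_core, topology):
    """Return the sorted list of cores named by a core set selector.

    topology maps core id -> (die id, slice id within that die).
    instant_core is the core that writes the request.
    """
    die, slice_id = topology[instant_core]
    if selector == "all":
        return sorted(topology)
    if selector == "slice":
        # Cores that share a cache with the instant core.
        return sorted(
            core for core, (d, s) in topology.items() if (d, s) == (die, slice_id)
        )
    if selector == "same_die":
        return sorted(core for core, (d, _) in topology.items() if d == die)
    if selector == "other_die":
        return sorted(core for core, (d, _) in topology.items() if d != die)
    raise ValueError(f"unknown selector: {selector}")
```

In an assumed two-die part with four cores per die and two cores per slice, core 1 selecting `"slice"` names cores 0 and 1, while `"other_die"` names cores 4 through 7.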
If the core 102 sets the PG bit 208, the control unit 104 turns off power to the core 102 (for example, via the power control signal 128) after it puts the core 102 to sleep. When the control unit 104 subsequently restores power to the core 102, the control unit 104 clears the PG bit 208. Use of the PG bit 208 is described in more detail below with respect to Figs. 11 through 13.
If the core 102 sets the sleep bit 212 or the selective wake bit 214, the control unit 104 puts the core 102 to sleep after the core 102 writes the sync register 108, using the wake events specified in the wake events field 204. The sleep bit 212 and the selective wake bit 214 are mutually exclusive. The difference between them concerns the action the control unit 104 takes when a sync condition occurs. If the core 102 set the sleep bit 212, then when a sync condition occurs the control unit 104 will wake up all the cores 102. In contrast, if the core 102 set the selective wake bit 214, then when a sync condition occurs the control unit 104 will wake up only the last core 102 that wrote the sync condition to its sync register 108.
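The difference between the two bits can be shown with a short sketch. It assumes, purely for simplicity, that every core in the set made its request with the same bit; the text does not say what happens for mixed requests, so the hypothetical function `cores_to_wake` rejects them.

```python
def cores_to_wake(requests):
    """Return the cores woken when the last request completes a sync condition.

    requests is a list of (core id, mode) pairs in the order the cores wrote
    their sync registers; mode is "sleep" or "sel_wake". Mixed modes are not
    modeled here (an assumption of this sketch).
    """
    modes = {mode for _, mode in requests}
    if len(modes) != 1:
        raise ValueError("this sketch assumes all cores request the same mode")
    mode = modes.pop()
    if mode == "sleep":
        # Sleep bit: every waiting core is woken.
        return [core for core, _ in requests]
    if mode == "sel_wake":
        # Selective wake bit: only the last writer is woken.
        return [requests[-1][0]]
    raise ValueError(f"unknown mode: {mode}")
```

With write order core 0, core 2, core 1, the sleep variant wakes all three cores, whereas the selective wake variant wakes only core 1 and leaves the others asleep.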
If the core 102 sets neither the sleep bit 212 nor the selective wake bit 214, the control unit 104 does not put the core 102 to sleep and therefore does not wake it up when a sync condition occurs. Nevertheless, the control unit 104 will set the bit of the wake events field 204 that indicates a sync condition is active, so that the core 102 can detect that the sync condition has occurred. Many of the wake events that may be specified in the wake events field 204 may also be interrupt sources for which the control unit 104 generates an interrupt to the core 102. However, the microcode of the core 102 may mask the interrupt sources if desired. In that case, when the core 102 is awakened, the microcode may read the status register 106 to determine whether a sync condition, a wake event, or both have occurred.
If the core 102 sets the S bit 222, it requests the control module 104 to synchronize on a synchronous condition. The synchronous condition is specified by some combination of the C bit 224, the synchronous condition or C-state field 226, and the core set field 228. If the C bit 224 is set, field 226 specifies a C-state value; if the C bit 224 is clear, field 226 specifies a non-C-state synchronous condition. Preferably, the values of the synchronous condition or C-state field 226 comprise a bounded set of non-negative integers. In one embodiment, the synchronous condition or C-state field 226 is 4 bits wide. When the C bit 224 is clear, a synchronous condition occurs when all the cores 102 in the set specified by the core set field 228 have written their respective synchronous working storage 108 with the S bit 222 set and the same value in the synchronous condition field 226. In one embodiment, each value of the synchronous condition field 226 corresponds to a unique synchronous condition, for example, the various synchronous conditions in the exemplary embodiments described below. When the C bit 224 is set, a synchronous condition occurs when all the cores 102 in the set specified by the core set field 228 have written their respective synchronous working storage 108 with the S bit 222 set, regardless of whether they wrote the same value in the C-state field 226. In this case, the control module 104 posts the lowest value written to the C-state field 226 to the lowest common C-state field 246 of the state working storage 106, where it can be read by a core 102, for example, by the main core 102 at square 908 or by the last-writing, selectively woken core 102 at square 1108. In one embodiment, if a core 102 specifies a preset value (for example, all bits set) in the synchronous condition field 226, this instructs the control module 104 to match the instant core 102 with any synchronous condition field 226 value specified by the other cores 102.
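The matching rule above can be sketched in software as follows. This is only an illustrative model, not the hardware described in the patent: the `SyncWord` class, the `check_sync` function, the result dictionary, and the `MATCH_ANY = 0xF` encoding of the "all bits set" preset in a 4-bit field 226 are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

MATCH_ANY = 0xF  # assumed encoding of the "all bits set" preset in a 4-bit field 226


@dataclass(frozen=True)
class SyncWord:
    """Hypothetical software view of one core's synchronous working storage 108."""
    s_bit: bool  # S bit 222: the core requests synchronization
    c_bit: bool  # C bit 224: field 226 holds a C-state value
    value: int   # synchronous condition or C-state field 226


def check_sync(words: Dict[int, SyncWord], core_set: Iterable[int]) -> Optional[dict]:
    """Return a result dict if the synchronous condition occurred, else None."""
    members = [words.get(core) for core in core_set]
    # Every core in the set must have written a request with the S bit set.
    if not members or any(w is None or not w.s_bit for w in members):
        return None
    if all(w.c_bit for w in members):
        # C-state sync: values may differ; the lowest written value is posted
        # (modeled on the lowest common C-state field 246).
        return {"kind": "c_state",
                "lowest_common_c_state": min(w.value for w in members)}
    if any(w.c_bit for w in members):
        return None  # mixed C and non-C requests never match
    # Non-C-state sync: all values must agree, ignoring the match-any preset.
    concrete = {w.value for w in members if w.value != MATCH_ANY}
    if len(concrete) > 1:
        return None
    return {"kind": "condition",
            "value": concrete.pop() if concrete else MATCH_ANY}
```

For instance, two cores that both write S set, C clear and value 1 match on condition 1, while two cores that write C set with C-state values 3 and 5 match with a lowest common C-state of 3.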
If the core 102 sets the force synchronization bit 232, the control module 104 forces all pending synchronization requests to be matched immediately.
In general, if any core 102 is woken by a wake event specified in the wake events field 204, the control module 104 terminates (kills) all pending synchronization requests by clearing the S bit 222 in each synchronous working storage 108. However, if the core 102 sets the selective synchronization kill bit 234, the control module 104 kills the pending synchronization request of only the core 102 that was woken by the wake event (that is, not by the occurrence of a synchronous condition).
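The difference between the default behavior and the selective behavior can be illustrated with a small sketch. The function name `kill_sync_requests` and the dictionary representation of the S bits 222 are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def kill_sync_requests(s_bits: Dict[int, bool], woken: Set[int],
                       selective_kill: bool) -> Dict[int, bool]:
    """Return the S bit 222 of each core after a wake event.

    s_bits maps a core number to its S bit, woken holds the cores woken by
    the wake event, and selective_kill models bit 234.
    """
    if selective_kill:
        # Only the requests of the cores woken by the wake event are killed.
        return {core: s and core not in woken for core, s in s_bits.items()}
    # Default: every pending synchronization request is killed.
    return {core: False for core in s_bits}
```

With three pending requests and core 1 woken by a wake event, the default call clears all three S bits, whereas the selective call clears only the S bit of core 1.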
If two or more cores 102 request synchronization on different synchronous conditions, the control module 104 treats this as a deadlock condition. Two or more cores 102 request synchronization on different synchronous conditions when they write their respective synchronous working storage 108 with the S bit 222 set, the C bit 224 clear, and different values in the synchronous condition field 226. For example, if one core 102 writes its synchronous working storage 108 with the S bit 222 set, the C bit 224 clear, and a synchronous condition field 226 value of 7, and another core 102 writes its synchronous working storage 108 with the S bit 222 set, the C bit 224 clear, and a synchronous condition field 226 value of 9, the control module 104 treats this as a deadlock condition. In addition, if one core 102 writes its synchronous working storage 108 with the C bit 224 clear and another core 102 writes its synchronous working storage 108 with the C bit 224 set, the control module 104 treats this as a deadlock condition. In response to a deadlock condition, the control module 104 kills all pending synchronization requests and wakes all sleeping cores 102. The control module 104 also posts a value in the error code field 248 of the state working storage 106, which the cores 102 can read to determine the cause of the deadlock and take appropriate action. In one embodiment, the error code 248 indicates the synchronous condition written by each core 102, which enables each core to decide whether to proceed with its intended course of action or to defer to another core 102. For example, if one core 102 writes a synchronous condition to perform a power management operation (for example, to execute an x86 MWAIT instruction) and another core 102 writes a synchronous condition to perform a cache management operation (for example, an x86 WBINVD instruction), then the core 102 that intended to execute the MWAIT instruction cancels it and defers to the core 102 executing the WBINVD instruction, because MWAIT is an optional operation whereas WBINVD is a mandatory one. As another example, if one core 102 writes a synchronous condition to perform a debug operation (for example, dumping debug state) and another core 102 writes a synchronous condition to perform a cache management operation (for example, a WBINVD instruction), then the core 102 that intended to perform the WBINVD defers to the core 102 performing the debug dump: it saves its WBINVD state, waits for the debug dump to occur, and then restores its WBINVD state and performs the WBINVD instruction.
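The two deadlock cases described above (different non-C-state values, or a mix of C-set and C-clear requests) can be sketched as a simple check. This is a minimal model under stated assumptions: the `Req` tuple and both function names are hypothetical, and the sketch ignores the core set field 228 and the match-any preset.

```python
from typing import Dict, NamedTuple


class Req(NamedTuple):
    s_bit: bool  # S bit 222
    c_bit: bool  # C bit 224
    value: int   # synchronous condition or C-state field 226


def is_deadlock(reqs: Dict[int, Req]) -> bool:
    """True if the pending requests conflict as described in the text."""
    pending = [r for r in reqs.values() if r.s_bit]
    if len(pending) < 2:
        return False
    if len({r.c_bit for r in pending}) > 1:
        return True   # one core wrote C clear, another wrote C set
    if pending[0].c_bit:
        return False  # all C-state requests: differing values are allowed
    return len({r.value for r in pending}) > 1  # e.g. values 7 and 9


def kill_all(reqs: Dict[int, Req]) -> Dict[int, Req]:
    """Model the response to a deadlock: clear every S bit."""
    return {core: r._replace(s_bit=False) for core, r in reqs.items()}
```

In this model, requests with values 7 and 9 (both with C clear) are reported as a deadlock, while C-state requests with values 2 and 5 are not.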
In a single-crystal embodiment, the crystal number field 258 is zero. In a multi-crystal embodiment (for example, Fig. 4), the crystal number field 258 indicates on which crystal the core 102 reading the configuration working storage 112 resides. For example, in a two-crystal embodiment, the crystals are designated 0 and 1 and the crystal number field 258 has a value of 0 or 1. In one embodiment, for example, a fuse 114 is selectively blown to designate a crystal as 0 or 1.
The local core number field 256 indicates the core number, local to its crystal, of the core 102 that is reading the configuration working storage 112. Preferably, although a single configuration working storage 112 is shared by all the cores 102, the control module 104 knows which core 102 is reading the configuration working storage 112 and provides the correct value in the local core number field 256 according to the reader. This enables the microcode of a core 102 to know its local core number among the other cores 102 located on the same crystal. In one embodiment, a multiplexer in the non-core 103 portion of the microprocessor 100 selects the appropriate value to return in the local core number field 256 of the configuration word 252, based on which core 102 is reading the configuration working storage 112. In one embodiment, selectively blown fuses 114 operate together with the multiplexer to return the value of the local core number field 256. Preferably, the value of the local core number field 256 is fixed, independent of which cores 102 on the crystal are enabled, as indicated by the activation bits 254 described below. That is, the value of the local core number field 256 remains fixed even if one or more cores 102 of the crystal are disabled. In addition, the microcode of a core 102 computes the global core number of the core 102, which is a configuration-related value whose use is described in more detail below. The global core number indicates the core number of the core 102 globally within the microprocessor 100. The core 102 computes its global core number using the value of the crystal number field 258. For example, in one embodiment the microprocessor 100 comprises eight cores 102 divided evenly between two crystals having crystal numbers 0 and 1, and on each crystal the local core number field 256 returns a value of 0, 1, 2 or 3; the cores on the crystal whose crystal number is 1 add 4 to the value returned in the local core number field 256 to compute their global core number.
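The computation in the eight-core example can be written out as a short sketch. The function name and the `cores_per_crystal` parameter are illustrative assumptions; the text only states that cores on crystal 1 add 4 to the local value.

```python
def global_core_number(crystal_number: int, local_core_number: int,
                       cores_per_crystal: int = 4) -> int:
    """Combine field 258 (crystal) and field 256 (local core) into one index."""
    if not 0 <= local_core_number < cores_per_crystal:
        raise ValueError("local core number out of range")
    # Crystal 0 contributes an offset of 0, crystal 1 an offset of 4, and so on.
    return crystal_number * cores_per_crystal + local_core_number
```

Under these assumptions, local core 3 on crystal 0 has global core number 3, and local core 0 on crystal 1 has global core number 4.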
Each core 102 of the microprocessor 100 has a corresponding activation bit 254 in the configuration word 252 that indicates whether the core 102 is enabled or disabled. In Fig. 2, the activation bits 254 are denoted individually as activation bit 254-x, where x is the global core number of the corresponding core 102. The example of Fig. 2 assumes that the microprocessor 100 has eight cores 102. In the examples of Fig. 2 and Fig. 4, activation bit 254-0 indicates whether the core 102 having global core number 0 (for example, core A) is enabled, activation bit 254-1 indicates whether the core 102 having global core number 1 (for example, core B) is enabled, activation bit 254-2 indicates whether the core 102 having global core number 2 (for example, core C) is enabled, and so on. Thus, by knowing its global core number, the microcode of a core 102 can determine from the configuration word 252 which cores 102 of the microprocessor 100 are disabled and which are enabled. Preferably, an activation bit 254 is set if the core 102 is enabled and clear if the core 102 is disabled. When the microprocessor 100 is reset, hardware automatically populates the activation bits 254. Preferably, the hardware populates the activation bits 254 based on fuses 114 that are selectively blown when the microprocessor 100 is manufactured to indicate whether a given core 102 is enabled or disabled. For example, if a given core 102 is tested and found to be faulty, a fuse 114 may be blown to clear the activation bit 254 of that core 102. In one embodiment, a blown fuse 114 indicates that a core 102 is disabled and prevents clock signals from being provided to the disabled core 102. Each core 102 can write the disable core bit 236 in its synchronous working storage 108 to clear its own activation bit 254, as described in more detail below with respect to Figs. 14 to 16. Preferably, clearing the activation bit 254 does not stop the core 102 from executing instructions but merely updates the configuration working storage 112; the core 102 must set a different bit (not shown) to prevent itself from executing instructions, for example, to have its power removed and/or its clock signal turned off. For a multi-crystal microprocessor 100 (for example, Fig. 4), the configuration working storage 112 includes an activation bit 254 for every core 102 in the microprocessor 100, that is, not only for the cores 102 of the local crystal but also for the cores 102 of the remote crystal. Preferably, in a multi-crystal microprocessor 100, when a core 102 writes to its synchronous working storage 108, the value is propagated to the corresponding shadow synchronous working storage 108 on the other crystal (see Fig. 4); if the disable core bit 236 is set, this causes an update to the configuration working storage 112 of the remote crystal, so that the local and remote configuration working storages 112 hold the same value.
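How microcode might consult and update the activation bits 254 can be sketched with plain bit operations. The layout assumed here, with bit n of an integer standing for activation bit 254-n, is an assumption for illustration only; the patent does not give the bit positions within the configuration word 252, and all function names are hypothetical.

```python
from typing import List


def is_enabled(config_word: int, global_core: int) -> bool:
    """Test activation bit 254-x, assuming it sits at bit position x."""
    return bool((config_word >> global_core) & 1)


def disable_core(config_word: int, global_core: int) -> int:
    """Model the effect of the disable core bit 236: clear one activation bit."""
    return config_word & ~(1 << global_core)


def enabled_cores(config_word: int, total_cores: int = 8) -> List[int]:
    """List the global core numbers whose activation bit is set."""
    return [n for n in range(total_cores) if is_enabled(config_word, n)]
```

Starting from eight enabled cores (`0xFF` in this layout), disabling global core 2 yields `0xFB`, and `enabled_cores` then omits core 2.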
In one embodiment, the configuration working storage 112 cannot be written directly by a core 102. However, a write by a core 102 to the configuration working storage 112 causes the local activation bit 254 values to be propagated to the configuration working storage 112 of the other crystal in a multi-crystal microprocessor 100, for example, as described with respect to square 1406 of Fig. 14.
control module
Please refer to Fig. 3, which is a flowchart illustrating operation of the control module 104. Flow begins at square 302. At square 302, a core 102 writes a synchronization request, that is, it writes a control word 202 to its synchronous working storage 108, and the request is received by the control module 104. In a multi-crystal microprocessor 100 (for example, see Fig. 4), when a shadow synchronous working storage 108 of a control module 104 receives a synchronous working storage 108 value propagated from the other crystal 406, the control module 104 operates effectively according to Fig. 3, that is, as if it had received a synchronization request from one of its local cores 102 (square 302), except that the control module 104 puts cores 102 to sleep (for example, square 314), wakes them (square 306, 328 or 336), interrupts them (square 334), or blocks their wake events (square 326) only on its local crystal 406, and populates only its local state working storage 106 (square 318). Flow proceeds to square 304.
At square 304, the control module 104 examines the synchronous condition requested at square 302 to determine whether a deadlock condition has occurred, as described above with respect to Fig. 2. If so, flow proceeds to square 306; otherwise, flow proceeds to decision block 312.
At square 305, the control module 104 detects the occurrence of a wake event specified in the wake events field 204 of one of the synchronous working storages 108 (other than the occurrence of a synchronous condition, which is detected at square 316). As described below with respect to square 326, the control module 104 may automatically block wake events. The control module 104 may detect the occurrence of the wake event as an event asynchronous to the writing of a synchronization request at square 302. Flow proceeds from square 305 to square 306.
At square 306, the control module 104 populates the state working storage 106, kills pending synchronization requests, and wakes any sleeping cores 102. As described above, waking a sleeping core 102 may include restoring its power. The core 102 may then read the state working storage 106, in particular the error code 248, to determine the cause of the deadlock and handle it according to the relative priorities of the conflicting synchronization requests, as described above. In addition, the control module 104 kills all pending synchronization requests (for example, clears the S bit 222 in the synchronous working storage 108 of each core 102), unless square 306 was reached from square 305 and the selective synchronization kill bit 234 is set, in which case the control module 104 kills the pending synchronization request of only the core 102 being woken by the wake event. If square 306 was reached from square 305, the core 102 may read the wake events field 244 to determine which wake event occurred. Additionally, if the wake event is an unmasked interrupt source, the control module 104 generates an interrupt request to the core 102 via the interrupt signal 124. Flow ends at square 306.
At decision block 312, the control module 104 determines whether the sleep bit 212 or the selective wake-up bit 214 is set. If so, flow proceeds to square 314; otherwise, flow proceeds to decision block 316.
At square 314, the control module 104 puts the core 102 to sleep. As described above, putting a core 102 to sleep may include removing its power. In one embodiment, as an optimization, even if the PG bit 208 is set, the control module 104 does not remove power from the core 102 at square 314 if this is the last-writing core 102 (that is, its write will cause the synchronous condition to occur) and the selective wake-up bit 214 is set, because the control module 104 will immediately wake the last-writing core 102 back up at square 328. In one embodiment, the control module 104 comprises synchronization logic and sleep logic that are separate from one another but communicate with each other; further, the synchronization logic and the sleep logic each comprise a portion of the synchronous working storage 108. Advantageously, the write to the synchronization logic portion of the synchronous working storage 108 and the write to the sleep logic portion of the synchronous working storage 108 are atomic, that is, indivisible: if one part of the write occurs, both parts are guaranteed to occur. Preferably, the pipeline of the core 102 stalls, not allowing any further writes to occur, until it is guaranteed that the writes to both portions of the synchronous working storage 108 have occurred. An advantage of writing a synchronization request and immediately going to sleep is that the core 102 (for example, its microcode) does not need to run continuously to determine whether the synchronous condition has occurred. This is beneficial because it may save power and avoids consuming other resources, such as bus and/or memory bandwidth. It should be noted that, in order to go to sleep without requesting synchronization with other cores 102 (for example, squares 924 and 1124), the core 102 may write its synchronous working storage 108 with the S bit 222 clear and the sleep bit 212 set, referred to herein as a sleep request; in this case, the control module 104 wakes the core 102 (for example, square 306) if an unmasked wake event specified in the wake events field 204 occurs (for example, square 305), but does not look for the occurrence of a synchronous condition for this core 102 (for example, square 316). Flow proceeds to decision block 316.
At decision block 316, the control module 104 determines whether a synchronous condition has occurred. If so, flow proceeds to square 318. As described above, a synchronous condition can occur only when the S bit 222 is set. In one embodiment, the control module 104 uses the activation bits 254 of Fig. 2, which indicate which cores 102 of the microprocessor 100 are enabled and which are disabled, and considers only the enabled cores 102 when determining whether a synchronous condition has occurred. A core 102 may be disabled because it was tested and found defective at production time; in that case, a fuse is blown to render the core 102 inoperable and to indicate that it is disabled. A core 102 may also be disabled because software requested it (for example, see Fig. 15). For example, at a user's request, the BIOS writes a model specific register (MSR) to request that the core 102 be disabled; in response, the core 102 disables itself (for example, via the disable core bit 236) and notifies the other cores 102 to read the configuration working storage 112, from which they determine that the core 102 is disabled. A core 102 may also be disabled via a microcode patch (for example, see Fig. 14), which may be generated by blowing fuses 114 and/or loaded from system memory (such as a FLASH memory). In addition to determining whether a synchronous condition has occurred, the control module 104 examines the force synchronization bit 232. If it is set, flow proceeds to square 318. If the force synchronization bit 232 is clear and a synchronous condition has not occurred, flow ends at decision block 316.
At square 318, the control module 104 populates the state working storage 106. Specifically, if the synchronous condition that occurred is that all the cores 102 requested a C-state synchronization, the control module 104 populates the lowest common C-state field 246, as described above. Flow proceeds to decision block 322.
At decision block 322, the control module 104 examines the selective wake-up (SEL WAKE) bit 214. If the bit is set, flow proceeds to square 326; otherwise, flow proceeds to decision block 332.
At square 326, the control module 104 blocks all wake events for all the cores 102 other than the instant core, where the instant core is the core 102 that was last to write a synchronization request to its synchronous working storage 108 at square 302, thereby causing the synchronous condition to occur. In one embodiment, the logic of the control module 104 simply Boolean ANDs the wake conditions with a signal that is false if wake events are to be blocked and true otherwise. The use of blocking all wake events for all the other cores is described in more detail below, particularly with respect to Figs. 11 to 13. Flow proceeds to square 328.
At square 328, the control module 104 wakes only the instant core 102, not the other cores that requested synchronization. In addition, the control module 104 kills the pending synchronization request of the instant core 102 by clearing its S bit 222, but does not kill the pending synchronization requests of the other cores 102; that is, it leaves the S bits 222 of the other cores 102 set. Consequently, and advantageously, if the instant core 102 writes another synchronization request after it is woken, it will again cause the synchronous condition to occur (assuming the synchronization requests of the other cores 102 have not been killed), an example of which is described below with respect to Figs. 12 and 13. Flow ends at square 328.
At decision block 332, the control module 104 examines the sleep bit 212. If the bit is set, flow proceeds to square 336; otherwise, flow proceeds to square 334.
At square 334, the control module 104 sends an interrupt signal (a sync interrupt) to all the cores 102. The timing diagram of Fig. 21 illustrates an example of a non-sleeping synchronization request. Each core 102 may read the wake events field 244 and detect that a synchronous condition was the cause of the interrupt. Flow reaches square 334 in the case where the cores 102 chose not to go to sleep when they wrote their synchronization requests. Although this case does not give the cores 102 the same benefits as going to sleep (for example, simultaneous wake-up), it has the potential advantage of allowing the cores 102 to continue processing instructions while they wait for the last core 102 to write its synchronization request, in cases where simultaneous wake-up is not needed. Flow ends at square 334.
At square 336, the control module 104 wakes all the cores 102 simultaneously. In one embodiment, the control module 104 turns on the clock signals 122 to all the cores 102 on precisely the same clock cycle. In another embodiment, the control module 104 turns on the clock signals 122 to the cores 102 in a staggered fashion; that is, the control module 104 introduces a delay of a predetermined number of clock cycles (for example, on the order of ten or a hundred clocks) between turning on the clock signal 122 to each core. Nevertheless, the staggered turning on of the clock signals 122 is considered simultaneous in the present disclosure. Staggering the turning on of the clock signals 122 is useful for reducing the likelihood of a power consumption spike when all the cores 102 wake up. In yet another embodiment, to reduce the likelihood of a power consumption spike, the control module 104 turns on the clock signals 122 to all the cores 102 on the same clock cycle, but does so in a stuttering, or throttled, fashion by initially providing the clock signals 122 at a reduced frequency and ramping the frequency up to the target frequency. In one embodiment, the synchronization requests are issued as a result of the execution of microcode instructions of the cores 102, and the microcode is designed such that, for at least some synchronous condition values, the location in the microcode that specifies the synchronous condition value is unique. For example, only one place in the microcode contains a sync x request, only one place in the microcode contains a sync y request, and so forth. In these cases, simultaneous wake-up is beneficial because all the cores 102 wake up in the same place, which enables the microcode designers to write more efficient and bug-free code. Furthermore, simultaneous wake-up may be particularly useful for debugging purposes when attempting to recreate and fix bugs that appear because of multi-core interactions but do not appear when a single core runs. Figs. 5 and 6 show such an example. In addition, the control module 104 kills all pending synchronization requests (for example, clears the S bit 222 in the synchronous working storage 108 of each core 102). Flow ends at square 336.
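The tail of the Fig. 3 flow (decision blocks 316, 322 and 332) amounts to a small decision tree, sketched below. The `Action` enumeration and the `on_sync_check` function are hypothetical names, and the sketch assumes that the selective wake-up bit 214 and the sleep bit 212 examined are those of the request that completed the synchronous condition.

```python
from enum import Enum


class Action(Enum):
    NONE = "flow ends at decision block 316"
    WAKE_INSTANT_ONLY = "squares 326 and 328"
    WAKE_ALL = "square 336"
    INTERRUPT_ALL = "square 334"


def on_sync_check(condition_met: bool, force_sync: bool,
                  sel_wake: bool, sleep: bool) -> Action:
    """Pick the control module's action once a request has been recorded."""
    if not (condition_met or force_sync):
        return Action.NONE               # nothing to do yet
    if sel_wake:
        return Action.WAKE_INSTANT_ONLY  # block others' wake events, wake last writer
    if sleep:
        return Action.WAKE_ALL           # simultaneous wake-up, all requests killed
    return Action.INTERRUPT_ALL          # cores stayed awake: send a sync interrupt
```

For example, a completed condition with the sleep bit set and the selective wake-up bit clear maps to `Action.WAKE_ALL`, which corresponds to square 336.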
An advantage of the embodiments described herein is that they may significantly reduce the amount of microcode in a microprocessor: rather than looping or performing other checks to synchronize operations among multiple cores, the microcode in each core can simply write a synchronization request, go to sleep, and know that when it wakes up, all the cores are at the same place in the microcode. Uses of this synchronization request mechanism by microcode are described below.
polycrystal microprocessor
Please refer to Fig. 4, which is a block diagram showing another embodiment of the microprocessor 100. The microprocessor 100 of Fig. 4 is similar in many respects to the microprocessor 100 of Fig. 1, in that it is a multi-core processor and its cores 102 are similar. However, the embodiment of Fig. 4 is a multi-crystal configuration. That is, the microprocessor 100 comprises multiple semiconductor crystals 406 mounted in a common package and communicating with one another via an inter-crystal bus 404. The embodiment of Fig. 4 comprises two crystals 406, denoted crystal A 406A and crystal B 406B, coupled by the inter-crystal bus 404. In addition, each crystal 406 comprises an inter-crystal bus unit 402 that interfaces the respective crystal 406 to the inter-crystal bus 404. Further, each crystal 406 comprises its own control module 104 in its non-core 103, coupled to its respective cores 102 and inter-crystal bus unit 402. In the embodiment of Fig. 4, crystal A 406A comprises four cores 102, namely core A 102A, core B 102B, core C 102C and core D 102D, which are coupled to control module A 104A, which is in turn coupled to inter-crystal bus unit A 402A; similarly, crystal B 406B comprises four cores 102, namely core E 102E, core F 102F, core G 102G and core H 102H, which are coupled to control module B 104B, which is in turn coupled to inter-crystal bus unit B 402B. Finally, each control module 104 comprises not only a synchronous working storage 108 for each core on its own crystal 406, but also a synchronous working storage 108 for each core on the other crystal 406; the latter are the shadow working storages (shadow registers) shown in Fig. 4. Thus, each control module of the embodiment of Fig. 4 comprises eight synchronous working storages 108, denoted 108A, 108B, 108C, 108D, 108E, 108F, 108G and 108H. In control module A 104A, synchronous working storages 108E, 108F, 108G and 108H are the shadow working storages, whereas in control module B 104B, synchronous working storages 108A, 108B, 108C and 108D are the shadow working storages.
When a core 102 writes a value to its synchronous working storage 108, the control module 104 on the crystal 406 of that core 102 writes the value, via the inter-crystal bus unit 402 and the inter-crystal bus 404, to the corresponding shadow working storage 108 on the other crystal 406. In addition, if the disable core bit 236 is set in the value propagated to the shadow synchronous working storage 108, the control module 104 also updates the corresponding activation bit 254 in the configuration working storage 112. In this way, the occurrence of a synchronous condition, including a cross-crystal (trans-die) synchronous condition, can be detected even when the core configuration of the microprocessor 100 changes dynamically (for example, Figs. 14 to 16). In one embodiment, the inter-crystal bus 404 is a relatively low-speed bus, so that the propagation may take a predetermined amount of time on the order of 100 core clock cycles, and each control module 104 comprises a state machine that takes a predetermined amount of time to detect the occurrence of the synchronous condition and turn on the clock signals to all the cores 102 on its respective crystal 406. Preferably, after the control module 104 on the local crystal 406 (that is, the crystal 406 containing the writing core 102) begins writing the value to the other crystal 406 (for example, once it has been granted the inter-crystal bus 404), it is configured to delay updating the local synchronous working storage 108 for a predetermined amount of time (for example, the sum of the propagation time and the state machine's synchronous condition detection time). In this manner, the control modules 104 on both crystals detect the occurrence of a synchronous condition at the same time and turn on the clock signals to all the cores 102 on both crystals 406 at the same time. This may be particularly useful for debugging purposes when attempting to recreate and fix bugs that appear only because of multi-core interactions and do not appear when a single core runs. Figs. 5 and 6 describe an embodiment that may take advantage of this capability.
debugging operations
The cores 102 of the microprocessor 100 are configured to perform individual debug operations, such as breakpoints on instruction execution and data access. In addition, the microprocessor 100 is configured to perform trans-core debug operations, that is, debug operations that involve more than one core 102 of the microprocessor 100.
Please refer to Fig. 5, which is a flowchart showing operation of the microprocessor 100 to dump debug information. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description, so that the cores collectively dump the state of the microprocessor 100. More specifically, Fig. 5 describes the operation of the core that receives the request to dump debug information, whose flow begins at square 502, and the operation of the other cores 102, whose flow begins at square 532.
At square 502, one of the cores 102 receives a request to dump debug information. Preferably, the debug information comprises the state of the core 102 or a subset thereof. Preferably, the debug information is dumped to system memory or onto an external bus that can be monitored by debug equipment, such as a logic analyzer. In response to the request, the core 102 sends a debug dump message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 traps to microcode in response to the request to dump debug information (at square 502) or in response to the interrupt (at square 532), and remains in microcode, during which time interrupts are disabled (that is, the microcode does not allow itself to be interrupted), until square 528. In one embodiment, the core 102 takes interrupts only when it is asleep and at architectural instruction boundaries. In one embodiment, the various inter-core messages described herein (such as the message at square 502 and others, such as those at squares 702, 1502, 2606 and 3206) are sent and received via the synchronous condition or C-state field 226 of the control word of the synchronous working storage 108. In other embodiments, inter-core messages are sent and received via the non-core private random access memory 116. Flow proceeds from square 502 to square 504.
At square 532, one of the other cores 102 (that is, a core 102 other than the one that received the debug dump request at square 502) is interrupted and receives the debug dump message as a result of the inter-core interrupt and message sent at square 502. As noted above, although the flow at square 532 is described from the perspective of a single core 102, each of the other cores 102 (that is, each core 102 not at square 502) is interrupted at square 532, receives the message, and performs the steps of squares 504 to 528. Flow proceeds from square 532 to square 504.
In square 504, core 102 writes the synchronization request of a synchronous situation 1 (being denoted as SYNC1 in Figure 5) in its synchronous working storage 108.Therefore, this control module 104 makes core 102 enter sleep state.Flow process proceeds to square 506.
In square 506, when all core has write SYNC1, core 102 waken up by control module 104.Flow process proceeds to square 508.
In square 508, its state of core 102 dump is in storer.Flow process proceeds to square 514.
In square 514, core 102 writes a SYNC2, and it causes control module 104 to make core 102 enter sleep state.Flow process proceeds to square 516.
In square 516, when all core has write SYNC2, core 102 waken up by control module 104.Flow process proceeds to square 518.
In square 518, core 102 dump this storage address of Debugging message in square 508 sets a flag (flag), resets (Reset) signal and maintains, then reset itself by one.Core 102 resets microcode, and this microcode is detected this flag and is again loaded into its state by stored storage address.Flow process proceeds to square 524.
At block 524, the core 102 writes a SYNC3, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 526.
At block 526, the control unit 104 wakes the core 102 once all the cores have written SYNC3. Flow proceeds to block 528.
At block 528, the core 102 comes out of reset with the state reloaded at block 518 and begins fetching architectural (e.g., x86) instructions. Flow ends at block 528.
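The three sync points of Fig. 5 can be sketched in software as three barriers. The following is only an illustrative model, not the hardware mechanism: Python threads stand in for the cores 102, a `threading.Barrier` stands in for the control unit 104 waking all the cores once every core has written the same SYNCn, and the names `run_debug_dump`, `core_flow` and `record` are hypothetical. A software barrier of this kind does not provide clock-level synchronization.

```python
import threading


def run_debug_dump(num_cores=3):
    """Model blocks 504-528: every core passes three sync points in lockstep."""
    # One Barrier per sync condition stands in for control unit 104, which
    # releases the cores only after all of them have written the same SYNCn.
    sync1, sync2, sync3 = (threading.Barrier(num_cores) for _ in range(3))
    log = []                      # (phase, core) events in the order they occur
    lock = threading.Lock()

    def record(phase, core):
        with lock:
            log.append((phase, core))

    def core_flow(core):
        sync1.wait()                  # blocks 504/506: write SYNC1, sleep, wake
        record("dump", core)          # block 508: dump state to memory
        sync2.wait()                  # blocks 514/516: write SYNC2, sleep, wake
        record("reset_reload", core)  # block 518: reset and reload state
        sync3.wait()                  # blocks 524/526: write SYNC3, sleep, wake
        record("resume", core)        # block 528: resume fetching instructions

    threads = [threading.Thread(target=core_flow, args=(c,))
               for c in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for phase, core in run_debug_dump():
        print(f"core {core}: {phase}")
```

Because each core records its step before reaching the next barrier, no core begins a later phase until every core has finished the earlier one; the order of cores within a phase is not fixed.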
Refer to Fig. 6, a timing diagram illustrating an example of the operation of the microprocessor 100 according to the flowchart of Fig. 5. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. It should be understood, however, that in other embodiments the microprocessor 100 may include a different number of cores 102. In the timing diagram, events proceed in time as follows.
Core 0 receives a debug dump request and in response sends a debug dump message and an interrupt to core 1 and core 2 (per block 502). Core 0 then writes a SYNC1 and goes to sleep (per block 504).
Core 1 and core 2 are each eventually interrupted from their current tasks and read the message (per block 532). In response, each of them writes a SYNC1 and goes to sleep (per block 504). As shown, the times at which the cores write SYNC1 may differ, for example because of the instruction that is executing when the interrupt is asserted.
Once all the cores have written SYNC1, the control unit 104 wakes all the cores simultaneously (per block 506). Each core then dumps its state to memory (per block 508), writes a SYNC2 and goes to sleep (per block 514). The amount of time needed to dump the state may differ; therefore, the times at which the cores write SYNC2 may differ, as shown.
Once all the cores have written SYNC2, the control unit 104 wakes all the cores simultaneously (per block 516). Each core then resets itself and reloads its state from memory (per block 518), writes a SYNC3 and goes to sleep (per block 524). As shown, the amount of time needed to reset and reload the state may differ; therefore, the times at which the cores write SYNC3 may differ.
Once all the cores have written SYNC3, the control unit 104 wakes all the cores simultaneously (per block 526). Each core then begins fetching architectural instructions at the point at which it was interrupted (per block 528).
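The timing behavior described for Fig. 6 reduces to a simple rule: each simultaneous wake-up happens at the moment the slowest core writes its sync request. The sketch below works through that arithmetic; the function name `wake_times` and the duration values are illustrative assumptions, in arbitrary time units, and are not taken from the figure.

```python
def wake_times(phase_durations):
    """Return the time of each simultaneous wake-up.

    phase_durations[p][c] is the time core c needs, after the previous
    wake-up, before it writes the sync request that ends phase p.
    """
    now = 0
    wakes = []
    for durations in phase_durations:
        # Each core writes its sync request at a different time...
        arrivals = [now + d for d in durations]
        # ...and all cores are woken together when the last one has written.
        now = max(arrivals)
        wakes.append(now)
    return wakes


# Hypothetical per-core durations for the phases ending in SYNC1, SYNC2, SYNC3.
example = [
    [0, 3, 5],   # time until each core writes SYNC1
    [4, 2, 6],   # time to dump state, then write SYNC2
    [1, 7, 3],   # time to reset and reload, then write SYNC3
]
print(wake_times(example))  # [5, 11, 18]
```

Cores that finish a phase early simply sleep until the wake-up time, which is why the sync writes in the diagram are staggered while the wake-ups are aligned.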
The traditional solution for synchronizing operations among multiple processors is to use software semaphores. A drawback of that solution, however, is that it cannot provide clock-level synchronization. An advantage of the embodiments described herein is that the control unit 104 can turn on the clock signals 122 to all the cores 102 simultaneously.
In the manner described above, an engineer debugging the microprocessor 100 may configure one of the cores 102 to periodically generate checkpoints at which it generates debug dump requests, for example after a predetermined number of instructions have executed. While the microprocessor 100 is running, the engineer captures in a log file all activity on the external bus of the microprocessor 100. The portion of the log file near the time a bug is suspected to have occurred may be provided to a software simulator that simulates the microprocessor 100 to help the engineer debug. The simulator simulates the execution of the instructions by each core 102 and uses the logged information to simulate the transactions on the external bus of the microprocessor 100. In one embodiment, the simulators for all the cores 102 are started simultaneously from a reset point. It is therefore highly desirable that all the cores 102 of the microprocessor 100 actually come out of reset at the same time (e.g., after SYNC2). Furthermore, because a core 102 waits to dump its state until all the other cores 102 have stopped their current tasks (e.g., after SYNC1), the state dump by one core 102 does not interfere with the code and/or hardware being debugged on the other cores (e.g., through shared memory bus or cache interactions), which may increase the likelihood of reproducing the bug and determining its cause. Similarly, because each core waits to begin fetching architectural instructions until all the cores 102 have finished reloading their state (e.g., after SYNC3), the state reload by one core 102 does not interfere with the code and/or hardware being debugged on the other cores, which may increase the likelihood of reproducing the bug and determining its cause.
These benefits are advantages over prior methods, such as that of U.S. Patent No. 8,370,684, which is hereby incorporated by reference in its entirety for all purposes and which does not enjoy the benefits obtainable from synchronization requests among the cores.
Cache control operations
The cores 102 of the microprocessor 100 are configured to perform individual cache control operations, such as operations on local cache memories, i.e., caches that are not shared by two or more cores 102. In addition, the microprocessor 100 is configured to perform trans-core cache control operations, i.e., operations that involve more than one core 102 of the microprocessor 100, for example because they involve a shared cache memory 119.
Refer to Figs. 7A-7B, a flowchart illustrating operation of the microprocessor 100 to perform a trans-core cache control operation. The embodiment of Figs. 7A-7B describes how the microprocessor 100 performs an x86 architecture Write Back and Invalidate Cache (WBINVD) instruction. A WBINVD instruction instructs the core 102 executing it to write back all modified cache lines in the cache memories of the microprocessor 100 to system memory and to invalidate, or flush, the cache memories. The WBINVD instruction also instructs the core 102 to issue special bus cycles that direct any cache memories external to the microprocessor 100 to write back their modified data and invalidate it. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to this description so that the cores collectively write back modified cache lines and invalidate the cache memories of the microprocessor 100. More specifically, Figs. 7A-7B describe the operation of the core that encounters the WBINVD instruction, whose flow begins at block 702, and the operation of the other cores 102, whose flow begins at block 752.
At block 702, one of the cores 102 encounters a WBINVD instruction. In response, the core 102 sends a WBINVD instruction message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 stays in microcode with interrupts disabled (i.e., the microcode does not allow itself to be interrupted) from the time it traps to microcode in response to the WBINVD instruction (at block 702) or in response to the interrupt (at block 752) until flow reaches block 748/749. Flow proceeds from block 702 to block 704.
At block 752, one of the other cores 102 (i.e., a core 102 other than the one that encountered the WBINVD instruction at block 702) is interrupted by the inter-core interrupt sent at block 702 and receives the WBINVD instruction message. As noted above, although the flow at block 752 is described from the perspective of a single core 102, each of the other cores 102 (i.e., every core 102 not at block 702) is interrupted and receives the message at block 752, and performs the steps of blocks 704 through 749. Flow proceeds from block 752 to block 704.
At block 704, the core 102 writes a sync request for sync condition 4 (denoted SYNC4 in Figs. 7A-7B) to its sync register 108. In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 706.
At block 706, the control unit 104 wakes the core 102 once all the cores 102 have written SYNC4. Flow proceeds to block 708.
At block 708, the core 102 writes back and invalidates its local cache memories, e.g., the level-1 (L1) cache memories that the core 102 does not share with other cores 102. Flow proceeds to block 714.
At block 714, the core 102 writes a SYNC5, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 716.
At block 716, the control unit 104 wakes the core 102 once all the cores 102 have written SYNC5. Flow proceeds to decision block 717.
At decision block 717, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction at block 702 (as opposed to a core 102 that received the WBINVD instruction message at block 752). If so, flow proceeds to block 718; otherwise, flow proceeds to block 724.
At block 718, the core 102 writes back and invalidates the shared cache memory 119. In one embodiment, the microprocessor 100 comprises multiple dies, and a cache memory is shared by the cores 102 of a die but not by all the cores 102 of the microprocessor 100, as described above. In such an embodiment, an intermediate operation (not shown) similar to blocks 717 through 726 is performed, in which one of the cores 102 on each die writes back and invalidates the shared cache of that die while the other core or cores of the die go back to sleep, in a manner similar to block 724, to wait until the cache has been invalidated. Flow proceeds to block 724.
At block 724, the core 102 writes a SYNC6, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 726.
At block 726, the control unit 104 wakes the core 102 once all the cores 102 have written SYNC6. Flow proceeds to decision block 727.
At decision block 727, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction at block 702 (as opposed to a core 102 that received the WBINVD instruction message at block 752). If so, flow proceeds to block 728; otherwise, flow proceeds to block 744.
At block 728, the core 102 issues the special bus cycles that cause external caches to be written back and invalidated. Flow proceeds to block 744.
At block 744, the core 102 writes a SYNC13, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 746.
At block 746, the control unit 104 wakes the core 102 once all the cores 102 have written SYNC13. Flow proceeds to decision block 747.
At decision block 747, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction at block 702 (as opposed to a core 102 that received the WBINVD instruction message at block 752). If so, flow proceeds to block 748; otherwise, flow proceeds to block 749.
At block 748, the core 102 completes the WBINVD instruction, which includes retiring the WBINVD instruction and may include relinquishing ownership of a hardware semaphore (see Fig. 20). Flow ends at block 748.
At block 749, the core 102 resumes the task it was performing before it was interrupted at block 752. Flow ends at block 749.
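The division of work in Figs. 7A-7B, in which every core handles its own local cache but only the core that encountered the instruction handles the shared and external caches, can be summarized as an ordered list of steps. This is a minimal sequential sketch under stated assumptions: the function name `wbinvd_flow` and the step strings are hypothetical, and the real cores run concurrently between sync points rather than one after another.

```python
def wbinvd_flow(num_cores, initiator):
    """Return ordered (phase, core, action) steps for the flow of Figs. 7A-7B.

    `initiator` is the core that encountered the WBINVD instruction (block 702);
    all other cores entered the flow through the interrupt (block 752).
    """
    steps = []
    # After all cores write SYNC4: each core handles its own local cache (block 708).
    for core in range(num_cores):
        steps.append(("after SYNC4", core, "write back and invalidate local cache"))
    # After SYNC5: only the initiator handles the shared cache (blocks 717/718).
    steps.append(("after SYNC5", initiator,
                  "write back and invalidate shared cache 119"))
    # After SYNC6: only the initiator issues the special bus cycles (blocks 727/728).
    steps.append(("after SYNC6", initiator,
                  "issue special bus cycles for external caches"))
    # After SYNC13: the initiator retires WBINVD, the others resume (blocks 748/749).
    for core in range(num_cores):
        action = "retire WBINVD" if core == initiator else "resume interrupted task"
        steps.append(("after SYNC13", core, action))
    return steps


if __name__ == "__main__":
    for phase, core, action in wbinvd_flow(num_cores=3, initiator=0):
        print(f"{phase}: core {core}: {action}")
```

Each sync point guarantees that the previous phase is finished on every core before the next one starts, so the shared cache is not flushed while a local cache is still being written back.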
Refer to Fig. 8, a timing diagram illustrating operation of the microprocessor 100 according to the flowchart of Figs. 7A-7B. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. It should be understood, however, that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 encounters a WBINVD instruction and in response sends a WBINVD instruction message and interrupts core 1 and core 2 (per block 702). Core 0 then writes a SYNC4 and goes to sleep (per block 704).
Core 1 and core 2 are each eventually interrupted from their current tasks and read the message (per block 752). In response, each of them writes a SYNC4 and goes to sleep (per block 704). As shown, the times at which the cores write SYNC4 may differ.
Once all the cores have written SYNC4, the control unit 104 wakes all the cores simultaneously (per block 706). Each core then writes back and invalidates its local caches (per block 708), writes a SYNC5 and goes to sleep (per block 714). The amount of time needed to write back and invalidate the caches may differ; therefore, the times at which the cores write SYNC5 may differ, as shown.
Once all the cores have written SYNC5, the control unit 104 wakes all the cores simultaneously (per block 716). Only the core that encountered the WBINVD instruction writes back and invalidates the shared cache memory 119 (per block 718), and all the cores write a SYNC6 and go to sleep (per block 724). Because only one core writes back and invalidates the shared cache memory 119, the times at which the cores write SYNC6 may differ.
Once all the cores have written SYNC6, the control unit 104 wakes all the cores simultaneously (per block 726). Only the core that encountered the WBINVD instruction completes the WBINVD instruction (per block 748), and all the other cores resume the processing they were performing before the interrupt.
It should be understood that, although embodiments have been described in which the cache control instruction is an x86 WBINVD instruction, other embodiments are contemplated in which sync requests are used to perform other cache control instructions. For example, the microprocessor 100 may perform similar actions to execute an x86 INVD instruction, which simply invalidates the caches without writing back the cache data (at blocks 708 and 718). As another example, the cache control instruction may belong to an instruction set architecture other than the x86 architecture.
Power management operations
The cores 102 of the microprocessor 100 are configured to individually perform power-reducing actions such as, but not limited to, ceasing to execute instructions, requesting the control unit 104 to stop sending clock signals to the core 102, requesting the control unit 104 to remove power from the core 102, writing back and invalidating the local (i.e., non-shared) cache memories of the core 102, and saving the state of the core 102 to a memory external to the core, such as the private random access memory (PRAM) 116. When a core 102 has performed one or more core-specific power-reducing actions, it has entered a "core" C-state (also referred to as a core idle state or core sleep state). In one embodiment, the C-state values may correspond roughly to the well-known processor states of the Advanced Configuration and Power Interface (ACPI) specification, but may also include finer granularity. Typically, a core 102 enters a core C-state in response to a request from the operating system. For example, the x86 architecture monitor wait (MWAIT) instruction is a power management instruction that provides a hint, namely a target C-state, to the core 102 executing the instruction to allow the microprocessor 100 to enter an optimized state, such as a lower power-consumption state. In the case of an MWAIT instruction, the target C-states are proprietary rather than ACPI C-states. Core C-state 0 (C0) corresponds to the running state of the core 102, and increasingly larger C-state values correspond to increasingly less active or responsive states (e.g., C1, C2, C3 and so forth). A less responsive or less active state is a configuration or operating state that saves more power than a more active or responsive state, or that is relatively less responsive for some reason (e.g., it has a longer wake-up latency or is less fully enabled). Examples of power-saving actions a core 102 may take are ceasing to execute instructions, stopping its clock signals, lowering its voltage, and/or removing power from portions of the core (e.g., functional units and/or local caches) or from the entire core.
In addition, the microprocessor 100 is configured to perform trans-core power-reducing actions. Trans-core power-reducing actions implicate or affect multiple cores 102 of the microprocessor 100. For example, the shared cache memory 119 may be large and consume a relatively large amount of power. Significant power can therefore be saved by removing the clock signal and/or power supplied to the shared cache memory 119. However, before the clock signal and/or power to the shared cache memory 119 can be removed, all the cores 102 that share it must agree, so that data coherency is maintained. Embodiments are contemplated in which the microprocessor 100 includes other shared power-related resources, such as shared clocks and power sources. In one embodiment, the microprocessor 100 is coupled to a system chipset that includes a memory controller, peripheral controllers and/or a power management controller. In other embodiments, one or more of these controllers are integrated into the microprocessor 100. System power can be saved when the microprocessor 100 notifies the controllers so that they can take power-saving actions. For example, the microprocessor 100 may notify the controllers that it has invalidated and turned off its caches so that they need not be snooped.
In addition to the notion of a core C-state, the microprocessor 100 as a whole has a "package" C-state (also referred to as a package idle state or package sleep state). The package C-state corresponds to the lowest (i.e., highest power-consuming) common core C-state of the cores 102 (see, e.g., field 246 of Fig. 2 and block 318 of Fig. 3). However, in addition to the core-specific power-reducing actions, the package C-state involves the microprocessor 100 performing one or more trans-core power-reducing actions. Examples of trans-core power-saving actions associated with package C-states include turning off a phase-locked loop (PLL) that generates clock signals, and flushing the shared cache memory 119 and stopping its clock and/or power, which allows the memory and peripheral controllers to refrain from snooping the local and shared caches of the microprocessor 100. Other examples are changing the voltage, frequency and/or bus clock ratio; reducing the size of a cache memory such as the shared cache memory 119; and running the shared cache memory 119 at half speed.
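Since larger C-state values denote deeper sleep, the "lowest common" core C-state is simply the minimum of the values requested by the individual cores: the package can go no deeper than its shallowest core. A minimal sketch, in which the function name `lowest_common_c_state` and the example values are illustrative assumptions:

```python
def lowest_common_c_state(core_c_states):
    """Return the package-level C-state implied by the per-core C-states.

    Larger values mean deeper sleep, so the shallowest (smallest) requested
    value is the deepest state that every core has agreed to.
    """
    if not core_c_states:
        raise ValueError("at least one core C-state is required")
    if any(x < 0 for x in core_c_states):
        raise ValueError("C-state values are non-negative integers")
    return min(core_c_states)


# Three cores requesting C4, C3 and C2 can jointly enter no deeper than C2.
print(lowest_common_c_state([4, 3, 2]))  # 2
```

A single core still in C0 (running) therefore keeps the result at 0, which corresponds to the package not sleeping at all.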
In many cases, the operating system is effectively limited to executing instructions on individual cores 102. It can therefore put individual cores to sleep (i.e., into a core C-state), but it has no means of directly putting the microprocessor 100 to sleep (i.e., into a package C-state). Advantageously, embodiments are described in which the cores 102 of the microprocessor 100 work cooperatively, with the help of the control unit 104, to detect when all the cores 102 have entered a core C-state and are ready for trans-core power-saving actions to occur.
Refer to Fig. 9, a flowchart illustrating operation of the microprocessor 100 to enter a low-power package C-state. The embodiment of Fig. 9 is described using the example of MWAIT instructions executed by a microprocessor 100 that is coupled to a chipset. It should be understood, however, that in other embodiments the operating system employs other power management instructions, and the master core 102 communicates with controllers integrated into the microprocessor 100 using a handshake protocol different from the one described.
The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 may encounter an MWAIT instruction and operates according to this description so that the cores collectively put the microprocessor 100 into the optimized state. Flow begins at block 902.
At block 902, a core 102 encounters an MWAIT instruction that specifies a target C-state, denoted Cx in Fig. 9, where x is a non-negative integer. Flow proceeds to block 904.
At block 904, the core 102 writes to its sync register 108 a sync request with the C bit 224 set and a C-state field 226 value of x (denoted SYNC Cx in Fig. 9). In addition, the sync request specifies in its wake events field 204 that the core 102 is to be awakened on all wake events. In response, the control unit 104 puts the core 102 to sleep. Preferably, the core 102 writes back and invalidates its local cache memories before writing the SYNC Cx. Flow proceeds to block 906.
At block 906, the control unit 104 wakes the core 102 once all the cores 102 have written a SYNC Cx. As noted above, the x values written by the cores 102 may differ, and the control unit 104 posts the lowest common C-state value to the lowest common C-state field 246 of the status word 242 in the status register 106 (per block 318). Before block 906, while the core 102 is asleep, it may be awakened by a wake event, such as an interrupt (e.g., blocks 305 and 306). More specifically, there is no guarantee that the operating system will execute an MWAIT instruction on all the cores 102, which would allow the microprocessor 100 to perform the power-saving actions associated with a package C-state, before a wake event (e.g., an interrupt) directed to one of the cores 102 effectively cancels its MWAIT instruction. However, once the core 102 has been awakened at block 906, the core 102 (indeed, every core 102) is still executing microcode as a result of the MWAIT instruction (at block 902) and remains in microcode with interrupts disabled (i.e., the microcode does not allow itself to be interrupted) until block 924. In other words, as long as fewer than all the cores 102 have received an MWAIT instruction to go to sleep, individual cores 102 may sleep, but the microprocessor 100 as a package does not indicate to the chipset that it is ready to enter a package sleep state. However, once all the cores 102 have agreed to enter a package sleep state, which is effectively indicated by the occurrence of the sync condition at block 906, the master core 102 is allowed to complete the package sleep state handshake protocol with the chipset (e.g., blocks 908, 909 and 921 below) without being interrupted and without any other core 102 being interrupted. Flow proceeds to decision block 907.
At decision block 907, the core 102 determines whether it is the master core 102 of the microprocessor 100. Preferably, a core 102 is the master core 102 if it determined at reset time that it is the bootstrap processor (BSP). If the core is the master core, flow proceeds to block 908; otherwise, flow proceeds to block 914.
At block 908, the master core 102 writes back and invalidates the shared cache memory 119 and then communicates to the chipset that it may take appropriate actions to reduce power consumption. For example, because all the caches of the microprocessor 100 remain invalid while the microprocessor 100 is in the package C-state, the memory controller and/or peripheral controllers can refrain from snooping the local and shared caches of the microprocessor 100. As another example, the chipset may signal the microprocessor 100 to cause it to take power-saving actions (e.g., by asserting x86-style STPCLK, SLP, DPSLP, NAP and VRDSLP signals, as described below). Preferably, the core 102 communicates the power management information based on the value of the lowest common C-state field 246. In one embodiment, the core 102 issues an I/O read bus cycle to an I/O address that provides the chipset with the relevant power management information, e.g., the package C-state value. Flow proceeds to block 909.
At block 909, the master core 102 waits for the chipset to assert the STPCLK signal. Preferably, if the STPCLK signal has not been asserted after a predetermined number of clock cycles, the control unit 104 detects this condition, terminates the sync requests in progress, wakes all the cores 102 and indicates the error in the error code field 248. Flow proceeds to block 914.
At block 914, the core 102 writes a SYNC14. In one embodiment, the sync request specifies in its wake events field 204 that the core 102 is not to be awakened on any wake event. In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 916.
At block 916, the control unit 104 wakes the core 102 once all the cores 102 have written a SYNC14. Flow proceeds to decision block 919.
At decision block 919, the core 102 determines whether it is the master core 102 of the microprocessor 100. If so, flow proceeds to block 921; otherwise, flow proceeds to block 924.
At block 921, the master core 102 issues a stop grant cycle on the microprocessor 100 bus to notify the chipset that it may take trans-core (i.e., package-wide) power-saving actions that affect the microprocessor 100 as a whole, such as refraining from snooping the caches of the microprocessor 100, removing the bus clock (e.g., x86-style BCLK) to the microprocessor 100, and asserting other signals on the bus (e.g., x86-style SLP, DPSLP, NAP, VRDSLP) to cause the microprocessor 100 to remove clocks and/or power from various portions of the microprocessor 100. The embodiments described herein involve a handshake protocol between the microprocessor 100 and the chipset that comprises an I/O read (at block 908), the assertion of STPCLK (at block 909) and the issuing of a stop grant cycle (at block 921), a protocol historically associated with x86 architecture systems. It should be understood, however, that other embodiments are contemplated that involve systems of other instruction set architectures with different protocols, in which it is likewise possible to save power, increase performance and/or reduce complexity. Flow proceeds to block 924.
At block 924, the core 102 writes a sleep request (i.e., a request with the sleep bit 212 set and the S bit 222 clear) to its sync register 108. In addition, the request specifies in its wake events field 204 that the core 102 is to be awakened only on the wake event of the de-assertion of STPCLK. In response, the control unit 104 puts the core 102 to sleep. Flow ends at block 924.
Refer to Fig. 10, a timing diagram illustrating an example of the operation of the microprocessor 100 according to the flowchart of Fig. 9. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. It should be understood, however, that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 encounters an MWAIT instruction specifying C-state 4 (MWAIT C4) (per block 902). Core 0 then writes a SYNC C4 and goes to sleep (per block 904). Core 1 encounters an MWAIT instruction specifying C-state 3 (MWAIT C3) (per block 902). Core 1 then writes a SYNC C3 and goes to sleep (per block 904). Core 2 encounters an MWAIT instruction specifying C-state 2 (MWAIT C2) (per block 902). Core 2 then writes a SYNC C2 and goes to sleep (per block 904). As shown, the times at which the cores write their SYNC Cx may differ. In fact, one or more of the cores may not encounter an MWAIT instruction before some other event, such as an interrupt, occurs.
Once all the cores have written a SYNC Cx, the control unit 104 wakes all the cores simultaneously (per block 906). The master core then issues the I/O read bus cycle (per block 908) and waits for the assertion of STPCLK (per block 909). All the cores write a SYNC14 and go to sleep (per block 914). Because only the master core flushes the shared cache memory 119, issues the I/O read bus cycle and waits for the assertion of STPCLK, the times at which the cores write SYNC14 may differ, as shown. In fact, the master core may write its SYNC14 on the order of hundreds of microseconds after the other cores.
Once all the cores have written SYNC14, the control unit 104 wakes all the cores simultaneously (per block 916). Only the master core issues the stop grant cycle (per block 921). All the cores write a sleep request that waits for the de-assertion of STPCLK (~STPCLK) and go to sleep (per block 924). Because only the master core issues the stop grant cycle, the times at which the cores write the sleep request may differ, as shown.
When the STPCLK signal is de-asserted, the control unit 104 wakes all the cores.
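The sequence of Figs. 9 and 10 can be laid out as an ordered trace in which the master core alone performs the chipset handshake while every core takes part in the sync points. This is a simplified, sequential sketch: the function name `package_sleep_trace` and the trace strings are hypothetical, and error handling (such as STPCLK never being asserted) is omitted.

```python
def package_sleep_trace(requested, master=0):
    """Return an ordered trace of the package sleep sequence of Fig. 9.

    `requested` maps each core number to the target C-state of its MWAIT
    instruction; `master` is the core that performs the chipset handshake.
    """
    trace = []
    for core, cx in requested.items():                      # blocks 902/904
        trace.append(f"core {core}: write SYNC C{cx}, sleep")
    lowest = min(requested.values())                        # block 906
    trace.append(f"control unit: wake all, lowest common C-state = C{lowest}")
    trace.append(f"core {master}: flush shared cache, issue I/O read, "
                 "wait for STPCLK")                         # blocks 908/909
    for core in requested:                                  # block 914
        trace.append(f"core {core}: write SYNC14, sleep")
    trace.append("control unit: wake all")                  # block 916
    trace.append(f"core {master}: issue stop grant cycle")  # block 921
    for core in requested:                                  # block 924
        trace.append(f"core {core}: write sleep request, wake on ~STPCLK")
    return trace


if __name__ == "__main__":
    # The MWAIT targets of the three-core example: C4, C3 and C2.
    for line in package_sleep_trace({0: 4, 1: 3, 2: 2}):
        print(line)
```

The two "wake all" entries mark the points after which no core can be interrupted until the handshake is complete, which is what lets the master core run the protocol undisturbed.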
As may be observed from Fig. 10, core 1 and core 2 can advantageously sleep for a significant period while core 0 performs the handshake protocol. It should be noted, however, that the time required to wake the microprocessor 100 from a package sleep state is generally proportional to how deep the sleep state is (i.e., how much power it saves). Therefore, where the package sleep state is relatively deep (or even where the sleep state of an individual core 102 is relatively deep), it is desirable to further reduce the number of wake-ups and/or the wake-up time associated with the handshake protocol. Fig. 11 describes an embodiment in which a single core 102 handles the handshake protocol while the other cores 102 remain asleep. In addition, in the embodiment of Fig. 11, further power savings are obtained by reducing the number of cores 102 that are awakened in response to a wake event.
Refer to Fig. 11, a flowchart illustrating operation of the microprocessor 100 to enter a low-power package C-state according to another embodiment of the present invention. The embodiment of Fig. 11 is described using the example of MWAIT instructions executed by a microprocessor 100 that is coupled to a chipset. It should be understood, however, that in other embodiments the operating system employs other power management instructions, and the last-syncing core 102 communicates with controllers integrated into the microprocessor 100 using a handshake protocol different from the one described.
The embodiment of Fig. 11 is similar in some respects to the embodiment of Fig. 9. However, the embodiment of Fig. 11 is designed to facilitate potentially greater power savings in environments in which the operating system requests that the microprocessor 100 enter a very low power state and tolerates the associated latency. More specifically, the embodiment of Fig. 11 facilitates power-gating the cores and, when necessary (e.g., to handle an interrupt), waking only one of the cores. Embodiments are contemplated in which the microprocessor 100 supports operation in both the mode of Fig. 9 and the mode of Fig. 11. Furthermore, the mode may be configurable, whether at manufacturing time (e.g., by fuses 114), via software control, and/or automatically by the microprocessor 100 according to the particular C-state specified by the MWAIT instruction. Flow begins at block 1102.
At block 1102, the core 102 encounters an MWAIT instruction that specifies a target C-state, denoted Cx in Figure 11 (MWAIT Cx). Flow proceeds to block 1104.
At block 1104, the core 102 writes to its sync register 108 a synchronization request, denoted SYNC Cx in Figure 11, with the C bit 224 set and a C-state field 226 value of x. The synchronization request also has the selective wakeup (SEL WAKE) bit 214 and the PG bit 208 set. In addition, the synchronization request specifies in its wake events field 204 that the core 102 is to be awakened on all wake events except assertion of STPCLK and deassertion of STPCLK (~STPCLK). (Preferably, there are other wake events, such as AP startup, for which the synchronization request specifies that the core 102 is not to be awakened.) Consequently, the control unit 104 puts the core 102 to sleep, which includes power-gating the core 102 because the PG bit 208 is set. Additionally, before writing the synchronization request, the core 102 writes back and invalidates its local cache memories and saves its core 102 state (preferably to the private RAM 116). When the core 102 is subsequently awakened (for example, at block 1137, 1132, or 1106), it restores its state (for example, from the PRAM 116). As described above, particularly with respect to Figure 3, when the last core 102 writes a synchronization request with the selective wakeup bit 214 set, the control unit 104 automatically blocks all wake events for all cores 102 except the last-writing core 102 (per block 326). Flow proceeds to block 1106.
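The contents of the synchronization request written at block 1104 can be illustrated with a minimal sketch. The names below (`SyncRequest`, `make_sync_cx`, and the wake-event labels) are hypothetical and chosen only for illustration; the description does not give a bit layout for the sync register 108, so the request is modelled as a record rather than as a bit pattern.

```python
from dataclasses import dataclass

# Illustrative wake-event labels (assumption). The description names STPCLK
# assertion, STPCLK deassertion, interrupts, and AP startup as wake events.
ALL_WAKE_EVENTS = frozenset(
    {"INTERRUPT", "STPCLK_ASSERT", "STPCLK_DEASSERT", "AP_STARTUP"}
)


@dataclass(frozen=True)
class SyncRequest:
    c_bit: bool             # C bit 224
    c_state: int            # C-state field 226
    sel_wake: bool          # selective wakeup (SEL WAKE) bit 214
    power_gate: bool        # PG bit 208
    wake_events: frozenset  # wake events field 204


def make_sync_cx(x: int) -> SyncRequest:
    """Build the SYNC Cx request described for block 1104."""
    # Wake on every event except STPCLK assertion/deassertion and,
    # preferably, other events such as AP startup.
    excluded = {"STPCLK_ASSERT", "STPCLK_DEASSERT", "AP_STARTUP"}
    return SyncRequest(
        c_bit=True,
        c_state=x,
        sel_wake=True,
        power_gate=True,
        wake_events=ALL_WAKE_EVENTS - excluded,
    )
```

For example, `make_sync_cx(7)` models the SYNC C7 request used in the timing examples: it leaves interrupts as a wake source while masking both STPCLK events.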
At block 1106, when all the cores 102 have written a SYNC Cx, the control unit 104 wakes the last-writing core 102. As described above, the control unit 104 keeps the S bits 222 of the other cores 102 set even though it wakes the last-writing core 102 and clears its S bit 222. Before block 1106, a core 102 that is asleep can be awakened by a wake event such as an interrupt. However, once the core 102 is awakened at block 1106, it is still executing microcode because of the MWAIT instruction (block 1102) and remains in microcode, with interrupts disabled (that is, the microcode does not allow itself to be interrupted), until block 1124. In other words, as long as not all of the cores 102 have received an MWAIT instruction to go to sleep, only individual cores 102 sleep, and the microprocessor 100 as a package does not indicate to the chipset that it is ready to enter a package sleep state. However, once all the cores 102 have agreed to enter a package sleep state, as indicated by the occurrence of the sync condition at block 1106, the core 102 (the last-writing core 102 awakened at block 1106, whose write caused the sync condition to occur) is allowed to complete the package sleep state handshake protocol with the chipset (for example, blocks 1108, 1109, and 1121 below) without being interrupted and without any other core 102 being interrupted. Flow proceeds to block 1108.
At block 1108, the core 102 writes back and invalidates the shared cache memory 119 and then communicates with the chipset, which may take appropriate action to reduce power consumption. Flow proceeds to block 1109.
At block 1109, the core 102 waits for the chipset to assert the STPCLK signal. Preferably, if the STPCLK signal is not asserted after a predetermined number of clock cycles, the control unit 104 detects this condition, terminates the pending synchronization requests, wakes all the cores 102, and indicates the error in the error code field 248. Flow proceeds to block 1121.
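The timeout behaviour described for block 1109 can be sketched as a bounded wait. This is an illustrative model only: the function name `wait_for_stpclk`, the representation of STPCLK as one sampled value per clock cycle, and the returned status strings are assumptions, not part of the described hardware.

```python
def wait_for_stpclk(stpclk_samples, timeout_cycles):
    """Model of the bounded wait at block 1109.

    stpclk_samples: iterable of booleans, one per clock cycle, True when the
                    chipset asserts STPCLK (hypothetical representation).
    timeout_cycles: the predetermined number of clock cycles to wait.

    Returns ("asserted", cycle) if STPCLK is seen in time; otherwise
    ("timeout", None), the case in which the control unit would terminate
    the pending synchronization requests, wake all cores, and record an error.
    """
    for cycle, asserted in enumerate(stpclk_samples):
        if cycle >= timeout_cycles:
            break
        if asserted:
            return ("asserted", cycle)
    return ("timeout", None)
```

The point of the bound is that a chipset that never asserts STPCLK cannot leave every core asleep indefinitely; the error path restores all cores to a running state.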
At block 1121, the core 102 issues a stop grant cycle to the chipset on the bus. Flow proceeds to block 1124.
At block 1124, the core 102 writes a sleep request to the sync register 108, that is, a request with the sleep bit 212 set, the S bit 222 clear, and the PG bit 208 set. In addition, the request specifies in its wake events field 204 that the core 102 is to be awakened only on the wake event of STPCLK deassertion. Consequently, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1132.
At block 1132, the control unit 104 detects the deassertion of STPCLK and wakes the core 102. It should be noted that before the control unit 104 wakes the core 102, it also un-power-gates the core 102. Advantageously, the core 102 is now the only core running, which gives it the opportunity to perform any actions that must be performed while no other cores 102 are running. Flow proceeds to block 1134.
At block 1134, the core 102 writes to a register (not shown) of the control unit 104 to unblock the wake events of each of the other cores 102, as specified in the wake events field 204 of its respective sync register 108. Flow proceeds to block 1136.
At block 1136, the core 102 handles any pending wake events directed to it. For example, in one embodiment, the system that includes the microprocessor 100 permits both directed interrupts (for example, interrupts directed to a particular core of the microprocessor 100) and non-directed interrupts (for example, interrupts that may be handled by any core 102 the microprocessor 100 selects). An example of a non-directed interrupt is what is commonly called a "low priority interrupt." In one embodiment, the microprocessor 100 preferably directs non-directed interrupts to the single core 102 awakened at block 1132 on deassertion of STPCLK, because that core is already awake and can handle the interrupt, in the hope that the other cores 102 have no pending wake events and can therefore remain asleep and power-gated. Flow returns to block 1104.
When the wake events are unblocked at block 1134, if no wake events are pending for the cores 102 other than the core 102 awakened at block 1132, those cores 102 advantageously remain asleep and power-gated per block 1104. However, if a wake event directed to one of those cores 102 is pending when the wake events are unblocked at block 1134, that core is un-power-gated and awakened by the control unit 104. In that case, a different flow begins at block 1137 of Figure 11.
At block 1137, another core 102 (that is, a core 102 other than the one that unblocked the wake events at block 1134) is awakened after the wake events are unblocked at block 1134. The other core 102 handles any pending wake events directed to it, for example, by servicing an interrupt. Flow proceeds from block 1137 to block 1104.
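The selective-wakeup bookkeeping of the Figure 11 flow can be sketched as a small behavioural model. This is a simplification under stated assumptions: the class `ControlUnit` and its method names are hypothetical, only the S bits and the blocking of wake events are modelled, and the STPCLK handshake and power gating are omitted.

```python
class ControlUnit:
    """Toy model of selective wakeup (blocks 1104, 1106, 1134, 1137)."""

    def __init__(self, num_cores):
        self.s_bit = [False] * num_cores    # S bit per core: sync request pending
        self.blocked = [False] * num_cores  # wake events blocked for this core
        self.pending = [False] * num_cores  # a directed wake event is waiting

    def write_sync(self, core):
        """A core writes SYNC Cx with SEL WAKE set (block 1104).

        Returns the core awakened at block 1106, or None if the sync
        condition has not occurred yet.
        """
        self.s_bit[core] = True
        if not all(self.s_bit):
            return None
        # Sync condition: block wake events of every core but the last writer,
        # wake only the last writer, and clear only its S bit.
        for c in range(len(self.s_bit)):
            self.blocked[c] = c != core
        self.s_bit[core] = False
        return core

    def directed_interrupt(self, core):
        """Deliver a wake event directed to `core`; True if it wakes now."""
        if self.blocked[core]:
            self.pending[core] = True
            return False
        self.s_bit[core] = False  # waking the core clears its S bit
        return True

    def unblock_wake_events(self):
        """The awake core unblocks wake events (block 1134).

        Returns the cores awakened by pending events (block 1137).
        """
        woken = []
        for c in range(len(self.blocked)):
            self.blocked[c] = False
            if self.pending[c]:
                self.pending[c] = False
                self.s_bit[c] = False
                woken.append(c)
        return woken
```

In this model, a core that wakes only through the sync condition leaves the other S bits set, so its next SYNC Cx write immediately recreates the sync condition. A core awakened by a directed event has its S bit cleared, so the sync condition is not met until that core writes SYNC Cx again, which makes it the new last writer.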
Referring to Figure 12, a timing diagram illustrates an example of operation of the microprocessor 100 according to the flowchart of Figure 11. In the example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1, and core 2, as shown. It should be understood, however, that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 encounters an MWAIT instruction that specifies C-state 7 (MWAIT C7) (per block 1102). In the example, C-state 7 permits power-gating. Core 0 then writes a SYNC C7 with the selective wakeup bit 214 set (shown as "selective wake-up" in Figure 12) and the PG bit 208 set, then goes to sleep and is power-gated (per block 1104). Core 1 encounters an MWAIT instruction that specifies C-state 7 (per block 1102). Core 1 then writes a SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, then goes to sleep and is power-gated (per block 1104). Core 2 encounters an MWAIT instruction that specifies C-state 7 (per block 1102). Core 2 then writes a SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, then goes to sleep and is power-gated (per block 1104). (However, in the preferred embodiment described with respect to block 314, the last-writing core is not power-gated.) As shown, the cores may write their SYNC C7 requests at different times.
When the last-writing core writes the SYNC C7 with the selective wakeup bit 214 set, the control unit 104 blocks the wake events of all cores except the last-writing core (per block 326), which is core 2 in the example of Figure 12. In addition, the control unit 104 wakes only the last-writing core (per block 1106), which saves power because the other cores remain asleep and power-gated while core 2 performs the handshake protocol with the chipset. Core 2 then issues the I/O read bus cycle (per block 1108) and waits for the assertion of STPCLK (per block 1109). In response to STPCLK, core 2 issues the stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set that waits on deassertion of STPCLK, and goes to sleep and is power-gated (per block 1124). The cores may remain asleep and power-gated for a relatively long time.
When STPCLK is deasserted, the control unit 104 wakes only core 2 (per block 1132). In the example of Figure 12, the chipset deasserts STPCLK in response to receiving a non-directed interrupt, which it forwards to the microprocessor 100. The microprocessor 100 directs the non-directed interrupt to core 2, which saves power because the other cores remain asleep and power-gated. Core 2 unblocks the wake events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 2 then again writes a SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, then goes to sleep and is power-gated (per block 1104).
When core 2 writes the SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, the synchronization requests of the other cores are still pending, that is, the S bits 222 of the other cores were not cleared by the wakeup of core 2. The control unit 104 therefore blocks the wake events of all cores except core 2, the last-writing core (per block 326). In addition, the control unit 104 wakes only core 2 (per block 1106). Core 2 then issues the I/O read bus cycle (per block 1108) and waits for the assertion of STPCLK (per block 1109). In response to STPCLK, core 2 issues the stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set that waits on deassertion of STPCLK, and goes to sleep and is power-gated (per block 1124).
When STPCLK is deasserted, the control unit 104 wakes only core 2 (per block 1132). In the example of Figure 12, STPCLK is deasserted because of another non-directed interrupt. The microprocessor 100 therefore directs the interrupt to core 2, which saves power. Core 2 again unblocks the wake events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 2 then again writes a SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, then goes to sleep and is power-gated (per block 1104).
This cycle continues for a considerable time, namely for as long as only non-directed interrupts are generated. Figure 13 shows an example of handling an interrupt directed to a core other than the last-writing core.
As can be seen by comparing Figure 10 and Figure 12, the embodiment of Figure 12 has the advantage that, once the cores 102 have gone to sleep (after writing the SYNC C7 in the example of Figure 12), only one core 102 is awakened again to perform the handshake protocol with the chipset while the other cores 102 remain asleep. This can be a significant advantage if the cores 102 are in a relatively long sleep state. The power savings may be quite significant, particularly when the operating system recognizes that the workload in the system is small enough for a single core 102 to handle.
Furthermore, advantageously, as long as no wake events are directed to the other cores 102, only one core 102 is awakened (to service non-directed events, such as a low priority interrupt). Again, this can be a significant advantage if the cores 102 are in a relatively long sleep state. The power savings can be significant when there is little workload in the system other than relatively infrequent non-directed interrupts, such as USB interrupts. Furthermore, even when a wake event is directed to another core 102 (for example, an interrupt that the operating system directs to a single core 102, such as an operating system timer interrupt), embodiments advantageously switch dynamically the single core 102 that performs the package sleep state protocol and services non-directed wake events, as shown in Figure 13, to retain the benefit of waking only a single core 102.
Referring to Figure 13, a timing diagram illustrates another example of operation of the microprocessor 100 according to the flowchart of Figure 11. The example of Figure 13 is similar in many respects to the example of Figure 12. However, at the first deassertion of STPCLK, the interrupt is directed to core 1 (rather than being a non-directed interrupt as in the example of Figure 12). Consequently, the control unit 104 wakes core 2 (per block 1132) and then wakes core 1 after core 2 unblocks the wake events (per block 1134). Core 2 then again writes a SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, then goes to sleep and is power-gated (per block 1104).
Core 1 services the directed interrupt (per block 1137). Core 1 then again writes a SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, then goes to sleep and is power-gated (per block 1104). In the example, core 2 writes its SYNC C7 before core 1 writes its SYNC C7. Thus, although core 0 still has its S bit 222 set from its initial SYNC C7 write, the S bit 222 of core 1 was cleared when it was awakened. Consequently, when core 2 writes its SYNC C7 after unblocking the wake events, it is not the last core to write a SYNC C7 request; instead, core 1 becomes the last core to write a SYNC C7 request.
When core 1 writes the SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, the synchronization request of core 0 is still pending (that is, it was not cleared by the wakeups of core 1 and core 2) and core 2 has, in this example, already written its SYNC C7 request. The control unit 104 therefore blocks the wake events of all cores except core 1, the last-writing core (per block 326). In addition, the control unit 104 wakes only core 1 (per block 1106). Core 1 then issues the I/O read bus cycle (per block 1108) and waits for the assertion of STPCLK (per block 1109). In response to STPCLK, core 1 issues the stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set that waits on deassertion of STPCLK, and goes to sleep and is power-gated (per block 1124).
When STPCLK is deasserted, the control unit 104 wakes only core 1 (per block 1132). In the example of Figure 13, STPCLK is deasserted because of a non-directed interrupt; the microprocessor 100 therefore directs the non-directed interrupt to core 1, which saves power. The cycle in which core 1 handles non-directed interrupts continues for a considerable time, namely for as long as only non-directed interrupts are generated. In this manner, the microprocessor 100 advantageously saves power by directing non-directed interrupts to the core 102 to which the most recent interrupt was directed, which in the example of Figure 13 involves switching to a different core. Core 1 again unblocks the wake events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 1 then again writes a SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, then goes to sleep and is power-gated (per block 1104).
It should be understood that although embodiments have been described in which the power management instruction is an x86 MWAIT instruction, other embodiments are contemplated in which synchronization requests are used to carry out other power management instructions. For example, the microprocessor 100 may perform similar operations in response to reads of a set of predetermined I/O port addresses associated with the different C-states. As another example, the power management instruction may come from an instruction set architecture other than the x86 architecture.
Dynamic reconfiguration of a multi-core processor
Each core 102 of the microprocessor 100 generates configuration-related values based on the configuration of the cores 102 of the microprocessor 100. Preferably, microcode of each core 102 generates, stores, and uses the configuration-related values. Embodiments are described below in which the generation of the configuration-related values is advantageously dynamic. Examples of configuration-related values include, but are not limited to, the following.
Each core 102 generates a global core number, as described above with respect to Figure 2. The global core number identifies the core 102 among all the cores 102 of the microprocessor 100, in contrast to the local core number 256, which identifies the core 102 only among the cores 102 of the die 406 on which it resides. In one embodiment, the core 102 generates its global core number as the sum of its local core number 256 and the product of its die number 258 and the number of cores 102 per die, as follows:
global core number = (die number × number of cores per die) + local core number.
Each core 102 also generates a virtual core number. The virtual core number is the global core number minus the number of disabled cores 102 whose global core number is lower than the global core number of the instant core 102. Thus, when all the cores 102 of the microprocessor 100 are enabled, the global core number and the virtual core number are the same. However, if one or more cores 102 are disabled, for example because they are defective, the virtual core number of a core 102 may differ from its global core number. In one embodiment, each core 102 populates the APIC ID field of its corresponding APIC ID register with its virtual core number. In other embodiments (for example, Figures 22 and 23), however, this is not the case. Additionally, in one embodiment, the operating system may update the APIC ID in the APIC ID register.
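The two core-number definitions above can be checked with a short worked sketch. The function names and the representation of the enabled bits as a list indexed by global core number are assumptions made for illustration.

```python
def global_core_number(die_number, cores_per_die, local_core_number):
    """Global core number = (die number x cores per die) + local core number."""
    return die_number * cores_per_die + local_core_number


def virtual_core_number(global_number, enabled):
    """Global core number minus the count of disabled lower-numbered cores.

    enabled: list of booleans indexed by global core number
             (hypothetical stand-in for the enabled bits 254).
    """
    disabled_below = sum(1 for g in range(global_number) if not enabled[g])
    return global_number - disabled_below
```

For the two-die, eight-core configuration of Figure 4, the core with local core number 2 on die number 1 has global core number 1 × 4 + 2 = 6. If the cores with global core numbers 1 and 4 are disabled, two disabled cores lie below it, so its virtual core number is 6 − 2 = 4.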
Each core 102 also generates a BSP flag that indicates whether the core 102 is the BSP. In one embodiment, normally (for example, when the "all cores BSP" feature of Figure 23 is disabled) one core 102 designates itself the bootstrap processor (BSP) and each of the other cores 102 designates itself an application processor (AP). After reset, an AP core 102 initializes itself and then goes to sleep waiting for the BSP to tell it to begin fetching and executing instructions. In contrast, after initializing itself, the BSP core 102 immediately begins fetching and executing instructions of the system firmware, for example BIOS boot code, which initializes the system (for example, verifies that the system memory and peripherals are working properly and initializes and/or configures them) and bootstraps the operating system, that is, loads the operating system (for example, from disk) and transfers control to it. Before bootstrapping the operating system, the BSP determines the system configuration (for example, the number of cores 102 or logical processors in the system) and saves it in memory so that the operating system can read it after it starts. After the operating system is bootstrapped, it instructs the AP cores 102 to begin fetching and executing operating system instructions. In one embodiment, normally (for example, when the "modify BSP" and "all cores BSP" features of Figures 22 and 23, respectively, are disabled) a core 102 designates itself the BSP if its virtual core number is 0, and all other cores 102 designate themselves AP cores 102. Preferably, a core 102 populates the BSP flag bit of the APIC base address register of its corresponding APIC with its BSP flag configuration-related value. According to one embodiment, as described above, the BSP is the master core 102 of blocks 907 and 919 that performs the package sleep state handshake protocol of Figure 9.
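The default BSP rule (a core designates itself the BSP when its virtual core number is 0) can be sketched as follows. The function name `bsp_flags` and the list-of-booleans input are assumptions for illustration, and the sketch covers only the normal case in which the "modify BSP" and "all cores BSP" features are disabled.

```python
def bsp_flags(enabled):
    """Return {global core number: BSP flag} for every enabled core.

    enabled: list of booleans indexed by global core number (hypothetical
             representation of the enabled bits). Disabled cores receive no
             flag because they execute no instructions.
    """
    flags = {}
    virtual = 0  # virtual core number of the next enabled core
    for global_number, is_enabled in enumerate(enabled):
        if is_enabled:
            flags[global_number] = virtual == 0
            virtual += 1
    return flags
```

One consequence of the rule is that the BSP role moves automatically: if the core with global core number 0 is disabled, the next enabled core has virtual core number 0 and becomes the BSP.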
Each core 102 also generates an APIC base value for populating the APIC base register. The APIC base address is generated based on the APIC ID of the core 102. In one embodiment, the operating system may update the APIC base address in the APIC base address register.
Each core 102 also generates a die master indicator that indicates whether the core 102 is the master core 102 of the die 406 that includes the core 102.
Each core 102 also generates a chip master indicator that indicates whether the core 102 is the master core 102 of the chip that includes the core 102, assuming the microprocessor 100 is configured with chips, as described in detail above.
Each core 102 computes the configuration-related values and uses them in operation so that the system that includes the microprocessor 100 operates properly. For example, the system directs interrupt requests to the cores 102 based on their APIC IDs. The APIC ID determines which interrupt requests the core 102 should respond to. More specifically, each interrupt request includes a destination identifier, and a core 102 responds to an interrupt request only when the destination identifier matches the APIC ID of the core 102 (or when the destination identifier is a special value indicating that the request is for all the cores 102). As another example, each core 102 must know whether it is the BSP so that it executes the initial BIOS code and bootstraps the operating system and, in one embodiment, performs the package sleep state handshake protocol described with respect to Figure 9. Embodiments are described below (see Figures 22 and 23) in which the BSP flag and APIC ID may be modified from their normal values for specific purposes, such as testing and/or debugging.
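The destination-matching rule for interrupt requests can be sketched in a few lines. The function name and the broadcast value are assumptions: the description says only that a special identifier value addresses all cores, and 0xFF is used here as an illustrative 8-bit broadcast identifier.

```python
BROADCAST_ID = 0xFF  # assumed "all cores" destination value, for illustration


def should_respond(apic_id, destination_id, broadcast_id=BROADCAST_ID):
    """True when a core with this APIC ID must respond to the request."""
    return destination_id == apic_id or destination_id == broadcast_id
```

Because the match is made against the APIC ID, and the APIC ID is normally the virtual core number, disabling a core shifts the identifiers of the higher-numbered cores, which is why these values must be regenerated consistently on every core.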
Referring to Figure 14, a flowchart illustrates dynamic reconfiguration of the microprocessor 100. The description of Figure 14 refers to the multi-die microprocessor 100 of Figure 4, which includes two dies 406 and eight cores 102. It should be understood, however, that the dynamic reconfiguration described may be used with a microprocessor 100 having a different configuration, namely more than two dies or a single die, and more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that, collectively, the cores dynamically reconfigure the microprocessor 100. Flow begins at block 1402.
At block 1402, the microprocessor 100 is reset, and hardware of the microprocessor 100 populates the configuration register 112 of each core 102 with appropriate values based on the number of available cores 102 and the die number of the die on which the core 102 resides. In one embodiment, the local core number 256 and the die number 258 are hardwired. As described above, the hardware may determine whether a core 102 is enabled or disabled from the blown or unblown state of the fuses 114. Flow proceeds to block 1404.
At block 1404, the core 102 reads the configuration word 252 from the configuration register 112. The core 102 then generates its configuration-related values based on the configuration word 252 value just read. In the case of a multi-die microprocessor 100 configuration, the configuration-related values generated at block 1404 do not take into account the cores 102 of the other dies 406. However, the configuration-related values generated at blocks 1414 and 1424 (and at block 1524 of Figure 15) do take into account the cores 102 of the other dies 406, as described below. Flow proceeds to block 1406.
At block 1406, the core 102 causes the enabled bit 254 values of the local cores 102 in the local configuration register 112 to be propagated to the corresponding enabled bits 254 of the configuration register 112 of the remote die 406. For example, with reference to the configuration of Figure 4, a core 102 in die A 406A causes the enabled bits 254 associated with cores A, B, C, and D (the local cores) in the configuration register 112 of die A 406A (the local die) to be propagated to the enabled bits 254 associated with cores A, B, C, and D in the configuration register 112 of die B 406B (the remote die). Conversely, a core 102 in die B 406B causes the enabled bits 254 associated with cores E, F, G, and H (the local cores) in the configuration register 112 of die B 406B (the local die) to be propagated to the enabled bits 254 associated with cores E, F, G, and H in the configuration register 112 of die A 406A (the remote die). In one embodiment, the core 102 causes the propagation to the other die 406 by writing the local configuration register 112. Preferably, the write by the core 102 to the local configuration register 112 does not change the local configuration register but causes the local control unit 104 to propagate the local enabled bit 254 values to the remote die 406. Flow proceeds to block 1408.
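The effect of the propagation at block 1406 can be sketched with bit masks. The bit assignment is an assumption made for illustration (bit 0 for core A through bit 7 for core H, one enabled bit per core), as is the function name; the description does not specify how the enabled bits 254 are arranged within the configuration word 252.

```python
def propagate_enabled_bits(local_word, remote_word, local_mask):
    """Return the remote configuration word after propagation.

    The bits selected by local_mask (the local die's own cores) are copied
    from local_word into remote_word; all other bits of remote_word are kept.
    """
    return (remote_word & ~local_mask) | (local_word & local_mask)


DIE_A_MASK = 0x0F  # cores A-D (assumed to occupy bits 0-3)
DIE_B_MASK = 0xF0  # cores E-H (assumed to occupy bits 4-7)

# Before propagation each die knows only the state of its own cores.
die_a_word = 0b00001011  # cores A, B, D enabled; core C disabled
die_b_word = 0b11110000  # cores E-H enabled

new_die_b_word = propagate_enabled_bits(die_a_word, die_b_word, DIE_A_MASK)
new_die_a_word = propagate_enabled_bits(die_b_word, die_a_word, DIE_B_MASK)
```

After both dies have propagated, the two configuration words agree, so every core derives its configuration-related values from the same view of which cores are enabled.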
At block 1408, the core 102 writes to its sync register 108 a synchronization request for sync condition 8 (denoted SYNC8 in Figure 14). Consequently, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1412.
At block 1412, the control unit 104 wakes the core 102 when all the available cores 102 in the core set specified by the core set field 228 have written a SYNC8. It should be noted that in the case of a multi-die 406 microprocessor 100 configuration, the sync condition occurrence may be a multi-die sync condition occurrence. That is, the control unit 104 waits to wake the core 102 (or, where the core 102 has not set the sleep bit 212 and has thus elected not to sleep, to interrupt it) until all the cores 102 specified in the core set field 228 (which may include cores 102 on multiple dies 406) have written their synchronization requests. Flow proceeds to block 1414.
At block 1414, the core 102 again reads the configuration register 112 and generates its configuration-related values based on the new value of the configuration word 252, which now includes the correct enabled bit 254 values propagated from the remote die. Flow proceeds to decision block 1416.
At decision block 1416, the core 102 determines whether it should disable itself. In one embodiment, the core 102 decides that it needs to disable itself because a fuse 114, which the microcode reads in its reset processing (before decision block 1416), is blown to indicate that the core 102 should disable itself. The fuse 114 may be blown during or after manufacture of the microprocessor 100. In another embodiment, updated fuse 114 values may be scanned into holding registers, as described above, and the scanned-in values indicate that the core 102 should be disabled. Figure 15 describes another embodiment in which the core 102 determines in a different manner that it should be disabled. If the core 102 determines that it should be disabled, flow proceeds to block 1417; otherwise, flow proceeds to block 1418.
At block 1417, the core 102 writes the disable core bit 236 to remove itself from the list of available cores 102, that is, to clear its corresponding enabled bit 254 in the configuration word 252 of the configuration register 112. Thereafter, the core 102 prevents itself from executing any more instructions, preferably by setting one or more bits that turn off its clock signals and remove its power. Flow ends at block 1417.
At block 1418, the core 102 writes to its sync register 108 a synchronization request for sync condition 9 (denoted SYNC9 in Figure 14). Consequently, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1422.
At block 1422, the control unit 104 wakes the core 102 when all the enabled cores 102 have written a SYNC9. Again, in the case of a multi-die 406 microprocessor 100 configuration, the sync condition occurrence may be a multi-die sync condition occurrence, based on the updated values in the configuration registers 112. Furthermore, when the control unit 104 determines whether a sync condition has occurred, it excludes from consideration any core 102 that disabled itself at block 1417. More specifically, in a situation in which all the other cores 102 (other than the core 102 that disables itself) write a SYNC9 before the core 102 that disables itself writes its sync register 108 at block 1417, the control unit 104 detects the sync condition occurrence (at block 316) when the self-disabling core 102 writes the sync register 108 with the disable core bit 236 set at block 1417. This is because the control unit 104 no longer considers the disabled core 102 when determining whether the sync condition has occurred, since the enabled bit 254 of the disabled core 102 is clear. That is, the control unit 104 determines that the sync condition has occurred because all the enabled cores 102, which do not include the disabled core 102, have written a SYNC9, regardless of whether the disabled core 102 wrote a SYNC9. Flow proceeds to block 1424.
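The rule that a self-disabled core no longer counts toward the sync condition can be sketched as a predicate over the enabled bits. The function name and the two parallel lists are assumptions for illustration; the check performed by the control unit 104 in hardware is not specified at this level of detail.

```python
def sync_condition_occurred(enabled, wrote_sync):
    """True when every enabled core has written the synchronization request.

    enabled:    list of booleans, one per core (models the enabled bits 254)
    wrote_sync: list of booleans, True if that core has written SYNC9
    Disabled cores are skipped, whether or not they wrote a request.
    """
    return all(
        wrote for is_enabled, wrote in zip(enabled, wrote_sync) if is_enabled
    )


# Cores 0, 1 and 3 have written SYNC9; core 2 has not.
enabled = [True, True, True, True]
wrote_sync = [True, True, False, True]
before_disable = sync_condition_occurred(enabled, wrote_sync)

# Core 2 disables itself (block 1417), which clears its enabled bit.
enabled[2] = False
after_disable = sync_condition_occurred(enabled, wrote_sync)
```

This models the ordering case described above: the waiting cores are released by the act of disabling, not by a further SYNC9 write, because clearing the enabled bit removes the last outstanding participant.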
At block 1424, the core 102 again reads the configuration register 112; if a core 102 was disabled by the operation of another core 102 at block 1417, the new value of the configuration word 252 reflects the disabled core 102. The core 102 again generates its configuration-related values based on the new value of the configuration word 252, in a manner similar to block 1414. The presence of a disabled core 102 may cause some of the configuration-related values to differ from those generated at block 1414. For example, as described above, the virtual core number, APIC ID, BSP flag, APIC base address, die master indicator, and chip master indicator may change because of the presence of a disabled core 102. In one embodiment, after generating the configuration-related values, one of the cores 102 (for example, the BSP) writes to the uncore private RAM 116 some of the configuration-related values that are global to all the cores 102 of the microprocessor 100 so that they can subsequently be read by all the cores 102. For example, in one embodiment, the cores 102 read the global configuration-related values to perform an architectural instruction (for example, the x86 CPUID instruction) that requests global information about the microprocessor 100, such as the number of cores 102 of the microprocessor 100. Flow proceeds to block 1426.
At block 1426, the core 102 comes out of reset and begins fetching architectural instructions. Flow ends at block 1426.
Refer to Figure 15, which is a flowchart showing dynamic reconfiguration of the microprocessor 100 according to another embodiment. The description of Figure 15 refers to the multi-crystal microprocessor 100 of Figure 4, which comprises two crystals 406 and eight cores 102. It should be understood, however, that the dynamic reconfiguration described may be used with a microprocessor 100 having a different configuration, namely more than two crystals or a single crystal, and more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to this description so that the cores collectively reconfigure the microprocessor 100 dynamically. More specifically, Figure 15 describes the operation of one core 102 that encounters the core disable instruction, whose flow begins at block 1502, and the operation of the other cores 102, whose flow begins at block 1532.
In block 1502, one of the cores 102 encounters an instruction that instructs the core 102 to disable itself. In one embodiment, this instruction is an x86 WRMSR instruction. In response, the core 102 sends a reconfiguration message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 traps to microcode in response to the instruction to disable itself (in block 1502) or in response to the interrupt (in block 1532), and remains in microcode until block 1526; during this time interrupts are disabled (that is, the microcode does not allow itself to be interrupted). Flow proceeds from block 1502 to block 1504.
In block 1532, one of the other cores 102 (that is, a core other than the core 102 that encountered the disable instruction in block 1502) is interrupted by the inter-core interrupt sent in block 1502 and receives the reconfiguration message. As mentioned above, although the flow at block 1532 is described from the perspective of a single core 102, each of the other cores 102 (that is, every core 102 other than the one in block 1502) is interrupted and receives the message in block 1532 and performs the steps of blocks 1504 to 1526. Flow proceeds from block 1532 to block 1504.
In block 1504, the core 102 writes a synchronization request with synchronization condition 10 (denoted SYNC10 in Figure 15) to its synchronization register 108. As a result, the control module 104 puts the core 102 to sleep. Flow proceeds to block 1506.
In block 1506, the control module 104 wakes the core 102 when all available cores 102 have written a SYNC10. It should be noted that, in a multi-crystal 406 microprocessor 100 configuration, the synchronization condition occurrence may be a multi-crystal synchronization condition occurrence. That is, the control module 104 waits to wake the cores 102 (or to interrupt them, where the cores 102 have not elected to sleep) until every core 102 that is specified in the core set field 228 (which may include cores 102 on both crystals 406) and that is enabled (as indicated by its enable bit 254) has written its synchronization request. Flow proceeds to decision block 1508.
In decision block 1508, the core 102 determines whether it is the core 102 that was instructed in block 1502 to disable itself. If so, flow proceeds to block 1517; otherwise, flow proceeds to block 1518.
In block 1517, the core 102 writes the disable core bit 236 to remove itself from the list of available cores 102, that is, to clear its corresponding enable bit 254 in the configuration word 252 of the configuration register 112. Thereafter, the core 102 prevents itself from executing any further instructions, preferably by setting one or more bits that turn off its clock signals and remove its power. Flow ends at block 1517.
In block 1518, the core 102 writes a synchronization request with synchronization condition 11 (denoted SYNC11 in Figure 15) to its synchronization register 108. As a result, the control module 104 puts the core 102 to sleep. Flow proceeds to block 1522.
In block 1522, the control module 104 wakes the core 102 when all enabled cores 102 have written a SYNC11. As before, in a multi-crystal 406 microprocessor 100 configuration, the synchronization condition occurrence may be a multi-crystal synchronization condition occurrence based on the updated values in the configuration registers 112. Moreover, when determining whether a synchronization condition has occurred, the control module 104 excludes from consideration the core 102 that disabled itself in block 1517. More specifically, if all the other cores 102 (that is, all except the self-disabling core 102) write a SYNC11 before the self-disabling core 102 writes its synchronization register 108 in block 1517, then the control module 104 detects the synchronization condition occurrence (in block 316) when the self-disabling core 102 writes the synchronization register 108 in block 1517 (see Figure 16). This is because the enable bit 254 of the disabled core 102 is then clear, so the control module 104 no longer considers the disabled core 102 when determining whether the synchronization condition has occurred. That is, the control module 104 determines that the synchronization condition has occurred because all the enabled cores 102 have written a SYNC11, regardless of whether the disabled core 102 has written a SYNC11. Flow proceeds to block 1524.
In block 1524, the core 102 reads the configuration register 112, whose configuration word 252 now reflects the core 102 that was disabled in block 1517. The core 102 then generates its configuration-related values from the new value of the configuration word 252. Preferably, the disable instruction of block 1502 is executed by system firmware (for example, BIOS setup), and after the core 102 is disabled, the system firmware reboots the system, for example after block 1526. During the reboot, the microprocessor 100 may operate differently than it did before the configuration-related values were generated in block 1524. For example, the BSP during the reboot may be a different core 102 than before the configuration-related values were generated. As another example, the system configuration information (for example, the number of cores 102 and logical processors in the system) that the BSP determines before booting the operating system and stores to memory for the operating system to read may be different. As another example, the APIC IDs of the cores 102 that remain enabled may differ from those before the configuration-related values were generated, in which case the operating system directs interrupt requests, and the cores 102 respond to them, differently than before. As yet another example, the master core 102 that performs the package sleep state handshake protocol of Figure 9 in blocks 907 and 919 may be a different core 102 than before the configuration-related values were generated. Flow proceeds to block 1526.
In block 1526, the core 102 resumes the task it was performing before it was interrupted in block 1532. Flow ends at block 1526.
The dynamic reconfiguration of the microprocessor 100 described herein may be used in a variety of applications. For example, dynamic reconfiguration may be used for testing and/or simulation during development of the microprocessor 100, and/or for field testing. In addition, a user may want to know the performance and/or the total power consumption of the system when a particular application runs on only a subset of the cores 102. In one embodiment, after a core 102 is disabled, its clocks may be stopped and/or its power removed, so that it consumes essentially no power. Furthermore, in a high-reliability system, each core 102 may periodically check the other cores 102 for faults; if the cores 102 determine that a particular core 102 is faulty, a non-faulty core 102 may disable the faulty core 102 and cause the remaining cores 102 to perform a dynamic reconfiguration as described above. In such an embodiment, the control word 202 may include an additional field that allows the writing core 102 to specify the core 102 to be disabled, and the operation described in Figure 15 is modified so that, in block 1517, a core may disable a core 102 other than itself.
Refer to Figure 16, which is a timing diagram showing an example of the operation of the microprocessor 100 according to the flowchart of Figure 15. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. It should be understood, however, that in other embodiments the microprocessor 100 may comprise a different number of cores 102 and may be a single-crystal or multi-crystal microprocessor 100. In the timing diagram, time progresses downward.
Core 1 encounters an instruction to disable itself and, in response, sends a reconfiguration message and an interrupt to core 0 and core 2 (per block 1502). Core 1 then writes a SYNC10 and is put to sleep (per block 1504).
Core 0 and core 2 are each eventually interrupted from their current task and read the message (per block 1532). In response, core 0 and core 2 each write a SYNC10 and are put to sleep (per block 1504). As shown, the time at which each core writes its SYNC10 may differ, for example because of the latency of the instruction that is executing when the interrupt is asserted.
When all the cores 102 have written a SYNC10, the control module 104 wakes all of them simultaneously (per block 1506). Core 0 and core 2 then determine that they are not disabling themselves (per decision block 1508), write a SYNC11 and are put to sleep (per block 1518). Core 1, however, determines that it is disabling itself, so it writes its disable core bit 236 (per block 1517). In this example, core 1 writes its disable core bit 236 after core 0 and core 2 have written their respective SYNC11, as shown. Nevertheless, the control module 104 detects the synchronization condition occurrence, because it determines that the S bit 222 is set for every core 102 whose enable bit 254 is set. That is, even though the S bit 222 of core 1 is not set, the enable bit 254 of core 1 was cleared when core 1 wrote its synchronization register 108 in block 1517.
When all available core have write SYNC11, control module 104 has waken all core (each square 1522) up simultaneously.As mentioned above, when a polycrystal microprocessor 100, when core 1 writes its inactive core position 236, and local control module 104 removes the local activation position 254 of core 1 respectively, and local control module 104 also propagates local activation position 254 to far-end crystal 406.Therefore, Remote Control Unit 104 is also detected the generation of synchronous regime and is waken all available core of its crystal 406 simultaneously up.Core 0 and core 2 then produce its configuration correlation (each square 1524) based on the value of Reconfigurations working storage 112, and recover its interrupt before activity (each square 1526).
hardware semaphore (HARDWARE SEMAPHORE)
Refer to Figure 17, which is a block diagram showing the hardware semaphore 118 of Figure 1. The hardware semaphore 118 comprises an owned bit 1702, owner bits 1704 and a state machine 1706; the state machine 1706 updates the owned bit 1702 and the owner bits 1704 in response to reads and writes of the hardware semaphore 118 by the cores 102. Preferably, in order to uniquely identify the core 102 that currently owns the hardware semaphore 118, the number of owner bits 1704 is the base-2 logarithm of the number of cores 102 in the microprocessor 100 configuration. In another embodiment, the owner bits 1704 comprise one corresponding bit for each core 102 of the microprocessor 100. It should be noted that, although a single set of owned bit 1702, owner bits 1704 and state machine 1706 is described as implementing one hardware semaphore 118, the microprocessor 100 may comprise multiple hardware semaphores 118, each of which comprises such a set of hardware. Preferably, in order to perform operations that require exclusive access to a shared resource, the microcode running on each core 102 reads and writes the hardware semaphore 118 to obtain ownership of a resource shared by the cores 102, examples of which are described in detail below. The microcode may associate each of the multiple hardware semaphores 118 with ownership of a different shared resource of the microprocessor 100. Preferably, the hardware semaphore 118 is read and written by the cores 102 at a predetermined address within a non-architectural address space of the cores 102. This non-architectural address space can be accessed only by the microcode of a core 102 and cannot be accessed directly by user code (for example, x86 architecture program instructions). The operation of the state machine 1706 in updating the owned bit 1702 and the owner bits 1704 of the hardware semaphore 118 is described with reference to Figures 18 and 19, and uses of the hardware semaphore 118 are described thereafter.
Refer to Figure 18, which is a flowchart showing the operation of the hardware semaphore 118 when a core 102 reads it. Flow begins at block 1802.
In block 1802, a core 102, denoted core x, reads the hardware semaphore 118. As mentioned above, the microcode of the core 102 preferably reads the predetermined address at which the hardware semaphore 118 resides in the non-architectural address space. Flow proceeds to decision block 1804.
In decision block 1804, the state machine 1706 examines the owner bits 1704 to determine whether core x is the owner of the hardware semaphore 118. If so, flow proceeds to block 1808; otherwise, flow proceeds to block 1806.
In block 1806, the hardware semaphore 118 returns a value of zero to the reading core 102 to indicate that the core 102 does not own the hardware semaphore 118. Flow ends at block 1806.
In block 1808, the hardware semaphore 118 returns a value of one to the reading core 102 to indicate that the core 102 owns the hardware semaphore 118. Flow ends at block 1808.
As mentioned above, the microprocessor 100 may comprise multiple hardware semaphores 118. In one embodiment, the microprocessor 100 comprises 16 hardware semaphores 118, and when a core 102 reads the predetermined address it receives a 16-bit data value. Each bit of this value corresponds to a different one of the 16 hardware semaphores 118 and indicates whether the core 102 reading the predetermined address owns the corresponding hardware semaphore 118.
Refer to Figure 19, which is a flowchart showing the operation of the hardware semaphore 118 when a core 102 writes it. Flow begins at block 1902.
In block 1902, a core 102, denoted core x, writes the hardware semaphore 118, for example at the predetermined non-architectural address described above. Flow proceeds to decision block 1904.
In decision block 1904, the state machine 1706 examines the owned bit 1702 to determine whether the hardware semaphore 118 is owned by any core 102 or is free. If it is owned, flow proceeds to decision block 1914; otherwise, flow proceeds to decision block 1906.
In decision block 1906, the state machine 1706 examines the value written. If the value is 1, indicating that the core 102 wants to obtain ownership of the hardware semaphore 118, flow proceeds to block 1908. If the value is 0, indicating that the core 102 wants to relinquish ownership of the hardware semaphore 118, flow proceeds to block 1912.
In block 1908, the state machine 1706 updates the owned bit 1702 to 1 and sets the owner bits 1704 to indicate that core x now owns the hardware semaphore 118. Flow ends at block 1908.
In block 1912, the state machine 1706 updates neither the owned bit 1702 nor the owner bits 1704. Flow ends at block 1912.
In decision block 1914, the state machine 1706 examines the owner bits 1704 to determine whether core x is the owner of the hardware semaphore 118. If so, flow proceeds to decision block 1916; otherwise, flow proceeds to block 1912.
In decision block 1916, the state machine 1706 examines the value written. If the value is 1, indicating that the core 102 wants to obtain ownership of the hardware semaphore 118, flow proceeds to block 1912 (no update occurs, because the core 102 already owns the hardware semaphore 118, as determined in decision block 1914). If the value is 0, indicating that the core 102 wants to relinquish ownership of the hardware semaphore 118, flow proceeds to block 1918.
In block 1918, the state machine 1706 updates the owned bit 1702 to zero to indicate that no core 102 now owns the hardware semaphore 118. Flow ends at block 1918.
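The read and write behavior of Figures 18 and 19 can be summarized in a short software model. The following Python sketch is illustrative only: the class and method names are hypothetical, a plain integer stands in for the owner bits 1704, and the read path additionally checks the owned bit so that stale owner bits left after a release are never reported as ownership, which is an assumption of this sketch rather than something the flowcharts state.

```python
class HardwareSemaphore:
    """Software model of one hardware semaphore 118 (hypothetical names)."""

    def __init__(self):
        self.owned = False   # models owned bit 1702
        self.owner = 0       # models owner bits 1704 (core number)

    def read(self, core):
        # Figure 18: return 1 if the reading core owns the semaphore, else 0.
        # Assumption of this sketch: the owned bit is checked as well, because
        # block 1918 clears only the owned bit and leaves the owner bits as is.
        return 1 if self.owned and self.owner == core else 0

    def write(self, core, value):
        # Figure 19.
        if not self.owned:                        # decision block 1904: free
            if value == 1:                        # decision block 1906
                self.owned = True                 # block 1908: acquire
                self.owner = core
            # value == 0 on a free semaphore: no update (block 1912)
        elif self.owner == core and value == 0:   # decision blocks 1914 and 1916
            self.owned = False                    # block 1918: release
        # all remaining cases: no update (block 1912)


sem = HardwareSemaphore()
sem.write(0, 1)   # core 0 acquires
sem.write(1, 1)   # core 1 requests while owned: ignored
sem.write(1, 0)   # core 1 is not the owner: cannot release
```

A core therefore learns whether its request succeeded only by reading the semaphore back after writing it, which is the write-then-read pattern used in the Figure 20 flow.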
As mentioned above, in one embodiment the microprocessor 100 comprises 16 hardware semaphores 118. When a core 102 writes the predetermined address, it writes a 16-bit data value. Each bit of this value corresponds to a different one of the 16 hardware semaphores 118 and indicates whether the core 102 writing the predetermined address requests ownership of the corresponding hardware semaphore 118 (value 1) or relinquishes ownership of it (value 0).
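The 16-semaphore embodiment can be pictured as a bank that is read and written one 16-bit word at a time. The Python sketch below is a hedged illustration: `SemaphoreBank` and its fields are hypothetical names, and representing a free semaphore by `None` replaces the owned bit and owner bits of Figure 17 for brevity.

```python
NUM_SEMAPHORES = 16


class SemaphoreBank:
    """Model of 16 hardware semaphores 118 behind one address (hypothetical)."""

    def __init__(self):
        # owner[i] is None while semaphore i is free, else the owning core number.
        self.owner = [None] * NUM_SEMAPHORES

    def read(self, core):
        """Return a 16-bit value; bit i is 1 if `core` owns semaphore i."""
        return sum(1 << i for i, o in enumerate(self.owner) if o == core)

    def write(self, core, value):
        """Bit i = 1 requests semaphore i; bit i = 0 relinquishes it."""
        for i in range(NUM_SEMAPHORES):
            bit = (value >> i) & 1
            if self.owner[i] is None and bit:
                self.owner[i] = core       # free and requested: acquire
            elif self.owner[i] == core and not bit:
                self.owner[i] = None       # owned by writer and bit clear: release
            # otherwise no update, as in block 1912


bank = SemaphoreBank()
bank.write(0, 0b1000)   # core 0 requests semaphore 3
bank.write(1, 0b1100)   # core 1 requests semaphores 2 and 3; only 2 is free
```

Under this reading of the text, a core that already holds one semaphore would need to keep that bit set when it writes a request for another one, since a zero bit relinquishes ownership; whether the hardware behaves exactly this way is not stated in the description.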
In one embodiment, arbitration logic arbitrates among requests by the cores 102 to access the hardware semaphore 118, so that reads and writes of the hardware semaphore 118 by the cores 102 are serialized. In one embodiment, the arbitration logic uses a round-robin fairness algorithm among the cores 102 for access to the hardware semaphore 118.
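One common way to realize round-robin fairness is to start each search for the next requester just after the core that was granted last. The Python sketch below shows that generic technique; it is not taken from the disclosure, and `RoundRobinArbiter` and `grant` are hypothetical names.

```python
class RoundRobinArbiter:
    """Generic round-robin arbiter among cores (illustrative, hypothetical)."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.last = num_cores - 1   # so that core 0 has priority on the first grant

    def grant(self, requesting):
        """Grant access to the first requesting core after the last one granted.

        `requesting` is a set of core numbers; returns None if it is empty.
        """
        for offset in range(1, self.num_cores + 1):
            core = (self.last + offset) % self.num_cores
            if core in requesting:
                self.last = core
                return core
        return None


arb = RoundRobinArbiter(4)
grants = [arb.grant({1, 3}) for _ in range(3)]   # cores 1 and 3 keep requesting
```

Because only one request is granted per step, the semaphore accesses of the cores are serialized, and no requesting core is passed over more than once per rotation.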
Refer to Figure 20, which is a flowchart showing the operation of the microprocessor 100 when it uses the hardware semaphore 118 to perform an operation that requires exclusive ownership of a resource. More specifically, the hardware semaphore 118 is used to ensure that, when two or more cores 102 have each encountered an instruction to write back and invalidate the shared cache memory 119, only one core 102 at a time performs the write-back and invalidation of the shared cache memory 119. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to this description so that the cores collectively ensure that only one core 102 at a time performs a write-back and invalidate operation. That is, the operation of Figure 20 ensures that WBINVD instruction processing is serialized. In one embodiment, the operation of Figure 20 may be performed in a microprocessor 100 that executes a WBINVD instruction according to the embodiment of Figures 7A to 7B. Flow begins at block 2002.
In block 2002, a core 102 encounters a cache control instruction, such as a WBINVD instruction. Flow proceeds to block 2004.
In block 2004, the core 102 writes a 1 to the WBINVD hardware semaphore 118. In one embodiment, the microcode allocates one of the hardware semaphores 118 to the WBINVD operation. The core 102 then reads the WBINVD hardware semaphore 118 to determine whether it obtained ownership. Flow proceeds to decision block 2006.
In decision block 2006, if the core 102 determines that it obtained ownership of the WBINVD hardware semaphore 118, flow proceeds to block 2008; otherwise, flow returns to block 2004 to attempt again to obtain ownership. It should be noted that while the microcode of the instant core 102 loops through blocks 2004 and 2006, it will eventually be interrupted by the core 102 that owns the WBINVD hardware semaphore 118, because that core 102 is executing a WBINVD instruction and sends an interrupt to the instant core 102 in block 702 of Figures 7A to 7B. Preferably, on each pass through the loop, the microcode of the instant core 102 checks an interrupt status register to see whether one of the other cores 102 (for example, the core 102 that owns the WBINVD hardware semaphore 118) has sent an interrupt to the instant core 102. The instant core 102 then performs the operation of Figures 7A to 7B and, in block 749, resumes operation according to Figure 20, attempting to obtain ownership of the hardware semaphore 118 in order to execute its own WBINVD instruction.
In block 2008, the core 102 has obtained ownership, and flow proceeds to block 702 of Figures 7A to 7B to execute the WBINVD instruction. As part of the WBINVD instruction operation, in block 748 of Figures 7A to 7B, the core 102 writes a zero to the WBINVD hardware semaphore 118 to relinquish its ownership. Flow ends at block 2008.
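The acquire loop of blocks 2004 to 2008 can be sketched as follows. This Python sketch is an illustration under stated assumptions, not the microcode itself: `Sem` is a minimal stand-in for one hardware semaphore 118, and the callbacks `interrupt_pending`, `service_interrupt` and `do_wbinvd` are hypothetical placeholders for the interrupt status check and the Figures 7A to 7B operations.

```python
class Sem:
    """Minimal stand-in for one hardware semaphore 118 (hypothetical)."""

    def __init__(self):
        self.owner = None

    def write(self, core, value):
        if self.owner is None and value == 1:
            self.owner = core
        elif self.owner == core and value == 0:
            self.owner = None

    def read(self, core):
        return 1 if self.owner == core else 0


def wbinvd_with_semaphore(core, sem, interrupt_pending, service_interrupt, do_wbinvd):
    """Serialize a WBINVD-like operation using the semaphore (Figure 20 sketch)."""
    while True:
        sem.write(core, 1)                # block 2004: request ownership
        if sem.read(core) == 1:           # decision block 2006: did we get it?
            break
        if interrupt_pending(core):       # poll interrupt status on each pass
            service_interrupt(core)       # take part in the owner's operation
    do_wbinvd(core)                       # block 2008: perform the operation
    sem.write(core, 0)                    # relinquish ownership (cf. block 748)


# Demonstration: core 0 already owns the semaphore when core 1 arrives.
log = []
sem = Sem()
sem.write(0, 1)


def pending(core):
    return sem.owner == 0                 # model: the owner has interrupted us


def service(core):
    log.append(f"core {core} services interrupt")
    sem.write(0, 0)                       # model: the owner finishes and releases


wbinvd_with_semaphore(1, sem, pending, service,
                      lambda c: log.append(f"core {c} WBINVD"))
```

In the demonstration, core 1 first services the interrupt from the owning core and only afterwards obtains the semaphore and performs its own operation, so the two operations never overlap.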
An operation similar to that described in Figure 20 may be performed by the microcode to obtain exclusive ownership of other shared resources. Other resources of which a core 102 may obtain exclusive ownership by using a hardware semaphore 118 are registers of the non-core 103 that are shared by the cores 102. In one embodiment, the non-core 103 registers include a control register that comprises a respective field for each core 102. Each field controls an operational aspect of the corresponding core 102. Because the fields reside in the same register, when a core 102 wants to update its own field without changing the fields of the other cores 102, the core 102 must read the control register, modify the value read, and then write the modified value back to the control register. For example, the microprocessor 100 may comprise a non-core 103 performance control register (PCR) that controls the bus clock ratio of the cores 102. To update its bus clock ratio, a given core 102 must read, modify and write back the PCR. Therefore, in one embodiment, the microcode is configured to perform an effectively atomic read/modify/write of the PCR by doing so only while the core 102 owns the hardware semaphore 118 associated with the PCR. The bus clock ratio determines the clock frequency of an individual core 102 as a multiple of the frequency of the clock supplied to the microprocessor 100 via an external bus.
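The read/modify/write of a shared control register under a semaphore can be sketched as below. The sketch is hypothetical throughout: the 8-bit field width, the field layout (field i at bit offset 8·i), and the names `Uncore`, `pcr`, `pcr_sem_owner` and `update_bus_clock_ratio` are assumptions made for illustration, since the description does not give the PCR layout.

```python
FIELD_BITS = 8   # hypothetical width of each per-core field in the PCR


class Uncore:
    """Toy model of the shared register and its semaphore (hypothetical)."""

    def __init__(self):
        self.pcr = 0                # models the shared performance control register
        self.pcr_sem_owner = None   # models the hardware semaphore tied to the PCR


def update_bus_clock_ratio(uncore, core, ratio):
    """Update only this core's field; return False if the semaphore is held."""
    if uncore.pcr_sem_owner is None:        # request ownership
        uncore.pcr_sem_owner = core
    if uncore.pcr_sem_owner != core:        # another core owns it: caller retries
        return False
    shift = core * FIELD_BITS
    mask = ((1 << FIELD_BITS) - 1) << shift
    value = uncore.pcr                                   # read
    value = (value & ~mask) | ((ratio << shift) & mask)  # modify own field only
    uncore.pcr = value                                   # write back
    uncore.pcr_sem_owner = None             # relinquish ownership
    return True


u = Uncore()
ok0 = update_bus_clock_ratio(u, 0, 0x12)
ok2 = update_bus_clock_ratio(u, 2, 0x34)
```

Holding the semaphore for the whole read/modify/write sequence is what keeps one core's write-back from overwriting a field that another core changed in between.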
Another such resource is a Trusted Platform Module (TPM). In one embodiment, the microprocessor 100 implements a TPM in microcode that runs on the cores 102. At any given instant, the microcode running on one, and only one, of the cores 102 implements the TPM. However, the core 102 that implements the TPM may change over time. By using the hardware semaphore 118 associated with the TPM, the microcode of the cores 102 ensures that only one core 102 implements the TPM at a time. More specifically, the core 102 currently implementing the TPM writes the TPM state to the private random access memory 116 before it gives up implementing the TPM, and the core 102 that takes over implementing the TPM reads the TPM state from the private random access memory 116. The microcode of each core 102 is configured so that, when a core 102 wants to become the core 102 that implements the TPM, the core 102 first obtains ownership of the TPM hardware semaphore 118 before it reads the TPM state from the private random access memory 116 and begins implementing the TPM. In one embodiment, the TPM substantially conforms to a TPM specification published by the Trusted Computing Group, such as the ISO/IEC 11889 specification.
As mentioned above, a conventional solution to resource contention among multiple processors is to use a software semaphore in system memory. A potential advantage of the hardware semaphore 118 described herein is that it may avoid generating additional traffic on the external memory bus, and that accessing it may be faster than accessing system memory.
interrupting, non-sleeping synchronization request
Refer to Figure 21, which is a timing diagram showing an example of the operation according to the flowchart of Figure 3 in which the cores 102 issue non-sleeping synchronization requests. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. It should be understood, however, that in other embodiments the microprocessor 100 may comprise a different number of cores 102.
Core 0 writes a SYNC14 in which neither the sleep bit 212 nor the selective wake-up bit 214 is set (that is, a non-sleeping synchronization request). Consequently, the control module 104 allows core 0 to keep running (per the "No" branch of decision block 312).
Core 1 eventually also writes a non-sleeping SYNC14, and the control module 104 allows core 1 to keep running. Finally, core 2 writes a non-sleeping SYNC14. As shown, the time at which each core writes its SYNC14 may differ.
When all the cores have written the non-sleeping SYNC14, the control module 104 sends a synchronization interrupt to core 0, core 1 and core 2 simultaneously. Each core then receives the synchronization interrupt and services it (unless the synchronization interrupt is masked, in which case the microcode typically polls for it).
designation of the bootstrap processor
In one embodiment, as mentioned above, normally (that is, when the "all cores BSP" feature of Figure 23 is disabled) one core 102 designates itself the bootstrap processor (BSP) and performs special duties, such as booting the operating system. In one embodiment, normally (that is, when the "modify BSP" feature of Figure 22 and the "all cores BSP" feature of Figure 23 are both disabled) the core 102 whose virtual core number is 0 is the BSP by default.
However, the present inventors have observed that there are uses for which it is advantageous to designate the BSP in a different manner, and embodiments of this are described below. For example, many of the tests of a microprocessor 100 part, particularly in manufacturing test, are performed by booting the operating system and running program code to ensure that the microprocessor 100 part works properly. Because the BSP core 102 performs system initialization and boots the operating system, the BSP core 102 may be exercised in ways in which the AP cores are not. In addition, it has been observed that even in a multithreaded operating environment the BSP typically bears a larger share of the processing load than the APs; consequently, the AP cores 102 may not be tested as thoroughly as the BSP core 102. Finally, there may be certain actions that only the BSP core 102 performs on behalf of the microprocessor 100 as a whole, such as the package sleep state handshake protocol described with reference to Figure 9.
Therefore, embodiments are described in which any of the cores 102 may be designated the BSP. In one embodiment, during testing of the microprocessor 100, the tests are run N times, where N is the number of cores 102 of the microprocessor 100, and in each run of the tests the microprocessor 100 is reconfigured so that a different core 102 is the BSP. This may advantageously provide better test coverage during manufacturing, and may also advantageously expose bugs in the microprocessor 100 during its design process. Another advantage is that each core 102 may have a different APIC ID in different runs and thus respond to different interrupt requests, which may provide broader test coverage.
Refer to Figure 22, which is a flowchart showing a process for configuring the microprocessor 100. The description of Figure 22 refers to the multi-crystal microprocessor 100 of Figure 4, which comprises two crystals 406 and eight cores 102. It should be understood, however, that the dynamic reconfiguration described herein may be used with a microprocessor 100 having a different configuration, namely more than two crystals or a single crystal, and more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to this description so that the cores collectively reconfigure the microprocessor 100 dynamically. Flow begins at block 2202.
In block 2202, the microprocessor 100 is reset and performs an initial portion of its initialization, preferably in a manner similar to that described above with reference to Figure 14. However, the generation of the configuration-related values, such as in block 1424 of Figure 14, and in particular of the APIC ID and the BSP flag, is performed in the manner described in blocks 2203 to 2214. Flow proceeds to block 2203.
In block 2203, the core 102 generates its virtual core number, preferably as described with reference to Figure 14. Flow proceeds to decision block 2204.
In decision block 2204, the core 102 samples an indicator to determine whether a feature is enabled. This feature is referred to herein as the "modify BSP" feature. In one embodiment, blowing a fuse 114 enables the modify BSP feature. Preferably, during testing, the modify BSP fuse 114 is not blown; instead, a true value is scanned into the holding register bit associated with the modify BSP fuse 114, as described above with reference to Figure 1, to enable the modify BSP feature. In this way, the modify BSP feature is not permanently enabled on the microprocessor 100 part, but is disabled again after a subsequent power-up. Preferably, the operations of blocks 2203 to 2214 are performed by the microcode of the core 102. If the modify BSP feature is enabled, flow proceeds to block 2205; otherwise, flow proceeds to block 2206.
In block 2205, the core 102 modifies the virtual core number generated in block 2203. In one embodiment, the core 102 sets the virtual core number to the result of a rotate function of the virtual core number generated in block 2203 and a rotate amount, as follows:
virtual core number = rotate(rotate amount, virtual core number).
In one embodiment, the rotate function rotates the virtual core numbers among the cores 102 by the rotate amount. The rotate amount is a value blown into the fuses 114 or, preferably, scanned into the holding register during testing. Table 1 shows the virtual core number of each core 102 for an example configuration in which the number of crystals 406 is two, the number of cores 102 per crystal 406 is four, and all cores 102 are enabled; the ordered pair (crystal number 258, local core number 256) of each core 102 appears in the left column, and each rotate amount appears in the top row. In this way, the tester is able to cause the cores 102 to generate any valid value as their virtual core number and, consequently, as their APIC ID. Although one embodiment for modifying the virtual core number is described, other embodiments are contemplated. For example, the rotate direction may be the opposite of that shown in Table 1. Flow proceeds to block 2206.
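Table 1
Virtual core number by core (rows: crystal number 258, local core number 256) and rotate amount (columns)
Core     0  1  2  3  4  5  6  7
(0,0)    0  7  6  5  4  3  2  1
(0,1)    1  0  7  6  5  4  3  2
(0,2)    2  1  0  7  6  5  4  3
(0,3)    3  2  1  0  7  6  5  4
(1,0)    4  3  2  1  0  7  6  5
(1,1)    5  4  3  2  1  0  7  6
(1,2)    6  5  4  3  2  1  0  7
(1,3)    7  6  5  4  3  2  1  0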
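The entries of Table 1 are consistent with subtracting the rotate amount from the default virtual core number modulo the number of cores. The Python sketch below reproduces the table under that reading; the modular formula and the extra `num_cores` parameter are inferences from the table values and are not stated in the description, and the default numbering (crystal number times four plus local core number) is likewise read off the rotate-amount-0 column.

```python
def rotate(rotate_amount, virtual_core_number, num_cores=8):
    """Rotate a virtual core number as in Table 1 (formula inferred from the table)."""
    return (virtual_core_number - rotate_amount) % num_cores


CORES_PER_CRYSTAL = 4   # example configuration of Table 1: two crystals, four cores each

table = {}
for crystal in range(2):
    for local in range(CORES_PER_CRYSTAL):
        default = crystal * CORES_PER_CRYSTAL + local   # column for rotate amount 0
        table[(crystal, local)] = [rotate(amount, default) for amount in range(8)]

for core, row in table.items():
    print(core, row)
```

Rotating in the opposite direction, which the description mentions as an alternative, would correspond to adding the rotate amount instead of subtracting it.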
In block 2206, the core 102 populates its local APIC ID register with either the default virtual core number generated in block 2203 or the modified value generated in block 2205. In one embodiment, the APIC ID register can be read by the core 102 from itself (for example, by the BIOS and/or the operating system) at memory address 0x0FEE00020. In another embodiment, the APIC ID register can be read by the core 102 at MSR address 0x802. Flow proceeds to decision block 2208.
In decision block 2208, the core 102 determines whether the APIC ID that it populated in block 2206 is zero. If so, flow proceeds to block 2212; otherwise, flow proceeds to block 2214.
In square 2212, its BSP flag is set to very (true) by core 102, to represent that core 102 is for BSP.In one embodiment, BSP flag is one of the x86APIC plot working storage (IA32_APIC_BASE MSR) of this core 102.Flow process proceeds to decision block 2216.
In square 2214, BSP flag is set to vacation (false) by core 102, to represent core 102 not for BSP, such as, in an AP.Flow process proceeds to decision block 2216.
In decision block 2216, core 102 judges whether it is BSP, such as, whether specify this as the BSP core 102 in square 2212, and non-designated itself be the AP core 102 in square 2214.If so, then flow process proceeds to square 2218; Otherwise flow process proceeds to square 2222.
In block 2218, the core 102 begins fetching and executing system initialization firmware (e.g., BSP BIOS bootstrap code). This may include instructions related to the BSP flag and APIC ID, e.g., instructions that read the APIC ID register or the APIC base register, in which case the core 102 returns the values written at blocks 2206 and 2212/2214. It may also include the core 102 acting as the only core 102 of the microprocessor 100 that performs operations on behalf of the microprocessor 100 as a whole, such as the package sleep state handshake protocol described with respect to Figure 9. Preferably, the BSP core 102 begins fetching and executing the system initialization firmware at an architecturally defined reset vector. For example, in the x86 architecture, the reset vector points to address 0xFFFFFFF0. Preferably, executing the system initialization firmware includes bootstrapping the operating system, e.g., loading the operating system and transferring control to it. Flow proceeds to block 2224.
In block 2222, the core 102 halts itself and waits for a startup sequence from the BSP before it begins fetching and executing instructions. In one embodiment, the startup sequence received from the BSP includes an interrupt vector to AP system initialization firmware (e.g., AP BIOS code). This may include instructions related to the BSP flag and APIC ID, in which case the core 102 returns the values written at blocks 2206 and 2212/2214. Flow proceeds to block 2224.
In block 2224, as the core 102 executes instructions, it receives interrupt requests based on the APIC ID written to its APIC ID register at block 2206 and responds to them. Flow ends at block 2224.
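The decision path from the APIC ID write through the BSP/AP split can be summarized in a minimal sketch. It assumes the BSP flag is bit 8 of IA32_APIC_BASE, as on x86; the function name, the default base value, and the returned step labels are hypothetical and serve only to trace the flow.

```python
IA32_APIC_BASE_BSP = 1 << 8   # BSP flag bit of the x86 IA32_APIC_BASE MSR
RESET_VECTOR = 0xFFFFFFF0     # x86 reset vector named in the text


def configure_core(virtual_core_number: int,
                   apic_base: int = 0xFEE00000) -> dict:
    apic_id = virtual_core_number                 # APIC ID register write
    is_bsp = apic_id == 0                         # decision: is the APIC ID zero?
    if is_bsp:
        apic_base |= IA32_APIC_BASE_BSP           # BSP flag set to true
        next_step = "fetch firmware at 0x%08X" % RESET_VECTOR
    else:
        apic_base &= ~IA32_APIC_BASE_BSP          # BSP flag set to false (AP)
        next_step = "halt and wait for startup sequence from BSP"
    return {"apic_id": apic_id, "is_bsp": is_bsp,
            "apic_base": apic_base, "next_step": next_step}
```

For example, `configure_core(0)` models the core that boots the system, while `configure_core(5)` models an AP that halts until the BSP starts it.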
As described above, in one embodiment the core 102 whose virtual core number is zero is the BSP by default. However, the inventors have observed that there may be situations in which it is advantageous for all the cores 102 to be designated the BSP; such embodiments are described below. For example, the developers of the microprocessor 100 may have invested a significant amount of time and cost in developing a large body of tests designed to run single-threaded on a single core, and they would like to use these single-core tests to test the multi-core microprocessor 100. For example, the tests may run in x86 real mode under the old and well-known DOS operating system.
These tests could be run on each core 102 one at a time, in a serial fashion, by using the modify-BSP feature described with respect to Figure 22 and/or by blowing fuses or scanning modified fuse values into the holding register to disable all the cores 102 except the one to be tested. However, the inventors have recognized that this takes more time than running the tests on all the cores 102 simultaneously (e.g., approximately four times as long in the case of a four-core microprocessor 100). Moreover, the time required to test each individual microprocessor 100 part is valuable, especially when hundreds of thousands or more microprocessor 100 parts are manufactured, and particularly when much of the testing is performed on very expensive test equipment.
Additionally, there may be speed paths in the logic of the microprocessor 100 that are more heavily stressed when more than one core 102 (or all the cores 102) runs at the same time, because more heat is generated and/or more power is drawn. Tests run in the serial fashion may not generate this additional stress and so may not expose such speed paths.
Therefore, embodiments are described in which all the cores 102 can be dynamically designated the BSP core 102 so that all the cores 102 can execute a test simultaneously.
Referring to Figure 23, a flowchart illustrating a process for configuring the microprocessor 100 according to another embodiment is shown. The description of Figure 23 refers to the multi-die microprocessor 100 of Figure 4, which includes two dies 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described here may be used with a microprocessor 100 having a different configuration, i.e., with more than two dies or a single die, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to this description so that together they dynamically reconfigure the microprocessor 100. Flow begins at block 2302.
In block 2302, the microprocessor 100 is reset and performs the initial portion of its initialization, preferably in a manner similar to that described above with respect to Figure 14. However, the generation of the configuration-related values, as at block 1424 of Figure 14, and in particular of the APIC ID and BSP flag, is performed in the manner described in blocks 2304 through 2312. Flow proceeds to decision block 2304.
In decision block 2304, the core 102 detects whether a feature is enabled. The feature is referred to herein as the "all cores BSP" feature. Preferably, a blown fuse 114 enables the all cores BSP feature. More preferably, rather than blowing the fuse 114 for the feature, a true value is scanned during testing into the holding register bit associated with the all cores BSP fuse 114, as described above with respect to Figure 1, to enable the feature. In this manner, the all cores BSP feature is not permanently enabled on the microprocessor 100 part but is instead disabled after power-up. Preferably, the operations in blocks 2304 through 2312 are performed by microcode of the core 102. If the all cores BSP feature is enabled, flow proceeds to block 2305; otherwise, flow proceeds to block 2203 of Figure 22.
In block 2305, the core 102 sets its virtual core number to zero regardless of its local core number 256 and die number 258. Flow proceeds to block 2306.
In block 2306, the core 102 writes the virtual core number, set to zero at block 2305, into its local APIC ID register. Flow proceeds to block 2312.
In block 2312, the core 102 sets its BSP flag to true, regardless of its local core number 256 and die number 258, to indicate that the core 102 is the BSP. Flow proceeds to block 2315.
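The effect of the all cores BSP feature on core identity can be sketched as follows. The sketch assumes a default path in which the virtual core number is the die number times the cores per die plus the local core number; that default, and all names used, are illustrative only.

```python
def assign_identity(die_number: int, local_core_number: int,
                    all_cores_bsp: bool, cores_per_die: int = 4) -> tuple:
    """Return (apic_id, bsp_flag) for one core."""
    if all_cores_bsp:
        # Feature enabled: every core reports APIC ID 0 and sets its BSP flag,
        # regardless of its die number and local core number.
        return 0, True
    # Feature disabled: assumed default numbering, with core 0 as the only BSP.
    virtual_core_number = die_number * cores_per_die + local_core_number
    return virtual_core_number, virtual_core_number == 0


def identities(all_cores_bsp: bool, dies: int = 2, cores_per_die: int = 4) -> list:
    # Identity of every core in a two-die, eight-core configuration.
    return [assign_identity(d, c, all_cores_bsp, cores_per_die)
            for d in range(dies) for c in range(cores_per_die)]
```

With the feature enabled every core looks like the BSP to a single-threaded test, which is why the per-core memory separation of block 2315 is needed.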
In block 2315, when a core 102 performs a memory access request, the microprocessor 100 modifies the upper address bits of the request address differently for each core 102, so that each core 102 accesses its own separate memory space. That is, depending on which core 102 generated the memory access request, the microprocessor 100 modifies the upper address bits so that they have a value unique to each core 102. In one embodiment, the microprocessor 100 modifies the upper address bits as indicated by a value blown into the fuses 114. In another embodiment, the microprocessor 100 modifies the upper address bits based on the local core number 256 and die number 258 of the core 102. For example, in an embodiment in which the microprocessor 100 has four cores, the microprocessor 100 modifies the upper two bits of the memory address, producing a unique value in those two bits for each core 102. In effect, the memory space addressable by the microprocessor 100 is divided into N subspaces, where N is the number of cores 102. The test programs are developed so that they confine themselves to addresses in the lowest of the N subspaces. For example, assume the microprocessor 100 can address 64 GB of memory and includes four cores 102. The tests are developed to access only the lowest 8 GB of memory. When core 0 executes an instruction that accesses memory address A (in the lowest 8 GB of memory), the microprocessor 100 generates address A (unmodified) on the memory bus; when core 1 executes an instruction that accesses the same memory address A, the microprocessor 100 generates address A+8GB on the memory bus; when core 2 executes an instruction that accesses the same memory address A, the microprocessor 100 generates address A+16GB on the memory bus; and when core 3 executes an instruction that accesses the same memory address A, the microprocessor 100 generates address A+32GB on the memory bus. In this manner, advantageously, the cores 102 do not collide with one another in their memory accesses, which enables the tests to execute correctly. Preferably, the single-threaded tests are run on a stand-alone test machine that is capable of testing the microprocessor 100 individually. The developers of the microprocessor 100 develop test data that the test machine supplies to the microprocessor 100; conversely, the developers develop result data that the test machine compares against the data written by the microprocessor 100 during memory write accesses, to ensure that the microprocessor 100 writes the correct data. In one embodiment, the shared cache memory 119 (i.e., the last-level cache, which generates the addresses used in external bus transactions) is the portion of the microprocessor 100 configured to modify the upper address bits when the all cores BSP feature is enabled. Flow proceeds to block 2318.
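The per-core address modification can be illustrated with a minimal sketch. It assumes evenly spaced 8 GB subspaces, so core 3 lands at A+24GB here, whereas the example in the text places core 3 at A+32GB; either layout keeps the per-core regions disjoint as long as the test stays within its lowest 8 GB. The function name and parameters are hypothetical.

```python
GIB = 1 << 30


def remap_address(addr: int, core_index: int,
                  subspace_bytes: int = 8 * GIB, num_cores: int = 4) -> int:
    """Map a test address in the lowest subspace to the issuing core's subspace."""
    if not 0 <= core_index < num_cores:
        raise ValueError("unknown core index")
    if not 0 <= addr < subspace_bytes:
        raise ValueError("test address must stay within the lowest subspace")
    # Adding a per-core multiple of the subspace size changes only the upper
    # address bits, because addr is smaller than subspace_bytes.
    return addr + core_index * subspace_bytes
```

Because every core issues the same address A but reaches a different bus address, four copies of a single-threaded test can run at once without overwriting one another's data.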
In block 2318, the core 102 begins fetching and executing system initialization firmware (e.g., BSP BIOS bootstrap code). This may include instructions related to the BSP flag and APIC ID, e.g., instructions that read the APIC ID register or the APIC base register, in which case the core 102 returns the zero value written at block 2306. Preferably, the BSP core 102 begins fetching and executing the system initialization firmware at an architecturally defined reset vector. For example, in the x86 architecture, the reset vector points to address 0xFFFFFFF0. Preferably, executing the system initialization firmware includes bootstrapping the operating system, e.g., loading the operating system and transferring control to it. Flow proceeds to block 2324.
In block 2324, as the core 102 executes instructions, it receives interrupt requests based on the APIC ID value of zero written to its APIC ID register at block 2306 and responds to them. Flow ends at block 2324.
Although an embodiment in which all the cores 102 are designated the BSP has been described with respect to Figure 23, other embodiments are contemplated in which multiple, but fewer than all, of the cores 102 are designated the BSP.
Although embodiments have been described in the context of an x86-style system, in which each core 102 uses a local APIC and there is a relationship between the local APIC ID and the BSP designation, it should be understood that the designation of the bootstrap processor is not limited to x86 embodiments but may be used in systems with different system architectures.
Propagation of microcode patches to multiple cores
As observed above, many important functions may be performed primarily by the microcode of a microprocessor, and in particular functions that require proper communication and coordination among the microcode instances executing on the multiple cores of the microprocessor. Because of the complexity of microcode, there is a significant probability that it will contain bugs that need to be fixed. This may be accomplished by a microcode patch, in which new microcode instructions replace the old microcode instructions that caused the bug. That is, the microprocessor includes special hardware that facilitates microcode patching. In general, it is desirable to apply the patch to all the cores of the microprocessor. Conventionally, this has been done by separately executing an architectural instruction on each core to apply the patch. However, the conventional approach can be problematic.
First, the patch may pertain to inter-core communication by the microcode instances (e.g., core synchronization, hardware semaphore use) or to features that require microcode inter-core communication (e.g., cross-core debug requests, cache control operations or power management, or dynamic multi-core microprocessor configuration). Executing the architectural patch application separately on each core can create a window of time in which the microcode patch has been applied to some cores but not to others (or a previous patch is applied to some cores and a new patch to others). This can cause inter-core communication failures and incorrect operation of the microprocessor. Other problems, both foreseeable and unforeseeable, may also arise if not all the cores of the microprocessor use the same microcode patch.
Second, the architecture of the microprocessor specifies many features that may be supported by some instances of the microprocessor and not by others. During operation, the microprocessor can communicate to system software which particular features it supports. For example, in the case of an x86 architecture microprocessor, the x86 CPUID instruction may be executed by system software to determine the supported feature set. However, the instruction that determines the feature set (e.g., CPUID) is executed separately on each core of the microprocessor. In some cases, a feature may be disabled because of a bug that existed when the microprocessor was released. Subsequently, a microcode patch that fixes the bug may be developed so that the feature can be enabled once the patch is applied. However, if the patch is applied in the conventional manner (i.e., applied separately to each core by a separate execution of the patch application instruction on each core), different cores may report different feature sets at a given point in time, depending on whether the patch has yet been applied to the core. This can be problematic, particularly when system software (such as an operating system, e.g., to facilitate thread migration between cores) expects all the cores of the microprocessor to have the same feature set. In particular, it has been observed that some system software obtains the feature set of only one core and assumes that the other cores have the same feature set.
Third, the microcode instance of each core controls and/or communicates with non-core resources shared by the cores (e.g., synchronization-related hardware, hardware semaphores, shared PRAM, shared cache, or service unit). Therefore, it may in general be problematic for the microcode of two different cores to control or communicate with a non-core resource at the same time in two different ways because one of the cores has the microcode patch applied and the other does not (or the two cores have different microcode patches applied).
Finally, the microcode patch hardware of the microprocessor may be such that applying a patch in the conventional manner could cause the patch application by one core to interfere with the patch operation of another core, e.g., if portions of the patch hardware are shared among the cores.
Preferably, embodiments are described that solve the problems described here by applying a microcode patch to a multi-core microprocessor in an atomic manner at the architectural instruction level. First, the patch is applied to the entire microprocessor 100 in response to the execution of a single architectural instruction on a single core 102. That is, embodiments do not require system software to execute the apply microcode patch instruction (described below) on each core 102. More specifically, the single core 102 that encounters the apply microcode patch instruction sends messages to, and interrupts, the other cores 102 to cause their microcode instances to apply their portion of the patch, and all the microcode instances cooperate with one another so that the microcode patch is applied to the microcode patch hardware of each core 102 and to the shared patch hardware of the microprocessor 100 while interrupts are disabled on all the cores 102. Second, the microcode instances that run on all the cores 102 and implement the atomic patch application mechanism cooperate with one another so that, once all the cores 102 of the microprocessor 100 have agreed to apply the patch, they refrain from executing any architectural instruction (other than the one apply microcode patch instruction) until all the cores 102 have finished. That is, no core 102 executes an architectural instruction while any core 102 is applying the microcode patch. Furthermore, in a preferred embodiment, all the cores 102 reach the same point in the microcode, with interrupts disabled, to perform the patch application, and thereafter execute only the microcode instructions that apply the patch until all the cores 102 of the microprocessor 100 confirm that the patch has been applied. That is, while any core 102 of the microprocessor 100 is applying the patch, no core 102 executes microcode instructions other than those that apply the microcode patch.
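The atomicity property described above (no core resumes architectural execution until every core has applied the patch) can be modeled in software with two barriers. This is only a sketch under stated assumptions: threads stand in for the per-core microcode instances, starting the threads stands in for the messages and interrupts sent by the initiating core, and all names are hypothetical.

```python
import threading


def apply_patch_atomically(num_cores: int, patch_version: int) -> tuple:
    patch_state = [0] * num_cores        # patch version applied on each core
    seen_on_resume = [None] * num_cores  # what each core observes when it resumes
    agree = threading.Barrier(num_cores)     # all cores agree to apply the patch
    finished = threading.Barrier(num_cores)  # all cores have applied the patch

    def microcode_instance(core_id: int) -> None:
        # Interrupts are assumed disabled from here until the function returns.
        agree.wait()                          # rendezvous at the same point
        patch_state[core_id] = patch_version  # apply this core's part of the patch
        finished.wait()                       # wait until every core is done
        # Only now may the core return to architectural instructions.
        seen_on_resume[core_id] = list(patch_state)

    threads = [threading.Thread(target=microcode_instance, args=(core,))
               for core in range(num_cores)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return patch_state, seen_on_resume
```

Every entry of `seen_on_resume` shows the patch present on all cores, which is the window-free behavior the conventional per-core approach cannot guarantee.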
Referring to Figure 24, a block diagram illustrating a multi-core microprocessor 100 according to another embodiment is shown. The microprocessor 100 is similar in many respects to the microprocessor 100 of Figure 1. However, the microprocessor 100 of Figure 24 also includes, in its non-core 103, a service processing unit (SPU) 2423, an SPU start address register 2497, a non-core microcode read-only memory (ROM) 2425, and a non-core microcode patch random access memory (RAM) 2408. Additionally, each core 102 includes a core PRAM 2499, a patch content-addressable memory (CAM) 2439, and a core microcode ROM 2404.
Microcode comprises microcode instructions. The microcode instructions are non-architectural instructions stored in one or more memories of the microprocessor 100 (e.g., the non-core microcode ROM 2425, the non-core microcode patch RAM 2408, and/or the core microcode ROM 2404); they are fetched by a core 102 based on a fetch address held in a non-architectural micro-program counter (micro-PC) and are used by the core 102 to implement the instructions of the instruction set architecture of the microprocessor 100. Preferably, the microcode instructions are translated by a microtranslator into microinstructions that are executed by the execution units of the core 102; in an alternate embodiment, the microcode instructions are executed directly by the execution units, in which case the microcode instructions are microinstructions. That the microcode instructions are non-architectural means that they are not instructions of the instruction set architecture (ISA) of the microprocessor 100 but are instead encoded according to an instruction set distinct from the architectural instruction set. The non-architectural micro-PC is not defined by the ISA of the microprocessor 100 and is distinct from the architecturally defined program counter of the core 102. The microcode is used to implement some or all of the instructions of the ISA of the microprocessor 100 as follows. In response to decoding a microcode-implemented ISA instruction, the core 102 transfers control to a microcode routine associated with the ISA instruction. The microcode routine comprises microcode instructions. The execution units execute the microcode instructions or, according to the preferred embodiment, the microcode instructions are further translated into microinstructions that the execution units execute. The results of the execution of the microcode instructions (or of the microinstructions translated from them) by the execution units are the results defined by the ISA instruction. Thus, the collective execution by the execution units of the microcode routine associated with the ISA instruction (or of the microinstructions translated from the microcode instructions of the routine) "implements" the ISA instruction. That is, the collective execution by the execution units of the microcode instructions (or of the microinstructions translated from them) performs the operation specified by the ISA instruction on the inputs specified by the ISA instruction to produce a result defined by the ISA instruction. Additionally, the microcode instructions may be executed (or translated into microinstructions that are executed) when the microprocessor is reset, in order to configure the microprocessor.
The core microcode ROM 2404 holds microcode executed by the particular core 102 that contains it. The non-core microcode ROM 2425 also holds microcode executed by the cores 102; however, unlike the core ROM 2404, the non-core ROM 2425 is shared by the cores 102. Preferably, because the access time of the non-core ROM 2425 is greater than that of the core ROM 2404, the non-core ROM 2425 holds microcode routines that require less performance and/or are executed less frequently. Additionally, the non-core ROM 2425 holds code that is fetched and executed by the SPU 2423.
The non-core microcode patch RAM 2408 is also shared by the cores 102 and holds microcode instructions executed by the cores 102. The patch CAM 2439 holds patch addresses that it outputs to a microsequencer in response to a microcode fetch address when the fetch address matches the contents of one of the entries in the patch CAM 2439. In that case, the microsequencer outputs the patch address as the microcode fetch address, rather than the next sequential fetch address (or the target address in the case of a branch-type instruction), and in response the non-core patch RAM 2408 outputs a patch microcode instruction. For example, because the microcode instruction being patched and/or the microcode instructions following it are a source of a bug, a patch microcode instruction is fetched from the non-core patch RAM 2408 and executed instead of the microcode instruction that would have been fetched from the non-core ROM 2425 or the core ROM 2404. Thus, the patch microcode instruction effectively replaces, or patches, the undesirable microcode instruction that resides in the core ROM 2404 or the non-core microcode ROM 2425 at the original microcode fetch address. Preferably, the patch CAM 2439 and the patch RAM 2408 are loaded in response to architectural instructions included in system software, such as the BIOS or the operating system running on the microprocessor 100.
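The redirection performed by the patch CAM can be modeled as a lookup keyed by the microcode fetch address. In this sketch, dictionaries stand in for the CAM, the patch RAM, and the microcode ROM; the addresses and instruction strings are made up for illustration.

```python
def fetch_microcode(fetch_addr: int, patch_cam: dict,
                    patch_ram: dict, microcode_rom: dict) -> str:
    """Return the microcode instruction actually delivered for fetch_addr."""
    patch_addr = patch_cam.get(fetch_addr)   # CAM match on the fetch address?
    if patch_addr is not None:
        # Hit: the patch address replaces the fetch address, so the
        # instruction comes from the patch RAM instead of the ROM.
        return patch_ram[patch_addr]
    # Miss: the instruction is fetched from the ROM as usual.
    return microcode_rom[fetch_addr]


# Hypothetical contents: the ROM instruction at 0x101 is buggy and is patched.
rom = {0x100: "load", 0x101: "buggy_add", 0x102: "store"}
cam = {0x101: 0x0}
ram = {0x0: "fixed_add"}
patched_stream = [fetch_microcode(a, cam, ram, rom) for a in (0x100, 0x101, 0x102)]
```

Here `patched_stream` contains the fixed instruction in place of the buggy one, while unpatched addresses are served from the ROM unchanged.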
Among other things, the non-core PRAM 116 is used by the microcode to store values that the microcode uses. Some of these values are effectively constants: they are either immediate values stored in the core microcode ROM 2404 or the non-core microcode ROM 2425, or values blown into the fuses 114 when the microprocessor 100 was manufactured, that the microcode writes to the non-core PRAM 116 when the microprocessor 100 is reset, and they are not modified during operation of the microprocessor 100 except by a patch or in response to the execution of an instruction that explicitly modifies the value (e.g., a WRMSR instruction). Advantageously, these values can be modified via the patch mechanism described here, without changing the core microcode ROM 2404 or the non-core microcode ROM 2425, which may be very expensive, and without requiring one or more unblown fuses 114.
Additionally, the non-core PRAM 116 is used to hold patch code that is fetched and executed by the SPU 2423, as described here.
The core PRAM 2499, like the non-core PRAM 116, is private, or non-architectural, meaning that it is not in the architectural user program address space of the microprocessor 100. However, unlike the non-core PRAM 116, each core PRAM 2499 is accessed only by its respective core 102 and is not shared by the other cores 102. Like the non-core PRAM 116, the core PRAM 2499 is used by the microcode to store values that the microcode uses. Advantageously, these values can be modified via the patch mechanism described here without changing the core microcode ROM 2404 or the non-core microcode ROM 2425.
The SPU 2423 comprises a stored-program processor that is an adjunct to, and distinct from, each of the cores 102. Whereas the cores 102 are architecturally capable of executing instructions of the ISA of the cores 102 (e.g., x86 ISA instructions), the SPU 2423 is not architecturally capable of doing so. Thus, for example, the operating system cannot run on the SPU 2423, nor can the operating system schedule programs of the ISA of the cores 102 (e.g., x86 ISA programs) to run on the SPU 2423. In other words, the SPU 2423 is not a system resource managed by the operating system. Rather, the SPU 2423 performs operations used to debug the microprocessor 100. Additionally, the SPU 2423 can assist in measuring the performance of the cores 102 and in other functions. Preferably, the SPU 2423 is much smaller, less complex, and lower in power consumption than the cores 102 (e.g., in one embodiment, the SPU 2423 includes built-in clock gating). In one embodiment, the SPU 2423 comprises a FORTH CPU core.
Asynchronous events may occur that the debug instructions executed by the cores 102 cannot handle well. Advantageously, however, the SPU 2423 can be commanded by a core 102 to detect such an event and, in response to detecting it, to perform actions such as creating a log or modifying aspects of the behavior of the core 102 and/or of the external bus interface of the microprocessor 100. The SPU 2423 can provide the log information to the user, and it can also interact with the tracer to request that the tracer provide the log information or perform other actions. In one embodiment, the SPU 2423 has access to control registers of the memory subsystem and of the programmable interrupt controller of each core 102, as well as to control registers of the shared cache memory 119.
Examples of the events that the SPU 2423 can detect include the following: (1) a core 102 is hung, i.e., the core 102 has not retired any instruction for a programmable number of clock cycles; (2) a core 102 loads data from an uncacheable region of memory; (3) a temperature change occurs in the microprocessor 100; (4) the operating system requests a change in the bus clock ratio of the microprocessor 100 and/or a change in its voltage level; (5) the microprocessor 100 changes its voltage level and/or bus clock ratio of its own accord, e.g., to save power or improve performance; (6) an internal timer of a core 102 expires; (7) a cache snoop hits a modified cache line, causing the cache line to be written back to memory; (8) the temperature, voltage, or bus clock ratio of the microprocessor 100 goes outside a respective range; (9) an external trigger signal is asserted by a user on an external pin of the microprocessor 100.
Advantageously, because the SPU 2423 runs independently of the code 132 of the cores 102, it does not have the same limitations as tracer microcode executing on the cores 102. Thus, the SPU 2423 can detect, or be notified of, events independently of the instruction execution boundaries of the cores 102 and without disturbing the state of the cores 102.
The SPU 2423 has its own code that it executes. The SPU 2423 can fetch its code from either the non-core microcode ROM 2425 or the non-core PRAM 116. That is, preferably, the SPU 2423 shares the non-core ROM 2425 and the non-core PRAM 116 with the microcode that runs on the cores 102. The SPU 2423 uses the non-core PRAM 116 to store its data, including the log. In one embodiment, the SPU 2423 also includes its own serial port interface through which it can transmit the log to an external device. Advantageously, the SPU 2423 can also instruct the tracer running on a core 102 to store the log information from the non-core PRAM 116 to system memory.
The SPU 2423 communicates with the cores 102 via a status register and control registers. The SPU status register includes a bit corresponding to each of the events described above that the SPU 2423 can detect. To notify the SPU 2423 of an event, a core 102 sets the bit in the SPU status register that corresponds to the event. Some of the event bits are set by hardware of the microprocessor 100 and some are set by microcode of the cores 102. The SPU 2423 reads the status register to determine the list of events that have occurred. A control register includes bits corresponding to each of the actions that the SPU 2423 can take in response to detecting one of the events specified in the status register. That is, a set of action bits exists in the control register for each possible event of the status register. In one embodiment, there are 16 action bits per event. In one embodiment, writing the status register to indicate an event causes an interrupt to the SPU 2423, in response to which the SPU 2423 reads the status register to determine which events have occurred. Advantageously, this saves power by reducing the need for the SPU 2423 to poll the status register. The status register and control registers can also be read and written by user programs that execute instructions such as RDMSR and WRMSR.
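The relationship between event bits and per-event action bits can be pictured with a small register model. It is a sketch only: the class and method names are hypothetical, the bit assignments are arbitrary, and clearing the status register after servicing is an assumption the text does not state.

```python
ACTION_BITS_PER_EVENT = 16  # one embodiment in the text uses 16 action bits per event


class SpuRegisters:
    def __init__(self, num_events: int) -> None:
        self.status = 0                   # one bit per detectable event
        self.control = [0] * num_events   # one 16-bit action mask per event

    def enable_action(self, event: int, action: int) -> None:
        # Programmed in advance: which actions to take when the event occurs.
        if not 0 <= action < ACTION_BITS_PER_EVENT:
            raise ValueError("action bit out of range")
        self.control[event] |= 1 << action

    def notify(self, event: int) -> None:
        # A core (or hardware) sets the event bit; in one embodiment this
        # write also interrupts the SPU so that it need not poll.
        self.status |= 1 << event

    def service(self) -> list:
        # SPU side: read the status register, look up the enabled actions of
        # each pending event, then clear the status (an assumption).
        pending = []
        for event, mask in enumerate(self.control):
            if self.status & (1 << event):
                actions = [bit for bit in range(ACTION_BITS_PER_EVENT)
                           if mask & (1 << bit)]
                pending.append((event, actions))
        self.status = 0
        return pending
```

A caller would program the action masks once, after which each `notify` followed by `service` yields the event numbers together with the actions to perform for them.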
The set of actions that the SPU 2423 can take in response to detecting an event includes the following.
(1) Write the log information to the non-core PRAM 116. For each of the actions that write the log, multiple action bits exist so that the programmer can specify that only particular subsets of the log information are to be written.
(2) Write the log information from the non-core PRAM 116 to the serial port interface.
(3) Write one of the control registers to set an event for the tracker. That is, the SPU 2423 can interrupt a core 102 and cause the tracker microcode to be invoked to perform a set of actions associated with the event. The actions are specified beforehand by the user. In one embodiment, when the SPU 2423 writes the control register to set the event, this causes the core 102 to take a machine check exception, and the machine check exception handler checks whether the tracker is activated. If so, the machine check exception handler transfers control to the tracker. The tracker reads the control register and, if the events set in the control register are events that the user has enabled for the tracker, the tracker performs the actions associated with those events as specified beforehand by the user. For example, the SPU 2423 can set an event that causes the tracker to write the log information stored in the non-core PRAM 116 to system memory.
(4) Write a control register to cause the microcode to branch to a microcode address specified by the SPU 2423. This is particularly helpful if the microcode is in an infinite loop such that the tracker cannot perform any meaningful actions, yet the core 102 is still executing and retiring instructions, which means that the processor-hung event will not occur.
(5) Write a control register to cause a core 102 to be reset. As mentioned above, the SPU 2423 can detect that a core 102 is hung (e.g., has not retired any instructions for a programmable amount of time) and reset the core 102. The reset microcode can check whether the reset was initiated by the SPU 2423 and, if so, advantageously write the log information out to system memory before clearing it in the course of initializing the core 102.
(6) Log events continuously. In this mode, rather than waiting to be interrupted by an event, the SPU 2423 spins in a loop that checks the status register and continuously logs information associated with the events indicated there to the non-core PRAM 116, and can optionally also write the log information to the serial port interface.
(7) Write a control register to stop a core 102 from issuing requests to the shared cache memory 119 and/or to stop the shared cache memory 119 from acknowledging requests to the cores 102. This may be particularly useful for debugging design bugs related to the memory subsystem, such as page tablewalk hardware bugs, and even for fixing the bugs during operation of the microprocessor 100, such as through a patch of the SPU 2423 code, as described below.
(8) Write control registers of an external bus interface controller of the microprocessor 100 to perform transactions on the external system bus, such as special cycles or memory read/write cycles.
(9) Write control registers of the programmable interrupt controller of a core 102, for example, to generate an interrupt to another core 102, to simulate an I/O device to the cores 102, or to fix a bug in the interrupt controller.
(10) Write control registers of the shared cache memory 119 to control its size, e.g., to disable or enable different ways of the associative shared cache memory 119.
(11) Write control registers of the various functional units of the cores 102 to configure different performance features, such as branch prediction and data prefetch algorithms.
As described below, the SPU 2423 code can advantageously be patched, even after the design of the microprocessor 100 has been completed and the microprocessor 100 has been fabricated, which enables the SPU 2423 to perform actions such as those described herein to remedy design flaws or to perform other functions.
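The hang condition mentioned in action (5), a core that has not retired any instruction for a programmable amount of time, can be modelled with a short Python sketch. The sampling scheme (one call per timer tick, comparing a retired-instruction counter) and the names `HangDetector`, `threshold_ticks` and `tick` are assumptions made for illustration, not details from the source.

```python
class HangDetector:
    """Flags a core as hung when its retired-instruction count stops changing."""

    def __init__(self, threshold_ticks):
        self.threshold_ticks = threshold_ticks  # the "programmable amount of time"
        self.last_count = None
        self.idle_ticks = 0

    def tick(self, retired_count):
        """Sample the counter once per tick; return True once the core looks hung."""
        if retired_count != self.last_count:
            # Progress was made since the previous sample: restart the idle timer.
            self.last_count = retired_count
            self.idle_ticks = 0
        else:
            self.idle_ticks += 1
        return self.idle_ticks >= self.threshold_ticks
```

A caller modelling the SPU would reset the core when `tick` returns True.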
The SPU start address register 2497 holds the address at which the SPU 2423 begins fetching instructions when it comes out of reset. The SPU start address register 2497 is written by the cores 102. The address may be in either the non-core PRAM 116 or the non-core microcode ROM 2425.
Referring to Figure 25, a block diagram illustrating the structure of a microcode patch 2500 according to one embodiment of the invention is shown. In the embodiment of Figure 25, the microcode patch 2500 includes the following portions: a header 2502; an immediate patch 2504; a checksum 2506 of the immediate patch 2504; CAM data 2508; a core PRAM patch 2512; a checksum 2514 of the CAM data 2508 and the core PRAM patch 2512; a RAM patch 2516; a non-core PRAM patch 2518; and a checksum 2522 of the core PRAM patch 2512 and the RAM patch 2516. The checksums 2506/2514/2522 enable the microprocessor 100 to verify the integrity of the respective portions of the patch after they have been loaded into the microprocessor 100. Preferably, the microcode patch 2500 is read from system memory and/or from a non-volatile system memory, such as a ROM or FLASH memory that holds a system BIOS or extensible firmware. The header 2502 describes each portion of the patch 2500, such as its size, the location in its respective patch-related memory to which the portion is to be loaded, and a valid flag that indicates whether the portion contains a valid patch to be applied to the microprocessor 100.
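The role of the header fields and checksums can be illustrated with a minimal Python sketch. The checksum algorithm is not given in the text, so a 32-bit additive byte sum is assumed here only as a stand-in; the names `PatchPortion`, `checksum32` and `verify_portions` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PatchPortion:
    """One portion of a patch, with the per-portion facts the header records."""
    name: str
    payload: bytes
    load_address: int  # where the portion goes in its patch-related memory
    valid: bool        # whether the portion holds a patch to apply


def checksum32(data):
    """Assumed stand-in checksum: 32-bit wrap-around sum of the bytes."""
    return sum(data) & 0xFFFFFFFF


def verify_portions(portions, expected_checksum):
    """Recompute the checksum over the loaded valid portions and compare."""
    loaded = b"".join(p.payload for p in portions if p.valid)
    return checksum32(loaded) == expected_checksum
```

For example, the immediate patch would be checked on its own against checksum 2506, while the CAM data and core PRAM patch would be passed together and checked against checksum 2514.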
The immediate patch 2504 comprises code (i.e., instructions, preferably microcode instructions) to be loaded into the non-core microcode patch RAM 2408 of Figure 24 (e.g., at block 2612 of Figures 26A-26B) and then executed by each of the cores 102 (e.g., at block 2616 of Figures 26A-26B). The patch 2500 also specifies the address in the patch RAM 2408 to which the immediate patch 2504 is to be loaded. Preferably, the immediate patch 2504 code modifies default values that were written by the reset microcode, such as values written to configuration registers that affect the configuration of the microprocessor 100. After the immediate patch 2504 has been executed by each of the cores out of the patch RAM 2408, it is not executed again. Furthermore, the subsequent loading of the RAM patch 2516 into the patch RAM 2408 (e.g., at block 2632 of Figures 26A-26B) may overwrite the immediate patch 2504 in the patch RAM 2408.
The RAM patch 2516 comprises patch microcode instructions to be executed in place of microcode instructions in the core ROM 2404 or the non-core ROM 2425 that need to be patched. The RAM patch 2516 also includes the address of the location in the patch RAM 2408 to which the patch microcode instructions are to be written when the patch 2500 is applied (e.g., at block 2632 of Figures 26A-26B). The CAM data 2508 is loaded into the patch CAM 2439 of each core 102 (e.g., at block 2626 of Figures 26A-26B). As described above from the perspective of the operation of the patch CAM 2439, the CAM data 2508 includes one or more entries, each of which comprises a pair of microcode fetch addresses. The first address is that of the microcode instruction to be patched and is the content against which the fetch address is matched. The second address points to the location in the patch RAM 2408 that holds the patch microcode instruction to be executed in place of the patched microcode instruction.
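The address-pair mechanism of the CAM data can be illustrated with a minimal Python sketch. Dictionaries stand in for the ROM, patch CAM and patch RAM, and the function name `fetch_microcode` is hypothetical; real hardware performs the match in parallel with the ROM access rather than with a lookup call.

```python
def fetch_microcode(fetch_address, rom, patch_cam, patch_ram):
    """Return the microcode instruction for fetch_address, honouring patches.

    patch_cam maps a patched fetch address (first address of a CAM entry)
    to a patch RAM address (second address of the entry).
    """
    patch_ram_address = patch_cam.get(fetch_address)
    if patch_ram_address is not None:
        # CAM hit: the patch instruction replaces the ROM instruction.
        return patch_ram[patch_ram_address]
    # CAM miss: the ROM instruction is used unchanged.
    return rom[fetch_address]
```

With an empty `patch_cam` every fetch falls through to the ROM, which is why loading new patch RAM contents is harmless as long as no CAM entry hits.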
Unlike the immediate patch 2504, the RAM patch 2516 remains in the patch RAM 2408 and (together with the operation of the patch CAM 2439 according to the CAM data 2508) continues to function to patch the core microcode ROM 2404 and/or the non-core microcode ROM 2425 until it is replaced by another patch 2500 or the microprocessor 100 is reset.
The core PRAM patch 2512 includes data to be written to the core PRAM 2499 of each core 102 and the address within the core PRAM 2499 to which each item of the data is to be written (e.g., at block 2626 of Figures 26A-26B). The non-core PRAM patch 2518 includes data to be written to the non-core PRAM 116 and the address within the non-core PRAM 116 to which each item of the data is to be written (e.g., at block 2632 of Figures 26A-26B).
Referring to Figures 26A-26B, a flowchart illustrating operation of the microprocessor 100 of Figure 24 to propagate a microcode patch 2500 of Figure 25 to multiple cores 102 of the microprocessor 100 is shown. The operation is described from the perspective of a single core, but each of the cores 102 of the microprocessor 100 operates according to the description so that the cores collectively propagate the microcode patch to all the cores 102 of the microprocessor 100. More specifically, Figures 26A-26B describe the operation of the one core that encounters the instruction to apply the microcode patch, whose flow begins at block 2602, and the operation of the other cores 102, whose flow begins at block 2652. It should be understood that multiple patches 2500 may be applied to the microprocessor 100 at different times during its operation. For example, a first patch 2500 may be applied, according to the atomic embodiments described herein, when the system that includes the microprocessor 100 is booted, such as during BIOS initialization, and a second patch 2500 may be applied after the operating system is running, which may be particularly useful for debugging the microprocessor 100.
At block 2602, one of the cores 102 encounters an instruction that instructs it to apply a microcode patch to the microprocessor 100. Preferably, the microcode patch is similar to the microcode patch described above. In one embodiment, the apply microcode patch instruction is an x86 WRMSR instruction. In response to the apply microcode patch instruction, the core 102 disables interrupts and traps to microcode that implements the apply microcode patch instruction. It should be understood that the system software that includes the apply microcode patch instruction may include a sequence of multiple instructions in preparation for applying the microcode patch. Preferably, however, it is in response to a single architectural instruction of the sequence that the microcode patch is propagated to all the cores in an atomic fashion at the architectural instruction level. That is, once interrupts are disabled on the first core 102 (i.e., the core 102 that encounters the apply microcode patch instruction at block 2602), interrupts remain disabled while the implementing microcode propagates the microcode patch and applies it to all the cores 102 of the microprocessor 100 (e.g., until after block 2652). Furthermore, once interrupts are disabled on the other cores 102 (e.g., at block 2652), they remain disabled until the microcode patch has been applied to all the cores 102 of the microprocessor 100 (e.g., until after block 2634). Thus, advantageously, the microcode patch is propagated and applied to all the cores 102 of the microprocessor 100 in an atomic fashion at the architectural instruction level. Flow proceeds to block 2604.
At block 2604, the core 102 obtains ownership of the hardware semaphore 118 of Figure 1. Preferably, the microprocessor 100 includes a hardware semaphore 118 associated with patching microcode. Preferably, the core 102 obtains ownership of the hardware semaphore 118 in a manner similar to that described above with respect to Figure 20, more particularly blocks 2004 and 2006. The hardware semaphore 118 is used because it is possible that one of the cores 102 is applying a patch 2500 in response to encountering an apply microcode patch instruction while a second core 102 encounters an apply microcode patch instruction, in response to which the second core would begin to apply a second patch 2500. This could result in incorrect execution, for example, due to misapplication of the first patch 2500. Flow proceeds to block 2606.
At block 2606, the core 102 sends a patch message to the other cores 102 and sends an inter-core interrupt to the other cores 102. Preferably, the core 102 traps to microcode in response to the apply microcode patch instruction (at block 2602) or in response to the interrupt (at block 2652), and remains in microcode until block 2634, during which time interrupts are disabled (i.e., the microcode does not allow itself to be interrupted). Flow proceeds from block 2606 to block 2608.
At block 2652, one of the other cores 102 (i.e., a core 102 other than the core 102 that encountered the apply microcode patch instruction at block 2602) is interrupted and receives the patch message as a result of the inter-core interrupt sent at block 2606. In one embodiment, the core 102 takes the interrupt at the next architectural instruction boundary (e.g., at the next x86 instruction boundary). In response to the interrupt, the core 102 disables interrupts and traps to microcode that processes the patch message. As described above, although the flow at block 2652 is described from the perspective of a single core 102, each of the other cores 102 (i.e., not the core 102 at block 2602) is interrupted and receives the message at block 2652, and performs the steps at blocks 2608 through 2634. Flow proceeds from block 2652 to block 2608.
At block 2608, the core 102 writes a sync request with sync condition 21 (denoted SYNC21 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC21. Flow proceeds to decision block 2611.
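The sleep-until-all-have-written behaviour of the SYNC steps can be modelled in software as a barrier. The following Python sketch uses threads as stand-ins for cores; it is only an analogy for the hardware control unit, and the names `ControlUnit`, `write_sync` and `run_cores` are hypothetical.

```python
import threading


class ControlUnit:
    """Software model: a core that writes a sync condition sleeps until all have."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.cond = threading.Condition()
        self.written = {}  # sync condition number -> set of core ids that wrote it

    def write_sync(self, core_id, condition):
        with self.cond:
            arrived = self.written.setdefault(condition, set())
            arrived.add(core_id)
            if len(arrived) == self.num_cores:
                self.cond.notify_all()  # last writer wakes every sleeping core
            else:
                # Sleep until every core has written the same condition.
                self.cond.wait_for(lambda: len(arrived) == self.num_cores)


def run_cores(num_cores, condition=21):
    """Run one SYNC step on num_cores threads and return the event log."""
    unit = ControlUnit(num_cores)
    log = []

    def core(core_id):
        log.append(("arrive", core_id))
        unit.write_sync(core_id, condition)
        log.append(("wake", core_id))

    threads = [threading.Thread(target=core, args=(i,)) for i in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the returned log every "arrive" entry precedes every "wake" entry, which is the property the later patch steps rely on: no core proceeds until all cores have reached the same point.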
At decision block 2611, the core 102 determines whether it is the core 102 that encountered the apply microcode patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2612; otherwise, flow proceeds to block 2614.
At block 2612, the core 102 loads the immediate patch 2504 portion of the microcode patch 2500 into the non-core patch RAM 2408. Additionally, the core 102 generates a checksum of the loaded immediate patch 2504 and verifies that it matches the checksum 2506. Preferably, the core 102 also sends information to the other cores 102 that indicates the length of the immediate patch 2504 and the location within the non-core patch RAM 2408 to which it was loaded. Advantageously, because all the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, if a previous RAM patch 2516 is present in the non-core patch RAM 2408, it is safe to overwrite the non-core patch RAM 2408 with the new patch, since there will be no hits in the patch CAM 2439 during this time (assuming the microcode that implements the application of the microcode patch is not itself patched). In an alternate embodiment, the core 102 loads the immediate patch 2504 into the non-core PRAM 116, and prior to execution of the immediate patch 2504 at block 2616, the cores 102 copy the immediate patch 2504 from the non-core PRAM 116 to the non-core patch RAM 2408. Preferably, the core 102 loads the immediate patch into a portion of the non-core PRAM 116 that is reserved for this purpose, i.e., a portion that is not used for another purpose, such as holding values used by the microcode (e.g., the core 102 state, TPM state, or effective microcode constants described above), and that is not a portion of the non-core PRAM 116 that may be patched (e.g., at block 2632), so that any previous non-core PRAM patch 2518 is not clobbered. In one embodiment, the loading into and copying from the non-core PRAM 116 are performed in multiple stages in order to reduce the required size of the reserved portion. Flow proceeds to block 2614.
At block 2614, the core 102 writes a sync request with sync condition 22 (denoted SYNC22 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC22. Flow proceeds to block 2616.
At block 2616, the core 102 executes the immediate patch 2504 from the non-core patch RAM 2408. As described above, in one embodiment the core 102 copies the immediate patch 2504 from the non-core PRAM 116 to the non-core patch RAM 2408 before executing the immediate patch 2504. Flow proceeds to block 2618.
At block 2618, the core 102 writes a sync request with sync condition 23 (denoted SYNC23 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC23. Flow proceeds to decision block 2621.
At decision block 2621, the core 102 determines whether it is the core 102 that encountered the apply microcode patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2622; otherwise, flow proceeds to block 2624.
At block 2622, the core 102 loads the CAM data 2508 and the core PRAM patch 2512 into the non-core PRAM 116. Additionally, the core 102 generates a checksum of the loaded CAM data 2508 and core PRAM patch 2512 and verifies that it matches the checksum 2514. Preferably, the core 102 also sends information to the other cores 102 that indicates the length of the CAM data 2508 and core PRAM patch 2512 and the location within the non-core PRAM 116 to which they were loaded. Preferably, the core 102 loads the CAM data 2508 and core PRAM patch 2512 into a reserved portion of the non-core PRAM 116 so that any previous non-core PRAM patch 2518 is not clobbered, in a manner similar to that described with respect to block 2612. Flow proceeds to block 2624.
At block 2624, the core 102 writes a sync request with sync condition 24 (denoted SYNC24 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC24. Flow proceeds to block 2626.
At block 2626, the core 102 loads the CAM data 2508 from the non-core PRAM 116 into its patch CAM 2439. Additionally, the core 102 loads the core PRAM patch 2512 from the non-core PRAM 116 into its core PRAM 2499. Advantageously, because all the cores are known to be executing the same microcode that implements the application of the microcode patch, it is safe to load the patch CAM 2439 with the CAM data 2508 even though the corresponding RAM patch 2516 has not yet been written into the non-core patch RAM 2408 (which will occur at block 2632), since there will be no hits in the patch CAM 2439 during this time (assuming the microcode that implements the application of the microcode patch is not itself patched). Additionally, because all the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, and interrupts are not enabled on any of the cores 102 until the patch 2500 has been propagated to all the cores 102, any updates to the core PRAM 2499 performed by the core PRAM patch 2512, including updates that change values that may affect operation of the core 102 (e.g., feature settings), are guaranteed not to be architecturally visible until the patch 2500 has been propagated to all the cores 102. Flow proceeds to block 2628.
At block 2628, the core 102 writes a sync request with sync condition 25 (denoted SYNC25 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC25. Flow proceeds to decision block 2631.
At decision block 2631, the core 102 determines whether it is the core 102 that encountered the apply microcode patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2632; otherwise, flow proceeds to block 2634.
At block 2632, the core 102 loads the RAM patch 2516 into the non-core patch RAM 2408. Additionally, the core 102 loads the non-core PRAM patch 2518 into the non-core PRAM 116. In one embodiment, the non-core PRAM patch 2518 includes code to be executed by the SPU 2423. In one embodiment, the non-core PRAM patch 2518 includes updates to values used by the microcode, as described above. In one embodiment, the non-core PRAM patch 2518 includes both SPU 2423 code and updates to values used by the microcode. Advantageously, because all the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, and more specifically because the patch CAMs 2439 of all the cores 102 have already been loaded with the new CAM data 2508 (e.g., at block 2626), there will be no hits in the patch CAM 2439 during this time (assuming the microcode that implements the application of the microcode patch is not itself patched), so it is safe to load the RAM patch 2516 into the non-core patch RAM 2408. Additionally, because all the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, and interrupts are not enabled on any of the cores 102 until the patch 2500 has been propagated to all the cores 102, any updates to the non-core PRAM 116 performed by the non-core PRAM patch 2518, including updates that change values that may affect operation of the cores 102 (e.g., feature settings), are guaranteed not to be architecturally visible until the patch 2500 has been propagated to all the cores 102. Flow proceeds to block 2634.
At block 2634, the core 102 writes a sync request with sync condition 26 (denoted SYNC26 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC26. Flow ends at block 2634.
After block 2634, if code was loaded into the non-core PRAM 116 for the SPU 2423, the patching core 102 also subsequently starts the SPU 2423 executing the code, as described with respect to Figure 30. Additionally, after block 2634, the patching core 102 releases the hardware semaphore 118 that it obtained at block 2604. Furthermore, after block 2634, the core 102 re-enables interrupts.
Referring to Figure 27, a timing diagram illustrating an example of operation of a microprocessor according to the flowchart of Figures 26A-26B is shown. In the example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102. In the timing diagram, events proceed as described below.
Core 0 receives a request to patch microcode (per block 2602) and in response obtains the hardware semaphore 118 (per block 2604). Core 0 then sends a microcode patch message and an interrupt to core 1 and core 2 (per block 2606). Core 0 then writes a SYNC21 and is put to sleep (per block 2608).
Each of core 1 and core 2 is eventually interrupted from its current task and reads the message (per block 2652). In response, each of core 1 and core 2 writes a SYNC21 and is put to sleep (per block 2608). As shown, the time at which each core writes its SYNC21 may vary, for example, due to the latency of the instruction that is executing when the interrupt is asserted.
When all the cores have written the SYNC21, the control unit 104 wakes all the cores simultaneously (per block 2608). Core 0 then loads the immediate patch 2504 into the non-core PRAM 116 (per block 2612), writes a SYNC22, and is put to sleep (per block 2614). Each of core 1 and core 2 writes a SYNC22 and is put to sleep (per block 2614).
When all the cores have written the SYNC22, the control unit 104 wakes all the cores simultaneously (per block 2614). Each core executes the immediate patch 2504 (per block 2616), writes a SYNC23, and is put to sleep (per block 2618).
When all the cores have written the SYNC23, the control unit 104 wakes all the cores simultaneously (per block 2618). Core 0 then loads the CAM data 2508 and the core PRAM patch 2512 into the non-core PRAM 116 (per block 2622), writes a SYNC24, and is put to sleep (per block 2624).
When all the cores have written the SYNC24, the control unit 104 wakes all the cores simultaneously (per block 2624). Each core then loads its patch CAM 2439 with the CAM data 2508 and loads its core PRAM 2499 with the core PRAM patch 2512 (per block 2626), writes a SYNC25, and is put to sleep (per block 2628).
When all the cores have written the SYNC25, the control unit 104 wakes all the cores simultaneously (per block 2628). Core 0 then loads the RAM patch 2516 into the non-core patch RAM 2408 and loads the non-core PRAM patch 2518 into the non-core PRAM 116, writes a SYNC26, and is put to sleep (per block 2634).
When all the cores have written the SYNC26, the control unit 104 wakes all the cores simultaneously (per block 2634). As described above, if code was loaded into the non-core PRAM 116 for the SPU 2423 at block 2632, the core 102 also subsequently starts the SPU 2423 executing the code, as described below with respect to Figure 30.
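The ordering of the example above can be summarised with a small deterministic Python sketch. It lists each step of Figures 26A-26B, marks whether the step is performed only by the core that received the patch request or by every core, and expands the steps into a flat event list. The step table restates the text; the names `STEPS` and `timeline` and the tuple layout are hypothetical, and the sketch does not model timing or sleep.

```python
# (who performs the step, block number, description)
STEPS = [
    ("all", 2608, "write SYNC21"),
    ("initiator", 2612, "load immediate patch 2504"),
    ("all", 2614, "write SYNC22"),
    ("all", 2616, "execute immediate patch 2504"),
    ("all", 2618, "write SYNC23"),
    ("initiator", 2622, "load CAM data 2508 and core PRAM patch 2512 into non-core PRAM 116"),
    ("all", 2624, "write SYNC24"),
    ("all", 2626, "load own patch CAM 2439 and core PRAM 2499"),
    ("all", 2628, "write SYNC25"),
    ("initiator", 2632, "load RAM patch 2516 and non-core PRAM patch 2518"),
    ("all", 2634, "write SYNC26"),
]


def timeline(initiator, cores):
    """Expand STEPS into (block, core, description) events in program order."""
    events = []
    for who, block, description in STEPS:
        actors = [initiator] if who == "initiator" else list(cores)
        for core in actors:
            events.append((block, core, description))
    return events
```

Calling `timeline(0, [0, 1, 2])` reproduces the three-core example with core 0 as the patching core; every initiator-only step is bracketed by a SYNC step on both sides, which is what keeps the other cores from observing a partially loaded patch.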
Referring to Figure 28, a block diagram illustrating a multi-core microprocessor 100 according to an alternate embodiment is shown. The microprocessor 100 is similar in many respects to the microprocessor 100 of Figure 24. However, the microprocessor 100 of Figure 28 does not include a non-core patch RAM; instead, each core 102 includes a core patch RAM 2808 that serves a function similar to that of the non-core patch RAM 2408 of Figure 24. The core patch RAM 2808 in each core 102 is private to its respective core 102 and is not shared with the other cores 102.
Referring to Figures 29A-29B, a flowchart illustrating operation of the microprocessor 100 of Figure 28 to propagate a microcode patch to multiple cores 102 of the microprocessor 100 according to an alternate embodiment is shown. In the alternate embodiment of Figure 28 and Figures 29A-29B, the patch 2500 of Figure 25 may be modified such that the checksum 2514 follows the RAM patch 2516, rather than the core PRAM patch 2512, and enables the microprocessor 100 to verify the integrity of the CAM data 2508, the core PRAM patch 2512 and the RAM patch 2516 after they have been loaded into the microprocessor 100 (e.g., at block 2922 of Figures 29A-29B). The flowchart of Figures 29A-29B is similar in many respects to the flowchart of Figures 26A-26B, and like-numbered blocks are similar. However, block 2912 replaces block 2612, block 2916 replaces block 2616, block 2922 replaces block 2622, block 2926 replaces block 2626, and block 2932 replaces block 2632. At block 2912, the core 102 loads the immediate patch 2504 into the non-core PRAM 116 (rather than into a non-core patch RAM). At block 2916, the core 102 copies the immediate patch 2504 from the non-core PRAM 116 to its core patch RAM 2808 before executing the immediate patch 2504. At block 2922, the core 102 loads the RAM patch 2516 into the non-core PRAM 116, in addition to the CAM data 2508 and the core PRAM patch 2512. At block 2926, in addition to loading the CAM data 2508 from the non-core PRAM 116 into its patch CAM 2439 and loading the core PRAM patch 2512 from the non-core PRAM 116 into its core PRAM 2499, the core 102 also loads the RAM patch 2516 from the non-core PRAM 116 into its patch RAM 2808. At block 2932, unlike block 2632 of Figures 26A-26B, the core 102 does not load the RAM patch 2516 into a non-core patch RAM.
As may be observed from the embodiments described above, the atomic propagation of the microcode patch 2500 to each of the relevant memories 2439/2499/2808 of the cores 102 of the microprocessor 100 and to the relevant non-core memories 2408/116 is advantageously performed in a manner that ensures the integrity and validity of the patch 2500, even in the presence of multiple concurrently executing cores 102 that share resources and that might otherwise clobber portions of one another's patches if the patch were applied in a conventional manner.
Patching service processor code
Referring to Figure 30, a flowchart illustrating operation of the microprocessor 100 of Figure 24 to patch code for a service processor is shown. Flow begins at block 3002.
At block 3002, the core 102 loads code to be executed by the SPU 2423 into the non-core PRAM 116 at a patch address specified by a patch, as described above with respect to block 2632 of Figures 26A-26B. Flow proceeds to block 3004.
At block 3004, the core 102 controls the SPU 2423 to execute the code at the patch address, i.e., the address in the non-core PRAM 116 to which the SPU 2423 code was written at block 3002. In one embodiment, the SPU 2423 is configured to fetch its reset vector (i.e., the address at which the SPU 2423 begins fetching instructions after coming out of reset) from the start address register 2497, and the core 102 writes the patch address into the start address register 2497 and then writes to a control register that causes the SPU 2423 to be reset. Flow proceeds to block 3006.
At block 3006, the SPU 2423 begins fetching code (i.e., fetches its first instruction) at the patch address, i.e., at the address in the non-core PRAM 116 to which the SPU 2423 code was written at block 3002. Typically, the SPU 2423 patch code residing in the non-core PRAM 116 will perform a jump to SPU 2423 code residing in the non-core ROM 2425. Flow ends at block 3006.
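The three steps of Figure 30 can be modelled with a minimal Python sketch. A dictionary stands in for the non-core PRAM, and the class `Spu`, its attributes, and the function `patch_spu_code` are hypothetical names; the sketch only shows the ordering (load code, write the start address register, reset), not any real register interface.

```python
class Spu:
    """Toy service processor: on reset it starts fetching at the start address."""

    def __init__(self, memory):
        self.memory = memory
        self.start_address_register = 0  # models start address register 2497
        self.program_counter = None

    def reset(self):
        # Coming out of reset, the reset vector is taken from the start address register.
        self.program_counter = self.start_address_register

    def fetch(self):
        return self.memory[self.program_counter]


def patch_spu_code(spu, pram, patch_address, code):
    """Model of blocks 3002-3006: load code, point the SPU at it, reset the SPU."""
    for offset, instruction in enumerate(code):
        pram[patch_address + offset] = instruction   # block 3002
    spu.start_address_register = patch_address       # block 3004
    spu.reset()                                      # block 3004 (control register write)
    return spu.fetch()                               # block 3006: first instruction fetched
```

In this model the last instruction of the patch code could be a jump into ROM-resident code, mirroring the typical case described above.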
The ability to patch the SPU 2423 code may be particularly useful. For example, the SPU 2423 may be used for performance testing that is transient in nature, i.e., it may be undesirable to make the performance testing SPU 2423 code a permanent part of the microprocessor 100; rather, the code may be made part of development parts only and not part of production parts. As another example, the SPU 2423 may be used to find and/or fix bugs. As another example, the SPU 2423 may be used to configure the microprocessor 100.
Atomic propagation of updates to per-core-instantiated architecturally-visible storage resources
Referring to Figure 31, a block diagram illustrating a multi-core microprocessor 100 according to an alternate embodiment is shown. The microprocessor 100 is similar in many respects to the microprocessor 100 of Figure 24. However, each core 102 of the microprocessor 100 of Figure 31 also includes architecturally-visible memory type range registers (MTRRs) 3102. That is, each core 102 instantiates the architecturally-visible MTRRs 3102, even though system software expects the MTRRs 3102 to be consistent across all the cores 102 (as described in more detail below). The MTRRs 3102 are an example of per-core-instantiated architecturally-visible storage resources, and other embodiments of per-core-instantiated architecturally-visible storage resources are described below. (Although not shown, each core 102 also includes the core PRAM 2499, core microcode ROM 2404 and patch CAM 2439 of Figure 24 and, in one embodiment, the core microcode patch RAM 2808 of Figure 28.)
The MTRRs 3102 provide a way for system software to associate a memory type with multiple different physical address ranges in the system memory address space of the microprocessor 100. Examples of different memory types include strong uncacheable, uncacheable, write-combining, write-through, write-back and write-protected. Each MTRR 3102 specifies (either explicitly or implicitly) a memory range and its memory type. The collective values of the MTRRs 3102 define a memory map that specifies the memory type of the different memory ranges. In one embodiment, the MTRRs 3102 are similar to those described in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3: System Programming Guide, September 2013, particularly Section 11.11, which is incorporated herein by reference and forms a part of this specification.
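How a set of range registers defines a memory map can be illustrated with a Python sketch. It assumes the variable-range base/mask form and the overlap precedence documented in Section 11.11 of the Intel manual cited above (a range matches when the address ANDed with the mask equals the base ANDed with the mask; UC wins over any other type; WT wins over WB); whether a given embodiment of the MTRRs 3102 uses this form is not stated in the text. The helper names `make_range` and `memory_type` and the 36-bit physical address width are illustrative assumptions.

```python
# Memory type encodings used by the Intel MTRRs.
UC, WC, WT, WP, WB = 0, 1, 4, 5, 6


def make_range(base, size, mem_type, phys_bits=36):
    """Build (base, mask, type); size must be a power of two and base size-aligned."""
    mask = ~(size - 1) & ((1 << phys_bits) - 1)
    return (base, mask, mem_type)


def memory_type(address, ranges, default_type=UC):
    """Resolve the memory type of a physical address from variable-range entries."""
    hits = {t for base, mask, t in ranges if (address & mask) == (base & mask)}
    if not hits:
        return default_type      # no range matches: the default type applies
    if UC in hits:
        return UC                # UC takes precedence over every other type
    if hits <= {WT, WB}:
        return WT if WT in hits else WB  # WT takes precedence over WB
    if len(hits) == 1:
        return next(iter(hits))
    raise ValueError("overlap of these memory types is architecturally undefined")
```

Because every core holds its own copy of these registers, two cores with different `ranges` lists would resolve the same address to different memory types, which is the inconsistency the following paragraphs are concerned with.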
It is desirable that the memory map defined by the MTRRs 3102 be identical for all cores 102 of the microprocessor 100, so that software running on the microprocessor 100 has a consistent view of memory. However, conventional processors provide no hardware support for maintaining MTRR consistency among the cores of a multi-core processor. As the note at the bottom of page 11-20 of Volume 3 of the Intel manual cited above states, "The P6 and more recent processor families provide no hardware support for maintaining [consistency of MTRR values]." System software is therefore responsible for maintaining MTRR consistency across cores. Section 11.11.8 of the Intel manual cited above describes an algorithm by which system software maintains consistency when updating the MTRRs of a multi-core processor; in that algorithm, for example, every core executes instructions to update its own MTRRs.
In contrast, embodiments are described herein in which system software may update the instance of an MTRR 3102 in one of the cores 102, and that core 102 advantageously propagates the update, in an atomic fashion, to the respective instances of the MTRR 3102 in all cores 102 of the microprocessor 100 (in a manner similar to the microcode patch propagation performed in the embodiments of Figures 24 through 30 above). This provides a means of maintaining consistency among the MTRRs 3102 of the different cores 102 at the architectural instruction level.
Refer to Figure 32, which is a flowchart of the operation of the microprocessor 100 of Figure 31 to propagate an update of an MTRR 3102 to the multiple cores 102 of the microprocessor 100. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively propagate the MTRR 3102 update to all cores 102 of the microprocessor 100. In particular, Figure 32 describes the operation of the core that encounters the instruction to update the MTRR 3102, whose flow begins at block 3202, and the operation of the other cores 102, whose flow begins at block 3252.
At block 3202, one of the cores 102 encounters an instruction that instructs the core to update its MTRR 3102. That is, the MTRR update instruction includes an MTRR 3102 identifier and an update value to be written to the MTRR 3102. In one embodiment, the MTRR update instruction is an x86 WRMSR instruction, which specifies the update value in the EDX:EAX registers and the MTRR 3102 identifier in the ECX register, which is an MSR address within the MSR address space of the core 102. In response to the MTRR update instruction, the core 102 disables interrupts and traps to the microcode that implements the MTRR update instruction. It should be understood that the system software containing the MTRR update instruction may include a sequence of multiple instructions in preparation for the update of the MTRR 3102. Preferably, however, the MTRRs 3102 of all cores 102 are updated atomically at the architectural instruction level in response to a single architectural instruction of the sequence. That is, once interrupts are disabled on the first core 102 (e.g., at block 3202, when the core 102 encounters the MTRR update instruction), they remain disabled while the microcode propagates the new MTRR 3102 value to all cores 102 of the microprocessor 100 (e.g., until after block 3218). Furthermore, once interrupts are disabled on the other cores 102 (e.g., at block 3252), they remain disabled until the MTRRs 3102 of all cores 102 of the microprocessor 100 have been updated (e.g., until after block 3218). Therefore, advantageously, the new MTRR 3102 value is propagated to all cores 102 of the microprocessor 100 atomically at the architectural instruction level. Flow proceeds to block 3204.
At block 3204, the core 102 acquires ownership of the hardware semaphore 118 of Figure 1. Preferably, the microprocessor 100 includes a hardware semaphore 118 associated with the MTRRs 3102. Preferably, the core 102 acquires ownership of the hardware semaphore 118 in a manner similar to that described above with respect to Figure 20, more specifically blocks 2004 and 2006. The hardware semaphore 118 is used because it is possible that, while one of the cores 102 is performing an MTRR 3102 update in response to encountering an MTRR update instruction, a second core 102 encounters an MTRR update instruction and begins its own MTRR 3102 update in response, which could result in incorrect execution. Flow proceeds to block 3206.
At block 3206, the core 102 sends an MTRR update message to the other cores 102 and sends each of the other cores 102 an inter-core interrupt. Preferably, the core 102 traps to the microcode in response to the MTRR update instruction (at block 3202) or in response to the interrupt (at block 3252) and remains in the microcode until block 3218, during which time interrupts are disabled (i.e., the microcode does not allow itself to be interrupted). Flow proceeds to block 3208.
At block 3252, one of the other cores 102 (i.e., a core other than the core 102 that encountered the MTRR update instruction at block 3202) is interrupted by the inter-core interrupt sent at block 3206 and receives the MTRR update message. In one embodiment, the core 102 takes the interrupt at the next architectural instruction boundary (e.g., at the next x86 instruction boundary). In response to the interrupt, the core 102 disables interrupts and traps to the microcode that handles the MTRR update message. As noted above, although the flow at block 3252 is described from the perspective of a single core 102, each of the other cores 102 (i.e., each core 102 not at block 3202) is interrupted and receives the message at block 3252 and performs the steps at blocks 3208 through 3218. Flow proceeds from block 3252 to block 3208.
At block 3208, the core 102 writes a sync request with sync condition 31 (denoted SYNC 31 in Figure 32) to its sync register 108, which causes the control unit 104 to put the core 102 to sleep and subsequently to wake it up when all cores 102 have written a SYNC 31. Flow proceeds to decision block 3211.
At decision block 3211, the core 102 determines whether it is the core 102 that encountered the MTRR update instruction at block 3202 (as opposed to a core 102 that received the MTRR update message at block 3252). If so, flow proceeds to block 3212; otherwise, flow proceeds to block 3214.
At block 3212, the core 102 loads the MTRR identifier specified by the MTRR update instruction and an MTRR update value into the uncore PRAM 116, where they are visible to all other cores 102. In an x86 embodiment, the MTRRs 3102 include (1) fixed-range MTRRs, each comprising a single 64-bit MSR updated via a single WRMSR instruction, and (2) variable-range MTRRs, each comprising two 64-bit MSRs, each of which is written by a different WRMSR instruction, i.e., the two WRMSR instructions specify different MSR addresses. For a variable-range MTRR, one of the MSRs (the PHYSBASE register) includes a base address of the memory range and a type field that specifies the memory type, and the other MSR (the PHYSMASK register) includes a valid bit and a mask field that sets the range mask. Preferably, the MTRR update value that the core 102 loads into the uncore PRAM 116 is as follows.
1. If the specified MSR is the PHYSMASK register, the core 102 loads into the uncore PRAM 116 a 128-bit update value that comprises the new 64-bit value specified by the WRMSR instruction (which includes the valid bit and mask value) and the current value of the PHYSBASE register (which includes the base and type values).
2. If the specified MSR is the PHYSBASE register:
a. If the valid bit in the PHYSMASK register is currently set, the core 102 loads into the uncore PRAM 116 a 128-bit update value that comprises the new 64-bit value specified by the WRMSR instruction (which includes the base and type values) and the current value of the PHYSMASK register (which includes the valid bit and mask value).
b. If the valid bit in the PHYSMASK register is currently clear, the core 102 loads into the uncore PRAM 116 a 64-bit update value that comprises only the new 64-bit value specified by the WRMSR instruction (which includes the base and type values).
Additionally, if the update value written is a 128-bit value, the core 102 sets a flag in the uncore PRAM 116, and if the update value is a 64-bit value, the core 102 clears the flag. Flow proceeds from block 3212 to block 3214.
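The selection of the update value at block 3212 can be sketched as follows. This is an illustrative model rather than the microcode of the embodiment: the names `UncoreUpdate` and `build_update`, the string register identifiers, and the representation of a 128-bit value as a pair of 64-bit integers are assumptions. The valid bit position (bit 11 of the mask register) follows the x86 definition.

```python
from dataclasses import dataclass
from typing import Tuple

PHYSMASK_VALID = 1 << 11  # valid bit of an x86 variable-range mask register


@dataclass
class UncoreUpdate:
    """Hypothetical record of what the updating core places in the uncore PRAM."""
    target: str              # "PHYSBASE" or "PHYSMASK" (stand-in for the MSR identifier)
    values: Tuple[int, ...]  # one 64-bit value, or two for the 128-bit case
    flag_128: bool           # set for a 128-bit update value, cleared for 64-bit


def build_update(target: str, new_value: int, physbase: int, physmask: int) -> UncoreUpdate:
    """Choose the update value according to cases 1, 2a and 2b above."""
    if target == "PHYSMASK":
        # Case 1: new mask value plus the current base/type value.
        return UncoreUpdate(target, (new_value, physbase), True)
    if target == "PHYSBASE":
        if physmask & PHYSMASK_VALID:
            # Case 2a: range currently valid, so carry the current mask along.
            return UncoreUpdate(target, (new_value, physmask), True)
        # Case 2b: range currently invalid, so only the new base/type is sent.
        return UncoreUpdate(target, (new_value,), False)
    raise ValueError(f"unsupported target register: {target}")


update = build_update("PHYSBASE", 0x8000_0006, physbase=0x0, physmask=0x0)
print(update.flag_128, len(update.values))
```

Sending both halves of a valid register pair lets every receiving core install a matching base and mask in one step instead of depending on a second WRMSR.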
At block 3214, the core 102 writes a sync request with sync condition 32 (denoted SYNC 32 in Figure 32) to its sync register 108, which causes the control unit 104 to put the core 102 to sleep and subsequently to wake it up when all cores 102 have written a SYNC 32. Flow proceeds to block 3216.
At block 3216, the core 102 reads from the uncore PRAM 116 the MTRR 3102 identifier and the MTRR update value written at block 3212 and updates its own MTRR 3102 accordingly. Advantageously, the propagation of the MTRR update value is performed atomically, so that any update that might affect the operation of the MTRRs 3102 of the respective cores 102 is guaranteed not to be architecturally visible until the update value has been propagated to the MTRRs 3102 of all cores 102. This is because all cores are known to be executing the same microcode that implements the MTRR update instruction, and interrupts remain disabled on every core 102 until the update value has been propagated to the respective MTRR 3102 of all cores 102. As described above for block 3212 of the present embodiment, if the flag was set at block 3212, the core 102 also updates (in addition to the specified MSR) the PHYSMASK or PHYSBASE register; otherwise, if the flag is clear, the core 102 updates only the specified MSR. Flow proceeds to block 3218.
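The receiving side of block 3216 can be sketched in the same spirit. The function `apply_update`, the dictionary model of a core's register pair, and the ordering of the two 64-bit halves (the specified register first, its partner second) are assumptions made for illustration only.

```python
from typing import Dict, Tuple


def apply_update(msrs: Dict[str, int], target: str,
                 values: Tuple[int, ...], flag_128: bool) -> Dict[str, int]:
    """Return a copy of one core's PHYSBASE/PHYSMASK pair with the update applied."""
    updated = dict(msrs)
    # The specified MSR is always written with the first 64-bit value.
    updated[target] = values[0]
    if flag_128:
        # Flag set: the second 64-bit value belongs to the partner register.
        partner = "PHYSBASE" if target == "PHYSMASK" else "PHYSMASK"
        updated[partner] = values[1]
    return updated


core_msrs = {"PHYSBASE": 0x0, "PHYSMASK": 0x0}
print(apply_update(core_msrs, "PHYSMASK", (0xF_FFFF_0800, 0x8000_0006), True))
```

Because every core applies the same record between SYNC 32 and SYNC 33 with interrupts disabled, no architectural instruction on any core can observe a mix of old and new values.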
At block 3218, the core 102 writes a sync request with sync condition 33 (denoted SYNC 33 in Figure 32) to its sync register 108, which causes the control unit 104 to put the core 102 to sleep and subsequently to wake it up when all cores 102 have written a SYNC 33. Flow ends at block 3218.
After block 3218, the core 102 that encountered the MTRR update instruction releases the hardware semaphore 118 acquired at block 3204. Also after block 3218, the core 102 re-enables interrupts.
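The overall flow of Figure 32 can be approximated in software with threads standing in for cores. In this sketch `threading.Barrier` objects stand in for the SYNC 31, SYNC 32, and SYNC 33 rendezvous performed by the control unit 104, a lock stands in for the hardware semaphore 118, and a shared attribute stands in for the uncore PRAM 116. The names `Uncore`, `run_core`, and `propagate` are hypothetical, and the inter-core interrupt and interrupt masking are not modeled.

```python
import threading
from typing import Dict, List, Optional, Tuple


class Uncore:
    """Hypothetical stand-in for the resources shared by all cores."""

    def __init__(self, num_cores: int) -> None:
        self.semaphore = threading.Lock()            # models hardware semaphore 118
        self.pram: Optional[Tuple[str, int]] = None  # models uncore PRAM 116
        self.sync31 = threading.Barrier(num_cores)
        self.sync32 = threading.Barrier(num_cores)
        self.sync33 = threading.Barrier(num_cores)


def run_core(core_id: int, uncore: Uncore, mtrrs: List[Dict[str, int]],
             update: Optional[Tuple[str, int]]) -> None:
    uncore.sync31.wait()               # block 3208: every core has reached the microcode
    if update is not None:             # decision block 3211: am I the updating core?
        uncore.pram = update           # block 3212: publish identifier and value
    uncore.sync32.wait()               # block 3214: the published value is now stable
    msr_id, value = uncore.pram
    mtrrs[core_id][msr_id] = value     # block 3216: update this core's own copy
    uncore.sync33.wait()               # block 3218: every core has been updated


def propagate(msr_id: str, value: int, initiator: int = 0,
              num_cores: int = 4) -> List[Dict[str, int]]:
    uncore = Uncore(num_cores)
    mtrrs: List[Dict[str, int]] = [dict() for _ in range(num_cores)]
    # Block 3204: the semaphore is held on behalf of the initiator until after block 3218.
    with uncore.semaphore:
        threads = []
        for cid in range(num_cores):
            upd = (msr_id, value) if cid == initiator else None
            t = threading.Thread(target=run_core, args=(cid, uncore, mtrrs, upd))
            threads.append(t)
            t.start()
        for t in threads:
            t.join()
    return mtrrs


print(propagate("PHYSBASE0", 0x8000_0006))
```

The second barrier is what prevents a core from reading the shared value before the initiator has written it, and the third prevents any core from resuming normal execution before all copies agree.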
As may be observed from Figures 31 and 32, system software running on the microprocessor 100 of Figure 31 can advantageously execute an MTRR update instruction on a single core 102 of the microprocessor 100 to update the specified MTRR 3102 of all cores 102 of the microprocessor 100, rather than executing an MTRR update instruction individually on each core 102, which may improve system integrity.
One particular MTRR 3102 instantiated in each core 102 is a system management range register (SMRR) 3102. Because the memory range specified by the SMRR 3102 holds code and data associated with operation in system management mode (SMM), such as a system management interrupt (SMI) handler, that memory range is referred to as the SMRAM region. When code running on a core 102 attempts to access the SMRAM region, the core 102 allows the access only if the core 102 is running in SMM; otherwise, the core 102 ignores a write to the SMRAM region and returns a fixed value for each byte read from the SMRAM region. Furthermore, if a core 102 operating in SMM attempts to execute code outside the SMRAM region, the core 102 asserts a machine check exception. Additionally, the core 102 allows code to write to the SMRR 3102 only when the core 102 is running in SMM. This helps protect the SMM code and data in the SMRAM region. In one embodiment, the SMRR 3102 is similar to that described in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3: System Programming Guide, September 2013, particularly Sections 11.11.2.4 and 34.4.2.1, which are incorporated herein by reference and form part of this specification.
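The access rules for the SMRAM region can be summarized in a small sketch. The fixed read value of 0xFF, the byte-addressed dictionary used as memory, and the names `smram_read`, `smram_write`, and `check_fetch` are assumptions for illustration; the specification states only that a fixed value is returned.

```python
from typing import Dict

FIXED_READ_VALUE = 0xFF  # assumed value; the text only says "a fixed value"


class MachineCheck(Exception):
    """Models the machine check exception asserted by the core."""


def smram_read(mem: Dict[int, int], addr: int, in_smm: bool, smram: range) -> int:
    """Reads of SMRAM outside SMM return the fixed value instead of the contents."""
    if addr in smram and not in_smm:
        return FIXED_READ_VALUE
    return mem.get(addr, 0)


def smram_write(mem: Dict[int, int], addr: int, value: int,
                in_smm: bool, smram: range) -> bool:
    """Writes to SMRAM outside SMM are ignored; returns True if the write landed."""
    if addr in smram and not in_smm:
        return False
    mem[addr] = value
    return True


def check_fetch(addr: int, in_smm: bool, smram: range) -> None:
    """Executing code outside SMRAM while in SMM raises a machine check."""
    if in_smm and addr not in smram:
        raise MachineCheck(f"SMM code fetch outside SMRAM at {addr:#x}")


memory: Dict[int, int] = {}
region = range(0x1000, 0x2000)  # SMRAM region as specified by the SMRR
smram_write(memory, 0x1800, 0xAB, in_smm=True, smram=region)
print(hex(smram_read(memory, 0x1800, in_smm=False, smram=region)))
```

In this model the `region` argument plays the role of one core's SMRR value, so two cores holding different SMRR values would enforce different protected ranges.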
In general, each core 102 has its own instance of SMM code and data in memory. It is desirable that the SMM code and data of each core 102 be protected not only from code running on that core but also from code running on another core 102. To accomplish this using the SMRRs 3102, system software typically places the multiple SMM code and data instances in adjacent blocks of memory. That is, the SMRAM region is a single contiguous memory region that contains all the SMM code and data instances. If the SMRRs 3102 of all cores 102 of the microprocessor 100 hold a value that specifies the entirety of this single contiguous memory region, code running in non-SMM on one core is prevented from updating the SMM code and data instance of another core 102. If a time window exists in which the SMRR 3102 values of the cores 102 are not identical, i.e., the SMRRs 3102 in different cores 102 of the microprocessor 100 have different values, any of which specifies less than the entirety of the single contiguous memory region containing all the SMM code and data instances, the system may be vulnerable to a security attack, which, given the nature of SMM, may be serious. Therefore, the embodiments that atomically propagate updates to the SMRRs 3102 can be particularly advantageous.
Additionally, other embodiments are contemplated in which updates to other per-core-instantiated architecturally visible storage resources of the microprocessor 100 are propagated atomically in a manner similar to that described above. For example, in one embodiment, each core 102 instantiates certain bit fields of the x86 IA32_MISC_ENABLE MSR, and a WRMSR executed on one core 102 is propagated to all cores 102 of the microprocessor 100 in a manner similar to that described above. Furthermore, embodiments are contemplated in which a WRMSR executed on one core 102 to other MSRs instantiated in all cores 102 of the microprocessor 100, whether architectural or proprietary and whether current or future, is propagated to all cores 102 of the microprocessor 100 in a manner similar to that described above.
Furthermore, although embodiments have been described in which the per-core-instantiated architecturally visible storage resource is an MTRR, other embodiments are contemplated in which the per-core-instantiated resource belongs to an instruction set architecture other than the x86 ISA, or is a resource other than an MTRR. For example, resources other than MTRRs include CPUID values and MSRs that report capabilities, such as Vectored Multimedia eXtensions (VMX) capabilities.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Those skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the claims of this application. For example, software can enable the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods of the invention. This can be accomplished using general programming languages (e.g., C, C++) or hardware description languages (HDL), including Verilog HDL, VHDL, and so on. Such software can be embodied as program code in tangible media, such as any machine-readable (e.g., computer-readable) storage medium, including semiconductor memory, magnetic disk, hard disk, or optical disc (e.g., CD-ROM, DVD-ROM); when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The methods and apparatus of the invention may also be transmitted as program code over a transmission medium, such as electrical wire or cable, optical fiber, or any other form of transmission; when the program code is received, loaded, and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code combines with the processor to provide a unique apparatus that operates analogously to application-specific logic circuits. The apparatus and methods of the invention may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL), and transformed into hardware in the production of integrated circuits. In addition, the apparatus and methods of the invention may be embodied as a combination of hardware and software. The scope of protection of the invention is therefore defined by the claims of this application. Finally, those skilled in the art can, based on the disclosed concepts and specific embodiments, make minor changes and refinements to carry out the same purposes of the invention without departing from its spirit and scope.

Claims (18)

1. A microprocessor, characterized by comprising:
a plurality of semiconductor dies;
a bus coupled to the plurality of semiconductor dies; and
a plurality of processing cores, wherein a distinct subset of the plurality of processing cores is located on each semiconductor die of the plurality of semiconductor dies,
wherein each die of the plurality of semiconductor dies comprises a control unit configured to selectively control a respective clock signal to each core of the subset of cores of the die;
wherein, for each processing core of the subset of cores of the die, in response to the processing core writing a value to the control unit, the control unit is configured to turn off the respective clock signal to the processing core and to write the value over the bus to the control unit of each other die of the plurality of dies; and
wherein all of the control units are collectively configured to simultaneously turn on the clock signals to all of the plurality of processing cores after the clock signals to all of the plurality of processing cores have been turned off.
2. The microprocessor according to claim 1, characterized in that each die of the plurality of semiconductor dies comprises:
a respective control register associated with each processing core of the plurality of processing cores of the microprocessor, such that each control register of the die has a corresponding control register on each other die of the plurality of dies,
wherein, for each processing core of the subset of cores of the die, in response to the processing core writing the value to the respective control register of the processing core, the control unit is configured to turn off the respective clock signal to the processing core and to write the value over the bus to the corresponding control register on each other die of the plurality of dies; and
wherein, to collectively simultaneously turn on the clock signals to all of the plurality of processing cores after the clock signals to all of the plurality of processing cores have been turned off, each control unit is configured to turn on the clock signals to the subset of cores of its die after the value has been written to all of the control registers of its die.
3. The microprocessor according to claim 2, characterized in that each control unit is configured to delay updating the control register of its die with the value written by the respective core, so that the control register is written concurrently with the corresponding control register on each other die of the plurality of dies.
4. The microprocessor according to claim 1, characterized in that
each die of the plurality of semiconductor dies comprises a respective indication, associated with each processing core of the plurality of processing cores of the microprocessor, that indicates whether the processing core is enabled or disabled; and
to collectively simultaneously turn on the clock signals to all of the plurality of processing cores after the clock signals to all of the plurality of processing cores have been turned off, each processing core is configured to turn on the clock signals to the subset of cores of its die after the clock signals have been turned off to all of the plurality of processing cores whose respective associated indication indicates that the core is enabled.
5. The microprocessor according to claim 4, characterized in that each processing core of the plurality of processing cores is configured to initiate an update of its respectively associated indication.
6. The microprocessor according to claim 5, characterized in that the control units are collectively configured to simultaneously update all of the indications associated with one processing core of the plurality of processing cores in response to that processing core initiating the update of its respectively associated indication.
7. The microprocessor according to claim 1, characterized in that the value written by the processing core to the control unit includes a selective wakeup indication, wherein all of the control units are collectively configured to turn on the respective clock signal to the processing core of the plurality of processing cores that was temporally last to write the value.
8. The microprocessor according to claim 1, characterized in that each die of the plurality of semiconductor dies comprises a status register, readable by each processing core of the subset of cores of the die, that indicates, after all of the control units collectively simultaneously turn on the clock signals to all of the plurality of processing cores, a lowest common value of a sleep-state portion of the value written by each processing core of the plurality of processing cores.
9. The microprocessor according to claim 1, characterized in that:
each control unit is configured to determine that the value written by each processing core of the plurality of processing cores specifies a group of processing cores that is fewer than all of the plurality of processing cores; and
all of the control units are collectively configured to simultaneously turn on the clock signals to the group of processing cores after the clock signals to the group of processing cores have been turned off, regardless of whether the processing cores of the plurality of processing cores not included in the group have written the value.
10. A method for synchronizing processing cores in a microprocessor, characterized in that the microprocessor has a plurality of semiconductor dies, a bus coupled to the plurality of semiconductor dies, and a plurality of processing cores, wherein a distinct subset of the plurality of processing cores is located on each semiconductor die of the plurality of semiconductor dies, each die of the plurality of semiconductor dies comprises a control unit, and the control unit is configured to selectively control a respective clock signal to each core of the subset of cores of the die, the method comprising, for each processing core of the subset of cores of each die:
writing, by the processing core, a value to the control unit;
turning off, by the control unit, the respective clock signal to the processing core;
writing, by the control unit, the value over the bus to the control unit of each other die of the plurality of dies; and
simultaneously turning on, collectively by all of the control units, the clock signals to all of the plurality of processing cores after the clock signals to all of the plurality of processing cores have been turned off.
11. The method according to claim 10, characterized in that each die of the plurality of semiconductor dies comprises a respective control register associated with each processing core of the plurality of processing cores of the microprocessor, such that each control register of the die has a corresponding control register on each other die of the plurality of dies, the method further comprising:
for each processing core of the subset of cores of the die, in response to the processing core writing the value to the respective control register of the processing core, turning off, by the control unit, the respective clock signal to the processing core and writing the value over the bus to the corresponding control register on each other die of the plurality of dies; and
wherein simultaneously turning on, collectively, the clock signals to all of the plurality of processing cores after the clock signals to all of the plurality of processing cores have been turned off comprises turning on, by each control unit, the clock signals to the subset of cores of its die after the value has been written to all of the control registers of its die.
12. The method according to claim 11, characterized in that each control unit is configured to delay updating the control register of its die with the value written by the respective core, so that the control register is written concurrently with the corresponding control register on each other die of the plurality of dies.
13. The method according to claim 10, characterized in that each die of the plurality of semiconductor dies comprises a respective indication, associated with each processing core of the plurality of processing cores of the microprocessor, that indicates whether the processing core is enabled or disabled, wherein simultaneously turning on, collectively, the clock signals to all of the plurality of processing cores after the clock signals to all of the plurality of processing cores have been turned off comprises turning on, by each processing core, the clock signals to the subset of cores of its die after the clock signals have been turned off to all of the plurality of processing cores whose respective associated indication indicates that the core is enabled.
14. The method according to claim 13, characterized by further comprising:
initiating, by each processing core of the plurality of processing cores, an update of its respectively associated indication.
15. The method according to claim 14, characterized by further comprising:
simultaneously updating, collectively by the control units, all of the indications associated with one processing core of the plurality of processing cores in response to that processing core initiating the update of its respectively associated indication.
16. The method according to claim 10, characterized in that the value written by the processing core to the control unit includes a selective wakeup indication, wherein all of the control units are collectively configured to turn on the respective clock signal to the processing core of the plurality of processing cores that was temporally last to write the value.
17. The method according to claim 10, characterized in that each die of the plurality of semiconductor dies comprises a status register, the method further comprising:
reading from the status register, by each processing core of the subset of cores of the die, after all of the control units collectively simultaneously turn on the clock signals to all of the plurality of processing cores, a lowest common value of a sleep-state portion of the value written by each processing core of the plurality of processing cores.
18. The method according to claim 10, characterized by further comprising:
determining, by each control unit, that the value written by each processing core of the plurality of processing cores specifies a group of processing cores that is fewer than all of the plurality of processing cores; and
simultaneously turning on, collectively by all of the control units, the clock signals to the group of processing cores after the clock signals to the group of processing cores have been turned off, regardless of whether the processing cores of the plurality of processing cores not included in the group have written the value.
CN201410431514.2A 2013-08-28 2014-08-28 Microprocessor and method of synchronizing processing cores in the microprocessor Active CN104216861B (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201361871206P 2013-08-28 2013-08-28
US61/871,206 2013-08-28
US201361916338P 2013-12-16 2013-12-16
US61/916,338 2013-12-16
US14/281,488 2014-05-19
US14/281,488 US9513687B2 (en) 2013-08-28 2014-05-19 Core synchronization mechanism in a multi-die multi-core microprocessor

Publications (2)

Publication Number Publication Date
CN104216861A (en) 2014-12-17
CN104216861B (en) 2019-04-19

Family

ID=52098368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410431514.2A Active CN104216861B (en) 2013-08-28 2014-08-28 Microprocessor and method of synchronizing processing cores in the microprocessor

Country Status (1)

Country Link
CN (1) CN104216861B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7392414B2 (en) * 2003-07-15 2008-06-24 Intel Corporation Method, system, and apparatus for improving multi-core processor performance
US20090235260A1 (en) * 2008-03-11 2009-09-17 Alexander Branover Enhanced Control of CPU Parking and Thread Rescheduling for Maximizing the Benefits of Low-Power State
US20090235099A1 (en) * 2008-03-11 2009-09-17 Alexander Branover Protocol for Transitioning In and Out of Zero-Power State
US20120151263A1 (en) * 2010-12-09 2012-06-14 Advanced Micro Devices, Inc. Debug state machines and methods of their operation
US20120166845A1 (en) * 2010-12-22 2012-06-28 Via Technologies, Inc. Power state synchronization in a multi-core processor

Also Published As

Publication number Publication date
CN104216861B (en) 2019-04-19

Similar Documents

Publication Publication Date Title
CN104462004A (en) Microprocessor and method of processing multi-core synchronization thereof
CN104216680A (en) Microprocessor and execution method
TWI637316B (en) Dynamic reconfiguration of multi-core processor
CN104238997A (en) Microprocessor and execution method thereof
CN104331388A (en) Micro-processor and method for synchronizing processing cores of same
CN104216679A (en) Microprocessor and execution method thereof
CN104239275A (en) Multicore microprocessor and reconfiguring method thereof
CN104239274B (en) Microprocessor and its configuration method
CN104360727A (en) Microprocessor and method for saving power
CN104331387A (en) Micro-processor and configuration method thereof
CN104216861A (en) Microprocessor and method of synchronously processing core in same
CN104239273A (en) Microprocessor and execution method thereof
CN104239272A (en) Microprocessor and operating method thereof
EP2843550B1 (en) Dynamic reconfiguration of multi-core processor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant