CN104239274B - Microprocessor and its configuration method - Google Patents

Microprocessor and its configuration method

Info

Publication number
CN104239274B
Authority
CN
China
Legal status
Active
Application number
CN201410431347.1A
Other languages
Chinese (zh)
Other versions
CN104239274A
Inventor
G. Glenn Henry (G·葛兰·亨利)
Stephan Gaskins (史蒂芬·嘉斯金斯)
Current Assignee
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date
Filing date
Publication date
Priority claimed from US14/281,729 (external priority; US9535488B2)
Application filed by Via Technologies Inc
Priority to CN201810985885.3A (CN109165189B)
Publication of CN104239274A
Application granted
Publication of CN104239274B
Active (current legal status)
Anticipated expiration


Abstract

The present invention provides a microprocessor and a method of configuring it. The microprocessor includes an indicator and a plurality of processing cores. Each processing core is configured to sample the indicator. When the indicator indicates a first predetermined value, the processing cores collectively designate a default processing core of the plurality of processing cores as the bootstrap processor. When the indicator indicates a second predetermined value different from the first, the processing cores collectively designate a processing core other than the default processing core as the bootstrap processor. The present invention reduces power consumption.

Description

Microprocessor and its configuration method
Technical field
The present invention relates to microprocessors, and more particularly to the dynamic designation of the bootstrap processor in a multi-core microprocessor.
Background
Multi-core microprocessors have proliferated mainly because of the performance advantages they offer. This has been made possible largely by the rapid shrinking of semiconductor device geometries, which increases transistor density. Having multiple cores in a microprocessor creates a need for one core to communicate with the other cores to accomplish various functions, such as power management, cache management, debugging, and configuration involving multiple cores.
Traditionally, architectural programs running on a multi-core processor (for example, an operating system or application programs) have communicated through semaphores located in a system memory that is addressable by the architecture of every core. This may suffice for many purposes, but it may not provide the speed, precision and/or system-level transparency that other purposes require.
Summary of the invention
The present invention provides a microprocessor that includes an indicator and a plurality of processing cores. Each processing core is configured to sample the indicator. When the indicator indicates a first predetermined value, the processing cores are configured to collectively designate a default processing core of the plurality of processing cores as a bootstrap processor (BSP). When the indicator indicates a second predetermined value different from the first predetermined value, the processing cores are configured to collectively designate a processing core other than the default processing core as the bootstrap processor.
The present invention also provides a method of configuring a multi-core microprocessor that includes a plurality of processing cores. The method includes: sampling an indicator of the microprocessor; when the indicator indicates a first predetermined value, designating a default processing core of the plurality of processing cores as a bootstrap processor (BSP); and when the indicator indicates a second predetermined value different from the first predetermined value, designating a processing core other than the default processing core as the bootstrap processor.
The present invention further provides a computer program product encoded in at least one non-transitory computer-usable medium for use with a computing device. The computer program product includes computer-usable program code that specifies a microprocessor. The program code includes first program code that specifies an indicator, and second program code that specifies a plurality of processing cores, each of which is configured to sample the indicator. When the indicator indicates a first predetermined value, the processing cores are configured to collectively designate a default processing core of the plurality of processing cores as a bootstrap processor (BSP). When the indicator indicates a second predetermined value different from the first predetermined value, the processing cores are configured to collectively designate a processing core other than the default processing core as the bootstrap processor.
The present invention reduces power consumption.
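The cores can agree on a single BSP without exchanging any messages, because every core samples the same indicator and applies the same rule to it. The C sketch below illustrates that idea under two assumptions: core 0 is the default core, and the second predetermined value selects the core after it. The macro and function names are hypothetical. The patent does not specify how the non-default core is chosen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encodings of the sampled indicator. */
#define INDICATOR_DEFAULT   0u  /* first predetermined value  */
#define INDICATOR_ALTERNATE 1u  /* second predetermined value */
#define DEFAULT_BSP_CORE    0u  /* assumed default processing core */

/* Every core evaluates this with the same inputs, so every core computes the
 * same BSP. That is how the designation can be made "collectively". */
unsigned designate_bsp(unsigned indicator, unsigned num_cores)
{
    assert(num_cores >= 2);
    if (indicator == INDICATOR_ALTERNATE)
        /* Assumption: choose the core following the default core. */
        return (DEFAULT_BSP_CORE + 1u) % num_cores;
    /* First predetermined value (and, in this sketch, any other value). */
    return DEFAULT_BSP_CORE;
}

/* Each core decides locally whether it is the BSP. */
bool core_is_bsp(unsigned my_core_id, unsigned indicator, unsigned num_cores)
{
    return my_core_id == designate_bsp(indicator, num_cores);
}
```

Because the rule is a pure function of shared inputs, exactly one core returns true from `core_is_bsp`, whichever value the indicator holds.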
Brief description of the drawings
Fig. 1 is a block diagram of a multi-core microprocessor.
Fig. 2 is a block diagram of a control word, a status word and a configuration word.
Fig. 3 is a flowchart of the operation of the control unit.
Fig. 4 is a block diagram of a microprocessor according to another embodiment.
Fig. 5 is a flowchart of the operation of a microprocessor to dump debug information.
Fig. 6 is a timing diagram of an example of the operation of the microprocessor according to the flowchart of Fig. 5.
Figs. 7A-7B are a flowchart of the operation of a microprocessor to perform a cross-core cache control operation.
Fig. 8 is a timing diagram of an example of the operation of the microprocessor according to the flowchart of Figs. 7A-7B.
Fig. 9 is a flowchart of the operation of the microprocessor to enter a low-power package C-state.
Fig. 10 is a timing diagram of an example of the operation of the microprocessor according to the flowchart of Fig. 9.
Fig. 11 is a flowchart of the operation of the microprocessor to enter a low-power package C-state according to another embodiment of the present invention.
Fig. 12 is a timing diagram of an example of the operation of the microprocessor according to the flowchart of Fig. 11.
Fig. 13 is a timing diagram of another example of the operation of the microprocessor according to the flowchart of Fig. 11.
Fig. 14 is a flowchart of the dynamic reconfiguration of the microprocessor.
Fig. 15 is a flowchart of the dynamic reconfiguration of the microprocessor according to another embodiment.
Fig. 16 is a timing diagram of an example of the operation of the microprocessor according to the flowchart of Fig. 15.
Fig. 17 is a block diagram of the hardware semaphore 118 of Fig. 1.
Fig. 18 is a flowchart of the operation when a core 102 reads the hardware semaphore 118.
Fig. 19 is a flowchart of the operation when a core writes the hardware semaphore.
Fig. 20 is a flowchart of the operation of the microprocessor using the hardware semaphore to perform an operation that requires exclusive ownership of a resource.
Fig. 21 is a timing diagram of an example of the operation of cores issuing non-sleeping synchronization requests according to the flowchart of Fig. 3.
Fig. 22 is a flowchart of configuring the microprocessor.
Fig. 23 is a flowchart of configuring the microprocessor according to another embodiment.
Fig. 24 is a block diagram of a multi-core microprocessor according to another embodiment.
Fig. 25 is a block diagram of a microcode patch architecture.
Figs. 26A-26B are a flowchart of the operation of the microprocessor of Fig. 24 to propagate the microcode patch of Fig. 25 to the multiple cores of the microprocessor.
Fig. 27 is a timing diagram of an example of the operation of the microprocessor according to the flowchart of Figs. 26A-26B.
Fig. 28 is a block diagram of a multi-core microprocessor according to another embodiment.
Figs. 29A-29B are a flowchart of the operation of the microprocessor of Fig. 28 to propagate a microcode patch to the multiple cores of the microprocessor according to another embodiment.
Fig. 30 is a flowchart of the operation of the microprocessor of Fig. 24 to patch service processor code.
Fig. 31 is a block diagram of a multi-core microprocessor according to another embodiment.
Fig. 32 is a flowchart of the operation of the microprocessor of Fig. 31 to propagate an MTRR update to the multiple cores of the microprocessor.
The reference numerals used in the drawings are briefly described as follows:
100: multi-core microprocessor; 102A, 102B, 102N: core A, core B, core N; 103: uncore; 104: control unit; 106: status register; 108A, 108B, 108C, 108D, 108N: sync register; 108E, 108F, 108G, 108H: shadow sync register; 114: fuses; 116: private random access memory (PRAM); 118: hardware semaphore; 119: shared cache memory; 122A, 122B, 122N: clock signal; 124A, 124B, 124N: interrupt signal; 126A, 126B, 126N: data signal; 128A, 128B, 128N: power control signal; 202: control word; 204: wakeup events; 206: sync control; 208: power gate; 212: sleep; 214: selective wakeup; 222: S; 224: C; 226: sync condition or C-state; 228: core set; 232: force sync; 234: selective sync kill; 236: disable core; 242: status word; 244: wakeup events; 246: lowest common C-state; 248: error code; 252: configuration word; 254-0 to 254-7: enable; 256: number of local cores; 258: number of dies; 302, 304, 305, 306, 312, 314, 316, 318, 322, 326, 328, 332, 334, 336: steps; 402A, 402B: inter-die bus unit A, inter-die bus unit B; 404: inter-die bus; 406A, 406B: die A, die B; 502, 504, 505, 508, 514, 516, 518, 524, 526, 528, 532: steps; 702, 704, 706, 708, 714, 716, 717, 718, 724, 726, 727, 728, 744, 746, 747, 748, 749, 752: steps; 902, 904, 906, 907, 908, 909, 914, 916, 919, 921, 924: steps; 1102, 1104, 1106, 1108, 1109, 1121, 1124, 1132, 1134, 1136, 1137: steps; 1402, 1404, 1406, 1408, 1412, 1414, 1416, 1417, 1418, 1422, 1424, 1426: steps; 1502, 1504, 1506, 1508, 1517, 1518, 1522, 1524, 1526, 1532: steps; 1702: owned bit; 1704: owner bits; 1706: state machine; 1802, 1804, 1806, 1808: steps; 1902, 1904, 1906, 1908, 1912, 1914, 1916, 1918: steps; 2002, 2004, 2006, 2008: steps; 2202, 2203, 2204, 2205, 2206, 2208, 2212, 2214, 2216, 2218, 2222, 2224: steps; 2302, 2304, 2305, 2306, 2312, 2315, 2318, 2324: steps; 2404: core microcode read-only memory; 2408: uncore microcode patch random access memory; 2423: service unit; 2425: uncore microcode read-only memory; 2439: patch content-addressable memory (CAM); 2497: service unit start address register; 2499: core random access memory; 2500: microcode patch; 2502: header; 2504: immediate patch; 2506: checksum; 2508: CAM data; 2512: core PRAM patch; 2514: checksum; 2516: RAM patch; 2518: uncore PRAM patch; 2522: checksum; 2602, 2604, 2606, 2608, 2611, 2612, 2614, 2616, 2618, 2621, 2622, 2624, 2626, 2628, 2631, 2632, 2634, 2652: steps; 2808: core patch RAM; 2912, 2916, 2922, 2932: steps; 3002, 3004, 3006: steps; 3102: memory type range register (MTRR); 3202, 3204, 3206, 3208, 3211, 3212, 3214, 3216, 3218, 3252: steps.
Detailed description of the embodiments
The preferred embodiments of the present invention are described below. They illustrate the principles of the invention and do not limit it. The scope of the invention is defined by the claims.
Referring to Fig. 1, a block diagram of a multi-core microprocessor 100 is shown. Microprocessor 100 includes multiple processing cores, denoted 102A, 102B through 102N. They are referred to collectively as processing cores 102, or simply cores 102, and individually as processing core 102 or core 102. Preferably, each core 102 includes a pipeline (not shown) of one or more functional units. These include an instruction cache, an instruction translation unit or instruction decoder (preferably including a microcode unit), a register renaming unit, reservation stations, cache memories, execution units, a memory subsystem, and a retire unit that includes a reorder buffer. Preferably, the cores 102 have a superscalar, out-of-order execution microarchitecture. In one embodiment, microprocessor 100 is an x86 architecture microprocessor. In other embodiments, microprocessor 100 conforms to other instruction set architectures.
Microprocessor 100 also includes an uncore 103 that is distinct from, and coupled to, the cores 102. Uncore 103 includes a control unit 104, fuses 114, a private random access memory (PRAM) 116 and a shared cache memory 119. The shared cache memory 119 is, for example, a level-2 (L2) and/or level-3 (L3) cache shared by the cores 102. Each core 102 is configured to read data from, and write data to, uncore 103 via a respective address/data bus 126. Through this bus, each core 102 has a non-architectural address space (also referred to as a private or micro-architectural address space) to the shared resources of uncore 103. PRAM 116 is private, or non-architectural, meaning that it is not in the architectural user program address space of microprocessor 100. In one embodiment, uncore 103 includes arbitration logic that arbitrates requests from the cores 102 for access to the resources of uncore 103.
Each fuse 114 is an electrical device that can be either blown or not blown. When a fuse 114 is not blown, it has low impedance and readily conducts current. When a fuse 114 is blown, it has high impedance and does not readily conduct current. A sense circuit is associated with each fuse 114 to evaluate it. The sense circuit detects whether the fuse 114 conducts a high current or has a low voltage across it (not blown, for example logical zero or clear), or conducts a low current or has a high voltage across it (blown, for example logical one or set). Fuses 114 may be blown during manufacture of microprocessor 100, and in some embodiments an unblown fuse 114 may be blown after manufacture. Preferably, blowing a fuse 114 is irreversible. One example of a fuse 114 is a polysilicon fuse, which can be blown by applying a sufficiently high voltage across the device. Another example is a nickel-chromium fuse, which can be blown with a laser. Preferably, the sense circuits sense the fuses 114 at power-up and place the result in a corresponding bit of a holding register of microprocessor 100. When microprocessor 100 is released from reset, the cores 102 (for example, their microcode) read the holding register to determine the sensed fuse 114 values. In one embodiment, before microprocessor 100 is released from reset, updated values may be scanned into the holding register through a boundary scan input, such as a Joint Test Action Group (JTAG) input, which effectively updates the fuse 114 values. This is particularly useful for testing and/or debugging, for example in the embodiments described below with respect to Figs. 22 and 23.
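The holding-register mechanism works as follows. Fuses are sensed once at power-up, may be overridden by a boundary scan while the part is still in reset, and are read by the cores afterward. The C sketch below models that sequence. The struct and functions are hypothetical stand-ins for hardware, and the 32-fuse width is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the fuse holding register. */
typedef struct {
    uint32_t holding;   /* one bit per fuse 114: 1 = blown (set), 0 = not blown (clear) */
    int      in_reset;  /* nonzero while the microprocessor is held in reset */
} fuse_block_t;

/* Power-up: the sense circuits copy each fuse's evaluated state into the
 * holding register. `blown_mask` stands in for the physical fuses. */
void fuse_power_up(fuse_block_t *fb, uint32_t blown_mask)
{
    fb->holding  = blown_mask;
    fb->in_reset = 1;
}

/* Boundary-scan (e.g. JTAG) override. It is accepted only before reset is
 * released, so the cores later read the scanned-in value as the fuse value. */
int fuse_scan_in(fuse_block_t *fb, uint32_t value)
{
    if (!fb->in_reset)
        return -1;      /* too late: cores may already have read the register */
    fb->holding = value;
    return 0;
}

void release_reset(fuse_block_t *fb) { fb->in_reset = 0; }

/* What core microcode reads after reset is released. */
int fuse_is_blown(const fuse_block_t *fb, unsigned fuse_index)
{
    assert(fuse_index < 32);
    return (fb->holding >> fuse_index) & 1u;
}
```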
In addition, in one embodiment, microprocessor 100 includes a distinct local Advanced Programmable Interrupt Controller (APIC) (not shown) associated with each core 102. In one embodiment, the local APICs conform architecturally to the description of the local APIC in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, May 2012, published by Intel Corporation of Santa Clara, California, particularly Section 10.4. In particular, the local APIC includes an APIC ID and an APIC base register that contains a bootstrap processor (BSP) flag. Their generation and use are described in more detail below, particularly in the embodiments related to Figs. 14 to 16.
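In the x86 architecture, the BSP flag mentioned above is bit 8 of the IA32_APIC_BASE MSR (address 0x1B), which the Intel SDM describes in its local APIC chapter. The C sketch below decodes a value of that register. It assumes a 36-bit physical address width for the base-address field and leaves out the privileged RDMSR read. It shows only the architectural layout, not how the patent's cores set the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_APIC_BASE MSR fields (Intel SDM Vol. 3A, local APIC chapter). */
#define IA32_APIC_BASE_MSR  0x1Bu
#define APIC_BASE_BSP       (1ull << 8)    /* set on the bootstrap processor */
#define APIC_BASE_ENABLE    (1ull << 11)   /* APIC global enable */
#define APIC_BASE_ADDR_MASK 0xFFFFFF000ull /* bits 12..35, assuming MAXPHYADDR = 36 */

bool apic_is_bsp(uint64_t msr)           { return (msr & APIC_BASE_BSP) != 0; }
bool apic_globally_enabled(uint64_t msr) { return (msr & APIC_BASE_ENABLE) != 0; }
uint64_t apic_base_address(uint64_t msr) { return msr & APIC_BASE_ADDR_MASK; }
```

On a typical BSP, the register reads 0xFEE00900, meaning base 0xFEE00000 with the enable and BSP bits set.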
Control unit 104 comprises hardware, software, or a combination of hardware and software. Control unit 104 includes a hardware semaphore 118 (described in detail below with respect to Figs. 17 to 20), a status register 106, a configuration register 112, and a sync register 108 for each core 102. Preferably, each entity of uncore 103 is addressable by each core 102 at a distinct address in the non-architectural address space, which allows the microcode of the cores 102 to read and write it.
Each sync register 108 is writable by its corresponding core 102. Status register 106 is readable by each core 102. Configuration register 112 is readable by each core 102, and is indirectly writable by each core 102 via the disable core bit 236 of Fig. 2, described below. Control unit 104 may also include interrupt logic (not shown) that generates a respective interrupt signal (INTR) 124 to each core 102. Control unit 104 asserts this signal to interrupt the corresponding core 102. The interrupt sources in response to which control unit 104 generates an interrupt signal 124 to a core 102 may include external interrupt sources (for example, the x86 architecture INTR, SMI and NMI interrupt sources) or bus events (for example, assertion or de-assertion of the x86-style STPCLK bus signal). Additionally, each core 102 can send an inter-core interrupt signal 124 to each other core 102 by writing to control unit 104. Preferably, unless otherwise stated, the inter-core interrupts described herein are non-architectural inter-core interrupts requested by the microcode of a core 102 via a microinstruction. They are distinct from conventional architectural inter-core interrupts, which system software requests via an architectural instruction. Finally, when a synchronization condition has occurred, as described below (see, for example, Fig. 21 and block 334 of Fig. 3), control unit 104 may generate an interrupt signal 124 to a core 102 (a sync interrupt). Control unit 104 also generates a respective clock signal (CLOCK) 122 to each core 102. Control unit 104 can selectively turn off a clock signal 122, effectively putting the corresponding core 102 to sleep, and turn it back on to wake the core 102. Control unit 104 also generates a respective power control signal (PWR) 128 to each core 102, which selectively controls whether the corresponding core 102 receives power. Thus, control unit 104 can put a core 102 into a deeper sleep state by turning off its power via the corresponding power control signal 128, and can wake the core 102 by turning its power back on.
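The C sketch below treats the control unit's per-core outputs (CLOCK 122, PWR 128 and INTR 124) as plain state. Clock gating gives ordinary sleep. Removing power as well gives the deeper sleep. An inter-core interrupt is a write to the control unit that asserts INTR on the target cores. The type, the function names and the eight-core limit are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CORES 8   /* assumed upper bound for this sketch */

/* Hypothetical model of the per-core outputs of control unit 104. */
typedef struct {
    unsigned num_cores;
    bool clock_on[MAX_CORES];   /* CLOCK 122 */
    bool power_on[MAX_CORES];   /* PWR 128 */
    bool intr[MAX_CORES];       /* INTR 124 */
} control_unit_t;

void cu_init(control_unit_t *cu, unsigned num_cores)
{
    assert(num_cores > 0 && num_cores <= MAX_CORES);
    cu->num_cores = num_cores;
    for (unsigned i = 0; i < num_cores; i++) {
        cu->clock_on[i] = true;
        cu->power_on[i] = true;
        cu->intr[i] = false;
    }
}

/* Sleep gates the clock; a deeper sleep also removes power. */
void cu_sleep(control_unit_t *cu, unsigned core, bool deep)
{
    assert(core < cu->num_cores);
    cu->clock_on[core] = false;
    if (deep)
        cu->power_on[core] = false;
}

/* Wake restores power (if it was removed), then ungates the clock. */
void cu_wake(control_unit_t *cu, unsigned core)
{
    assert(core < cu->num_cores);
    cu->power_on[core] = true;
    cu->clock_on[core] = true;
}

/* Non-architectural inter-core interrupt: the sender's microcode writes the
 * control unit, which asserts INTR on each targeted core except the sender. */
void cu_send_inter_core_interrupt(control_unit_t *cu, unsigned sender, unsigned target_mask)
{
    assert(sender < cu->num_cores);
    for (unsigned i = 0; i < cu->num_cores; i++)
        if (i != sender && (target_mask & (1u << i)))
            cu->intr[i] = true;
}
```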
A core 102 may write to its sync register 108 with the sync bit set (see the S bit 222 of Fig. 2). This is referred to as a synchronization request. As described in more detail below, in one embodiment the synchronization request asks control unit 104 to put the core 102 to sleep, and to wake it when a synchronization condition occurs and/or when a specified wakeup event occurs. A synchronization condition occurs when all enabled cores 102 of microprocessor 100 (see the enable bits 254 of Fig. 2), or a specified subset of the enabled cores 102 (see the core set field 228 of Fig. 2), have written the same synchronization condition to their respective sync registers 108. The synchronization condition is specified by a combination of the C bit 224, the sync condition or C-state field 226 and the core set field 228 of Fig. 2, together with the S bit 222, as described in more detail below. In response to the occurrence of a synchronization condition, control unit 104 simultaneously wakes all cores 102 that are waiting for it, that is, all cores that requested that synchronization condition. In another embodiment described below, the cores 102 may request that only the core 102 that last wrote the synchronization request be woken (see the selective wakeup bit 214 of Fig. 2). In yet another embodiment, the synchronization request does not ask for the core 102 to be put to sleep. Instead, it asks control unit 104 to interrupt the core 102 when the synchronization condition occurs, as described in more detail below, particularly with respect to Figs. 3 and 21.
Preferably, control unit 104 detects that a synchronization condition has occurred when the last core 102 writes its synchronization request to its sync register 108. Control unit 104 then puts that last core 102 to sleep, for example by turning off the clock signal 122 to the last-writing core 102. It then wakes all the cores 102 simultaneously, for example by turning on the clock signals 122 to all of them. In this way, all the cores 102 are woken in precisely the same clock cycle, that is, their clock signals 122 are turned on in the same cycle. This is particularly advantageous for certain operations, such as debugging (see the embodiment of Fig. 5). In one embodiment, uncore 103 includes a single phase-locked loop (PLL) that generates the clock signals 122 provided to the cores 102. In other embodiments, microprocessor 100 includes multiple PLLs that generate the clock signals 122 provided to the cores 102.
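The cycle-accurate rendezvous described above can be sketched in C. Each writer's clock is gated when its request arrives. When the last participant writes, every participant's clock is ungated with the same wake cycle. The type, function names and one-cycle wake latency are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CORES 8   /* assumed upper bound for this sketch */

/* Hypothetical cycle-level model of a sleeping synchronization rendezvous. */
typedef struct {
    unsigned participants;               /* bitmask of cores that must sync */
    unsigned arrived;                    /* bitmask of cores that have written */
    bool clock_on[MAX_CORES];
    unsigned long wake_cycle[MAX_CORES];
} rendezvous_t;

void rv_init(rendezvous_t *rv, unsigned participants)
{
    rv->participants = participants;
    rv->arrived = 0;
    for (unsigned i = 0; i < MAX_CORES; i++) {
        rv->clock_on[i] = true;
        rv->wake_cycle[i] = 0;
    }
}

/* Core `core` writes its sync request in clock cycle `cycle`.
 * Returns true if this write completed the synchronization. */
bool rv_write_sync(rendezvous_t *rv, unsigned core, unsigned long cycle)
{
    assert(core < MAX_CORES);
    rv->arrived |= 1u << core;
    rv->clock_on[core] = false;          /* the writer sleeps, the last one included */
    if ((rv->arrived & rv->participants) != rv->participants)
        return false;                    /* still waiting on other cores */
    for (unsigned i = 0; i < MAX_CORES; i++)
        if (rv->participants & (1u << i)) {
            rv->clock_on[i] = true;      /* all clocks turned on together */
            rv->wake_cycle[i] = cycle + 1;
        }
    rv->arrived = 0;
    return true;
}
```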
Control, status and configuration words
Referring to Fig. 2, a block diagram of a control word 202, a status word 242 and a configuration word 252 is shown. A core 102 writes a value of the control word 202 to its sync register 108 in the control unit 104 of Fig. 1. This makes an atomic request to sleep and/or to synchronize with all the other cores 102 of microprocessor 100, or with a specified subset of them. A core 102 reads a value of the status word 242 from status register 106 of control unit 104 to determine the status information described herein. A core 102 reads a value of the configuration word 252 from configuration register 112 of control unit 104 and uses the value as described below.
Control word 202 includes a wakeup events field 204, a sync control field 206 and a power gate (PG) bit 208. The sync control field 206 includes various bits or subfields that control the sleeping of the core 102 and its synchronization with other cores 102. These are a sleep bit 212, a selective wakeup (SEL WAKE) bit 214, an S bit 222, a C bit 224, a sync condition or C-state field 226, a core set field 228, a force sync bit 232, a selective sync kill bit 234 and a disable core bit 236. Status word 242 includes a wakeup events field 244, a lowest common C-state field 246 and an error code field 248. Configuration word 252 includes an enable bit 254 for each core 102 of microprocessor 100, a number of local cores field 256 and a number of dies field 258.
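The patent names the fields of control word 202 but gives neither their bit positions nor, except for field 226, their widths. The C sketch below packs and unpacks a control word under an assumed layout. Every shift, mask and name is hypothetical, apart from the 4-bit sync condition or C-state field stated for one embodiment. It also encodes the rule that the sleep and selective wakeup bits are mutually exclusive.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignment for control word 202. */
#define CW_WAKE_EVENTS_MASK 0xFFFFu      /* 204: bits 0-15 (assumed width) */
#define CW_SLEEP         (1u << 16)      /* 212 */
#define CW_SEL_WAKE      (1u << 17)      /* 214 */
#define CW_S             (1u << 18)      /* 222 */
#define CW_C             (1u << 19)      /* 224 */
#define CW_COND_SHIFT    20u             /* 226: 4 bits (one embodiment) */
#define CW_COND_MASK     0xFu
#define CW_CORESET_SHIFT 24u             /* 228: 2 bits (assumed) */
#define CW_CORESET_MASK  0x3u
#define CW_FORCE_SYNC    (1u << 26)      /* 232 */
#define CW_SEL_KILL      (1u << 27)      /* 234 */
#define CW_DISABLE_CORE  (1u << 28)      /* 236 */
#define CW_PG            (1u << 29)      /* 208 */

/* Build a synchronization request: S set, plus the condition and core set. */
uint32_t cw_sync_request(uint16_t wake_events, unsigned cond, unsigned core_set,
                         int is_cstate, int want_sleep)
{
    assert(cond <= CW_COND_MASK && core_set <= CW_CORESET_MASK);
    uint32_t w = (uint32_t)wake_events & CW_WAKE_EVENTS_MASK;
    w |= CW_S;
    if (is_cstate)  w |= CW_C;
    if (want_sleep) w |= CW_SLEEP;
    w |= (uint32_t)cond << CW_COND_SHIFT;
    w |= (uint32_t)core_set << CW_CORESET_SHIFT;
    return w;
}

unsigned cw_cond(uint32_t w)     { return (w >> CW_COND_SHIFT) & CW_COND_MASK; }
unsigned cw_core_set(uint32_t w) { return (w >> CW_CORESET_SHIFT) & CW_CORESET_MASK; }

/* The sleep and selective wakeup bits are mutually exclusive. */
int cw_is_valid(uint32_t w) { return !((w & CW_SLEEP) && (w & CW_SEL_WAKE)); }
```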
The wakeup events field 204 of control word 202 includes multiple bits corresponding to different events. If a core 102 sets a bit in the wakeup events field 204, control unit 104 wakes the core 102 (for example, turns on its clock signal 122) when the corresponding event occurs. One wakeup event occurs when the core 102 has synchronized with all the other cores specified in the core set field 228. In one embodiment, the core set field 228 may specify one of the following:
- all cores 102 in microprocessor 100;
- all cores 102 that share a cache memory (for example, an L2 and/or L3 cache) with the instant core 102;
- all cores 102 on the same semiconductor die as the instant core 102 (see Fig. 4 for an example of a multi-die, multi-core microprocessor 100 embodiment);
- all cores 102 on semiconductor dies other than that of the instant core 102.
A set of cores 102 that share a cache memory may be referred to as a slice. Other examples of wakeup events include, but are not limited to, assertion or de-assertion of x86 INTR, SMI, NMI or STPCLK, and assertion of an inter-core interrupt. When a core 102 is woken, it may read the wakeup events field 244 of status word 242 to determine which wakeup events are active.
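The four core-set choices above can be turned into a membership bitmask once a numbering of cores is fixed. The C sketch below assumes cores are numbered die by die, and slice by slice within each die. It also assumes a 2-bit encoding of field 228. Neither the numbering nor the encoding comes from the patent, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding of core set field 228. */
enum core_set { CS_ALL = 0, CS_SAME_SLICE = 1, CS_SAME_DIE = 2, CS_OTHER_DIES = 3 };

/* Hypothetical topology: cores numbered die by die, slice by slice. */
typedef struct {
    unsigned num_dies;
    unsigned slices_per_die;
    unsigned cores_per_slice;   /* cores sharing one cache form a slice */
} topology_t;

static unsigned cores_per_die(const topology_t *t)
{
    return t->slices_per_die * t->cores_per_slice;
}

/* Bitmask of the cores that core `self` synchronizes with for set `cs`. */
uint32_t core_set_mask(const topology_t *t, unsigned self, enum core_set cs)
{
    unsigned total = t->num_dies * cores_per_die(t);
    assert(total <= 32 && self < total);
    unsigned my_die = self / cores_per_die(t);
    unsigned my_slice = self / t->cores_per_slice;   /* global slice index */
    uint32_t mask = 0;
    for (unsigned c = 0; c < total; c++) {
        unsigned die = c / cores_per_die(t);
        unsigned slice = c / t->cores_per_slice;
        int member = (cs == CS_ALL)
                  || (cs == CS_SAME_SLICE && slice == my_slice)
                  || (cs == CS_SAME_DIE && die == my_die)
                  || (cs == CS_OTHER_DIES && die != my_die);
        if (member)
            mask |= 1u << c;
    }
    return mask;
}
```

For a hypothetical two-die part with two slices of two cores per die, core 5 shares a slice with core 4, a die with cores 4 to 7, and sees cores 0 to 3 as "other dies".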
If a core 102 sets the PG bit 208, control unit 104 turns off power to the core 102 (for example, via the power control signal 128) after putting the core 102 to sleep. When control unit 104 later restores power to the core 102, it clears the PG bit 208. The use of the PG bit 208 is described in more detail below with respect to Figs. 11 to 13.
If a core 102 sets the sleep bit 212 or the selective wakeup bit 214, control unit 104 puts the core 102 to sleep after the core 102 writes its sync register 108, and wakes it on the wakeup events specified in the wakeup events field 204. The sleep bit 212 and the selective wakeup bit 214 are mutually exclusive. They differ in the action control unit 104 takes when a synchronization condition occurs. If the core 102 sets the sleep bit 212, control unit 104 wakes all the cores 102 when the synchronization condition occurs. If instead the core 102 sets the selective wakeup bit 214, control unit 104 wakes only the core 102 that last wrote the synchronization condition to its sync register 108.
If a core 102 sets neither the sleep bit 212 nor the selective wakeup bit 214, control unit 104 does not put the core 102 to sleep, and therefore does not wake it when a synchronization condition occurs. Instead, control unit 104 sets the bit in the wakeup events field 204 indicating that a synchronization condition is an active wakeup event, so that the core 102 can detect that the synchronization condition has occurred. Many of the wakeup events in the wakeup events field 204 can also be sources of an interrupt signal that control unit 104 generates to the core 102. However, the microcode of the core 102 can mask these interrupt sources if desired. In that case, when the core 102 is woken, the microcode can read status register 106 to determine whether a synchronization condition, a wakeup event, or both have occurred.
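The three cases above can be summarized as a per-core outcome computed when a synchronization condition occurs. The C sketch below assumes all participating cores used the same mode for a given request. The enum and struct are hypothetical stand-ins for the sleep bit 212 and the selective wakeup bit 214.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sleep bit 212 / selective wakeup bit 214. */
enum sync_mode { MODE_NO_SLEEP, MODE_SLEEP, MODE_SEL_WAKE };

/* What happens to one participating core when the sync condition occurs. */
typedef struct {
    bool woken;     /* clock turned back on by the control unit */
    bool flagged;   /* sync reported as an active wakeup event (core never slept) */
} sync_outcome_t;

/* `last` is the core whose write completed the synchronization. */
void resolve_sync(enum sync_mode mode, unsigned n, unsigned last, sync_outcome_t out[])
{
    assert(last < n);
    for (unsigned i = 0; i < n; i++) {
        out[i].woken   = (mode == MODE_SLEEP) || (mode == MODE_SEL_WAKE && i == last);
        out[i].flagged = (mode == MODE_NO_SLEEP);
    }
}
```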
If a core 102 sets the S bit 222, it requests control unit 104 to synchronize on a synchronization condition. The synchronization condition is specified by some combination of the C bit 224, the sync condition or C-state field 226 and the core set field 228. If the C bit 224 is set, the C-state field 226 specifies a C-state value. If the C bit 224 is clear, the sync condition field 226 specifies a non-C-state synchronization condition. Preferably, the values of the sync condition or C-state field 226 form a bounded set of non-negative integers. In one embodiment, the sync condition or C-state field 226 is 4 bits wide.

When the C bit 224 is clear, a synchronization condition occurs when all the cores 102 in the specified core set field 228 have written the same value of the sync condition field 226, with the S bit 222 set, to their sync registers 108. In one embodiment, each value of the sync condition field 226 corresponds to a unique synchronization condition, such as the various synchronization conditions in the exemplary embodiments described below.

When the C bit 224 is set, a synchronization condition occurs when all the cores 102 in the specified core set field 228 have written to their sync registers 108 with the S bit 222 set, regardless of whether they wrote the same value in the C-state field 226. In this case, control unit 104 posts the lowest value written to the C-state field 226 to the lowest common C-state field 246 of status register 106. A core 102 can then read that value, for example the master core 102 at block 908, or the last-writing or selectively woken core 102 at block 1108. In one embodiment, a core 102 may specify a predetermined value in the sync condition field 226 (for example, all bits set). This instructs control unit 104 to match the instant core 102 with whatever sync condition field 226 value the other cores 102 specify.
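The matching rules above can be expressed as a single check over the sync registers of the cores in the selected set. It covers the same-value rule when C is clear, the any-value rule with a minimum C-state when C is set, and the all-bits-set wildcard. The C sketch below is a hypothetical model. It treats a mix of C-set and C-clear requests as simply "not matched" and does not model the deadlock handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define COND_WILDCARD 0xFu   /* assumed "all bits set" value of 4-bit field 226 */

/* One core's request as written to its sync register 108 (hypothetical type). */
typedef struct {
    bool s;          /* S bit 222: a sync request is pending */
    bool c;          /* C bit 224: field 226 holds a C-state */
    unsigned value;  /* field 226: sync condition (C clear) or C-state (C set) */
} sync_req_t;

/* True if the synchronization condition has occurred for the `n` cores of the
 * selected core set. For C-state requests, *lowest_cstate receives the minimum
 * value written (what the text posts to field 246). */
bool sync_occurred(const sync_req_t *req, unsigned n, unsigned *lowest_cstate)
{
    assert(n > 0);
    bool cstate_mode = req[0].c;
    bool have_cond = false;
    unsigned cond = 0, lowest = 0xFu;

    for (unsigned i = 0; i < n; i++) {
        if (!req[i].s || req[i].c != cstate_mode)
            return false;                 /* not (yet) a matching request */
        if (cstate_mode) {
            if (req[i].value < lowest)    /* any values match; keep the minimum */
                lowest = req[i].value;
        } else if (req[i].value != COND_WILDCARD) {
            if (have_cond && req[i].value != cond)
                return false;             /* different sync conditions */
            cond = req[i].value;
            have_cond = true;
        }
    }
    if (cstate_mode && lowest_cstate != NULL)
        *lowest_cstate = lowest;
    return true;
}
```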
If a core 102 sets the force sync bit 232, control unit 104 forces all pending synchronization requests to be treated as matched.
In general, if any core 102 is woken by a wakeup event specified in the wakeup events field 204, control unit 104 kills all pending synchronization requests by clearing the S bits 222 in the sync registers 108. However, if the core 102 has set the selective sync kill bit 234, control unit 104 kills only the pending synchronization request of that core 102, which was woken by a wakeup event rather than by a synchronization condition.
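The kill policy above reduces to clearing either every pending S bit or only the woken core's own. The C sketch below models the S bits 222 as a hypothetical array of pending flags.

```c
#include <assert.h>
#include <stdbool.h>

/* pending[i] mirrors core i's S bit 222 (hypothetical model). When core `woken`
 * is woken by a non-synchronization wakeup event, clear every pending request,
 * or only that core's own if it set the selective sync kill bit 234. */
void kill_on_wakeup_event(bool pending[], unsigned n, unsigned woken, bool sel_kill)
{
    assert(woken < n);
    if (sel_kill) {
        pending[woken] = false;
        return;
    }
    for (unsigned i = 0; i < n; i++)
        pending[i] = false;
}
```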
If two or more cores 102 request synchronization on different sync conditions, the control unit 104 considers this a deadlock condition. Two or more cores 102 request synchronization on different sync conditions if they write their respective sync registers 108 with the S bit 222 set, the C bit 224 clear, and different values in the sync condition field 226. For example, if one core 102 writes its sync register 108 with the S bit 222 set, the C bit 224 clear and a sync condition 226 value of 7, and another core 102 writes its sync register 108 with the S bit 222 set, the C bit 224 clear and a sync condition 226 value of 9, the control unit 104 considers this a deadlock condition. Additionally, if one core 102 writes its sync register 108 with the C bit 224 clear and another core 102 writes its sync register 108 with the C bit 224 set, the control unit 104 considers this a deadlock condition. In response to a deadlock condition, the control unit 104 kills all pending sync requests and wakes up all cores 102 that are asleep. The control unit 104 also posts a value in the error code field 248 of the status register 106, which the cores 102 can read to determine the cause of the deadlock and take appropriate action. In one embodiment, the error code 248 indicates the sync condition written by each core 102, which enables each core to decide whether to proceed with its intended course of action or to defer to another core 102. For example, if one core 102 writes a sync condition to perform a power management operation (e.g., execute an x86 MWAIT instruction) and another core 102 writes a sync condition to perform a cache management operation (e.g., an x86 WBINVD instruction), the core 102 that intended to execute the MWAIT instruction cancels it and defers to the core 102 executing the WBINVD instruction, because MWAIT is an optional operation whereas WBINVD is a mandatory one. As another example, if one core 102 writes a sync condition to perform a debug operation (e.g., dump debug state) and another core 102 writes a sync condition to perform a cache management operation (e.g., a WBINVD instruction), the core 102 that intended to perform the WBINVD defers to the core 102 performing the debug dump: it saves its WBINVD state, waits for the debug dump to occur, and then restores the WBINVD state and executes the WBINVD instruction.
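The two deadlock cases can be expressed as a short check. This Python sketch is a hypothetical model: it takes the (C bit 224, field 226) pair of each core whose S bit 222 is set and does not model the wildcard sync condition value.

```python
from typing import List, Tuple


def is_deadlock(requests: List[Tuple[bool, int]]) -> bool:
    """Check pending sync requests for the deadlock cases described above.

    Each entry is (c_bit, field_226) for a core whose S bit 222 is set.
    The wildcard sync condition value is not modeled here.
    """
    c_bits = {c_bit for c_bit, _ in requests}
    if len(c_bits) > 1:
        return True  # one core cleared the C bit 224, another set it
    if c_bits == {False}:
        # Non-C-state requests must all name the same sync condition.
        return len({value for _, value in requests}) > 1
    return False  # C-state requests may carry different C-state values
```

With the example above, `is_deadlock([(False, 7), (False, 9)])` returns `True`.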
The die number field 258 is zero in a single-die embodiment. In a multi-die embodiment (e.g., Fig. 4), the die number field 258 indicates on which die the core 102 reading the configuration register 112 resides. For example, in a two-die embodiment, the dies are designated 0 and 1, and the die number field 258 has a value of 0 or 1. In one embodiment, fuses 114 are selectively blown to designate a die as 0 or 1.
The local core number field 256 indicates the core number, local to its die, of the core 102 that is reading the configuration register 112. Preferably, although a single configuration register 112 is shared by all cores 102, the control unit 104 knows which core 102 is reading the configuration register 112 and provides the correct value in the local core number field 256 according to the reader. This enables the microcode of a core 102 to know its local core number among the other cores 102 on the same die. In one embodiment, a multiplexer in the uncore 103 portion of the microprocessor 100 selects the appropriate value, based on which core 102 is reading the configuration register 112, to return in the local core number field 256 of the configuration word 252. In one embodiment, selectively blown fuses 114 operate together with the multiplexer to return the value of the local core number field 256. Preferably, the value of the local core number field 256 is fixed independently of which cores 102 on the die are enabled, as indicated by the enable bits 254 described below. That is, the value of the local core number field 256 remains fixed even when one or more cores 102 of the die are disabled. Additionally, the microcode of a core 102 computes the global core number of the core 102, a configuration-related value whose uses are described in detail below. The global core number identifies the core 102 among all the cores 102 of the microprocessor 100. A core 102 computes its global core number using the value of the die number field 258. For example, in one embodiment the microprocessor 100 includes eight cores 102 divided evenly between two dies with die values 0 and 1; on each die, the local core number field 256 returns a value of 0, 1, 2 or 3, and a core on die 1 adds 4 to the value returned in the local core number field 256 to compute its global core number.
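The global core number computation in the two-die example reduces to a single multiply-add. The sketch below assumes, as in the example, four cores per die; the function name is hypothetical, and applying the same formula to more than two dies is an assumption, not something stated above.

```python
CORES_PER_DIE = 4  # the two-die, eight-core example above


def global_core_number(die_number: int, local_core_number: int) -> int:
    """Global core number from die number field 258 and local field 256."""
    if not 0 <= local_core_number < CORES_PER_DIE:
        raise ValueError("local core number out of range")
    # Die 0 keeps its local number; die 1 adds 4.
    return die_number * CORES_PER_DIE + local_core_number
```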
The configuration word 252 includes, for each core 102 of the microprocessor 100, a corresponding enable bit 254 that indicates whether the core 102 is enabled or disabled. In Fig. 2, the enable bits 254 are denoted 254-x, where x is the global core number of the corresponding core 102. The example of Fig. 2 assumes eight cores 102 in the microprocessor 100; in the examples of Fig. 2 and Fig. 4, enable bit 254-0 indicates whether the core 102 with global core number 0 (e.g., core A) is enabled, enable bit 254-1 indicates whether the core 102 with global core number 1 (e.g., core B) is enabled, enable bit 254-2 indicates whether the core 102 with global core number 2 (e.g., core C) is enabled, and so forth. Thus, by knowing the global core numbers, the microcode of a core 102 can determine from the configuration word 252 which cores 102 of the microprocessor 100 are disabled and which are enabled. Preferably, an enable bit 254 is set if the core 102 is enabled and clear if the core 102 is disabled. When the microprocessor 100 is reset, hardware automatically populates the enable bits 254. Preferably, the hardware populates the enable bits 254 based on fuses 114 that were selectively blown when the microprocessor 100 was manufactured to indicate whether a given core 102 is enabled or disabled. For example, if a given core 102 is tested and found to be faulty, a fuse 114 may be blown to clear the enable bit 254 of that core 102. In one embodiment, a blown fuse 114 that indicates a core 102 is disabled also prevents the clock signal from being provided to the disabled core 102. Each core 102 can clear its own enable bit 254 by writing its sync register 108 with the disable core bit 236 set, as described in more detail below with respect to Figs. 14 through 16. Preferably, clearing the enable bit 254 does not prevent the core 102 from executing instructions but only updates the configuration register 112; the core 102 must set a different bit (not shown) to prevent itself from executing instructions, for example, by having its power removed and/or its clock signal turned off. For a multi-die microprocessor 100 (e.g., Fig. 4), the configuration register 112 includes an enable bit 254 for every core 102 in the microprocessor 100, that is, not only for the cores 102 of the local die but also for the cores 102 of the remote die. Preferably, in a multi-die microprocessor 100, when a core 102 writes its sync register 108, the value is propagated to the corresponding shadow sync register 108 on the other die (see Fig. 4); if the disable core bit 236 is set, this causes an update to be propagated to the remote die's configuration register 112, so that the configuration registers 112 of the local and remote dies hold the same value.
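Microcode that knows the global core numbers can scan the enable bits 254 as a bit mask. The sketch below assumes, hypothetically, that enable bit 254-x occupies bit position x of the configuration word 252; the actual bit layout is defined by Fig. 2, and the function names are illustrative.

```python
from typing import List

NUM_CORES = 8  # eight-core example of Fig. 2


def enabled_cores(config_word: int) -> List[int]:
    """Global core numbers x whose enable bit 254-x is set.

    Assumes, hypothetically, that enable bit 254-x is bit x of config_word.
    """
    return [x for x in range(NUM_CORES) if (config_word >> x) & 1]


def disabled_cores(config_word: int) -> List[int]:
    """Global core numbers whose enable bit is clear (e.g., a blown fuse)."""
    return [x for x in range(NUM_CORES) if not (config_word >> x) & 1]
```

Under this assumed layout, a configuration word of `0b11111011` would report core C (global core number 2) as disabled.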
In one embodiment, the configuration register 112 cannot be written directly by a core 102. However, a write by a core 102 that updates the configuration register 112 (for example, a sync register 108 write with the disable core bit 236 set) causes the values of the local enable bits 254 to be propagated to the configuration register 112 of the other die in a multi-die microprocessor 100, for example, as described with respect to block 1406 of Fig. 14.
Control unit
Referring now to Fig. 3, a flowchart illustrating operation of the control unit 104 is shown. Flow begins at block 302. At block 302, a core 102 writes a sync request, that is, writes a control word 202 to its sync register 108, and the sync request is received by the control unit 104. In a multi-die microprocessor 100 (e.g., see Fig. 4), when a shadow sync register 108 of the control unit 104 receives a propagated sync register 108 value transmitted from the other die 406, the control unit 104 operates effectively according to Fig. 3, i.e., as if it had received a sync request from one of its local cores 102 (block 302), except that the control unit 104 only puts to sleep (block 314), wakes (blocks 306, 328 or 336), interrupts (block 334), or blocks the wake events of (block 326) the cores 102 on its local die 406, and only populates its local status register 106 (block 318). Flow proceeds to block 304.
At block 304, the control unit 104 examines the sync conditions of the requests received at block 302 to determine whether a deadlock condition has occurred, as described above with respect to Fig. 2. If so, flow proceeds to block 306; otherwise, flow proceeds to decision block 312.
At block 305, the control unit 104 detects the occurrence of a wake event specified in the wake events field 204 of one of the sync registers 108 (other than the occurrence of a sync condition, which is detected at block 316). As described below with respect to block 326, the control unit 104 may automatically block wake events. The control unit 104 may detect the wake event asynchronously to the writing of a sync request at block 302. Flow also proceeds from block 305 to block 306.
At block 306, the control unit 104 populates the status register 106, kills the pending sync requests, and wakes up any sleeping cores 102. As described above, waking a sleeping core 102 may include restoring its power. The cores 102 may then read the status register 106, in particular the error code 248, to determine the cause of the deadlock and handle it according to the relative priorities of the conflicting sync requests, as described above. The control unit 104 kills all pending sync requests (i.e., clears the S bit 222 in the sync register 108 of each core 102), unless block 306 was reached from block 305 and the selective sync kill bit 234 is set, in which case the control unit 104 kills only the pending sync request of the core 102 woken by the wake event. If block 306 was reached from block 305, the core 102 may read the wake events field 244 to determine which wake event occurred. Additionally, if the wake event is an unmasked interrupt source, the control unit 104 generates an interrupt request to the core 102 via the interrupt signal 124. Flow ends at block 306.
At decision block 312, the control unit 104 determines whether the sleep bit 212 or the selective wake bit 214 is set. If so, flow proceeds to block 314; otherwise, flow proceeds to decision block 316.
At block 314, the control unit 104 puts the core 102 to sleep. As described above, putting a core 102 to sleep may include removing its power. In one embodiment, as an optimization, if this core 102 is the last writer (i.e., its write causes the sync condition to occur) and its selective wake bit 214 is set, the control unit 104 does not remove power from the core 102 at block 314 even if the PG bit 208 is set, because the control unit 104 will immediately wake the last-writing core 102 back up at block 328. In one embodiment, the control unit 104 includes sync logic and sleep logic that are separate from each other but communicate; the sync logic and the sleep logic each include a portion of the sync register 108. Advantageously, the write to the sync logic portion of the sync register 108 and the write to the sleep logic portion are atomic, i.e., indivisible: if either portion is written, both portions are guaranteed to be written. Preferably, the pipeline of the core 102 stalls and does not allow any further writes until both portions of the sync register 108 are guaranteed to have been written. An advantage of writing a sync request and immediately going to sleep is that the core 102 (e.g., its microcode) does not need to run continuously to determine whether the sync condition has occurred. This is useful because it saves power and does not consume other resources, such as bus and/or memory bandwidth. Note that to go to sleep without requesting synchronization with other cores 102 (e.g., blocks 924 and 1124), a core 102 may write its sync register 108 with the S bit 222 clear and the sleep bit 212 set, referred to herein as a sleep request. In this case, if an unmasked wake event specified in the wake events field 204 occurs (e.g., block 305), although no sync condition is found for this core 102 (e.g., block 316), the control unit 104 wakes the core 102 (e.g., block 306). Flow proceeds to decision block 316.
At decision block 316, the control unit 104 determines whether a sync condition has occurred. If so, flow proceeds to block 318. As described above, a sync condition can occur only if the S bit 222 is set. In one embodiment, the control unit 104 uses the enable bits 254 of Fig. 2, which indicate which cores 102 in the microprocessor 100 are enabled and which are disabled, and considers only the enabled cores 102 when determining whether a sync condition has occurred. A core 102 may be disabled because it was tested at manufacturing time and found defective; in that case a fuse is blown to make the core 102 inoperable and to indicate that it is disabled. A core 102 may also be disabled because software requested it (e.g., see Fig. 15). For example, at a user's request, BIOS writes a model specific register (MSR) to request that the core 102 be disabled; in response, the core 102 disables itself (e.g., via the disable core bit 236) and notifies the other cores 102 to read the configuration register 112, from which they determine that the core 102 is disabled. A core 102 may also be disabled via a microcode patch (e.g., see Fig. 14), which may be made by blowing fuses 114 and/or loaded from system memory, such as a FLASH memory. In addition to determining whether a sync condition has occurred, the control unit 104 checks the force sync bit 232. If it is set, flow proceeds to block 318. If the force sync bit 232 is clear and no sync condition has occurred, flow ends at block 316.
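Decision block 316 can be modeled as follows. This is a hypothetical Python sketch with an illustrative function name: it examines only the S bits 222 of enabled cores and the force sync bit 232, and assumes that the sync condition values have already been checked for a match according to the rules described with Fig. 2.

```python
from typing import List


def sync_occurs(
    s_bits: List[bool], enable_bits: List[bool], force_sync: bool
) -> bool:
    """Decision block 316 as a behavioral model.

    Only enabled cores are examined, so a core disabled by fuse, by software
    or by a microcode patch cannot hold up the others. Sync condition values
    are assumed to have been checked separately.
    """
    if force_sync:
        return True  # force sync bit 232 set: proceed to block 318
    active = [s for s, enabled in zip(s_bits, enable_bits) if enabled]
    return bool(active) and all(active)
```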
At block 318, the control unit 104 populates the status register 106. Specifically, if the sync condition that occurred is one in which all cores 102 requested a C-state sync, the control unit 104 populates the lowest common C-state field 246, as described above. Flow proceeds to decision block 322.
At decision block 322, the control unit 104 examines the selective wake (SEL WAKE) bit 214. If the bit is set, flow proceeds to block 326; otherwise, flow proceeds to decision block 332.
At block 326, the control unit 104 blocks all wake events of all cores 102 other than the instant core, where the instant core is the core 102 whose write of a sync request to its sync register 108 at block 302 was the last one and therefore caused the sync condition to occur. In one embodiment, the logic of the control unit 104 simply performs a Boolean AND of each wake condition with a signal that is false when wake events are to be blocked and true otherwise. The uses of blocking the wake events of all the other cores are described in more detail below, particularly with respect to Figs. 11 through 13. Flow proceeds to block 328.
At block 328, the control unit 104 wakes up only the instant core 102 and does not wake the other cores that requested the sync. Additionally, the control unit 104 kills the pending sync request of the instant core 102 by clearing its S bit 222, but does not kill the pending sync requests of the other cores 102, i.e., it leaves their S bits 222 set. Advantageously, therefore, if the instant core 102 writes another sync request after it is woken, it will again cause the sync condition to occur (assuming the sync requests of the other cores 102 have not been killed in the meantime), an example of which is described below with respect to Figs. 12 and 13. Flow ends at block 328.
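The selective-wake path of blocks 326 and 328 can be sketched as a state update. In this hypothetical model (the function name is illustrative), the S bits 222 and the sleep state of each core are Python lists; the instant core's request is killed and it alone is woken, while the other requests remain pending.

```python
from typing import List, Tuple


def selective_wake(
    s_bits: List[bool], asleep: List[bool], instant_core: int
) -> Tuple[List[bool], List[bool]]:
    """Blocks 326-328: wake only the instant (last-writing) core.

    Returns new (s_bits, asleep) lists. The other cores keep their S bits 222
    set and stay asleep (their wake events blocked, per block 326).
    """
    s_bits, asleep = list(s_bits), list(asleep)
    s_bits[instant_core] = False  # kill only the instant core's request
    asleep[instant_core] = False  # wake only the instant core
    return s_bits, asleep
```

Because the other S bits stay set, a new sync request from the instant core alone is enough to satisfy the sync condition again.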
At decision block 332, the control unit 104 examines the sleep bit 212. If the bit is set, flow proceeds to block 336; otherwise, flow proceeds to block 334.
At block 334, the control unit 104 sends an interrupt signal (sync break) to all cores 102. The timing diagram of Fig. 21 illustrates an example of a non-sleeping sync request. Each core 102 may read the wake events field 244 and detect that the occurrence of a sync condition was the cause of the interrupt. Flow reaches block 334 when the cores 102 chose not to go to sleep when writing their sync requests. Although in this case the cores 102 do not obtain the same benefits as going to sleep (e.g., simultaneous wake-up), they gain the potential advantage of continuing to process instructions while waiting for the last core 102 to write its sync request, in cases where simultaneous wake-up is not needed. Flow ends at block 334.
At block 336, the control unit 104 wakes up all cores 102 simultaneously. In one embodiment, the control unit 104 turns on the clock signals 122 to all cores 102 on precisely the same clock cycle. In another embodiment, the control unit 104 turns on the clock signals 122 to the cores 102 in a staggered fashion; that is, the control unit 104 introduces a predetermined number of clock cycles (e.g., ten or 100) between turning on the clock signal 122 to each core. For purposes of the present invention, staggered turn-on of the clock signals 122 is still considered simultaneous. Staggering the clock signals 122 is beneficial because it reduces the likelihood of a power consumption spike when all cores 102 wake up. In yet another embodiment, also to reduce the likelihood of a power consumption spike, the control unit 104 turns on the clock signals 122 to all cores 102 on the same clock cycle, but in a stuttered or throttled manner, initially providing the clock signals 122 at a reduced frequency and ramping the frequency up to the target frequency. In one embodiment, sync requests are issued as a result of executing microcode instructions of the core 102, and the microcode is designed so that, for at least some sync condition values, the location in the microcode that specifies the sync condition value is unique. For example, only one place in the microcode includes a sync x request, only one place includes a sync y request, and so on. In these cases, simultaneous wake-up is beneficial because all cores 102 wake up at the same place, which may enable the microcode designer to write more efficient and bug-free code. Furthermore, simultaneous wake-up may be particularly beneficial for debugging when attempting to recreate and fix bugs that occur only because of multi-core interaction and do not occur when a single core is running. Figs. 5 and 6 illustrate such an example. Additionally, the control unit 104 kills all pending sync requests (i.e., clears the S bit 222 in the sync register 108 of each core 102). Flow ends at block 336.
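The same-cycle and staggered embodiments differ only in the spacing between clock turn-on events. The minimal sketch below, with a hypothetical function name, computes the cycle on which each core's clock signal 122 is enabled; the frequency-ramp (throttled) embodiment is not modeled.

```python
from typing import List


def clock_enable_schedule(
    num_cores: int, stagger_cycles: int, start_cycle: int = 0
) -> List[int]:
    """Clock cycle on which each core's clock signal 122 is turned on.

    stagger_cycles = 0 models the same-cycle embodiment; a nonzero value
    (e.g., ten or 100) models the staggered embodiment.
    """
    if stagger_cycles < 0:
        raise ValueError("stagger must be non-negative")
    return [start_cycle + core * stagger_cycles for core in range(num_cores)]
```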
An advantage of the embodiments described herein is that they can substantially reduce the amount of microcode in a microprocessor: rather than looping or performing other checks to synchronize operations among the cores, the microcode in each core can simply write a sync request, go to sleep, and rely on all cores waking up at the same place in the microcode. Uses of the sync request mechanism by microcode are described below.
Multi-die microprocessor
Referring now to Fig. 4, a block diagram illustrating another embodiment of the microprocessor 100 is shown. The microprocessor 100 of Fig. 4 is similar in many respects to the microprocessor 100 of Fig. 1: it is a multi-core processor, and its cores 102 are similar. However, the embodiment of Fig. 4 is a multi-die configuration. That is, the microprocessor 100 includes multiple semiconductor dies 406 mounted in a common package and communicating with one another via an inter-die bus 404. The embodiment of Fig. 4 includes two dies 406, labeled die A 406A and die B 406B, coupled by the inter-die bus 404. Each die 406 includes an inter-die bus unit 402 that interfaces its die 406 to the inter-die bus 404. Furthermore, each die 406 includes, in its uncore 103, a control unit 104 coupled to its cores 102 and to its inter-die bus unit 402. In the embodiment of Fig. 4, die A 406A includes four cores 102 (core A 102A, core B 102B, core C 102C and core D 102D) coupled to a control unit A 104A, which is coupled to an inter-die bus unit A 402A; similarly, die B 406B includes four cores 102 (core E 102E, core F 102F, core G 102G and core H 102H) coupled to a control unit B 104B, which is coupled to an inter-die bus unit B 402B. Finally, each control unit 104 includes not only a sync register 108 for each core of its own die 406 but also a sync register 108 for each core of the other die 406; the sync registers 108 for the other die 406 are the shadow registers shown in Fig. 4. Thus, each control unit in the embodiment of Fig. 4 includes eight sync registers 108, denoted 108A, 108B, 108C, 108D, 108E, 108F, 108G and 108H. In control unit A 104A, sync registers 108E, 108F, 108G and 108H are shadow registers; in control unit B 104B, sync registers 108A, 108B, 108C and 108D are shadow registers.
When a core 102 writes a value to its sync register 108, the control unit 104 on that core's die 406 writes the value to the corresponding shadow register 108 on the other die 406 via the inter-die bus units 402 and the inter-die bus 404. Additionally, if the disable core bit 236 is set in the value propagated to the shadow sync register 108, the control unit 104 also updates the corresponding enable bit 254 in the configuration register 112. In this way, sync conditions (including sync conditions that span dies) can be detected even when the core configuration of the microprocessor 100 changes dynamically (e.g., Figs. 14 through 16). In one embodiment, the inter-die bus 404 is a relatively low-speed bus, and the propagation may take a predetermined number of core clock cycles, on the order of 100; each control unit 104 includes a state machine that takes a predetermined amount of time to detect the occurrence of a sync condition and turn on the clock signals to all cores 102 of its die 406. Preferably, when the control unit 104 begins writing the value to the other die 406 (e.g., when it is granted the inter-die bus 404), the control unit 104 on the local die 406 (i.e., the die 406 containing the writing core 102) is configured to delay updating the local sync register 108 until a predetermined amount of time has elapsed (e.g., the sum of the propagation time and the state machine's sync condition detection time). In this manner, the control units 104 on both dies detect the occurrence of a sync condition at the same time and turn on the clock signals to all cores 102 on both dies 406 at the same time. This may be particularly beneficial for debugging when attempting to recreate and fix bugs that occur only because of multi-core interaction and do not occur when a single core is running. Figs. 5 and 6 describe an embodiment that may take advantage of this functionality.
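The shadow-register arrangement can be modeled as each die's control unit holding a copy of every core's sync register 108. In this hypothetical Python sketch the class and function names are illustrative, core numbers 0 to 7 stand for cores A to H, and neither the inter-die bus latency nor the delayed local update described above is modeled.

```python
from typing import List

CORES_PER_DIE = 4
NUM_DIES = 2


class ControlUnit:
    """Hypothetical model of one die's control unit 104.

    It holds a sync register 108 for every core; those belonging to cores
    on the other die are shadow registers.
    """

    def __init__(self, die: int) -> None:
        self.die = die
        self.sync_regs: List[int] = [0] * (CORES_PER_DIE * NUM_DIES)
        self.enable_bits: List[bool] = [True] * (CORES_PER_DIE * NUM_DIES)

    def is_shadow(self, core: int) -> bool:
        return core // CORES_PER_DIE != self.die

    def receive(self, core: int, value: int, disable_core: bool) -> None:
        self.sync_regs[core] = value
        if disable_core:
            # Disable core bit 236 set: clear enable bit 254 in register 112.
            self.enable_bits[core] = False


def core_write(
    units: List[ControlUnit], core: int, value: int, disable_core: bool = False
) -> None:
    """A core writes its sync register; the value reaches both dies.

    Bus latency and the delayed local update are not modeled.
    """
    for unit in units:
        unit.receive(core, value, disable_core)
```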
Debugging operations
The cores 102 of the microprocessor 100 are configured to perform individual debug operations, such as instruction execution and data access breakpoints. Additionally, the microprocessor 100 is configured to perform trans-core debug operations, i.e., debug operations that involve more than one core 102 of the microprocessor 100.
Referring now to Fig. 5, a flowchart illustrating operation of the microprocessor 100 to dump debug information is shown. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively dump the state of the microprocessor 100. More specifically, Fig. 5 describes the operation of the core that receives the request to dump debug information, whose flow begins at block 502, and the operation of the other cores 102, whose flow begins at block 532.
At block 502, one of the cores 102 receives a request to dump debug information. Preferably, the debug information includes the state of the core 102 or a subset thereof. Preferably, the debug information may be dumped to system memory or to an external bus that can be monitored by debug equipment, such as a logic analyzer. In response to the request, the core 102 sends a debug dump message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 traps to microcode in response to the request to dump debug information (at block 502), or in response to the interrupt (at block 532), and remains in microcode until block 528, during which time interrupts are disabled (i.e., the core 102 does not allow itself to be interrupted). In one embodiment, the core 102 need only be interrupted when it is asleep and when it is at an architectural instruction boundary. In one embodiment, the various inter-core messages described herein (such as the message at block 502 and others, such as those at blocks 702, 1502, 2606 and 3206) are sent and received via the sync condition or C-state field 226 of the sync register 108 control word. In other embodiments, the inter-core messages are sent and received via the uncore private random access memory 116. Flow proceeds from block 502 to block 504.
At block 532, one of the other cores 102 (i.e., a core 102 other than the core 102 that received the debug dump request at block 502) is interrupted by the inter-core interrupt and message sent at block 502 and receives the debug dump message. As noted above, although flow at block 532 is described from the perspective of a single core 102, each of the other cores 102 (i.e., those not at block 502) is interrupted and receives the message at block 532 and performs the steps of blocks 504 through 528. Flow proceeds from block 532 to block 504.
At block 504, the core 102 writes a sync request with sync condition 1 (denoted SYNC 1 in Fig. 5) to its sync register 108. In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 506.
At block 506, when all cores have written SYNC 1, the control unit 104 wakes up the core 102. Flow proceeds to block 508.
At block 508, the core 102 dumps its state to memory. Flow proceeds to block 514.
At block 514, the core 102 writes a SYNC 2, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 516.
At block 516, when all cores have written SYNC 2, the control unit 104 wakes up the core 102. Flow proceeds to block 518.
At block 518, the core 102 sets a flag that persists through a reset, along with the memory address to which it dumped the debug information at block 508, and then resets itself. The reset microcode of the core 102 detects the flag and reloads the core's state from the saved memory address. Flow proceeds to block 524.
At block 524, the core 102 writes a SYNC 3, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 526.
At block 526, when all cores have written SYNC 3, the control unit 104 wakes up the core 102. Flow proceeds to block 528.
At block 528, the core 102 comes out of reset with the state reloaded at block 518 and begins fetching architectural (e.g., x86) instructions. Flow ends at block 528.
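The three-phase protocol of Fig. 5 resembles a software barrier. The sketch below uses Python's `threading.Barrier` as a stand-in for the control unit 104, with hypothetical function names and step labels. It reproduces only the ordering guarantee (no core proceeds past SYNC n until all cores have written it); a software barrier cannot provide the clock-level simultaneous wake-up that the control unit 104 provides.

```python
import threading
from typing import List, Tuple


def run_debug_dump(num_cores: int = 3) -> List[Tuple[int, str]]:
    """Software analogy of Fig. 5: three sync points, SYNC 1 to SYNC 3.

    threading.Barrier stands in for control unit 104: no thread returns from
    wait() until all threads have reached it.
    """
    barrier = threading.Barrier(num_cores)
    log: List[Tuple[int, str]] = []
    lock = threading.Lock()

    def record(core_id: int, step: str) -> None:
        with lock:
            log.append((core_id, step))

    def core(core_id: int) -> None:
        barrier.wait()                   # SYNC 1 (blocks 504-506)
        record(core_id, "dump state")    # block 508
        barrier.wait()                   # SYNC 2 (blocks 514-516)
        record(core_id, "reset/reload")  # block 518
        barrier.wait()                   # SYNC 3 (blocks 524-526)
        record(core_id, "fetch")         # block 528

    threads = [threading.Thread(target=core, args=(i,)) for i in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Whatever order the threads run in, every "dump state" entry precedes every "reset/reload" entry, which in turn precedes every "fetch" entry.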
Referring now to Fig. 6, a timing diagram illustrating an example of the operation of the microprocessor 100 according to the flowchart of Fig. 5 is shown. In the example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102. In the timing diagram, the sequence of events proceeds as follows.
Core 0 receives a debugging dump request, and transmits a debugging dump information and interrupting information to 2 (each party of core 1 and core Block 502) in response.The core 0 is then written to a SYNC 1, and enters sleep state (each square 504).
Each core 1 and core 2 are finally by being interrupted and reading its information (each square 532) in its current task.As sound It answers, each core 1 and core 2 are written a SYNC 1 and enter sleep state (each square 504).As shown, each core write-in The time of SYNC 1 may be different, for example, since the instruction is carrying out when the interruption is established.
When all cores have been written into SYNC 1, control unit 104 wakes up all cores (each square 506) simultaneously.Each core Then its state of dump is written a SYNC 2 and enters sleep state (each square 514) to memory (each square 508). Need the time quantum of the dump state may be different;Therefore, the time for SYNC 2 being written in each core may be different, as shown in the figure.
When all cores have been written into SYNC 2, control unit 104 wakes up all cores (each square 516) simultaneously.Each core Then itself is reset and by being loaded into its state (each square 518) in memory again, SYNC 3 is written and entering sleep shape State (each square 524).As shown, need to reset and again be loaded into state time quantum may be different;Therefore, every The time that SYNC 3 is written in one core may be different.
When all cores have been written into SYNC 3, control unit 104 wakes up all cores (each square 526) simultaneously.Each core Then start to instruct (each square 528) in the time point extraction framework being interrupted.
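The ordering guarantee of the three synchronization conditions in Figures 5 and 6 can be illustrated with a small software model. This is a hypothetical Python sketch, not the hardware or microcode: a reusable `threading.Barrier` stands in for the control unit 104 waking all cores once every core has written the same SYNC n, and a dictionary stands in for the memory that receives the dumped state. The names `ControlUnitModel`, `write_sync`, `debug_dump_core` and `run_debug_dump` are assumptions made for this example.

```python
import threading


class ControlUnitModel:
    """Hypothetical stand-in for control unit 104: a core that writes SYNC n
    sleeps until every core has written SYNC n, then all cores wake together."""

    def __init__(self, num_cores):
        # A Barrier is reusable, so one object serves SYNC 1, SYNC 2 and SYNC 3.
        self._barrier = threading.Barrier(num_cores)

    def write_sync(self, condition):
        # The condition number is documentary only in this model.
        self._barrier.wait()


def debug_dump_core(core_id, cu, memory, log, lock):
    cu.write_sync(1)                                        # blocks 504/506
    with lock:
        memory[core_id] = {"resume_pc": 0x1000 + core_id}   # block 508: dump (placeholder state)
        log.append(("dump", core_id))
    cu.write_sync(2)                                        # blocks 514/516
    with lock:
        state = memory[core_id]                             # block 518: reset, then reload state
        log.append(("reload", core_id))
    cu.write_sync(3)                                        # blocks 524/526
    with lock:
        log.append(("fetch", core_id, state["resume_pc"]))  # block 528: resume fetching


def run_debug_dump(num_cores=3):
    cu = ControlUnitModel(num_cores)
    memory, log, lock = {}, [], threading.Lock()
    cores = [threading.Thread(target=debug_dump_core, args=(i, cu, memory, log, lock))
             for i in range(num_cores)]
    for core in cores:
        core.start()
    for core in cores:
        core.join()
    return log
```

Whatever order the threads arrive in, `run_debug_dump()` never logs a reload before every core has dumped, and never logs a fetch before every core has reloaded. The model captures only this ordering; a Python barrier does not reproduce the clock-level simultaneity that the control unit 104 achieves by gating clock signal 122.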
A conventional solution for synchronizing operations between multiple processors is to use software semaphores. However, the drawback of conventional solutions is that they cannot provide clock-level synchronization. An advantage of the embodiments described herein is that the control unit 104 can turn on the clock signal 122 to all of the cores 102 simultaneously.
In the method described above, an engineer debugging the microprocessor 100 may configure one of the cores 102 to periodically generate checkpoints, that is, debug dump requests, for example each time a predetermined number of instructions has executed. While the microprocessor 100 runs, the engineer captures all activity on the external bus of the microprocessor 100 in a log file. The portion of the log file close in time to the bus event of interest can be provided to a software simulator that simulates the microprocessor 100 to help the engineer debug. The simulator simulates execution of the instructions indicated by each core 102 and uses the logged information to simulate the external bus of the microprocessor 100. In one embodiment, the simulators for all cores 102 are started simultaneously from a reset point. It is therefore highly effective that all cores 102 of the microprocessor 100 are actually reset at the same time (e.g., after SYNC 2). Furthermore, because a core 102 dumps its state only after all other cores 102 have stopped their current tasks (e.g., after SYNC 1), the dump does not interfere with the code and/or hardware being debugged on the other cores (e.g., through the shared memory bus or cache interactions), which increases the likelihood of reproducing the bug and determining its cause. Similarly, because the cores wait until all cores 102 have finished reloading their state (e.g., after SYNC 3) before they begin fetching architectural instructions, the reloading of state by one core 102 does not interfere with the code and/or hardware being debugged on the other cores, which likewise increases the likelihood of reproducing the bug and determining its cause.
These benefits are an advantage over existing approaches, such as that of U.S. Patent No. 8,370,684, which is hereby incorporated by reference in its entirety for all purposes, and which cannot obtain the benefits afforded by the synchronization request.
Cache control operations
The cores 102 of the microprocessor 100 are configured to perform individual cache control operations, for example on their local cache memories, that is, caches not shared by two or more cores 102. In addition, the microprocessor 100 is configured to perform trans-core cache control operations, that is, operations that involve more than one core 102 of the microprocessor 100, for example because they involve the shared cache memory 119.
Referring now to Figures 7A-7B, a flowchart illustrates the microprocessor 100 performing a trans-core cache control operation. The embodiment of Figures 7A-7B describes how the microprocessor 100 executes the x86 Write Back and Invalidate Cache (WBINVD) instruction. A WBINVD instruction instructs the core 102 that executes it to write back all modified lines in the cache memories of the microprocessor 100 to system memory and to invalidate, or flush, those cache memories. The WBINVD instruction also instructs the core 102 to issue special bus cycles that direct any cache memories external to the microprocessor 100 to write back their modified data and invalidate it. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to this description so that the cores collectively write back modified cache lines and invalidate the cache memories of the microprocessor 100. More specifically, Figures 7A-7B describe the operation of the core that encounters the WBINVD instruction, whose flow begins at block 702, and that of the other cores 102, whose flow begins at block 752.
In block 702, one of the cores 102 encounters a WBINVD instruction. In response, the core 102 sends a WBINVD instruction message to the other cores 102 and sends them an inter-core interrupt. Preferably, until flow reaches block 748/749, the core 102 keeps interrupts disabled (e.g., the microcode does not allow itself to be interrupted), so that the microcode remains in control for the whole of its response to the WBINVD instruction (in block 702) or to the interrupt (in block 752). Flow proceeds from block 702 to block 704.
In block 752, one of the other cores 102 (that is, a core other than the core 102 that encountered the WBINVD instruction in block 702) is interrupted by the inter-core interrupt sent in block 702 and receives the WBINVD instruction message. As noted above, although the flow at block 752 is described from the perspective of a single core 102, each of the other cores 102 (that is, each core other than the core 102 of block 702) is interrupted and receives the message in block 752 and then performs the steps of blocks 704 through 749. Flow proceeds from block 752 to block 704.
In block 704, the core 102 writes a synchronization request for synchronization condition 4 (denoted SYNC 4 in Figures 7A-7B) to its synchronization register 108. In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 706.
In block 706, once all cores 102 have written SYNC 4, the control unit 104 wakes the core 102. Flow proceeds to block 708.
In block 708, the core 102 writes back and invalidates its local cache memory, e.g., its level-1 (L1) cache, which is not shared with the other cores 102. Flow proceeds to block 714.
In block 714, the core 102 writes SYNC 5, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 716.
In block 716, once all cores 102 have written SYNC 5, the control unit 104 wakes the core 102. Flow proceeds to decision block 717.
In decision block 717, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction in block 702 (as opposed to a core 102 that received the WBINVD instruction message in block 752). If so, flow proceeds to block 718; otherwise, flow proceeds to block 724.
In block 718, the core 102 writes back and invalidates the shared cache memory 119. In one embodiment, the microprocessor 100 comprises multiple dies, and the cores 102 on a die, but not all cores of the microprocessor 100, share a cache memory, as described above. In this embodiment, intermediate operations (not shown) similar to blocks 717 through 726 are performed, so that one core 102 on each die writes back and invalidates that die's shared cache memory while the other core(s) on the die return to a sleep state, similar to block 724, and wait until the cache has been invalidated. Flow proceeds to block 724.
In block 724, the core 102 writes SYNC 6, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 726.
In block 726, once all cores 102 have written SYNC 6, the control unit 104 wakes the core 102. Flow proceeds to decision block 727.
In decision block 727, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction in block 702 (as opposed to a core 102 that received the WBINVD instruction message in block 752). If so, flow proceeds to block 728; otherwise, flow proceeds to block 744.
In block 728, the core 102 issues the special bus cycles that cause external caches to be written back and invalidated. Flow proceeds to block 744.
In block 744, the core 102 writes SYNC 13, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 746.
In block 746, once all cores 102 have written SYNC 13, the control unit 104 wakes the core 102. Flow proceeds to decision block 747.
In decision block 747, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction in block 702 (as opposed to a core 102 that received the WBINVD instruction message in block 752). If so, flow proceeds to block 748; otherwise, flow proceeds to block 749.
In block 748, the core 102 completes the WBINVD instruction, which includes retiring the WBINVD instruction and may include releasing ownership of a hardware semaphore (see Figure 20). Flow ends at block 748.
In block 749, the core 102 resumes the task it was executing before it was interrupted in block 752. Flow ends at block 749.
Referring now to Figure 8, a timing diagram illustrates an example of the operation of the microprocessor 100 according to the flowchart of Figures 7A-7B. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 encounters a WBINVD instruction and, in response, sends a WBINVD instruction message and interrupts core 1 and core 2 (per block 702). Core 0 then writes SYNC 4 and goes to sleep (per block 704).
Core 1 and core 2 are each eventually interrupted from their current task and read the message (per block 752). In response, each writes SYNC 4 and goes to sleep (per block 704). As shown, the cores may write SYNC 4 at different times.
When all cores have written SYNC 4, the control unit 104 wakes all of them simultaneously (per block 706). Each core then writes back and invalidates its own cache memory (per block 708), writes SYNC 5 and goes to sleep (per block 714). The time needed to write back and invalidate a cache may differ, so the cores may write SYNC 5 at different times, as shown.
When all cores have written SYNC 5, the control unit 104 wakes all of them simultaneously (per block 716). Only the core that encountered the WBINVD instruction writes back and invalidates the shared cache memory 119 (per block 718), and all cores write SYNC 6 and go to sleep (per block 724). Because only one core writes back and invalidates the shared cache memory 119, the cores may write SYNC 6 at different times.
When all cores have written SYNC 6, the control unit 104 wakes all of them simultaneously (per block 726). Only the core that encountered the WBINVD instruction completes the WBINVD instruction (per block 748), and all the other cores resume the processing they were performing before they were interrupted.
Although embodiments have been described in which the cache control instruction is the x86 WBINVD instruction, it should be appreciated that other embodiments are contemplated in which synchronization requests are used to execute other cache control instructions. For example, the microprocessor 100 may perform similar actions to execute the x86 INVD instruction, simply invalidating the caches without writing back cache data (in blocks 708 and 718). As yet another example, the cache control instruction may come from an instruction set architecture other than x86.
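The per-core path through Figures 7A-7B, including the INVD variant just described, can be sketched as a hypothetical Python model in which a reusable `threading.Barrier` plays the role of the control unit 104 for SYNC 4, 5, 6 and 13. Cache operations are only recorded as log events; the event names, `cache_control_core`, `run_cache_control` and the `write_back` flag are assumptions for illustration, not the microcode's actual interface.

```python
import threading


class ControlUnitModel:
    """Hypothetical stand-in for control unit 104: writing SYNC n sleeps the
    caller until every core has written SYNC n."""

    def __init__(self, num_cores):
        self._barrier = threading.Barrier(num_cores)

    def write_sync(self, condition):
        self._barrier.wait()


def cache_control_core(core_id, initiator, cu, log, lock, write_back=True):
    """One core's path through Figures 7A-7B; write_back=False models INVD."""
    def record(event):
        with lock:
            log.append((event, core_id))

    op = "writeback_invalidate" if write_back else "invalidate"
    # Non-initiating cores arrive here after the inter-core interrupt (block 752).
    cu.write_sync(4)                       # blocks 704/706
    record(op + "_local")                  # block 708: private L1 cache
    cu.write_sync(5)                       # blocks 714/716
    if core_id == initiator:               # decision block 717
        record(op + "_shared")             # block 718: shared cache 119
    cu.write_sync(6)                       # blocks 724/726
    if core_id == initiator:               # decision block 727
        record("external_bus_cycles")      # block 728: external caches
    cu.write_sync(13)                      # blocks 744/746
    record("retire" if core_id == initiator else "resume")  # blocks 748/749


def run_cache_control(num_cores=3, initiator=0, write_back=True):
    cu = ControlUnitModel(num_cores)
    log, lock = [], threading.Lock()
    cores = [threading.Thread(target=cache_control_core,
                              args=(i, initiator, cu, log, lock, write_back))
             for i in range(num_cores)]
    for core in cores:
        core.start()
    for core in cores:
        core.join()
    return log
```

In any interleaving, every local flush precedes the shared-cache flush, which precedes the external bus cycles, and only the initiating core retires the instruction while the others resume.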
Power management operations
The cores 102 of the microprocessor 100 are configured to perform individual power reduction operations, such as, but not limited to: stopping instruction execution, requesting the control unit 104 to stop sending the clock signal to the core 102, requesting the control unit 104 to remove power from the core 102, writing back and invalidating the local (i.e., non-shared) cache memories of the core 102, and saving the state of the core 102 to an external memory, such as the dedicated random access memory 116. When a core 102 has performed one or more specified core-specific power reduction operations, it has entered a "core" C-state (also referred to as a core idle state or core sleep state). In one embodiment, the C-state values may roughly correspond to the processor states of the well-known Advanced Configuration and Power Interface (ACPI) specification, but may also include finer granularity. Generally, a core 102 enters a core C-state in response to a request from the operating system. For example, the x86 monitor-wait (MWAIT) instruction is a power management instruction that provides a hint, namely a target C-state, to the core 102 executing it, allowing the microprocessor 100 to enter an optimized state, such as a lower power consumption state. In the case of an MWAIT instruction, the target C-states are proprietary, non-ACPI C-states. Core C-state 0 (C0) corresponds to the running state of the core 102, and progressively larger C-state values correspond to progressively less active or less responsive states (such as the C1, C2 and C3 states). A less responsive or less active state is a configuration or operating state that saves more power relative to a more active or responsive state, or that is relatively less responsive for some other reason (e.g., it has a longer wake-up latency or fewer features fully enabled). Examples of power-saving operations a core 102 may perform are halting instruction execution, stopping the clock signal, lowering the voltage, and/or removing power from part of the core (e.g., functional units and/or local caches) or from the entire core.
In addition, the microprocessor 100 is configured to perform trans-core power reduction operations, which involve or affect multiple cores 102 of the microprocessor 100. For example, the shared cache memory 119 may be relatively large and consume a relatively large amount of power, so significant power savings may be achieved by removing the clock signal and/or power supplied to it. However, before the clock signal and/or power can be removed from the shared cache memory 119, all cores 102 that share it must agree, so that data coherency is maintained. Embodiments are contemplated in which the microprocessor 100 includes other shared power-related resources, such as shared clock and power sources. In one embodiment, the microprocessor 100 is coupled to a system chipset that includes a memory controller, peripheral controllers and/or a power management controller. In other embodiments, one or more of these controllers are integrated into the microprocessor 100. System power savings may be achieved by the microprocessor 100 notifying the controllers so that they take power-saving actions. For example, the microprocessor 100 may notify the controllers that its caches have been invalidated and turned off, so that the caches need not be snooped.
In addition to core C-states, the microprocessor 100 as a whole has a "package" C-state (also referred to as a package idle state or package sleep state). The package C-state corresponds to the lowest common core C-state of the cores 102, that is, the common C-state with the highest power consumption (see, e.g., field 246 of Figure 2 and block 318 of Figure 3). However, beyond the core-specific power reduction operations, a package C-state involves performing one or more trans-core power reduction operations of the microprocessor 100. Examples of trans-core power-saving operations associated with package C-states include turning off a phase-locked loop (PLL) that generates a clock signal; flushing the shared cache memory 119 and stopping its clock and/or power; and having the memory/peripheral controllers refrain from snooping the local and shared caches of the microprocessor 100. Other examples are changing the voltage, frequency and/or bus clock ratio, reducing the size of a cache memory such as the shared cache memory 119, and running the shared cache memory 119 at half speed.
In many cases, the operating system is effectively limited to executing instructions on individual cores 102; it can therefore put individual cores into a sleep state (e.g., a core C-state) but has no way to put the microprocessor 100 directly into a sleep state (e.g., a package C-state). Advantageously, embodiments are described in which the cores 102 of the microprocessor 100 cooperate, with the help of the control unit 104, to detect when all cores 102 have entered a core C-state and to prepare for the trans-core power-saving operations to take place.
Referring now to Figure 9, a flowchart illustrates operation of the microprocessor 100 to enter a low-power package C-state. The embodiment of Figure 9 is described using the example of a microprocessor 100 that is coupled to a chipset and executes MWAIT instructions. However, it should be understood that in other embodiments the operating system uses other power management instructions, and the main core 102 communicates with controllers integrated into the microprocessor 100 using a handshake protocol different from the one described.
The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 may encounter an MWAIT instruction and operate according to this description so that, collectively, the microprocessor 100 enters the optimized state. Flow begins at block 902.
In block 902, a core 102 encounters an MWAIT instruction that specifies a target C-state, denoted Cx in Figure 9, where x is a non-negative integer. Flow proceeds to block 904.
In block 904, the core 102 writes to its synchronization register 108 a synchronization request with the C bit 224 set and a C-state field 226 value of x (denoted SYNC Cx in Figure 9). In addition, the synchronization request specifies in its wake events field 204 that the core 102 is to be woken on all wake events. In response, the control unit 104 puts the core 102 to sleep. Preferably, before writing SYNC Cx, the core 102 writes back and invalidates its local cache memories. Flow proceeds to block 906.
In block 906, once all cores 102 have written a SYNC Cx, the control unit 104 wakes the core 102. As described above, the x values written by the other cores 102 may differ, and the control unit 104 posts the lowest common C-state value to the lowest common C-state field 246 of the status word 242 of the status register 106 (per block 318). Before block 906, while the core 102 is asleep, it may be woken by a wake event, such as an interrupt (e.g., blocks 305 and 306). More specifically, because there is no guarantee that the operating system will execute an MWAIT instruction on every core 102, this allows a wake event (e.g., an interrupt) to effectively cancel the MWAIT instruction on one of the cores 102 before the microprocessor 100 performs the power-saving operations associated with the package C-state. However, once the core 102 is woken in block 906, the core 102 (indeed, every core 102) is still executing microcode in response to the MWAIT instruction (of block 902), and it remains in microcode with interrupts disabled (e.g., the microcode does not allow itself to be interrupted) until block 924. In other words, although fewer than all of the cores 102 may have received an MWAIT instruction to enter a sleep state, and individual cores 102 may be asleep, the microprocessor 100 as a package does not indicate to the chipset that it is ready to enter a package sleep state. However, once all cores 102 have agreed to enter a package sleep state, as effectively indicated by the occurrence of the synchronization condition in block 906, the main core 102 is allowed to complete a package sleep state handshake protocol with the chipset (e.g., blocks 908, 909 and 921 below) without being interrupted and without any other core 102 being interrupted. Flow proceeds to decision block 907.
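How the control unit 104 might resolve a set of SYNC Cx requests into the lowest common C-state can be sketched deterministically in Python. This is a hypothetical model: `SyncRequest`, `ControlUnitModel`, `write_sync` and `wake_event` are illustrative names, only the C bit 224 and C-state field 226 are modeled, and treating an early wake event as simply withdrawing that core's request is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SyncRequest:
    """Hypothetical view of a SYNC Cx write, limited to the fields used here."""
    c_bit: bool   # C bit 224: the request carries a target C-state
    c_state: int  # C-state field 226: the x in SYNC Cx


class ControlUnitModel:
    """Hypothetical model of control unit 104 resolving SYNC Cx requests."""

    def __init__(self, num_cores: int):
        self.pending: List[Optional[SyncRequest]] = [None] * num_cores
        self.lowest_common_c_state: Optional[int] = None  # models field 246

    def write_sync(self, core_id: int, request: SyncRequest) -> List[int]:
        """Record one core's request and return the cores woken by this write."""
        self.pending[core_id] = request
        if any(r is None for r in self.pending):
            return []  # condition not met yet: the writer stays asleep
        if all(r.c_bit for r in self.pending):
            # The shallowest (numerically smallest) state every core accepts.
            self.lowest_common_c_state = min(r.c_state for r in self.pending)
        woken = list(range(len(self.pending)))  # block 906: wake all cores
        self.pending = [None] * len(self.pending)
        return woken

    def wake_event(self, core_id: int) -> None:
        """Assumed simplification: a wake event before the sync condition
        (e.g. an interrupt) cancels that core's pending request."""
        self.pending[core_id] = None
```

With the requests of the Figure 10 example (cores writing SYNC C4, SYNC C3 and SYNC C2), the third write wakes all three cores and the model posts 2 as the lowest common C-state.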
In decision block 907, the core 102 determines whether it is the main core 102 of the microprocessor 100. Preferably, a core 102 is the main core 102 if it determined at reset time that it is the BSP. If the core is the main core, flow proceeds to block 908; otherwise, flow proceeds to block 914.
In block 908, the main core 102 writes back and invalidates the shared cache memory 119 and then communicates with the chipset, which may take appropriate actions to reduce power consumption. For example, because the caches remain invalid while the microprocessor 100 is in the package C-state, the memory controller and/or peripheral controller can refrain from snooping the local and shared caches of the microprocessor 100. As another example, the chipset may send signals to the microprocessor 100 that cause it to take power-saving actions (e.g., asserting the x86-style STPCLK, SLP, DPSLP, NAP and VRDSLP signals, as described below). Preferably, the core 102 communicates power management information based on the value of the lowest common C-state field 246. In one embodiment, the core 102 issues an I/O read bus cycle to an I/O address that provides the chipset with the relevant power management information, e.g., the package C-state value. Flow proceeds to block 909.
In block 909, the main core 102 waits for the chipset to assert the STPCLK signal. Preferably, if STPCLK has not been asserted after a predetermined number of clock cycles, the control unit 104 detects this condition after terminating the synchronization request in progress, wakes all cores 102, and indicates the error in the error code field 248. Flow proceeds to block 914.
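The timeout check described for block 909 can be sketched as a cycle-by-cycle sampling loop. This is a hypothetical Python model: the sample list stands in for the STPCLK level seen on each clock, and `STPCLK_TIMEOUT_ERROR`, `WatchdogResult` and `stpclk_watchdog` are illustrative names; the document does not give the actual encoding of error code field 248.

```python
from typing import List, NamedTuple, Optional

STPCLK_TIMEOUT_ERROR = 1  # hypothetical value; the real encoding of field 248 is not given


class WatchdogResult(NamedTuple):
    asserted_at: Optional[int]  # clock cycle at which STPCLK was seen, if any
    error_code: Optional[int]   # value posted to error code field 248, if any
    woken: List[int]            # cores woken because of the timeout


def stpclk_watchdog(stpclk_samples: List[bool], timeout_cycles: int,
                    num_cores: int) -> WatchdogResult:
    """Sample STPCLK once per clock cycle for at most timeout_cycles cycles."""
    for cycle, asserted in enumerate(stpclk_samples[:timeout_cycles]):
        if asserted:
            # Normal case: the main core goes on to block 914.
            return WatchdogResult(cycle, None, [])
    # Timeout: abandon the pending synchronization request, wake every core
    # and report the failure.
    return WatchdogResult(None, STPCLK_TIMEOUT_ERROR, list(range(num_cores)))
```

For example, `stpclk_watchdog([False, False, True], 5, 3)` reports assertion at cycle 2, while ten cycles without STPCLK and a limit of 5 yield the timeout error with all three cores woken.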
In block 914, the core 102 writes SYNC 14. In one embodiment, the synchronization request specifies in its wake events field 204 that the core 102 is not to be woken by any wake event. In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 916.
In block 916, once all cores 102 have written SYNC 14, the control unit 104 wakes the core 102. Flow proceeds to decision block 919.
In decision block 919, the core 102 determines whether it is the main core 102 of the microprocessor 100. If so, flow proceeds to block 921; otherwise, flow proceeds to block 924.
In block 921, the main core 102 issues a stop grant cycle on the bus of the microprocessor 100 to notify the chipset that it may take trans-core (i.e., package-wide) power-saving actions involving the microprocessor 100 as a whole, such as refraining from snooping the caches of the microprocessor 100, removing the bus clock (e.g., the x86-style BCLK) to the microprocessor 100, and asserting other signals on the bus (e.g., the x86-style SLP, DPSLP, NAP and VRDSLP signals) that cause the microprocessor 100 to remove clocks and/or power from various portions of itself. Although the embodiments described herein involve a handshake protocol between the microprocessor 100 and the chipset consisting of an I/O read (in block 908), the assertion of STPCLK (in block 909) and the issuing of a stop grant cycle (in block 921), which have a history in x86 architecture systems, it should be understood that other embodiments are contemplated that involve systems of other instruction set architectures with different protocols, which can likewise save power, improve performance and/or reduce complexity. Flow proceeds to block 924.
In block 924, the core 102 writes a sleep request (that is, a synchronization request with the sleep bit 212 set and the S bit 222 clear) to its synchronization register 108. In addition, the synchronization request specifies in its wake events field 204 that the core 102 is to be woken only by the de-assertion of STPCLK. In response, the control unit 104 puts the core 102 to sleep. Flow ends at block 924.
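The complete Figure 9 sequence, from SYNC Cx through the stop grant cycle to the final sleep on ~STPCLK, can be sketched as a hypothetical Python model. Threads stand in for cores, a reusable `threading.Barrier` for the SYNC Cx and SYNC 14 conditions, and `threading.Event` objects for the STPCLK signal levels. `ChipsetModel`, `package_sleep_core` and `run_package_sleep` are illustrative names, the shared-cache flush of block 908 is not modeled, and the one-second timeouts exist only to keep the sketch from hanging; they are not the hardware timeout of block 909.

```python
import threading


class ChipsetModel:
    """Hypothetical chipset that asserts STPCLK after the package C-state I/O read."""

    def __init__(self):
        self.stpclk = threading.Event()             # STPCLK asserted
        self.stpclk_deasserted = threading.Event()  # the ~STPCLK wake event
        self.stop_granted = threading.Event()
        self.log = []                               # appended only by the main core

    def io_read_c_state(self, c_state):
        self.log.append(("io_read", c_state))       # block 908
        self.stpclk.set()                           # chipset responds with STPCLK

    def stop_grant(self):
        self.log.append(("stop_grant",))            # block 921
        self.stop_granted.set()


def package_sleep_core(core_id, target, main_core, requests, barrier, chipset):
    requests[core_id] = target                      # block 904: write SYNC Cx
    barrier.wait()                                  # block 906: all cores wrote SYNC Cx
    lowest = min(requests.values())                 # what field 246 would hold
    if core_id == main_core:                        # decision block 907
        chipset.io_read_c_state(lowest)             # block 908 (cache flush not modeled)
        chipset.stpclk.wait(timeout=1.0)            # block 909
    barrier.wait()                                  # blocks 914/916: SYNC 14
    if core_id == main_core:                        # decision block 919
        chipset.stop_grant()                        # block 921
    chipset.stpclk_deasserted.wait(timeout=1.0)     # block 924: sleep until ~STPCLK


def run_package_sleep(targets=(4, 3, 2), main_core=0):
    barrier = threading.Barrier(len(targets))
    chipset, requests = ChipsetModel(), {}
    cores = [threading.Thread(target=package_sleep_core,
                              args=(i, t, main_core, requests, barrier, chipset))
             for i, t in enumerate(targets)]
    for core in cores:
        core.start()
    chipset.stop_granted.wait(timeout=1.0)
    chipset.stpclk.clear()             # later, the chipset de-asserts STPCLK,
    chipset.stpclk_deasserted.set()    # which wakes every core
    for core in cores:
        core.join()
    return chipset.log
```

With the targets of the Figure 10 example, the chipset sees one I/O read carrying C-state 2 followed by one stop grant cycle, both issued by the main core alone.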
Referring now to Figure 10, a timing diagram illustrates an example of the operation of the microprocessor 100 according to the flowchart of Figure 9. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 encounters an MWAIT instruction specifying C-state 4 (MWAIT C4) (per block 902). Core 0 then writes SYNC C4 and goes to sleep (per block 904). Core 1 encounters an MWAIT instruction specifying C-state 3 (MWAIT C3) (per block 902) and then writes SYNC C3 and goes to sleep (per block 904). Core 2 encounters an MWAIT instruction specifying C-state 2 (MWAIT C2) (per block 902) and then writes SYNC C2 and goes to sleep (per block 904). As shown, the cores may write their SYNC Cx at different times. In fact, one or more of the cores may not encounter an MWAIT instruction before some other event, such as an interrupt, occurs.
When all cores have written a SYNC Cx, the control unit 104 wakes all of them simultaneously (per block 906). The main core then issues the I/O read bus cycle (per block 908) and waits for STPCLK to be asserted (per block 909). All cores write SYNC 14 and go to sleep (per block 914). Because only the main core flushes the shared cache memory 119, issues the I/O read bus cycle and waits for STPCLK to be asserted, the cores may write SYNC 14 at different times, as shown. In fact, the main core may write SYNC 14 hundreds of microseconds after the other cores.
When all cores have written SYNC 14, the control unit 104 wakes all of them simultaneously (per block 916). Only the main core issues the stop grant cycle (per block 921). All cores write a sleep request that waits for the de-assertion of STPCLK (~STPCLK) and go to sleep (per block 924). Because only the main core issues the stop grant cycle, the cores may write the sleep request at different times, as shown.
When the STPCLK signal is de-asserted, the control unit 104 wakes all cores.
As can be observed from Figure 10, core 1 and core 2 advantageously may sleep for a significant amount of time while core 0 performs the handshake protocol. Note, however, that the time required to wake the microprocessor 100 from a package sleep state is generally proportional to how deep that sleep state is (e.g., how much power is saved in it). Therefore, when the package sleep state is relatively long (or even when an individual core 102 sleeps for a relatively long time), it may be desirable to further reduce the number of wake-ups and/or the wake-up time associated with the handshake protocol. Figure 11 describes an embodiment in which a single core 102 handles the handshake protocol while the other cores 102 remain asleep. Furthermore, in the embodiment of Figure 11, additional power may be saved by reducing the number of cores 102 that are woken in response to a wake event.
1 is please referred to Fig.1, is that microprocessor 100 according to another embodiment of the present invention enters low-power encapsulation C- shapes The operational flowchart of state.The embodiment of Figure 11 using microprocessor 100 be coupled to example that MWAIT instruction in chipset executes into Row explanation.However, it should be appreciated that in other embodiments, operating system is instructed using other power managements, and last It synchronous core 102 and is integrated into microprocessor 100, and using the communication of the controller of Handshake Protocols different from description.
The embodiment of Figure 11 is similar to the embodiment of Fig. 9 in some respects.However, in existing operations system request microprocessor Device 100 enters low-down power rating and tolerates in the environment of delay associated therewith, the embodiment of Figure 11 be designed in Save the power of potential bigger.More specifically, the embodiment of Figure 11 is conducive to control to the power of core and if necessary, such as handles When interruption, an only core in core is waken up.Embodiment considers to support the behaviour of two patterns in Fig. 9 and Figure 11 in the microprocessor 100 Make.In addition, pattern is configurable, either in manufacture (for example, passing through fuse 114) and/or control via software or by Microprocessor 100 is automatically determined according to by the specific C- states specified by MWAIT instruction.Flow starts from square 1102.
At block 1102, a core 102 encounters an MWAIT instruction that specifies a target C-state, denoted Cx in Figure 11 (MWAIT Cx). Flow proceeds to block 1104.
At block 1104, the core 102 writes a synchronization request (denoted SYNC Cx in Figure 11) to its synchronization register 108, with the C bit 224 set and the C-state field 226 equal to x. The synchronization request also sets the selective wake-up (SEL WAKE) bit 214 and the PG bit 208. Additionally, the synchronization request indicates in its wake events field 204 that the core 102 is to be woken on all wake events except the assertion of STPCLK and the deassertion of STPCLK (~STPCLK). (Preferably, there are other wake events, such as an AP startup, on which the request specifies that the core 102 is not to be woken.) Consequently, the control unit 104 puts the core 102 to sleep, which includes power-gating the core 102 because the PG bit 208 is set. Before writing the synchronization request, the core 102 writes back and invalidates its local cache memory and saves its state (preferably to the private random access memory (PRAM) 116). When the core 102 is subsequently woken (for example, at block 1137, 1132 or 1106), it restores its state (for example, from the PRAM 116). As described above, particularly with respect to Figure 3, when the last core 102 writes a synchronization request with the selective wake-up bit 214 set, the control unit 104 automatically blocks all wake events of all cores 102 other than the last-writing core 102 (per block 326). Flow proceeds to block 1106.
At block 1106, when all cores 102 have written a SYNC Cx, the control unit 104 wakes the last-writing core 102. As described above, the control unit 104 keeps the S bits 222 of the other cores 102 set, even though it wakes the last-writing core 102 and clears that core's S bit. Before block 1106, a sleeping core 102 may be woken by a wake event such as an interrupt. However, when the core 102 is woken at block 1106, it is still executing microcode on behalf of the MWAIT instruction (block 1102), and it remains in microcode with interrupts disabled (that is, the microcode does not allow itself to be interrupted) until block 1124. In other words, although individual cores 102 may sleep after receiving an MWAIT instruction, as long as not all cores 102 have done so, the microprocessor 100 as a package does not indicate to the chipset that it is ready to enter a package sleep state. However, once all cores 102 have agreed to enter a package sleep state, as indicated by the occurrence of the synchronization condition at block 1106, the core 102 woken at block 1106 (the last-writing core 102, which caused the synchronization condition) is allowed to complete the package sleep state handshake protocol with the chipset (for example, blocks 1108, 1109 and 1121 below) without being interrupted and without any other core 102 being interrupted. Flow proceeds to block 1108.
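The selective wake-up behavior of blocks 1104 and 1106 can be illustrated with a small behavioral model. The sketch below is a hypothetical Python model, not the actual control unit 104 logic: it assumes the core set contains every core, treats every synchronization request as having the selective wake-up bit 214 and PG bit 208 set, and uses invented names (`ControlUnit`, `write_sync`, `wake`, `unblock_others`). It tracks only the S bits 222, the blocked wake events, and which cores are awake.

```python
from typing import List, Optional


class ControlUnit:
    """Hypothetical behavioral model of control unit 104 for SYNC Cx requests.

    Assumes every request has SEL WAKE bit 214 and PG bit 208 set and that
    the core set covers all cores.
    """

    def __init__(self, num_cores: int) -> None:
        self.s_bit: List[bool] = [False] * num_cores         # S bit 222: request pending
        self.wake_blocked: List[bool] = [False] * num_cores  # wake events blocked (block 326)
        self.awake: List[bool] = [True] * num_cores

    def write_sync(self, core: int) -> Optional[int]:
        """Core writes SYNC Cx and sleeps; return the core that is woken, if any."""
        self.s_bit[core] = True
        self.awake[core] = False
        if not all(self.s_bit):
            return None  # synchronization condition not yet met
        # Sync condition met: block wake events of every core except the last writer.
        for other in range(len(self.s_bit)):
            self.wake_blocked[other] = other != core
        # Wake only the last writer and clear its S bit; other S bits stay set.
        self.wake(core)
        return core

    def wake(self, core: int) -> None:
        """Un-power-gate and wake a core, clearing its pending request (S bit)."""
        self.s_bit[core] = False
        self.awake[core] = True

    def unblock_others(self) -> None:
        """Block 1134: the awake core unblocks the other cores' wake events."""
        self.wake_blocked = [False] * len(self.wake_blocked)
```

Calling `write_sync` for cores 0, 1 and 2 returns `2`. If core 2 then unblocks the wake events and core 1 is woken by a directed interrupt, core 2's next `write_sync` returns `None` (core 1's S bit is now clear) and core 1's returns `1`, mirroring the switch of the last-writing core in Figure 13.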
At block 1108, the core 102 writes back and invalidates the shared cache memory 119 and then communicates with the chipset, which may take appropriate action to reduce power consumption. Flow proceeds to block 1109.
At block 1109, the core 102 waits for the chipset to assert the STPCLK signal. Preferably, if STPCLK is not asserted within a predetermined number of clock cycles, the control unit 104 detects this condition, terminates the pending synchronization request, wakes all the cores 102, and indicates the error in the error code field 248. Flow proceeds to block 1121.
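The timeout check described for block 1109 can be sketched as a bounded polling loop. This is a hypothetical illustration: the callback `stpclk_asserted`, the cycle budget, and the error code value are assumptions, since the text specifies only "a predetermined number of clock cycles" and an error code field 248.

```python
from typing import Callable, Optional

# Hypothetical value recorded in error code field 248 on timeout.
STPCLK_TIMEOUT_ERROR = 0x1


def wait_for_stpclk(stpclk_asserted: Callable[[int], bool],
                    timeout_cycles: int) -> Optional[int]:
    """Sample STPCLK once per clock cycle.

    Returns None if STPCLK is asserted within `timeout_cycles` cycles;
    otherwise returns the error code that the control unit would record
    after terminating the pending sync request and waking all cores.
    """
    for cycle in range(timeout_cycles):
        if stpclk_asserted(cycle):
            return None
    return STPCLK_TIMEOUT_ERROR
```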
At block 1121, the core 102 issues a stop grant cycle on the bus to the chipset. Flow proceeds to block 1124.
At block 1124, the core 102 writes a sleep request to its synchronization register 108, with the sleep bit 212 set, the S bit 222 clear, and the PG bit 208 set. Additionally, the request specifies in its wake events field 204 that the core 102 is to be woken only on the deassertion of STPCLK. Consequently, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1132.
At block 1132, the control unit 104 detects the deassertion of STPCLK and wakes the core 102. Note that before waking the core 102, the control unit 104 also removes power gating from it. Advantageously, the core 102 is now the only core running, which gives it the opportunity to perform any actions that must be performed while no other core 102 is running. Flow proceeds to block 1134.
At block 1134, the core 102 writes to a register (not shown) of the control unit 104 to unblock the wake events specified in the wake events field 204 of the synchronization register 108 of each of the other cores 102. Flow proceeds to block 1136.
At block 1136, the core 102 handles any pending wake event specified for it. For example, in one embodiment, a system that includes the microprocessor 100 supports both directed interrupts (interrupts directed to a particular core of the microprocessor 100) and non-directed interrupts (interrupts that may be handled by whichever core 102 of the microprocessor 100 the microprocessor 100 selects). A non-directed interrupt is commonly referred to as a "low priority interrupt." In one embodiment, the microprocessor 100 preferably directs a non-directed interrupt to the single core 102 that was woken by the deassertion of STPCLK at block 1132, since that core is already awake and can handle the interrupt, in the expectation that the other cores 102 have no pending wake events and can therefore remain asleep and power-gated. Flow returns to block 1104.
When the wake events are unblocked at block 1134, if no wake event specified by a core 102 other than the core 102 woken at block 1132 is pending, those cores 102 advantageously remain asleep and power-gated per block 1104. However, if a wake event specified by one of those cores 102 is pending when the wake events are unblocked at block 1134, that core is un-power-gated and woken by the control unit 104. In that case, a different flow begins at block 1137 of Figure 11.
At block 1137, after the wake events are unblocked at block 1134, another core 102 (that is, a core 102 other than the one that unblocked the wake events at block 1134) is woken. That core 102 handles any pending wake event directed to it, for example by servicing an interrupt. Flow proceeds from block 1137 to block 1104.
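The routing policy of blocks 1136 and 1137 can be summarized as follows: a non-directed (low-priority) interrupt goes to the core that is already awake, while a directed interrupt wakes its target. The function below is a hypothetical sketch of that policy; the name `route_interrupt` and the use of a set of sleeping cores are illustrative assumptions.

```python
from typing import Optional, Set


def route_interrupt(target: Optional[int], awake_core: int,
                    sleeping: Set[int]) -> int:
    """Return the core that services an interrupt.

    target     -- destination core of a directed interrupt, or None for a
                  non-directed (low-priority) interrupt
    awake_core -- the single core woken at block 1132
    sleeping   -- cores still asleep and power-gated; updated in place
    """
    if target is None:
        # Non-directed: the already-awake core handles it, so no other core wakes.
        return awake_core
    # Directed: the target core is un-power-gated and woken (block 1137).
    sleeping.discard(target)
    return target
```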
Referring to Figure 12, a timing diagram illustrating an example of operation of the microprocessor 100 according to the flowchart of Figure 11 is shown. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 encounters an MWAIT instruction that specifies C-state 7 (MWAIT C7) (per block 1102). In this example, C-state 7 permits power-gating. Core 0 then writes a SYNC C7 with the selective wake-up bit 214 set (denoted "selective wake" in Figure 12) and the PG bit 208 set, and is put to sleep and power-gated (per block 1104). Core 1 encounters an MWAIT instruction that specifies C-state 7 (per block 1102), then writes a SYNC C7 with the selective wake-up bit 214 and the PG bit 208 set, and is put to sleep and power-gated (per block 1104). Core 2 encounters an MWAIT instruction that specifies C-state 7 (per block 1102), then writes a SYNC C7 with the selective wake-up bit 214 and the PG bit 208 set, and is put to sleep and power-gated (per block 1104). (However, in the preferred embodiment described with respect to block 314, the last-writing core may not be power-gated.) As shown, the cores may write their SYNC C7 at different times.
When the last-writing core, which in the example of Figure 12 is core 2, writes its SYNC C7 with the selective wake-up bit 214 set, the control unit 104 blocks off the wake events of all cores other than the last-writing core (per block 326). Additionally, the control unit 104 wakes only the last-writing core (per block 1106), which saves power because the other cores remain asleep and power-gated for an extended period while core 2 performs the handshake protocol with the chipset. Core 2 then issues an I/O read bus cycle (per block 1108) and waits for STPCLK to be asserted (per block 1109). In response to STPCLK, core 2 issues a stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set that waits for the deassertion of STPCLK, and is put to sleep and power-gated (per block 1124). The cores may remain asleep and power-gated for a relatively long time.
When STPCLK can not be established, control unit 104 only wakes up core 2 (each square 1132).In the example in figure 12, The chipset can not establish STPCLK to respond a non-reception to interruption, be forwarded to microprocessor 100.Microprocessor 100 It indicates non-to interrupting to core 2, power is saved due to other cores keep sleep state and limitation power supply.Core releases other cores The wake events of (each square 1134) simultaneously service non-to interruption (each square 1136).Core 2, which then re-writes one, has choosing Selecting property wakes up the SYNC C7 that position 214 is setting (set) and the positions PG 208 are setting (set), and enters sleep state and limitation electricity Source (each square 1104).
When core 2 writes the SYNC C7 with the selective wake-up bit 214 and the PG bit 208 set, the control unit 104 blocks off the wake events of all cores other than core 2, the last-writing core (per block 326), because the synchronization requests of the other cores are still pending; that is, their S bits 222 were not cleared when core 2 was woken. Additionally, the control unit 104 wakes only core 2 (per block 1106). Core 2 then issues an I/O read bus cycle (per block 1108) and waits for STPCLK to be asserted (per block 1109). In response to STPCLK, core 2 issues a stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set that waits for the deassertion of STPCLK, and is put to sleep and power-gated (per block 1124).
When STPCLK is deasserted, the control unit 104 wakes only core 2 (per block 1132). In the example of Figure 12, STPCLK is deasserted because of another non-directed interrupt. The microprocessor 100 therefore directs the interrupt to core 2, which saves power. Core 2 again unblocks the wake events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 2 then again writes a SYNC C7 with the selective wake-up bit 214 and the PG bit 208 set, and is put to sleep and power-gated (per block 1104).
This cycle may continue for a considerable time, as long as only non-directed interrupts are generated. Figure 13 shows an example in which an interrupt directed to a core other than the last-writing core is handled.
Comparing Figure 10 with Figure 12 shows that, advantageously, in the embodiment of Figure 12, once the cores 102 have initially entered the sleep state (after writing SYNC C7 in the example of Figure 12), only one core 102 is woken again to perform the handshake protocol with the chipset, while the other cores 102 remain asleep. This can be a significant advantage if the cores 102 remain asleep for a considerably long time. The power savings may be substantial, particularly where the operating system recognizes that the system workload is small enough to be handled by a single core 102.
Furthermore, advantageously, as long as no wake events are directed to the other cores 102, only one core 102 is woken (to service non-directed events, such as low priority interrupts). Again, this may be a significant advantage if the cores 102 remain asleep for a considerably long time. The power savings can be significant particularly when the system has essentially no workload other than relatively infrequent non-directed interrupts, such as USB interrupts. Further, even when a wake event is directed to another core 102 (for example, the operating system directs an interrupt, such as an operating system timer interrupt, to a single core 102), the embodiments can advantageously switch dynamically which single core 102 performs the package sleep state protocol and services non-directed wake events, as shown in Figure 13, so as to retain the benefit of waking only a single core 102.
Referring to Figure 13, a timing diagram illustrating another example of operation of the microprocessor 100 according to the flowchart of Figure 11 is shown. The example of Figure 13 is similar in many respects to that of Figure 12. However, at the first deassertion of STPCLK, the interrupt is directed to core 1 (rather than being a non-directed interrupt, as in the example of Figure 12). Consequently, the control unit 104 wakes core 2 (per block 1132), and core 1 is then woken after core 2 unblocks the wake events (per block 1134). Core 2 then writes another SYNC C7 with the selective wake-up bit 214 and the PG bit 208 set, and is put to sleep and power-gated (per block 1104).
Core 1 services the directed interrupt (per block 1137). Core 1 then writes a SYNC C7 with the selective wake-up bit 214 and the PG bit 208 set, and is put to sleep and power-gated (per block 1104). In this example, core 2 writes its SYNC C7 before core 1 writes its SYNC C7. Therefore, although core 0 still has its S bit 222 set from when it initially wrote its SYNC C7, the S bit 222 of core 1 was cleared when core 1 was woken. Consequently, when core 2 writes its SYNC C7 after unblocking the wake events, it is not the last core to write a SYNC C7 request; instead, core 1 becomes the last core to write a SYNC C7 request.
When core 1 writes the SYNC C7 with the selective wake-up bit 214 and the PG bit 208 set, the control unit 104 blocks off the wake events of all cores other than core 1, the last-writing core (per block 326), because the synchronization request of core 0 is still pending (that is, it was not cleared when core 1 and core 2 were woken) and core 2 (in this example) has already written its SYNC C7 request. Additionally, the control unit 104 wakes only core 1 (per block 1106). Core 1 then issues an I/O read bus cycle (per block 1108) and waits for STPCLK to be asserted (per block 1109). In response to STPCLK, core 1 issues a stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set that waits for the deassertion of STPCLK, and is put to sleep and power-gated (per block 1124).
When STPCLK is deasserted, the control unit 104 wakes only core 1 (per block 1132). In this example, STPCLK is deasserted because of a non-directed interrupt; the microprocessor 100 therefore directs the non-directed interrupt to core 1, which saves power. This period, in which core 1 handles non-directed interrupts, may last a considerable time, as long as only non-directed interrupts are generated. In this manner, the microprocessor 100 advantageously saves power by directing non-directed interrupts to a single core 102, and, as the example of Figure 13 shows, that role can switch to a different core. Core 1 again unblocks the wake events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 1 then writes another SYNC C7 with the selective wake-up bit 214 and the PG bit 208 set, and is put to sleep and power-gated (per block 1104).
Although embodiments have been described in which the power management instruction is an x86 MWAIT instruction, other embodiments are contemplated in which synchronization requests are used to carry out other power management instructions. For example, the microprocessor 100 may perform similar operations in response to a read from one of a set of predetermined I/O port addresses associated with different C-states. As another example, the power management instruction may belong to an instruction set architecture other than x86.
Dynamic reconfiguration of the multi-core processor
Each core 102 of the microprocessor 100 generates configuration-related values based on the configuration of the cores 102 of the microprocessor 100. Preferably, the microcode of each core 102 generates, stores and uses the configuration-related values. Embodiments are described in which the generation of the configuration-related values can advantageously be dynamic, as described below. Examples of configuration-related values include, but are not limited to, the following.
Each core 102 generates a global core number, described above with respect to Figure 2. Unlike the local core number 256 of a core 102, which is relative only to the cores 102 on the die 406 on which the core 102 resides, the global core number is relative to all the cores 102 of the microprocessor 100. In one embodiment, a core 102 generates its global core number as the product of its die number 258 and the number of cores 102 per die, plus its local core number 256, as follows:
global core number = (die number × number of cores per die) + local core number
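As a worked example of the formula, in the two-die, eight-core configuration of Figure 4 with four cores per die, the core with local core number 2 on die 1 has global core number 1 × 4 + 2 = 6. A minimal Python sketch follows; the function name is illustrative only.

```python
def global_core_number(die_number: int, cores_per_die: int,
                       local_core_number: int) -> int:
    """Global core number = (die number x cores per die) + local core number."""
    if not 0 <= local_core_number < cores_per_die:
        raise ValueError("local core number must be less than cores per die")
    return die_number * cores_per_die + local_core_number


# Two dies with four cores each (Figure 4) number their cores 0 through 7.
numbers = [global_core_number(d, 4, l) for d in range(2) for l in range(4)]
```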
Each core 102 also generates a virtual core number. The virtual core number is the global core number minus the number of disabled cores 102 whose global core number is less than the global core number of the instant core 102. Thus, when all the cores 102 of the microprocessor 100 are enabled, the global core number and the virtual core number are the same. However, if one or more cores 102 are disabled, for example because they are defective, the virtual core number of a core 102 may differ from its global core number. In one embodiment, each core 102 populates the APIC ID field of its corresponding APIC ID register with its virtual core number. However, in other embodiments (for example, those of Figures 22 and 23), this is not the case. Additionally, in one embodiment, the operating system may update the APIC ID in the APIC ID register.
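The virtual core number can be computed by counting the disabled cores with lower global core numbers. The sketch below assumes the enable state is available as a list of booleans indexed by global core number (for example, decoded from the enable bits 254); the function name is hypothetical.

```python
from typing import Sequence


def virtual_core_number(global_num: int, enabled: Sequence[bool]) -> int:
    """Global core number minus the number of disabled cores numbered below it."""
    disabled_below = sum(1 for i in range(global_num) if not enabled[i])
    return global_num - disabled_below


# Core 1 of a four-core part is disabled (for example, because it is defective).
enabled = [True, False, True, True]
vnums = [virtual_core_number(g, enabled) for g in range(4) if enabled[g]]
# The enabled cores (global numbers 0, 2, 3) get contiguous virtual numbers 0, 1, 2.
```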
Each core 102 also generates a BSP flag that indicates whether the core 102 is the BSP. In one embodiment, generally (for example, when the "all cores BSP" feature of Figure 23 is disabled), one core 102 designates itself as the bootstrap processor (BSP) and each of the other cores 102 designates itself as an application processor (AP). After reset, the AP cores 102 initialize themselves and then go to sleep, waiting for the BSP to tell them to begin fetching and executing instructions. In contrast, after initializing itself, the BSP core 102 immediately begins fetching and executing system firmware instructions, for example BIOS startup code, which initializes the system (for example, verifies that system memory and peripheral devices are working properly and initializes and/or configures them) and bootstraps the operating system, that is, loads the operating system (for example, from disk) and transfers control to it. Before bootstrapping the operating system, the BSP determines the system configuration (for example, the number of cores 102 or logical processors in the system) and saves it in memory so that the operating system can read it after it boots. After the operating system has been bootstrapped, it instructs the AP cores 102 to begin fetching and executing operating system instructions. In one embodiment, generally (for example, when the "modify BSP" and "all cores BSP" features of Figures 22 and 23, respectively, are disabled), a core 102 designates itself as the BSP if its virtual core number is 0, and every other core 102 designates itself as an AP. Preferably, a core 102 populates the BSP flag bit in the APIC base address register of its corresponding APIC with its BSP flag configuration-related value.
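Because the core with virtual core number 0 is the enabled core with the lowest global core number, the default BSP/AP designation can be sketched as follows. This assumes the "modify BSP" and "all cores BSP" features are disabled and that at least one core is enabled; `assign_roles` is a hypothetical name.

```python
from typing import Dict, Sequence


def assign_roles(enabled: Sequence[bool]) -> Dict[int, str]:
    """Map each enabled core's global core number to "BSP" or "AP"."""
    enabled_cores = [g for g, on in enumerate(enabled) if on]
    if not enabled_cores:
        raise ValueError("at least one core must be enabled")
    bsp = enabled_cores[0]  # the core whose virtual core number is 0
    return {g: ("BSP" if g == bsp else "AP") for g in enabled_cores}
```

For example, if core 0 is disabled, core 1 has virtual core number 0 and becomes the BSP.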
In one embodiment, as described above, the BSP is the master core 102 that performs the package sleep state handshake protocol of Figure 9 at blocks 907 and 919.
Each core 102 also generates the APIC base value with which it populates its APIC base register. The APIC base address is generated based on the APIC ID of the core 102. In one embodiment, the operating system may update the APIC base address in the APIC base address register.
Each core 102 also generates a die master indicator that indicates whether the core 102 is the master core 102 of the die 406 that includes the core 102.
Each core 102 also generates a chip master indicator that indicates whether the core 102 is the master core of the chip that includes the core 102, assuming the microprocessor 100 is configured with chips, as described in detail above.
Each core 102 computes the configuration-related values and uses them so that the system that includes the microprocessor 100 operates correctly. For example, the system directs interrupt requests to the cores 102 based on their associated APIC IDs, and the APIC ID determines which interrupt requests a core 102 responds to. More specifically, each interrupt request includes a destination identifier, and a core 102 responds to an interrupt request only if the destination identifier matches the APIC ID of the core 102 (or if the destination identifier is a special value indicating that the request is for all cores 102). As another example, each core 102 must know whether it is the BSP, so that the BSP executes the initial BIOS code and bootstraps the operating system and, in one embodiment, performs the package sleep state handshake protocol described with respect to Figure 9. Embodiments are described below (with reference to Figures 22 and 23) in which the BSP flag and the APIC ID can be changed from their normal values for particular purposes, such as testing and/or debugging.
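The destination check can be sketched as follows. The broadcast value `0xFF` is an assumption borrowed from the xAPIC physical destination mode, where that ID addresses all local APICs; the text says only that a "particular value" indicates all cores. The function names are hypothetical.

```python
from typing import List

# Assumed "all cores" destination value (xAPIC physical-mode broadcast ID).
BROADCAST_ID = 0xFF


def accepts_interrupt(dest_id: int, apic_id: int) -> bool:
    """A core responds only if the destination matches its APIC ID or is broadcast."""
    return dest_id == apic_id or dest_id == BROADCAST_ID


def responders(dest_id: int, apic_ids: List[int]) -> List[int]:
    """Return the APIC IDs of the cores that respond to an interrupt request."""
    return [a for a in apic_ids if accepts_interrupt(dest_id, a)]
```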
Referring to Figure 14, a flowchart illustrating dynamic reconfiguration of the microprocessor 100 is shown. Figure 14 is described with reference to the multi-die microprocessor 100 of Figure 4, which includes two dies 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described may be employed in microprocessors 100 with different configurations, that is, with more than two dies or a single die, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates as described to collectively and dynamically reconfigure the microprocessor 100. Flow begins at block 1402.
At block 1402, the microprocessor 100 is reset, and the hardware of the microprocessor 100 populates the configuration register 112 of each core 102 with the appropriate values, based on the number of available cores 102 and the number of the die on which the core resides. In one embodiment, the local core number 256 and the die number 258 are hardwired. As described above, the hardware may determine whether a core 102 is enabled or disabled from the state of a fuse 114, blown or not blown. Flow proceeds to block 1404.
At block 1404, the core 102 reads the configuration word 252 from the configuration register 112. The core 102 then generates its configuration-related values based on the value of the configuration word 252 populated at block 1402. In a multi-die microprocessor 100 configuration, the configuration-related values generated at block 1404 do not take into account the cores 102 of the other die 406. However, the configuration-related values generated at blocks 1414 and 1424 (and at block 1524 of Figure 15) do take into account the cores 102 of the other die 406, as described below. Flow proceeds to block 1406.
At block 1406, the core 102 causes the values of the enable bits 254 of the local cores 102 in the local configuration register 112 to be propagated to the corresponding enable bits 254 of the configuration register 112 of the remote die 406. For example, referring to the configuration of Figure 4, a core 102 on die A 406A causes the enable bits 254 associated with cores A, B, C and D (the local cores) in the configuration register 112 of die A 406A (the local die) to be propagated to the enable bits 254 associated with cores A, B, C and D in the configuration register 112 of die B 406B (the remote die). Conversely, a core 102 on die B 406B causes the enable bits 254 associated with cores E, F, G and H (the local cores) in the configuration register 112 of die B 406B (the local die) to be propagated to the enable bits 254 associated with cores E, F, G and H in the configuration register 112 of die A 406A (the remote die). In one embodiment, the core 102 propagates the values to the other die 406 by writing to the local configuration register 112. Preferably, the write by the core 102 to the local configuration register 112 does not change the local configuration register, but causes the local control unit 104 to propagate the local enable bit 254 values to the remote die 406. Flow proceeds to block 1408.
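The exchange of enable bits between the two dies of Figure 4 can be modeled with one 8-bit enable word per die, where bits 0 to 3 stand for cores A to D on die A and bits 4 to 7 for cores E to H on die B. The bit layout and the function names are assumptions for illustration, not the actual format of configuration word 252.

```python
from typing import List

CORES_PER_DIE = 4  # Figure 4: two dies, four cores each


def local_mask(die: int) -> int:
    """Bits of the (assumed) enable word owned by the given die."""
    return ((1 << CORES_PER_DIE) - 1) << (die * CORES_PER_DIE)


def propagate_enable_bits(regs: List[int]) -> List[int]:
    """regs[d] is the enable word in die d's configuration register.

    Each die's local enable bits are copied into the same bit positions of
    every remote die's register; each die's own local bits are unchanged.
    """
    result = list(regs)
    for src, word in enumerate(regs):
        mask = local_mask(src)
        for dst in range(len(regs)):
            if dst != src:
                result[dst] = (result[dst] & ~mask) | (word & mask)
    return result


# Die A: cores A, B and D enabled; die B: cores F, G and H enabled.
# Before propagation, each die's register reflects only its own cores.
after = propagate_enable_bits([0b0000_1011, 0b1110_0000])
# Both registers now hold 0b1110_1011.
```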
At block 1408, the core 102 writes a synchronization request for synchronization condition 8 (denoted SYNC 8 in Figure 14) to its synchronization register 108. Consequently, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1412.
At block 1412, the control unit 104 wakes the core 102 when all available cores 102 in the core set specified by the core set field 228 have written a SYNC 8. Notably, in a multi-die 406 microprocessor 100 configuration, the synchronization condition may be a multi-die synchronization condition. That is, the control unit 104 waits to wake the core 102 (or to interrupt it, in the case where the core 102 did not set the sleep bit 212 because it chose not to sleep) until all cores 102 in the core set field 228 (which may include cores 102 on the other die 406) have written their synchronization requests. Flow proceeds to block 1414.
At block 1414, the core 102 reads the configuration register 112 again and generates its configuration-related values anew, based on a configuration word 252 that now contains the correct values of the enable bits 254 propagated from the remote die. Flow proceeds to decision block 1416.
At decision block 1416, the core 102 determines whether it should disable itself. In one embodiment, a fuse 114 that the microcode reads during its reset processing (before decision block 1416) has been blown to indicate that the core 102 should disable itself, so the core 102 determines that it should disable itself. The fuse 114 may be blown during or after manufacture of the microprocessor 100. In another embodiment, updated fuse 114 values may be scanned into a holding register, as described above, and the scanned values indicate that the core 102 should be disabled. Figure 15 describes another embodiment, in which a core 102 determines by different means that it should be disabled. If the core 102 determines that it should be disabled, flow proceeds to block 1417; otherwise, flow proceeds to block 1418.
At block 1417, the core 102 writes the disable core bit 236 to remove itself from the list of available cores 102, for example by clearing its corresponding enable bit 254 in the configuration word 252 of the configuration register 112. Thereafter, the core 102 prevents itself from executing any more instructions, preferably by setting one or more bits that turn off its clock signals, and removes its power. Flow ends at block 1417.
In square 1418, the synchronization request of a synchronous situation 9 (being denoted as SYNC 9 in fig. 14) is written to same in core 102 It walks in buffer 108.Therefore, control unit 104 enables core 102 enter sleep state.Flow proceeds to square 1422.
At block 1422, the control unit 104 wakes the core 102 when all enabled cores 102 have written a SYNC 9. Again, in a multi-die 406 microprocessor 100 configuration, the synchronization condition may be a multi-die synchronization condition, based on the updated values in the configuration register 112. Furthermore, when determining whether a synchronization condition has occurred, the control unit 104 excludes from consideration any core 102 that disabled itself at block 1417. More specifically, consider a case in which all the other cores 102 (other than the core 102 that disables itself) write a SYNC 9 before the self-disabling core 102 writes its synchronization register 108 at block 1417. When the self-disabling core 102 then writes its synchronization register 108 with the disable core bit set at block 1417, the control unit 104 detects that the synchronization condition has occurred (at block 316). In making this determination, the control unit 104 no longer considers the disabled core 102, because the enable bit 254 of the disabled core 102 is clear. That is, because all enabled cores 102, which exclude the disabled core 102, have written a SYNC 9, the control unit 104 determines that the synchronization condition has occurred, regardless of whether the disabled core 102 has written a SYNC 9. Flow proceeds to block 1424.
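The rule that a self-disabled core is ignored when the SYNC 9 condition is evaluated can be sketched as below. Per-core boolean flags stand in for the enable bits 254 and the pending synchronization requests; the names `disable_core` and `sync_condition` are hypothetical.

```python
from typing import Iterable, List


def disable_core(enabled: List[bool], core: int) -> None:
    """Block 1417: the core clears its own enable bit before stopping."""
    enabled[core] = False


def sync_condition(core_set: Iterable[int], enabled: List[bool],
                   written: List[bool]) -> bool:
    """True once every enabled core in the core set has written its request.

    Cores whose enable bit is clear are excluded, so a disabled core can
    never hold up the synchronization of the remaining cores.
    """
    return all(written[c] for c in core_set if enabled[c])


enabled = [True] * 8
written = [True] * 8
written[3] = False                                   # core 3 never writes SYNC 9
before = sync_condition(range(8), enabled, written)  # False: core 3 still counted
disable_core(enabled, 3)
after = sync_condition(range(8), enabled, written)   # True: core 3 now excluded
```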
At block 1424, if a core 102 was disabled by the operation of another core 102 at block 1417, the core 102 reads the configuration register 112 again, and the new value of the configuration word 252 reflects the disabled core 102. The core 102 then generates its configuration-related values again from the new value of the configuration word 252, in a manner similar to block 1414. The presence of a disabled core 102 may cause some configuration-related values to differ from the new values generated at block 1414. For example, as described above, the virtual core number, APIC ID, BSP flag, BSP base address, die master indicator and chip master indicator may change because of the presence of the disabled core 102. In one embodiment, after the configuration-related values are generated, one of the cores 102 (for example, the BSP) writes some of the global configuration-related values that pertain to all cores 102 of the microprocessor 100 to the uncore PRAM 116, from which all cores 102 can subsequently read them. For example, in one embodiment, a core 102 reads the global configuration-related values to execute an architectural instruction (for example, an x86 CPUID instruction) that requests global information about the microprocessor 100, such as the number of cores 102 of the microprocessor 100. Flow proceeds to block 1426.
In block 1426, the core 102 comes out of reset and begins fetching architectural instructions. Flow ends at block 1426.
Please refer to Fig. 15, a flowchart illustrating dynamic reconfiguration of the microprocessor 100 according to another embodiment. Fig. 15 is described with reference to the multi-die microprocessor 100 of Fig. 4, which includes two dies 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described can be used with microprocessors 100 of other configurations, i.e., with more than two dies or a single die, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates as described to collectively reconfigure the microprocessor 100 dynamically. More specifically, Fig. 15 describes the operation of a core 102 that encounters a core disable instruction, whose flow begins at block 1502, and the operation of the other cores 102, whose flow begins at block 1532.
In block 1502, one of the cores 102 encounters an instruction directing the core 102 to disable itself. In one embodiment, the instruction is an x86 WRMSR instruction. In response, the core 102 sends a reconfiguration message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 traps to microcode in response to the instruction (to disable itself, in block 1502) or in response to the interrupt (in block 1532) and remains in microcode until block 1526, with interrupts disabled during this period (i.e., the microcode does not allow itself to be interrupted). Flow proceeds from block 1502 to block 1504.
In block 1532, one of the other cores 102 (i.e., a core other than the core 102 that encountered the disable instruction in block 1502) is interrupted by the inter-core interrupt sent in block 1502 and receives the reconfiguration message. As described above, although the flow at block 1532 is described from the perspective of a single core 102, each of the other cores 102 (i.e., those not in block 1502) is interrupted and receives the message in block 1532 and performs the steps in blocks 1504 through 1526. Flow proceeds from block 1532 to block 1504.
In block 1504, the core 102 writes a synchronization request for sync condition 10 (denoted SYNC 10 in Fig. 15) to the sync register 108. In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1506.
In block 1506, when all enabled cores 102 have written a SYNC 10, the control unit 104 wakes up the core 102. Notably, in a multi-die 406 configuration of the microprocessor 100, the synchronization condition may be a multi-die synchronization condition. That is, the control unit 104 waits to wake up the core 102 (or to interrupt it, if the core 102 did not choose to sleep) until every core 102 that is both specified in the core set field 228 (which may include cores 102 on another die 406) and enabled (as indicated by its enable bit) has written its synchronization request. Flow proceeds to decision block 1508.
In decision block 1508, the core 102 determines whether it is the core 102 that was instructed in block 1502 to disable itself. If so, flow proceeds to block 1517; otherwise, flow proceeds to block 1518.
In block 1517, the core 102 writes the disable core bit 236 to remove itself from the list of enabled cores 102, e.g., by clearing its corresponding enable bit 254 in the configuration word 252 of the configuration register 112. Thereafter, the core 102 stops executing any further instructions, preferably by setting one or more bits to turn off its clock signal and remove its power. Flow ends at block 1517.
In block 1518, the core 102 writes a synchronization request for sync condition 11 (denoted SYNC 11 in Fig. 15) to the sync register 108. In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1522.
In block 1522, when all enabled cores 102 have written a SYNC 11, the control unit 104 wakes up the core 102. In a multi-die 406 configuration of the microprocessor 100, the synchronization condition may be a multi-die synchronization condition, based on the updated value in the configuration register 112. Furthermore, when the control unit 104 determines whether a synchronization condition has occurred, it excludes from consideration the core 102 that disabled itself in block 1517. More specifically, in one case all the other cores 102 (i.e., all except the self-disabling core 102) write a SYNC 11 before the self-disabling core 102 writes the sync register 108 in block 1517. Because the control unit 104 no longer considers the disabled core 102 once its enable bit 254 has been cleared, the control unit 104 detects the synchronization condition (in block 316) when the self-disabling core 102 writes the sync register 108 in block 1517 (see Fig. 16). That is, since all enabled cores 102 have written a SYNC 11, the control unit 104 determines that the synchronization condition has occurred regardless of whether the disabled core 102 has written a SYNC 11. Flow proceeds to block 1524.
In block 1524, the core 102 reads the configuration register 112, whose configuration word 252 reflects the core 102 disabled in block 1517. The core 102 then generates its configuration-related values from the new value of the configuration word 252. Preferably, the disable instruction of block 1502 is executed by system firmware (e.g., BIOS setup), and after the core 102 is disabled the system firmware performs a system reboot, e.g., after block 1526. During the reboot, the microprocessor 100 may generate its configuration-related values differently from the way they were previously generated in block 1524. For example, the BSP during the reboot may be a different core 102 than before the configuration-related values were generated. As another example, the system configuration information (e.g., the number of cores 102 and logical processors in the system) that the BSP determines and stores to memory before booting the operating system, so that the operating system can read it, may differ. As another example, the APIC IDs of the cores 102 that remain in use may differ from those before the configuration-related values were generated, in which case the operating system will direct interrupt requests, and the cores 102 will respond to them, differently than under the previous configuration-related values. As yet another example, the master core 102 that performs the package sleep state handshake protocol of Fig. 9 in blocks 907 and 919 may be a different core 102 than under the previous configuration-related values. Flow proceeds to block 1526.
In block 1526, the core 102 resumes the task it was executing before it was interrupted. Flow ends at block 1526.
The dynamic reconfiguration of the microprocessor 100 described herein can be used in various applications. For example, dynamic reconfiguration can be used for testing and/or simulation during development of the microprocessor 100, and/or for field testing. In addition, a user may want to know the performance and/or total power consumption of the system when a particular application runs on only a subset of the cores 102. In one embodiment, when a core 102 is disabled, its clock can be stopped and/or its power removed so that it consumes essentially no power. Furthermore, in a high-reliability system, each core 102 may periodically check the other cores 102 to determine whether a particular core 102 has failed; if so, the non-failing cores can disable the failed core 102, and the remaining cores 102 perform the dynamic reconfiguration described above. In such an embodiment, the control word 202 may include an additional field that lets the writing core 102 specify which core 102 is to be disabled, and the operation described in Fig. 15 is modified so that, in block 1517, a core can disable a core 102 other than itself.
Please refer to Fig. 16, a timing diagram illustrating an example of the operation of the microprocessor 100 according to the flowchart of Fig. 15. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1, and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102 and may be a single-die or multi-die microprocessor 100. In the timing diagram, time advances downward.
Core 1 encounters an instruction to disable itself and, in response, sends a reconfiguration message and interrupts core 0 and core 2 (per block 1502). Core 1 then writes a SYNC 10 and goes to sleep (per block 1504).
Core 0 and core 2 are each eventually interrupted from their current tasks and read the message (per block 1532). In response, each of core 0 and core 2 writes a SYNC 10 and goes to sleep (per block 1504). As shown, the time at which each core writes its SYNC 10 may differ, for example because of the latency of the instruction that is executing when the interrupt is asserted.
When all cores 102 have written a SYNC 10, the control unit 104 wakes up all the cores simultaneously (per block 1506). Core 0 and core 2 then determine that they are not disabling themselves (per decision block 1508), write a SYNC 11, and go to sleep (per block 1518). Core 1, however, determines that it is disabling itself, so it writes the disable core bit 236 (per block 1517). In this example, core 1 writes the disable core bit 236 after core 0 and core 2 have written their respective SYNC 11, as shown. Because the control unit 104 determines that the S bit 222 is set for every core 102 whose enable bit 254 is set, the control unit 104 detects that the synchronization condition has occurred. That is, even though the S bit 222 of core 1 is not set, core 1's enable bit 254 is cleared when core 1 writes its sync register 108 in block 1517.
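The SYNC 11 phase of this example can be replayed with a toy model of the control unit. This is a minimal Python sketch under stated assumptions: the `ControlUnit` class, its method names, and the one-bit-per-core masks are hypothetical, and the "wake" is only a recorded flag, not real power management.

```python
class ControlUnit:
    """Toy model of the control unit 104 for one synchronization phase.

    Hypothetical assumptions: bit i of each mask corresponds to core i,
    and waking is modeled by setting the `woken` flag.
    """

    def __init__(self, num_cores: int):
        self.enable_bits = (1 << num_cores) - 1  # all cores enabled
        self.s_bits = 0                          # no sync requests yet
        self.woken = False

    def _check(self) -> None:
        required = self.enable_bits
        if not self.woken and (self.s_bits & required) == required:
            self.woken = True  # wake all sleeping cores simultaneously
            self.s_bits = 0    # clear requests for the next phase

    def write_sync(self, core: int) -> None:
        """The core writes a sleeping sync request (e.g. SYNC 11) and sleeps."""
        self.s_bits |= 1 << core
        self._check()

    def write_disable(self, core: int) -> None:
        """The core sets its disable core bit, which clears its enable bit."""
        self.enable_bits &= ~(1 << core)
        self._check()


cu = ControlUnit(num_cores=3)
cu.write_sync(0)     # core 0 writes SYNC 11 and sleeps
cu.write_sync(2)     # core 2 writes SYNC 11 and sleeps
print(cu.woken)      # False: core 1 is still enabled
cu.write_disable(1)  # core 1 disables itself (block 1517)
print(cu.woken)      # True: every remaining enabled core has its S bit set
```

Re-evaluating the condition on the disable write, and not only on sync writes, is what allows the late-arriving disable from core 1 to release cores 0 and 2.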
When all enabled cores have written a SYNC 11, the control unit 104 wakes up all the cores simultaneously (per block 1522). As described above, in the case of a multi-die microprocessor 100, when core 1 writes the disable core bit 236, the local control unit 104 clears core 1's local enable bit 254 and also propagates the local enable bit 254 to the remote die 406. Therefore, the remote control unit 104 also detects the synchronization condition and simultaneously wakes up all the enabled cores on its die 406. Core 0 and core 2 then generate their configuration-related values based on the updated value of the configuration register 112 (per block 1524) and resume the activity they were performing before the interrupt (per block 1526).
Hardware semaphore
Please refer to Fig. 17, a block diagram of the hardware semaphore 118 of Fig. 1. The hardware semaphore 118 includes an owned bit 1702, owner bits 1704, and a state machine 1706 that updates the owned bit 1702 and the owner bits 1704 in response to reads and writes of the hardware semaphore 118 by the cores 102. Preferably, to identify which core currently owns the hardware semaphore 118, the number of owner bits 1704 is the base-2 logarithm of the number of cores 102 in the configuration of the microprocessor 100. In another embodiment, the owner bits 1704 include one bit corresponding to each core 102 of the microprocessor 100. Notably, although one set of owned bit 1702, owner bits 1704, and state machine 1706 is described as implementing a single hardware semaphore 118, the microprocessor 100 may include multiple hardware semaphores 118, each including the set of hardware described above. Preferably, to perform operations that require exclusive access to a shared resource, the microcode running on each core 102 reads and writes the hardware semaphore 118 to obtain ownership of a resource shared by the cores 102, as described in detail in the examples below. The microcode may associate each of multiple hardware semaphores 118 with ownership of a different shared resource of the microprocessor 100. Preferably, a core 102 reads and writes the hardware semaphore 118 at a predetermined address in the non-architectural address space of the core 102. The non-architectural address space can be accessed only by the microcode of a core 102 and cannot be accessed directly by user program code (e.g., x86 architecture program instructions). The operation of the state machine 1706 to update the owned bit 1702 and the owner bits 1704 of the hardware semaphore 118 is described in Figs. 18 and 19, and uses of the hardware semaphore 118 are described later.
Please refer to Fig. 18, a flowchart illustrating operation when a core 102 reads the hardware semaphore 118. Flow begins at block 1802.
In block 1802, a core 102, denoted core x, reads the hardware semaphore 118. As described above, the microcode of the core 102 preferably reads the predetermined address in the non-architectural address space at which the hardware semaphore 118 resides. Flow proceeds to decision block 1804.
In decision block 1804, the state machine 1706 examines the owner bits 1704 to determine whether core x is the owner of the hardware semaphore 118. If so, flow proceeds to block 1808; otherwise, flow proceeds to block 1806.
In block 1806, the hardware semaphore 118 returns a value of zero to the reading core 102 to indicate that the core 102 does not own the hardware semaphore 118. Flow ends at block 1806.
In block 1808, the hardware semaphore 118 returns a value of one to the reading core 102 to indicate that the core 102 owns the hardware semaphore 118. Flow ends at block 1808.
As described above, the microprocessor 100 may include multiple hardware semaphores 118. In one embodiment, the microprocessor 100 includes 16 hardware semaphores 118, and when a core 102 reads the predetermined address, it receives a 16-bit data value. Each bit corresponds to a different one of the 16 hardware semaphores 118 and indicates whether the core 102 reading the predetermined address owns the corresponding hardware semaphore 118.
Please refer to Fig. 19, a flowchart illustrating operation when a core 102 writes the hardware semaphore 118. Flow begins at block 1902.
In block 1902, a core 102, denoted core x, writes the hardware semaphore 118, e.g., at the predetermined non-architectural address described above. Flow proceeds to decision block 1904.
In decision block 1904, the state machine 1706 examines the owned bit 1702 to determine whether the hardware semaphore 118 is owned by any core 102 or is free. If it is owned, flow proceeds to decision block 1914; otherwise, flow proceeds to decision block 1906.
In decision block 1906, the state machine 1706 examines the value written. If the value is 1, indicating that the core 102 wants to obtain ownership of the hardware semaphore 118, flow proceeds to block 1908. If, however, the value is 0, indicating that the core 102 wants to relinquish ownership of the hardware semaphore 118, flow proceeds to block 1912.
In block 1908, the state machine 1706 updates the owned bit 1702 to 1 and sets the owner bits 1704 to indicate that core x now owns the hardware semaphore 118. Flow ends at block 1908.
In block 1912, the state machine 1706 updates neither the owned bit 1702 nor the owner bits 1704. Flow ends at block 1912.
In decision block 1914, the state machine 1706 examines the owner bits 1704 to determine whether core x is the owner of the hardware semaphore 118. If so, flow proceeds to decision block 1916; otherwise, flow proceeds to block 1912.
In decision block 1916, the state machine 1706 examines the value written. If the value is 1, indicating that the core 102 wants to obtain ownership of the hardware semaphore 118, flow proceeds to block 1912, where no update occurs because the core 102 already owns the hardware semaphore 118, as determined in decision block 1914. If, however, the value is 0, indicating that the core 102 wants to relinquish ownership of the hardware semaphore 118, flow proceeds to block 1918.
In block 1918, the state machine 1706 updates the owned bit 1702 to zero to indicate that no core 102 now owns the hardware semaphore 118. Flow ends at block 1918.
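The read flow of Fig. 18 and the write flow of Fig. 19 can be sketched as one small state machine. This is a minimal Python model, not the hardware design: the class and method names are hypothetical, and real cores reach the semaphore only through microcode at a non-architectural address. One assumption goes beyond the text: `read` also checks the owned bit, so that a core that has released the semaphore (block 1918 clears only the owned bit) does not read back 1.

```python
class HardwareSemaphore:
    """Sketch of the state machine 1706 driving the owned bit 1702 and
    owner bits 1704, following Figs. 18 and 19 (hypothetical API)."""

    def __init__(self):
        self.owned = False  # owned bit 1702
        self.owner = 0      # owner bits 1704 (meaningful only while owned)

    def read(self, core: int) -> int:
        # Fig. 18: 1 if the reading core is the owner, else 0.
        # Checking `owned` as well is an assumption of this sketch.
        return 1 if self.owned and self.owner == core else 0

    def write(self, core: int, value: int) -> None:
        if not self.owned:           # decision block 1904: semaphore is free
            if value == 1:           # decision block 1906: request
                self.owned = True    # block 1908: take ownership
                self.owner = core
            # value 0 on a free semaphore: no update (block 1912)
        elif self.owner == core:     # decision block 1914: writer is the owner
            if value == 0:           # decision block 1916: relinquish
                self.owned = False   # block 1918: release
            # value 1 from the current owner: no update (block 1912)
        # owned by another core: no update (block 1912)


sem = HardwareSemaphore()
sem.write(0, 1)
print(sem.read(0))  # 1: core 0 acquired the free semaphore
sem.write(1, 1)
print(sem.read(1))  # 0: core 0 still owns it, so core 1's request is ignored
sem.write(0, 0)
sem.write(1, 1)
print(sem.read(1))  # 1: core 1 acquired it after core 0 released it
```

Because a losing write is silently ignored, a core learns whether it won only by reading back, which is why the acquisition loop described for Fig. 20 writes a 1 and then reads.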
As described above, in one embodiment the microprocessor 100 includes 16 hardware semaphores 118. When a core 102 writes the predetermined address, it writes a 16-bit data value. Each bit corresponds to a different one of the 16 hardware semaphores 118 and indicates whether the core 102 writing the predetermined address requests ownership (value 1) or relinquishes ownership (value 0) of the corresponding hardware semaphore 118.
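The 16-bit read and write encoding can be sketched as a bank that applies the per-semaphore rules of Figs. 18 and 19 bit by bit. This is a minimal Python model: `SemaphoreBank` and its methods are hypothetical names, and the owned bit and owner bits are merged into a single `None`-or-core entry for brevity.

```python
NUM_SEMAPHORES = 16  # per the embodiment described above


class SemaphoreBank:
    """Sketch of 16 hardware semaphores accessed through one 16-bit value.

    Bit i of a read result is 1 if the reading core owns semaphore i.
    Bit i of a written value requests (1) or relinquishes (0) semaphore i.
    """

    def __init__(self):
        self.owner = [None] * NUM_SEMAPHORES  # None means the semaphore is free

    def read(self, core: int) -> int:
        value = 0
        for i, owner in enumerate(self.owner):
            if owner == core:
                value |= 1 << i
        return value

    def write(self, core: int, value: int) -> None:
        for i in range(NUM_SEMAPHORES):
            want = (value >> i) & 1
            if self.owner[i] is None and want:
                self.owner[i] = core   # free and requested: acquire
            elif self.owner[i] == core and not want:
                self.owner[i] = None   # the owner writes 0: release
            # every other combination leaves semaphore i unchanged


bank = SemaphoreBank()
bank.write(core=0, value=0b0000_0000_0000_0101)  # core 0 requests semaphores 0 and 2
bank.write(core=1, value=0b0000_0000_0000_0110)  # core 1 requests 1 and 2; 2 is taken
print(bin(bank.read(0)))  # 0b101
print(bin(bank.read(1)))  # 0b10
```

Under these rules, a 0 bit releases any semaphore the writer owns. A core that wants to keep semaphores it already holds therefore has to write 1 in those bit positions, which leaves them unchanged (block 1912).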
In one embodiment, arbitration logic arbitrates the requests of the cores 102 to access the hardware semaphore 118, so that reads and writes of the hardware semaphore 118 by the cores 102 are serialized. In one embodiment, the arbitration logic uses a round-robin fairness algorithm among the cores 102 for access to the hardware semaphore 118.
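A round-robin arbiter of the kind mentioned above can be sketched as follows. This is a generic round-robin scheme offered as an assumption, not the patent's circuit. `round_robin_grant` is a hypothetical name, requests are a bitmask with one bit per core, and the search for the next grant starts just after the most recently granted core.

```python
from typing import Optional


def round_robin_grant(requests: int, last_grant: int, num_cores: int) -> Optional[int]:
    """Pick the next core allowed to access the hardware semaphore.

    requests:   bitmask, bit i set if core i is requesting access this cycle.
    last_grant: the core granted most recently; the search starts just after
                it, so every requesting core is served within num_cores grants.
    Returns the granted core, or None if no core is requesting.
    """
    for offset in range(1, num_cores + 1):
        core = (last_grant + offset) % num_cores
        if requests & (1 << core):
            return core
    return None


print(round_robin_grant(0b1011, last_grant=1, num_cores=4))  # 3
print(round_robin_grant(0b1011, last_grant=3, num_cores=4))  # 0
print(round_robin_grant(0b0010, last_grant=1, num_cores=4))  # 1 (sole requester)
```

Starting the search after the previous winner is what provides fairness: a core that keeps requesting cannot starve the others, and a lone requester can still be granted on consecutive cycles.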
Please refer to Fig. 20, a flowchart illustrating operation of the microprocessor 100 that uses a hardware semaphore 118 to perform an operation requiring exclusive ownership of a resource. More specifically, the hardware semaphore 118 ensures that, when two or more cores 102 each encounter an instruction to write back and invalidate the shared cache memory 119, only one core 102 at a time performs the write-back and invalidation of the shared cache memory 119. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates as described so that, collectively, one core 102 performs the write-back and invalidation and the other cores 102 do not perform it concurrently. That is, the operation of Fig. 20 ensures that the processing of WBINVD instructions is serialized. In one embodiment, the operation of Fig. 20 may be performed in a microprocessor 100 that executes WBINVD instructions according to the embodiment of Figs. 7A-7B. Flow begins at block 2002.
In block 2002, a core 102 encounters a cache control instruction, such as a WBINVD instruction. Flow proceeds to block 2004.
In block 2004, the core 102 writes a one to the WBINVD hardware semaphore 118. In one embodiment, the microcode has allocated one of the hardware semaphores 118 to the WBINVD operation. The core 102 then reads the WBINVD hardware semaphore 118 to determine whether it has obtained ownership. Flow proceeds to decision block 2006.
In decision block 2006, if the core 102 determines that it has obtained ownership of the WBINVD hardware semaphore 118, flow proceeds to block 2008; otherwise, flow returns to block 2004 to attempt to obtain ownership again. Note that while the microcode of the instant core 102 loops through blocks 2004 and 2006, it is eventually interrupted by the core 102 that owns the WBINVD hardware semaphore 118, because that core 102 is executing a WBINVD instruction according to Figs. 7A-7B and sends an interrupt to the instant core 102 in block 702. Preferably, on each pass through the loop, the microcode of the instant core 102 checks the interrupt status register to see whether one of the other cores 102 (e.g., the core 102 that owns the WBINVD hardware semaphore 118) has sent it an interrupt. The instant core 102 then performs the operations of Figs. 7A-7B and, in block 749, resumes the operation of Fig. 20 to attempt to obtain ownership of the hardware semaphore 118 so that it can execute its own WBINVD instruction.
In block 2008, the core 102 has obtained ownership, and flow proceeds to block 702 of Figs. 7A-7B to execute the WBINVD instruction. As part of the WBINVD instruction operation, in block 748 of Figs. 7A-7B, the core 102 writes a zero to the WBINVD hardware semaphore 118 to relinquish its ownership. Flow ends at block 2008.
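The acquire loop of blocks 2004 and 2006 and the release in block 748 can be exercised with threads standing in for cores. This is a minimal Python sketch under stated assumptions. `SemaphoreModel` is hypothetical, and a `threading.Lock` stands in for the arbitration logic that serializes semaphore accesses. The inter-core interrupts of Figs. 7A-7B are not modeled, and the cache operation is simulated by a short sleep. The `max_active` counter confirms that at most one core is ever inside the write-back-and-invalidate step.

```python
import threading
import time


class SemaphoreModel:
    """Hypothetical hardware-semaphore model; a lock plays the arbitration logic."""

    def __init__(self):
        self._arb = threading.Lock()
        self._owner = None

    def write(self, core, value):
        with self._arb:
            if value == 1 and self._owner is None:
                self._owner = core
            elif value == 0 and self._owner == core:
                self._owner = None

    def read(self, core):
        with self._arb:
            return 1 if self._owner == core else 0


wbinvd_sem = SemaphoreModel()
counter_lock = threading.Lock()
active = 0      # cores currently performing the (simulated) WBINVD
max_active = 0
log = []


def execute_wbinvd(core):
    global active, max_active
    # Blocks 2004/2006: write 1, read back, retry until ownership is obtained.
    while True:
        wbinvd_sem.write(core, 1)
        if wbinvd_sem.read(core):
            break
        time.sleep(0)  # yield; real microcode would check its interrupt status here
    # Block 2008: perform the simulated write-back and invalidate.
    with counter_lock:
        active += 1
        max_active = max(max_active, active)
    log.append(core)
    time.sleep(0.001)
    with counter_lock:
        active -= 1
    # Block 748 of Figs. 7A-7B: write 0 to relinquish ownership.
    wbinvd_sem.write(core, 0)


threads = [threading.Thread(target=execute_wbinvd, args=(c,)) for c in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_active, sorted(log))  # 1 [0, 1, 2, 3]
```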
An operation similar to that described for Fig. 20 may be performed by the microcode to obtain exclusive ownership of other shared resources. Another resource that a core 102 may own exclusively by using a hardware semaphore 118 is an uncore 103 register shared by the cores 102. In one embodiment, the uncore 103 registers include a control register that contains a respective field for each core 102. Each field controls operational aspects of its core 102. Because the fields reside in the same register, when a core 102 wants to update its own field without updating the fields of the other cores 102, the core 102 must read the control register, modify the value read, and then write the modified value back to the control register. For example, the microprocessor 100 may include an uncore 103 Performance Control Register (PCR) that controls the bus clock ratio of the cores 102. To update its bus clock ratio, a particular core 102 must read, modify, and write back the PCR. Therefore, in one embodiment, the microcode is configured to perform an effectively atomic read/modify/write of the PCR while the core 102 owns the hardware semaphore 118 associated with the PCR. The bus clock ratio determines the clock frequency of an individual core 102 as a multiple of the clock frequency supplied to the microprocessor 100 via an external bus.
Another such resource is a Trusted Platform Module (TPM). In one embodiment, the microprocessor 100 implements a TPM in microcode running on the cores 102. At any given instant, the microcode running on one of the cores 102 implements the TPM. However, the core 102 implementing the TPM may change over time. By using a hardware semaphore 118 associated with the TPM, the microcode of the cores 102 can ensure that only one core 102 implements the TPM at a time. More specifically, the core 102 currently implementing the TPM writes the TPM state to the private random access memory 116 before it relinquishes implementing the TPM, and the core 102 that takes over implementing the TPM reads the TPM state from the private random access memory 116. The microcode of each core 102 is configured so that, when a core 102 wants to become the core 102 implementing the TPM, it first obtains ownership of the TPM hardware semaphore 118, then reads the TPM state from the private random access memory 116 and begins implementing the TPM. In one embodiment, the TPM substantially conforms to the TPM specification published by the Trusted Computing Group, such as the ISO/IEC 11889 specification.
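The TPM handoff ordering (acquire the semaphore before loading state, save state before releasing) can be sketched in Python. Everything here is hypothetical and for illustration only: the `PrivateRAM` and `TPMSemaphore` classes, the function names, and the `{"counter": ...}` state dictionary. The sketch models only the ordering, not any TPM functionality.

```python
class PrivateRAM:
    """Stand-in for the uncore private random access memory 116."""

    def __init__(self):
        self.tpm_state = None


class TPMSemaphore:
    """Stand-in for the TPM hardware semaphore: at most one owner at a time."""

    def __init__(self):
        self.owner = None

    def try_acquire(self, core):
        if self.owner is None:
            self.owner = core
        return self.owner == core

    def release(self, core):
        if self.owner == core:
            self.owner = None


def take_over_tpm(core, sem, ram):
    """Obtain the semaphore first, then load the saved TPM state."""
    if not sem.try_acquire(core):
        return None          # another core is still implementing the TPM
    return ram.tpm_state


def hand_off_tpm(core, state, sem, ram):
    """Save the TPM state to private RAM, then relinquish the semaphore."""
    ram.tpm_state = state
    sem.release(core)


sem, ram = TPMSemaphore(), PrivateRAM()
ram.tpm_state = {"counter": 0}      # illustrative state only
state = take_over_tpm(0, sem, ram)  # core 0 becomes the TPM implementer
print(take_over_tpm(1, sem, ram))   # None: core 0 still owns the semaphore
state["counter"] = 42
hand_off_tpm(0, state, sem, ram)
print(take_over_tpm(1, sem, ram))   # {'counter': 42}
```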
As described above, the conventional solution to resource contention among multiple processors uses software semaphores in system memory. A potential advantage of the hardware semaphore 118 described herein is that it avoids generating additional traffic on the memory bus, and it can be accessed faster than system memory.
Interrupting, non-sleep synchronization request
Please refer to Fig. 21, a timing diagram illustrating an example of the operation of cores 102 that issue non-sleep synchronization requests according to the flowchart of Fig. 3. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1, and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 writes a SYNC 14 with neither the sleep bit 212 nor the selective wake bit 214 set (i.e., a non-sleep synchronization request). Therefore, the control unit 104 allows core 0 to keep running (the "no" branch of decision block 312).
Core 1 eventually also writes a non-sleep SYNC 14, and the control unit 104 allows core 1 to keep running. Finally, core 2 writes a non-sleep SYNC 14. As shown, the time at which each core writes its SYNC 14 may differ.
When all cores have been written into non-sleep synchronization 14, control unit 104 simultaneously send a sync break to each core 0, Core 1 and core 2.Each core then receives sync break and service synchronization is interrupted (unless the sync break is shielded, in such case Under, which generally understands poll (poll) sync break).
Bootstrap processor designation
In one embodiment, as described above, a core 102 normally (e.g., when the "all-core BSP" feature of Fig. 23 is disabled) designates itself as the bootstrap processor (BSP) and performs designated tasks, such as booting the operating system. In one embodiment, normally (e.g., when the "modify BSP" feature of Fig. 22 and the "all-core BSP" feature of Fig. 23 are both disabled), the core 102 whose virtual core number is 0 is the default BSP.
However, the inventors have observed that designating the BSP in a different manner may be advantageous, and embodiments that do so are described below. For example, many tests of a microprocessor 100 part, especially manufacturing tests, are performed by booting an operating system and running program code to ensure that the microprocessor 100 part works properly. Because the BSP core 102 performs system initialization and boots the operating system, the BSP core 102 may be exercised in ways that the AP cores are not. Furthermore, even in a multithreaded operating environment, the BSP has been observed to bear a larger share of the processing load than the APs, so the AP cores 102 may not be tested as thoroughly as the BSP core 102. Finally, some actions may need to be performed only by the BSP core 102 on behalf of the microprocessor 100 as a whole, such as the package sleep state handshake protocol described in Fig. 9.
Therefore, embodiments are described in which any core 102 can be designated as the BSP. In one embodiment, during testing of the microprocessor 100, the test is run N times, where N is the number of cores 102 of the microprocessor 100, and for each run the microprocessor 100 is reconfigured so that the BSP is a different core 102. This advantageously provides better test coverage during manufacturing and also advantageously exposes bugs in the microprocessor 100 during its design. Another advantage is that each core 102 can have a different APIC ID in different runs and can therefore respond to different interrupt requests, which may provide broader test coverage.
Please refer to Fig. 22, a flowchart illustrating a process for configuring the microprocessor 100. Fig. 22 is described with reference to the multi-die microprocessor 100 of Fig. 4, which includes two dies 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described here can be used with microprocessors 100 of other configurations, i.e., with more than two dies or a single die, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates as described to collectively reconfigure the microprocessor 100. Flow begins at block 2202.
In block 2202, the microprocessor 100 is reset and performs an initial part of its initialization, preferably in a manner similar to that described above for Fig. 14. However, the generation of the configuration-related values (as in block 1424 of Fig. 14), in particular the APIC ID and the BSP flag, is performed as described in blocks 2203 through 2214. Flow proceeds to block 2203.
In block 2203, the core 102 generates its virtual core number, preferably as described for Fig. 14. Flow proceeds to decision block 2204.
In decision block 2204, the core 102 samples an indicator to determine whether a feature is enabled. This feature is referred to herein as the "modify BSP" feature. In one embodiment, the modify BSP feature can be enabled by blowing a fuse 114. Preferably, during testing, the modify BSP fuse 114 is not blown; instead, a true value is scanned into the holding register bit associated with the modify BSP fuse 114, as described above with respect to Fig. 1, to enable the modify BSP feature. In this way, the modify BSP feature is not permanently enabled in the microprocessor 100 part and is disabled after a subsequent power-up. Preferably, the operations in blocks 2203 through 2214 are performed by the microcode of the core 102. If the modify BSP feature is enabled, flow proceeds to block 2205; otherwise, flow proceeds to block 2206.
In block 2205, the core 102 modifies the virtual core number generated in block 2203. In one embodiment, the core 102 sets the virtual core number to the result of a rotate function applied to the rotate amount and the virtual core number generated in block 2203, as follows:
virtual core number = rotate(rotate amount, virtual core number)
In one embodiment, the rotate function rotates the virtual core numbers among the cores 102 by the rotate amount. The rotate amount is a value blown into fuses 114 or, preferably, scanned into a holding register during testing. Table 1 shows an example configuration with two dies 406, four cores 102 per die 406, and all cores 102 enabled. The left column gives each core 102 as the ordered pair (die number 258, local core number 256), the top row gives each rotate amount, and each entry is the resulting virtual core number of that core 102. In this way, a tester can cause the cores 102 to generate their virtual core numbers, and therefore their APIC IDs, with any valid value. Although one embodiment for modifying the virtual core number is described, other embodiments are contemplated. For example, the rotation direction may be the opposite of that shown in Table 1. Flow proceeds to block 2206.
Table 1: virtual core number of each core for each rotate amount
(die, local core)   r=0  r=1  r=2  r=3  r=4  r=5  r=6  r=7
(0,0)               0    7    6    5    4    3    2    1
(0,1)               1    0    7    6    5    4    3    2
(0,2)               2    1    0    7    6    5    4    3
(0,3)               3    2    1    0    7    6    5    4
(1,0)               4    3    2    1    0    7    6    5
(1,1)               5    4    3    2    1    0    7    6
(1,2)               6    5    4    3    2    1    0    7
(1,3)               7    6    5    4    3    2    1    0
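Table 1 is consistent with a simple modular rotation. The Python sketch below is illustrative only: it assumes the default virtual core number is die number x 4 + local core number (which matches the r=0 column) and that rotating by r yields (v - r) mod N for N enabled cores. The function names are hypothetical and are not taken from the microcode.

```python
def default_virtual_core_number(die: int, local_core: int, cores_per_die: int) -> int:
    # Assumption: die-major numbering, i.e. (die, local core) -> die * cores_per_die + local_core.
    return die * cores_per_die + local_core


def rotate_virtual_core_number(vcn: int, rotate_amount: int, num_cores: int) -> int:
    # Rotate in the direction of Table 1: each step decrements the number modulo N.
    # The reverse direction would be (vcn + rotate_amount) % num_cores.
    return (vcn - rotate_amount) % num_cores


# Regenerate Table 1 for two dies with four cores each.
DIES, CORES_PER_DIE = 2, 4
N = DIES * CORES_PER_DIE
for die in range(DIES):
    for local in range(CORES_PER_DIE):
        vcn = default_virtual_core_number(die, local, CORES_PER_DIE)
        print((die, local), [rotate_virtual_core_number(vcn, r, N) for r in range(N)])
```

Reversing the rotation direction, as the text allows, only changes the subtraction to an addition.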
In block 2206, the core 102 populates its local APIC ID register with either the default virtual core number generated in block 2203 or the modified value produced in block 2205. In one embodiment, the APIC ID register can be read by the core 102 from itself at memory address 0x0FEE00020 (for example, by the BIOS and/or the operating system). In another embodiment, the APIC ID register can be read by the core 102 at MSR address 0x802. Flow proceeds to decision block 2208.
In decision block 2208, the core 102 determines whether the APIC ID it populated in block 2206 is zero. If so, flow proceeds to block 2212; otherwise, flow proceeds to block 2214.
In block 2212, the core 102 sets its BSP flag to true to indicate that the core 102 is the BSP. In one embodiment, the BSP flag is a bit of the x86 APIC base register (IA32_APIC_BASE MSR) of the core 102. Flow proceeds to decision block 2216.
In block 2214, the core 102 sets its BSP flag to false to indicate that the core 102 is not the BSP, i.e., that it is an AP. Flow proceeds to decision block 2216.
In decision block 2216, the core 102 determines whether it is the BSP, i.e., whether it designated itself as the BSP core 102 in block 2212 rather than as an AP core 102 in block 2214. If so, flow proceeds to block 2218; otherwise, flow proceeds to block 2222.
In block 2218, the core 102 begins fetching and executing system initialization firmware (e.g., the BSP BIOS bootstrap code). This may include instructions related to the BSP flag and APIC ID, for example, instructions that read the APIC ID register or the APIC base register, in which case the core 102 returns the values written in blocks 2206 and 2212/2214. It may also include performing operations that the core 102, as the only such core of the microprocessor 100, performs on behalf of the microprocessor 100 as a whole, such as the package sleep state handshake protocol described with respect to Fig. 9. Preferably, the BSP core 102 begins fetching and executing the system initialization firmware at an architecturally defined reset vector; for example, in the x86 architecture, the reset vector points to 0xFFFFFFF0. Preferably, executing the system initialization firmware includes booting the operating system, e.g., loading the operating system and transferring control to it. Flow proceeds to block 2224.
In block 2222, the core 102 halts itself and waits for a startup sequence from the BSP before it begins fetching and executing instructions. In one embodiment, the startup sequence received from the BSP includes an interrupt vector to the AP system initialization firmware (e.g., the AP BIOS code). This may include instructions related to the BSP flag and APIC ID, in which case the core 102 returns the values written in blocks 2206 and 2212/2214. Flow proceeds to block 2224.
In block 2224, as the core 102 executes instructions, it receives and responds to interrupt requests based on the APIC ID written into its APIC ID register in block 2206. Flow ends at block 2224.
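The per-core decisions in blocks 2203 through 2222 can be sketched as follows. This is a hypothetical Python model, not the microcode itself; it assumes the same die-major default numbering and rotate convention as Table 1, and `CoreConfig`, `configure_core`, and `first_action` are illustrative names. Each core computes only its own values, yet exactly one core ends up with APIC ID zero and therefore a true BSP flag.

```python
from dataclasses import dataclass

RESET_VECTOR = 0xFFFFFFF0  # x86 architectural reset vector, as stated in the text


@dataclass
class CoreConfig:
    die: int
    local_core: int
    apic_id: int
    bsp: bool


def configure_core(die, local_core, cores_per_die, num_cores,
                   modify_bsp_enabled, rotate_amount):
    # Block 2203 (assumed numbering): default virtual core number.
    vcn = die * cores_per_die + local_core
    # Blocks 2204/2205: rotate only if the modify BSP feature is enabled.
    if modify_bsp_enabled:
        vcn = (vcn - rotate_amount) % num_cores
    # Block 2206: the APIC ID register receives the virtual core number.
    # Blocks 2208-2214: only the core whose APIC ID is zero sets its BSP flag.
    return CoreConfig(die, local_core, apic_id=vcn, bsp=(vcn == 0))


def first_action(core: CoreConfig) -> str:
    # Blocks 2216-2222: the BSP starts at the reset vector; APs halt and wait.
    if core.bsp:
        return f"fetch firmware at {RESET_VECTOR:#010x}"
    return "halt until startup sequence from BSP"


cores = [configure_core(d, l, 4, 8, modify_bsp_enabled=True, rotate_amount=3)
         for d in range(2) for l in range(4)]
for core in cores:
    print((core.die, core.local_core), core.apic_id, first_action(core))
```

With a rotate amount of 3, the core at (die 0, local core 3) becomes the BSP, matching the r=3 column of Table 1.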
As described above, in one embodiment the core 102 whose virtual core number is zero is by default the BSP. However, the inventors have observed that there are situations in which it is advantageous for all of the cores 102 to be designated as the BSP; such embodiments are described below. For example, the microprocessor 100 developer may have invested a great deal of time and money developing a large body of tests designed to run single-threaded on a single core, and may want to use these single-core tests to test the multi-core microprocessor 100. For example, the tests may run under the venerable DOS operating system in x86 real mode.
These tests could be run on each core 102 serially, using the modify BSP feature described with respect to Fig. 22 and/or by blowing fuses or scanning the holding registers to change fuse values so that all cores 102 except the one being tested are disabled. However, the inventors have recognized that this would take more time than running the tests on all cores 102 simultaneously (e.g., approximately 4 times as long for a 4-core microprocessor 100). The test time required for each individual microprocessor 100 part is valuable, especially when hundreds of thousands or more parts are manufactured, and particularly when many of the tests run on very expensive test equipment.
Additionally, some speed paths in the logic of the microprocessor 100 may be stressed more when more than one core 102 (or all cores 102) run at the same time, because doing so generates more heat and/or draws more power. Tests run serially may not generate this additional stress and may therefore fail to expose those speed paths.
Therefore, embodiments are described in which all of the cores 102 can dynamically designate themselves as the BSP core 102 so that all cores 102 can run a test simultaneously.
Referring now to Fig. 23, a flowchart illustrating the configuration of the microprocessor 100 according to another embodiment is shown. Fig. 23 is described with reference to the multi-die microprocessor 100 of Fig. 4, which includes two dies 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described herein may be used with microprocessors 100 of different configurations, i.e., with more than two dies or a single die, and with more or fewer than eight cores 102, as long as there are at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively and dynamically reconfigure the microprocessor 100. Flow begins at block 2302.
In block 2302, the microprocessor 100 is reset and performs an initial portion of its initialization, preferably in a manner similar to that described above with respect to Fig. 14. However, the generation of configuration-related values, such as in block 1424 of Fig. 14, and in particular the APIC ID and the BSP flag, is performed in the manner described in blocks 2304 through 2312. Flow proceeds to decision block 2304.
In decision block 2304, the core 102 detects whether a feature is enabled. The feature is referred to herein as the "all-core BSP" feature. Preferably, the all-core BSP feature can be enabled by blowing a fuse 114. Preferably, during testing the all-core BSP fuse 114 is not blown; instead, a true value is scanned into the holding register bit associated with the all-core BSP fuse 114, as described above with respect to Fig. 1, to enable the all-core BSP feature. In this manner, the all-core BSP feature is not permanently enabled in the microprocessor 100 part, but is disabled after a subsequent power-up. Preferably, the operations in blocks 2304 through 2312 are performed by microcode of the core 102. If the all-core BSP feature is enabled, flow proceeds to block 2305; otherwise, flow proceeds to block 2203 of Fig. 22.
In block 2305, the core 102 sets its virtual core number to zero, regardless of its local core number 256 and die number 258. Flow proceeds to block 2306.
In block 2306, the core 102 populates its local APIC ID register with the virtual core number value of zero set in block 2305. Flow proceeds to block 2312.
In block 2312, the core 102 sets its BSP flag to true to indicate that it is the BSP, regardless of its local core number 256 and die number 258. Flow proceeds to block 2315.
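A hypothetical sketch of the Figure 23 decision, assuming the same numbering and rotate convention as Table 1: when the all-core BSP feature is enabled, every core reports APIC ID 0 and a true BSP flag; otherwise the Figure 22 flow applies. The function name and parameters are illustrative assumptions, not names from the patent.

```python
from typing import Tuple


def configure_core_fig23(die: int, local_core: int, cores_per_die: int, num_cores: int,
                         all_core_bsp_enabled: bool, modify_bsp_enabled: bool = False,
                         rotate_amount: int = 0) -> Tuple[int, bool]:
    """Return (APIC ID, BSP flag) for one core (hypothetical model of Figures 23 and 22)."""
    if all_core_bsp_enabled:
        # Blocks 2305-2312: die and local core numbers are ignored.
        return 0, True
    # Feature disabled: fall back to the Figure 22 flow (block 2203 onward).
    vcn = die * cores_per_die + local_core
    if modify_bsp_enabled:
        vcn = (vcn - rotate_amount) % num_cores
    return vcn, vcn == 0


all_bsp = [configure_core_fig23(d, l, 4, 8, all_core_bsp_enabled=True)
           for d in range(2) for l in range(4)]
normal = [configure_core_fig23(d, l, 4, 8, all_core_bsp_enabled=False)
          for d in range(2) for l in range(4)]
print(all_bsp)
print(normal)
```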
In block 2315, whenever a core 102 makes a memory access request, the microprocessor 100 modifies the upper address bits of that core's memory access request address so that each core 102 accesses its own distinct memory space. That is, depending on which core 102 generated the memory access request, the microprocessor 100 modifies the upper address bits so that they have a value unique to each core 102. In one embodiment, the microprocessor 100 modifies the upper address bits specified by a value in blown fuses 114. In another embodiment, the microprocessor 100 modifies the upper address bits based on the local core number 256 and die number 258 of the core 102. For example, in an embodiment in which the microprocessor 100 has four cores, the microprocessor 100 modifies the upper two bits of the memory address and generates a unique value in those upper two bits for each core 102. Effectively, the memory space addressable by the microprocessor 100 is divided into N subspaces, where N is the number of cores 102. The test programs are developed so that they restrict themselves to addresses within the lowest of the N subspaces. For example, assume the microprocessor 100 can address 64 GB of memory and includes four cores 102, and the test is developed to access only the lowest 8 GB of memory. When core 0 executes an instruction that accesses memory address A (in the lower 8 GB of memory), the microprocessor 100 generates address A (unmodified) on the memory bus; when core 1 executes an instruction that accesses the same memory address A, the microprocessor 100 generates address A+8 GB on the memory bus; when core 2 executes an instruction that accesses the same memory address A, the microprocessor 100 generates address A+16 GB on the memory bus; and when core 3 executes an instruction that accesses the same memory address A, the microprocessor 100 generates address A+32 GB on the memory bus. In this manner, advantageously, the cores 102 do not conflict with one another in their memory accesses, which enables the tests to run correctly. Preferably, the single-threaded tests are run on a standalone tester capable of testing the microprocessor 100 in isolation. The microprocessor 100 developer develops test data that the tester provides to the microprocessor 100; conversely, the developer also develops result data that the tester compares against the data the microprocessor 100 writes during memory write accesses, to ensure that the microprocessor 100 writes the correct data. In one embodiment, the shared cache memory 119 (e.g., the highest-level cache, which generates the addresses for external bus transactions) is the portion of the microprocessor 100 configured to modify the upper address bits when the all-core BSP feature is enabled. Flow proceeds to block 2318.
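The embodiment that derives the upper address bits from the core's identity can be sketched as placing a core index in the top k = log2(N) bits of the address. This Python sketch rests on several assumptions not stated in the text: a power-of-two core count, a core index of die number x cores per die + local core number, and a 36-bit (64 GB) address space. The helper names are hypothetical. In this sketch each subspace is 2^(W-k) bytes, so with 36 address bits and four cores, core c's requests are offset by c x 16 GB.

```python
GB = 1 << 30


def core_index(die: int, local_core: int, cores_per_die: int) -> int:
    # Assumption: a unique per-core index built from die number and local core number.
    return die * cores_per_die + local_core


def core_private_address(addr: int, index: int, addr_bits: int, num_cores: int) -> int:
    """Move a core's request into its own subspace by setting the top log2(N) address bits."""
    k = num_cores.bit_length() - 1
    if num_cores < 1 or (1 << k) != num_cores:
        raise ValueError("sketch assumes a power-of-two core count")
    if not 0 <= index < num_cores:
        raise ValueError("core index out of range")
    low_bits = addr_bits - k  # width of one subspace, in address bits
    if addr >> low_bits:
        raise ValueError("test code must stay inside the lowest subspace")
    # The upper k bits of addr are zero here, so OR-ing is the same as replacing them.
    return (index << low_bits) | addr


A = 0x1234_5000
for local in range(4):
    i = core_index(0, local, cores_per_die=4)
    print(i, hex(core_private_address(A, i, addr_bits=36, num_cores=4)))
```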
In block 2318, the core 102 begins fetching and executing the system initialization firmware (e.g., the BSP BIOS bootstrap code). This may include instructions related to the BSP flag and APIC ID, for example, instructions that read the APIC ID register or the APIC base register, in which case the core 102 returns the value of zero written in block 2306. Preferably, the BSP core 102 begins fetching and executing the system initialization firmware at an architecturally defined reset vector; for example, in the x86 architecture, the reset vector points to address 0xFFFFFFF0. Preferably, executing the system initialization firmware includes booting the operating system, e.g., loading the operating system and transferring control to it. Flow proceeds to block 2324.
In block 2324, as the core 102 executes instructions, it receives and responds to interrupt requests based on the APIC ID value of zero written into its APIC ID register in block 2306. Flow ends at block 2324.
Although Fig. 23 describes embodiments in which all of the cores 102 are designated as the BSP, other embodiments are contemplated in which multiple, but fewer than all, of the cores 102 are designated as the BSP.
Although embodiments have been described in the context of an x86-type system, in which each core 102 uses a local APIC and there is an association between the local APIC ID and the BSP designation, it should be understood that the designation of the bootstrap processor is not limited to x86 embodiments, but may be employed in systems with different system architectures.
Propagation of microcode patches to multiple cores
As observed previously, many important functions of the microprocessor may be performed primarily by its microcode and, in particular, may require correct communication and coordination between the instances of the microcode running on the multiple cores of the microprocessor. Because of the complexity of the microcode, there is a significant probability that the microcode will contain bugs that need to be fixed. This may be accomplished through a microcode patch that replaces the old, buggy microcode instructions with new microcode instructions. That is, the microprocessor includes specific hardware that facilitates microcode patching. Generally, it is desirable to apply a microcode patch to all cores of the microprocessor. Traditionally, this has been done by having each core individually execute an architectural instruction to apply the patch. However, the traditional approach may be problematic.
First, the patch may relate to inter-core communication between microcode instances (e.g., core synchronization or use of hardware semaphores) or to functions that require inter-core microcode communication (e.g., cross-core debug requests, cache control operations or power management, or dynamic multi-core microprocessor configuration). Executing the architectural patch application program on each core separately may open a window of time in which the microcode patch has been applied to some cores but not others (or a previous patch has been applied to some cores and the new patch to others). This may cause a communication failure between the cores and incorrect operation of the microprocessor. Other foreseeable and unforeseeable problems may also arise if the cores of the microprocessor are not all using the same microcode patch.
Second, the architecture of the microprocessor specifies many features that may be supported by some instances of the microprocessor but not by others. During operation, the microprocessor can communicate to system software which features it supports. For example, in the case of an x86 architecture microprocessor, system software can execute the x86 CPUID instruction to determine the supported feature set. However, the instruction that reports the feature set (e.g., CPUID) is executed individually on each core of the microprocessor. In some cases, a feature may be disabled because of a bug that existed when the microprocessor was released. However, a microcode patch that fixes the bug may subsequently be developed so that the feature can be enabled once the patch is applied. If the patch is applied in the traditional manner (i.e., each core executes the patch instruction individually), different cores may report different feature sets at a given point in time, depending on whether the patch has yet been applied to a given core. This may be problematic, particularly when system software (e.g., an operating system, for instance to facilitate migration of threads between cores) expects all cores of the microprocessor to have the same feature set. In particular, it has been observed that some system software obtains the feature set of only one core and assumes that the other cores have the same feature set.
Furthermore, the microcode instance on each core controls and/or communicates with uncore resources shared by the cores (e.g., synchronization-related hardware, hardware semaphores, the shared PRAM, the shared cache, or the service processing unit). Therefore, it may be problematic for the microcode on two different cores to control or communicate with an uncore resource in two different ways at the same time because one core has applied the microcode patch and the other has not (or because the two cores have applied different microcode patches).
Finally, the microcode patch hardware of the microprocessor could be used in the traditional manner, but patch operations performed by one core could interfere with patch application by the other cores, for example, if portions of the patch hardware are shared between the cores.
Preferably, embodiments apply a microcode patch to a multi-core microprocessor atomically at the architectural instruction level to solve the problems described herein. First, the patch is applied to the entire microprocessor 100 in response to the execution of a single architectural instruction on a single core 102. That is, embodiments do not require system software to execute an apply-microcode-patch instruction (described below) on each core 102. More specifically, the single core 102 that encounters the apply-microcode-patch instruction sends messages and interrupts to the other cores 102 to cause their microcode instances to cooperate with one another, so that the microcode patch is applied to the microcode patch hardware of each core 102 and to the shared patch hardware of the microprocessor 100, with interrupts disabled on all cores 102. Second, the microcode instances running on all of the cores 102 that implement the atomic patch application mechanism cooperate so that the cores refrain from executing any architectural instruction (other than the one apply-microcode-patch instruction) from the time all cores 102 of the microprocessor 100 have agreed to apply the patch until all cores 102 have finished applying it. That is, no core 102 executes architectural instructions while any core 102 is applying the microcode patch. Furthermore, in a preferred embodiment, all cores 102 reach the same point in the microcode that performs the patch application with interrupts disabled, and thereafter the cores 102 execute only the microcode instructions that apply the patch until all cores of the microprocessor 100 confirm that the patch has been applied. That is, while any core 102 of the microprocessor 100 is applying the patch, no core 102 executes any microcode instructions other than those that apply the microcode patch.
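The rendezvous described above can be modeled with two barriers: one reached after every core has disabled interrupts and agreed to apply the patch, and one reached after every core has finished applying it. The Python sketch below is a software analogy only: each core is modeled as a thread using the standard `threading.Barrier`, and `Core`, `apply_patch_atomically`, and the dictionary standing in for the shared patch hardware are hypothetical. The real mechanism is implemented in microcode with inter-core messages and interrupts, which this model does not reproduce.

```python
import threading


class Core:
    """Hypothetical per-core patch state (not a real hardware interface)."""

    def __init__(self, core_id: int):
        self.core_id = core_id
        self.patch_version = 0          # stands in for this core's private patch hardware
        self.interrupts_enabled = True
        self.saw_shared_version = None  # shared patch version observed after completion


def apply_patch_atomically(cores, shared_patch_hw, new_version, initiator_id=0):
    n = len(cores)
    agreed = threading.Barrier(n)   # every core has stopped architectural execution
    applied = threading.Barrier(n)  # every core has finished applying the patch

    def patch_microcode(core: Core) -> None:
        core.interrupts_enabled = False
        agreed.wait()
        # Between the two barriers, only patch-application work runs on any core.
        core.patch_version = new_version
        if core.core_id == initiator_id:
            # Only one core writes the shared (uncore) patch hardware.
            shared_patch_hw["version"] = new_version
        applied.wait()
        core.saw_shared_version = shared_patch_hw["version"]
        core.interrupts_enabled = True

    # Models the initiating core interrupting the others: every core runs the same routine.
    threads = [threading.Thread(target=patch_microcode, args=(c,)) for c in cores]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


cores = [Core(i) for i in range(4)]
shared = {"version": 0}
apply_patch_atomically(cores, shared, new_version=7, initiator_id=2)
print([c.patch_version for c in cores], shared)
```

Because no core passes the second barrier until every core has applied the patch, no core can observe a mix of patched and unpatched peers once it resumes normal execution.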
Referring now to Fig. 24, a block diagram illustrating a multi-core microprocessor 100 according to another embodiment is shown. The microprocessor 100 of Fig. 24 is similar in many respects to the microprocessor 100 of Fig. 1. However, the microprocessor 100 of Fig. 24 further includes, in its uncore 103, a service processing unit (SPU) 2423, an SPU start address register 2497, an uncore microcode read-only memory (ROM) 2425, and an uncore microcode patch random access memory (RAM) 2408. Additionally, each core 102 includes a core PRAM 2499, a patch content-addressable memory (CAM) 2439, and a core microcode ROM 2404.
Microcode comprises microcode instructions. The microcode instructions are non-architectural instructions stored in one or more memories of the microprocessor 100 (e.g., the uncore microcode ROM 2425, the uncore microcode patch RAM 2408, and/or the core microcode ROM 2404) that are fetched by a core 102 based on a fetch address stored in a non-architectural micro-program counter (micro-PC) and used by the core 102 to implement the instructions of the instruction set architecture of the microprocessor 100. Preferably, the microcode instructions are translated by a microtranslator into microinstructions that are executed by the execution units of the core 102; in another embodiment, the microcode instructions are executed directly by the execution units, in which case the microcode instructions are themselves microinstructions. The microcode instructions are non-architectural in that they are not instructions of the instruction set architecture (ISA) of the microprocessor 100, but are encoded according to an instruction set distinct from the architectural instruction set. The non-architectural micro-PC is not defined by the ISA of the microprocessor 100 and is distinct from the architecturally defined program counter of the core 102. The microcode implements some or all of the instructions of the ISA of the microprocessor as follows. In response to decoding an ISA instruction, the core 102 transfers control to a microcode routine associated with the ISA instruction. The microcode routine comprises microcode instructions. The execution units execute the microcode instructions or, according to the preferred embodiment, the microcode instructions are further translated into microinstructions that the execution units execute. The results of the execution of the microcode instructions (or of the microinstructions translated from them) by the execution units are the results defined by the ISA instruction. Thus, the collective execution by the execution units of the microcode routine associated with the ISA instruction (or of the microinstructions translated from the routine's microcode instructions) "implements" the ISA instruction. That is, the collective execution by the execution units of the microcode instructions (or of the microinstructions translated from them) performs the operation specified by the ISA instruction on the inputs specified by the ISA instruction to produce the result defined by the ISA instruction. Additionally, the microcode instructions may be executed (or translated into microinstructions that are executed) when the microprocessor is reset, in order to configure the microprocessor.
The core microcode ROM 2404 holds microcode executed by the particular core 102 that includes it. The uncore microcode ROM 2425 also holds microcode executed by the cores 102; however, unlike the core microcode ROM 2404, the uncore ROM 2425 is shared by the cores 102. Preferably, because the access time of the uncore ROM 2425 is greater than that of the core ROM 2404, the uncore ROM 2425 holds microcode routines that require less performance and/or are executed less frequently. Additionally, the uncore ROM 2425 holds code that is fetched and executed by the SPU 2423.
The uncore microcode patch RAM 2408 is also shared by the cores 102 and holds microcode instructions executed by the cores 102. The patch CAM 2439 holds patch addresses: when a microcode fetch address matches the contents of one of the entries in the patch CAM 2439, the patch CAM 2439 outputs the corresponding patch address to a microsequencer. In this case, the microsequencer outputs the patch address as the microcode fetch address, rather than the next sequential fetch address (or the target address in the case of a branch instruction), and in response the uncore patch RAM 2408 outputs a patch microcode instruction. The patch microcode instruction is fetched from the uncore patch RAM 2408 instead of the microcode instruction that would otherwise be fetched from the uncore ROM 2425 or the core ROM 2404, for example because that microcode instruction and/or the microcode instructions following it are a source of a bug. Thus, the patch microcode instruction effectively replaces, or patches, the unwanted microcode instruction residing in the core ROM 2404 or the uncore microcode ROM 2425 at the original microcode fetch address. Preferably, the patch CAM 2439 and patch RAM 2408 are loaded in response to architectural instructions included in system software, such as the BIOS or the operating system running on the microprocessor 100.
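The patch lookup can be illustrated with a small behavioral model. In this hypothetical Python sketch, the patch CAM is modeled as a dictionary keyed by microcode fetch address (a dictionary lookup plays the role of the content match), and the ROM and patch RAM are plain containers. It models only a single redirected fetch; it does not model the micro-PC continuing sequentially through patch RAM, and none of the names come from the patent.

```python
class MicrocodeFetchUnit:
    """Hypothetical model of a microcode ROM, a patch CAM, and a patch RAM."""

    def __init__(self, rom: dict, patch_ram_size: int):
        self.rom = rom                            # fetch address -> microcode instruction
        self.patch_cam = {}                       # matching fetch address -> patch RAM address
        self.patch_ram = [None] * patch_ram_size  # patch microcode instructions

    def load_patch(self, fetch_addr: int, patch_addr: int, instruction: str) -> None:
        # Stands in for system software (e.g., the BIOS) loading the patch RAM and CAM.
        self.patch_ram[patch_addr] = instruction
        self.patch_cam[fetch_addr] = patch_addr

    def fetch(self, micro_pc: int) -> str:
        patch_addr = self.patch_cam.get(micro_pc)
        if patch_addr is not None:
            # CAM hit: the microsequencer substitutes the patch address.
            return self.patch_ram[patch_addr]
        # CAM miss: the original ROM instruction is fetched.
        return self.rom[micro_pc]


unit = MicrocodeFetchUnit({0x100: "add", 0x101: "buggy_shift", 0x102: "ret"}, patch_ram_size=4)
unit.load_patch(0x101, 0, "fixed_shift")
print([unit.fetch(a) for a in (0x100, 0x101, 0x102)])
```

The ROM contents never change; only the CAM and patch RAM are written, which is why a patch avoids re-manufacturing the ROM.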
Among other things, the uncore PRAM 116 is used by the microcode to store values used by the microcode. Some of these values are effectively constants: they are written to the uncore PRAM 116 by the microcode when the microprocessor 100 is reset, from immediate values stored in the core microcode ROM 2404 or the uncore microcode ROM 2425 or from fuses 114 blown when the microprocessor 100 is manufactured, and they are not modified during operation of the microprocessor 100 except via a patch or in response to execution of an instruction (e.g., a WRMSR instruction) that explicitly changes the value. Advantageously, these values can be changed via the patch mechanism described herein, without changing the potentially very expensive core microcode ROM 2404 or uncore microcode ROM 2425 and without blowing one or more additional fuses 114.
Additionally, the uncore PRAM 116 is used to hold patch code that is fetched and executed by the SPU 2423, as described herein.
Like the uncore PRAM 116, the core PRAM 2499 is private, or non-architectural, meaning that it is not in the architectural user program address space of the microprocessor 100. However, unlike the uncore PRAM 116, each core PRAM 2499 is read only by its respective core 102 and is not shared with the other cores 102. Like the uncore PRAM 116, the core PRAM 2499 is also used by the microcode to store values used by the microcode. Advantageously, these values can be changed via the patch mechanism described herein without changing the core microcode ROM 2404 or the uncore microcode ROM 2425.
The SPU 2423 comprises a stored-program processor that is an adjunct to, and distinct from, each of the cores 102. Although the cores 102 are architecturally capable of executing instructions of their ISA (e.g., x86 ISA instructions), the SPU 2423 is not. Thus, for example, the operating system cannot run on the SPU 2423, nor can it schedule programs of the cores' ISA (e.g., x86 programs) to run on the SPU 2423. In other words, the SPU 2423 is not a system resource managed by the operating system. Rather, the SPU 2423 performs operations used to debug the microprocessor 100. Additionally, the SPU 2423 can help measure the performance of the cores 102, among other functions. Preferably, the SPU 2423 is smaller, less complex, and consumes less power than the cores 102 (e.g., in one embodiment, the SPU 2423 includes built-in clock gating). In one embodiment, the SPU 2423 comprises a FORTH CPU core.
Asynchronous events may not be handled well by debug code executing on the cores 102. Advantageously, however, a core 102 can command the SPU 2423 to detect such an event and, in response to detecting it, perform actions such as creating a log or changing the behavior of various aspects of the cores 102 and/or of the external bus interface of the microprocessor 100. The SPU 2423 can provide the log information to the user, and it can also interact with the tracer to request that the tracer provide the log information or perform other actions. In one embodiment, the SPU 2423 has access to the registers that control the memory subsystem and the programmable interrupt controller of each core 102, and to the control registers of the shared cache memory 119.
Examples of events the SPU 2423 can detect include the following: (1) a core 102 is hung, e.g., the core 102 has not retired any instruction for a programmable number of clock cycles; (2) a core 102 loads data from an uncacheable region of memory; (3) a temperature change occurs in the microprocessor 100; (4) the operating system requests a change in the bus clock ratio of the microprocessor 100 and/or a change in its voltage level; (5) the microprocessor 100 changes its voltage level and/or bus clock ratio on its own, e.g., to save power or improve performance; (6) an internal timer of a core 102 expires; (7) a cache snoop hits a modified cache line, causing the cache line to be written back to memory; (8) the temperature, voltage, or bus clock ratio of the microprocessor 100 goes outside a respective range; (9) a user asserts an external trigger signal on an external pin of the microprocessor 100.
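Event (1) amounts to a watchdog on instruction retirement. The following Python sketch is a hypothetical behavioral model of such a detector, assuming it is clocked once per cycle and told how many instructions retired in that cycle; the class and method names are illustrative and do not describe the actual SPU hardware.

```python
class HangDetector:
    """Hypothetical watchdog for event (1): no instruction retired for `threshold` cycles."""

    def __init__(self, threshold_cycles: int):
        self.threshold = threshold_cycles  # the "programmable number of clock cycles"
        self.idle_cycles = 0

    def tick(self, instructions_retired: int) -> bool:
        """Advance one clock; return True while the hung-core event should be signaled."""
        if instructions_retired > 0:
            self.idle_cycles = 0
            return False
        self.idle_cycles += 1
        return self.idle_cycles >= self.threshold


detector = HangDetector(threshold_cycles=3)
print([detector.tick(n) for n in [1, 0, 0, 0, 1]])
```

A core stuck in an infinite loop that still retires instructions keeps resetting the counter, so this event never fires in that case.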
Advantageously, because the SPU 2423 operates independently of the code 132 running on the cores 102, it does not have the same limitations as tracer microcode executing on the cores 102. Therefore, the SPU 2423 can detect, or be notified of, events independently of the instruction execution boundaries of the cores 102 and without disrupting the state of the cores 102.
The SPU 2423 has its own code that it executes. The SPU 2423 may fetch its code from the uncore microcode ROM 2425 or from the uncore PRAM 116. That is, preferably, the SPU 2423 shares the uncore ROM 2425 and the uncore PRAM 116 with the microcode running on the cores 102. The SPU 2423 uses the uncore PRAM 116 to store its data, including the log. In one embodiment, the SPU 2423 also includes its own serial port interface, through which it can transmit the log to an external device. Advantageously, the SPU 2423 can also instruct the tracer running on a core 102 to store the log information from the uncore PRAM 116 to system memory.
The SPU 2423 communicates with the cores 102 via status registers and control registers. The SPU status register includes a bit corresponding to each of the events described above that the SPU 2423 can detect. To notify the SPU 2423 of an event, a core 102 sets the bit in the SPU status register that corresponds to the event. Some event bits are set by hardware of the microprocessor 100 and some are set by microcode of the cores 102. The SPU 2423 reads the status register to determine the list of events that have occurred. A control register includes bits corresponding to each action the SPU 2423 should take in response to detecting one of the events specified in the status register. That is, for each possible event in the status register, a set of action bits exists in the control register. In one embodiment, there are 16 action bits per event. In one embodiment, writing the status register to indicate an event causes the SPU 2423 to be interrupted, in response to which the SPU 2423 reads the status register to determine which events have occurred. Advantageously, this saves power by reducing the need for the SPU 2423 to poll the status register. The status and control registers can also be read and written by user programs that execute instructions such as RDMSR and WRMSR.
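The register layout described above (one status bit per event, a group of 16 action bits per event) can be sketched as plain bit manipulation. In this hypothetical Python sketch, the event bit positions simply follow the numbering of the event list, and the control register is modeled as one wide integer; the real bit assignments, register widths, and MSR addresses are not given in the text.

```python
from enum import IntEnum


class SpuEvent(IntEnum):
    # Assumed bit positions, following the order of the event list (1) through (9).
    CORE_HUNG = 0
    UNCACHEABLE_LOAD = 1
    TEMPERATURE_CHANGE = 2
    OS_RATIO_OR_VOLTAGE_REQUEST = 3
    SELF_RATIO_OR_VOLTAGE_CHANGE = 4
    CORE_TIMER_EXPIRED = 5
    SNOOP_HIT_MODIFIED = 6
    OUT_OF_RANGE = 7
    EXTERNAL_TRIGGER = 8


ACTION_BITS_PER_EVENT = 16
ACTION_MASK = (1 << ACTION_BITS_PER_EVENT) - 1


def set_event(status: int, event: SpuEvent) -> int:
    # A core (or hardware) reports an event by setting its status bit.
    return status | (1 << event)


def pending_events(status: int) -> list:
    # The SPU reads the status register to list the events that occurred.
    return [e for e in SpuEvent if status & (1 << e)]


def set_actions(control: int, event: SpuEvent, action_mask: int) -> int:
    # Replace the 16 action bits belonging to one event.
    shift = event * ACTION_BITS_PER_EVENT
    return (control & ~(ACTION_MASK << shift)) | ((action_mask & ACTION_MASK) << shift)


def actions_for(control: int, event: SpuEvent) -> int:
    # Extract the 16 action bits the SPU should perform for one event.
    return (control >> (event * ACTION_BITS_PER_EVENT)) & ACTION_MASK


status = set_event(0, SpuEvent.TEMPERATURE_CHANGE)
status = set_event(status, SpuEvent.EXTERNAL_TRIGGER)
control = set_actions(0, SpuEvent.TEMPERATURE_CHANGE, 0b101)
for e in pending_events(status):
    print(e.name, bin(actions_for(control, e)))
```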
The set of actions that the SPU 2423 can perform in response to detecting an event includes the following.
(1) Write log information into the uncore PRAM 116. For each write-log action, multiple action bits exist that allow the programmer to specify that only a particular subset of the log information should be written.
(2) Write the log information in the uncore PRAM 116 to the serial port interface.
(3) Write one of the control registers to set a tracer event. That is, the SPU 2423 can interrupt a core 102 and cause the tracer microcode to perform a set of actions associated with the event. The actions may be specified in advance by the user. In one embodiment, when the SPU 2423 writes the control register to set the event, this causes a machine check exception on the core 102, and the machine check exception handler checks whether the tracer is activated. If so, the machine check exception handler transfers control to the tracer. The tracer reads the control register and, if the event set in the control register is one for which the user has enabled the tracer, the tracer performs the actions previously specified by the user for that event. For example, the SPU 2423 can set an event that causes the tracer to write the log information stored in the uncore PRAM 116 into system memory.
(4) Write a control register to cause the microcode to branch to a microcode address specified by the SPU 2423. This is particularly helpful if the microcode is stuck in an infinite loop such that the tracer cannot perform any meaningful action, yet the core 102 is still executing and retiring instructions, meaning that the event indicating that the processor is hung will not occur.
(5) Write a control register to cause a core 102 to be reset. As mentioned above, the SPU 2423 can detect a core 102 that is hung (for example, one that has not retired any instruction for a programmable amount of time) and reset that core. The reset microcode can check whether the reset was initiated by the SPU 2423 and, if so, help write the log information out to system memory before the log information is cleared during initialization of the core 102.
(6) Log events continuously. In this mode, rather than waiting to be interrupted by an event, the SPU 2423 spins in a loop that checks the status register, continuously logs information associated with the indicated events to the uncore PRAM 116, and may optionally also write the log information to the serial port interface.
(7) Write a control register to stall requests issued by a core 102 to the shared cache memory 119, and/or to stall the shared cache memory 119 from acknowledging requests to the cores 102. This is particularly useful for debugging memory-subsystem-related design bugs, such as page tablewalk hardware bugs, and such a bug may even be worked around while the microprocessor 100 is operating, for example by a patch that changes the SPU 2423 program code, as described below.
(8) Write a control register of the external bus interface controller of the microprocessor 100 to perform transactions on the external system bus, such as special cycles or memory read/write cycles.
(9) Write a control register of the programmable interrupt controller of a core 102, for example to generate an interrupt to another core 102, to emulate an I/O device to the cores 102, or to fix a bug in the interrupt controller.
(10) Write a control register of the shared cache memory 119 to control its size, for example to disable or enable different ways of the shared cache memory 119.
(11) Write control registers of various functional units of the cores 102 to configure different performance features, such as branch prediction and data prefetch algorithms.
As described below, the SPU 2423 program code can be patched, so that even after the design of the microprocessor 100 is complete and the microprocessor 100 has been manufactured, the SPU 2423 can be made to perform the actions described herein to work around design defects or to perform other functions.
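The tracer path of action (3) can be sketched as below. This is a hypothetical model: the `Tracer` class, its `enabled_events` table, and the string-valued actions are illustrative assumptions, not the patent's data structures. It captures the two checks described above: the machine check handler hands control to the tracer only if the tracer is activated, and the tracer acts only on events the user enabled.

```python
from dataclasses import dataclass, field


@dataclass
class Tracer:
    active: bool
    # event number -> actions the user specified in advance (hypothetical encoding)
    enabled_events: dict = field(default_factory=dict)

    def run(self, control_event: int) -> list:
        """Perform the user's pre-specified actions if the event set in the
        control register is one the user enabled the tracer for."""
        return list(self.enabled_events.get(control_event, []))


def machine_check_handler(tracer: Tracer, control_event: int) -> list:
    """Entered when the SPU writes the control register to set a tracer event."""
    if not tracer.active:
        return []                     # tracer not activated: no tracer actions
    return tracer.run(control_event)  # transfer control to the tracer
```

With `Tracer(active=True, enabled_events={7: ["write PRAM log to system memory"]})`, setting event 7 yields that action, while an event the user did not enable yields none.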
The SPU start address register 2497 holds the address from which the SPU 2423 begins fetching instructions when it is brought out of reset. The SPU start address register 2497 is written by the cores 102. The address may be located in the uncore PRAM 116 or in the uncore microcode ROM 2425.
Figure 25 is a block diagram illustrating a microcode patch 2500 according to an embodiment of the invention. In the embodiment of Figure 25, the microcode patch 2500 includes the following portions: a header 2502; an immediate patch 2504; a checksum 2506 of the immediate patch 2504; CAM data 2508; a core PRAM patch 2512; a checksum 2514 of the CAM data 2508 and the core PRAM patch 2512; a RAM patch 2516; an uncore PRAM patch 2518; and a checksum 2522 of the core PRAM patch 2512 and the RAM patch 2516. The checksums 2506/2514/2522 enable the microprocessor 100 to verify the integrity of the various portions of the patch after they are loaded into the microprocessor 100. Preferably, the microcode patch 2500 is read from system memory and/or from a non-volatile store, for example from the system BIOS or extensible firmware in a ROM or FLASH memory. The header 2502 describes each portion of the patch 2500, such as its size, the location in its respective patch-related memory into which the portion is loaded, and a valid flag indicating whether the portion contains a valid patch to be applied to the microprocessor 100.
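The portion-plus-checksum layout can be illustrated with a small model. The source does not specify the checksum algorithm or the encoding of any portion, so the 32-bit byte sum below, the field names, and the omission of the header 2502 are all assumptions made for illustration. The checksum coverage follows the description above.

```python
from dataclasses import dataclass


def checksum32(*parts: bytes) -> int:
    """Hypothetical checksum: 32-bit sum of all bytes in the given portions.
    (The actual algorithm is not specified.)"""
    return sum(b for part in parts for b in part) & 0xFFFF_FFFF


@dataclass
class MicrocodePatch:
    # Field names mirror the portions of Figure 25; encodings are assumptions.
    immediate_patch: bytes     # 2504
    immediate_checksum: int    # 2506
    cam_data: bytes            # 2508
    core_pram_patch: bytes     # 2512
    cam_pram_checksum: int     # 2514
    ram_patch: bytes           # 2516
    uncore_pram_patch: bytes   # 2518
    pram_ram_checksum: int     # 2522

    def verify(self) -> list:
        """Return the names of portions whose loaded copy fails its checksum."""
        bad = []
        if checksum32(self.immediate_patch) != self.immediate_checksum:
            bad.append("immediate patch 2504")
        if checksum32(self.cam_data, self.core_pram_patch) != self.cam_pram_checksum:
            bad.append("CAM data 2508 / core PRAM patch 2512")
        if checksum32(self.core_pram_patch, self.ram_patch) != self.pram_ram_checksum:
            bad.append("core PRAM patch 2512 / RAM patch 2516")
        return bad
```

Because each checksum is verified after its portions are loaded, a corrupted RAM patch is reported without invalidating portions whose checksums still match.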
The immediate patch 2504 includes program code (e.g., instructions, preferably microcode instructions) that is loaded into the uncore microcode patch RAM 2408 of Figure 24 (e.g., at block 2612 of Figures 26A-26B) and then executed by each core 102 (e.g., at block 2616 of Figures 26A-26B). The patch 2500 also specifies the address in the patch RAM 2408 at which the immediate patch 2504 is to be loaded. Preferably, the immediate patch 2504 code changes default values written by the reset microcode, such as values written to configuration registers that affect the configuration of the microprocessor 100. Once the immediate patch 2504 has been executed out of the patch RAM 2408 by each core, it is not executed again. Furthermore, the subsequent loading of the RAM patch 2516 into the patch RAM 2408 (e.g., at block 2632 of Figures 26A-26B) may overwrite the immediate patch 2504 in the patch RAM 2408.
The RAM patch 2516 includes the patch microcode instructions that replace microcode instructions in the core ROM 2404 or the uncore ROM 2425 that need to be patched. The RAM patch 2516 also includes the addresses of the locations in the patch RAM 2408 into which the patch microcode instructions are written when the patch 2500 is applied (e.g., at block 2632 of Figures 26A-26B). The CAM data 2508 is loaded into the patch CAM 2439 of each core 102 (e.g., at block 2626 of Figures 26A-26B). As described above with respect to the operation of the patch CAM 2439, the CAM data 2508 includes one or more entries, each of which includes a pair of microcode fetch addresses. The first address is the address of the microcode instruction to be patched and is the content matched against the microcode fetch address. The second address points to the location in the patch RAM 2408 that holds the patch microcode instruction executed in place of the patched microcode instruction.
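The CAM redirection described above can be modeled as a lookup keyed by fetch address. This is a hedged sketch: modeling the patch CAM 2439 as a dictionary and the ROM and patch RAM as separate address spaces are simplifications chosen for clarity, and `PatchCam` and `fetch` are hypothetical names.

```python
from typing import Optional


class PatchCam:
    """Hypothetical model of a per-core patch CAM 2439. Each entry pairs the
    fetch address of a ROM microcode instruction to be patched with the
    patch RAM address of its replacement instruction."""

    def __init__(self) -> None:
        self.entries: dict = {}

    def load(self, cam_data: list) -> None:
        """Load (rom_fetch_address, patch_ram_address) pairs."""
        self.entries = dict(cam_data)

    def match(self, fetch_address: int) -> Optional[int]:
        return self.entries.get(fetch_address)


def fetch(fetch_address: int, cam: PatchCam, rom: dict, patch_ram: dict):
    """On a CAM hit, fetch the replacement from patch RAM; otherwise from ROM."""
    hit = cam.match(fetch_address)
    return patch_ram[hit] if hit is not None else rom[fetch_address]
```

With a single entry `(0x101, 0x0)`, a fetch from 0x100 still returns the ROM instruction, while a fetch from 0x101 is redirected to the instruction at patch RAM address 0x0.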
Unlike the immediate patch 2504, the RAM patch 2516 remains in the patch RAM 2408 and, together with the patch CAM 2439 operating according to the CAM data 2508, continues to patch the core microcode ROM 2404 and/or the uncore microcode ROM 2425 until it is replaced by another patch 2500 or the microprocessor 100 is reset.
The core PRAM patch 2512 includes data to be written into the core PRAM 2499 of each core 102 and, for each entry of the data, the address in the core PRAM 2499 to which the entry is written (e.g., at block 2626 of Figures 26A-26B). The uncore PRAM patch 2518 includes data to be written into the uncore PRAM 116 and, for each entry of the data, the address in the uncore PRAM 116 to which the entry is written (e.g., at block 2632 of Figures 26A-26B).
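Applying a PRAM patch amounts to writing each (address, data) entry at its address. In the minimal sketch below, the PRAM is modeled as a `bytearray` with byte addresses and variable-length entries; that encoding and the function name are assumptions. All entries are bounds-checked before any write, so a malformed patch leaves the PRAM untouched.

```python
def apply_pram_patch(pram: bytearray, entries: list) -> None:
    """Write each (address, data) entry of a PRAM patch into the PRAM.
    Hypothetical encoding: byte addresses, variable-length byte strings."""
    # Validate every entry first so a bad entry cannot leave a partial update.
    for address, data in entries:
        if address < 0 or address + len(data) > len(pram):
            raise IndexError(f"entry at {address:#x} exceeds PRAM size")
    for address, data in entries:
        pram[address:address + len(data)] = data
```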
Figures 26A-26B are a flowchart illustrating operation of the microprocessor 100 of Figure 24 to propagate the microcode patch 2500 of Figure 25 to the multiple cores 102 of the microprocessor 100. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the invention so that together the cores propagate the microcode patch to all cores 102 of the microprocessor 100. Figures 26A-26B describe the operation of the core that encounters the instruction to apply the microcode patch, whose flow begins at block 2602, and the operation of the other cores 102, whose flow begins at block 2652. It should be understood that multiple patches 2500 may be applied to the microprocessor 100 at different times during its operation. For example, a first patch 2500 may be applied according to the atomic embodiments described herein when the system that includes the microprocessor 100 boots, such as during BIOS initialization, and a second patch 2500 may be applied after the operating system is running, which is particularly useful for debugging the microprocessor 100.
At block 2602, one of the cores 102 encounters an instruction that instructs it to apply the microcode patch to the microprocessor 100. Preferably, the microcode patch is similar to the microcode patch described above. In one embodiment, the apply-microcode-patch instruction is an x86 WRMSR instruction. In response to the apply-microcode-patch instruction, the core 102 disables interrupts and traps to the microcode that implements the apply-microcode-patch instruction. It should be understood that the system software that includes the apply-microcode-patch instruction may include a sequence of multiple instructions that prepare for applying the microcode patch. Preferably, however, the microcode patch is propagated to all cores atomically at the architectural instruction level, in response to a single architectural instruction of the sequence. That is, once interrupts are disabled in the first core 102 (e.g., at block 2602, where the first core 102 encounters the apply-microcode-patch instruction), they remain disabled while the executing microcode propagates the microcode patch and applies it to all cores 102 of the microprocessor 100 (e.g., until after block 2652); furthermore, once interrupts are disabled in the other cores 102 (e.g., at block 2652), they remain disabled until the microcode patch has been applied to all cores 102 of the microprocessor 100 (e.g., until after block 2634). Thus, advantageously, the microcode patch is propagated to and applied in all cores 102 of the microprocessor 100 atomically at the architectural instruction level. Flow proceeds to block 2604.
At block 2604, the core 102 obtains ownership of the hardware semaphore 118 of Figure 1. Preferably, the microprocessor 100 includes a hardware semaphore 118 associated with patching microcode. Preferably, the core 102 obtains ownership of the hardware semaphore 118 in a manner similar to that described above with respect to Figure 20, more specifically blocks 2004 and 2006. The hardware semaphore 118 is needed because one of the cores 102 could be applying a patch 2500 in response to encountering an apply-microcode-patch instruction while a second core 102 encounters an apply-microcode-patch instruction and, in response, begins applying a second patch 2500. This could result in incorrect execution, for example due to misapplication of the first patch 2500. Flow proceeds to block 2606.
At block 2606, the core 102 sends a patch message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 traps to microcode in response to the apply-microcode-patch instruction (block 2602) or in response to the interrupt (block 2652), stays in microcode until block 2634, and keeps interrupts disabled for that entire period (i.e., the microcode does not allow itself to be interrupted). Flow proceeds from block 2606 to block 2608.
At block 2652, one of the other cores 102 (i.e., a core other than the core 102 that encountered the apply-microcode-patch instruction at block 2602) is interrupted by the inter-core interrupt sent at block 2606 and receives the patch message. In one embodiment, the core 102 takes the interrupt at the next architectural instruction boundary (e.g., the next x86 instruction boundary). In response to the interrupt, the core 102 disables interrupts and traps to the microcode that handles the patch message. As noted above, although the flow at block 2652 is described from the perspective of a single core 102, each of the other cores 102 (i.e., each core 102 other than the one at block 2602) is interrupted and receives the message at block 2652 and performs the steps of blocks 2608 through 2634. Flow proceeds from block 2652 to block 2608.
At block 2608, the core 102 writes a synchronization request for sync condition 21 (denoted SYNC 21 in Figures 26A-26B) to its sync register 108, in response to which the control unit 104 puts the core 102 to sleep and subsequently wakes it when all cores 102 have written SYNC 21. Flow proceeds to decision block 2611.
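The sync-register handshake can be modeled with a condition variable. In this sketch the `ControlUnit` class is a hypothetical stand-in for the control unit 104 and sync registers 108. It assumes all cores write the same sequence of sync conditions (a mismatch would leave cores asleep) and uses a generation counter so that a core only wakes once its round has actually completed.

```python
import threading


class ControlUnit:
    """Hypothetical model of control unit 104: each core writes a sync
    condition into its sync register 108 and sleeps; when every core has
    written the same condition, all cores are woken together."""

    def __init__(self, num_cores: int) -> None:
        self.num_cores = num_cores
        self.sync_regs = [None] * num_cores
        self.generation = 0                  # incremented each time cores are woken
        self.cv = threading.Condition()

    def write_sync(self, core: int, condition: int) -> None:
        with self.cv:
            self.sync_regs[core] = condition
            round_id = self.generation
            if all(r == condition for r in self.sync_regs):
                self.sync_regs = [None] * self.num_cores  # clear for the next round
                self.generation += 1
                self.cv.notify_all()                      # wake every sleeping core
            else:
                while self.generation == round_id:       # sleep until this round completes
                    self.cv.wait()
```

Each core thread calls `write_sync(core_id, 21)`, then `write_sync(core_id, 22)`, and so on. No core returns from a given condition until the last core has written it.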
At decision block 2611, the core 102 determines whether it is the core 102 that encountered the apply-microcode-patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2612; otherwise, flow proceeds to block 2614.
At block 2612, the core 102 loads the immediate patch 2504 portion of the microcode patch 2500 into the uncore patch RAM 2408. Additionally, the core 102 generates a checksum of the loaded immediate patch 2504 and verifies that it matches the checksum 2506. Preferably, the core 102 also sends information to the other cores 102 indicating the length of the immediate patch 2504 and the location in the uncore patch RAM 2408 at which it was loaded. Advantageously, because all cores 102 are known to be executing the same microcode, namely the microcode that implements the microcode patch application, it is safe to overwrite the uncore patch RAM 2408 with the new patch even if a previous RAM patch 2516 is present in it: there will be no hits in the patch CAM 2439 during this time (assuming the microcode that implements the patch application is not itself patched). In another embodiment, the core 102 loads the immediate patch 2504 into the uncore PRAM 116 and, before the immediate patch 2504 is executed at block 2616, copies it from the uncore PRAM 116 to the uncore patch RAM 2408. Preferably, the core 102 loads the immediate patch into a portion of the uncore PRAM 116 reserved for this purpose, that is, a portion not used for other purposes such as holding values used by the microcode (e.g., core 102 state, TPM state, or microcode constants, as described above) and not part of the portion of the uncore PRAM 116 that can be patched (e.g., at block 2632), so that any previous uncore PRAM patch 2518 is not clobbered. In one embodiment, the loading into or copying from the uncore PRAM 116 is performed in multiple stages to reduce the size required for the reserved portion. Flow proceeds to block 2614.
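The multi-stage copy can be illustrated by bouncing the patch through a reserved region smaller than the patch itself. This is a hypothetical sketch: `staged_copy`, the `bytearray` memories, and using the reserved region's length as the stage size are assumptions. It shows why staging lets the reserved portion of the uncore PRAM 116 be smaller than the whole immediate patch.

```python
def staged_copy(src: bytes, dest: bytearray, dest_addr: int,
                reserved: bytearray) -> int:
    """Copy `src` into `dest` at `dest_addr`, one reserved-region-sized
    stage at a time. Returns the number of stages used."""
    stage_size = len(reserved)
    if stage_size == 0:
        raise ValueError("reserved region must be non-empty")
    if dest_addr < 0 or dest_addr + len(src) > len(dest):
        raise IndexError("patch does not fit at the destination address")
    stages = 0
    for offset in range(0, len(src), stage_size):
        chunk = src[offset:offset + stage_size]
        reserved[:len(chunk)] = chunk                  # load this stage into the reserved region
        start = dest_addr + offset
        dest[start:start + len(chunk)] = reserved[:len(chunk)]  # copy the stage out
        stages += 1
    return stages
```

For example, a 10-byte patch copied through a 4-byte reserved region takes three stages (4 + 4 + 2 bytes).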
At block 2614, the core 102 writes a synchronization request for sync condition 22 (denoted SYNC 22 in Figures 26A-26B) to its sync register 108, in response to which the control unit 104 puts the core 102 to sleep and subsequently wakes it when all cores 102 have written SYNC 22. Flow proceeds to block 2616.
At block 2616, the core 102 executes the immediate patch 2504 in the uncore patch RAM 2408. As described above, in one embodiment, the core 102 first copies the immediate patch 2504 from the uncore PRAM 116 to the uncore patch RAM 2408 and then executes it. Flow proceeds to block 2618.
At block 2618, the core 102 writes a synchronization request for sync condition 23 (denoted SYNC 23 in Figures 26A-26B) to its sync register 108, in response to which the control unit 104 puts the core 102 to sleep and subsequently wakes it when all cores 102 have written SYNC 23. Flow proceeds to decision block 2621.
At decision block 2621, the core 102 determines whether it is the core 102 that encountered the apply-microcode-patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2622; otherwise, flow proceeds to block 2624.
At block 2622, the core 102 loads the CAM data 2508 and the core PRAM patch 2512 into the uncore PRAM 116. Additionally, the core 102 generates a checksum of the loaded CAM data 2508 and core PRAM patch 2512 and verifies that it matches the checksum 2514. Preferably, the core 102 also sends information to the other cores 102 indicating the lengths of the CAM data 2508 and the core PRAM patch 2512 and the locations in the uncore PRAM 116 at which they were loaded. Preferably, the core 102 loads the CAM data 2508 and the core PRAM patch 2512 into a reserved portion of the uncore PRAM 116 so that any previous uncore PRAM patch 2518 is not clobbered, in a manner similar to that described for block 2612. Flow proceeds to block 2624.
At block 2624, the core 102 writes a synchronization request for sync condition 24 (denoted SYNC 24 in Figures 26A-26B) to its sync register 108, in response to which the control unit 104 puts the core 102 to sleep and subsequently wakes it when all cores 102 have written SYNC 24. Flow proceeds to block 2626.
At block 2626, the core 102 loads the CAM data 2508 from the uncore PRAM 116 into its patch CAM 2439. Additionally, the core 102 loads the core PRAM patch 2512 from the uncore PRAM 116 into its core PRAM 2499. Advantageously, because all cores are known to be executing the same microcode that implements the microcode patch application, it is safe to load the CAM data 2508 into the patch CAM 2439 even though the corresponding RAM patch 2516 has not yet been written into the uncore patch RAM 2408 (which will occur at block 2632): there will be no hits in the patch CAM 2439 during this time (assuming the microcode that implements the patch application is not patched). Furthermore, because all cores 102 are known to be executing the same microcode that implements the patch application, and interrupts will not be enabled in any core 102 until the patch 2500 has been propagated to all cores 102, any updates to the core PRAM 2499 performed by the core PRAM patch 2512, including updates to values that may affect the operation of the core 102 (e.g., feature settings), are guaranteed not to be architecturally visible until the patch 2500 has been propagated to all cores 102. Flow proceeds to block 2628.
At block 2628, the core 102 writes a synchronization request for sync condition 25 (denoted SYNC 25 in Figures 26A-26B) to its sync register 108, in response to which the control unit 104 puts the core 102 to sleep and subsequently wakes it when all cores 102 have written SYNC 25. Flow proceeds to decision block 2631.
At decision block 2631, the core 102 determines whether it is the core 102 that encountered the apply-microcode-patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2632; otherwise, flow proceeds to block 2634.
At block 2632, the core 102 loads the RAM patch 2516 into the uncore patch RAM 2408. Additionally, the core 102 loads the uncore PRAM patch 2518 into the uncore PRAM 116. In one embodiment, the uncore PRAM patch 2518 includes program code executed by the SPU 2423. In one embodiment, the uncore PRAM patch 2518 includes updates to values used by the microcode, as described above. In one embodiment, the uncore PRAM patch 2518 includes both SPU 2423 program code and updates to values used by the microcode. Advantageously, it is safe to overwrite the uncore patch RAM 2408 with the RAM patch 2516 at this point: all cores 102 are known to be executing the same microcode that implements the patch application, and more specifically, the patch CAMs 2439 of all cores 102 have already been loaded with the new CAM data 2508 (e.g., at block 2626), so there will be no hits in the patch CAM 2439 during this time (assuming the microcode that implements the patch application is not patched). Furthermore, because all cores 102 are known to be executing the same microcode that implements the patch application, and interrupts will not be enabled in any core 102 until the patch 2500 has been propagated to all cores 102, any updates to the uncore PRAM 116 performed by the uncore PRAM patch 2518, including updates to values that may affect the operation of the cores 102 (e.g., feature settings), are guaranteed not to be architecturally visible until the patch 2500 has been propagated to all cores 102. Flow proceeds to block 2634.
At block 2634, the core 102 writes a synchronization request for sync condition 26 (denoted SYNC 26 in Figures 26A-26B) to its sync register 108, in response to which the control unit 104 puts the core 102 to sleep and subsequently wakes it when all cores 102 have written SYNC 26. Flow ends at block 2634.
After block 2634, if program code for the SPU 2423 was loaded into the uncore PRAM 116, the patching core 102 then also causes the SPU 2423 to begin executing that program code, as described with respect to Figure 30. Additionally, after block 2634, the patching core 102 releases the hardware semaphore 118 that it obtained at block 2604. Furthermore, after block 2634, each core 102 re-enables interrupts.
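The whole flow of Figures 26A-26B can be simulated with threads, one per core. This is a sketch under stated assumptions, not the microcode itself:
- `threading.Barrier` stands in for the sync registers 108 and control unit 104.
- A `Lock` stands in for the hardware semaphore 118, and an `Event` stands in for the inter-core interrupt.
- Dictionaries model the patch RAM 2408, the uncore PRAM 116, the patch CAMs 2439 and the core PRAMs 2499.
- "Executing" the immediate patch just records a value it would configure.
- Checksum verification is omitted.

```python
import threading

NUM_CORES = 3  # as in the Figure 27 example


class Uncore:
    """Shared state; every name here is a hypothetical stand-in."""
    def __init__(self) -> None:
        self.semaphore = threading.Lock()         # hardware semaphore 118
        self.sync = threading.Barrier(NUM_CORES)  # sync registers 108 + control unit 104
        self.ipi = threading.Event()              # inter-core interrupt
        self.mailbox = None                       # patch message (block 2606)
        self.patch_ram: dict = {}                 # uncore patch RAM 2408
        self.pram: dict = {}                      # uncore PRAM 116
        self.log: list = []                       # (sync condition, core id)
        self.log_lock = threading.Lock()


class Core:
    def __init__(self, cid: int, uncore: Uncore) -> None:
        self.cid, self.u = cid, uncore
        self.patch_cam: dict = {}                 # patch CAM 2439
        self.core_pram: dict = {}                 # core PRAM 2499
        self.config = None                        # value set by the immediate patch
        self.interrupts_enabled = True

    def sync(self, n: int) -> None:
        with self.u.log_lock:
            self.u.log.append((n, self.cid))
        self.u.sync.wait()                        # sleep until all cores write SYNC n

    def apply(self, patch=None) -> None:
        """`patch` is given only to the core that hit the instruction."""
        self.interrupts_enabled = False           # blocks 2602 / 2652
        leader = patch is not None
        if leader:
            self.u.semaphore.acquire()            # block 2604
            self.u.mailbox = patch                # block 2606: patch message
            self.u.ipi.set()                      # block 2606: inter-core interrupt
        self.sync(21)                             # block 2608
        if leader:
            self.u.patch_ram.update(patch["immediate"])   # block 2612
        self.sync(22)                             # block 2614
        self.config = self.u.patch_ram.get(0)     # block 2616: run the immediate patch
        self.sync(23)                             # block 2618
        if leader:                                # block 2622: stage in uncore PRAM
            self.u.pram["cam"] = dict(patch["cam"])
            self.u.pram["core_pram"] = dict(patch["core_pram"])
        self.sync(24)                             # block 2624
        self.patch_cam = dict(self.u.pram["cam"])         # block 2626
        self.core_pram.update(self.u.pram["core_pram"])
        self.sync(25)                             # block 2628
        if leader:                                # block 2632 (may overwrite the immediate patch)
            self.u.patch_ram.update(patch["ram"])
            self.u.pram.update(patch["uncore_pram"])
        self.sync(26)                             # block 2634
        if leader:
            self.u.semaphore.release()            # release semaphore from block 2604
        self.interrupts_enabled = True            # re-enable interrupts


def other_core(core: Core) -> None:
    core.u.ipi.wait()                             # block 2652: take the interrupt
    core.apply()


def propagate(patch: dict) -> tuple:
    u = Uncore()
    cores = [Core(i, u) for i in range(NUM_CORES)]
    threads = [threading.Thread(target=other_core, args=(c,), daemon=True)
               for c in cores[1:]]
    threads.append(threading.Thread(target=cores[0].apply, args=(patch,), daemon=True))
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return u, cores
```

Because each barrier round ends before the next begins, every core executes the immediate patch before the RAM patch overwrites address 0 of the patch RAM. Each core also sees the staged CAM data before it loads its own patch CAM.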
Figure 27 is a timing diagram illustrating an example of operation of the microprocessor according to the flowchart of Figures 26A-26B. In this example, a microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102. In the timing diagram, events occur in the following sequence.
Core 0 receives a request to patch microcode (per block 2602) and, in response, obtains the hardware semaphore 118 (per block 2604). Core 0 then sends a microcode patch message and an interrupt to core 1 and core 2 (per block 2606). Core 0 then writes SYNC 21 and goes to sleep (per block 2608).
Each of core 1 and core 2 is eventually interrupted from its current task and reads the message (per block 2652). In response, each of core 1 and core 2 writes SYNC 21 and goes to sleep (per block 2608). As shown, the cores may write SYNC 21 at different times, for example because of the latency of the instruction being executed when the interrupt is asserted.
When all cores have written SYNC 21, the control unit 104 wakes all of them simultaneously (per block 2608). Core 0 then loads the immediate patch 2504 into the uncore PRAM 116 (per block 2612), writes SYNC 22, and goes to sleep (per block 2614). Each of core 1 and core 2 writes SYNC 22 and goes to sleep (per block 2614).
When all cores have written SYNC 22, the control unit 104 wakes all of them simultaneously (per block 2614). Each core executes the immediate patch 2504 (per block 2616), writes SYNC 23, and goes to sleep (per block 2618).
When all cores have written SYNC 23, the control unit 104 wakes all of them simultaneously (per block 2618). Core 0 then loads the CAM data 2508 and the core PRAM patch 2512 into the uncore PRAM 116 (per block 2622), writes SYNC 24, and goes to sleep (per block 2624).
When all cores have written SYNC 24, the control unit 104 wakes all of them simultaneously (per block 2624). Each core then loads its patch CAM 2439 with the CAM data 2508 and loads its core PRAM 2499 with the core PRAM patch 2512 (per block 2626), writes SYNC 25, and goes to sleep (per block 2628).
When all cores have written SYNC 25, the control unit 104 wakes all of them simultaneously (per block 2628). Core 0 then loads the RAM patch 2516 into the uncore patch RAM 2408 and the uncore PRAM patch 2518 into the uncore PRAM 116 (per block 2632), writes SYNC 26, and goes to sleep (per block 2634).
When all cores have written SYNC 26, the control unit 104 wakes all of them simultaneously (per block 2634). As described above, if program code for the SPU 2423 was loaded into the uncore PRAM 116 at block 2632, core 0 then also causes the SPU 2423 to begin executing that program code, as described below with respect to Figure 30.
Figure 28 is a block diagram illustrating a multi-core microprocessor 100 according to another embodiment. The microprocessor 100 is similar in many respects to the microprocessor 100 of Figure 24. However, the microprocessor 100 of Figure 28 does not include an uncore patch RAM; instead, each core 102 includes a core patch RAM 2808 that provides a function similar to that of the uncore patch RAM 2408 of Figure 24. Unlike the uncore patch RAM 2408, however, the core patch RAM 2808 in each core 102 is dedicated to its respective core 102 and is not shared with the other cores 102.
Figures 29A-29B are a flowchart illustrating operation of the microprocessor 100 of Figure 28, according to another embodiment, to propagate a microcode patch to the multiple cores 102 of the microprocessor 100. In the alternate embodiment of Figures 28 and 29A-29B, the patch 2500 of Figure 25 may be modified so that the checksum 2514 also covers the RAM patch 2516, rather than only the CAM data 2508 and the core PRAM patch 2512. This enables the microprocessor 100 to verify the integrity of the CAM data 2508, the core PRAM patch 2512 and the RAM patch 2516 after they are loaded into the microprocessor 100 (e.g., at block 2922 of Figures 29A-29B). The flowchart of Figures 29A-29B is similar in many respects to the flowchart of Figures 26A-26B, and like-numbered blocks are similar. However, block 2912 replaces block 2612, block 2916 replaces block 2616, block 2922 replaces block 2622, block 2926 replaces block 2626, and block 2932 replaces block 2632. At block 2912, the core 102 loads the immediate patch 2504 into the uncore PRAM 116 (rather than into an uncore patch RAM). At block 2916, before executing the immediate patch 2504, the core 102 copies it from the uncore PRAM 116 to its core patch RAM 2808. At block 2922, the core 102 loads the RAM patch 2516 into the uncore PRAM 116 in addition to the CAM data 2508 and the core PRAM patch 2512. At block 2926, in addition to loading the CAM data 2508 from the uncore PRAM 116 into its patch CAM 2439 and the core PRAM patch 2512 from the uncore PRAM 116 into its core PRAM 2499, the core 102 also loads the RAM patch 2516 from the uncore PRAM 116 into its core patch RAM 2808. At block 2932, unlike block 2632 of Figures 26A-26B, the core 102 does not load the RAM patch 2516 into an uncore patch RAM.
As may be observed from the embodiments described above, the microcode patch 2500 is advantageously propagated atomically to the relevant memories 2439/2499/2808 of each core 102 of the microprocessor 100 and to the relevant uncore memories 2408/116. This is done in a manner that ensures the integrity and validity of the patch 2500 even though multiple cores 102 execute concurrently and share resources. If the patch were applied in a conventional manner under these conditions, the cores 102 could clobber portions of one another's patches.
Patching service processor program code
Figure 30 is a flowchart illustrating operation of the microprocessor 100 of Figure 24 to patch service processor program code. Flow begins at block 3002.
At block 3002, the core 102 loads program code to be executed by the SPU 2423 into the uncore PRAM 116 at a patch address specified by a patch, as described above with respect to block 2632 of Figures 26A-26B. Flow proceeds to block 3004.
At block 3004, the core 102 causes the SPU 2423 to execute the program code at the patch address, i.e., the address in the uncore PRAM 116 at which the SPU 2423 program code was written at block 3002. In one embodiment, the SPU 2423 is configured to fetch its reset vector from the start address register 2497 (i.e., the address from which the SPU 2423 begins fetching instructions after coming out of reset). The core 102 writes the patch address into the start address register 2497 and then writes a control register that causes the SPU 2423 to be reset. Flow proceeds to block 3006.
At block 3006, the SPU 2423 begins fetching program code at the patch address (e.g., fetches its first instruction from it), i.e., from the address in the uncore PRAM 116 at which the SPU 2423 program code was written at block 3002. Typically, the SPU 2423 patch code residing in the uncore PRAM 116 will execute a jump to SPU 2423 program code residing in the uncore ROM 2425. Flow ends at block 3006.
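The sequence of blocks 3002-3006 can be sketched with a toy SPU. The tuple instruction format (`nop`, `jump`, `halt`), the `(space, address)` encoding of the start address register 2497, and the default reset vector in ROM are hypothetical choices made only for illustration. The sketch shows the core loading patch code, pointing the start address register at it and resetting the SPU, which then runs the patch code and jumps into ROM.

```python
class Spu:
    """Hypothetical SPU model that fetches from start address register 2497 on reset."""

    def __init__(self, rom: dict, pram: dict) -> None:
        self.rom, self.pram = rom, pram
        self.start_address = ("rom", 0)   # assumption: default reset vector in ROM
        self.trace: list = []

    def reset(self, max_steps: int = 10) -> None:
        """Come out of reset and run from the start address (toy ISA)."""
        self.trace = []
        space, addr = self.start_address
        for _ in range(max_steps):
            op = (self.pram if space == "pram" else self.rom)[addr]
            self.trace.append((space, addr, op[0]))
            if op[0] == "jump":
                space, addr = op[1], op[2]
            elif op[0] == "halt":
                return
            else:
                addr += 1


def core_patch_spu(spu: Spu, patch_address: int, patch_code: list) -> None:
    for i, op in enumerate(patch_code):            # block 3002: load into uncore PRAM
        spu.pram[patch_address + i] = op
    spu.start_address = ("pram", patch_address)    # block 3004: write start address register
    spu.reset()                                    # block 3004: write control register to reset
```

For example, patch code `[("nop",), ("jump", "rom", 0x10)]` loaded at 0x40 runs two instructions from PRAM, then continues with the ROM code at 0x10.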
The ability to patch the SPU 2423 program code may be particularly useful. For example, the SPU 2423 may be used for performance testing that is essentially temporary; it may be undesirable for the performance-testing SPU 2423 program code to become a permanent part of the microprocessor 100, as opposed to being present only in development parts and not in production parts. As another example, the SPU 2423 may be used to find and/or fix bugs. As another example, the SPU 2423 may be used to configure the microprocessor 100.
Atomic propagation of updates to per-core instantiated architecturally visible storage resources
Referring to Figure 31, a block diagram is shown illustrating a multi-core microprocessor 100 according to another embodiment. The microprocessor 100 is similar in many respects to the microprocessor 100 of Figure 24. However, each core 102 of the microprocessor 100 of Figure 31 also includes architecturally visible Memory Type Range Registers (MTRRs) 3102. That is, each core 102 instantiates the architecturally visible MTRRs 3102, even though system software requires the MTRRs 3102 to be consistent across all cores 102 (described in more detail below). The MTRRs 3102 are one example of a per-core instantiated architecturally visible storage resource; other embodiments of such resources are described below. (Although not shown, each core 102 also includes the core PRAM 2499, core microcode ROM 2404, and patch CAM 2439 of Figure 24 and, in one embodiment, the core microcode patch RAM 2808 of Figure 28.)
The MTRRs 3102 provide a way for system software to associate a memory type with each of multiple different physical address ranges in the system memory address space of the microprocessor 100. Examples of memory types include strong uncacheable, uncacheable, write-combining, write-through, write-back, and write-protected. Each MTRR 3102 specifies (explicitly or implicitly) a memory range and its memory type. Collectively, the values of the MTRRs 3102 define a memory map that specifies the memory types of the different memory ranges. In one embodiment, the MTRRs 3102 are similar to those described in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3: System Programming Guide, September 2013, particularly Section 11.11, which is incorporated herein by reference.
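To make the notion of an MTRR-defined memory map concrete, the following sketch looks up the memory type of a physical address from x86 variable-range MTRR pairs, using the match rule and overlap precedence described in the Intel manual cited above. It is a simplified illustration, not the microprocessor 100's logic: it ignores fixed-range MTRRs and the enable bits of the default-type register, and `memory_type` is a hypothetical helper name.

```python
# x86 MTRR memory-type encodings (Intel SDM Vol. 3, Sec. 11.11)
UC, WC, WT, WP, WB = 0, 1, 4, 5, 6

VALID = 1 << 11          # valid bit of a PHYSMASK register
PAGE_MASK = ~0xFFF       # base and mask fields occupy bits 12 and above


def memory_type(addr, variable_mtrrs, default_type=UC):
    """Return the memory type of `addr` given a list of (PHYSBASE, PHYSMASK) pairs."""
    hits = set()
    for physbase, physmask in variable_mtrrs:
        if not physmask & VALID:
            continue                                   # range disabled
        mask = physmask & PAGE_MASK
        if (addr & mask) == (physbase & mask):         # address falls inside this range
            hits.add(physbase & 0xFF)                  # type field is bits 7:0 of PHYSBASE
    if not hits:
        return default_type
    if UC in hits:
        return UC                                      # UC wins any overlap
    if hits == {WT, WB}:
        return WT                                      # WT wins a WT/WB overlap
    if len(hits) == 1:
        return hits.pop()
    raise ValueError("overlapping ranges with undefined combined type")
```

For example, with a 2 GiB write-back range at 0x80000000 and a 256 MiB uncacheable range at 0x90000000 (36-bit physical addresses), an address in the overlap resolves to UC. Because every core consults its own copy of these registers, differing copies would give different cores different memory maps, which is the consistency problem discussed next.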
It is desirable for the memory map defined by the MTRRs 3102 to be identical on all cores of the microprocessor 100, so that software running on the microprocessor 100 has a consistent view of memory. However, conventional processors provide no hardware support for keeping the MTRRs consistent across the cores of a multi-core processor. As noted at the bottom of page 11-20 of Volume 3 of the Intel manual cited above, "P6 and more recent processor families provide no hardware support to maintain this consistency" [of the MTRR values]. System software is therefore responsible for maintaining MTRR consistency across cores. Section 11.11.8 of the Intel manual cited above describes an algorithm by which system software keeps the MTRRs of a multi-core processor consistent across its cores when updating them; for example, all cores execute instructions to update their respective MTRRs.
In contrast, embodiments are described herein in which system software may update one instance of the MTRRs 3102 in one of the cores 102, and the cores 102 advantageously propagate the update atomically to the respective instances of the MTRRs 3102 in all cores 102 of the microprocessor 100 (in a manner similar to the microcode patching performed in the embodiments described above with respect to Figures 24 through 30). This provides a way to keep the MTRRs 3102 of the different cores 102 consistent at the architectural instruction level.
Referring to Figure 32, a flowchart is shown illustrating operation of the microprocessor 100 of Figure 31 to propagate an MTRR 3102 update to the multiple cores 102 of the microprocessor 100. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively propagate the MTRR 3102 update to all cores 102 of the microprocessor 100. More specifically, Figure 32 describes the operation of the core that encounters the instruction that updates the MTRR 3102, whose flow begins at block 3202, and the operation of the other cores 102, whose flow begins at block 3252.
In block 3202, one of the cores 102 encounters an instruction that instructs the core to update its MTRR 3102. That is, the MTRR update instruction includes an MTRR 3102 identifier and an update value to be written to the MTRR 3102. In one embodiment, the MTRR update instruction is an x86 WRMSR instruction, which specifies the update value in the EDX:EAX registers and the MTRR 3102 identifier in the ECX register; the identifier is an MSR address in the MSR address space of the core 102. In response to the MTRR update instruction, the core 102 disables interrupts and traps to the microcode that executes the MTRR update instruction. It should be understood that the system software that includes the MTRR update instruction may include a sequence of multiple instructions in preparation for the MTRR 3102 update. Preferably, however, the MTRRs 3102 of all cores 102 are updated atomically at the architectural instruction level in response to a single architectural instruction of the sequence. That is, once interrupts are disabled in the first core 102 (i.e., in block 3202, where that core encounters the MTRR update instruction), they remain disabled while the executing microcode propagates the new MTRR 3102 value to all cores 102 of the microprocessor 100 (i.e., until after block 3218). Likewise, once interrupts are disabled in the other cores 102 (i.e., in block 3252), they remain disabled until the MTRRs 3102 of all cores 102 of the microprocessor 100 have been updated (i.e., until after block 3218). Advantageously, therefore, the new MTRR 3102 value is propagated to all cores 102 of the microprocessor 100 atomically at the architectural instruction level. Flow proceeds to block 3204.
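For reference, the x86 WRMSR operand convention used in the embodiment above can be sketched as follows: ECX carries the MSR address (the MTRR identifier) and EDX:EAX carries the 64-bit update value, with EDX holding the upper 32 bits. `wrmsr_operands` is a hypothetical helper; it says nothing about how the microcode itself decodes the instruction.

```python
IA32_MTRR_PHYSBASE0 = 0x200   # architectural MSR address of the first variable-range PHYSBASE register


def wrmsr_operands(msr_address, value):
    """Split a 64-bit MSR write into the register operands WRMSR expects."""
    if not 0 <= value < 1 << 64:
        raise ValueError("WRMSR writes a 64-bit value")
    return {
        "ECX": msr_address,          # MSR address, i.e. which MTRR is updated
        "EDX": value >> 32,          # upper 32 bits of the update value
        "EAX": value & 0xFFFF_FFFF,  # lower 32 bits of the update value
    }
```

For example, `wrmsr_operands(IA32_MTRR_PHYSBASE0, 0xF_8000_0006)` yields ECX = 0x200, EDX = 0xF, and EAX = 0x80000006, i.e. a PHYSBASE value with base 0xF80000000 and type 6 (write-back).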
In block 3204, the core 102 obtains ownership of the hardware semaphore 118 of Figure 1. Preferably, the microprocessor 100 includes a hardware semaphore 118 associated with the MTRRs 3102. Preferably, the core 102 obtains ownership of the hardware semaphore 118 in a manner similar to that described above with respect to Figure 20, more specifically blocks 2004 and 2006. The hardware semaphore 118 is needed because one core 102 may be performing an MTRR 3102 update in response to encountering an MTRR update instruction while a second core 102 encounters an MTRR update instruction and, in response, begins its own update of the MTRRs 3102, which could result in incorrect operation. Flow proceeds to block 3206.
In block 3206, the core 102 sends an MTRR update message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 remains in microcode with interrupts disabled (i.e., the microcode does not allow itself to be interrupted) for the entire period from trapping to the microcode in response to the MTRR update instruction (in block 3202) or in response to the interrupt (in block 3252) until block 3218. Flow proceeds to block 3208.
In block 3252, one of the other cores 102 (i.e., a core other than the core 102 that encountered the MTRR update instruction in block 3202) is interrupted by the inter-core interrupt sent in block 3206 and receives the MTRR update message. In one embodiment, the core 102 takes the interrupt at the next architectural instruction boundary (e.g., the next x86 instruction boundary). In response to the interrupt, the core 102 disables interrupts and traps to the microcode that handles the MTRR update message. As noted above, although the flow at block 3252 is described from the perspective of a single core 102, each of the other cores 102 (i.e., every core not at block 3202) is interrupted and receives the message in block 3252 and performs the steps of blocks 3208 through 3218. Flow proceeds from block 3252 to block 3208.
In block 3208, the core 102 writes a sync request for sync condition 31 (denoted SYNC 31 in Figure 32) to its sync register 108 and is put to sleep by the control unit 104; the control unit 104 wakes it when all cores 102 have written SYNC 31. Flow proceeds to decision block 3211.
In decision block 3211, the core 102 determines whether it is the core 102 that encountered the MTRR update instruction in block 3202 (as opposed to a core 102 that received the MTRR update message in block 3252). If so, flow proceeds to block 3212; otherwise, flow proceeds to block 3214.
In block 3212, the core 102 loads into non-core PRAM 116 the identifier of the MTRR being updated by the MTRR update instruction and the MTRR update value, so that all the other cores 102 can see them. In an x86 embodiment, the MTRRs 3102 include: (1) fixed-range MTRRs, each comprising a single 64-bit MSR updated by a single WRMSR instruction, and (2) variable-range MTRRs, each comprising two 64-bit MSRs, each written by a different WRMSR instruction, i.e., the two WRMSR instructions specify different MSR addresses. For a variable-range MTRR, one of the MSRs (the PHYSBASE register) contains the base address of the memory range and a type field specifying the memory type, and the other MSR (the PHYSMASK register) contains a valid bit and a mask field that sets the range mask. Preferably, the core 102 loads the MTRR update value into non-core PRAM 116 as follows.
1. If the MSR is the PHYSMASK register, the core 102 loads into non-core PRAM 116 a 128-bit update value comprising the new 64-bit value specified by the WRMSR instruction (which contains the valid bit and mask value) and the current value of the PHYSBASE register (which contains the base value and type value).
2. If the MSR is the PHYSBASE register:
a. If the valid bit in the PHYSMASK register is set, the core 102 loads into non-core PRAM 116 a 128-bit update value comprising the new 64-bit value specified by the WRMSR instruction (which contains the base value and type value) and the current value of the PHYSMASK register (which contains the valid bit and mask value).
b. If the valid bit in the PHYSMASK register is clear, the core 102 loads into non-core PRAM 116 a 64-bit update value comprising only the new 64-bit value specified by the WRMSR instruction (which contains the base value and type value).
In addition, if the update value written is a 128-bit value, the core 102 sets a flag in non-core PRAM 116; if it is a 64-bit value, the core 102 clears the flag. Flow proceeds from block 3212 to block 3214.
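The staging rules of block 3212 can be sketched as a small function. This is a hypothetical illustration under stated assumptions: it uses the architectural x86 layout in which PHYSBASEn sits at MSR 0x200 + 2n and PHYSMASKn at 0x201 + 2n, treats everything else as a single 64-bit MSR, takes the number of variable ranges `vcnt` as a parameter (real parts report it in IA32_MTRRCAP), and packs a 128-bit pair with PHYSMASK in the upper half and PHYSBASE in the lower half. The packing order is an assumption; the text does not specify it.

```python
from dataclasses import dataclass

PHYSMASK_VALID = 1 << 11      # valid bit of a PHYSMASKn register


@dataclass
class StagedUpdate:
    msr: int      # MTRR identifier (MSR address taken from ECX)
    value: int    # 64- or 128-bit update value placed in non-core PRAM 116
    wide: bool    # the flag: True when value carries a PHYSBASE/PHYSMASK pair


def stage_mtrr_update(msr, new_value, regs, vcnt=8):
    """Block 3212 sketch; `regs` maps MSR address -> this core's current value."""
    if not 0x200 <= msr < 0x200 + 2 * vcnt:
        return StagedUpdate(msr, new_value, wide=False)          # fixed-range: one 64-bit MSR
    if msr & 1:                                                  # case 1: PHYSMASKn (odd address)
        return StagedUpdate(msr, (new_value << 64) | regs[msr - 1], wide=True)
    physmask = regs[msr + 1]                                     # case 2: PHYSBASEn (even address)
    if physmask & PHYSMASK_VALID:                                # case 2a: valid bit set
        return StagedUpdate(msr, (physmask << 64) | new_value, wide=True)
    return StagedUpdate(msr, new_value, wide=False)              # case 2b: valid bit clear
```

Carrying the partner register along in cases 1 and 2a means the receiving cores install a complete, self-consistent base/mask pair in one step.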
In block 3214, the core 102 writes a sync request for sync condition 32 (denoted SYNC 32 in Figure 32) to its sync register 108 and is put to sleep by the control unit 104; the control unit 104 wakes it when all cores 102 have written SYNC 32. Flow proceeds to block 3216.
In block 3216, the core 102 reads from non-core PRAM 116 the MTRR 3102 identifier and MTRR update value written in block 3212 and updates its MTRR 3102 accordingly. Advantageously, the MTRR update value is propagated atomically, so that any update to an MTRR 3102 that might affect the operation of the respective core 102 is guaranteed to remain architecturally invisible until the update value has been propagated to the MTRRs 3102 of all cores 102: all cores are known to be executing in the same microcode that is performing the MTRR update instruction, and interrupts are not enabled on any core 102 until the update value has been propagated to the respective MTRRs 3102 of all cores 102. In the embodiment described above with respect to block 3212, if the flag is set, the core 102 also updates the PHYSMASK or PHYSBASE register (in addition to the specified MSR); otherwise, if the flag is clear, the core 102 updates only the specified MSR. Flow proceeds to block 3218.
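The receiving side of block 3216 can be sketched in the same spirit. This hypothetical helper assumes the staged value uses PHYSMASKn in its upper 64 bits and PHYSBASEn in its lower 64 bits, and that PHYSBASEn is the even MSR address of each variable-range pair (as in the x86 layout starting at 0x200); `apply_mtrr_update` and `regs` are illustrative names, not taken from the source.

```python
PAIR_LOW = (1 << 64) - 1   # selects the lower 64 bits of a 128-bit staged value


def apply_mtrr_update(msr, value, wide, regs):
    """Block 3216 sketch: install a staged update into this core's copy of the MTRRs.

    `regs` maps MSR address -> value; `wide` is the flag read from non-core PRAM 116.
    """
    if not wide:
        regs[msr] = value                   # flag clear: update only the specified MSR
        return
    physbase_msr = msr & ~1                 # even address of the PHYSBASEn/PHYSMASKn pair
    regs[physbase_msr] = value & PAIR_LOW   # PHYSBASEn from the lower 64 bits
    regs[physbase_msr + 1] = value >> 64    # PHYSMASKn from the upper 64 bits
```

Whichever register of the pair the WRMSR named, a set flag rewrites both halves, so no core is ever left holding a new base with a stale mask or vice versa.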
In block 3218, the core 102 writes a sync request for sync condition 33 (denoted SYNC 33 in Figure 32) to its sync register 108 and is put to sleep by the control unit 104; the control unit 104 wakes it when all cores 102 have written SYNC 33. Flow ends at block 3218.
After block 3218, the core 102 that obtained the hardware semaphore 118 in block 3204 releases it. Furthermore, after block 3218, each core 102 re-enables interrupts.
As Figures 31 and 32 show, system software running on the microprocessor 100 of Figure 31 can advantageously execute an MTRR update instruction on a single core 102 of the microprocessor 100 to update the specified MTRR 3102 of all cores 102 of the microprocessor 100, rather than executing an MTRR update instruction separately on each core 102, which helps preserve system integrity.
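The overall choreography of Figure 32 can be simulated with threads standing in for cores. This is a hedged sketch, not the microcode: a `threading.Lock` stands in for hardware semaphore 118, a reusable `threading.Barrier` for the SYNC 31/32/33 rendezvous managed by control unit 104, a `threading.Event` for the inter-core interrupt of block 3206, and a dict for non-core PRAM 116. Interrupt masking is not modeled, and the class and function names are hypothetical.

```python
import threading


class Uncore:
    """Hypothetical stand-in for the shared resources used in Figure 32."""

    def __init__(self, num_cores):
        self.semaphore = threading.Lock()          # hardware semaphore 118
        self.sync = threading.Barrier(num_cores)   # SYNC 31/32/33 rendezvous (reusable)
        self.ipi = threading.Event()               # inter-core interrupt of block 3206
        self.pram = {}                             # non-core PRAM 116


def core_flow(core_id, initiator, update, uncore, mtrrs):
    """One core's path through Figure 32."""
    if core_id == initiator:                       # block 3202: encounters the update instruction
        uncore.semaphore.acquire()                 # block 3204
        uncore.ipi.set()                           # block 3206: message plus inter-core interrupt
    else:
        uncore.ipi.wait()                          # block 3252: interrupted, enters microcode
    uncore.sync.wait()                             # block 3208: SYNC 31
    if core_id == initiator:                       # decision block 3211
        uncore.pram["mtrr_update"] = update        # block 3212: stage identifier and value
    uncore.sync.wait()                             # block 3214: SYNC 32
    msr, value = uncore.pram["mtrr_update"]        # block 3216: every core applies the update
    mtrrs[core_id][msr] = value
    uncore.sync.wait()                             # block 3218: SYNC 33
    if core_id == initiator:
        uncore.semaphore.release()                 # after block 3218


def propagate_mtrr_update(update, num_cores=4, initiator=0):
    """Run all simulated cores and return each core's resulting MTRR dict."""
    uncore = Uncore(num_cores)
    mtrrs = [{} for _ in range(num_cores)]
    threads = [threading.Thread(target=core_flow, args=(i, initiator, update, uncore, mtrrs))
               for i in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mtrrs
```

The barrier after block 3212 guarantees that no core reads the staged value before it is written, and the barrier at block 3218 guarantees that no core leaves the sequence until every core has applied the update.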
One specific MTRR 3102 instantiated in each core 102 is the System Management Range Register (SMRR) 3102. Because the memory range specified by the SMRR 3102 holds the code and data associated with System Management Mode (SMM) operation, such as a System Management Interrupt (SMI) handler, that range is referred to as the SMRAM region. When code running on a core 102 attempts to access the SMRAM region, the core 102 allows the access only if the core 102 is running in SMM; otherwise, the core 102 ignores writes to the SMRAM region and returns a fixed value for each read from the SMRAM region. Furthermore, if a core 102 operating in SMM attempts to execute code outside the SMRAM region, the core 102 raises a machine check exception. Additionally, the core 102 allows code to write the SMRR 3102 only when the core 102 is operating in SMM. This protects the SMM code and data in the SMRAM region. In one embodiment, the SMRR 3102 is similar to that described in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3: System Programming Guide, September 2013, particularly Sections 11.11.2.4 and 34.4.2.1, which are incorporated herein by reference.
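The SMRR access rules above can be sketched as a single access-check function. This is an illustrative model under assumptions: the SMRR is represented as an MTRR-style (PHYSBASE, PHYSMASK) pair with a valid bit at bit 11, `FIXED_READ_VALUE` is a hypothetical placeholder because the text says only "a fixed value", instruction fetches are treated like reads, and the rule that only SMM code may write the SMRR itself is not modeled.

```python
VALID = 1 << 11
PAGE_MASK = ~0xFFF
FIXED_READ_VALUE = 0xFF   # hypothetical placeholder for the unspecified "fixed value"


class MachineCheck(Exception):
    """Raised when code running in SMM is fetched from outside the SMRAM region."""


def in_smram(addr, smrr_base, smrr_mask):
    mask = smrr_mask & PAGE_MASK
    return bool(smrr_mask & VALID) and (addr & mask) == (smrr_base & mask)


def access(kind, addr, in_smm, smrr, memory, value=None):
    """Apply the SMRR rules to one access; kind is "read", "write", or "fetch"."""
    hit = in_smram(addr, *smrr)
    if kind == "fetch" and in_smm and not hit:
        raise MachineCheck(hex(addr))        # SMM code executing outside SMRAM
    if hit and not in_smm:
        if kind == "write":
            return None                      # write to SMRAM outside SMM is dropped
        return FIXED_READ_VALUE              # read or fetch outside SMM sees a fixed value
    if kind == "write":
        memory[addr] = value
        return None
    return memory.get(addr, 0)
```

Note that the check consults only the SMRR of the core performing the access, which is why the per-core copies must agree, as the next paragraph explains.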
Typically, each core 102 has its own instance of SMM code and data in memory. It is desirable to protect the SMM code and data of each core 102 not only from code running on that core but also from code running on other cores 102. To accomplish this with the SMRRs 3102, system software typically places the multiple SMM code and data instances in adjacent blocks of memory; that is, the SMRAM region is a single contiguous memory region containing all the SMM code and data instances. If the SMRRs 3102 of all cores 102 of the microprocessor 100 hold values specifying the entirety of this single contiguous memory region, code running on one core outside SMM cannot modify the SMM code and data instance of another core 102. If there is a window of time during which the SMRR 3102 values differ among the cores 102, i.e., the SMRRs 3102 hold different values in different cores 102 of the microprocessor 100 and any of those values covers strictly less than the entire single contiguous region containing all SMM code and data instances, the system may be vulnerable to a security attack, which could be serious given the nature of SMM. Embodiments that propagate SMRR 3102 updates atomically may therefore be particularly advantageous.
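The vulnerability window described above can be expressed as a simple check: the SMRAM region is fully protected only if every core's SMRR covers all of it. The sketch below assumes MTRR-style base/mask pairs describing naturally aligned power-of-two ranges and a 36-bit physical address width (`maxphyaddr`); both the helper names and the parameter values are illustrative assumptions.

```python
VALID = 1 << 11


def smrr_range(physbase, physmask, maxphyaddr=36):
    """Return (start, end) covered by an SMRR pair, end exclusive, or None if invalid."""
    if not physmask & VALID:
        return None
    addr_bits = (1 << maxphyaddr) - 1
    mask = physmask & ~0xFFF & addr_bits
    size = (~mask & addr_bits) + 1           # naturally aligned power-of-two size
    start = physbase & mask
    return start, start + size


def smram_fully_protected(per_core_smrrs, smram_start, smram_end, maxphyaddr=36):
    """True only if every core's SMRR covers the whole contiguous SMRAM region."""
    for physbase, physmask in per_core_smrrs:
        r = smrr_range(physbase, physmask, maxphyaddr)
        if r is None or r[0] > smram_start or r[1] < smram_end:
            return False                     # this core leaves part of SMRAM exposed
    return True
```

With a 16 MiB SMRAM region at 0x7F000000, four cores holding the full-region SMRR pass the check; if a non-atomic update leaves one core with an 8 MiB range, the check fails, which corresponds to the window that atomic propagation closes.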
In addition, other embodiments are contemplated in which updates to other per-core instantiated architecturally visible storage resources of the microprocessor 100 are propagated atomically in a manner similar to the method described above. For example, in one embodiment, each core 102 instantiates certain bit fields of the x86 IA32_MISC_ENABLE MSR, and a WRMSR executed on one core 102 is propagated to all cores 102 of the microprocessor 100 in a manner similar to that described above. Embodiments are also contemplated in which execution of a WRMSR on one core 102 to other MSRs that are instantiated in all cores 102 of the microprocessor 100, whether architectural or model-specific, and whether current or future, is propagated to all cores 102 of the microprocessor 100 in a manner similar to that described above.
In addition, although the embodiments describe MTRRs as the per-core instantiated architecturally visible storage resource, other embodiments are contemplated in which the per-core instantiated resources belong to instruction set architectures other than the x86 ISA, or are resources other than MTRRs. For example, resources other than MTRRs include CPUID values and MSRs that report functionality, such as Vectored Multimedia eXtensions (VMX) functionality.
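A microcode WRMSR handler following these embodiments would first decide whether the target MSR takes the atomic propagation path. The policy below is purely hypothetical and illustrative: the MSR addresses are the architectural x86 ones (variable-range pairs from 0x200, fixed-range MTRRs, IA32_MTRR_DEF_TYPE, the SMRR pair, and IA32_MISC_ENABLE), but which MSRs a real part propagates, and which IA32_MISC_ENABLE bit fields are per-core, are not specified by the text.

```python
IA32_MISC_ENABLE = 0x1A0
IA32_SMRR_PHYSBASE, IA32_SMRR_PHYSMASK = 0x1F2, 0x1F3
IA32_MTRR_DEF_TYPE = 0x2FF
FIXED_RANGE_MTRRS = {0x250, 0x258, 0x259, *range(0x268, 0x270)}   # 64K, 16K and 4K fixed ranges


def propagate_on_wrmsr(msr, vcnt=8):
    """Hypothetical policy: should a WRMSR to `msr` be propagated atomically to all cores?"""
    variable_range = 0x200 <= msr < 0x200 + 2 * vcnt    # PHYSBASEn / PHYSMASKn pairs
    return (variable_range
            or msr in FIXED_RANGE_MTRRS
            or msr in (IA32_MTRR_DEF_TYPE, IA32_SMRR_PHYSBASE,
                       IA32_SMRR_PHYSMASK, IA32_MISC_ENABLE))
```

Keeping the decision in one table-like function would let new per-core MSRs be added to the propagation path without changing the propagation mechanism itself.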
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Those skilled in the art may make changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the appended claims. For example, software can enable the function, fabrication, modeling, simulation, description and/or testing of the apparatus and methods described herein. This can be accomplished through the use of general programming languages (e.g., C, C++) or hardware description languages (HDL), including Verilog HDL, VHDL, and so on. Such software can be embodied as program code in tangible media, such as any machine-readable (e.g., computer-readable) storage medium, for example semiconductor memory, a magnetic disk, a hard disk, or an optical disc (e.g., CD-ROM, DVD-ROM, etc.), wherein, when the program code is loaded and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention. The methods and apparatus of the present invention may also be transmitted as program code over some transmission medium, such as electrical wire or cable, optical fiber, or any other form of transmission, wherein, when the program code is received, loaded, and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code combines with the processor to provide a unique apparatus that operates analogously to application-specific logic circuits. The apparatus and methods of the present invention may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL), and transformed into hardware in the production of integrated circuits. Additionally, the apparatus and methods of the present invention may be embodied as a combination of hardware and software.
Therefore, the scope of protection of the present invention is defined by the appended claims. Finally, those skilled in the art can readily use the disclosed concepts and specific embodiments as a basis for making changes and refinements that carry out the same purposes of the present invention without departing from its spirit and scope.

Claims (17)

1. A microprocessor, comprising:
an indicator; and
a plurality of processing cores,
wherein each processing core of the plurality of processing cores is configured to sample the indicator;
when the indicator indicates a first predetermined value, the plurality of processing cores are configured to collectively designate a default processing core of the plurality of processing cores as a bootstrap processor;
when the indicator indicates a second predetermined value different from the first predetermined value, the plurality of processing cores are configured to collectively designate a processing core of the plurality of processing cores other than the default processing core as the bootstrap processor; and
the designated bootstrap processor fetches instructions at an architecturally defined reset vector and executes the instructions.
2. The microprocessor according to claim 1, wherein each processing core of the plurality of processing cores is configured to:
generate a respective different default interrupt controller identifier associated with the processing core; and
when the indicator indicates the second predetermined value, modify the respective different default interrupt controller identifiers such that each processing core of the plurality of processing cores has a new respective different default interrupt controller identifier.
3. The microprocessor according to claim 2, wherein:
to designate the default processing core of the plurality of processing cores as the bootstrap processor, the plurality of processing cores are configured to designate the processing core of the plurality of processing cores that generates the lowest respective different default interrupt controller identifier; and
to designate a processing core of the plurality of processing cores other than the default processing core as the bootstrap processor, the plurality of processing cores are configured to designate the processing core of the plurality of processing cores whose modified respective different default interrupt controller identifier is the lowest.
4. The microprocessor according to claim 2, wherein, to modify the respective different default interrupt controller identifiers associated with each processing core of the plurality of processing cores such that each processing core has a new respective different default interrupt controller identifier, the plurality of processing cores are configured to rotate the respective different default interrupt controller identifiers among the plurality of processing cores.
5. The microprocessor according to claim 4, wherein, to rotate the respective different default interrupt controller identifiers among the plurality of processing cores, the plurality of processing cores are configured to rotate them by an amount specified by a second indicator of the microprocessor.
6. The microprocessor according to claim 2, wherein:
the microprocessor comprises a plurality of semiconductor dies;
a distinct set of the plurality of processing cores is located on each semiconductor die of the plurality of semiconductor dies;
the microprocessor is fabricated such that each processing core of the plurality of processing cores has a predetermined die number and core number; and
for each processing core of the plurality of processing cores, to generate the respective different default interrupt controller identifier, the processing core is configured to generate it based on the predetermined die number and core number of the processing core.
7. The microprocessor according to claim 1, wherein a holding register of the microprocessor comprises the indicator.
8. The microprocessor according to claim 7, wherein the holding register is configured to receive a sensed evaluation of a fuse that is either blown or not blown.
9. The microprocessor according to claim 7, wherein the holding register is configured to receive the value of the indicator from a boundary scan input.
10. The microprocessor according to claim 1, wherein the designated bootstrap processor fetches and executes instructions to boot an operating system.
11. The microprocessor according to claim 1, wherein the designated bootstrap processor performs a package sleep state handshake protocol of the microprocessor, and the other processing cores of the plurality of processing cores do not perform the package sleep state handshake protocol.
12. A method of configuring a multi-core microprocessor, the method comprising:
sampling an indicator of the microprocessor, wherein the microprocessor comprises a plurality of processing cores;
when the indicator indicates a first predetermined value, designating a default processing core of the plurality of processing cores as a bootstrap processor;
when the indicator indicates a second predetermined value different from the first predetermined value, designating a processing core of the plurality of processing cores other than the default processing core as the bootstrap processor; and
fetching, by the designated bootstrap processor, instructions at an architecturally defined reset vector and executing the instructions.
13. The method according to claim 12, further comprising:
generating a respective different default interrupt controller identifier associated with each processing core of the plurality of processing cores; and
when the indicator indicates the second predetermined value, modifying the respective different default interrupt controller identifiers associated with each processing core of the plurality of processing cores such that each processing core has a new respective different default interrupt controller identifier.
14. The method according to claim 13, wherein:
the step of designating the default processing core of the plurality of processing cores as the bootstrap processor comprises designating the processing core of the plurality of processing cores that generates the lowest respective different default interrupt controller identifier; and
the step of designating a processing core of the plurality of processing cores other than the default processing core as the bootstrap processor comprises designating the processing core of the plurality of processing cores whose modified respective different default interrupt controller identifier is the lowest.
15. The method according to claim 13, wherein the step of modifying the respective different default interrupt controller identifiers associated with each processing core of the plurality of processing cores such that each processing core has a new respective different default interrupt controller identifier comprises rotating the respective different default interrupt controller identifiers among the plurality of processing cores.
16. The method according to claim 15, wherein the step of rotating the respective different default interrupt controller identifiers among the plurality of processing cores comprises rotating them by an amount specified by a second indicator of the microprocessor.
17. The method according to claim 13, wherein:
the microprocessor comprises a plurality of semiconductor dies;
a distinct set of the plurality of processing cores is located on each semiconductor die of the plurality of semiconductor dies;
the microprocessor is fabricated such that each processing core of the plurality of processing cores has a predetermined die number and core number; and
for each processing core of the plurality of processing cores, the step of generating the respective different default interrupt controller identifier comprises generating it based on the predetermined die number and core number of the processing core.
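The identifier scheme recited in claims 2 through 6 and 13 through 17 (a default interrupt controller identifier derived from each core's die number and core number, an optional rotation by an amount given by a second indicator, and the lowest identifier marking the bootstrap processor) can be sketched as follows. The formula `die * cores_per_die + core`, the rotation direction, and the function names are assumptions made for illustration; the claims do not specify them.

```python
def default_ids(num_dies, cores_per_die):
    """Default interrupt-controller ID for each (die number, core number).

    The die * cores_per_die + core formula is an assumption for illustration.
    """
    return {(die, core): die * cores_per_die + core
            for die in range(num_dies) for core in range(cores_per_die)}


def assign_ids(num_dies, cores_per_die, indicator, rotate_by=1, first_value=0):
    """Return (ids, bsp). IDs are rotated when the sampled indicator differs from
    the first predetermined value; the core holding the lowest ID becomes the BSP."""
    ids = default_ids(num_dies, cores_per_die)
    if indicator != first_value:
        total = num_dies * cores_per_die
        # Rotation keeps the IDs unique; the core whose default ID was `rotate_by` now gets 0.
        ids = {core: (i - rotate_by) % total for core, i in ids.items()}
    bsp = min(ids, key=ids.get)
    return ids, bsp
```

Because a rotation is a permutation, every core still ends up with a distinct identifier, and selecting the minimum lets all cores agree on the bootstrap processor without further communication.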
CN201410431347.1A 2013-08-28 2014-08-28 Microprocessor and its configuration method Active CN104239274B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810985885.3A CN109165189B (en) 2013-08-28 2014-08-28 Microprocessor, method of configuring the same, and computer-readable storage medium

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201361871206P 2013-08-28 2013-08-28
US61/871,206 2013-08-28
US201361916338P 2013-12-16 2013-12-16
US61/916,338 2013-12-16
US14/281,729 US9535488B2 (en) 2013-08-28 2014-05-19 Multi-core microprocessor that dynamically designates one of its processing cores as the bootstrap processor
US14/281,729 2014-05-19

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201810985885.3A Division CN109165189B (en) 2013-08-28 2014-08-28 Microprocessor, method of configuring the same, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN104239274A CN104239274A (en) 2014-12-24
CN104239274B true CN104239274B (en) 2018-09-21

Family

ID=52227372

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201410431347.1A Active CN104239274B (en) 2013-08-28 2014-08-28 Microprocessor and its configuration method
CN201810985885.3A Active CN109165189B (en) 2013-08-28 2014-08-28 Microprocessor, method of configuring the same, and computer-readable storage medium

Country Status (1)

Country Link
CN (2) CN104239274B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009121B (en) * 2017-12-21 2021-12-07 中国电子科技集团公司第四十七研究所 Dynamic multi-core configuration method for application
CN114489817A (en) * 2021-12-28 2022-05-13 深圳市腾芯通智能科技有限公司 Processor starting method, device, equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
US6611911B1 (en) * 1999-12-30 2003-08-26 Intel Corporation Bootstrap processor election mechanism on multiple cluster bus system
US7100034B2 (en) * 2003-05-23 2006-08-29 Hewlett-Packard Development Company, L.P. System for selecting another processor to be the boot strap processor when the default boot strap processor does not have local memory
CN1997966A (en) * 2004-07-06 2007-07-11 茵姆拜迪欧有限公司 Method and system for concurrent execution of multiple kernels
WO2013049371A2 (en) * 2011-09-30 2013-04-04 Intel Corporation Constrained boot techniques in multi-core platforms

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN1331053C (en) * 2004-02-12 2007-08-08 华为技术有限公司 Flag register and method for avoiding resource access conflict between multiple processes
US7996663B2 (en) * 2007-12-27 2011-08-09 Intel Corporation Saving and restoring architectural state for processor cores
JP2012146201A (en) * 2011-01-13 2012-08-02 Toshiba Corp On-chip router and multi-core system using the same

Also Published As

Publication number Publication date
CN109165189B (en) 2020-12-08
CN109165189A (en) 2019-01-08
CN104239274A (en) 2014-12-24

Similar Documents

Publication Publication Date Title
CN104462004B (en) Microprocessor and method for synchronizing operations among its processing cores
CN104216680B (en) Microprocessor and its execution method
TWI637316B (en) Dynamic reconfiguration of multi-core processor
CN104331388B (en) Microprocessor and method for synchronizing processing cores in the microprocessor
CN104216679B (en) Microprocessor and its execution method
CN104238997B (en) Microprocessor and its execution method
CN104239274B (en) Microprocessor and its configuration method
CN104360727B (en) Microprocessor and method of saving power using the same
CN104239275B (en) Multi-core microprocessor and its relocation method
CN104331387B (en) Microprocessor and its configuration method
CN104239273B (en) Microprocessor and its execution method
CN104239272B (en) Microprocessor and its operating method
CN104216861B (en) Microprocessor and method for synchronizing processing cores in the microprocessor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant