TWI512448B - Instruction for enabling a processor wait state - Google Patents

Publication number
TWI512448B
Authority
TW
Taiwan
Prior art keywords
processor
core
low power
value
power state
Prior art date
Application number
TW099136477A
Other languages
Chinese (zh)
Other versions
TW201131349A (en
Inventor
Martin G Dixon
Scott D Rodgers
Taraneh Bahrami
Stephen H Gunther
Prashant Sethi
Per Hammarlund
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Publication of TW201131349A
Application granted
Publication of TWI512448B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • G06F1/3228 Monitoring task completion, e.g. by use of idle timers, stop commands or wait commands
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3293 Power saving characterised by the action undertaken by switching to a less power-consuming processor, e.g. sub-CPU
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30076 Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30083 Power or thermal control instructions
    • G06F9/3009 Thread control instructions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Description

Instruction for enabling a processor wait state

Technical field of the invention

The present invention relates to instructions for enabling a processor wait state.

Technical background of the invention

As processor technology evolves, processors with larger numbers of cores are becoming available. To execute software efficiently, these cores may be assigned to run different threads of a single application. Such an arrangement is called cooperative threaded software. In modern cooperative threaded software, it is common for one thread to wait for another thread to complete. Conventionally, the processor on which the waiting thread is executing consumes active power while waiting. Furthermore, the wait time may be indeterminate, so the processor may not know how long it will have to wait.

Another mechanism for making a core wait is to place the core in a wait state, such as a low power state. To do so, an operating system (OS) is invoked. The OS may execute a pair of instructions, referred to as a MONITOR instruction and an MWAIT instruction. Note that these instructions are not available to application-level software. Instead, they are used only at the OS privilege level, to set up an address range to be monitored and to place the processor in a low power state until the monitored address range is updated. However, entering the OS to execute these instructions involves significant overhead. This overhead takes the form of high latency, and it adds further complexity, because OS scheduling issues may prevent the waiting thread from being the next thread scheduled when it exits the wait state.

Summary of the invention

According to an embodiment of the present invention, there is provided a processor comprising: a core including decode logic to receive and decode an instruction from a first application, the instruction specifying an identification of a location to be monitored and a timer value, the core further including a timer coupled to the decode logic to perform a count with respect to the timer value; and a power management unit coupled to the core to determine, based at least in part on the timer value, a type of low power state for the processor and, responsive to that determination, to cause the processor to enter the low power state without involvement of an operating system (OS) if a value of the monitored location does not equal a target value and the timer value has not been exceeded.

Brief description of the drawings

Figure 1 is a flow diagram of a method in accordance with an embodiment of the present invention.

Figure 2 is a flow diagram of a test that may be performed for a target value in accordance with an embodiment of the present invention.

Figure 3 is a block diagram of a processor core in accordance with an embodiment of the present invention.

Figure 4 is a block diagram of a processor in accordance with an embodiment of the present invention.

Figure 5 is a block diagram of a processor in accordance with another embodiment of the present invention.

Figure 6 is a flow diagram of the interaction between multiple cooperative threads in accordance with an embodiment of the present invention.

Figure 7 is a block diagram of a system in accordance with an embodiment of the present invention.

Detailed description of the preferred embodiments

In various embodiments, a user-level instruction (that is, an application-level instruction) may be provided and used to allow an application to wait for one or more conditions to occur. While the application is waiting, the processor on which it is executing (for example, a core of a multicore processor) may be placed in a low power state, or may switch to executing another thread. Although the scope of the present invention is not limited in this regard, the conditions on which the processor may wait include detection of a value, expiration of a timer, or receipt of an interrupt signal, for example from another processor.

Accordingly, an application may wait for one or more operations to occur, for example in another thread, without yielding to an operating system (OS) or other supervisor software. Furthermore, based on instruction information provided with the instruction, this wait state may occur in a time-bounded manner, so that the processor can select an appropriate low power state to enter. That is, control logic of the processor itself may determine an appropriate low power state to enter based on the instruction information provided and on various calculations performed in the processor. The overhead of involving the OS to enter a low power state can thus be avoided. Note that the processor need not be waiting on another peer processor; it may instead be waiting on a co-processor, such as a floating-point co-processor or other fixed-function device.

In various embodiments, a user-level instruction may have various information associated with it, including a location to be monitored, a value to look for, and a timeout value. Although the scope of the present invention is not limited in this regard, for ease of discussion this user-level instruction is referred to as a processor wait instruction. Different flavors of such a user-level instruction may be provided, each of which may, for example, indicate a wait for a specific value, a set of values, or a range, or couple the wait with an operation, such as incrementing a counter when the value becomes true.

In general, a processor may cause various actions to occur in response to a processor wait instruction, which may include or be associated with the following instruction information: a source field, which indicates the location of a value to be tested; a timeout or deadline timer value, which indicates the point in time at which the wait state should end if the value being tested for has not been attained; and a result field, which indicates the value to be attained. In other implementations, in addition to these fields, a destination or mask field may be present, in which case the source value is masked and tested against a predetermined value (for example, whether the masked result is non-zero).

As noted above, the processor may perform various operations in response to this instruction. In general, these operations may include testing whether a value of the monitored location is at a target value (for example, performing a Boolean operation to test for a "true" condition) and testing whether the deadline timer value has been reached. If either of these conditions is met (that is, is true), or if an interrupt is received from another entity, the instruction may complete. Otherwise, a mechanism may be initiated to monitor the location to see whether the value changes. At this point, a wait state may be entered. In this wait state, the processor may enter a low power state, or may begin executing another hardware thread of the processor. If a low power state is desired, the processor may select an appropriate low power state based at least in part on the amount of time remaining until the deadline timer expires. The low power state may then be entered, and the processor may remain in that state until awakened by one of the conditions described above. Although described in terms of this general operation, it is to be understood that, in different implementations, various features and operations may occur differently.

Referring now to Figure 1, shown is a flow diagram of a method in accordance with an embodiment of the present invention. As shown in Figure 1, method 100 may be performed by a processor executing a user-level instruction that handles a processor wait operation. As seen, method 100 may begin by decoding a received instruction (block 110). As an example, the instruction may be a user-level instruction provided by an application, such as an application implemented with multiple threads, each of which includes instructions that may have some interdependence in executing a cooperative threaded application. After decoding the instruction, the processor may load a memory value into a cache and into a register (block 120). More specifically, a source operand of the instruction may identify a location, such as a memory location, from which a value is to be obtained. This value may be loaded into a cache, such as a low-level cache associated with the core executing the instruction, for example a private cache. The value may also be stored in a register of the core. As an example, this register may be a general-purpose register of a logical processor of the thread.

Control then passes to block 130, where a deadline may be calculated in response to the instruction information. More specifically, this deadline is the length of time for which the wait state should last if the condition is not met (for example, the desired value is not updated). In one embodiment, the instruction format may include information that provides a deadline timer value. To determine the appropriate time until this deadline is reached, in some implementations the received deadline timer value may be compared with a current time counter value present in the processor, such as a time stamp counter (TSC) value. In some embodiments, this difference may be loaded into a deadline timer, which may be implemented using a counter or a register. In one embodiment, the deadline timer may be a countdown timer that begins counting down. In this implementation, the difference between the deadline and the current TSC value is taken, and the countdown timer runs for that many cycles. When the TSC value exceeds the deadline, resumption of the processor is triggered. In other words, as discussed below, when the deadline timer has been decremented to zero, the wait state is terminated if it is still in progress at that time. In a register implementation, a comparator may compare the TSC counter value with the deadline on every cycle.
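The two deadline timer variants described for block 130 can be modeled in a few lines of Python. This is only an illustrative sketch: the names `remaining_cycles`, `CountdownTimer`, and `comparator_expired` are hypothetical, and treating the cycle on which the TSC reaches the deadline as expiry is an assumption, not something the description states.

```python
def remaining_cycles(deadline_tsc: int, current_tsc: int) -> int:
    """Cycles left until the deadline; zero if the deadline has already passed."""
    return max(deadline_tsc - current_tsc, 0)


class CountdownTimer:
    """Countdown variant: loaded once with the difference, then decremented each cycle."""

    def __init__(self, deadline_tsc: int, current_tsc: int) -> None:
        self.count = remaining_cycles(deadline_tsc, current_tsc)

    def tick(self) -> bool:
        """Advance one cycle and report whether the timer has reached zero."""
        if self.count > 0:
            self.count -= 1
        return self.count == 0


def comparator_expired(deadline_tsc: int, current_tsc: int) -> bool:
    """Register variant: compare the TSC against the stored deadline on every cycle.

    Assumption: reaching the deadline counts as expiry (>= rather than >).
    """
    return current_tsc >= deadline_tsc
```

Both variants expire on the same cycle in this model; the countdown form only needs the difference, while the comparator form needs the absolute deadline and a running TSC.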

The operations above thus set up the various structures to be accessed and tested during the wait state. Accordingly, a wait state may be entered. This wait state is generally part of a loop 155, which may be executed repeatedly until one of several conditions occurs. As seen, it may be determined whether a target value from the instruction information matches the value stored in the register (decision block 140). In an implementation in which the instruction information includes the target value, the data obtained from memory and stored in the register may be tested to determine whether its value matches this target value. If so, the condition has been met, and control passes to block 195, where execution of the wait instruction may complete. Completing the instruction may additionally set various flags or other values to indicate the reason for exiting the wait state. Once the instruction completes, the thread that requested the wait state may continue its operation.

Conversely, if it is determined at decision block 140 that the condition has not been met, control passes to decision block 150, where it may be determined whether the deadline has occurred. If so, the instruction completes as described above. Otherwise, control passes to decision block 160, where it may be determined whether another hardware component is seeking to wake the processor. If so, the instruction completes as described above. Otherwise, control passes to block 170, where a low power state may be determined based at least in part on the deadline timer value. That is, the processor itself may determine an appropriate low power state, without OS involvement, based on the time remaining before the deadline occurs. To make this determination, in some embodiments logic in an uncore of the processor may be used. This logic may include or be associated with a table that associates various low power states with deadline timer values, as described below. Based on the determination at block 170, the processor may enter a low power state (block 180). In the low power state, various structures of the processor, both of the core on which the instructions execute and of other components, may be placed in a low power state. The particular structures placed in a low power state, and the level of that low power state, may vary by implementation. Note that if the loop is traversed again because an updated value is not the target value, a new low power state may be determined based on the updated deadline timer value, since entering certain low power states (for example, a deep sleep state) may be inappropriate when only limited time remains.
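The table lookup at block 170 might look like the following Python sketch. The state names and cycle thresholds in `LOW_POWER_TABLE` are invented purely for illustration; the description says only that a table associates low power states with deadline timer values.

```python
# Hypothetical table, deepest state first: (minimum remaining cycles, state name).
# The thresholds and names are illustrative assumptions, not values from the patent.
LOW_POWER_TABLE = [
    (1_000_000, "deep-sleep"),
    (10_000, "sleep"),
    (0, "light-wait"),
]


def select_low_power_state(remaining_cycles: int) -> str:
    """Pick the deepest state whose minimum residency fits the remaining time."""
    for min_cycles, state in LOW_POWER_TABLE:
        if remaining_cycles >= min_cycles:
            return state
    raise ValueError("remaining_cycles must be non-negative")
```

Calling the function again with a smaller remaining time after each pass through the loop reproduces the behavior noted above: a deep sleep state is no longer chosen once only limited time is left.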

Various events may occur that cause the core to exit the low power state. Note that the low power state may be exited if the cached data (that is, the data corresponding to the monitored location) has been updated (decision block 190). If so, control returns to decision block 140. Similarly, if the deadline expires and/or a wake signal is received from another hardware component, control may pass from the low power state to one of decision blocks 150 and 160. Although shown with this high-level implementation in the embodiment of Figure 1, it is to be understood that the scope of the present invention is not limited in this regard.
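The control flow of method 100 can be summarized as a small Python simulation. It is a sketch under stated assumptions rather than a description of the hardware: each `Snapshot` stands for what the core observes on one pass through loop 155, and the names `Snapshot`, `ExitReason`, and `processor_wait` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Tuple


class ExitReason(Enum):
    VALUE_MATCHED = "value matched"
    DEADLINE_EXPIRED = "deadline expired"
    WAKE_SIGNAL = "wake signal"


@dataclass
class Snapshot:
    tsc: int            # time stamp counter when the loop is evaluated
    value: int          # current content of the monitored location
    wake: bool = False  # True if another hardware component requests a wake-up


def processor_wait(snapshots: Iterable[Snapshot], target: int,
                   deadline: int) -> Tuple[ExitReason, int]:
    """Return the exit reason and the number of passes through loop 155."""
    passes = 0
    for snap in snapshots:
        passes += 1
        if snap.value == target:      # decision block 140
            return ExitReason.VALUE_MATCHED, passes
        if snap.tsc >= deadline:      # decision block 150
            return ExitReason.DEADLINE_EXPIRED, passes
        if snap.wake:                 # decision block 160
            return ExitReason.WAKE_SIGNAL, passes
        # Blocks 170 and 180: a low power state would be chosen and entered here.
        # The next snapshot models the event that ends it (update, expiry, wake).
    raise RuntimeError("trace ended while the core was still waiting")
```

For example, a trace in which the monitored location changes twice before reaching the target completes on the third pass with `ExitReason.VALUE_MATCHED`.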

In other implementations, a masked test for a target value may occur. That is, the user-level instruction may implicitly indicate the target value to be attained. As an example, this target value may be a non-zero result of a mask operation between a source value obtained from memory and a mask value present in a source/destination operand of the instruction. In one embodiment, the user-level instruction may be a load, mask, wait if zero (LDMWZ) instruction of a processor ISA. In one embodiment, the instruction may take the format LDMWZ r32/64, M32/64. In this format, the first operand (r32/64) may store a mask, and the second operand (M32/64) may identify a source value (that is, the monitored location). In turn, a timeout value may be stored in a third register. For example, the deadline may reside in an implicit register. In particular, the EDX:EAX registers may be used, which is the same register pair that is written when the TSC counter is read. In general, the instruction performs non-busy polling of a semaphore value and enters a low power wait state if the semaphore is not available. In different implementations, both bit-wise semaphores and counting semaphores may be handled, where zero indicates that nothing is waiting. The timeout value indicates the length of time, measured in TSC cycles, that the processor should wait for a non-zero result before unconditionally resuming operation. In one embodiment, information about which physical processors are in a low power state may be provided to software via a memory-mapped register (for example, a configuration and status register (CSR)).
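Because the deadline may be passed in EDX:EAX, the same pair that a TSC read fills (high 32 bits in EDX, low 32 bits in EAX), software would need to split and join 64-bit values. The Python sketch below shows that arithmetic. The helper names are hypothetical, and computing the deadline as "current TSC plus timeout" is an assumption about how software would use the pair, not a statement from the description.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def split_edx_eax(value64: int) -> tuple:
    """Split a 64-bit value into (EDX, EAX): high and low 32-bit halves."""
    return (value64 >> 32) & MASK32, value64 & MASK32


def join_edx_eax(edx: int, eax: int) -> int:
    """Rebuild the 64-bit value from the EDX (high) and EAX (low) halves."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


def deadline_after(current_tsc: int, timeout_cycles: int) -> int:
    """Assumed usage: absolute deadline = current TSC + timeout, wrapped to 64 bits."""
    return (current_tsc + timeout_cycles) & MASK64
```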

In this embodiment, the LDMWZ instruction loads data from the source memory location, masks it with the source/destination value, and tests whether the resulting value is zero. If the masked value is not zero, the value loaded from memory is placed, unmasked, in the source/destination register. Otherwise, the processor enters a low power wait state. Note that this low power state may or may not correspond to a currently defined low power state, such as a so-called C-state in accordance with the Advanced Configuration and Power Interface (ACPI) Specification, version 4 (published June 16, 2009). The processor may remain in the low power state until the specified time interval expires, an external exception is signaled (for example, a general interrupt (INTR), a non-maskable interrupt (NMI), or a system management interrupt (SMI)), or the source memory location is written with a value that is non-zero when masked. As part of entering this wait state, the processor may clear a memory-mapped register (CSR) bit that indicates that the processor is currently waiting.

Upon exit from the wait state because the monitored location was written with a value that produces a non-zero value when masked, the non-zero value indicator of a flag register may be cleared, and the unmasked value that was read is placed in the destination register. If expiration of the timer causes the exit from the low power state, the zero value indicator of the flag register may be set, allowing software to detect that condition. If the exit occurs because of an external exception, the state of the processor and memory is such that the instruction is not considered to have executed. Thus, on return to the normal execution flow, the same LDMWZ instruction is executed again.
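The architectural outcomes of LDMWZ described above can be captured in a behavioral model. The Python sketch below is an illustration only: `Wake`, `Result`, and `ldmwz` are hypothetical names, the flag register is reduced to a single zero indicator (an assumption), and leaving the destination register unchanged on a timeout is also an assumption, since the description does not say what it holds in that case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional, Tuple


class Wake(Enum):
    WRITE = "write to monitored location"
    TIMEOUT = "timer expired"
    EXCEPTION = "external exception (INTR, NMI, SMI)"


@dataclass
class Result:
    dest: int                   # source/destination register after the instruction
    zero_flag: Optional[bool]   # zero indicator; None means the flags are untouched
    completed: bool             # False: instruction is re-executed after the handler


def ldmwz(mask: int, memory: Dict[int, int], addr: int,
          wake_events: Iterable[Tuple[Wake, Optional[int]]]) -> Result:
    """Model of load, mask, wait if zero; wake_events are (kind, written value)."""
    value = memory[addr]
    if value & mask:
        # Masked value is non-zero: return the unmasked value without waiting.
        return Result(dest=value, zero_flag=False, completed=True)
    for kind, written in wake_events:   # each event ends one low power wait
        if kind is Wake.WRITE:
            memory[addr] = written
            if written & mask:
                return Result(dest=written, zero_flag=False, completed=True)
            # Still zero when masked: go back to waiting.
        elif kind is Wake.TIMEOUT:
            # Assumption: the register keeps the mask; only the zero indicator is set.
            return Result(dest=mask, zero_flag=True, completed=True)
        else:
            # External exception: state is as if the instruction never executed.
            return Result(dest=mask, zero_flag=None, completed=False)
    raise RuntimeError("event trace ended while the processor was still waiting")
```

Software using such an instruction would check the zero indicator after it completes to tell a timeout apart from a successful acquisition.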

Referring now to Figure 2, shown is a flow diagram of a test that may be performed for a target value in accordance with another embodiment of the present invention. As shown in Figure 2, method 200 may begin by loading source data into a first register (block 210). This source data may be masked with a mask present in a second register (block 220). In various embodiments, the first and second registers may be specified by an instruction and may correspond to locations that store the source data and the destination data, respectively. It may then be determined whether the result of the mask operation is zero (decision block 230). If so, the desired condition has not been met, and the processor may enter a low power state (block 240). Otherwise, the source data may be stored into the second register (block 250), and execution of the instruction completes (block 260).

If, during the wait state, the target location is updated (as determined at decision block 265), control returns to block 220 to perform the mask operation. If it is determined that another condition has occurred during the wait state (as determined at decision block 270), control passes to block 260 to complete the instruction. Although shown with this high-level implementation in the embodiment of Figure 2, it is to be understood that the scope of the present invention is not limited in this regard.

Referring now to Figure 3, shown is a block diagram of a processor core in accordance with an embodiment of the present invention. As shown in Figure 3, processor core 300 may be a multi-stage pipelined out-of-order processor. Processor core 300 is shown in Figure 3 in a relatively simplified view to illustrate various features used in connection with a processor wait state in accordance with an embodiment of the present invention.

As shown in FIG. 3, core 300 includes front end units 310, which may be used to fetch instructions to be executed and prepare them for later use in the processor. For example, front end units 310 may include a fetch unit 301, an instruction cache 303, and an instruction decoder 305. In some implementations, front end units 310 may further include a trace cache, along with microcode storage and a micro-operation storage. Fetch unit 301 may fetch macro-instructions, e.g., from memory or from instruction cache 303, and feed them to instruction decoder 305 to decode them into primitives, i.e., micro-operations for execution by the processor. One such instruction to be handled in front end units 310 may be a user-level processor wait instruction in accordance with an embodiment of the present invention. This instruction may cause the front end units to access various micro-operations that enable execution of the operations associated with the wait instruction, such as those described above.

Coupled between front end units 310 and execution units 320 is an out-of-order (OOO) engine 315 that may be used to receive the micro-instructions and prepare them for execution. More specifically, OOO engine 315 may include various buffers to re-order the micro-instruction flow and allocate the various resources needed for execution, as well as to provide renaming of logical registers onto storage locations within various register files, such as register file 330 and extended register file 335. Register file 330 may include separate register files for integer and floating point operations. Extended register file 335 may provide storage for vector-sized units, e.g., 256 or 512 bits per register.

Various resources may be present in execution units 320, including, for example, various integer, floating point, and single instruction multiple data (SIMD) logic units, among other specialized hardware. For example, such execution units may include one or more arithmetic logic units (ALUs) 322. In addition, wakeup logic 324 in accordance with an embodiment of the present invention may be present. Such wakeup logic may be used to execute certain of the operations involved in performing a processor wait mode responsive to a user-level instruction. As discussed further below, additional logic for handling such wait states may be present in another part of a processor, such as an uncore. Also shown in FIG. 3 is a set of timers 326. The timers relevant to this discussion include a time stamp counter (TSC) timer and a deadline timer. The deadline timer can be set with a value corresponding to a deadline, and the processor will leave the wait state by that deadline if no other condition has been met. Wakeup logic 324 may initiate certain operations when the deadline timer reaches a predetermined count value (which in some embodiments may be a count down to zero). Results may be provided to retirement logic, namely a reorder buffer (ROB) 340. More specifically, ROB 340 may include various arrays and logic to receive information associated with executed instructions. ROB 340 then examines this information to determine whether the instructions can be validly retired and their result data committed to the architectural state of the processor, or whether one or more exceptions occurred that prevent proper retirement of the instructions. Of course, ROB 340 may handle other operations associated with retirement. In the context of a processor wait instruction in accordance with an embodiment of the present invention, retirement may cause ROB 340 to set the state of one or more indicators of a flag register or other status register, which can indicate the reason that a processor exited a wait state.
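The interaction between the deadline timer and the exit-reason indicators can be illustrated with a minimal model. Everything named below is an assumption made for the sketch: the class `DeadlineTimer`, the flag values `EXIT_CONDITION_MET` and `EXIT_DEADLINE`, and the choice to treat the deadline as an absolute TSC value that is converted into a countdown.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical exit-reason indicators of the kind retirement might record
# in a flag or status register; the values are not taken from the source.
EXIT_CONDITION_MET = 0x1
EXIT_DEADLINE = 0x2


@dataclass
class DeadlineTimer:
    """Countdown model of the deadline timer (illustrative only)."""
    remaining: int = 0

    def arm(self, deadline_tsc: int, current_tsc: int) -> None:
        # Assumption: the deadline is an absolute TSC value, so the countdown
        # starts from its distance to the current TSC reading.
        self.remaining = max(deadline_tsc - current_tsc, 0)

    def tick(self, cycles: int = 1) -> bool:
        """Advance the countdown; return True once it has reached zero."""
        self.remaining = max(self.remaining - cycles, 0)
        return self.remaining == 0


def simulate_wait(timer: DeadlineTimer, condition_met_after: Optional[int]) -> int:
    """Step one cycle at a time until a wake-up cause appears.

    condition_met_after -- cycle at which the monitored condition becomes
    true, or None if it never does. Returns the exit-reason flag.
    """
    elapsed = 0
    while True:
        if condition_met_after is not None and elapsed >= condition_met_after:
            return EXIT_CONDITION_MET
        if timer.remaining == 0:
            return EXIT_DEADLINE
        timer.tick()
        elapsed += 1
```

In this model the condition is checked before the deadline on each cycle, so a condition that becomes true on the very cycle the countdown expires is reported as `EXIT_CONDITION_MET`. That ordering is a design choice of the sketch, not something the text specifies.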

As shown in FIG. 3, ROB 340 is coupled to a cache 350 which, in one embodiment, may be a low level cache (e.g., an L1 cache), although the scope of the present invention is not limited in this regard. Execution units 320 may likewise be directly coupled to cache 350. As seen, cache 350 includes a monitor engine 352, which may be configured to monitor a particular cache line, namely a monitored location, and to provide feedback to wakeup logic 324 (and/or to uncore components) when the value is updated, when a change in the cache coherency state of the line occurs, and/or when the line is lost. Monitor engine 352 acquires a given line and holds it in the shared state. If monitor engine 352 ever loses the line from the shared state, it will initiate a wakeup of the processor. From cache 350, data communication may occur with higher level caches, system memory, and so forth. Although shown at this high level in the embodiment of FIG. 3, it is to be understood that the scope of the present invention is not limited in this regard.
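As a rough illustration of the monitor engine's role, the sketch below models a single monitored cache line and a wake-up callback. The class and method names, the event labels, and the 64-byte line size are assumptions made for the example; the source does not give a line size or an interface.

```python
LINE_BYTES = 64  # assumed cache-line size; the source does not give one


class MonitorEngine:
    """Toy model of monitor engine 352 (all names are illustrative)."""

    # Events that cost the engine its shared copy of the monitored line.
    WAKE_EVENTS = {"write", "state_change", "evict"}

    def __init__(self, wake):
        self._wake = wake   # callback standing in for the signal to wakeup logic 324
        self._line = None   # index of the line held in the shared state, if any

    def arm(self, address):
        """Acquire the line containing `address` and hold it in the shared state."""
        self._line = address // LINE_BYTES

    def snoop(self, address, event):
        """Report cache activity; wake the core if the monitored line is lost."""
        if self._line is None or address // LINE_BYTES != self._line:
            return False
        if event not in self.WAKE_EVENTS:
            return False
        self._line = None   # the line is no longer held in the shared state
        self._wake(event)
        return True
```

Because monitoring is per line rather than per byte, a write to any address within the same line as the monitored location triggers the wake-up in this model, and the woken core would then re-test the actual condition.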

Referring now to FIG. 4, shown is a block diagram of a processor in accordance with an embodiment of the present invention. As shown in FIG. 4, processor 400 may be a multicore processor including a plurality of cores 410a to 410n. In one embodiment, each such core may be configured as core 300 described above with regard to FIG. 3. The various cores may be coupled via an interconnect 415 to an uncore 420 that includes various components. As seen, uncore 420 may include a shared cache 430, which may be a last level cache. In addition, the uncore may include an integrated memory controller 440, various interfaces 450, and a power management unit 455. In various embodiments, at least some of the functionality associated with executing a processor wait instruction may be implemented in power management unit 455. For example, based on information received with the instruction, e.g., the deadline timer value, power management unit 455 may determine an appropriate low power state in which to place a given core that is executing the wait instruction. In one embodiment, power management unit 455 may include a table that associates timer values with low power states. Unit 455 may look up this table based on the deadline value determined for an instruction and select the corresponding wait state. In turn, power management unit 455 may generate a plurality of control signals to cause various components, both of a given core and of other processor units, to enter a low power state. As seen, processor 400 may communicate with a system memory 460 via a memory bus. In addition, interfaces 450 provide connections to various off-chip components such as peripheral devices, mass storage, and so forth. Although shown with this particular implementation in the embodiment of FIG. 4, it is to be understood that the scope of the present invention is not limited in this regard.
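The table lookup performed by the power management unit might look like the following sketch. The thresholds and the state labels are invented for illustration and do not come from the source; the idea shown is only that a longer expected wait can justify a deeper state.

```python
import bisect

# Hypothetical table: (minimum expected wait in timer ticks, state label).
# Neither the thresholds nor the labels are taken from the source.
POWER_STATE_TABLE = [
    (0, "shallow"),
    (1_000, "medium"),
    (100_000, "deep"),
]


def select_low_power_state(timer_value, table=POWER_STATE_TABLE):
    """Return the label of the deepest state whose threshold is <= timer_value."""
    thresholds = [threshold for threshold, _ in table]
    # bisect_right gives the count of thresholds <= timer_value.
    index = bisect.bisect_right(thresholds, timer_value) - 1
    return table[max(index, 0)][1]
```

A short deadline therefore maps to a shallow state with a presumably small exit latency, while a long deadline maps to a deeper state. How real hardware weighs exit latency against savings is outside what the text describes.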

In other embodiments, a processor architecture may include emulation features such that the processor can execute instructions of a first ISA, referred to as a source ISA, where the architecture is according to a second ISA, referred to as a target ISA. In general, software, including both the OS and application programs, is compiled to the source ISA, and the hardware implements the target ISA, which is designed specifically for a given hardware implementation with special performance and/or energy efficiency features.

Referring now to FIG. 5, shown is a block diagram of a processor in accordance with another embodiment of the present invention. As shown in FIG. 5, system 500 includes a processor 510 and a memory 520. Memory 520 includes conventional memory 522, which holds both system and application software, and concealed memory 524, which holds software instrumented for the target ISA. As seen, processor 510 includes an emulation engine 530 that converts source code into target code. Emulation may be done with either interpretation or binary translation. Interpretation is typically used for code when it is first encountered. Then, as frequently executed code regions (e.g., hotspots) are discovered through dynamic profiling, they are translated to the target ISA and stored in a code cache in concealed memory 524. Optimization is done as part of the translation process, and code that is used very heavily may later be optimized further. The translated blocks of code are held in code cache 524 so that they can be reused repeatedly.
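The interpret-first, translate-when-hot policy described for emulation engine 530 can be sketched as follows. The class name, the `hot_threshold` parameter and its default of 3, and the callback interface are assumptions for the example. A real engine would operate on machine code, not Python callables.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical execution count at which a block counts as hot


class EmulationEngine:
    """Toy model of interpretation plus hotspot translation (illustrative)."""

    def __init__(self, interpret, translate, hot_threshold=HOT_THRESHOLD):
        self._interpret = interpret    # runs a source-ISA block directly
        self._translate = translate    # returns a callable "target-ISA" version
        self._hot_threshold = hot_threshold
        self._counts = Counter()       # dynamic profile: executions per block
        self.code_cache = {}           # translated blocks, keyed by address

    def run_block(self, address, block):
        """Execute one block; return (mode, result)."""
        translated = self.code_cache.get(address)
        if translated is not None:
            return "translated", translated()
        self._counts[address] += 1
        if self._counts[address] >= self._hot_threshold:
            # The block is now hot: translate once and keep it for reuse.
            self.code_cache[address] = self._translate(block)
        return "interpreted", self._interpret(block)
```

With the default threshold, the first three executions of a block are interpreted and later executions are served from the code cache, which is the reuse behavior the paragraph describes.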

Still referring to FIG. 5, processor 510, which may be one core of a multicore processor, may include a program counter 540 that provides instruction pointer addresses to an instruction cache (I-cache) 550. As seen, I-cache 550 may further receive target ISA instructions directly from concealed memory portion 524 on a miss to a given instruction address. Accordingly, I-cache 550 may store target ISA instructions, which can be provided to a decoder 560. Decoder 560 may be a decoder of the target ISA that receives incoming instructions at the macro-instruction level and converts them into micro-instructions for execution within a processor pipeline 570. While the scope of the present invention is not limited in this regard, pipeline 570 may be an out-of-order pipeline including various stages to execute and retire instructions. Various execution units, timers, counters, storage locations, and monitors such as those described above may be present within pipeline 570 to execute a processor wait instruction in accordance with an embodiment of the present invention. That is, even in an implementation in which processor 510 is of a different micro-architecture than the one for which a user-level processor wait instruction is provided, the instruction can be executed on the underlying hardware.

Referring now to FIG. 6, shown is a flow diagram of the interaction between cooperative threads in accordance with an embodiment of the present invention. As shown in FIG. 6, method 600 may be used to execute multiple threads, e.g., in a multi-threaded processor. In the context of FIG. 6, two threads, Thread 1 and Thread 2, belong to a single application and may be interdependent, such that data to be used by one thread must first be updated by the second thread. Thus, as seen, Thread 1 may receive a processor wait instruction during its execution (block 610). During execution of this wait instruction, it may be determined whether a test condition has been met (decision block 620). If not, the thread may enter a low power state (block 630). Although not shown in FIG. 6, it is to be understood that this state may be exited when one of various conditions occurs. If instead it is determined that the test condition has been met, control passes to block 640, where code execution may continue in the first thread. Note that the test condition may be with respect to a monitored location that indicates when the second thread has successfully completed an update. Accordingly, before the code shown with respect to Thread 2 has executed, the test condition is not met and the processor enters a low power state.

Still referring to FIG. 6, Thread 2 may execute code that is interdependent with the first thread (block 650). For example, the second thread may execute code to update one or more values that will be used during execution of the first thread. To ensure that the first thread executes using the updated values, the application may be written such that the first thread enters the low power state until the data has been updated by the second thread. Accordingly, during execution of the second thread, it may be determined whether it has completed execution of the interdependent code (decision block 660). If not, execution of the interdependent code may continue. If instead this interdependent code section has been completed, control passes to block 670, where a predetermined value may be written to the monitored location. For example, this predetermined value may correspond to a test value associated with the processor wait instruction. In other embodiments, the predetermined value may be a value such that, when it is masked or used as a mask with a value in the monitored location, the result is not zero, indicating that the test condition has been met and that the first thread can continue execution. Still with reference to Thread 2, after this predetermined value is written, code execution of the second thread may continue (block 680). Although shown with this particular implementation in the embodiment of FIG. 6, it is to be understood that the scope of the present invention is not limited in this regard.
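The cooperation pattern of FIG. 6 can be mimicked in software as a loose analogy. Note the important difference: the sketch below blocks through Python's `threading.Condition`, which relies on the operating system, whereas the mechanism described here is a hardware instruction that avoids OS involvement. The class `MonitoredLocation`, its methods, and `run_demo` are hypothetical names introduced for the example.

```python
import threading


class MonitoredLocation:
    """Software stand-in for a monitored memory location (analogy only)."""

    def __init__(self, value=0):
        self._value = value
        self._cond = threading.Condition()

    def write(self, value):
        """Thread 2 side: write the predetermined value (block 670)."""
        with self._cond:
            self._value = value
            self._cond.notify_all()   # plays the role of the wake-up signal

    def wait_masked(self, mask, timeout=None):
        """Thread 1 side: block until (value & mask) != 0 or the timeout expires.

        Returns the observed value if the test passed, otherwise None.
        """
        with self._cond:
            met = self._cond.wait_for(lambda: self._value & mask != 0, timeout)
            return self._value if met else None


def run_demo():
    flag = MonitoredLocation()
    shared = {}
    results = []

    def thread1():
        # Blocks 610-640: wait for the flag, then use the updated data.
        if flag.wait_masked(0x1, timeout=5.0) is not None:
            results.append(shared["data"] * 2)

    def thread2():
        shared["data"] = 21   # block 650: interdependent update
        flag.write(0x1)       # block 670: signal completion

    t1 = threading.Thread(target=thread1)
    t2 = threading.Thread(target=thread2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results
```

The ordering matters in the same way as in the figure: Thread 2 updates the shared data before it writes the flag, so Thread 1 only resumes once the data it depends on is in place.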

Embodiments of the present invention thus enable a lightweight stalling mechanism that allows a processor to stall while waiting for one or more predetermined conditions to occur, without the need for OS involvement. In this way, an application does not need to poll in a loop for a signal or value to become true. Such a loop includes testing, pausing, and jumping operations, which cause the processor to consume power and, in a hyperthreaded machine, prevent other threads from using those cycles. OS monitoring, with its costs in both redundant work and scheduling constraints (the waiting application may not be the next thread to be scheduled), can thus be avoided. Accordingly, lightweight communication can occur between cooperative threads, and furthermore a processor can flexibly select a sleep state based on time parameters that the user has indicated.

Embodiments of the present invention may be implemented in many different types of systems. Referring now to FIG. 7, shown is a block diagram of a system in accordance with an embodiment of the present invention. As shown in FIG. 7, multiprocessor system 700 is a point-to-point interconnect system and includes a first processor 770 and a second processor 780 coupled via a point-to-point interconnect 750. As shown in FIG. 7, each of processors 770 and 780 may be a multicore processor including first and second processor cores (i.e., processor cores 774a and 774b and processor cores 784a and 784b), although potentially many more cores may be present in the processors. The processor cores may execute various instructions, including a user-level processor wait instruction.

Still referring to FIG. 7, first processor 770 further includes a memory controller hub (MCH) 772 and point-to-point (P-P) interfaces 776 and 778. Similarly, second processor 780 includes an MCH 782 and P-P interfaces 786 and 788. As shown in FIG. 7, MCHs 772 and 782 couple the processors to respective memories, namely a memory 732 and a memory 734, which may be portions of main memory (e.g., a dynamic random access memory (DRAM)) locally attached to the respective processors. First processor 770 and second processor 780 may be coupled to a chipset 790 via P-P interconnects 752 and 754, respectively. As shown in FIG. 7, chipset 790 includes P-P interfaces 794 and 798.

Furthermore, chipset 790 includes an interface 792 to couple chipset 790 to a high performance graphics engine 738 via a P-P interconnect 739. In turn, chipset 790 may be coupled to a first bus 716 via an interface 796. As shown in FIG. 7, various input/output (I/O) devices 714 may be coupled to first bus 716, along with a bus bridge 718 that couples first bus 716 to a second bus 720. Various devices may be coupled to second bus 720 including, for example, a keyboard/mouse 722, communication devices 726, and a data storage unit 728, such as a disk drive or other mass storage device, which in one embodiment may include code 730. Further, an audio I/O 724 may be coupled to second bus 720.

Embodiments of the present invention may be implemented in code and may be stored on a storage medium having stored thereon instructions that can be used to program a system to perform the instructions. The storage medium may include, but is not limited to: any type of disk, including floppy disks, optical disks, solid state drives (SSDs), compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks; semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs) and static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, and electrically erasable programmable read-only memories (EEPROMs); magnetic or optical cards; or any other type of media suitable for storing electronic instructions.

While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the present invention.

100, 200, 600 ... method

110-195, 210-270, 610-680 ... step blocks

300, 774a-b, 784a-b ... processor core

301 ... fetch unit

303 ... instruction cache

305 ... instruction decoder

310 ... front end units

315 ... out-of-order (OOO) engine

320 ... execution units

322 ... arithmetic logic unit (ALU)

324 ... wakeup logic

326 ... timers

330 ... register file

335 ... extended register file

340 ... reorder buffer (ROB)

350 ... cache

352 ... monitor engine

400, 510 ... processor

410a-n ... cores

415 ... interconnect

420 ... uncore

430 ... shared cache

440 ... integrated memory controller

450a-n, 792, 796 ... interface

455 ... power management unit

460 ... system memory

500 ... system

520, 732-734 ... memory

522 ... conventional memory

524 ... concealed memory

530 ... emulation engine

540 ... program counter

550 ... instruction cache (I-cache)

560 ... decoder

570 ... processor pipeline

700 ... multiprocessor system

714 ... input/output (I/O) devices

716 ... first bus

718 ... bus bridge

720 ... second bus

722 ... keyboard/mouse

724 ... audio I/O

726 ... communication devices

728 ... data storage unit

730 ... code

738 ... high performance graphics engine

739, 752-754 ... point-to-point (P-P) interconnect

750 ... point-to-point interconnect

770 ... first processor

772, 782 ... memory controller hub (MCH)

776-778, 786-788, 794, 798 ... point-to-point (P-P) interface

780 ... second processor

790 ... chipset

FIG. 1 is a flow diagram of a method in accordance with one embodiment of the present invention.

FIG. 2 is a flow diagram of a test for a target value that may be performed in accordance with an embodiment of the present invention.

FIG. 3 is a block diagram of a processor core in accordance with one embodiment of the present invention.

FIG. 4 is a block diagram of a processor in accordance with an embodiment of the present invention.

FIG. 5 is a block diagram of a processor in accordance with another embodiment of the present invention.

FIG. 6 is a flow diagram of the interaction between cooperative threads in accordance with an embodiment of the present invention.

FIG. 7 is a block diagram of a system in accordance with an embodiment of the present invention.

100 ... method

110-195 ... step blocks

Claims (24)

一種處理器,其包含:核心,其包括解碼邏輯電路及計時器,該解碼邏輯組件用以接收並且解碼來自第一應用程式之指令,該指令用以指定欲受監看之位置的識別及計時器數值,且該計時器係耦合至該解碼邏輯電路用以針對該計時器數值而進行計數;以及電力管理單元,係耦合至該核心,用以至少部分地根據該計時器數值來判定該處理器要進入之多個低電力狀態之低電力狀態的類型,並且如果該受監看位置的數值並不等於目標值且該計時器數值尚未超過,則該電力管理單元用以回應於該判定而致使該處理器進入該低電力狀態,而不需要屈服於作業系統(OS)。 A processor comprising: a core comprising decoding logic and a timer, the decoding logic component for receiving and decoding instructions from a first application for specifying identification and timing of a location to be monitored And a timer coupled to the decoding logic for counting the timer value; and a power management unit coupled to the core to determine the processing based at least in part on the timer value The type of low power state in which the plurality of low power states are to enter, and if the value of the monitored position is not equal to the target value and the timer value has not exceeded, the power management unit is responsive to the determination The processor is caused to enter the low power state without succumbing to the operating system (OS). 如申請專利範圍第1項之處理器,另包含耦合至快取記憶體的監看引擎,用以判定包括該受監看位置之副本之該快取記憶體的快取線是否被更新。 The processor of claim 1, further comprising a monitoring engine coupled to the cache memory for determining whether a cache line of the cache memory including the copy of the monitored location is updated. 如申請專利範圍第2項之處理器,其中,該監看引擎用以將該更新過的副本以及喚醒信號傳遞到該核心。 The processor of claim 2, wherein the monitoring engine is configured to deliver the updated copy and the wake-up signal to the core. 如申請專利範圍第3項之處理器,其中,該核心用以判定該更新過的副本是否對應於該目標值,且若是,則離開該低電力狀態,否則便判定新的低電力狀態並且進入該新的低電力狀態中。 The processor of claim 3, wherein the core is configured to determine whether the updated copy corresponds to the target value, and if yes, leave the low power state, otherwise determine a new low power state and enter The new low power state. 
如申請專利範圍第1項之處理器,其中,該指令為使用者層級的指令,用以致使該處理器載入第一數值、 在介於該第一數值與儲存在目的地位置中的資料之間進行遮罩運算、以及如果該遮罩運算的結果為第一結果則進入該低電力狀態,否則該處理器用以將該第一數值載入到該目的地位置。 The processor of claim 1, wherein the instruction is a user-level instruction for causing the processor to load the first value, Performing a mask operation between the first value and the data stored in the destination location, and entering the low power state if the result of the mask operation is the first result, otherwise the processor is configured to use the A value is loaded to the destination location. 如申請專利範圍第5項之處理器,其中,如果該結果等於零,則該處理器用以設定旗標暫存器的零位指示器。 The processor of claim 5, wherein if the result is equal to zero, the processor is configured to set a zero indicator of the flag register. 如申請專利範圍第1項之處理器,其中,該計時器係用以被設定為與時間戳記計數器數值與該計時器數值間之差異相對應的數值。 The processor of claim 1, wherein the timer is configured to be a value corresponding to a difference between a timestamp counter value and the timer value. 如申請專利範圍第1項之處理器,其中,該處理器包含多核心處理器,該多核心處理器包括該核心及第二核心,其中,該指令為具有要執行於該核心上的第一執行緒及要更新該受監看位置之第二執行緒的指令。 The processor of claim 1, wherein the processor comprises a multi-core processor, the multi-core processor comprising the core and the second core, wherein the instruction is to have a first to be executed on the core The thread and the instruction to update the second thread of the monitored position. 如申請專利範圍第8項之處理器,其中,該核心係用以回應於對該受監看位置的該更新而離開該低電力狀態。 The processor of claim 8, wherein the core is configured to leave the low power state in response to the update to the monitored location. 如申請專利範圍第9項之處理器,其中,該核心係用以在那之後,於該第二執行緒更新該受監看位置之前,使用由該第二執行緒所更新的資料來執行該第一執行緒的至少一操作。 The processor of claim 9, wherein the core is configured to perform the use of the information updated by the second thread after the second thread updates the monitored position after that At least one operation of the first thread. 
一種方法,其包含下列步驟:在處理器中接收並解碼來自第一應用程式的指令,該指令指定欲受監看之位置的識別以及計時器數值; 回應於該指令,至少部分地根據該計時器數值而在該處理器中判定該處理器要進入之多個低電力狀態之低電力狀態的類型;以及如果該受監看位置的數值並不等於目標值且該計時器數值尚未超過,則回應於該判定而致使該處理器進入該低電力狀態。 A method comprising the steps of: receiving and decoding, in a processor, an instruction from a first application, the instruction specifying an identification of a location to be monitored and a timer value; Responding to the instruction to determine, in the processor based at least in part on the timer value, a type of low power state in which the processor is to enter a plurality of low power states; and if the monitored position value is not equal The target value and the timer value has not passed, and the processor is caused to enter the low power state in response to the determination. 如申請專利範圍第11項之方法,其中,該指令另指定該受監看位置的該目標值。 The method of claim 11, wherein the instruction further specifies the target value of the monitored position. 如申請專利範圍第11項之方法,另包含回應於超過該計時器數值而離開該低電力狀態。 The method of claim 11, further comprising exiting the low power state in response to exceeding the timer value. 如申請專利範圍第11項之方法,另包含當該受監看位置的數值等於該目標值時,離開該低電力狀態,其包括接收來自該處理器之快取記憶體之監看引擎的喚醒信號,當包括該受監看位置之副本之快取線的儲存數值已經被改變時,該監看引擎即傳送該喚醒信號。 The method of claim 11, further comprising leaving the low power state when the value of the monitored position is equal to the target value, comprising waking up by a monitoring engine that receives the cache memory from the processor. The signal, when the stored value of the cache line including the copy of the monitored position has been changed, the monitoring engine transmits the wake-up signal. 
The method of claim 11, further comprising selecting, using a power management unit (PMU) of the processor, the type of low power state from the plurality of low power states based on information in a table having a plurality of entries, each entry associating a low power state with a timer value, and sending at least one control signal from the PMU to a core of the processor to cause the core to enter the low power state.
The method of claim 11, further comprising receiving a wake-up signal from a second processor coupled to the processor, and exiting the low power state responsive to the wake-up signal.
A system comprising: a multi-core processor including a first core and a second core, the first core including decode logic and a timer, the decode logic to decode a user-level instruction that causes a wait state to occur, the user-level instruction specifying a location to be monitored and a timer value, the timer coupled to the decode logic to count with respect to the timer value, the multi-core processor further including power management logic coupled to the first and second cores to select, based at least in part on the timer value and without involvement of an operating system (OS), one of a plurality of low power states for the multi-core processor to enter, and, responsive to the selection, to cause the first core to enter the selected low power state if a value of the monitored location does not equal a target value; and a dynamic random access memory (DRAM) coupled to the multi-core processor.
The system of claim 17, wherein the first core is to perform, responsive to the user-level instruction, a mask operation between a first operand and a second operand, and to enter the selected low power state if a result of the mask operation is not the target value.
The system of claim 18, further comprising monitor logic coupled to the first core, the monitor logic to cause the first core to exit the low power state responsive to an update to the monitored location.
The system of claim 19, wherein the monitor logic is to send a wake-up signal to the first core when a cache line associated with the monitored location has been updated or when a coherency state of the cache line has been updated.
An article comprising a machine-accessible storage medium including instructions that, when executed, cause a system to: receive, in a first core of a multi-core processor during execution of a first thread, a user-level processor wait instruction, the user-level processor wait instruction specifying a location to be monitored and a timer value; determine, in the first core, whether a condition of the user-level processor wait instruction has been met and, if not, enter a low power state selected from a plurality of low power states by power management logic of the multi-core processor; update a value during execution of a second thread on a second core of the multi-core processor; responsive to the value update, exit the low power state of the first core and determine whether the condition has been met; and, if the condition has been met, continue executing the first thread on the first core.
The article of claim 21, further comprising instructions that enable the system to cause the first core to exit the low power state responsive to an update to the monitored location, and to test the condition using the updated value.
The article of claim 22, further comprising instructions that enable the system to determine that the monitored location has been updated when a cache line associated with the monitored location has been updated or when a coherency state of the cache line has been updated, and to cause the first core to exit the low power state responsive to the update to the monitored location.
The article of claim 21, further comprising instructions that enable the system to use the power management logic to select the low power state from the plurality of low power states based on information in a table having a plurality of entries, each entry associating a low power state with a timer value, and to send at least one control signal to cause the first core to enter the low power state.
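The behavior recited in the claims above can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the patented hardware: the state names (`C1`, `C3`, `C6`), the table thresholds, the tick-based timer, and the function names `select_state` and `processor_wait` are all hypothetical, and polling a callable stands in for the cache monitor engine that would send a wake-up signal on an update. It shows a table that associates low power states with timer values, a masked comparison against a target value, and the two exit paths (condition met, timer exceeded).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateEntry:
    # Smallest timer value (hypothetical ticks) for which this state is chosen.
    min_wait: int
    # Hypothetical low power state name; the claims do not name specific states.
    state: str


# Hypothetical table: longer expected waits map to deeper low power states.
STATE_TABLE = (
    StateEntry(0, "C1"),
    StateEntry(1_000, "C3"),
    StateEntry(100_000, "C6"),
)


def select_state(timer_value, table=STATE_TABLE):
    """Pick the deepest state whose threshold the timer value reaches."""
    chosen = table[0].state
    for entry in table:
        if timer_value >= entry.min_wait:
            chosen = entry.state
    return chosen


def processor_wait(read_location, mask, target, timer_value, table=STATE_TABLE):
    """Model a wait on a monitored location with a timeout.

    Returns (wake_reason, state_entered, ticks_waited).
    """
    # Test the condition first: no low power state is entered if it already holds.
    if read_location() & mask == target:
        return ("condition_met", None, 0)

    state = select_state(timer_value, table)

    # Each loop iteration models one timer tick. Re-reading the location
    # stands in for a wake-up on update followed by a re-test of the condition.
    for tick in range(1, timer_value + 1):
        if read_location() & mask == target:
            return ("condition_met", state, tick)

    # The timer value was exceeded before the condition was met.
    return ("timeout", state, timer_value)
```

For instance, calling `processor_wait` with a reader that returns a non-matching value on every read runs until the timer value is reached and reports a timeout, whereas a reader whose value changes to match the target ends the wait early.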
TW099136477A 2009-12-18 2010-10-26 Instruction for enabling a processor wait state TWI512448B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/641,534 US8464035B2 (en) 2009-12-18 2009-12-18 Instruction for enabling a processor wait state

Publications (2)

Publication Number Publication Date
TW201131349A (en) 2011-09-16
TWI512448B (en) 2015-12-11

Family

ID=44152840

Family Applications (1)

Application Number Title Priority Date Filing Date
TW099136477A TWI512448B (en) 2009-12-18 2010-10-26 Instruction for enabling a processor wait state

Country Status (8)

Country Link
US (3) US8464035B2 (en)
JP (2) JP5571784B2 (en)
KR (1) KR101410634B1 (en)
CN (1) CN102103484B (en)
DE (1) DE102010052680A1 (en)
GB (1) GB2483012B (en)
TW (1) TWI512448B (en)
WO (1) WO2011075246A2 (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9672019B2 (en) 2008-11-24 2017-06-06 Intel Corporation Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program to multiple parallel threads
US10621092B2 (en) 2008-11-24 2020-04-14 Intel Corporation Merging level cache and data cache units having indicator bits related to speculative execution
US8464035B2 (en) * 2009-12-18 2013-06-11 Intel Corporation Instruction for enabling a processor wait state
US8775153B2 (en) * 2009-12-23 2014-07-08 Intel Corporation Transitioning from source instruction set architecture (ISA) code to translated code in a partial emulation environment
US8977878B2 (en) * 2011-05-19 2015-03-10 Texas Instruments Incorporated Reducing current leakage in L1 program memory
US9207730B2 (en) * 2011-06-02 2015-12-08 Apple Inc. Multi-level thermal management in an electronic device
WO2013048468A1 (en) 2011-09-30 2013-04-04 Intel Corporation Instruction and logic to perform dynamic binary translation
US9063760B2 (en) * 2011-10-13 2015-06-23 International Business Machines Corporation Employing native routines instead of emulated routines in an application being emulated
WO2013089685A1 (en) 2011-12-13 2013-06-20 Intel Corporation Enhanced system sleep state support in servers using non-volatile random access memory
CN107025093B (en) 2011-12-23 2019-07-09 英特尔公司 For instructing the device of processing, for the method and machine readable media of process instruction
WO2013101165A1 (en) * 2011-12-30 2013-07-04 Intel Corporation Register error protection through binary translation
JP5900606B2 (en) * 2012-03-30 2016-04-06 富士通株式会社 Data processing device
US20140075163A1 (en) * 2012-09-07 2014-03-13 Paul N. Loewenstein Load-monitor mwait
JP5715107B2 (en) * 2012-10-29 2015-05-07 富士通テン株式会社 Control system
CN104813277B (en) * 2012-12-19 2019-06-28 英特尔公司 The vector mask of power efficiency for processor drives Clock gating
US9081577B2 (en) 2012-12-28 2015-07-14 Intel Corporation Independent control of processor core retention states
US9164565B2 (en) 2012-12-28 2015-10-20 Intel Corporation Apparatus and method to manage energy usage of a processor
US9405551B2 (en) 2013-03-12 2016-08-02 Intel Corporation Creating an isolated execution environment in a co-designed processor
JP6175980B2 (en) * 2013-08-23 2017-08-09 富士通株式会社 CPU control method, control program, and information processing apparatus
US9513687B2 (en) * 2013-08-28 2016-12-06 Via Technologies, Inc. Core synchronization mechanism in a multi-die multi-core microprocessor
US9891936B2 (en) 2013-09-27 2018-02-13 Intel Corporation Method and apparatus for page-level monitoring
WO2015057819A1 (en) * 2013-10-15 2015-04-23 Mill Computing, Inc. Computer processor with deferred operations
CN105094747B (en) * 2014-05-07 2018-12-04 阿里巴巴集团控股有限公司 The device of central processing unit based on SMT and the data dependence for detection instruction
US10467011B2 (en) * 2014-07-21 2019-11-05 Intel Corporation Thread pause processors, methods, systems, and instructions
KR20160054850A (en) * 2014-11-07 2016-05-17 삼성전자주식회사 Apparatus and method for operating processors
US20160306416A1 (en) * 2015-04-16 2016-10-20 Intel Corporation Apparatus and Method for Adjusting Processor Power Usage Based On Network Load
KR102476357B1 (en) 2015-08-06 2022-12-09 삼성전자주식회사 Clock management unit, integrated circuit and system on chip adopting the same, and clock managing method
US20170177336A1 (en) * 2015-12-22 2017-06-22 Intel Corporation Hardware cancellation monitor for floating point operations
US11023233B2 (en) 2016-02-09 2021-06-01 Intel Corporation Methods, apparatus, and instructions for user level thread suspension
US10185564B2 (en) 2016-04-28 2019-01-22 Oracle International Corporation Method for managing software threads dependent on condition variables
US11016893B2 (en) 2016-09-30 2021-05-25 Intel Corporation Method and apparatus for smart store operations with conditional ownership requests
US11061730B2 (en) * 2016-11-18 2021-07-13 Red Hat Israel, Ltd. Efficient scheduling for hyper-threaded CPUs using memory monitoring
US10289516B2 (en) 2016-12-29 2019-05-14 Intel Corporation NMONITOR instruction for monitoring a plurality of addresses
US10627888B2 (en) 2017-01-30 2020-04-21 International Business Machines Corporation Processor power-saving during wait events
US11086672B2 (en) * 2019-05-07 2021-08-10 International Business Machines Corporation Low latency management of processor core wait state
CN113867518A (en) * 2021-09-15 2021-12-31 珠海亿智电子科技有限公司 Processor low-power consumption blocking type time delay method, device and readable medium
CN113986663A (en) * 2021-10-22 2022-01-28 上海兆芯集成电路有限公司 Electronic device and power consumption control method thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1282030A1 (en) * 2000-05-08 2003-02-05 Mitsubishi Denki Kabushiki Kaisha Computer system and computer-readable recording medium
US20060005197A1 (en) * 2004-06-30 2006-01-05 Bratin Saha Compare and exchange operation using sleep-wakeup mechanism

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7363474B2 (en) 2001-12-31 2008-04-22 Intel Corporation Method and apparatus for suspending execution of a thread until a specified memory access occurs
US7127561B2 (en) 2001-12-31 2006-10-24 Intel Corporation Coherency techniques for suspending execution of a thread until a specified memory access occurs
US7213093B2 (en) 2003-06-27 2007-05-01 Intel Corporation Queued locks using monitor-memory wait
JP4376692B2 (en) 2004-04-30 2009-12-02 富士通株式会社 Information processing device, processor, processor control method, information processing device control method, cache memory
GB2414573B (en) 2004-05-26 2007-08-08 Advanced Risc Mach Ltd Control of access to a shared resource in a data processing apparatus
US7810083B2 (en) 2004-12-30 2010-10-05 Intel Corporation Mechanism to emulate user-level multithreading on an OS-sequestered sequencer
US8607235B2 (en) 2004-12-30 2013-12-10 Intel Corporation Mechanism to schedule threads on OS-sequestered sequencers without operating system intervention
US8719819B2 (en) 2005-06-30 2014-05-06 Intel Corporation Mechanism for instruction set based thread execution on a plurality of instruction sequencers
US8516483B2 (en) 2005-05-13 2013-08-20 Intel Corporation Transparent support for operating system services for a sequestered sequencer
US8010969B2 (en) 2005-06-13 2011-08-30 Intel Corporation Mechanism for monitoring instruction set based thread execution on a plurality of instruction sequencers
US7882339B2 (en) * 2005-06-23 2011-02-01 Intel Corporation Primitives to enhance thread-level speculation
US8028295B2 (en) 2005-09-30 2011-09-27 Intel Corporation Apparatus, system, and method for persistent user-level thread
GB0519981D0 (en) * 2005-09-30 2005-11-09 Ignios Ltd Scheduling in a multicore architecture
US7941681B2 (en) * 2007-08-17 2011-05-10 International Business Machines Corporation Proactive power management in a parallel computer
US20090150696A1 (en) * 2007-12-10 2009-06-11 Justin Song Transitioning a processor package to a low power state
US9081687B2 (en) 2007-12-28 2015-07-14 Intel Corporation Method and apparatus for MONITOR and MWAIT in a distributed cache architecture
US8156362B2 (en) 2008-03-11 2012-04-10 Globalfoundries Inc. Hardware monitoring and decision making for transitioning in and out of low-power state
DE102009001142A1 (en) * 2009-02-25 2010-08-26 Robert Bosch Gmbh Electromechanical brake booster
US8156275B2 (en) * 2009-05-13 2012-04-10 Apple Inc. Power managed lock optimization
US8464035B2 (en) * 2009-12-18 2013-06-11 Intel Corporation Instruction for enabling a processor wait state

Also Published As

Publication number Publication date
JP2012531681A (en) 2012-12-10
CN102103484A (en) 2011-06-22
GB2483012A (en) 2012-02-22
TW201131349A (en) 2011-09-16
US20130185580A1 (en) 2013-07-18
US8464035B2 (en) 2013-06-11
US8990597B2 (en) 2015-03-24
DE102010052680A1 (en) 2011-07-07
KR101410634B1 (en) 2014-06-20
KR20120110120A (en) 2012-10-09
US9032232B2 (en) 2015-05-12
CN102103484B (en) 2015-08-19
GB201119728D0 (en) 2011-12-28
WO2011075246A2 (en) 2011-06-23
WO2011075246A3 (en) 2011-08-18
JP2014222520A (en) 2014-11-27
US20130246824A1 (en) 2013-09-19
GB2483012B (en) 2017-10-18
JP5571784B2 (en) 2014-08-13
JP5795820B2 (en) 2015-10-14
US20110154079A1 (en) 2011-06-23

Similar Documents

Publication Publication Date Title
TWI512448B (en) Instruction for enabling a processor wait state
TWI742032B (en) Methods, apparatus, and instructions for user-level thread suspension
TWI590153B (en) Methods for multi-threaded processing
JP5801372B2 (en) Providing state memory in the processor for system management mode
US8539485B2 (en) Polling using reservation mechanism
TWI266987B (en) Method for monitoring locks, processor, system for monitoring locks, and machine-readable medium
US9128781B2 (en) Processor with memory race recorder to record thread interleavings in multi-threaded software
US7127561B2 (en) Coherency techniques for suspending execution of a thread until a specified memory access occurs
TW201508635A (en) Dynamic reconfiguration of multi-core processor
TW201508643A (en) Propagation of microcode patches to multiple cores in multicore microprocessor
US8447960B2 (en) Pausing and activating thread state upon pin assertion by external logic monitoring polling loop exit time condition
US9886396B2 (en) Scalable event handling in multi-threaded processor cores
US20110173420A1 (en) Processor resume unit
JP5474926B2 (en) Electric power retirement

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees