KR20070057906A

KR20070057906A - Sharing monitored cache lines across multiple cores

Info

Publication number: KR20070057906A
Application number: KR1020077007487A
Authority: KR
Inventors: Michael T. Clark
Original assignee: Advanced Micro Devices, Inc.
Priority date: 2004-10-01
Filing date: 2005-09-21
Publication date: 2007-06-07
Also published as: US20060075060A1; TWI366132B; CN101036116A; JP4982375B2; WO2006039162A2; DE602005020960D1; TW200627271A; WO2006039162A3; US7257679B2; CN101036116B; EP1807754A2; KR101216190B1; JP2008515096A; EP1807754B1

Abstract

In one embodiment, a system comprises a first processor core and a second processor core. The first processor core is configured to communicate an address range indication identifying an address range that the first processor core is monitoring for an update. The first processor core is configured to communicate the address range indication responsive to executing a first instruction defined to cause the first processor core to monitor the address range. Coupled to receive the address range indication, the second processor core is configured, responsive to executing a store operation that updates at least one byte in the address range, to signal the first processor core. Coupled to receive the signal from the second processor core, the first processor core is configured, responsive to the signal, to exit a first state in which the first processor core is awaiting the update in the address range.

Description

Sharing monitored cache lines across multiple cores

TECHNICAL FIELD

The present invention relates to processors and, in particular, to monitoring cache lines for changes.

Many applications are written to interact with other applications. In addition, many applications are written as multi-threaded applications. Multi-threaded applications have multiple code sequences (threads) designed to execute relatively independently. These threads (or applications) can communicate with each other in a variety of ways. For brevity, the term "thread" is used in this description to refer to a code sequence from a multi-threaded application, or to an application as a whole if the application is not itself multi-threaded.

Memory locations are often used to communicate between threads. For example, a memory location may be defined to store a semaphore used to control access to a larger area of memory, to control access to another resource in the computer system such as a peripheral device, to control the ability to execute a particular code sequence (often referred to as a "critical section"), and so on. Any of the above is referred to below as a protected resource. Generally, a thread can access the semaphore and check its state. If the state indicates that the thread may take control of the protected resource, the thread can change the semaphore state to indicate that it now controls the protected resource. If the state indicates that another thread is in control of the protected resource, the thread can continue checking the semaphore until the state changes (for example, when the other thread writes the semaphore to indicate that it is done with the protected resource). Memory locations may also be used to pass other messages between threads (or to indicate that a message is available). If a given thread is waiting for a message from another thread, it can continue checking the memory location until the location is written with a value indicating that the message is available. Many other examples exist in which a thread uses memory locations to communicate with another thread.

Typically, when a thread is checking a memory location for a desired state and does not find that state, it enters a "spin loop" in which it repeatedly accesses the memory location, checking for the desired state. When the memory location is eventually written with the desired state, the thread can exit the spin loop. While the thread is in the spin loop, it accomplishes no useful work, yet the processor executing the thread consumes power executing the spin loop.
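The spin loop described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the `flag` dictionary stands in for the shared memory location, and `spin_until` and `releaser` are hypothetical names chosen for illustration. The counter simply makes the wasted polling visible.

```python
import threading
import time

def spin_until(flag, desired):
    """Busy-wait until flag['value'] == desired; return the number of wasted polls."""
    polls = 0
    while flag["value"] != desired:
        polls += 1              # each iteration burns cycles without useful work
    return polls

flag = {"value": 0}             # stand-in for the shared memory location (hypothetical)

def releaser():
    time.sleep(0.01)            # the other thread finishes with the protected resource
    flag["value"] = 1           # ...and writes the desired state

t = threading.Thread(target=releaser)
t.start()
wasted = spin_until(flag, 1)
t.join()
print(f"left the spin loop after {wasted} polls")
```

The number of polls varies from run to run; the point is only that every one of them consumed processor time and power while the waiting thread made no progress.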

Some instruction set architectures define instructions that allow the processor to optimize for such situations (provided the programmer uses these instructions in spin loops and in other cases where a thread awaits a desired state in a memory location). For example, the x86 instruction set (with the Streaming Single Instruction, Multiple Data Extensions 3, or SSE3) defines a MONITOR/MWAIT instruction pair. The MONITOR instruction can be used to establish an address range that the processor monitors for an update (e.g., due to a store executed by another processor). The MWAIT instruction can be used to cause the processor to enter an "implementation-dependent optimized state" while waiting for the update. The processor exits the implementation-dependent optimized state in response to a store in the monitored address range (and also for certain interrupts and for other reasons unrelated to the monitored address range). Typically, the processor is informed of the update through the normal coherency mechanisms implemented in the processor.

In one embodiment, a system comprises a first processor core and a second processor core. The first processor core is configured to communicate an address range indication identifying an address range that the first processor core is monitoring for an update. The first processor core is configured to communicate the address range indication in response to executing a first instruction defined to cause the first processor core to monitor the address range for an update. The second processor core is coupled to receive the address range indication and is configured, in response to executing a store operation that updates at least one byte in the address range, to signal the first processor core. The first processor core is coupled to receive the signal from the second processor core and is configured to exit a first state in response to the signal, where the first state is a state in which the first processor core is awaiting the update in the address range.

In another embodiment, a method is contemplated. The method comprises communicating, from a first processor core to a second processor core, an address range indication identifying an address range that the first processor core is monitoring for an update, where the communicating is responsive to the first processor core executing a first instruction defined to cause the first processor core to monitor the address range for an update; executing, in the second processor core, a store operation that updates at least one byte in the address range; signaling the first processor core in response to the store operation; and, in response to the signaling, exiting a first state in the first processor core in which the first processor core is awaiting the update in the address range.

In yet another embodiment, a processor core comprising a monitor unit is contemplated. The processor core is configured to monitor an address range for an update in response to a first instruction. The processor core is configured to enter a first state to await the update to the address range. The monitor unit is configured, in response to execution of the first instruction, to communicate an address range indication identifying the address range to a second processor core, and is configured to receive a signal from the second processor core indicating that the second processor core is updating at least one byte in the address range. The processor core is configured to exit the first state in response to the signal.

The following detailed description refers to the accompanying drawings, which are now briefly described.

FIG. 1 is a block diagram of one embodiment of a computer system including a plurality of processor cores.

FIG. 2 is a flowchart illustrating operation of one embodiment of a processor core during execution of a monitor instruction.

FIG. 3 is a flowchart illustrating operation of one embodiment of a processor core during execution of an MWait instruction.

FIG. 4 is a flowchart illustrating operation of one embodiment of a processor core during execution of a store instruction.

FIG. 5 is a state machine diagram illustrating operation of one embodiment of a processor core for entering a low power state while awaiting an update of a cache line.

FIG. 6 is an example illustrating operation of one embodiment of a processor core when a processor core in the same node updates the monitored cache line.

FIG. 7 is an example illustrating operation of one embodiment of a processor core when a processor core in another node updates the monitored cache line.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and are described in detail herein. It should be understood, however, that the drawings and detailed description are not intended to limit the invention to the particular form disclosed; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Exemplary embodiments are described below that include processor cores implementing the x86 instruction set architecture (including at least the SSE3 extensions that define the MONITOR and MWAIT instructions, and possibly other extensions such as the AMD64™ extensions or any other extensions). Other embodiments may implement any instruction set architecture that includes one or more instructions defined to establish an address range to be monitored (e.g., a cache line or any other address range) and defined to cause the processor core to enter a state awaiting an update to at least one byte within the monitored address range. That is, in response to executing the instruction or instructions, the processor core may monitor the address range and enter the state awaiting an update within the monitored address range. The MONITOR and MWAIT instructions are used as examples of such instructions. For convenience in this description, the MONITOR instruction is referred to as the monitor instruction (not capitalized) and the MWAIT instruction is referred to as the MWait instruction (only the M and W capitalized).

Turning now to FIG. 1, a block diagram of one embodiment of a computer system 10 is shown. In the illustrated embodiment, computer system 10 includes nodes 12A-12B, memories 14A-14B, and peripheral devices 16A-16B. Nodes 12A-12B are coupled to each other, and node 12B is coupled to peripheral devices 16A-16B. Each of nodes 12A-12B is coupled to a respective memory 14A-14B. Node 12A includes processor cores 18A-18B coupled to a bridge 20A, which is further coupled to a memory controller 22A and a plurality of HyperTransport™ (HT) interface circuits 24A-24C. Node 12B similarly includes processor cores 18C-18D coupled to a bridge 20B, which is further coupled to a memory controller 22B and a plurality of HT interface circuits 24D-24F. HT circuits 24C-24D are coupled to each other (via an HT interface, in this embodiment), and HT circuit 24F is coupled to peripheral device 16A, which is coupled to peripheral device 16B in a daisy-chain configuration (using HT interfaces, in this embodiment). Memory controllers 22A-22B are coupled to memories 14A-14B, respectively.

Additional details of one embodiment of processor cores 18A-18B are shown in FIG. 1. Processor cores 18C-18D may be similar. In the illustrated embodiment, processor core 18A includes a monitor unit 26A comprising registers 28A-28B and comparators 30A-30B. Register 28A is coupled to comparator 30A, which is further coupled to receive the address of an invalidating probe (P-Inv) from the interface to bridge 20A. Register 28B is coupled to comparator 30B, which is further coupled to receive a store address (StAddr) from processor core 18A. The output of comparator 30B is coupled to monitor unit 26B as a Wakeup-ST signal. In the illustrated embodiment, monitor unit 26B includes registers 28C-28D and comparators 30C-30D, similar to registers 28A-28B and comparators 30A-30B, respectively. The output of comparator 30D is coupled to monitor unit 26A as a Wakeup-ST signal. Register 28A is coupled to register 28D, and register 28B is coupled to register 28C.

Each of processor cores 18A-18D may be configured to monitor an address range in response to executing a monitor instruction. Additionally, the monitoring processor core 18A-18D may communicate an address range indication identifying the monitored address range to at least one other processor core 18A-18D (the "receiving processor core"). For example, in the illustrated embodiment, the monitoring processor core 18A-18D may communicate the address range indication to the other processor core 18A-18D in the same node 12A-12B. That is, processor core 18A may communicate its address range indication to processor core 18B (and vice versa), and processor core 18C may communicate its address range indication to processor core 18D (and vice versa). The receiving processor core 18A-18D may monitor for store operations to the address range that the receiving processor core itself performs in response to instruction execution. If such a store is detected, the receiving processor core 18A-18D may signal the monitoring processor core 18A-18D. For example, in the illustrated embodiment, the receiving processor core 18A-18D may assert the Wakeup-ST signal to the monitoring processor core 18A-18D. In response to the signal, the monitoring processor core 18A-18D may exit the state it entered by executing the MWait instruction (if it is still in that state). In some embodiments, having the receiving processor core signal detection of a store operation to the monitored address range may allow the monitoring processor core to exit the state more rapidly than would occur through transmission of coherency communications over the normal communication interface.

Generally, the address range indication may be any value or values that define the monitored address range. For example, the address range may correspond to a block of contiguous bytes in memory. If the size of the block is fixed (e.g., a cache line, a fixed number of cache lines, or a portion of a cache line), the base address of the block may be used. Similarly, if the size is variable but each of processor cores 18A-18D is programmed to the same size, a base address may be used. In other embodiments, a base address and size, or a base address and ending address, may identify the address range. For the remainder of this description, an embodiment in which a cache line is the size of the address range and the base address of the cache line is used as the address range indication serves as the example. However, other embodiments may use any size of address range and any corresponding address range indication.
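As a small illustration of the cache-line case, the following Python sketch derives a base address to use as the address range indication and tests whether a store updates at least one byte in the range. The 64-byte line size and the function names are assumptions made for the example; the description does not fix a line size.

```python
CACHE_LINE_BYTES = 64  # assumed line size; the description does not specify one

def line_base(addr, line_bytes=CACHE_LINE_BYTES):
    """Base address of the cache line containing addr (line_bytes must be a power of two)."""
    return addr & ~(line_bytes - 1)

def store_hits_range(store_addr, store_len, base, size=CACHE_LINE_BYTES):
    """True if a store of store_len bytes updates at least one byte in [base, base + size)."""
    return store_addr < base + size and store_addr + store_len > base

maddr = line_base(0x1234)                      # address range indication for this line
inside = store_hits_range(0x123C, 8, maddr)    # store partly inside the line
outside = store_hits_range(0x1240, 4, maddr)   # store starts in the next line
straddle = store_hits_range(0x11FE, 4, maddr)  # store begins in the previous line
```

With a base-plus-size or base-plus-ending-address indication, the same overlap test applies with `size` supplied explicitly instead of defaulting to the line size.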

Processor cores 18A-18B are shown in more detail in FIG. 1 to include monitor units 26A-26B. The operation of monitor unit 26A (and its registers 28A-28B and comparators 30A-30B) is described in more detail below; the operation of monitor unit 26B may be similar. Register 28A stores the address (MAddr) being monitored by monitor unit 26A. That is, processor core 18A may write register 28A with the address generated during execution of the monitor instruction by processor core 18A. MAddr is compared, via comparator 30A, with the address supplied to processor core 18A in any communication that indicates an update of the cache line indicated by the address. For example, in the illustrated embodiment, an invalidating probe (P-Inv) may be an indication of an update. Generally, a probe is a communication used in coherency schemes to determine whether the receiver of the probe has the cache line identified by the probe and, if the cache line is found, to specify a state change for it (and possibly to require that a modified cache line be returned to memory or to the requestor). An invalidating probe specifies a state change of the cache line to invalid. Invalidating probes may be used in certain coherency schemes to invalidate cache lines in other caches that are being updated by a source device (e.g., a processor, a peripheral device, etc.). Other indications may be used. For example, write operations may be used instead of, or in addition to, invalidating probes. As another example, a read operation that indicates that the source of the read intends to modify the cache line may be an indication of an update. Such read operations are often referred to as read with intent to modify operations, read modify operations, or read exclusive operations. In other embodiments, MAddr may be compared with the address of any probe received by processor core 18A, even if the probe does not indicate an update. Such a comparison may cause processor core 18A to exit the MWait state and re-read the cache line (via instructions that follow the MWait instruction in the instruction sequence). In this fashion, software can ensure that the source of the access to the monitored cache line that caused the probe does not receive an exclusive copy of the cache line (which it could subsequently update without causing an invalidating probe).

If a match is detected by comparator 30A, monitor unit 26A causes processor core 18A to exit the state entered in response to the MWait instruction (e.g., by asserting the WMait signal in FIG. 1). Processor core 18A may then continue executing the instructions that follow the MWait instruction. Software may include instructions following the MWait instruction to check the value in the monitored cache line and, if the desired state is not found, to branch back to the monitor instruction/MWait instruction to re-enter the state.
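The software pattern in this paragraph (wake, check the monitored location, and branch back to monitor/MWait if the desired state is absent) can be sketched as follows. This is a toy model under stated assumptions, not the patent's implementation: `monitor` and `mwait` are hypothetical callables standing in for the instructions, the wake-ups are scripted rather than produced by hardware, and the extra check between arming and waiting is a common precaution that the text above does not itself require.

```python
def wait_for_value(memory, addr, desired, monitor, mwait):
    """Arm the monitor, wait, and re-check until memory[addr] == desired.

    Returns the number of wake-ups taken before the desired value was seen.
    """
    wakeups = 0
    while memory[addr] != desired:
        monitor(addr)                  # (re)arm the monitor on the line holding addr
        if memory[addr] == desired:    # the update may have landed before arming
            break
        mwait()                        # returns on any wake-up, not only the desired one
        wakeups += 1
    return wakeups

# Toy environment: each mwait() "wakes" when the next scripted store lands.
memory = {0x1000: 0}
pending_stores = [(0x1000, 7), (0x1000, 1)]    # updates made by another core (scripted)
armed = []

def monitor(addr):
    armed.append(addr)

def mwait():
    store_addr, value = pending_stores.pop(0)
    memory[store_addr] = value

wakeups = wait_for_value(memory, 0x1000, 1, monitor, mwait)
```

In this run the first wake-up finds the value 7, which is not the desired state, so the loop re-arms and waits again; the second wake-up finds 1 and the loop ends.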

Monitor unit 26A also communicates the address of the monitored cache line to monitor unit 26B. In the illustrated embodiment, monitor unit 26A may output the address from register 28A directly to monitor unit 26B. In other embodiments, the address may be communicated in other ways. For example, the address may be transmitted over the interface to bridge 20A (e.g., as a communication coded to indicate that the address is a monitored address), and bridge 20A may route the communication to processor core 18B.

In a similar fashion, monitor unit 26A may receive the address being monitored by monitor unit 26B. In the illustrated embodiment, monitor unit 26A includes register 28B to store a shadow copy of the monitored address from monitor unit 26B (MAddrS in FIG. 1). Monitor unit 26A compares MAddrS with the addresses of store operations performed by processor core 18A (StAddr in FIG. 1). If a store to the cache line indicated by the MAddrS address is detected (comparator 30B), monitor unit 26A may assert the Wakeup-ST signal to monitor unit 26B. In other embodiments, monitor unit 26B may continuously supply the output of register 28C to monitor unit 26A, and register 28B may be omitted in such embodiments.

Monitor unit 26B may similarly generate the Wakeup-ST signal to monitor unit 26A. In response to receiving an asserted Wakeup-ST signal from monitor unit 26B, monitor unit 26A is configured to cause the exit from the state entered in response to the MWait instruction, similar to detecting an invalidating probe to the cache line.
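To tie the registers, comparators, and Wakeup-ST signal together, here is a behavioral Python sketch of a pair of monitor units. It is a toy model under assumptions, not the hardware described: the 64-byte line size, the class and method names, and the `exit_reason` bookkeeping are all invented for illustration.

```python
LINE = 64  # assumed cache-line size in bytes

def base(addr):
    return addr & ~(LINE - 1)

class MonitorUnit:
    """Toy model of monitor unit 26A/26B: MAddr, shadow MAddrS, two comparators."""

    def __init__(self, name):
        self.name = name
        self.maddr = None        # like register 28A/28C: the line this core monitors
        self.maddr_s = None      # like register 28B/28D: shadow of the peer's MAddr
        self.peer = None
        self.waiting = False     # True while in the state entered by MWait
        self.exit_reason = None

    def monitor(self, addr):
        self.maddr = base(addr)
        self.peer.maddr_s = self.maddr       # communicate the address range indication

    def mwait(self):
        self.waiting = True
        self.exit_reason = None

    def _exit(self, reason):
        if self.waiting:                     # only matters if still in the wait state
            self.waiting = False
            self.exit_reason = reason

    def invalidating_probe(self, addr):      # comparator 30A/30C: P-Inv vs. MAddr
        if self.maddr is not None and base(addr) == self.maddr:
            self._exit("P-Inv")

    def store(self, addr):                   # comparator 30B/30D: StAddr vs. MAddrS
        if self.maddr_s is not None and base(addr) == self.maddr_s:
            self.peer._exit("Wakeup-ST")     # assert Wakeup-ST to the peer unit

a, b = MonitorUnit("18A"), MonitorUnit("18B")
a.peer, b.peer = b, a

a.monitor(0x2008)
a.mwait()
b.store(0x3000)                 # different line: core 18A keeps waiting
still_waiting = a.waiting
b.store(0x2030)                 # same 64-byte line as 0x2008: Wakeup-ST
first_exit = a.exit_reason

a.mwait()
a.invalidating_probe(0x2010)    # e.g. an update originating in another node
second_exit = a.exit_reason
```

The two exit paths mirror the description: a store by the core in the same node wakes the waiter directly through Wakeup-ST, while updates from elsewhere arrive as invalidating probes through the normal coherency path.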

Generally, processor core 18A may use the interface to bridge 20A to communicate with other components of computer system 10 (e.g., peripheral devices 16A-16B, processor cores 18B-18D (except for the communication of the shadow copy of the monitored address and the Wakeup-ST signal with processor core 18B mentioned above), memory controllers 22A-22B, etc.). The interface may be designed in any desired fashion. Cache coherent communication may be defined for the interface, as mentioned above. In one embodiment, communication on the interface between bridge 20A and processor cores 18A-18B may be in the form of packets similar to those used on the HT interfaces. In other embodiments, any desired communication may be used (e.g., transactions on a bus interface). In other embodiments, processor cores 18A-18B may share an interface to bridge 20A (e.g., a shared bus interface).

Bridge 20A may generally be configured to receive communications from processor cores 18A-18B and HT circuits 24A-24C and to route those communications to processor cores 18A-18B, HT circuits 24A-24C, and memory controller 22A depending on the communication type, the address in the communication, etc. In one embodiment, bridge 20A includes a system request queue (SRQ) into which received communications are written by bridge 20A. Bridge 20A may schedule communications from the SRQ for routing to the destination or destinations among processor cores 18A-18B, HT circuits 24A-24C, and memory controller 22A. Bridge 20B may be similar with regard to processor cores 18C-18D, HT circuits 24D-24F, and memory controller 22B.

Memories 14A-14B may comprise any suitable memory devices. For example, memories 14A-14B may comprise one or more Rambus DRAMs (RDRAMs), synchronous DRAMs (SDRAMs), double data rate (DDR) SDRAM, static RAM, etc. The address space of computer system 10 may be divided between memories 14A-14B. Each node 12A-12B may include a memory map (e.g., within bridge 20A) used to determine which addresses are mapped to which memory 14A-14B, and hence to which node 12A-12B a memory request for a particular address should be routed. Memory controllers 22A-22B may comprise control circuitry for interfacing to memories 14A-14B. Additionally, memory controllers 22A-22B may include request queues for queuing memory requests, etc.
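The memory map lookup described above can be sketched roughly as follows. This is only an illustration: the even split of a 4 GiB address space, the `MapEntry` type, and the function name `route_memory_request` are assumptions, not details taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapEntry:
    base: int   # first address covered by this entry
    limit: int  # last address covered (inclusive)
    node: str   # node whose memory controller owns the range


# Hypothetical division of the address space between the two memories.
MEMORY_MAP = [
    MapEntry(0x0000_0000, 0x7FFF_FFFF, "12A"),  # memory 14A via controller 22A
    MapEntry(0x8000_0000, 0xFFFF_FFFF, "12B"),  # memory 14B via controller 22B
]


def route_memory_request(address: int) -> str:
    """Return the node to which a memory request for `address` is routed."""
    for entry in MEMORY_MAP:
        if entry.base <= address <= entry.limit:
            return entry.node
    raise ValueError(f"address {address:#x} is not mapped to any memory")
```

A bridge modeled this way would forward a request to its local memory controller when the returned node is its own, and onto a coherent HT link otherwise.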

HT circuits 24A-24F may comprise a variety of buffers and control circuitry for receiving packets from an HT link and for transmitting packets on an HT link. An HT interface comprises unidirectional links for transmitting packets. Each HT circuit 24A-24F may be coupled to two such links (one for transmitting and one for receiving). A given HT interface may be operated in a cache coherent fashion (e.g., between nodes 12A-12B) or in a non-coherent fashion (e.g., to/from peripheral devices 16A-16B). In the illustrated embodiment, HT circuits 24C and 24D are coupled via coherent HT links for communicating between nodes 12A-12B. HT circuits 24A-24B and 24E are not in use, and HT circuit 24F is coupled via non-coherent links to peripheral devices 16A-16B.

Peripheral devices 16A-16B may be any type of peripheral device. For example, peripheral devices 16A-16B may include devices for communicating with another computer system to which the devices may be coupled (e.g., network interface cards or modems). Furthermore, peripheral devices 16A-16B may include video accelerators, audio cards, hard or floppy disk drives or drive controllers, SCSI (Small Computer Systems Interface) adapters and telephony cards, sound cards, and a variety of data acquisition cards such as GPIB or field bus interface cards. Note that the term "peripheral device" is intended to encompass input/output (I/O) devices.

In one embodiment, each of nodes 12A-12B may be a single integrated circuit chip comprising the circuitry shown for that node in FIG. 1. That is, each node 12A-12B may be a chip multiprocessor (CMP). Other embodiments may implement nodes 12A-12B as two or more separate integrated circuits, as desired. Any level of integration or discrete components may be used.

In general, processor cores 18A-18D may comprise circuitry designed to execute instructions defined in a given instruction set architecture. That is, the processor core circuitry may be configured to fetch, decode, execute, and store the results of the instructions defined in the instruction set architecture. Processor cores 18A-18D may comprise any desired configuration, including superpipelined, superscalar, or combinations thereof. Other configurations may include scalar, pipelined, non-pipelined, etc. Various embodiments may employ out-of-order speculative execution or in-order execution. A processor core may include microcoding for one or more instructions or other functions, in combination with any of the above configurations. Various embodiments may implement a variety of other design features such as caches, translation lookaside buffers (TLBs), etc. In CMP embodiments, the processor cores within a given node 12A-12B may comprise circuitry included in the CMP. In other embodiments, processor cores 18A-18D may each comprise a discrete integrated circuit.

As mentioned above, processor cores 18A-18D may be configured to perform store operations during instruction execution. In various embodiments, store operations may be the result of explicit store instructions, may be implicit in other instructions that have a memory operand as a destination, or both. Generally, a store operation may be an update of one or more bytes in memory locations specified by an address associated with the store operation.

Various signals have been described above as being asserted, deasserted, generated, etc. In general, a signal may be any indication transmitted by a source to a receiver. A signal may comprise, for example, one or more signal lines that may be asserted or deasserted.

Note that, while the present embodiment uses the HT interface for communication between nodes and between a node and peripheral devices, other embodiments may use any desired interface or interfaces for either communication. For example, other packet-based interfaces may be used, bus interfaces may be used, various standard peripheral interfaces may be used (e.g., Peripheral Component Interconnect (PCI), PCI Express, etc.), and so on.

Note that, while computer system 10 illustrated in FIG. 1 comprises two nodes 12A-12B, other embodiments may implement one node or more than two nodes. Similarly, each node 12A-12B may include two or more processor cores in various embodiments. In some embodiments, the monitor unit 26 in each processor core within a node may be configured to receive the addresses of the monitored cache lines from each other processor core in the same node, and may be configured to monitor for store operations to each of those monitored cache lines. In other embodiments, subsets of the processor cores may be identified and configured to detect store operations to each other's monitored cache lines. Various embodiments of computer system 10 may include different numbers of HT interfaces per node 12A-12B, different numbers of peripheral devices coupled to one or more of the nodes, etc.

FIGS. 2-4 are flowcharts illustrating operation of one embodiment of processor cores 18A-18D to execute various instructions, and FIG. 5 is a state machine diagram illustrating exemplary states of one embodiment of processor cores 18A-18D. In the description of FIGS. 2-5 below, processor core 18A is used as an example, but processor cores 18B-18D may be similar. For each instruction illustrated in FIGS. 2-4, the processor core 18A-18D executing the instruction may perform other operations (e.g., checking for exceptions) that are not shown in FIGS. 2-4 for simplicity and brevity in the drawings.

Turning now to FIG. 2, a flowchart is shown illustrating operation of one embodiment of processor core 18A for executing a monitor instruction. Processor core 18A may comprise circuitry and/or microcode to perform the operation shown in FIG. 2. While the blocks shown in FIG. 2 are illustrated in a particular order for ease of understanding, any order may be used. Furthermore, blocks may be performed in parallel by combinatorial logic in processor core 18A. Blocks, combinations of blocks, or the flowchart as a whole may be pipelined over multiple clock cycles in various embodiments, as desired.

In the present embodiment, the address of the cache line to be monitored is defined to be in the EAX register (or the RAX register, if the AMD64™ extensions are implemented by processor core 18A). In other embodiments, processor core 18A may add two or more operands to generate the address of the cache line to be monitored. In some embodiments, if protected mode is enabled, the contents of the EAX register are an offset that is added to a segment base address defined in one of the segment registers to generate a linear address. In other embodiments, the segment base address may be zero, and the contents of the EAX register may equal the linear address. If paging is enabled (decision block 40, "yes" leg), the address is a virtual address (e.g., a linear address) that is translated to a physical address through the paging mechanism (block 42). The physical address may be the address monitored by monitor unit 26A. In either case, processor core 18A may write the address to be monitored to the MAddr register 28A in monitor unit 26A (block 44). Additionally, processor core 18A may communicate the address to the other processor core 18B (block 46). In other embodiments, processor core 18A may communicate the address to more than one other processor core. Processor core 18A may also "arm" monitor unit 26A (block 48). Generally, arming monitor unit 26A may refer to placing monitor unit 26A in a state that indicates that a monitor instruction has been executed (and thus that an address to be monitored has been established in monitor unit 26A). The armed state may be used to determine the response to the MWait instruction, as described in more detail below.
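The flow of FIG. 2 can be sketched as a small software model. The sketch below makes several assumptions that the description does not fix: a 64-byte cache line, a `translate` callback standing in for the paging mechanism, and the names `MonitorUnit` and `execute_monitor`, which are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumed line size; the description does not specify one


class MonitorUnit:
    """Toy model of a monitor unit (e.g., 26A) and its two address registers."""

    def __init__(self):
        self.maddr = None    # MAddr: cache line this core is monitoring
        self.maddr_s = None  # MAddrS: shadow copy of the line another core monitors
        self.armed = False


def execute_monitor(eax, segment_base, paging_enabled, translate, own_unit, peer_unit):
    """Model blocks 40-48 of FIG. 2 for one monitor instruction."""
    linear = segment_base + eax                 # offset added to the segment base
    # Decision block 40 / block 42: translate only when paging is enabled.
    physical = translate(linear) if paging_enabled else linear
    line = physical & ~(CACHE_LINE_BYTES - 1)   # address of the containing cache line
    own_unit.maddr = line                       # block 44: write MAddr
    peer_unit.maddr_s = line                    # block 46: peer keeps a shadow copy
    own_unit.armed = True                       # block 48: arm the monitor unit
    return line
```

In this model the shadow copy is written directly into the peer's monitor unit; in hardware the address would travel over whatever connection couples the two cores.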

FIG. 3 is a flowchart illustrating operation of one embodiment of processor core 18A for executing an MWait instruction. Processor core 18A may comprise circuitry and/or microcode to perform the operation shown in FIG. 3. While the blocks shown in FIG. 3 are illustrated in a particular order for ease of understanding, any order may be used. Furthermore, blocks may be performed in parallel by combinatorial logic in processor core 18A. Blocks, combinations of blocks, or the flowchart as a whole may be pipelined over multiple clock cycles in various embodiments, as desired.

If monitor unit 26A is armed via previous execution of a monitor instruction, and no subsequent update to the cache line has been detected (decision block 50, "yes" leg), processor core 18A may enter a sleep state in this embodiment (block 52). In other embodiments, various other states may be entered in response to the MWait instruction (e.g., the implementation-dependent optimized state mentioned previously). The sleep state may be a power conservation state in which processor core 18A attempts to reduce its power consumption. In some embodiments, processor core 18A may cease executing instructions in the sleep state. In various embodiments, the sleep state may include one or more of the following to reduce power consumption: reducing the clock frequency at which processor core 18A operates, gating clocks to various circuitry, turning off the clock, turning off a phase locked loop or other clock generation circuitry, powering down the processor core (except for the monitor unit), etc. The sleep state may be, for example, any of the stop grant states used in various implementations of power management in personal computer systems. In other embodiments, other states may be used. For example, if processor core 18A implements multi-threading facilities, processor core 18A may switch to executing another thread upon execution of the MWait instruction, until an update to the monitored cache line is detected.

If monitor unit 26A is not armed (decision block 50, "no" leg), processor core 18A may take no action with respect to the MWait instruction and may continue execution with the next instruction after the MWait instruction. Monitor unit 26A may not be armed if no monitor instruction was executed prior to the MWait instruction (although other instructions may be executed between the monitor instruction and the MWait instruction). Additionally, monitor unit 26A may not be armed if the monitor instruction was executed previously but an update of the monitored cache line was detected before the MWait instruction was executed.
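The two legs of decision block 50 reduce to a small predicate, sketched below. The function names and the string return values are illustrative assumptions; "sleep" stands for whichever waiting state a given embodiment uses.

```python
def monitor_unit_armed(monitor_executed: bool, update_seen_since: bool) -> bool:
    """Armed only if a monitor instruction ran and no update was detected since."""
    return monitor_executed and not update_seen_since


def execute_mwait(armed: bool) -> str:
    """Return the core's next action for an MWait instruction (FIG. 3)."""
    if armed:              # decision block 50, "yes" leg
        return "sleep"     # block 52: enter the sleep (or other waiting) state
    return "continue"      # "no" leg: proceed to the instruction after MWait
```

The second case matters for correctness: if the update arrived between the monitor and MWait instructions, the core must not go to sleep waiting for an event that has already happened.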

Turning next to FIG. 4, a flowchart is shown illustrating operation of one embodiment of processor core 18A for performing a store operation. Processor core 18A may comprise circuitry and/or microcode to perform the operation shown in FIG. 4. While the blocks shown in FIG. 4 are illustrated in a particular order for ease of understanding, any order may be used. Furthermore, blocks may be performed in parallel by combinatorial logic in processor core 18A. Blocks, combinations of blocks, or the flowchart as a whole may be pipelined over multiple clock cycles in various embodiments, as desired.

Monitor unit 26A compares the address of the store operation to the address in register 28B (the register storing the MAddrS address). If the store address matches MAddrS (decision block 54, "yes" leg), monitor unit 26A may assert the Wakeup-ST signal to processor core 18B (block 56). In either case, processor core 18A may complete the store by updating memory (block 58). In embodiments of processor core 18A that implement a cache, the memory may be updated in the cache. Additionally, cache coherency may be maintained according to the coherency protocol implemented in computer system 10.
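A minimal sketch of the store flow in FIG. 4 follows. It assumes a 64-byte cache line, compares addresses at cache-line granularity, models memory as a dictionary of bytes, and ignores stores that straddle two lines; `execute_store` and `wake_peer` are hypothetical names.

```python
CACHE_LINE_BYTES = 64  # assumed line size; the description does not specify one


def line_of(address: int) -> int:
    """Return the address of the cache line containing `address`."""
    return address & ~(CACHE_LINE_BYTES - 1)


def execute_store(address, data, maddr_s, memory, wake_peer):
    """Model blocks 54-58 of FIG. 4: compare, optionally signal, then store."""
    # Decision block 54: does the store hit the shadowed monitored line?
    hit = maddr_s is not None and line_of(address) == line_of(maddr_s)
    if hit:
        wake_peer()                      # block 56: assert Wakeup-ST to the peer
    for i, byte in enumerate(data):      # block 58: complete the store either way
        memory[address + i] = byte
    return hit
```

Note that the store completes on both legs of the decision; the comparison only determines whether the peer core is signalled.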

FIG. 5 is a state machine diagram illustrating exemplary states of processor core 18A related to the implementation of the monitor/MWait instructions. Other states for other purposes may be implemented by various embodiments of processor core 18A. FIG. 5 shows a normal state 60, an armed state 62, and a sleep state 64.

Normal state 60 may be the state of processor core 18A in which instructions are being executed and no monitoring of cache lines (as defined for the monitor/MWait instructions) is being performed. Armed state 62 may be the state in which monitor unit 26A has been updated with the address of a cache line to be monitored (via execution of the monitor instruction) and is awaiting the subsequent execution of the MWait instruction. Sleep state 64 is the power conservation state mentioned above. As noted above, other states may be used in place of sleep state 64 in other embodiments.

If processor core 18A is in normal state 60 and a monitor instruction is executed, the state machine transitions to armed state 62. In armed state 62, if an invalidating probe to the monitored cache line is detected (WExit asserted), or if an asserted Wakeup-ST signal is received by monitor unit 26A, the state machine transitions back to normal state 60. This transition represents the case in which an update to the monitored cache line occurs before the MWait instruction is executed. On the other hand, if the MWait instruction is executed while the state machine is in armed state 62, the state machine transitions to sleep state 64. The state machine may transition from sleep state 64 to normal state 60 in response to detecting an invalidating probe to the monitored cache line (WExit), assertion of the Wakeup-ST signal to monitor unit 26A (Wakeup-ST), or any other exit condition defined for the MWait instruction and/or the processor core implementation (Other-Exit). The other exit conditions may vary from embodiment to embodiment, but may include delivery of an external interrupt to processor core 18A, a reset of processor core 18A, etc.
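The transitions of FIG. 5 can be captured in a small table-like function. The sketch below uses the state reference numerals as enum values and event names taken from the description; treating every event not mentioned in the text as leaving the state unchanged is an assumption.

```python
from enum import Enum


class State(Enum):
    NORMAL = 60
    ARMED = 62
    SLEEP = 64


# Events that indicate the monitored cache line was (or is being) updated.
UPDATE_EVENTS = {"WExit", "Wakeup-ST"}


def next_state(state: State, event: str) -> State:
    """Return the state after `event`, following the transitions of FIG. 5."""
    if state is State.NORMAL and event == "monitor":
        return State.ARMED
    if state is State.ARMED:
        if event in UPDATE_EVENTS:   # update arrived before MWait was executed
            return State.NORMAL
        if event == "mwait":
            return State.SLEEP
    if state is State.SLEEP and event in UPDATE_EVENTS | {"Other-Exit"}:
        return State.NORMAL
    return state  # assumption: events not described leave the state unchanged
```

For example, the sequence monitor, MWait, Wakeup-ST walks the machine through armed and sleep and back to normal, while an MWait executed in the normal state has no effect.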

FIG. 6 is an example illustrating operation of processor core 18A when a processor core in the same node 12A (e.g., processor core 18B) updates the monitored cache line. Code executed by processor core 18A is shown beneath the heading (in bold) "Processor Core 18A, Node 12A". Code executed by processor core 18B is shown beneath the heading (in bold) "Processor Core 18B, Node 12A". Processor core 18A executes the monitor instruction, establishing address "A" of the cache line to be monitored and arming monitor unit 26A. The code then includes a check of address A (denoted "check [A]" in FIG. 6). The check may include reading a memory location within the monitored cache line and comparing it to a desired state. If the desired state is in the memory location, the check may branch around the MWait instruction and continue with subsequent processing. The check thus catches an update to the cache line that races with execution of the monitor instruction. In this example, the desired state is not in the cache line, and processor core 18A executes the MWait instruction. Processor core 18A thus enters the sleep state (arrow 70).

Processor core 18B executes a store operation to address A and detects (in monitor unit 26B) that the address of the store operation matches the shadowed monitor address (MAddrS) from processor core 18A. Accordingly, processor core 18B (and more particularly monitor unit 26B) signals processor core 18A (and more particularly monitor unit 26A) by asserting the Wakeup-ST signal (arrow 72). Processor core 18A checks address A again ("check [A]" in FIG. 6) and detects the desired state in the cache line. Processor core 18A therefore continues execution with other instructions.

FIG. 7 is an example illustrating operation of processor core 18A when a processor core in another node 12B (e.g., processor core 18C) updates the monitored cache line. Code executed by processor core 18A is shown beneath the heading (in bold) "Processor Core 18A, Node 12A". Code executed by processor core 18C is shown beneath the heading (in bold) "Processor Core 18C, Node 12B". Additionally, the transmission of communications between processor core 18C and processor core 18A is shown in the middle of FIG. 7. As in the example of FIG. 6, processor core 18A executes the monitor instruction, establishing address "A" of the cache line to be monitored and arming monitor unit 26A; checks address "A"; and executes the MWait instruction. Processor core 18A thus enters the sleep state (arrow 74).

Processor core 18C executes a store operation to address A. In the present embodiment, processor core 18C does not have a shadow copy of the address being monitored by processor core 18A, and thus proceeds with the normal transmission of coherency operations to complete the store. In particular, processor core 18C transmits an invalidating probe to bridge 20B in node 12B (arrow 76). Bridge 20B then transmits the invalidating probe to node 12A, where it arrives at bridge 20A. Bridge 20A then transmits the invalidating probe to processor core 18A, which detects that the address of the invalidating probe matches the address in register 28A. Processor core 18A therefore exits the sleep state (arrow 78). Processor core 18A checks address A again ("check [A]" in FIG. 7) and detects the desired state in the cache line. Processor core 18A therefore continues execution with other instructions.
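The difference between the wake-up paths of FIG. 6 and FIG. 7 can be summarized in a short sketch. The core-to-node and node-to-bridge tables follow the reference numerals used in the description; the function `wake_path`, its return format, and the idea of passing in the set of cores that hold a shadow copy are illustrative assumptions.

```python
NODE_OF = {"18A": "12A", "18B": "12A", "18C": "12B", "18D": "12B"}
BRIDGE_OF = {"12A": "20A", "12B": "20B"}


def wake_path(storing_core, monitoring_core, shadow_holders):
    """Return (mechanism, hops) for waking `monitoring_core` after a store.

    `shadow_holders` is the set of cores holding a shadow copy (MAddrS) of the
    address that `monitoring_core` is monitoring.
    """
    if storing_core in shadow_holders:
        # FIG. 6 case: the storing core signals the monitoring core directly.
        return "Wakeup-ST", [f"core {storing_core}", f"core {monitoring_core}"]
    # FIG. 7 case: an ordinary invalidating probe travels through the bridges.
    src = BRIDGE_OF[NODE_OF[storing_core]]
    dst = BRIDGE_OF[NODE_OF[monitoring_core]]
    hops = [f"core {storing_core}", f"bridge {src}"]
    if dst != src:
        hops.append(f"bridge {dst}")
    hops.append(f"core {monitoring_core}")
    return "invalidating probe", hops
```

With core 18B as the only shadow holder, a store from core 18B wakes core 18A in a single hop, whereas a store from core 18C relies on the probe passing through bridges 20B and 20A.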

Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.

The present invention is generally applicable to processors and to monitoring cache lines for changes.

Claims

A processor core 18A comprising:

a monitor unit 26A configured to monitor an address range for an update responsive to a first instruction, wherein the processor core 18A is configured to enter a first state 64 to await the update to the address range, wherein the monitor unit 26A is configured, responsive to executing the first instruction, to communicate an address range indication identifying the address range to a second processor core 18B, wherein the monitor unit 26A is configured to receive a signal (Wakeup-ST) from the second processor core 18B indicating that the second processor core 18B is updating at least one byte in the address range, and wherein the processor core 18A is configured to exit the first state 64 responsive to the signal (Wakeup-ST).

The processor core of claim 1,

wherein the address range indication comprises an address identifying a block of contiguous memory bytes.

The processor core of claim 1,

further comprising an interface 24C for communicating with other components of a computer system, wherein the processor core 18A is configured to exit the first state 64 responsive to receiving an indication of an update from the interface 24C if the indication of the update indicates an update within the address range.

The processor core of claim 3,

wherein the indication of the update is a probe.

The processor core of claim 1,

wherein the monitor unit 26A is further configured to store a shadow copy of a second address range indication received from the second processor core 18B, and wherein the second processor core 18B is monitoring for an update within a second address range indicated by the second address range indication.

The processor core of claim 5,

wherein the monitor unit 26A is configured to signal the second processor core 18B responsive to the processor core 18A executing a second store operation that updates at least one byte within the second address range.

The processor core of claim 1,

wherein the first state comprises a power conservation state.

The processor core 18A as recited in the claims; and

a second processor core 18B coupled to receive the address range indication and configured to signal the processor core 18A responsive to executing a store operation that updates at least one byte within the address range.

Communicating, from a first processor core 18A to a second processor core 18B, an address range indication identifying an address range that the first processor core 18A is monitoring for an update,

wherein the communicating is responsive to executing a first instruction defined to cause the first processor core 18A to monitor the address range for the update;

executing, by the second processor core 18B, a store operation that updates at least one byte within the address range;

responsive to the store operation, signalling the first processor core 18A; and

the first processor core 18A exiting, responsive to the signal, a first state in which the first processor core 18A is awaiting the update within the address range.

The method of claim 9,

wherein the first processor core is coupled to an interface for communicating with other components of a computer system, the method further comprising the first processor core exiting the first state responsive to receiving an indication of an update from the interface if the indication of the update indicates an update within the address range.

The method of claim 10,

wherein the indication of the update is a probe.