KR100258358B1

KR100258358B1 - The method and apparatus for data coherency of distributed shared memory

Info

Publication number: KR100258358B1
Application number: KR1019970051082A
Authority: KR
Inventors: 정병선
Original assignee: 구자홍; 엘지전자주식회사
Priority date: 1997-10-04
Filing date: 1997-10-04
Publication date: 2000-06-01
Also published as: KR19990030726A

Abstract

PURPOSE: A method and apparatus for maintaining data consistency of a distributed shared memory are provided that manage the local memories distributed across a plurality of nodes, each carrying a plurality of processors, by implementing a local memory directory, and thereby maintain data consistency between the local memories mounted in the nodes. CONSTITUTION: A local memory access address latch detects and stores each address that accesses the local memory. An address latch receives the address from the local memory access address latch and searches the local memory directory. A directory cache consists of 16M addresses.

Description

Method and device for maintaining data consistency in distributed shared memory

The present invention relates to a symmetric multiprocessor (hereinafter "SMP") system with a distributed shared-memory architecture, and more particularly to a method and apparatus for controlling data coherency of distributed shared memory in which a local memory directory is implemented to manage the local memories distributed across a plurality of nodes, each equipped with a plurality of processors, thereby keeping the shared memory mounted in each node coherent.

In general, in an SMP system with a distributed shared-memory architecture, each node equipped with a plurality of processors runs a single copy of the operating system. This means that an interconnect structure that maintains coherency must be applied to the interconnection between the processors and the memory within a node. In addition, because the operating system of each node does much of the work of resource utilization when performance is scaled up or applications are added, such a system is well suited to developers' applications. Because the SMP architectures offered by different vendors are basically the same, software can be ported easily.

A large single-node SMP system, however, has the disadvantage that the number of processors that can be installed is limited by the size and speed of the backplane and by the shared system bus.

To overcome this disadvantage, SMP systems having a distributed shared-memory structure, as shown in FIG. 1, have conventionally been implemented.

FIG. 1 is a structural diagram of an 8-way SMP system with a typical distributed shared-memory structure. As shown, each node (100, 200) carries processors (101 to 104, 201 to 204) on a local bus that supports up to four processors, and the processors (101 to 104, 201 to 204) share the memories (120, 220) through the bus transaction control units (111, 211), forming a distributed shared-memory structure.

The nodes (100, 200) configured in this way share data through an interconnect bus, and also share PCI bridges (130, 131) on the interconnect bus to which input/output buses (PCI I/O buses) can be connected.

The interconnect bus is a standard Scalable Coherent Interconnect (SCI) bus, which makes it easy to increase system scalability (interconnecting a plurality of processing nodes).

In addition, the memory (120, 220) of each node, which the nodes share, supplies data requested by its own node or by another node through the bus transaction control unit (111, 211) and the memory control unit (122, 212), and each node implements a memory directory (not shown in the drawings) that manages the state of that data, that is, the state of data changed by transactions of its own node or of another node.

This memory directory keeps a history of the references that processors of other nodes have made to the memory of its own node.

For example, as shown in FIG. 3, when the address of the memory directory is 'A[31:0]', 4 GB (gigabytes) can be addressed. That is, if the history were managed per byte, 4G entries would be needed. If the cache line size is 32 bytes, the history of the memory is instead managed in units of 32 bytes.

Two bits are needed per entry to distinguish the states 'Clean, Shared, Invalid (Dirty)' according to how the memory has been referenced, so 256 Mbit (128M * 2 bits), that is, 32 MByte of memory, is needed for the memory directory.
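The arithmetic above can be checked with a short sketch. It is illustrative only and not part of the patent text; the function name `directory_size_bytes` and its parameters are assumptions introduced here, and the directory is assumed to hold exactly one 2-bit entry per 32-byte cache line, mapped 1:1.

```python
GiB = 1 << 30
MiB = 1 << 20


def directory_size_bytes(memory_bytes: int, line_bytes: int = 32,
                         bits_per_entry: int = 2) -> int:
    """Size of a 1:1-mapped directory with one state entry per cache line."""
    entries = memory_bytes // line_bytes      # one entry per cache line
    return entries * bits_per_entry // 8      # bits -> bytes


entries_4g = (4 * GiB) // 32
print(entries_4g // MiB)                      # 128 (128M entries)
print(directory_size_bytes(4 * GiB) // MiB)   # 32 (MByte of directory memory)
```

The same function applies to any memory size, which makes the linear growth of a 1:1-mapped directory with installed memory easy to see.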

However, in the conventional distributed shared-memory SMP system described above, using an interconnect bus to connect the nodes increases system scalability, but the bus itself has a narrow transfer width and uses point-to-point rather than pipelined signal transmission, so latency on the interconnect bus increases. Moreover, when an on-line transaction processing (OLTP) application, which is a typical server application, is running, the SMP processors require continuous communication and mutual interference with one another, so accesses to the interconnect bus become very frequent and system performance ultimately degrades.

Furthermore, the memory directory for managing distributed shared memory in the prior art must use high-speed memory whose access time is at least as fast as 10 ns (nanoseconds), and must be mapped one-to-one (1:1 mapping), so a large memory capacity is required. When up to 64 GByte of memory is installed, the memory directory that manages it needs an enormous amount of high-speed memory, which not only occupies a great deal of board space but is also very costly.

Accordingly, to overcome the problems of the prior art described above, an object of the present invention is to provide a method and apparatus for controlling data coherency of distributed shared memory that implement the interconnect bus of the distributed shared-memory structure as a high-speed local bus with a wide transfer width and pipelined signal transmission, and that manage all of the distributed shared memory with only a small, fast local memory directory regardless of the capacity of the distributed shared memory, thereby maintaining data coherency among the local memories distributed across the nodes.

Another object of the present invention is to implement the local memory directory that manages each distributed shared memory as a small, fast directory regardless of the capacity of the distributed shared memory and, so that all of the distributed shared memory can be managed, to control the local memory directory according to the processor accessing the local memory, the transaction type, the current state of the local memory directory, and the result of snooping the transaction driven on the remote node, thereby maintaining data coherency among the distributed local memories.

FIG. 1 is a block diagram of an 8-way SMP system with a typical distributed shared-memory structure;

FIG. 2 is a data layout diagram of a conventional local memory directory;

FIG. 3 is a block diagram of an apparatus for maintaining data coherency of distributed shared memory according to the present invention;

FIG. 4 is a data layout diagram of the local memory directory of FIG. 3; and

FIGS. 5A to 5D are flowcharts of the process of maintaining data coherency of distributed shared memory according to the present invention.

<Description of Symbols for Main Parts of Drawings>

111: bus transaction control unit 112: memory control unit

116: directory control unit 117: local memory directory

To achieve this object, a data coherency control process for distributed shared memory according to the present invention comprises: a first step of, when a given processor in a node formed of an SMP with a distributed shared-memory structure and equipped with a plurality of processors references the local memory in its own node, detecting the corresponding bus transaction and, if the bus transaction is a read-for-write transaction (RAW: Read For Write), checking the state of the local memory directory that manages the local memory; a second step of, if the local memory directory state is Invalid, issuing the corresponding transaction to the remote node and then observing the snoop phase; a third step of, if the snoop result shows that the data is present, receiving the data written back from the remote node and supplying it to the requesting processor; a fourth step of, if the snoop result shows that the data is not present, receiving the corresponding memory state from the remote node; a fifth step of, if the local memory directory state is Shared, issuing an invalidation transaction for the data to the remote node and then performing the snoop phase; a sixth step of, if the snoop result shows that the data in the remote node is in the Shared state, invalidating the data in the remote node and then supplying the data from the local memory to the requesting processor; and a seventh step of, if the snoop result shows that the data is not present in the remote node, supplying the data present in its own node to the requesting processor.

Another data coherency control process for distributed shared memory according to the present invention comprises: a first step of, when a given processor in a node formed of a symmetric multiprocessor with a distributed shared-memory structure and equipped with a plurality of processors references the local memory in its own node, detecting the corresponding bus transaction and, if the bus transaction is a simple data read transaction (RIR: Read For Read), checking the state of the local memory directory that manages the local memory; a second step of deferring the currently driven transaction according to the local memory directory state, issuing a simple read transaction to the remote node, and then observing the snoop phase; a third step of, if the local memory directory state is Invalid and the snoop result on the local bus of the remote node shows that a processor of the remote node holds the data in the Dirty state, receiving the data written back from the remote node and supplying it to the requesting processor; a fourth step of, if the snoop result shows that the data is not present, receiving the corresponding memory state from the remote node; a fifth step of, if the local memory directory state is Shared and observation of the snoop phase shows that a processor of the remote node shares the data, receiving the state of the data from the remote node and then having the requesting processor retry the transaction in the response phase of the deferred transaction; and a sixth step of, if the snoop result shows that the data is not present in the remote node, supplying the data present in its own node to the requesting processor.

Yet another data coherency control process for distributed shared memory according to the present invention comprises: a first step of, when a given processor in a node formed of a symmetric multiprocessor with a distributed shared-memory structure and equipped with a plurality of processors references the local memory in its own node, detecting the corresponding bus transaction and, if the bus transaction is an invalidation transaction, checking the state of the local memory directory that manages the local memory; a second step of, if the local memory directory state is Shared, driving an invalidation transaction on the local bus of the remote node and then observing the snoop phase; a third step of, if observation of the snoop phase shows that a particular processor of the remote node holds the data in the Shared state, receiving the state of the data from the remote node and then changing the state of the local memory directory to Clean; and a fourth step of, if observation of the snoop phase shows that no processor of the remote node holds the data, receiving the state of the data from the remote node and then leaving the state of the local memory directory unchanged.
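The invalidation process above can be read as a small state transition on the local memory directory entry. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `DirState` and `on_local_invalidate` are hypothetical, the remote snoop result is modeled as a boolean argument, and the behaviour for directory states other than Shared (which the text does not describe) is assumed to be "do nothing".

```python
from enum import Enum
from typing import Tuple


class DirState(Enum):
    CLEAN = "clean"
    SHARED = "shared"
    INVALID = "invalid"


def on_local_invalidate(state: DirState,
                        remote_holds_shared: bool) -> Tuple[DirState, bool]:
    """Return (new directory state, whether a remote invalidation was driven).

    remote_holds_shared is the snoop result observed on the remote local bus;
    it is only meaningful when an invalidation is actually driven there.
    """
    if state is not DirState.SHARED:
        # Assumption: the text only covers the Shared case.
        return state, False
    if remote_holds_shared:
        # Third step: remote copy existed and was invalidated -> Clean.
        return DirState.CLEAN, True
    # Fourth step: no remote copy found -> state left as it was.
    return state, True


print(on_local_invalidate(DirState.SHARED, True))
```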

Yet another data coherency control process for distributed shared memory according to the present invention comprises: a first step of, in a node formed of a symmetric multiprocessor with a distributed shared-memory structure and equipped with a plurality of processors, detecting the corresponding bus transaction when a given processor of a remote node references the local memory in the node; a second step of, if the detected transaction is an invalidation transaction, updating the local memory directory state to Invalid; a third step of, if the detected transaction is a transaction that reads data, updating the local memory directory state to Invalid for a read-for-write transaction or to Shared for a simple read transaction, and then reading the data from the local memory; and a fourth step of, if the detected transaction is a write transaction, updating the local memory directory to the corresponding state.
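The directory updates triggered by a remote access can be sketched as a lookup on the transaction type. This is an illustrative sketch only: `DirState`, `Txn`, and `on_remote_access` are hypothetical names, and because the text does not name the state that results from a remote write transaction, the sketch assumes the entry is left unchanged in that case.

```python
from enum import Enum


class DirState(Enum):
    CLEAN = "clean"
    SHARED = "shared"
    INVALID = "invalid"


class Txn(Enum):
    INVALIDATE = "invalidate"
    READ_FOR_WRITE = "read_for_write"
    READ = "read"
    WRITE = "write"


def on_remote_access(state: DirState, txn: Txn) -> DirState:
    """New local directory state after a remote processor's transaction."""
    if txn is Txn.INVALIDATE:
        return DirState.INVALID      # second step
    if txn is Txn.READ_FOR_WRITE:
        return DirState.INVALID      # third step, read for write
    if txn is Txn.READ:
        return DirState.SHARED       # third step, simple read
    # Fourth step: the resulting state is not named in the text;
    # leaving it unchanged is an assumption of this sketch.
    return state


print(on_remote_access(DirState.CLEAN, Txn.READ))
```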

A data coherency control apparatus for distributed shared memory according to the present invention includes a local memory directory that maintains data coherency by managing the history of the shared local memory mounted in a plurality of nodes, each formed of a symmetric multiprocessor with a distributed shared-memory structure and equipped with a plurality of processors, and directory control means for controlling the local memory directory, wherein the local memory directory comprises: local memory access address latch means for detecting and holding an address that accesses the local memory; and a directory cache that receives the address from the local memory access address latch means and latches it in order to look up the local memory directory.

Here, the directory cache comprises: a byte offset of a predetermined number of bits for byte-level addressing within a cache line; an index address of a predetermined number of bits that is used to address the directory cache when it is accessed and that indicates the memory state; a tag address that maps onto the index address at a predetermined ratio; and a reserved address that secures management capacity as the capacity of the local memory increases.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 3 is a block diagram of the data coherency control apparatus for the distributed shared memory of each node according to the present invention. As shown, the apparatus comprises: first and second requester/responders (113a, 113b) that, under the control of the bus transaction control unit (BTC: Bus Transaction Controller) (111), request and respond to data or transactions on the local bus and the interconnect bus, respectively; first and second in-order queue (IOQ: In Order Queue) interface units (114a, 114b) that, under the control of the bus transaction control unit (111), store the transactions driven on the local bus and the interconnect bus in order of occurrence; first and second deferred queue interface units (115a, 115b) that, under the control of the bus transaction control unit (111), defer or execute each transaction recorded in the first and second in-order queue interface units (114a); a directory control unit (116) that controls history management of the local memory (220) under the control of the bus transaction control unit (111); and a local memory directory (117) that, under the control of the directory control unit (116), manages the history of the local memory (220) to maintain data coherency.

Here, the local memory directory (117) consists of a local memory access address latch that detects and holds the address accessing the local memory (220), and a directory cache that receives the address from the local memory access address latch and latches it in order to look up the local memory directory.

The directory cache address consists of: byte offsets (A[4:0]) for byte-level addressing within a cache line of the second-level cache memory in the processor; an index address (A[28:5]) made up of 16M entries of 2 bits each (4 MByte in total), which is used to address the directory cache when it is accessed and indicates the memory state (Clean, Shared, Dirty); a tag address (A[31:29]) that maps onto the index address (A[28:5]) at a ratio of 1:8; and a reserved address (A[35:32]) that secures management capacity as the capacity of the local memory (120) increases.
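The address fields listed above can be illustrated with a short decoding sketch. It is not taken from the patent; the function name `split_address` and the dictionary keys are assumptions, and only the bit positions A[4:0], A[28:5], A[31:29], and A[35:32] come from the text.

```python
INDEX_BITS = 24                      # A[28:5] -> 2**24 = 16M entries
DIRECTORY_BYTES = (1 << INDEX_BITS) * 2 // 8   # 2 bits per entry


def split_address(addr: int) -> dict:
    """Split a 36-bit physical address into the directory cache fields."""
    if not 0 <= addr < (1 << 36):
        raise ValueError("address must fit in A[35:0]")
    return {
        "byte_offset": addr & 0x1F,           # A[4:0], 32-byte cache line
        "index": (addr >> 5) & 0xFFFFFF,      # A[28:5], selects the entry
        "tag": (addr >> 29) & 0x7,            # A[31:29], 8 values per index
        "reserved": (addr >> 32) & 0xF,       # A[35:32]
    }


print(DIRECTORY_BYTES // (1 << 20))           # 4 (MByte)
print(split_address(0x2000_0020))
# {'byte_offset': 0, 'index': 1, 'tag': 1, 'reserved': 0}
```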

The operation of the present invention configured as described above will now be described in detail with reference to FIGS. 1 to 5D.

First, the present invention implements the local memory directory (117) (directory cache) using only a small, fast directory in order to maintain data coherency among the memories distributed in an SMP computer system with a distributed shared-memory structure.

That is, the local memory directory (117) manages the state of data when the local memory in its own node has been accessed by a processor of a remote node. By keeping this history of memory accesses from remote nodes, the appropriate action can be taken, by consulting the stored history, when a processor in its own node accesses the local memory.

To manage this history of the local memory, the local memory directory allocates 2 bits per cache line (32 bytes) of the cache (the second-level (L2) cache memory in the processor), and three states exist.

The three states are: Clean, indicating that the address has never been accessed by a processor of a remote node; Shared, indicating that the address has been accessed by a processor of a remote node for a read reference, so that the data at the address is shared with the remote node's processor; and Invalid, indicating that the address has been accessed by a processor of a remote node for a write reference, so that the data at the address is invalid and valid data exists only in the remote node's processor.

This is described in more detail below with reference to FIG. 4.

There is a local memory access address latch that detects and stores the address accessing the local memory, and a second-stage address latch that receives the address from the local memory access address latch and is used to look up and update the local memory directory (117). The directory cache consists of 16M entries of 2 bits each (4 MByte in total).

If the local memory (120) has a capacity of 4 GByte, implementing the directory cache it requires would ordinarily take a total of 32 MByte of high-speed memory such as static RAM; the present invention implements it with 4 MByte.

To this end, as shown in FIG. 4, for a given index address there exist eight addresses with different tag addresses.
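The 1:8 mapping can be made concrete by listing the eight cache-line addresses that fall on one directory cache entry. The sketch below is illustrative and assumes the field layout A[28:5] for the index and A[31:29] for the tag; the function name `lines_sharing_entry` is hypothetical.

```python
INDEX_SHIFT, INDEX_BITS = 5, 24      # index field A[28:5]
TAG_SHIFT, TAG_BITS = 29, 3          # tag field A[31:29]


def lines_sharing_entry(addr: int) -> list:
    """All 32-bit cache-line addresses that map to the same directory entry."""
    index = (addr >> INDEX_SHIFT) & ((1 << INDEX_BITS) - 1)
    return [(tag << TAG_SHIFT) | (index << INDEX_SHIFT)
            for tag in range(1 << TAG_BITS)]


group = lines_sharing_entry(0x40)
print(len(group))                    # 8
print([hex(a) for a in group[:3]])   # ['0x40', '0x20000040', '0x40000040']
```

Eight lines per entry is exactly the ratio between the 32 MByte a 1:1-mapped directory would need for 4 GByte and the 4 MByte directory cache.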

The directory cache in the local memory directory (117) implemented in this way is accessed in two cases: when a processor in its own node accesses the local memory, and when a processor of a remote node accesses the local memory.

First, the case in which a given processor accesses the local memory located in its own node will be described in detail with reference to FIG. 5.

FIG. 5A is a flowchart of how the local memory directory is managed when a read-for-write or write bus transaction occurs while a particular processor is accessing the distributed shared memory in its own node. As shown, when a processor on the local bus holds a cache line in the Dirty state and another processor on the same local bus requests that data with a bus read line (Read For Read) transaction or a bus read invalidate line (Read For Write) transaction, the processor holding the dirty cache line places the data on the local bus by an implicit writeback transaction.

In this case, if the transaction is a read-for-write transaction, only the requesting processor takes the data and the local memory (120) is not updated. If, on the other hand, the transaction is a simple read transaction, the requesting processor takes the data and the local memory (120) is also updated; a write cycle to the local memory occurs at this point.

Also, when a processor on the local bus is about to replace another cache line at the same index address in order to map a new cache line, and that cache line is in the Dirty state, the processor first writes the cache line to the local memory with a bus write line transaction and then maps the new cache line. A write cycle to the local memory occurs in this case as well.

In these two cases of writes to the local memory, a processor within the node holds the data in the Dirty state, so the data does not exist in the remote node, and the local memory directory state therefore remains Clean without being updated.

When the transaction on the local bus is a read-for-write and no other processor holds the data in the Dirty state, the state of the local memory directory (117) is checked before the memory is accessed.

If the local memory directory (117) is in the Clean state, the data is supplied from the local memory to the requesting processor without updating the state of the local memory directory (117).

If, however, the local memory directory (117) is in the Invalid state, the data must be read from the remote node, so the current transaction is first deferred. The data is then requested from the bus transaction control unit (111) of the remote node over the interconnect bus. The remote node's bus transaction control unit issues a read-for-write transaction on the local bus of the remote node and then observes the snoop phase.

상기한 스누프 단계에서 'HITM'신호가 구동되면 원거리 로컬버스상의 특정 프로세서가 해당 데이터를 더티상태로 갖고 있다는 것을 의미하고, 연이어서 해당 프로세서가 라이트백 트랜잭션을 수행하게 되기 때문에 원거리 노드의 버스 트랜잭션 제어부는 라이트백 데이터를 받아서 상호 연결 버스를 통하여 자신의 노드의 버스 트랜잭션 제어부(111)에 전달된다.When the 'HITM' signal is driven in the snoop step, it means that a specific processor on the remote local bus has the corresponding data dirty. Subsequently, the processor executes the writeback transaction. The control unit receives the writeback data and delivers the data to the bus transaction control unit 111 of its node through the interconnection bus.

상호 연결 버스를 통하여 데이터를 전달받은 버스 트랜잭션 제어부(111)는 자신의 로컬버스상에서 연기되었던 트랜잭션의 응답주기를 수행하여 해당 데이터를 쓰기를 위한 읽기 트랜잭션을 발생시켰던 프로세서에게 제공하게 되는데, 이때 로컬 메모리(120)나 로컬 메모리 디렉토리(117)는 갱신하지 않는다.The bus transaction control unit 111 that receives data through the interconnection bus performs a response cycle of a transaction that has been postponed on its local bus, and provides the processor to the processor that has issued a read transaction for writing the corresponding data. 120 and the local memory directory 117 are not updated.

If, on the other hand, the HITM signal is not asserted (inactive) in the snoop phase, the data is not present on the local bus of the remote node. The bus transaction controller of the remote node therefore sends a response over the interconnection bus to the bus transaction controller of the local node indicating that the requested data is not present.

On receiving this response from the remote node, the bus transaction controller 111 performs the response phase of the transaction deferred on the local bus of its node, reads the data from the local memory 120, and supplies it to the processor that issued the read-for-write transaction.

This outcome means that a processor in the remote node holds an address whose index address matches the entry in the directory cache of the local memory directory 117 but whose tag address differs. The state of the local memory directory 117 is therefore not updated and remains invalid.

If the local memory directory 117 is in the shared state, the data is shared with the remote node. The current transaction is therefore first deferred, and the bus transaction controller in the remote node is asked over the interconnection bus to invalidate the data. The bus transaction controller of the remote node issues an invalidation transaction on the local bus of the remote node and then observes the snoop phase.

If the HIT signal is asserted in the snoop phase, a processor on the local bus of the remote node holds the data in the shared state. The bus transaction controller of the remote node then responds over the interconnection bus to the bus transaction controller 111 of the local node that the data has been invalidated.

The bus transaction controller 111 performs the response phase of the transaction deferred on the local bus of its node and supplies the data from the local memory to the processor that issued the read-for-write transaction. Because the data that was shared in the remote node has been invalidated, the state of the local memory directory 117 is updated to clean.

If the HIT signal is not asserted in the snoop phase, the data is not present on the local bus of the remote node, and the bus transaction controller of the remote node sends a response over the interconnection bus to the bus transaction controller 111 of the local node indicating that the data is not present.

The bus transaction controller 111 of the local node performs the response phase of the transaction deferred on the local bus, reads the data from the local memory 120, and supplies it to the processor that issued the read-for-write transaction.

This outcome means that a processor in the remote node holds an address whose index address matches the entry in the directory cache of the local memory directory 117 but whose tag address differs. The state of the local memory directory 117 is therefore not updated and remains shared.
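The read-for-write handling described above can be summarized as a small decision function. The following Python sketch is illustrative only and is not the patented implementation: the names `DirState` and `local_read_for_write`, the string labels for the data source, and the single boolean `remote_snoop_hit` (standing for HITM when the directory is invalid and HIT when it is shared) are assumptions introduced here.

```python
from enum import Enum


class DirState(Enum):
    """Hypothetical encoding of the three local memory directory states."""
    CLEAN = "clean"      # no remote node holds the line
    SHARED = "shared"    # a remote node may hold a shared copy
    INVALID = "invalid"  # a remote node may hold the line dirty


def local_read_for_write(state, remote_snoop_hit):
    """Return (data_source, next_state, deferred) for a local read-for-write.

    remote_snoop_hit models the remote snoop result: HITM asserted when the
    directory is INVALID, HIT asserted when it is SHARED. It is ignored for
    CLEAN, where no remote request is made.
    """
    if state is DirState.CLEAN:
        # Served directly from local memory; directory untouched.
        return ("local_memory", DirState.CLEAN, False)
    if state is DirState.INVALID:
        if remote_snoop_hit:
            # Remote dirty copy is written back and forwarded; neither local
            # memory nor the directory is updated.
            return ("remote_writeback", DirState.INVALID, True)
        # Index matched but tag differed: serve locally, keep INVALID.
        return ("local_memory", DirState.INVALID, True)
    # state is SHARED: the remote node is asked to invalidate its copy.
    if remote_snoop_hit:
        return ("local_memory", DirState.CLEAN, True)
    # Index matched but tag differed: serve locally, keep SHARED.
    return ("local_memory", DirState.SHARED, True)
```

For example, `local_read_for_write(DirState.SHARED, True)` models the case in which the remote copy is invalidated and the directory entry becomes clean.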

Next, management of the local memory directory 117 when a simple-read bus transaction occurs while a processor accesses the local memory 120 of its own node (that is, when a processor accesses the distributed shared memory within its own node) is described in more detail with reference to FIG. 5b.

When the transaction on the local bus is a simple read and no other processor holds the data in the dirty state, the state of the local memory directory 117 is checked before the memory is accessed.

If the local memory directory 117 is in the clean state, the data is supplied from the local memory 120 without updating the state of the local memory directory 117.

If the local memory directory 117 is in the invalid state, the data must be read from the remote node. The current transaction is therefore first deferred, and the data is requested over the interconnection bus from the bus transaction controller in the remote node. The bus transaction controller of the remote node issues a simple read transaction on the local bus of the remote node and then observes the snoop phase.

If the HITM signal is asserted in this snoop phase, a processor on the local bus of the remote node holds the data in the dirty state and subsequently performs a writeback transaction. The bus transaction controller of the remote node receives the writeback data and forwards it over the interconnection bus to the bus transaction controller 111 of the local node. After receiving the data, the local node performs the response phase of the transaction deferred on its local bus and supplies the data to the processor that issued the simple read transaction. Because the data is now shared between processors of the local node and the remote node, the local memory 120 of the local node is updated and the state of the local memory directory 117 is updated to shared.

If the HITM signal is not asserted in the snoop phase, the data is not present on the local bus of the remote node, and the bus transaction controller of the remote node sends a response over the interconnection bus to the bus transaction controller 111 of the local node indicating that the data is not present.

On receiving this response, the bus transaction controller 111 performs the response phase of the transaction deferred on the local bus of its node, reads the data from the local memory, and supplies it to the processor that issued the simple read transaction.

This outcome means that a processor in the remote node holds an address whose index address matches the entry in the directory cache of the local memory directory 117 but whose tag address differs. The state of the local memory directory 117 is therefore not updated and remains invalid.

If the local memory directory 117 is in the shared state, it must be confirmed whether the data is actually shared with the remote node. The current transaction is therefore first deferred, and the bus transaction controller in the remote node is asked over the interconnection bus to confirm whether the data is shared. The bus transaction controller of the remote node issues a simple read transaction on the local bus of the remote node and then observes the snoop phase.

If the HIT signal is asserted in this snoop phase, a processor on the local bus of the remote node holds the data in the shared state, so the bus transaction controller of the remote node responds over the interconnection bus to the bus transaction controller 111 of the local node that the data is shared.

The bus transaction controller 111 then issues a retry response in the response phase of the transaction deferred on the local bus of its node, so that the requesting processor performs the transaction again.

When the requesting processor restarts the bus transaction, the bus transaction controller 111 asserts the HIT signal in the snoop phase to indicate that the data is shared, and the data is supplied from the local memory 120 in the response phase. The state of the local memory directory 117 is not updated and remains shared.

If the HIT signal is not asserted in the snoop phase, the data is not present on the local bus of the remote node, and the bus transaction controller of the remote node sends a response over the interconnection bus to the bus transaction controller 111 of the local node indicating that the requested data is not present. On receiving this response, the bus transaction controller 111 performs the response phase of the transaction deferred on the local bus of its node, reads the data from the local memory 120, and supplies it to the processor that issued the read transaction. This outcome means that a processor in the remote node holds an address whose index address matches the entry in the directory cache of the local memory directory 117 but whose tag address differs, so the state of the local memory directory 117 is not updated and remains shared.
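The simple-read cases differ from the read-for-write cases mainly in that a remote dirty hit refreshes the local memory and a remote shared hit triggers a retry. The Python sketch below is a hedged illustration under stated assumptions, not the patented implementation: `DirState`, `ReadOutcome`, `local_simple_read`, and the boolean `remote_snoop_hit` (HITM for an invalid entry, HIT for a shared entry) are hypothetical names introduced here.

```python
from enum import Enum
from typing import NamedTuple


class DirState(Enum):
    """Hypothetical encoding of the three local memory directory states."""
    CLEAN = "clean"
    SHARED = "shared"
    INVALID = "invalid"


class ReadOutcome(NamedTuple):
    data_source: str           # "local_memory" or "remote_writeback"
    next_state: DirState       # directory state after the transaction
    update_local_memory: bool  # True if writeback data refreshes local memory
    retry: bool                # True if the requester is told to retry


def local_simple_read(state, remote_snoop_hit):
    """Model a simple read issued by a processor to its own node's memory."""
    if state is DirState.CLEAN:
        return ReadOutcome("local_memory", DirState.CLEAN, False, False)
    if state is DirState.INVALID:
        if remote_snoop_hit:
            # HITM: remote dirty data is written back, local memory is
            # refreshed, and the line becomes shared between the nodes.
            return ReadOutcome("remote_writeback", DirState.SHARED, True, False)
        # No HITM: tag mismatch in the directory cache; state stays INVALID.
        return ReadOutcome("local_memory", DirState.INVALID, False, False)
    # state is SHARED
    if remote_snoop_hit:
        # HIT: requester is retried; on the retry the controller asserts HIT
        # and local memory supplies the data. State stays SHARED.
        return ReadOutcome("local_memory", DirState.SHARED, False, True)
    return ReadOutcome("local_memory", DirState.SHARED, False, False)
```

The `retry` flag only records that a retry response is given. The second pass of the retried transaction is not modeled separately.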

Next, the process of detecting the bus transaction generated when a given processor accesses the local memory 120 of its own node, and of managing the local memory directory 117 when that transaction is an invalidation bus transaction, is described in more detail with reference to FIG. 5c.

When the transaction on the local bus is an invalidation, the state of the local memory directory 117 is checked first.

If the local memory directory 117 is in the clean state, the data does not exist in the remote node, so the transaction ends without an invalidation transaction being issued on the local bus of the remote node.

If, however, the local memory directory 117 is in the invalid state, a processor in the remote node holds in the dirty state an address whose index address matches the entry in the directory cache of the local memory directory 117 but whose tag address differs. The state of the local memory directory 117 is therefore not updated, and the transaction ends.

If the local memory directory 117 is in the shared state, it must be confirmed whether the data is actually shared with the remote node. The bus transaction controller in the remote node, reached over the interconnection bus, therefore issues an invalidation transaction on the local bus of the remote node and then observes the snoop phase.

If the HIT signal is asserted in the snoop phase, a processor on the local bus of the remote node holds the data in the shared state, so the bus transaction controller of the remote node responds over the interconnection bus to the bus transaction controller 111 of the local node that the data was shared. Because the data shared in the remote node has now been invalidated, the bus transaction controller 111 of the local node updates the state of the local memory directory 117 to clean.

If, however, the HIT signal is not asserted in the snoop phase, the data is not present on the local bus of the remote node, and the bus transaction controller of the remote node sends a response over the interconnection bus to the bus transaction controller 111 of the local node indicating that the data is not present. This outcome means that a processor in the remote node holds in the shared state an address whose index address matches the entry in the directory cache of the local memory directory 117 but whose tag address differs. The state of the local memory directory 117 is therefore not updated, and the transaction ends.
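The invalidation cases reduce to a single rule: only a shared directory entry causes a remote invalidation, and only a remote HIT changes the entry. A minimal Python sketch follows, assuming the hypothetical names `DirState` and `local_invalidation`. It illustrates the description above and is not the patented implementation.

```python
from enum import Enum


class DirState(Enum):
    """Hypothetical encoding of the three local memory directory states."""
    CLEAN = "clean"
    SHARED = "shared"
    INVALID = "invalid"


def local_invalidation(state, remote_hit=False):
    """Return (remote_invalidation_issued, next_state).

    remote_hit models the HIT signal observed in the remote snoop phase and
    only matters when the directory entry is SHARED.
    """
    if state is DirState.SHARED:
        # The remote node is asked to invalidate; a HIT confirms that a copy
        # existed and is now gone, so the entry becomes CLEAN. Without a HIT
        # the entry belongs to a different tag and is left as it is.
        next_state = DirState.CLEAN if remote_hit else DirState.SHARED
        return (True, next_state)
    # CLEAN: no remote copy exists. INVALID: the entry refers to a different
    # tag held dirty remotely. In both cases nothing is sent or updated.
    return (False, state)
```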

Next, the process of detecting the bus transaction generated when a processor in a remote node accesses the local memory 120 of the local node, and of managing the local memory directory 117 according to that bus transaction, is described in more detail with reference to FIG. 5d.

If the bus transaction generated by a processor in the remote node is an invalidation transaction, that processor intends to modify data it read earlier, so the state of the local memory directory 117 is updated to invalid.

If the bus transaction type is a write, one of two things has happened. Either the processor in the remote node performed a bus write line transaction on the remote local bus in order to replace data that it had previously read and modified with another cache line of the same index address, or an implicit writeback was performed in response to a simple read request from another processor on the remote local bus.

If the write results from a bus write line transaction on the local bus of the remote node, the requested data no longer exists in the remote node, so the bus transaction controller 111 of the local node updates the state of the local memory directory 117 to clean. If the write results from an implicit writeback transaction caused by a simple read request from another processor on the local bus of the remote node, the data is now shared with the local node, so the state of the local memory directory 117 is updated to shared.

If the implicit writeback transaction results from a read-for-write request by another processor on the local bus of the remote node, a cache-to-cache transfer takes place within the remote node, the local memory 120 of the local node is not accessed, and the local memory directory 117 remains in the invalid state.

If the bus transaction type is a read for writing, the processor in the remote node is reading the data in order to modify it, so the data is supplied from the local memory 120 and the state of the local memory directory 117 is updated to invalid.

If the bus transaction type is a simple read (RFR) transaction, the processor in the remote node is reading the data for read-only use, so the data is supplied from the local memory 120 and the state of the local memory directory 117 is updated to shared.
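The directory updates for accesses arriving from a remote node amount to a lookup from transaction kind to next state. The table below is a hedged Python sketch of that mapping. The string keys, `DirState`, and `remote_access_next_state` are hypothetical names chosen for illustration and do not appear in the specification.

```python
from enum import Enum


class DirState(Enum):
    """Hypothetical encoding of the three local memory directory states."""
    CLEAN = "clean"
    SHARED = "shared"
    INVALID = "invalid"


# Next directory state for each kind of remote access that reaches the
# local node, as described in the text above.
NEXT_STATE = {
    "invalidation": DirState.INVALID,          # remote will modify its copy
    "write_bus_write_line": DirState.CLEAN,    # remote evicted the dirty line
    "write_implicit_writeback_simple_read": DirState.SHARED,
    "read_for_write": DirState.INVALID,        # remote takes the line to modify
    "simple_read": DirState.SHARED,            # remote takes a read-only copy
}


def remote_access_next_state(current, kind):
    """Return the directory state after a remote access of the given kind."""
    if kind == "implicit_writeback_read_for_write":
        # Cache-to-cache transfer inside the remote node: local memory is
        # not accessed, so the entry keeps its current (invalid) state.
        return current
    return NEXT_STATE[kind]
```

An unknown `kind` raises `KeyError`, which keeps a misspelled transaction name from silently leaving the state unchanged.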

As described above, programs executed on an SMP system with a distributed shared memory structure exhibit temporal and spatial locality, so the probability that addresses with the same index address but different tag addresses occur is very low. The method and apparatus for controlling data coherency of distributed shared memory according to the present invention can therefore implement the directory cache with a small, high-speed memory regardless of the maximum shared memory capacity, and thereby maintain data coherency among the distributed local memories.
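The index and tag conflict mentioned above can be made concrete with a small address-splitting sketch. The field widths used here are assumptions for illustration only: a 5-bit byte offset (a 32-byte cache line), a 22-bit index, and a 3-bit tag (eight tag values per index). The functions `split_address` and `conflicts` are hypothetical names and do not come from the specification.

```python
OFFSET_BITS = 5   # assumed: 32-byte cache line
INDEX_BITS = 22   # assumed: number of directory-cache entries is 2**22
TAG_BITS = 3      # assumed: eight tag values per index


def split_address(addr):
    """Split a physical address into (tag, index, byte_offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset


def conflicts(addr_a, addr_b):
    """True if two addresses map to the same directory entry with different tags."""
    tag_a, index_a, _ = split_address(addr_a)
    tag_b, index_b, _ = split_address(addr_b)
    return index_a == index_b and tag_a != tag_b
```

Under these assumed widths, addresses `0x20` and `0x08000020` share index 1 but carry tags 0 and 1. This is the kind of conflict that leaves a directory entry unchanged in the cases described above.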

Claims

A method for controlling data coherency of a distributed shared memory in a node of a symmetric multiprocessor having a distributed shared memory structure with a plurality of processors, the method comprising: a first step of detecting the corresponding bus transaction when a given processor refers to the local memory in its own node and, when the bus transaction is a read-for-write transaction, detecting the state of the local memory directory that manages the local memory; a second step of generating the corresponding transaction at a remote node and then observing the snoop phase when the local memory directory state is invalid; a third step of receiving the data written back from the remote node and supplying it to the processor when the snoop result shows that the data exists; a fourth step of receiving the corresponding memory state from the remote node when the snoop result shows that the data does not exist; a fifth step of generating an invalidation transaction for the data at the remote node and then observing the snoop phase when the local memory directory state is shared; a sixth step of invalidating the data in the remote node and supplying the data to the processor from the local memory when the snoop result shows that the data in the remote node is shared; and a seventh step of supplying the data held in the own node to the processor when the snoop result shows that the data does not exist in the remote node.

The method of claim 1, wherein the fourth step comprises: receiving the result from the remote node when snooping of the local bus of the remote node shows that the data is not present on the remote local bus; and, after receiving the result from the remote node, maintaining the local memory directory state and performing the deferred transaction to supply the data from the local memory to the processor.

The method of claim 1, wherein the seventh step comprises: receiving the result from the remote node when snooping of the local bus of the remote node shows that the data is not present on the remote local bus; performing the response phase of the transaction deferred on the local bus of the node after receiving the response from the remote node that the data is not present; and, in accordance with the response of the transaction, supplying the data from the local memory of the own node to the processor that generated the transaction while maintaining the local memory directory in the shared state.

A method for controlling data coherency of a distributed shared memory in a node of a symmetric multiprocessor having a distributed shared memory structure with a plurality of processors, the method comprising: a first step of detecting the corresponding bus transaction when a given processor refers to the local memory in its own node and, when the bus transaction is a simple read of data, detecting the state of the local memory directory that manages the local memory; a second step of deferring the currently driven transaction according to the local memory directory state, generating a simple read transaction at a remote node, and observing the snoop phase; a third step of receiving the data written back from the remote node and supplying it to the processor when the local memory directory state is invalid and snooping of the local bus of the remote node shows that a given processor of the remote node holds the data in the dirty state; a fourth step of receiving the corresponding memory state from the remote node when the snoop result shows that the data does not exist; a fifth step in which, when the local memory directory state is shared and the snoop phase shows that a given processor of the remote node shares the data, the state of the data is received from the remote node and the requesting processor re-executes the transaction in the response phase of the deferred transaction; and a sixth step of supplying the data held in the own node to the processor when the snoop result shows that the data does not exist in the remote node.

The method of claim 4, wherein the fourth step comprises: receiving the state from the remote node when snooping of the local bus of the remote node shows that the data is not present; and, after receiving the state from the remote node, maintaining the state of the local memory directory as it is and performing the deferred transaction to supply the data from the local memory to the processor.

The method of claim 4, wherein the sixth step comprises: receiving the result from the remote node when snooping of the local bus of the remote node shows that the data is not present on the remote local bus; performing the response phase of the transaction deferred on the local bus of the node after receiving the response from the remote node that the data is not present; and, in accordance with the response of the transaction, supplying the data from the local memory of the own node to the processor that generated the transaction while maintaining the local memory directory in the shared state.

A method for controlling data coherency of a distributed shared memory in a node of a symmetric multiprocessor having a distributed shared memory structure with a plurality of processors, the method comprising: a first step of detecting the corresponding bus transaction when a given processor refers to the local memory in its own node and, when the bus transaction is an invalidation transaction, detecting the state of the local memory directory that manages the local memory; a second step of running an invalidation transaction on the local bus of a remote node and then observing the snoop phase when the local memory directory state is shared; a third step of changing the state of the local memory directory to clean after receiving the state of the data from the remote node when the snoop phase shows that a processor of the remote node holds the data in the shared state; and a fourth step of receiving the state of the data from the remote node and maintaining the state of the local memory directory as it is when the snoop phase shows that no processor of the remote node holds the data.

A method for controlling data coherency of a distributed shared memory in a node of a symmetric multiprocessor having a distributed shared memory structure with a plurality of processors, the method comprising: a first step of detecting the corresponding bus transaction when a given processor of a remote node refers to the local memory in the node; a second step of updating the local memory directory state to invalid when the detected transaction is an invalidation transaction; a third step of, when the detected transaction reads data, updating the local memory directory state to invalid for a read-for-write transaction or to shared for a simple read transaction and then supplying the data from the local memory; and a fourth step of updating the local memory directory to the corresponding state when the detected transaction is a write transaction.

An apparatus for controlling data coherency of a distributed shared memory, comprising a local memory directory that manages data coherency by maintaining a history of the shared local memory mounted in each of a plurality of nodes of a symmetric multiprocessor having a distributed shared memory structure with a plurality of processors, and a directory controller that controls the local memory directory, wherein the local memory directory includes: a byte offset of a predetermined number of bits for byte-by-byte addressing within a cache line; an index address of a predetermined number of bits that addresses the directory cache when the directory cache is accessed; a tag address for mapping the index address at a predetermined ratio; and a reserved address for securing management memory capacity as the capacity of the local memory increases.

The apparatus of claim 9, wherein the index address has a size of 4 megabytes and each index address has eight different tag addresses.