KR100257163B1

KR100257163B1 - Method and apparatus for locality control of symmetric multiprocessing system

Info

Publication number: KR100257163B1
Application number: KR1019970049552A
Authority: KR
Inventors: 정병선
Original assignee: 구자홍; 엘지전자주식회사
Priority date: 1997-09-29
Filing date: 1997-09-29
Publication date: 2000-05-15
Also published as: KR19990027155A

Abstract

PURPOSE: A method and apparatus for locality control of a symmetric multiprocessing system are provided that improve the locality within each processor node in hardware, so that more of the transactions generated in a processor node are handled within that node and the performance degradation caused by remote memory accesses to other processor nodes is prevented. CONSTITUTION: When a transaction generated by one of a plurality of processors is driven onto a local bus, for example because of a cache miss in a first processor, the cache memories of the remaining processors are referenced through the first local bus. If a cache miss also occurs in each of those processors, a bus transaction control unit detects it and determines whether the relevant data exists in the first shared memory or in the local memory.

Description

Method and apparatus for locality control of a symmetric multiprocessing system

The present invention relates to mid-to-large-scale computers that use symmetric multiprocessing (hereinafter "SMP") with a distributed shared memory structure, and more particularly to a method and apparatus for locality control of an SMP system in which a local memory is implemented in hardware and controlled so as to minimize remote memory accesses, that is, accesses that reference the memory of another node.

In general, as shown in FIG. 1, an SMP system consists of a plurality of processors (11 to 13), each including a CPU and a cache memory, connected to an interconnection bus, and a shared memory (14) that is also connected to the interconnection bus so that data can be shared over it.

In the single-bus SMP system configured in this way, each processor (11 to 13) has a cache memory, and data coherence on the interconnection bus is maintained by hardware. As a result, software is highly portable, programming is easy, and performance increases substantially simply by adding processor units.

In addition, because the shared memory (14) is concentrated in one place, the latency is constant whenever any of the processors (11 to 13) accesses the shared memory (14). This structure is therefore called a uniform memory access (UMA) structure.

However, an SMP system with this structure has a scalability problem. Hundreds of signals are carried on the interconnection bus and its speed keeps increasing, so the physical and electrical limits of signal transmission, together with collisions between bus transactions on the interconnection bus, limit the number of processors that can be connected to it.

Accordingly, to improve scalability, SMP systems with a distributed shared memory structure, as shown in FIG. 2, came to be built.

Such a distributed shared memory SMP system consists of first and second processor nodes (20, 30), each of which connects a plurality of processors to a local bus, processes the transactions driven onto that local bus, and has its own shared memory; and an interconnection bus that connects the first and second processor nodes (20, 30) so that they can exchange data.

In the conventional distributed shared memory SMP system configured in this way, the first and second shared memories (26, 36) are distributed across the processor nodes (20, 30). To process a transaction that a processor has driven onto the local bus, the memory of the processor's own node is accessed first; if the data is not present there, the memory of the other processor node must be accessed.

For example, to process a transaction driven by a processor in the first processor node (20), if the desired data exists in the first memory (26), it is accessed under the control of the first bus transaction control unit (25) and provided to that processor. If, however, the requested data exists in the second memory (36) of the second processor node (30), a remote access is made over the interconnection bus under the control of the first bus transaction control unit (25), and the requested data is fetched from the second memory (36) and provided to the processor.

Here, latency is short when a processor accesses the memory of its own processor node, whereas a relatively long latency is incurred when it must access the memory of another processor node, because the access has to go through the interconnection bus.

Because the latency of a processor's memory access is therefore not constant, this structure is called a non-uniform memory access (hereinafter "NUMA") structure. Among NUMA structures, one in which data coherence on the interconnection bus is maintained by hardware is called a cache-coherent NUMA (hereinafter "ccNUMA") structure.
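
The effect of non-uniform latency can be illustrated with a simple weighted average. The sketch below is only an illustration: the function name and the latency figures (100 ns local, 400 ns remote) are assumptions chosen for the arithmetic and do not come from the specification.

```python
def average_access_latency(local_ns: float, remote_ns: float, remote_fraction: float) -> float:
    """Mean memory latency when a given fraction of accesses go to a remote node."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Hypothetical figures: 100 ns for the node's own memory, 400 ns over the interconnection bus.
print(average_access_latency(100.0, 400.0, 0.25))   # 175.0
print(average_access_latency(100.0, 400.0, 0.125))  # 137.5
```

Under these assumed figures, halving the share of remote accesses lowers the mean latency from 175 ns to 137.5 ns, which is the kind of gain that improved locality is meant to deliver.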

Systems implemented with the ccNUMA structure are built from node units, each carrying processors, memory, and so on. Each node holds a plurality of SMP processors, up to a certain limit, and the nodes are connected by an interconnection bus. To make up for the limited scalability of the SMP multiprocessor system shown in FIG. 1, that is, to connect a plurality of nodes, new interconnection schemes such as the Scalable Coherent Interface (hereinafter "SCI") bus and crossbar switching buses are used.

In a distributed shared memory system, the best case is for data to be exchanged among the cache memories of the processors within the same processor node. If a cache miss occurs in each of those processors, the memory of the processor's own node is accessed. If the data is not present in that memory either, the data of another processor node must be referenced.

However, in the conventional distributed shared memory SMP system, system performance degrades when the memory area of another processor node must be accessed to process a transaction generated by a processor, and the more frequent such accesses to other processor nodes are, the greater the degradation.

One way to address this problem is to install a large external cache memory so as to minimize a processor node's accesses to the interconnection bus, but implementing this in hardware becomes very complicated.

There are also software solutions, which arrange for process scheduling and memory allocation to take place on nearby processors and memory, but they are very complicated and difficult to implement.

Accordingly, to overcome the problems of the prior art described above, an object of the present invention is to provide a method and apparatus for locality control of an SMP system that improve the locality within each processor node in hardware, so that more of the transactions generated within a processor node of the distributed shared memory structure are processed within that node and the performance degradation caused by remote memory accesses to other processor nodes is prevented.

FIG. 1 is a block diagram of a conventional single-bus multiprocessing system;

FIG. 2 is a block diagram of an apparatus for locality control of a symmetric multiprocessing system according to the present invention;

FIG. 3 is a detailed block diagram of the locality control unit of FIG. 2; and

FIG. 4 is a flowchart of a locality control process of a symmetric multiprocessing system according to the present invention.

<Description of reference numerals for the main parts of the drawings>

100, 200: first and second nodes

101 to 104, 201 to 204: first to eighth processors

110, 210: first and second bus transaction control units

111, 211: first and second shared memories

120, 220: first and second locality control units

121: first tag address queue

122, 124: second and third tag address storage units

123, 125: first and second comparison units

126: control unit

127: index address storage unit

128: index address counter

129: local memory

301, 302: first and second input/output bus bridges

To achieve the above object, a method for locality control of an SMP system according to the present invention comprises: a first process of detecting, when the memory of another node is accessed to supply the data for a transaction generated on a local bus of a multiprocessor system with a distributed shared memory structure, the number of times the same address accesses the memory of the other node within a certain time; a second process of storing, when the frequency of access to the other node is equal to or greater than a predetermined number, burst data of the driven address area from the memory of the other node into the local memory of the node itself; and a third process of providing, when data corresponding to an address driven onto the local bus exists in the local memory, that data to the requesting processor.

Here, the first process comprises the steps of: monitoring the address of a transaction when the memory of a remote node must be referenced to process a transaction generated by a processor, and determining whether the frequency of access to the remote node is currently being measured for a different address; if so, storing the currently driven address so that it can be compared with subsequently driven addresses after the measurement for the other address has finished; if no access frequency measurement is in progress, storing the currently driven address and comparing it with the previously driven address; counting the number of matches according to the result of the comparison; and determining whether the number of matches is equal to or greater than a set number.
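
The steps of the first process can be sketched as a small state machine. This is a minimal software model under stated assumptions, not the hardware described here: the class and method names are hypothetical, the time window is left out for brevity, and the handling of queued addresses is simplified to first-in, first-out.

```python
from collections import deque
from typing import Deque, Optional


class RemoteAccessMonitor:
    """Counts repeated remote accesses to one tag at a time (illustrative only)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold                 # assumed configurable match count
        self.monitored_tag: Optional[int] = None   # tag currently being measured
        self.matches = 0
        self.pending: Deque[int] = deque()         # tags seen while another is measured

    def observe(self, tag: int) -> bool:
        """Record one remote access; return True once the threshold is reached."""
        if self.monitored_tag is None:
            # No measurement in progress: start monitoring this tag.
            self.monitored_tag = tag
            self.matches = 0
            return False
        if tag != self.monitored_tag:
            # A different tag is being measured: keep this one for later.
            self.pending.append(tag)
            return False
        self.matches += 1
        return self.matches >= self.threshold

    def finish(self) -> None:
        """End the current measurement and move on to the oldest pending tag, if any."""
        self.monitored_tag = self.pending.popleft() if self.pending else None
        self.matches = 0
```

A caller would invoke `observe` for every remote access and, when it returns `True`, trigger the block fetch of the second process before calling `finish`.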

To achieve the object of the present invention, an apparatus for locality control of a symmetric multiprocessing system comprises: a plurality of nodes, each including bus transaction control means for controlling memory access to the relevant address so as to process transactions generated on a local bus connecting multiple processors, and locality control means for storing, when memory accesses to a remote node occur through the bus transaction control means and the same address occurs a set number of times or more within a certain time, a fixed amount of data from that address area in the local memory of its own node; and an interconnection bus that interconnects the nodes so that they share an input/output bus and that transfers addresses and data between them under the control of the bus transaction control means.

Here, the locality control means included in each node comprises: a first tag address storage unit that stores the tag address when a bus transaction from the bus transaction control means to a remote node occurs; a second tag address storage unit that stores the address held in the first tag address storage unit so that the number of transactions to the remote node can be measured; a first comparison unit that compares the address stored in the second tag address storage unit with the addresses of subsequent transactions and measures the frequency of transactions to the remote node for the same address within a certain time; a third tag address storage unit that stores the tag address when, according to the first comparison unit, the same tag address has been requested from the remote node a set number of times or more within a certain time; a second comparison unit that detects whether a transaction with the same address as the tag address stored in the third tag address storage unit is driven; control means for controlling the comparison units and the storage units; a local memory that stores the data read from the memory of the remote node holding the data for the tag address area stored in the third tag address storage unit; an index address storage unit that stores the index address portion of an address snooped by the bus transaction control means; and an index address counter that generates index addresses as data is stored in the local memory.

A preferred embodiment of the present invention configured in this way is described in detail below with reference to the accompanying drawings.

FIG. 3 is a schematic block diagram of the apparatus for locality control of a symmetric multiprocessing system according to an embodiment of the present invention. As shown, the apparatus consists of first and second nodes (100, 200) and an interconnection bus. The first and second nodes (100, 200) include: first and second processor groups (101 to 104, 201 to 204) connected to first and second local buses, respectively; first and second bus transaction control units (BTC: Bus Transaction Controller) (110, 210) that control memory access to the relevant address so as to process transactions generated on each local bus; first and second shared memories (111, 211) that, under the control of the first and second bus transaction control units (110, 210), read or write data in response to requests from the processors (101 to 104, 201 to 204) and from the other node; and first and second locality control units (120, 220) that, when memory accesses to a remote node occur through the first and second bus transaction control units (110, 210) and the same address occurs a set number of times or more within a certain time, store a fixed amount of data from that address area in the local memory of their own node. The interconnection bus interconnects the first and second nodes (100, 200) so that they share the input/output buses (301, 302), and transfers addresses and data between them under the control of the bus transaction control units (110, 210).

Here, as shown in FIG. 4, each of the first and second locality control units (120, 220) consists of: a first tag address queue (Tag Address Queue[31:13]) (121) that stores the tag address (Monitoring Address1) when a bus transaction from its own bus transaction control unit (110, 210) to another node occurs; a second tag address storage unit (Tag Address Store[31:13]) (122) that stores the address held in the first tag address queue (121) so that the number of transactions to the remote node can be measured; a first comparison unit (123) that compares the address stored in the second tag address storage unit (122) with the address of each subsequent transaction (Monitoring Address2) and, when they are the same, measures the frequency of transactions to the remote node within a certain time; a third tag address storage unit (124) that stores the tag address when, according to the first comparison unit (123), the same tag address has been requested from the remote node a set number of times or more within a certain time; a second comparison unit (125) that detects whether a transaction (Snooping Address) with the same address as the tag address stored in the third tag address storage unit (124) is driven; a control unit (Main Locality Controller) (126) that controls the comparison units (123, 125) and the storage units (121, 122, 124); a local memory (DATA RAM) (129) that reads the data for the tag address area stored in the third tag address storage unit (124) from the shared memory of the other node and stores it; an index address storage unit (127) that stores the index address portion of an address snooped by the bus transaction control unit (110, 210); and an index address counter (128) that generates index addresses as data is stored in the local memory (129).
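
The label [31:13] on the tag address queue and store indicates that the tag is taken from address bits 31 down to 13. A minimal sketch of that bit-field split follows. The function name is hypothetical, and treating the remaining low bits [12:0] as the index is an assumption made here for illustration; the text does not state which bits form the index address.

```python
from typing import Tuple

TAG_SHIFT = 13                          # tag = address bits [31:13], as labelled in the text
TAG_MASK = (1 << (32 - TAG_SHIFT)) - 1  # 19 tag bits
INDEX_MASK = (1 << TAG_SHIFT) - 1       # assumption: index = remaining low bits [12:0]


def split_address(addr: int) -> Tuple[int, int]:
    """Split a 32-bit address into (tag, index)."""
    addr &= 0xFFFFFFFF
    return (addr >> TAG_SHIFT) & TAG_MASK, addr & INDEX_MASK


tag, index = split_address(0x12345678)
print(hex(tag), hex(index))  # 0x91a2 0x1678
```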

The local memory (129) is a small-capacity, high-speed memory.

The operation of the present invention configured in this way is described in more detail below with reference to FIGS. 2 to 5.

First, when a transaction generated by one of the plurality of processors is driven onto the local bus, for example because of a cache miss in the first processor (101), the cache memories of the other processors (102 to 104) are referenced through the first local bus and the desired data is fetched. If cache misses also occur in the processors (102 to 104), the bus transaction control unit (110) detects this and determines whether the data exists in the first shared memory (111) or in the local memory (129).

If the requested data exists in the first shared memory (111), it is provided to the first processor (101). Likewise, where the address of the transaction driven onto the local bus is a shared memory address of the second node (200) and the desired data exists in the local memory (129) of the locality control unit (120), that data is provided to the first processor (101).

If, however, the data exists in neither the first shared memory (111) nor the local memory (129), the shared memory (211) of the second node (200) is referenced through the interconnection bus.
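
The lookup order described above can be summarised in a short sketch. It models each memory as a plain dictionary from address to data, which is purely an illustrative assumption; the function name and the returned labels are likewise hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_miss(
    addr: int,
    peer_caches: List[Dict[int, int]],
    shared_memory: Dict[int, int],
    local_memory: Dict[int, int],
    remote_memory: Dict[int, int],
) -> Tuple[str, int]:
    """Return (where the data was found, the data) for a cache miss on addr."""
    for cache in peer_caches:        # caches of the other processors on the local bus
        if addr in cache:
            return "peer cache", cache[addr]
    if addr in shared_memory:        # this node's shared memory (111)
        return "shared memory", shared_memory[addr]
    if addr in local_memory:         # local memory (129) of the locality control unit
        return "local memory", local_memory[addr]
    # Last resort: reference the other node over the interconnection bus.
    return "remote node", remote_memory[addr]
```

Only the final branch crosses the interconnection bus, so every remote block that has been copied into the local memory removes one source of long-latency accesses.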

This reference to the memory of the second node (200), that is, the remote memory access, is described in more detail below with reference to FIG. 5.

When the shared memory (211) of the second node must be referenced to process a transaction generated on the first local bus, the address is driven onto the interconnection bus under the control of the first bus transaction control unit (110) in order to reference the shared memory (211) of the second node (200).

At this time, the first locality control unit (120) captures (snarfs) the remotely requested address. For the captured address, the first locality control unit (120) measures how often the same tag address accesses the second node within a certain time (match counting) and determines whether this reaches a set number. If the access frequency to the second node (200) is already being measured for a different address, the currently driven address (Monitoring Address1) is stored in the first tag address queue (121).

When an address for referencing the shared memory (211) of the second node (200) has been stored in the first tag address queue (121) by the bus transaction control unit (110) in this way, it is determined whether the previous match count has finished. If it has, the address that entered the first tag address queue (121) first is stored in the second tag address storage unit (122); otherwise, the address that entered the first tag address queue (121) first is discarded.

If no match counting is currently in progress, the tag of the address is stored in the second tag address storage unit (122). The stored tag address is then compared by the first comparison unit (123) with the addresses of the subsequent bus transactions to the remote node.

If match counting is in progress when an address is driven, the first comparison unit (123) compares the address with the stored tag address to determine whether they match. If they match, it is determined whether the match counter has been set, that is, whether the same tag address has occurred the set number of times or more within the certain time.

If the set match time elapses without the same tag address occurring the set number of times, the process ends. If, however, the same tag address occurs the set number of times or more within the certain time, the match count is set, the bus transaction control unit (110) is requested to suspend the local bus in the first node (100), and 4 KB of data is read in one burst from the second shared memory (211) of the second node (200) and written to the first local memory (129) of the first node (100). When the write is finished, the bus transaction control unit (110) is notified and the first local bus in the first node (100) resumes normal operation.
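
The suspend, copy, and resume sequence can be sketched as follows. This is a software analogy under stated assumptions: `LocalBus` and `promote_block` are hypothetical names, memories are modelled as byte buffers, and the block is assumed to start at a caller-supplied base address. Only the 4 KB block size is taken from the text.

```python
from contextlib import contextmanager
from typing import Iterator

BLOCK_SIZE = 4 * 1024  # 4 KB burst, as stated in the text


class LocalBus:
    """Hypothetical stand-in for the node's local bus."""

    def __init__(self) -> None:
        self.paused = False

    @contextmanager
    def suspended(self) -> Iterator[None]:
        # Pause the bus for the duration of the copy, then always resume it.
        self.paused = True
        try:
            yield
        finally:
            self.paused = False


def promote_block(remote: bytes, base: int, local_ram: bytearray, bus: LocalBus) -> None:
    """Copy one 4 KB block starting at base from remote memory into local_ram."""
    with bus.suspended():
        local_ram[:] = remote[base:base + BLOCK_SIZE]
```

Using a context manager guarantees that the bus is resumed even if the copy fails, which mirrors the requirement that the local bus return to normal operation once the write has finished.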

The data stored in the first local memory (129) in this way is subsequently supplied at the request of the bus transaction control unit (110). That is, the second comparison unit (125) compares the tag address stored in the third tag address storage unit (124) with the snooping address from the bus transaction control unit (110), and if they match, the control unit (126) provides the corresponding data stored in the first local memory (129) to the requesting processor. At this time, the index address storage unit (127) stores the index address portion of the snooping address, and the stored index address is used to locate the data in the first local memory (129).
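
The hit path just described, a tag comparison followed by an indexed read, might look like the sketch below. The class and method names are hypothetical, the local memory is modelled as a dictionary keyed by index, and the index is assumed to be address bits [12:0]; only the tag range [31:13] comes from the text.

```python
from typing import Dict, Optional

TAG_SHIFT = 13  # tag = address bits [31:13]; the index is assumed to be bits [12:0]


class LocalMemory:
    """Illustrative model of the tag store (124) and local memory (129)."""

    def __init__(self) -> None:
        self.stored_tag: Optional[int] = None  # third tag address storage unit (124)
        self.data: Dict[int, int] = {}         # local memory (129), keyed by index

    def fill(self, base_addr: int, values: Dict[int, int]) -> None:
        """Record the tag of the fetched block and its data, keyed by index."""
        self.stored_tag = base_addr >> TAG_SHIFT
        self.data = dict(values)

    def snoop(self, addr: int) -> Optional[int]:
        """Return the data on a tag match (second comparison unit, 125), else None."""
        if self.stored_tag is None or (addr >> TAG_SHIFT) != self.stored_tag:
            return None
        return self.data.get(addr & ((1 << TAG_SHIFT) - 1))
```

A return value of `None` corresponds to the case where the request must still be forwarded to the remote node.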

The index address counter (128) generates index addresses when data is read from the second shared memory (211) of the second node and stored in the first local memory (129).

As described above, the present invention provides a locality control apparatus that includes a local memory: the frequency with which an address references a remote node to process a transaction generated on the local bus is measured, and when that frequency reaches a set number within a certain time, the data at that address is fetched from the memory of the remote node, stored, and then supplied at the request of the bus transaction control unit. This shortens the latency of transaction processing that would otherwise require access to the memory of a remote node, and the hardware is easy to implement.

Claims

A first process of detecting, when the memory of another node is accessed to supply the data corresponding to a transaction generated on a local bus of a multiprocessor system with a distributed shared memory structure, the number of times the same address accesses the memory of the other node within a certain time; a second process of storing, when the frequency of access to the other node is equal to or greater than a predetermined number, burst data of the driven address area from the memory of the other node into the local memory of the node itself; and a third process of providing, when data corresponding to an address driven onto the local bus exists in the local memory, the data to the corresponding processor.

2. The method of claim 1, wherein the first step comprises: monitoring the address associated with the transaction and, if the memory of a remote node must be referenced to process a transaction originating from a given processor, determining whether the frequency of access to the remote node is currently being measured for a different address; if so, storing the currently driven address for comparison with a later driven address after the measurement of the frequency of access to the remote node for the other address is complete; if the frequency of access to the remote node is not being measured, storing the currently driven address and comparing it with a previously driven address; counting the number of matches resulting from the comparison; and determining whether the number of matches is greater than or equal to a predetermined number of times.

3. The method of claim 1, wherein the second step comprises: when the frequency of access to the other node is greater than or equal to the predetermined number of times, stopping the transaction currently occurring on the local bus of the own node, and then reading from the memory of the remote node, and storing, the data associated with the address that has accessed the memory of the other node at least the set number of times; and resuming normal operation of the transaction generated on the local bus after the storage from the remote node into the local memory of the own node is complete.

4. The method of claim 1, wherein the third step comprises: when the address of a transaction detected by snooping the local bus does not exist in the own node, storing the index address portion of the corresponding address and comparing the address with a tag address stored in the local memory; and, if the comparison finds a matching tag address, supplying the corresponding data from the local memory.

5. A locality control apparatus for a symmetric multiprocessing system, comprising: a plurality of nodes, each including bus transaction control means for controlling memory access to the corresponding address in order to process transactions generated on a local bus connecting multiple processors, and locality control means for storing data of a predetermined size from the corresponding address area in a local memory of the own node when, for memory accesses to a remote node made through the bus transaction control means, the same address occurs a set number of times within a predetermined time; and an interconnection bus that interconnects the nodes so that they share an input/output bus, and that carries addresses and data between the nodes under the control of the bus transaction control means.

6. The apparatus of claim 5, wherein the bus transaction control means included in each node mediates snooping of the local bus, controls memory access requests from its own node, controls memory accesses from the processors and input/output devices of remote nodes, controls the history of the memory, and controls access to the memory and input/output devices of the remote node when the memory of the remote node is referenced.

7. The apparatus of claim 5, wherein the locality control means included in each node comprises: a first tag address storage unit for storing the corresponding tag address when a bus transaction from the bus transaction control means to a remote node occurs; a second tag address storage unit for storing the address stored in the first tag address storage unit in order to measure the number of transactions to the remote node; a first comparator that compares the address stored in the second tag address storage unit with the address of a subsequently generated transaction to measure the frequency of transactions to the remote node with the same address within a predetermined time; a third tag address storage unit for storing the corresponding tag address when, as a result of the comparison by the first comparator, requests to the remote node occur at least a predetermined number of times within the predetermined time; a second comparator for detecting whether a transaction with the same address as the tag address stored in the third tag address storage unit is driven; control means for controlling the comparators and the storage units; a local memory for storing data, fetched from the memory of the remote node, that corresponds to the tag address area stored in the third tag address storage unit; an index address storage unit for storing the index address portion of an address snooped by the bus transaction control means; and an index address counter that generates an index address as data is stored in the local memory.

8. The apparatus of claim 5, wherein the interconnection bus is a local bus to which an input/output bus bridge is connected and which uses the same protocol as the local bus connecting the processors, so as to connect the nodes.