KR20120072952A

KR20120072952A - Multicore system and memory management device for multicore system

Info

Publication number: KR20120072952A
Application number: KR1020100134895A
Authority: KR
Inventors: 이재진; 장춘기; 박정호
Original assignee: 서울대학교산학협력단
Priority date: 2010-12-24
Filing date: 2010-12-24
Publication date: 2012-07-04
Also published as: KR101192423B1

Abstract

PURPOSE: A multicore system and a memory management device thereof are provided to reduce unnecessary network traffic and simplify the system structure by selectively loading a page accessed by a processor into either a cache or a local store. CONSTITUTION: A first TLB (translation lookaside buffer) exception processor (211) copies a page descriptor having a load location determination field to a TLB of a first processor. The page descriptor indicates the page that the first processor accesses, and the field indicates whether the page is loaded from a memory area into the cache of the first processor or into the local store of the first processor. If the page is a write-shared page, a second TLB exception processor (212) sends an interrupt message to a second processor that has accessed the page.

Description

Multicore system and memory management device for multicore system

The following description relates to a multicore system including a plurality of processing cores and to a coherence policy technique for a multicore system.

In general, a multicore system refers to a processor having two or more processing cores in order to process several tasks at once. Multicore processors have recently attracted attention because they offer advantages over single-core processors in performance and power consumption.

Multicore systems include symmetric multiprocessing (SMP) systems, in which a number of identical cores are present, and asymmetric multiprocessing (AMP) systems, which consist of various heterogeneous cores that can be used as general purpose processors (GPPs), such as a digital signal processor (DSP) or a graphics processing unit (GPU).

In a multicore system, each processing core executes tasks independently. Consequently, data inconsistency may occur when different processing cores access the same memory area at the same time. To prevent such data inconsistency, most multicore systems adopt a coherent cache architecture.

Representative coherent cache architectures include the snooping-based coherent cache and the directory-based coherent cache.

The snooping-based coherent cache is the technique most widely used in commercial processors. Each processing core is connected to a shared bus and provides coherence by snooping which memory blocks the other processing cores are accessing and how. Because every memory request is delivered to every processor over the shared bus, a snooping-based coherent cache scales poorly and is not suitable for a chip multiprocessor (CMP) in which a large number of processing cores are integrated.

The directory-based coherent cache keeps a directory that records, for each memory block stored in a cache, which processing cores are accessing the block and with what permission, and sends coherence messages only to the relevant processing cores when necessary. It is therefore considered more scalable than the snooping-based coherent cache, but its hardware design complexity is very high, so it too is not well suited to a CMP.

Provided are a memory management apparatus of a multicore system that is applicable to a CMP in which a large number of processing cores are integrated, and a multicore system to which such a memory management apparatus is applied.

A memory management apparatus according to an aspect of the present invention may include a first TLB exception processing unit and a second TLB exception processing unit. The first TLB exception processing unit copies, to the translation lookaside buffer (TLB) of a first processor, a page descriptor that indicates a page to be accessed by the first processor and has a load location determination field indicating whether the page is to be loaded from a memory area into the cache of the first processor or into the local store of the first processor. When the page is a write-shared page for which page coherence must be guaranteed, the second TLB exception processing unit sends an interrupt message to a second processor that has accessed the page so that the cache of the second processor is flushed and the TLB of the second processor is invalidated and, according to the response of the second processor, flushes the cache of the first processor and modifies the load location determination field.

A multicore system according to an aspect of the present invention may include: a first core including a first processor, a first cache that does not provide memory coherence, and a first local store that provides memory coherence; a second core including a second processor, a second cache that does not provide memory coherence, and a second local store that provides memory coherence; and a memory manager that copies a page to the first or second cache, or to the first or second local store, according to the load location determination bits of the page descriptor for the page that the first or second processor is to access.

According to the disclosure, a page accessed by a processor is selectively loaded into and managed in either a local store, for which a coherence policy is supported, or a cache, for which no coherence policy is supported, depending on whether coherence must be guaranteed for the page. Unnecessary network traffic can therefore be reduced and the system structure simplified.

FIG. 1 illustrates a multicore system according to an embodiment of the present invention.
FIG. 2 illustrates a memory management apparatus of a multicore system according to an embodiment of the present invention.
FIG. 3 illustrates a load location determination field of a page descriptor according to an embodiment of the present invention.
FIGS. 4 to 7 illustrate memory management operations of a multicore system according to an embodiment of the present invention.

In virtual memory, the virtual memory area accessed by a processor is divided into pages. Among these pages, a page for which coherence actually has to be guaranteed is one that is accessed by a plurality of processors, at least one of which performs a write operation on the page. Such a page is referred to as a write-shared page.

In a memory system design according to an aspect of the present invention, coherence for write-shared pages may be provided by a local store. The local store may be a temporary storage of each processor that is controlled by software. It differs from a cache, which is controlled by hardware.

A page request from each processor is made through a page descriptor that is copied into a translation lookaside buffer (TLB). The page descriptor may include a specific field indicating whether the page is to be loaded into the local store or into a cache that does not provide coherence.

Hereinafter, specific examples for carrying out the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 illustrates a multicore system according to an embodiment of the present invention.

Referring to FIG. 1, the multicore system 100 includes a plurality of cores 110, a memory manager 120, and a main memory 130.

The cores 110 are connected in a tile or mesh structure. Each core 110 accesses memory blocks of the main memory 130 in units of pages. The memory manager 120 handles various exceptions and controls the operation of the cores 110 so that the cores 110 can access a specific page of the main memory 130 and perform read and write operations while page coherence is guaranteed.

Each core 110 includes a processor 111, TLBs 112 and 113, L1 caches 114 and 115, an L2 cache 116, a local store 117, a DMA 118, and a router/switch 119.

The processor 111 accesses a page of the main memory 130 through a page descriptor stored in the translation lookaside buffers (TLBs) 112 and 113.

The TLBs 112 and 113 store page descriptors for the pages that the processor 111 is to access. They may be divided into an I-TLB 112, which stores page descriptors of pages containing instructions, and a D-TLB 113, which stores page descriptors of pages containing data.

The L1 caches 114 and 115 temporarily store pages that the processor 111 is to access, so that the processor 111 can access those pages quickly. They may be divided into an L1 instruction cache 114, which stores pages containing instructions, and an L1 data cache 115, which stores pages containing data.

The L1 data cache 115 may not provide data coherence. In other words, when the processor 111 accesses a page through the L1 data cache 115, the processor 111 can read and write data freely without regard to data coherence.

The L2 cache 116 temporarily stores pages that the processor 111 is to access. The L2 cache 116 is a cache that can be shared with processors in other cores and may be organized as a Non-Uniform Cache Architecture (NUCA). That is, the L2 cache 116 may be divided into slices that are distributed across the cores 110.

The local store 117 temporarily stores pages that the processor 111 is to access. The local store 117 is managed by the memory manager 120 and, under that management, can provide data coherence. In other words, when the processor 111 accesses a page through the local store 117, page coherence for that page can be guaranteed by the memory manager 120.

The DMA 118 manages data transfer between the main memory 130 and the local store 117, and the router/switch 119 transmits and receives network messages between the cores 110.

According to the present embodiment, a page that the processor 111 of a core 110 is to access may be loaded into a cache (for example, 115) or into the local store 117. Whether the page is loaded into the cache 115 or into the local store 117 may be determined by a specific field of the page descriptor for that page. Also according to the present embodiment, the cache 115 does not provide page coherence, whereas the local store 117 may provide page coherence.

Accordingly, the specific field of a page's page descriptor is adjusted according to whether page coherence must be guaranteed for the page, and the load location of the page is set to the cache 115 or the local store 117 on that basis. Page coherence is thus provided while the network traffic needed to guarantee coherence is reduced.

FIG. 2 illustrates a memory management apparatus of a multicore system according to an embodiment of the present invention. It may be an example of part of the configuration of the memory manager 120 or the DMA 118 of FIG. 1.

Referring to FIGS. 1 and 2, the memory management apparatus 200 includes an exception processing unit 201 and a coherence guarantee unit 202.

The exception processing unit 201 may be invoked when the processor 111 consults the D-TLB 113 in order to access a page and the page descriptor of that page is not present in the D-TLB 113.

The exception processing unit 201 may include a first exception processing unit 211 and a second exception processing unit 212.

The first exception processing unit 211 copies the page descriptor of the page that the processor 111 is to access into the D-TLB 113. According to an embodiment of the present invention, the page descriptor may have a load location determination field. The load location determination field may consist of one or two bits and indicates whether the page indicated by the page descriptor is to be loaded into the L1 data cache 115 or into the local store 117.
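The TLB miss handling described above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the names `PageDescriptor`, `Core`, and `handle_tlb_miss` are hypothetical, the field is modelled as the one-bit variant ('1' for the cache, '0' for the local store), and the hardware structures are reduced to Python dictionaries and sets.

```python
from dataclasses import dataclass, field

CACHE, LOCAL_STORE = 1, 0  # one-bit field values, as used in FIGS. 4 to 7


@dataclass
class PageDescriptor:
    page: str
    load_field: int = CACHE  # default: load into the non-coherent cache


@dataclass
class Core:
    tlb: dict = field(default_factory=dict)        # page -> descriptor copy
    cache: set = field(default_factory=set)        # pages in the L1 data cache
    local_store: set = field(default_factory=set)  # pages in the local store


def handle_tlb_miss(core, page_table, page):
    """Copy the descriptor into the core's TLB, then load the page into the
    target selected by the load location determination field."""
    desc = page_table[page]
    core.tlb[page] = PageDescriptor(desc.page, desc.load_field)
    if desc.load_field == CACHE:
        core.cache.add(page)
        return "cache"
    core.local_store.add(page)
    return "local_store"
```

With a page table holding one descriptor per page, a miss on a page whose field is `CACHE` places the page in `core.cache`, and a miss on a page whose field is `LOCAL_STORE` places it in `core.local_store`.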

According to an aspect of the present invention, the first exception processing unit 211 may also set a lock on the page descriptor while copying it and release the lock when exception handling ends.
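One way to picture the descriptor lock is a guard that is held for the duration of the copy and released when handling ends. The sketch below assumes a software lock (`threading.Lock`) purely for illustration; `LockedDescriptor` and `copy_to_tlb` are hypothetical names, and an actual embodiment could realize the lock differently.

```python
import threading
from contextlib import contextmanager


class LockedDescriptor:
    """Page descriptor guarded by a lock (hypothetical model)."""

    def __init__(self, page, load_field=1):
        self.page = page
        self.load_field = load_field
        self._lock = threading.Lock()

    @contextmanager
    def locked(self):
        self._lock.acquire()       # set the lock before the copy starts
        try:
            yield self
        finally:
            self._lock.release()   # release once exception handling ends

    def is_locked(self):
        return self._lock.locked()


def copy_to_tlb(desc, tlb):
    """Copy the descriptor into the TLB while holding its lock.
    Returns whether the lock was held during the copy."""
    with desc.locked():
        tlb[desc.page] = desc.load_field
        held = desc.is_locked()
    return held
```

Holding the lock across the copy keeps another handler from modifying the field while a stale value is being written into a TLB.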

The second exception processing unit 212 determines whether the page indicated by the page descriptor is a write-shared page for which page coherence must be guaranteed. For example, when a plurality of processors have accessed the page and at least one of them accessed it for a write operation, the second exception processing unit 212 may designate the page as a write-shared page.
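The write-shared test reduces to two conditions: at least two distinct processors have accessed the page, and at least one access was a write. A minimal sketch, assuming per-page bookkeeping of readers and writers (the class `AccessRecord` is hypothetical; the text does not specify where this information is kept):

```python
class AccessRecord:
    """Per-page record of which processors read or wrote the page."""

    def __init__(self):
        self.readers = set()
        self.writers = set()

    def record(self, cpu, is_write):
        (self.writers if is_write else self.readers).add(cpu)

    def is_write_shared(self):
        accessors = self.readers | self.writers
        # shared by two or more processors AND written by at least one
        return len(accessors) >= 2 and len(self.writers) >= 1
```

Under this test, a page read by many processors but never written stays in the cache, as does a page written by a single processor only.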

When the page is a write-shared page, the second exception processing unit 212 generates an interrupt message, sends it to the other processors that have accessed the page, and waits until responses to the message are received from those processors. The interrupt message may be a request that the caches of the other processors that have accessed the page be flushed and their TLBs invalidated.

Each processor that receives the interrupt message flushes its cache (including invalidation), invalidates its TLB, and then sends a response message. In accordance with the response message, the second exception processing unit 212 flushes the L1 data cache 115 of the processor 111 and modifies the load location determination field of the page descriptor.
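The interrupt and response exchange can be sketched as a synchronous handshake. The sketch assumes that message delivery is a direct method call and that the response is a simple acknowledgement tuple; `Cpu`, `on_interrupt`, and `promote_to_write_shared` are hypothetical names, and a real system would send these messages over the on-chip network.

```python
CACHE, LOCAL_STORE = 1, 0


class Cpu:
    def __init__(self, name):
        self.name = name
        self.cache = set()  # pages held in the non-coherent cache
        self.tlb = set()    # pages with a descriptor in the TLB

    def on_interrupt(self, page):
        """Flush (and invalidate) the cached page, invalidate the TLB entry,
        then reply with an acknowledgement."""
        self.cache.discard(page)
        self.tlb.discard(page)
        return ("ack", self.name, page)


def promote_to_write_shared(requester, others, descriptor_field, page):
    """descriptor_field maps page -> load field value (the shared descriptor)."""
    replies = [cpu.on_interrupt(page) for cpu in others]  # send, then wait
    if all(r[0] == "ack" for r in replies):
        requester.cache.discard(page)         # flush the requester's own cache
        descriptor_field[page] = LOCAL_STORE  # later loads go to the local store
    return replies
```

The field is changed only after every acknowledgement has arrived, so no processor can still hold a cached copy when the page starts being served from local stores.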

The coherence guarantee unit 202 controls the local store 117 of each core 110 so that page coherence is guaranteed according to a predetermined coherence protocol. For example, in accordance with a release consistency protocol, the coherence guarantee unit 202 can cause the contents of the local store 117 to be reflected in the main memory 130 at a specific point in time, or can invalidate the D-TLBs 113 of the processors that have accessed a write-shared page.

FIG. 3 illustrates a load location determination field of a page descriptor according to an embodiment of the present invention.

Referring to FIG. 3, a C bit and an L bit may be assigned in an unused bit area of the page descriptor. For example, load location determination bits 11 indicate that the page indicated by the page descriptor is loaded into the L1 cache and the L2 cache. Bits 10 indicate that the page is loaded into the L2 cache. Bits 01 indicate that the page is loaded into the local store. Bits 00 indicate that the page is accessed directly by the processor without going through a cache or the local store. In FIG. 3, the initial default value of the load location determination bits may be 11.
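The two-bit encoding can be illustrated with a small decoder. Several points are assumptions made for the sketch: the field is placed in the two lowest bits of a descriptor word (`FIELD_SHIFT` is hypothetical, since the text only says the bits occupy an unused area), the names are illustrative, and the direct-access case is taken to be the pattern 00, the one value not assigned to a cache or the local store.

```python
from enum import Enum


class LoadTarget(Enum):
    L1_AND_L2_CACHE = 0b11  # initial default value
    L2_CACHE_ONLY = 0b10
    LOCAL_STORE = 0b01
    DIRECT_ACCESS = 0b00    # assumption: the one pattern not otherwise assigned


FIELD_SHIFT = 0   # hypothetical position of the field in the descriptor word
FIELD_MASK = 0b11


def decode_load_target(descriptor_word):
    """Extract the two load location determination bits."""
    return LoadTarget((descriptor_word >> FIELD_SHIFT) & FIELD_MASK)


def set_load_target(descriptor_word, target):
    """Return the descriptor word with only the two field bits replaced."""
    cleared = descriptor_word & ~(FIELD_MASK << FIELD_SHIFT)
    return cleared | (target.value << FIELD_SHIFT)
```

Because only the two field bits are rewritten, the translation and permission bits of the descriptor are left untouched when a page is moved from the cache to the local store.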

FIGS. 4 to 7 illustrate the operation of a memory management apparatus of a multicore system according to an embodiment of the present invention.

In FIGS. 2 and 4, processor #1 consults the D-TLB 410 in order to read page A 401, a TLB miss occurs, and the exception processing unit 201 is invoked. The exception processing unit 201 copies the page descriptor 402 of page A 401 from the memory area 130 into the D-TLB 410 of processor #1.

At this time, the exception processing unit 201 can keep a lock on the page descriptor 402 until the page descriptor 402 has been copied into the D-TLB 410 and the page has been loaded.

Once the page descriptor 402 has been copied into the D-TLB 410, processor #1 accesses page A 401 through the cache 420 or the local store 430. Because the load location determination field of the page descriptor 402 has the default value '1', page A 401 is loaded into the cache 420.

Once page A 401 has been loaded, the exception processing unit 201 determines whether page A 401 is a write-shared page. For example, for a page indicated by a page descriptor 402 whose load location determination field is '1', the exception processing unit 201 can decide whether the page is write-shared according to whether a plurality of processors have accessed the page and whether at least one of them accessed it for a write operation. Because only processor #1 has accessed page A 401 so far, page A 401 is not a write-shared page.

Next, in FIGS. 2 and 5, processor #2 consults the D-TLB 510 in order to write to page A 501, a TLB miss occurs, and the exception processing unit 201 is invoked. The exception processing unit 201 copies the page descriptor 502 of page A 501 from the memory area 130 into the D-TLB 510 of processor #2.

Once the page descriptor 502 has been copied into the D-TLB 510, processor #2 accesses page A 501 through the cache 520 or the local store 530. Because the load location determination field of the page descriptor 502 has the default value '1', page A 501 is loaded into the cache 520.

Once page A 501 has been loaded, the exception processing unit 201 determines whether page A 501 is a write-shared page. Because processor #1 already accessed page A 401 in FIG. 4 and processor #2 then performed a write operation on the same page A 501 in FIG. 5, the exception processing unit 201 can determine that page A 501 is a write-shared page.

When page A 501 is determined to be a write-shared page, the exception processing unit 201 sends an interrupt message to the other processor that has accessed page A 501, namely processor #1. Upon receiving the interrupt message, processor #1 invalidates and flushes page A stored in the cache 420 of FIG. 4 and invalidates the page descriptor stored in the TLB 410. Processor #1 then sends a response message to processor #2.

In accordance with the response message from processor #1, the exception processing unit 201 invalidates and flushes page A stored in the cache 520 of FIG. 5 and changes the load location determination field of the page descriptor 502 to '0'.

Next, in FIGS. 2 and 6, when processor #1 attempts to access page A 601 again, a TLB miss occurs again because the TLB 610 was invalidated in the course of FIGS. 4 and 5. The exception processing unit 201, invoked by the TLB miss, copies the page descriptor 602 of page A 601 into the TLB 610 of processor #1.

Once the page descriptor 602 has been copied into the D-TLB 610, processor #1 accesses page A 601 through the cache 620 or the local store 630. Because the load location determination field of the page descriptor 602 was changed to '0' in FIG. 5, page A 601 is loaded into the local store 630.

Likewise, in FIGS. 2 and 7, when processor #3 accesses page A 701, the page descriptor 702 of page A 701 is copied into the TLB 710 of processor #3 and page A 701 can be loaded into the local store 730, in the same way as in FIG. 6.
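The sequence of FIGS. 4 to 7 can be replayed with a small simulation. The model below is a sketch under simplifying assumptions, not the embodiment itself: all names are hypothetical, only accesses that miss in the TLB are observed (as in the figures), interrupts are modelled as direct state changes, and the requester's own TLB entry is also dropped on promotion so that its next access re-reads the modified descriptor (the text states only that the field is modified).

```python
CACHE, LOCAL_STORE = 1, 0


class System:
    def __init__(self, n_cpus):
        self.field = {}      # page -> load location determination field
        self.accessors = {}  # page -> processors that have accessed it
        self.writers = {}    # page -> processors that accessed it to write
        self.tlb = [set() for _ in range(n_cpus)]
        self.cache = [set() for _ in range(n_cpus)]
        self.local_store = [set() for _ in range(n_cpus)]

    def access(self, cpu, page, write=False):
        if page in self.tlb[cpu]:
            return "tlb hit"
        # TLB miss: copy the descriptor, then load according to the field
        self.field.setdefault(page, CACHE)
        self.tlb[cpu].add(page)
        self.accessors.setdefault(page, set()).add(cpu)
        if write:
            self.writers.setdefault(page, set()).add(cpu)
        if self.field[page] == LOCAL_STORE:
            self.local_store[cpu].add(page)
            return "local store"
        self.cache[cpu].add(page)
        if len(self.accessors[page]) >= 2 and self.writers.get(page):
            for other in self.accessors[page] - {cpu}:  # interrupt + response
                self.cache[other].discard(page)
                self.tlb[other].discard(page)
            self.cache[cpu].discard(page)   # flush the requester's cache
            self.tlb[cpu].discard(page)     # assumption, see lead-in
            self.field[page] = LOCAL_STORE
            return "promoted"
        return "cache"
```

Replaying the figures with `s = System(4)`: `s.access(1, "A")` loads into the cache (FIG. 4), `s.access(2, "A", write=True)` promotes the page to write-shared (FIG. 5), and later accesses by processors 1 and 3 load the page into their local stores (FIGS. 6 and 7).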

Referring to FIGS. 6 and 7, the same page A 601, 701 is loaded simultaneously in the local store 630 of processor #1 and the local store 730 of processor #3, so coherence of the page 601, 701 may not be guaranteed, depending on the operation of the processors.

In FIGS. 2, 6, and 7, the coherence guarantee unit 202 controls the local store 630 of processor #1 and the local store 730 of processor #3 so that page coherence is guaranteed. In other words, the coherence guarantee unit 202 can make changes made by one processor visible to the other processors at a specific point in time, in accordance with a predetermined coherence protocol.

For example, under the release consistency model, in FIG. 6 the contents of the local store 630 can be reflected in the main memory 130 at a release point, and the write-shared page stored in the local stores of all processors that have accessed page A 601 (for example, 530 of FIG. 5 and 730 of FIG. 7) can be marked invalid. Each of processors #1, #2, and #3 can then check whether the write-shared page stored in its own local store has been marked invalid and, for a write-shared page marked invalid, invalidate the corresponding page descriptor in its TLB. This is merely an illustration using the release consistency model; the coherence guarantee unit 202 can of course control each local store and the page access order of each processor according to other coherence policies.
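The release-point behaviour can be sketched in two steps: a write-back with invalid marking, followed by each processor's own check. The names `LocalStore`, `release`, and `check_and_invalidate` are hypothetical, page contents are reduced to a single value, and dropping the stale copy from the local store after the check is an assumption added so that the next access reloads the page.

```python
class LocalStore:
    def __init__(self):
        self.data = {}   # page -> contents
        self.valid = {}  # page -> validity flag


def release(releaser, others, main_memory, page):
    """At a release point: write the page back to main memory, then mark
    the copies held in the other local stores invalid."""
    main_memory[page] = releaser.data[page]
    for ls in others:
        if page in ls.data:
            ls.valid[page] = False


def check_and_invalidate(ls, tlb, page):
    """Run by each processor: if its copy is marked invalid, invalidate the
    TLB entry so the next access misses and reloads the page."""
    if ls.valid.get(page) is False:
        tlb.discard(page)
        del ls.data[page], ls.valid[page]  # assumption: stale copy is dropped
        return True
    return False
```

After `release`, a processor whose copy was marked invalid takes a TLB miss on its next access and picks up the written-back contents from main memory.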

When the process of FIGS. 5 to 7 is repeated, operations for guaranteeing coherence are performed through the local store selectively, only for the pages for which coherence actually has to be guaranteed. That is, a page accessed by a processor is selectively loaded into and managed in either a local store, for which a coherence policy is supported, or a cache, for which no coherence policy is supported, depending on whether coherence must be guaranteed for the page. Unnecessary network traffic can therefore be reduced and the system structure simplified.

Meanwhile, embodiments of the present invention can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording devices in which data readable by a computer system is stored.

Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include implementation in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium may also be distributed over computer systems connected by a network, so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers skilled in the art to which the present invention pertains.

Furthermore, the embodiments described above are intended to illustrate the present invention by way of example, and the scope of the present invention is not limited to any specific embodiment.

Claims

A memory management apparatus of a multicore system, comprising:
a first TLB exception processing unit configured to copy, to a translation lookaside buffer (TLB) of a first processor, a page descriptor that indicates a page to be accessed by the first processor and has a load location determination field indicating whether the page is to be loaded from a memory area into a cache of the first processor or into a local store of the first processor; and
a second TLB exception processing unit configured to, when the page is a write-shared page for which page coherence must be guaranteed, send an interrupt message to a second processor that has accessed the page so that a cache of the second processor is flushed and a TLB of the second processor is invalidated, and to flush the cache of the first processor and modify the load location determination field according to a response of the second processor.

The memory management apparatus of claim 1, further comprising a coherence guarantee unit configured to control the local stores of the first and second processors so that page coherence is guaranteed according to a predetermined coherence protocol.

The memory management apparatus of claim 1, wherein the first TLB exception processing unit sets a lock on the page descriptor while copying the page descriptor to the TLB.

The memory management apparatus of claim 1, wherein the second TLB exception processing unit determines whether the page is a write-shared page according to whether at least two processors have accessed the page and at least one of those processors accessed it for a write operation.

The memory management apparatus of claim 1, wherein the second TLB exception processing unit, according to the response of the second processor, modifies the load location determination field so that the page is loaded into the local store of the first processor or of a third processor when the first or third processor accesses the page.

The memory management apparatus of claim 1, wherein the load location determination field includes at least one bit.

The memory management apparatus of claim 2, wherein the coherence guarantee unit reflects a page modified in the local store of one processor in the memory area according to a release consistency protocol so that other processors can refer to the modified page.

A multicore system, comprising:
a first core comprising a first processor, a first cache that does not provide memory coherence, and a first local store that provides memory coherence;
a second core comprising a second processor, a second cache that does not provide memory coherence, and a second local store that provides memory coherence; and
a memory manager configured to copy a page to the first or second cache, or to the first or second local store, according to load location determination bits of a page descriptor for the page to be accessed by the first or second processor.

The multicore system of claim 8, wherein the memory manager modifies the load location determination bits when the page is a write-shared page for which page coherence must be guaranteed.

The multicore system of claim 8, wherein the memory manager controls the first and second local stores so that page coherence is guaranteed according to a predetermined coherence protocol.