KR20070020391A

KR20070020391A - DMAC issue mechanism via streaming ID method

Info

Publication number: KR20070020391A
Application number: KR1020067012781A
Authority: KR
Inventors: 메튜 에드워드 킹; 페이춘 피터 리우; 데이비드 무이; 다케시 야마자키
Original assignee: 가부시키가이샤 소니 컴퓨터 엔터테인먼트; 인터내셔널 비지니스 머신즈 코포레이션
Priority date: 2004-07-29
Filing date: 2005-07-28
Publication date: 2007-02-21

Abstract

An apparatus, a method, and a computer program are provided for executing Direct Memory Access (DMA) commands. A physical queue is divided by software into a number of virtual queues based on command type, such as processor to processor, processor to Input/Output (I/O) device, and processor to external or system memory. Commands are then assigned to a slot based on the type of DMA command, i.e., load or store. Once assigned, the commands can be executed by alternating between the slots and by using a round-robin scheme within each slot, providing a more efficient way to execute DMA commands.

DMA, commands, queues, slots, quotas

Description

DMAC issue mechanism via streaming ID method {DMAC ISSUE MECHANISM VIA STREAMING ID METHOD}

The present invention relates to the issuing of Direct Memory Access (DMA) request commands, and more particularly to the operation of a command queue.

Over the past several years, DMA has become an important aspect of computer architecture. Alongside DMA, multiprocessor systems that use DMA have been developed to provide faster processing. With DMA in particular, there are two typical types of requests or commands that a processor issues and a DMA controller (DMAC) executes: loads and stores. Depending on the system, an individual processor may be able to load from or store to Input/Output (I/O) devices, the local memories of other processors, memory devices, and so on.

More recently, however, multiple processors and DMACs have been combined on a single chip. Reduction to a single chip allows for reduced size as well as increased speed. The DMAC, the processor, the Bus Interface Unit (BIU), and the bus can all be integrated on the chip. Data flow in such a system starts at the processor core, which dispatches DMA commands that are stored in a DMA command queue. Each DMA command can be unrolled (split) into smaller bus requests to the BIU. The resulting unrolled requests are stored in the BIU's pending bus request queue. The BIU then forwards the requests to the bus controller. In general, requests are sent from the BIU in the order in which they were received from the DMAC. When a bus request completes, its entry in the BIU's pending bus request queue becomes available to receive a new DMA bus request.
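The unrolling step described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the 128-byte request size and the names `BusRequest` and `unroll` are assumptions introduced here, since the text does not specify a bus request size.

```python
from dataclasses import dataclass
from typing import List

BUS_REQUEST_BYTES = 128  # assumed maximum bus request size; not given in the text


@dataclass
class BusRequest:
    address: int  # start address of this piece of the transfer
    size: int     # number of bytes covered by this bus request


def unroll(address: int, length: int, chunk: int = BUS_REQUEST_BYTES) -> List[BusRequest]:
    """Split one DMA transfer into bus requests of at most `chunk` bytes each."""
    requests = []
    offset = 0
    while offset < length:
        size = min(chunk, length - offset)
        requests.append(BusRequest(address + offset, size))
        offset += size
    return requests
```

Under these assumptions, a 300-byte transfer yields three bus requests, and each of them occupies one entry of the BIU's pending bus request queue until it completes.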

However, bottlenecks can arise from the limited physical size of the BIU's pending bus request queue at the source device and of the snoop queue at the destination device. Typically, the bottlenecks are a function of the order of the queue and/or the latency of executing commands. For example, a second command that loads from another processor's local memory may be delayed while waiting for a first command that stores to Dynamic Random Access Memory (DRAM). The resulting bottleneck can cause a serious loss of operating speed.

One cause of the bottleneck can be the order in which DMA commands execute. In practice, some commands execute faster than others. For example, a DMA command that moves data between processors on the same chip can complete faster than a DMA command that targets external memory or an I/O device, which typically takes longer. As a result, DMA commands that move data to memory or I/O devices stay longer in the BIU's pending bus request queue. Eventually the queue can fill with slower bus requests, leaving little or no room for additional bus requests from the DMAC. This degrades processor performance, because the processor must stall while waiting for space to become available in the BIU's pending bus request queue.

Another cause of the bottleneck can be retries. When multiple source devices move data to or from the same destination device, the destination device must reject bus requests while its snoop queue is full, which forces the source device to retry the same bus request later.

Yet another cause of the bottleneck can be the order or execution of commands at the destination device. In conventional DRAM access, a DRAM device can operate in parallel on consecutive memory banks. In addition, a bidirectional bus is typically used to interface with DRAM devices. If the direction of data movement changes frequently, bus bandwidth is reduced because of the extra bus cycles needed for bus turnaround. Furthermore, to obtain more parallel DRAM accesses, it is desirable to perform a series of reads or a series of writes to the same memory page.

Therefore, a method and/or apparatus is needed that improves the efficiency of the DMA issue mechanism and addresses the problems described above.

Various aspects of the present invention can address certain bottleneck problems associated with DMA requests. A DMA controller receives DMA commands from a processing element and unrolls each into requests to a bus interface unit. Issue logic in the DMA controller determines which commands are allowed to be dispatched and/or the timing of issuing them. The bus interface unit has a queue that holds DMA requests before they are issued to the bus. Each DMA request may target external memory, an I/O device, or on-chip memory. Examples of on-chip targets include local store memory (the local memory of a processing element), memory-mapped I/O (MMIO), and cache-to-cache transfers.

In one aspect, one or more attributes of the DMA requests in the DMA queue can contribute to reducing the overall performance of the memory system. For example, one such attribute is whether a DMA request is for a DMA read command or for a DMA write command. Because each DMA command is split into multiple memory accesses that are then executed in the usual way, a long run of consecutive DMA read operations or consecutive DMA write operations can reduce performance. Alternating read and write operations can address this problem.

In accordance with one or more embodiments of the present invention, the concept of a "slot" can be used to control the timing of issuing memory accesses in response to the DMA requests in the DMA queue. For example, assuming that a given memory access occupies a given time slot (or interval) T (T = 0, 1, 2, 3, ...), command issue for read and write operations can be controlled so that read memory access operations are issued during slots 2T, while write memory access operations are issued during slots 2T + 1. By way of example, some systems may experience reduced performance when successive read or write operations are issued for processor-to-processor accesses or for processor-to-I/O accesses. In such systems, successive read or write operations for processor-to-memory accesses may not reduce performance. A loss of performance can therefore be avoided by applying the slot concept only in the processor-to-processor or processor-to-I/O access contexts.
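The slot timing above can be illustrated with a short sketch. The function names `slot_for` and `issue_order` are hypothetical, and the sketch assumes each access occupies exactly one slot; it only shows how reads land in even slots (2T) and writes in odd slots (2T + 1).

```python
from typing import Dict, List


def slot_for(is_write: bool, t: int) -> int:
    """Return the slot number for the t-th read (2T) or the t-th write (2T + 1)."""
    return 2 * t + 1 if is_write else 2 * t


def issue_order(reads: List[str], writes: List[str]) -> List[str]:
    """Interleave reads and writes by slot number; unused slots are simply skipped."""
    schedule: Dict[int, str] = {}
    for t, name in enumerate(reads):
        schedule[slot_for(False, t)] = name
    for t, name in enumerate(writes):
        schedule[slot_for(True, t)] = name
    return [schedule[s] for s in sorted(schedule)]
```

With three reads and one write, the write is issued between the first and second read instead of waiting behind all the reads.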

The streaming ID concept can also prove beneficial for sustaining fast memory access. A streaming ID can be used to identify the target of a requested access, for example to distinguish an access request to memory from an access request to an I/O device. To illustrate the operational problem that the streaming ID concept solves, assume that an access to an I/O device takes longer than an access to memory. Further assume that the DMA queue holds substantially equal numbers of memory access requests and I/O access requests. In that case, the memory access time can include extra time that would not normally be present (because I/O access requests are also being serviced), and memory accesses can therefore be slower than expected. To solve this problem, a separate quota can be assigned to each kind of request: one quota for memory access requests and one quota for I/O access requests. Then, even if I/O access requests are stalled (because the I/O access quota has been reached), the memory access quota may not have been reached, so memory access requests can continue to be serviced at full speed without any loss of performance.
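A minimal sketch of the per-target quota idea follows. The class name `QuotaTracker`, the group labels, and the quota values are assumptions chosen for illustration; the text only states that each kind of access gets its own quota.

```python
from typing import Dict


class QuotaTracker:
    """Track outstanding bus requests per streaming ID group against a quota."""

    def __init__(self, quotas: Dict[str, int]) -> None:
        self.quotas = dict(quotas)
        self.in_flight = {group: 0 for group in quotas}

    def can_issue(self, group: str) -> bool:
        return self.in_flight[group] < self.quotas[group]

    def issue(self, group: str) -> bool:
        """Issue one request for `group`; return False if its quota is reached."""
        if not self.can_issue(group):
            return False
        self.in_flight[group] += 1
        return True

    def complete(self, group: str) -> None:
        """Record that one outstanding request for `group` has finished."""
        if self.in_flight[group] > 0:
            self.in_flight[group] -= 1
```

In this sketch, once the I/O group reaches its quota, further I/O requests are held back, while memory requests can still be issued because their quota is counted separately.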

The slot approach and the streaming ID approach can be used separately or in combination. Accordingly, the issue policy can take into account at least one of the following factors: slot alternation, streaming ID group, and the age of the DMA command. Those skilled in the art will recognize that the detailed description here is directed primarily to the combination of the slot and streaming ID approaches.

For example, in one embodiment of the present invention, a method and a computer program for executing commands in a DMAC are contemplated. A slot is first selected. Once a slot has been selected, a determination is made as to which groups in the selected slot are valid. If no valid group exists, the other slot is selected. If at least one valid group exists, a round-robin arbitration scheme is used to select a group. Within the selected group, the oldest pending DMA command is chosen and unrolled. The unrolled bus request is dispatched to the BIU. After unrolling, the DMA command parameters are updated and written back to the DMA command queue.

For a more complete understanding of the present invention and its advantages, reference is made to the following description taken in conjunction with the accompanying drawings.

FIG. 1 is a block diagram of a multiprocessor computer system that uses DMACs.

FIG. 2A is a block diagram of an improved DMAC command queue.

FIG. 2B is a block diagram of the control register of the improved DMAC command queue.

FIG. 3 is a flow chart of command issue by the DMAC issue mechanism.

In the following discussion, numerous specific details are set forth to provide a thorough understanding of the present invention. Those skilled in the art will nevertheless appreciate that the invention can be practiced without these specific details. In other instances, well-known elements are shown in schematic or block diagram form so as not to obscure the invention with unnecessary detail. In addition, details concerning network communications, electromagnetic signaling techniques, and the like are for the most part omitted, because such details are not considered necessary for a complete understanding of the invention and are considered to be within the understanding of persons of ordinary skill in the art.

Furthermore, unless indicated otherwise, all functions described here can be implemented in hardware, software, or some combination of the two. In a preferred embodiment, however, and unless indicated otherwise, the functions are performed by a processor, such as a computer or an electronic data processor, in accordance with code such as computer program code, software, and/or integrated circuits coded to perform those functions.

Referring to FIG. 1, reference numeral 100 generally designates a multiprocessor computer system that uses DMACs. System 100 includes a first processor 101, a second processor 103, a third processor 105, a bus 130, a memory controller 122, a memory device 124, an I/O controller 126, and an I/O device 128. Various types of storage or memory devices can be used with system 100. There may also be a single processor, or multiple processors as shown in FIG. 1.

Each of the processors 101, 103, and 105 is configured in the same way to communicate data. The first processor 101, the second processor 103, and the third processor 105 further include a first processor core 104, a second processor core 106, and a third processor core 108, respectively. The first processor core 104 is coupled to a first DMAC 110 through a first load communication channel 152 and a first store communication channel 150. The second processor core 106 is coupled to a second DMAC 112 through a second load communication channel 156 and a second store communication channel 154. The third processor core 108 is coupled to a third DMAC 114 through a third load communication channel 160 and a third store communication channel 158. The first DMAC 110 is coupled to a first BIU 116 through a fourth store communication channel 162 and a fourth load communication channel 164. The second DMAC 112 is coupled to a second BIU 118 through a fifth store communication channel 166 and a fifth load communication channel 168. The third DMAC 114 is coupled to a third BIU 120 through a sixth store communication channel 170 and a sixth load communication channel 172.

Each individual processor also operates in the same way. Commands, such as load or store commands, originate in the processor core. A given processor can issue a variety of commands. For purposes of illustration, however, this description focuses on three distinct command types: processor to processor, processor to memory device, and processor to I/O device. Once a command has been issued by the processor core, it is forwarded to the DMAC. The DMAC then unrolls the command to the BIU, where the pending bus request queue stores the unrolled bus requests. The bus requests are then sent to the bus. When the bus controller grants a request, the source and destination devices carry out the data transfer to complete the bus request.

The multiprocessor computer system 100 operates by using the bus 130 to communicate data and bus requests among its components. The first processor 101 is coupled to the bus 130 through a seventh store communication channel 174 and a seventh load communication channel 176. The second processor 103 is coupled to the bus 130 through an eighth store communication channel 178 and an eighth load communication channel 180. The third processor 105 is coupled to the bus 130 through a ninth store communication channel 182 and a ninth load communication channel 184. The memory controller 122 uses a bidirectional memory bus implementation to communicate data to and from the memory device 124. Accordingly, the memory controller 122, with its bidirectional memory bus implementation, is coupled to the bus 130 through a tenth store communication channel 186 and a tenth load communication channel 188. The I/O controller 126 is coupled to the bus 130 through an eleventh store communication channel 190 and an eleventh load communication channel 192.

In addition to the connections to the bus 130, there may be connections among various other components. More specifically, controllers such as the memory controller 122 and the I/O controller 126 require connections to their respective devices. The memory controller 122 is coupled to the memory device 124 through a first bandwidth-controlled communication channel 194. The I/O controller 126 is coupled to the I/O device 128 through a second bandwidth-controlled communication channel 196 and a third bandwidth-controlled communication channel 198.

Referring to FIGS. 2A and 2B, reference numerals 200 and 250 generally designate the command queue and the control register of the DMAC, respectively. The DMA command queue 200 contains a fixed number of entries, and each entry is divided into three fields: a slot field 210, a streaming ID field 220, and a command field 230. The DMA control register 250 includes a slot enable register 252 and a quota register 266.

In a DMAC such as the DMAC 110 of FIG. 1, the physical queue has a limited number of queue entries in which commands can be queued. An incoming DMA command can be placed in any available command queue entry. The slot designation for each DMA command is entered in the slot field 210. A DMA command consists of a command opcode and operands, one of which is the streaming ID; the streaming ID is placed in the streaming ID field 220, and the command opcode and the other operands are placed in the command field 230. Each streaming ID group is configured to have the slot function enabled or disabled by a single bit of the slot enable register 252, shown as the group 0 slot enable 254, the group 1 slot enable 256, and the group 2 slot enable 258. In addition, each group has its own quota, shown as the group 0 quota 260, the group 1 quota 262, and the group 2 quota 264. The sum of the quotas is limited by the size of the BIU's pending bus request queue.
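The queue entry and control register layout described above might be modeled as follows. This is a sketch under stated assumptions: the class names, the use of a string for the command field, and the `enqueue` helper are introduced here for illustration and do not come from the patent text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NUM_GROUPS = 3  # streaming ID groups 0, 1, and 2


@dataclass
class QueueEntry:
    slot: int          # slot field 210: 0 or 1
    streaming_id: int  # streaming ID field 220: group 0, 1, or 2
    command: str       # command field 230: opcode and other operands (opaque here)


@dataclass
class ControlRegister:
    # slot enable register 252: one bit per group (254, 256, 258)
    slot_enable: List[bool] = field(default_factory=lambda: [False] * NUM_GROUPS)
    # quota register 266: one quota per group (260, 262, 264)
    quota: List[int] = field(default_factory=lambda: [0] * NUM_GROUPS)


class CommandQueue:
    """Fixed-size physical queue; a command may occupy any free entry."""

    def __init__(self, num_entries: int) -> None:
        self.entries: List[Optional[QueueEntry]] = [None] * num_entries

    def enqueue(self, entry: QueueEntry) -> int:
        """Store `entry` in the first free slot; return its index, or -1 if full."""
        for i, existing in enumerate(self.entries):
            if existing is None:
                self.entries[i] = entry
                return i
        return -1
```

Returning -1 when the queue is full is a design choice of this sketch; a hardware queue would instead stall the processor core until an entry is free.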

Enabling or disabling the slot function is used to match the bandwidth characteristics of the bus (for example, the slot function is disabled if the bus is bidirectional, as a memory bus is). When the slot function is enabled for a streaming ID group, load commands are assigned a value of 0 in the slot field 210 and store commands are assigned a value of 1 in the slot field 210. When the slot function is disabled, both load and store commands are assigned a value of 0 in the slot field 210.

Typically, however, three kinds of bus request operations can occur: processor to processor, processor to external or system memory, and processor to I/O device. Each of the three can be assigned to a streaming ID group.

In general, processor-to-processor commands are assigned to streaming ID group 0, processor-to-memory commands are assigned to streaming ID group 1, and processor-to-I/O commands are assigned to streaming ID group 2. In this case, to match the bus bandwidth characteristics associated with the DMA commands, the slot function is enabled for streaming ID groups 0 and 2 and disabled for group 1.
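The slot assignment rule and the typical group configuration above can be written as a small sketch. The names `SLOT_ENABLE` and `assign_slot` are hypothetical; the group numbering and the enable settings follow the text.

```python
from typing import Dict

# Typical configuration from the text: slot function enabled for groups 0
# (processor to processor) and 2 (processor to I/O), disabled for group 1
# (processor to memory, which uses a bidirectional bus).
SLOT_ENABLE: Dict[int, bool] = {0: True, 1: False, 2: True}


def assign_slot(group: int, is_store: bool,
                slot_enable: Dict[int, bool] = SLOT_ENABLE) -> int:
    """Return the slot field value for a command in streaming ID `group`."""
    if slot_enable[group] and is_store:
        return 1  # stores go to slot 1 when the slot function is enabled
    return 0      # loads, and everything in a slot-disabled group, go to slot 0
```

For example, a store to another processor's local memory gets slot 1, whereas a store to system memory gets slot 0 because group 1 has the slot function disabled.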

A DMA command is typically unrolled into one or more bus requests to the BIU. These bus requests are queued in the BIU's pending DMA bus request queue, which has a limited size. By configuring a quota for each streaming ID group, this queue is divided into three virtual queues. Depending on the software application, the sizes of the three virtual queues can be configured dynamically through the streaming ID quotas.
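A sketch of how software might reconfigure the virtual queue sizes is shown below. The class name `VirtualQueues`, the 16-entry physical queue in the usage note, and the choice to reject an invalid configuration with `ValueError` are all assumptions for illustration; the only constraint taken from the text is that the quotas together cannot exceed the size of the BIU's pending bus request queue.

```python
from typing import Dict


class VirtualQueues:
    """Partition one physical BIU queue into per-group virtual queues via quotas."""

    def __init__(self, physical_size: int, quotas: Dict[int, int]) -> None:
        self.physical_size = physical_size
        self.quotas: Dict[int, int] = {}
        self.set_quotas(quotas)

    def set_quotas(self, quotas: Dict[int, int]) -> None:
        """Replace all quotas; the previous configuration is kept if validation fails."""
        if any(q < 0 for q in quotas.values()):
            raise ValueError("quotas must be non-negative")
        if sum(quotas.values()) > self.physical_size:
            raise ValueError("sum of quotas exceeds the physical queue size")
        self.quotas = dict(quotas)
```

With an assumed 16-entry physical queue, software could start with quotas of 4, 8, and 4 for groups 0, 1, and 2, and later shift entries toward memory traffic with 2, 12, and 2.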

Referring to FIG. 3, reference numeral 300 generally designates a flow chart of command issue by the modified DMAC issue mechanism.

As shown in the flow chart of FIG. 3, once DMA commands have been entered in the command queue, the DMAC must provide a procedure for issuing them, such as process 300. In step 302, alternation between slot 0 and slot 1 takes place. The DMAC alternates between the slots to make more efficient use of the bandwidth available on unidirectional bus types.

When slot 0 is chosen to be executed next, the DMAC makes a series of determinations to decide which queued command to issue. In step 304, the DMAC determines which groups have valid pending DMA commands. Each group is associated with a maximum issue count, or quota. The quota limits the number of bus requests that can be issued, which prevents the system from overflowing. To maintain proper operation of the system, the DMAC determines in step 306 whether each group in the slot has exceeded its quota.

Once the validity and quota determinations have been made, the DMAC selects the next command. In step 308, the DMAC applies round-robin selection among the command groups. In step 310, a decision is made as to whether any valid group with pending commands is under its quota limit. If no valid group with pending commands is under its quota limit, the DMAC switches to the other slot, slot 1. If such a group exists, the oldest command from the selected group is unrolled in step 312. The round-robin pointer is advanced to the next streaming ID command group, the queue size is decremented in step 314, and the slot is changed in step 302.

When slot 1 is chosen to be executed next, the DMAC likewise makes a series of determinations to decide which queued command to issue. In step 316, the DMAC determines which groups have valid pending DMA commands. Each group is associated with a maximum issue count, or quota. The quota limits the number of bus requests that can be issued, which prevents the system from overflowing. To maintain proper operation of the system, the DMAC determines in step 318 whether each group in the slot has exceeded its quota.

Once the validity and quota determinations have been made, the DMAC selects the next command. In step 320, the DMAC applies round-robin selection among the command groups. In step 322, a decision is made as to whether any valid group with pending commands is under its quota limit. If no valid group with pending commands is under its quota limit, the DMAC switches to the other slot, slot 0. If such a group exists, the oldest command from the selected group is unrolled in step 324. The round-robin pointer is advanced to the next streaming ID command group, the queue size is decremented in step 326, and the slot is changed in step 302.
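The flow of FIG. 3 for both slots can be sketched in one routine. This is a simplified model, not the patented logic: it assumes each command unrolls into exactly one bus request, it models the decrement in steps 314 and 326 as a per-group count of remaining quota, and the names `Command`, `Issuer`, `issue_one`, and `complete` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

NUM_GROUPS = 3


@dataclass
class Command:
    seq: int    # arrival order; a lower value means an older command
    slot: int   # slot field: 0 or 1
    group: int  # streaming ID group: 0, 1, or 2
    name: str


class Issuer:
    def __init__(self, quotas: Dict[int, int]) -> None:
        self.remaining = dict(quotas)  # bus requests each group may still issue
        self.pending: List[Command] = []
        self.slot = 0                  # slot to try next (step 302)
        self.rr = 0                    # round-robin pointer over groups

    def _eligible(self, slot: int) -> Set[int]:
        """Groups with a pending command in `slot` that are under quota."""
        return {c.group for c in self.pending
                if c.slot == slot and self.remaining[c.group] > 0}

    def issue_one(self) -> Optional[Command]:
        for _ in range(2):  # try the current slot, then the other one
            groups = self._eligible(self.slot)
            if groups:
                order = [(self.rr + i) % NUM_GROUPS for i in range(NUM_GROUPS)]
                group = next(g for g in order if g in groups)
                cmd = min((c for c in self.pending
                           if c.slot == self.slot and c.group == group),
                          key=lambda c: c.seq)       # oldest command in the group
                self.pending.remove(cmd)
                self.remaining[group] -= 1
                self.rr = (group + 1) % NUM_GROUPS   # advance round-robin pointer
                self.slot ^= 1                       # alternate slots
                return cmd
            self.slot ^= 1  # no eligible group: switch to the other slot
        return None

    def complete(self, group: int) -> None:
        """A bus request for `group` finished, freeing one unit of its quota."""
        self.remaining[group] += 1
```

In this model a group that has used up its quota is skipped until `complete` is called for it, so a backlog of slow I/O requests does not prevent commands in other groups from being issued.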

All processor-to-memory commands, whether loads or stores, are issued through slot 0. Multiple commands are issued in this manner to improve efficiency. Changing the direction of a bidirectional bus takes time. Moreover, external memory has multiple banks, each of which can service a request independently, so the external memory can accept multiple commands. The time needed to service a request is also very long. It is therefore advantageous to send multiple requests to external memory as burst loads or burst stores, which minimizes direction changes on the bidirectional bus and maximizes the number of loads, or stores, serviced in parallel.
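
The benefit of bursting can be illustrated by counting bus direction changes. The sketch below is only an illustration: `direction_changes` and `batch` are hypothetical helpers, the burst size is an arbitrary parameter, and the reordering ignores any ordering dependencies between commands.

```python
def direction_changes(requests):
    """Count how often consecutive requests differ in direction."""
    return sum(1 for a, b in zip(requests, requests[1:]) if a != b)


def batch(requests, burst):
    """Reorder so up to `burst` loads run back to back, then up to `burst` stores."""
    loads = [r for r in requests if r == "load"]
    stores = [r for r in requests if r == "store"]
    out = []
    while loads or stores:
        out += loads[:burst]    # one burst of loads
        loads = loads[burst:]
        out += stores[:burst]   # then one burst of stores
        stores = stores[burst:]
    return out
```

For eight strictly alternating requests, `direction_changes` reports seven turnarounds; after `batch(..., 4)` the same requests need only one.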

From the foregoing description, it should further be understood that various modifications and changes may be made to the preferred embodiments of the present invention without departing from its spirit. This description is for purposes of illustration only and should not be construed as limiting. The scope of the invention should be limited only by the following claims.

Having thus described the present invention by reference to its preferred embodiments, it is noted that the embodiments disclosed are illustrative rather than limiting in nature, and that a wide range of variations, modifications, changes, and substitutions is contemplated in the foregoing description; in some instances, some features of the present invention may be employed without a corresponding use of other features. Many such variations and modifications may be considered desirable by those skilled in the art upon review of the foregoing description of the preferred embodiments. Accordingly, the appended claims should be construed broadly and in a manner consistent with the scope of the invention.

Claims

1. A system for issuing Direct Memory Access (DMA) request commands originating from a processing element using a streaming ID, the system comprising:

Bus means;

DMA controller (DMAC) means having issue logic means;

Bus interface unit (BIU) means having waiting queue means and connected between the bus means and the DMAC means; and

Bus target means connected to the bus means and having at least one of an external memory, input/output (I/O) means, and an on-chip memory,

Wherein the bus means is connected between the BIU means and the bus target means,

The issue logic means determines which commands are allowed to be issued as bus requests as a function of an issue policy comprising at least one of slot change, streaming ID group, and command age, and

The waiting queue means holds each of the bus requests before it is issued to the bus.

2. The system of claim 1, wherein the DMAC means comprises:

A command code field having a plurality of entry positions;

A slot field configured to be associated with at least a command designation and having a plurality of slot entries, each corresponding to at least one of the plurality of entry positions; and

An identification field configured to contain a streaming ID number corresponding to each of at least the plurality of entry positions.

3. The system of claim 2, wherein the command designation further comprises a designation selected from the group consisting of a load command and a store command.

4. The system of claim 1, wherein the issue logic means inhibits at least slot changes due to external devices having bidirectional buses.

5. A method for issuing commands with a DMAC, the method comprising:

Selecting one of a plurality of slots to provide a selected slot;

Determining group validity of the selected slot;

If there is no valid group, selecting another one of the plurality of slots;

If at least one group is valid, selecting the oldest valid command; and

Updating a group property for the group having the oldest valid command.

6. The method of claim 5, wherein selecting a slot further comprises selecting a load slot or a store slot.

7. The method of claim 5, wherein determining the group validity comprises:

Determining a valid ID group of a plurality of ID groups; and

Determining whether at least one valid ID group has reached a programmed quota.

8. The method of claim 5, wherein updating the queue characteristic further comprises moving a pointer for the group having the oldest valid command to the next pending bus request.

9. A computer program product for issuing commands to a DMAC, having a medium on which a computer program is recorded, the computer program comprising:

Computer code for selecting one of a plurality of slots to provide a selected slot;

Computer code for determining group validity of the selected slot;

Computer code for selecting another one of the plurality of slots when there is no valid group;

Computer code for selecting the oldest valid command if at least one group is valid; and

Computer code for updating a group property for the group having the oldest valid command.

10. The computer program product of claim 9, wherein the computer code for selecting the slots further comprises computer code for selecting a load slot or a store slot.

11. The computer program product of claim 9, wherein the computer code for determining group validity comprises:

Computer code for determining a valid ID group of a plurality of ID groups; and

Computer code for determining whether at least one valid ID group has reached a programmed quota.

12. The computer program product of claim 9, wherein the computer code for updating the queue characteristic further comprises computer code for moving a pointer for the group having the oldest valid command to the next pending bus request.

13. A processor that issues commands to a DMAC and has a computer program,

Computer code for determining group validity of the selected slot;

14. The processor of claim 13, wherein the computer code for selecting the slots further comprises computer code for selecting a load slot or a store slot.

15. The processor of claim 13, wherein the computer code for determining group validity is:

16. The processor of claim 13, wherein the computer code for updating the queue characteristic further comprises computer code for moving a pointer for the group having the oldest command to the next pending bus request.