KR100766666B1

KR100766666B1 - Multi-processor system

Info

Publication number: KR100766666B1
Application number: KR1020057010950A
Authority: KR
Inventors: Takeshi Shimada; Tatsuru Nakagaki; Akihiro Kobayashi
Original assignee: Fujitsu Limited
Priority date: 2005-06-15
Filing date: 2003-05-30
Publication date: 2007-10-11
Also published as: KR20050085671A

Abstract

Data written to the shared memory is transmitted over high-speed dedicated lines provided between each processor and the shared memory. When a processor writes to the shared memory space, it announces the address to be updated on an update notification bus, which corresponds to the conventional global bus. Each other processor that detects this notification prohibits access to that address in its shared memory cache and waits for the write data for that address to arrive over the dedicated line. When the data arrives, it is written to the corresponding address of the shared memory cache. The same data is also written to the corresponding address of the shared memory, so cache coherency is maintained. Transmitting the write address still requires acquiring the right to use the bus, but because the data itself travels over the dedicated line, the time spent acquiring the bus is greatly reduced.

Update notification, global bus, shared memory cache, cache coherency, dedicated line

Description

Multiprocessor System {MULTI-PROCESSOR SYSTEM}

The present invention relates to a shared-memory multiprocessor system in which a plurality of processors are coupled and a shared memory space shared by those processors is provided, and in particular to a system composed of processors each equipped with a shared memory cache that caches data of the shared memory space. Software processing is divided among the processors, and the shared memory is used for exchanging data when processing is handed over between processors and for storing information that must be managed per system rather than per processor. The shared memory cache is introduced to speed up access to the shared memory and thereby improve system performance.

FIG. 1 is a diagram showing a conventional example of the simplest shared-memory multiprocessor system.

A plurality of processors and a shared memory are connected by the same global bus, and each processor accesses the shared memory via this global bus. Each of the processors (1a-1) to (1a-n) sends a bus request signal (1c-1) to (1c-n) to the arbiter (1b). The arbiter arbitrates the right of use so that only one processor at a time is granted the global bus (1e), and sends the corresponding bus grant signal (1d-1) to (1d-n) to that processor. The processor that receives the bus grant signal accesses the shared memory (1f) via the global bus and exchanges the desired data.
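The request and grant handshake described above can be sketched as follows. This is a minimal illustration only: the class name `Arbiter`, its methods, and the first-come-first-served grant order are assumptions, since the text does not specify an arbitration policy.

```python
from collections import deque


class Arbiter:
    """Grants the global bus to one requester at a time (FIFO order is an assumption)."""

    def __init__(self):
        self.pending = deque()  # processors whose bus request signal is asserted
        self.owner = None       # processor currently holding the bus grant

    def request(self, proc_id):
        # A processor asserts its bus request signal.
        if proc_id != self.owner and proc_id not in self.pending:
            self.pending.append(proc_id)
        self._grant()

    def release(self, proc_id):
        # The current owner finishes its access and gives the bus back.
        if self.owner == proc_id:
            self.owner = None
        self._grant()

    def _grant(self):
        # Only one processor holds the bus at any time.
        if self.owner is None and self.pending:
            self.owner = self.pending.popleft()


arbiter = Arbiter()
arbiter.request(1)
arbiter.request(2)            # processor 2 has to wait for processor 1
first_owner = arbiter.owner
arbiter.release(1)
second_owner = arbiter.owner
```

In this sketch the second requester receives the grant only after the first releases the bus, which is the waiting behaviour that a shared bus implies.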

In the scheme of FIG. 1, every access to the shared memory space, whether read or write, goes through the global bus.

This imposes the following two constraints.

Constraint 1: Signal transmission takes time (a physical constraint).

Constraint 2: Waiting for one's turn to use the bus takes time (a constraint in principle).

The former arises because high-speed signal transmission becomes difficult under the electrical conditions of a global bus, such as long signal transmission distances and multiple processors sharing the same signal lines. The latter arises because, when two or more processors access the shared memory at the same time, the second and subsequent processors must wait for access while the right to use the global bus is arbitrated. As a result, these constraints cause the following problems for access to the shared memory space.

Problem 1: Insufficient bandwidth (the number of accesses per unit time that the system permits).

Problem 2: Excessive latency (the time from the start of an access to its completion).

FIG. 2 is a diagram showing a conventional example in which a shared memory cache (2h) is placed on each processor.

When the processor core (2g) reads the shared memory space and a copy of the data exists in the shared memory cache, the read can be completed on the processor via the internal bus (2i), which alleviates constraint 1. Because the access does not go through the global bus, no arbitration of the global bus is needed, which removes constraint 2. In this respect, introducing a shared memory cache is a remedy for the problems above.

Introducing a shared memory cache lets each processor hold its own copy of data in the shared memory space, but the data in the shared memory space must appear identical to all processors. Write processing, which triggers data updates, therefore requires coherency control that guarantees this. As explained later, this coherency control is itself a barrier to solving the problems above.

The requirements on coherency control are subdivided into the following three.

Requirement 1: Temporal synchronization

Requirement 2: Spatial synchronization

Requirement 3: Shortening of the update time

FIG. 3 is a diagram explaining coherency control.

FIG. 3 illustrates the meaning of these requirements. It assumes that the data at some address in the shared memory space initially has the value 0, that processor 1 writes the value 1 to that address, that processor 2 then writes the value 2, and that the other processors 3 to n read that address. Requirement 1 corresponds to excluding the possibility that an individual processor reads the values in, for example, the order value 2 → value 1 (guaranteeing t₁ ≥ 0). Requirement 2 corresponds to excluding the possibility that, for example, some processor reads the value 0 after another processor has already read the value 1 (guaranteeing t₂ ≥ 0). Requirement 3 corresponds to making as short as possible both the time during which other processors still read the pre-update data after a data update occurs and the time until the updated data can be read (minimizing t₂ and t₃). Requirement 3 is not essential for coherency control, but it is needed to improve system performance.
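Requirements 1 and 2 can be made concrete with a small checker over observed reads. This is a simplified sketch, not part of the specification: it assumes instantaneous reads on a single global clock, a known write order, and the hypothetical function names below.

```python
def _rank(write_order):
    # Position of each value in the order in which it was written.
    return {value: i for i, value in enumerate(write_order)}


def violates_requirement_1(reads, write_order):
    """Temporal synchronization: no single processor sees values in reverse write order."""
    rank = _rank(write_order)
    for observed in reads.values():
        ranks = [rank[value] for _, value in sorted(observed)]
        if any(a > b for a, b in zip(ranks, ranks[1:])):
            return True
    return False


def violates_requirement_2(reads, write_order):
    """Spatial synchronization: once any processor has read a newer value,
    no processor may later read an older one."""
    rank = _rank(write_order)
    events = sorted((t, rank[v]) for observed in reads.values() for t, v in observed)
    newest = -1
    for _, r in events:
        if r < newest:
            return True
        newest = r
    return False


write_order = [0, 1, 2]  # initial value 0, then the writes of processors 1 and 2
# reads: processor id -> list of (time, value read)
consistent = {3: [(1, 0), (5, 1), (9, 2)], 4: [(2, 0), (6, 1)]}
reversed_reads = {3: [(1, 2), (2, 1)]}   # value 2 and then value 1 on one processor
stale_reads = {3: [(1, 1)], 4: [(2, 0)]}  # processor 4 reads 0 after processor 3 read 1

ok = (violates_requirement_1(consistent, write_order),
      violates_requirement_2(consistent, write_order))
r1_broken = violates_requirement_1(reversed_reads, write_order)
r2_broken = (violates_requirement_1(stale_reads, write_order),
             violates_requirement_2(stale_reads, write_order))
```

The last case shows why the two requirements are listed separately: each processor on its own sees a consistent order, yet the system as a whole does not.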

One example of coherency control in FIG. 2 is as follows. For every write to the shared memory space, the writing processor reflects the write in its own shared memory cache and at the same time writes it to the shared memory via the global bus. The other processors monitor the write accesses that appear on the global bus and, if the data at that address is present in their own shared memory cache, replace it with the data on the global bus.

FIG. 4 is a diagram explaining an example of a method of establishing cache coherency.

FIG. 4 is an example of a processing sequence based on this method. In the figure, the timings (4a) to (4f) correspond to the following events.

(4a): The processor core initiates a write access.

(4b): The write access causes a global bus request to be issued.

(4c): The bus grant is received, and the address and data are output to the global bus.

(4d): The other processors and the shared memory receive the information on the global bus and write it to their own shared memory or shared memory cache.

(4e): The memory write completes.

(4f): The processor that initiated the write access releases the bus.
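The conventional sequence (4a) to (4f) can be sketched in a few lines. This is an illustrative model only: `SnoopingNode` and `global_bus_write` are hypothetical names, bus arbitration is assumed to have already granted the bus to the writer, and timing is not modelled.

```python
class SnoopingNode:
    """A processor with a shared memory cache that snoops the global bus."""

    def __init__(self):
        self.cache = {}  # address -> locally cached copy

    def snoop_write(self, address, data):
        # (4d) Replace the local copy only if this address is already cached.
        if address in self.cache:
            self.cache[address] = data


def global_bus_write(writer, others, shared_memory, address, data):
    # (4a)-(4c) The writer holds the bus and drives address and data onto it,
    # reflecting the write in its own cache at the same time.
    writer.cache[address] = data
    # (4d)-(4e) The shared memory and every snooping processor take the data.
    shared_memory[address] = data
    for node in others:
        node.snoop_write(address, data)
    # (4f) Only now would the writer release the bus.


shared_memory = {0x10: 0}
p1, p2, p3 = SnoopingNode(), SnoopingNode(), SnoopingNode()
p2.cache[0x10] = 0               # p2 holds a copy, p3 does not
global_bus_write(p1, [p2, p3], shared_memory, 0x10, 7)
```

Because the function returns only after every receiver has been updated, the writer occupies the bus for the whole update, which is the cost that the timing conditions below formalize.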

In this example, the conditions required to guarantee coherency are expressed by the following inequalities.

$$t_{rc(\min)} > t_{dsd(\max)} + t_{dmw(\max)} \qquad (1)$$

$$t_{dsd(\max)} < t_{dsd(\min)} + t_{dmw(\min)} \qquad (2)$$

where

t_rc: the time from issuing a write on the global bus until the bus is released

t_dsd: the time required for another processor to recognize a write issued on the global bus

t_dmw: the time from when a processor or the shared memory recognizes a write access on the global bus until it has reflected the data in itself

Equation 1 is the condition for satisfying requirement 1: it guarantees that the global bus is released only after the written value has been reflected in the shared memory and in the shared memory caches of all processors. (In practice, a sequence is often adopted in which the write target sends a write-completion response and the bus is released on its receipt.) When this condition is met, arbitration of the global bus guarantees that the previous write has completed by the time the next processor starts its write. In a sense the coherency requirement is being met thanks to a drawback of the global bus, but requirement 1 is in fact essentially the same as requiring arbitration of data updates: guaranteeing an ordering of data updates is equivalent to guaranteeing that multiple data updates do not occur simultaneously, that is, to performing arbitration. Satisfying requirement 1 is therefore subject to the same constraint 2 that arises in using the global bus, and it becomes a barrier to solving the problems above.

Equation 2, on the other hand, is the condition for satisfying requirement 2 by absorbing the variation of timing (4d) in FIG. 4 among processors. Timing (4d) is the boundary that determines whether pre-update or post-update data is returned to the processor core when a read access that conflicts with the write access on the global bus is initiated on a processor. Because post-update data is returned from timing (4e), if Equation 2 is not satisfied this timing can be reversed between processors, which violates the requirement.

For example, Equation 1 shows that the bus occupancy time must be at least a certain length, which limits the bandwidth of the shared memory space. Equation 2 shows that even if one tries to increase bandwidth by shortening the write time to the shared memory cache or shared memory, that time must be kept at or above a certain length to allow for the variation of timing (4d) among processors. As these examples show, conditions are attached to various operation timings, so coherency control itself becomes a constraint when one tries to improve performance by shortening processing time.
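The two timing conditions can be checked numerically. The helper below is a sketch: the function name and all timing values are hypothetical (arbitrary units), chosen only to show how shortening the write time can break Equation 2.

```python
def check_timing(t_rc_min, t_dsd_min, t_dsd_max, t_dmw_min, t_dmw_max):
    """Return (Equation 1 holds, Equation 2 holds) for the given timings."""
    eq1 = t_rc_min > t_dsd_max + t_dmw_max   # bus held until every copy is updated
    eq2 = t_dsd_max < t_dsd_min + t_dmw_min  # recognition skew absorbed by write time
    return eq1, eq2


# Hypothetical baseline timings: both conditions hold.
baseline = check_timing(t_rc_min=100, t_dsd_min=10, t_dsd_max=25,
                        t_dmw_min=20, t_dmw_max=40)

# Same system with a faster memory write (t_dmw_min 20 -> 10):
# Equation 2 no longer holds because 25 is not less than 10 + 10.
faster_write = check_timing(t_rc_min=100, t_dsd_min=10, t_dsd_max=25,
                            t_dmw_min=10, t_dmw_max=40)
```

With these illustrative numbers, speeding up the write alone violates the second condition, which matches the point that coherency control limits how far individual timings can be shortened.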

Patent Document 1 describes a conventional technique for maintaining coherency between caches. In Patent Document 1, a processor module has a cache memory and issues a coherency transaction to other processor modules via a bus. A processor module that receives the coherency transaction performs a coherency check. When an update is performed to maintain coherency, the data used for the update is sent over the bus. Signal lines connecting the processor modules and the main memory are used to report the result of the coherency check.

[Patent Document 1] Japanese Patent Application Laid-Open No. Hei 7-281956

<Disclosure of the Invention>

An object of the present invention is to provide a multiprocessor system that improves the bandwidth and latency of the shared memory space by solving the problems above while minimizing the performance degradation caused by the various constraints described so far, including coherency control.

The multiprocessor system of the present invention is one in which a plurality of processors, each having a shared memory cache, and at least one shared memory are coupled to one another. It comprises dedicated line means for exclusively transmitting and receiving, between the processors and the shared memory, the data to be used for updating data in the shared memory area, and global bus means for transmitting update notifications while arbitrating the right of each processor to send an update notification. A processor sends the update notification and the data to be used for the update independently of each other. On receiving an update notification, each processor and the shared memory restrict access to the address indicated by the notification, and permit access to that address again after the data at that address in the shared memory area has been updated with the update data that arrives at each processor and the shared memory.

According to the present invention, providing dedicated line means for transmitting and receiving update data speeds up the transfer of update data. The global bus means only has to arbitrate and transfer update notifications, which carry little data, so long waits to acquire the right to use the bus become less frequent. Furthermore, because each processor and the shared memory update the shared memory area with the update data in accordance with the update notification, coherency between the shared memory caches and the shared memory is ensured.

FIG. 1 is a diagram showing a conventional example of the simplest shared-memory multiprocessor system.

FIG. 2 is a diagram showing a conventional example in which a shared memory cache (2h) is placed on each processor.

FIG. 3 is a diagram explaining coherency control.

FIG. 4 is a diagram explaining an example of a method of establishing cache coherency.

FIG. 5 is a configuration diagram of a system based on an embodiment of the present invention.

FIG. 6 is an example of a time chart based on the series of processes of the first aspect of the embodiment of the present invention.

FIG. 7 is an example of a time chart of processing based on the second aspect of the embodiment of the present invention.

FIG. 8 is an example of a time chart when data updates are performed with different data sizes.

FIG. 9 is an example of a time chart of processing based on the third aspect of the embodiment of the present invention.

FIG. 10 is an example of a time chart based on the principle of the fourth aspect of the embodiment of the present invention.

FIGS. 11 and 12 are a configuration diagram of the system in the fifth aspect of the embodiment of the present invention and a time chart showing its control principle.

FIG. 13 is a diagram explaining the sixth aspect of the embodiment of the present invention.

FIG. 14 is a more specific system configuration diagram based on the embodiment of the present invention.

FIG. 15 is an internal configuration diagram of each of the processors (14a-1) to (14a-10) of FIG. 14.

FIG. 16 is a diagram showing the signal flow during a write access in the first aspect of the embodiment of the present invention.

FIG. 17 is a diagram showing the signal flow when update data is received, based on the first aspect of the embodiment of the present invention.

FIG. 18 is a diagram showing the signal flow during a typical read access in which data in the shared memory cache can be used, in the first aspect of the embodiment of the present invention.

FIG. 19 is a diagram showing the signal flow during a read access in the first aspect of the embodiment of the present invention when the data in the shared memory cache cannot be used and update data request processing is involved.

FIG. 20 is a diagram showing the signal flow when the master processor responds to an update data request sent from another processor, in the first aspect of the embodiment of the present invention.

FIG. 21 is a diagram showing the signal flow during a write access in the second aspect of the embodiment of the present invention.

FIG. 22 is a diagram showing the signal flow when update data sent from another processor is received, in the second aspect of the embodiment of the present invention.

FIG. 23 is a diagram showing the signal flow during a write access in which the update notification is omitted, in the third aspect of the embodiment of the present invention.

FIG. 24 is a diagram showing the signal flow when update data sent from another processor with the update notification omitted is received, in the third aspect of the embodiment of the present invention.

FIG. 25 is a diagram showing the signal flow when a processor added to the system issues a full data transmission request, in the cache fill operation of the fourth aspect of the embodiment of the present invention.

FIG. 26 is a diagram showing the signal flow when the master processor performs full data transmission in response to a full data transmission request, in the cache fill operation of the fourth aspect of the embodiment of the present invention.

FIG. 27 is a diagram showing the signal flow when a processor added to the system performs full data reception, in the cache fill operation of the fourth aspect of the embodiment of the present invention.

FIG. 28 is a diagram showing the signal flow during a write access based on the fifth aspect of the embodiment of the present invention.

<Best Mode for Carrying Out the Invention>

FIG. 5 is a configuration diagram of a system based on an embodiment of the present invention.

The principle of the first aspect of the embodiment of the present invention is described below. In FIG. 5, the part corresponding to the global bus of the conventional example is the update notification bus (5e), which is used exclusively for data update notifications and requests to send update data. The content of the update data is exchanged with the repeater (5h) over the data channels (5g). The data channels are assumed to use a known high-speed, broadband transmission means (for example, Gigabit Ethernet). The repeater (5h) broadcasts the data that appears on each port to which a data channel is connected to all ports. When the number of processors is small enough that the number of data channels remains practical, the repeater may be omitted: data channels are then provided one-to-one between all processors and the shared memory, and the broadcast is performed on each processor. The shared memory may be placed on a specific processor, and when each processor has a shared memory cache equal in size to the shared memory space, as in the example of Japanese Patent Application No. 2002-126212, the shared memory itself need not be provided. In either case the effect of the embodiment of the present invention is obtained.
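The broadcast role of the repeater can be sketched as below. The class `Repeater`, its methods, and the packet layout are hypothetical, and the sketch assumes that "all ports" includes the port the data arrived on; transport details of the data channel are not modelled.

```python
class Repeater:
    """Broadcasts update data arriving on any port to every attached port."""

    def __init__(self):
        self.ports = []  # one receive callback per attached data channel

    def attach(self, on_receive):
        self.ports.append(on_receive)

    def send(self, packet):
        # Data appearing on one port is delivered to all ports.
        for on_receive in self.ports:
            on_receive(packet)


# Two processors and the shared memory, each with its own data channel.
inboxes = {"P1": [], "P2": [], "MEM": []}
repeater = Repeater()
for inbox in inboxes.values():
    repeater.attach(inbox.append)

repeater.send({"address": 1, "data": 1})  # update data sent by one processor
```

Every attached unit, including the shared memory, receives the same update data without any of them having to arbitrate for a shared bus.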

When its processor core issues a write to the shared memory space, a processor acquires the update notification bus and sends the address to be updated onto it. At the same time, it places the update data in the transmit buffer of the data channel. The update data is delayed mainly by signal processing at the ports of each processor and of the repeater, and so reaches the other processors later than the update notification.

Meanwhile, every processor constantly monitors the update notification bus and, when it detects an update notification, writes the address into an update queue on the processor. When the update data subsequently arrives, the processor writes it to the shared memory cache and removes the address from the update queue. If the processor core initiates a read of an address that is present in the update queue, the read from the shared memory cache is held, and when the update data arrives the data is returned to the processor core together with the write to the shared memory cache. All addresses stored in the update queue are monitored, and each piece of update data carries its write destination address. Each processor can therefore compare the addresses in the update queue with the address attached to the update data and write the update data to the correct address of the shared memory cache. The configuration of the shared memory is basically the same as that of a processor, except that the shared memory has no processor core and a larger-capacity shared memory chip takes the place of the shared memory cache.

When no valid data exists in the shared memory cache, that is, on a cache miss, a read access is performed by issuing an update data send request on the update notification bus; the shared memory, or another processor that holds valid data in its shared memory cache, then sends the update data.

FIG. 6 is an example of a time chart based on the series of processes of the first aspect of the embodiment of the present invention.

In this example, processor 1 writes data 1 to address 1, processor 2 subsequently writes data 2 to address 2, and in parallel processor 3 reads the shared memory space in the order address 1, address 0, address 1. All data in the shared memory space is initially 0. In FIG. 6, A denotes an address and D denotes data; a notation such as (1)←0 denotes a write of data 0 to address 1, and a notation such as 1←(0) denotes a read of data 1 from address 0.

At processor 3's first read, the update queue is empty, so the read is served from the shared memory cache and data 0 is returned to the processor core. Processor 3 then detects the update notification from processor 1 and places it in its update queue. At the second read, the update queue is not empty, but it contains only address 1, which does not match the read address, so data 0 is returned to the processor core in the same way as for the first read. At the third read, the update queue contains an entry matching the read address, so the read of the shared memory cache is not started and the read access is held. When the update data for address 1 later arrives from processor 1, data 1 is written to processor 3's shared memory cache, the update queue is cleared, and at the same time the data is returned to the processor core as the read data for address 1.
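The update-queue behaviour and the FIG. 6 sequence as seen by processor 3 can be replayed with a small model. This is a sketch under stated assumptions: the class `Node` and its method names are hypothetical, a held read is represented by returning `None`, and the exact arrival time of processor 2's update relative to the reads is assumed, since the text does not fix it.

```python
class Node:
    """One processor's shared memory cache together with its update queue."""

    def __init__(self, size):
        self.cache = [0] * size   # every address initially holds 0
        self.update_queue = []    # addresses announced but not yet updated
        self.held_reads = []      # reads waiting for update data

    def on_update_notice(self, address):
        # Detected on the update notification bus.
        self.update_queue.append(address)

    def read(self, address):
        if address in self.update_queue:
            self.held_reads.append(address)  # hold until the data arrives
            return None
        return self.cache[address]

    def on_update_data(self, address, data):
        # Update data arrives on the data channel, tagged with its address.
        self.cache[address] = data
        if address in self.update_queue:
            self.update_queue.remove(address)
        released = [a for a in self.held_reads if a == address]
        self.held_reads = [a for a in self.held_reads if a != address]
        return [(a, data) for a in released]  # results handed to the core


p3 = Node(size=4)
trace = [p3.read(1)]                  # 1st read: queue empty, cache returns 0
p3.on_update_notice(1)                # notification from processor 1
trace.append(p3.read(0))              # 2nd read: address 0 is not queued
trace.append(p3.read(1))              # 3rd read: address 1 is queued, read is held
completed = p3.on_update_data(1, 1)   # data 1 arrives and releases the held read
p3.on_update_notice(2)                # processor 2's write to address 2 (timing assumed)
p3.on_update_data(2, 2)
```

Only the read that actually conflicts with a pending update waits; the read of address 0 proceeds even though the update queue is not empty.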

This scheme has two main advantages. First, the processor performing a data update does not have to wait for the update to be reflected in the shared memory caches of the other processors, so bus occupancy time is reduced and the bandwidth of the shared memory space improves. Second, read accesses that do not conflict with a data update no longer incur unnecessary waiting, so the average latency of read accesses falls. The degree of improvement over the conventional example in this second respect depends on the hit rate of the shared memory cache and the probability of access conflicts; the higher the hit rate and the lower the conflict probability, the more pronounced the advantage of this scheme.

본 발명의 실시 형태의 제2 양태에서의 원리는, 제1 양태에서의 데이터 갱신의 단위를 블록화함으로써, 공유 메모리 공간의 대역을 더욱 확대하고자 하는 것이다. 통상 생각할 수 있는 실장으로는, 데이터 채널이나 공유 메모리 캐쉬의 대역은, 갱신 통지 버스의 그것에 비해 훨씬 크게 할 수 있다. 따라서, 공유 메모리 공간의 대역으로는, 갱신 통지 버스의 대역에 의해 제한되고, 데이터 채널이나 공유 메모리 캐쉬의 대역을 다 활용할 수 없을 가능성이 생긴다. 우선, 이를 해결하고자 하는 것이다. The principle of the second aspect of the embodiment of the present invention is to further expand the bandwidth of the shared memory space by grouping the data update unit of the first aspect into blocks. In a typical implementation, the bandwidth of the data channels and the shared memory cache can be made far larger than that of the update notification bus. The bandwidth of the shared memory space is therefore limited by the bandwidth of the update notification bus, and the bandwidth of the data channels and the shared memory cache may not be fully used. The second aspect first aims to solve this problem.

도 7은 본 발명의 실시 형태의 제2 양태에 기초한 처리의 타임 차트의 예이다. 7 is an example of a time chart of processing based on the second aspect of the embodiment of the present invention.

도 7에서는, 데이터 갱신을 4 어드레스 단위로 한 것이다. 프로세서 1 및 2가 송출하는 갱신 통지는 갱신 대상 어드레스의 선두를 나타냄으로써 행해지고, 대응하는 어드레스의 갱신 데이터는 데이터 채널 상에 하나로 합쳐져 송출된다. In FIG. 7, data update is performed in units of four addresses. The update notifications sent by the processors 1 and 2 are performed by indicating the head of the update target address, and the update data of the corresponding address is collectively sent out on the data channel.

데이터 길이가 고정된 채로는, 소프트웨어의 처리상 불필요한 데이터까지 조로 하여 데이터 갱신을 행하지 않으면 안 되는 케이스가 발생하기 때문에, 데이터 채널이나 공유 메모리 캐쉬의 대역을 낭비하여, 실효 대역을 저하시킬 가능성이 발생한다. 그 때문에, 갱신 데이터 사이즈를 가변으로 하여 필요 충분한 데이터만이 데이터 채널에 송출되도록 구성한다. If the data length remains fixed, cases arise in which data that the software does not need must be bundled into the update, which wastes the bandwidth of the data channel and the shared memory cache and may lower the effective bandwidth. The update data size is therefore made variable so that only the necessary and sufficient data is sent out on the data channel.

도 8은 서로 다른 데이터 사이즈로 데이터 갱신을 행한 경우의 타임 차트의 예이다. 8 is an example of a time chart when data is updated with different data sizes.

도 8에서는, 도 7의 예에서 프로세서 1의 첫회의 라이트가 갱신 사이즈 2로 되어 있는 점만 서로 다르다. 이 차이에 의해, 전체적으로 2 어드레스분의 데이터교환에 요하는 시간만큼, 데이터 채널 및 공유 메모리 캐쉬의 점유 시간이 감소한다. 또한, 그 시간만큼 프로세서 2의 라이트 처리에 대응하는 갱신 데이터의 도착이 빨라져, 갱신 큐의 내용이 클리어되기까지의 시간이 짧아지기 때문에, 이 원리에 의해 액세스 경합 시의 레이턴시도 저감할 수 있다. FIG. 8 differs from the example of FIG. 7 only in that the first write by processor 1 has an update size of 2. Because of this difference, the occupancy time of the data channel and the shared memory cache decreases overall by the time required to transfer the data for two addresses. In addition, the update data corresponding to the write by processor 2 arrives earlier by that amount, and the time until the contents of the update queue are cleared becomes shorter, so this principle also reduces the latency during access contention.

또한, 제2 양태에서의 방식은 대역의 향상만이 아니라, 공유 메모리 공간 상에 블록 단위에서의 배타적 갱신을 제공하는 수단으로도 된다. 이러한 점에 의해 소프트웨어 처리를 효율화하여, 시스템의 처리 능력을 향상하는 것도 기대할 수 있다. 동등한 것을 소프트웨어로 실현하기 위해서는, 갱신 개시와 완료를 관리하기 위해 여분의 처리가 필요해지기 때문이다. In addition, the scheme in the second aspect may be not only an improvement of the band but also a means for providing an exclusive update in units of blocks on the shared memory space. In this regard, it is also possible to make the software processing more efficient and to improve the processing capability of the system. This is because in order to realize the equivalent in software, extra processing is required to manage update start and completion.

본 발명의 실시 형태의 제3 양태에서의 원리는, 프로세서가 코히어런시 제어 필요 여부의 속성을 라이트 액세스마다 선택할 수 있게 하여, 코히어런시 제어 불필요의 속성이 지정된 라이트 액세스에 대하여 갱신 통지를 발행하지 않고, 갱신 데이터만을 다른 프로세서에 송출하는 제어를 행하는 것이다. 소프트웨어의 처리 내용에 따라서는, 코히어런시 보증이 불필요한 공유 메모리 공간의 용도도 있기 때문에, 그와 같은 처리에 대하여, 이 제어를 소프트웨어가 이용하여, 갱신 통지 버스의 사용 빈도를 삭감하여 공유 메모리 공간의 대역을 향상함과 함께, 갱신 데이터가 다른 프로세서에 반영되는 시간을 단축하고, 또한 불필요한 액세스 경합의 발생에 의한 레이턴시 증가를 필요 최소한으로 억제하여 리드 액세스의 평균 레이턴시의 삭감을 도모하고자 하는 것이다. The principle of the third aspect of the embodiment of the present invention is to let the processor select, for each write access, an attribute indicating whether coherency control is required, and, for a write access marked as not requiring coherency control, to send only the update data to the other processors without issuing an update notification. Depending on what the software does, some uses of the shared memory space do not need a coherency guarantee. For such processing, software uses this control to reduce how often the update notification bus is used, which improves the bandwidth of the shared memory space, shortens the time until update data is reflected in the other processors, and keeps the latency increase caused by unnecessary access contention to the necessary minimum, thereby reducing the average latency of read access.

도 9는 본 발명의 실시 형태의 제3 양태에 기초한 처리의 타임 차트의 예이다. 9 is an example of a time chart of processing based on the third aspect of the embodiment of the present invention.

이 예에서의 프로세서의 액세스 패턴은 도 6의 예에 준하고 있고, 프로세서 1의 첫회의 라이트가 코히어런시 제어 불필요의 속성이 덧붙여져 있다는 점만이 상이하다. 프로세서 1의 첫회의 라이트에 수반하는 갱신 통지 버스 상의 처리가 기동되지 않기 때문에, 그에 필요한 만큼 갱신 통지 버스의 점유 시간이 감소하고 있다. 또한, 그 만큼 프로세서 2에 의한 2회째의 라이트 액세스에 수반하는 갱신 통지가 갱신 통지 버스에 빠르게 송출되기 때문에, 갱신 시간의 단축을 도모할 수도 있다. 프로세서 3의 3회째의 리드는 프로세서 1의 라이트보다 후에 발행되지만, 본 제어에 의해 갱신 큐에는 투입되어 있지 않기 때문에, 경합에 의한 대기가 발생하지 않고, 통상과 동일한 레이턴시로 리드 액세스가 완료하고 있다. The access pattern of the processors in this example follows the example of FIG. 6 and differs only in that the first write by processor 1 carries the attribute indicating that coherency control is not required. Because no processing on the update notification bus is started for the first write by processor 1, the occupancy time of the update notification bus decreases by that amount. The update notification for the second write access, by processor 2, is also sent on the update notification bus that much earlier, so the update time can be shortened. The third read by processor 3 is issued after the write by processor 1, but because this control puts no entry in the update queue, no wait due to contention occurs and the read access completes with the same latency as usual.
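The effect of the coherency attribute can be sketched as follows. This is a minimal illustration under stated assumptions: `Receiver`, `write`, and the `coherency_required` parameter are hypothetical names, and the model only counts update notification bus transactions rather than modeling timing.

```python
class Receiver:
    """Toy model of a processor that receives updates from others."""

    def __init__(self):
        self.cache = {}
        self.update_queue = set()

    def on_notification(self, address):
        self.update_queue.add(address)

    def on_data(self, address, data):
        self.cache[address] = data
        self.update_queue.discard(address)

    def read_is_held(self, address):
        # A read waits only while its address is in the update queue.
        return address in self.update_queue


def write(receivers, address, coherency_required=True):
    """Start a write; return the number of notification bus transactions used."""
    if not coherency_required:
        return 0                      # no update notification is issued
    for r in receivers:
        r.on_notification(address)
    return 1


def deliver(receivers, address, data):
    # The update data itself always travels over the data channel.
    for r in receivers:
        r.on_data(address, data)


p3 = Receiver()
bus_used = write([p3], address=1, coherency_required=False)
held = p3.read_is_held(1)             # nothing was queued, so the read proceeds
deliver([p3], 1, 1)
```

A read issued between the write and the arrival of its data may return the old value in this mode, which is why the attribute is meant only for data that needs no coherency guarantee.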

본 발명의 실시 형태의 제4 양태에서의 원리는, 프로세서의 온라인 증설 시에, 공유 메모리 공간의 모든 데이터를 유지하는 프로세서 혹은 공유 메모리가, 자신이 갖는 공유 메모리 공간의 데이터를 데이터 채널의 빈 시간을 사용하여, 증설 프로세서에 전송하고, 증설 프로세서는 그 데이터를 받아 공유 메모리 캐쉬를 초기화하는 것이다. The principle of the fourth aspect of the embodiment of the present invention is that, when a processor is added online, a processor or shared memory that holds all the data of the shared memory space transfers that data to the added processor using the idle time of the data channel, and the added processor receives the data and initializes its shared memory cache.

증설 직후의 프로세서는 공유 메모리 캐쉬의 내용이 모두 무효 데이터로서, 그대로 운용계에 참가시키면, 공유 메모리 공간에의 액세스가 모두 공유 메모리 캐쉬에서 미스 히트한다. 이로써, 운용 개시 직후는 증설 프로세서의 처리 능력이 현저히 저하할 뿐만 아니라, 갱신 통지 버스나 데이터 채널이 부주의하게 점유되기 때문에 다른 프로세서에도 영향을 주어, 시스템 성능을 오히려 저하시킬 위험도 있다. 본 방식에 의해, 프로세서 증설에 의한 운용계의 처리 능력 저하를 방지하고, 또한 증설 프로세서의 처리 능력도 운용 개시 직후부터 최대한으로 끌어올릴 수 있다. Immediately after being added, a processor has only invalid data in its shared memory cache. If it joined the in-service system in that state, every access to the shared memory space would miss in the shared memory cache. As a result, not only would the processing capacity of the added processor drop markedly right after the start of operation, but the update notification bus and data channels would be occupied needlessly, which affects the other processors and risks lowering system performance instead. This method prevents the processing capacity of the in-service system from dropping when a processor is added, and lets the added processor deliver its full processing capacity from right after the start of operation.

도 10은 본 발명의 실시 형태의 제4 양태의 원리에 기초를 둔 타임 차트의 예이다. 10 is an example of a time chart based on the principle of the fourth aspect of the embodiment of the present invention.

도면에서, a∼h는 통상의 데이터 갱신 처리에 기초한 전송으로, 1∼8까지가, 본 방식에 의해 행해지는 증설 프로세서에의 데이터 전송이다. 증설 프로세서는, 자신이 새롭게 시스템에 실장된 것을, 갱신 통지 버스에 특정한 신호를 송출하거나, 혹은 실장 미 실장을 나타내는 전용의 신호선을 이용하는 등의 방법으로 다른 프로세서에 통지한다. 증설 프로세서 대상으로 데이터를 송출하는 프로세서 또는 공유 메모리는 그 통지를 받아, 도 10에 도시한 바와 같이, 자신의 갱신 큐가 비어 있을 때에, 갱신 데이터를 데이터 채널에 송출한다. 갱신 큐가 비어 있지 않으면, 즉시 데이터 송출을 중단하고 통상의 처리를 우선하고, 갱신 큐가 비게 되면 데이터 송출을 재개한다. 이러한 처리에 의해, 시스템 상에서 행해지는 통상의 데이터 갱신 처리의 타이밍에 영향을 주지 않고도, 증설 프로세서에 대하여 공유 메모리 캐쉬를 채우기 위한 데이터를 송출하는 처리를 추가할 수 있다. 증설 프로세서는, 데이터 채널로부터 수신한 모든 데이터로 채워진 후, 본래의 처리를 개시하여 운용계에 참가한다. 이 때에는 공유 메모리 캐쉬의 내용은 모두 갱신되어 있고, 운용 개시 직후부터 공유 메모리 캐쉬의 히트율이 높게 유지되어, 시스템으로서의 처리 능력을 향상할 수 있다. In the figure, a to h are transfers based on normal data update processing, and 1 to 8 are data transfers to the added processor performed by this method. The added processor notifies the other processors that it has been newly mounted in the system, for example by sending a specific signal on the update notification bus or by using a dedicated signal line that indicates whether it is mounted. The processor or shared memory that sends data to the added processor receives this notification and, as shown in FIG. 10, sends update data on the data channel while its own update queue is empty. If the update queue is not empty, it immediately suspends the data transmission and gives priority to normal processing; when the update queue becomes empty, it resumes the data transmission. In this way, the processing that sends data to fill the shared memory cache of the added processor can be added without affecting the timing of the normal data update processing performed in the system. Once its shared memory cache has been filled with all the data received from the data channel, the added processor starts its normal processing and joins the in-service system. At this point the contents of the shared memory cache are fully up to date, so the hit rate of the shared memory cache stays high from right after the start of operation, and the processing capacity of the system can be improved.
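The interleaving of normal updates and fill transfers can be sketched as a simple slot scheduler. This is a minimal illustration, assuming a hypothetical function `schedule_channel` and a time-slotted view of the data channel that the text does not itself define.

```python
def schedule_channel(normal_updates, fill_items):
    """Return what the sender puts on the data channel in each time slot.

    normal_updates: one entry per slot, a label when the update queue is
                    not empty in that slot, or None when it is empty.
    fill_items:     data destined for the added processor, in order.
    """
    fill = iter(fill_items)
    timeline = []
    for normal in normal_updates:
        if normal is not None:
            # Update queue not empty: normal processing takes priority.
            timeline.append(normal)
        else:
            # Update queue empty: use the idle slot for fill data, if any.
            timeline.append(next(fill, None))
    return timeline


slots = ["a", None, None, "b", None, "c", None]
timeline = schedule_channel(slots, [1, 2, 3])
```

Here the normal transfers a, b and c keep their original slots, and the fill transfers 1, 2 and 3 occupy only the slots that would otherwise be idle.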

도 11 및 도 12는 본 발명의 실시 형태의 제5 양태에서의 시스템의 구성도와, 그 제어 원리를 나타내는 타임 차트이다. 11 and 12 are structural diagrams of a system in a fifth aspect of the embodiment of the present invention, and a time chart showing the control principle thereof.

제5 양태에 의한 제어의 원리는, 경합 빈도가 높은 특정한 어드레스에의 라이트 처리에 종래와 동일한 방법을 선택적으로 사용할 수 있도록 함으로써, 경합 시의 리드 액세스의 레이턴시를 저감하고자 하는 것이다. 도 11에 도시하는 바와 같이, 갱신 데이터를 전송하는 데이터 버스(11i)를 설치하여 갱신 통지 버스와 동일한 조정 논리 경로로서 데이터 채널(11g)을 사용할지, 데이터 버스를 사용할지는, 라이트 액세스마다 프로세서가 선택한다. The principle of the control according to the fifth aspect is to reduce the latency of read access during contention by allowing the conventional method to be used selectively for writes to specific addresses with a high contention frequency. As shown in FIG. 11, a data bus 11i for transferring update data is provided under the same arbitration logic as the update notification bus, and the processor selects, for each write access, whether to use the data channel 11g or the data bus as the transfer path.

도 12는 동일 시간에 발행된 라이트 액세스에 있어서, 갱신 데이터의 전송 경로에 데이터 채널을 사용한 경우(P=0)와, 데이터 버스를 사용한 경우(P=1)의 타이밍의 차이를 나타낸 것이다. 라이트 액세스 기동으로부터, 다른 프로세서가 갱신 전의 데이터를 리드하지 않게 되기까지의 시간은, t_dsd로서, 양자에 차이 없다. 그러나, 갱신 후의 데이터를 리드할 수 있게 되기까지의 시간은, (P=1)인 경우의 t_duc1에 대하여, (P=0)인 경우, 데이터 채널의 레이턴시의 영향을 받아, t_duc0으로 증대한다. 동일 어드레스에 대한 리드 액세스의 경합이 발생하지 않는 한, 이 차이는 아무런 영향을 미치지 않지만, 경합이 발생한 경우에, 이 시간 차이가 리드 액세스의 레이턴시 증대로 되어 나타나기 때문에, 경합이 다발하는 액세스에 대하여 (P=1)을 선택적으로 사용한다. 그에 따라, 리드 액세스의 평균 레이턴시를 저감할 수 있다. FIG. 12 shows the difference in timing, for write accesses issued at the same time, between using the data channel (P=0) and using the data bus (P=1) as the transfer path for the update data. The time from the start of the write access until the other processors can no longer read the pre-update data is t_dsd in both cases. However, the time until the updated data can be read is t_duc1 when P=1, whereas when P=0 it is affected by the latency of the data channel and increases to t_duc0. As long as read accesses to the same address do not contend, this difference has no effect. When contention does occur, the difference appears as an increase in read access latency, so P=1 is used selectively for accesses that contend frequently. This reduces the average latency of read access.

도 13은 본 발명의 실시 형태의 제6 양태를 설명하는 도면이다. FIG. 13 is a diagram explaining the sixth aspect of the embodiment of the present invention.

도 13의 (a)는, 제6 양태에서의 제어의 타임 차트이다. FIG. 13A is a time chart of control in the sixth aspect.

제6 양태는, 제5 양태에서의 제어 원리를, 제1∼제4 양태에서의 시스템 구성에 그대로 적용하는 것으로, 특정한 라이트 액세스에 대하여, 갱신 데이터의 물리적인 전송은 행하지 않고 데이터 갱신을 행하는 것이다. 구체적으로는, 공유 메모리 공간 상의 어드레스로 라이트될 데이터를, 프로세서 코어가 생성하는 특정한 어드레스에 미리 대응시켜 두고, 그 특정 어드레스에 대한 라이트 액세스가 발행된 경우, 갱신 통지가 발행된 시점에, 그 예약된 데이터가 갱신 데이터로서 전송된 것으로 취급한다. 이 방법으로는 작은 정보량의 데이터밖에 취급할 수 없지만, 신호선 수가 많은 종래와 같은 데이터 버스를 설치하지 않고도, 또한, 제5 양태의 방식과 동일한 효과를 얻을 수 있다. The sixth aspect applies the control principle of the fifth aspect to the system configuration of the first to fourth aspects as it is, and performs a data update for a specific write access without physically transferring the update data. Specifically, the data to be written to an address in the shared memory space is associated in advance with a specific address generated by the processor core; when a write access to that specific address is issued, the reserved data is treated as having been transferred as the update data at the moment the update notification is issued. This method can handle only data with a small amount of information, but it obtains the same effect as the method of the fifth aspect without providing a conventional data bus with many signal lines.

도 13의 (a)의 예에서는, 어드레스 1에 대한 라이트는, 공유 메모리 공간 상의 동일 어드레스에 대한 데이터 1의 라이트로서 취급하고 있다. 갱신 통지의 인식시, 갱신 데이터가 동시에 전달된 것으로 취급할 수 있기 때문에, 제5 양태에서(P=1)로 한 경우와 동일한 타이밍에서 처리를 행할 수 있다. 또한, 데이터 채널의 점유가 발생하지 않기 때문에, 후속의 액세스가 있는 경우에는, 그 액세스에 관계되는 레이턴시를 저감하는 효과도 얻을 수 있다. In the example of FIG. 13A, the write to address 1 is treated as the write of data 1 to the same address in the shared memory space. When the update notification is recognized, the update data can be treated as being delivered at the same time, so that the processing can be performed at the same timing as in the fifth embodiment (P = 1). In addition, since the occupation of the data channel does not occur, when there is subsequent access, the effect of reducing latency related to the access can also be obtained.

예를 들면, 도 13의 (a)의 예에서, 어드레스 2에 대한 라이트를 어드레스 1에 대한 데이터 0의 라이트로서 취급하는 규약을 마련하여 병용하면, 액세스 경합의 오버헤드가 적어, 다른 프로세서에의 반영시간도 고속인 2값의 플래그로서의 기능을 소프트웨어에 제공할 수 있다(도 13의 (b)). For example, in the example of FIG. 13A, when a protocol for treating writes to address 2 as writes to data 0 to address 1 is provided and used in combination, the overhead of access contention is small, and therefore, to other processors. It is possible to provide the software with a function as a two-value flag at which the reflection time is also high (Fig. 13 (b)).
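The two-value flag described above can be sketched as a lookup from the notified address to a reserved target address and value. This is a minimal illustration: the table `RESERVED` and the function `apply_notification` are hypothetical names, and the table holds only the two conventions mentioned in the text.

```python
# Hypothetical reservation table, fixed in advance on every processor:
# notified address -> (shared memory address, data implied by the write).
RESERVED = {
    1: (1, 1),   # a write to address 1 means "write data 1 to address 1"
    2: (1, 0),   # a write to address 2 means "write data 0 to address 1"
}


def apply_notification(cache, notified_address):
    """Update the shared memory cache from the update notification alone.

    No update data is transferred; the value comes from the reservation
    table, so the update takes effect when the notification is seen.
    """
    target, data = RESERVED[notified_address]
    cache[target] = data
    return cache


cache = {1: 0}
apply_notification(cache, 1)   # set the flag
flag_after_set = cache[1]
apply_notification(cache, 2)   # clear the flag
flag_after_clear = cache[1]
```

Because the value is implied by the address, the data channel is never occupied, at the cost of supporting only the few values reserved in advance.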

도 14는 본 발명의 실시 형태에 기초한 보다 구체적인 시스템 구성도이다. 시스템은 프로세서 10대(14a-1)∼(14a-10)와, 버스 아비터/리피터(14b)로 구성되어 있다. 버스 아비터와 리피터는 완전히 독립된 기능을 제공하는 것이지만, 시스템 구성을 간이하게 하기 위해서, 양 블록을 동일한 유닛에 수용하고 있다. 갱신 통지 버스(14c)는, 버스 블록 BC1∼BC10, 버스 요구 신호 NR1∼NR10, 버스 허가 신호 NG1∼NG10, 갱신 통지 어드레스 NA(30비트), 갱신 통지 어드레스 마스크 NM(4비트), 즉시 갱신 데이터 ND(4비트), 갱신 통지 신호 NV, 갱신 데이터 요구 신호 RV, 즉시 갱신 속성 신호 NI로 이루어지고, BC에 동기하여 동작한다. 데이터 채널 TSD1∼TSD10, RSD1∼RSD10은, 약 3기가 비트/초의 전송 대역을 갖는 시리얼 전송 선로를 대향시킨 전체 이중 통신 채널을 이용하고 있다. 프로세서 중 적어도 2개는 공유 메모리 공간의 전체 내용을 유지하고 있고, 그 중 1개는 마스터 프로세서로서 갱신 데이터 요구에 응답한다. FIG. 14 is a more specific system configuration diagram based on the embodiment of the present invention. The system consists of ten processors 14a-1 to 14a-10 and a bus arbiter/repeater 14b. The bus arbiter and the repeater provide completely independent functions, but both blocks are housed in the same unit to simplify the system configuration. The update notification bus 14c consists of bus blocks BC1 to BC10, bus request signals NR1 to NR10, bus permission signals NG1 to NG10, update notification address NA (30 bits), update notification address mask NM (4 bits), immediate update data ND (4 bits), update notification signal NV, update data request signal RV, and immediate update attribute signal NI, and operates in synchronization with BC. The data channels TSD1 to TSD10 and RSD1 to RSD10 are full-duplex communication channels formed by pairing opposed serial transmission lines, each with a transmission bandwidth of about 3 Gbit/s. At least two of the processors hold the entire contents of the shared memory space, and one of them responds to update data requests as the master processor.

도 15는 도 14의 각 프로세서(14a-1)∼(14a-10)의 내부 구성도이다. FIG. 15 is a diagram illustrating an internal configuration of each of the processors 14a-1 to 14a-10 of FIG. 14.

프로세서 내부의 기능 블록은, 프로세서 코어(15a), 프로세서 버스 브릿지(15b), 갱신 통지 버스 브릿지(15e), 데이터 채널 IF(15h), 갱신 큐(15k), 공유 메모리 캐쉬(15n)로 대별된다. 각 부의 기능 개략을 이하에 기재한다. The functional blocks inside the processor are roughly divided into a processor core 15a, a processor bus bridge 15b, an update notification bus bridge 15e, a data channel IF 15h, an update queue 15k, and a shared memory cache 15n. . The functional outline of each part is described below.

(15a) 프로세서 코어 (15a) processor core

주 처리부이다. It is the main processing unit.

(15b) 프로세서 버스 브릿지 (15b) processor bus bridge

공유 메모리 공간에의 액세스의 포괄적 제어를 행한다. Perform comprehensive control of access to the shared memory space.

제어 블록(15c)은 전체의 제어를, 리다이렉터(15d)는, 각 기능 블록 간의 버스 스위칭과, 어드레스 및 데이터의 변환을 행한다. The control block 15c performs overall control, and the redirector 15d performs bus switching between the functional blocks and converts addresses and data.

(15e) 갱신 통지 버스 브릿지 (15e) Update Notice Bus Bridge

갱신 통지 버스의 제어를 행한다. The update notification bus is controlled.

(15h) 데이터 채널 IF (15h) data channel IF

다른 프로세서와의 사이에서 갱신 데이터의 송수신을 행한다. Update data is transmitted / received with another processor.

(15k) 갱신 큐 (15k) update queue

갱신 큐를 수용하고 있고, 큐 상태를 외부에 출력한다. Holds the update queue and outputs the queue status externally.

(15n) 공유 메모리 캐쉬 (15n) shared memory cache

공유 메모리 공간의 데이터를 유지하고, 프로세서 코어에 대하여 고속인 액세스를 제공한다. It maintains data in shared memory space and provides fast access to processor cores.

도 16은 본 발명의 실시 형태에서의 제1 양태의 라이트 액세스 시의 신호의 흐름을 설명하는 도면이다. FIG. 16 is a diagram explaining the signal flow during write access in the first aspect of the embodiment of the present invention.

그 흐름을 이하에 기재한다. 각 행의 앞의 번호는, 도 16의 각 신호에 붙인 번호에 대응한다. The flow is described below. The preceding number of each line corresponds to the number attached to each signal of FIG.

(1) 프로세서 코어(16a)가 프로세서 어드레스 PA, 프로세서 데이터 PD, 프로세서 전송 데이터 PT를 설정하고, 프로세서 라이트 신호 PW를 송신한다. (1) The processor core 16a sets the processor address PA, the processor data PD, and the processor transmission data PT, and transmits the processor write signal PW.

(2) 프로세서 버스 브릿지(16b)의 제어 로직(16c)은 리다이렉터 기능 제어 신호 FC를 설정한다. 리다이렉터(16d)는, 그에 따라, 프로세서 어드레스 PA를 실효 어드레스 EA와 캐쉬 어드레스 CA에, 프로세서 데이터 PD를 실효 데이터 ED 및 캐쉬 데이터 CD에 에코한다. (2) The control logic 16c of the processor bus bridge 16b sets the redirector function control signal FC. The redirector 16d thus echoes the processor address PA to the effective address EA and the cache address CA and the processor data PD to the effective data ED and the cache data CD.

(3) 프로세서 버스 브릿지(16b)의 제어 로직(16c)은 갱신 통지 송신 신호 NS를 송신한다. (3) The control logic 16c of the processor bus bridge 16b transmits the update notification transmission signal NS.

(4) 갱신 통지 버스 브릿지(16e)의 송신부(16f)는 NS를 받아, 버스 요구 신호 NR을 송신한다. (4) The transmission unit 16f of the update notification bus bridge 16e receives NS and transmits the bus request signal NR.

(5) 갱신 통지 버스 브릿지(16e)의 송신부(16f)가 버스 허가 신호 NG를 수신하여, 갱신 통지 버스를 획득한다. (5) The transmitting unit 16f of the update notification bus bridge 16e receives the bus permission signal NG to obtain an update notification bus.

(6) 갱신 통지 어드레스 NA에 EA가 에코되고, 갱신 통지 신호 NV가 전체 프로세서에 송신된다. NA 및 NV는 자기 프로세서의 갱신 통지 버스 브릿지 감시부(16g)에도 루프백하여 수신된다. (6) EA is echoed to update notification address NA, and update notification signal NV is transmitted to all the processors. NA and NV are also looped back to the update notification bus bridge monitoring unit 16g of the own processor.

(7) 갱신 통지 버스 브릿지(16e)의 감시부(16g)는, 자신이 송출한 NV를 수취하면, NA를 갱신 통지 어드레스 SA로서 에코함과 함께, NV를 갱신 통지 수신 신호 SV로서 자기 프로세서 내에 송신한다. SV를 받아, 갱신 큐(16k)의 큐 레지스터(16l)에 해당 갱신 통지가 큐잉된다. 이 때, 다른 프로세서 상에서도 동일한 제어가 행하여진다. (7) When the monitoring unit 16g of the update notification bus bridge 16e receives the NV that it sent itself, it echoes NA as the update notification address SA and sends NV within its own processor as the update notification reception signal SV. On receipt of SV, the update notification is queued in the queue register 16l of the update queue 16k. The same control is performed on the other processors at this time.

(8) 프로세서 버스 브릿지(16b)의 제어 로직(16c)은 SV를 받아 갱신 데이터 송신 신호 US를 송신하고, 이를 받은 데이터 채널 IF(16h)의 프레이머(16i)는, EA/ED의 내용을 송신 버퍼에 큐잉한다. US의 송신 후, 프로세서 코어에 액크날리지 신호 ACK가 송신되고, 프로세서 코어측의 액세스는 완료한다. (8) The control logic 16c of the processor bus bridge 16b receives SV and sends the update data transmission signal US; on receiving it, the framer 16i of the data channel IF 16h queues the contents of EA/ED in the transmission buffer. After US is sent, the acknowledge signal ACK is sent to the processor core, and the access on the processor core side completes.

(9) 데이터 채널 IF(16h)의 프레이머(16i)에서는, 송신 버퍼에 큐잉된 데이터가 수시 패킷에 구축되어 있고, 완료한 분으로부터 SERDES(16j)(시리얼라이저 디시리얼라이저의 약칭으로, 시리얼 신호를 페러랠 신호로 변환하거나, 페러랠 신호를 시리얼 신호로 변환하는 기능 블록임)에 송신 페러랠 데이터 TPD로서 송출된다. SERDES는 이를 받아, 데이터 채널에서 반송할 수 있는 전기 신호로 변조를 행하여, 송신 시리얼로 TSD로서 갱신 데이터를 송출한다. (9) In the framer 16i of the data channel IF 16h, the data queued in the transmission buffer is assembled into packets as needed, and completed packets are sent as transmit parallel data TPD to the SERDES 16j (short for serializer/deserializer, a functional block that converts serial signals to parallel signals and parallel signals to serial signals). The SERDES receives this, modulates it into an electrical signal that can be carried on the data channel, and sends out the update data as transmit serial data TSD.

도 17은 본 발명의 실시 형태의 제1 양태에 기초한 갱신 데이터 수신 시의 신호의 흐름을 설명하는 도면이다. FIG. 17 is a diagram explaining the signal flow when update data is received, based on the first aspect of the embodiment of the present invention.

그 흐름을 이하에 기재한다. 각 행의 앞의 번호는 도 17의 각 신호에 붙인 번호에 대응한다. The flow is described below. The number in front of each row corresponds to the number added to each signal in FIG.

(1) 데이터 채널 IF(17h)의 SERDES(17j)가 수신 시리얼 데이터 RSD를 복조하여, 프레이머(17i)에 수신 페러랠 데이터 RPD로서 송출한다. (1) The SERDES 17j of the data channel IF 17h demodulates the received serial data RSD and sends it to the framer 17i as a received parallel data RPD.

(2) 데이터 채널 IF(17h)의 프레이머(17i)는 RPD를 받아, 데이터 중의 패킷의 추출 및 전개를 행하고, 갱신 데이터 어드레스 UA, 갱신 데이터 UD를 설정하여, 갱신 데이터 수신 신호 UR을 송신한다. 이에 맞추어, 큐 레지스터(17l)의 큐 클리어 어드레스 QCA에 UA가 세트된다. (2) The framer 17i of the data channel IF 17h receives the RPD, extracts and expands a packet in the data, sets the update data address UA and update data UD, and transmits the update data reception signal UR. In accordance with this, the UA is set in the queue clear address QCA in the queue register 17l.

(3) 프로세서 버스 브릿지(17b)의 제어 로직(17c)은 UR을 받아, 리다이렉터 기능 제어 신호 FC를 설정한다. 리다이렉터(17d)는 그에 따라, UA를 CA에, UD를 CD에 에코한다. 제어 로직(17c)에서 다른 처리가 행하여지고 있는 경우, 일단 대기하고, 그것이 완료하는 대로 본 처리를 실행한다. (3) The control logic 17c of the processor bus bridge 17b receives UR and sets the redirector function control signal FC. The redirector 17d accordingly echoes UA to CA and UD to CD. If the control logic 17c is performing other processing, it waits and executes this processing as soon as that completes.

(4) 프로세서 버스 브릿지(17b)의 제어 로직(17c)은 캐쉬 라이트 신호 CW를 송신하고, 이를 받은 공유 메모리 캐쉬(17n)는 CA에서 지정되는 원하는 데이터를 CD에서 갱신한다. 또한, 제어 로직(17c)은, 큐 클리어 신호 QC를 송신하고, 이를 받은 갱신 큐(17k)는, (2)에서 세트한 QCA를 큐 레지스터(17l)로부터 클리어한다. (4) The control logic 17c of the processor bus bridge 17b sends the cache write signal CW, and on receiving it the shared memory cache 17n updates the data at the location specified by CA with CD. The control logic 17c also sends the queue clear signal QC, and on receiving it the update queue 17k clears the QCA set in (2) from the queue register 17l.

도 18은 본 발명의 실시 형태의 제1 양태에서, 공유 메모리 캐쉬의 데이터를 이용할 수 있는 전형적인 리드 액세스 시의 신호의 흐름을 설명하는 도면이다. FIG. 18 is a diagram illustrating a signal flow during typical read access in which data in the shared memory cache can be used in the first aspect of the embodiment of the present invention.

그 흐름을 이하에 기재한다. 각 행의 앞의 번호는 도 18의 각 신호에 붙인 번호에 대응한다. The flow is described below. The number in front of each line corresponds to the number added to each signal in FIG.

(1) 프로세서 코어(18a)가 PA, PT를 설정하고, 프로세서 리드 신호 PR를 송신한다. (1) The processor core 18a sets PA and PT, and transmits the processor read signal PR.

(2) 프로세서 버스 브릿지(18b)의 제어 로직(18c)은 FC을 설정하고, 리다이렉터(18d)는, 그에 따라, PA를 EA와 CA에 에코한다. (2) The control logic 18c of the processor bus bridge 18b sets the FC, and the redirector 18d echoes the PA to the EA and CA accordingly.

(3) 프로세서 버스 브릿지(18b)의 제어 로직(18c)이 CR을 송신한다. (3) The control logic 18c of the processor bus bridge 18b transmits the CR.

(4) 공유 메모리 캐쉬(18n)는 CR을 받고, CA에서 지정된 캐쉬 상의 데이터를 이용할 수 없는 경우에는 이용 불능 신호 NP를 송신하고, 이용 가능한 경우에는 캐쉬 데이터 CD를 송신한다. 또한, 갱신 큐(18k)의 비교기(18m)는, EA에서 지정되는 큐가 큐 레지스터 상에 있는 경우, 경합 신호 COL을 송신한다. (4) The shared memory cache 18n receives the CR, transmits the unavailable signal NP when the data on the cache designated by the CA is not available, and transmits the cache data CD when it is available. In addition, the comparator 18m of the update queue 18k transmits a contention signal COL when the queue specified by EA is on the queue register.

(5) 프로세서 버스 브릿지(18b)의 제어 로직(18c)은, NP, COL 중 어느 것도 수신하지 못한 경우, CD를 PD에 에코하고, ACK를 송신하고 액세스는 완료한다. COL을 수신한 경우에는 CR을 해제한 후, COL이 해제될 때까지 대기하고, COL 해제 후에 (3) 이후의 처리를 재차 행한다. 여기서, COL을 수신하지 않고, NP를 수신한 경우의 처리는 이하에서 설명한다. (5) If the control logic 18c of the processor bus bridge 18b receives neither NP nor COL, it echoes CD to PD and sends ACK, and the access completes. If it receives COL, it releases CR, waits until COL is released, and then repeats the processing from (3) onward. The processing when NP is received without COL is described below.
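The decision made by the control logic in step (5) can be summarized as a small function of the two signals. This is a minimal sketch; the function name `read_outcome` and the returned strings are illustrative labels, not signal names from the embodiment.

```python
def read_outcome(np_received, col_received):
    """Classify a read access from the NP and COL signals of steps (4)-(5).

    np_received:  the shared memory cache reported the data as unavailable.
    col_received: the update queue holds an entry for the read address.
    """
    if col_received:
        # Contention: release CR, wait for COL to clear, retry from step (3).
        return "wait for COL release, then retry from (3)"
    if np_received:
        # Miss without contention: an update data request is needed.
        return "issue update data request"
    # Hit without contention: echo CD to PD and acknowledge.
    return "return cache data and ACK"


hit = read_outcome(np_received=False, col_received=False)
miss = read_outcome(np_received=True, col_received=False)
contended = read_outcome(np_received=False, col_received=True)
```

COL is checked first here because the text defines the miss path only for the case in which COL is not received.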

도 19는 본 발명의 실시 형태의 제1 양태에서의 리드 액세스에서, 공유 메모리 캐쉬 상의 데이터를 이용할 수 없고, 갱신 데이터 요구 처리를 수반하는 경우의 신호의 흐름을 설명하는 도면이다. FIG. 19 is a diagram illustrating a signal flow when data on the shared memory cache cannot be used in read access in the first aspect of the embodiment of the present invention and involves update data request processing.

그 흐름을 이하에 기재한다. 각 행의 앞의 번호는 도 19의 각 신호에 붙인 번호에 대응한다. 또한, 도중의 (4)까지는 전항에서 설명한 리드 액세스 시의 흐름과 완전히 동일하기 때문에, 생략한다. The flow is described below. The number in front of each line corresponds to the number attached to each signal in FIG. 19. Steps up to (4) are omitted because they are exactly the same as the read access flow described in the preceding section.

(5) 프로세서 버스 브릿지(19b)의 제어 로직(19c)이 COL을 수신하지 않고, NP를 수신한 경우에는, 갱신 데이터 요구 신호 RS를 송신한다. (5) When the control logic 19c of the processor bus bridge 19b does not receive the COL but receives the NP, the update data request signal RS is transmitted.

(6) 갱신 통지 버스 브릿지(19e)의 송신부(19f)는 RS를 받고, 버스 요구 신호 NR를 송신한다. (6) The transmitting unit 19f of the update notification bus bridge 19e receives the RS and transmits the bus request signal NR.

(7) 갱신 통지 버스 브릿지(19e)의 송신부(19f)가 버스 허가 신호 NG를 수신하고, 갱신 통지 버스를 획득한다. (7) The transmission unit 19f of the update notification bus bridge 19e receives the bus permission signal NG to obtain an update notification bus.

(8) 갱신 통지 어드레스 NA에 EA가 에코되고, 갱신 데이터 요구 신호 RV가 전체 프로세서에 송신된다. NA 및 RV는 자기 프로세서의 갱신 통지 버스 브릿지 감시부(19g)에도 루프백하여 수신된다. (8) EA is echoed to update notification address NA, and update data request signal RV is transmitted to all processors. The NA and RV are also looped back to the update notification bus bridge monitoring unit 19g of the self processor.

(9) 갱신 통지 버스 브릿지(19e)의 감시부(19g)는 NA를 SA로서 에코함과 함께, 자기 프로세서가 송출한 RV를 검지하면, 자기 프로세서 내에 SV로서 에코한다. 갱신 큐(19k)는 SV를 큐 세트 신호 QS로서 받아, SA의 내용을 큐 세트 어드레스 QSA로서 큐 레지스터(19l)에 큐잉한다. (9) The monitoring unit 19g of the update notification bus bridge 19e echoes NA as SA and, on detecting the RV sent by its own processor, echoes it within its own processor as SV. The update queue 19k receives SV as the queue set signal QS and queues the contents of SA in the queue register 19l as the queue set address QSA.

(10) 리드 액세스 대상에 일치하는 큐가 큐잉되기 때문에, 갱신 큐(19k)로부터 COL이 반드시 송신된다. COL의 수신으로써, 프로세서 버스 브릿지(19b)는 COL이 해제될 때까지, 프로세서 코어(19a)로부터의 리드 액세스를 보류한 채로 갱신 통지와 갱신 데이터의 수신 처리를 행하면서 대기한다. (10) Since the queue matching the read access object is queued, the COL is necessarily transmitted from the update queue 19k. Upon reception of the COL, the processor bus bridge 19b waits while performing the update notification and the reception of the update data with the read access pending from the processor core 19a until the COL is released.

(11) (8)에서 송출된 갱신 데이터 요구를 받아, 마스터 프로세서로부터 갱신 데이터가 송출되고, 데이터 채널 IF(19h)은 갱신 데이터 어드레스 UA, 갱신 데이터 UD를 설정하고, 갱신 데이터 수신 신호 UR를 송신한다. 이에 맞추어, 큐 레지스터(19l)의 큐 클리어 어드레스 QCA에 UA가 세트된다. (11) Upon receiving the update data request sent in (8), update data is sent from the master processor, and the data channel IF 19h sets the update data address UA and the update data UD, and transmits the update data reception signal UR. do. In accordance with this, the UA is set in the queue clear address QCA of the queue register 19l.

(12) 갱신 큐(19k)로부터 리드 액세스 대상의 큐가 클리어되기 때문에, COL이 해제된다. (12) Since the queue for read access is cleared from the update queue 19k, the COL is released.

(13) 프로세서 버스 브릿지(19b)의 제어 로직(19c)은, COL의 해제를 받아, FC를 제어하여 리다이렉터(19d)를 제어하여, UA를 CA에, UD를 CD와 PD에 에코한다. (13) The control logic 19c of the processor bus bridge 19b receives the release of the COL, controls the FC to control the redirector 19d, and echoes the UA to the CA and the UD to the CD and the PD.

(14) 프로세서 버스 브릿지(19b)의 제어 로직(19c)은 캐쉬 라이트 신호 CW를 송신하여 공유 메모리 캐쉬 상의 원하는 데이터를 CD에서 갱신함과 함께, 프로세서 코어에 대하여 ACK를 송신하고, 리드 액세스를 완료한다. (14) The control logic 19c of the processor bus bridge 19b sends the cache write signal CW to update the target data in the shared memory cache with CD, sends ACK to the processor core, and completes the read access.
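Steps (5) to (14) can be traced with a short sequential model. This is a minimal sketch under stated assumptions: `read_with_miss` is a hypothetical function, the caches are plain dictionaries, and the bus arbitration of steps (6) and (7) is collapsed into a single trace entry.

```python
def read_with_miss(requester_cache, master_cache, address):
    """Trace a read that may miss in the requester's shared memory cache.

    Returns (data, trace), where trace lists the main events in order.
    """
    trace = []
    update_queue = set()
    if address in requester_cache:
        return requester_cache[address], trace   # ordinary hit, no request
    # (5)-(8): the miss causes an update data request on the notification bus.
    trace.append("RV sent")
    # (9)-(10): the requester queues its own request, so COL is asserted
    # and the read from the processor core is held.
    update_queue.add(address)
    trace.append("COL asserted")
    # (11): the master processor answers over the data channel.
    data = master_cache[address]
    trace.append("UR received")
    # (12): the queue entry is cleared and COL is released.
    update_queue.discard(address)
    trace.append("COL released")
    # (13)-(14): the cache is filled and the data returned with ACK.
    requester_cache[address] = data
    trace.append("ACK")
    return data, trace


requester, master = {}, {5: 42}
data, trace = read_with_miss(requester, master, 5)
data_again, trace_again = read_with_miss(requester, master, 5)
```

The second read of the same address hits in the requester's cache and produces an empty trace, reflecting that the miss handling also fills the cache.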

도 20은 본 발명의 실시 형태의 제1 양태에서, 다른 프로세서로부터 송신된 갱신 데이터 요구에 대한, 마스터 프로세서에 의한 응답 시의 신호의 흐름을 설명하는 도면이다. FIG. 20 is a diagram for explaining the flow of signals in response by the master processor to an update data request transmitted from another processor in the first aspect of the embodiment of the present invention.

그 흐름을 이하에 기재한다. 각 행의 앞의 번호는 도 20의 각 신호에 붙인 번호에 대응한다. The flow is described below. The number in front of each row corresponds to the number attached to each signal in FIG.

(1) 갱신 통지 버스 브릿지(19e)의 감시부(19g)는 RV를 검지하면, NA를 SA에 에코함과 함께, 갱신 데이터 요구 신호 SR을 프로세서 내부에 송신한다. (1) When the monitoring unit 19g of the update notification bus bridge 19e detects the RV, it echoes the NA to the SA and transmits the update data request signal SR to the processor.

(2) 프로세서 버스 브릿지(20b)의 제어 로직(20c)은, 자신이 마스터 프로세서인 경우, SR을 받아 FC를 설정하고, 리다이렉터(20d)를 제어하여 SA를 EA와 CA에 에코하고, CD와 ED를 접속한다. 여기서 자신이 마스터 프로세서가 아닌 경우, SR은 무시된다. 또한, 제어 로직(17c)에서 다른 처리가 행하여지고 있는 경우, 일단 대기하고, 그것이 완료하는 대로, 본 처리를 실행한다. (2) If its own processor is the master processor, the control logic 20c of the processor bus bridge 20b receives SR, sets FC, controls the redirector 20d to echo SA to EA and CA, and connects CD to ED. If its own processor is not the master processor, SR is ignored. If the control logic 17c is performing other processing, it waits and executes this processing as soon as that completes.

(3) The control logic 20c of the processor bus bridge 20b transmits CR to the shared memory cache 20n.

(4) CD is output from the shared memory cache 20n and echoed to ED.

(5) The control logic 20c of the processor bus bridge 20b transmits US, and the update data is sent out on the data channel in the same way as in the update data transmission processing for a write access.

FIG. 21 is a diagram explaining the signal flow at the time of a write access in the second aspect of the embodiment of the present invention.

The flow is described below. The number at the head of each line corresponds to the number attached to each signal in FIG. 21.

(1) The processor core 21a sets the processor address PA, the processor data PD, and the processor transfer type PT, and transfers data spanning multiple data units to the redirector by burst transfer.

(2) The control logic 21c of the processor bus bridge 21b sets the redirector function control signal FC. The redirector 21d accordingly echoes the head address set in the processor address PA to the effective address EA. It also counts the size of the burst-transferred data and from this calculates and outputs the effective address mask EM. Here, the effective address mask is a signal indicating how many low-order bits of the effective address are to be ignored. The multiple-unit data set on PD is stored in a buffer inside the redirector.

(3) The control logic 21c of the processor bus bridge 21b transmits the update notification transmission signal NS.

(4) The transmission unit 21f of the update notification bus bridge 21e receives NS and transmits the bus request signal NR.

(5) The transmission unit 21f of the update notification bus bridge 21e receives the bus grant signal NG and acquires the update notification bus.

(6) EA is echoed to the update notification address NA and EM to the update notification address mask NM, and the update notification signal NV is transmitted to all processors. NA, NM, and NV are also looped back to, and received by, the monitoring unit 21g of the update notification bus bridge of the own processor.

(7) The monitoring unit 21g of the update notification bus bridge 21e receives NV, echoes NA to the update setting address SA and NM to the update setting address mask SM, and transmits the update notification reception signal SV. The update queue 21k receives SV as the queue set signal QS and queues the contents of SA as the queue set address QSA and the contents of SM as the queue set address mask QSM in the queue register 21l.

(8) On receiving SV, the control logic 21c of the processor bus bridge 21b transmits the update data transmission signal US and at the same time sets FC. The redirector 21d accordingly sets the update data stored in its buffer on ED in order, starting from the head data. The framer 21i of the data channel IF 21h, on receiving this, queues the contents of EA/EM/ED in the transmission buffer. After US is transmitted, the acknowledge signal ACK is transmitted to the processor core, and the access on the processor core side is complete.

(9) The framer 21i of the data channel IF 21h continually assembles the data queued in the transmission buffer into packets and sends each completed packet to the SERDES 21j as transmission parallel data TPD. The SERDES receives this, modulates it into an electrical signal that can be carried on the data channel, and sends out the update data as transmission serial data TSD.
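Step (2) above derives the effective address mask EM from the counted burst size. The following is a minimal sketch of one way such a mask could be computed and used. It assumes the burst length is a power of two and that the head address is aligned to it; neither constraint is stated in the description, and the function names are hypothetical.

```python
def effective_address_mask(burst_units: int) -> int:
    """Return the number of low-order address bits to ignore for a burst.

    Assumption (not from the description): burst lengths are powers of two.
    """
    if burst_units < 1 or burst_units & (burst_units - 1):
        raise ValueError("burst length must be a power of two")
    return burst_units.bit_length() - 1


def covers(ea: int, em: int, address: int) -> bool:
    """True if `address` falls in the block named by effective address EA and mask EM."""
    return (ea >> em) == (address >> em)
```

With this encoding, a single queue entry (EA, EM) is enough to mark every address touched by the burst as pending, which is what allows one update notification to stand for several data units.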

FIG. 22 is a diagram explaining the signal flow when update data sent from another processor is received, in the second aspect of the embodiment of the present invention.

The flow is described below. The number at the head of each line corresponds to the number attached to each signal in FIG. 22.

(1) The SERDES 22j of the data channel IF 22h demodulates the received serial data RSD and sends it to the framer 22i as received parallel data RPD.

(2) The framer 22i of the data channel IF 22h receives RPD, extracts and expands the packets in the data, sets the update data address UA and the update address mask UM, and transmits the update data reception signal UR. In step with this, UA is set in the queue clear address QCA of the queue register 22l. Simultaneously with the transmission of UR, the update data is set on UD in order, starting from the head data.

(3) The control logic 22c of the processor bus bridge 22b receives UR and sets the redirector function control signal FC. UA and UD are first stored in a buffer in the redirector; UA is then set on CA, and the head data of UD is set on CD. If other processing is in progress in the control logic 22c, this processing waits and is executed as soon as the other processing completes.

(4) The control logic 22c of the processor bus bridge 22b transmits the cache write signal CW, and the shared memory cache 22n, on receiving it, updates the data designated by CA with CD. Next, the following update data stored in the redirector's buffer is set on CD, the value of CA is incremented by one, and the same cache update processing is repeated, in accordance with the value set in UM, until no update data remains in the buffer. After that, the queue clear signal QC is transmitted, and the update queue 22k, on receiving it, clears the QCA set in (2) from the queue register 22l.
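The receive-side loop in steps (3) and (4) can be sketched as follows. This is an illustration only: the cache and the update queue are modeled as plain dictionaries, the function name is hypothetical, and the check that the buffer holds exactly 2^UM data units is an assumption about how UM relates to the burst length.

```python
def apply_burst(cache: dict, pending: dict, ua: int, um: int, ud: list) -> None:
    """Write a received burst into the cache, then clear the pending queue entry.

    cache   : address -> data (stands in for the shared memory cache)
    pending : address -> mask (stands in for the update queue register)
    """
    if len(ud) != 1 << um:  # assumed relationship between UM and burst length
        raise ValueError("burst length does not match the address mask")
    ca = ua                  # CA starts at the update data address UA
    for word in ud:          # CD takes each buffered word in turn
        cache[ca] = word     # cache write (CW)
        ca += 1              # CA is incremented by one per word
    pending.pop(ua, None)    # queue clear (QC) for the address set as QCA
```

The queue entry is cleared only after the last word has been written, so reads of any address in the burst remain blocked until the whole block is consistent.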

FIG. 23 is a diagram explaining the signal flow at the time of a write access in which the update notification is omitted, in the third aspect of the embodiment of the present invention.

The flow is described below. The number at the head of each line corresponds to the number attached to each signal in FIG. 23.

(1) The processor core 23a sets the data-only attribute in the processor transfer type PT and transmits the processor address PA, the processor data PD, and the processor write signal PW.

(2) The control logic 23c of the processor bus bridge 23b sets the redirector function control signal FC. The redirector 23d accordingly echoes the processor address PA to the effective address EA and the processor data PD to the effective data ED.

(3) The control logic 23c of the processor bus bridge 23b sets the data-only attribute signal DO and transmits the update data transmission signal US. After US is transmitted, the acknowledge signal ACK is transmitted to the processor core, and the access on the processor core side is complete.

(4) The framer 23i of the data channel IF 23h, having received the update data transmission signal US and the data-only attribute signal DO, queues the contents of EA/ED and the data-only attribute in the transmission buffer.

(5) The framer 23i of the data channel IF 23h continually assembles the data and attributes queued in the transmission buffer into packets and sends each completed packet to the SERDES 23j as transmission parallel data TPD. The SERDES receives this, modulates it into an electrical signal that can be carried on the data channel, and sends out the update data as transmission serial data TSD.

FIG. 24 is a diagram explaining the signal flow when update data sent from another processor without an update notification is received, in the third aspect of the embodiment of the present invention.

The flow is described below. The number at the head of each line corresponds to the number attached to each signal in FIG. 24.

(1) The SERDES 24j of the data channel IF 24h demodulates the received serial data RSD and sends it to the framer 24i as received parallel data RPD.

(2) The framer 24i of the data channel IF 24h receives RPD, extracts and expands the packets in the data, sets the update data address UA, the update data UD, and the data-only attribute DO, and transmits the update data reception signal UR.

(3) The control logic 24c of the processor bus bridge 24b receives the update data reception signal UR and the data-only attribute signal DO and sets the redirector function control signal FC. The redirector 24d accordingly echoes UA to the cache address CA and UD to the cache data CD. If other processing is in progress in the control logic 24c, this processing waits and is executed as soon as the other processing completes.

(4) The control logic 24c of the processor bus bridge 24b transmits the cache write signal CW, and the shared memory cache 24n, on receiving it, updates the data designated by CA with CD.
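Because a data-only transfer is not preceded by an update notification, the receiver has no update queue entry to clear for it. A minimal sketch of a receiver that handles both kinds of update data is shown below; the set-based queue and the function name are assumptions made for illustration.

```python
def on_update_data(cache: dict, update_queue: set, ua: int, ud, data_only: bool) -> None:
    """Apply received update data to the shared memory cache.

    For a normal update, the address was queued when the notification arrived
    and is cleared here. For a data-only update (DO set), nothing was queued,
    so the queue is left untouched.
    """
    cache[ua] = ud                  # cache write (CW) with CA=UA, CD=UD
    if not data_only:
        update_queue.discard(ua)    # queue clear (QC) only for notified updates
```

Leaving the queue untouched in the data-only case matters when a notified update to the same address is still pending: the pending entry must keep blocking reads until its own data arrives.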

FIG. 25 is a diagram explaining the signal flow when a processor added to the system issues a whole-data transmission request, in the cache fill operation of the second aspect of the embodiment of the present invention.

The flow is described below. The number at the head of each line corresponds to the number attached to each signal in FIG. 25.

(1) When the control logic 25c of the processor bus bridge 25b detects that its own processor has been added to the system, it transmits RS and IS simultaneously as the whole-data transmission request signal.

(2) The transmission unit 25f of the update notification bus bridge 25e receives RS and IS and transmits the bus request signal NR.

(3) The transmission unit 25f of the update notification bus bridge 25e receives the bus grant signal NG and acquires the update notification bus.

(4) The transmission unit 25f of the update notification bus bridge 25e transmits RV and NI simultaneously.

FIG. 26 is a diagram explaining the signal flow when the master processor performs whole-data transmission in response to a whole-data transmission request, in the cache fill operation of the fourth aspect of the embodiment of the present invention.

The flow is described below. The number at the head of each line corresponds to the number attached to each signal in FIG. 26.

(1) When the monitoring unit 26g of the update notification bus bridge 26e of the master processor receives NI together with RV, it transmits SR and SI simultaneously.

(2) When the control logic 26c of the processor bus bridge 26b receives SR and SI simultaneously, it interprets them as the whole-data transmission request signal and stores the head address of the shared memory space as both the transmission start address and the next transmission address.

(3) If another processor is added to the system and the control logic 26c of the master processor receives the whole-data transmission request signal again, the control logic 26c stores the previously stored next transmission address as the transmission start address.

(4) When the queue empty signal QE is valid and no other processing is requested, the control logic 26c sets the redirector function control signal FC, the redirector 26d sets the previously stored next transmission address on the cache address CA, and the control logic 26c transmits the cache read signal CR.

(5) The shared memory cache 26n receives CR and outputs the data in the cache designated by CA on the cache data CD.

(6) The redirector 26d of the processor bus bridge 26b also sets the previously set CA on the effective address EA and echoes CD to the effective data ED. The control logic 26c sets the data-only attribute DO and transmits the update data transmission signal US. The framer 26i of the data channel IF 26h, on receiving this, queues the contents of EA/ED and the data-only attribute in the transmission buffer.

(7) The control logic 26c of the processor bus bridge 26b stores the address following the transmitted address as the next transmission address. If the transmitted address has reached the last address of the shared memory space, the head address of the shared memory space is stored as the next transmission address. If the next transmission address matches the previously stored transmission start address, the whole-data transmission ends.

(8) Steps (3) to (7) are repeated, and the data is sent out sequentially.

(9) The framer 26i of the data channel IF 26h continually assembles the data queued in the transmission buffer into packets and sends each completed packet to the SERDES 26j as transmission parallel data TPD. The SERDES receives this, modulates it into an electrical signal that can be carried on the data channel, and sends out the data as transmission serial data TSD.
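The address bookkeeping in steps (2), (3), and (7) amounts to a circular sweep whose stopping point moves when a further request arrives mid-sweep. The sketch below illustrates that bookkeeping only; the class name is hypothetical, and resetting to the idle state after a completed sweep is an assumption about behavior the description leaves implicit.

```python
class WholeDataSender:
    """Tracks the transmission start address and next transmission address."""

    def __init__(self, first: int, last: int):
        self.first, self.last = first, last
        self.start = None   # transmission start address
        self.next = None    # next transmission address (None means idle)

    def on_request(self) -> None:
        if self.next is None:              # step (2): fresh request
            self.start = self.next = self.first
        else:                              # step (3): request during a sweep
            self.start = self.next

    def step(self):
        """Send one address, as in steps (4) to (7). Returns it, or None if idle."""
        if self.next is None:
            return None
        sent = self.next
        following = self.first if sent == self.last else sent + 1  # wrap around
        if following == self.start:        # sweep has come full circle
            self.start = self.next = None
        else:
            self.next = following
        return sent
```

Moving the start address forward on a second request means the sweep continues for one more full lap from that point, so the later arrival also receives every address without the master restarting from the head of the shared memory space.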

FIG. 27 is a diagram explaining the signal flow when a processor added to the system performs whole-data reception, in the cache fill operation of the fourth aspect of the embodiment of the present invention.

The flow is described below. The number at the head of each line corresponds to the number attached to each signal in FIG. 27.

(1) If the control logic 27c receives the processor read signal PR or the processor write signal PW during the whole-data reception operation, it holds the request pending. Even during the whole-data reception operation, queuing to and clearing from the update queue are performed according to the flows shown in FIGS. 16 and 17, respectively.

(2) The SERDES 27j of the data channel IF 27h demodulates the received serial data RSD and sends it to the framer 27i as received parallel data RPD.

(3) The framer 27i of the data channel IF 27h receives RPD, extracts and expands the packets in the data, sets the update data address UA, the update data UD, and the data-only attribute DO, and transmits the update data reception signal UR.

(4) The control logic 27c of the processor bus bridge 27b receives UR and sets the redirector function control signal FC. The redirector 27d accordingly echoes UA to the cache address CA and UD to the cache data CD. If other processing is in progress in the control logic 27c, this processing waits and is executed as soon as the other processing completes.

(5) The control logic 27c of the processor bus bridge 27b transmits the cache write signal CW. Because the data-only attribute DO has been received, the queue clear signal QC is not transmitted.

(6) The shared memory cache 27n, having received the cache write signal CW, updates the data designated by CA and CD and, if that data was in the unavailable state before the update, transmits the unavailable signal NP.

(7) The control logic 27c of the processor bus bridge 27b counts the number of times the unavailable signal NP is received during the whole-data reception operation. When it thereby recognizes that the entire area of the shared memory cache has been filled with valid data, it ends the whole-data reception operation.

(8) When the whole-data reception operation ends, any pending processor read signal PR or processor write signal PW is then processed.
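Steps (6) and (7) detect the end of the fill by counting how many writes landed on previously unavailable entries. A minimal sketch, assuming a cache with one valid flag per entry (the class and attribute names are hypothetical):

```python
class FillingCache:
    """Shared memory cache model that reports writes to unavailable entries."""

    def __init__(self, size: int):
        self.data = [None] * size
        self.valid = [False] * size   # every entry starts unavailable
        self.np_count = 0             # number of NP signals seen so far

    def write(self, ca: int, cd) -> bool:
        """Write CD at CA. Returns True if NP would be raised for this write."""
        was_unavailable = not self.valid[ca]
        self.data[ca] = cd
        self.valid[ca] = True
        if was_unavailable:
            self.np_count += 1
        return was_unavailable

    def filled(self) -> bool:
        """True once every entry has been written at least once."""
        return self.np_count == len(self.data)
```

Counting NP rather than counting received packets makes the test independent of where in the sweep the new processor joined and of any repeated writes to the same address.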

FIG. 28 is a diagram explaining the signal flow at the time of a write access based on the fifth aspect of the embodiment of the present invention.

The flow is described below.

(1) The processor core 28a sets PA, PD, and PT and transmits PW.

(2) The control logic 28c of the processor bus bridge 28b sets the redirector function control signal FC. The redirector 28d accordingly echoes the processor address PA to the effective address EA and the cache address CA, and the processor data PD to the effective data ED and the cache data CD.

(3) The control logic 28c of the processor bus bridge 28b transmits the update notification transmission signal NS. In addition, if PA lies within a prescribed address space, it also transmits the immediate update attribute transmission signal IS.

(4) The transmission unit 28f of the update notification bus bridge 28e receives NS and transmits NR.

(5) The transmission unit 28f of the update notification bus bridge 28e receives NG and acquires the update notification bus.

(6) EA is echoed to the update notification address NA, IS to the immediate update attribute signal NI, and ED to the immediate update data ND, and the update notification signal NV is transmitted to all processors. NA, ND, NV, and NI are also looped back to, and received by, the monitoring unit 28g of the update notification bus bridge of the own processor.

(7) When the monitoring unit 28g of the update notification bus bridge 28e receives NV together with NI, it echoes it into its own processor as the immediate update signal SI. The same operation is performed on the other processors.

(8) The control logic 28c of the processor bus bridge 28b sets the redirector function control signal FC. The redirector 28d accordingly echoes SA to CA and SD to CD. The same operation is performed on the other processors. If the processor bus bridge 28b is performing other processing at this time, this processing is given top priority once that processing completes.

(9) The control logic 28c of the processor bus bridge 28b transmits the cache write signal CW, and the shared memory cache 28n, on receiving it, updates the data designated by CA with CD. The same operation is performed on the other processors.

(10) ACK is transmitted to the processor core, and the access on the processor core side is complete.
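In this aspect the update data travels on the update notification bus together with the notification when the address lies in a prescribed range. The sketch below shows the sender-side choice and the receiver-side handling. The address range, the dictionary layout of a notification, and the function names are all assumptions made for illustration; the description specifies only the signals NA, NI, and ND.

```python
IMMEDIATE_SPACE = range(0x1000, 0x1100)  # hypothetical prescribed address space


def build_notification(pa: int, pd) -> dict:
    """Build what the sender places on the update notification bus."""
    note = {"NA": pa, "NI": pa in IMMEDIATE_SPACE}
    if note["NI"]:
        note["ND"] = pd              # data rides along with the notification
    return note


def on_notification(cache: dict, update_queue: set, note: dict) -> None:
    """Handle a notification on every processor, including the sender (loopback)."""
    if note["NI"]:
        cache[note["NA"]] = note["ND"]   # immediate update, no data channel wait
    else:
        update_queue.add(note["NA"])     # normal path: wait for the data channel
```

Since all processors see the same bus cycle, an immediate update reaches every cache at the same moment, at the cost of holding the arbitrated bus for the data as well as the address.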

The write access based on the sixth aspect of the embodiment of the present invention uses reserved data when writing to a specific address, and its flow largely follows that of the write access in the fifth aspect. The difference is as follows.

(8) If SA is interpreted as a specific address whose access uses reserved data, the redirector 28d of the processor bus bridge 28b ignores SD, generates the reserved data corresponding to SA, and outputs it on CD.
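The substitution in step (8) can be sketched as a table lookup. The table contents and the function name are hypothetical; the description says only that reserved data corresponding to SA is generated.

```python
# Hypothetical table: specific address -> reserved data written on any access to it.
RESERVED_DATA = {0x10: 0, 0x11: 1}


def cache_data_for(sa: int, sd, reserved: dict = RESERVED_DATA):
    """Return the value to drive on CD: reserved data if SA is a specific address, else SD."""
    return reserved.get(sa, sd)
```

Because every processor derives the value from the address alone, the notification does not need to carry data for these addresses.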

As described above, in a shared-memory multiprocessor system made up of processors each having a shared memory cache, applying the present invention clearly separates the time required to guarantee coherency from the time required for data transfer, and the problems of the conventional technology in accessing the shared memory space are resolved in the following respects.

·Minimization of bus occupancy time and elimination of factors that needlessly increase latency

·Hiding of the latency of the data transfer path, which makes it easier to expand bandwidth

This makes it possible to exploit the speed of the shared memory cache to the fullest, improves both the bandwidth and the latency of shared memory space access, and can contribute to improving the processing capability of the system.

Claims

A multiprocessor system in which a plurality of processors, each having a shared memory cache, and at least one shared memory are coupled to each other, the system comprising:

dedicated line means for exclusively transmitting and receiving, between the processors and the shared memory, the data to be used for updating data in a shared memory area; and

global bus means, the right to use which is arbitrated by an arbiter, for transmitting a data update notification to each processor,

wherein the transmission of the data update notification from a processor and the transmission of the data to be used for the update are performed independently; each processor and the shared memory, on receiving the update notification, restrict access to the address indicated by the update notification; and access to that address is permitted after the data at that address in the shared memory area has been updated with the data to be used for the update that has arrived at each processor and the shared memory.

The multiprocessor system of claim 1,

wherein the dedicated line means includes repeater means for connecting the lines from the processors to the shared memory.

The multiprocessor system of claim 2,

wherein the dedicated line means comprises a dedicated line provided for each of the plurality of processors.

The multiprocessor system of claim 1,

wherein a plurality of update data units are associated with a single update notification, so that the data is updated in units of a plurality of update data in a single update.

The multiprocessor system of claim 4,

wherein, in the update notification, the size of the data used for updating in a single update is variable.

The multiprocessor system of claim 1,

wherein data in the shared memory space that does not require cache coherency to be maintained is updated by transmitting the update data to the address of that data without transmitting the update notification.

The multiprocessor system of claim 1,

wherein, when a new processor is added to the multiprocessor system, the contents of the shared memory cache of another processor are transferred to the shared memory cache of the new processor, and the new processor is put into operation thereafter.

The multiprocessor system of claim 1,

further comprising means for transmitting both the update notification and the data to be used for the update over the global bus means to update the shared memory area.

The multiprocessor system of claim 1,

wherein, for access to a specific address in the shared memory area, only the update notification is transmitted and received, and the processor or shared memory that has received the update notification updates that address using predetermined data.

A method of accelerating memory access in a multiprocessor system in which a plurality of processors, each having a shared memory cache, and a shared memory are coupled to each other, the method comprising:

transmitting and receiving, when data in a shared memory area is updated, the data to be used for the update over a dedicated line used exclusively for transmitting and receiving data between the processors and the shared memory; and

transmitting and receiving a data update notification to each processor over a global bus, the right to use which is arbitrated by an arbiter,

wherein the transmission of the data update notification from a processor and the transmission of the data to be used for the update are performed independently; each processor and the shared memory, on receiving the update notification, restrict access to the address indicated by the update notification; and access to that address is permitted after the data at that address in the shared memory area has been updated with the data to be used for the update that has arrived at each processor and the shared memory.