KR20010080958A

KR20010080958A - Concurrent processing for event-based systems

Info

Publication number: KR20010080958A
Application number: KR1020017005796A
Authority: KR
Inventors: 홀름베르크퍼안데르스; 클링라르스-외르얀; 욘슨스텐에드바르트; 소호니밀린드; 티케카닉힐
Original assignee: 에를링 블로메, 타게 뢰브그렌; 텔레폰아크티에볼라게트 엘엠 에릭슨
Priority date: 1998-11-16
Filing date: 1999-11-12
Publication date: 2001-08-25
Also published as: AU1437300A; JP2002530737A; CA2350922A1; JP4489958B2; BR9915363B1; WO2000029942A1; BR9915363A; KR100401443B1; EP1131703A1; CA2350922C

Abstract

According to the invention, multiple shared-memory processors (11) are introduced at the highest level of a hierarchical distributed processing system (1), and utilization of the processors is maximized based on the concurrent event flows identified in the system. In a first aspect of the invention, so-called non-commuting categories (NCCs) of events are mapped to the multiple processors (11) for concurrent execution. In a second aspect of the invention, the processors (11) operate as a multiprocessor pipeline, in which each event arriving at the pipeline is processed in slices as a chain of internal events executed in different stages of the pipeline. A general processing structure is obtained by so-called matrix processing, in which the non-commuting categories are executed by different sets of processors and at least one processor set operates as a multiprocessor pipeline in which external events are processed in slices by different processor stages of the pipeline.

Description

Concurrent processing for event-based systems {CONCURRENT PROCESSING FOR EVENT-BASED SYSTEMS}

From a computing point of view, many event-based systems are organized as hierarchical distributed processing systems. In modern telecommunication and data communication networks, for example, each network node typically contains a hierarchy of processors that handle events from the network. In general, the processors in the hierarchy communicate by message passing; processors at the lower levels of the hierarchy perform simpler, low-level sub-tasks, while processors at the higher levels perform more complex, high-level tasks.

Such hierarchical architectures already exploit some of the inherent concurrency, but as the number of events to be processed per unit of time increases, the higher levels of the processor hierarchy become a bottleneck to further performance gains. For example, if the processor hierarchy is implemented as a "tree" structure, the processor at the highest level of the hierarchy becomes the first bottleneck.

Conventional approaches to alleviating this problem rely mainly on higher processor clock frequencies, faster memories, and instruction pipelining.

U.S. Patent No. 5,239,539, issued to Uchida et al., discloses a controller for controlling the switching network of an ATM exchange by distributing the load uniformly among a plurality of call processors. A main processor assigns originated call processing to the call processors either in the sequence of call origination or according to the channel identifier attached to each cell of a call. A switching state controller collects usage information about a plurality of buffers in the switching network, and the call processors perform call processing according to the contents of the switching state controller.

Japanese patent abstract JP 6276198 discloses a packet switch provided with a plurality of processor units, in which packet switching is performed with the units operating independently of one another.

Japanese patent abstract JP 4100449 A discloses an ATM communication system that distributes signaling cells between an ATM exchange and a signaling processor array (SPA) by STM-multiplexing ATM channels. The processing load is distributed by switching the signaling cells by means of STM according to an SPA number added to each virtual channel by a routing tag adder.

Japanese patent abstract JP 5274279 discloses a parallel processing device in the form of a hierarchical set of processors, in which groups of processor elements are in charge of parallel and pipeline processing.

The present invention relates generally to event-based processing systems, and more particularly to a hierarchical distributed processing system and a processing method in such a system.

FIG. 1 is a schematic diagram of a hierarchical distributed processing system having a high-level processor node according to the invention.

FIG. 2 is a schematic diagram of a processing system according to a first aspect of the invention.

FIG. 3 illustrates a particular implementation of a processing system according to the first aspect of the invention.

FIG. 4 is a schematic diagram of a simplified shared-memory multiprocessor with shared-memory software of object-oriented design.

FIG. 5A is a schematic diagram of a particularly advantageous processing system according to a second aspect of the invention.

FIG. 5B illustrates a multiprocessor pipeline according to the second aspect of the invention.

FIG. 6 illustrates the use of block/object locking to ensure data consistency.

FIG. 7 illustrates the use of variable marking to detect access collisions.

FIG. 8A illustrates a prior-art single-processor system from a layered perspective.

FIG. 8B illustrates a multiprocessor system from a layered perspective.

FIG. 9 is a schematic diagram of a communication system in which at least one processing system according to the invention is implemented.

It is an object of the present invention to increase the throughput of an event-based hierarchical distributed processing system. In particular, it is desirable to alleviate the bottleneck formed by the higher-level processor nodes in hierarchical systems.

Another object of the invention is to provide a processing system that can process events efficiently based on the event-flow concurrency identified in the system, and that preferably, but not necessarily, operates as a high-level processor node.

A further object of the invention is to provide a processing system that can exploit concurrency in the event flow while still allowing reuse of existing application software.

Yet another object of the invention is to provide a method for efficiently processing events in a hierarchical distributed processing system.

These and other objects are met by the invention as defined by the appended claims.

The general idea according to the invention is to introduce multiple shared-memory processors at the highest level of a hierarchical distributed processing system, and to maximize the utilization of the multiple processors based on the concurrent event flows identified in the system.

In a first aspect of the invention, the external event flow is divided into concurrent categories, called non-commuting categories, which are then mapped to the multiple processors for concurrent execution. A non-commuting category is, in general, a group of events in which the order of events must be preserved within the category, but for which there is no ordering requirement between categories. For example, a non-commuting category may be defined by the events generated by a predetermined source, such as a particular input port, regional processor, or hardware device connected to the system. Each non-commuting category of events is assigned to a predetermined set of one or more processors, and internal events generated by a processor set are fed back to the same processor set in order to preserve the non-commuting categories assigned to that set.

In a second aspect of the invention, the multiple processors operate as a multiprocessor pipeline with a number of processor stages, in which each external event arriving at the pipeline is processed in slices as a chain of internal events executed in different stages of the pipeline. In general, each pipeline stage is executed by one processor, but a given processor may execute more than one stage of the pipeline. A particularly advantageous way of realizing a multiprocessor pipeline is to allocate a cluster of software blocks/classes in the shared-memory software to each processor, where each event is targeted for a particular block, and then to distribute the events to the processors according to that allocation.
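
The cluster-based allocation described above can be sketched as a simple routing table. This is a minimal illustration, not the patented implementation: the processor names, the block names, and the `(target_block, payload)` event layout are all hypothetical.

```python
# Hypothetical allocation of software block clusters to pipeline processors.
CLUSTERS = {
    "P1": ["BlockA", "BlockB"],
    "P2": ["BlockC"],
    "P3": ["BlockD", "BlockE"],
}

def build_routing_table(clusters):
    """Invert the cluster allocation into a block -> processor lookup."""
    table = {}
    for processor, blocks in clusters.items():
        for block in blocks:
            if block in table:
                raise ValueError(f"block {block} allocated to two processors")
            table[block] = processor
    return table

def route(event, table):
    """Return the processor that should execute an event.

    An event is assumed to be a (target_block, payload) pair.
    """
    target_block, _payload = event
    return table[target_block]

table = build_routing_table(CLUSTERS)
# One external event processed as a chain of slices, each slice targeting
# a different block and therefore, here, a different pipeline stage.
chain = [("BlockA", "slice 1"), ("BlockC", "slice 2"), ("BlockD", "slice 3")]
stages = [route(event, table) for event in chain]
```

Because routing depends only on the target block, every event aimed at a given block is executed by the same processor.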

A general processing structure is obtained by so-called matrix processing, in which the non-commuting categories are executed by different sets of processors, and at least one processor set is in the form of an array of processors operating as a multiprocessor pipeline in which an external event is processed in slices in different processor stages of the pipeline.

In a shared-memory system, the entire application program and its data are accessible to all the shared-memory processors in the system. Data consistency must therefore be ensured when global data is manipulated by the processors.

According to the invention, data consistency can be ensured by locking the global data to be used by a software task executed in response to an event or, in the case of an object-oriented software design, by locking entire software blocks/objects. If processing an event requires resources from more than one block, the locking approach may lead to deadlocks, in which tasks lock each other out. Deadlocks are therefore either detected and resolved by rollback to ensure progress, or avoided altogether by seizing all the blocks required by a task before starting execution of the task.
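
The deadlock-avoidance option, seizing every required block before the task starts, can be sketched as follows. This is an illustration under stated assumptions: the block names and the `run_task` helper are hypothetical, and acquiring the locks in a fixed sorted order is a standard technique chosen here to make the seizure itself deadlock-free; the text above does not prescribe it.

```python
import threading

# Hypothetical software blocks, each guarded by its own lock.
block_locks = {name: threading.Lock() for name in ("A", "B", "C")}

def run_task(needed_blocks, work):
    """Seize all blocks a task needs, run it, then release the blocks.

    Locks are always taken in sorted name order (an assumption of this
    sketch), so two tasks can never hold each other's blocks while waiting.
    """
    ordered = sorted(set(needed_blocks))
    for name in ordered:
        block_locks[name].acquire()
    try:
        return work()
    finally:
        for name in reversed(ordered):
            block_locks[name].release()
```

Two tasks that name the same blocks in opposite order, for example `["A", "B"]` and `["B", "A"]`, both end up acquiring A before B, so neither can block the other indefinitely.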

Another approach to ensuring data consistency is based on parallel execution of tasks, in which access collisions between tasks are detected and an executing task for which a collision is detected is rolled back and restarted. Collisions are detected either based on variable usage markings or, alternatively, based on address comparison, in which read and write addresses are compared.
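
As a rough illustration of the address-comparison variant, the sketch below assumes that each task records the set of addresses it has read and the set it has written. The function name and the set-based representation are hypothetical; the rollback and restart step is not shown.

```python
def detect_collision(reads_a, writes_a, reads_b, writes_b):
    """Compare the read and write addresses of two concurrently executed tasks.

    A collision is reported when both tasks touched the same address and at
    least one of the two accesses was a write. Two reads never collide.
    """
    if writes_a & (reads_b | writes_b):
        return True
    if writes_b & reads_a:
        return True
    return False
```

For example, a task that writes address 3 collides with a task that reads address 3, whereas two tasks that only read address 1 do not collide.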

By marking larger areas instead of individual data items, a more coarse-grained collision check is obtained.

The solution according to the invention substantially increases the throughput capacity of the processing system, and for hierarchical processing systems the high-level bottleneck is effectively alleviated.

By using shared-memory multiprocessors and providing appropriate means for ensuring data consistency, application software that already exists for single-processor systems can be reused. In many cases, millions of lines of code are already available for a single-processor system, such as the single-processor node at the highest level of a hierarchical processing system. If the multiple processors are implemented using standard off-the-shelf multiprocessors, all existing application software can be reused by automatically transforming the application software and possibly modifying the virtual machine/operating system of the system to support multiple processors. If, on the other hand, the multiple processors are implemented as specialized hardware of proprietary design, the application software can be migrated directly to the multiprocessor environment. Either way, this saves valuable time and reduces programming costs compared with designing the application software from scratch.

The invention offers the following advantages:

- increased throughput capacity;

- alleviated bottlenecks;

- reuse of existing application software, especially in the case of object-oriented designs.

Other advantages offered by the invention will be appreciated upon reading the following detailed description of embodiments of the invention.

The invention, together with its objects and advantages, is best understood by reference to the following detailed description taken in conjunction with the accompanying drawings.

Throughout the drawings, the same reference characters are used for corresponding or similar elements.

FIG. 1 is a schematic diagram of a hierarchical distributed processing system having a high-level processor node according to the invention. The hierarchical distributed processing system (1) has a conventional tree structure with a number of processor nodes distributed over a number of levels of the system hierarchy. Hierarchical processing systems can be found, for example, in telecommunication nodes and routers. Naturally, the higher-level processor nodes, and the top node in particular, become bottlenecks as the number of events to be processed by the processing system increases.

An efficient way of alleviating such bottlenecks according to the invention is to use multiple shared-memory processors (11) at the highest level of the hierarchy. In FIG. 1, the multiple processors are shown implemented in the top node (10). The multiple shared-memory processors (11) are preferably implemented as a multiprocessor system based on standard microprocessors. All the processors (11) share a common memory, the so-called shared memory (12). In general, external asynchronous events bound for the high-level processor node (10) first arrive at an input/output (I/O) unit (13), from which they are forwarded to a mapper or distributor (14). The mapper (14) maps/distributes the events to the processors (11) for processing.

Based on the event-flow concurrency identified in the hierarchical processing system (1), the flow of external events to the processor node (10) is divided into a number of concurrent categories, hereinafter referred to as non-commuting categories (NCCs) of events. The mapper (14) ensures that each NCC is assigned to a predetermined set of one or more of the processors (11), thereby enabling concurrent processing and optimized utilization of the multiple processors. The mapper (14) may be implemented in one or more of the processors (11), which are then preferably dedicated to the mapper.

A non-commuting category is a group of events in which the order of events must be preserved within the category, but for which there is no ordering requirement when processing events from different categories. A general requirement for systems in which the information flow is governed by protocols is that certain related events must be processed in the order in which they are received. This is an invariant of the system, no matter how the system is implemented. Identifying proper NCCs and processing them concurrently ensures that the ordering requirements imposed by the given system protocols are met, while at the same time the inherent concurrency in the event flow is exploited.
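
The ordering requirement can be stated as a small check: a processing order is acceptable if, within every category, the events appear in the same order as on arrival, regardless of how the categories are interleaved. The sketch below assumes events are `(category, payload)` pairs; the helper names are hypothetical.

```python
from collections import defaultdict

def per_category(events):
    """Group (category, payload) events by category, keeping arrival order."""
    groups = defaultdict(list)
    for category, payload in events:
        groups[category].append(payload)
    return dict(groups)

def preserves_ncc_order(arrival, processed):
    """True if every category sees its events in the original order."""
    return per_category(arrival) == per_category(processed)

arrival = [("S1", "a1"), ("S2", "b1"), ("S1", "a2"), ("S2", "b2")]
# Categories reordered relative to each other, order inside each one kept.
acceptable = [("S2", "b1"), ("S2", "b2"), ("S1", "a1"), ("S1", "a2")]
# Order inside category S1 violated.
unacceptable = [("S1", "a2"), ("S1", "a1"), ("S2", "b1"), ("S2", "b2")]
```

Here `preserves_ncc_order(arrival, acceptable)` holds, while `preserves_ncc_order(arrival, unacceptable)` does not.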

If external events can be processed or executed in "slices" as a chain of events, alternative or additional concurrency can be obtained by operating one or more of the processor sets as a multiprocessor pipeline. Each external event arriving at a multiprocessor pipeline is then processed in slices that are executed in different processor stages of the pipeline.

Consequently, a general processing structure is obtained by so-called matrix processing, in which the NCCs are executed by different sets of processors and at least one processor set operates as a multiprocessor pipeline. Note that some of the elements of the logical "matrix" of processors shown in FIG. 1 may be empty. Reducing the logical matrix of processors shown in FIG. 1 to a single row of processors gives pure NCC processing, and reducing the matrix to a single column of processors gives pure event-level pipeline processing.

Computation in an event-based system is generally modeled as a state machine, in which an input event from the external world changes the state of the system and may result in an output event. If each non-commuting category/pipeline stage could be handled by independent/disjoint state machines, there would be no sharing of data between the various state machines. However, given that there are global resources, represented by global states or variables, operations on a given global state normally have to be "atomic", with only one processor, executing part of the system state machine, accessing the given global state at a time. With NCC/pipeline-based execution, there is no need for so-called sequence-dependency checks.

For a better understanding, consider the following example. Assume that a given set of global variables is responsible for allocating resources, such as free channels to another communication node. For two asynchronous jobs in different NCCs, the order in which they request a free channel does not matter: the first request obtains the first channel that meets the selection criteria, and the second request obtains the next available channel that meets the criteria. The important point is that while the channel selection for one job is in progress, the other job must not interfere. Accesses to the global variables responsible for the channel allocation need to be "atomic", although in special cases it is possible to parallelize the channel search as well.

Another example involves two jobs in different NCCs that need to increment a counter. It does not matter which job increments the counter first, but while one of the jobs is operating on the counter variable to increment it (reading its current value and adding one to it), the other job must not interfere.
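
The counter example can be reproduced with two threads standing in for the two jobs. In this minimal sketch the lock makes the read-then-add sequence atomic; the class and function names are hypothetical.

```python
import threading

class SharedCounter:
    """A global counter whose increment is made atomic with a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read the current value and add one without interference.
        with self._lock:
            current = self.value
            self.value = current + 1

def run_jobs(num_jobs=2, increments_per_job=10000):
    """Run several jobs that all increment the same counter."""
    counter = SharedCounter()

    def job():
        for _ in range(increments_per_job):
            counter.increment()

    threads = [threading.Thread(target=job) for _ in range(num_jobs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

Which job increments first is irrelevant; the final value depends only on the total number of increments.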

In a shared-memory system, the entire application program space and data in the shared memory (12) are accessible to all the processors. Consequently, data consistency needs to be ensured whenever the processors manipulate global variables that are common to all, or at least to more than one, of the processors. This is accomplished by the data consistency means schematically indicated by reference numeral 15 in FIG. 1.

In the following, NCC processing as a first aspect of the invention, event-level pipeline processing as a second aspect of the invention, and procedures and means for ensuring data consistency are described.

NCC processing

FIG. 2 is a schematic diagram of an event-driven processing system according to the first aspect of the invention. The processing system comprises a number of shared-memory processors (P1 to P4), a shared memory (12), an I/O unit (13), a distributor (14), data consistency means (15), and a number of independent parallel event queues (16).

The I/O unit (13) receives incoming external events and outputs outgoing events. The distributor (14) divides the incoming events into non-commuting categories (NCCs) and distributes each NCC to a predetermined one of the independent event queues (16). Each event queue is connected to a respective processor, and each processor sequentially fetches or receives events from its associated event queue for processing. If the events have different priority levels, this has to be taken into account so that the processors process events in order of priority.
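
The distributor and the per-processor event queues can be sketched with standard-library queues and threads. This is an illustration only: the modulo mapping from source to queue is a hypothetical rule, threads stand in for processors, and priority levels are not modeled.

```python
import queue
import threading

NUM_PROCESSORS = 4  # P1 to P4 in FIG. 2

def distribute(events, num_processors=NUM_PROCESSORS):
    """Place each (source_id, payload) event on one processor's queue."""
    queues = [queue.Queue() for _ in range(num_processors)]
    for source_id, payload in events:
        # Hypothetical mapping rule: a fixed function of the source, so all
        # events of one NCC always land on the same queue, in arrival order.
        queues[source_id % num_processors].put((source_id, payload))
    return queues

def run_processor(event_queue, log):
    """Fetch events sequentially from one queue until it is empty."""
    while True:
        try:
            event = event_queue.get_nowait()
        except queue.Empty:
            return
        log.append(event)  # stand-in for executing the corresponding task

def process(events):
    """Distribute the events, then let every processor drain its own queue."""
    queues = distribute(events)
    logs = [[] for _ in queues]
    threads = [threading.Thread(target=run_processor, args=(q, log))
               for q, log in zip(queues, logs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return logs
```

Each log lists the events one processor handled; since a source always maps to the same queue, the per-source order in a log matches the arrival order.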

As an example, consider a hierarchical processing system with a central high-level processor node and a number of lower-level processors, so-called regional processors, each of which in turn serves a number of hardware devices. In such a system, the events originating from a hardware device, and the events coming from a regional processor serving a group of devices, satisfy the ordering requirements imposed by the given protocols (barring error conditions, which are guarded against by processing at a higher level). Events from a particular device/regional processor therefore form a non-commuting category. To preserve a non-commuting category, each device/regional processor must always feed its events to the same processor.

In telecommunication applications, for example, a sequence of digits received from a user, or a sequence of ISDN user part messages received for a trunk device, must be processed in the order received, whereas sequences of messages received for two independent trunk devices may be processed in any order, as long as the sequencing for each individual trunk device is preserved.

In FIG. 2, events from a predetermined source (S1), such as a particular hardware device or input port, are mapped to a predetermined processor (P1), and events from another predetermined source (S2), such as a particular regional processor, are mapped to another processor (P3). Since the number of sources normally far exceeds the number of shared-memory processors, each processor is usually assigned a number of sources. In a typical telecommunication/data communication application, there may be 1024 regional processors communicating with a single central processor node. Mapping the regional processors to the multiple shared-memory processors in the central node in a load-balanced way means that each shared-memory processor gets roughly 256 regional processors (assuming four processors in the central node and that all regional processors generate the same load). In practice, however, it may be beneficial to use an even finer granularity and map hardware devices, such as signaling devices and subscriber terminations, to the central node processors. This generally makes it easier to achieve load balance. Each regional processor in a telecommunication network may control hundreds of hardware devices. Instead of mapping 10,000 or more hardware devices to a single processor, which of course handles the load in a time-shared manner, the solution according to the invention maps the hardware devices to a number of shared-memory processors in the central node, thereby alleviating the bottleneck in the central node.
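
The arithmetic behind the figure of roughly 256 regional processors per shared-memory processor can be checked with a short sketch. The round-robin rule used here is an assumption for illustration; it balances the load only if every source generates the same load, as the text presupposes.

```python
from collections import Counter

def assign_sources(num_sources, num_processors):
    """Map each source index to a processor index, round-robin."""
    return {source: source % num_processors for source in range(num_sources)}

def sources_per_processor(assignment):
    """Count how many sources each processor has been assigned."""
    return Counter(assignment.values())

# 1024 regional processors spread over 4 shared-memory processors.
regional = sources_per_processor(assign_sources(1024, 4))
# Finer granularity: 10,000 hardware devices over the same 4 processors.
devices = sources_per_processor(assign_sources(10000, 4))
```

With these inputs each processor is assigned 256 regional processors, or 2,500 hardware devices at the finer granularity.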

Systems such as the AXE Digital Switching System of Telefonaktiebolaget LM Ericsson, which process an external event in slices connected by processor-to-processor (CP-to-CP) signals, so-called internal events, impose their own sequencing requirements in addition to those imposed by the protocols. The CP-to-CP signals for an NCC must be processed in the order in which they are generated (unless superseded by a higher-priority signal generated by the last slice under execution). This additional sequencing requirement is met if each CP-to-CP signal (internal event) is processed on the same processor on which it was generated, as indicated in FIG. 2 by the dashed lines from the processors to the event queues. Internal events are thus kept within the same NCC by feeding them back to the same processor, or set of processors, that generated them, which guarantees that they are processed in the same order in which they were generated.
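
The feedback of internal events can be illustrated with a single processor and its queue. In this sketch, which ignores priorities, each slice of an external event generates the next slice as an internal event that is appended to the same queue; the event names and the fixed number of slices are hypothetical.

```python
from collections import deque

def run_processor(external_events, slices_per_event):
    """Process events on one processor, feeding internal events back to it.

    Returns the execution trace as (event_name, slice_number) pairs.
    """
    event_queue = deque((name, 0) for name in external_events)
    trace = []
    while event_queue:
        name, slice_no = event_queue.popleft()
        trace.append((name, slice_no))  # stand-in for executing the slice
        if slice_no + 1 < slices_per_event:
            # Internal event: goes back to the queue of the same processor.
            event_queue.append((name, slice_no + 1))
    return trace

trace = run_processor(["A", "B"], slices_per_event=3)
```

Slices of different external events may interleave in the trace, but the slices of any one event are executed in the order in which they were generated.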

Usually, the representation of an event as seen by the processing system is a signal message. In general, each signal message has a header and a signal body. The signal body includes the information necessary for executing a software task. For example, the signal body includes, explicitly or implicitly, a pointer to software code/data in the shared memory as well as the required input operands. In this sense, the event signals are self-contained and completely define the corresponding task. Consequently, the processors P1 to P4 independently fetch and process events to execute the corresponding software tasks in parallel. A software task is also referred to as a job, and the terms task and job are used interchangeably throughout this disclosure. During parallel task execution, the processors need to manipulate global data in the shared memory. To avoid data inconsistencies, where several processors access and manipulate the same global data (during the lifetime of a job), the data consistency means 15 must make sure that data consistency is maintained at all times. The invention makes use of two basic procedures for assuring data consistency when global data is manipulated by the processors during parallel task execution:

· Locking: Each processor normally comprises means, forming part of the data consistency means 15, for locking the global data to be used by the corresponding task before starting execution of the task. In this way, only the processor that has locked the global data can access it. The locked data is preferably released at the end of the task's execution. This approach means that if global data is locked by one processor and another processor wants to access the same data, the other processor has to wait until the locked data is released. In general, locking implies waiting times (waiting/stalling on a locked global state), which limits the amount of parallel processing to some extent (concurrent operations on different global states are of course allowed).

· Collision detection and rollback: Software tasks are executed in parallel, and access collisions are detected so that one or more of the executed tasks for which a collision is detected can be rolled back and restarted. Collision detection is generally accomplished by a marker method or an address comparison method. In the marker method, each processor comprises means for marking the use of variables in the shared memory, and variable access collisions are then detected based on the markings. Collision detection generally carries a penalty due to rollbacks (resulting in wasted processing).

Which approach to choose depends on the application and has to be decided case by case. As a simple rule of thumb, locking-based data consistency is more suitable for data systems, whereas collision detection is more beneficial for telecommunication and data communication systems. In some applications, it may even be advantageous to use a combination of locking and collision detection.

Locking and collision detection as means for assuring data consistency are described in more detail below.

FIG. 3 illustrates a specific realization of a processing system according to the first aspect of the invention. In this realization, the processors P1 to P4 are symmetric multiprocessors (SMPs), each having its own local cache C1 to C4, and the event queues are allocated in the shared memory 12 as dedicated memory lists, preferably linked lists, EQ1 to EQ4.

As mentioned above, each event signal generally has a header and a signal body. In this case, the header includes an (implicit or explicit) NCC tag that indicates the NCC to which the corresponding event belongs. The distributor 14 distributes incoming events to one of the event queues EQ1 to EQ4 based on the NCC tag included in the event signal. For example, the NCC tag may be a representation of the source from which the event originates, such as an input port, a regional processor or a hardware device. Assume that an event received by the I/O unit 13 comes from a particular hardware device and that this is indicated in the tag included in the event signal. The distributor 14 then evaluates the tag of the event and, based on a pre-stored event-dispatch table or equivalent, distributes the event to a specified one of the event queues EQ1 to EQ4 allocated in the shared memory. Each of the processors P1 to P4 fetches events from its own dedicated event queue in the shared memory 12 via its local cache, and processes and completes the events in sequence. The event-dispatch table can be modified from time to time to adjust for long-term imbalances in the traffic sources.
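The tag-based distribution described above can be sketched roughly as follows. The table contents, the dictionary-based event format and the function names are all assumptions made for illustration; the description only requires that some pre-stored table map an NCC tag to one event queue.

```python
from collections import deque

# Hypothetical event-dispatch table: NCC tag (here a source name) -> queue index.
DISPATCH_TABLE = {"S1": 0, "S2": 2, "S3": 1, "S4": 3}

def make_queues(n=4):
    """Create one FIFO event queue per processor (EQ1 to EQ4 in the text)."""
    return [deque() for _ in range(n)]

def distribute(event, queues, table=DISPATCH_TABLE):
    """Append the event to the queue selected by its NCC tag."""
    queue_index = table[event["ncc_tag"]]
    queues[queue_index].append(event)
    return queue_index

queues = make_queues()
for seq, tag in enumerate(["S1", "S2", "S1", "S3"]):
    distribute({"ncc_tag": tag, "seq": seq}, queues)

# Events with the same tag land in one queue, in arrival order.
s1_order = [e["seq"] for e in queues[0]]
print(s1_order)
```

Because each queue is served by a single processor in FIFO order, events belonging to one NCC keep their relative order, which is the property the mapping is meant to preserve.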

Of course, the invention is not limited to symmetric multiprocessors with local caches. Other examples of shared-memory systems include shared memory without caches, shared memory with a common cache, and shared memory with mixed caches.

Example of object-oriented design

FIG. 4 is a schematic diagram of a simplified shared-memory multiprocessor system with object-oriented shared-memory software. The software in the shared memory 12 has an object-oriented design and is organized as a set of blocks B1 to Bn, or classes. Each block/object is responsible for executing a certain function. Typically, each block/object is split into two main sectors: a program sector, where the code is stored, and a data sector, where the data is stored. The code in the program sector of a block can access and operate only on data belonging to the same block. The data sector, in turn, is preferably divided into two sectors as well: a first sector of "global" data comprising a number of global variables GV1 to GVn, and a second sector of "private" data such as records R1 to Rn. Each record typically comprises a number of record variables RV1 to RVn, as illustrated for record Rx. Each transaction is typically associated with one record in a block, whereas the global data in a block is shared by several transactions.

In general, a signal entering a block initiates processing of data within that block. On receiving an event, external or internal, each processor executes code in the block indicated by the event signal and operates on the global variables and record variables of that block, thus executing a software task. The execution of a software task is indicated in FIG. 4 by a wavy line in each of the processors P1 to P4.

In the example of FIG. 4, the first processor P1 executes code in software block B88. A number of instructions, of which only instructions I20 to I23 are shown, are executed, and each instruction operates on one or more variables within the block. For example, instruction I20 operates on record variable RV28 in record R1, instruction I21 operates on record variable RV59 in record R5, instruction I22 operates on global variable GV43, and instruction I23 operates on global variable GV67. Similarly, processor P2 executes code and operates on variables in block B1, processor P3 executes code and operates on variables in block B8, and processor P4 executes code and operates on variables in block B99.

An example of block-oriented software is the PLEX (Programming Language for Exchanges) software of Telefonaktiebolaget LM Ericsson, in which the entire software is organized in blocks. Java applications are examples of truly object-oriented designs.

Event-level pipelining

As mentioned above, some systems process external events in "slices" connected by internal events (for example, CP-to-CP buffered signals).

According to a second aspect of the invention, concurrent execution is achieved by operating at least one set of the multiple shared-memory processors as a multiprocessor pipeline, in which each external event is processed in slices as a chain of events executed in different processor stages of the pipeline. The sequencing requirement of processing signals in their order of generation is guaranteed as long as all signals generated by one stage are delivered to the subsequent stage in the same order in which they were generated. Any deviation from this rule must still guarantee racing-free execution. If execution of a given slice results in more than one signal, these signals either have to be delivered to the subsequent processor stage in the order in which they were generated or, if the signals are distributed to two or more processors, it must be ensured that the resulting possibility of racing is harmless to the computation.

An implementation of a particular multiprocessor pipeline according to the second aspect of the invention will now be described with reference to FIGS. 5A-B.

FIG. 5A is a schematic diagram of an event-driven processing system according to the second aspect of the invention. The processing system is similar to that shown in FIG. 2. However, internal events generated by a processor that is part of the multiprocessor pipeline 11 are not necessarily fed back to the same processor; they can be delivered to any of the processors, as indicated by the dashed lines originating from the processors P1 to P4 and terminating on the bus to the event queues 16.

In an object-oriented software design, the software in the shared memory is organized into blocks or classes as described above in connection with FIG. 4. On receiving an external event, the corresponding processor executes code in a block/object and may generate a result in the form of an internal event directed to another block/object. When this internal event comes up for execution, it is executed in the indicated block/object and may in turn generate a further internal event directed to yet another block/object. The chain usually dies out after a few internal events. In telecommunication applications, for example, each external event may typically spawn 5 to 10 internal events.

A realization of a multiprocessor pipeline customized for object-oriented software design is to allocate clusters of software blocks/classes to the processors. In FIG. 2, the clusters CL1 to CLn of blocks/classes in the shared memory 12 are indicated schematically by dashed lines. As can be seen in FIG. 2, one of the clusters, CL1, is allocated to processor P2 (as indicated by the solid line interconnecting CL1 and P2), and another cluster, CL2, is allocated to processor P4 (as indicated by the dashed line interconnecting CL2 and P4). In this way, each cluster of blocks/classes in the shared memory 12 is allocated to a specified one of the processors P1 to P4, and the allocation scheme is implemented in a look-up table 17 in the distributor 14 and in a look-up table 18 in the shared memory 12. Each of the look-up tables 17 and 18 links a target block to each event based on, for example, the event ID, and links each target block to a specified cluster of blocks. The distributor 14 distributes external events to the processors according to the information in the look-up table 17. The look-up table 18 in the shared memory 12 is usable by all of the processors P1 to P4, so that internal events can also be distributed to the processors. In other words, when a processor generates an internal event, it consults the look-up table 18 to determine i) the corresponding target block, based on, for example, the event ID, ii) the cluster to which the identified target block belongs, and iii) the processor to which the identified cluster is allocated, and then sends the internal event signal to the appropriate event queue. It is important to note that each block normally belongs to one and only one cluster, although an allocation scheme with overlapping clusters could be implemented in a more elaborate way by using information such as the execution state in addition to the event ID.
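The three-step look-up performed for an internal event can be sketched as follows. Only the assignments CL1 to P2 and CL2 to P4 come from the description; the event IDs, the block-to-cluster assignments and the third cluster are made up for illustration, and `route` is a hypothetical helper name.

```python
# Hypothetical contents of a look-up table such as table 18.
EVENT_TO_BLOCK = {"ev_setup": "A", "ev_route": "B", "ev_charge": "C"}
BLOCK_TO_CLUSTER = {"A": "CL3", "B": "CL1", "C": "CL2"}
# CL1 -> P2 and CL2 -> P4 follow the text; CL3 -> P1 is an assumption.
CLUSTER_TO_PROCESSOR = {"CL1": "P2", "CL2": "P4", "CL3": "P1"}

def route(event_id):
    """Resolve i) target block, ii) its cluster, iii) the owning processor."""
    block = EVENT_TO_BLOCK[event_id]
    cluster = BLOCK_TO_CLUSTER[block]
    processor = CLUSTER_TO_PROCESSOR[cluster]
    return block, cluster, processor

print(route("ev_route"))
```

Because each block maps to exactly one cluster in this sketch, the result depends only on the event ID; an overlapping-cluster scheme would need extra inputs such as the execution state.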

As illustrated in FIG. 5B, mapping the clusters of blocks/classes to the processors automatically results in pipelined execution: an external event EE is directed to block A, which is allocated to processor P1; an internal event IE generated by that block is directed to block B, which is allocated to processor P2; an internal event IE generated by block B is directed to block C, which is allocated to processor P4; and an internal event IE generated by block C is directed to block D, which is allocated to processor P1. Logically, we thus have a pipeline with a number of processor stages. It is assumed here that blocks A and D are part of a cluster mapped to processor P1, whereas block B is part of a cluster mapped to processor P2 and block C is part of a cluster mapped to processor P4. Each stage in the pipeline is executed in one processor, but a given processor may execute more than one stage of the pipeline.

A variation is to map events that require input data from a specified data area in the shared memory 12 to one and the same specified set of processors.

It should be understood that when a processor stage in the multiprocessor pipeline has executed an event belonging to a first chain of events and has sent the resulting internal event signal to the next processor stage, it is normally free to start processing an event from the next chain of events, thereby improving throughput.

For maximum gain, the mapping of pipeline stages to processors should be such that all processors are equally loaded. The clusters of blocks/classes are therefore partitioned according to an "equal load" criterion. The amount of time spent in each cluster may be known, for example, from a similar application running on a single processor, or may be monitored at run time to enable readjustment of the partitioning. If a block generates more than one internal event in response to an input event, and each generated event is directed to a different block, a "no racing" criterion is needed in addition to the "equal load" criterion, to prevent an internal event generated later from being executed earlier than an event generated before it.
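One way to approximate the "equal load" criterion is sketched below. The description does not prescribe a partitioning algorithm; the greedy heuristic used here (most expensive cluster first, onto the currently least-loaded processor) is a swapped-in, well-known technique, and the per-cluster times are invented. The sketch ignores the "no racing" criterion entirely.

```python
import heapq

def balance_clusters(cluster_times, num_processors):
    """Greedy partitioning: place each cluster, most expensive first,
    on the processor with the smallest accumulated load so far."""
    heap = [(0.0, p) for p in range(num_processors)]  # (load, processor)
    heapq.heapify(heap)
    assignment = {}
    for name, t in sorted(cluster_times.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(heap)
        assignment[name] = p
        heapq.heappush(heap, (load + t, p))
    loads = [0.0] * num_processors
    for name, p in assignment.items():
        loads[p] += cluster_times[name]
    return assignment, loads

# Hypothetical measured time per cluster (arbitrary units).
times = {"CL1": 40, "CL2": 30, "CL3": 20, "CL4": 10}
assignment, loads = balance_clusters(times, 2)
print(assignment, loads)
```

Re-running the partitioning with times monitored at run time would correspond to the readjustment mentioned above.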

It is of course possible to process external events without dividing them into slices, but the division enables structured program development/maintenance as well as pipelined processing.

Furthermore, the processing of such an external event may be performed in a few large slices or in many small slices.

As mentioned above, there are two basic procedures for assuring data consistency when global data is manipulated by the processors during parallel task execution: i) locking, and ii) collision detection and rollback.

Locking as a means of assuring data consistency

When locking is implemented for the purpose of assuring data consistency, each processor that is to execute a task generally locks the global data to be used by the task before starting execution. In this way, only the processor that has locked the global data can access it.

Locking is well suited to object-oriented designs, because the data areas are clearly defined, so that a specific data sector of a block, or an entire block, can be locked. In the absence of a general characterization of the global data, in the sense that it is normally not possible to know which parts of the global data in a block will be modified by a given execution sequence or task, locking the entire global data sector is a safe way of assuring data consistency. Ideally, protecting the global data in each block would be sufficient, but in many applications there are also so-called "across record" operations that need protection. For example, an operation that selects a free record actually examines a number of records in order to find one. Locking the entire block therefore protects everything. Furthermore, in applications where the execution of a buffered signal may span several blocks connected by so-called direct/combined signals, with the possibility of loops (visiting a block more than once before EXIT), the locked blocks must not be released until the end of the task's execution.

The use of NCCs generally minimizes the "shared state" between the multiple processors and also improves cache hit rates. In particular, by mapping functionally different regional processors/hardware devices, such as signaling devices and subscriber terminals in a telecommunication system, onto different processors in the central node, different access mechanisms can be processed concurrently with little or no waiting on locked blocks, since different access mechanisms are normally processed in different blocks until the processing reaches the late stages of execution.

FIG. 6 illustrates the use of block/object locking to assure data consistency. Consider three different external events EEx, EEy and EEz directed to blocks B1, B2 and B1, respectively. The external event EEx enters block B1, and the corresponding processor locks block B1 before starting execution in the block, as indicated by the diagonal line across block B1. Next, the external event EEy enters block B2, and the corresponding processor locks block B2. As indicated by the time axis t in FIG. 6, the external event EEz, which is directed to block B1, arrives after the external event EEx, which has already entered block B1 and locked it. Processing of the external event EEz therefore has to wait until block B1 is released.
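The sequence of FIG. 6 can be mimicked with one lock per block. This is only a single-threaded sketch: it uses non-blocking acquisition from Python's standard `threading.Lock` to show which attempts would succeed, whereas a real processor would wait on the locked block. The helper names `try_enter` and `leave` are hypothetical.

```python
import threading

# One lock per block; only the blocks used in the example are listed.
block_locks = {"B1": threading.Lock(), "B2": threading.Lock()}

def try_enter(event, block):
    """Try to lock the block on behalf of the event; True if the lock was taken."""
    return block_locks[block].acquire(blocking=False)

def leave(block):
    """Release the block at the end of the task's execution."""
    block_locks[block].release()

log = []
log.append(("EEx", "B1", try_enter("EEx", "B1")))  # EEx locks B1
log.append(("EEy", "B2", try_enter("EEy", "B2")))  # EEy locks B2
log.append(("EEz", "B1", try_enter("EEz", "B1")))  # B1 is held: EEz must wait
leave("B1")                                         # task for EEx completes
log.append(("EEz", "B1", try_enter("EEz", "B1")))  # EEz can now lock B1
print(log)
```

EEy is unaffected by the lock on B1, which illustrates that concurrent operation on different blocks is still allowed.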

Locking may, however, give rise to deadlocks, in which two processors wait indefinitely for each other to release variables that each requires in its current task execution. It is therefore desirable either to avoid deadlocks or to detect them and perform a rollback that guarantees progress.

Deadlocks can be avoided by seizing all the blocks required for executing an entire task (also called a job) at the start of the job, as opposed to seizing/locking blocks as they are needed during execution. However, it may not always be possible to know in advance all the blocks required for a given job, although non-run-time inputs obtained by compiler analysis may provide information for minimizing deadlocks, for example by seizing at least those blocks that consume the larger part of the processing time in a job. An effective way of minimizing deadlocks is to seize such heavily used blocks before starting execution, regardless of whether they are the next blocks needed in the processing. It is always a good idea to seize the blocks that are almost certainly needed by a job, especially those with heavy usage, and to seize the remaining blocks as they are needed.

Seizing blocks as they are needed during execution is prone to deadlocks, as explained above, so deadlocks need to be detected and resolved. It is advantageous to detect a deadlock as early as possible, and according to the invention deadlock detection can be made almost immediate. Since all "overhead processing" takes place between two jobs, a deadlock becomes evident while acquiring "resources" for the later job that would cause it. Detection is accomplished by checking whether one of the resources required by the job under consideration is held by some processor, and then verifying, for example by means of a flag per block, whether that processor is waiting on a resource held by the processor with the job under consideration.
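The two-step check described above can be sketched as follows. The dictionaries stand in for the per-block flags mentioned in the text; their layout and the function name `would_deadlock` are assumptions for illustration, and the sketch only covers the two-processor case described here.

```python
def would_deadlock(requester, wanted_block, holder_of, waiting_for):
    """Return True if waiting for wanted_block would close a two-processor cycle.

    holder_of:   block -> processor currently holding it
    waiting_for: processor -> block it is currently waiting on
    """
    holder = holder_of.get(wanted_block)
    if holder is None or holder == requester:
        return False  # block is free, or already held by the requester
    blocked_on = waiting_for.get(holder)
    # Deadlock if the holder is itself waiting on a block the requester holds.
    return blocked_on is not None and holder_of.get(blocked_on) == requester

holder_of = {"B1": "P1", "B2": "P2"}
waiting_for = {"P2": "B1"}  # P2 already waits for B1, which P1 holds

print(would_deadlock("P1", "B2", holder_of, waiting_for))
```

Because the check runs at the moment a block is requested, the deadlock is found before the requesting processor starts waiting.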

Minimizing deadlocks also affects the rollback and progress scheme. The lower the deadlock frequency, the simpler the rollback scheme can be, since there is no need to worry about the efficiency of rare rollbacks. If, on the other hand, the deadlock frequency is relatively high, it is important to have an efficient rollback scheme.

The basic principle of rollback is to release all held resources, go back to the beginning of one of the jobs involved in the deadlock while undoing all changes made by its execution up to that point, and restart the rolled-back job later, after a delay, in such a way that progress can be guaranteed without compromising efficiency. In general, this means that the rollback scheme must not allow an immediate restart to cause the deadlock to recur, leading to repeated rollbacks of the same job, nor must the delay before restarting the rolled-back job be too long. When job execution times are very short, however, it may be appropriate simply to select the "later" job causing the deadlock for rollback.

Collision detection as a means of assuring data consistency

When collision detection is implemented for the purpose of assuring data consistency, software tasks are executed in parallel by the multiple processors, and access collisions are detected so that one or more of the executed tasks for which a collision is detected can be rolled back and restarted.

Preferably, each processor marks its use of variables in the shared memory while executing a task, so that variable access collisions can be detected. At its most basic level, the marker method consists of marking the use of individual variables in the shared memory. By marking larger areas instead of individual data items, however, a coarser-grained collision check is obtained. One way of implementing a coarser-grained collision check is to use standard memory management techniques, including paging. Another way is to mark groups of variables; marking an entire record, including all record variables in the record, instead of marking the individual record variables has turned out to be particularly efficient. It is important, however, to choose the "data areas" in such a way that if a job uses a given data area, the probability of another job using the same area is very low. Otherwise, coarse-grained data-area marking may in fact result in a higher rollback frequency.

FIG. 7 illustrates the use of variable marking to detect access collisions in an object-oriented software design. The shared memory 12 is organized into blocks B1 to Bn as described above in connection with FIG. 4, and a number of processors P1 to P3 are connected to the shared memory 12. FIG. 7 shows two of the blocks, block B2 and block B4, in more detail. In this particular realization of the marker method, each global variable GV1 to GVn and each record R1 to Rn in a block is associated with a marker field, as illustrated in FIG. 7.

The marker field has one bit per processor connected to the shared memory system, so in this case each marker field has three bits. All bits are initially reset. Before accessing (reading or writing) a variable or record, each processor sets its own bit and then reads the entire marker field for evaluation. If any other bit is set in the marker field, a collision is imminent, and the processor rolls back the task being executed, undoing all changes made up to that point in the execution, including resetting all the corresponding marker bits. If no other bit is set, the processor continues executing the task. Each processor records the address of each variable accessed during execution and, on completion of the task, uses the recorded addresses to reset its own bit in each of the corresponding marker fields.
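The set-then-read protocol can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: the class and method names are hypothetical, bit i is assumed to belong to processor index i, the set-and-read step is assumed to be atomic, and the restoration of data is left out.

```python
class Collision(Exception):
    """Raised when another processor's marker bit is already set."""


class MarkerField:
    """One bit per processor; bit i belongs to processor index i (assumed)."""

    def __init__(self) -> None:
        self.bits = 0  # all bits are initially reset


class Task:
    """Tracks the marker fields one processor has touched during a task."""

    def __init__(self, processor: int) -> None:
        self.processor = processor
        self.accessed = []  # recorded so that the bits can be reset later

    def access(self, field: MarkerField) -> None:
        own = 1 << self.processor
        field.bits |= own            # set own bit before the access
        self.accessed.append(field)  # record what was marked
        if field.bits & ~own:        # any other bit set: collision imminent
            self.finish()            # rollback includes resetting own bits
            raise Collision

    def finish(self) -> None:
        """Reset own bit in every recorded field (task end or rollback)."""
        own = 1 << self.processor
        for field in self.accessed:
            field.bits &= ~own
        self.accessed.clear()


gv1 = MarkerField()
p1, p2 = Task(0), Task(1)
p1.access(gv1)          # P1 marks GV1 and proceeds
try:
    p2.access(gv1)      # P2 sees P1's bit and must roll back
    collided = False
except Collision:
    collided = True
```

After the collision only P1's bit remains set, so P2 can retry once P1 has finished and reset its bit.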

To enable rollback when a collision is detected, it is necessary to keep a copy of every modified variable (the state of the variable before modification), together with its address, during the execution of each job. This allows the original state to be restored in the event of a rollback.
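A minimal sketch of such a copy-before-write log is shown below. The `UndoLog` class and the dictionary standing in for shared memory are assumptions for illustration, not part of the description.

```python
class UndoLog:
    """Keeps (address, old value) pairs for every write made by one job."""

    def __init__(self, memory: dict) -> None:
        self.memory = memory  # hypothetical shared memory: address -> value
        self.entries = []

    def write(self, address: int, value) -> None:
        # Save the state before modification, then modify.
        self.entries.append((address, self.memory[address]))
        self.memory[address] = value

    def rollback(self) -> None:
        # Restore in reverse order so the oldest saved value wins.
        for address, old_value in reversed(self.entries):
            self.memory[address] = old_value
        self.entries.clear()

    def commit(self) -> None:
        # The job completed without a collision; the copies are discarded.
        self.entries.clear()


memory = {100: 1, 104: 2}
log = UndoLog(memory)
log.write(100, 10)
log.write(100, 20)   # the same variable modified twice
log.write(104, 5)
log.rollback()
```

Restoring in reverse order matters when a variable is written more than once in the same job: the first saved copy is the one that holds the original state.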

In FIG. 7, processor P2 needs to access global variable GV1. It sets its own bit at the second position of the marker field associated with GV1 and then reads the entire marker field. In this case the field (110) contains a bit set by processor P1 and a bit set by processor P2, so an imminent variable access collision is detected, and processor P2 rolls back the task being executed. Similarly, if processor P2 needs to access record R2, it sets its own bit at the second position and then reads the entire marker field. The field (011) contains a bit set by P2 and a bit set by P3, so a record access collision is detected, and processor P2 rolls back the task being executed. If processor P3 needs to access record R1, it first sets its own bit at the third position of the associated marker field and then reads the entire field for evaluation. In this case no other bit is set, so processor P3 is allowed to access the record for reading or writing. Preferably, each marker field has two bits per processor, one for writing and one for reading, in order to reduce unnecessary rollbacks, for example on variables that are mostly read.
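The variant with separate read and write bits might look like the sketch below. The description states only that there is one read bit and one write bit per processor; the collision rule used here (concurrent reads are compatible, any write collides with other readers or writers) is an assumption, as are all names.

```python
class ReadWriteMarkerField:
    """Two bit masks: one read bit and one write bit per processor."""

    def __init__(self) -> None:
        self.readers = 0
        self.writers = 0

    def mark_read(self, processor: int) -> bool:
        """Set own read bit; return True if the read may proceed."""
        own = 1 << processor
        self.readers |= own
        # Assumed rule: a read collides only with another processor's write.
        return (self.writers & ~own) == 0

    def mark_write(self, processor: int) -> bool:
        """Set own write bit; return True if the write may proceed."""
        own = 1 << processor
        self.writers |= own
        # Assumed rule: a write collides with any other reader or writer.
        return ((self.readers | self.writers) & ~own) == 0

    def reset(self, processor: int) -> None:
        """Clear both bits of one processor (task end or rollback)."""
        own = 1 << processor
        self.readers &= ~own
        self.writers &= ~own


field = ReadWriteMarkerField()
read_by_p1 = field.mark_read(0)    # P1 reads
read_by_p2 = field.mark_read(1)    # P2 reads the same variable
write_by_p3 = field.mark_write(2)  # P3 tries to write it
```

With a single bit per processor, the second read would already have forced a rollback; with separate bits only the write does.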

Another approach to collision detection is the address comparison method, in which read and write addresses are compared at the end of a task. The main difference from the marker method is that accesses by other processors are generally not checked during task execution, only at the end of the task. An example of a specific type of checking unit implementing an address comparison method is disclosed in international patent application WO 88/02513.
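A hedged sketch of an end-of-task address comparison follows. It is not the checking unit of WO 88/02513; the function name and the overlap rule (a collision exists if either task wrote an address that the other read or wrote) are assumptions for illustration.

```python
def collides_at_end(finishing_reads: set, finishing_writes: set,
                    other_reads: set, other_writes: set) -> bool:
    """Compare the recorded addresses of a finishing task with another task."""
    # Addresses the finishing task wrote that the other task read or wrote.
    write_overlap = finishing_writes & (other_reads | other_writes)
    # Addresses the finishing task read that the other task wrote.
    read_overlap = finishing_reads & other_writes
    return bool(write_overlap or read_overlap)


# Task A finishes; tasks B and C ran in parallel with it.
a_reads, a_writes = {0x10, 0x14}, {0x20}
b_reads, b_writes = {0x20}, {0x30}   # B read what A wrote
c_reads, c_writes = {0x10}, {0x40}   # C only shares a read with A

a_vs_b = collides_at_end(a_reads, a_writes, b_reads, b_writes)
a_vs_c = collides_at_end(a_reads, a_writes, c_reads, c_writes)
```

Unlike the marker method, nothing is checked while the tasks run; each task only records its read and write addresses, and the comparison happens once, when the task ends.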

Reuse of Existing Application Software

Existing sequentially programmed application software normally represents a large investment, and for single-processor systems, such as a single-processor node at the highest level of a hierarchical processing system, thousands or millions of lines of software code already exist. By automatically transforming the application software, through recompilation or equivalent, so that data consistency is ensured when it is executed on multiple processors, all of the software code can be migrated to and reused in a multiprocessor environment, saving time and money.

FIG. 8A illustrates a conventional single-processor system in a layered view. At the bottom layer is a processor (P1), such as a standard microprocessor. The next layer contains the operating system, followed by a virtual machine that interprets the application software found at the top layer.

FIG. 8B illustrates a multiprocessor system in a layered view. At the bottom layer are multiple shared-memory processors (P1 and P2), implemented as standard off-the-shelf microprocessors. Next comes the operating system. The virtual machine, which may for example be an APZ emulator running on a SUN workstation, a high-performance compiling emulator such as SIMAX, or the well-known Java Virtual Machine, is modified for multiprocessor support and data-consistency support. Sequentially programmed application software is generally transformed simply by adding code for data-consistency support: by post-processing the object code or recompiling the blocks/classes if compiled, or by modifying the interpreter if interpreted.

When collision detection is based on variable marking, the following steps may be taken to migrate application software written for a single-processor system to a multiprocessor environment. Before each write access to a variable, code that stores the address and original state of the variable is inserted into the application software to enable proper rollback. Before each read and write access to a variable, code is inserted that sets the marker bit in the marker field, checks the marker field, and stores the address of the variable. The application software is then recompiled or reinterpreted, or the object code is post-processed. The hardware/operating system/virtual machine is also modified to provide collision-detection support by implementing rollback and the resetting of marker fields. Thus, if a collision is detected when the code that checks the marker field is executed, control is transferred to the hardware/operating system/virtual machine, which performs the rollback using the stored copies of the modified variables. In addition, at the end of a job, the hardware/operating system/virtual machine normally takes over and resets the relevant bit in each marker field identified by the stored addresses of the variables accessed by the job.
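The division of labour between the inserted code and the modified runtime can be sketched as below. Everything here is a hypothetical illustration: `Runtime` stands in for the modified hardware/operating system/virtual machine, `load` and `store` stand in for the code inserted before each read and write, and `COUNTER` is an invented address. The sketch is single-threaded and assumes the mark-and-check step is atomic.

```python
COUNTER = 0x100  # hypothetical address of a shared variable


class Collision(Exception):
    pass


class Runtime:
    """Stand-in for the modified hardware / operating system / virtual machine."""

    def __init__(self, memory: dict) -> None:
        self.memory = memory                               # address -> value
        self.markers = {address: 0 for address in memory}  # address -> bits

    def run(self, processor: int, task) -> bool:
        undo, touched = [], []
        try:
            task(self, processor, undo, touched)
            return True
        except Collision:
            # Roll back using the stored copies of the modified variables.
            for address, old_value in reversed(undo):
                self.memory[address] = old_value
            return False
        finally:
            # At the end of the job, reset own bit via the stored addresses.
            for address in touched:
                self.markers[address] &= ~(1 << processor)


def mark(rt, processor, address, touched):
    """Inserted before every access: store address, set bit, check field."""
    own = 1 << processor
    touched.append(address)
    rt.markers[address] |= own
    if rt.markers[address] & ~own:
        raise Collision


def load(rt, processor, address, touched):
    mark(rt, processor, address, touched)
    return rt.memory[address]


def store(rt, processor, address, value, undo, touched):
    mark(rt, processor, address, touched)
    undo.append((address, rt.memory[address]))  # inserted before every write
    rt.memory[address] = value


def increment(rt, processor, undo, touched):
    """Instrumented form of the sequential statement: counter = counter + 1."""
    value = load(rt, processor, COUNTER, touched)
    store(rt, processor, COUNTER, value + 1, undo, touched)


rt = Runtime({COUNTER: 0})
first = rt.run(0, increment)      # no other processor is using the variable
rt.markers[COUNTER] = 1 << 1      # pretend processor index 1 has marked it
second = rt.run(0, increment)     # collision: the job is rolled back
```

The application-level statement stays a one-line increment; only the access points are wrapped, which is why the transformation can be done mechanically by a recompiler, an object-code post-processor, or a modified interpreter.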

Note that static analysis of the code can minimize the amount of new code inserted. For example, instead of inserting code before every read and write as described above, code can be inserted at fewer places, provided the ultimate goal is still met.

Note also that if the multiple processors are implemented as specialized hardware of proprietary design, the application software can be migrated directly to the multiprocessor environment.

FIG. 9 is a schematic diagram of a communication system in which one or more processing systems according to the invention are implemented. The communication system (100) may support different bearer service networks, such as a PSTN (Public Switched Telephone Network), a PLMN (Public Land Mobile Network), an ISDN (Integrated Services Digital Network), and an ATM (Asynchronous Transfer Mode) network. The communication system (100) basically comprises a number of switching/routing nodes (50-1 to 50-6) interconnected by physical links that are normally grouped into trunk groups. The switching nodes (50-1 to 50-4) have access points to which access terminals, such as telephones (51-1 to 51-4) and computers (52-1 to 52-4), are connected via local exchanges (not shown). The switching node (50-5) is connected to a Mobile Switching Center (MSC) (53). The MSC (53) is connected to two Base Station Controllers (BSCs) (54-1 and 54-2) and a Home Location Register (HLR) node (55). The first BSC (54-1) is connected to a number of base stations (56-1 and 56-2) that communicate with one or more mobile units (57-1 and 57-2). Similarly, the second BSC (54-2) is connected to a number of base stations (56-3 and 56-4) that communicate with one or more mobile units (57-3). The switching node (50-6) is connected to a host computer (58) equipped with a database system (DBS). User terminals connected to the system (100), such as the computers (52-1 to 52-4), can request database services from the database system in the host computer (58). A server (59), in particular a Java server, is connected to the switching/routing node (50-4). Private networks (not shown), such as business networks, may also be connected to the communication system of FIG. 1.

The communication system (100) provides various services to the users connected to the network. Examples of such services are ordinary telephone calls in PSTN and PLMN, message services, LAN interconnects, Intelligent Network (IN) services, ISDN services, CTI (Computer Telephony Integration) services, video conferencing, file transfer, access to the Internet, paging services, and video-on-demand.

According to the invention, each switching node (50) in the system (100) is preferably provided with a processing system (1-1 to 1-6) according to the first or second aspect of the invention (possibly combining the two aspects in the form of a matrix processing system), which handles events such as service requests and inter-node communication. A call set-up, for example, requires the processing system to execute a sequence of jobs. This sequence of jobs defines the call set-up service at the processor level. A processing system according to the invention is preferably also arranged in each of the MSC (53), the BSCs (54-1 and 54-2), the HLR node (55), the host computer (58), and the server (59) of the communication system (100).

Although the preferred use of the invention is in high-level processor nodes of a hierarchical processing system, those skilled in the art will appreciate that the aspects of the invention described above can be applied to any event-driven processing in which event-flow concurrency can be identified.

The term event-based system includes, but is not limited to, telecommunication, data communication, and transaction-oriented systems.

The term shared-memory processors is not limited to standard off-the-shelf microprocessors; it includes any type of processing unit, such as SMPs and specialized hardware, operating against a common memory whose application software and data are accessible to all processing units. It also includes systems in which the shared memory is distributed over several memory units, and even systems with asymmetric access, in which the access times to different parts of the distributed shared memory may differ between processors.

It should be understood that the embodiments described above are given merely as examples and that the invention is not limited to them. Further modifications, changes, and improvements that retain the basic underlying principles disclosed and claimed herein are within the scope and spirit of the invention.

Claims

An event-based hierarchical distributed processing system (1) having a plurality of processor nodes distributed over multiple levels of the system hierarchy, wherein

at least one high-level processor node (10) in the hierarchical processing system (1) comprises:

multiple shared-memory processors (11);

means (14) for mapping external events arriving at the processor node to the processors, wherein the external event flow is divided into a number of non-commuting categories of events, and each non-commuting category is assigned to a predetermined set of the shared-memory processors for processing by the processors of that set; and

means (15) for ensuring data consistency when global data in the shared memory (12) is manipulated by the processors.

The system of claim 1,

wherein each processor set is in the form of a single processor.

The system of claim 1,

wherein at least one processor set is in the form of an array of processors operating as a multiprocessor pipeline with a number of processor stages, and each event of the non-commuting category assigned to that processor set is processed in slices as a chain of events executed in different processor stages of the pipeline.

The system of claim 3,

wherein events requiring input data from a predetermined data area in the shared memory (12) are mapped to one and the same predetermined processor set by the mapping means (14, 18).

The system of claim 1,

wherein the non-commuting categories are categories of events such that the order of events must be preserved within a category, while there is no ordering requirement for processing events of different categories.

The system of claim 1,

wherein the high-level processor node further comprises means for feeding events generated by a processor set back to the same processor set.

The system of claim 1,

wherein a non-commuting category is defined by the events originating from a predetermined source (S1/S2).

The system of claim 7,

wherein the source (S1/S2) is an input port, a lower-level processor node, or a hardware device connected to the hierarchical distributed processing system.

The system of claim 1,

wherein the means (15) for ensuring data consistency comprises means for locking global variables in the shared memory that are to be used by a software task executed in response to an event, and means for releasing the locked global variables on completion of the task.

The system of claim 9,

wherein the means (15) for ensuring data consistency further comprises means for releasing the locked global variables of one of two mutually locking tasks and restarting that task after an appropriate delay.

The system of claim 1,

wherein the software in the shared memory (12) comprises a number of software blocks (B1 to Bn), each of the processors executes a software task comprising a software block in response to an event, and each processor comprises means, forming part of the means (15) for ensuring data consistency, for locking at least the global data of the block before starting execution of the task, so that only the processor that has locked the block can access the global data in the block.

The system of claim 11,

wherein the locking means locks the entire software block before starting execution of the corresponding task and releases the locked block on completion of the task.

The system of claim 11,

wherein the locking means minimizes deadlocks by seizing, before starting execution of a task, at least the blocks required by the software tasks that account for a large portion of the processing time.

The system of claim 11,

wherein the high-level processor node comprises means for detecting a deadlock, and means for releasing the block locked by one of the waiting processors and restarting the software task executed by that processor after an appropriate delay, to ensure progress.

The system of claim 14,

wherein the deadlock detection means comprises means for checking whether a variable required by the software task under consideration is locked by another processor, and means for verifying whether that other processor is waiting for a variable locked by the processor running the task under consideration.

The system of claim 1,

wherein the multiple processors (11) process events independently, executing a corresponding number of software tasks in parallel, and the means (15) for ensuring data consistency comprises means for detecting collisions between the parallel tasks and means for undoing and restarting a task for which a collision is detected.

The system of claim 16,

wherein each processor comprises means for marking the use of variables in the shared memory, and the collision detection means comprises means for detecting variable access collisions based on the markings.

The system of claim 16,

wherein the software in the shared memory (12) comprises a number of software blocks (B1 to Bn), each of the multiple processors executes a software task comprising a software block in response to an event, each processor comprises means for marking the use of variables within a block, and the collision detection means comprises means for detecting variable access collisions based on the markings.

The system of claim 1,

wherein the high-level processor node (10) further comprises parallel event queues (16), one queue dedicated to each processor set, and the mapping means (14) maps the external events to the event queues based on information contained in each external event.

At least one high-level processor node (10) in the hierarchical processing system comprises:

multiple shared-memory processors (11) operating as a multiprocessor pipeline with a number of processor stages, wherein each event arriving at the multiprocessor pipeline is processed in slices as a chain of events executed in different processor stages of the pipeline,

The system of claim 20,

wherein the software in the shared memory (12) comprises a number of software blocks (B1 to Bn), each event being directed to one of the software blocks,

and the multiprocessor pipeline is implemented with means (17, 18) for allocating clusters (CL) of blocks to the processors in a load-balanced manner, and means (17, 18) for mapping events to the processors in accordance with the allocation made by the allocating means.

The system of claim 20,

wherein the means (15) for ensuring data consistency comprises means for locking global variables in the shared memory that are to be used by a task executed by a processor in response to an event, and means for releasing the locked global variables on completion of the task.

The system of claim 22,

wherein the means (15) for ensuring data consistency further comprises means for releasing the locked global variables of one of two mutually locking tasks and restarting that task after an appropriate delay.

The system of claim 20,

wherein the software in the shared memory (12) comprises a number of software blocks (B1 to Bn), each of the processors executes a software task comprising a software block in response to an event, and each processor comprises means, forming part of the means (15) for ensuring data consistency, for locking at least the global data of the software block before starting execution of the task, so that only the processor that has locked the block can access the global data in the block.

The system of claim 24,

wherein the locking means locks the entire software block before starting execution of the corresponding task and releases the locked block on completion of the task.

The system of claim 24,

wherein the locking means minimizes deadlocks by seizing, before starting execution of a task, at least the blocks required by the software tasks that account for a large portion of the task processing time.

The system of claim 24,

wherein the high-level processor node (10) comprises means for detecting a deadlock, and means for ensuring progress by releasing the block locked by one of the waiting processors and restarting the software task executed by that processor after an appropriate delay.

The system of claim 27,

wherein the deadlock detection means comprises means for checking whether a variable required by the software task under consideration is locked by another processor, and means for verifying whether that other processor is waiting for a variable locked by the processor running the task under consideration.

The system of claim 20,

wherein the multiple processors (11) process events independently, executing a corresponding number of software tasks in parallel, and the means (15) for ensuring data consistency comprises means for detecting collisions between the parallel tasks and means for undoing and restarting a task for which a collision is detected.

The system of claim 29,

wherein each processor comprises means for marking the use of variables in the shared memory (12), and the collision detection means comprises means for detecting variable access collisions based on the markings.

A processing method in an event-based hierarchical distributed processing system (1) having a plurality of processor nodes distributed over multiple levels of the system hierarchy, comprising:

providing multiple shared-memory processors (11) in at least one high-level processor node (10) of the hierarchical processing system (1);

dividing the flow of external events to the processor node into a number of non-commuting categories (NCCs) of events, based on event-flow concurrency identified in the system;

mapping the NCCs to the processors such that each NCC of events is assigned to a predetermined set of the multiple processors for processing by the processors of that set; and

ensuring data consistency when global data in the shared memory (12) is manipulated by the processors, such that only one processor at a time accesses given global data.

The method of claim 31,

wherein the NCCs are categories of events such that the order of events must be preserved within a category, while there is no ordering requirement for processing events of different categories.

The method of claim 31,

wherein at least one processor set operates as a multiprocessor pipeline with a number of processor stages, and each event of the non-commuting category assigned to that processor set is processed in slices as a chain of events executed in different processor stages of the pipeline.

The method of claim 31,

further comprising feeding events generated by a processor set back to the same processor set.

The method of claim 31,

wherein the step of ensuring data consistency comprises locking global variables in the shared memory that are to be used by a software task executed in response to an event, and releasing the locked global variables on completion of the software task.

The method of claim 35,

wherein the step of ensuring data consistency further comprises releasing the locked global variables of one of two mutually locking tasks and restarting that task after an appropriate delay.

The method of claim 31,

wherein the software in the shared memory (12) comprises a number of software blocks, each of the processors executes a software task comprising a software block in response to an event, and the step of ensuring data consistency comprises locking at least the global data of a software block before the block is executed by a processor, so that only that processor can access the global data in the block.

The method of claim 37,

wherein the entire software block is locked before execution of the corresponding task starts, and the locked block is released on completion of the task.

The method of claim 37,

further comprising avoiding deadlocks by seizing all blocks required by a software task before starting execution of the task.

The method of claim 37,

further comprising detecting a deadlock, releasing the block locked by one of the waiting processors, and restarting the software task executed by that processor after a predetermined delay, to ensure progress.

The method of claim 31,

wherein the processors execute a number of corresponding software tasks in parallel in response to events, and the step of ensuring data consistency comprises detecting access collisions and undoing and restarting a task for which a collision is detected.

The method of claim 41,

wherein each of the processors marks the use of variables in the shared memory, and the collision detection step comprises detecting variable access collisions based on the markings.

The method of claim 31,

further comprising migrating application software written for a single-processor system to the multiple shared-memory processors for execution thereby.

A processing method in an event-based hierarchical distributed processing system having a plurality of processor nodes distributed over multiple levels of the system hierarchy, comprising:

providing multiple shared-memory processors (11) in at least one high-level processor node (10) of the hierarchical processing system (1);

operating the multiple processors (11) as a multiprocessor pipeline with a number of processor stages, wherein one or more events arriving at the multiprocessor pipeline are processed in slices as a chain of events executed in different processor stages of the pipeline; and

ensuring data consistency when global data in the shared memory is manipulated by the multiple processors, such that only one processor at a time accesses the global data.

The method of claim 44,

wherein the software in the shared memory (12) comprises a number of software blocks (B1 to Bn), each event being directed to one of the software blocks, and the step of operating the processors as a multiprocessor pipeline comprises allocating clusters (CL) of blocks to the processors in a load-balanced manner and mapping events to the processors in accordance with the allocation.

The method of claim 44,

wherein the step of ensuring data consistency comprises locking global variables in the shared memory that are to be used by a software task executed in response to an event, and releasing the locked global variables on completion of the software task.

The method of claim 46,

The method of claim 44,

wherein the software in the shared memory (12) comprises a number of software blocks, each of the processors executes a software task comprising a software block in response to an event, and the step of ensuring data consistency comprises locking at least the global data of a software block before the block is executed by a processor, so that only that processor can access the global data in the block.

The method of claim 48,

wherein the entire software block is locked before execution of the corresponding task starts, and the locked block is released on completion of the task.

The method of claim 48,

further comprising avoiding deadlocks by seizing all blocks required by a software task before starting execution of the task.

The method of claim 48,

Further comprising detecting a deadlock, releasing a block locked by one of the waiting processors, and restarting the software task executed by that processor after a defined delay to ensure progress.

The method of claim 44,

The processors execute a plurality of corresponding software tasks in parallel in response to events, and ensuring data consistency includes detecting an access conflict between parallel tasks, and canceling and restarting the task for which the conflict was detected.

The method of claim 52,

Wherein each processor marks its use of variables in the shared memory, and detecting an access conflict includes detecting a variable access conflict based on the marking.
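Marking-based conflict detection with cancel and restart can be sketched in Python as follows. This is a single-threaded simulation under assumptions: the mark table, the undo log used to cancel a task, and the names `MarkedMemory`, `Collision`, `try_task` and `demo` are hypothetical and not taken from the source.

```python
class Collision(Exception):
    """Raised when a task touches a variable already marked by another task."""


class MarkedMemory:
    def __init__(self, values):
        self.values = dict(values)
        self.marks = {}  # variable name -> id of the task that marked it

    def _mark(self, task_id, name):
        owner = self.marks.setdefault(name, task_id)
        if owner != task_id:
            raise Collision(name)

    def read(self, task_id, name):
        self._mark(task_id, name)
        return self.values[name]

    def write(self, task_id, name, value, undo_log):
        self._mark(task_id, name)
        undo_log.append((name, self.values[name]))  # remember the old value
        self.values[name] = value

    def _clear_marks(self, task_id):
        self.marks = {n: t for n, t in self.marks.items() if t != task_id}

    def commit(self, task_id):
        self._clear_marks(task_id)

    def rollback(self, task_id, undo_log):
        for name, old in reversed(undo_log):
            self.values[name] = old
        self._clear_marks(task_id)


def try_task(memory, task_id, body):
    """Run body; on a collision undo its writes and report that it was canceled."""
    undo_log = []
    try:
        body(memory, task_id, undo_log)
    except Collision:
        memory.rollback(task_id, undo_log)
        return False
    memory.commit(task_id)
    return True


def demo():
    memory = MarkedMemory({"x": 0, "y": 0})
    memory.read(1, "x")  # task 1 is in flight and has marked x

    def task2(mem, tid, log):
        mem.write(tid, "y", 10, log)
        mem.write(tid, "x", mem.read(tid, "x") + 1, log)  # collides while x is marked

    first = try_task(memory, 2, task2)   # canceled: its write to y is undone
    after_cancel = dict(memory.values)
    memory.commit(1)                     # task 1 completes and clears its mark
    second = try_task(memory, 2, task2)  # restarted task now runs to completion
    return first, after_cancel, second, dict(memory.values)
```

In `demo`, the first attempt of task 2 is canceled and leaves the memory unchanged. After task 1 clears its mark, the restarted task 2 completes and its writes take effect.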

The method of claim 44,

The method further comprises moving application software written for a single-processor system onto the multiple shared-memory processors and executing it there.

Multiple shared-memory processors 11 that execute multiple tasks in parallel,

A mapper (14) for mapping concurrent categories of mutually independent external event signals onto the multiple processors (11) so that the corresponding tasks are executed in parallel,

A collision detector for detecting access conflicts between parallel tasks when global data in the shared memory is processed by the processors, and

Means for canceling and restarting a task for which a conflict was detected, to ensure data consistency.
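The role of the mapper can be illustrated with a small Python sketch. It assumes that each event carries a category label and that categories are handed to processors round-robin on first sight; the source specifies neither, and `CategoryMapper` and the category names used below are hypothetical.

```python
from collections import defaultdict


class CategoryMapper:
    """Send all events of one category to the same processor, in arrival order."""

    def __init__(self, n_processors):
        self.n_processors = n_processors
        self.assignment = {}             # category -> processor index
        self.queues = defaultdict(list)  # processor index -> queued event ids

    def processor_for(self, category):
        # Assumption: a new category goes to the next processor, round-robin.
        if category not in self.assignment:
            self.assignment[category] = len(self.assignment) % self.n_processors
        return self.assignment[category]

    def dispatch(self, event):
        proc = self.processor_for(event["category"])
        self.queues[proc].append(event["id"])
        return proc
```

With two processors and events of categories `link-A`, `link-B`, `link-A`, `link-C`, the events go to processors 0, 1, 0 and 0. Events of one category stay ordered on a single processor, while different categories can run in parallel.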

The method of claim 55,

Further comprising means for feeding a task generated by one processor to the same processor.

Multiple shared-memory processors (11) that execute multiple tasks in parallel and operate as a multiprocessor pipeline with multiple processor stages, wherein each external event signal reaching the multiprocessor pipeline is processed in slices as a series of tasks executed at different processor stages of the pipeline,

Shared memory 12 containing software in the form of a plurality of software blocks,

Multiple shared-memory processors (11), each for executing in parallel a task associated with one or more software blocks,

Means (17, 18) for allocating a cluster (CL) of software blocks to each processor,

Means (17, 18) for distributing tasks to the processors for execution based on the allocation made by the allocating means, and

Means (15) for ensuring data consistency when global data in the shared memory is processed by the multiple processors.

A communication system (100) comprising an event-based hierarchical distributed processing system (1) having a plurality of processor nodes distributed across multiple levels of the system hierarchy,

Multiple shared-memory processors (11),

Means (14) for mapping external events arriving at the processor node onto the processors, wherein the external event flow is divided into a plurality of non-commuting event categories, and each non-commuting event category is assigned to a predetermined set of the shared-memory processors for processing by that set of processors, and

Means (15) for ensuring data consistency when global data in the shared memory (12) is processed by the processors.

Multiple shared-memory processors (11) operating as a multiprocessor pipeline with multiple processor stages, wherein each event reaching the multiprocessor pipeline is processed in slices as a series of events executed at different processor stages of the pipeline, and