EP1131703A1 - Simultaneous processing system for event-based systems - Google Patents

Simultaneous processing system for event-based systems

Info

Publication number
EP1131703A1
EP1131703A1 EP99972323A EP99972323A EP1131703A1 EP 1131703 A1 EP1131703 A1 EP 1131703A1 EP 99972323 A EP99972323 A EP 99972323A EP 99972323 A EP99972323 A EP 99972323A EP 1131703 A1 EP1131703 A1 EP 1131703A1
Authority
EP
European Patent Office
Prior art keywords
processor
processors
software
task
processing system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP99972323A
Other languages
English (en)
French (fr)
Inventor
Per Anders Holmberg
Lars-Örjan KLING
Sten Edward Johnson
Milind Sohoni
Nikhil Tikekar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from SE9803901A (SE9803901D0)
Application filed by Telefonaktiebolaget LM Ericsson AB
Publication of EP1131703A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline or look ahead
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061: Partitioning or combining of resources
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/52: Program synchronisation; Mutual exclusion, e.g. by means of semaphores

Definitions

  • the present invention generally relates to an event-based processing system, and more particularly to a hierarchical distributed processing system as well as a processing method in such a processing system.
  • each network node normally comprises a hierarchy of processors for processing events from the network.
  • the processors in the hierarchy communicate by message passing, and the processors at the lower levels of the processor hierarchy perform low-level processing of simpler sub-tasks, and the processors at the higher levels of the hierarchy perform high-level processing of more complex tasks.
  • U.S. Patent 5,239,539 issued to Uchida et al. discloses a controller for controlling the switching network of an ATM exchange by uniformly distributing loads among a plurality of call processors.
  • a main processor assigns originated call processings to the call processors in the sequence of call originations or by the channel identifiers attached to the respective cells of the calls.
  • a switching state controller collects usage information about a plurality of buffers in the switching network, and the call processors perform call processings based on the content of the switching state controller.
  • the Japanese Patent abstract JP 6276198 discloses a packet switch in which plural processor units are provided, and the switching processing of packets is performed with the units being mutually independent.
  • the Japanese Patent abstract JP 4100449 A discloses an ATM communication system which distributes signaling cells between an ATM exchange and a signaling processor array (SPA) by STM-multiplexing ATM channels. Scattering of processing loads is realized by switching the signaling cells by means of an STM on the basis of SPA numbers added to each virtual channel by a routing tag adder.
  • SPA: signaling processor array
  • the Japanese Patent abstract JP 5274279 discloses a parallel processing device which is in the form of a hierarchical set of processors, where processor element groups are in charge of parallel and pipeline processing.
  • Yet another object of the invention is to provide a processing system which is capable of exploiting concurrency in the event flow while still allowing reuse of existing application software.
  • Still another object of the invention is to provide a method for efficiently processing events in a hierarchical distributed processing system.
  • a general idea according to the invention is to introduce multiple shared-memory processors at the highest level or levels of a hierarchical distributed processing system, and optimize the utilization of the multiple processors based on concurrent event flows identified in the system.
  • non-commuting categories are generally groupings of events where the order of events must be preserved within a category, but where there are no ordering requirements between categories.
  • a non-commuting category may be defined by events generated by a predetermined source such as a particular input port, regional processor or hardware device connected to the system.
  • Each non-commuting category of events is assigned to a predetermined set of one or more processors, and internal events generated by a predetermined processor set are fed back to the same processor set in order to preserve the non-commuting category or categories assigned to that processor set.
  • the multiple processors are operated as a multiprocessor pipeline having a number of processor stages, where each external event arriving to the pipeline is processed in slices as a chain of internal events which are executed in different stages of the pipeline.
  • each pipeline stage is executed in one of the processors, but a given processor may execute more than one stage of the pipeline.
  • a particularly advantageous way of realizing a multiprocessor pipeline is to allocate a cluster of software blocks/classes in the shared memory software to each processor, where each event is targeted for a particular block, and then distribute the events onto the processors based on this allocation.
  • a general processing structure is obtained by what is called matrix processing, where non-commuting categories are executed by different sets of processors, and at least one processor set is in the form of an array of processors which operates as a multiprocessor pipeline in which an external event is processed in slices in different processor stages of the pipeline.
  • data consistency can be assured by locking global data to be used by a software task that is executed in response to an event, or in the case of an object-oriented software design, by locking entire software blocks/objects. If processing of an event requires resources from more than one block, then the locking approach may give rise to deadlocks, where tasks are mutually locking each other. Therefore, deadlocks are detected and rollback performed to ensure progress, or alternatively deadlocks are completely avoided by seizing all blocks required by a task before initiating execution of the task.
  • Another approach for assuring data consistency is based on parallel execution of tasks, where access collisions between tasks are detected and an executed task, for which a collision is detected, is rolled-back and restarted. Collisions are either detected based on variable usage markings, or alternatively detected based on address comparison where read and write addresses are compared.
  • the solution according to the invention substantially increases the throughput capacity of the processing system, and for hierarchical processing systems the high-level bottlenecks are efficiently decongested.
  • Fig. 1 is a schematic diagram of a hierarchical distributed processing system with a high-level processor node according to the invention
  • Fig. 2 is a schematic diagram of a processing system according to a first aspect of the invention.
  • Fig. 3 illustrates a particular realization of a processing system according to the first aspect of the invention
  • Fig. 4 is a schematic diagram of a simplified shared-memory multiprocessor with an object-oriented design of the shared-memory software
  • Fig. 5A is a schematic diagram of a particularly advantageous processing system according to a second aspect of the invention.
  • Fig. 5B illustrates a multiprocessor pipeline according to the second aspect of the invention
  • Fig. 6 illustrates the use of locking of blocks/ objects to assure data consistency
  • Fig. 7 illustrates the use of variable marking to detect access collisions
  • Fig. 8A illustrates a prior art single-processor system from a stratified viewpoint
  • Fig. 8B illustrates a multiprocessor system from a stratified viewpoint
  • Fig. 9 is a schematic diagram of a communication system in which at least one processing system according to the invention is implemented.
  • Fig. 1 is a schematic diagram of a hierarchical distributed processing system with a high-level processor node according to the invention.
  • the hierarchical distributed processing system 1 has a conventional tree structure with a number of processor nodes distributed over a number of levels of the system hierarchy.
  • hierarchical processing systems can be found in telecommunication nodes and routers.
  • the high-level processor nodes, and especially the processor node at the top become bottlenecks as the number of events to be processed by the processing system increases.
  • An efficient way of decongesting such bottlenecks includes using multiple shared-memory processors 11 at the highest level or levels of the hierarchy.
  • the multiple processors are illustrated as implemented at the top node 10.
  • the multiple shared-memory processors 11 are realized in the form of a standard microprocessor based multiprocessor system. All processors 11 share a common memory, the so-called shared memory 12.
  • I/O: input/output
  • the mapper 14 maps/distributes the events to the processors 11 for processing.
  • the external flow of events to the processor node 10 is divided into a number of concurrent categories, hereinafter referred to as non-commuting categories (NCCs), of events.
  • NCCs: non-commuting categories
  • the mapper 14 makes sure that each NCC is assigned to a predetermined set of one or more of the processors 11, thus enabling concurrent processing and optimized utilization of the multiple processors.
  • the mapper 14 could be implemented in one or more of the processors 11, which then preferably are dedicated to the mapper.
  • the non-commuting categories are groupings of events where the order of events must be preserved within a category, but where there are no ordering requirements on processing events from different categories.
  • a general requirement for systems where the information flow is governed by protocols is that certain related events must be processed in the received order. This is the invariant of the system, no matter how the system is implemented.
  • the identification of proper NCCs and the concurrent processing of the NCCs guarantee that the ordering requirements imposed by the given system protocols are met, while at the same time the inherent concurrency in the event flow is exploited.
  • the computation for an event-based system is generally modeled as a state machine, where an input event from the external world changes the state of the system and may result in output events. If each non-commuting category/pipeline stage could be processed by an independent/disjoint state machine, there would not be any sharing of data between the various state machines. But given that there are global resources, which are represented by global states or variables, the operation on a given global state normally has to be "atomic" with only one processor, which executes part of the system state machine, accessing a given global state at a time. The need for so-called sequence-dependency checks is eliminated because of the NCC/pipeline-based execution.
  • NCC processing as a first aspect of the invention and event-level pipeline processing as a second aspect of the invention, as well as procedures and means for assuring data consistency, will be described.
  • Fig. 2 is a schematic diagram of an event-driven processing system according to a first aspect of the invention.
  • the processing system comprises a number of shared-memory processors P1 to P4, a shared memory 12, an I/O-unit 13, a distributor 14, data consistency means 15 and a number of independent parallel event queues 16.
  • the I/O-unit 13 receives incoming external events and outputs outgoing events.
  • the distributor 14 divides the incoming events into non-commuting categories (NCCs) and distributes each NCC to a predetermined one of the independent event queues 16.
  • Each one of the event queues is connected to a respective one of the processors, and each processor sequentially fetches or receives events from its associated event queue for processing. If the events have different priority levels, this has to be considered so that the processors will process events in order of priority.
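The distributor and the per-processor event queues described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names `EventQueue` and `Distributor`, the numeric priority scheme, and the source labels are all assumptions made for the example. Events of one NCC always land in the same queue, and within a queue higher-priority events are served first while arrival order is preserved among events of equal priority.

```python
import heapq
from itertools import count


class EventQueue:
    """Hypothetical per-processor queue: highest priority first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # arrival counter keeps FIFO order among equal priorities

    def put(self, event, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the highest first
        heapq.heappush(self._heap, (-priority, next(self._seq), event))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


class Distributor:
    """Hypothetical distributor: maps each NCC to one predetermined queue."""

    def __init__(self, ncc_to_queue, queues):
        self.ncc_to_queue = ncc_to_queue
        self.queues = queues

    def distribute(self, ncc, event, priority=0):
        self.queues[self.ncc_to_queue[ncc]].put(event, priority)


queues = [EventQueue() for _ in range(4)]          # one queue per processor P1..P4
dist = Distributor({"S1": 0, "S2": 2}, queues)     # S1 -> P1, S2 -> P3 (as in Fig. 2)
dist.distribute("S1", "digit-1")
dist.distribute("S2", "msg-a")
dist.distribute("S1", "digit-2")
dist.distribute("S1", "alarm", priority=1)

# Drain the queue of P1: the high-priority event comes first, then arrival order
drained = [queues[0].get() for _ in range(len(queues[0]))]
```

Negating the priority and adding an arrival counter is one simple way to get "priority first, then received order" out of a binary heap; other queue disciplines would serve equally well.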
  • Consider a hierarchical processing system with a central high-level processor node and a number of lower-level processors, so-called regional processors, where each regional processor in turn serves a number of hardware devices.
  • the events originating from the hardware devices and the events coming from the regional processors that serve a group of devices meet the conditions imposed by the ordering requirements that are defined by the given protocols (barring error conditions which are protected by processing at a higher level). So, events from a particular device/regional processor form a non-commuting category. In order to preserve a non-commuting category, each device/regional processor must always feed its events to the same processor.
  • a sequence of digits received from a user, or a sequence of ISDN user part messages received for a trunk device must be processed in the received order, whereas sequences of messages received for two independent trunk devices can be processed in any order as long as the sequencing for individual trunk devices is preserved.
  • In Fig. 2 it can be seen that events from a predetermined source S1, for example a particular hardware device or input port, are mapped onto a predetermined processor P1, and events from another predetermined source S2, for example a particular regional processor, are mapped onto another predetermined processor P3. Since the number of sources normally exceeds the number of shared-memory processors by far, each processor is usually assigned a number of sources. In a typical telecom/datacom application, there could be 1024 regional processors communicating with a single central processor node.
  • Mapping regional processors onto the multiple shared-memory processors in the central node in a load-balanced way means that each shared-memory processor gets roughly 256 regional processors (assuming that there are 4 processors in the central node, and all regional processors generate the same load). In practice, however, it might be beneficial to have an even finer granularity, mapping hardware devices such as signaling devices, subscriber terminations, etc. to the central node processors. This generally makes it easier to obtain load balance. Each regional processor in a telecom network might control hundreds of hardware devices.
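The arithmetic behind the figure of 256 regional processors per central processor can be checked with a short sketch. The round-robin rule used here is an assumption chosen for simplicity; the text only requires that the mapping be fixed per source and balanced under equal load.

```python
from collections import Counter

NUM_REGIONAL = 1024   # regional processors, as in the example above
NUM_CENTRAL = 4       # shared-memory processors in the central node

# Assumed static mapping: regional processor number modulo the processor count.
# Each regional processor always maps to the same central processor,
# which is what preserves its non-commuting category.
assignment = {rp: rp % NUM_CENTRAL for rp in range(NUM_REGIONAL)}

# Count how many regional processors each central processor serves
load = Counter(assignment.values())
```

With unequal loads per source, a finer-grained mapping at the hardware-device level, as the text suggests, gives the mapping more pieces to balance with.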
  • the solution according to the invention is to map the hardware devices onto a number of shared-memory processors in the central node, thus decongesting the bottleneck in the central node.
  • a system such as the AXE Digital Switching System of Telefonaktiebolaget LM Ericsson, which processes an external event in slices connected by processor-to-processor (CP-to-CP) signals or so-called internal events, might impose its own sequencing requirement in addition to the one imposed by protocols.
  • Such CP-to-CP signals for an NCC must be processed in the order in which they are generated (unless superseded by a higher priority signal generated by the last slice under execution). This additional sequencing requirement is met if each CP-to-CP signal (internal event) is processed in the same processor in which it is generated, as indicated in Fig. 2 by the dashed lines from the processors to the event queues. So, internal events are kept within the same NCC by feeding them back to the same processor or processor set that generated them, hence guaranteeing that they are processed in the same order in which they were generated.
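A minimal sketch of this feedback rule, assuming a plain FIFO queue per processor and ignoring the priority override mentioned above: every internal event produced by a slice is appended to the queue of the processor that produced it, so internal events are consumed in the order they were generated. The function names and the event format `(name, slice_number)` are hypothetical.

```python
from collections import deque


def run_processor(queue, handler):
    """Process one processor's queue; internal events go back to the same queue."""
    processed = []
    while queue:
        event = queue.popleft()
        processed.append(event)
        # Feed the internal events generated by this slice back to the same
        # processor, at the tail of its own queue.
        queue.extend(handler(event))
    return processed


def handler(event):
    """Hypothetical job: each external event is processed in three slices."""
    name, slice_no = event
    if slice_no < 2:
        return [(name, slice_no + 1)]   # one internal event per slice
    return []                           # the chain of events ends here


order = run_processor(deque([("call-1", 0), ("call-2", 0)]), handler)
```

The slices of the two calls interleave, but for each call the slices run strictly in generation order.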
  • each signal message has a header and a signal body.
  • the signal body includes information necessary for execution of a software task.
  • the signal body includes, implicitly or explicitly, a pointer to software code/data in the shared memory as well as the required input operands.
  • the event signals are self-contained, completely defining the corresponding task. Consequently, the processors P1 to P4 independently fetch and process events to execute corresponding software tasks, or jobs, in parallel.
  • a software task is also referred to as a job, and throughout the disclosure, the terms task and job are used interchangeably.
  • the processors need to manipulate global data in the shared memory.
  • In order to avoid data inconsistencies, where several processors access and manipulate the same global data (during the lifetime of a job), the data consistency means 15 must make sure that data consistency is assured at all times.
  • the invention makes use of two basic procedures for assuring data consistency when global data is manipulated by the processors during parallel task execution:
  • Locking: Each processor normally comprises means, forming part of the data consistency means 15, for locking the global data to be used by a corresponding task before starting execution of the task. In this way, only the processor that has locked the global data can access it. Preferably, the locked data is released at the end of execution of the task. This approach means that if global data is locked by a processor, and another processor wants to access the same data, that other processor has to wait until the locked data is released. Locking generally implies waiting times (wait/stall on a locked global state), which limit the amount of parallel processing to some degree (concurrent operations on different global states at the same time are of course allowed).
  • Collision detection and roll-back: Software tasks are executed in parallel, and access collisions are detected so that one or more executed tasks for which collisions are detected can be rolled back and restarted. Collision detection is generally accomplished by a marker method or an address comparison method. In the marker method, each processor comprises means for marking the use of variables in the shared memory, and variable access collisions are then detected based on the markings. Collision detection generally has a penalty due to roll-backs (resulting in wasted processing). Which approach to choose depends on the application, and has to be selected on a case-by-case basis. A simple rule of thumb is that locking-based data consistency might be more suitable for database systems, and collision detection more beneficial for telecom and datacom systems. In some applications, it may even be advantageous to use a combination of locking and collision detection.
  • Locking and collision detection as means for assuring data consistency will be described in more detail later on.
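The marker method can be illustrated with a small single-threaded simulation. Everything here is an assumption made for the sketch: the `SharedMemory` class, the per-variable owner mark, and the undo log used for roll-back are not taken from the source, which only states that variable usage is marked and collisions are detected from the markings. The address comparison method is not shown.

```python
class Collision(Exception):
    """Raised when a task touches a variable already marked by another task."""


class SharedMemory:
    def __init__(self, values):
        self.values = dict(values)
        self.marks = {}  # variable name -> id of the task currently using it

    def _mark(self, task_id, name):
        owner = self.marks.setdefault(name, task_id)
        if owner != task_id:
            raise Collision(name)

    def read(self, task_id, name):
        self._mark(task_id, name)
        return self.values[name]

    def write(self, task_id, name, value, undo_log):
        self._mark(task_id, name)
        undo_log.append((name, self.values[name]))  # remember the old value
        self.values[name] = value

    def commit(self, task_id):
        # Task finished: clear its markers
        self.marks = {n: t for n, t in self.marks.items() if t != task_id}

    def roll_back(self, task_id, undo_log):
        # Undo the task's writes in reverse order, then clear its markers
        for name, old in reversed(undo_log):
            self.values[name] = old
        self.commit(task_id)


mem = SharedMemory({"GV1": 10, "GV2": 0})
undo_a, undo_b = [], []
mem.write("A", "GV1", 11, undo_a)        # task A marks and updates GV1

collided = False
try:
    mem.write("B", "GV2", 5, undo_b)     # task B marks and updates GV2 ...
    mem.read("B", "GV1")                 # ... then collides with A on GV1
except Collision:
    collided = True
    mem.roll_back("B", undo_b)           # B's write to GV2 is undone

value_after_roll_back = mem.values["GV2"]
mem.commit("A")                          # A completes and releases its marks

undo_b = []                              # restart task B from the beginning
mem.write("B", "GV2", 5, undo_b)
seen_by_b = mem.read("B", "GV1")
mem.commit("B")
```

The wasted work is visible in the trace: task B's first write is performed, undone, and performed again after the restart.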
  • Fig. 3 illustrates a particular realization of a processing system according to the first aspect of the invention.
  • the processors P1 to P4 are symmetrical multiprocessors (SMPs) where each processor has its own local cache C1 to C4, and the event queues are allocated in the shared memory 12 as dedicated memory lists, preferably linked lists, EQ1 to EQ4.
  • SMPs: symmetrical multiprocessors
  • each event signal generally has a header and a signal body.
  • the header includes an NCC tag (implicit or explicit) which is representative of the NCC to which the corresponding event belongs.
  • the distributor 14 distributes an incoming event to one of the event queues EQ1 to EQ4 based on the NCC tag included in the event signal.
  • the NCC tag may be a representation of the source, such as an input port, regional processor or hardware device, from which the event originates. Assume that an event received by the I/O-unit 13 comes from a particular hardware device and that this is indicated in the tag included in the event signal.
  • the distributor 14 evaluates the tag of the event, and distributes the event to a predetermined one of the shared-memory allocated event queues EQ1 to EQ4 based on a pre-stored event-dispatch table or equivalent.
  • Each one of the processors P1 to P4 fetches events from its own dedicated event queue in the shared memory 12 via its local cache to process and terminate the events in a sequence.
  • the event-dispatch table could be modified from time to time to adjust for long-term imbalances in traffic sources.
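A tag-driven dispatch of this kind might look like the sketch below. The `EventSignal` fields, the tag strings and the table contents are hypothetical; only the idea of a header carrying an NCC tag and a pre-stored table keyed by that tag comes from the text. The sketch also shows a table update, with the caveat that a real system would presumably have to make sure events already queued for the old processor are handled before the new mapping takes effect, so that ordering within the NCC is not disturbed; the source does not describe that step.

```python
from dataclasses import dataclass


@dataclass
class EventSignal:
    ncc_tag: str            # header: identifies the source / NCC (assumed format)
    target_block: str       # body: which software block to enter
    operands: tuple = ()    # body: input operands for the task


# Pre-stored event-dispatch table: NCC tag -> event queue (hypothetical contents)
dispatch_table = {"dev-17": "EQ1", "rp-3": "EQ3"}
event_queues = {name: [] for name in ("EQ1", "EQ2", "EQ3", "EQ4")}


def distribute(signal):
    """Look up the NCC tag and append the signal to the selected queue."""
    queue_name = dispatch_table[signal.ncc_tag]
    event_queues[queue_name].append(signal)
    return queue_name


first = distribute(EventSignal("dev-17", "B88", (4, 2)))

# Long-term re-balancing: move the source to another queue
dispatch_table["dev-17"] = "EQ2"
second = distribute(EventSignal("dev-17", "B88", (7,)))
```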
  • the invention is not limited to symmetrical multiprocessors with local caches.
  • Other examples of shared-memory systems include shared memory without cache, shared memory with common cache as well as shared memory with mixed cache.
  • Fig. 4 is a schematic diagram of a simplified shared-memory multiprocessor system having an object-oriented design of the shared-memory software.
  • the software in the shared memory 12 has an object-oriented design, and is organized as a set of blocks B1 to Bn or classes.
  • Each block/object is responsible for executing a certain function or functions.
  • each block/object is split into two main sectors: a program sector where the code is stored and a data sector where the data is stored.
  • the code in the program sector of a block can only access and operate on data belonging to the same block.
  • the data sector in turn is preferably divided into two sectors as well: a first sector of "global" data comprising a number of global variables GV1 to GVn, and a second sector of, for example, "private" data such as records R1 to Rn, where each record typically comprises a number of record variables RV1 to RVn as illustrated for record Rx.
  • Each transaction is typically associated with one record in a block, whereas global data within a block could be shared by several transactions.
  • a signal entry into a block initiates processing of data within the block.
  • each processor executes code in the block indicated by the event signal and operates on global variables and record variables within that block, thus executing a software task.
  • the execution of a software task is indicated in Fig. 4 by a wavy line in each of the processors P1 to P4.
  • the first processor P1 executes code in software block B88.
  • instruction 120 operates on record variable RV28 in record R1
  • instruction 121 operates on record variable RV59 in record R5
  • instruction 122 operates on the global variable GV43
  • instruction 123 operates on the global variable GV67.
  • the processor P2 executes code and operates on variables in block B1
  • the processor P3 executes code and operates on variables in block B8
  • the processor P4 executes code and operates on variables in block B99.
  • An example of block-oriented software is the PLEX (Programming Language for Exchanges) software of Telefonaktiebolaget LM Ericsson, in which the entire software is organized in blocks.
  • Java applications are examples of truly object-oriented designs.
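The block layout described above (a program sector plus a data sector split into global variables and per-transaction records) can be mirrored loosely in a small class. This is only an analogy under assumed names: `Block`, `on_signal`, the `signals_seen` counter and the record layout are invented for the illustration and do not reflect PLEX.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    global_vars: dict = field(default_factory=dict)  # "global" data sector
    records: list = field(default_factory=list)      # "private" data: one record per transaction

    def on_signal(self, record_no, digit):
        """Program sector: code that only touches data belonging to this block."""
        record = self.records[record_no]
        record["digits"].append(digit)         # record variable, private to one transaction
        self.global_vars["signals_seen"] += 1  # global variable, shared by all transactions
        return len(record["digits"])


b88 = Block("B88", {"signals_seen": 0}, [{"digits": []} for _ in range(3)])
b88.on_signal(1, 4)
digits_in_record_1 = b88.on_signal(1, 2)
```

The global counter is the kind of state that two processors executing in the same block at once could corrupt, which is why the text goes on to discuss locking and collision detection.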
  • concurrent execution is accomplished by operating at least a set of the multiple shared-memory processors as a multiprocessor pipeline where each external event is processed in slices as a chain of events which are executed in different processor stages of the pipeline.
  • the sequencing requirement of processing signals in order of their creation will be guaranteed as long as all the signals generated by a stage are fed to the subsequent stage in the same order as they are generated. Any deviation from this rule will have to guarantee racing-free execution. If execution of a given slice results in more than one signal, then these signals either have to be fed to the subsequent processor stage in the same order as they are generated, or if the signals are distributed to two or more processors it is necessary to make sure that the resulting possibility of racing is harmless for the computation.
  • Fig. 5A is a schematic diagram of an event-driven processing system according to the second aspect of the invention.
  • the processing system is similar to that shown in Fig. 2.
  • internal events generated by a processor that is part of the multiprocessor pipeline 11 are not necessarily fed back to the same processor, but can be fed to any of the processors, as indicated by the dashed lines that originate from the processors P1 to P4 and terminate on the bus to the event queues 16.
  • the software in the shared memory is organized into blocks or classes as described above in connection with Fig. 4, and on receiving an external event the corresponding processor executes code in a block/object and may generate results in the form of an internal event towards another block/object.
  • When this internal event comes up for execution, it is executed in the indicated block/object and might generate another internal event towards some other block/object.
  • the chain usually dies after a few internal events. In telecommunication applications for example, each external event may typically spawn 5-10 internal events.
  • a realization of a multiprocessor pipeline customized for object-oriented software design is to allocate clusters of software blocks/classes to the processors.
  • clusters CL1 to CLn of blocks/classes in the shared memory 12 are schematically indicated by dashed boxes.
  • one of the clusters CL1 is allocated to the processor P2 as indicated by the solid line interconnecting CL1 with P2
  • another cluster CL2 is allocated to the processor P4 as indicated by the dashed line interconnecting CL2 with P4.
  • each cluster of blocks/classes within the shared memory 12 is allocated to a predetermined one of the processors P1 to P4, and the allocation scheme is implemented in a look-up table 17 in the distributor 14 and in a look-up table 18 in the shared memory 12.
  • Each of the look-up tables 17, 18 links a target block to each event based on e.g. the event ID, and associates each target block to a predetermined cluster of blocks.
  • the distributor 14 distributes external events to the processors according to the information in the look-up table 17.
  • the look-up table 18 in the shared memory 12 is usable by all of the processors P1 to P4 to enable distribution of internal events to the processors. In other words, when a processor generates an internal event, it consults the look-up table 18 to determine i) the corresponding target block based on e.g. the event ID, ii) the cluster to which the identified target block belongs, and iii) the processor to which the identified cluster is allocated, and then feeds the internal event signal to the appropriate event queue. It is important to note that normally each block belongs to one and only one cluster, although an allocation scheme with overlapping clusters could be implemented in a slightly more elaborate way by using information such as execution state in addition to the event ID.
  • mapping clusters of blocks/classes to processors automatically causes pipelined execution. Suppose the external event EE is directed to block A, which is allocated to processor P1; the internal event IE generated by this block is then directed to block B, which is allocated to processor P2; the internal event IE generated by that block is directed to block C, which is allocated to processor P4; and the internal event IE generated by that block is directed to block D, which is allocated to processor P1.
  • blocks A and D are part of a cluster mapped to processor P1
  • block B is part of a cluster mapped to processor P2
  • block C is part of a cluster mapped to processor P4.
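The three-step look-up (event ID to target block, block to cluster, cluster to processor) and the resulting pipeline for the chain A, B, C, D can be sketched as follows. The event IDs and cluster names are hypothetical; the block-to-processor allocation follows the example above.

```python
# Hypothetical contents of a look-up table like table 18
EVENT_TO_BLOCK = {"ev_a": "A", "ev_b": "B", "ev_c": "C", "ev_d": "D"}
BLOCK_TO_CLUSTER = {"A": "cluster_x", "D": "cluster_x", "B": "cluster_y", "C": "cluster_z"}
CLUSTER_TO_PROCESSOR = {"cluster_x": "P1", "cluster_y": "P2", "cluster_z": "P4"}


def target_processor(event_id):
    """Resolve an event to the processor whose queue should receive it."""
    block = EVENT_TO_BLOCK[event_id]              # i) target block from the event ID
    cluster = BLOCK_TO_CLUSTER[block]             # ii) cluster containing that block
    return CLUSTER_TO_PROCESSOR[cluster]          # iii) processor owning the cluster


# External event to block A, then internal events to B, C and D
stages = [target_processor(e) for e in ("ev_a", "ev_b", "ev_c", "ev_d")]
```

Because blocks A and D share a cluster, the first and last slices run on the same processor, which is an instance of one processor executing more than one pipeline stage.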
  • Each stage in the pipeline is executed in one processor, but a given processor may execute more than one stage of the pipeline.
  • a variation includes mapping events that require input data from a predetermined data area in the shared memory 12 to one and the same predetermined processor set. It should be understood that when a processor stage in the multiprocessor pipeline has executed an event belonging to a first chain of events, and sent the resulting internal event signal to the next processor stage, it is normally free to start processing an event from the next chain of events, thus improving the throughput capacity.
  • the mapping of pipeline stages to the processors should be such that all the processors are equally loaded. Therefore, the clusters of blocks/classes are partitioned according to an "equal load" criterion.
  • the amount of time spent in each cluster can be known for example from a similar application running on a single processor, or could be monitored during runtime to enable re-adjustment of the partitioning.
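One way to approximate an "equal load" partition from measured per-block execution times is a greedy heuristic: take the blocks from the most to the least expensive and always give the next block to the currently least-loaded processor. This heuristic is an assumption of the sketch, not something the source prescribes, and it ignores the "no racing" criterion entirely. The block names and times are made up.

```python
import heapq


def partition_equal_load(block_times, num_processors):
    """Greedy partition: the heaviest block goes to the least-loaded processor."""
    # Heap entries are (current load, processor index, blocks assigned so far);
    # the unique processor index breaks ties before the list is ever compared.
    heap = [(0.0, p, []) for p in range(num_processors)]
    heapq.heapify(heap)
    for block, time_spent in sorted(block_times.items(), key=lambda kv: -kv[1]):
        load, p, blocks = heapq.heappop(heap)
        blocks.append(block)
        heapq.heappush(heap, (load + time_spent, p, blocks))
    return {p: (load, blocks) for load, p, blocks in heap}


# Hypothetical time spent per block, e.g. measured on a single-processor system
measured = {"A": 4, "B": 3, "C": 3, "D": 2, "E": 2, "F": 2}
clusters = partition_equal_load(measured, 2)
loads = sorted(load for load, _ in clusters.values())
```

Re-running the partition with times monitored at runtime would correspond to the re-adjustment mentioned above.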
  • a "no racing" criterion along with the "equal load" criterion is required to prevent an internal event generated "later" than another event from being executed "earlier".
  • the same processing of an external event can be performed in a few big slices or many small slices.
  • each processor in executing a task, generally locks the global data to be used by the task before starting execution of the task. In this way, only the processor that has locked the global data can access it.
  • Locking is very suitable for object-oriented designs as the data areas are clearly defined, allowing specific data sectors of a block or an entire block to be locked. Since it is normally not possible to know which part of the global data in a block will be modified by a given execution sequence or task, locking the entire global data sector is a safe way of assuring data consistency. Ideally, just protecting the global data in each block is sufficient, but in many applications there are certain so-called "across record" operations that also need to be protected. For example, the operation of selecting a free record will go through many records to actually find a free record. Hence, locking the entire block protects everything.
  • NCCs will generally minimize "shared states" between the multiple processors and also improve the cache hit rate.
  • by mapping, for example, functionally different regional processors/hardware devices such as signaling devices and subscriber terminations in a telecommunication system to different processors in the central node, simultaneous processing of different access mechanisms with little or no wait on locked blocks becomes possible, since different access mechanisms are normally processed in different blocks until the processing reaches the late stages of execution.
  • Fig. 6 illustrates the use of locking of blocks/objects to assure data consistency.
  • the external event EEx enters the block B1 and the corresponding processor locks the block B1 before starting execution in the block, as indicated by the diagonal line across the block B1.
  • the external event EEy enters the block B2 and the corresponding processor locks the block B2.
  • the external event EEz directed to block B1 comes after the external event EEx which has already entered block B1 and locked that block. Accordingly, the processing of external event EEz has to wait until block B1 is released.
  • Locking might give rise to deadlock conditions in which two processors indefinitely wait for each other to release variables mutually required by the processors in execution of their current tasks. It is therefore desirable either to avoid deadlocks, or to detect them and perform roll-back with guarantee of progress.
  • deadlock detection could be almost immediate. Since all "overhead processing" takes place between two jobs, deadlock detection will be evident while acquiring "resources" for a later job that will cause a deadlock. This is accomplished by checking if one of the resources required by the job under consideration is held by some processor, and then verifying whether that processor is waiting on a resource held by the processor with the job under consideration - for example by using flags per blocks.
  • deadlocks will normally also have an impact on the scheme for rollback and progress.
  • the lower the deadlock frequency the simpler the roll-back scheme, as one does not have to bother about the efficiency of rare roll-backs.
  • if the deadlock frequency is relatively high, it is important to have an efficient roll-back scheme.
  • the basic principle for roll-back is to release all the held resources, go back to the beginning of one of the jobs involved in causing the deadlock, undoing all changes made in the execution up to that point, and restart the rolled-back job later in such a way, or after such a delay, that progress can be guaranteed without compromising the efficiency.
  • simply selecting the "later" job causing the deadlock for rollback should be adequate.
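The deadlock check described above (is the required block held by some processor, and is that processor waiting on a block held by the requester) can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the patented mechanism itself: the class `BlockLocks`, its two dictionaries standing in for "flags per block", and the return values "granted", "wait" and "deadlock" are all hypothetical names. Only two-party deadlocks are detected, matching the check described in the text.

```python
class BlockLocks:
    """Bookkeeping for block locks with immediate two-party deadlock detection."""

    def __init__(self):
        self.owner = {}       # block -> processor currently holding it
        self.waiting_on = {}  # processor -> block it is waiting for

    def acquire(self, proc, block):
        holder = self.owner.get(block)
        if holder is None or holder == proc:
            self.owner[block] = proc
            self.waiting_on.pop(proc, None)
            return "granted"
        # The block is held by another processor: is that processor
        # waiting on a block that the requester itself holds?
        wanted = self.waiting_on.get(holder)
        if wanted is not None and self.owner.get(wanted) == proc:
            return "deadlock"
        self.waiting_on[proc] = block
        return "wait"

    def release_all(self, proc):
        """Roll-back step: give up every block held by `proc`."""
        for block in [b for b, p in self.owner.items() if p == proc]:
            del self.owner[block]
        self.waiting_on.pop(proc, None)


locks = BlockLocks()
r1 = locks.acquire("P1", "B1")   # P1 locks B1
r2 = locks.acquire("P2", "B2")   # P2 locks B2
r3 = locks.acquire("P1", "B2")   # P1 must wait for B2
r4 = locks.acquire("P2", "B1")   # P2 would wait for B1: deadlock detected
locks.release_all("P2")          # the "later" job is rolled back
r5 = locks.acquire("P1", "B2")   # P1 can now make progress
```

Here the job on P2 is the "later" one and is selected for roll-back; undoing its data changes and restarting it after a delay are left out of the sketch.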
  • Collision detection as a means of assuring data consistency
  • the software tasks are executed in parallel by the multiple processors, and access collisions are detected so that one or more executed tasks for which collisions are detected can be rolled-back and restarted.
  • each processor marks the use of variables in the shared memory while executing a task, thus enabling variable access collisions to be detected.
  • the marker method consists of marking the use of individual variables in the shared memory.
  • One way of implementing a more coarse-grained collision check is to utilize standard memory management techniques including paging.
  • Another way is to mark groupings of variables, and it has turned out to be particularly efficient to mark entire records including all record variables in the records, instead of marking individual record variables. It is however important to choose "data areas" in such a way that if a job uses a given data area then the probability of some other job using the same area should be very low. Otherwise, the coarse-grained data-area marking may in fact result in a higher roll-back frequency.
  • Fig. 7 illustrates the use of variable marking to detect access collisions in an object-oriented software design.
  • the shared memory 12 is organized into blocks B1 to Bn as described above in connection with Fig. 4, and a number of processors P1 to P3 are connected to the shared memory 12.
  • Fig. 7 shows two blocks, block B2 and block B4, in more detail.
  • each global variable GV1 to GVn and each record R1 to Rn in a block is associated with a marker field as illustrated in Fig. 7.
  • the marker field has 1 bit per processor connected to the shared memory system, and hence in this case, each marker field has 3 bits. All bits are reset at start, and each processor sets its own bit before accessing (read or write) a variable or record, and then reads the entire marker field for evaluation. If there is any other bit that is set in the marker field, then a collision is imminent, and the processor rolls back the task being executed, undoing all changes made up to that point in the execution including resetting all the corresponding marker bits. On the other hand, if no other bit is set then the processor continues execution of the task. Each processor records the address of each variable accessed during execution, and uses the recorded address(es) to reset its own bit in each of the corresponding marker fields at the end of execution of a task.
  • the processor P2 needs to access the global variable GV1, and sets its own bit at the second position of the marker field associated with GV1, and then reads the entire marker field.
  • the field (110) contains a bit set by processor P1 and a bit set by processor P2, and consequently an imminent variable access collision is detected.
  • the processor P2 rolls back the task being executed.
  • when processor P2 needs to access the record R2, it sets its own bit at the second position, and then reads the entire marker field.
  • the field (011) contains a bit set by P2 and a bit set by P3, and consequently a record access collision is detected, and the processor P2 rolls back the task being executed.
  • When processor P3 needs to access the record R1, it first sets its own bit in the third position of the associated marker field, and then reads the entire field for evaluation. In this case, no other bits are set so the processor P3 is allowed to access the record for a read or write.
  • each marker field may instead have two bits per processor, one bit for write and one bit for read, so as to reduce unnecessary roll-backs, for example on variables that are mostly read.
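The one-bit-per-processor marker scheme can be sketched as below. This is an illustrative, single-threaded model under assumptions: the class names `MarkerField` and `Task` are hypothetical, bit positions are plain integer bit indices rather than the left-to-right positions drawn in Fig. 7, and the set-then-read step is not atomic as it would presumably need to be on a real shared-memory multiprocessor. Restoring the modified data on roll-back is not shown.

```python
class MarkerField:
    """One bit per processor; bit i is set while processor i uses the item."""

    def __init__(self):
        self.bits = 0


class Task:
    """A task running on the processor with the given index."""

    def __init__(self, proc_index):
        self.mask = 1 << proc_index
        self.touched = []  # marker fields in which this task has set its bit

    def access(self, field):
        """Set own bit, then read the whole field; return False on collision."""
        if not field.bits & self.mask:
            field.bits |= self.mask
            self.touched.append(field)
        if field.bits & ~self.mask:
            # Another processor's bit is set: roll back, which includes
            # resetting every marker bit this task has set so far.
            self.reset_markers()
            return False
        return True

    def reset_markers(self):
        """Reset own bit in every recorded field (end of task or roll-back)."""
        for field in self.touched:
            field.bits &= ~self.mask
        self.touched.clear()


gv1 = MarkerField()
task_p1, task_p2 = Task(0), Task(1)
first = task_p1.access(gv1)    # no other bit set: P1 may proceed
second = task_p2.access(gv1)   # P1's bit is set: P2 must roll back
```

The `touched` list plays the role of the recorded addresses that each processor uses to reset its own bits at the end of a task.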
  • Another approach for collision detection is referred to as the address comparison method, where read and write addresses are compared at the end of a task.
  • the main difference compared to the marker method is that accesses by other processors are generally not checked during execution of a task, only at the end of a task.
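The end-of-task comparison can be illustrated with plain address sets. The sketch below is an assumption about what such a comparison might look like and does not describe the checking unit of WO 88/02513: the function name `collides` and the rule that two concurrent tasks conflict when they share an address and at least one of the two accesses is a write are illustrative choices.

```python
def collides(read_a, write_a, read_b, write_b):
    """Return True if two tasks that ran concurrently touched a common
    address in a conflicting way (at least one access is a write)."""
    return bool(write_a & (read_b | write_b)) or bool(write_b & read_a)


# Task A read addresses 1 and 2 and wrote 3; task B read 3 and wrote 4.
conflict = collides({1, 2}, {3}, {3}, {4})
# Two tasks that only read the same addresses do not conflict.
read_only = collides({1, 2}, set(), {1, 2}, set())
```

Because the check runs only once per task, a colliding task is discovered later than with marker bits, but no marker field has to be read on every access.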
  • An example of a specific type of checking unit implementing an address comparison method is disclosed in our international patent application WO 88/02513.
  • Reuse of existing application software
  • Fig. 8A illustrates a prior art single-processor system from a stratified viewpoint.
  • at the bottom level, the processor P1, such as a standard microprocessor, can be found.
  • the next level includes the operating system, and then comes the virtual machine, which interprets the application software found at the top level.
  • Fig. 8B illustrates a multiprocessor system from a stratified viewpoint.
  • at the bottom level, multiple shared-memory processors P1 and P2 implemented as standard off-the-shelf microprocessors are found. Then comes the operating system.
  • the virtual machine, which by way of example may be an APZ emulator running on a SUN workstation, a compiling high-performance emulator such as SIMAX, or the well-known Java Virtual Machine, is modified for multiprocessor support and data-consistency related support.
  • the sequentially programmed application software is generally transformed by simply adding code for data-consistency related support, either by post-processing the object code or recompiling blocks/classes if compiled, or by modifying the interpreter if interpreted.
  • the following steps may be taken to enable migration of application software written for a single-processor system to a multiprocessor environment.
  • code for storing the address and original state of the variable is inserted into the application software to enable proper roll-back.
  • code for setting marker bits in the marker field, checking the marker field as well as for storing the address of the variable is inserted into the software.
  • the application software is then recompiled or reinterpreted, or the object code is post-processed.
  • the hardware/operating system/virtual machine is also modified to give collision detection related support, implementing roll-back and resetting of marker fields.
  • when a collision is detected, control is normally transferred to the hardware/operating system/virtual machine, which performs roll-back using the stored copy of the modified variables.
  • when a job has finished, the hardware/operating system/virtual machine normally takes over and resets the relevant bit in each of the marker fields given by the stored addresses of variables that have been accessed by the job.
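The inserted code for storing the address and original state of each modified variable amounts to an undo log. A minimal sketch follows, assuming that shared memory can be modeled as a dictionary from address to value and that every written address already exists; the class `UndoLog` and its method names are hypothetical and stand in for whatever the instrumented application and the virtual machine would actually do.

```python
class UndoLog:
    """Records the original value of each variable a job writes,
    so that the job can be rolled back."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> value (stands in for shared memory)
        self.saved = {}       # address -> original value, stored on first write only

    def write(self, address, value):
        # Inserted before each write: remember the address and original state.
        if address not in self.saved:
            self.saved[address] = self.memory[address]
        self.memory[address] = value

    def rollback(self):
        # Performed on a detected collision: restore every stored original value.
        for address, original in self.saved.items():
            self.memory[address] = original
        self.saved.clear()

    def commit(self):
        # Performed at the normal end of a job: the stored copies are discarded.
        self.saved.clear()


memory = {0x10: 5, 0x14: 7}
log = UndoLog(memory)
log.write(0x10, 6)
log.write(0x10, 9)   # a second write keeps the first stored original
log.write(0x14, 0)
log.rollback()
```

After `rollback()` the memory holds its original contents again, so the job can be restarted from the beginning.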
  • Fig. 9 is a schematic diagram of a communication system in which one or more processing systems according to the invention are implemented.
  • the communication system 100 may support different bearer service networks such as PSTN (Public Switched Telephone Network), PLMN (Public Land Mobile Network), ISDN (Integrated Services Digital Network) and ATM (Asynchronous Transfer Mode) networks.
  • the communication system 100 basically comprises a number of switching/routing nodes 50-1 to 50-6 interconnected by physical links that are normally grouped into trunk groups.
  • the switching nodes 50-1 to 50-4 have access points to which access terminals, such as telephones 51-1 to 51-4 and computers 52-1 to 52-4, are connected via local exchanges (not shown).
  • the switching node 50-5 is connected to a Mobile Switching Center (MSC) 53.
  • the MSC 53 is connected to two Base Station Controllers (BSCs) 54-1 and 54-2, and a Home Location Register (HLR) node 55.
  • the first BSC 54-1 is connected to a number of base stations 56-1 and 56-2 communicating with one or more mobile units 57-1 and 57-2.
  • the second BSC 54-2 is connected to a number of base stations 56-3 and 56-4 communicating with one or more mobile units 57-3.
  • the switching node 50-6 is connected to a host computer 58 provided with a data base system (DBS).
  • User terminals connected to the system 100, such as the computers 52-1 to 52-4, can request data base services from the data base system in the host computer 58.
  • a server 59 is connected to the switching/routing node 50-4. Private networks such as business networks (not shown) may also be connected to the communication system of Fig. 1.
  • the communication system 100 provides various services to the users connected to the network. Examples of such services are ordinary telephone calls in PSTN and PLMN, message services, LAN interconnects, Intelligent Network (IN) services, ISDN services, CTI (Computer Telephony Integration) services, video conferences, file transfers, access to the so-called Internet, paging services, video-on-demand and so on.
  • each switching node 50 in the system 100 is preferably provided with a processing system 1-1 to 1-6 according to the first or second aspect of the invention (possibly a combination of the two aspects in the form of a matrix processing system), which handles events such as service requests and inter-node communication.
  • a call set-up for example requires the processing system to execute a sequence of jobs. This sequence of jobs defines the call set-up service on the processor level.
  • a processing system according to the invention is preferably also arranged in each one of the MSC 53, the BSCs 54-1 and 54-2, the HLR node 55 and the host computer 58 and the server 59 of the communication system 100.
  • the term "event-based system" includes but is not limited to telecommunication, data communication and transaction-oriented systems.
  • the term "shared-memory processors" is not limited to standard off-the-shelf microprocessors, but includes any type of processing units, such as SMPs and specialized hardware, operating towards a common memory with application software and data accessible to all processing units. This also includes systems where the shared memory is distributed over several memory units and even systems with asymmetrical access where the access times to different parts of the distributed shared memory for different processors could be different.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Computer And Data Communications (AREA)
EP99972323A 1998-11-16 1999-11-12 Concurrent processing for event-based systems Withdrawn EP1131703A1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
SE9803901 1998-11-16
SE9803901A SE9803901D0 (sv) 1998-11-16 1998-11-16 a device for a service network
PCT/SE1999/002064 WO2000029942A1 (en) 1998-11-16 1999-11-12 Concurrent processing for event-based systems

Publications (1)

Publication Number Publication Date
EP1131703A1 (de) 2001-09-12

Family

ID=50202830

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99972323A Withdrawn EP1131703A1 (de) 1998-11-16 1999-11-12 Concurrent processing for event-based systems

Country Status (7)

Country Link
EP (1) EP1131703A1 (de)
JP (1) JP4489958B2 (de)
KR (1) KR100401443B1 (de)
AU (1) AU1437300A (de)
BR (1) BR9915363B1 (de)
CA (1) CA2350922C (de)
WO (1) WO2000029942A1 (de)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6633865B1 (en) 1999-12-23 2003-10-14 Pmc-Sierra Limited Multithreaded address resolution system
US7080238B2 (en) 2000-11-07 2006-07-18 Alcatel Internetworking, (Pe), Inc. Non-blocking, multi-context pipelined processor
US7526770B2 (en) 2003-05-12 2009-04-28 Microsoft Corporation System and method for employing object-based pipelines
JP2006146678A (ja) 2004-11-22 2006-06-08 Hitachi Ltd 情報処理装置におけるプログラム制御方法、情報処理装置、及びプログラム
US20080301125A1 (en) 2007-05-29 2008-12-04 Bea Systems, Inc. Event processing query language including an output clause
US20090070786A1 (en) 2007-09-11 2009-03-12 Bea Systems, Inc. Xml-based event processing networks for event server
WO2011107163A1 (en) * 2010-03-05 2011-09-09 Telefonaktiebolaget L M Ericsson (Publ) A processing system with processing load control
EP2650750A1 (de) * 2012-04-12 2013-10-16 Telefonaktiebolaget L M Ericsson AB (Publ) Vorrichtung und Verfahren zur Aufgabenzuweisung in einem Knoten eines Telekommunikationsnetzwerks

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58149555A (ja) * 1982-02-27 1983-09-05 Fujitsu Ltd 並列処理装置
JPS6347835A (ja) * 1986-08-18 1988-02-29 Agency Of Ind Science & Technol パイプライン計算機
JPS63301332A (ja) * 1987-06-02 1988-12-08 Nec Corp ジョブ実行方式
US5072364A (en) * 1989-05-24 1991-12-10 Tandem Computers Incorporated Method and apparatus for recovering from an incorrect branch prediction in a processor that executes a family of instructions in parallel
JP2957223B2 (ja) 1990-03-20 1999-10-04 富士通株式会社 コールプロセッサの負荷分散制御方式
JPH07122866B1 (de) * 1990-05-07 1995-12-25 Mitsubishi Electric Corp
JPH04100449A (ja) 1990-08-20 1992-04-02 Toshiba Corp Atm通信システム
JPH04273535A (ja) * 1991-02-28 1992-09-29 Nec Software Ltd マルチタスク制御方式
US5287467A (en) * 1991-04-18 1994-02-15 International Business Machines Corporation Pipeline for removing and concurrently executing two or more branch instructions in synchronization with other instructions executing in the execution unit
CA2067576C (en) * 1991-07-10 1998-04-14 Jimmie D. Edrington Dynamic load balancing for a multiprocessor pipeline
JPH0546415A (ja) * 1991-08-14 1993-02-26 Nec Software Ltd 排他管理制御方式
JP3182806B2 (ja) 1991-09-20 2001-07-03 株式会社日立製作所 バージョンアップ方法
JPH05204876A (ja) * 1991-10-01 1993-08-13 Hitachi Ltd 階層型ネットワークおよび階層型ネットワークを用いたマルチプロセッサシステム
US5471580A (en) 1991-10-01 1995-11-28 Hitachi, Ltd. Hierarchical network having lower and upper layer networks where gate nodes are selectively chosen in the lower and upper layer networks to form a recursive layer
US5511172A (en) * 1991-11-15 1996-04-23 Matsushita Electric Co. Ind, Ltd. Speculative execution processor
US5379428A (en) * 1993-02-01 1995-01-03 Belobox Systems, Inc. Hardware process scheduler and processor interrupter for parallel processing computer systems
JP2655466B2 (ja) 1993-03-18 1997-09-17 日本電気株式会社 パケット交換装置
WO1994027216A1 (en) * 1993-05-14 1994-11-24 Massachusetts Institute Of Technology Multiprocessor coupling system with integrated compile and run time scheduling for parallelism
JP3005397B2 (ja) * 1993-09-06 2000-01-31 関西日本電気ソフトウェア株式会社 デッドロック多発自動回避方式
ATE184407T1 (de) * 1994-01-03 1999-09-15 Intel Corp Verfahren und vorrichtung zum implementieren eines vierstufigen verzweigungsauflosungssystem in einem rechnerprozessor
JPH0836552A (ja) * 1994-07-22 1996-02-06 Nippon Telegr & Teleph Corp <Ntt> 分散処理方法、分散処理システム及び分散処理管理装置
CN1209207A (zh) 1995-12-19 1999-02-24 艾利森电话股份有限公司 指令处理机作业的调度
US5848257A (en) * 1996-09-20 1998-12-08 Bay Networks, Inc. Method and apparatus for multitasking in a computer system
US6240509B1 (en) * 1997-12-16 2001-05-29 Intel Corporation Out-of-pipeline trace buffer for holding instructions that may be re-executed following misspeculation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0029942A1 *

Also Published As

Publication number Publication date
KR100401443B1 (ko) 2003-10-17
CA2350922C (en) 2014-06-03
JP2002530737A (ja) 2002-09-17
CA2350922A1 (en) 2000-05-25
AU1437300A (en) 2000-06-05
KR20010080958A (ko) 2001-08-25
BR9915363B1 (pt) 2012-12-25
WO2000029942A1 (en) 2000-05-25
BR9915363A (pt) 2001-07-31
JP4489958B2 (ja) 2010-06-23

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010618

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)

RBV Designated contracting states (corrected)

Designated state(s): DE FI FR GB

17Q First examination report despatched

Effective date: 20071112

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 9/52 20060101ALI20150120BHEP

Ipc: G06F 9/50 20060101AFI20150120BHEP

INTG Intention to grant announced

Effective date: 20150218

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20150630