WO2001016742A2 - Network shared memory - Google Patents

Publication number
WO2001016742A2
Authority
WO
WIPO (PCT)
Prior art keywords
shared memory
memory
shared
message
node
Application number
PCT/US2000/024248
Other languages
French (fr)
Other versions
WO2001016742A3 (en
Inventor
Chris Miller
Original Assignee
Times N Systems, Inc.
Priority to US15215199P
Priority to US60/152,151
Priority to US60/220,794
Priority to US60/220,748
Priority to US22097400P
Priority to US22074800P
Application filed by Times N Systems, Inc.
Publication of WO2001016742A2
Publication of WO2001016742A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/526 Mutual exclusion algorithms
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/0284 Multiple user address space allocation, e.g. using different base addresses
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0817 Cache consistency protocols using directory methods
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/457 Communication
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes
    • G06F12/0837 Cache consistency protocols with software control, e.g. non-cacheable data
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/52 Indexing scheme relating to G06F9/52
    • G06F2209/523 Mode

Abstract

Methods, systems and devices are described for a network shared memory. A method includes: creating and assembling a message buffer in shared memory; parsing the message buffer for a plurality of pointer fields; and passing a message from a transmitting process to a receiving process by passing at least one pointer. The methods, systems and devices provide advantages because the speed and scalability of parallel processor systems are enhanced.

Description

NETWORK SHARED MEMORY

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates generally to the field of computing systems in which multiple processors share some memory but in which each processor is required to provide separate access to a standard network I/O implementation for interconnection. More particularly, the invention relates to computer science techniques that utilize a network shared memory.

2. Discussion of the Related Art

The clustering of workstations is a well-known art. In the most common cases, the clustering involves workstations that operate almost totally independently, utilizing the network only to share such services as a printer, license-limited applications, or shared files.

In more closely coupled environments, some software packages (such as NQS) allow a cluster of workstations to share work. In such cases the work arrives, typically as batch jobs, at an entry point to the cluster where it is queued and dispatched to the workstations on the basis of load. In both of these cases, and all other known cases of clustering, the operating system and cluster subsystem are built around the concept of message-passing. The term message-passing means that a given workstation operates on some portion of a job until communication (typically to send or receive data) with another workstation is necessary. Then, the first workstation prepares a message and communicates with the other workstation.

Another well-known art is that of clustering processors within a machine, usually called a Massively Parallel Processor or MPP, in which the techniques are essentially identical to those of clustered workstations. Usually, the bandwidth and latency of the interconnect network of an MPP are more highly optimized, but the system operation is the same.

In the general case, the passing of a message is an extremely expensive operation; expensive in the sense that many CPU cycles in the sender and receiver are consumed by the process of sending, receiving, bracketing, verifying, and routing the message, CPU cycles that are therefore not available for other operations. A highly streamlined message-passing subsystem can typically require 10,000 to 20,000 CPU cycles or more. There are specific cases wherein the passing of a message requires significantly less overhead. However, none of these specific cases is adaptable to a general-purpose computer system.

Message-passing parallel processor systems have been offered commercially for years but have failed to capture significant market share because of poor performance and difficulty of programming for typical parallel applications. Message-passing parallel processor systems do have some advantages. In particular, because they share no resources, message-passing parallel processor systems are easier to provide with high-availability features. What is needed is a better approach to parallel processor systems.

There are alternatives to the passing of messages for closely-coupled cluster work. One such alternative is the use of shared memory for inter-processor communication.

Shared-memory systems have been much more successful at capturing market share than message-passing systems because of the dramatically superior performance of shared-memory systems, up to about four-processor systems. In Search of Clusters, Gregory F. Pfister, 2nd ed. (January 1998), Prentice Hall Computer Books, ISBN: 0138997098, describes a computing system with multiple processing nodes in which each processing node is provided with private, local memory and also has access to a range of memory which is shared with other processing nodes. The disclosure of this publication in its entirety is hereby expressly incorporated herein by reference for the purpose of indicating the background of the invention and illustrating the state of the art.

However, providing high availability for traditional shared-memory systems has proved to be an elusive goal. The nature of these systems, which share all code and all data, including that data which controls the shared operating systems, is incompatible with the separation normally required for high availability. What is needed is an approach to shared-memory systems that improves availability.

Although the use of shared memory for inter-processor communication is a well-known art, prior to the teachings of U.S. Ser. No. 09/273,430, filed March 19, 1999, entitled Shared Memory Apparatus and Method for Multiprocessing Systems, the processors shared a single copy of the operating system. The problem with such systems is that they cannot be efficiently scaled beyond four- to eight-way systems except in unusual circumstances. All known cases of said unusual circumstances are such that the systems are not good price-performance systems for general-purpose computing.

The entire contents of U.S. Patent Applications 09/273,430, filed March 19, 1999, and PCT/US00/01262, filed January 18, 2000, are hereby expressly incorporated by reference herein for all purposes. U.S. Ser. No. 09/273,430 improved upon the concept of shared memory by teaching the concept which will herein be referred to as a tight cluster. The concept of a tight cluster is that of a collection of individual computers, each with its own CPU(s), memory, I/O, and operating system, in which a portion of memory is shared by all the computers and serves as the means by which they can exchange information. U.S. Ser. No. 09/273,430 describes a system in which each processing node is provided with its own private copy of an operating system and in which the connection to shared memory is via a standard bus. The advantage of a tight cluster in comparison to an SMP (symmetric multiprocessor) is "scalability", which means that a much larger number of computers can be attached together via a tight cluster than via an SMP with little loss of processing efficiency. What is needed are improvements to the concept of the tight cluster.

What is also needed is an expansion of the concept of the tight cluster.

SUMMARY OF THE INVENTION

A goal of the invention is to simultaneously satisfy the above-discussed requirements of improving and expanding the tight cluster concept which, in the case of the prior art, are not satisfied.

One embodiment of the invention is based on a method, comprising: creating and assembling a message buffer in shared memory; parsing said message buffer for a plurality of pointer fields; and passing a message from a transmitting process to a receiving process by passing at least one pointer.

Another embodiment of the invention is based on an apparatus, comprising: a shared memory node; a first processing node coupled to said shared memory node; and a second processing node coupled to said shared memory node, wherein a message buffer is created and assembled in said shared memory node, said message buffer is parsed for a plurality of pointer fields; and a message from said first processing node to said second processing node is passed by at least one pointer.

Another embodiment of the invention is based on an electronic media, comprising: a computer program adapted to create and assemble a message buffer in shared memory; parse said message buffer for a plurality of pointer fields; and pass a message from a transmitting process to a receiving process by passing at least one pointer.

Another embodiment of the invention is based on a computer program comprising computer program means adapted to perform the steps of: creating and assembling a message buffer in shared memory; parsing said message buffer for a plurality of pointer fields; and passing a message from a transmitting process to a receiving process by passing at least one pointer when said computer program is run on a computer.

Another embodiment of the invention is based on a system, comprising a multiplicity of processors, each with some private memory and the multiplicity with some shared memory, interconnected and arranged such that memory accesses to a first set of address ranges will be to local, private memory whereas memory accesses to a second set of address ranges will be to shared memory, and configured so that MBUFs are constructed and connected within shared memory.

Another embodiment of the invention is based on a computer system comprising Operating System extensions to perform network I/O functions in a shared-memory environment, wherein said Operating System extensions perform the functions with Load and Store operations.

Another embodiment of the invention is based on a computer system comprising Operating System extensions to perform network I/O functions in a shared-memory environment, wherein said Operating System extensions transparently simulate standard networking protocols.

Another embodiment of the invention is based on an apparatus, comprising: a shared memory node; a first processing node coupled to said shared memory node; and a second processing node coupled to said shared memory node, wherein Operating System extensions perform network I/O functions in a shared-memory environment with Load and Store operations.

Another embodiment of the invention is based on an apparatus, comprising: a shared memory node; a first processing node coupled to said shared memory node; and a second processing node coupled to said shared memory node, wherein Operating System extensions perform network I/O functions in a shared-memory environment and transparently simulate standard networking protocols.

These, and other goals and embodiments of the invention will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawing. It should be understood, however, that the following description, while indicating preferred embodiments of the invention and numerous specific details thereof, is given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the invention without departing from the spirit thereof, and the invention includes all such modifications.

BRIEF DESCRIPTION OF THE DRAWING

A clear conception of the advantages and features constituting the invention, and of the components and operation of model systems provided with the invention, will become more readily apparent by referring to the exemplary, and therefore nonlimiting, embodiments illustrated in the drawing accompanying and forming a part of this specification, wherein like reference characters (if they occur in more than one view) designate the same parts. It should be noted that the features illustrated in the drawing are not necessarily drawn to scale.

FIG. 1 illustrates a block schematic view of a system, representing an embodiment of the invention.

DESCRIPTION OF PREFERRED EMBODIMENTS

The invention and the various features and advantageous details thereof are explained more fully with reference to the nonlimiting embodiments that are illustrated in the accompanying drawing and detailed in the following description of preferred embodiments. Descriptions of well known components and processing techniques are omitted so as not to unnecessarily obscure the invention in detail.

The teachings of U.S. Ser. No. 09/273,430 include a system which is a single entity; one large supercomputer. The invention is also applicable to a cluster of workstations, or even a network.

The invention is applicable to systems of the Pfister type or of the type of U.S. Ser. No. 09/273,430 in which each processing node has its own copy of an operating system. The invention is also applicable to other types of multiple processing node systems.

The context of the invention can include a tight cluster as described in U.S. Ser. No. 09/273,430. A tight cluster is defined as a cluster of workstations or an arrangement within a single, multiple-processor machine in which the processors are connected by a high-speed, low-latency interconnection, and in which some but not all memory is shared among the processors. Within the scope of a given processor, accesses to a first set of ranges of memory addresses will be to local, private memory but accesses to a second set of memory address ranges will be to shared memory. The significant advantage to a tight cluster in comparison to a message-passing cluster is that, assuming the environment has been appropriately established, the exchange of information involves a single STORE instruction by the sending processor and a subsequent single LOAD instruction by the receiving processor.
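The split between private and shared address ranges, and the one-store, one-load exchange, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the address ranges, the `MAILBOX` location, and the helper functions are hypothetical, and a Python `bytearray` merely stands in for memory that every node can reach.

```python
import struct

# Hypothetical address map; these ranges are illustrative, not from the text.
PRIVATE_RANGES = [(0x0000_0000, 0x3FFF_FFFF)]
SHARED_RANGES = [(0x4000_0000, 0x4000_FFFF)]
SHARED_BASE = SHARED_RANGES[0][0]

# One bytearray stands in for the memory that all nodes can reach.
shared_memory = bytearray(SHARED_RANGES[0][1] - SHARED_BASE + 1)


def is_shared(address):
    """Return True when the address falls inside a shared range."""
    return any(low <= address <= high for low, high in SHARED_RANGES)


def store_word(address, value):
    """Model a single STORE of a 32-bit word to shared memory."""
    if not is_shared(address):
        raise ValueError("address is private to the issuing node")
    struct.pack_into("<I", shared_memory, address - SHARED_BASE, value)


def load_word(address):
    """Model a single LOAD of a 32-bit word from shared memory."""
    if not is_shared(address):
        raise ValueError("address is private to the issuing node")
    return struct.unpack_from("<I", shared_memory, address - SHARED_BASE)[0]


# The sending node issues one store; the receiving node issues one load.
MAILBOX = 0x4000_0100
store_word(MAILBOX, 0xCAFE)
received = load_word(MAILBOX)
```

The sketch leaves out how the receiving node learns that a new value is present; the text treats that as part of establishing the environment.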

The establishment of the environment, taught by U.S. Ser. No. 09/273,430 and more fully by companion disclosures (U.S. Provisional Application Ser. No. 60/220,794, filed July 26, 2000; U.S. Provisional Application Ser. No. 60/220,748, filed July 26, 2000; WSGR 15245-711; WSGR 15245-712; WSGR 15245-713; WSGR 15245-715; WSGR 15245-716; WSGR 15245-717; WSGR 15245-718; and WSGR 15245-720, the entire contents of all of which are hereby expressly incorporated herein by reference for all purposes), can be performed in such a way as to require relatively little system overhead, and to be done once for a great many information exchanges. Therefore, a comparison of 10,000 instructions for message-passing to a pair of instructions for tight-clustering is valid.

In a shared-memory cluster, some memory is private to each processor and some memory is shared among the processors. This invention can include utilizing the shared memory for exchanging information among the processors. The information can be passed using existing network interfaces through shared memory. For a computing system in which multiple processing nodes share some memory, and where each node requires standard networking methods of interconnect, the invention can include the utilization of shared memory to achieve OS-transparent high-speed access to network resources used for interconnecting said nodes.

Within an MPP, a cluster of workstations, or even a network of more loosely coupled computers, there are generally used protocols for the sending of messages. These protocols assure robust delivery of messages over widely dispersed, often flimsy networks. The protocols used therein are therefore quite ill-suited to simple, secure intercommunication media, particularly to media with secure memory.

However, these protocols are so widespread that a vast array of applications and subsystems utilize them. The result is that most applications and subsystems provide well-defined interfaces, interfaces that have been developed to meet with these particular protocols.

These interfaces produce complex, slow transfer of data. Two basic kinds of complexities are introduced by these packages which result in reliable communications over unreliable, complex networks. However, these complexities cause complex, slow communications within shared memory. The first of these complexities is one that will here be called layering.

When a first node in a complex network develops a message to pass to a second node within the network, the standard packages generally require that a massive amount of "layering" of the communication subsystem occur. The message is transformed, step-by-step, from a simple buffer containing application-level data into a message suitable for network traffic. Each step is performed within a given layer of the communication subsystem, and these layers pass little information from the one to the next. The first layer provides a first transformation to the data and passes the transformed information to the second which transforms that information and passes the result to the third layer, and in similar fashion down through the layers. No layer passes information to the next describing anything about the incoming information nor about the transformation performed except for the basic information necessary to continue the process of preparation for transmission over a complex network. Therefore, only with difficulty can the final result be examined by an automated process to determine the original information submitted.

The second, companion complexity will here be called MBUFs. Not only does each layer transform and repackage the information from the layer above, but also each of several of the various layers breaks the information it receives from the layer above into finer and finer entities (MBUFs, or message buffers). For an automated process to locate and correlate the final MBUFs to the original application data is difficult.

U.S. Ser. No. 09/273,430 describes a computing system in which multiple processing nodes are provided; and in which each is provided with some local, private memory, and further in which all have access to a portion of memory which is shared. U.S. Ser. No. 09/273,430 teaches the sharing of memory via a standard bus. In U.S. Ser. No. 09/273,430 each processing node is provided with separate networking I/O implemented over standard serial media.

The invention can be used with the kind of systems taught by U.S. Ser. No. 09/273,430. The invention is also applicable to other architectures such as NUMA machines in which each processor or processor aggregation is provided with a separate network I/O facility.

The invention can be used in an environment as described in U.S. Ser. No. 09/273,430 where multiple computers selectively address a first set of memory address ranges which will be to private memory and a second set of memory ranges which will be to shared memory. The invention can also be used in an environment that includes a large number of existing packages for the interchange of data and which meet the interfaces required for using those packages. The invention can include emulating those packages while simultaneously achieving highly efficient, fast, reliable transfer of information within a shared-memory cluster, and while avoiding the need to completely rewrite the protocols.

The first step of the invention is to redirect an MBUF subsystem so that the MBUFs are created and assembled in shared memory. This is in contrast to assembling the MBUFs in the private memory of the processing node.

The second step is to parse each MBUF for the pointer fields and to pull out these pointers separately. The passing of a message can, therefore, consist only of the passing of a pointer to the receiving process; the pointer points to the head of the MBUF chain, and the receiving process can read the buffers by merely following the successive MBUF pointers. No movement of data is necessary, and no message is therefore physically passed. In this manner, a message of many megabytes can be passed merely by passing a one-word pointer.
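The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the layout of any actual MBUF subsystem: the header format (a 4-byte next offset and a 4-byte length), the `SharedArena` class, and the function names are all hypothetical, and offsets into a `bytearray` stand in for pointers into shared memory.

```python
import struct

# Hypothetical MBUF layout: 4-byte "next" offset, 4-byte payload length,
# then the payload itself. Offset 0 is reserved to mean "end of chain".
HEADER = struct.Struct("<II")
NULL = 0


class SharedArena:
    """A bytearray standing in for the shared memory region."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.free = HEADER.size  # skip offset 0 so it can serve as NULL

    def alloc_mbuf(self, payload):
        """Write one MBUF into the arena and return its offset."""
        offset = self.free
        end = offset + HEADER.size + len(payload)
        if end > len(self.memory):
            raise MemoryError("shared arena exhausted")
        HEADER.pack_into(self.memory, offset, NULL, len(payload))
        self.memory[offset + HEADER.size:end] = payload
        self.free = end
        return offset

    def link(self, offset, next_offset):
        """Set the next-pointer field of the MBUF at the given offset."""
        struct.pack_into("<I", self.memory, offset, next_offset)


def build_chain(arena, message, fragment_size):
    """Sender side: assemble the message as an MBUF chain in shared memory."""
    head = previous = NULL
    for start in range(0, len(message), fragment_size):
        current = arena.alloc_mbuf(message[start:start + fragment_size])
        if previous == NULL:
            head = current
        else:
            arena.link(previous, current)
        previous = current
    return head  # the one-word value handed to the receiver


def read_chain(arena, head):
    """Receiver side: follow the next-pointers starting from the head."""
    parts = []
    offset = head
    while offset != NULL:
        next_offset, length = HEADER.unpack_from(arena.memory, offset)
        body = offset + HEADER.size
        parts.append(bytes(arena.memory[body:body + length]))
        offset = next_offset
    return b"".join(parts)


arena = SharedArena(4096)
message = b"application data that spans several message buffers"
head = build_chain(arena, message, fragment_size=16)
reassembled = read_chain(arena, head)
```

Only `head` crosses from sender to receiver. The copy made inside `read_chain` exists so the demo can compare the result with the original; a receiver that consumes the buffers in place would not need it.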

The invention can be embodied outside the operating system but within shared memory to provide any node in a shared-memory computing system access to a shared memory, which is physically attached to another node. In a preferred embodiment, each processing node is provided with some local, private memory and a separate copy of the operating system. In this preferred embodiment, each of several nodes is provided with its own I/O channel to disks, networking adapters, and other I/O units. In this preferred embodiment, the operating system in each node is augmented with external extensions (not part of the operating system) that can reach shared memory and communicate to other nodes via Load and Store instructions to shared memory.

Referring to FIG. 1, the invention can include other extensions, called Network-Shared-Memory (NSM) extensions, which make use of primitives of Load and Store instructions and by which network communication functions, which originate in applications and Operating System internal functions, can be processed. The network I/O functions can be processed by the NSM extensions and be translated into shared-memory Load and Store instructions. In this way, the NSM extensions satisfy the Operating System I/O request transparently.

The NSM extensions can simulate the behavior of a standard networking medium, such as Ethernet, Token Ring, etc. Packet sizes appropriate to the medium can be supported transparently, allowing, for example, the large packet sizes of Token Ring to be exploited to minimize network protocol stack message fragmentation. The standardized behavior of Ethernet can be used for those application implementations requiring it.

A more specific preferred implementation will now be described. When an NSM node requires a standard networking message to be sent to another node, the NSM extensions will copy the message out to shared memory. The medium-appropriate destination address is then used as input into a hash function, which results in a table lookup returning the target node's address, as per the requirements of the primitives. To one skilled in the art, the network address to target address lookup is a straightforward implementation. The target node is then notified of the packet presence and its address in shared memory through mechanisms provided by Load and Store instructions. The target node may then indicate the data to the Operating System as appropriate, copying the shared-memory data as required.
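The send path just described (copy to shared memory, hashed lookup of the destination, notification of the target) might look roughly like the sketch below. All names are hypothetical. A Python `dict` plays the role of the hash table, plain lists stand in for per-node notification slots written and read with Store and Load, and the medium addresses are made-up Ethernet-style strings.

```python
# Hypothetical names throughout; a dict hashes its keys internally, so it
# serves here as the hash table keyed by the simulated medium address.
shared_packets = {}                       # shared-memory offset -> packet bytes
mailboxes = {1: [], 2: [], 3: []}         # per-node notification slots
address_table = {"00:00:5e:00:53:02": 2}  # medium address -> target node id
next_offset = 0


def copy_to_shared(packet):
    """Copy the outgoing packet into shared memory and return its offset."""
    global next_offset
    offset = next_offset
    shared_packets[offset] = bytes(packet)
    next_offset += len(packet)
    return offset


def send(destination, packet):
    """Look up the target node and notify it of the packet's location."""
    offset = copy_to_shared(packet)
    target = address_table.get(destination)
    if target is None:
        return None  # unknown destination; not handled in this sketch
    mailboxes[target].append(offset)  # stands in for a Store to the slot
    return target


def receive(node):
    """Target side: read the notification and fetch the packet data."""
    offset = mailboxes[node].pop(0)   # stands in for a Load from the slot
    return shared_packets[offset]


target = send("00:00:5e:00:53:02", b"hello node 2")
delivered = receive(target)
```

The notification carries only the packet's location; the payload stays in shared memory until the target chooses to copy it.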

If the target address is not in the lookup table, the sending node can use mechanisms provided by Load and Store instructions to broadcast to all nodes, ensuring that every node gets notified that the packet is present in shared memory. As each node examines the packet header, standard networking implementations dictate that it will ignore a message whose destination address it does not recognize.

When the receiving node issues a response to the network message, it will likewise be sent to all nodes via the broadcast mechanism previously described. Upon reception of notification of data in shared memory that matches the simulated destination address, the previous sending node (now the receiving node) can place the address in the hash table for lookup the next time a send is required. The hash table entries may be aged, that is, discarded after a period of time. The fixed-configuration nature of a group of processing nodes connected to shared memory, however, does not require this, as a standard networking interconnect implementation would. The operating-system-perceived implementation of the networking interconnect can remain, in all cases, transparent.
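The broadcast fallback, address learning, and optional aging can be sketched together. This is an illustration under assumptions, not the described implementation: the `Node` class, the packet tuple layout, the `MAX_AGE` value, and the address strings are hypothetical. It also simplifies one point: a node here learns the sender's address from any packet addressed to it, so a reply may already be sent directly, whereas the text above describes the reply as also being broadcast.

```python
import time

MAX_AGE = 300.0  # hypothetical lifetime of a learned entry, in seconds


class Node:
    """One processing node with its own learned-address table."""

    def __init__(self, node_id, address, cluster):
        self.node_id = node_id
        self.address = address
        self.cluster = cluster
        self.table = {}   # simulated address -> (node id, time learned)
        self.inbox = []   # payloads accepted by this node
        cluster[node_id] = self

    def lookup(self, destination, now):
        """Return the learned node id, or None on a miss or a stale entry."""
        entry = self.table.get(destination)
        if entry is None:
            return None
        node_id, learned = entry
        if now - learned > MAX_AGE:
            del self.table[destination]  # aged out
            return None
        return node_id

    def send(self, destination, payload, now=None):
        """Notify the known target, or every other node on a lookup miss."""
        now = time.monotonic() if now is None else now
        packet = (self.address, self.node_id, destination, payload)
        target = self.lookup(destination, now)
        if target is not None:
            targets = [target]
        else:
            targets = [n for n in self.cluster if n != self.node_id]
        for node_id in targets:
            self.cluster[node_id].notify(packet, now)
        return targets

    def notify(self, packet, now):
        """Examine the header; ignore packets addressed to someone else."""
        source_address, source_node, destination, payload = packet
        if destination != self.address:
            return
        self.table[source_address] = (source_node, now)  # learn the sender
        self.inbox.append(payload)


cluster = {}
a = Node(1, "addr-a", cluster)
b = Node(2, "addr-b", cluster)
c = Node(3, "addr-c", cluster)

first = a.send("addr-b", b"ping", now=0.0)      # miss: broadcast
reply = b.send("addr-a", b"pong", now=1.0)      # b learned a from the ping
second = a.send("addr-b", b"again", now=2.0)    # a learned b from the reply
stale = a.send("addr-b", b"late", now=1000.0)   # entry aged out: broadcast
```

Passing `now` explicitly keeps the aging behavior deterministic in the example; omitting it falls back to `time.monotonic()`.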

While not being limited to any particular performance indicator or diagnostic identifier, preferred embodiments of the invention can be identified one at a time by testing for the substantially highest performance. The test for the substantially highest performance can be carried out without undue experimentation by the use of a simple and conventional benchmark (speed) experiment.

The term substantially, as used herein, is defined as at least approaching a given state (e.g., preferably within 10% of, more preferably within 1% of, and most preferably within 0.1% of). The term coupled, as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically.

The term means, as used herein, is defined as hardware, firmware and/or software for achieving a result. The term program or phrase computer program, as used herein, is defined as a sequence of instructions designed for execution on a computer system. A program may include a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, and/or other sequence of instructions designed for execution on a computer system.

Practical Applications of the Invention

A practical application of the invention that has value within the technological arts is waveform transformation. Further, the invention is useful in conjunction with data input and transformation (such as are used for the purpose of speech recognition), or in conjunction with transforming the appearance of a display (such as are used for the purpose of video games), or the like. There are virtually innumerable uses for the invention, all of which need not be detailed here.

Advantages of the Invention

A system, representing an embodiment of the invention, can be cost effective and advantageous for at least the following reasons. The invention improves the speed of parallel computing systems. The invention improves the scalability of parallel computing systems.

All the disclosed embodiments of the invention described herein can be realized and practiced without undue experimentation. Although the best mode of carrying out the invention contemplated by the inventors is disclosed above, practice of the invention is not limited thereto. Accordingly, it will be appreciated by those skilled in the art that the invention may be practiced otherwise than as specifically described herein.

For example, although the network shared memory described herein can be a separate module, it will be manifest that the network shared memory may be integrated into the system with which it is associated. Furthermore, all the disclosed elements and features of each disclosed embodiment can be combined with, or substituted for, the disclosed elements and features of every other disclosed embodiment except where such elements or features are mutually exclusive.

It will be manifest that various additions, modifications and rearrangements of the features of the invention may be made without deviating from the spirit and scope of the underlying inventive concept. It is intended that the scope of the invention as defined by the appended claims and their equivalents cover all such additions, modifications, and rearrangements.

The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase "means for." Expedient embodiments of the invention are differentiated by the appended subclaims.

Claims

What is claimed is:
1. A method, comprising: creating and assembling a message buffer in shared memory; parsing said message buffer for a plurality of pointer fields; and passing a message from a transmitting process to a receiving process by passing at least one pointer.
2. The method of claim 1, wherein the pointer points to a head of an MBUF chain and the receiving process can read a plurality of message buffers by merely following successive message buffer pointers.
3. The method of claim 1, further comprising processing network I/O functions with network-shared-memory extensions to translate network I/O functions into shared-memory Load and Store instructions.
4. An apparatus, comprising: a shared memory node; a first processing node coupled to said shared memory node; and a second processing node coupled to said shared memory node, wherein a message buffer is created and assembled in said shared memory node, said message buffer is parsed for a plurality of pointer fields; and a message from said first processing node to said second processing node is passed by at least one pointer.
5. A computer system, comprising the apparatus of claim 4.
6. An electronic media, comprising: a computer program adapted to create and assemble a message buffer in shared memory; parse said message buffer for a plurality of pointer fields; and pass a message from a transmitting process to a receiving process by passing at least one pointer.
7. A computer program comprising computer program means adapted to perform the steps of creating and assembling a message buffer in shared memory; parsing said message buffer for a plurality of pointer fields; and passing a message from a transmitting process to a receiving process by passing at least one pointer when said computer program is run on a computer.
8. A computer program as claimed in claim 7, embodied on a computer-readable medium.
9. A system, comprising a multiplicity of processors, each with some private memory and the multiplicity with some shared memory, interconnected and arranged such that memory accesses to a first set of address ranges will be to local, private memory whereas memory accesses to a second set of address ranges will be to shared memory, and configured so that MBUFs are constructed and connected within shared memory.
10. The system of claim 9, wherein the system is configured so that the pointers from each buffer to the next are identified, and the offset within each to user data is specified.
11. The system of claim 10, wherein passing of the message includes passing a pointer to the first MBUF.
12. The system of claim 9, wherein each processor is provided with a pool of outgoing MBUFs within shared memory and uses MBUFs from that pool to send messages to other processors.
13. The system of claim 9, wherein all processors are provided with a common pool of outgoing MBUFs within shared memory and use MBUFs from that pool to send messages to other processors.
14. A computer system comprising Operating System extensions to perform network I/O functions in a shared-memory environment, wherein said Operating System extensions perform the functions with Load and Store operations.
15. A computer system comprising Operating System extensions to perform network I/O functions in a shared-memory environment, wherein said Operating System extensions transparently simulate standard networking protocols.
16. An apparatus, comprising: a shared memory node; a first processing node coupled to said shared memory node; and a second processing node coupled to said shared memory node, wherein Operating System extensions perform network I/O functions in a shared-memory environment with Load and Store operations.
17. An apparatus, comprising: a shared memory node; a first processing node coupled to said shared memory node; and a second processing node coupled to said shared memory node, wherein Operating System extensions perform network I/O functions in a shared-memory environment and transparently simulate standard networking protocols.
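
The pointer-passing scheme recited in claims 1, 2 and 9 through 13 can be illustrated with a minimal C sketch. It is not taken from the specification: the names (nsm_region, nsm_mbuf, nsm_send, nsm_receive, nsm_release), the pool size and the payload size are all hypothetical, the shared memory node is stood in for by an ordinary struct, links are stored as pool indices rather than raw addresses, and the synchronization that concurrent nodes would need is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NSM_MBUF_COUNT   8           /* hypothetical pool size */
#define NSM_MBUF_PAYLOAD 16          /* hypothetical user bytes per MBUF */
#define NSM_NIL          UINT32_MAX  /* end-of-chain marker */

/* One message buffer. The link is a pool index, not an address, so it stays
 * valid if nodes map the region at different addresses (an assumption). */
struct nsm_mbuf {
    uint32_t next;      /* pointer field: next MBUF in the chain, or NSM_NIL */
    uint32_t data_off;  /* offset from the start of this MBUF to user data */
    uint32_t data_len;  /* bytes of user data held in this MBUF */
    uint8_t  payload[NSM_MBUF_PAYLOAD];
};

/* Stand-in for the shared memory node: a common pool and its free list. */
struct nsm_region {
    uint32_t free_head;
    struct nsm_mbuf pool[NSM_MBUF_COUNT];
};

/* Link every MBUF into the free list. */
void nsm_init(struct nsm_region *r) {
    for (uint32_t i = 0; i < NSM_MBUF_COUNT; i++)
        r->pool[i].next = (i + 1 < NSM_MBUF_COUNT) ? i + 1 : NSM_NIL;
    r->free_head = 0;
}

/* Return a whole chain to the free list. */
void nsm_release(struct nsm_region *r, uint32_t head) {
    while (head != NSM_NIL) {
        uint32_t next = r->pool[head].next;
        r->pool[head].next = r->free_head;
        r->free_head = head;
        head = next;
    }
}

/* Transmitting side: assemble the message as an MBUF chain inside the
 * region and return the head index, which is the only thing handed to the
 * receiver. Returns NSM_NIL if the pool runs out. */
uint32_t nsm_send(struct nsm_region *r, const void *msg, size_t len) {
    const uint8_t *src = msg;
    uint32_t head = NSM_NIL, tail = NSM_NIL;
    do {
        uint32_t idx = r->free_head;
        if (idx == NSM_NIL) {            /* pool exhausted: undo partial chain */
            nsm_release(r, head);
            return NSM_NIL;
        }
        struct nsm_mbuf *m = &r->pool[idx];
        r->free_head = m->next;
        size_t n = len < NSM_MBUF_PAYLOAD ? len : NSM_MBUF_PAYLOAD;
        m->next = NSM_NIL;
        m->data_off = (uint32_t)offsetof(struct nsm_mbuf, payload);
        m->data_len = (uint32_t)n;
        if (n > 0)
            memcpy(m->payload, src, n);
        if (tail == NSM_NIL) head = idx; else r->pool[tail].next = idx;
        tail = idx;
        src += n;
        len -= n;
    } while (len > 0);
    return head;
}

/* Receiving side: follow successive links from the head and gather the user
 * data. Stops early if out is too small. Returns the bytes copied. */
size_t nsm_receive(const struct nsm_region *r, uint32_t head,
                   void *out, size_t cap) {
    uint8_t *dst = out;
    size_t total = 0;
    for (uint32_t i = head; i != NSM_NIL; i = r->pool[i].next) {
        assert(i < NSM_MBUF_COUNT);
        const struct nsm_mbuf *m = &r->pool[i];
        if (total + m->data_len > cap)
            break;
        memcpy(dst + total, (const uint8_t *)m + m->data_off, m->data_len);
        total += m->data_len;
    }
    return total;
}
```

In this sketch a sender would call nsm_send and hand only the returned head index to the receiver, which calls nsm_receive and then nsm_release. The single free list corresponds loosely to the common pool of claim 13; a per-processor pool as in claim 12 would keep one such list per sender.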
PCT/US2000/024248 1999-08-31 2000-08-31 Network shared memory WO2001016742A2 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US15215199P 1999-08-31 1999-08-31
US60/152,151 1999-08-31
US60/220,794 2000-07-25
US22097400P 2000-07-26 2000-07-26
US22074800P 2000-07-26 2000-07-26
US60/220,748 2000-07-26

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU71100/00A AU7110000A (en) 1999-08-31 2000-08-31 Network shared memory

Publications (2)

Publication Number Publication Date
WO2001016742A2 (en) 2001-03-08
WO2001016742A3 (en) 2001-09-20

Family

ID=27387201

Family Applications (9)

Application Number Title Priority Date Filing Date
PCT/US2000/024147 WO2001016737A2 (en) 1999-08-31 2000-08-31 Cache-coherent shared-memory cluster
PCT/US2000/024216 WO2001016761A2 (en) 1999-08-31 2000-08-31 Efficient page allocation
PCT/US2000/024217 WO2001016741A2 (en) 1999-08-31 2000-08-31 Semaphore control of shared-memory
PCT/US2000/024248 WO2001016742A2 (en) 1999-08-31 2000-08-31 Network shared memory
PCT/US2000/024298 WO2001016743A2 (en) 1999-08-31 2000-08-31 Shared memory disk
PCT/US2000/024329 WO2001016750A2 (en) 1999-08-31 2000-08-31 High-availability, shared-memory cluster
PCT/US2000/024150 WO2001016738A2 (en) 1999-08-31 2000-08-31 Efficient page ownership control
PCT/US2000/024210 WO2001016740A2 (en) 1999-08-31 2000-08-31 Efficient event waiting
PCT/US2000/024039 WO2001016760A1 (en) 1999-08-31 2000-08-31 Switchable shared-memory cluster

Family Applications Before (3)

Application Number Title Priority Date Filing Date
PCT/US2000/024147 WO2001016737A2 (en) 1999-08-31 2000-08-31 Cache-coherent shared-memory cluster
PCT/US2000/024216 WO2001016761A2 (en) 1999-08-31 2000-08-31 Efficient page allocation
PCT/US2000/024217 WO2001016741A2 (en) 1999-08-31 2000-08-31 Semaphore control of shared-memory

Family Applications After (5)

Application Number Title Priority Date Filing Date
PCT/US2000/024298 WO2001016743A2 (en) 1999-08-31 2000-08-31 Shared memory disk
PCT/US2000/024329 WO2001016750A2 (en) 1999-08-31 2000-08-31 High-availability, shared-memory cluster
PCT/US2000/024150 WO2001016738A2 (en) 1999-08-31 2000-08-31 Efficient page ownership control
PCT/US2000/024210 WO2001016740A2 (en) 1999-08-31 2000-08-31 Efficient event waiting
PCT/US2000/024039 WO2001016760A1 (en) 1999-08-31 2000-08-31 Switchable shared-memory cluster

Country Status (4)

Country Link
EP (3) EP1214653A2 (en)
AU (9) AU7108500A (en)
CA (3) CA2382927A1 (en)
WO (9) WO2001016737A2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6920485B2 (en) 2001-10-04 2005-07-19 Hewlett-Packard Development Company, L.P. Packet processing in shared memory multi-computer systems
US6999998B2 (en) 2001-10-04 2006-02-14 Hewlett-Packard Development Company, L.P. Shared memory coupling of network infrastructure devices
EP1895413A3 (en) * 2006-08-18 2009-09-30 Fujitsu Limited Access monitoring method and device for shared memory

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005504461A (en) * 2001-07-13 2005-02-10 Koninklijke Philips Electronics N.V. How to perform media applications and media system using a job control
US7254745B2 (en) 2002-10-03 2007-08-07 International Business Machines Corporation Diagnostic probe management in data processing systems
US7685381B2 (en) 2007-03-01 2010-03-23 International Business Machines Corporation Employing a data structure of readily accessible units of memory to facilitate memory access
US7899663B2 (en) 2007-03-30 2011-03-01 International Business Machines Corporation Providing memory consistency in an emulated processing environment
US9442780B2 (en) * 2011-07-19 2016-09-13 Qualcomm Incorporated Synchronization of shader operation
US9064437B2 (en) 2012-12-07 2015-06-23 Intel Corporation Memory based semaphores
CN103608792B (en) * 2013-05-28 2016-03-09 Huawei Technologies Co., Ltd. Method and system for supporting resource isolation under a multi-core architecture

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0551242A2 (en) * 1992-01-10 1993-07-14 Digital Equipment Corporation Multiprocessor buffer system
EP0592117A2 (en) * 1992-09-24 1994-04-13 AT&T Corp. Asynchronous inter-process communications arrangement

Family Cites Families (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3668644A (en) * 1970-02-09 1972-06-06 Burroughs Corp Failsafe memory system
US4484262A (en) * 1979-01-09 1984-11-20 Sullivan Herbert W Shared memory computer method and apparatus
US4403283A (en) * 1980-07-28 1983-09-06 Ncr Corporation Extended memory system and method
US4414624A (en) * 1980-11-19 1983-11-08 The United States Of America As Represented By The Secretary Of The Navy Multiple-microcomputer processing
US4725946A (en) * 1985-06-27 1988-02-16 Honeywell Information Systems Inc. P and V instructions for semaphore architecture in a multiprogramming/multiprocessing environment
JPH063589B2 (en) * 1987-10-29 1994-01-12 International Business Machines Corporation Address replacement device
US5175839A (en) * 1987-12-24 1992-12-29 Fujitsu Limited Storage control system in a computer system for double-writing
DE68925064T2 (en) * 1988-05-26 1996-08-08 Hitachi Ltd Task execution control method for a multiprocessor system with post / Waiting procedure
US4992935A (en) * 1988-07-12 1991-02-12 International Business Machines Corporation Bit map search by competitive processors
US4965717B1 (en) * 1988-12-09 1993-05-25 Tandem Computers Inc
DE69124285D1 (en) * 1990-05-18 1997-03-06 Fujitsu Ltd Data processing system having an input / output paths separating mechanism and method for controlling the data processing system
US5206952A (en) * 1990-09-12 1993-04-27 Cray Research, Inc. Fault tolerant networking architecture
US5434970A (en) * 1991-02-14 1995-07-18 Cray Research, Inc. System for distributed multiprocessor communication
JPH04271453A (en) * 1991-02-27 1992-09-28 Toshiba Corp Composite electronic computer
DE69227956T2 (en) * 1991-07-18 1999-06-10 Tandem Computers Inc Multiprocessor system having mirrored memory
US5398331A (en) * 1992-07-08 1995-03-14 International Business Machines Corporation Shared storage controller for dual copy shared data
DE4238593A1 (en) * 1992-11-16 1994-05-19 Ibm Multiprocessor computer system
JP2963298B2 (en) * 1993-03-26 1999-10-18 Fujitsu Limited Recovery method and computer system for exclusive control instructions in dual shared memory
US5590308A (en) * 1993-09-01 1996-12-31 International Business Machines Corporation Method and apparatus for reducing false invalidations in distributed systems
US5664089A (en) * 1994-04-26 1997-09-02 Unisys Corporation Multiple power domain power loss detection and interface disable
US5636359A (en) * 1994-06-20 1997-06-03 International Business Machines Corporation Performance enhancement system and method for a hierarchical data cache using a RAID parity scheme
US6587889B1 (en) * 1995-10-17 2003-07-01 International Business Machines Corporation Junction manager program object interconnection and method
US5940870A (en) * 1996-05-21 1999-08-17 Industrial Technology Research Institute Address translation for shared-memory multiprocessor clustering
US5784699A (en) * 1996-05-24 1998-07-21 Oracle Corporation Dynamic memory allocation in a computer using a bit map index
JPH10142298A (en) * 1996-11-15 1998-05-29 Advantest Corp Testing device for ic device
US5829029A (en) * 1996-12-18 1998-10-27 Bull Hn Information Systems Inc. Private cache miss and access management in a multiprocessor system with shared memory
US5918248A (en) * 1996-12-30 1999-06-29 Northern Telecom Limited Shared memory control algorithm for mutual exclusion and rollback
US6360303B1 (en) * 1997-09-30 2002-03-19 Compaq Computer Corporation Partitioning memory shared by multiple processors of a distributed processing system
EP0908825B1 (en) * 1997-10-10 2002-09-04 Bull S.A. A data-processing system with cc-NUMA (cache coherent, non-uniform memory access) architecture and remote access cache incorporated in local memory

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Page info" NETSCAPE SCREENSHOT, 23 March 2001 (2001-03-23), XP002163812 *
"SHARED MEMORY CLUSTER A SCALABLE MULTIPROCESSOR DESIGN" IBM TECHNICAL DISCLOSURE BULLETIN,US,IBM CORP. NEW YORK, vol. 37, no. 6A, 1 June 1994 (1994-06-01), pages 503-507, XP000455862 ISSN: 0018-8689 *
PAUL J. CHRISTENSEN <PAULC@LL.MIT.EDU>, DANIEL J. VAN HOOK <DVANHOOK@LL.MIT.EDU>, HARRY M. WOLFSON <HARRYWOLFSON@LL.MIT.EDU>: "HLA RTI Shared Memory Communication" INTERNET DOCUMENT, [Online] 15 April 1999 (1999-04-15), XP002163805 Lexington, Massachusetts, United States of America Retrieved from the Internet: <URL:http://dss.ll.mit.edu/dss.web/99S-SIW -090.html > [retrieved on 2001-03-21] *
PAUL R. WILSON: "POINTER SWIZZLING AT PAGE FAULT TIME: EFFICIENTLY SUPPORTING HUGE ADDRESS SPACES ON STANDARD HARDWARE" COMPUTER ARCHITECTURE NEWS,ASSOCIATION FOR COMPUTING MACHINERY, NEW YORK,US, vol. 19, no. 4, 1 June 1991 (1991-06-01), pages 6-13, XP000228934 ISSN: 0163-5964 *

Also Published As

Publication number Publication date
WO2001016750A3 (en) 2002-01-17
AU7112100A (en) 2001-03-26
AU6949600A (en) 2001-03-26
EP1214651A2 (en) 2002-06-19
AU7108300A (en) 2001-03-26
AU6949700A (en) 2001-03-26
WO2001016761A3 (en) 2001-12-27
AU7113600A (en) 2001-03-26
WO2001016737A3 (en) 2001-11-08
CA2382929A1 (en) 2001-03-08
WO2001016741A2 (en) 2001-03-08
CA2382728A1 (en) 2001-03-08
WO2001016738A3 (en) 2001-10-04
WO2001016750A2 (en) 2001-03-08
WO2001016743A3 (en) 2001-08-09
WO2001016742A3 (en) 2001-09-20
WO2001016743A8 (en) 2001-10-18
AU7100700A (en) 2001-03-26
WO2001016738A9 (en) 2002-09-12
CA2382927A1 (en) 2001-03-08
WO2001016738A2 (en) 2001-03-08
WO2001016760A1 (en) 2001-03-08
WO2001016737A2 (en) 2001-03-08
WO2001016740A3 (en) 2001-12-27
WO2001016741A3 (en) 2001-09-20
EP1214653A2 (en) 2002-06-19
EP1214652A2 (en) 2002-06-19
AU7474200A (en) 2001-03-26
AU7108500A (en) 2001-03-26
WO2001016761A2 (en) 2001-03-08
WO2001016743A2 (en) 2001-03-08
WO2001016738A8 (en) 2001-05-03
WO2001016740A2 (en) 2001-03-08
AU7110000A (en) 2001-03-26

Similar Documents

Publication Publication Date Title
Tanenbaum Distributed operating systems
US8019902B2 (en) Network adapter with shared database for message context information
US7257616B2 (en) Network switch and components and method of operation
Prylli et al. BIP: a new protocol designed for high performance networking on myrinet
KR940000177B1 (en) Multiprocessor interrupts rerouting mechanism
CN100579108C (en) Method for remote key validation and host computer structure adapter
JP4150336B2 (en) Configured to create multiple virtual queue pairs from the compressed queue pair based on the shared attributes
US5802366A (en) Parallel I/O network file server architecture
CN1212574C (en) End node partitioning using local identifiers
US6460120B1 (en) Network processor, memory organization and methods
Kaiserwerth The parallel protocol engine
JP3872342B2 (en) Apparatus and scalable network processor for network
US6948004B2 (en) Host-fabric adapter having work queue entry (WQE) ring hardware assist (HWA) mechanism
US8458280B2 (en) Apparatus and method for packet transmission over a high speed network supporting remote direct memory access operations
CN100470517C (en) System and method for information handling system PCI express advanced switching
CA2509404C (en) Using direct memory access for performing database operations between two or more machines
US7013353B2 (en) Host-fabric adapter having an efficient multi-tasking pipelined instruction execution micro-controller subsystem
US7996569B2 (en) Method and system for zero copy in a virtualized network environment
KR100437146B1 (en) Intelligent network interface device and system for accelerating communication
US20030191838A1 (en) Distributed intelligent virtual server
US7757232B2 (en) Method and apparatus for implementing work request lists
CN100375469C (en) Method and device for emulating multiple logic port on a physical poet
US5619650A (en) Network processor for transforming a message transported from an I/O channel to a network by adding a message identifier and then converting the message
US6836808B2 (en) Pipelined packet processing
KR100773013B1 (en) Method and Apparatus for controlling flow of data between data processing systems via a memory

Legal Events

Date Code Title Description
AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US US US UZ VN YU ZA ZW

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US US US UZ VN YU ZA ZW

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)