WO2009035690A2 - Memory sharing and data distribution - Google Patents
Memory sharing and data distribution
- Publication number
- WO2009035690A2 (PCT/US2008/010710)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- remote
- memory
- computing
- computing device
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Definitions
- the present invention is directed to the use of shared and distributed memory in a parallel processing network.
- Various implementations of the invention may have particular application to the use of shared and distributed memory to store design data for use by an electronic design automation software tool operating in a parallel processing network.
- EDA electronic design automation
- one portion of the design such as a semiconductor gate in a first area of a microcircuit
- another portion of the design such as a wiring line in a second area of the microcircuit.
- Some electronic design automation operations, such as those defining a minimum width check of a structure, can thus be executed by one computer for the gate while another computer executes the same operations for the wiring line.
- although parallel processing has substantially decreased the time required to execute electronic design automation software tools, many of these tools still require a large amount of system memory.
- an electronic design automation software tool may initiate several instantiations of a computing process on a single "master" computing device, so that each instantiation can execute operations in parallel with the others.
- an electronic design automation software tool may instantiate several copies of a computing process on multiple "remote" computing devices, again so that each instantiation may execute operations in parallel with the others.
- each instantiation will need its own copy of the data required to perform its assigned operations. For example, if a parallel processing system is being used to implement an electronic design automation software tool computing process for processing layout design data, data for a single layer of an integrated circuit design may be duplicated several times in a master computing device's memory, for use by multiple instantiations of the computing process. Still further, the data may be duplicated on a plurality of remote computing devices as well, for use by instantiations of the computing process on those remote computing devices.
- aspects of the invention are related to techniques for allowing multiple instantiations of a software tool computing process (such as a computing process of an electronic design automation software tool) operating on a single master computing device to share data stored in the memory of that master computing device.
- aspects of the invention are directed to allowing multiple instantiations of a software tool computing process, operating on multiple remote computing devices, to employ data distributed among those remote computing devices.
- multiple copies of a software tool computing process are instantiated on a master computing device.
- Each instantiation of the computing process includes or otherwise employs a data block manager configured to request data needed by the computing process.
- a primary instantiation of the computing process also includes or otherwise employs a data block resource manager that can save data to and retrieve data from a memory of the master computing device.
- the memory may be "fast" memory provided by an integrated circuit memory device operating adjacent to a processing unit of the master computing device, a “slow” memory provided by a magnetic or optical disk, or some combination thereof.
- When an instantiation of a computing process needs to access data, it instructs its associated data block manager to request the data from the data block resource manager. In response, the data block resource manager retrieves the data from the memory of the master computing device, and provides it to the requesting data block manager for use by the computing process. Similarly, if a computing process needs to save data, it instructs its associated data block manager to provide the data to the data block resource manager. In response, the data block resource manager saves the data in the memory of the master computing device.
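The save/retrieve flow described above can be sketched as follows. This is an illustrative sketch only; the class and method names (DataBlockResourceManager, DataBlockManager, request_data, provide_data) are invented for this example and do not come from the patent.

```python
class DataBlockResourceManager:
    """Held by the primary instantiation; saves data to and retrieves data
    from the master computing device's memory (here, a simple dict)."""
    def __init__(self):
        self._memory = {}

    def save(self, key, data):
        self._memory[key] = data

    def retrieve(self, key):
        return self._memory[key]


class DataBlockManager:
    """Included in each instantiation; forwards its computing process's
    data requests to the shared data block resource manager."""
    def __init__(self, resource_manager):
        self._rm = resource_manager

    def request_data(self, key):
        return self._rm.retrieve(key)

    def provide_data(self, key, data):
        self._rm.save(key, data)


# One resource manager, shared by every instantiation's data block manager,
# so the data is stored once rather than duplicated per instantiation:
rm = DataBlockResourceManager()
managers = [DataBlockManager(rm) for _ in range(4)]
managers[0].provide_data("layer_0", b"geometry...")
print(managers[3].request_data("layer_0"))   # b'geometry...'
```

Because every manager talks to the same resource manager, data saved by one instantiation is immediately visible to all of the others without copying.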
- one or more versions of the software tool computing process are instantiated on remote computing devices separate from the master computing device.
- Each remote instantiation of the computing process includes or otherwise employs a remote data client.
- a remote data server (or remote data block resource manager) is instantiated on each remote computing device.
- the remote data server can save data to and retrieve data from the memory of its remote computing device.
- the memory for the remote computing device may be "fast" memory provided by an integrated circuit memory device operating adjacent to a processing unit of the remote computing device, a "slow” memory provided by a magnetic or optical disk, or some combination thereof.
- When a remote instantiation of the computing process needs to access data, it instructs its associated remote data client to request the data. If the data is being managed by another remote computing process, then the remote data client will request the data from the remote data server associated with that remote computing process. Alternately, if the data is being managed by a computing process on the master computing device, then the remote data client will request the data from the data block manager associated with that master computing process. As discussed above, the data block manager may in turn obtain the data through the data block resource manager associated with the primary computing process on the master computing device. Similarly, if a remote computing process needs to save data, it instructs its associated remote data client to provide the data to the appropriate remote data server or data block resource manager for storage in the corresponding memory. With various examples of the invention, the data block managers, remote data clients and remote data servers may be interconnected by an electronic communications network using a suitable networking communication protocol, such as the Transmission Control Protocol/Internet Protocol (TCP/IP) or the User Datagram Protocol.
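A remote data client requesting a block from a remote data server over TCP might look like the sketch below. The one-line request format ("GET &lt;key&gt;") and all names are invented for this example; the patent does not specify a wire protocol beyond naming TCP/IP and UDP.

```python
import socket
import threading

# Data blocks managed by this remote data server (contents are invented):
STORE = {"block_7": b"layout geometry bytes"}

ready = threading.Event()
srv = socket.create_server(("127.0.0.1", 0))   # OS picks a free port
port = srv.getsockname()[1]

def serve_one_request():
    """Remote data server: answer a single 'GET <key>' request."""
    ready.set()
    conn, _ = srv.accept()
    with conn:
        request = conn.recv(1024).decode().strip()   # e.g. "GET block_7"
        key = request.split(" ", 1)[1]
        conn.sendall(STORE.get(key, b""))

t = threading.Thread(target=serve_one_request)
t.start()
ready.wait()

# Remote data client: request the block over the network connection.
with socket.create_connection(("127.0.0.1", port)) as c:
    c.sendall(b"GET block_7\n")
    data = c.recv(4096)
t.join()
srv.close()
print(data)   # b'layout geometry bytes'
```

The same client-side logic would apply whether the responder is a remote data server on another slave computer or a data block manager on the master computer.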
- FIG. 1 is a schematic diagram of a multi-processor computer linked with a network of single-processor computers as may be employed by various embodiments of the invention.
- FIG. 2 is a schematic diagram of a processor unit for a computer that may be employed by various embodiments of the invention.
- Various embodiments of the invention relate to tools and methods for distributing segments of data among multiple computing threads for conversion from a first format to a second format. Accordingly, aspects of some embodiments of the invention have particular application to the distribution of data segments among a computing network including at least one multi-processor master computer and a plurality of single-processor slave computers. To better facilitate an understanding of these implementations, an example of a network having a multi-processor master computer linked to a plurality of single-processor slave computers will be discussed.
- a software tool such as an electronic design automation software tool
- these implementations of a software tool may be realized using computer-executable software instructions executed by one or more programmable computing devices. Because these examples of the invention may be implemented using software instructions, the components and operation of a generic programmable computer system on which various embodiments of the invention may be employed will first be described. More particularly, the components and operation of a computer network having a host or master computer and one or more remote or slave computers will be described with reference to Figure 1. This operating environment is only one example of a suitable operating environment, however, and is not intended to suggest any limitation as to the scope of use or functionality of the invention.
- the master computer 101 is a multi-processor computer that includes a plurality of input and output devices 103 and a memory 105.
- the input and output devices 103 may include any device for receiving input data from or providing output data to a user.
- the input devices may include, for example, a keyboard, microphone, scanner or pointing device for receiving input from a user.
- the output devices may then include a display monitor, speaker, printer or tactile feedback device.
- the memory 105 may similarly be implemented using any combination of computer readable media that can be accessed by the master computer 101.
- the computer readable media may include, for example, microcircuit memory devices such as read-write memory (RAM), read-only memory (ROM), electronically erasable and programmable read-only memory (EEPROM) or flash memory microcircuit devices, CD-ROM disks, digital video disks (DVD), or other optical storage devices.
- the computer readable media may also include magnetic cassettes, magnetic tapes, magnetic disks or other magnetic storage devices, punched media, holographic storage devices, or any other medium that can be used to store desired information.
- the master computer 101 runs a software application for performing one or more operations according to various examples of the invention.
- the memory 105 stores software instructions 107 A that, when executed, will implement a software application for performing one or more operations.
- the memory 105 also stores data 107B to be used with the software application.
- the data 107B contains process data that the software application uses to perform the operations, at least some of which may be performed in parallel.
- the master computer 101 also includes a plurality of processor units 109 and an interface device 111.
- the processor units 109 may be any type of processor device that can be programmed to execute the software instructions 107A, but will conventionally be a microprocessor device.
- one or more of the processor units 109 may be commercially generic programmable microprocessors, such as Intel® Pentium® or Xeon™ microprocessors, Advanced Micro Devices Athlon™ microprocessors or Motorola 68K/Coldfire® microprocessors.
- one or more of the processor units 109 may be custom-manufactured processors, such as microprocessors designed to optimally perform specific types of mathematical operations.
- the interface device 111, the processor units 109, the memory 105 and the input/output devices 103 are connected together by a bus 113.
- the master computing device 101 may employ one or more processing units 109 having more than one processor core.
- Figure 2 illustrates an example of a multi-core processor unit 109 that may be employed with various embodiments of the invention.
- the processor unit 109 includes a plurality of processor cores 201.
- Each processor core 201 includes a computing engine 203 and a memory cache 205.
- a computing engine contains logic devices for performing various computing functions, such as fetching software instructions and then performing the actions specified in the fetched instructions. These actions may include, for example, adding, subtracting, multiplying, and comparing numbers, performing logical operations such as AND, OR, NOR and XOR, and retrieving data.
- Each computing engine 203 may then use its corresponding memory cache 205 to quickly store and retrieve data and/or instructions for execution.
- Each processor core 201 is connected to an interconnect 207.
- the particular construction of the interconnect 207 may vary depending upon the architecture of the processor unit 201. With some processor units 201, such as the Cell microprocessor created by Sony Corporation, Toshiba Corporation and IBM Corporation, the interconnect 207 may be implemented as an interconnect bus. With other processor units 201, however, such as the Opteron™ and Athlon™ dual-core processors available from Advanced Micro Devices of Sunnyvale, California, the interconnect 207 may be implemented as a system request interface device. In any case, the processor cores 201 communicate through the interconnect 207 with an input/output interface 209 and a memory controller 211.
- the input/output interface 209 provides a communication interface between the processor unit 201 and the bus 113.
- the memory controller 211 controls the exchange of information between the processor unit 201 and the system memory 107.
- the processor units 201 may include additional components, such as a high-level cache memory shared by the processor cores 201.
- While FIG. 2 shows one illustration of a processor unit 201 that may be employed by some embodiments of the invention, it should be appreciated that this illustration is representative only, and is not intended to be limiting.
- some embodiments of the invention may employ a master computer 101 with one or more Cell processors.
- the Cell processor employs multiple input/output interfaces 209 and multiple memory controllers 211.
- the Cell processor has nine different processor cores 201 of different types. More particularly, it has eight synergistic processor elements (SPEs) and a power processor element (PPE).
- SPEs synergistic processor elements
- PPE power processor element
- Each synergistic processor element has a vector-type computing engine 203 with 128 x 128 bit registers, four single-precision floating point computational units, four integer computational units, and a 256KB local store memory that stores both instructions and data.
- the power processor element then controls the tasks performed by the synergistic processor elements. Because of its configuration, the Cell processor can perform some mathematical operations, such as the calculation of fast Fourier transforms (FFTs), at substantially higher speeds than many conventional processors.
- FFTs fast Fourier transforms
- the interface device 111 allows the master computer 101 to communicate with the slave computers 115A, 115B, 115C...115x through a communication interface.
- the communication interface may be any suitable type of interface including, for example, a conventional wired network connection or an optically transmissive wired network connection.
- the communication interface may also be a wireless connection, such as a wireless optical connection, a radio frequency connection, an infrared connection, or even an acoustic connection.
- the interface device 111 translates data and control signals from the master computer 101 and each of the slave computers 115 into network messages according to one or more communication protocols, such as the transmission control protocol (TCP), the user datagram protocol (UDP), and the Internet protocol (IP).
- TCP transmission control protocol
- UDP user datagram protocol
- IP Internet protocol
- Each slave computer 115 may include a memory 117, a processor unit 119, an interface device 121, and, optionally, one or more input/output devices 123 connected together by a system bus 125.
- the optional input/output devices 123 for the slave computers 115 may include any conventional input or output devices, such as keyboards, pointing devices, microphones, display monitors, speakers, and printers.
- the processor units 119 may be any type of conventional or custom-manufactured programmable processor device.
- one or more of the processor units 119 may be commercially generic programmable microprocessors, such as Intel® Pentium® or Xeon™ microprocessors, Advanced Micro Devices Athlon™ microprocessors or Motorola 68K/Coldfire® microprocessors.
- one or more of the processor units 119 may be custom-manufactured processors, such as microprocessors designed to optimally perform specific types of mathematical operations.
- one or more of the processor units 119 may have more than one core, as described with reference to Figure 2 above.
- one or more of the processor units 119 may be a Cell processor.
- the memory 117 then may be implemented using any combination of the computer readable media discussed above.
- the interface devices 121 allow the slave computers 115 to communicate with the master computer 101 over the communication interface.
- the master computer 101 is a multi-processor unit computer with multiple processor units 109, while each slave computer 115 has a single processor unit 119. It should be noted, however, that alternate implementations of the invention may employ a master computer having a single processor unit 109. Further, one or more of the slave computers 115 may have multiple processor units 119, depending upon their intended use. Also, while only a single interface device 111 is illustrated for the host computer 101, it should be noted that, with alternate embodiments of the invention, the computer 101 may use two or more different interface devices 111 for communicating with the remote computers 115 over multiple communication interfaces.
- the master computer 101 may be connected to one or more external data storage devices. These external data storage devices may be implemented using any combination of computer readable media that can be accessed by the master computer 101.
- the computer readable media may include, for example, microcircuit memory devices such as read-write memory (RAM), read-only memory (ROM), electronically erasable and programmable read-only memory (EEPROM) or flash memory microcircuit devices, CD-ROM disks, digital video disks (DVD), or other optical storage devices.
- the computer readable media may also include magnetic cassettes, magnetic tapes, magnetic disks or other magnetic storage devices, punched media, holographic storage devices, or any other medium that can be used to store desired information.
- one or more of the slave computers 115 may alternately or additionally be connected to one or more external data storage devices.
- these external data storage devices may include data storage devices that are also connected to the master computer 101, but they may also be different from any data storage devices accessible by the master computer 101.
- various aspects of the invention may be implemented to support the execution of operations by a computing system with a multiprocessor architecture. Accordingly, different embodiments of the invention can be employed with a variety of different types of software applications. Some embodiments of the invention, however, may be particularly useful in conjunction with electronic design automation software tools that perform operations for simulating, verifying or modifying design data representing a microdevice, such as a microcircuit. Designing and fabricating microcircuit devices involve many steps during a 'design flow' process. These steps are highly dependent on the type of microcircuit, the complexity, the design team, and the microcircuit fabricator or foundry.
- the physical design data may represent, for example, the geometric pattern that will be written onto a mask used to fabricate the desired microcircuit device in a photolithographic process at a foundry. It is very important that the physical design information accurately embody the design specification and logical design for proper operation of the device. Further, because the physical design data is employed to create masks used at a foundry, the data must conform to foundry requirements. Each foundry specifies its own physical design parameters for compliance with their process, equipment, and techniques. Accordingly, the design flow may include a design rule check process. During this process, the physical layout of the circuit design is compared with design rules.
- the design rule check process may also check the physical layout of the circuit design against other design rules, such as those obtained from test chips, knowledge in the industry, etc. Once a designer has used a verification software application to verify that the physical layout of the circuit design complies with the design rules, the designer may then modify the physical layout of the circuit design to improve the resolution of the image that the physical layout will produce during a photolithography process.
- These resolution enhancement techniques may include, for example, modifying the physical layout using optical proximity correction (OPC) or by the addition of sub-resolution assist features (SRAF).
- the design of a new integrated circuit may include the interconnection of millions of transistors, resistors, capacitors, or other electrical structures into logic circuits, memory circuits, programmable field arrays, and other circuit devices.
- cells typically referred to as "cells.”
- all of the transistors making up a memory circuit for storing a single bit may be categorized into a single "bit memory” cell.
- the group of transistors making up a single-bit memory circuit can thus collectively be referred to and manipulated as a single unit.
- the design data describing a larger 16-bit memory register circuit can be categorized into a single cell. This higher level "register cell” might then include sixteen bit memory cells, together with the design data describing other miscellaneous circuitry, such as an input/output circuit for transferring data into and out of each of the bit memory cells.
- the design data describing a 128kB memory array can then be concisely described as a combination of only 64,000 register cells, together with the design data describing its own miscellaneous circuitry, such as an input/output circuit for transferring data into and out of each of the register cells.
- microcircuit design data By categorizing microcircuit design data into hierarchical cells, large data structures can be processed more quickly and efficiently. For example, a circuit designer typically will analyze a design to ensure that each circuit feature described in the design complies with design rules specified by the foundry that will manufacture microcircuits from the design. With the above example, instead of having to analyze each feature in the entire 128kB memory array, a design rule check process can analyze the features in a single bit cell. The results of the check will then be applicable to all of the single bit cells.
- the design rule check process then can complete the analysis of a register cell simply by analyzing the features of its additional miscellaneous circuitry (which may itself be made up of one or more hierarchical cells). The results of this check will then be applicable to all of the register cells.
- the design rule check software application can complete the analysis of the entire 128kB memory array simply by analyzing the features of the additional miscellaneous circuitry in the memory array. Thus, the analysis of a large data structure can be compressed into the analyses of a relatively small number of cells making up the data structure.
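The compression that hierarchy provides can be sketched as follows: each distinct cell type is analyzed exactly once, and the result is reused for every instance of that cell. The cell structure, the `features` field, and the toy minimum-width rule are all invented for this illustration.

```python
def check_design(cell, results=None):
    """Analyze each distinct cell type once; reuse the result for every
    instance. `features` holds toy widths checked against a 0.1 minimum."""
    if results is None:
        results = {}
    if cell["name"] not in results:
        results[cell["name"]] = [w for w in cell.get("features", [])
                                 if w < 0.1]           # toy minimum-width rule
        for child in cell.get("children", []):
            check_design(child, results)
    return results

bit = {"name": "bit", "features": [0.05, 0.2]}
# Sixteen instances of the same bit cell; the cell is analyzed only once,
# and its single result applies to all sixteen instances.
register = {"name": "register", "features": [0.3], "children": [bit] * 16}
print(check_design(register))   # {'register': [], 'bit': [0.05]}
```

Extending the same idea, a memory array built from thousands of register cells is checked by analyzing one register cell plus the array's own miscellaneous circuitry, rather than every feature in the flattened design.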
- microcircuit physical design data will include two different types of data: "drawn layer” design data and "derived layer” design data.
- the drawn layer data describes polygons drawn in the layers of material that will form the microcircuit.
- the drawn layer data will usually include polygons in metal layers, diffusion layers, and polysilicon layers.
- the derived layers will then include features made up of combinations of drawn layer data and other derived layer data. For example, with the transistor gate described above, the derived layer design data describing the gate will be derived from the intersection of a polygon in the polysilicon material layer and a polygon in the diffusion material layer.
- a design rule check software application will perform two types of operations: "check” operations that confirm whether design data values comply with specified parameters, and "derivation” operations that create derived layer data.
- transistor gate design data thus may be created by the following derivation operation:
- a check operation will then define a parameter or a parameter range for a data design value. For example, a user may want to ensure that no metal wiring line is within a micron of another wiring line. This type of analysis may be performed by the following check operation:
- the results of this operation will identify each polygon in the metal layer design data that are closer than one micron to another polygon in the metal layer design data.
- check operations may be performed on derived layer data as well. For example, if a user wanted to confirm that no transistor gate is located within one micron of another gate, the design rule check process might include the following check operation:
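The two kinds of operations described above can be illustrated with a small sketch: a derivation operation that produces a "gate" layer from the intersection of polysilicon and diffusion polygons, and a check operation that reports layer features closer together than a minimum spacing. This is not the patent's rule syntax; layers are modeled as lists of axis-aligned rectangles (x1, y1, x2, y2) in microns, and all function names are invented.

```python
def intersect(a, b):
    """The overlap of two rectangles, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def derive_gates(poly_layer, diff_layer):
    """Derivation operation: the derived 'gate' layer is every
    intersection of a polysilicon polygon with a diffusion polygon."""
    gates = []
    for p in poly_layer:
        for d in diff_layer:
            g = intersect(p, d)
            if g:
                gates.append(g)
    return gates

def spacing(a, b):
    """Edge-to-edge distance between two disjoint rectangles."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5

def check_external(layer, min_space):
    """Check operation: report index pairs closer than min_space."""
    violations = []
    for i in range(len(layer)):
        for j in range(i + 1, len(layer)):
            if spacing(layer[i], layer[j]) < min_space:
                violations.append((i, j))
    return violations

poly = [(0.0, 0.0, 1.0, 10.0)]
diff = [(-1.0, 4.0, 2.0, 6.0)]
print(derive_gates(poly, diff))            # [(0.0, 4.0, 1.0, 6.0)]

metal = [(0.0, 0.0, 1.0, 1.0), (1.5, 0.0, 2.5, 1.0)]
print(check_external(metal, 1.0))          # [(0, 1)]
```

Note that `check_external` works equally well on a drawn layer (the metal spacing check) or on a derived layer (the gate-to-gate spacing check), matching the text above.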
- Table 1 describes an example of a circuit analysis process in pseudocode.
- An example of this flow might be implemented using, e.g., the Calibre family of electronic design tools available from Mentor Graphics Corporation of Wilsonville, Oregon.
- Table 1 lists the flow of an algorithm for inserting "dummy" geometric data (typically rectangles) into a layer in the design in order to achieve a desired amount of structure density per unit area.
- M2_FILL_AREA M2_DENS NOT M2
- DM2 RECTANGLES .1 .1 .1 INSIDE OF LAYER M2_FILL_AREA
- this flow specifies the series of inputs that will be employed by the algorithm. In particular, it identifies for the electronic design automation tool the database from which the input data is being retrieved (i.e., in.gds), the particular layout design that is being read into the electronic design automation tool (in the GDSII data format), and the primary cell in the design (i.e., the cell entitled "Topcell").
- the software tool is configured to handle data in blocks of a defined size.
- the software tool may be configured to retrieve, transfer, and save data in blocks of 16 kilobytes.
- various implementations of the invention employ a data block resource manager to save data to and retrieve data from a memory of a master computing device.
- the memory may be a "fast" memory provided by an integrated circuit memory device operating adjacent to a processing unit of the master computing device, a "slow” memory provided by a magnetic or optical disk, or some combination thereof.
- the data block resource manager may, for example, create a single file on a disk, such as a magnetic disk employed in a conventional hard drive storage device. This file is partitioned into 16kB blocks, and can be grown by the data block resource manager as needed. As will be discussed in more detail below, the data block resource manager may access a set of data in the memory using a DBLOCK ID. (As used herein, the term "access" will be used to encompass both retrieving data from memory and saving data to memory.)
- the set of data may include one or more actual 16kB blocks of the file.
- the DBLOCK ID will reference an offset map, which maps offsets identifying each 16kB memory block making up the data set, as illustrated in Fig. 3. As will be appreciated by those of ordinary skill in the art, no two DBLOCK IDs should reference the same file offset.
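The block file and offset map described above can be sketched as follows. The `BlockFile` class and its method names are invented for this illustration; only the scheme (a single growable file of fixed-size blocks, with each DBLOCK ID mapped to a distinct list of file offsets) comes from the text.

```python
import os
import tempfile

BLOCK_SIZE = 16 * 1024   # 16kB, as in the example above

class BlockFile:
    """A single on-disk file partitioned into fixed-size blocks and grown
    on demand, with an offset map from each DBLOCK ID to the file offsets
    of the blocks making up that data set."""
    def __init__(self, path):
        self._f = open(path, "w+b")
        self._next_offset = 0
        self.offset_map = {}            # DBLOCK ID -> list of file offsets

    def save(self, dblock_id, data):
        offsets = []
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            self._f.seek(self._next_offset)
            self._f.write(block)
            offsets.append(self._next_offset)
            self._next_offset += BLOCK_SIZE   # grow the file as needed
        # Offsets are handed out sequentially, so no two DBLOCK IDs can
        # ever reference the same file offset.
        self.offset_map[dblock_id] = offsets

bf = BlockFile(os.path.join(tempfile.mkdtemp(), "blocks.dat"))
bf.save(1, b"a" * (40 * 1024))          # 2.5 blocks' worth of data
print(bf.offset_map[1])                 # [0, 16384, 32768]
```

Saving a second data set under a different DBLOCK ID would continue from offset 49152, preserving the invariant that every offset belongs to exactly one DBLOCK ID.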
- a typical set of related data may not be an integral multiple of 16kB in size.
- the software tool may be an electronic design automation software tool, such as a software tool in the Calibre® family of software products available from Mentor Graphics Corporation of Wilsonville, Oregon.
- a related set of data may include a sequence of geometric elements from a layout design of an integrated circuit. If the size of a related data set, such as a geometry sequence, exceeds 16kB, then all initial full 16kB blocks of data may be shifted to a "slow" memory storage device, such as a magnetic disk, and referenced using a common DBLOCK ID as noted above. The "drabble" (the remaining partial block of data), which is always non-empty, then remains in "fast" or local memory.
- When the data block resource manager seeks to access data, it can do so using the associated DBLOCK ID. For example, if the data block resource manager needs to retrieve a set of data, it first identifies the DBLOCK ID associated with that data set. After identifying the relevant DBLOCK ID, the data block resource manager then uses the offsets associated with the DBLOCK ID to begin retrieving the data, in 16kB increments. After all of the full data blocks have been retrieved and processed, the data block manager retrieves and processes the "drabble" remnant (which may be saved in local memory). It may then identify a DBLOCK ID for the next set of data to be accessed. In this manner, various implementations of the invention can, e.g., reduce the amount of local memory used to store a set of data without having to waste some portion of a 16kB block of memory on a hard disk storage device.
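The split between full blocks in slow storage and the in-memory drabble, and the retrieval order just described, can be sketched as follows. Function names are invented; the sketch assumes, as the text states, that the data set's size is not an exact multiple of the block size, so the drabble is non-empty.

```python
BLOCK_SIZE = 16 * 1024   # 16kB blocks, as in the example above

def split_for_storage(data):
    """All initial full 16kB blocks go to slow (disk) storage; the
    'drabble' remnant stays in fast/local memory."""
    n_full = len(data) // BLOCK_SIZE
    slow_blocks = [data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
                   for i in range(n_full)]
    drabble = data[n_full * BLOCK_SIZE:]
    return slow_blocks, drabble

def retrieve(slow_blocks, drabble):
    """Retrieve and process the full blocks in 16kB increments, then
    process the in-memory drabble remnant last."""
    out = bytearray()
    for block in slow_blocks:      # would come back via the DBLOCK ID's offsets
        out += block
    out += drabble
    return bytes(out)

geometry = b"g" * (2 * BLOCK_SIZE + 100)
blocks, drabble = split_for_storage(geometry)
print(len(blocks), len(drabble))       # 2 100
print(retrieve(blocks, drabble) == geometry)   # True
```

Only the 100-byte drabble occupies local memory here; the two full blocks live on disk without wasting any part of a 16kB disk block on the remnant.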
- Fig. 4 illustrates an example of a parallel processing computing system that can share data among various processes of a master computing device in accordance with various examples of the invention.
- the parallel processing computing system 401 includes N+1 instantiations (0, 1, ...N) of a computing process 403. More particularly, the parallel processing computing system 401 includes a primary computing process 403' and N "pseudo" computing processes 403A-403x.
- the computing process 403 is an electronic design automation tool configured to operate on a hierarchical database (HDB) of integrated circuit design information.
- each computing process 403 includes (or otherwise employs) a data block manager 405. Each data block manager is configured to request or transmit data in response to instructions from its associated computing process 403.
- the primary computing process 403' also includes a data block resource manager 407.
- the data block resource manager 407 can save data to and retrieve data from components of a memory of the master computing device.
- the memory includes both a "fast" or local memory 409, which may be provided by an integrated circuit memory device operating adjacent to a processing unit of the master computing device, and a "slow" memory 411, which is provided by a magnetic or optical disk storage device.
- the data block resource manager 407 may be capable of retrieving data from or saving data to any number of different storage sources. With these implementations, all of the blocks in a DBLOCK ID will be allocated from the same source, and the source should support the specified block allocation (e.g., 16KB in the illustrated example).
- the data block resource manager 407 includes a data block resource server 413 for each instantiation of the pseudo computing processes 403A-403x.
- the data block resource manager 407 includes a data block resource server 413 A associated with the computing process 403 A, a data block resource server 413x associated with the computing process 403x, etc.
- each data block resource server 413 is dedicated to its corresponding computing process 403.
- the primary computing process 403' can interact with the data block resource manager 407 directly, omitting the need for a data block resource server 413 associated with the primary computing process 403'.
- the primary computing process 403' may interact with the data block resource manager 407 through an associated data block resource server 413' as well.
- Fig. 5 illustrates a table showing the operations that may be performed by various embodiments of the invention to exchange a sequence of geometric data (named L in the table) between the primary computing process 403' (referred to as HDB 0 in the table) and the pseudo computing process 403A (referred to as HDB 1 in the table).
- a data block manager 405 may communicate with its associated data block resource server 413 through, for example, a UNIX domain socket using a conventional Interprocess Communication (IPC) technique.
- Still other communication techniques can be employed, as appropriate to the underlying operating system being used, the hardware configuration, etc.
- Fig. 6 illustrates an example of a parallel processing computing system that can share data among different processes of a master computing device in accordance with various examples of the invention.
- the parallel processing computing system 601 includes a master computing device 603 (identified by the label "Master Node” in the figure) and a plurality of remote computing devices 605A, 605B...605y (each identified by the label "Remote Node” in the figure).
- the master computer 603 includes N+1 instantiations (0, 1,...N) of a computing process 403, as described in detail above, with each of these "master" computing processes 403 including a data block manager 405 and the primary computing process 403' including a data block resource manager 407.
- Each of the remote computing devices includes one or more instantiations of a remote computing process 607 (labeled as "HDB_RCS" in the figures).
- Each remote computing process 607 may be a copy of a corresponding computing process 403 on the master computing device 603, or it may be a subprocess of a corresponding computing process 403 on the master computing device 603.
- Each remote computing process 607 includes or otherwise employs a remote data client (not shown).
- each remote computing device 605 implements a remote data server 609 (also referred to as a remote data block resource manager). Like the data block resource manager 407, each remote data server 609 can save data to and retrieve data from a memory of its remote computing device 605.
- the memory for a remote computing device 605 may be "fast" memory provided by an integrated circuit memory device operating adjacent to a processing unit of the remote computing device, a "slow" memory provided by a magnetic or optical disk, or some combination thereof.
- The remote data clients, remote data servers 609, and data block managers 405 are interconnected into a network.
- each remote data client (not shown) is responsible for providing the functionality of sending and receiving (reading/writing) data in 16K blocks over the network formed by the remote data clients, remote data servers 609, and data block managers 405.
- Each remote computing process 607 has a remote data client, and each remote data client, when instantiated, obtains a unique identification from the data block resource manager 407.
- Each remote data server 609 then is responsible for receiving 16K blocks from any remote data client and storing the data in the memory of its associated remote computing device 605. Each remote data server 609 also is responsible for retrieving data from the memory of its associated remote computing device 605, and sending that data to any remote data client.
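The store-and-serve responsibilities of a remote data server can be sketched as below. The class name, the dictionary-backed store keyed by (client, block index), and the capacity check are illustrative assumptions, not details taken from the source.

```python
BLOCK_SIZE = 16 * 1024

class RemoteDataServer:
    """Minimal sketch of a remote data server (RDS): accepts 16K blocks
    from any remote data client and serves them back on request."""
    def __init__(self, capacity_bytes=1 << 30):  # e.g., a fixed 1GB budget
        self.capacity = capacity_bytes
        self.used = 0
        self.store = {}  # (client_id, block_index) -> block bytes

    def put_block(self, client_id, block_index, block):
        """Store one block in this remote node's memory."""
        if self.used + len(block) > self.capacity:
            raise MemoryError("RDS memory budget exhausted")
        self.store[(client_id, block_index)] = block
        self.used += len(block)

    def get_block(self, client_id, block_index):
        """Retrieve a previously stored block for any remote data client."""
        return self.store[(client_id, block_index)]
```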
- When a remote data server 609 is instantiated, the data block resource manager 407 will be informed so that it can track and communicate with the remote data server 609.
- a remote data server 609 may manage a fixed amount of memory, such as 1GB.
- control information for controlling the remote data server 609 can be sent to and received from a remote data server 609 via the standard computing process interface, such as an interface using the transmission control protocol (TCP).
- When a remote instantiation of a computing process 607 needs to access data, it instructs its associated remote data client to request the data. If the data is being managed by another remote computing process 607, then the remote data client will request the data from the remote data server 609 associated with that remote computing process 607. Alternately, if the data is being managed by a computing process on the master computing device, then the remote data client will request the data from the data block manager associated with that master computing process. As discussed above, the data block manager may in turn obtain the data through the data block resource manager associated with the primary computing process on the master computing device. Similarly, if a remote computing process needs to save data, it instructs its associated remote data client to provide the data to the appropriate remote data server or data block resource manager for storage in the corresponding memory.
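The routing decision just described (ask the owning remote data server, otherwise fall back to the master's data block manager) might look like the following sketch. The ownership map and every class, method, and parameter name here are hypothetical, introduced only for illustration.

```python
class RemoteDataClient:
    """Sketch of request routing by a remote data client (RDC)."""
    def __init__(self, client_id, remote_servers, master_block_manager,
                 ownership):
        self.client_id = client_id
        self.remote_servers = remote_servers      # process id -> RDS-like object
        self.master_block_manager = master_block_manager
        self.ownership = ownership                # DBLOCK ID -> owning process id

    def request(self, dblock_id, block_index):
        owner = self.ownership[dblock_id]
        if owner in self.remote_servers:
            # Data managed by another remote computing process: ask its
            # remote data server.
            return self.remote_servers[owner].get_block(dblock_id, block_index)
        # Data managed by a master computing process: ask its data block
        # manager, which may in turn consult the data block resource manager.
        return self.master_block_manager.get_block(dblock_id, block_index)
```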
- the data block managers, remote data clients and remote data servers may be interconnected by an electronic communications network using a suitable networking communication protocol, such as the Transmission Control Protocol/Internet protocol (TCP/IP) or User Datagram Protocol.
- the data block managers, remote data clients and remote data servers may be interconnected using a reliable form of UDP.
- some implementations of the invention may use the data formats shown in Figs. 7 and 8 to reliably receive and send blocks of data, respectively, using UDP. The elements of these blocks are discussed below:
- Sequence number 4 bytes, RDC-maintained block sequence number, incremented after every successful transaction. This number is returned by the RDS so the RDC can be sure that a specific transaction was successful.
- Transaction ID 4 bytes, RDC/RDS-maintained, RDS-specific transaction ID, incremented after every successful transaction, so the RDS can detect stale transactions.
- Client ID 4 bytes, unique to the RDC (allocated by the resource manager). Used by the RDS as a lookup for RDC transaction ID information.
- Time stamp 8 bytes. Used by the client to estimate round-trip time; gets around the retransmit RTT ambiguity problem.
- Block index The index of the block that the RDC wants from the RDS.
- RDS IP address 4 bytes, used by the RDC to ensure the received data is from the right RDS.
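Under the field sizes listed above, the request header totals 28 bytes and could be packed as follows. This is a sketch only: the field order and the 4-byte block index width are assumptions, since the source does not specify them.

```python
import socket
import struct
import time

# Header layout inferred from the field list above:
# sequence number (4), transaction ID (4), client ID (4),
# time stamp (8), block index (4, assumed), RDS IP address (4).
HEADER_FMT = "!IIIQI4s"  # network byte order

def pack_request(seq, txn, client_id, block_index, rds_ip):
    """Pack one RDC request header into its 28-byte wire form."""
    ts = time.time_ns() // 1000  # microseconds fits comfortably in 8 bytes
    return struct.pack(HEADER_FMT, seq, txn, client_id, ts, block_index,
                       socket.inet_aton(rds_ip))

def unpack_request(data):
    """Inverse of pack_request; returns the header fields as a tuple."""
    seq, txn, client_id, ts, block_index, ip = struct.unpack(HEADER_FMT, data)
    return seq, txn, client_id, ts, block_index, socket.inet_ntoa(ip)
```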
- the client UDP socket will receive any data destined for its IP address and port from anywhere.
- the RDC uses sequence numbers, transaction IDs, and remote addresses to validate transactions, but the RDC must also be prepared to retransmit if it does not get a reply from the RDS. Retransmission may be based on RTT estimates (similar to TCP). An RDC will attempt to retransmit up to a maximum number of times, and uses exponential backoff up to a maximum amount of time for its retransmission timer (similar to TCP).
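The retry policy described above (retransmit up to a maximum count, with an exponentially backed-off timer capped at a maximum) can be sketched as below; the function name and parameter values are illustrative, not from the source.

```python
def send_with_retransmit(send, recv, max_retries=8,
                         base_timeout=0.2, max_timeout=5.0):
    """TCP-like retransmission sketch: `send` transmits the request,
    `recv(timeout)` returns the reply or None on timeout."""
    timeout = base_timeout
    for attempt in range(max_retries):
        send()
        reply = recv(timeout)
        if reply is not None:
            return reply
        # No reply: double the timer, up to the cap, and try again.
        timeout = min(timeout * 2, max_timeout)
    raise TimeoutError("no reply from RDS after %d attempts" % max_retries)
```

In a full implementation the base timeout would come from the RTT estimate carried in the time-stamp field rather than a constant.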
Abstract
Copies of a software tool computing process are instantiated on a master computing device and on remote computing devices separate from the master computing device. A primary instantiation of the computing process employs a data block resource manager that can save data to and retrieve data from a memory of the master computing device. Each remote instantiation of the computing process employs a remote data client, while a remote data server is instantiated on each remote computing device. When a remote instantiation of the computing process needs to access data, it instructs its associated remote data client to request the data. If the data is being managed by another remote computing process, then the remote data client will request the data from the remote data server associated with that remote computing process. Alternately, if the data is being managed by a computing process on the master computing device, then the remote data client will request the data from the data block manager associated with that master computing process.
Description
MEMORY SHARING AND DATA DISTRIBUTION
Related Applications
[01] This application claims priority under 35 U.S.C. § 119 to U.S. Provisional Patent Application No. 60/971,264, entitled "Memory Sharing And Data Distribution," filed on September 11, 2007, and naming Laurence Grodd et al. as inventors, which application is incorporated entirely herein by reference. Also, this application is related to U.S. Patent Application No. 11/396,929, entitled "Distribution Of Parallel Operations," filed on April 2, 2006, and naming Laurence Grodd et al. as inventors, which application is incorporated entirely herein by reference as well.
Field of the Invention
[02] The present invention is directed to the use of shared and distributed memory in a parallel processing network. Various implementations of the invention may have particular application to the use of shared and distributed memory to store design data for use by an electronic design automation software tool operating in a parallel processing network.
Background of the Invention
[03] Many software applications can be efficiently run on a single-processor computer. Some software applications, however, have so many operations that they cannot be sequentially executed on a single-processor computer in an economical amount of time. For example, microdevice (e.g., integrated circuit) design process software applications may require the execution of a hundred thousand or more operations on hundreds of thousands or even millions of input data values. In order to run this type of software application more quickly, computers were developed that employed multiple processors capable of simultaneously using multiple processing threads. While these computers can execute complex software applications more quickly than single-processor computers, these multi-processor computers are very expensive to purchase and maintain. With multiprocessor computers, the processors execute numerous operations simultaneously, so
they must employ specialized operating systems to coordinate the concurrent execution of related operations. Further, because its multiple processors may simultaneously seek access to resources such as memory, the bus structure and physical layout of a multiprocessor computer are inherently more complex than those of a single-processor computer.
[04] In view of the difficulties and expense involved with large multi-processor computers, networks of linked single-processor computers have become a popular alternative to using a single multi-processor computer. The cost of conventional single-processor computers, such as personal computers, has dropped significantly in the last few years. Moreover, techniques for linking the operation of multiple single-processor computers into a network have become more sophisticated and reliable. Accordingly, multi-million dollar, multi-processor computers are now typically being replaced with networks or "farms" of relatively simple and low-cost single processor computers.
[05] Shifting from single multi-processor computers to multiple networked single-processor computers has been particularly useful where the data being processed has parallelism. With this type of data, one portion of the data is independent of another portion of the data. That is, manipulation of a first portion of the data does not require knowledge of or access to a second portion of the data. Thus, one single-processor computer can execute an operation on the first portion of the data while another single-processor computer can simultaneously execute the same operation on the second portion of the data. By using multiple computers to execute the same operation on different groups of data at the same time, i.e., in "parallel," large amounts of data can be processed quickly. This use of multiple single-processor computers has been particularly beneficial for analyzing circuit design data using electronic design automation (EDA) tools. With this type of data, one portion of the design, such as a semiconductor gate in a first area of a microcircuit, may be completely independent from another portion of the design, such as a wiring line in a second area of the microcircuit. Some electronic design automation operations, such as operations defining a minimum width check of a structure, can thus be executed by one computer for the gate while another computer executes the same operations for the wiring line.
[06] While parallel processing has substantially decreased the time required to execute electronic design automation software tools, many of these tools still require a large amount of system memory. For example, an electronic design automation software tool may initiate several instantiations of a computing process on a single "master" computing device, so that each instantiation can execute operations in parallel with the others. Alternately or additionally, an electronic design automation software tool may instantiate several copies of a computing process on multiple "remote" computing devices, again so that each instantiation may execute operations in parallel with the others. With conventional parallel processing systems, each instantiation will need its own copy of the data required to perform its assigned operations. For example, if a parallel processing system is being used to implement an electronic design automation software tool computing process for processing layout design data, data for a single layer of an integrated circuit design may be duplicated several times in a master computing device's memory, for use by multiple instantiations of the computing process. Still further, the data may be duplicated on a plurality of remote computing devices as well, for use by instantiations of the computing process on those remote computing devices.
Summary of the Invention
[07] Advantageously, various aspects of the invention are related to techniques for allowing multiple instantiations of a software tool computing process (such as a computing process of an electronic design automation software tool) operating on a single master computing device to share data stored in the memory of that master computing device. Other aspects of the invention are directed to allowing multiple instantiations of a software tool computing process, operating on multiple remote computing devices, to employ data distributed among those remote computing devices.
[08] According to various embodiments of the invention, multiple copies of a software tool computing process, such as an electronic design automation software tool computing process, are instantiated on a master computing device. Each instantiation of the computing process includes or otherwise employs a data block manager configured to request data needed by the computing process. A primary instantiation of the computing
process also includes or otherwise employs a data block resource manager that can save data to and retrieve data from a memory of the master computing device. With various embodiments of the invention, the memory may be "fast" memory provided by an integrated circuit memory device operating adjacent to a processing unit of the master computing device, a "slow" memory provided by a magnetic or optical disk, or some combination thereof. When an instantiation of a computing process needs to access data, it instructs its associated data block manager to request the data from the data block resource manager. In response, the data block resource manager retrieves the data from the memory of the master computing device, and provides it to the requesting data block manager for use by the computing process. Similarly, if a computing process needs to save data, it instructs its associated data block manager to provide the data to the data block resource manager. In response, the data block resource manager saves the data in the memory of the master computing device.
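The master-side request/save flow above reduces to a small sketch, with the resource manager owning the master memory and each data block manager delegating to it. All class and method names here are illustrative assumptions.

```python
class DataBlockResourceManager:
    """Held by the primary computing process; the only component that
    reads and writes the master computing device's memory."""
    def __init__(self):
        self.memory = {}  # DBLOCK ID -> data held in master memory
    def save(self, dblock_id, data):
        self.memory[dblock_id] = data
    def retrieve(self, dblock_id):
        return self.memory[dblock_id]

class DataBlockManager:
    """Held by each computing process instantiation; forwards requests
    to the primary process's resource manager."""
    def __init__(self, resource_manager):
        self.rm = resource_manager
    def request_data(self, dblock_id):
        return self.rm.retrieve(dblock_id)  # fetch on the process's behalf
    def provide_data(self, dblock_id, data):
        self.rm.save(dblock_id, data)       # save on the process's behalf
```

Because every instantiation shares the one resource manager, a data set stored once in master memory is visible to all of them, rather than being duplicated per process.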
[09] With still other examples of the invention, one or more versions of the software tool computing process are instantiated on remote computing devices separate from the master computing device. Each remote instantiation of the computing process includes or otherwise employs a remote data client. Also, a remote data server (or remote data block resource manager) is instantiated on each remote computing device. Like the data block resource manager, the remote data server can save data to and retrieve data from the memory of its remote computing device. Also, as with the memory for the master computing device, the memory for the remote computing device may be "fast" memory provided by an integrated circuit memory device operating adjacent to a processing unit of the remote computing device, a "slow" memory provided by a magnetic or optical disk, or some combination thereof.
[10] When a remote instantiation of the computing process needs to access data, it instructs its associated remote data client to request the data. If the data is being managed by another remote computing process, then the remote data client will request the data from the remote data server associated with that remote computing process. Alternately, if the data is being managed by a computing process on the master computing device, then the remote data client will request the data from the data block manager associated with that
master computing process. As discussed above, the data block manager may in turn obtain the data through the data block resource manager associated with the primary computing process on the master computing device. Similarly, if a remote computing process needs to save data, it instructs its associated remote data client to provide the data to the appropriate remote data server or data block resource manager for storage in the corresponding memory. With various examples of the invention, the data block managers, remote data clients and remote data servers may be interconnected by an electronic communications network using a suitable networking communication protocol, such as the Transmission Control Protocol/Internet protocol (TCP/IP) or User Datagram Protocol.
[11] These and other features and aspects of the invention will be apparent upon consideration of the following detailed description.
Brief Description of the Drawings
[12] Fig. 1 is a schematic diagram of a multi-processor computer linked with a network of single-processor computers as may be employed by various embodiments of the invention.
[13] Fig. 2 is a schematic diagram of a processor unit for a computer that may be employed by various embodiments of the invention.
Detailed Description of Preferred Embodiments
Introduction
[14] Various embodiments of the invention relate to tools and methods for distributing segments of data among multiple computing threads for conversion from a first format to a second format. Accordingly, aspects of some embodiments of the invention have particular application to the distribution of data segments among a computing network including at least one multi-processor master computer and a plurality of single-processor slave computers. To better facilitate an understanding of these implementations, an
example of a network having a multi-processor master computer linked to a plurality of single-processor slave computers will be discussed.
Exemplary Operating Environment
[15] The implementations of a software tool (such as an electronic design automation software tool) according to various examples of the invention may be implemented using computer-executable software instructions executed by one or more programmable computing devices. Because these examples of the invention may be implemented using software instructions, the components and operation of a generic programmable computer system on which various embodiments of the invention may be employed will first be described. More particularly, the components and operation of a computer network having a host or master computer and one or more remote or slave computers will be described with reference to Figure 1. This operating environment is only one example of a suitable operating environment, however, and is not intended to suggest any limitation as to the scope of use or functionality of the invention.
[16] In Figure 1, the master computer 101 is a multi-processor computer that includes a plurality of input and output devices 103 and a memory 105. The input and output devices 103 may include any device for receiving input data from or providing output data to a user. The input devices may include, for example, a keyboard, microphone, scanner or pointing device for receiving input from a user. The output devices may then include a display monitor, speaker, printer or tactile feedback device. These devices and their connections are well known in the art, and thus will not be discussed at length here.
[17] The memory 105 may similarly be implemented using any combination of computer readable media that can be accessed by the master computer 101. The computer readable media may include, for example, microcircuit memory devices such as read-write memory (RAM), read-only memory (ROM), electronically erasable and programmable read-only memory (EEPROM) or flash memory microcircuit devices, CD-ROM disks, digital video disks (DVD), or other optical storage devices. The computer readable media may also include magnetic cassettes, magnetic tapes, magnetic disks or other magnetic
storage devices, punched media, holographic storage devices, or any other medium that can be used to store desired information.
[18] As will be discussed in detail below, the master computer 101 runs a software application for performing one or more operations according to various examples of the invention. Accordingly, the memory 105 stores software instructions 107 A that, when executed, will implement a software application for performing one or more operations. The memory 105 also stores data 107B to be used with the software application. In the illustrated embodiment, the data 107B contains process data that the software application uses to perform the operations, at least some of which may be parallel.
[19] The master computer 101 also includes a plurality of processor units 109 and an interface device 111. The processor units 109 may be any type of processor device that can be programmed to execute the software instructions 107A, but will conventionally be a microprocessor device. For example, one or more of the processor units 109 may be commercially generic programmable microprocessors, such as Intel® Pentium® or Xeon™ microprocessors, Advanced Micro Devices Athlon™ microprocessors or Motorola 68K/Coldfire® microprocessors. Alternately or additionally, one or more of the processor units 109 may be custom-manufactured processors, such as microprocessors designed to optimally perform specific types of mathematical operations. The interface device 111, the processor units 109, the memory 105 and the input/output devices 103 are connected together by a bus 113.
[20] With some implementations of the invention, the master computing device 101 may employ one or more processing units 109 having more than one processor core. Accordingly, Figure 2 illustrates an example of a multi-core processor unit 109 that may be employed with various embodiments of the invention. As seen in this figure, the processor unit 109 includes a plurality of processor cores 201. Each processor core 201 includes a computing engine 203 and a memory cache 205. As known to those of ordinary skill in the art, a computing engine contains logic devices for performing various computing functions, such as fetching software instructions and then performing the actions specified in the fetched instructions. These actions may include, for example,
adding, subtracting, multiplying, and comparing numbers, performing logical operations such as AND, OR, NOR and XOR, and retrieving data. Each computing engine 203 may then use its corresponding memory cache 205 to quickly store and retrieve data and/or instructions for execution.
[21] Each processor core 201 is connected to an interconnect 207. The particular construction of the interconnect 207 may vary depending upon the architecture of the processor unit 109. With some processor units 109, such as the Cell microprocessor created by Sony Corporation, Toshiba Corporation and IBM Corporation, the interconnect 207 may be implemented as an interconnect bus. With other processor units 109, however, such as the Opteron™ and Athlon™ dual-core processors available from Advanced Micro Devices of Sunnyvale, California, the interconnect 207 may be implemented as a system request interface device. In any case, the processor cores 201 communicate through the interconnect 207 with an input/output interface 209 and a memory controller 211. The input/output interface 209 provides a communication interface between the processor unit 109 and the bus 113. Similarly, the memory controller 211 controls the exchange of information between the processor unit 109 and the system memory 105. With some implementations of the invention, the processor units 109 may include additional components, such as a high-level cache memory shared by the processor cores 201.
[22] While Figure 2 shows one illustration of a processor unit 109 that may be employed by some embodiments of the invention, it should be appreciated that this illustration is representative only, and is not intended to be limiting. For example, some embodiments of the invention may employ a master computer 101 with one or more Cell processors. The Cell processor employs multiple input/output interfaces 209 and multiple memory controllers 211. Also, the Cell processor has nine different processor cores 201 of different types. More particularly, it has six or more synergistic processor elements (SPEs) and a power processor element (PPE). Each synergistic processor element has a vector-type computing engine 203 with 128 x 128 bit registers, four single-precision floating point computational units, four integer computational units, and a 256KB local store memory that stores both instructions and data. The power processor element then
controls the tasks performed by the synergistic processor elements. Because of its configuration, the Cell processor can perform some mathematical operations, such as the calculation of fast Fourier transforms (FFTs), at substantially higher speeds than many conventional processors.
[23] Returning now to Figure 1, the interface device 111 allows the master computer 101 to communicate with the slave computers 115A, 115B, 115C...115x through a communication interface. The communication interface may be any suitable type of interface including, for example, a conventional wired network connection or an optically transmissive wired network connection. The communication interface may also be a wireless connection, such as a wireless optical connection, a radio frequency connection, an infrared connection, or even an acoustic connection. The interface device 111 translates data and control signals from the master computer 101 and each of the slave computers 115 into network messages according to one or more communication protocols, such as the transmission control protocol (TCP), the user datagram protocol (UDP), and the Internet protocol (IP). These and other conventional communication protocols are well known in the art, and thus will not be discussed here in more detail.
[24] Each slave computer 115 may include a memory 117, a processor unit 119, an interface device 121, and, optionally, one or more input/output devices 123 connected together by a system bus 125. As with the master computer 101, the optional input/output devices 123 for the slave computers 115 may include any conventional input or output devices, such as keyboards, pointing devices, microphones, display monitors, speakers, and printers. Similarly, the processor units 119 may be any type of conventional or custom-manufactured programmable processor device. For example, one or more of the processor units 119 may be commercially generic programmable microprocessors, such as Intel® Pentium® or Xeon™ microprocessors, Advanced Micro Devices Athlon™ microprocessors or Motorola 68K/Coldfire® microprocessors. Alternately, one or more of the processor units 119 may be custom-manufactured processors, such as microprocessors designed to optimally perform specific types of mathematical operations.
[25] Still further, one or more of the processor units 119 may have more than one core, as described with reference to Figure 2 above. For example, with some implementations of the invention, one or more of the processor units 119 may be a Cell processor. The memory 117 then may be implemented using any combination of the computer readable media discussed above. Like the interface device 111, the interface devices 121 allow the slave computers 115 to communicate with the master computer 101 over the communication interface.
[26] In the illustrated example, the master computer 101 is a multi-processor unit computer with multiple processor units 109, while each slave computer 115 has a single processor unit 119. It should be noted, however, that alternate implementations of the invention may employ a master computer having a single processor unit 109. Further, one or more of the slave computers 115 may have multiple processor units 119, depending upon their intended use. Also, while only a single interface device 111 is illustrated for the host computer 101, it should be noted that, with alternate embodiments of the invention, the computer 101 may use two or more different interface devices 111 for communicating with the remote computers 115 over multiple communication interfaces.
[27] With various examples of the invention, the master computer 101 may be connected to one or more external data storage devices. These external data storage devices may be implemented using any combination of computer readable media that can be accessed by the master computer 101. The computer readable media may include, for example, microcircuit memory devices such as read-write memory (RAM), read-only memory (ROM), electronically erasable and programmable read-only memory (EEPROM) or flash memory microcircuit devices, CD-ROM disks, digital video disks (DVD), or other optical storage devices. The computer readable media may also include magnetic cassettes, magnetic tapes, magnetic disks or other magnetic storage devices, punched media, holographic storage devices, or any other medium that can be used to store desired information. According to some implementations of the invention, one or more of the slave computers 115 may alternately or additionally be connected to one or more external data storage devices. Typically, these external data storage devices will include data
storage devices that also are connected to the master computer 101, but they also may be different from any data storage devices accessible by the master computer 101.
Operations
[28] As previously noted, various aspects of the invention may be implemented to support the execution of operations by a computing system with a multiprocessor architecture. Accordingly, different embodiments of the invention can be employed with a variety of different types of software applications. Some embodiments of the invention, however, may be particularly useful in conjunction with electronic design automation software tools that perform operations for simulating, verifying or modifying design data representing a microdevice, such as a microcircuit. Designing and fabricating microcircuit devices involve many steps during a 'design flow' process. These steps are highly dependent on the type of microcircuit, the complexity, the design team, and the microcircuit fabricator or foundry. Several steps are common to all design flows: first a design specification is modeled logically, typically in a hardware design language (HDL). Software and hardware "tools" then verify the design at various stages of the design flow by running software simulators and/or hardware emulators, and errors are corrected.
[29] After the logical design is deemed satisfactory, it is converted into physical design data by synthesis software. The physical design data may represent, for example, the geometric pattern that will be written onto a mask used to fabricate the desired microcircuit device in a photolithographic process at a foundry. It is very important that the physical design information accurately embody the design specification and logical design for proper operation of the device. Further, because the physical design data is employed to create masks used at a foundry, the data must conform to foundry requirements. Each foundry specifies its own physical design parameters for compliance with its processes, equipment, and techniques. Accordingly, the design flow may include a design rule check process. During this process, the physical layout of the circuit design is compared with design rules. In addition to rules specified by the foundry, the design rule check process may also check the physical layout of the circuit design against other design rules, such as those obtained from test chips, knowledge in the industry, etc.
[30] Once a designer has used a verification software application to verify that the physical layout of the circuit design complies with the design rules, the designer may then modify the physical layout of the circuit design to improve the resolution of the image that the physical layout will produce during a photolithography process. These resolution enhancement techniques (RET) may include, for example, modifying the physical layout using optical proximity correction (OPC) or by the addition of sub-resolution assist features (SRAF). Once the physical layout of the circuit design has been modified using resolution enhancement techniques, then a design rule check may be performed on the modified layout, and the process repeated until a desired degree of resolution is obtained. Examples of such simulation and verification tools are described in U.S. Patent No. 6,230,299 to McSherry et al., issued May 8, 2001, U.S. Patent No. 6,249,903 to McSherry et al., issued June 19, 2001, U.S. Patent No. 6,339,836 to Eisenhofer et al., issued January 15, 2002, U.S. Patent No. 6,397,372 to Bozkus et al., issued May 28, 2002, U.S. Patent No. 6,415,421 to Anderson et al., issued July 2, 2002, and U.S. Patent No. 6,425,113 to Anderson et al., issued July 23, 2002, each of which is incorporated entirely herein by reference.
[31] The design of a new integrated circuit may include the interconnection of millions of transistors, resistors, capacitors, or other electrical structures into logic circuits, memory circuits, programmable field arrays, and other circuit devices. In order to allow a computer to more easily create and analyze these large data structures (and to allow human users to better understand these data structures), they are often hierarchically organized into smaller data structures, typically referred to as "cells." Thus, for a microprocessor or flash memory design, all of the transistors making up a memory circuit for storing a single bit may be categorized into a single "bit memory" cell. Rather than having to enumerate each transistor individually, the group of transistors making up a single-bit memory circuit can thus collectively be referred to and manipulated as a single unit. Similarly, the design data describing a larger 16-bit memory register circuit can be categorized into a single cell. This higher level "register cell" might then include sixteen bit memory cells, together with the design data describing other miscellaneous circuitry, such as an input/output circuit for transferring data into and out of each of the bit memory cells. Similarly, the design data describing a 128kB memory array can then be concisely
described as a combination of only 64,000 register cells, together with the design data describing its own miscellaneous circuitry, such as an input/output circuit for transferring data into and out of each of the register cells.
[32] By categorizing microcircuit design data into hierarchical cells, large data structures can be processed more quickly and efficiently. For example, a circuit designer typically will analyze a design to ensure that each circuit feature described in the design complies with design rules specified by the foundry that will manufacture microcircuits from the design. With the above example, instead of having to analyze each feature in the entire 128kB memory array, a design rule check process can analyze the features in a single bit cell. The results of the check will then be applicable to all of the single bit cells. Once it has confirmed that one instance of the single bit cells complies with the design rules, the design rule check process then can complete the analysis of a register cell simply by analyzing the features of its additional miscellaneous circuitry (which may itself be made up of one or more hierarchical cells). The results of this check will then be applicable to all of the register cells. Once it has confirmed that one instance of the register cells complies with the design rules, the design rule check software application can complete the analysis of the entire 128kB memory array simply by analyzing the features of the additional miscellaneous circuitry in the memory array. Thus, the analysis of a large data structure can be compressed into the analyses of a relatively small number of cells making up the data structure.
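The compression described above can be illustrated with a short sketch: count the features a check would visit if the hierarchy were flattened versus when each distinct cell is analyzed only once. The class and function names below are illustrative, not drawn from any real EDA tool, and the feature counts are made up for the example.

```python
# Toy model of hierarchical design-rule checking: each distinct cell type is
# analyzed once, and the result is reused for every instance of that cell.

class Cell:
    def __init__(self, name, features, children=()):
        self.name = name          # cell name, e.g. "bit"
        self.features = features  # number of local geometric features to check
        self.children = children  # list of (child_cell, instance_count) pairs

def flat_feature_count(cell):
    """Features checked if the hierarchy were flattened (every instance)."""
    total = cell.features
    for child, count in cell.children:
        total += count * flat_feature_count(child)
    return total

def hierarchical_feature_count(cell, seen=None):
    """Features checked hierarchically: each distinct cell analyzed once."""
    if seen is None:
        seen = set()
    if cell.name in seen:
        return 0
    seen.add(cell.name)
    total = cell.features
    for child, _count in cell.children:
        total += hierarchical_feature_count(child, seen)
    return total

# Mirror of the memory-array example: a bit cell, a 16-bit register built from
# 16 bit cells plus I/O circuitry, and an array of 64,000 registers.
bit = Cell("bit", features=6)
register = Cell("register", features=20, children=[(bit, 16)])
array = Cell("array", features=100, children=[(register, 64000)])

print(flat_feature_count(array))          # every instance checked
print(hierarchical_feature_count(array))  # each distinct cell checked once
```

With these made-up counts, the flat analysis visits millions of features while the hierarchical analysis visits only the 126 features of the three distinct cells.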
[33] Typically, microcircuit physical design data will include two different types of data: "drawn layer" design data and "derived layer" design data. The drawn layer data describes polygons drawn in the layers of material that will form the microcircuit. The drawn layer data will usually include polygons in metal layers, diffusion layers, and polysilicon layers. The derived layers will then include features made up of combinations of drawn layer data and other derived layer data. For example, with the transistor gate described above, the derived layer design data describing the gate will be derived from the intersection of a polygon in the polysilicon material layer and a polygon in the diffusion material layer.
[34] For example, a design rule check software application will perform two types of operations: "check" operations that confirm whether design data values comply with specified parameters, and "derivation" operations that create derived layer data. Transistor gate design data thus may be created by the following derivation operation:
gate = diff AND poly
The results of this operation will identify all intersections of diffusion layer polygons with polysilicon layer polygons. Likewise, a p-type transistor gate, formed by doping the diffusion layer with n-type material, is identified by the following derivation operation:
pgate = nwell AND gate
The results of this operation then will identify all transistor gates (i.e., intersections of diffusion layer polygons with polysilicon layer polygons) where the polygons in the diffusion layer have been doped with n-type material.
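A derivation operation of this kind can be sketched as a boolean AND over shapes in two layers. The toy Python below uses axis-aligned rectangles (x1, y1, x2, y2) rather than arbitrary polygons to keep the sketch short; function names and coordinates are illustrative, not actual tool semantics.

```python
# Toy "derivation" operation: the AND of two layers is the set of pairwise
# intersections, mirroring "gate = diff AND poly".

def intersect(a, b):
    """Intersection of two rectangles (x1, y1, x2, y2), or None if disjoint."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x1 < x2 and y1 < y2:
        return (x1, y1, x2, y2)
    return None

def layer_and(layer_a, layer_b):
    """Derived layer: all nonempty intersections between the two layers."""
    return [r for a in layer_a for b in layer_b
            if (r := intersect(a, b)) is not None]

diff = [(0, 0, 10, 4)]        # a diffusion-layer polygon
poly = [(4, -2, 6, 6)]        # a polysilicon wire crossing it
gate = layer_and(diff, poly)  # gate = diff AND poly
print(gate)                   # [(4, 0, 6, 4)]
```

The derived `gate` layer could itself feed a further derivation, just as `pgate = nwell AND gate` builds on `gate` above.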
[35] A check operation will then define a parameter or a parameter range for a data design value. For example, a user may want to ensure that no metal wiring line is within a micron of another wiring line. This type of analysis may be performed by the following check operation:
external metal < 1
The results of this operation will identify each polygon in the metal layer design data that are closer than one micron to another polygon in the metal layer design data.
[36] Also, while the above operation employs drawn layer data, check operations may be performed on derived layer data as well. For example, if a user wanted to confirm that no transistor gate is located within one micron of another gate, the design rule check process might include the following check operation:
external gate < 1
The results of this operation will identify all gate design data representing gates that are positioned less than one micron from another gate. It should be appreciated, however, that this check operation cannot be performed until a derivation operation identifying the gates from the drawn layer design data has been performed.
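A check operation such as `external metal < 1` can likewise be sketched over axis-aligned rectangles (x1, y1, x2, y2): compute the edge-to-edge separation between each pair of shapes and flag pairs below the limit. The brute-force pairwise scan and all names are illustrative; production tools use spatial indexing and true polygon geometry.

```python
# Toy "external" spacing check: report pairs of shapes on one layer that are
# separated by less than a given distance (cf. "external metal < 1").

def external_distance(a, b):
    """Edge-to-edge separation between two disjoint rectangles (x1, y1, x2, y2)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5

def check_external(layer, limit):
    """Return index pairs of separated shapes closer than `limit`."""
    violations = []
    for i in range(len(layer)):
        for j in range(i + 1, len(layer)):
            if 0 < external_distance(layer[i], layer[j]) < limit:
                violations.append((i, j))
    return violations

metal = [(0.0, 0.0, 2.0, 1.0),   # wire A
         (2.5, 0.0, 4.0, 1.0),   # wire B: 0.5 um from A -> violation
         (6.0, 0.0, 8.0, 1.0)]   # wire C: 2.0 um from B -> OK
print(check_external(metal, 1.0))   # [(0, 1)]
```

Run against derived data instead of drawn data (e.g., the `gate` layer), the same check implements `external gate < 1`, which is why the derivation must run first.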
[37] Table 1 describes an example of a circuit analysis process in pseudocode. This flow might be implemented using, e.g., the Calibre family of electronic design tools available from Mentor Graphics Corporation of Wilsonville, Oregon. In particular, Table 1 lists the flow of an algorithm for inserting "dummy" geometric data (typically rectangles) into a layer in the design in order to achieve a desired amount of structure density per unit area.
LAYOUT PATH in.gds
LAYOUT SYSTEM GDSII
LAYOUT PRIMARY "TOPCELL"
DRC RESULTS DATABASE out.oas OASIS
LAYER M2 2
M2_DENS = DENSITY M2 <= 0.25 WINDOW 5.0 STEP 1.0
M2_FILL_AREA = M2_DENS NOT M2
DM2 = RECTANGLES .1 .1 .1 INSIDE OF LAYER M2_FILL_AREA
M2 { COPY M2 } DRC CHECK MAP M2 OASIS 2 0
DM2 { COPY DM2 } DRC CHECK MAP DM2 OASIS 2 1
TABLE 1
[38] Thus, this flow specifies the series of inputs that will be employed by the algorithm. In particular, it identifies for the electronic design automation tool the database from which the input data is being retrieved (i.e., in.gds), the data format in which the layout design is being read into the electronic design automation tool (i.e., the GDSII data format), and the primary cell in the design (i.e., the cell entitled "TOPCELL").
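The density computation at the heart of this flow can be sketched as follows: slide a fixed-size square window across the layout in fixed steps and flag windows whose metal coverage is at or below the threshold, mirroring the WINDOW 5.0 STEP 1.0 parameters of the DENSITY statement in Table 1. This is a toy Python sketch under assumed semantics, not actual Calibre behavior; all names and simplifications are illustrative.

```python
# Sliding-window density check: windows at or below the threshold become
# candidate regions for dummy fill.

def clipped_area(rect, wx, wy, w):
    """Area of rect (x1, y1, x2, y2) clipped to a w-by-w window at (wx, wy)."""
    x1, y1 = max(rect[0], wx), max(rect[1], wy)
    x2, y2 = min(rect[2], wx + w), min(rect[3], wy + w)
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def low_density_windows(shapes, extent, window, step, threshold):
    """Return window origins where layer density <= threshold."""
    hits = []
    y = 0.0
    while y + window <= extent[1] + 1e-9:
        x = 0.0
        while x + window <= extent[0] + 1e-9:
            covered = sum(clipped_area(s, x, y, window) for s in shapes)
            if covered / (window * window) <= threshold:
                hits.append((x, y))
            x += step
        y += step
    return hits

m2 = [(0.0, 0.0, 5.0, 1.0)]   # one wide wire: density 0.2 in a 5x5 window
print(low_density_windows(m2, extent=(6.0, 6.0), window=5.0, step=1.0,
                          threshold=0.25))
```

In the Table 1 flow, the flagged regions (minus the existing metal) would then be populated with small fill rectangles, as the RECTANGLES statement does.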
Organization Of Data
[39] As will be discussed in more detail below, various implementations of the invention transfer data to various computing processes, running on a master computing device, on one or more remote computing devices, or on both. Accordingly, with various implementations of the invention, the software tool is configured to handle data in blocks of a defined size. For example, with some implementations of the invention, the software tool may be configured to retrieve, transfer, and save data in blocks of 16 kilobytes. More particularly, as will be discussed in more detail below, various implementations of the invention employ a data block resource manager to save data to and retrieve data from a memory of a master computing device. With various embodiments of the invention, the memory may be a "fast" memory provided by an integrated circuit memory device operating adjacent to a processing unit of the master computing device, a "slow" memory provided by a magnetic or optical disk, or some combination thereof.
[40] The data block resource manager may, for example, create a single file on a disk, such as a magnetic disk employed in a conventional hard drive storage device. This file is partitioned into 16kB blocks, and can be grown by the data block resource manager as needed. As will be discussed in more detail below, the data block resource manager may access a set of data relative to the memory using a DBLOCK ID. (As used herein, the term "access" will be used to encompass both operations to retrieve data from memory or save data to memory.) The set of data may include one or more actual 16kB blocks of the file. In this instance, the DBLOCK ID will reference an offset map, which maps offsets identifying each 16kB memory block making up the data set, as illustrated in Fig. 3. As will be appreciated by those of ordinary skill in the art, no two DBLOCK IDs should reference the same file offset.
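A minimal sketch of this arrangement is below: a single backing file is partitioned into fixed-size blocks, and each DBLOCK ID maps to the list of file offsets that make up one data set. The class and method names are illustrative, not drawn from the patent's implementation, and error handling is omitted.

```python
# Toy data block resource manager: one file, 16 kB blocks, DBLOCK ID -> offsets.
import os
import tempfile

BLOCK = 16 * 1024  # 16 kB block size

class DataBlockResourceManager:
    def __init__(self, path):
        self.f = open(path, "w+b")
        self.offset_map = {}   # DBLOCK ID -> list of file offsets
        self.next_offset = 0
        self.next_id = 0

    def save(self, data):
        """Store data as full blocks in the file; return its DBLOCK ID."""
        dblock_id = self.next_id
        self.next_id += 1
        offsets = []
        for i in range(0, len(data), BLOCK):
            self.f.seek(self.next_offset)
            self.f.write(data[i:i + BLOCK].ljust(BLOCK, b"\0"))
            offsets.append(self.next_offset)  # no two IDs share an offset
            self.next_offset += BLOCK         # the file grows as needed
        self.offset_map[dblock_id] = offsets
        return dblock_id

    def retrieve(self, dblock_id, length):
        """Reassemble a data set from its mapped offsets, in 16 kB increments."""
        out = bytearray()
        for off in self.offset_map[dblock_id]:
            self.f.seek(off)
            out += self.f.read(BLOCK)
        return bytes(out[:length])

path = os.path.join(tempfile.mkdtemp(), "blocks.dat")
mgr = DataBlockResourceManager(path)
payload = b"geometry" * 5000          # 40,000 bytes -> three 16 kB blocks
did = mgr.save(payload)
assert mgr.retrieve(did, len(payload)) == payload
print(len(mgr.offset_map[did]))       # 3
```

Note the sketch pads the final partial block out to 16 kB on disk; the "drabble" arrangement described next avoids exactly that waste.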
[41] It will be appreciated, however, that a typical set of related data may not be some integral amount of 16kB. For example, with some implementations of the invention, the software tool may be an electronic design automation software tool, such as a software tool in the
Calibre® family of software products available from Mentor Graphics Corporation of Wilsonville, Oregon. With this type of software tool, a related set of data may include a sequence of geometric elements from a layout design of an integrated circuit. If the size of a related data set, such as a geometry sequence, exceeds 16kB, then all initial full 16kB blocks of data may be shifted to a "slow" memory storage device, such as a magnetic disk, and referenced using a common DBLOCK ID as noted above. The remainder of the data set (the "drabble"), which is always non-empty, then remains in "fast" or local memory.
[42] Using this arrangement, when the data block resource manager seeks to access data, it can do so using the associated DBLOCK ID. For example, if the data block resource manager needs to retrieve a set of data, it first identifies the DBLOCK ID associated with that data set. After identifying the relevant DBLOCK ID, the data block resource manager then uses the offsets associated with the DBLOCK ID to begin retrieving the data, in 16kB increments. After all of the full data blocks have been retrieved and processed, the data block manager then retrieves and processes the "drabble" remnant (which may be saved in local memory). It may then identify a DBLOCK ID for the next set of data to be accessed. In this manner, various implementations of the invention can, e.g., reduce the amount of local memory used to store a set of data without having to waste some portion of a 16kB block of memory on a hard disk storage device.
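The "drabble" split and reassembly described above can be sketched as follows. Full 16 kB blocks are destined for slow (disk) storage under one DBLOCK ID, while the always-non-empty remainder stays in fast local memory; names and the exact split rule are illustrative.

```python
# Toy drabble handling: full blocks to disk, non-empty remnant in local memory.

BLOCK = 16 * 1024

def split_data_set(data):
    """Return (full_blocks_for_disk, drabble_kept_in_memory)."""
    n_full = (len(data) - 1) // BLOCK     # the remainder is never empty: an
    drabble = data[n_full * BLOCK:]       # exact multiple of 16 kB leaves one
    blocks = [data[i * BLOCK:(i + 1) * BLOCK] for i in range(n_full)]
    return blocks, drabble                # whole block as the drabble

def reassemble(blocks, drabble):
    """Retrieval order: full blocks first, then the drabble remnant."""
    return b"".join(blocks) + drabble

for size in (1, BLOCK, BLOCK + 1, 3 * BLOCK + 7):
    data = bytes(i % 251 for i in range(size))
    blocks, drabble = split_data_set(data)
    assert drabble                         # drabble is always non-empty
    assert all(len(b) == BLOCK for b in blocks)
    assert reassemble(blocks, drabble) == data
print("ok")
```

Because the remnant never occupies a disk block, no tail portion of a 16 kB block is wasted on the hard disk, which is the saving the paragraph above describes.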
Shared Data
[43] Fig. 4 illustrates an example of a parallel processing computing system that can share data among various processes of a master computing device in accordance with various examples of the invention. As seen in this figure, the parallel processing computing system 401 includes N+1 instantiations (0, 1,...N) of a computing process 403. More particularly, the parallel processing computing system 401 includes a primary computing process 403' and N "pseudo" computing processes 403A-403x. With the illustrated implementation of the invention, the computing process 403 is an electronic design automation tool configured to operate on a hierarchical database (HDB) of integrated circuit design information. As also illustrated in this figure, each computing
process 403 includes (or otherwise employs) a data block manager 405. Each data block manager is configured to request or transmit data in response to instructions from its associated computing process 403.
[44] The primary computing process 403' also includes a data block resource manager 407. The data block resource manager 407 can save data to and retrieve data from components of a memory of the master computing device. In the illustrated example, the memory includes both a "fast" or local memory 409, which may be provided by an integrated circuit memory device operating adjacent to a processing unit of the master computing device, and a "slow" memory 411, which is provided by a magnetic or optical disk storage device. It should be appreciated, however, that with various implementations of the invention, the data block resource manager 407 may be capable of retrieving data from or saving data to any number of different storage sources. With these implementations, the blocks in a DBLOCK ID will be allocated from the same source, and the source should support the specified block allocation (e.g., 16kB with the illustrated example).
[45] The data block resource manager 407 includes a data block resource server 413 for each instantiation of the pseudo computing processes 403A-403x. Thus, the data block resource manager 407 includes a data block resource server 413A associated with the computing process 403A, a data block resource server 413x associated with the computing process 403x, etc. With various implementations of the invention, each data block resource server 413 is dedicated to its corresponding computing process 403. Also, with the illustrated implementation of the invention, the primary computing process 403' can interact with the data block resource manager 407 directly, omitting the need for a data block resource server 413 associated with the primary computing process 403'. For still other implementations of the invention, however, the primary computing process 403' may interact with the data block resource manager 407 through an associated data block resource server 413' as well.
[46] When an instantiation of a pseudo computing process 403 needs to retrieve data from the memory of the computing device, it instructs its associated data block manager 405 to request the data from the data block resource manager 407. In response, the data block
manager 405 requests the specified data from its associated data block resource server 413. The data block resource server 413 then has the data block resource manager 407 obtain the requested data from the local memory 409, the disk memory 411, or both as necessary. After the data block resource manager 407 has retrieved the requested data, the data block resource server 413 provides the retrieved data to its associated data block manager 405, where it can be employed by the pseudo computing process 403. A reverse process is used to save data generated by the computing process 403.
[47] Fig. 5 illustrates a table showing the operations that may be performed by various embodiments of the invention to exchange a sequence of geometric data (named L in the table) between the primary computing process 403' (referred to as HDB 0 in the table) and the pseudo computing process 403A (referred to as HDB 1 in the table). With various implementations of the invention, a data block manager 405 may communicate with its associated data block resource server 413 through, for example, a UNIX domain socket using a conventional Interprocess Communication (IPC) technique. Of course, still other communication techniques can be employed, as appropriate to the underlying operating system being used, the hardware configuration, etc.
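The IPC path between a data block manager and its data block resource server can be sketched with a UNIX domain socket pair, as mentioned above. The wire format here (a 4-byte DBLOCK ID request answered by a 4-byte length prefix plus the data) is invented for illustration, as are all names; the patent does not specify a message layout.

```python
# Toy request/response over a UNIX domain socket: a data block manager asks
# its data block resource server for the data set behind one DBLOCK ID.
import socket
import struct
import threading

STORE = {7: b"L" * 100}   # resource server's view: DBLOCK ID -> data

def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed early")
        buf += chunk
    return buf

def resource_server(conn):
    """Serve one request: read a 4-byte DBLOCK ID, reply with length + data."""
    dblock_id = struct.unpack("!I", recv_exact(conn, 4))[0]
    data = STORE[dblock_id]
    conn.sendall(struct.pack("!I", len(data)) + data)
    conn.close()

manager_end, server_end = socket.socketpair()   # AF_UNIX on POSIX systems
threading.Thread(target=resource_server, args=(server_end,)).start()

# Data block manager side: request DBLOCK 7 and read back the data set.
manager_end.sendall(struct.pack("!I", 7))
length = struct.unpack("!I", recv_exact(manager_end, 4))[0]
payload = recv_exact(manager_end, length)
manager_end.close()
print(length)   # 100
```

As the paragraph above notes, any comparable IPC mechanism could stand in for the socket pair, depending on the operating system and hardware configuration.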
Distributed Data
[48] Fig. 6 illustrates an example of a parallel processing computing system that can distribute data among various processes of a master computing device and one or more remote computing devices in accordance with various examples of the invention. As seen in this figure, the parallel processing computing system 601 includes a master computing device 603 (identified by the label "Master Node" in the figure) and a plurality of remote computing devices 605A, 605B...605y (each identified by the label "Remote Node" in the figure). The master computer 603 includes N+1 instantiations (0, 1,...N) of a computing process 403, as described in detail above, with each of these "master" computing processes 403 including a data block manager 405 and the primary computing process 403' including a data block resource manager 407.
[49] Each of the remote computing devices includes one or more instantiations of a remote computing process 607 (labeled as "HDB _ RCS" in the figures). Each remote computing
process 607 may be a copy of a corresponding computing process 403 on the master computing device 603, or it may be a subprocess of a corresponding computing process 403 on the master computing device 603. Each remote computing process 607 includes or otherwise employs a remote data client (not shown). Also, each remote computing device 605 implements a remote data server 609 (also referred to as a remote data block resource manager). Like the data block resource manager 407, each remote data server 609 can save data to and retrieve data from a memory of its remote computing device 605. Again, the memory for a remote computing device 605 may be "fast" memory provided by an integrated circuit memory device operating adjacent to a processing unit of the remote computing device, a "slow" memory provided by a magnetic or optical disk, or some combination thereof. Each of the remote data clients, remote data servers 609, and data block managers 405 are interconnected into a network.
[50] As previously noted, each remote data client (not shown) is responsible for providing the functionality of sending and receiving (reading/writing) data in 16K blocks over the network formed by the remote data clients, remote data servers 609, and data block managers 405. Each remote computing process 607 has a remote data client, and each remote data client, when instantiated, obtains a unique identification from the data block resource manager 407.
[51] Each remote data server 609 then is responsible for receiving 16K blocks from any remote data client and storing the data in the memory of its associated remote computing device 605. Each remote data server 609 also is responsible for retrieving data from the memory of its associated remote computing device 605, and sending that data to any remote data client. When a remote data server 609 is instantiated, the data block resource manager 407 will be informed so that it can track and communicate with the remote data server 609. With various implementations of the invention, a remote data server 609 may manage a fixed amount of memory, such as 1GB. Also, control information for controlling the remote data server 609 can be sent to and received from a remote data server 609 via the standard computing process interface, such as an interface using the transmission control protocol (TCP).
[52] When a remote instantiation of a computing process 607 needs to access data, it instructs its associated remote data client to request the data. If the data is being managed by another remote computing process 607, then the remote data client will request the data from the remote data server 609 associated with that remote computing process 607. Alternately, if the data is being managed by a computing process on the master computing device, then the remote data client will request the data from the data block manager associated with that master computing process. As discussed above, the data block manager may in turn obtain the data through the data block resource manager associated with the primary computing process on the master computing device. Similarly, if a remote computing process needs to save data, it instructs its associated remote data client to provide the data to the appropriate remote data server or data block resource manager for storage in the corresponding memory.
[53] With various examples of the invention, the data block managers, remote data clients and remote data servers may be interconnected by an electronic communications network using a suitable networking communication protocol, such as the Transmission Control Protocol/Internet Protocol (TCP/IP) or the User Datagram Protocol (UDP). With some implementations of the invention, the data block managers, remote data clients and remote data servers may be interconnected using a reliable form of UDP. For example, some implementations of the invention may use the data formats shown in Figs. 7 and 8 to reliably receive and send blocks of data, respectively, using UDP. The elements of these blocks are discussed below:
a. Sequence number: 4 bytes, RDC-maintained block sequence number, incremented after every successful transaction. This number is returned by the RDS so the RDC can be sure that a specific transaction was successful.
b. Transaction ID: 4 bytes, RDC/RDS-maintained, RDS-specific transaction ID, incremented after every successful transaction, so the RDS can detect stale transactions.
c. Client ID: 4 bytes, unique to the RDC (allocated by the resource manager). Used by the RDS as a lookup for RDC transaction ID information.
d. Time stamp: 8 bytes. Used by the client to estimate round trip time, gets around the retransmit RTT ambiguity problem.
e. Block index: The index of the block that the RDC wants from the RDS.
f. RDS IP address: 4 bytes, used by the RDC to ensure the received data is from the right RDS. The client UDP socket will receive any data destined for its IP address and port from anywhere.
g. RDS port: 2 bytes, same purpose as above.
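The header fields listed above can be packed into a UDP payload with fixed-width binary encoding. The patent gives sizes for most fields; in the sketch below the block index is assumed to be 4 bytes, and the field order simply follows the list. All names and the exact layout are illustrative.

```python
# Packing/unpacking the request-block header fields (a-g above) for UDP.
import socket
import struct
import time

HEADER = struct.Struct("!IIIQI4sH")   # seq, txn ID, client ID, timestamp,
                                      # block index, RDS IP, RDS port

def pack_request(seq, txn, client_id, block_index, rds_ip, rds_port):
    ts = int(time.time() * 1e6)       # timestamp for RTT estimation
    return HEADER.pack(seq, txn, client_id, ts,
                       block_index, socket.inet_aton(rds_ip), rds_port)

def unpack_request(payload):
    seq, txn, client, ts, idx, ip, port = HEADER.unpack(payload)
    return seq, txn, client, ts, idx, socket.inet_ntoa(ip), port

msg = pack_request(seq=1, txn=9, client_id=42, block_index=3,
                   rds_ip="10.0.0.5", rds_port=5000)
print(len(msg))               # 30 bytes: 4+4+4+8+4+4+2
print(unpack_request(msg)[5]) # '10.0.0.5'
```

On receipt, the client would compare the unpacked sequence number, transaction ID, and RDS address/port against what it sent, discarding any datagram that does not match, since a UDP socket will accept data from anywhere.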
[54] The RDC uses sequence numbers, transaction IDs, and remote addresses to validate transactions, but the RDC must also be prepared to retransmit if it does not get a reply from the RDS. Retransmission may be based on RTT estimates (similar to TCP). An RDC will attempt to retransmit up to a maximum number of times, and uses exponential backoff up to a maximum amount of time for its retransmission timer (similar to TCP).
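The client-side retransmission policy can be sketched as follows: a TCP-style RTT estimator (an exponentially weighted moving average of sample RTTs plus a variance term, in the style of RFC 6298) sets the initial timeout, and each retry doubles the timer up to caps on both the timer and the attempt count. The parameter values and names below are illustrative, not from the patent.

```python
# Toy RDC retransmission policy: RTT-derived initial timeout, exponential
# backoff with a ceiling, bounded number of attempts.

MAX_RETRIES = 8
MAX_TIMEOUT = 4.0   # seconds, cap for exponential backoff

class RttEstimator:
    """Smoothed RTT + variance, TCP-style (RFC 6298 constants)."""
    def __init__(self):
        self.srtt = None
        self.rttvar = 0.0

    def update(self, sample):
        if self.srtt is None:
            self.srtt, self.rttvar = sample, sample / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - sample)
            self.srtt = 0.875 * self.srtt + 0.125 * sample
        return self.timeout()

    def timeout(self):
        return 1.0 if self.srtt is None else self.srtt + 4 * self.rttvar

def retransmit_schedule(est):
    """Timeouts the RDC would use for successive transmission attempts."""
    t = est.timeout()
    schedule = []
    for _ in range(MAX_RETRIES):
        schedule.append(t)
        t = min(t * 2, MAX_TIMEOUT)   # exponential backoff with a ceiling
    return schedule

est = RttEstimator()
for sample in (0.10, 0.12, 0.11):   # measured via the header timestamp field
    est.update(sample)
print([round(t, 3) for t in retransmit_schedule(est)])
```

The timestamp field in the request header is what makes the RTT samples unambiguous, sidestepping the retransmit ambiguity problem noted in item d above.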
Conclusion
[55] Thus, the present invention has been described in terms of preferred and exemplary embodiments thereof. Numerous other embodiments, modifications and variations within the scope and spirit of the appended claims will occur to persons of ordinary skill in the art from a review of this disclosure.
Claims
1. A method of distributing data in a parallel processing system, comprising:
at a data block resource manager, receiving a request for data from a first data block manager, the first data block manager being associated with a first process running on a first computing node in a parallel processing system;
determining that the requested data is stored in a memory location allocated to a second process running on a second computing node in the parallel processing system;
requesting the data from a second data block manager associated with the second process;
receiving the data from the second data block manager; and
providing the received data to the first data block manager.
2. The method recited in claim 1, wherein the first computing node and the second computing node are hosted on a same computing device.
3. The method recited in claim 1, wherein the first computing node is hosted by a first computing device and the second computing node is hosted by a second computing device separate from the first computing device.
5. A method of distributing data in a parallel processing system, comprising:
at a remote data server, receiving a request for data from a first remote data client, the first remote data client being associated with a first process running on a first computing node in a parallel processing system; and
providing the requested data to the first remote data client.
6. A parallel processing system employing any combination of the components discussed above.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010524879A JP2010539588A (en) | 2007-09-11 | 2008-09-11 | Memory sharing and data distribution |
EP08830374A EP2210398A2 (en) | 2007-09-11 | 2008-09-11 | Memory sharing and data distribution |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US97126407P | 2007-09-11 | 2007-09-11 | |
US60/971,264 | 2007-09-11 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2009035690A2 true WO2009035690A2 (en) | 2009-03-19 |
WO2009035690A3 WO2009035690A3 (en) | 2009-06-11 |
Family
ID=40452759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2008/010710 WO2009035690A2 (en) | 2007-09-11 | 2008-09-11 | Memory sharing and data distribution |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP2210398A2 (en) |
JP (1) | JP2010539588A (en) |
WO (1) | WO2009035690A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111475998A (en) * | 2019-01-04 | 2020-07-31 | 明导公司 | Hybrid execution of Electronic Design Automation (EDA) processes to delay acquisition of remote resources |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0856796A2 (en) * | 1997-02-03 | 1998-08-05 | Digital Equipment Corporation | Variable-grained memory sharing for clusters of symmetric multi-processors |
US20040083475A1 (en) * | 2002-10-25 | 2004-04-29 | Mentor Graphics Corp. | Distribution of operations to remote computers |
WO2007086542A2 (en) * | 2006-01-27 | 2007-08-02 | Sony Computer Entertainment Inc. | Methods and apparatus for virtualizing an address space |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003186728A (en) * | 2001-12-14 | 2003-07-04 | Hitachi Ltd | Sharing method for distributed management data |
- 2008-09-11 EP EP08830374A patent/EP2210398A2/en not_active Withdrawn
- 2008-09-11 WO PCT/US2008/010710 patent/WO2009035690A2/en active Application Filing
- 2008-09-11 JP JP2010524879A patent/JP2010539588A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
EP2210398A2 (en) | 2010-07-28 |
JP2010539588A (en) | 2010-12-16 |
WO2009035690A3 (en) | 2009-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8516399B2 (en) | Collaborative environment for physical verification of microdevice designs | |
US20100185994A1 (en) | Topological Pattern Matching | |
US8612919B2 (en) | Model-based design verification | |
US20100023914A1 (en) | Use Of Graphs To Decompose Layout Design Data | |
JP5496986B2 (en) | Parallel operation distribution method and apparatus | |
US10311197B2 (en) | Preserving hierarchy and coloring uniformity in multi-patterning layout design | |
US20080235497A1 (en) | Parallel Data Output | |
US20100257496A1 (en) | Design-Rule-Check Waiver | |
US20130198703A1 (en) | Virtual Flat Traversal Of A Hierarchical Circuit Design | |
US20110320990A1 (en) | Logic-driven layout verification | |
US8677300B2 (en) | Canonical signature generation for layout design data | |
US9262574B2 (en) | Voltage-related analysis of layout design data | |
US20150033196A1 (en) | Clustering For Processing Of Circuit Design Data | |
US20180260511A9 (en) | Design-rule-check waiver | |
JP2010500692A (en) | Multiprocessor architecture with hierarchical processor structure | |
US20130191805A1 (en) | Simulation Of Circuits With Repetitive Elements | |
US20150143317A1 (en) | Determination Of Electromigration Features | |
US20130263074A1 (en) | Analog Rule Check Waiver | |
US10895864B2 (en) | Fabric-independent multi-patterning | |
US20080140989A1 (en) | Multiprocessor Architecture With Hierarchical Processor Organization | |
EP2210398A2 (en) | Memory sharing and data distribution | |
US20130318487A1 (en) | Programmable Circuit Characteristics Analysis | |
US9183330B2 (en) | Estimation of power and thermal profiles | |
US11449658B2 (en) | Methods of generating integrated circuit (IC) layout synthetic patterns and related computer program products | |
US20130080985A1 (en) | Electrostatic damage protection circuitry verification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 08830374 Country of ref document: EP Kind code of ref document: A2 |
|
ENP | Entry into the national phase |
Ref document number: 2010524879 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2008830374 Country of ref document: EP |