EP2069958A2 - Multiprocessor architecture with hierarchical processor organization - Google Patents

Multiprocessor architecture with hierarchical processor organization

Info

Publication number
EP2069958A2
Authority
EP
European Patent Office
Prior art keywords
operations
processor
slave
processors
senior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP07811051A
Other languages
German (de)
English (en)
French (fr)
Inventor
Dragos Dudau
Eugene Miloslavsky
Nicolas Cobb
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mentor Graphics Corp
Original Assignee
Mentor Graphics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors

Definitions

  • the present invention is directed to the distribution of operations from a master computer among one or more different types of slave computers.
  • Various aspects of the invention may be applicable to the distribution of a first type of operation to a first type of slave computing unit, and the distribution of a second type of operation to a second type of slave computing unit.
  • Shifting from single multi-processor computers to multiple networked single-processor computers is particularly useful where the data being processed has parallelism.
  • one portion of the data is independent of another portion of the data. That is, manipulation of a first portion of the data does not require knowledge of or access to a second portion of the data.
  • one single-processor computer can execute an operation on a first portion of the data while another single-processor computer can simultaneously execute another operation on a second portion of the data.
  • one portion of the design such as a semiconductor gate in a first area of a microcircuit
  • another portion of the design such as a wiring line in a second area of the microcircuit.
  • Design analysis operations, such as operations defining a minimum width check of a structure, can thus be executed by one computer for the gate while another computer executes the same operations for the wiring line.
  • Various aspects of the invention relate to techniques of more efficiently processing data for a software application using a plurality of computers. As will be discussed in detail below, embodiments of both tools and methods implementing these techniques have particular application for analyzing microdevice design data by distributing operations among different types of single-processor computers in a network.
  • a computing system has a multiprocessor architecture.
  • the processors are hierarchically organized so that one or more slave processors at a senior hierarchical level provide tasks to one or more slave processors at a junior hierarchical level.
  • the slave processors at the junior hierarchical level will have a different operational capability than the slave processors at the senior hierarchical level, such that the junior slave processors can perform some types of operations better than the senior slave processors.
  • the junior slave processors may be capable of executing one or more operations, such as floating point number calculations, significantly faster than the senior slave processors.
  • Various implementations of the invention may additionally include one or more processors at a master hierarchical level, for coordinating the operation of the senior slave processors, and/or one or more processors at an intermediate hierarchical level for managing the cooperation between the senior slave processors and the junior slave processors.
  • a master computing process distributes operation sets among one or more computing processes running on a senior processor.
  • these operation sets may be parallel (that is, the execution of one of the operation sets does not require results obtained from the prior execution of another of the operation sets, and vice versa).
  • each operation set may include operations of the type that are better performed by the junior slave processors.
  • a computing process running on a senior slave processor will begin executing operations in the operation set.
  • When the senior slave computing process identifies one or more operations of the type better performed by the junior slave processor, it provides this operation or operations to a junior slave processor running on a second type of computing device. After the junior computing process executes its assigned operation or operations, it returns the results to the senior computing process to complete the execution of the operation set.
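The master, senior and junior roles summarized above can be sketched in Python. This is only an illustration under stated assumptions: the `(kind, values)` operation records, the function names, and the use of thread pools in place of networked computers are all hypothetical and do not come from the patent text.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical operation records: (kind, values). The "kind" tags are an
# assumption made for this sketch; the source does not define a data format.

def junior_execute(op):
    """Stand-in for a junior slave: handles floating point operations."""
    kind, values = op
    assert kind == "float"
    return sum(v * v for v in values) ** 0.5  # example floating point work

def senior_execute(operation_set, junior_pool):
    """Stand-in for a senior slave: runs an operation set, handing floating
    point operations to the junior pool and collecting the results."""
    results = []
    for op in operation_set:
        kind, values = op
        if kind == "float":
            # Offload, then wait for the junior's result before continuing.
            results.append(junior_pool.submit(junior_execute, op).result())
        else:
            results.append(sum(values))  # example integer work done locally
    return results

def master_distribute(operation_sets, n_juniors=2):
    """Stand-in for the master: gives each independent operation set to a
    senior worker. The sets are parallel, so execution order is irrelevant."""
    with ThreadPoolExecutor(max_workers=n_juniors) as juniors, \
         ThreadPoolExecutor(max_workers=len(operation_sets)) as seniors:
        futures = [seniors.submit(senior_execute, s, juniors)
                   for s in operation_sets]
        return [f.result() for f in futures]

sets = [[("int", [1, 2, 3]), ("float", [3.0, 4.0])],
        [("float", [6.0, 8.0]), ("int", [10, 20])]]
print(master_distribute(sets))  # [[6, 5.0], [10.0, 30]]
```

Separate pools are used for the senior and junior roles so that a senior waiting on a junior result can never occupy the worker that result depends on.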
  • FIG. 1 is a schematic diagram of a computer that may be employed by various embodiments of the invention.
  • FIG. 2 is a schematic diagram of a processor unit for a computer that may be employed by various embodiments of the invention.
  • FIG. 3 schematically illustrates an example of a computing system with a hierarchical processor arrangement according to various embodiments of the invention.
  • FIGs. 4A-4C and FIGs. 5A and 5B illustrate flowcharts describing the operation of the computing system shown in FIG. 3 according to various embodiments of the invention.
  • Fig. 6 illustrates a chart showing an estimated improvement in operation speed that would be obtained with different computing system configurations according to various embodiments of the invention.
  • FIG. 7 illustrates another example of a computing system with a hierarchical processor arrangement according to various embodiments of the invention.
  • FIG. 8 illustrates yet another example of a computing system with a hierarchical processor arrangement according to various embodiments of the invention.
  • Various embodiments of the invention relate to tools and methods for distributing operations among multiple networked computing devices for execution. Accordingly, to better facilitate an understanding of the invention, an example of a computing device that may be employed in a network made up of a master computer linked to a plurality of different slave computers will be discussed.
  • An illustrative example of a computing device 101 that may be used to implement various embodiments of the invention therefore is illustrated in Figure 1.
  • the computing device 101 has a computing unit 103.
  • the computing unit 103 typically includes a processor unit 105 and a system memory 107.
  • the processor unit 105 may be any type of processing device for executing software instructions, but will conventionally be a microprocessor device.
  • the system memory 107 may include both a read-only memory (ROM) 109 and a random access memory (RAM) 111.
  • ROM read-only memory
  • RAM random access memory
  • both the read-only memory (ROM) 109 and the random access memory (RAM) 111 may store software instructions for execution by the processor unit 105.
  • FIG. 2 illustrates an example of a multi-core processor unit 105 that may be employed with various embodiments of the invention.
  • the processor unit 105 includes a plurality of processor cores 201.
  • Each processor core 201 includes a computing engine 203 and a memory cache 205.
  • a computing engine contains logic devices for performing various computing functions, such as fetching software instructions and then performing the actions specified in the fetched instructions.
  • Each computing engine 203 may then use its corresponding memory cache 205 to quickly store and retrieve data and/or instructions for execution.
  • Each processor core 201 is connected to an interconnect 207.
  • the particular construction of the interconnect 207 may vary depending upon the architecture of the processor unit 201. With some processor units 201, such as the Cell microprocessor created by Sony Corporation, Toshiba Corporation and IBM Corporation, the interconnect 207 may be implemented as an interconnect bus. With other processor units 201, however, such as the Opteron™ and Athlon™ dual-core processors available from Advanced Micro Devices of Sunnyvale, California, the interconnect 207 may be implemented as a system request interface device. In any case, the processor cores 201 communicate through the interconnect 207 with an input/output interface 209 and a memory controller 211.
  • the input/output interface 209 provides a communication interface between the processor unit 201 and the bus 113.
  • the memory controller 211 controls the exchange of information between the processor unit 201 and the system memory 107.
  • the processor units 201 may include additional components, such as a high-level cache memory shared by the processor cores 201.
  • While FIG. 2 shows one illustration of a processor unit 201 that may be employed by some embodiments of the invention, it should be appreciated that this illustration is representative only, and is not intended to be limiting.
  • various embodiments of the invention may employ a computing device with a Cell processor.
  • the Cell processor employs multiple input/output interfaces 209 and multiple memory controllers 211.
  • the Cell processor has nine different processor cores 201 of different types. More particularly, it has six or more synergistic processor elements (SPEs) and a power processor element (PPE).
  • SPEs synergistic processor elements
  • PPE power processor element
  • Each synergistic processor element has a vector-type computing engine 203 with 128 x 128 bit registers, four single-precision floating point computational units, four integer computational units, and a 256KB local store memory that stores both instructions and data.
  • the power processor element then controls the tasks performed by the synergistic processor elements. Because of its configuration, the Cell processor can perform some mathematical operations, such as the calculation of fast Fourier transforms (FFTs), at substantially higher speeds than conventional processor units 105.
  • FFTs fast Fourier transforms
  • the computing unit 103 will be directly or indirectly connected to one or more network interfaces 115 for communicating with other devices in a network as will be discussed in further detail below.
  • the network interface 115 translates data and control signals from the computing unit 103 into network messages according to one or more communication protocols, such as the transmission control protocol (TCP), the user datagram protocol (UDP), and the Internet protocol (IP).
  • TCP transmission control protocol
  • UDP user datagram protocol
  • IP Internet protocol
  • An interface 115 may employ any suitable connection agent (or combination of agents) for connecting to a network, including, for example, a wireless transceiver, a modem, or an Ethernet connection.
  • the connection agent may employ any desired medium, such as radio frequency transmissions, an optical cable, or conductive wires.
  • the processing unit 105 and the system memory 107 are connected, either directly or indirectly, through a bus 113 or alternate communication structure, to one or more peripheral devices.
  • the processing unit 105 or the system memory 107 may be directly or indirectly connected to one or more additional memory storage devices, such as a magnetic hard disk drive 117 or a removable magnetic optical disk drive 119.
  • the computing device 101 may include additional or alternate memory storage devices, such as a magnetic disk drive (not shown) or a flash memory card (not shown).
  • the processing unit 105 and the system memory 107 also may be directly or indirectly connected to one or more input devices 121 and one or more output devices 123.
  • the input devices 121 may include, for example, a keyboard and a pointing device (such as a mouse, touchpad, digitizer, trackball, or joystick).
  • the output devices 123 may include, for example, a display monitor and a printer.
  • peripheral devices may be housed with the computing unit 103 and bus 113. Alternately or additionally, one or more of these peripheral devices may be housed separately from the computing unit 103 and bus 113, and then connected (either directly or indirectly) to the bus 113. Also, it should be appreciated that a computing device 101 employed according to various embodiments of the invention may include any of the components illustrated in Figure 1, may include only a subset of the components illustrated in Figure 1, or may include an alternate combination of components from those shown in Figure 1, including some components that are not shown in Figure 1.
  • various aspects of the invention relate to the execution of sets of operations by a computing system with a multiprocessor architecture. Accordingly, different embodiments of the invention can be employed with a variety of different types of software applications. Some embodiments of the invention, however, may be particularly useful in running software applications that perform operations for simulating, verifying or modifying design data representing a microdevice, such as a microcircuit.
  • Designing and fabricating microcircuit devices involve many steps during a 'design flow' process. These steps are highly dependent on the type of microcircuit, the complexity, the design team, and the microcircuit fabricator or foundry. Several steps are common to all design flows: first a design specification is modeled logically, typically in a hardware design language (HDL). Software and hardware "tools" then verify the design at various stages of the design flow by running software simulators and/or hardware emulators, and errors are corrected.
  • HDL hardware design language
  • the physical design data may represent, for example, the geometric pattern that will be written onto a mask used to fabricate the desired microcircuit device in a photolithographic process at a foundry. It is very important that the physical design information accurately embody the design specification and logical design for proper operation of the device. Further, because the physical design data is employed to create masks used at a foundry, the data must conform to foundry requirements. Each foundry specifies its own physical design parameters for compliance with their process, equipment, and techniques. Accordingly, the design flow may include a design rule check process. During this process, the physical layout of the circuit design is compared with design rules. In addition to rules specified by the foundry, the design rule check process may also check the physical layout of the circuit design against other design rules, such as those obtained from test chips, knowledge in the industry, etc.
  • RET resolution enhancement techniques
  • OPC optical proximity correction
  • SRAF sub-resolution assist features
  • the design of a new integrated circuit may include the interconnection of millions of transistors, resistors, capacitors, or other electrical structures into logic circuits, memory circuits, programmable field arrays, and other circuit devices.
  • cells typically referred to as "cells."
  • all of the transistors making up a memory circuit for storing a single bit may be categorized into a single "bit memory" cell.
  • the group of transistors making up a single-bit memory circuit can thus collectively be referred to and manipulated as a single unit.
  • the design data describing a larger 16-bit memory register circuit can be categorized into a single cell. This higher level "register cell" might then include sixteen bit memory cells, together with the design data describing other miscellaneous circuitry, such as an input/output circuit for transferring data into and out of each of the bit memory cells.
  • the design data describing a 128kB memory array can then be concisely described as a combination of only 64,000 register cells, together with the design data describing its own miscellaneous circuitry, such as an input/output circuit for transferring data into and out of each of the register cells.
  • By categorizing microcircuit design data into hierarchical cells, large data structures can be processed more quickly and efficiently. For example, a circuit designer typically will analyze a design to ensure that each circuit feature described in the design complies with design rules specified by the foundry that will manufacture microcircuits from the design. With the above example, instead of having to analyze each feature in the entire 128kB memory array, a design rule check process can analyze the features in a single bit cell. The results of the check will then be applicable to all of the single bit cells.
  • the design rule check process then can complete the analysis of a register cell simply by analyzing the features of its additional miscellaneous circuitry (which may itself be made up of one or more hierarchical cells). The results of this check will then be applicable to all of the register cells.
  • the design rule check software application can complete the analysis of the entire 128kB memory array simply by analyzing the features of the additional miscellaneous circuitry in the memory array. Thus, the analysis of a large data structure can be compressed into the analyses of a relatively small number of cells making up the data structure.
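The saving from hierarchical analysis can be made concrete with a small Python sketch. The cell names follow the memory array example, but the per-cell feature counts are made-up numbers chosen only for illustration, and the `CELLS` dictionary layout is an assumption rather than a real design database format.

```python
# Hypothetical cell hierarchy for the 128kB memory array example.
# Each cell lists (child_cell, instance_count) pairs plus a count of its own
# "miscellaneous" features. The feature counts are invented for illustration.
CELLS = {
    "bit":      {"children": [],                     "own_features": 6},
    "register": {"children": [("bit", 16)],          "own_features": 40},
    "array":    {"children": [("register", 64_000)], "own_features": 500},
}

def flat_feature_count(name):
    """Features analyzed if every instance of every cell is checked."""
    cell = CELLS[name]
    return cell["own_features"] + sum(
        count * flat_feature_count(child) for child, count in cell["children"])

def hierarchical_feature_count(name, seen=None):
    """Features analyzed if each distinct cell is checked only once."""
    seen = set() if seen is None else seen
    if name in seen:
        return 0
    seen.add(name)
    cell = CELLS[name]
    return cell["own_features"] + sum(
        hierarchical_feature_count(child, seen) for child, _ in cell["children"])

print(flat_feature_count("array"))          # 8704500
print(hierarchical_feature_count("array"))  # 546
```

With these assumed counts, checking each distinct cell once touches 546 features instead of roughly 8.7 million.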
  • the data making up a circuit design may also have parallelism. That is, some portions of a microcircuit design may be independent from other portions of the design. For example, a cell containing design data for a 16 bit comparator will be independent of the register cell. While a "higher" cell may include both a comparator cell and a register cell, one cell does not include the other cell. Instead, the data in these two lower cells are parallel. Because these cells are parallel, the same design rule check operation can be performed on both cells simultaneously without conflict. Thus, in a multi-processor computer running multiple computing threads, a first computing thread can execute a design rule check operation on the register cell while a separate, second computing thread executes the same design rule check operation on the comparator cell.
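As a minimal sketch of this kind of data parallelism, the Python below runs the same check on two independent cells in separate threads. The cell contents (lists of feature widths in microns), the rule value `MIN_WIDTH`, and the function name are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each cell is reduced to a list of feature widths (microns).
cells = {
    "register":   [0.20, 0.25, 0.09],
    "comparator": [0.30, 0.08, 0.07],
}
MIN_WIDTH = 0.10  # assumed rule value, not taken from the source

def min_width_check(widths):
    """Return the widths that violate the minimum width rule."""
    return [w for w in widths if w < MIN_WIDTH]

# Neither cell contains the other, so the two checks share no data and can
# run at the same time without conflict.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {name: pool.submit(min_width_check, widths)
               for name, widths in cells.items()}
    violations = {name: future.result() for name, future in futures.items()}

print(violations)  # {'register': [0.09], 'comparator': [0.08, 0.07]}
```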
  • a microcircuit analysis software application also may have a hierarchical organization with parallelism.
  • a software application that implements design rule check operations for the physical layout data of a microcircuit design will be described.
  • this type of software tool performs operations on the data that defines the geometric features of the microcircuit. For example, a transistor gate is created at the intersection of a region of polysilicon material and a region of diffusion material.
  • the physical layout design data used to form a transistor gate in a lithographic process will be made up of a polygon in a layer of polysilicon material and an overlapping polygon in a layer of diffusion material.
  • microcircuit physical design data will include two different types of data: "drawn layer" design data and "derived layer" design data.
  • the drawn layer data describes polygons drawn in the layers of material that will form the microcircuit.
  • the drawn layer data will usually include polygons in metal layers, diffusion layers, and polysilicon layers.
  • the derived layers will then include features made up of combinations of drawn layer data and other derived layer data. For example, with the transistor gate described above, the derived layer design data describing the gate will be derived from the intersection of a polygon in the polysilicon material layer and a polygon in the diffusion material layer.
  • a design rule check software application will perform two types of operations: “check” operations that confirm whether design data values comply with specified parameters, and “derivation” operations that create derived layer data.
  • check operations that confirm whether design data values comply with specified parameters
  • derivation operations that create derived layer data.
  • transistor gate design data may be created by the following derivation operation:
  • a check operation will then define a parameter or a parameter range for a data design value. For example, a user may want to ensure that no metal wiring line is within a micron of another wiring line. This type of analysis may be performed by the following check operation:
  • the results of this operation will identify each polygon in the metal layer design data that is closer than one micron to another polygon in the metal layer design data.
  • check operations may be performed on derived layer data as well. For example, if a user wanted to confirm that no transistor gate is located within one micron of another gate, the design rule check process might include the following check operation:
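The operation statements themselves are not reproduced in this text. As a stand-in, the derivation and check operations described above can be sketched in Python. This is not the rule syntax of any design rule check tool: the rectangle representation `(x1, y1, x2, y2)`, the function names, and the sample coordinates are all assumptions made for illustration.

```python
from itertools import combinations
from math import hypot

# Polygons are simplified to axis-aligned rectangles (x1, y1, x2, y2), in microns.

def intersect(a, b):
    """Overlap of two rectangles, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def derive_and(layer_a, layer_b):
    """Derivation operation: derived layer = intersection of two layers."""
    overlaps = (intersect(a, b) for a in layer_a for b in layer_b)
    return [r for r in overlaps if r is not None]

def spacing(a, b):
    """Shortest distance between two rectangles (0 if they touch or overlap)."""
    dx = max(0.0, b[0] - a[2], a[0] - b[2])
    dy = max(0.0, b[1] - a[3], a[1] - b[3])
    return hypot(dx, dy)

def check_spacing(layer, minimum):
    """Check operation: report every pair of shapes closer than `minimum`."""
    return [(a, b) for a, b in combinations(layer, 2) if spacing(a, b) < minimum]

diff = [(0, 0, 10, 2)]                                   # one diffusion strip
poly = [(1, -1, 2, 3), (2.5, -1, 3.5, 3), (6, -1, 7, 3)]  # three poly lines

gate = derive_and(poly, diff)          # derived layer built from drawn layers
violations = check_spacing(gate, 1.0)  # gates closer than one micron
print(gate)        # [(1, 0, 2, 2), (2.5, 0, 3.5, 2), (6, 0, 7, 2)]
print(violations)  # [((1, 0, 2, 2), (2.5, 0, 3.5, 2))]
```

The same `check_spacing` function works on a drawn layer such as metal or on a derived layer such as `gate`, which mirrors how the text applies check operations to both kinds of data.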
  • Optical proximity correction (OPC) operations are one example of simulation and verification operations that will typically be executed using floating point number computations.
  • optical proximity correction includes the modification of a physical layout of a circuit design to improve the reproduction accuracy of the layout during a lithographic process.
  • optical proximity correction as used herein will also include the modification of the physical layout to improve the robustness of the lithographic process for, e.g., printing isolated features and/or features at abrupt proximity transitions.
  • the polygon edges of the physical layout are divided into small segments. These segments are then moved, and additional small polygons may be added to the physical layout at strategic locations.
  • the lithographic process is then simulated to determine whether the image that would be created by the modified or "corrected" layout would be better than the image that would be created by previous modifications to the layout image. This process is then iteratively repeated until the simulation and verification tool generates a modified layout that will produce a satisfactory image resolution during an actual lithographic process.
  • optical proximity correction techniques are classified as either rule-based or model-based.
  • With rule-based optical proximity correction, the layout modifications are generated based upon specific rules. For example, small serifs may be automatically added to each convex (i.e., outwardly-pointing) 90° corner in the layout.
  • Model-based optical proximity correction generally will be significantly more complex than rule-based optical proximity correction.
  • With model-based optical proximity correction, lithographic process data obtained from test layouts are used to create mathematical models of the lithographic patterning behavior. Using an appropriate model, the simulation and verification tool will then calculate the image that will be created by a corrected layout during the lithographic process.
  • the layout features undergoing correction then are iteratively manipulated until the image for the layout (calculated using the model) is sufficiently close to the desired layout image.
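The simulate-and-adjust loop of model-based correction can be illustrated with a deliberately tiny Python sketch. Everything here is an assumption: the layout is reduced to a single feature width, and `simulate_printed_width` is a made-up linear stand-in for a calibrated lithography model, not an actual process model.

```python
def simulate_printed_width(drawn):
    """Hypothetical stand-in for a calibrated lithography model (microns)."""
    return 0.9 * drawn - 0.02

def correct_width(target, tolerance=1e-4, max_iterations=50):
    """Adjust the drawn width until the simulated printed width is within
    `tolerance` of the target. Returns (drawn_width, iterations_used)."""
    drawn = target  # start from the uncorrected layout
    for iteration in range(max_iterations):
        error = target - simulate_printed_width(drawn)
        if abs(error) < tolerance:
            return drawn, iteration
        drawn += error  # move the edges by the remaining error
    raise RuntimeError("correction did not converge")

drawn, iterations = correct_width(0.25)
print(drawn, iterations)
```

With this toy model each pass reduces the remaining error by a factor of ten, so the loop ends after a few iterations. A real correction flow moves many edge segments at once and evaluates a far more expensive image simulation at each step.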
  • some model-based optical proximity correction algorithms may require the simulation of multiple lithographic process effects by calculating a weighted sum of pre-simulated results for edges and corners.
  • An example of an optical proximity correction algorithm is described in "Fast Optical and Process Proximity Correction Algorithms for Integrated Circuit Manufacturing," by Nick Cobb (Ph.D. Thesis), University of California, Berkeley, 1998.
  • Obtaining a simulated lithographic image may involve modeling the lithographic light source as a plurality of separate coherent light sources arranged at different angles. For each such coherent light source, a simulated image is obtained by calculating a fast Fourier transform (FFT) to model the operation of the lens used in the lithographic process. These simulated images are then summed to obtain the image that would be produced by the lithographic process.
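The summation over coherent sources described above can be sketched in one dimension with the Python standard library. Several assumptions apply: a naive discrete Fourier transform is substituted for an FFT (it computes the same result, only more slowly), the lens is reduced to a 0/1 pupil filter per source, and the mask, filters and weights are invented values.

```python
import cmath

def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform, used here in place of an FFT."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = [sum(x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t, x in enumerate(signal)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def coherent_image(mask, pupil):
    """Intensity for one coherent source: filter the mask spectrum with the
    pupil, transform back, and take the squared magnitude."""
    spectrum = dft(mask)
    field = dft([s * p for s, p in zip(spectrum, pupil)], inverse=True)
    return [abs(a) ** 2 for a in field]

def simulated_image(mask, sources):
    """Weighted sum of coherent images; `sources` is a list of
    (weight, pupil) pairs, one per coherent light source."""
    image = [0.0] * len(mask)
    for weight, pupil in sources:
        for i, intensity in enumerate(coherent_image(mask, pupil)):
            image[i] += weight * intensity
    return image

mask = [0, 0, 1, 1, 1, 0, 0, 0]        # 1 = clear, 0 = opaque (assumed)
on_axis = [1, 1, 0, 0, 0, 0, 0, 1]     # passes DC and the lowest frequency pair
off_axis = [1, 1, 1, 0, 0, 0, 0, 0]    # shifted pass band, mimicking a tilted source
image = simulated_image(mask, [(0.5, on_axis), (0.5, off_axis)])
print([round(v, 3) for v in image])
```

Each source requires its own forward and inverse transform, which is why this step is dominated by floating point work.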
  • FFT fast Fourier transform
  • FIG. 3 illustrates a hierarchical processor computing system 301 according to various embodiments of the invention.
  • this hierarchical processor computing system 301 may be employed to efficiently implement a simulation and verification tool that calculates both integral number computations and floating point number computations.
  • the hierarchical processor computing system 301 includes a master computing module 303, and a plurality of senior slave computing modules 305A-305 ⁇ .
  • the hierarchical processor computing system 301 also includes a dispatcher computing module 307 and a plurality of junior slave computing modules 309A-309 ⁇ .
  • each of the senior slave computing modules 305A-305 ⁇ may be implemented by a computer, such as computing device 101, using one or more processor units 103.
  • each of the senior slave computing modules 305A-305 ⁇ may be implemented by a conventional server computer using a conventional single-core processor, such as the Opteron™ single-core processor available from Advanced Micro Devices of Sunnyvale, California.
  • one or more of the senior slave computing modules 305A-305 ⁇ may be implemented by a server computer having multiple single-core processors.
  • a single server computer 101 may have multiple Opteron™ single-core processors. Each Opteron™ single-core processor can then be used to implement an instance of a senior slave computing module 305.
  • Still other implementations of the invention may employ computers with multi-core processors, with each processor or, alternatively, each core being used to implement an instantiation of a senior slave computing module 305.
  • a computing device 101 may employ a single Opteron™ dual-core processor to implement a single instantiation of a senior slave computing module 305.
  • a computing device 101 may use a single Opteron™ dual-core processor to implement two separate instantiations of a senior slave computing module 305 (i.e., a separate instantiation being implemented by each core of the Opteron™ dual-core processor).
  • a computing device 101 used to implement multiple instantiations of a senior slave computing module 305 may have a plurality of single-core processors, multi-core processors, or some combination thereof.
  • each of the master computing module 303 and the dispatcher computing module 307 may be implemented by a separate computing device 101 from the senior slave computing modules 305A-305 ⁇ .
  • the master computing module 303 may be implemented by a computing device 101 having a single Opteron™ single-core processor or Opteron™ dual-core processor.
  • the dispatcher computing module 307 may then be implemented by another computing device 101 having a single Opteron™ single-core processor or Opteron™ dual-core processor.
  • one or both of the master computing module 303 and the dispatcher computing module 307 may be implemented using the same computing device 101 or processor unit 201 as a senior slave computing module 305.
  • the master computing module 303 may be implemented by a multiprocessor computing device.
  • One processor unit 201 can be used to run an instantiation of the master computing module 303, while the remaining processor units 201 can then each be used to implement an instantiation of a senior slave computing module 305.
  • a single core in a multi-core processor unit 201 may be used to run an instantiation of the master computing module 303, while the remaining cores can then each be used to implement an instantiation of a senior slave computing module 305.
  • the master computing module 303, the dispatcher computing module 307 or both may even share a single-core processor unit 201 (or a single core of a multi-core processor unit 201) with one or more instantiations of a senior slave computing module 305 using, for example, multi-threading technology.
  • each of the junior slave computing modules 309A-309 ⁇ may be implemented by a computer, such as computing device 101, using one or more processor units 103 that have a different functional capability from the processor units 103 used to implement the senior slave computing modules 305A-305 ⁇ .
  • the senior slave computing modules 305A-305 ⁇ may be implemented using some type of Opteron™ processor available from Advanced Micro Devices.
  • this type of processor is configured to perform integral number computations more quickly than floating point number computations.
  • one or more of the junior slave computing modules 309A-309 ⁇ may be implemented using a Cell processor available from International Business Machines Corporation of Armonk, New York.
  • this type of processor is configured to perform floating point number computations more quickly than the Opteron™ processor.
  • Each of the master computing module 303, senior slave computing modules 305A-305 ⁇ , dispatcher computing module 307, and the junior slave computing modules 309A-309 ⁇ may be a computing process created using some variation of the Unix operating system, some variation of the Microsoft Windows operating system available from Microsoft Corporation of Redmond, Washington, or some combination of both.
  • any software operating system or combination of software operating systems can be used to implement any of the master computing module 303, senior slave computing modules 305A-305 ⁇ , dispatcher computing module 307, and the junior slave computing modules 309A-309 ⁇ .
  • The master computing module 303, senior slave computing modules 305A-305 ⁇, dispatcher computing module 307, and the junior slave computing modules 309A-309 ⁇ are interconnected through a network 311.
  • the network 311 may use any communication protocol, such as the well-known Transmission Control Protocol (TCP) and Internet Protocol (IP).
  • The network 311 may be a wired network using conventional conductive wires, a wireless network (using, for example, radio frequency or infrared frequency signals as a medium), an optical cable network, or some combination thereof. It should be appreciated, however, that the communication rate across the network 311 should be sufficiently fast so as not to delay the operation of the computing modules 303-309.
  • each of the master computing module 303 and the senior slave computing modules 305A-305 ⁇ initiates an instance of the target software application that will be run on the hierarchical processor computing system 301.
  • some examples of the invention may be used to run a simulation and verification software application for analyzing and modifying microcircuit designs.
  • some embodiments of the invention may be used to run the CALIBRE microcircuit design analysis software application available from Mentor Graphics Corporation of Wilsonville, Oregon.
  • In step 403, the master computing module 303 initiates the operation of the dispatcher computing module 307.
  • the operation of the dispatcher computing module 307 may be started manually by a user.
  • the dispatcher computing module 307 has each of the junior slave computing modules 309A-309 ⁇ initiate an instance of the target software application in step 405.
  • When each of the senior slave computing modules 305A-305 ⁇ is ready to begin running an instantiation of the target software application, it reports its readiness and its network address to the master computing module 303 in step 407. Similarly, in step 409, when each of the junior slave computing modules 309A-309 ⁇ is ready to begin running an instantiation of the target software application, it reports its readiness and its network address to the dispatcher computing module 307. When each of the junior slave computing modules 309A-309 ⁇ has reported its readiness and network address to the dispatcher computing module 307, in step 411 the dispatcher computing module 307 reports its readiness and network address to the master computing module 303.
  • the master computing module 303 provides the network address of the dispatcher computing module 307 to each of the senior slave computing modules 305A-305 ⁇ in step 413.
  • the master computing module 303 begins assigning sets of operations to individual senior slave computing modules 305A-305 ⁇ for execution. More particularly, the master computing module 303 will access the next set of operations that are to be performed by the target software application. It provides this operation set to the next available senior slave computing module 305, together with the relevant data required to perform the operation set. This process is repeated until all of the senior slave computing modules 305A-305 ⁇ are occupied (or until there are no further operations to be executed).
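  • The assignment loop described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `assign_operation_sets`, the string labels for operation sets, and the senior slave names are all hypothetical, and the sketch only models the bookkeeping (who gets which set), not the network transfer of data.

```python
from collections import deque


def assign_operation_sets(operation_sets, idle_seniors):
    """Give each pending operation set to the next idle senior slave.

    Stops when every senior slave is occupied or no operation sets remain.
    Returns (assignments, remaining), where assignments maps a senior slave
    name to its operation set and remaining lists the unassigned sets.
    All names here are illustrative assumptions.
    """
    pending = deque(operation_sets)
    idle = deque(idle_seniors)
    assignments = {}
    while pending and idle:
        senior = idle.popleft()                   # next available senior slave
        assignments[senior] = pending.popleft()   # next operation set in order
    return assignments, list(pending)


assignments, remaining = assign_operation_sets(
    ["set0", "set1", "set2"], ["senior_A", "senior_B"]
)
```

  • In this sketch the third operation set stays in `remaining` until a senior slave reports its results and becomes available again.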
  • The operation of the senior slave computing modules 305A-305 ⁇, the dispatcher computing module 307 and the junior slave computing modules 309A-309 ⁇ will now be discussed with regard to the flowchart shown in Figs. 5A-5B.
  • a senior slave computing module 305 executes operations in the operation set that are of a first type better suited to execution by a senior slave computing module 305.
  • The senior slave computing modules 305A-305 ⁇ may be implemented using processor units 201 that execute integral number computations more efficiently than floating point number computations. Accordingly, if the operation set includes operations that primarily involve integral number computations, such as design rule check operations, then these operations will be performed by the senior slave computing module 305 to which they have been assigned by the master computing module 303.
  • The senior slave computing module 305 identifies one or more operations in the operation set that are of a second type better suited to execution by a junior slave computing module 309.
  • the junior slave computing modules 309A-309 ⁇ may be implemented using processor units 201 that execute floating point number computations more efficiently than the processor units 201 used to implement the senior slave computing modules 305A-305 ⁇ . Accordingly, if the operation set includes operations that primarily involve floating point number computations, such as optical proximity correction operations or optical proximity correction verification operations, then these operations will be identified by the senior slave computing module 305 to which they have been assigned by the master computing module 303.
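  • The split between first-type and second-type operations can be sketched as follows. This is only an illustration under stated assumptions: the `Operation` class, its `kind` tag, and the two tag values are hypothetical, and the text does not specify how a senior slave actually recognizes an operation's type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    kind: str  # hypothetical tag: "integer" or "floating_point"


def split_operation_set(operation_set):
    """Separate operations kept on the senior slave from those to offload.

    Operations tagged "floating_point" are treated as the second type
    (candidates for a junior slave); everything else stays local.
    """
    local = [op for op in operation_set if op.kind != "floating_point"]
    offload = [op for op in operation_set if op.kind == "floating_point"]
    return local, offload


ops = [
    Operation("design_rule_check", "integer"),
    Operation("optical_proximity_correction", "floating_point"),
    Operation("opc_verification", "floating_point"),
]
local_ops, offload_ops = split_operation_set(ops)
```

  • Here the design rule check stays with the senior slave, while the two optical proximity correction operations are set aside for transfer to a junior slave.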
  • In step 505, the senior slave computing module 305 sends an inquiry to the dispatcher computing module 307 for the network address of an available junior slave computing module 309.
  • In step 507, the dispatcher computing module 307 sends the senior slave computing module 305 the network address of a junior slave computing module 309 that is not currently occupied performing other operations.
  • The dispatcher computing module 307 may select available junior slave computing modules 309A-309 ⁇ using any desired algorithm, such as a round-robin algorithm.
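  • A round-robin selection of an unoccupied junior slave might look like the sketch below. The `Dispatcher` class, its `acquire` and `release` methods, and the example addresses are assumptions made for illustration; the text only says that any desired algorithm, such as round-robin, may be used.

```python
class Dispatcher:
    """Hands out addresses of unoccupied junior slaves in round-robin order."""

    def __init__(self, junior_addresses):
        self._addresses = list(junior_addresses)
        self._busy = set()
        self._next = 0  # index where the next search starts

    def acquire(self):
        """Return the address of the next free junior slave, or None if all are busy."""
        count = len(self._addresses)
        for offset in range(count):
            index = (self._next + offset) % count
            address = self._addresses[index]
            if address not in self._busy:
                self._busy.add(address)
                self._next = (index + 1) % count
                return address
        return None

    def release(self, address):
        """Mark a junior slave as available again after it returns its results."""
        self._busy.discard(address)


dispatcher = Dispatcher(["10.0.0.1:9000", "10.0.0.2:9000", "10.0.0.3:9000"])
first = dispatcher.acquire()
second = dispatcher.acquire()
dispatcher.release(first)
third = dispatcher.acquire()
```

  • Because the search resumes after the most recently assigned junior slave, a released module is not handed out again until the others have had their turn.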
  • In step 509, the senior slave computing module 305 transfers the identified operations of the second type to the available junior slave computing module 309 for execution.
  • the junior slave computing module 309 then executes the transferred operations in step 511, and returns the results of executing the transferred operations back to the senior slave computing module 305 in step 513.
  • the senior slave computing module 305 may wait indefinitely for the results from the junior slave computing module 309. With other examples of the invention, however, the senior slave computing module 305 may only wait a threshold time period for the results from the junior slave computing module 309. After this time period expires, the senior slave computing module 305 may begin executing the transferred operations itself, on the assumption that the junior slave computing module 309 has failed and will never return the operation results.
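  • The threshold-wait behavior can be sketched with Python's standard `concurrent.futures` module. The function `run_with_fallback` and its parameters are hypothetical; `offload` stands in for whatever call transfers operations to a junior slave and blocks until results arrive, and `run_locally` stands in for the senior slave executing the same operations itself.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_fallback(offload, run_locally, operations, timeout_s):
    """Wait up to timeout_s for offloaded results, else execute locally.

    offload and run_locally are caller-supplied callables (assumptions of
    this sketch) that each take the list of operations and return results.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(offload, operations)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Assume the junior slave has failed and will never answer.
        return run_locally(operations)
    finally:
        # Do not block on the (possibly stuck) offload thread.
        pool.shutdown(wait=False)
```

  • One design consequence of this sketch is that a late reply from the junior slave is simply discarded, since the senior slave has already produced the results on its own.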
  • the senior slave computing module 305 may simply wait in an idle mode for the results from the junior slave computing module 309. With other examples of the invention, however, the senior slave computing module 305 may employ multi-tasking techniques to begin executing a second operation set assigned by the master computing module 303 while waiting for the results from the junior slave computing module 309 to complete the execution of the first operation set.
  • Steps 501-511 are repeated until all of the operations in the operation set have been performed. Once all of the operations in the operation set have been performed, the senior slave computing module 305 returns the results obtained from performing the operation set to the master computing module 303 in step 515.
  • Returning now to Fig. 4, in step 417 the master computing module 303 receives the operation results from the senior slave computing module 305. In step 419, the master computing module 303 determines if there are any more operation sets that need to be executed. If so, then steps 415 and 417 are repeated for the next operation set. If there are no more operations that need to be executed, then the process ends.
  • The Cell microprocessor may be approximately 100 times faster for performing some operations, such as image simulation operations used for optical proximity correction, than a conventional Opteron™ processor.
  • The Cell processor may be slower (e.g., only 0.9 times as fast) than a conventional Opteron™ processor for other types of operations, such as design rule check operations.
  • By employing different types of processor units 201 in a computing system 301, and then matching each operation to the type of processor unit 201 best suited to execute that operation, various implementations of the invention can execute the operations of a process much faster than a homogeneous-processor computing system.
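  • A rough feel for the benefit of matching operations to processors can be had from an Amdahl-style estimate. This is an assumption of this sketch, not the model behind the patent's figures: it supposes a fraction `float_fraction` of the baseline runtime is spent on floating point operations, that those run `junior_speedup` times faster when offloaded, that the rest runs at the baseline rate, and that communication overhead and parallel overlap are ignored.

```python
def estimated_speedup(float_fraction, junior_speedup=100.0):
    """Amdahl-style speedup estimate for offloading floating point work.

    float_fraction: share of baseline runtime spent on offloadable operations.
    junior_speedup: how many times faster the junior slave runs them
    (100.0 mirrors the approximate figure quoted in the text).
    """
    if not 0.0 <= float_fraction <= 1.0:
        raise ValueError("float_fraction must be between 0 and 1")
    if junior_speedup <= 0.0:
        raise ValueError("junior_speedup must be positive")
    new_runtime = (1.0 - float_fraction) + float_fraction / junior_speedup
    return 1.0 / new_runtime


half_float = estimated_speedup(0.5)
```

  • Under these assumptions, a workload that is half floating point work speeds up by a factor of roughly 1.98, which shows why the achievable gain depends strongly on the mix of operation types.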
  • the ratio of senior slave computing modules 305A-305 ⁇ to junior slave computing modules 309A-309 ⁇ may depend upon the types of operations that are expected to be performed by the computing system 301.
  • Some embodiments of the invention may implement a computing system 301 that uses Opteron™ processors and Cell processors to perform simulation and verification operations including image simulation operations.
  • Fig. 6 illustrates the estimated increase in speed that may be obtained for different ratios of simulation/non-simulation operations, based upon the number of Cell processors employed in the computing system 301.
  • The y-axis of this figure illustrates the ratio of the estimated runtime of a typical integrated circuit design analysis process with an embodiment of the invention to the estimated runtime of that integrated circuit design analysis process on a conventional distributed processing system, while the x-axis corresponds to the number of Cell processors employed in the computing system 301. Each curve then corresponds to a ratio of floating point number operations to integral number operations in the analysis process.
  • FIG. 3 illustrates one example of a hierarchical processor computing system that may be implemented according to various embodiments of the invention
  • Fig. 7 illustrates a computing system 701 that includes a second master computing module 703 and a second set of senior slave computing modules 705A-705 ⁇ .
  • The second master computing module 703 and the second set of senior slave computing modules 705A-705 ⁇ share the use of the dispatcher computing module 307 and the junior slave computing modules 309A-309 ⁇.
  • This type of arrangement may be useful where, for example, the processor units 201 used to implement the junior slave computing modules 309A-309 ⁇ are relatively expensive and/or sparsely used, and are to be shared among two or more sets of master computing modules and senior slave computing modules.
  • FIG. 8 illustrates a computing system 801 that omits the dispatcher computing module 307 altogether. Instead, each senior slave computing module 305 is assigned the exclusive use of a corresponding junior slave computing module 309.
  • This type of configuration may be useful where, for example, the processor units 201 used to implement the junior slave computing modules 309A-309 ⁇ are relatively inexpensive and/or are so frequently used that the optimum number of junior slave computing modules 309A-309 ⁇ needed to obtain a desired operating speed would match the number of senior slave computing modules 305A-305 ⁇.
  • still other configurations using a hierarchical arrangement of different types of processors will be apparent to those of ordinary skill in the art.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)
  • Multi Processors (AREA)
EP07811051A 2006-08-13 2007-08-03 Multiprocessor architecture with hierarchical processor organization Withdrawn EP2069958A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US82224706P 2006-08-13 2006-08-13
PCT/US2007/017347 WO2008021024A2 (en) 2006-08-13 2007-08-03 Multiprocessor architecture with hierarchical processor organization

Publications (1)

Publication Number Publication Date
EP2069958A2 (en) 2009-06-17

Family

ID=39082534

Family Applications (1)

Application Number Title Priority Date Filing Date
EP07811051A Withdrawn EP2069958A2 (en) 2006-08-13 2007-08-03 Multiprocessor architecture with hierarchical processor organization

Country Status (4)

Country Link
EP (1) EP2069958A2 (en)
JP (1) JP2010500692A (ja)
CN (1) CN101523381A (zh)
WO (1) WO2008021024A2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7003758B2 (en) 2003-10-07 2006-02-21 Brion Technologies, Inc. System and method for lithography simulation
JP5326308B2 (ja) * 2008-03-13 2013-10-30 日本電気株式会社 コンピュータリンク方法及びシステム
FR2984557B1 (fr) * 2011-12-20 2014-07-25 IFP Energies Nouvelles Systeme et procede de prediction des emissions de polluants d'un vehicule avec calculs simultanes de la cinetique chimique et des emissions
US8959522B2 (en) 2012-01-30 2015-02-17 International Business Machines Corporation Full exploitation of parallel processors for data processing
US9141631B2 (en) 2012-04-16 2015-09-22 International Business Machines Corporation Table boundary detection in data blocks for compression

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2704663B1 (fr) * 1993-04-29 1995-06-23 Sgs Thomson Microelectronics Procédé et dispositif de détermination de la composition d'un circuit intégré.
EP0627682B1 (en) * 1993-06-04 1999-05-26 Sun Microsystems, Inc. Floating-point processor for a high performance three dimensional graphics accelerator
FR2727540B1 (fr) * 1994-11-30 1997-01-03 Bull Sa Outil d'aide a la repartition de la charge d'une application repartie
US5682323A (en) * 1995-03-06 1997-10-28 Lsi Logic Corporation System and method for performing optical proximity correction on macrocell libraries
JP3981238B2 (ja) * 1999-12-27 2007-09-26 富士通株式会社 情報処理装置
US6703167B2 (en) * 2001-04-18 2004-03-09 Lacour Patrick Joseph Prioritizing the application of resolution enhancement techniques
US20040083475A1 (en) * 2002-10-25 2004-04-29 Mentor Graphics Corp. Distribution of operations to remote computers
JP2006155187A (ja) * 2004-11-29 2006-06-15 Sony Corp 情報処理システム、情報処理装置および方法、記録媒体、並びにプログラム。

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2008021024A3 *

Also Published As

Publication number Publication date
JP2010500692A (ja) 2010-01-07
CN101523381A (zh) 2009-09-02
WO2008021024A2 (en) 2008-02-21
WO2008021024A3 (en) 2008-05-15


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20090227

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR MK RS

17Q First examination report despatched

Effective date: 20121009

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20130220