EP2069958A2 - Multiprocessor architecture with hierarchical processor organization - Google Patents
- Publication number
- EP2069958A2 EP07811051A
- Authority
- EP
- European Patent Office
- Prior art keywords
- operations
- processor
- slave
- processors
- senior
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5044—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
Definitions
- the present invention is directed to the distribution of operations from a master computer among one or more different types of slave computers.
- Various aspects of the invention may be applicable to the distribution of a first type of operation to a first type of slave computing unit, and the distribution of a second type of operation to a second type of slave computing unit.
- Shifting from single multi-processor computers to multiple networked single-processor computers is particularly useful where the data being processed has parallelism.
- one portion of the data is independent of another portion of the data. That is, manipulation of a first portion of the data does not require knowledge of or access to a second portion of the data.
- one single-processor computer can execute an operation on a first portion of the data while another single-processor computer can simultaneously execute another operation on a second portion of the data.
- one portion of the design such as a semiconductor gate in a first area of a microcircuit
- another portion of the design such as a wiring line in a second area of the microcircuit.
- Design analysis operations, such as operations defining a minimum width check of a structure, can thus be executed by one computer for the gate while another computer executes the same operations for the wiring line.
- Various aspects of the invention relate to techniques of more efficiently processing data for a software application using a plurality of computers. As will be discussed in detail below, embodiments of both tools and methods implementing these techniques have particular application for analyzing microdevice design data by distributing operations among different types of single-processor computers in a network.
- a computing system has a multiprocessor architecture.
- the processors are hierarchically organized so that one or more slave processors at a senior hierarchical level provide tasks to one or more slave processors at a junior hierarchical level.
- the slave processors at the junior hierarchical level will have a different operational capability than the slave processors at the senior hierarchical level, such that the junior slave processors can perform some types of operations better than the senior slave processors.
- the junior slave processors may be capable of executing one or more operations, such as floating point number calculations, significantly faster than the senior slave processors.
- Various implementations of the invention may additionally include one or more processors at a master hierarchical level, for coordinating the operation of the senior slave processors, and/or one or more processors at an intermediate hierarchical level for managing the cooperation between the senior slave processors and the junior slave processors.
- a master computing process distributes operation sets among one or more computing processes running on a senior processor.
- these operation sets may be parallel (that is, the execution of one of the operation sets does not require results obtained from the prior execution of another of the operation sets, and vice versa).
- each operation set may include operations of the type that are better performed by the junior slave processors.
- a computing process running on a senior slave processor will begin executing operations in the operation set.
- When the senior slave computing process identifies one or more operations of the type better performed by the junior slave processor, it provides this operation or operations to a junior slave computing process running on a second type of computing device. After the junior computing process executes its assigned operation or operations, it returns the results to the senior computing process to complete the execution of the operation set.
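- The hand-off between a senior and a junior computing process can be sketched as follows. This is a minimal, single-process illustration only: the `Operation` class, the `"integer"`/`"float"` labels and the `junior_execute` stand-in are hypothetical names chosen for the example, and a real system would send the operation over the network rather than call a local function.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Operation:
    name: str
    kind: str                  # hypothetical label: "integer" or "float"
    run: Callable[[], float]   # the work itself


def junior_execute(op: Operation) -> float:
    # Stand-in for sending the operation to a junior slave process
    # and waiting for the result to come back.
    return op.run()


def senior_execute(operation_set: List[Operation], log: List[str]) -> List[float]:
    """Run an operation set, delegating the operations a junior handles better."""
    results = []
    for op in operation_set:
        if op.kind == "float":
            log.append(f"junior:{op.name}")
            results.append(junior_execute(op))
        else:
            log.append(f"senior:{op.name}")
            results.append(op.run())
    return results


ops = [
    Operation("count_polygons", "integer", lambda: 3),
    Operation("simulate_image", "float", lambda: 0.5),
]
trace: List[str] = []
print(senior_execute(ops, trace), trace)
```

- The senior process keeps ownership of the operation set throughout; only the individual operations marked for the junior leave it, and their results are merged back in order.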
- FIG. 1 is a schematic diagram of a computer that may be employed by various embodiments of the invention.
- FIG. 2 is a schematic diagram of a processor unit for a computer that may be employed by various embodiments of the invention.
- FIG. 3 schematically illustrates an example of a computing system with a hierarchical processor arrangement according to various embodiments of the invention.
- FIGS. 4A-4C and FIGS. 5A and 5B illustrate flowcharts describing the operation of the computing system shown in FIG. 3 according to various embodiments of the invention.
- FIG. 6 illustrates a chart showing an estimated improvement in operation speed that would be obtained with different computing system configurations according to various embodiments of the invention.
- FIG. 7 illustrates another example of a computing system with a hierarchical processor arrangement according to various embodiments of the invention.
- FIG. 8 illustrates yet another example of a computing system with a hierarchical processor arrangement according to various embodiments of the invention.
- Various embodiments of the invention relate to tools and methods for distributing operations among multiple networked computing devices for execution. Accordingly, to better facilitate an understanding of the invention, an example of a computing device that may be employed in a network made up of a master computer linked to a plurality of different slave computers will be discussed.
- An illustrative example of a computing device 101 that may be used to implement various embodiments of the invention therefore is illustrated in Figure 1.
- the computing device 101 has a computing unit 103.
- the computing unit 103 typically includes a processor unit 105 and a system memory 107.
- the processor unit 105 may be any type of processing device for executing software instructions, but will conventionally be a microprocessor device.
- the system memory 107 may include both a read-only memory (ROM) 109 and a random access memory (RAM) 111.
- ROM read-only memory
- RAM random access memory
- both the read-only memory (ROM) 109 and the random access memory (RAM) 111 may store software instructions for execution by the processor unit 105.
- FIG. 2 illustrates an example of a multi-core processor unit 105 that may be employed with various embodiments of the invention.
- the processor unit 105 includes a plurality of processor cores 201.
- Each processor core 201 includes a computing engine 203 and a memory cache 205.
- a computing engine contains logic devices for performing various computing functions, such as fetching software instructions and then performing the actions specified in the fetched instructions.
- Each computing engine 203 may then use its corresponding memory cache 205 to quickly store and retrieve data and/or instructions for execution.
- Each processor core 201 is connected to an interconnect 207.
- the particular construction of the interconnect 207 may vary depending upon the architecture of the processor unit 105. With some processor units 105, such as the Cell microprocessor created by Sony Corporation, Toshiba Corporation and IBM Corporation, the interconnect 207 may be implemented as an interconnect bus. With other processor units 105, however, such as the Opteron™ and Athlon™ dual-core processors available from Advanced Micro Devices of Sunnyvale, California, the interconnect 207 may be implemented as a system request interface device. In any case, the processor cores 201 communicate through the interconnect 207 with an input/output interface 209 and a memory controller 211.
- the input/output interface 209 provides a communication interface between the processor unit 105 and the bus 113.
- the memory controller 211 controls the exchange of information between the processor unit 105 and the system memory 107.
- the processor units 105 may include additional components, such as a high-level cache memory shared by the processor cores 201.
- While FIG. 2 shows one illustration of a processor unit 105 that may be employed by some embodiments of the invention, it should be appreciated that this illustration is representative only, and is not intended to be limiting.
- various embodiments of the invention may employ a computing device with a Cell processor.
- the Cell processor employs multiple input/output interfaces 209 and multiple memory controllers 211.
- the Cell processor has nine different processor cores 201 of different types. More particularly, it has six or more synergistic processor elements (SPEs) and a power processor element (PPE).
- SPEs synergistic processor elements
- PPE power processor element
- Each synergistic processor element has a vector-type computing engine 203 with 128 x 128 bit registers, four single-precision floating point computational units, four integer computational units, and a 256KB local store memory that stores both instructions and data.
- the power processor element then controls the tasks performed by the synergistic processor elements. Because of its configuration, the Cell processor can perform some mathematical operations, such as the calculation of fast Fourier transforms (FFTs), at substantially higher speeds than conventional processor units 105.
- FFTs fast Fourier transforms
- the computing unit 103 will be directly or indirectly connected to one or more network interfaces 115 for communicating with other devices in a network as will be discussed in further detail below.
- the network interface 115 translates data and control signals from the computing unit 103 into network messages according to one or more communication protocols, such as the transmission control protocol (TCP), the user datagram protocol (UDP), and the Internet protocol (IP).
- TCP transmission control protocol
- UDP user datagram protocol
- IP Internet protocol
- An interface 115 may employ any suitable connection agent (or combination of agents) for connecting to a network, including, for example, a wireless transceiver, a modem, or an Ethernet connection.
- the connection agent may employ any desired medium, such as radio frequency transmissions, an optical cable, or conductive wires.
- the processing unit 105 and the system memory 107 are connected, either directly or indirectly, through a bus 113 or alternate communication structure, to one or more peripheral devices.
- the processing unit 105 or the system memory 107 may be directly or indirectly connected to one or more additional memory storage devices, such as a magnetic hard disk drive 117 or a removable magnetic optical disk drive 119.
- the computing device 101 may include additional or alternate memory storage devices, such as a magnetic disk drive (not shown) or a flash memory card (not shown).
- the processing unit 105 and the system memory 107 also may be directly or indirectly connected to one or more input devices 121 and one or more output devices 123.
- the input devices 121 may include, for example, a keyboard and a pointing device (such as a mouse, touchpad, digitizer, trackball, or joystick).
- the output devices 123 may include, for example, a display monitor and a printer.
- peripheral devices may be housed with the computing unit 103 and bus 113. Alternately or additionally, one or more of these peripheral devices may be housed separately from the computing unit 103 and bus 113, and then connected (either directly or indirectly) to the bus 113. Also, it should be appreciated that a computing device 101 employed according to various embodiments of the invention may include any of the components illustrated in Figure 1, may include only a subset of the components illustrated in Figure 1, or may include an alternate combination of components from those shown in Figure 1, including some components that are not shown in Figure 1.
- various aspects of the invention relate to the execution of sets of operations by a computing system with a multiprocessor architecture. Accordingly, different embodiments of the invention can be employed with a variety of different types of software applications. Some embodiments of the invention, however, may be particularly useful in running software applications that perform operations for simulating, verifying or modifying design data representing a microdevice, such as a microcircuit.
- Designing and fabricating microcircuit devices involve many steps during a 'design flow' process. These steps are highly dependent on the type of microcircuit, the complexity, the design team, and the microcircuit fabricator or foundry. Several steps are common to all design flows: first a design specification is modeled logically, typically in a hardware design language (HDL). Software and hardware "tools" then verify the design at various stages of the design flow by running software simulators and/or hardware emulators, and errors are corrected.
- HDL hardware design language
- the physical design data may represent, for example, the geometric pattern that will be written onto a mask used to fabricate the desired microcircuit device in a photolithographic process at a foundry. It is very important that the physical design information accurately embody the design specification and logical design for proper operation of the device. Further, because the physical design data is employed to create masks used at a foundry, the data must conform to foundry requirements. Each foundry specifies its own physical design parameters for compliance with their process, equipment, and techniques. Accordingly, the design flow may include a design rule check process. During this process, the physical layout of the circuit design is compared with design rules. In addition to rules specified by the foundry, the design rule check process may also check the physical layout of the circuit design against other design rules, such as those obtained from test chips, knowledge in the industry, etc.
- RET resolution enhancement techniques
- OPC optical proximity correction
- SRAF sub-resolution assist features
- the design of a new integrated circuit may include the interconnection of millions of transistors, resistors, capacitors, or other electrical structures into logic circuits, memory circuits, programmable field arrays, and other circuit devices.
- cells typically referred to as "cells."
- all of the transistors making up a memory circuit for storing a single bit may be categorized into a single "bit memory" cell.
- the group of transistors making up a single-bit memory circuit can thus collectively be referred to and manipulated as a single unit.
- the design data describing a larger 16-bit memory register circuit can be categorized into a single cell. This higher level "register cell" might then include sixteen bit memory cells, together with the design data describing other miscellaneous circuitry, such as an input/output circuit for transferring data into and out of each of the bit memory cells.
- the design data describing a 128kB memory array can then be concisely described as a combination of only 64,000 register cells, together with the design data describing its own miscellaneous circuitry, such as an input/output circuit for transferring data into and out of each of the register cells.
- By categorizing microcircuit design data into hierarchical cells, large data structures can be processed more quickly and efficiently. For example, a circuit designer typically will analyze a design to ensure that each circuit feature described in the design complies with design rules specified by the foundry that will manufacture microcircuits from the design. With the above example, instead of having to analyze each feature in the entire 128kB memory array, a design rule check process can analyze the features in a single bit cell. The results of the check will then be applicable to all of the single bit cells.
- the design rule check process then can complete the analysis of a register cell simply by analyzing the features of its additional miscellaneous circuitry (which may itself be made up of one or more hierarchical cells). The results of this check will then be applicable to all of the register cells.
- the design rule check software application can complete the analysis of the entire 128kB memory array simply by analyzing the features of the additional miscellaneous circuitry in the memory array. Thus, the analysis of a large data structure can be compressed into the analyses of a relatively small number of cells making up the data structure.
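- The saving from hierarchical analysis in the memory array example can be made concrete with a small count. The sketch below assumes the three-level hierarchy described above (16 bit memory cells per register cell, 64,000 register cells per array) and ignores the miscellaneous circuitry; the `Cell` class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Cell:
    name: str
    # Each entry is (child cell, number of placements of that child).
    children: List[Tuple["Cell", int]] = field(default_factory=list)


def flat_instances(cell: Cell) -> int:
    """Count every placement of every cell once the hierarchy is flattened."""
    return 1 + sum(count * flat_instances(child) for child, count in cell.children)


def unique_cells(cell: Cell, seen: Optional[Set[str]] = None) -> int:
    """Count the distinct cell definitions, each of which is analyzed only once."""
    seen = set() if seen is None else seen
    seen.add(cell.name)
    for child, _ in cell.children:
        unique_cells(child, seen)
    return len(seen)


bit = Cell("bit_memory")
register = Cell("register", [(bit, 16)])
array = Cell("memory_array", [(register, 64_000)])

print(flat_instances(array))  # 1 array + 64,000 registers + 1,024,000 bit cells
print(unique_cells(array))    # bit cell, register cell, memory array
```

- The 16 x 64,000 = 1,024,000 bit cells correspond to 128,000 bytes, so a flat check visits over a million placements while a hierarchical check analyzes three cell definitions.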
- the data making up a circuit design may also have parallelism. That is, some portions of a microcircuit design may be independent from other portions of the design. For example, a cell containing design data for a 16 bit comparator will be independent of the register cell. While a "higher" cell may include both a comparator cell and a register cell, one cell does not include the other cell. Instead, the data in these two lower cells are parallel. Because these cells are parallel, the same design rule check operation can be performed on both cells simultaneously without conflict. Thus, in a multi-processor computer running multiple computing threads, a first computing thread can execute a design rule check operation on the register cell while a separate, second computing thread executes the same design rule check operation on the comparator cell.
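- Running the same check on two independent cells from separate threads might look like the sketch below. The cell contents (feature widths in microns) and the function names are hypothetical, and a minimum width check is used only as a simple stand-in for a design rule check operation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def min_width_violations(widths: List[float], minimum: float) -> List[float]:
    """Return the feature widths that fall below the minimum allowed width."""
    return [w for w in widths if w < minimum]


def check_cells_in_parallel(cells: Dict[str, List[float]], minimum: float) -> Dict[str, List[float]]:
    # Neither cell needs data from the other, so each check gets its own thread.
    with ThreadPoolExecutor(max_workers=len(cells)) as pool:
        futures = {
            name: pool.submit(min_width_violations, widths, minimum)
            for name, widths in cells.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Hypothetical feature widths, in microns, for two independent cells.
cells = {
    "register": [0.12, 0.09, 0.15],
    "comparator": [0.11, 0.08],
}
print(check_cells_in_parallel(cells, minimum=0.10))
```

- In CPython, threads illustrate the structure rather than the speedup for CPU-bound work; separate processes or separate computers, as the text goes on to describe, provide the actual parallel execution.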
- a microcircuit analysis software application also may have a hierarchical organization with parallelism.
- a software application that implements design rule check operations for the physical layout data of a microcircuit design will be described.
- this type of software tool performs operations on the data that defines the geometric features of the microcircuit. For example, a transistor gate is created at the intersection of a region of polysilicon material and a region of diffusion material.
- the physical layout design data used to form a transistor gate in a lithographic process will be made up of a polygon in a layer of polysilicon material and an overlapping polygon in a layer of diffusion material.
- microcircuit physical design data will include two different types of data: "drawn layer" design data and "derived layer" design data.
- the drawn layer data describes polygons drawn in the layers of material that will form the microcircuit.
- the drawn layer data will usually include polygons in metal layers, diffusion layers, and polysilicon layers.
- the derived layers will then include features made up of combinations of drawn layer data and other derived layer data. For example, with the transistor gate described above, the derived layer design data describing the gate will be derived from the intersection of a polygon in the polysilicon material layer and a polygon in the diffusion material layer.
- a design rule check software application will perform two types of operations: “check” operations that confirm whether design data values comply with specified parameters, and “derivation” operations that create derived layer data.
- check operations that confirm whether design data values comply with specified parameters
- derivation operations that create derived layer data.
- transistor gate design data may be created by the following derivation operation:
- a check operation will then define a parameter or a parameter range for a data design value. For example, a user may want to ensure that no metal wiring line is within a micron of another wiring line. This type of analysis may be performed by the following check operation:
- the results of this operation will identify each polygon in the metal layer design data that are closer than one micron to another polygon in the metal layer design data.
- check operations may be performed on derived layer data as well. For example, if a user wanted to confirm that no transistor gate is located within one micron of another gate, the design rule check process might include the following check operation:
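- As a rough illustration of the two operation types, and not the rule syntax of any particular design rule check tool, the sketch below models polygons as axis-aligned rectangles. A derivation operation builds a gate layer from the overlap of polysilicon and diffusion rectangles, and a check operation reports pairs of shapes closer than a minimum spacing. All names and coordinates are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in microns


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlapping region of two rectangles, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def derive_layer(layer_a: List[Rect], layer_b: List[Rect]) -> List[Rect]:
    """Derivation operation: the derived layer is every overlap between the two layers."""
    derived = []
    for a in layer_a:
        for b in layer_b:
            overlap = intersect(a, b)
            if overlap is not None:
                derived.append(overlap)
    return derived


def spacing(a: Rect, b: Rect) -> float:
    """Shortest distance between two rectangles (0.0 if they touch or overlap)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def spacing_violations(layer: List[Rect], minimum: float) -> List[Tuple[Rect, Rect]]:
    """Check operation: report every pair of shapes closer than the minimum spacing."""
    return [(a, b) for a, b in combinations(layer, 2) if spacing(a, b) < minimum]


poly = [(0, 0, 1, 10), (1.5, 0, 2.5, 10)]   # two vertical polysilicon lines
diff = [(-1, 4, 4, 6)]                      # one horizontal diffusion region
gates = derive_layer(poly, diff)
print(gates)
print(spacing_violations(gates, minimum=1.0))
```

- The same `spacing_violations` function works on a drawn layer (for example, metal wiring) or on a derived layer such as `gates`, which mirrors the point that check operations apply to both kinds of data.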
- Optical proximity correction (OPC) operations are one example of simulation and verification operations that will typically be executed using floating point number computations.
- optical proximity correction includes the modification of a physical layout of a circuit design to improve the reproduction accuracy of the layout during a lithographic process.
- optical proximity correction as used herein will also include the modification of the physical layout to improve the robustness of the lithographic process for, e.g., printing isolated features and/or features at abrupt proximity transitions.
- the polygon edges of the physical layout are divided into small segments. These segments are then moved, and additional small polygons may be added to the physical layout at strategic locations.
- the lithographic process is then simulated to determine whether the image that would be created by the modified or "corrected" layout would be better than the image that would be created by previous modifications to the layout image. This process is then iteratively repeated until the simulation and verification tool generates a modified layout that will produce a satisfactory image resolution during an actual lithographic process.
- optical proximity correction techniques are classified as either rule-based or model-based.
- With rule-based optical proximity correction, the layout modifications are generated based upon specific rules. For example, small serifs may be automatically added to each convex (i.e., outwardly-pointing) 90° corner in the layout.
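- The serif rule can be sketched for a rectilinear polygon whose vertices are listed counter-clockwise: a vertex is convex when the boundary turns left there. The function names, the serif size and the choice to center a square serif on the corner are assumptions made for illustration; production tools place and shape serifs according to their own rule decks.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def convex_corners(polygon: List[Point]) -> List[Point]:
    """Return the vertices where a counter-clockwise polygon turns left."""
    n = len(polygon)
    corners = []
    for i in range(n):
        px, py = polygon[i - 1]
        cx, cy = polygon[i]
        nx, ny = polygon[(i + 1) % n]
        # Cross product of the incoming and outgoing edge vectors.
        cross = (cx - px) * (ny - cy) - (cy - py) * (nx - cx)
        if cross > 0:
            corners.append((cx, cy))
    return corners


def add_serifs(polygon: List[Point], size: float) -> List[Rect]:
    """Rule: place one square serif, centered on the vertex, at every convex corner."""
    half = size / 2
    return [(x - half, y - half, x + half, y + half) for x, y in convex_corners(polygon)]


# Hypothetical L-shaped feature, vertices in counter-clockwise order.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(convex_corners(l_shape))   # every vertex except the inner corner (2, 2)
print(len(add_serifs(l_shape, size=0.2)))
```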
- Model-based optical proximity correction generally will be significantly more complex than rule-based optical proximity correction.
- With model-based optical proximity correction, lithographic process data obtained from test layouts are used to create mathematical models of the lithographic patterning behavior. Using an appropriate model, the simulation and verification tool will then calculate the image that will be created by a corrected layout during the lithographic process.
- the layout features undergoing correction then are iteratively manipulated until the image for the layout (calculated using the model) is sufficiently close to the desired layout image.
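- The iterate-until-close loop of model-based correction can be sketched with a deliberately simple one-dimensional stand-in. Here the "layout" is a single line width, and `toy_model` is a hypothetical process model in which printed lines come out narrower than drawn; real models are calibrated from test layouts and are far more complex.

```python
from typing import Callable, Tuple


def toy_model(drawn_width: float) -> float:
    """Hypothetical process model: the printed width is smaller than the drawn width."""
    return 0.9 * drawn_width - 0.01


def correct_width(
    target: float,
    simulate: Callable[[float], float],
    tolerance: float = 1e-4,
    max_iterations: int = 50,
) -> Tuple[float, int]:
    """Adjust the drawn width until the simulated width is within tolerance of the target."""
    drawn = target
    for iteration in range(max_iterations):
        error = target - simulate(drawn)
        if abs(error) <= tolerance:
            return drawn, iteration
        drawn += error  # move the edge by the remaining error and simulate again
    raise RuntimeError("correction did not converge")


drawn, iterations = correct_width(target=0.10, simulate=toy_model)
print(drawn, toy_model(drawn), iterations)
```

- Each pass through the loop requires a fresh simulation, which is why the floating point cost of the model dominates the run time of this kind of correction.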
- some model-based optical proximity correction algorithms may require the simulation of multiple lithographic process effects by calculating a weighted sum of pre-simulated results for edges and corners.
- An example of an optical proximity correction algorithm is described in "Fast Optical and Process Proximity Correction Algorithms for Integrated Circuit Manufacturing," by Nick Cobb (Ph.D. Thesis), University of California, Berkeley, 1998.
- Obtaining a simulated lithographic image may involve modeling the lithographic light source as a plurality of separate coherent light sources arranged at different angles. For each such coherent light source, a simulated image is obtained by calculating a fast Fourier transform (FFT) to model the operation of the lens used in the lithographic process. These simulated images are then summed to obtain the image that would be produced by the lithographic process.
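- The summation over coherent sources can be sketched in one dimension. The example makes several simplifying assumptions that are not taken from the text: the lens is modeled as an ideal low-pass filter with a hypothetical `cutoff`, each source angle is modeled as an integer frequency `shift` of that filter, and a naive discrete Fourier transform from the standard library stands in for an optimized FFT so that the block has no dependencies.

```python
import cmath
from typing import List, Sequence


def dft(values: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (a stand-in for a real FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(
            values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n)
        )
        result.append(total / n if inverse else total)
    return result


def coherent_image(mask: Sequence[float], shift: int, cutoff: int) -> List[float]:
    """Image intensity from one coherent source: filter the mask spectrum, then square."""
    n = len(mask)
    spectrum = dft(mask)
    filtered = []
    for k in range(n):
        freq = k if k <= n // 2 else k - n        # signed frequency index
        passes = abs(freq - shift) <= cutoff      # shifted ideal low-pass "lens"
        filtered.append(spectrum[k] if passes else 0)
    field = dft(filtered, inverse=True)
    return [abs(value) ** 2 for value in field]


def simulated_image(mask: Sequence[float], shifts: Sequence[int], cutoff: int) -> List[float]:
    """Sum the per-source images to approximate the full lithographic image."""
    total = [0.0] * len(mask)
    for shift in shifts:
        image = coherent_image(mask, shift, cutoff)
        total = [a + b for a, b in zip(total, image)]
    return total


mask = [0, 0, 1, 1, 1, 1, 0, 0]   # hypothetical 1-D mask: one opaque-to-clear feature
print(simulated_image(mask, shifts=[-1, 0, 1], cutoff=2))
```

- Every source costs one forward and one inverse transform, so the number of transforms grows with the number of sources; this is the floating point workload the text identifies as well suited to a processor such as the Cell.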
- FFT fast Fourier transform
- FIG. 3 illustrates a hierarchical processor computing system 301 according to various embodiments of the invention.
- this hierarchical processor computing system 301 may be employed to efficiently implement a simulation and verification tool that calculates both integral number computations and floating point number computations.
- the hierarchical processor computing system 301 includes a master computing module 303, and a plurality of senior slave computing modules 305A-305 ⁇ .
- the hierarchical processor computing system 301 also includes a dispatcher computing module 307 and a plurality of junior slave computing modules 309A-309 ⁇ .
- each of the senior slave computing modules 305A-305 ⁇ may be implemented by a computer, such as computing device 101, using one or more processor units 105.
- each of the senior slave computing modules 305A-305 ⁇ may be implemented by a conventional server computer using a conventional single-core processor, such as the Opteron™ single-core processor available from Advanced Micro Devices of Sunnyvale, California.
- one or more of the senior slave computing modules 305A-305 ⁇ may be implemented by a server computer having multiple single-core processors.
- a single server computer 101 may have multiple Opteron™ single-core processors. Each Opteron™ single-core processor can then be used to implement an instance of a senior slave computing module 305.
- Still other implementations of the invention may employ computers with multi-core processors, with each processor or, alternatively, each core being used to implement an instantiation of a senior slave computing module 305.
- a computing device 101 may employ a single Opteron™ dual-core processor to implement a single instantiation of a senior slave computing module 305.
- a computing device 101 may use a single Opteron™ dual-core processor to implement two separate instantiations of a senior slave computing module 305 (i.e., a separate instantiation being implemented by each core of the Opteron™ dual-core processor).
- a computing device 101 used to implement multiple instantiations of a senior slave computing module 305 may have a plurality of single-core processors, multi-core processors, or some combination thereof.
- each of the master computing module 303 and the dispatcher computing module 307 may be implemented by a separate computing device 101 from the senior slave computing modules 305A-305 ⁇ .
- the master computing module 303 may be implemented by a computing device 101 having a single Opteron™ single-core processor or Opteron™ dual-core processor.
- the dispatcher computing module 307 may then be implemented by another computing device 101 having a single Opteron™ single-core processor or Opteron™ dual-core processor.
- one or both of the master computing module 303 and the dispatcher computing module 307 may be implemented using the same computing device 101 or processor unit 105 as a senior slave computing module 305.
- the master computing module 303 may be implemented by a multiprocessor computing device.
- One processor unit 105 can be used to run an instantiation of the master computing module 303, while the remaining processor units 105 can then each be used to implement an instantiation of a senior slave computing module 305.
- a single core in a multi-core processor unit 105 may be used to run an instantiation of the master computing module 303, while the remaining cores can then each be used to implement an instantiation of a senior slave computing module 305.
- the master computing module 303, the dispatcher computing module 307 or both may even share a single-core processor unit 105 (or a single core of a multi-core processor unit 105) with one or more instantiations of a senior slave computing module 305 using, for example, multi-threading technology.
- each of the junior slave computing modules 309A-309 ⁇ may be implemented by a computer, such as computing device 101, using one or more processor units 105 that have a different functional capability from the processor units 105 used to implement the senior slave computing modules 305A-305 ⁇.
- the senior slave computing modules 305A-305 ⁇ may be implemented using some type of Opteron™ processor available from Advanced Micro Devices.
- this type of processor is configured to perform integral number computations more quickly than floating point number computations.
- one or more of the junior slave computing modules 309A-309 ⁇ may be implemented using a Cell processor available from International Business Machines Corporation of Armonk, New York.
- this type of processor is configured to perform floating point number computations more quickly than the Opteron™ processor.
- Each of the master computing module 303, senior slave computing modules 305A-305 ⁇, dispatcher computing module 307, and the junior slave computing modules 309A-309 ⁇ may be a computing process created using some variation of the Unix operating system, some variation of the Microsoft Windows operating system available from Microsoft Corporation of Redmond, Washington, or some combination of both.
- any software operating system or combination of software operating systems can be used to implement any of the master computing module 303, senior slave computing modules 305A-305 ⁇ , dispatcher computing module 307, and the junior slave computing modules 309A-309 ⁇ .
- each of the master computing module 303, senior slave computing modules 305A-305 ⁇, dispatcher computing module 307, and the junior slave computing modules 309A-309 ⁇ is interconnected through a network 311.
- the network 311 may use any communication protocol, such as the well-known Transmission Control Protocol (TCP) and Internet Protocol (IP).
- TCP Transmission Control Protocol
- IP Internet Protocol
- the network 311 may be a wired network using conventional conductive wires, a wireless network (using, for example, radio frequency or infrared frequency signals as a medium), an optical cable network, or some combination thereof. It should be appreciated, however, that the communication rate across the network 311 should be sufficiently fast so as not to delay the operation of the computing modules 303-309.
- each of the master computing module 303 and the senior slave computing modules 305A-305 ⁇ initiates an instance of the target software application that will be run on the hierarchical processor computing system 301.
- some examples of the invention may be used to run a simulation and verification software application for analyzing and modifying microcircuit designs.
- some embodiments of the invention may be used to run the CALIBRE microcircuit design analysis software application available from Mentor Graphics Corporation of Wilsonville, Oregon.
- step 403 the master computing module 303 initiates the operation of the dispatcher computing module 307.
- the operation of the dispatcher computing module 307 may be started manually by a user.
- the dispatcher computing module 307 has each of the junior slave computing modules 309A-309 ⁇ initiate an instance of the target software application in step 405.
- step 407 When each of the senior slave computing modules 305A-305 ⁇ is ready to begin running an instantiation of the target software application, it reports its readiness and its network address to the master computing module 303 in step 407. Similarly, in step 409, when each of the junior slave computing modules 309A-309 ⁇ is ready to begin running an instantiation of the target software application, it reports its readiness and its network address to the dispatcher computing module 307. When each of the junior slave computing modules 309A-309 ⁇ has reported its readiness and network address to the dispatcher computing module 307, in step 41 1 the dispatcher computing module 307 reports it readiness and network address to master computing module 303.
- the master computing module 303 provides the network address of the dispatcher computing module 307 to each of the senior slave computing modules 305A-305 ⁇ in step 413.
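- The readiness handshake of steps 407-413 can be sketched as follows. This is a minimal in-memory illustration, not the patent's implementation: the `Master` and `Dispatcher` classes, their method names, and the network addresses are all hypothetical, and real modules would exchange these messages over the network 311.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dispatcher:
    """Hypothetical stand-in for dispatcher computing module 307."""
    address: str
    expected_juniors: int
    ready_juniors: list = field(default_factory=list)

    def report_ready(self, junior_address: str) -> bool:
        # Step 409: a junior slave reports its readiness and network address.
        self.ready_juniors.append(junior_address)
        # Step 411 may happen only once every junior slave has reported.
        return len(self.ready_juniors) == self.expected_juniors


@dataclass
class Master:
    """Hypothetical stand-in for master computing module 303."""
    ready_seniors: list = field(default_factory=list)
    dispatcher_address: Optional[str] = None

    def senior_ready(self, senior_address: str) -> None:
        # Step 407: a senior slave reports its readiness and network address.
        self.ready_seniors.append(senior_address)

    def dispatcher_ready(self, dispatcher_address: str) -> None:
        # Step 411: the dispatcher reports its readiness and network address.
        self.dispatcher_address = dispatcher_address

    def announce_dispatcher(self) -> dict:
        # Step 413: every senior slave is told the dispatcher's address.
        return {senior: self.dispatcher_address for senior in self.ready_seniors}


master = Master()
dispatcher = Dispatcher(address="10.0.0.10:7000", expected_juniors=2)

for senior in ("10.0.0.1:6000", "10.0.0.2:6000"):
    master.senior_ready(senior)

for junior in ("10.0.1.1:8000", "10.0.1.2:8000"):
    if dispatcher.report_ready(junior):
        master.dispatcher_ready(dispatcher.address)

routing = master.announce_dispatcher()
```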
- The master computing module 303 begins assigning sets of operations to individual senior slave computing modules 305A-305 ⁇ for execution. More particularly, the master computing module 303 will access the next set of operations that are to be performed by the target software application. It provides this operation set to the next available senior slave computing module 305, together with the relevant data required to perform the operation set. This process is repeated until all of the senior slave computing modules 305A-305 ⁇ are occupied (or until there are no further operations to be executed).
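- The assignment loop described above might look like the sketch below. The function name `assign_operation_sets`, the string labels for operation sets, and the module labels `305A` and `305B` are assumptions made for illustration; the relevant data that accompanies each operation set is omitted.

```python
from collections import deque


def assign_operation_sets(operation_sets, idle_seniors):
    """Hand out operation sets until seniors or operation sets run out.

    Returns (assignments, operation sets still waiting).
    """
    pending = deque(operation_sets)
    idle = deque(idle_seniors)
    assignments = {}
    while pending and idle:
        senior = idle.popleft()                  # next available senior slave
        assignments[senior] = pending.popleft()  # next operation set
    return assignments, list(pending)


assignments, remaining = assign_operation_sets(
    ["set-1", "set-2", "set-3"], ["305A", "305B"]
)
```

- In this example both senior slaves become occupied, so the third operation set stays with the master until a senior slave returns its results.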
- The operation of the senior slave computing modules 305A-305 ⁇ , the dispatcher computing module 307 and the junior slave computing modules 309A-309 ⁇ will now be discussed with regard to the flowchart shown in Figs. 5A-5B.
- A senior slave computing module 305 executes operations in the operation set that are of a first type better suited to execution by a senior slave computing module 305.
- The senior slave computing modules 305A-305 ⁇ may be implemented using processor units 201 that execute integral number computations more efficiently than floating point number computations. Accordingly, if the operation set includes operations that primarily involve integral number computations, such as design rule check operations, then these operations will be performed by the senior slave computing module 305 to which they have been assigned by the master computing module 303.
- The senior slave computing module 305 identifies one or more operations in the operation set that are of a second type better suited to execution by a junior slave computing module 309.
- The junior slave computing modules 309A-309 ⁇ may be implemented using processor units 201 that execute floating point number computations more efficiently than the processor units 201 used to implement the senior slave computing modules 305A-305 ⁇ . Accordingly, if the operation set includes operations that primarily involve floating point number computations, such as optical proximity correction operations or optical proximity correction verification operations, then these operations will be identified by the senior slave computing module 305 to which they have been assigned by the master computing module 303.
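- One way a senior slave could separate the two operation types is sketched below. The `OpType` enum, the `OP_TYPES` lookup table, and the operation names are hypothetical; the text does not say how the type of an operation is determined, so a fixed table is assumed here.

```python
from enum import Enum


class OpType(Enum):
    FIRST = "integer-heavy"          # kept on the senior slave
    SECOND = "floating-point-heavy"  # transferred to a junior slave


# Hypothetical lookup table; the operation kinds are the examples from the text.
OP_TYPES = {
    "design_rule_check": OpType.FIRST,
    "optical_proximity_correction": OpType.SECOND,
    "opc_verification": OpType.SECOND,
}


def split_operation_set(operation_set):
    """Partition operations into (run locally, transfer to a junior slave)."""
    local, transfer = [], []
    for op in operation_set:
        # Unknown kinds stay local: an assumption of this sketch.
        if OP_TYPES.get(op, OpType.FIRST) is OpType.SECOND:
            transfer.append(op)
        else:
            local.append(op)
    return local, transfer


local_ops, transfer_ops = split_operation_set(
    ["design_rule_check", "optical_proximity_correction", "opc_verification"]
)
```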
- In step 505, the senior slave computing module 305 sends an inquiry to the dispatcher computing module 307 for the network address of an available junior slave computing module 309.
- The dispatcher computing module 307 sends the senior slave computing module 305 the network address of a junior slave computing module 309 that is not currently occupied performing other operations in step 507.
- The dispatcher computing module 307 may select available junior slave computing modules 309A-309 ⁇ using any desired algorithm, such as a round-robin algorithm.
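- A round-robin selection that skips occupied junior slaves, as in steps 505-507, could be sketched like this. The class `RoundRobinDispatcher`, its `acquire` and `release` methods, and the labels `309A` to `309C` are illustrative assumptions; the text only names round-robin as one possible algorithm.

```python
class RoundRobinDispatcher:
    """Hypothetical stand-in for dispatcher computing module 307."""

    def __init__(self, junior_addresses):
        self._juniors = list(junior_addresses)
        self._busy = set()
        self._next = 0  # index at which the next search starts

    def acquire(self):
        """Return the address of an unoccupied junior slave, or None."""
        count = len(self._juniors)
        for offset in range(count):
            index = (self._next + offset) % count
            address = self._juniors[index]
            if address not in self._busy:
                self._busy.add(address)
                self._next = (index + 1) % count
                return address
        return None  # every junior slave is occupied

    def release(self, address):
        """Mark a junior slave as available again once it has returned results."""
        self._busy.discard(address)


dispatcher = RoundRobinDispatcher(["309A", "309B", "309C"])
first = dispatcher.acquire()
second = dispatcher.acquire()
dispatcher.release(first)
third = dispatcher.acquire()
fourth = dispatcher.acquire()
fifth = dispatcher.acquire()
```

- Because the search resumes after the most recently assigned module, a released module is not reused until the modules after it in the rotation have had their turn.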
- In step 509, the senior slave computing module 305 transfers the identified operations of the second type to the available junior slave computing module 309 for execution.
- The junior slave computing module 309 then executes the transferred operations in step 511, and returns the results of executing the transferred operations back to the senior slave computing module 305 in step 513.
- The senior slave computing module 305 may wait indefinitely for the results from the junior slave computing module 309. With other examples of the invention, however, the senior slave computing module 305 may only wait a threshold time period for the results from the junior slave computing module 309. After this time period expires, the senior slave computing module 305 may begin executing the transferred operations itself, on the assumption that the junior slave computing module 309 has failed and will never return the operation results.
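- The threshold-time fallback can be illustrated with Python's standard `concurrent.futures` module. In this sketch a worker thread stands in for the network call to the junior slave; `run_with_fallback`, `stalled_junior`, `senior_executes`, and the timing values are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def run_with_fallback(operations, remote_execute, local_execute, timeout_s):
    """Offload operations; run them locally if no result arrives in time."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(remote_execute, operations)
    try:
        # Wait at most timeout_s seconds for the junior slave's results.
        return future.result(timeout=timeout_s), "junior"
    except TimeoutError:
        # Assume the junior slave has failed and do the work on the senior slave.
        return local_execute(operations), "senior"
    finally:
        # Do not block on the stalled call; cancel_futures needs Python 3.9+.
        pool.shutdown(wait=False, cancel_futures=True)


def stalled_junior(operations):
    time.sleep(0.5)  # simulates a junior slave that responds too late
    return [op.upper() for op in operations]


def senior_executes(operations):
    return [op.upper() for op in operations]


results, executed_by = run_with_fallback(
    ["opc"], stalled_junior, senior_executes, timeout_s=0.05
)
```

- A late result from the junior slave is simply discarded here, which is safe only if the transferred operations have no side effects; that is an assumption of the sketch.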
- The senior slave computing module 305 may simply wait in an idle mode for the results from the junior slave computing module 309. With other examples of the invention, however, the senior slave computing module 305 may employ multi-tasking techniques to begin executing a second operation set assigned by the master computing module 303 while waiting for the results from the junior slave computing module 309 to complete the execution of the first operation set.
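- The multi-tasking variant can be sketched as follows: the senior slave hands off part of the first operation set without blocking, works on the second operation set, and only then collects the junior slave's results. The function names and result strings are placeholders, and a thread is assumed as the multi-tasking mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def junior_execute(operations):
    # Placeholder for work done remotely on a junior slave.
    return [f"{op}:done-by-junior" for op in operations]


def senior_execute(operations):
    # Placeholder for work done locally on the senior slave.
    return [f"{op}:done-by-senior" for op in operations]


def process_two_sets(first_set_offload, second_set_local):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Offload part of the first operation set without blocking.
        pending = pool.submit(junior_execute, first_set_offload)
        # Meanwhile, start on the second operation set from the master.
        second_results = senior_execute(second_set_local)
        # Collect the junior slave's results to finish the first set.
        first_results = pending.result()
    return first_results, second_results


first_results, second_results = process_two_sets(["opc-1"], ["drc-7"])
```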
- Steps 501-511 are repeated until all of the operations in the operation set have been performed. Once all of the operations in the operation set have been performed, then the senior slave computing module 305 returns the results obtained from performing the operation set to the master computing module 303 in step 515.
- Returning now to Fig. 4, in step 417 the master computing module 303 receives the operation results from the senior slave computing module 305. In step 419, the master computing module 303 determines if there are any more operation sets that need to be executed. If so, then steps 415 and 417 are repeated for the next operation set. If there are no more operations that need to be executed, then the process ends.
- The Cell microprocessor may be approximately 100 times faster for performing some operations, such as image simulation operations used for optical proximity control, than a conventional Opteron™ processor.
- The Cell processor may be slower than a conventional Opteron™ processor (e.g., only 0.9 times as fast) for other types of operations, such as design rule check operations.
- By employing different types of processor units 201 in a computing system 301, and then matching each operation to the type of processor unit 201 best suited to execute that operation, various implementations of the invention can execute the operations of a process much faster than a homogeneous-processor computing system.
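- A rough worked estimate shows why the matching matters. The assumptions here are not from the source: operations run one after another, transfer overhead is ignored, and $f$ denotes the fraction of the homogeneous runtime spent on operations that the Cell processor executes about 100 times faster. The remaining operations stay on the Opteron™ processor at their original speed.

```latex
S(f) = \frac{1}{(1 - f) + \dfrac{f}{100}}
\qquad\text{e.g.}\qquad
S(0.9) = \frac{1}{0.1 + 0.009} = \frac{1}{0.109} \approx 9.2
```

- Running everything on a Cell processor instead would slow the remaining fraction to 0.9 times its original speed, which is why each operation type is kept on the processor that handles it best.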
- The ratio of senior slave computing modules 305A-305 ⁇ to junior slave computing modules 309A-309 ⁇ may depend upon the types of operations that are expected to be performed by the computing system 301.
- Some embodiments of the invention may implement a computing system 301 that uses Opteron™ processors and Cell processors to perform simulation and verification operations including image simulation operations.
- Fig. 6 illustrates the estimated increase in speed that may be obtained for different ratios of simulation/non-simulation operations, based upon the number of Cell processors employed in the computing system 301.
- The y-axis of this figure illustrates the ratio of the estimated runtime of a typical integrated circuit design analysis process with an embodiment of the invention to the estimated runtime of that integrated circuit design analysis process on a conventional distributed processing system, while the x-axis corresponds to the number of Cell processors employed in the computing system 301. Each curve corresponds to a ratio of floating point number operations to integral number operations in the analysis process.
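- The kind of runtime ratio plotted in Fig. 6 can be explored with a toy model. The sketch below does not reproduce the figure's data or the model behind it: the function `runtime_ratio`, the perfect-parallelism assumption, the concurrent execution of the two kinds of work, and the processor counts are all assumptions chosen for illustration.

```python
def runtime_ratio(fp_fraction, n_opteron, n_cell, cell_fp_speedup=100.0):
    """Estimated heterogeneous runtime divided by conventional runtime.

    fp_fraction is the share of total work (measured in Opteron time) spent
    on floating point operations. All numbers are illustrative.
    """
    if n_cell == 0:
        return 1.0  # nothing to offload to, so the runtime is unchanged
    int_work = 1.0 - fp_fraction
    # Conventional system: all work (normalised to 1) spread over the Opterons.
    conventional = 1.0 / n_opteron
    # Heterogeneous system: integer work on Opterons, floating point on Cells,
    # assumed to run concurrently, so the slower side sets the runtime.
    heterogeneous = max(
        int_work / n_opteron,
        fp_fraction / (n_cell * cell_fp_speedup),
    )
    return heterogeneous / conventional


ratios = [round(runtime_ratio(0.9, n_opteron=32, n_cell=n), 3) for n in (0, 1, 2, 4)]
```

- With these illustrative numbers the ratio falls from 1.0 with no Cell processors to roughly 0.29, 0.14, and 0.1 with one, two, and four. Once the integer work on the Opteron™ processors becomes the bottleneck, adding more Cell processors no longer helps in this model.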
- Fig. 3 illustrates one example of a hierarchical processor computing system that may be implemented according to various embodiments of the invention.
- Fig. 7 illustrates a computing system 701 that includes a second master computing module 703 and a second set of senior slave computing modules 705A-705 ⁇ .
- The second master computing module 703 and the second set of senior slave computing modules 705A-705 ⁇ share the use of the dispatcher computing module 307 and the junior slave computing modules 309A-309 ⁇ .
- This type of arrangement may be useful where, for example, the processor units 201 used to implement the junior slave computing modules 309A-309 ⁇ are relatively expensive and/or sparsely used, and are to be shared among two or more sets of master computing modules and senior slave computing modules.
- Fig. 8 illustrates a computing system 801 that omits the dispatcher computing module 307 altogether. Instead, each senior slave computing module 305 is assigned the exclusive use of a corresponding junior slave computing module 309.
- This type of configuration may be useful where, for example, the processor units 201 used to implement the junior slave computing modules 309A-309 ⁇ are relatively inexpensive and/or are so frequently used that the optimum number of junior slave computing modules 309A-309 ⁇ needed to obtain a desired operating speed would match the number of senior slave computing modules 305A-305 ⁇ .
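- In the dispatcher-less arrangement of Fig. 8, the address lookup reduces to a fixed table. A minimal sketch, assuming a hypothetical helper `pair_exclusively` and illustrative module labels:

```python
def pair_exclusively(seniors, juniors):
    """Give each senior slave its own junior slave (no dispatcher involved)."""
    if len(seniors) != len(juniors):
        raise ValueError("one-to-one pairing needs equal numbers of modules")
    return dict(zip(seniors, juniors))


pairing = pair_exclusively(["305A", "305B"], ["309A", "309B"])
junior_for_305b = pairing["305B"]  # looked up locally; no inquiry to a dispatcher
```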
- Still other configurations using a hierarchical arrangement of different types of processors will be apparent to those of ordinary skill in the art.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Design And Manufacture Of Integrated Circuits (AREA)
- Multi Processors (AREA)
Abstract
The present invention relates to a computing system with a multiprocessor architecture. The processors are hierarchically organized so that one or more slave processors at a senior hierarchical level assign tasks to one or more slave processors at a junior hierarchical level. Further, the slave processors at the junior hierarchical level have a different functional capability than the slave processors at the senior hierarchical level, such that the junior slave processors can perform some types of operations better than the senior slave processors. A master computing process distributes operation sets among one or more computing processes running on a processor at the senior hierarchical level, which begin executing operations in the operation set. When a process running at the senior hierarchical level identifies one or more operations of the type better performed by a processor at the junior hierarchical level, it provides the operation or operations to a process running on a processor at the junior hierarchical level. After the process running at the junior hierarchical level executes its assigned operation or operations, it returns the results to the process running at the senior hierarchical level to complete the execution of the operation set.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US82224706P | 2006-08-13 | 2006-08-13 | |
PCT/US2007/017347 WO2008021024A2 (fr) | 2006-08-13 | 2007-08-03 | Multiprocessor architecture with hierarchical processor organization |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2069958A2 (fr) | 2009-06-17 |
Family
ID=39082534
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP07811051A Withdrawn EP2069958A2 (fr) | 2006-08-13 | 2007-08-03 | Multiprocessor architecture with hierarchical processor organization |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP2069958A2 (fr) |
JP (1) | JP2010500692A (fr) |
CN (1) | CN101523381A (fr) |
WO (1) | WO2008021024A2 (fr) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7003758B2 (en) | 2003-10-07 | 2006-02-21 | Brion Technologies, Inc. | System and method for lithography simulation |
JP5326308B2 (ja) * | 2008-03-13 | 2013-10-30 | 日本電気株式会社 | Computer link method and system |
FR2984557B1 (fr) * | 2011-12-20 | 2014-07-25 | IFP Energies Nouvelles | System and method for predicting the pollutant emissions of a vehicle with simultaneous calculation of chemical kinetics and emissions |
US8959522B2 (en) | 2012-01-30 | 2015-02-17 | International Business Machines Corporation | Full exploitation of parallel processors for data processing |
US9141631B2 (en) | 2012-04-16 | 2015-09-22 | International Business Machines Corporation | Table boundary detection in data blocks for compression |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2704663B1 (fr) * | 1993-04-29 | 1995-06-23 | Sgs Thomson Microelectronics | Method and device for determining the composition of an integrated circuit |
EP0627682B1 (fr) * | 1993-06-04 | 1999-05-26 | Sun Microsystems, Inc. | Floating-point processor for a high-performance three-dimensional graphics accelerator |
FR2727540B1 (fr) * | 1994-11-30 | 1997-01-03 | Bull Sa | Tool for assisting with load distribution in a distributed application |
US5682323A (en) * | 1995-03-06 | 1997-10-28 | Lsi Logic Corporation | System and method for performing optical proximity correction on macrocell libraries |
JP3981238B2 (ja) * | 1999-12-27 | 2007-09-26 | 富士通株式会社 | Information processing apparatus |
US6703167B2 (en) * | 2001-04-18 | 2004-03-09 | Lacour Patrick Joseph | Prioritizing the application of resolution enhancement techniques |
US20040083475A1 (en) * | 2002-10-25 | 2004-04-29 | Mentor Graphics Corp. | Distribution of operations to remote computers |
JP2006155187A (ja) * | 2004-11-29 | 2006-06-15 | Sony Corp | Information processing system, information processing apparatus and method, recording medium, and program |
-
2007
- 2007-08-03 WO PCT/US2007/017347 patent/WO2008021024A2/fr active Application Filing
- 2007-08-03 JP JP2009524613A patent/JP2010500692A/ja active Pending
- 2007-08-03 EP EP07811051A patent/EP2069958A2/fr not_active Withdrawn
- 2007-08-03 CN CNA2007800339413A patent/CN101523381A/zh active Pending
Non-Patent Citations (1)
Title |
---|
See references of WO2008021024A3 * |
Also Published As
Publication number | Publication date |
---|---|
CN101523381A (zh) | 2009-09-02 |
JP2010500692A (ja) | 2010-01-07 |
WO2008021024A3 (fr) | 2008-05-15 |
WO2008021024A2 (fr) | 2008-02-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7409656B1 (en) | Method and system for parallelizing computing operations | |
Lutz et al. | PARTANS: An autotuning framework for stencil computation on multi-GPU systems | |
US8938696B1 (en) | Techniques of optical proximity correction using GPU | |
US8516399B2 (en) | Collaborative environment for physical verification of microdevice designs | |
US10643015B2 (en) | Properties in electronic design automation | |
US20100185994A1 (en) | Topological Pattern Matching | |
US20100023914A1 (en) | Use Of Graphs To Decompose Layout Design Data | |
US8234599B2 (en) | Use of graphs to decompose layout design data | |
JP5496986B2 (ja) | Method and apparatus for distributing parallel operations | |
US10311197B2 (en) | Preserving hierarchy and coloring uniformity in multi-patterning layout design | |
US9047434B2 (en) | Clustering for processing of circuit design data | |
EP2069958A2 (fr) | Multiprocessor architecture with hierarchical processor organization | |
US8352891B2 (en) | Layout decomposition based on partial intensity distribution | |
US8572525B2 (en) | Partition response surface modeling | |
US20080235497A1 (en) | Parallel Data Output | |
US20080140989A1 (en) | Multiprocessor Architecture With Hierarchical Processor Organization | |
US8683394B2 (en) | Pattern matching optical proximity correction | |
US20140040848A1 (en) | Controllable Turn-Around Time For Post Tape-Out Flow | |
US20150143317A1 (en) | Determination Of Electromigration Features | |
US10895864B2 (en) | Fabric-independent multi-patterning | |
Lohoff et al. | Interfacing neuromorphic hardware with machine learning frameworks-a review | |
US20130318487A1 (en) | Programmable Circuit Characteristics Analysis | |
US20120198394A1 (en) | Method For Improving Circuit Design Robustness | |
US10908511B2 (en) | Systems and methods for patterning color assignment | |
US20130145340A1 (en) | Determination Of Uniform Colorability Of Layout Data For A Double Patterning Manufacturing Process |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20090227 |
| AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR |
| AX | Request for extension of the European patent | Extension state: AL BA HR MK RS |
| 17Q | First examination report despatched | Effective date: 20121009 |
| STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| 18D | Application deemed to be withdrawn | Effective date: 20130220 |