US20110010716A1 - Domain Bounding for Symmetric Multiprocessing Systems - Google Patents
Domain Bounding for Symmetric Multiprocessing Systems
- Publication number
- US20110010716A1 (U.S. application Ser. No. 12/815,299)
- Authority
- US
- United States
- Prior art keywords
- task queue
- symmetric multiprocessing
- tasks
- instruction
- scheduled
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5033—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering data affinity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F9/5088—Techniques for rebalancing the load in a distributed system involving task migration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the invention relates to the field of computing on multi-processor computer architectures. More particularly, various implementations of the invention are applicable to developing application software with symmetric and/or asymmetric software designs on symmetric multi-processor embedded systems.
- An embedded system may be described as a special purpose computing system designed to perform one or a few dedicated functions.
- Embedded systems are commonly used in consumer devices like personal digital assistants, mobile phones, videogame consoles, microwaves, washing machines, alarm systems, and digital cameras.
- embedded systems are used in nearly every industry, from telecommunications to manufacturing, and from transportation to medical devices. In fact, embedded systems are so commonly in use today that it is not feasible to exhaustively list specific examples.
- embedded system does not have a precise definition, and determining what is and is not an embedded system can be difficult.
- a general purpose computer such as a laptop
- a laptop is not typically characterized as an embedded system.
- a laptop is usually composed of a multitude of subsystems such as the hard disk drive, the motherboard, the optical drive, the video processing unit, and various communication devices. Many of the individual subsystems comprising the laptop may themselves be embedded systems.
- SoC system on a chip
- ASIC application-specific integrated circuit
- USB universal serial bus
- an embedded system typically is designed to do some specific task, as opposed to being a general purpose computer with a wide range of features for performing many different tasks.
- design engineers can optimize the embedded system for the desired task, which assists in reducing the size and cost of the device as well as increasing its reliability and performance.
- embedded systems may often contain more than one processing unit.
- Embedded systems having more than one processing unit are often referred to as a multi-processor system.
- various implementations of the present invention are particularly applicable to multi-processor systems having more than one homogeneous processor, that is, they have multiple identical processing units.
- a multi-processor computer system is any computing configuration that utilizes more than one processing unit.
- the processing units will typically share a memory.
- one operating system is often used to control the entire system.
- multiple computational tasks, or “instructions,” may be processed at the same time, such as, for example, one by each processing unit.
- This type of computing arrangement i.e. where multiple processing units share a memory and are controlled by a single instance of an operating system
- SMP symmetric multiprocessing
- an operating system is used to control the symmetric multiprocessing system. Controlling which processing units execute which tasks and when, is managed by the operating system, which typically operates on one of the processing units in the system.
- This operating system is often referred to as an SMP operating system or a symmetric multiprocessing operating system.
- various symmetric multiprocessing operating systems currently exist. For example, OS X, Linux, and various UNIX based operating systems are all capable of operating in a symmetric multiprocessing environment.
- a symmetric multiprocessing operating system allows any processor to work on any task, no matter the type of task or where the data for that task is located. Additionally, many symmetric multiprocessing operating systems move tasks between processors to balance the workload efficiently.
- Various implementations of the present invention provide methods and apparatuses for developing symmetric and asymmetric software applications on a single monolithic symmetric multiprocessing operating system.
- Various implementations of the invention may provide an enabling framework for one or all of the following software design patterns: application work load sharing between all processors present in a multi-processor system in a symmetric fashion, application work load sharing between all processors present in a multi-processor system in an asymmetric fashion using task to processor soft affinity declarations, and application work load sharing between all processors present in a multi-processor system using bound computational domains.
- a particular computational task may be “linked”, or a set of computational tasks may be bound, to a particular processing unit. Subsequently, when one such task is to be scheduled, the symmetric multiprocessing operating system ensures that the bound processing unit processes the instruction. When the bound processing unit is not processing the particular computational instruction, the bound processing unit may enter a low power or idle state.
- FIG. 1 shows an illustrative computing environment
- FIG. 2 shows a portion of the illustrative computing environment of FIG. 1 in greater detail
- FIG. 3 illustrates a conventional symmetric multiprocessing system
- FIG. 4 illustrates a method of bounding the processing domain of a symmetric multiprocessing system
- FIG. 5 illustrates a symmetric multiprocessing system according to various implementations of the present invention
- FIG. 6 illustrates the symmetric multiprocessing system of FIG. 5 in alternate detail
- FIG. 7 illustrates the symmetric multiprocessing system of FIG. 5 in alternate detail.
- the methods described herein can be implemented by software stored on a computer readable storage medium and executed on a computer. Furthermore, the selected methods could be executed on a single computer or a computer networked with another computer or computers. For clarity, only those aspects of the software germane to these disclosed methods are described; product details well known in the art are omitted.
- the computer network 101 includes a master computer 103 .
- the master computer 103 is a multi-processor computer that includes a plurality of input and output devices 105 and a memory 107 .
- the input and output devices 105 may include any device for receiving input data from or providing output data to a user.
- the input devices may include, for example, a keyboard, microphone, scanner or pointing device for receiving input from a user.
- the output devices may then include a display monitor, speaker, printer or tactile feedback device.
- the memory 107 may similarly be implemented using any combination of computer readable media that can be accessed by the master computer 103 .
- the computer readable media may include, for example, microcircuit memory devices such as read-write memory (RAM), read-only memory (ROM), electronically erasable and programmable read-only memory (EEPROM) or flash memory microcircuit devices, CD-ROM disks, digital video disks (DVD), or other optical storage devices.
- the computer readable media may also include magnetic cassettes, magnetic tapes, magnetic disks or other magnetic storage devices, punched media, holographic storage devices, or any other medium that can be used to store desired information.
- the master computer 103 runs a software application for performing one or more operations according to various examples of the invention.
- the memory 107 stores software instructions 109 A that, when executed, will implement a software application for performing one or more operations.
- the memory 107 also stores data 109 B to be used with the software application.
- the data 109 B contains process data that the software application uses to perform the operations, at least some of which may be parallel.
- the master computer 103 also includes a plurality of processor units 111 and an interface device 113 .
- the processor units 111 may be any type of processor device that can be programmed to execute the software instructions 109 A, but will conventionally be a microprocessor device.
- one or more of the processor units 111 may be a commercially generic programmable microprocessor, such as Intel® Pentium® or Xeon™ microprocessors, Advanced Micro Devices Athlon™ microprocessors, ARM®, or Motorola 68K/Coldfire® microprocessors.
- one or more of the processor units 111 may be a custom-manufactured processor, such as a microprocessor designed to optimally perform specific types of mathematical operations.
- the interface device 113 , the processor units 111 , the memory 107 and the input/output devices 105 are connected together by a bus 115 .
- the master computing device 103 may employ one or more processing units 111 having more than one processor core.
- FIG. 2 illustrates an example of a multi-core processor unit 111 that may be employed with various embodiments of the invention.
- the processor unit 111 includes a plurality of processor cores 201 .
- Each processor core 201 includes a computing engine 203 and a memory cache 205 .
- a computing engine contains logic devices for performing various computing functions, such as fetching software instructions and then performing the actions specified in the fetched instructions.
- Each computing engine 203 may then use its corresponding memory cache 205 to quickly store and retrieve data and/or instructions for execution.
- Each processor core 201 is connected to an interconnect 207 .
- the particular construction of the interconnect 207 may vary depending upon the architecture of the processor unit 201 .
- the interconnect 207 may be implemented as an interconnect bus.
- the interconnect 207 may be implemented as a system request interface device.
- the processor cores 201 communicate through the interconnect 207 with an input/output interfaces 209 and a memory controller 211 .
- the input/output interface 209 provides a communication interface between the processor unit 201 and the bus 115 .
- the memory controller 211 controls the exchange of information between the processor unit 201 and the system memory 107 .
- the processor units 201 may include additional components, such as a high-level cache memory shared by the processor cores 201 . More particularly, with various implementations, the processor cores 201 all have access to the same memory cache, which may, for example, be the memory cache units 205 shown. This is often referred to as “cache coherency.”
- FIG. 2 shows one illustration of a processor unit 201 that may be employed by some embodiments of the invention, it should be appreciated that this illustration is representative only, and is not intended to be limiting.
- some embodiments of the invention may employ a master computer 103 with one or more Cell processors.
- the Cell processor employs multiple input/output interfaces 209 and multiple memory controllers 211 .
- the Cell processor has nine different processor cores 201 of different types. More particularly, it has six or more synergistic processor elements (SPEs) and a power processor element (PPE).
- SPEs synergistic processor elements
- PPE power processor element
- Each synergistic processor element has a vector-type computing engine 203 with 128×128 bit registers, four single-precision floating point computational units, four integer computational units, and a 256 KB local store memory that stores both instructions and data.
- the power processor element then controls the tasks performed by the synergistic processor elements. Because of its configuration, the Cell processor can perform some mathematical operations, such as the calculation of fast Fourier transforms (FFTs), at substantially higher speeds than many conventional processors.
- FFTs fast Fourier transforms
- a multi-core processor unit 111 can be used in lieu of multiple, separate processor units 111 .
- an alternate implementation of the invention may employ a single processor unit 111 having six cores, two multi-core processor units each having three cores, a multi-core processor unit 111 with four cores together with two separate single-core processor units 111 , etc.
- the interface device 113 allows the master computer 103 to communicate with the slave computers 117 A, 117 B, 117 C . . . 117 x through a communication interface.
- the communication interface may be any suitable type of interface including, for example, a conventional wired network connection or an optically transmissive wired network connection.
- the communication interface may also be a wireless connection, such as a wireless optical connection, a radio frequency connection, an infrared connection, or even an acoustic connection.
- the interface device 113 translates data and control signals from the master computer 103 and each of the slave computers 117 into network messages according to one or more communication protocols, such as the transmission control protocol (TCP), the user datagram protocol (UDP), and the Internet protocol (IP).
- TCP transmission control protocol
- UDP user datagram protocol
- IP Internet protocol
- Each slave computer 117 may include a memory 119 , a processor unit 121 , an interface device 122 , and, optionally, one or more input/output devices 125 connected together by a system bus 127 .
- the optional input/output devices 125 for the slave computers 117 may include any conventional input or output devices, such as keyboards, pointing devices, microphones, display monitors, speakers, and printers.
- the processor units 121 may be any type of conventional or custom-manufactured programmable processor device.
- one or more of the processor units 121 may be commercially generic programmable microprocessors, such as Intel® Pentium® or XeonTM microprocessors, Advanced Micro Devices AthlonTM microprocessors or Motorola 68K/Coldfire® microprocessors. Alternately, one or more of the processor units 121 may be custom-manufactured processors, such as microprocessors designed to optimally perform specific types of mathematical operations. Still further, one or more of the processor units 121 may have more than one core, as described with reference to FIG. 2 above. For example, with some implementations of the invention, one or more of the processor units 121 may be a Cell processor.
- the memory 119 then may be implemented using any combination of the computer readable media discussed above. Like the interface device 113 , the interface devices 123 allow the slave computers 117 to communicate with the master computer 103 over the communication interface.
- the master computer 103 is a multi-processor unit computer with multiple processor units 111 , while each slave computer 117 has a single processor unit 121 . It should be noted, however, that alternate implementations of the invention may employ a master computer having a single processor unit 111 . Further, one or more of the slave computers 117 may have multiple processor units 121 , depending upon their intended use, as previously discussed. Also, while only a single interface device 113 or 123 is illustrated for both the master computer 103 and the slave computers, it should be noted that, with alternate embodiments of the invention, either the computer 103 , one or more of the slave computers 117 , or some combination of both may use two or more different interface devices 113 or 123 for communicating over multiple communication interfaces.
- the master computer 103 and the slave computers 117 are shown as individual discrete units, some implementations may package the master computer 103 and the slave computers 117 into a single unit, such as, for example, a System-on-Chip device.
- the master computer 103 may be connected to one or more external data storage devices. These external data storage devices may be implemented using any combination of computer readable media that can be accessed by the master computer 103 .
- the computer readable media may include, for example, microcircuit memory devices such as read-write memory (RAM), read-only memory (ROM), electronically erasable and programmable read-only memory (EEPROM) or flash memory microcircuit devices, CD-ROM disks, digital video disks (DVD), or other optical storage devices.
- the computer readable media may also include magnetic cassettes, magnetic tapes, magnetic disks or other magnetic storage devices, punched media, holographic storage devices, or any other medium that can be used to store desired information.
- one or more of the slave computers 117 may alternately or additionally be connected to one or more external data storage devices.
- these external data storage devices will include data storage devices that also are connected to the master computer 103 , but they also may be different from any data storage devices accessible by the master computer 103 .
- FIG. 3 illustrates a conventional symmetric multiprocessing system 301 .
- the system 301 includes a computing environment 303 , including processing units 305 .
- the computing environment 303 may be formed by the computer network 101 of FIG. 1 .
- the processing units 305 would comprise the processor units 111 and 121 shown in the figure.
- the computing environment 303 may be formed by the master computer 103 .
- the processing units 305 would comprise the processor units 111 .
- various components of the computing environment 303 are not shown in this example.
- the computing environment 303 would likely include a memory component, which, in some cases, may be implemented by the memory 107 shown in FIG. 1 .
- the system 301 also includes a symmetric multiprocessing scheduler 307 having a symmetric multiprocessing queue 309 .
- the symmetric multiprocessing queue 309 includes tasks 311 .
- any of the tasks 311 may be executed on any of the processing units 305 .
- the symmetric multiprocessing scheduler 307 may assign particular tasks 311 to any of the processing units 305 , and may change or move the assignments dynamically to balance the computational load efficiently.
- processing units 305 are increasingly more specific to particular tasks 311 . This is often the case in embedded systems, where a particular processing unit may have been designed for a specific function, such as, for example, video encoding. Additionally, ones of the processing units 305 may have much higher power consumption needs than other ones of the processing units 305 . As such, use of these processing units 305 could be better controlled to manage power consumption for the system 301 .
- a processing unit 305 may be either a microprocessor or a core within a multi-core microprocessor, such as, for example, the processor unit 111 and the processor core 201 respectively.
- a symmetric computing architecture i.e. homogenous processing units that share memory
- an asymmetric computing architecture i.e. heterogeneous processing units that share memory
- a symmetric multiprocessing system may have a combination of single core microprocessors and multi-core microprocessors.
- the microprocessors may have different hardware specifications. Still, further, the microprocessors may have different computer processor architectures.
- FIG. 4 illustrates a method 401 for bounding the processing domain of a symmetric multiprocessing system.
- the method 401 may be implemented in conjunction with the example symmetric multiprocessing system 501 shown in FIG. 5 .
- the symmetric multiprocessing system 501 includes, among other items, a computing environment 503 having processing units 505 .
- the symmetric multiprocessing system 501 may be formed by modifying the symmetric multiprocessing system 301 shown in FIG. 3 .
- the symmetric multiprocessing system 501 may be formed by utilizing the computing network 101 , or alternatively, from the master computer 103 as the computing environment 503 .
- the method 401 includes an operation 403 for initializing the processing units 505 within the symmetric multiprocessing system 501 and an operation 405 for booting a symmetric multiprocessing operating system 507 on one or more of the processing units 503 .
- the symmetric multiprocessing operating system 507 is booted onto the processing unit 505 i .
- the processing unit 505 that loads the operating system e.g. the processing unit 505 i in this example
- the boot processor is used exclusively by the symmetric multiprocessing operating system 507 for operations related to managing the symmetric multiprocessing system 501 .
- the boot processor is used to load the operating system, but is not used exclusively for operations related to managing the symmetric multiprocessing system 501 . Accordingly, in some implementations, the boot processor is available for general computing tasks unrelated to operating system management. With some implementations of the invention, the operation 403 initializes all the processing units 505 . With alternative implementations, the operation 403 initializes only the boot processor.
- the method 401 further includes an operation 407 for loading the scheduler 509 .
- the scheduler 509 includes a symmetric multiprocessing queue 511 .
- the system 501 additionally includes a user application 515 having tasks 517 .
- the tasks 517 may be explicit instructions that processing units 505 may directly execute. Alternatively, the tasks 517 may be higher level operations that the symmetric multiprocessing operating system 507 will translate into instructions that the processor units 505 may execute.
- FIG. 5 illustrates a single user application 515 , and one set of tasks 517
- more than one user application 515 may be executed by the symmetric multiprocessing operating system 507 .
- the user application 515 may have multiple sets of tasks 517 .
- the set of tasks 517 is typically not static. More particularly, the set of tasks 517 changes as the user application is executed.
- the method 401 additionally includes an operation 409 for generating a bound computational domain queue 513 and an operation 411 for moving selected tasks 517 to the bound computational domain queue 513 .
- all the tasks 517 may be initially loaded into the symmetric multiprocessing queue 511 . More particularly, when the scheduler 509 is first loaded by the operation 407 , the scheduler may only include the symmetric multiprocessing queue 511 , which will include all of the tasks 517 .
- the operation 409 and the operation 411 are performed as a result of a software application interface (API), which, in some cases, may be the result of a user's input.
- API software application interface
- the operation 409 and the operation 411 are triggered without user input, such as, for example, based upon the type of user application 515 or the type of task 517 .
- the operations 409 and 411 may be repeated a number of times, resulting in more than one bound computational domain queue 513 being created within the scheduler 509 .
- the method 401 further includes an operation 413 for forming a processing domain boundary for the bound computational domain queue 513 .
- an “affinity” is created between a bound computational domain queue 513 and one or more processing units 505 .
- a “link” is created between a bound computational domain queue 513 and one or more processing units 505 .
- the operation 413 “affines” one or more of the processing units 505 to the bound computational domain queue 513 .
- Tasks 517 included in a bound computational domain queue 513 that is “affined” to a particular processing unit 505 are said to be affined to that particular processing unit 505 .
- Tasks 517 that are affined to a particular processing unit 505 are given “priority” by the scheduler 509 to execute on that particular processing unit 505 .
- the processing unit 505 is available for scheduling non-affined tasks 517 by the scheduler 509 .
- Priority of execution may be shown by the scheduler 509 by transferring execution of non-affined tasks 517 to idle processing units 505 when affined tasks 517 need to be executed. Alternatively, priority may be shown by stalling execution of the affined task 517 until the affined processing unit 505 is available for executing tasks 517 .
- a single processing unit 505 is affined to a bound computational domain queue 513 by the operations 413 .
- multiple processing units 505 are affined to a bound computational domain queue 513 .
- FIG. 6 illustrates the symmetric multiprocessing system 501 of FIG. 5 , where the bound computational domain queue 513 has been affined to the processing unit 505 iii and the processing unit 505 n , as illustrated by the boundary 603 .
- the user application 515 is not shown. However, the tasks 517 from the user application 515 have been moved into the symmetric multiprocessing queue 511 and the bound computational domain queue 513 .
- the scheduler 509 may assign the tasks 517 iv , 517 v , and 517 n to execute on either of the processing unit 505 iii or 505 n . Additionally, the scheduler 509 may assign the tasks 517 i , 517 ii , or 517 iii to execute on the processing unit 505 ii . Alternatively, if the processing unit 505 iii is not executing tasks 517 from the bound computational domain queue 513 , tasks 517 from the symmetric multiprocessing queue 511 may be executed on the processing unit 505 iii . Alternatively still, if the processing unit 505 n is not executing tasks 517 from the bound computational domain queue 513 , tasks 517 from the symmetric multiprocessing queue 511 may be executed on the processing unit 505 n.
- the operation 413 may “link” one or more of the processing units 505 to a bound computational domain queue 513 (a minimal dispatch sketch contrasting linking with affinity follows this Definitions list).
- Processing units 505 that have been linked to a particular task 517 or set of tasks 517 can only execute those tasks 517 .
- the processor remains idle, as opposed to becoming available for scheduling as in the case of an affined processing unit 505 .
- FIG. 7 illustrates the symmetric multiprocessing system 501 shown in FIG. 5 and FIG. 6 .
- FIG. 7 includes a boundary 703 that shows a link, as opposed to an affinity as shown by the boundary 603 in FIG. 6 .
- the boundary 703 isolates the processing units 505 iii and 505 n to the bound computational domain queue 513 .
- the tasks 517 iv , 517 v , and 517 n may be executed by the processing units 505 iii and 505 n.
- the processing domain for individual tasks 517 may be bound.
- the operation 415 may directly affine the task 517 v with the processing unit 505 iii .
- the processing domain for the queue 513 may be bound.
- a bound computational domain queue 513 may be created as a result of some user input. This may be facilitated by providing an application programming interface (API) instruction set that includes instructions for defining and manipulating bound computational domain queues 513 .
- API application programming interface
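- For illustration only (invented names; not code from the patent): the scheduling behavior described in the items above, where affined tasks get priority on their processing unit, an affined unit may still accept symmetric-queue work when its bound queue is empty, and a linked unit executes only its bound queue's tasks and otherwise idles, can be sketched as a single dispatch function in C.

```c
/* Hypothetical sketch of the dispatch policy described above: affined tasks
 * get priority on their processing unit, an affined unit may still take
 * symmetric-queue work when its bound queue is empty, and a linked unit
 * executes only its bound queue's tasks, idling otherwise.               */
#include <stdio.h>

struct task       { int id; struct task *next; };
struct task_queue { struct task *head; };

struct cpu {
    int                id;
    struct task_queue *bound_queue; /* bound domain queue, NULL if unbounded */
    int                linked;      /* nonzero: hard link, not soft affinity */
};

static struct task *dequeue(struct task_queue *q)
{
    struct task *t = q->head;
    if (t != NULL)
        q->head = t->next;
    return t;
}

static struct task *pick_next_task(struct cpu *cpu, struct task_queue *smp_queue)
{
    struct task *t = NULL;

    if (cpu->bound_queue != NULL)
        t = dequeue(cpu->bound_queue);  /* affined tasks get priority */

    if (t == NULL && !cpu->linked)
        t = dequeue(smp_queue);         /* an affined unit may take general
                                           work; a linked unit stays idle  */
    return t;
}

int main(void)
{
    struct task a = { 1, NULL }, b = { 2, NULL };
    struct task_queue smp   = { &a };   /* symmetric multiprocessing queue  */
    struct task_queue bound = { &b };   /* bound computational domain queue */
    struct cpu affined = { 3, &bound, 0 };

    struct task *t = pick_next_task(&affined, &smp);
    printf("CPU %d runs task %d first\n", affined.id, t ? t->id : -1);
    return 0;
}
```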
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Executing Machine-Instructions (AREA)
Abstract
Methods and apparatuses for developing symmetric and asymmetric software applications on a single monolithic symmetric multiprocessing operating system are disclosed. An enabling framework may be provided for one or all of the following software design patterns: application work load sharing between all processors present in a multi-processor system in a symmetric fashion, application work load sharing between all processors present in a multi-processor system in an asymmetric fashion using task to processor soft affinity declarations, and application work load sharing between all processors present in a multi-processor system using bound computational domains. Further, a particular computational task or a set of computational tasks may be bound to a particular processing unit. Subsequently, when one such task is to be scheduled, the symmetric multiprocessing operating system ensures that the bound processing unit processes the instruction. When the bound processing unit is not processing the particular computational instruction, the bound processing unit may enter a low power or idle state.
Description
- This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application No. 61/186,760, entitled “Domain Bounding for Symmetric Multiprocessing Systems,” filed on Jun. 12, 2009, and naming Arvind Raghuraman et al. as inventors, which application is incorporated entirely herein by reference.
- The invention relates to the field of computing on multi-processor computer architectures. More particularly, various implementations of the invention are applicable to developing application software with symmetric and/or asymmetric software designs on symmetric multi-processor embedded systems.
- An embedded system may be described as a special purpose computing system designed to perform one or a few dedicated functions. Embedded systems are commonly used in consumer devices like personal digital assistants, mobile phones, videogame consoles, microwaves, washing machines, alarm systems, and digital cameras. In addition to the consumer space, embedded systems are used in nearly every industry, from telecommunications to manufacturing, and from transportation to medical devices. In fact, embedded systems are so commonly in use today that it is not feasible to exhaustively list specific examples.
- The term “embedded system” does not have a precise definition, and determining what is and is not an embedded system can be difficult. For example, a general purpose computer, such as a laptop, is not typically characterized as an embedded system. However, a laptop is usually composed of a multitude of subsystems such as the hard disk drive, the motherboard, the optical drive, the video processing unit, and various communication devices. Many of the individual subsystems comprising the laptop may themselves be embedded systems.
- The complexity of embedded systems can vary from, for example, systems with a single microcontroller chip and a light emitting diode to systems with multiple microprocessor units and various peripheral communication interfaces and mechanical parts. Manufacturers of modern microprocessors are increasingly adding components and peripheral modules to their microprocessors, creating what may be thought of as embedded processors. This type of embedded system is often referred to as a system on a chip (SoC). A simple example of a system on chip is an application-specific integrated circuit (ASIC) packaged with a universal serial bus (USB) port. Additionally, embedded systems range from those having no user interface at all to those with full user interfaces similar to a desktop operating system.
- There are many advantages to using embedded systems. For example, an embedded system typically is designed to do some specific task, as opposed to being a general purpose computer with a wide range of features for performing many different tasks. As a result, design engineers can optimize the embedded system for the desired task, which assists in reducing the size and cost of the device as well as increasing its reliability and performance.
- As stated above, embedded systems may often contain more than one processing unit. Embedded systems having more than one processing unit are often referred to as multi-processor systems. As will be apparent from the discussion below, various implementations of the present invention are particularly applicable to multi-processor systems having more than one homogeneous processor, that is, systems having multiple identical processing units. In general, a multi-processor computer system is any computing configuration that utilizes more than one processing unit. The processing units will typically share a memory. Additionally, one operating system is often used to control the entire system. In this type of arrangement, multiple computational tasks, or “instructions,” may be processed at the same time, such as, for example, one by each processing unit. This type of computing arrangement (i.e. where multiple processing units share a memory and are controlled by a single instance of an operating system) is often referred to as “symmetric multiprocessing” or SMP.
- As indicated, an operating system is used to control the symmetric multiprocessing system. Controlling which processing units execute which tasks, and when, is managed by the operating system, which typically operates on one of the processing units in the system. This operating system is often referred to as an SMP operating system or a symmetric multiprocessing operating system. As those of skill in the art can appreciate, various symmetric multiprocessing operating systems currently exist. For example, OS X, Linux, and various UNIX based operating systems are all capable of operating in a symmetric multiprocessing environment. Typically, a symmetric multiprocessing operating system allows any processor to work on any task, no matter the type of task or where the data for that task is located. Additionally, many symmetric multiprocessing operating systems move tasks between processors to balance the workload efficiently. One reason for this is to keep all processing units in the system busy. Often, application software developed on such systems is referred to as having a symmetric software design. Additionally, some symmetric multiprocessing operating systems provide a user with the capability to define “task” to processing unit affinity, such as, for example, with an “affinity definition.” In such systems, a task affined with an affinity definition always executes on the processing unit to which it was affined when the task is scheduled. Application software developed on symmetric multi-processing systems using this task to processor affinity feature is often referred to as having an asymmetric software design.
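- For illustration only (not part of the original disclosure): the task-to-processing-unit affinity described above is exposed on Linux, one of the SMP operating systems named here, through the standard sched_setaffinity interface. The minimal sketch below pins the calling task to a single processing unit, which is the effect an "affinity definition" has when the task is scheduled; the CPU index 2 is arbitrary.

```c
/* Minimal Linux example of task-to-processor affinity. This is the standard
 * kernel interface, not the mechanism claimed by the patent.               */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);      /* start with an empty processor set            */
    CPU_SET(2, &mask);    /* affine the calling task to processing unit 2 */

    /* pid 0 means "the calling thread"; the scheduler will now run this
     * task only on CPU 2, mirroring an affinity definition.              */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("task pinned to CPU 2\n");
    return 0;
}
```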
- This type of task balancing and workload sharing may, however, in some cases, be disadvantageous. This is particularly true in an embedded system where hardware and power constraints may dictate that particular processing units be employed to perform a particular type of task during particular times or operate on data located in a specific location.
- Various implementations of the present invention provide methods and apparatuses for developing symmetric and asymmetric software applications on a single monolithic symmetric multiprocessing operating system. Various implementations of the invention may provide an enabling framework for one or all of the following software design patterns: application work load sharing between all processors present in a multi-processor system in a symmetric fashion, application work load sharing between all processors present in a multi-processor system in an asymmetric fashion using task to processor soft affinity declarations, and application work load sharing between all processors present in a multi-processor system using bound computational domains.
- With some implementations, a particular computational task may be “linked”, or a set of computational tasks may be bound, to a particular processing unit. Subsequently, when one such task is to be scheduled, the symmetric multiprocessing operating system ensures that the bound processing unit processes the instruction. When the bound processing unit is not processing the particular computational instruction, the bound processing unit may enter a low power or idle state.
- The present invention will be described by way of illustrative embodiments shown in the accompanying drawings in which like references denote similar elements, and in which:
- FIG. 1 shows an illustrative computing environment;
- FIG. 2 shows a portion of the illustrative computing environment of FIG. 1 in greater detail;
- FIG. 3 illustrates a conventional symmetric multiprocessing system;
- FIG. 4 illustrates a method of bounding the processing domain of a symmetric multiprocessing system;
- FIG. 5 illustrates a symmetric multiprocessing system according to various implementations of the present invention;
- FIG. 6 illustrates the symmetric multiprocessing system of FIG. 5 in alternate detail; and
- FIG. 7 illustrates the symmetric multiprocessing system of FIG. 5 in alternate detail.
- The operations of the disclosed implementations may be described herein in a particular sequential order. However, it should be understood that this manner of description encompasses rearrangements, unless a particular ordering is required by specific language set forth below. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Moreover, for the sake of simplicity, the illustrated flow charts and block diagrams typically do not show the various ways in which particular methods can be used in conjunction with other methods.
- It should also be noted that the detailed description sometimes uses terms like “determine” to describe the disclosed methods. Such terms are often high-level abstractions of the actual operations that are performed. The actual operations that correspond to these terms will often vary depending on the particular implementation, and will be readily discernible by one of ordinary skill in the art.
- The methods described herein can be implemented by software stored on a computer readable storage medium and executed on a computer. Furthermore, the selected methods could be executed on a single computer or a computer networked with another computer or computers. For clarity, only those aspects of the software germane to these disclosed methods are described; product details well known in the art are omitted.
- As the techniques of the present invention may be implemented using software instructions executed by one or more programmable computing devices, the components and operation of a generic programmable computer system on which various implementations of the invention may be employed will first be described. Further, because of the complexity of some electronic design automation processes and the large size of many circuit designs, various electronic design automation tools are configured to operate on a computing system capable of simultaneously running multiple processing threads. The components and operation of a computer network having a host or master computer and one or more remote or slave computers therefore will be described with reference to FIG. 1. This operating environment is only one example of a suitable operating environment, however, and is not intended to suggest any limitation as to the scope of use or functionality of the invention.
- In FIG. 1, the computer network 101 includes a master computer 103. In the illustrated example, the master computer 103 is a multi-processor computer that includes a plurality of input and output devices 105 and a memory 107. The input and output devices 105 may include any device for receiving input data from or providing output data to a user. The input devices may include, for example, a keyboard, microphone, scanner or pointing device for receiving input from a user. The output devices may then include a display monitor, speaker, printer or tactile feedback device. These devices and their connections are well known in the art, and thus will not be discussed at length here.
- The memory 107 may similarly be implemented using any combination of computer readable media that can be accessed by the master computer 103. The computer readable media may include, for example, microcircuit memory devices such as read-write memory (RAM), read-only memory (ROM), electronically erasable and programmable read-only memory (EEPROM) or flash memory microcircuit devices, CD-ROM disks, digital video disks (DVD), or other optical storage devices. The computer readable media may also include magnetic cassettes, magnetic tapes, magnetic disks or other magnetic storage devices, punched media, holographic storage devices, or any other medium that can be used to store desired information.
- As will be discussed in detail below, the master computer 103 runs a software application for performing one or more operations according to various examples of the invention. Accordingly, the memory 107 stores software instructions 109A that, when executed, will implement a software application for performing one or more operations. The memory 107 also stores data 109B to be used with the software application. In the illustrated embodiment, the data 109B contains process data that the software application uses to perform the operations, at least some of which may be parallel.
- The master computer 103 also includes a plurality of processor units 111 and an interface device 113. The processor units 111 may be any type of processor device that can be programmed to execute the software instructions 109A, but will conventionally be a microprocessor device. For example, one or more of the processor units 111 may be a commercially generic programmable microprocessor, such as Intel® Pentium® or Xeon™ microprocessors, Advanced Micro Devices Athlon™ microprocessors, ARM®, or Motorola 68K/Coldfire® microprocessors. Alternately or additionally, one or more of the processor units 111 may be a custom-manufactured processor, such as a microprocessor designed to optimally perform specific types of mathematical operations. The interface device 113, the processor units 111, the memory 107 and the input/output devices 105 are connected together by a bus 115.
- With some implementations of the invention, the master computing device 103 may employ one or more processing units 111 having more than one processor core. Accordingly, FIG. 2 illustrates an example of a multi-core processor unit 111 that may be employed with various embodiments of the invention. As seen in this figure, the processor unit 111 includes a plurality of processor cores 201. Each processor core 201 includes a computing engine 203 and a memory cache 205. As known to those of ordinary skill in the art, a computing engine contains logic devices for performing various computing functions, such as fetching software instructions and then performing the actions specified in the fetched instructions. These actions may include, for example, adding, subtracting, multiplying, and comparing numbers, performing logical operations such as AND, OR, NOR and XOR, and retrieving data. Each computing engine 203 may then use its corresponding memory cache 205 to quickly store and retrieve data and/or instructions for execution.
- Each processor core 201 is connected to an interconnect 207. The particular construction of the interconnect 207 may vary depending upon the architecture of the processor unit 201. With some processor cores 201, such as the Cell microprocessor created by Sony Corporation, Toshiba Corporation and IBM Corporation, the interconnect 207 may be implemented as an interconnect bus. With other processor units 201, however, such as the Opteron™ and Athlon™ dual-core processors available from Advanced Micro Devices of Sunnyvale, Calif., the interconnect 207 may be implemented as a system request interface device. In any case, the processor cores 201 communicate through the interconnect 207 with an input/output interface 209 and a memory controller 211. The input/output interface 209 provides a communication interface between the processor unit 201 and the bus 115. Similarly, the memory controller 211 controls the exchange of information between the processor unit 201 and the system memory 107. With some implementations of the invention, the processor units 201 may include additional components, such as a high-level cache memory shared by the processor cores 201. More particularly, with various implementations, the processor cores 201 all have access to the same memory cache, which may, for example, be the memory cache units 205 shown. This is often referred to as “cache coherency.”
- While FIG. 2 shows one illustration of a processor unit 201 that may be employed by some embodiments of the invention, it should be appreciated that this illustration is representative only, and is not intended to be limiting. For example, some embodiments of the invention may employ a master computer 103 with one or more Cell processors. The Cell processor employs multiple input/output interfaces 209 and multiple memory controllers 211. Also, the Cell processor has nine different processor cores 201 of different types. More particularly, it has six or more synergistic processor elements (SPEs) and a power processor element (PPE). Each synergistic processor element has a vector-type computing engine 203 with 128×128 bit registers, four single-precision floating point computational units, four integer computational units, and a 256 KB local store memory that stores both instructions and data. The power processor element then controls the tasks performed by the synergistic processor elements. Because of its configuration, the Cell processor can perform some mathematical operations, such as the calculation of fast Fourier transforms (FFTs), at substantially higher speeds than many conventional processors.
- It also should be appreciated that, with some implementations, a multi-core processor unit 111 can be used in lieu of multiple, separate processor units 111. For example, rather than employing six separate processor units 111, an alternate implementation of the invention may employ a single processor unit 111 having six cores, two multi-core processor units each having three cores, a multi-core processor unit 111 with four cores together with two separate single-core processor units 111, etc.
- Returning now to FIG. 1, the interface device 113 allows the master computer 103 to communicate with the slave computers 117A, 117B, 117C . . . 117x through a communication interface. The communication interface may be any suitable type of interface including, for example, a conventional wired network connection or an optically transmissive wired network connection. The communication interface may also be a wireless connection, such as a wireless optical connection, a radio frequency connection, an infrared connection, or even an acoustic connection. The interface device 113 translates data and control signals from the master computer 103 and each of the slave computers 117 into network messages according to one or more communication protocols, such as the transmission control protocol (TCP), the user datagram protocol (UDP), and the Internet protocol (IP). These and other conventional communication protocols are well known in the art, and thus will not be discussed here in more detail.
- Each slave computer 117 may include a memory 119, a processor unit 121, an interface device 122, and, optionally, one or more input/output devices 125 connected together by a system bus 127. As with the master computer 103, the optional input/output devices 125 for the slave computers 117 may include any conventional input or output devices, such as keyboards, pointing devices, microphones, display monitors, speakers, and printers. Similarly, the processor units 121 may be any type of conventional or custom-manufactured programmable processor device. For example, one or more of the processor units 121 may be commercially generic programmable microprocessors, such as Intel® Pentium® or Xeon™ microprocessors, Advanced Micro Devices Athlon™ microprocessors or Motorola 68K/Coldfire® microprocessors. Alternately, one or more of the processor units 121 may be custom-manufactured processors, such as microprocessors designed to optimally perform specific types of mathematical operations. Still further, one or more of the processor units 121 may have more than one core, as described with reference to FIG. 2 above. For example, with some implementations of the invention, one or more of the processor units 121 may be a Cell processor. The memory 119 then may be implemented using any combination of the computer readable media discussed above. Like the interface device 113, the interface devices 123 allow the slave computers 117 to communicate with the master computer 103 over the communication interface.
- In the illustrated example, the master computer 103 is a multi-processor unit computer with multiple processor units 111, while each slave computer 117 has a single processor unit 121. It should be noted, however, that alternate implementations of the invention may employ a master computer having a single processor unit 111. Further, one or more of the slave computers 117 may have multiple processor units 121, depending upon their intended use, as previously discussed. Also, while only a single interface device 113 or 123 is illustrated for both the master computer 103 and the slave computers 117, it should be noted that, with alternate embodiments of the invention, either the computer 103, one or more of the slave computers 117, or some combination of both may use two or more different interface devices 113 or 123 for communicating over multiple communication interfaces.
- Furthermore, it is to be appreciated that, although in the example, the master computer 103 and the slave computers 117 are shown as individual discrete units, some implementations may package the master computer 103 and the slave computers 117 into a single unit, such as, for example, a System-on-Chip device.
- With various examples of the invention, the master computer 103 may be connected to one or more external data storage devices. These external data storage devices may be implemented using any combination of computer readable media that can be accessed by the master computer 103. The computer readable media may include, for example, microcircuit memory devices such as read-write memory (RAM), read-only memory (ROM), electronically erasable and programmable read-only memory (EEPROM) or flash memory microcircuit devices, CD-ROM disks, digital video disks (DVD), or other optical storage devices. The computer readable media may also include magnetic cassettes, magnetic tapes, magnetic disks or other magnetic storage devices, punched media, holographic storage devices, or any other medium that can be used to store desired information. According to some implementations of the invention, one or more of the slave computers 117 may alternately or additionally be connected to one or more external data storage devices. Typically, these external data storage devices will include data storage devices that also are connected to the master computer 103, but they also may be different from any data storage devices accessible by the master computer 103.
- It also should be appreciated that the description of the computer network illustrated in FIG. 1 and FIG. 2 is provided as an example only, and is not intended to suggest any limitation as to the scope of use or functionality of alternate embodiments of the invention.
- As indicated above, a conventional symmetric multiprocessing system includes a plurality of processing units, capable of independently executing various tasks. For example, FIG. 3 illustrates a conventional symmetric multiprocessing system 301. As can be seen from this figure, the system 301 includes a computing environment 303, including processing units 305. In various implementations of the invention, the computing environment 303 may be formed by the computer network 101 of FIG. 1. As such, the processing units 305 would comprise the processor units 111 and 121 shown in the figure. In some alternative implementations, the computing environment 303 may be formed by the master computer 103. Accordingly, the processing units 305 would comprise the processor units 111. As can be appreciated, various components of the computing environment 303 are not shown in this example. For example, the computing environment 303 would likely include a memory component, which, in some cases, may be implemented by the memory 107 shown in FIG. 1.
- The system 301 also includes a symmetric multiprocessing scheduler 307 having a symmetric multiprocessing queue 309. Furthermore, as can be seen, the symmetric multiprocessing queue 309 includes tasks 311. As detailed above, in a conventional symmetric multiprocessing system, any of the tasks 311 may be executed on any of the processing units 305. The symmetric multiprocessing scheduler 307 may assign particular tasks 311 to any of the processing units 305, and may change or move the assignments dynamically to balance the computational load efficiently.
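- For illustration only (not part of the patent's disclosure): the structure described above, a single symmetric multiprocessing queue from which any processing unit may take any task, can be modeled with a few lines of C. The sketch below is a toy model rather than the scheduler 307 itself; the worker threads stand in for the processing units 305 and an integer counter stands in for the queue 309.

```c
/* Toy model of a conventional SMP arrangement: one shared task queue from
 * which every processing unit may take work. Build with: gcc -pthread     */
#include <pthread.h>
#include <stdio.h>

#define NUM_CPUS  4
#define NUM_TASKS 16

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_task = 0;                  /* index into the shared queue */

static void run_task(int task, int cpu)
{
    printf("task %d executed on processing unit %d\n", task, cpu);
}

static void *processing_unit(void *arg)
{
    int cpu = (int)(long)arg;

    for (;;) {
        /* Any processing unit may dequeue any task: the symmetric property. */
        pthread_mutex_lock(&queue_lock);
        int task = (next_task < NUM_TASKS) ? next_task++ : -1;
        pthread_mutex_unlock(&queue_lock);

        if (task < 0)
            return NULL;                   /* queue drained */
        run_task(task, cpu);
    }
}

int main(void)
{
    pthread_t cpus[NUM_CPUS];

    for (long i = 0; i < NUM_CPUS; i++)
        pthread_create(&cpus[i], NULL, processing_unit, (void *)i);
    for (int i = 0; i < NUM_CPUS; i++)
        pthread_join(cpus[i], NULL);
    return 0;
}
```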
- However, as indicated above, this has some disadvantages. One such disadvantage is that processing units 305 are increasingly more specific to particular tasks 311. This is often the case in embedded systems, where a particular processing unit may have been designed for a specific function, such as, for example, video encoding. Additionally, ones of the processing units 305 may have much higher power consumption needs than other ones of the processing units 305. As such, use of these processing units 305 could be better controlled to manage power consumption for the system 301.
- As used herein, a processing unit 305 may be either a microprocessor or a core within a multi-core microprocessor, such as, for example, the processor unit 111 and the processor core 201 respectively. Furthermore, applicants would like to point out that although in practice, a distinction between a symmetric computing architecture (i.e. homogenous processing units that share memory) and an asymmetric computing architecture (i.e. heterogeneous processing units that share memory) may be made; herein, when referencing a symmetric multiprocessing system, not all processing units must be homogenous. For example, as used herein, a symmetric multiprocessing system may have a combination of single core microprocessors and multi-core microprocessors. Furthermore, the microprocessors may have different hardware specifications. Still, further, the microprocessors may have different computer processor architectures.
FIG. 4 illustrates a method 401 for bounding the processing domain of a symmetric multiprocessing system. For example, the method 401 may be implemented in conjunction with the example symmetric multiprocessing system 501 shown in FIG. 5. As can be seen from these figures, the symmetric multiprocessing system 501 includes, among other items, a computing environment 503 having processing units 505. In various implementations, the symmetric multiprocessing system 501 may be formed by modifying the symmetric multiprocessing system 301 shown in FIG. 3. Still, in some implementations, the symmetric multiprocessing system 501 may be formed by utilizing the computer network 101 or, alternatively, the master computer 103 as the computing environment 503.
- Returning to
FIG. 4, the method 401 includes an operation 403 for initializing the processing units 505 within the symmetric multiprocessing system 501 and an operation 405 for booting a symmetric multiprocessing operating system 507 on one or more of the processing units 505. As shown in this figure, the symmetric multiprocessing operating system 507 is booted onto the processing unit 505 i. The processing unit 505 that loads the operating system (e.g., the processing unit 505 i in this example) is often referred to as the "boot processor." In various implementations of the invention, the boot processor is used exclusively by the symmetric multiprocessing operating system 507 for operations related to managing the symmetric multiprocessing system 501. In alternative implementations, the boot processor is used to load the operating system, but is not used exclusively for operations related to managing the symmetric multiprocessing system 501. Accordingly, in some implementations, the boot processor is available for general computing tasks unrelated to operating system management. With some implementations of the invention, the operation 403 initializes all of the processing units 505. With alternative implementations, the operation 403 initializes only the boot processor.
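- A minimal C sketch of how the initialization operation 403 and the boot operation 405 described above might be organized is given below. The cpu_desc structure, the BOOT_CPU index, and the reserve_boot_cpu flag are hypothetical names used only for illustration; they are not taken from any particular operating system interface.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_CPUS 4   /* illustrative number of processing units 505 */
#define BOOT_CPU 0   /* the unit that loads the operating system (the "boot processor") */

struct cpu_desc {
    int  id;
    bool online;            /* set when the unit is initialized (cf. operation 403) */
    bool reserved_for_os;   /* true if the boot processor is held back for OS management */
};

static struct cpu_desc cpus[NUM_CPUS];

/* Operation 403 (sketch): bring the processing units online. */
static void init_processing_units(void)
{
    for (int i = 0; i < NUM_CPUS; i++) {
        cpus[i].id = i;
        cpus[i].online = true;
        cpus[i].reserved_for_os = false;
    }
}

/* Operation 405 (sketch): boot the SMP operating system on the boot processor.
 * Whether that unit stays reserved for OS management or remains available for
 * general tasks is a policy choice, as the text above notes. */
static void boot_smp_os(bool reserve_boot_cpu)
{
    cpus[BOOT_CPU].reserved_for_os = reserve_boot_cpu;
    printf("SMP OS booted on processing unit %d (%s)\n", BOOT_CPU,
           reserve_boot_cpu ? "reserved for OS management"
                            : "also available for general tasks");
}

int main(void)
{
    init_processing_units();
    boot_smp_os(true);
    return 0;
}
```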
- The method 401 further includes an operation 407 for loading the scheduler 509. As can be seen from FIG. 5, the scheduler 509 includes a symmetric multiprocessing queue 511. As can be further seen from this figure, the system 501 additionally includes a user application 515 having tasks 517. As used herein, the tasks 517 may be explicit instructions that the processing units 505 may directly execute. Alternatively, the tasks 517 may be higher-level operations that the symmetric multiprocessing operating system 507 will translate into instructions that the processing units 505 may execute.
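- The relationship among the scheduler 509, the symmetric multiprocessing queue 511, and the tasks 517 can likewise be pictured with a short, hypothetical C sketch. The task, smp_queue, and scheduler structures and the enqueue helper below are invented for this illustration; the patent text does not prescribe any particular data layout.

```c
#include <stdio.h>

#define MAX_TASKS 16

/* A task 517: either directly executable instructions or a higher-level
 * operation that the operating system later translates (sketch only). */
struct task {
    int         id;
    const char *name;
};

/* The symmetric multiprocessing queue 511: any task placed here may be
 * scheduled onto any available processing unit. */
struct smp_queue {
    struct task tasks[MAX_TASKS];
    int         count;
};

/* The scheduler 509, which initially owns only the SMP queue. */
struct scheduler {
    struct smp_queue smp_q;
};

static void enqueue(struct smp_queue *q, int id, const char *name)
{
    if (q->count < MAX_TASKS)
        q->tasks[q->count++] = (struct task){ .id = id, .name = name };
}

int main(void)
{
    struct scheduler sched = { 0 };

    /* Tasks identified from a user application 515 all land in the SMP queue first. */
    enqueue(&sched.smp_q, 1, "ui_update");
    enqueue(&sched.smp_q, 2, "video_encode");
    enqueue(&sched.smp_q, 3, "network_io");

    printf("symmetric multiprocessing queue holds %d tasks\n", sched.smp_q.count);
    return 0;
}
```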
- Although FIG. 5 illustrates a single user application 515 and one set of tasks 517, in various implementations more than one user application 515 may be executed by the symmetric multiprocessing operating system 507. Additionally, in some implementations, the user application 515 may have multiple sets of tasks 517. Further still, as can be appreciated by those of skill in the art, the set of tasks 517 is typically not static. More particularly, the set of tasks 517 changes as the user application 515 is executed. - The
method 401 additionally includes an operation 409 for generating a bound computational domain queue 513 and an operation 411 for moving selected tasks 517 to the bound computational domain queue 513. In various implementations of the invention, as a user application 515 is loaded by the symmetric multiprocessing operating system 507 and tasks 517 associated with the user application 515 are identified, all of the tasks 517 may be initially loaded into the symmetric multiprocessing queue 511. More particularly, when the scheduler 509 is first loaded by the operation 407, the scheduler may include only the symmetric multiprocessing queue 511, which will include all of the tasks 517.
- In various implementations of the invention, the operation 409 and the operation 411 are performed as a result of an application programming interface (API) instruction, which, in some cases, may be the result of a user's input. With some implementations, the operation 409 and the operation 411 are triggered without user input, such as, for example, based upon the type of user application 515 or the type of task 517. In various implementations of the invention, the operations 409 and 411 may be repeated a number of times, resulting in more than one bound
computational domain queue 513 being created within the scheduler 509.
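- One simple way the queue-creation operation 409 and the task-moving operation 411 could be realized is sketched below in C. The scheduler and queue structures and the create_bound_queue and move_task_to_bound_queue helpers are hypothetical names used only to make the two operations concrete; an actual implementation could organize these structures quite differently.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TASKS 16

struct task  { int id; const char *name; };
struct queue { struct task tasks[MAX_TASKS]; int count; };

/* Scheduler holding the original SMP queue plus one bound computational
 * domain queue created on demand (more could be added the same way). */
struct scheduler {
    struct queue smp_q;
    struct queue bound_q;
    int          bound_q_in_use;
};

/* Operation 409 (sketch): create an empty bound computational domain queue. */
static void create_bound_queue(struct scheduler *s)
{
    memset(&s->bound_q, 0, sizeof s->bound_q);
    s->bound_q_in_use = 1;
}

/* Operation 411 (sketch): move the task with the given id from the SMP queue
 * into the bound queue, keeping the remaining SMP tasks in order. */
static int move_task_to_bound_queue(struct scheduler *s, int id)
{
    for (int i = 0; i < s->smp_q.count; i++) {
        if (s->smp_q.tasks[i].id == id) {
            if (s->bound_q.count >= MAX_TASKS)
                return -1;                         /* bound queue is full */
            s->bound_q.tasks[s->bound_q.count++] = s->smp_q.tasks[i];
            for (int j = i; j < s->smp_q.count - 1; j++)
                s->smp_q.tasks[j] = s->smp_q.tasks[j + 1];
            s->smp_q.count--;
            return 0;
        }
    }
    return -1;   /* task not found in the SMP queue */
}

int main(void)
{
    struct scheduler sched = {
        .smp_q = { .tasks = { { 1, "ui_update" }, { 2, "video_encode" }, { 3, "network_io" } },
                   .count = 3 }
    };

    create_bound_queue(&sched);            /* operation 409 */
    move_task_to_bound_queue(&sched, 2);   /* operation 411: isolate the encoder task */

    printf("SMP queue: %d tasks, bound queue: %d tasks\n",
           sched.smp_q.count, sched.bound_q.count);
    return 0;
}
```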
- The method 401 further includes an operation 413 for forming a processing domain boundary for the bound computational domain queue 513. As stated above, in various implementations, an "affinity" is created between a bound computational domain queue 513 and one or more processing units 505. Alternatively, a "link" is created between a bound computational domain queue 513 and one or more processing units 505. These example processing domain boundaries are discussed in greater detail below.
- Bound Computational Domain with Affinity
- In various implementations, the operation 413 "affines" one or more of the processing units 505 to the bound computational domain queue 513. Tasks 517 included in a bound computational domain queue 513 that is "affined" to a particular processing unit 505 are said to be affined to that particular processing unit 505. Tasks 517 that are affined to a particular processing unit 505 are given "priority" by the scheduler 509 to execute on that particular processing unit 505. However, when tasks 517 having an affinity for the selected processing unit 505 are not being executed, the processing unit 505 is available for scheduling non-affined tasks 517 by the scheduler 509. Priority of execution may be shown by the scheduler 509 by transferring execution of non-affined tasks 517 to idle processing units 505 when affined tasks 517 need to be executed. Alternatively, priority may be shown by stalling execution of an affined task 517 until the affined processing unit 505 is available for executing tasks 517.
- In some implementations, a single processing unit 505 is affined to a bound computational domain queue 513 by the operation 413. With some implementations, multiple processing units 505 are affined to a bound computational domain queue 513. For example, FIG. 6 illustrates the symmetric multiprocessing system 501 of FIG. 5, where the bound computational domain queue 513 has been affined to the processing unit 505 iii and the processing unit 505 n, as illustrated by the boundary 603. As can be seen from this figure, the user application 515 is not shown. However, the tasks 517 from the user application 515 have been moved into the symmetric multiprocessing queue 511 and the bound computational domain queue 513.
- As a result of the affinity created by the operation 415 (as illustrated by the boundary 603), the scheduler 509 may assign the tasks 517 iv, 517 v, and 517 n to execute on either the processing unit 505 iii or the processing unit 505 n. Additionally, the scheduler 509 may assign the tasks 517 i, 517 ii, or 517 iii to execute on the processing unit 505 ii. Alternatively, if the processing unit 505 iii is not executing tasks 517 from the bound computational domain queue 513, tasks 517 from the symmetric multiprocessing queue 511 may be executed on the processing unit 505 iii. Alternatively still, if the processing unit 505 n is not executing tasks 517 from the bound computational domain queue 513, tasks 517 from the symmetric multiprocessing queue 511 may be executed on the processing unit 505 n.
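- A hypothetical C sketch of how this affinity-style priority might look inside a scheduler's task-selection path is given below. The pick_next_task function and the queue layout are invented for illustration; the point is only that an affined processing unit serves its bound queue first yet remains available to the shared symmetric multiprocessing queue when no affined task 517 is waiting.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_TASKS 8

struct queue { int task_ids[MAX_TASKS]; int count; };

/* Pop the first task id from a queue; returns -1 when the queue is empty. */
static int pop(struct queue *q)
{
    if (q->count == 0)
        return -1;
    int id = q->task_ids[0];
    for (int i = 0; i < q->count - 1; i++)
        q->task_ids[i] = q->task_ids[i + 1];
    q->count--;
    return id;
}

/* Affinity-style selection (sketch): an affined processing unit serves its
 * bound computational domain queue first, but falls back to the shared SMP
 * queue when no affined task is waiting, so the unit is never forced to idle. */
static int pick_next_task(bool cpu_affined_to_bound_queue,
                          struct queue *bound_q, struct queue *smp_q)
{
    if (cpu_affined_to_bound_queue) {
        int id = pop(bound_q);
        if (id != -1)
            return id;      /* affined work takes priority */
    }
    return pop(smp_q);      /* otherwise the unit stays available for SMP work */
}

int main(void)
{
    struct queue smp_q   = { .task_ids = { 1, 2, 3 }, .count = 3 };
    struct queue bound_q = { .task_ids = { 4, 5 },    .count = 2 };

    /* An affined unit drains its bound queue first, then helps with SMP work. */
    for (int i = 0; i < 5; i++)
        printf("affined unit runs task %d\n",
               pick_next_task(true, &bound_q, &smp_q));
    return 0;
}
```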
- Bound Computational Domain with Link
- As stated above, with some implementations, the operation 413 may "link" one or more of the processing units 505 to a bound computational domain queue 513. Processing units 505 that have been linked to a particular task 517 or set of tasks 517 can only execute those tasks 517. When there are no linked tasks to execute, the processor remains idle, as opposed to becoming available for scheduling as in the case of an affined processing unit 505.
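- By contrast, the link relationship can be sketched as a selection routine that consults only the linked queue. As before, the names below are hypothetical; the sketch simply shows a linked processing unit idling when its queue is empty rather than picking up work from the symmetric multiprocessing queue.

```c
#include <stdio.h>

#define MAX_TASKS 8

struct queue { int task_ids[MAX_TASKS]; int count; };

/* Pop the first task id from a queue; returns -1 when the queue is empty. */
static int pop(struct queue *q)
{
    if (q->count == 0)
        return -1;
    int id = q->task_ids[0];
    for (int i = 0; i < q->count - 1; i++)
        q->task_ids[i] = q->task_ids[i + 1];
    q->count--;
    return id;
}

/* Link-style selection (sketch): a linked processing unit may only execute
 * tasks from the queue it is linked to; when that queue is empty it idles. */
static int pick_next_task_linked(struct queue *linked_q)
{
    return pop(linked_q);   /* -1 means the unit simply idles this cycle */
}

int main(void)
{
    struct queue linked_q = { .task_ids = { 4, 5 }, .count = 2 };

    for (int i = 0; i < 3; i++) {
        int id = pick_next_task_linked(&linked_q);
        if (id == -1)
            printf("linked unit idles (no linked tasks)\n");
        else
            printf("linked unit runs task %d\n", id);
    }
    return 0;
}
```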
- FIG. 7 illustrates the symmetric multiprocessing system 501 shown in FIG. 5 and FIG. 6. However, FIG. 7 includes a boundary 703 that shows a link, as opposed to an affinity as shown by the boundary 603 in FIG. 6. As can be seen from FIG. 7, the boundary 703 isolates the processing units 505 iii and 505 n to the bound computational domain queue 513. As a result, only the tasks 517 iv, 517 v, and 517 n may be executed by the processing units 505 iii and 505 n.
- In various implementations, as opposed to bounding a queue of tasks 517, such as, for example, the bound
computational domain queue 513, as described above, the processing domain for individual tasks 517 may be bound. For example, the operation 415 may directly affine the task 517 v with the processing unit 505 iii, as opposed to including the task 517 v in a bound computational domain queue 513 and then bounding the processing domain for the queue 513.
- As stated above, in various implementations of the invention, a bound
computational domain queue 513 may be created as a result of some user input. This may be facilitated by providing an application programming interface (API) instruction set that includes instructions for defining and manipulating bound computational domain queues 513. The following is an illustrative set of application programming interface instructions that may be provided in various implementations of the invention.
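- As one hypothetical sketch of what such an instruction set could look like, the C declarations below mirror the operations described above (creating a bound computational domain queue, adding tasks to it, affining or linking it to processing units, and releasing the boundary). The names, types, and signatures are invented for this illustration and should not be read as the actual interface provided by any implementation.

```c
#include <stddef.h>

typedef int bcd_handle_t;   /* identifies a bound computational domain queue */
typedef int task_id_t;      /* identifies a task to be scheduled */
typedef int cpu_id_t;       /* identifies a processing unit */

/* Create a bound computational domain queue (cf. operation 409) and later
 * remove it, returning its tasks to ordinary symmetric multiprocessing. */
bcd_handle_t bcd_create(void);
int          bcd_destroy(bcd_handle_t domain);

/* Move selected tasks into the domain's queue (cf. operation 411). */
int bcd_add_tasks(bcd_handle_t domain, const task_id_t *tasks, size_t n_tasks);

/* Bound the processing domain (cf. operation 413) by either affining or
 * linking the queue to a set of processing units. */
int bcd_affine(bcd_handle_t domain, const cpu_id_t *cpus, size_t n_cpus);
int bcd_link(bcd_handle_t domain, const cpu_id_t *cpus, size_t n_cpus);

/* Release the processing domain boundary without destroying the queue. */
int bcd_unbound(bcd_handle_t domain);
```

- Under these assumptions, a caller would typically create a domain, add the tasks of interest, and then choose between the affine and link calls depending on whether the selected processing units should remain available for non-affined work.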
- Although certain devices and methods have been described above in terms of the illustrative embodiments, the person of ordinary skill in the art will recognize that other embodiments, examples, substitutions, modifications, and alterations are possible. It is intended that the following claims cover such other embodiments, examples, substitutions, modifications, and alterations within the spirit and scope of the claims.
Claims (20)
1. A computer-implemented method for bounding the processing domain in a symmetric multiprocessing system, the method comprising:
identifying a symmetric multiprocessing system, the symmetric multiprocessing system including a plurality of processing units;
identifying a plurality of tasks to be scheduled for execution by the symmetric multiprocessing system;
forming a computationally bound task queue;
moving selected ones of the plurality of tasks to be scheduled to the computationally bound task queue; and
bounding the processing domain for the computationally bound task queue.
2. The computer-implemented method recited in claim 1 , further comprising:
causing a symmetric multiprocessing operating system to boot onto a one of the plurality of processing units.
3. The computer-implemented method recited in claim 2 , further comprising:
loading a symmetric multiprocessing operating system scheduler.
4. The computer-implemented method recited in claim 3 , the method act for identifying a plurality of tasks to be scheduled for execution by the symmetric multiprocessing operating system comprising:
identifying a symmetric multiprocessing task queue within the symmetric multiprocessing operating system scheduler;
identifying a plurality of tasks within the symmetric multiprocessing task queue; and
designating the identified tasks as the plurality of tasks to be scheduled.
5. The computer-implemented method recited in claim 4 , the method act of forming a computationally bound task queue comprising:
receiving an instruction from a user of the symmetric multiprocessing system to create a bound computational domain; and
forming a task queue within the symmetric multiprocessing operating system scheduler to represent the computationally bound task queue.
6. The computer-implemented method recited in claim 5 , the instruction including a listing of one or more of the plurality of tasks to be scheduled and the method act of moving selected ones of the plurality of tasks to be scheduled to the computationally bound task queue comprising:
adding the one or more of the plurality of tasks to be scheduled listed in the instruction to the computationally bound task queue; and
removing the one or more of the plurality of tasks to be scheduled listed in the instruction from the symmetric multiprocessing task queue.
7. The computer-implemented method recited in claim 6 , the instruction including a listing of one or more of the plurality of processing units and the method act of bounding the processing domain for the computationally bound task queue comprising affining the computationally bound task queue to the one or more of the plurality of processing units listed in the instruction.
8. The computer-implemented method recited in claim 6 , the instruction including a listing of one or more of the plurality of processing units and the method act of bounding the processing domain for the computationally bound task queue comprising linking the computationally bound task queue to the one or more of the plurality of processing units listed in the instruction.
9. The computer-implemented method recited in claim 1 , further comprising:
forming a second computationally bound task queue;
moving selected ones of the plurality of tasks to the second computationally bound task queue; and
bounding the processing domain for the second computationally bound task queue.
10. The computer-implemented method recited in claim 1 , further comprising:
unbounding the computationally bound task queue; and
removing the computationally bound task queue from the symmetric multiprocessing system.
11. One or more tangible computer-readable media, having computer executable instructions for bounding the processing domain in a symmetric multiprocessing system stored thereon, the computer executable instructions comprising:
causing a computer to perform a set of operations; and
wherein the set of operations include:
identifying a symmetric multiprocessing system, the symmetric multiprocessing system including a plurality of processing units;
identifying a plurality of tasks to be scheduled for execution by the symmetric multiprocessing system;
forming a computationally bound task queue;
moving selected ones of the plurality of tasks to be scheduled to the computationally bound task queue; and
bounding the processing domain for the computationally bound task queue.
12. The one or more tangible computer-readable media recited in claim 11 , the symmetric multiprocessing system including a symmetric multiprocessing operating system scheduler and the operation for identifying a plurality of tasks to be scheduled for execution by the symmetric multiprocessing operating system comprising:
identifying a symmetric multiprocessing task queue;
identifying a plurality of tasks within the symmetric multiprocessing task queue; and
designating the identified tasks as the plurality of tasks to be scheduled.
13. The one or more tangible computer-readable media recited in claim 12 , the operation for forming a computationally bound task queue comprising:
receiving an instruction from a user of the symmetric multiprocessing system to create a bound computational domain; and
forming a task queue within the symmetric multiprocessing operating system scheduler to represent the computationally bound task queue.
14. The one or more tangible computer-readable media recited in claim 13 , the instruction including a listing of one or more of the plurality of tasks to be scheduled and the operation for moving selected ones of the plurality of tasks to be scheduled to the computationally bound task queue comprising:
adding the one or more of the plurality of tasks to be scheduled listed in the instruction to the computationally bound task queue; and
removing the one or more of the plurality of tasks to be scheduled listed in the instruction from the symmetric multiprocessing task queue.
15. The one or more tangible computer-readable media recited in claim 14 , the instruction including a listing of one or more of the plurality of processing units and the operation for bounding the processing domain for the computationally bound task queue comprising affining the computationally bound task queue to the one or more of the plurality of processing units listed in the instruction.
16. The one or more tangible computer-readable media recited in claim 14 , the instruction including a listing of one or more of the plurality of processing units and the operation for bounding the processing domain for the computationally bound task queue comprising linking the computationally bound task queue to the one or more of the plurality of processing units listed in the instruction.
17. A symmetric multiprocessing system adapted to allow bounded processing, the system comprising:
a plurality of processing units;
a plurality of tasks to be scheduled for execution by the system; and
a memory including a set of instructions that cause the system to perform:
forming a computationally bound task queue;
moving selected ones of the plurality of tasks to be scheduled to the computationally bound task queue; and
bounding the processing domain for the computationally bound task queue.
18. The symmetric multiprocessing system recited in claim 17 , wherein the set of instructions are included in an application programming interface.
19. The symmetric multiprocessing system recited in claim 18 , the instruction for bounding the processing domain for a computationally bound task queue comprising:
receiving a listing of one or more of the plurality of processing units; and
receiving a boundary relationship affining the computationally bound task queue to the one or more of the plurality of processing units listed in the instruction.
20. The symmetric multiprocessing system recited in claim 18 , the instruction for bounding the processing domain for a computationally bound task queue comprising:
receiving a listing of one or more of the plurality of processing units; and
receiving a boundary relationship linking the computationally bound task queue to the one or more of the plurality of processing units listed in the instruction.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/815,299 US20110010716A1 (en) | 2009-06-12 | 2010-06-14 | Domain Bounding for Symmetric Multiprocessing Systems |
US13/771,059 US20130318531A1 (en) | 2009-06-12 | 2013-02-19 | Domain Bounding For Symmetric Multiprocessing Systems |
US14/949,842 US10228970B2 (en) | 2009-06-12 | 2015-11-23 | Domain bounding for symmetric multiprocessing systems |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18676009P | 2009-06-12 | 2009-06-12 | |
US12/815,299 US20110010716A1 (en) | 2009-06-12 | 2010-06-14 | Domain Bounding for Symmetric Multiprocessing Systems |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/771,059 Continuation US20130318531A1 (en) | 2009-06-12 | 2013-02-19 | Domain Bounding For Symmetric Multiprocessing Systems |
Publications (1)
Publication Number | Publication Date |
---|---|
US20110010716A1 true US20110010716A1 (en) | 2011-01-13 |
Family
ID=43428441
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/815,299 Abandoned US20110010716A1 (en) | 2009-06-12 | 2010-06-14 | Domain Bounding for Symmetric Multiprocessing Systems |
US13/771,059 Abandoned US20130318531A1 (en) | 2009-06-12 | 2013-02-19 | Domain Bounding For Symmetric Multiprocessing Systems |
US14/949,842 Active 2031-08-23 US10228970B2 (en) | 2009-06-12 | 2015-11-23 | Domain bounding for symmetric multiprocessing systems |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/771,059 Abandoned US20130318531A1 (en) | 2009-06-12 | 2013-02-19 | Domain Bounding For Symmetric Multiprocessing Systems |
US14/949,842 Active 2031-08-23 US10228970B2 (en) | 2009-06-12 | 2015-11-23 | Domain bounding for symmetric multiprocessing systems |
Country Status (1)
Country | Link |
---|---|
US (3) | US20110010716A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106874129B (en) * | 2017-02-04 | 2020-01-10 | 北京信息科技大学 | Method for determining process scheduling sequence of operating system and control method |
CN109597650A (en) * | 2017-09-30 | 2019-04-09 | 中兴通讯股份有限公司 | A kind of method, apparatus, equipment and the storage medium of multiple operating system starting |
US12105666B2 (en) * | 2021-04-19 | 2024-10-01 | Advanced Micro Devices, Inc. | Master-slave communication with subdomains |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5459864A (en) * | 1993-02-02 | 1995-10-17 | International Business Machines Corporation | Load balancing, error recovery, and reconfiguration control in a data movement subsystem with cooperating plural queue processors |
FR2792087B1 (en) * | 1999-04-07 | 2001-06-15 | Bull Sa | METHOD FOR IMPROVING THE PERFORMANCE OF A MULTIPROCESSOR SYSTEM INCLUDING A WORK WAITING LINE AND SYSTEM ARCHITECTURE FOR IMPLEMENTING THE METHOD |
US7178145B2 (en) * | 2001-06-29 | 2007-02-13 | Emc Corporation | Queues for soft affinity code threads and hard affinity code threads for allocation of processors to execute the threads in a multi-processor system |
US7464380B1 (en) * | 2002-06-06 | 2008-12-09 | Unisys Corporation | Efficient task management in symmetric multi-processor systems |
US20050273571A1 (en) * | 2004-06-02 | 2005-12-08 | Lyon Thomas L | Distributed virtual multiprocessor |
- 2010-06-14: US US12/815,299 patent/US20110010716A1/en not_active Abandoned
- 2013-02-19: US US13/771,059 patent/US20130318531A1/en not_active Abandoned
- 2015-11-23: US US14/949,842 patent/US10228970B2/en active Active
Patent Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5109512A (en) * | 1990-05-31 | 1992-04-28 | International Business Machines Corporation | Process for dispatching tasks among multiple information processors |
US5566349A (en) * | 1994-05-16 | 1996-10-15 | Trout; Ray C. | Complementary concurrent cooperative multi-processing multi-tasking processing system using shared memories with a minimum of four complementary processors |
US6728959B1 (en) * | 1995-08-08 | 2004-04-27 | Novell, Inc. | Method and apparatus for strong affinity multiprocessor scheduling |
US6480876B2 (en) * | 1998-05-28 | 2002-11-12 | Compaq Information Technologies Group, L.P. | System for integrating task and data parallelism in dynamic applications |
US6427161B1 (en) * | 1998-06-12 | 2002-07-30 | International Business Machines Corporation | Thread scheduling techniques for multithreaded servers |
US6289369B1 (en) * | 1998-08-25 | 2001-09-11 | International Business Machines Corporation | Affinity, locality, and load balancing in scheduling user program-level threads for execution by a computer system |
US6785774B2 (en) * | 2001-10-16 | 2004-08-31 | International Business Machines Corporation | High performance symmetric multiprocessing systems via super-coherent data mechanisms |
US20040117793A1 (en) * | 2002-12-17 | 2004-06-17 | Sun Microsystems, Inc. | Operating system architecture employing synchronous tasks |
US7913257B2 (en) * | 2004-12-01 | 2011-03-22 | Sony Computer Entertainment Inc. | Scheduling method, scheduling apparatus and multiprocessor system |
US20060168571A1 (en) * | 2005-01-27 | 2006-07-27 | International Business Machines Corporation | System and method for optimized task scheduling in a heterogeneous data processing system |
US7774555B2 (en) * | 2005-02-10 | 2010-08-10 | International Business Machines Corporation | Data processing system and method for efficient coherency communication utilizing coherency domain indicators |
US7318126B2 (en) * | 2005-04-11 | 2008-01-08 | International Business Machines Corporation | Asynchronous symmetric multiprocessing |
US7475195B2 (en) * | 2005-05-24 | 2009-01-06 | International Business Machines Corporation | Data processing system, cache system and method for actively scrubbing a domain indication |
US20070011646A1 (en) * | 2005-06-24 | 2007-01-11 | College Of William And Mary | Parallel Decoupled Mesh Generation |
US20090187713A1 (en) * | 2006-04-24 | 2009-07-23 | Vmware, Inc. | Utilizing cache information to manage memory access and cache utilization |
US20080177756A1 (en) * | 2007-01-18 | 2008-07-24 | Nicolai Kosche | Method and Apparatus for Synthesizing Hardware Counters from Performance Sampling |
US20090132769A1 (en) * | 2007-11-19 | 2009-05-21 | Microsoft Corporation | Statistical counting for memory hierarchy optimization |
US20110041131A1 (en) * | 2009-08-11 | 2011-02-17 | International Business Machines Corporation | Migrating tasks across processors |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8848576B2 (en) | 2012-07-26 | 2014-09-30 | Oracle International Corporation | Dynamic node configuration in directory-based symmetric multiprocessing systems |
US20140244983A1 (en) * | 2013-02-26 | 2014-08-28 | Qualcomm Incorporated | Executing an operating system on processors having different instruction set architectures |
US10437591B2 (en) * | 2013-02-26 | 2019-10-08 | Qualcomm Incorporated | Executing an operating system on processors having different instruction set architectures |
US20170303264A1 (en) * | 2016-04-13 | 2017-10-19 | Qualcomm Incorporated | System and method for beam management |
US20190087224A1 (en) * | 2017-09-20 | 2019-03-21 | Samsung Electronics Co., Ltd. | Method, system, apparatus, and/or non-transitory computer readable medium for the scheduling of a plurality of operating system tasks on a multicore processor and/or multi-processor system |
US11055129B2 (en) * | 2017-09-20 | 2021-07-06 | Samsung Electronics Co., Ltd. | Method, system, apparatus, and/or non-transitory computer readable medium for the scheduling of a plurality of operating system tasks on a multicore processor and/or multi-processor system |
Also Published As
Publication number | Publication date |
---|---|
US20160217006A1 (en) | 2016-07-28 |
US20130318531A1 (en) | 2013-11-28 |
US10228970B2 (en) | 2019-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10228970B2 (en) | Domain bounding for symmetric multiprocessing systems | |
US10713095B2 (en) | Multi-core processor and method of controlling the same using revisable translation tables | |
TWI594117B (en) | Profiling application code to identify code portions for fpga inplementation | |
CN109669772B (en) | Parallel execution method and equipment of computational graph | |
US10942824B2 (en) | Programming model and framework for providing resilient parallel tasks | |
US20070150895A1 (en) | Methods and apparatus for multi-core processing with dedicated thread management | |
US9619298B2 (en) | Scheduling computing tasks for multi-processor systems based on resource requirements | |
US10157089B2 (en) | Event queue management for embedded systems | |
US9009716B2 (en) | Creating a thread of execution in a computer processor | |
US20150254113A1 (en) | Lock Spin Wait Operation for Multi-Threaded Applications in a Multi-Core Computing Environment | |
US20120317582A1 (en) | Composite Contention Aware Task Scheduling | |
US20130061231A1 (en) | Configurable computing architecture | |
JP2017537393A (en) | Efficient Synchronous Barrier Technology with Worksteeling Support [Cross Reference to Related US Patent Application] This application is a US patent application Ser. No. 14 / 568,831 filed Dec. 12, 2014 (invention “TECHNOLOGIES FOR EFFICENTENT”). SYNCHRONIZATION BARRIERS WITH WORK STEARING SUPPORT ”). | |
CN107957965B (en) | Quality of service ordinal modification | |
KR100694212B1 (en) | Distribution operating system functions for increased data processing performance in a multi-processor architecture | |
US20200319893A1 (en) | Booting Tiles of Processing Units | |
WO2016028425A1 (en) | Programmatic decoupling of task execution from task finish in parallel programs | |
JP2022060151A (en) | Firmware boot task distribution to enable low latency boot performance | |
US10241838B2 (en) | Domain based resource isolation in multi-core systems | |
KR20150101870A (en) | Method and apparatus for avoiding bank conflict in memory | |
JP2012099155A (en) | Runtime polymorphism | |
US20240232622A9 (en) | Apparatus, articles of manufacture, and methods for managing processing units | |
JP2009245409A (en) | Automatic resource configuration system and method, and management terminal for the same | |
Grudnitsky et al. | COREFAB: Concurrent reconfigurable fabric utilization in heterogeneous multi-core systems | |
US20120137300A1 (en) | Information Processor and Information Processing Method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: MENTOR GRAPHICS CORPORATION, OREGON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RAGHURAMAN, ARVIND;DRISCOLL, DANIEL;TRIPPI, MICHAEL;REEL/FRAME:025035/0042 Effective date: 20100921 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |