EP2297639A2 - Method for using parallel processing constructs - Google Patents
Method for using parallel processing constructs
- Publication number
- EP2297639A2 (application EP09751296A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- program
- labs
- parallel processing
- spmd
- parallel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
Definitions
- Closely-coupled processors or hardware resources will likely become widely available within the near future.
- Examples of such closely-coupled processors (or hardware resources) may include additional processors, threads in a particular processor, additional cores in a central processing unit, additional processors mounted on the same substrate or board, and/or such devices provided within computers connected by a network fabric into a cluster, a grid, or a collection of resources.
- Parallel computing arrangements may include a controller that determines how an application should be divided and what application portions go to which parallel processors.
- a host computer that is running a simulation may act as the controller for a number of parallel processors.
- Parallel processors may receive instructions and/or data from the controller and may return a result to the controller.
- Fig. 1 depicts an exemplary diagram of an architectural overview in which implementations described herein may be practiced;
- Fig. 2 illustrates an exemplary diagram of a hardware environment depicted in Fig. 1;
- Fig. 3 depicts an exemplary diagram of a batch (or distributed computing) environment illustrated in Fig. 1;
- Fig. 5B illustrates an exemplary diagram of functional components of the parallel processing interface in an alternative arrangement;
- Fig. 5C depicts an exemplary diagram of functional components of the parallel processing interface in another alternative arrangement;
- Fig. 6 illustrates exemplary hardware components of a client and/or a web service depicted in Figs. 5A and 5B;
- Fig. 7 depicts an exemplary parallel processing construct capable of being analyzed and transformed to parallel program portions by the analysis logic depicted in Figs. 5A and 5B;
- Fig. 8 illustrates an exemplary diagram of a parallel processing construct capable of being generated by a technical computing environment depicted in Fig. 7;
- Fig. 9 depicts a flow chart of an exemplary process capable of being performed by the analysis logic and resource allocation logic illustrated in Figs. 5A-5C;
- Fig. 10 illustrates an exemplary diagram of a parallel processing construct capable of being generated by the technical computing environment depicted in Fig. 7;
- Figs. 11A and 11B depict a flow chart of an exemplary process capable of being performed by the analysis logic and resource allocation logic illustrated in Figs. 5A-5C;
- Fig. 12 illustrates an exemplary diagram of functional components, of the analysis logic depicted in Figs. 5A and 5B, for determining input and output variables;
- Fig. 13 depicts an exemplary diagram of data transfer rules associated with parallel processing constructs described herein and capable of being implemented by the analysis logic depicted in Figs. 5A and 5B;
- Fig. 14 illustrates an exemplary conversion application program interface capable of being provided by the client depicted in Figs. 5A-5C;
- Fig. 15 depicts an exemplary diagram of functional components, of the client depicted in Figs. 5A-5C, for handling errors associated with one or more labs illustrated in Fig. 7;
- Fig. 16 illustrates an exemplary diagram of creating a pool that may include one or more labs depicted in Fig. 7, and of interacting with the pool via the exemplary parallel processing construct depicted in Fig. 7;
- Figs. 17A and 17B depict an exemplary diagram of providing a desired number of labs to an idle sub-pool of labs, and of restoring the idle sub-pool of labs;
- Fig. 18 illustrates an exemplary operation of the resource allocation logic of the parallel program interfaces depicted in Figs. 5A and 5B during nesting of parallel program constructs;
- Fig. 19 depicts an exemplary diagram of controlling lifetimes of variables with a parallel processing construct capable of being generated by the technical computing environment illustrated in Fig. 7;
- Fig. 20 illustrates an exemplary diagram of execution of a parallel processing construct capable of being generated by the technical computing environment depicted in Fig. 7;
- Fig. 21 depicts an alternative exemplary diagram of execution of a parallel processing construct capable of being generated by the technical computing environment illustrated in Fig. 7;
- Figs. 22-28 depict flow charts associated with an exemplary process according to implementations described herein.
- Implementations described herein may provide systems and/or methods for performing parallel processing.
- the systems and/or methods may receive a program created for a technical computing environment, may analyze the program, and may determine an inner context and an outer context of the program based on the analysis of the program.
- the systems and/or methods may allocate one or more portions of the inner context of the program to two or more labs for parallel execution, and may receive one or more results associated with the parallel execution of the one or more portions from the two or more labs.
- the systems and/or methods may further provide the one or more results to the program (e.g., to the outer context of the program).
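- The flow described above (outer context hands work to an inner context that runs on several labs, and the results come back to the outer context) can be sketched roughly as follows. This is an illustrative Python sketch, not the patented implementation: the names `run_inner_context` and `body`, the 1-based `labindex`, and the use of threads to stand in for labs are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_inner_context(body, inputs, numlabs):
    """Run `body` once per lab and return the per-lab results to the caller.

    Hypothetical helper: threads stand in for labs, and `inputs` holds the
    outer-context variables that the inner context needs.
    """
    with ThreadPoolExecutor(max_workers=numlabs) as labs:
        futures = [
            labs.submit(body, labindex, numlabs, **inputs)
            for labindex in range(1, numlabs + 1)  # assumed 1-based lab index
        ]
        return [f.result() for f in futures]


def body(labindex, numlabs, data):
    """Inner context: each lab sums every numlabs-th element, offset by its index."""
    return sum(data[labindex - 1::numlabs])


# Outer context: prepare inputs, run the inner context on 4 labs, use the results.
data = list(range(1, 9))
per_lab = run_inner_context(body, {"data": data}, numlabs=4)
total = sum(per_lab)
```

- In this sketch the outer context only sees `per_lab` and `total`; everything inside `body` is private to the lab that executes it.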
- a lab may include hardware, software, and/or combination of hardware and software that performs and/or participates in parallel processing activities.
- a lab may perform and/or participate in parallel processing activities in response to a request and/or a task received from a client.
- a lab may be implemented as a software unit of execution and/or a hardware unit of execution.
- a lab may perform and/or participate in substantially any type of parallel processing (e.g., task, data, and/or stream processing).
- a lab may perform and/or participate in parallel processing activities in response to a receipt of a program and/or one or more portions of the program.
- a lab may support one or more threads (or processes) when performing processing operations.
- Parallel processing may include any type of processing that can be distributed across two or more resources (e.g., software units of execution, hardware units of execution, processors, microprocessors, clusters, labs, etc.) and be performed at substantially the same time.
- parallel processing may refer to task parallel processing where a number of tasks are processed at substantially the same time on a number of software units of execution.
- each task may be processed independently of other tasks executing at the same time (e.g., a first software unit of execution executing a first task may not communicate with a second software unit of execution executing a second task).
- parallel processing may refer to data parallel processing, where data (e.g., a data set) is parsed into a number of portions that are executed in parallel using two or more software units of execution. In data parallel processing, the software units of execution and/or the data portions may communicate with each other as processing progresses.
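- The difference between the task parallel and data parallel styles described above can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: the helper names (`run_task_parallel`, `split`, `run_data_parallel`) are hypothetical, threads stand in for software units of execution, and the data portions here do not communicate with each other, although the text allows that they may.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task_parallel(tasks, workers=4):
    """Task parallel: run independent callables concurrently; tasks do not communicate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [f.result() for f in futures]


def split(data, parts):
    """Split a sequence into `parts` contiguous portions of near-equal size."""
    size, extra = divmod(len(data), parts)
    portions, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        portions.append(data[start:end])
        start = end
    return portions


def run_data_parallel(data, func, workers=4):
    """Data parallel: apply the same `func` to each portion of one data set."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, split(data, workers)))


# Two unrelated tasks versus one data set divided into three portions.
task_results = run_task_parallel([lambda: 1 + 1, lambda: "ab".upper()])
partial_sums = run_data_parallel(list(range(10)), sum, workers=3)
total = sum(partial_sums)
```

- Task parallelism distributes different work items, while data parallelism distributes portions of the same data set and applies identical work to each portion.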
- parallel processing may refer to stream parallel processing (also referred to as pipeline parallel processing). Stream parallel processing may use a number of software units of execution arranged in series (e.g., a line) where a first software unit of execution produces a first result that is fed to a second software unit of execution that produces a second result.
- Stream parallel processing may also include a state where task allocation may be expressed in a directed acyclic graph (DAG) or a cyclic graph with delays.
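- Stream (pipeline) parallel processing, in which each unit of execution feeds its result to the next, can be sketched as a chain of stages connected by queues. A linear chain is the simplest case of the directed acyclic graph mentioned above. The names `stage`, `run_pipeline`, and `SENTINEL` are hypothetical, and threads are assumed as stand-ins for software units of execution.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def stage(func, inbox, outbox):
    """One unit of execution: read items, transform them, pass them downstream."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate end-of-stream to the next stage
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Arrange one stage per function in series and stream `items` through them."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)

    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# The first stage adds 1, the second stage multiplies by 10.
pipeline_output = run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10])
```

- Because each stage runs in its own thread, the second stage can work on the first item while the first stage is already processing the next one.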
- Other implementations may combine two or more of task, data, or stream parallel processing techniques alone or with other types of processing techniques to form hybrid-parallel processing techniques.
- a parallel processing environment may include any environment capable of performing parallel processing.
- a parallel processing environment may include a dynamic number of processes provided on one or more hardware, software, and/or a combination of hardware and software units of execution which may have several different control and data passing layers through which a current behavior of a part or a whole of the environment may be specified.
- the processes involved in the parallel processing environment may include processes associated with a technical computing environment.
- a technical computing environment may include any hardware, software, and/or a combination of hardware and software based logic that provides a computing environment that allows users to perform tasks related to disciplines, such as, but not limited to, mathematics, science, engineering, medicine, business, etc., more efficiently than if the tasks were performed in another type of computing environment, such as an environment that required the user to develop code in a conventional programming language, such as C++, C, Fortran, Pascal, etc.
- a TCE may include a dynamically-typed programming language (e.g., the M language or MATLAB® language) that can be used to express problems and/or solutions in mathematical notations.
- a TCE may use an array as a basic element, where the array may not require dimensioning.
- a TCE may be implemented as a text-based environment (e.g., MATLAB® software; Octave; Python; Comsol Script; MATRIXx from National Instruments; Mathematica from Wolfram Research, Inc.; Mathcad from Mathsoft Engineering & Education Inc.; Maple from Maplesoft; Extend from Imagine That Inc.; Scilab from The French Institution for Research in Computer Science and Control (INRIA); Virtuoso from Cadence; Modelica or Dymola from Dynasim; etc.), a graphically-based environment (e.g., Simulink® software, Stateflow® software, SimEvents™ software, etc., by The MathWorks, Inc.; VisSim by Visual Solutions; LabVIEW® by National Instruments; Dymola by Dynasim; SoftWIRE by Measurement Computing; WiT by DALSA Coreco; VEE Pro or SystemVue by Agilent; Vision Program Manager from PPT Vision; Khoros from Khoral Research; Gedae by Gedae, Inc.; Scicos from (IN
- Fig. 1 is an exemplary diagram of an architectural overview 100 in which implementations described herein may be practiced. As illustrated, overview 100 may include a hardware environment 110, a batch (or distributed computing) environment 120, a parallel processing environment 130, and/or a parallel processing interface 140.
- Hardware environment 110 may include one or more hardware resources that may be used to perform parallel processing.
- hardware environment 110 may include one or more hardware units of execution. Further details of hardware environment 110 are provided below in connection with Fig. 2.
- Batch environment 120 may provide a distributed computing environment for a job.
- batch (or distributed computing) environment 120 may include a client that provides a job to a scheduler.
- the scheduler may distribute the job into one or more tasks, and may provide the tasks to one or more hardware units of execution and/or one or more processors.
- the hardware units of execution and/or processors may execute the tasks, and may provide results to the scheduler.
- the scheduler may combine the results into a single result, and may provide the single result to the client. Further details of batch environment 120 are provided below in connection with Fig. 3.
- Parallel processing environment 130 may provide parallel processing for a main program.
- parallel processing environment 130 may include a technical computing environment that provides a main program to a controller.
- the controller may provide portions of the program to one or more software units of execution and/or one or more labs.
- the software units of execution and/or labs may execute the program portions, and may provide results to the controller.
- the controller may combine the results into a single result, and may provide the single result to the technical computing environment. Further details of parallel processing environment 130 are provided below in connection with Fig. 4.
- Parallel processing interface 140 may include a front-end application (e.g., an application program interface (API)) that provides an interface for dynamically accessing, controlling, utilizing, etc. hardware environment 110, batch environment 120, and/or parallel processing environment 130.
- parallel processing interface 140 may include parallel processing constructs that permit users to express specific parallel workflows.
- parallel processing interface 140 may include a program provider that provides a main program to analysis logic. The analysis logic may analyze the main program, may parse the main program into program portions, and may provide the program portions to resource allocation logic. The resource allocation logic may allocate the program portions to one or more software units of execution and/or hardware units of execution. The program portions may be executed, and results may be provided to the program provider.
- parallel processing interface 140 may include an object API where a user may specify how a program may be parallelized. Further details of parallel processing interface 140 are provided below in connection with Figs. 5A-5C.
- Although Fig. 1 shows exemplary components of architectural overview 100, in other implementations architectural overview 100 may contain fewer, different, or additional components than depicted in Fig. 1.
- Fig. 2 is an exemplary diagram of hardware environment 110.
- hardware environment 110 may include a hardware unit of execution (UE) 200 with one or more processors 210-1, 210-2, 210-3, 210-4 (collectively, "processors 210").
- a hardware unit of execution may include a device (e.g., a hardware resource) that performs and/or participates in parallel processing activities.
- a hardware unit of execution may perform and/or participate in parallel processing activities in response to a request and/or a task received from a client.
- a hardware unit of execution may perform and/or participate in substantially any type of parallel processing (e.g., task, data, and/or stream processing) using one or more devices.
- in one implementation, a hardware unit of execution may include a single processor that includes multiple cores; in another implementation, the hardware unit of execution may include a number of processors.
- Devices used in a hardware unit of execution may be arranged in substantially any configuration (or topology), such as a grid, ring, star, etc.
- a hardware unit of execution may support one or more threads (or processes) when performing processing operations.
- hardware UE 200 may perform parallel processing activities on behalf of another device.
- hardware UE 200 may perform parallel processing activities on behalf of itself or on behalf of a host of which hardware UE 200 is a part.
- Hardware UE 200 may perform parallel processing in a variety of ways. For example, hardware UE 200 may perform parallel processing activities related to task parallel processing, data parallel processing, stream parallel processing, etc. Hardware UE 200 may perform parallel processing using processing devices resident on UE 200 and/or using processing devices that are remote with respect to UE 200.
- hardware UE 200 may include processors 210-1, 210-2, 210-3, and 210-4.
- Processors 210 may include hardware, software, and/or a combination of hardware and software based logic that performs processing operations.
- Processors 210 may include substantially any type of processing device, such as a central processing unit (CPU), a microprocessor, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a micro electrical mechanical switch (MEMS), a general purpose graphical processing unit (GPGPU), an optical processor, etc.
- each processor 210-1 through 210-4 may include a single core processor or a multi-core processor.
- each processor 210-1 through 210-4 may include a single processing device or a group of processing devices, such as a processor cluster or a computing grid.
- each processor 210-1 through 210-4 may include multiple processors that may be local or remote with respect to each other, and may use one or more threads while processing.
- each processor 210-1 through 210-4 may represent a single hardware UE.
- hardware environment 110 may contain fewer, different, or additional components than depicted in Fig. 2.
- hardware environment 110 may include one or more of a bus, a main memory, a read-only memory (ROM), a storage device, an input device, an output device, and/or a communication interface.
- one or more components of hardware environment 110 may perform one or more other tasks described as being performed by one or more other components of hardware environment 110.
- Fig. 3 is an exemplary diagram of batch environment 120.
- batch environment 120 may include a client 300, a scheduler 310, and hardware UE 200 (including processors 210).
- Hardware UE 200 and processors 210 may perform the same or similar tasks as described above in connection with Fig. 2.
- Client 300 may include one or more entities.
- An entity may be defined as a device, such as a personal computer, a personal digital assistant (PDA), a laptop, or another type of computation or communication device, a thread or process running on one of these devices, and/or an object executable by one of these devices.
- client 300 may include a device capable of sending information to, or receiving information from, another device, such as hardware UE 200.
- client 300 may include a technical computing environment (TCE) 320 and a library 330.
- Other implementations of client 300 may contain fewer, different, or additional components than depicted in Fig. 3.
- Technical computing environment (TCE) 320 may include any of the features described above with respect to the term “technical computing environment.”
- Library 330 may include hardware, software, and/or a combination of hardware and software based logic that may operate with TCE 320 to perform certain operations.
- library 330 may store functions to perform certain operations (e.g., signal processing, image processing, parallel processing, data display, etc.) in a text-based environment.
- library 330 may store graphical representations (e.g., blocks, icons, images, etc.) to perform certain operations in a graphically-based environment (e.g., a gain block, a source block, a filter block, a discrete event generator block, etc.).
- Scheduler 310 may include hardware, software, and/or a combination of hardware and software based logic to perform scheduling operations on behalf of a device (e.g., client 300). For example, scheduler 310 may perform operations to select and/or control parallel processing activities performed by hardware UE 200 on behalf of client 300.
- scheduler 310 may receive a job 340, and may distribute or divide job 340 into tasks (e.g., tasks 350-1, 350-2, 350-3, and 350-4).
- Scheduler 310 may send tasks 350-1, 350-2, 350-3, and 350-4 to hardware UE 200 (e.g., to processor 210-1, 210-2, 210-3, and 210-4, respectively) for execution.
- Fig. 4 is an exemplary diagram of parallel processing environment 130.
- parallel processing environment 130 may include technical computing environment 320, a controller 400, and a software unit of execution (UE) 410.
- Technical computing environment 320 may include any of the features described above with respect to the term “technical computing environment.”
- Controller 400 may include hardware, software, and/or a combination of hardware and software based logic to perform controlling operations on behalf of a program. For example, in one implementation, controller 400 may select and/or control parallel processing activities performed by software UE 410 on behalf of technical computing environment 320.
- a software unit of execution may include a software resource (e.g., a worker, a lab, etc.) that performs and/or participates in parallel processing activities.
- a software unit of execution may perform and/or participate in parallel processing activities in response to receipt of a program and/or one or more portions of the program.
- a software unit of execution may perform and/or participate in substantially any type of parallel processing using one or more hardware units of execution.
- a software unit of execution may support one or more threads (or processes) when performing processing operations.
- software UE 410 may include one or more labs (e.g., labs 420-1, 420-2,
- Labs 420 may include any of the features described above with respect to the term “lab.”
- a lab may be similar to a software unit of execution, except on a smaller scale.
- a lab may represent a single software unit of execution.
- technical computing environment 320 may provide a main program 430 to controller 400.
- Controller 400 may provide portions of program 430 (e.g., program portions 440-1, 440-2, 440-3, and 440-4, collectively referred to as "program portions 440") to labs 420-1, 420-2, 420-3, and 420-4, respectively, of software UE 410.
- Labs 420 may execute program portions 440, and may provide results to controller 400.
- Lab 420-1 may provide a result 450-1 to controller 400, lab 420-2 may provide a result 450-2 to controller 400, lab 420-3 may provide a result 450-3 to controller 400, and lab 420-4 may provide a result 450-4 to controller 400.
- Controller 400 may combine the results into a single result 460, and may provide single result 460 to technical computing environment 320.
- parallel processing environment 130 may contain fewer, different, or additional components than depicted in Fig. 4.
- one or more components of parallel processing environment 130 may perform one or more other tasks described as being performed by one or more other components of parallel processing environment 130.
- Fig. 5A is an exemplary diagram of functional components of parallel processing interface 140.
- parallel processing interface 140 may include a client 500 that includes a variety of functional components, such as a program provider 510, analysis logic 520, resource allocation logic 530, and/or a results provider 540.
- Client 500 may include one or more entities.
- An entity may be defined as a device, such as a personal computer, a personal digital assistant (PDA), a laptop, or another type of computation or communication device, a thread or process running on one of these devices, and/or an object executable by one of these devices.
- client 500 may include a device capable of providing a parallel processing interface, as described herein.
- Other implementations of client 500 may contain fewer, different, or additional components than depicted in Fig. 5A.
- client 500 may include a technical computing environment (e.g., TCE 320) and a library (e.g., library 330).
- Program provider 510 may include hardware, software, and/or a combination of hardware and software based logic that provides one or more programs for execution. For example, in one implementation, program provider 510 may generate programs created using a technical computing environment, as defined above. As shown in Fig. 5A, program provider 510 may provide a main program 545 to analysis logic 520.
- Resource allocation logic 530 may receive program portions 550, and may include hardware, software, and/or a combination of hardware and software based logic that dynamically allocates (as indicated by reference number 560) program portions 550 to one or more software UEs (e.g., software UE 410) for parallel execution.
- allocation 560 may be provided to one or more software UEs, and the software UEs may be executed by one or more hardware UEs (e.g., hardware UE 200) in a parallel processing manner.
- allocation 560 may be executed via software UEs and/or hardware UEs of client 500.
- the software UEs may return results 570 of the execution of program portions 550 to results provider 540.
- Results provider 540 may include hardware, software, and/or a combination of hardware and software based logic that receives results 570 from the software UEs, and provides results 570 to program provider 510. In one implementation, results provider 540 may combine results 570 into a single result, and may provide the single result to program provider 510.
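- The hand-off between the functional components of client 500 (program provider, analysis logic, resource allocation logic, results provider) can be pictured with the following Python sketch. It is a loose illustration only: the function names mirror the component names but are hypothetical, the "main program" is assumed to be a dictionary of independent named computations, and round-robin assignment is just one possible allocation policy.

```python
from concurrent.futures import ThreadPoolExecutor


def analysis_logic(main_program):
    """Parse the main program into independently executable portions."""
    return list(main_program.items())


def resource_allocation_logic(portions, num_units):
    """Assign each portion to a unit of execution (assumed policy: round-robin)."""
    allocation = {unit: [] for unit in range(num_units)}
    for i, portion in enumerate(portions):
        allocation[i % num_units].append(portion)
    return allocation


def run_unit(portions):
    """One unit of execution runs the portions it was allocated."""
    return {name: func() for name, func in portions}


def results_provider(results):
    """Combine the per-unit results into a single result."""
    combined = {}
    for partial in results:
        combined.update(partial)
    return combined


def run(main_program, num_units=2):
    portions = analysis_logic(main_program)
    allocation = resource_allocation_logic(portions, num_units)
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        results = list(pool.map(run_unit, allocation.values()))
    return results_provider(results)


# Hypothetical main program made of three independent computations.
main_program = {
    "a": lambda: 2 ** 10,
    "b": lambda: sum(range(5)),
    "c": lambda: len("spmd"),
}
single_result = run(main_program)
```

- Keeping allocation separate from analysis means the allocation policy can change (for example, depending on how many units are currently available) without touching how the program is parsed.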
- Client 500 may use different control and data passing layers through which it may specify the current behavior of a part or a whole of the parallel processing interface 140.
- client 500 may use a message passing interface (MPI), a Transmission Control Protocol/Internet Protocol (TCP/IP), an Ethernet protocol, and/or other interconnects and protocols for the control and data passing layers.
- client 500 may implement an MPI layer (and/or other data and control layers) on any standard non-guaranteed stream protocol.
- Client 500 may define a sub-group behavior for each of program portions 550.
- a sub-group may include any part of the overall set of processes (e.g., main program 545 and/or program portions 550).
- the sub-group behavior may relate to the parallel processing styles that may be employed on the group of program portions 550.
- client 500 may dynamically change the behavior of one or more of program portions 550 as code is executed for other program portions 550.
- client 500 may use the control layer to change the current state of a sub-group at any time, which may dynamically change the behavior of that portion of the group.
- an application (e.g., main program 545) may include different phases (e.g., an input phase, an analysis phase, an output phase, etc.), and parallel processing needs may be different for each phase.
- the sub-group behavior may include an unused state (e.g., the initial state of a process when it is not being used), a user-controlled UE state (e.g., if a user has acquired a process as a UE object), a task parallel state (e.g., an execution state used by parallel processing constructs), a single program, multiple data (SPMD) state (e.g., one or more processes may have an MPI ring between them with appropriate values for rank and size), a stream state (e.g., a state where task allocation may be expressed in a directed acyclic graph (DAG) or a cyclic graph with delays), etc.
- Each of program portions 550 may be in one of the above-mentioned states, and may request other tasks to be placed in a new state.
- the sub-group behavior may include a variety of other states.
- the sub-group behavior may include a delayed debugging state where a task may be executed and delayed in time with respect to another task (or delayed in lines of code).
- a delayed debugging state may permit a breakpoint to be created for one task if another task experiences an error, and may enable a user to see why an error occurred.
- the sub-group behavior may include a release differences state that may execute one or more tasks associated with different releases of a product (e.g., different releases of TCE 320). This may permit behavior differences to be found between different releases of a product, and may permit users to undertake release compatibility studies.
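- The sub-group states listed above, and the idea that a control layer may change the state of a sub-group at any time, can be sketched as a small state table. The class and method names (`SubGroupState`, `ControlLayer`, `set_state`, `in_state`) are hypothetical and chosen for this example only; the sketch tracks states and does not execute any work.

```python
from enum import Enum, auto


class SubGroupState(Enum):
    """States named in the description above."""
    UNUSED = auto()
    USER_CONTROLLED_UE = auto()
    TASK_PARALLEL = auto()
    SPMD = auto()
    STREAM = auto()
    DELAYED_DEBUGGING = auto()
    RELEASE_DIFFERENCES = auto()


class ControlLayer:
    """Tracks the current state of every process; all start out unused."""

    def __init__(self, process_ids):
        self.states = {pid: SubGroupState.UNUSED for pid in process_ids}

    def set_state(self, sub_group, new_state):
        """Place every process of a sub-group into `new_state`."""
        for pid in sub_group:
            if pid not in self.states:
                raise KeyError(f"unknown process {pid!r}")
            self.states[pid] = new_state

    def in_state(self, state):
        """Return the sorted ids of the processes currently in `state`."""
        return sorted(pid for pid, s in self.states.items() if s is state)


# Different phases of an application may need different parallel styles.
control = ControlLayer(range(4))
control.set_state([0, 1], SubGroupState.TASK_PARALLEL)  # e.g., an input phase
control.set_state([2, 3], SubGroupState.SPMD)           # e.g., an analysis phase
control.set_state([0, 1], SubGroupState.UNUSED)         # release after the phase ends
```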
- some state information may be consistent across client 500.
- a source of code may come from one device (e.g., client 500), and a file system associated with the source device may be used across client 500.
- some state information may be consistent across a sub-group of client 500 (e.g., labindex, numlabs, etc.).
- the state information may be automatically transferred from client 500 to software unit of execution 410 and/or labs 420.
- if a path is added to a technical computing environment (e.g., TCE 320) of client 500, the path may be automatically added to all TCEs in the parallel environment (e.g., TCEs provided in labs 420).
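- The two kinds of shared state described above (per-lab values such as labindex and numlabs, and client state such as a path that is mirrored to every lab) can be sketched as follows. The `Lab` and `Client` classes, the `add_path` method, and the 1-based lab index are assumptions made for illustration, not the interface of any actual product.

```python
class Lab:
    """A lab that knows its own index and the size of its sub-group."""

    def __init__(self, labindex, numlabs):
        self.labindex = labindex  # assumed 1-based position within the sub-group
        self.numlabs = numlabs    # total number of labs in the sub-group
        self.path = []            # search path of the lab's own environment


class Client:
    """A client whose path changes are transferred to every lab automatically."""

    def __init__(self, numlabs):
        self.path = []
        self.labs = [Lab(i + 1, numlabs) for i in range(numlabs)]

    def add_path(self, directory):
        """Add a directory on the client and mirror it to all labs."""
        self.path.append(directory)
        for lab in self.labs:
            lab.path.append(directory)


client = Client(numlabs=3)
client.add_path("/shared/tools")  # hypothetical directory name
```

- After the call, every lab sees the same path as the client without the user having to configure each lab separately.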
- client 500 may be interactive in that resource allocation logic 530 may permit a user to dynamically control a current setup (e.g., via scripts, functions, command lines, etc.). Thus, client 500 and its configuration may change based on an actual analysis that the user may be currently undertaking.
- resource allocation logic 530 may be connected to one or more clusters of software UEs 410 and may use processes derived from each of the clusters, as well as client 500, to form the functional components of client 500.
- client 500 may include devices having different architectures and/or operating systems (i.e., client 500 may execute across multiple platforms). For example, client 500 may include an architecture and/or operating system different from that of software UE 410.
- main program 545 may be submitted in batch manner to a cluster (e.g., a cluster of software UEs 410 and/or a cluster of labs 420).
- a user may interactively develop main program 545, and may save main program 545 in a file (e.g., an M file).
- a command may exist in main program 545 (e.g., in the M file) that may cause one lab (e.g., one of labs 420) in the cluster to act as a client where the execution of main program 545 initiates.
- Main program 545 may use four labs 420 and a client (e.g., one of labs 420 acting as a client), may initiate on the client, and may utilize as many labs 420 as necessary to carry out execution.
- a special type of job may be created that creates a pool (or cluster) of labs, where one of the initiated processes of the job may act as the client, and the rest of the processes may be in the pool.
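- The pool job described above, in which one of the started processes acts as the client and the remaining processes form the pool, can be sketched with threads standing in for processes. The function name `pool_job` and the queue-based hand-off are assumptions made for this example, and the sketch requires at least two processes (one client, one or more labs).

```python
import queue
import threading


def pool_job(num_processes, work_items, func):
    """Start `num_processes` workers: the first acts as client, the rest as pool labs."""
    tasks, results = queue.Queue(), queue.Queue()

    def lab():
        # A pool lab executes whatever the client hands out until told to stop.
        while True:
            item = tasks.get()
            if item is None:
                return
            results.put((item, func(item)))

    def client():
        # The client initiates execution and distributes work to the pool.
        for item in work_items:
            tasks.put(item)
        for _ in range(num_processes - 1):
            tasks.put(None)  # one stop signal per pool lab

    roles = [client] + [lab] * (num_processes - 1)
    threads = [threading.Thread(target=role) for role in roles]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    collected = {}
    while not results.empty():
        item, value = results.get()
        collected[item] = value
    return collected


# Four processes: one client and three labs in the pool.
squares = pool_job(4, [1, 2, 3, 4, 5], lambda x: x * x)
```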
- Web service 580 may provide access to one or more programs (e.g., main program 545 provided by program provider 510, applications accessed by main program 545, etc.).
- a web service may include any software application that allows machine-to-machine communications over a network (e.g., a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), such as the Internet, etc.).
- a web service may communicate with a client (e.g., client 500) using an application program interface (API) that the client may access over the network.
- the web service may exchange Hypertext Markup Language (HTML), Extensible Markup Language (XML), or other types of messages with the client using industry compatible standards (e.g., simple object access protocol (SOAP)) and/or proprietary standards.
- a web service may further include network services that can be described using industry standard specifications, such as web service definition language (WSDL) and/or proprietary specifications.
- web service 580 may allow a destination (e.g., a computer operated by a customer) to perform parallel processing using hardware, software, and/or a combination of hardware and software UEs that may be operated by a service provider (e.g., client 500). For example, the customer may be permitted access to client 500 to perform parallel processing if the customer subscribes to one of the offered web services.
- the customer may receive web service 580 on a subscription basis.
- a subscription may include substantially any type of arrangement, such as monthly subscription, a per-use fee, a fee based on an amount of information exchanged between the service provider and the customer, a fee based on a number of processor cycles used by the customer, a fee based on a number of hardware UEs, software UEs, etc., used by the customer, etc.
- Fig. 5C is an exemplary diagram of functional components of parallel processing interface 140 in another alternative arrangement.
- the alternative arrangement depicted in Fig. 5C is the same as the arrangement of Fig. 5A, except that analysis logic 520 may be replaced with a parallel processing object API 590.
- Program provider 510, resource allocation logic 530, and/or results provider 540 may operate in the manner described above in connection with Fig. 5A.
- Parallel processing object API 590 may permit a user to specify how main program 545 may be parallelized.
- Parallel processing object API 590 may cooperate with resource allocation logic 530 and/or an execution mechanism (e.g., software UEs 420) in a similar manner that analysis logic 520 cooperates with these components.
- parallel processing API 590 may offer much more flexibility and/or customization than analysis logic 520.
- Parallel processing API 590 may define and implement an object in a technical computing environment (e.g., TCE 320) that corresponds to one or more other executing technical computing environments (or a set of such environments).
- Parallel processing API 590 may permit customizable parallelism of a program (e.g., main program 545), and may be nested in other calls or functions (e.g., in the parallel processing constructs described herein).
- Parallel processing API 590 may be used by other calls as inputs to a calling function so that identification of which labs (e.g., labs 420) to use may be known.
- parallel processing API 590 may be called a MATLAB® unit of execution (or MUE) API.
- the MUE API may define and implement an object in MATLAB® software that corresponds to one or more other executing MATLAB® software applications.
- the MUE API may be used to permit one technical computing environment to communicate with and control another technical computing environment.
- the MUE API may be used to create groups of processes with certain behaviors (e.g., using the language constructs described herein).
- parallel processing interface 140 may contain fewer, different, or additional functional components than depicted in Figs. 5A-5C.
- one or more functional components of parallel processing interface 140 may perform one or more other tasks described as being performed by one or more other functional components of parallel processing interface 140.
- Fig. 6 is an exemplary diagram of an entity corresponding to client 500 and/or web service 580.
- the entity may include a bus 610, a processing unit 620, a main memory 630, a read-only memory (ROM) 640, a storage device 650, an input device 660, an output device 670, and/or a communication interface 680.
- Bus 610 may include a path that permits communication among the components of the entity.
- Processing unit 620 may include a processor, microprocessor, or other types of processing logic that may interpret and execute instructions.
- processing unit 620 may include a single core processor or a multi-core processor.
- processing unit 620 may include a single processing device or a group of processing devices, such as a processor cluster or computing grid.
- processing unit 620 may include multiple processors that may be local or remote with respect to each other, and may use one or more threads while processing.
- processing unit 620 may include multiple processors implemented as hardware UEs capable of running copies of a technical computing environment.
- Input device 660 may include a mechanism that permits an operator to input information to the entity, such as a keyboard, a mouse, a pen, a microphone, voice recognition and/or biometric mechanisms, etc.
- Output device 670 may include a mechanism that outputs information to the operator, including a display, a printer, a speaker, etc.
- Communication interface 680 may include any transceiver-like mechanism that enables the entity to communicate with other devices and/or systems.
- communication interface 680 may include mechanisms for communicating with another device or system via a network.
- the entity depicted in Fig. 6 may perform certain operations in response to processing unit 620 executing software instructions contained in a computer-readable medium, such as main memory 630.
- a computer-readable medium may be defined as a physical or logical memory device.
- the software instructions may be read into main memory 630 from another computer-readable medium, such as storage device 650, or from another device via communication interface 680.
- the software instructions contained in main memory 630 may cause processing unit 620 to perform processes that will be described later.
- hardwired circuitry may be used in place of or in combination with software instructions to implement processes described herein.
- implementations described herein are not limited to any specific combination of hardware circuitry and software.
- Although Fig. 6 shows exemplary components of the entity, in other implementations the entity may contain fewer, different, or additional components than depicted in Fig. 6.
- one or more components of the entity may perform one or more other tasks described as being performed by one or more other components of the entity.
- Fig. 7 illustrates an exemplary parallel processing construct (e.g., a single program, multiple data (SPMD) command 700) capable of being analyzed and transformed to parallel program portions by analysis logic 520 of parallel processing interface 140.
- SPMD command 700 may be created with TCE 320 and provided to analysis logic 520 of client 500.
- SPMD command 700 may be created by another device and/or may be provided to analysis logic 520 of client 500.
- analysis logic 520 may implement SPMD command 700 to generate program portions 550.
- SPMD command 700 may permit users to enter into a SPMD mode.
- SPMD command 700 may support data parallelism whereby a large amount of data may be distributed across multiple software UEs (e.g., software UEs 410 and/or labs 420) via a distributed arrays API. Operations on the distributed arrays may be coordinated through communication between labs 420 that own pieces of the array.
- the general form of SPMD command 700 may include:
- SPMD command 700 may be executed on resources (e.g., software UEs 410 and/or labs 420) that may be defined by a default configuration.
- SPMD command 700 may configure these resources as a communicating ring of labs (e.g., ring of labs 420), which may mean that labs 420 may have the same number of labs (e.g., NUMLABS 720) defined, each lab 420 may have a unique value (e.g., LABINDEX 730, 740, 750, and 760 for labs 420-1, 420-2, 420-3, 420-4, respectively) between one and NUMLABS 720, labs 420 may send data to and receive data from one another, and/or each lab 420 may include a unique random number generator that creates random number streams independent of one another.
- labs 420 may exchange information among each other when labs 420 are configured and/or executed.
- labs 420 may be cleaned up, which may mean that labs 420 may be restored to ordinary resources (e.g., after the results are received), NUMLABS 720 and LABINDEX 730-760 may be set back to one, the random number generators may be set back to a default start value, and/or workspaces may be cleared. There may be no implicit data transfer between the workspace where SPMD command 700 is called and the workspaces of labs 420 executing the body of SPMD command 700. An error on any of labs 420 executing the body of SPMD command 700 may cause an error in SPMD command 700. A warning on any of labs 420 executing the body of SPMD command 700 may be displayed on a device (e.g., client 500).
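The configuration of labs as a ring and their later clean-up may be sketched as follows. This is a minimal illustration in Python, not the technical computing environment's own mechanism; the names `Lab`, `configure_ring`, and `clean_up`, the default seed, and the per-lab seeding scheme are all assumptions introduced here. Separately seeded generators only stand in for truly independent random number streams.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Lab:
    """Hypothetical stand-in for one lab; not an actual TCE object."""
    labindex: int = 1
    numlabs: int = 1
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    workspace: dict = field(default_factory=dict)


def configure_ring(labs, base_seed=12345):
    """Give every lab the same numlabs, a unique labindex in 1..numlabs,
    and its own separately seeded random number generator."""
    n = len(labs)
    for i, lab in enumerate(labs, start=1):
        lab.numlabs = n
        lab.labindex = i
        lab.rng = random.Random(base_seed + i)  # assumed seeding scheme


def clean_up(labs):
    """Restore labs to ordinary resources: indices back to one,
    generators back to a default start value, workspaces cleared."""
    for lab in labs:
        lab.numlabs = 1
        lab.labindex = 1
        lab.rng = random.Random(0)
        lab.workspace.clear()


labs = [Lab() for _ in range(4)]
configure_ring(labs)
```

After `configure_ring`, the four labs carry indices one through four and share the same `numlabs`; calling `clean_up(labs)` resets both values to one on every lab.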
- SPMD command 700 of the form SPMD NUMWORKERS, statement, ..., statement, END may execute SPMD command 700 on an anonymous group of a number (e.g., NUMWORKERS) of resources provided within a default resource pool.
- SPMD command 700 of the form SPMD MYWORKERS, statement, ..., statement, END may execute SPMD command 700 on a specified group of resources (e.g., MYWORKERS).
- the syntax [OUT1, OUT2, ...] SPMD(IN1, IN2, ...), statement, ..., statement, END may transfer variables (e.g., IN1, IN2, ...) from client 500 to workspaces of labs 420 at the beginning of SPMD command 700, and may transfer variables (e.g., OUT1, OUT2, ...) from one of the workspaces back to client 500 at the end of SPMD command 700. If the variable being transferred from client 500 to labs 420 is a distributed array, then the variable may be automatically re-distributed to all labs 420. If the variable being transferred from client 500 is a non-distributed array, then the variable may be replicated on all labs 420.
- if the variable being transferred from labs 420 to client 500 is a replicated array, a replicated value may be received from any of labs 420.
- a value may be received from one of labs 420.
- if the variable being transferred from labs 420 to client 500 is a distributed array, then the variable may be automatically redistributed to be a distributed array over a single lab 420.
- SPMD command 700 (and its associated syntax) may be implemented via client 500 (e.g. via analysis logic 520 of client 500), software UEs 410 (including labs 420), and/or TCE 320.
- SPMD command 700 (and its associated syntax) may be implemented via other software and hardware logic. SPMD command 700 may increase processing performance by dividing large data sets into pieces, and by providing each piece to different resources. Each resource may execute the same program on its piece of data, and the results may be collected.
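The data-parallel pattern described above (divide a data set into pieces, run the same program on each piece, collect the results) may be sketched as follows. This is an illustrative Python sketch that uses threads as stand-ins for labs; the function names `split`, `program`, and `run_spmd` and the sum-of-squares workload are assumptions, not part of the described system.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, numlabs):
    """Split data into numlabs contiguous pieces whose sizes differ by at most one."""
    base, extra = divmod(len(data), numlabs)
    pieces, start = [], 0
    for i in range(numlabs):
        size = base + (1 if i < extra else 0)
        pieces.append(data[start:start + size])
        start += size
    return pieces


def program(piece):
    """The single program that every resource runs on its own piece (assumed workload)."""
    return sum(v * v for v in piece)


def run_spmd(data, numlabs=4):
    """Run the same program on each piece in parallel and collect the partial results."""
    pieces = split(data, numlabs)
    with ThreadPoolExecutor(max_workers=numlabs) as pool:
        partials = list(pool.map(program, pieces))
    return sum(partials)
```

For example, `run_spmd(list(range(10)))` splits ten values into pieces of sizes 3, 3, 2, and 2 and combines the four partial sums.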
- Although Fig. 7 shows an exemplary parallel processing construct, in other implementations analysis logic 520 may contain fewer, different, or additional parallel processing constructs than depicted in Fig. 7.
- the exemplary parallel processing construct may be allocated in other ways than depicted in Fig. 7.
- another parallel processing construct (e.g., a PARFOR command) may be analyzed and transformed to parallel program portions by analysis logic 520 of parallel processing interface 140.
- a PARFOR command may be created with TCE 320 and provided to analysis logic 520 of client 500.
- the PARFOR command may be created by another device and/or may be provided to analysis logic 520 of client 500.
- analysis logic 520 may implement the PARFOR command to generate program portions 550.
- One such parallel processing construct may include a parallel FOR loop (e.g., the PARFOR command).
- the PARFOR command may include the following general form:
- the PARFOR command may be a work sharing construct that executes the loop body for a set of iterations simultaneously by using available resources. To accomplish this, the body of the PARFOR command may be written such that each iteration may be independent of the other iterations (i.e., the loop iterations may be order-independent). The PARFOR command may terminate when all the resources finish executing the loop body for their assigned set of iterations (e.g., program portions 550). Analysis logic 520 may implement the PARFOR command based on the definition that its body is iteration-independent. If execution of the PARFOR command produces unexpected results for a user, an appropriate diagnostic message may be displayed indicating a reason for the unexpected results.
- debugging information (e.g., the iteration number, resources that failed, the statement being executed, etc.) may be provided to the user device (e.g., client 500) that initiated the PARFOR command. If an error occurs during execution of the PARFOR command, all iterations in progress may be terminated, and new iterations may not be initiated.
- Semantics for the PARFOR command may not be influenced by what happens (e.g., in terms of usage of variables) before or after the PARFOR command section.
- Temporary variables may persist after execution of the PARFOR command.
- the PARFOR command may be optimized to selectively determine which temporary variables may be permitted to persist after execution of the PARFOR command.
- since the PARFOR command may be executed on different resources (e.g., software UEs 410, hardware UEs 200, etc.), variables (e.g., loop index, right-hand side variables within the loop body, etc.) needed to execute the body of the PARFOR command may be transferred to and/or created on such resources.
- the number of resources to be used with the PARFOR command may be controlled by specifying an optional input to the PARFOR command of the form:
- N may be an integer representing a maximum number of resources to try to use. If N is not specified, the number of resources to use may be specified via a resource configuration and management utility. If there are not enough resources available to satisfy the specified N, the available resources may be initiated as part of the execution of the PARFOR command.
- Analysis logic 520 may determine variables and/or data of program portions of the PARFOR command to be transferred to software UE 410. Analysis logic 520 may transform the program portions and may transfer variables and/or data based on the determination of the variables and/or data. Analysis logic 520 may provide execution or run time control of how the iterations get allocated to software UE 410 (e.g., labs 420 of software UE 410). For example, in one implementation, client 500 (via resource allocation logic 530) may use a number of allocation strategies to provide run time control of iteration allocation. In other implementations, users may be provided with dynamic options for iteration distribution schemes.
- the program portions of the PARFOR command may be allocated to and/or executed by one or more labs 420 of software UE 410. For example, a first portion of the PARFOR command may be allocated to lab 420-1, a second portion of the PARFOR command may be allocated to lab 420-2, a third portion of the PARFOR command may be allocated to lab 420-3, and/or a fourth portion of the PARFOR command may be allocated to lab 420-4.
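The allocation of order-independent iterations to labs may be sketched as follows. The text does not name specific allocation strategies, so the block and cyclic schemes below are assumptions chosen for illustration, as are the function names; threads stand in for labs, and iterations are numbered from zero.

```python
from concurrent.futures import ThreadPoolExecutor


def block_allocation(num_iterations, num_labs):
    """Assumed strategy: each lab receives one contiguous range of iterations."""
    bounds = [num_iterations * k // num_labs for k in range(num_labs + 1)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(num_labs)]


def cyclic_allocation(num_iterations, num_labs):
    """Assumed strategy: iterations are dealt out to labs in round-robin order."""
    return [list(range(k, num_iterations, num_labs)) for k in range(num_labs)]


def parfor(body, num_iterations, num_labs=4, allocate=block_allocation):
    """Run body(i) for every iteration i; correct only if iterations are independent."""
    results = [None] * num_iterations

    def run_portion(iterations):
        # Each portion writes only its own result slots, so portions do not interfere.
        for i in iterations:
            results[i] = body(i)

    with ThreadPoolExecutor(max_workers=num_labs) as pool:
        # Consuming the map re-raises any error raised inside a portion.
        list(pool.map(run_portion, allocate(num_iterations, num_labs)))
    return results
```

Because the loop body is order-independent, both strategies yield the same results; only the assignment of iterations to labs differs.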
- a parallel processing construct (e.g., a PARSECTION command) may be analyzed and transformed to parallel program portions by analysis logic 520 of parallel processing interface 140.
- the PARSECTION command may be created with TCE 320 and provided to analysis logic 520 of client 500.
- the PARSECTION command may be created by another device and/or may be provided to analysis logic 520 of client 500.
- analysis logic 520 may implement the PARSECTION command to generate program portions.
- the PARSECTION command may include the following exemplary syntax: parsection (4)
- One such parallel processing construct may include a parallel SECTION command (e.g., the PARSECTION command).
- the PARSECTION command may include the following general form:
- Analysis logic 520 may determine independent segments or sections of code associated with the program portions. For example, in one implementation, analysis logic 520 may perform a dependency analysis on the sections of the code to determine independent sections. Analysis logic 520 may analyze the PARSECTION command and may determine sections of the code to be executed together and sections of the code that may undergo staggered execution. Analysis logic 520 may determine sections of the code to allocate to software UE 410 (e.g., labs 420 of software UE 410), and/or results to be returned at the end of the PARSECTION command.
- the PARSECTION command may be allocated to and/or executed by one or more labs 420 of software UE 410. For example, a first portion of the PARSECTION command may be allocated to lab 420-1, a second portion of the PARSECTION command may be allocated to lab 420-2, a third portion of the PARSECTION command may be allocated to lab 420-3, and/or a fourth portion of the PARSECTION command may be allocated to lab 420-4.
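A dependency analysis of the kind described for the PARSECTION command may be sketched as follows. This Python sketch assumes each section is summarized by the sets of variables it reads and writes; that representation, the function names, and the staging rule are assumptions for illustration, not the analysis actually performed by analysis logic 520.

```python
def independent(a, b):
    """Two sections are independent when neither writes a variable
    that the other reads or writes."""
    return not (a["writes"] & (b["reads"] | b["writes"]) or b["writes"] & a["reads"])


def schedule(sections):
    """Group sections (given in program order) into stages. Sections within a
    stage may execute together; stages execute one after another."""
    stages = []
    for name, sec in sections.items():
        level = 0
        for idx, stage in enumerate(stages):
            # A conflict with any section in a stage forces a later stage.
            if any(not independent(sec, sections[other]) for other in stage):
                level = idx + 1
        if level == len(stages):
            stages.append([])
        stages[level].append(name)
    return stages


example = {
    "s1": {"reads": {"a"}, "writes": {"x"}},
    "s2": {"reads": {"b"}, "writes": {"y"}},
    "s3": {"reads": {"x", "y"}, "writes": {"z"}},
    "s4": {"reads": {"c"}, "writes": {"w"}},
}
stages = schedule(example)
```

In this example, sections `s1`, `s2`, and `s4` touch disjoint variables and land in the first stage, while `s3` reads the outputs of `s1` and `s2` and is staggered into a second stage.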
- a SPMD parallel processing construct may provide a place holder for a single program that may be executed on one or more labs.
- the code for the SPMD construct may be provided to the labs, and workspace contents available to the SPMD construct may be determined on the client.
- the SPMD constructs described herein may be easy to use (e.g., may make it easy to mark code to execute in parallel and may make it easy to send ordinary variables into the SPMD), may support a user (e.g., a programmer) by performing minimal data transfer through remote references or similar mechanisms, and may provide sufficient richness to allow for remote distributed arrays.
- the concept of parallel resource sets may be a building block for the behavior of the SPMD construct.
- a parallel resource set may include a set of labs such that the labs may be available to execute parallel code, the labs may be connected in an MPI ring, and each of the labs may include a value store that can store values of variables.
- a parallel context may include a combination of a parallel resource set with a parallel code block, and may include variables associated with the parallel code block.
- Fig. 8 illustrates an exemplary diagram 800 of a parallel processing construct (e.g., a SPMD command 810).
- SPMD command 810 may include an outer parallel context 820, a SPMD body (or inner parallel context) 830, and SPMD boundaries 840.
- Outer parallel context 820 may include syntax or code provided outside a spmd statement and an end statement (e.g., outside SPMD boundaries 840). In one exemplary implementation, outer parallel context 820 may be executed sequentially (e.g., by client 500), or may be executed in parallel (e.g., by labs 420).
- SPMD body 830 may include syntax or code provided inside the spmd statement and the end statement (e.g., inside SPMD boundaries 840).
- SPMD body 830 may be provided to two or more labs (e.g., labs 420), and may be executed in parallel by the two or more labs.
- SPMD boundaries 840 may be defined by the spmd statement and the end statement of SPMD command 810. As described above, SPMD boundaries 840 may define outer parallel context 820 and inner parallel context (e.g., SPMD body 830) associated with SPMD command 810.
- SPMD command 810 may be provided to analysis logic 520.
- Analysis logic 520 may receive SPMD command 810, and may analyze SPMD command 810 to determine outer parallel context 820 and inner parallel context 830.
- analysis logic 520 may analyze SPMD command 810 to determine input variables 850 associated with SPMD command 810.
- Input variables 850 may include variables that are used within SPMD body 830 before they are assigned values.
- analysis logic 520 may determine input variables 850 upon entering the spmd statement, and may attempt to transfer input variables from outer parallel context 820 into the inner parallel context (e.g., SPMD body 830).
- Analysis logic 520 may allocate one or more portions of the inner parallel context (e.g., SPMD body 830) and input variables 850 to labs 420 for parallel execution. If analysis logic 520 determines that no resources (e.g., labs 420) are available for parallel execution, as indicated by reference number 860, client 500 may sequentially execute outer parallel context 820 and SPMD body 830.
- Although Fig. 8 shows an exemplary parallel processing construct, in other implementations client 500 may contain fewer, different, or additional parallel processing constructs than depicted in Fig. 8.
- Fig. 9 depicts a flow chart of an exemplary process 900 capable of being performed by analysis logic 520 and/or resource allocation logic 530.
- process 900 may begin with a determination of whether a SPMD block contains variants as input variables (block 910).
- analysis logic 520 may determine if a SPMD command (e.g., SPMD command 810) includes variants as input variables.
- a variant may include information about a parallel resource set, such as a remote reference to a parallel resource set.
- a value of a variant may be stored on each participating lab (e.g., labs 420), and may be cleared from the storage after the variant goes out of scope in the outer parallel context.
- analysis logic 520 may determine whether a SPMD command (e.g., SPMD command 810) includes variants as variables, and may determine whether the variants correspond to the same resource set (e.g., labs 420). Otherwise (block 910 - NO), it may be determined if a pool parallel resource set exists (block 930). For example, in one implementation, analysis logic 520 may determine if a pool parallel resource set exists for a SPMD command (e.g., SPMD command 810).
- analysis logic 520 may use a resource set (e.g., labs 420) for a SPMD command (e.g., SPMD command 810) if the input variables correspond to the same resource set (e.g., labs 420), and may generate an error if the input variables do not correspond to the same resource set.
- the pool parallel resource set may be used for the SPMD block (block 960). Otherwise (block 930 - NO), it may be determined whether to create a new pool parallel resource set (block 970). For example, in one implementation, resource allocation logic 530 may determine if a pool parallel resource set exists for a SPMD command (e.g., SPMD command 810), and may use the pool parallel resource set for SPMD command 810. If the pool parallel resource set does not exist, resource allocation logic 530 may determine whether to create a new pool parallel resource set for SPMD command 810. If a new pool parallel resource set is to be created (block 970 - YES), the new pool parallel resource set may be created from a technical computing environment pool (block 980).
- the SPMD block may be executed by technical computing environment 320 (block 990).
- resource allocation logic 530 may determine that a new pool parallel resource set is to be created, and may create the new pool parallel resource set from a pool associated with technical computing environment 320.
- Resource allocation logic 530 may use technical computing environment 320 to execute SPMD command 810 if a new pool parallel resource set is not to be created.
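The decisions of process 900 may be summarized in a short sketch. The Python function below is an illustration only: its name, its parameters, and the tuple it returns are assumptions, and it models just the branches that the description names (blocks 910, 930, 960, 970, 980, and 990).

```python
def select_resources(variant_resource_sets, pool_exists, create_new_pool):
    """Decide where a SPMD block runs.

    variant_resource_sets: identifiers of the resource sets referenced by
        variant input variables (empty when the block has no variant inputs).
    pool_exists: whether a pool parallel resource set already exists.
    create_new_pool: whether a new pool parallel resource set is to be created.
    """
    if variant_resource_sets:                        # block 910 - YES
        distinct = set(variant_resource_sets)
        if len(distinct) != 1:
            # Variants that point at different resource sets are an error.
            raise ValueError("variant inputs refer to different resource sets")
        return ("use_variant_resource_set", distinct.pop())
    if pool_exists:                                  # block 930 - YES
        return ("use_existing_pool", None)           # block 960
    if create_new_pool:                              # block 970 - YES
        return ("create_pool_from_tce_pool", None)   # block 980
    return ("execute_in_tce", None)                  # block 990
```

The last branch corresponds to the case in which no parallel resources are used and the block is executed by the technical computing environment itself.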
- Fig. 10 illustrates an exemplary diagram 1000 of a parallel processing construct (e.g., a SPMD command 1010) capable of being generated by technical computing environment 320.
- SPMD command 1010 may include a SPMD body (or inner parallel context) 1020 and input variables 1030.
- SPMD body 1020 may include syntax or code provided inside a spmd statement and an end statement.
- SPMD body 1020 may be provided to two or more labs (e.g., labs 420), and may be executed in parallel by the two or more labs.
- Input variables 1030 may include variables that are used within SPMD body 1020 before they are assigned values.
- input variables 1030 may include a minimum number of labs to use (e.g., minN), a maximum number of labs to use (e.g., maxN), etc.
- SPMD command 1010 may be provided to analysis logic 520 and/or resource allocation logic 530.
- Analysis logic/resource allocation logic 520/530 may receive SPMD command 1010, and may analyze SPMD command 1010.
- analysis logic 520 may analyze SPMD command 1010 to determine input variables 1030 associated with SPMD command 1010.
- analysis logic 520 may determine input variables 1030 upon entering the spmd statement.
- Resource allocation logic 530 may provide SPMD body 1020 and input variables 1030 to labs 420 for parallel execution.
- Although Fig. 10 shows an exemplary parallel processing construct, in other implementations fewer, different, or additional parallel processing constructs than depicted in Fig. 10 may be used.
- Figs. 11A and 11B depict a flow chart of an exemplary process 1100 capable of being performed by analysis logic 520 and/or resource allocation logic 530.
- process 1100 may begin with reception of a SPMD block (block 1105), and a determination of whether the SPMD block contains variants as input variables (block 1110).
- analysis logic 520 may receive a SPMD command (e.g., SPMD command 1010), and may determine if the SPMD command (e.g., SPMD command 1010) includes variants as input variables.
- analysis logic 520 may determine that a SPMD command (e.g., SPMD command 1010) includes variants as variables, and may determine whether the variants correspond to the same resource set (e.g., labs 420). Otherwise (block 1110 - NO), it may be determined if the SPMD block is asking for spmd(0) (block 1120). For example, in one implementation, analysis logic 520 may determine if a SPMD command (e.g., SPMD command 1010) is asking for a SPMD block (e.g., a spmd(0) construct).
- resource allocation logic 530 may use a resource set (e.g., labs 420) for a SPMD command (e.g., SPMD command 1010) if the input variables correspond to the same resource set (e.g., labs 420), and may generate an error if the input variables do not correspond to the same resource set. If the SPMD block is asking for spmd(0) (block 1120 - YES), the SPMD block may be executed by technical computing environment 320 (block 1135).
- TCE 320 may execute a SPMD command (e.g., SPMD command 1010) if the SPMD command is asking for a SPMD block (e.g., a spmd(0) construct). If the SPMD command is not asking for a SPMD block (e.g., a spmd(0) construct), analysis logic 520 may determine whether a pool parallel resource set exists for SPMD command 1010.
- resource allocation logic 530 may determine whether a pool parallel resource set matches constraints associated with a SPMD command (e.g., SPMD command 1010), or may determine if a new pool parallel resource set may be created for SPMD command 1010 (e.g., from a pool associated with technical computing environment 320) that matches the constraints.
- the pool parallel resource set may be used for the SPMD block (block 1155). If a new pool parallel resource set is to be created for the SPMD block (block 1150 - YES), the new pool parallel resource set may be created (block 1160). Otherwise (block 1145 - NO or block 1150 - NO), it may be determined if a TCE satisfies the constraints (block 1165). For example, in one implementation, if analysis logic 520 determines that a pool parallel resource set matches the constraints associated with SPMD command 1010, resource allocation logic 530 may use the pool parallel resource set for SPMD command 1010.
- resource allocation logic 530 may create a new pool parallel resource set for SPMD command 1010. In still another example, if resource allocation logic 530 determines that a new pool parallel resource set is not to be created, resource allocation logic 530 may determine whether TCE 320 with SPMD command 1010 satisfies constraints associated with SPMD command 1010.
- Fig. 12 illustrates an exemplary diagram of functional components of analysis logic 520 for determining input and output variables.
- analysis logic 520 may include an input variable determiner 1200 and an output variable determiner 1210.
- Input variable determiner 1200 and output variable determiner 1210 may permit detection of lexical information or scope (e.g., input and output variables), and sharing of lexical information across the inner and outer parallel contexts of a SPMD command.
- Input variable determiner 1200 may include hardware, software, and/or a combination of hardware and software based logic that detects input variables, such as variables that are used in a SPMD body before they are assigned values. For example, in one implementation, upon entering a spmd statement, input variable determiner 1200 may determine input variables to the SPMD block. As shown in Fig. 12, a SPMD command 1220 may be received by input variable determiner 1200, and input variable determiner 1200 may determine that a variable (e.g., x) associated with SPMD command 1220 is an input variable, as indicated by reference number 1230.
- Output variable determiner 1210 may include hardware, software, and/or a combination of hardware and software based logic that detects output variables, such as variables assigned within the SPMD body.
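The detection of input variables (used before being assigned) and output variables (assigned within the body) may be sketched as follows. This Python sketch analyzes a Python snippet as a stand-in for a SPMD body, using the standard `ast` module; the function name is an assumption, and the analysis handles only straight-line code. Any name that is read, including a function name, would be reported as an input.

```python
import ast


def classify_variables(body_source):
    """Return (inputs, outputs) for a straight-line code body.

    inputs: names read before any assignment to them, in first-use order.
    outputs: names assigned anywhere in the body, sorted alphabetically.
    """
    assigned, inputs = set(), []
    for stmt in ast.parse(body_source).body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        loads = [n.id for n in names if isinstance(n.ctx, ast.Load)]
        stores = [n.id for n in names if isinstance(n.ctx, ast.Store)]
        if isinstance(stmt, ast.AugAssign):
            loads += stores  # "x += 1" reads x before writing it
        for name in loads:
            if name not in assigned and name not in inputs:
                inputs.append(name)
        assigned.update(stores)
    return inputs, sorted(assigned)


inputs, outputs = classify_variables("y = x + 1\nz = y * 2\n")
```

Here `x` is read before it is ever assigned, so it is the only input variable, while `y` and `z` are assigned within the body and are reported as outputs.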
- analysis logic 520 may contain fewer, different, or additional functional components than depicted in Fig. 12.
- one or more functional components of analysis logic 520 may perform one or more other tasks described as being performed by one or more other functional components of analysis logic 520.
- the data transfer rules may include a crossing spmd: input variables data transfer rule 1310, a crossing end: variants pointing to output variables data transfer rule 1320, a crossing end: complete/incomplete variant output variables data transfer rule 1330, a crossing end: more complete variant output variables data transfer rule 1340, a crossing end: disallow variants to variants data transfer rule 1350, a crossing spmd: variants as input variables data transfer rule 1360, and/or a crossing spmd: non-variants as input variables data transfer rule 1370.
- an outer parallel context (e.g., outside a spmd statement and an end statement pair)
- an inner parallel context (e.g., inside the spmd statement and an end statement pair)
- an input variable may have the same value after the end statement as it had before the spmd statement, may have the same class after the end statement as it had before the spmd statement, and/or may have the same attribute(s) (e.g., sparsity) after the end statement as it had before the spmd statement.
- a variable that does not include a value on all labs may be referred to as an incomplete variant.
- end statement (i.e., upon returning to a subsequent outer parallel context)
- a value associated with a complete output variable (e.g., complete output variable x)
- SPMD block may be discarded.
- variable x may include its original value (e.g., "0") on labs to which it is unassigned (e.g., labs less than or equal to "5") and may include a value (e.g., "1") on assigned labs (e.g., labs greater than "5").
- Variable y may include its original value (e.g., "0") on all labs since it is not assigned to any labs.
- data transfer rule 1360 e.g., crossing spmd: variants as input variables
- If a variant to be used as an input variable is not defined in a parallel resource set, an error may be generated.
- Fig. 14 illustrates an exemplary conversion application program interface (API) 1400 capable of being provided by client 500.
- conversion API 1400 may include a function invoked in inner parallel context when crossing end 1410, a function invoked in outer parallel context when crossing end 1420, a function invoked in outer parallel context when crossing spmd 1430, and a function invoked in inner parallel context when crossing spmd 1440.
- Function 1430 may receive a reference (e.g., x) from SPMD command 1470, may be invoked in an outer parallel context when crossing a spmd statement, and may return a function handle to a function that updates actual data in the inner parallel context and input data for that function.
- Update function and input data may be used by function 1440.
- Function 1440 may invoke the update function in the inner parallel context when crossing a spmd statement, and may return x as an input variable (or data), as indicated by reference number 1480.
- Function 1440 may receive the input data, may update the input data, and may return updated data (input variable x).
- Although Fig. 14 shows exemplary functions associated with conversion API 1400, in other implementations, conversion API 1400 may contain fewer, different, or additional functions than depicted in Fig. 14.
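The hand-off between functions 1430 and 1440 can be sketched with a closure. This is a minimal sketch, assuming the reference is a mapping from lab index to value; the names `crossing_spmd_outer` and `crossing_spmd_inner` and the per-lab lookup are hypothetical and are not the patent's API.

```python
def crossing_spmd_outer(reference):
    """Stand-in for function 1430: runs in the outer parallel context.

    Returns an update function plus the input data that function needs.
    """
    def update(input_data, labindex):
        # Runs later in the inner context and resolves the reference
        # to the value stored for this lab (None if there is none).
        return input_data.get(labindex)

    return update, dict(reference)


def crossing_spmd_inner(update, input_data, labindex):
    """Stand-in for function 1440: runs in the inner parallel context."""
    return update(input_data, labindex)


x_ref = {1: 10, 2: 20}                     # assumed shape of a reference to x
update_fn, data = crossing_spmd_outer(x_ref)
x_on_lab_2 = crossing_spmd_inner(update_fn, data, 2)
```

Returning the function together with its data keeps the outer context from having to know how the inner context materializes the value.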
- Error detection logic 1500 may include hardware, software, and/or a combination of hardware and software based logic that receives an error 1540 from a lab (e.g., one of labs 420), and provides error 1540 to interrupt SPMD block logic 1510.
- a lab e.g., one of labs 420
- Interrupt SPMD block logic 1510 may include hardware, software, and/or a combination of hardware and software based logic that receives error 1540 from error detection logic 1500, and interrupts execution of a SPMD block, as indicated by reference number 1550.
- interrupt SPMD block logic 1510 may provide interrupt 1550 to analysis logic 520, and analysis logic 520 may interrupt execution of a SPMD block on labs (e.g., one or more of labs 420) in an inner parallel context.
- interrupt SPMD block logic 1510 may provide interrupt 1550 to transfer output variables logic 1520.
- Transfer output variables logic 1520 may include hardware, software, and/or a combination of hardware and software based logic that receives interrupt 1550 from interrupt SPMD block logic 1510, and transfers output variables from the inner parallel context into the outer parallel context associated with the SPMD block, as indicated by reference number 1560.
- transfer output variables logic 1520 may use states associated with the output variables before error 1540 is generated and/or interrupt 1550 is generated.
- transfer output variables logic 1520 may provide transfer 1560 to generate exception logic 1530.
- Generate exception logic 1530 may include hardware, software, and/or a combination of hardware and software based logic that receives transfer 1560 from transfer output variables logic 1520, and generates an exception 1570 in the outer parallel context of the SPMD block.
- exception 1570 may include information, such as a labindex of a lab that generated error 1540, an error message, etc.
- Although Fig. 15 shows exemplary functional components of client 500, in other implementations, client 500 may contain fewer, different, or additional functional components than depicted in Fig. 15.
- one or more functional components of client 500 may perform one or more other tasks described as being performed by one or more other functional components of client 500.
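The error path described for Fig. 15 (detect, interrupt, transfer output variables, generate an exception) can be sketched as follows. This is a sequential simulation under stated assumptions: labs run one after another rather than in parallel, and `SpmdError` and `run_spmd_block` are hypothetical names.

```python
class SpmdError(Exception):
    """Exception raised in the outer context; carries the failing labindex."""

    def __init__(self, labindex, message):
        super().__init__(f"lab {labindex}: {message}")
        self.labindex = labindex
        self.message = message


def run_spmd_block(body, num_labs, outer_context):
    """Run body(labindex) on each simulated lab, one after another."""
    outputs = {}
    for labindex in range(1, num_labs + 1):
        try:
            outputs[labindex] = body(labindex)
        except Exception as err:                  # error detection
            # Interrupt: leave the loop so no further lab runs the body.
            # Transfer: publish the outputs as they stood before the error.
            outer_context["outputs"] = outputs
            # Exception: report which lab failed and why.
            raise SpmdError(labindex, str(err)) from err
    outer_context["outputs"] = outputs
    return outputs


def body(labindex):
    if labindex == 3:
        raise ValueError("bad input")
    return labindex * 2


outer_context = {}
caught = None
try:
    run_spmd_block(body, 4, outer_context)
except SpmdError as exc:
    caught = exc
```

Here lab 3 fails, so the outer context receives the outputs of labs 1 and 2 only, and the exception records lab index 3 together with the error message.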
- Fig. 16 illustrates an exemplary diagram 1600 of creating a pool 1610 that may include one or more labs (e.g., labs 420-1, ..., 420-4), and of interacting with pool 1610 via parallel processing construct 700 created by technical computing environment 320.
- exemplary diagram 1600 may depict an implementation of resource allocation logic 530, and may be combined with the implementations of analysis logic/resource allocation logic 520/530 depicted in Fig. 9 and/or Figs. 11A and 11B.
- An entire pool of labs may be used for parallel processing (e.g., for SPMD processing).
- an undecorated spmd statement e.g., a spmd statement without arguments
- client 500 may limit a number of labs for the SPMD block.
- client 500 e.g., via SPMD command 700
- Each of idle sub-pools 1620 and 1630 may include one or more of the following exemplary properties.
- Each of idle sub-pools 1620 and 1630 may be empty, and, if a lab is included in one or more idle sub-pools 1620 and 1630, the lab may include an empty idle sub-pool.
- Each lab in pool 1610 may belong to a single idle sub-pool.
- lab 420-1 may belong to idle sub-pool 1620, but may not belong to idle sub-pool 1630.
- Idle sub-pools 1620 and 1630 need not encompass the entire pool 1610 of labs since client 500 may create additional labs from pool 1610.
- an idle sub-pool associated with client 500 may include all the labs (e.g., labs 420-1, ..., 420-4) in pool 1610.
- the pool of labs may contain fewer, different, or additional labs than depicted in Fig. 16.
- Figs. 17A and 17B depict an exemplary diagram 1700 of providing a desired number of labs to an idle sub-pool of labs, and of restoring the idle sub-pool of labs.
- technical computing environment 320 may create a SPMD command 1710 that includes an inner parallel resource set and uses labs from an idle sub-pool 1720.
- idle sub-pool 1720 may include "210" labs, including the ten labs (e.g., labs 420-1, ..., 420-10) depicted in Fig. 17A.
- Each of the ten labs may include a size of twenty sub-labs
- idle sub-pool 1720 may include a size of "210" labs.
- SPMD command 1710 may seek to create an inner parallel resource set with ten desired labs 1730, and may create the inner parallel resource set as follows.
- SPMD command 1710 may subtract the number of desired labs 1730 from the number of labs (e.g., "210") contained in idle sub-pool 1720, and may divide the result (e.g., "200") by the number of desired labs 1730 to determine a particular number (e.g., "20").
- the particular number may be used to divide the inner parallel resource set, associated with SPMD command 1710, among labs 420-1, ..., 420-10 of idle sub-pool 1720. For example, as shown in Fig.
- each of labs 420-1, ..., 420-10 may include a portion of the inner parallel resource set, associated with SPMD command 1710, that is less than or equal to "20."
- a remaining portion of idle sub-pool 1720 may be allocated to the number of desired labs 1730 for future use.
- SPMD command 1710 may restore idle sub-pool 1720 to its original size, as indicated by reference number 1740.
- idle sub-pool 1720 may include the same number of labs as it had before the inner parallel resource set was created.
- idle sub-pool 1720 may include "210" labs, and the "210" labs may include empty idle sub-pools, including the labs (e.g., labs 420-1, ..., 420-10) that executed SPMD command 1710.
- Although Figs. 17A and 17B show exemplary creation of and interaction with an idle sub-pool of labs, in other implementations, the idle sub-pool of labs may contain fewer, different, or additional labs than depicted in Figs. 17A and 17B.
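The arithmetic above, (210 - 10) / 10 = 20, and the later restoration of the idle sub-pool can be sketched in Python. The function names are hypothetical, and rounding the share up when the division is not exact is an assumption that the text does not address.

```python
def allocate_sub_pools(idle_pool, desired):
    """Pick `desired` labs and split the rest of the idle pool among them."""
    if not 0 < desired <= len(idle_pool):
        raise ValueError("desired must be between 1 and the pool size")
    workers = idle_pool[:desired]
    remaining = idle_pool[desired:]
    # Ceiling division, so every remaining lab lands in some share (assumption).
    share = -(-len(remaining) // desired)
    return {
        worker: remaining[i * share:(i + 1) * share]
        for i, worker in enumerate(workers)
    }


def restore_pool(sub_pools):
    """Return every lab to the idle pool after the SPMD block completes."""
    labs = list(sub_pools)
    for reserved in sub_pools.values():
        labs.extend(reserved)
    return sorted(labs)


pool = list(range(1, 211))                 # 210 idle labs
sub_pools = allocate_sub_pools(pool, 10)   # ten desired labs
restored = restore_pool(sub_pools)
```

With 210 idle labs and ten desired labs, each selected lab reserves exactly 20 labs for future use, and `restore_pool` returns the pool to its original 210 labs.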
- Fig. 18 depicts an exemplary operation of resource allocation logic 530 during nesting of parallel processing constructs (e.g., SPMD and PARFOR).
- a main program 1800 may be generated by client 500 (e.g., from program provider 510) and may be provided to resource allocation logic 530.
- resource allocation logic 530 may use three helpers (or another number of helpers) to handle portions of main program 1800, and each helper may use three more helpers (or another number of helpers) to execute other portions (or portions within portions) of main program 1800.
- Resource allocation logic 530 may allocate main program 1800 as depicted in Fig. 18, where each helper may be associated with a resource (e.g., labs 420, not shown).
- resource allocation logic 530 may use two SPMD helpers 1810-1 and 1810-2 and a PARFOR helper 1820-1 to handle portions of main program 1800.
- Resource allocation logic 530 may cause SPMD helpers 1810-1 and 1810-2 and PARFOR helper 1820-1 to each use three helpers to execute other portions (or portions within portions) of main program 1800.
- SPMD helper 1810-1 may use SPMD helpers 1810-2, 1810-3, and 1810-4
- SPMD helper 1810-2 may use PARFOR helpers 1820-2, 1820-3, and 1820-4
- PARFOR helper 1820-1 may use SPMD helpers 1810-6, 1810-7, and 1810-8, respectively.
- resource allocation logic 530 may implement a variety of allocation strategies, such as a user-controlled allocation strategy, a top-down allocation strategy, a dynamic allocation strategy, a global allocation strategy, and/or an adaptive allocation strategy.
- technical computing environment 320 may include an idle pool of "210" labs, and may be asked to execute an SPMD block using "10" labs.
- TCE 320 may subtract "10" labs from its idle pool (e.g., labs 420-1, ..., 420-10, as shown in Fig. 17A), may divide the remaining "200" labs (e.g., labs 420-11, ..., 420-210) into "10" idle sub-pools, and may allocate them to the "10" labs (e.g., labs 420-1, ..., 420-10).
- Although Fig. 18 shows exemplary operations of resource allocation logic 530, in other implementations, resource allocation logic 530 may include fewer, different, or additional operations than depicted in Fig. 18.
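The nesting described for Fig. 18, with three first-level helpers that each use three more helpers, can be modeled as a small tree. This is only a sketch: the `Helper` class and `count_helpers` function are hypothetical, and the tree records only the kind of each helper, not its reference number.

```python
from dataclasses import dataclass, field


@dataclass
class Helper:
    kind: str                                   # "SPMD" or "PARFOR"
    children: list = field(default_factory=list)


def count_helpers(helper):
    """Count a helper plus every helper nested beneath it."""
    return 1 + sum(count_helpers(child) for child in helper.children)


# Two SPMD helpers and one PARFOR helper, each using three nested helpers.
first_level = [
    Helper("SPMD", [Helper("SPMD") for _ in range(3)]),
    Helper("SPMD", [Helper("PARFOR") for _ in range(3)]),
    Helper("PARFOR", [Helper("SPMD") for _ in range(3)]),
]
total = sum(count_helpers(helper) for helper in first_level)
```

Under this model the main program occupies twelve helpers in total, which is the number of resources an allocation strategy would have to find.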
- Fig. 19 depicts an exemplary diagram 1900 of controlling lifetimes of variables with a parallel processing construct.
- a parallel processing construct e.g., SPMD command 1910
- SPMD command 1910 may be executed by a lab (e.g., lab 420-1).
- Variable x may exist on lab 420-1 when a first spmd statement of SPMD command 1910 is encountered, as indicated by reference number 1920, because variable x is referenced in later portions of SPMD command 1910.
- Variable x may continue to exist on lab 420-1 after a first end statement of SPMD command 1910, as indicated by reference number 1930, because variable x is referenced in later portions of SPMD command 1910.
- Variable y may exist on lab 420-1 when a second spmd statement of SPMD command 1910 is encountered, as indicated by reference number 1940, because variable y is referenced in later portions of SPMD command 1910.
- Variables x and y may cease to exist on lab 420-1 (and lab 420-1 may be available) after a second end statement of SPMD command 1910, as indicated by reference number 1950, because variables x and y are no longer referenced in later portions of SPMD command 1910.
- Although Fig. 19 shows exemplary variable lifetime control with a parallel processing construct, in other implementations, a lifetime of a variable may depend upon an amount of resources (e.g., labs) available for use.
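The lifetime rule described for Fig. 19 amounts to keeping a variable on a lab only while a later SPMD block still references it. A minimal sketch follows, assuming each block is summarized as the set of variable names it references; the function name `live_after_each_block` is hypothetical.

```python
def live_after_each_block(blocks):
    """For each SPMD block, return the variables kept after its end statement.

    blocks: list of sets, one per SPMD block in program order, each holding
    the variable names that block references.
    """
    live_sets = []
    for i in range(len(blocks)):
        seen = set().union(*blocks[: i + 1])    # variables that exist so far
        later = set().union(*blocks[i + 1:])    # variables still referenced
        live_sets.append(seen & later)
    return live_sets


# First block references x; second block references x and y.
lifetimes = live_after_each_block([{"x"}, {"x", "y"}])
```

For this two-block program, x is kept after the first end statement, and nothing is kept after the second, at which point the lab can be made available again.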
- Fig. 20 illustrates an exemplary diagram 2000 of execution of a parallel processing construct.
- a parallel processing construct e.g., SPMD command 2010
- SPMD command 2010 may include a SPMD body 2020 that may not be executed by a lab (e.g., lab 420-1) until an end statement is entered, as indicated by reference number 2030. After the end statement is entered, lab 420-1 may execute SPMD body 2020 and may return a result (e.g., "Hello World") 2040.
- Fig. 21 depicts an alternative exemplary diagram 2100 of execution of a parallel processing construct.
- a parallel processing construct e.g., SPMD command 2110
- SPMD command 2110 may be generated by technical computing environment 320, and may include the following syntax: spmd disp('Hello '); disp('World'); end
- SPMD command 2110 may include a SPMD body with a first portion (e.g., disp('Hello ')) that may be executed by a lab (e.g., lab 420-1) before an end statement is entered, as indicated by reference number 2120.
- Lab 420-1 may execute the first portion and may return a first result (e.g., "Hello") 2130.
- the SPMD body of SPMD command 2110 may also include a second portion (e.g., disp('World')) that may be executed by a lab (e.g., lab 420-1) before the end statement is entered, as indicated by reference number 2140.
- Lab 420-1 may execute the second portion and may return a second result (e.g., "World") 2150.
- Although Figs. 20 and 21 depict exemplary execution timing arrangements associated with a parallel processing construct, in other implementations, other execution timing arrangements may be utilized for the parallel processing construct.
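The two timing arrangements of Figs. 20 and 21 differ in when a statement of the SPMD body runs: after the end statement is entered, or as soon as the statement itself is entered. A minimal sketch, assuming statements are modeled as Python callables; the class names are hypothetical.

```python
class DeferredSpmd:
    """Fig. 20 style: nothing runs until the end statement is entered."""

    def __init__(self):
        self.body = []
        self.results = []

    def enter(self, statement):
        self.body.append(statement)          # only recorded, not executed

    def end(self):
        self.results = [statement() for statement in self.body]
        return self.results


class EagerSpmd:
    """Fig. 21 style: each statement runs as soon as it is entered."""

    def __init__(self):
        self.results = []

    def enter(self, statement):
        self.results.append(statement())     # executed immediately

    def end(self):
        return self.results


deferred = DeferredSpmd()
deferred.enter(lambda: "Hello World")
before_end = list(deferred.results)          # still empty at this point
after_end = deferred.end()

eager = EagerSpmd()
eager.enter(lambda: "Hello")
first_result = list(eager.results)           # already holds "Hello"
eager.enter(lambda: "World")
```

The deferred form lets the whole body be analyzed before anything is sent to a lab, while the eager form gives feedback after every statement.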
- process 2200 may be performed by client 500. In other implementations, process 2200 may be performed by another device or combination of devices (e.g., client 500 in conjunction with web service 580).
- process 2200 may begin with receipt or creation of a program (or main program) (block 2210).
- program provider 510 may include hardware, software, and/or a combination of hardware and software based logic that provides one or more programs (e.g., main program 545) for execution.
- program provider 510 may generate or receive programs created using a technical computing environment.
- the program may be analyzed (block 2220), and inner and outer contexts of the program may be determined based on the analysis of the program (block 2230).
- analysis logic 520 of client 500 may include hardware, software, and/or a combination of hardware and software based logic that analyzes main program 545.
- SPMD command 810 may be provided to analysis logic 520.
- Analysis logic 520 may receive SPMD command 810, and may analyze SPMD command 810 to determine outer parallel context 820 and inner parallel context 830.
- analysis logic 520 may perform a language analysis of a program (e.g., SPMD command 810), and may determine a separation between outer parallel context 820 and inner parallel context 830.
- Analysis logic 520 may identify SPMD blocks (e.g., SPMD command) and/or input and output variables associated with the SPMD blocks.
- SPMD command may include a crossing spmd: input variables data transfer rule 1310, a crossing end: variants pointing to output variables data transfer rule 1320, a crossing end: complete/incomplete variant output variables data transfer rule 1330, a crossing end: more complete variant output variables data transfer rule 1340, a crossing end: disallow variants to variants data transfer rule 1350, a crossing spmd: variants as input variables data transfer rule 1360, and/or a crossing spmd: non-variants as input variables data transfer rule 1370.
- lexical information may be shared across the inner and outer contexts of the program (block 2260), and the one or more program portions may be allocated to one or more labs for parallel execution (block 2270).
- analysis logic 520 may allocate one or more portions of the inner parallel context (e.g., SPMD body 830) of SPMD command 810 and input variables 850 to labs 420 for parallel execution.
- Input variable determiner 1200 and output variable determiner 1210 may permit detection of lexical information (e.g., input and output variables), and sharing of lexical information across the inner and outer parallel contexts of a SPMD command.
- input variable determiner 1200 may detect input variables, such as variables that are used in a SPMD body before they are assigned.
- output variable determiner 1210 may detect output variables, such as variables assigned within the SPMD body.
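The two detection rules above (inputs are used before they are assigned; outputs are assigned within the body) can be sketched over a simplified statement list. This assumes each statement is reduced to an assigned name and the names it reads, which is a hypothetical representation and not the patent's language analysis.

```python
def classify_variables(statements):
    """Return (inputs, outputs) for an SPMD body.

    statements: list of (target, used) pairs in program order, where target
    is the assigned variable name (or None) and used lists the names read.
    """
    assigned = set()
    inputs, outputs = [], []
    for target, used in statements:
        for name in used:
            # Read before any assignment in the body: must come from outside.
            if name not in assigned and name not in inputs:
                inputs.append(name)
        if target is not None:
            assigned.add(target)
            if target not in outputs:
                outputs.append(target)
    return inputs, outputs


# Body modeled here:  y = f(x);  z = g(y, x);  x = h(z)
spmd_body = [("y", ["x"]), ("z", ["y", "x"]), ("x", ["z"])]
inputs, outputs = classify_variables(spmd_body)
```

In this body only x is an input, because y and z are assigned before they are read, while x, y, and z are all outputs.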
- the determined input variables, associated with the identified SPMD blocks (e.g., block 2220), may be used as input to resource allocation logic 530.
- Resource allocation logic 530 may utilize the input variables to perform the functions described above in connection with Figs. 9, 11A, and 11B. As a result of these functions, resource allocation logic 530 may determine where to execute a body of a SPMD block (i.e., what parallel resource set (e.g., sub-pool) to use).
- resource allocation logic 530 may transfer the input variables (e.g., block 2250) into labs in the parallel resource set, and may transfer the SPMD body (e.g., block 2260) into the labs in the parallel resource set.
- the SPMD body may be executed on the labs in the parallel resource set.
- one or more results associated with parallel execution of the one or more program portions may be received from the one or more labs (block 2280), and the one or more results may be provided to the program (block 2290).
- results provider 540 of client 500 may receive results 570 from the labs, and may provide results 570 to program provider 510.
- results provider 540 may combine results 570 into a single result, and may provide the single result to program provider 510.
- Process block 2220 may include the process blocks illustrated in Fig. 23. As shown in Fig.
- process block 2220 may include determining one or more input variables associated with the inner and outer contexts of the program (block 2300), and determining one or more output variables associated with the inner and outer contexts of the program (block 2310).
- analysis logic 520 may include input variable determiner 1200 and output variable determiner 1210.
- Input variable determiner 1200 may detect input variables, such as variables that are used in a SPMD body. In one example, upon entering a spmd statement, input variable determiner 1200 may determine input variables to the SPMD block.
- Output variable determiner 1210 may detect output variables, such as variables assigned within the SPMD body. In one example, upon reaching a spmd statement, output variable determiner 1210 may determine output variables from the SPMD block.
- process block 2220 may include transferring the one or more input variables from the outer context to the inner context of the program (block 2320), and transferring the one or more output variables from the inner context to the outer context of the program (block 2330).
- The data transfer rules depicted in Fig. 13 may be implemented by analysis logic 520 of client 500, and may include a crossing spmd: input variables data transfer rule 1310, a crossing end: variants pointing to output variables data transfer rule 1320, a crossing end: complete/incomplete variant output variables data transfer rule 1330, a crossing end: more complete variant output variables data transfer rule 1340, a crossing end: disallow variants to variants data transfer rule 1350, a crossing spmd: variants as input variables data transfer rule 1360, and/or a crossing spmd: non-variants as input variables data transfer rule 1370.
- an outer parallel context e.g., outside a spmd statement and an end statement pair
- an inner parallel context e.g., inside the spmd statement and an end statement pair
- process block 2220 may include the process blocks illustrated in Fig. 24. As shown in Fig. 24, process block 2220 may include returning a variant constructor function with a function invoked in the inner context of the program (block 2400), and invoking the variant constructor function in the outer context of the program to generate a variant (block 2410).
- conversion API 1400 may be provided by client 500, and may include function 1410 invoked in inner parallel context when crossing end, and function 1420 invoked in outer parallel context when crossing end.
- Function 1410 may receive data from SPMD command 1450, may be invoked in an inner parallel context when crossing an end statement, and may return a function handle to a variant constructor function and input data that may be used by function 1420.
- Function 1420 may invoke the variant constructor function in the outer parallel context when crossing an end statement, and may return x as a variant (or a reference), as indicated by reference number 1460.
- process block 2220 may include returning an update function and input data with a function invoked in the outer context of the program (block 2420), and invoking the update function in the inner context of the program to receive the input data, update the input data, and return updated data (block 2430).
- conversion API 1400 may be provided by client 500, and may include function 1430 invoked in outer parallel context when crossing spmd, and function 1440 invoked in inner parallel context when crossing spmd.
- Process block 2250 may include the process blocks illustrated in Figs. 25A and 25B. As shown in Fig. 25A, process block 2250 may include preserving a value, class, and attribute of an input variable associated with the outer context of the program (block 2500), providing an output variable from the inner context to the outer context of the program as a variant (block 2510), and/or discarding (or replacing) one or more complete output variables after crossing the boundary (block 2520). For example, in implementations described above in connection with Fig. 13, according to data transfer rule 1310 (e.g., crossing spmd: input variables), when entering a SPMD block, variables created or available in an outer parallel context may be automatically transferred to remote labs (e.g., labs 420) executing an inner parallel context.
- data transfer rule 1310 e.g., crossing spmd: input variables
- An input variable may include a same value after the end statement as it was before the spmd statement, may include a same class after the end statement as it was before the spmd statement, and/or may include a same attribute(s) (e.g., sparsity) after the end statement as it was before the spmd statement.
- output variables e.g., output variable x
- output variables may be sent as references from the inner parallel context to the outer parallel context.
- output variables e.g., output variable x
- output variables may be of class variant.
- an incomplete output variable (e.g., incomplete output variable y) may be brought into the outer parallel context as an incomplete variant.
- data transfer rule 1340 e.g., crossing end: more complete variant output variables
- a variable includes a value before entering a SPMD block
- the value of the variable after the SPMD block may be a variant class.
- the variable may include its original value on a lab to which it is unassigned.
- a variant may associate a pre-existing value of a variable in an outer parallel context (if any) to a lab where an output variable was not assigned.
- variables x and y include a value of "0" before the spmd statement (e.g., before entering the SPMD block)
- variable x may include its original value (e.g., "0") on a lab to which it is unassigned (e.g., labs less than or equal to "5") and may include a value (e.g., "1") on assigned labs (e.g., labs greater than "5").
- Variable y may include its original value (e.g., "0") on all labs since it is not assigned to any labs.
- process block 2250 may include generating an error when the inner context of the program includes a variant pointing to another variant (block 2560), preventing execution of the inner context of the program that includes a variant pointing to another variant (block 2570), and/or assigning a value of an input reference variable in the outer context of the program to an input variable in the inner context, otherwise the input variable is undefined (block 2580).
- block 2560 generating an error when the inner context of the program includes a variant pointing to another variant
- block 2570 preventing execution of the inner context of the program that includes a variant pointing to another variant
- data transfer rule 1350 e.g., crossing end: disallow variants to variants
- a user may be prevented from generating code that includes variants pointing to variants by generating an error on first use of variants pointing to variants as input variables to SPMD blocks, and by not permitting the user to obtain a value of a variant that points to a variant.
- data transfer rule 1360 e.g., crossing spmd: variants as input variables
- If a variant to be used as an input variable is not defined in the parallel resource set, an error may be generated. Otherwise, for each lab in the inner parallel context, if an input reference variable in the outer parallel context includes a reference to a value on the lab, an input variable in the inner parallel context may store the value.
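Data transfer rule 1360 can be sketched as a per-lab lookup with an error check. This is an illustration under stated assumptions: a variant is modeled as a dictionary that names its parallel resource set and maps lab indices to values, "not defined in a parallel resource set" is read as a mismatch between that name and the set executing the block, and all names are hypothetical.

```python
def cross_spmd_variant_input(variant, resource_set_id, labs):
    """Resolve a variant into per-lab input values when crossing spmd."""
    if variant["resource_set"] != resource_set_id:
        raise ValueError("variant is not defined in this parallel resource set")
    # Labs with no referenced value are left out: the input is undefined there.
    return {
        lab: variant["values"][lab]
        for lab in labs
        if lab in variant["values"]
    }


x_ref = {"resource_set": "pool-A", "values": {1: "a", 3: "c"}}
inner_x = cross_spmd_variant_input(x_ref, "pool-A", [1, 2, 3])

try:
    cross_spmd_variant_input(x_ref, "pool-B", [1])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

In this example lab 2 receives no value for x, so x stays undefined on that lab, and using the variant with a different resource set raises an error.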
- Process block 2270 may include the process blocks illustrated in Fig. 26. As shown in Fig. 26, process block 2270 may include receiving an error from one lab (block 2600), and interrupting execution of the inner context of the program (block 2610).
- client 500 may include error detection logic 1500, interrupt SPMD block logic 1510, transfer output variables logic 1520, and/or generate exception logic 1530.
- Error detection logic 1500 may receive an error 1540 from a lab (e.g., one of labs 420), and may provide error 1540 to interrupt SPMD block logic 1510.
- Interrupt SPMD block logic 1510 may receive error 1540 from error detection logic 1500, and may interrupt execution of a SPMD block, as indicated by reference number 1550.
- interrupt SPMD block logic 1510 may provide interrupt 1550 to analysis logic 520, and analysis logic 520 may interrupt execution of a SPMD block on labs (e.g., one or more of labs 420) in an inner parallel context.
- process block 2270 may transfer output variables from the inner context of the program to the outer context of the program (block 2620), and may generate an exception associated with the error (block 2630).
- transfer output variables logic 1520 may receive interrupt 1550 from interrupt SPMD block logic 1510, and may transfer output variables from the inner parallel context into the outer parallel context associated with the SPMD block, as indicated by reference number 1560.
- transfer output variables logic 1520 may use states associated with the output variables before error 1540 is generated and/or interrupt 1550 is generated.
- Generate exception logic 1530 may receive output variables 1560 from transfer output variables logic 1520, and may generate an exception 1570 in the outer parallel context of the SPMD block.
- exception 1570 may include information, such as a labindex of a lab that generates error 1540, an error message, etc.
- process block 2270 may include the process blocks illustrated in Fig. 27. As shown in Fig. 27, process block 2270 may include defining an idle sub-pool of one or more labs distinct from the other labs (block 2700), determining a desired number of lab(s) for parallel execution (block 2710), and allocating the one or more portions of the program to the desired number of lab(s) from a portion of the idle sub-pool (block 2720). For example, in implementations described above in connection with Figs.
- client 500 may establish one or more idle sub-pools 1620 and 1630 of labs (e.g., labs 420) from pool 1610 for a particular SPMD block (e.g., SPMD command 700).
- Each of idle sub-pools 1620 and 1630 may include a set of labs from pool 1610 that a parallel process (e.g., SPMD command 700) may have at its disposal for performing computations.
- Idle sub-pool 1720 may include "210" labs, including ten labs (e.g., labs 420-1, ..., 420-10) depicted in Fig. 17A.
- Each of the ten labs may include a size of twenty sub-labs, and idle sub-pool 1720 may include a size of "210" labs.
- SPMD command 1710 may seek to create an inner parallel resource set with ten desired labs 1730, and may create the inner parallel resource set as follows. SPMD command 1710 may subtract the number of desired labs 1730 from the number of labs (e.g., "210") contained in idle sub-pool 1720, and may divide the result (e.g., "200") by the number of desired labs 1730 to determine a particular number (e.g., "20"). The particular number may be used to divide the inner parallel resource set, associated with SPMD command 1710, among labs 420-1, ..., 420-10 of idle sub-pool 1720. In one example, each of labs 420-1, ..., 420-10 may include a portion of the inner parallel resource set, associated with SPMD command 1710, that is less than or equal to "20."
- process block 2270 may include allocating a remaining portion of the idle sub-pool to the desired number of lab(s) for future use (block 2730), and restoring the idle sub-pool after execution of the allocated one or more portions of the program (block 2740).
- a remaining portion of idle sub-pool 1720 may be allocated to the number of desired labs 1730 for future use by SPMD command 1710.
- SPMD command 1710 may restore idle sub-pool 1720 to its original size, as indicated by reference number 1740.
- idle sub-pool 1720 may include the same number of labs as it had before the inner parallel resource set was created.
- idle sub-pool 1720 may include "210" labs, and the "210" labs may include empty idle sub-pools, including the labs (e.g., labs 420-1, ..., 420-10) that executed SPMD command 1710.
- process block 2270 may include the process blocks illustrated in Fig. 28. As shown in Fig. 28, process block 2270 may include maintaining one or more variables, referenced by the program, on the one or more labs (block 2800), removing the one or more variables, not referenced by the program, from the one or more labs (block 2810), and making the one or more labs available for use (block 2820).
- SPMD command 1910 may be executed by a lab (e.g., lab 420-1).
- Variable x may exist on lab 420-1 when a first spmd statement of SPMD command 1910 is encountered, as indicated by reference number 1920, because variable x is referenced in later portions of SPMD command 1910.
- Variable x may continue to exist on lab 420-1 after a first end statement of SPMD command 1910, as indicated by reference number 1930, because variable x is referenced in later portions of SPMD command 1910.
- Variable y may exist on lab 420-1 when a second spmd statement of SPMD command 1910 is encountered, as indicated by reference number 1940, because variable y is referenced in later portions of SPMD command 1910.
- Variables x and y may cease to exist on lab 420-1 (and lab 420-1 may be available) after a second end statement of SPMD command 1910, as indicated by reference number 1950, because variables x and y are no longer referenced in later portions of SPMD command 1910.
- logic may include hardware, such as an application specific integrated circuit or a field programmable gate array, software, or a combination of hardware and software.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
- Stored Programmes (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US5429208P | 2008-05-19 | 2008-05-19 | |
US5429508P | 2008-05-19 | 2008-05-19 | |
US12/254,578 US8255889B2 (en) | 2007-02-14 | 2008-10-20 | Method of using parallel processing constructs and dynamically allocating program portions |
US12/254,572 US8239844B2 (en) | 2007-02-14 | 2008-10-20 | Method of using parallel processing constructs and dynamically allocating program portions |
US12/254,584 US8239845B2 (en) | 2007-02-14 | 2008-10-20 | Media for using parallel processing constructs |
PCT/US2009/044377 WO2009143068A2 (en) | 2008-05-19 | 2009-05-18 | Method of using parallel processing constructs |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2297639A2 (de) | 2011-03-23 |
Family
ID=41112700
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP09751296A Ceased EP2297639A2 (de) | 2008-05-19 | 2009-05-18 | Method of using parallel processing constructs |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP2297639A2 (de) |
WO (1) | WO2009143068A2 (de) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2489526A (en) | 2011-04-01 | 2012-10-03 | Schlumberger Holdings | Representing and calculating with sparse matrixes in simulating incompressible fluid flows. |
CN107203406B (zh) * | 2017-06-26 | 2020-11-06 | 西安微电子技术研究所 | A processing method for distributed storage structures |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0420142B1 (de) * | 1989-09-29 | 2000-03-08 | Myrias Research Corporation | Parallel processing system |
US8255890B2 (en) * | 2007-02-14 | 2012-08-28 | The Mathworks, Inc. | Media for performing parallel processing of distributed arrays |
US8239846B2 (en) * | 2007-02-14 | 2012-08-07 | The Mathworks, Inc. | Device for performing parallel processing of distributed arrays |
US8239844B2 (en) * | 2007-02-14 | 2012-08-07 | The Mathworks, Inc. | Method of using parallel processing constructs and dynamically allocating program portions |
US8250550B2 (en) * | 2007-02-14 | 2012-08-21 | The Mathworks, Inc. | Parallel processing of distributed arrays and optimum data distribution |
US8255889B2 (en) * | 2007-02-14 | 2012-08-28 | The Mathworks, Inc. | Method of using parallel processing constructs and dynamically allocating program portions |
2009
- 2009-05-18 EP EP09751296A patent/EP2297639A2/de not_active Ceased
- 2009-05-18 WO PCT/US2009/044377 patent/WO2009143068A2/en active Application Filing
Non-Patent Citations (1)
Title |
---|
OPENMP ARCHITECTURE REVIEW BOARD: "OpenMP Application Program Interface - Version 2.5", May 2005 (2005-05-01), XP055063668, Retrieved from the Internet <URL:http://www.openmp.org/mp-documents/spec25.pdf> [retrieved on 20130522] * |
Also Published As
Publication number | Publication date |
---|---|
WO2009143068A2 (en) | 2009-11-26 |
WO2009143068A3 (en) | 2010-01-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8255889B2 (en) | Method of using parallel processing constructs and dynamically allocating program portions | |
US8707280B2 (en) | Using parallel processing constructs and dynamically allocating program portions | |
EP2147374B1 (de) | Parallelprogrammierungsschnittstelle | |
US8239845B2 (en) | Media for using parallel processing constructs | |
US8255890B2 (en) | Media for performing parallel processing of distributed arrays | |
US8250550B2 (en) | Parallel processing of distributed arrays and optimum data distribution | |
US8239846B2 (en) | Device for performing parallel processing of distributed arrays | |
US8108845B2 (en) | Parallel programming computing system to dynamically allocate program portions | |
US8949807B2 (en) | Saving and loading graphical processing unit (GPU) arrays providing high computational capabilities in a computing environment | |
US8108717B2 (en) | Parallel programming error constructs | |
Mashayekhi et al. | Execution templates: Caching control plane decisions for strong scaling of data analytics | |
Tsuji et al. | Multiple-spmd programming environment based on pgas and workflow toward post-petascale computing | |
WO2009143073A1 (en) | Parallel processing of distributed arrays | |
EP2297639A2 (de) | Method of using parallel processing constructs | |
Morris et al. | Mpignite: An mpi-like language and prototype implementation for apache spark | |
US8819643B1 (en) | Parallel program profiler | |
CN111656323B (zh) | Dynamic allocation of heterogeneous computing resources determined at application runtime | |
Nair | An Analytical study of Performance towards Task-level Parallelism on Many-core systems using Java API | |
Caromel et al. | Proactive parallel suite: From active objects-skeletons-components to environment and deployment | |
Kolesnikov et al. | Indigo: An infrastructure for optimization of distributed algorithms | |
Wang et al. | Automating three-dimensional reconstruction of icosahedral virus structure with Condensed Graphs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | 17P | Request for examination filed | Effective date: 20101220 |
| | AK | Designated contracting states | Kind code of ref document: A2 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK TR |
| | AX | Request for extension of the european patent | Extension state: AL BA RS |
| | 17Q | First examination report despatched | Effective date: 20110427 |
| | DAX | Request for extension of the european patent (deleted) | |
| | APBK | Appeal reference recorded | Free format text: ORIGINAL CODE: EPIDOSNREFNE |
| | APBN | Date of receipt of notice of appeal recorded | Free format text: ORIGINAL CODE: EPIDOSNNOA2E |
| | APBR | Date of receipt of statement of grounds of appeal recorded | Free format text: ORIGINAL CODE: EPIDOSNNOA3E |
| | APAF | Appeal reference modified | Free format text: ORIGINAL CODE: EPIDOSCREFNE |
| | RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: THE MATHWORKS, INC. |
| | APBX | Invitation to file observations in appeal sent | Free format text: ORIGINAL CODE: EPIDOSNOBA2E |
| | REG | Reference to a national code | Ref country code: DE Ref legal event code: R003 |
| | APBT | Appeal procedure closed | Free format text: ORIGINAL CODE: EPIDOSNNOA9E |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
| | 18R | Application refused | Effective date: 20180112 |