EP2165260A1 - Parallel processing of distributed arrays - Google Patents
Parallel processing of distributed arrays
- Publication number
- EP2165260A1 (application EP09751301A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- parallel
- distributed
- program
- labs
- distributed array
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/4401—Bootstrapping
- G06F9/4411—Configuring for operating with peripheral devices; Loading of device drivers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/451—Code distribution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/453—Data distribution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
Definitions
- Closely-coupled processors or hardware resources will likely become widely available within the near future.
- Examples of such closely-coupled processors (or hardware resources) may include additional processors, threads in a particular processor, additional cores in a central processing unit, additional processors mounted on the same substrate or board, and/or such devices provided within computers connected by a network fabric into a cluster, a grid, or a collection of resources.
- Parallel computing arrangements may include a controller that determines how an application should be divided and what application portions go to which parallel processors.
- a host computer that is running a simulation may act as the controller for a number of parallel processors.
- Parallel processors may receive instructions and/or data from the controller and may return a result to the controller.
- An array is a data structure consisting of a group of elements that are accessed by indexing.
- An array may include any number of dimensions containing numeric, character, logical values, cells, or structures.
- An array may be partitioned into segments to create a distributed array.
- Current architectures do not utilize a single language presentation for parallel processing of distributed arrays. Rather, current architectures may process distributed arrays exclusively in parallel or exclusively in a sequential manner.
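- As an illustration of the statement above that an array may be partitioned into segments to create a distributed array, the following is a minimal sketch only. The function names `partition` and `gather`, and the choice of contiguous, near-equal segments, are assumptions made for this example and are not taken from the described implementations.

```python
from typing import List, Sequence


def partition(array: Sequence[int], num_labs: int) -> List[List[int]]:
    """Split `array` into `num_labs` contiguous segments of near-equal size."""
    base, extra = divmod(len(array), num_labs)
    segments = []
    start = 0
    for lab in range(num_labs):
        # The first `extra` labs each take one additional element.
        size = base + (1 if lab < extra else 0)
        segments.append(list(array[start:start + size]))
        start += size
    return segments


def gather(segments: List[List[int]]) -> List[int]:
    """Reassemble the original array from its segments."""
    return [x for segment in segments for x in segment]


data = list(range(10))
segments = partition(data, 4)
print(segments)          # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
print(gather(segments))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Each inner list stands in for the segment that one parallel resource would hold; other partitioning rules (for example, cyclic assignment) would fit the same description.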
- Fig. 1 depicts an exemplary diagram of an architectural overview in which implementations described herein may be practiced;
- Fig. 2 illustrates an exemplary diagram of a hardware environment depicted in Fig. 1;
- Fig. 3 depicts an exemplary diagram of a batch (or distributed computing) environment illustrated in Fig. 1;
- Fig. 4 illustrates an exemplary diagram of a parallel processing environment depicted in Fig. 1;
- Fig. 5A depicts an exemplary diagram of functional components of a parallel processing interface illustrated in Fig. 1;
- Fig. 5B illustrates an exemplary diagram of functional components of the parallel processing interface in an alternative arrangement;
- Fig. 5C depicts an exemplary diagram of functional components of the parallel processing interface in another alternative arrangement;
- Fig. 6 illustrates exemplary hardware components of a client and/or a web service depicted in Figs. 5A and 5B;
- Fig. 7 depicts an exemplary parallel processing construct capable of being analyzed and transformed to parallel program portions by the analysis logic illustrated in Figs. 5A and 5B;
- Fig. 8 illustrates an exemplary diagram of a parallel processing construct capable of being generated by a technical computing environment depicted in Fig. 7;
- Fig. 9 depicts an exemplary diagram of functional components of the client illustrated in Figs. 5A-5C, where the client may determine an efficient distribution scheme;
- Fig. 10 illustrates an exemplary diagram of distribution scheme commands and/or distributed array commands capable of being generated by the technical computing environment depicted in Fig. 7;
- Fig. 11 depicts an exemplary diagram of distribution scheme commands capable of being generated by the technical computing environment illustrated in Fig. 7 and may include remote objects;
- Fig. 12 illustrates an exemplary distributor placement application program interface capable of being provided by the client depicted in Figs. 5A-5C;
- Fig. 13 depicts an exemplary diagram of distribution scheme commands capable of being provided by the client illustrated in Figs. 5A-5C, where the distribution scheme commands may provide conversions between distributor objects and parallel distributor objects;
- Fig. 14 illustrates an exemplary diagram of distribution scheme commands capable of being provided by the client depicted in Figs. 5A-5C, where the distribution scheme commands may convert a distributor object into a specific distribution scheme object;
- Fig. 15 depicts an exemplary diagram of functional components of the client illustrated in Figs. 5A-5C for handling user-defined distribution schemes;
- Fig. 16 illustrates an exemplary diagram of distributed array commands capable of being provided by the client depicted in Figs. 5A-5C, where the distributed array commands may create an instance of a distributed array for execution by a lab;
- Fig. 17 depicts an exemplary diagram of distributed array commands capable of being provided by the client illustrated in Figs. 5A-5C, where the distributed array commands may convert a remote object into a non-remote object;
- Fig. 18 illustrates an exemplary diagram of distributed array commands capable of being provided by the client depicted in Figs. 5A-5C, where the distributed array commands may provide conversions between distributed objects and parallel distributed objects;
- Fig. 19 depicts an exemplary diagram of distributed array commands capable of being provided by the client illustrated in Figs. 5A-5C, where the distributed array commands may mix distributed objects and parallel distributed objects;
- Fig. 20 illustrates an exemplary diagram of distributed array commands capable of being provided by the client depicted in Figs. 5A-5C, where the distributed array commands may include distributed objects with one or more input arguments;
- Fig. 21 depicts an exemplary diagram of distributed array commands capable of being provided by the client illustrated in Figs. 5A-5C, where the distributed array commands may include parallel distributed objects with one or more input arguments;
- Fig. 22 illustrates an exemplary diagram of a data placement policy for distribution scheme and/or distributed array commands capable of being provided by the client depicted in Figs. 5A-5C;
- Fig. 23 depicts an exemplary diagram of dimensional constructors capable of being provided by the client illustrated in Figs. 5A-5C;
- Fig. 24 illustrates an exemplary diagram of distribution scheme and/or distributed array commands capable of being provided by the client depicted in Figs. 5A-5C, and transferring distributed arrays and associated distributor objects;
- Fig. 25 depicts an exemplary diagram of distribution scheme and/or distributed array commands capable of being provided by the client illustrated in Figs. 5A-5C, where the distribution scheme and/or distributed array commands may provide interactions with nested parallel processing constructs;
- Fig. 26 illustrates an exemplary diagram of distribution scheme and/or distributed array commands capable of being provided by the client depicted in Figs. 5A-5C, where the distribution scheme and/or distributed array commands may generate an error to prevent parallel error signaling;
- Fig. 27 depicts an exemplary diagram of distribution scheme and/or distributed array commands capable of being provided by the client illustrated in Figs. 5A-5C, where the distribution scheme and/or distributed array commands may reduce a remote call outside a parallel processing construct;
- Fig. 28 illustrates an exemplary diagram of functional components, of the client depicted in Figs.
- Figs. 29-45 depict flow charts associated with an exemplary process according to implementations described herein.
- Implementations described herein may include systems and/or methods for providing a single programming language presentation of distributed arrays.
- the systems and/or methods may initiate a single programming language, and may identify, via the single programming language, one or more data distribution schemes for executing a program.
- the systems and/or methods also may transform, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and may allocate one or more portions of the parallel program to two or more labs for parallel execution.
- the systems and/or methods may further receive one or more results associated with the parallel execution of the one or more portions from the two or more labs, and may provide the one or more results to the program.
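- The sequence summarized above (identify candidate data distribution schemes, select one, allocate portions to labs, and receive results) can be sketched as follows. This is an illustration under stated assumptions only: the two candidate schemes (distribute a matrix along rows or along columns), the cost measure `max_load`, and all function names are hypothetical and are not defined by the described implementations. Threads stand in for labs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_sizes(n, num_labs):
    """Near-equal split of n items over num_labs labs."""
    base, extra = divmod(n, num_labs)
    return [base + (1 if i < extra else 0) for i in range(num_labs)]


def max_load(shape, dim, num_labs):
    """Assumed cost: the largest number of elements held by any one lab."""
    return max(split_sizes(shape[dim], num_labs)) * shape[1 - dim]


def choose_scheme(shape, num_labs):
    """Pick the dimension (0 = rows, 1 = columns) with the lowest cost."""
    return min((0, 1), key=lambda dim: max_load(shape, dim, num_labs))


def distribute(matrix, dim, num_labs):
    """Cut the matrix into one portion per lab along dimension `dim`."""
    shape = (len(matrix), len(matrix[0]))
    portions, start = [], 0
    for size in split_sizes(shape[dim], num_labs):
        stop = start + size
        if dim == 0:
            portions.append(matrix[start:stop])
        else:
            portions.append([row[start:stop] for row in matrix])
        start = stop
    return portions


def lab_sum(portion):
    """Work done by one lab: sum the elements of its portion."""
    return sum(sum(row) for row in portion)


matrix = [[r * 8 + c for c in range(8)] for r in range(3)]  # 3 x 8
dim = choose_scheme((3, 8), num_labs=4)
portions = distribute(matrix, dim, num_labs=4)
with ThreadPoolExecutor(max_workers=4) as labs:
    partial = list(labs.map(lab_sum, portions))
total = sum(partial)
print(dim, partial, total)  # 1 [51, 63, 75, 87] 276
```

With three rows and four labs, a row-wise scheme would leave one lab idle, so the assumed cost measure selects the column-wise scheme.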
- a lab may include hardware, software, and/or a combination of hardware and software that performs and/or participates in parallel processing activities.
- a lab may perform and/or participate in parallel processing activities in response to a request and/or a task received from a client.
- a lab may be implemented as a software unit of execution and/or a hardware unit of execution.
- a lab may perform and/or participate in substantially any type of parallel processing (e.g., task, data, and/or stream processing).
- a lab may perform and/or participate in parallel processing activities in response to a receipt of a program or one or more portions of the program.
- a lab may support one or more threads (or processes) when performing processing operations.
- Parallel processing may include any type of processing that can be distributed across two or more resources (e.g., software units of execution, hardware units of execution, processors, microprocessors, clusters, labs, etc.) and be performed at substantially the same time.
- parallel processing may refer to task parallel processing where a number of tasks are processed at substantially the same time on a number of software units of execution.
- each task may be processed independently of other tasks executing at the same time (e.g., a first software unit of execution executing a first task may not communicate with a second software unit of execution executing a second task).
- parallel processing may refer to data parallel processing, where data (e.g., a data set) is parsed into a number of portions that are executed in parallel using two or more software units of execution.
- In data parallel processing, the software units of execution and/or the data portions may communicate with each other as processing progresses.
- parallel processing may refer to stream parallel processing (also referred to as pipeline parallel processing).
- Stream parallel processing may use a number of software units of execution arranged in series (e.g., a line) where a first software unit of execution produces a first result that is fed to a second software unit of execution that produces a second result.
- Stream parallel processing may also include a state where task allocation may be expressed in a directed acyclic graph (DAG) or a cyclic graph with delays.
- Other implementations may combine two or more of task, data, or stream parallel processing techniques alone or with other types of processing techniques to form hybrid-parallel processing techniques.
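- The three styles described above (task, data, and stream parallel processing) can be contrasted in a short sketch. The function names below are assumptions for illustration, and Python threads and queues merely stand in for software units of execution; nothing here is the described implementation.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def task_parallel(tasks):
    """Run independent callables at the same time; they do not communicate."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def data_parallel(func, data, num_units):
    """Split one data set into portions and apply `func` to each portion."""
    portions = [data[i::num_units] for i in range(num_units)]
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        return list(pool.map(func, portions))


def stream_parallel(stages, items):
    """Arrange stages in series; each stage feeds its result to the next."""
    done = object()  # sentinel marking the end of the stream
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def run(stage, inbox, outbox):
        while True:
            item = inbox.get()
            if item is done:
                outbox.put(done)
                return
            outbox.put(stage(item))

    threads = [
        threading.Thread(target=run, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(done)

    results = []
    while True:
        out = queues[-1].get()
        if out is done:
            break
        results.append(out)
    for thread in threads:
        thread.join()
    return results


task_results = task_parallel([lambda: 1 + 1, lambda: "a".upper()])
data_results = data_parallel(sum, list(range(10)), 2)
stream_results = stream_parallel([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])
print(task_results)    # [2, 'A']
print(data_results)    # [20, 25]
print(stream_results)  # [4, 6, 8]
```

In the stream case, the second stage can start on the first item while the first stage is still working on later items, which is what distinguishes a pipeline from running the stages one after another.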
- a parallel processing environment may include any environment capable of performing parallel processing.
- a parallel processing environment may include a dynamic number of processes provided on one or more hardware, software, and/or a combination of hardware and software units of execution which may have several different control and data passing layers through which a current behavior of a part or a whole of the environment may be specified.
- a front-end application e.g., a parallel processing interface
- the processes involved in the parallel processing environment may include processes associated with a technical computing environment.
- a technical computing environment may include any hardware, software, and/or a combination of hardware and software based logic that provides a computing environment that allows users to perform tasks related to disciplines, such as, but not limited to, mathematics, science, engineering, medicine, business, etc., more efficiently than if the tasks were performed in another type of computing environment, such as an environment that required the user to develop code in a conventional programming language, such as C++, C, Fortran, Pascal, etc.
- a TCE may include a dynamically-typed programming language (e.g., the M language or MATLAB® language) that can be used to express problems and/or solutions in mathematical notations.
- a TCE may use an array as a basic element, where the array may not require dimensioning.
- a TCE may be adapted to perform matrix and/or vector formulations that can be used for data analysis, data visualization, application development, simulation, modeling, algorithm development, etc. These matrix and/or vector formulations may be used in many areas, such as statistics, image processing, signal processing, control design, life sciences modeling, discrete event analysis and/or design, state based analysis and/or design, etc.
- a TCE may further provide mathematical functions and/or graphical tools (e.g., for creating plots, surfaces, images, volumetric representations, etc.).
- a TCE may provide these functions and/or tools using toolboxes (e.g., toolboxes for signal processing, image processing, data plotting, parallel processing, etc.).
- a TCE may provide these functions as block sets.
- a TCE may provide these functions in another way, such as via a library, etc.
- a TCE may be implemented as a text-based environment (e.g., MATLAB® software; Octave; Python; Comsol Script; MATRIXx from National Instruments; Mathematica from Wolfram Research, Inc.; Mathcad from Mathsoft Engineering & Education Inc.; Maple from Maplesoft; Extend from Imagine That Inc.; Scilab from The French Institution for Research in Computer Science and Control (INRIA); Virtuoso from Cadence; Modelica or Dymola from Dynasim; etc.), a graphically-based environment (e.g., Simulink® software, Stateflow® software, SimEvents™ software, etc., by The MathWorks, Inc.; VisSim by Visual Solutions; LabVIEW® by National Instruments; Dymola by Dynasim; SoftWIRE by Measurement Computing; WiT by DALSA Coreco; VEE Pro or SystemVue by Agilent; Vision Program Manager from PPT Vision; Khoros from Khoral Research; Gedae by Gedae, Inc.; Scicos from (INRIA
- Fig. 1 is an exemplary diagram of an architectural overview 100 in which implementations described herein may be practiced.
- overview 100 may include a hardware environment 110, a batch (or distributed computing) environment 120, a parallel processing environment 130, and/or a parallel processing interface 140.
- Hardware environment 110 may include one or more hardware resources that may be used to perform parallel processing.
- hardware environment 110 may include one or more hardware units of execution. Further details of hardware environment 110 are provided below in connection with Fig. 2.
- Batch environment 120 may provide a distributed computing environment for a job.
- batch (or distributed computing) environment 120 may include a client that provides a job to a scheduler.
- the scheduler may distribute the job into one or more tasks, and may provide the tasks to one or more hardware units of execution and/or one or more processors.
- the hardware units of execution and/or processors may execute the tasks, and may provide results to the scheduler.
- the scheduler may combine the results into a single result, and may provide the single result to the client. Further details of batch environment 120 are provided below in connection with Fig. 3.
- Parallel processing environment 130 may provide parallel processing for a main program.
- parallel processing environment 130 may include a technical computing environment that provides a main program to a controller.
- the controller may provide portions of the program to one or more software units of execution and/or one or more labs.
- the software units of execution and/or labs may execute the program portions, and may provide results to the controller.
- the controller may combine the results into a single result, and may provide the single result to the technical computing environment. Further details of parallel processing environment 130 are provided below in connection with Fig. 4.
- Parallel processing interface 140 may include a front-end application (e.g., an application program interface (API)) that provides an interface for dynamically accessing, controlling, utilizing, etc. hardware environment 110, batch environment 120, and/or parallel processing environment 130.
- parallel processing interface 140 may include parallel processing constructs that permit users to express specific parallel workflows.
- parallel processing interface 140 may include a program provider that provides a main program to analysis logic.
- the analysis logic may analyze the main program, may parse the main program into program portions, and may provide the program portions to resource allocation logic.
- the resource allocation logic may allocate the program portions to one or more software units of execution and/or hardware units of execution.
- the program portions may be executed, and results may be provided to the program provider.
- parallel processing interface 140 may include an object API where a user may specify how a program may be parallelized. Further details of parallel processing interface 140 are provided below in connection with Figs. 5A-5C.
- Although Fig. 1 shows exemplary components of architectural overview 100, in other implementations, architectural overview 100 may contain fewer, different, or additional components than depicted in Fig. 1.
EXEMPLARY HARDWARE ENVIRONMENT
- Fig. 2 is an exemplary diagram of hardware environment 110.
- hardware environment 110 may include a hardware unit of execution (UE) 200 with one or more processors 210-1, 210-2, 210-3, 210-4 (collectively, "processors 210").
- a hardware unit of execution may include a device (e.g., a hardware resource) that performs and/or participates in parallel processing activities.
- a hardware unit of execution may perform and/or participate in parallel processing activities in response to a request and/or a task received from a client.
- a hardware unit of execution may perform and/or participate in substantially any type of parallel processing (e.g., task, data, and/or stream processing) using one or more devices.
- a hardware unit of execution may include a single processor that includes multiple cores. In another implementation, the hardware unit of execution may include a number of processors.
- Devices used in a hardware unit of execution may be arranged in substantially any configuration (or topology), such as a grid, ring, star, etc.
- a hardware unit of execution may support one or more threads (or processes) when performing processing operations.
- hardware UE 200 may perform parallel processing activities on behalf of another device.
- hardware UE 200 may perform parallel processing activities on behalf of itself or on behalf of a host of which hardware UE 200 is a part.
- Hardware UE 200 may perform parallel processing in a variety of ways. For example, hardware UE 200 may perform parallel processing activities related to task parallel processing, data parallel processing, stream parallel processing, etc. Hardware UE 200 may perform parallel processing using processing devices resident on UE 200 and/or using processing devices that are remote with respect to UE 200.
- hardware UE 200 may include processors 210-1, 210-2, 210-3, and 210-4.
- Processors 210 may include hardware, software, and/or a combination of hardware and software based logic that performs processing operations.
- Processors 210 may include substantially any type of processing device, such as a central processing unit (CPU), a microprocessor, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a micro electrical mechanical switch (MEMS), a general purpose graphical processing unit (GPGPU), an optical processor, etc.
- each processor 210-1 through 210-4 may include a single core processor or a multi-core processor.
- each processor 210-1 through 210-4 may include a single processing device or a group of processing devices, such as a processor cluster or a computing grid.
- each processor 210-1 through 210-4 may include multiple processors that may be local or remote with respect to each other, and may use one or more threads while processing.
- each processor 210-1 through 210-4 may represent a single hardware UE.
- Although Fig. 2 shows exemplary components of hardware environment 110, in other implementations, hardware environment 110 may contain fewer, different, or additional components than depicted in Fig. 2.
- hardware environment 110 may include one or more of a bus, a main memory, a read-only memory (ROM), a storage device, an input device, an output device, and/or a communication interface.
- one or more components of hardware environment 110 may perform one or more other tasks described as being performed by one or more other components of hardware environment 110.
- Fig. 3 is an exemplary diagram of batch environment 120.
- batch environment 120 may include a client 300, a scheduler 310, and hardware UE 200 (including processors 210).
- Hardware UE 200 and processors 210 may perform the same or similar tasks as described above in connection with Fig. 2.
- Client 300 may include one or more entities.
- An entity may be defined as a device, such as a personal computer, a personal digital assistant (PDA), a laptop, or another type of computation or communication device, a thread or process running on one of these devices, and/or an object executable by one of these devices.
- client 300 may include a device capable of sending information to, or receiving information from, another device, such as hardware UE 200.
- client 300 may include a technical computing environment (TCE) 320 and a library 330.
- Other implementations of client 300 may contain fewer, different, or additional components than depicted in Fig. 3.
- Technical computing environment (TCE) 320 may include any of the features described above with respect to the term “technical computing environment.”
- Library 330 may include hardware, software, and/or a combination of hardware and software based logic that may operate with TCE 320 to perform certain operations.
- library 330 may store functions to perform certain operations (e.g., signal processing, image processing, parallel processing, data display, etc.) in a text-based environment.
- library 330 may store graphical representations (e.g., blocks, icons, images, etc.) to perform certain operations in a graphically-based environment (e.g., a gain block, a source block, a filter block, a discrete event generator block, etc.).
- Scheduler 310 may include hardware, software, and/or a combination of hardware and software based logic to perform scheduling operations on behalf of a device (e.g., client 300). For example, scheduler 310 may perform operations to select and/or control parallel processing activities performed by hardware UE 200 on behalf of client 300.
- scheduler 310 may receive a job 340, and may distribute or divide job 340 into tasks (e.g., tasks 350-1, 350-2, 350-3, and 350-4).
- Scheduler 310 may send tasks 350-1, 350-2, 350-3, and 350-4 to hardware UE 200 (e.g., to processor 210-1, 210-2, 210-3, and 210-4, respectively) for execution.
- Scheduler 310 may receive results from hardware UE 200 (e.g., results 360-1, 360-2, 360-3, and 360-4), may assemble the results into a single result 370, and may provide result 370 to client 300.
- Scheduler 310 may reside locally on client 300 or may be located remotely with respect to client 300 depending on particular implementations described herein.
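- The flow described above for scheduler 310 (receive a job, divide it into tasks, send the tasks for execution, and assemble the results into a single result) can be sketched as follows. The `Scheduler` class, its method names, and the example workload are hypothetical and chosen only for illustration; one worker thread stands in for each processor of the hardware UE.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil


class Scheduler:
    """Hypothetical stand-in for a scheduler that splits a job into tasks."""

    def __init__(self, num_processors):
        self.num_processors = num_processors

    def divide(self, job):
        """Divide a job (a list of work items) into contiguous tasks."""
        size = max(1, ceil(len(job) / self.num_processors))
        return [job[i:i + size] for i in range(0, len(job), size)]

    def run(self, job, work, combine):
        """Execute every task and assemble the results into one result."""
        tasks = self.divide(job)
        with ThreadPoolExecutor(max_workers=self.num_processors) as processors:
            results = list(processors.map(work, tasks))
        return combine(results)


def sum_of_squares(task):
    """Example workload for a single task."""
    return sum(x * x for x in task)


scheduler = Scheduler(num_processors=4)
job = list(range(1, 9))
tasks = scheduler.divide(job)
result = scheduler.run(job, work=sum_of_squares, combine=sum)
print(tasks)   # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(result)  # 204
```

The client only sees `result`; how the job was divided and where each task ran stays inside the scheduler, which matches the division of roles described above.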
- Although Fig. 3 shows exemplary components of batch environment 120, in other implementations, batch environment 120 may contain fewer, different, or additional components than depicted in Fig. 3.
- one or more components of batch environment 120 may perform one or more other tasks described as being performed by one or more other components of batch environment 120.
- Fig. 4 is an exemplary diagram of parallel processing environment 130.
- parallel processing environment 130 may include technical computing environment 320, a controller 400, and a software unit of execution (UE) 410.
- Technical computing environment 320 may include any of the features described above with respect to the term “technical computing environment.”
- Controller 400 may include hardware, software, and/or a combination of hardware and software based logic to perform controlling operations on behalf of a program. For example, in one implementation, controller 400 may select and/or control parallel processing activities performed by software UE 410 on behalf of technical computing environment 320.
- a software unit of execution may include a software resource (e.g., a worker, a lab, etc.) that performs and/or participates in parallel processing activities.
- a software unit of execution may perform and/or participate in parallel processing activities in response to receipt of a program and/or one or more portions of the program.
- a software unit of execution may perform and/or participate in substantially any type of parallel processing using one or more hardware units of execution.
- a software unit of execution may support one or more threads (or processes) when performing processing operations.
- software UE 410 may include one or more labs (e.g., labs 420-1, 420-2,
- Labs 420 may include any of the features described above with respect to the term “lab.”
- a lab may be similar to a software unit of execution, except on a smaller scale.
- a lab may represent a single software unit of execution.
- technical computing environment 320 may provide a main program (e.g., program 430) to controller 400.
- Controller 400 may provide portions of program 430 (e.g., program portions 440-1, 440-2, 440-3, and 440-4, collectively referred to as "program portions 440") to labs 420-1, 420-2, 420-3, and 420-4, respectively, of software UE 410.
- Labs 420 may execute program portions 440, and may provide results to controller 400.
- Lab 420-1 may provide a result 450-1 to controller 400, lab 420-2 may provide a result 450-2 to controller 400, lab 420-3 may provide a result 450-3 to controller 400, and lab 420-4 may provide a result 450-4 to controller 400.
- Controller 400 may combine the results into a single result 460, and may provide single result 460 to technical computing environment 320.
- Although Fig. 4 shows exemplary components of parallel processing environment 130, in other implementations, parallel processing environment 130 may contain fewer, different, or additional components than depicted in Fig. 4.
- one or more components of parallel processing environment 130 may perform one or more other tasks described as being performed by one or more other components of parallel processing environment 130.
- Fig. 5A is an exemplary diagram of functional components of parallel processing interface 140.
- parallel processing interface 140 may include a client 500 that includes a variety of functional components, such as a program provider 510, analysis logic 520, resource allocation logic 530, and/or a results provider 540.
- Client 500 may include one or more entities.
- An entity may be defined as a device, such as a personal computer, a personal digital assistant (PDA), a laptop, or another type of computation or communication device, a thread or process running on one of these devices, and/or an object executable by one of these devices.
- client 500 may include a device capable of providing a parallel processing interface, as described herein.
- Other implementations of client 500 may contain fewer, different, or additional components than depicted in Fig. 5.
- client 500 may include a technical computing environment (e.g., TCE 320) and a library (e.g., library 330).
- Program provider 510 may include hardware, software, and/or a combination of hardware and software based logic that provides one or more programs for execution. For example, in one implementation, program provider 510 may generate programs created using a technical computing environment, as defined above. As shown in Fig. 5, program provider 510 may provide a main program 545 to analysis logic 520.
- Analysis logic 520 may receive main program 545, and may include hardware, software, and/or a combination of hardware and software based logic that analyzes main program 545 and parses main program 545 into one or more program portions 550.
- analysis logic 520 may include language constructs (as described herein) that parse main program 545 into one or more program portions 550. As shown in Fig. 5, analysis logic 520 may provide program portions 550 to resource allocation logic 530. Further details of analysis logic 520 are provided below.
- Resource allocation logic 530 may receive program portions 550, and may include hardware, software, and/or a combination of hardware and software based logic that dynamically allocates (as indicated by reference number 560) program portions 550 to one or more software UEs (e.g., software UE 410) for parallel execution.
- allocation 560 may be provided to one or more software UEs, and the software UEs may be executed by one or more hardware UEs (e.g., hardware UE 200) in a parallel processing manner.
- allocation 560 may be executed via software UEs and/or hardware UEs of client 500.
- the software UEs may return results 570 of the execution of program portions 550 to results provider 540.
- Results provider 540 may include hardware, software, and/or a combination of hardware and software based logic that receives results 570 from the software UEs, and provides results 570 to program provider 510. In one implementation, results provider 540 may combine results 570 into a single result, and may provide the single result to program provider 510.
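The flow among analysis logic 520, resource allocation logic 530, and results provider 540 may be sketched as follows. This is a minimal illustration in Python rather than in a technical computing environment language. The function names (`analyze`, `allocate`, `combine`, `run_portion`) and the use of a thread pool as a stand-in for software UEs are assumptions made for the example and are not taken from the description.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(main_program, num_portions):
    """Stand-in for analysis logic 520: split a list of work items into portions."""
    portions = [main_program[i::num_portions] for i in range(num_portions)]
    return [p for p in portions if p]  # drop empty portions

def run_portion(portion):
    """Hypothetical work done by one software UE: sum of squares of its items."""
    return sum(x * x for x in portion)

def allocate(portions, max_workers=4):
    """Stand-in for resource allocation logic 530: run portions in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_portion, portions))

def combine(results):
    """Stand-in for results provider 540: merge partial results into one."""
    return sum(results)

main_program = list(range(10))
portions = analyze(main_program, num_portions=4)
single_result = combine(allocate(portions))
```

Here the partial results are combined by summation only because the example work is a sum; a different program would need a different combining step.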
- Client 500 may use different control and data passing layers through which it may specify the current behavior of a part or a whole of the parallel processing interface 140.
- client 500 may use a message passing interface (MPI), a Transmission Control Protocol/Internet Protocol (TCP/IP), an Ethernet protocol, and/or other interconnects and protocols for the control and data passing layers.
- client 500 may implement an MPI layer (and/or other data and control layers) on any standard non-guaranteed stream protocol.
- client 500 may use two different layers, a cooperative communication layer (e.g., where processes may need to agree that a particular type of message is being sent) and an imperative communication layer or control layer (e.g., that may send unexpected messages to a recipient and may request the recipient to undertake an instruction contained in the message).
- Client 500 may define a sub-group behavior for each of program portions 550.
- a sub-group may include any part of the overall set of processes (e.g., main program 545 and/or program portions 550).
- the sub-group behavior may relate to the parallel processing styles that may be employed on the group of program portions 550.
- client 500 may dynamically change the behavior of one or more of program portions 550 as code is executed for other program portions 550.
- client 500 may use the control layer to change the current state of a sub-group at any time, which may dynamically change the behavior of that portion of the group.
- An application (e.g., main program 545) may include different phases (e.g., an input phase, an analysis phase, an output phase, etc.), and parallel processing needs may be different for each phase.
- the sub-group behavior may include an unused state (e.g., the initial state of a process when it is not being used), a user-controlled UE state (e.g., if a user has acquired a process as a UE object), a task parallel state (e.g., an execution state used by parallel processing constructs), a single program, multiple data (SPMD) state (e.g., one or more processes may have a MPI ring between them with appropriate values for rank and size), a stream state (e.g., a state where task allocation may be expressed in a directed acyclic graph (DAG) or a cyclic graph with delays), etc.
- Each of program portions 550 may be in one of the above-mentioned states, and may request other tasks to be placed in a new state.
- the sub-group behavior may include a variety of other states.
- the sub-group behavior may include a delayed debugging state where a task may be executed and delayed in time with respect to another task (or delayed in lines of code).
- a delayed debugging state may permit a breakpoint to be created for one task if another task experiences an error, and may enable a user to see why an error occurred.
- the sub-group behavior may include a release differences state that may execute one or more tasks associated with different releases of a product (e.g., different releases of TCE 320). This may permit behavior differences to be found between different releases of a product, and may permit users to undertake release compatibility studies.
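The sub-group states listed above, and the use of a control layer to change the state of a sub-group, may be sketched as a small state model. This is an illustrative Python sketch only: the names `SubGroupState`, `Process`, and `set_sub_group_state` are hypothetical, and only the five states named explicitly in the description are modeled.

```python
from enum import Enum, auto

class SubGroupState(Enum):
    UNUSED = auto()              # initial state of a process that is not in use
    USER_CONTROLLED_UE = auto()  # a user has acquired the process as a UE object
    TASK_PARALLEL = auto()       # used by parallel processing constructs
    SPMD = auto()                # processes share a ring with rank and size
    STREAM = auto()              # task allocation expressed as a graph

class Process:
    def __init__(self, name):
        self.name = name
        self.state = SubGroupState.UNUSED
        self.rank = None
        self.size = None

def set_sub_group_state(sub_group, new_state):
    """Hypothetical control-layer call: move every process of a sub-group to new_state."""
    for rank, proc in enumerate(sub_group, start=1):
        proc.state = new_state
        if new_state is SubGroupState.SPMD:
            # In the SPMD state each process gets a rank and the ring size.
            proc.rank, proc.size = rank, len(sub_group)
        else:
            proc.rank, proc.size = None, None

processes = [Process(f"p{i}") for i in range(4)]
set_sub_group_state(processes[:3], SubGroupState.SPMD)  # the fourth stays unused
```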
- some state information may be consistent across client 500.
- a source of code may come from one device (e.g., client 500), and a file system associated with the source device may be used across client 500.
- some state information may be consistent across a sub-group of client 500 (e.g., labindex, numlabs, etc.).
- the state information may be automatically transferred from client 500 to software unit of execution 410 and/or labs 420.
- If a path is added to a technical computing environment (e.g., TCE 320) of client 500, the path may be automatically added to all TCEs in the parallel environment (e.g., TCEs provided in labs 420).
- If the TCE of client 500 is instructed to reanalyze a piece of code (e.g., because a program changed), all of the TCEs in the parallel environment may be instructed to reanalyze the piece of code. For a sub-group, this may be similar to changing a parallel random number seed, or possibly clearing a particular workspace (e.g., one of labs 420) to ensure clean evaluation of a program.
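The automatic transfer of state information from the client to the labs may be illustrated with a minimal Python sketch. The classes `TCE` and `ParallelEnvironment`, their methods, and the example directory are hypothetical names chosen for the illustration; they are not part of the described system.

```python
class TCE:
    """Hypothetical stand-in for one technical computing environment."""
    def __init__(self, name):
        self.name = name
        self.path = []
        self.workspace = {}

class ParallelEnvironment:
    def __init__(self, client, labs):
        self.client = client
        self.labs = labs

    def add_path(self, directory):
        # A path added on the client is mirrored on every lab's TCE.
        for tce in [self.client, *self.labs]:
            if directory not in tce.path:
                tce.path.append(directory)

    def clear_workspace(self, lab_index):
        # Clear one lab's workspace to ensure clean evaluation of a program.
        self.labs[lab_index].workspace.clear()

env = ParallelEnvironment(TCE("client"), [TCE(f"lab{i}") for i in range(1, 5)])
env.add_path("/shared/project")
```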
- client 500 may be interactive in that resource allocation logic 530 may permit a user to dynamically control a current setup (e.g., via scripts, functions, command lines, etc.). Thus, client 500 and its configuration may change based on an actual analysis that the user may be currently undertaking.
- resource allocation logic 530 may be connected to one or more clusters of software UEs 410 and may use processes derived from each of the clusters, as well as client 500, to form the functional components of client 500.
- client 500 may include devices having different architectures and/or operating systems (i.e., client 500 may execute across multiple platforms). For example, client 500 may include a different architecture and/or operating system other than software UE 410.
- main program 545 may be submitted in batch manner to a cluster (e.g., a cluster of software UEs 410 and/or a cluster of labs 420).
- a user may interactively develop main program 545, and may save main program 545 in a file (e.g., an M file).
- a command may exist in main program 545 (e.g., in the M file) that may cause one lab (e.g., one of labs 420) in the cluster to act as a client where the execution of main program 545 initiates.
- Main program 545 may use four labs 420 and a client (e.g., one of labs 420 acting as a client), may initiate on the client, and may utilize as many labs 420 as necessary to carry out execution.
- A special type of job may be created that creates a pool (or cluster) of labs, where one of the initiated processes of the job may act as the client, and the rest of the processes may be in the pool.
- Fig. 5B is an exemplary diagram of functional components of parallel processing interface 140 in an alternative arrangement.
- The alternative arrangement depicted in Fig. 5B is the same as the arrangement of Fig. 5A, except that program provider 510 may be included in a web service 580, while analysis logic 520, resource allocation logic 530, and results provider 540 may be included in client 500.
- Program provider 510, analysis logic 520, resource allocation logic 530, and/or results provider 540 may operate in the manner described above in connection with Fig. 5A.
- Web service 580 may provide access to one or more programs (e.g., main program 545 provided by program provider 510, applications accessed by main program 545, etc.).
- a web service may include any software application that allows machine-to-machine communications over a network (e.g., a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), such as the Internet, etc.).
- a web service may communicate with a client (e.g., client 500) using an application program interface (API) that the client may access over the network.
- the web service may exchange Hypertext Markup Language (HTML), Extensible Markup Language (XML), or other types of messages with the client using industry compatible standards (e.g., simple object access protocol (SOAP)) and/or proprietary standards.
- a web service may further include network services that can be described using industry standard specifications, such as web service definition language (WSDL) and/or proprietary specifications.
- web service 580 may allow a destination (e.g., a computer operated by a customer) to perform parallel processing using hardware, software, and/or a combination of hardware and software UEs that may be operated by a service provider (e.g., client 500). For example, the customer may be permitted access to client 500 to perform parallel processing if the customer subscribes to one of the offered web services.
- the service provider may maintain a database that includes parameters, such as parameters that indicate the status of hardware UEs, software UEs, etc.
- the service provider may perform a look-up operation in the database if a request for parallel processing is received from the customer.
- the service provider may connect the customer to parallel processing resources that are available based on parameters in the database.
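The look-up operation performed by the service provider may be sketched as a filter over status records. This Python sketch is illustrative only: the record fields, the `connect_customer` function, and the resource identifiers are assumptions, since the description does not specify the database layout.

```python
from dataclasses import dataclass

@dataclass
class ResourceRecord:
    resource_id: str
    kind: str        # "hardware UE" or "software UE"
    available: bool  # status parameter kept by the service provider

def connect_customer(database, kind, count):
    """Look up available resources and grant them, or grant none if too few."""
    available = [r for r in database if r.kind == kind and r.available]
    if len(available) < count:
        return []  # the request cannot be satisfied at the moment
    granted = available[:count]
    for record in granted:
        record.available = False  # mark the resource as in use
    return [r.resource_id for r in granted]

database = [
    ResourceRecord("hw-1", "hardware UE", True),
    ResourceRecord("sw-1", "software UE", True),
    ResourceRecord("sw-2", "software UE", False),
    ResourceRecord("sw-3", "software UE", True),
]
granted = connect_customer(database, "software UE", 2)
```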
- the customer may receive web service 580 on a subscription basis.
- a subscription may include substantially any type of arrangement, such as monthly subscription, a per-use fee, a fee based on an amount of information exchanged between the service provider and the customer, a fee based on a number of processor cycles used by the customer, a fee based on a number of hardware UEs, software UEs, etc., used by the customer, etc.
- Fig. 5C is an exemplary diagram of functional components of parallel processing interface 140 in another alternative arrangement.
- the alternative arrangement depicted in Fig. 5C is the same as the arrangement of Fig. 5A, except that analysis logic 520 may be replaced with a parallel processing object API 590.
- Program provider 510, resource allocation logic 530, and/or results provider 540 may operate in the manner as described above in connection with Fig. 5A.
- Parallel processing object API 590 may permit a user to specify how main program 545 may be parallelized.
- Parallel processing object API 590 may cooperate with resource allocation logic 530 and/or an execution mechanism (e.g., software UEs 420) in a similar manner that analysis logic 520 cooperates with these components.
- parallel processing API 590 may offer much more flexibility and/or customization than analysis logic 520.
- Parallel processing API 590 may define and implement an object in a technical computing environment (e.g., TCE 320) that corresponds to another one or more (or set of) executing technical computing environments.
- Parallel processing API 590 may permit customizable parallelism of a program (e.g., main program 545), and may be nested in other calls or functions (e.g., in the parallel processing constructs described herein).
- Parallel processing API 590 may be used by other calls as inputs to a calling function so that identification of which labs (e.g., labs 420) to use may be known.
- parallel processing API 590 may be used to provide or initiate a single programming language presentation of distributed arrays, as described herein.
- parallel processing API 590 may be called a MATLAB® unit of execution (or MUE) API.
- MUE API may define and implement an object in MATLAB® software that corresponds to another one or more of executing MATLAB® software applications.
- the MUE API may be used to permit one technical computing environment to communicate with and control another technical computing environment.
- the MUE API may be used to create groups of processes with certain behaviors (e.g., using the language constructs described herein).
- parallel processing interface 140 may contain fewer, different, or additional functional components than depicted in Figs. 5A-5C.
- one or more functional components of parallel processing interface 140 may perform one or more other tasks described as being performed by one or more other functional components of parallel processing interface 140.
- Fig. 6 is an exemplary diagram of an entity corresponding to client 500 and/or web service 580.
- the entity may include a bus 610, a processing unit 620, a main memory 630, a read-only memory (ROM) 640, a storage device 650, an input device 660, an output device 670, and/or a communication interface 680.
- Bus 610 may include a path that permits communication among the components of the entity.
- Processing unit 620 may include a processor, microprocessor, or other types of processing logic that may interpret and execute instructions.
- processing unit 620 may include a single core processor or a multi-core processor.
- processing unit 620 may include a single processing device or a group of processing devices, such as a processor cluster or computing grid.
- processing unit 620 may include multiple processors that may be local or remote with respect to each other, and may use one or more threads while processing.
- processing unit 620 may include multiple processors implemented as hardware UEs capable of running copies of a technical computing environment.
- Main memory 630 may include a random access memory (RAM) or another type of dynamic storage device that may store information and instructions for execution by processing unit 620.
- ROM 640 may include a ROM device or another type of static storage device that may store static information and/or instructions for use by processing unit 620.
- Storage device 650 may include a magnetic and/or optical recording medium and its corresponding drive, or another type of static storage device (e.g., a disk drive) that may store static information and/or instructions for use by processing unit 620.
- Input device 660 may include a mechanism that permits an operator to input information to the entity, such as a keyboard, a mouse, a pen, a microphone, voice recognition and/or biometric mechanisms, etc.
- Output device 670 may include a mechanism that outputs information to the operator, including a display, a printer, a speaker, etc.
- Communication interface 680 may include any transceiver-like mechanism that enables the entity to communicate with other devices and/or systems. For example, communication interface 680 may include mechanisms for communicating with another device or system via a network.
- the entity depicted in Fig. 6 may perform certain operations in response to processing unit 620 executing software instructions contained in a computer-readable medium, such as main memory 630.
- a computer-readable medium may be defined as a physical or logical memory device.
- the software instructions may be read into main memory 630 from another computer-readable medium, such as storage device 650, or from another device via communication interface 680.
- the software instructions contained in main memory 630 may cause processing unit 620 to perform processes that will be described later.
- hardwired circuitry may be used in place of or in combination with software instructions to implement processes described herein.
- implementations described herein are not limited to any specific combination of hardware circuitry and software.
- Although Fig. 6 shows exemplary components of the entity, the entity may contain fewer, different, or additional components than depicted in Fig. 6.
- one or more components of the entity may perform one or more other tasks described as being performed by one or more other components of the entity.
- Fig. 7 illustrates an exemplary parallel processing construct (e.g., a single program, multiple data (SPMD) command 700) capable of being analyzed and transformed to parallel program portions by analysis logic 520 of parallel processing interface 140.
- SPMD command 700 may be created with TCE 320 and provided to analysis logic 520 of client 500.
- SPMD command 700 may be created by another device and/or may be provided to analysis logic 520 of client 500.
- analysis logic 520 may implement SPMD command 700 to generate program portions 550.
- SPMD command 700 may permit users to enter into a SPMD mode.
- SPMD command 700 may support data parallelism whereby a large amount of data may be distributed across multiple software UEs (e.g., software UEs 410 and/or labs 420) via a distributed arrays API.
- SPMD command 700 may include:
- SPMD command 700 may be executed on resources (e.g., software UEs 410 and/or labs 420) that may be defined by a default configuration.
- SPMD command 700 may configure these resources as a communicating ring of labs (e.g., ring of labs 420), which may mean that labs 420 may have a same number of labs (e.g., NUMLABS) 720 defined, each lab 420 may have a unique value (e.g., LABINDEX 730, 740, 750, and 760 for labs 420-1, 420-2, 420-3, 420-4, respectively) between one and NUMLABS 720, labs 420 may send data to and from one another, and/or each lab 420 may include a unique random number generator that creates random number streams independent of one another.
- labs 420 may exchange information among each other when labs 420 are configured and/or executed.
- labs 420 may be cleaned up, which may mean that labs 420 may be restored to ordinary resources (e.g., after the results are received), NUMLABS 720 and LABINDEX 730-760 may be set back to one, the random number generators may be set back to a default start value, and/or workspaces may be cleared. There may be no implicit data transfer to and from the workspace where SPMD command 700 is called and the workspaces of labs 420 executing the body of SPMD command 700. An error on any of labs 420 executing the body of SPMD command 700 may cause an error in SPMD command 700. A warning on any of labs 420 executing the body of SPMD command 700 may be displayed on a device (e.g., client 500).
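The configuration of labs as a communicating ring, and the subsequent cleanup, may be sketched as follows. This Python sketch is an assumption-laden illustration: the `Lab` class, the seeding rule (`base_seed + labindex`), and the default seed of 0 are hypothetical, and seeding each generator differently is only a simple stand-in for truly independent random number streams.

```python
import random

class Lab:
    """Hypothetical stand-in for one of labs 420."""
    def __init__(self):
        self.reset()

    def reset(self):
        # Ordinary-resource state: NUMLABS and LABINDEX are one,
        # the generator is at a default start value, the workspace is empty.
        self.numlabs = 1
        self.labindex = 1
        self.rng = random.Random(0)  # default start value assumed to be seed 0
        self.workspace = {}

def configure_ring(labs, base_seed=12345):
    """Give every lab the same NUMLABS, a unique LABINDEX, and its own generator."""
    for index, lab in enumerate(labs, start=1):
        lab.numlabs = len(labs)
        lab.labindex = index  # unique value between one and NUMLABS
        lab.rng = random.Random(base_seed + index)  # distinct seed per lab

def cleanup(labs):
    """Restore the labs to ordinary resources after the results are received."""
    for lab in labs:
        lab.reset()

labs = [Lab() for _ in range(4)]
configure_ring(labs)
ring_indices = [lab.labindex for lab in labs]
```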
- SPMD command 700 of the form SPMD NUMWORKERS, statement, ..., statement, END may execute SPMD command 700 on an anonymous group of a number (e.g., NUMWORKERS) of resources provided within a default resource pool.
- SPMD command 700 of the form SPMD MYWORKERS, statement, ..., statement, END may execute SPMD command 700 on a specified group of resources (e.g., MYWORKERS).
- The syntax [OUT1, OUT2, ...] SPMD(IN1, IN2, ...), statement, ..., statement, END may transfer variables (e.g., IN1, IN2, ...) from client 500 to workspaces of labs 420 at the beginning of SPMD command 700, and may transfer variables (e.g., OUT1, OUT2, ...) from one of the workspaces back to client 500 at the end of SPMD command 700. If the variable being transferred from client 500 to labs 420 is a distributed array, then the variable may be automatically redistributed to all labs 420. If the variable being transferred from client 500 is a non-distributed array, then the variable may be replicated on all labs 420.
- If the variable being transferred from labs 420 to client 500 is a replicated array, then a replicated value may be received from any of labs 420.
- a value may be received from one of labs 420.
- If the variable being transferred from labs 420 to client 500 is a distributed array, then the variable may be automatically redistributed to be a distributed array over a single lab 420.
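The transfer rules for input variables (distributed arrays are redistributed across the labs, non-distributed variables are replicated on every lab) may be sketched as follows. The Python names `DistributedArray`, `partition`, and `transfer_in` are hypothetical, and the contiguous block layout is an assumption; the description does not fix a particular layout.

```python
class DistributedArray:
    """Hypothetical marker type for a variable that is a distributed array."""
    def __init__(self, data):
        self.data = list(data)

def partition(data, numlabs):
    """Split data into numlabs contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(len(data), numlabs)
    blocks, start = [], 0
    for i in range(numlabs):
        size = base + (1 if i < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks

def transfer_in(variables, numlabs):
    """Build one workspace per lab from the client's input variables."""
    workspaces = [{} for _ in range(numlabs)]
    for name, value in variables.items():
        if isinstance(value, DistributedArray):
            # Distributed array: each lab receives only its own block.
            for workspace, block in zip(workspaces, partition(value.data, numlabs)):
                workspace[name] = block
        else:
            # Non-distributed variable: replicated on all labs.
            for workspace in workspaces:
                workspace[name] = value
    return workspaces

workspaces = transfer_in({"A": DistributedArray(range(10)), "k": 3}, numlabs=4)
```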
- SPMD command 700 (and its associated syntax) may be implemented via client 500 (e.g. via analysis logic 520 of client 500), software UEs 410 (including labs 420), and/or TCE 320.
- SPMD command 700 (and its associated syntax) may be implemented via other software and hardware logic. SPMD command 700 may increase processing performance by dividing large data sets into pieces, and by providing each piece to different resources. Each resource may execute the same program on its piece of data, and the results may be collected.
- Although Fig. 7 shows an exemplary parallel processing construct, analysis logic 520 may contain fewer, different, or additional parallel processing constructs than depicted in Fig. 7.
- the exemplary parallel processing construct may be allocated in other ways than depicted in Fig. 7.
- a SPMD parallel processing construct may provide a place holder for a single program that may be executed on one or more labs.
- the code for the SPMD construct may be provided to the labs, and workspace contents available to the SPMD construct may be determined on the client.
- the SPMD constructs described herein may be easy to use (e.g., may make it easy to mark code to execute in parallel and may make it easy to send ordinary variables into the SPMD), may support a user (e.g., a programmer) by performing minimal data transfer through remote references or similar mechanisms, and may provide sufficient richness to allow for remote distributed arrays.
- a parallel resource set may include a set of labs such that the labs may be available to execute parallel code, the labs may be connected in a MPI ring, and each of the labs may include a value store that can store values of variables.
- a parallel context may include a combination of a parallel resource set with a parallel code block, and may include variables associated with the parallel code block.
- Fig. 8 illustrates an exemplary diagram 800 of a parallel processing construct (a SPMD command 810) capable of being generated by technical computing environment 320.
- SPMD command 810 may include an outer parallel context 820, a SPMD body (or inner parallel context) 830, and SPMD boundaries 840.
- Outer parallel context 820 may include syntax or code provided outside a spmd statement and an end statement (e.g., outside SPMD boundaries 840).
- outer parallel context 820 may be executed sequentially (e.g., by client 500).
- SPMD body 830 may include syntax or code provided inside the spmd statement and the end statement (e.g., inside SPMD boundaries 840).
- SPMD body 830 may be provided to two or more labs (e.g., labs 420), and may be executed in parallel by the two or more labs.
- SPMD boundaries 840 may be defined by the spmd statement and the end statement of SPMD command 810. As described above, SPMD boundaries 840 may define outer parallel context 820 and inner parallel context (e.g., SPMD body 830) associated with SPMD command 810.
- SPMD command 810 may be provided to analysis logic 520.
- Analysis logic 520 may receive SPMD command 810, and may analyze SPMD command 810 to determine outer parallel context 820 and inner parallel context 830.
- analysis logic 520 may analyze SPMD command 810 to determine input variables 850 associated with SPMD command 810.
- Input variables 850 may include variables used within SPMD body 830 but before they are assigned values.
- analysis logic 520 may determine input variables 850 upon entering the spmd statement, and may attempt to transfer input variables from outer parallel context 820 into the inner parallel context (e.g., SPMD body 830).
- Analysis logic 520 may allocate one or more portions of the inner parallel context (e.g., SPMD body 830) and input variables 850 to labs 420 for parallel execution. If analysis logic 520 determines that no resources (e.g., labs 420) are available for parallel execution, as indicated by reference number 860, client 500 may sequentially execute outer parallel context 820 and SPMD body 830.
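The determination of input variables (variables used within the SPMD body before they are assigned values) may be illustrated on Python source text using the standard `ast` module. This is a sketch under stated assumptions: the function name `input_variables` is hypothetical, the body is treated as straight-line code scanned in source order without following control flow, and Python built-in names are excluded.

```python
import ast
import builtins

def input_variables(body_source):
    """Return the names that the body reads before the body assigns them."""
    assigned, inputs = set(), set()
    for stmt in ast.parse(body_source).body:
        loads, stores = [], []
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Load):
                    loads.append(node.id)
                else:
                    stores.append(node.id)
        if isinstance(stmt, ast.AugAssign) and isinstance(stmt.target, ast.Name):
            loads.append(stmt.target.id)  # "x += 1" also reads x
        # Reads are checked before this statement's own assignments take effect.
        for name in loads:
            if name not in assigned and not hasattr(builtins, name):
                inputs.add(name)
        assigned.update(stores)
    return inputs

spmd_body = "y = a * x + b\nz = y + c\nx = 0\n"
needed = input_variables(spmd_body)
```

In this example `y` is not an input variable because the body assigns it before reading it, whereas `x` is, because its first use precedes its assignment.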
- Although Fig. 8 shows an exemplary parallel processing construct, client 500 may contain fewer, different, or additional parallel processing constructs than depicted in Fig. 8.
- Fig. 9 depicts an exemplary diagram of functional components of client 500 for determining an efficient distribution scheme.
- client 500 may include SPMD optional logic 900, distribution scheme logic 910, and efficient distribution scheme identifier logic 920.
- SPMD optional logic 900 may communicate with technical computing environment 320, and may determine (e.g., based on user input via technical computing environment 320) whether not to use SPMD ("do not use SPMD" syntax 930) or to use SPMD ("use SPMD" syntax 940).
- SPMD optional logic 900 may provide "do not use SPMD" syntax 930 to distribution scheme logic 910, and may provide "use SPMD" syntax 940 to efficient distribution scheme identifier logic 920.
- Distribution scheme logic 910 may receive "do not use SPMD" syntax 930 and may generate a data distribution scheme 950 (e.g., for a distributed array provided by technical computing environment 320) without SPMD constructs. In one implementation, distribution scheme logic 910 may generate data distribution scheme 950 based on a user provided distribution scheme. Distribution scheme logic 910 may provide data distribution scheme 950 to efficient distribution scheme identifier logic 920.
- Efficient distribution scheme identifier logic 920 may receive "use SPMD" syntax 940 from SPMD optional logic 900 or may receive data distribution scheme 950 from distribution scheme logic 910, and may determine and generate an efficient data distribution scheme 960 (e.g., for a distributed array provided by technical computing environment 320) with SPMD constructs or based on data distribution scheme 950.
- efficient distribution scheme identifier logic 920 may optimize a time to solution (i.e., a time between when a user submits information (e.g., a distributed array) and when the user receives an answer) using various techniques.
- In a first exemplary technique, efficient distribution scheme identifier logic 920 may identify one or more data distribution schemes for each operation or set of operations submitted by a user, and may select an appropriate distribution scheme (e.g., efficient data distribution scheme 960) for each operation or set of operations.
- In a second exemplary technique, efficient distribution scheme identifier logic 920 may select a fastest algorithm for each operation submitted by the user.
- In a third exemplary technique, efficient distribution scheme identifier logic 920 may select appropriate resources (e.g., a number and types of labs) for each operation. In one example, the third exemplary technique may lead to some labs being idle. However, a smaller number of labs may perform a task faster due to inherent algorithmic constraints of an operation.
- efficient distribution scheme identifier logic 920 may execute the three exemplary techniques, described above, simultaneously to derive efficient data distribution scheme 960.
- Efficient data distribution scheme 960 may be used (e.g., by analysis logic 520) to allocate information (e.g., a distributed array) to two or more labs (e.g., labs 420).
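The observation that a smaller number of labs may finish a task sooner may be illustrated with a toy cost model. The model below is purely an assumption for illustration (compute time shrinks as `work / num_labs` while communication grows linearly with the number of labs); the description does not specify how time to solution is estimated, and the function names are hypothetical.

```python
def estimated_time(work, comm_cost, num_labs):
    """Toy time-to-solution estimate: divided compute plus per-lab communication."""
    return work / num_labs + comm_cost * (num_labs - 1)

def choose_num_labs(work, comm_cost, max_labs):
    """Pick the lab count with the lowest estimated time to solution."""
    return min(range(1, max_labs + 1),
               key=lambda n: estimated_time(work, comm_cost, n))

# With costly communication, three of eight labs are used and five stay idle.
best = choose_num_labs(work=100, comm_cost=10, max_labs=8)
```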
- Although Fig. 9 shows exemplary functional components of client 500, client 500 may contain fewer, different, or additional functional components than depicted in Fig. 9.
- one or more functional components of client 500 may perform one or more other tasks described as being performed by one or more other functional components of client 500.
- EXEMPLARY DISTRIBUTION SCHEME / DISTRIBUTED ARRAY SYNTAX In order to perform parallel processing with distributed arrays and their distribution schemes, the SPMD commands described herein may be used. In one exemplary implementation, distributed arrays and their distribution schemes may be created, manipulated, and parallel processed via distribution scheme and/or distributed array commands or functions described herein. Such commands may automatically implement SPMD or other parallel constructs for distribution schemes and/or distributed arrays, as described herein.
- Fig. 10 illustrates an exemplary diagram of distribution scheme commands 1000 and/or distributed array commands 1010 capable of being generated by client 500 (e.g., via technical computing environment 320).
- Distribution scheme commands 1000 may specify a layout of data onto a parallel resource set (e.g., labs 420), and may specify which parallel resource set is to be used for a distribution.
- Distribution scheme commands 1000 may encapsulate such information (e.g., distribution objects) inside a distributor object represented by a remote class 1020 (e.g., a distributor class).
- the distributor syntax may be a place holder (stub or proxy) for the codistributor syntax, and may perform operations using the SPMD block syntax to change the parallel context from an outer parallel context to an inner parallel context. In other words, the distributor syntax may not store data, but may rely on state information provided by the codistributor syntax.
- Distributed array commands 1010 may specify a layout of data onto a parallel resource set (e.g., labs 420), and may specify which parallel resource set is to be used for a distributed array.
- Distributed array commands 1010 may encapsulate such information (e.g., distributed array objects) inside a distributed object represented by a distributed class provided outside a distributed array's parallel context, as indicated by reference number 1040.
- Distributed array commands 1010 (e.g., the distributed class) may be provided to analysis logic 520, and analysis logic may create a parallel distributed object represented by a codistributed class.
- the codistributed class may be provided inside a distributed array's parallel context, as indicated by reference number 1050.
- the distributed syntax may include a remote reference to a distributed array, and methods of the distributed syntax may remotely invoke methods of the codistributed syntax.
- the codistributed syntax may include an instance of the distributed array. Codistributed syntax may reside on each lab, and may store a local portion of the distributed array data as a private field. The methods of the codistributed syntax may perform computations on the data.
- the distributed syntax may serve as a stub or a proxy that may remotely invoke the methods of the codistributed syntax (e.g., via SPMD blocks).
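The relationship between the distributed syntax (a stub or proxy) and the codistributed syntax (which holds the local data on each lab) may be sketched as a pair of classes. This Python sketch is illustrative only: in-process objects stand in for remote references, the cyclic layout (element i on lab i modulo the number of labs) is an assumption, and the method names are hypothetical.

```python
class Codistributed:
    """Resides on one lab and stores that lab's local portion of the array."""
    def __init__(self, local_part):
        self._local_part = list(local_part)  # private field holding local data

    def local_sum(self):
        # Computation is performed on the locally stored data.
        return sum(self._local_part)

class Distributed:
    """Stub held outside the parallel context; keeps references, not a data field."""
    def __init__(self, data, numlabs):
        # Assumed cyclic layout: element i is placed on lab i % numlabs.
        self._remote_refs = [Codistributed(data[i::numlabs]) for i in range(numlabs)]

    def sum(self):
        # Each local_sum call stands in for a remote invocation on one lab
        # (e.g., via an SPMD block); the partial results are then combined.
        return sum(ref.local_sum() for ref in self._remote_refs)

total = Distributed(list(range(1, 11)), numlabs=4).sum()
```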
- analysis logic 520 may automatically create the codistributor syntax and the codistributed syntax based on the distributor syntax and the distributed syntax, respectively, as indicated by reference number 1060.
- analysis logic 520 may transform distribution scheme commands 1000 (e.g., the distributor syntax) and distributed array commands 1010 (e.g., the distributed syntax) into parallel-based syntax (e.g., the codistributor syntax and the codistributed syntax).
- Analysis logic 520 may provide the codistributor syntax and the codistributed syntax to one or more labs 420 for parallel execution.
- Although Fig. 10 shows exemplary distribution scheme commands and/or distributed array commands, client 500 may contain fewer, different, or additional distribution scheme commands and/or distributed array commands than depicted in Fig. 10.
- Fig. 11 depicts an exemplary diagram of distribution scheme commands 1100 capable of being generated by client 500 (e.g., via technical computing environment 320).
- distribution scheme commands 1100 may include an outer parallel context 1110, SPMD boundaries 1120, and an inner parallel context 1130.
- Outer parallel context 1110 may include syntax or code provided outside a spmd statement and an end statement (e.g., outside SPMD boundaries 1120).
- outer parallel context 1110 may be executed sequentially (e.g., by client 500), and may include distributor syntax.
- SPMD boundaries 1120 may be defined by the spmd statement and the end statement of the SPMD command.
- SPMD boundaries 1120 may define the parallel contexts (outer parallel context 1110 and inner parallel context 1130) associated with the SPMD command.
- Inner parallel context 1130 may include syntax or code provided inside the spmd statement and the end statement (e.g., inside SPMD boundaries 1120). In one exemplary implementation, inner parallel context 1130 may be provided to two or more labs (e.g., labs 420), and may be executed in parallel by the two or more labs. Inner parallel context 1130 may include codistributor syntax.
- outer parallel context 1110 may be provided to analysis logic 520 (not shown), and analysis logic 520 may automatically create and identify SPMD boundaries 1120 and inner parallel context 1130.
- analysis logic 520 may analyze outer parallel context 1110 to determine input variables associated with outer parallel context 1110, and may execute outer parallel context 1110 sequentially on client 500.
- Analysis logic 520 may determine and allocate one or more portions of inner parallel context 1130 and the input variables to labs 420 for parallel execution. If analysis logic 520 determines that no resources (e.g., labs 420) are available for parallel execution, client 500 may sequentially execute outer parallel context 1110 and inner parallel context 1130.
- Fig. 11 shows exemplary distribution scheme commands
- client 500 may contain fewer, different, or additional distribution scheme commands than depicted in Fig. 11.
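The split between the outer and inner parallel contexts, including the sequential fallback when no labs are available, can be illustrated with a minimal Python sketch. This is not the patented implementation: `run_spmd`, `partial_sum`, and the use of a thread pool as a stand-in for labs 420 are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_spmd(inner_body, inputs, num_labs):
    """Run inner_body once per lab (hypothetical helper).

    If no labs are available, the client runs the inner context itself,
    sequentially, as a single lab.
    """
    if num_labs < 1:
        return [inner_body(1, 1, inputs)]
    with ThreadPoolExecutor(max_workers=num_labs) as pool:
        futures = [
            pool.submit(inner_body, labindex, num_labs, inputs)
            for labindex in range(1, num_labs + 1)
        ]
        return [f.result() for f in futures]


# Outer parallel context: executed sequentially on the client.
data = list(range(1, 9))  # input variable passed into the inner context


def partial_sum(labindex, numlabs, values):
    # Inner parallel context: each lab sums every numlabs-th element.
    return sum(values[labindex - 1::numlabs])


parallel_total = sum(run_spmd(partial_sum, data, num_labs=4))
sequential_total = sum(run_spmd(partial_sum, data, num_labs=0))
```

Both totals are the same, which reflects the idea that the program's result should not depend on whether parallel resources were found.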
- the distributor object may determine a parallel context that owns a distributed array.
- client 500 (e.g., analysis logic 520) may select the parallel context by choosing which of the distributor objects is called.
- Fig. 12 illustrates an exemplary distributor placement application program interface (API) capable of being provided by client 500. As shown, client 500 may provide distribution scheme commands 1200 and a distributor placement API 1210.
- Distribution scheme commands 1200 may include a codistributor object (e.g., distL) and a distributor object (e.g., distR). Distribution scheme commands 1200 may be provided to distributor placement API 1210.
- Distributor placement API 1210 may receive distribution scheme commands 1200, and may choose a parallel context for distributed arrays by calling an appropriate distributor object. For example, in one implementation, distributor placement API 1210 may determine that distL is a non-remote class (e.g., a codistributor class), as indicated by reference number 1230, and may choose a local (or non-remote) parallel context for distributed arrays associated with distL. Distributor placement API 1210 may determine that distR is a remote class (e.g., a distributor class), as indicated by reference number 1240, and may choose a remote parallel context for distributed arrays associated with distR.
- Although Fig. 12 shows exemplary functions associated with distributor placement API 1210, in other implementations, distributor placement API 1210 may contain fewer, different, or additional functions than depicted in Fig. 12.
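The placement decision described for Fig. 12 amounts to dispatching on the class of the distributor object. The following Python sketch shows one way this could look; the class names, the `remote` attribute, and `choose_parallel_context` are hypothetical and are not taken from the patent.

```python
class Codistributor:
    """Hypothetical non-remote distribution scheme class."""
    remote = False


class Distributor:
    """Hypothetical remote distribution scheme class."""
    remote = True


def choose_parallel_context(dist):
    """Pick the context for arrays built with dist (assumed rule:
    remote class -> remote context, non-remote class -> local context)."""
    return "remote" if dist.remote else "local"


distL = Codistributor()
distR = Distributor()
contexts = {
    "distL": choose_parallel_context(distL),
    "distR": choose_parallel_context(distR),
}
```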
- Fig. 13 depicts an exemplary diagram of distribution scheme commands 1300 and 1310 capable of being provided by client 500.
- distributor objects of class distributor and codistributor may implement a SPMD conversion.
- a distributor object of class distributor 1320 provided outside a SPMD block may be automatically converted (e.g., via analysis logic 520) into a distributor object of class codistributor 1330 if distributor object of class distributor 1320 is used inside the SPMD block.
- distributor object 1320 (e.g., dist1d) may be automatically converted (e.g., via analysis logic 520) into distributor object of class codistributor 1330.
- a distributor object of class codistributor 1340 provided inside a SPMD block may be automatically converted (e.g., via analysis logic 520) into a distributor object of class distributor 1350 if distributor object of class codistributor 1340 is used outside the SPMD block.
- distributor object 1340 (e.g., dist1d) may be automatically converted (e.g., via analysis logic 520) into distributor object of class distributor 1350.
- Fig. 13 shows exemplary distribution scheme commands
- client 500 may contain fewer, different, or additional distribution scheme commands than depicted in Fig. 13.
- Fig. 14 illustrates an exemplary diagram of distribution scheme commands 1400 and 1410 capable of being provided by client 500.
- Distribution scheme commands 1400 and/or 1410 may provide a mechanism (e.g., a distribution property of a distributor object) for obtaining a specific distribution scheme object.
- distribution scheme commands 1400 may include a distributor object 1420 provided outside a SPMD block and specific distribution scheme objects 1430 (e.g., distributionDimension). Information specific to a distribution scheme (e.g., provided by distributor object 1420) may be obtained (e.g., via analysis logic 520) by using specific distribution objects 1430 outside the SPMD block.
- distribution scheme commands 1410 may include a distributor object 1440 provided inside a SPMD block and specific distribution scheme objects 1450 (e.g., distributionDimension). Information specific to a distribution scheme (e.g., provided by distributor object 1440) may be obtained (e.g., via analysis logic 520) by using specific distribution objects 1450 inside the SPMD block.
- Fig. 14 shows exemplary distribution scheme commands
- client 500 may contain fewer, different, or additional distribution scheme commands than depicted in Fig. 14.
- distribution scheme commands 1400 and/or 1410 may provide a mechanism (e.g., a distribution property of a distributor object) for accessing functions common to all distribution schemes (e.g., a zeros() function).
- Fig. 15 depicts an exemplary diagram of functional components of client 500 for handling user-defined distribution schemes.
- client 500 may include abstract class deriver logic 1500, SPMD conversion API logic 1510, and distribution scheme methods logic 1520.
- Abstract class deriver logic 1500 may include hardware, software, and/or a combination of hardware and software based logic that receives a user-defined distribution scheme 1530, and creates a class (e.g., distributorBase) 1540 that is a subclass of a remote distribution scheme or a class (e.g., codistributorBase) 1550 that is a subclass of a non-remote distribution scheme.
- Abstract class deriver 1500 may provide class (e.g., distributorBase) 1540 or class (e.g., codistributorBase) 1550 to SPMD conversion API logic 1510.
- SPMD conversion API logic 1510 may include hardware, software, and/or a combination of hardware and software based logic that receives class (e.g., distributorBase) 1540 or class (e.g., codistributorBase) 1550 from abstract class deriver logic 1500, and integrates class (e.g., distributorBase) 1540 or class (e.g., codistributorBase) 1550 by implementing a SPMD conversion API.
- the SPMD conversion API may include a first function that may be invoked in the inner parallel context when crossing an end statement, and may return a function handle to a variant constructor function and input data that may be used by a second function.
- the second function may invoke the variant constructor function in the outer parallel context when crossing an end statement, and may return a variant (or a reference).
- the SPMD conversion API may include a third function that may be invoked in an outer parallel context when crossing a spmd statement, and may return a function handle to a function that updates actual data in the inner parallel context and input data for that function.
- the update function and input data may be used by a fourth function.
- the fourth function may invoke the update function in the inner parallel context when crossing a spmd statement, and may return an input variable (or data).
- the fourth function may receive the input data, may update the input data, and may return updated data.
- SPMD conversion API logic 1510 may provide a SPMD converted distribution scheme 1560, based on implementation of the SPMD conversion API, to distribution scheme methods logic 1520.
- Distribution scheme methods logic 1520 may include hardware, software, and/or a combination of hardware and software based logic that receives SPMD converted distribution scheme 1560, and implements a distribution scheme 1570.
- distribution scheme methods logic 1520 may implement methods that perform parallel computations based on distribution scheme 1570.
- Fig. 15 shows exemplary functional components of client 500
- client 500 may contain fewer, different, or additional functional components than depicted in Fig. 15.
- one or more functional components of client 500 may perform one or more other tasks described as being performed by one or more other functional components of client 500.
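The four functions of the SPMD conversion API can be sketched in Python as two pairs of hooks, one pair per boundary crossing. All names below (`on_end_inner`, `on_end_outer`, `on_spmd_outer`, `on_spmd_inner`, `LAB_STORE`) are hypothetical, and a module-level dictionary stands in for data that would really live on the labs.

```python
# Stand-in for storage that lives on the labs (the inner parallel context).
LAB_STORE = {}


class Distributor:
    """Hypothetical remote variant: holds only a key into LAB_STORE."""

    def __init__(self, key):
        self.key = key

    def on_spmd_outer(self):
        """Third function: called in the outer context when crossing `spmd`.
        Returns an update function plus the input data for it."""
        return LAB_STORE.__getitem__, self.key


class Codistributor:
    """Hypothetical non-remote variant: holds the actual scheme data."""

    def __init__(self, dimension):
        self.dimension = dimension

    def on_end_inner(self):
        """First function: called in the inner context when crossing `end`.
        Returns a variant constructor plus the input data for it."""
        key = id(self)
        LAB_STORE[key] = self
        return Distributor, key


def on_end_outer(constructor, data):
    """Second function: runs in the outer context and builds the variant."""
    return constructor(data)


def on_spmd_inner(update, data):
    """Fourth function: runs in the inner context and returns the data."""
    return update(data)


local = Codistributor(dimension=2)
remote = on_end_outer(*local.on_end_inner())       # leaving the spmd block
restored = on_spmd_inner(*remote.on_spmd_outer())  # entering a later block
```

The round trip returns the original non-remote object, which is the behavior the automatic distributor/codistributor conversion of Fig. 13 relies on.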
- Fig. 16 illustrates an exemplary diagram of distributed array commands 1600 capable of being provided by client 500 (e.g., via technical computing environment 320).
- distributed array commands 1600 may include a distributed object and a codistributed object.
- the distributed object may provide a remote reference to a distributed array, as indicated by reference number 1610, and may remotely invoke methods of the codistributed object, as indicated by reference number 1620.
- the codistributed object may include an instance of the distributed array 1630 (e.g., the distributed array remotely referenced by the distributed object), and may reside on one or more labs (e.g., lab 420-1), as indicated by reference number 1640.
- the codistributed object may store a local portion of distributed array data 1650 (e.g., as a private field) on the one or more labs (e.g., lab 420-1). Methods associated with the codistributed object may perform computations on distributed array data portion 1650.
- Fig. 16 shows exemplary distributed array commands
- client 500 may contain fewer, different, or additional distributed array commands than depicted in Fig. 16.
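The relationship between the distributed object (a remote reference on the client) and the codistributed object (the instance holding local data on a lab) resembles a proxy pattern. The Python sketch below assumes in-process objects in place of real labs; `Distributed`, `Codistributed`, `_invoke`, and the cyclic split of the data are illustrative choices, not the patent's API.

```python
class Codistributed:
    """Lives on one lab; keeps its local portion as a private field."""

    def __init__(self, local_part):
        self._local_part = list(local_part)

    def local_sum(self):
        # Computation performed on the local portion only.
        return sum(self._local_part)


class Distributed:
    """Client-side stub holding references to the per-lab objects."""

    def __init__(self, data, num_labs):
        # Assumed layout: lab i receives every num_labs-th element.
        self._labs = [Codistributed(data[i::num_labs]) for i in range(num_labs)]

    def _invoke(self, method_name):
        # Stand-in for remotely invoking a method on every lab.
        return [getattr(lab, method_name)() for lab in self._labs]

    def sum(self):
        return sum(self._invoke("local_sum"))


D = Distributed(range(10), num_labs=3)
total = D.sum()
```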
- Fig. 17 depicts an exemplary diagram of distributed array commands 1700 capable of being provided by client 500.
- distributed array commands 1700 may include an inner parallel context 1710 and dereference syntax 1720. Execution of some methods associated with an instance of distributed arrays may return replicated values. Inner parallel context 1710 may automatically generate such methods, and the returned values (e.g., p) may become remote objects inside inner parallel context 1710. That is, if a call is made in the outer parallel context of distributed array commands 1700, a value coming out of inner parallel context 1710 may be converted to a remote object.
- Fig. 18 illustrates an exemplary diagram of distributed array commands 1800 capable of being provided by client 500.
- distributed objects of class distributed and codistributed may implement a SPMD conversion.
- a distributed object of class distributed 1810 provided outside a SPMD block may be automatically converted (e.g., via analysis logic 520) into a distributed object of class codistributed 1820 if distributed object of class distributed 1810 is used inside the SPMD block.
- distributed object 1810 (e.g., D1) may be automatically converted (e.g., via analysis logic 520) into distributed object of class codistributed 1820.
- a distributed object of class codistributed 1830 provided inside a SPMD block may be automatically converted (e.g., via analysis logic 520) into a distributed object of class distributed 1840 if distributed object of class codistributed 1830 is used outside the SPMD block.
- distributed object 1830 (e.g., D1) may be automatically converted (e.g., via analysis logic 520) into distributed object of class distributed 1840.
- Fig. 18 shows exemplary distributed array commands
- client 500 may contain fewer, different, or additional distributed array commands than depicted in Fig. 18.
- Fig. 19 depicts an exemplary diagram of distributed array commands 1900 and 1910 capable of being provided by client 500.
- Distributed array commands 1900 and 1910 may mix distributed objects and parallel distributed objects.
- distributed array commands 1900 and 1910 may include a distributed object 1920 (e.g., D1 - class distributed) and a parallel distributed object 1930 (e.g., D2 - class codistributed).
- distributed object 1920 and parallel distributed object 1930 may be mixed inside a SPMD block, as indicated by reference number 1940, and may be mixed outside the SPMD block, as indicated by reference number 1950.
- Distributed objects (e.g., distributed object 1920) may need to cross the SPMD block boundary and enter the inner parallel context if they are to be mixed with parallel distributed objects.
- Parallel distributed objects (e.g., parallel distributed object 1930) may need to be taken out of the inner parallel context, across the SPMD block boundary, if they are to be mixed with distributed objects.
- An attempt to mix distributed objects and parallel distributed objects in the wrong context may produce an error 1960.
- attempting to mix a distributed object (e.g., D1) and a parallel distributed object (e.g., D2) in a second SPMD block may produce error 1960.
- Attempting to mix another parallel distributed object (e.g., D0) with the distributed object (e.g., D1) or the parallel distributed object (e.g., D2) in the second SPMD block may also produce error 1960.
- Fig. 19 shows exemplary distributed array commands
- client 500 may contain fewer, different, or additional distributed array commands than depicted in Fig. 19.
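The mixing rule can be sketched as a context check performed before any operation combines two arrays. In this Python illustration each array records the context that owns it; `DistArray`, `cross_boundary`, and `MixingError` are hypothetical names, and the explicit `cross_boundary` call stands in for the automatic conversion at the SPMD block boundary.

```python
class MixingError(Exception):
    """Raised when arrays from different parallel contexts are combined."""


class DistArray:
    def __init__(self, values, context):
        self.values = list(values)
        # "outer" models class distributed, "inner" models class codistributed.
        self.context = context

    def cross_boundary(self, target_context):
        # Models conversion when the object crosses the SPMD block boundary.
        return DistArray(self.values, target_context)

    def __add__(self, other):
        if self.context != other.context:
            raise MixingError(
                f"cannot mix {self.context} and {other.context} objects"
            )
        summed = [a + b for a, b in zip(self.values, other.values)]
        return DistArray(summed, self.context)


D1 = DistArray([1, 2, 3], "outer")
D2 = DistArray([10, 20, 30], "inner")

mixed = D1.cross_boundary("inner") + D2  # allowed: both in the inner context

try:
    D1 + D2  # wrong context: the objects have not crossed the boundary
    error_raised = False
except MixingError:
    error_raised = True
```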
- Fig. 20 illustrates an exemplary diagram of distributed array commands 2000 capable of being provided by client 500.
- distributed array commands 2000 may include distributed objects with one or more input arguments.
- distributed array commands 2000 may include a distributed object with one input argument 2010, distributed objects with two input arguments 2020, a distributed object with three input arguments 2030, and/or a distributed object with four input arguments 2040.
- the distributed object with one input argument 2010 may include an argument 2050 that includes data to be stored in a distributed array.
- the data may include a built-in data type that may be sent to one or more labs (e.g., labs 420) and replicated.
- the data may include a remote reference to an object that resides in a different parallel context.
- the distributed objects with two input arguments 2020 may include argument 2050 and an argument 2060 that includes a distributor object of class distributor.
- the distributed objects with two input arguments 2020 may include argument 2050 and an argument 2070 that indicates a conversion from a single built-in array is requested.
- the distributed object with three input arguments 2030 may include arguments 2050-2070.
- the distributed object with four input arguments 2040 may include arguments 2050-2070 and an argument 2080 that includes an index of a lab (e.g., a labindex).
- Fig. 20 shows exemplary distributed array commands
- client 500 may contain fewer, different, or additional distributed array commands than depicted in Fig. 20.
- Fig. 21 depicts an exemplary diagram of distributed array commands 2100 capable of being provided by client 500.
- distributed array commands 2100 may include parallel distributed objects with one or more input arguments.
- distributed array commands 2100 may include a parallel distributed object with one input argument 2110, parallel distributed objects with two input arguments 2120, a parallel distributed object with three input arguments 2130, and/or a parallel distributed object with four input arguments 2140.
- the parallel distributed object with one input argument 2110 may include an argument 2150 that includes data to be stored in a distributed array.
- the data may include a built-in data type that may be sent to one or more labs (e.g., labs 420) and replicated.
- the parallel distributed objects with two input arguments 2120 may include argument 2150 and an argument 2160 that includes a distributor object of class codistributor. In another example, the parallel distributed objects with two input arguments 2120 may include argument 2150 and an argument 2170 that indicates a conversion from a single built-in array is requested.
- the parallel distributed object with three input arguments 2130 may include arguments 2150-2170.
- the parallel distributed object with four input arguments 2140 may include arguments 2150-2170 and an argument 2180 that includes an index of a lab (e.g., a labindex).
- Fig. 21 shows exemplary distributed array commands
- client 500 may contain fewer, different, or additional distributed array commands than depicted in Fig. 21.
- Fig. 22 illustrates an exemplary diagram of a data placement policy for distribution scheme commands and/or distributed array commands 2200 capable of being provided by client 500.
- distribution scheme/distributed array commands 2200 may include a non-remote distributor object 2210, a remote distributor object 2220, a distributed class object 2230, a codistributed class object 2240, and/or an error 2250.
- a distributor object may determine where data associated with a distributed array may be placed. For example, if a distributor object is non-remote (e.g., non-remote distributor object 2210), a resulting distributed array (e.g., codistributed class object 2240) may be non-remote. However, if a distributor object is non-remote (e.g., non-remote distributor object 2210), error 2250 may be generated if any dimensions of the resulting distributed array are remote. If a distributor object is remote (e.g., remote distributor object 2220), a resulting distributed array (e.g., distributed class object 2230) may be remote.
- Fig. 22 shows exemplary distribution scheme/distributed array commands
- client 500 may contain fewer, different, or additional distribution scheme/distributed array commands than depicted in Fig. 22.
- Fig. 23 depicts an exemplary diagram of dimensional constructors 2300 capable of being provided by client 500.
- dimensional constructors 2300 may include dimensional constructors for distributed arrays, such as eye(), ones(), zeros(), Inf(), NaN(), false(), true(), rand(), randn(), sparse(m, n, dist), speye(), spones(), sprand(), sprandn(), sprandsym(), cell(), etc.
- dimensional constructors 2300 may include other dimensional constructors for distributed arrays.
- dimensional constructors 2300 may include distributed array dimensions as arguments, as indicated by reference number 2310, and a distributor object as an argument, as indicated by reference number 2320. Dimensional constructors 2300 may handle storage attributes (e.g., classes) associated with a distributed array, as indicated by reference number 2330, and may be integrated with distributed and/or codistributed classes, as indicated by reference number 2340.
- Fig. 23 shows exemplary dimensional constructors
- client 500 may contain fewer, different, or additional dimensional constructors than depicted in Fig. 23.
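A dimensional constructor that takes array dimensions, a distributor object, and a storage class might look like the Python sketch below. `Codistributor1D`, its column-wise split, and this `zeros` signature are assumptions for illustration; the result is returned as a list of per-lab local portions rather than as a real distributed array.

```python
class Codistributor1D:
    """Hypothetical scheme: split columns across labs as evenly as possible."""

    def __init__(self, num_labs):
        self.num_labs = num_labs

    def local_columns(self, n, labindex):
        # The first `extra` labs each take one additional column.
        base, extra = divmod(n, self.num_labs)
        return base + (1 if labindex <= extra else 0)


def zeros(m, n, dist, dtype=float):
    """Return one m-by-k block of zeros per lab, where k comes from dist."""
    return [
        [[dtype(0)] * dist.local_columns(n, lab) for _ in range(m)]
        for lab in range(1, dist.num_labs + 1)
    ]


portions = zeros(2, 5, Codistributor1D(num_labs=2), dtype=int)
widths = [len(block[0]) for block in portions]
```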
- Fig. 24 illustrates an exemplary diagram of distribution scheme and/or distributed array commands 2400 capable of being provided by client 500.
- distribution scheme/distributed array commands 2400 may transfer distributed arrays and associated distributor objects.
- distribution scheme/distributed array commands 2400 may include a distributor object (e.g., dist) 2410 for an inner parallel context, a distributed object (e.g., D) 2420, a distributor object (e.g., distributor(D)) 2430, codistributor objects (e.g., dist and distributor(D) inside a SPMD block) 2440, a codistributor object (e.g., dist1) 2450, a codistributed object (e.g., D1) 2460, a distributed object (e.g., D1 outside the SPMD block) 2470, and a distributor object (e.g., distributor(D1)) 2480.
- distributed arrays and distributor objects may be in-sync, as indicated by reference number 2490.
- a distributed object e.g., objects 2420 and/or 2470
- a distributor object e.g., objects 2410, 2430, and/or 2480
- a codistributed object e.g., object 2460
- a distributor object e.g., objects 2410, 2430, and/or 2480
- Fig. 24 shows exemplary distribution scheme/distributed array commands
- client 500 may contain fewer, different, or additional distribution scheme/distributed array commands than depicted in Fig. 24.
- Fig. 25 depicts an exemplary diagram of distribution scheme and/or distributed array commands 2500 capable of being provided by client 500.
- distribution scheme/distributed array commands 2500 may show interactions with nested SPMD blocks.
- a distributor object (e.g., dist) may exist in one parallel context, as indicated by reference number 2510.
- a distributor object may be referenced with distributor remotes (or remote references) outside of the one parallel context, as indicated by reference number 2520.
- a distributor object may enter a nested SPMD block from its own parallel context, as indicated by reference number 2530.
- a distributed array may exist in one parallel context, as indicated by reference number 2540.
- references to a distributed array outside of the distributed array's context are not permissible in a nested SPMD block, as indicated by reference number 2550.
- Functions associated with distributed arrays may be collective and may include a transfer of a distributed array into a nested SPMD block (e.g., a SPMD block within another SPMD block).
- a distributed array and distributor functions may use the same data placement rules regardless of the nesting depth of the SPMD blocks, as indicated by reference number 2560.
- Fig. 25 shows exemplary distribution scheme/distributed array commands
- client 500 may contain fewer, different, or additional distribution scheme/distributed array commands than depicted in Fig. 25.
- Fig. 26 illustrates an exemplary diagram of distribution scheme and/or distributed array commands 2600 capable of being provided by client 500.
- distribution scheme/distributed array commands 2600 may generate an error 2610 to prevent parallel error signaling.
- Error 2610 may be generated when a codistributed object (e.g., D) is used in a nested SPMD block.
- the SPMD conversion API, described above, may convert an object into its remote representation before the SPMD code is executed.
- the SPMD conversion API may generate error 2610 unconditionally. By doing so, the erroneous use of the object in a nested SPMD block may be signaled by client 500, and may prevent parallel error signaling (e.g., by labs 420).
- Fig. 26 shows exemplary distribution scheme/distributed array commands
- client 500 may contain fewer, different, or additional distribution scheme/distributed array commands than depicted in Fig. 26.
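Because conversion happens on the client before any SPMD code reaches the labs, raising the error inside the conversion hook means no lab ever starts the nested block. The Python sketch below models that ordering; `to_remote`, `run_nested_spmd`, and `NestedSpmdError` are hypothetical names.

```python
class NestedSpmdError(Exception):
    """Signaled once, on the client, for misuse in a nested spmd block."""


class Codistributed:
    def to_remote(self):
        # Conversion hook for nested blocks: always fails (unconditional error).
        raise NestedSpmdError(
            "codistributed objects cannot be used in a nested spmd block"
        )


lab_calls = []  # records which labs actually executed the body


def body(labindex, args):
    lab_calls.append(labindex)
    return labindex


def run_nested_spmd(variables, inner_body, num_labs):
    # Step 1 (client): convert every input before any lab is contacted.
    converted = [v.to_remote() if hasattr(v, "to_remote") else v for v in variables]
    # Step 2 (labs): only reached if every conversion succeeded.
    return [inner_body(lab, converted) for lab in range(1, num_labs + 1)]


try:
    run_nested_spmd([Codistributed()], body, num_labs=4)
    outcome = "ran on labs"
except NestedSpmdError:
    outcome = "client error"
```

A single error raised on the client replaces what could otherwise be one error per lab.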
- Fig. 27 depicts an exemplary diagram of distribution scheme and/or distributed array commands 2700 capable of being provided by client 500.
- distribution scheme/distributed array commands 2700 may reduce a remote call outside a parallel processing construct. If code that uses distributed arrays is inside a SPMD block, remote method invocation may be reduced. In other words, surrounding a piece of code with a SPMD block may optimize the code.
- Distributed arrays may execute faster inside a SPMD block because of latency associated with connecting to remote labs for every function call versus sending the entire contents (e.g., distributed arrays) of the SPMD block and executing the contents remotely at one time. For example, as shown in Fig.
- another function (e.g., testerSPMD())
- the labs may execute the single call at one time.
- Fig. 27 shows exemplary distribution scheme/distributed array commands
- client 500 may contain fewer, different, or additional distribution scheme/distributed array commands than depicted in Fig. 27.
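The latency argument can be made concrete by counting client-to-lab round trips. In this Python sketch, `LabPool` is a hypothetical stand-in for a set of labs that keeps one value per lab and increments a counter on every remote invocation; the two operations are arbitrary examples.

```python
class LabPool:
    """Hypothetical pool of labs; each lab holds one value of state."""

    def __init__(self, num_labs):
        self.state = [0] * num_labs
        self.round_trips = 0

    def remote_call(self, func):
        # One client-to-labs round trip, applied to every lab's state.
        self.round_trips += 1
        self.state = [
            func(lab, value) for lab, value in enumerate(self.state, start=1)
        ]


def add_index(labindex, value):
    return value + labindex


def double(labindex, value):
    return value * 2


# Outside a SPMD block: every operation is its own remote invocation.
outside = LabPool(3)
outside.remote_call(add_index)
outside.remote_call(double)

# Inside a SPMD block: the whole body is sent and executed in one call.
inside = LabPool(3)
inside.remote_call(lambda lab, v: double(lab, add_index(lab, v)))
```

Both pools end in the same state, but the version wrapped in a single block needs one round trip instead of two.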
- Fig. 28 illustrates an exemplary diagram of functional components of client 500 for delegating distributed array methods to distributor objects.
- client 500 may include common distribution creator logic 2800 and redistributor logic 2810.
- Common distribution creator 2800 may include hardware, software, and/or a combination of hardware and software based logic that receives a first custom distribution scheme 2820 (e.g., from a first user), and receives a second custom distribution scheme 2830 (e.g., from a second user). Common distribution creator 2800 may redistribute first custom distribution scheme 2820 and second custom distribution scheme 2830 into a common distribution 2840, and may provide common distribution 2840 to redistributor logic 2810.
- Redistributor logic 2810 may include hardware, software, and/or a combination of hardware and software based logic that receives common distribution 2840 from common distribution creator 2800, and redistributes common distribution 2840 to a target distribution scheme 2850.
- Target distribution scheme 2850 may be used in place of first custom distribution scheme 2820 and second custom distribution scheme 2830.
- Fig. 28 shows exemplary functional components of client 500
- client 500 may contain fewer, different, or additional functional components than depicted in Fig. 28.
- one or more functional components of client 500 may perform one or more other tasks described as being performed by one or more other functional components of client 500.
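One way to picture the two-stage path (custom schemes to a common distribution, then to a target scheme) is the Python sketch below. The flattened list used as the common distribution and the even contiguous blocks used as the target scheme are both assumptions; the patent does not specify either layout.

```python
def to_common(chunks):
    """Bring per-lab chunks of any custom scheme into one common layout
    (assumed here to be a single flat sequence in global order)."""
    return [x for chunk in chunks for x in chunk]


def redistribute(common, num_labs):
    """Redistribute the common layout to the target scheme
    (assumed here: contiguous blocks that are as even as possible)."""
    base, extra = divmod(len(common), num_labs)
    out, start = [], 0
    for lab in range(num_labs):
        size = base + (1 if lab < extra else 0)
        out.append(common[start:start + size])
        start += size
    return out


first = [[1, 2, 3, 4], [5], [6]]          # first custom scheme (uneven)
second = [[10], [20, 30], [40, 50, 60]]   # second custom scheme (uneven)

first_t = redistribute(to_common(first), 3)
second_t = redistribute(to_common(second), 3)

# With matching layouts, each lab can combine its local portions directly.
elementwise = [[a + b for a, b in zip(x, y)] for x, y in zip(first_t, second_t)]
```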
- Figs. 29-45 depict flow charts associated with an exemplary process 2900 according to implementations described herein.
- process 2900 may be performed by client 500.
- process 2900 may be performed by another device or combination of devices (e.g., client 500 in conjunction with web service 580).
- process 2900 may begin with initiation of a single programming language presentation of distributed arrays.
- parallel processing API 590 may be used to provide or initiate a single programming language presentation of distributed arrays.
- One or more data distribution schemes for executing a program may be identified, via the single programming language (block 2920).
- efficient distribution scheme identifier logic 920 may identify one or more data distribution schemes for each operation or set of operations (e.g., main program 545) submitted by a user.
- an optimum data distribution scheme may be automatically selected, via the single programming language, from the one or more identified data distribution schemes (block 2930), and the program may be transformed, via the single programming language, into a parallel program based on the optimum distribution scheme (block 2940).
- efficient distribution scheme identifier logic 920 may select an appropriate distribution scheme (e.g., efficient data distribution scheme 960) for each operation or set of operations, may select a fastest algorithm for each operation submitted by the user, and may select appropriate resources (e.g., a number and types of labs) for each operation.
- Efficient data distribution scheme 960 may be used (e.g., by analysis logic 520) to allocate information (e.g., a distributed array) to two or more labs (e.g., labs 420).
- Distribution scheme commands 1000 may specify a layout of data onto a parallel resource set (e.g., labs 420), and may specify which parallel resource set is to be used for a distribution.
- Distributed array commands 1010 may specify a layout of data onto a parallel resource set (e.g., labs 420), and may specify which parallel resource set is to be used for a distributed array.
- Analysis logic 520 may automatically create the codistributor syntax and the codistributed syntax based on the distributor syntax and the distributed syntax, respectively, as indicated by reference number 1060.
- analysis logic 520 may transform distribution scheme commands 1000 (e.g., the distributor syntax) and distributed array commands (e.g., the distributed syntax) into parallel-based syntax (e.g., the codistributor syntax and the codistributed syntax).
- one or more portions of the parallel program may be allocated to one or more labs for parallel execution (block 2950).
- analysis logic 520 may provide the automatically created, parallel-based syntax (e.g., the codistributor syntax and the codistributed syntax) to one or more labs 420 for parallel execution.
- one or more results associated with parallel execution of the one or more program portions may be received from the one or more labs (block 2960), and the one or more results may be provided to the program (block 2970).
- results provider 540 of client 500 may receive results 570 from the labs, and may provide results 570 to program provider 510.
- results provider 540 may combine results 570 into a single result, and may provide the single result to program provider 510.
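The overall flow of process 2900 (identify candidate schemes, select one, split the work, run the portions, combine the results) can be sketched in Python as follows. The two candidate schemes, the load-imbalance criterion used to pick the "optimum" scheme, and the in-process loop standing in for labs are all assumptions; the block numbers in the comments are an approximate mapping only.

```python
def cyclic(data, num_labs):
    # Candidate scheme: lab i receives every num_labs-th element.
    return [data[i::num_labs] for i in range(num_labs)]


def single_lab(data, num_labs):
    # Candidate scheme: everything on the first lab.
    return [data] + [[] for _ in range(num_labs - 1)]


def imbalance(portions):
    # Assumed cost: difference between the largest and smallest portion.
    sizes = [len(p) for p in portions]
    return max(sizes) - min(sizes)


def run_process(data, operation, combine, num_labs):
    schemes = {"cyclic": cyclic, "single_lab": single_lab}   # cf. block 2920
    best = min(
        schemes, key=lambda name: imbalance(schemes[name](data, num_labs))
    )                                                        # cf. block 2930
    portions = schemes[best](data, num_labs)                 # cf. blocks 2940-2950
    results = [operation(p) for p in portions]               # cf. block 2960
    return best, combine(results)                            # cf. block 2970


scheme, total = run_process(list(range(1, 11)), sum, sum, num_labs=4)
```

Here the per-lab partial sums are combined into a single result before being handed back, mirroring the role described for results provider 540.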
- Process block 2930 may include the process blocks illustrated in Fig. 30. As shown in Fig. 30, process block 2930 may include receiving a defined data distribution scheme (block 3000), creating a class that is a subclass of a distributor object or a parallel distributor object associated with the defined data distribution scheme (block 3010), and identifying a parallel construct or function of the parallel program based on the created class (block 3020).
- abstract class deriver logic 1500 may receive user-defined distribution scheme 1530, and may create class (e.g., distributorBase) 1540 that is a subclass of a remote distribution scheme or class (e.g., codistributorBase) 1550 that is a subclass of a non-remote distribution scheme.
- Abstract class deriver 1500 may provide class (e.g., distributorBase) 1540 or class (e.g., codistributorBase) 1550 to SPMD conversion API logic 1510.
- SPMD conversion API logic 1510 may receive class (e.g., distributorBase) 1540 or class (e.g., codistributorBase) 1550 from abstract class deriver logic 1500, and may integrate class (e.g., distributorBase) 1540 or class (e.g., codistributorBase) 1550 by implementing a SPMD conversion API.
- SPMD conversion API logic 1510 may provide SPMD converted distribution scheme 1560, based on implementation of the SPMD conversion API, to distribution scheme methods logic 1520.
- Distribution scheme methods logic 1520 may receive SPMD converted distribution scheme 1560, and may implement distribution scheme 1570. In one example, distribution scheme methods logic 1520 may implement methods that perform parallel computations based on distribution scheme 1570.
- Process block 2940 may include the process blocks illustrated in Fig. 31. As shown in Fig. 31, process block 2940 may include transforming one or more segments of the program into one or more remote segments, using a parallel construct or function, to produce the parallel program (block 3100), identifying the inner and outer contexts of the parallel program (block 3110), executing the outer context of the parallel program sequentially (block 3120), and determining the one or more portions of the parallel program from the inner context of the parallel program (block 3130). For example, in implementations described above in connection with Fig. 11, distribution scheme commands 1100 may include outer parallel context 1110, SPMD boundaries 1120, and inner parallel context 1130.
- Outer parallel context 1110 may be provided to analysis logic 520, and analysis logic 520 may automatically create and identify SPMD boundaries 1120 and inner parallel context 1130.
- Analysis logic 520 may analyze outer parallel context 1110 to determine input variables associated with outer parallel context 1110, and may execute outer parallel context 1110 sequentially on client 500.
- Analysis logic 520 may determine and allocate one or more portions of inner parallel context 1130 and the input variables to labs 420 for parallel execution.
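- The split between the outer and inner parallel contexts can be sketched as follows. This is an illustrative Python sketch under stated assumptions: labs are modeled as threads in a `ThreadPoolExecutor`, and `run_spmd`, `partial_sum`, and `program` are hypothetical names that do not appear in the source.

```python
from concurrent.futures import ThreadPoolExecutor


def run_spmd(inner, inputs, num_labs):
    """Run inner(lab_index, num_labs, inputs) once per lab and collect results."""
    with ThreadPoolExecutor(max_workers=num_labs) as pool:
        futures = [pool.submit(inner, lab, num_labs, inputs)
                   for lab in range(num_labs)]
        return [f.result() for f in futures]


def partial_sum(lab_index, num_labs, inputs):
    """Inner context: each lab sums its strided share of the data."""
    data = inputs["data"]
    return sum(data[lab_index::num_labs])


def program(num_labs=4):
    # Outer context: executed sequentially on the client.
    data = list(range(1, 101))
    inputs = {"data": data}  # input variables determined at the boundary
    # SPMD boundary: portions of the inner context are allocated to the labs.
    partials = run_spmd(partial_sum, inputs, num_labs)
    # Back in the outer context: combine the per-lab results.
    return sum(partials)
```

- Calling `program()` runs the outer code once and the inner function once per lab; the result is the same for any number of labs because the strided slices cover every element exactly once.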
- Process block 2940 may include the process blocks illustrated in Fig. 32. As shown in Fig. 32, process block 2940 may include selecting a distributor object constructor for the program based on the optimum distribution scheme (block 3200), determining a parallel context for the program based on the distributor object constructor (block 3210), and transforming the program into the parallel program based on the distributor object constructor (block 3220). For example, in implementations described above in connection with Fig. 12, a distributor object may determine a parallel context that owns a distributed array. Client 500 (e.g., analysis logic 520) may select the parallel context by choosing which of the distributor objects is called.
- Distribution scheme commands 1200 may include a codistributor object (e.g., distL) and a distributor object (e.g., distR).
- Distributor placement API 1210 may receive distribution scheme commands 1200, and may choose a parallel context for distributed arrays by calling an appropriate distributor object.
- distributor placement API 1210 may determine that distL is a non-remote class (e.g., a codistributor class), as indicated by reference number 1230, and may choose a local (or non-remote) parallel context for distributed arrays associated with distL.
- Distributor placement API 1210 may determine that distR is a remote class (e.g., a distributor class), as indicated by reference number 1240, and may choose a remote parallel context for distributed arrays associated with distR.
- Client 500 may transform distribution scheme commands 1200 into parallel syntax based on the codistributor object and the distributor object.
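- Choosing a parallel context from the class of the distributor object can be sketched as follows. The Python classes `Codistributor` and `Distributor` and the function `choose_parallel_context` are hypothetical stand-ins for the codistributor class, the distributor class, and distributor placement API 1210; the string return values are an assumption made for illustration.

```python
class Codistributor:
    """Hypothetical non-remote distributor class (compare distL)."""


class Distributor:
    """Hypothetical remote distributor class (compare distR)."""


def choose_parallel_context(dist):
    """Return the parallel context implied by the distributor's class."""
    if isinstance(dist, Codistributor):
        return "local"   # non-remote class: local parallel context
    if isinstance(dist, Distributor):
        return "remote"  # remote class: remote parallel context
    raise TypeError(f"not a distributor object: {type(dist).__name__}")
```

- The caller never names a context explicitly; constructing one class or the other is what selects it.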
- Process block 2940 may include the process blocks illustrated in Fig. 33.
- Process block 2940 may include providing a parallel construct or function to transform the program into the parallel program (block 3300), automatically converting a remote object, provided outside the parallel construct or function and used inside the parallel construct or function, into a non-remote object (block 3310), automatically converting a non-remote object, provided inside the parallel construct or function and used outside the parallel construct or function, into a remote object (block 3320), and generating an error, before the program executes, when the non-remote object is provided in a nested parallel construct or function (block 3330).
- distribution scheme commands 1300 and 1310 may include a parallel construct (e.g., a SPMD block).
- Distributor object of class distributor 1320 provided outside a SPMD block may be automatically converted (e.g., via analysis logic 520) into distributor object of class codistributor 1330 if distributor object of class distributor 1320 is used inside the SPMD block.
- Distributor object of class codistributor 1340 provided inside a SPMD block may be automatically converted (e.g., via analysis logic 520) into distributor object of class distributor 1350 if distributor object of class codistributor 1340 is used outside the SPMD block.
- an error may be generated before execution of distribution scheme commands 1300 and 1310.
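- The automatic conversions at the SPMD boundary can be sketched as follows. This Python sketch makes several assumptions: `enter_spmd` and `leave_spmd` are hypothetical hooks called when an object crosses the boundary, the `dim` field is illustrative, and the nested-use check is performed when the object enters the block, whereas the text describes generating the error before the program executes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Distributor:
    """Hypothetical remote class, used outside SPMD blocks."""
    dim: int


@dataclass(frozen=True)
class Codistributor:
    """Hypothetical non-remote class, used inside SPMD blocks."""
    dim: int


class NestedSpmdError(Exception):
    """A non-remote object was provided to a nested SPMD block."""


def enter_spmd(obj):
    """Convert an object defined outside an SPMD block for use inside it."""
    if isinstance(obj, Codistributor):
        # The object already belongs to an SPMD block, so this is nested use.
        raise NestedSpmdError("non-remote object provided in a nested SPMD block")
    if isinstance(obj, Distributor):
        return Codistributor(obj.dim)
    return obj


def leave_spmd(obj):
    """Convert an object defined inside an SPMD block for use outside it."""
    if isinstance(obj, Codistributor):
        return Distributor(obj.dim)
    return obj
```

- Objects that are neither class pass through unchanged, so ordinary values can cross the boundary freely.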
- process block 2940 may include the process blocks illustrated in Fig. 34. As shown in Fig. 34, process block 2940 may include providing a parallel construct or function to transform the program into the parallel program (block 3400), and obtaining distribution scheme information using properties of the optimum data distribution scheme inside the parallel construct or function (block 3410).
- distribution scheme commands 1400 and/or 1410 may include a parallel construct (e.g., a SPMD block), and may provide a mechanism (e.g., a distribution property of a distributor object) for obtaining a specific distribution scheme object.
- Distribution scheme commands 1400 may include distributor object 1420 provided outside a SPMD block and specific distribution scheme objects 1430 (e.g., distributionDimension).
- Information specific to a distribution scheme (e.g., provided by distributor object 1420) may be obtained (e.g., via analysis logic 520) by using specific distribution scheme objects 1430 outside the SPMD block.
- Distribution scheme commands 1410 may include distributor object 1440 provided inside a SPMD block and specific distribution scheme objects 1450 (e.g., distributionDimension). Information specific to a distribution scheme (e.g., provided by distributor object 1440) may be obtained (e.g., via analysis logic 520) by using specific distribution scheme objects 1450 inside the SPMD block.
- Process blocks 2940-2960 may include the process blocks illustrated in Fig. 35. As shown in Fig. 35, process blocks 2940-2960 may include providing a remote reference to an instance of distributed array (block 3500), remotely reconstructing the instance of the distributed array from the remote reference (block 3510), allocating the instance of the distributed array to the two or more labs (block 3520), and storing a local portion of distributed array data as private, with the instance of the distributed array, on the two or more labs (block 3530).
- distributed array commands 1600 may include a distributed object and a codistributed object.
- The distributed object may provide a remote reference to a distributed array, as indicated by reference number 1610, and may remotely invoke methods of the codistributed object, as indicated by reference number 1620.
- the codistributed object may include an instance of the distributed array 1630 (e.g., the distributed array remotely referenced by the distributed object), and may reside on one or more labs (e.g., lab 420-1), as indicated by reference number 1640.
- the codistributed object may store a local portion of distributed array data 1650 (e.g., as a private field) on the one or more labs (e.g., lab 420-1).
- Process blocks 2940-2960 may further include receiving one or more results associated with execution of the instance of the distributed array on the local portion from the two or more labs (block 3540), and dereferencing one or more remote references associated with the one or more results to produce one or more non-remote references when the one or more non-remote references are on the same lab(s) (block 3550).
- The codistributed object may provide a reference to the local portion of distributed array data 1650.
- Lab 420-1 may provide one or more results associated with execution of instance of the distributed array 1630 on local portion 1650 to distributed array commands 1600.
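- The relationship between the remote reference held by the client and the local portions stored on the labs can be sketched as follows. In this Python sketch the classes `Lab` and `RemoteArrayRef`, the strided split of the data, and the method names are all assumptions made for illustration; the source does not specify them.

```python
class Lab:
    """A worker that privately stores its local portion of each array."""

    def __init__(self, index):
        self.index = index
        self._local = {}  # array id -> local portion (private field)

    def store(self, array_id, portion):
        self._local[array_id] = list(portion)

    def invoke(self, array_id, method):
        """Run `method` on the local portion and return the result."""
        return method(self._local[array_id])


class RemoteArrayRef:
    """Client-side handle: keeps an id and the owning labs, not the data."""

    _next_id = 0

    def __init__(self, data, labs):
        self.array_id = RemoteArrayRef._next_id
        RemoteArrayRef._next_id += 1
        self.labs = labs
        for lab in labs:
            # Each lab keeps a strided share of the data as its local portion.
            lab.store(self.array_id, data[lab.index::len(labs)])

    def map_local(self, method):
        """Remotely invoke `method` on every lab's local portion."""
        return [lab.invoke(self.array_id, method) for lab in self.labs]
```

- For example, `RemoteArrayRef(list(range(10)), [Lab(i) for i in range(3)]).map_local(sum)` returns one partial sum per lab, and the client only ever sees those results.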
- process block 2940 may include the process blocks illustrated in Fig. 36.
- process block 2940 may include providing a parallel construct or function to transform the program into the parallel program (block 3600), automatically converting a distributed class variable, provided outside the parallel construct or function and used inside the parallel construct or function, into a parallel distributed class variable (block 3610), and automatically converting a parallel distributed class variable, provided inside the parallel construct or function and used outside the parallel construct or function, into a distributed class variable (block 3620).
- distributed array commands 1800 may include a parallel construct (e.g., a SPMD block).
- Distributed object of class distributed 1810 provided outside a SPMD block may be automatically converted (e.g., via analysis logic 520) into a distributed object of class codistributed 1820 if distributed object of class distributed 1810 is used inside the SPMD block.
- Distributed object of class codistributed 1830 provided inside a SPMD block may be automatically converted (e.g., via analysis logic 520) into a distributed object of class distributed 1840 if distributed object of class codistributed 1830 is used outside the SPMD block.
- Process block 2940 may include the process blocks illustrated in Fig. 37. As shown in Fig. 37, process block 2940 may include providing a parallel construct or function to transform the program into the parallel program (block 3700), providing a distributed class object outside the parallel construct or function (block 3710), mixing a distributed class object provided outside the parallel construct or function with a parallel distributed class object when the distributed class object and the parallel distributed class object are in the same context (block 3720), providing a parallel distributed class object inside the parallel construct or function (block 3730), and mixing the parallel distributed class object provided inside the parallel construct or function with a distributed class object when the parallel distributed class object and the distributed class object are in the same context (block 3740).
- distributed array commands 1900 and 1910 may include a parallel construct (e.g., a SPMD block), distributed object 1920, and parallel distributed object 1930.
- Distributed object 1920 and parallel distributed object 1930 may be mixed inside a SPMD block, as indicated by reference number 1940, and may be mixed outside the SPMD block, as indicated by reference number 1950.
- Process block 2940 may include the process blocks illustrated in Fig. 38. As shown in Fig. 38, process block 2940 may include generating a distributed constructor (block 3800), providing data inside a distributed array as a first argument of the distributed constructor (block 3810), providing a distributor class object as a second argument of the distributed constructor (block 3820), providing a lab index as a third argument of the distributed constructor (block 3830), and providing a string conversion as a fourth argument of the distributed constructor (block 3840).
- distributed array commands 2000 may include distributed objects with one or more input arguments.
- the distributed object with one input argument 2010 may include argument 2050 that includes data to be stored in a distributed array.
- the distributed objects with two input arguments 2020 may include argument 2050 and argument 2060 that includes a distributor object of class distributor or argument 2070 that indicates a conversion from a single built-in array is requested.
- the distributed object with three input arguments 2030 may include arguments 2050-2070.
- the distributed object with four input arguments 2040 may include arguments 2050-2070 and argument 2080 that includes an index of a lab (e.g., a labindex).
- process block 2940 may include the process blocks illustrated in Fig. 39.
- Process block 2940 may include generating a parallel distributed constructor (block 3900), providing data inside a distributed array as a first argument of the parallel distributed constructor (block 3910), providing a parallel distributor class object as a second argument of the parallel distributed constructor (block 3920), providing a lab index as a third argument of the parallel distributed constructor (block 3930), and providing a string conversion as a fourth argument of the parallel distributed constructor (block 3940).
- distributed array commands 2100 may include parallel distributed objects with one or more input arguments.
- The distributed object with one input argument 2110 may include argument 2150 that includes data to be stored in a distributed array.
- the distributed objects with two input arguments 2120 may include argument 2150 and argument 2160 that includes a distributor object of class codistributor or argument 2170 that indicates a conversion from a single built-in array is requested.
- the distributed object with three input arguments 2130 may include arguments 2150-2170.
- the distributed object with four input arguments 2140 may include arguments 2150-2170 and argument 2180 that includes an index of a lab (e.g., a labindex).
- process block 2940 may include the process blocks illustrated in Fig. 40. As shown in Fig. 40, process block 2940 may include using a non-remote distributor object to place data in an inner parallel context of the parallel program (block 4000), and using a remote distributor object to place data in an outer parallel context of the parallel program (block 4010).
- distribution scheme/distributed array commands 2200 may include non-remote distributor object 2210, remote distributor object 2220, distributed class object 2230, and codistributed class object 2240.
- A distributor object (e.g., non-remote distributor object 2210 and remote distributor object 2220) may be used to place data in an inner parallel context or an outer parallel context.
- If a distributor object is non-remote (e.g., non-remote distributor object 2210), a resulting distributed array (e.g., codistributed class object 2240) may be non-remote.
- Error 2250 may be generated if any dimensions of the resulting distributed array are remote.
- If a distributor object is remote (e.g., remote distributor object 2220), a resulting distributed array (e.g., distributed class object 2230) may be remote.
- process block 2940 may include the process blocks illustrated in Fig. 41.
- Process block 2940 may include providing one or more dimensional constructors (block 4100), providing one or more distributed array dimensions as one or more arguments for the one or more dimensional constructors (block 4110), providing a distributor as one or more arguments for the one or more dimensional constructors (block 4120), handling one or more storage attributes with the one or more dimensional constructors (block 4130), and integrating the one or more dimensional constructors with one or more remote and/or non-remote distributor objects (block 4140).
- dimensional constructors 2300 may include distributed array dimensions as arguments, as indicated by reference number 2310, and a distributor object as arguments, as indicated by reference number 2320.
- Dimensional constructors 2300 may handle storage attributes associated with a distributed array, as indicated by reference number 2330, and may be integrated with distributed and/or codistributed classes, as indicated by reference number 2340.
- process block 2940 may include the process blocks illustrated in Fig. 42.
- process block 2940 may include providing a parallel construct or function to transform the program into the parallel program (block 4200), defining a distribution, associated with a distributed object, for a distributor object (block 4210), and defining a distribution, associated with a parallel distributed object, for the distributor object (block 4220).
- distribution scheme/distributed array commands 2400 may include a parallel construct (e.g., a SPMD block).
- distributed arrays and distributor objects may be in-sync, as indicated by reference number 2490.
- A distributed object may include a distributor object (e.g., objects 2410, 2430, and/or 2480) to define its distribution, and a codistributed object (e.g., object 2460) may likewise include a distributor object (e.g., objects 2410, 2430, and/or 2480) to define its distribution.
- process block 2940 may include the process blocks illustrated in Fig. 43.
- process block 2940 may include providing a parallel construct or function to transform the program into the parallel program (block 4300), providing a distributor object inside a parallel context of the parallel construct or function (block 4310), referencing the distributor object with a distributor remote outside the parallel context of the parallel construct or function (block 4320), and providing a nested parallel construct or function for entry from the parallel context of the parallel construct or function with the distributor object (block 4330).
- distribution scheme/distributed array commands 2500 may include a parallel construct (e.g., a SPMD block), and nested parallel constructs (e.g., nested SPMD blocks).
- A distributor object (e.g., dist) may exist in one parallel context, as indicated by reference number 2510.
- a distributor object may be referenced with distributor remotes outside of the one parallel context, as indicated by reference number 2520.
- a distributor object may enter a nested SPMD block from its own parallel context, as indicated by reference number 2530.
- process block 2940 may further include providing a distributed array in the parallel context of the parallel construct or function (block 4340), providing a reference to the distributed array outside a distributed array context and inside the nested parallel construct or function (block 4350), using the same data placement rules for the distributed array and the distributor object (block 4360), and signaling an error when the distributed array is used in the nested parallel construct or function (block 4370).
- a distributed array may exist in one parallel context, as indicated by reference number 2540. References to a distributed array outside of the distributed array's context are not permissible in a nested SPMD block, as indicated by reference number 2550.
- Functions associated with distributed arrays may be collective and may include a transfer of a distributed array into a nested SPMD block (e.g., a SPMD block within another SPMD block).
- a distributed array and distributor functions may use the same data placement rules regardless of the nesting depth of the SPMD blocks, as indicated by reference number 2560.
- Distribution scheme/distributed array commands 2600 may generate error 2610 to prevent parallel error signaling. Error 2610 may be generated when a codistributed object (e.g., D) is used in a nested SPMD block.
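- The rule that a codistributed object may not be used in a nested SPMD block can be sketched as follows. This Python sketch tracks nesting depth with a module-level counter; the names `spmd`, `current_depth`, `Codistributed`, and `NestedSpmdError` are hypothetical, and the check runs when the array is accessed rather than through any mechanism described in the source.

```python
import contextlib

_depth = 0  # SPMD nesting depth tracked by this sketch


class NestedSpmdError(RuntimeError):
    """Raised when a codistributed array is used in a nested SPMD block."""


@contextlib.contextmanager
def spmd():
    """Enter one level of (possibly nested) SPMD context."""
    global _depth
    _depth += 1
    try:
        yield
    finally:
        _depth -= 1


def current_depth():
    """Return the current SPMD nesting depth (0 means outside any block)."""
    return _depth


class Codistributed:
    """Array that remembers the parallel context in which it was created."""

    def __init__(self, data):
        self._home = _depth
        self._data = list(data)

    def local_part(self):
        # Using the array deeper than its own context is signaled as an error.
        if _depth > self._home:
            raise NestedSpmdError("codistributed array used in a nested SPMD block")
        return self._data
```

- Raising a single, early error at the point of use is one way to avoid each lab signaling its own failure later in the nested block.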
- Process blocks 2940 and 2950 may include the process blocks illustrated in Fig. 44. As shown in Fig. 44, process blocks 2940 and 2950 may include surrounding a distributed array with a parallel construct or function (block 4400), and sending the distributed array to the two or more labs for parallel execution at one time (block 4410).
- distribution scheme/distributed array commands 2700 may reduce a remote call outside a parallel processing construct. If code that uses distributed arrays is inside a SPMD block, remote method invocation may be reduced. In other words, surrounding a piece of code with a SPMD block may optimize the code.
- the labs may separately execute the six separate calls, as indicated by reference number 2720.
- another function e.g., testerSPMDQ
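- The saving from surrounding code with a SPMD block can be sketched by counting round trips between the client and the labs. The Python class `Cluster`, its `round_trips` counter, and the functions `run_unbatched` and `six_steps` are hypothetical; the cost model (one round trip per remote call or per SPMD block) is an assumption made for illustration.

```python
class Cluster:
    """Counts client-to-lab round trips under a simple assumed cost model."""

    def __init__(self, num_labs):
        self.num_labs = num_labs
        self.round_trips = 0

    def remote_call(self, fn, *args):
        """One distributed-array operation issued from the client."""
        self.round_trips += 1
        return [fn(lab, *args) for lab in range(self.num_labs)]

    def spmd(self, body):
        """Send a whole block of code to the labs in a single round trip."""
        self.round_trips += 1
        return [body(lab) for lab in range(self.num_labs)]


def run_unbatched(cluster):
    """Issue six separate remote calls, each incrementing every lab's value."""
    values = list(range(cluster.num_labs))
    for _ in range(6):
        values = cluster.remote_call(lambda lab, v: v[lab] + 1, values)
    return values


def six_steps(lab):
    """The same six increments, written as the body of one SPMD block."""
    x = lab
    for _ in range(6):
        x += 1
    return x
```

- Both paths compute the same values, but `run_unbatched` costs six round trips while `cluster.spmd(six_steps)` costs one.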
- process block 2940 may include the process blocks illustrated in Fig. 45. As shown in Fig. 45, process block 2940 may include receiving a first user-defined distribution scheme for the program (block 4500), receiving a second user-defined distribution scheme for the program (block 4510), redistributing distributed arrays, associated with the first and second user- defined distribution schemes, into a common distribution scheme (block 4520), and redistributing the common distribution scheme to a target distribution scheme (block 4530).
- client 500 may include common distributor creator logic 2800 and redistributor logic 2810.
- Common distribution creator 2800 may receive first custom distribution scheme 2820 (e.g., from a first user), and may receive second custom distribution scheme 2830 (e.g., from a second user).
- Common distribution creator 2800 may redistribute first custom distribution scheme 2820 and second custom distribution scheme 2830 into common distribution 2840, and may provide common distribution 2840 to redistributor logic 2810.
- Redistributor logic 2810 may receive common distribution 2840 from common distribution creator 2800, and may redistribute common distribution 2840 to target distribution scheme 2850.
- Target distribution scheme 2850 may be used in place of first custom distribution scheme 2820 and second custom distribution scheme 2830.
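- Redistribution through a common scheme can be sketched as follows. This Python sketch assumes one-dimensional arrays stored as per-lab dictionaries from global index to value, and uses an elementwise addition as the reason two arrays need a common distribution; `block_owner`, `cyclic_owner`, `distribute`, `redistribute`, and `add_with_common_scheme` are hypothetical names.

```python
def block_owner(i, n, labs):
    """Contiguous blocks: ceil(n / labs) elements per lab."""
    return i // -(-n // labs)


def cyclic_owner(i, n, labs):
    """Round-robin: element i lives on lab i mod labs."""
    return i % labs


def distribute(data, owner, labs):
    """Split `data` into per-lab {global index: value} dictionaries."""
    parts = [dict() for _ in range(labs)]
    for i, v in enumerate(data):
        parts[owner(i, len(data), labs)][i] = v
    return parts


def redistribute(parts, n, owner):
    """Move every element to the lab chosen by the new `owner` function."""
    out = [dict() for _ in range(len(parts))]
    for part in parts:
        for i, v in part.items():
            out[owner(i, n, len(parts))][i] = v
    return out


def add_with_common_scheme(a_parts, b_parts, n, common, target):
    """Bring both arrays to `common`, add locally, then move to `target`."""
    a = redistribute(a_parts, n, common)
    b = redistribute(b_parts, n, common)
    summed = [{i: a_lab[i] + b_lab[i] for i in a_lab}
              for a_lab, b_lab in zip(a, b)]
    return redistribute(summed, n, target)
```

- Once both arrays share the common scheme, each lab holds matching indices and can combine them without further communication; the final step moves the result to the target scheme.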
- Implementations described herein may provide systems and/or methods for performing parallel processing.
- the systems and/or methods may initiate a single programming language, and may identify, via the single programming language, one or more data distribution schemes for executing a program.
- the systems and/or methods also may transform, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and may allocate one or more portions of the parallel program to two or more labs for parallel execution.
- the systems and/or methods may further receive one or more results associated with the parallel execution of the one or more portions from the two or more labs, and may provide the one or more results to the program.
- The term "user" has been used herein.
- The term "user" is intended to be broadly interpreted to include a client or a user of a client.
- The term "logic" may include hardware, such as an application specific integrated circuit or a field programmable gate array, software, or a combination of hardware and software.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Stored Programmes (AREA)
Abstract
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US5429208P | 2008-05-19 | 2008-05-19 | |
US5429508P | 2008-05-19 | 2008-05-19 | |
US12/254,605 US8255890B2 (en) | 2007-02-14 | 2008-10-20 | Media for performing parallel processing of distributed arrays |
US12/254,618 US8250550B2 (en) | 2007-02-14 | 2008-10-20 | Parallel processing of distributed arrays and optimum data distribution |
US12/254,620 US8239846B2 (en) | 2007-02-14 | 2008-10-20 | Device for performing parallel processing of distributed arrays |
PCT/US2009/044384 WO2009143073A1 (fr) | 2008-05-19 | 2009-05-18 | Parallel processing of distributed arrays |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2165260A1 (fr) | 2010-03-24 |
Family
ID=41020980
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP09751301A Ceased EP2165260A1 (fr) | 2008-05-19 | 2009-05-18 | Parallel processing of distributed arrays |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP2165260A1 (fr) |
WO (1) | WO2009143073A1 (fr) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110202329A1 (en) * | 2010-02-12 | 2011-08-18 | James Howard Goodnight | Scenario State Processing Systems And Methods For Operation Within A Grid Computing Environment |
US9665405B1 (en) | 2010-02-12 | 2017-05-30 | Sas Institute Inc. | Distributed systems and methods for state generation based on multi-dimensional data |
US8271537B2 (en) | 2010-11-15 | 2012-09-18 | Sas Institute Inc. | Grid computing system alongside a distributed database architecture |
US8996518B2 (en) | 2010-12-20 | 2015-03-31 | Sas Institute Inc. | Systems and methods for generating a cross-product matrix in a single pass through data using single pass levelization |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8010954B2 (en) * | 2007-02-14 | 2011-08-30 | The Mathworks, Inc. | Parallel programming interface to dynamically allocate program portions |
-
2009
- 2009-05-18 EP EP09751301A patent/EP2165260A1/fr not_active Ceased
- 2009-05-18 WO PCT/US2009/044384 patent/WO2009143073A1/fr active Application Filing
Non-Patent Citations (1)
Title |
---|
See references of WO2009143073A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO2009143073A1 (fr) | 2009-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8255890B2 (en) | Media for performing parallel processing of distributed arrays | |
US8250550B2 (en) | Parallel processing of distributed arrays and optimum data distribution | |
US8239846B2 (en) | Device for performing parallel processing of distributed arrays | |
US8255889B2 (en) | Method of using parallel processing constructs and dynamically allocating program portions | |
EP2147374B1 (fr) | Parallel programming interface | |
US8707280B2 (en) | Using parallel processing constructs and dynamically allocating program portions | |
US8239845B2 (en) | Media for using parallel processing constructs | |
US8108845B2 (en) | Parallel programming computing system to dynamically allocate program portions | |
US8949807B2 (en) | Saving and loading graphical processing unit (GPU) arrays providing high computational capabilities in a computing environment | |
US8935682B2 (en) | Graphical processing unit (GPU) arrays providing high computational capabilities in a computing environment | |
Trinder et al. | Parallel and distributed Haskells | |
US8108717B2 (en) | Parallel programming error constructs | |
WO2009143073A1 (fr) | Parallel processing of distributed arrays | |
Tsuji et al. | Multiple-spmd programming environment based on pgas and workflow toward post-petascale computing | |
WO2009143068A2 (fr) | Method of using parallel processing constructs | |
US8819643B1 (en) | Parallel program profiler | |
Caromel et al. | Proactive parallel suite: From active objects-skeletons-components to environment and deployment | |
Berthold et al. | Scheduling Light-Weight Parallelism in ArTCoP | |
Anderson | Coupling Parallel and Distributed Programs for Sparse Data | |
Kelkar | A Generic Framework for Distributed Computation | |
Ejdys et al. | Integrating Application And System Components With The Grid Component Model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20091223 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK TR |
|
AX | Request for extension of the european patent |
Extension state: AL BA RS |
|
17Q | First examination report despatched |
Effective date: 20100519 |
|
DAX | Request for extension of the european patent (deleted) | ||
APBK | Appeal reference recorded |
Free format text: ORIGINAL CODE: EPIDOSNREFNE |
|
APBN | Date of receipt of notice of appeal recorded |
Free format text: ORIGINAL CODE: EPIDOSNNOA2E |
|
APBR | Date of receipt of statement of grounds of appeal recorded |
Free format text: ORIGINAL CODE: EPIDOSNNOA3E |
|
APAF | Appeal reference modified |
Free format text: ORIGINAL CODE: EPIDOSCREFNE |
|
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: THE MATHWORKS, INC. |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R003 |
|
APBT | Appeal procedure closed |
Free format text: ORIGINAL CODE: EPIDOSNNOA9E |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
|
18R | Application refused |
Effective date: 20170504 |