WO2000060460A1 - Procede generique d'aide au placement d'applications de traitement de signal sur calculateurs paralleles - Google Patents
Generic aid method for placing signal processing applications on parallel computers (Procede generique d'aide au placement d'applications de traitement de signal sur calculateurs paralleles)
- Publication number
- WO2000060460A1 (application PCT/FR2000/000824)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- application
- placement
- model
- data
- constraints
- Prior art date
Links
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
Definitions
- the present invention relates to a generic method for assisting with the placement of signal processing applications on parallel computers.
- Placement (“mapping”) consists in distributing the data and the computations of a processing job, such as a signal processing application, over a computer, generally one with a parallel architecture. This placement is static: unlike dynamic placement, all placement choices are made before the placed application runs. Many tools and programming environments are known for carrying out placement. They include, among others, the INRIA SynDEx tool for signal and image processing, the Ptolemy tool from the University of California, Berkeley, HPF (High Performance Fortran) for scientific computing, the “FX compiler”, and GEDAE from Lockheed Martin. However, few known tools allow complete automation of the placement.
- The Ptolemy environment is essentially an environment for simulating and prototyping heterogeneous systems that combine hardware and software.
- The subject of the present invention is a generic method for assisting the placement of systematic signal processing applications on a computer with a homogeneous parallel architecture. The method makes it possible to obtain automatically at least one optimized placement solution, at as fine a level of granularity as possible, from a complete functional description of the application and of the computer used.
- The method according to the invention consists, for each functional and physical component of the application, in establishing a model defined by a set of relationships on the variables relating to that component, so as to model the constraints; in solving the relationships thus established concurrently; in deducing at least one solution; and, if several solutions are obtained, in choosing the one that optimizes at least one criterion.
- the constraints are those relating to the sub-functions of the placement function, namely: partitioning, alignment, data distribution and processing sequencing.
- the present invention will be better understood on reading the detailed description of an embodiment, taken by way of nonlimiting example and illustrated by the appended drawing, the single figure of which is a functional diagram of the placement function, implemented in accordance with the invention.
- The present invention relates to systematic signal processing, that is to say processing that is unconditional and not subject to external orders or actions. This processing is, moreover, deterministic and structured.
- This processing can be, for example, pulse compression or the computation of Fourier transforms (FFT).
- Systematic signal processing applications are made up of task sequences, which can be expressed by well-structured and parallel loop nests (nested loops and defined bounds).
- Each loop nest contains a call to a procedure or macro-instruction generally corresponding to an array transformation, that is to say to a function of a signal processing library such as an FFT.
- The processing operations are regular (not subject to external tests) and are carried out on multi-dimensional signals. The data are organized in large arrays whose dimensions (for example source, frequency, recurrence time, pointing time) carry the vectors on which the individual processing operations are performed.
- The array adapts easily to the dimensions of the sensor system and allows the mathematical formulation of the processing to be expressed in computer form.
- The indices of the variables in the formulas become array indices.
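As an illustration of such a loop nest, here is a minimal Python sketch. The array layout `signal[source][time]`, the sample data and the helper names `dft` and `apply_over_sources` are assumptions made for the example and do not come from the patent; a naive discrete Fourier transform stands in for a library FFT.

```python
import cmath


def dft(vector):
    """Naive discrete Fourier transform of one vector (stand-in for a library FFT)."""
    n = len(vector)
    return [
        sum(vector[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def apply_over_sources(signal):
    """Loop with fixed bounds: one array transformation per `source` index.

    `signal[source][time]` is a hypothetical 2-D array; the `time` dimension
    carries the vectors on which the elementary processing operates.
    """
    return [dft(signal[s]) for s in range(len(signal))]


# Two sources, four time samples each (illustrative data only).
signal = [[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, -1.0, 0.0]]
spectrum = apply_over_sources(signal)
```

The outer iteration has bounds known in advance and no data-dependent branch, which is the property that makes the processing "systematic" in the sense used here.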
- A task, also called a routine, procedure or processing operation, accepts one or more data flows as input and as output.
- A flow represents the data accessed in read or write mode by one and only one elementary processing operation. All of this data constitutes a basic access, or elementary transformation domain. The same processing is repeated over an iteration space defined by the application.
- The formulation of the processing describes the formula of the elementary transformation.
- The output data are expressed in terms of the input data.
- The basic read or write accesses are specified according to the array indices.
- The dimensions of the arrays and the memory space required to execute a processing operation over its whole iteration space are also specified.
- The data flow can be conditioned by the data or by the array indices (this depends on the application).
- properties are associated with data flows:
- Acquisition is considered to be a full-fledged task including an output stream, and a recurrence (acquisition frequency).
- A data flow graph can be used; it can come from any conventional formalism for describing signal applications, provided that it contains the information specified above.
- Placement is the automatic distribution of the signal processing operations to be performed on a data stream, and of the data themselves, over a computer with a parallel multiprocessor architecture, taking into account the various hardware resource constraints as well as the performance required of the computer.
- the parallel architecture in question here is a homogeneous parallel architecture, in which all the processors are identical, of the SIMD / SPMD (“Single Instruction / Program Multiple Data”) type, that is to say in which all the processors execute the same instruction or the same sequence of instructions (for example a program) on different data.
- The routing of information between the different processors is static, that is to say that the data paths between processors are imposed before the initialization of each mode (they are defined during the compilation of the application).
- the macro-instructions executed in parallel on each of the processors are identical.
- the data necessary for the processing of the macro-instruction must reside in the local memory of the processor which executes it.
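The SPMD execution model described above can be sketched as follows. This is only an illustration: the contiguous block distribution, the doubling routine and the names `block_distribute`, `macro_instruction` and `run_spmd` are assumptions of the example, since the patent does not prescribe a particular distribution or routine.

```python
def block_distribute(data, num_processors):
    """Split `data` into contiguous blocks, one per processor.

    Block sizes differ by at most one element (hypothetical distribution).
    """
    base, extra = divmod(len(data), num_processors)
    blocks, start = [], 0
    for p in range(num_processors):
        size = base + (1 if p < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks


def macro_instruction(local_block):
    """The same routine runs on every processor, using only its local data."""
    return [2 * x for x in local_block]


def run_spmd(data, num_processors):
    """Simulate identical processors, each working on its own local memory."""
    local_memories = block_distribute(data, num_processors)
    return [macro_instruction(block) for block in local_memories]


results = run_spmd(list(range(10)), 4)
```

Each simulated processor touches only the block placed in its own local memory, which mirrors the requirement that the data needed by a macro-instruction reside locally.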
- the “dimensions” of the architecture of the computer used are imperative placement constraints. However, these constraints are not taken into account by conventional automatic placement methods (such as the methods cited above) and are therefore not treated for this purpose in the state of the art.
- the characteristic parameters of said dimensions are:
- The power of a processor. For real-time signal processing applications, the latency of the calculations (the time after which the results of these calculations are available) is very important. This time can be bounded by a maximum value, and it depends on the power of the processor that performs the calculation. This power is expressed as a number of calculation cycles per second.
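The relation between processor power and latency can be sketched as below. The function names and the numeric values are hypothetical; the only assumption taken from the text is that power is a number of cycles per second and that latency is bounded by a maximum value.

```python
def latency_seconds(cycles, cycles_per_second):
    """Time needed to execute `cycles` on a processor of the given power."""
    return cycles / cycles_per_second


def meets_deadline(cycles, cycles_per_second, max_latency):
    """True when the computation finishes within the latency bound."""
    return latency_seconds(cycles, cycles_per_second) <= max_latency


# Illustrative numbers only: 2 million cycles on a 100 MHz processor.
example_latency = latency_seconds(2_000_000, 100_000_000)
```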
- The placement discussed here comprises four sub-functions: partitioning, alignment, distribution and sequencing. Until now, these four sub-functions have been handled separately. By contrast, the present invention treats them simultaneously and concurrently.
- This placement makes it possible to match a program (whose parallelism may or may not be specified) to a computer with a homogeneous parallel architecture as specified above. It consists in distributing the processing and the data over the various processors of the computer and in establishing their sequencing, while optimizing the parallelism of the application.
- The constraints are, on the one hand, “application” constraints (linked to the size of the elementary tasks specific to systematic signal processing); on the other hand, constraints linked to the architecture of the computer (number of processors, memory capacity, topology of the processor network and data throughput); and finally constraints linked to execution (fine-grained scheduling, overlap between data communications and the calculations performed).
- Constraint modeling essentially consists in establishing, for each constraint, a relation between at least two variables or a relation between a variable and a given value (generally a threshold). This relation is linear (generally a first-degree polynomial).
- Starting from this modeling, the method of the invention performs the concurrent (non-sequential) resolution of all the models, to deduce from them one or more solutions satisfying all the constraints.
- the goal is to optimize different criteria such as the latency of the application or the (financial) cost of the target architecture.
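A minimal sketch of this idea: every constraint is a linear relation, all relations are checked together, and a criterion then selects among the solutions. Exhaustive enumeration is used here as a simple stand-in for a real constraint solver, and the two start-time variables, the durations and the latency bound are hypothetical values chosen for the example.

```python
from itertools import product

DURATION_A = 3    # hypothetical duration of task A, in time units
DURATION_B = 2    # hypothetical duration of task B
MAX_LATENCY = 8   # hypothetical bound on the completion time of B

# Each constraint is a linear relation over the start times (a, b).
constraints = [
    lambda a, b: a >= 1,                         # variable against a threshold
    lambda a, b: b >= a + DURATION_A,            # dependency between two variables
    lambda a, b: b + DURATION_B <= MAX_LATENCY,  # latency bound
]


def solve(domain=range(0, 11)):
    """Return every (a, b) pair that satisfies all constraints at once."""
    return [
        (a, b)
        for a, b in product(domain, repeat=2)
        if all(check(a, b) for check in constraints)
    ]


solutions = solve()
# Criterion: minimize the latency, i.e. the completion time of task B.
best = min(solutions, key=lambda s: s[1] + DURATION_B)
```

No constraint is given priority over another: a candidate is kept only if the conjunction of all relations holds, and optimization happens only afterwards on the surviving solutions.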
- Many models specific to the placement problem have been developed and complete the description of the problem, such as communications or physical time. Given the number of tasks and the amount of data to consider during placement, each model is defined in intension (by rules) rather than in extension (by enumeration).
- the possibility of working at several levels of granularity is fundamental for the placement problem, and we use for this an algebraic formulation of partitioning. This fixes the granularity of the other models, so it maintains many relationships within the conceptual model.
- dependency constraints often link several models, so these are global constraints and the cornerstone of the problem to be solved.
- There are heuristics local to one or more models. However, there is no known global heuristic.
- the heart of the present invention uses the multi-model approach by concurrent constraints [J. Jourdan, F. Fages, D. Rozzonelli & A. Demeure, "Data Alignment and Task Scheduling On Parallel Machines Using Concurrent Model-based Programming", Proc. ILPS 94, 1994], which makes it possible to grasp the problem of automatic placement in a global manner.
- the models are established on the basis of one model per constituent, whether functional or physical. By definition, a model must be seen as the set of specifications for the behavior of the constituent it models.
- the functional diagram of the single figure of the drawing shows the different models implemented for the “placement” function referenced 1 as a whole. These models are: the architecture of the target processors (2), the memory capacity (3), the partitioning of data flows (4), inter-processor communications (5), event scheduling or calculation sequencing (6), the physical time or calculation time (7) and the signal inputs and outputs (8).
- The links established between these models are of two kinds: “hyperlinks”, represented by complex arrows (9, 10) in the form of irregular polygons, each of which links several models together, and simple links, represented by lines with an arrowhead at each end, each connecting two models.
- the complex arrow 9 which corresponds to the “number of processors” criterion, links the models 2, 3, 4 and 5.
- Model 2 is linked by simple links to models 3 (“memory size” criterion), 5 (“bandwidth” criterion) and 6 (“programming mode” criterion).
- Model 3 is linked by simple links to models 4 (“data volume” criterion) and 6 (“distance and cardinality” criterion).
- Model 4 is linked by a simple link to model 7 (“calculation volume” criterion).
- Model 5 is linked by simple links to models 6 (“communication events” criterion) and 7 (“communication duration” criterion).
- Model 6 is linked by a simple link to model 7 (“distance and cardinality” criterion).
- Model 7 is linked by a simple link to model 8 (“latency and recurrence” criterion).
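The links listed above can be captured as a small graph structure, which makes it easy to ask which models share variables with a given model. The data below is transcribed from the text; the dictionary layout and the helper `neighbours` are assumptions of this sketch, and hyperlink 10 is omitted because the text does not say which models it connects.

```python
MODELS = {
    2: "architecture of the target processors",
    3: "memory capacity",
    4: "partitioning of data flows",
    5: "inter-processor communications",
    6: "event scheduling",
    7: "physical time",
    8: "signal inputs and outputs",
}

# (model, model, criterion) for each simple link.
SIMPLE_LINKS = [
    (2, 3, "memory size"),
    (2, 5, "bandwidth"),
    (2, 6, "programming mode"),
    (3, 4, "data volume"),
    (3, 6, "distance and cardinality"),
    (4, 7, "calculation volume"),
    (5, 6, "communication events"),
    (5, 7, "communication duration"),
    (6, 7, "distance and cardinality"),
    (7, 8, "latency and recurrence"),
]

# Hyperlink reference number -> (set of linked models, criterion).
HYPERLINKS = {9: ({2, 3, 4, 5}, "number of processors")}


def neighbours(model):
    """Return the models connected to `model` by a simple link or a hyperlink."""
    linked = set()
    for a, b, _criterion in SIMPLE_LINKS:
        if model == a:
            linked.add(b)
        elif model == b:
            linked.add(a)
    for members, _criterion in HYPERLINKS.values():
        if model in members:
            linked |= members - {model}
    return linked
```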
- the specifications of the behavior of the various constituents of these models are expressed on the basis of mathematical relationships. We can therefore deduce that the models are identified with the set of relationships defined on their variables. These relations are either primitives of the language used (primitives forming part of a library of relations), or relations defined by the user.
- Compositionality: the composition of relational models is simply the logical conjunction of the relationships that constitute the models. This gives compositionality a simple semantics.
- The set of solutions of a composite model is simply the intersection of the solution sets of its component models. These properties contribute to the universality of the program.
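This property can be checked on a toy example. The two models, their thresholds and the finite domain of `(processors, block_size)` pairs are hypothetical; the sketch only demonstrates that composing models by conjunction yields the intersection of their solution sets.

```python
from itertools import product

# Hypothetical finite domain of (processors, block_size) pairs.
DOMAIN = list(product(range(1, 9), (64, 128, 256)))


def memory_model(processors, block_size):
    """Hypothetical memory model: a block must fit in local memory."""
    return block_size <= 128


def architecture_model(processors, block_size):
    """Hypothetical architecture model: at most four processors exist."""
    return processors <= 4


def solutions(model):
    """Solution set of a model over the finite domain."""
    return {point for point in DOMAIN if model(*point)}


def compose(*models):
    """Composite model: logical conjunction of the component relations."""
    return lambda processors, block_size: all(
        m(processors, block_size) for m in models
    )


composite = compose(memory_model, architecture_model)
same = solutions(composite) == (
    solutions(memory_model) & solutions(architecture_model)
)
```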
- The properties of the induced process are then:
- A wider field of use: a model can be used in several contexts, depending on the goal to be achieved.
- the state of a system is characterized by the content of the memory at a given instant.
- the basic operations are reading and writing to or from memory.
- The state of a system is then characterized only by the set of values of the memory cells associated with the variables that compose it.
- The fundamental difference between the method of the invention and other software solutions is the representation of this memory.
- The memory is not reduced to a set of memory cells but itself constitutes a constraint.
- This constraint is capable of providing partial information on all the variables that make up the system. It is worth noting that all the reasoning implemented by the constraints is based on this paradigm of manipulating partial information. The advantage of constraints is simply that the system being developed can make decisions without waiting for its state to be fully determined.
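A small sketch of reasoning on partial information: each variable is known only as an integer interval, and a constraint narrows the intervals without fixing any value. Interval propagation is one common way of realizing this idea and is an assumption of the example, as are the function name `propagate_sum_le` and the numbers.

```python
def propagate_sum_le(x, y, limit):
    """Narrow integer intervals x=(lo, hi) and y=(lo, hi) under x + y <= limit."""
    x_lo, x_hi = x
    y_lo, y_hi = y
    # The largest feasible x leaves room for the smallest y, and vice versa.
    x_hi = min(x_hi, limit - y_lo)
    y_hi = min(y_hi, limit - x_lo)
    if x_lo > x_hi or y_lo > y_hi:
        raise ValueError("constraints are inconsistent")
    return (x_lo, x_hi), (y_lo, y_hi)


# Two buffer sizes known only as ranges, sharing a capacity of 10 units.
x, y = propagate_sum_le((3, 8), (4, 9), 10)

# A decision can already be taken although x is not yet determined:
# whatever value x finally takes, it will not exceed 6 units.
x_fits_in_six = x[1] <= 6
```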
- the resolution of industrial applications is not confined to a well-defined problem, but integrates the combination of several sub-problems.
- Combinatorial optimization problems must be solved on multi-component, multifunctional problems in which the constraints are very heterogeneous and where the different elements at different levels of granularity must be considered.
- the invention offers solutions allowing the coexistence of partially overlapping, coordinating and decomposing models.
- Independently of the heterogeneity of the constraints, the invention makes it possible to guarantee global coordination of the system through simple local interactions.
- the method of the invention offers a good technological solution, because it allows, during the resolution, the concurrent use of all the redundant models.
- The architecture, which is in fact a set of parameters such as the number of processors, bandwidth, ...
- The memory, which is defined through a capacity constraint under which an allocation is guaranteed to be computable.
- The real-time signal, which consists of latency and input/output constraints (periods, ...).
- a model consists of definitions of variables on which the model-specific constraints are based.
- The method of the invention both solves the problem of automatically placing signal processing applications on parallel machines and lets the user modify a proposed solution without violating the constraints posed by the overall system. This approach fits within a context of co-design and virtual prototyping.
- the user has configured his machine with an insufficient number of processors for the type of placement he wishes.
- the method will make it possible to find the minimum number of processors necessary for the placement imposed in the set of available processors.
- the user can impose a sequencing of calculations.
- The system will then find the partitions, that is to say the distributions of data and calculations in memory and on the appropriate processors. Likewise, the user can impose an initial partitioning, and the system will find compatible schedules.
- The process makes it possible to specify the resources necessary to place a particular application on a given type of machine without violating the application constraints. This consists in taking into account the dimensions and the number of each hardware component.
- Machine dimension: the process makes it possible to configure a minimum machine for a given application.
- The user can, for example, choose to configure a machine with a minimum number of processors.
- Latency: the process makes it possible to find the placement or placements that minimize the execution time of the application on a target machine predefined by the user.
- Cost: by assigning a cost (financial, for example) to each hardware component, the process makes it possible to find the placement or placements of the application that minimize this cost.
- Machine occupancy time: the process makes it possible to find the placement or placements that minimize the occupation time of the target machine predefined by the user, so that a second application can possibly be placed on it.
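The four criteria listed above can be sketched as interchangeable objective functions applied to a set of feasible placements. The `Placement` record, its fields and the candidate values are hypothetical; the sketch assumes the feasible placements have already been computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical summary of one feasible placement."""
    processors: int
    latency: float     # seconds
    cost: float        # arbitrary currency units
    occupancy: float   # seconds of machine occupation


# One objective function per criterion named in the text.
CRITERIA = {
    "machine dimension": lambda s: s.processors,
    "latency": lambda s: s.latency,
    "cost": lambda s: s.cost,
    "machine occupancy time": lambda s: s.occupancy,
}


def choose(solutions, criterion):
    """Return the feasible placement that minimizes the chosen criterion."""
    return min(solutions, key=CRITERIA[criterion])


# Illustrative candidates only.
candidates = [
    Placement(processors=4, latency=0.020, cost=400.0, occupancy=0.018),
    Placement(processors=8, latency=0.012, cost=800.0, occupancy=0.011),
    Placement(processors=6, latency=0.015, cost=550.0, occupancy=0.016),
]
```

Swapping the criterion changes which placement is selected without changing the constraint models themselves, for example `choose(candidates, "latency")` against `choose(candidates, "cost")`.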
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00915272A EP1082656A1 (fr) | 1999-04-02 | 2000-03-31 | Procede generique d'aide au placement d'applications de traitement de signal sur calculateurs paralleles |
AU36644/00A AU3664400A (en) | 1999-04-02 | 2000-03-31 | Generic aid method for placing signal processing applications on parallel computers |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR99/04182 | 1999-04-02 | ||
FR9904182A FR2791789B1 (fr) | 1999-04-02 | 1999-04-02 | Procede generique d'aide au placement d'applications de traitement de signal sur calculateurs paralleles |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2000060460A1 true WO2000060460A1 (fr) | 2000-10-12 |
Family
ID=9543985
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR2000/000824 WO2000060460A1 (fr) | 1999-04-02 | 2000-03-31 | Procede generique d'aide au placement d'applications de traitement de signal sur calculateurs paralleles |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1082656A1 (fr) |
AU (1) | AU3664400A (fr) |
FR (1) | FR2791789B1 (fr) |
WO (1) | WO2000060460A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002059743A2 (fr) * | 2001-01-25 | 2002-08-01 | Improv Systems, Inc. | Compilateur destine a des architectures a processeurs multiples et a memoire repartie |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2818406B1 (fr) * | 2000-12-19 | 2003-03-07 | Thomson Csf | Procede de placement d'applications multiprocesseurs |
FR2819601B1 (fr) * | 2001-01-16 | 2003-07-18 | Canon Kk | Procede et dispositif de partition de programme informatique |
FR2820526B1 (fr) | 2001-02-05 | 2003-06-13 | Thomson Csf | Procede de simulation de performances, et procede de realisation d'applications multiprocesseurs, et dispositifs permettant de mettre en oeuvre lesdits procedes |
DE102010028896A1 (de) * | 2010-05-11 | 2011-11-17 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Verfahren und Vorrichtung zum Zuweisen einer Mehrzahl von Teilaufgaben einer Aufgabe zu einer Mehrzahl von Recheneinheiten einer vorgegebenen Prozessorarchitektur |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0624842A2 (fr) * | 1993-04-12 | 1994-11-17 | Loral/Rolm Mil-Spec Corporation | Méthode pour le déploiement automatisé d'un programme de logiciel sur une architecture multi-processeur |
FR2732787A1 (fr) * | 1995-04-07 | 1996-10-11 | Thomson Csf | Procede de saisie graphique d'application de traitement de signal |
-
1999
- 1999-04-02 FR FR9904182A patent/FR2791789B1/fr not_active Expired - Fee Related
-
2000
- 2000-03-31 WO PCT/FR2000/000824 patent/WO2000060460A1/fr not_active Application Discontinuation
- 2000-03-31 EP EP00915272A patent/EP1082656A1/fr not_active Withdrawn
- 2000-03-31 AU AU36644/00A patent/AU3664400A/en not_active Abandoned
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0624842A2 (fr) * | 1993-04-12 | 1994-11-17 | Loral/Rolm Mil-Spec Corporation | Méthode pour le déploiement automatisé d'un programme de logiciel sur une architecture multi-processeur |
FR2732787A1 (fr) * | 1995-04-07 | 1996-10-11 | Thomson Csf | Procede de saisie graphique d'application de traitement de signal |
Non-Patent Citations (3)
Title |
---|
ANCOURT C ET AL: "Automatic data mapping of signal processing applications", PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON APPLICATIONS-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS. (CAT. NO.97TB100177), PROCEEDINGS IEEE INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, ZURICH, SWITZERL, 1997, Los Alamitos, CA, USA, IEEE Comput. Soc, USA, pages 350 - 362, XP002135742, ISBN: 0-8186-7959-X * |
C GUETTIER: "Optimisation globale et placement d' applications de traitement du signal sur architectures paralleles utilisant la programmation logique avec contraintes", THESE CENTRE DE RECHERCHES EN INFORMATIQUE ECOLE DES MINES DE PARIS, 12 December 1997 (1997-12-12), XP002143909 * |
JINGKE LI ET AL: "THE DATA ALIGNMENT PHASE IN COMPILING PROGRAMS FOR DISTRIBUTED- MEMORY MACHINES", JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING,US,ACADEMIC PRESS, DULUTH, MN, vol. 13, no. 2, 1 October 1991 (1991-10-01), pages 213 - 221, XP000228927, ISSN: 0743-7315 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002059743A2 (fr) * | 2001-01-25 | 2002-08-01 | Improv Systems, Inc. | Compilateur destine a des architectures a processeurs multiples et a memoire repartie |
WO2002059743A3 (fr) * | 2001-01-25 | 2002-10-31 | Improv Systems Inc | Compilateur destine a des architectures a processeurs multiples et a memoire repartie |
US7325232B2 (en) | 2001-01-25 | 2008-01-29 | Improv Systems, Inc. | Compiler for multiple processor and distributed memory architectures |
Also Published As
Publication number | Publication date |
---|---|
AU3664400A (en) | 2000-10-23 |
FR2791789B1 (fr) | 2001-08-10 |
EP1082656A1 (fr) | 2001-03-14 |
FR2791789A1 (fr) | 2000-10-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yu et al. | Dynamic control flow in large-scale machine learning | |
Rocklin | Dask: Parallel computation with blocked algorithms and task scheduling. | |
Cummins et al. | Compilergym: Robust, performant compiler optimization environments for ai research | |
US9081928B2 (en) | Embedded system development | |
Bergstra et al. | Machine learning for predictive auto-tuning with boosted regression trees | |
Zaccone | Python parallel programming cookbook | |
Audet et al. | Algorithm 1027: NOMAD version 4: Nonlinear optimization with the MADS algorithm | |
Boutellier et al. | Actor merging for dataflow process networks | |
Bonfietti et al. | Throughput constraint for synchronous data flow graphs | |
Danelutto et al. | Algorithmic skeletons meeting grids | |
WO2000060460A1 (fr) | Procede generique d'aide au placement d'applications de traitement de signal sur calculateurs paralleles | |
Huang et al. | Alcop: Automatic load-compute pipelining in deep learning compiler for ai-gpus | |
de Oliveira Dantas et al. | A component-based framework for certification of components in a cloud of HPC services | |
Madsen et al. | Modeling and analysis framework for embedded systems | |
Almqvist | Integrating SkePU's algorithmic skeletons with GPI on a cluster | |
Honorat | Modeling, Scheduling, Pipelining and Configuration of Synchronous Dataflow Graphs with Throughput Constraints | |
Jeanmougin et al. | Warp-Level CFG Construction for GPU Kernel WCET Analysis | |
Singhal et al. | A Vision on Accelerating Enterprise IT System 2.0 | |
Fox et al. | Algebraic models of correctness for abstract pipelines | |
Widemann et al. | On-line synchronous total purely functional data-flow programming on the java virtual machine with sig | |
Feldman | Software-Defined Hardware Without Sacrificing Performance | |
Zhou et al. | Dataflow-based, cross-platform design flow for DSP applications | |
Yactin et al. | On the feasibility of byzantine agreement to secure fog/edge data management | |
Kofman | Low power application architecture adaptation using SMT solvers | |
Grebant | Efficient tree-based symbolic WCET computation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
WWE | Wipo information: entry into national phase |
Ref document number: 09701384 Country of ref document: US |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2000915272 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 2000915272 Country of ref document: EP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
NENP | Non-entry into the national phase |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 2000915272 Country of ref document: EP |