WO2002003194A2 - Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program - Google Patents
- Publication number
- WO2002003194A2 (international application PCT/US2001/018614)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- code
- source program
- sequence
- instruction
- invocation
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
Definitions
- The present invention relates generally to compiler optimization techniques and, more specifically, to a multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program.
- Parallel applications are executed by a multiple processor computer system, which includes a plurality of processors interconnected so as to exchange information.
- Figure 1A is a block diagram of a distributed-memory multiple processor computer system.
- A computer system 100 includes multiple processing modules 120.
- Each processing module 120 includes a processor 122 and a memory 124. In computer system 100, any number of processing modules can be interconnected as shown.
- Figure 1B is a block diagram of a shared-memory multiple processor computer system.
- A computer system 150 includes multiple processors 160 connected to a shared memory 170.
- In one embodiment, shared memory 170 includes exclusive areas occupied by each processor 160 and a common area accessed by all processors. In computer system 150, only a limited number of processors 160 may be interconnected, due to the limitations imposed by shared memory 170.
- A compiler looks at the entire source program, collects and reorganizes the instructions, and translates the source program into object code.
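- One compiler technique uses outlining, which moves a code region, such as a parallel loop, out of its enclosing routine into a separate subroutine.
- Each outlined subroutine is then sent to one thread in a processor of the parallel computer system.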
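Outlining can be illustrated with a minimal C sketch. It assumes POSIX threads as the threading layer, which the patent does not specify. The body of a hypothetical vector-add loop is moved into a separate subroutine, `outlined_loop`, which receives its iteration range through an argument block, and each call runs on one thread. The names `LoopArgs`, `outlined_loop`, and `run_outlined` are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical argument block: the data the outlined loop body uses,
   plus the half-open iteration range [lo, hi) assigned to one thread. */
typedef struct {
    float *a;
    const float *b;
    const float *c;
    size_t lo, hi;
} LoopArgs;

/* Outlined subroutine: the body of the loop
   `for (i = 0; i < n; i++) a[i] = b[i] + c[i];`
   moved out of its enclosing routine. */
static void *outlined_loop(void *p)
{
    const LoopArgs *args = p;
    for (size_t i = args->lo; i < args->hi; i++)
        args->a[i] = args->b[i] + args->c[i];
    return NULL;
}

/* Sends one call of the outlined subroutine to each of `nthreads`
   threads, then waits for every started thread. Returns 0 on success. */
static int run_outlined(float *a, const float *b, const float *c,
                        size_t n, size_t nthreads)
{
    pthread_t tid[MAX_THREADS];
    LoopArgs args[MAX_THREADS];
    size_t started = 0;
    int rc = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    for (size_t t = 0; t < nthreads; t++) {
        args[t] = (LoopArgs){ .a = a, .b = b, .c = c,
                              .lo = n * t / nthreads,
                              .hi = n * (t + 1) / nthreads };
        if (pthread_create(&tid[t], NULL, outlined_loop, &args[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (size_t t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    return rc;
}
```

Build with `-pthread`. The joins at the end of `run_outlined` play the role of the synchronization point after the outlined region.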
- Figure 1A is a block diagram of one embodiment for a distributed-memory multiple processor computer system.
- Figure 1B is a block diagram of one embodiment for a shared-memory multiple processor computer system.
- Figure 2 is a block diagram of one embodiment for a computer system.
- Figure 3A is a block diagram of one embodiment for a process of obtaining an executable program in a computer system.
- Figure 3B is a block diagram of one embodiment for a process of obtaining a parallel executable program in a computer system.
- Figure 4 is a flow diagram of one embodiment for a multi-entry threading method for automatic and directive-guided parallelization of a source program.
- The acts are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
- Such quantities may be represented within registers or other such information storage, transmission, or display devices.
- The present invention also relates to an apparatus for performing the operations described herein.
- This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
- Such a computer program may be stored in a computer-readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
- Any of the methods according to the present invention can be implemented in hard-wired circuitry, by programming a general-purpose processor, or by any combination of hardware and software.
- The invention can also be practiced in distributed computing environments, where tasks are performed by remote processing devices that are linked through a communications network.
- Sequences of instructions designed to implement the methods can be compiled for execution on a variety of hardware platforms and for interface to a variety of operating systems.
- Figure 2 is a block diagram of one embodiment for a computer system 200.
- Computer system 200 includes a system bus 201, or other communications module similar to the system bus, for communicating information, and a processing module, such as processor 202, connected to bus 201 for processing information.
- Computer system 200 further includes a main memory 204, such as a random access memory (RAM) or other dynamic storage device, connected to bus 201 for storing information and instructions to be executed by processor 202.
- Main memory 204 may also be used for storing temporary variables or other intermediate information during execution of instructions by processor 202.
- Computer system 200 also includes a read-only memory (ROM) 206 and/or other similar static storage device, connected to bus 201, for storing static information.
- An optional data storage device 207, such as a magnetic disk or optical disk, may also be connected to computer system 200.
- System bus 201 is connected to a bus 210.
- Computer system 200 may also be connected via bus 210 to a display device 221.
- An alphanumeric input device 222, such as a keyboard including alphanumeric and other keys, is also connected to bus 210.
- Another type of user input device is cursor control device 223, such as a conventional mouse, touch mouse, trackball, or other type of cursor direction keys, for communicating direction information and command selections.
- A fully loaded computer system may optionally include video, camera, speakers, sound card, and many other similar conventional options.
- A communication device 224 is also connected to bus 210 for accessing remote servers over a network.
- Communication device 224 may include a modem, a network interface card, or other well-known interface devices.
- In this way, computer system 200 may be connected to a number of servers via a conventional network infrastructure.
- Figure 3A is a block diagram of one embodiment for a process of obtaining an executable program in a computer system.
- Source file 310 includes source code written by programmers in high-level languages, for example FORTRAN or C.
- The source code instructions must be translated into machine language before they can be executed.
- The translation process involves several processing steps applied to the high-level language source code instructions.
- Source file 310 is passed through a compiler (not shown).
- The compiler translates the high-level instructions into object code stored within object files 320.
- The compiler needs to generate the object code in a form suitable for parallel execution.
- The object code includes multiple modules.
- Some modules may be stored in runtime libraries.
- The linker 330 combines the modules and assigns real values to addresses within the modules, thereby producing an executable program 350.
- Figure 3B is a block diagram of one embodiment for a process of obtaining a parallel executable program in a computer system.
- A serial source program 360 and a serial source program with OpenMP directives 365 are compiled by a compiler (not shown), which creates a parallel executable program.
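A serial source program with OpenMP directives might look like the following C sketch. The function name `vector_add` is hypothetical. When OpenMP is enabled (for example with GCC's `-fopenmp`), the `parallel for` directive lets the compiler divide the loop iterations among threads. A compiler that does not recognize the pragma ignores it and still produces a correct serial program.

```c
#include <assert.h>

/* Serial source with an OpenMP directive: the pragma marks the loop as a
   parallel region whose iterations may be divided among threads. Without
   OpenMP support the pragma is ignored and the loop runs serially. */
void vector_add(float *a, const float *b, const float *c, int n)
{
    assert(n >= 0);
#pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```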
- FIG. 4 is a flow diagram of one embodiment for a multi-entry threading method for automatic and directive-guided parallelization of a source program.
- A source program to be compiled and executed by a multi-processor computer system needs to be parallelized to take full advantage of the system's resources.
- The source program includes multiple loops of code, also known as parallel regions.
- A parallel region or loop is defined as a code block of the program that is to be executed by multiple threads in parallel.
- One example of a source program including multiple parallel regions or loops is as follows:
```c
#include <stdio.h>
#define NSIZE 200

int main(void)
{
    int x, i, j;
    float a[NSIZE], b[NSIZE], c[NSIZE];
    /* ... */
    return 0;
}
```
- Each thread receives a portion of the loop and executes that portion in parallel with the other threads.
- Parallel regions or loops are sequences of code representing the fundamental parallel constructs, which indicate code to be executed in parallel.
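Dividing a loop among threads can be sketched as a static block partition. The sketch assumes that iterations 0 through n-1 are split into nearly equal contiguous chunks. The patent does not state which scheduling policy the run-time system uses, and `loop_chunk` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Computes the half-open iteration range [*lo, *hi) that thread `t` of
   `nthreads` executes for a loop of `n` iterations. The first n % nthreads
   threads receive one extra iteration, so chunk sizes differ by at most one. */
static void loop_chunk(size_t n, size_t nthreads, size_t t,
                       size_t *lo, size_t *hi)
{
    assert(nthreads > 0 && t < nthreads);
    size_t base = n / nthreads;
    size_t extra = n % nthreads;
    *lo = t * base + (t < extra ? t : extra);
    *hi = *lo + base + (t < extra ? 1 : 0);
}
```

For example, with 10 iterations and 3 threads, the threads receive the ranges [0, 4), [4, 7), and [7, 10). A thread whose index exceeds the iteration count receives an empty range.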
- At processing block 410, the compiler receives and reads the source program or source code.
- At processing block 420, the compiler locates a first parallel construct within the routine to be executed in parallel.
- A start code is then generated by the compiler.
- The start code is a new threaded entry code indicating the beginning of the parallel construct.
- An invocation code is also generated by the compiler.
- The invocation code is an invocation instruction.
- The new threaded entry code is inserted before the parallel construct, that is, prior to the first instruction of the parallel construct.
- The invocation instruction is inserted before the new threaded entry code in the source program.
- A stop code is then inserted after the parallel construct. The stop code is a threaded return instruction, inserted after the last instruction of the parallel construct.
- The threaded return instruction signals the run-time system to perform the synchronization and return to the main program.
- A new location instruction is generated by the compiler.
- The location instruction is a label instruction.
- The location instruction is inserted after the threaded return instruction.
- A jump to the location instruction is placed as a prefix before the new threaded entry, directing the system to continue execution of the source program at the location instruction.
- If the source program contains another parallel construct, blocks 420 through 495 are processed again with respect to the new parallel construct.
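The rewrite in blocks 420 through 495 can be sketched in C over a toy program held as an array of pseudo-instruction strings. This is a hypothetical illustration, not the patent's implementation. The `Program` type, the mnemonics `invoke`, `jump`, `threaded_entry`, and `threaded_return`, and the `__entryN`/`__resumeN` label names are all assumptions. The order of the invocation and the jump is also inferred, since the description only says both precede the new threaded entry. As this sketch reads the description, the construct stays inside its original routine, which gains an additional threaded entry point.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINES 64
#define LINE_LEN  48

/* Hypothetical toy program representation: one pseudo-instruction per line. */
typedef struct {
    char line[MAX_LINES][LINE_LEN];
    size_t count;
} Program;

/* Appends one instruction to a program. */
static void emit(Program *p, const char *text)
{
    assert(p->count < MAX_LINES && strlen(text) < LINE_LEN);
    strcpy(p->line[p->count++], text);
}

/* Rewrites the parallel construct occupying in->line[first..last]
   (inclusive) into multi-entry form, writing the result to a separate
   program `out`. `id` numbers the construct so its labels are unique. */
static void thread_construct(const Program *in, size_t first, size_t last,
                             int id, Program *out)
{
    char buf[LINE_LEN];
    size_t k;

    assert(in != out && first <= last && last < in->count);
    out->count = 0;
    for (k = 0; k < first; k++)               /* serial code before the construct */
        emit(out, in->line[k]);
    snprintf(buf, sizeof buf, "invoke __entry%d", id);          /* invocation instruction */
    emit(out, buf);
    snprintf(buf, sizeof buf, "jump __resume%d", id);           /* jump prefix */
    emit(out, buf);
    snprintf(buf, sizeof buf, "__entry%d: threaded_entry", id); /* start code */
    emit(out, buf);
    for (k = first; k <= last; k++)           /* the parallel construct itself */
        emit(out, in->line[k]);
    emit(out, "threaded_return");             /* stop code: synchronize, return */
    snprintf(buf, sizeof buf, "__resume%d:", id);               /* location (label) instruction */
    emit(out, buf);
    for (k = last + 1; k < in->count; k++)    /* serial code after the construct */
        emit(out, in->line[k]);
}
```

Take the five-line program `x = 0`, `loop_begin`, `a[i] = b[i] + c[i]`, `loop_end`, `print x`, with the construct on lines 1 through 3. The sketch yields ten lines: `x = 0`, `invoke __entry1`, `jump __resume1`, `__entry1: threaded_entry`, the three construct lines, `threaded_return`, `__resume1:`, and `print x`. Each rewrite adds five lines, so the indices of any later construct shift by five before the next pass.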
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0301568A GB2381356B (en) | 2000-06-30 | 2001-06-08 | Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program |
AU2001266796A AU2001266796A1 (en) | 2000-06-30 | 2001-06-08 | Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program |
DE10196389T DE10196389T1 (de) | 2000-06-30 | 2001-06-08 | Multi-entry threading method and device for automatic and directive-guided parallelization of a source program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US60808700A | 2000-06-30 | 2000-06-30 | |
US09/608,087 | 2000-06-30 | | |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002003194A2 (fr) | 2002-01-10 |
WO2002003194A3 (fr) | 2003-01-23 |
Family
Family ID: 24434971
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2001/018614 WO2002003194A2 (fr) | 2000-06-30 | 2001-06-08 | Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program |
Country Status (6)
Country | Link |
---|---|
CN (1) | CN1210650C (fr) |
AU (1) | AU2001266796A1 (fr) |
DE (1) | DE10196389T1 (fr) |
GB (1) | GB2381356B (fr) |
TW (1) | TW525090B (fr) |
WO (1) | WO2002003194A2 (fr) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7487496B2 (en) * | 2004-12-02 | 2009-02-03 | International Business Machines Corporation | Computer program functional partitioning method for heterogeneous multi-processing systems |
US7478376B2 (en) * | 2004-12-02 | 2009-01-13 | International Business Machines Corporation | Computer program code size partitioning method for multiple memory multi-processing systems |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0244928A1 (fr) * | 1986-05-01 | 1987-11-11 | The British Petroleum Company p.l.c. | Control flow in computers |
US5278986A (en) * | 1991-12-13 | 1994-01-11 | Thinking Machines Corporation | System and method for compiling a source code supporting data parallel variables |
WO1994022077A2 (fr) * | 1993-03-15 | 1994-09-29 | University Of Westminster | Parallel computation device and method |
-
2001
- 2001-04-20 TW TW90109532A patent/TW525090B/zh not_active IP Right Cessation
- 2001-06-08 WO PCT/US2001/018614 patent/WO2002003194A2/fr active Application Filing
- 2001-06-08 AU AU2001266796A patent/AU2001266796A1/en not_active Abandoned
- 2001-06-08 CN CN 01812124 patent/CN1210650C/zh not_active Expired - Fee Related
- 2001-06-08 DE DE10196389T patent/DE10196389T1/de not_active Ceased
- 2001-06-08 GB GB0301568A patent/GB2381356B/en not_active Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
CHOW, JYH-HERNG; LYON, L E; SARKAR, V: "Automatic Parallelization for Symmetric Shared-Memory Multiprocessors" PROCEEDINGS OF CASCON '96, 12 - 14 November 1996, pages 1-14, XP002205143 Toronto, Canada cited in the application * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1569104A2 (fr) * | 2004-01-09 | 2005-08-31 | Interuniversitair Microelektronica Centrum Vzw | Automated method and computer system adapted for parallelizing sequential code |
EP1569104A3 (fr) * | 2004-01-09 | 2006-05-03 | Interuniversitair Microelektronica Centrum Vzw | Automated method and computer system adapted for parallelizing sequential code |
EP2315118A1 (fr) * | 2009-10-20 | 2011-04-27 | Bull Hn Information Systems Inc. | Method and apparatus for enabling parallel processing during execution of a COBOL source program using a two-stage compilation |
US20140189663A1 (en) * | 2009-10-20 | 2014-07-03 | Cynthia S. Guenthner | Method and apparatus enabling multi threaded program execution for a cobol program including openmp directives by utilizing a two-stage compilation process |
US8869126B2 (en) * | 2009-10-20 | 2014-10-21 | Bull Hn Information Systems Inc. | Method and apparatus enabling multi threaded program execution for a Cobol program including OpenMP directives by utilizing a two-stage compilation process |
Also Published As
Publication number | Publication date |
---|---|
AU2001266796A1 (en) | 2002-01-14 |
WO2002003194A3 (fr) | 2003-01-23 |
DE10196389T1 (de) | 2003-06-18 |
CN1210650C (zh) | 2005-07-13 |
GB2381356A (en) | 2003-04-30 |
TW525090B (en) | 2003-03-21 |
GB2381356B (en) | 2004-09-22 |
GB0301568D0 (en) | 2003-02-26 |
CN1446334A (zh) | 2003-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8037465B2 (en) | Thread-data affinity optimization using compiler | |
US9424013B2 (en) | System and method for reducing transactional abort rates using compiler optimization techniques | |
Hermanns | Parallel programming in Fortran 95 using OpenMP | |
US5778212A (en) | Interprocedural analysis user interface | |
US8677331B2 (en) | Lock-clustering compilation for software transactional memory | |
EP2815313B1 (fr) | Rasterization of compute shaders |
Allen et al. | A framework for determining useful parallelism | |
EP0806725B1 (fr) | Method and apparatus for early insertion of assembly code for optimization purposes |
US20130283250A1 (en) | Thread Specific Compiler Generated Customization of Runtime Support for Application Programming Interfaces | |
US8341615B2 (en) | Single instruction multiple data (SIMD) code generation for parallel loops using versioning and scheduling | |
Krishnamurthy et al. | Optimizing parallel programs with explicit synchronization | |
WO2000029937A2 (fr) | Computer system, computer-readable storage medium, method of operating the same, and method of commissioning said system |
US8966461B2 (en) | Vector width-aware synchronization-elision for vector processors | |
Hammond | Parallel Functional Programming: An Introduction. | |
US6301652B1 (en) | Instruction cache alignment mechanism for branch targets based on predicted execution frequencies | |
Su et al. | Automatic generation of fast BLAS3-GEMM: A portable compiler approach | |
US20130086565A1 (en) | Low-level function selection using vector-width | |
Addison et al. | OpenMP 3.0 tasking implementation in OpenUH | |
WO2002003194A2 (fr) | Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program |
Shei et al. | MATLAB parallelization through scalarization | |
Chamberlain et al. | Factor-join: A unique approach to compiling array languages for parallel machines | |
Trancoso et al. | DDMCPP: The data-driven multithreading C pre-processor | |
Bernard et al. | On the compilation of a language for general concurrent target architectures | |
US7162718B1 (en) | Language extension for light weight threading in a JVM | |
Tao et al. | Automatic parallelization of programs via software stream rewriting |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
ENP | Entry into the national phase in: | Ref document number: 0301568; Country of ref document: GB; Kind code of ref document: A; Free format text: PCT FILING DATE = 20010608 |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
WWE | WIPO information: entry into national phase | Ref document number: IN/PCT/2002/01701/MU; Country of ref document: IN |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
WWE | WIPO information: entry into national phase | Ref document number: 018121241; Country of ref document: CN |
RET | De translation (de og part 6b) | Ref document number: 10196389; Country of ref document: DE; Date of ref document: 20030618; Kind code of ref document: P |
WWE | WIPO information: entry into national phase | Ref document number: 10196389; Country of ref document: DE |
122 | EP: PCT application non-entry in European phase | |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8607 |
NENP | Non-entry into the national phase in: | Ref country code: JP |