WO2002003194A2 - Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program - Google Patents

Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program

Info

Publication number
WO2002003194A2
Authority
WO
WIPO (PCT)
Prior art keywords
code
source program
sequence
instruction
invocation
Prior art date
Application number
PCT/US2001/018614
Other languages
English (en)
Other versions
WO2002003194A3 (fr)
Inventor
Knud Kirkegaard
Milind Girkar
Paul Grey
Xinmin Tian
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation
Priority to GB0301568A (GB2381356B)
Priority to AU2001266796A (AU2001266796A1)
Priority to DE10196389T (DE10196389T1)
Publication of WO2002003194A2
Publication of WO2002003194A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/456 Parallelism detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation

Definitions

  • the present invention relates generally to compiler optimization techniques and, more specifically, to a multi-entry threading method and
  • Parallel applications are executed by a multiple processor computer system which includes a plurality of processors interconnected so as to exchange
  • Figure 1A is a block diagram of a distributed-memory multiple processor
  • a computer system 100 includes
  • Each processing module 120 includes a processor 122 and a memory 124. In the computer system 100, any number of processing modules can be interconnected as shown.
  • Figure 1B is a block diagram of a shared-memory multiple processor
  • a computer system 150 includes multiple processors 160 connected to a shared memory 170.
  • In one embodiment, shared memory 170 includes exclusive areas occupied by each processor 160 and a common area accessed by all processors. In the computer system 150, only a limited number of processors 160 may be interconnected, due to the limitations imposed by the shared memory 170.
  • a compiler looks at the entire source program, collects and reorganizes the instructions, and translates the source program into
  • One compiler technique involves use of outlining technology, which
  • Each outlined subroutine is then sent to one thread in a processor of parallel
  • Figure 1A is a block diagram of one embodiment for a distributed-memory multiple processor computer system.
  • Figure 1B is a block diagram of one embodiment for a shared-memory multiple processor computer system.
  • Figure 2 is a block diagram of one embodiment for a computer system.
  • Figure 3A is a block diagram of one embodiment for a process of obtaining an executable program in a computer system.
  • Figure 3B is a block diagram of one embodiment for a process of obtaining a parallel executable program in a computer system.
  • Figure 4 is a flow diagram of one embodiment for a multi-entry threading method for automatic and directive-guided parallelization of a source program.
  • the acts are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
  • registers or other such information storage, transmission or display devices.
  • the present invention also relates to an apparatus for performing the
  • This apparatus may be specially constructed for the required
  • Such a computer may comprise a general purpose computer, selectively activated or reconfigured by a computer program stored in the computer.
  • Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
  • any of the methods according to the present invention can be implemented in hard-wired circuitry, by programming a general purpose processor or by any combination of hardware and software.
  • the invention can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • sequences of instructions designed to implement the methods can be compiled for execution on a variety of hardware platforms and for interface to a variety of
  • Figure 2 is a block diagram of one embodiment for a computer system 200.
  • Computer system 200 includes a system bus 201, or other communications module similar to the system bus, for communicating information, and a processing module, such as processor 202, connected to bus 201 for processing
  • Computer system 200 further includes a main memory 204, such as a random access memory (RAM) or other dynamic storage device, connected to
  • bus 201 for storing information and instructions to be executed by processor 202.
  • Main memory 204 may also be used for storing temporary variables or other intermediate information during execution of instructions by processor 202.
  • Computer system 200 also includes a read only memory (ROM) 206, and/or other similar static storage device, connected to bus 201, for storing static information
  • An optional data storage device 207 such as a magnetic disk or optical
  • System bus 201 is connected to an
  • Computer system 200 may also be connected via bus 210 to a display device 221,
  • an alphanumeric input device 222 such as a keyboard including alphanumeric and other keys, is
  • Another type of user input device is cursor control device 223, such as a conventional mouse, touch mouse, trackball, or other type of cursor direction keys for communicating direction information and command selection
  • a fully loaded computer system may optionally include video, camera, speakers, sound card, and many other similar conventional options.
  • a communication device 224 is also connected to bus 210 for accessing
  • device 224 may include a modem, a network interface card, or other well known
  • the computer system 200 may be connected to a number of servers via a conventional network infrastructure.
  • Figure 3A is a block diagram of one embodiment for a process of obtaining
  • source file 310 includes source code written by programmers in high-level languages, for example FORTRAN or C.
  • the source code instructions must be translated into machine language.
  • the translation process involves several processing steps and
  • the high-level language source code instructions are configured to:
  • source file 310 is passed through a compiler (not shown).
  • the compiler translates the high-level instructions into object code stored within object files 320.
  • the compiler needs to generate the object code in a form suitable for parallel execution.
  • the object code includes multiple modules, each module
  • Some modules may be stored in runtime
  • the linker 330 combines the modules and gives real values to addresses within the modules, thereby producing an executable program 350.
  • Figure 3B is a block diagram of one embodiment for a process of obtaining
  • serial source program 360 and a serial source program with OpenMP directives 365 are compiled by a compiler (not shown), which creates parallel executable
  • FIG. 4 is a flow diagram of one embodiment for a multi-entry threading method for automatic and directive-guided parallelization of a source program.
  • a source program to be compiled and executed by a multi-processor computer system needs to be parallelized in order to fully take advantage of the system's resources. Therefore, depending on the number of
  • the source program includes multiple loops of code, also known as parallel regions.
  • a parallel region or loop is defined as a code block of the program that is to be executed by the multiple threads in parallel.
  • One example of a source program including multiple parallel regions or loops is as follows:
  • #include <stdio.h> #define NSIZE 200 main() { int x, i, j; float a[NSIZE], b[NSIZE], c[NSIZE];
  • Each thread receives a portion of the loop and executes the portion in
  • Parallel regions or loops are sequences of the code representing the fundamental parallel constructs that indicate code to be executed
  • At processing block 410, the source program or source code is received and read by the compiler.
  • At processing block 420, a first parallel construct within the routine to be executed in parallel is located by the compiler.
  • a start code is generated by the compiler.
  • the start code is a new threaded entry code indicating the beginning
  • an invocation code is generated by the compiler.
  • the invocation code is an invocation
  • the new threaded entry code is inserted before the
  • the new entry code is inserted prior to a first instruction of the parallel construct.
  • the invocation instruction is inserted before the new threaded entry code in the source program.
  • a stop code is inserted after the parallel construct
  • the stop code is a threaded return instruction, which is inserted after a last instruction of the parallel construct.
  • the threaded return instruction signals the run-time system to perform the synchronization and return to the main program.
  • a new location instruction is generated by the
  • the location instruction is a label instruction
  • the location instruction is inserted after the threaded return
  • the jump is a prefix before the new threaded entry to direct the system to continue execution of the source program at the location instruction.
  • blocks 420 through 495 are processed again with respect to the new parallel
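
The insertion steps listed above can be illustrated with a minimal sketch in C. It is not the compiler described in the application: the markers PARALLEL_BEGIN and PARALLEL_END, the pseudo-instruction names (INVOKE T_ENTRY, JUMP LABEL, T_ENTRY, T_RETURN, LABEL) and the function thread_regions are all hypothetical, and the placement of the jump between the invocation and the threaded entry is one reading of the text. The sketch only shows the order in which the codes end up around each parallel construct.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OUT 64    /* capacity of the output listing (assumption) */
#define LINE_LEN 48   /* maximum length of one pseudo-instruction */

/* Hypothetical markers that delimit one parallel construct in the input. */
#define REGION_BEGIN "PARALLEL_BEGIN"
#define REGION_END   "PARALLEL_END"

/* Append one pseudo-instruction; a positive id is appended as a suffix. */
static void emit(char out[][LINE_LEN], int *n, const char *text, int id)
{
    assert(*n < MAX_OUT);
    if (id > 0)
        snprintf(out[*n], LINE_LEN, "%s %d", text, id);
    else
        snprintf(out[*n], LINE_LEN, "%s", text);
    (*n)++;
}

/* Rewrite a listing so that every marked construct is wrapped as:
 *   invocation, jump, threaded entry, body, threaded return, label.
 * Returns the number of lines written to out. */
int thread_regions(const char *in[], int n_in, char out[][LINE_LEN])
{
    int n_out = 0;
    int region = 0;   /* index of the current parallel construct */

    for (int i = 0; i < n_in; i++) {
        if (strcmp(in[i], REGION_BEGIN) == 0) {
            region++;
            emit(out, &n_out, "INVOKE T_ENTRY", region); /* invocation code */
            emit(out, &n_out, "JUMP LABEL", region);     /* skip on serial path */
            emit(out, &n_out, "T_ENTRY", region);        /* start code */
        } else if (strcmp(in[i], REGION_END) == 0) {
            emit(out, &n_out, "T_RETURN", region);       /* stop code */
            emit(out, &n_out, "LABEL", region);          /* location instruction */
        } else {
            emit(out, &n_out, in[i], 0);                 /* copied unchanged */
        }
    }
    return n_out;
}
```

Calling thread_regions on a listing with one marked loop leaves the surrounding instructions in place and adds five pseudo-instructions around the loop body; a second marked construct would receive its own entry and label numbered 2, which mirrors the repetition of blocks 420 through 495 for each new parallel construct.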

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention relates to a method and an apparatus for compiling a source program. The method comprises locating multiple predetermined sequences within the source program; inserting a start code in the source program before a first instruction of each predetermined sequence; inserting an invocation code in the source program before the start code, the invocation code addressing the start code and transferring each sequence to a system for execution; and inserting a stop code in the source program after a last instruction of each sequence, the stop code signaling the system to stop the sequence.
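
How the inserted codes might behave at run time can be emulated in plain C. The sketch below is an assumption-laden illustration, not the apparatus of the application: runtime_invoke stands in for the run-time system and simply calls the routine once per simulated thread, NTHREADS and the strided split of the loop are arbitrary choices, and only NSIZE and the arrays a, b and c are borrowed from the example program in the description. The routine has a normal entry (0) and one threaded entry (1) placed in front of the parallel loop.

```c
#include <assert.h>
#include <stdio.h>

#define NSIZE 200
#define NTHREADS 4   /* assumed team size for this emulation */

static float a[NSIZE], b[NSIZE], c[NSIZE];

static void routine(int entry, int tid);

/* Stand-in for the run-time system: enters the routine at the given
 * threaded entry once per simulated thread, one after another. */
static void runtime_invoke(int entry)
{
    for (int tid = 0; tid < NTHREADS; tid++)
        routine(entry, tid);
}

/* One routine with two entries: 0 is the normal entry, 1 is the
 * threaded entry in front of the parallel loop. */
static void routine(int entry, int tid)
{
    assert(entry == 0 || entry == 1);
    if (entry == 1)
        goto t_entry;                    /* a thread enters at the start code */

    for (int i = 0; i < NSIZE; i++) {    /* serial part of the routine */
        b[i] = (float)i;
        c[i] = 2.0f * (float)i;
    }
    runtime_invoke(1);                   /* invocation code: hand the loop over */
    goto after_region;                   /* serial path skips the loop body */

t_entry:
    for (int i = tid; i < NSIZE; i += NTHREADS)  /* this thread's share */
        a[i] = b[i] + c[i];
    return;                              /* stop code: threaded return */

after_region:
    printf("a[%d] = %.1f\n", NSIZE - 1, a[NSIZE - 1]);
}
```

Calling routine(0, 0) runs the serial part, passes the loop to the stand-in run-time system, and continues after the region. The loop body stays inside the routine, so it can still see the routine's data without being moved into a separate subroutine.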
PCT/US2001/018614 2000-06-30 2001-06-08 Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program WO2002003194A2 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB0301568A GB2381356B (en) 2000-06-30 2001-06-08 Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program
AU2001266796A AU2001266796A1 (en) 2000-06-30 2001-06-08 Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program
DE10196389T DE10196389T1 (de) 2000-06-30 2001-06-08 Multi-Eintritts-Threading-Verfahren und -Einrichtung für eine automatische und direktiv-gelenkte Parallelisierung eines Quellprogramms

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US60808700A 2000-06-30 2000-06-30
US09/608,087 2000-06-30

Publications (2)

Publication Number Publication Date
WO2002003194A2 (fr) 2002-01-10
WO2002003194A3 (fr) 2003-01-23

Family

ID=24434971

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/018614 WO2002003194A2 (fr) 2000-06-30 2001-06-08 Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program

Country Status (6)

Country Link
CN (1) CN1210650C (fr)
AU (1) AU2001266796A1 (fr)
DE (1) DE10196389T1 (fr)
GB (1) GB2381356B (fr)
TW (1) TW525090B (fr)
WO (1) WO2002003194A2 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1569104A2 (fr) * 2004-01-09 2005-08-31 Interuniversitair Microelektronica Centrum Vzw Automated method and computer system adapted for parallelizing sequential code
EP2315118A1 (fr) * 2009-10-20 2011-04-27 Bull Hn Information Systems Inc. Method and apparatus for enabling parallel processing during execution of a Cobol source program using two-stage compilation
US20140189663A1 (en) * 2009-10-20 2014-07-03 Cynthia S. Guenthner Method and apparatus enabling multi threaded program execution for a cobol program including openmp directives by utilizing a two-stage compilation process

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7487496B2 (en) * 2004-12-02 2009-02-03 International Business Machines Corporation Computer program functional partitioning method for heterogeneous multi-processing systems
US7478376B2 (en) * 2004-12-02 2009-01-13 International Business Machines Corporation Computer program code size partitioning method for multiple memory multi-processing systems

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0244928A1 (fr) * 1986-05-01 1987-11-11 The British Petroleum Company p.l.c. Control flow in computers
US5278986A (en) * 1991-12-13 1994-01-11 Thinking Machines Corporation System and method for compiling a source code supporting data parallel variables
WO1994022077A2 (fr) * 1993-03-15 1994-09-29 University Of Westminster Apparatus and method for parallel computation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHOW, JYH-HERNG; LYON, L E; SARKAR, V: "Automatic Parallelization for Symmetric Shared-Memory Multiprocessors" PROCEEDINGS OF CASCON '96, 12 - 14 November 1996, pages 1-14, XP002205143 Toronto, Canada cited in the application *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1569104A2 (fr) * 2004-01-09 2005-08-31 Interuniversitair Microelektronica Centrum Vzw Automated method and computer system adapted for parallelizing sequential code
EP1569104A3 (fr) * 2004-01-09 2006-05-03 Interuniversitair Microelektronica Centrum Vzw Automated method and computer system adapted for parallelizing sequential code
EP2315118A1 (fr) * 2009-10-20 2011-04-27 Bull Hn Information Systems Inc. Method and apparatus for enabling parallel processing during execution of a Cobol source program using two-stage compilation
US20140189663A1 (en) * 2009-10-20 2014-07-03 Cynthia S. Guenthner Method and apparatus enabling multi threaded program execution for a cobol program including openmp directives by utilizing a two-stage compilation process
US8869126B2 (en) * 2009-10-20 2014-10-21 Bull Hn Information Systems Inc. Method and apparatus enabling multi threaded program execution for a Cobol program including OpenMP directives by utilizing a two-stage compilation process

Also Published As

Publication number Publication date
AU2001266796A1 (en) 2002-01-14
WO2002003194A3 (fr) 2003-01-23
DE10196389T1 (de) 2003-06-18
CN1210650C (zh) 2005-07-13
GB2381356A (en) 2003-04-30
TW525090B (en) 2003-03-21
GB2381356B (en) 2004-09-22
GB0301568D0 (en) 2003-02-26
CN1446334A (zh) 2003-10-01

Similar Documents

Publication Publication Date Title
US8037465B2 (en) Thread-data affinity optimization using compiler
US9424013B2 (en) System and method for reducing transactional abort rates using compiler optimization techniques
Hermanns Parallel programming in Fortran 95 using OpenMP
US5778212A (en) Interprocedural analysis user interface
US8677331B2 (en) Lock-clustering compilation for software transactional memory
EP2815313B1 (fr) Rasterization of compute shaders
Allen et al. A framework for determining useful parallelism
EP0806725B1 (fr) Method and apparatus for early insertion of assembler code for optimization
US20130283250A1 (en) Thread Specific Compiler Generated Customization of Runtime Support for Application Programming Interfaces
US8341615B2 (en) Single instruction multiple data (SIMD) code generation for parallel loops using versioning and scheduling
Krishnamurthy et al. Optimizing parallel programs with explicit synchronization
WO2000029937A2 (fr) Computer system, computer-readable storage medium, method of operation, and method of commissioning said system
US8966461B2 (en) Vector width-aware synchronization-elision for vector processors
Hammond Parallel Functional Programming: An Introduction.
US6301652B1 (en) Instruction cache alignment mechanism for branch targets based on predicted execution frequencies
Su et al. Automatic generation of fast BLAS3-GEMM: A portable compiler approach
US20130086565A1 (en) Low-level function selection using vector-width
Addison et al. OpenMP 3.0 tasking implementation in OpenUH
WO2002003194A2 (fr) Multi-entry threading method and apparatus for automatic and directive-guided parallelization of a source program
Shei et al. MATLAB parallelization through scalarization
Chamberlain et al. Factor-join: A unique approach to compiling array languages for parallel machines
Trancoso et al. DDMCPP: The data-driven multithreading C pre-processor
Bernard et al. On the compilation of a language for general concurrent target architectures
US7162718B1 (en) Language extension for light weight threading in a JVM
Tao et al. Automatic parallelization of programs via software stream rewriting

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

ENP Entry into the national phase in:

Ref document number: 0301568

Country of ref document: GB

Kind code of ref document: A

Free format text: PCT FILING DATE = 20010608

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: IN/PCT/2002/01701/MU

Country of ref document: IN

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWE Wipo information: entry into national phase

Ref document number: 018121241

Country of ref document: CN

RET De translation (de og part 6b)

Ref document number: 10196389

Country of ref document: DE

Date of ref document: 20030618

Kind code of ref document: P

WWE Wipo information: entry into national phase

Ref document number: 10196389

Country of ref document: DE

122 Ep: pct application non-entry in european phase
REG Reference to national code

Ref country code: DE

Ref legal event code: 8607

NENP Non-entry into the national phase in:

Ref country code: JP