KR101276600B1

KR101276600B1 - Data communication method between processors, code stored computer-readable media for implementing the same method, and multi-processor computing system

Info

Publication number: KR101276600B1
Application number: KR1020110104351A
Authority: KR
Inventors: 김영태
Original assignee: 강릉원주대학교산학협력단
Priority date: 2011-10-13
Filing date: 2011-10-13
Publication date: 2013-06-19
Also published as: KR20130039783A

Abstract

Disclosed is a method of inter-processor data communication, comprising: generating a parallel processing object that divides the two-dimensional part of a data domain of two or more dimensions into a plurality of sub-data regions, assigns them to the respective processors of a processor array, and assigns the data of the dimensions beyond the second to the processor array; generating, using the parallel processing object, a communication object that defines the data communication scheme between the processors; and transmitting, to a first processor of the processor array, the values of the boundary region of a second sub-data region assigned to a second processor of the processor array, in accordance with the data communication scheme defined by the communication object.

Description

Data communication method between processors, code stored computer-readable media for implementing the same method, and multi-processor computing system

The present invention relates to multiprocessor computing technology, and more particularly to inter-processor data communication technology.

According to Flynn's taxonomy, MIMD (Multiple-Instruction Multiple-Data) computers fall into two categories, shared-memory parallel computers and distributed-memory parallel computers, depending on whether the processors share memory (see [1] below). There are likewise two different parallel programming approaches: shared-memory programming and message passing. OpenMP and MPI (Message Passing Interface) are the representative standard programming models for shared-memory programming and message passing, respectively (see [1] and [2] below).

Distributed-memory parallel computers are generally more cost-effective and scalable than ordinary shared-memory parallel computers (see [3] below). Because only MPI programs can run on both distributed-memory and shared-memory parallel computers, MPI programming is generally preferred for writing efficient parallel programs. However, MPI programs are much harder to design than OpenMP programs, because the programmer must work out details such as domain allocation and buffer handling for inter-processor communication. Pacheco [1] likened MPI programming to assembly programming to convey this difficulty. As a result, even expert programmers have needed several years to parallelize, for example, MM5 (Mesoscale Model 5), the most widely known weather model (see [4] below).

[1] P. Pacheco, Parallel Programming with MPI. San Francisco, Calif.: Morgan Kaufmann, pp. 12-31, 1996.

[2] R. Chandra, L. Dagum, D. Kohr, D. Maydan, J. McDonald and R. Menon, Parallel Programming in OpenMP. San Francisco, Calif.: Morgan Kaufmann, pp. 8-12, 2000.

[3] Y. Kim, Y. Lee, J. Choi and J. Oh, "Implementation and Performance Analysis of PC Clusters using Fast PCs & High Speed Network," J. of KISS, vol. 29, no. 2, pp. 57-64, 2002.

[4] Y. Kim, Z. Pan, E. Takle and S. Kothari, "Parallel Implementation of Hydrostatic MM5 (Mesoscale Model)," Proc. The Eighth SIAM Conf. on Parallel Processing for Scientific Computing, Oct. 1997.

The present invention aims to provide a method for easily implementing parallel programs that run on multiple processors. The scope of the present invention is not limited by this object.

The present invention includes HPCC (High Performance Computing Classes), designed to support high-level routines for MPI-based message passing. HPCC can be written in an object-oriented style that is conceptually easy to understand and use (see [9] below).

Parallel processing generally improves speed because the subdomains assigned to the individual processors are computed simultaneously. It therefore raises two issues: first, index transformation for managing the subdomains, and second, data communication between processors (see [4] and [5]). An object defined by HPCC can provide, for example to a parallel Fortran program, the information and methods (or functions) required for parallel processing, such as index transformation and run-time inter-processor communication. HPCC also provides macros for index transformation, so that the sequential and parallel programs share the same code, which makes the source code easy to maintain.

HPCC can mainly support programs, such as Fortran programs, based on FDM (Finite Difference Method) models for computing fluid dynamics problems. Because FDM essentially requires nearest-neighbor communication in distributed-memory programming, it lends itself to portable classes that define the data communication functions within HPCC.

The detailed description of the invention sets out the details of HPCC. To illustrate its application, it also describes examples of parallel implementations of legacy code using HPCC.

To solve the problems of the prior art described above, a method of inter-processor data communication is provided according to one aspect of the present invention. The method performs data communication between the processors of a processor array having a two-dimensional topology, using a program having object-oriented classes, and comprises: generating a parallel processing object that divides the two-dimensional part of a data domain of two or more dimensions into a plurality of sub-data regions, assigns them to the respective processors, and assigns the data of the dimensions beyond the second to the processor array; generating, using the parallel processing object, a communication object that defines the data communication scheme between the processors; and transmitting, to a first processor of the processor array, the values of the boundary region of a second sub-data region assigned to a second processor of the processor array, in accordance with the data communication scheme defined by the communication object.

According to another aspect of the present invention, a computer-readable medium is provided. The medium records a program that, using a program having object-oriented classes, causes a computer system including a processor array having a two-dimensional topology to execute: generating a parallel processing object that divides the two-dimensional part of a data domain of two or more dimensions into a plurality of sub-data regions, assigns them to the respective processors of the processor array, and assigns the data of the dimensions beyond the second to the processor array; generating, using the parallel processing object, a communication object that defines the data communication scheme between the processors; and transmitting, to a first processor of the processor array, the values of the boundary region of a second sub-data region assigned to a second processor of the processor array, in accordance with the data communication scheme defined by the communication object.

According to yet another aspect of the present invention, a computing system is provided. The system includes a storage device and a processor array having a two-dimensional topology. The storage device records: first program code for executing the step of generating a parallel processing object that divides the two-dimensional part of a data domain of two or more dimensions into a plurality of sub-data regions, assigns them to the respective processors of the processor array, and assigns the data of the dimensions beyond the second to the processor array; second program code for executing the step of generating, using the parallel processing object, a communication object that defines the data communication scheme between the processors; and third program code for executing the step of transmitting, to a first processor of the processor array, the values of the boundary region of a second sub-data region assigned to a second processor of the processor array, in accordance with the data communication scheme defined by the communication object.

According to yet another aspect of the present invention, a method of inter-processor data communication is provided. The method performs data communication between the processors of a processor array having a two-dimensional topology, and comprises: dividing the two-dimensional part of a data domain of two or more dimensions into a plurality of sub-data regions, assigning them to the respective processors, and assigning the data of the dimensions beyond the second to the processor array; and transmitting, to a first processor of the processor array, the values of the boundary region of the sub-data region assigned to a second processor of the processor array, in order to compute the values of the boundary region of the sub-data region assigned to the first processor.

[5] Y. Kim, "Parallel Implementation of the MAS (Mesoscale Atmospheric Simulation) Model for Distributed Memory Computers," Korean J. of the Atmospheric Sciences, vol. 6, no. 2, pp. 63-69, 2003.

[6] J. Michalakes, "RSL: A parallel runtime system library for regional atmospheric models with nesting," in Structured Adaptive Mesh Refinement (SAMR) Grid Methods, IMA Volumes in Mathematics and Its Applications (117), Springer, New York, 2000, pp. 59-74.

[7] P. Burton, "Vector returns: A New Supercomputer for the Met Office," Proc. The Tenth ECMWF Workshop on the Use of High Performance Computing in Meteorology, 2003.

[8] RAL (Research Applications Laboratory) Annual Report: 2007, http://www.ral.ucar.edu/lar/2007

[9] J. E. Akin, Object oriented programming via Fortran 90/95. Cambridge, UK: Cambridge Univ. Press, pp. 23-30, 2003.

According to the present invention, a method for easily implementing parallel programs that run on multiple processors can be provided. The scope of the present invention is not limited by this effect.

FIG. 1 shows an example of the logical domain decomposition used by HPCC according to an embodiment of the present invention.
FIG. 2 shows an example in which a three-dimensional domain is projected onto a two-dimensional processor array and partitioned, according to an embodiment of the present invention.
FIG. 3 illustrates a method of communication between nearest neighbors according to an embodiment of the present invention.
FIG. 4(a) shows a symbol table for a communication template according to an embodiment of the present invention.
FIG. 4(b) shows the names of the eight entries matched to the communication directions according to an embodiment of the present invention.
FIG. 5 illustrates the process of gathering several subdomains into one host processor according to an embodiment of the present invention.
FIG. 6 is a table comparing results obtained with HPCC according to an embodiment of the present invention with results obtained using MPI functions.
FIG. 7 shows the speed of the parallel TIDE program for different numbers of processors on a PC cluster.
FIG. 8 shows the speed of the parallel QPM program for different numbers of processors on a PC cluster.
FIG. 9 shows the speed of the parallel ACDM program for different numbers of processors on a PC cluster.
FIGS. 10A, 10B, and 10C are flowcharts schematically illustrating a method of inter-processor data communication according to an embodiment of the present invention.

The present invention and its embodiments are described below with reference to the accompanying drawings. The specific embodiments that follow merely illustrate the present invention and are not intended to limit its scope. In describing the present invention, detailed descriptions of related well-known configurations or functions are omitted where they could obscure the gist of the invention.

In one embodiment of the present invention, HPCC comprises two main object-oriented classes, the Parallel class and the Comm class. These two classes can be written in, for example, the Fortran90 programming language. Objects defined by these two classes can hold the information and methods needed for parallel processing at run time.

An object defined by the Parallel class generates information for parallel processing, and the parallel program can use this information at run time. The method of dividing a domain into subdomains is described below.

Dividing a domain into subdomains is an important strategy that determines the parallel programming style. Decomposition can be broadly divided into physical decomposition and logical decomposition (see [10] below). HPCC uses two-dimensional logical domain decomposition: all processors first read the same entire domain, and each processor then computes its assigned subdomain. Logical decomposition has the following advantages over physical decomposition. First, no local indices are needed because the original indices are used; each processor only needs to know the range of its assigned subdomain. Second, the unused area not assigned to a processor's subdomain can serve as a buffer, so no separate physical communication buffer is required. Third, because all domains are statically allocated, no recompilation is needed even when a different number of processors is used. Other parallelization libraries also use logical domain decomposition.

FIG. 1 shows an example of the logical domain decomposition used by HPCC. In FIG. 1, the domain size is 16 * 16, and four processors (P0, P1, P2, P3) are arranged as a 2 * 2 processor array. All processors first read the entire 16 * 16 domain, and each processor then computes its assigned 8 * 8 subdomain, outlined with a bold solid line.
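The index ranges in this example can be sketched in a few lines. The sketch below is in Python rather than Fortran, is not HPCC code, and uses hypothetical helper names (block_range, subdomain); it assumes 1-based global indices and that ranks are numbered row by row, so that P0 and P1 form the first row of the array.

```python
def block_range(size, parts, index):
    """Inclusive 1-based (start, end) of block `index` when `size` points
    are split into `parts` nearly equal contiguous blocks."""
    base, extra = divmod(size, parts)
    start = index * base + min(index, extra) + 1
    end = start + base - 1 + (1 if index < extra else 0)
    return start, end


def subdomain(rank, m, n, p, q):
    """Global (i-range, j-range) owned by `rank` in a p x q processor array.

    Assumption: ranks are numbered row by row along j, rows advance along i.
    """
    row, col = divmod(rank, q)
    return block_range(m, p, row), block_range(n, q, col)


# The 16 * 16 domain on a 2 * 2 array: every rank keeps the original indices.
for rank in range(4):
    print("P%d" % rank, subdomain(rank, 16, 16, 2, 2))
```

Because each processor works with the original global indices, the only per-processor state in this sketch is the pair of ranges, which mirrors the first advantage of logical decomposition listed above.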

Domains of more than two dimensions, for example three- or four-dimensional domains, can be projected onto the two-dimensional processor array. FIG. 2 shows an example in which a three-dimensional domain is projected onto a two-dimensional processor array and partitioned. Here, the computation for one column of nodes can be assigned to one processor. FIG. 2 shows processors assigned to nine two-dimensional subdomains arranged 3 * 3 in the i and j directions. Points that share the same i and j values but differ in k are computed on the same processor. The object defined by the Parallel class is described below.
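The projection can be illustrated with a small Python sketch (not HPCC code). The function name owner, the 9 * 9 * 4 domain size, and the row-by-row rank numbering are assumptions made for illustration; the sketch also assumes that the domain sizes divide evenly by the array sizes.

```python
def owner(i, j, k, m, n, p, q):
    """Rank that computes node (i, j, k), with 1-based indices, when an
    m x n x (anything) domain is projected onto a p x q processor array.

    Only i and j select the processor. k is deliberately ignored, so a whole
    column of nodes along k lands on a single processor.
    Assumes m % p == 0 and n % q == 0.
    """
    row = (i - 1) // (m // p)
    col = (j - 1) // (n // q)
    return row * q + col


# A hypothetical 9 * 9 * 4 domain on a 3 * 3 array.
print([owner(4, 2, k, 9, 9, 3, 3) for k in range(1, 5)])
```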

The Parallel class can generate information for parallel processing, such as the process identifier, processor topology, local range, and communication buffers. Following the 'information hiding' principle (see [9] above), this information is not exposed to the user but is used internally by other objects and macros.

HPCC can use a two-dimensional topology for the processor array, so the size of the projected two-dimensional domain is supplied as parameters to define a Parallel object. For example, if the size of a three-dimensional domain is (X * Y * Z), only X and Y, the sizes of the first and second dimensions, are passed to the class. [Code 1] and [Code 2] below are example code.

[Code 1] type(process_type) :: process

[Code 2] process = hpcc_define_parallel(X, Y)

In this example, process is the name of the object defined by Parallel, and the size of the projected domain is X * Y. If domains of different sizes are used in the same program, separate objects can be defined with the different sizes. The method of computing the optimal two-dimensional values from the number of processors is described below.

To run a parallel program, the user supplies the number of processors to the program. The Parallel class can compute the optimal two-dimensional values from the given number of processors. [Equation 1] below shows an example of an algorithm for computing the optimal values.

[Equation 1]

In the algorithm of [Equation 1], the given number of processors is n, the optimal two-dimensional values are p and q, the domain size is M * N, and the subdomain size is M/p * N/q, where p * q = n. The data size for nearest-neighbor communication at each processor is approximately 2 * (M/p + N/q) (excluding processors at the edge of the domain), and this size is smallest when M/p and N/q are equal. The optimal values are therefore obtained when the subdomains are as close to square as possible (that is, M/p ≈ N/q). Inter-processor communication by the Comm class is described below.
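The criterion for p and q can be sketched as an exhaustive search over the factor pairs of n. This Python sketch is not the algorithm of [Equation 1], which is not reproduced in this text; the function name best_grid is hypothetical, and the cost M/p + N/q is used on the assumption that the boundary data of an interior processor grows with the perimeter of its subdomain.

```python
def best_grid(n_procs, m, n):
    """Pick (p, q) with p * q == n_procs whose subdomains are closest to square.

    Assumption: the boundary data exchanged by an interior processor is
    proportional to m / p + n / q (half the subdomain perimeter).
    """
    best = None
    for p in range(1, n_procs + 1):
        if n_procs % p:
            continue  # p must divide the processor count exactly
        q = n_procs // p
        cost = m / p + n / q
        if best is None or cost < best[0]:
            best = (cost, p, q)
    return best[1], best[2]


print(best_grid(4, 16, 16))    # square domain, four processors
print(best_grid(8, 100, 200))  # wider domain: more processors along j
```

For a 100 * 200 domain on eight processors the search settles on 2 * 4, giving 50 * 50 subdomains, which matches the M/p ≈ N/q condition.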

The Comm class defines an object that contains a communication template and inter-processor communication functions. The template determines the directions of data communication; the communication patterns of variables are then added to the object. Initialization of the communication template is described below.

The Comm class defines an object from two parameters. The first parameter is an object defined by the Parallel class, and the second is the size of the dimensions beyond the second. [Code 3] below is example code.

[Code 3] comm_template = hpcc_define_comm(process, val)

In [Code 3], comm_template is the name of the object defined by the Comm class, and process is an object defined by the Parallel class. val is determined as follows. If the domain size is (N1 * N2 * ... * Nn), n ≥ 2, the projected two-dimensional domain has size (N1 * N2). If n = 2, then (N1 * N2) = (N1 * N2 * 1), so the size beyond two dimensions is 1 and val = 1; that is, val = 1 always holds for a two-dimensional domain. If n > 2, then val = (N3 * ... * Nn). Using the information from process and val, the Comm class can determine the size and shape of the communication template. Communication between nearest neighbors is described below.
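The rule for val amounts to multiplying the sizes of all dimensions after the second. A minimal Python sketch, with the hypothetical helper name split_domain, follows; it is an illustration of the arithmetic and not part of HPCC.

```python
from math import prod


def split_domain(shape):
    """Return ((N1, N2), val) for a domain of shape (N1, N2, ..., Nn), n >= 2.

    val is the product N3 * ... * Nn, and 1 when the domain is two-dimensional
    (the product over an empty tuple is 1).
    """
    if len(shape) < 2:
        raise ValueError("domain must have at least two dimensions")
    return tuple(shape[:2]), prod(shape[2:])


print(split_domain((16, 16)))         # two-dimensional domain
print(split_domain((10, 20, 30, 4)))  # four-dimensional domain
```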

HPCC is useful for parallelizing Fortran FDM model programs. In FDM, a time-varying value is usually computed from adjacent values. [Code 4] is an example of FDM code.

[Code 4] UA(i,j,k) = UA(i+1,j,k) + UA(i,j+1,k) + UA(i+1,j+1,k)

In [Code 4], UA denotes a time-varying value and the index (i, j, k) denotes a particular point of a subdomain. Computing UA(i,j,k) at a point that is not on the edge of a subdomain requires no data communication with other subdomains. Computing UA(i,j,k) at a point on the edge of a subdomain, however, does require it. Therefore, before UA(i,j,k) is computed at the edge of a subdomain, the buffer must be updated with the data sent from the neighboring processors.

FIG. 3 shows the nearest-neighbor communication for [Code 4].

According to [Code 4], in P0 and P1 of FIG. 3(a), computing UA(i,j,k) along the bottom edge of each subdomain requires the values of UA(i+1,j,k) (the shaded area). Because these values are assigned to P2 and P3, P0 and P1 cannot know them without separate data communication. The updated UA(i+1,j,k) values must therefore be sent from P2 to P0 and from P3 to P1. In FIG. 3(b), for the same reason, the updated UA(i,j+1,k) values must be sent from P1 to P0 and from P3 to P2. Likewise, in FIG. 3(c), the UA(i+1,j+1,k) value needed at the rightmost point of P0's bottom edge is assigned to P3, so UA(i+1,j+1,k) values must be sent to P0 not only from P1 and P2 but also from P3.
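The sender for each stencil offset follows directly from the position of a processor in the array. The Python sketch below (not HPCC code) reproduces the pairs described for FIG. 3; the names neighbor_rank, STENCIL, and sources are hypothetical, and ranks are assumed to be numbered row by row with i advancing along rows and j along columns.

```python
def neighbor_rank(rank, di, dj, p, q):
    """Rank of the neighbor at offset (di, dj) in a p x q array,
    or None if that neighbor would fall outside the array."""
    row, col = divmod(rank, q)
    r, c = row + di, col + dj
    if 0 <= r < p and 0 <= c < q:
        return r * q + c
    return None


# Offsets read by [Code 4]: (i+1, j), (i, j+1), (i+1, j+1).
STENCIL = [(1, 0), (0, 1), (1, 1)]


def sources(rank, p, q):
    """Map each stencil offset to the rank that must send data to `rank`."""
    return {off: neighbor_rank(rank, off[0], off[1], p, q) for off in STENCIL}


for rank in range(4):
    print("P%d receives from" % rank, sources(rank, 2, 2))
```

In the 2 * 2 case, P0 receives from P2, P1, and P3, while P3 sits in the last row and column and receives nothing for this stencil.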

Nearest-neighbor communication for computing FDM code has eight possible directions in all, which can be identified by the indices (i-1,j-1), (i-1,j), (i-1,j+1), (i,j-1), (i,j+1), (i+1,j-1), (i+1,j), and (i+1,j+1). These directions can be defined, for example, as the following HPCC constants, respectively: HPCC_IMJM, HPCC_IM, HPCC_IMJP, HPCC_JM, HPCC_JP, HPCC_IPJM, HPCC_IP, and HPCC_IPJP. Here I and J denote the respective directions, P stands for plus, and M for minus. Using these constants, the user can add communication patterns to a previously defined object. Thus, to define the nearest-neighbor communication, code such as [Code 5] can be inserted before [Code 4] above.

[Code 5] call hpcc_add_comm(comm_template, UA, HPCC_IP + HPCC_JP + HPCC_IPJP)

In [Code 5], hpcc_add_comm is a function, UA is the variable added to the template named comm_template, and HPCC_IP + HPCC_JP + HPCC_IPJP is the communication pattern required to compute UA.
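One way to picture how direction constants can be combined with `+` is a bit-flag encoding. The numeric values, the Python names and the `offsets` helper below are assumptions made for illustration; the source does not state how HPCC encodes the constants.

```python
from enum import IntFlag


class Dir(IntFlag):
    """Hypothetical bit-flag encoding of the eight HPCC direction constants."""
    IMJM = 1    # (i-1, j-1)
    IM = 2      # (i-1, j)
    IMJP = 4    # (i-1, j+1)
    JM = 8      # (i, j-1)
    JP = 16     # (i, j+1)
    IPJM = 32   # (i+1, j-1)
    IP = 64     # (i+1, j)
    IPJP = 128  # (i+1, j+1)


# Index offset (di, dj) associated with each direction.
OFFSET = {
    Dir.IMJM: (-1, -1), Dir.IM: (-1, 0), Dir.IMJP: (-1, 1),
    Dir.JM: (0, -1), Dir.JP: (0, 1),
    Dir.IPJM: (1, -1), Dir.IP: (1, 0), Dir.IPJP: (1, 1),
}


def offsets(pattern):
    """Decode a combined pattern into the list of (di, dj) offsets it contains."""
    return [off for d, off in OFFSET.items() if pattern & d]


# Pattern of [Code 5]: with distinct bits, '+' behaves like a set union.
ua_pattern = Dir.IP + Dir.JP + Dir.IPJP
print(offsets(ua_pattern))   # [(0, 1), (1, 0), (1, 1)]
```

With distinct powers of two, a single integer argument can carry any subset of the eight directions, provided each constant is added at most once.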

Physical data communication between processors requires pre-negotiation and post-negotiation in order to send and receive data. To reduce this communication overhead, it is efficient to send and receive as much data as possible at a time. HPCC can therefore collapse the communication patterns into the smallest possible number of templates. [Code 6] to [Code 9] below are examples of adding variables a, b, c and d, which have different communication patterns, to a template named comm_template.

[Code 6] call hpcc_add_comm(comm_template, a, HPCC_IP + HPCC_IM + HPCC_IPJM)

[Code 7] call hpcc_add_comm(comm_template, b, HPCC_IP + HPCC_JP + HPCC_IPJM)

[Code 8] call hpcc_add_comm(comm_template, c, HPCC_IP + HPCC_IMJM)

[Code 9] call hpcc_add_comm(comm_template, d, HPCC_IMJP)

HPCC may use a symbol table to maintain the patterns of a template. FIG. 4(a) shows the symbol table for comm_template in the code above. Each template may contain eight entries, one for each of the eight nearest-neighbor communication directions. FIG. 4(b) shows the names of the eight entries together with their matching communication directions. In the code above, variable a requires communication in three directions: HPCC_IP, HPCC_IM and HPCC_IPJM. Because the names matching these directions are S, N and SW, respectively, three nodes can be inserted into the linked lists that start at S, N and SW. Nodes for variables b and c can be inserted into the symbol table in a similar way. When HPCC inserts each entry into the symbol table, it can sort and collapse the patterns. For example, the call in [Code 6] inserts a into entries S, N and SW; the call in [Code 7] then inserts b into entries E, S and SW; the call in [Code 8] then inserts c into entries S and NW; and the call in [Code 9] then inserts d into entry SW. [Code 10] can then be used to communicate all variables at once, reducing the communication overhead.

[Code 10] call hpcc_exec_comm(comm_template)
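The symbol table and the collapsing step can be sketched as follows. This is a simplified model that replays [Code 6] to [Code 8], assuming one list per direction entry and one combined message per entry. The entry names follow the pairings stated in the text (HPCC_IP with S, HPCC_IM with N, HPCC_JP with E, HPCC_IPJM with SW, HPCC_IMJM with NW); the class and method names are hypothetical and do not reproduce the HPCC implementation.

```python
# Entry names for the directions used in [Code 6] to [Code 8]
# (pairings taken from the text; the remaining entries are omitted here).
ENTRY = {"IP": "S", "IM": "N", "JP": "E", "IPJM": "SW", "IMJM": "NW"}


class CommTemplate:
    """Toy model of a communication template backed by a symbol table."""

    def __init__(self):
        # One list of variable names per direction entry (stand-in for a linked list).
        self.table = {}

    def add_comm(self, var, directions):
        """Register `var` under every entry named by `directions`."""
        for d in directions:
            entry = self.table.setdefault(ENTRY[d], [])
            if var not in entry:
                entry.append(var)
                entry.sort()   # keep each entry sorted so it can be collapsed

    def exec_comm(self):
        """Collapse the table: one combined message per direction entry."""
        return {name: tuple(names) for name, names in self.table.items()}


tmpl = CommTemplate()
tmpl.add_comm("a", ["IP", "IM", "IPJM"])   # [Code 6]
tmpl.add_comm("b", ["IP", "JP", "IPJM"])   # [Code 7]
tmpl.add_comm("c", ["IP", "IMJM"])         # [Code 8]
messages = tmpl.exec_comm()
print(messages["S"])    # ('a', 'b', 'c')
print(len(messages))    # 5
```

In this toy model the three calls would require eight separate transfers (3 + 3 + 2) if each variable were sent on its own, while the collapsed template needs five, one per direction entry.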

Next, to output the data, all subdomains can be merged into the original domain on P0, as shown in FIG. 5. Here P0 can be regarded as the host processor. P0 can then output the results sequentially. The Comm class may include a function for this purpose, which can be invoked with code such as the following.

[Code 11] call hpcc_compose(process, var, ksize)
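The merge onto P0 amounts to copying each subdomain into its place in the original domain. The sketch below simulates this within a single process using nested lists. The function name `compose`, the row-major placement of ranks on the block grid and the sample data are illustrative assumptions, and no real message passing takes place.

```python
def compose(blocks, p, q):
    """Assemble p * q equally sized 2-D blocks into one global 2-D list.

    `blocks[rank]` is the subdomain of processor `rank`; ranks are assumed
    to be laid out row-major on a p x q block grid.
    """
    rows = len(blocks[0])
    cols = len(blocks[0][0])
    full = [[None] * (q * cols) for _ in range(p * rows)]
    for rank, block in blocks.items():
        br, bc = divmod(rank, q)   # position of the block in the grid
        for r in range(rows):
            for c in range(cols):
                full[br * rows + r][bc * cols + c] = block[r][c]
    return full


# Four 2 x 2 subdomains, each filled with the rank that computed it.
blocks = {rank: [[rank] * 2 for _ in range(2)] for rank in range(4)}
for row in compose(blocks, 2, 2):
    print(row)
# [0, 0, 1, 1]
# [0, 0, 1, 1]
# [2, 2, 3, 3]
# [2, 2, 3, 3]
```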

HPCC can also support other MPI APIs such as MPI_ALLREDUCE and MPI_BCAST. These HPCC functions are easy to implement because they call the corresponding MPI functions directly. Examples of such HPCC function calls follow.

[Code 12] call hpcc_broadcast(process, var)

[Code 13] call hpcc_allreduce(process, var, HPCC_SUM)

Index conversion is described below.

Because each processor of a parallel computer computes only its assigned subdomain, the original loop indices must be converted. In FIG. 1, for example, the whole domain is 16 * 16, so the do-loop that computes it can be written as in [Code 14].

[Code 14]

DO I = 1, 16
DO J = 1, 16
...
ENDDO

In parallel processing, however, the subdomain on P1 is 8 * 8. The do-loop indices on P1 can therefore be converted as in [Code 15].

[Code 15]

DO I = 9, 16
DO J = 1, 8
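The converted bounds in [Code 15] are consistent with an even block split of the global range. A minimal sketch, assuming the extent divides evenly among the blocks and using 1-based inclusive bounds as in the Fortran loops; `local_range` is a hypothetical helper, not an HPCC function.

```python
def local_range(n_global, n_blocks, block):
    """Return the 1-based inclusive (begin, end) of block `block` (0-based)."""
    if n_global % n_blocks != 0:
        raise ValueError("this sketch assumes an even split")
    if not 0 <= block < n_blocks:
        raise ValueError("block index out of range")
    size = n_global // n_blocks
    begin = block * size + 1
    return begin, begin + size - 1


# 16 points split into 2 blocks per direction, as in the 16 * 16 example.
print(local_range(16, 2, 0))   # (1, 8)
print(local_range(16, 2, 1))   # (9, 16)
```

The bounds of [Code 15] correspond to the second block in I and the first block in J.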

HPCC can convert the indices using the following macros.

[Code 16]

#define $parallel_do(process,i,global_begin,global_end)\
do i=process%local_begin%i,process%local_end%i;\
if (((global_begin)<=i).and.(i<=(global_end))) then

#define $parallel_enddo endif; enddo

In [Code 16], the first parameter of the macro is an object predefined by the Parallel class. This object holds the local range for the computation, local_begin%i and local_end%i, and the indices are converted within this range. The other two parameters provide a mask range, which applies only when the original domain is not computed in full. The HPCC macros can change the code as shown in [Code 17].

[Code 17]

DO I = 1, IM → $parallel_do(process,i,1,im)
DO J = 1, JM → $parallel_do(process,j,1,jm)
...
ENDDO → $parallel_enddo
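The effect of the two macros can be mimicked with a generator that iterates over the local range held by the process object and skips indices outside the global mask range. The `Process` class and its field names are assumptions modeled on local_begin and local_end in [Code 16]; the loop bounds in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """Stand-in for the Parallel object: local 1-based inclusive bounds per index."""
    local_begin: dict
    local_end: dict


def parallel_do(process, index, global_begin, global_end):
    """Yield the local indices that also fall inside the global mask range."""
    for v in range(process.local_begin[index], process.local_end[index] + 1):
        if global_begin <= v <= global_end:   # mask from the macro's if-test
            yield v


# A processor owning I = 9..16 and J = 1..8.
p1 = Process(local_begin={"i": 9, "j": 1}, local_end={"i": 16, "j": 8})

# A loop over interior points only, 'DO I = 2, 15', restricted to this processor.
print(list(parallel_do(p1, "i", 2, 15)))   # [9, 10, 11, 12, 13, 14, 15]
print(list(parallel_do(p1, "j", 2, 15)))   # [2, 3, 4, 5, 6, 7, 8]
```

When the mask covers the whole domain, as with bounds 1 and 16 here, the generator simply returns the full local range.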

To keep the source code easy to maintain, HPCC keeps the sequential program and the parallel program in the same source code. The compiler can tell from a parameter flag whether it is compiling the sequential program or the parallel program. When it compiles the sequential program, the original indices are used without conversion. As a result, when the user modifies the source code, the sequential and parallel programs are modified at the same time. By contrast, other parallel libraries store local indices as variables (see [6] and [7] above), so the loop variables must be changed when the code is parallelized.

<Experimental Example>

HPCC not only shortens the time needed to parallelize a program but also provides better performance than using MPI functions directly. The performance of HPCC functions and MPI functions is compared below. Then, to show general performance, the speedups of three legacy codes parallelized with HPCC are presented. The parallel programs were run on a PC cluster consisting of Intel Xeon E5405 2.0 GHz PCs and a Gigabit Ethernet switch.

For the comparison, one parallel test program was written with HPCC functions and another with MPI functions used directly. The HPCC program communicates two variables, u and v, each in eight different communication directions. The specific code is given in [Code 18], [Code 19] and [Code 20] below.

[Code 18] call hpcc_add_comm(comm, u, HPCC_IP + HPCC_IM + HPCC_JP + HPCC_JM + HPCC_IPJP + HPCC_IMJP + HPCC_IPJM + HPCC_IMJM)

[Code 19] call hpcc_add_comm(comm, v, HPCC_IP + HPCC_IM + HPCC_JP + HPCC_JM + HPCC_IPJP + HPCC_IMJP + HPCC_IPJM + HPCC_IMJM)

[Code 20] call hpcc_exec_comm(comm)

Each variable has size (54 * 54 * 21). The communication data size is therefore (54 * 1 * 21 * 2) for HPCC_IP and HPCC_IM, (1 * 54 * 21 * 2) for HPCC_JP and HPCC_JM, and (1 * 1 * 21 * 2) for HPCC_IPJP, HPCC_IMJP, HPCC_IPJM and HPCC_IMJM. The other program produces the same result but uses the MPI_SEND and MPI_RECV functions directly. FIG. 6 shows the comparison. The HPCC program is faster because the MPI program contains several send and receive functions, each with its own communication buffer, whereas the HPCC program combines all communication data into a single buffer and internally contains only one send function and one receive function.

Three different FDM-based models were parallelized with HPCC. The models are based on two-, three- and four-dimensional domains, respectively; the three- and four-dimensional domains have one and two dimensions, respectively, beyond the fixed two-dimensional domain.

<Experimental Example 1>

The TIDE model is a meteorological model that simulates wave height at the coast. The model is based on a two-dimensional domain, and [Code 21] defines the domain process. The size of the domain is IM * JM (= 421 * 385).

[Code 21] process = hpcc_define_parallel(IM, JM)

[Code 22] defines the communication template. Because the domain is two-dimensional, the size of the higher dimensions is 1.

[Code 22] comm_main = hpcc_define_comm(process, 1)

This parallel program has nine communication points. FIG. 7 shows the speed of the parallel program for different numbers of processors on the PC cluster. The gap between the ideal speed and the actual speed widens as the number of processors increases. This is parallel-program overhead, readily explained by two facts: one of the nearest-neighbor communications occurs inside a three-level loop, and communication overhead grows as more processors are used.

<Experimental Example 2>

QPM (Qualitative Prediction Model) is a three-dimensional model for simulating the change of precipitation over time (see [12] below). The domain process and the three-dimensional communication template comm_main2 can be defined as in [Code 33] and [Code 34]. The domain size nx * ny * nz is (54 * 54 * 21).

[Code 33] process = hpcc_define_parallel(nx, ny)

[Code 34] comm_main2 = hpcc_define_comm(process, nz)

FIG. 8 shows the speed of the parallel QPM program for different numbers of processors on the PC cluster. It shows the same trend as FIG. 7.

<Experimental Example 3>

ACDM (Air Chemistry Diffusion Model) is a model that simulates chemical diffusion in the atmosphere (see [13] below). The domain process can be defined by [Code 35].

[Code 35] process = hpcc_define_parallel(imax1, jmax1)

This model is based mainly on a three-dimensional domain but uses a four-dimensional domain for air and chemical diffusion. The sizes of the three- and four-dimensional domains are imax1 * jmax1 * kmax1 (45 * 45 * 21) and imax1 * jmax1 * 22 * 70, respectively. The communication templates for the two domains, comm_main1 and comm_main2, can be defined as in [Code 36] and [Code 37].

[Code 36] comm_main1 = hpcc_define_comm(process, kmax1)

[Code 37] comm_main2 = hpcc_define_comm(process, 22*70)
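In all three experimental examples, the second argument of hpcc_define_comm equals the combined extent of the dimensions beyond the first two. A short sketch of that bookkeeping, using a hypothetical helper `higher_dim_size` and the domain sizes quoted in the text:

```python
from math import prod


def higher_dim_size(shape):
    """Product of the extents beyond the first two dimensions (1 for a 2-D domain)."""
    if len(shape) < 2:
        raise ValueError("the domain must have at least two dimensions")
    return prod(shape[2:])


print(higher_dim_size((421, 385)))         # 1     two-dimensional domain, [Code 22]
print(higher_dim_size((54, 54, 21)))       # 21    nz in [Code 34]
print(higher_dim_size((45, 45, 22, 70)))   # 1540  22*70 in [Code 37]
```

Flattening the higher dimensions into one size lets the same two-dimensional decomposition serve domains of any dimensionality.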

FIG. 9 shows the speed of the parallel ACM (Air Chemistry Model) obtained on the PC cluster.

The HPCC described above can parallelize, for example, Fortran legacy code based on FDM models for distributed-memory computers. The main purpose of HPCC is to produce efficient parallel programs in a short time. Parallelizing a large body of legacy code is usually difficult because the parallelization process is not simple and MPI functions are complicated to use. Because HPCC is designed as an object-oriented program with user-friendly concepts, users can parallelize Fortran programs without detailed knowledge of the complex MPI functions. HPCC can provide not only an interface between Fortran source code and MPI functions but also macros that simplify parallel programming. To demonstrate the usefulness and efficiency of HPCC, the models of the three experimental examples above were parallelized, and this work was completed in two to three weeks. This is a marked improvement over the several months that parallelization conventionally took. As described above, programs parallelized with HPCC run efficiently on a cluster of PCs.

<Example 1>

A method of data communication between processors according to an embodiment of the present invention is described below.

FIG. 10a is a flowchart outlining a method of data communication between processors according to an embodiment of the present invention.

This method performs data communication between the processors of a processor array having a two-dimensional topology, using a program that has object-oriented classes. The object-oriented classes may be the Parallel class and/or the Comm class described above. In this specification and the claims, having a two-dimensional topology means that the processors of the processor array can be regarded conceptually as a two-dimensional array; it does not mean that they are physically arranged in two dimensions. A processor array having a two-dimensional topology may be a set of processors distributed across separate computing devices or a set of processors distributed within a single computing device.

The method includes a step (S110) of generating a parallel processing object that divides the two-dimensional data area of a data area having two or more dimensions into a plurality of sub-data areas, assigns them to the respective processors, and assigns the data area of the dimensions beyond two to the processor array. The parallel processing object can be generated, for example, by the hpcc_define_parallel( , ) function of [Code 2] above.

Taking FIG. 2 as an example, the 'data area having two or more dimensions' is the area defined by indices i, j and k, the 'two-dimensional data area' is the area defined by indices i and j, and the 'data area of the dimensions beyond two' is the area defined by index k.

Taking <Experimental Example 3> above as another example, the 'data area having two or more dimensions' is the area defined by (imax1 * jmax1 * 22 * 70), the 'two-dimensional data area' is the area defined by (imax1 * jmax1), and the 'data area of the dimensions beyond two' is the area defined by (22 * 70).

The method may also include a step (S120) of generating, using the parallel processing object, a communication object that defines the data communication scheme between the processors. The communication object can be defined by the hpcc_define_comm( , ) function of [Code 3] above. The arguments of hpcc_define_comm( , ) may be the parallel processing object and the size of the data area of the dimensions beyond two.

The method may then include a step (S130) of transmitting, to a first processor of the processor array and according to the data communication scheme defined by the communication object, the values of the boundary area of a second sub-data area assigned to a second processor of the processor array. Here the 'data communication scheme' can be defined by the call hpcc_add_comm( , , ) of [Code 5] above. The second sub-data area may be, for example, the P2 area shown in FIG. 3(a). The 'boundary area' may be the shaded part of that P2 area, and the 'values of the boundary area' may be variables such as UA, a, b, c and d mentioned in [Code 4] to [Code 9].

The method may further include a step (S140) of calculating the values of the boundary area of a first sub-data area assigned to the first processor, using the transmitted values. Here the 'first sub-data area' is, for example, the P0 area shown in FIG. 3(a), and the 'boundary area of the first sub-data area' may be the shaded part of that P0 area.

FIG. 10b shows the step (S110) of generating the parallel processing object in more detail. Step S110 may include a step (S111) of obtaining the number n of processors in the processor array and the size (M * N) of the two-dimensional data area; a step (S112) of calculating, among natural numbers p and q satisfying p * q = n, the p and q that minimize the difference between M/p and N/q; and a step (S113) of dividing the two-dimensional data area into p row areas and q column areas, yielding n sub-data areas. Steps S111 to S113 can be implemented, for example, by code such as [Equation 1].
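Step S112 can be sketched as a search over the factor pairs of n. Because p * q = n is fixed, |M/p - N/q| = |M*q - N*p| / n, so the integer quantity |M*q - N*p| can be minimized instead, which avoids floating-point division. The function name `choose_grid` and the tie-breaking rule (the smallest p wins) are assumptions; this is not the code referred to as [Equation 1].

```python
def choose_grid(n, M, N):
    """Return (p, q) with p * q == n that minimises |M/p - N/q|."""
    if n < 1 or M < 1 or N < 1:
        raise ValueError("n, M and N must be positive")
    best = None
    for p in range(1, n + 1):
        if n % p:
            continue                  # p must divide n
        q = n // p
        cost = abs(M * q - N * p)     # equals n * |M/p - N/q|
        if best is None or cost < best[0]:
            best = (cost, p, q)
    return best[1], best[2]


print(choose_grid(4, 16, 16))     # (2, 2)
print(choose_grid(6, 421, 385))   # (3, 2)
print(choose_grid(8, 54, 54))     # (2, 4)
```

Minimizing the difference between M/p and N/q keeps each sub-data area close to square, which tends to shorten the boundary that has to be exchanged.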

FIG. 10c shows the transmitting step (S130) in more detail. The transmitting step (S130) may include a step (S131) in which the first processor sends to the second processor the communication direction of the data needed to calculate the values of the boundary area of the first sub-data area, and a step (S132) in which the second processor transmits to the first processor those values of the boundary area of the second sub-data area that correspond to the communication direction. For example, when the value of the boundary area of the first sub-data area is variable a of [Code 6], the communication direction of the data can be defined by the argument HPCC_IP + HPCC_IM + HPCC_IPJM of [Code 6]. Step S131 can be executed, for example, by the call hpcc_exec_comm() of [Code 10] above.
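Steps S131 and S132 can be pictured as a request and a reply. In the single-process sketch below, the first processor names a direction and the neighbor on that side answers with the part of its own subdomain that faces the requester. The function name, the compass-style direction keys and the one-cell boundary width are assumptions made for illustration.

```python
def boundary_values(block, direction):
    """Reply of the neighbour lying in `direction` as seen from the requester.

    `block` is the neighbour's 2-D subdomain (rows run along i, columns along j).
    A neighbour to the south (i+1) answers with its first row, a neighbour to
    the east (j+1) with its first column, and a diagonal neighbour with the
    single corner cell that touches the requester.
    """
    first, last = 0, -1
    replies = {
        "S": lambda b: list(b[first]),
        "N": lambda b: list(b[last]),
        "E": lambda b: [row[first] for row in b],
        "W": lambda b: [row[last] for row in b],
        "SE": lambda b: [b[first][first]],
        "SW": lambda b: [b[first][last]],
        "NE": lambda b: [b[last][first]],
        "NW": lambda b: [b[last][last]],
    }
    return replies[direction](block)


# Subdomain held by the second processor (3 x 3 for brevity).
second = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

print(boundary_values(second, "S"))    # [1, 2, 3]
print(boundary_values(second, "E"))    # [1, 4, 7]
print(boundary_values(second, "SE"))   # [1]
```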

The method may also include a step (S150) in which, once the calculation of the sub-data area assigned to each processor has finished, one processor of the processor array receives from the other processors the values of the sub-data areas assigned to them and then outputs the values of the data area having two or more dimensions.

The step (S120) of generating the communication object may include a step (S121) of adding, to the data communication scheme, information on the communication direction of the data needed to calculate the values of the boundary area of each sub-data area. Step S121 can be implemented, for example, by the hpcc_add_comm( , , ) function of [Code 5] above.

The transmitting step (S130) may also include a step in which the first processor transmits the data communication scheme to the second processor, and a step in which the second processor transmits the values of the boundary area of the second sub-data area to the first processor according to that scheme. Here the 'data communication scheme' is defined by hpcc_add_comm( , , ) of [Code 5] and can be formed, for example, using the symbol table shown in FIG. 4(a).

<Example 2>

A computer-readable medium according to another embodiment of the present invention is described below.

This medium is a computer-readable medium on which is recorded a program that, using a program having object-oriented classes, causes a computer system including a processor array with a two-dimensional topology to execute steps S110, S120 and S130 of <Example 1> above. The medium includes any medium readable by the computer system, regardless of type, such as a CD-ROM, CD-RW, SSD or HDD. It will be readily understood that the program may further include code for executing one or more of the other steps of <Example 1>.

<Example 3>

A multiprocessor computing system according to yet another embodiment of the present invention is described below.

This computing system may include a storage device and a processor array having a two-dimensional topology. The storage device may be the computer-readable medium described in <Example 2>. The processor array may be installed in a single independent device or distributed across several independent devices. When the computing system includes two or more independent devices, it may further include communication devices that connect them.

The storage device may store program code for executing the steps of <Example 1>, for example steps S110, S120 and S130.

The title of the present invention is not intended to limit its scope and may be amended to encompass the content of the claims.

The storage device described in the embodiments of the present invention may be of any type, such as an HDD, SSD, CD-ROM, CD-RW or Blu-ray disc.

The detailed description is intended to describe embodiments of the present invention and does not represent the only embodiments in which the invention can be implemented. The detailed description may use specific terms to make the invention easier to understand, but the invention is not intended to be limited by those terms. The detailed description above should therefore not be construed as limiting in any respect and should be regarded as illustrative.

Each of the embodiments described above combines components and features of the present invention in a particular form. Each component or feature may be implemented without being combined with other components or features. Embodiments of the present invention may also be formed by combining some of the components and/or features. Some components or features of one embodiment may be included in another embodiment, or replaced with the corresponding components or features of another embodiment, provided this is not contrary to the spirit of the invention. It is evident that claims without an explicit citation relationship in the claims may be combined to form an embodiment, or included as a new claim by amendment after filing.

The scope of the present invention should be determined by a reasonable interpretation of the claims, and all modifications within the equivalent scope of the present invention are included in the scope of the present invention. Those skilled in the art will readily understand the spirit of the present invention from the embodiments and the claims.

[10] G. C. Fox, M. A. Johnson, G. A. Lyzenga, S. W. Otto, J. K. Salmon, and D. W. Walker, Solving Problems On Concurrent Processors Volume I: General Techniques and Regular Problems. Englewood Cliffs: Prentice-Hall, pp. 120-123, 1998.

[11] H. Leem, O. Seo and H. Kang, "A Sensitivity Test on the Minimum Depth of the Tide Model in the Northeast Asian Marginal Seas," J. of Korean Society of Coastal and Ocean Engineers, vol. 19, no. 5, 2007.

[12] R. Misumi, R. J. Moore, "River flow forecasting using a rainfall disaggregation model incorporating small-scale topographic effects," J. Meteorol. Appl., vol. 8, pp. 297-305, 2001.

[13] "Core Environmental Technology Development Project for Next Generation," 13-1-1, Korea Institute of Environmental Science and Technology, 2004.

Claims

A method of performing data communication between processors belonging to a processor array having a two-dimensional topology, in a computer system including the processor array and using a program having an object-oriented class, the method comprising:
generating, in the computer system, a parallel processing object that divides a two-dimensional data area into a plurality of sub-data areas and allocates the sub-data areas to the respective processors;
generating, in the computer system, a communication object using the parallel processing object, the communication object including information on a communication direction of data necessary for calculating a value of a boundary area of each sub-data area; and
transmitting, by a second processor of the processor array to a first processor of the processor array, based on the information on the communication direction of the data, data necessary for calculating a value of a boundary area of a first sub-data area allocated to the first processor,
wherein the data necessary for calculating the value of the boundary area of the first sub-data area is present in a boundary area of a second sub-data area allocated to the second processor.
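
As a non-limiting illustration of the transmitting step, the following Python sketch simulates the boundary exchange serially in a single process. It is not the patented implementation: the names `make_comm_table`, `boundary`, and `exchange` are hypothetical, each sub-data area is assumed to be a plain list of rows, and the processor grid is assumed to be non-periodic (edge processors have fewer neighbors). The table returned by `make_comm_table` plays the role of the communication-direction information.

```python
# Offsets in the 2-D processor grid for each communication direction (assumed naming).
DIRECTIONS = {"north": (-1, 0), "south": (1, 0), "west": (0, -1), "east": (0, 1)}
OPPOSITE = {"north": "south", "south": "north", "west": "east", "east": "west"}


def make_comm_table(p, q):
    """For each processor (i, j) in a p x q grid, list its (direction, neighbor) pairs."""
    table = {}
    for i in range(p):
        for j in range(q):
            links = []
            for name, (di, dj) in DIRECTIONS.items():
                ni, nj = i + di, j + dj
                if 0 <= ni < p and 0 <= nj < q:  # non-periodic grid: skip missing neighbors
                    links.append((name, (ni, nj)))
            table[(i, j)] = links
    return table


def boundary(block, side):
    """Return a copy of the edge of `block` (a list of rows) on the given side."""
    if side == "north":
        return list(block[0])
    if side == "south":
        return list(block[-1])
    if side == "west":
        return [row[0] for row in block]
    return [row[-1] for row in block]  # east


def exchange(blocks, table):
    """Each processor receives, from every neighbor, the neighbor's facing edge."""
    halos = {}
    for me, links in table.items():
        halos[me] = {}
        for direction, neighbor in links:
            # The neighbor to my east sends its west edge, and so on.
            halos[me][direction] = boundary(blocks[neighbor], OPPOSITE[direction])
    return halos
```

For a 1 x 2 grid with `blocks = {(0, 0): [[1, 2], [3, 4]], (0, 1): [[5, 6], [7, 8]]}`, calling `exchange(blocks, make_comm_table(1, 2))` gives processor `(0, 0)` the west edge of its eastern neighbor, and gives processor `(0, 1)` the east edge of its western neighbor.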

2. The method of claim 1, further comprising calculating, by the first processor, the value of the boundary area of the first sub-data area using the transmitted data.

The method of claim 1,
wherein generating the parallel processing object comprises:
obtaining the number n of processors included in the processor array and the size M * N of the two-dimensional data area;
calculating a natural number p and a natural number q that satisfy p * q = n and for which the difference between M / p and N / q is minimized; and
dividing the two-dimensional data area into p row areas and q column areas, thereby dividing the two-dimensional data area into n sub-data areas.
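
As a non-limiting illustration, the selection of p and q and the resulting division can be sketched in Python as follows. The function names `choose_grid`, `split_bounds`, and `partition` are hypothetical, M is assumed to be the number of rows and N the number of columns, and the handling of sizes that do not divide evenly (earlier parts receive one extra element) is an assumption not stated in the claim.

```python
def choose_grid(n, M, N):
    """Return (p, q) with p * q == n that minimizes |M / p - N / q|."""
    best = None
    for p in range(1, n + 1):
        if n % p:
            continue  # p must divide n so that q is a natural number
        q = n // p
        gap = abs(M / p - N / q)
        if best is None or gap < best[0]:
            best = (gap, p, q)
    return best[1], best[2]


def split_bounds(length, parts):
    """Split range(length) into `parts` contiguous half-open intervals of near-equal size."""
    base, extra = divmod(length, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def partition(n, M, N):
    """Divide an M x N area into p row areas and q column areas (n sub-data areas)."""
    p, q = choose_grid(n, M, N)
    rows = split_bounds(M, p)
    cols = split_bounds(N, q)
    # One (row interval, column interval) pair per processor, in row-major order.
    return p, q, [(r, c) for r in rows for c in cols]
```

For example, with n = 12 processors and a 300 * 400 data area, `choose_grid(12, 300, 400)` selects p = 3 and q = 4, so that each sub-data area is 100 * 100.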

(Deleted)

The method of claim 1, further comprising:
when the calculation on the sub-data area allocated to each processor is completed, receiving, by one processor of the processor array, values of the sub-data areas allocated to the other processors from the other processors; and
outputting, by the one processor, values of the data area of two or more dimensions.
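
As a non-limiting illustration of the receiving and outputting steps, the following Python sketch shows how one processor might assemble the sub-data areas it has received into the full two-dimensional data area. The function name `gather` is hypothetical, and it is assumed that the sub-data areas are lists of rows keyed by their position (i, j) in a p x q processor grid, and that all sub-data areas in the same grid row have the same height.

```python
def gather(blocks, p, q):
    """Stitch a p x q grid of sub-data areas (lists of rows) into one 2-D list."""
    full = []
    for i in range(p):
        height = len(blocks[(i, 0)])  # assumed equal for all blocks in grid row i
        for r in range(height):
            row = []
            for j in range(q):
                row.extend(blocks[(i, j)][r])  # append row r of each block, west to east
            full.append(row)
    return full
```

The assembled list can then be printed or written to a file by the one processor.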

The method of claim 1,
wherein the two-dimensional data area is included in a data area of two or more dimensions, and
wherein, in generating the communication object, the communication object is defined according to the parallel processing object and the size of the data area in the dimensions exceeding two.

(Deleted)

A computer-readable medium storing a program having an object-oriented class, the program causing a computer system comprising a processor array having a two-dimensional topology to execute:
generating a parallel processing object that divides a two-dimensional data area, among a data area of two or more dimensions, into a plurality of sub-data areas and allocates the sub-data areas to the respective processors of the processor array, and that allocates the data area of the dimensions exceeding two to the processor array;
generating a communication object that defines a data communication scheme between the processors, using the parallel processing object; and
transmitting, to a first processor of the processor array, a value of a boundary area of a second sub-data area allocated to a second processor of the processor array, according to the data communication scheme defined by the communication object.

10. The computer-readable medium of claim 9, wherein the program further comprises code for causing the computer system to execute calculating a value of a boundary area of a first sub-data area allocated to the first processor using the transmitted value.

A multiprocessor computing system comprising:
a storage device; and
a processor array having a two-dimensional topology,
wherein the storage device has recorded thereon:
first program code for executing generating a parallel processing object that divides a two-dimensional data area, among a data area of two or more dimensions, into a plurality of sub-data areas and allocates the sub-data areas to the respective processors of the processor array, and that allocates the data area of the dimensions exceeding two to the processor array;
second program code for executing generating a communication object that defines a data communication scheme between the processors, using the parallel processing object; and
third program code for executing transmitting, to a first processor of the processor array, a value of a boundary area of a second sub-data area allocated to a second processor of the processor array, according to the data communication scheme defined by the communication object.

The multiprocessor computing system of claim 11,
wherein the transmitting comprises:
transmitting, by the first processor, the data communication scheme to the second processor; and
transmitting, by the second processor, the value of the boundary area of the second sub-data area to the first processor according to the data communication scheme.

A method of performing data communication between processors belonging to a processor array having a two-dimensional topology, in a computer system including the processor array, the method comprising:
dividing, in the computer system, a two-dimensional data area into a plurality of sub-data areas and allocating the sub-data areas to the respective processors;
generating, in the computer system, information on a communication direction of data necessary for calculating a value of a boundary area of each sub-data area; and
transmitting, by a second processor of the processor array to a first processor of the processor array, based on the information on the communication direction of the data, data necessary for calculating a value of a boundary area of a first sub-data area allocated to the first processor,
wherein the data necessary for calculating the value of the boundary area of the first sub-data area is present in a boundary area of a second sub-data area allocated to the second processor.