CN111857732A - Serial program parallelization method based on marks - Google Patents

Serial program parallelization method based on marks

Info

Publication number
CN111857732A
CN111857732A
Authority
CN
China
Prior art keywords
parallel
data
code
serial
mark
Prior art date
Legal status
Granted
Application number
CN202010756781.2A
Other languages
Chinese (zh)
Other versions
CN111857732B (en)
Inventor
唐佩佳
徐云
余泽霖
王嘎
钟旭阳
孙一佳
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202010756781.2A priority Critical patent/CN111857732B/en
Publication of CN111857732A publication Critical patent/CN111857732A/en
Application granted granted Critical
Publication of CN111857732B publication Critical patent/CN111857732B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/51 Source to source
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/73 Program documentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention relates to a mark-based serial program parallelization method, which comprises the following steps: step (1), marking the serial program; step (2), the code analysis system parses the marks and records the mark clause parameters; step (3), the code analysis system extracts parallel code segments from the basic parallel code library and fills them with the mark clause parameters; step (4), the filled parallel code segments are spliced to obtain the parallel program finally corresponding to the serial program. The method reduces the development cost of parallel programming and lightens the burden on developers; it provides parallelization capability for multiple platforms, so that parallel APIs for multiple parallel platforms can be obtained from one serial API; and no special compiler needs to be developed separately, since a mature compiler can be used directly, giving high parallel compiling efficiency. Because the mark-based method neither rewrites the original program nor requires deep understanding of the serial program, the standardized mark parsing process improves the reliability of the parallel program and reduces the probability of errors.

Description

Serial program parallelization method based on marks
Technical Field
The invention relates to a parallel program generation method, in particular to a serial program parallelization method.
Background
With the development and popularization of parallel computing technology, a large number of serial application programs in industry urgently need to be converted into parallel programs to improve data processing capacity. Serial program parallelization faces two basic problems. (1) The high cost of parallel programming: parallel programming requires professional skills and rich engineering experience, and developing parallel programs costs considerable engineering effort and time. (2) The diversity of parallel platforms: as parallel hardware platforms and parallel programming models multiply and diversify, the ability to quickly generate parallel programs for a desired target platform is required. In view of these two problems, an efficient and easy-to-use parallelization method is needed to assist parallel programming.
A parallel platform is a combination of a parallel hardware platform and a corresponding parallel programming model. Parallel hardware platforms can be divided into two categories according to their storage structures: shared storage structures and distributed storage structures. A shared storage hardware platform has multiple CPUs working together without a primary/secondary relationship; all CPUs share the same physical memory and communicate through memory address operations. A distributed storage hardware platform is composed of multiple nodes, each with one or more independent CPUs and its own physical memory; each node runs independently, and nodes communicate over a network. Different parallel hardware platforms have corresponding parallel programming models. A typical parallel programming model for shared storage hardware platforms is OpenMP, a multi-thread parallel programming model based on compiler directives that describes parallel semantics through explicit marks. A typical parallel programming model for distributed storage hardware platforms is MPI, which provides a rich set of standard APIs (Application Programming Interfaces) to help the user build the parallel environment and perform node communication and synchronization.
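As a simple illustration of the two models (this example is not part of the patent text; the function and variable names are arbitrary), an array summation can be written in C for each platform as follows:

/* Shared storage (OpenMP): all threads see the same array in one
   physical memory; parallelism is expressed by an explicit directive mark. */
#include <omp.h>
double sum_openmp(const double *data, int n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Distributed storage (MPI): each node holds only its own slice of the
   data, and the partial sums are combined by explicit communication. */
#include <mpi.h>
double sum_mpi(const double *local_data, int local_n) {
    double local = 0.0, total = 0.0;
    for (int i = 0; i < local_n; i++)
        local += local_data[i];
    MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return total;
}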
The existing parallelization methods for serial programs fall mainly into two types, whose main contents and shortcomings are as follows. (1) Automatic parallelization tools. Such a tool automatically analyzes the parallelizable parts of the original serial program and compiles them in parallel according to fixed compilation rules. A special compiler must be developed separately; moreover, to preserve the original serial semantics, the compiler's strategy is conservative, so parallel efficiency is not high. (2) Traditional parallel programming models. Here the original serial program is rewritten using an existing mature parallel programming framework. However, rewriting requires deep understanding of the original serial program and experienced designers, involves a large workload, and is error-prone, so program quality depends on the developers' expertise. Platform dependence is also strong: most generated parallel programs target a specific parallel platform and must be redeveloped when the platform changes.
Disclosure of Invention
In order to solve the above technical problems, and in view of the two basic problems faced by serial program parallelization and the shortcomings of the existing parallelization methods, the invention provides a serial program parallelization method that generates a parallel program by marking the serial program and parsing the marks, thereby solving both the high cost of serial program parallelization and the diversity problem of parallel platforms.
The technical scheme of the invention is as follows: a mark-based serial program parallelization method, comprising the following steps:
step (1) marking the serial program;
step (2) the code analysis system parses the marks and records the mark clause parameters;
step (3) the code analysis system extracts parallel code segments from the basic parallel code library and fills them with the mark clause parameters;
and step (4) splicing the filled parallel code segments to obtain the parallel program finally converted from the serial program.
Further, the function name of the serial API function is marked. The mark comprises a mark name and mark clauses: the mark name is used for recognition by the subsequent code analysis system, and the mark clauses provide the parameters required for parallelization, comprising the data source, the data destination and the data batch number. The data source clause provides the information of the data packet to be processed in parallel; the data destination clause provides the information of the result data packet after parallel processing; and the data batch number clause provides the number of batches into which the data packet in the data source clause can be split, the splitting principle being that each split batch can be processed independently by the serial program.
Furthermore, the code analysis system comprises an analysis module and a code extraction module. The analysis module is responsible for reading and parsing the marked serial program and, when the mark name is recognized, parses and records the mark clause parameters and the marked serial program; the code extraction module is responsible for extracting code segments from the basic parallel code library.
Further, the basic parallel code library records, for the shared storage platform and the distributed storage platform, the parallel code segments of three parallel stages: data division and distribution, data calculation, and data collection.
Furthermore, the code analysis system fills the mark clause parameters into the corresponding fixed points of the parallel code segments to obtain parallel code segments containing the mark clause parameters.
Further, under the shared storage platform the splicing order is data division and distribution, data collection, data calculation; under the distributed storage platform the splicing order is data division and distribution, data calculation, data collection.
Further, after splicing, fixed code such as variable definitions is added to obtain the function body of the parallel API function; a parallel identification suffix is added to the serial API function name to serve as the function name of the parallel API function; and a thread number parameter is added to the serial API function parameter list to obtain the whole parameter list of the parallel API function.
The beneficial effects of the invention are:
(1) the development cost of parallel programming is reduced, and the burden on developers is lightened;
(2) parallelization capability for multiple platforms is provided: parallel APIs for multiple parallel platforms can be obtained from one serial API;
(3) no special compiler needs to be developed separately; a mature compiler can be used directly, and the parallel compiling efficiency is high;
(4) the mark-based method neither rewrites the original program nor requires deep understanding of the serial program, and the standardized mark parsing process improves the reliability of the parallel program and reduces the probability of errors.
Drawings
FIG. 1 is an overall flow diagram of the present invention;
FIG. 2 is an algorithmic flow diagram of the code resolution system of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention rather than all of them; all other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
FIG. 1 is the overall flowchart of the mark-based serial program parallelization method of the present invention, which comprises the following steps:
Step (1): mark the serial program.
Step (2): the code analysis system parses the marks and records the mark clause parameters.
Step (3): the code analysis system extracts parallel code segments from the basic parallel code library and fills them with the mark clause parameters.
Step (4): splice the filled parallel code segments to obtain the parallel program finally corresponding to the serial program.
Specifically, the step (1) of marking the serial program includes:
the serial program refers to a serial API function. In the field of software engineering, API (application programming Interface) functions are predefined functions, represent specific software function modules that can be called, and are the basic composition structures of computer software. According to the software engineering specification, a serial API function consists of three parts, namely a function name, a parameter list and a function body.
The mark is an identification field used to express parallel semantics and indicate the parallel position. It comprises a mark name and mark clauses. The mark name, used for recognition by the subsequent code analysis system, is fixed and takes a form distinct from ordinary program code; the mark clauses provide the parameters required for parallelization, are filled in by the developer, and relate to the serial program. The mark clauses comprise the data source, the data destination and the data batch number, and their parameters may be taken from the serial program's parameter list or specified by the developer. The data source clause provides the information of the data packet to be processed in parallel; the data destination clause provides the information of the result data packet after parallel processing; and the data batch number clause provides the number of batches into which the data packet in the data source clause can be split, the splitting principle being that each split batch can be processed independently by the serial program.
The flow of this step is: the developer adds a mark above the function name in the definition part of the serial program.
(2) The code analysis system parses the marks and records the mark clause parameters.
The code analysis system is an independently executable program; it is responsible for reading the marked serial program file and parsing it into the corresponding parallel program, and it is the main implementation tool of the method. The code analysis system comprises an analysis module and a code extraction module: the analysis module is responsible for reading and parsing the marked serial program, and the code extraction module is responsible for extracting code segments from the basic parallel code library.
The flow of this step is: the code analysis system reads the marked serial program file obtained in step (1), scans it from beginning to end, and, when a mark name is recognized, parses and records the mark clause parameters and the marked serial program.
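As an illustrative sketch only (the patent does not publish the parser's source; the file handling and the record_clauses helper below are assumptions, with the mark name taken from the embodiment), the scanning loop of the analysis module could look like this in C:

#include <stdio.h>
#include <string.h>

#define MARK_NAME "#sigma parallel_task"

/* Placeholder for the clause parser: a real implementation would record
   the src_data, dst_data and group clause parameters and the function
   name and parameter list that follow the mark. */
static void record_clauses(FILE *fp, const char *mark_line) {
    (void)fp;
    printf("mark recognized: %s", mark_line);
}

void scan_marked_program(const char *path) {
    char line[1024];
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return;
    /* Scan the marked serial program file from beginning to end. */
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strstr(line, MARK_NAME) != NULL)
            record_clauses(fp, line);  /* record clause parameters and program */
        /* unmarked parts are not parsed */
    }
    fclose(fp);
}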
(3) The code analysis system extracts parallel code segments from the basic parallel code library and fills them with the mark clause parameters.
The basic parallel code library is a file that records the parallel code segments of several parallel stages for several parallel platforms. The parallel platforms currently covered by the basic parallel code library are the shared storage structure hardware platform with the OpenMP programming model and the distributed storage structure hardware platform with the MPI programming model, hereinafter referred to as the shared storage platform and the distributed storage platform respectively. The parallel stages refer to the execution process, i.e. the program structure, of a parallel program, which can be divided into three parallel stages: data division and distribution, data calculation, and data collection. In total, the basic parallel code library comprises these three types of code segments for each of the shared storage platform and the distributed storage platform. The basic parallel code library reserves an expansion interface and can be extended to other parallel platforms.
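The patent fixes only this platform/stage structure of the library; one plausible on-disk organization, given purely as an illustrative assumption, would be:

basic_parallel_code_library/
    shared_storage/          (OpenMP code segments)
        partition_and_distribute
        compute
        collect
    distributed_storage/     (MPI code segments)
        partition_and_distribute
        compute
        collect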
Parallel code segment extraction means that the code analysis system selects the required parallel code segments from the basic parallel code library according to the parallel platform and the parallel stage. Taking the data collection stage under the shared storage platform as an example, the code analysis system queries the code segment set under the shared storage platform directory of the basic parallel code library and returns the code segment for the data collection stage.
The flow of this step is: according to the parallel platform and the parallel stages, the code analysis system successively retrieves from the basic parallel code library the parallel code segments of the three parallel stages (data division and distribution, data calculation, data collection) for that platform, and fills the mark clause parameters obtained in step (2) into the corresponding fixed points of those code segments, obtaining the parallel code segments of each stage containing the mark clause parameters.
(4) Splice the filled parallel code segments to obtain the parallel program finally corresponding to the serial program.
The parallel program refers to the parallel API function corresponding to the serial API function, which can be called by developers. According to the software engineering specification, a parallel API function, like a serial API function, consists of a function name, a parameter list and a function body.
The flow of this step is as follows. For the function body: under the shared storage platform, the code segments obtained in step (3) are spliced in the order data division and distribution, data collection, data calculation; under the distributed storage platform, the splicing order is data division and distribution, data calculation, data collection; this yields the function body of the parallel API function. For the function name: the serial API function name with a _parallel suffix is taken as the function name of the parallel API function. For the parameter list: the serial API function parameter list plus the numprocs parameter is taken as the whole parameter list of the parallel API function; the numprocs parameter is the number of parallel processing units, representing the number of threads on the shared storage platform and the number of node processes on the distributed storage platform.
According to a preferred embodiment of the invention: the embodiment parallelizes the serial program Func_serial(srcname, dstname, num, var1, var2) into parallel programs under the shared storage platform and the distributed storage platform respectively.
The marking of the serial program in step (1) is specifically as follows:
the tag # sigma parallel _ task is added above the serial program function name. After labeling as follows:
#sigma parallel_task src_data(srcname;srcdatatype;srcsize)
dst_data(dstname;dstdatatype;dstsize)
group(num)
Func_serial(srcname, dstname, num, var1, var2) // function name and parameter list
{
... // function body
}
The mark name #sigma parallel_task is used for recognition by the subsequent code analysis system; the mark clauses provide the parameters required for code parsing, are filled in by the developer, and relate to the serial program. The mark clauses src_data(srcname; srcdatatype; srcsize), dst_data(dstname; dstdatatype; dstsize) and group(num) represent the data source, the data destination and the data batch number respectively. The data source clause parameters srcname, srcdatatype and srcsize give the address, data type and data quantity of the data packet to be processed in parallel; the data destination clause parameters dstname, dstdatatype and dstsize give the address, data type and data quantity of the result data packet after parallel processing; and the data batch number clause gives the number of batches into which the data source can be split.
Step (2), in which the code analysis system parses the mark and records the mark parameters, is specifically as follows:
the algorithmic idea of the code parsing system is shown in fig. 2.
The analysis module reads the marked serial program file obtained in step (1) and scans it from beginning to end. When the mark name #sigma parallel_task is recognized, the analysis module parses and records each mark clause parameter (srcname, srcdatatype, srcsize, and so on) and simultaneously records the marked serial function name Func_serial and its parameter list (srcname, dstname, num, var1, var2). Unmarked parts are not parsed.
Step (3), in which the code analysis system extracts parallel code segments from the basic parallel code library and fills them with the mark parameters, is specifically as follows:
the implementation of the shared storage platform is different from that of the distributed storage platform, and firstly, taking the shared storage platform as an example:
(a) data partitioning and distribution phase
The code extraction module of the code analysis system queries the code segment set under the shared storage platform directory of the basic parallel code library and returns the data division and distribution code segment. The code segment is as follows:
average_allocation(t_testlocals,step,step_before,numprocs,①);
stepsize=③/①;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(②,numprocs,displs,②_in);
The analysis module of the code analysis system fills the corresponding points (the points are built into the code segments) with the mark clause parameters obtained in step (2): point ① is filled with num, point ② with srcname and point ③ with srcsize. This process is completed automatically by the code analysis system. The code segment after filling is:
average_allocation(t_testlocals,step,step_before,numprocs,num);
stepsize=srcsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(srcname,numprocs,displs,srcname_in);
The functions in this code segment are introduced below.
the data partitioning function average _ allocation has the function of partitioning num batches of data into numacrocs threads, storing results in step and step _ before arrays, and setting t _ testlocals as a partitioning proportionality coefficient and setting the default as 1. The data division mode is as follows:
Figure BDA0002611832080000071
Figure BDA0002611832080000072
wherein stepiAnd step _ beforeiRespectively representing the data batch number and the batch number offset obtained by the thread i, wherein the batch number offset refers to the sum of the data quantity obtained by the first i-1 threads.
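A minimal C sketch of a function with this behavior (an assumption for illustration; the patent does not publish the source of average_allocation), covering the default case t_testlocals = 1:

void average_allocation(double t_testlocals, int step[], int step_before[],
                        int numprocs, int num) {
    /* Evenly partition num batches over numprocs threads; with the default
       coefficient t_testlocals = 1, the first num % numprocs threads
       receive one extra batch. */
    int base = num / numprocs;
    int rem = num % numprocs;
    int offset = 0;
    (void)t_testlocals;  /* weighting is ignored in this even-split sketch */
    for (int i = 0; i < numprocs; i++) {
        step[i] = base + (i < rem ? 1 : 0);
        step_before[i] = offset;  /* batches held by threads 0..i-1 */
        offset += step[i];
    }
}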
The DataPartition function computes the data quantity and the data-quantity offset for each thread and stores the results in the sendrecvcnts and displs arrays. The calculation is as follows:

sendrecvcnts_i = stepsize * step_i, 0 ≤ i < numprocs
displs_i = stepsize * step_before_i, 0 ≤ i < numprocs

where sendrecvcnts_i and displs_i denote the data quantity assigned to thread i and its data-quantity offset; the data-quantity offset is the total data quantity assigned to threads 0 through i-1.
The data distribution function Data_trans distributes the data to each thread, assigning to every thread its portion of the data packet to be processed at address srcname. Because the shared storage platform uses a shared storage structure in which all threads share the same physical memory, data distribution reduces to memory address arithmetic. The addresses are computed as:

srcname_in_i = srcname + displs_i, 0 ≤ i < numprocs

where srcname_in_i denotes the input data address assigned to thread i.
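Both helper functions can be sketched in a few lines of C consistent with the formulas above (illustrative assumptions, not the patent's published source; the element type is shown as double only for concreteness, since the real type comes from the srcdatatype clause parameter):

void DataPartition(int numprocs, const int step[], const int step_before[],
                   int stepsize, int sendrecvcnts[], int displs[]) {
    /* sendrecvcnts_i = stepsize * step_i; displs_i = stepsize * step_before_i */
    for (int i = 0; i < numprocs; i++) {
        sendrecvcnts[i] = stepsize * step[i];
        displs[i] = stepsize * step_before[i];
    }
}

void Data_trans(double *srcname, int numprocs, const int displs[],
                double *srcname_in[]) {
    /* Shared-storage "distribution" is pure address arithmetic:
       srcname_in_i = srcname + displs_i. */
    for (int i = 0; i < numprocs; i++)
        srcname_in[i] = srcname + displs[i];
}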
(b) Data calculation phase
The code extraction module of the code analysis system queries the code segment set under the shared storage platform directory of the basic parallel code library and returns the data calculation code segment. The code segment is as follows:
omp_set_num_threads(numprocs);
#pragma omp parallel
{
int i=omp_get_thread_num();
Func_serial(①_in[i],②_out[i],step[i],var1,var2);
}
The analysis module of the code analysis system fills the corresponding points with the mark clause parameters obtained in step (2): point ① is filled with srcname and point ② with dstname. This process is completed automatically by the code analysis system. The code segment after filling is:
omp_set_num_threads(numprocs);
#pragma omp parallel
{
int i=omp_get_thread_num();
Func_serial(srcname_in[i],dstname_out[i],step[i],var1,var2);
}
In this code segment, omp_set_num_threads is the OpenMP statement that sets the number of threads; it sets the thread count of the parallel region to numprocs. Once #pragma omp parallel opens the parallel region, each thread executes the code inside it and performs its data calculation by calling the serial API function Func_serial. The function parameters srcname_in[i] and dstname_out[i] denote the input data address and output data address assigned to thread i; srcname_in[i] has the same meaning as srcname_in_i above, and likewise for dstname_out[i].
(c) Data collection phase
The code extraction module of the code analysis system queries the code segment set under the shared storage platform directory of the basic parallel code library and returns the data collection code segment. The code segment is as follows:
stepsize=③/①;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(②,numprocs,displs,②_out);
The analysis module of the code analysis system fills the corresponding points with the mark clause parameters obtained in step (2): point ① is filled with num, point ② with dstname and point ③ with dstsize. This process is completed automatically by the code analysis system. The code segment after filling is:
stepsize=dstsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(dstname,numprocs,displs,dstname_out);
In this code segment, the Data_trans function collects the results from each thread into the output data packet at address dstname; its principle, like that of the DataPartition function, was introduced in stage (a) and is not repeated here.
According to a preferred embodiment of the present invention, the distributed storage platform is taken as an example:
(a) data partitioning and distribution phase
The code extraction module of the code analysis system queries the code segment set under the distributed storage platform directory of the basic parallel code library and returns the data division and distribution code segment. The code segment is as follows:
average_allocation(t_testlocals,step,step_before,numprocs,①);
stepsize=③/①;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Scatterv(②,sendrecvcnts,displs,MPI_④,
②_in,sendrecvcnts[myid],MPI_④,0,MPI_COMM_WORLD);
The analysis module of the code analysis system fills in the mark clause parameters obtained in step (2): point ① is filled with num, point ② with srcname, point ③ with srcsize and point ④ with srcdatatype. This process is completed automatically by the code analysis system. The code segment after filling is:
average_allocation(t_testlocals,step,step_before,numprocs,num);
stepsize=srcsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Scatterv(srcname,sendrecvcnts,displs,MPI_srcdatatype,srcname_in,
sendrecvcnts[myid],MPI_srcdatatype,0,MPI_COMM_WORLD);
the operation _ allocation and DataPartition in the code segment are introduced and will not be described herein. The MPI data distribution function MPI _ scatter is a standard distribution function for MPI.
(b) Data calculation phase
The code extraction module of the code analysis system queries the code segment set under the distributed storage platform directory of the basic parallel code library and returns the data calculation code segment. The code segment is as follows:
Func_serial(①_in,②_out,step[myid],var1,var2);
The analysis module of the code analysis system fills in the mark clause parameters obtained in step (2): point ① is filled with srcname and point ② with dstname. This process is completed automatically by the code analysis system. The code segment after filling is:
Func_serial(srcname_in,dstname_out,step[myid],var1,var2);
each node of the distributed storage platform performs data calculation by calling the serial program Func _ serial.
(c) Data collection phase
The code extraction module of the code analysis system queries the code segment set under the distributed storage platform directory of the basic parallel code library and returns the data collection code segment. The code segment is as follows:
stepsize=③/①;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Allgatherv(②_out,sendrecvcnts[myid],MPI_④,②,sendrecvcnts,
displs,MPI_④,MPI_COMM_WORLD);
The analysis module of the code analysis system fills in the mark clause parameters obtained in step (2): point ① is filled with num, point ② with dstname, point ③ with dstsize and point ④ with dstdatatype. This process is completed automatically by the code analysis system. The code segment after filling is:
stepsize=dstsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Allgatherv(dstname_out,sendrecvcnts[myid],MPI_dstdatatype,dstname,
sendrecvcnts,displs,MPI_dstdatatype,MPI_COMM_WORLD);
In this code segment, the MPI data collection function MPI_Allgatherv is a standard MPI function that collects the data from every node into the output data packet at address dstname; because MPI_Allgatherv leaves the complete result on every node, no separate Data_trans step is needed here. The principle of the DataPartition function has been described above and is not repeated.
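For reference, the standard MPI prototype (from the MPI standard, not from the patent) shows how the filled arguments map onto MPI_Allgatherv:

int MPI_Allgatherv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                   void *recvbuf, const int recvcounts[], const int displs[],
                   MPI_Datatype recvtype, MPI_Comm comm);
/* In the filled call: sendbuf = dstname_out, sendcount = sendrecvcnts[myid],
   recvbuf = dstname, recvcounts = sendrecvcnts, displs = displs, and both
   types are the MPI type named by the dstdatatype clause. */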
And (4) splicing the filled parallel code segments to obtain the parallel program finally converted from the serial program, specifically:
Taking the shared storage platform as an example, the analysis module of the code analysis system splices the code segments obtained in step (3) in the order data division and distribution, data collection, data calculation, and then adds fixed code such as variable definitions to obtain the function body of the parallel API function. The serial API function name with a _parallel suffix is used as the function name of the parallel API function. The serial API function parameter list plus the numprocs parameter is used as the whole parameter list of the parallel API function, where the numprocs parameter is the number of threads.
The resulting parallel program is as follows:
Func_serial_parallel(srcname, dstname, num, var1, var2, numprocs)
{
... // temporary variable definitions
// data partitioning
average_allocation(t_testlocals,step,step_before,numprocs,num);
stepsize=srcsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
// data distribution
Data_trans(srcname,numprocs,displs,srcname_in);
// data collection
stepsize=dstsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(dstname,numprocs,displs,dstname_out);
// data calculation
omp_set_num_threads(numprocs);
#pragma omp parallel
{
int i=omp_get_thread_num();
Func_serial(srcname_in[i],dstname_out[i],step[i],var1,var2);
}
}
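As a usage illustration (a hypothetical call site, not part of the patent), the developer calls the generated parallel API exactly like the serial one, with the extra numprocs argument appended:

/* Process the same arguments as the serial Func_serial, using 8 threads. */
Func_serial_parallel(srcname, dstname, num, var1, var2, 8);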
Taking the distributed storage platform as an example, the code segments obtained in step (3) are spliced in the order data division and distribution, data calculation, data collection, and fixed code such as variable definitions is added to obtain the function body of the parallel API function. The serial API function name with a _parallel suffix is used as the function name of the parallel API function. The serial API function parameter list plus the numprocs parameter is used as the whole parameter list of the parallel API function, where the numprocs parameter is the number of node processes.
The resulting parallel program is as follows:
Func_serial_parallel(srcname, dstname, num, var1, var2, numprocs)
{
... // temporary variable definitions
// data partitioning
average_allocation(t_testlocals,step,step_before,numprocs,num);
stepsize=srcsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
// data distribution
MPI_Scatterv(srcname,sendrecvcnts,displs,MPI_srcdatatype,srcname_in,sendrecvcnts[myid],MPI_srcdatatype,0,MPI_COMM_WORLD);
// data calculation
Func_serial(srcname_in,dstname_out,step[myid],var1,var2);
// data collection
stepsize=dstsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Allgatherv(dstname_out,sendrecvcnts[myid],MPI_dstdatatype,dstname,sendrecvcnts,displs,MPI_dstdatatype,MPI_COMM_WORLD);
}
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art, and all inventive concepts making use of the concepts set forth herein are intended to be protected, without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims (7)

1. A mark-based serial program parallelization method, characterized by comprising the following steps:
step (1) marking the serial program;
step (2) the code analysis system parses the marks and records the mark clause parameters;
step (3) the code analysis system extracts parallel code segments from the basic parallel code library and fills them with the mark clause parameters;
and step (4) splicing the filled parallel code segments to obtain the parallel program finally corresponding to the serial program.
2. The mark-based serial program parallelization method according to claim 1, characterized in that:
the function name of the serial API function is marked; the mark comprises a mark name and mark clauses; the mark name is used for recognition by the subsequent code analysis system; the mark clauses provide the parameters required for parallelization and comprise the data source, the data destination and the data batch number; the data source clause provides the information of the data packet to be processed in parallel; the data destination clause provides the information of the result data packet after parallel processing; and the data batch number clause provides the number of batches into which the data packet in the data source clause can be split, the splitting principle being that each split batch can be processed independently by the serial program.
3. The mark-based serial program parallelization method according to claim 1, characterized in that:
the code analysis system comprises an analysis module and a code extraction module; the analysis module is responsible for reading and parsing the marked serial program and, when the mark name is recognized, parses and records the mark clause parameters and the marked serial program; the code extraction module is responsible for extracting code segments from the basic parallel code library.
4. The mark-based serial program parallelization method according to claim 3, characterized in that:
the basic parallel code library records, for the shared storage platform and the distributed storage platform, the parallel code segments of three parallel stages: data division and distribution, data calculation, and data collection.
5. The mark-based serial program parallelization method according to claim 1, characterized in that:
the code analysis system fills the mark clause parameters into the corresponding fixed points of the parallel code segments to obtain parallel code segments containing the mark clause parameters.
6. The mark-based serial program parallelization method according to claim 1, characterized in that:
under the shared storage platform, the splicing order is data division and distribution, data collection, data calculation; under the distributed storage platform, the splicing order is data division and distribution, data calculation, data collection.
7. The mark-based serial program parallelization method according to claim 6, characterized in that:
after splicing, fixed code such as variable definitions is added to obtain the function body of the parallel API function; a parallel identification suffix is added to the serial API function name to serve as the function name of the parallel API function; and a thread number parameter is added to the serial API function parameter list to obtain the whole parameter list of the parallel API function.
CN202010756781.2A 2020-07-31 2020-07-31 Serial program parallelization method based on marks Active CN111857732B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010756781.2A CN111857732B (en) 2020-07-31 2020-07-31 Serial program parallelization method based on marks

Publications (2)

Publication Number Publication Date
CN111857732A true CN111857732A (en) 2020-10-30
CN111857732B CN111857732B (en) 2021-10-22

Family

ID=72952524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010756781.2A Active CN111857732B (en) 2020-07-31 2020-07-31 Serial program parallelization method based on marks

Country Status (1)

Country Link
CN (1) CN111857732B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SU433482A1 (en) * 1972-02-28 1974-06-25 ДИвакин , В.Н.Тресоруков DEVICE mi GOiymmwASHCHITATIVE IVlAMIt-Ш WITH KAHAJIAI'M COMMUNICATIONS
CN1123930A (en) * 1994-04-28 1996-06-05 东芝株式会社 Programming methoed for concurrent programs and a supporting apparatus for concurrent programming
JP2004192139A (en) * 2002-12-09 2004-07-08 Sharp Corp Debug device, debug method and recording medium
CN1932766A (en) * 2006-10-12 2007-03-21 上海交通大学 Semi-automatic parallel method of large serial program code quantity-oriented field
CN101799762A (en) * 2010-04-07 2010-08-11 中国科学院对地观测与数字地球科学中心 Quick parallelization programming template method for remote sensing image processing algorithm
CN101944040A (en) * 2010-09-15 2011-01-12 复旦大学 Predicate-based automatic parallel optimizing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FM镄: "[Parallel Computing] Parallel programming based on OpenMP (#pragma omp parallel for)", https://blog.csdn.net/weixin_39568744/article/details/88576576 *

Also Published As

Publication number Publication date
CN111857732B (en) 2021-10-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant