CN104503734A - Kahn-based process network program parallel framework extraction technique - Google Patents

Info

Publication number
CN104503734A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410855804.XA
Other languages
Chinese (zh)
Inventor
李尚杰
程胜
周志军
魏明
卓保特
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co Ltd
Original Assignee
BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-12-31
Filing date
2014-12-31
Publication date
2015-04-08
Application filed by BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co Ltd
Priority to CN201410855804.XA
Publication of CN104503734A
Current legal status: Pending

Abstract

The invention discloses a technique for extracting a program's parallel framework based on Kahn process networks. Following the idea of localized control and distributed memory, an application program is converted into a process network in three steps (preprocessing, consumer process reconstruction, and producer process reconstruction), which yields the program's parallel framework. The technique targets a typical class of computation-intensive programs in embedded systems, namely static affine nested loop programs: it discovers their parallelism and expresses it as a Kahn process network.

Description

A program parallelization framework extraction technique based on Kahn process networks
Technical field
The invention relates to a program parallelization framework extraction technique based on Kahn process networks.
Background
Multi-core and multiprocessor architectures have become the mainstream technology of today's information systems. From high-performance computing to embedded systems, multi-core processors (including heterogeneous multi-core processors) can effectively meet computing performance requirements while reducing energy consumption. However, exploiting the inherent parallelism of a multi-core computer system usually requires parallel programming. Existing parallel programming relies on specific parallel libraries: the programmer manually analyzes which parts of the program can run in parallel and supplies the parallelization strategy and method. Existing parallel libraries include MPI, OpenMP, and CUDA. They usually target high-performance computing environments and carry considerable overhead of their own, so they cannot meet the needs of multi-core parallel programming in embedded environments.
Summary of the invention
The technical problem to be solved by the invention is to overcome the above shortcomings by providing a program parallelization framework extraction technique based on Kahn process networks. For a typical class of computation-intensive programs in embedded systems, namely static affine nested loop programs, it proposes a technique that discovers parallelism and expresses it as a Kahn process network.
To solve this problem, the invention adopts the following technical solution:
A program parallelization framework extraction technique based on Kahn process networks, characterized in that, following the idea of localized control and distributed memory, the application program is converted into a process network in three steps, which completes the extraction of the program parallelization framework:
Step 1, preprocessing: all executions of each assignment statement are first collapsed into one process, which forms an initial process network;
Step 2, consumer process reconstruction: in line with the basic requirement that processes in a Kahn process network communicate through FIFO channels, a memory array that several producer processes write to is decomposed into independent memory regions, and consumer processes are then adjusted or newly added;
Step 3, producer process reconstruction: a memory array accessed by several consumer processes is replaced with independent memory arrays, and the structure of the producer process is then adjusted.
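The preprocessing step can be pictured with a small sketch. It is only an illustration under stated assumptions: the `Statement` record, the `initial_network` function, and the three-statement example program are hypothetical and do not come from the patent. Each assignment statement becomes one process, and an edge is drawn whenever one statement writes an array that another reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One assignment statement of the loop program (hypothetical record)."""
    name: str
    writes: str          # name of the array the statement writes
    reads: tuple = ()    # names of the arrays the statement reads


def initial_network(statements):
    """Collapse every statement into one process and connect writers to readers.

    Returns a list of edges (producer, consumer, array).
    """
    edges = []
    for p in statements:
        for c in statements:
            if p.writes in c.reads:
                edges.append((p.name, c.name, p.writes))
    return edges


# Two statements write array r, a third one reads it.
program = [
    Statement("S1", writes="r"),
    Statement("S2", writes="r"),
    Statement("S3", writes="out", reads=("r",)),
]
network = initial_network(program)
print(network)
```

In this example array r has two producers (S1 and S2), which is exactly the situation that step 2 removes by splitting r.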
As an optimized technical solution, in step 2 consumer process reconstruction takes the data consumption of each process into account and provides independent memory arrays in which the producer processes store their data; each array is guaranteed to have only one producer process.
The concrete procedure is as follows:
1) Identify the group of producer processes that write to the same memory array. Let S_r be the set of all producer processes that write data to memory array r, and let D_r be the set of all consumer processes that read data from memory array r;
2) Split the memory array. For each process P_i^r ∈ S_r, an independent memory array r_i replaces r to support its write operations;
3) Establish the valid links between consumer processes and producer processes. To keep the program logically correct, consumer processes must also read their data from the new memory r_i, so r_i is first connected to every consumer of r. Many of these links are invalid. To eliminate them, data dependence analysis is performed between P_i^r and the consumer processes: links that carry no data dependence are deleted, and only links with a real data dependence remain;
4) Determine the range of r_i. Based on the data dependence analysis, an affine dependence function is established that is valid on the input port domain. In general, each input port domain can be represented by a k-dimensional parametric polyhedron; that is, each producer/consumer pair P/C can be represented by a unique polyhedron C(N) and an affine dependence function f. The affine dependence function f is expressed by an integer matrix M and an offset vector O: f(x) = Mx + O. This dependence function determines, for each producer/consumer pair, the input range on memory r_i;
5) Reconstruct the consumer processes. According to each r_i and the range of its input port domain, a new process is declared for the consumer process, with loop bounds determined by the range of the input port.
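Sub-steps 1) to 3) can be sketched as follows. This is a simplified illustration, not the patent's algorithm: the function `split_and_link` and the process names are hypothetical, and the data dependence analysis is replaced by a brute-force intersection of explicitly enumerated index sets, which only works for a fixed problem size and assumes the producers write disjoint elements.

```python
def split_and_link(producers, consumers):
    """Give each producer of r a private array r_i and keep only useful links.

    producers: {process name: set of indices of r it writes}
    consumers: {process name: set of indices of r it reads}
    Returns {(producer, consumer): private array name} for links that
    really carry data.
    """
    links = {}
    for i, (p, written) in enumerate(sorted(producers.items()), start=1):
        private = f"r_{i}"                    # r_i replaces r for producer i
        for c, read in sorted(consumers.items()):
            if written & read:                # keep the link only if data flows
                links[(p, c)] = private
    return links


N = 8
producers = {
    "P1": set(range(0, N, 2)),   # P1 writes the even elements of r
    "P2": set(range(1, N, 2)),   # P2 writes the odd elements of r
}
consumers = {
    "C1": {0, 2, 4},             # C1 only reads even elements
    "C2": set(range(N)),         # C2 reads everything
}
links = split_and_link(producers, consumers)
print(links)
```

The link from P2 to C1 is dropped because C1 never reads an element that P2 writes; a polyhedral dependence analysis would reach the same conclusion symbolically, for any N.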
As an optimized technical solution, step 3 proceeds as follows:
1) For each memory array, identify the corresponding set of consumer processes. Let D_r be the set of all consumer processes that read memory array r;
2) For each consumer process, set up an independent memory array r_i that replaces its accesses to r;
3) Determine the input domain of each r_i. Because consumer process reconstruction has been completed, each memory array r has only one process P^r that supplies its data, so only the output of P^r on each r_i needs to be determined. Its output is [expression missing in the source text];
4) Determine, on the producer side, the domain that satisfies the input domain y of r_i. This is obtained by solving the data flow function, based on reverse data flow analysis between producers and consumers;
5) Reconstruct the producer code according to the input range of each independent memory array.
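The effect of producer process reconstruction can be sketched in a few lines. The function `reconstruct_producer` and the index sets are hypothetical, and the sets are enumerated explicitly instead of being derived by reverse data flow analysis. Each consumer receives its own array, and the producer is given a plan that says into which private arrays each produced element must be written.

```python
def reconstruct_producer(consumer_reads):
    """Replace a shared array r by one private array per consumer.

    consumer_reads: {consumer name: set of indices of r it reads}
    Returns (array_of, plan):
      array_of: {consumer name: private array name}
      plan:     {index produced: list of private arrays to write it to}
    """
    array_of = {c: f"r_{i}" for i, c in enumerate(sorted(consumer_reads), start=1)}
    plan = {}
    for c in sorted(consumer_reads):
        for idx in sorted(consumer_reads[c]):
            plan.setdefault(idx, []).append(array_of[c])
    return array_of, plan


array_of, plan = reconstruct_producer({"C1": {0, 1}, "C2": {1, 2}})
print(array_of)
print(plan)
```

Element 1 is needed by both consumers, so the reconstructed producer writes it twice, once per private array; after this step every array has one writer and one reader, which is the shape a FIFO channel requires.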
Owing to the technical solution above, and compared with the prior art, the invention provides a technique for extracting parallelism from serial programs and supports automated program parallelization. Because different types of serial programs have different characteristics, the effect of parallelization also varies. The invention targets a typical class of computation-intensive programs in embedded systems, namely static affine nested loop programs, proposes a technique for discovering their parallelism, and expresses it as a Kahn process network.
Detailed description
Embodiment:
A program parallelization framework extraction technique based on Kahn process networks. Following the idea of localized control and distributed memory, the application program is converted into a process network in three steps, which completes the extraction of the program parallelization framework:
Step 1, preprocessing: all executions of each assignment statement are first collapsed into one process, which forms an initial process network.
Step 2, consumer process reconstruction: in line with the basic requirement that processes in a Kahn process network communicate through FIFO channels, a memory array that several producer processes write to is decomposed into independent memory regions, and consumer processes are then adjusted or newly added.
Step 3, producer process reconstruction: a memory array accessed by several consumer processes is replaced with independent memory arrays, and the structure of the producer process is then adjusted.
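The target model of these three steps, a network of processes that communicate only through FIFO channels with blocking reads, can be illustrated with a minimal sketch. It uses Python threads and `queue.Queue` as a stand-in for processes and channels; the process bodies and the `None` end-of-stream marker are conventions of this sketch, not part of the patent.

```python
import queue
import threading


def producer(out_fifo, n):
    """Producer process: writes n tokens, then an end-of-stream marker."""
    for i in range(n):
        out_fifo.put(i * i)
    out_fifo.put(None)  # end-of-stream marker (convention of this sketch)


def consumer(in_fifo, results):
    """Consumer process: blocks on the FIFO until a token is available."""
    while True:
        token = in_fifo.get()  # blocking read, as required by the Kahn model
        if token is None:
            break
        results.append(token + 1)


fifo = queue.Queue()  # unbounded FIFO channel with one writer and one reader
results = []
threads = [
    threading.Thread(target=producer, args=(fifo, 5)),
    threading.Thread(target=consumer, args=(fifo, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Because every channel has a single writer and a single reader and reads block, the result does not depend on how the two threads are scheduled. That determinism is the reason steps 2 and 3 insist on one producer and one consumer per array.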
In step 2, consumer process reconstruction takes the data consumption of each process into account and provides independent memory arrays in which the producer processes store their data; each array is guaranteed to have only one producer process.
The concrete procedure is as follows:
1) Identify the group of producer processes that write to the same memory array. Let S_r be the set of all producer processes that write data to memory array r, and let D_r be the set of all consumer processes that read data from memory array r.
2) Split the memory array. For each process P_i^r ∈ S_r, an independent memory array r_i replaces r to support its write operations.
3) Establish the valid links between consumer processes and producer processes. To keep the program logically correct, consumer processes must also read their data from the new memory r_i, so r_i is first connected to every consumer of r. Many of these links are invalid. To eliminate them, data dependence analysis is performed between P_i^r and the consumer processes: links that carry no data dependence are deleted, and only links with a real data dependence remain.
4) Determine the range of r_i. Based on the data dependence analysis, an affine dependence function is established that is valid on the input port domain. In general, each input port domain can be represented by a k-dimensional parametric polyhedron; that is, each producer/consumer pair P/C can be represented by a unique polyhedron C(N) and an affine dependence function f. The affine dependence function f is expressed by an integer matrix M and an offset vector O: f(x) = Mx + O. This dependence function determines, for each producer/consumer pair, the input range on memory r_i.
5) Reconstruct the consumer processes. According to each r_i and the range of its input port domain, a new process is declared for the consumer process, with loop bounds determined by the range of the input port.
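A small worked example may help with sub-step 4). The matrix M, the offset O, the triangular iteration domain, and the size N = 3 below are made-up values chosen for illustration. The sketch evaluates f(x) = Mx + O over every point of a consumer's iteration domain and reads off the range of r_i that the consumer touches; a real implementation would derive these bounds symbolically from the polyhedron rather than by enumeration.

```python
def affine(M, O, x):
    """Evaluate the affine dependence function f(x) = M x + O on integers."""
    return tuple(sum(m * xi for m, xi in zip(row, x)) + o for row, o in zip(M, O))


# Consumer iteration domain: 1 <= i <= N, 1 <= j <= i (a triangle, here N = 3).
N = 3
domain = [(i, j) for i in range(1, N + 1) for j in range(1, i + 1)]

# Assumed access pattern: the consumer at (i, j) reads r_i[i - 1][j].
M = [[1, 0],
     [0, 1]]
O = [-1, 0]

accessed = sorted({affine(M, O, x) for x in domain})
bounds = [(min(p[d] for p in accessed), max(p[d] for p in accessed)) for d in range(2)]
print(accessed)
print(bounds)
```

The resulting bounds (rows 0 to 2, columns 1 to 3) are what sub-step 5) would turn into the loop bounds of the reconstructed consumer process.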
In step 3, the procedure is as follows:
1) For each memory array, identify the corresponding set of consumer processes. Let D_r be the set of all consumer processes that read memory array r.
2) For each consumer process, set up an independent memory array r_i that replaces its accesses to r.
3) Determine the input domain of each r_i. Because consumer process reconstruction has been completed, each memory array r has only one process P^r that supplies its data, so only the output of P^r on each r_i needs to be determined. Its output is [expression missing in the source text].
4) Determine, on the producer side, the domain that satisfies the input domain y of r_i. This is obtained by solving the data flow function, based on reverse data flow analysis between producers and consumers.
5) Reconstruct the producer code according to the input range of each independent memory array. The code is converted into the following multi-level conditional expression:
if (y_p ∈ D_1) then
    x_c = T_1(y_p);
else if (y_p ∈ D_2) then
    x_c = T_2(y_p);
...
else if (y_p ∈ D_n) then
    x_c = T_n(y_p);
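The multi-level conditional above amounts to a piecewise mapping from a producer point y_p to a consumer point x_c. A minimal sketch follows, assuming one-dimensional domains; the intervals chosen for D_1 and D_2 and the functions chosen for T_1 and T_2 are invented for illustration, and `piecewise_map` is a hypothetical helper name.

```python
def piecewise_map(pieces, y_p):
    """Return x_c = T_k(y_p) for the first domain D_k that contains y_p.

    pieces: list of (membership test for D_k, transformation T_k).
    Returns None when y_p lies in no domain, i.e. no consumer needs it.
    """
    for in_domain, transform in pieces:
        if in_domain(y_p):
            return transform(y_p)
    return None


pieces = [
    (lambda y: 0 <= y < 4, lambda y: y + 1),       # D_1 and T_1
    (lambda y: 4 <= y < 8, lambda y: 2 * y - 3),   # D_2 and T_2
]

print(piecewise_map(pieces, 2))
print(piecewise_map(pieces, 5))
print(piecewise_map(pieces, 9))
```

In generated producer code each branch would typically end with a write to the private array of the matching consumer rather than a return value.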
The invention provides a technique for extracting parallelism from serial programs and supports automated program parallelization. Because different types of serial programs have different characteristics, the effect of parallelization also varies. The invention targets a typical class of computation-intensive programs in embedded systems, namely static affine nested loop programs, proposes a technique for discovering their parallelism, and expresses it as a Kahn process network.
The invention is not limited to the preferred embodiment described above. Any structural change made under the teaching of the invention, and any technical solution identical or similar to that of the invention, falls within the scope of protection of the invention.

Claims (3)

1. A program parallelization framework extraction technique based on Kahn process networks, characterized in that, following the idea of localized control and distributed memory, the application program is converted into a process network in three steps, which completes the extraction of the program parallelization framework:
Step 1, preprocessing: all executions of each assignment statement are first collapsed into one process, which forms an initial process network;
Step 2, consumer process reconstruction: in line with the basic requirement that processes in a Kahn process network communicate through FIFO channels, a memory array that several producer processes write to is decomposed into independent memory regions, and consumer processes are then adjusted or newly added;
Step 3, producer process reconstruction: a memory array accessed by several consumer processes is replaced with independent memory arrays, and the structure of the producer process is then adjusted.
2. The program parallelization framework extraction technique based on Kahn process networks according to claim 1, characterized in that in step 2 consumer process reconstruction takes the data consumption of each process into account and provides independent memory arrays in which the producer processes store their data; each array is guaranteed to have only one producer process;
The concrete procedure is as follows:
1) Identify the group of producer processes that write to the same memory array. Let S_r be the set of all producer processes that write data to memory array r, and let D_r be the set of all consumer processes that read data from memory array r;
2) Split the memory array. For each process P_i^r ∈ S_r, an independent memory array r_i replaces r to support its write operations;
3) Establish the valid links between consumer processes and producer processes. To keep the program logically correct, consumer processes must also read their data from the new memory r_i, so r_i is first connected to every consumer of r. Many of these links are invalid. To eliminate them, data dependence analysis is performed between P_i^r and the consumer processes: links that carry no data dependence are deleted, and only links with a real data dependence remain;
4) Determine the range of r_i. Based on the data dependence analysis, an affine dependence function is established that is valid on the input port domain. In general, each input port domain can be represented by a k-dimensional parametric polyhedron; that is, each producer/consumer pair P/C can be represented by a unique polyhedron C(N) and an affine dependence function f. The affine dependence function f is expressed by an integer matrix M and an offset vector O: f(x) = Mx + O. This dependence function determines, for each producer/consumer pair, the input range on memory r_i;
5) Reconstruct the consumer processes. According to each r_i and the range of its input port domain, a new process is declared for the consumer process, with loop bounds determined by the range of the input port.
3. The program parallelization framework extraction technique based on Kahn process networks according to claim 2, characterized in that step 3 proceeds as follows:
1) For each memory array, identify the corresponding set of consumer processes. Let D_r be the set of all consumer processes that read memory array r;
2) For each consumer process, set up an independent memory array r_i that replaces its accesses to r;
3) Determine the input domain of each r_i. Because consumer process reconstruction has been completed, each memory array r has only one process P^r that supplies its data, so only the output of P^r on each r_i needs to be determined. Its output is [expression missing in the source text];
4) Determine, on the producer side, the domain that satisfies the input domain y of r_i. This is obtained by solving the data flow function, based on reverse data flow analysis between producers and consumers;
5) Reconstruct the producer code according to the input range of each independent memory array.
CN201410855804.XA 2014-12-31 2014-12-31 Kahn-based process network program parallel framework extraction technique Pending CN104503734A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410855804.XA CN104503734A (en) 2014-12-31 2014-12-31 Kahn-based process network program parallel framework extraction technique

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410855804.XA CN104503734A (en) 2014-12-31 2014-12-31 Kahn-based process network program parallel framework extraction technique

Publications (1)

Publication Number Publication Date
CN104503734A true CN104503734A (en) 2015-04-08

Family

ID=52945135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410855804.XA Pending CN104503734A (en) 2014-12-31 2014-12-31 Kahn-based process network program parallel framework extraction technique

Country Status (1)

Country Link
CN (1) CN104503734A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107992368A (en) * 2017-11-15 2018-05-04 国家计算机网络与信息安全管理中心 Method for interchanging data and system between a kind of multi-process

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1569104A2 (en) * 2004-01-09 2005-08-31 Interuniversitair Microelektronica Centrum Vzw An automated method for performing parallelization of sequential code and a computerized system adapted therefore

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
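ALEXANDRU TURJAN et al.: "Translating affine nested-loop programs to Process Networks", Proceedings of the 2004 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems *
钱正平 et al.: "A cluster scheduling algorithm for distributed Kahn process networks" (分布式Kahn处理网络的一种集群调度算法), 《计算机应用研究》 (Application Research of Computers) *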

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20150408