CN104503734A

CN104503734A - Kahn-based process network program parallel framework extraction technique

Info

Publication number: CN104503734A
Application number: CN201410855804.XA
Authority: CN
Inventors: 李尚杰; 程胜; 周志军; 魏明; 卓保特
Original assignee: BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co Ltd
Current assignee: BEIJING SHENZHOU AEROSPACE SOFTWARE TECHNOLOGY Co Ltd
Priority date: 2014-12-31
Filing date: 2014-12-31
Publication date: 2015-04-08

Abstract

The invention discloses a technique for extracting a program parallelization framework based on Kahn process networks. Following the idea of localized control and distributed memory, an application program is converted into a process network in three steps (preprocessing, consumer process restructuring, and producer process restructuring), which completes the extraction of the program's parallel framework. The technique targets a typical class of compute-intensive programs in embedded systems, namely static affine nested loop programs, for which it provides a parallelism discovery technique whose result is expressed as a Kahn process network.

Description

A technique for extracting a program parallelization framework based on Kahn process networks

Technical field

The invention relates to a technique for extracting a program parallelization framework based on Kahn process networks.

Background technology

Multi-core and multiprocessor architectures have become the mainstream technology of today's information systems. From high-performance computing to embedded systems, multi-core processors (including heterogeneous multi-core processors) can effectively meet computing performance requirements while reducing energy consumption. However, exploiting the inherent parallelism of a multi-core computer system usually requires parallel programming. Existing parallel programming relies on specific parallel libraries: programmers manually analyze the parallelism available in a program and supply the parallelization strategy and method. Existing parallel libraries include MPI, OpenMP, and CUDA; these libraries usually target high-performance computing environments and carry considerable overhead of their own, so they cannot meet the needs of multi-core parallel programming in embedded environments.

Summary of the invention

The technical problem to be solved by the invention is to overcome the above defects by providing a technique for extracting a program parallelization framework based on Kahn process networks. For a typical class of compute-intensive programs in embedded systems, namely static affine nested loop programs, it proposes a parallelism discovery technique and expresses the result as a Kahn process network.

To solve the above problem, the technical scheme adopted by the invention is as follows:

A technique for extracting a program parallelization framework based on Kahn process networks, characterized in that, following the idea of localized control and distributed memory, the application program is transformed into a process network in three steps, completing the extraction of the program parallelization framework:

Step 1, preprocessing: the preprocessing step first collapses all executions of each assignment statement into a single process, thereby forming an initial process network;

Step 2, consumer process restructuring: in accordance with the basic requirement that communication between processes in a Kahn process network is FIFO, a memory array that is written jointly by several producer processes is decomposed into independent memory regions, and the consumer processes are then adjusted or new consumer processes are added;

Step 3, producer process restructuring: a memory array accessed by multiple consumer processes is replaced with independent memory arrays, and the structure of the producer process is then adjusted.
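Step 1 can be illustrated with a minimal Python sketch. The `Statement` record, the example program, and the function `build_initial_network` are hypothetical names introduced only for illustration; the patent does not prescribe a data structure for the initial process network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One assignment statement of the loop nest (hypothetical record)."""
    name: str
    writes: str      # array written by the statement
    reads: tuple     # arrays read by the statement


def build_initial_network(statements):
    """Collapse all executions of each statement into one process and
    connect a writer to every process that reads the same array."""
    processes = [s.name for s in statements]
    edges = set()
    for producer in statements:
        for consumer in statements:
            if producer.writes in consumer.reads:
                edges.add((producer.name, producer.writes, consumer.name))
    return processes, edges


# Assumed example program: S1 and S2 both write array r, S3 reads r.
program = [
    Statement("S1", writes="r", reads=("a",)),
    Statement("S2", writes="r", reads=("b",)),
    Statement("S3", writes="c", reads=("r",)),
]
processes, edges = build_initial_network(program)
```

In this assumed example the array `r` has two producers, which is exactly the situation that Step 2 resolves by splitting `r` into one region per producer.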

As an optimized technical scheme, in step 2 the consumer process restructuring considers the data consumption of each process and provides independent memory arrays in which the producer processes store their data; for each array, it is guaranteed that there is only one producer process;

The specific implementation procedure is as follows:

1) Identify the group of producer processes that write to the same memory data; let $S_r$ be the set of all producer processes that write data to memory array $r$, and $D_r$ the set of all consumer processes that read data from memory array $r$;

2) Split the memory array; for each process $P_i^r \in S_r$, an independent memory array $r_i$ replaces $r$ to support its write operations;

3) Establish the valid links between consumer processes and producer processes; to ensure the logical correctness of the program, the consumer processes must also read data from the new memory $r_i$; each $r_i$ is connected to every consumer of $r$; such linking, however, yields many invalid links; to eliminate them, a data dependence analysis is performed between $P_i^r$ and the consumer processes, and any link that carries no data dependence is deleted, leaving only the links with a real data dependence;

4) Determine the range of $r_i$; according to the data dependence analysis, an affine dependence function is established, which is valid on the input port domain; in general, each input port domain can be represented by a $k$-dimensional parameterized polyhedron, that is, each producer/consumer pair P/C can be represented by a unique polyhedron $C(N)$ and an affine dependence function $f$; an affine dependence function $f$ is expressed by an integer matrix $M$ and an offset vector $O$: $f(x) = Mx + O$; from this dependence function, the input range on memory $r_i$ can be determined for each producer/consumer pair;

5) Restructure the consumer process; according to each $r_i$ and the range of the input port domain, a new process is declared for the consumer process, and its loop bounds are determined from the range of the input port.
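Sub-steps 2) and 3) of the consumer process restructuring can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `split_and_link`, the process names, and the index sets are hypothetical, and the dependence test here enumerates small explicit index sets, whereas the patent relies on a data dependence analysis over polyhedral domains.

```python
def split_and_link(producers, consumers, depends):
    """producers: processes writing r (the set S_r);
    consumers: processes reading r (the set D_r);
    depends(p, c): hypothetical test for a data dependence from p to c."""
    # Sub-step 2: one private array r_i per producer process P_i.
    private = {p: f"r_{i}" for i, p in enumerate(producers, start=1)}
    # Sub-step 3: connect every r_i to every consumer of r ...
    links = [(p, private[p], c) for p in producers for c in consumers]
    # ... then delete the links that carry no data dependence.
    valid = [link for link in links if depends(link[0], link[2])]
    return private, valid


# Assumed access sets: indices of r written or read by each process.
writes = {"S1": set(range(0, 5)), "S2": set(range(5, 10))}
reads = {"C1": set(range(0, 5)), "C2": set(range(3, 8))}


def depends(p, c):
    # A dependence exists if the consumer reads an element the producer wrote.
    return bool(writes[p] & reads[c])


private, valid = split_and_link(["S1", "S2"], ["C1", "C2"], depends)
```

Of the four candidate links, the one from `S2` to `C1` is dropped because `C1` never reads an element written by `S2`.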

As an optimized technical scheme, step 3 proceeds as follows:

1) For each memory array, identify the corresponding set of consumer processes; let $D_r$ be the set of all consumer processes that read memory array $r$;

2) For each consumer process, set up an independent memory array $r_i$ to replace its accesses to $r$;

3) Determine the input domain of each $r_i$; because consumer process restructuring has been completed, each memory array $r$ has only one process $P^r$ that supplies its data; it therefore suffices to determine the output of $P^r$ on each $r_i$; its output is [expression missing in the source text];

4) Determine the input domain $y$ of the producer that satisfies $r_i$; this is obtained by solving the data-flow function, based on a reverse data-flow analysis between producer and consumer;

5) Restructure the producer code according to the input range of each independent memory array.
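The outcome of step 3 can be sketched with Python threads and FIFO queues. This is a minimal illustration, not the patented implementation: the functions `producer` and `consumer`, the statement body `y * y`, the two index ranges, and the `None` end-of-stream marker are all assumptions made for the example.

```python
import queue
import threading


def producer(n, channels):
    """Single producer of r; writes each value into the private FIFO of
    every consumer whose (assumed) input domain contains the index."""
    for y in range(n):
        value = y * y                  # stand-in for the statement body
        for domain, fifo in channels:
            if y in domain:
                fifo.put(value)
    for _, fifo in channels:
        fifo.put(None)                 # end-of-stream marker (assumption)


def consumer(fifo, out):
    """Blocking reads from a private FIFO, as in a Kahn process network."""
    while (item := fifo.get()) is not None:
        out.append(item)


r1, r2 = queue.Queue(), queue.Queue()  # independent arrays r_1 and r_2
channels = [(range(0, 4), r1), (range(2, 6), r2)]
out1, out2 = [], []
threads = [
    threading.Thread(target=producer, args=(6, channels)),
    threading.Thread(target=consumer, args=(r1, out1)),
    threading.Thread(target=consumer, args=(r2, out2)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each consumer owns its channel and reads block until data arrives, the values each consumer sees do not depend on how the threads are scheduled.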

By adopting the above technical scheme, and compared with the prior art, the invention provides a technique for extracting parallelism from serial programs that supports automated program parallelization. Because different types of serial program have different characteristics, the effect of parallelization also varies. The invention targets a typical class of compute-intensive programs in embedded systems, namely static affine nested loop programs, proposes a parallelism discovery technique for them, and expresses the result as a Kahn process network.

Embodiment

Embodiment:

A technique for extracting a program parallelization framework based on Kahn process networks: following the idea of localized control and distributed memory, the application program is transformed into a process network in three steps, completing the extraction of the program parallelization framework:

Step 1, preprocessing: the preprocessing step first collapses all executions of each assignment statement into a single process, thereby forming an initial process network.

Step 2, consumer process restructuring: in accordance with the basic requirement that communication between processes in a Kahn process network is FIFO, a memory array that is written jointly by several producer processes is decomposed into independent memory regions, and the consumer processes are then adjusted or new consumer processes are added.

In step 2, the consumer process restructuring considers the data consumption of each process and provides independent memory arrays in which the producer processes store their data; for each array, it is guaranteed that there is only one producer process;

The specific implementation procedure is as follows:

1) Identify the group of producer processes that write to the same memory data; let $S_r$ be the set of all producer processes that write data to memory array $r$, and $D_r$ the set of all consumer processes that read data from memory array $r$.

2) Split the memory array; for each process $P_i^r \in S_r$, an independent memory array $r_i$ replaces $r$ to support its write operations.

3) Establish the valid links between consumer processes and producer processes; to ensure the logical correctness of the program, the consumer processes must also read data from the new memory $r_i$; each $r_i$ is connected to every consumer of $r$; such linking, however, yields many invalid links; to eliminate them, a data dependence analysis is performed between $P_i^r$ and the consumer processes, and any link that carries no data dependence is deleted, leaving only the links with a real data dependence.

4) Determine the range of $r_i$; according to the data dependence analysis, an affine dependence function is established, which is valid on the input port domain; in general, each input port domain can be represented by a $k$-dimensional parameterized polyhedron, that is, each producer/consumer pair P/C can be represented by a unique polyhedron $C(N)$ and an affine dependence function $f$; an affine dependence function $f$ is expressed by an integer matrix $M$ and an offset vector $O$: $f(x) = Mx + O$; from this dependence function, the input range on memory $r_i$ can be determined for each producer/consumer pair.
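The affine dependence function of sub-step 4) can be made concrete with a small Python sketch. The triangular domain `domain(N)`, the matrix `M`, and the offset `O` below are assumed values chosen for illustration; the domain is enumerated point by point, which only works for small parameters and stands in for a symbolic polyhedral computation.

```python
def affine(M, O):
    """Return f(x) = M x + O for an integer matrix M and offset vector O."""
    def f(x):
        return tuple(
            sum(m * xi for m, xi in zip(row, x)) + o
            for row, o in zip(M, O)
        )
    return f


def domain(N):
    """Assumed parameterized domain C(N) = {(i, j) : 1 <= i <= N, i <= j <= N}."""
    return [(i, j) for i in range(1, N + 1) for j in range(i, N + 1)]


# Assumed dependence: consumer iteration (i, j) reads the element
# that the producer wrote at (i - 1, j).
f = affine(M=[[1, 0], [0, 1]], O=[-1, 0])

# Input range on r_i: the image of the domain under f.
accessed = sorted({f(x) for x in domain(3)})
```

The set `accessed` is the part of $r_i$ that this producer/consumer pair actually uses, which is what bounds the loops of the restructured consumer.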

In step 3, the steps are as follows:

1) For each memory array, identify the corresponding set of consumer processes; let $D_r$ be the set of all consumer processes that read memory array $r$.

2) For each consumer process, set up an independent memory array $r_i$ to replace its accesses to $r$.

4) Determine the input domain $y$ of the producer that satisfies $r_i$; this is obtained by solving the data-flow function, based on a reverse data-flow analysis between producer and consumer.

5) Restructure the producer code according to the input range of each independent memory array. The code is transformed into the following multi-branch conditional expression:

if (y_p ∈ D_1) then
    x_c = T_1(y_p);
else if (y_p ∈ D_2) then
    x_c = T_2(y_p);
…
else if (y_p ∈ D_n) then
    x_c = T_n(y_p);
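The multi-branch conditional above can be sketched in Python as a list of (domain, transform) pairs that are tried in order. The function `make_producer_map`, the one-dimensional domains, and the two transforms are hypothetical; returning `None` when no domain matches is also an assumption, since the source does not say what happens in that case.

```python
def make_producer_map(pieces):
    """pieces: list of (D_k, T_k) pairs; D_k is a membership test on the
    producer iteration y_p and T_k maps y_p to the consumer iteration x_c."""
    def to_consumer(y_p):
        for in_domain, transform in pieces:
            if in_domain(y_p):
                return transform(y_p)
        return None  # y_p feeds no consumer (behaviour assumed here)
    return to_consumer


# Assumed one-dimensional example with two pieces.
pieces = [
    (lambda y: 0 <= y < 4, lambda y: y + 1),   # D_1, T_1
    (lambda y: 4 <= y < 8, lambda y: 2 * y),   # D_2, T_2
]
to_consumer = make_producer_map(pieces)
```

For instance, `to_consumer(2)` falls in the first domain and `to_consumer(5)` in the second, mirroring the first two branches of the conditional.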

The invention provides a technique for extracting parallelism from serial programs that supports automated program parallelization. Because different types of serial program have different characteristics, the effect of parallelization also varies. The invention targets a typical class of compute-intensive programs in embedded systems, namely static affine nested loop programs, proposes a parallelism discovery technique for them, and expresses the result as a Kahn process network.

The invention is not limited to the preferred embodiment described above. Any structural change made by anyone under the teaching of the invention, and any technical scheme identical or similar to that of the invention, falls within the scope of protection of the invention.

Claims

1. A technique for extracting a program parallelization framework based on Kahn process networks, characterized in that, following the idea of localized control and distributed memory, the application program is transformed into a process network in three steps, completing the extraction of the program parallelization framework:

2. The technique for extracting a program parallelization framework based on Kahn process networks according to claim 1, characterized in that, in step 2, the consumer process restructuring considers the data consumption of each process and provides independent memory arrays in which the producer processes store their data; for each array, it is guaranteed that there is only one producer process;

The specific implementation procedure is as follows:

2) Split the memory array; for each process $P_i^r \in S_r$, an independent memory array $r_i$ replaces $r$ to support its write operations;

3) Establish the valid links between consumer processes and producer processes; to ensure the logical correctness of the program, the consumer processes must also read data from the new memory $r_i$; each $r_i$ is connected to every consumer of $r$; such linking, however, yields many invalid links; to eliminate them, a data dependence analysis is performed between $P_i^r$ and the consumer processes, and any link that carries no data dependence is deleted, leaving only the links with a real data dependence;

5) Restructure the consumer process; according to each $r_i$ and the range of the input port domain, a new process is declared for the consumer process, and its loop bounds are determined from the range of the input port.

3. The technique for extracting a program parallelization framework based on Kahn process networks according to claim 2, characterized in that step 3 proceeds as follows: