WO2012006285A1: Method for quantifying and analyzing intrinsic parallelism of an algorithm
Classifications

 G06F17/10 - Complex mathematical operations
 G06F17/147 - Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform
 G06F8/456 - Parallelism detection
Description
METHOD FOR QUANTIFYING AND ANALYZING INTRINSIC PARALLELISM OF AN ALGORITHM
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method for quantifying and analyzing parallelism of an algorithm, more particularly to a method for quantifying and analyzing intrinsic parallelism of an algorithm.
2. Description of the Related Art
G. M. Amdahl introduced a method for parallelization of an algorithm according to a ratio of the sequential portion of the algorithm ("Validity of the single processor approach to achieving large scale computing capabilities," Proc. of AFIPS Conference, pages 483-485, 1967). A drawback of Amdahl's method is that the degree of parallelism of the algorithm obtained using the method is dependent on the target platform executing the method, and is not necessarily dependent on the algorithm itself. Therefore, the degree of parallelism obtained using Amdahl's method is extrinsic to the algorithm and is biased by the target platform.
A. Prihozhy et al. proposed a method for evaluating the parallelization potential of an algorithm based on the ratio between the complexity and the critical path length of the algorithm ("Evaluation of the parallelization potential for efficient multimedia implementations: dynamic evaluation of algorithm critical path," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 15, No. 5, pages 593-608, May 2005). The complexity is the total number of operations in the algorithm, and the critical path length is the largest number of operations that must be executed sequentially due to computational data dependencies. Although this method may characterize the average degree of parallelism embedded in the algorithm, it is insufficient for exhaustively characterizing the versatile multi-grain parallelisms embedded in the algorithm.
SUMMARY OF THE INVENTION
Therefore, the object of the present invention is to provide a method for quantifying and analyzing intrinsic parallelism of an algorithm that is not susceptible to bias by a target hardware and/or software platform.
Accordingly, a method of the present invention for quantifying and analyzing intrinsic parallelism of an algorithm is adapted to be implemented by a computer and comprises the steps of:
a) configuring the computer to represent the algorithm by means of a plurality of operation sets;
b) configuring the computer to obtain a Laplacian matrix according to the operation sets;
c) configuring the computer to compute eigenvalues and eigenvectors of the Laplacian matrix; and
d) configuring the computer to obtain a set of information related to intrinsic parallelism of the algorithm according to the eigenvalues and the eigenvectors of the Laplacian matrix.
BRIEF DESCRIPTION OF THE DRAWINGS
Other features and advantages of the present invention will become apparent in the following detailed description of the preferred embodiment with reference to the accompanying drawings, of which:
Figure 1 is a flow chart illustrating a preferred embodiment of a method for quantifying and analyzing intrinsic parallelism of an algorithm according to the present invention;
Figure 2 is a schematic diagram illustrating dataflow information related to an exemplary algorithm;
Figure 3 is a schematic diagram of an exemplary set of dataflow graphs;
Figure 4 is a schematic diagram illustrating operation sets of a 4x4 discrete cosine transform algorithm;
Figure 5 is a schematic diagram illustrating an exemplary composition of intrinsic parallelism corresponding to a dependency depth equal to 6;
Figure 6 is a schematic diagram illustrating an exemplary composition of intrinsic parallelism corresponding to a dependency depth equal to 5; and
Figure 7 is a schematic diagram illustrating an exemplary composition of intrinsic parallelism corresponding to a dependency depth equal to 3.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring to Figure 1, the preferred embodiment of a method according to the present invention for evaluating intrinsic parallelism of an algorithm is adapted to be implemented by a computer, and includes the following steps. A degree of intrinsic parallelism indicates the degree of parallelism of the algorithm itself, without considering the design and configuration of software and hardware; that is to say, the method according to this invention is not limited by software and hardware when it is used for analyzing an algorithm.
In step 11, the computer is configured to represent an algorithm by means of a plurality of operation sets. Each of the operation sets may be an equation, a program code, a flow chart, or any other form for expressing the algorithm. In the following example, the algorithm includes three operation sets O1, O2 and O3 that are expressed as
O1 = A1 + B1 + C1 + D1,
O2 = A2 + B2 + C2, and
O3 = A3 + B3 + C3.
Step 12 is to configure the computer to obtain a Laplacian matrix Ld according to the operation sets, and includes the following substeps. In substep 121, according to the operation sets, the computer is configured to obtain dataflow information related to the algorithm. As shown in Figure 2, the dataflow information corresponding to the operation sets of the example may be expressed as follows.
Data1 = A1 + B1
Data2 = A2 + B2
Data3 = A3 + B3
Data4 = Data1 + Data7
Data5 = Data2 + C2
Data6 = Data3 + C3
Data7 = D1 + C1
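As an illustration (not part of the patent text), the dataflow information above can be encoded as a small table mapping each operation to its two source operands, from which the directed edges of the dataflow graph follow mechanically. The names V1 to V7 anticipate the operator symbols introduced below; the dictionary layout is a hypothetical encoding, not the patent's representation.

```python
# Hypothetical encoding of Figure 2's dataflow: each addition operation
# Vi maps to its two source operands (operand names follow the description).
ops = {
    "V1": ("A1", "B1"),   # Data1
    "V2": ("A2", "B2"),   # Data2
    "V3": ("A3", "B3"),   # Data3
    "V7": ("D1", "C1"),   # Data7
    "V4": ("V1", "V7"),   # Data4 = Data1 + Data7
    "V5": ("V2", "C2"),   # Data5 = Data2 + C2
    "V6": ("V3", "C3"),   # Data6 = Data3 + C3
}

# A directed edge runs from a producing operation to the operation that
# consumes its result; primary inputs (A1, B1, ...) contribute no edge.
edges = [(src, op) for op, srcs in ops.items() for src in srcs if src in ops]
```

Sorting the resulting edge list gives exactly the four data dependencies described below: V1 and V7 feed V4, V2 feeds V5, and V3 feeds V6.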
In substep 122, the computer is configured to obtain a dataflow graph according to the dataflow information. The dataflow graph is composed of a plurality of vertexes that denote operations in the algorithm, and a plurality of directed edges that indicate interconnections between corresponding pairs of the vertexes and that represent sources and destinations of data in the algorithm. For the dataflow information shown in Figure 2, operator symbols V1 to V7 (i.e., the vertexes) are used instead of the addition operators, and arrows (i.e., the directed edges) represent the sources and destinations of data, to thereby obtain the dataflow graph shown in Figure 3. In particular, the operator symbol V1 represents the addition operation for A1+B1, the operator symbol V2 represents the addition operation for A2+B2, the operator symbol V3 represents the addition operation for A3+B3, the operator symbol V4 represents the addition operation for Data1+Data7, the operator symbol V5 represents the addition operation for Data2+C2, the operator symbol V6 represents the addition operation for Data3+C3, and the operator symbol V7 represents the addition operation for D1+C1.
From the dataflow graph shown in Figure 3, it can be appreciated that the operator symbol V4 is dependent on the operator symbols V1 and V7. Similarly, the operator symbol V5 is dependent on the operator symbol V2, the operator symbol V6 is dependent on the operator symbol V3, and the operator symbols V4, V5 and V6 are independent of each other.
In substep 123, the computer is configured to obtain the Laplacian matrix Ld according to the dataflow graphs. In the Laplacian matrix Ld, the i-th diagonal element shows the number of operator symbols that are connected to the operator symbol Vi, and each off-diagonal element denotes whether the corresponding two operator symbols are connected. Therefore, the Laplacian matrix Ld can clearly express the dataflow graphs in a compact linear algebraic form. The set of dataflow graphs shown in Figure 3 may be expressed as follows.

     [  1   0   0  -1   0   0   0 ]
     [  0   1   0   0  -1   0   0 ]
     [  0   0   1   0   0  -1   0 ]
Ld = [ -1   0   0   2   0   0  -1 ]
     [  0  -1   0   0   1   0   0 ]
     [  0   0  -1   0   0   1   0 ]
     [  0   0   0  -1   0   0   1 ]
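This matrix can be reproduced mechanically as Ld = D - A, the degree matrix minus the adjacency matrix of the undirected graph underlying the dataflow graph. The sketch below is illustrative only (not from the patent); it indexes V1 to V7 as 0 to 6.

```python
# Build the Laplacian L = D - A for the example's undirected graph.
# Edges: V1-V4, V7-V4, V2-V5, V3-V6 (vertices V1..V7 indexed 0..6).
edges = [(0, 3), (6, 3), (1, 4), (2, 5)]
n = 7
L = [[0] * n for _ in range(n)]
for u, v in edges:
    L[u][u] += 1   # each edge adds 1 to both endpoint degrees
    L[v][v] += 1
    L[u][v] -= 1   # and -1 to the two symmetric off-diagonal entries
    L[v][u] -= 1
```

By construction every row of a graph Laplacian sums to zero, which is a quick sanity check on the hand-written matrix above.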
The Laplacian matrix Ld represents the connectivity among the operator symbols V1 to V7, and the first column to the seventh column represent the operator symbols V1 to V7, respectively. For example, in the first column, the operator symbol V1 is connected to the operator symbol V4, and thus the matrix element (1,4) is -1.
In step 13, the computer is configured to compute the eigenvalues λ and the eigenvectors Xd of the Laplacian matrix Ld. Regarding the Laplacian matrix Ld obtained in the above example, the eigenvalues are
λ = [0 0 0 1 2 2 3],
and the eigenvectors Xd associated with the eigenvalues equal to 0 are proportional to
[1 0 0 1 0 0 1], [0 1 0 0 1 0 0], and [0 0 1 0 0 1 0],
whose nonzero entries mark the operator symbols belonging to the same connected component.
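These values can be checked numerically. The sketch below assumes NumPy (a tooling choice not mentioned in the patent) and again indexes V1 to V7 as 0 to 6; `eigh` is appropriate because the Laplacian is real and symmetric, and it returns eigenvalues in ascending order.

```python
import numpy as np

# Eigen-decomposition of the example Laplacian (unit edge weights assumed).
edges = [(0, 3), (6, 3), (1, 4), (2, 5)]
L = np.zeros((7, 7))
for u, v in edges:
    L[u, u] += 1
    L[v, v] += 1
    L[u, v] -= 1
    L[v, u] -= 1

# lam holds the eigenvalues in ascending order; the columns of X are
# the corresponding orthonormal eigenvectors.
lam, X = np.linalg.eigh(L)   # lam is approximately [0, 0, 0, 1, 2, 2, 3]
```

Note that a numerical solver may return any orthonormal basis of the three-dimensional null space, i.e., mixtures of the indicator vectors listed above rather than the indicator vectors themselves.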
In step 14, the computer is configured to obtain a set of information related to intrinsic parallelism of the algorithm according to the eigenvalues λ and the eigenvectors Xd of the Laplacian matrix Ld. The set of information related to intrinsic parallelism is defined in a strict manner to recognize those of the operation sets that are independent of each other and hence can be executed in parallel. The set of information related to strict-sense parallelism includes a degree of strict-sense parallelism representing the number of independent ones of the operation sets of the algorithm, and a set of compositions of strict-sense parallelism corresponding to the operation sets, respectively.
Based on the spectral graph theory introduced by F. R. K. Chung (Regional Conference Series in Mathematics, No. 92, 1997), the number of connected components in a graph is equal to the number of eigenvalues of the Laplacian matrix that are equal to 0. The degree of strict-sense parallelism embedded within the algorithm is thus equal to the number of eigenvalues λ that are equal to 0. Moreover, based on the spectral graph theory, the compositions of strict-sense parallelism may be identified according to the eigenvectors Xd associated with the eigenvalues λ that are equal to 0.
From the above example, it can be found that the set of dataflow graphs is composed of three independent operation sets, since there exist three Laplacian eigenvalues that are equal to 0. Thus, the degree of strict-sense parallelism embedded in the exemplified algorithm is equal to 3. Subsequently, the first, second and third ones of the eigenvectors Xd are associated with the eigenvalues that are equal to 0. By observing the first one of the eigenvectors Xd, it is clear that the values corresponding to the operator symbols V1, V4 and V7 are nonzero; that is to say, the operator symbols V1, V4 and V7 are dependent and form a connected one (V1-V4-V7) of the dataflow graphs. Similarly, from the second and third ones of the eigenvectors Xd associated with the eigenvalues λ that are equal to 0, it can be appreciated that the operator symbols V2, V5 and the operator symbols V3, V6 are dependent and form the remaining two connected ones (V2-V5 and V3-V6) of the dataflow graphs, respectively. Therefore, the computer is configured to obtain the degree of strict-sense parallelism, which is equal to 3, and the compositions of strict-sense parallelism, which may be expressed in the form of a graph (shown in Figure 3), a table, equations, or program codes.
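The component count and memberships can be verified programmatically. This sketch (an illustration, not the patent's procedure) counts the zero eigenvalues spectrally, as in step 14, but recovers the memberships with a union-find, since a numerical eigensolver may return an arbitrarily rotated basis of the null space rather than the clean indicator vectors.

```python
import numpy as np

def strict_sense_parallelism(edges, n):
    """Degree of strict-sense parallelism = number of zero eigenvalues
    of the graph Laplacian (= number of connected components).
    Memberships are recovered with a union-find for robustness."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    lam = np.linalg.eigvalsh(L)
    degree = int(np.sum(np.isclose(lam, 0.0, atol=1e-9)))

    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return degree, sorted(groups.values())

degree, comps = strict_sense_parallelism([(0, 3), (6, 3), (1, 4), (2, 5)], 7)
# degree == 3; comps == [[0, 3, 6], [1, 4], [2, 5]]  (V1-V4-V7, V2-V5, V3-V6)
```

The three recovered components correspond exactly to the connected dataflow graphs V1-V4-V7, V2-V5, and V3-V6 identified above.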
In step 15, the computer is configured to obtain a plurality of sets of information related to multi-grain parallelism of the algorithm according to the set of information related to strict-sense parallelism and at least one of a plurality of dependency depths of the algorithm. The sets of information related to multi-grain parallelism include a set of information related to wide-sense parallelism of the algorithm that characterizes all possible parallelisms embedded in an independent operation set.
It should be noted that the dependency depths of an algorithm represent the associated sequential steps essential for processing the algorithm, and are thus complementary to the potential parallelism of the algorithm. Thus, information related to different intrinsic parallelisms of an algorithm may be obtained based on different dependency depths. In particular, the information related to strict-sense parallelism is the information related to intrinsic parallelism of the algorithm corresponding to the maximum one of the dependency depths of the algorithm, and the information related to wide-sense parallelism is the information related to intrinsic parallelism of the algorithm corresponding to the minimum one of the dependency depths.
For example, the above-mentioned algorithm includes two different compositions of strict-sense parallelism, i.e., V1-V4-V7 and V2-V5 (V3-V6 is similar to V2-V5 and can be considered to be the same composition). Regarding the composition of strict-sense parallelism V1-V4-V7, it can be seen that the operator symbols V1 and V7 are independent of each other, i.e., the operator symbols V1 and V7 can be processed in parallel. Therefore, the set of information related to wide-sense parallelism of the algorithm includes a degree of wide-sense parallelism that is equal to 4, and compositions of wide-sense parallelism that are similar to the compositions of strict-sense parallelism.
According to the method of this embodiment, the degree of wide-sense parallelism of the above-mentioned algorithm is equal to 4. Since the algorithm includes 7 operator symbols V1 to V7, a single processing element requires 7 processing cycles to implement the algorithm. According to the degree of strict-sense parallelism, which is equal to 3, using 3 processing elements to implement the algorithm takes 3 processing cycles. According to the degree of wide-sense parallelism, which is equal to 4, using 4 processing elements to implement the algorithm takes 2 processing cycles. Further, it can be seen that at least 2 processing cycles are necessary for implementing the algorithm even if more processing elements are used. Therefore, an optimum number of processing elements for implementing an algorithm may be obtained according to the method of this embodiment.
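These processing-cycle counts can be reproduced with a simple unit-latency list schedule. The model below is illustrative only (not the patent's procedure): each operation takes one cycle, becomes ready once all of its predecessors have finished, and at most `n_pe` ready operations run per cycle.

```python
def schedule_cycles(edges, n_ops, n_pe):
    """Greedy unit-latency schedule on a dependency DAG.
    edges: (producer, consumer) pairs; n_ops: number of operations;
    n_pe: number of processing elements. Illustrative model only."""
    preds = {v: set() for v in range(n_ops)}
    for u, v in edges:
        preds[v].add(u)
    done, cycles = set(), 0
    while len(done) < n_ops:
        # Operations whose predecessors have all completed.
        ready = [v for v in range(n_ops) if v not in done and preds[v] <= done]
        done.update(ready[:n_pe])   # run at most n_pe of them this cycle
        cycles += 1
    return cycles

# Dependencies of the example (V1..V7 as 0..6): V4 needs V1 and V7,
# V5 needs V2, V6 needs V3.
deps = [(0, 3), (6, 3), (1, 4), (2, 5)]
# 1 PE -> 7 cycles, 3 PEs -> 3 cycles, 4 or more PEs -> 2 cycles
```

With this model the cycle counts match the text: 7 cycles on one processing element, 3 cycles on three, and a floor of 2 cycles (the minimum dependency depth) no matter how many processing elements are added.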
Taking a 4x4 discrete cosine transform (DCT) as an example, the operation sets of the DCT algorithm are represented by the dataflow graphs shown in Figure 4. Since the 4x4 DCT is well known to those skilled in the art, further details thereof are omitted herein for the sake of brevity. From Figure 4, it can be seen that the maximum one of the dependency depths of the 4x4 DCT algorithm is equal to 6. Regarding this maximum dependency depth (i.e., 6), the composition of strict-sense parallelism of the algorithm may be obtained as shown in Figure 5, and the degree of strict-sense parallelism of the algorithm is equal to 4 according to the method of this embodiment. When analyzing the intrinsic parallelism of the 4x4 DCT algorithm with a dependency depth equal to 5, the composition of intrinsic parallelism may be obtained as shown in Figure 6, and the degree of intrinsic parallelism is equal to 8. Further, when analyzing the intrinsic parallelism of the 4x4 DCT algorithm with a dependency depth equal to 3, the composition of intrinsic parallelism may be obtained as shown in Figure 7, and the degree of intrinsic parallelism is equal to 16.
In summary, the method according to this invention may be used to evaluate the intrinsic parallelism of an algorithm.
While the present invention has been described in connection with what is considered the most practical and preferred embodiment, it is understood that this invention is not limited to the disclosed embodiment but is intended to cover various arrangements included within the spirit and scope of the broadest interpretation so as to encompass all such modifications and equivalent arrangements.
Priority Applications
TW 99122162, filed 2010-07-06
TW 099122162, filed 2010-07-06
US 12/832,557 (US20120011186A1), filed 2010-07-08: Method for quantifying and analyzing intrinsic parallelism of an algorithm
Applications Claiming Priority (4)
JP 2013518789 (JP5925202B2), filed 2011-07-05: Method for quantifying and analyzing parallel processing algorithm
EP 11804255 (EP2591414A4), filed 2011-07-05: Method for quantifying and analyzing intrinsic parallelism of an algorithm
KR 20137001820 (KR20130038903A), filed 2011-07-05: Method for quantifying and analyzing intrinsic parallelism of an algorithm
CN 201180033554 (CN103180821B), filed 2011-07-05: Method for quantifying and analyzing intrinsic parallelism of an algorithm
Publications (1)
WO2012006285A1, published 2012-01-12
Family ID: 45441539
Family Applications (1)
PCT/US2011/042962 (WO2012006285A1), filed 2011-07-05: Method for quantifying and analyzing intrinsic parallelism of an algorithm
Country Status (4)
EP: EP2591414 (A1, A4)
JP: JP5925202B2
KR: KR20130038903A
WO: WO2012006285A1
Citations (5)
US20030195938A1, priority 2000-06-26, published 2003-10-16, Howard Kevin David: Parallel processing systems and method
US7171397B1, priority 2002-08-21, published 2007-01-30, NCR Corp.: Method and system for measuring parallelism of a database system execution step
US20090007116A1, priority 2007-06-27, published 2009-01-01, Microsoft Corporation: Adjacent data parallel and streaming operator fusion
US20090028433A1, priority 2007-05-03, published 2009-01-29, David Allen Tolliver: Method for partitioning combinatorial graphs
WO2011163223A1, priority 2010-06-22, published 2011-12-29, National Cheng Kung University: Method of analyzing intrinsic parallelism of algorithm
Family Cites Families (4)
US5587922A, priority 1993-06-16, published 1996-12-24, Sandia Corporation: Multidimensional spectral load balancing
US6615211B2, priority 2001-03-19, published 2003-09-02, International Business Machines Corporation: System and methods for using continuous optimization for ordering categorical data sets
US7724256B2, priority 2005-03-21, published 2010-05-25, Siemens Medical Solutions USA, Inc.: Fast graph cuts: a weak shape assumption provides a fast exact method for graph cuts segmentation
US7406200B1, priority 2008-01-08, published 2008-07-29, International Business Machines Corporation: Method and system for finding structures in multidimensional spaces using image-guided clustering
Non-Patent Citations (2)
IEEE Trans. on Circuits and Systems for Video Technology, Vol. 15, No. 5, May 2005, pages 593-608
See also references of EP2591414A4
Also Published As
EP2591414A4, 2014-08-06, application
EP2591414A1, 2013-05-15, application
KR20130038903A, 2013-04-18, application
JP2013530477A, 2013-07-25, application
JP5925202B2, 2016-05-25, grant
Legal Events
121: EP - the EPO has been informed by WIPO that EP was designated in this application (ref document number 11804255, country EP, kind code A1)
NENP: Non-entry into the national phase (ref country code DE)
ENP: Entry into the national phase (ref document number 2013518789, country JP, kind code A)
ENP: Entry into the national phase (ref document number 20137001820, country KR, kind code A)