CN102043886B - Underlying hardware mapping method for integrated circuit as well as time sequence constraint method and device for data control flow - Google Patents

Info

Publication number
CN102043886B
CN102043886B CN201010619849A CN201010619849A CN102043886B CN 102043886 B CN102043886 B CN 102043886B CN 201010619849 A CN201010619849 A CN 201010619849A CN 201010619849 A CN201010619849 A CN 201010619849A CN 102043886 B CN102043886 B CN 102043886B
Authority
CN
China
Prior art keywords
operator
data
integrated circuit
unit
control flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201010619849A
Other languages
Chinese (zh)
Other versions
CN102043886A (en)
Inventor
王新安
胡子一
安辉耀
谢峥
王腾
张兴
周生明
赵秋奇
马芝
孙亚春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School filed Critical Peking University Shenzhen Graduate School
Priority to CN201010619849A priority Critical patent/CN102043886B/en
Publication of CN102043886A publication Critical patent/CN102043886A/en
Application granted granted Critical
Publication of CN102043886B publication Critical patent/CN102043886B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The invention discloses an underlying hardware mapping method and device for an integrated circuit. A computer language program describing the algorithm of the integrated circuit is analyzed, mapped into a data control flow graph, and then converted into an operator time-space diagram. A timing constraint is applied to the data control flow graph, clustering compression is performed on the operator time-space diagram according to the timing labels, and a logic description of the underlying hardware circuit of the integrated circuit is then generated. This provides a tool for mapping from a computer language to the underlying hardware circuit of an integrated circuit, so that underlying hardware can be generated from C, MATLAB (matrix laboratory), or similar languages in a standardized, convenient, and fast manner. The invention also discloses a timing constraint method and device for a data control flow graph. Applying the timing constraint to the data control flow gives the resulting circuit a regular structure; the method is also suitable for timing design and verification of digital circuits and helps hardware engineers carry out hardware design.

Description

Underlying hardware mapping method for an integrated circuit, and timing constraint method and device for data control flow
Technical field
The present invention relates to the field of integrated circuit design, and in particular to an underlying hardware mapping method for an integrated circuit and to a timing constraint method and device for data control flow.
Background art
In the integrated circuit field, the pace of integrated circuit design usually lags behind the pace at which manufacturing processes develop. Since manufacturing processes entered the nanometer scale in particular, design speed has fallen far behind. For the field of integrated circuit design, improving design speed is therefore one of the most urgent problems at present. As shown in Fig. 1, in the prior art the design of an integrated circuit generally comprises two parts. The first part goes from an algorithm description in the C language or the MATLAB language to a description at the RTL level. The second part is the implementation process from the RTL description to a standard-cell ASIC structure, a gate array (or other S-ASIC structure), or an FPGA structure. The second part is already supported by relatively mature tools, and its implementation process is basically efficient and fast. The key to improving design speed therefore lies in the first part, that is, going from an algorithm description in C, MATLAB, or a similar language to an RTL-level description. This can be called the underlying hardware mapping method for an integrated circuit, also known as high-level synthesis or architecture-level synthesis.
However, the first part is mainly carried out by engineers who, based on their own understanding of the C or MATLAB program, manually convert it into an RTL-level description. In other words, the first part depends on each engineer's experience and knowledge, and the time it takes varies considerably from one engineer to another. Some foreign companies and institutions have researched this first part and released corresponding tools, such as Catapult C from Mentor, AutoPilot from AutoESL, Cynthesizer from Forte Design Systems, and SPARK from UC San Diego.
Summary of the invention
The main technical problem to be solved by the present invention is to provide an underlying hardware mapping method and device for an integrated circuit that can improve the design speed of integrated circuits.
The present invention also provides a timing constraint method and device for data control flow, so that the circuit obtained according to the constraint method has a regular structure; the method is suitable for timing design and verification of digital circuits and helps hardware engineers carry out hardware design.
To solve the above technical problems, the present invention adopts the following technical solution:
An underlying hardware mapping method for an integrated circuit, comprising the steps of:
A program analysis step, for reading a computer language program that describes the integrated circuit algorithm and identifying from it the execution objects and parameter objects to be mapped;
A data control flow graph generation step, for mapping the identified execution objects and parameter objects to the corresponding nodes of a data control flow graph that describes the integrated circuit algorithm;
An operator time-space diagram generation step, for obtaining, according to the function performed by each node of the data control flow graph, at least one operator unit with the corresponding function from a pre-established operator cell library, and converting the data control flow graph into an operator time-space diagram composed of operator units;
A timing constraint step, for determining a total timing constraint according to the user's specification requirements and the requirements of the target integrated circuit process, labeling each operator unit in the operator time-space diagram with a time, and applying timing constraints to each level of the operator time-space diagram;
A time-space diagram compression step, for performing spatial clustering compression on the operator time-space diagram according to the time labels, so that the overall algorithm execution time is as close as possible to the total timing constraint;
An underlying hardware mapping step, for generating an underlying hardware logic description of the integrated circuit from the operator time-space diagram after clustering compression.
Based on the above method, the present invention also provides an underlying hardware mapping device for an integrated circuit, comprising:
A program analysis module, for reading a computer language program that describes the integrated circuit algorithm and identifying from it the execution objects and parameter objects to be mapped;
A data control flow graph generation module, for mapping the identified execution objects and parameter objects to the corresponding nodes of a data control flow graph that describes the integrated circuit algorithm;
An operator time-space diagram generation module, for taking, according to the function performed by each node of the data control flow graph, at least one operator unit with the corresponding function from a pre-established operator cell library, and converting the data control flow graph into an operator time-space diagram composed of operator units;
A timing constraint module, for determining a total timing constraint according to the user's specification requirements and the requirements of the target integrated circuit process, labeling each operator unit in the operator time-space diagram with a time, and applying timing constraints to each level of the operator time-space diagram;
A time-space diagram compression module, for performing spatial clustering compression on the time-space diagram according to the time labels, so that the overall algorithm execution time is as close as possible to the total timing constraint;
An underlying hardware mapping module, for generating an underlying hardware logic description of the integrated circuit from the time-space diagram after clustering compression.
The present invention also provides a timing constraint method for data control flow, comprising the steps of:
Determining a total timing constraint according to the user's specification requirements and the requirements of the target integrated circuit process;
Taking each function as a unit, labeling each operator unit in the operator time-space diagram of that function with a time, and applying timing constraints to each level of the operator time-space diagram according to the data flow structure in the data control flow graph.
Further, in the step of applying timing constraints to each level of the operator time-space diagram, the total timing constraint is divided among the corresponding operator levels in the time-space diagram according to the structure of the data flow in the data control flow graph.
Further, if the data flow structure is a parallel data flow, the total timing constraint is divided equally among the operator levels in the corresponding time-space diagram, and the timing constraint of each operator level is divided equally among the operator units in that level.
Further, if the data flow in the data control flow graph is a serial data flow, the timing of each operator level is allocated according to the ratio of the timing of the operator units mapped by the longest operation path in that operator level to the sum, over all operator levels, of the timings of the operator units mapped by their longest operation paths.
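The two allocation rules above can be restated compactly. The symbols are not part of the original text and are assumed here only for illustration: $T$ is the total timing constraint, $N$ the number of operator levels, $n_i$ the number of operator units in level $i$, and $t_i$ the timing of the operator units mapped by the longest operation path in level $i$.

```latex
% Parallel data flow: equal split across levels, then across units in a level
T_i = \frac{T}{N}, \qquad T_{i,k} = \frac{T_i}{n_i} \quad (k = 1, \dots, n_i)

% Serial data flow: split across levels in proportion to longest-path timing
T_i = T \cdot \frac{t_i}{\sum_{j=1}^{N} t_j}
```

In both cases the level budgets $T_i$ sum to $T$, so the allocation neither exceeds nor wastes the total timing constraint.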
Based on the above method, the present invention also provides a timing constraint device for data control flow, comprising:
A total timing determination unit, for determining a total timing constraint according to the user's specification requirements and the requirements of the target integrated circuit process;
A timing labeling unit, for labeling each operator unit in the operator time-space diagram with a time and applying timing constraints to each level of the operator time-space diagram.
Further, the device also comprises a judging unit for judging the data flow structure of the data control flow graph, and the timing labeling unit performs the corresponding timing labeling operation according to that data flow structure.
Further, when the judging unit judges that the data flow structure is a parallel data flow, the timing labeling unit divides the total timing constraint equally among the operator levels in the corresponding time-space diagram, and divides the timing constraint of each operator level equally among the operator units in that level.
Further, when the judging unit judges that the data flow structure is a serial data flow, the timing labeling unit allocates the timing constraint of each operator level according to the ratio of the timing of the operator units mapped by the longest operation path in that operator level to the sum, over all operator levels, of the timings of the operator units mapped by their longest operation paths.
The beneficial effects of the present invention are as follows. The original C or MATLAB program is analyzed to identify the execution objects and parameter objects to be mapped. The identified execution objects and parameter objects are converted into a data control flow graph, which represents the algorithm of the integrated circuit. Each node in the data control flow graph is then replaced with operators to generate an operator time-space diagram. The operator time-space diagram undergoes clustering compression so that the overall execution time of the compressed diagram is as close as possible to the total timing constraint, and the underlying hardware circuit of the integrated circuit is generated from the compressed diagram. This creates a tool for mapping from a computer language to the underlying hardware circuit of an integrated circuit, and provides a standardized, convenient, and fast process for generating underlying hardware from languages such as C or MATLAB.
By applying timing constraints to the data control flow, the present invention also gives the circuit obtained by the constraint method a regular structure. Moreover, because operators are the basic circuit building blocks and the basic timing of each operator is clearly defined, the method is suitable for timing design and verification of digital circuits and helps hardware engineers carry out hardware design.
Description of drawings
Fig. 1 is a flowchart of an integrated circuit design method in the prior art;
Fig. 2 is a structural diagram of the ADDS operator;
Fig. 3 is a function table of the ADDS operator;
Fig. 4 is a general structural diagram of a storage operator;
Fig. 5 is a general structural diagram of a path operator;
Fig. 6 is a general structural diagram of a control operator;
Fig. 7a and Fig. 7b are, respectively, a flowchart of the underlying hardware mapping method of the present invention and a structural block diagram of an embodiment of the underlying hardware mapping device for an integrated circuit;
Fig. 8 is a flowchart of analyzing the function x264_me_search in an embodiment of the data control flow graph generation method of the present invention;
Fig. 9 is a flowchart of generating the data control flow graphs of the functions x264_me_search and pixel_sad_16x16 of Fig. 8;
Fig. 10 is a mapping structure diagram of a single-level dynamic loop statement in an embodiment of the present invention;
Fig. 11 is one mapping structure diagram of a multi-level dynamic loop statement in an embodiment of the present invention;
Fig. 12 is another mapping structure diagram of a multi-level dynamic loop statement in an embodiment of the method of the present invention;
Fig. 13 is a structure diagram of the mapping of a branch control statement in an embodiment of the present invention;
Fig. 14 is a structure diagram of the mapping of a nested branch control statement in an embodiment of the present invention;
Fig. 15 is the data control flow graph of the function x264_me_search of this embodiment;
Fig. 16 is the data control flow graph of the function pixel_sad_16x16 of this embodiment;
Fig. 17 is a structure diagram of an embodiment of unfolding a sequentially dependent data flow according to the present invention;
Fig. 18 is a schematic diagram of a data flow with feedback being unfolded by operators into a local pipeline structure according to the present invention;
Fig. 19 is a structure diagram of an embodiment of unfolding a parallel data flow according to the present invention;
Fig. 20 is the operator time-space diagram generated from logic L0 and L1 of the function x264_me_search of this embodiment;
Fig. 21a and Fig. 21b are the operator time-space diagrams generated from the first part and the second part, respectively, of logic L3 of the function x264_me_search;
Fig. 22 is the operator time-space diagram generated from logic L5 of the function x264_me_search of this embodiment;
Fig. 23a and Fig. 23b are the operator time-space diagrams of the function x264_me_search and the function pixel_sad_16x16 of this embodiment, respectively;
Fig. 24a and Fig. 24b are timing constraint diagrams of the function x264_me_search and the function pixel_sad_16x16 of this embodiment, respectively;
Fig. 25 is a structural block diagram of an embodiment of the data control flow graph timing constraint device of the present invention;
Fig. 26 is the time-space diagram of the function pixel_sad_16x16 of this embodiment after clustering compression;
Fig. 27 is a schematic diagram of an embodiment of solidified customization;
Fig. 28 is a comparison diagram of x264_me_search after clustering compression.
Detailed description of the embodiments
The present invention is described in further detail below through embodiments and with reference to the accompanying drawings.
Looking back at the history of integrated circuit design methodology: when manufacturing processes entered the 1 μm era, design methods using gate arrays as the basic unit appeared; when they entered the 0.5 μm era, design methods using standard cells as the basic unit appeared; and when they entered the 0.18 μm era, design methods using IP cores as the basic unit appeared. This shows that, on the one hand, design methodology evolves along with the manufacturing process and, on the other hand, the granularity of the basic unit used in design (gate, standard cell, IP core) keeps increasing. At the same time, the appearance of each new basic unit has marked a revolutionary advance in design methodology. It is therefore reasonable to predict that, with the rapid progress of manufacturing processes over the past decade, and especially now that processes have entered the nanometer scale, basic units of larger granularity will appear and open up a new phase of integrated circuit design that keeps pace with the rapid development of manufacturing processes.
An operator serves as a basic unit among the building blocks of an integrated circuit, and its granularity is larger than that of a standard cell. The present invention therefore adopts an operator-based integrated circuit design method, which speeds up integrated circuit design so that it can keep pace with progress in manufacturing processes.
In the present invention there are five commonly used classes of operators: computing operators, storage operators, path operators, control operators, and clock operators.
1. Computing operators.
A computing operator (AU) is a basic unit for implementing logical operations, arithmetic operations, or mixed logical and arithmetic operations. It comprises an arithmetic logic unit and an operation configuration register. The operation configuration register receives and stores operation configuration instructions. Different operation configuration instructions correspond to different arithmetic-logic operations, so the same computing operator can implement several different functions through its operation configuration instructions. The ADDS operator is used below as an example of a computing operator.
Fig. 2 is a structural diagram of the ADDS operator. It comprises an ADD unit for addition and subtraction and a "/" unit for shift operations. By setting the value of the control bits X, the ADDS operator can be made to implement several different functions; for example, the table in Fig. 3 shows, for one embodiment, the correspondence between control-bit X values and operations. An operator such as ADDS that can implement several different functions through the control bits X is called a reconfigurable operator. Because a reconfigurable operator has rich functionality and can be used in different scenarios, it reduces the number of operators that must be stored in the operator cell library. A reconfigurable operator can also be dynamically reconfigured by changing its control bits during execution.
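The idea of a reconfigurable operator can be sketched in software as follows. This is only a toy model: the two-bit encoding of X used here (bit 0 selects subtraction, bit 1 enables a right shift) is a hypothetical assumption, since the actual correspondence is defined by the table in Fig. 3, and the class and method names are likewise illustrative.

```python
class AddsOperator:
    """Toy model of a reconfigurable add/subtract-then-shift operator."""

    def __init__(self):
        # Stands in for the operation configuration register.
        self.control_bits = 0

    def configure(self, x):
        # Hypothetical encoding: bit 0 = subtract, bit 1 = shift right.
        if not 0 <= x <= 3:
            raise ValueError("control bits X must be in the range 0..3")
        self.control_bits = x

    def compute(self, a, b, shift=1):
        subtract = self.control_bits & 0b01
        do_shift = self.control_bits & 0b10
        result = a - b if subtract else a + b      # ADD unit
        return result >> shift if do_shift else result  # shift unit


op = AddsOperator()
results = []
for x in range(4):
    op.configure(x)          # dynamic reconfiguration between uses
    results.append(op.compute(10, 2))
print(results)
```

The same object serves four functions depending on X, which is why a library built from such operators needs fewer distinct entries.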
2. Storage operators.
Fig. 4 shows the basic structure of a storage operator (MU); CU in the figure denotes a control operator. A storage operator comprises a storage configuration register (MU configuration register) and a storage unit, and the storage unit comprises an address generation unit, a data memory, a data generation unit, and a data output control unit. Through the data output control unit, the storage configuration register can configure the write and/or read mode of the storage operator's memory bank (MEM, which may be any storage medium such as registers or RAM); it can also configure the working mode of the address generation unit associated with the memory. The data memory stores input data directly at the location given by the address that the address generation unit produces, and outputs the required data from its storage location.
3. Path operators.
Fig. 5 shows the general structure of a path operator (LU). A path operator LU comprises a routing configuration register (LU configuration register) together with a crossbar switch and data registers (REG) that form the routing selection unit. The routing configuration register is controlled by the control operator CU and, under that control, drives the crossbar switch to connect different computing operators AU in the desired manner. The data registers temporarily hold the input and output data of the computing operators AU and the storage operators MU.
4. Control operators.
Fig. 6 shows the general structure of a control operator (CU). The main role of a control operator is to send configuration information to the corresponding configuration registers, configuring the computing operators AU, storage operators MU, and path operators LU to perform the intended functions. A control operator CU can take one of three forms: a counter, a state machine, or a microinstruction structure. The microinstruction structure comprises a decoder, a program counter, an instruction memory, a pipeline control module, and so on. The control operator CU sends configuration information to each functional unit by executing simple configuration instructions. Because the CU supports very few instructions, the instruction register capacity is small and the decoder is very simple.
5. Clock operators.
A clock operator provides clock control signals for the computing, storage, path, and control operators. The clock signals include signals that start and stop the clock and signals that control the clock frequency, and they can be configured in the desired manner.
The above five classes of operators are the basis for the embodiments below. It should be understood that dividing the operators used for integrated circuit design into five classes by function is not the only possible division; broader or finer divisions may be made according to actual circumstances.
In an embodiment of the present invention, a system is provided for mapping from a computer language to the underlying hardware circuit of an integrated circuit. As shown in Fig. 7a, the underlying hardware mapping method of this system comprises the following steps:
Step S1, program analysis: a computer language program describing the integrated circuit algorithm is read, and the execution objects and parameter objects to be mapped are identified from the program according to the rules of the computer language. An application-specific integrated circuit is used to implement a particular protocol or function, and these functions and protocols are usually first described in a computer language program, typically written in C, MATLAB, or a similar language. The program is input to the mapping system of the present invention, which then identifies the execution objects and parameter objects to be mapped according to the rules of the language in which the program is written.
In this embodiment, the execution objects comprise operation instructions and/or control instructions, and the parameter objects comprise at least one of input data, output data, and intermediate data. The operation instructions in this embodiment include operations such as addition, subtraction, multiplication, and shifting.
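As a rough illustration of step S1, the sketch below scans a small C-like fragment and separates execution objects (operation and control instructions) from parameter objects. It is a deliberately simplified token classifier and not the analysis used by the invention; the keyword sets, the `analyze` function, and the sample fragment are all assumptions made for this example, and a real tool would need a full parser for the source language.

```python
import re

# Hypothetical, minimal vocabularies for a C-like input language.
CONTROL_KEYWORDS = {"if", "else", "for", "while"}
OPERATION_SYMBOLS = {"+", "-", "*", "<<", ">>"}
TYPE_KEYWORDS = {"int", "char", "short", "long", "unsigned"}

# Identifiers, shift operators, and single-character arithmetic operators.
TOKEN = re.compile(r"[A-Za-z_]\w*|<<|>>|[+\-*]")


def analyze(source):
    """Classify tokens into operation, control, and parameter objects."""
    operations, controls, parameters = [], [], []
    for token in TOKEN.findall(source):
        if token in CONTROL_KEYWORDS:
            controls.append(token)
        elif token in OPERATION_SYMBOLS:
            operations.append(token)
        elif token not in TYPE_KEYWORDS and token not in parameters:
            parameters.append(token)  # each data name is recorded once
    return {"operations": operations,
            "controls": controls,
            "parameters": parameters}


fragment = "while (i < n) { sum = sum + ((a - b) << 1); i = i + 1; }"
objects = analyze(fragment)
print(objects)
```

The sketch does not distinguish input, output, and intermediate data; doing so would require tracking where each name is read and written.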
Step S2, data control flow graph generation: the identified execution objects and parameter objects are mapped to the corresponding nodes of a data control flow graph that describes the integrated circuit algorithm. Operation instructions are mapped to processing blocks; control instructions are mapped to control flow that identifies states, state transition conditions, and state control signals; and input data, output data, and intermediate data are mapped to storage nodes on the data flow.
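The mapping rules of step S2 can be sketched with a small graph structure. The class names, the category strings, and the `build_graph` helper below are illustrative assumptions rather than anything defined in the original text; the sketch records only which kind of node each object becomes and how nodes are connected.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str  # "process", "control", or "storage"


@dataclass
class DataControlFlowGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add(self, name, kind):
        self.nodes[name] = Node(name, kind)

    def connect(self, src, dst):
        self.edges.append((src, dst))


# Mapping rule from step S2: operations become processing blocks, control
# instructions become control flow, and all data becomes storage nodes.
KIND_BY_CATEGORY = {
    "operation": "process",
    "control": "control",
    "input": "storage",
    "output": "storage",
    "intermediate": "storage",
}


def build_graph(objects, edges):
    """objects: (name, category) pairs; edges: (source, destination) pairs."""
    graph = DataControlFlowGraph()
    for name, category in objects:
        graph.add(name, KIND_BY_CATEGORY[category])
    for src, dst in edges:
        graph.connect(src, dst)
    return graph


graph = build_graph(
    [("a", "input"), ("b", "input"), ("add", "operation"),
     ("sum", "output"), ("loop", "control")],
    [("a", "add"), ("b", "add"), ("add", "sum")],
)
print({name: node.kind for name, node in graph.nodes.items()})
```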
Step S3, operator time-space diagram generation: according to the processing function performed by each node of the data control flow graph, at least one operator unit with the corresponding function is taken from a pre-established operator cell library, and the data control flow graph is converted into an operator time-space diagram composed of operator units. The data control flow graph is first unfolded according to its data flow dependencies, and each node after unfolding is then converted into operator units that can perform the node's function. Each node in the data control flow graph is replaced by a combination of one or more operator units that performs the same function as the node. The data control flow graph can be unfolded in ways that include, but are not limited to, the following. If the data flow in the data control flow graph has a sequentially dependent structure, it is unfolded as a pipeline. If the data flow contains feedback, that is, the data flow is a loop, then the data flow has a data dependence and cannot be converted into a pipeline structure; however, where there is no data dependence among the internal data of the data flow, the data flows without data dependence are unfolded as a local pipeline. If there is no data dependence between the data flows in the data control flow graph, the parallel data flows are unfolded in parallel and converted into an operator time-space diagram composed of operator units.
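A minimal sketch of the two decisions in step S3 follows: choosing how to unfold a data flow, and replacing each node with operator units from a library. The library contents (including the `ABS` unit), the function names, the `"no_pipeline"` result, and the order in which the conditions are checked are all assumptions for illustration; the original text lists the unfolding cases without fixing a precedence among them.

```python
# Hypothetical operator cell library: node function -> operator units that
# together implement it. Only ADDS and MU are named in the text above.
OPERATOR_LIBRARY = {
    "add": ["ADDS"],
    "sub": ["ADDS"],
    "load": ["MU"],
    "abs_diff": ["ADDS", "ABS"],
}


def choose_unfolding(has_feedback,
                     internal_data_independent=False,
                     independent_flows=False):
    """Pick an unfolding style from the dependence structure of a data flow."""
    if independent_flows:
        return "parallel"          # no dependence between the data flows
    if has_feedback:
        # A loop cannot be fully pipelined; independent internal data
        # can still be unfolded as a local pipeline.
        return "local_pipeline" if internal_data_independent else "no_pipeline"
    return "pipeline"              # sequentially dependent data flow


def to_operator_units(node_functions):
    """Replace each node by the operator units that perform its function."""
    return [OPERATOR_LIBRARY[function] for function in node_functions]


print(choose_unfolding(has_feedback=False))
print(choose_unfolding(has_feedback=True, internal_data_independent=True))
print(to_operator_units(["load", "abs_diff", "add"]))
```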
Step S4, timing constraint: a total timing constraint is determined according to the user's specification requirements and the requirements of the target integrated circuit process, and each operator unit in the operator time-space diagram is labeled with a time. In addition, operator timing information can be extracted from the operator cell library and used to annotate the operator time-space diagram with timing, forming the object of the timing constraint. The timing constraint can thus be refined down to each level of the operator time-space diagram according to the characteristics of the data flow, so that each level is timing-constrained. Operators can form different operator function blocks, which in turn form different operator function groups; each operator function group is one operator level.
If the data flow structure is a parallel data flow, the total timing constraint is divided equally among the operator levels in the corresponding time-space diagram, and the timing constraint of each operator level is divided equally among the operator units in that level. For serial nodes in the data control flow, the total of the basic timing units of the operators in the corresponding operator levels is taken as the overall timing constraint, and the timing of each operator level is allocated according to the ratio of the timing of the computing operators mapped by the longest operation path in that level to the sum, over all operator levels, of the timings of the operator units mapped by their longest operation paths.
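The two allocation rules can be worked through numerically with the short sketch below. The function names and the sample figures are assumptions chosen for illustration; times are in arbitrary units.

```python
def allocate_parallel(total, units_per_level):
    """Equal split: total -> operator levels -> operator units in each level."""
    per_level = total / len(units_per_level)
    return [[per_level / count] * count for count in units_per_level]


def allocate_serial(total, longest_path_timing):
    """Split across levels in proportion to each level's longest-path timing."""
    timing_sum = sum(longest_path_timing)
    return [total * timing / timing_sum for timing in longest_path_timing]


# Parallel case: 120 time units over two levels holding 2 and 3 operator units.
print(allocate_parallel(120, [2, 3]))

# Serial case: 100 time units over three levels whose longest operation
# paths take 1, 3, and 1 basic timing units.
print(allocate_serial(100, [1, 3, 1]))
```

In the parallel case each level receives 60 units, shared as 30 per operator in the first level and 20 per operator in the second. In the serial case the middle level, whose longest path is three times as long, receives three times the budget of the other two.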
Step S5, time-space diagram compression: clustering compression is performed on the time-space diagram in space (that is, in hardware resources or area) according to the time labels, so that the overall algorithm execution time is as close as possible to the total timing constraint.
In one embodiment, compressing the time-space diagram comprises the following steps: finding, in the operator time-space diagram, computing operators with the same operation attribute and/or storage operators with the same storage attribute; then, according to the time labels, spatially merging and compressing the computing operators with the same operation attribute and/or the storage operators with the same storage attribute; and then introducing control operators, which generate the corresponding configuration instructions for the compressed computing operators and/or storage operators, so that the computing operators and/or storage operators are reused.
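One simple way to picture the merging step is sketched below: operators of the same type that carry different time labels are folded onto one shared hardware unit, while operators that need the same time slot stay on separate units. This is a simplified stand-in and not the clustering procedure of the invention: it keeps every time label unchanged and therefore never trades execution time for area. The `compress` function and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict


def compress(operators):
    """Fold same-type operators with distinct time labels onto shared units.

    operators: list of (name, operator_type, time_label) tuples.
    Returns {(operator_type, unit_index): [(time_label, name), ...]}, where
    each list is the schedule a control operator would have to configure.
    """
    schedules = defaultdict(list)
    busy = defaultdict(int)  # (type, time_label) -> units already occupied
    for name, op_type, time_label in sorted(operators, key=lambda o: o[2]):
        unit_index = busy[(op_type, time_label)]
        busy[(op_type, time_label)] += 1
        schedules[(op_type, unit_index)].append((time_label, name))
    return dict(schedules)


diagram = [
    ("a0", "ADDS", 0), ("a1", "ADDS", 0), ("a2", "ADDS", 1),
    ("m0", "MU", 0), ("a3", "ADDS", 2),
]
shared = compress(diagram)
print(shared)
print(len(diagram), "operators mapped onto", len(shared), "hardware units")
```

Here five operators in the diagram are served by three hardware units, and the per-unit schedules correspond to the configuration instructions that the control operators would issue.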
Both the clustering compression step and the step of generating reconfigurable operator function blocks can produce more than one result. When different functions call the same sub-function, their time constraints differ, so the clustering results differ as well. Optimization is therefore needed according to parameters such as time, area, and power consumption. The results are ordered by performance (execution time); a clustering result that just satisfies the time constraint has the lowest hardware implementation cost. The time-space diagram whose overall algorithm execution time is closest to the total timing constraint required to complete the integrated circuit algorithm is therefore selected as the optimized result of clustering compression.
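The selection rule can be sketched as picking, among candidate results that meet the constraint, the one with the longest execution time. The candidate names, times, and areas below are invented figures for illustration only, and `select_result` is an assumed helper name.

```python
def select_result(candidates, total_constraint):
    """Pick the feasible candidate whose execution time is closest to the limit.

    candidates: list of (label, execution_time, area) tuples.
    Returns None when no candidate satisfies the timing constraint.
    """
    feasible = [c for c in candidates if c[1] <= total_constraint]
    if not feasible:
        return None
    # Among feasible results, the slowest one just satisfies the constraint
    # and is therefore expected to need the least hardware.
    return max(feasible, key=lambda c: c[1])


candidates = [
    ("fully_parallel", 40, 900),
    ("half_shared", 75, 500),
    ("heavily_shared", 98, 300),
    ("too_slow", 130, 200),
]
best = select_result(candidates, total_constraint=100)
print(best)
```

The candidate taking 130 time units is rejected even though it has the smallest area, because it violates the total timing constraint.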
Step S6, the underlying hardware mapping step: generate the underlying hardware logic description of the integrated circuit from the cluster-compressed spacetime diagram.
Based on the above underlying hardware mapping method for integrated circuits, the invention also discloses an underlying hardware mapping device for integrated circuits. Referring to Fig. 7b, the device of this embodiment comprises:
Program analysis module 1, which reads the computer language program describing the integrated circuit algorithm and, following the rules of that computer language, identifies in the program the execution objects and parameter objects to be mapped; data control flow graph generation module 2, which maps the identified execution objects and parameter objects to a data control flow graph describing the integrated circuit algorithm; operator spacetime diagram generation module 3, which, according to the processing function performed by each node of the data control flow graph, takes at least one operator unit with the corresponding function from a pre-established operator unit library, thereby converting the data control flow graph into an operator spacetime diagram composed of operator units; timing constraint module 4, which determines the total timing constraint according to the user's specification requirements and the requirements of the target integrated circuit process, labels each operator unit in the operator spacetime diagram with a time, and applies timing constraints to each level of the operator spacetime diagram; spacetime diagram compression module 5, which cluster-compresses the spacetime diagram in space according to the time labels so that the overall algorithm execution time comes as close as possible to the total timing constraint; and underlying hardware mapping module 6, which generates the underlying hardware circuit of the integrated circuit from the cluster-compressed spacetime diagram.
The underlying hardware mapping method and device of the invention are described below with reference to a specific embodiment.
H.264 is a digital video coding standard jointly developed by the Joint Video Team (JVT), which was formed by the International Telecommunication Union (ITU-T) and the International Organization for Standardization (ISO). This embodiment takes the function x264_me_search, from a C-language description of the H.264 standard, as an example to explain the method of the invention in more detail.
As shown in Fig. 8, analyzing the function x264_me_search comprises the following steps:
S11: read the computer language program and search for functions in it. In this embodiment, the program is first read and subjected to syntactic and lexical analysis, which yields the function x264_me_search.
S12: parse the function to obtain its function call relations and parameter objects. The parameter objects comprise the input data, output data, and input constants of the function, as well as the intermediate data of this upper-level function. Each parameter object of the function is labeled with information such as its data dependence, shared storage, and distributed storage. In this embodiment, analyzing the function x264_me_search yields its input variables, output variables, input constants, and output constants, as shown in Table 1:
Table 1:
Signal name    Data type    Direction    Description
i_pixel        int          IN           // PIXEL_WxH
lm             int          IN           // lambda motion
p_fref         uint8_t*     IN           // reference frame
p_fenc         uint8_t*     IN           // coded frame
i_stride       int          IN           // width of the image
i_mv_range     int          IN           // maximum range of the motion vector
mvp[2]         int          IN           // motion vectors
cost           int          OUT          // satd+lm*nbits
mv[2]          int          OUT          // motion vector
In this embodiment, analyzing the interior of the upper-level function x264_me_search yields its internal variables and constants, as shown in Table 2:
Table 2:
Signal name    Data type    Description
i_pixel        int          // pixel WxH
bcost          int          // temporary best estimate point
bmx            int          // temporary motion vector, x component
bmy            int          // temporary motion vector, y component
p_fref         uint8_t*     // reference frame address
i_iter         int          // loop variable
S13: according to the function call relations of the function currently being parsed, determine whether it contains a lower-level function; if so, find the lower-level function and execute step S14; otherwise end the operation. In this embodiment, parsing the function shows that it calls the function pixel_sad_16x16, so x264_me_search is the upper-level function and pixel_sad_16x16 is the lower-level function. If the function currently being parsed contains no lower-level function, it is itself a bottom-level function.
S14: analyze the lower-level function to obtain its parameter objects, including input data and output data, and label each parameter object of the lower-level function with information such as its data dependence, shared storage, and distributed storage. In this embodiment, analyzing the lower-level function pixel_sad_16x16 yields its inputs and outputs, as shown in Table 3:
Table 3:
Signal name      Data type    Direction    Description
pix1             uint8_t*     IN           // input pixel value 1
i_stride_pix1    int          IN           // storage line width of pixel value 1
pix2             uint8_t*     IN           // input pixel value 2
i_stride_pix2    int          IN           // storage line width of pixel value 2
i_sum            int          OUT          // SAD calculation result
In this embodiment, data dependence refers to the associations between the variables and/or constants obtained by the analysis, and comprises computation dependence and storage dependence. Computation dependence is input-output dependence: the output variable or output signal of a computation operator depends on its input variable or input signal. Storage dependence comprises write-read dependence, read-write dependence, and write-write dependence. Write-read dependence means that the correct order of operations on a variable at a given memory address is write first, then read; that is, the read depends on the write, and reading before writing during operation produces an error. Read-write dependence means that the correct order of operations on a variable at a given memory address is read first, then write; that is, the write depends on the read, and writing before reading overwrites the data to be read and produces an error. Write-write dependence means that writes to a variable at the same memory address must occur in the correct order; that is, the writes depend on one another.
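The three kinds of storage dependence can be sketched as a small classifier in C. The enum and function names below are hypothetical and chosen for illustration. The three cases correspond to what processor design usually calls read-after-write, write-after-read, and write-after-write hazards.

```c
/* Kind of access to a memory address. */
typedef enum { ACCESS_READ, ACCESS_WRITE } access_t;

/* Storage dependence between two accesses given in program order. */
typedef enum {
    DEP_NONE,         /* different addresses, or two reads */
    DEP_WRITE_READ,   /* write first, then read: the read depends on the write */
    DEP_READ_WRITE,   /* read first, then write: the write depends on the read */
    DEP_WRITE_WRITE   /* two writes that must keep their order */
} dep_t;

/* Classify the dependence of the second access on the first one. */
dep_t classify_dependence(unsigned addr1, access_t first,
                          unsigned addr2, access_t second)
{
    if (addr1 != addr2)
        return DEP_NONE;
    if (first == ACCESS_WRITE && second == ACCESS_READ)
        return DEP_WRITE_READ;
    if (first == ACCESS_READ && second == ACCESS_WRITE)
        return DEP_READ_WRITE;
    if (first == ACCESS_WRITE && second == ACCESS_WRITE)
        return DEP_WRITE_WRITE;
    return DEP_NONE;
}
```

A tool that labels parameter objects could run such a check over every pair of accesses to the same storage node and record the result as the dependence label.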
In this embodiment, shared storage means that the storages of the same module are connected and can access one another, so that data can be exchanged between them through the shared storage. Distributed storage means that each module has its own independently allocated storage and cannot access the storage of other modules, so that data communication between modules can take place only through communication ports. In hardware design, shared storage requires additional interconnect resources compared with distributed storage, so shared storage and distributed storage must be distinguished according to content when the data control flow graph of the algorithm is generated.
As shown in Fig. 9, generating the data control flow graph from the execution objects and parameter objects obtained by the analysis comprises the following steps:
S21: map the identified parameter objects to storage nodes on the data flow, and distinguish shared storage from distributed storage according to the labeled storage information.
S22: according to the labeled data dependence, map the operation instructions in the function to processing blocks in the data control flow graph. The operation instructions in this embodiment include operations such as addition, subtraction, and shifting.
S23: according to the labeled data dependence, map the control instructions in the function to the control flow of the data control flow graph, which identifies states, state transition conditions, and state control signals.
In this embodiment, the control instructions include call relations between functions, loop instructions, and so on. Loop statements comprise static loops and dynamic loops, and dynamic loops in turn comprise dynamic loops that can be converted to static loops, single-layer dynamic loops, and multi-layer dynamic loops. A static loop is one whose loop count is a constant. A dynamic loop that can be converted to a static loop is one whose loop count is a variable that becomes fixed once the application context is determined, so that the loop count becomes a constant and the dynamic loop becomes a static loop. A single-layer dynamic loop is one whose loop count is a variable and that has no other loop nested in it. A multi-layer dynamic loop is one whose loop count is a variable and that has inner loops nested in it.
In this embodiment, when the loop statement is a static loop, mapping it to control flow comprises the following steps:
S2411: unroll the loop body according to the loop count, obtaining as many new loop bodies as there are iterations. Each loop body contains operation expressions, and the operation expressions share common parameter objects.
In this embodiment, the function static void predict_16x16_dc contains three loop statements:
static void predict_16x16_dc(uint8_t *src, int i_stride)
{
    int dc = 0; int i, j;
    for (i = 0; i < 16; i++)
    { dc += src[-1 + i * i_stride]; dc += src[i - i_stride]; }
    dc = (dc + 16) >> 5;
    for (i = 0; i < 16; i++)
    {
        for (j = 0; j < 16; j++)
        { src[j] = dc; }
        src += i_stride;
    }
}
Unrolling the first loop yields the following new operation expressions:
dc = dc + src[-1]; dc = dc + src[-i_stride]; dc = dc + src[-1+i_stride]; dc = dc + src[1-i_stride]; ... dc = dc + src[-1+15*i_stride]; dc = dc + src[15-i_stride]; dc = (dc+16) >> 5;
Unrolling the second and third loops yields the following operation expressions:
src[0]=dc;
src[15]=dc;
src[0+i_stride]=dc;
src[15+i_stride]=dc;
src[0+15*i_stride]=dc;
src[15+15*i_stride]=dc;
S2412: iterate over the new expressions obtained by unrolling, according to the parameter objects, to obtain a single new operation expression. In this embodiment, the operation expressions obtained by unrolling the first loop above are iterated according to the variable dc, which gives a single new expression for dc:
dc = (0 + src[-1] + src[-i_stride] + src[-1+i_stride] + src[1-i_stride] + ... + src[-1+15*i_stride] + src[15-i_stride] + 16) >> 5;
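The equivalence between the loop form and the single unrolled expression can be checked with a short C sketch. The function names `dc_loop` and `dc_flat` and the helper macros `LEFT` and `TOP` are hypothetical and introduced only for this illustration; the arithmetic follows the first loop of predict_16x16_dc.

```c
#include <stdint.h>

/* Loop form: accumulate the 16 left neighbours and the 16 top neighbours. */
int dc_loop(const uint8_t *src, int i_stride)
{
    int dc = 0;
    for (int i = 0; i < 16; i++) {
        dc += src[-1 + i * i_stride];  /* pixel to the left of row i */
        dc += src[i - i_stride];       /* pixel above column i */
    }
    return (dc + 16) >> 5;
}

#define LEFT(i) src[-1 + (i) * i_stride]
#define TOP(i)  src[(i) - i_stride]

/* Unrolled form: one expression with no loop and no intermediate updates. */
int dc_flat(const uint8_t *src, int i_stride)
{
    return (0
        + LEFT(0)  + TOP(0)  + LEFT(1)  + TOP(1)  + LEFT(2)  + TOP(2)
        + LEFT(3)  + TOP(3)  + LEFT(4)  + TOP(4)  + LEFT(5)  + TOP(5)
        + LEFT(6)  + TOP(6)  + LEFT(7)  + TOP(7)  + LEFT(8)  + TOP(8)
        + LEFT(9)  + TOP(9)  + LEFT(10) + TOP(10) + LEFT(11) + TOP(11)
        + LEFT(12) + TOP(12) + LEFT(13) + TOP(13) + LEFT(14) + TOP(14)
        + LEFT(15) + TOP(15)
        + 16) >> 5;
}

#undef LEFT
#undef TOP
```

Because the unrolled form has no loop-carried update of dc, its additions can be mapped to an adder tree of processing blocks, while the 32 pixel reads become storage nodes.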
S2413: map the operation instructions in the new operation expression to processing blocks, and map the parameter objects in the operation expression to storage nodes on the data flow. In this embodiment, when the loop statement is a dynamic loop that can be converted to a static loop, its loop count becomes a constant once the application context is determined, so its mapping steps are the same as those for a static loop.
In this embodiment, when the loop statement is a single-layer dynamic loop, generating the control flow comprises:
S2421a: map the loop content to a processing block and the loop instruction to a state machine. By mapping the loop content to a processing block and the loop instruction to a state machine, this embodiment divides the data flow into two states, a sequential state (sequence) A and a loop state (loop) A, as shown in Fig. 10.
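A single-layer dynamic loop mapped to a two-state machine can be sketched in C as follows. Fig. 10 is not reproduced in this text, so the transition conditions, the state names, and the accumulating loop body are assumptions made for illustration.

```c
/* The two states of the sketch: straight-line code and the loop body. */
typedef enum { STATE_SEQUENCE, STATE_LOOP } loop_state_t;

/* Sum n values, where n is only known at run time (a dynamic loop count).
 * The loop instruction becomes the state machine; the loop content
 * (acc += data[i]) is the processing block executed in STATE_LOOP. */
int run_dynamic_loop(const int *data, int n)
{
    loop_state_t state = STATE_SEQUENCE;
    int i = 0, acc = 0, done = 0;

    while (!done) {
        switch (state) {
        case STATE_SEQUENCE:
            /* Sequential part: enter the loop if work remains, else finish. */
            if (i < n) state = STATE_LOOP;
            else       done = 1;
            break;
        case STATE_LOOP:
            acc += data[i];              /* processing block */
            i++;
            if (i >= n) state = STATE_SEQUENCE;  /* loop exit condition */
            break;
        }
    }
    return acc;
}
```

The same pattern extends to nested loops either by nesting one such machine inside the processing block of another or by adding one state per loop layer to a single machine.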
In this embodiment, when the loop is a multi-layer dynamic loop, generating the control flow comprises:
S2421b: map the content of the inner loop statement and the content of the outer loop statement to processing block A and processing block B respectively; map the outer loop to state machine B and the inner loop to state machine A; and place processing block A and state machine A inside processing block B, as shown in Fig. 11. That is, in this embodiment each loop of the multi-layer dynamic loop is mapped to a state machine, and the control flow of the corresponding data control flow is generated by nesting the state machines.
Of course, in this embodiment the multi-layer dynamic loop may also be handled by mapping the inner loop and the outer loop to one unified state machine, as shown in Fig. 12. For an N-layer nested loop, the loop content is mapped to processing blocks and the loops are mapped to one unified state machine with N+1 states. That is, when a unified state machine is used for a multi-layer dynamic loop, the number of states of the state machine equals the number of loop layers plus one.
In this embodiment, the control statements in the computer language program include branch control statements, and branch control statements comprise single-branch control statements and nested-branch control statements.
In this embodiment, when the branch control statement is a single-branch control statement:
S2421b: map the branch control instruction to a multiplexer (MUX).
S2422b: map the content of the control statement to the input-side processing blocks of the multiplexer. S2423b: map the control condition to the control-side processing block of the multiplexer, which finally gives the structure diagram of the branch control statement, as shown in Fig. 13. In this embodiment, if the control statement content and/or the control condition is a variable, it is mapped to a storage node.
In this embodiment, when the branch control statement is a nested-branch control statement, the earlier analysis of the function has already yielded its upper-level control statement and lower-level control statement; then:
S2421c: map the upper-level branch control instruction to multiplexer 1; map the content of the upper-level branch control statement to the input-side processing blocks of multiplexer 1; and map the control condition of the upper-level branch control statement to the control-side processing block of multiplexer 1.
S2422c: map the lower-level branch control instruction to multiplexer 2; map the content of the lower-level branch control statement to the input-side processing blocks of multiplexer 2; and map the control condition of the lower-level branch control statement to the control-side processing block of multiplexer 2.
S2423c: use the output of multiplexer 1 as an input of multiplexer 2, which gives the structure diagram of the nested-branch control statement, as shown in Fig. 14.
Of course, in this embodiment, if the control statement content and/or the control condition is a variable, it is mapped to a storage node on the data flow.
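The mapping of branch statements to multiplexers can be sketched in C with a two-input selector. The names `mux2`, `single_branch`, and `nested_branch` are hypothetical. Figs. 13 and 14 are not reproduced here, so the nested case only mirrors the stated wiring (the output of multiplexer 1 feeds an input of multiplexer 2); which source-level nesting that figure depicts is an assumption.

```c
/* Two-input multiplexer: sel is the control end, in1 and in0 the input ends. */
int mux2(int sel, int in1, int in0)
{
    return sel ? in1 : in0;
}

/* Single branch:  y = cond ? a : b;  becomes one multiplexer. */
int single_branch(int cond, int a, int b)
{
    return mux2(cond, a, b);
}

/* Nested branch as two cascaded multiplexers. */
int nested_branch(int c1, int c2, int a, int b, int c)
{
    int m1 = mux2(c1, a, b);   /* multiplexer 1: upper-level branch */
    return mux2(c2, m1, c);    /* multiplexer 2: takes the output of multiplexer 1 */
}
```

In hardware both data inputs are computed unconditionally and the condition only selects between them, which is why the statement contents map to input-side blocks and the condition maps to the control side.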
Based on the above method, the data control flow graphs corresponding to the functions x264_me_search and pixel_sad_16x16 are obtained, as shown in Fig. 15 and Fig. 16 respectively.
The conversion of control expressions in this embodiment is the core of converting all language expressions, and its conversion efficiency directly affects the amount of data in the generated data control flow graph of the algorithm. The conversion method proposed in this embodiment targets hardware implementation, so it takes parallelism factors and hardware structures such as state machines and multiplexers into account in advance, which helps hardware engineers with hardware design to a greater extent.
Based on the data control flow graph generated by the above method, the method of generating the operator spacetime diagram is described in more detail below with reference to a specific embodiment.
The principle for converting a data control flow graph into an operator spacetime diagram is as follows. Computation and storage can be implemented in three structural forms: (1) a processor-based data flow form, in which input and output share the same memory and the computation part is reconfigurable; (2) a pipelined data flow form; (3) a parallel data flow form. The first form is the general one: its data flow is serial in time, and in the conversion from a high-level language to hardware it must be parallelized according to the timing requirements.
Therefore, the principles for unfolding the data flow in this embodiment are:
I. Sequentially dependent data flow structure: unfold using operator units. The unfolded structure can be pipelined; the pipeline beat is determined by the longest processing time, and memories serve as inter-stage data storage that adjusts the pipeline beat. For example, suppose the execution times of Func A, Func B, and Func C are 3, 4, and 5 unit times respectively; Func A has an input bandwidth of 3 data/cycle and an output bandwidth of 2 data/cycle; Func B has an input bandwidth of 5 data/cycle and an output bandwidth of 4 data/cycle; and Func C has an input bandwidth of 7 data/cycle and an output bandwidth of 6 data/cycle. In the original form, the data A_OUT, B_IN, B_OUT, C_IN, C_OUT, and A_IN share the same storage and the computation part is reconfigured. The data flow relations are A_OUT = B_IN, B_OUT = C_IN, and C_OUT ≠ A_IN; that is, the output data of Func A is the input of Func B, the output data of Func B is the input of Func C, and there is no data feedback, so this data flow can be converted to a pipeline structure. The pipeline beat is determined by the longest processing time, and storage operators act as breaks between stages that adjust the pipeline beat, as shown in Fig. 17.
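The beat rule for a pipelined, sequentially dependent data flow can be sketched in C. The function names are hypothetical, and the total-time formula is the standard pipeline estimate (fill time plus one beat per additional batch), not a formula taken from this document.

```c
#include <stddef.h>

/* Pipeline beat: the longest processing time among the stages. */
long pipeline_beat(const long *stage_time, size_t n_stages)
{
    long beat = 0;
    for (size_t i = 0; i < n_stages; i++)
        if (stage_time[i] > beat)
            beat = stage_time[i];
    return beat;
}

/* Time to push n_batches through the pipeline when every stage is padded
 * to the beat by the inter-stage storage: (stages + batches - 1) * beat. */
long pipeline_total_time(const long *stage_time, size_t n_stages, long n_batches)
{
    if (n_stages == 0 || n_batches <= 0)
        return 0;
    return ((long)n_stages + n_batches - 1) * pipeline_beat(stage_time, n_stages);
}
```

For the Func A, Func B, Func C example with execution times 3, 4, and 5, the beat is 5 unit times. Ten batches then take (3 + 10 - 1) × 5 = 60 unit times, compared with 10 × (3 + 4 + 5) = 120 unit times if the three functions run strictly one after another.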
II. Data flow with feedback: when the data flow forms a loop, a data flow with data dependence cannot be converted to a pipeline structure. However, if the dependence exists between the batches of data stored in the same shared storage while there is no data dependence within each batch, the processing inside each batch can be pipelined, which reduces the storage bandwidth. For example, data A_IN consists of A_IN_0, A_IN_1, and A_IN_2. Although A_IN as a whole depends on the output C_OUT, there is no data dependence among A_IN_0, A_IN_1, and A_IN_2, so they can be processed as batches in an internal pipeline, and Func A is executed again once the complete C_OUT has been obtained. This gives a locally pipelined structure, as shown in Fig. 18.
III. Parallel data flow structure: since there is no input-output data dependence, the hardware can be unfolded independently in parallel. For example, when each of A_IN, B_IN, and C_IN differs from all of A_OUT, B_OUT, and C_OUT, that is, no input of Func A, Func B, or Func C depends on any of their outputs, the data flow is unfolded into parallel form, as shown in Fig. 19.
Based on the above principles, converting the data control flow graph into the corresponding operator spacetime diagram specifically comprises the following steps:
S31: establish the operator unit library in advance.
S32: store the input/output data and the intermediate data according to the labeled shared storage information and distributed storage information.
S33: if a sequentially dependent data flow exists in the data control flow graph, unfold this data flow into a pipeline structure.
S34: take from the operator unit library the operator units whose functions correspond to the nodes after unfolding. When the processing function performed by a node in the data control flow graph is simple, only one or two operators from the operator unit library are needed. When the processing performed by the node is complex, several operators from the operator unit library are needed, and the combination of these operators replaces the corresponding node in the data control flow graph. In this embodiment, the operation logics L0 and L1 of the function x264_me_search are, respectively: L0 logic: bmx = x264_clip3((m->mvp[0] + 2) >> 2, -m->i_mv_range, m->i_mv_range); L1 logic: bmy = x264_clip3((m->mvp[1] + 2) >> 2, -m->i_mv_range, m->i_mv_range).
In this embodiment, logic L0 and logic L1 are sequentially dependent data flow structures, so they are unfolded into the operator spacetime diagram shown in Fig. 20, in which X0 and X1 are configuration signals produced by control operators and the control flow is drawn with dashed lines. The rectangular processing blocks in the figure represent the mapped computation operators or storage operators, and their interconnections are implemented by link operators.
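Logic L0 can be read as a chain of three simple operators (add, shift, clip). The following C sketch shows that decomposition. The `op_*` names are hypothetical operator units, and `op_clip3` assumes the usual clamp semantics (limit a value to the range between a lower and an upper bound) for x264_clip3; the operator library of the patent is not reproduced here.

```c
/* Hypothetical operator units, one per elementary operation. */
int op_add(int a, int b)   { return a + b; }
int op_shr(int a, int n)   { return a >> n; }
int op_clip3(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);  /* assumed clamp semantics */
}

/* L0: bmx = clip3((mvp[0] + 2) >> 2, -i_mv_range, i_mv_range),
 * written as a sequence of operator units with explicit intermediate values,
 * each of which would become a storage node between pipeline stages. */
int logic_l0(int mvp0, int i_mv_range)
{
    int t0 = op_add(mvp0, 2);                       /* stage 1: adder   */
    int t1 = op_shr(t0, 2);                         /* stage 2: shifter */
    return op_clip3(t1, -i_mv_range, i_mv_range);   /* stage 3: clipper */
}
```

Logic L1 has the same shape with mvp[1] as input, so the two chains could share operator units once the spacetime diagram is compressed.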
In this embodiment, the data flows may have not only sequential data dependence but also a feedback structure, in which case step S35 is executed: according to the storage information, determine whether data dependence exists within each batch of data stored in the same shared storage; if there is none, the processing inside the batch can be pipelined, which reduces the storage bandwidth.
In this embodiment, the operation logic L3 of the function x264_me_search comprises two parts. The first part:
bcost = h->pixf.sad[i_pixel](m->p_fenc, m->i_stride, p_fref, (m->i_stride) * 5);
for (i_iter = 0; i_iter < 16; i_iter++)
{
    int best = 0; int cost[4];
    (cost[0]) = h->pixf.sad[i_pixel](m->p_fenc, m->i_stride,
            &p_fref[(-1) * m->i_stride * 5 + (0)], m->i_stride * 5)
        + m->lm * (bs_size_se(((bmx + (0)) << 2) - m->mvp[0])
                 + bs_size_se(((bmy + (-1)) << 2) - m->mvp[1]));
    (cost[1]) = h->pixf.sad[i_pixel](m->p_fenc, m->i_stride,
            &p_fref[(1) * m->i_stride * 5 + (0)], m->i_stride * 5)
        + m->lm * (bs_size_se(((bmx + (0)) << 2) - m->mvp[0])
                 + bs_size_se(((bmy + (1)) << 2) - m->mvp[1]));
    (cost[2]) = h->pixf.sad[i_pixel](m->p_fenc, m->i_stride,
            &p_fref[(0) * m->i_stride * 5 + (-1)], m->i_stride * 5)
        + m->lm * (bs_size_se(((bmx + (-1)) << 2) - m->mvp[0])
                 + bs_size_se(((bmy + (0)) << 2) - m->mvp[1]));
    (cost[3]) = h->pixf.sad[i_pixel](m->p_fenc, m->i_stride,
            &p_fref[(0) * m->i_stride * 5 + (1)], m->i_stride * 5)
        + m->lm * (bs_size_se(((bmx + (1)) << 2) - m->mvp[0])
                 + bs_size_se(((bmy + (0)) << 2) - m->mvp[1]));
The second part (still inside the loop body):
    if (cost[1] < cost[0]) best = 1;
    if (cost[2] < cost[best]) best = 2;
    if (cost[3] < cost[best]) best = 3;
    if (bcost <= cost[best])
        break;
    bcost = cost[best];
The first part is a sequentially dependent data flow. In the second part, each batch of data contains a sequentially dependent data flow and there is also feedback, so the internal pipeline is generated first and the feedback is converted afterwards. The two parts are converted to spacetime diagrams composed of operator units, as shown in Fig. 21a and Fig. 21b respectively.
In this embodiment, the batches of data flow stored in the same shared storage may also have no data dependence, that is, the batches are independent of one another, in which case the data flow is unfolded in parallel.
S36: process the data flow as parallel pipelines. In this embodiment, the operation logic L5 of the function x264_me_search is: m->mv[0] = bmx << 2; m->mv[1] = bmy << 2; This logic L5 is a parallel pipeline, and the operator spacetime diagram generated for it is shown in Fig. 22.
By the above principles, the operator spacetime diagrams converted from the data control flow graphs of the functions x264_me_search and pixel_sad_16x16 are shown in Fig. 23a and Fig. 23b respectively. Because the original program of x264_me_search calls pixel_sad_16x16, the concrete operator spacetime diagram of pixel_sad_16x16 is not drawn explicitly in Fig. 23a; it is shown in Fig. 23b.
Based on the data control flow graph generated by the above method, the method of applying timing constraints to this data control flow graph is described in detail below with reference to a specific embodiment.
The method of applying timing constraints to the data flow graph is divided into two stages:
I. Determine the requirement specification of the algorithm and the target integrated circuit process, and adopt the basic operator timing units appropriate to the process. II. Label the data flow graph with timing constraints function by function; the timing constraints can be derived from the top-level function down to the lower-level functions according to their data dependence and variable storage information.
Based on the above principles, in this embodiment the method of applying timing constraints to the data control flow graph comprises the following steps:
S41: determine the total timing constraint according to the user's specification requirements and the requirements of the target integrated circuit process. In this embodiment, the x264 code is a C program describing the H.264 algorithm, and the algorithm requirement specification is assumed to be 720p at 60 fps (a video resolution of 1280 × 720 pixels, with 60 frames of video data processed per second). If the target integrated circuit process is a 130 nm process, consulting the operator process library gives a standard unit timing of 5 ns, which means that circuits generated with the operator process library can run at a clock frequency of 200 MHz. If the target integrated circuit process is a 65 nm process, consulting the operator process library gives a standard unit timing of 2.5 ns, which means that circuits generated with the operator process library can run at a clock frequency of 400 MHz.
In this embodiment, after the total timing constraint has been obtained in step S41, timing constraints are applied to the different data flow structures in the data control flow graph by one of two methods: constraint according to parallel data flow, and constraint according to serial data flow.
When the data flow in the data control flow graph is a parallel data flow, step S42 comprises:
S421a: divide the total timing constraint equally among the operator levels in the corresponding spacetime diagram, and divide the timing constraint of each operator level equally among the operator units in that level. This embodiment takes the H.264 algorithm as the example. At the frame level, the constraint comes from the algorithm requirement specification of 720p at 60 fps, that is, 60 frames processed per second. If a 130 nm process is selected and the operator process library yields circuits with a clock frequency of 200 MHz, the timing constraint at this level is 200 MHz / 60 frames = 3.33 M cycles/frame. At the macroblock level, the macroblocks are processed serially, so the timing constraint is distributed serially: each frame consists of 80 × 45 = 3600 macroblocks, the required throughput is 1280 × 720 × 60 / 256 = 216000 macroblocks (MB) per second, and the number of operator timing units available per macroblock is 200 MHz / 216000 MB/s = 926 cycles/MB. Equivalently, the frame-level timing is divided equally among the macroblocks: 3.33 M cycles/frame / 3600 = 926 cycles/MB. Macroblock processing in a video encoder is divided into inter prediction, intra prediction, transform and quantization, entropy coding, deblocking filtering, and so on. These macroblock processing operations run in parallel, so the timing constraint of each of them is also 926 cycles/MB. In this way the timing constraints of the bottom-level processing modules are generated level by level downward.
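The arithmetic of this example can be reproduced with a few lines of C. The function names are hypothetical; the basic timing unit is passed in picoseconds so that 2.5 ns can be expressed as an integer, and divisions are rounded to the nearest integer to match the rounded figures quoted in the text.

```c
#include <stdint.h>

/* Integer division rounded to the nearest integer. */
int64_t div_round(int64_t a, int64_t b)
{
    return (a + b / 2) / b;
}

/* Clock frequency in Hz for a basic timing unit given in picoseconds. */
int64_t clock_hz(int64_t unit_ps)
{
    return INT64_C(1000000000000) / unit_ps;
}

/* Frame-level budget: clock cycles available per frame. */
int64_t cycles_per_frame(int64_t hz, int fps)
{
    return div_round(hz, fps);
}

/* Number of 16x16 macroblocks in one frame. */
int64_t macroblocks_per_frame(int width, int height)
{
    return (int64_t)(width / 16) * (height / 16);
}

/* Macroblock-level budget: clock cycles available per macroblock. */
int64_t cycles_per_macroblock(int64_t hz, int width, int height, int fps)
{
    return div_round(hz, macroblocks_per_frame(width, height) * fps);
}
```

For the 130 nm case, `clock_hz(5000)` gives 200 MHz, the frame budget is about 3.33 million cycles, and the macroblock budget for 1280 × 720 at 60 fps is 926 cycles. For the 65 nm case, `clock_hz(2500)` gives 400 MHz and the macroblock budget doubles to 1852 cycles.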
In this embodiment, when timing constraints are applied to a serial data flow, step S42 comprises:
S421b: take the total number of basic operator timing units of the operator levels corresponding to the serially connected nodes of the data control flow as the overall timing constraint, and allocate timing to each operator level in proportion to the ratio of the timing of the computation operators mapped onto its longest operation path to the sum of the timings of the operator units mapped onto the longest operation paths of all the operator levels. For example, suppose two operator levels A and B are serial in the data control flow graph and their overall timing constraint is N basic operator timing units. The N basic operator timing units are then allocated in proportion to the timings of the computation operators mapped onto the longest operation paths of A and B. Suppose the longest operation path in operator level A maps to Ma operators, corresponding to Na basic operator timing units, and the longest operation path in operator level B maps to Mb operators, corresponding to Nb basic operator timing units. The formulas for the two allocations appear as images in the original and are not reproduced in this text. From the proportional rule just stated, they are presumably:
$T_A = N \cdot \dfrac{N_a}{N_a + N_b}, \qquad T_B = N \cdot \dfrac{N_b}{N_a + N_b}$
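The proportional allocation for two serial operator levels can be sketched in C. This assumes that level A receives the share N·Na/(Na+Nb) and level B the rest, which follows the proportional rule described in the text; the original formula images are not available, and the type and function names are hypothetical.

```c
/* Timing budget assigned to two serial operator levels. */
typedef struct {
    long level_a;
    long level_b;
} serial_split_t;

/* Split n_total basic timing units between levels A and B in proportion to
 * na and nb, the timing units of their longest operation paths. */
serial_split_t split_serial_constraint(long n_total, long na, long nb)
{
    serial_split_t s = { 0, 0 };
    if (na < 0 || nb < 0 || na + nb <= 0)
        return s;                           /* nothing sensible to allocate */
    s.level_a = n_total * na / (na + nb);   /* integer share of level A */
    s.level_b = n_total - s.level_a;        /* remainder, so shares sum to n_total */
    return s;
}
```

For instance, with a total of 100 timing units and longest paths of 3 and 7 units, level A would receive 30 units and level B 70 units.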
In the present embodiment, the computing operator can be that many basic sequential units carry out, so the operator number is not necessarily consistent with the basic sequential unit of operator.
S422b, serial data stream contains the sign of parallel data in the Data Control flow graph, then according to the sign of this parallel data temporal constraint among the step S421b is revised.For example, modules A can obtain information p-x-n-m, and wherein p is the concurrency sign, and x is the concurrency kind, comprises types such as pipeline parallel method, local pipeline parallel method, data parallel; N is and line position, promptly walks abreast n the processing stage; M is and line number.According to above information, the temporal constraint of modules A can be adjusted into β wherein nFor being in the empirical parameter of stage n parallelization, N PnFor being in the basic sequential unit of processing operator of stage n.
Based on above-mentioned method, the structure during to the structure shown in Figure 23 a and Figure 23 b behind the row sequential mark is shown in Figure 24 a and Figure 24 b.
Based on above-mentioned Data Control stream temporal constraint method, present embodiment also provides a kind of Data Control stream temporal constraint device.Please refer to Figure 25; Flow the structured flowchart of temporal constraint for the Data Control of present embodiment; The Data Control stream temporal constraint device of present embodiment comprises: total temporal constraint is confirmed unit 1, is used for determining total temporal constraint according to the requirement of user specification demand and target integrated circuit technology; Sequential mark unit 2 is used for each the operator unit label time to the operator spacetime diagram, and each level of operator spacetime diagram is carried out temporal constraint; Judging unit 3; The data flow architecture that is used for judgment data control flow graph; When judging unit 3 judgment data flow structures are parallel data stream; Then sequential mark unit 2 is divided total temporal constraint equally each the operator level in the spacetime diagram of correspondence, and divides the temporal constraint of each operator level equally in this operator level each operator unit; When judging unit 3 judgment data flow structures are serial data stream, then sequential mark unit 2 accounts in the child level at long last the ratio of the pairing sequential summation in operator unit that long arithmetic path shone upon with the pairing sequential in operator unit that the longest arithmetic path shone upon in each operator level and distributes the temporal constraint of each operator level.
Because the structure shown in Figure 24a calls the operator structure of function pixel_sad_16×16, the structure shown in Figure 24b is compressed first, and the result is shown in Figure 26. The structure shown in Figure 24a is then compressed in the same way as the space-time diagram of Figure 26, yielding the compressed space-time diagram of function X264_me_search.
When the structures shown in Figure 24a and Figure 24b undergo space-time diagram compression, the following principles are mainly followed:
1. Operation-class operators with identical operational attributes in the operator space-time diagram are cluster-compressed. For example, two parallel addition operators in the space-time diagram can be compressed into one addition operator, and a control operator is introduced so that the compressed addition operator is multiplexed and performs the same function as the two addition operators before compression. After the operator space-time diagram is compressed, the number of operators can therefore drop significantly, which saves integrated circuit area; correspondingly, because the compressed operators are multiplexed through control operators, the execution time of the overall algorithm increases. It will be understood that cluster compression of the operation-class operators necessarily causes corresponding changes to the storage-class, control-class, path-class, and clock-class operators, so those operators, especially the storage-class operators, can also be cluster-compressed as circumstances allow to save further integrated circuit area.
2. When a control operator is introduced, a corresponding configuration instruction is generated. The configuration instruction controls the generated operator so that it works in a predetermined manner and thus realizes the same function as the operators before compression.
3. The same operator space-time diagram can have multiple possible cluster compression results. Therefore, during compression, the compressed space-time diagram whose overall algorithm execution time is closest to the constraint time is selected as the final compression result; this saves as much integrated circuit area as possible while ensuring that the timing condition is met. The constraint time is the maximum execution time of the integrated circuit, calculated from the performance targets set by the user.
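The selection rule in principle 3 can be illustrated with a small Python sketch. It assumes, hypothetically, that each candidate compression result is summarized by an operator count (standing in for area) and an overall execution time in cycles; the class `CompressionCandidate`, the function `select_compression`, the tie-break on operator count, and the candidate figures are all illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CompressionCandidate:
    name: str
    operator_count: int  # stand-in for integrated circuit area
    cycles: int          # overall algorithm execution time


def select_compression(
    candidates: Iterable[CompressionCandidate], constraint_cycles: int
) -> Optional[CompressionCandidate]:
    """Pick the candidate whose execution time is closest to the constraint
    time without exceeding it; break ties by the smaller operator count."""
    feasible = [c for c in candidates if c.cycles <= constraint_cycles]
    if not feasible:
        return None  # no compression result satisfies the timing condition
    return min(
        feasible,
        key=lambda c: (constraint_cycles - c.cycles, c.operator_count),
    )


if __name__ == "__main__":
    # Hypothetical candidates: more sharing means fewer operators, more cycles.
    candidates = [
        CompressionCandidate("uncompressed", operator_count=8, cycles=10),
        CompressionCandidate("half-shared", operator_count=4, cycles=18),
        CompressionCandidate("fully-shared", operator_count=2, cycles=34),
    ]
    print(select_compression(candidates, constraint_cycles=20))
```

The sketch relies on the trade-off stated in principle 1: multiplexing operators reduces their number but lengthens execution, so the feasible candidate with the least timing slack is normally also the smallest one.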
After the above cluster compression of the space-time diagram, the area and power consumption of the integrated circuit can be reduced, and the operators generated by cluster compression have a certain regularity.
After the cluster compression, some of the operators can be optimized further; one way is to solidify and customize them. For example, the left side of Figure 27 shows an operation-class operator after compression. Because its logic unit is not used, the logic unit can be removed to obtain the operator structure shown on the right of Figure 27, further reducing the operator's area. As another example, the ADDS operator shown in Figure 2 can realize different functions by changing the value of control bits x; if a particular integrated circuit actually uses only the add-shift function of this operator, the value of control bits x can be fixed to 000, which saves power while still meeting the functional requirement. Through solidification and customization, only the minimal hardware that realizes the required function appears in the circuit; this minimal hardware is then given a full-custom design with no other extended functions. This both guarantees correct execution of the algorithm and optimizes the area and power consumption of the integrated circuit.
Analyzing the space-time diagrams of the X264_me_search function before and after cluster compression yields the table shown in Figure 28. The table shows that cluster compression reduces the number of operators used from 1724 to 139. Although the timing correspondingly increases from 292 cycles to 500 cycles, 500 cycles is still below the 600-cycle constraint for X264_me_search. Cluster compression can therefore substantially reduce the number of operators, and hence the area and power consumption of the integrated circuit, while the overall execution time still satisfies the timing constraint.
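The figures quoted above can be checked with a short calculation. The sketch below uses only the numbers given in the text (1724 and 139 operators, 292 and 500 cycles, a 600-cycle constraint); the function name `compression_summary` and the dictionary keys are hypothetical.

```python
from typing import Dict, Union


def compression_summary(
    ops_before: int,
    ops_after: int,
    cycles_before: int,
    cycles_after: int,
    constraint_cycles: int,
) -> Dict[str, Union[float, int, bool]]:
    """Summarize the area/time trade-off of one cluster compression."""
    return {
        # Fraction of operators eliminated by compression.
        "operator_reduction": 1 - ops_after / ops_before,
        # Factor by which the execution time grows.
        "cycle_growth": cycles_after / cycles_before,
        # Cycles left before the timing constraint is violated.
        "slack_cycles": constraint_cycles - cycles_after,
        "meets_constraint": cycles_after <= constraint_cycles,
    }


if __name__ == "__main__":
    summary = compression_summary(1724, 139, 292, 500, 600)
    for key, value in summary.items():
        print(key, value)
```

With these inputs, roughly 92% of the operators are removed, the cycle count grows by a factor of about 1.7, and 100 cycles of slack remain under the constraint.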
Based on the above method, the X264_me_search function of the H.264 standard, described in C, is analyzed and mapped to obtain the data control flow graph of X264_me_search shown in Figure 15. This data control flow graph is converted into the operator space-time diagrams shown in Figure 23a and Figure 23b. Timing constraints are then applied according to information such as data dependences, giving the timing constraint diagrams shown in Figure 24a and Figure 24b. The space-time diagram is next cluster-compressed according to the time annotations, giving the compressed space-time diagram of function pixel_sad_16×16 shown in Figure 26. Finally, the underlying hardware circuit of the integrated circuit is generated from the compressed space-time diagram. Because analysis of the computer language program yields data dependences, data parallelism, and the corresponding control information, and the program is mapped to a data control flow graph, the resulting graph contains this information and therefore effectively assists hardware circuit designers in circuit design. In addition, the device of this embodiment can automatically map a computer language program to a data control flow graph and then convert it into an operator space-time diagram, greatly improving the efficiency of integrated circuit hardware structure design.
The method provided by the invention for generating a data control flow graph from a computer language program is not limited to the above embodiment; it can also be used in other mapping processes that generate an underlying hardware logic description of an integrated circuit from a computer language program.
The above is a further detailed description of the invention in connection with specific embodiments, and the implementation of the invention should not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the scope of protection of the invention.

Claims (10)

1. An underlying hardware mapping method for an integrated circuit, characterized by comprising:
A program analysis step, for reading a computer language program that describes an integrated circuit algorithm and, according to the rules of the computer language, identifying from the computer language program the execution objects and parameter objects to be mapped, the execution objects comprising operation instructions and/or control instructions, and the parameter objects comprising at least one of input data, output data, and intermediate data;
A data control flow graph generation step, for mapping the operation instructions to a processing block diagram, mapping the control instructions to a control flow that identifies states, state transition conditions, and state control signals, and mapping the input data, output data, and intermediate data to the corresponding nodes in a data control flow graph that describes the integrated circuit algorithm;
An operator structure diagram generation step, for taking at least one operator unit of the corresponding function from a pre-established operator unit library according to the functional processing performed by each node of the data control flow graph, and converting the data control flow graph into an operator space-time diagram composed of operator units;
A timing constraint step, for determining a total timing constraint according to the user's specification requirements and the requirements of the target integrated circuit process, annotating each operator unit in the operator space-time diagram with a time, and applying a timing constraint to each level of the operator space-time diagram;
An operator space-time diagram compression step, for cluster-compressing the operator space-time diagram in terms of hardware resources or area according to the time annotations, such that the overall algorithm execution time is as close as possible to the total timing constraint;
An underlying hardware mapping step, for generating a logic description of the underlying hardware circuit of the integrated circuit from the cluster-compressed operator space-time diagram.
2. An underlying hardware mapping device for an integrated circuit, characterized by comprising:
A program analysis module, for reading a computer language program that describes an integrated circuit algorithm and, according to the rules of the computer language, identifying from the computer language program the execution objects and parameter objects to be mapped, the execution objects comprising operation instructions and/or control instructions, and the parameter objects comprising at least one of input data, output data, and intermediate data;
A data control flow graph generation module, for mapping the operation instructions to a processing block diagram, mapping the control instructions to a control flow that identifies states, state transition conditions, and state control signals, and mapping the input data, output data, and intermediate data to the corresponding nodes in a data control flow graph that describes the integrated circuit algorithm;
An operator structure diagram generation module, for taking at least one operator unit of the corresponding function from a pre-established operator unit library according to the functional processing performed by each node of the data control flow graph, and converting the data control flow graph into an operator space-time diagram composed of operator units;
A timing constraint module, for determining a total timing constraint according to the user's specification requirements and the requirements of the target integrated circuit process, annotating each operator unit in the operator space-time diagram with a time, and applying a timing constraint to each level of the operator space-time diagram;
An operator space-time diagram compression module, for cluster-compressing the operator space-time diagram in terms of hardware resources or area according to the time annotations, such that the overall algorithm execution time is as close as possible to the total timing constraint;
An underlying hardware mapping module, for generating a logic description of the underlying hardware circuit of the integrated circuit from the cluster-compressed operator space-time diagram.
3. A data control flow timing constraint method, characterized by comprising:
Reading a computer language program that describes an integrated circuit algorithm and, according to the rules of the computer language, identifying from the computer language program the execution objects and parameter objects to be mapped, the execution objects comprising operation instructions and/or control instructions, and the parameter objects comprising at least one of input data, output data, and intermediate data;
Mapping the operation instructions to a processing block diagram, mapping the control instructions to a control flow that identifies states, state transition conditions, and state control signals, and mapping the input data, output data, and intermediate data to the corresponding nodes in a data control flow graph that describes the integrated circuit algorithm;
Taking at least one operator unit of the corresponding function from a pre-established operator unit library according to the functional processing performed by each node of the data control flow graph, and converting the data control flow graph into an operator space-time diagram composed of operator units;
Determining a total timing constraint according to the user's specification requirements and the requirements of the target integrated circuit process;
Taking the function as the unit, annotating each operator unit in the operator space-time diagram of each function with a time, and applying a timing constraint to each level of the operator space-time diagram according to the data flow structure in the data control flow graph;
Cluster-compressing the operator space-time diagram in terms of hardware resources or area according to the time annotations, such that the overall algorithm execution time is as close as possible to the total timing constraint;
Generating a logic description of the underlying hardware circuit of the integrated circuit from the cluster-compressed operator space-time diagram.
4. The method according to claim 3, characterized in that, in the step of applying a timing constraint to each level of the operator space-time diagram, the overall timing constraint is divided among the corresponding operator levels of the operator space-time diagram according to the structure of the data stream in the data control flow graph.
5. The method according to claim 4, characterized in that, if the data flow structure is a parallel data stream, the total timing constraint is divided equally among the operator levels of the corresponding operator space-time diagram, and the timing constraint of each operator level is divided equally among the operator units in that operator level.
6. The method according to claim 4, characterized in that, if the data stream in the data control flow graph is a serial data stream, the timing of each operator level is allocated in proportion to the timing of the operator units mapped by the longest operation path in that operator level, relative to the sum of the timings of the operator units mapped by the longest operation paths over all operator levels.
7. A data control flow timing constraint device, characterized by comprising:
A program analysis unit, for reading a computer language program that describes an integrated circuit algorithm and, according to the rules of the computer language, identifying from the computer language program the execution objects and parameter objects to be mapped, the execution objects comprising operation instructions and/or control instructions, and the parameter objects comprising at least one of input data, output data, and intermediate data;
A data control flow graph generation unit, for mapping the operation instructions to a processing block diagram, mapping the control instructions to a control flow that identifies states, state transition conditions, and state control signals, and mapping the input data, output data, and intermediate data to the corresponding nodes in a data control flow graph that describes the integrated circuit algorithm;
An operator structure diagram generation unit, for taking at least one operator unit of the corresponding function from a pre-established operator unit library according to the functional processing performed by each node of the data control flow graph, and converting the data control flow graph into an operator space-time diagram composed of operator units;
A total timing determination unit, for determining a total timing constraint according to the user's specification requirements and the requirements of the target integrated circuit process;
A timing annotation unit, for taking the function as the unit, annotating each operator unit in the operator space-time diagram of each function with a time, and applying a timing constraint to each level of the operator space-time diagram according to the data flow structure in the data control flow graph;
An operator space-time diagram compression unit, for cluster-compressing the operator space-time diagram in terms of hardware resources or area according to the time annotations, such that the overall algorithm execution time is as close as possible to the total timing constraint;
An underlying hardware mapping unit, for generating a logic description of the underlying hardware circuit of the integrated circuit from the cluster-compressed operator space-time diagram.
8. The device according to claim 7, characterized by further comprising a judging unit for determining the data flow structure of the data control flow graph, the timing annotation unit performing the corresponding timing annotation operation according to the data flow structure.
9. The device according to claim 8, characterized in that, when the judging unit determines that the data flow structure is a parallel data stream, the timing annotation unit divides the total timing constraint equally among the operator levels of the corresponding operator space-time diagram, and divides the timing constraint of each operator level equally among the operator units in that operator level.
10. The device according to any one of claims 7 to 9, characterized in that, when the judging unit determines that the data flow structure is a serial data stream, the timing annotation unit allocates the timing constraint of each operator level in proportion to the timing of the operator units mapped by the longest operation path in that operator level, relative to the sum of the timings of the operator units mapped by the longest operation paths over all operator levels.
CN201010619849A 2010-12-31 2010-12-31 Underlying hardware mapping method for integrated circuit as well as time sequence constraint method and device for data control flow Expired - Fee Related CN102043886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010619849A CN102043886B (en) 2010-12-31 2010-12-31 Underlying hardware mapping method for integrated circuit as well as time sequence constraint method and device for data control flow

Publications (2)

Publication Number Publication Date
CN102043886A CN102043886A (en) 2011-05-04
CN102043886B true CN102043886B (en) 2012-10-24

Family

ID=43910022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010619849A Expired - Fee Related CN102043886B (en) 2010-12-31 2010-12-31 Underlying hardware mapping method for integrated circuit as well as time sequence constraint method and device for data control flow

Country Status (1)

Country Link
CN (1) CN102043886B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102054109B (en) * 2010-12-31 2014-03-19 北京大学深圳研究生院 Lower hardware mapping method of integrated circuit, and data control flow generation method and device
CN102054107B (en) * 2010-12-31 2013-11-06 北京大学深圳研究生院 Lower hardware mapping method of integrated circuit, and space-time diagram generation method and device
CN102419789B (en) * 2011-12-16 2013-05-01 中山大学 High-level synthesis method and system
CN103077283B (en) * 2013-01-16 2016-05-18 清华大学 The C-to-RTL integrated approach of optimizing based on VFI
CN103150461B (en) * 2013-04-10 2016-08-10 上海安路信息科技有限公司 Parallel integration method and system thereof for IC design
CN106777441B (en) * 2015-11-24 2020-04-21 龙芯中科技术有限公司 Timing constraint management method and device
CN107679012A (en) * 2017-09-27 2018-02-09 清华大学无锡应用技术研究院 Method and apparatus for the configuration of reconfigurable processing system
CN108170957B (en) * 2017-12-28 2022-01-04 佛山中科芯蔚科技有限公司 Method and system for generating data control flow diagram and integrated circuit design method
CN108319459B (en) * 2018-02-12 2022-04-29 芯峰科技(广州)有限公司 CCC compiler for describing behavior level to RTL
CN112000370B (en) * 2020-08-27 2022-04-15 北京百度网讯科技有限公司 Processing method, device and equipment of loop instruction and storage medium
CN115526135A (en) * 2022-10-09 2022-12-27 中山大学 High-level comprehensive tool optimization method and system based on differential constraint system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7530047B2 (en) * 2003-09-19 2009-05-05 Cadence Design Systems, Inc. Optimized mapping of an integrated circuit design to multiple cell libraries during a single synthesis pass
CN101727434A (en) * 2008-10-20 2010-06-09 北京大学深圳研究生院 Integrated circuit structure special for specific application algorithm
CN101901161A (en) * 2010-07-21 2010-12-01 四川大学 Energy consumption related software/hardware partition-oriented hierarchical control and data flow graph modeling method
CN102054108A (en) * 2010-12-31 2011-05-11 北京大学深圳研究生院 Lower hardware mapping method of integrated circuit, and time-space diagram compression method and device
CN102054107A (en) * 2010-12-31 2011-05-11 北京大学深圳研究生院 Lower hardware mapping method of integrated circuit, and space-time diagram generation method and device
CN102054109A (en) * 2010-12-31 2011-05-11 北京大学深圳研究生院 Lower hardware mapping method of integrated circuit, and data control flow generation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
冯亚林等."集成电路的现状及其发展趋势".《微电子学》.2006,第36卷(第2期),173-176页. *

Also Published As

Publication number Publication date
CN102043886A (en) 2011-05-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20121024

Termination date: 20211231
