CN110333857A

CN110333857A - Automatic custom instruction identification method based on constraint programming

Info

Publication number: CN110333857A
Application number: CN201910627531.6A
Authority: CN
Inventors: 肖成龙; 王珊珊; 王心霖
Original assignee: Liaoning Technical University
Current assignee: Liaoning Technical University
Priority date: 2019-07-12
Filing date: 2019-07-12
Publication date: 2019-10-15
Anticipated expiration: 2039-07-12
Also published as: CN110333857B

Abstract

The present invention provides an automatic custom instruction identification method based on constraint programming, and relates to the field of EDA (electronic design automation) technology. The method comprises two parts: custom instruction enumeration and custom instruction selection. Custom instruction enumeration is realized by building a constraint programming model for enumeration and enumerating, from the data flow graph, all subgraphs that satisfy the constraints; in this process each constraint is modeled separately, and the constraint programming method is used to find all custom instructions that satisfy the constraints, completing the enumeration. Custom instruction selection achieves multi-objective optimization by building a constraint programming model for selection; in this process, objective functions are established that maximize the processor performance improvement and the energy reduction brought by custom instructions, and a weight-based method converts the multi-objective optimization problem into a single-objective optimization problem.

Description

Automatic custom instruction identification method based on constraint programming

Technical field

The present invention relates to the field of EDA technology, and in particular to an automatic custom instruction identification method based on constraint programming.

Background art

In recent years, to meet the growing demand of embedded applications for high performance and low power consumption, extensible instruction sets have been widely applied in embedded systems. For example, application-specific instruction set processors (ASIPs) combine the advantages of general-purpose processors and ASICs and offer a good trade-off among design cycle, flexibility, performance and power consumption. A custom instruction in an extended instruction set encapsulates a sequence of elementary instructions, which enables chaining and parallel execution of those instructions and thereby improves performance.

Application-specific extended instruction sets are the core of ASIP design and are typically used in fields such as multimedia processing and signal processing. To let heterogeneous multiprocessors run diverse multimedia applications more effectively, Dammak et al. applied extended instruction sets to a heterogeneous multi-core system-on-chip, achieving a good trade-off between performance and power consumption. Momcilovic et al. used an ASIP to execute a data-adaptive motion estimation algorithm, greatly reducing computational cost and substantially improving video processing speed. Sisto et al. proposed the design of an application-specific processor dedicated to sensor signal conditioning in automotive applications.

Image processing is currently developing rapidly, and the quality of image processing keeps improving. Although learning-based approaches such as neural networks and support vector machines perform well in image processing, the large volume of current image data means that some otherwise effective optimization algorithms still need a long time to process image data or training samples. Moreover, real-time image processing imposes strict timing constraints. Recent research in China and abroad has found that applying extended instruction sets to image processing can significantly improve performance. Mori et al. proposed an application-specific processor design for accelerating real-time IP/CV algorithms. Edwards et al. applied extended instruction sets to a real-time object detection system and improved processing speed by 1.5 to 6.8 times.

Early research executed applications efficiently by designing dedicated chips, but dedicated chip design has long design cycles, the hardware is difficult to debug, and the cost is high. Consequently, more and more researchers have shifted their focus to extended instructions, automatically identifying extended instruction sets for specific applications.

The process of automatically identifying an extended instruction set is shown in Fig. 1. First, the source code of the image processing algorithm is fed to the open-source compiler GeCoS, which converts it into a control data flow graph (CDFG), a graph that represents the data dependences among multiple basic blocks. Next, subgraph enumeration enumerates from the data flow graph all subgraphs that satisfy the constraints (a subgraph is the graph representation of a custom instruction). Then, a subgraph selection algorithm chooses the best of the enumerated subgraphs as the final custom instructions. Finally, the source code is transformed into new code that contains the selected custom instructions.

Constraint programming is a general search technique that incorporates logical reasoning; it originated from the constraint satisfaction problem (CSP) studied in computer science and artificial intelligence. A CSP consists of a set of variables, the domains of those variables, and a set of constraints (equations, inequalities, programs, etc. can all serve as constraints). Solving a CSP means finding, among all combinations of values, one or more combinations that satisfy every constraint. In general, combinatorial optimization and optimal scheduling problems are CSPs. When a problem is solved with constraint programming, its statement stays close to the actual problem and the constraints need not be converted into linear equalities or inequalities, so the formulation is concise and easy to understand.
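To make the CSP definition above concrete, the following is a minimal Python sketch of a generic backtracking search that yields every assignment satisfying a set of constraints. It only illustrates the variables/domains/constraints structure; it is not the JaCoP solver used later in this document, and the function name solve_all and the (scope, predicate) encoding of constraints are hypothetical.

```python
def solve_all(variables, domains, constraints, assignment=None):
    """Backtracking search yielding every complete assignment that satisfies all constraints.

    Each constraint is a (scope, predicate) pair; it is checked as soon as every
    variable in its scope has a value, which prunes partial assignments early.
    """
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        yield dict(assignment)
        return
    var = variables[len(assignment)]  # variables are bound in the given order
    for value in domains[var]:
        assignment[var] = value
        consistent = all(pred(*(assignment[v] for v in scope))
                         for scope, pred in constraints
                         if all(v in assignment for v in scope))
        if consistent:
            yield from solve_all(variables, domains, constraints, assignment)
        del assignment[var]


# Toy CSP: x, y, z in {0, 1, 2} with x < y and y < z has exactly one solution.
sols = list(solve_all(["x", "y", "z"], {v: range(3) for v in "xyz"},
                      [(("x", "y"), lambda a, b: a < b),
                       (("y", "z"), lambda a, b: a < b)]))
print(sols)  # [{'x': 0, 'y': 1, 'z': 2}]
```

A real constraint solver adds propagation (shrinking domains before branching); plain backtracking is used here only because it fits in a few lines.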

Summary of the invention

The technical problem to be solved by the present invention is to overcome the above shortcomings of the prior art by providing an automatic custom instruction identification method based on constraint programming, which identifies custom instructions automatically using constraint programming.

To solve the above technical problem, the present invention adopts the following technical solution: an automatic custom instruction identification method based on constraint programming, comprising two parts, custom instruction enumeration and custom instruction selection;

Custom instruction enumeration is realized by building a constraint programming model for enumerating custom instructions and enumerating, from the data flow graph, all subgraphs that satisfy the constraints. The specific method is as follows:

To enumerate all custom instructions that satisfy the given constraints from the data flow graph G(V, E), let subgraph S = (V_s, E_s) be the graph representation of a custom instruction instance, and let I₁ and I₂ denote the set of valid nodes and the set of illegal nodes in G, respectively;

The data flow graph G = (V, E) is a directed acyclic graph (DAG). The node set V = {v₁, v₂, ..., v_M} represents the elementary instructions, where M is the number of nodes in the data flow graph; the edge set E represents the data dependences between instructions, and m is the number of edges in the data flow graph;

The given constraints are: the constraint that a custom instruction contains no illegal node, the connectivity constraint of a custom instruction, the constraint that a custom instruction is convex, and the input/output constraint of a custom instruction;

Each constraint is modeled separately, and for the enumeration problem the constraint programming method is used to find all custom instructions that satisfy the constraints, completing custom instruction enumeration;
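The sketch below shows how the four constraints listed above can be combined to enumerate candidate subgraphs. It deliberately swaps the constraint-programming search for plain exhaustive enumeration over node subsets, which is exponential and only suitable for toy graphs. The function name enumerate_candidates, the edge-list graph representation and the input/output counting rule (inputs are distinct producers outside the subgraph, outputs are nodes inside it whose value is consumed outside) are assumptions made for illustration, not the patent's model.

```python
from itertools import combinations


def enumerate_candidates(nodes, edges, illegal, in_max, out_max, require_connected=True):
    """Exhaustively list node subsets that satisfy the enumeration constraints.

    Stand-in for a constraint-programming search: it visits every subset of the
    legal nodes, so it is only usable on very small graphs.
    """
    succ = {v: set() for v in nodes}  # directed successors
    und = {v: set() for v in nodes}   # undirected neighbours
    for u, v in edges:
        succ[u].add(v)
        und[u].add(v)
        und[v].add(u)

    def descendants(src):
        seen, stack = set(), [src]
        while stack:
            for w in succ[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    desc = {v: descendants(v) for v in nodes}
    legal = [v for v in nodes if v not in illegal]  # no-illegal-node constraint
    found = []
    for k in range(1, len(legal) + 1):
        for combo in combinations(legal, k):
            s = set(combo)
            if require_connected:  # undirected connectivity inside s
                seen, stack = {combo[0]}, [combo[0]]
                while stack:
                    for w in und[stack.pop()] & s:
                        if w not in seen:
                            seen.add(w)
                            stack.append(w)
                if seen != s:
                    continue
            # convexity: no outside node is both reachable from s and able to reach s
            below = set().union(*(desc[u] for u in s))
            if any(desc[w] & s for w in below - s):
                continue
            n_in = len({u for u, v in edges if v in s and u not in s})
            n_out = len({u for u, v in edges if u in s and v not in s})
            if n_in <= in_max and n_out <= out_max:  # input/output constraint
                found.append(frozenset(s))
    return found


EDGES = [(1, 2), (1, 3), (2, 3), (3, 5), (4, 5)]
cands = enumerate_candidates([1, 2, 3, 4, 5], EDGES, illegal={4}, in_max=2, out_max=1)
print(len(cands))  # 8
```

Passing require_connected=False skips the connectivity check, mirroring the remark that this constraint can be removed when disconnected subgraphs are to be enumerated.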

The constraint that a custom instruction contains no illegal node is modeled as follows:

v_{sel} = 0, \quad \forall v \in I_2

where v_sel = 0 indicates that illegal node v is not included in the custom instruction;

Illegal nodes: owing to the limitations of the extensible processor architecture, two kinds of elementary instructions, memory access operations and branch operations, cannot be included in a custom instruction, and the nodes representing these elementary instructions are regarded as illegal nodes;
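A minimal sketch of how the illegal-node set I₂ could be derived and checked. The opcode names in ILLEGAL_OPCODES and the dictionary representation of node opcodes are hypothetical; the only rule taken from the text is that memory access and branch operations are illegal.

```python
# Hypothetical opcode names treated as illegal (memory access and branch operations).
ILLEGAL_OPCODES = {"load", "store", "branch", "jump"}


def illegal_nodes(opcode):
    """Return I2: the nodes whose elementary instruction may not enter a custom instruction."""
    return {v for v, op in opcode.items() if op in ILLEGAL_OPCODES}


def contains_no_illegal(selected, opcode):
    """Check v_sel = 0 for every v in I2, i.e. the candidate holds no illegal node."""
    return not (set(selected) & illegal_nodes(opcode))


ops = {1: "add", 2: "shl", 3: "xor", 4: "load", 5: "sub"}
print(illegal_nodes(ops))                   # {4}
print(contains_no_illegal({1, 2, 3}, ops))  # True
print(contains_no_illegal({3, 4, 5}, ops))  # False
```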

The connectivity constraint of a custom instruction is modeled as shown in the following formula:

where the path term denotes that an undirected path exists between node v and node v_k; this constraint can be removed when disconnected subgraphs are to be enumerated;
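A sketch of the connectivity check, assuming (this is an interpretation, not stated in the text) that the undirected path between two selected nodes must run entirely inside the candidate subgraph, i.e. the induced subgraph is weakly connected. The function name is_connected is hypothetical.

```python
from collections import deque


def is_connected(selected, edges):
    """True if the induced subgraph on `selected` is weakly connected,
    i.e. every pair of its nodes is joined by an undirected path inside it."""
    selected = set(selected)
    if not selected:
        return False
    adj = {v: set() for v in selected}
    for u, v in edges:
        if u in selected and v in selected:  # keep only internal edges
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(selected))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search that ignores edge direction
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == selected


EDGES = [(1, 2), (1, 3), (2, 3), (3, 5), (4, 5)]
print(is_connected({1, 2, 3}, EDGES))  # True
print(is_connected({1, 5}, EDGES))     # False
```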

The convexity constraint requires that, for any two nodes u and v in subgraph S, every path between them passes only through nodes in S. This constraint is modeled as shown in the following formula:

where u_sel and v_sel indicate whether nodes u and v are selected (0: not selected, 1: selected);
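The convexity definition above can be checked directly on a small graph: S is non-convex exactly when some node outside S is reachable from S and can itself reach S, because such a node lies on a path between two members of S. The following Python sketch (function names are hypothetical) uses a toy graph, not the graph of Fig. 2.

```python
def descendants(v, succ):
    """All nodes reachable from v along directed edges."""
    seen, stack = set(), [v]
    while stack:
        for w in succ[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_convex(selected, nodes, edges):
    """S is convex iff no node outside S lies on a directed path between two nodes of S."""
    selected = set(selected)
    succ = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
    reach_from_s = set().union(*(descendants(u, succ) for u in selected))
    return not any(descendants(w, succ) & selected
                   for w in reach_from_s - selected)


NODES = [1, 2, 3, 4, 5]
EDGES = [(1, 2), (1, 3), (2, 3), (3, 5), (4, 5)]
print(is_convex({1, 2, 3}, NODES, EDGES))  # True
print(is_convex({1, 3}, NODES, EDGES))     # False: path 1 -> 2 -> 3 leaves S
```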

The input/output constraint of a custom instruction is shown in the following formulas:

where IN_max and OUT_max are the upper limits on the number of inputs and outputs of a custom instruction; IN_v and OUT_v are the in-degree and out-degree of node v; Pred(u) = {v | v ∈ V, (v, u) ∈ E} and Succ(u) = {v | v ∈ V, (u, v) ∈ E} are the predecessor node set and successor node set of node u; v_in and v_out are the number of inputs and outputs of node v; and m_sel indicates whether node m is selected;
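A simplified sketch of input/output counting for a candidate subgraph. It assumes that an input is a distinct producer outside S feeding S and an output is a distinct node in S whose value is consumed outside S; primary inputs and live-out values are assumed to appear as explicit graph nodes or edges. This is an illustrative rule, not the patent's exact formulas, and the function names are hypothetical.

```python
def io_counts(selected, edges):
    """Inputs: distinct producers outside S feeding S.
    Outputs: distinct nodes inside S whose value is consumed outside S."""
    selected = set(selected)
    inputs = {u for u, v in edges if v in selected and u not in selected}
    outputs = {u for u, v in edges if u in selected and v not in selected}
    return len(inputs), len(outputs)


def satisfies_io(selected, edges, in_max, out_max):
    """Input/output constraint: at most in_max inputs and out_max outputs."""
    n_in, n_out = io_counts(selected, edges)
    return n_in <= in_max and n_out <= out_max


EDGES = [(1, 2), (1, 3), (2, 3), (3, 5), (4, 5)]
print(io_counts({2, 3, 5}, EDGES))           # (2, 0)
print(satisfies_io({2, 3, 5}, EDGES, 6, 2))  # True
```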

Custom instruction selection achieves multi-objective optimization by building a constraint programming model for custom instruction selection. The specific method is as follows: based on the subgraphs enumerated in the enumeration stage, graph isomorphism matching is first performed on all subgraphs. Given two subgraphs a and b, if a and b are isomorphic, a pattern C_i is created and subgraphs a and b are recorded as instances of pattern C_i; a pattern is the graph representation of a candidate custom instruction;
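A sketch of the isomorphism grouping step for small subgraphs. Instead of pairwise matching it computes a brute-force canonical form (the lexicographically smallest encoding of opcode labels and internal edges over all node orderings), so two labelled subgraphs are isomorphic exactly when their canonical forms are equal. This O(k!) approach is an assumption chosen for brevity, not the patent's matching algorithm; the function names are hypothetical.

```python
from itertools import permutations


def canonical_form(nodes, edges, opcode):
    """Smallest (labels, edges) encoding over all node orderings; equal for
    isomorphic labelled graphs. O(k!) in the node count, so small k only."""
    best = None
    for order in permutations(list(nodes)):
        index = {v: i for i, v in enumerate(order)}
        labels = tuple(opcode[v] for v in order)
        enc = tuple(sorted((index[u], index[v]) for u, v in edges))
        key = (labels, enc)
        if best is None or key < best:
            best = key
    return best


def group_into_patterns(subgraphs, all_edges, opcode):
    """Group subgraphs into patterns C_i; each pattern lists its instances."""
    patterns = {}
    for s in subgraphs:
        internal = [(u, v) for u, v in all_edges if u in s and v in s]
        patterns.setdefault(canonical_form(s, internal, opcode), []).append(s)
    return list(patterns.values())


OPCODE = {1: "add", 2: "mul", 3: "add", 4: "mul", 5: "sub"}
EDGES = [(1, 2), (3, 4), (2, 5), (4, 5)]
groups = group_into_patterns([{1, 2}, {3, 4}, {2, 5}], EDGES, OPCODE)
print(groups)  # [[{1, 2}, {3, 4}], [{2, 5}]]
```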

To build the constraint programming model for the custom instruction selection problem, the following variables are first defined: N is the number of candidate custom instructions found in the enumeration stage; C_i denotes the i-th candidate custom instruction, i = 1, ..., N; custom instruction C_i has n_i instances in the code, c_{i,1}, ..., c_{i,n_i}; the execution frequency of instance c_{i,j} is f_{i,j}; and the processor performance improvement brought by custom instruction C_i and the hardware area required to implement it in the custom functional unit are denoted P_i and A_i, respectively;

The objective function that maximizes the processor performance improvement brought by custom instructions is shown in the following formula:

where s_{i,j} is a binary variable that equals 1 when custom instruction instance c_{i,j} is selected and 0 otherwise;
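The formula itself did not survive in this copy. Assuming the objective simply sums, over all selected instances, the pattern's performance gain P_i weighted by the instance's execution frequency f_{i,j}, it would read as follows. This is a hedged reconstruction from the variable definitions, not the patent's exact equation:

```latex
\max \; \mathrm{Perf} = \sum_{i=1}^{N} \sum_{j=1}^{n_i} P_i \, f_{i,j} \, s_{i,j},
\qquad s_{i,j} \in \{0, 1\}
```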

Because a custom instruction encapsulates multiple elementary instructions, it reduces the number of instruction fetches and the number of data transfers between registers and the processor, thereby reducing processor energy consumption. The objective function that maximizes the energy reduction brought by custom instructions is shown in the following formula:

where E(c_{i,j}) is the number of internal edges of custom instruction instance c_{i,j}; the formula also contains a term for the reduction in the number of instruction fetches and a term for the reduction in the number of data transfers; α and β are weight parameters with α + β = 1;
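The equation is also missing here. One plausible reading, offered only as an assumption, is that an instance with |V(c_{i,j})| nodes (notation introduced here) saves |V(c_{i,j})| - 1 instruction fetches and E(c_{i,j}) register transfers per execution, weighted by α and β:

```latex
\max \; \mathrm{Energy} = \sum_{i=1}^{N} \sum_{j=1}^{n_i} s_{i,j} \, f_{i,j}
  \Big[ \alpha \big( |V(c_{i,j})| - 1 \big) + \beta \, E(c_{i,j}) \Big],
\qquad \alpha + \beta = 1
```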

Building on the objective functions of the custom instruction selection model established above, and to simplify the problem, a weight-based method converts the multi-objective optimization problem into a single-objective optimization problem, giving the custom instruction selection model shown in the following formula:

where γ and ε are weight parameters with γ + ε = 1;
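A sketch of the weighted-sum scalarization described above, writing Perf and Energy for the performance and energy objectives (names chosen here for illustration). If the two objectives differ greatly in scale, normalizing each term may be needed for the weights to be meaningful; the source does not say whether it does so.

```latex
\max \; \gamma \cdot \mathrm{Perf} + \varepsilon \cdot \mathrm{Energy},
\qquad \gamma + \varepsilon = 1, \quad \gamma, \varepsilon \ge 0
```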

Given the area constraint specified by the user, and since the hardware of each custom instruction in the custom functional unit occupies a certain area, the area constraint of the custom instructions must be modeled, as shown in the following formula:

where A is the total area budget for the hardware of all custom instructions, specified when the extensible processor is designed; A_i is the hardware area of the i-th custom instruction; and S_i is a binary variable that equals 1 if at least one instance of custom instruction C_i is selected and 0 otherwise, as shown in the following formula:

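The following Python sketch ties the selection model together on a toy input: S_i is derived from the instance choices, the area constraint Σ A_i S_i ≤ A is enforced, and the weighted objective is maximized. It swaps the constraint programming solver for exhaustive search over all 0/1 choices, and the energy term uses the assumed savings ((number of nodes - 1) fewer fetches and one fewer transfer per internal edge). No constraint against overlapping instances is included because none is stated in this section. The function name select_instances and the dictionary layout are hypothetical.

```python
from itertools import product


def select_instances(patterns, area_budget, gamma=0.5, eps=0.5, alpha=0.5, beta=0.5):
    """Exhaustive stand-in for the CP selection model (exponential in #instances).

    patterns: list of dicts with keys
        'P'    - performance gain per execution of C_i (P_i)
        'A'    - hardware area of C_i (A_i)
        'inst' - list of (f_ij, num_nodes, num_internal_edges), one per instance c_ij
    Returns (best objective value, best 0/1 choice s[i][j]).
    """
    flat = [(i, j) for i, p in enumerate(patterns) for j in range(len(p["inst"]))]
    best_val, best_s = float("-inf"), None
    for bits in product((0, 1), repeat=len(flat)):
        s = [[0] * len(p["inst"]) for p in patterns]
        for (i, j), b in zip(flat, bits):
            s[i][j] = b
        S = [1 if any(row) else 0 for row in s]  # S_i = 1 iff some instance of C_i is chosen
        if sum(p["A"] * S_i for p, S_i in zip(patterns, S)) > area_budget:
            continue  # violates the area constraint
        perf = energy = 0.0
        for i, p in enumerate(patterns):
            for j, (f, n_nodes, n_edges) in enumerate(p["inst"]):
                if s[i][j]:
                    perf += p["P"] * f
                    # assumed savings: (n_nodes - 1) fetches and n_edges transfers per execution
                    energy += f * (alpha * (n_nodes - 1) + beta * n_edges)
        value = gamma * perf + eps * energy
        if value > best_val:
            best_val, best_s = value, s
    return best_val, best_s


patterns = [
    {"P": 2.0, "A": 100, "inst": [(10, 3, 2), (5, 3, 2)]},  # hypothetical C_1
    {"P": 1.0, "A": 50, "inst": [(20, 2, 1)]},              # hypothetical C_2
]
print(select_instances(patterns, area_budget=120))  # (30.0, [[1, 1], [0]])
```

With a budget of 120 only one of the two patterns fits, and the search picks both instances of C_1; raising the budget to 200 admits all three instances.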
The beneficial effects of the above technical solution are as follows. For the custom instruction enumeration problem, the automatic identification method provided by the invention separates problem modeling from problem solving, so it can handle combinations of various constraints and offers good generality and flexibility. For the custom instruction selection problem, building a multi-objective constraint programming model enables multi-objective optimization. Applying the custom instructions automatically identified by the invention to image processing algorithms can significantly improve their performance.

Brief description of the drawings

Fig. 1 is a flowchart of automatic extended instruction set identification for image processing algorithms, as described in the background art;

Fig. 2 is a schematic diagram of a data flow graph according to an embodiment of the present invention;

Fig. 3 compares running times under different I/O constraints according to an embodiment of the present invention;

Fig. 4 compares the running time of enumerating only connected subgraphs with that of enumerating all subgraphs according to an embodiment of the present invention;

Fig. 5 compares performance improvements according to an embodiment of the present invention;

Fig. 6 compares the number of instructions selected by different methods according to an embodiment of the present invention.

Specific embodiment

Specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following examples illustrate the present invention but are not intended to limit its scope.

An automatic custom instruction identification method based on constraint programming comprises two parts: custom instruction enumeration and custom instruction selection;

To enumerate all custom instructions that satisfy the given constraints from the data flow graph G(V, E), let subgraph S = (V_s, E_s) be the graph representation of a custom instruction instance, and let I₁ and I₂ denote the set of valid nodes and the set of illegal nodes in G, respectively;

The data flow graph G = (V, E) is a directed acyclic graph (DAG), as shown in Fig. 2. The node set V = {v₁, v₂, ..., v_M} represents the elementary instructions, where M is the number of nodes in the data flow graph; the edge set E represents the data dependences between instructions, and m is the number of edges in the data flow graph;

where the path term denotes that an undirected path exists between node v and node v_k; this constraint can be removed when disconnected subgraphs are to be enumerated;

In this embodiment, for the data flow graph shown in Fig. 2, subgraph {1, 2, 3} is convex, whereas subgraph {2, 3, 5} is not.

where γ and ε are weight parameters with γ + ε = 1;

In this embodiment, the experiments run on an i3-3240 3.4 GHz processor with 4 GB of main memory under Windows 8, and the constraint programming tool is JaCoP 2.3. The benchmark set is drawn from MediaBench and MiBench. The benchmark applications used in this embodiment are common algorithms in image processing or video processing.

In this embodiment, for common image processing algorithms, the GeCoS front-end compiler is first used to convert each algorithm program into the corresponding control data flow graph. Then, the constraint-programming-based custom instruction enumeration method of the invention enumerates from the data flow graph all subgraphs that satisfy the constraints. The enumeration results are shown in Table 1. The columns Nodes, Enumerated Subgraphs and Time in Table 1 give, respectively, the number of nodes in the data flow graph of each benchmark, the number of enumerated connected subgraphs that satisfy the constraints (with the input and output upper limits set to 6 and 2), and the running time of the enumeration method.

Table 1 Custom instruction enumeration results

To further analyze how different constraints affect the running time of the enumeration method, this embodiment compares its running time under different input/output constraints. For the benchmarks SUSAN, JPEG Encode, JPEG Decode and MESA, the running times under different I/O constraints are compared in Fig. 3.

As Fig. 3 shows, the running time of the enumeration method increases significantly as the allowed number of inputs and outputs grows. Further analysis shows that increasing the number of outputs affects running time much more than increasing the number of inputs. For example, relative to an input/output limit of 6/2, raising the limit to 7/2 increases the running time of the enumeration method by 1.5 times on average, whereas raising it to 6/3 increases the running time by 10 times on average.

Because connectivity of the enumerated subgraphs is an important constraint in custom instruction enumeration, this embodiment compares the running time of enumerating only connected subgraphs with that of enumerating all subgraphs (both connected and disconnected); the results are shown in Fig. 4 (I/O constraint 6/2). Enumerating all subgraphs takes significantly longer than enumerating only connected subgraphs.

In this embodiment, the constraint-programming-based custom instruction selection method of the invention is compared with the custom instruction selection methods proposed by Kamal et al. and by Xiao et al. The method of Kamal et al. selects the custom instructions that maximize performance improvement under a given area constraint. The method of Xiao et al. reduces power consumption by selecting fewer custom instructions under a given area constraint.

In this embodiment, following the method of Kamal et al., the hardware delay and area of the elementary instructions implemented in the custom functional unit that realizes the custom instructions are shown in Table 2.

Table 2 Hardware delay and area of elementary instructions in the custom functional unit

Operation	Area	Delay (ns)
SUB	225	0.5
ADD	200	0.5
SHR/SHL	326	0.19
EQT/NEQ	87	0.16
GRT/LKS	115	0.21
AND	41	0.04
OR	42	0.05
XOR	64	0.05

In this embodiment, it is assumed that a custom instruction containing multiple nodes executes on the custom functional unit, while the elementary instructions of the application that are not included in any custom instruction execute on the base processor. Formula (13) gives the total latency of an application that uses custom instructions:

L_h = \left( \sum_{S \in SC} \sum_{i \in C(S)} HW(i) + \sum_{S \in SC} T(S) \right) + \sum_{K \in P} SW(K) \qquad (13)

where HW(i) is the hardware delay of node i, and T(S) is the extra latency needed to transfer the input and output operands of custom instruction S. The first term of formula (13), Σ_{S∈SC} Σ_{i∈C(S)} HW(i), is the accumulated hardware latency of the selected custom instructions, where SC is the set of selected custom instructions and C(S) is the set of nodes on the critical path of selected custom instruction S. The last term is the accumulated software latency of the elementary instructions not included in any custom instruction, where P is the set of those elementary instructions and SW(K) is the software latency of elementary instruction K.
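Formula (13) can be evaluated directly once the critical paths and delays are known. The following Python sketch assumes all delays are expressed in the same unit (ns); the function name total_latency, the dictionary layout and every numeric value other than the SUB and SHR/SHL delays from Table 2 are hypothetical.

```python
def total_latency(selected, hw_delay, transfer_delay, sw_delay, remaining):
    """Formula (13): for each selected custom instruction S, the hardware delays of the
    nodes on its critical path C(S) plus its operand-transfer overhead T(S); then the
    software delays SW(K) of the elementary instructions left outside custom instructions."""
    hardware = sum(sum(hw_delay[i] for i in path) + transfer_delay[name]
                   for name, path in selected.items())
    software = sum(sw_delay[k] for k in remaining)
    return hardware + software


# One custom instruction whose critical path is SUB -> SHL (0.5 ns and 0.19 ns, Table 2),
# 1.0 ns of transfer overhead, and two elementary instructions left in software at 1.0 ns each.
L_h = total_latency({"S1": ["sub1", "shl1"]},
                    {"sub1": 0.5, "shl1": 0.19},
                    {"S1": 1.0},
                    {"x": 1.0, "y": 1.0},
                    ["x", "y"])
print(round(L_h, 2))  # 3.69
```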

The performance improvement achieved by using custom instructions is computed as in formula (14):

where Σ_{i=1}^{n} SW(i) is the accumulated software latency of all elementary instructions in the source code of the original application (n is the number of elementary instructions in the source code).
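Formula (14) itself is missing from this copy. A common definition of speedup, and the one assumed in this hedged reconstruction, is the ratio of the all-software latency of the original code to the latency L_h from formula (13):

```latex
\mathrm{Speedup} = \frac{\sum_{i=1}^{n} SW(i)}{L_h}
```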

In this embodiment, the custom instruction selection method of the invention is compared with the custom instruction selection methods of Kamal et al. and Xiao et al. The area constraint is set to 10%, 30% and 50% of a reference area, where the reference area is the total area of the custom instructions selected by the performance-maximizing greedy algorithm proposed by Bonzini et al. For the 9 benchmarks listed in Table 1, the number of selected instructions (NS) and the performance improvement (PI) of the three methods are compared in Table 3.

Table 3 Comparison of experimental results of custom instruction selection methods

In this embodiment, the parameters γ, ε, α and β of the multi-objective optimization model of the invention are all set to 0.5. As the area constraint is relaxed, the performance improvement obtained by all three methods tends to increase. Compared with the method of Xiao et al., the method of the invention performs better in performance improvement: it achieves an average speedup of 3.12 times, versus 2.81 times for the method of Xiao et al. On the other hand, the method of the invention selects fewer custom instruction instances than the method of Kamal et al.: 58 instructions on average, versus 62. Reducing the number of custom instruction instances reduces the number of instruction fetches and of data transfers between registers and the processor, and thus reduces energy consumption.

In addition, by adjusting the parameters γ and ε of the multi-objective optimization model, the method of the invention can favor either performance improvement or a smaller number of instructions. When γ and ε are set to 1 and 0, respectively, the method of the invention achieves a greater performance improvement than the method of Kamal et al., as shown in Fig. 5 (area constraint 50%). With γ = 1 and ε = 0, the model reduces to selecting the custom instructions that maximize performance improvement under the given area constraint. Because the constraint programming method used by the invention can find the optimal solution, whereas the method of Kamal et al. cannot guarantee an optimal solution, the method of the invention yields a more pronounced performance improvement.

When γ and ε are set to 0 and 1, respectively, the method of the invention selects fewer instruction instances than the method of Xiao et al., as shown in Fig. 6. With γ = 0 and ε = 1, the model reduces to selecting, under the given area constraint, the minimum number of instruction instances that cover the original data flow graph. For every benchmark, the constraint programming method selects the minimum number of instructions, whereas the heuristic method of Xiao et al. fails to find the minimum in most cases.

Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope defined by the claims of the present invention.

Claims

1. An automatic custom instruction identification method based on constraint programming, characterized by comprising two parts: custom instruction enumeration and custom instruction selection;

custom instruction enumeration is realized by building a constraint programming model for enumerating custom instructions and enumerating, from the data flow graph, all subgraphs that satisfy the constraints, the specific method being as follows:

to enumerate all custom instructions that satisfy the given constraints from the data flow graph G(V, E), let subgraph S = (V_s, E_s) be the graph representation of a custom instruction instance, and let I₁ and I₂ denote the set of valid nodes and the set of illegal nodes in G, respectively;

the data flow graph G = (V, E) is a directed acyclic graph; the node set V = {v₁, v₂, ..., v_M} represents the elementary instructions, where M is the number of nodes in the data flow graph; the edge set E represents the data dependences between instructions, and m is the number of edges in the data flow graph;

the given constraints are: the constraint that a custom instruction contains no illegal node, the connectivity constraint of a custom instruction, the constraint that a custom instruction is convex, and the input/output constraint of a custom instruction;

each constraint is modeled separately, and for the enumeration problem the constraint programming method is used to find all custom instructions that satisfy the constraints, completing custom instruction enumeration;

custom instruction selection achieves multi-objective optimization by building a constraint programming model for custom instruction selection, the specific method being as follows:

based on the subgraphs enumerated in the custom instruction enumeration stage, graph isomorphism matching is first performed on all subgraphs;

to build the constraint programming model for the custom instruction selection problem, the following variables are first defined: N is the number of candidate custom instructions found in the enumeration stage; C_i denotes the i-th candidate custom instruction, i = 1, ..., N; custom instruction C_i has n_i instances in the code, c_{i,1}, ..., c_{i,n_i}; the execution frequency of instance c_{i,j} is f_{i,j}; and the processor performance improvement brought by custom instruction C_i and the hardware area required to implement it in the custom functional unit are denoted P_i and A_i, respectively;

because a custom instruction encapsulates multiple elementary instructions, it reduces the number of instruction fetches and the number of data transfers between registers and the processor, thereby reducing processor energy consumption; the objective function that maximizes the energy reduction brought by custom instructions is shown in the following formula:

where E(c_{i,j}) is the number of internal edges of custom instruction instance c_{i,j}; the formula also contains a term for the reduction in the number of instruction fetches and a term for the reduction in the number of data transfers; α and β are weight parameters with α + β = 1;

building on the objective functions of the custom instruction selection model established above, and to simplify the problem, a weight-based method converts the multi-objective optimization problem into a single-objective optimization problem, giving the custom instruction selection model shown in the following formula:

where γ and ε are weight parameters with γ + ε = 1;

given the area constraint specified by the user, and since the hardware of each custom instruction in the custom functional unit occupies a certain area, the area constraint of the custom instructions must be modeled, as shown in the following formula:

where A is the total area budget for the hardware of all custom instructions, specified when the extensible processor is designed; A_i is the hardware area of the i-th custom instruction; and S_i is a binary variable that equals 1 if at least one instance of custom instruction C_i is selected and 0 otherwise, as shown in the following formula:

2. The automatic custom instruction identification method based on constraint programming according to claim 1, characterized in that the specific method of modeling the constraints is as follows:

v_{sel} = 0, \quad \forall v \in I_2

illegal nodes: owing to the limitations of the extensible processor architecture, two kinds of elementary instructions, memory access operations and branch operations, cannot be included in a custom instruction, and the nodes representing these elementary instructions are regarded as illegal nodes;

where the path term denotes that an undirected path exists between node v and node v_k; this constraint can be removed when disconnected subgraphs are to be enumerated;

the convexity constraint requires that, for any two nodes u and v in subgraph S, every path between them passes only through nodes in S; this constraint is modeled as shown in the following formula:

where IN_max and OUT_max are the upper limits on the number of inputs and outputs of a custom instruction; IN_v and OUT_v are the in-degree and out-degree of node v; Pred(u) = {v | v ∈ V, (v, u) ∈ E} and Succ(u) = {v | v ∈ V, (u, v) ∈ E} are the predecessor node set and successor node set of node u; v_in and v_out are the number of inputs and outputs of node v; and m_sel indicates whether node m is selected.

3. The automatic custom instruction identification method based on constraint programming according to claim 1, characterized in that the specific method of performing graph isomorphism matching on all subgraphs is as follows:

given two subgraphs a and b, if a and b are isomorphic, a pattern C_i is created and subgraphs a and b are recorded in pattern C_i as its instances; a pattern is the graph representation of a candidate custom instruction.