CN101076780A - Compiling method, apparatus and computer system for loop in program


Info

Publication number
CN101076780A
Authority
CN
China
Legal status
Granted
Application number
CN200580042539.2A
Other languages
Chinese (zh)
Other versions
CN100583042C (en)
Inventor
吴凡
孙彦孟
Current Assignee
St Wireless
ST Ericsson SA
Original Assignee
Koninklijke Philips Electronics NV
Application filed by Koninklijke Philips Electronics NV
Priority claimed from PCT/IB2005/054089 (WO2006064409A1)
Publication of CN101076780A
Application granted
Publication of CN100583042C
Status: Expired - Fee Related


Abstract

A method, apparatus and computer system for compiling a program that contains a loop of K instructions (K ≥ 2) repeated M times (M ≥ 2). Resource conflict analysis (functional unit conflicts and register conflicts) is performed on the K instructions. The instructions are then divided into a first combined instruction segment, a linking instruction segment and a second combined instruction segment, such that corresponding instructions of the first and second combined instruction segments have no resource conflict. When the program is compiled, the first combined instruction segment of cycle N (N = 2, 3, ..., M) is compiled in parallel with the second combined instruction segment of cycle N-1. This improves loop execution efficiency on VLIW processors.

Description

Method, apparatus and computer system for compiling a loop in a program
Technical field
The present invention relates to a method, apparatus and computer system for compiling a program containing a loop. More specifically, it relates to a method, apparatus and computer system for compiling such a program for a VLIW (very long instruction word) processor.
Background art
In a conventional computer system, the hardware includes a CPU (central processing unit), memory and other components, and the system operates by executing instructions. Conventional instruction-set computers include RISC (reduced instruction set computer) and CISC (complex instruction set computer) designs. VLIW has become an increasingly popular technique in microprocessor design. Compared with RISC and CISC processors, VLIW processors offer low cost, low power consumption, simple structure and high processing speed.
A VLIW processor uses long, fixed-length instructions, and each of these contains several shorter instructions that execute in parallel. During operation, a VLIW processor does not need the multiple complex control circuits that a superscalar processor must use to coordinate parallel execution.
A VLIW processor also packs two or more instructions into an instruction bundle. The compiler schedules the bundles in advance so that the VLIW processor can execute instructions in parallel quickly. The microprocessor therefore does not need to perform complex timing analysis, which superscalar RISC and CISC processors must perform.
A multiple-issue processor can execute several instructions in one clock cycle. There are two kinds of multiple-issue processors:
1. Superscalar processors execute a variable number of instructions per clock cycle. They can be scheduled statically or dynamically by compiling equipment (for example, hardware and/or software), using techniques such as scoreboarding.
2. VLIW (very long instruction word) processors execute a fixed number of instructions, formatted as one larger instruction or as a fixed instruction packet. VLIW processors are inherently statically scheduled.
A VLIW instruction usually comprises several sub-instructions. Each sub-instruction corresponds to a particular functional unit (module) and operation set in the processor. For example, Hennessy, John L. and David A. Patterson, Computer Architecture: A Quantitative Approach (2nd edition), Morgan Kaufmann Publishers, Inc., 1996, pp. 285-289, describes a VLIW instruction that contains two integer operations, two floating-point operations, two memory references and one branch.
A VLIW processor uses multiple independent functional units, and each unit executes one sub-instruction of the VLIW instruction. Scheduling these operations in parallel requires sophisticated compilation schemes and tools.
Fig. 1 shows the relationship between a VLIW instruction and a VLIW processor. As shown in Fig. 1, the VLIW instruction comprises four sub-instructions: ADD int a, b; MUL double c, 3.142; READ d, AR0; and BNZ loop, e. These four sub-instructions correspond to four functional units of the VLIW processor: the integer functional unit (INT FU), the floating-point functional unit (Float FU), the data memory and the program memory.
A conventional VLIW compiler interprets each instruction and generates machine code independently. Each instruction therefore corresponds to one VLIW binary instruction of a specific length (for example, 256 bits). This compilation scheme can waste spare operation capacity, especially in loop structures.
The loop is one of the basic program structures in both high-level and low-level languages. Most DSP (digital signal processing) style applications use many loops for computations such as filtering and correlation. A loop structure lets the processor execute a repeated block of instructions with minimal program memory.
After a conventional compiler interprets the instructions, the loop is expressed as machine (binary) instructions, each occupying 256 bits of program memory. If the loop is repeated K times, the processor needs K cycles to execute the whole loop structure. This assumes that each repetition of the loop takes one cycle and that the loop has zero overhead. One advantage of the conventional compilation method for loops is therefore that it lets the processor execute long, repetitive loop structures with limited program memory.
For non-VLIW processors, the conventional compilation method can be optimal in both program memory usage and loop execution efficiency. For VLIW processors, however, it cannot guarantee loop execution efficiency.
The VLIW instruction set is relatively complex, so the quality of the code a compiler generates has a significant effect on performance. VLIW code also uses many loops, and the running time of loop structures makes up most of the total running time. The execution efficiency of loop structures therefore directly determines the operating efficiency of the whole VLIW processor.
When loops are compiled for VLIW with the conventional method, loop execution efficiency is low and cycles are wasted. The operating efficiency of the whole VLIW processor can then fail to meet requirements.
For example, if a loop in a program repeats M times, compiling it for VLIW with the conventional method wastes 2(M-1) instruction cycles in the VLIW processor. When M is large, this significantly degrades performance.
Summary of the invention
The present invention has been made in view of the problems of the prior art. Its object is to provide a method for compiling a program containing a loop, where the loop comprises K instructions (K ≥ 2) and is repeated M times (M ≥ 2). The compiling method comprises the following steps:
performing resource conflict analysis on the K instructions in the loop;
dividing the K instructions in the loop into a first combined instruction segment, a linking instruction segment and a second combined instruction segment, wherein there is no resource conflict between corresponding instructions of the first combined instruction segment and the second combined instruction segment; and
compiling the program, wherein the instructions of the first combined instruction segment in cycle N (N = 2, 3, ..., M) are compiled in parallel with the instructions of the second combined instruction segment in cycle N-1.
According to another aspect of the invention, there is provided an apparatus for compiling a program containing a loop, where the loop comprises K instructions (K ≥ 2) and is repeated M times (M ≥ 2). The compiling apparatus comprises:
analysis means for performing resource conflict analysis on the K instructions in the loop;
division means for dividing the K instructions in the loop into a first combined instruction segment, a linking instruction segment and a second combined instruction segment, wherein there is no resource conflict between corresponding instructions of the first combined instruction segment and the second combined instruction segment; and
compiling means for compiling the program, wherein the instructions of the first combined instruction segment in cycle N (N = 2, 3, ..., M) are compiled in parallel with the instructions of the second combined instruction segment in cycle N-1.
According to a further aspect of the invention, there is provided a computer system comprising a memory, input and output devices, and a compiling apparatus for a program containing a loop, where the loop comprises K instructions (K ≥ 2) and is repeated M times (M ≥ 2). The compiling apparatus comprises:
analysis means for performing resource conflict analysis on the K instructions in the loop;
division means for dividing the K instructions in the loop into a first combined instruction segment, a linking instruction segment and a second combined instruction segment, wherein there is no resource conflict between corresponding instructions of the first combined instruction segment and the second combined instruction segment; and
compiling means for compiling the program, wherein the instructions of the first combined instruction segment in cycle N (N = 2, 3, ..., M) are compiled in parallel with the instructions of the second combined instruction segment in cycle N-1.
Compiling a program containing a loop with the method, apparatus or computer system of the present invention can significantly improve the cycle efficiency of the program.
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, given with reference to the accompanying drawings.
Brief description of the drawings
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 schematically shows the relationship between a conventional VLIW instruction and a VLIW processor;
Fig. 2 is a flowchart of the compiling method for a program containing a loop according to the first embodiment of the invention;
Fig. 3 is a flowchart of the resource conflict analysis step of the compiling method shown in Fig. 2;
Fig. 4(a) shows a loop with an even number of instructions compiled using the compiling method according to the first embodiment;
Fig. 4(b) shows a loop with an odd number of instructions compiled using the compiling method according to the first embodiment;
Fig. 5 shows an example of a VLIW loop and its compilation result;
Fig. 6 shows how the instructions of three adjacent cycles executing the same loop are divided into segments, and how segments of adjacent cycles are combined;
Fig. 7 shows an example of the concrete instructions of three adjacent cycles executing the same loop;
Fig. 8 shows the instructions of Fig. 7 after the first and second combined instruction segments of the three adjacent cycles have been combined using the compiling method according to the first embodiment;
Fig. 9 shows the compilation result of the combined instructions shown in Fig. 8; and
Fig. 10 schematically shows a compiling apparatus for implementing the compiling method according to the second embodiment of the invention.
Embodiment
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
For clarity, the terms used in this application are explained as follows:
A program is a sequence of instructions that a computer can execute. A loop is a group of statements that the program repeats until it has been executed a fixed number of times, or until some condition becomes true or false. A cycle is one execution of the loop. An instruction is an imperative statement written in a computer language.
In the VLIW processor according to the first embodiment of the invention, resource conflict analysis is introduced into the compiling method for a program containing a loop. In other words, when such a program is compiled, each instruction in the loop is analyzed. The resource conflict analysis has two parts:
1. functional unit conflict analysis; And
2. register conflict analysis.
Functional unit conflict analysis prevents two instructions from competing for the functional units they need. Register conflict analysis checks for data dependencies between two instructions. Resource conflict analysis identifies which instructions can execute in parallel across two adjacent cycles of the same loop. This improves cycle execution efficiency without changing the function of the program.
"No resource conflict" between two instructions means that there is neither a functional unit conflict nor a register conflict between them.
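The two conflict tests can be pictured as set intersections over what each instruction uses. The Python sketch below is illustrative only. The `Instr` record, its field names and the functional-unit labels are hypothetical modeling choices, not part of the patent. The register test follows the conservative rule of step S205, described below, under which sharing any source or destination register counts as a conflict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical model of one loop instruction.

    text      -- assembly text, kept for display only
    units     -- functional units the instruction occupies
    registers -- every source and destination register it reads or writes
    """
    text: str
    units: frozenset
    registers: frozenset


def fu_conflict(a: Instr, b: Instr) -> bool:
    """Functional-unit conflict: both instructions need some common unit."""
    return bool(a.units & b.units)


def reg_conflict(a: Instr, b: Instr) -> bool:
    """Register conflict: some register appears in both instructions.

    Conservative: two reads of the same register also count as a conflict.
    """
    return bool(a.registers & b.registers)


def resource_conflict(a: Instr, b: Instr) -> bool:
    """'No resource conflict' means neither kind of conflict exists."""
    return fu_conflict(a, b) or reg_conflict(a, b)


# Instructions A, B and E of Fig. 5 (unit labels are assumptions).
A = Instr("READ a, AR0+; READ b, AR1+", frozenset({"read"}),
          frozenset({"a", "b", "AR0", "AR1"}))
B = Instr("ADD int a, b, c; SUB int b, a, d", frozenset({"alu"}),
          frozenset({"a", "b", "c", "d"}))
E = Instr("MUL int e, f, g", frozenset({"mul"}), frozenset({"e", "f", "g"}))

print(resource_conflict(A, E))  # False
print(reg_conflict(A, B))       # True
```

Modeling the resources as sets keeps both checks symmetric and independent of instruction order. That matches the definition above, which only asks whether the two instructions share anything.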
Various algorithms in the art can select instructions so as to satisfy the rules for avoiding functional unit conflicts and register conflicts. For example, the syntax checking function of a compiler can be used.
The compilation results of different compilers are the same, but different algorithms may produce different effects. The algorithm used directly affects the complexity of the compiler: the more intelligent the compiler, the more complex its processing.
Fig. 2 is a flowchart of the compiling method for a program containing a loop according to the first embodiment of the invention. The loop comprises K instructions (K is an integer equal to or greater than 2) and is repeated M times (M ≥ 2) in the program.
As shown in Fig. 2, the compiling method for a program containing a loop comprises the following basic steps:
In step S101, resource conflict analysis is performed on the K instructions in the loop to determine whether resource conflicts exist between the respective instructions.
Then, in step S102, the K instructions in the loop are divided into a first combined instruction segment, a linking instruction segment and a second combined instruction segment, based on the analysis result of step S101. There is no resource conflict between corresponding instructions of the first and second combined instruction segments, so those instructions can be compiled in parallel. The instructions in the linking instruction segment, by contrast, have resource conflicts with instructions in the first or second combined instruction segment.
In step S103, the program is compiled. The first combined instruction segment of cycle N (N = 2, 3, ..., M) and the second combined instruction segment of cycle N-1 are compiled in combination.
The resource conflict analysis of step S101 in Fig. 2 is described in detail below.
Fig. 3 shows the resource conflict analysis of step S101 in detail for the case where K is even.
First, two variables i and j are set and initialized to 0 and K/2, respectively (step S201). These initial values are chosen only for convenience in drawing the flowchart.
According to the first embodiment of the invention, when K is even, i ranges over 1 ≤ i ≤ K/2 and j ranges over (K/2 + 1) ≤ j ≤ K. In other words, i points to an instruction in the first half of the loop, and j points to an instruction in the second half.
Then, i is incremented (i = i + 1) in step S202, and j is incremented (j = j + 1) in step S203.
Next, step S204 determines whether the functional unit that executes the i-th instruction in the loop conflicts with the functional unit that executes the j-th instruction.
If the two instructions use the same functional unit during execution, there is a functional unit conflict between them; otherwise there is none.
If the result of step S204 is "No", the flow proceeds to step S205. Step S205 further determines whether the registers used by the i-th instruction conflict with those used by the j-th instruction.
If any source or destination register of the i-th instruction is the same as any source or destination register of any of the j-th to K-th instructions, there is a register conflict between the i-th and j-th instructions. Otherwise there is none.
If the result of step S204 is "Yes", there is a functional unit conflict between the two instructions. The flow then returns to step S203 and again determines (step S204) whether there is a functional unit conflict, this time between the first-half instruction pointed to by i and the next second-half instruction pointed to by j.
If the result of step S205 is "No", j is incremented (step S206). The flow then determines whether the registers of the i-th instruction conflict with those of the j-th instruction (step S207).
If the result of step S205 is "Yes", the flow returns to step S203.
If the result of step S207 is "No", the flow determines whether j is less than K (step S208). If so, the flow returns to step S206; otherwise it proceeds to step S209.
If the result of step S207 is "Yes", the flow returns to step S203.
In step S209, the two instructions have neither a functional unit conflict nor a register conflict, so it is determined that there is no resource conflict between them.
This completes the check for resource conflicts between one instruction in the first half of the loop and one instruction in the second half, together with that instruction's successors.
In step S210, it is determined whether i is less than K/2. If so, the flow returns to step S202. Otherwise, the resource conflict analysis of the loop instructions ends and the flow proceeds to step S102 (Fig. 2).
From step S202, the flow checks resource conflicts for the next instruction in the first half against the next instruction in the second half and its successor instructions.
When K is odd, the resource conflict analysis is similar to the even case, except that i ranges over 1 ≤ i ≤ (K+1)/2 - 1 and j ranges over (K+1)/2 + 1 ≤ j ≤ K. In this case, conflicts between the center instruction and the other instructions in the loop are not considered.
The compiling method of the first embodiment shown in Figs. 2 and 3 applies to loops in which the first n instructions match the last n instructions, that is, have no resource conflict with them. The first n and last n instructions must each be consecutive and must match one-to-one. The K-2n instructions in the middle of the loop serve as the linking instruction segment, which is compiled sequentially when the program is compiled.
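The pairing rule can be sketched compactly in Python. The sketch assumes the same hypothetical `Instr` model, with functional-unit and register sets per instruction. Rather than transcribing the Fig. 3 flowchart step by step, it checks the one-to-one pairing directly. Instruction i of the first n is paired with instruction K-n+i of the last n. It must not share a functional unit with that partner, and it must not share a register with the partner or with any later instruction. The function tries n = K/2 first (rounded down, which leaves out the center instruction when K is odd) and then smaller n. Smaller n produces splits like the alternative A, B / C, D, E, F / G, H case discussed for Fig. 4(a). The function names are illustrative, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction model: occupied units and touched registers."""
    text: str
    units: frozenset
    registers: frozenset


def pair_ok(loop, i, j):
    """Can instruction i of cycle N run alongside instruction j of cycle N-1?

    Indices are 0-based. Requires no shared functional unit with j and no
    shared register with any of instructions j, j+1, ..., K-1 (after
    combining, these no longer finish before instruction i issues).
    """
    if loop[i].units & loop[j].units:
        return False
    return all(not (loop[i].registers & loop[k].registers)
               for k in range(j, len(loop)))


def partition(loop):
    """Split a loop body into (first, linking, second) instruction segments.

    Tries the largest n first (K // 2, which skips the center instruction
    when K is odd) and falls back to smaller n. Instruction i of the first
    n is paired one-to-one with instruction K - n + i of the last n.
    """
    k = len(loop)
    for n in range(k // 2, 0, -1):
        if all(pair_ok(loop, i, k - n + i) for i in range(n)):
            return list(loop[:n]), list(loop[n:k - n]), list(loop[k - n:])
    return [], list(loop), []  # no pairing: the whole body stays sequential
```

Applied to the five instructions of the Fig. 5 loop, with unit labels assumed as before, this yields A and B as the first combined segment, the MAC/ADD instruction as the linking segment, and E and F as the second combined segment.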
Fig. 4(a) shows a loop with an even number of instructions compiled using the compiling method for a program containing a loop according to the first embodiment of the invention.
The loop shown in Fig. 4(a) comprises eight instructions (K = 8): A, B, C, D, E, F, G and H. The instruction resource conflict analysis of the invention determines the following:
there is no resource conflict between A and E (A and E have neither a functional unit conflict nor a register conflict, and A has no register conflict with F, G or H);
there is no resource conflict between B and F (B and F have neither kind of conflict, and B has no register conflict with G or H);
there is no resource conflict between C and G (C and G have neither kind of conflict, and C has no register conflict with H); and
there is no resource conflict between D and H (D and H have neither kind of conflict).
The arrows in Fig. 4(a) show the matching (conflict-free) relations between the instructions. A double-headed arrow indicates that two instructions have neither a functional unit conflict nor a register conflict, and a single-headed arrow indicates that they have no register conflict.
As shown in Fig. 4(a), the loop is divided into a first combined instruction segment (instructions A, B, C and D) and a second combined instruction segment (instructions E, F, G and H). In this case, the linking instruction segment contains no instructions.
When the program is compiled, instructions E, F, G and H of cycle N-1 can be compiled in parallel with instructions A, B, C and D of cycle N, respectively. Likewise, instructions E, F, G and H of cycle N can be compiled in parallel with instructions A, B, C and D of cycle N+1. The execution time of the loop is therefore significantly shortened.
The analysis might instead determine that only A and B are free of resource conflicts with G and H, respectively. In that case the first combined instruction segment comprises A and B, the second comprises G and H, and the linking instruction segment comprises C, D, E and F. The loop then takes somewhat longer to execute than in the case shown in Fig. 4(a).
Fig. 4(b) shows a loop with an odd number of instructions compiled using the compiling method for a program containing a loop according to the first embodiment of the invention. The loop shown in Fig. 4(b) comprises seven instructions (K = 7): A, B, C, D, E, F and G. The instruction resource conflict analysis of the invention determines the following:
there is no resource conflict between A and E (A and E have neither a functional unit conflict nor a register conflict, and A has no register conflict with F or G);
there is no resource conflict between B and F (B and F have neither kind of conflict, and B has no register conflict with G); and
there is no resource conflict between C and G (C and G have neither kind of conflict).
Conflicts between the center instruction D and the other instructions in the loop are not considered. This differs from the case where K is even.
In the case shown in Fig. 4(b), the first combined instruction segment comprises instructions A, B and C, the second combined instruction segment comprises instructions E, F and G, and the linking instruction segment comprises instruction D.
When the program is compiled, instructions E, F and G of cycle N-1 can be compiled in parallel with instructions A, B and C of cycle N, respectively. Likewise, instructions E, F and G of cycle N can be compiled in parallel with instructions A, B and C of cycle N+1.
The instructions of the linking instruction segment in each cycle are compiled sequentially.
Fig. 5 shows a VLIW loop. The box on the left shows the loop in assembly language, and the box on the right shows its compilation result.
In the loop shown in Fig. 5, instruction A (READ a, AR0+; READ b, AR1+;) involves only the read functional unit, and the registers it operates on are a and b. Instruction E (MUL int e, f, g;) involves only the arithmetic functional unit, and the registers it operates on are e, f and g. A and E therefore have neither a functional unit conflict nor a register conflict, and there is also no register conflict between A and F. Similarly, instruction B (ADD int a, b, c; SUB int b, a, d;) and instruction F (WRITE g, AR2+;) occupy different functional units, and there is no register conflict between B and F either.
Instructions E and F of cycle N can therefore be compiled in parallel with instructions A and B of cycle N+1, respectively.
Fig. 6 shows how the instructions of three adjacent cycles executing the same loop are divided into segments, and how segments of adjacent cycles are combined.
As shown on the left of Fig. 6, if the instructions are compiled by the conventional method, the compiler compiles the instruction segments of cycles N-1, N and N+1 in the following order:
the first combined instruction segment, linking instruction segment and second combined instruction segment of cycle N-1;
the first combined instruction segment, linking instruction segment and second combined instruction segment of cycle N; and
the first combined instruction segment, linking instruction segment and second combined instruction segment of cycle N+1.
In this case, nine instruction segments must be compiled.
As shown on the right of Fig. 6, the present invention combines the second combined instruction segment of cycle N-1 with the first combined instruction segment of cycle N for compilation. It likewise combines the second combined instruction segment of cycle N with the first combined instruction segment of cycle N+1. In this case, only seven instruction segments need to be compiled.
The combination therefore moves the first combined instruction segment of cycle N up so that it executes in parallel with the second combined instruction segment of cycle N-1. It likewise moves the first combined instruction segment of cycle N+1 up so that it executes in parallel with the second combined instruction segment of cycle N.
Unlike the intermediate cycles, the first combined instruction segment of the first cycle of the loop and the second combined instruction segment of the last cycle are not combined with adjacent instruction segments. They are compiled and executed sequentially.
Combining instructions in this way reduces the number of VLIW instructions in the whole loop structure.
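The saving can be counted directly. Assume, as in the conventional scheme described in the background, that each loop instruction occupies one VLIW instruction. Assume also that each paired instruction fits into one VLIW instruction together with its partner. A loop of K instructions with n paired instructions, repeated M times, then needs K·M VLIW instructions conventionally and K·M - n(M - 1) after combining. The helper names below are hypothetical.

```python
def sequential_bundles(k: int, m: int) -> int:
    """VLIW instructions when each of the K loop instructions is issued alone."""
    return k * m


def combined_bundles(k: int, n: int, m: int) -> int:
    """VLIW instructions after combining segments of adjacent cycles.

    k: instructions in the loop body, n: paired instructions per combined
    segment, m: number of cycles. Cycle 1 issues its first segment alone
    and cycle M its second segment alone; at each of the M - 1 boundaries
    between cycles, n instructions of cycle N-1 share VLIW instructions
    with n instructions of cycle N, saving n per boundary.
    """
    if k < 2 or m < 2 or not 0 <= 2 * n <= k:
        raise ValueError("expected K >= 2, M >= 2 and 0 <= 2n <= K")
    return k * m - n * (m - 1)


print(sequential_bundles(3, 3), combined_bundles(3, 1, 3))  # 9 7
print(sequential_bundles(5, 3), combined_bundles(5, 2, 3))  # 15 11
```

Counting whole segments as units (K = 3, n = 1, M = 3) reproduces the nine and seven segments of Fig. 6. The loop of Fig. 7 (K = 5, n = 2, M = 3) drops from 15 to 11 VLIW instructions, a saving of 2(M - 1).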
Fig. 7 shows an example of the concrete instructions of three adjacent cycles executing the same loop. Dashed lines separate the instructions executed in cycles N-1, N and N+1.
Fig. 8 shows the instructions obtained after the first and second combined instruction segments of the concrete instructions of Fig. 7 have been combined using the compiling method according to the first embodiment of the invention.
With the compiling method for a program containing a loop according to the first embodiment of the invention, the loop shown in Fig. 7 can be divided into the following instruction segments:
The first combined command section:
READ a, AR0+; READ b, AR1+;
ADD int a, b, c; SUB int b, a, d;
The second combined command section:
MUL int e, f, g;
WRITE g, AR2+;
The link order section:
MAC int c, d, e; ADD int a, 32, f;
As can be seen from Fig. 8, the first instruction of the second combined command section in cycle N-1, "MUL int e, f, g", is combined for compilation with the first instruction of the first combined command section in cycle N, "READ a, AR0+; READ b, AR1+"; and the second instruction of the second combined command section in cycle N-1, "WRITE g, AR2+", is combined for compilation with the second instruction of the first combined command section in cycle N, "ADD int a, b, c; SUB int b, a, d". Similarly, the first instruction of the second combined command section in cycle N, "MUL int e, f, g", is combined for compilation with the first instruction of the first combined command section in cycle N+1, "READ a, AR0+; READ b, AR1+"; and the second instruction of the second combined command section in cycle N, "WRITE g, AR2+", is combined for compilation with the second instruction of the first combined command section in cycle N+1, "ADD int a, b, c; SUB int b, a, d".
Fig. 9 shows the compilation result of the combined instructions shown in Fig. 8.
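The pairing in Fig. 8 can likewise be sketched in Python, treating each VLIW instruction as a string and merging two instructions by concatenating their operations. This is an illustration under assumptions: the function name `schedule` is hypothetical, the sketch presumes the conflict analysis has already cleared every pair, and it pairs the k-th instruction of the second section with the k-th instruction of the first section, as Fig. 8 describes.

```python
# Fig. 7 loop body, one string per VLIW instruction (operations joined by "; ").
FIRST = ["READ a, AR0+; READ b, AR1+", "ADD int a, b, c; SUB int b, a, d"]
LINK = ["MAC int c, d, e; ADD int a, 32, f"]
SECOND = ["MUL int e, f, g", "WRITE g, AR2+"]

def schedule(m):
    """Return the VLIW words for m cycles of the loop.

    The k-th instruction of the second combined command section of cycle n
    is merged with the k-th instruction of the first combined command
    section of cycle n + 1; both sections have the same length."""
    words = list(FIRST)                     # cycle 1: first section alone
    for n in range(1, m + 1):
        words.extend(LINK)                  # link section is never merged
        if n < m:
            words.extend(f"{s}; {f}" for s, f in zip(SECOND, FIRST))
    words.extend(SECOND)                    # cycle m: second section alone
    return words

if __name__ == "__main__":
    for word in schedule(3):
        print(word)
    print(len(schedule(3)))  # 11 VLIW words instead of 3 * 5 = 15
```

For M cycles the sketch emits 3M + 2 VLIW words, which agrees with the count derived for the Fig. 5 loop.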
Fig. 10 schematically shows a compiling apparatus for implementing the compilation method according to the second embodiment of the invention, where the loop in the program comprises K instructions (K ≥ 2) and is repeated M times (M ≥ 2).
As shown in Fig. 10, the compiling apparatus comprises:
a resource conflict analysis unit 500, for performing resource conflict analysis on the K instructions of the loop;
an instruction division unit 530, for dividing the K instructions of the loop into a first combined command section, a link order section and a second combined command section, wherein there is no resource conflict between any instruction in the first combined command section and any instruction in the second combined command section. When K is even, the link order section comprises an even number of instructions that have resource conflicts with each other in pairs; when K is odd, the link order section comprises the center instruction of the loop together with an even number of instructions that have resource conflicts with each other in pairs; and
a compiler 540, for compiling the program, wherein the first combined command section in cycle N (N = 2, 3, ..., M) is combined with the second combined command section in cycle N-1 for compilation.
The resource conflict analysis unit 500 comprises a functional unit conflict analysis unit 510 and a register conflict analysis unit 520.
The functional unit conflict analysis unit 510 determines, successively, whether the functional unit that executes the i-th of the K instructions conflicts with the functional unit that executes the j-th instruction. When K is even, 1 ≤ i ≤ K/2 and (K/2+1) ≤ j ≤ K. When K is odd, 1 ≤ i ≤ ((K+1)/2-1) and ((K+1)/2+1) ≤ j ≤ K.
The register conflict analysis unit 520 determines, successively, whether the registers used by the i-th of the K instructions conflict with the registers used by any of the j-th to K-th instructions. When K is even, 1 ≤ i ≤ K/2 and (K/2+1) ≤ j ≤ K. When K is odd, 1 ≤ i ≤ ((K+1)/2-1) and ((K+1)/2+1) ≤ j ≤ K.
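A minimal sketch of the two checks performed by units 510 and 520, in Python. The `Instr` model, the unit names and the rule for what counts as a register conflict (a write by either instruction to a register the other reads or writes) are assumptions made for illustration; only the i and j index ranges come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    """Hypothetical model of one loop instruction."""
    name: str
    units: frozenset = frozenset()   # functional units it occupies
    reads: frozenset = frozenset()   # registers it reads
    writes: frozenset = frozenset()  # registers it writes

def index_ranges(k):
    """1-based ranges of i and j as given in the description."""
    if k % 2 == 0:
        return range(1, k // 2 + 1), range(k // 2 + 1, k + 1)
    mid = (k + 1) // 2               # the center instruction is skipped
    return range(1, mid), range(mid + 1, k + 1)

def unit_conflict(a, b):
    """Functional unit conflict: both instructions need the same unit."""
    return bool(a.units & b.units)

def register_conflict(a, b):
    """Assumed rule: a write by either instruction to a register that the
    other instruction reads or writes counts as a conflict."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)

def conflicting_pairs(body):
    """Return every (i, j) pair that would block combination."""
    ii, jj = index_ranges(len(body))
    return [(i, j) for i in ii for j in jj
            if unit_conflict(body[i - 1], body[j - 1])
            or register_conflict(body[i - 1], body[j - 1])]

fs = frozenset
EXAMPLE = [  # hypothetical four-instruction loop body
    Instr("LD r1", fs({"LSU"}), fs(), fs({"r1"})),
    Instr("ADD r2", fs({"ALU"}), fs({"r1"}), fs({"r2"})),
    Instr("MUL r3", fs({"MUL"}), fs({"r4"}), fs({"r3"})),
    Instr("ST r3", fs({"LSU"}), fs({"r3"}), fs()),
]
print(conflicting_pairs(EXAMPLE))  # [(1, 4)]: both need the load/store unit
```

Here the register check finds nothing, but instructions 1 and 4 compete for the same hypothetical load/store unit, so that pair could not be issued together.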
In this case, the instructions of the first combined command section in cycle N (N = 2, 3, ..., M) and the respective instructions of the second combined command section in cycle N-1 have neither functional unit conflicts nor register conflicts, so they can be combined pairwise for compilation.
Similarly, the instructions of the second combined command section in cycle N and the respective instructions of the first combined command section in cycle N+1 have neither functional unit conflicts nor register conflicts, so they can be combined pairwise for compilation.
In addition, the instructions of the link order section in each cycle can be compiled sequentially.
If the number K of instructions in the loop is even and no instruction in the first half conflicts with any instruction in the second half, the first-half instructions form the first combined command section and the second-half instructions form the second combined command section. The link order section then contains zero instructions; that is, there is no link order section.
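One way to realize the division described above, sketched in Python under an explicit assumption: the text does not say how the split is chosen, so this sketch takes the largest p such that none of the first p instructions conflicts with any of the last p, and places the remaining middle instructions in the link order section. The register sets for the Fig. 7 instructions are likewise an assumed reading of the mnemonics, and functional units are ignored for brevity. With these assumptions the sketch reproduces the Fig. 7 division listed earlier, and for an even K without conflicts the link order section comes out empty.

```python
def partition(body, conflicts):
    """Split a loop body into (first, link, second) sections.

    Assumed strategy: choose the largest p such that no instruction among
    the first p conflicts with any among the last p; the K - 2p middle
    instructions form the link order section."""
    k = len(body)
    for p in range(k // 2, 0, -1):
        head, tail = body[:p], body[k - p:]
        if not any(conflicts(a, b) for a in head for b in tail):
            return head, body[p:k - p], tail
    return [], list(body), []          # nothing can be combined

# Fig. 7 loop as (text, registers read, registers written); the register
# sets are an assumed reading of the mnemonics (MAC taken to accumulate).
FIG7 = [
    ("READ a, AR0+; READ b, AR1+", {"AR0", "AR1"}, {"a", "b", "AR0", "AR1"}),
    ("ADD int a, b, c; SUB int b, a, d", {"a", "b"}, {"c", "d"}),
    ("MAC int c, d, e; ADD int a, 32, f", {"a", "c", "d", "e"}, {"e", "f"}),
    ("MUL int e, f, g", {"e", "f"}, {"g"}),
    ("WRITE g, AR2+", {"g", "AR2"}, {"AR2"}),
]

def register_conflict(x, y):
    """Register-only check; functional units are ignored in this sketch."""
    _, rx, wx = x
    _, ry, wy = y
    return bool(wx & (ry | wy) or wy & rx)

first, link, second = partition(FIG7, register_conflict)
for label, sec in (("first", first), ("link", link), ("second", second)):
    print(label, [text for text, _, _ in sec])
```

Because p shrinks one instruction from each end at a time, the link order section always holds an even number of instructions, plus the center instruction when K is odd, which is consistent with the description of unit 530.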
A third embodiment of the invention further provides a computer system comprising a memory, input and output devices (not shown), and a compiling apparatus for a program containing a loop. The compiling apparatus is the same as the compiling apparatus according to the second embodiment described with reference to Fig. 10.
Using the compilation method, compiling apparatus or computer system for a program containing a loop according to the invention can significantly improve the loop efficiency of the program, as the following example shows.
Take the loop shown in Fig. 5 as an example, and suppose the loop is repeated K times (here K denotes the repetition count, not the instruction count used above). Since the loop comprises 5 instructions, the conventional approach needs 5K instruction cycles in total for the whole loop structure. With the compilation method for a program containing a loop according to the invention, every cycle except the first and the last has two instructions compiled in parallel with two instructions of the previous cycle, and two other instructions compiled in parallel with two instructions of the following cycle, so only 5 + (K-1) × (5-2) = 3K + 2 instruction cycles are needed in total. Compared with the conventional compilation method, the method according to the invention therefore saves 5K - (3K + 2) = 2K - 2 instruction cycles.
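The cycle counts for the Fig. 5 loop follow from a one-line formula; the Python sketch below checks them for a few repetition counts. The function names and parameters (`instrs` instructions per iteration, `combined` instructions per combined command section) are hypothetical.

```python
def cycles_conventional(instrs, reps):
    """Every iteration issues all of its VLIW instructions on its own."""
    return instrs * reps

def cycles_combined(instrs, combined, reps):
    """The first iteration costs `instrs` cycles; each later iteration saves
    `combined` cycles because its first combined command section issues
    together with the previous iteration's second combined command section."""
    return instrs + (reps - 1) * (instrs - combined)

for reps in (2, 10, 256):
    conv = cycles_conventional(5, reps)    # Fig. 5: 5 instructions
    comb = cycles_combined(5, 2, reps)     # 2 instructions per combined section
    print(reps, conv, comb, conv - comb)   # savings equal 2K - 2
```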
Suppose that the instructions of one combined command section account for n% of the instructions of the whole loop, and that the loop is repeated K times. With the compilation method for a program containing a loop according to the invention, the relative improvement in the loop efficiency of the program is:
$$E = 1 - \frac{(1 - n\%)(K - 1) + 1}{K} = \frac{(K - 1)\cdot n\%}{K}$$
When K is large, the efficiency improvement approaches n%.
For the case shown in Fig. 5, with n% = 2/5 = 40% and K = 256, the efficiency improvement is about 39.84% (approximately 40%).
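The efficiency formula and the 39.84% figure can be checked with exact rational arithmetic. This Python sketch assumes, as the Fig. 5 numbers imply, that n% is the share of the loop held by one combined command section; `efficiency_gain` is a hypothetical name.

```python
from fractions import Fraction

def efficiency_gain(n, reps):
    """E = 1 - ((1 - n)(K - 1) + 1) / K with K = reps.

    `n` is read as the fraction of the loop's instructions that sit in one
    combined command section (2/5 for the Fig. 5 loop)."""
    return 1 - ((1 - n) * (reps - 1) + 1) / reps

n = Fraction(2, 5)
e = efficiency_gain(n, 256)
assert e == (256 - 1) * n / 256                   # simplified form
assert e == 1 - Fraction(3 * 256 + 2, 5 * 256)    # 1 - (3K + 2) / 5K
print(float(e))                                   # 0.3984375, about 39.84%
```

The second assertion ties the formula back to the cycle counts: 1 - 770/1280 gives the same 51/128.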
Although the present invention has been described with reference to the structures described herein, it is not limited to the details described, and this application is intended to cover various modifications and changes that fall within the spirit and scope of the appended claims.

Claims (12)

1. A method for compiling a program containing a loop, the loop in the program comprising K instructions and being repeated M times, where K ≥ 2 and M ≥ 2, the compilation method comprising the following steps:
a) performing resource conflict analysis on the K instructions in the loop;
b) dividing the K instructions in the loop into a first combined command section, a link order section and a second combined command section, wherein there is no resource conflict between any instruction in the first combined command section and any instruction in the second combined command section; and
c) compiling the program, wherein the instructions in the first combined command section in cycle N (N = 2, 3, ..., M) are compiled in parallel with the respective instructions in the second combined command section in cycle N-1.
2. The method according to claim 1, wherein step a) comprises:
when K is even, for the K instructions, determining successively and repeatedly whether the functional unit used to execute the i-th instruction conflicts with the functional unit used to execute the j-th instruction, where 1 ≤ i ≤ K/2 and (K/2+1) ≤ j ≤ K; and
for the K instructions, determining successively whether the register used to execute the i-th instruction conflicts with the register used by any one of the j-th to K-th instructions, where 1 ≤ i ≤ K/2 and (K/2+1) ≤ j ≤ K.
3. The method according to claim 1, wherein step a) comprises:
when K is odd, for the K instructions, determining successively and repeatedly whether the functional unit used to execute the i-th instruction conflicts with the functional unit used to execute the j-th instruction, where 1 ≤ i ≤ ((K+1)/2-1) and ((K+1)/2+1) ≤ j ≤ K; and
for the K instructions, determining successively whether the register used to execute the i-th instruction conflicts with the register used by any one of the j-th to K-th instructions, where 1 ≤ i ≤ ((K+1)/2-1) and ((K+1)/2+1) ≤ j ≤ K.
4. The method according to claim 1, wherein the instructions in the link order section have resource conflicts with instructions in the first or second combined command section.
5. The method according to claim 4, wherein the number of instructions in the link order section is zero.
6. The method according to claim 1, wherein the number of instructions in the first combined command section equals the number of instructions in the second combined command section.
7. The method according to claim 1, wherein the instructions in the link order section of each cycle are compiled sequentially.
8. The method according to claim 1, wherein the instructions in the first combined command section of the first cycle and in the second combined command section of the M-th cycle are compiled sequentially.
9. A compiling apparatus for compiling a program containing a loop, the loop in the program comprising K instructions and being repeated M times, where K ≥ 2 and M ≥ 2, the compiling apparatus comprising:
analysis means for performing resource conflict analysis on the K instructions of the loop;
division means for dividing the K instructions of the loop into a first combined command section, a link order section and a second combined command section, wherein there is no resource conflict between any instruction in the first combined command section and any instruction in the second combined command section; and
compilation means for compiling the program, wherein the instructions in the first combined command section in cycle N (N = 2, 3, ..., M) are compiled in parallel with the respective instructions in the second combined command section in cycle N-1.
10. The apparatus according to claim 9, wherein the analysis means comprises:
means for determining, for the K instructions, successively and repeatedly whether the functional unit used to execute the i-th instruction conflicts with the functional unit used to execute the j-th instruction; and
means for determining, for the K instructions, successively whether the register used to execute the i-th instruction conflicts with the register used by any one of the j-th to K-th instructions,
wherein, when K is even, 1 ≤ i ≤ K/2 and (K/2+1) ≤ j ≤ K, and when K is odd, 1 ≤ i ≤ ((K+1)/2-1) and ((K+1)/2+1) ≤ j ≤ K.
11. A computer system comprising a memory, input and output devices, and a compiling apparatus for a program containing a loop, the loop in the program comprising K instructions and being repeated M times, where K ≥ 2 and M ≥ 2, the compiling apparatus comprising:
analysis means for performing resource conflict analysis on the K instructions of the loop;
division means for dividing the K instructions of the loop into a first combined command section, a link order section and a second combined command section, wherein there is no resource conflict between any instruction in the first combined command section and any instruction in the second combined command section; and
compilation means for compiling the program, wherein the instructions in the first combined command section in cycle N (N = 2, 3, ..., M) are compiled in parallel with the respective instructions in the second combined command section in cycle N-1.
12. The computer system according to claim 11, wherein the analysis means comprises:
means for determining, for the K instructions, successively and repeatedly whether the functional unit used to execute the i-th instruction conflicts with the functional unit used to execute the j-th instruction; and
means for determining, for the K instructions, successively whether the register used to execute the i-th instruction conflicts with the register used by any one of the j-th to K-th instructions,
wherein, when K is even, 1 ≤ i ≤ K/2 and (K/2+1) ≤ j ≤ K, and when K is odd, 1 ≤ i ≤ ((K+1)/2-1) and ((K+1)/2+1) ≤ j ≤ K.

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN200410098827 2004-12-13
CN200410098827.7 2004-12-13
PCT/IB2005/054089 WO2006064409A1 (en) 2004-12-13 2005-12-07 Compiling method, compiling apparatus and computer system for a loop in a program

Publications (2)

Publication Number Publication Date
CN101076780A true CN101076780A (en) 2007-11-21
CN100583042C CN100583042C (en) 2010-01-20

Family

ID=38982532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200580042539A Expired - Fee Related CN100583042C (en) 2004-12-13 2005-12-07 Compiling method, apparatus for loop in program

Country Status (1)

Country Link
CN (1) CN100583042C (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116485A (en) * 2013-01-30 2013-05-22 西安电子科技大学 Assembler designing method based on specific instruction set processor for very long instruction words
CN103116485B (en) * 2013-01-30 2015-08-05 西安电子科技大学 A kind of assembler method for designing based on very long instruction word ASIP
CN108279976A (en) * 2017-12-26 2018-07-13 努比亚技术有限公司 A kind of compiling resource regulating method, computer and computer readable storage medium
CN108279976B (en) * 2017-12-26 2021-11-19 努比亚技术有限公司 Compiling resource scheduling method, computer and computer readable storage medium
CN109933368A (en) * 2019-03-12 2019-06-25 苏州中晟宏芯信息科技有限公司 A kind of transmitting of instruction and verification method and device
CN109933368B (en) * 2019-03-12 2023-07-11 北京市合芯数字科技有限公司 Method and device for transmitting and verifying instruction
CN112084013A (en) * 2019-06-13 2020-12-15 合肥杰发科技有限公司 Program calling method, chip and computer storage medium
CN112084013B (en) * 2019-06-13 2024-04-05 武汉杰开科技有限公司 Program calling method, chip and computer storage medium

Also Published As

Publication number Publication date
CN100583042C (en) 2010-01-20

Similar Documents

Publication Publication Date Title
CN1284080C (en) Method and apparatus for performing compiler transformation of software code using fastforward regions and value specialization
Clark et al. Automated custom instruction generation for domain-specific processor acceleration
CN1421001A (en) Optimization of N-base typed arithmetic expressions
CN1655118A (en) Processor and compiler
CN1308274A (en) Command and result transferring and compiling method for processor
CN1234548A (en) Mixed execution of stack and abnormal processing
CN1608247A (en) Automatic instruction set architecture generation
CN1250906A (en) Use composite data processor system and instruction system
CN1922574A (en) Method and system for performing link-time code optimization without additional code analysis
CN1245684C (en) Method and system for searching reduction variable quantity in assignment statement
CN1752934A (en) Compiler, compilation method, and compilation program
CN1853164A (en) Combinational method for developing building blocks of DSP compiler
CN101076780A (en) Compiling method, apparatus and computer system for loop in program
Cardoso et al. Compilation and Temporal Partitioning for a Coarse-Grain Reconfigurable Architecture
Wang et al. Decomposed Software Pipelining: A New Approach to Exploit Instruction Level Parallelism for Loop Programs.
CN1570811A (en) Microprocessor equipped with power control function, and instruction converting apparatus
CN1244050C (en) Method for compiling a program
Kessler Compiling for VLIW DSPs
Podobas et al. Empowering openmp with automatically generated hardware
Aleta et al. Exploiting pseudo-schedules to guide data dependence graph partitioning
CN1473294A (en) hardware loop
CN1306401C (en) A micro-dispatching method supporting directed cyclic graph
CN100336033C (en) Single-chip analog system with multi-processor structure
CN1231840C (en) Compile programme device and method for determining storage unit of data in storage area
CN1881175A (en) Method for solving multi-register conflict

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: NXP CO., LTD.

Free format text: FORMER OWNER: KONINKLIJKE PHILIPS ELECTRONICS N.V.

Effective date: 20080404

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20080404

Address after: Eindhoven, Netherlands

Applicant after: NXP B.V.

Address before: Eindhoven, Netherlands

Applicant before: Koninklijke Philips Electronics N.V.

C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: NXP BV

Free format text: FORMER OWNER: KONINKL PHILIPS ELECTRONICS NV

Effective date: 20110222

C41 Transfer of patent application or patent right or utility model
C56 Change in the name or address of the patentee

Owner name: ST WIRELESS SA

Free format text: FORMER NAME: NXP BV

COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: EINDHOVEN, HOLLAND TO: NO. 39, CHEMIN DU CHAMP-DES-FILLES, 1228 PLAN-LES-OUATES, SWITZERLAND

CP01 Change in the name or title of a patent holder

Address after: No. 39, Chemin du Champ-des-Filles, 1228 Plan-les-Ouates, Switzerland

Patentee after: ST-ERICSSON S.A.

Address before: No. 39, Chemin du Champ-des-Filles, 1228 Plan-les-Ouates, Switzerland

Patentee before: ST Wireless

TR01 Transfer of patent right

Effective date of registration: 20110222

Address after: No. 39, Chemin du Champ-des-Filles, 1228 Plan-les-Ouates, Switzerland

Patentee after: ST Wireless

Address before: Eindhoven, Netherlands

Patentee before: NXP B.V.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100120

Termination date: 20181207

CF01 Termination of patent right due to non-payment of annual fee