CN103377034B

CN103377034B - Instruction pre-delivery method and device, instruction management system, and arithmetic core

Info

Publication number: CN103377034B
Application number: CN201210107338.8A
Authority: CN
Inventors: 高剑刚; 卢宏生; 任秀江; 郑方; 郑卫华; 王梦嘉; 施晶晶
Original assignee: Wuxi Jiangnan Computing Technology Institute
Current assignee: Wuxi Jiangnan Computing Technology Institute
Priority date: 2012-04-12
Filing date: 2012-04-12
Publication date: 2016-06-08
Anticipated expiration: 2032-04-12
Also published as: CN103377034A

Abstract

An instruction pre-delivery method and device, an instruction management system, and an arithmetic core are disclosed. The instruction pre-delivery method includes the following steps:
- dividing a program into instruction blocks according to the program's execution order;
- setting up an instruction superblock table for the instruction blocks, the table recording the identifier of each instruction block, its storage address, and the identifier of the next instruction block;
- sending the instruction blocks, in execution order, to at least one arithmetic core.

The technical solution can effectively reduce instruction misses and fetch latency in the arithmetic cores and improve their computational efficiency.

Description

Instruction pre-delivery method and device, instruction management system, and arithmetic core

Technical field

The present invention relates to the field of instruction management, and in particular to an instruction pre-delivery method and device, an instruction management system, and an arithmetic core.

Background

General-purpose processors usually adopt a hierarchical instruction storage organization, in which instructions are stored in storage media at different levels. An arithmetic core is a component of the processor and can be regarded as a small processor. At run time, it fetches instructions from its local memory. Because that local memory has limited capacity, fetch failures occur easily. If the instruction to be run is not stored in the arithmetic core, the core can continue only after obtaining the instruction from the upper-level instruction memory. Such a fetch failure is also called an instruction miss. In a hierarchical instruction storage organization, obtaining instructions from the upper-level instruction memory takes a great deal of time. Frequent fetch failures therefore increase the time spent transferring instructions and reduce the efficiency of the arithmetic core.

In multi-core and many-core processors, multiple arithmetic cores are integrated on a single silicon die. There are many arithmetic cores and the instruction memory in each one is small. As a result, contention for the shared upper-level instruction memory increases, and fetch contention among arithmetic cores becomes increasingly prominent. This is especially true when the number of arithmetic cores on a single die grows to tens or hundreds: the traditional fetch scheme then greatly increases fetch latency. Fetch contention also congests the on-chip communication network, which becomes a bottleneck limiting both the performance and the applicability of the arithmetic cores.

Instruction processing techniques commonly used in current processors include SIMD (Single Instruction Multiple Data) and SPMD (Single Program Multiple Data).

Techniques such as SIMD and SPMD, as adopted in multi-core processors, unify the instruction demand of the cores and can thereby reduce that demand to some extent.

In a multi-core processor, SIMD mainly means that multiple arithmetic cores share the same instruction issue unit. The same applies to multiple pipelines within one arithmetic core. They run the same instructions synchronously, while each core processes different data.

In a multi-core processor, SPMD mainly means that every arithmetic core executes the same program code. The program is identical on each core, but the data processed differ.

The advantage of SIMD is that all arithmetic cores share one instruction issue unit and every instruction executes in lockstep. This avoids fetch contention and relieves the network congestion caused by many cores fetching concurrently.

The advantage of SPMD is that it relaxes the synchronization requirement between arithmetic cores by raising the synchronization granularity to the level of whole programs. Within a program, each arithmetic core can run autonomously.

These techniques reduce the sources of fetch operations and the variety of program code. In doing so, they can reduce fetch conflicts and fetch latency to some extent.

However, SIMD requires every instruction to execute synchronously on all arithmetic cores. Core resources are therefore usually hard to use fully, and the computing capability of all cores cannot be exploited. This limits the applicability of the technique.

In multi-core and many-core processors, the memory in each core remains small even as the number of arithmetic cores increases. Suppose an SPMD program is larger than the memory in an arithmetic core. Instruction misses then cause frequent fetch operations, which aggravate fetch conflicts and severely congest the communication network. The fetch wait time of the cores grows, significantly degrading their computational efficiency. In multi-core and many-core processors, the memory capacity inside each arithmetic core therefore limits the applicability of SPMD.

The method in Chinese patent publication CN1466716A provides instruction prefetch service for a single processor only and is unsuitable for multi-core and many-core processor architectures. Moreover, its prefetching method requires an extra auxiliary processor for each computing core to execute a simplified version of the program. This incurs a large hardware overhead.

An urgent open problem is how to effectively reduce instruction misses and fetch latency in arithmetic cores, and thereby improve their computational efficiency.

Summary of the invention

The problem addressed by the present invention is how to effectively reduce instruction misses and fetch latency in arithmetic cores and improve their computational efficiency.

To solve the above problem, the present invention provides an instruction pre-delivery method, comprising:

dividing a program into instruction blocks according to the program's execution order;

setting up an instruction superblock table for the instruction blocks, the table recording the identifier of each instruction block, its storage address, and the identifier of the next instruction block; and

sending the instruction blocks, in execution order, to at least one arithmetic core.

To solve the above problem, the present invention further provides an instruction pre-delivery device, comprising:

an instruction superblock table setting unit, configured to set up the instruction superblock table of the instruction blocks, the table containing the identifier of each instruction block, its storage address, and the identifier of the next instruction block; and

a feedback unit, configured to send the instruction blocks, the instruction blocks being obtained by dividing the program according to its execution order.

To solve the above problem, the present invention further provides an arithmetic core, comprising:

an instruction storage unit, configured to receive and store the instructions of pre-delivered instruction blocks; and

an arithmetic unit, configured to run the instructions stored in the instruction storage unit.

To solve the above problem, the present invention further provides an instruction management system, comprising:

the instruction pre-delivery device described above; and

the arithmetic core described above.

Compared with the prior art, the present invention has the following advantages:

A combined software and hardware approach is adopted. Software divides the instruction code into a sequence of instruction blocks according to the program's execution trace, and guarantees that every arithmetic core follows the same instruction-block trace. Hardware uses the block-sequence information produced by software to send the instructions each arithmetic core needs into that core's instruction memory. Because the program's instruction trace is known in advance, instructions can be proactively loaded into the core's memory before the core actually executes them.

The technical solution can register which arithmetic cores accept pre-delivered instructions, and pre-delivers instructions only to registered cores. Unregistered cores fetch and run instructions on their own. This mechanism supports running multiple jobs: some cores execute pre-delivered instructions while others fetch and execute other instructions themselves. It also aids fault tolerance, because a core that fails and cannot run does not affect the pre-delivery mechanism.

The program is cut into small instruction blocks, and the instructions in each block are delivered to the arithmetic cores in advance, so a core obtains its instructions before it runs them. This makes instruction pre-delivery relatively independent of core execution. Each core can accept further pre-delivered instructions without disturbing the instructions it is currently executing. This reduces fetch wait time and improves the core's efficiency.

Function calls and loops receive special handling according to the program's structure, which ensures that the program's execution order is not broken.

Each arithmetic core can either actively issue fetch requests or passively accept instructions.
- Active fetching by a core preserves the independence of each core and increases program flexibility.
- Passively accepting multicast or broadcast pre-delivered instructions eliminates fetch contention among cores and reduces communication network usage, further improving core efficiency.
- Some cores may fetch actively while others passively accept instructions at the same time. This preserves program flexibility and widens the applicability of the invention.

Instructions are organized into instruction blocks of variable size. All arithmetic cores follow the same instruction-block trace, but each core may follow a different path within a block. In other words, cores may run at different speeds. Sending a block's instructions based on each core's response to the pre-delivery query request balances these speed differences. Speed differences between cores are kept within a controlled range without forcing the cores to execute instructions in lockstep.

This control scheme, which uses instruction pre-delivery as its means, is a loose synchronization mechanism. The sender issues pre-delivery query requests and, based on the cores' responses, paces the pre-delivery of instructions, balancing fast and slow cores. The fastest core is thus prevented from getting too far ahead of the slowest. Suppose the instructions a fast core is about to need would overwrite the instruction block a slow core is still running. The slow core then returns an instruction conflict response, and the work of sending instructions to all cores is suspended.

The technical solution tolerates some speed difference between cores and interferes little with the cores during instruction delivery, so it is widely applicable.

Multicast and broadcast instruction pre-delivery is particularly suited to multi-core and many-core architectures. It reduces fetch contention between arithmetic cores and shortens the time instruction transfer congests the communication network. It also improves on-chip network utilization.

The technical solution is an instruction pre-delivery mechanism based on loose synchronization. It solves two problems: long core fetch latency in single-core, multi-core and many-core processors, and fetch contention in multi-core and many-core processors. It thereby improves processor efficiency.

Brief description of the drawings

Fig. 1 is a flowchart of the instruction pre-delivery method provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of instruction blocks being pre-delivered to designated arithmetic cores according to an embodiment of the present invention;

Fig. 3 is a structural diagram of a program containing a function call according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the execution order of the instruction blocks into which the program of Fig. 3 is divided;

Fig. 5 is a schematic diagram of the instruction superblock table for the instruction blocks of Fig. 4;

Fig. 6 is a schematic diagram of the instruction blocks of a program containing a loop according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of the instruction superblock table for the instruction blocks of a program containing a loop according to an embodiment of the present invention;

Fig. 8 shows an instruction pre-delivery device according to an embodiment of the present invention;

Fig. 9 shows an arithmetic core according to an embodiment of the present invention;

Fig. 10 is a schematic diagram of the mapping between the storage addresses of instruction blocks in the instruction memory and in the arithmetic core according to an embodiment of the present invention;

Fig. 11 is a schematic diagram of a program and its instruction blocks according to an embodiment of the present invention;

Fig. 12 is a schematic diagram of the instructions in an instruction block according to an embodiment of the present invention;

Fig. 13 shows an instruction management system according to an embodiment of the present invention.

Detailed description of the invention

To make the above objects, features and advantages of the present invention clearer and easier to understand, specific embodiments of the invention are described in detail below with reference to the accompanying drawings.

Specific details are set forth in the following description to provide a thorough understanding of the present invention. However, the invention can be implemented in many ways other than those described here. Those skilled in the art can make similar generalizations without departing from its spirit. The invention is therefore not limited by the specific embodiments disclosed below.

Fig. 1 is a flowchart of the instruction pre-delivery method provided by an embodiment of the present invention. The method is described in detail below with reference to Fig. 1.

The instruction pre-delivery method includes:

Step S1: divide the program into instruction blocks according to its execution order;

Step S2: set up the instruction superblock table of the instruction blocks, which records the identifier of each instruction block, its storage address, and the identifier of the next instruction block;

Step S3: send the instruction blocks, in execution order, to at least one arithmetic core.

The instruction pre-delivery method further includes:

Step S4 (not shown in Fig. 1), which generally precedes step S3. Before the instructions of an instruction block are sent, a pre-delivery query request is sent to each arithmetic core. The instructions in the block are then sent based on each core's response to that request. The pre-delivery query request contains information about the instructions to be pre-delivered.

In step S1, the program is divided into instruction blocks according to its execution order. After division, the execution order of the instruction blocks is obtained; this is also called the instruction-block execution trace.

Specifically, the instruction blocks are obtained in one of two ways:
- third-party software divides the program according to its execution order;
- the program is divided according to execution order combined with practical experience, such as instruction execution times and instruction call counts.

The execution order of the resulting instruction blocks is the same as that of the program; that is, the instruction-block execution trace is consistent with the program's execution order. The program's execution order refers to the order in which its instructions are executed.

If the program contains loops, recursion, conditional branches or similar constructs, the instruction-block execution trace may differ slightly from the program's execution order. The execution order of the instructions within the instruction blocks is nevertheless the same as the program's.

Table 1 gives an example. Suppose a program contains instructions 1, 2, 3 and 4, where instructions 2 and 3 form a loop that executes twice. The program's execution order is then instruction 1 → instruction 2 → instruction 3 → instruction 2 → instruction 3 → instruction 4. The program can be divided into three instruction blocks: block 1 contains instruction 1, block 2 contains instructions 2 and 3, and block 3 contains instruction 4. The instruction-block execution trace is then block 1 → block 2 → block 3.

Table 1

In this example, the execution order of the instruction blocks differs slightly from that of the program, but the two are essentially the same. The block execution order implies the execution order of the instructions the blocks contain. The block trace is block 1 → block 2 → block 3. The instructions within those blocks still execute in the order instruction 1 → instruction 2 → instruction 3 → instruction 2 → instruction 3 → instruction 4. So although the block trace differs slightly from the program's execution order, the order of instructions within the blocks is the same as the program's.

Consider the same program: instructions 1 to 4, with instructions 2 and 3 forming a loop executed twice. Its execution order is instruction 1 → instruction 2 → instruction 3 → instruction 2 → instruction 3 → instruction 4. The program can instead be divided into four instruction blocks:
- block 1 contains instruction 1;
- block 2 contains instruction 2;
- block 3 contains instruction 3;
- block 4 contains instruction 4.

The instruction-block execution trace is then block 1 → block 2 → block 3 → block 2 → block 3 → block 4. With this partitioning, the block trace is identical to the program's execution order.
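The two partitionings above can be checked with a short Python sketch. It is illustrative only. The instruction numbering follows the example. The function names `instruction_trace` and `block_trace` are hypothetical, as is the rule "record a block each time execution enters it from a different block"; none of them comes from the patent.

```python
def instruction_trace(loop_count: int = 2) -> list[int]:
    """Dynamic execution order of the example program (instructions 2 and 3 loop)."""
    trace = [1]
    for _ in range(loop_count):
        trace.extend([2, 3])
    trace.append(4)
    return trace


def block_trace(blocks: dict[int, list[int]], trace: list[int]) -> list[int]:
    """Map an instruction trace onto instruction blocks.

    A block is recorded each time execution enters it from a different block
    (an assumed rule), so a loop held entirely inside one block appears once,
    while a loop split across blocks shows every re-entry.
    """
    owner = {ins: bid for bid, body in blocks.items() for ins in body}
    result: list[int] = []
    for ins in trace:
        bid = owner[ins]
        if not result or result[-1] != bid:
            result.append(bid)
    return result


three_blocks = {1: [1], 2: [2, 3], 3: [4]}
four_blocks = {1: [1], 2: [2], 3: [3], 4: [4]}
print(block_trace(three_blocks, instruction_trace()))  # [1, 2, 3]
print(block_trace(four_blocks, instruction_trace()))   # [1, 2, 3, 2, 3, 4]
```

Keeping the whole loop body in one block shortens the block trace. Splitting it makes the block trace mirror the instruction trace exactly.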

In a concrete implementation, instruction blocks can be divided flexibly as needed and are not limited to the examples above.

A program may be divided into one instruction block or several. The program can be split into multiple blocks, or the whole program can be treated as a single block. Each instruction block contains one or more instructions. A block generally contains at least two instructions, but it may contain only one.

An arithmetic core includes two units:
- an instruction storage unit, which receives and stores the instruction blocks;
- an arithmetic unit, which runs the instruction blocks stored in the instruction storage unit.

The space occupied by an instruction block is less than or equal to the storage space of the instruction storage unit. The program's execution order is generally also preserved within the arithmetic core.

When the program is divided into instruction blocks, each resulting block must occupy no more space than the storage space of the instruction storage unit. This ensures that an arithmetic core can hold at least one complete instruction block.

In step S2, the instruction superblock table is saved in on-chip memory. It records three items for each instruction block:
- the identifier of the instruction block, also called the block index, which is used to look up the block;
- the block's storage address, which is used for pre-delivering the block;
- the identifier of the next instruction block, which is used, after pre-delivery of the current block ends, to find the next block to pre-deliver.

The storage address of an instruction block comprises the start and end addresses at which the block is stored.
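Below is a minimal Python sketch of an instruction superblock table. It makes several hypothetical choices:
- one address unit per instruction;
- inclusive end addresses;
- `None` as the end-of-program marker in the next-block field.

The helper `delivery_order` is also hypothetical. It follows the next-block links to obtain the pre-delivery order, and checks that each block fits in the core's instruction storage unit. The special handling of function calls and loops (Figs. 3 to 7) is not modeled, so the sketch rejects a table whose next-block links form a cycle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SuperblockEntry:
    block_id: int              # identifier (index) used to look up the block
    start_addr: int            # start address of the stored block
    end_addr: int              # end address of the stored block (inclusive, by assumption)
    next_block: Optional[int]  # identifier of the next block; None means no successor

    def size(self) -> int:
        # Assumes one address unit per instruction.
        return self.end_addr - self.start_addr + 1


def delivery_order(table: dict[int, SuperblockEntry], first: int,
                   capacity: int) -> list[int]:
    """Follow next-block links from `first`, checking each block fits in local storage."""
    order: list[int] = []
    seen: set[int] = set()
    current: Optional[int] = first
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle at block {current}; loops need special handling")
        entry = table[current]
        if entry.size() > capacity:
            raise ValueError(f"block {current} ({entry.size()}) exceeds capacity {capacity}")
        seen.add(current)
        order.append(current)
        current = entry.next_block
    return order


table = {
    1: SuperblockEntry(1, 0, 63, 2),
    2: SuperblockEntry(2, 64, 191, 3),
    3: SuperblockEntry(3, 192, 223, None),
}
print(delivery_order(table, first=1, capacity=128))  # [1, 2, 3]
```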

In step S3, the instruction blocks are sent to at least one arithmetic core in execution order. Because the blocks a core will run are pre-delivered in execution order, the core is guaranteed to execute them in that order.

The instruction code is divided into a sequence of instruction blocks according to the program's execution trace, which ensures that every arithmetic core follows the same instruction-block trace. The instructions each core needs are then sent into its instruction memory. Because the program's instruction trace is known in advance, instructions can be proactively loaded into the core's memory before the core actually executes them.

In a specific implementation, each instruction block is split into several instruction groups before it is sent. Each group contains the agreed granularity, that is, the agreed number of instructions pre-delivered at a time. The block's instructions are then sent one group at a time. Sending the instruction blocks or instructions to the arithmetic cores by broadcast or multicast increases transfer speed and thus efficiency.
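The split of a block into instruction groups can be sketched in Python as follows. The sketch assumes one address per instruction and inclusive start and end addresses, as they might be read from the superblock table. The function name `instruction_groups` is hypothetical, and so is the choice to let the final group be shorter when the block size is not a multiple of the granularity.

```python
def instruction_groups(start_addr: int, end_addr: int,
                       granularity: int) -> list[tuple[int, int]]:
    """Split one block's inclusive address range into groups of `granularity` instructions."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    groups: list[tuple[int, int]] = []
    addr = start_addr
    while addr <= end_addr:
        last = min(addr + granularity - 1, end_addr)  # the final group may be shorter
        groups.append((addr, last))
        addr = last + 1
    return groups


print(instruction_groups(64, 73, 4))  # [(64, 67), (68, 71), (72, 73)]
```

Each tuple is one unit of pre-delivery. A pre-delivery query request would precede each of them.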

Instruction blocks are sent to at least one arithmetic core in execution order until all blocks have been sent. If any arithmetic core returns a stop response, sending to all arithmetic cores stops. If that core later sends a resume-receiving response, sending to all arithmetic cores resumes.

When an exception occurs inside an arithmetic core, the core sends a stop response and stops receiving pre-delivered instruction blocks. Examples include:
- the core's storage space is full and cannot hold a new block;
- a pre-delivered block conflicts with the block the core is currently executing.

Once the exception clears, the core can send a resume-receiving response and again receive pre-delivered blocks. Examples include:
- executed blocks have been deleted and the core's storage can accept a new block;
- the currently stored blocks may safely be overwritten by subsequently pre-delivered blocks.
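The stop and resume behavior can be sketched as a small gate on the sender side. The class and signal names are hypothetical. The text describes a single core stopping and resuming; the rule that delivery resumes only after every stopped core has resumed is an assumption for the multi-core case.

```python
from enum import Enum, auto


class CoreSignal(Enum):
    STOP = auto()    # the core cannot accept pre-delivered blocks for now
    RESUME = auto()  # the core is ready to receive pre-delivered blocks again


class PreDeliveryGate:
    """Tracks which cores have asked the sender to stop."""

    def __init__(self) -> None:
        self.stopped_by: set[int] = set()

    def on_signal(self, core_id: int, signal: CoreSignal) -> None:
        if signal is CoreSignal.STOP:
            self.stopped_by.add(core_id)
        else:
            self.stopped_by.discard(core_id)

    def may_send(self) -> bool:
        # Any stopped core pauses delivery to every core.
        return not self.stopped_by


gate = PreDeliveryGate()
gate.on_signal(3, CoreSignal.STOP)
print(gate.may_send())  # False
gate.on_signal(3, CoreSignal.RESUME)
print(gate.may_send())  # True
```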

In step S4, a pre-delivery query request is sent before each group of the block's instructions is sent at the agreed granularity. This lets each arithmetic core correctly decide whether it needs to receive the pre-delivered instructions. It also avoids useless transfers to cores that do not need them.

To ensure that the arithmetic cores receive instruction blocks correctly, a pre-delivery query request is sent to each core before a block is sent. Specifically, the block is split into instruction groups, each containing the agreed number of instructions, and is sent one group at a time. Before each group is sent, a pre-delivery query request is sent to the arithmetic cores.

The instructions in the block are sent based on each core's response to the pre-delivery query request. The request contains information about the instructions to be pre-delivered, and essentially asks each arithmetic core whether it currently needs new instructions.

The arithmetic cores no longer need to issue fetch requests actively. This eliminates fetch contention among them, reduces communication network usage, and further improves their efficiency.

An arithmetic core's response to the pre-delivery query request is one of the following:
- an instruction receive response, which is further divided into an immediate receive response and a delayed receive response;
- an instruction discard response;
- an instruction conflict response.

Each arithmetic core decides whether it needs to receive the pre-delivered instructions, and responds accordingly. The decision is based on the instruction it is currently executing, the instructions being pre-delivered, and the instructions it has stored but not yet executed. Specifically:

If the arithmetic core needs the pre-delivered instructions and has enough storage space for them, it returns an immediate receive response. If it needs them but lacks sufficient storage space, it returns a delayed receive response.

The core returns an instruction conflict response if the pre-delivered instructions would overwrite either of the following, or both:
- the instruction the core is currently running;
- instructions that are stored but not yet executed.

If the pre-delivered instructions are already stored in the core, it returns an instruction discard response.
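The per-core decision can be sketched in Python as follows. Sets of instruction addresses stand in for the core's bookkeeping. How `overwritten` is computed depends on the core's address mapping (Fig. 10), so it is taken as an input here. The check order (discard, then conflict, then space) and all names are assumptions of this sketch. The text lists the four cases but does not fix a priority among them.

```python
from enum import Enum


class Response(Enum):
    RECEIVE_NOW = "immediate receive"
    RECEIVE_LATER = "delayed receive"
    DISCARD = "instruction discard"
    CONFLICT = "instruction conflict"


def core_response(offered: set[int], stored: set[int], unexecuted: set[int],
                  overwritten: set[int], free_slots: int) -> Response:
    """Decide how one core answers a pre-delivery query for the `offered` addresses.

    stored:      instruction addresses currently held in local storage
    unexecuted:  stored addresses not yet executed, including the running one
    overwritten: stored addresses that the offered group would replace
    free_slots:  local slots that are empty or hold already-executed instructions
    """
    if offered <= stored:
        return Response.DISCARD        # already held, so no need to receive again
    if overwritten & unexecuted:
        return Response.CONFLICT       # would overwrite running or pending instructions
    if len(offered - stored) <= free_slots:
        return Response.RECEIVE_NOW    # needed, and there is room now
    return Response.RECEIVE_LATER      # needed, but room must be freed first


print(core_response({4, 5}, {0, 1}, {1}, {1}, free_slots=2))  # Response.CONFLICT
```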

In a specific implementation, the instruction pre-delivery method further includes establishing an arithmetic core state table. The table can record information such as:
- each core's instruction miss records;
- synchronization-point registrations;
- registrations of demand for pre-delivered instructions;
- the number of cores receiving pre-delivered instructions.

One way to send the block's instructions based on the cores' responses is to record each core's response in the arithmetic core state table. The table is updated as responses arrive, and the instructions are sent according to the latest state table. Specifically:

If none of the cores' responses is an instruction conflict response, the pre-delivered instructions are sent to the cores that returned a receive response;

If all of the cores' responses are instruction discard responses, the pre-delivered instructions are not sent to any core;

If the responses include an instruction conflict response, the sender waits for the cores that returned it to respond again, until no response is an instruction conflict response.

When the responses are distinguished as immediate receive, delayed receive, instruction discard and instruction conflict responses:

If the responses include neither an instruction conflict response nor a delayed receive response, the pre-delivered instructions are sent to the cores that returned an immediate receive response;

If the responses include an instruction conflict response, a delayed receive response, or both, the sender waits. It waits for the cores that returned a conflict response to respond again, and for the cores that returned a delayed receive response to return an immediate receive response. Waiting continues until the responses include neither an instruction conflict response nor a delayed receive response.

In other words, the pre-delivered instruction is sent only when the responses consist solely of immediate reception responses and instruction discard responses, and then only to the cores that returned an immediate reception response; it is not sent to cores that returned an instruction discard response. This ensures that every core that needs the instruction receives it, while cores that do not need it are spared a duplicate transfer, which reduces wasted communication resources.

If the responses include an instruction conflict response, a delayed reception response, or both, wait for the cores that sent them to respond again and check whether each new response is an immediate reception response or an instruction discard response. If it is neither, keep waiting; once the collected responses consist only of immediate reception and instruction discard responses, send the pre-delivered instruction to the cores that returned an immediate reception response.

While executing instruction blocks, the arithmetic cores may start at different times and process different data, so their execution speeds often differ. The instruction pre-delivery method makes the faster arithmetic cores wait for the slower ones.

Because execution speeds differ, a fast core may already have executed all the instructions it stores and be waiting for subsequently delivered instructions, while a slow core may not yet have executed the instructions it stores. Since a core's storage space is limited, the instructions it holds are continually replaced by newly delivered ones. If a slow core's storage is full of instructions it has not yet executed, it cannot accept further delivered instructions, so it answers the pre-delivery query request with an instruction conflict response or a delayed reception response. A fast core, having executed all its stored instructions, answers with an immediate reception response.

Response owing to only making when arithmetic core only includes and instant receives response and when instruction abandons responding, just can send, to making the instant arithmetic core receiving response, the instruction sent on, therefore the arithmetic core that the speed of service is slow suspends, after sending instruction conflict response or postponing reception response, the work sending instruction to all of arithmetic core. Now perform fireballing arithmetic core and also cannot receive follow-up instruction, can only wait that arithmetic core sends again and instant receive response or after instruction abandons response, could recovering the transmission of subsequent instructions, the fast arithmetic core of the speed of service could receive follow-up instruction. Additionally, if the response that the slow arithmetic core of the speed of service sends again is for postponing to receive response, then still need to wait until the slow arithmetic core of the speed of service sends and instant receives response or response is abandoned in instruction, could recovering the transmission of subsequent instructions, the fast arithmetic core of the speed of service could receive follow-up instruction.

Thus a conflict or delayed reception response from a slow core halts subsequent instruction delivery until that core sends an immediate reception or instruction discard response. This mechanism tolerates a certain amount of execution skew between cores; when the skew grows too large, it rebalances the speed difference between cores and at the same time paces program execution.
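The pacing effect can be illustrated with a toy simulation. It is a sketch under stated assumptions, not a model of real hardware: all cores receive the same pre-delivered stream, each core's speed (blocks per tick) and local capacity (blocks) are hypothetical integers, and a core whose store is full of unexecuted blocks is treated as answering with a conflict or delayed reception response, which pauses delivery for everyone.

```python
from typing import List, Tuple


def simulate(rates: List[int], capacity: int, total_blocks: int, ticks: int) -> Tuple[List[int], int]:
    """Toy model of pre-delivery pacing.

    rates[i]: hypothetical number of blocks core i can execute per tick
    capacity: hypothetical number of blocks a core's local store can hold
    Returns the blocks executed by each core and the largest skew observed.
    """
    delivered = 0                  # blocks pre-delivered so far (same stream for every core)
    executed = [0] * len(rates)    # blocks each core has executed
    max_skew = 0
    for _ in range(ticks):
        # Each core executes the blocks it already holds, up to its speed.
        for i, rate in enumerate(rates):
            executed[i] = min(delivered, executed[i] + rate)
        # Deliver the next block only while every core has a free slot
        # (i.e. every core would answer with an immediate reception response).
        while delivered < total_blocks and all(delivered - e < capacity for e in executed):
            delivered += 1
        max_skew = max(max_skew, max(executed) - min(executed))
    return executed, max_skew


executed, skew = simulate(rates=[3, 1], capacity=4, total_blocks=100, ticks=20)
```

Without pacing, the fast core would execute 60 blocks in 20 ticks; here it can never run more than `capacity` blocks ahead of the slow core.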

In a specific implementation, the instructions of specified instruction blocks can also be sent to specified arithmetic cores. In that case the instruction pre-delivery method further includes:

Setting up an information table that records the arithmetic cores that are to receive the instructions of pre-delivered instruction blocks and, for each such core, the identifiers of the pre-delivered instruction blocks it must receive. Sending instruction blocks to at least one arithmetic core in execution order then means sending the instructions of each instruction block, in execution order, to the arithmetic cores specified in the information table;

The arithmetic cores can also issue self-fetch requests and mode switch requests, and the method further includes: upon receiving a self-fetch request from an arithmetic core, stopping sending pre-delivery query requests and pre-delivered instructions to that core;

Before a pre-delivery query request for the instructions of a pre-delivered instruction block is sent to a core that the information table specifies must receive that block: if the core has issued a self-fetch request and has since issued a mode switch request, the pre-delivery query request is sent to it; if the core has issued a self-fetch request but has not yet issued a mode switch request, wait until the mode switch request from that core is received.

By configuring the information table, specified instruction blocks can be sent to specified arithmetic cores. In practice, each core's fetch mode can be set to pre-delivery mode or self-fetch mode: the cores specified in the information table, which receive the instructions of pre-delivered blocks, can be set to pre-delivery mode, and the cores not specified in the table can be set to self-fetch mode. The information table specifies which instruction blocks each core must receive. If a core in pre-delivery mode issues a self-fetch request before it has received the instructions of the blocks the table specifies for it, it enters self-fetch mode; when self-fetching ends, it issues a mode switch request and re-enters pre-delivery mode.

An arithmetic core typically issues self-fetch requests and mode switch requests as a pair. That is, when a core temporarily stops accepting the instructions of pre-delivered blocks, fetches instructions from the instruction memory by itself, and after self-fetching ends accepts pre-delivered blocks again, it first issues a self-fetch request, then issues a mode switch request once self-fetching ends. After re-entering pre-delivery mode, it resumes receiving the instructions of the pre-delivered blocks that the information table specifies for it and that it has not yet received.

The information table can be stored in on-chip memory. During instruction pre-delivery, instructions are pre-delivered only to the arithmetic cores specified in the information table; cores not specified there perform self-fetch operations.

If an arithmetic core stops accepting the instructions of pre-delivered blocks for good and fetches from the instruction memory by itself, it issues only a self-fetch request, with no subsequent mode switch request.

For example, as shown in Fig. 2, the program is divided into instruction blocks 1 to 5. After the program reaches instruction block 2, a small number of cores may need to execute other programs not contained in blocks 1 to 5; after finishing those programs, these cores resume receiving pre-delivered instructions starting from block 5. In the information table, block 1 is marked as a block required by all cores, because all cores start executing the program from block 1. Blocks 2, 3 and 4 may not be needed by some cores, so they are not marked; when these blocks are pre-delivered, the target cores are adjusted in real time according to the self-fetch requests and mode switch requests issued by the cores. The cores that issued self-fetch requests rejoin the queue of cores receiving pre-delivered blocks from block 5 onward, so block 5 must also be marked as a block required by all cores.

As described above, blocks 1 and 5 can be designated as pre-delivery-mode start blocks and marked with a start flag. Pre-delivery then proceeds as follows. After the program starts, block 1 is processed first; its start flag is valid, so a pre-delivery query request is sent to all cores and pre-delivery of block 1 begins. While block 2 is being processed, a few cores go off to execute other program instructions and issue self-fetch requests; the instruction pre-delivery device records this, stops sending pre-delivery query requests and pre-delivered instructions to those cores, and continues sending blocks 3 and 4 only to the cores that did not issue self-fetch requests. When block 5 is processed, its start flag is valid, which indicates that the cores that issued self-fetch requests will need this block; the device must wait until those cores have issued mode switch requests before it sends the pre-delivery query request for block 5 to all cores.
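The Fig. 2 walkthrough can be sketched as a small scheduler. This is a hedged illustration, not the device's actual logic: the class and method names are hypothetical, the start flag stands in for "marked in the information table as required by all cores", and the sketch does not model a core that leaves pre-delivery mode permanently.

```python
from typing import Dict, List, Optional, Set


class PreDeliveryScheduler:
    """Hypothetical model of per-core mode tracking during pre-delivery."""

    def __init__(self, cores: List[int], start_flag: Dict[int, bool]):
        self.cores = cores
        self.start_flag = start_flag       # block id -> start flag (required by all cores)
        self.self_fetching: Set[int] = set()

    def on_self_fetch_request(self, core: int) -> None:
        # Stop sending query requests and instructions to this core.
        self.self_fetching.add(core)

    def on_mode_switch_request(self, core: int) -> None:
        # The core re-enters pre-delivery mode.
        self.self_fetching.discard(core)

    def query_targets(self, block: int) -> Optional[List[int]]:
        """Cores to send the pre-delivery query request for `block` to,
        or None if delivery must wait for mode switch requests."""
        if self.start_flag.get(block, False) and self.self_fetching:
            return None
        return [c for c in self.cores if c not in self.self_fetching]


# Fig. 2 scenario: blocks 1 and 5 carry a valid start flag.
sched = PreDeliveryScheduler(cores=[0, 1, 2, 3], start_flag={1: True, 5: True})
first = sched.query_targets(1)          # all cores
sched.on_self_fetch_request(3)          # core 3 leaves during block 2
middle = sched.query_targets(3)         # cores 0, 1, 2 only
blocked = sched.query_targets(5)        # None: wait for core 3
sched.on_mode_switch_request(3)
resumed = sched.query_targets(5)        # all cores again
```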

Function calls commonly occur during program execution. When the program contains function calls, the instruction pre-delivery method further includes:

Before the program is divided into instruction blocks according to its execution order, each called function in the program is packaged into a block function, which likewise consists of instructions. Dividing the program into instruction blocks according to its execution order then includes dividing each block function into instruction blocks according to the program's execution order. The instruction super block table of an instruction block that calls a block function further includes: a call flag indicating the call, and the identifier of the instruction block to return to after the function call. The instruction super block table of the last instruction block in a block function further includes: a return flag indicating the return;

When the call flag in the instruction super block table of the block containing the instruction to be sent is valid, the identifier of the post-call return block recorded in that table is pushed onto a function stack;

When the return flag in the instruction super block table of the block containing the instruction to be sent is valid, after that block has been sent, the return-block identifier at the top of the function stack is popped to determine the next instruction block to send. The return flag of the last instruction block in each block function is valid.

In a specific implementation, programs often contain nested function calls, so nesting is allowed: a block function may contain further block functions.

As an example, Fig. 3 to Fig. 5 illustrate how a program containing function calls is handled according to the technical solution of the embodiment; they are described in detail below.

Fig. 3 shows the structure of a program containing function calls according to the embodiment. Each called function in the program (typically a subfunction) is packaged into a block function; for example, each subfunction can be packaged into its own block function. Block functions can in turn be nested inside other block functions.

As shown in Fig. 3, the program consists of a main function Main(), a subfunction Function0() and a subfunction Function1(). Main() calls Function0(), and Function0() calls Function1().

The functions in the program are packaged into block functions: Main block function 10, Function0 block function 20 and Function1 block function 30. The instructions in each block function are divided into instruction blocks as follows: Main block function 10 contains instruction blocks 1 and 2, Function0 block function 20 contains instruction blocks n and n+1, and Function1 block function 30 contains instruction blocks m and m+1.

Dividing the program shown in Fig. 3 into instruction blocks yields instruction block 1, instruction block 2, instruction block 3, ..., instruction block n, instruction block n+1, ..., instruction block m, instruction block m+1.

During execution, the function calls proceed as follows: Main() executes instruction block 1 and then calls Function0(); inside Function0(), block n is executed and then Function1() is called; inside Function1(), blocks m and m+1 are executed in order, after which control returns to Function0(), which executes block n+1; after block n+1, control returns to Main(), which executes block 2.

Fig. 4 shows the execution order of the instruction blocks of the program in Fig. 3. As Fig. 4 shows, the program executes in the order: instruction block 1 → instruction block n → instruction block m → instruction block m+1 → instruction block n+1 → instruction block 2.

Fig. 5 shows the instruction super block table of the instruction blocks in Fig. 4; in a specific implementation, the table can be stored in on-chip memory. In the table, call is the call flag, return is the return flag, next is the identifier of the next instruction block, saddr is the start address at which the instruction block is stored, eaddr is the end address at which it is stored (the start and end addresses shown are illustrative only), and stack is the identifier of the instruction block to return to after a function call (stack is valid only when call is valid). Function0() is called after block 1 completes, so call for block 1 is valid (value 1) and stack for block 1 is valid, holding the identifier of block 2. Function1() is called after block n completes, so call for block n is valid (value 1) and stack for block n is valid, holding the identifier of block n+1. Control returns to Function0() after block m+1, so return for block m+1 is valid (value 1); control returns to Main() after block n+1, so return for block n+1 is valid (value 1).

In the implementation, Function0() is called after block 1 completes; call for block 1 is 1 and its stack field is valid, so the value in that field (the identifier of block 2) is read and pushed onto the function stack. Function1() is called after block n completes; call for block n is 1 and its stack field is valid, so the identifier of block n+1 is pushed onto the function stack. After block m+1, control returns to Function0(); return for block m+1 is 1, so the identifier n+1 is popped from the top of the function stack, and the next block to send is the block with that identifier. After block n+1, control returns to Main(); return for block n+1 is 1, so the identifier 2 is popped from the top of the function stack, and the next block to send is block 2.

The program may also contain loops, for example in the program used in the step 1 example above. When the program contains a loop, the loop is divided into instruction blocks that execute cyclically, and the instruction super block table of each instruction block forming the loop also includes the identifier of the next instruction block to execute after the loop exits.

While the instruction blocks forming the loop are being executed, sending instruction blocks to at least one arithmetic core in execution order means sending blocks according to the next-block identifiers in the instruction super block table entries of the loop blocks. After finishing the loop, an arithmetic core can also issue a loop-exit request. Once loop-exit requests have been received from all cores executing the loop blocks, the identifier of the post-loop block is read from the instruction super block table of the loop blocks, and the corresponding instruction block is sent to all of those cores.

Fig. 6 shows the instruction blocks of a program containing a loop according to the embodiment. As shown in Fig. 6, instruction blocks 2 and 3 execute cyclically (the number of iterations is not limited), so the blocks execute in the order instruction block 1 → instruction block 2 → instruction block 3 → instruction block 2 → instruction block 3 → ... → instruction block 4 (block 4 may be followed by block 5 and so on).

Fig. 7 shows the instruction super block table of the instruction blocks of the program containing a loop according to the embodiment; next is the identifier of the next instruction block, and break is the identifier of the next instruction block after the loop exits.

While the loop is running, instruction blocks are sent to at least one arithmetic core according to the next identifiers in the instruction super block table entries of the loop blocks (block 2 or block 3); after an arithmetic core has finished the loop formed by blocks 2 and 3, it issues a loop-exit request;

Once loop-exit requests have been received from all arithmetic cores executing the loop blocks, the identifier of the post-loop block is read from the instruction super block table of the loop blocks, and the corresponding block (block 4) is sent to all of those cores.
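The next/break handling of Fig. 7 can be sketched as below. It is a simplified model under stated assumptions: the `exit_after` mapping (how many loop-block deliveries each hypothetical core needs before it sends its loop-exit request) is invented for illustration, and a block is treated as a loop block exactly when its table entry carries a break identifier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LoopEntry:
    next: Optional[int]        # next block ("next" in Fig. 7)
    brk: Optional[int] = None  # post-loop block ("break" in Fig. 7); None outside loops


# Fig. 7: blocks 2 and 3 form the loop, block 4 follows it.
TABLE: Dict[int, LoopEntry] = {
    1: LoopEntry(next=2),
    2: LoopEntry(next=3, brk=4),
    3: LoopEntry(next=2, brk=4),
    4: LoopEntry(next=None),
}


def delivery_order(table: Dict[int, LoopEntry], start: int, exit_after: Dict[int, int]) -> List[int]:
    """exit_after: hypothetical core id -> loop-block deliveries before it requests loop exit."""
    order: List[int] = []
    loop_deliveries = 0
    block: Optional[int] = start
    while block is not None:
        order.append(block)               # send this block to the cores
        entry = table[block]
        if entry.brk is None:             # not a loop block: follow next
            block = entry.next
            continue
        loop_deliveries += 1
        exited = {c for c, n in exit_after.items() if loop_deliveries >= n}
        # Follow break only once every core executing the loop has requested exit.
        block = entry.brk if exited == set(exit_after) else entry.next
    return order


order = delivery_order(TABLE, 1, exit_after={0: 4, 1: 2})  # two passes through blocks 2 and 3
```

Core 1 finishes early, but block 4 is sent only after core 0 also requests the exit, which matches the rule that loop-exit requests must come from all cores executing the loop.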

Fig. 8 shows the instruction pre-delivery device according to the embodiment, described in detail below.

The instruction pre-delivery device includes:

an instruction super block table setting unit 13, configured to set up the instruction super block table of the instruction blocks, the table containing the identifier of each instruction block, its storage address and the identifier of the next instruction block;

a feedback unit 12, configured to issue an instruction block send signal, the instruction blocks being obtained by dividing the program according to its execution order; the feedback unit 12 is connected to the instruction super block table setting unit 13.

The feedback unit 12 is further configured to stop issuing the instruction block send signal after receiving a stop response from an arithmetic core, and, after subsequently receiving a resume-reception response from that core, to issue the instruction block send signal again.

The divided instruction blocks can be stored in a standalone memory; after the feedback unit 12 issues the instruction block send signal, the standalone memory sends the instruction blocks to the arithmetic cores.

The instruction pre-delivery device can also include a built-in instruction memory (not shown), connected to the feedback unit 12 and used to store the instruction blocks; after receiving the instruction block send signal from the feedback unit 12, the instruction memory sends instruction blocks, in execution order, to at least one arithmetic core.

The instruction pre-delivery device further includes:

a pre-delivery query unit 11, configured to send a pre-delivery query request to each arithmetic core before the instructions of an instruction block are sent, the request containing information about the instruction to be pre-delivered; the pre-delivery query unit 11 is connected to the instruction super block table setting unit 13 and to the feedback unit 12;

the feedback unit 12 is further configured to generate feedback based on each arithmetic core's response to the pre-delivery query request.

The instruction pre-delivery device can further include:

a block division unit (not shown), configured to divide the program into the instruction blocks according to its execution order;

an instruction memory (not shown), configured to store the instruction blocks and, after receiving the feedback or the instruction block send signal from the feedback unit, to send instruction blocks in execution order to at least one arithmetic core.

If none of the responses from the arithmetic cores is an instruction conflict response, the feedback unit 12 issues feedback to send the pre-delivered instruction to the cores that returned an instruction reception response. If the responses include an instruction conflict response, the feedback unit 12 issues feedback to wait for the cores that returned the conflict response to respond again, until none of the responses from the arithmetic cores is an instruction conflict response.

If the responses from all arithmetic cores include neither an instruction conflict response nor a delayed reception response, the feedback unit 12 issues feedback to send the pre-delivered instruction to the cores that returned an immediate reception response. If the responses include an instruction conflict response and/or a delayed reception response, the feedback unit 12 issues feedback to wait for the cores that returned a conflict response to respond again and for the cores that returned a delayed reception response to return an immediate reception response, until the responses from all arithmetic cores include neither an instruction conflict response nor a delayed reception response.

The instruction pre-delivery device can further include:

an information table setting unit (not shown), configured to set up an information table recording the arithmetic cores that are to receive the instructions of pre-delivered instruction blocks and the identifiers of the pre-delivered blocks each such core must receive; the pre-delivery query unit sends the instructions of the instruction blocks, in execution order, to the arithmetic cores specified in the information table;

the feedback unit 12 is further configured, upon receiving a first self-fetch request from an arithmetic core, to issue feedback instructing the pre-delivery query unit to stop sending pre-delivery query requests and pre-delivered instructions to that core;

the feedback unit 12 is further configured, after receiving a first self-fetch request from a core specified in the information table and before the pre-delivery query unit sends that core a pre-delivery query request for the instructions of a pre-delivered block it must receive, to check the core's state. If the core has since issued a mode switch request, the feedback unit 12 issues feedback instructing the pre-delivery query unit to send the core that pre-delivery query request; if the core has not yet issued a mode switch request, the feedback unit 12 waits until it receives the mode switch request from that core.

When an arithmetic core stops accepting the instructions of instruction blocks for good, it issues a second self-fetch request. Upon receiving a second self-fetch request from an arithmetic core, the feedback unit 12 issues feedback to stop sending pre-delivery query requests to that core.

In a specific implementation, the instruction pre-delivery device further includes a function packaging unit (not shown) which, when the program contains function calls, packages each called function in the program into a block function; the block division unit further divides each block function into instruction blocks according to the program's execution order. The instruction super block table of an instruction block that calls a block function further includes a call flag indicating the call and the identifier of the instruction block to return to after the function call; the instruction super block table of the last instruction block in a block function further includes a return flag indicating the return. When the call flag in the instruction super block table of the block containing the instruction to be sent is valid, a recording unit (not shown) pushes the return-block identifier recorded in that table onto the function stack. When the return flag in the instruction super block table of the block containing the instruction to be sent is valid, after that block has been sent, the pre-delivery query unit 11 pops the return-block identifier from the top of the function stack to determine the next instruction block to send.

In a specific implementation, when the program contains a loop, the instruction super block table of each instruction block forming the loop also includes the identifier of the next instruction block after the loop exits. When the feedback unit 12 has received loop-exit requests from all arithmetic cores executing the loop blocks, it reads the post-loop block identifier from the instruction super block table of the loop blocks and passes it to the pre-delivery query unit 11; on receiving this identifier, the pre-delivery query unit 11 sends each arithmetic core a pre-delivery query request containing the pre-delivered instructions of the corresponding block.

The arithmetic core includes:

An instruction storage unit 21, configured to receive and store the instructions of pre-delivered instruction blocks;

An arithmetic unit 22, configured to run the instructions stored in the instruction storage unit; the arithmetic unit 22 is connected to the instruction storage unit 21.

The arithmetic core further includes:

An instruction transmission processing unit 23, configured to respond to a pre-delivery query request sent by the instruction pre-delivery device, based on the instruction block currently running and the instruction blocks stored in the instruction storage unit. The instruction transmission processing unit 23 is connected to both the instruction storage unit 21 and the arithmetic unit 22.

Specifically, the pre-delivery query request contains the storage address of the pre-delivered instructions in the instruction memory and the storage address, in the instruction storage unit 21, of the first instruction of the instruction block to which the pre-delivered instructions belong.

In a specific implementation, the pre-delivered instruction blocks are stored in the instruction memory, and the instruction blocks received by the arithmetic core are kept in the instruction storage unit 21. There is a mapping between the storage address of an instruction block in the instruction storage unit 21 and its storage address in the instruction memory.

If the pre-delivered instructions would overwrite instructions currently being executed by the arithmetic unit 22 of the arithmetic core and/or instructions that are stored but not yet executed, the instruction transmission processing unit 23 of that core returns an instruction conflict response. If the pre-delivered instructions are needed by the arithmetic core, the instruction transmission processing unit 23 returns an instruction receive response. If the pre-delivered instructions are already stored in the instruction storage unit 21 of the arithmetic core, the instruction transmission processing unit 23 returns an instruction discard response.

If the pre-delivered instructions are needed by the arithmetic core and its instruction storage unit 21 has enough space to store them, the instruction transmission processing unit 23 returns an immediate receive response. If the instructions are needed but the instruction storage unit 21 lacks sufficient space to store them, the instruction transmission processing unit 23 returns a delayed receive response.
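The response rules of the instruction transmission processing unit 23 can be written as a small decision function. This is a sketch under stated assumptions: addresses and storage slots are modelled as Python sets, the caller supplies the slots the incoming instructions would occupy, the discard check is evaluated first (so resident instructions are not reported as conflicting with themselves), and answering unneeded instructions with a discard response is an assumption, since the text does not cover that case. `Response` and `respond` are hypothetical names.

```python
from enum import Enum
from typing import Set


class Response(Enum):
    INSTRUCTION_CONFLICT = "instruction conflict"
    IMMEDIATE_RECEIVE = "immediate receive"
    DELAYED_RECEIVE = "delayed receive"
    INSTRUCTION_DISCARD = "instruction discard"


def respond(incoming: Set[int], target_slots: Set[int], resident: Set[int],
            busy_slots: Set[int], needed: Set[int], free_slots: int) -> Response:
    """Decide how a core answers one pre-delivery query.

    incoming:     instruction-memory addresses offered by the query
    target_slots: storage-unit slots those instructions would occupy
    resident:     addresses already held in the instruction storage unit
    busy_slots:   slots holding running or stored-but-unexecuted instructions
    needed:       addresses the core still needs
    free_slots:   free space in the storage unit, in instructions
    """
    if incoming <= resident:               # already stored: nothing to receive
        return Response.INSTRUCTION_DISCARD
    if target_slots & busy_slots:          # would overwrite running or unexecuted code
        return Response.INSTRUCTION_CONFLICT
    if incoming <= needed:
        if len(incoming) <= free_slots:
            return Response.IMMEDIATE_RECEIVE
        return Response.DELAYED_RECEIVE    # needed, but not enough room yet
    return Response.INSTRUCTION_DISCARD    # assumption: unneeded instructions are declined
```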

After sending the delayed receive response, the instruction transmission processing unit 23 waits for a delay and then sends an immediate receive response.

After sending an instruction conflict response, the instruction transmission processing unit 23 responds again once the conflict has been resolved. A conflict occurs when the pre-delivered instructions would overwrite instructions currently being executed by the arithmetic unit 22 of the arithmetic core and/or instructions that are stored but not yet executed.

After finishing execution of the instruction blocks that make up a loop, the arithmetic unit 22 of the arithmetic core sends a loop stop request.

The instruction storage unit 21 of the arithmetic core can suspend accepting the instructions of pre-delivered instruction blocks and send an autonomous fetch request; after sending the autonomous fetch request, it can resume accepting the instructions of instruction blocks by sending a mode switch request. The instruction storage unit 21 can also refuse the instructions of instruction blocks and send an autonomous fetch request.

The instruction storage unit 21 can also send a stop response, after which it stops receiving pre-delivered instruction blocks. After sending the stop response, it can send a resume-receive response, after which it again receives pre-delivered instruction blocks.

When an exception occurs inside the arithmetic core, the instruction storage unit 21 sends a stop response and stops receiving pre-delivered instruction blocks. Examples are the storage space of the instruction storage unit 21 being full so that no new instruction block can be stored, or a pre-delivered instruction block conflicting with the instruction block currently being executed by the arithmetic unit 22. Once the exception is cleared, the instruction storage unit 21 can send a resume-receive response and again receive pre-delivered instruction blocks, for example when executed instruction blocks have been deleted and the instruction storage unit 21 has room for new blocks, or when the currently stored instruction blocks may be overwritten by subsequently pre-delivered blocks.
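The stop and resume-receive behaviour can be modelled as a two-state receiver. The sketch below is hypothetical (`InstructionStorageUnit`, `offer`, `retire`, and the response strings are illustrative names) and covers only the storage-full exception, assuming sizes are counted in instructions and that receiving resumes as soon as any space is freed.

```python
from typing import List, Optional, Tuple


class InstructionStorageUnit:
    """Hypothetical model of the stop / resume-receive handshake (storage-full case only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.held: List[Tuple[str, int]] = []   # (block name, size in instructions)
        self.receiving = True

    def used(self) -> int:
        return sum(size for _, size in self.held)

    def offer(self, name: str, size: int) -> str:
        """Response to a pre-delivered block."""
        if not self.receiving:
            return "not receiving"
        if self.used() + size > self.capacity:
            self.receiving = False               # storage full: stop receiving
            return "stop response"
        self.held.append((name, size))
        return "received"

    def retire(self, name: str) -> Optional[str]:
        """Delete an executed block; resume receiving once space has been freed."""
        self.held = [(n, s) for n, s in self.held if n != name]
        if not self.receiving and self.used() < self.capacity:
            self.receiving = True
            return "resume-receive response"
        return None


unit = InstructionStorageUnit(capacity=20)
trace = [unit.offer("B0", 5), unit.offer("B1", 20), unit.offer("B2", 10),
         unit.retire("B0"), unit.offer("B1", 20)]
```

The trace shows B1 being refused while B0 occupies space, the unit ignoring B2 while stopped, and B1 being accepted after B0 is retired.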

The arithmetic core may also include the instruction pre-delivery device.

The instructions are stored in the instruction memory 3 (see Figure 13) in the order instructions 0-4 → instructions 5-24 → instructions 25-34. The instructions are divided into blocks: instruction block B0 (original instructions 0-4), instruction block B1 (original instructions 5-24), and instruction block B2 (original instructions 25-34). For ease of description, the instructions within each block are renamed; in Figure 10, the instructions within an instruction block are named as in the "Instruction names in block" column of Table 2, which gives the correspondence:

Table 2

Instruction block    Original instruction names    Instruction names in block
B0                   Instructions 0-4              Instructions 0-4
B1                   Instructions 5-24             Instructions 0-19
B2                   Instructions 25-34            Instructions 0-9

Three instruction blocks are stored in the instruction memory 3: instruction block B0 with five instructions (0-4), instruction block B1 with twenty instructions (0-19), and instruction block B2 with ten instructions (0-9). The blocks are stored sequentially in the instruction memory 3. The instruction storage unit 21 can hold at most 20 instructions and is divided into 20 storage addresses, numbered sequentially from address 0 to address 19. The storage addresses of the blocks in the instruction storage unit 21 correspond to those in the instruction memory 3 as follows: instruction block B0 (instructions 0-4) occupies addresses 0-4; instructions 0-14 of instruction block B1 occupy addresses 5-19, and its instructions 15-19 also occupy addresses 0-4 (the same address can be reused); instruction block B2 (instructions 0-9) occupies addresses 5-14.
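The preset layout in this example coincides with a direct-mapped rule in which the storage address is the instruction-memory address modulo the 20-instruction capacity. The sketch below reproduces the addresses listed above under that assumption; the modulo rule is an inference from the example rather than a rule stated in the text, and `CAPACITY`, `BLOCKS`, `block_slots`, and `LAYOUT` are hypothetical names.

```python
from typing import Dict, List

CAPACITY = 20  # instruction storage unit 21 holds at most 20 instructions

# (first instruction-memory address, number of instructions) for each block in Table 2
BLOCKS = {"B0": (0, 5), "B1": (5, 20), "B2": (25, 10)}


def block_slots(first: int, count: int, capacity: int = CAPACITY) -> List[int]:
    """Storage addresses of a block's instructions under an assumed direct-mapped rule."""
    return [(first + i) % capacity for i in range(count)]


LAYOUT: Dict[str, List[int]] = {name: block_slots(*spec) for name, spec in BLOCKS.items()}
```

For B1, the first 15 instructions land on addresses 5-19 and the last 5 wrap around to addresses 0-4, which is exactly the address reuse described above.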

The address mapping of the instruction blocks above is preset; in other embodiments it may be defined by other rules. Once defined, however, the mapping must not be changed during the entire pre-delivery process, and the instruction storage unit 21 must store the instructions of each instruction block according to it.

Instructions in an instruction block can be stored in the instruction memory 3 and the instruction storage unit 21 in several ways. The most common is cache-style management, where one cache line may contain dozens or even hundreds of instructions, or only one. In this embodiment, instructions are stored in the instruction memory 3 and the instruction storage unit 21 as a direct-mapped cache. With direct-mapped cache management, the instruction storage unit 21 never lacks space to hold pre-delivered instructions, so no delayed receive response is generated.

Based on this mapping and on the pre-delivery query request, the arithmetic core can respond to the pre-delivery query request.

If the pre-delivered instructions would overwrite instructions that the arithmetic core is currently executing, or instructions that are stored but not yet executed, or both, the core returns an instruction conflict response. If the pre-delivered instructions are needed by the core, it returns an instruction receive response. If the pre-delivered instructions are already stored in the core, it returns an instruction discard response.

Specifically, instruction block B0 is the first instruction block stored in the instruction storage unit 21 and occupies addresses 0-4. Instruction block B1 is sent next; if the agreed granularity is 5 instructions per transfer, instructions 0-4 of B1 are sent first. The pre-delivery query request for this transfer contains the storage address of instruction block B1 in the instruction memory and the storage address of the first instruction of B1 in the instruction storage unit 21, namely address 5.

Because instruction block B0 occupies only addresses 0-4 and addresses 5-19 are free, instructions 0-4 of B1 can be stored. Instructions 5-9 and then 10-14 of B1 are sent at the agreed granularity, and the instruction storage unit 21 stores instructions 0-14 of B1 in addresses 5-19 in order, according to the mapping.

When instructions 15-19 of B1 are then sent, no free address remains in the instruction storage unit 21, and under the mapping these instructions go to addresses 0-4. If the arithmetic core is still executing instruction block B0, the pre-delivered instructions would overwrite instructions that the core is currently executing or that are stored but not yet executed, so they cannot be received and the core returns an instruction conflict response.

If instruction block B0 has finished executing, the instructions at addresses 0-4 of the instruction storage unit 21 may be overwritten, and the core returns an instruction receive response.

After the instructions of B1 have been pre-delivered, pre-delivery continues with the instructions of instruction block B2. The pre-delivery query request for B2 contains the storage address of B2 in the instruction memory and the storage address of the first instruction of B2 in the instruction storage unit 21, namely address 5.

Likewise, if instructions 0-4 of B1 are being executed or are stored but not yet executed, instructions 0-4 of B2 cannot be stored and the arithmetic core returns an instruction conflict response. If instructions 0-4 of B1 have all been executed, instructions 0-4 of B2 can be stored and the core returns an instruction receive response.

If instructions 5-9 of B1 are being executed or are stored but not yet executed, instructions 5-9 of B2 cannot be stored and the arithmetic core returns an instruction conflict response. If instructions 5-9 of B1 have all been executed, instructions 5-9 of B2 can be stored and the core returns an instruction receive response. In a specific implementation, the instructions of an instruction block are sent at the agreed granularity: before each transfer a pre-delivery query request is sent, and the subsequent operation waits for the arithmetic core's response.
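The chunk-by-chunk exchange for block B1 can be sketched as follows, assuming the modulo-20 placement that reproduces the layout of this example and the 5-instruction granularity. `chunk_queries` and `chunk_response` are hypothetical helpers, and the response strings stand in for the instruction conflict and instruction receive responses.

```python
from typing import List, Set, Tuple

CAPACITY = 20  # instruction storage unit 21 holds 20 instructions
GRANULE = 5    # agreed granularity: 5 instructions per transfer


def chunk_queries(first_addr: int, count: int) -> List[Tuple[range, int]]:
    """Split one block into granule-sized pre-delivery queries.

    Each query carries the chunk's instruction-memory addresses and the storage
    address of the block's first instruction (assumed slot = address % CAPACITY).
    """
    first_slot = first_addr % CAPACITY
    return [(range(first_addr + off, first_addr + min(off + GRANULE, count)), first_slot)
            for off in range(0, count, GRANULE)]


def chunk_response(chunk: range, busy_slots: Set[int]) -> str:
    """Conflict if any target slot holds running or stored-but-unexecuted instructions."""
    target_slots = {addr % CAPACITY for addr in chunk}
    return "instruction conflict" if target_slots & busy_slots else "instruction receive"


b1_queries = chunk_queries(5, 20)            # B1 = original instructions 5-24
last_chunk, b1_first_slot = b1_queries[-1]   # B1 instructions 15-19 map to slots 0-4
while_b0_runs = chunk_response(last_chunk, busy_slots=set(range(0, 5)))
after_b0_done = chunk_response(last_chunk, busy_slots=set())
```

B1 produces four queries, all naming address 5 as the storage address of its first instruction; only the last one collides with B0, and only while B0 is still running.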

The instructions in instruction block B1 and those in instruction block B2 are never the same instructions. Whether two instructions are the same is determined by their storage addresses in the instruction memory, not by the addresses they share in the instruction storage unit 21: instructions with different instruction-memory addresses are treated as different instructions even if their contents are identical.

Suppose the instruction block sequence B0-B1-B2 is executed once and then repeated, for example B0-B1-B2-B0-B1-B2. B0 occupies addresses 0-4; the 20 instructions of B1 occupy addresses 5-19 and 0-4; the 10 instructions of B2 occupy addresses 5-14. In the first round (B0-B1-B2), B0, B1 and B2 are all pre-delivered. When pre-delivery of B2 ends, the instruction storage unit 21 holds instructions 15-19 of B1 at addresses 0-4, instructions 0-9 of B2 at addresses 5-14, and instructions 10-14 of B1 at addresses 15-19. In the second round (B0-B1-B2), pre-delivering B0 overwrites addresses 0-4 with instructions 0-4 of B0. When B1 is pre-delivered, its instructions 0-9 must be received again because addresses 5-14 hold instructions of B2, whereas its instructions 10-14 are still in the instruction storage unit 21 and need not be pre-delivered, so the core returns an instruction discard response to that pre-delivery query request.
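The two-round replay can be checked with a small simulation. It is a sketch under simplifying assumptions: placement is the modulo-20 direct mapping that reproduces this example, each block is assumed to have finished executing before the next chunk overwrites it (so conflict responses are not modelled), and `replay`, `BLOCKS`, and the log format are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

CAPACITY, GRANULE = 20, 5
# (first instruction-memory address, number of instructions)
BLOCKS: Dict[str, Tuple[int, int]] = {"B0": (0, 5), "B1": (5, 20), "B2": (25, 10)}


def replay(sequence: List[str]) -> Tuple[List[Optional[int]], List[Tuple[str, int, str]]]:
    """Replay pre-delivery into a direct-mapped storage unit.

    Returns the final slot contents (instruction-memory address held by each slot)
    and a log of (block, first in-block instruction of the chunk, response).
    """
    slots: List[Optional[int]] = [None] * CAPACITY
    log: List[Tuple[str, int, str]] = []
    for name in sequence:
        first, count = BLOCKS[name]
        for off in range(0, count, GRANULE):
            chunk = range(first + off, first + min(off + GRANULE, count))
            if all(slots[addr % CAPACITY] == addr for addr in chunk):
                log.append((name, off, "discard"))    # chunk is already resident
            else:
                for addr in chunk:
                    slots[addr % CAPACITY] = addr      # receive (overwrite) the chunk
                log.append((name, off, "receive"))
    return slots, log


after_round_one, _ = replay(["B0", "B1", "B2"])
_, two_rounds_log = replay(["B0", "B1", "B2"] * 2)
```

In the replay, the only chunk answered with a discard response is instructions 10-14 of B1 in the second round, matching the description above.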

Specifically, the instruction receive response is divided into an immediate receive response and a delayed receive response. If the pre-delivered instructions are needed by the arithmetic core and the core has enough storage space for them, the core returns an immediate receive response; if they are needed but the core lacks enough storage space, it returns a delayed receive response. Under direct-mapped cache instruction management, a delayed receive response can arise only from an instruction conflict. Under other instruction management schemes, the arithmetic core returns an instruction conflict response or a delayed receive response depending on the situation.

After sending the delayed receive response, the instruction transmission processing unit 23 responds again after a period of time (called the delay), once the conflict has been resolved.

After sending an instruction conflict response, the instruction transmission processing unit 23 responds again once the conflict has been resolved. Conflicts include the pre-delivered instructions overwriting instructions the arithmetic core is currently executing and/or instructions that are stored but not yet executed, and the arithmetic core lacking enough storage space for the instructions of the instruction block. The new response may be an immediate receive response, a delayed receive response, an instruction conflict response, or an instruction discard response.

One of the many arithmetic cores can be designated as the instruction pre-delivery device; that arithmetic core then also includes the instruction pre-delivery device shown in Figure 8.

The arithmetic core is typically a processor that executes instruction blocks and responds to pre-delivery query requests, but it can also be integrated with the instruction pre-delivery device. That is, the instruction pre-delivery device and the arithmetic core may be separate components that communicate with each other to handle instructions, or they may be integrated into a single component that performs the work of both.

Figure 11 is a schematic diagram of a program and its instruction blocks according to an embodiment of the present invention, Figure 12 is a schematic diagram of the instructions within the instruction blocks according to an embodiment, and Figure 13 shows the instruction management system according to an embodiment. These are described in detail below with reference to Figures 11 to 13.

Specific embodiment:

Before a program runs on an arithmetic core, it must be divided into instruction blocks. A program consists of instructions, and each resulting instruction block contains some of those instructions. The program may be a large system-level program, a small application program, or a program made up of selected functional modules of a complete program. This embodiment uses a small application program as an example; other embodiments may use a large system-level program, selected functional modules of a complete program, or other programs such as a software installer, without being limited by this embodiment.

Dividing a program into instruction blocks follows these main principles:

1. The program is divided into instruction blocks according to its execution order;

2. After division, the space occupied by any single instruction block is no greater than the storage space of the arithmetic core's instruction storage unit;

3. Each instruction block contains only complete instructions;

4. Instruction blocks must not overlap.
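Principles 2 and 4 lend themselves to a mechanical check. In this sketch, blocks are half-open ranges of instruction indices, so principle 3 (whole instructions) holds by construction and principle 1 is left to the caller; `validate_blocks` and its parameters are hypothetical. The two example calls mirror the allowed and disallowed divisions used later to illustrate principle 4.

```python
from typing import List, Tuple


def validate_blocks(blocks: List[Tuple[int, int]], program_length: int,
                    storage_capacity: int) -> List[str]:
    """Check a proposed division; blocks are (start, end) index ranges with end exclusive."""
    errors: List[str] = []
    for start, end in blocks:
        if not (0 <= start < end <= program_length):
            errors.append(f"block [{start}, {end}) lies outside the program")
        elif end - start > storage_capacity:          # principle 2
            errors.append(f"block [{start}, {end}) exceeds the storage unit ({storage_capacity})")
    ordered = sorted(blocks)
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 < e1:                                   # principle 4
            errors.append(f"blocks [{s1}, {e1}) and [{s2}, {e2}) overlap")
    return errors


# Instructions 0-5 and 6-10: allowed.  Instructions 0-8 and 5-10: overlapping.
allowed = validate_blocks([(0, 6), (6, 11)], program_length=11, storage_capacity=20)
overlapping = validate_blocks([(0, 9), (5, 11)], program_length=11, storage_capacity=20)
```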

A program can be divided into instruction blocks by a partitioning algorithm, or with the help of practical experience, for example based on instruction execution time or the number of times instructions are called.

Specifically, regarding principle 1: programs are generally contiguous and have a definite execution order, so a program is divided into instruction blocks according to that order. Because the program is contiguous, the instruction blocks divided in execution order are generally contiguous as well.

The execution order of the divided instruction blocks is the same as that of the program; that is, the execution trace of the instruction blocks matches the program's execution order, meaning the order in which the program's instructions execute.

Consider, for example, a loop, which repeatedly executes certain instructions. If these instructions were spread across different instruction blocks, the same instructions would be cycled through repeatedly when pre-delivery query requests are sent, which would reduce the execution efficiency of the arithmetic core. Such instructions are therefore normally placed in the same instruction block, which then contains the complete loop. Viewed as a sequence of blocks, execution is sequential; within the block, however, the instructions execute in a loop, exactly matching their execution order in the program.

In special cases, however, such instructions can be spread across different instruction blocks. For example, in a nested loop, the outer-loop instructions and the inner-loop instructions can be placed in different instruction blocks.

Dividing a program into instruction blocks is a logical division: the result is an instruction block sequence or instruction block list. The divided blocks only describe, in the form of an instruction block sequence, which instruction blocks the program has and which instructions each block contains; the program is not physically split.

When the program contains function calls, each called function is divided into a separate block function. The instructions of the function within a block function are also divided into instruction blocks according to the above principles. Note that a valid return flag must be set in the instruction superblock table of the last instruction block of each block function.

For example, suppose the program to be divided contains 15 instructions. Taking into account the run time and call counts of the instructions, it is divided into three instruction blocks of 5 instructions each. The result is represented as an instruction block sequence rather than as three independent blocks, as shown in Table 3:

Table 3

Instruction block      Original instruction names    Instruction names in block
Instruction block 1    Instructions 0-4              Instructions 0-4
Instruction block 2    Instructions 5-9              Instructions 0-4
Instruction block 3    Instructions 10-14            Instructions 0-4

As Table 3 shows, the instruction block sequence lists the name of each instruction block and the instructions it contains. When sending, the instruction blocks are taken in order at the agreed granularity (in this embodiment, 5 instructions per transfer). Instruction block 1 is sent first; its pre-delivery query request carries the instruction-memory storage address of the first five original instructions (instructions 0-4) and the storage address of original instruction 0 in the instruction storage unit. Instruction block 2 is sent next, with a query request carrying the instruction-memory storage address of original instructions 5-9 and the storage-unit address of original instruction 5. Finally, instruction block 3 is sent, with a query request carrying the instruction-memory storage address of original instructions 10-14 and the storage-unit address of original instruction 10.
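The query sequence for Table 3 can be generated from the logical block sequence. In this sketch, `InstructionBlock` and `pre_delivery_queries` are hypothetical, and the instruction storage unit is assumed (the text does not state it for this example) to hold 20 instructions with modulo placement, so storage-unit addresses equal instruction-memory addresses here.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class InstructionBlock:
    name: str
    start: int   # instruction-memory address of the block's first instruction
    count: int   # number of instructions in the block


# Table 3: a logical division of a 15-instruction program; nothing is physically split.
SEQUENCE: List[InstructionBlock] = [
    InstructionBlock("Instruction block 1", 0, 5),
    InstructionBlock("Instruction block 2", 5, 5),
    InstructionBlock("Instruction block 3", 10, 5),
]


def pre_delivery_queries(sequence: List[InstructionBlock], granule: int = 5,
                         capacity: int = 20) -> Iterator[Dict[str, object]]:
    """Yield one query per granule-sized transfer, in block order."""
    for block in sequence:
        for off in range(0, block.count, granule):
            n = min(granule, block.count - off)
            yield {
                "block": block.name,
                "memory_addresses": range(block.start + off, block.start + off + n),
                "first_instruction_storage_address": block.start % capacity,
            }


QUERIES = list(pre_delivery_queries(SEQUENCE))
```

Because each block holds exactly one granule, the sketch emits one query per block, carrying the addresses of instructions 0-4, 5-9 and 10-14 in turn.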

Regarding principle 2: the space occupied by each divided instruction block must not exceed the storage space of the arithmetic core's instruction storage unit. Instruction blocks are pre-delivered into that unit, so a block larger than the unit could not be stored.

For multi-core and many-core processors, the instruction capacity of each divided block must not exceed the storage space of any arithmetic core's instruction storage unit, that is, it must not exceed the smallest instruction storage unit. Instructions can be stored in several ways; a common one is in cache lines. A cache line may be 128, 256, or 512 bytes and contains a number of instructions, with every cache line containing the same number. An instruction block generally consists of several cache lines, for example 128, 256, or 512 bytes in total.

Specifically, the instruction capacity of a single divided instruction block must not exceed the storage size of the arithmetic core's instruction storage unit. Instructions are transmitted at a predetermined agreed granularity; when instructions are stored in cache lines, this granularity is usually one cache line. For example, with 128-byte cache lines and 4-byte instructions, one cache line holds 32 instructions.

In general, the number of instructions in an instruction block should preferably be an integer multiple of the agreed transfer granularity. For example, if instructions are stored in cache lines, the transfer granularity is one cache line (or several), and a cache line is 128 bytes, instruction blocks can be one cache line (128 bytes), two cache lines (256 bytes), or four cache lines (512 bytes). If instructions are stored in another way and the agreed granularity is 5 instructions, a single instruction block may contain 10, 15, or 20 instructions.
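The sizing arithmetic in the last two paragraphs can be sketched as two helpers. `instructions_per_line` and `block_size_ok` are hypothetical names, and the 20-instruction storage size used in the example is an assumption; the capacity check restates principle 2 and the multiple-of-granularity test restates the preference above.

```python
def instructions_per_line(line_bytes: int, instruction_bytes: int) -> int:
    """Number of fixed-size instructions in one cache line."""
    if line_bytes % instruction_bytes:
        raise ValueError("a cache line must hold a whole number of instructions")
    return line_bytes // instruction_bytes


def block_size_ok(block_instructions: int, granule_instructions: int,
                  storage_instructions: int) -> bool:
    """Preferred block size: whole transfer granules that still fit the storage unit."""
    return (block_instructions % granule_instructions == 0
            and block_instructions <= storage_instructions)


# 128-byte line with 4-byte instructions: 32 instructions per transfer granule
PER_LINE = instructions_per_line(128, 4)
# With a 5-instruction granule and an assumed 20-instruction storage unit
GOOD_SIZES = [n for n in (10, 12, 15, 20, 25) if block_size_ok(n, 5, 20)]
```

`GOOD_SIZES` evaluates to `[10, 15, 20]`: 12 is not a whole number of granules and 25 would not fit.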

Regarding principle 3: an instruction block contains only complete instructions. Instructions are stored in cache lines, and a cache line holds one or several instructions.

As in the earlier example, a cache line is the agreed unit of replacement and transfer between on-chip cache and main memory; it can, for example, be the transfer unit between the instruction memory and the arithmetic core. The agreed granularity above is the number of instructions transferred each time: with 256-byte cache lines and 4-byte instructions, one cache line holds 64 instructions, so a granularity of one cache line (256 bytes) transfers 64 instructions at a time. Addresses in the arithmetic core are normally byte-addressed. If a cache line is defined as 256 bytes, every transfer moves 256 bytes, and the address given, such as 0x100, has its low-order bits (the low 8 bits) all equal to 0; this is called 256-byte boundary alignment, and it makes memory accesses most efficient.

Dividing into instruction blocks means grouping many instructions into several groups, each called an instruction block; a block may occupy several cache lines or less than one. Requiring an instruction block to contain complete instructions means that, preferably, each block contains whole cache lines (that is, blocks are divided on cache-line boundaries), so that, for example, a 256-byte-aligned cache line never has its first half in one instruction block and its second half in the next.

An instruction in the arithmetic core or the instruction memory is usually 4 or 8 bytes and is stored contiguously in main memory or on-chip cache (instruction memory). If the cache line size is agreed to be 256 bytes, an integer multiple of both 4 and 8, then with boundary-aligned addressing the bytes of a single instruction are never split across different cache lines when a cache line is transferred.
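The alignment property can be checked directly. The sketch assumes a power-of-two line size (256 bytes, as above) and fixed-size instructions stored contiguously from an aligned address; `is_line_aligned`, `crosses_line`, and `NO_SPLIT` are hypothetical names.

```python
LINE_BYTES = 256  # agreed cache line size; must be a power of two for the mask test


def is_line_aligned(address: int, line_bytes: int = LINE_BYTES) -> bool:
    """True if the low log2(line_bytes) address bits are all zero."""
    return (address & (line_bytes - 1)) == 0


def crosses_line(address: int, instruction_bytes: int, line_bytes: int = LINE_BYTES) -> bool:
    """True if an instruction starting at `address` spans two cache lines."""
    return address // line_bytes != (address + instruction_bytes - 1) // line_bytes


# With 4-byte instructions stored contiguously from address 0, none straddles a line.
NO_SPLIT = not any(crosses_line(addr, 4) for addr in range(0, 4096, 4))
```

An instruction starting at a misaligned address such as 0xFE would straddle two lines, which is the situation boundary alignment avoids.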

Boundary alignment improves memory access efficiency, but it is not mandatory; blocks need not be aligned.

Regarding principle 4: instruction blocks must not overlap. For example, suppose a program contains the consecutive instructions 0 to 10 and is divided into two sequential instruction blocks, instruction block 1 containing instructions 0 to 5 and instruction block 2 containing instructions 6 to 10. This division is allowed.

A division in which instruction block 1 contains instructions 0 to 8 and instruction block 2 contains instructions 5 to 10, however, is not allowed.

In addition, an instruction block is a logical division of program instructions that are already in main memory (the instruction memory). The physical address (storage address) of the program instructions in the instruction memory is fixed, and division simply designates the range from storage address addr_x to addr_y as instruction block B_z. The instruction block sequence of Table 3 can thus be expressed as Table 4:

Table 4

Instruction block      Original instructions    Instruction names in block    Storage address
Instruction block 1    Instructions 0-4         Instructions 0-4              addr_0 to addr_4
Instruction block 2    Instructions 5-9         Instructions 0-4              addr_5 to addr_9
Instruction block 3    Instructions 10-14       Instructions 0-4              addr_10 to addr_14

When an instruction block is pre-delivered, the query request asks the arithmetic core whether it needs the instructions in the address range addr_x to addr_x+m (where m is the offset of the storage address). The arithmetic core responds appropriately based on the instruction block it is currently executing, the instruction blocks that are stored but not yet executed, and the pre-delivery query request.

When running instructions, the arithmetic core fetches them from the instruction memory by address and stores the fetched instructions in the instruction storage unit according to the mapping described earlier.

There are many ways to divide instruction blocks, and different programs call for different approaches. In a specific implementation, division must follow the four principles above; the instruction block sequence is chosen according to actual needs and is not limited to the examples above.

Figure 11 is a schematic diagram of the instruction blocks of an application program in an embodiment of the present invention, described in detail below.

In this application example, program B is a small application program, and it is divided into instruction blocks according to the four principles above.

When dividing instruction blocks, the execution sequence of program B is considered first: once program B is designed, its execution sequence is fixed. The instruction block sequence of program B can be divided according to these rules, and however it is divided, the final instruction trace must be consistent with that of the original program B.

In Figure 11, program B comprises program segment 0, program segment 1 and program segment 2, executed in the order program segment 0 → program segment 1 → program segment 2. Each program segment can therefore be divided into its own instruction block, so program B is divided, in execution order, into three instruction blocks: instruction block B0, instruction block B1 and instruction block B2.

In Figure 11, the arrows show the execution sequence of instruction blocks B0, B1 and B2, i.e., B0 → B1 → B2.

In other embodiments, if program A comprises program segments 0, 1 and 2 and its execution jumps, for example program segment 1 → program segment 0 → program segment 2, then when this or a similar jumping sequence occurs, program segments 0, 1 and 2 can be placed in the same instruction block (i.e., the whole program contains only one instruction block), or the division used in the present embodiment can be adopted. If program C comprises program segments 0, 1 and 2 and is executed cyclically, for example program segment 0 → program segment 1 → program segment 2 → program segment 1 → program segment 2 → program segment 1 → program segment 2 → program segment 0, then when this or a similar loop occurs, program segments 1 and 2 can be divided into one instruction block B1 and program segment 0 into a separate instruction block B0, so that the instruction block sequence alternates between B0 and B1: B0 → B1 → B0. In practice, instruction blocks are divided according to the actual situation and are not limited to the examples above.
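The three examples above differ only in which program segments are grouped into the same block. Once a segment-to-block assignment is chosen, the resulting block sequence can be read off an execution trace by merging consecutive steps that stay inside one block. This is a minimal sketch: the traces and assignments are written by hand, the helper name `block_sequence` is hypothetical, and choosing the assignment itself (the part governed by the division principles) is not automated here.

```python
from itertools import groupby
from typing import Dict, List


def block_sequence(trace: List[int], seg_to_block: Dict[int, str]) -> List[str]:
    """Turn an execution trace of program segments into an instruction block
    sequence, merging consecutive steps that remain inside the same block."""
    blocks = [seg_to_block[seg] for seg in trace]
    # groupby collapses runs of equal consecutive block names into one entry
    return [block for block, _ in groupby(blocks)]


# Program B: sequential segments, one block per segment.
print(block_sequence([0, 1, 2], {0: "B0", 1: "B1", 2: "B2"}))  # ['B0', 'B1', 'B2']
# Program A: jumping execution, whole program placed in one block.
print(block_sequence([1, 0, 2], {0: "B", 1: "B", 2: "B"}))  # ['B']
# Program C: segments 1 and 2 form the loop body and share block B1.
print(block_sequence([0, 1, 2, 1, 2, 1, 2, 0], {0: "B0", 1: "B1", 2: "B1"}))  # ['B0', 'B1', 'B0']
```

Putting the whole loop body in one block means the loop iterates inside B1 without any new block having to be delivered.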

Dividing instruction blocks must also respect the following: the space occupied by each instruction block after division is less than or equal to the storage space of the instruction storage unit of each arithmetic core; each instruction block contains only complete instructions; and instructions in different instruction blocks must not overlap.

In the present embodiment, each instruction is 4 B and the smallest instruction storage unit among the arithmetic cores is 80 B, so instruction blocks B0, B1 and B2 must each hold no more than 80 B of instructions. Here B0, B1 and B2 are 20 B, 80 B and 40 B respectively. The three blocks thus differ in size in this embodiment; in other embodiments they may be the same size.
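The three constraints above can be checked mechanically. The following is a minimal sketch, assuming fixed-width 4 B instructions (so "complete instructions" reduces to 4 B alignment) and blocks given as hypothetical (start byte, size in bytes) pairs in command memory; `check_partition` is an illustrative name, not from the source.

```python
from typing import List, Tuple

INSTR_BYTES = 4  # instruction width in this embodiment


def check_partition(blocks: List[Tuple[int, int]], capacity_bytes: int) -> None:
    """Raise ValueError unless every (start_byte, size_bytes) block fits in the
    smallest instruction storage unit, holds only whole instructions, and
    does not overlap any other block."""
    prev_end = None
    for start, size in sorted(blocks):
        if size <= 0 or size > capacity_bytes:
            raise ValueError(f"block at byte {start}: size {size} B not in (0, {capacity_bytes}] B")
        if start % INSTR_BYTES or size % INSTR_BYTES:
            raise ValueError(f"block at byte {start} does not hold whole instructions")
        if prev_end is not None and start < prev_end:
            raise ValueError(f"block at byte {start} overlaps the previous block")
        prev_end = start + size


# B0, B1 and B2 of the embodiment (20 B, 80 B, 40 B) stored back to back and
# checked against the smallest storage unit of 80 B: no exception is raised.
check_partition([(0, 20), (20, 80), (100, 40)], capacity_bytes=80)
```

Checking against the smallest storage unit (80 B here) is what lets the same block sequence serve both the 80 B and the 160 B cores.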

Program segments 0, 1 and 2 are each complete, sequentially executed pieces of the program, and each contains a number of instructions. Dividing them into instruction blocks B0, B1 and B2 respectively therefore ensures that the execution order of the instructions within each block is consistent with their execution order in the program.

Accordingly, the division into B0, B1 and B2 also ensures that sequentially executed instructions are not repeated across sequentially executed instruction blocks: if instruction block B0 contained program segment 1, instruction block B1 could not contain program segment 1 as well.

In the present embodiment, program segments 0, 1 and 2 are executed sequentially, so the execution sequence of instruction blocks B0, B1 and B2 is the same as that of program segments 0, 1 and 2.

The instruction block sequence information produced by the division is saved as a list. In the present embodiment, dividing into instruction blocks means dividing the instructions logically; the instructions themselves stay where they are.

Figure 12 is a schematic diagram of the instructions contained in the instruction blocks of Figure 11. As shown in Figure 12, instruction block B0 contains instructions 0-4 (five instructions), instruction block B1 contains instructions 0-19 (20 instructions), and instruction block B2 contains instructions 0-9 (ten instructions). For the naming of the instructions in Figure 12, refer to Table 3.

Figure 13 shows the instruction management system provided by an embodiment of the present invention. In the present embodiment, instruction pre-delivery device 1, arithmetic cores 2 and command memory 3 are arranged independently of one another and cooperate by exchanging messages.

The instruction management system comprises:

Instruction pre-delivery device 1, which, when an instruction block is to be sent, sends a pre-delivery query request to each arithmetic core 2 and sends the instructions of the block based on the responses the arithmetic cores 2 make to the query. The pre-delivery query request contains the storage address of the pre-delivered instructions in command memory 3 and the storage address, in arithmetic core 2, of the first instruction of the instruction block to which the pre-delivered instructions belong;

Instruction pre-delivery device 1 sends pre-delivery query requests to the arithmetic cores mainly to learn the execution status of each arithmetic core 2 (arithmetic core 2 refers to each of the arithmetic cores) and, based on each core's response, to decide whether to send subsequent instruction blocks to it;

Because the arithmetic cores 2 execute at different speeds, some fast and some slow, their responses to the pre-delivery query request reveal each core's execution status. When the execution speeds of the arithmetic cores 2 differ considerably, the device balances the difference by controlling when instructions are sent to each core;

Arithmetic cores 2, which receive and run instruction blocks and respond to the pre-delivery query requests sent by instruction pre-delivery device 1. The instruction management system contains multiple arithmetic cores 2, namely arithmetic core 0 to arithmetic core n;

An arithmetic core 2 generally makes one of four responses to a pre-delivery query request from instruction pre-delivery device 1: the immediate-receive response, the delayed-receive response, the instruction-abandon response and the instruction-conflict response. If the instruction storage unit of arithmetic core 2 is managed as a direct-mapped cache, the delayed-receive response is equivalent to the instruction-conflict response, so the responses can also be regarded as three kinds: immediate-receive, instruction-abandon and instruction-conflict. An arithmetic core 2 makes only one response at a time.

Specifically, if the pre-delivered instructions are needed by arithmetic core 2 and the core has enough storage space for them, the core makes the immediate-receive response. If the pre-delivered instructions would overwrite instructions the core is currently running, or instructions it has stored but not yet executed, or both, so that the core lacks enough storage space for them, the core makes the instruction-conflict response. If the pre-delivered instructions are already stored in arithmetic core 2, the core makes the instruction-abandon response;

The immediate-receive response indicates that arithmetic core 2 can receive the instruction block at any time. The instruction-conflict response indicates that the core temporarily cannot receive the block and can only judge whether it needs the block after the conflict is resolved. A conflict means that the pre-delivered instructions would overwrite the instructions the core is currently running, the instructions it has stored but not yet executed, or both;

If arithmetic core 2 sends the instruction-conflict response, it must respond to instruction pre-delivery device 1 again once the conflict is resolved and it can receive the instructions.
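The response rules above can be sketched for a single core as follows. This is an illustrative model only: it assumes a direct-mapped instruction storage in which command-memory address a (in instruction units) lands in slot a % slots, and it omits the delayed-receive response because, for direct-mapped storage, it is equivalent to the conflict response. All names are hypothetical.

```python
from enum import Enum
from typing import Dict, Set


class Response(Enum):
    IMMEDIATE = "immediate-receive"
    ABANDON = "instruction-abandon"
    CONFLICT = "instruction-conflict"


def core_response(offered: range, stored: Dict[int, int], live: Set[int], slots: int) -> Response:
    """Decide how one arithmetic core answers a pre-delivery query.

    offered: command-memory addresses (instruction units) being pre-delivered
    stored:  slot -> address of the instruction currently held in that slot
    live:    addresses currently running or stored but not yet executed
    slots:   number of instruction slots (assumed direct mapping a % slots)
    """
    if all(stored.get(a % slots) == a for a in offered):
        return Response.ABANDON  # everything offered is already held
    for a in offered:
        occupant = stored.get(a % slots)
        if occupant is not None and occupant != a and occupant in live:
            return Response.CONFLICT  # would overwrite a live instruction
    return Response.IMMEDIATE


# A 20-slot (80 B) core holding B0 (addresses 0-4) that it has not finished running:
stored = {a: a for a in range(5)}
print(core_response(range(5, 25), stored, live=set(range(5)), slots=20))  # Response.CONFLICT
print(core_response(range(5, 25), stored, live=set(), slots=20))  # Response.IMMEDIATE
```

Once B0 has finished executing its slots are no longer live, so the same offer that conflicted earlier is accepted; this is the "respond again after the conflict is resolved" step.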

Command memory 3, which stores the instruction blocks (in the present embodiment, command memory 3 stores the program, specifically the divided instruction blocks saved as an instruction sequence or in list form). Command memory 3 is controlled by instruction pre-delivery device 1: based on the responses of the arithmetic cores 2 to the pre-delivery query request, device 1 judges the state of each core and decides whether to continue sending subsequent instructions. If it continues, it sends a fetch request to command memory 3, i.e., it directs command memory 3 to send the instructions to the arithmetic cores. When command memory 3 receives such a command (for example, a fetch request) from instruction pre-delivery device 1, it sends the instructions to the arithmetic cores 2;

Because there are multiple arithmetic cores 2, command memory 3 typically sends instructions by broadcast or multicast, i.e., to several arithmetic cores at once. This reduces the time arithmetic cores 2 would spend waiting under serial transmission, reduces the communication-network bandwidth consumed by instruction transfer, and improves on-chip data transmission utilization, thereby raising the computational efficiency of the arithmetic cores 2.

In a specific implementation, the divided instruction blocks B0, B1 and B2 are stored in command memory 3 in advance. Command memory 3 can have a relatively large capacity so that it can hold more instruction blocks.

The instruction management system may further include a blocking unit (not shown) that divides the program into sequentially executed instruction blocks. The program may already be divided into instruction blocks when it is loaded into command memory 3; alternatively, the blocking unit can be placed in command memory 3 and divide the program into instruction blocks after it is loaded.

The information on the divided instruction blocks is written into instruction pre-delivery device 1. It includes the number of instruction blocks, their sizes, their execution sequence, the number of instructions in each block, the instruction storage addresses, and so on. This information can be written or imported into instruction pre-delivery device 1 in list form by software instructions.

In the command memory, instruction blocks B0, B1 and B2 are stored in order, and their instructions are stored into arithmetic core 2 according to a predetermined direct mapping relation.

For example, referring to Figure 10, the 5 instructions of instruction block B0 occupy addresses 0-4; the 20 instructions of instruction block B1 occupy addresses 5-19 and then wrap around to addresses 0-4; and the 10 instructions of instruction block B2 occupy addresses 5-14.
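The placement above is consistent with a simple modulo mapping. The sketch below assumes, as an illustration, that an instruction at offset a from the start of B0 (counted in instructions) lands in slot a % 20 of an 80 B storage unit holding 4 B instructions; under that assumption it reproduces the slot ranges just listed. `slot_runs` is a hypothetical helper.

```python
from typing import List, Tuple


def slot_runs(first_addr: int, count: int, slots: int) -> List[Tuple[int, int]]:
    """Return the contiguous, inclusive slot ranges that `count` instructions
    starting at `first_addr` occupy when address a maps to slot a % slots."""
    if count > slots:
        raise ValueError("block larger than the instruction storage unit")
    runs = []
    addr, remaining = first_addr, count
    while remaining:
        start = addr % slots
        length = min(remaining, slots - start)  # stop at the end of the storage unit
        runs.append((start, start + length - 1))
        addr += length
        remaining -= length
    return runs


SLOTS = 80 // 4  # 80 B storage unit with 4 B instructions -> 20 slots
print(slot_runs(0, 5, SLOTS))    # B0: [(0, 4)]
print(slot_runs(5, 20, SLOTS))   # B1: [(5, 19), (0, 4)]
print(slot_runs(25, 10, SLOTS))  # B2: [(5, 14)]
```

Because the mapping is fixed, a core can tell from an instruction's command-memory address alone which slot it would overwrite, which is why the pre-delivery query does not need to carry the block size.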

Storing the instructions in a separate command memory 3 improves the efficiency of instruction pre-delivery device 1 and reduces its overhead. Device 1 only has to perform management work, namely exchanging pre-delivery query requests with arithmetic cores 2 and sending fetch requests to command memory 3; it does not have to send the instructions of the instruction blocks itself.

In the present embodiment, instruction pre-delivery device 1 and command memory 3 are arranged independently; in other embodiments they can be integrated, and the invention is not limited in this respect.

There are generally multiple arithmetic cores 2, an arrangement called multi-core or many-core, and they process data in parallel to improve data-processing efficiency. Each arithmetic core 2 contains an instruction storage unit, which receives and stores the instructions of the instruction blocks, and an arithmetic unit, which runs the instructions stored in the instruction storage unit. The instruction storage unit is also called the instruction local store, and its capacity is typically on the order of kilobytes.

In the present embodiment, the instruction storage units of arithmetic cores 0 to k have a capacity of 80 B, and those of arithmetic cores k+1 to n have a capacity of 160 B. Instruction block B0 is 20 B, B1 is 80 B and B2 is 40 B. In other embodiments, the instruction storage units of arithmetic cores 0 to n can have the same capacity, for example all 80 B or all 160 B, and the instruction blocks can also be the same size, for example all 80 B.

When the instruction management system starts running, the instruction storage units of the arithmetic cores 2 are usually empty, i.e., they store no instruction block. At this point, instruction pre-delivery device 1 sends a pre-delivery query request to each arithmetic core 2, and every arithmetic core 2 replies with the immediate-receive response. When the instructions of blocks B1 and B2 are subsequently sent, each arithmetic core 2 returns an appropriate response to device 1 based on its own running state and the information about the pre-delivered instructions.

Because the instruction blocks follow principle 1 when divided, every instruction block is no larger than the instruction storage unit of an arithmetic core 2, and the storage address of an instruction in command memory 3 corresponds to its storage address in arithmetic core 2. The pre-delivery query request therefore usually does not carry the size of the instruction block; instead, it carries the storage address of the pre-delivered instructions in command memory 3.

The pre-delivery query request contains information about the pre-delivered instructions: their storage address in the command memory, and the storage address, in the instruction storage unit (i.e., in arithmetic core 2), of the first instruction of the instruction block they belong to.

Instruction pre-delivery device 1 usually sends the instruction blocks to the arithmetic cores 2 one after another in execution order, sending the next block only after the current one has been completely sent. Specifically, an instruction block is split into instructions and sent at an agreed granularity, i.e., a fixed number of instructions per transfer. Accordingly, device 1 sends the pre-delivery query requests for the next instruction block only after all pre-delivery query requests for the instructions of the current block have been sent and processed.
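Splitting a block at the agreed granularity yields one pre-delivery query per transfer. The sketch below is illustrative: the record and function names are hypothetical, the granularity of 8 instructions is an arbitrary example value (the source does not fix one), and the block's first-instruction slot is computed with the same assumed direct mapping (address % slots).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PreDeliveryQuery:
    chunk_addr: int        # command-memory address of the first instruction offered
    chunk_len: int         # number of instructions offered (at most the granularity)
    block_first_slot: int  # storage-unit slot of the block's first instruction


def queries_for_block(first_addr: int, count: int, granularity: int, slots: int) -> List[PreDeliveryQuery]:
    """Split one instruction block into agreed-granularity transfers, one query each."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    return [
        PreDeliveryQuery(first_addr + off, min(granularity, count - off), first_addr % slots)
        for off in range(0, count, granularity)
    ]


# Instruction block B1: 20 instructions starting at address 5, sent 8 at a time
# to cores whose storage unit holds 20 instructions.
for q in queries_for_block(5, 20, granularity=8, slots=20):
    print(q)
```

The queries for B2 would only be generated after all three queries for B1 have been answered and served, matching the block-by-block order described above.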

In the present embodiment, the arithmetic cores do not send the delayed-receive response. Therefore, each time instructions are to be sent to the arithmetic cores 2, instruction pre-delivery device 1 sends a fetch request to command memory 3 only if none of the responses it has received is an instruction-conflict response; if any response is an instruction-conflict response, device 1 has command memory 3 suspend sending instructions to the arithmetic cores 2. In addition, when device 1 sends the fetch request, it can tell command memory 3 to send the instructions only to the arithmetic cores 2 that made the immediate-receive response. In other words, an arithmetic core 2 that made the instruction-abandon response does not receive the pre-delivered instructions. This prevents duplicate reception and avoids wasting communication resources.
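The device-side decision can be sketched as a function over the collected responses. This is a minimal illustration with hypothetical names; it also treats the delayed-receive response as a reason to wait, which covers embodiments that do use it (compare claims 6 and 21), while in the present embodiment that response simply never occurs.

```python
from enum import Enum
from typing import Dict, List, Optional


class Response(Enum):
    IMMEDIATE = "immediate-receive"
    DELAYED = "delayed-receive"
    ABANDON = "instruction-abandon"
    CONFLICT = "instruction-conflict"


def delivery_targets(responses: Dict[int, Response]) -> Optional[List[int]]:
    """Given each core's response to one pre-delivery query, return the cores
    the command memory should send the instructions to, or None when the
    device must hold back and wait for cores to respond again."""
    if any(r in (Response.CONFLICT, Response.DELAYED) for r in responses.values()):
        return None  # suspend: some core would be overwritten or is not ready
    # Cores that answered "abandon" already hold these instructions and are skipped.
    return sorted(core for core, r in responses.items() if r is Response.IMMEDIATE)


print(delivery_targets({0: Response.IMMEDIATE, 1: Response.ABANDON, 2: Response.IMMEDIATE}))  # [0, 2]
print(delivery_targets({0: Response.IMMEDIATE, 1: Response.CONFLICT}))  # None
```

An empty list (every core answered "abandon") means no fetch request needs to be sent for that transfer at all.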

In the present embodiment, the instruction storage units of arithmetic cores 0 to k have a capacity of 80 B, and those of arithmetic cores k+1 to n have 160 B. Instruction block B0 is 20 B, so after receiving the instructions of B0, each of arithmetic cores 0 to n holds 20 B of instructions: arithmetic cores 0 to k have 60 B of their 80 B still available, and arithmetic cores k+1 to n have 140 B of their 160 B available.

In other embodiments, even if the instruction storage units of arithmetic cores 0 to n have the same capacity, their available capacity may still differ because the cores execute instructions at different speeds.

The technical solution of the present invention has at least the following advantages:

Although the present invention is disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the spirit and scope of the present invention, use the methods and technical content disclosed above to make possible variations and modifications to the technical solution of the present invention. Therefore, any simple modification, equivalent variation or alteration made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, falls within the protection scope of the technical solution of the present invention.

Claims

1. An instruction pre-delivery method, characterized by comprising:

before sending an instruction block, sending a pre-delivery query request to each arithmetic core, and sending the instructions in the instruction block based on the responses of the arithmetic cores to the pre-delivery query request, the pre-delivery query request containing information about the instructions to be pre-delivered;

2. The instruction pre-delivery method according to claim 1, characterized in that the arithmetic core comprises an instruction storage unit that receives and stores the instruction block and an arithmetic unit that runs the instruction block stored in the instruction storage unit, and the space occupied by the instruction block is less than or equal to the storage space of the instruction storage unit.

3. The instruction pre-delivery method according to claim 1, characterized in that sending the instruction block comprises sending the instructions of the instruction block at an agreed granularity, the agreed granularity being the number of instructions pre-delivered each time.

4. The instruction pre-delivery method according to claim 1, characterized in that the pre-delivery query request is sent before each transfer of instructions of the instruction block at the agreed granularity, and the pre-delivery query request further includes the storage address of the instruction block.

5. The instruction pre-delivery method according to claim 1, characterized in that the responses of the arithmetic core to the pre-delivery query request include: an instruction-receive response, an instruction-abandon response and an instruction-conflict response; and sending the instructions in the instruction block based on the responses of the arithmetic cores to the pre-delivery query request comprises:

6. The instruction pre-delivery method according to claim 1, characterized in that the responses of the arithmetic core to the pre-delivery query request include: an immediate-receive response, a delayed-receive response, an instruction-abandon response and an instruction-conflict response; and sending the instructions in the instruction block based on the responses of the arithmetic cores to the pre-delivery query request comprises:

if the responses made by all the arithmetic cores include an instruction-conflict response and/or a delayed-receive response, waiting for the arithmetic cores that made the instruction-conflict response to respond again and for the arithmetic cores that made the delayed-receive response to make the immediate-receive response, until the responses made by all the arithmetic cores include neither an instruction-conflict response nor a delayed-receive response.

7. The instruction pre-delivery method according to claim 1, characterized by further comprising setting an information table that includes information about the arithmetic cores that are to receive the instructions of a pre-delivered instruction block and the identifier of the pre-delivered instruction block each such core must receive; sending the instruction blocks to at least one arithmetic core in execution order comprises sending the instructions of the instruction blocks, in execution order, to the arithmetic cores specified in the information table;

the arithmetic core further sends a self-fetch request and a mode switch request, and the instruction pre-delivery method further comprises:

when a self-fetch request sent by an arithmetic core is received, stopping sending pre-delivery query requests and pre-delivered instructions to the arithmetic core that sent the self-fetch request;

8. The instruction pre-delivery method according to claim 1, characterized in that the arithmetic core further sends a self-fetch request, and the instruction pre-delivery method further comprises: when a self-fetch request sent by an arithmetic core is received, stopping sending pre-delivery query requests and pre-delivered instructions to the arithmetic core that sent the self-fetch request.

9. The instruction pre-delivery method according to claim 1, characterized in that the instruction block is stored in a command memory, and sending the instructions in the instruction block based on the responses of the arithmetic cores to the pre-delivery query request comprises: controlling the command memory, based on those responses, to send the instructions in the instruction block to the arithmetic cores.

10. The instruction pre-delivery method according to claim 1, characterized in that the storage address of the instruction block includes the start address and the end address at which the instruction block is stored.

11. The instruction pre-delivery method according to claim 1, characterized in that, when the program contains function calls, the method further comprises, before dividing the program into instruction blocks according to its execution sequence, dividing the called functions in the program into block functions; dividing the program into instruction blocks according to its execution sequence comprises dividing each block function into instruction blocks according to the execution sequence of the program within that block function; the instruction super block table of an instruction block that calls an instruction block in a block function further includes a call flag indicating the call and the identifier of the instruction block to return to after the function call; and the instruction super block table of the last instruction block in the block function further includes a return flag indicating the return.

12. The instruction pre-delivery method according to claim 11, characterized by further comprising:

when the return flag in the instruction super block table of the instruction block containing the instructions to be sent is valid, after this instruction block has been sent, taking from the top of the function stack the recorded identifier of the instruction block to return to after the function call, so as to determine the next instruction block to send.

13. The instruction pre-delivery method according to claim 1, characterized in that, when the program contains a loop, the instruction super block table of each instruction block that makes up the loop further includes the identifier of the next instruction block after exiting the loop.

14. The instruction pre-delivery method according to claim 13, characterized in that the arithmetic core further sends a loop-exit request;

sending the instruction blocks to at least one arithmetic core in execution order comprises:

sending instruction blocks to at least one arithmetic core according to the identifier of the next instruction block in the instruction super block table of the instruction blocks that make up the loop;

when loop-exit requests have been received from all arithmetic cores executing the instruction blocks that make up the loop, obtaining, from the instruction super block table of those instruction blocks, the identifier of the next instruction block after exiting the loop, and sending the corresponding instruction block to all arithmetic cores executing the instruction blocks that make up the loop.

15. The instruction pre-delivery method according to claim 1, characterized by further comprising: after receiving a stop response sent by an arithmetic core, stopping sending instruction blocks to all arithmetic cores.

16. The instruction pre-delivery method according to claim 15, characterized by further comprising: after receiving the stop response sent by the arithmetic core and then receiving a resume-receive response from that arithmetic core, resuming sending instruction blocks to all arithmetic cores.

17. The instruction pre-delivery method according to claim 1, characterized by further comprising saving the execution sequence of the program in the arithmetic core.

18. The instruction pre-delivery method according to claim 1, characterized in that the instruction blocks are sent to the arithmetic cores by broadcast or multicast.

19. An instruction pre-delivery device, characterized by comprising:

a pre-delivery query unit, configured to send a pre-delivery query request to each arithmetic core before the instructions of an instruction block are sent, the pre-delivery query request containing information about the instructions to be pre-delivered;

a feedback unit, configured to send an instruction block sending signal, the instruction blocks being obtained by dividing the program according to its execution sequence; the feedback unit is further configured to give feedback based on the responses of the arithmetic cores to the pre-delivery query request.

20. The instruction pre-delivery device according to claim 19, characterized in that, if the responses made by all the arithmetic cores include no instruction-conflict response, the feedback unit gives feedback to send the pre-delivered instructions to the arithmetic cores that sent the instruction-receive response; if the responses include an instruction-conflict response, the feedback unit gives feedback to wait for the arithmetic cores that made the instruction-conflict response to respond again, until the responses made by all the arithmetic cores include no instruction-conflict response.

21. The instruction pre-delivery device according to claim 19, characterized in that, if the responses made by all the arithmetic cores include neither an instruction-conflict response nor a delayed-receive response, the feedback unit gives feedback to send the pre-delivered instructions to the arithmetic cores that sent the immediate-receive response; if the responses include an instruction-conflict response and/or a delayed-receive response, the feedback unit gives feedback to wait for the arithmetic cores that made the instruction-conflict response to respond again and for the arithmetic cores that made the delayed-receive response to make the immediate-receive response, until the responses made by all the arithmetic cores include neither an instruction-conflict response nor a delayed-receive response.

22. The instruction pre-delivery device according to claim 19, characterized by further comprising an information table setting unit, configured to set an information table that includes information about the arithmetic cores that are to receive the instructions of a pre-delivered instruction block and the identifier of the pre-delivered instruction block each such core must receive; the pre-delivery query unit sends the instructions of the instruction blocks, in execution order, to the arithmetic cores specified in the information table;

the feedback unit is further configured, after receiving a self-fetch request sent by an arithmetic core, to give the pre-delivery query unit feedback to stop sending pre-delivery query requests and pre-delivered instructions to the arithmetic core that sent the self-fetch request;

Described feedback unit is also in order to after receiving the request of fetching voluntarily sending the arithmetic core transmission that described information table is specified, described send on query unit send to the arithmetic core that described information table is specified the instruction block sent on that this arithmetic core need to receive instruction send on inquiry request before, if confirming the arithmetic core sending mode handover request again that the transmission fetching voluntarily that this described information table is specified is asked, then make the feedback sending on inquiry request of the instruction sending the instruction block sent on that this arithmetic core need to receive to this arithmetic core to the described inquiry request unit that sends on, if confirming the arithmetic core of the transmission fetching voluntarily request that this described information table specifies not sending mode handover request again, then wait until receiving the mode switch request that this arithmetic core sends again.
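The self-fetch and mode-switch handshake of claim 22 behaves like a per-core gate that the pre-delivery query unit checks before each query. The sketch below assumes a hypothetical `FeedbackUnit` class; "waiting" for the mode switch request is modelled as the gate returning `False` until that request arrives, which is one reading of the claim rather than the authors' design.

```python
class FeedbackUnit:
    """Hypothetical model of the self-fetch / mode-switch gate in claim 22."""

    def __init__(self, info_table):
        # info_table: core id -> block ids that core should receive.
        self.info_table = info_table
        self.self_fetching = set()  # cores currently fetching on their own

    def on_self_fetch_request(self, core):
        # Stop pre-delivery queries and instructions to this core.
        self.self_fetching.add(core)

    def on_mode_switch_request(self, core):
        # The core accepts pre-delivered blocks again.
        self.self_fetching.discard(core)

    def may_send_query(self, core, block_id):
        """Checked by the query unit before querying `core` about `block_id`."""
        if block_id not in self.info_table.get(core, ()):
            return False  # not listed for this core in the information table
        return core not in self.self_fetching
```

A core that has sent a self-fetch request is skipped until its mode switch request arrives; other cores in the table are unaffected.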

23. The instruction pre-delivery device according to claim 19, characterized in that, upon receiving a self-fetch request sent by an arithmetic core, the feedback unit is further configured to issue feedback to the pre-delivery query unit to stop sending instruction pre-delivery query requests and pre-delivered instructions to the arithmetic core that sent the self-fetch request.

24. The instruction pre-delivery device according to claim 19, characterized in that it further comprises an instruction memory configured to store the instruction blocks; after the instruction memory obtains the feedback or an instruction block sending signal from the feedback unit, it sends instruction blocks to at least one arithmetic core according to the execution order.

25. The instruction pre-delivery device according to claim 19, characterized in that the storage address of the instruction block comprises the start address and the end address at which the instruction block is stored.

26. The instruction pre-delivery device according to claim 19, characterized in that it further comprises a block division unit configured to divide the program into the instruction blocks according to the execution order of the program.

27. The instruction pre-delivery device according to claim 26, characterized in that it further comprises a function encapsulation unit: when the program contains function calls, the function encapsulation unit is configured to encapsulate each called function in the program as a block function;

the block division unit is further configured to divide each block function into instruction blocks according to the execution order of the program in the block function; the instruction superblock table of an instruction block that calls an instruction block in the block function further comprises: a call flag indicating the call, and the identifier of the instruction block to return to after the function call; the instruction superblock table of the last instruction block in the block function further comprises: a return flag indicating the return.

28. The instruction pre-delivery device according to claim 27, characterized in that the instruction pre-delivery device further comprises:

a recording unit: when the call flag in the instruction superblock table of the instruction block containing the instruction to be sent is valid, the recording unit pushes onto a function stack the identifier, recorded in that instruction superblock table, of the instruction block to return to after the function call;

the pre-delivery query unit: when the return flag in the instruction superblock table of the instruction block containing the instruction to be sent is valid, after sending of that instruction block is completed, the pre-delivery query unit is further configured to take the identifier of the return instruction block from the top of the function stack, so as to determine the next instruction block to be sent.
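Claims 25, 27 and 28 together describe superblock table entries and a function stack that decides which block to pre-deliver next. A minimal sketch, assuming a hypothetical `SuperblockEntry` layout in which a calling block's `next_id` points to the first block of the called block function (the claims do not say how that link is stored):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SuperblockEntry:
    """Hypothetical instruction superblock table entry."""
    block_id: str
    start_addr: int                  # start of the block's storage (claim 25)
    end_addr: int                    # end of the block's storage (claim 25)
    next_id: Optional[str]           # next block in execution order
    call_flag: bool = False          # block calls a block function
    return_id: Optional[str] = None  # block to resume after the call
    return_flag: bool = False        # last block of a block function


def delivery_order(table, first_id, max_steps=1000):
    """Walk the superblock table and list block ids in pre-delivery order."""
    order, stack = [], []
    current = first_id
    for _ in range(max_steps):
        if current is None:
            return order
        entry = table[current]
        order.append(entry.block_id)
        if entry.call_flag:
            stack.append(entry.return_id)  # recording unit pushes the return block
        if entry.return_flag:
            current = stack.pop()          # query unit pops it after sending
        else:
            current = entry.next_id
    raise RuntimeError("step limit reached")
```

Because return identifiers live on a stack, the same block function can be called from several places (and calls can nest) while each call still resumes at the right block.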

29. The instruction pre-delivery device according to claim 19, characterized in that, when the program contains a loop, the instruction superblock table of each instruction block constituting the loop further comprises the identifier of the next instruction block to execute after exiting the loop.

30. The instruction pre-delivery device according to claim 29, characterized in that the feedback unit is further configured, upon receiving the loop-exit signals sent by all arithmetic cores executing the instruction blocks that constitute the loop, to obtain the identifier of the next instruction block after exiting the loop from the instruction superblock table of the instruction blocks constituting the loop, and to send that identifier to the pre-delivery query unit; after obtaining the identifier, the pre-delivery query unit sends to each arithmetic core an instruction pre-delivery query request for the instructions of the instruction block corresponding to that identifier.
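The loop-exit rule of claim 30 only releases the post-loop block once every participating core has signalled. A hedged sketch with a hypothetical `LoopExitTracker`; the mapping from loop block to exit block stands in for the field the claim places in each loop block's superblock table entry.

```python
class LoopExitTracker:
    """Hypothetical feedback-unit helper for loop exits (claim 30)."""

    def __init__(self, loop_exit_ids, cores):
        # loop_exit_ids: loop block id -> id of the block after the loop.
        self.loop_exit_ids = loop_exit_ids
        self.cores = set(cores)  # cores executing the loop blocks
        self.exited = set()

    def on_loop_exit_signal(self, core, loop_block_id):
        """Return the post-loop block id once all cores have exited, else None."""
        self.exited.add(core)
        if self.exited >= self.cores:
            self.exited.clear()  # ready for the next loop
            return self.loop_exit_ids[loop_block_id]
        return None
```

The returned identifier is what the feedback unit would hand to the pre-delivery query unit.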

31. The instruction pre-delivery device according to claim 19, characterized in that the feedback unit is further configured to stop sending the instruction block sending signal after obtaining a stop response sent by the arithmetic core.

32. The instruction pre-delivery device according to claim 19, characterized in that, after obtaining a stop response sent by the arithmetic core, the feedback unit is further configured to re-issue the instruction block sending signal once it obtains a resume-receive response sent by that arithmetic core.

33. An arithmetic core, characterized in that it comprises:

an arithmetic unit, configured to run the instructions stored in the instruction storage unit;

an instruction delivery processing unit, configured to respond to the instruction pre-delivery query request sent by the instruction pre-delivery device according to claim 19, based on the currently running instruction block and the instruction blocks stored in the instruction storage unit.

34. The arithmetic core according to claim 33, characterized in that: if the pre-delivered instructions would overwrite instructions currently being executed by the arithmetic unit of the arithmetic core and/or previously pre-delivered instructions that are stored but not yet executed, the instruction delivery processing unit of the arithmetic core makes an instruction conflict response; if the pre-delivered instructions are instructions the arithmetic core needs, the instruction delivery processing unit makes an instruction receive response; if the pre-delivered instructions are already stored in the instruction storage unit of the arithmetic core, the instruction delivery processing unit makes an instruction discard response.
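One way to read claim 34 is as a check over local address ranges. The sketch assumes each pre-delivered block carries a target range in the core's instruction storage, and that the discard check runs first (an already stored block would otherwise overlap itself); that ordering, the half-open ranges and the fallback to discard for blocks the core does not need are all assumptions of this example.

```python
from dataclasses import dataclass


def overlaps(a, b):
    """True if half-open address ranges a=(start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]


@dataclass
class CoreState:
    """Hypothetical view of one core's local instruction storage."""
    running: tuple   # range of the block the arithmetic unit is executing
    pending: list    # ranges of stored, not-yet-executed blocks
    stored_ids: set  # ids of blocks already in local storage
    needed_ids: set  # ids of blocks this core will execute


def respond(core, block_id, target_range):
    """Return the core's response to a pre-delivery query (claim 34)."""
    if block_id in core.stored_ids:
        return "discard"
    if overlaps(target_range, core.running) or any(
            overlaps(target_range, p) for p in core.pending):
        return "conflict"  # would overwrite running or pending instructions
    if block_id in core.needed_ids:
        return "receive"
    return "discard"  # assumption: unneeded blocks are declined
```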

35. The arithmetic core according to claim 33, characterized in that: if the pre-delivered instructions are instructions the arithmetic core needs and the instruction storage unit of the arithmetic core has enough space to store them, the instruction delivery processing unit of the arithmetic core makes an immediate receive response; if the pre-delivered instructions are instructions the arithmetic core needs but the instruction storage unit of the arithmetic core does not have enough space to store them, the instruction delivery processing unit makes a delayed receive response.
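Claims 35 and 36 reduce to a space check plus a follow-up response once storage has been freed. In this hypothetical sketch, `freed_per_step` stands in for space released as the core retires executed blocks, and `max_steps` is an added bound; neither appears in the claims.

```python
def receive_response(block_size: int, free_bytes: int) -> str:
    """Claim 35, for a block the core needs: immediate if it fits, else delayed."""
    return "immediate receive" if block_size <= free_bytes else "delayed receive"


def settle(block_size: int, free_bytes: int, freed_per_step: int,
           max_steps: int = 100) -> list:
    """Responses a core sends for one needed block (claim 36).

    A delayed response is followed, after storage has been freed, by a single
    immediate receive response.
    """
    first = receive_response(block_size, free_bytes)
    if first == "immediate receive":
        return [first]
    for _ in range(max_steps):
        free_bytes += freed_per_step  # space released by retired blocks
        if block_size <= free_bytes:
            return [first, "immediate receive"]
    raise RuntimeError("storage was never freed")
```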

36. The arithmetic core according to claim 35, characterized in that, after sending the delayed receive response, the instruction delivery processing unit sends an immediate receive response after a delay.

37. The arithmetic core according to claim 34 or 35, characterized in that, after sending an instruction conflict response, the instruction delivery processing unit responds again once the conflict has been resolved.

38. The arithmetic core according to claim 37, characterized in that the conflict comprises the pre-delivered instructions overwriting instructions currently being executed by the arithmetic unit of the arithmetic core and/or stored instructions that have not yet been executed.

39. The arithmetic core according to claim 33, characterized in that the instruction storage unit is further configured to send a stop response, and stops receiving pre-delivered instruction blocks after sending the stop response.

40. The arithmetic core according to claim 33, characterized in that the instruction storage unit is further configured, after sending a stop response, to send a resume-receive response, and resumes receiving pre-delivered instruction blocks after sending the resume-receive response.

41. The arithmetic core according to claim 33, characterized in that the arithmetic unit sends a loop-exit signal after finishing execution of the instruction blocks that constitute the loop.

42. The arithmetic core according to claim 33, characterized in that the instruction storage unit suspends acceptance of the instructions of pre-delivered instruction blocks and sends a self-fetch request; after sending the self-fetch request, when the instruction storage unit again accepts the instructions of pre-delivered instruction blocks, it sends a mode switch request.

43. The arithmetic core according to claim 33, characterized in that the instruction storage unit refuses to accept the instructions of instruction blocks and sends a self-fetch request.

44. The arithmetic core according to claim 33, characterized in that it further comprises the instruction pre-delivery device according to any one of claims 19 to 32.

45. An instruction management system, characterized in that it comprises:

the instruction pre-delivery device according to any one of claims 19 to 32; and

the arithmetic core according to any one of claims 33 to 43.

46. An instruction management system, characterized in that it comprises:

the arithmetic core according to any one of claims 33 to 43; and

the arithmetic core according to claim 44.