CN103377085B - Method, device and system for instruction management and operation core - Google Patents

Info

Publication number
CN103377085B
Authority
CN
China
Prior art keywords
instruction
response
arithmetic
arithmetic core
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210107228.1A
Other languages
Chinese (zh)
Other versions
CN103377085A (en
Inventor
高剑刚
李宏亮
郑方
许勇
卢宏生
任秀江
高红光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201210107228.1A
Publication of CN103377085A
Application granted
Publication of CN103377085B
Legal status: Active
Anticipated expiration legal-status Critical

Abstract

Disclosed are an instruction management method, device and system, and an arithmetic core. The instruction management method includes dividing a program into instruction blocks according to the execution order of the program, and sending the instruction blocks to at least one arithmetic core according to that execution order. The technical scheme effectively reduces instruction misses and waiting delays in the arithmetic cores and improves their operating efficiency.

Description

Instruction management method and device, instruction management system, arithmetic core
Technical field
The present invention relates to the technical field of instruction management, and in particular to an instruction management method and device, an instruction management system, and an arithmetic core.
Background art
General-purpose processors usually adopt a hierarchical instruction storage structure, that is, instructions are stored in storage media of different levels. An arithmetic core (a component of a processor; each arithmetic core can be regarded as a small processor) fetches instructions from its local memory while running. Because the capacity of an arithmetic core's local memory is limited, instruction fetch failures occur easily: if the instruction about to be run is not stored in the arithmetic core, the core must obtain it from the upper-level instruction memory before it can continue running. An instruction fetch failure is also called an instruction miss. With a hierarchical instruction storage structure, obtaining instructions from the upper-level instruction memory takes a large amount of time, so frequent fetch failures increase the time spent transferring instructions and reduce the working efficiency of the arithmetic core.
In multi-core and many-core processors, multiple arithmetic cores are integrated on a single silicon chip. Because there are many arithmetic cores, the instruction memory in each core is small, conflicts over fetching from the shared upper-level instruction memory increase, and fetch contention between arithmetic cores becomes increasingly prominent. In particular, when the number of arithmetic cores on a single chip grows to dozens or hundreds, traditional fetch handling markedly increases fetch delays. Fetch contention also congests the communication network, which can become a bottleneck limiting the performance and range of application of the arithmetic cores.
Instruction processing techniques commonly used in processors at present include SIMD (Single Instruction Multiple Data) and SPMD (Single Program Multiple Data).
Multi-core processors adopt techniques such as SIMD and SPMD to unify instruction demand, which reduces instruction demand to a certain extent.
In a multi-core processor, SIMD mainly means that multiple arithmetic cores (or multiple pipelines within an arithmetic core) share the same instruction issue unit and run the same instructions synchronously, while the data each arithmetic core processes differ.
In a multi-core processor, SPMD mainly means that every arithmetic core executes the same program code: the program each core runs is identical, but the data processed differ.
The advantage of SIMD is that the arithmetic cores are required to share an instruction issue unit and every instruction is executed synchronously. This prevents fetch contention and mitigates the communication network congestion caused by concentrated fetch operations from multiple cores.
The advantage of SPMD is that it relaxes the synchronization requirement between arithmetic cores, raising the synchronization granularity between cores to the level of independent programs, so that each arithmetic core can run autonomously within the scope of the program.
By reducing the sources of fetch operations and the variety of program code, the techniques above can reduce fetch conflicts and fetch delays to a certain extent.
However, SIMD requires every instruction of every arithmetic core to be executed synchronously, so the resources of the arithmetic cores are usually difficult to use fully and the computing capability of all the cores cannot be brought into play, which limits the technique's scope of application.
In multi-core and many-core processors, as the number of arithmetic cores increases, the memory capacity in each core is small. If the size of the SPMD program exceeds the memory capacity of the arithmetic core, instruction misses cause frequent fetch operations, which aggravates fetch conflicts, seriously congests the communication network, lengthens the fetch waiting time of the arithmetic cores, and considerably affects their computational efficiency. In multi-core and many-core processors, therefore, the memory capacity in the arithmetic core limits the scope of application of SPMD.
The method in the Chinese patent with publication number CN 1466716A provides an instruction prefetch service for only one processor and is unsuited to the structure of multi-core and many-core processors. Moreover, the prefetch method used in that patent requires, for each computing core, an additional auxiliary processor that executes a simplified version of the program, so its hardware overhead is large.
How to effectively reduce the instruction misses and waiting delays of arithmetic cores and improve their computational efficiency has become one of the problems urgently needing a solution.
Summary of the invention
The problem solved by the present invention is how to effectively reduce the instruction misses and waiting delays of arithmetic cores and improve their computational efficiency.
To solve the above problem, the invention provides an instruction management method, including:
dividing a program into instruction blocks according to the execution order of the program;
sending the instruction blocks to at least one arithmetic core according to the execution order.
To solve the above problem, the invention also provides an instruction management device, including:
a feedback unit, configured to issue an instruction block sending signal, the instruction blocks being obtained by dividing the program according to its execution order.
To solve the above problem, the invention also provides an arithmetic core, including:
an instruction storage unit, configured to receive and store the instructions of an instruction block, the instruction block being pre-sent;
an arithmetic unit, configured to run the instructions stored in the instruction storage unit.
To solve the above problem, the invention also provides an instruction management system, including:
the instruction management device described above;
the arithmetic core described above.
To solve the above problem, the invention also provides an instruction management system, including:
the arithmetic core described above.
Compared with the prior art, the present invention has the following advantages:
A combined software and hardware approach is used. Software divides the instruction code into a sequence of instruction blocks according to the execution trajectory of the program, and ensures that the instruction block trajectory of every arithmetic core is consistent. Hardware, using the instruction block sequence information produced by software, sends the instructions an arithmetic core needs into that core's instruction memory. Because the instruction trajectory of the program is known in advance, instructions can be actively loaded into the arithmetic core's memory before the core actually executes them.
The program is cut into small instruction blocks and the instructions in the blocks are pre-sent to the arithmetic cores, so a core obtains its instructions before running them. This makes the pre-sending of instructions relatively independent of the operation of the computing core: each arithmetic core can receive other pre-sent instructions without affecting the instruction it is executing, which reduces the time the core waits for fetches and improves its operating efficiency.
Because the instructions of instruction blocks are sent to the arithmetic cores in advance, the cores no longer actively issue fetch requests. This eliminates fetch contention between arithmetic cores and reduces occupancy of the communication network, which further improves the operating efficiency of the cores.
Instructions are organized into instruction blocks of variable capacity. The instruction block trajectory executed by every arithmetic core is the same, but each core is allowed to follow a different execution trajectory within an instruction block. That is, the arithmetic cores may execute at different speeds, and sending the instructions in the instruction blocks based on each core's response to the pre-send query request balances the differences in running speed between cores. This keeps the speed differences between cores within a controlled range without forcing the cores to run instructions in lockstep. It is a loose synchronization mechanism that uses control of instruction pre-sending as its means: pre-send query requests are sent, and the pace of pre-sending is controlled by the arithmetic cores' responses, which balances the speed difference between fast-running and slow-running cores. In other words, the fastest core is kept from getting too far ahead of the slowest one. If the instructions that a fast core is about to run would overwrite the instruction block that a slow core is currently running, the slow core issues an instruction conflict response, and the sending of instructions to all arithmetic cores is suspended. The technical scheme allows a certain difference in speed between the arithmetic cores, and pre-sending interferes little with the cores, so its scope of application is wide.
Pre-sending instructions by multicast or broadcast is especially suitable for multi-core and many-core structures. It reduces fetch contention between arithmetic cores, shortens the time for which instruction transfers congest the communication network, and improves the utilization efficiency of the on-chip communication network.
The technical scheme of the invention is an instruction pre-sending mechanism based on loose synchronization. It solves the problem of long fetch delays of computing cores in single-core, multi-core and many-core processors, as well as the fetch contention problem in multi-core and many-core processors, and thereby improves the operating efficiency of the processor.
Brief description of the drawings
Fig. 1 is a flow chart of the instruction management method provided by an embodiment of the present invention;
Fig. 2 shows the instruction management device provided by an embodiment of the present invention;
Fig. 3 shows the arithmetic core provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the storage address mapping between the instruction memory and the instruction blocks stored in the arithmetic core, provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a program and its instruction blocks, provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the instructions in an instruction block, provided by an embodiment of the present invention;
Fig. 7 shows the instruction management system provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the above objects, features and advantages of the present invention clearer and easier to understand, specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Specific details are set forth in the following description to provide a full understanding of the invention. The invention can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit. The invention is therefore not limited by the specific embodiments disclosed below.
Fig. 1 is a flow chart of the instruction management method provided by an embodiment of the present invention, which is described in detail below with reference to Fig. 1.
The instruction management method includes:
Step S1: dividing the program into instruction blocks according to the execution order of the program;
Step S2: sending the instruction blocks to at least one arithmetic core according to the execution order.
The instruction management method can also include:
Step S3 (not shown in Fig. 1): before the instructions of an instruction block are sent, sending a pre-send query request to each arithmetic core, and sending the instructions in the instruction block based on the responses the arithmetic cores make to the pre-send query request, the pre-send query request containing information about the instructions to be pre-sent.
In a specific implementation, in step S1 the program is divided into instruction blocks according to the execution order. Once the division is complete, the execution order of the instruction blocks, also called the instruction block execution trajectory, is obtained.
Specifically, the instruction blocks are obtained by third-party software dividing the program according to its execution order, optionally combined with practical experience such as the execution time and call count of instructions. The execution order of the instruction blocks after division is the same as the execution order of the program; that is, the execution trajectory of the instruction blocks is consistent with the execution order of the program. The execution order of the program refers to the execution order of the instructions in the program.
If the program contains constructs such as loops, recursion or conditional branches, the execution trajectory of the instruction blocks may differ slightly from the execution order of the program, but the execution order of the instructions within the instruction blocks is the same as that of the program, as shown in Table 1. For example, suppose a program contains instruction 1, instruction 2, instruction 3 and instruction 4, where instructions 2 and 3 form a loop that runs twice. The execution order of the program is then instruction 1 → instruction 2 → instruction 3 → instruction 2 → instruction 3 → instruction 4. If the program is divided into three instruction blocks, with instruction block 1 containing instruction 1, instruction block 2 containing instructions 2 and 3, and instruction block 3 containing instruction 4, the execution trajectory of the instruction blocks is instruction block 1 → instruction block 2 → instruction block 3.
Table 1
In this example, although the execution order of the instruction blocks differs slightly from the execution order of the program, the two are essentially the same. The execution order of the instruction blocks reflects the execution order of the instructions they contain: the execution trajectory of the instruction blocks is instruction block 1 → instruction block 2 → instruction block 3, while the execution order of the instructions in the blocks is still instruction 1 → instruction 2 → instruction 3 → instruction 2 → instruction 3 → instruction 4. So although the block trajectory differs slightly from the program's execution order, the execution order of the instructions within the blocks is the same as that of the program.
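The relationship between the two orders in this example can be illustrated with a minimal Python sketch. The `InstructionBlock` class and its `repeat` field are hypothetical names introduced only for illustration; the patent does not specify how a loop inside a block is represented.

```python
from dataclasses import dataclass


@dataclass
class InstructionBlock:
    block_id: int
    instructions: list  # instruction names, in program order
    repeat: int = 1     # assumed field: how many times the block body executes


def block_trajectory(blocks):
    """Order in which the blocks themselves are sent and executed."""
    return [block.block_id for block in blocks]


def instruction_order(blocks):
    """Order in which the individual instructions execute."""
    order = []
    for block in blocks:
        for _ in range(block.repeat):
            order.extend(block.instructions)
    return order


# The example from the text: instructions 2 and 3 form a loop that runs twice.
blocks = [
    InstructionBlock(1, ["instruction 1"]),
    InstructionBlock(2, ["instruction 2", "instruction 3"], repeat=2),
    InstructionBlock(3, ["instruction 4"]),
]

print(block_trajectory(blocks))
print(instruction_order(blocks))
```

The block trajectory lists each block once, while the instruction order repeats the body of block 2, which is the slight difference the text describes.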
The program can be divided into one instruction block or several; that is, the program can be divided into multiple instruction blocks, or a whole program can serve as a single instruction block. Each instruction block contains a certain number of instructions, which can be one or several. An instruction block generally contains at least two instructions, but it can also contain only one.
The arithmetic core includes an instruction storage unit that receives and stores the instruction blocks, and an arithmetic unit that runs the instruction blocks stored in the instruction storage unit. The space occupied by an instruction block is less than or equal to the storage space of the instruction storage unit. The execution order of the program is generally also maintained in the arithmetic core.
When the program is divided into instruction blocks, the space occupied by each resulting block must be less than or equal to the storage space of the instruction storage unit. This ensures that the arithmetic core can hold at least one complete instruction block.
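The size constraint can be sketched as a simple greedy division. This is only one possible policy, assumed here for illustration: the patent leaves the actual division to third-party software, which may also weigh execution time and call counts. The function name `divide_into_blocks` and its parameters are hypothetical.

```python
def divide_into_blocks(instruction_sizes, storage_capacity):
    """Greedily group consecutive instructions into blocks.

    instruction_sizes: storage size of each instruction, in program order.
    storage_capacity:  storage space of the core's instruction storage unit.
    Returns a list of blocks, each a list of instruction indices, such that
    no block occupies more than storage_capacity.
    """
    blocks, current, used = [], [], 0
    for index, size in enumerate(instruction_sizes):
        if size > storage_capacity:
            raise ValueError(f"instruction {index} cannot fit in any block")
        if used + size > storage_capacity:
            # Close the current block before it would exceed the capacity.
            blocks.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        blocks.append(current)
    return blocks


# Five 4-byte instructions and an 8-byte instruction storage unit.
print(divide_into_blocks([4, 4, 4, 4, 4], 8))
```

Because consecutive instructions are never reordered, the block sequence produced this way keeps the execution order of the program.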
In step S2, the instruction blocks are sent to at least one arithmetic core according to the execution order until all instruction blocks have been sent; if a stop response from an arithmetic core is obtained, the sending of instruction blocks to all arithmetic cores stops. If, after a stop response from an arithmetic core has been obtained, a receive-again response from that arithmetic core is obtained, the sending of instruction blocks to all arithmetic cores resumes.
When an abnormality occurs inside an arithmetic core, the core issues a stop response and stops receiving pre-sent instruction blocks, for example when the core's storage space is full and cannot hold a new instruction block, or when a pre-sent instruction block conflicts with the instruction block the core is currently executing. Once the abnormality is cleared, the core can issue a receive-again response and receive pre-sent instruction blocks again, for example when executed instruction blocks have been deleted so that the core's storage space can accept new blocks, or when the currently stored instruction blocks may be overwritten by subsequently pre-sent blocks.
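The stop and receive-again behaviour described above can be modelled as a small state machine. The sketch below is an assumption-laden illustration: `BlockSender`, the response strings `"stop"` and `"receive_again"`, and the single shared pause flag are hypothetical simplifications, not the patent's hardware design.

```python
class BlockSender:
    """Sends instruction blocks in execution order, pausing on a stop response."""

    def __init__(self, blocks):
        self._blocks = list(blocks)
        self._next = 0
        self.paused = False

    def on_response(self, response):
        # A stop response from any core pauses sending to all cores;
        # a receive-again response lifts the pause.
        if response == "stop":
            self.paused = True
        elif response == "receive_again":
            self.paused = False

    def send_next(self):
        """Return the next block to send, or None if paused or finished."""
        if self.paused or self._next >= len(self._blocks):
            return None
        block = self._blocks[self._next]
        self._next += 1
        return block


sender = BlockSender(["block 1", "block 2", "block 3"])
print(sender.send_next())           # first block goes out
sender.on_response("stop")
print(sender.send_next())           # nothing is sent while paused
sender.on_response("receive_again")
print(sender.send_next())           # sending resumes with the next block
```

Resuming continues from the block after the last one sent, so the execution order of the blocks is preserved across a pause.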
The instruction blocks run in an arithmetic core are pre-sent, so sending the blocks according to the execution order ensures that the arithmetic core executes them in that order.
The instruction code is divided into a sequence of instruction blocks according to the execution trajectory of the program, which ensures that the instruction block trajectory of every arithmetic core is consistent. The instructions an arithmetic core needs are sent into its instruction memory; because the program's instruction trajectory is known in advance, instructions can be actively loaded into the core's memory before the core actually executes them.
In a specific implementation, when an instruction block is sent it is split into a number of instructions, and the instructions of the block are sent according to an agreed granularity, which is the number of instructions pre-sent each time. Sending the instruction blocks or instructions to the arithmetic cores by broadcast or multicast increases the transmission speed and thus improves working efficiency.
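Splitting a block according to the agreed granularity amounts to chunking its instruction list. A minimal sketch follows; the function name `split_by_granularity` is hypothetical.

```python
def split_by_granularity(block, granularity):
    """Split a block's instructions into groups of at most `granularity`.

    Each group corresponds to one transmission; the last group may be
    shorter when the block length is not a multiple of the granularity.
    """
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    return [block[i:i + granularity] for i in range(0, len(block), granularity)]


block = ["i1", "i2", "i3", "i4", "i5"]
for group in split_by_granularity(block, 2):
    print(group)
```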
In step S3, to ensure that the arithmetic cores can correctly receive the instruction blocks, a pre-send query request is sent to each arithmetic core before an instruction block is sent. As described for step S2, an instruction block is split into a number of instructions when it is sent, and the instructions are sent according to the agreed granularity. Therefore, each time before instructions of the block are sent according to the agreed granularity, a pre-send query request is sent to the arithmetic cores.
The pre-send query request is sent each time before instructions of the instruction block are sent according to the agreed granularity. This lets each arithmetic core correctly judge in advance whether it needs to receive the pre-sent instructions, and avoids useless transmission to cores that do not need them.
The program is cut into small instruction blocks and the instructions in the blocks are pre-sent to the arithmetic cores, so a core obtains its instructions before running them. This makes the pre-sending of instructions relatively independent of the operation of the computing core: each arithmetic core can receive other pre-sent instructions without affecting the instruction it is executing, which reduces the time the core waits for fetches and improves its operating efficiency.
The instructions in the instruction block are sent based on the responses the arithmetic cores make to the pre-send query request, which contains information about the instructions to be pre-sent. The pre-send query request mainly asks whether an arithmetic core currently needs new instructions.
Since the arithmetic cores no longer actively issue fetch requests, fetch contention between arithmetic cores is eliminated and occupancy of the communication network is reduced, which further improves the operating efficiency of the cores.
The responses an arithmetic core can make to the pre-send query request include an instruction reception response, an instruction discard response and an instruction conflict response. The instruction reception response is further divided into an immediate reception response and a delayed reception response.
An arithmetic core judges whether it needs to receive the pre-sent instructions, and responds, according to information about the instruction it is currently executing, the instructions to be pre-sent, and the instructions it has stored but not yet executed. Specifically:
If the pre-sent instructions are ones the arithmetic core needs and the core has enough storage space to store them, the core makes an immediate reception response; if the pre-sent instructions are ones the core needs but the core does not have enough storage space to store them, the core makes a delayed reception response.
If the pre-sent instructions would overwrite the instruction the arithmetic core is currently running, or instructions it has stored but not yet executed, or both, the core makes an instruction conflict response; if the pre-sent instructions are already stored in the core, the core makes an instruction discard response.
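The four cases above can be written as a decision function on the core side. The sketch below rests on several assumptions that the text does not state: instructions are identified by integer ids, the core knows which stored instructions the incoming ones would overwrite, `free_slots` counts slots that are empty or hold already-executed instructions, and the checks are applied in the order discard, conflict, reception. All names are hypothetical.

```python
from enum import Enum


class Response(Enum):
    IMMEDIATE_RECEIVE = "immediate reception"
    DELAYED_RECEIVE = "delayed reception"
    DISCARD = "instruction discard"
    CONFLICT = "instruction conflict"


def respond(incoming, stored, pending, overwritten, free_slots):
    """Choose a core's response to a query about instructions to be sent.

    incoming:    ids of the instructions about to be sent
    stored:      ids already held in the instruction storage unit
    pending:     ids currently running, or stored but not yet executed
    overwritten: ids whose storage the incoming instructions would overwrite
    free_slots:  number of instructions the core can store right now
    """
    incoming = set(incoming)
    if incoming <= set(stored):
        return Response.DISCARD            # already stored, nothing to receive
    if set(overwritten) & set(pending):
        return Response.CONFLICT           # would destroy unexecuted instructions
    if len(incoming) <= free_slots:
        return Response.IMMEDIATE_RECEIVE  # needed, and there is room
    return Response.DELAYED_RECEIVE        # needed, but no room yet


print(respond(incoming=[7], stored=[1, 2], pending=[2], overwritten=[2], free_slots=4))
```

The order of the checks is a design choice of this sketch; a real core could derive `overwritten` from a storage address mapping such as the one referred to for Fig. 4.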
Sending the instructions in the instruction block based on the responses the arithmetic cores make to the pre-send query request includes:
If the responses made by all the arithmetic cores include no instruction conflict response, sending the pre-sent instructions to the arithmetic cores that made an instruction reception response;
If the responses made by all the arithmetic cores include only instruction discard responses, abandoning the sending of the pre-sent instructions to any arithmetic core;
If the responses made by all the arithmetic cores include an instruction conflict response, waiting for the arithmetic cores that made the instruction conflict response to respond again, until the responses made by all the arithmetic cores include no instruction conflict response.
When the responses an arithmetic core can make to the pre-send query request include the immediate reception response, the delayed reception response, the instruction discard response and the instruction conflict response:
If the responses made by all the arithmetic cores include neither an instruction conflict response nor a delayed reception response, sending the pre-sent instructions to the arithmetic cores that made an immediate reception response;
If the responses made by all the arithmetic cores include only instruction discard responses, abandoning the sending of the pre-sent instructions to any arithmetic core;
If the responses made by all the arithmetic cores include an instruction conflict response, a delayed reception response, or both, waiting for the cores that made an instruction conflict response to respond again and for the cores that made a delayed reception response to make an immediate reception response, until the responses made by all the arithmetic cores include neither an instruction conflict response nor a delayed reception response.
That is, only when the responses made by the arithmetic cores consist solely of immediate reception responses and instruction discard responses are the pre-sent instructions sent, and they are sent only to the cores that made an immediate reception response, not to the cores that made an instruction discard response. This ensures that the cores that need the instructions obtain them, while cores that do not need them avoid receiving them again, which reduces the waste of communication resources.
If the responses made by the arithmetic cores include an instruction conflict response, a delayed reception response, or both, the cores that sent those responses are waited on to respond again, and each new response is checked to see whether it is an immediate reception response or an instruction discard response. If it is neither, waiting continues until the responses received consist solely of immediate reception responses and instruction discard responses, and the pre-sent instructions are then sent to the cores that made an immediate reception response.
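The sender-side rule for the four-response case can be condensed into one function. This is a minimal sketch: the response strings, the `decide` name and the tuple return format are assumptions made for illustration.

```python
def decide(responses):
    """Decide what to do with one group of instructions awaiting transmission.

    responses: dict mapping core id -> one of
               "immediate", "delayed", "discard", "conflict".
    Returns (action, cores):
      ("wait", cores)    cores that must respond again before anything is sent
      ("send", cores)    cores that receive the instructions
      ("abandon", [])    no core needs the instructions
    """
    blocking = sorted(core for core, r in responses.items()
                      if r in ("conflict", "delayed"))
    if blocking:
        return ("wait", blocking)
    targets = sorted(core for core, r in responses.items() if r == "immediate")
    if not targets:
        return ("abandon", [])
    return ("send", targets)


print(decide({0: "immediate", 1: "discard"}))
print(decide({0: "immediate", 1: "conflict", 2: "delayed"}))
print(decide({0: "discard", 1: "discard"}))
```

A single conflict or delayed reception response is enough to hold back every core, which is what produces the loose synchronization between fast and slow cores.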
Arithmetic core is each to transport because the starting time of each arithmetic core may be inconsistent during execute instruction block The data for calculating core operation are also not quite similar, therefore difference often occurs in execution speed Jing of each arithmetic core, by the instruction Management method can make the pending slow-footed arithmetic core such as the fireballing arithmetic core of execution.
Due to the execution speed of each arithmetic core it is different, the fast arithmetic core of the speed of service may perform it is all its The instruction of storage, the follow-up instruction for sending of wait, and the slow arithmetic core of the speed of service, may also be not carried out the finger of its storage Order.Due to the limited storage space of arithmetic core, therefore the instruction for wherein storing is that the instruction for constantly subsequently being sent updates Substitute.If the slow arithmetic core memory space of the speed of service is full, and the instruction for storing is unenforced instruction, then the computing Core cannot receive the instruction of follow-up transmission, and the arithmetic core is that instruction conflict is responded or postponed to the response for sending on inquiry request Receive response.The fast arithmetic core of the speed of service, the instruction due to having performed all its storages, then the arithmetic core is to pre- The response of inquiry request is sent to receive response immediately.
Due to only receive immediately response and when response is abandoned in instruction when being only included in the response that arithmetic core is made, just meeting to Make and receive the instruction that the arithmetic core transmission of response sends on immediately, therefore the slow arithmetic core of the speed of service sends instruction conflict Response postpones to receive after response, suspends the work that instruction is sent to all of arithmetic core.Now perform fireballing computing Core cannot also receive follow-up instruction, arithmetic core can only be waited to send reception response immediately again or instruct and abandon response Afterwards, the transmission of subsequent instructions could be recovered, the fast arithmetic core of the speed of service could receive follow-up instruction.If in addition, operation The response that slow-footed arithmetic core sends again still needs to wait until the slow fortune of the speed of service to postpone to receive response, then Calculation core to send receive immediately response or instruct and abandons responding, and could recover the transmission of subsequent instructions, the fast computing of the speed of service Core could receive follow-up instruction.
In summary, when a slow arithmetic core makes an instruction conflict response or a delayed receive response, the sending of subsequent instructions stalls and resumes only once the slow arithmetic core sends an immediate receive response or an instruction discard response. This mechanism allows a certain difference in progress between the arithmetic cores. When that difference becomes too large, the mechanism balances the speed difference between the arithmetic cores and also controls the pace of program execution.
Fig. 2 shows an instruction management device provided by an embodiment of the present invention, comprising:
A feedback unit 12, configured to issue an instruction block sending signal, the instruction blocks being obtained by dividing the program according to the execution sequence of the program.
The feedback unit 12 is further configured to stop issuing the instruction block sending signal after obtaining a stop response sent by the arithmetic core. After obtaining the stop response, the feedback unit 12 is further configured to issue the instruction block sending signal again once it obtains a re-receive response sent by the arithmetic core.
The divided instruction blocks may be stored in an independent memory. After the feedback unit 12 issues the instruction block sending signal, the instruction blocks are sent from the independent memory to the arithmetic core.
The instruction management device may also be provided with a built-in instruction memory 13 (shown in Fig. 2). The instruction memory 13 is connected to the feedback unit 12 and is configured to store the instruction blocks. After obtaining the instruction block sending signal issued by the feedback unit 12, the instruction memory 13 sends the instruction blocks to at least one arithmetic core according to the execution sequence.
The instruction management device may further comprise:
A pre-send query unit 11, configured to send a pre-send inquiry request to each arithmetic core before the instructions of an instruction block are sent, the pre-send inquiry request containing information about the instructions to be pre-sent, and the instruction block being obtained by dividing the program according to the execution sequence of the program;
The feedback unit 12 is further configured to issue a send-instruction message based on the responses made by the arithmetic cores to the pre-send inquiry request. The feedback unit 12 is connected to the pre-send query unit 11 and obtains the pre-send inquiry request from it. Using the obtained pre-send inquiry request, the feedback unit 12 looks up the corresponding responses among the responses it has received;
The instruction memory 13 is configured to obtain the send-instruction message from the feedback unit 12 and to send instruction blocks to at least one arithmetic core according to the execution sequence. The instruction memory 13 is connected to the feedback unit 12 and obtains from it the message to send instructions to the arithmetic core, also called an instruction fetch request.
The instruction memory 13 sends the instructions of the instruction block to the arithmetic core at an agreed granularity, the agreed granularity being the number of instructions pre-sent each time. The instruction memory 13 sends the instructions of the instruction block to the arithmetic cores by broadcast or multicast.
The instruction management device further comprises a block division unit (not shown), configured to divide the program into the instruction blocks according to the execution sequence of the program.
In a specific implementation, the feedback unit 12 gives different feedback depending on the responses of the arithmetic cores. This may be as follows:
If the responses made by all the arithmetic cores do not include an instruction conflict response, the feedback unit 12 gives feedback to send the pre-sent instructions to the arithmetic cores that sent an instruction receive response. If the responses made by the arithmetic cores include an instruction conflict response, the feedback unit 12 gives feedback to wait for the arithmetic core that made the instruction conflict response to respond again, until the responses made by all the arithmetic cores no longer include an instruction conflict response.
Alternatively: if the responses made by all the arithmetic cores include neither an instruction conflict response nor a delayed receive response, the feedback unit 12 gives feedback to send the pre-sent instructions to the arithmetic cores that sent an immediate receive response. If the responses made by the arithmetic cores include an instruction conflict response and/or a delayed receive response, the feedback unit 12 gives feedback to wait for the arithmetic core that made the instruction conflict response to respond again, and to wait for the arithmetic core that made the delayed receive response to make an immediate receive response, until the responses made by all the arithmetic cores include neither an instruction conflict response nor a delayed receive response.
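The second feedback rule can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the Response enumeration, the feedback function, and the integer core identifiers are hypothetical names introduced here.

```python
from enum import Enum


class Response(Enum):
    IMMEDIATE_RECEIVE = "immediate receive"
    DELAYED_RECEIVE = "delayed receive"
    CONFLICT = "instruction conflict"
    DISCARD = "instruction discard"


def feedback(responses):
    """Decide the feedback for one round of pre-send inquiry.

    responses maps a (hypothetical) integer core id to the Response that
    core made. Returns ("wait", cores to wait for) while any core reports
    a conflict or a delayed receive, otherwise ("send", cores that made an
    immediate receive response).
    """
    blockers = sorted(
        core for core, r in responses.items()
        if r in (Response.CONFLICT, Response.DELAYED_RECEIVE)
    )
    if blockers:
        # One slow core suspends sending to every core.
        return ("wait", blockers)
    targets = sorted(
        core for core, r in responses.items()
        if r is Response.IMMEDIATE_RECEIVE
    )
    # Cores that answered with a discard response already hold the
    # instructions, so nothing is sent to them.
    return ("send", targets)
```

Calling feedback again with the updated responses after a waiting core has responded again models the repeated wait described above.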
The instruction management device may work together with the arithmetic cores as an independent device, or it may be integrated in an arithmetic core.
Fig. 3 shows an arithmetic core provided by an embodiment of the present invention, comprising:
An instruction storage unit 21, configured to receive and store the instructions of instruction blocks;
An arithmetic unit 22, configured to run the instruction blocks stored in the instruction storage unit 21. The arithmetic unit 22 is connected to the instruction storage unit 21, obtains instructions from it, and runs them;
The arithmetic core may further comprise:
An instruction transmission processing unit 23, configured to respond to the pre-send inquiry request based on the instruction block currently running, the instruction blocks stored in the instruction storage unit 21, and the pre-send inquiry request sent by the instruction management device. The instruction transmission processing unit 23 is connected to the instruction storage unit 21 and also to the arithmetic unit 22. From the instruction storage unit 21 it obtains information about the stored instructions, including which instructions have been executed and which have not. From the arithmetic unit 22 it obtains information about the instruction block being executed, including the storage location of that instruction block.
Specifically, the pre-send inquiry request contains the storage address in the instruction memory of the instructions to be pre-sent, and the storage address in the instruction storage unit 21 of the first instruction of the instruction block to which the pre-sent instructions belong.
In a specific implementation, the instruction blocks to be pre-sent are stored in the instruction memory, and the instruction blocks received by the arithmetic core are stored in the instruction storage unit 21. The storage address of an instruction block in the instruction storage unit 21 has a mapping relation with the storage address of that instruction block in the instruction memory. The instruction storage unit 21 is also configured to preserve the execution order of the instruction blocks.
The instruction storage unit 21 is further configured to send a stop response. After sending the stop response, the instruction storage unit 21 stops receiving pre-sent instruction blocks. The instruction storage unit 21 is further configured to send a re-receive response after a stop response has been sent. After sending the re-receive response, the instruction storage unit 21 resumes receiving pre-sent instruction blocks.
When an exception occurs inside the arithmetic core, the instruction storage unit 21 likewise sends a stop response and stops receiving pre-sent instruction blocks. Examples include the storage space of the instruction storage unit 21 being full so that no new instruction block can be stored, or a pre-sent instruction block conflicting with the instruction block that the arithmetic unit 22 is currently executing. If the exception is cleared after the stop response has been sent, the instruction storage unit 21 can send a re-receive response and resume receiving pre-sent instruction blocks. Examples include executed instruction blocks having been deleted so that the storage space of the instruction storage unit 21 can again accept new instruction blocks, or the currently stored instruction blocks having become eligible to be overwritten by subsequently pre-sent instruction blocks.
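The stop and re-receive behaviour for the "storage space full" case can be sketched as a small state machine. This is a hypothetical model for illustration only: the class name, the method names, the "ignored" result, and the choice to count capacity in instructions are assumptions, not details given in the embodiment.

```python
class InstructionStorageUnitModel:
    """Hypothetical model of stop / re-receive for a full storage unit."""

    def __init__(self, capacity):
        self.capacity = capacity   # number of instructions the unit can hold
        self.pending = 0           # stored instructions not yet executed
        self.receiving = True      # False between a stop and a re-receive

    def offer(self, count):
        """Offer `count` pre-sent instructions; return what the unit does."""
        if not self.receiving:
            return "ignored"       # the unit has stopped receiving
        if self.pending + count > self.capacity:
            self.receiving = False
            return "stop"          # stop response: no room for the new block
        self.pending += count
        return "accepted"

    def retire(self, count):
        """Mark `count` stored instructions as executed.

        Returns "re-receive" when this clears the exception, else None.
        """
        self.pending = max(0, self.pending - count)
        if not self.receiving and self.pending < self.capacity:
            self.receiving = True
            return "re-receive"    # re-receive response: space is available
        return None
```

In this sketch the sender would stop issuing the instruction block sending signal on "stop" and resume on "re-receive", which mirrors the feedback unit behaviour described for Fig. 2.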
Fig. 4 is a schematic diagram, provided by an embodiment of the present invention, of the storage address mapping between the instruction memory and the instruction blocks stored in the arithmetic core.
The storage order of the instructions in the instruction memory 3 (see Fig. 7, or the instruction memory 13 in Fig. 2) is: instructions 0-4 → instructions 5-24 → instructions 25-34. The instructions are divided into blocks as follows: instruction block B0 (original instructions 0-4), instruction block B1 (original instructions 5-24), and instruction block B2 (original instructions 25-34). For ease of description, the instructions are renamed within each instruction block, and the instruction names in Fig. 4 use the "instruction name within the instruction block" column of Table 2. The correspondence is shown in Table 2:
Table 2
Instruction block name | Original instruction name | Instruction name within the instruction block
Instruction block B0 | Instructions 0-4 | Instructions 0-4
Instruction block B1 | Instructions 5-24 | Instructions 0-19
Instruction block B2 | Instructions 25-34 | Instructions 0-9
Three instruction blocks are stored in the instruction memory 3: instruction block B0, instruction block B1, and instruction block B2. Instruction block B0 contains instructions 0-4 (five instructions), instruction block B1 contains instructions 0-19 (20 instructions), and instruction block B2 contains instructions 0-9 (ten instructions). The instruction blocks are stored sequentially in the instruction memory 3. The instruction storage unit 21 can store at most 20 instructions. It is divided into 20 storage addresses, numbered in order as address 0 to address 19. The correspondence between the storage addresses of the instruction storage unit 21 and the instruction blocks of the instruction memory 3 is as follows. Instruction block B0 contains instructions 0-4 and is stored at addresses 0-4 of the instruction storage unit 21. Instruction block B1 contains instructions 0-19; its instructions 0-14 are stored at addresses 5-19, and its instructions 15-19 are stored at addresses 0-4 (the same address can be reused). Instruction block B2 contains instructions 0-9 and is stored at addresses 5-14 of the instruction storage unit 21.
The above mapping of instruction block storage addresses is preset. In other embodiments the mapping may be formulated according to other rules. Once the mapping has been formulated, however, it must not be modified during the entire pre-sending of instructions, and the instruction storage unit 21 must store the instructions of each instruction block according to that mapping.
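The mapping in this example is consistent with a simple rule: an instruction's address in the instruction storage unit is its original instruction number modulo 20. The following Python sketch reproduces the addresses listed above under that assumption. The modulo rule and the names CAPACITY, BLOCKS, unit_address and block_addresses are inferred or introduced here for illustration; the embodiment only states that the mapping is preset.

```python
CAPACITY = 20  # the instruction storage unit has addresses 0-19

# (block name, first original instruction number, number of instructions),
# taken from Table 2.
BLOCKS = [("B0", 0, 5), ("B1", 5, 20), ("B2", 25, 10)]


def unit_address(original_index, capacity=CAPACITY):
    """Assumed direct mapping: original instruction number modulo capacity."""
    return original_index % capacity


def block_addresses(first, length, capacity=CAPACITY):
    """Storage-unit addresses occupied by a block, in instruction order."""
    return [unit_address(first + i, capacity) for i in range(length)]


if __name__ == "__main__":
    for name, first, length in BLOCKS:
        print(name, block_addresses(first, length))
```

Under this rule instruction block B1 wraps around from address 19 to address 0, which is the address reuse mentioned in the example.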
The instructions of an instruction block can be stored in the instruction memory 3 and the instruction storage unit 21 in various ways. A common way is to store them under Cache management, where one Cache line may contain dozens or even hundreds of instructions, or only a single instruction. In this embodiment, instructions are stored in the instruction memory 3 and the instruction storage unit 21 in direct-mapped Cache form. Under direct-mapped Cache management, the problem of the instruction storage unit 21 having no space in which to store the pre-sent instructions does not arise, so no delayed receive response is produced.
Based on the above mapping and the pre-send inquiry request, the arithmetic core can respond to the pre-send inquiry request.
If the pre-sent instructions would overwrite instructions that the arithmetic core is currently running, or instructions that are stored but not yet executed, or both, the instruction transmission processing unit 23 of the arithmetic core makes an instruction conflict response. If the pre-sent instructions are instructions that the arithmetic core needs, the instruction transmission processing unit 23 makes an instruction receive response. If the pre-sent instructions are already stored in the arithmetic core, the instruction transmission processing unit 23 makes an instruction discard response.
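The three-way decision can be illustrated with the following Python sketch. It is a simplified model, not the embodiment's logic: the function name respond, the dictionary representation of the storage unit, the modulo-20 address rule, and the treatment of a running instruction as "not yet executed" are all assumptions made for the example.

```python
def respond(presend, stored, capacity=20):
    """Response of one core to a pre-send inquiry request.

    presend: original instruction numbers of the instructions to be pre-sent.
    stored:  maps a storage-unit address to a pair
             (original instruction number held there, executed flag).
    Assumes an instruction's storage-unit address is its original number
    modulo `capacity`.
    """
    # Every pre-sent instruction is already held: nothing needs to be sent.
    if all(stored.get(i % capacity, (None, None))[0] == i for i in presend):
        return "instruction discard"
    # Any target address holding a different, unexecuted instruction blocks
    # the transfer.
    for i in presend:
        slot = stored.get(i % capacity)
        if slot is not None and slot[0] != i and not slot[1]:
            return "instruction conflict"
    return "instruction receive"
```

For example, with instruction block B0 stored at addresses 0-4 and not yet executed, an inquiry for original instructions 20-24 (instructions 15-19 of instruction block B1) yields an instruction conflict response, while the same inquiry after B0 has been executed yields an instruction receive response.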
Specifically, instruction block B0 is the first instruction block stored in the instruction storage unit 21 and occupies addresses 0-4. Instruction block B1 is sent next. If the agreed granularity is 5 instructions per transfer, instructions 0-4 of instruction block B1 are sent first. The pre-send inquiry request that is sent contains the storage address of instruction block B1 in the instruction memory and the storage address in the instruction storage unit 21 of the first instruction of instruction block B1: address 5.
Instruction block B0, the first instruction block stored in the instruction storage unit 21, occupies addresses 0-4, but addresses 5-19 are free, so instructions 0-4 of instruction block B1 can be stored. Instructions 5-9 and then instructions 10-14 of instruction block B1 are sent at the agreed granularity, and the instruction storage unit 21 stores instructions 0-14 of instruction block B1 sequentially at addresses 5-19 according to the mapping.
When instructions 15-19 of instruction block B1 are to be sent next, the instruction storage unit 21 has no free address left in which to store them, and the storage addresses that correspond to instructions 15-19 of instruction block B1 in the instruction storage unit 21 are addresses 0-4. If the arithmetic core is still executing instruction block B0, the pre-sent instructions would overwrite instructions that the arithmetic core is currently running or has stored but not yet executed. The arithmetic core therefore cannot receive the instructions currently being pre-sent and makes an instruction conflict response.
If instruction block B0 has finished executing, the instructions at addresses 0-4 of the instruction storage unit 21 can be overwritten, and the arithmetic core makes an instruction receive response.
After the instructions of instruction block B1 have been pre-sent, pre-sending continues with the instructions of instruction block B2. The arithmetic core receives the pre-send inquiry request for instruction block B2, which contains the storage address of instruction block B2 in the instruction memory and the storage address in the instruction storage unit 21 of the first instruction of instruction block B2: address 5.
By the same principle, if instructions 0-4 of instruction block B1 are being executed or are stored but not yet executed, instructions 0-4 of instruction block B2 cannot be stored, and the arithmetic core makes an instruction conflict response. If instructions 0-4 of instruction block B1 have all been executed, instructions 0-4 of instruction block B2 can be stored, and the arithmetic core makes an instruction receive response.
Likewise, if instructions 5-9 of instruction block B1 are being executed or are stored but not yet executed, instructions 5-9 of instruction block B2 cannot be stored, and the arithmetic core makes an instruction conflict response. If instructions 5-9 of instruction block B1 have all been executed, instructions 5-9 of instruction block B2 can be stored, and the arithmetic core makes an instruction receive response. In a specific implementation, the instructions of an instruction block are sent at the agreed granularity. Before each transfer at the agreed granularity, a pre-send inquiry request must be sent, and subsequent operations proceed only after the arithmetic cores have responded.
The instructions contained in instruction block B1 are never treated as identical to the instructions contained in instruction block B2. The instruction storage unit 21 uses the storage address of an instruction to decide whether two instructions are the same: instructions with different storage addresses in the instruction memory are regarded as different instructions even if their content is identical.
Suppose that after the instruction block sequence B0-B1-B2 has been executed once, it is executed again in a loop, i.e. B0-B1-B2-B0-B1-B2. Instruction block B0 is stored at addresses 0-4, the 20 instructions of instruction block B1 are stored at addresses 5-19 and addresses 0-4, and the 10 instructions of instruction block B2 are stored at addresses 5-14. In the first round (B0-B1-B2), instruction blocks B0, B1 and B2 are all pre-sent. When the pre-sending of instruction block B2 ends, addresses 0-4 of the instruction storage unit 21 hold instructions 15-19 of instruction block B1, addresses 5-14 hold instructions 0-9 of instruction block B2, and addresses 15-19 hold instructions 10-14 of instruction block B1. In the second round (B0-B1-B2), when instruction block B0 is pre-sent, instructions 0-4 of B0 overwrite addresses 0-4 of the instruction storage unit 21. When instruction block B1 is pre-sent, instructions 0-9 must be received again, because addresses 5-14 of the instruction storage unit hold instructions of instruction block B2. Instructions 10-14 of instruction block B1 are still in the instruction storage unit 21 and do not need to be pre-sent, so the arithmetic core returns an instruction discard response to the corresponding pre-send inquiry request.
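The looped example can be traced with a short Python simulation. The sketch rests on assumptions that are not stated in the embodiment: addresses follow an "original instruction number modulo 20" rule, the agreed granularity is 5 instructions, and every overwritten instruction has already been executed, so no conflict arises. The names CAPACITY, BLOCKS, presend and run_two_rounds are hypothetical.

```python
CAPACITY = 20  # addresses 0-19 of the instruction storage unit

# Original instruction numbers of each block, as in Table 2.
BLOCKS = {"B0": range(0, 5), "B1": range(5, 25), "B2": range(25, 35)}


def presend(unit, block, granularity=5):
    """Pre-send one block batch by batch.

    unit maps a storage address to the original instruction held there.
    Returns the (first, last) original numbers of the batches answered with
    an instruction discard response because the unit already held them.
    """
    discarded = []
    ids = list(BLOCKS[block])
    for start in range(0, len(ids), granularity):
        batch = ids[start:start + granularity]
        if all(unit.get(i % CAPACITY) == i for i in batch):
            discarded.append((batch[0], batch[-1]))  # already stored
        else:
            for i in batch:                          # received and stored
                unit[i % CAPACITY] = i
    return discarded


def run_two_rounds():
    """Run B0-B1-B2 twice; return the discarded batches of the second round."""
    unit = {}
    for name in ("B0", "B1", "B2"):
        presend(unit, name)
    return {name: presend(unit, name) for name in ("B0", "B1", "B2")}
```

In the second round only the batch of original instructions 15-19, that is instructions 10-14 of instruction block B1, is discarded; every other batch has to be received again.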
Specifically, the instruction receive response is divided into an immediate receive response and a delayed receive response. If the pre-sent instructions are instructions needed by the arithmetic unit 22 of the arithmetic core and the instruction storage unit 21 of the arithmetic core has enough storage space to store them, the instruction transmission processing unit 23 of the arithmetic core makes an immediate receive response. If the pre-sent instructions are instructions needed by the arithmetic unit 22 of the arithmetic core but the instruction storage unit 21 does not have enough storage space to store them, the instruction transmission processing unit 23 makes a delayed receive response. In the direct-mapped Cache form of instruction management, only instruction conflict responses arise and no delayed receive response is produced. In other forms of instruction management, the arithmetic core makes an instruction conflict response or a delayed receive response depending on the situation.
After the instruction transmission processing unit 23 sends a delayed receive response, it responds again after a period of time (called the delay time), once the conflict has been resolved. After the instruction transmission processing unit 23 sends an instruction conflict response, it likewise responds again once the conflict has been resolved. A conflict includes the pre-sent instructions overwriting instructions that the arithmetic core is currently running and/or instructions that are stored but not yet executed, insufficient storage space in the arithmetic core for the instructions of the instruction block, and so on. The new response may be any one of an immediate receive response, a delayed receive response, an instruction conflict response, or an instruction discard response.
One of the arithmetic cores may be designated as the instruction management device, in which case that arithmetic core also includes the instruction management device shown in Fig. 2.
An arithmetic core is usually a processor, used to execute instruction blocks and to respond to pre-send inquiry requests, but an arithmetic core may also be integrated with the instruction management device. That is, the instruction management device and the arithmetic core may be independent components that communicate with each other to process instructions, or they may be integrated into a single component that performs both the work of the instruction management device and the work of the arithmetic core.
Fig. 5 is a schematic diagram of a program and instruction blocks provided by an embodiment of the present invention, Fig. 6 is a schematic diagram of the instructions in an instruction block provided by an embodiment of the present invention, and Fig. 7 shows an instruction management system provided by an embodiment of the present invention. A detailed description follows with reference to Figs. 5 to 7.
Specific embodiment:
Before a program is run on the arithmetic cores, it needs to be divided into instruction blocks. A program consists of a number of instructions, and the instruction blocks obtained by the division contain the instructions of the program. The program may be a large system-level program, a small application program, or a functional module excerpted from a complete program. This embodiment uses a small application program as the example. In other embodiments the program may be a large system-level program, a functional module excerpted from a complete program, or some other program such as a software installer. This embodiment imposes no limitation in this respect.
Dividing a program into instruction blocks mainly follows these principles:
1. Instruction blocks are divided according to the execution sequence of the program;
2. After division, the space occupied by a single instruction block is less than or equal to the storage space of the instruction storage unit of the arithmetic core;
3. Instruction blocks contain complete instructions;
4. Instruction blocks must not overlap.
The program may be divided into instruction blocks by a partitioning algorithm, or on the basis of practical experience, for example according to the execution time of the instructions or the number of times the instructions are called.
Specifically, regarding principle 1, a program is generally continuous and has a definite execution sequence, and the division into instruction blocks follows that execution sequence. Because the program is continuous, the instruction blocks obtained by dividing it according to the execution sequence are generally continuous as well.
The execution sequence of the divided instruction blocks is the same as the execution sequence of the program; that is, the execution trajectory of the instruction blocks is consistent with the execution sequence of the program. The execution sequence of the program means the execution sequence of the instructions in the program.
If the program contains loops, recursion, or conditional branches, the execution trajectory of the instruction blocks may differ slightly from the execution sequence of the program, but the execution sequence of the instructions within the instruction blocks is still the same as in the program.
Take a loop as an example. A loop generally executes certain instructions repeatedly. If these instructions were divided among different instruction blocks, the same instructions would be sent repeatedly as pre-send inquiry requests are issued, which would reduce the execution efficiency of the arithmetic core. Instructions of this type are therefore usually placed in the same instruction block. With these instructions in the same instruction block, the complete loop executes within that block, and from the viewpoint of the overall execution order of the instruction blocks, execution is sequential. In the actual running of the program, the instructions within that instruction block execute in a loop, and their execution sequence is fully consistent with the execution sequence of the instructions in the program.
In special cases, however, instructions of this type may be distributed across different instruction blocks. For example, in a nested loop, the outer-loop instructions and the inner-loop instructions may be placed in different instruction blocks.
Dividing the program into instruction blocks here means a logical division; that is, what the division ultimately produces is an instruction block sequence or instruction block list. In other words, the divided instruction blocks merely indicate, in the form of an instruction block sequence, which instruction blocks the program has and which instructions each instruction block contains. The program is not physically split.
For example, the program to be divided contains 15 instructions. Taking the run time and call count of the instructions into account, the program is divided into three instruction blocks of 5 instructions each. The divided instruction blocks are represented as an instruction block sequence rather than existing as three independent instruction blocks. The instruction block sequence after division is shown in Table 3:
Table 3
Instruction block name | Original instruction name | Instruction name within the instruction block
Instruction block 1 | Instructions 0-4 | Instructions 0-4
Instruction block 2 | Instructions 5-9 | Instructions 0-4
Instruction block 3 | Instructions 10-14 | Instructions 0-4
As shown in Table 3, the instruction block sequence lists the instruction block names and the instructions that each instruction block contains. When instruction blocks are sent, they are sent in instruction block order at the agreed granularity (in this embodiment, 5 instructions per transfer). The instructions of instruction block 1 are sent first: the pre-send inquiry request carries the storage address in the instruction memory of the first five original instructions (instructions 0-4) and the storage address in the instruction storage unit of original instruction 0. Instruction block 2 is sent next: the pre-send inquiry request carries the storage address in the instruction memory of original instructions 5-9 and the storage address in the instruction storage unit of original instruction 5. Instruction block 3 is sent last: the pre-send inquiry request carries the storage address in the instruction memory of original instructions 10-14 and the storage address in the instruction storage unit of original instruction 10.
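The sequence of pre-send inquiry requests for Table 3 can be sketched as follows. This is an illustrative model only: the PreSendInquiry record, the inquiries function, the use of original instruction numbers as instruction memory addresses, and the 20-entry storage unit with a modulo address rule are assumptions introduced here, since this example does not state the storage unit size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreSendInquiry:
    block: str
    memory_addresses: range  # instruction memory addresses of this batch
    unit_address: int        # storage-unit address of the block's first instruction


def inquiries(blocks, granularity=5, capacity=20):
    """Build one inquiry per batch, in instruction block order.

    blocks: list of (name, first original instruction number, length).
    """
    out = []
    for name, first, length in blocks:
        end = first + length
        for start in range(first, end, granularity):
            stop = min(start + granularity, end)
            out.append(PreSendInquiry(name, range(start, stop), first % capacity))
    return out


TABLE_3 = [
    ("instruction block 1", 0, 5),
    ("instruction block 2", 5, 5),
    ("instruction block 3", 10, 5),
]

if __name__ == "__main__":
    for request in inquiries(TABLE_3):
        print(request)
```

With 5 instructions per block and a granularity of 5, each block needs exactly one inquiry; a 20-instruction block would need four.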
Regarding principle 2, the space occupied by a divided instruction block is less than or equal to the storage space of the instruction storage unit of the arithmetic core. An instruction block has to be pre-sent into the instruction storage unit of the arithmetic core, and if the space it occupies exceeds the storage space of the instruction storage unit, the instruction block cannot be stored.
For multi-core and many-core processors, the instruction capacity of a divided instruction block is less than or equal to the storage space of the instruction storage unit of every arithmetic core; that is, it is less than or equal to the storage space of the smallest instruction storage unit. Instructions can be stored in various ways, a common one being storage in Cache lines. In general, a Cache line may be 128 bytes, 256 bytes or 512 bytes, each Cache line contains a certain number of instructions, and every Cache line contains the same number of instructions. An instruction block usually comprises several Cache lines, and its size may be 128 bytes, 256 bytes or 512 bytes.
Specifically, the instruction capacity of a single divided instruction block is less than or equal to the storage space of the instruction storage unit of the arithmetic core. Instructions are transferred at a predetermined agreed granularity. When instructions are stored in Cache lines, the agreed granularity is usually one Cache line. For example, with a 128-byte Cache line and 4-byte instructions, one Cache line contains 32 instructions.
It is usually required that the number of instructions in an instruction block preferably be an integer multiple of the agreed granularity of instruction transfer. For example, if instructions are stored in Cache lines and the agreed granularity is one Cache line (it may also be several Cache lines) of 128 bytes, an instruction block may be formed from one Cache line (128 bytes), two Cache lines (256 bytes), or four Cache lines (512 bytes). Alternatively, if instructions are stored in some other way and the agreed granularity is 5 instructions, a single instruction block may contain 10, 15 or 20 instructions.
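The arithmetic in these examples can be checked with a few lines of Python. The function names are hypothetical, and the sketch treats the "integer multiple" rule as the preference the text describes rather than a hard constraint.

```python
def instructions_per_line(line_bytes, instruction_bytes):
    """Number of fixed-size instructions that fit in one Cache line."""
    if line_bytes % instruction_bytes:
        raise ValueError("Cache line size must be a multiple of the instruction size")
    return line_bytes // instruction_bytes


def is_preferred_block_size(block_instructions, granularity):
    """True if the block size is a positive integer multiple of the agreed granularity."""
    return block_instructions > 0 and block_instructions % granularity == 0


if __name__ == "__main__":
    print(instructions_per_line(128, 4))   # 128-byte line, 4-byte instructions
    print([n for n in range(1, 21) if is_preferred_block_size(n, 5)])
```

With a granularity of 5 instructions, the preferred block sizes up to 20 instructions are 5, 10, 15 and 20.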
Regarding principle 3, instruction blocks contain complete instructions. Instructions are stored in Cache lines, and one Cache line contains one instruction or several instructions.
As in the preceding example, a Cache line is the unit of replacement and of transfer granularity between the on-chip cache and main memory; for example, it may be the unit of transfer granularity between the instruction memory and the arithmetic core. The agreed granularity mentioned above is the number of instructions sent per transfer. If a Cache line is 256 bytes and an instruction is 4 bytes, one Cache line holds 64 instructions, so the agreed granularity is one Cache line (256 bytes), or 64 instructions per transfer. Storage addresses in the arithmetic core are normally byte-addressed. If a Cache line is specified as 256 bytes, each data transfer is 256 bytes, and the address given is, for example, 0x100, whose low-order bits are all 0. This is called 256-byte boundary alignment of the address. Memory access is most efficient when addresses are aligned in this way.
Dividing instruction blocks means dividing many instructions into several groups, each group being called an instruction block. Such an instruction block may occupy several Cache lines, or less than one Cache line. The statement that instruction blocks contain complete instructions means that it is best to ensure that each instruction block contains complete Cache lines (that is, blocks are divided on Cache line boundaries). For example, a Cache line aligned to a 256-byte boundary should not have its first half in one instruction block and its second half in the next instruction block.
An instruction in arithmetic core or command memory, typically 4 bytes or 8 bytes, on main memory or piece at a high speed It is continuous storage in caching (command memory) when storage.If it is 256 bytes so to arrange Cache rows size, 256 is 4 or 8 integral multiple, and according to mode of the address to boundary, it is not in 14 byte for instructing to transmit a Cache row Situation about can be stored in different Cache rows.
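The claim that aligned, contiguous 4-byte or 8-byte instructions never cross a 256-byte Cache line can be illustrated with a brute-force check. This is a sketch under the stated sizes; `straddles_line` and `any_straddle` are hypothetical names.

```python
def straddles_line(address: int, instruction_bytes: int, line_bytes: int) -> bool:
    """True if the instruction at `address` spans two Cache lines."""
    first_line = address // line_bytes
    last_line = (address + instruction_bytes - 1) // line_bytes
    return first_line != last_line


def any_straddle(base: int, count: int, instruction_bytes: int, line_bytes: int) -> bool:
    """Check `count` contiguous instructions starting at byte address `base`."""
    return any(
        straddles_line(base + i * instruction_bytes, instruction_bytes, line_bytes)
        for i in range(count)
    )


# Aligned start: neither 4-byte nor 8-byte instructions cross a line boundary.
print(any_straddle(0x100, 128, 4, 256), any_straddle(0x100, 128, 8, 256))

# Misaligned start (0x102): some 4-byte instruction crosses a boundary.
print(any_straddle(0x102, 128, 4, 256))
```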
Boundary alignment improves memory-access efficiency. This constraint is not mandatory, however; blocks need not be boundary-aligned.
Under principle 4, instruction blocks must not overlap. For example, suppose a program contains the consecutive instructions instruction 0 to instruction 10, divided in order into two instruction blocks, instruction block 1 and instruction block 2, where instruction block 1 contains instructions 0 to 5 and instruction block 2 contains instructions 6 to 10. This division is allowed.
However, a division in which instruction block 1 contains instructions 0 to 8 and instruction block 2 contains instructions 5 to 10 is not allowed.
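Principle 4 amounts to a pairwise interval check. The sketch below assumes each instruction block is described by the inclusive range of instruction numbers it contains; the function name `find_overlaps` is hypothetical.

```python
from itertools import combinations


def find_overlaps(blocks: dict[str, tuple[int, int]]) -> list[tuple[str, str]]:
    """Return every pair of blocks whose inclusive instruction ranges intersect."""
    overlaps = []
    for (name_a, (first_a, last_a)), (name_b, (first_b, last_b)) in combinations(
        blocks.items(), 2
    ):
        # Two inclusive ranges intersect when each starts before the other ends.
        if first_a <= last_b and first_b <= last_a:
            overlaps.append((name_a, name_b))
    return overlaps


allowed = {"instruction block 1": (0, 5), "instruction block 2": (6, 10)}
not_allowed = {"instruction block 1": (0, 8), "instruction block 2": (5, 10)}

print(find_overlaps(allowed))
print(find_overlaps(not_allowed))
```

The first division yields no overlapping pair, while the second reports blocks 1 and 2 because both contain instructions 5 to 8.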
In addition, an instruction block is a logical division of a segment of program instructions that already resides in main memory (the instruction memory). The physical addresses (storage addresses) at which program instructions are stored in the instruction memory are fixed; dividing an instruction block means designating the portion between storage addresses addr_x and addr_y as instruction block B_z. Following Table 3 above, the instruction block sequence can be expressed as Table 4:
Table 4

| Instruction block name | Original instruction address | Instruction name in instruction block | Storage address |
| --- | --- | --- | --- |
| Instruction block 1 | Instruction 0-4 | Instruction 0-4 | addr_0~addr_4 |
| Instruction block 2 | Instruction 5-9 | Instruction 0-4 | addr_5~addr_9 |
| Instruction block 3 | Instruction 10-14 | Instruction 0-4 | addr_10~addr_14 |
When an instruction block is pre-sent, the pre-send inquiry request asks whether the arithmetic core needs the instructions located in the address range addr_x to addr_x+m (where m is the storage-address offset). Based on the instruction block it is currently executing, the instruction blocks it has stored but not yet executed, and the pre-send inquiry request, the arithmetic core makes an appropriate response to the request.
When an arithmetic core runs instructions, it fetches them from the instruction memory by instruction address, and stores the fetched instructions in its instruction storage unit according to the mapping relation described earlier.
There are many ways to divide instruction blocks, and different programs call for different divisions. In a specific implementation, the division must follow the four principles above, and the instruction block sequence is divided according to actual needs, without being limited by the examples above.
Fig. 5 is a schematic diagram of the instruction blocks of an application program in an embodiment of the present invention; a detailed description follows with reference to Fig. 5.
In this example, program B is a small application program, and the four principles above are followed when dividing program B into instruction blocks.
When dividing instruction blocks, the first consideration is the execution order of program B. Once program B has been designed, its execution order is fixed. The instruction block sequence of program B can be divided according to rules; however the sequence is divided, the final instruction trace must be consistent with that of the original program B.
In Fig. 5, program B comprises program segment 0, program segment 1, and program segment 2, executed in the order program segment 0 → program segment 1 → program segment 2. Each program segment can therefore be made one instruction block. Program B is divided by execution order into three instruction blocks: instruction block B0, instruction block B1, and instruction block B2.
The direction of the arrows in Fig. 5 indicates the execution order of instruction blocks B0, B1, and B2, namely B0 → B1 → B2.
In other embodiments, if program A comprises program segments 0, 1, and 2 and executes them with jumps, for example in the order program segment 1 → program segment 0 → program segment 2 or a similar jumping order, then program segments 0, 1, and 2 can be placed in the same instruction block (that is, the whole program consists of a single instruction block), or the division used in the present embodiment can be adopted. If program C comprises program segments 0, 1, and 2 and executes them in a loop, for example program segment 1 → program segment 0 → program segment 1 → program segment 2 → program segment 2 → program segment 1 → program segment 2 → program segment 0, or a similar looping order, then program segments 1 and 2 can be placed in one instruction block B1 and program segment 0 in a separate instruction block B0, so that the instruction block sequence is a loop between B0 and B1: B0 → B1 → B0. Instruction blocks can be divided according to the actual situation and are not limited to the examples above.
Dividing instruction blocks must also take the following into account: the space occupied by each instruction block must be less than or equal to the storage space of the instruction storage unit of every arithmetic core; an instruction block must contain complete instructions; and instruction blocks must not share instructions.
In the present embodiment, each instruction is 4B and the smallest instruction storage unit among the arithmetic cores is 80B, so the instruction capacity of each of instruction blocks B0, B1, and B2 is less than or equal to 80B. In the present embodiment, instruction blocks B0, B1, and B2 are 20B, 80B, and 40B respectively. The three blocks here differ in instruction capacity; in other embodiments they may have the same capacity.
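The capacity constraint can be worked through numerically. The sketch below uses the instruction counts of the embodiment (5, 20, and 10 instructions for B0, B1, and B2) and the two storage-unit capacities it mentions (80B and 160B); the function names are hypothetical.

```python
INSTRUCTION_BYTES = 4  # each instruction is 4B in the embodiment


def block_bytes(instruction_count: int) -> int:
    """Storage occupied by a block of `instruction_count` instructions."""
    return instruction_count * INSTRUCTION_BYTES


def fits_all_cores(instruction_count: int, core_capacities: list[int]) -> bool:
    """A block is admissible only if it fits the smallest instruction storage unit."""
    return block_bytes(instruction_count) <= min(core_capacities)


core_capacities = [80, 160]                    # bytes, per the embodiment
blocks = {"B0": 5, "B1": 20, "B2": 10}         # instruction counts per block

for name, count in blocks.items():
    print(name, block_bytes(count), fits_all_cores(count, core_capacities))
```

A block of 21 instructions would occupy 84B and would therefore be rejected, since it exceeds the 80B minimum.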
Program segments 0, 1, and 2 are each a complete, sequentially executed piece of program, and each contains a number of instructions. Therefore, when program segments 0, 1, and 2 are made into instruction blocks B0, B1, and B2 respectively, the execution order of the instruction blocks is guaranteed to be consistent with the execution order of the instructions they contain.
Accordingly, instruction blocks B0, B1, and B2 also guarantee that sequentially executed instructions are not repeated across the sequentially executed instruction blocks. That is, if instruction block B0 contains program segment 1, instruction block B1 cannot also contain program segment 1.
In the present embodiment, program segments 0, 1, and 2 are executed sequentially, so the execution order of instruction blocks B0, B1, and B2 is the same as that of program segments 0, 1, and 2.
The information on the divided instruction block sequence is saved in the form of a list. In the present embodiment, dividing into instruction blocks means dividing the instructions logically.
Fig. 6 is a schematic diagram of the instructions contained in the instruction blocks of Fig. 5. As shown in Fig. 6, instruction block B0 contains instructions 0-4 (five instructions), instruction block B1 contains instructions 0-19 (20 instructions), and instruction block B2 contains instructions 0-9 (ten instructions). For the naming of instructions within the blocks shown in Fig. 6, refer to Table 3.
Fig. 7 shows an instruction management system provided by an embodiment of the present invention. In the present embodiment, the instruction management device 1, the arithmetic cores 2, and the instruction memory 3 are provided independently of one another and cooperate by exchanging messages.
The instruction management system includes:
An instruction management device 1, configured to send a pre-send inquiry request to each arithmetic core 2 when an instruction block is to be sent, and to send the instructions in the instruction block based on the responses of the arithmetic cores 2 to the pre-send inquiry request. The pre-send inquiry request contains the storage address of the pre-sent instructions in the instruction memory 3 and the storage address, in the arithmetic core 2, of the first instruction of the instruction block to which the pre-sent instructions belong;
The instruction management device 1 sends the pre-send inquiry request to each arithmetic core mainly to learn the execution status of each arithmetic core 2 (here, arithmetic core 2 refers to every arithmetic core) and, from the content of each core's response, to decide whether to send subsequent instruction blocks to the arithmetic cores 2;
Execution speeds differ among the arithmetic cores 2: some run fast and some run slowly. From the responses of the arithmetic cores 2 to the pre-send inquiry request, the execution status of each core can be determined. When the execution speeds differ considerably, the differences among the arithmetic cores 2 are balanced by controlling the sending of instructions to them.
Arithmetic cores 2, configured to receive and run instruction blocks and to respond to the pre-send inquiry requests sent by the instruction management device 1. The instruction management system has multiple arithmetic cores 2, namely arithmetic core 0 to arithmetic core n;
An arithmetic core 2 generally makes one of four responses to a pre-send inquiry request sent by the instruction management device 1: an immediate reception response, a delayed reception response, an instruction abandon response, and an instruction conflict response. If the instruction storage unit in the arithmetic core 2 is managed by direct Cache mapping, the delayed reception response is the same as the instruction conflict response, so the responses can also be regarded as three kinds: the immediate reception response, the instruction abandon response, and the instruction conflict response. An arithmetic core 2 makes only one response at a time.
Specifically, if the pre-sent instructions are needed by the arithmetic core 2 and the core has enough storage space to store them, the arithmetic core 2 makes an immediate reception response. If the pre-sent instructions would overwrite the instructions the arithmetic core 2 is currently running, or instructions that are stored but not yet executed, or both, so that the core lacks enough storage space to store them, the arithmetic core 2 makes an instruction conflict response. If the pre-sent instructions are already stored in the arithmetic core 2, the core makes an instruction abandon response;
The immediate reception response indicates that the arithmetic core 2 can receive the instruction block at any time. The instruction conflict response indicates that the arithmetic core 2 temporarily cannot receive the instruction block, and can determine whether the block is needed only after the conflict is resolved. A conflict arises when the pre-sent instructions would overwrite the instructions the arithmetic core is currently running, instructions stored but not yet executed, or both;
If an arithmetic core 2 sends an instruction conflict response, it must respond to the instruction management device 1 again once the conflict has been resolved and it can receive the instructions.
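The three-way response of a directly mapped arithmetic core can be sketched as a small decision function. This is an illustrative model, not the patented circuit: the data layout (a slot-to-instruction dictionary plus a set of protected slots) and all names are assumptions, and the question of whether the core still needs the instructions is left out.

```python
from enum import Enum


class Response(Enum):
    RECEIVE = "immediate reception"
    ABANDON = "instruction abandon"
    CONFLICT = "instruction conflict"


def respond(store: dict[int, int], protected: set[int], presend: dict[int, int]) -> Response:
    """Choose one response to a pre-send inquiry request.

    store     -- slot -> instruction address currently held by the core
    protected -- slots holding instructions that are running, or stored but not yet executed
    presend   -- slot -> instruction address the pre-sent instructions would occupy
    """
    # Already stored: every pre-sent instruction is present in its slot.
    if all(store.get(slot) == instr for slot, instr in presend.items()):
        return Response.ABANDON
    # Would overwrite instructions that are still needed.
    if any(slot in protected for slot in presend):
        return Response.CONFLICT
    return Response.RECEIVE


print(respond({}, set(), {0: 0, 1: 1}))                      # empty core
print(respond({5: 5, 6: 6}, {5, 6}, {5: 25, 6: 26}))         # would overwrite pending slots
print(respond({5: 25, 6: 26}, {5, 6}, {5: 25, 6: 26}))       # already stored
```

The abandon case is tested first because instructions that are already stored normally sit in protected slots and would otherwise be misreported as a conflict.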
An instruction memory 3, configured to store instruction blocks (in the present embodiment the instruction memory 3 stores a program, which may specifically be the divided instruction blocks saved as an instruction sequence or a list). The instruction memory 3 is controlled by the instruction management device 1. The instruction management device 1 determines the state of each arithmetic core 2 from its response to the pre-send inquiry request and thereby decides whether to continue sending subsequent instruction blocks; if so, it sends a fetch request to the instruction memory 3, that is, a command directing the instruction memory 3 to send the instruction block to the arithmetic cores 2. On receiving such a command (for example, a fetch request) from the instruction management device 1, the instruction memory 3 sends the instructions to the arithmetic cores 2.
Because there are multiple arithmetic cores 2, the instruction memory 3 generally sends instructions by broadcast or multicast, that is, it can send instructions to multiple arithmetic cores at once. This reduces the time the arithmetic cores 2 would wait under serial transmission, and also reduces the communication-network bandwidth consumed by instruction transfer, which improves on-chip data-transfer utilization and in turn the operating efficiency of the arithmetic cores 2.
In a specific implementation, the divided instruction blocks B0, B1, and B2 are stored in the instruction memory 3 in advance. The instruction memory 3 may have a relatively large capacity so that it can hold more instruction blocks.
The instruction management system may further include a blocking unit (not shown), configured to divide a program into sequentially executed instruction blocks. The program may already be divided into instruction blocks when it is loaded into the instruction memory 3; alternatively, a blocking unit may be provided in the instruction memory 3, and the program is divided into instruction blocks by the blocking unit after it is loaded into the instruction memory 3.
The information on the divided instruction blocks is written into the instruction management device 1. This information includes the number of instruction blocks, the size of each instruction block, the execution order of the instruction blocks, the number of instructions in each instruction block, the storage addresses of the instructions, and so on. The instruction block information can be written or imported into the instruction management device 1 in list form by software instructions.
Instruction blocks B0, B1, and B2 are stored sequentially in the instruction memory, and their instructions are stored in the arithmetic cores 2 according to a predetermined direct mapping relation.
Referring to Fig. 4, the 5 instructions of instruction block B0 are stored at addresses 0 to 4; the 20 instructions of instruction block B1 are stored at addresses 5 to 19 and addresses 0 to 4; and the 10 instructions of instruction block B2 are stored at addresses 5 to 14.
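The addresses listed above are consistent with a modulo-style direct mapping onto 20 instruction slots (an 80B instruction storage unit holding 4B instructions). The sketch below reproduces them under that assumption; the modulo rule and the function name `map_block` are inferred for illustration and are not quoted from Fig. 4.

```python
SLOTS = 80 // 4  # assumed: 80B instruction storage unit, 4B per instruction


def map_block(first_instruction: int, count: int, slots: int = SLOTS) -> list[int]:
    """Slot addresses occupied by `count` consecutive instructions.

    first_instruction is the position of the block's first instruction in
    the sequentially stored program (B0 starts at 0, B1 at 5, B2 at 25).
    """
    return [(first_instruction + i) % slots for i in range(count)]


print(map_block(0, 5))     # instruction block B0
print(map_block(5, 20))    # instruction block B1, wraps around to slot 0
print(map_block(25, 10))   # instruction block B2
```

Under this mapping, loading B2 reuses slots 5 to 14, which were previously occupied by instructions of B1; this reuse is what gives rise to the conflict case when those instructions are still needed.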
Storing instructions in a separate instruction memory 3 improves the working efficiency of the instruction management device 1 and reduces its overhead. The instruction management device 1 only needs to perform management: it exchanges pre-send inquiry requests with the arithmetic cores 2 and sends fetch requests to the instruction memory 3, without having to send the instructions of the instruction blocks itself.
In the present embodiment, the instruction management device 1 and the instruction memory 3 are provided independently; in other embodiments the two may be integrated, without being limited to the above.
There are generally multiple arithmetic cores 2, referred to as multi-core or many-core, and the arithmetic cores 2 process data in parallel to improve data-processing efficiency. Each arithmetic core 2 contains an instruction storage unit, which receives and stores the instructions of the instruction blocks, and an arithmetic unit, which runs the instructions stored in the instruction storage unit. The instruction storage unit is also called the instruction local store, and its capacity is typically on the order of kilobytes.
In the present embodiment, the instruction storage units of arithmetic core 0 to arithmetic core k have a capacity of 80B, and those of arithmetic core k+1 to arithmetic core n have a capacity of 160B. Instruction block B0 is 20B, instruction block B1 is 80B, and instruction block B2 is 40B. In other embodiments, the instruction storage units of arithmetic core 0 to arithmetic core n may have the same capacity, for example all 80B or all 160B. The instruction blocks may also be the same size, for example all 80B.
In the initial state, when the instruction management system starts running, the instruction storage units of the arithmetic cores 2 are generally empty, that is, they store no instruction blocks. At this point, when the instruction management device 1 sends a pre-send inquiry request to each arithmetic core 2, every arithmetic core 2 returns an immediate reception response. When the instructions of instruction blocks B1 and B2 are subsequently sent, each arithmetic core 2 returns an appropriate response to the instruction management device 1 according to its own running status and the information on the pre-sent instructions.
Because the division of instruction blocks follows principle 1, every instruction block is less than or equal to the capacity of the instruction storage unit of the arithmetic cores 2, and the storage address of an instruction in the instruction memory 3 corresponds to its storage address in the arithmetic core 2. The pre-send inquiry request for instructions therefore generally does not carry the size of the instruction block; instead it carries the storage address of the pre-sent instructions in the instruction memory 3.
The pre-send inquiry request contains information on the pre-sent instructions: their storage address in the instruction memory, and the storage address, in the instruction storage unit (that is, in the arithmetic core 2), of the first instruction of the instruction block to which they belong.
The instruction management device 1 generally sends instruction blocks to the arithmetic cores 2 one after another in the execution order of the instruction blocks, sending the next instruction block only after the previous one has been sent completely. Specifically, when an instruction block is sent, it is split into instructions that are sent at the agreed granularity, which is the number of instructions sent each time. Accordingly, when the instruction management device 1 sends pre-send inquiry requests to the arithmetic cores 2, it sends and processes the pre-send inquiry requests for the instructions of the next instruction block only after those for the instructions of the current instruction block have been sent and processed.
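Splitting a block into transfers of the agreed granularity can be sketched as follows. The function name and the half-open `(start, stop)` representation of each transfer are assumptions made for illustration; a granularity of 5 instructions is taken from the earlier example.

```python
def split_into_transfers(first: int, count: int, granularity: int) -> list[tuple[int, int]]:
    """Split `count` instructions starting at `first` into half-open (start, stop) transfers.

    Each transfer carries at most `granularity` instructions; the last one
    may be shorter if the block is not a multiple of the granularity.
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    end = first + count
    return [(start, min(start + granularity, end)) for start in range(first, end, granularity)]


# A 20-instruction block starting at instruction 5, sent 5 instructions at a time.
for start, stop in split_into_transfers(5, 20, 5):
    print(f"inquire about and then send instructions {start}..{stop - 1}")
```

Each transfer would be preceded by its own pre-send inquiry request, and the transfers of the next block start only after the last transfer of the current block has been handled.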
In the present embodiment the arithmetic cores do not send delayed reception responses. Therefore, each time instructions are to be sent to the arithmetic cores 2, the instruction management device 1 sends a fetch request to the instruction memory 3 only if the responses it has received contain no instruction conflict response; if they contain an instruction conflict response, the instruction management device 1 directs the instruction memory 3 to suspend sending instructions to the arithmetic cores 2. In addition, when the instruction management device 1 sends a fetch request to the instruction memory 3, it can tell the instruction memory 3 to send the instructions only to the arithmetic cores 2 that returned an immediate reception response. That is, an arithmetic core 2 that made an instruction abandon response does not receive the pre-sent instructions. This prevents instructions from being received twice and avoids wasting communication resources.
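The decision taken by the instruction management device after collecting the responses can be summarised in a few lines. This is a simplified sketch for the three-response case of the present embodiment; the string labels, the action names, and the function `decide` are hypothetical.

```python
RECEIVE, ABANDON, CONFLICT = "receive", "abandon", "conflict"


def decide(responses: dict[str, str]) -> tuple[str, list[str]]:
    """Map the cores' responses to an action and a list of target cores.

    Returns ("wait", [])       if any core reported a conflict,
            ("skip", [])       if every core abandoned the instructions,
            ("fetch", targets) otherwise, where targets are the receiving cores.
    """
    if CONFLICT in responses.values():
        return ("wait", [])    # suspend sending until the conflict is resolved
    targets = [core for core, response in responses.items() if response == RECEIVE]
    if not targets:
        return ("skip", [])    # all cores already hold the instructions
    return ("fetch", targets)  # multicast only to the cores that want them


print(decide({"core0": RECEIVE, "core1": ABANDON, "core2": RECEIVE}))
print(decide({"core0": RECEIVE, "core1": CONFLICT}))
print(decide({"core0": ABANDON, "core1": ABANDON}))
```

A single conflict response holds back the transfer for every core, which is how a slow core limits how far ahead the fast cores can run.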
In the present embodiment, the instruction storage units of arithmetic core 0 to arithmetic core k have a capacity of 80B, and those of arithmetic core k+1 to arithmetic core n have a capacity of 160B. The size of instruction block B0 is 20B. After the arithmetic cores 2 have received the instructions of instruction block B0, arithmetic core 0 to arithmetic core n each store 20B of instructions. Arithmetic core 0 to arithmetic core k, with a capacity of 80B, then have 60B available; arithmetic core k+1 to arithmetic core n, with a capacity of 160B, have 140B available.
In other embodiments, even if the instruction storage units of arithmetic core 0 to arithmetic core n have the same capacity, the available capacity of each core's instruction storage unit may still differ, because the cores execute instructions at different speeds.
The technical scheme has at least the following advantages:
It combines software and hardware. Software divides the instruction code into a sequence of instruction blocks according to the execution trace of the program, and software guarantees that the instruction block trace is the same for every arithmetic core. Hardware, using the instruction block sequence information produced by software, sends the instructions each arithmetic core needs into the instruction storage unit of that arithmetic core. Because the instruction trace of the program is known in advance, instructions can be actively loaded into the storage of the arithmetic core before the core actually executes them.
The program is cut into small instruction blocks, and the instructions in the blocks are pre-sent to the arithmetic cores, so that a core holds the instructions before it runs them. This makes the pre-sending of instructions relatively independent of the running of the arithmetic cores: each arithmetic core can receive other pre-sent instructions without affecting the instructions it is executing, which reduces the time the core waits for instruction fetches and improves its operating efficiency.
The arithmetic cores no longer actively issue fetch requests, which eliminates fetch contention among the cores and also reduces use of the communication network, further improving the operating efficiency of the arithmetic cores.
Instructions are organized into instruction blocks of variable capacity. The instruction block trace executed by every arithmetic core is the same, but each core's execution trace within an instruction block may differ. That is, the cores execute at different speeds, and sending the instructions in an instruction block based on the cores' responses to the pre-send inquiry request balances the differences in running speed among them. This keeps the speed differences among the cores within a controlled range without forcing the cores to run instructions in lockstep. It is a loose synchronization mechanism that uses instruction pre-sending as its means of control: pre-send inquiry requests are sent, and the pace at which instructions are sent in advance is controlled according to the cores' responses, balancing the speed difference between fast and slow arithmetic cores. In other words, the fastest core is kept from running too far ahead of the slowest core: if the instructions that a fast core is about to run would overwrite the instruction block that a slow core is currently running, the slow core sends an instruction conflict response and the sending of instructions to all arithmetic cores is suspended. The technical scheme allows a certain difference in speed among the arithmetic cores, and instruction pre-sending interferes little with the cores, so the scheme is widely applicable.
Pre-sending instructions by multicast or broadcast is especially suitable for multi-core and many-core structures: it reduces fetch contention among arithmetic cores, reduces the time that instruction transfer congests the communication network, and improves the utilization of the on-chip communication network.
The technical scheme is an instruction pre-sending mechanism based on loose synchronization. It solves the problem of excessive instruction-fetch latency of the computing cores in single-core, multi-core, and many-core processors, as well as the fetch-contention problem in multi-core and many-core processors, and thereby improves processor operating efficiency.
The technical scheme uses software-hardware cooperation to solve the problem of excessive instruction-fetch latency of the computing cores in single-core, multi-core, and many-core processors. By pre-sending instructions, it effectively solves the fetch-contention problem in multi-core and many-core processors.
Although the present invention is disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the spirit and scope of the present invention, use the methods and technical content disclosed above to make possible variations and modifications to the technical scheme of the present invention. Therefore, any simple modification, equivalent variation, or refinement made to the above embodiments in accordance with the technical essence of the present invention, without departing from the content of its technical scheme, falls within the protection scope of the technical scheme of the present invention.

Claims (31)

1. An instruction management method, characterized by comprising:
dividing a program into instruction blocks according to the execution order of the program;
sending the instruction blocks to at least one arithmetic core according to the execution order;
before sending an instruction block, sending a pre-send inquiry request to each arithmetic core, and sending the instructions in the instruction block based on the response of each arithmetic core to the pre-send inquiry request, the pre-send inquiry request containing information on the pre-sent instructions.
2. The instruction management method of claim 1, characterized in that the arithmetic core comprises an instruction storage unit that receives and stores the instruction block and an arithmetic unit that runs the instruction block stored in the instruction storage unit, and the space occupied by the instruction block is less than or equal to the storage space of the instruction storage unit.
3. The instruction management method of claim 1, characterized in that sending the instruction block comprises sending the instructions of the instruction block at an agreed granularity, the agreed granularity being the number of instructions pre-sent each time.
4. The instruction management method of claim 1, characterized in that the responses of the arithmetic core to the pre-send inquiry request include: an instruction reception response, an instruction abandon response, and an instruction conflict response; and sending the instructions in the instruction block based on the response of each arithmetic core to the pre-send inquiry request comprises:
if the responses made by all the arithmetic cores include no instruction conflict response, sending the pre-sent instructions to the arithmetic cores that made an instruction reception response;
if the responses made by all the arithmetic cores include only instruction abandon responses, abandoning the sending of the pre-sent instructions to the arithmetic cores;
if the responses made by all the arithmetic cores include an instruction conflict response, waiting for the arithmetic cores that made an instruction conflict response to respond again, until the responses made by all the arithmetic cores include no instruction conflict response.
5. The instruction management method of claim 1, characterized in that the responses of the arithmetic core to the pre-send inquiry request include: an immediate reception response, a delayed reception response, an instruction abandon response, and an instruction conflict response; and sending the instructions in the instruction block based on the response of each arithmetic core to the pre-send inquiry request comprises:
if the responses made by all the arithmetic cores include neither an instruction conflict response nor a delayed reception response, sending the pre-sent instructions to the arithmetic cores that made an immediate reception response;
if the responses made by all the arithmetic cores include only instruction abandon responses, abandoning the sending of the pre-sent instructions to the arithmetic cores;
if the responses made by all the arithmetic cores include an instruction conflict response and/or a delayed reception response, waiting for the arithmetic cores that made an instruction conflict response to respond again, and waiting for the arithmetic cores that made a delayed reception response to make an immediate reception response, until the responses made by all the arithmetic cores include neither an instruction conflict response nor a delayed reception response.
6. The instruction management method as claimed in claim 1, characterized in that the instruction block is stored in an instruction memory, and sending the instructions in the instruction block based on the responses made by the arithmetic cores to the pre-send query request comprises: controlling the instruction memory, based on the responses made by the arithmetic cores to the pre-send query request, to send the instructions in the instruction block to the arithmetic cores.
7. The instruction management method as claimed in claim 1, characterized by further comprising: after obtaining a stop response sent by an arithmetic core, stopping the sending of instruction blocks to all arithmetic cores.
8. The instruction management method as claimed in claim 1, characterized by further comprising: after obtaining a stop response sent by an arithmetic core, obtaining a receive-again response sent by the arithmetic core, and resuming the sending of instruction blocks to all arithmetic cores.
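The stop and receive-again behaviour of claims 7 and 8 amounts to a pause flag on the sending side. The sketch below is a minimal illustration under assumptions: the class `BlockSender` and its method names are hypothetical, and blocks are modelled as plain values in a list.

```python
class BlockSender:
    """Sends instruction blocks in order unless a core has asked to stop."""

    def __init__(self, blocks):
        self.pending = list(blocks)  # blocks in program execution order
        self.paused = False
        self.sent = []

    def on_stop(self):
        # A stop response from any core halts sending to all cores.
        self.paused = True

    def on_receive_again(self):
        # A receive-again response lifts the pause.
        self.paused = False

    def step(self):
        """Send the next block to all cores; return it, or None if idle."""
        if self.paused or not self.pending:
            return None
        block = self.pending.pop(0)
        self.sent.append(block)
        return block
```

Because the pending list is only consumed in `step`, a pause never reorders the blocks, which keeps the execution sequence intact across a stop and resume.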
9. The instruction management method as claimed in claim 1, characterized by further comprising: saving the execution sequence of the program in the arithmetic core.
10. The instruction management method as claimed in claim 1, characterized in that the instruction block is sent to the arithmetic cores by broadcast or multicast.
11. An instruction management device, characterized by comprising:
A feedback unit, configured to send an instruction block sending signal, the instruction block being obtained by dividing a program according to the execution sequence of the program;
A pre-send query unit, configured to send a pre-send query request to each arithmetic core before the instructions of the instruction block are sent, the pre-send query request containing information about the instructions to be pre-sent;
The feedback unit being further configured to send a send-instruction message based on the response made by each arithmetic core to the pre-send query request.
12. The instruction management device as claimed in claim 11, characterized by further comprising an instruction memory configured to store the instruction blocks, wherein the instruction memory, after obtaining the send-instruction message from the feedback unit, sends the instruction blocks to at least one arithmetic core according to the execution sequence.
13. The instruction management device as claimed in claim 12, characterized in that the instruction memory sends the instructions of the instruction block to the arithmetic cores according to an agreed granularity, the agreed granularity being the number of instructions pre-sent each time.
14. The instruction management device as claimed in claim 13, characterized in that the instruction memory sends the instructions of the instruction block to the arithmetic cores by broadcast or multicast.
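Claims 13 and 14 combine two ideas: instructions leave the instruction memory in groups of a fixed, agreed size, and each group goes to several cores at once. The following sketch illustrates both under stated assumptions: `batches` and `broadcast` are hypothetical names, an instruction block is modelled as a Python list, and each core's instruction storage is modelled as a list acting as an inbox.

```python
def batches(block, granularity):
    """Yield successive groups of at most `granularity` instructions."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    for start in range(0, len(block), granularity):
        yield block[start:start + granularity]


def broadcast(block, granularity, cores):
    """Deliver every batch, in order, to every core's inbox."""
    for batch in batches(block, granularity):
        for inbox in cores:
            inbox.extend(batch)
```

A multicast variant would pass only the subset of inboxes whose cores agreed to receive, with the rest of the logic unchanged.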
15. The instruction management device as claimed in claim 11, characterized in that if the responses made by all arithmetic cores do not include an instruction-conflict response, the feedback unit gives feedback to send the pre-sent instructions to the arithmetic cores that issued an instruction-reception response; if the responses made by all arithmetic cores include an instruction-conflict response, the feedback unit gives feedback to wait for the arithmetic cores that made an instruction-conflict response to respond again, until the responses made by all arithmetic cores do not include an instruction-conflict response.
16. The instruction management device as claimed in claim 11, characterized in that if the responses made by all arithmetic cores include neither an instruction-conflict response nor a delayed-reception response, the feedback unit gives feedback to send the pre-sent instructions to the arithmetic cores that issued an immediate-reception response; if the responses made by all arithmetic cores include an instruction-conflict response and/or a delayed-reception response, the feedback unit gives feedback to wait for the arithmetic cores that made an instruction-conflict response to respond again, and to wait for the arithmetic cores that made a delayed-reception response to make an immediate-reception response, until the responses made by all arithmetic cores include neither an instruction-conflict response nor a delayed-reception response.
17. The instruction management device as claimed in claim 11, characterized by further comprising a block dividing unit, configured to divide the program into the instruction blocks according to the execution sequence of the program.
18. The instruction management device as claimed in claim 11, characterized in that the feedback unit is further configured to stop sending the instruction block sending signal after obtaining a stop response sent by the arithmetic core.
19. The instruction management device as claimed in claim 18, characterized in that the feedback unit is further configured to, after obtaining the stop response sent by the arithmetic core and then obtaining a receive-again response sent by the arithmetic core, send the instruction block sending signal again.
20. An arithmetic core, characterized by comprising:
An instruction storage unit, configured to receive and store the instructions of an instruction block, the instruction block being pre-sent;
An arithmetic unit, configured to run the instructions stored in the instruction storage unit;
An instruction transmission processing unit, configured to respond to the pre-send query request sent by the instruction management device as claimed in claim 11, based on the instruction block currently running and the instruction blocks stored in the instruction storage unit.
21. The arithmetic core as claimed in claim 20, characterized in that if the pre-sent instructions would overwrite instructions that the arithmetic core is running and/or overwrite stored but not yet executed instructions, the instruction transmission processing unit of the arithmetic core makes an instruction-conflict response; if the pre-sent instructions are instructions that the arithmetic core needs, the instruction transmission processing unit of the arithmetic core makes an instruction-reception response; if the pre-sent instructions are already stored in the arithmetic core, the instruction transmission processing unit of the arithmetic core makes an instruction-discard response.
22. The arithmetic core as claimed in claim 20, characterized in that if the pre-sent instructions are instructions that the arithmetic unit of the arithmetic core needs and the instruction storage unit of the arithmetic core has enough storage space to store the pre-sent instructions, the instruction transmission processing unit of the arithmetic core makes an immediate-reception response; if the pre-sent instructions are instructions that the arithmetic unit of the arithmetic core needs but the instruction storage unit of the arithmetic core does not have enough storage space to store the pre-sent instructions, the instruction transmission processing unit of the arithmetic core makes a delayed-reception response.
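On the core side, claims 21 and 22 describe how a pre-send query is classified. The sketch below is one possible reading, not the patent's implementation: `CoreState` and `respond` are hypothetical names, storage is modelled as numbered slots plus a capacity counter, the queried block is assumed to be one the core needs unless it is already stored, and the order of the checks is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CoreState:
    capacity: int                                 # total instruction slots
    used: int = 0                                 # slots currently occupied
    stored: set = field(default_factory=set)      # ids of blocks already held
    live_slots: set = field(default_factory=set)  # slots running or not yet executed


def respond(core, block_id, size, target_slots):
    """Classify a pre-send query for `block_id` occupying `target_slots`."""
    if block_id in core.stored:
        return "discard"      # already stored in this core
    if core.live_slots & set(target_slots):
        return "conflict"     # would overwrite instructions still in use
    if core.capacity - core.used >= size:
        return "immediate"    # needed and enough free space
    return "delayed"          # needed, but not enough free space yet
```

For example, a core with two free slots answers "immediate" to a two-instruction query that targets unused slots and "delayed" to a four-instruction one.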
23. The arithmetic core as claimed in claim 20, characterized in that the instruction transmission processing unit, after sending a delayed-reception response, sends an immediate-reception response once a delay time has elapsed.
24. The arithmetic core as claimed in claim 21 or 22, characterized in that the instruction transmission processing unit, after sending an instruction-conflict response, responds again once the conflict is resolved.
25. The arithmetic core as claimed in claim 24, characterized in that the conflict includes the pre-sent instructions overwriting instructions that the arithmetic unit of the arithmetic core is currently running and/or overwriting stored but not yet executed instructions.
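Claims 23 to 25 imply that a core which first answers with a delayed-reception or instruction-conflict response will later answer again, and that the sender keeps waiting until no such response remains. The following sketch simulates that loop with discrete polling rounds. Everything in it is an assumption for illustration: `ScriptedCore`, `wait_until_ready` and the round-based timing are hypothetical, and real hardware would signal rather than be polled.

```python
class ScriptedCore:
    """Answers `first` for `rounds` polls, then answers `then`."""

    def __init__(self, first, rounds, then="immediate"):
        self.first, self.rounds, self.then = first, rounds, then

    def poll(self):
        if self.rounds > 0:
            self.rounds -= 1
            return self.first
        return self.then


def wait_until_ready(cores, max_rounds=100):
    """Poll all cores until no answer is 'conflict' or 'delayed'."""
    for round_no in range(max_rounds):
        answers = [core.poll() for core in cores]
        if "conflict" not in answers and "delayed" not in answers:
            return round_no, answers
    raise TimeoutError("cores did not become ready")
```

The `max_rounds` bound is a safety net added for the simulation; the claims themselves do not state a limit on how long the sender waits.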
26. The arithmetic core as claimed in claim 20, characterized in that the instruction storage unit is further configured to send a stop response, and after sending the stop response, the instruction storage unit stops receiving pre-sent instruction blocks.
27. The arithmetic core as claimed in claim 20, characterized in that the instruction storage unit is further configured to send a receive-again response after sending a stop response, and after sending the receive-again response, the instruction storage unit resumes receiving pre-sent instruction blocks.
28. The arithmetic core as claimed in claim 20, characterized in that the instruction storage unit is further configured to save the execution sequence of the instruction blocks.
29. The arithmetic core as claimed in claim 20, characterized by further comprising the instruction management device as claimed in any one of claims 12 to 19.
30. An instruction management system, characterized by comprising:
The instruction management device as claimed in any one of claims 11 to 19;
The arithmetic core as claimed in any one of claims 20 to 28.
31. An instruction management system, characterized by comprising:
The arithmetic core as claimed in any one of claims 20 to 28;
The arithmetic core as claimed in claim 29.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210107228.1A CN103377085B (en) 2012-04-12 2012-04-12 Method, device and system for instruction management and operation core

Publications (2)

Publication Number Publication Date
CN103377085A CN103377085A (en) 2013-10-30
CN103377085B true CN103377085B (en) 2017-04-19

Family

ID=49462244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210107228.1A Active CN103377085B (en) 2012-04-12 2012-04-12 Method, device and system for instruction management and operation core

Country Status (1)

Country Link
CN (1) CN103377085B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10936316B2 (en) * 2015-09-19 2021-03-02 Microsoft Technology Licensing, Llc Dense read encoding for dataflow ISA
CN113568665B (en) * 2020-04-29 2023-11-17 北京希姆计算科技有限公司 Data processing device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1151047A (en) * 1994-11-25 1997-06-04 摩托罗拉公司 Method of loading instructions into instruction cache

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69327688T2 (en) * 1992-08-12 2000-09-07 Advanced Micro Devices Inc Command decoder

Similar Documents

Publication Publication Date Title
CN103197953B (en) Speculative execution and rollback
JP3836838B2 (en) Method and data processing system for microprocessor communication using processor interconnections in a multiprocessor system
CN107688853A (en) Device and method for performing neural network computing
US20090089510A1 (en) Speculative read in a cache coherent microprocessor
CN104699631A (en) Storage device and fetching method for multilayered cooperation and sharing in GPDSP (General-Purpose Digital Signal Processor)
CN101366004A (en) Methods and apparatus for multi-core processing with dedicated thread management
CN105138679B (en) Data processing system and processing method based on distributed caching
WO2007084700A2 (en) System and method for thread handling in multithreaded parallel computing of nested threads
CN103946803A (en) Processor with efficient work queuing
Yi et al. GPUNFV: a GPU-accelerated NFV system
CN103778070B (en) Parallel processing of multiple block coherence operations
CN105183662A (en) Cache consistency protocol-free distributed sharing on-chip storage framework
US8028017B2 (en) Virtual controllers with a large data center
CN104503948B (en) Tightly coupled adaptive co-processing system supporting a multi-core network processing architecture
CN103559017A (en) Character string matching method and system based on graphic processing unit (GPU) heterogeneous computing platform
US9542317B2 (en) System and a method for data processing with management of a cache consistency in a network of processors with cache memories
JP3836837B2 (en) Method, processing unit, and data processing system for microprocessor communication in a multiprocessor system
US6904465B2 (en) Low latency inter-reference ordering in a multiple processor system employing a multiple-level inter-node switch
CN103377085B (en) Method, device and system for instruction management and operation core
CN102571580A (en) Data receiving method and computer
US6907509B2 (en) Automatic program restructuring to reduce average cache miss penalty
CN106445472B (en) Character manipulation acceleration method, device, chip and processor
US10275392B2 (en) Data processing device
CN107924309A (en) System and method for changeable channel framework
CN110245024A (en) Dynamic allocation system and method for static storage blocks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant