CN104932945B - Task-level out-of-order multi-issue scheduler and scheduling method thereof - Google Patents

Task-level out-of-order multi-issue scheduler and scheduling method thereof

Info

Publication number
CN104932945B
Authority
CN
China
Prior art keywords
state
computing resource
unit
address
memory space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510342408.1A
Other languages
Chinese (zh)
Other versions
CN104932945A (en
Inventor
张多利
张扬
宋宇鲲
杜高明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN201510342408.1A
Publication of CN104932945A
Application granted
Publication of CN104932945B

Landscapes

  • Advance Control (AREA)

Abstract

The invention discloses a task-level out-of-order multi-issue scheduler and a scheduling method thereof. The scheduler comprises a reservation station, a select-wakeup unit and a computing resource management unit. The reservation station contains a write address management unit, a storage space and a reservation station status table; the select-wakeup unit contains an age table, a ready query unit and a ready counter; the computing resource management unit contains a computing resource table, an allocation unit and a reclaim unit. The invention can improve the throughput and the resource utilization efficiency of the scheduler, thereby raising the issue efficiency of task instructions and improving system performance.

Description

Task-level out-of-order multi-issue scheduler and scheduling method thereof
Technical field
The present invention relates to a task-level out-of-order multi-issue scheduler and a scheduling method thereof, and belongs to the field of out-of-order multi-issue processors.
Background
As integrated circuit technology advances, the demands placed on processor performance keep rising. Gains in processor performance depend partly on progress in integrated circuit process technology and partly on progress in processor design, where architecture plays a vital role and raising parallelism is the central theme of architectural development. For a long time much effort went into exploiting instruction-level parallelism: superpipelining, superscalar design, out-of-order multi-issue of instructions and very long instruction word (VLIW) techniques have been applied in many processors. With the rise of multi-core processors, multi-core technology has become the main way to raise processor performance, but low utilization of computing resources is the main problem of current multi-core systems. How to schedule the many on-chip computing resources fully, and so improve system performance, is a key direction in multi-core research.
Extending instruction-level out-of-order multi-issue to the task level is an effective way to address these problems, and task-level dynamic scheduling and dynamic resource management are its key difficulties. Moving from the instruction level to the task level means moving from fine-grained to coarse-grained computation, which poses many new challenges for implementing multi-issue; throughput and resource utilization efficiency are the two key criteria for evaluating a scheduler design. Dynamic schedulers are fundamentally applications of the Tomasulo algorithm, which is mature in conventional instruction-level schedulers, but conventional instruction-level scheduler designs cannot meet the requirements of a coarse-grained task-level scheduler. Existing task-level scheduler designs have problems of their own: broadcast-based schedulers scale poorly, and dependence-matrix-based schedulers require dependences to be expressed as "producer-consumer" instructions, so obtaining "task-task" dependences is very inefficient.
Summary of the invention
To overcome the shortcomings of the prior art, the present invention proposes a task-level out-of-order multi-issue scheduler and a scheduling method thereof, which improve the throughput and the resource utilization efficiency of the scheduler and thereby raise the issue efficiency of task instructions and improve system performance.
To achieve the above object, the invention adopts the following technical solution:
The task-level out-of-order multi-issue scheduler of the invention is provided in a processor and schedules M task instructions. The processor comprises an instruction fetch unit, a register status table and a processing element array. The scheduler is characterized in that it comprises a reservation station, a select-wakeup unit and a computing resource management unit. The reservation station contains a write address management unit, a storage space and a reservation station status table; the select-wakeup unit contains an age table, a ready query unit and a ready counter; the computing resource management unit contains a computing resource table, an allocation unit and a reclaim unit.
The storage space holds the M task instructions and can hold at most N task instructions at any one time. Each task instruction occupies L consecutive addresses in the storage space, so the storage space is divided into N segments numbered 0 to N-1. The write address management unit automatically allocates reservation station storage to the N task instructions. The state of the storage space is one of "empty", "full", "non-empty" and "non-full". The reservation station status table stores the status bits of the storage space; each status bit is either "idle" or "occupied".
The ready query unit receives and parses the task instruction sent by the reservation station, obtains the computing resource information and input register information that the task instruction requires, sends them to the computing resource management unit and the register status table respectively, and receives the status information fed back. The computing resource information comprises the computing resource type and the number of computing resources; the input register information comprises the input register numbers and the input register count num. The age table stores the address information of the task instructions in the storage space and presents it to the ready query unit in the order in which the task instructions entered the reservation station. The ready counter counts the ready input registers reported by the register status table.
The computing resource table reports whether the computing resources that a task instruction requires are ready. The allocation unit queries the computing resource table and sends the numbers of the ready computing resources to the select-wakeup unit. The reclaim unit reclaims computing resources that have completed their computing tasks.
The select-wakeup unit determines whether a task instruction is ready from the ready computing resource numbers and from the input register status information fed back by the register status table, and issues ready task instructions to the external processing element array for execution.
The scheduling method based on the task-level out-of-order multi-issue scheduler of the invention is characterized in that it proceeds as follows:
Step 1: define the index of the task instruction sent by the instruction fetch unit to the reservation station as variable p, 0≤p≤M-1; define the write address of the storage space as variable i, 0×L≤i≤N×L-1; define the read address of the storage space as variable j, 0×L≤j≤N×L-1; define the write address of the age table as variable k, 0≤k≤N; define the read address of the age table as variable m, 0≤m≤N-1; define the address of the reservation station status table as variable n, 0≤n≤N-1; define the ready counter as variable cnt. Initialize p=0, i=0, j=0, k=0, m=0, n=0, cnt=0, then execute step 2 and step 5 concurrently.
Step 2: the write address management unit checks whether the storage space is in the "full" state. If it is, the unit waits until the state becomes "non-full" and then resumes querying. Otherwise, the unit immediately queries the status bit at the (n+1)-th address of the reservation station status table. If that status bit is "idle", the unit changes the state of the storage space write address to "locked" and step 3 is executed; otherwise n+1 is assigned to n and step 2 is repeated.
Step 3: the reservation station checks whether the instruction fetch unit has sent the (p+1)-th of the N task instructions. If it has, n×L is assigned to i, meaning that the (p+1)-th task instruction is stored starting at the (n×L+1)-th address of the storage space, and the status bit at the (n+1)-th address of the reservation station status table is set to "occupied". The write address management unit changes the write address state to "unlocked"; at the same time n is stored at the (k+1)-th address of the age table according to the "old-first" principle, k+1 is assigned to k, p+1 is assigned to p, and step 4 is executed. If it has not, step 3 continues, waiting for the instruction fetch unit to send the (p+1)-th task instruction.
Step 4: check whether k=1 holds. If so, change the state of the storage space to "non-empty"; otherwise keep the state "non-empty".
Check whether k=N holds. If so, change the state of the storage space to "full"; otherwise keep the state "non-full". Then return to step 2.
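The write path of steps 2 to 4 can be sketched in software as follows. This is a minimal behavioural model, not the hardware described in the patent: the class and method names are hypothetical, the idle-segment scan always starts from segment 0 (the text does not say how n wraps around), and the "locked"/"unlocked" handshake is omitted because the model is single-threaded.

```python
class ReservationStationModel:
    """Software model of the write path (steps 2 to 4); names are illustrative."""

    def __init__(self, n_segments, words_per_task):
        self.N = n_segments
        self.L = words_per_task
        self.storage = [None] * (n_segments * words_per_task)
        self.occupied = [False] * n_segments   # models the status table bits
        self.age_table = []                    # segment numbers, oldest first

    def state(self):
        """Return the two storage-space states maintained in step 4."""
        k = len(self.age_table)                # k plays the role of the write address
        return ("empty" if k == 0 else "non-empty",
                "full" if k == self.N else "non-full")

    def write(self, task_words):
        if len(task_words) != self.L:
            raise ValueError("a task instruction must have exactly L words")
        if len(self.age_table) == self.N:
            return None                        # "full": the hardware would wait here
        n = self.occupied.index(False)         # step 2: first idle segment
        i = n * self.L                         # step 3: write address i = n * L
        self.storage[i:i + self.L] = task_words
        self.occupied[n] = True
        self.age_table.append(n)               # entry order is recorded, oldest first
        return n


rs = ReservationStationModel(n_segments=4, words_per_task=2)
first = rs.write(["a0", "a1"])
second = rs.write(["b0", "b1"])
```

After the two writes, segments 0 and 1 are occupied and the age-ordered list holds them in arrival order; a write into a full model returns `None` instead of blocking.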
Step 5: check whether the storage space is in the "non-empty" state. If it is, read the (m+1)-th address of the age table according to the "old-first" principle to obtain its output t, and assign t×L to j, so that the f-th task instruction can be read starting from the (t×L+1)-th address of the storage space; parse the f-th task instruction to obtain the computing resource information and input register information it requires, then execute step 6. Otherwise repeat step 5, waiting for the storage space to become "non-empty".
Step 6: the select-wakeup unit queries the register status table with the number of the (cnt+1)-th input register of the f-th task instruction and, after receiving the status information fed back by the register status table, executes step 7.
Step 7: from the status information fed back by the register status table, the select-wakeup unit determines whether the input register is ready. If it is, cnt+1 is assigned to cnt and step 8 is executed; otherwise m+1 is assigned to m, cnt is set to 0, and execution returns to step 5.
Step 8: the select-wakeup unit compares the ready counter cnt with the input register count num of the f-th task instruction. If cnt < num, execution returns to step 6; otherwise step 9 is executed.
Step 9: the select-wakeup unit sends a query command to the computing resource management unit according to the computing resource information required by the f-th task instruction, then executes step 10.
Step 10: the allocation unit receives the query command from the select-wakeup unit and scans the computing resource table to determine whether the computing resource requirements of the f-th task instruction can be met. If they can, computing resources in the "idle" state in the computing resource management unit are allocated to the select-wakeup unit, the numbers of the allocated "idle" computing resources are fed back to the select-wakeup unit as status information, and the state of the allocated computing resources in the computing resource table is changed from "idle" to "occupied". If they cannot, the status information fed back to the select-wakeup unit is that the required computing resources are insufficient. Step 11 is then executed.
Step 11: from the status information fed back by the computing resource management unit, the select-wakeup unit determines whether the required computing resources are ready. If they are, the ready f-th task instruction is issued to the external processing element array for execution, the status bit at the (t+1)-th address of the reservation station status table is set to "idle", and the output t of the (m+1)-th address of the age table is set to "invalid", so that a "bubble" is created at the (t+1)-th address of the storage space and at the (m+1)-th address of the age table; steps 12 and 13 are then executed concurrently. Otherwise m+1 is assigned to m, cnt is set to 0, and execution returns to step 5.
Step 12: assign the (m+2)-th through N-th address entries of the age table, in order, to the (m+1)-th through (N-1)-th address entries, and assign k-1 to k. Check whether k=0 holds; if so, change the state of the storage space to "empty"; otherwise keep the state "non-empty".
Check whether k=N-1 holds. If so, change the state of the storage space to "non-full"; otherwise keep the state "non-full". After cnt is set to 0, return to step 5.
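Steps 5 to 12 amount to walking the age-ordered table from its oldest entry, counting ready input registers, asking for resources, and removing the issued entry. The sketch below models that loop in software under stated assumptions: `select_and_issue`, the dictionary layout of a task and the per-type free counters are all hypothetical, each task needs a single resource type, and the function simply returns `None` when no entry can issue (the text does not say what happens when m reaches the end of the table).

```python
def select_and_issue(age_table, tasks, reg_ready, free_resources):
    """Issue the oldest task whose inputs and resources are ready.

    age_table      -- list of segment numbers, oldest first (mutated on issue)
    tasks          -- {segment: {"inputs": [...], "resource": (type, count)}}
    reg_ready      -- {register: bool}, stands in for the register status table
    free_resources -- {type: idle count}, stands in for the resource table
    Returns the issued segment number, or None if nothing could issue.
    """
    for m, t in enumerate(age_table):          # step 5: read entries oldest first
        task = tasks[t]
        cnt = 0                                # ready counter
        for reg in task["inputs"]:             # steps 6 to 8: one register at a time
            if not reg_ready.get(reg, False):
                break
            cnt += 1
        if cnt < len(task["inputs"]):
            continue                           # step 7: try the next-oldest entry
        kind, need = task["resource"]          # steps 9 and 10: resource query
        if free_resources.get(kind, 0) < need:
            continue                           # step 11: resources insufficient
        free_resources[kind] -= need           # mark the resources as occupied
        del age_table[m]                       # step 12: later entries move up
        return t
    return None


age_table = [2, 0, 1]
tasks = {
    2: {"inputs": ["r1"], "resource": ("fft", 1)},
    0: {"inputs": ["r2"], "resource": ("fft", 1)},
    1: {"inputs": [], "resource": ("mul", 2)},
}
reg_ready = {"r1": False, "r2": True}
free_resources = {"fft": 1, "mul": 1}
issued_first = select_and_issue(age_table, tasks, reg_ready, free_resources)
issued_second = select_and_issue(age_table, tasks, reg_ready, free_resources)
```

In this run the oldest task (segment 2) is skipped because `r1` is not ready, segment 0 issues, and the second call issues nothing because segment 1 needs two `mul` resources and only one is idle.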
Step 13: if the reclaim unit receives task-completion information fed back by the processing element array, it parses that information to obtain the computing resource type and the computing resource numbers of the completed task instruction, and changes the state of the corresponding computing resources in the computing resource table to "idle". Step 13 is executed repeatedly.
The scheduling method of the invention is further characterized in that the computing resource management unit allocates computing resources in the "idle" state to the select-wakeup unit by actively offering resources, as follows:
Step 1: build the computing resource table from a number of FIFO units. Assume there are A types of computing resource and B resources of each type, and divide the B resources of each type into C groups, forming a two-dimensional matrix of A×C FIFO units. Each column of FIFO units corresponds to one base number (the expression listing the base numbers is missing from this text).
Step 2: initialize the two-dimensional matrix by writing the offsets (the expression listing the offsets is missing from this text) into each FIFO unit of the matrix; an offset held in a FIFO represents a computing resource in the "idle" state.
Step 3: each FIFO unit reads out its own offset and waits until the computing resource table can meet the computing resource requirements of the f-th task instruction.
Step 4: when the computing resource table can meet the computing resource requirements of the f-th task instruction, each FIFO unit combines the offset it has read with its own base number to form a computing resource number.
Step 5: from the C groups of computing resource numbers, the allocation unit arbitrarily chooses computing resource numbers that meet the computing resource requirements of the f-th task instruction and supplies them to the select-wakeup unit.
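The FIFO matrix and the base-plus-offset numbering can be sketched as below. Because the base and offset expressions are not reproduced in this text, the sketch assumes that group g of a type has base number g × (B / C) and holds offsets 0 to B/C - 1; that choice, the class name `ResourceTable` and its methods are illustrative assumptions. The sketch also takes groups in ascending order, whereas the text lets the allocation unit choose among groups arbitrarily.

```python
from collections import deque


class ResourceTable:
    """A x C matrix of FIFOs; each FIFO holds offsets of idle resources."""

    def __init__(self, kinds, per_kind, groups):
        if per_kind % groups != 0:
            raise ValueError("B must be divisible by C in this sketch")
        self.group_size = per_kind // groups   # assumed B / C resources per group
        self.fifos = {
            kind: [deque(range(self.group_size)) for _ in range(groups)]
            for kind in kinds
        }

    def allocate(self, kind, count):
        """Return `count` resource numbers of `kind`, or None if too few are idle."""
        groups = self.fifos[kind]
        if sum(len(fifo) for fifo in groups) < count:
            return None                        # "required resources insufficient"
        numbers = []
        for g, fifo in enumerate(groups):
            while fifo and len(numbers) < count:
                # resource number = assumed base (g * group_size) + offset
                numbers.append(g * self.group_size + fifo.popleft())
        return numbers

    def release(self, kind, number):
        """Reclaim a resource: push its offset back into its group's FIFO."""
        g, offset = divmod(number, self.group_size)
        self.fifos[kind][g].append(offset)


table = ResourceTable(kinds=["fft", "mul"], per_kind=8, groups=4)
got = table.allocate("fft", 3)
too_many = table.allocate("fft", 6)
table.release("fft", 1)
again = table.allocate("fft", 1)
```

Storing only the offset in each FIFO is what lets the entries be narrower than a full resource number; the base is implied by the column the FIFO sits in.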
Compared with the prior art, the beneficial effects of the invention are as follows:
1. The invention provides a task-level out-of-order multi-issue scheduler comprising a reservation station, a select-wakeup unit and a computing resource management unit, which extends fine-grained instruction-level out-of-order multi-issue dynamic scheduling to coarse-grained task-level out-of-order multi-issue dynamic scheduling. It uses the storage space of the reservation station more efficiently and raises the utilization of the reservation station, and so efficiently supports dynamic scheduling of task-level instructions, exploits the computing power of the computing resources, extracts task-level parallelism and raises the degree of parallelism of the system. It achieves higher computing resource utilization and better computing performance, and manages computing resources in multi-core and even many-core systems more efficiently while reducing design complexity.
2. The invention provides an efficient reservation station that markedly improves the utilization of the reservation station storage space. The write address management unit has a self-query function: by scanning the reservation station status table it automatically locks an idle storage segment, ready for a new task instruction to be written. This self-query function is efficient and quickly locates idle storage. The reservation station status table plays a key role in representing, allocating and clearing the state of the storage space.
3. The select-wakeup unit of the invention implements the "old-first" issue principle very efficiently and avoids the conventional select-wakeup method. The conventional method wakes up first and selects second: ready instructions are first sent to a pool of ready instructions, and the oldest is then chosen for issue by comparing instruction ages. The invention instead selects first and wakes up second, borrowing the idea of a second-level pointer. Selection follows the "old-first" principle: the address of each instruction in the reservation station is written to the age table in the order in which the task instructions entered the reservation station. Wake-up reads the age table sequentially from the lowest address to the highest, and compressing a reservation station "bubble" takes only one clock cycle. The method is simple to design and has high throughput, saves resources to some extent, and improves select-wakeup efficiency.
4. The computing resource management unit of the invention is organized as a two-dimensional matrix, which makes it convenient to allocate and reclaim several types of computing resource. Storing resources in FIFOs simplifies the design and improves allocation and reclaim efficiency, because no computing resource vector table has to be traversed to find idle computing resources. The allocated computing resource number is formed as a base number plus an offset, which saves some of the FIFO storage needed to hold computing resource numbers. The FIFOs offer the computing resource number offsets actively, rather than having them fetched from the FIFO only when the computing resource management unit needs them, which saves computing resource allocation time.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall structure of the invention;
Fig. 2 is a schematic diagram of the reservation station structure of the invention;
Fig. 3 is a schematic diagram of the select-wakeup unit structure of the invention;
Fig. 4 is a diagram of the mapping between the age table and the storage space of the invention;
Fig. 5 is a schematic diagram of "bubble" compression in the reservation station of the invention;
Fig. 6 is a schematic diagram of the computing resource management unit structure of the invention;
Fig. 7 is a flow chart of the scheduling performed by the scheduler of the invention;
Fig. 8 is a schematic diagram of a prior-art reservation station that does not compress "bubbles".
Detailed description of the embodiments
In this embodiment, as shown in Fig. 1, a task-level out-of-order multi-issue scheduler is provided in a processor and schedules M task instructions. The processor comprises an instruction fetch unit, a register status table and a processing element array. The scheduler comprises a reservation station, a select-wakeup unit and a computing resource management unit. The reservation station contains a write address management unit, a storage space and a reservation station status table; the select-wakeup unit contains an age table, a ready query unit and a ready counter; the computing resource management unit contains a computing resource table, an allocation unit and a reclaim unit.
In general, a processor that supports dynamic scheduling contains a controller and a processing element array. The controller contains a fetch-and-decode unit, a register renaming unit, a scheduler, a commit unit, physical registers, a register status table and so on. The fetch-and-decode unit fetches task instructions from the program area of main memory, decodes them and sends them to the register renaming unit. The register renaming unit renames the registers of each task instruction, eliminating false data dependences while keeping true data dependences, and forwards the instruction to the scheduler. The scheduler sends ready task instructions to the processing element array for execution according to their readiness, which covers both input register readiness and computing resource readiness. To simplify the description, the fetch-and-decode unit and the register renaming unit are here referred to together as the instruction fetch unit. The instruction fetch unit sends task instructions to the scheduler, and the task instructions the scheduler receives have already been through register renaming, so false data dependences have been eliminated. The commit unit and the physical registers are omitted, while the register status table and the processing element array are retained.
A task instruction in the invention is the information needed to schedule one task, mainly: the count of input registers, the input register numbers, the types of computing resource and the number of each type of computing resource. A task itself consists of a series of simple instructions. Function calls and function bodies give a useful analogy: a task instruction corresponds to a function call and a task to a function body. By scheduling the function calls, the scheduler causes the corresponding function bodies to be executed.
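As a rough illustration of the information listed above, a task instruction could be modelled as a small record. The field names below are hypothetical and the encoding into L instruction words is not shown; this is only a sketch of what the scheduler needs to know about a task before issuing it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskInstruction:
    """Scheduling information for one task (the "function call" side)."""
    input_registers: List[int]   # numbers of the input registers
    resources: Dict[str, int]    # resource type -> number of resources required

    @property
    def num(self) -> int:
        """Input register count, compared against the ready counter."""
        return len(self.input_registers)


task_instruction = TaskInstruction(input_registers=[3, 7], resources={"fft": 2})
```

The task body, that is the series of simple instructions, does not appear in the record: the scheduler only ever handles the call-like descriptor.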
The storage space holds the M task instructions and can hold at most N of them at any one time. Each task instruction occupies L consecutive addresses in the storage space, so the storage space is divided into N segments numbered 0 to N-1. If M≤N, the storage space can hold all M task instructions at once, and a task instruction received from the instruction fetch unit can be written into the storage space immediately. Otherwise, once N task instructions are in the storage space, task instructions sent by the instruction fetch unit are no longer written immediately. Only after a task instruction in the storage space has been issued to the processing element array for execution, the resulting "bubble" has been compressed, and the write address management unit has found an idle storage segment and locked the write address, can the task instruction sent by the instruction fetch unit be written into the storage space.
Because a task instruction in the invention carries a lot of information, each task instruction is divided into L consecutive fields that are written into one contiguous address range of the storage space; these L consecutive fields are called task instruction words. As shown in Fig. 2, this example takes L=10, so one task instruction is divided into 10 consecutive task instruction words written into the storage space, and N=32, so the storage space is divided into 32 segments of 10 addresses each. The required depth of the storage space is therefore 320, with addresses 0 to 319.
The write address management unit automatically allocates reservation station storage to the N task instructions. The 32 address segments of the storage space are numbered 0 to 31; the write address management unit obtains the state of each segment by querying the reservation station status table and each time selects one idle segment from segments 0 to 31 to store the task instruction sent by the instruction fetch unit. The state of the storage space is one of "empty", "full", "non-empty" and "non-full". "Empty" means that the storage space holds no task instructions; "full" means that all 32 segments hold task instructions; "non-empty" means that at least one segment holds a task instruction; "non-full" means that at least one segment holds no task instruction.
The reservation station status table stores the status bits of the storage space; each status bit is "idle" or "occupied". As shown in Fig. 2, the reservation station status table stores 32 status bits in total, one for each of the 32 address segments, so each status bit corresponds to one segment of 10 consecutive addresses in the storage space. The reservation station status table is the basis on which the write address management unit allocates storage. The write address management unit traverses the status bits of the reservation station status table; when it first meets a status bit in the "idle" state, and the number of the corresponding address segment is id, the actual write address is 10×id. The write address state is then set to "locked"; this is the locking of the write address. When the reservation station receives a task instruction from the instruction fetch unit, it writes the instruction into the allocated storage and then sets the write address state to "unlocked"; this is the release of the write address.
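The address arithmetic of this embodiment can be checked with a short sketch. The constants L=10 and N=32 come from the text; the function names and the list-of-booleans status table are hypothetical stand-ins for the hardware.

```python
L = 10   # task instruction words per task instruction (this embodiment)
N = 32   # segments in the storage space (this embodiment)


def find_free_segment(status_bits):
    """Scan the status bits and return the first idle segment id, or None."""
    for seg_id, occupied in enumerate(status_bits):
        if not occupied:
            return seg_id
    return None


def segment_base_address(seg_id):
    """Actual write address of a segment: L x id (10 x id in this embodiment)."""
    return seg_id * L


status = [False] * N            # False = "idle", True = "occupied"
status[0] = status[1] = True    # suppose segments 0 and 1 are already in use
seg = find_free_segment(status)
base = segment_base_address(seg)
last_address = segment_base_address(N - 1) + L - 1
```

With segments 0 and 1 occupied, the scan stops at segment 2 and the write starts at address 20; the last word of segment 31 lands at address 319, matching the stated depth of 320.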
The ready query unit receives and parses the task instruction sent by the reservation station, obtains the computing resource information and input register information that the task instruction requires, sends them to the computing resource management unit and the register status table respectively, and receives the status information fed back. The computing resource information comprises the computing resource type and the number of computing resources; the input register information comprises the input register numbers and the input register count num. In this example the ready query logic first checks whether all input registers are ready; if they all are, it then checks whether the computing resources are ready; if any input register is not ready, it moves on to query the next task instruction. The age table stores the address information of the task instructions in the storage space and presents it to the ready query unit in the order in which the task instructions entered the reservation station. The ready counter counts the ready input registers reported by the register status table.
Waking up task instructions by the "old-first" principle means that the scheduler gives issue priority to task instructions that entered the reservation station earlier. A task instruction that entered the reservation station earlier is more likely to have more data dependents: later dependent task instructions use its output registers as their inputs. Applying the "old-first" principle therefore lets dependent task instructions become ready and issue sooner, which improves system performance.
In a conventional instruction-level scheduler, each instruction carries little information, so it is stored directly in the reservation station storage space without being split into several parts, and the storage required is small. Scheduling queries in parallel whether the instructions are ready and then, among the instructions in the "ready" state, selects the one that entered the reservation station first for issue according to the "old-first" principle; that is, "wake up first, then select". The "old-first" principle is usually implemented in one of two ways.
The first method writes age information into the storage space with every instruction. It does not rely on the reservation station address for the select-wakeup order but explicitly compares the ages of the "ready" instructions and issues the one with the smallest age, that is, the one that entered the reservation station first. This method needs many comparators and has a large circuit delay, so it is usually not recommended.
The second method uses a shift register group. When an instruction sent by the instruction fetch unit is received, it is stored in the first idle address of the storage space. After an instruction is issued, a "bubble" appears in the storage space, but the bubble occupies only one address, and compressing it only requires moving everything after the bubble's address forward by one address. In the reservation station storage space, therefore, all instructions are stored in the order in which they entered the reservation station: lower addresses hold earlier instructions and higher addresses hold later ones. This makes the "old-first" principle very easy to implement: the storage space is read in ascending address order for scheduling.
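The shift-register behaviour of the second method can be modelled in a few lines. The function name and the use of `None` for an empty slot are assumptions of this sketch; in hardware all entries shift in the same cycle, whereas the loop here moves them one by one.

```python
def issue_and_compress(slots, index):
    """Issue the instruction at `index` and close the bubble it leaves.

    Every entry after the bubble moves forward by one address, so the list
    stays ordered from oldest (address 0) to youngest.
    """
    issued = slots[index]
    for address in range(index, len(slots) - 1):
        slots[address] = slots[address + 1]
    slots[-1] = None               # the freed slot reappears at the tail
    return issued


slots = ["i0", "i1", "i2", None]   # i0 entered first, i2 last
issued = issue_and_compress(slots, 1)
```

After `i1` issues, `i2` moves up behind `i0` and the free slot is at the end, so the next arrival is still the youngest entry at the highest used address.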
For a task-level scheduler, however, each task instruction contains much more information than a traditional instruction and occupies multiple address spaces in the memory space. The hardware storage required is larger, so hardware designs generally use BlockRAM rather than a shift register group as in the second method. With BlockRAM, once a "bubble" is generated in the memory space it is inconvenient to compress. Suppose the bubble starts at address q. The content at address q+10 must be read out and written to address q, then q is incremented and the operation is repeated until all contents from q+10 onward have been moved to their corresponding positions. Even if the BlockRAM is dual-port, this takes 320-q-10 clock cycles, and such low performance is unacceptable.
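The cycle count quoted above can be checked with a small model. The sketch assumes the example sizes used in this description (32 segments of 10 words, 320 words in total) and assumes that a dual-port memory moves exactly one word per clock cycle; the function name `compact_bubble` is hypothetical.

```python
SEGMENT_WORDS = 10        # words per task instruction in this example
TOTAL_WORDS = 32 * 10     # 32 segments of 10 words each


def compact_bubble(q, total_words=TOTAL_WORDS, segment_words=SEGMENT_WORDS):
    """Close a bubble starting at word address q by copying word by word.

    Returns the number of cycles used (one word per cycle, an assumption)
    and the memory contents after the move.
    """
    memory = list(range(total_words))   # each word initially holds its own address
    cycles = 0
    src, dst = q + segment_words, q
    while src < total_words:
        memory[dst] = memory[src]       # read address q+10, write address q
        src += 1
        dst += 1
        cycles += 1
    return cycles, memory


cycles, memory = compact_bubble(30)
print(cycles)
```

For a bubble at q = 30 the model needs 320-30-10 = 280 cycles, which matches the expression in the text.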
The need to compress "bubbles" is explained here. It follows from implementing the "old-first" principle: if "bubbles" are not compressed, a task instruction entering the reservation station can only be written to the first idle memory space at the end of the reservation station, and a "bubble" in the middle of the memory space cannot be allocated. Only after all task instructions in the memory spaces before the "bubble" have been issued can the memory space at the "bubble" position be allocated. Fig. 8 is a schematic diagram of a reservation station that does not compress "bubbles". In the figure, the memory spaces numbered 1, 2 and 4 are occupied and the memory space numbered 3 is a "bubble". If a task instruction arrives now, it can only be written in turn to the memory spaces numbered 5~31 and cannot be written to the memory space numbered 3. When the memory spaces numbered 5~31 are also full, the memory space numbered 3 still cannot accept a task instruction, and the reservation station refuses any further write. Only after the task instructions in the memory spaces numbered 0~2 have been issued can pending task instructions be written in turn to the memory spaces numbered 0~2, and only the task instructions after those can be written to the memory space numbered 3. The utilization of a reservation station that does not compress "bubbles" is therefore low, and a scheduler is needed that schedules task instructions by the "old-first" principle while also compressing "bubbles" efficiently.
The present invention borrows the concept of a second-level pointer to design an age table which, together with a memory space built from dual-port BlockRAM, yields a scheduler that schedules task instructions by the "old-first" principle while also efficiently compressing "bubbles" in the memory space. When task instructions enter the reservation station, the numbers of the memory spaces allocated by the write address management unit are stored in the age table in turn. This number serves as the first-level pointer and the read address of the age table serves as the second-level pointer. The age table is designed as a shift register group. As shown in Fig. 3, the age table has a head pointer, a tail pointer and an allocation pointer. The head pointer indicates the first address of the age table, i.e. address 0 in the figure. The tail pointer indicates the last address in the age table that holds a valid memory space number, i.e. address 28 in the figure. The allocation pointer indicates the address following the one pointed to by the tail pointer, i.e. address 29 in the figure. The memory space number allocated to a task instruction about to enter the reservation station is stored in the age table address pointed to by the allocation pointer. When the allocation pointer equals the head pointer, the age table holds no valid content and the memory space is in the "empty" state. When the allocation pointer equals 32, every address space in the age table holds valid content and the memory space is in the "full" state.
Fig. 4 shows the mapping between the age table and the memory space. When task instructions are scheduled, the ready query unit reads the age table in order of increasing age table address, starting at the head pointer and ending at the tail pointer. The output of the age table is the memory space number id, from which the address of the corresponding memory space is calculated as 10 × id. The ready query unit uses this address to fetch the corresponding task instruction, parses it and judges whether it is ready. This realizes the "old-first" principle: the task instruction that entered the reservation station earliest is read out first for selection and wakeup, and may therefore be issued to the processing element array first. In Fig. 4, the memory space numbers of the task instructions read out in sequence are 5, 3, 29, ..., 30, 1, which is also the order in which the task instructions entered the reservation station. After a ready task instruction is issued, the content of the address space it occupied in the memory space is no longer valid, and the memory space number stored for it in the age table is invalid as well, so a "bubble" is generated in both the memory space and the age table.
Fig. 5 shows the age table and the memory space before and after "bubble" compression. Before compression, address 2 of the age table and addresses 290~299 of the memory space are both "bubbles". Only the "bubble" in the age table needs to be compressed, using shift-register compression. On the surface, the compressed age table no longer contains a "bubble" while the "bubble" in the memory space still appears to exist. However, task scheduling first reads the age table and then computes the memory space read address from the age table output, so the task instruction at the memory space "bubble" position is never read, which amounts to compressing the memory space "bubble". In this example, a "bubble" in the memory space is also called a "bubble" of the reservation station.
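The two-level indirection described above can be illustrated with a short software model. It is a sketch under assumptions, not the hardware design: the class name `AgeTable` and its methods are hypothetical, a Python list stands in for the shift register group, and only five of the entries of Fig. 4 are reproduced.

```python
L = 10  # words per task instruction, as in the example (address = 10 x id)


class AgeTable:
    """Holds memory space numbers in the order the tasks entered the station."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        self.entries = []                 # index 0 is the head pointer position

    def push(self, segment_id):
        """Store a newly allocated memory space number at the allocation pointer."""
        if len(self.entries) == self.capacity:
            raise RuntimeError("memory space is full")
        self.entries.append(segment_id)

    def read_addresses(self):
        """Start addresses of the stored tasks in old-first order."""
        return [L * segment_id for segment_id in self.entries]

    def issue(self, position):
        """Drop the entry at `position`; later entries shift forward by one.

        The memory space itself is not touched: its stale words are simply
        never addressed again until the segment is reallocated.
        """
        return self.entries.pop(position)


table = AgeTable()
for segment_id in (5, 3, 29, 30, 1):
    table.push(segment_id)
before = table.read_addresses()
issued_segment = table.issue(2)           # the task stored at addresses 290~299
after = table.read_addresses()
print(before, issued_segment, after)
```

Compressing one short table entry takes the place of moving ten words per task in BlockRAM, which is the point of the indirection.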
The computing resource table reports whether the computing resources required by a task instruction are ready. The allocation unit queries the computing resource table and sends the numbers of ready computing resources to the selection wakeup unit. The recovery unit recycles computing resources that have completed their computing tasks. Fig. 6 is a schematic diagram of the structure of the computing resource management unit.
The selection wakeup unit judges whether a task instruction is ready according to the ready computing resource number and the input register status information fed back by the register state table, and issues ready task instructions to the external processing element array for execution.
A scheduling method based on the task-level out-of-order multi-issue scheduler proceeds as follows:
Step 1. Define the number of task instructions sent by the instruction fetch unit to the reservation station as variable p, 0≤p≤M-1; define the write address of the memory space as variable i, 0×L≤i≤N×L-1; define the read address of the memory space as variable j, 0×L≤j≤N×L-1; define the write address of the age table as variable k, 0≤k≤N; define the read address of the age table as variable m, 0≤m≤N-1; define the address of the reservation station state table as variable n, 0≤n≤N-1; define the ready counter as variable cnt. Initialize p=0, i=0, j=0, k=0, m=0, n=0, cnt=0, and execute step 2 and step 5 simultaneously. Addresses in the present invention start from 0, so the (e+1)-th address is the address whose value is e. N=32 in this example. The scheduling process below is shown in Fig. 7.
Step 2. The write address management unit judges whether the memory space is in the "full" state. If it is "full", the write address management unit waits until the memory space is in the "non-full" state and then starts querying again. Otherwise, the write address management unit immediately queries the status bit at the (n+1)-th address of the reservation station state table. If that status bit is "idle", the write address management unit changes the state of the memory space write address to "locked" and step 3 is executed; otherwise, n+1 is assigned to n and step 2 is repeated. In this example the initial values in the reservation station state table are all 0, so an "idle" status bit is represented by 0 and an "occupied" status bit by 1. This step queries for an idle memory space and locks the write address; n is then also the number of the address space in the memory space. As shown in Fig. 2, each status bit in the reservation station state table corresponds to one segment of the memory space address space: the 0th status bit corresponds to addresses 0~9 of the memory space, the 1st status bit corresponds to addresses 10~19, and so on.
Step 3. The reservation station judges whether the instruction fetch unit has sent the (p+1)-th of the N task instructions. If it has, n×L is assigned to i, meaning that the (p+1)-th task instruction is stored starting at the (n×L+1)-th address of the memory space, where n×L is the start address of the (p+1)-th task instruction in the memory space. The task instruction occupies addresses n×L~n×L+L-1 of the memory space, and each address space stores one task instruction word. The status bit at the (n+1)-th address of the reservation station state table is set to "occupied". The write address management unit changes the state of the write address to "unlocked" and, following the "old-first" principle, stores n at the (k+1)-th address of the age table; k+1 is assigned to k and p+1 is assigned to p, and then step 4 is executed. If it has not been sent, step 3 continues to execute, waiting for the (p+1)-th task instruction from the instruction fetch unit. Here k is the allocation pointer of the age table: after a task instruction is written, the allocation pointer automatically increments to point to the next address space of the age table. In this example the age table is implemented in hardware with 32 shift registers of 5 bits each, which store the memory space numbers 0~31.
Step 4. Judge whether k=1 holds. If so, change the state of the memory space to "non-empty"; otherwise, keep the state of the memory space as "non-empty". As shown in Fig. 3, k=1 means that after the task instruction is written the allocation pointer of the age table points to the 2nd address of the age table, so the memory space is in the "non-empty" state. It also means that before the task instruction was written the allocation pointer pointed to the 1st address of the age table, when the memory space was in the "empty" state. The state of the memory space therefore has to be changed from "empty" to "non-empty". Otherwise, the state of the memory space remains "non-empty".
Judge whether k=N holds. If so, change the state of the memory space to "full"; otherwise, keep the state as "non-full". Then return to step 2. Similarly, as shown in Fig. 3, k=N means that after the task instruction is written the allocation pointer points to the address space following the last address space of the age table, so the memory space is in the "full" state. It also means that before the task instruction was written the allocation pointer pointed to the last address space of the age table, when the memory space was in the "non-full" state. The state therefore has to be changed from "non-full" to "full". Otherwise, the state remains "non-full". Note that the states "non-empty" and "non-full", "non-empty" and "full", and "non-full" and "empty" may each overlap.
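The write path of steps 2~4 can be summarized in a short software sketch. It is a simplification under stated assumptions: the class `ReservationStation` and its method names are hypothetical, the search for an idle segment starts from segment 0 on every write rather than continuing from the previous n, and the "locked"/"unlocked" handshake is omitted because the model is sequential.

```python
class ReservationStation:
    def __init__(self, n_segments=32, words_per_task=10):
        self.n_segments = n_segments            # N
        self.words_per_task = words_per_task    # L
        self.status = [0] * n_segments          # 0 = idle, 1 = occupied
        self.memory = [None] * (n_segments * words_per_task)
        self.age_table = []                     # its length plays the role of k

    def is_empty(self):
        return len(self.age_table) == 0         # k = 0

    def is_full(self):
        return len(self.age_table) == self.n_segments   # k = N

    def write(self, words):
        """Store one task instruction; return its segment number or None if full."""
        if len(words) > self.words_per_task:
            raise ValueError("task instruction has too many words")
        if self.is_full():
            return None                         # step 2: wait for "non-full"
        n = self.status.index(0)                # step 2: find an idle segment
        start = n * self.words_per_task         # step 3: i = n x L
        for offset, word in enumerate(words):
            self.memory[start + offset] = word
        self.status[n] = 1                      # step 3: mark "occupied"
        self.age_table.append(n)                # step 3: store n, then k = k + 1
        return n


station = ReservationStation(n_segments=2, words_per_task=3)
first = station.write(["a0", "a1", "a2"])
second = station.write(["b0"])
third = station.write(["c0"])                   # refused: the station is full
print(first, second, third, station.age_table)
```

In this model the "empty" and "full" states are derived from the age table length, which mirrors the k=1 and k=N checks of step 4.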
Step 5. Judge whether the memory space is in the "non-empty" state. If it is, read the (m+1)-th address of the age table according to the "old-first" principle to obtain its output t, and assign t×L to j, so that the f-th task instruction can be read starting from the (t×L+1)-th address of the memory space. Parse the f-th task instruction to obtain its required computing resource information and input register information, and then execute step 6. Otherwise, repeat step 5 and wait for the memory space to become "non-empty". As shown in Fig. 4, the ready query unit first reads age table address 0; the age table outputs memory space number 5, and the start read address of the memory space calculated from t×L is 50. The task instruction words at addresses 50~59 of the memory space can then be read within 10 clock cycles, giving the f-th task instruction of this step. Likewise, the ready query unit next reads age table address 1; the age table outputs memory space number 3, the start read address calculated from t×L is 30, and the task instruction words at addresses 30~39 are read as the (f+1)-th task instruction, and so on.
Note that addresses 0~28 of the age table in Fig. 4 hold memory space numbers, and these numbers were written to the age table in the order in which the task instructions entered the reservation station. The earliest task instruction in the memory space is stored at addresses 50~59, the second at addresses 30~39, the third at addresses 290~299, and so on. Reading the age table in order of increasing age table address therefore reads the task instructions in the order in which they entered the reservation station, which realizes the "old-first" principle.
Step 6. The selection wakeup unit queries the register state table with the (cnt+1)-th input register number of the f-th task instruction and, after receiving the status information fed back by the register state table, executes step 7;
Step 7. The selection wakeup unit judges from the status information fed back by the register state table whether the input register is ready. If it is ready, cnt+1 is assigned to cnt and step 8 is executed; otherwise, m+1 is assigned to m, cnt is set to 0, and execution returns to step 5. In this example, to avoid querying the state of an input register repeatedly and to reduce accesses to the external register state table, the value of cnt is saved as cnt_reg when the selection wakeup unit finds an input register not ready. When the task instruction is queried again, its cnt_reg is assigned to cnt and the query continues from there, so input registers already judged ready are not queried again. This requires 32 additional cnt_reg registers to record the position of the last query for each task instruction. Correspondingly, in this step, after cnt is saved to cnt_reg, cnt is no longer set to 0; instead, execution returns to step 5 and the cnt_reg of the next task instruction is assigned to cnt before the input register state of the next task instruction is queried. To simplify the description, the method of setting cnt to 0 is used here.
Step 8. The selection wakeup unit compares the ready counter cnt with the number of input registers num of the f-th task instruction. If cnt < num, execution returns to step 6; otherwise, step 9 is executed. cnt < num means that the state of some input registers has not yet been queried, so execution returns to step 6 to query the remaining input registers. Otherwise, all input registers have been queried and all are ready, and step 9 is executed to query whether the computing resources are ready.
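The register check of steps 6~8, including the cnt_reg optimization mentioned in step 7, can be sketched as follows. The function `check_inputs`, the callback `register_ready` and the sample register numbers are hypothetical; the callback stands in for a query to the register state table.

```python
def check_inputs(input_regs, register_ready, cnt=0):
    """Query input registers one at a time, starting from position `cnt`.

    Returns (all_ready, cnt). When a register is not ready, the returned cnt
    is the value that would be saved in cnt_reg for the next attempt.
    """
    while cnt < len(input_regs):                  # step 8: cnt < num
        if not register_ready(input_regs[cnt]):   # steps 6 and 7
            return False, cnt
        cnt += 1
    return True, cnt


ready_regs = {3, 7}        # hypothetical contents of the register state table
queries = []               # records every access to the register state table


def register_ready(reg):
    queries.append(reg)
    return reg in ready_regs


task_inputs = [3, 7, 9]
ok, cnt_reg = check_inputs(task_inputs, register_ready)
ready_regs.add(9)          # register 9 becomes ready later
ok_again, cnt = check_inputs(task_inputs, register_ready, cnt=cnt_reg)
print(ok, cnt_reg, ok_again, cnt, queries)
```

Resuming from cnt_reg means the second attempt queries only register 9, whereas restarting from cnt = 0 would query registers 3 and 7 again.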
Step 9. The selection wakeup unit sends a query command to the computing resource management unit according to the required computing resource information of the f-th task instruction, and then executes step 10;
Step 10. The allocation unit receives the query command sent by the selection wakeup unit, scans the computing resource table and judges whether the required computing resource information of the f-th task instruction can be met. If it can, the allocation unit allocates computing resources whose state is "idle" in the computing resource management unit to the selection wakeup unit, feeds back to the selection wakeup unit, as status information, the numbers of the allocated "idle" computing resources, and sets the state of the allocated computing resources to "occupied" in the computing resource table. If it cannot, the status information fed back to the selection wakeup unit is that the required computing resources are insufficient. Step 11 is then executed. Fig. 6 is a schematic diagram of the structure of the computing resource management unit in this example. In the figure, the allocated computing resource type and the corresponding computing resource number are combined by the packetization module into one piece of computing resource allocation information and sent to the selection wakeup unit; this is the status information fed back to the selection wakeup unit when the required computing resource information of the f-th task instruction is met.
Step 11. The selection wakeup unit judges from the status information fed back by the computing resource management unit whether the required computing resources are ready. If they are, both conditions for issuing the task instruction are satisfied: the input registers are ready and the computing resources are ready. The ready f-th task instruction is then issued to the external processing element array for execution, that is, it is sent to the required computing resources in the processing element array allocated by the allocation unit. The status bit at the (t+1)-th address of the reservation station state table is set to "idle" and the output t at the (m+1)-th address of the age table is set to "invalid", so that a "bubble" is generated in the (t+1)-th address segment of the memory space and at the (m+1)-th address of the age table; step 12 and step 13 are then executed simultaneously. Otherwise, m+1 is assigned to m, cnt is set to 0, and execution returns to step 5. In this example, the case in which the required computing resources are not ready is handled as in step 7: after cnt is saved to cnt_reg, cnt is no longer set to 0; instead, execution returns to step 5 and the cnt_reg of the next task instruction is assigned to cnt before the input register state of the next task instruction is queried.
Step 12. The information at the (m+2)-th through N-th addresses of the age table is assigned in turn to the (m+1)-th through (N-1)-th addresses, and k-1 is assigned to k. This step compresses the "bubble" in the age table, which also compresses the "bubble" in the memory space; the compression process is shown in Fig. 5. Judge whether k=0 holds. If so, change the state of the memory space to "empty"; otherwise, keep the "non-empty" state. k=0 means that after the "bubble" is compressed the allocation pointer of the age table points to the first address space of the age table, so the memory space is in the "empty" state. It also means that before compression the allocation pointer pointed to the second address space of the age table, when the memory space was in the "non-empty" state. The state therefore has to be changed from "non-empty" to "empty". Otherwise, the state of the memory space remains "non-empty".
Judge whether k=N-1 holds. If so, change the state of the memory space to "non-full"; otherwise, keep the "non-full" state. After cnt is set to 0, return to step 5. k=N-1 means that after the "bubble" is compressed the allocation pointer points to the last address space of the age table, so the memory space is in the "non-full" state. It also means that before compression the allocation pointer pointed to the address space following the last address space of the age table, when the memory space was in the "full" state. The state therefore has to be changed from "full" to "non-full". Otherwise, the state remains "non-full". In this example, as in step 7, after the "bubble" is compressed cnt is no longer set to 0; instead, execution returns to step 5 and the cnt_reg of the next task instruction is assigned to cnt before the input register state of the next task instruction is queried. Because the task has already been issued, cnt no longer needs to be saved to cnt_reg.
Step 13. If the recovery unit receives completed-task information fed back by the processing element array, it parses that information to obtain the computing resource type and computing resource number of the completed task instruction, changes the state of the corresponding computing resource in the computing resource table to "idle", and repeats step 13. See the allocation unit in the computing resource management unit structure diagram shown in Fig. 6.
In summary, step 1 is the initialization process; steps 2~4 are the process of writing task instructions to the reservation station; steps 5~12 are the selection and wakeup of task instructions and the allocation of computing resources; step 13 is the recycling of computing resources. The scheduling method borrows the concept of a second-level pointer and efficiently realizes the "old-first" principle in task instruction scheduling.
The computing resource management unit allocates computing resources whose state is "idle" to the selection wakeup unit by actively offering resources. Active resource allocation works as follows:
Step 1. The computing resource table is built from FIFO units. Suppose there are A types of computing resource and B resources of each type, and that the B resources of each type are divided into C groups, so that A × C FIFO units form a two-dimensional matrix. Each column of FIFO units corresponds to one base number; the base numbers are 0, B/C, 2×B/C, ..., (C-1)×B/C. Fig. 6 is a schematic diagram of the structure of the computing resource management unit, which comprises the allocation unit, the computing resource table and the recovery unit. The two-dimensional matrix in the middle is the computing resource table. In this example A=4, B=32, C=4: there are 4 types of computing resource, 32 resources of each type, and each type is divided into 4 groups of 8 resources of the same type. The computing resource type code and the group code form a 4 × 4 two-dimensional matrix, with the type code as the row y and the group code as the column x. Each element of the matrix is itself a vector of length 8 holding the offsets of the computing resource numbers in that group. The vector is stored in a FIFO, and each FIFO provides one offset at a time, denoted z. Each offset therefore corresponds to a unique coordinate (y, x, z), where 0≤y≤3, 0≤x≤3, 0≤z≤7.
Step 2. Initialize the two-dimensional matrix by writing the offsets 0, 1, ..., B/C-1 into each FIFO unit of the matrix to represent the computing resources in the "idle" state. In this example the offsets are 0~7, i.e. in the initialization phase 0~7 are written in turn into each FIFO.
Step 3. Each FIFO unit reads out one of its own offsets and waits for the computing resource table to meet the required computing resource information of the f-th task instruction. In this step the FIFO actively provides an offset and waits for the allocation unit to take it; as soon as the allocation unit has taken the offset, the FIFO automatically reads out its next offset and again waits for the allocation unit. Reading a FIFO usually takes 2~3 clock cycles, so to avoid waiting 2~3 clock cycles every time a computing resource is obtained, the FIFOs are grouped and offer offsets unsolicited, which benefits performance. In addition, the storage depth required of the FIFOs is the same with or without grouping, but the required storage width differs. Without grouping, the computing resource number has to be stored; with grouping, only the offset of the computing resource number has to be stored. In this example there are 32 resources of each type, the computing resource numbers are 0~31 and the offsets are 0~7, so the FIFO width is 5 bits without grouping and 3 bits with grouping. Grouping therefore saves a total of 4 × 32 × 2 = 256 bits of FIFO storage.
Step 4. When the computing resource table meets the required computing resource information of the f-th task instruction, each FIFO unit combines the offset it has read out with its own base number to form a computing resource number. As shown in Fig. 6, the allocation unit in this example contains a row decoding module, a column decoding module and a packetization module. The computing resource table sends the coordinate (y, x, z) to the row and column decoding modules of the allocation unit, which calculate the base number as x × 8 with offset z, so the computing resource number is g = x × 8 + z and the computing resource type is y. The computing resource number and type are sent to the packetization module, which yields the coordinate of the required computing resource in the processing element array, so that the corresponding task instruction can be sent to the required computing resource in the processing element array for execution;
Step 5. From the C groups of computing resource numbers, the allocation unit arbitrarily chooses computing resource numbers that meet the required computing resource information of the f-th task instruction and provides them to the selection wakeup unit.
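The active allocation steps above can be modelled in a few lines. The sketch uses the example sizes A=4, B=32, C=4 and makes two assumptions that are not in the description: `collections.deque` stands in for each hardware FIFO, and the "arbitrary" choice among groups is made by taking the first non-empty group. The names `make_resource_table` and `allocate` are hypothetical.

```python
from collections import deque

A, B, C = 4, 32, 4          # types, resources per type, groups per type
GROUP_SIZE = B // C         # 8 offsets per FIFO


def make_resource_table():
    """A x C matrix of FIFOs, each initialized with the offsets 0~7 (all idle)."""
    return [[deque(range(GROUP_SIZE)) for _ in range(C)] for _ in range(A)]


def allocate(table, kind):
    """Return a resource number g = x * 8 + z of type `kind`, or None if none is idle."""
    for x, fifo in enumerate(table[kind]):
        if fifo:                            # this group still offers an offset
            z = fifo.popleft()              # the offset leaves the FIFO: "occupied"
            return x * GROUP_SIZE + z       # base number plus offset
    return None


table = make_resource_table()
numbers = [allocate(table, 2) for _ in range(9)]
print(numbers)
```

An offset that is absent from its FIFO represents an occupied resource, so no separate busy flag is needed in this model.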
As shown in Fig. 6, recycling of computing resources is the inverse of allocation. The recovery unit in this example contains an unpacking module, a row encoding module and a column encoding module. The unpacking module receives and parses the completed-task information to obtain the computing resource type y and the computing resource number g, which are sent to the row and column encoding modules. These compute the row of the resource as y, the column as x = g/8 and the offset as z = g%8, where "/" is integer division and "%" is the remainder operation. The coordinate (y, x, z) is thus obtained and the offset z is stored in the corresponding FIFO, which completes the recycling of the computing resource. This corresponds to the operation in step 13 of changing the computing resource state in the computing resource table to "idle".
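The recycling arithmetic can be checked with a short sketch. It assumes groups of 8 resources as in the example and again uses `collections.deque` in place of the hardware FIFOs; the names `decode` and `recycle` are hypothetical.

```python
from collections import deque

GROUP_SIZE = 8   # resources per group in the example


def decode(g):
    """Split a resource number g into column x = g / 8 and offset z = g % 8."""
    return g // GROUP_SIZE, g % GROUP_SIZE


def recycle(table, y, g):
    """Return resource g of type y to its FIFO and report its coordinate (y, x, z)."""
    x, z = decode(g)
    table[y][x].append(z)        # the offset is back in the FIFO: "idle" again
    return (y, x, z)


# A 4 x 4 table in which every resource is currently occupied (all FIFOs empty).
table = [[deque() for _ in range(4)] for _ in range(4)]
coordinate = recycle(table, 1, 19)
print(coordinate, list(table[1][2]))
```

For g = 19 the column is 19 / 8 = 2 and the offset is 19 % 8 = 3, so recombining x × 8 + z recovers the original resource number.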

Claims (3)

1. A task-level out-of-order multi-issue scheduler, provided in a processor and used to schedule M task instructions, the processor comprising an instruction fetch unit, a register state table and a processing element array; characterized in that the scheduler comprises a reservation station, a selection wakeup unit and a computing resource management unit; the reservation station comprises a write address management unit, a memory space and a reservation station state table; the selection wakeup unit comprises an age table, a ready query unit and a ready counter; the computing resource management unit comprises a computing resource table, an allocation unit and a recovery unit;
The memory space is used to store the M task instructions and holds at most N task instructions at any one time; each task instruction occupies L consecutive address spaces in the memory space, so that the memory space is divided into N segments numbered 0~N-1 in turn; the write address management unit is used to automatically allocate memory space in the reservation station to the N task instructions; the state of the memory space includes "empty", "full", "non-empty" and "non-full"; the reservation station state table is used to store the status bits of the memory space; a status bit is either "idle" or "occupied";
The ready query unit is used to receive and parse a task instruction sent by the reservation station, obtain the computing resource information and input register information required by the task instruction, send them to the computing resource management unit and the register state table respectively, and receive the status information fed back; the computing resource information comprises the computing resource type and the computing resource number; the input register information comprises the input register numbers and the number of input registers num; the age table is used to store the address information of the task instructions in the memory space and to present the task instructions to the ready query unit in the order in which they entered the reservation station; the ready counter is used to count the ready input registers fed back by the register state table;
Whether the computing resource table is used for the computing resource fed back needed for the assignment instructions ready;The allocation unit is used for The computing resource table is inquired about, and ready computing resource number is sent to the selection wakeup unit;The recycling is single Member has completed the computing resource of calculating task for recycling;
The selection wakeup unit judges whether the task instruction is ready according to the ready computing resource number and the status information of the input registers fed back by the register status table, and issues the ready task instruction to the external processing unit array for execution.
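The readiness judgment in claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the names TaskInstruction, segment_range and is_ready are hypothetical, the register status table is modeled as a dictionary, and the "computing resource number" of a task is assumed here to mean a required quantity.

```python
from dataclasses import dataclass, field


@dataclass
class TaskInstruction:
    resource_kind: str                 # computing resource type required (assumed string key)
    resource_count: int                # assumed reading: quantity of resources required
    input_registers: list = field(default_factory=list)  # input register numbers; num = len(...)


def segment_range(n, L):
    """Addresses of segment n when every task occupies L consecutive addresses."""
    return range(n * L, (n + 1) * L)


def is_ready(task, register_ready, free_resources):
    """Ready when all input registers are ready and enough resources are idle."""
    cnt = 0  # plays the role of the ready counter
    for reg in task.input_registers:
        if not register_ready.get(reg, False):
            return False               # one input register is not ready yet
        cnt += 1
    enough = free_resources.get(task.resource_kind, 0) >= task.resource_count
    return cnt == len(task.input_registers) and enough
```

With L = 4, segment 2 covers addresses 8 to 11, and a task becomes ready only when both conditions hold at once.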
2. A dispatching method based on the task-level out-of-order multi-issue scheduler, characterized in that it is carried out as follows:
Step 1: define the number of task instructions sent by the instruction fetch unit to the reservation station as variable p, 0≤p≤M-1; define the write address of the memory space as variable i, 0×L≤i≤N×L-1; define the read address of the memory space as variable j, 0×L≤j≤N×L-1; define the write address of the chronological table as variable k, 0≤k≤N; define the read address of the chronological table as variable m, 0≤m≤N-1; define the address of the reservation station state table as variable n, 0≤n≤N-1; define the ready counter as variable cnt; initialize p=0, i=0, j=0, k=0, m=0, n=0, cnt=0; then perform step 2 and step 5 simultaneously;
Step 2: the write address management unit judges whether the state of the memory space is "full"; if it is "full", the write address management unit waits until the state of the memory space becomes "non-full" before querying again; otherwise, the write address management unit immediately queries the status bit of the (n+1)-th address in the reservation station state table; if that status bit is "idle", the write address management unit changes the state of the write address of the memory space to "locked" and then performs step 3; otherwise, n+1 is assigned to n and step 2 is repeated;
Step 3: the reservation station judges whether the instruction fetch unit has sent the (p+1)-th of the N task instructions; if it has, n×L is assigned to i, meaning that the (p+1)-th task instruction is stored at the (n×L+1)-th address of the memory space, and the status bit of the (n+1)-th address in the reservation station state table is set to "occupied"; the write address management unit changes the state of the write address to "unlocked", while n is stored at the (k+1)-th address of the chronological table according to the "old-first" principle, k+1 is assigned to k, p+1 is assigned to p, and step 4 is performed; if it has not, step 3 continues to be performed, waiting for the instruction fetch unit to send the (p+1)-th task instruction;
Step 4: judge whether k=1 holds; if so, change the state of the memory space to "non-empty"; otherwise, keep the state of the memory space as "non-empty";
Judge whether k=N holds; if so, change the state of the memory space to "full"; otherwise, keep the state of the memory space as "non-full"; then return to step 2;
Step 5: judge whether the state of the memory space is "non-empty"; if so, read the (m+1)-th address of the chronological table according to the "old-first" principle to obtain its output t, and assign t×L to j, so that the f-th task instruction can be read from the (t×L+1)-th address of the memory space; parse the f-th task instruction to obtain the computing resource information and input register information it requires, then perform step 6; otherwise, repeat step 5, waiting for the state of the memory space to become "non-empty";
Step 6: the selection wakeup unit queries the register status table according to the (cnt+1)-th input register number of the f-th task instruction and, after receiving the status information fed back by the register status table, performs step 7;
Step 7: the selection wakeup unit judges, according to the status information fed back by the register status table, whether the input register is ready; if it is ready, cnt+1 is assigned to cnt and step 8 is performed; otherwise, m+1 is assigned to m, cnt is set to 0, and execution returns to step 5;
Step 8: the selection wakeup unit compares the ready counter cnt with the input register count num of the f-th task instruction; if cnt < num, execution returns to step 6; otherwise, step 9 is performed;
Step 9: the selection wakeup unit sends a query command to the computing resource management unit according to the computing resource information required by the f-th task instruction, then performs step 10;
Step 10: the allocation unit receives the query command sent by the selection wakeup unit and scans the computing resource table to judge whether the computing resource information required by the f-th task instruction can be satisfied; if it can, the computing resources whose state is "idle" in the computing resource management unit are allocated to the selection wakeup unit, the numbers of the allocated "idle" computing resources are fed back to the selection wakeup unit as status information, and the state of the allocated computing resources in the computing resource table is set to "occupied"; if it cannot, status information indicating that the required computing resources are insufficient is fed back to the selection wakeup unit; then step 11 is performed;
Step 11: the selection wakeup unit judges, according to the status information fed back by the computing resource management unit, whether the required computing resources are ready; if they are ready, the ready f-th task instruction is issued to the external processing unit array for execution, the status bit of the (t+1)-th address in the reservation station state table is set to "idle", and the output t of the (m+1)-th address of the chronological table is set to "invalid", so that a "bubble" is generated at the (t+1)-th address of the memory space and at the (m+1)-th address of the chronological table; step 12 and step 13 are then performed simultaneously; otherwise, m+1 is assigned to m, cnt is set to 0, and execution returns to step 5;
Step 12: the (m+2)-th through N-th address entries of the chronological table are assigned in sequence to the (m+1)-th through (N-1)-th address entries, and k-1 is assigned to k; judge whether k=0 holds; if so, change the state of the memory space to "empty"; otherwise, keep it "non-empty";
Judge whether k=N-1 holds; if so, change the state of the memory space to "non-full"; otherwise, keep it "non-full"; after cnt is set to 0, return to step 5;
Step 13: if the recovery unit receives calculation-task-completed information fed back by the processing unit array, it parses that information to obtain the computing resource type and computing resource number of the completed task instruction, and changes the state of the corresponding computing resources in the computing resource table to "idle"; step 13 is performed repeatedly.
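The write path (steps 2 to 4) and the issue path (steps 5 to 12) of claim 2 can be sketched in software as follows. This is a simplified sequential model under stated assumptions, not the claimed hardware: the class ReservationStation and its methods are hypothetical names, a task is modeled as a tuple (name, input registers, resource kind, resource count), each call to issue scans from the oldest entry instead of keeping the read address m between calls, and the write-address lock is omitted because nothing runs concurrently here.

```python
class ReservationStation:
    """Illustrative software model only; all names are assumptions."""

    def __init__(self, N):
        self.N = N
        self.slots = [None] * N       # memory space: one segment per task
        self.occupied = [False] * N   # reservation station state table
        self.age = []                 # chronological table, oldest first

    def write(self, task):
        """Steps 2-4: place a task in the first idle segment; False if full."""
        if len(self.age) == self.N:          # memory space is "full"
            return False
        n = self.occupied.index(False)       # scan the state table for "idle"
        self.slots[n] = task
        self.occupied[n] = True
        self.age.append(n)                   # "old-first" ordering
        return True

    def issue(self, ready_regs, free):
        """Steps 5-12: issue the oldest task that is ready, else None."""
        for m, t in enumerate(self.age):
            name, regs, kind, count = self.slots[t]
            cnt = 0                          # ready counter
            for r in regs:                   # steps 6-8: check input registers
                if r not in ready_regs:
                    break
                cnt += 1
            if cnt < len(regs):
                continue                     # not ready: try the next-oldest task
            if free.get(kind, 0) < count:    # steps 9-11: resources insufficient
                continue
            free[kind] -= count              # allocated resources become occupied
            self.occupied[t] = False         # segment returns to "idle"
            self.slots[t] = None
            del self.age[m]                  # step 12: shift entries to close the bubble
            return name
        return None
```

In this model a younger task can be issued ahead of an older one whose input registers are not yet ready, which is the out-of-order behavior the method describes; step 13 corresponds to the caller adding released resources back to free.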
3. The dispatching method according to claim 2, characterized in that the computing resource management unit actively allocates resources, giving computing resources whose state is "idle" to the selection wakeup unit; the active allocation proceeds as follows:
Step 1: the computing resource table is formed from several FIFO units; assume there are A types of computing resource and B resources of each type, and that the B resources of each type are divided into C groups, so that a two-dimensional matrix of A×C FIFO units is formed; each column of FIFO units corresponds to one base number;
Step 2: initialize the two-dimensional matrix by writing the offsets into the respective FIFO units of the two-dimensional matrix, to represent the computing resources in the "idle" state;
Step 3: each FIFO unit reads its own offset and waits for the computing resource table to satisfy the computing resource information required by the f-th task instruction;
Step 4: when the computing resource table satisfies the computing resource information required by the f-th task instruction, each FIFO unit combines the offset it has read with its own base number to form a computing resource number;
Step 5: the allocation unit arbitrarily selects, from the C groups of computing resource numbers, the computing resource numbers that satisfy the computing resource information required by the f-th task instruction and provides them to the selection wakeup unit.
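The FIFO-based computing resource table of claim 3 can be sketched as below. The base and offset values are not reproduced in this text, so the sketch assumes that group c has base c×(B/C) and that each FIFO initially holds the offsets 0 to B/C-1; the class ResourceTable and its methods are hypothetical names. Where the claim lets the allocation unit choose arbitrarily among the C groups, this sketch simply takes the lowest-numbered group first.

```python
from collections import deque


class ResourceTable:
    """Illustrative A x C matrix of FIFOs holding offsets of idle resources."""

    def __init__(self, kinds, B, C):
        assert B % C == 0, "assumes B divides evenly into C groups"
        self.size = B // C            # resources per group (assumed layout)
        self.fifos = {
            k: [deque(range(self.size)) for _ in range(C)] for k in kinds
        }

    def allocate(self, kind, count=1):
        """Return `count` resource numbers of `kind`, or None if too few are idle."""
        groups = self.fifos[kind]
        if sum(len(f) for f in groups) < count:
            return None               # required computing resources insufficient
        numbers = []
        for c, fifo in enumerate(groups):
            while fifo and len(numbers) < count:
                # resource number = base of the group + offset read from the FIFO
                numbers.append(c * self.size + fifo.popleft())
        return numbers

    def release(self, kind, number):
        """Recovery: write the offset back into the FIFO of its group."""
        c, offset = divmod(number, self.size)
        self.fifos[kind][c].append(offset)
```

Because an idle resource is represented by its offset sitting in a FIFO, allocation is a read from the FIFO and recovery is a write back to it, with no search over per-resource status bits.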
CN201510342408.1A 2015-06-18 2015-06-18 A kind of out of order multi-emitting scheduler of task level and its dispatching method Active CN104932945B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510342408.1A CN104932945B (en) 2015-06-18 2015-06-18 A kind of out of order multi-emitting scheduler of task level and its dispatching method

Publications (2)

Publication Number Publication Date
CN104932945A CN104932945A (en) 2015-09-23
CN104932945B true CN104932945B (en) 2018-05-18

Family

ID=54120119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510342408.1A Active CN104932945B (en) 2015-06-18 2015-06-18 A kind of out of order multi-emitting scheduler of task level and its dispatching method

Country Status (1)

Country Link
CN (1) CN104932945B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2543303B (en) * 2015-10-14 2017-12-27 Advanced Risc Mach Ltd Vector data transfer instruction
CN110618857A (en) * 2019-08-14 2019-12-27 中国电力科学研究院有限公司 Multitask measurement and control method and resource allocation method for calibration platform
CN111538534B (en) * 2020-04-07 2023-08-08 江南大学 Multi-instruction out-of-order transmitting method and processor based on instruction wither
CN111552366B (en) * 2020-04-07 2021-10-22 江南大学 Dynamic delay wake-up circuit and out-of-order instruction transmitting architecture
WO2022199043A1 (en) * 2021-03-22 2022-09-29 广东赛昉科技有限公司 Method and system for implementing vsetli instruction in risv_v vector instruction set

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101419561A (en) * 2007-10-26 2009-04-29 中兴通讯股份有限公司 Resource management method and system in isomerization multicore system
CN101710292A (en) * 2009-12-21 2010-05-19 中国人民解放军信息工程大学 Reconfigurable task processing system, scheduler and task scheduling method
CN102081551A (en) * 2011-01-28 2011-06-01 中国人民解放军国防科学技术大学 Micro-architecture sensitive thread scheduling (MSTS) method
CN102708007A (en) * 2012-04-06 2012-10-03 沈阳航空航天大学 Thread performance prediction and control method of chip multi-threading (CMT) computer system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7171668B2 (en) * 2001-12-17 2007-01-30 International Business Machines Corporation Automatic data interpretation and implementation using performance capacity management framework over many servers
US20040064558A1 (en) * 2002-09-26 2004-04-01 Hitachi Ltd. Resource distribution management method over inter-networks
US10191771B2 (en) * 2015-09-18 2019-01-29 Huawei Technologies Co., Ltd. System and method for resource management

Also Published As

Publication number Publication date
CN104932945A (en) 2015-09-23

Similar Documents

Publication Publication Date Title
CN104932945B (en) A kind of out of order multi-emitting scheduler of task level and its dispatching method
CN100538628C (en) Be used for system and method in SIMD structure processing threads group
JP4292198B2 (en) Method for grouping execution threads
CN104765589B (en) Grid parallel computation preprocess method based on MPI
CN102866912A (en) Single-instruction-set heterogeneous multi-core system static task scheduling method
CN103197916A (en) Methods and apparatus for source operand collector caching
CN103226463A (en) Methods and apparatus for scheduling instructions using pre-decode data
CN103279379A (en) Methods and apparatus for scheduling instructions without instruction decode
CN104657114B (en) More dispatching systems of parallelization and the method arbitrated for sequencing queue
CN101717817A (en) Method for accelerating RNA secondary structure prediction based on stochastic context-free grammar
KR20150065865A (en) Select logic using delayed reconstructed program order
CN110231986A (en) Dynamic based on more FPGA reconfigurable multi-task scheduling and laying method
CN110750265B (en) High-level synthesis method and system for graph calculation
WO2018027706A1 (en) Fft processor and algorithm
CN102708009A (en) Method for sharing GPU (graphics processing unit) by multiple tasks based on CUDA (compute unified device architecture)
JP5522283B1 (en) List vector processing apparatus, list vector processing method, program, compiler, and information processing apparatus
Sun et al. Sense: Model-hardware codesign for accelerating sparse CNNs on systolic arrays
Qu et al. A parallel configuration model for reducing the run-time reconfiguration overhead
CN103294449A (en) Pre-scheduled replays of divergent operations
CN103455367B (en) For realizing administrative unit and the method for multi-task scheduling in reconfigurable system
Yin et al. FPGA-based high-performance CNN accelerator architecture with high DSP utilization and efficient scheduling mode
CN104636207B (en) Coordinated dispatching method and system based on GPGPU architectures
WO2013081744A1 (en) Herarchical multi-core processor and method of programming for efficient data processing
CN105844110B (en) A kind of adaptive neighborhood TABU search based on GPU solves Method for HW/SW partitioning
Singh et al. A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant