CN106293736B - Two-stage programmer and calculation method for a coarse-grained multicore computing system - Google Patents


Info

Publication number
CN106293736B
Authority
CN
China
Legal status
Active
Application number
CN201610645202.0A
Other languages
Chinese (zh)
Other versions
CN106293736A (en)
Inventor
宋宇鲲
李浩洋
张多利
杜高明
卫灿
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Application filed by Hefei University of Technology
Priority to CN201610645202.0A
Publication of CN106293736A
Application granted
Publication of CN106293736B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/35 Creation or generation of source code model driven

Abstract

The invention discloses a two-stage programmer for a coarse-grained multicore computing system and its calculation method. It is characterized in that the programmer includes several system registers and general registers and uses a portion of contiguous storage space in the synchronous dynamic random-access memory (SDRAM) of the coarse-grained computing system as physical registers. The task instructions of the coarse-grained computing system are divided into two levels: top-level task instructions and bottom-level function instructions. Within a computing task, the bottom-level function instructions direct the corresponding functional units to perform specific data-processing operations, while the top-level task instructions maintain the data-passing relationships between computing tasks. The invention makes it easy to reuse bottom-level function instructions, reducing the storage space required for task instructions. Top-level task instructions contain all the information needed for task scheduling, which makes dynamic task scheduling easy to implement. The provision of jump instructions makes programming more flexible, further easing the programmer's work.

Description

Two-stage programmer and calculation method for a coarse-grained multicore computing system
Technical field
The present invention relates to the fields of high-density computing and signal processing, and specifically to a two-stage programmer and calculation method for a coarse-grained computing system.
Background art
Because of its low power consumption, strong parallel-processing capability and excellent computing performance, multicore technology has become the mainstream of processor design. However, efficiently mapping algorithms onto a multicore computing system so as to fully exploit its parallelism has become a major bottleneck limiting the system's performance, a problem known as the "programming wall". Whether algorithms can be mapped efficiently and programming difficulty reduced directly determines how much of a multicore computing system's computing capability can actually be used, and this has increasingly become one of the main problems facing current multicore computing systems.
Summary of the invention
To overcome the shortcomings of the prior art, the present invention proposes a two-stage programmer and calculation method for a coarse-grained computing system. The aims are to make programming multicore computing systems more flexible, to reduce the programming difficulty of coarse-grained multicore computing systems, and to enable reuse of task instructions, thereby reducing the storage space occupied by instructions. The introduction of top-level task instructions also lays the foundation for dynamic scheduling of task instructions in multicore computing systems.
The invention adopts the following technical solution to achieve these aims:
The two-stage programmer for a coarse-grained multicore computing system of the invention is characterized in that it is used to write task instructions for a heterogeneous multicore computing system. The task instruction types include operation instructions, general register modification instructions, jump instructions, branch instructions, memory-access instructions and interface instructions. The task instructions operate on several system registers and several general registers in the two-stage programmer, and each register in the two-stage programmer is identified by its register number;
The two-stage programmer divides the storage space of the synchronous dynamic random-access memory (SDRAM) of the coarse-grained multicore computing system into a physical register region, a top-level task instruction region, a bottom-level function instruction region and a data storage region;
The physical register region provides the storage space for several physical registers, and each physical register's number maps it to its storage location in the SDRAM;
The top-level task instruction region stores the top-level task instructions contiguously;
The bottom-level function instruction region contiguously stores the bottom-level function instructions corresponding to the top-level task instructions;
The data storage region serves as the memory of the two-stage programmer and holds data;
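The region layout can be pictured with a short Python sketch. It is an illustration under stated assumptions only: the patent gives no region sizes or ordering, so the byte sizes, the back-to-back ordering starting at address 0, and the names `Region`, `partition_sdram` and `physical_register_address` are all hypothetical. What it does show is the property the text states: a physical register number alone determines a fixed SDRAM location.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    name: str
    base: int  # first byte address in SDRAM
    size: int  # size in bytes

    @property
    def end(self) -> int:
        return self.base + self.size


def partition_sdram(num_regs: int, reg_bytes: int, top_bytes: int,
                    bottom_bytes: int, total_bytes: int) -> List[Region]:
    """Place the four regions back to back (hypothetical ordering)."""
    layout = [("physical_registers", num_regs * reg_bytes),
              ("top_level_tasks", top_bytes),
              ("bottom_level_functions", bottom_bytes)]
    regions: List[Region] = []
    base = 0
    for name, size in layout:
        regions.append(Region(name, base, size))
        base += size
    if base > total_bytes:
        raise ValueError("regions exceed SDRAM capacity")
    # Whatever remains becomes the data storage region.
    regions.append(Region("data", base, total_bytes - base))
    return regions


def physical_register_address(regions: List[Region], reg_no: int,
                              reg_bytes: int) -> int:
    """Map a physical register number to its fixed SDRAM byte address."""
    phys = regions[0]
    addr = phys.base + reg_no * reg_bytes
    if reg_no < 0 or addr + reg_bytes > phys.end:
        raise IndexError(f"physical register {reg_no} does not exist")
    return addr


regions = partition_sdram(num_regs=16, reg_bytes=4096, top_bytes=8192,
                          bottom_bytes=65536, total_bytes=1 << 20)
print(physical_register_address(regions, 3, 4096))  # 12288
```

Because the mapping is a fixed offset calculation, the master controller never needs a lookup table to find a physical register's data; only the binding of system registers to physical registers changes at run time.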
An operation instruction reads data from several physical registers, sends it to the corresponding arithmetic unit for computation, and then stores the results in several other physical registers. An operation instruction consists of an operation top-level task instruction and operation bottom-level function instructions;
The operation top-level task instruction contains the task number, the task type, the task's data entries and data exits, the number of storage channels, the type and number of arithmetic units, the storage address of the corresponding operation bottom-level function instructions, and the length of those instructions. The operation bottom-level function instructions contain the configuration information for the physical registers and the arithmetic units;
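Representing the operation top-level task instruction as a record may help. The following minimal Python sketch uses hypothetical field names, a plain list standing in for SDRAM words, and hypothetical helpers `fetch_bottom_function` and `depends_on`; the patent lists the fields but not their encoding. It shows how the start address and length locate the bottom-level function instructions, and how data entries and exits, given as system register numbers, expose the data-passing relationship that a dynamic scheduler would need.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OperationTopLevelTask:
    task_no: int
    task_type: str
    data_entries: List[int]  # system register numbers the task reads
    data_exits: List[int]    # system register numbers the task writes
    channels: int            # number of storage channels
    unit_type: str           # arithmetic unit type
    unit_count: int          # number of arithmetic units
    bottom_addr: int         # start of the bottom-level function instructions
    bottom_len: int          # their length, in words


def fetch_bottom_function(task: OperationTopLevelTask,
                          sdram: List[int]) -> List[int]:
    """Slice out the configuration words this task sends to its units."""
    return sdram[task.bottom_addr:task.bottom_addr + task.bottom_len]


def depends_on(later: OperationTopLevelTask,
               earlier: OperationTopLevelTask) -> bool:
    """True if `later` reads a system register that `earlier` writes."""
    return bool(set(earlier.data_exits) & set(later.data_entries))


sdram = list(range(64))  # stand-in for SDRAM words
fft = OperationTopLevelTask(0, "op", [1], [2], 1, "fft", 2, 16, 4)
mac = OperationTopLevelTask(1, "op", [2], [3], 1, "mac", 1, 20, 8)
print(fetch_bottom_function(fft, sdram))  # [16, 17, 18, 19]
print(depends_on(mac, fft))               # True
```

`depends_on` checks only read-after-write dependences; with register renaming, that is the only kind a scheduler must honour.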
A general register modification instruction modifies the value of a general register of the master controller in the coarse-grained multicore computing system. It consists only of a register top-level task instruction, which contains the task type and the general register modification information;
A jump instruction moves the program pointer according to the relation between the values of two general registers, or between a general register and a preset value, so that execution continues at a preset position. A jump instruction consists only of a jump top-level task instruction, which contains the task type, the program pointer offset, and the general registers or preset value being compared;
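General register modification instructions and jump instructions together are enough to express a loop. The Python interpreter below is a minimal sketch under assumptions the patent does not fix: a modification either sets or adds to a register, a jump's offset is relative to the jump's own position, an untaken jump falls through to the next instruction, and uninitialized general registers read as 0. `ModifyGR`, `Jump`, `Task` and `run` are hypothetical names.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

COMPARE = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
           "!=": operator.ne, ">": operator.gt, ">=": operator.ge}


@dataclass
class ModifyGR:              # general register modification instruction
    reg: int
    value: int
    add: bool = False        # False: gr = value; True: gr += value


@dataclass
class Jump:                  # jump top-level task instruction
    cmp: str                 # key into COMPARE
    lhs: int                 # general register number
    rhs: int                 # general register number or preset value
    offset: int              # signed PC offset applied when the test holds
    rhs_is_reg: bool = False


@dataclass
class Task:                  # stands in for any other top-level task
    name: str


Instr = Union[ModifyGR, Jump, Task]


def run(program: List[Instr], max_steps: int = 10_000
        ) -> List[Tuple[str, Dict[int, int]]]:
    """Execute sequentially; record each Task with a snapshot of the GRs."""
    gr: Dict[int, int] = {}
    trace: List[Tuple[str, Dict[int, int]]] = []
    pc, steps = 0, 0
    while 0 <= pc < len(program):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit reached")
        ins = program[pc]
        if isinstance(ins, ModifyGR):
            gr[ins.reg] = (gr.get(ins.reg, 0) + ins.value) if ins.add else ins.value
        elif isinstance(ins, Jump):
            rhs = gr.get(ins.rhs, 0) if ins.rhs_is_reg else ins.rhs
            if COMPARE[ins.cmp](gr.get(ins.lhs, 0), rhs):
                pc += ins.offset
                continue
        else:
            trace.append((ins.name, dict(gr)))
        pc += 1
    return trace


loop = [ModifyGR(0, 0),            # r0 = 0
        Task("fft"),               # loop body
        ModifyGR(0, 1, add=True),  # r0 += 1
        Jump("<", 0, 3, -2)]       # if r0 < 3: jump back to the body
print(run(loop))  # [('fft', {0: 0}), ('fft', {0: 1}), ('fft', {0: 2})]
```

The body is stored once and executed three times, which is the reuse the text attributes to these two instruction types.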
A branch instruction is provided for the master controller: it queries the result of an operation instruction and decides whether to move the program pointer. A branch instruction consists only of a branch top-level task instruction, which contains the task type, the program pointer offset, the task number, and a storage area for the program pointer and general register values;
Memory-access instructions comprise read instructions and store instructions, which respectively read data from the data storage region in SDRAM into physical registers and store data from physical registers into the data storage region. A memory-access instruction consists of a memory-access top-level task instruction, made up of the task type, the system register number and the storage address, and memory-access bottom-level function instructions, which contain the configuration information of the physical registers;
Interface instructions comprise output instructions and input instructions, which exchange data between the data storage region in SDRAM and a host computer or other data-processing chips. An interface instruction consists of an interface top-level task instruction, containing the task type, the system register number and the interface type, and interface bottom-level function instructions, containing the configuration information of the physical registers and of the interface unit;
Each type of task instruction performs a different kind of operation, and combinations of these task instructions together realize heterogeneous computing on the coarse-grained multicore computing system.
A kind of the characteristics of calculation method for coarseness multicore computing system of the invention is more towards isomery for writing The assignment instructions of core computing system, the assignment instructions are divided into top-level task instruction and bottom function command;The top-level task Instruction is then for safeguarding the data transitive relation between calculating task;The bottom function command is used in a calculating task Corresponding functional unit is instructed to execute specific data processing operation;The assignment instructions type includes: operational order, general posts Storage modifies instruction, jump instruction, branch instruction, access instruction and interface instruction;The access instruction include read instruction and Store instruction, the interface instruction include output order and input instruction;Synchronizing for the coarseness multicore computing system is dynamic The memory space of state random access memory SDRAM is divided into physical register region, top-level task instruction area, bottom function command Region and data storage areas;The two-stage programmed method is to sequentially include the following steps:
Step 1: algorithm to be mapped is analyzed, it, will be described according to the type and quantity of integrated computation cluster in multicore computing system Algorithm to be mapped splits into several calculating tasks;
Step 2: the granularity of several calculating tasks being matched with coarseness multicore computing system, is appointed if calculated The granularity of business is greater than the capacity of physical register in the coarseness multicore computing system, then will according to the capacity of physical register Calculating task splits into the sub- calculating task that several granularities are less than the physical register capacity;
Step 3: Algorithm mapping being carried out to the sub- calculating task of any one function, is write in corresponding sub- calculating task The configuration information of each functional unit used, and by the configuration information of each functional unit sequence line up after, use Occupy-place coding line is filled the configuration information after sequence, so that the length of configuration information meets the coarseness multicore and calculates The integer multiple of the synchronous DRAM SDRAM burst-length of system, to form a kind of sub- calculating task of function Bottom function command;
Step 4: the top-level task instruction box of the sub- calculating task of any one function is generated using the shell script write Frame, top-level task instruction frame includes: task type, the type and quantity of arithmetic element, assignment instructions Data entries With data export volume, bottom function command length;
Step 5: when the bottom function command and top-level task of the sub- calculating task of all different function have instructed frame all Warp knit writes complete, then enters step 6;Otherwise step 3 is returned to;
Step 6: the bottom function command of the sub- calculating task of all different function being sequentially connected end to end, and in connection The bottom function command template of the interface instruction that the shell script provides and access instruction is added in end, then extracts each not The initial address of the bottom function command of congenerous, and be incorporated into the top-level task instruction frame;
Step 7: by the calculating task of the algorithm to be mapped split in step 1, operational order, jump instruction, branch being referred to The top-level task instruction of order is lined up by the computation sequence of algorithm to be mapped, obtains sub- calculating task sequence;To realize fortune Calculate function and process control;
Step 8: described in being added before needing to use operational order of the data as source data in the data storage area Instruction is read, the store instruction is added after needing to save operational order of the operation result to the data storage area;
Step 9: general register modification instruction being added before needing to recycle the sub- calculating task sequence executed, in sub- calculating Jump instruction is added after task sequence, obtains subtask cyclic sequence, is used for control loop variable and program pointer PC, to realize Circulatory function;
Step 10: output order is added after the subtask cyclic sequence, for exporting operation result;
Step 11: filling system register field in top-level task instruction, obtain calculating task sequence;
Step 12: input instruction being added before the calculating task sequence, for inputting source data, to realize to be mapped The mapping process of algorithm.
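Steps 2, 3 and 6 involve simple arithmetic on sizes and addresses, which the following Python sketch illustrates under stated assumptions. The function names, the zero placeholder word, word-granular addresses, and reading "less than the physical register capacity" in Step 2 as "not exceeding" it are all hypothetical choices; the patent does not give the placeholder encoding or the address unit. Step 4 is abridged to a frame holding only the bottom-level instruction length.

```python
from typing import Dict, List, Tuple

PAD_WORD = 0  # hypothetical placeholder word; the real encoding is not given


def split_task(size: int, reg_capacity: int) -> List[int]:
    """Step 2: split a task into sub-tasks that each fit one physical register."""
    count = (size + reg_capacity - 1) // reg_capacity  # ceiling division
    return [min(reg_capacity, size - i * reg_capacity) for i in range(count)]


def pad_to_burst(config: List[int], burst_len: int) -> List[int]:
    """Step 3: pad configuration words to a multiple of the SDRAM burst length."""
    shortfall = (-len(config)) % burst_len
    return config + [PAD_WORD] * shortfall


def link_bottom_functions(funcs: Dict[str, List[int]], templates: List[int],
                          base: int) -> Tuple[List[int], Dict[str, int]]:
    """Step 6: place bottom-level functions end to end, append the interface and
    memory-access templates, and record each function's start address."""
    image: List[int] = []
    starts: Dict[str, int] = {}
    for name, words in funcs.items():
        starts[name] = base + len(image)
        image.extend(words)
    image.extend(templates)
    return image, starts


funcs = {"fft": pad_to_burst([0x11] * 5, 4), "fir": pad_to_burst([0x22] * 3, 4)}
frames = {name: {"bottom_len": len(w)} for name, w in funcs.items()}  # Step 4 (abridged)
image, starts = link_bottom_functions(funcs, templates=[0x33] * 4, base=0x1000)
for name, frame in frames.items():
    frame["bottom_addr"] = starts[name]  # Step 6: fill in start addresses
print(split_task(10000, 4096))  # [4096, 4096, 1808]
print(starts)                   # {'fft': 4096, 'fir': 4104}
```

Padding each bottom-level function to a whole number of bursts means every start address computed in Step 6 stays burst-aligned, provided the base address is.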
Compared with the prior art, the beneficial effects of the invention are as follows:
1. The two-stage programmer and calculation method divide task instructions into top-level task instructions and bottom-level function instructions and classify them by type. This makes programming multicore computing systems more flexible and less difficult, helps programmers map algorithms onto such systems, greatly improves algorithm-mapping efficiency, and to some extent alleviates the effect of the "programming wall".
2. The programmer contains several system registers and general registers, uses a portion of contiguous storage space in the SDRAM of the coarse-grained computing system as physical registers, and distinguishes all registers by register number. This makes it possible to loop over bottom-level programs: programmers no longer need to copy bottom-level function instructions many times to implement a loop, nor to edit, across large numbers of bottom-level function instructions, the values that change within a loop body. This greatly improves programming efficiency, lowers the probability of program errors, and improves program correctness. In addition, using part of the SDRAM's contiguous storage space as physical registers makes register renaming possible, which eliminates false data dependences while preserving true ones, further improves the system's issue efficiency, and raises the utilization of computing power.
3. Dividing the task instructions of the coarse-grained computing system into two levels, top-level task instructions and bottom-level function instructions, lets programmers reuse bottom-level programs instead of repeatedly writing bottom-level programs with the same function, which improves algorithm-mapping efficiency and shortens programming time. Having several types of top-level task instructions makes out-of-order execution possible on multicore computing systems. Because data must first be written to the data storage region in SDRAM by a store instruction before an interface instruction can output it, the data are put in order; this avoids the out-of-order output that an uncertain execution order could otherwise cause during out-of-order execution, without resorting to instruction blocking, which would reduce the efficiency of the whole system. The memory-access and interface instructions also make branch prediction by the master controller possible. An interface unit's operations are irrevocable, so if a branch were mispredicted its output could not be cancelled and incorrect output would result. A store instruction, however, updates the data storage region in SDRAM only after the task has committed, so the values held there are guaranteed to be correct and executing an interface instruction cannot produce incorrect output. A branch prediction mechanism can therefore be introduced, further improving the computing efficiency of the multicore computing system.
4. When programming with the two-stage programmer, bottom-level function instructions implement the data-processing flow within a single computing task, while top-level task instructions maintain the data-passing relationships between computing tasks. Extracting bottom-level function instructions as reusable data-processing routines for similar functions decouples them from the specific data to be processed, while the top-level program calls them, which makes programs easier to maintain. Because top-level task instructions and bottom-level function instructions are separate, programmers need only ensure the correctness of each level separately when writing programs, and can check each level separately when debugging. This makes errors and exceptions easier to locate, greatly reducing debugging difficulty and greatly improving programming efficiency.
Brief description of the drawings
Fig. 1 is a structural diagram of the coarse-grained multicore computing system targeted by the invention;
Fig. 2 is a schematic diagram of the SDRAM memory allocation and physical register structure of the invention;
Fig. 3 is the top-level task instruction template of the operation instruction of the invention;
Fig. 4 is the top-level task instruction template of the jump instruction of the invention;
Fig. 5 is the top-level task instruction template of the register modification instruction of the invention;
Fig. 6 is the top-level task instruction template of the branch instruction of the invention;
Fig. 7 is the top-level task instruction template of the memory-access instruction of the invention;
Fig. 8 is the top-level task instruction template of the interface instruction of the invention;
Fig. 9 is a schematic diagram of the data organization format of the bottom-level function instructions of the invention;
Fig. 10 is a schematic diagram of the format of the sub-task loop sequence of the invention;
Fig. 11 shows the bottom-level function instructions and interface instruction template generated when writing an example task program with the invention;
Fig. 12 shows the final bottom-level function instructions obtained when writing an example task program with the invention;
Fig. 13 is a schematic diagram of the steps for writing the top-level task instructions when writing an example task program with the invention.
Specific embodiment
In this embodiment, a two-stage programmer for a coarse-grained multicore computing system is used to write task instructions for a heterogeneous multicore computing system. The task instruction types include operation instructions, general register modification instructions, jump instructions, branch instructions, memory-access instructions and interface instructions. The task instructions operate on several system registers and several general registers in the two-stage programmer, and each register in the two-stage programmer is identified by its register number;
As shown in Fig. 1, the coarse-grained heterogeneous multicore computing system comprises a memory MEM, a network-on-chip, a master controller MC and other functional units. The master controller issues task instructions, and under its control the other functional units cooperate to complete the various computing tasks according to the configuration information they receive;
A task instruction is a command, including an action, that is sent to each functional unit in the coarse-grained computing system in order to complete a specific computing task;
A computing task is the set of simple data operations performed to complete a particular algorithm step;
As shown in Fig. 2, the two-stage programmer divides the storage space of the SDRAM of the coarse-grained multicore computing system into a physical register region, a top-level task instruction region, a bottom-level function instruction region and a data storage region;
The physical register region provides the storage space for several physical registers, and each physical register's number maps it to its storage location in the SDRAM;
The top-level task instruction region stores the top-level task instructions contiguously;
The bottom-level function instruction region contiguously stores the bottom-level function instructions corresponding to the top-level task instructions;
The data storage region serves as the memory of the two-stage programmer and holds data;
System registers are the registers programmers use to express data flow. They are not allocated physical storage; instead, when a computing task executes, each system register is mapped to a specific physical register, so its storage space equals the physical register capacity. The mapping between system registers and physical registers is not fixed: the master controller of the computing system selects a free physical register to allocate according to the current state of the physical registers;
The physical registers occupy a fixed, contiguous portion of the SDRAM's storage capacity and hold the data passed between computing tasks in the coarse-grained computing system. The system contains several physical registers, distinguished by physical register number, and each number maps to the physical register's storage location in SDRAM;
When a computing task's data volume exceeds the physical register capacity corresponding to a system register, the task must be divided into two or more task instructions occupying several system registers, so that the data volume of each task instruction does not exceed the physical register capacity; these task instructions then complete the computing task together;
When a system register serves as a data entry of a task instruction, the functional unit reads the task's source data from the physical register currently mapped to that system register. When a system register serves as a data exit, the functional unit stores its result in the physical register currently mapped to that system register, where it waits to be used by subsequent computing tasks;
A general register is a register in the master controller whose value the programmer can manipulate through task instructions. General registers can serve as loop variables and can also be used in task instructions as system register numbers. When a task executes in a loop, the master controller reads the current value of the general register and uses it as the system register number; as the general register's value changes, the system register numbers used as the task instruction's data exit and data entry change with it, which allows the task instructions to be reused in a loop;
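The interplay between general registers, system registers and physical registers described above can be sketched in Python. This is a hypothetical model, not the patent's controller: the class `RegisterMap`, the first-in-first-out choice of free physical register, and the use of a dictionary for general registers are assumptions, since the text says only that the master controller picks a free physical register according to the current register state. The example reuses one task instruction three times, with general register 1 supplying its data-exit system register number, so each pass writes to a different physical register without any bottom-level instruction being copied.

```python
from typing import Dict, List, Tuple


class RegisterMap:
    """Hypothetical master-controller bookkeeping: system registers own no
    storage and are bound to a free physical register when first used."""

    def __init__(self, num_physical: int) -> None:
        self.free: List[int] = list(range(num_physical))
        self.binding: Dict[int, int] = {}  # system reg -> physical reg

    def bind(self, sys_reg: int) -> int:
        """Return the physical register behind sys_reg, allocating if needed."""
        if sys_reg not in self.binding:
            if not self.free:
                raise RuntimeError("no free physical register")
            self.binding[sys_reg] = self.free.pop(0)  # FIFO choice (assumed)
        return self.binding[sys_reg]

    def release(self, sys_reg: int) -> None:
        """Free the physical register once no later task reads sys_reg."""
        self.free.append(self.binding.pop(sys_reg))


rmap = RegisterMap(num_physical=8)
gr = {1: 10}  # general register 1 holds a system register number
exits: List[Tuple[int, int]] = []
for _ in range(3):                  # three passes of the same task instruction
    sys_reg = gr[1]                 # data exit = system register named by GR1
    exits.append((sys_reg, rmap.bind(sys_reg)))
    gr[1] += 1                      # general register modification instruction
print(exits)  # [(10, 0), (11, 1), (12, 2)]
```

Because each pass writes a fresh physical register, a later pass never overwrites data an earlier consumer still needs, which is the false-dependence removal the text credits to register renaming.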
An operation instruction reads data from several physical registers, sends it to the corresponding arithmetic unit for computation, and then stores the results in several other physical registers. An operation instruction consists of an operation top-level task instruction and operation bottom-level function instructions;
As shown in Fig. 3, the operation top-level task instruction contains the task number, the task type, the task's data entries and data exits, the number of storage channels, the type and number of arithmetic units, the storage address of the corresponding operation bottom-level function instructions, and the length of those instructions. The operation bottom-level function instructions contain the configuration information for the physical registers and the arithmetic units;
A general register modification instruction modifies the value of a general register of the master controller in the coarse-grained multicore computing system. As shown in Fig. 4, it consists only of a register top-level task instruction, which contains the task type and the general register modification information;
A jump instruction moves the program pointer according to the relation between the values of two general registers, or between a general register and a preset value, so that execution continues at a preset position. A jump instruction consists only of a jump top-level task instruction; as shown in Fig. 5, this contains the task type, the program pointer offset, and the general registers or preset value being compared;
A branch instruction is provided for the master controller: it queries the result of an operation instruction and decides whether to move the program pointer. A branch instruction consists only of a branch top-level task instruction; as shown in Fig. 6, this contains the task type, the program pointer offset, the task number, and a storage area for the program pointer and general register values;
Memory-access instructions comprise read instructions and store instructions, which respectively read data from the data storage region in SDRAM into physical registers and store data from physical registers into the data storage region. A memory-access instruction consists of a memory-access top-level task instruction and memory-access bottom-level function instructions. As shown in Fig. 7, the memory-access top-level task instruction is made up of the task type, the system register number and the storage address; the memory-access bottom-level function instructions contain the configuration information of the physical registers;
Interface instructions comprise output instructions and input instructions, which exchange data between the data storage region in SDRAM and a host computer or other data-processing chips. An interface instruction consists of an interface top-level task instruction and interface bottom-level function instructions. As shown in Fig. 8, the interface top-level task instruction contains the task type, the system register number and the interface type; the bottom-level function instructions contain the configuration information of the physical registers and of the interface unit;
Each type of task instruction performs a different kind of operation, and combinations of these task instructions together realize heterogeneous computing on the coarse-grained multicore computing system.
In this example implementation, a kind of calculation method for coarseness multicore computing system, is for writing towards isomery The assignment instructions of multicore computing system, assignment instructions are divided into top-level task instruction and bottom function command;Top-level task instructs then For safeguarding the data transitive relation between calculating task;Bottom function command is used to instruct in a calculating task corresponding Functional unit executes specific data processing operation;Assignment instructions type include: operational order, general register modification instruction, Jump instruction, branch instruction, access instruction and interface instruction;Access instruction includes to read instruction and store instruction, interface instruction Include output order and input instruction;The storage of the synchronous DRAM SDRAM of coarseness multicore computing system is empty Between be divided into physical register region, top-level task instruction area, bottom function command region and data storage areas;Calculating side Method is to sequentially include the following steps:
Step 1: algorithm to be mapped is analyzed, it, will be wait reflect according to the type and quantity of integrated computation cluster in multicore computing system It penetrates algorithm and splits into several calculating tasks;
Step 2: the granularity of several calculating tasks is matched with coarseness multicore computing system, if calculating task Granularity is greater than the capacity of physical register in coarseness multicore computing system, then according to the capacity of physical register by calculating task Split into the sub- calculating task that several granularities are less than physical register capacity;
Step 3: Algorithm mapping being carried out to the sub- calculating task of any one function, writes in corresponding sub- calculating task and uses Each functional unit configuration information, and by the configuration information of each functional unit sequence line up after, use occupy-place Coding line is filled the configuration information after sequence, so that the length of configuration information meets the same of coarseness multicore computing system The integer multiple for walking dynamic RAM SDRAM burst-length, to form a kind of bottom function of the sub- calculating task of function It can instruct, form structure as shown in Figure 9;
Step 4: the top-level task instruction box of the sub- calculating task of any one function is generated using the shell script write Frame, top-level task instruction frame includes: task type, the type and quantity of arithmetic element, the Data entries sum number of assignment instructions According to export volume, bottom function command length;
Step 5: If the bottom-level function instructions and top-level task instruction frames of the sub-computing tasks of all the different functions have been written, go to Step 6; otherwise, return to Step 3;
Step 6: Concatenate the bottom-level function instructions of the sub-computing tasks of all the different functions end to end, and append to the end the bottom-level function instruction templates for the interface instructions and memory-access instructions provided by the script. Then extract the start address of each function's bottom-level function instruction and write it into the corresponding top-level task instruction frame;
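Step 6 behaves like a simple linker pass. The Python sketch below assumes word addressing, a hypothetical base address of 0x1000 for the bottom-level function instruction region, and top-level frames represented as plain dicts; `link_bottom_instructions` is an illustrative name.

```python
def link_bottom_instructions(blocks, base_addr=0):
    """Concatenate bottom-level function instructions end to end and return the
    linked image plus the start address of each block (word addresses)."""
    image, starts = [], []
    for words in blocks:
        starts.append(base_addr + len(image))  # start address of this block
        image.extend(words)
    return image, starts


mac16, add = [1] * 8, [2] * 4            # per-function bottom-level instructions
io_tpl, mem_tpl = [3] * 4, [4] * 4       # interface / memory-access templates, appended last
image, starts = link_bottom_instructions([mac16, add, io_tpl, mem_tpl], base_addr=0x1000)

frames = [{"task": "MAC16"}, {"task": "ADD"}]
for frame, addr in zip(frames, starts):  # write start addresses into the top-level frames
    frame["bottom_addr"] = addr

print(len(image), [hex(a) for a in starts])  # 20 ['0x1000', '0x1008', '0x100c', '0x1010']
```

Because every block was already padded to whole bursts, each start address computed this way is also burst-aligned relative to the region base.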
Step 7: Following the computing tasks into which the algorithm was split in Step 1, arrange the top-level task instructions of the operation, jump and branch instructions in the computation order of the algorithm to obtain a sub-computing task sequence, thereby implementing the computation functions and process control;
Step 8: Insert a read instruction before each operation instruction that uses data in the data storage region as source data, and insert a store instruction after each operation instruction whose result must be saved to the data storage region. If the algorithm contains data-reduction operations, such as accumulation or multiply-accumulation, the operation granularity of the system can be maintained by embedding a pair of store and read instructions. This keeps the data granularity of the computing tasks suited to the coarse-grained multicore computing system and makes better use of its computing capability;
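The insertion rule of Step 8 can be sketched as a pass over a list of operations. In this hypothetical Python representation, each operation lists the registers it loads from and stores to the data storage region; register names and addresses are made up. The STORE/LOAD pair on `r2` after the multiply-accumulate is meant to play the role of the embedded store/read pair described for reductions.

```python
def insert_memory_access(ops):
    """Wrap each operation with LOAD/STORE instructions.

    Each op is a dict: 'type', 'regs' (exits first, then entries, as in RCU(T, A, B, ...)),
    optional 'load_from' {register: address} and 'store_to' {register: address}.
    """
    seq = []
    for op in ops:
        for reg, addr in op.get("load_from", {}).items():
            seq.append(("LOAD", reg, addr))    # read source data before the operation
        seq.append(("RCU", op["type"], *op["regs"]))
        for reg, addr in op.get("store_to", {}).items():
            seq.append(("STORE", reg, addr))   # save the result after the operation
    return seq


ops = [
    {"type": "MAC16", "regs": ["r2", "r0", "r1"],
     "load_from": {"r0": 0, "r1": 64}, "store_to": {"r2": 200}},
    {"type": "ADD", "regs": ["r4", "r2", "r3"],
     "load_from": {"r2": 200, "r3": 128}, "store_to": {"r4": 132}},
]
for ins in insert_memory_access(ops):
    print(ins)
```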
Step 9: Insert a general-register modification instruction before each sub-computing task sequence that is to be executed in a loop, and a jump instruction after it, to obtain a subtask loop sequence. These instructions control the loop variable and the program pointer (PC) and thereby implement the loop function, as shown in Figure 10;
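A minimal Python sketch of the loop construction in Step 9, with a small interpreter for the control flow only. Several details are assumptions rather than the document's definitions: the counter register name `g0`, a four-field JUMP form that names its counter explicitly, and a counter that advances each time JUMP executes. The jump target is stored as a relative program-pointer offset, matching the offset field the claims list for the jump instruction.

```python
def wrap_loop(body, iterations, counter="g0"):
    """GREG before the body initialises the loop counter; JUMP after it carries a
    relative program-pointer offset back to the first body instruction."""
    return ([("GREG", {counter: 0})] + list(body)
            + [("JUMP", counter, iterations, -len(body))])


def run_control_flow(program):
    """Execute GREG/JUMP and record every other instruction that is issued."""
    gregs, pc, issued = {}, 0, []
    while pc < len(program):
        ins = program[pc]
        if ins[0] == "GREG":
            gregs.update(ins[1])
            pc += 1
        elif ins[0] == "JUMP":
            _, counter, limit, offset = ins
            gregs[counter] += 1  # assumption: the counter advances at each JUMP
            pc = pc + offset if gregs[counter] < limit else pc + 1
        else:
            issued.append(ins)
            pc += 1
    return issued


loop = wrap_loop([("LOAD", "r0", 0), ("RCU", "MAC16", "r2", "r0", "r1")], iterations=3)
print(len(run_control_flow(loop)))  # 6
```

With a two-instruction body and three iterations, six instructions are issued before the program pointer falls through past the JUMP.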
Step 10: Insert an output instruction after the subtask loop sequence to output the computation results;
Step 11: Fill in the system register fields of the top-level task instructions to obtain the computing task sequence;
Step 12: Insert an input instruction before the computing task sequence to input the source data, completing the mapping of the algorithm.
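Steps 10 to 12 can be sketched together in Python. This sketch assumes a hypothetical convention in which operands written as `"$name"` are symbolic system registers, numbered in order of first use when the fields are filled in Step 11; the actual register-allocation rule is not given in the document.

```python
def finalize(loop_seq, in_args, out_args):
    """Steps 10-12: append OUT, resolve symbolic system-register names to numbers,
    then prepend IN. A '$name' operand marks a system register (hypothetical convention)."""
    seq = list(loop_seq) + [("OUT", *out_args)]       # Step 10: output the results
    reg_ids = {}                                       # Step 11: name -> register number

    def resolve(operand):
        if isinstance(operand, str) and operand.startswith("$"):
            return reg_ids.setdefault(operand, len(reg_ids))  # numbered in order of first use
        return operand

    seq = [(ins[0], *map(resolve, ins[1:])) for ins in seq]
    return [("IN", *in_args)] + seq                    # Step 12: input the source data


body = [("LOAD", "$a", 0), ("LOAD", "$b", 64),
        ("RCU", "MAC16", "$c", "$a", "$b"), ("STORE", "$c", 128)]
for ins in finalize(body, in_args=(0, 132), out_args=(128, 4)):
    print(ins)
```

If loop jumps use relative program-pointer offsets, prepending the IN instruction in Step 12 leaves them valid without any address patching.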
The programming process is sketched below using the following simple computing task as an example:
Computing task: array A (data volume 32M), array B (32M) and array C (2M) are arranged in that order to form one data block of 66M. Arrays A and B are multiplied and accumulated, in order, in groups of 16 elements, giving 2M intermediate results; these are then added in order to array C, giving 2M results, which are finally output.
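The algorithm model of the computing task is $D_i = \sum_{k=0}^{15} A_{16i+k} B_{16i+k} + C_i$ for $i = 0, 1, \ldots, 2\mathrm{M}-1$, where the 2M values $D_i$ are the output results.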
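A plain Python reference model of this computation is useful for checking a mapped program against expected results. The sketch uses a scaled-down block with the same 16:16:1 proportions as 32M : 32M : 2M; the data values are made up for illustration.

```python
def reference_model(a, b, c, group=16):
    """D[i] = sum over k < group of A[group*i + k] * B[group*i + k], plus C[i]."""
    if not (len(a) == len(b) == group * len(c)):
        raise ValueError("expected len(A) == len(B) == group * len(C)")
    return [
        sum(x * y for x, y in zip(a[i * group:(i + 1) * group],
                                  b[i * group:(i + 1) * group])) + c[i]
        for i in range(len(c))
    ]


# Scaled-down data block laid out as A, B, C in order (64 + 64 + 4 items).
block = list(range(64)) + [1] * 64 + [100, 200, 300, 400]
A, B, C = block[:64], block[64:128], block[128:]
print(reference_model(A, B, C))  # [220, 576, 932, 1288]
```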
When writing the task instructions, the bottom-level function instructions are written first. This example contains two computation stages, a 16-point multiply-accumulate and an addition, both of which can be performed by the reconfigurable arithmetic unit (RCU) in a functional node of the target system. Their configuration information is generated separately, and the process template of the interface instructions is obtained, as shown in Figure 11;
The configuration information obtained is concatenated in order to form the bottom-level function instruction, whose structure is shown in Figure 12;
As shown in Figure 13, the top-level task instructions are written step by step; the sub-figures correspond to Steps 7 through 12 described above. In the figure, RCU(T, A, B, ...) denotes an operation executed on the class of arithmetic unit called RCU, where T is the operation type and A, B, ... are the data exits and data entries; LOAD(A, B) reads the data starting at address B of the memory region into system register A; STORE(A, B) stores the data in system register A into the memory region starting at address B; GREG(A=x, B=y, ...) sets general register A to x, general register B to y, and so on; JUMP(A, B) jumps the program pointer to B while the loop counter has not reached the value A; IN(A, B) inputs B data items into the memory region starting at address A; OUT(A, B) outputs B data items from the memory region starting at address A.

Claims (2)

1. A two-stage programming device for a coarse-grained multicore computing system, characterized in that it is used to write task instructions for a heterogeneous multicore computing system; the task instruction types include operation instructions, general-register modification instructions, jump instructions, branch instructions, memory-access instructions and interface instructions; the task instructions operate on several system registers and several general registers within the two-stage programming device; and the registers in the two-stage programming device are distinguished by register number;
The two-stage programming device divides the storage space of the synchronous dynamic random-access memory (SDRAM) of the coarse-grained multicore computing system into a physical register region, a top-level task instruction region, a bottom-level function instruction region and a data storage region;
The physical register region serves as the storage space of several physical registers, and each physical register can be mapped to its storage location in the SDRAM by its physical-register number;
The top-level task instruction region stores the top-level task instructions contiguously;
The bottom-level function instruction region stores, contiguously, the bottom-level function instructions corresponding to the top-level task instructions;
The data storage region serves as the memory of the two-stage programming device and is used to store data;
The operation instruction reads data from several physical registers, sends it to the corresponding arithmetic unit for computation, and stores the result in several other physical registers; the operation instruction comprises an operation top-level task instruction and an operation bottom-level function instruction;
The operation top-level task instruction contains the task number, the task type, the data entries and data exits of the task, the number of storage channels, the type and number of arithmetic units, and the storage address and length of the corresponding operation bottom-level function instruction; the operation bottom-level function instruction contains the configuration information of the physical registers and of the arithmetic units;
The general-register modification instruction modifies the general-register values of the master controller in the coarse-grained multicore computing system; it consists only of a register top-level task instruction, which contains the task type and the general-register modification information;
The jump instruction performs a jump on the program pointer according to the magnitude relationship between two general-register values, or between a general register and a preset value, so that the program pointer jumps to a preset location and execution continues there; the jump instruction consists only of a jump top-level task instruction, which contains the task type, the program-pointer offset, and the general registers or preset value taking part in the comparison;
The branch instruction is provided to the master controller for querying the result of an operation instruction and deciding whether to perform a jump on the program pointer; the branch instruction consists only of a branch top-level task instruction, which contains the task type, the program-pointer offset, the task number, the program pointer and the general-register value storage area;
The memory-access instruction comprises a read instruction and a store instruction, used respectively to read data from the data storage region of the SDRAM into physical registers and to store data from physical registers into the data storage region; the memory-access instruction comprises a memory-access top-level task instruction and a memory-access bottom-level function instruction; the memory-access top-level task instruction consists of the task type, the system register number and the storage address; the memory-access bottom-level function instruction contains the configuration information of the physical registers;
The interface instruction comprises an output instruction and an input instruction, used to exchange data in the data storage region of the SDRAM with a host computer or other data-processing chips; the interface instruction comprises an interface top-level task instruction and an interface bottom-level function instruction; the interface top-level task instruction contains the task type, the system register number and the interface type; the interface bottom-level function instruction contains the configuration information of the physical registers and of the interface unit;
Different types of task instructions each complete different operation tasks, and the combination of all types of task instructions jointly realizes a coarse-grained multicore computing system oriented to heterogeneous computing.
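The field lists in claim 1 can be summarized as a small Python table with a constructor that checks each top-level task instruction carries exactly its listed fields. Field names are paraphrases chosen for illustration and `make_top_level` is hypothetical; the table also records which instruction types have a bottom-level function instruction and what it configures.

```python
# Top-level fields per task-instruction type, paraphrasing claim 1 (names are illustrative).
TOP_LEVEL_FIELDS = {
    "operation": {"task_number", "task_type", "data_entries", "data_exits", "storage_channels",
                  "unit_type_and_count", "bottom_addr", "bottom_len"},
    "greg_modify": {"task_type", "greg_modifications"},
    "jump": {"task_type", "pc_offset", "compare_operands"},
    "branch": {"task_type", "pc_offset", "task_number", "pc", "greg_value_area"},
    "memory_access": {"task_type", "system_reg_number", "storage_address"},
    "interface": {"task_type", "system_reg_number", "interface_type"},
}

# Types that also have a bottom-level function instruction, and what it configures.
BOTTOM_LEVEL_CONFIG = {
    "operation": ("physical_registers", "arithmetic_units"),
    "memory_access": ("physical_registers",),
    "interface": ("physical_registers", "interface_unit"),
}


def make_top_level(kind, **fields):
    """Build a top-level task instruction, checking that exactly the listed fields are given."""
    expected = TOP_LEVEL_FIELDS[kind]
    missing, extra = expected - set(fields), set(fields) - expected
    if missing or extra:
        raise ValueError(f"{kind}: missing {sorted(missing)}, unexpected {sorted(extra)}")
    return {"kind": kind, "has_bottom_level": kind in BOTTOM_LEVEL_CONFIG, **fields}


jump = make_top_level("jump", task_type="JUMP", pc_offset=-2, compare_operands=("g0", 3))
print(jump["has_bottom_level"])  # False
```

The split mirrors the claim: control-flow instructions (register modification, jump, branch) live entirely at the top level, while instructions that touch data paths also configure hardware through a bottom-level part.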
2. A computing method for a coarse-grained multicore computing system, characterized in that it is used to write task instructions for a heterogeneous multicore computing system; the task instructions are divided into top-level task instructions and bottom-level function instructions; the top-level task instructions maintain the data-transfer relationships between computing tasks; the bottom-level function instructions direct the corresponding functional units within a computing task to perform specific data-processing operations; the task instruction types include operation instructions, general-register modification instructions, jump instructions, branch instructions, memory-access instructions and interface instructions; the memory-access instructions comprise read instructions and store instructions, and the interface instructions comprise output instructions and input instructions; the storage space of the synchronous dynamic random-access memory (SDRAM) of the coarse-grained multicore computing system is divided into a physical register region, a top-level task instruction region, a bottom-level function instruction region and a data storage region; and the computing method comprises the following steps, performed in order:
Step 1: Analyze the algorithm to be mapped and, according to the types and number of computing clusters integrated in the multicore computing system, split the algorithm to be mapped into several computing tasks;
Step 2: Match the granularity of the computing tasks to the coarse-grained multicore computing system; if the granularity of a computing task exceeds the capacity of a physical register in the coarse-grained multicore computing system, split that computing task, according to the physical-register capacity, into several sub-computing tasks whose granularity is smaller than the physical-register capacity;
Step 3: Perform algorithm mapping for the sub-computing task of each function: write the configuration information of every functional unit used in that sub-computing task, queue the configuration information of the functional units in order, and pad the ordered configuration information with placeholder words so that its length is an integer multiple of the burst length of the SDRAM of the coarse-grained multicore computing system, thereby forming the bottom-level function instruction for that function's sub-computing task;
Step 4: Use the pre-written script to generate the top-level task instruction frame for the sub-computing task of each function, the top-level task instruction frame containing the task type, the type and number of arithmetic units, the numbers of data entries and data exits of the task instruction, and the length of the bottom-level function instruction;
Step 5: If the bottom-level function instructions and top-level task instruction frames of the sub-computing tasks of all the different functions have been written, go to Step 6; otherwise, return to Step 3;
Step 6: Concatenate the bottom-level function instructions of the sub-computing tasks of all the different functions end to end, append to the end the bottom-level function instruction templates for the interface instructions and memory-access instructions provided by the script, then extract the start address of each function's bottom-level function instruction and write it into the top-level task instruction frame;
Step 7: Following the computing tasks into which the algorithm to be mapped was split in Step 1, arrange the top-level task instructions of the operation, jump and branch instructions in the computation order of the algorithm to obtain a sub-computing task sequence, thereby implementing the computation functions and process control;
Step 8: Insert the read instruction before each operation instruction that uses data in the data storage region as source data, and insert the store instruction after each operation instruction whose result must be saved to the data storage region;
Step 9: Insert a general-register modification instruction before each sub-computing task sequence that is to be executed in a loop, and a jump instruction after it, to obtain a subtask loop sequence that controls the loop variable and the program pointer (PC) and thereby implements the loop function;
Step 10: Insert an output instruction after the subtask loop sequence to output the computation results;
Step 11: Fill in the system register fields of the top-level task instructions to obtain the computing task sequence;
Step 12: Insert an input instruction before the computing task sequence to input the source data, completing the mapping of the algorithm.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610645202.0A CN106293736B (en) 2016-08-08 2016-08-08 Two-stage programmer and its calculation method for coarseness multicore computing system

Publications (2)

Publication Number Publication Date
CN106293736A CN106293736A (en) 2017-01-04
CN106293736B true CN106293736B (en) 2019-05-31

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant