CN106293736B - Two-level programmer for coarse-grained multicore computing systems and its calculation method - Google Patents
- Publication number: CN106293736B (application number CN201610645202.0A)
- Authority: CN (China)
- Legal status: Active (an assumed status, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/35—Creation or generation of source code model driven
Abstract
The invention discloses a two-level programmer for coarse-grained multicore computing systems and its calculation method. The programmer contains several system registers and general registers, and uses a contiguous portion of the storage space of the coarse-grained computing system's synchronous DRAM (SDRAM) as physical registers. The task instructions of the coarse-grained computing system are divided into two levels: top-level task instructions and bottom-level function instructions. A bottom-level function instruction directs the corresponding functional unit to perform a specific data processing operation within one computing task, while top-level task instructions maintain the data-transfer relationships between computing tasks. The invention makes it easy to reuse bottom-level function instructions, reducing the storage space needed for task instructions. Because a top-level task instruction contains all the information needed for task scheduling, dynamic task scheduling is straightforward to implement. Jump instructions give programming greater flexibility, further easing the developer's work.
Description
Technical field
The present invention relates to the fields of high-density computing and signal processing, and specifically to a two-level programmer for coarse-grained computing systems and its calculation method.
Background art
Thanks to its low power consumption, strong parallel processing capability and excellent computing performance, multicore technology has become the mainstream of processor design. However, mapping algorithms efficiently onto a multicore computing system so as to fully exploit its parallelism has become a major bottleneck limiting the performance of such systems, known as the "programming wall". Whether algorithms can be mapped efficiently and the programming difficulty reduced directly determines whether the computing capability of a multicore system can be realized, and this has increasingly become one of the main problems facing multicore computing systems today.
Summary of the invention
To overcome the shortcomings of the prior art, the present invention proposes a two-level programmer for coarse-grained computing systems and its calculation method. They give the programming of multicore computing systems greater flexibility, reduce the programming difficulty of coarse-grained multicore computing systems, and enable reuse of task instructions, which reduces the storage space occupied by instructions. The top-level task instructions also provide a basis for dynamic scheduling of task instructions in multicore computing systems.
The technical solution adopted by the invention to achieve this purpose is as follows:
The two-level programmer for coarse-grained multicore computing systems of the invention is characterized in that it is used to write task instructions for heterogeneous multicore computing systems. The task instruction types are: operation instructions, general register modification instructions, jump instructions, branch instructions, memory access instructions and interface instructions. Task instructions operate on several system registers and several general registers in the two-level programmer; all registers in the two-level programmer are distinguished by register number.
The two-level programmer divides the storage space of the synchronous DRAM (SDRAM) of the coarse-grained multicore computing system into a physical register region, a top-level task instruction region, a bottom-level function instruction region and a data storage region.
The physical register region serves as the storage space of the physical registers, and each physical register number maps to that register's storage location in SDRAM.
The top-level task instruction region stores the top-level task instructions contiguously.
The bottom-level function instruction region stores, contiguously, the bottom-level function instructions corresponding to the top-level task instructions.
The data storage region serves as the memory of the two-level programmer and holds data.
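The partition above can be illustrated with a small address calculator. This is a minimal sketch, not the patent's encoding: the region order (physical registers first, then the two instruction regions, then data), byte addressing, and every size (`phys_reg_size`, `num_phys_regs`, `top_level_size`, `bottom_level_size`) are assumptions chosen for the example. The text only states that the four regions exist and that a physical register number maps to a location in SDRAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SdramLayout:
    # All fields are hypothetical parameters; the patent gives no sizes or order.
    phys_reg_base: int       # start address of the physical register region
    phys_reg_size: int       # capacity of one physical register, in bytes
    num_phys_regs: int       # number of physical registers
    top_level_size: int      # size of the top-level task instruction region
    bottom_level_size: int   # size of the bottom-level function instruction region

    @property
    def top_level_base(self) -> int:
        return self.phys_reg_base + self.phys_reg_size * self.num_phys_regs

    @property
    def bottom_level_base(self) -> int:
        return self.top_level_base + self.top_level_size

    @property
    def data_base(self) -> int:
        # The data storage region is assumed to follow the instruction regions.
        return self.bottom_level_base + self.bottom_level_size

    def phys_reg_addr(self, number: int) -> int:
        """Map a physical register number to its start address in SDRAM."""
        if not 0 <= number < self.num_phys_regs:
            raise ValueError(f"no physical register {number}")
        return self.phys_reg_base + number * self.phys_reg_size


layout = SdramLayout(phys_reg_base=0, phys_reg_size=4 * 2**20, num_phys_regs=16,
                     top_level_size=2**20, bottom_level_size=4 * 2**20)
print(hex(layout.phys_reg_addr(3)))  # 0xc00000
print(hex(layout.data_base))         # 0x4500000
```

Because the physical registers are equal-sized slots in one contiguous region, the number-to-address mapping reduces to a base plus an offset, which is what lets register numbers stand in for SDRAM addresses.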
An operation instruction reads data from several physical registers, sends it to the corresponding arithmetic units for computation, and stores the results in several other physical registers. An operation instruction consists of an operation top-level task instruction and an operation bottom-level function instruction.
The operation top-level task instruction contains the task number, the task type, the task's data inputs and data outputs, the number of memory channels, the type and number of arithmetic units, and the storage address and length of the corresponding operation bottom-level function instruction. The operation bottom-level function instruction contains the configuration information of the physical registers and arithmetic units.
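As a sketch, the fields listed for the operation top-level task instruction can be held in a simple record. The field names, Python types and word-addressed range below are hypothetical; the patent's actual field layout (its Fig. 3 template) is not reproduced in this text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OperationTopLevelInstr:
    # Hypothetical field names mirroring the list in the text.
    task_number: int
    task_type: str            # e.g. "OP"; the real encoding is not given
    data_inputs: List[int]    # system register numbers read as source data
    data_outputs: List[int]   # system register numbers that receive results
    memory_channels: int      # number of memory channels
    unit_type: str            # arithmetic unit type
    unit_count: int           # number of arithmetic units
    bottom_addr: int          # start address of the bottom-level function instruction
    bottom_len: int           # length of the bottom-level function instruction (words)

    def bottom_range(self) -> range:
        """Word addresses a controller would fetch for the bottom-level part."""
        return range(self.bottom_addr, self.bottom_addr + self.bottom_len)


instr = OperationTopLevelInstr(task_number=0, task_type="OP", data_inputs=[1, 2],
                               data_outputs=[3], memory_channels=2, unit_type="MAC",
                               unit_count=4, bottom_addr=0x100, bottom_len=16)
print(instr.bottom_range())  # range(256, 272)
```

Keeping only the address and length of the bottom-level part in the top-level record is what allows several top-level instructions to point at, and so reuse, the same bottom-level function instruction.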
A general register modification instruction modifies the values of the general registers of the master controller in the coarse-grained multicore computing system. It consists only of a register top-level task instruction, which contains the task type and the general register modification information.
A jump instruction moves the program pointer according to the relative magnitude of two general register values, or of a general register value and a preset value, so that execution continues from a preset position. It consists only of a jump top-level task instruction, which contains the task type, the program pointer offset, and the general registers or preset value being compared.
A branch instruction lets the master controller query the result of an operation instruction and decide whether to make the program pointer jump. It consists only of a branch top-level task instruction, which contains the task type, the program pointer offset, the task number, and a storage area for the program pointer and the general register values.
Memory access instructions comprise read instructions and store instructions, which respectively read data from the data storage region of SDRAM into a physical register and store data from a physical register into the data storage region. A memory access instruction consists of a memory access top-level task instruction, made up of the task type, the system register number and the storage address, and a memory access bottom-level function instruction, which contains the configuration information of the physical register.
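The data movement performed by read and store instructions can be sketched over a byte array standing in for SDRAM. Everything concrete here is hypothetical: the tiny register size, the SDRAM size, and the `sys_to_phys` dictionary standing in for the master controller's current system-to-physical register mapping (described in the embodiment below). A read copies one register's worth of data from the data storage region into the physical register that the named system register currently maps to; a store copies it back.

```python
# Hypothetical parameters for illustration only.
PHYS_REG_BASE = 0
PHYS_REG_SIZE = 64           # bytes per physical register (tiny, for the demo)
sdram = bytearray(4096)      # stand-in for the SDRAM storage space
sys_to_phys = {5: 2}         # current system-register -> physical-register mapping


def phys_addr(sys_reg: int) -> int:
    """SDRAM address of the physical register that `sys_reg` currently maps to."""
    return PHYS_REG_BASE + sys_to_phys[sys_reg] * PHYS_REG_SIZE


def read_instr(sys_reg: int, data_addr: int) -> None:
    """Read: data storage region -> physical register."""
    dst = phys_addr(sys_reg)
    sdram[dst:dst + PHYS_REG_SIZE] = sdram[data_addr:data_addr + PHYS_REG_SIZE]


def store_instr(sys_reg: int, data_addr: int) -> None:
    """Store: physical register -> data storage region."""
    src = phys_addr(sys_reg)
    sdram[data_addr:data_addr + PHYS_REG_SIZE] = sdram[src:src + PHYS_REG_SIZE]


sdram[2048:2112] = bytes(range(64))   # source data in the data storage region
read_instr(5, 2048)                   # lands in physical register 2 (address 128)
store_instr(5, 3000)                  # copied back to another data address
```

The top-level part of each instruction supplies exactly the two operands used here (system register number and storage address); the configuration details the bottom-level part would carry are omitted.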
Interface instructions comprise output instructions and input instructions, which exchange the data in the data storage region of SDRAM with a host computer or other data processing chips. An interface instruction consists of an interface top-level task instruction, containing the task type, the system register number and the interface type, and an interface bottom-level function instruction, containing the configuration information of the physical register and of the interface unit.
Different types of task instructions carry out different operations, and combining them jointly realizes heterogeneous computing on the coarse-grained multicore computing system.
A kind of the characteristics of calculation method for coarseness multicore computing system of the invention is more towards isomery for writing
The assignment instructions of core computing system, the assignment instructions are divided into top-level task instruction and bottom function command;The top-level task
Instruction is then for safeguarding the data transitive relation between calculating task;The bottom function command is used in a calculating task
Corresponding functional unit is instructed to execute specific data processing operation;The assignment instructions type includes: operational order, general posts
Storage modifies instruction, jump instruction, branch instruction, access instruction and interface instruction;The access instruction include read instruction and
Store instruction, the interface instruction include output order and input instruction;Synchronizing for the coarseness multicore computing system is dynamic
The memory space of state random access memory SDRAM is divided into physical register region, top-level task instruction area, bottom function command
Region and data storage areas;The two-stage programmed method is to sequentially include the following steps:
Step 1: Analyze the algorithm to be mapped and, according to the types and numbers of computing clusters integrated in the multicore computing system, split it into several computing tasks;
Step 2: Match the granularity of the computing tasks to the coarse-grained multicore computing system. If the granularity of a computing task exceeds the capacity of a physical register in the system, split the task, according to that capacity, into several sub-tasks whose granularity is less than the physical register capacity;
Step 3: Map the sub-task of each function: write the configuration information of each functional unit the sub-task uses, arrange it in order, and pad it with placeholder words so that its length is an integer multiple of the SDRAM burst length of the coarse-grained multicore computing system. The result is the bottom-level function instruction of that sub-task;
Step 4: Use the prepared script to generate the top-level task instruction frame of each sub-task. The frame contains the task type, the type and number of arithmetic units, the number of data inputs and data outputs of the task instruction, and the length of the bottom-level function instruction;
Step 5: Once the bottom-level function instructions and top-level task instruction frames of the sub-tasks of all different functions have been written, go to Step 6; otherwise return to Step 3;
Step 6: Concatenate the bottom-level function instructions of the sub-tasks of all different functions end to end, and append the bottom-level function instruction templates for interface and memory access instructions provided by the script. Then extract the start address of each distinct bottom-level function instruction and fill it into the corresponding top-level task instruction frame;
Step 7: Arrange the top-level task instructions of the operation, jump and branch instructions for the computing tasks obtained in Step 1 in the computation order of the algorithm to be mapped. The resulting sub-task sequence implements the computation and flow control;
Step 8: Insert a read instruction before each operation instruction that uses data in the data storage region as source data, and a store instruction after each operation instruction whose results must be saved to the data storage region;
Step 9: Insert a general register modification instruction before each sub-task sequence that must execute in a loop, and a jump instruction after it, yielding a subtask loop sequence. These two instructions control the loop variable and the program counter PC, implementing the loop;
Step 10: Append an output instruction after the subtask loop sequence to output the results;
Step 11: Fill in the system register fields of the top-level task instructions to obtain the computing task sequence;
Step 12: Insert an input instruction before the computing task sequence to input the source data, which completes the mapping of the algorithm.
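Steps 2, 3 and 6 above are mechanical enough to sketch directly. The functions below are hypothetical helpers, not the patent's script: data volumes and addresses are plain integers in assumed units (words), the placeholder word is assumed to be `0`, and a sub-task exactly equal to the register capacity is treated as fitting.

```python
from typing import Dict, List, Tuple

PLACEHOLDER = 0  # hypothetical encoding of a placeholder (padding) word


def split_task(data_volume: int, reg_capacity: int) -> List[int]:
    """Step 2: split a computing task into sub-tasks that each fit one physical register."""
    if reg_capacity <= 0:
        raise ValueError("register capacity must be positive")
    full, rest = divmod(data_volume, reg_capacity)
    return [reg_capacity] * full + ([rest] if rest else [])


def build_bottom_level(unit_configs: List[List[int]], burst_len: int) -> List[int]:
    """Step 3: concatenate each functional unit's configuration words in order and pad
    with placeholder words up to an integer multiple of the SDRAM burst length."""
    words = [w for cfg in unit_configs for w in cfg]
    return words + [PLACEHOLDER] * ((-len(words)) % burst_len)


def link_bottom_level(base: int, blocks: Dict[str, List[int]]) -> Tuple[List[int], Dict[str, int]]:
    """Step 6: place bottom-level function instructions end to end from `base` and
    return the combined image plus each block's start address for the top-level frames."""
    image: List[int] = []
    starts: Dict[str, int] = {}
    for name, words in blocks.items():
        starts[name] = base + len(image)
        image.extend(words)
    return image, starts


print(split_task(10, 4))                          # [4, 4, 2]
mac = build_bottom_level([[1, 2, 3], [4, 5]], 8)  # 5 words padded to 8
image, starts = link_bottom_level(0x1000, {"mac": mac, "add": build_bottom_level([[6]], 8)})
print(starts)                                     # {'mac': 4096, 'add': 4104}
```

Padding every bottom-level function instruction to whole bursts means each block also starts on a burst boundary once they are laid end to end, so the start addresses extracted in Step 6 stay burst-aligned.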
Compared with the prior art, the beneficial effects of the invention are:
1. By dividing task instructions into top-level task instructions and bottom-level function instructions, and by classifying them into types, the two-level programmer and its calculation method make programming multicore computing systems more flexible and less difficult, and make it easier for developers to map algorithms onto such systems. This greatly improves algorithm mapping efficiency and alleviates the "programming wall" to some extent.
2. The programmer contains several system registers and general registers, uses a contiguous portion of the SDRAM of the coarse-grained computing system as physical registers, and distinguishes all registers by register number. This makes it possible to reuse bottom-level programs in loops: the developer no longer copies bottom-level function instructions repeatedly to implement a loop, nor edits, across a mass of bottom-level function instructions, the values that change in the loop body. Programming efficiency rises substantially, the chance of program errors falls, and programs are written more accurately. In addition, using a contiguous portion of SDRAM as physical registers makes register renaming possible, which eliminates false data dependencies while preserving true ones, further improving the issue efficiency of the system and the utilization of its computing power.
3. Dividing the task instructions of the coarse-grained computing system into top-level task instructions and bottom-level function instructions lets the developer reuse bottom-level programs instead of repeatedly writing bottom-level programs with the same function, which improves mapping efficiency and shortens programming time. The variety of top-level task instruction types makes out-of-order execution possible on the multicore computing system. When an interface instruction is encountered, its data must first have been written by a store instruction into the data storage region of SDRAM, which puts the data back in order. This avoids the out-of-order output that the uncertain execution order of out-of-order execution would otherwise cause, without lowering the efficiency of the whole system by blocking on instructions. The memory access and interface instructions also make branch prediction possible in the master controller. Operations of the interface unit among the functional units are irrevocable: after a branch misprediction, output already sent by the interface unit cannot be cancelled, which would lead to erroneous output. A store instruction, however, updates the data storage region in SDRAM only after its task has committed, so the values in the data storage region are guaranteed correct and executing an interface instruction cannot produce wrong output. A branch prediction mechanism can therefore be introduced, further improving the computational efficiency of the multicore computing system.
4. When programming with the two-level programmer, bottom-level function instructions implement the data processing within a single computing task, while top-level task instructions maintain the data-transfer relationships between computing tasks. Extracting bottom-level function instructions as the data processing methods for similar functions decouples them from the specific data to be processed, and treating top-level programs as calls to bottom-level function instructions makes programs easier to maintain. Because top-level task instructions and bottom-level function instructions are separate, the developer need only ensure the correctness of each separately when writing a program, and can check each separately when debugging. This helps locate errors and exceptions, greatly reduces debugging difficulty, and greatly improves programming efficiency.
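The commit rule in benefit 3 (stores reach the data storage region only after their task commits, so interface output never sees speculative data) can be modelled in a few lines. This is a hypothetical sketch, not the master controller's design: it assumes task numbers increase in program order, keeps the data storage region in a dictionary, and treats a misprediction as discarding the pending stores of every task from the first wrongly issued one onward.

```python
from typing import Dict, List, Tuple


class SpeculativeStores:
    """Hypothetical model: stores update the data storage region only on commit."""

    def __init__(self) -> None:
        self.data_region: Dict[int, int] = {}            # committed data only
        self.pending: List[Tuple[int, int, int]] = []    # (task_number, addr, value)

    def store(self, task: int, addr: int, value: int) -> None:
        """A store issued by a possibly speculative task is only buffered."""
        self.pending.append((task, addr, value))

    def commit(self, task: int) -> None:
        """When a task commits, its buffered stores become visible."""
        for t, addr, value in self.pending:
            if t == task:
                self.data_region[addr] = value
        self.pending = [p for p in self.pending if p[0] != task]

    def squash(self, first_wrong_task: int) -> None:
        """Branch misprediction: drop stores from tasks on the wrong path."""
        self.pending = [p for p in self.pending if p[0] < first_wrong_task]

    def output(self, addr: int) -> int:
        """An interface output instruction reads committed data only."""
        return self.data_region[addr]


s = SpeculativeStores()
s.store(1, 0, 10)   # task 1, on the correct path
s.store(2, 0, 99)   # task 2, issued after a branch that turns out mispredicted
s.squash(2)
s.commit(1)
print(s.output(0))  # 10
```

Because the irrevocable step (interface output) only ever reads committed values, undoing a misprediction reduces to discarding buffered stores, which is why the text can allow branch prediction at all.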
Description of the drawings
Fig. 1 is a structural diagram of the coarse-grained multicore computing system targeted by the invention;
Fig. 2 is a schematic of the SDRAM memory allocation and physical register structure of the invention;
Fig. 3 is the top-level task instruction template of the operation instruction;
Fig. 4 is the top-level task instruction template of the jump instruction;
Fig. 5 is the top-level task instruction template of the register modification instruction;
Fig. 6 is the top-level task instruction template of the branch instruction;
Fig. 7 is the top-level task instruction template of the memory access instruction;
Fig. 8 is the top-level task instruction template of the interface instruction;
Fig. 9 is a schematic of the data organization format of the bottom-level function instruction;
Fig. 10 is a schematic of the format of the subtask loop sequence;
Fig. 11 shows the bottom-level function instructions and interface instruction templates generated when writing the example task program with the invention;
Fig. 12 shows the final bottom-level function instructions obtained when writing the example task program with the invention;
Fig. 13 shows the steps for writing the top-level task instructions when writing the example task program with the invention.
Specific embodiment
In this embodiment, the two-level programmer for coarse-grained multicore computing systems is used to write task instructions for heterogeneous multicore computing systems. The task instruction types are: operation instructions, general register modification instructions, jump instructions, branch instructions, memory access instructions and interface instructions. Task instructions operate on several system registers and several general registers in the two-level programmer, and all registers are distinguished by register number;
The structure of the coarse-grained heterogeneous multicore computing system is shown in Fig. 1. It comprises memory MEM, a network-on-chip, a master controller MC and other functional units. The master controller issues task instructions, and under its control the other functional units cooperate to complete the various computing tasks according to the configuration information they receive;
A task instruction is an order, containing the actions to be performed, sent to a functional unit of the coarse-grained computing system to complete a specific computing task;
A computing task is the set of simple data operations performed to complete a specific algorithm step;
As shown in Fig. 2, the two-level programmer divides the storage space of the SDRAM of the coarse-grained multicore computing system into a physical register region, a top-level task instruction region, a bottom-level function instruction region and a data storage region;
The physical register region serves as the storage space of the physical registers, and each physical register number maps to that register's storage location in SDRAM;
The top-level task instruction region stores the top-level task instructions contiguously;
The bottom-level function instruction region stores, contiguously, the bottom-level function instructions corresponding to the top-level task instructions;
The data storage region serves as the memory of the two-level programmer and holds data;
A system register is a register the developer uses to express data flow. It is not allocated physical storage; instead, when a computing task executes, the system register is mapped to a specific physical register, so its capacity equals that of a physical register. The mapping between system registers and physical registers is not fixed: the master controller of the computing system selects free physical registers to allocate according to the current state of the physical registers;
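The dynamic mapping described above resembles register renaming, and a toy allocator makes the idea concrete. This is a hypothetical sketch of the master controller's bookkeeping, not its actual policy: free registers are handed out first-in first-out, a data output always receives a fresh physical register, and release is left to the caller (in a real system it would happen once no pending task still reads the old value).

```python
from typing import Dict, List


class RegisterMap:
    """Hypothetical system-register -> physical-register mapping with renaming."""

    def __init__(self, num_phys: int) -> None:
        self.free: List[int] = list(range(num_phys))  # free physical register numbers
        self.mapping: Dict[int, int] = {}             # current mapping of each system register

    def bind_output(self, sys_reg: int) -> int:
        """Give a data output a fresh free physical register; the old one keeps its value."""
        if not self.free:
            raise RuntimeError("no free physical register; the task must wait")
        phys = self.free.pop(0)
        self.mapping[sys_reg] = phys
        return phys

    def lookup_input(self, sys_reg: int) -> int:
        """A data input reads the physical register the system register currently maps to."""
        return self.mapping[sys_reg]

    def release(self, phys: int) -> None:
        """Return a physical register to the free list once nobody needs its value."""
        self.free.append(phys)


m = RegisterMap(4)
first = m.bind_output(7)    # physical register 0
second = m.bind_output(7)   # rewriting system register 7 gets physical register 1
print(m.lookup_input(7))    # 1
```

Writing the same system register twice lands in two different physical registers, so a later writer never has to wait for an earlier reader; this is the false-dependency elimination credited to renaming in the benefits section.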
The physical registers occupy a fixed, contiguous block of SDRAM and hold the data passed between computing tasks in the coarse-grained computing system. The system contains several physical registers, distinguished by physical register number, and each number maps to that register's storage location in SDRAM;
When the data volume of a computing task exceeds the capacity of the physical register corresponding to a system register, the task must be split into two or more task instructions occupying multiple system registers, so that the data volume of each task instruction does not exceed that capacity; these task instructions then complete the computing task together;
When a system register serves as a data input of a task instruction, the functional unit reads the task's source data from the physical register the system register currently maps to. When a system register serves as a data output, the functional unit stores the results in that physical register, where they wait to be used by subsequent computing tasks;
A general register is a register in the master controller whose value the developer can manipulate through task instructions. A general register can serve as a loop variable, or as a system register number within a task instruction. When a loop executes, the master controller reads the current value of the general register as the system register number; as the value changes, the system registers used as data outputs and data inputs of the task instruction change with it, which allows task instructions to be reused in a loop;
An operation instruction reads data from several physical registers, sends it to the corresponding arithmetic units for computation, and stores the results in several other physical registers. It consists of an operation top-level task instruction and an operation bottom-level function instruction;
As shown in Fig. 3, the operation top-level task instruction contains the task number, the task type, the task's data inputs and data outputs, the number of memory channels, the type and number of arithmetic units, and the storage address and length of the corresponding operation bottom-level function instruction. The operation bottom-level function instruction contains the configuration information of the physical registers and arithmetic units;
A general register modification instruction modifies the values of the general registers of the master controller in the coarse-grained multicore computing system. As shown in Fig. 4, it consists only of a register top-level task instruction, which contains the task type and the general register modification information;
A jump instruction moves the program pointer according to the relative magnitude of two general register values, or of a general register value and a preset value, so that execution continues from a preset position. It consists only of a jump top-level task instruction; as shown in Fig. 5, this contains the task type, the program pointer offset, and the general registers or preset value being compared;
A branch instruction lets the master controller query the result of an operation instruction and decide whether to make the program pointer jump. It consists only of a branch top-level task instruction; as shown in Fig. 6, this contains the task type, the program pointer offset, the task number, and a storage area for the program pointer and the general register values;
Memory access instructions comprise read instructions and store instructions, which respectively read data from the data storage region of SDRAM into a physical register and store data from a physical register into the data storage region. A memory access instruction consists of a memory access top-level task instruction and a memory access bottom-level function instruction. As shown in Fig. 7, the top-level part is made up of the task type, the system register number and the storage address; the bottom-level part contains the configuration information of the physical register;
Interface instructions comprise output instructions and input instructions, which exchange the data in the data storage region of SDRAM with a host computer or other data processing chips. An interface instruction consists of an interface top-level task instruction and an interface bottom-level function instruction. As shown in Fig. 8, the top-level part contains the task type, the system register number and the interface type; the bottom-level part contains the configuration information of the physical register and of the interface unit;
Different types of task instructions carry out different operations, and combining them jointly realizes heterogeneous computing on the coarse-grained multicore computing system.
In this embodiment, the calculation method for coarse-grained multicore computing systems writes task instructions for heterogeneous multicore computing systems. Task instructions are divided into top-level task instructions, which maintain the data-transfer relationships between computing tasks, and bottom-level function instructions, which direct the corresponding functional units to perform specific data processing operations within a computing task. The task instruction types are: operation instructions, general register modification instructions, jump instructions, branch instructions, memory access instructions and interface instructions; memory access instructions comprise read and store instructions, and interface instructions comprise output and input instructions. The storage space of the SDRAM of the coarse-grained multicore computing system is divided into a physical register region, a top-level task instruction region, a bottom-level function instruction region and a data storage region. The calculation method comprises the following steps, in order:
Step 1: Analyze the algorithm to be mapped and, according to the types and numbers of computing clusters integrated in the multicore computing system, split it into several computing tasks;
Step 2: Match the granularity of the computing tasks to the coarse-grained multicore computing system. If the granularity of a computing task exceeds the capacity of a physical register in the system, split the task, according to that capacity, into several sub-tasks whose granularity is less than the physical register capacity;
Step 3: Map the sub-task of each function: write the configuration information of each functional unit the sub-task uses, arrange it in order, and pad it with placeholder words so that its length is an integer multiple of the SDRAM burst length of the coarse-grained multicore computing system. The result is the bottom-level function instruction of that sub-task, with the structure shown in Fig. 9;
Step 4: Use the prepared script to generate the top-level task instruction frame of each sub-task. The frame contains the task type, the type and number of arithmetic units, the number of data inputs and data outputs of the task instruction, and the length of the bottom-level function instruction;
Step 5: Once the bottom-level function instructions and top-level task instruction frames of the sub-tasks of all different functions have been written, go to Step 6; otherwise return to Step 3;
Step 6: Concatenate the bottom-level function instructions of the sub-tasks of all different functions end to end, and append the bottom-level function instruction templates for interface and memory access instructions provided by the script. Then extract the start address of each distinct bottom-level function instruction and fill it into the corresponding top-level task instruction frame;
Step 7: Arrange the top-level task instructions of the operation, jump and branch instructions for the computing tasks obtained in Step 1 in the computation order of the algorithm to be mapped. The resulting sub-task sequence implements the computation and flow control;
Step 8: Insert a read instruction before each operation instruction that uses data in the data storage region as source data, and a store instruction after each operation instruction whose results must be saved to the data storage region. If the algorithm contains data reduction operations, such as accumulation or multiply-accumulate, a pair of store and read instructions can be embedded to maintain the operation granularity of the system, so that the data granularity of each computing task suits the coarse-grained multicore computing system and its computing capability is better exploited;
Step 9: add a general register modification instruction before each sub-task sequence that must be executed repeatedly, and a jump instruction after it, obtaining a sub-task loop sequence. These two instructions control the loop variable and the program pointer (PC), implementing the loop function, as shown in Figure 10;
Step 10: add an output instruction after the sub-task loop sequence to output the computation results;
Step 11: fill in the system register fields of the top-level task instructions, obtaining the computing task sequence;
Step 12: add an input instruction before the computing task sequence to input the source data, completing the mapping of the algorithm to be mapped.
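The sequencing rules of Steps 7 to 12 can be sketched on the example computation that follows. This is a minimal illustration under stated assumptions, not the patent's tooling: instructions are modelled as hypothetical Python tuples, the helper names `add_memory_access`, `wrap_loop` and `finalize` are invented, the jump operand is treated as a PC offset relative to the jump itself (the jump top-level task instruction in claim 1 carries a program pointer offset), and the sketch does not model how memory addresses advance between loop iterations. The order of the `ops` list stands in for Step 7.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Instr = Tuple  # (opcode, *operands); a hypothetical encoding


@dataclass
class Op:
    kind: str          # operation type, e.g. "MAC16" (illustrative)
    outs: List[str]    # data outputs (symbolic system register names)
    ins: List[str]     # data inputs (symbolic system register names)


def add_memory_access(ops: List[Op], sources: Dict[str, int],
                      saves: Dict[str, int]) -> List[Instr]:
    """Step 8: LOAD inputs that live in the data storage region and STORE
    outputs that must be saved there (maps give data-region addresses)."""
    seq: List[Instr] = []
    for op in ops:
        seq += [("LOAD", x, sources[x]) for x in op.ins if x in sources]
        seq.append(("RCU", op.kind, *op.outs, *op.ins))
        seq += [("STORE", y, saves[y]) for y in op.outs if y in saves]
    return seq


def wrap_loop(body: List[Instr], iterations: int) -> List[Instr]:
    """Step 9: GREG resets the loop counter; the JUMP offset is relative to
    the JUMP itself, so -len(body) lands on the first body instruction."""
    return [("GREG", {"i": 0})] + body + [("JUMP", iterations, -len(body))]


def finalize(seq: List[Instr], in_args: Tuple[int, int],
             out_args: Tuple[int, int]) -> Tuple[List[Instr], Dict[str, int]]:
    """Steps 10-12: append OUT, replace symbolic register names with system
    register numbers (in order of first use), then prepend IN."""
    seq = seq + [("OUT", *out_args)]
    reg_no: Dict[str, int] = {}
    numbered: List[Instr] = []
    for op, *args in seq:
        if op in ("LOAD", "STORE"):
            args[0] = reg_no.setdefault(args[0], len(reg_no))
        elif op == "RCU":
            args[1:] = [reg_no.setdefault(a, len(reg_no)) for a in args[1:]]
        numbered.append((op, *args))
    return [("IN", *in_args)] + numbered, reg_no


ops = [Op("MAC16", ["T"], ["A", "B"]), Op("ADD", ["R"], ["T", "C"])]
body = add_memory_access(ops, {"A": 0, "B": 32, "C": 64}, {"R": 66})
program, regs = finalize(wrap_loop(body, 2), (0, 66), (66, 2))
for instr in program:
    print(instr)
```

Because the jump offset is relative, prepending the input instruction in Step 12 does not disturb the loop. Putting `T` into both the `sources` and `saves` maps makes Step 8 emit a STORE/LOAD pair around the intermediate result, which is one possible reading of the store/read pair suggested for reductions.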
The programming process is illustrated below with the following simple computing task.
Computing task: array A (32M data), array B (32M data) and array C (2M data) are stored in that order as one 66M data block. Arrays A and B are multiplied and accumulated in order, 16 elements at a time, giving 2M intermediate results; these are then added in order to the elements of array C, giving 2M results, which are finally output.
The algorithm model of this computing task is given below:
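The model itself is not reproduced in this text. Read from the task description above (a reconstruction, not the original figure), with $A_k$, $B_k$ and $C_i$ the array elements and $R_i$ the $i$-th output, it is:

$$R_i = C_i + \sum_{j=0}^{15} A_{16i+j}\,B_{16i+j}, \qquad i = 0, 1, \ldots, N-1, \qquad N = \frac{32\,\mathrm{M}}{16} = 2\,\mathrm{M}$$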
When writing the task instructions, the bottom-level function instructions are written first. This example contains two computation stages, a 16-point multiply-accumulate and an addition, both of which can be performed by the reconfigurable arithmetic unit (RCU) in a functional node of the target system. The configuration information of each stage is generated, and the process templates of the interface instructions are obtained, as shown in Figure 11;
The resulting configuration information is connected in sequence to form the bottom-level function instruction, whose structure is shown in Figure 12;
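The construction of the bottom-level function instructions (Steps 3, 4 and 6) can be sketched as follows. Configuration words are modelled as integers; the placeholder value, the burst length of 8 words, the base address and all configuration words are hypothetical values chosen for illustration, and only a subset of the Step 4 frame fields is modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PAD_WORD = 0      # hypothetical placeholder configuration word
BURST_LEN = 8     # hypothetical SDRAM burst length, in words


@dataclass
class TopLevelFrame:
    """Subset of the Step 4 frame fields (illustrative)."""
    task_type: str
    unit_type: str
    num_inputs: int
    num_outputs: int
    bottom_length: int
    bottom_address: Optional[int] = None   # filled in by link()


def pad_to_burst(configs: List[List[int]]) -> List[int]:
    """Step 3: concatenate per-unit configuration words in order and pad with
    placeholder words up to a whole number of SDRAM bursts."""
    words = [w for cfg in configs for w in cfg]
    words += [PAD_WORD] * (-len(words) % BURST_LEN)
    return words


def link(bottoms: Dict[str, List[int]], templates: List[int],
         frames: Dict[str, TopLevelFrame], base: int) -> List[int]:
    """Step 6: chain bottom-level instructions end to end, append the
    interface/access templates, and record each start address in its frame."""
    image: List[int] = []
    for name, words in bottoms.items():    # dicts keep insertion order
        frames[name].bottom_address = base + len(image)
        image += words
    return image + templates


mac = pad_to_burst([[0x11, 0x12, 0x13], [0x21, 0x22]])   # 5 words -> 8
add = pad_to_burst([[0x31] * 9])                          # 9 words -> 16
frames = {"MAC16": TopLevelFrame("MAC16", "RCU", 2, 1, len(mac)),
          "ADD": TopLevelFrame("ADD", "RCU", 2, 1, len(add))}
image = link({"MAC16": mac, "ADD": add}, [0xF0] * 8, frames, base=0x4000)
print(frames["ADD"].bottom_address, len(image))
```

Padding with `-len(words) % BURST_LEN` adds nothing when the length is already a multiple of the burst length, so no burst is wasted on an already aligned instruction.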
As shown in Figure 13, the top-level task instructions are written step by step; the subfigures correspond to Steps 7 to 12 above. The notation is as follows:
RCU(T, A, B, ...) executes an operation on an arithmetic unit of the class called RCU, where T is the operation type and A, B, ... are the data outputs and data inputs;
LOAD(A, B) reads data from the memory region starting at address B into system register A;
STORE(A, B) stores the data in system register A into the memory region starting at address B;
GREG(A=x, B=y, ...) sets general register A to x, general register B to y, and so on;
JUMP(A, B) makes the program pointer jump to B while the loop counter has not reached the value A;
IN(A, B) inputs B units of data into the memory region starting at address A;
OUT(A, B) outputs B units of data from the memory region starting at address A.
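To make the notation concrete, the following toy interpreter runs a scaled-down version of the example (32 + 32 + 2 words instead of 32M + 32M + 2M). It is a hypothetical sketch: the tuple encoding, the extra stride and length operands on LOAD/STORE, the rule that JUMP increments general register `i` before comparing it with A, and treating B as an absolute instruction index are all assumptions, since the text does not specify how addresses advance or how the loop counter is updated.

```python
from typing import Dict, List, Tuple


def run(program: List[Tuple], host_data: List[float], mem_size: int) -> List[float]:
    """Toy interpreter for the example's notation (semantics partly assumed)."""
    mem = [0.0] * mem_size                  # data storage region
    regs: Dict[str, List[float]] = {}       # system registers (data blocks)
    greg: Dict[str, int] = {}               # general registers of the master controller
    out: List[float] = []                   # data sent back to the host
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "IN":                      # IN(A, B): B words from host into mem[A:]
            addr, n = args
            mem[addr:addr + n] = host_data[:n]
        elif op == "OUT":                   # OUT(A, B): B words from mem[A:] to host
            addr, n = args
            out.extend(mem[addr:addr + n])
        elif op == "GREG":                  # GREG(A=x, ...): set general registers
            greg.update(args[0])
        elif op in ("LOAD", "STORE"):       # assumed address: base + stride * i
            reg, base, stride, n = args
            addr = base + stride * greg.get("i", 0)
            if op == "LOAD":
                regs[reg] = mem[addr:addr + n]
            else:
                mem[addr:addr + n] = regs[reg][:n]
        elif op == "RCU":                   # RCU(T, out, in1, in2)
            kind, dst, a, b = args
            if kind == "MAC16":             # 16-point multiply-accumulate
                x, y = regs[a], regs[b]
                regs[dst] = [sum(p * q for p, q in zip(x[k:k + 16], y[k:k + 16]))
                             for k in range(0, len(x), 16)]
            elif kind == "ADD":             # element-wise addition
                regs[dst] = [p + q for p, q in zip(regs[a], regs[b])]
            else:
                raise ValueError(f"unsupported RCU operation {kind}")
        elif op == "JUMP":                  # assumed: i += 1, jump to B while i < A
            limit, target = args
            greg["i"] += 1
            if greg["i"] < limit:
                pc = target
        else:
            raise ValueError(f"unknown opcode {op}")
    return out


A, B, C, R = 0, 32, 64, 66                  # scaled-down layout of the 66-word block
program = [
    ("IN", 0, 66),                          # Step 12: input source data
    ("GREG", {"i": 0}),                     # Step 9: reset loop counter
    ("LOAD", "a", A, 16, 16),               # Step 8: read 16 words of A
    ("LOAD", "b", B, 16, 16),               #         and 16 words of B
    ("RCU", "MAC16", "t", "a", "b"),
    ("LOAD", "c", C, 1, 1),
    ("RCU", "ADD", "r", "t", "c"),
    ("STORE", "r", R, 1, 1),                # Step 8: save one result
    ("JUMP", 2, 2),                         # Step 9: repeat body (index 2) twice
    ("OUT", R, 2),                          # Step 10: output results
]
host = [float(k) for k in range(32)] + [1.0] * 32 + [100.0, 200.0]
print(run(program, host, mem_size=68))      # [220.0, 576.0]
```

With B filled with ones, each multiply-accumulate reduces to the sum of 16 consecutive elements of A (120 and 376), to which 100 and 200 from C are added.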
Claims (2)
1. A two-level programming model for a coarse-grained multicore computing system, characterized in that task instructions are written for a heterogeneous multicore computing system, the task instruction types comprising: operation instructions, general register modification instructions, jump instructions, branch instructions, access instructions and interface instructions; the task instructions operate on several system registers and several general registers in the two-level programming model; the various registers are distinguished by register number in the two-level programming model;
the two-level programming model divides the storage space of the synchronous dynamic random access memory (SDRAM) of the coarse-grained multicore computing system into a physical register region, a top-level task instruction region, a bottom-level function instruction region and a data storage region;
the physical register region serves as the storage space of several physical registers, and the number of each physical register maps it to its storage location in the SDRAM;
the top-level task instruction region stores the top-level task instructions contiguously;
the bottom-level function instruction region stores, contiguously, the bottom-level function instructions corresponding to the top-level task instructions;
the data storage region serves as the memory of the two-level programming model and is used to save data;
the operation instruction reads data from several physical registers, sends it to the corresponding arithmetic unit for computation, and stores the result in several other physical registers; the operation instruction comprises an operation top-level task instruction and an operation bottom-level function instruction;
the operation top-level task instruction comprises: the task number, task type, data inputs and data outputs of the task, number of storage channels, type and number of arithmetic units, storage address of the corresponding operation bottom-level function instruction, and length of the operation bottom-level function instruction; the operation bottom-level function instruction comprises configuration information of the physical registers and the arithmetic units;
the general register modification instruction modifies the general register values of the master controller in the coarse-grained multicore computing system; it comprises only a register top-level task instruction, which contains the task type and the general register modification information;
the jump instruction moves the program pointer according to the magnitude relationship between the values of two general registers, or between a general register and a preset value, so that the program pointer jumps to a preset position where execution continues; the jump instruction comprises only a jump top-level task instruction, which contains the task type, the program pointer offset, and the general registers or preset value taking part in the comparison;
the branch instruction is provided to the master controller for querying the computation result of an operation instruction and deciding whether to move the program pointer; the branch instruction comprises only a branch top-level task instruction, which contains the task type, program pointer offset, task number, and a storage area for the program pointer and general register values;
the access instruction comprises a read instruction and a store instruction, used respectively to read data from the data storage region in SDRAM into physical registers and to store data from physical registers into the data storage region; the access instruction comprises a memory-access top-level task instruction and a memory-access bottom-level function instruction; the memory-access top-level task instruction consists of the task type, system register number and storage address; the memory-access bottom-level function instruction comprises configuration information of the physical registers;
the interface instruction comprises an output instruction and an input instruction, used to exchange data between the data storage region in SDRAM and a host computer or other data-processing chips; the interface instruction comprises an interface top-level task instruction and an interface bottom-level function instruction; the interface top-level task instruction contains the task type, system register number and interface type; the interface bottom-level function instruction contains configuration information of the physical registers and of the interface unit;
different types of task instructions complete different operation tasks, and the combination of all types of task instructions jointly realizes a coarse-grained multicore computing system oriented to heterogeneous computing.
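The four-region SDRAM partition and the register-number-to-address mapping in claim 1 can be pictured with the following sketch. The region order, the sizes and the linear mapping `reg_no * reg_words` are assumptions for illustration; the claim only requires that each physical register number map to a storage location in the SDRAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SdramLayout:
    """Hypothetical SDRAM partition: physical registers, top-level task
    instructions, bottom-level function instructions, then data (sizes in words)."""
    reg_words: int       # capacity of one physical register
    num_regs: int        # number of physical registers
    top_words: int       # size of the top-level task instruction region
    bottom_words: int    # size of the bottom-level function instruction region

    @property
    def top_base(self) -> int:
        return self.reg_words * self.num_regs        # register region starts at 0

    @property
    def bottom_base(self) -> int:
        return self.top_base + self.top_words

    @property
    def data_base(self) -> int:
        return self.bottom_base + self.bottom_words  # data region uses the rest

    def register_address(self, reg_no: int) -> int:
        """Map a physical register number to its start address in SDRAM."""
        if not 0 <= reg_no < self.num_regs:
            raise ValueError(f"no physical register {reg_no}")
        return reg_no * self.reg_words


layout = SdramLayout(reg_words=1 << 20, num_regs=16,
                     top_words=1 << 16, bottom_words=1 << 18)
print(hex(layout.register_address(3)), hex(layout.data_base))
```

Keeping each region contiguous matches the claim's wording that top-level and bottom-level instructions are stored continuously, and makes a register's address a single multiplication.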
2. A computing method for a coarse-grained multicore computing system, characterized in that it is used to write task instructions for a heterogeneous multicore computing system; the task instructions are divided into top-level task instructions and bottom-level function instructions; the top-level task instructions maintain the data-passing relationships between computing tasks; the bottom-level function instructions direct the corresponding functional units within a computing task to perform specific data-processing operations; the task instruction types comprise: operation instructions, general register modification instructions, jump instructions, branch instructions, access instructions and interface instructions; the access instructions comprise read instructions and store instructions, and the interface instructions comprise output instructions and input instructions; the storage space of the synchronous dynamic random access memory (SDRAM) of the coarse-grained multicore computing system is divided into a physical register region, a top-level task instruction region, a bottom-level function instruction region and a data storage region; the computing method comprises the following steps, performed in order:
Step 1: analyze the algorithm to be mapped and, according to the types and number of computing clusters integrated in the multicore computing system, split it into several computing tasks;
Step 2: match the granularity of the computing tasks to the coarse-grained multicore computing system: if the granularity of a computing task exceeds the capacity of a physical register in the coarse-grained multicore computing system, split the task, according to the physical register capacity, into several sub-tasks whose granularity is smaller than the physical register capacity;
Step 3: perform algorithm mapping on the sub-task of any one function: write the configuration information of each functional unit used in that sub-task, arrange this configuration information in order, and pad it with placeholder words so that its length is an integer multiple of the burst length of the SDRAM of the coarse-grained multicore computing system, thereby forming the bottom-level function instruction of that functional sub-task;
Step 4: use the written script to generate the top-level task instruction frame of the sub-task of any one function; the frame comprises the task type, the type and number of arithmetic units, the numbers of data inputs and data outputs of the task instruction, and the length of the bottom-level function instruction;
Step 5: when the bottom-level function instructions and top-level task instruction frames of all the different functional sub-tasks have been written, go to Step 6; otherwise return to Step 3;
Step 6: connect the bottom-level function instructions of all the different functional sub-tasks end to end, append at the end of the chain the bottom-level function instruction templates of the interface instructions and access instructions provided by the script, then extract the start address of each functional sub-task's bottom-level function instruction and write it into the top-level task instruction frame;
Step 7: for the computing tasks split from the algorithm to be mapped in Step 1, arrange the top-level task instructions of the operation instructions, jump instructions and branch instructions in the computation order of the algorithm to be mapped, obtaining a sub-task sequence, so as to implement the computation functions and process control;
Step 8: add the read instruction before each operation instruction that uses data in the data storage region as source data, and add the store instruction after each operation instruction whose result must be saved to the data storage region;
Step 9: add a general register modification instruction before each sub-task sequence that must be executed repeatedly and a jump instruction after it, obtaining a sub-task loop sequence that controls the loop variable and the program pointer (PC), thereby implementing the loop function;
Step 10: add an output instruction after the sub-task loop sequence to output the computation results;
Step 11: fill in the system register fields of the top-level task instructions, obtaining the computing task sequence;
Step 12: add an input instruction before the computing task sequence to input the source data, thereby completing the mapping of the algorithm to be mapped.
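The granularity matching of Step 2 can be sketched as a simple chunking rule. Data volumes are counted in words and `capacity` stands for the capacity of one physical register; the function and its interface are hypothetical, and the sketch accepts a sub-task that exactly fills one register.

```python
from typing import List, Tuple


def split_by_capacity(total_words: int, capacity: int) -> List[Tuple[int, int]]:
    """Split a task covering `total_words` data words into (offset, length)
    sub-tasks, none larger than one physical register of `capacity` words."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return [(off, min(capacity, total_words - off))
            for off in range(0, total_words, capacity)]


print(split_by_capacity(40, 16))  # [(0, 16), (16, 16), (32, 8)]
```

Only the last sub-task can be shorter than the register capacity, so every other sub-task uses a full register.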
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610645202.0A CN106293736B (en) | 2016-08-08 | 2016-08-08 | Two-stage programmer and its calculation method for coarseness multicore computing system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106293736A CN106293736A (en) | 2017-01-04 |
CN106293736B true CN106293736B (en) | 2019-05-31 |
Family ID: 57667132
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610645202.0A (Active, CN106293736B) | Two-stage programmer and its calculation method for coarseness multicore computing system | 2016-08-08 | 2016-08-08 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106293736B (en) |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |