CN105487838A - Task-level parallel scheduling method and system for dynamically reconfigurable processor - Google Patents

Info

Publication number
CN105487838A
CN105487838A (application number CN201510817591.6A)
Authority
CN
China
Prior art keywords
processing unit
reconfigurable processing
reconfigurable
task
master controller
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510817591.6A
Other languages
Chinese (zh)
Other versions
CN105487838B (en
Inventor
田丰硕
赵仲元
绳伟光
何卫锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201510817591.6A priority Critical patent/CN105487838B/en
Publication of CN105487838A publication Critical patent/CN105487838A/en
Application granted granted Critical
Publication of CN105487838B publication Critical patent/CN105487838B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution

Abstract

The invention proposes a task-level parallel scheduling method and system for a dynamically reconfigurable processor. The system comprises a master controller, a plurality of reconfigurable processing units, a main memory, a direct memory access device, and a system bus. Each reconfigurable processing unit consists of a co-controller, a plurality of reconfigurable processing element arrays responsible for reconfigurable computation, and a plurality of shared memories used for data storage; the reconfigurable processing element arrays and shared memories are arranged adjacently, and each shared memory can be read and written by the two reconfigurable processing element arrays connected to it. With the proposed method and system, different scheduling modes can be applied to different tasks by adjusting the scheduling method, so that essentially all parallel tasks can be well accelerated in parallel on the reconfigurable processor.

Description

Task-level parallel scheduling method and system for a dynamically reconfigurable processor
Technical field
The present invention relates to the field of computing, and in particular to a task-level parallel scheduling method and system for a dynamically reconfigurable processor.
Background art
Processor computing paradigms have traditionally fallen into two classes. General-purpose computing on von Neumann processors is extremely flexible, but its instruction-stream-driven execution model, limited arithmetic units, and limited memory bandwidth leave overall performance and power consumption unsatisfactory. Dedicated computing can optimize structures and circuits for a specific application and needs no instruction set, so it executes quickly with low power consumption. However, dedicated computing systems have a fatal defect: their flexibility and extensibility are very poor, and the ever more complex applications that keep emerging often cannot be accommodated by simple extension. A different dedicated system must be designed for each application, so hardware design often cannot keep up with the pace at which applications evolve. Meanwhile, the design cycle of a dedicated computing system is long, and the one-time engineering investment is too high. Reconfigurable computing emerged against this background as a computing paradigm that combines the flexibility of software with the efficiency of hardware. Reconfigurable computing technology combines the advantages of general-purpose processors and ASICs, offering the efficiency of hardware together with the programmability of software. It strikes a better balance among key metrics such as performance, power consumption, and flexibility, filling the gap between general-purpose and dedicated computing.
Mainstream processors in today's computers include multi-core CPUs with 2 to 8 cores and many-core GPUs. This design trend has made parallel processing a hot topic, and parallel algorithms and parallel programming have become knowledge that programmers must understand and master. In June 2007, NVIDIA introduced CUDA, a hardware/software architecture that treats the GPU as a data-parallel device. In the CUDA model the CPU acts as the host and the GPU as a coprocessor; a CUDA parallel function that runs on the GPU is called a kernel. A kernel function is not a complete program, but the portion of a CUDA program that can be executed in parallel. In this way the CPU and GPU divide the work cleanly, achieving parallelism at multiple levels.
However, no unified parallel-processing specification exists for the general reconfigurable processors targeted by the present invention. A typical reconfigurable processor architecture contains a general-purpose processor and one or more reconfigurable processing units (Reconfigurable Processing Unit, RPU). To assign the tasks of a multitask application on such a coarse-grained reconfigurable processor, the present invention proposes an RPU allocation and task scheduling method and system.
Summary of the invention
The present invention proposes a task-level parallel scheduling method and system for a dynamically reconfigurable processor. By adjusting the scheduling method, different scheduling modes can be applied to different tasks, so that essentially all parallel tasks can be accelerated in parallel on the reconfigurable processor.
To achieve the above object, the present invention proposes a task-level parallel scheduling system for a dynamically reconfigurable processor, comprising a master controller, a plurality of reconfigurable processing units, a main memory, a direct memory access device, and a system bus,
wherein each reconfigurable processing unit consists of a co-controller, a plurality of reconfigurable processing element arrays responsible for reconfigurable computation, and a plurality of shared memories used for data storage; the reconfigurable processing element arrays and shared memories are arranged adjacently, and each shared memory can be read and written by the two reconfigurable processing element arrays connected to it.
Further, the master controller executes the serial code in a program that is not suited to processing by the reconfigurable processing units, and is responsible for scheduling, starting, and running the plurality of reconfigurable processing units.
Further, the reconfigurable processing units are responsible for the computation-intensive parallelizable code in the program.
Further, the co-controller transfers the data and configuration information required for the computations of the reconfigurable processing element arrays, and controls the start, run, and termination of the arrays.
To achieve the above object, the present invention also proposes a task-level parallel scheduling method for a dynamically reconfigurable processor, comprising the following steps:
encapsulating the computation-intensive parallelizable code of the application program as kernel functions;
compiling the serial-section code and the parallel-section code separately to generate executable code suitable for the master controller and the reconfigurable processing units, respectively;
the master controller executing the serial-section code;
when kernel-function code is reached, the master controller scheduling and distributing the kernel-function code to reconfigurable processing units for processing.
Further, the master controller's scheduling of the reconfigurable processing units is divided into two parallel modes, synchronous call and asynchronous call:
in a synchronous call, the master controller finds reconfigurable processing units that are not running, loads the executable code and configuration information, and suspends itself; in the synchronous call, multiple reconfigurable processing units are invoked, each processing a different data block; after all the reconfigurable processing units finish, the processing results are updated via the synchronous function's return value, and the master controller resumes executing serial code;
in an asynchronous call, the master controller finds reconfigurable processing units that are not running and, without interrupting itself, loads the executable code and configuration information and starts the units; the master controller continues running until it needs data returned by a reconfigurable processing unit, at which point it stops and waits for the unit to finish computing and return the data.
Further, when the kernel function has few instructions, such that a single reconfigurable processing unit can complete the entire kernel's computation on its own, the multiple reconfigurable processing element arrays inside that unit execute the same configuration information in parallel, each array computing on the data in its own shared memory.
Further, when the kernel function has many instructions, such that all its statements cannot be executed in one pass, the kernel function is divided into multiple subtasks of equal length, and the configuration information of the subtasks is distributed in order to multiple reconfigurable processing element arrays. Because each array can read and write both the adjacent upper-layer and lower-layer shared memories, each shared memory is divided into equal-sized blocks A and B. During pipelined task execution, each array first reads data from block A of the upper-layer shared memory and writes results to block B of the lower-layer shared memory; after that pass, it reads data from block B of the upper-layer shared memory and writes results to block A of the lower-layer shared memory. In parallel with these two passes, data is transferred to and from main memory using the portions of the first and last shared memories not involved in computation.
Compared with the prior art, the above technical scheme includes the following innovations and beneficial effects (advantages):
1. The task-level parallel scheduling method of the present invention is designed for a specific three-layer heterogeneous coarse-grained reconfigurable processor. The data-intensive and computation-intensive parts of an application are packaged as kernel functions; the master controller handles the processing of serial code and the allocation of reconfigurable processing units; kernel functions are handed to the reconfigurable processing units, which have stronger parallel computing capability, and within each unit the reconfigurable arrays are flexibly allocated to perform the computation. In this way the parallel computing capability of the multi-level heterogeneous coarse-grained reconfigurable processor is fully exploited, and, together with a dedicated compiler, computation-intensive applications can be fully accelerated in parallel.
2. Building on the GPU parallel computing tool CUDA, the present invention ports its parallel scheduling method to a multi-level heterogeneous coarse-grained reconfigurable processor, proposes a new pipelined scheduling mode, and extends the scheduling method to different kinds of tasks.
3. The present invention realizes multitask scheduling across many reconfigurable processing units. By adjusting the scheduling method, different scheduling modes can be applied to different tasks, avoiding the constraints a single scheduling mode imposes on tasks, so that essentially all parallel tasks can be accelerated in parallel on the reconfigurable processor.
Brief description of the drawings
Fig. 1 is a structural diagram of the task-level parallel scheduling system of the dynamically reconfigurable processor according to a preferred embodiment of the present invention.
Fig. 2 is a flowchart of the task-level parallel scheduling method of the dynamically reconfigurable processor according to a preferred embodiment of the present invention.
Fig. 3 and Fig. 4 are schematic diagrams of the synchronous scheduling method used when the kernel function has few instructions.
Fig. 5 is a schematic diagram of the asynchronous scheduling method used when the kernel function has many instructions.
Detailed description of the embodiments
Specific embodiments of the present invention are given below in conjunction with the accompanying drawings, but the invention is not limited to the following embodiments. The advantages and features of the invention will become clearer from the following description and the claims. It should be noted that the drawings are all in greatly simplified form and not drawn to precise scale; they serve only to conveniently and clearly illustrate the embodiments of the present invention.
Referring to Fig. 1, a structural diagram of the task-level parallel scheduling system of the dynamically reconfigurable processor according to a preferred embodiment, the box on the right side is an enlarged view of RPU1 showing its internal functional structure. The present invention proposes a task-level parallel scheduling system for a dynamically reconfigurable processor, comprising a master controller ARM11, a plurality of reconfigurable processing units (Reconfigurable Processing Unit, RPU), a main memory DDR, a direct memory access device (Direct Memory Access, DMA), and a system bus AHB, wherein each reconfigurable processing unit RPU consists of a co-controller ARM7, a plurality of reconfigurable processing element arrays (Processing Element Array, PEA) responsible for reconfigurable computation, and a plurality of shared memories (Shared Memory, SM) used for data storage; the PEAs and SMs are arranged adjacently, and each SM can be read and written by the two PEAs connected to it.
According to the preferred embodiment, the master controller ARM11 executes the serial code in a program that is not suited to RPU processing, and is responsible for scheduling, starting, and running the RPUs. The RPUs are responsible for the computation-intensive parallelizable code in the program. Further, the co-controller ARM7 transfers the data and configuration information required for the computations of the PEAs and controls the start, run, and termination of the PEAs.
According to the preferred embodiment, each reconfigurable processing unit RPU comprises 4 PEAs and 4 SMs.
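The three-layer hierarchy described above can be sketched as plain data structures. This is an illustrative model only, not part of the patent; the class and field names (`RPU`, `PEA`, `SM`) simply follow the abbreviations used in the text:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PEA:
    """Reconfigurable processing element array: runs one configuration package at a time."""
    idx: int

@dataclass
class SM:
    """Shared memory, readable/writable by the two adjacent PEAs."""
    idx: int

@dataclass
class RPU:
    """Reconfigurable processing unit: one ARM7 co-controller, 4 PEAs, 4 SMs."""
    co_controller: str = "ARM7"
    peas: List[PEA] = field(default_factory=lambda: [PEA(i) for i in range(4)])
    sms: List[SM] = field(default_factory=lambda: [SM(i) for i in range(4)])

@dataclass
class ReconfigurableProcessor:
    """Top level: ARM11 master controller plus several RPUs on an AHB bus with DDR and DMA."""
    master_controller: str = "ARM11"
    rpus: List[RPU] = field(default_factory=lambda: [RPU() for _ in range(4)])

proc = ReconfigurableProcessor()
```

The number of RPUs is left at an arbitrary 4 here; the patent only requires "a plurality", while the 4-PEA/4-SM split per RPU is the one stated in the preferred embodiment.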
Referring to Fig. 2, a flowchart of the task-level parallel scheduling method of the dynamically reconfigurable processor according to the preferred embodiment, the present invention also proposes a task-level parallel scheduling method for a dynamically reconfigurable processor, comprising the following steps:
Step S100: encapsulate the computation-intensive parallelizable code of the application program as kernel functions;
Step S200: compile the serial-section code and the parallel-section code separately to generate executable code suitable for the master controller and the reconfigurable processing units, respectively;
Step S300: the master controller executes the serial-section code;
Step S400: when kernel-function code is reached, the master controller schedules and distributes the kernel-function code to reconfigurable processing units for processing.
For a general reconfigurable processor, parallel execution mostly means using the high-speed computation of configurable components to process large amounts of repetitive, computation-intensive work. The more complex multi-level heterogeneous reconfigurable processor targeted by the present invention contains computing modules at three levels, namely the master controller, the co-controller, and the PEA, each with independent memory, together forming a three-layer reconfigurable heterogeneous architecture. When processing an application, task-level parallelism therefore includes not only the parallelism of the reconfigurable units but also the allocation and scheduling of multiple RPUs.
For the compiled C program of an application, the computation-intensive parallelizable code is encapsulated as kernel functions; one application may contain several kernel functions differing in purpose and complexity. During compilation, the serial and parallel sections are compiled separately into executable code for the master-controller part and the RPU part. Kernel functions exhibit two levels of parallelism: synchronous/asynchronous parallelism across different RPUs, and code parallelism across the 4 PEAs inside an RPU.
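By analogy with CUDA's host/kernel split, the compilation flow above can be illustrated with a toy host program. This is a sketch only: the patent's real toolchain compiles GR-C into ARM11 executable code and PEA configuration packages, and the `kernel` decorator below is a hypothetical stand-in for that serial/parallel split:

```python
kernels = {}

def kernel(fn):
    """Mark a function as a parallel section to be 'compiled' for the RPUs."""
    kernels[fn.__name__] = fn
    return fn

@kernel
def vec_add(a, b):
    # computation-intensive, data-parallel body -> would run on an RPU
    return [x + y for x, y in zip(a, b)]

def host_main():
    # serial section: runs on the master controller (ARM11)
    a, b = [1, 2, 3], [4, 5, 6]
    # kernel call site: the master controller dispatches this to an RPU
    return kernels["vec_add"](a, b)

result = host_main()
```

As in CUDA, the kernel is not a complete program; it is the parallelizable step that the host hands off, while everything else stays in the serial section.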
For a multitask application, at execution time the master controller first executes the serial-section code; when a kernel section is reached, the master controller's RPU scheduler allocates RPUs, in one of two parallel modes, synchronous call or asynchronous call:
in a synchronous call, the master controller finds reconfigurable processing units that are not running, loads the executable code and configuration information, and suspends itself; in the synchronous call, multiple reconfigurable processing units are invoked, each processing a different data block; after all the reconfigurable processing units finish, the processing results are updated via the synchronous function's return value, and the master controller resumes executing serial code;
in an asynchronous call, the master controller finds reconfigurable processing units that are not running and, without interrupting itself, loads the executable code and configuration information and starts the units; the master controller continues running until it needs data returned by a reconfigurable processing unit, at which point it stops and waits for the unit to finish computing and return the data. A kernel function with a large data volume can be processed by multiple RPUs computing in parallel via a synchronous call, while different kernels with no mutual dependence can run in parallel via asynchronous calls.
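The two call modes can be mimicked with ordinary threads. This is a behavioural sketch only: real dispatch loads configuration packages over the system bus, and the thread-per-RPU model and helper names here are assumptions for illustration:

```python
import threading

def run_on_rpu(code, block, results, i):
    # stand-in for one idle RPU executing loaded code on its data block
    results[i] = code(block)

def sync_call(code, blocks):
    """Synchronous call: the master controller suspends until every RPU finishes."""
    results = [None] * len(blocks)
    threads = [threading.Thread(target=run_on_rpu, args=(code, blk, results, i))
               for i, blk in enumerate(blocks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()          # master controller is suspended here
    return results        # results merged via the synchronous return value

def async_call(code, block):
    """Asynchronous call: the master keeps running; it joins only when it needs the data."""
    result = [None]
    t = threading.Thread(target=run_on_rpu, args=(code, block, result, 0))
    t.start()
    return t, result      # master continues serial work, joins t later

double = lambda xs: [2 * x for x in xs]
sync_out = sync_call(double, [[1, 2], [3, 4]])   # two RPUs, two data blocks
handle, async_out = async_call(double, [5, 6])
handle.join()             # master now needs the data, so it stops and waits
```

The design point mirrored here is that a synchronous call buys data parallelism across RPUs for one large kernel, while asynchronous calls overlap independent kernels with the master's serial code.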
After the RPUs executing a kernel function have been allocated and scheduled, task parallelism likewise exists inside each RPU, which is the second level of parallel computation. In the hardware architecture on which the present invention relies, each RPU contains 4 PEAs and corresponding SMs, where each SM can be read and written by the two PEAs connected to it; on this architecture, two modes of parallel computation are available depending on the kind of kernel.
Referring to Fig. 3 and Fig. 4, schematic diagrams of the synchronous scheduling method used when the kernel function has few instructions: when a single reconfigurable processing unit can complete the entire kernel's computation on its own, its internal PEAs execute the same configuration information in parallel, each PEA computing on the data in its own SM, which can shorten the kernel execution time to as little as one quarter. The first row of boxes, block1 to block4, represents the co-controller copying data from DDR into the left half of each SM; the circles represent PEA execution; the second row of boxes, block1 to block4, represents the PEAs writing computed data to the right half of each SM; and the third row of boxes, block1 to block4, represents the co-controller writing data from the right half of each SM out to DDR. The above constitutes the loop body in the middle of the program.
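For this few-instruction case, the 4 PEAs of one RPU run the identical configuration over 4 data blocks, an ordinary data-parallel split. A minimal sketch, assuming an even block split (the patent does not fix block sizes) and with the sequential loop standing in for the four PEAs running concurrently:

```python
def split_blocks(data, n=4):
    """Split the input into n near-equal blocks, one per PEA's shared memory."""
    size = (len(data) + n - 1) // n
    return [data[i * size:(i + 1) * size] for i in range(n)]

def small_kernel_dispatch(config, data):
    """All 4 PEAs execute the identical configuration on their own SM's block,
    cutting kernel time to roughly a quarter of single-PEA execution."""
    out = []
    for block in split_blocks(data):   # conceptually in parallel on PEA1..PEA4
        out.extend(config(block))
    return out

squared = small_kernel_dispatch(lambda blk: [x * x for x in blk], list(range(8)))
```

Since every PEA holds the same configuration, the only per-PEA state is the data block in its SM, which is what makes the four-way speedup possible without recompiling the kernel.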
When the kernel function has many instructions, such that all its statements cannot be executed in one pass, a pipelined mode of task parallelism is needed. Concretely, the kernel function is divided into no more than 4 subtasks of as nearly equal length as possible, and the subtasks' configuration information is given in order to the 4 PEAs. Because each PEA can read and write both the adjacent upper-layer and lower-layer SMs, each SM is divided into equal-sized blocks A and B. During pipelined task execution, each PEA first reads data from block A of the upper-layer SM and writes results to block B of the lower-layer SM; after that pass, it reads data from block B of the upper-layer SM and writes results to block A of the lower-layer SM. In parallel with these two passes, data is transferred to and from main memory using the portions of the first and last SMs not involved in computation. In this way there is no extra data-transfer time anywhere in the process, and the kernel function executes as an uninterrupted pipeline.
Referring to Fig. 5, a schematic diagram of the asynchronous scheduling method used when the kernel function has many instructions: as shown in Fig. 5, the programmer divides all the instructions in the loop into several parts by some method; when the compiler processes the GR-C source, these parts are compiled into different PEA configuration packages, and the 4 PEAs are invoked in sequence, each delayed by one clock period relative to the previous one, achieving a pipelining effect. In period one, PEA1 computes on the data of part A in SM1 and puts the result into block B of SM2; in period two, PEA1 computes on the data of part B in SM1 while PEA2 computes on the block-B data of SM2, putting results into block A of SM2 and SM3 respectively; the pipeline proceeds by analogy, and after PEA4 finishes computing it copies its data directly into main memory, until all data are processed. The figure shows the complete execution flow inside an RPU by task division, where block1 to block5 represent different data streams and the rightmost box represents data output from SM3 to DDR.
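The pipelined schedule of Figs. 4 and 5 can be modelled as a simple timing table: PEA k starts one period after PEA k-1, and each SM ping-pongs between its A and B halves so that reads never collide with writes. The period/PEA indexing below is an illustrative reconstruction of the flow described, not a cycle-accurate model of the hardware:

```python
def pipeline_schedule(n_blocks, n_peas=4):
    """Return {period: [(pea, block, half), ...]}: PEA k processes data block b
    in period b + k, reading the A half of its upper SM on even blocks and the
    B half on odd blocks (the ping-pong alternation described in the text)."""
    sched = {}
    for b in range(n_blocks):
        for k in range(n_peas):
            half = "A" if b % 2 == 0 else "B"   # ping-pong between SM halves
            sched.setdefault(b + k, []).append((k, b, half))
    return sched

s = pipeline_schedule(5)
# After the fill phase (periods 0..2), all four PEAs are busy every period,
# so the pipeline runs without extra data-movement stalls.
```

The table makes the patent's claim concrete: once the pipeline is full, every period has one entry per PEA, and the alternating halves leave the other half of each SM free for the co-controller's DDR transfers.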
Although the present invention is disclosed above by way of preferred embodiments, they are not intended to limit the invention. A person of ordinary skill in the art can make various modifications and variations without departing from the spirit and scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the claims.

Claims (8)

1. A task-level parallel scheduling system for a dynamically reconfigurable processor, characterized in that it comprises a master controller, a plurality of reconfigurable processing units, a main memory, a direct memory access device, and a system bus,
wherein each reconfigurable processing unit consists of a co-controller, a plurality of reconfigurable processing element arrays responsible for reconfigurable computation, and a plurality of shared memories used for data storage; the reconfigurable processing element arrays and shared memories are arranged adjacently, and each shared memory can be read and written by the two reconfigurable processing element arrays connected to it.
2. The task-level parallel scheduling system for a dynamically reconfigurable processor according to claim 1, characterized in that the master controller executes the serial code in a program that is not suited to processing by the reconfigurable processing units, and is responsible for scheduling, starting, and running the plurality of reconfigurable processing units.
3. The task-level parallel scheduling system for a dynamically reconfigurable processor according to claim 1, characterized in that the reconfigurable processing units are responsible for the computation-intensive parallelizable code in the program.
4. The task-level parallel scheduling system for a dynamically reconfigurable processor according to claim 1, characterized in that the co-controller transfers the data and configuration information required for the computations of the reconfigurable processing element arrays, and controls the start, run, and termination of the arrays.
5. A task-level parallel scheduling method for a dynamically reconfigurable processor, characterized in that it comprises the following steps:
encapsulating the computation-intensive parallelizable code of the application program as kernel functions;
compiling the serial-section code and the parallel-section code separately to generate executable code suitable for the master controller and the reconfigurable processing units, respectively;
the master controller executing the serial-section code;
when kernel-function code is reached, the master controller scheduling and distributing the kernel-function code to reconfigurable processing units for processing.
6. The task-level parallel scheduling method for a dynamically reconfigurable processor according to claim 5, characterized in that the master controller's scheduling of the reconfigurable processing units is divided into two parallel modes, synchronous call and asynchronous call:
in a synchronous call, the master controller finds reconfigurable processing units that are not running, loads the executable code and configuration information, and suspends itself; in the synchronous call, multiple reconfigurable processing units are invoked, each processing a different data block; after all the reconfigurable processing units finish, the processing results are updated via the synchronous function's return value, and the master controller resumes executing serial code;
in an asynchronous call, the master controller finds reconfigurable processing units that are not running and, without interrupting itself, loads the executable code and configuration information and starts the units; the master controller continues running until it needs data returned by a reconfigurable processing unit, at which point it stops and waits for the unit to finish computing and return the data.
7. The task-level parallel scheduling method for a dynamically reconfigurable processor according to claim 5, characterized in that when the kernel function has few instructions, such that a single reconfigurable processing unit can complete the entire kernel's computation on its own, the multiple reconfigurable processing element arrays inside that unit execute the same configuration information in parallel, each array computing on the data in its own shared memory.
8. The task-level parallel scheduling method for a dynamically reconfigurable processor according to claim 5, characterized in that when the kernel function has many instructions, such that all its statements cannot be executed in one pass, the kernel function is divided into multiple subtasks of equal length, and the configuration information of the subtasks is distributed in order to multiple reconfigurable processing element arrays; because each array can read and write both the adjacent upper-layer and lower-layer shared memories, each shared memory is divided into equal-sized blocks A and B; during pipelined task execution, each array first reads data from block A of the upper-layer shared memory and writes results to block B of the lower-layer shared memory, then after that pass reads data from block B of the upper-layer shared memory and writes results to block A of the lower-layer shared memory; in parallel with these two passes, data is transferred to and from main memory using the portions of the first and last shared memories not involved in computation.
CN201510817591.6A 2015-11-23 2015-11-23 Task-level parallel scheduling method and system for a dynamically reconfigurable processor Active CN105487838B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510817591.6A CN105487838B (en) 2015-11-23 2015-11-23 Task-level parallel scheduling method and system for a dynamically reconfigurable processor

Publications (2)

Publication Number Publication Date
CN105487838A true CN105487838A (en) 2016-04-13
CN105487838B CN105487838B (en) 2018-01-26

Family

ID=55674840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510817591.6A Active CN105487838B (en) 2015-11-23 2015-11-23 Task-level parallel scheduling method and system for a dynamically reconfigurable processor

Country Status (1)

Country Link
CN (1) CN105487838B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103984677A (en) * 2014-05-30 2014-08-13 东南大学 Embedded reconfigurable system based on large-scale coarseness and processing method thereof
CN104134210A (en) * 2014-07-22 2014-11-05 兰州交通大学 2D-3D medical image parallel registration method based on combination similarity measure
US20140331025A1 (en) * 2013-05-03 2014-11-06 Samsung Electronics Co., Ltd. Reconfigurable processor and operation method thereof
CN105302525A (en) * 2015-10-16 2016-02-03 上海交通大学 Parallel processing method for reconfigurable processor with multilayer heterogeneous structure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LOU, Jiechao et al.: "Design of an automatic task compiler framework for heterogeneous coarse-grained reconfigurable processors", Microelectronics & Computer (《微电子学与计算机》) *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105867994A (en) * 2016-04-20 2016-08-17 上海交通大学 Instruction scheduling optimization method for coarse-grained reconfigurable architecture complier
CN106095552A (en) * 2016-06-07 2016-11-09 华中科技大学 A kind of Multi-Task Graph processing method based on I/O duplicate removal and system
CN106095552B (en) * 2016-06-07 2019-06-28 华中科技大学 A kind of Multi-Task Graph processing method and system based on I/O duplicate removal
CN106648883A (en) * 2016-09-14 2017-05-10 上海鲲云信息科技有限公司 FPGA-based dynamic reconfigurable hardware acceleration method and system
CN106648883B (en) * 2016-09-14 2020-02-04 深圳鲲云信息科技有限公司 Dynamic reconfigurable hardware acceleration method and system based on FPGA
CN114168526B (en) * 2017-03-14 2024-01-12 珠海市芯动力科技有限公司 Reconfigurable parallel processing
CN114168526A (en) * 2017-03-14 2022-03-11 珠海市芯动力科技有限公司 Reconfigurable parallel processing
CN110275771B (en) * 2018-03-15 2021-12-14 中国移动通信集团有限公司 Service processing method, Internet of things charging infrastructure system and storage medium
CN110275771A (en) * 2018-03-15 2019-09-24 中国移动通信集团有限公司 A kind of method for processing business, Internet of Things billing infrastructure system and storage medium
CN109672524A (en) * 2018-12-12 2019-04-23 东南大学 SM3 algorithm wheel iteration system and alternative manner based on coarseness reconstruction structure
CN109672524B (en) * 2018-12-12 2021-08-20 东南大学 SM3 algorithm round iteration system and iteration method based on coarse-grained reconfigurable architecture
CN110059050B (en) * 2019-04-28 2023-07-25 北京美联东清科技有限公司 AI supercomputer based on high-performance reconfigurable elastic calculation
CN110096474A (en) * 2019-04-28 2019-08-06 北京超维度计算科技有限公司 A kind of high-performance elastic computing architecture and method based on Reconfigurable Computation
CN110059050A (en) * 2019-04-28 2019-07-26 北京超维度计算科技有限公司 AI supercomputer based on the restructural elastic calculation of high-performance
CN112579090A (en) * 2019-09-27 2021-03-30 无锡江南计算技术研究所 Asynchronous parallel I/O programming framework method under heterogeneous many-core architecture
CN110765046A (en) * 2019-11-07 2020-02-07 首都师范大学 DMA transmission device and method for dynamically reconfigurable high-speed serial bus
CN111897580B (en) * 2020-09-29 2021-01-12 北京清微智能科技有限公司 Instruction scheduling system and method for reconfigurable array processor
CN111897580A (en) * 2020-09-29 2020-11-06 北京清微智能科技有限公司 Instruction scheduling system and method for reconfigurable array processor
CN111930319A (en) * 2020-09-30 2020-11-13 北京清微智能科技有限公司 Data storage and reading method and system for multi-library memory
CN111930319B (en) * 2020-09-30 2021-09-03 北京清微智能科技有限公司 Data storage and reading method and system for multi-library memory
CN112463719A (en) * 2020-12-04 2021-03-09 上海交通大学 In-memory computing method realized based on coarse-grained reconfigurable array
CN112559441A (en) * 2020-12-11 2021-03-26 清华大学无锡应用技术研究院 Control method of digital signal processor
CN112540793A (en) * 2020-12-18 2021-03-23 清华大学 Reconfigurable processing unit array supporting multiple access modes and control method and device
CN112486908A (en) * 2020-12-18 2021-03-12 清华大学 Hierarchical multi-RPU multi-PEA reconfigurable processor
WO2022134426A1 (en) * 2020-12-23 2022-06-30 北京清微智能科技有限公司 Instruction distribution method and system in reconfigurable processor, and storage medium
CN112256632A (en) * 2020-12-23 2021-01-22 北京清微智能科技有限公司 Instruction distribution method and system in reconfigurable processor
CN113553031A (en) * 2021-06-04 2021-10-26 中国人民解放军战略支援部队信息工程大学 Software definition variable structure computing framework and left-right brain integrated resource joint distribution method realized by using same
CN113568731A (en) * 2021-09-24 2021-10-29 苏州浪潮智能科技有限公司 Task scheduling method, chip and electronic equipment

Also Published As

Publication number Publication date
CN105487838B (en) 2018-01-26

Similar Documents

Publication Publication Date Title
CN105487838A (en) Task-level parallel scheduling method and system for dynamically reconfigurable processor
EP3404587B1 (en) Cnn processing method and device
US10095657B2 (en) Processor, accelerator, and direct memory access controller within a core reading/writing local synchronization flag area for parallel
Prakash et al. Energy-efficient execution of data-parallel applications on heterogeneous mobile platforms
CN103279445A (en) Computing method and super-computing system for computing task
Agullo et al. Multifrontal QR factorization for multicore architectures over runtime systems
CN104375805A (en) Method for simulating parallel computation process of reconfigurable processor through multi-core processor
US8615770B1 (en) System and method for dynamically spawning thread blocks within multi-threaded processing systems
CN107180010A (en) Heterogeneous computing system and method
CN101655828B (en) Design method for high efficiency super computing system based on task data flow drive
CN104657111A (en) Parallel computing method and device
CN114035916A (en) Method for compiling and scheduling calculation graph and related product
Madhu et al. Compiling HPC kernels for the REDEFINE CGRA
CN114970294A (en) Three-dimensional strain simulation PCG parallel optimization method and system based on Shenwei architecture
Segal et al. High level programming for heterogeneous architectures
Moustafa et al. 3D cartesian transport sweep for massively parallel architectures with PARSEC
US8959497B1 (en) System and method for dynamically spawning thread blocks within multi-threaded processing systems
Inagaki et al. Performance evaluation of a 3d-stencil library for distributed memory array accelerators
CN110262884B (en) Running method for multi-program multi-data-stream partition parallel in core group based on Shenwei many-core processor
Silberstein GPUs: High-performance accelerators for parallel applications: the multicore transformation (ubiquity symposium)
CN107329818A Task scheduling processing method and device
Liu et al. Parallel implementation and optimization of regional ocean modeling system (ROMS) based on sunway SW26010 many-core processor
CN112148361B (en) Method and system for transplanting encryption algorithm of processor
Nikov et al. High-performance simultaneous multiprocessing for heterogeneous System-on-Chip
Taghiyev et al. Parallel matrix multiplication for various implementations

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant