CN106326184A - CPU (Central Processing Unit), GPU (Graphic Processing Unit) and DSP (Digital Signal Processor)-based heterogeneous computing framework - Google Patents


Info

Publication number
CN106326184A
Authority
CN
China
Prior art keywords
dsp
cpu
gpu
computing
data
Legal status
Pending
Application number
CN201610700761.7A
Other languages
Chinese (zh)
Inventor
朱焰冰
Current Assignee
Chengdu Calabar Inforamtion Technology Ltd By Share Ltd
Original Assignee
Chengdu Calabar Inforamtion Technology Ltd By Share Ltd
Priority date
2016-08-23
Filing date
2016-08-23
Publication date
2017-01-11

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a heterogeneous computing framework based on a CPU (Central Processing Unit), a GPU (Graphics Processing Unit) and a DSP (Digital Signal Processor). The framework comprises a host module and a device-side module. The host module comprises a CPU, a DSP module and a DRAM (Dynamic Random Access Memory). The DSP module comprises a storage unit, a control unit and a computing unit: the control unit commands the storage unit to read data from the DRAM and store it; at the same time, the control unit commands the computing unit to read the data from the storage unit and compute; when the computation is finished, the computing unit sends a computation-complete signal to the control unit, which then commands the storage unit to receive the data returned by the computing unit. The device-side module comprises a DRAM local memory and a GPU chip; the GPU chip consists of a plurality of multiprocessors and handles highly parallel computation on large-scale data without logical relationships. By combining the general-purpose processing capability of the CPU, the highly parallel computing capability of the GPU and the data-processing capability of the DSP, the framework shares the data-processing burden of the CPU and improves both data computation speed and data-processing capability.

Description

Heterogeneous computing framework based on CPU, GPU and DSP
Technical field
The present invention relates to a computing framework, and specifically to a heterogeneous computing framework based on a CPU, a GPU and a DSP.
Background art
In recent years, simply raising the CPU clock frequency has no longer produced a marked improvement in overall system performance. Moreover, power consumption rises with clock frequency, and heat dissipation has increasingly become an insurmountable obstacle. CPU multithreading and multi-core technology appear to multiply CPU performance, but in essence they do not solve the power-consumption and manufacturing-process problems, which are the bottleneck that limits further increases in CPU frequency.
High-end GPUs already have hundreds of stream processing cores, and their computing performance has reached or exceeded the TFLOPS level, comparable to a high-performance computing cluster (HPCC) and far beyond the computing power of mainstream CPUs. It is therefore desirable to use the high computing power of the GPU in general-purpose computing fields beyond graphics display, such as data processing and scientific computing. What the GPU excels at, however, is highly parallel numerical computation, whether graphical or not: a GPU can host thousands of numerical computation threads that have no logical relationship to one another, and its strength is parallel computation on such independent data. At present, the numerical advantage of the GPU lies mainly in floating-point arithmetic, which it performs quickly through massive parallelism; this parallelism brings no advantage, however, when executing logical decisions. More specifically, the GPU is particularly suited to data-parallel problems, in which the same program operates on many data elements in parallel and the computation has high arithmetic intensity.
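To make "data-parallel computation without logical relationships" concrete, the minimal Python sketch below applies the same kernel once per array index, and each invocation reads and writes only its own element, so the invocations can run in any order. The names `saxpy_kernel`, `launch` and `saxpy` are hypothetical, and the thread pool only models the launch pattern; CPython threads do not deliver GPU-style numeric throughput.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_kernel(i, a, x, y, out):
    # Each "thread" computes exactly one element and never reads another
    # thread's result, so there is no logical dependence between iterations.
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args, workers=8):
    # Hypothetical launcher: run the same kernel once per index, in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any exception from a worker.
        list(pool.map(lambda i: kernel(i, *args), range(n)))


def saxpy(a, x, y):
    out = [0.0] * len(x)
    launch(saxpy_kernel, len(x), a, x, y, out)
    return out
```

By contrast, a loop in which element i depends on element i-1 (a running total, say) cannot be split this way without restructuring, which is the kind of work the text leaves to the CPU.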
The two most notable characteristics of the DSP are its powerful data-processing capability and its high operating speed. Because it offers strong computing power, high speed and small size, and because software programming gives it a high degree of flexibility, it provides an effective means of handling a wide range of complex applications.
It can thus be seen that the GPU has real advantages for general-purpose computing, but its characteristics prevent it from replacing the CPU in tasks that involve complex instruction scheduling, loops, branches and logical decisions, such as operating systems, system software and general-purpose application programs.
Summary of the invention
The technical problem to be solved by the present invention is to offload work from the CPU and increase computation speed. Its object is to provide a heterogeneous computing framework based on a CPU, a GPU and a DSP that combines the general-purpose processing of the CPU, the highly parallel computation of the GPU and the data-processing capability of the DSP, thereby improving data computation speed and data-processing capability.
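The division of labour argued above can be summarised as a routing rule. The sketch below is an illustrative assumption, not a policy stated in the patent: the `Task` fields, the `route` function and the `gpu_threshold` cut-off are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Unit(Enum):
    CPU = auto()
    DSP = auto()
    GPU = auto()


@dataclass
class Task:
    name: str
    control_heavy: bool  # complex scheduling, loops with branches, logical decisions
    independent: bool    # data elements have no logical relationship to one another
    numeric: bool        # dense arithmetic such as filtering or transforms
    elements: int        # size of the data set


def route(task: Task, gpu_threshold: int = 10_000) -> Unit:
    """Hypothetical routing rule reflecting the division of labour in the text."""
    if task.control_heavy:
        return Unit.CPU  # complex control flow stays on the CPU
    if task.independent and task.elements >= gpu_threshold:
        return Unit.GPU  # large, dependence-free data: massively parallel
    if task.numeric:
        return Unit.DSP  # remaining number crunching is offloaded from the CPU
    return Unit.CPU      # simple computation tasks stay on the CPU
```

The threshold stands in for whatever point the transfer and launch overheads stop dominating; a real system would measure it rather than hard-code it.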
The present invention is achieved through the following technical solution:
A heterogeneous computing framework based on a CPU, a GPU and a DSP comprises a host module and a device-side module. The host module comprises a CPU, a DSP module and DRAM memory. The CPU is responsible for the general processing of the operating system, system software and general-purpose application programs, which involve complex instruction scheduling, loops, branches and logical decisions, as well as for simple computation tasks; it also exchanges data with, and stores data in, the DRAM memory. The DSP module comprises a storage unit, a control unit and a computing unit. The control unit sends an instruction to the storage unit, which reads data from the DRAM memory and stores it; at the same time, the control unit sends an instruction to the computing unit, which reads the data from the storage unit and performs the computation. After completing the computation, the computing unit sends a computation-complete signal to the control unit; on receiving this signal, the control unit instructs the storage unit to receive the data returned by the computing unit. The device-side module comprises a DRAM local memory and a GPU chip; the GPU chip consists of multiprocessors and is responsible for highly parallel computation on large-scale data whose elements have no logical relationship to one another.
The CPU supports an operating system, and a CPU-centered system makes human-computer interaction and communication with standard interface devices convenient without any hardware development; the peripheral interface circuitry of the CPU, however, is complex. The DSP is used mainly in embedded signal-processing systems, which do not emphasize human-computer interaction and generally need few communication interfaces; its structure is simple and it is easy to develop for. The respective strengths and weaknesses of the CPU and the DSP therefore complement each other, so that the strengths of each are fully exploited and their weaknesses are masked. The DSP is a branch of CPU development: it is programmable, and its actual running speed can reach tens of millions of complex instructions per second, far exceeding general-purpose microprocessors. It has strong computing power, high speed and small size, and software programming gives it great flexibility. Combining the DSP with the CPU effectively shares the data-computation burden of the CPU while also increasing computation speed.
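The storage/control/computing-unit handshake described above can be modelled with two events: one telling the computing unit that its data has arrived in storage, and one carrying the computation-complete signal back to the control unit. This is a minimal software sketch under assumed names (`StorageUnit`, `ComputingUnit`, `ControlUnit`, and a Python list standing in for DRAM); it does not reflect the internal design of any particular DSP.

```python
import threading


class StorageUnit:
    def __init__(self):
        self.buffer = []
        self.results = []
        self.data_ready = threading.Event()  # set once DRAM data is in storage

    def load_from_dram(self, dram, start, count):
        # Copy a block from (simulated) DRAM into on-chip storage.
        self.buffer = dram[start:start + count]
        self.data_ready.set()

    def receive(self, results):
        # Accept the data returned by the computing unit.
        self.results = results


class ComputingUnit:
    def __init__(self, storage, op):
        self.storage = storage
        self.op = op
        self.output = None
        self.done = threading.Event()  # the "computation complete" signal

    def run(self):
        self.storage.data_ready.wait()  # read only after storage holds the data
        self.output = [self.op(v) for v in self.storage.buffer]
        self.done.set()


class ControlUnit:
    def __init__(self, op):
        self.storage = StorageUnit()
        self.compute = ComputingUnit(self.storage, op)

    def process(self, dram, start, count):
        self.storage.data_ready.clear()
        self.compute.done.clear()
        # Issue both commands "at the same time": the computing unit starts
        # immediately and blocks until the storage unit reports its data.
        worker = threading.Thread(target=self.compute.run)
        worker.start()
        self.storage.load_from_dram(dram, start, count)
        self.compute.done.wait()                    # wait for the completion signal
        self.storage.receive(self.compute.output)  # storage takes the results
        worker.join()
        return self.storage.results
```

For example, `ControlUnit(lambda v: v * v).process([1, 2, 3, 4, 5], 1, 3)` squares the three words starting at offset 1 and returns `[4, 9, 16]`.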
Further, the DSP module is connected to the DRAM memory by a data bus and an address bus. The DSP keeps the data bus and the address bus separate so that programs and data are stored in two separate spaces (a Harvard-style arrangement), which allows instruction fetch and instruction execution to overlap completely.
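The benefit of letting instruction fetch overlap execution can be shown by simple cycle counting. Assuming, purely for illustration, a two-stage fetch/execute pipeline with fixed stage times and no stalls, n instructions take n * (fetch + execute) cycles when the stages run back to back, but fetch + execute + (n - 1) * max(fetch, execute) cycles when they overlap; with one-cycle stages that is 2n versus n + 1. The function names are hypothetical.

```python
def cycles_sequential(n, fetch=1, execute=1):
    # Shared path: each instruction is fetched, then executed, one after another.
    return n * (fetch + execute)


def cycles_overlapped(n, fetch=1, execute=1):
    # Separate program/data paths: the fetch of instruction i+1 proceeds while
    # instruction i executes, so after the first instruction each further one
    # costs only the longer of the two stages.
    if n == 0:
        return 0
    return fetch + execute + (n - 1) * max(fetch, execute)
```

With the defaults, 100 instructions need 200 cycles sequentially and 101 cycles overlapped, so throughput approaches one instruction per cycle as n grows.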
Further, the host module is connected to the device-side module by a high-speed serial bus, which offers ease of use, high speed, flexible connection and independent power supply.
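Because the GPU sits on the far side of the serial bus, offloading work to it only pays when the compute saving outweighs the transfer cost. The sketch below is a rough, hypothetical cost model: it assumes the results are the same size as the inputs, a fixed per-transfer latency and a constant link bandwidth, none of which the patent specifies, and the function names are illustrative.

```python
def offload_time(nbytes, bandwidth_bytes_per_s, latency_s, device_compute_s):
    # Copy inputs to the device, compute there, and copy equally sized
    # results back over the serial link (two transfers in total).
    transfer = 2 * (latency_s + nbytes / bandwidth_bytes_per_s)
    return transfer + device_compute_s


def should_offload(nbytes, bandwidth_bytes_per_s, latency_s,
                   device_compute_s, host_compute_s):
    # Offload only if the round trip plus device time beats staying on the host.
    return offload_time(nbytes, bandwidth_bytes_per_s, latency_s,
                        device_compute_s) < host_compute_s
```

For 1 MB at an assumed 1 GB/s with 10 microseconds of latency and 1 ms of device compute, the round trip costs about 3.02 ms, so offloading wins only if the host would need longer than that.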
Further, each multiprocessor in the GPU chip comprises registers and shared memory.
Further, the GPU chip contains at least one multiprocessor, and the host module contains at least one DSP module.
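Registers and shared memory are what let the threads of one multiprocessor cooperate. A common pattern (a standard CUDA-style technique, not one described in the patent) is a tree reduction: each block loads its slice into shared memory and halves the number of active threads each round, and the per-block partial sums from the multiprocessors are combined at the end. The Python below emulates that schedule sequentially; `block_reduce`, `gpu_sum` and `block_size` are hypothetical names.

```python
def block_reduce(chunk, block_size):
    # One block: load the chunk into a shared scratch array, padding with zeros
    # so that every "thread" has a slot.
    shared = list(chunk) + [0] * (block_size - len(chunk))
    stride = block_size // 2
    while stride > 0:
        # Threads tid < stride add in a partner from the upper half. They only
        # read slots >= stride, which no thread writes this round, so a
        # sequential loop gives the same result as the parallel version.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        # On real hardware a barrier (e.g. __syncthreads) separates rounds.
        stride //= 2
    return shared[0]


def gpu_sum(data, block_size=8):
    # The halving scheme requires a power-of-two block size.
    assert block_size > 0 and block_size & (block_size - 1) == 0
    partials = [block_reduce(data[i:i + block_size], block_size)
                for i in range(0, len(data), block_size)]
    return sum(partials)  # final combine of per-block results, e.g. on the host
```

Each round halves the active threads, so a block of size B finishes in log2(B) rounds instead of B - 1 sequential additions.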
Compared with the prior art, the present invention has the following advantages and beneficial effects: it combines the general-purpose processing of the CPU, the highly parallel computation of the GPU and the data-processing capability of the DSP, sharing the data-processing burden of the CPU and improving data computation speed and data-processing capability.
Description of the drawings
The drawings described here provide a further understanding of the embodiments of the present invention and constitute part of this application; they do not limit the embodiments of the present invention. In the drawings:
Fig. 1 is a schematic structural diagram of the present invention.
Detailed description of the embodiments
To make the object, technical solution and advantages of the present invention clearer, the present invention is described in further detail below with reference to an embodiment and the drawing. The exemplary embodiment of the present invention and its description serve only to explain the present invention and do not limit it.
Embodiment
In this embodiment, the CPU selected is a Pentium 4 631, the GPU is a Kepler GK110, and the DSP is a TMS320C67xx.
As shown in Fig. 1, the heterogeneous computing framework based on a CPU, a GPU and a DSP comprises a host module and a device-side module. The host module comprises a CPU, a DSP module and DRAM memory. The CPU is responsible for the general processing of the operating system, system software and general-purpose application programs, which involve complex instruction scheduling, loops, branches and logical decisions, as well as for simple computation tasks; it also exchanges data with, and stores data in, the DRAM memory. The DSP module comprises a storage unit, a control unit and a computing unit. The control unit sends an instruction to the storage unit, which reads data from the DRAM memory and stores it; at the same time, the control unit sends an instruction to the computing unit, which reads the data from the storage unit and performs the computation. After completing the computation, the computing unit sends a computation-complete signal to the control unit; on receiving this signal, the control unit instructs the storage unit to receive the data returned by the computing unit. The device-side module comprises a DRAM local memory and a GPU chip; the GPU chip consists of multiprocessors and is responsible for highly parallel computation on large-scale data whose elements have no logical relationship to one another.
The CPU supports an operating system, and a CPU-centered system makes human-computer interaction and communication with standard interface devices convenient without any hardware development; the peripheral interface circuitry of the CPU, however, is complex. The DSP is used mainly in embedded signal-processing systems, which do not emphasize human-computer interaction and generally need few communication interfaces; its structure is simple and it is easy to develop for. The respective strengths and weaknesses of the CPU and the DSP therefore complement each other, so that the strengths of each are fully exploited and their weaknesses are masked. The DSP is a branch of CPU development: it is programmable, and its actual running speed can reach tens of millions of complex instructions per second, far exceeding general-purpose microprocessors. It has strong computing power, high speed and small size, and software programming gives it great flexibility. Combining the DSP with the CPU effectively shares the data-computation burden of the CPU while also increasing computation speed. The DSP module is connected to the DRAM memory by a data bus and an address bus; the DSP keeps the two buses separate so that programs and data are stored in two separate spaces, which allows instruction fetch and instruction execution to overlap completely. The host module is connected to the device-side module by a high-speed serial bus, which offers ease of use, high speed, flexible connection and independent power supply. Each multiprocessor in the GPU chip comprises registers and shared memory. The GPU chip contains at least one multiprocessor, and the host module contains at least one DSP module.
The specific embodiment described above further details the object, technical solution and beneficial effects of the present invention. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution, improvement or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (5)

1. A heterogeneous computing framework based on a CPU, a GPU and a DSP, characterized in that it comprises a host module and a device-side module; the host module comprises a CPU, a DSP module and DRAM memory; the CPU is responsible for the operating system, system software and general-purpose application programs involving complex instruction scheduling, loops, branches and the general processing of logical decisions, as well as for simple computation tasks, and exchanges data with, and stores data in, the DRAM memory; the DSP module comprises a storage unit, a control unit and a computing unit; the control unit sends an instruction to the storage unit so that it reads data from the DRAM memory and stores the data in the storage unit, and at the same time sends an instruction to the computing unit so that the computing unit reads the data from the storage unit and performs the computation; after completing the computation, the computing unit sends a computation-complete signal to the control unit, and after receiving the computation-complete signal the control unit instructs the storage unit to receive the data returned by the computing unit; the device-side module comprises a DRAM local memory and a GPU chip, the GPU chip consisting of multiprocessors and being responsible for highly parallel computation on large-scale data without logical relationships.
2. The heterogeneous computing framework based on a CPU, a GPU and a DSP according to claim 1, characterized in that the DSP module is connected to the DRAM memory by a data bus and an address bus.
3. The heterogeneous computing framework based on a CPU, a GPU and a DSP according to claim 1, characterized in that the host module is connected to the device-side module by a high-speed serial bus.
4. The heterogeneous computing framework based on a CPU, a GPU and a DSP according to claim 1, characterized in that each multiprocessor in the GPU chip comprises registers and shared memory.
5. The heterogeneous computing framework based on a CPU, a GPU and a DSP according to claim 1, characterized in that the GPU chip contains at least one multiprocessor, and the host module contains at least one DSP module.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610700761.7A CN106326184A (en) 2016-08-23 2016-08-23 CPU (Central Processing Unit), GPU (Graphic Processing Unit) and DSP (Digital Signal Processor)-based heterogeneous computing framework

Publications (1)

Publication Number Publication Date
CN106326184A 2017-01-11

Family

ID=57741206

Country Status (1)

Country Link
CN (1) CN106326184A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101526934A (en) * 2009-04-21 2009-09-09 浪潮电子信息产业股份有限公司 Construction method of GPU and CPU combined processor
US20140181537A1 (en) * 2012-12-21 2014-06-26 Advanced Micro Devices, Inc. Guardband reduction for multi-core data processor
CN103914418A (en) * 2013-01-07 2014-07-09 三星电子株式会社 Processor module, micro-server, and method of using processor module

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897582A (en) * 2017-01-25 2017-06-27 人和未来生物科技(长沙)有限公司 A kind of heterogeneous platform understood towards gene data
CN106897582B (en) * 2017-01-25 2018-03-09 人和未来生物科技(长沙)有限公司 A kind of heterogeneous platform understood towards gene data
CN107273663A (en) * 2017-05-22 2017-10-20 人和未来生物科技(长沙)有限公司 A kind of DNA methylation sequencing data calculates deciphering method
CN107273663B (en) * 2017-05-22 2018-12-11 人和未来生物科技(长沙)有限公司 A kind of DNA methylation sequencing data calculating deciphering method
CN107564522A (en) * 2017-09-18 2018-01-09 郑州云海信息技术有限公司 A kind of intelligent control method and device
CN108509220A (en) * 2018-04-02 2018-09-07 厦门海迈科技股份有限公司 Revit engineering calculation amounts method for parallel processing, device, terminal and medium
CN108509220B (en) * 2018-04-02 2021-01-22 厦门海迈科技股份有限公司 Revit engineering calculation amount parallel processing method, device, terminal and medium
CN111274996A (en) * 2020-02-14 2020-06-12 深圳英飞拓智能技术有限公司 Face picture feature comparison method and device, computer equipment and storage medium
CN111274996B (en) * 2020-02-14 2023-06-09 深圳英飞拓仁用信息有限公司 Face picture feature comparison method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170111
