CN106326184A - CPU (Central Processing Unit), GPU (Graphic Processing Unit) and DSP (Digital Signal Processor)-based heterogeneous computing framework - Google Patents
- Publication number: CN106326184A
- Application number: CN201610700761.7A
- Authority: CN (China)
- Prior art keywords: dsp, cpu, gpu, computing, data
- Legal status: Pending (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
The invention discloses a heterogeneous computing framework based on a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and a DSP (Digital Signal Processor). The framework comprises a host module and a device-side module. The host module comprises a CPU, a DSP module, and DRAM (Dynamic Random Access Memory); the DSP module comprises a storage unit, a control unit, and a computing unit. The control unit sends a command to the storage unit, causing it to read data from the DRAM and store the data locally; at the same time, the control unit sends a command to the computing unit, causing it to read the data from the storage unit and compute. When the computation is complete, the computing unit sends a completion signal to the control unit, which then commands the storage unit to receive the data returned by the computing unit. The device-side module comprises local DRAM and a GPU chip; the GPU chip consists of multiple processors and handles highly parallel processing of large-scale data with no logical dependencies between elements. By combining the versatile processing capability of the CPU, the highly parallel computing capability of the GPU, and the data-processing capability of the DSP, the framework shares the CPU's data-processing burden and increases both computing speed and data-processing capacity.
Description
Technical field
The present invention relates to computing frameworks, and specifically to a heterogeneous computing framework based on a CPU, a GPU, and a DSP.
Background
In recent years, simply raising the CPU clock frequency has no longer improved overall system performance significantly. Moreover, power consumption grows as the clock frequency rises, and heat dissipation has increasingly become an insurmountable obstacle. CPU multithreading and multi-core technology appear to multiply CPU performance, but in essence they cannot solve the power-consumption and manufacturing-process problems, which are the bottleneck that raising the CPU frequency runs into.
High-end GPUs already have hundreds of stream-processing cores, and their computing performance has reached or exceeded the TFLOPS level (trillions of floating-point operations per second), comparable to a high-performance computing cluster and far beyond the computing capability of mainstream CPUs. It is therefore desirable to use this computing power for general-purpose computing beyond graphics display, such as data processing and scientific computing. However, what a GPU does well is highly parallel numerical computation, whether graphical or not: a GPU can host thousands of numerical-computation threads that have no logical relationship to one another, and its advantage lies in parallel computation on data without logical dependencies. At present, the GPU's numerical advantage lies mainly in floating-point operations, which it executes quickly through massive parallelism, but this parallelism brings no advantage when executing logical decisions. More specifically, the GPU is particularly suited to data-parallel problems, in which the same operation is applied to many data elements in parallel, with high arithmetic intensity.
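As an illustration of what "data without a logical relationship" means in practice, the following Python sketch applies one operation to every element independently, so the input can be split into chunks and processed by separate workers in any order; a running sum is included as a counter-example whose outputs depend on earlier ones. Python threads here merely stand in for GPU threads, and the function names and chunking scheme are illustrative assumptions, not part of the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_and_offset(x: float) -> float:
    # The same operation is applied to every element; no element reads another.
    return 2.0 * x + 1.0


def data_parallel_map(data: list[float], workers: int = 4) -> list[float]:
    # Split the input into contiguous chunks, one per worker (hypothetical scheme).
    chunk = max(1, -(-len(data) // workers))  # ceiling division, at least 1
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda c: [scale_and_offset(x) for x in c], chunks)
        # pool.map yields results in submission order, so element order is preserved.
        return [y for part in parts for y in part]


def running_sum(data: list[float]) -> list[float]:
    # Counter-example: each output depends on the previous accumulator value
    # (a loop-carried dependency), so elements cannot simply be handed to
    # independent workers.
    out, acc = [], 0.0
    for x in data:
        acc += x
        out.append(acc)
    return out


if __name__ == "__main__":
    values = [float(i) for i in range(8)]
    print(data_parallel_map(values))
    print(running_sum(values))
```

Because `scale_and_offset` never looks at neighboring elements, the chunk boundaries and the number of workers do not affect the result, which is exactly the property that lets a GPU run thousands of such threads at once.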
The powerful data-processing capability and high operating speed of the DSP are its two most notable features. Because it has very strong computing power, high speed, and a very small size, and because software programming gives it a high degree of flexibility, the DSP offers an effective way to handle a wide variety of complex applications.
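A typical example of the data processing a DSP is built for is a finite impulse response (FIR) filter, whose inner loop is a chain of multiply-accumulate (MAC) operations. The sketch below is a plain-Python reference for that computation, chosen by us as an illustration; it is not code for any particular DSP, and the function name and tap values are assumptions.

```python
def fir_filter(samples: list[float], taps: list[float]) -> list[float]:
    """Direct-form FIR filter: y[n] = sum_k taps[k] * x[n - k], with x[m] = 0 for m < 0."""
    output = []
    for n in range(len(samples)):
        acc = 0.0  # accumulator, analogous to a DSP's MAC accumulator register
        for k, h in enumerate(taps):
            if n - k < 0:
                break  # samples before the start of the input are treated as zero
            acc += h * samples[n - k]  # one multiply-accumulate step
        output.append(acc)
    return output


if __name__ == "__main__":
    # Two-tap moving average over a short ramp.
    print(fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))  # prints [0.5, 1.5, 2.5, 3.5]
```

Each output sample costs one MAC per tap, which is why hardware that completes a MAC per cycle makes such filters fast.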
It can thus be seen that, although the GPU has advantages for general-purpose computing, its characteristics mean it cannot replace the CPU in running the operating system, system software, and general-purpose applications, tasks that involve complex instruction scheduling, loops, branches, and logical decisions.
Summary of the invention
The technical problem to be solved by the present invention is to offload tasks from the CPU and increase computing speed. Its purpose is to provide a heterogeneous computing framework based on a CPU, a GPU, and a DSP that combines the versatile processing of the CPU, the highly parallel computing of the GPU, and the data-processing capability of the DSP, thereby improving data computing speed and data-processing capacity.
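One way to picture this division of labor is a dispatcher that routes each task to the unit suited to it: control-heavy work to the CPU, large dependency-free data-parallel work to the GPU, and signal-processing work to the DSP. The patent does not specify a scheduling policy; the task attributes, the size threshold, the order of the checks, and all names in this Python sketch are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Unit(Enum):
    CPU = "cpu"
    DSP = "dsp"
    GPU = "gpu"


@dataclass
class Task:
    name: str
    complex_control: bool      # scheduling, loops, branches, logical decisions
    signal_processing: bool    # e.g. filtering a stream of samples
    independent_elements: int  # data elements with no mutual dependency


# Hypothetical cut-off above which dependency-free work is sent to the GPU.
GPU_MIN_ELEMENTS = 10_000


def dispatch(task: Task) -> Unit:
    if task.complex_control:
        return Unit.CPU  # control-heavy work stays on the CPU
    if task.independent_elements >= GPU_MIN_ELEMENTS:
        return Unit.GPU  # large-scale data without logical relationships
    if task.signal_processing:
        return Unit.DSP  # offload data computation from the CPU
    return Unit.CPU      # simple computation tasks remain on the CPU


if __name__ == "__main__":
    for t in [
        Task("scheduler", True, False, 0),
        Task("fir", False, True, 4_096),
        Task("image_scale", False, False, 2_000_000),
    ]:
        print(t.name, dispatch(t).name)
```

Checking for complex control first reflects the point made in the background section: the GPU, however fast, is not meant to take over branch-heavy work from the CPU.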
The present invention is achieved through the following technical solutions:
A heterogeneous computing framework based on a CPU, a GPU, and a DSP comprises a host module and a device-side module. The host module comprises a CPU, a DSP module, and DRAM. The CPU is responsible for the general processing of the operating system, system software, and general-purpose applications, which involve complex instruction scheduling, loops, branches, and logical decisions, as well as for simple computing tasks; it also exchanges data with and stores data in the DRAM. The DSP module comprises a storage unit, a control unit, and a computing unit. The control unit sends an instruction to the storage unit, causing it to read data from the DRAM and store the data in the storage unit; at the same time, the control unit sends an instruction to the computing unit, causing it to read the data from the storage unit and perform the computation. After completing the computation, the computing unit sends a completion signal to the control unit; on receiving this signal, the control unit instructs the storage unit to receive the data returned by the computing unit. The device-side module comprises local DRAM and a GPU chip; the GPU chip is composed of multiprocessors and is responsible for highly parallel processing of large-scale data with no logical dependencies between elements.

A CPU supports an operating system, and a CPU-centered system makes human-machine interaction and communication with standard interface devices convenient without requiring hardware development, but the CPU's peripheral interface circuitry is complex. A DSP is used mainly in embedded signal-processing systems; it does not emphasize human-machine interaction, generally needs few communication interfaces, and is simple in structure and easy to develop for. The strengths and weaknesses of the CPU and the DSP therefore complement each other, so that the strengths of each are fully exploited and the weaknesses are masked. The DSP is a specialized branch of processor development: it is programmable, and its real-time execution speed can reach tens of millions of complex instructions per second, far exceeding that of general-purpose microprocessors. Because it has strong computing power, high speed, small size, and the high flexibility of software programming, combining the DSP with the CPU can effectively take over part of the CPU's data-computation burden while also increasing computing speed.
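The command sequence inside the DSP module can be modeled as follows: the control unit has the storage unit load a block from DRAM, lets the computing unit work on it, waits for the completion signal, and then has the storage unit take back the results. This Python sketch rests on stated assumptions: the class and method names are hypothetical, DRAM is modeled as a plain list, the completion signal is a `threading.Event`, and the load finishes before computation starts, whereas the text has the control unit issue both commands at the same time.

```python
import threading
from typing import Callable


class StorageUnit:
    def __init__(self) -> None:
        self.buffer: list = []
        self.results: list = []

    def load_from_dram(self, dram: list, start: int, count: int) -> None:
        # Command 1: copy a block of data from DRAM into the storage unit.
        self.buffer = dram[start:start + count]

    def receive(self, data: list) -> None:
        # Command 3: accept the results returned by the computing unit.
        self.results = list(data)


class ComputingUnit:
    def __init__(self, storage: StorageUnit, op: Callable) -> None:
        self.storage = storage
        self.op = op
        self.output: list = []

    def run(self, done: threading.Event) -> None:
        # Command 2: read operands from the storage unit and compute.
        self.output = [self.op(x) for x in self.storage.buffer]
        done.set()  # "computation complete" signal to the control unit


class ControlUnit:
    def __init__(self, op: Callable) -> None:
        self.storage = StorageUnit()
        self.compute = ComputingUnit(self.storage, op)

    def process(self, dram: list, start: int, count: int) -> list:
        self.storage.load_from_dram(dram, start, count)
        done = threading.Event()
        worker = threading.Thread(target=self.compute.run, args=(done,))
        worker.start()
        done.wait()  # block until the completion signal arrives
        worker.join()
        self.storage.receive(self.compute.output)
        return self.storage.results


if __name__ == "__main__":
    control = ControlUnit(lambda x: x * x)
    print(control.process([1, 2, 3, 4, 5], 1, 3))  # prints [4, 9, 16]
```

Running the computing unit on its own thread and waiting on the event mirrors the handshake in the text: the control unit does not collect results until the computing unit reports completion.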
Further, the DSP module is connected to the DRAM by a data bus and an address bus. The DSP keeps the data bus and the address bus separate, so that program and data are stored in two separate spaces, allowing instruction fetch and instruction execution to overlap completely.
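The benefit of fully overlapping instruction fetch with execution can be quantified with a simple cycle count. This Python sketch assumes, purely for illustration, that fetching and executing each take exactly one cycle and that nothing stalls; keeping program and data in separate spaces (a Harvard-style arrangement) is what lets the next fetch proceed while the current instruction uses the data path.

```python
def cycles_sequential(n_instructions: int) -> int:
    # Fetch and execute cannot overlap: one fetch cycle plus one execute cycle each.
    return 2 * n_instructions


def cycles_overlapped(n_instructions: int) -> int:
    # While instruction i executes, instruction i + 1 is fetched. After a
    # one-cycle fill, one instruction completes every cycle.
    return n_instructions + 1 if n_instructions > 0 else 0


def schedule(n_instructions: int) -> list[tuple]:
    """(cycle, instruction being fetched, instruction being executed) per cycle."""
    rows = []
    for cycle in range(cycles_overlapped(n_instructions)):
        fetching = cycle if cycle < n_instructions else None
        executing = cycle - 1 if cycle >= 1 else None
        rows.append((cycle, fetching, executing))
    return rows


if __name__ == "__main__":
    print(cycles_sequential(100), cycles_overlapped(100))  # prints 200 101
    for row in schedule(3):
        print(row)
```

Under these assumptions the overlapped design approaches twice the throughput of the sequential one as the instruction count grows.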
Further, the host module is connected to the device-side module by a high-speed serial bus, which offers ease of use, high speed, flexible connection, and independent power supply.
Further, each multiprocessor in the GPU chip comprises registers and shared memory.
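Registers and shared memory are what let the threads on one multiprocessor cooperate cheaply. This Python sketch models, sequentially, a common pattern on such hardware: a tree reduction in which each thread stores its register value into shared memory and the number of active threads halves at each step. The block size, the zero padding, and the function name are assumptions; on a real GPU each step would be separated by a barrier such as CUDA's `__syncthreads()`.

```python
def block_sum(values: list[int], block_size: int = 8) -> int:
    """Sequential model of one thread block summing up to block_size values.

    Index i plays the role of thread i. Inputs beyond block_size are ignored,
    and missing inputs are padded with zeros.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a positive power of two")
    # Per-thread "registers": one value per thread.
    registers = list(values[:block_size]) + [0] * max(0, block_size - len(values))
    # Each thread writes its register value into the block's shared memory.
    shared = registers.copy()
    stride = block_size // 2
    while stride > 0:
        # Threads tid < stride add their partner's value. Within one step the
        # reads (indices >= stride) never overlap the writes (indices < stride),
        # so this sequential loop gives the same result as the parallel version.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2  # a barrier would separate these steps on real hardware
    return shared[0]


if __name__ == "__main__":
    print(block_sum([1, 2, 3, 4, 5]))  # prints 15
```

A block of `block_size` threads finishes in log2(`block_size`) steps, with all intermediate values kept in fast on-chip storage rather than in the device's DRAM.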
Further, the GPU chip contains at least one multiprocessor, and the host module contains at least one DSP module.
Compared with the prior art, the present invention has the following advantages and beneficial effects: by combining the versatile processing of the CPU, the highly parallel computing of the GPU, and the data-processing capability of the DSP, it shares the data-processing load of the CPU and improves data computing speed and data-processing capacity.
Brief description of the drawings
The accompanying drawing described here is provided for a further understanding of the embodiments of the present invention and constitutes part of this application; it does not limit the embodiments of the present invention. In the drawing:
Fig. 1 is a schematic structural diagram of the present invention.
Detailed description of the invention
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to an embodiment and the accompanying drawing. The exemplary embodiment and its description serve only to explain the invention and do not limit it.
Embodiment
In this embodiment, the selected CPU is a Pentium 4 631, the selected GPU is a Kepler GK110, and the selected DSP is a TMS320C67XX.
As shown in Fig. 1, the heterogeneous computing framework based on a CPU, a GPU, and a DSP comprises a host module and a device-side module. The host module comprises a CPU, a DSP module, and DRAM. The CPU is responsible for the general processing of the operating system, system software, and general-purpose applications, which involve complex instruction scheduling, loops, branches, and logical decisions, as well as for simple computing tasks; it also exchanges data with and stores data in the DRAM. The DSP module comprises a storage unit, a control unit, and a computing unit. The control unit sends an instruction to the storage unit, causing it to read data from the DRAM and store the data in the storage unit; at the same time, the control unit sends an instruction to the computing unit, causing it to read the data from the storage unit and perform the computation. After completing the computation, the computing unit sends a completion signal to the control unit; on receiving this signal, the control unit instructs the storage unit to receive the data returned by the computing unit. The device-side module comprises local DRAM and a GPU chip; the GPU chip is composed of multiprocessors and is responsible for highly parallel processing of large-scale data with no logical dependencies between elements.

A CPU supports an operating system, and a CPU-centered system makes human-machine interaction and communication with standard interface devices convenient without requiring hardware development, but the CPU's peripheral interface circuitry is complex. A DSP is used mainly in embedded signal-processing systems; it does not emphasize human-machine interaction, generally needs few communication interfaces, and is simple in structure and easy to develop for. The strengths and weaknesses of the CPU and the DSP therefore complement each other, so that the strengths of each are fully exploited and the weaknesses are masked. The DSP is a specialized branch of processor development: it is programmable, and its real-time execution speed can reach tens of millions of complex instructions per second, far exceeding that of general-purpose microprocessors. Because it has strong computing power, high speed, small size, and the high flexibility of software programming, combining the DSP with the CPU can effectively take over part of the CPU's data-computation burden while also increasing computing speed.

The DSP module is connected to the DRAM by a data bus and an address bus. The DSP keeps the data bus and the address bus separate, so that program and data are stored in two separate spaces, allowing instruction fetch and instruction execution to overlap completely. The host module is connected to the device-side module by a high-speed serial bus, which offers ease of use, high speed, flexible connection, and independent power supply. Each multiprocessor in the GPU chip comprises registers and shared memory. The GPU chip contains at least one multiprocessor, and the host module contains at least one DSP module.
The specific embodiment described above further details the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (5)
1. A heterogeneous computing framework based on a CPU, a GPU, and a DSP, characterized by comprising a host module and a device-side module; the host module comprises a CPU, a DSP module, and DRAM; the CPU is responsible for the operating system, system software, and general-purpose applications involving complex instruction scheduling, loops, branches, and general logical-decision processing, as well as for simple computing tasks, and exchanges data with and stores data in the DRAM; the DSP module comprises a storage unit, a control unit, and a computing unit; the control unit sends an instruction to the storage unit causing it to read data from the DRAM and store the data in the storage unit, and at the same time sends an instruction to the computing unit causing it to read the data from the storage unit and perform computation; after completing the computation, the computing unit sends a completion signal to the control unit, and after receiving the completion signal the control unit sends an instruction to the storage unit to receive the data returned by the computing unit; the device-side module comprises local DRAM and a GPU chip, the GPU chip being composed of multiprocessors and being responsible for highly parallel processing of large-scale data with no logical dependencies.
2. The heterogeneous computing framework based on a CPU, a GPU, and a DSP according to claim 1, characterized in that the DSP module is connected to the DRAM by a data bus and an address bus.
3. The heterogeneous computing framework based on a CPU, a GPU, and a DSP according to claim 1, characterized in that the host module is connected to the device-side module by a high-speed serial bus.
4. The heterogeneous computing framework based on a CPU, a GPU, and a DSP according to claim 1, characterized in that each multiprocessor in the GPU chip comprises registers and shared memory.
5. The heterogeneous computing framework based on a CPU, a GPU, and a DSP according to claim 1, characterized in that the GPU chip contains at least one multiprocessor, and the host module contains at least one DSP module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610700761.7A CN106326184A (en) | 2016-08-23 | 2016-08-23 | CPU (Central Processing Unit), GPU (Graphic Processing Unit) and DSP (Digital Signal Processor)-based heterogeneous computing framework |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106326184A (en) | 2017-01-11 |
Family ID: 57741206
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610700761.7A Pending CN106326184A (en) | 2016-08-23 | 2016-08-23 | CPU (Central Processing Unit), GPU (Graphic Processing Unit) and DSP (Digital Signal Processor)-based heterogeneous computing framework |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106326184A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101526934A (en) * | 2009-04-21 | 2009-09-09 | 浪潮电子信息产业股份有限公司 | Construction method of GPU and CPU combined processor |
US20140181537A1 (en) * | 2012-12-21 | 2014-06-26 | Advanced Micro Devices, Inc. | Guardband reduction for multi-core data processor |
CN103914418A (en) * | 2013-01-07 | 2014-07-09 | 三星电子株式会社 | Processor module, micro-server, and method of using processor module |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106897582A (en) * | 2017-01-25 | 2017-06-27 | 人和未来生物科技(长沙)有限公司 | A kind of heterogeneous platform understood towards gene data |
CN106897582B (en) * | 2017-01-25 | 2018-03-09 | 人和未来生物科技(长沙)有限公司 | A kind of heterogeneous platform understood towards gene data |
CN107273663A (en) * | 2017-05-22 | 2017-10-20 | 人和未来生物科技(长沙)有限公司 | A kind of DNA methylation sequencing data calculates deciphering method |
CN107273663B (en) * | 2017-05-22 | 2018-12-11 | 人和未来生物科技(长沙)有限公司 | A kind of DNA methylation sequencing data calculating deciphering method |
CN107564522A (en) * | 2017-09-18 | 2018-01-09 | 郑州云海信息技术有限公司 | A kind of intelligent control method and device |
CN108509220A (en) * | 2018-04-02 | 2018-09-07 | 厦门海迈科技股份有限公司 | Revit engineering calculation amounts method for parallel processing, device, terminal and medium |
CN108509220B (en) * | 2018-04-02 | 2021-01-22 | 厦门海迈科技股份有限公司 | Revit engineering calculation amount parallel processing method, device, terminal and medium |
CN111274996A (en) * | 2020-02-14 | 2020-06-12 | 深圳英飞拓智能技术有限公司 | Face picture feature comparison method and device, computer equipment and storage medium |
CN111274996B (en) * | 2020-02-14 | 2023-06-09 | 深圳英飞拓仁用信息有限公司 | Face picture feature comparison method, device, computer equipment and storage medium |
Similar Documents
Publication | Title |
---|---|
CN106326184A (en) | CPU (Central Processing Unit), GPU (Graphic Processing Unit) and DSP (Digital Signal Processor)-based heterogeneous computing framework |
CN102375800B (en) | For the multiprocessor systems on chips of machine vision algorithm |
US20170109213A1 (en) | Work stealing in heterogeneous computing systems |
US10289604B2 (en) | Memory processing core architecture |
US10255228B2 (en) | System and method for performing shaped memory access operations |
US20120079155A1 (en) | Interleaved Memory Access from Multiple Requesters |
CN104699631A (en) | Storage device and fetching method for multilayered cooperation and sharing in GPDSP (General-Purpose Digital Signal Processor) |
CN103197916A (en) | Methods and apparatus for source operand collector caching |
EP2808783B1 (en) | Smart cache and smart terminal |
CN101833441B (en) | Parallel vector processing engine structure |
TWI666551B (en) | Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines |
CN103226463A (en) | Methods and apparatus for scheduling instructions using pre-decode data |
CN102640127A (en) | Configurable cache for multiple clients |
CN104050032A (en) | System and method for hardware scheduling of conditional barriers and impatient barriers |
CN103020002A (en) | Reconfigurable multiprocessor system |
Islam et al. | Improving node-level mapreduce performance using processing-in-memory technologies |
CN112527729A (en) | Tightly-coupled heterogeneous multi-core processor architecture and processing method thereof |
CN105718990B (en) | Communication means between cellular array computing system and wherein cell |
CN103365821A (en) | Address generator of heterogeneous multi-core processor |
CN114116167B (en) | High-performance computing-oriented regional autonomous heterogeneous many-core processor |
US20240054081A1 (en) | Controlling access to a memory shared by a cluster of multiple processing elements |
CN201444298U (en) | Communication module between multi-core processor and second level caches |
CN105718993B (en) | Cellular array computing system and communication means therein |
US7650483B2 (en) | Execution of instructions within a data processing apparatus having a plurality of processing units |
Papadopoulos et al. | Performance and power consumption evaluation of concurrent queue implementations in embedded systems |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170111 |