CN107066802A - Heterogeneous platform for gene data computation - Google Patents
- Publication number: CN107066802A
- Application number: CN201710055558.3A
- Authority: CN (China)
- Prior art keywords: data, gene, code segment, gpu, cpu
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
Abstract
The invention discloses a heterogeneous platform for gene data computation, comprising a heterogeneous processor unit, an interconnection module, memory, a gene computation data and instruction input unit, and a gene computation result output unit. The heterogeneous processor unit is connected through the interconnection module to the memory, the gene computation data and instruction input unit, and the gene computation result output unit, respectively. The heterogeneous processor unit comprises a CPU, a GPU, and an FPGA, wherein the CPU constitutes a control engine, and the CPU, GPU, and FPGA together constitute a computing engine. The invention provides hardware support for improving the real-time performance and accuracy of gene data computation, and offers high computational efficiency, low manufacturing cost, and low computing energy consumption.
Description
Technical field
The present invention relates to gene sequencing technology, and in particular to a heterogeneous platform for gene data computation.
Background
In recent years, with the wide adoption of next-generation sequencing (Next Generation Sequencing, hereinafter NGS), the cost of gene sequencing has fallen rapidly and gene technology has begun to enter widespread use. NGS comprises two steps: gene data computation and gene data interpretation. Gene data computation refers to preprocessing the raw gene sequencing data, for example by removing spurious data and duplicates, so that the data can be used for interpretation. Gene data interpretation refers to analyzing, revealing, and explaining the scientific meaning of the computed gene data in fields such as biology, medicine, and health care.
At present, one bottleneck restricting the clinical application of gene technology is the massive volume of gene data. For technical reasons, the raw gene data that NGS generates for a single sample is very large; for example, whole-genome sequencing (Whole-Genome Sequencing, WGS) data for a single sample exceeds 100G. Computing on even a single sample is therefore both a heavily input/output-bound and a highly compute-intensive task. In addition, the rapid spread of gene technology is causing the total amount of raw gene data generated by sequencing to grow exponentially. As a result, computing and transmitting gene data accurately and in real time has become extremely difficult and poses a huge challenge. The typical approach today is to process the data with multithreaded software on a computer cluster equipped with more, and more powerful, high-performance processors. Such a system has several drawbacks. First, its costs in storage, power consumption, technical support, and maintenance are high. Second, with accuracy guaranteed, the parallel speedup it can achieve still falls short of the challenge described above. More importantly, the growth of raw sequencing data far outpaces Moore's Law, so this approach is not sustainable.
At present, the processor types commonly found in computer systems are the central processing unit (Central Processing Unit, CPU), the field programmable gate array (Field Programmable Gate Array, FPGA), and the graphics processing unit (Graphics Processing Unit, GPU). A current high-performance CPU usually contains multiple processor cores (Processor Core) and supports multithreading in hardware, but it is still designed for general-purpose applications. Compared with special-purpose computation, general-purpose applications have less parallelism, need more complex control, and have lower performance targets. The on-chip hardware resources of a CPU are therefore devoted mainly to implementing complex control rather than computation, they include no dedicated hardware for specific functions, and the degree of computational parallelism they can support is low. An FPGA is a semi-custom circuit. Its advantages are that FPGA-based system development has a short design cycle and low development cost, that power consumption is low, and that the configuration can be modified after production, which gives high design flexibility and low design risk. Its disadvantage is that, for the same function, an FPGA is generally slower than an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC) and occupies a larger circuit area. As the technology evolves, FPGAs are moving toward higher density, larger capacity, lower power consumption, and the integration of more hard intellectual property (Intellectual Property, IP) cores, so their disadvantages are shrinking and their advantages are growing. Compared with a CPU, an FPGA allows parallel computation to be custom-implemented, modified, and extended using a hardware description language. The GPU was originally a microprocessor dedicated to image processing, supporting basic graphics tasks such as texture mapping and polygon shading in hardware. Because graphics computation involves general mathematical operations such as matrix and vector operations, and because the GPU has a highly parallel architecture, GPU computing has emerged along with the development of the related software and hardware: the GPU is no longer limited to graphics processing but is also used for parallel computation in linear algebra, signal processing, numerical simulation, and similar fields, where it can deliver tens or even hundreds of times the performance of a CPU. However, current GPUs have two problems. First, owing to the characteristics of the GPU hardware architecture, many parallel algorithms cannot be executed efficiently on a GPU. Second, a GPU generates a large amount of heat in operation and its energy consumption is high. In summary, each of the three computing devices has its own strengths and its own limitations. Given the bottleneck in the clinical application of gene technology described above, how to build a hybrid-architecture platform from these processors to compute massive gene data has become a key technical problem that urgently needs to be solved.
Summary of the invention
The technical problem to be solved by the present invention: in view of the above problems of the prior art, to provide a heterogeneous platform for gene data computation that gives hardware support for improving the real-time performance and accuracy of gene data computation, and that has high computational efficiency, low manufacturing cost, and low computing energy consumption.
To solve the above technical problem, the present invention adopts the following technical solution:
A heterogeneous platform for gene data computation comprises a heterogeneous processor unit, an interconnection module, memory, a gene computation data and instruction input unit, and a gene computation result output unit. The heterogeneous processor unit is connected through the interconnection module to the memory, the gene computation data and instruction input unit, and the gene computation result output unit, respectively. The heterogeneous processor unit comprises a CPU, a GPU, and an FPGA, wherein the CPU constitutes a control engine, and the CPU, GPU, and FPGA together constitute a computing engine. The control engine receives gene computation data and instructions through the gene computation data and instruction input unit and divides them into code segments. When the task type of a code segment is a control task, the instructions and data of the code segment are scheduled to the CPU for processing. When the task type of a code segment is a computing task, the instructions and data of the code segment are scheduled to the computing engine for processing, and the computation result is output through the gene computation result output unit.
Preferably, the FPGA comprises a crossbar switch, an IO control unit, and an accelerator unit. The IO control unit and the accelerator unit are each connected to the crossbar switch. The accelerator unit comprises at least one of a hidden Markov model computation accelerator, which provides hardware acceleration of hidden Markov model computation, and a hash function computation accelerator, which provides hardware acceleration of hash computation. The IO control unit is connected to the interconnection module.
Preferably, the IO control unit comprises a PCIE interface, a DMA controller, a PIU peripheral interface unit, and a DDR controller. The crossbar switch is connected to the DMA controller, the PIU peripheral interface unit, and the DDR controller, respectively. The DMA controller and the PIU peripheral interface unit are connected to each other, the PCIE interface is connected to the DMA controller, and the PCIE interface and the DDR controller are each connected to the interconnection module.
Preferably, the interconnection module comprises an HCCLink bus module and an HNCLink bus module. The CPU, GPU, and FPGA are each connected to the memory through the HCCLink bus module, and are each connected to the gene computation data and instruction input unit and the gene computation result output unit through the HNCLink bus module.
Preferably, the gene computation data and instruction input unit comprises at least one of an input device, a common interface module, a network interface module, a multimedia input interface module, an external storage device, and a sensor.
Preferably, the gene computation result output unit comprises at least one of a display device, a common interface module, a network interface module, a multimedia output interface module, and an external storage device.
Preferably, the detailed steps of scheduling the instructions and data of a code segment to the computing engine for processing comprise:
A1) Determine separately whether the code segment can be executed with instruction parallelism, whether it can be executed as a pipeline, and whether it can be executed with data parallelism. If none of the three is possible, jump to step A7) and exit; otherwise, jump to step A2).
A2) Determine whether the code segment can only be executed with data parallelism. If it can only be executed with data parallelism, jump to step A3); otherwise, jump to step A6).
A3) Determine whether the total overhead of optimized execution (that is, parallel execution; likewise below) of the code segment on the FPGA is less than the total overhead of optimized execution of the code segment on the GPU. The total overhead of optimized execution on the FPGA comprises the communication overhead of exchanging data and instructions between the CPU and the FPGA, the FPGA memory access overhead, and the FPGA computation overhead. The total overhead of optimized execution on the GPU comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the GPU memory access overhead, and the GPU computation overhead. If the condition holds, jump to step A6); otherwise, jump to step A4).
A4) Determine whether energy consumption takes priority for the code segment. If energy consumption takes priority, jump to step A6); otherwise, jump to step A5).
A5) Determine whether the gene computation of the code segment is suitable for GPU-accelerated processing. If it is suitable, jump to step A8); otherwise, jump to step A7).
A6) Make combined use of all acceleration methods available on the FPGA, the acceleration methods comprising at least one of instruction parallelism, pipelining, and data parallelism, and determine whether the total overhead of optimized execution of the code segment on the FPGA is less than the total overhead of executing the code segment on the CPU. If the condition holds, jump to step A9); otherwise, jump to step A7).
A7) Schedule the instructions and data of the code segment to the CPU for processing, and exit.
A8) Schedule the instructions and data of the code segment to the GPU for processing, and exit.
A9) Schedule the instructions and data of the code segment to the FPGA for processing, and exit.
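The decision flow of steps A1) to A9) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `SegmentProfile` fields and the `schedule` function are hypothetical names, the three total overheads are assumed to be known estimates in a common unit, and the result of the step A5) suitability check is assumed to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentProfile:
    """Hypothetical per-segment facts the scheduler is assumed to know."""
    cost_cpu: float    # total overhead of execution on the CPU
    cost_gpu: float    # total overhead of optimized execution on the GPU
    cost_fpga: float   # total overhead of optimized execution on the FPGA
    instruction_parallel: bool = False
    pipelinable: bool = False
    data_parallel: bool = False
    energy_first: bool = False   # energy consumption takes priority (A4)
    gpu_suitable: bool = False   # assumed outcome of the A5 check


def schedule(seg: SegmentProfile) -> str:
    """Return "CPU", "GPU", or "FPGA" following steps A1) to A9)."""
    # A1: no form of parallelism at all -> A7 (CPU).
    if not (seg.instruction_parallel or seg.pipelinable or seg.data_parallel):
        return "CPU"

    # A2: is data parallelism the only form available?
    only_data_parallel = seg.data_parallel and not (
        seg.instruction_parallel or seg.pipelinable
    )
    if only_data_parallel:
        # A3: the FPGA is not cheaper than the GPU -> continue with A4.
        if not seg.cost_fpga < seg.cost_gpu:
            # A4: energy is not the priority -> A5 decides GPU (A8) or CPU (A7).
            if not seg.energy_first:
                return "GPU" if seg.gpu_suitable else "CPU"

    # A6: compare the FPGA with the CPU -> A9 (FPGA) or A7 (CPU).
    return "FPGA" if seg.cost_fpga < seg.cost_cpu else "CPU"
```

Note that every path that reaches step A6) ends on either the FPGA or the CPU; the GPU is chosen only for segments that are purely data parallel, not energy-first, and not cheaper on the FPGA.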
Preferably, the detailed steps of step A5) comprise:
A5.1) Determine whether the gene computation of the code segment can be executed with data parallelism. If it can, jump to step A5.2); otherwise, jump to step A7).
A5.2) Determine whether the total overhead of optimized execution of the code segment on the GPU is less than the total overhead of executing the code segment on the CPU. The total overhead of optimized execution on the GPU comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the GPU memory access overhead, and the GPU computation overhead. The total overhead of execution on the CPU comprises the CPU memory access overhead and the CPU computation overhead. If the condition holds, jump to step A8); otherwise, jump to step A7).
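The overhead comparison in steps A5.1) and A5.2) can be illustrated with a small cost model. The sketch below is an assumption-laden example rather than the patented implementation: `Overhead` and `suits_gpu` are hypothetical names, and the three components are assumed to be estimates expressed in the same unit (for example, seconds).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Overhead:
    """Hypothetical overhead estimate; all fields share one unit."""
    communication: float = 0.0   # CPU <-> device data and instruction exchange
    memory_access: float = 0.0
    compute: float = 0.0

    def total(self) -> float:
        return self.communication + self.memory_access + self.compute


def suits_gpu(data_parallel: bool, gpu: Overhead, cpu: Overhead) -> bool:
    """Step A5: True means schedule to the GPU (A8), False means the CPU (A7)."""
    # A5.1: the segment must be executable with data parallelism.
    if not data_parallel:
        return False
    # A5.2: GPU total (communication + memory access + compute) must be
    # less than CPU total (memory access + compute, no communication term).
    return gpu.total() < cpu.total()
```

For instance, a GPU estimate of `Overhead(2, 1, 3)` against a CPU estimate of `Overhead(memory_access=4, compute=6)` favors the GPU, while a larger transfer cost such as `Overhead(5, 3, 4)` does not.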
The heterogeneous platform for gene data computation of the present invention has the following advantages:
1. Platform-based. The heterogeneous platform of the present invention is a heterogeneous platform based on a CPU plus FPGA and GPU. It lets designers develop a variety of gene data computation application workflows without redesigning the hardware system; other public or commercial gene data computation software can be ported without redesigning the hardware system; and a heterogeneous programming language (such as OpenCL) can be used to unify application development across the whole heterogeneous platform.
2. Good scalability. The heterogeneous platform of the present invention is a heterogeneous platform based on a CPU plus FPGA and GPU, and can be flexibly extended and reconfigured as application requirements differ and change.
3. Widely applicable. The heterogeneous platform of the present invention is a heterogeneous platform based on a CPU plus FPGA and GPU, and can serve both as a local processing device for gene data computation and as a processing node for gene data computation in a cluster or cloud computing environment.
4. Highly configurable. The heterogeneous platform of the present invention is a heterogeneous platform based on a CPU plus FPGA and GPU. In software, the three core components (CPU, FPGA, and GPU) are all programmable devices. In hardware, the FPGA can also be incrementally configured on demand after the system has been finalized, produced, and installed; that is, its functions can be modified and/or extended. In terms of application integration, the organization, scale, and interconnection of the system components can be configured and used according to the various requirements of gene data computation and the respective strengths of the CPU, FPGA, GPU, and other hardware, so that the components divide the work sensibly and cooperate, optimizing the application workflow as far as possible. The system gives system and application designers good design flexibility and incremental configurability, and is easy to upgrade to suit new applications.
5. High performance. The heterogeneous platform of the present invention is a heterogeneous platform based on a CPU plus FPGA and GPU, and provides hardware support for high-performance gene data computation in two respects. First, it simultaneously provides the hardware needed for task parallelism, data parallelism, and hardware acceleration of algorithms. Second, it simultaneously provides the hardware needed for control tasks, transactional tasks, non-data-intensive computing tasks, and data-intensive computing tasks.
6. Low cost. The heterogeneous platform of the present invention is a heterogeneous platform based on a CPU plus FPGA and GPU. Compared with existing computer clusters or cloud computing platforms that process gene data computation entirely in software, it reduces costs in design, storage, networking, power consumption, technical support, and maintenance while improving performance.
7. Low power consumption. The heterogeneous platform of the present invention is a heterogeneous platform based on a CPU plus FPGA and GPU. By using the FPGA and GPU to take over part of the CPU's work, it reduces energy consumption while improving performance and providing diverse functions.
Brief description of the drawings
Fig. 1 is a schematic circuit diagram of the heterogeneous platform of an embodiment of the present invention.
Fig. 2 is a schematic diagram of the engine structure of the heterogeneous platform of an embodiment of the present invention.
Fig. 3 is a schematic circuit diagram of the FPGA in the heterogeneous platform of an embodiment of the present invention.
Fig. 4 is a schematic flow chart of scheduling by the control engine of the heterogeneous platform of an embodiment of the present invention.
Fig. 5 is a schematic flow chart of scheduling the computing engine in the heterogeneous platform of an embodiment of the present invention.
Fig. 6 is a schematic flow chart of determining whether GPU acceleration is suitable when the heterogeneous platform of an embodiment of the present invention schedules the computing engine.
Reference numerals: 1, heterogeneous processor unit; 11, control engine; 12, computing engine; 2, interconnection module; 21, HCCLink bus module; 22, HNCLink bus module; 3, memory; 4, gene computation data and instruction input unit; 5, gene computation result output unit.
Detailed description of the embodiments
As shown in Fig. 1 and Fig. 2, the heterogeneous platform for gene data computation of this embodiment comprises a heterogeneous processor unit 1, an interconnection module 2, memory 3, a gene computation data and instruction input unit 4, and a gene computation result output unit 5. The heterogeneous processor unit 1 is connected through the interconnection module 2 to the memory 3, the gene computation data and instruction input unit 4, and the gene computation result output unit 5, respectively. The heterogeneous processor unit 1 comprises a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and an FPGA (Field Programmable Gate Array), wherein the CPU constitutes a control engine 11, and the CPU, GPU, and FPGA together constitute a computing engine 12. The control engine 11 receives gene computation data and instructions through the gene computation data and instruction input unit 4 and divides them into code segments. When the task type of a code segment is a control task, the instructions and data of the code segment are scheduled to the CPU for processing. When the task type of a code segment is a computing task, the instructions and data of the code segment are scheduled to the computing engine 12 for processing, and the computation result is output through the gene computation result output unit 5.
In this embodiment, there may be one or more CPUs, each comprising one or more processor cores (Processor Core); there may be one or more GPUs; and there may be one or more FPGAs. Any two individual devices among the CPUs, GPUs, and FPGAs can interconnect and exchange data and instructions through the interconnection module 2, and each can also interconnect and exchange data and instructions, through the interconnection module 2, with the memory 3 and with any device in the gene computation data and instruction input unit 4 and the gene computation result output unit 5. Of course, the bus used to interconnect these devices and exchange data and instructions is not limited to a specific interconnection scheme; various concrete implementations can be adopted as needed.
As shown in Fig. 3, the FPGA comprises a crossbar switch (Crossbar), an IO control unit, and an accelerator unit. The IO control unit and the accelerator unit are each connected to the crossbar switch. The accelerator unit comprises both a hidden Markov model (Hidden Markov Model, HMM) computation accelerator, which provides hardware acceleration of hidden Markov model computation, and a hash function (Hash function) computation accelerator, which provides hardware acceleration of hash computation. The IO control unit is connected to the interconnection module 2. In this embodiment, the crossbar switch is specifically an Advanced eXtensible Interface (AXI) crossbar switch. In addition, the accelerator unit may, as needed, contain only the hidden Markov model computation accelerator or only the hash function computation accelerator, or may additionally contain further similar hardware accelerators to provide hardware acceleration of other computations.
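To give a sense of the kind of workload a hidden Markov model computation accelerator targets, the following Python sketch computes the likelihood of an observation sequence with the standard forward algorithm. The text does not say which HMM computation the accelerator implements, so the choice of the forward algorithm is an assumption, and `forward_likelihood` and its parameters are hypothetical names. This is a software reference, not a description of the hardware.

```python
from typing import Mapping, Sequence


def forward_likelihood(
    obs: Sequence[str],
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Mapping[str, float]],
) -> float:
    """Probability of `obs` under an HMM, by the forward algorithm.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][s]  : probability that state i emits symbol s
    """
    if not obs:
        return 1.0
    n = len(init)
    # Initialization: alpha[i] = P(first symbol, state i).
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    # Recursion: each step is a small matrix-vector product. The inner
    # sums are independent across j, which is the parallelism that a
    # hardware implementation could exploit.
    for symbol in obs[1:]:
        alpha = [
            emit[j][symbol] * sum(alpha[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    # Termination: sum over all final states.
    return sum(alpha)
```

The cost is proportional to the sequence length times the square of the number of states, which is why long sequences and large batches make this a natural candidate for acceleration.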
As shown in Fig. 3, the IO control unit comprises a PCIE (Peripheral Component Interconnect Express) interface, a DMA (Direct Memory Access) controller, a PIU (Peripheral Interface Unit) peripheral interface unit, and a DDR controller. The crossbar switch is connected to the DMA controller, the PIU peripheral interface unit, and the DDR controller, respectively. The DMA controller and the PIU peripheral interface unit are connected to each other, the PCIE interface is connected to the DMA controller, and the PCIE interface and the DDR controller are each connected to the interconnection module 2. The DDR controller handles DDR access and provides storage for large volumes of data; in this embodiment it is specifically a DDR4 controller. The PCIE interface, the DMA controller, and the PIU cooperate to transfer data and instructions between the FPGA and the CPU and between the FPGA and the GPU. The crossbar switch interconnects the DMA controller, the PIU peripheral interface unit, the DDR controller, the hidden Markov model computation accelerator, the hash function computation accelerator, and any other accelerators, providing a path for transferring data and instructions among them.
As shown in Fig. 1, the interconnection module 2 comprises an HCCLink (Heterogeneous computing Cache Coherence Link, a cache-coherent heterogeneous computing interconnect) bus module 21 and an HNCLink (Heterogeneous Computing Non-Coherence Link, a non-coherent heterogeneous computing interconnect) bus module 22. The CPU, GPU, and FPGA are each connected to the memory 3 through the HCCLink bus module 21, and are each connected to the gene computation data and instruction input unit 4 and the gene computation result output unit 5 through the HNCLink bus module 22. The HCCLink bus module 21 is used to interconnect the CPU, FPGA, and GPU with the DDR4 memory array and to exchange data and instructions between them. The HNCLink bus module 22 is used to interconnect the CPU, FPGA, and GPU with one another and exchange control instructions, and to interconnect the CPU, FPGA, and GPU with the input/output (I/O) devices and exchange data and instructions.
In this embodiment, the memory 3 is a DDR4 memory array (Memory Array).
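The division of traffic between the two bus modules can be summarized as a small lookup, sketched here in Python. The endpoint labels (`"CPU"`, `"GPU"`, `"FPGA"`, `"MEMORY"`, `"IO"`), the payload labels, and the function name `select_bus` are hypothetical; the sketch encodes only the pairings stated in the text and raises an error for anything the text does not describe.

```python
PROCESSORS = {"CPU", "GPU", "FPGA"}


def select_bus(src: str, dst: str, payload: str = "data") -> str:
    """Return the bus module the text associates with a transfer.

    src, dst : "CPU", "GPU", "FPGA", "MEMORY", or "IO" (hypothetical labels)
    payload  : "data" or "control"
    """
    ends = {src, dst}
    if not ends & PROCESSORS:
        raise ValueError("at least one endpoint must be a CPU, GPU, or FPGA")
    # Processor <-> DDR4 memory array: cache-coherent HCCLink.
    if "MEMORY" in ends:
        return "HCCLink"
    # Processor <-> input/output devices: non-coherent HNCLink.
    if "IO" in ends:
        return "HNCLink"
    # Processor <-> processor: the text describes control instructions only.
    if ends <= PROCESSORS and payload == "control":
        return "HNCLink"
    raise ValueError("transfer not described in the text")
```

Data exchanged directly between two processors is deliberately left as an error here, because the text only assigns control instructions to that path.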
In this embodiment, the gene computation data and instruction input unit 4 comprises at least one of an input device, a common interface module, a network interface module, a multimedia input interface module, an external storage device, and a sensor. In this embodiment, the input device comprises at least one of a keyboard, a mouse, a trackball, and a trackpad; the common interface module comprises at least one of a boundary scan interface module and a universal serial bus (USB) interface module; the network interface module comprises at least one of an Ethernet interface module, a Long Term Evolution (LTE) interface module, a Wi-Fi interface module, and a Bluetooth interface module; the multimedia input interface module comprises at least one of an analog audio input interface, a digital audio input interface, and a video input interface; the external storage device comprises at least one of flash memory (FLASH) and a solid-state drive (SSD); and the sensor comprises at least one of a temperature sensor, a heart rate sensor, and a fingerprint sensor.
In this embodiment, the gene computation result output unit 5 comprises at least one of a display device, a common interface module, a network interface module, a multimedia output interface module, and an external storage device. In this embodiment, the display device comprises at least one of a cathode ray tube (CRT), a liquid crystal display (LCD), and a light-emitting diode (LED) display; the common interface module comprises at least one of a boundary scan interface module and a universal serial bus (USB) interface module; the network interface module comprises at least one of an Ethernet interface module, a Long Term Evolution (LTE) interface module, a Wi-Fi interface module, and a Bluetooth interface module; the multimedia output interface module comprises at least one of an analog audio output interface, a digital audio output interface, and a video output interface; and the external storage device comprises at least one of flash memory (FLASH) and a solid-state drive (SSD).
As shown in Fig. 4, the control engine 11 receives a gene calculation data instruction through the gene calculation data instruction input unit 4 and divides it into code segments. It then schedules the computing engine 12, which is formed by the CPU, the GPU and the FPGA together, according to the task type of each code segment: when the task type of a code segment is a control task, the instructions and data of the code segment are dispatched to the CPU for processing; when the task type of a code segment is a calculation task, the instructions and data of the code segment are dispatched to the computing engine 12 for processing, and the calculation result is output through the gene calculation result output unit 5.
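The top-level routing described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `TaskType`, `dispatch_segments`, `run_on_cpu` and `run_on_computing_engine` are hypothetical and do not appear in the patent, and the way an instruction is split into code segments is assumed to have happened already.

```python
from enum import Enum
from typing import Callable, Iterable, List, Tuple


class TaskType(Enum):
    """Hypothetical labels for the two task types named in the text."""
    CONTROL = "control"
    CALCULATION = "calculation"


def dispatch_segments(
    segments: Iterable[Tuple[TaskType, str]],
    run_on_cpu: Callable[[str], None],
    run_on_computing_engine: Callable[[str], str],
) -> List[str]:
    """Route each code segment by task type and collect calculation results."""
    results: List[str] = []
    for task_type, payload in segments:
        if task_type is TaskType.CONTROL:
            # Control tasks are always handled by the CPU.
            run_on_cpu(payload)
        else:
            # Calculation tasks go to the computing engine (CPU, GPU or FPGA);
            # their results are what the output unit would eventually emit.
            results.append(run_on_computing_engine(payload))
    return results
```

The two callbacks stand in for the hardware back ends, so the routing rule itself can be exercised without any accelerator present.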
In the present embodiment, the functions of the CPU are as follows: scheduling and controlling one or more FPGAs, and exchanging data and instructions with one or more FPGAs; scheduling and controlling one or more GPUs, and exchanging data and instructions with one or more GPUs; exchanging data and instructions with one or more memories; receiving and processing data and instructions input by one or more input devices; sending data and instructions to one or more output devices; and, in the gene data calculation flow, executing scheduling tasks and transactional tasks, and cooperating with one or more FPGAs and one or more GPUs to execute gene data calculation tasks.
In the present embodiment, the functions of the FPGA are as follows: exchanging data and instructions with one or more CPUs; optionally scheduling and controlling one or more GPUs, and exchanging data and instructions with one or more GPUs; exchanging data and instructions with one or more memories; optionally receiving and processing data and instructions input by one or more input devices; optionally sending data and instructions to one or more output devices; and, in the gene data calculation flow, cooperating with one or more CPUs and one or more GPUs to execute gene data calculation tasks, and optionally executing scheduling tasks and transactional tasks.
In the present embodiment, the functions of the GPU are as follows: exchanging data and instructions with one or more CPUs; optionally exchanging data and instructions with one or more FPGAs; exchanging data and instructions with one or more memories; and, in the gene data calculation flow, cooperating with one or more FPGAs and one or more CPUs to execute gene data calculation tasks.
In the present embodiment, the functions of the memory 3 are as follows: storing one or more sets of gene sequencing data, where the gene sequencing data may be raw data and/or compressed data, and the compression algorithm of the compressed data is not limited; storing one or more gene reference sequences and their corresponding one or more markers; storing one or more sets of known variant data; storing other input data related to gene data calculation; and, in the gene data calculation flow, storing intermediate results and final data. The type of memory is not limited, for example DDR3 (Double Data Rate 3), DDR4, etc.
In the present embodiment, the functions of the gene calculation data instruction input unit 4 are as follows: inputting the data and instructions required by the gene data calculation flow, and inputting the data and instructions required by the gene data interpretation flow. The type of input device is not limited. It may be an input device such as a keyboard, a mouse, a trackball or a trackpad; a general-purpose interface such as boundary scan (Joint Test Action Group, JTAG) or Universal Serial Bus (USB); a network port such as Ethernet, Long Term Evolution (LTE), Wireless Fidelity (Wi-Fi) or Bluetooth; a multimedia interface such as an analog audio input interface (e.g. a 3.5 mm stereo TRS mini-jack), a digital audio input interface (e.g. Sony/Philips Digital Interface, S/PDIF) or a video input interface (e.g. High-Definition Multimedia Interface, HDMI); an external storage device such as flash memory (FLASH) or a solid-state drive (SSD); or a sensor such as a temperature sensor (for measuring body temperature), an optical sensor (for measuring heart rate) or a fingerprint sensor (for collecting fingerprints). The form of the input data and instructions is not limited, for example electrical signals, text, pictures, speech, audio, video, and any combination thereof.
In the present embodiment, the functions of the gene calculation result output unit 5 are as follows: outputting the data and instructions generated by the gene data calculation flow, and outputting the data and instructions generated by the gene data interpretation flow. The type of output device is not limited. It may be a display device such as a cathode ray tube (CRT), a liquid crystal display (LCD) or a light-emitting diode (LED) display; a general-purpose interface such as JTAG or USB; a network port such as Ethernet, LTE, Wi-Fi or Bluetooth; a multimedia interface such as an analog audio output interface (e.g. a 3.5 mm stereo TRS mini-jack), a digital audio output interface (e.g. S/PDIF) or a video output interface (e.g. HDMI); or an external storage device such as a solid-state drive (SSD). The form of the output data and instructions is not limited, for example electrical signals, text, pictures, speech, audio, video, and any combination thereof. Referring to Fig. 1, the gene calculation data instruction input unit 4 and the gene calculation result output unit 5 may be implemented partly on shared devices, for example the general-purpose interface module, the network interface module and the external storage device.
As shown in Fig. 4 and Fig. 5, the detailed steps of dispatching the instructions and data of a code segment to the computing engine 12 for processing include:
A1) Judge separately whether the code segment can be executed with instruction parallelism, whether it can be executed as a pipeline, and whether it can be executed with data parallelism. If none of the three is possible, jump to step A7) and exit; otherwise, jump to step A2);
A2) Judge whether the code segment can only be executed with data parallelism. If it can only be executed with data parallelism, jump to step A3); otherwise, jump to step A6);
A3) Judge whether the overhead of assigning the code segment to the FPGA for optimized execution (i.e. parallel execution; the same applies below) is less than the overhead of assigning the code segment to the GPU for optimized execution. The overhead of optimized execution on the FPGA comprises the communication overhead of exchanging data and instructions between the CPU and the FPGA, the memory access overhead of the FPGA and the computation overhead of the FPGA. The overhead of optimized execution on the GPU comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the memory access overhead of the GPU and the computation overhead of the GPU. If the condition holds, jump to step A6); otherwise, jump to step A4);
A4) Judge whether energy consumption has priority for the code segment. If energy consumption has priority, jump to step A6); otherwise, jump to step A5);
A5) Judge whether the gene calculation of the code segment is suitable for GPU-accelerated processing. If it is suitable, jump to step A8); otherwise, jump to step A7);
A6) Make combined use of all available FPGA acceleration methods, which include at least one of instruction parallelism, pipelining and data parallelism, and judge whether the overhead of assigning the code segment to the FPGA for optimized execution is less than the overhead of executing the code segment on the CPU. If the condition holds, jump to step A9); otherwise, jump to step A7);
A7) Dispatch the instructions and data of the code segment to the CPU for processing, and exit;
A8) Dispatch the instructions and data of the code segment to the GPU for processing, and exit;
A9) Dispatch the instructions and data of the code segment to the FPGA for processing, and exit.
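The decision flow of steps A1) to A9) can be written as a short Python function. This is a minimal sketch under stated assumptions, not the patented implementation: the `CodeSegment` fields, the `Target` enum and the `schedule` function are hypothetical names, the three overhead values are assumed to be pre-computed estimates in a common unit, and the outcome of step A5) is passed in as a flag.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    CPU = "CPU"
    GPU = "GPU"
    FPGA = "FPGA"


@dataclass
class CodeSegment:
    # Hypothetical inputs; a profiler or static analysis is assumed to supply them.
    instruction_parallel: bool
    pipelinable: bool
    data_parallel: bool
    energy_priority: bool
    gpu_suitable: bool      # assumed outcome of step A5)
    cpu_overhead: float     # estimated overhead of running on the CPU
    gpu_overhead: float     # estimated overhead of optimized execution on the GPU
    fpga_overhead: float    # estimated overhead of optimized execution on the FPGA


def schedule(seg: CodeSegment) -> Target:
    """Pick a processor for one calculation-task code segment."""
    # A1) No exploitable parallelism of any kind: run on the CPU (A7).
    if not (seg.instruction_parallel or seg.pipelinable or seg.data_parallel):
        return Target.CPU
    # A2) Only data parallelism is available: the GPU is a candidate.
    only_data_parallel = seg.data_parallel and not (
        seg.instruction_parallel or seg.pipelinable
    )
    if only_data_parallel:
        fpga_beats_gpu = seg.fpga_overhead < seg.gpu_overhead  # A3)
        if not fpga_beats_gpu and not seg.energy_priority:     # A4)
            # A5) GPU if suitable (A8), otherwise CPU (A7).
            return Target.GPU if seg.gpu_suitable else Target.CPU
    # A6) The FPGA is chosen only if it beats the CPU (A9), otherwise CPU (A7).
    return Target.FPGA if seg.fpga_overhead < seg.cpu_overhead else Target.CPU
```

Note that every path through the flow ends at exactly one of A7), A8) or A9), so the function always returns a single target; the CPU is the fallback whenever an accelerator does not pay for its own overhead.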
As shown in Fig. 6, the detailed steps of step A5) include:
A5.1) Judge whether the gene calculation of the code segment can be executed with data parallelism. If it can, jump to step A5.2); otherwise, jump to step A7);
A5.2) Judge whether the overhead of assigning the code segment to the GPU for optimized execution is less than the overhead of executing the code segment on the CPU. The overhead of optimized execution on the GPU comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the memory access overhead of the GPU and the computation overhead of the GPU. The overhead of execution on the CPU comprises the memory access overhead of the CPU and the computation overhead of the CPU. If the condition holds, jump to step A8); otherwise, jump to step A7).
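The comparison in steps A5.1) and A5.2) can be illustrated as follows. This is a hedged sketch: the `Overhead` class and the `gpu_suitable` function are hypothetical names, and the three components are assumed to be expressed in one common unit (for example estimated milliseconds), which the patent does not specify.

```python
from dataclasses import dataclass


@dataclass
class Overhead:
    """Hypothetical overhead estimate, split into the components named in the text."""
    communication: float = 0.0   # CPU <-> accelerator transfer; stays 0 for the CPU itself
    memory_access: float = 0.0
    computation: float = 0.0

    def total(self) -> float:
        return self.communication + self.memory_access + self.computation


def gpu_suitable(data_parallel: bool, gpu: Overhead, cpu: Overhead) -> bool:
    """Return True when step A5) would send the code segment to the GPU."""
    # A5.1) Without data parallelism the segment falls back to the CPU.
    if not data_parallel:
        return False
    # A5.2) The GPU must win even after paying for CPU-GPU communication.
    return gpu.total() < cpu.total()
```

For example, a GPU estimate of `Overhead(1, 1, 1)` against a CPU estimate of `Overhead(memory_access=2, computation=4)` favors the GPU, while raising the communication term to 4 (with memory access and computation at 2 each) tips the decision back to the CPU.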
In summary, the heterogeneous platform for gene data calculation of the present embodiment can meet the real-time and accuracy requirements of high-performance gene data calculation at a lower cost.
The above is only a preferred embodiment of the present invention. The protection scope of the present invention is not limited to the above embodiment; all technical solutions that fall under the concept of the present invention belong to its protection scope. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principle of the present invention shall also be regarded as falling within the protection scope of the present invention.
Claims (8)
1. A heterogeneous platform for gene data calculation, characterized in that: it comprises a heterogeneous processor unit (1), an interconnection bus module (2), a memory (3), a gene calculation data instruction input unit (4) and a gene calculation result output unit (5); the heterogeneous processor unit (1) is connected through the interconnection bus module (2) to the memory (3), the gene calculation data instruction input unit (4) and the gene calculation result output unit (5) respectively; the heterogeneous processor unit (1) comprises a CPU, a GPU and an FPGA, wherein the CPU constitutes a control engine (11), and the CPU, the GPU and the FPGA together constitute a computing engine (12); the control engine (11) receives a gene calculation data instruction through the gene calculation data instruction input unit (4) and divides it into code segments; when the task type of a code segment is a control task, the instructions and data of the code segment are dispatched to the CPU for processing; when the task type of a code segment is a calculation task, the instructions and data of the code segment are dispatched to the computing engine (12) for processing, and the calculation result is output through the gene calculation result output unit (5).
2. The heterogeneous platform for gene data calculation according to claim 1, characterized in that: the FPGA comprises a crossbar switch, an I/O control unit and an accelerator unit; the I/O control unit and the accelerator unit are each connected to the crossbar switch; the accelerator unit comprises at least one of a hidden Markov model computation accelerator for hardware acceleration of hidden Markov model computation and a hash function computation accelerator for hardware acceleration of hash calculation; and the I/O control unit is connected to the interconnection bus module (2).
3. The heterogeneous platform for gene data calculation according to claim 2, characterized in that: the I/O control unit comprises a PCIE interface, a DMA controller, a PIU peripheral interface unit and a DDR controller; the crossbar switch is connected to the DMA controller, the PIU peripheral interface unit and the DDR controller respectively; the DMA controller and the PIU peripheral interface unit are connected to each other; the PCIE interface is connected to the DMA controller; and the PCIE interface and the DDR controller are each connected to the interconnection bus module (2).
4. The heterogeneous platform for gene data calculation according to claim 1, characterized in that: the interconnection bus module (2) comprises an HCCLink bus module (21) and an HNCLink bus module (22); the CPU, the GPU and the FPGA are each connected to the memory (3) through the HCCLink bus module (21); and the CPU, the GPU and the FPGA are each connected to the gene calculation data instruction input unit (4) and the gene calculation result output unit (5) through the HNCLink bus module (22).
5. The heterogeneous platform for gene data calculation according to claim 1, characterized in that: the gene calculation data instruction input unit (4) comprises at least one of an input device, a general-purpose interface module, a network interface module, a multimedia input interface module, an external storage device and a sensor.
6. The heterogeneous platform for gene data calculation according to claim 1, characterized in that: the gene calculation result output unit (5) comprises at least one of a display device, a general-purpose interface module, a network interface module, a multimedia output interface module and an external storage device.
7. The heterogeneous platform for gene data calculation according to claim 1, characterized in that: the detailed steps of dispatching the instructions and data of the code segment to the computing engine (12) for processing include:
A1) judging separately whether the code segment can be executed with instruction parallelism, whether it can be executed as a pipeline, and whether it can be executed with data parallelism; if none of the three is possible, jumping to step A7) and exiting; otherwise, jumping to step A2);
A2) judging whether the code segment can only be executed with data parallelism; if it can only be executed with data parallelism, jumping to step A3); otherwise, jumping to step A6);
A3) judging whether the overhead of assigning the code segment to the FPGA for optimized execution (i.e. parallel execution; the same applies below) is less than the overhead of assigning the code segment to the GPU for optimized execution, wherein the overhead of optimized execution on the FPGA comprises the communication overhead of exchanging data and instructions between the CPU and the FPGA, the memory access overhead of the FPGA and the computation overhead of the FPGA, and the overhead of optimized execution on the GPU comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the memory access overhead of the GPU and the computation overhead of the GPU; if the condition holds, jumping to step A6); otherwise, jumping to step A4);
A4) judging whether energy consumption has priority for the code segment; if energy consumption has priority, jumping to step A6); otherwise, jumping to step A5);
A5) judging whether the gene calculation of the code segment is suitable for GPU-accelerated processing; if it is suitable, jumping to step A8); otherwise, jumping to step A7);
A6) making combined use of all available FPGA acceleration methods, the acceleration methods including at least one of instruction parallelism, pipelining and data parallelism, and judging whether the overhead of assigning the code segment to the FPGA for optimized execution is less than the overhead of executing the code segment on the CPU; if the condition holds, jumping to step A9); otherwise, jumping to step A7);
A7) dispatching the instructions and data of the code segment to the CPU for processing, and exiting;
A8) dispatching the instructions and data of the code segment to the GPU for processing, and exiting;
A9) dispatching the instructions and data of the code segment to the FPGA for processing, and exiting.
8. The heterogeneous platform for gene data calculation according to claim 7, characterized in that: the detailed steps of step A5) include:
A5.1) judging whether the gene calculation of the code segment can be executed with data parallelism; if it can, jumping to step A5.2); otherwise, jumping to step A7);
A5.2) judging whether the overhead of assigning the code segment to the GPU for optimized execution is less than the overhead of executing the code segment on the CPU, wherein the overhead of optimized execution on the GPU comprises the communication overhead of exchanging data and instructions between the CPU and the GPU, the memory access overhead of the GPU and the computation overhead of the GPU, and the overhead of execution on the CPU comprises the memory access overhead of the CPU and the computation overhead of the CPU; if the condition holds, jumping to step A8); otherwise, jumping to step A7).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710055558.3A CN107066802B (en) | 2017-01-25 | 2017-01-25 | A kind of heterogeneous platform calculated towards gene data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710055558.3A CN107066802B (en) | 2017-01-25 | 2017-01-25 | A kind of heterogeneous platform calculated towards gene data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107066802A (en) | 2017-08-18 |
CN107066802B (en) | 2018-05-15 |
Family
ID=59599186
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710055558.3A Active CN107066802B (en) | 2017-01-25 | 2017-01-25 | A kind of heterogeneous platform calculated towards gene data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107066802B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103020002A (en) * | 2012-11-27 | 2013-04-03 | 中国人民解放军信息工程大学 | Reconfigurable multiprocessor system |
CN104820207A (en) * | 2015-05-08 | 2015-08-05 | 中国科学院新疆天文台 | Real-time correlator based on FPGA, GPU and CPU mixed architecture |
Non-Patent Citations (2)
Title |
---|
PINGFAN MENG ET.AL.: "FPGA-GPU-CPU Heterogenous Architecture for Real-time Cardiac Physiological Optical Mapping", 《CONFERENCE: FIELD-PROGRAMMABLE TECHNOLOGY (FPT), 2012 INTERNATIONAL CONFERENCE ON》 * |
RA INTA ET.AL.: "The "Chimera": An Off-The-Shelf CPU/GPGPU/FPGA Hybrid Computing Platform", 《INTERNATIONAL JOURNAL OF RECONFIGURABLE COMPUTING》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108804376A (en) * | 2018-06-14 | 2018-11-13 | 山东航天电子技术研究所 | A kind of small-sized heterogeneous processing system based on GPU and FPGA |
CN108804376B (en) * | 2018-06-14 | 2021-11-19 | 山东航天电子技术研究所 | Small heterogeneous processing system based on GPU and FPGA |
CN112673347A (en) * | 2018-10-19 | 2021-04-16 | 日本电信电话株式会社 | Data processing system, central processing unit, and data processing method |
CN111090611A (en) * | 2018-10-24 | 2020-05-01 | 上海雪湖信息科技有限公司 | Small heterogeneous distributed computing system based on FPGA |
CN113254104A (en) * | 2021-06-07 | 2021-08-13 | 中科计算技术西部研究院 | Accelerator and acceleration method for gene analysis |
CN113254104B (en) * | 2021-06-07 | 2022-06-21 | 中科计算技术西部研究院 | Accelerator and acceleration method for gene analysis |
Also Published As
Publication number | Publication date |
---|---|
CN107066802B (en) | 2018-05-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106886690A (en) | It is a kind of that the heterogeneous platform understood is calculated towards gene data | |
CN107066802B (en) | A kind of heterogeneous platform calculated towards gene data | |
CN207895435U (en) | Neural computing module | |
CN110119311A (en) | A kind of distributed stream computing system accelerated method based on FPGA | |
CN108416433B (en) | Neural network heterogeneous acceleration method and system based on asynchronous event | |
CN106227507A (en) | Calculating system and controller thereof | |
CN110427262A (en) | A kind of gene data analysis method and isomery dispatching platform | |
CN105630735A (en) | Coprocessor based on reconfigurable computational array | |
Zhou et al. | Accelerating large-scale single-source shortest path on FPGA | |
CN106897581B (en) | A kind of restructural heterogeneous platform understood towards gene data | |
CN110135584A (en) | Extensive Symbolic Regression method and system based on self-adaptive parallel genetic algorithm | |
Kim et al. | FragGeneScan-plus for scalable high-throughput short-read open reading frame prediction | |
WO2012116654A1 (en) | Prototype verification system and verification method for high-end fault-tolerant computer | |
Bosilca et al. | Performance portability of a GPU enabled factorization with the DAGuE framework | |
CN105046109A (en) | Acceleration platform used for biological information sequence analysis | |
Wang et al. | A knowledge-based multi-agent evolutionary algorithm for semiconductor final testing scheduling problem | |
CN117642721A (en) | Partial and additive schedule aware, dynamically reconfigurable adder tree architecture in machine learning accelerators | |
Wang et al. | WooKong: A ubiquitous accelerator for recommendation algorithms with custom instruction sets on FPGA | |
Geng et al. | A survey: Handling irregularities in neural network acceleration with fpgas | |
Xie et al. | CuMF_SGD: Fast and scalable matrix factorization | |
CN110473593A (en) | A kind of Smith-Waterman algorithm implementation method and device based on FPGA | |
CN106897582B (en) | A kind of heterogeneous platform understood towards gene data | |
Lian et al. | Dadu: Accelerating inverse kinematics for high-DOF robots | |
Zhou et al. | Accelerating broadcast communication with gpu compression for deep learning workloads | |
Seok et al. | A highly parallelized control system platform architecture using multicore CPU and FPGA for multi-DoF robots |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CB03 | Change of inventor or designer information | Inventor after: Song Zhuo, Liu Pengxia, Li Gen, Ma Chouxian; Inventor before: Song Zhuo, Liu Pengxia, Li Gen |