CN106886690B - Heterogeneous platform for gene data computation and interpretation - Google Patents
- Publication number
- CN106886690B CN106886690B CN201710055557.9A CN201710055557A CN106886690B CN 106886690 B CN106886690 B CN 106886690B CN 201710055557 A CN201710055557 A CN 201710055557A CN 106886690 B CN106886690 B CN 106886690B
- Authority
- CN
- China
- Prior art keywords
- data
- code segment
- gpu
- gene
- cpu
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The invention discloses a heterogeneous platform for gene data computation and interpretation, comprising a heterogeneous processor unit, an interconnect bus module, memory, a gene computation/interpretation data-and-instruction input unit, and a gene computation/interpretation result output unit. The heterogeneous processor unit is connected through the interconnect bus module to the memory, the input unit, and the output unit. The heterogeneous processor unit comprises a CPU, a GPU, a DSP, and an FPGA, wherein the CPU forms the control engine, the CPU, GPU, and FPGA together form the computing engine, and the CPU, GPU, and DSP together form the interpretation engine. The invention provides hardware support for improving the real-time performance and accuracy of gene data computation and the accuracy and readability of gene data interpretation, and offers the advantages of high computation and interpretation efficiency, low manufacturing cost, and low energy consumption.
Description
Technical field
The present invention relates to gene sequencing technology, and in particular to a heterogeneous platform for gene data computation and interpretation.
Background technology
In recent years, with the wide adoption of next-generation sequencing (Next Generation Sequencing, hereinafter NGS) technology, the cost of gene sequencing has fallen rapidly, and gene technology has begun to enter popular use. NGS involves two steps, gene data computation and gene data interpretation. Gene data computation refers to preprocessing the raw sequencing data, for example removing artifacts and duplicates, so that it can be used in interpretation; gene data interpretation refers to analyzing, revealing, and explaining the scientific meaning of the computed gene data in fields such as biology, medicine, and health care.
Two bottlenecks constrain the clinical application of gene technology. The first bottleneck is the sheer volume of gene data. Owing to the nature of the technology, the single-sample volume of the raw gene data generated by NGS is very large; for whole-genome sequencing (Whole-Genome Sequencing, WGS), a single sample exceeds 100 GB, so computing even one sample is already a highly I/O-intensive and computation-intensive task. In addition, the rapid popularization of gene technology causes the total volume of raw sequencing data to grow exponentially, making real-time, accurate computation and transmission of gene data extremely difficult. The typical approach at present is to process the data with multithreading-based software on computer clusters comprising many high-performance processors. This scheme has drawbacks: on the one hand, its costs in storage, power consumption, technical support, and maintenance are high; on the other hand, under the premise of guaranteed accuracy, the obtainable parallel speedup still struggles to meet the above challenge; most importantly, the growth of raw sequencing data far exceeds Moore's law, so the approach lacks sustainability. The second bottleneck is the accuracy and readability of gene data interpretation. The typical interpretation method at present reconstructs an individual's genome from the sequenced and computed gene data on the basis of a human reference genome. However, the reference genomes in current use, such as GRCh38, are built from limited samples: they are neither sufficient to represent the diversity of all humankind nor complete, so when unique variants are detected in an individual's genome, the reference information can introduce bias; deep cross-analysis with other biological and medical information is also lacking. Moreover, gene data interpretation still remains essentially within the professional domain; for the non-specialist public it lacks readability, that is, easy-to-understand and richly presented interpretations of the direct biological meaning and indirect health implications of gene data.
At present, the processor types commonly found in computer systems are the central processing unit (CPU), the field-programmable gate array (FPGA), the graphics processing unit (GPU), and the digital signal processor (DSP). A modern high-performance CPU typically contains multiple processor cores and supports multithreading in hardware, but its design target remains general-purpose applications; compared with specialized computation, general-purpose programs have less parallelism and require more complex control, so the hardware resources on a CPU die are used mainly to implement complex control rather than computation, no specialized hardware is included for specific functions, and the supported degree of computational parallelism is not high. The FPGA is a semi-custom circuit. Its advantages are: FPGA-based system development has a short design cycle and low development cost; power consumption is low; the configuration can be modified after production, so design flexibility is high and design risk is small. Its disadvantages are: for the same function, an FPGA is slower than an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC) and occupies more circuit area. As the technology evolves toward higher density, larger capacity, lower power, and more integrated hard intellectual-property (Intellectual Property, IP) cores, the disadvantages of the FPGA are shrinking and its advantages are growing. Compared with a CPU, an FPGA allows customized, modifiable, and augmented parallel computation to be realized in a hardware description language. The GPU was originally a microprocessor dedicated to image processing, supporting in hardware basic graphics tasks such as texture mapping and polygon shading. Because graphics computation involves general mathematical operations such as matrix and vector arithmetic, and because the GPU has a highly parallel architecture, GPU computing has risen with the development of the related software and hardware: the GPU is no longer limited to graphics and is also used for parallel computations such as linear algebra, signal processing, and numerical simulation, where it can deliver tens or even hundreds of times the performance of a CPU. However, current GPUs have two problems: first, limited by GPU hardware architecture, many parallel algorithms cannot execute efficiently on a GPU; second, a GPU generates a great deal of heat in operation, and its energy consumption is high. The DSP is a microprocessor that performs fast digital analysis, transformation, filtering, detection, modulation, demodulation, and similar processing on various signals. Its internal structure is specially optimized for this purpose, for example with hardware implementations of fast, high-precision multiplication. With the arrival of the digital age, DSPs are widely used in smart devices, resource exploration, numerical control, biomedicine, aerospace, and many other fields; they feature low power consumption and high precision and can perform two-dimensional and multi-dimensional processing.
In summary, the four computing devices above each have their own strengths and their own limitations. Given the two bottlenecks in the clinical application of gene technology described above, how to use these processors to build a mixed-architecture platform that realizes the computation and interpretation of massive gene data has become a key technical problem to be solved urgently.
Content of the invention
The technical problem to be solved by the present invention: in view of the above problems of the prior art, to provide a heterogeneous platform for gene data computation and interpretation that supplies hardware support for improving the real-time performance and accuracy of gene data computation and the accuracy and readability of gene data interpretation, and that offers high computation and interpretation efficiency, low manufacturing cost, and low computation and interpretation energy consumption.
In order to solve the above technical problem, the technical solution adopted by the present invention is:
A heterogeneous platform for gene data computation and interpretation comprises a heterogeneous processor unit, an interconnect bus module, memory, a gene computation/interpretation data-and-instruction input unit, and a gene computation/interpretation result output unit. The heterogeneous processor unit is connected through the interconnect bus module to the memory, the input unit, and the output unit. The heterogeneous processor unit comprises a CPU, a GPU, a DSP, and an FPGA, wherein the CPU forms the control engine, the CPU, GPU, and FPGA together form the computing engine, and the CPU, GPU, and DSP together form the interpretation engine. The control engine receives gene computation/interpretation data and instructions through the input unit and divides them into code segments. When the task type of a code segment is a control task, the code segment's instructions and data are dispatched to the CPU for processing; when the task type is a computation task, they are dispatched to the computing engine for processing and the computation result is output through the result output unit; when the task type is an interpretation task, they are dispatched to the interpretation engine for processing and the result is output through the result output unit. The detailed steps of dispatching a code segment's instructions and data to the computing engine comprise:
A1) Determine whether the code segment can be executed with instruction-level parallelism, whether it can be executed in a pipeline, and whether it can be executed with data parallelism; if none of the three applies, jump to step A7) and exit; otherwise, jump to step A2).
A2) Determine whether the code segment can only be executed with data parallelism; if so, jump to step A3); otherwise, jump to step A6).
A3) Determine whether the total overhead of assigning the code segment to the FPGA for optimized (i.e. parallel, likewise below) execution is less than the total overhead of assigning it to the GPU for optimized execution. The total overhead of assigning the code segment to the FPGA comprises the communication overhead of the interaction data and instructions exchanged between the CPU and the FPGA, the FPGA memory-access overhead, and the FPGA computation overhead; the total overhead of assigning it to the GPU comprises the communication overhead of the interaction data and instructions exchanged between the CPU and the GPU, the GPU memory-access overhead, and the GPU computation overhead. If the condition holds, jump to step A6); otherwise, jump to step A4).
A4) Determine whether the code segment is energy-consumption-priority; if so, jump to step A6); otherwise, jump to step A5).
A5) Determine whether the gene computation of the code segment is suitable for GPU-accelerated processing; if so, jump to step A8); otherwise, jump to step A7).
A6) Making comprehensive use of all available FPGA acceleration methods, the acceleration methods including at least one of instruction parallelism, pipelining, and data parallelism, determine whether the total overhead of assigning the code segment to the FPGA for optimized execution is less than the total overhead of executing it on the CPU; if so, jump to step A9); otherwise, jump to step A7).
A7) Dispatch the code segment's instructions and data to the CPU for processing, and exit.
A8) Dispatch the code segment's instructions and data to the GPU for processing, and exit.
A9) Dispatch the code segment's instructions and data to the FPGA for processing, and exit.
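The dispatch logic of steps A1) through A9) can be modeled as a simple decision procedure. The sketch below is illustrative only: the `Seg` descriptor, its field names, and the overhead values are hypothetical stand-ins for whatever profiling data a real scheduler would use; they are not defined by the patent.

```python
from dataclasses import dataclass

@dataclass
class Seg:
    """Hypothetical code-segment descriptor: capability flags plus
    estimated total overheads per device (communication + memory
    access + computation)."""
    instr_parallel: bool = False
    pipeline: bool = False
    data_parallel: bool = False
    energy_priority: bool = False
    gpu_suitable: bool = False
    fpga_overhead: float = 0.0
    gpu_overhead: float = 0.0
    cpu_overhead: float = 0.0

def dispatch_compute(seg: Seg) -> str:
    """Model of steps A1)-A9): choose the device for a computation segment."""
    # A1) no instruction-, pipeline-, or data-level parallelism -> CPU (A7)
    if not (seg.instr_parallel or seg.pipeline or seg.data_parallel):
        return "CPU"
    # A2) does only data parallelism apply?
    if seg.data_parallel and not (seg.instr_parallel or seg.pipeline):
        # A3) FPGA not cheaper than GPU, and A4) not energy-priority:
        # fall back to the GPU-suitability check of A5)
        if not (seg.fpga_overhead < seg.gpu_overhead or seg.energy_priority):
            return "GPU" if seg.gpu_suitable else "CPU"   # A8) / A7)
    # A6) exploit all FPGA acceleration methods; offload only if the
    # FPGA total overhead beats plain CPU execution
    return "FPGA" if seg.fpga_overhead < seg.cpu_overhead else "CPU"  # A9) / A7)
```

A segment with no exploitable parallelism thus stays on the CPU, a data-parallel-only segment is weighed between FPGA and GPU before the suitability check, and everything else reduces to the FPGA-versus-CPU overhead comparison of step A6).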
Preferably, the detailed steps of step A5) comprise:
A5.1) Determine whether the gene computation of the code segment can be executed with data parallelism; if so, jump to step A5.2); otherwise, jump to step A7).
A5.2) Determine whether the total overhead of assigning the code segment to the GPU for optimized execution is less than the total overhead of executing it on the CPU. The total overhead of assigning the code segment to the GPU comprises the communication overhead of the interaction data and instructions exchanged between the CPU and the GPU, the GPU memory-access overhead, and the GPU computation overhead; the total overhead of executing it on the CPU comprises the CPU memory-access overhead and the CPU computation overhead. If the condition holds, jump to step A8); otherwise, jump to step A7).
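The comparison in step A5.2), like the analogous comparisons in steps A3) and A6), amounts to weighing the accelerator's communication, memory-access, and computation overheads against staying on the CPU, where no host-device transfer term appears. A minimal sketch of such a cost model follows; all function names and inputs are hypothetical, since the patent does not prescribe how the overheads are estimated.

```python
def gpu_total_overhead(comm: float, mem: float, compute: float) -> float:
    """Offload cost: CPU<->GPU transfer of interaction data and
    instructions, plus GPU memory-access and computation overheads."""
    return comm + mem + compute

def cpu_total_overhead(mem: float, compute: float) -> float:
    """Stay-on-CPU cost: memory-access plus computation overheads
    (no host<->device communication term)."""
    return mem + compute

def offload_to_gpu(comm: float, gpu_mem: float, gpu_compute: float,
                   cpu_mem: float, cpu_compute: float) -> bool:
    """Step A5.2): offload only if the GPU total overhead is strictly
    less than the CPU total overhead."""
    return (gpu_total_overhead(comm, gpu_mem, gpu_compute)
            < cpu_total_overhead(cpu_mem, cpu_compute))
```

Note that a segment can be worth offloading even with a slower device if the CPU-side overheads dominate, and conversely a fast accelerator can lose once the transfer cost is counted.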
Preferably, the FPGA comprises a crossbar switch, an I/O control unit, and an accelerator unit, the I/O control unit and the accelerator unit each being connected to the crossbar switch. The accelerator unit comprises at least one of a hidden-Markov-model computation accelerator for hardware-accelerating hidden-Markov-model computation and a hash-function computation accelerator for hardware-accelerating hash computation. The I/O control unit is connected to the interconnect bus module.
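The patent does not specify which hidden-Markov-model computation the accelerator implements; in genomics pipelines such fixed-function blocks typically evaluate dynamic-programming recurrences of the kind found in the standard HMM forward algorithm (pair-HMM likelihood scoring in variant calling is a well-known example). Purely as an illustrative software reference for what such hardware would compute, and not as the patent's design, the forward recursion is:

```python
def hmm_forward(obs, init, trans, emit):
    """Standard HMM forward algorithm: returns P(obs) given the initial
    state distribution `init`, transition matrix `trans[i][j]`, and
    emission probabilities `emit[i][symbol]`. Generic reference
    implementation, not the patent's accelerator design."""
    n = len(init)  # number of hidden states
    # initialization: alpha_0(i) = pi_i * b_i(o_0)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    # recursion: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
    for t in range(1, len(obs)):
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                 for j in range(n)]
    # termination: P(obs) = sum_j alpha_T(j)
    return sum(alpha)
```

The recurrence is a dense multiply-accumulate over all state pairs at every observation, which is exactly the regular, data-parallel structure that maps well onto FPGA pipelines.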
Preferably, the I/O control unit comprises a PCIE interface, a DMA controller, a peripheral interface unit, and a DDR controller. The crossbar switch is connected to the DMA controller, the peripheral interface unit, and the DDR controller; the DMA controller and the peripheral interface unit are connected to each other; the PCIE interface is connected to the DMA controller; and the PCIE interface and the DDR controller are each connected to the interconnect bus module.
Preferably, the interconnect bus module comprises an HCCLink bus module and an HNCLink bus module. The CPU, GPU, DSP, and FPGA are each connected to the memory through the HCCLink bus module, and each connected through the HNCLink bus module to the gene computation/interpretation data-and-instruction input unit and the gene computation/interpretation result output unit.
Preferably, the gene computation/interpretation data-and-instruction input unit comprises at least one of an input device, a common interface module, a network interface module, a multimedia input interface module, an external storage device, and a sensor.
Preferably, the gene computation/interpretation result output unit comprises at least one of a display device, a common interface module, a network interface module, a multimedia output interface module, and an external storage device.
Preferably, the detailed steps of dispatching a code segment's instructions and data to the interpretation engine comprise:
B1) Determine whether the code segment is digital signal processing, whether it is non-graphics/image multimedia processing, and whether it is graphics/image processing; if it is none of the three, jump to step B7); otherwise, jump to step B2).
B2) Determine whether the code segment is graphics/image processing; if so, jump to step B3); otherwise, jump to step B5).
B3) Determine whether the total overhead of assigning the code segment to the DSP for optimized execution is less than the total overhead of assigning it to the GPU for optimized execution. The total overhead of assigning the code segment to the DSP comprises the communication overhead of the interaction data and instructions exchanged between the CPU and the DSP, the DSP memory-access overhead, and the DSP computation overhead; the total overhead of assigning it to the GPU comprises the communication overhead of the interaction data and instructions exchanged between the CPU and the GPU, the GPU memory-access overhead, and the GPU computation overhead. If the condition holds, jump to step B5); otherwise, jump to step B4).
B4) Determine whether the code segment is energy-consumption-priority; if so, jump to step B5); otherwise, jump to step B7).
B5) Determine whether the total overhead of assigning the code segment to the DSP for optimized execution is less than the total overhead of executing it on the CPU, the total overhead of executing it on the CPU comprising the CPU memory-access overhead and the CPU computation overhead; if so, jump to step B6); otherwise, jump to step B8).
B6) Dispatch the code segment's instructions and data to the DSP for processing, and exit.
B7) Determine whether the gene interpretation of the code segment is suitable for GPU-accelerated processing; if so, dispatch the code segment's instructions and data to the GPU for processing and exit; otherwise, jump to step B8).
B8) Dispatch the code segment's instructions and data to the CPU for processing, and exit.
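As with the computing engine, the interpretation-engine dispatch of steps B1) through B8) can be modeled as a decision procedure. The following sketch is illustrative only; the dictionary keys and overhead values are hypothetical names, not terms defined by the patent.

```python
def dispatch_interpret(seg: dict) -> str:
    """Model of steps B1)-B8): choose the device for an interpretation
    segment. `seg` holds hypothetical flags plus per-device total
    overhead estimates (communication + memory access + computation)."""
    def gpu_or_cpu():
        # B7) suitable for GPU acceleration -> GPU, else B8) CPU
        return "GPU" if seg.get("gpu_suitable") else "CPU"
    # B1) neither digital signal processing, non-graphics multimedia,
    # nor graphics/image processing -> straight to the B7) check
    if not (seg.get("signal") or seg.get("multimedia") or seg.get("graphics")):
        return gpu_or_cpu()
    # B2)-B4) graphics/image work takes the DSP path only if the DSP
    # beats the GPU on total overhead or the segment is energy-priority
    if seg.get("graphics"):
        if not (seg["dsp_overhead"] < seg["gpu_overhead"]
                or seg.get("energy_priority")):
            return gpu_or_cpu()
    # B5) DSP cheaper than CPU -> B6) DSP, else B8) CPU
    return "DSP" if seg["dsp_overhead"] < seg["cpu_overhead"] else "CPU"
```

Signal-processing and non-graphics multimedia segments go straight to the DSP-versus-CPU comparison of step B5), while graphics/image segments are first weighed against the GPU.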
Preferably, the detailed steps of step B7) comprise:
B7.1) Determine whether the code segment is graphics/image processing; if so, jump to step B7.3); otherwise, jump to step B7.2).
B7.2) Determine whether the code segment can be executed with data parallelism; if so, jump to step B7.3); otherwise, jump to step B8).
B7.3) Determine whether the total overhead of assigning the code segment to the GPU for optimized execution is less than the total overhead of executing it on the CPU. The total overhead of assigning the code segment to the GPU comprises the communication overhead of the interaction data and instructions exchanged between the CPU and the GPU, the GPU memory-access overhead, and the GPU computation overhead; the total overhead of executing it on the CPU comprises the CPU memory-access overhead and the CPU computation overhead. If the condition holds, jump to step B7.4); otherwise, jump to step B8).
B7.4) Dispatch the code segment's instructions and data to the GPU for processing, and exit.
The heterogeneous platform for gene data computation and interpretation of the present invention has the following advantages:
1. Hardware/software platformization. The heterogeneous platform of the invention is a CPU-based heterogeneous platform augmented with FPGA, GPU, and DSP. Designers can develop all kinds of application flows for gene data computation, gene data interpretation, and combined computation and interpretation without redesigning the hardware system; other open-source or commercial application software for gene data computation and interpretation can be ported, likewise without redesigning the hardware system; and a heterogeneous programming language (such as OpenCL) can be used to unify application development across the whole platform.
2. Good scalability. The platform can be flexibly extended and reconfigured according to differences and changes in application demand.
3. Wide applicability. The platform can serve both as a local processing device for gene data computation and interpretation, and as a processing node for gene data computation and interpretation in a cluster or cloud-computing environment.
4. High configurability. On the software side, the four core components (CPU, FPGA, GPU, and DSP) are all programmable devices. On the hardware side, the FPGA can be incrementally reconfigured on demand even after the system has been finalized, manufactured, and installed, i.e. its functions can be changed and/or extended. In terms of application integration, the organization, scale, and interconnection of the system's components can be configured according to the various application requirements of gene data computation and interpretation and to the respective strengths of the CPU, FPGA, GPU, DSP, and other hardware, so that the components divide the work rationally and cooperate, optimizing the application flow to the greatest benefit. The system thus provides system and application designers with good design flexibility and incremental-configuration capability, and is easily extended to adapt to new applications.
5. Matching the heterogeneous-computing demands of gene data computation and interpretation. The platform matches well, now and in the future, the hardware demands of heterogeneous computing over the various structured and unstructured data fused, processed, and analyzed in gene data computation and interpretation, including text, pictures, speech, audio, video, and other electrical signals.
6. High performance. The platform provides hardware support for high-performance gene data computation and interpretation in three respects: first, it simultaneously provides the hardware needed for task parallelism, data parallelism, and hardware acceleration of algorithms; second, it simultaneously provides the hardware needed for control tasks, transactional tasks, non-data-intensive computation tasks, and data-intensive computation tasks; third, it simultaneously provides the hardware needed for the fused processing and analysis of text, pictures, speech, audio, video, and other electrical signals.
7. Low cost. Compared with existing computer clusters or cloud-computing platforms that process gene data entirely in software, the platform improves performance while reducing the costs of design, storage, networking, power consumption, technical support, and maintenance.
8. Low power consumption. Through the use of the FPGA and the DSP, part of the work of the CPU and the GPU is offloaded, reducing energy consumption while improving performance and achieving functional diversity.
Brief description of the drawings
Fig. 1 is the circuit theory schematic diagram of heterogeneous platform of the embodiment of the present invention.
Fig. 2 is the engine structure schematic diagram of heterogeneous platform of the embodiment of the present invention.
Fig. 3 is the circuit theory schematic diagram of FPGA in heterogeneous platform of the embodiment of the present invention.
Fig. 4 is the scheduling flow schematic diagram that heterogeneous platform of the embodiment of the present invention controls engine.
Fig. 5 is the schematic flow sheet that heterogeneous platform of the embodiment of the present invention dispatches computing engines.
Fig. 6 is that heterogeneous platform of the embodiment of the present invention dispatches the schematic flow sheet that computing engines judge whether to be adapted to GPU to accelerate.
Fig. 7 is the schematic flow sheet of heterogeneous platform of embodiment of the present invention scheduling solution read engine.
Fig. 8 is the schematic flow sheet that heterogeneous platform of embodiment of the present invention scheduling solution read engine judges whether to be adapted to GPU to accelerate.
Reference numerals: 1, heterogeneous processor unit; 11, control engine; 12, computing engine; 13, interpretation engine; 2, interconnect bus module; 21, HCCLink bus module; 22, HNCLink bus module; 3, memory; 4, gene computation/interpretation data and instruction input unit; 5, gene computation/interpretation result output unit.
Embodiment
As shown in Figs. 1 and 2, the heterogeneous platform for gene data computation and interpretation of this embodiment includes a heterogeneous processor unit 1, an interconnect bus module 2, a memory 3, a gene computation/interpretation data and instruction input unit 4, and a gene computation/interpretation result output unit 5. The heterogeneous processor unit 1 is connected through the interconnect bus module 2 to the memory 3, the input unit 4, and the output unit 5. The heterogeneous processor unit 1 includes a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), and an FPGA (Field Programmable Gate Array). The CPU forms the control engine 11; the CPU, GPU, and FPGA together form the computing engine 12; and the CPU, GPU, and DSP together form the interpretation engine 13. The control engine 11 receives gene computation/interpretation data and instructions through the input unit 4 and divides them into code segments. When a code segment's task type is a control task, its instructions and data are dispatched to the CPU for processing. When the task type is a computing task, its instructions and data are dispatched to the computing engine 12 for processing and the results are output through the output unit 5. When the task type is an interpretation task, its instructions and data are dispatched to the interpretation engine 13 for processing and the results are output through the output unit 5.
In this embodiment, there may be one or more CPUs, each containing one or more processor cores (Processor Core); likewise there may be one or more GPUs, one or more DSPs, and one or more FPGAs. Any two devices among the CPUs, GPUs, DSPs, and FPGAs can interconnect and exchange data and instructions through the interconnect bus module 2, and any of them can likewise interconnect and exchange data and instructions with the memory 3, the input unit 4, and the output unit 5 through the interconnect bus module 2. Of course, the bus used to interconnect these components and exchange data and instructions among them is not limited to a specific interconnect scheme; various concrete implementations may be adopted as needed.
As shown in Fig. 3, the FPGA includes a crossbar switch (Crossbar), an I/O control unit, and an accelerator unit. The I/O control unit and the accelerator unit are each connected to the crossbar switch. The accelerator unit includes a hidden Markov model (Hidden Markov Model, HMM) computation accelerator for hardware-accelerating HMM computation and a hash function (Hash function) computation accelerator for hardware-accelerating hash computation. The I/O control unit is connected to the interconnect bus module 2. In this embodiment the crossbar switch specifically uses an Advanced eXtensible Interface (AXI) crossbar. Alternatively, the accelerator unit may contain only the HMM computation accelerator, only the hash function computation accelerator, or additional similar hardware accelerators for hardware-accelerating other computations, as needed.
As shown in Fig. 3, the I/O control unit includes a PCIE (Peripheral Component Interconnect Express) interface, a DMA (Direct Memory Access) controller, a PIU (Peripheral Interface Unit), and a DDR controller. The crossbar switch is connected to the DMA controller, the PIU, and the DDR controller; the DMA controller and the PIU are connected to each other; the PCIE interface is connected to the DMA controller; and the PCIE interface and the DDR controller are each connected to the interconnect bus module 2. The DDR controller provides DDR access and storage for large volumes of data; in this embodiment it is specifically a DDR4 controller. The PCIE interface, the DMA controller, and the PIU cooperate to transfer data and instructions between the FPGA and the CPU and between the FPGA and the GPU. The crossbar switch interconnects the DMA controller, the PIU, the DDR controller, the HMM computation accelerator, the hash function computation accelerator, and any other accelerators, providing paths for data and instruction transfers among them.
As shown in Fig. 1, the interconnect bus module 2 includes an HCCLink (Heterogeneous Computing Cache Coherence Link) bus module 21 and an HNCLink (Heterogeneous Computing Non-Coherence Link) bus module 22. The CPU, GPU, DSP, and FPGA are each connected to the memory 3 through the HCCLink bus module 21, and each connected to the gene computation/interpretation data and instruction input unit 4 and the gene computation/interpretation result output unit 5 through the HNCLink bus module 22. The HCCLink bus module 21 interconnects the CPU, the FPGA, the GPU, and the DSP with the DDR4 memory array to exchange data and instructions. The HNCLink bus module 22 interconnects the CPU, the FPGA, the GPU, and the DSP to exchange control instructions, and interconnects them with the input/output (I/O) devices to exchange data and instructions.
In this embodiment, the memory 3 is a DDR4 memory array (Memory Array).
In this embodiment, the gene computation/interpretation data and instruction input unit 4 includes at least one of an input device, a common interface module, a network interface module, a multimedia input interface module, an external storage device, and a sensor. The input device includes at least one of a keyboard, a mouse, a trackball, and a touchpad; the common interface module includes at least one of a boundary scan interface module and a USB interface module; the network interface module includes at least one of an Ethernet interface module, a Long Term Evolution (LTE) interface module, a Wi-Fi interface module, and a Bluetooth interface module; the multimedia input interface module includes at least one of an analog audio input interface, a digital audio input interface, and a video input interface; the external storage device includes at least one of FLASH memory and a solid-state drive (SSD); and the sensor includes at least one of a temperature sensor, a heart-rate sensor, and a fingerprint sensor.
In this embodiment, the gene computation/interpretation result output unit 5 includes at least one of a display device, a common interface module, a network interface module, a multimedia output interface module, and an external storage device. The display device includes at least one of a cathode-ray tube (CRT), a liquid crystal display (LCD), and a light-emitting diode (LED) display; the common interface module includes at least one of a boundary scan interface module and a USB interface module; the network interface module includes at least one of an Ethernet interface module, an LTE interface module, a Wi-Fi interface module, and a Bluetooth interface module; the multimedia output interface module includes at least one of an analog audio output interface, a digital audio output interface, and a video output interface; and the external storage device includes at least one of FLASH memory and a solid-state drive (SSD).
As shown in Fig. 4, the control engine 11 receives gene computation/interpretation data and instructions through the input unit 4 and divides them into code segments, then performs integrated scheduling of the computing engine 12 (formed by the CPU, GPU, and FPGA) and the interpretation engine 13 (formed by the CPU, GPU, and DSP) according to each code segment's task type: when the task type is a control task, the segment's instructions and data are dispatched to the CPU for processing; when it is a computing task, they are dispatched to the computing engine 12 for processing and the results are output through the output unit 5; when it is an interpretation task, they are dispatched to the interpretation engine 13 for processing and the results are output through the output unit 5.
In this embodiment, the CPU's functions are as follows: scheduling and controlling one or more FPGAs and exchanging data and instructions with them; scheduling and controlling one or more GPUs and exchanging data and instructions with them; scheduling and controlling one or more DSPs and exchanging data and instructions with them; exchanging data and instructions with one or more memories; receiving and processing data and instructions from one or more input devices; sending data and instructions to one or more output devices; in the gene data computation flow, executing scheduling tasks and transaction-type tasks, and cooperating with one or more FPGAs and one or more GPUs to execute gene data computing tasks; in the gene data interpretation flow, executing scheduling tasks and transaction-type tasks, and cooperating with one or more DSPs and one or more GPUs to execute gene data interpretation tasks; in the combined computation-and-interpretation flow, executing scheduling tasks and transaction-type tasks, cooperating with one or more FPGAs and one or more GPUs to execute gene data computing tasks, and cooperating with one or more DSPs and one or more GPUs to execute gene data interpretation tasks.
In this embodiment, the FPGA's functions are as follows: exchanging data and instructions with one or more CPUs; optionally scheduling and controlling one or more GPUs and exchanging data and instructions with them; optionally scheduling and controlling one or more DSPs and exchanging data and instructions with them; exchanging data and instructions with one or more memories; optionally receiving and processing data and instructions from one or more input devices; optionally sending data and instructions to one or more output devices; in the gene data computation flow, cooperating with one or more CPUs and one or more GPUs to execute gene data computing tasks, and optionally executing scheduling tasks and transaction-type tasks; in the gene data interpretation flow, optionally executing scheduling tasks and transaction-type tasks, and optionally cooperating with one or more DSPs and one or more GPUs to execute gene data interpretation tasks; in the combined computation-and-interpretation flow, cooperating with one or more CPUs and one or more GPUs to execute gene data computing tasks, optionally cooperating with one or more DSPs and one or more GPUs to execute gene data interpretation tasks, and optionally executing scheduling tasks and transaction-type tasks.
In this embodiment, the GPU's functions are as follows: exchanging data and instructions with one or more CPUs; optionally exchanging data and instructions with one or more FPGAs; optionally exchanging data and instructions with one or more DSPs; exchanging data and instructions with one or more memories; in the gene data computation flow, cooperating with one or more FPGAs and one or more CPUs to execute gene data computing tasks; in the gene data interpretation flow, cooperating with one or more DSPs and one or more CPUs to execute gene data interpretation tasks; in the combined computation-and-interpretation flow, cooperating with one or more FPGAs and one or more CPUs to execute gene data computing tasks, and cooperating with one or more DSPs and one or more CPUs to execute gene data interpretation tasks.
In this embodiment, the DSP's functions are as follows: exchanging data and instructions with one or more CPUs; optionally exchanging data and instructions with one or more FPGAs; optionally exchanging data and instructions with one or more GPUs; exchanging data and instructions with one or more memories; optionally receiving and processing data and instructions from one or more input devices; optionally sending data and instructions to one or more output devices; in the gene data interpretation flow, cooperating with one or more CPUs and one or more GPUs to execute gene data interpretation tasks; in the combined computation-and-interpretation flow, cooperating with one or more CPUs and one or more GPUs to execute gene data interpretation tasks.
In this embodiment, the functions of the memory 3 are as follows: storing one or more sets of gene sequencing data, which may be raw data and/or compressed data (the compression algorithm is not limited); storing one or more gene reference sequences and their one or more corresponding identifiers; storing one or more sets of known variation data; storing other input data related to gene data computation; storing other input data related to gene data interpretation; storing other input data related to combined gene data computation and interpretation; and storing intermediate results and final data in the gene data computation flow, in the gene data interpretation flow, and in the combined computation-and-interpretation flow. The memory type is not limited, e.g., DDR3 (Double Data Rate 3), DDR4, etc.
In this embodiment, the functions of the gene computation/interpretation data and instruction input unit 4 are as follows: inputting the data and instructions required by the gene data computation flow, by the gene data interpretation flow, and by the combined computation-and-interpretation flow. The type of input device is not limited: it may be, for example, a keyboard (Keyboard), mouse (Mouse), trackball (Trackball), or touchpad (Touch pad); a general-purpose interface such as boundary scan (Joint Test Action Group, JTAG) or USB (Universal Serial Bus); a network port such as Ethernet, Long Term Evolution (LTE), Wireless Fidelity (Wi-Fi), or Bluetooth; a multimedia interface such as an analog audio input interface (e.g., a 3.5 mm stereo three-conductor jack), a digital audio input interface (e.g., the Sony/Philips Digital Interface, S/PDIF), or a video input interface (e.g., the High-Definition Multimedia Interface, HDMI); an external storage device such as FLASH memory or a solid-state drive (Solid State Drive, SSD); or a sensor (Sensor) such as a temperature sensor (measuring body temperature), an optical sensor (measuring heart rate), or a fingerprint sensor (collecting fingerprints). The form of the input data and instructions is likewise not limited: electrical signals, text, pictures, voice, audio, video, etc., and any combination thereof.
In this embodiment, the functions of the gene computation/interpretation result output unit 5 are as follows: outputting the data and instructions generated by the gene data computation flow, by the gene data interpretation flow, and by the combined computation-and-interpretation flow. The type of output device is not limited: it may be, for example, a display device such as a cathode-ray tube (CRT), a liquid crystal display (LCD), or a light-emitting diode (LED) display; a general-purpose interface such as JTAG or USB; a network port such as Ethernet, LTE, Wi-Fi, or Bluetooth; a multimedia interface such as an analog audio output interface (e.g., a 3.5 mm stereo three-conductor jack), a digital audio output interface (e.g., S/PDIF), or a video output interface (e.g., HDMI); or an external storage device such as a solid-state drive (Solid State Drive, SSD). The form of the output data and instructions is likewise not limited: electrical signals, text, pictures, voice, audio, video, etc., and any combination thereof. Referring to Fig. 1, the input unit 4 and the output unit 5 may be implemented with partially shared devices, such as common interface modules, network interface modules, and external storage devices.
As shown in Figs. 4 and 5, the detailed steps of dispatching a code segment's instructions and data to the computing engine 12 include:
A1) Judge whether the code segment can be executed with instruction-level parallelism, whether it can be executed with pipelining, and whether it can be executed with data parallelism; if none of the three is possible, jump to step A7) and exit; otherwise, jump to step A2).
A2) Judge whether the code segment can only be executed with data parallelism; if so, jump to step A3); otherwise, jump to step A6).
A3) Judge whether the total overhead of assigning the code segment to the FPGA for optimized (i.e., parallel, likewise below) execution is lower than the total overhead of assigning it to the GPU for optimized execution. The FPGA total overhead includes the communication overhead of exchanging data and instructions between the CPU and the FPGA, the FPGA's memory access overhead, and the FPGA's computation overhead; the GPU total overhead includes the communication overhead of exchanging data and instructions between the CPU and the GPU, the GPU's memory access overhead, and the GPU's computation overhead. If the condition holds, jump to step A6); otherwise, jump to step A4).
A4) Judge whether the code segment is energy-consumption-priority; if so, jump to step A6); otherwise, jump to step A5).
A5) Judge whether the code segment's gene computation is suitable for GPU-accelerated processing; if so, jump to step A8); otherwise, jump to step A7).
A6) Comprehensively exploit all available FPGA acceleration methods, the acceleration methods including at least one of instruction-level parallelism, pipelining, and data parallelism, and judge whether the total overhead of assigning the code segment to the FPGA for optimized execution is lower than the total overhead of executing it on the CPU; if so, jump to step A9); otherwise, jump to step A7).
A7) Dispatch the code segment's instructions and data to the CPU for processing, then exit.
A8) Dispatch the code segment's instructions and data to the GPU for processing, then exit.
A9) Dispatch the code segment's instructions and data to the FPGA for processing, then exit.
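For illustration only, the branch structure of steps A1) through A9) can be sketched as the following decision function. The attribute names and the pre-estimated per-device overheads (communication plus memory access plus computation) are assumptions of this sketch, not part of the claimed platform.

```python
# Hypothetical sketch of the computing-engine dispatch (steps A1-A9).
# `seg` is assumed to expose boolean capability flags and pre-estimated
# total overheads per device; these names are illustrative only.

def dispatch_compute(seg):
    """Return the processor ("CPU", "GPU", or "FPGA") chosen for a code segment."""
    # A1: a segment supporting none of instruction-level parallelism,
    # pipelining, or data parallelism falls back to the CPU (A7).
    if not (seg.instruction_parallel or seg.pipeline or seg.data_parallel):
        return "CPU"
    # A2: only data parallelism -> compare FPGA against GPU (A3);
    # any other mix goes straight to the FPGA-vs-CPU test (A6).
    only_data_parallel = (seg.data_parallel
                          and not seg.instruction_parallel
                          and not seg.pipeline)
    if only_data_parallel:
        # A3: if the FPGA's total overhead is not lower than the GPU's ...
        if seg.overhead_fpga >= seg.overhead_gpu:
            # A4: ... and the segment is not energy-consumption-priority ...
            if not seg.energy_priority:
                # A5: ... try GPU acceleration (A8), else the CPU (A7).
                return "GPU" if seg.gpu_suitable else "CPU"
    # A6: apply all FPGA acceleration methods; keep the FPGA only if it
    # beats plain CPU execution (A9), otherwise fall back to the CPU (A7).
    return "FPGA" if seg.overhead_fpga < seg.overhead_cpu else "CPU"
```

In practice the overhead estimates would come from a cost model of the CPU-FPGA and CPU-GPU interconnect and of each device's memory system; the sketch treats them as given numbers.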
As shown in Fig. 6, the detailed steps of step A5) include:
A5.1) Judge whether the code segment's gene computation can be executed with data parallelism; if so, jump to step A5.2); otherwise, jump to step A7).
A5.2) Judge whether the total overhead of assigning the code segment to the GPU for optimized execution is lower than the total overhead of executing it on the CPU. The GPU total overhead includes the communication overhead of exchanging data and instructions between the CPU and the GPU, the GPU's memory access overhead, and the GPU's computation overhead; the CPU total overhead includes the CPU's memory access overhead and the CPU's computation overhead. If the condition holds, jump to step A8); otherwise, jump to step A7).
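The two-stage GPU-suitability test of steps A5.1) and A5.2) can be sketched as follows; the attribute names and overhead values are illustrative assumptions, not part of the claims.

```python
# Hypothetical sketch of steps A5.1-A5.2; attribute names are assumptions.

def gpu_suitable_for_compute(seg):
    """Return "GPU" (step A8) or "CPU" (step A7) for a code segment."""
    # A5.1: without data-parallel execution, GPU acceleration is unsuitable.
    if not seg.data_parallel:
        return "CPU"
    # A5.2: the GPU's total overhead (CPU-GPU communication + GPU memory
    # access + GPU computation) must be lower than plain CPU execution
    # (CPU memory access + CPU computation).
    return "GPU" if seg.overhead_gpu < seg.overhead_cpu else "CPU"
```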
As shown in Figs. 4 and 7, the detailed steps of dispatching a code segment's instructions and data to the interpretation engine 13 include:
B1) Judge whether the code segment is digital signal processing, whether it is non-graphics multimedia processing, and whether it is graphics/image processing; if it is none of the three, jump to step B7); otherwise, jump to step B2).
B2) Judge whether the code segment is graphics/image processing; if so, jump to step B3); otherwise, jump to step B5).
B3) Judge whether the total overhead of assigning the code segment to the DSP for optimized execution is lower than the total overhead of assigning it to the GPU for optimized execution. The DSP total overhead includes the communication overhead of exchanging data and instructions between the CPU and the DSP, the DSP's memory access overhead, and the DSP's computation overhead; the GPU total overhead includes the communication overhead of exchanging data and instructions between the CPU and the GPU, the GPU's memory access overhead, and the GPU's computation overhead. If the condition holds, jump to step B5); otherwise, jump to step B4).
B4) Judge whether the code segment is energy-consumption-priority; if so, jump to step B5); otherwise, jump to step B7).
B5) Judge whether the total overhead of assigning the code segment to the DSP for optimized execution is lower than the total overhead of executing it on the CPU, where the CPU total overhead includes the CPU's memory access overhead and the CPU's computation overhead; if so, jump to step B6); otherwise, jump to step B8).
B6) Dispatch the code segment's instructions and data to the DSP for processing, then exit.
B7) Judge whether the code segment's gene interpretation is suitable for GPU-accelerated processing; if so, dispatch the code segment's instructions and data to the GPU for processing and exit; otherwise, jump to step B8).
B8) Dispatch the code segment's instructions and data to the CPU for processing, then exit.
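The interpretation-engine dispatch of steps B1) through B8) can likewise be sketched as a decision function; the attribute names, the pre-estimated overheads, and the folding of the GPU-suitability test of step B7) into a single flag are assumptions of this sketch.

```python
# Hypothetical sketch of the interpretation-engine dispatch (steps B1-B8).
# `seg` is assumed to expose task-category flags and pre-estimated total
# overheads per device; these names are illustrative only.

def dispatch_interpret(seg):
    """Return the processor ("CPU", "GPU", or "DSP") chosen for a code segment."""
    # B1: a segment that is neither digital signal processing, non-graphics
    # multimedia, nor graphics/image work goes to the GPU test (B7/B8).
    if not (seg.signal_processing or seg.multimedia or seg.graphics):
        return "GPU" if seg.gpu_suitable else "CPU"
    # B2: graphics/image segments first compare the DSP against the GPU (B3);
    # the rest go straight to the DSP-vs-CPU test (B5).
    if seg.graphics:
        # B3: if the DSP's total overhead is not lower than the GPU's ...
        if seg.overhead_dsp >= seg.overhead_gpu:
            # B4: ... and the segment is not energy-consumption-priority,
            # try GPU acceleration (B7), else fall through to the CPU (B8).
            if not seg.energy_priority:
                return "GPU" if seg.gpu_suitable else "CPU"
    # B5: keep the DSP only if it beats plain CPU execution (B6 vs B8).
    return "DSP" if seg.overhead_dsp < seg.overhead_cpu else "CPU"
```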
As shown in Fig. 8, the detailed steps of step B7) include:
B7.1) Judge whether the code segment is graphics/image processing; if so, jump to step B7.3); otherwise, jump to step B7.2).
B7.2) Judge whether the code segment can be executed with data parallelism; if so, jump to step B7.3); otherwise, jump to step B8).
B7.3) Judge whether the total overhead of assigning the code segment to the GPU for optimized execution is lower than the total overhead of executing it on the CPU. The GPU total overhead includes the communication overhead of exchanging data and instructions between the CPU and the GPU, the GPU's memory access overhead, and the GPU's computation overhead; the CPU total overhead includes the CPU's memory access overhead and the CPU's computation overhead. If the condition holds, jump to step B7.4); otherwise, jump to step B8).
B7.4) Dispatch the code segment's instructions and data to the GPU for processing, then exit.
In summary, the heterogeneous platform for gene data computation and interpretation of this embodiment can, at lower cost, meet the real-time and accuracy requirements of high-performance gene data computation and the accuracy and readability requirements of highly cognitive gene data interpretation.
The above is merely a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment; all technical schemes within the spirit of the present invention belong to its protection scope. It should be pointed out that, for those of ordinary skill in the art, improvements and modifications that do not depart from the principles of the present invention should also be regarded as falling within the protection scope of the present invention.
Claims (9)
1. A heterogeneous platform for gene data computation and interpretation, characterized in that it comprises a heterogeneous processor unit (1), an interconnect bus module (2), a memory (3), a gene computation/interpretation data and instruction input unit (4), and a gene computation/interpretation result output unit (5); the heterogeneous processor unit (1) is connected through the interconnect bus module (2) to the memory (3), the gene computation/interpretation data and instruction input unit (4), and the gene computation/interpretation result output unit (5); the heterogeneous processor unit (1) comprises a CPU, a GPU, a DSP, and an FPGA, wherein the CPU forms a control engine (11), the CPU, GPU, and FPGA together form a computing engine (12), and the CPU, GPU, and DSP together form an interpretation engine (13); the control engine (11) receives gene computation/interpretation data and instructions through the input unit (4) and divides them into code segments; when a code segment's task type is a control task, its instructions and data are dispatched to the CPU for processing; when the task type is a computing task, its instructions and data are dispatched to the computing engine (12) for processing and the results are output through the output unit (5); when the task type is an interpretation task, its instructions and data are dispatched to the interpretation engine (13) for processing and the results are output through the output unit (5); the detailed steps of dispatching a code segment's instructions and data to the computing engine (12) comprise:
A1) judging whether the code segment can be executed with instruction-level parallelism, whether it can be executed with pipelining, and whether it can be executed with data parallelism; if none of the three is possible, jumping to step A7) and exiting; otherwise, jumping to step A2);
A2) judging whether the code segment can only be executed with data parallelism; if so, jumping to step A3); otherwise, jumping to step A6);
A3) judging whether the total overhead of assigning the code segment to the FPGA for optimized (i.e., parallel, likewise below) execution is lower than the total overhead of assigning it to the GPU for optimized execution, wherein the FPGA total overhead includes the communication overhead of exchanging data and instructions between the CPU and the FPGA, the FPGA's memory access overhead, and the FPGA's computation overhead, and the GPU total overhead includes the communication overhead of exchanging data and instructions between the CPU and the GPU, the GPU's memory access overhead, and the GPU's computation overhead; if the condition holds, jumping to step A6); otherwise, jumping to step A4);
A4) judging whether the code segment is energy-consumption-priority; if so, jumping to step A6); otherwise, jumping to step A5);
A5) judging whether the code segment's gene computation is suitable for GPU-accelerated processing; if so, jumping to step A8); otherwise, jumping to step A7);
A6) comprehensively exploiting all available FPGA acceleration methods, the acceleration methods including at least one of instruction-level parallelism, pipelining, and data parallelism, and judging whether the total overhead of assigning the code segment to the FPGA for optimized execution is lower than the total overhead of executing it on the CPU; if so, jumping to step A9); otherwise, jumping to step A7);
A7) dispatching the code segment's instructions and data to the CPU for processing, then exiting;
A8) dispatching the code segment's instructions and data to the GPU for processing, then exiting;
A9) dispatching the code segment's instructions and data to the FPGA for processing, then exiting.
2. The heterogeneous platform for gene data computation and interpretation according to claim 1, characterized in that: the FPGA comprises a crossbar switch, an I/O control unit, and an accelerator unit; the I/O control unit and the accelerator unit are each connected to the crossbar switch; the accelerator unit comprises at least one of a hidden Markov model computation accelerator for hardware-accelerating hidden Markov model computation and a hash function computation accelerator for hardware-accelerating hash computation; and the I/O control unit is connected to the interconnect bus module (2).
3. The heterogeneous platform for gene data computation and interpretation according to claim 2, characterized in that: the I/O control unit comprises a PCIE interface, a DMA controller, a PIU peripheral interface unit, and a DDR controller; the crossbar switch is connected to the DMA controller, the PIU peripheral interface unit, and the DDR controller; the DMA controller and the PIU peripheral interface unit are connected to each other; the PCIE interface is connected to the DMA controller; and the PCIE interface and the DDR controller are each connected to the interconnect bus module (2).
4. The heterogeneous platform for gene data computation and interpretation according to claim 1, characterized in that: the interconnect bus module (2) comprises an HCCLink bus module (21) and an HNCLink bus module (22); the CPU, GPU, DSP, and FPGA are each connected to the memory (3) through the HCCLink bus module (21), and are each connected to the gene computation/interpretation data and instruction input unit (4) and the gene computation/interpretation result output unit (5) through the HNCLink bus module (22).
5. The heterogeneous platform for gene data computation and interpretation according to claim 1, characterized in that: the gene computation/interpretation data and instruction input unit (4) comprises at least one of an input device, a common interface module, a network interface module, a multimedia input interface module, an external storage device, and a sensor.
6. The heterogeneous platform for gene data computation and interpretation according to claim 1, characterized in that: the gene computation/interpretation result output unit (5) comprises at least one of a display device, a common interface module, a network interface module, a multimedia output interface module, and an external storage device.
7. The heterogeneous platform for gene data computation and interpretation according to claim 1, characterized in that: the detailed steps of step A5) include:
A5.1) determining whether the gene computation of the code segment can be executed in a data-parallel manner; if so, jumping to step A5.2); otherwise, jumping to step A7);
A5.2) determining whether the total overhead of dispatching the code segment to the GPU for optimized execution is less than the total overhead of executing the code segment on the CPU, wherein the total overhead of dispatching the code segment to the GPU for optimized execution includes the communication overhead caused by exchanging data and instructions between the CPU and the GPU, the memory access overhead of the GPU, and the computation overhead of the GPU, and the total overhead of executing the code segment on the CPU includes the memory access overhead of the CPU and the computation overhead of the CPU; if so, jumping to step A8); otherwise, jumping to step A7).
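The two-stage test of steps A5.1) and A5.2) can be sketched as follows. This is a minimal illustration only: the field names (`data_parallel`, `cpu_gpu_comm`, etc.) and the numeric cost figures are hypothetical, not taken from the patent, which does not specify how the overhead terms are estimated.

```python
# Illustrative sketch of steps A5.1)/A5.2): a code segment is a GPU
# candidate only if it is data-parallel AND the GPU's total overhead
# (communication + memory access + compute) undercuts the CPU's.
# All dictionary keys and values below are hypothetical.

def gpu_total_overhead(seg):
    # CPU<->GPU data/instruction communication + GPU memory access + GPU compute
    return seg["cpu_gpu_comm"] + seg["gpu_mem"] + seg["gpu_compute"]

def cpu_total_overhead(seg):
    # CPU memory access + CPU compute
    return seg["cpu_mem"] + seg["cpu_compute"]

def step_a5(seg):
    # A5.1) only data-parallel gene computations are GPU candidates
    if not seg["data_parallel"]:
        return "A7"  # fall through to step A7)
    # A5.2) dispatch to the GPU only if its total overhead is strictly lower
    if gpu_total_overhead(seg) < cpu_total_overhead(seg):
        return "A8"  # dispatch to the GPU (step A8))
    return "A7"

seg = {"data_parallel": True,
       "cpu_gpu_comm": 3, "gpu_mem": 2, "gpu_compute": 1,
       "cpu_mem": 4, "cpu_compute": 8}
print(step_a5(seg))  # GPU total 6 beats CPU total 12 here
```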
8. The heterogeneous platform for gene data computation and interpretation according to claim 1, characterized in that: the detailed steps of the instruction and data scheduling and interpretation engine (13) processing the instructions and data of the code segment include:
B1) determining whether the code segment is digital signal processing, non-graphics/image multimedia processing, or graphics/image processing; if it is none of the three, jumping to step B7); otherwise, jumping to step B2);
B2) determining whether the code segment is graphics/image processing; if so, jumping to step B3); otherwise, jumping to step B5);
B3) determining whether the total overhead of dispatching the code segment to the DSP for optimized execution is less than the total overhead of dispatching the code segment to the GPU for optimized execution, wherein the total overhead of dispatching the code segment to the DSP for optimized execution includes the communication overhead caused by exchanging data and instructions between the CPU and the DSP, the memory access overhead of the DSP, and the computation overhead of the DSP, and the total overhead of dispatching the code segment to the GPU for optimized execution includes the communication overhead caused by exchanging data and instructions between the CPU and the GPU, the memory access overhead of the GPU, and the computation overhead of the GPU; if so, jumping to step B5); otherwise, jumping to step B4);
B4) determining whether the code segment is energy-consumption-prioritized; if so, jumping to step B5); otherwise, jumping to step B7);
B5) determining whether the total overhead of dispatching the code segment to the DSP for optimized execution is less than the total overhead of executing the code segment on the CPU, wherein the total overhead of executing the code segment on the CPU includes the memory access overhead of the CPU and the computation overhead of the CPU; if so, jumping to step B6); otherwise, jumping to step B8);
B6) dispatching the instructions and data of the code segment to the DSP for processing, and exiting;
B7) determining whether the gene interpretation of the code segment is suitable for GPU-accelerated processing; if so, dispatching the instructions and data of the code segment to the GPU for processing and exiting; otherwise, jumping to step B8);
B8) dispatching the instructions and data of the code segment to the CPU for processing, and exiting.
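The branching of steps B1) through B8) amounts to a small decision tree over the code segment's workload class, energy-priority flag, and per-device cost estimates. The sketch below is an illustration under assumptions: the workload-class labels, field names, and cost figures are invented for the example, and the B7) GPU-suitability test is reduced to a precomputed flag.

```python
# Illustrative sketch of the B1)-B8) dispatch tree of the scheduling and
# interpretation engine. All labels, keys, and values are hypothetical.

DSP_CLASSES = {"dsp", "multimedia", "graphics"}  # the three B1) categories

def dsp_cost(seg):   # CPU<->DSP communication + DSP memory access + DSP compute
    return seg["cpu_dsp_comm"] + seg["dsp_mem"] + seg["dsp_compute"]

def gpu_cost(seg):   # CPU<->GPU communication + GPU memory access + GPU compute
    return seg["cpu_gpu_comm"] + seg["gpu_mem"] + seg["gpu_compute"]

def cpu_cost(seg):   # CPU memory access + CPU compute
    return seg["cpu_mem"] + seg["cpu_compute"]

def b5(seg):
    # B5) DSP cheaper than CPU -> B6) dispatch to DSP; else B8) CPU
    return "DSP" if dsp_cost(seg) < cpu_cost(seg) else "CPU"

def b7(seg):
    # B7) gene interpretation suitable for GPU acceleration -> GPU; else CPU
    return "GPU" if seg["gpu_suitable"] else "CPU"

def schedule(seg):
    # B1) none of the three DSP-friendly classes -> consider the GPU (B7))
    if seg["kind"] not in DSP_CLASSES:
        return b7(seg)
    # B2) graphics/image work first compares DSP against GPU
    if seg["kind"] == "graphics":
        # B3) DSP cheaper than GPU -> compare DSP against CPU (B5))
        if dsp_cost(seg) < gpu_cost(seg):
            return b5(seg)
        # B4) otherwise only energy-priority segments still try the DSP
        return b5(seg) if seg["energy_priority"] else b7(seg)
    return b5(seg)  # dsp / multimedia classes go straight to B5)

seg = {"kind": "graphics", "energy_priority": False, "gpu_suitable": True,
       "cpu_dsp_comm": 1, "dsp_mem": 1, "dsp_compute": 1,
       "cpu_gpu_comm": 2, "gpu_mem": 2, "gpu_compute": 2,
       "cpu_mem": 5, "cpu_compute": 5}
print(schedule(seg))  # DSP total 3 beats both GPU (6) and CPU (10)
```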
9. The heterogeneous platform for gene data computation and interpretation according to claim 8, characterized in that: the detailed steps of step B7) include:
B7.1) determining whether the code segment is graphics/image processing; if so, jumping to step B7.3); otherwise, jumping to step B7.2);
B7.2) determining whether the code segment can be executed in a data-parallel manner; if so, jumping to step B7.3); otherwise, jumping to step B8);
B7.3) determining whether the total overhead of dispatching the code segment to the GPU for optimized execution is less than the total overhead of executing the code segment on the CPU, wherein the total overhead of dispatching the code segment to the GPU for optimized execution includes the communication overhead caused by exchanging data and instructions between the CPU and the GPU, the memory access overhead of the GPU, and the computation overhead of the GPU, and the total overhead of executing the code segment on the CPU includes the memory access overhead of the CPU and the computation overhead of the CPU; if so, jumping to step B7.4); otherwise, jumping to step B8);
B7.4) dispatching the instructions and data of the code segment to the GPU for processing, and exiting.
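The GPU-suitability test of steps B7.1) through B7.4) can be sketched in isolation as follows; as before, the field names and cost figures are hypothetical placeholders, since the patent does not define how the overhead terms are measured.

```python
# Illustrative sketch of steps B7.1)-B7.4): a code segment reaches the GPU
# only if it is graphics/image work OR data-parallel, AND the GPU's total
# overhead undercuts the CPU's. All keys and values are hypothetical.

def step_b7(seg):
    # B7.1) graphics/image processing skips the data-parallel test
    if seg["kind"] != "graphics":
        # B7.2) a non-graphics segment must be data-parallel to qualify
        if not seg["data_parallel"]:
            return "CPU"  # B8) fall back to the CPU
    # B7.3) CPU<->GPU communication + GPU memory access + GPU compute
    gpu = seg["cpu_gpu_comm"] + seg["gpu_mem"] + seg["gpu_compute"]
    cpu = seg["cpu_mem"] + seg["cpu_compute"]  # CPU memory access + compute
    # B7.4) dispatch to the GPU only when it is strictly cheaper
    return "GPU" if gpu < cpu else "CPU"
```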
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710055557.9A CN106886690B (en) | 2017-01-25 | 2017-01-25 | A heterogeneous platform for gene data computation and interpretation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106886690A CN106886690A (en) | 2017-06-23 |
CN106886690B true CN106886690B (en) | 2018-03-09 |
Family
ID=59175921
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710055557.9A Active CN106886690B (en) | 2017-01-25 | 2017-01-25 | A heterogeneous platform for gene data computation and interpretation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106886690B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108153190B (en) * | 2017-12-20 | 2020-05-05 | 新大陆数字技术股份有限公司 | Artificial intelligence microprocessor |
CN110659112B (en) * | 2018-06-29 | 2023-07-18 | 中车株洲电力机车研究所有限公司 | Algorithm scheduling method and system |
CN109785905B (en) * | 2018-12-18 | 2021-07-23 | 中国科学院计算技术研究所 | Accelerating device for gene comparison algorithm |
CN110333946A (en) * | 2019-05-14 | 2019-10-15 | 王娅雯 | An artificial-intelligence-CPU-based data processing system and method |
CN110427262B (en) * | 2019-09-26 | 2020-05-15 | 深圳华大基因科技服务有限公司 | Gene data analysis method and heterogeneous scheduling platform |
CN111160546B (en) * | 2019-12-31 | 2023-06-13 | 深圳云天励飞技术有限公司 | Data processing system |
CN111506540B (en) * | 2020-04-24 | 2021-11-30 | 中国电子科技集团公司第五十八研究所 | Hardware programmable heterogeneous multi-core system on chip |
CN113268270B (en) * | 2021-06-07 | 2022-10-21 | 中科计算技术西部研究院 | Acceleration method, system and device for paired hidden Markov models |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120117318A1 (en) * | 2010-11-05 | 2012-05-10 | Src Computers, Inc. | Heterogeneous computing system comprising a switch/network adapter port interface utilizing load-reduced dual in-line memory modules (lr-dimms) incorporating isolation memory buffers |
CN102902511A (en) * | 2011-12-23 | 2013-01-30 | 同济大学 | Parallel information processing system |
CN103310125A (en) * | 2012-03-06 | 2013-09-18 | 宁康 | High-performance metagenomic data analysis system on basis of GPGPU (General Purpose Graphics Processing Units) and multi-core CPU (Central Processing Unit) hardware |
CN103020002B (en) * | 2012-11-27 | 2015-11-18 | 中国人民解放军信息工程大学 | Reconfigurable multiprocessor system |
CN104021042A (en) * | 2014-06-18 | 2014-09-03 | 哈尔滨工业大学 | Heterogeneous multi-core processor based on ARM, DSP and FPGA and task scheduling method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106886690B (en) | A heterogeneous platform for gene data computation and interpretation | |
CN107066802B (en) | A heterogeneous platform for gene data computation | |
CN105378494B (en) | FPGA-based test architecture with multiple hardware accelerator blocks for independently testing multiple DUTs | |
Chen et al. | An escheduler-based data dependence analysis and task scheduling for parallel circuit simulation | |
CN106897581B (en) | A reconfigurable heterogeneous platform for gene data interpretation | |
Yang et al. | An efficient parallel algorithm for longest common subsequence problem on gpus | |
CN105046109B (en) | An acceleration platform for bioinformatics sequence analysis | |
CN110427262A (en) | A gene data analysis method and heterogeneous scheduling platform | |
JP7387191B2 (en) | Systems containing hybrid quantum computers | |
Solis-Vasquez et al. | Evaluating the energy efficiency of OpenCL-accelerated AutoDock molecular docking | |
Zhou et al. | Gcnear: A hybrid architecture for efficient gcn training with near-memory processing | |
Noh et al. | FlexBlock: A flexible DNN training accelerator with multi-mode block floating point support | |
Jia et al. | Exploiting approximate feature extraction via genetic programming for hardware acceleration in a heterogeneous microprocessor | |
CN106897582B (en) | A heterogeneous platform for gene data interpretation | |
Inadagbo et al. | Exploiting FPGA capabilities for accelerated biomedical computing | |
Zhang et al. | CUDA-based Jacobi's iterative method | |
Cai et al. | A Design of FPGA Acceleration System for Myers bit-vector based on OpenCL | |
Vijayaraghavan et al. | MPU-BWM: Accelerating sequence alignment | |
Su et al. | Computing infrastructure construction and optimization for high-performance computing and artificial intelligence | |
Zhang et al. | Task scheduling greedy heuristics for GPU heterogeneous cluster Involving the weights of the processor | |
Luo et al. | Software engineering database programming control system based on embedded system | |
Chen et al. | eSSpMV: An Embedded-FPGA-based Hardware Accelerator for Symmetric Sparse Matrix-Vector Multiplication | |
Sun et al. | How much can we gain from Tensor Kernel Fusion on GPUs? | |
Bui et al. | Heterogeneous computing and the real-world applications | |
Chen et al. | Acceleration of Bucket-Assisted Fast Sample Entropy for Biomedical Signal Analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||