CN107168795A - Codon Deviation Coefficient model method based on a CPU-GPU heterogeneous combined parallel computing framework - Google Patents

Codon Deviation Coefficient model method based on a CPU-GPU heterogeneous combined parallel computing framework Download PDF

Info

Publication number
CN107168795A
CN107168795A (application CN201710332575.7A)
Authority
CN
China
Prior art keywords
node
file
hdfs
gpu
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710332575.7A
Other languages
Chinese (zh)
Other versions
CN107168795B (en)
Inventor
章乐
陈镜行
丁维龙
荆晨阳
冯计平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University
Original Assignee
Southwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University
Priority to CN201710332575.7A
Publication of CN107168795A
Application granted
Publication of CN107168795B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network

Abstract

The invention provides a codon deviation coefficient implementation method based on a CPU-GPU heterogeneous combined parallel computing framework, comprising: building a cluster on blade servers, each blade server serving as one node, and setting the master node as both master and slave node; building the CUDA framework environment and configuring the network; preprocessing the submitted batch-job directory and outputting, to a specified directory, a task list containing the actual HDFS storage location of each pending task file; defining division rules for the task list and dispatching the resulting pieces to the nodes for processing; obtaining the HDFS path of each individual task file, downloading the file to the local node, and assembling the command request; sending the file content and the command request to the GPU server; and, once the computation on the GPU server side is observed to have completed, downloading the output file locally and uploading it to HDFS. The present invention realizes job division, parallel processing and in-memory computation for batch job submission, improving the efficiency of the codon usage bias algorithm.

Description

Codon Deviation Coefficient model method based on a CPU-GPU heterogeneous combined parallel computing framework
Technical field
The present invention relates to a method for optimizing the efficiency of the codon deviation coefficient model by building a CPU-GPU heterogeneous combined parallel computing framework, and belongs to the field of biological big data.
Background technology
With the development of bioinformatics, researchers study the structure and organization of biomolecular information in depth to learn the rules encoded in genomic hereditary information. With the continuing innovation of sequencing technology, today's third-generation sequencing has become more accurate, faster and cheaper. Taking the HISEQ X TEN sequencer as an example, it has driven rapid progress in the analysis of hereditary information, but at the same time the storage and analysis efficiency of massive gene data have become pressing problems.
Two approaches exist for improving the efficiency of gene data analysis: one relies on mathematics and statistics, improving the algorithms themselves; the other exploits the physical characteristics of GPU graphics cards, using their powerful high-speed parallel ability to accelerate computation. The CUDA-based implementation combines both to some extent and relieves the pressure of large-scale biological data computation: single-machine parallel computation of the codon usage bias algorithm (CAT) was realized on CUDA-capable GPUs, fully exploiting the high computing performance of graphics processing and achieving a 200-fold speed-up. For now, however, biological databases such as NCBI, EBI and DDBJ keep accumulating gene data of every type, and the growth of data volume shows no sign of stopping. New breakthroughs are therefore urgently needed for data analysis, processing and storage.
Recently, with the proposal of the grid computing concept and the realization of cloud computing, the latter can to some extent be understood as network sharing of software and hardware resources in distributed clusters. Hadoop, as one of today's mainstream cloud computing frameworks, possesses outstanding characteristics: high reliability (bit-level storage and powerful data-processing capability), high scalability (data is distributed and computing tasks are completed across available computer clusters, which can easily be extended to thousands of compute nodes, so that for a given cluster the number of nodes is easily expanded), high efficiency (data can be moved dynamically within the cluster, and the dynamic balance of each node can be effectively guaranteed), and high fault tolerance (multiple copies of data are saved automatically to ensure data safety and integrity). Hadoop also supports development in Java, C, C++, Python and other languages, giving it very high ease of use. Its modest hardware requirements for nodes keep the cost of building and operating a Hadoop cluster relatively low, improving cost-effectiveness.
In the prior art, the major schemes currently used include the following:
Within bioinformatics itself, researchers have optimized the applicability problems of traditional algorithms through mathematical models, and realized the codon deviation coefficient model CDC in the self-developed codon component analysis toolbox CAT. Researchers in GPU parallel acceleration then used the CUDA programming model of NVIDIA to parallelize those modules of the CDC algorithm in CAT that carry no data-dependence relations, and released the CUDA-optimized software CUDA-CDC. In prior art such as the technical schemes described in publications CN102708088A, CN104536937A, CN104731569A and CN105335135A, cluster architectures are built by setting fixed master and slave nodes.
The main defects of the prior art are the following:
(1) The serial processing of multiple current tasks cannot be resolved.
(2) The blade servers currently employed cannot adopt a CPU-GPU heterogeneous mode, because GPU devices cannot be fitted into them.
Summary of the invention
In view of this, to solve the above problems in the prior art, the present invention provides the following technical scheme:
The invention provides a codon deviation coefficient implementation method based on a CPU-GPU heterogeneous combined parallel computing framework, characterized in that the method comprises the following steps:
Step 1: set up a Hadoop cluster on blade servers, each blade server serving as one node; and set the master node as both master node and slave node;
Step 2: build the CUDA framework environment on a tower server, and set up a network configuration that allows communication with each node of the Hadoop cluster;
Step 3: implement a preprocessing module that preprocesses the submitted batch-job directory and outputs, under a specified directory, a task list containing the actual HDFS storage location of each pending task file;
Step 4: define the division rules for the task list through the MapReduce framework, start Map tasks on the master node, turn each task in the task list into a split instance, and send it to a node for processing;
Step 5: obtain the HDFS path of an individual task file through the input of the Reduce module, download the file to the local node, and assemble the command request to be processed on the GPU server; after sending the file content and the command request to the GPU server, execute the remote control command;
Step 6: after the computation of the CUDA-CDC program on the GPU server side is observed to have completed, download the output file under the specified output directory to the local node, and upload it to HDFS.
Preferably, in Step 2, the number of graphics computing cards installed in the tower server is identical to the number of blade servers.
Preferably, in Step 3, the preprocessing of the batch-job directory specifically includes: traversing the job file directory the user has committed to HDFS, searching for fasta files that meet the input-file requirements of CUDA-CDC; for each file found by name, splicing the directory string of the file in HDFS with the file name to obtain the full path at which the file is stored in HDFS, and writing the string representing the full path as one line of the task list in HDFS.
Preferably, the preprocessing can be carried out in detail in the following way: a recursive algorithm traverses the job file directory under the /input directory the user has committed to HDFS and searches for fasta files meeting the input-file requirements of CUDA-CDC; for each file found, the directory string of the file in HDFS is spliced with the file name to obtain the full path at which the file is stored in HDFS, and the string representing the full path is written as one line into the task.txt file under the /input directory in HDFS. The task.txt file then serves as the input of the getSplits and createRecordReader functions of the TextInputFormat class.
Preferably, in Step 4, the division rule is specifically: using a divide-by-rows rule, one record is created for every N lines of content in the task list, yielding the HDFS storage locations of the pending tasks; N is a natural number.
Preferably, the division rule can be carried out in detail in the following way: since the writing rule of the task.txt file is that each line records the complete HDFS store path of one pending fasta file, the default line-based division rule of TextInputFormat is adopted when dividing the task list file, creating one RecordReader record per line of content. All records, i.e. the HDFS storage locations of the fasta files to be handled, are thus obtained and serve as the input of the Reduce function. Other optional division rules can be set according to different systems and cluster environments, for example taking N lines of file content as one group record and dividing in a packed form, which means that the Reduce function, on receiving a RecordReader, must serially process N files.
Preferably, in Step 5, assembling the command request to be processed on the GPU server specifically includes the following steps: in the remote command execution procedure, define the IP, user name and password of the host on which the remote command executes, together with the complete command string; for the fasta file-analysis command, define the title and input parameters of the software by absolute path.
Preferably, Step 5 also includes: by defining an identical output path, recording the positions of the output files so as to ease the recovery of the classification results; by string concatenation, obtaining a command-line string described by absolute paths that contains the software information, the input information and the output information.
Preferably, the cluster system structure is specifically: three sub-clusters comprising the master node, the slave nodes and the computing service node; unlike previous cluster systems, the Map side establishes no communication with the GPU side, while the Reduce side communicates with the GPU side using JSCH, thoroughly separating the computing environments of the Map and Reduce sides from that of the GPU side for cooperative processing.
Compared with the prior art, the technical scheme of the present invention, through research and analysis of the codon usage bias algorithm (CAT) and of its CUDA implementation (CUDA-CDC), and through feasibility analysis of Hadoop parallel task division for a variety of situations, combines the distributed cluster environment with remote SSH service to the CUDA server to realize genuine job division, parallel processing and in-memory computation in the case of batch job submission, integrating the computing and storage resources of each node in the cluster and further improving the efficiency of the codon usage bias algorithm.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical schemes of the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is the cluster network structure diagram of the embodiment of the present invention;
Fig. 2 is the cluster framework design diagram of the embodiment of the present invention;
Fig. 3 is the overall design diagram of the embodiment of the present invention;
Fig. 4 is the data preprocessing stage diagram of the embodiment of the present invention;
Fig. 5 is the UML detailed design diagram of the CAT framework of the embodiment of the present invention;
Fig. 6 is the running-time line chart of the first data set of the embodiment of the present invention;
Fig. 7 is the running-time ratio chart of the first data set of the embodiment of the present invention;
Fig. 8 is the running-time line chart of the second data set of the embodiment of the present invention;
Fig. 9 is the running-time ratio chart of the second data set of the embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the described embodiments are only a part of the embodiments of the invention, not all of them. Based on the embodiments of the invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the scope of protection of the invention.
Those skilled in the art should understand that the following specific embodiments are a series of optimized arrangements enumerated to further explain the specific content of the invention, and that those arrangements can be combined with or related to one another in use, unless the invention clearly states that a certain specific embodiment cannot be associated or jointly used with another embodiment. Meanwhile, the following specific embodiments serve only as optimized arrangements and are not to be understood as limiting the scope of protection of the invention.
Embodiment 1
In a specific embodiment, the corresponding system of the invention can be built as a multi-machine system according to Fig. 1 and Fig. 2: four Dell M820 blade servers each independently install the CentOS 7 operating system, and one AMAX tower server is fitted with 4 NVIDIA graphics computing cards and configured with the CUDA environment. The five machines are then given IP addresses on the same network segment to ensure normal communication. The 4 blade servers are respectively set as master, slave1, slave2 and slave3. As in an ordinary Hadoop cluster, the three slave machines are set as slave nodes of the cluster; the difference is that, to match the number of GPU cards and guarantee that every GPU resource is utilized during multi-task concurrency, master is also written into the slaves file of the cluster configuration, so that it is used simultaneously as master and slave.
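For illustration, the slaves file of such a configuration could look as follows (a minimal sketch; the $HADOOP_HOME/etc/hadoop/slaves location assumes a Hadoop 2.x layout, and the host names mirror those above):

```
# $HADOOP_HOME/etc/hadoop/slaves
# master is listed alongside the three slave machines, so that it also
# runs as a slave node and the four nodes match the four GPU cards.
master
slave1
slave2
slave3
```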
Embodiment 2
The present design realizes a genuine GPU-CPU heterogeneous combined parallel computing framework and studies its optimization of the codon deviation coefficient model in the field of codon usage bias. It extends the GPU distributed cluster model to nodes in which GPU devices cannot be mounted directly, increasing the applicability of the combined cluster.
Specifically, the invention provides a new combined parallel computing framework. Through distributed division of computing tasks, multiple jobs submitted by the user are processed in a distributed manner; after distribution to the nodes, each task is sent over the network to the designated GPU server, where the efficient processing of the single job is carried out with the CUDA-CDC software. After processing, the data are recovered to the node and uploaded to HDFS.
In a specific embodiment, the method of the invention can be realized through one of the following preferred embodiments. Referring to Fig. 3, which shows the implementation flow of the codon usage bias algorithm based on the GPU-CPU heterogeneous combined parallel computing framework provided by the embodiment of the present invention, the process is detailed as follows:
Step 1: set up the Hadoop cluster on the blade servers, each blade server serving as one node. Differently from conventional configurations, Master is set as both master node and slave node, ensuring that, in case of need, it can also take part in distributed task processing, maximizing the utilization of the cluster.
In a specific embodiment, in Step 1, a computer cluster is built comprising a Hadoop server cluster composed of blade servers and a GPU server cluster composed of a tower server. There is only one GPU server, but it is equipped with a number of GPU graphics computing cards matching the number of blade servers.
The Hadoop server cluster contains four blade-server nodes, of which one is configured as the Master node; at the same time, that node and the other three nodes are each configured as Slave nodes. Each node has independent memory and storage space.
Step 2: build the CUDA framework environment on the tower server, and set up a network configuration that allows communication with each node of the Hadoop cluster.
In a specific embodiment, in Step 2, a network communication channel to the GPU server is established for each node. Exemplarily, each node realizes network communication with, and remote command control of, the GPU server through JSCH.
It should be noted that the GPU computing resources required for the CUDA optimization of the algorithm are not inside the Hadoop cluster; computation is provided by a dedicated GPU server reached by remote command execution. The advantage of this design is that no hardware environment is forced upon the Hadoop cluster itself. After each stage module, the program can be adapted to other computing cases through simple modification of the remote commands.
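By way of illustration, a minimal JSCH session from a node to the GPU server might be opened as in the following sketch (the host address and credentials are hypothetical placeholders; only the user name ezio is borrowed from the directory paths quoted later):

```java
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class GpuServerConnection {
    // Hypothetical connection parameters; real values would come from
    // the cluster network configuration described in Step 2.
    static final String GPU_HOST = "192.168.1.100";
    static final String USER = "ezio";
    static final String PASSWORD = "secret";

    public static Session openSession() throws Exception {
        JSch jsch = new JSch();
        Session session = jsch.getSession(USER, GPU_HOST, 22);
        session.setPassword(PASSWORD);
        // Skip host-key verification inside the trusted cluster network.
        session.setConfig("StrictHostKeyChecking", "no");
        session.connect();
        return session;
    }
}
```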
Step 3: through the preprocessing module, preprocess the batch-job directory submitted by the user, and output, under the specified directory, a task list containing the actual HDFS storage location of each pending task file.
In a specific embodiment, in Step 3, a filtering directory-traversal algorithm is chosen to traverse the directory submitted by the user, search out all pending fasta files, and record them uniformly in the specified task list file. Exemplarily, the file /input/task.txt in HDFS is chosen.
As the specific preprocessing, in a detailed embodiment, a recursive algorithm traverses the job file directory under the /input directory the user has committed to HDFS and searches for fasta files that meet the input-file requirements of CUDA-CDC. For each file found by name, the directory string of the file in HDFS is spliced with the file name to obtain the full path at which the file is stored in HDFS, and the string representing the full path is written as one line into the task.txt file under the /input directory in HDFS. The task.txt file then serves as the input of the getSplits and createRecordReader functions of the TextInputFormat class.
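A minimal sketch of this preprocessing with the Hadoop FileSystem API might look as follows (the .fasta suffix test and the class layout are simplifying assumptions; the /input and task.txt locations follow the description above):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ArrangeTaskRequest {
    /** Recursively collect the full HDFS paths of all fasta files under dir. */
    static void collectFasta(FileSystem fs, Path dir, List<String> out) throws Exception {
        for (FileStatus st : fs.listStatus(dir)) {
            if (st.isDirectory()) {
                collectFasta(fs, st.getPath(), out);
            } else if (st.getPath().getName().endsWith(".fasta")) {
                out.add(st.getPath().toString()); // directory string spliced with file name
            }
        }
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        List<String> tasks = new ArrayList<>();
        collectFasta(fs, new Path("/input"), tasks);
        // One full path per line: the input of getSplits/createRecordReader.
        try (FSDataOutputStream outStream = fs.create(new Path("/input/task.txt"), true)) {
            for (String t : tasks) {
                outStream.writeBytes(t + "\n");
            }
        }
    }
}
```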
Step 4: through the MapReduce framework, define the division rules for the task list, start Map tasks on the master node, turn each task in the task list into a split instance, and send it to a node for processing.
In a specific embodiment, in Step 4, the content of the task list file is divided. Exemplarily, the line-based division rule of the TextInputFormat class is chosen, matching the rule of one file path and name per line.
Since the writing rule of the task.txt file is that each line records the complete HDFS store path of one pending fasta file, the default line-based division rule of TextInputFormat is adopted when dividing the task list file, creating one RecordReader record per line of content. All records, i.e. the HDFS storage locations of the fasta files to be handled, are thus obtained and serve as the input of the Reduce function. Other optional division rules can be set according to different systems and cluster environments, for example taking N lines of file content as one group record and dividing in a packed form, which means that the Reduce function, on receiving a RecordReader, must serially process N files.
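As a hedged sketch, the packed N-line variant maps onto Hadoop's stock NLineInputFormat; the driver below is an assumption-laden illustration, not the patented program (the class names CAT, CAT_Mapper and CAT_Reducer follow Fig. 5; the /output path and the choice of NLineInputFormat are assumptions):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CatDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "Hadoop-CUDA-CDC");
        job.setJarByClass(CatDriver.class);
        job.setMapperClass(CAT.CAT_Mapper.class);
        job.setReducerClass(CAT.CAT_Reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Default division: TextInputFormat, one record per line of task.txt.
        // The packed variant uses NLineInputFormat so that each split carries
        // N lines, i.e. each Reduce call serially processes N files.
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.setNumLinesPerSplit(job, 1); // N = 1 reproduces the per-line rule
        NLineInputFormat.addInputPath(job, new Path("/input/task.txt"));

        FileOutputFormat.setOutputPath(job, new Path("/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```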
Step 5: define the Reduce module. The HDFS path of an individual task file is obtained through the input of the module; after the file is downloaded to the local node, the command request to be processed on the GPU server is assembled. After the file content and the command request are sent to the GPU server through the JSCH communication framework, the remote control command is executed.
In a specific embodiment, in Step 5, the divided subtask splits are input to the Reduce side for processing. It should be noted that what the Reduce side receives here is merely the HDFS storage information of the files to be processed, not the files themselves. The advantage is that excessive big-data transfer between the Map side and the Reduce side is avoided.
After receiving a split, the Reduce side downloads the file from HDFS to the local node according to the file information. It should be explained that the GPU server has no communication with HDFS, while the actual computation does happen on the GPU server, so the data needed during computation must be uploaded by the Slave node itself when the request is sent. Exemplarily, the SFTP mode of JSCH is employed here to upload to the /home/ezio/Documents/fromNode/input directory on the GPU server. After the data transfer is finished, the data are processed through remote commands. Exemplarily, the exec mode of JSCH is chosen for processing, and the execution result is awaited and output into the assigned directory. Exemplarily, the local storage directory /home/ezio/Documents/fromNode/output of the GPU server is chosen.
The JSCH framework is used when calling the GPU server; as the Java realization of the SSH framework, JSCH retains the SSH mode of operation. During remote command execution, the IP, user name and password of the host on which the remote command executes must be defined, together with the complete command string. The fasta file-analysis command based on CUDA-CDC must, according to the software usage, define the title and input parameters of the software by absolute path. Beyond these two required contents, and for convenient recovery of the analysis results, an identical output path is defined to record the positions of the output files. Finally, as described above, a command-line string described by absolute paths and containing the software information, the input information and the output information is obtained by string concatenation.
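An illustrative sketch of this upload-and-execute sequence is given below (the input and output directories follow the paths quoted above; the CUDA-CDC binary location and its -i/-o options are assumptions):

```java
import java.io.InputStream;
import com.jcraft.jsch.ChannelExec;
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.Session;

public class RemoteCdcTask {
    public static void run(Session session, String localFasta, String fileName) throws Exception {
        // 1. SFTP mode: push the input file to the GPU server.
        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();
        sftp.put(localFasta, "/home/ezio/Documents/fromNode/input/" + fileName);
        sftp.disconnect();

        // 2. Assemble the command line by string concatenation: software path,
        //    input and output, all as absolute paths (binary path and option
        //    names are assumptions).
        String cmd = "/home/ezio/cuda-cdc/CUDA-CDC"
                + " -i /home/ezio/Documents/fromNode/input/" + fileName
                + " -o /home/ezio/Documents/fromNode/output/" + fileName + ".out";

        // 3. Exec mode: run the command remotely and wait for completion.
        ChannelExec exec = (ChannelExec) session.openChannel("exec");
        exec.setCommand(cmd);
        InputStream in = exec.getInputStream();
        exec.connect();
        byte[] buf = new byte[1024];
        while (in.read(buf) >= 0) { /* drain remote stdout until the command exits */ }
        int status = exec.getExitStatus();
        exec.disconnect();
        if (status != 0) {
            throw new RuntimeException("CUDA-CDC failed with exit status " + status);
        }
    }
}
```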
Step 6: after the computation of the CUDA-CDC program on the GPU server side is observed to have completed, download the output file under the specified output directory to the local node, and upload it to HDFS.
In a specific embodiment, in Step 6, after the output procedure completes, the slave node in turn downloads the result to local storage through the SFTP of JSCH. After the download finishes, the result is uploaded into HDFS storage through the HDFS API.
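A corresponding sketch of this recovery step might be (the node-local staging path and the HDFS destination /output are assumptions):

```java
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.Session;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CollectResult {
    public static void collect(Session session, String fileName) throws Exception {
        String local = "/tmp/" + fileName + ".out"; // node-local staging path (assumption)

        // Download the result file from the GPU server via JSCH SFTP.
        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();
        sftp.get("/home/ezio/Documents/fromNode/output/" + fileName + ".out", local);
        sftp.disconnect();

        // Upload the local copy into HDFS through the HDFS API.
        FileSystem fs = FileSystem.get(new Configuration());
        fs.copyFromLocalFile(new Path(local), new Path("/output/" + fileName + ".out"));
    }
}
```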
In general, as shown in Fig. 3, the whole computing flow is divided into three phases. The main task of the preprocessing phase is to traverse the pending file directory the user has uploaded to HDFS, recording and sorting out the task list file that serves as the input of the Mapper function.
Next, in the data-processing phase, the Map side divides the list file and passes the resulting subtasks to the Reduce side. The Reduce side, after receiving an assigned file description, downloads the file from HDFS as indicated and sends it to the GPU server. Then, according to the directory location on the GPU server and the required execution information, the complete CUDA task command is sorted out and sent for execution through the EXE module of JSCH.
In the data-recovery phase, the CUDA computation results are downloaded from the specified GPU server path and uploaded to HDFS.
In another specific embodiment, the data flow of the whole process is described in Fig. 4. The pending fasta files are traced from being uploaded to HDFS, through being fetched by each Slave node, until they reach the GPU server.
The task list file is established as an intermediate result from the moment the main function is called, afterwards serving as the input of the Map side and, after being split, flowing to the Slave side.
The final computation result files are built on the GPU server, then downloaded by each Slave and pooled into HDFS.
Fig. 5 designs the modules required by the whole method framework. The most crucial ones are CAT_Mapper and CAT_Reducer; these two modules respectively realize the division of tasks and the GPU cluster invocation procedure. The JSCH module is realized through the JschTools class, containing the encapsulated realizations of the required SFTP and EXE modules. The ArrangeTaskRequest class mainly realizes the judgement and preprocessing functions of the task list. It should be noted that both CAT_Mapper and CAT_Reducer are designed as static inner classes of CAT.
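In skeletal form this could look as follows (a sketch only: the key/value types and method bodies are assumptions consistent with a line-oriented task list, not the actual patented code):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class CAT {
    /** Emits each task-list line (the HDFS path of one pending fasta file). */
    public static class CAT_Mapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(line, new Text("")); // one record per pending file
        }
    }

    /** Downloads the file, delegates to the GPU server, collects the result. */
    public static class CAT_Reducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text hdfsPath, Iterable<Text> ignored, Context ctx)
                throws IOException, InterruptedException {
            // Sketch: JschTools would open the session; RemoteCdcTask-style
            // logic would upload, execute and recover the CUDA-CDC output.
            ctx.write(hdfsPath, new Text("done"));
        }
    }
}
```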
Figs. 6-9 record the times after two groups of data are computed by the two kinds of programs respectively. The first group of data is ten fasta files of about 11 KB each; the second group of data is fasta files of 3.5 KB each. In the four figures, orange records the computing-time results of CUDA-CDC, the program realizing the codon usage bias algorithm optimized through CUDA, while blue records Hadoop-CUDA-CDC, the program after double optimization by Hadoop and CUDA. The horizontal axis represents the number of job files submitted by the simulated current user.
From the data it can be seen that: 1. the size of the job files does not affect the stable speed-up ratio; 2. when the total time is small, the preprocessing time accounts for a large proportion, and the time the combined program saves through node-parallel processing falls below the preprocessing time occupied by task division, resulting in negative growth of the acceleration.
Those of ordinary skill in the art can understand that all or part of the flows in the above embodiment methods can be completed by a computer program instructing the related hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the flows of the embodiments of each of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM) or a random access memory (RAM), etc.
The foregoing is only a specific embodiment of the invention, but the scope of protection of the invention is not limited to it. Any change or replacement that anyone familiar with the technical field can readily conceive within the technical scope disclosed by the invention shall be included within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be defined by the scope of the claims.

Claims (7)

1. A codon deviation coefficient implementation method based on a CPU-GPU heterogeneous combined parallel computing framework, characterized in that the method comprises the following steps:
Step 1: setting up a Hadoop cluster on blade servers, each blade server serving as one node; and setting the master node as both master node and slave node;
Step 2: building the CUDA framework environment on a tower server, and setting a network configuration that allows communication with each node of the Hadoop cluster;
Step 3: preprocessing the submitted batch-job directory, and outputting, under a specified directory, a task list containing the actual HDFS storage location of each pending task file;
Step 4: defining the division rules for the task list through the MapReduce framework, starting Map tasks on the master node, turning each task in the task list into a split instance, and sending it to a node for processing;
Step 5: obtaining the HDFS path of an individual task file through the input of the Reduce module, downloading the file to the local node, and assembling the command request to be processed on the GPU server; and after sending the file content and the command request to the GPU server, executing the remote control command;
Step 6: after the computation of the CUDA-CDC program on the GPU server side is observed to have completed, downloading the output file under the specified output directory to the local node, and uploading it to HDFS.
2. The method according to claim 1, characterized in that, in Step 2, the number of graphics computing cards installed in the tower server is identical to the number of blade servers.
3. The method according to claim 1, characterized in that, in Step 3, the preprocessing of the batch-job directory specifically includes:
traversing the job file directory the user has committed to HDFS, searching for fasta files that meet the input-file requirements of CUDA-CDC; for each file found by name, splicing the directory string of the file in HDFS with the file name to obtain the full path at which the file is stored in HDFS, and writing the string representing the full path as one line into the task list in HDFS.
4. The method according to claim 1, characterized in that, in Step 4, the division rule is specifically:
using a divide-by-rows rule, creating one record for every N lines of content in the task list, and obtaining the HDFS storage locations of the pending tasks; the N is a natural number.
5. The method according to claim 1, characterized in that, in Step 5, assembling the command request to be processed on the GPU server specifically includes the following steps:
in the remote command execution procedure, defining the IP, user name and password of the host on which the remote command executes, together with the complete command string; for the fasta file-analysis command, defining the title and input parameters of the software by absolute path.
6. The method according to claim 5, characterized in that Step 5 also includes: by defining an identical output path, recording the positions of the output files so as to ease the recovery of the classification results;
by string concatenation, obtaining a command-line string described by absolute paths that contains the software information, the input information and the output information.
7. The method according to claim 1, characterized in that, in Step 1, the cluster system structure is specifically:
three sub-clusters comprising the master node, the slave nodes and the computing service node; unlike previous cluster systems, the Map side establishes no communication with the GPU side;
the Reduce side communicates with the GPU side using JSCH, thoroughly separating the computing environments of the Map and Reduce sides from that of the GPU side for cooperative processing.
CN201710332575.7A 2017-05-12 2017-05-12 Codon Deviation Coefficient model method based on CPU-GPU heterogeneous combined parallel computing framework Active CN107168795B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710332575.7A CN107168795B (en) 2017-05-12 2017-05-12 Codon Deviation Coefficient model method based on CPU-GPU heterogeneous combined parallel computing framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710332575.7A CN107168795B (en) 2017-05-12 2017-05-12 Codon Deviation Coefficient model method based on CPU-GPU heterogeneous combined parallel computing framework

Publications (2)

Publication Number Publication Date
CN107168795A true CN107168795A (en) 2017-09-15
CN107168795B CN107168795B (en) 2019-05-03

Family

ID=59815947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710332575.7A Active CN107168795B (en) 2017-05-12 2017-05-12 Codon Deviation Coefficient model method based on CPU-GPU heterogeneous combined parallel computing framework

Country Status (1)

Country Link
CN (1) CN107168795B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659278A (en) * 2018-06-12 2020-01-07 上海郑明现代物流有限公司 Graph data distributed processing system based on CPU-GPU heterogeneous architecture
CN111045623A (en) * 2019-11-21 2020-04-21 中国航空工业集团公司西安航空计算技术研究所 Method for processing graphics commands in multi-GPU (graphics processing Unit) splicing environment
CN113448706A (en) * 2021-06-29 2021-09-28 中国工商银行股份有限公司 Batch task processing method, device and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521528A (en) * 2011-12-05 2012-06-27 中国科学院计算机网络信息中心 Method for screening gene sequence data
CN106502795A (en) * 2016-11-03 2017-03-15 郑州云海信息技术有限公司 The method and system of scientific algorithm application deployment are realized on distributed type assemblies
CN106776014A (en) * 2016-11-29 2017-05-31 科大讯飞股份有限公司 Parallel acceleration method and system in Heterogeneous Computing

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521528A (en) * 2011-12-05 2012-06-27 中国科学院计算机网络信息中心 Method for screening gene sequence data
CN106502795A (en) * 2016-11-03 2017-03-15 郑州云海信息技术有限公司 The method and system of scientific algorithm application deployment are realized on distributed type assemblies
CN106776014A (en) * 2016-11-29 2017-05-31 科大讯飞股份有限公司 Parallel acceleration method and system in Heterogeneous Computing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Z, et al.: "Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance", BMC Bioinformatics *
荆晨阳: "Graphics-card parallel acceleration and application of the codon usage bias analysis algorithm", China Master's Theses Full-text Database, Basic Sciences *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659278A (en) * 2018-06-12 2020-01-07 上海郑明现代物流有限公司 Graph data distributed processing system based on CPU-GPU heterogeneous architecture
CN111045623A (en) * 2019-11-21 2020-04-21 中国航空工业集团公司西安航空计算技术研究所 Method for processing graphics commands in multi-GPU (graphics processing Unit) splicing environment
CN111045623B (en) * 2019-11-21 2023-06-13 中国航空工业集团公司西安航空计算技术研究所 Method for processing graphics commands in multi-GPU splicing environment
CN113448706A (en) * 2021-06-29 2021-09-28 中国工商银行股份有限公司 Batch task processing method, device and system

Also Published As

Publication number Publication date
CN107168795B (en) 2019-05-03

Similar Documents

Publication Publication Date Title
US20210218796A1 (en) Efficient, automated distributed-search methods and systems
US10963313B2 (en) Automated reinforcement-learning-based application manager that learns and improves a reward function
US9703890B2 (en) Method and system that determine whether or not two graph-like representations of two systems describe equivalent systems
Cannataro et al. Distributed data mining on grids: services, tools, and applications
Arfat et al. Big data tools, technologies, and applications: A survey
He et al. Parallel implementation of classification algorithms based on MapReduce
Czarnul et al. Survey of methodologies, approaches, and challenges in parallel programming using high-performance computing systems
CN104050042A (en) Resource allocation method and resource allocation device for ETL (Extraction-Transformation-Loading) jobs
CN103516733A (en) Method and apparatus for processing virtual private cloud
JP2014525640A (en) Expansion of parallel processing development environment
Ward et al. Colmena: Scalable machine-learning-based steering of ensemble simulations for high performance computing
CN115169810A (en) Artificial intelligence system construction method and device for power grid regulation
CN114416855A (en) Visualization platform and method based on electric power big data
CN107168795B (en) Codon deviation factor model method based on CPU-GPU isomery combined type parallel computation frame
Jaśkowski et al. A hybrid MIP-based large neighborhood search heuristic for solving the machine reassignment problem
CN107403110A (en) HDFS data desensitization method and device
Xu et al. Computational experience with a software framework for parallel integer programming
US20220138195A1 (en) User defined functions for database query languages based on call-back functions
Sanin et al. Manufacturing collective intelligence by the means of Decisional DNA and virtual engineering objects, process and factory
CN114253798A (en) Index data acquisition method and device, electronic equipment and storage medium
CN116775041B (en) Real-time decision engine implementation method based on stream calculation and RETE algorithm
Petcu Identifying cloud computing usage patterns
Cavallo et al. A scheduling strategy to run Hadoop jobs on geodistributed data
Zhou et al. An interactive and reductive graph processing library for edge computing in smart society
CN103942235A (en) Distributed computation system and method for large-scale data set cross comparison

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant