CN107168795B - Codon deviation coefficient model method based on a CPU-GPU heterogeneous combined parallel computing framework - Google Patents


Publication number
CN107168795B
CN107168795B (application CN201710332575.7A)
Authority
CN
China
Prior art keywords
node
file
hdfs
gpu
task
Prior art date
Legal status
Active
Application number
CN201710332575.7A
Other languages
Chinese (zh)
Other versions
CN107168795A (en)
Inventor
章乐
陈镜行
丁维龙
荆晨阳
冯计平
Current Assignee
Southwest University
Original Assignee
Southwest University
Priority date
Filing date
Publication date
Application filed by Southwest University filed Critical Southwest University
Priority to CN201710332575.7A
Publication of CN107168795A
Application granted
Publication of CN107168795B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 - Partitioning or combining of resources
    • G06F 9/5066 - Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 - Protocols
    • H04L 67/10 - Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a codon deviation coefficient implementation method based on a CPU-GPU heterogeneous combined parallel computing framework, comprising: establishing a cluster on blade servers, with each blade server acting as one node, and configuring the master machine as both master node and slave node; building a CUDA environment and configuring the network; preprocessing the submitted batch-job directory and writing a task list containing the actual HDFS storage location of each pending task file to a specified directory; defining a splitting rule for the task list and dispatching the splits to the nodes for processing; obtaining the HDFS path of each task file, downloading the file to the local node, and assembling the command request; sending the file content and command request to the GPU server; and, once the GPU-side computation is detected to have completed, downloading the output file to the local node and uploading it to HDFS. The present invention achieves job splitting, parallel processing, and in-memory computation for batch job submission, improving the efficiency of the codon usage bias algorithm.

Description

Codon deviation coefficient model method based on a CPU-GPU heterogeneous combined parallel computing framework
Technical field
The present invention relates to a method for optimizing the efficiency of a codon deviation coefficient model by establishing a CPU-GPU heterogeneous combined parallel computing framework, and belongs to the field of biological big data.
Background technique
With the development of bioinformatics, researchers study the structure and organization of biomolecular information to gain a deep understanding of the rules of genomic hereditary information. With the ongoing innovation of sequencing technologies, today's third-generation sequencing has become more accurate, faster, and cheaper. Sequencers such as the HiSeq X Ten have driven rapid progress in the analysis of hereditary information, but at the same time they have created problems of storage and analysis efficiency for massive genomic data.
Existing approaches to improving the efficiency of gene data analysis include algorithmic improvements based on mathematics and statistics, as well as acceleration that exploits the physical characteristics of GPU graphics cards and their powerful high-speed parallelism. Among the latter, CUDA-C has to some extent relieved the computational pressure of large-scale biological data: a single-machine parallel implementation of the codon usage bias algorithm (CAT) on CUDA-capable GPUs makes full use of the high computing performance of graphics processors and achieves a roughly 200x speed-up. For the time being, however, biological databases such as NCBI, EBI, and DDBJ continue to accumulate gene data of every type, and the growth of data volume shows no sign of stopping. New breakthroughs are therefore urgently needed in both data analysis and data storage.
More recently, with the proposal of grid computing and the realization of cloud computing, network sharing of software and hardware resources across distributed clusters has become practical. Hadoop, one of today's mainstream cloud computing frameworks, offers high reliability (replicated storage and strong data-processing capability), high scalability (data is distributed and computed across a cluster that can easily grow to thousands of nodes, and the number of nodes is easy to extend), high efficiency (data can be moved dynamically within the cluster while effectively keeping the nodes balanced), and high fault tolerance (multiple copies of data are kept automatically to guarantee data safety and integrity). Hadoop supports development in Java, C, C++, Python, and other languages, making it very easy to use; its modest hardware requirements keep the cost of building and operating a cluster relatively low, improving cost-effectiveness.
The main existing schemes in the prior art include the following:
In bioinformatics itself, researchers have optimized the applicability of traditional algorithms through mathematical models and implemented the codon deviation coefficient model CDC in their codon component analysis toolbox CAT. Researchers in GPU parallel acceleration then used NVIDIA's CUDA programming model to parallelize the modules of the CDC algorithm in the CAT software that have no data dependencies, releasing the CUDA-optimized software CUDA-CDC. Other prior art, for example the technical solutions described in publications CN102708088A, CN104536937A, CN104731569A, and CN105335135A, builds cluster frameworks with fixed master and slave nodes.
The main defects of the prior art are the following:
(1) The serial processing of multiple tasks cannot be avoided.
(2) The blade servers currently employed cannot be fitted with GPU devices, so a homogeneous CPU-GPU mode cannot be used.
Summary of the invention
In view of this, to solve the above problems in the prior art, the present invention provides the following technical solution:
The present invention provides a codon deviation coefficient implementation method based on a CPU-GPU heterogeneous combined parallel computing framework, characterized in that the method comprises the following steps:
Step 1: establishing a Hadoop cluster on blade servers, with each blade server acting as one node, and configuring the master machine as both master node and slave node;
Step 2: building a CUDA environment on a tower server and configuring the network so that it can communicate with every node of the Hadoop cluster;
Step 3: implementing a preprocessing module that preprocesses the submitted batch-job directory and writes a task list containing the actual HDFS storage location of each pending task file to a specified directory;
Step 4: defining a splitting rule for the task list through the MapReduce framework, starting a Map task on the master node, obtaining a split instance for each task in the task list, and dispatching it to a node for processing, the node being the node corresponding to each blade server;
Step 5: obtaining the HDFS path of a single task file through the input of the Reduce module, downloading the file to the local node, assembling the command request to be processed on the GPU server, sending the file content and command request to the GPU server, and executing the remote control command;
Step 6: after the CUDA-CDC program on the GPU server is detected to have finished computing, downloading the output files under the specified output directory to the local node and uploading them to HDFS.
Preferably, in Step 2, the number of graphics computing cards installed in the tower server equals the number of blade servers.
Preferably, in Step 3, preprocessing the batch-job directory specifically includes: traversing the job file directory that the user submitted to HDFS and searching for fasta files that satisfy the CUDA-CDC input file requirements; each time the traversal by file name yields a result, concatenating the HDFS directory string with the file name to obtain the full path at which the file is stored in HDFS, and writing the string representing the full path as one line into the task list in HDFS.
Preferably, this step may be carried out in detail as follows: a recursive algorithm traverses the job file directory under the /input directory that the user submitted to HDFS and searches for fasta files that satisfy the CUDA-CDC input file requirements; each time the traversal by file name yields a result, the HDFS directory string is concatenated with the file name to obtain the full path at which the file is stored in HDFS, and the string representing the full path is written as one line into the task.txt file under the /input directory in HDFS. The task.txt file then serves as the input of the getSplits and createRecordReader functions of the TextInputFormat class.
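The recursive traversal described above can be sketched as follows. This is a minimal illustration, assuming a local directory stands in for the HDFS /input directory; a real implementation would walk HDFS through the Hadoop FileSystem API rather than java.nio.file, and would write the resulting paths into task.txt.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;

// Recursively collect the full paths of all .fasta files under a root
// directory, one path per task-list line. java.nio.file is used only so
// the sketch runs without a cluster; the patent walks HDFS instead.
public class TaskListBuilder {

    static List<String> collectFasta(Path dir) throws IOException {
        List<String> paths = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) {
                    paths.addAll(collectFasta(entry));            // recurse into subdirectory
                } else if (entry.toString().endsWith(".fasta")) {
                    // directory string + file name = full storage path
                    paths.add(entry.toAbsolutePath().toString());
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("input");
        Files.createDirectories(root.resolve("job1"));
        Files.createFile(root.resolve("job1/a.fasta"));
        Files.createFile(root.resolve("job1/notes.txt"));  // ignored: wrong extension
        Files.createFile(root.resolve("b.fasta"));

        List<String> taskList = collectFasta(root);
        System.out.println(taskList.size());               // 2 fasta files found
    }
}
```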
Preferably, in Step 4, the splitting rule is specifically: using a split-by-rows rule, one record is created for every N rows of the task list, yielding the HDFS storage locations of all pending files; N is a natural number.
Preferably, this step may be carried out in detail as follows: because the write rule of the task.txt file is that each line records the complete HDFS storage path of one fasta file to be processed, the task list file is split with the default line-by-line rule of TextInputFormat, creating one RecordReader record per line of content. All records, i.e. the HDFS storage locations of the fasta files to be processed, are obtained and used as the input of the Reduce function. Other optional splitting rules can be configured for different systems and cluster environments, for example treating every N lines of the file as one record and splitting in a batched form, which means that the Reduce function must serially process N files when it receives a RecordReader.
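The optional "N rows per record" rule can be sketched with plain list slicing. This is an illustration of the grouping logic only, not Hadoop's actual RecordReader machinery; TextInputFormat's default behavior corresponds to n = 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Group a task list (one HDFS path per line) into records of N lines,
// mimicking the optional batched splitting rule described above.
public class NLineSplitter {

    static List<List<String>> split(List<String> lines, int n) {
        List<List<String>> records = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            // each record holds up to n consecutive paths; one Reduce call
            // would then process this group of files serially
            records.add(new ArrayList<>(
                    lines.subList(i, Math.min(i + n, lines.size()))));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> taskList = Arrays.asList(
                "/input/a.fasta", "/input/b.fasta", "/input/c.fasta",
                "/input/d.fasta", "/input/e.fasta");
        List<List<String>> records = split(taskList, 2);
        System.out.println(records.size());   // 3 records: 2 + 2 + 1 paths
        System.out.println(records.get(2));   // [/input/e.fasta]
    }
}
```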
Preferably, in Step 5, assembling the command request to be processed on the GPU server specifically includes the following steps: in the remote command execution process, defining the IP, user name, and password of the host executing the remote command, together with the complete command string; for the fasta file analysis command, defining the software name and input parameters by absolute path.
Preferably, Step 5 further includes: defining an identical output path to track the locations of the output files, so as to ease the collection of the analysis results; and obtaining, by string concatenation, a command-line string described by absolute paths that contains the software information, input information, and output information.
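The string concatenation step can be sketched as below. The binary path and the -i/-o flag names are illustrative assumptions (the patent does not specify CUDA-CDC's flags); only the input and output directories are taken from the description. Real software, input, and output information would be substituted per deployment.

```java
// Assemble the remote CUDA-CDC command line by string concatenation, with
// the software, input, and output all given as absolute paths.
public class CommandBuilder {

    static String buildCommand(String software, String inputDir,
                               String outputDir, String fastaName) {
        // software invocation + absolute input path + absolute output path
        return software
                + " -i " + inputDir + "/" + fastaName
                + " -o " + outputDir + "/" + fastaName + ".out";
    }

    public static void main(String[] args) {
        String cmd = buildCommand(
                "/usr/local/bin/cuda-cdc",               // assumed binary location
                "/home/ezio/Documents/fromNode/input",   // upload directory (from the description)
                "/home/ezio/Documents/fromNode/output",  // result directory (from the description)
                "sample.fasta");
        System.out.println(cmd);
    }
}
```

The same fixed output directory is used for every job, which is what lets the monitoring step later collect results from one known location.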
Preferably, the cluster system structure is specifically: a cluster comprising three sub-clusters of master nodes, slave nodes, and compute service nodes. Unlike previous cluster systems, the Map end does not establish communication with the GPU end; instead, the Reduce end communicates with the GPU end through JSCH, thoroughly separating the computing environments of the Map and Reduce ends from the GPU end while processing cooperatively.
Compared with the prior art, the technical solution of the present invention studies and analyzes the codon usage bias algorithm (CAT) and its implementation under the CUDA environment (CUDA-CDC), analyzes the feasibility of Hadoop parallel task splitting for a variety of situations, and, by combining the distributed cluster environment with remote SSH service to the CUDA server, achieves job splitting, parallel processing, and in-memory computation for batch job submission. It integrates the computing and storage resources of every node in the cluster and further improves the efficiency of the codon usage bias algorithm.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is the cluster network topology diagram of an embodiment of the present invention;
Fig. 2 is the cluster architecture design diagram of an embodiment of the present invention;
Fig. 3 is the overall design diagram of an embodiment of the present invention;
Fig. 4 is the data preprocessing stage diagram of an embodiment of the present invention;
Fig. 5 is the detailed UML design diagram of the CAT framework of an embodiment of the present invention;
Fig. 6 is the line chart of running times for the first group of data of an embodiment of the present invention;
Fig. 7 is the running-time comparison chart for the first group of data of an embodiment of the present invention;
Fig. 8 is the line chart of running times for the second group of data of an embodiment of the present invention;
Fig. 9 is the running-time comparison chart for the second group of data of an embodiment of the present invention.
Specific embodiment
The embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Those skilled in the art should understand that the following specific embodiments are a series of optimized arrangements enumerated by the present invention to further explain the specific content of the invention, and that those arrangements may be combined with or used in relation to one another, unless the present invention clearly states that some specific embodiment or arrangement cannot be combined or used together with other embodiments or arrangements. Meanwhile, the following specific embodiments serve only as optimized arrangements and are not to be understood as limiting the protection scope of the present invention.
Embodiment 1
In a specific embodiment, the system of the present invention can be built as the multi-machine system of Fig. 1 and Fig. 2: four Dell M820 blade servers are independently installed with the CentOS 7 operating system, and one AMAX tower server is fitted with four NVIDIA graphics computing cards and configured with the CUDA environment. The five machines are then given network configurations in the same IP segment to guarantee normal communication. The four blade servers are set as master, slave1, slave2, and slave3. As in an ordinary Hadoop cluster, the three slave machines are configured as slave nodes of the cluster; the difference is that, to match the number of GPU cards and guarantee that every GPU resource is utilized during concurrent multi-task processing, master is also written into the slaves file of the cluster configuration, serving simultaneously as master and slave.
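Under the topology above, the cluster configuration might look like the following sketch. Host names follow the embodiment; the IP addresses are illustrative assumptions, and the essential point is that master appears in the slaves file alongside the three slave machines.

```text
# /etc/hosts (same IP segment; addresses are illustrative)
192.168.1.10  master
192.168.1.11  slave1
192.168.1.12  slave2
192.168.1.13  slave3

# $HADOOP_HOME/etc/hadoop/slaves
master      # master doubles as a slave, matching the 4 GPU cards
slave1
slave2
slave3
```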
Embodiment 2
The present invention designs and implements a true GPU-CPU heterogeneous combined parallel computing framework and studies its optimization of the codon deviation coefficient model in the field of codon usage bias research. It extends the distributed cluster model that requires GPUs to clusters whose nodes cannot directly mount GPU devices, increasing the applicability of the combined cluster.
Specifically, the present invention provides a new combined parallel computing framework. The distributed computing task splitting mechanism processes the multiple jobs submitted by a user in a distributed manner; after the jobs are distributed to the nodes, the tasks are sent over a high-speed network to a designated GPU server, where each single job is processed efficiently by the CUDA-CDC software. When processing finishes, the data is recorded to the node and uploaded to HDFS.
In a specific embodiment, the method of the present invention can be realized through one of the following preferred schemes. With reference to Fig. 3, which shows the implementation flow of the codon usage bias algorithm based on the GPU-CPU heterogeneous combined parallel computing framework provided by Embodiment 1 of the present invention, the process of the method is detailed as follows:
Step 1: a Hadoop cluster is established on blade servers, each blade server acting as one node. Unlike the conventional configuration, Master is set as both master node and slave node, guaranteeing that it can also take part in distributed task processing when needed and maximizing the utilization of the cluster.
In a specific embodiment, in Step 1, a computer cluster is built comprising a Hadoop server cluster composed of blade servers and a GPU server cluster composed of a tower server. There is only one GPU server, but it is equipped with as many GPU graphics computing cards as there are blade servers.
The Hadoop server cluster contains four blade server nodes. One blade server is configured as the Master node and, at the same time, this node and the other three nodes are each configured as Slave nodes. Each node has independent memory and storage space.
Step 2: the CUDA environment is built on the tower server, and the network configuration is set so that it can communicate with every node of the Hadoop cluster.
In a specific embodiment, in Step 2, each node establishes a network communication channel with the GPU server. Illustratively, each node realizes network communication and remote command control calls to the GPU server through JSCH.
It should be noted that the GPU computing resources required by the CUDA optimization of the algorithm are not inside the Hadoop cluster; instead, computation is provided by a dedicated GPU server through remote command execution. The advantage of this design is that the Hadoop cluster itself needs no mandatory special hardware environment. With simple modifications to the remote commands after each stage module, the framework can be applied to other computing scenarios.
Step 3: the preprocessing module preprocesses the batch-job directory submitted by the user, and writes a task list containing the actual HDFS storage location of each pending task file to a specified directory.
In a specific embodiment, in Step 3, a directory traversal algorithm with filtering is chosen to traverse the directory submitted by the user, find all the fasta files to be processed, and record them uniformly in a specified task list file. Illustratively, the /input/task.txt file on HDFS is chosen.
As for the specific preprocessing, in a specific embodiment, a recursive algorithm traverses the job file directory under the /input directory that the user submitted to HDFS and searches for fasta files that satisfy the CUDA-CDC input file requirements. Each time the traversal by file name yields a result, the HDFS directory string is concatenated with the file name to obtain the full path at which the file is stored in HDFS, and the string representing the full path is written as one line into the task.txt file under the /input directory in HDFS. The task.txt file then serves as the input of the getSplits and createRecordReader functions of the TextInputFormat class.
Step 4: a splitting rule for the task list is defined through the MapReduce framework; a Map task is started on the master node, a split instance is obtained for each task in the task list and dispatched to a node for processing, the node being the node corresponding to each blade server.
In a specific embodiment, in Step 4, the content of the task list file is split. Illustratively, the line-by-line splitting rule of the TextInputFormat class is chosen, matching the rule that the file records one file path and name per line.
Because the write rule of the task.txt file is that each line records the complete HDFS storage path of one fasta file to be processed, the task list file is split with the default line-by-line rule of TextInputFormat, creating one RecordReader record per line of content. All records, i.e. the HDFS storage locations of the fasta files to be processed, are obtained and used as the input of the Reduce function. Other optional splitting rules can be configured for different systems and cluster environments, for example treating every N lines of the file as one record and splitting in a batched form, which means the Reduce function must serially process N files when it receives a RecordReader.
Step 5: the Reduce module is defined. The HDFS path of a single task file is obtained through the input of this module; the file is downloaded to the local node, and the command request to be processed on the GPU server is assembled. After the file content and command request are sent to the GPU server through the JSCH communication framework, the remote control command is executed.
In a specific embodiment, in Step 5, the pre-split subtask splits are input to the Reduce end for processing. It should be noted that what the Reduce end receives here is only the HDFS storage information of the files to be processed, not the files themselves. The advantage of doing so is that excessive big-data transfer between the Map end and the Reduce end is avoided.
After receiving a split, the Reduce end downloads the file from HDFS to the local node according to the file information. It should be noted that the GPU server does not communicate with HDFS, yet the actual computation really happens on the GPU server, so the data needed during computation must be uploaded by the Slave node itself when it sends the request. Illustratively, the SFTP mode of JSCH is employed here to upload to the /home/ezio/Documents/fromNode/input directory on the GPU server. After the data transfer completes, the data is processed through remote commands. Illustratively, the exec mode of JSCH is chosen for processing, and the execution result is awaited and output to a specified directory. Illustratively, the local storage directory /home/ezio/Documents/fromNode/output of the GPU server is chosen.
When the GPU server is called through the JSCH framework, JSCH, as the Java implementation of the SSH framework, retains the SSH mode of operation. In the process of executing a command remotely, the IP, user name, and password of the host executing the remote command must be defined, together with the complete command string. According to the software usage, the fasta file analysis command based on CUDA-CDC must define the software name and input parameters by absolute path. Besides these two required contents, to ease collection of the analysis results, the locations of the output files are tracked by defining an identical output path. Finally, as described above, string concatenation yields a command-line string described by absolute paths that contains the software information, input information, and output information.
Step 6: after the CUDA-CDC program on the GPU server is detected to have finished computing, the output files under the specified output directory are downloaded to the local node and uploaded to HDFS.
In a specific embodiment, in Step 6, after the output process completes, the slave node downloads the result to local storage through the SFTP of JSCH. After the download completes, the result is uploaded to HDFS storage through the HDFS API.
In general, as shown in Fig. 3, the whole operation process is divided into three stages. The main task of the preprocessing stage is to traverse the pending file directory that the user uploaded to HDFS and to record and sort the results into the task list file that serves as the input of the Mapper function.
Next, in the data processing stage, the Map end splits the list file and passes the resulting subtasks to the Reduce end. After receiving a task file description, the Reduce end downloads the file from HDFS as indicated and sends it to the GPU server. Then, according to the directory location on the GPU server and the required command execution information, the complete CUDA task command is assembled and sent for execution through the EXEC module of JSCH.
In the data recording stage, the CUDA computation result is downloaded from the specified GPU server path and uploaded to HDFS.
In another specific embodiment, Fig. 4 describes the data flow of the whole process. The fasta files to be processed flow from being downloaded by each Slave node, to being carried to the GPU server, until the results are uploaded to HDFS.
The task list file flows from its creation as an intermediate result of the principal function call, to serving as the input of the Map end, to flowing to the Slave end after being split.
The final computation result files are created on the GPU server and, after being downloaded by each Slave, are pooled into HDFS.
Fig. 5 shows the modules required by the whole method framework. The two most crucial are CAT_Mapper and CAT_Reducer, which respectively realize task splitting and the GPU cluster calling process. The JSCH module is realized through the JschTools class, which contains the encapsulation of the required SFTP and EXEC modules. The ArrangeTaskRequest class mainly realizes the judgment and preprocessing functions for the task list. It should be noted that the CAT_Mapper and CAT_Reducer classes are both designed as static inner classes of CAT.
Figs. 6-9 record the times of two groups of data computed by the two programs. The first group of data is ten fasta files of about 11 KB each; the second group consists of fasta files of 3.5 KB each. In the four figures, the dark records are the computation time results of CUDA-CDC, the codon usage bias program optimized by CUDA, i.e. the GPU parallel mode; the light records are those of Hadoop-CUDA-CDC, the program after double optimization by Hadoop and CUDA, i.e. the combined parallel mode. The horizontal coordinate represents the number of job files submitted by the simulated current user.
The data shows that: 1. the size of the job files does not affect the stable speed-up ratio; 2. when the total time is small, the preprocessing time accounts for a large proportion, and the time the combined program saves through node-level parallel processing is less than the preprocessing time occupied by task splitting, which results in negative growth of the acceleration.
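The second observation can be made concrete with a small back-of-the-envelope model. The numbers below are illustrative assumptions, not measurements from Figs. 6-9: the combined time is modeled as a fixed preprocessing overhead plus the serial work divided across the nodes, so with few jobs the overhead dominates and the speed-up drops below 1.

```java
// Toy model of observation 2: combined time = preprocessing overhead
// + serial work / node count. With a small batch the overhead dominates
// (speed-up < 1, i.e. negative acceleration); with a large batch the
// overhead is amortized and the speed-up approaches the node count.
public class SpeedupModel {

    static double speedup(double serialPerJob, int jobs, int nodes, double preprocess) {
        double serial = serialPerJob * jobs;            // GPU-only mode, one job at a time
        double combined = preprocess + serial / nodes;  // jobs spread over the cluster
        return serial / combined;
    }

    public static void main(String[] args) {
        // small batch: 2 jobs on 4 nodes with 5 units of preprocessing
        System.out.println(Math.round(speedup(1.0, 2, 4, 5.0) * 100) / 100.0);   // 0.36
        // large batch: 400 jobs amortize the same overhead
        System.out.println(Math.round(speedup(1.0, 400, 4, 5.0) * 100) / 100.0); // 3.81
    }
}
```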
Those of ordinary skill in the art will appreciate that all or part of the processes in the above embodiment methods can be completed by instructing the relevant hardware through a computer program; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of each method above. The storage medium can be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited to it. Any changes or substitutions that can easily be thought of by those familiar with the art within the technical scope disclosed by the present invention shall be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A codon deviation factor implementation method based on a CPU-GPU heterogeneous combined parallel computing framework, characterized in that the method comprises the following steps:
Step 1: establishing a Hadoop cluster on blade servers, with each blade server serving as one node, and configuring the master node to act as both master node and slave node;
Step 2: building a CUDA framework environment on a tower server, and configuring the network so that it can communicate with each node of the Hadoop cluster;
Step 3: preprocessing the submitted batch job directory, and assembling and outputting a task list containing the actual HDFS storage location of each pending task file into a specified directory;
Step 4: defining the division rule for the task list through the MapReduce framework, starting a Map task on the master node, obtaining a split instance of each task in the task list, and sending it to a node for processing, the node being the node corresponding to each blade server;
Step 5: obtaining the HDFS path of a single task file through the input of the Reduce module, downloading it to the local node, and then assembling the command request to be processed on the GPU server; after sending the file content and the command request to the GPU server, executing the remote control command;
Step 6: after monitoring that the CUDA-CDC program on the GPU server side has finished computing, downloading the output file under the specified output directory to the local node and uploading it to HDFS.
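For illustration only (not part of the claims), the per-task work a slave node performs in steps 5-6 can be sketched as a command plan. All paths, the host name, and the CUDA-CDC binary location are hypothetical placeholders, not taken from the patent; each stage is returned as a shell-style string rather than executed, so the flow is easy to inspect.

```java
import java.util.Arrays;
import java.util.List;

public class ReducePipeline {
    // Hypothetical outline of steps 5-6 for one task file: fetch from HDFS,
    // ship to the GPU server, run CUDA-CDC remotely, retrieve the result,
    // and upload it back to HDFS.
    static List<String> planTask(String hdfsPath, String gpuHost) {
        String local = "/tmp/" + hdfsPath.substring(hdfsPath.lastIndexOf('/') + 1);
        return Arrays.asList(
            "hdfs dfs -get " + hdfsPath + " " + local,                  // step 5: fetch from HDFS
            "scp " + local + " " + gpuHost + ":" + local,               // step 5: ship to GPU server
            "ssh " + gpuHost + " /opt/cuda-cdc/cdc " + local,           // step 5: remote CUDA-CDC run
            "scp " + gpuHost + ":" + local + ".out " + local + ".out",  // step 6: fetch result
            "hdfs dfs -put " + local + ".out /results/"                 // step 6: upload to HDFS
        );
    }
}
```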
2. The method according to claim 1, characterized in that in step 2 the number of graphics computing cards installed in the tower server is the same as the number of blade servers.
3. The method according to claim 1, characterized in that preprocessing the batch job directory in step 3 specifically comprises:
Traversing the job file directory submitted by the user in HDFS and searching for fasta files that meet the CUDA-CDC input file requirement; each time the traversal by filename returns a result, splicing the HDFS directory string and the filename to obtain the full path at which the file is stored in HDFS, and recording the string representing the full path as one line written into the task list in HDFS.
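As an illustration only, the traverse-filter-splice rule above can be sketched as a standalone pass over a directory listing. The class and helper names are invented for the sketch, and a real implementation would walk HDFS through the Hadoop FileSystem API rather than a plain list of names.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskListBuilder {
    // Keep only files that satisfy the CUDA-CDC input requirement (fasta).
    static boolean isFasta(String name) {
        return name.endsWith(".fasta") || name.endsWith(".fa");
    }

    // Splice the HDFS directory string and filename into a full path,
    // normalising the separator so "/jobs" and "/jobs/" behave alike.
    static String joinHdfsPath(String dir, String name) {
        return dir.endsWith("/") ? dir + name : dir + "/" + name;
    }

    // One task-list line per matching file; in the real system each line
    // would be written to the task list file stored in HDFS.
    static List<String> buildTaskList(String hdfsDir, List<String> fileNames) {
        List<String> lines = new ArrayList<>();
        for (String name : fileNames) {
            if (isFasta(name)) {
                lines.add(joinHdfsPath(hdfsDir, name));
            }
        }
        return lines;
    }
}
```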
4. The method according to claim 1, characterized in that in step 4 the division rule is specifically:
Using a divide-by-rows rule, creating one record for every N rows of content in the task list to obtain the HDFS storage locations of the pending tasks; N is a natural number.
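The N-rows-per-record rule above matches what Hadoop's NLineInputFormat provides for real MapReduce jobs; a minimal dependency-free sketch of just the division (class name invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class NLineSplitter {
    // Group a task list into records of N rows each; the final record may be
    // shorter when the row count is not a multiple of N.
    static List<List<String>> splitByN(List<String> taskLines, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be a natural number >= 1");
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < taskLines.size(); i += n) {
            splits.add(new ArrayList<>(taskLines.subList(i, Math.min(i + n, taskLines.size()))));
        }
        return splits;
    }
}
```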
5. The method according to claim 1, characterized in that assembling the command request to be processed on the GPU server in step 5 specifically comprises the following steps:
In the remote command execution process, defining the IP, user name, password and complete command string of the host on which the remote command is executed; for the fasta file analysis command, defining the name and input parameters of the software by absolute path.
6. The method according to claim 5, characterized in that step 5 further comprises: by defining an identical output path, collecting the locations of the output files so as to facilitate the retrieval of the classification results;
Through string concatenation, obtaining a command-line string containing the software information, input information and output information described by absolute paths.
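A minimal sketch of the string concatenation described in claims 5-6. The binary path and the `-i`/`-o` flag names are assumptions made for illustration, not the actual CUDA-CDC interface:

```java
public class CommandBuilder {
    // Strip the directory part so the output file can be named after the input.
    static String baseName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Concatenate software path, input file and shared output directory into
    // one command-line string, all described by absolute paths (claim 6).
    static String buildCommand(String softwarePath, String inputFile, String outputDir) {
        String sep = outputDir.endsWith("/") ? "" : "/";
        return softwarePath + " -i " + inputFile
             + " -o " + outputDir + sep + baseName(inputFile) + ".out";
    }
}
```

Because every task writes under the same output directory, the Reduce side can later locate each result deterministically from the input filename alone.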
7. The method according to claim 1, characterized in that in step 1 the cluster system structure is specifically:
Comprising three sub-clusters: master nodes, slave nodes and computing service nodes; unlike previous cluster systems, no communication is established between the Map side and the GPU side;
The Reduce side communicates with the GPU side using JSCH, so that the computing environment of the Map and Reduce sides is thoroughly separated from that of the GPU side for cooperative processing.
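The patent names the JSCH Java library for the Reduce-to-GPU link. As a dependency-free stand-in for illustration, the equivalent ssh invocation can be assembled from the same pieces claim 5 defines (host IP, user name, command string); the password is left to ssh's own prompt here since the CLI does not take it inline, and the user/host values below are placeholders.

```java
import java.util.Arrays;
import java.util.List;

public class RemoteCommand {
    // Assemble the argv for a remote command request: "ssh user@host cmd".
    // A real JSCH implementation would instead open an exec channel on an
    // authenticated Session and set this same command string on it.
    static List<String> sshCommand(String user, String host, String remoteCmd) {
        return Arrays.asList("ssh", user + "@" + host, remoteCmd);
    }
}
```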
CN201710332575.7A 2017-05-12 2017-05-12 Codon deviation factor model method based on CPU-GPU isomery combined type parallel computation frame Active CN107168795B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710332575.7A CN107168795B (en) 2017-05-12 2017-05-12 Codon deviation factor model method based on CPU-GPU isomery combined type parallel computation frame

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710332575.7A CN107168795B (en) 2017-05-12 2017-05-12 Codon deviation factor model method based on CPU-GPU isomery combined type parallel computation frame

Publications (2)

Publication Number Publication Date
CN107168795A CN107168795A (en) 2017-09-15
CN107168795B true CN107168795B (en) 2019-05-03

Family

ID=59815947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710332575.7A Active CN107168795B (en) 2017-05-12 2017-05-12 Codon deviation factor model method based on CPU-GPU isomery combined type parallel computation frame

Country Status (1)

Country Link
CN (1) CN107168795B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659278A (en) * 2018-06-12 2020-01-07 上海郑明现代物流有限公司 Graph data distributed processing system based on CPU-GPU heterogeneous architecture
CN111045623B (en) * 2019-11-21 2023-06-13 中国航空工业集团公司西安航空计算技术研究所 Method for processing graphics commands in multi-GPU splicing environment
CN113448706A (en) * 2021-06-29 2021-09-28 中国工商银行股份有限公司 Batch task processing method, device and system

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102521528A (en) * 2011-12-05 2012-06-27 中国科学院计算机网络信息中心 Method for screening gene sequence data
CN106502795A (en) * 2016-11-03 2017-03-15 郑州云海信息技术有限公司 The method and system of scientific algorithm application deployment are realized on distributed type assemblies

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN106776014B (en) * 2016-11-29 2020-08-18 科大讯飞股份有限公司 Parallel acceleration method and system in heterogeneous computing

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN102521528A (en) * 2011-12-05 2012-06-27 中国科学院计算机网络信息中心 Method for screening gene sequence data
CN106502795A (en) * 2016-11-03 2017-03-15 郑州云海信息技术有限公司 The method and system of scientific algorithm application deployment are realized on distributed type assemblies

Also Published As

Publication number Publication date
CN107168795A (en) 2017-09-15

Similar Documents

Publication Publication Date Title
Li et al. A scientific workflow management system architecture and its scheduling based on cloud service platform for manufacturing big data analytics
Jayalath et al. From the cloud to the atmosphere: Running MapReduce across data centers
US9703890B2 (en) Method and system that determine whether or not two graph-like representations of two systems describe equivalent systems
Malewicz et al. Pregel: a system for large-scale graph processing
Ward et al. Colmena: Scalable machine-learning-based steering of ensemble simulations for high performance computing
CN105550268A (en) Big data process modeling analysis engine
KR20210036226A (en) A distributed computing system including multiple edges and cloud, and method for providing model for using adaptive intelligence thereof
Arkhipov et al. A parallel genetic algorithm framework for transportation planning and logistics management
CN107612886A (en) A kind of Spark platforms Shuffle process compresses algorithm decision-making techniques
CN108037919A (en) A kind of visualization big data workflow configuration method and system based on WEB
CN107168795B (en) Codon deviation factor model method based on CPU-GPU isomery combined type parallel computation frame
CN104050042A (en) Resource allocation method and resource allocation device for ETL (Extraction-Transformation-Loading) jobs
CN103336791A (en) Hadoop-based fast rough set attribute reduction method
CN103516733A (en) Method and apparatus for processing virtual private cloud
CN114416855A (en) Visualization platform and method based on electric power big data
CN109614227A (en) Task resource concocting method, device, electronic equipment and computer-readable medium
Ren et al. A branch-and-bound embedded genetic algorithm for resource-constrained project scheduling problem with resource transfer time of aircraft moving assembly line
Xu et al. Computational experience with a software framework for parallel integer programming
Jaśkowski et al. A hybrid MIP-based large neighborhood search heuristic for solving the machine reassignment problem
CN116775041B (en) Real-time decision engine implementation method based on stream calculation and RETE algorithm
CN111951112A (en) Intelligent contract execution method based on block chain, terminal equipment and storage medium
Cavallo et al. A scheduling strategy to run Hadoop jobs on geodistributed data
CN108763273A (en) A kind of Alpine Grasslands data processing method and management system
CN110769037B (en) Resource allocation method for embedded edge computing platform
Liu et al. A survey of speculative execution strategy in MapReduce

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant