CN103324509A - Method for installing bioinformatics application programs in high-performance cluster system - Google Patents

Info

Publication number
CN103324509A
Authority
CN
China
Application number
CN2013102601126A
Other languages
Chinese (zh)
Inventor
姜金良
马少杰
曹振南
李斌
赵明坤
侯雪峰
何沧平
田相桂
杨亮
易成
曹征
苗春葆
胡耀国
范娟
Original Assignee
曙光信息产业(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 曙光信息产业(北京)有限公司
Priority to CN2013102601126A
Publication of CN103324509A

Abstract

The invention discloses a method for installing bioinformatics application programs in a high-performance cluster system. The method includes the steps: loading environment variables of the bioinformatics application programs; selecting a corresponding math library according to system types and network configuration of a current installation platform; installing the bioinformatics application programs by the aid of the environment variables and the math library. By the method for installing the bioinformatics application programs in the high-performance cluster system, the efficiency of installing the bioinformatics application programs in the high-performance cluster system is improved.

Description

Method for installing bioinformatics application programs in a high-performance cluster system

Technical Field

[0001] The present invention relates generally to materials research and, more particularly, to a method for installing bioinformatics application programs in a high-performance cluster system.

Background

[0002] Bioinformatics is the science of collecting, processing, and using biological information with the computer as its tool. Its objects of study are generally proteins and DNA macromolecules. Because these structures are inherently complex, and because rapid advances in sequencing technology have made the number of known gene sequences grow exponentially, research on such a large body of genes usually involves very large volumes of data processing and parallel computation.

[0003] Bioinformatics research covers many topics. One is sequencing genes with laboratory instruments and performing the initial processing of the measured data, that is, sequencer offline processing. A DNA sequencer is an advanced laboratory instrument for measuring DNA (gene) sequences. It is used for paternity testing, individual identification, genetic profiling, paternal and maternal lineage identification, ethnic identification, species identification, the diagnosis of certain diseases, and so on; it is indispensable equipment in life-science research and an important tool for achieving major research advances. DNA sequencers are expensive. The workflow runs from reagent preparation, through sequencing on the instrument, to the final offline processing that yields gene sequences scientists can read. On that basis, scientists can assemble and align the measured sequences and analyze their homology. Sequence alignment mainly involves reconstructing the complete DNA sequence from overlapping sequence fragments; determining physical and genetic maps from probe data under various experimental conditions; storing, traversing, and comparing DNA sequences in databases; comparing the similarity of two or more sequences; searching databases for related sequences and subsequences; finding patterns of consecutive nucleotides; and identifying the informational components of protein and DNA sequences. Molecular docking, based on the lock-and-key principle of ligand and receptor, simulates the interaction between a small-molecule ligand and a receptor macromolecule and computationally predicts their binding mode and affinity, enabling virtual drug screening.

[0004] Commonly used programs include abyss, allpathslg, amos, autodock, blast, clustal-omega, clustalw, clustalw-mpi, dock, emboss, exonerate, fasta, fsa, hmmer, mira, mpiblast, mpihmmer, mummer, velvet, and wgs, among others.

[0005] Bioinformatics application programs are usually installed and deployed manually, which has several shortcomings. Compilation and installation are fairly complex and require many manually set parameters; manual installation is tedious, time-consuming, and laborious, and anyone unfamiliar with the build procedure can easily make mistakes. Installation requires different parameter configurations for different hardware platforms and network environments, and unfamiliarity with the operating system, compiler, math library, hardware, or network environment can lead to poor runtime efficiency or even incorrect results. After a successful installation, the corresponding environment variables must be configured for the user's convenience; configuring them manually is error-prone, and when there are many kinds of applications the environment-variable settings easily become disorganized and conflict with one another.

Summary of the Invention

[0006] In view of the above shortcomings of the prior art, the present invention proposes a method for installing bioinformatics application programs in a high-performance cluster system, which solves the technical problem of how to improve the efficiency of installing bioinformatics application programs in a high-performance cluster system.

[0007] The present invention proposes an automatic installation method for bioinformatics application programs on a high-performance computing cluster. The program performs automated, unattended installation of a variety of materials-physics application programs, including abyss, allpathslg, amos, autodock, blast, clustal-omega, clustalw, clustalw-mpi, dock, emboss, exonerate, fasta, fsa, hmmer, mira, mpiblast, mpihmmer, mummer, velvet, and wgs, among others. Before installing and configuring a bioinformatics application program, the program automatically checks the other software it depends on; during automatic installation and configuration, it adjusts and optimizes configuration parameters according to the cluster's network environment; after installation, it automatically configures environment variables and provides sample scripts for submitting jobs on the cluster system; and throughout the installation it reports progress dynamically and, if an error occurs, displays a corresponding error message.

[0008] According to one aspect of the present invention, a method for installing bioinformatics application programs in a high-performance cluster system is provided, comprising: step S1: loading the environment variables of the bioinformatics application program; step S2: selecting a corresponding math library according to the system type and network configuration of the current installation platform; and step S3: installing the bioinformatics application program using the environment variables and the math library.

[0009] In the method, before step S2, the method further comprises: checking whether the source program of the bioinformatics application program exists and whether the installation target folder can be created normally, and if so, performing step S2.

[0010] In the method, before step S2, the method further comprises: obtaining the system type and the network configuration of the current installation platform.

[0011] In the method, the system type includes the operating system version of the current installation platform.

[0012] In the method, obtaining the system type of the current installation platform comprises: obtaining the operating system version of the current installation platform by inspecting the platform's system files.

[0013] In the method, the network configuration includes whether an InfiniBand network adapter is configured.

[0014] In the method, obtaining the system type and the network configuration of the current installation platform comprises: obtaining the operating system version of the current installation platform by inspecting the platform's system files; checking whether an InfiniBand network adapter is configured in the current installation platform; and checking whether a driver is installed for the InfiniBand adapter and whether the adapter operates normally.

[0015] In the method, the environment variables include global environment variables and specific environment variables; the global environment variables include the installation-process subroutines, the location of the source program of the bioinformatics application program, and the installation target path of the bioinformatics application program, and the specific environment variables include the compiler and MPI.
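The split between global and specific environment variables could be modelled roughly as follows. This is a minimal sketch, not the patent's implementation: the class names, field names, and the variable names `APP_SOURCE_DIR`, `APP_INSTALL_PREFIX`, and `MPI_HOME` are all assumptions chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GlobalEnv:
    """Package-wide settings (all field names are hypothetical)."""
    helper_dir: str       # where the installer's subroutines live
    source_dir: str       # location of the application source programs
    install_prefix: str   # installation target path


@dataclass
class SpecificEnv:
    """Per-application toolchain settings (hypothetical)."""
    compiler: str                    # e.g. "gcc"
    mpi_home: Optional[str] = None   # only set for multi-node applications


def build_environ(base: Dict[str, str], g: GlobalEnv, s: SpecificEnv) -> Dict[str, str]:
    """Return a copy of `base` extended with the installer's variables."""
    env = dict(base)
    env["APP_SOURCE_DIR"] = g.source_dir
    env["APP_INSTALL_PREFIX"] = g.install_prefix
    env["CC"] = s.compiler
    if s.mpi_home:
        env["MPI_HOME"] = s.mpi_home
        # Put the MPI wrappers first on PATH so the build picks them up.
        mpi_bin = os.path.join(s.mpi_home, "bin")
        env["PATH"] = mpi_bin + os.pathsep + env.get("PATH", "")
    return env
```

Returning a new dictionary rather than mutating `os.environ` keeps the settings of one application from leaking into the next, which is the kind of conflict paragraph [0005] warns about.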

[0016] In the method, the method further comprises: saving the output information generated during installation.

[0017] In the method, the method further comprises: generating, for the bioinformatics application program, a sample script for submitting jobs on the high-performance cluster system, wherein the sample script covers how the bioinformatics application program requests resources and how the application program is run.

[0018] The method provided by the present invention for installing bioinformatics application programs in a high-performance cluster system improves the efficiency of installing bioinformatics application programs in a high-performance cluster system.

Brief Description of the Drawings

[0019] The accompanying drawings provide a further understanding of the present invention and constitute a part of the specification; together with the embodiments of the invention they serve to explain the invention and do not limit it. In the drawings:

[0020] FIG. 1 is a flowchart of a general embodiment of the method for installing bioinformatics application programs in a high-performance cluster system according to the present invention;

[0021] FIG. 2 is a flowchart of a specific embodiment of the method for installing bioinformatics application programs in a high-performance cluster system according to the present invention;

[0022] FIG. 3 is a flowchart of an example of the method for installing bioinformatics application programs in a high-performance cluster system according to the present invention.

Detailed Description

[0023] Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the invention and are not intended to limit it.

[0024] FIG. 1 is a flowchart of a general embodiment of the method for installing bioinformatics application programs in a high-performance cluster system according to the present invention. In FIG. 1:

[0025] Step S100: load the environment variables of the bioinformatics application program. The environment variables may include global environment variables and specific environment variables; the global environment variables include the installation-process subroutines, the location of the source program of the bioinformatics application program, and the installation target path of the bioinformatics application program, and the specific environment variables include the compiler and MPI.

[0026] Step S102: select a corresponding math library according to the system type and network configuration of the current installation platform. The system type may include the operating system version of the current installation platform. In a preferred embodiment, the operating system version may be obtained by inspecting the platform's system files. The operating system may be Red Hat, SUSE, CentOS, or another mainstream high-performance cluster operating system. In addition, the network configuration may include whether an InfiniBand network adapter is configured; it may further be checked whether a driver is installed for the InfiniBand adapter and whether the adapter is operating normally.

[0027] Step S104: install the bioinformatics application program using the environment variables and the math library.
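The three steps of FIG. 1 can be read as a small pipeline. The sketch below only shows the ordering and data flow; the function name `install` and the four hook parameters are hypothetical stand-ins, since the patent does not describe how each step is coded.

```python
def install(app, load_env, probe_platform, choose_math_lib, build):
    """Run S100, S102 and S104 in order; every callable is a hypothetical hook."""
    env = load_env(app)                                       # S100: load environment variables
    system_type, has_infiniband = probe_platform()            # inputs needed by S102
    math_lib = choose_math_lib(system_type, has_infiniband)   # S102: pick the math library
    return build(app, env, math_lib)                          # S104: install the application


# Example wiring with stand-in hooks; a real installer would plug in its own.
result = install(
    "blast",
    load_env=lambda app: {"CC": "gcc"},
    probe_platform=lambda: ("CentOS", False),
    choose_math_lib=lambda system, ib: "generic-math-lib",
    build=lambda app, env, lib: f"{app} built with {env['CC']} and {lib}",
)
print(result)
```

Passing the steps in as callables keeps the ordering fixed while letting each step be replaced or tested on its own.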

[0028] The method disclosed in this embodiment simplifies the installation procedure for bioinformatics application programs and lowers the difficulty of installation; dependency checking, fault-tolerance checking, and standardized configuration raise the installation success rate and installation quality and avoid human error as far as possible; and unattended operation greatly improves the efficiency of installing and deploying bioinformatics application programs, saving time and labor.

[0029] FIG. 2 is a flowchart of a specific embodiment of the method for installing bioinformatics application programs in a high-performance cluster system according to the present invention. In FIG. 2:

[0030] Step S200: load the environment variables of the bioinformatics application program. The environment variables may include global environment variables and specific environment variables; the global environment variables include the installation-process subroutines, the location of the source program of the bioinformatics application program, and the installation target path of the bioinformatics application program, and the specific environment variables include the compiler and MPI.

[0031] Step S202: check whether the source program of the bioinformatics application program exists and whether the installation target folder can be created normally; if so, perform the following steps; if not, exit the installation.
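A pre-flight check of this kind might look like the sketch below. The function name `precheck` and its return convention are assumptions; the patent only states what is checked, not how.

```python
import os


def precheck(source_path, target_dir):
    """Return (ok, reason): the source must exist and the target folder must be creatable."""
    if not os.path.exists(source_path):
        return False, f"source not found: {source_path}"
    try:
        # exist_ok=True lets a re-run reuse a folder created earlier.
        os.makedirs(target_dir, exist_ok=True)
    except OSError as exc:
        return False, f"cannot create {target_dir}: {exc}"
    if not os.access(target_dir, os.W_OK):
        return False, f"target folder is not writable: {target_dir}"
    return True, "ok"
```

A caller would exit the installation when `ok` is false and report `reason`, which matches the "if not, exit" branch of step S202.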

[0032] Step S204: obtain the system type and network configuration of the current installation platform. The system type may include the operating system version of the current installation platform. In a preferred embodiment, the operating system version may be obtained by inspecting the platform's system files. The operating system may be Red Hat, SUSE, CentOS, or another mainstream high-performance cluster operating system. In addition, the network configuration may include whether an InfiniBand network adapter is configured; it may further be checked whether a driver is installed for the InfiniBand adapter and whether the adapter is operating normally.
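Reading the operating system version from system files could be sketched as follows. The patent does not name the files it inspects; the three release files listed here are the conventional ones on Red Hat, CentOS, and older SUSE systems and are an assumption, as is the `root` parameter, which exists only so the function can be tried against a test directory.

```python
import os

# Checked in order: CentOS also ships /etc/redhat-release, so it goes first.
RELEASE_FILES = [
    ("CentOS", "etc/centos-release"),
    ("Red Hat", "etc/redhat-release"),
    ("SUSE", "etc/SuSE-release"),
]


def detect_os(root="/"):
    """Return (family, first line of the release file), or None if unrecognized."""
    for family, relative_path in RELEASE_FILES:
        path = os.path.join(root, relative_path)
        if os.path.isfile(path):
            with open(path, encoding="utf-8", errors="replace") as fh:
                return family, fh.readline().strip()
    return None
```

Returning `None` for an unrecognized platform lets the installer stop with a clear message instead of guessing a configuration.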

[0033] Step S206: select a corresponding math library according to the system type and network configuration of the current installation platform.

[0034] Step S208: compile and install the bioinformatics application program using the environment variables and the math library.
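The selection in step S206 amounts to a lookup keyed on the detected platform. The patent does not say which math libraries correspond to which platforms, so the table below uses placeholder identifiers; every entry, and the function name `choose_math_lib`, is hypothetical.

```python
# Placeholder identifiers only: the real mapping is not given in the source.
MATH_LIBS = {
    ("Red Hat", True): "mathlib-redhat-infiniband",
    ("Red Hat", False): "mathlib-redhat-ethernet",
    ("SUSE", True): "mathlib-suse-infiniband",
    ("SUSE", False): "mathlib-suse-ethernet",
    ("CentOS", True): "mathlib-centos-infiniband",
    ("CentOS", False): "mathlib-centos-ethernet",
}


def choose_math_lib(os_family, has_infiniband):
    """Map (system type, network configuration) to a math-library build."""
    try:
        return MATH_LIBS[(os_family, bool(has_infiniband))]
    except KeyError:
        raise ValueError(f"unsupported platform: {os_family!r}") from None
```

Keeping the mapping in a table rather than in nested conditionals makes it straightforward to add a platform without touching the selection logic.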

[0035] Step S210: generate, for the bioinformatics application program, a sample script for submitting jobs on the high-performance cluster system, wherein the sample script covers how the application requests resources and how it is run. The sample file has two parts: how to request computing resources and how to run the application. A high-performance cluster system is generally configured with a job scheduling system, and the prepared script covers how to request computing resources, how to execute commands, and so on. The resource-request part is independent of the application and depends on the scheduler's settings. The present invention uses PBS, the most commonly used scheduling system, for which the parameters to set include "#PBS -l nodes=1:ppn=2" and "#PBS -q low". How the command is executed varies with the application and with the network conditions detected earlier; if an InfiniBand network is configured and the Open MPI library is used, the parameter "--mca btl self,openib" must be added, and so on. In use, the script needs only simple modifications to match the resources actually required.

[0036] The method disclosed in this embodiment simplifies the installation procedure for bioinformatics application programs and lowers the difficulty of installation; dependency checking, fault-tolerance checking, and standardized configuration raise the installation success rate and installation quality and avoid human error as far as possible; and unattended operation greatly improves the efficiency of installing and deploying bioinformatics application programs, saving time and labor.

[0037] FIG. 3 is a flowchart of an example of the method for installing bioinformatics application programs in a high-performance cluster system according to the present invention. In this example:

[0038] Step 1: load the package's global environment variables, mainly the subroutines needed during installation, the location of the application source programs, the installation target path, and so on.

[0039] Step 2: load the environment variables needed to install the application. Most bioinformatics application programs are multithreaded and can run only on a single server, so the environment variable that must be loaded is mainly the compiler. A few applications, such as mpiblast and mpihmmer, support multi-node parallelism and additionally require the environment variables of the MPI library. After the required environment is loaded, the compiler and other tools are tested to confirm that they work normally.
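One plausible way to test that the toolchain works is to look each tool up on `PATH` and then compile a trivial program. The sketch below assumes a C compiler that accepts `cc source.c -o output`; the function names are hypothetical and the patent does not describe its actual test.

```python
import os
import shutil
import subprocess
import tempfile


def missing_tools(required, which=shutil.which):
    """Return the tools from `required` that cannot be found on PATH."""
    return [name for name in required if which(name) is None]


def compiler_works(cc):
    """Try to compile an empty C program; True only if the compiler exits with 0."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "probe.c")
        with open(source, "w") as fh:
            fh.write("int main(void) { return 0; }\n")
        try:
            result = subprocess.run(
                [cc, source, "-o", os.path.join(tmp, "probe")],
                capture_output=True,
            )
        except OSError:
            # The compiler binary does not exist or cannot be executed.
            return False
        return result.returncode == 0
```

For a multi-node application the same check could be run against the MPI compiler wrapper (for example `mpicc`) in addition to the plain compiler.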

[0040] Step 3: check whether the application's source program exists, whether the installation target folder can be created normally, and so on.

[0041] Step 4: check the high-performance cluster environment, including the operating system version, the network, and so on.

[0042] The operating system version can be obtained by inspecting system files. Currently supported operating systems include Red Hat, SUSE, CentOS, and other mainstream high-performance cluster operating systems.

[0043] The network check is needed mainly when installing applications that support multiple nodes, such as mpiblast and mpihmmer. It determines whether a high-speed InfiniBand network is configured and whether the application is intended to use it, and consists mainly of two checks:

[0044] (1) Check whether a high-speed InfiniBand network adapter is configured in the server.

[0045] (2) Check whether a driver is installed for the InfiniBand adapter and whether the adapter's status is normal.
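On Linux, the driver and port-state part of this check can be approximated by reading sysfs. The sketch below assumes the standard layout `/sys/class/infiniband/<device>/ports/<n>/state`, where an active port reads `4: ACTIVE`; the function name and the `sysfs_root` parameter are hypothetical. Devices appear in sysfs only once a kernel driver is loaded, so detecting an adapter that has no driver (check (1) alone) would need a PCI scan, which is not shown here.

```python
import glob
import os


def infiniband_status(sysfs_root="/sys/class/infiniband"):
    """Return (driver_loaded, port_active) based on the sysfs tree."""
    devices = glob.glob(os.path.join(sysfs_root, "*"))
    if not devices:
        return False, False
    port_active = False
    pattern = os.path.join(sysfs_root, "*", "ports", "*", "state")
    for state_file in glob.glob(pattern):
        with open(state_file) as fh:
            # A healthy port reports a line such as "4: ACTIVE".
            if "ACTIVE" in fh.read():
                port_active = True
    return True, port_active
```

An installer would enable the InfiniBand-specific options only when both values are true and fall back to the default network otherwise.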

[0046] Step 5: based on the system information obtained, automatically set the variables needed for installation, then compile and install the program.

[0047] Step 6: generate, for the application, a sample script for submitting jobs on the cluster system. The sample file has two parts: how to request computing resources and how to run the application. A high-performance cluster system is generally configured with a job scheduling system, and the prepared script covers how to request computing resources, how to execute commands, and so on. The resource-request part is independent of the application and depends on the scheduler's settings. The present invention uses PBS, the most commonly used scheduling system, for which the parameters to set include "#PBS -l nodes=1:ppn=2" and "#PBS -q low". How the command is executed varies with the application. For applications that support multiple nodes, such as mpiblast and mpihmmer, if an InfiniBand network is configured and the Open MPI library is used, the parameter "--mca btl self,openib" must be added; for other, single-node applications, no network parameters need to be specified. In use, the script needs only simple modifications to match the resources actually required.
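A generator for such a sample script might be sketched as below. The `#PBS` directives and the `--mca btl self,openib` option come from the text; the function name, its parameters, the use of `$PBS_O_WORKDIR` and `$PBS_NODEFILE`, and the `mpirun -np ... -machinefile ...` launch line are assumptions about a typical PBS and Open MPI setup rather than the patent's actual template.

```python
def make_pbs_script(app_cmd, nodes=1, ppn=2, queue="low",
                    multi_node=False, infiniband=False):
    """Build the text of a sample PBS job script for one application."""
    lines = [
        "#!/bin/bash",
        # Part 1: how to request computing resources (scheduler-specific).
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -q {queue}",
        "cd $PBS_O_WORKDIR",
    ]
    # Part 2: how to run the application.
    if multi_node:
        launch = f"mpirun -np {nodes * ppn} -machinefile $PBS_NODEFILE"
        if infiniband:
            # Ask Open MPI to use the InfiniBand transport.
            launch += " --mca btl self,openib"
        lines.append(f"{launch} {app_cmd}")
    else:
        # Single-node applications need no network parameters.
        lines.append(app_cmd)
    return "\n".join(lines) + "\n"


# "./my_app input.fa" is a made-up command used only to show the output shape.
print(make_pbs_script("./my_app input.fa", multi_node=True, infiniband=True))
```

Users would then edit only the resource line and the application arguments, which is the "simple modification" the paragraph describes.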

[0048] During installation, the package saves the output produced by the installation process. If the installation exits abnormally, the saved file can be inspected to find the cause of the error.
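Saving the installation output can be as simple as redirecting each build command into a log file. The following is a minimal sketch; the function name `run_logged` and the log format are assumptions, not the package's actual behavior.

```python
import subprocess


def run_logged(cmd, log_path):
    """Run `cmd`, append its stdout and stderr to `log_path`, and return the exit code."""
    with open(log_path, "a", encoding="utf-8") as log:
        # Record the command itself so a failure can be traced to a step.
        log.write("$ " + " ".join(cmd) + "\n")
        log.flush()
        completed = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode
```

A nonzero return value signals an abnormal exit, at which point the installer can point the user at `log_path`.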

[0049] Using the package: after the package is extracted, it contains a command named install.sh. Enter the package folder and run: ./install.sh -<application name>. The application is then installed automatically.

[0050] The present invention proposes an automatic installation method for bioinformatics application programs on a high-performance computing cluster. Automation greatly simplifies the installation procedure for bioinformatics application programs and lowers the difficulty of installation; dependency checking, fault-tolerance checking, and standardized configuration raise the installation success rate and installation quality and avoid human error as far as possible; and unattended operation greatly improves the efficiency of installing and deploying bioinformatics application programs, saving time and labor. The method and program are broadly applicable to the automatic, rapid installation and deployment of bioinformatics application programs on high-performance computing clusters of different sizes, and also to the rapid configuration and deployment of a high-performance computing program environment on temporary computing resources in dynamic, changing environments (such as cloud computing).

[0051] The above are merely preferred embodiments of the present invention and are not intended to limit it; to those skilled in the art, the invention may be modified and varied in many ways. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for installing bioinformatics application programs in a high-performance cluster system, characterized by comprising: step S1: loading the environment variables of the bioinformatics application program; step S2: selecting a corresponding math library according to the system type and network configuration of the current installation platform; and step S3: installing the bioinformatics application program using the environment variables and the math library.
2. The method according to claim 1, characterized in that, before step S2, the method further comprises: checking whether the source program of the bioinformatics application program exists and whether the installation target folder can be created normally, and if so, performing step S2.
3. The method according to claim 1 or 2, characterized in that, before step S2, the method further comprises: obtaining the system type and the network configuration of the current installation platform.
4. The method according to claim 3, characterized in that the system type includes the operating system version of the current installation platform.
5. The method according to claim 4, characterized in that obtaining the system type of the current installation platform comprises: obtaining the operating system version of the current installation platform by inspecting the platform's system files.
6. The method according to claim 5, characterized in that the network configuration includes whether an InfiniBand network adapter is configured.
7. The method according to claim 6, characterized in that obtaining the system type and the network configuration of the current installation platform comprises: obtaining the operating system version of the current installation platform by inspecting the platform's system files; checking whether an InfiniBand network adapter is configured in the current installation platform; and checking whether a driver is installed for the InfiniBand adapter and whether the adapter operates normally.
8. The method according to claim 7, characterized in that the environment variables include global environment variables and specific environment variables, the global environment variables include the installation-process subroutines, the location of the source program of the bioinformatics application program, and the installation target path of the bioinformatics application program, and the specific environment variables include the compiler and MPI.
9. The method according to claim 1, characterized in that the method further comprises: saving the output information generated during installation.
10. The method according to claim 1, characterized in that the method further comprises: generating, for the bioinformatics application program, a sample script for submitting jobs on the high-performance cluster system, wherein the sample script covers how the bioinformatics application program requests resources and how the application program is run.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013102601126A CN103324509A (en) 2013-06-26 2013-06-26 Method for installing bioinformatics application programs in high-performance cluster system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013102601126A CN103324509A (en) 2013-06-26 2013-06-26 Method for installing bioinformatics application programs in high-performance cluster system

Publications (1)

Publication Number Publication Date
CN103324509A 2013-09-25

Family

ID=49193276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013102601126A CN103324509A (en) 2013-06-26 2013-06-26 Method for installing bioinformatics application programs in high-performance cluster system

Country Status (1)

Country Link
CN (1) CN103324509A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126976A (en) * 2016-06-15 2016-11-16 北京市计算中心 Biological information analysis system applied in server
CN106445605A (en) * 2016-09-30 2017-02-22 郑州云海信息技术有限公司 Method for silent installation of ICC compiling environment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040060035A1 (en) * 2002-09-24 2004-03-25 Eric Ustaris Automated method and system for building, deploying and installing software resources across multiple computer systems
CN101937351A (en) * 2010-09-15 2011-01-05 深圳市任子行网络技术股份有限公司 Method and system for automatically installing application software
CN102141924A (en) * 2010-01-29 2011-08-03 迈普通信技术股份有限公司 Batch production method of Linux boards and production server

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
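Yan Bing (严冰) et al.: "Linux Programming" (《Linux程序设计》), 2 February 2012, Zhejiang University Press (浙江大学出版社) *
Sun Jing (孙靖) et al.: "An Introduction to InfiniBand Technology and Its Configuration on Linux Systems" (InfiniBand技术及其在Linux系统中的配置简介), 《HTTP://WWW.IBM.COM/DEVELOPERWORKS/CN/LINUX/L-CN-INFINIBAND/》 *
Gao Junfeng (高俊峰) et al.: "Basic Applications of Domestic Linux" (《国产Linux基础应用》), 31 July 2012 *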

Similar Documents

Publication Publication Date Title
Lun et al. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor
Lancaster et al. PyPop: a software framework for population genomics: analyzing large-scale multi-locus genotype data
Fonseca et al. Tools for mapping high-throughput sequencing data
Okonechnikov et al. Unipro UGENE: a unified bioinformatics toolkit
Pavlidis et al. SweeD: likelihood-based detection of selective sweeps in thousands of genomes
Scherer et al. PyEMMA 2: A software package for estimation, validation, and analysis of Markov models
Kumar et al. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets
Prosperi et al. QuRe: software for viral quasispecies reconstruction from next-generation sequencing data
Wang et al. Target analysis by integration of transcriptome and ChIP-seq data with BETA
O'Sullivan et al. 3DCoffee: combining protein sequences and structures within multiple sequence alignments
Antao et al. LOSITAN: a workbench to detect molecular adaptation based on a F ST-outlier method
Trapnell et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks
Holt et al. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects
Smith et al. Using quality scores and longer reads improves accuracy of Solexa read mapping
Reid et al. Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline
Webb et al. Protein structure modeling with MODELLER
Norgan et al. Multilevel parallelization of AutoDock 4.2
Pronk et al. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit
Fourment et al. A comparison of common programming languages used in bioinformatics
Lee et al. Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score
Feng et al. PeakRanger: a cloud-enabled peak caller for ChIP-seq data
Kumar et al. Bioinformatics software for biologists in the genomics era
US20170220732A1 (en) Comprehensive analysis pipeline for discovery of human genetic variation
Feuda et al. Improved modeling of compositional heterogeneity supports sponges as sister to all other animals
US7536678B2 (en) System and method for determining the possibility of adverse effect arising from a code change in a computer program

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
RJ01