CN106502795A - Method and system for deploying scientific computing applications on a distributed cluster - Google Patents

Info

Publication number
CN106502795A
Authority
CN
China
Prior art keywords
distributed
distributed cluster
node
cluster
scientific computing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610954803.XA
Other languages
Chinese (zh)
Inventor
李俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources

Abstract

The invention discloses a method and system for deploying scientific computing applications on a distributed cluster, belonging to the technical field of scientific computing applications. The problem addressed is how to run an application that has no parallel capability of its own as a parallel computation. The technical solution adopted is as follows: an external-facing cluster node, NameNodes, and DataNodes are configured on the distributed cluster; clients access the cluster through the external-facing node; the NameNodes carry out the Map process; the DataNodes carry out the Reduce process; a shared installation directory for the distributed cluster is set up on the NameNode, and application programs and/or files are deployed in it; the shared installation directory makes the application programs or files globally shared; the user switches to the distributed cluster's user account, opens the application, and runs it; the user logs in to the DataNodes and uses the top command to check whether the application's Reduce process is executing; after execution finishes, the results from all nodes are collected, integrated, and fed back to the current user.

Description

Method and system for deploying scientific computing applications on a distributed cluster
Technical field
The present invention relates to the technical field of scientific computing applications, and specifically to a method and system for deploying scientific computing applications on a distributed cluster.
Background art
Scientific computing applications are commonly deployed on high-performance computing (HPC) clusters. The ability to squeeze the most out of the available computing resources is a major feature of HPC clusters, and one of the reasons they are chosen as the main cluster type for scientific computing applications.
However, to achieve efficient parallelism on an HPC cluster, the first requirement is that the scientific computing application itself can compute in parallel; in other words, the application must have been designed and developed with parallel programming. In the current application market, most applications for the natural sciences are parallel programs, whereas applications for the social sciences are seldom written in parallel form, so on an HPC cluster they can only run serially on a single node.
The distributed cluster referred to here is what is commonly called a Hadoop cluster. Hadoop is an open-source technology whose distributed architecture places the big-data processing engine as close to the storage as possible. Hadoop's MapReduce function splits a single job into fragments, sends the fragment tasks (Map) to multiple nodes, and then loads the results (Reduce) into the data warehouse as a single data set. Hadoop consists of many components. At the bottom is the Hadoop Distributed File System (HDFS), which stores files across all storage nodes in the Hadoop cluster. Hadoop's MapReduce uses a master/slave structure. Master: the single global manager of the whole cluster, whose functions include job management, status monitoring, and task scheduling; this is the JobTracker in MapReduce. Slave: responsible for executing tasks and reporting task status; this is the TaskTracker in MapReduce.
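The Map and Reduce flow described above can be illustrated with a minimal single-process sketch. It is not Hadoop's API: the names `run_map_reduce`, `tokenize`, and `total` are hypothetical, and the grouping step stands in for the shuffle that a real cluster performs between nodes.

```python
from collections import defaultdict


def run_map_reduce(records, map_fn, reduce_fn):
    """Single-process illustration of Map -> group by key -> Reduce."""
    grouped = defaultdict(list)
    # Map phase: every input record is turned into (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)
    # Reduce phase: all values that share a key are combined into one result.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def tokenize(line):
    """Hypothetical map function: emit (word, 1) for each word in a line."""
    for word in line.split():
        yield word.lower(), 1


def total(key, values):
    """Hypothetical reduce function: sum the counts collected for one word."""
    return sum(values)


counts = run_map_reduce(["Map reduce map", "reduce"], tokenize, total)
```

On a real cluster the map calls would run on different nodes and the grouped values would travel over the network; the sketch only shows how one job decomposes into fragment tasks whose outputs are merged into a single data set.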
The JobTracker is a background service process. Once started, it continuously listens for and receives the heartbeat messages sent by each TaskTracker, which include information such as resource usage and task running status. The main functions of the JobTracker are: 1. Job control: in Hadoop, each application is represented as a job, and each job is divided into multiple tasks; the JobTracker's job control module is responsible for decomposing jobs and monitoring their status. Status monitoring mainly covers TaskTracker status, job status, and task status; its main purposes are fault tolerance and providing a decision basis for task scheduling. 2. Resource management.
The TaskTracker is the bridge between the JobTracker and the tasks. On the one hand, it receives and executes commands from the JobTracker: run task, commit task, kill task, and so on. On the other hand, it periodically reports the status of each task on the local node to the JobTracker through heartbeats. The TaskTracker communicates with the JobTracker and with the tasks using an RPC protocol. The functions of the TaskTracker are: 1. Heartbeat reporting: the TaskTracker periodically reports the various information on its node to the JobTracker through the heartbeat mechanism; this includes machine-level information (node health, resource usage, etc.) and task-level information (task progress, task running state, etc.). 2. Command execution: the JobTracker issues commands to the TaskTracker, mainly: launch task (LaunchTaskAction), commit task (CommitTaskAction), kill task (KillTaskAction), kill job (KillJobAction), and reinitialize (TaskTrackerReinitAction). 3. Resource management.
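As a rough illustration of the heartbeat exchange, the sketch below models a heartbeat that carries machine-level and task-level information, plus a master-side function that answers with commands. The data layout and the decision rule are assumptions made for illustration only; the action names are the ones listed above, but this is not Hadoop's actual wire format or scheduling logic.

```python
from dataclasses import dataclass, field


@dataclass
class TaskStatus:
    """Task-level information: progress and running state of one task."""
    task_id: str
    progress: float  # 0.0 to 1.0
    state: str       # e.g. "RUNNING"


@dataclass
class Heartbeat:
    """Machine-level information plus the status of every local task."""
    node: str
    healthy: bool
    cpu_used_pct: float
    tasks: list = field(default_factory=list)


def pending_actions(hb):
    """Hypothetical master-side rule, for illustration only:
    kill tasks on unhealthy nodes, commit tasks that have finished."""
    actions = []
    for task in hb.tasks:
        if not hb.healthy:
            actions.append(("KillTaskAction", task.task_id))
        elif task.progress >= 1.0 and task.state == "RUNNING":
            actions.append(("CommitTaskAction", task.task_id))
    return actions


hb = Heartbeat("datanode-1", True, 42.0,
               [TaskStatus("t1", 1.0, "RUNNING"), TaskStatus("t2", 0.3, "RUNNING")])
actions = pending_actions(hb)
```

The point of the sketch is the direction of the traffic: status flows up in the heartbeat, and commands flow back in the reply.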
Many commercial companies on the market offer a variety of commercial Hadoop products for applications. The core of these products is still the original Hadoop, whose overall structure differs somewhat from that of an HPC cluster: the underlying HDFS (Hadoop Distributed File System) and MapReduce distributed processing form the core of Hadoop distributed computing. High-performance computing depends on the application itself having parallel capability, whereas a Hadoop cluster provides distributed processing capability at the system level. Clearly, when a scientific computing application that cannot run in parallel is deployed on an HPC cluster and computed serially, its efficiency is not high and the cluster's resources cannot be used to the fullest. How to run such an application, which has no parallel capability of its own, as a parallel computation is the problem that currently needs to be solved.
Summary of the invention
The technical task of the present invention is to provide a method and system for deploying scientific computing applications on a distributed cluster, solving the problem of how to run an application that has no parallel capability of its own as a parallel computation.
The technical task of the present invention is achieved in the following manner.
A method for deploying scientific computing applications on a distributed cluster, with the following steps:
(1) Deploy the distributed cluster so that its users can store files normally and run the Map-Reduce process normally; configure on the distributed cluster one external-facing cluster node, at least two NameNodes, and multiple DataNodes used for computation; clients access the cluster through the external-facing node; the NameNodes carry out the Map process; the DataNodes carry out the Reduce process;
(2) Tune the scientific computing application used on the distributed cluster so that it includes Map-Reduce function interfaces;
(3) On the NameNode, set up the shared installation directory of the distributed cluster and deploy application programs and/or files in it; the shared installation directory makes the application programs or files globally shared, so that any child node that can access it can call the application or read and write the files;
(4) Switch to the distributed cluster's user account and open the application; the application presents an application interface, under which examples are run or programs are debugged and run;
(5) Log in to each DataNode that executes the Reduce process and use the top command to check whether the application's Reduce process is executing;
(6) After execution finishes, the distributed cluster automatically collects and integrates the results from all nodes and feeds them back to the current user.
In step (1), the distributed cluster uses open-source Hadoop or a commercial Hadoop distribution.
In step (3), the shared installation directory uses common storage space or an NFS-shared directory, such as /opt.
In step (3), application programs and/or files are deployed in the shared installation directory in three steps: upload, decompress, and install. After installation, test whether each node can access and execute the corresponding application programs and/or files normally.
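The post-installation test could be scripted along the following lines and run on each node. This is a minimal sketch under stated assumptions: the function name `check_shared_install` and the example paths are hypothetical, and the check only covers file-system visibility and the execute permission, not whether the application actually starts.

```python
import os


def check_shared_install(shared_dir, program_rel_path):
    """Report how this node sees a program under the shared installation directory."""
    program = os.path.join(shared_dir, program_rel_path)
    is_file = os.path.isfile(program)
    return {
        # The directory must exist and be readable and traversable from this node.
        "dir_accessible": os.path.isdir(shared_dir)
        and os.access(shared_dir, os.R_OK | os.X_OK),
        # The deployed program must be visible through the shared mount.
        "program_exists": is_file,
        # The current user must be allowed to execute it.
        "program_executable": is_file and os.access(program, os.X_OK),
    }
```

For example, `check_shared_install("/opt", "sciapp/bin/sciapp")` would report on a hypothetical application installed under the NFS-shared /opt directory; a node passes when all three values are true.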
In step (4), running an example or debugging and running a program proceeds as follows: the Map-Reduce process calls on the computing resources of all DataNodes, and the number of computing resources called is set through Hadoop's configuration; open the application, enter the application's example parameters or the corresponding program, and start execution.
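The patent does not say how the job is submitted or which configuration item sets the amount of computing resources. One possibility, shown here purely as an assumption, is Hadoop Streaming, which runs arbitrary executables as mapper and reducer and accepts the reducer count through the `mapreduce.job.reduces` property (older Hadoop 1.x releases use `mapred.reduce.tasks` instead). The function name and every path in the sketch are hypothetical placeholders.

```python
def build_streaming_command(streaming_jar, input_path, output_path,
                            mapper, reducer, num_reducers):
    """Build the argument list for a Hadoop Streaming job (assumed submission route)."""
    if num_reducers < 0:
        raise ValueError("num_reducers must be non-negative")
    return [
        "hadoop", "jar", streaming_jar,
        # Generic -D options must come before the streaming-specific options.
        "-D", f"mapreduce.job.reduces={num_reducers}",
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


# Hypothetical paths: the jar location and the application under /opt vary by site.
cmd = build_streaming_command(
    "/path/to/hadoop-streaming.jar",
    "/user/sci/input", "/user/sci/output",
    "/opt/sciapp/bin/map_step", "/opt/sciapp/bin/reduce_step",
    num_reducers=4,
)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a node where the `hadoop` command is available. Pointing the mapper and reducer at executables under the shared installation directory relies on every node seeing the same path, which is what step (3) provides.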
In step (6), the integrated node results are fed back to the current user as screen output or file output.
A system for deploying scientific computing applications on a distributed cluster: one external-facing cluster node, at least two NameNodes, and multiple DataNodes used for computation are configured on the distributed cluster; clients access the cluster through the external-facing node; the NameNodes carry out the Map process; the DataNodes carry out the Reduce process.
The distributed cluster uses open-source Hadoop or a commercial Hadoop distribution.
Clients access the external-facing cluster node over an external network or the public network. The external-facing node can reach the NameNodes and DataNodes over the internal network. The internal network over which the NameNodes and DataNodes exchange data and communicate is a Gigabit network, a 10-Gigabit network, or an InfiniBand network.
The method and system of the present invention for deploying scientific computing applications on a distributed cluster have the following advantages:
1. Cross-node distributed computation of applications developed on Hadoop is achieved on the distributed cluster, making up for the low efficiency of running applications serially on an HPC cluster;
2. The approach works with common distributed clusters, such as open-source Hadoop or commercial distributed software; applications of this kind, once tuned for Hadoop, can be deployed and run on the Hadoop platform, gaining the ability to compute across nodes in a distributed manner and improving the cluster's computing efficiency;
3. The approach is simple, clear, and easy to operate; the distributed cluster is the foundation of the whole setup, and achieving distributed computation for scientific computing applications is its core and purpose; the approach remains applicable when nodes with other roles exist in the cluster; the ultimate aim is for the deployed scientific computing application to achieve cross-node computation that it cannot achieve on an HPC cluster, so that cluster resources are used to the fullest in the scientific computing process;
4. Deploying application programs and files under the shared installation directory saves application deployment time and reduces deployment difficulty on the distributed cluster.
Description of the drawings
The present invention is further described below in conjunction with the accompanying drawings.
Figure 1 is a flow chart of the method for deploying scientific computing applications on a distributed cluster;
Figure 2 is a structural block diagram of the system for deploying scientific computing applications on a distributed cluster.
Detailed description of the embodiments
The method and system of the present invention for deploying scientific computing applications on a distributed cluster are described in detail below with reference to the accompanying drawings and specific embodiments.
Embodiment 1:
The method of the present invention for deploying scientific computing applications on a distributed cluster has the following steps:
(1) Deploy the distributed cluster so that its users can store files normally and run the Map-Reduce process normally; configure on the distributed cluster one external-facing cluster node, at least two NameNodes, and multiple DataNodes used for computation; clients access the cluster through the external-facing node; the NameNodes carry out the Map process; the DataNodes carry out the Reduce process; the distributed cluster uses open-source Hadoop;
(2) Tune the scientific computing application used on the distributed cluster so that it includes Map-Reduce function interfaces;
(3) On the NameNode, set up the shared installation directory of the distributed cluster; the shared installation directory uses common storage space; deploy application programs and/or files in the shared installation directory; the shared installation directory makes the application programs or files globally shared, so that any child node that can access it can call the application or read and write the files; application programs and/or files are deployed in the shared installation directory in three steps (upload, decompress, and install), and after installation each node is tested to confirm that it can access and execute the corresponding application programs and/or files normally;
(4) Switch to the distributed cluster's user account and open the application; the application presents an application interface, under which examples are run or programs are debugged and run, as follows: the Map-Reduce process calls on the computing resources of all DataNodes, and the number of computing resources called is set through Hadoop's configuration; open the application, enter the application's example parameters or the corresponding program, and start execution;
(5) Log in to each DataNode that executes the Reduce process and use the top command to check whether the application's Reduce process is executing;
(6) After execution finishes, the distributed cluster automatically collects and integrates the results from all nodes and feeds them back to the current user; the integrated node results are fed back to the current user as screen output.
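The check in step (5) is done by hand with top. As a hedged sketch of how it might be automated, the functions below scan the text produced by `top -b -n 1` (batch mode, one iteration, as provided by the procps version of top on Linux) for a process name. The sample output and the process name `sciapp` are invented for illustration; on a real Hadoop node the Reduce task may simply appear as a `java` process, so the name to search for is an assumption that has to be adapted.

```python
def running_commands(top_output):
    """Return the COMMAND column of each process row in `top -b -n 1` style text."""
    lines = top_output.splitlines()
    # The process table starts after the header row whose first field is "PID".
    header_idx = next(
        (i for i, line in enumerate(lines) if line.split()[:1] == ["PID"]), None
    )
    if header_idx is None:
        return []
    cmd_col = lines[header_idx].split().index("COMMAND")
    commands = []
    for line in lines[header_idx + 1:]:
        # Split at most cmd_col times so a command containing spaces stays whole.
        parts = line.split(None, cmd_col)
        if len(parts) > cmd_col:
            commands.append(parts[cmd_col].strip())
    return commands


def is_process_running(top_output, name):
    """True if any process row's command contains the given name."""
    return any(name in command for command in running_commands(top_output))


# Invented sample in the usual procps layout, for illustration only.
SAMPLE = """top - 10:00:00 up 1 day,  1 user,  load average: 0.50, 0.40, 0.30
Tasks: 120 total,   2 running
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 4321 hadoop    20   0 2500000 300000  20000 S  95.0   3.1   1:23.45 java
 4400 hadoop    20   0   10000   2000   1500 R  88.0   0.1   0:10.00 sciapp
"""

found = is_process_running(SAMPLE, "sciapp")
```

In practice the text would come from running `top -b -n 1` on each DataNode, for example over SSH, rather than from a literal string.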
Embodiment 2:
The method of the present invention for deploying scientific computing applications on a distributed cluster has the following steps:
(1) Deploy the distributed cluster so that its users can store files normally and run the Map-Reduce process normally; configure on the distributed cluster one external-facing cluster node, at least two NameNodes, and multiple DataNodes used for computation; clients access the cluster through the external-facing node; the NameNodes carry out the Map process; the DataNodes carry out the Reduce process; the distributed cluster uses a commercial Hadoop distribution;
(2) Tune the scientific computing application used on the distributed cluster so that it includes Map-Reduce function interfaces;
(3) On the NameNode, set up the shared installation directory of the distributed cluster; the shared installation directory uses the NFS-shared directory /opt; deploy application programs and/or files in the shared installation directory; the shared installation directory makes the application programs or files globally shared, so that any child node that can access it can call the application or read and write the files; application programs and/or files are deployed in the shared installation directory in three steps (upload, decompress, and install), and after installation each node is tested to confirm that it can access and execute the corresponding application programs and/or files normally;
(4) Switch to the distributed cluster's user account and open the application; the application presents an application interface, under which examples are run or programs are debugged and run, as follows: the Map-Reduce process calls on the computing resources of all DataNodes, and the number of computing resources called is set through Hadoop's configuration; open the application, enter the application's example parameters or the corresponding program, and start execution;
(5) Log in to each DataNode that executes the Reduce process and use the top command to check whether the application's Reduce process is executing;
(6) After execution finishes, the distributed cluster automatically collects and integrates the results from all nodes and feeds them back to the current user; the integrated node results are fed back to the current user as file output.
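The integration in step (6) is performed by the cluster itself. The sketch below only illustrates the two output modes the patent names, screen and file, using an assumed representation of per-node results as a mapping from node name to output lines; the function name `integrate_results` and the node names are hypothetical.

```python
import sys


def integrate_results(partials, out_path=None):
    """Merge per-node partial results and deliver them to the user.

    partials maps a node name to the list of result lines it produced.
    With out_path=None the merged text goes to the screen; otherwise to a file.
    """
    merged = []
    # Sorting by node name keeps the merged order deterministic.
    for node in sorted(partials):
        merged.extend(partials[node])
    text = "\n".join(merged) + "\n"
    if out_path is None:
        sys.stdout.write(text)  # screen output, as in Embodiment 1
    else:
        with open(out_path, "w", encoding="utf-8") as fh:
            fh.write(text)      # file output, as in Embodiment 2
    return merged
```

For example, `integrate_results({"dn2": ["b"], "dn1": ["a"]}, "result.txt")` writes the lines `a` and `b` to `result.txt`, while omitting the second argument prints them instead.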
Embodiment 3:
In the system of the present invention for deploying scientific computing applications on a distributed cluster, one external-facing cluster node, at least two NameNodes, and multiple DataNodes used for computation are configured on the distributed cluster; clients access the cluster through the external-facing node; the NameNodes carry out the Map process; the DataNodes carry out the Reduce process. The distributed cluster uses open-source Hadoop or a commercial Hadoop distribution. Clients access the external-facing cluster node over an external network or the public network; the external-facing node can reach the NameNodes and DataNodes over the internal network; the internal network over which the NameNodes and DataNodes exchange data and communicate is a Gigabit network.
From the specific embodiments above, those skilled in the art can readily implement the present invention. It should be understood, however, that the present invention is not limited to the specific embodiments described above. On the basis of the disclosed embodiments, those skilled in the art can combine different technical features in any way to obtain different technical solutions.
Except for the technical features described in the specification, everything else is known technology to those skilled in the art.

Claims (9)

1. A method for deploying scientific computing applications on a distributed cluster, characterized in that the steps are as follows:
(1) Deploy the distributed cluster so that its users can store files normally and run the Map-Reduce process normally; configure on the distributed cluster one external-facing cluster node, at least two NameNodes, and multiple DataNodes used for computation; clients access the cluster through the external-facing node; the NameNodes carry out the Map process; the DataNodes carry out the Reduce process;
(2) Tune the scientific computing application used on the distributed cluster so that it includes Map-Reduce function interfaces;
(3) On the NameNode, set up the shared installation directory of the distributed cluster and deploy application programs and/or files in it; the shared installation directory makes the application programs or files globally shared, so that any child node that can access it can call the application or read and write the files;
(4) Switch to the distributed cluster's user account and open the application; the application presents an application interface, under which examples are run or programs are debugged and run;
(5) Log in to each DataNode that executes the Reduce process and use the top command to check whether the application's Reduce process is executing;
(6) After execution finishes, the distributed cluster automatically collects and integrates the results from all nodes and feeds them back to the current user.
2. The method for deploying scientific computing applications on a distributed cluster according to claim 1, characterized in that in step (1) the distributed cluster uses open-source Hadoop or a commercial Hadoop distribution.
3. The method for deploying scientific computing applications on a distributed cluster according to claim 1, characterized in that in step (3) the shared installation directory uses common storage space or an NFS-shared directory.
4. The method for deploying scientific computing applications on a distributed cluster according to claim 1, characterized in that in step (3) application programs and/or files are deployed in the shared installation directory in three steps (upload, decompress, and install), and after installation each node is tested to confirm that it can access and execute the corresponding application programs and/or files normally.
5. The method for deploying scientific computing applications on a distributed cluster according to claim 1, characterized in that in step (4) running an example or debugging and running a program proceeds as follows: the Map-Reduce process calls on the computing resources of all DataNodes, and the number of computing resources called is set through Hadoop's configuration; the application is opened, the application's example parameters or the corresponding program are entered, and execution starts.
6. The method for deploying scientific computing applications on a distributed cluster according to claim 1, characterized in that in step (6) the integrated node results are fed back to the current user as screen output or file output.
7. A system for deploying scientific computing applications on a distributed cluster, characterized in that an external-facing cluster node, at least two NameNodes, and multiple DataNodes used for computation are configured on the distributed cluster; clients access the cluster through the external-facing node; the NameNodes carry out the Map process; the DataNodes carry out the Reduce process.
8. The system for deploying scientific computing applications on a distributed cluster according to claim 7, characterized in that the distributed cluster uses open-source Hadoop or a commercial Hadoop distribution.
9. The system for deploying scientific computing applications on a distributed cluster according to claim 7, characterized in that clients access the external-facing cluster node over an external network or the public network, the external-facing node can reach the NameNodes and DataNodes over the internal network, and the internal network over which the NameNodes and DataNodes exchange data and communicate is a Gigabit network, a 10-Gigabit network, or an InfiniBand network.
CN201610954803.XA 2016-11-03 2016-11-03 Method and system for deploying scientific computing applications on a distributed cluster Pending CN106502795A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170315
