CN107885549A - Method and system for clearing residual processes on the compute nodes of a TORQUE computing cluster - Google Patents

Info

Publication number
CN107885549A
Authority
CN
China
Prior art keywords
calculate
user name
file
torque computing
calculate node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711137327.3A
Other languages
Chinese (zh)
Other versions
CN107885549B (en)
Inventor
孙金土
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinyang Normal University
Original Assignee
Xinyang Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-11-16
Filing date
2017-11-16
Publication date
2018-04-06
Application filed by Xinyang Normal University
Priority to CN201711137327.3A
Publication of CN107885549A
Application granted
Publication of CN107885549B
Legal status
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44: Arrangements for executing specific programs
    • G06F9/445: Program loading or initiating
    • G06F9/44594: Unloading

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention belongs to the field of computer technology, specifically the management of high-performance computing clusters, and in particular relates to a method and system for clearing residual processes on the compute nodes of a TORQUE computing cluster. The method comprises: obtaining all user names from a user name list file and storing them in a user list; obtaining the user names that currently have tasks running on each compute node and storing them in an associative array; creating, for each compute node, a file named after the node that stores the user names with no task currently executing on that node; deleting the residual processes on each compute node; and automatically updating the user name list file. The system comprises a first acquisition module, a second acquisition module, a file creation module, a residual process deletion module, and an update module. The invention can quickly and accurately clear residual processes on the compute nodes of a TORQUE computing cluster.

Description

Method and system for clearing residual processes on the compute nodes of a TORQUE computing cluster
Technical field
The invention belongs to the field of computer technology, specifically the management of high-performance computing clusters, and in particular relates to a method and system for clearing residual processes on the compute nodes of a TORQUE computing cluster.
Background
High-performance computing can be used for numerical analysis and simulation experiments. It is a recognized and important research method and an important means of scientific innovation. Today, high-performance computing capability reflects a country's overall national strength and has a significant influence on national strategy.
There are two ways to increase computing capability. One is to increase the capability of a single unit, for example by raising the CPU clock frequency, using many-core technology, or using GPU computing. The other is to connect multiple computers in parallel over a high-speed network (such as InfiniBand). At present, the construction of a high-performance computing cluster usually takes both factors into account and selects the most cost-effective scheme. Computing software and computation results are all stored in a file storage system to keep the data on every node synchronized. When a job runs, the management node assigns the task to compute nodes for computation, and a corresponding computation process is created on each of those compute nodes. When the computation completes, the computation process on the compute node ends, and the state of the task can no longer be found on the management node.
In actual cluster operation, it is often found that a computation process is running on an individual compute node while its running state cannot be found on the management node. For example, a computation process is running on the s-th compute node, but the management node considers the s-th node to be idle. If a new task arrives, the management node continues to submit tasks to the s-th node, so that multiple computation processes run on the s-th node at the same time, which greatly reduces computational efficiency. Here, a computation process that cannot be found from the management node is called a residual process. Residual processes have many causes: most importantly, user programs that are not written to standard; next, users who do not submit computing tasks by the correct method; and possibly system bugs. In short, residual computation processes seriously affect the execution of normal computing tasks and must be cleaned up. A method and system that can quickly and accurately clear residual processes on compute nodes is urgently needed.
Summary of the invention
The object of the present invention is to overcome the above deficiencies by providing a method and system for clearing residual processes on the compute nodes of a TORQUE computing cluster, so as to eliminate the influence of residual processes on the compute nodes and on the TORQUE computing cluster and to ensure that tasks execute quickly.
To achieve this object, the present invention adopts the following technical solution:
A method for clearing residual processes on the compute nodes of a TORQUE computing cluster, comprising the following steps:
Step 1: obtain all user names from the user name list file and store them in a user list;
Step 2: obtain the user names that currently have tasks running on each compute node and store them in an associative array;
Step 3: create, for each compute node, a file named after the compute node, and store in it the user names that currently have no task executing on that compute node;
Step 4: delete the residual processes on each compute node;
Step 5: automatically update the user name list file.
Preferably, before step 1 the method further comprises:
creating the user name list file for the TORQUE computing cluster.
Preferably, after step 4 the method further comprises:
recording the deletion of residual processes on each compute node.
A system for clearing residual processes on the compute nodes of a TORQUE computing cluster, comprising:
a first acquisition module, for obtaining all user names from the user name list file and storing them in a user list;
a second acquisition module, for obtaining the user names that currently have tasks running on each compute node and storing them in an associative array;
a file creation module, for creating, for each compute node, a file named after the compute node and storing in it the user names that currently have no task executing on that compute node;
a residual process deletion module, for deleting the residual processes on each compute node;
an update module, for automatically updating the user name list file.
Preferably, the system further comprises:
a creation module, for creating the user name list file for the TORQUE computing cluster.
Preferably, the system further comprises:
a logging module, for recording the deletion of residual processes on each compute node.
Compared with the prior art, the present invention has the following beneficial effects:
The present invention obtains all user names from the user name list file and stores them in a user list; obtains the user names that currently have tasks running on each compute node and stores them in an associative array; creates, for each compute node, a file named after the compute node that stores the user names with no task currently executing on that node; and finally deletes the residual processes on each compute node and automatically updates the user name list file. In this way the residual processes on every compute node of the TORQUE computing cluster are deleted, which eliminates their influence on computing tasks, allows computing tasks to complete within the expected time, and promotes standardized use of the cluster.
Brief description of the drawings
Fig. 1 is the first basic flow diagram of the method of the present invention for clearing residual processes on the compute nodes of a TORQUE computing cluster.
Fig. 2 is the second basic flow diagram of the method of the present invention for clearing residual processes on the compute nodes of a TORQUE computing cluster.
Fig. 3 is the first structural diagram of the system of the present invention for clearing residual processes on the compute nodes of a TORQUE computing cluster.
Fig. 4 is the second structural diagram of the system of the present invention for clearing residual processes on the compute nodes of a TORQUE computing cluster.
Detailed description of the embodiments
For ease of understanding, some terms used in the embodiments of the present invention are explained as follows:
TORQUE computing cluster: a high-performance computing cluster run under TORQUE, a common job management system.
Management node: the node that manages and allocates computing resources; a cluster generally has only one.
Compute node: a node used mainly for computation; a cluster generally has many.
Residual process: a computation process that cannot be found from the management node.
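The definition of a residual process can be illustrated with a minimal Python sketch. It assumes, as the method described here does, that a process counts as residual when its owner is a cluster user with no task currently running on that node. The function name `find_residual` and the input format (lines shaped like the output of `ps -eo user=,pid=,comm=`) are assumptions made for illustration and do not come from the patent.

```python
def find_residual(ps_lines, running_users, cluster_users):
    """Return (user, pid, command) for every process owned by a cluster user
    who has no task running on this node.

    ps_lines: lines of the form "<user> <pid> <command>" (assumed format).
    running_users: users with a task on this node according to the scheduler.
    cluster_users: all ordinary cluster users (system accounts are excluded).
    """
    residual = []
    for line in ps_lines:
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        user, pid, command = fields
        if user in cluster_users and user not in running_users:
            residual.append((user, int(pid), command.strip()))
    return residual
```

Restricting the check to `cluster_users` keeps system accounts such as root out of the result, since they never appear in the user name list.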
The present invention is further explained below with reference to the drawings and specific embodiments:
Embodiment one:
As shown in Fig. 1, a method of the present invention for clearing residual processes on the compute nodes of a TORQUE computing cluster comprises the following steps:
Step S101: obtain all user names from the user name list file and store them in a user list;
Step S102: obtain the user names that currently have tasks running on each compute node and store them in an associative array;
Step S103: create, for each compute node, a file named after the compute node, and store in it the user names that currently have no task executing on that compute node;
Step S104: delete the residual processes on each compute node;
Step S105: automatically update the user name list file.
Embodiment two:
As shown in Fig. 2, another method of the present invention for clearing residual processes on the compute nodes of a TORQUE computing cluster comprises the following steps:
Step S201: create the user name list file user.list for the TORQUE computing cluster.
Step S202: obtain all user names from the user name list file user.list and store them in the user list uli;
For example, assume the system has 10 users:
uli = [user1, user2, user3, user4, user5, user6, user7, user8, user9, user10];
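Step S202 can be sketched in Python as follows. The patent does not specify the layout of user.list, so this sketch assumes one user name per line; the function name `load_users` is hypothetical.

```python
from pathlib import Path


def load_users(path):
    """Read the user name list file and return the user list (uli).

    Assumes one user name per line; blank lines and surrounding
    whitespace are ignored.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

For the ten-user example above, `load_users("user.list")` would return the names user1 through user10 in file order.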
Step S203: obtain the user names that currently have tasks running on each compute node and store them in the associative array idic, in the form idic[node name] = [user list];
For example, assume the TORQUE computing cluster has 16 compute nodes, named node1, node2, ..., node16. Taking node1, node2, and node16 as examples:
idic[node1] = [user1, user3, user8];
idic[node2] = [user1, user4];
idic[node16] = [user5, user6].
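One way to build the associative array idic is sketched below in Python. The patent does not say how the running-task information is obtained. This sketch assumes it has already been collected as (user, exec_host) pairs, where exec_host follows the `node/slot+node/slot` form that TORQUE reports for a running job. The function names are hypothetical.

```python
from collections import defaultdict


def hosts_from_exec_host(exec_host):
    """Extract node names from a string such as 'node1/0+node1/1+node2/0'."""
    return {part.split("/")[0] for part in exec_host.split("+") if part}


def build_idic(jobs):
    """Map each node name to the set of users with a task running on it.

    jobs: iterable of (user, exec_host) pairs for running jobs (assumed input).
    """
    idic = defaultdict(set)
    for user, exec_host in jobs:
        for node in hosts_from_exec_host(exec_host):
            idic[node].add(user)
    return dict(idic)
```

A set is used for each node so that a user with several tasks on the same node is recorded only once.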
Step S204: create, for each compute node, a file named after the compute node, and store in it the user names that currently have no task executing on that compute node;
For example:
the node1 file stores: user2, user4, user5, user6, user7, user9, user10;
the node2 file stores: user2, user3, user5, user6, user7, user8, user9, user10;
the node16 file stores: user1, user2, user3, user4, user7, user8, user9, user10.
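The per-node files of step S204 amount to a set difference between the user list and the users running on each node. A minimal Python sketch follows; the function name `write_idle_user_files`, the explicit `nodes` argument, and the one-name-per-line file layout are assumptions for illustration.

```python
from pathlib import Path


def write_idle_user_files(uli, idic, nodes, out_dir):
    """Write one file per node listing users with no task on that node.

    uli: all cluster user names, in order.
    idic: mapping of node name to the users running tasks there.
    nodes: every compute node name, including nodes absent from idic.
    Returns a dict mapping node name to its list of idle users.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    idle_by_node = {}
    for node in nodes:
        busy = set(idic.get(node, ()))
        idle = [user for user in uli if user not in busy]
        # The file is named after the compute node, one user name per line.
        (out_dir / node).write_text("".join(u + "\n" for u in idle), encoding="utf-8")
        idle_by_node[node] = idle
    return idle_by_node
```

The node list is passed separately because a node with no running task does not appear in idic, yet every user is idle on it and its file should list all of them.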
Step S205: according to the user names in each compute node's file, that is, the users that currently have no task executing on that compute node, delete the residual processes on each compute node;
For example:
on node1, delete all processes of user2, user4, user5, user6, user7, user9, and user10;
on node2, delete all processes of user2, user3, user5, user6, user7, user8, user9, and user10;
on node16, delete all processes of user1, user2, user3, user4, user7, user8, user9, and user10.
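The patent does not state which command deletes the processes. The Python sketch below assumes password-less `ssh` from the management node to each compute node and uses `pkill -KILL -u <user>` to end every process owned by a user. Both the approach and the function names are assumptions. The sketch defaults to a dry run so that nothing is killed unless explicitly requested.

```python
import subprocess


def build_kill_command(node, user):
    """Return the argv that kills all processes owned by user on node."""
    return ["ssh", node, "pkill", "-KILL", "-u", user]


def kill_idle_users(node, idle_users, dry_run=True):
    """Kill all processes of each idle user on the given node.

    Returns a dict mapping user to the command's exit status, or to None
    in dry-run mode. pkill exits with 0 when at least one process matched
    and with 1 when none matched.
    """
    results = {}
    for user in idle_users:
        cmd = build_kill_command(node, user)
        if dry_run:
            results[user] = None  # command built but not executed
            continue
        results[user] = subprocess.run(cmd, check=False).returncode
    return results
```

Because every process of a listed user is killed, the user list should contain only ordinary cluster accounts and never system accounts.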
Step S206: record the deletion of residual processes on each compute node by writing it to now.list, until all residual processes have been deleted.
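The record written to now.list could look like the following Python sketch. The line layout (timestamp, node name, comma-separated user names) and the function name `record_cleanup` are assumptions, since the patent only names the file.

```python
from datetime import datetime


def record_cleanup(log_path, node, users, when=None):
    """Append one line describing a cleanup on a node and return that line."""
    when = when or datetime.now()
    line = f"{when:%Y-%m-%d %H:%M:%S} {node} {','.join(users)}\n"
    # Append mode keeps the history of earlier cleanup rounds.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(line)
    return line
```

Appending rather than overwriting lets an administrator see which nodes repeatedly accumulate residual processes.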
Step S207: if the administrator has added a new user and that user is currently executing a task, automatically update the user name list file user.list.
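A minimal Python sketch of the automatic update: any user who appears in idic, and is therefore executing a task, but is missing from user.list gets appended to the file. The function name `update_user_list` and the one-name-per-line layout are assumptions.

```python
from pathlib import Path


def update_user_list(path, idic):
    """Append users who are running tasks but are missing from the list file.

    Returns the newly added user names in sorted order.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    known = [line.strip() for line in text.splitlines() if line.strip()]
    running = {user for users in idic.values() for user in users}
    new_users = sorted(running - set(known))
    if new_users:
        path.write_text("".join(u + "\n" for u in known + new_users), encoding="utf-8")
    return new_users
```

A newly added user is only detected once they have a running task, which matches the condition stated in step S207.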
Embodiment three:
As shown in Fig. 3, a system of the present invention for clearing residual processes on the compute nodes of a TORQUE computing cluster comprises:
a first acquisition module 301, for obtaining all user names from the user name list file and storing them in a user list;
a second acquisition module 302, for obtaining the user names that currently have tasks running on each compute node and storing them in an associative array;
a file creation module 303, for creating, for each compute node, a file named after the compute node and storing in it the user names that currently have no task executing on that compute node;
a residual process deletion module 304, for deleting the residual processes on each compute node;
an update module 305, for automatically updating the user name list file.
Embodiment four:
As shown in Fig. 4, another system of the present invention for clearing residual processes on the compute nodes of a TORQUE computing cluster comprises:
a creation module 401, for creating the user name list file for the TORQUE computing cluster;
a first acquisition module 402, for obtaining all user names from the user name list file and storing them in a user list;
a second acquisition module 403, for obtaining the user names that currently have tasks running on each compute node and storing them in an associative array;
a file creation module 404, for creating, for each compute node, a file named after the compute node and storing in it the user names that currently have no task executing on that compute node;
a residual process deletion module 405, for deleting the residual processes on each compute node;
a logging module 406, for recording the deletion of residual processes on each compute node;
an update module 407, for automatically updating the user name list file.
The above describes only preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A method for clearing residual processes on the compute nodes of a TORQUE computing cluster, characterized by comprising the following steps:
Step 1: obtain all user names from the user name list file and store them in a user list;
Step 2: obtain the user names that currently have tasks running on each compute node and store them in an associative array;
Step 3: create, for each compute node, a file named after the compute node, and store in it the user names that currently have no task executing on that compute node;
Step 4: delete the residual processes on each compute node;
Step 5: automatically update the user name list file.
2. The method for clearing residual processes on the compute nodes of a TORQUE computing cluster according to claim 1, characterized in that, before step 1, the method further comprises:
creating the user name list file for the TORQUE computing cluster.
3. The method for clearing residual processes on the compute nodes of a TORQUE computing cluster according to claim 1, characterized in that, after step 4, the method further comprises:
recording the deletion of residual processes on each compute node.
4. A system for clearing residual processes on the compute nodes of a TORQUE computing cluster, based on the method for clearing residual processes on the compute nodes of a TORQUE computing cluster according to any one of claims 1-3, characterized by comprising:
a first acquisition module, for obtaining all user names from the user name list file and storing them in a user list;
a second acquisition module, for obtaining the user names that currently have tasks running on each compute node and storing them in an associative array;
a file creation module, for creating, for each compute node, a file named after the compute node and storing in it the user names that currently have no task executing on that compute node;
a residual process deletion module, for deleting the residual processes on each compute node;
an update module, for automatically updating the user name list file.
5. The system for clearing residual processes on the compute nodes of a TORQUE computing cluster according to claim 4, characterized by further comprising:
a creation module, for creating the user name list file for the TORQUE computing cluster.
6. The system for clearing residual processes on the compute nodes of a TORQUE computing cluster according to claim 4, characterized by further comprising:
a logging module, for recording the deletion of residual processes on each compute node.
CN201711137327.3A 2017-11-16 2017-11-16 Method and system for clearing residual process in TORQUE computing cluster computing node Active CN107885549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711137327.3A CN107885549B (en) 2017-11-16 2017-11-16 Method and system for clearing residual process in TORQUE computing cluster computing node

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711137327.3A CN107885549B (en) 2017-11-16 2017-11-16 Method and system for clearing residual process in TORQUE computing cluster computing node

Publications (2)

Publication Number Publication Date
CN107885549A true CN107885549A (en) 2018-04-06
CN107885549B CN107885549B (en) 2020-10-23

Family

ID=61777060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711137327.3A Active CN107885549B (en) 2017-11-16 2017-11-16 Method and system for clearing residual process in TORQUE computing cluster computing node

Country Status (1)

Country Link
CN (1) CN107885549B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463739A (en) * 2019-09-09 2021-03-09 山东省计算中心(国家超级计算济南中心) Data processing method and system based on ocean mode ROMS

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1391386A (en) * 2001-06-12 2003-01-15 华为技术有限公司 Method for protecting task process in multitask operating system
US20150222506A1 (en) * 2014-01-31 2015-08-06 Apollo Education Group, Inc. Mechanism for controlling a process on a computing node based on the participation status of the computing node
CN105022666A (en) * 2014-04-24 2015-11-04 中国电信股份有限公司 Method, device and system for controlling MapReduce task scheduling
CN106681753A (en) * 2016-11-01 2017-05-17 腾讯科技(深圳)有限公司 Software residual process processing method and device
CN107066879A (en) * 2017-03-22 2017-08-18 山东中创软件商用中间件股份有限公司 A kind of method and system hidden for computer application program process

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
朱军等: "节点删除法的虚拟网络映射算法", 《安徽大学学报(自然科学版)》 *

Also Published As

Publication number Publication date
CN107885549B (en) 2020-10-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant