CN105389201A - Process management method and system thereof based on high-performance computing cluster - Google Patents

Publication number
CN105389201A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410446186.3A
Other languages
Chinese (zh)
Other versions
CN105389201B (en)
Inventor
葛鑫
冯佳丽
李进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Petroleum and Chemical Corp
Sinopec Geophysical Research Institute
Original Assignee
China Petroleum and Chemical Corp
Sinopec Geophysical Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Petroleum and Chemical Corp, Sinopec Geophysical Research Institute
Priority to CN201410446186.3A
Publication of CN105389201A
Application granted
Publication of CN105389201B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a process management method and system based on a high-performance computing cluster. The method comprises the following steps: S1, collecting the processes of all nodes of the high-performance computing cluster and the corresponding process information; S2, using an automatic comparison method or a search-and-screen method to determine whether a process is a garbage process; and S3, terminating the garbage processes. In step S2, when the automatic comparison method is used, the difference between the time the process has resided in the system and its CPU occupancy time is first obtained from the process information and compared with a preset value. If the difference exceeds the preset value, the process is determined to be a garbage process. With this technical scheme, the user processes of a system can be collected, analyzed, and judged by the automatic comparison method or the search-and-screen method, and garbage processes are cleared out. As a result, computing resources are released and system load is reduced.

Description

Process management method and system based on a high-performance computing cluster
Technical field
The present invention relates to high-performance computing clusters, and in particular to a process management method and system based on a high-performance computing cluster.
Background
A cluster is a group of computers that together present a set of network resources to users as a single whole. The individual computer systems are the nodes of the cluster. In an ideal cluster, users are never aware of the underlying nodes: to them the cluster is one system, not many computer systems. A process is one concrete execution of a program. A program can be executed many times, and each execution is loaded into its own independent memory space, so a single program can give rise to multiple processes.
A high-performance computing (HPC) cluster is a system of many computers connected by a network. It is used mainly for parallel computing, and the Linux operating system is installed on every computer. Improper use by users and computer faults produce large numbers of garbage processes. System administrators often have to spend a great deal of time analyzing the processes in the cluster, identifying the garbage processes, and removing them, so that these processes do not lower the utilization efficiency of the system.
Summary of the invention
The features and advantages of the present invention are set out in part in the following description, or will be apparent from that description, or may be learned by practicing the invention.
To overcome the problems of the prior art, the invention provides a process management method and system based on a high-performance computing cluster that use an automatic comparison method or a search-and-screen method to determine whether a process is a garbage process, so that processes are managed effectively and the utilization of cluster nodes is improved.
The technical scheme adopted by the present invention to solve the above technical problem is as follows:
According to one aspect of the present invention, a process management method based on a high-performance computing cluster is provided, characterized by comprising: S1, collecting the processes of all nodes in the high-performance computing cluster and the corresponding process information; S2, using an automatic comparison method or a search-and-screen method to determine whether a process is a garbage process; S3, terminating the garbage process. In step S2, when the automatic comparison method is used, the residence time of the process (the time it has existed in the system) and its CPU occupancy time are first obtained from the process information; the difference between the residence time and the CPU occupancy time is then calculated and compared with a preset value; if the difference is greater than the preset value, the process is determined to be a garbage process.
Preferably, in step S2, the preset value is set to different values for different types of process.
Preferably, steps S1, S2, and S3 are implemented by entering instructions in a shell script.
Preferably, in step S2, the processes judged by the automatic comparison method or the search-and-screen method are user processes.
Preferably, in step S2, when the search-and-screen method is used, the user processes in which an abnormal interruption has occurred are first found among all user processes; those among them that are still occupying node resources are then screened out and judged to be garbage processes.
Preferably, in step S2, when the search-and-screen method is used, whether a process is a garbage process is judged from any one or more of the user information of the process, its residence time, the amount of node CPU it occupies, and the amount of node memory it occupies, in combination with whether the process is still running.
According to another aspect of the present invention, a process management system based on a high-performance computing cluster is provided, characterized by comprising: a collection unit for collecting the processes of all nodes in the high-performance computing cluster and the corresponding process information; a judging unit, connected to the collection unit and comprising an automatic comparison module and a search-and-screen module, for determining whether a process is a garbage process; and a termination unit for terminating the garbage process. In the judging unit, the automatic comparison module comprises: a calculation submodule for calculating, from the process information, the difference between the residence time of the process and its CPU occupancy time; and a comparison submodule for comparing the difference with a preset value, the process being a garbage process if the difference is greater than the preset value.
Preferably, the preset value takes different values for different types of process.
Preferably, the search-and-screen module is used to find, among all processes, those in which an abnormal interruption has occurred, and to screen out from them the processes that are still occupying node resources, which are judged to be garbage processes.
Preferably, the search-and-screen module is used to judge whether a process is a garbage process from any one or more of the user information of the process, its residence time, the amount of node CPU it occupies, and the amount of node memory it occupies, in combination with whether the process is still running.
The invention provides a process management method and system based on a high-performance computing cluster. A value computed from the total time a process has occupied the CPU is compared with a preset value, and garbage processes are terminated automatically according to the result of the comparison. Garbage processes can also be terminated through the search-and-screen method, which gives the system administrator room to step in, confirm and remove users' garbage processes, and release computing resources.
By reading the specification, those of ordinary skill in the art will better understand the features and content of these technical schemes.
Brief description of the drawings
The present invention is described in detail below with reference to the accompanying drawings and examples, from which its advantages and implementation will become more apparent. The content of the drawings serves only to explain the invention and does not limit it in any sense. In the drawings:
Fig. 1 is a schematic flow chart of the process management method based on a high-performance computing cluster according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of process management implemented by a shell script according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of the process management system based on a high-performance computing cluster according to an embodiment of the present invention.
Detailed description
As shown in Fig. 1, the invention provides a process management method based on a high-performance computing cluster, comprising: S1, collecting the processes of all nodes in the high-performance computing cluster and the corresponding process information; S2, using an automatic comparison method or a search-and-screen method to determine whether a process is a garbage process; S3, terminating the garbage process. In step S2, when the automatic comparison method is used, the residence time and the CPU occupancy time of the process are first obtained from the process information; the difference between the two is then calculated and compared with a preset value; if the difference is greater than the preset value, the process is determined to be a garbage process. Here the residence time is the time the process has existed in the system, where the system is the operating system: that is, the time elapsed from the creation of the process in the operating system up to the present.
The preset value can be set to different values for different types of process; in this embodiment it is set to 7 days.
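The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the `ProcessInfo` record, the `kind` label, and the per-type threshold table are assumptions made for the example; only the difference test and the 7-day value come from the text.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60

# The embodiment states a preset value of 7 days.
DEFAULT_THRESHOLD = 7 * SECONDS_PER_DAY
# Hypothetical per-type overrides; the text only says values may differ by type.
THRESHOLDS_BY_TYPE = {"interactive": 1 * SECONDS_PER_DAY}


@dataclass
class ProcessInfo:
    pid: int
    user: str
    residence_seconds: int  # time since the process was created
    cpu_seconds: int        # CPU time the process has consumed
    kind: str = "batch"     # assumed process-type label


def is_garbage(proc: ProcessInfo) -> bool:
    """Flag a process whose (residence time - CPU time) exceeds its preset value."""
    threshold = THRESHOLDS_BY_TYPE.get(proc.kind, DEFAULT_THRESHOLD)
    difference = proc.residence_seconds - proc.cpu_seconds
    return difference > threshold


# A process that has computed for 9 of its 10 days, and one that has barely run.
busy = ProcessInfo(1201, "alice", 10 * SECONDS_PER_DAY, 9 * SECONDS_PER_DAY)
idle = ProcessInfo(1202, "bob", 10 * SECONDS_PER_DAY, 3600)
print(is_garbage(busy), is_garbage(idle))  # prints: False True
```

The test is a strict "greater than", so a process whose difference equals the preset value exactly is not flagged.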
The automatic comparison method lets the system handle processes automatically according to the given algorithm, with no intervention by the system administrator. Note that processes fall roughly into system processes and user processes, and the processes judged in step S2 by the automatic comparison method or the search-and-screen method are user processes. In other words, a garbage process is always a user process and never a system process. Therefore, before the automatic comparison method is applied, the user processes are filtered out of all processes, automatically or manually. Alternatively, step S1 can be configured to collect only the user processes of all nodes in the cluster and the corresponding user process information.
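One possible way to perform this filtering automatically is sketched below. The text does not say how user processes are recognized, so the rule used here is an assumption: accounts with a numeric user ID of 1000 or more are treated as regular users, a common Linux convention that a given cluster may not follow. The field names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: regular user accounts start at UID 1000 on this system.
UID_MIN = 1000

Process = Dict[str, object]


def split_processes(processes: List[Process]) -> Tuple[List[Process], List[Process]]:
    """Separate user processes from system processes by the owner's UID."""
    user_procs, system_procs = [], []
    for proc in processes:
        if proc["uid"] >= UID_MIN:
            user_procs.append(proc)
        else:
            system_procs.append(proc)
    return user_procs, system_procs


SAMPLE = [
    {"pid": 1, "uid": 0, "command": "init"},
    {"pid": 812, "uid": 101, "command": "sshd"},
    {"pid": 4021, "uid": 1003, "command": "solver"},
]

user_procs, system_procs = split_processes(SAMPLE)
print([p["pid"] for p in user_procs])  # prints: [4021]
```

Only the first list would be handed to the garbage-process checks, so a system process can never be terminated by them.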
In step S2, when the search-and-screen method is used, the user processes in which an abnormal interruption has occurred are first found among all user processes; those among them that are still occupying the node's resources are then screened out, judged to be garbage processes, and terminated in step S3. A user process in which an abnormal interruption has occurred is one whose corresponding program has been interrupted abnormally.
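The two-stage screening can be illustrated as follows. This is a sketch under stated assumptions: the text does not specify how an abnormal interruption is detected, so the `interrupted` flag is a hypothetical input supplied by whatever collects the process information, and "still occupying resources" is interpreted here as non-zero CPU or memory use.

```python
from typing import List, NamedTuple


class UserProcess(NamedTuple):
    pid: int
    user: str
    interrupted: bool   # hypothetical flag: the program was interrupted abnormally
    cpu_percent: float  # share of node CPU currently used
    rss_kb: int         # resident memory currently held, in KiB


def screen_interrupted(processes: List[UserProcess]) -> List[UserProcess]:
    """Stage 1: keep interrupted processes. Stage 2: keep those still holding resources."""
    interrupted = [p for p in processes if p.interrupted]
    return [p for p in interrupted if p.cpu_percent > 0 or p.rss_kb > 0]


candidates = [
    UserProcess(5001, "alice", True, 0.0, 204800),  # interrupted, still holds memory
    UserProcess(5002, "alice", True, 0.0, 0),       # interrupted, holds nothing
    UserProcess(5003, "bob", False, 98.5, 512000),  # healthy and busy
]

print([p.pid for p in screen_interrupted(candidates)])  # prints: [5001]
```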
When the search-and-screen method is used, whether a process is a garbage process can also be judged from any one or more of the user information of the process, its residence time, the amount of node CPU it occupies, and the amount of node memory it occupies, in combination with whether the process is still running. For example, the user information can be used to check whether the user submitted a job by mistake; if a process created by a mistakenly submitted job is still running, it can be concluded to be a garbage process. Alternatively, the processes that occupy the most CPU or the most memory on a node can be selected first, and their running state then examined to decide whether each is a garbage process.
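Both examples lend themselves to a short sketch. The helper names, the record fields, and the idea of passing in a set of users known to have submitted jobs by mistake are all assumptions made for illustration; the text leaves these details to the administrator.

```python
import heapq
from typing import Dict, List, Set

Process = Dict[str, object]


def from_mistaken_jobs(processes: List[Process], mistaken_users: Set[str]) -> List[Process]:
    """Processes still running that belong to users who submitted a job by mistake."""
    return [p for p in processes if p["user"] in mistaken_users and p["running"]]


def top_consumers(processes: List[Process], key: str, count: int) -> List[Process]:
    """The `count` processes with the largest value of `key`, for manual review."""
    return heapq.nlargest(count, processes, key=lambda p: p[key])


procs = [
    {"pid": 11, "user": "carol", "running": True, "cpu_percent": 99.0, "rss_kb": 1000},
    {"pid": 12, "user": "carol", "running": False, "cpu_percent": 0.0, "rss_kb": 0},
    {"pid": 13, "user": "dave", "running": True, "cpu_percent": 5.0, "rss_kb": 900000},
]

print([p["pid"] for p in from_mistaken_jobs(procs, {"carol"})])  # prints: [11]
print([p["pid"] for p in top_consumers(procs, "rss_kb", 1)])     # prints: [13]
```

The second helper only produces a shortlist. As in the text, a person or a further check still has to look at how those processes are running before any of them is declared a garbage process.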
The search-and-screen method serves the needs of the system administrator well: when a process goes wrong, it can be terminated at once, without waiting for it to reach its preset value. In a concrete implementation, the automatic comparison method and the search-and-screen method can be used at the same time.
Referring to Fig. 2, in this embodiment steps S1, S2, and S3 can be implemented by entering instructions in a shell script. A shell is a program with a specific function: it is the interface between the user and the kernel of a UNIX/Linux operating system.
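The embodiment uses a shell script; the sketch below uses Python instead, purely to illustrate the collection step. It assumes the process information is obtained with the standard `ps` utility, for example `ps -eo pid,user,etime,time,args`, whose `etime` and `time` columns report elapsed time and CPU time in the form `[[dd-]hh:]mm:ss`. The sample rows are made up, and running the command on every node of the cluster (for instance over a remote shell) is left out.

```python
from typing import Any, Dict, List

# Sample text shaped like the output of `ps -eo pid,user,etime,time,args`.
SAMPLE_PS_OUTPUT = """\
  PID USER         ELAPSED     TIME COMMAND
 4021 alice    10-02:03:04 00:00:12 ./solver input.dat
 4022 bob         01:02:03 01:00:00 mpirun -np 4 ./model
"""


def parse_ps_time(text: str) -> int:
    """Convert the ps time format [[dd-]hh:]mm:ss to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [int(field) for field in text.split(":")]
    while len(parts) < 3:  # pad a missing hours field with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_ps_output(output: str) -> List[Dict[str, Any]]:
    """Turn ps output (header line included) into one record per process."""
    records = []
    for line in output.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        pid, user, elapsed, cpu, command = line.split(None, 4)
        records.append({
            "pid": int(pid),
            "user": user,
            "residence_seconds": parse_ps_time(elapsed),
            "cpu_seconds": parse_ps_time(cpu),
            "command": command.strip(),
        })
    return records


for record in parse_ps_output(SAMPLE_PS_OUTPUT):
    difference = record["residence_seconds"] - record["cpu_seconds"]
    print(record["pid"], record["user"], difference)
```

Each record then carries the two times needed for the comparison with the preset value.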
As shown in Fig. 3, the present invention also provides a process management system based on a high-performance computing cluster, comprising: a collection unit 10 for collecting the processes of all nodes in the high-performance computing cluster and the corresponding process information; a judging unit 20, connected to the collection unit 10 and comprising an automatic comparison module 21 and a search-and-screen module 22, for determining whether a process is a garbage process; and a termination unit 30 for terminating the garbage process. Although not shown in the figure, the automatic comparison module 21 in the judging unit 20 comprises: a calculation submodule for calculating, from the process information, the difference between the residence time of the process and its CPU occupancy time, both of which are contained in the process information; and a comparison submodule for comparing the difference with a preset value, the process being a garbage process if the difference is greater than the preset value. The preset value can take different values for different types of process.
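The three units can be pictured as a small pipeline. The sketch below is one hypothetical arrangement in Python, not the shell-based system of the embodiment: the function names and record fields are assumptions, the screening rule is simplified, and termination is delegated to an injectable callable so the example can run without signalling real processes.

```python
import os
import signal
from typing import Any, Callable, Dict, Iterable, List

Process = Dict[str, Any]
SEVEN_DAYS = 7 * 24 * 60 * 60  # preset value used in the embodiment


def automatic_comparison(proc: Process) -> bool:
    """Module 21: residence time minus CPU time exceeds the preset value."""
    return proc["residence_seconds"] - proc["cpu_seconds"] > SEVEN_DAYS


def search_and_screen(proc: Process) -> bool:
    """Module 22 (simplified): interrupted abnormally but still holding memory."""
    return bool(proc.get("interrupted")) and proc.get("rss_kb", 0) > 0


def default_kill(pid: int) -> None:
    """Termination unit: ask the process to exit."""
    os.kill(pid, signal.SIGTERM)


def manage(collect: Callable[[], Iterable[Process]],
           judges: List[Callable[[Process], bool]],
           kill: Callable[[int], None] = default_kill) -> List[int]:
    """Collect, judge with every module, and terminate what any module flags."""
    terminated = []
    for proc in collect():
        if any(judge(proc) for judge in judges):
            kill(proc["pid"])
            terminated.append(proc["pid"])
    return terminated


def fake_collect() -> List[Process]:
    """Stand-in for the collection unit, returning made-up records."""
    return [
        {"pid": 1, "residence_seconds": 8 * 86400, "cpu_seconds": 60},
        {"pid": 2, "residence_seconds": 3600, "cpu_seconds": 3000,
         "interrupted": True, "rss_kb": 2048},
        {"pid": 3, "residence_seconds": 3600, "cpu_seconds": 3500},
    ]


killed: List[int] = []  # records the PIDs instead of sending real signals
result = manage(fake_collect, [automatic_comparison, search_and_screen], kill=killed.append)
print(result)  # prints: [1, 2]
```

Passing both judges in one list reflects the statement that the two modules can work at the same time: a process is terminated as soon as either one flags it.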
The search-and-screen module 22 is used to find, among all processes, those in which an abnormal interruption has occurred, and to screen out from them the processes that are still occupying the node's resources, which are judged to be garbage processes. The processes referred to here are user processes, and a process in which an abnormal interruption has occurred is one whose corresponding program has been interrupted abnormally. In practice, the search-and-screen module 22 can also be used to pick out the user processes from all processes. Alternatively, the collection unit 10 can be configured to collect only the user processes of all nodes in the cluster and the corresponding user process information.
The search-and-screen module 22 can also be used to judge whether a process is a garbage process from any one or more of the user information of the process, its residence time, the amount of node CPU it occupies, and the amount of node memory it occupies, in combination with whether the process is still running. For example, the user information can be used to check whether the user submitted a job by mistake; if a process created by a mistakenly submitted job is still running, it can be concluded to be a garbage process. Alternatively, the processes that occupy the most CPU or the most memory on a node can be selected first, and their running state then examined to decide whether each is a garbage process.
In this embodiment, the collection unit 10, the judging unit 20, and the termination unit 30 are all implemented with shell scripts. The automatic comparison module 21 and the search-and-screen module 22 in the judging unit 20 can work at the same time.
The invention provides a process management method and system based on a high-performance computing cluster. The user processes in the system can be collected and analyzed by the automatic comparison method and the search-and-screen method, and garbage processes can be confirmed and removed, thereby releasing computing resources and reducing system load.
The preferred embodiments of the present invention have been described above with reference to the drawings. Those skilled in the art can implement the invention in many variant forms without departing from its scope and spirit. For example, a feature illustrated or described as part of one embodiment can be used in another embodiment to yield a further embodiment. The above are only preferred feasible embodiments of the present invention and do not limit its scope of rights; all equivalent changes made using the specification and drawings of the present invention fall within that scope.

Claims (10)

1. A process management method based on a high-performance computing cluster, characterized by comprising:
S1, collecting the processes of all nodes in the high-performance computing cluster and the corresponding process information;
S2, using an automatic comparison method or a search-and-screen method to determine whether said process is a garbage process;
S3, terminating said garbage process;
wherein, in said step S2, when said automatic comparison method is used to determine whether said process is a garbage process, the residence time and the CPU occupancy time of said process are first obtained from said process information, the difference between the residence time and the CPU occupancy time of said process is then calculated, and said difference is compared with a preset value; if said difference is greater than said preset value, said process is determined to be a garbage process.
2. The process management method based on a high-performance computing cluster according to claim 1, characterized in that, in said step S2, said preset value is set to different values for different types of said process.
3. The process management method based on a high-performance computing cluster according to claim 1, characterized in that said steps S1, S2, and S3 are implemented by entering instructions in a shell script.
4. The process management method based on a high-performance computing cluster according to claim 1, characterized in that, in said step S2, the processes judged by the automatic comparison method or the search-and-screen method are user processes.
5. The process management method based on a high-performance computing cluster according to claim 1 or 4, characterized in that, in said step S2, when said search-and-screen method is used to determine whether said process is a garbage process, the user processes in which an abnormal interruption has occurred are first found among all said user processes, and those among them that are still occupying resources of said node are then screened out and judged to be garbage processes.
6. The process management method based on a high-performance computing cluster according to claim 1, characterized in that, in said step S2, when said search-and-screen method is used to determine whether said process is a garbage process, the determination is made from any one or more of the user information of said process, its residence time, the amount of CPU of said node it occupies, and the amount of memory of said node it occupies, in combination with whether said process is still running.
7. A process management system based on a high-performance computing cluster, characterized by comprising:
a collection unit for collecting the processes of all nodes in the high-performance computing cluster and the corresponding process information;
a judging unit, connected to said collection unit and comprising an automatic comparison module and a search-and-screen module, for determining whether said process is a garbage process;
a termination unit for terminating said garbage process;
wherein, in said judging unit, said automatic comparison module comprises: a calculation submodule for calculating, from said process information, the difference between the residence time of said process and its CPU occupancy time; and a comparison submodule for comparing said difference with a preset value, said process being a garbage process if said difference is greater than said preset value.
8. The process management system based on a high-performance computing cluster according to claim 7, characterized in that said preset value takes different values for different types of said process.
9. The process management system based on a high-performance computing cluster according to claim 7, characterized in that said search-and-screen module is used to find, among all said processes, those in which an abnormal interruption has occurred, and to screen out from them the processes that are still occupying resources of said node, which are judged to be garbage processes.
10. The process management system based on a high-performance computing cluster according to claim 7, characterized in that said search-and-screen module is used to determine whether said process is a garbage process from any one or more of the user information of said process, its residence time, the amount of CPU of said node it occupies, and the amount of memory of said node it occupies, in combination with whether said process is still running.
CN201410446186.3A 2014-09-03 2014-09-03 Process management method and system thereof based on high-performance computing cluster Active CN105389201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410446186.3A CN105389201B (en) 2014-09-03 2014-09-03 Process management method and system thereof based on high-performance computing cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410446186.3A CN105389201B (en) 2014-09-03 2014-09-03 Process management method and system thereof based on high-performance computing cluster

Publications (2)

Publication Number Publication Date
CN105389201A (en) 2016-03-09
CN105389201B (en) 2018-11-13

Family

ID=55421508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410446186.3A Active CN105389201B (en) 2014-09-03 2014-09-03 Process management method and system thereof based on high-performance computing cluster

Country Status (1)

Country Link
CN (1) CN105389201B (en)

Also Published As

Publication number Publication date
CN105389201B (en) 2018-11-13


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant