CN106844021A - Computing environment resource management system and management method thereof - Google Patents
Computing environment resource management system and management method thereof
- Publication number: CN106844021A
- Application number: CN201611111871.6A
- Authority: CN (China)
- Prior art keywords: unit, task, information, management, resource
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/48—Indexing scheme relating to G06F9/48
- G06F2209/484—Precedence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5017—Task decomposition
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer And Data Communications (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention provides a computing environment resource management system and a management method thereof. The system comprises a first statistical unit and a second statistical unit that are connected to each other. The first statistical unit comprises a first communication unit, a first task management unit, a first task information statistical unit and a first operating system; the first communication unit, the first task information statistical unit and the first operating system are all connected to the first task management unit. The second statistical unit comprises a second task management unit, a second communication unit, a state information integration unit, a slave system information receiving unit, a second task information statistical unit, a system state statistical unit and a second operating system; the second task management unit, the second communication unit, the slave system information receiving unit, the second task information statistical unit and the system state statistical unit are all connected to the state information integration unit, and the second task management unit, the second task information statistical unit and the system state statistical unit are all connected to the second operating system. The invention integrates the resources of the master system and the slave systems, treats the whole CPU platform consisting of one master system and three slave systems as a single resource scheduling unit, reflects the true system state, and improves resource management efficiency.
Description
Technical field
The present invention relates to a management system and a management method thereof, and in particular to a computing environment resource management system and a management method thereof.
Background
SLURM is a highly available, scalable, fault-tolerant cluster resource manager and task scheduling system that can be used on large cluster systems. It has three main functions. First, it dynamically allocates cluster resources to tasks. Second, it provides a complete framework for starting, executing and monitoring tasks. Finally, it manages the task queue and thereby arbitrates contention for resources. The system mainly consists of one management daemon and multiple agent daemons. The management daemon runs on the management node; it receives cluster state monitoring data, schedules and allocates resources, distributes tasks and collects results. The agent daemons run on the compute nodes; they wait for tasks, execute them and return task status, while also collecting and recording information such as cluster state and task status and reporting it to the management node. Together, the two kinds of daemon implement the management functions of the cluster.
The Shenwei (Sunway) platform is a domestic CPU platform developed by the Jiangnan Institute of Computing Technology. It has 16 cores divided into four core groups: one master core group and three slave core groups. Each core group has its own system installed; the master core group runs the master system and the slave core groups run slave systems. A slave system depends on the master system and must go through the master system to obtain system resources and access the underlying hardware devices.
As a result, the agent process in the master system can fully monitor the system state of all four core groups and truly reflect their resource situation. The agent process in a slave system, by contrast, cannot truly reflect the resource consumption of its own system; it can only monitor the running state of tasks in the slave system and perform operations such as task distribution, monitoring and collection.
Consequently, if SLURM is deployed with its original architecture, the agent processes in the slave systems obtain only incorrect information and cannot reflect the true condition of the compute node. The management node then cannot monitor the correct state of the cluster and misjudges resource consumption, and in the end the cluster cannot run normally.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a computing environment resource management system and a management method thereof that integrate master and slave system resources and treat the whole CPU platform, consisting of one master system and three slave systems, as a single resource scheduling unit, thereby reflecting the true system state and improving resource management efficiency.
According to one aspect of the present invention, a computing environment resource management system is provided, characterized in that it comprises a first statistical unit and a second statistical unit connected to each other. The first statistical unit comprises a first communication unit, a first task management unit, a first task information statistical unit and a first operating system; the first communication unit, the first task information statistical unit and the first operating system are all connected to the first task management unit. The second statistical unit comprises a second task management unit, a second communication unit, a state information integration unit, a slave system information receiving unit, a second task information statistical unit, a system state statistical unit and a second operating system; the second task management unit, the second communication unit, the slave system information receiving unit, the second task information statistical unit and the system state statistical unit are all connected to the state information integration unit, and the second task management unit, the second task information statistical unit and the system state statistical unit are all connected to the second operating system.
Preferably, the computing environment resource management system distinguishes between the master system and the slave systems and runs a different agent daemon on each.
Preferably, the functions of the agent daemon in the master system are modified and extended.
The present invention also provides a computing environment resource management method, characterized in that it comprises a task distribution flow and a state information reporting flow.
The task distribution flow is as follows. The management daemon receives the computing tasks submitted by the system administrator. According to administrator-specified parameters such as task priority, resources to be occupied and running time, and according to the resource scheduling policy, it divides the tasks appropriately and assigns them to the master system of a compute node in an appropriate partition. The state information integration unit obtains, from the slave system information receiving unit, the second task information statistical unit and the system state statistical unit respectively, the task running state information of the slave systems, the task running state information of the master system, and the system state and resource consumption information of the master system, and integrates this information into the complete state information of the one master system and three slave systems. The second task management unit obtains the assigned tasks through the second communication unit and the complete state information from the state information integration unit, then decomposes the tasks again according to the scheduling rules: some tasks are executed on the master system itself, and the rest are issued to the slave systems through the second communication unit. The second task management unit obtains the assigned tasks through the second communication unit, obtains the task running state from the second task information statistical unit, and starts a task when the resources meet its requirements.
The state information reporting flow is as follows. The first task information statistical unit periodically collects the state information of the tasks running in its slave system and reports it to the master system through the first communication unit. The slave system information receiving unit receives the task information reported by the three slave systems. The second task information statistical unit monitors and collects statistics on the tasks in the master system. The system state statistical unit monitors information such as the operating condition and resource consumption of the one master system and three slave systems. The state information integration unit integrates these three kinds of information into the complete state information of the one master system and three slave systems, and reports the complete state information to the management daemon through the second communication unit.
Compared with the prior art, the present invention has the following beneficial effects. It reduces the number of nodes that must be managed in the cluster to a quarter of the original number, which both simplifies the cluster topology and reduces the communication traffic required for cluster management. In addition, part of the functionality of the management daemon is transferred to the agent daemon of the master system, which reduces the load on the management node and improves the stability of the cluster system.
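As a rough illustration of the node-count claim above, the following sketch compares how many scheduling units the management daemon would have to track with and without integration. The function name, its parameters and the example figure of 256 platforms are hypothetical and do not come from the patent; only the layout of one master system plus three slave systems per platform does.

```python
SYSTEMS_PER_PLATFORM = 4  # one master system + three slave systems (from the text)


def managed_node_count(platforms: int, integrated: bool) -> int:
    """Number of nodes the management daemon must track.

    Hypothetical helper: without integration every system on a platform
    registers as its own node; with integration each platform is one node.
    """
    if platforms < 0:
        raise ValueError("platforms must be non-negative")
    return platforms if integrated else platforms * SYSTEMS_PER_PLATFORM


if __name__ == "__main__":
    before = managed_node_count(256, integrated=False)
    after = managed_node_count(256, integrated=True)
    # The ratio is 1 / SYSTEMS_PER_PLATFORM regardless of cluster size.
    print(before, after, after / before)
```

The ratio depends only on the number of systems per platform, which is why the text can state the reduction as a fixed quarter.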
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments, read with reference to the following drawing:
Fig. 1 is a schematic block diagram of the computing environment resource management system of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit it in any way. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention.
As shown in Fig. 1, the computing environment resource management system of the present invention comprises a first statistical unit and a second statistical unit connected to each other. The first statistical unit comprises a first communication unit, a first task management unit, a first task information statistical unit and a first operating system; the first communication unit, the first task information statistical unit and the first operating system are all connected to the first task management unit. The second statistical unit comprises a second task management unit, a second communication unit, a state information integration unit, a slave system information receiving unit, a second task information statistical unit, a system state statistical unit and a second operating system; the second task management unit, the second communication unit, the slave system information receiving unit, the second task information statistical unit and the system state statistical unit are all connected to the state information integration unit, and the second task management unit, the second task information statistical unit and the system state statistical unit are all connected to the second operating system.
The technical solution of the present invention is described in detail below with reference to the accompanying drawing. Standard SLURM comprises one management daemon (or several, acting as hot standbys for one another, with only one in effect at any time) and multiple agent daemons. The management daemon runs on the management node; it receives cluster state monitoring data, schedules and allocates resources, distributes tasks and collects results. The agent daemons run on the compute nodes; they wait for tasks, execute them and return task status, while also collecting and recording information such as cluster state and task status and reporting it to the management node.
However, the 16 cores of the Shenwei platform are divided into one master core group and three slave core groups, and each core group has its own system installed. A slave system depends on the master system and must go through the master system to obtain system resources and access the underlying hardware devices, so the agent process in a slave system cannot truly reflect the resource consumption of that system. Consequently, if SLURM is deployed with its original architecture, the agent processes in the slave systems obtain only incorrect information and cannot reflect the true condition of the compute node. The management node then cannot monitor the correct state of the cluster and misjudges resource consumption, and in the end the cluster cannot run normally.
To solve this problem, the present invention provides, based on the SLURM software, a computing environment resource management system for the Shenwei platform that distinguishes between the master system and the slave systems and runs a different agent daemon on each. Starting from the SLURM agent process, the agent process in a slave system is cut down: functions such as system state monitoring are removed and only the task management functions are retained. Also starting from the SLURM agent process, the agent daemon in the master system has functions modified and added, including master/slave state information integration, task redistribution, and task priority management.
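The split between the two agent daemons can be pictured as two feature sets derived from a common baseline. The sketch below is only an illustration: the `AgentFeature` names and the assumption that the baseline agent consists of exactly task management plus system state monitoring are hypothetical simplifications, not part of SLURM or of the patent.

```python
from enum import Flag, auto


class AgentFeature(Flag):
    """Hypothetical feature flags for an agent daemon."""
    TASK_MANAGEMENT = auto()
    SYSTEM_STATE_MONITORING = auto()
    STATE_INTEGRATION = auto()         # merge master and slave state
    TASK_REDISTRIBUTION = auto()       # decompose tasks again on the node
    TASK_PRIORITY_MANAGEMENT = auto()


# Assumed baseline: what an unmodified agent does in this simplified model.
STANDARD_AGENT = AgentFeature.TASK_MANAGEMENT | AgentFeature.SYSTEM_STATE_MONITORING

# Slave agent: cut down, system state monitoring removed.
SLAVE_AGENT = STANDARD_AGENT & ~AgentFeature.SYSTEM_STATE_MONITORING

# Master agent: baseline plus the added functions named in the text.
MASTER_AGENT = (
    STANDARD_AGENT
    | AgentFeature.STATE_INTEGRATION
    | AgentFeature.TASK_REDISTRIBUTION
    | AgentFeature.TASK_PRIORITY_MANAGEMENT
)

if __name__ == "__main__":
    print("slave keeps monitoring:", AgentFeature.SYSTEM_STATE_MONITORING in SLAVE_AGENT)
    print("master integrates state:", AgentFeature.STATE_INTEGRATION in MASTER_AGENT)
```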
The computing environment resource management method of the present invention comprises a task distribution flow and a state information reporting flow.
The task distribution flow of the present invention is as follows. The management daemon receives the computing tasks submitted by the system administrator. According to administrator-specified parameters such as task priority, resources to be occupied and running time, and according to the resource scheduling policy, it divides the tasks appropriately and assigns them to the master system of a compute node in an appropriate partition. The state information integration unit obtains, from the slave system information receiving unit, the second task information statistical unit and the system state statistical unit respectively, the task running state information of the slave systems, the task running state information of the master system, and the system state and resource consumption information of the master system, and integrates this information into the complete state information of the one master system and three slave systems. The second task management unit obtains the assigned tasks through the second communication unit and the complete state information from the state information integration unit, then decomposes the tasks again according to the scheduling rules: some tasks are executed on the master system itself, and the rest are issued to the slave systems through the second communication unit. The second task management unit obtains the assigned tasks through the second communication unit, obtains the task running state from the second task information statistical unit, and starts a task when the resources meet its requirements.
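The second-level decomposition performed on the master system could be sketched as follows. The patent does not specify the scheduling rule, so the rule used here (highest priority first, each task placed on the system with the most free cores, otherwise left pending) is an assumption chosen for illustration, and `Task`, `redistribute` and the system names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int  # larger value = more urgent (assumed convention)
    cores: int     # cores the task needs


def redistribute(tasks, free_cores):
    """Split tasks between the master system and the slave systems.

    free_cores maps a system name to its free core count, standing in for
    the complete state information of one master and three slave systems.
    Returns (placement, pending): placement maps task name -> system name,
    pending lists tasks that must wait until resources meet requirements.
    """
    remaining = dict(free_cores)  # do not mutate the caller's state
    placement, pending = {}, []
    for task in sorted(tasks, key=lambda t: -t.priority):
        target = max(remaining, key=remaining.get)  # most free cores
        if remaining[target] >= task.cores:
            remaining[target] -= task.cores
            placement[task.name] = target
        else:
            pending.append(task.name)
    return placement, pending


if __name__ == "__main__":
    state = {"master": 4, "slave1": 4, "slave2": 2, "slave3": 0}
    jobs = [Task("a", 5, 4), Task("b", 3, 3), Task("c", 1, 4)]
    print(redistribute(jobs, state))
```

In this sketch a task that does not fit anywhere simply stays pending, which corresponds to starting a task only when the resources meet its requirements.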
The state information reporting flow of the present invention is as follows. The first task information statistical unit periodically collects the state information of the tasks running in its slave system and reports it to the master system through the first communication unit. The slave system information receiving unit receives the task information reported by the three slave systems. The second task information statistical unit monitors and collects statistics on the tasks in the master system. The system state statistical unit monitors information such as the operating condition and resource consumption of the one master system and three slave systems. The state information integration unit integrates these three kinds of information into the complete state information of the one master system and three slave systems, and reports the complete state information to the management daemon through the second communication unit.
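The integration step in the reporting flow can be sketched as a merge of three inputs into one node-level record. The record layout, the field names and the `integrate_state` function below are hypothetical; the patent only states which three kinds of information are combined, not how they are represented.

```python
def integrate_state(slave_reports, master_tasks, system_state):
    """Merge the three inputs into one record for the whole platform.

    slave_reports: dict of slave name -> list of running task names
                   (from the slave system information receiving unit)
    master_tasks:  list of task names running on the master system
                   (from the second task information statistical unit)
    system_state:  dict of resource figures for the whole platform
                   (from the system state statistical unit)
    """
    tasks = {"master": list(master_tasks)}
    for slave, running in slave_reports.items():
        tasks[slave] = list(running)
    return {
        "systems": sorted(tasks),
        "running_tasks": tasks,
        "total_running": sum(len(names) for names in tasks.values()),
        "resources": dict(system_state),
    }


if __name__ == "__main__":
    report = integrate_state(
        {"slave1": ["t2"], "slave2": [], "slave3": ["t3", "t4"]},
        ["t1"],
        {"cpu_used": 9, "cpu_total": 16},
    )
    # This single record is what would be sent on to the management daemon.
    print(report)
```

Because the management daemon receives one such record per platform, it sees the four systems as a single compute node.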
Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the particular embodiments described; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the present invention.
Claims (4)
1. A computing environment resource management system, characterized in that the computing environment resource management system comprises a first statistical unit and a second statistical unit connected to each other; the first statistical unit comprises a first communication unit, a first task management unit, a first task information statistical unit and a first operating system, wherein the first communication unit, the first task information statistical unit and the first operating system are all connected to the first task management unit; the second statistical unit comprises a second task management unit, a second communication unit, a state information integration unit, a slave system information receiving unit, a second task information statistical unit, a system state statistical unit and a second operating system, wherein the second task management unit, the second communication unit, the slave system information receiving unit, the second task information statistical unit and the system state statistical unit are all connected to the state information integration unit, and the second task management unit, the second task information statistical unit and the system state statistical unit are all connected to the second operating system.
2. The computing environment resource management system according to claim 1, characterized in that the computing environment resource management system distinguishes between the master system and the slave systems and runs a different agent daemon on each.
3. The computing environment resource management system according to claim 2, characterized in that the functions of the agent daemon in the master system are modified and extended.
4. A computing environment resource management method, characterized in that it comprises a task distribution flow and a state information reporting flow;
the task distribution flow is as follows: the management daemon receives the computing tasks submitted by the system administrator; according to administrator-specified parameters such as task priority, resources to be occupied and running time, and according to the resource scheduling policy, it divides the tasks appropriately and assigns them to the master system of a compute node in an appropriate partition; the state information integration unit obtains, from the slave system information receiving unit, the second task information statistical unit and the system state statistical unit respectively, the task running state information of the slave systems, the task running state information of the master system, and the system state and resource consumption information of the master system, and integrates this information into the complete state information of the one master system and three slave systems; the second task management unit obtains the assigned tasks through the second communication unit and the complete state information from the state information integration unit, then decomposes the tasks again according to the scheduling rules, with some tasks executed on the master system itself and the rest issued to the slave systems through the second communication unit; the second task management unit obtains the assigned tasks through the second communication unit, obtains the task running state from the second task information statistical unit, and starts a task when the resources meet its requirements;
the state information reporting flow is as follows: the first task information statistical unit periodically collects the state information of the tasks running in its slave system and reports it to the master system through the first communication unit; the slave system information receiving unit receives the task information reported by the three slave systems; the second task information statistical unit monitors and collects statistics on the tasks in the master system; the system state statistical unit monitors information such as the operating condition and resource consumption of the one master system and three slave systems; the state information integration unit integrates these three kinds of information into the complete state information of the one master system and three slave systems, and reports the complete state information to the management daemon through the second communication unit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611111871.6A CN106844021B (en) | 2016-12-06 | 2016-12-06 | Computing environment resource management system and management method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611111871.6A CN106844021B (en) | 2016-12-06 | 2016-12-06 | Computing environment resource management system and management method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106844021A (en) | 2017-06-13 |
CN106844021B (en) | 2020-08-25 |
Family
ID=59146333
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611111871.6A Active CN106844021B (en) | 2016-12-06 | 2016-12-06 | Computing environment resource management system and management method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106844021B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110177020A (en) * | 2019-06-18 | 2019-08-27 | 北京计算机技术及应用研究所 | A kind of High-Performance Computing Cluster management method based on Slurm |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050283788A1 (en) * | 2004-06-17 | 2005-12-22 | Platform Computing Corporation | Autonomic monitoring in a grid environment |
US20060106996A1 (en) * | 2004-11-15 | 2006-05-18 | Ahmad Said A | Updating data shared among systems |
CN103501047A (en) * | 2013-10-09 | 2014-01-08 | 云南电力调度控制中心 | Intelligent fault wave recording main station information management system |
CN105938357A (en) * | 2015-03-02 | 2016-09-14 | 发那科株式会社 | Control device capable of centrally managing control by grouping a plurality of systems |
Also Published As
Publication number | Publication date |
---|---|
CN106844021B (en) | 2020-08-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103873279B (en) | Server management method and server management device | |
CN106708622A (en) | Cluster resource processing method and system, and resource processing cluster | |
CN108092813A (en) | Data center's total management system server hardware Governance framework and implementation method | |
CN111339175B (en) | Data processing method, device, electronic equipment and readable storage medium | |
CN104580338A (en) | Service processing method, system and equipment | |
CN102929773A (en) | Information collection method and device | |
US20140115153A1 (en) | Apparatus for monitoring data distribution service (dds) and method thereof | |
US11212173B2 (en) | Model-driven technique for virtual network function rehoming for service chains | |
CN114116172A (en) | Flow data acquisition method, device, equipment and storage medium | |
CN103763373A (en) | Method for dispatching based on cloud computing and dispatcher | |
CN108563787A (en) | A kind of data interaction management system and method for data center's total management system | |
JP2010128597A (en) | Information processor and method of operating the same | |
CN106844021A (en) | Computing environment resource management system and management method thereof | |
CN116260738B (en) | Equipment monitoring method and related equipment | |
US9009735B2 (en) | Method for processing data, computing node, and system | |
CN103442212A (en) | Network security and protection comprehensive early warning type management system platform | |
Benford | Requirements of Activity Management. | |
EP2770447B1 (en) | Data processing method, computational node and system | |
CN116346823A (en) | Big data heterogeneous task scheduling method and system based on message queue | |
CN114757448B (en) | Manufacturing inter-link optimal value chain construction method based on data space model | |
CN112000657A (en) | Data management method, device, server and storage medium | |
Liu et al. | Distributed ale in rfid middleware | |
CN115168042A (en) | Management method and device of monitoring cluster, computer storage medium and electronic equipment | |
Gulhane | Enhancing queuing efficiency using discrete event simulation | |
JP2009157597A (en) | Automatic distribution system for remote maintenance software, and automatic distribution method for remote maintenance software |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||