CN113806011A - Cluster resource control method and device, cluster and computer readable storage medium - Google Patents

Cluster resource control method and device, cluster and computer readable storage medium

Info

Publication number
CN113806011A
CN113806011A CN202110942743.0A CN202110942743A CN113806011A CN 113806011 A CN113806011 A CN 113806011A CN 202110942743 A CN202110942743 A CN 202110942743A CN 113806011 A CN113806011 A CN 113806011A
Authority
CN
China
Prior art keywords
user
resource control
cluster
login
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110942743.0A
Other languages
Chinese (zh)
Other versions
CN113806011B (en)
Inventor
徐仕鑫
张涛
吕灼恒
胡梦龙
李斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Co Ltd
Original Assignee
Dawning Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Co Ltd filed Critical Dawning Information Industry Co Ltd
Priority to CN202110942743.0A priority Critical patent/CN113806011B/en
Publication of CN113806011A publication Critical patent/CN113806011A/en
Application granted granted Critical
Publication of CN113806011B publication Critical patent/CN113806011B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45575 Starting, stopping, suspending or resuming virtual machine instances
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The application relates to a cluster resource control method and device, a cluster, and a computer-readable storage medium. The cluster comprises a login node, and the method comprises the following steps: monitoring whether a preset file is created in a preset directory of the login node, where creating the preset file in the preset directory is a mandatory step of environment loading that every user performs in the process of logging in to the cluster; and, if such a creation is detected, configuring a resource control policy for the user according to the identifier of the user, and managing the resources available to the user on the login node according to the resource control policy, where the resource control policy includes an available resource quota of the user. In this way, no user is missed when resources are limited per user on the login node, and normal operation of the cluster is ensured.

Description

Cluster resource control method and device, cluster and computer readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a cluster resource control method and apparatus, a cluster, and a computer-readable storage medium.
Background
A cluster is mainly composed of five types of computing devices and three types of networks. The computing devices mainly comprise management nodes, login nodes, computing nodes, switching devices, I/O devices and storage devices. The login node acts as the gateway through which a user accesses the cluster system, and the computing nodes are the computing core of the whole cluster. Users typically perform simple compilation on the login node and submit computing programs to the computing nodes, which run them.
After logging in to the cluster, some users who are unfamiliar with cluster operation mistakenly run computing programs on the login node, and some ill-intentioned users deliberately run computing programs on the login node to avoid the cost of running them on the computing nodes. Computing programs generally occupy a large amount of resources, so running them on the login node drives its load too high and affects normal operation of the cluster system.
Conventional methods usually limit the resources consumed by all users together, and it is difficult to limit the resources available to each individual user. Even if a resource limit is recorded for each user in a configuration file, user logins are difficult to monitor accurately, so some users are frequently missed when the limits are applied.
Disclosure of Invention
The embodiment of the application provides a cluster resource control method and device, a cluster and a computer readable storage medium, which can realize that users are not missed when each user logs in a node to limit resources and ensure the normal operation of the cluster.
In one embodiment, a cluster resource control method is provided, where the cluster includes a login node, and the method includes:
monitoring whether a preset file is created in a preset directory of the login node, wherein creating the preset file in the preset directory is a mandatory step for every user in the process of logging in to the cluster;
if such a creation is detected, configuring a resource control policy for the user according to the identifier of the user, and managing the resources available to the user on the login node according to the resource control policy; wherein the resource control policy includes an available resource quota of the user.
In the embodiment of the application, creating the preset file in the preset directory is a mandatory step that must be executed when the login node performs environment loading for each user logging in to the cluster. The login behavior of users can therefore be monitored by monitoring whether a file is created in the preset directory of the login node, and no user's login is missed. If creation of the preset file is detected, a resource control policy is configured for the user according to the identifier of the user, and the resources available to the user on the login node are managed according to that policy. In this way, no user is missed when resources are limited per user on the login node, and normal operation of the cluster is ensured.
In one embodiment, the preset directory includes a custom subdirectory of the /tmp directory.
In the embodiment of the application, when an ordinary user logs in to the cluster in a Linux system, environment variables must be loaded from the /etc/profile.d directory, and a temporary file is created in the preset directory based on the loaded environment variables to complete environment loading. By monitoring whether the preset file is created under the custom subdirectory of the /tmp directory on the login node, the login of every user can be monitored accurately, and no user's login is missed. Resources can then be managed per user. This avoids the situation in which a single user runs a complex computing program on the login node, whether maliciously or by mistake due to unfamiliarity with cluster operation, so that the login node runs overloaded for a long time, the cluster hangs, and normal operation of the cluster is affected.
In one embodiment, the preset file is a custom file; the monitoring whether a behavior of creating a preset file exists in a preset directory of the login node includes:
and monitoring, through an inotify tool, whether the custom file is created under the custom subdirectory of the /tmp directory.
In the embodiment of the application, the inotify tool can monitor all changes of the file, so that at the login node, whether a behavior of creating the preset file exists in the preset directory of the login node can be accurately monitored through the inotify tool. And then, if the behavior of creating a preset file exists in the preset directory of the login node, configuring a resource control strategy for the user according to the user identification, and managing the available resources of the user at the login node according to the resource control strategy. Therefore, when each user logs in the node to limit the resources, any user is not missed, and the normal operation of the cluster is ensured.
In one embodiment, the identification of the user comprises an ID of the user and a login process ID of the user to login to the login node; before configuring the resource control policy for the user according to the identifier of the user, the method further includes:
and acquiring, from the custom file, the ID of the user and the ID of the login process with which the user logs in to the login node.
In the embodiment of the application, when a user logs in to a login node of the cluster, the login node creates the custom file under the preset directory based on the ID of the user and the login process ID generated at login. The cluster can therefore acquire the ID of the user and the login process ID from the custom file. These two IDs are then used to configure a resource control policy for the user and to manage the resources available to the user on the login node according to that policy.
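The round trip described above (the login step records the user ID and login process ID in a custom file, and the controller reads them back) can be sketched as follows. This is a minimal illustration only: the source does not specify the file name or layout, so the `<uid>_<login_pid>` naming, the file content, and the helper names `write_marker` and `read_identity` are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import NamedTuple


class LoginIdentity(NamedTuple):
    uid: int        # numeric ID of the user
    login_pid: int  # ID of the user's login process


def write_marker(watch_dir: Path, uid: int, login_pid: int) -> Path:
    """Stand-in for the login-time step: create the custom file.

    Hypothetical format: file name "<uid>_<login_pid>", same values inside.
    """
    watch_dir.mkdir(parents=True, exist_ok=True)
    marker = watch_dir / f"{uid}_{login_pid}"
    marker.write_text(f"{uid} {login_pid}\n")
    return marker


def read_identity(marker: Path) -> LoginIdentity:
    """Recover the user ID and login process ID from the custom file."""
    uid_text, pid_text = marker.read_text().split()
    return LoginIdentity(int(uid_text), int(pid_text))


if __name__ == "__main__":
    # A temporary directory stands in for the custom subdirectory of /tmp.
    with tempfile.TemporaryDirectory() as tmp:
        marker = write_marker(Path(tmp) / "login_markers", 1001, 4242)
        print(read_identity(marker))
```

Keeping both values inside the file as well as in its name is a design choice of this sketch; either location alone would be enough for the controller to identify the login.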
In one embodiment, the configuring a resource control policy for the user according to the identifier of the user includes:
judging whether a resource control group corresponding to the user identifier exists on the login node or not, wherein the resource control group comprises a resource control strategy;
and if so, configuring the resource control strategy in the resource control group to the identifier of the user.
In the embodiment of the application, if a behavior of creating a preset file exists in a preset directory of a login node, a user ID of a user and an ID of a login process generated when the user logs in the login node are acquired. And judging whether a resource control group corresponding to the user identifier exists on the login node, and if so, configuring a resource control strategy in the resource control group to the user identifier, namely adding a user ID into the resource control group. Subsequently, in the process of acquiring the IDs of all the sub-processes under the ID of the login process corresponding to the user ID, all the sub-processes corresponding to the user ID may be managed in the available resources of the login node based on the resource control policy preset in the resource control group.
Therefore, resource limitation can be performed on all sub-processes corresponding to each user at the login node, and normal operation of the cluster is guaranteed.
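Collecting the IDs of all sub-processes under a login process ID can be sketched as a walk over the process tree. The sketch below assumes a Linux `/proc` file system for the live lookup; the function names are hypothetical, and the source does not say how the sub-process IDs are obtained.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def read_parent_map(proc_root: Path = Path("/proc")) -> Dict[int, int]:
    """Map each visible PID to its parent PID using /proc/<pid>/stat (Linux only)."""
    parents: Dict[int, int] = {}
    for entry in proc_root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            stat = (entry / "stat").read_text()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name is wrapped in parentheses and may contain spaces,
        # so split after the last ')'. The fields that follow are: state, ppid, ...
        fields = stat.rsplit(")", 1)[1].split()
        parents[int(entry.name)] = int(fields[1])
    return parents


def descendants(parents: Dict[int, int], login_pid: int) -> List[int]:
    """Return every PID whose ancestor chain leads to login_pid, sorted."""
    children = defaultdict(list)
    for pid, ppid in parents.items():
        children[ppid].append(pid)
    found: List[int] = []
    stack = [login_pid]
    while stack:
        for child in children[stack.pop()]:
            found.append(child)
            stack.append(child)
    return sorted(found)


if __name__ == "__main__":
    # Synthetic process table: 100 is a login shell with a child and a grandchild.
    table = {100: 1, 101: 100, 102: 101, 200: 1, 201: 200}
    print(descendants(table, 100))
```

With Linux cgroups, a process forked after its parent has joined a group starts in the same group, so a scan like this mainly matters for sub-processes that already exist when the login process is added.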
In one embodiment, the configuring a resource control policy for the user according to the identifier of the user further includes:
if not, a new resource control group is created, and the corresponding relation between the user identifier and the new resource control group is established;
and configuring the resource control strategy in the new resource control group to the identifier of the user.
In the embodiment of the application, if a behavior of creating a preset file exists in a preset directory of a login node, a user ID of a user and an ID of a login process generated when the user logs in the login node are acquired. And judging whether a resource control group corresponding to the user identifier exists on the login node, and if so, configuring a resource control strategy in the resource control group to the user identifier, namely adding a user ID into the resource control group. And if the resource control group does not exist, a new resource control group is created, the corresponding relation between the user identifier and the new resource control group is established, and the resource control strategy in the new resource control group is configured to the user identifier. The user ID of the resource control group existing on the login node or the user ID of the resource control group not existing on the login node can be added into the resource control group. Subsequently, in the process of acquiring the IDs of all the sub-processes under the ID of the login process corresponding to the user ID, all the sub-processes corresponding to the user ID may be managed in the available resources of the login node based on the resource control policy preset in the resource control group.
Therefore, resource limitation can be performed on all sub-processes corresponding to each user at the login node, and normal operation of the cluster is guaranteed.
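The two branches above (reuse the resource control group if one exists for the user, otherwise create one, then place the login process in it) can be sketched as follows. The sketch assumes a Linux cgroup hierarchy in which a process joins a group when its PID is written to the group's `cgroup.procs` file. The parent path `/sys/fs/cgroup/login_users`, the `user_<uid>` naming, and the function names are hypothetical, and writing to a real hierarchy requires suitable privileges.

```python
import tempfile
from pathlib import Path

# Hypothetical location of the per-user groups on a real system.
CGROUP_ROOT = Path("/sys/fs/cgroup/login_users")


def group_for_user(uid: int, root: Path = CGROUP_ROOT) -> Path:
    """Directory of the resource control group that corresponds to a user ID."""
    return root / f"user_{uid}"


def ensure_group(uid: int, root: Path = CGROUP_ROOT) -> bool:
    """Create the user's group if it does not exist; return True if it was created."""
    group = group_for_user(uid, root)
    if group.is_dir():
        return False  # a group already corresponds to this user
    group.mkdir(parents=True)
    return True


def attach_login_process(uid: int, login_pid: int, root: Path = CGROUP_ROOT) -> Path:
    """Put the login process into the user's group, creating the group if needed."""
    ensure_group(uid, root)
    procs = group_for_user(uid, root) / "cgroup.procs"
    procs.write_text(f"{login_pid}\n")
    return procs


if __name__ == "__main__":
    # A temporary directory stands in for the cgroup file system.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(ensure_group(1001, root))   # first login: group is created
        print(ensure_group(1001, root))   # later login: group is reused
        print(attach_login_process(1001, 4242, root).read_text().strip())
```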
In one embodiment, the users include a bash type user or a csh type user.
In the embodiment of the present application, the cluster resource control method provided is suitable for a bash type user or a csh type user, and the present application does not limit this. Therefore, the cluster resource control can be performed for various users, and the applicability of the cluster resource control method is improved.
In one embodiment, an apparatus for controlling cluster resources is provided, the cluster including a login node, the apparatus comprising:
the monitoring module is used for monitoring whether a preset file is created in a preset directory of the login node, wherein creating the preset file in the preset directory is a mandatory step for every user in the process of logging in to the cluster;
the resource control module is used for, if such a creation is detected, configuring a resource control policy for the user according to the identifier of the user and managing the resources available to the user on the login node according to the resource control policy; wherein the resource control policy includes an available resource quota of the user.
A cluster comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to perform the steps of the cluster resource control method as described above.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the cluster resource control method as described above.
In the cluster resource control method and device, the cluster and the computer-readable storage medium described above, the cluster comprises a login node, and the method comprises the following steps: monitoring whether a preset file is created in a preset directory of the login node, where creating the preset file in the preset directory is a mandatory step of environment loading that every user performs in the process of logging in to the cluster; and, if such a creation is detected, configuring a resource control policy for the user according to the identifier of the user, and managing the resources available to the user on the login node according to the resource control policy, where the resource control policy includes an available resource quota of the user.
Because creating the preset file in the preset directory is a mandatory step that must be executed when the login node performs environment loading for each user logging in to the cluster, the login behavior of users can be monitored by monitoring whether a file is created in the preset directory of the login node, and no user's login is missed. If creation of the preset file is detected, a resource control policy is configured for the user according to the identifier of the user, and the resources available to the user on the login node are managed according to that policy. In this way, no user is missed when resources are limited per user on the login node, and normal operation of the cluster is ensured.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a diagram of an application environment of a cluster resource control method in one embodiment;
FIG. 2 is a flow diagram of a cluster resource control method in one embodiment;
FIG. 3 is a flow chart of a cluster resource control method in another embodiment;
FIG. 4 is a flow diagram of a method for configuring resource control policies for users based on their identities in one embodiment;
FIG. 5 is a flowchart of a method for configuring resource control policies for users based on their identities in another embodiment;
FIG. 6 is a flow diagram of a cluster resource control method in a specific embodiment;
FIG. 7 is a block diagram of an embodiment of a cluster resource control device;
FIG. 8 is a block diagram of an alternative embodiment of a cluster resource controller;
FIG. 9 is a schematic diagram of the internal structure of a cluster in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
A cluster is mainly composed of five types of computing devices and three types of networks. The computing devices mainly comprise management nodes, login nodes, computing nodes, switching devices, I/O devices and storage devices. The login node acts as the gateway through which a user accesses the cluster system, and the computing nodes are the computing core of the whole cluster. Users typically perform simple compilation on the login node and submit computing programs to the computing nodes, which run them. Large-scale and parallel compilation cannot be performed on the login node; it can be performed on the computing nodes.
After logging in to the cluster, some users who are unfamiliar with cluster operation mistakenly run computing programs on the login node, and some ill-intentioned users deliberately run computing programs on the login node to avoid the cost of running them on the computing nodes. Computing programs generally occupy a large amount of resources, so running them on the login node drives its load too high, which disrupts other logged-in users and affects normal operation of the cluster system.
To address these problems, the conventional method uses cgroup technology to limit the resources available to users on the login node. cgroup is a Linux kernel feature used to limit, control and isolate the resources (such as CPU, memory and disk I/O) of a group of processes. In the conventional method, however, cgroup is mainly used to limit the resources shared by users on the login node, that is, the resources consumed by all users together, and it is difficult to limit the resources available to each individual user. Limiting the resource allocation of a single user by command has drawbacks; for example, the restriction does not take effect for the first login session after the command is enabled. Limiting the resource allocation of a single user through a configuration file often misses some users.
Therefore, the cluster resource control method provided in the embodiment of the application can realize that users are not missed when each user logs in a node to limit resources, thereby ensuring normal operation of a cluster.
FIG. 1 is an application scenario diagram of a cluster resource control method in one embodiment. As shown in FIG. 1, the application environment includes a cluster 100, and the cluster 100 includes a login node 120 and a computing node 140. From the terminal 160, the user logs in to the cluster through the login node and performs simple compilation on the login node. Alternatively, the user establishes a communication connection with the computing node through the login node and performs complex compilation or computation on the computing node. The cluster 100 may be an HPC (High Performance Computing) cluster, but the present application is not limited thereto. The login node 120 and the computing node 140 each include a plurality of computer devices (not shown).
FIG. 2 is a flowchart of a cluster resource control method in one embodiment. The cluster resource control method in this embodiment is described by taking the cluster 100 in FIG. 1 as an example. The method includes the following steps 220 to 240.
Step 220, monitoring whether a preset file is created in a preset directory of the login node, where creating the preset file in the preset directory is a mandatory step of environment loading that every user performs in the process of logging in to the cluster.
The login node acts as the gateway through which a user accesses the cluster system, so logging in to the cluster means logging in to a login node of the cluster. Generally, environment loading is the first thing performed when a user logs in to a cluster. In the embodiment of the application, a custom step of creating a preset file under a preset directory is added to the environment loading process, and the preset directory is a directory used for temporary cache files. Therefore, whenever a user logs in to the cluster and environment loading is performed, the preset file is necessarily created under the preset directory.
Creating the preset file in the preset directory is thus a mandatory step of environment loading for every user logging in to the cluster, that is, a step every user must pass through. The login of each user can therefore be monitored accurately by monitoring whether the preset file is created in the preset directory of the login node.
Step 240, if such a creation is detected, configuring a resource control policy for the user according to the identifier of the user, and managing the resources available to the user on the login node according to the resource control policy; wherein the resource control policy includes an available resource quota of the user.
Specifically, if creation of the preset file is detected in the preset directory of the login node, a corresponding resource control policy is configured for the user according to the identifier of the user. The identifier of the user comprises the ID of the user and the ID of the login process with which the user logs in to the login node. The resource control policy specifies which resources the user may use and how they may be used. For example, the policy may set a preset type of cluster resource that the user may use on the login node, a preset usage limit for that resource, and a preset time during which the user may use cluster resources on the login node; the application does not limit this. The preset type of cluster resource includes at least one of CPU, memory, disk I/O and the like. The preset usage limit includes at least one of a maximum CPU utilization of 80%, a maximum memory footprint of 200G, a maximum of 8 occupied CPUs, and the like. The preset time may be, for example, 10-12 pm; the application does not limit this.
Because the resource that the user can use and the strategy of how to use the resource are set in the resource control strategy, the available resource of the user at the login node can be managed based on the resource control strategy. I.e. the resources that the user can use at the login node and how to use them.
For example, the resource control policy sets a preset type of cluster resources that can be used by the user at the login node, a preset usage amount of the cluster resources that can be used, and the like, and may also set a preset time for the user to use the cluster resources at the login node, and the like, which is not limited in this application. Then, the user can be controlled to use the resource corresponding to the preset type only at the login node, the size of the preset resource occupied by the user is controlled not to exceed the preset usage limit, and the time for the user to use the preset resource is controlled to be within the preset time range.
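The example quota above (80% CPU utilization, 200G of memory, 8 CPUs, use between 10 and 12 pm) can be written down as a small policy object and translated into control-group settings. This is a sketch under stated assumptions: it targets the cgroup v2 interface files `cpu.max`, `memory.max` and `cpuset.cpus`, reads "80%" as 80% of one CPU per scheduling period, reads "200G" as 200 GiB, pins the user to the first 8 CPUs, and reads "10-12 pm" as 22:00 to midnight. None of these interpretations, nor the names `ResourcePolicy` and `to_cgroup_settings`, come from the source.

```python
from dataclasses import dataclass
from datetime import time
from typing import Dict

GIB = 1024 ** 3


@dataclass(frozen=True)
class ResourcePolicy:
    cpu_percent: int = 80                 # maximum CPU utilization, in percent
    memory_gib: int = 200                 # maximum memory footprint (assumed GiB)
    cpu_count: int = 8                    # maximum number of occupied CPUs
    window_start: time = time(22, 0)      # assumed reading of "10-12 pm"
    window_end: time = time(23, 59, 59)


def to_cgroup_settings(policy: ResourcePolicy, period_us: int = 100_000) -> Dict[str, str]:
    """Translate the quota into cgroup v2 file names and the values to write."""
    quota_us = period_us * policy.cpu_percent // 100
    last_cpu = policy.cpu_count - 1
    cpus = "0" if last_cpu == 0 else f"0-{last_cpu}"
    return {
        "cpu.max": f"{quota_us} {period_us}",          # "<quota> <period>" in microseconds
        "memory.max": str(policy.memory_gib * GIB),    # limit in bytes
        "cpuset.cpus": cpus,                           # CPUs the group may run on
    }


def within_window(policy: ResourcePolicy, now: time) -> bool:
    """True if `now` falls inside the preset usage time."""
    return policy.window_start <= now <= policy.window_end


if __name__ == "__main__":
    for name, value in to_cgroup_settings(ResourcePolicy()).items():
        print(name, "=", value)
```

Keeping the policy separate from the cgroup file names makes it possible to swap in a different enforcement mechanism without changing how quotas are described.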
The main steps of managing the available resources of the user at the login node are as follows:
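[Figures BDA0003215490610000101 and BDA0003215490610000111: the listing of these steps is provided as images in the original publication and is not reproduced in this text.]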
In the embodiment of the application, creating the preset file in the preset directory is a mandatory step that must be executed when the login node performs environment loading for each user logging in to the cluster. The login behavior of users can therefore be monitored by monitoring whether a file is created in the preset directory of the login node, and no user's login is missed. If creation of the preset file is detected, a resource control policy is configured for the user according to the identifier of the user, and the resources available to the user on the login node are managed according to that policy. In this way, no user is missed when resources are limited per user on the login node, and normal operation of the cluster is ensured.
In one embodiment, the preset directory comprises a custom subdirectory of the /tmp directory.
The /etc/profile.d directory in a Linux system contains the environment variables required for a user's environment loading. When an ordinary user logs in to the cluster in a Linux system, environment variables must be loaded from the /etc/profile.d directory, and a preset file is created in the preset directory based on the loaded environment variables to complete environment loading. The preset directory is a custom subdirectory of the /tmp directory, and the preset file created under that subdirectory is a temporary file.
Specifically, the operating system used in the cluster is Linux, and when an ordinary user logs in to the cluster, every login node must load environment variables from the /etc/profile.d directory. The user can therefore be forced to create a temporary file under the custom subdirectory of the /tmp directory while the environment variables are being loaded.
In the embodiment of the application, ordinary users in a Linux system must load environment variables from the /etc/profile.d directory when logging in to the cluster, so the user can be forced to create a temporary file under the custom subdirectory of the /tmp directory while the environment variables are being loaded. By monitoring whether the temporary file is created under the custom subdirectory of the /tmp directory on the login node, the login of every user can be monitored accurately, and no user's login is missed. Resources can then be managed per user. This avoids the situation in which a single user runs a complex computing program on the login node, whether maliciously or by mistake due to unfamiliarity with cluster operation, so that the login node runs overloaded for a long time, the cluster hangs, and normal operation of the cluster is affected.
In one embodiment, the preset file is a custom file; monitoring whether a behavior of creating a preset file exists in a preset directory of the login node comprises:
monitoring, through an inotify tool, whether a behavior of creating the custom file exists under the custom subdirectory contained in the /tmp directory.
The inotify tool, also called the inotify file monitoring tool, is a powerful, fine-grained, asynchronous file system monitoring mechanism that can meet a wide range of file monitoring requirements. It can monitor the access, read/write and permission attributes of the file system, as well as operations such as deletion, creation and moving, so almost every change to a file can be observed. inotify-tools is a set of C library functions provided for the inotify file monitoring mechanism under Linux, together with a series of command-line tools that can be used to monitor file system events. Because inotify-tools is written in C, it depends on no other software apart from kernel support.
Specifically, when an ordinary user logs in to the cluster in the Linux system, the environment variables are loaded and a temporary file is created under the /tmp directory as part of environment loading. The temporary file is a custom file, and its content is not limited. Therefore, at the login node, the inotify tool monitors whether a behavior of creating the custom file exists under the custom subdirectory contained in the /tmp directory, so that the login behavior of every user logging in to the cluster can be monitored accurately and no user's login is missed.
The temporary file is a custom file, for example a sshcgroup.sh file or a sshcgroup.csh file; the present application does not limit this. These two files are the files created under the custom subdirectory contained in the /tmp directory when each user performs environment loading while logging in to the cluster.
The main step of monitoring, through the inotify tool, whether a behavior of creating the custom file exists under the custom subdirectory contained in the /tmp directory is as follows:
inotifywait -mq --format '%f' -e create "$cgtmp_path" | while read -r file
In the embodiment of the application, the inotify tool can monitor changes to files, so at the login node it can accurately monitor whether a behavior of creating the preset file exists in the preset directory of the login node. If such a behavior is detected, a resource control policy is configured for the user according to the identifier of the user, and the available resources of the user at the login node are managed according to the resource control policy. Therefore, when resources are limited for each user at the login node, no user is missed, and normal operation of the cluster is ensured.
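The monitoring loop can be sketched as follows, assuming inotify-tools is installed on the login node. handle_created_files is a hypothetical helper that stands in for the later steps, and watch_logins is only defined, not called, because it blocks indefinitely.

```shell
#!/bin/sh
# Sketch of the watcher side (function names are hypothetical).

# handle_created_files DIR
# Reads file names, one per line (as printed by inotifywait --format '%f'),
# from standard input and reports every name that is a regular file in DIR.
handle_created_files() (
    dir=$1
    while IFS= read -r file; do
        if [ -f "$dir/$file" ]; then
            printf 'login marker: %s\n' "$dir/$file"
        fi
    done
)

# watch_logins DIR
# Streams "create" events for DIR into the handler. Requires inotify-tools.
watch_logins() {
    inotifywait -mq --format '%f' -e create "$1" | handle_created_files "$1"
}
```

A create event can be delivered before the creating process has written the file content, so a handler that reads the file immediately may need to retry or to watch the close_write event instead. This is a general property of inotify rather than something stated in the application.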
In one embodiment, as shown in fig. 3, the identifier of the user includes the ID of the user and the login process ID with which the user logs in to the login node; before configuring the resource control policy for the user according to the identifier of the user, the method further comprises:
Step 230, obtaining, from the custom file, the ID of the user and the login process ID with which the user logs in to the login node.
The identifier of the user comprises the ID of the user and the login process ID with which the user logs in to the login node. Specifically, whether a behavior of creating the custom file exists in the preset directory of the login node is monitored, and if such a behavior is detected, the ID of the user and the login process ID are obtained from the custom file. While a user logs in to the cluster, the login node creates the custom file in the preset directory based on the ID of the user and the login process ID generated when the user logs in to the login node, so the cluster can obtain both IDs from the custom file.
The ID of the user is the unique identity of the user when logging in to a login node of the cluster through a terminal; it distinguishes different users and can be represented by a UID. The login process ID is the ID of the login process generated when the user logs in at the login node and can be represented by a PID. Based on the login process ID, the IDs of all sub-processes under the login process can be obtained.
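How the sub-process IDs might be derived from a login process ID can be sketched as below. The application does not say how this is done; the sketch assumes a table of "PID PPID" pairs on standard input, such as the output of `ps -e -o pid= -o ppid=`, and list_descendants is a hypothetical name.

```shell
#!/bin/sh
# list_descendants ROOT_PID
# Reads "PID PPID" pairs from standard input and prints, in numeric order,
# every PID whose chain of parents leads back to ROOT_PID.
list_descendants() (
    root_pid=$1
    awk -v root="$root_pid" '
        { parent[$1] = $2 }
        END {
            marked[root] = 1
            # Repeat until no new descendant is found (fixed point).
            do {
                changed = 0
                for (pid in parent)
                    if (!(pid in marked) && (parent[pid] in marked)) {
                        marked[pid] = 1
                        changed = 1
                    }
            } while (changed)
            for (pid in marked)
                if (pid != root) print pid
        }' | sort -n
)
```

On a live system the call would look like `ps -e -o pid= -o ppid= | list_descendants "$login_pid"`, where login_pid is the PID read from the custom file.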
The main steps by which the cluster acquires the UID and the PID from the custom file are as follows:
[Figure BDA0003215490610000141: listing provided as an image; its content is not reproduced in the text]
In the embodiment of the application, while the user logs in to the login node of the cluster, the login node creates the custom file in the preset directory based on the ID of the user and the login process ID generated at that moment, so the cluster can obtain the ID of the user and the login process ID from the custom file. This makes it convenient to subsequently configure a resource control policy for the user according to the ID of the user and the login process ID, and to manage the available resources of the user at the login node according to the resource control policy.
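The original listing for this step is not reproduced in the text, so the following is an illustrative sketch rather than the applicant's code. It assumes, hypothetically, that the custom file holds a single line of the form "UID PID" and it rejects anything else, since the file is written by an ordinary user.

```shell
#!/bin/sh
# read_login_marker FILE
# Prints "UID PID" if FILE contains exactly two non-negative integers on its
# first line; returns a non-zero status otherwise. (Hypothetical file format.)
read_login_marker() (
    file=$1
    [ -r "$file" ] || exit 1
    # read returns non-zero when the line has no trailing newline, so accept
    # the result as long as a first field was actually assigned.
    IFS=' ' read -r uid pid rest < "$file" || [ -n "$uid" ] || exit 1
    case $uid in ''|*[!0-9]*) exit 1 ;; esac
    case $pid in ''|*[!0-9]*) exit 1 ;; esac
    [ -z "$rest" ] || exit 1
    printf '%s %s\n' "$uid" "$pid"
)
```

A caller could split the result with `set -- $(read_login_marker "$file")`, after which `$1` is the UID and `$2` is the PID.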
In one embodiment, as shown in fig. 4, step 240, configuring a resource control policy for the user according to the identifier of the user, comprises:
Step 242, determining whether a resource control group corresponding to the identifier of the user exists on the login node, where the resource control group includes a resource control policy;
Step 244, if so, configuring the resource control policy in the resource control group to the identifier of the user.
Specifically, whether a behavior of creating a preset file exists in the preset directory of the login node is monitored. If such a behavior is detected, the identifier of the user is obtained. The identifier of the user comprises the ID of the user and the ID of the login process generated when the user logs in to the login node.
Further, it is determined whether a resource control group (cgroup) corresponding to the identifier of the user exists on the login node. The resource control group includes a resource control policy, and a unique resource control group is configured for each user in advance. Resource control groups (cgroups) are a feature of the Linux kernel that is mainly used to isolate, limit and account for shared resources. Only by controlling the resources allocated to different containers can competition for the host system's resources be avoided when multiple containers run at the same time. Different resource control groups can therefore provide limiting and accounting of resources such as memory, CPU and disk I/O for different containers respectively.
Since a unique resource control group is configured for each user in advance, if a resource control group corresponding to the identifier of the user exists on the login node, the resource control policy in that group can be configured to the identifier of the user, that is, the identifier of the user is added to the corresponding resource control group. The available resources of each user at the login node can therefore be managed individually based on the resource control policy in the resource control group. This avoids the unbalanced resource allocation of the conventional method, in which all users can only be controlled as a whole and share all resources.
In the embodiment of the application, if a behavior of creating a preset file exists in the preset directory of the login node, the ID of the user and the ID of the login process generated when the user logs in to the login node are acquired. It is then determined whether a resource control group corresponding to the identifier of the user exists on the login node, and if so, the resource control policy in the resource control group is configured to the identifier of the user, that is, the user ID is added to the resource control group. Subsequently, after the IDs of all sub-processes under the login process corresponding to the user ID are obtained, the resources available on the login node to all of those sub-processes can be managed based on the resource control policy preset in the resource control group.
Therefore, resource limitation can be applied at the login node to all sub-processes corresponding to each user, and normal operation of the cluster is guaranteed.
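Adding a login process to an existing resource control group can be sketched as below. The application does not state which interface is used; this sketch assumes a cgroup v1 hierarchy with the cpu, memory and cpuset controllers under a common root, where writing a PID to cgroup.procs moves that process into the group and child processes forked afterwards start in the same group. attach_login_pid is a hypothetical name.

```shell
#!/bin/sh
# attach_login_pid ROOT GROUP PID
# Moves PID into GROUP for each controller under ROOT (cgroup v1 layout
# assumed: ROOT/cpu, ROOT/memory, ROOT/cpuset). Fails without changing
# anything if the group is missing for any controller.
attach_login_pid() (
    root=$1 group=$2 pid=$3
    for ctrl in cpu memory cpuset; do
        [ -d "$root/$ctrl/$group" ] || exit 1
    done
    for ctrl in cpu memory cpuset; do
        printf '%s\n' "$pid" > "$root/$ctrl/$group/cgroup.procs" || exit 1
    done
)
```

On a real login node ROOT would typically be /sys/fs/cgroup and the call requires root privileges. Where libcgroup is installed, `cgclassify -g cpu,memory,cpuset:GROUP PID` is a comparable command.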
In one embodiment, as shown in fig. 5, configuring a resource control policy for the user according to the identifier of the user further comprises:
Step 246, if not, creating a new resource control group, and establishing a correspondence between the identifier of the user and the new resource control group;
Step 248, configuring the resource control policy in the new resource control group to the identifier of the user.
Specifically, whether a behavior of creating a preset file exists in the preset directory of the login node is monitored. If such a behavior is detected, the identifier of the user is obtained. The identifier of the user comprises the ID of the user and the ID of the login process generated when the user logs in to the login node.
Further, it is determined whether a resource control group (cgroup) corresponding to the identifier of the user exists on the login node. If no such group exists, the user is logging in to the cluster for the first time, and therefore no resource control group corresponding to the user has been stored on the login node in advance. A new resource control group can thus be created for the user, and a correspondence between the identifier of the user and the new resource control group can be established. As before, the resource control group includes a resource control policy.
After the new resource control group has been created for the user, the resource control policy in the new group is configured to the identifier of the user, that is, the identifier of the user is added to the corresponding resource control group. The available resources of the user at the login node can therefore be managed individually based on the resource control policy in the resource control group, which avoids the unbalanced resource allocation of the conventional method, in which all users can only be controlled as a whole and share all resources.
The main steps for creating a resource control group for a user are as follows:
cgcreate -g memory,cpu,cpuset:$login_user
cgset -r cpu.cfs_quota_us=180000 -r memory.limit_in_bytes=4G -r memory.oom_control=1 -r cpuset.cpus=$cpus_set -r cpuset.mems=$mem_node $login_user
In the embodiment of the application, if a behavior of creating a preset file exists in the preset directory of the login node, the ID of the user and the ID of the login process generated when the user logs in to the login node are acquired. It is then determined whether a resource control group corresponding to the identifier of the user exists on the login node, and if so, the resource control policy in the resource control group is configured to the identifier of the user, that is, the user ID is added to the resource control group. If no such group exists, a new resource control group is created, a correspondence between the identifier of the user and the new group is established, and the resource control policy in the new group is configured to the identifier of the user. In this way a user ID can be added to a resource control group whether or not a group for that user already existed on the login node. Subsequently, after the IDs of all sub-processes under the login process corresponding to the user ID are obtained, the resources available on the login node to all of those sub-processes can be managed based on the resource control policy preset in the resource control group.
Therefore, resource limitation can be applied at the login node to all sub-processes corresponding to each user, and normal operation of the cluster is guaranteed.
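The check-then-create logic of steps 242 to 248 can be sketched as follows. Instead of calling cgcreate and cgset, this sketch writes the same parameter values as the cgset command above directly into a cgroup v1 hierarchy, which is an assumption made so that the example has no dependency on libcgroup. The layout ROOT/cpu, ROOT/memory, ROOT/cpuset and the name ensure_user_cgroup are likewise hypothetical.

```shell
#!/bin/sh
# ensure_user_cgroup ROOT GROUP CPUS MEMS
# Prints "existing" if GROUP is already present for all three controllers,
# otherwise creates it, applies the limits and prints "created".
ensure_user_cgroup() (
    root=$1 group=$2 cpus=$3 mems=$4
    if [ -d "$root/cpu/$group" ] && [ -d "$root/memory/$group" ] &&
       [ -d "$root/cpuset/$group" ]; then
        echo existing
        exit 0
    fi
    # In cgroup v1, creating a directory creates the control group.
    mkdir -p "$root/cpu/$group" "$root/memory/$group" "$root/cpuset/$group" || exit 1
    echo 180000  > "$root/cpu/$group/cpu.cfs_quota_us"        &&
    echo 4G      > "$root/memory/$group/memory.limit_in_bytes" &&
    echo 1       > "$root/memory/$group/memory.oom_control"    &&
    echo "$cpus" > "$root/cpuset/$group/cpuset.cpus"           &&
    echo "$mems" > "$root/cpuset/$group/cpuset.mems"           || exit 1
    echo created
)
```

On a real system ROOT would typically be /sys/fs/cgroup and the call requires root privileges. Assuming the kernel default cpu.cfs_period_us of 100000, a quota of 180000 allows the group at most 1.8 CPUs' worth of run time per period.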
In one embodiment, the users include a bash type user or a csh type user.
Here, bash and csh are types of shell in Linux. bash is the abbreviation of Bourne Again Shell and is the standard default shell of Linux. The C shell (csh) provides interactive features that the Bourne shell does not, such as command completion, command aliases and history substitution. However, the C shell is not compatible with the Bourne shell.
Specifically, a user may choose among different shell environments in a Linux system, and the default is generally bash. When the user executes a command or an executable script that does not specify an interpreter, the shell that interprets it into instructions the system can recognize is determined by the user's default shell. For example, the default shell of a bash type user is bash and the default shell of a csh type user is csh.
In the embodiment of the present application, the cluster resource control method provided is suitable for both bash type users and csh type users, and the present application does not limit this. Cluster resource control can therefore be performed for various kinds of users, which improves the applicability of the cluster resource control method.
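Telling the two user types apart can be sketched by inspecting the login shell field of a passwd entry. The helper names are hypothetical, and treating tcsh as a csh type shell is an assumption of this sketch, not a statement from the application.

```shell
#!/bin/sh
# passwd_shell LINE
# Prints the last colon-separated field (the login shell) of a passwd line.
passwd_shell() {
    printf '%s\n' "${1##*:}"
}

# shell_family SHELL_PATH
# Classifies a shell path as "bash", "csh" or "other" by its base name.
shell_family() {
    case ${1##*/} in
        bash)     echo bash ;;
        csh|tcsh) echo csh ;;   # assumption: tcsh counts as a csh type shell
        *)        echo other ;;
    esac
}
```

For a live account the passwd line can be fetched with `getent passwd "$user"`. On many distributions Bourne-type login shells source the *.sh files in /etc/profile.d and csh-type shells source the *.csh files, which is why a login hook is usually provided in both forms.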
In one embodiment, as shown in fig. 6, a cluster resource control method is provided, which is applied to an HPC (High Performance Computing) cluster, where the cluster includes a login node and a compute node. The method comprises:
Step 602, monitoring, through an inotify tool, whether a behavior of creating a custom file exists under a custom subdirectory contained in the /tmp directory of the login node;
Step 604, if so, obtaining, from the custom file, the ID of the user and the login process ID with which the user logs in to the login node;
Step 606, determining whether a resource control group corresponding to the identifier of the user exists on the login node, where the resource control group includes a resource control policy;
Step 608, if so, configuring the resource control policy in the resource control group to the identifier of the user;
Step 610, if not, creating a new resource control group, establishing a correspondence between the identifier of the user and the new resource control group, and configuring the resource control policy in the new resource control group to the identifier of the user;
Step 612, managing the available resources of the user at the login node according to the resource control policy.
In the embodiment of the application, the inotify tool can monitor all changes to files, so at the login node it can accurately monitor whether a behavior of creating the preset file exists in the preset directory of the login node. If such a behavior is detected, it is further determined whether a resource control group corresponding to the identifier of the user exists on the login node, where the resource control group includes a resource control policy. If so, the resource control policy in the resource control group is configured to the identifier of the user. If not, a new resource control group is created, a correspondence between the identifier of the user and the new group is established, and the resource control policy in the new group is configured to the identifier of the user. Finally, the available resources of the user at the login node can be managed based on the resource control policy in the resource control group. Resource limitation of each user at the login node is thereby realized, and normal operation of the cluster is guaranteed.
In one embodiment, as shown in fig. 7, there is provided a cluster resource control apparatus 700, wherein a cluster includes a login node, the apparatus comprising:
the monitoring module 720, configured to monitor whether a behavior of creating a preset file exists in a preset directory of the login node, wherein creating the preset file in the preset directory for environment loading is a mandatory behavior of each user in the process of logging in to the cluster;
the resource control module 740, configured to, if the behavior exists, configure a resource control policy for the user according to the identifier of the user, and manage available resources of the user at the login node according to the resource control policy; wherein the resource control policy includes an available resource quota of the user.
In one embodiment, the preset directory comprises a custom subdirectory contained in the /tmp directory.
In one embodiment, the preset file is a custom file; the monitoring module 720 is further configured to monitor, through an inotify tool, whether a behavior of creating the custom file exists under the custom subdirectory contained in the /tmp directory.
In one embodiment, as shown in fig. 8, there is provided a cluster resource control apparatus, where the identifier of the user includes the ID of the user and the login process ID with which the user logs in to the login node; the apparatus further comprises:
the ID obtaining module 730, configured to obtain, from the custom file, the ID of the user and the login process ID with which the user logs in to the login node, before the resource control policy is configured for the user according to the identifier of the user.
In one embodiment, the resource control module 740 is further configured to determine whether a resource control group corresponding to the identifier of the user exists on the login node, where the resource control group includes a resource control policy; and if so, to configure the resource control policy in the resource control group to the identifier of the user.
In one embodiment, the resource control module 740 is further configured to, if no resource control group corresponding to the identifier of the user exists, create a new resource control group and establish a correspondence between the identifier of the user and the new resource control group; and to configure the resource control policy in the new resource control group to the identifier of the user.
In one embodiment, the users include a bash type user or a csh type user.
It should be understood that, although the steps in the flowcharts in the above-described figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the above figures may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The division of each module in the cluster resource control device is only used for illustration, and in other embodiments, the cluster resource control device may be divided into different modules as needed to complete all or part of the functions of the cluster resource control device.
For specific limitations of the cluster resource control device, reference may be made to the above limitations of the cluster resource control method, which are not repeated here. The modules in the cluster resource control device can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a cluster is further provided, which includes a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to perform the steps of a cluster resource control method provided in the above embodiments.
Fig. 9 is a schematic diagram of the internal structure of a cluster in one embodiment. As shown in fig. 9, the cluster includes a processor and a memory connected by a system bus. The processor provides computing and control capability and supports the operation of the whole cluster. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the cluster resource control method provided by the above embodiments. The internal memory provides a cached execution environment for the operating system and the computer program in the non-volatile storage medium. The cluster may be any terminal device such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales), a vehicle-mounted computer, and a wearable device.
Each module in the cluster resource control device provided in the embodiment of the present application may be implemented in the form of a computer program. The computer program may run on a cluster. The program modules of the computer program may be stored in the memory of the cluster. When executed by a processor, the computer program performs the steps of the method described in the embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium: one or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the steps of the cluster resource control method.
The embodiment of the application also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the cluster resource control method.
Any reference to memory, storage, database, or other medium used by embodiments of the present application may include non-volatile and/or volatile memory. Suitable non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for cluster resource control, wherein the cluster includes a login node, the method comprising:
monitoring whether a behavior of creating a preset file exists in a preset directory of the login node, wherein creating the preset file in the preset directory is a mandatory behavior of each user in the process of logging in to the cluster;
if the behavior exists, configuring a resource control policy for the user according to the identifier of the user, and managing available resources of the user at the login node according to the resource control policy; wherein the resource control policy includes an available resource quota of the user.
2. The method according to claim 1, wherein the preset directory comprises a custom subdirectory contained in the /tmp directory.
3. The method according to claim 2, wherein the preset file is a custom file; the monitoring whether a behavior of creating a preset file exists in a preset directory of the login node includes:
monitoring, through an inotify tool, whether a behavior of creating the custom file exists under the custom subdirectory contained in the /tmp directory.
4. The cluster resource control method of claim 3, wherein the identifier of the user comprises an ID of the user and a login process ID with which the user logs in to the login node; before configuring the resource control policy for the user according to the identifier of the user, the method further includes:
acquiring, from the custom file, the ID of the user and the login process ID with which the user logs in to the login node.
5. The method according to any of claims 1-4, wherein said configuring a resource control policy for the user according to the identifier of the user comprises:
determining whether a resource control group corresponding to the identifier of the user exists on the login node, wherein the resource control group comprises a resource control policy;
if so, configuring the resource control policy in the resource control group to the identifier of the user.
6. The method according to claim 5, wherein the configuring the resource control policy for the user according to the identifier of the user further comprises:
if not, creating a new resource control group, and establishing a correspondence between the identifier of the user and the new resource control group;
configuring the resource control policy in the new resource control group to the identifier of the user.
7. The cluster resource control method of claim 1, wherein the users comprise bash type users or csh type users.
8. An apparatus for cluster resource control, wherein the cluster includes a login node, the apparatus comprising:
the monitoring module, used for monitoring whether a behavior of creating a preset file exists in a preset directory of the login node, wherein creating the preset file in the preset directory is a mandatory behavior of each user in the process of logging in to the cluster;
the resource control module, used for, if the behavior exists, configuring a resource control policy for the user according to the identifier of the user and managing available resources of the user at the login node according to the resource control policy; wherein the resource control policy includes an available resource quota of the user.
9. A cluster comprising a memory and a processor, the memory having stored thereon a computer program, characterized in that the computer program, when executed by the processor, causes the processor to carry out the steps of the cluster resource control method according to any of the claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the cluster resource control method according to any one of claims 1 to 7.
CN202110942743.0A 2021-08-17 2021-08-17 Cluster resource control method and device, cluster and computer readable storage medium Active CN113806011B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110942743.0A CN113806011B (en) 2021-08-17 2021-08-17 Cluster resource control method and device, cluster and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113806011A true CN113806011A (en) 2021-12-17
CN113806011B CN113806011B (en) 2023-12-19

Family

ID=78893673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110942743.0A Active CN113806011B (en) 2021-08-17 2021-08-17 Cluster resource control method and device, cluster and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113806011B (en)

Also Published As

Publication number Publication date
CN113806011B (en) 2023-12-19

Similar Documents

Publication Publication Date Title
CN106537338B (en) Self-expanding clouds
JP6005706B2 (en) Virtual machine morphing for heterogeneous mobile environments
US20160139949A1 (en) Virtual machine resource management system and method thereof
CN106775946B (en) Virtual machine creation method
CN101211272A (en) Dynamic virtual machine generation
CN111338779B (en) Resource allocation method, device, computer equipment and storage medium
CN113204353B (en) Big data platform assembly deployment method and device
CN104809045A (en) Operation method and device of monitoring script
CN113849266A (en) Service deployment method and device for multiple Kubernetes clusters
CN111885184A (en) Method and device for processing hot spot access keywords in high concurrency scene
CN116028163A (en) Method, device and storage medium for scheduling dynamic link library of container group
CN113760516A (en) Elastic expansion method, device, equipment and medium in multi-cloud environment
US11561843B2 (en) Automated performance tuning using workload profiling in a distributed computing environment
CN113485830A (en) Micro-service automatic capacity expansion method for power grid monitoring system
CN106330595B (en) Heartbeat detection method and device for distributed platform
CN111858020B (en) User resource limiting method and device and computer storage medium
CN111431951B (en) Data processing method, node equipment, system and storage medium
CN111159271A (en) Data processing method and device, computer equipment and storage medium
CN116450165A (en) Method, system, terminal and storage medium for quickly building environment and deploying program
CN111399999A (en) Computer resource processing method and device, readable storage medium and computer equipment
CN110806891A (en) Method and device for generating software version of embedded equipment
CN113806011B (en) Cluster resource control method and device, cluster and computer readable storage medium
CN114661427B (en) Node management method and system for computing cluster for deploying containerized application service
CN113220368B (en) Storage client resource isolation method, system, terminal and storage medium
CN112564979B (en) Execution method and device of construction task, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant