CN113806011B - Cluster resource control method and device, cluster and computer readable storage medium - Google Patents

Cluster resource control method and device, cluster and computer readable storage medium

Info

Publication number
CN113806011B
Authority
CN
China
Prior art keywords
user
resource control
cluster
login
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110942743.0A
Other languages
Chinese (zh)
Other versions
CN113806011A (en
Inventor
徐仕鑫
张涛
吕灼恒
胡梦龙
李斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Co Ltd
Original Assignee
Dawning Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Co Ltd filed Critical Dawning Information Industry Co Ltd
Priority to CN202110942743.0A priority Critical patent/CN113806011B/en
Publication of CN113806011A publication Critical patent/CN113806011A/en
Application granted granted Critical
Publication of CN113806011B publication Critical patent/CN113806011B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45575Starting, stopping, suspending or resuming virtual machine instances
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45595Network integration; Enabling network access in virtual machine instances
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The application relates to a cluster resource control method and device, a cluster, and a computer-readable storage medium. The cluster comprises a login node, and the method comprises: monitoring whether a preset file is created under a preset directory of the login node, where creating the preset file under the preset directory is a mandatory action performed during environment loading whenever a user logs in to the cluster; and, if such creation is detected, configuring a resource control policy for the user according to the identifier of the user and managing the user's available resources on the login node according to that policy, where the resource control policy includes the user's available resource quota. Resource limits on the login node can therefore be applied to every user without missing any, which ensures normal operation of the cluster.

Description

Cluster resource control method and device, cluster and computer readable storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for controlling cluster resources, a cluster, and a computer readable storage medium.
Background
A cluster is mainly composed of five types of computing devices and three types of networks. The five types of computing devices mainly refer to a management node, a login node, a computing node, a switching device, an I/O device and a storage device. The login node serves as the gateway through which users access the cluster system, and the computing nodes are the computing core of the whole cluster. Users typically perform only simple compilation on the login node and submit computing programs to the computing nodes, which run them.
After logging in to the cluster, some users who are unfamiliar with cluster operation mistakenly run computing programs on the login node, and some malicious users deliberately run computing programs on the login node to avoid the cost incurred by running them on the computing nodes. Computing programs generally occupy substantial resources, so running one on the login node drives the load of the login node too high and affects normal operation of the cluster system.
Conventional methods usually limit only the resources consumed collectively by all users, and it is difficult to limit the available resources of each individual user. Even when per-user resource limits are recorded in a configuration file, some users are frequently missed when the limits are applied, because user login behavior is difficult to monitor accurately.
Disclosure of Invention
The embodiments of the application provide a cluster resource control method and device, a cluster, and a computer-readable storage medium, which can apply resource limits on the login node to every user without missing any user, and thereby ensure normal operation of the cluster.
In one embodiment, a method for controlling cluster resources is provided, where the cluster includes a login node, and the method includes:
monitoring whether a preset file is created under a preset directory of the login node, wherein creating the preset file under the preset directory is a mandatory action for each user in the process of logging in to the cluster;
if so, configuring a resource control policy for the user according to the identifier of the user, and managing the available resources of the user on the login node according to the resource control policy; wherein the resource control policy includes an available resource quota for the user.
In the embodiments of the application, creating the preset file under the preset directory is a mandatory action that must be executed when the login node performs environment loading as each user logs in to the cluster. Monitoring whether a file is created under the preset directory of the login node therefore monitors user logins, and no user's login is missed. When creation of the preset file under the preset directory is detected, a resource control policy is configured for the user according to the identifier of the user, and the user's available resources on the login node are managed according to that policy. Resource limits on the login node can thus be applied to every user without missing any, which ensures normal operation of the cluster.
In one embodiment, the preset directory includes a custom subdirectory contained in the /tmp directory.
In the embodiments of the application, when an ordinary user logs in to a cluster running a Linux system, environment variables are loaded from the /etc/profile.d directory, and a temporary file is created under the preset directory based on the loaded environment variables to complete environment loading. By monitoring whether the preset file is created under the custom subdirectory of the /tmp directory on the login node, the login of each user can be accurately detected and no login is missed. Resource management can then be performed per user, which avoids the situation where a single user maliciously runs a complex computing program on the login node, or mistakenly runs one there through unfamiliarity with cluster operation, so that the login node is overloaded for a long time, the cluster hangs, and normal operation of the cluster is affected.
In one embodiment, the preset file is a custom file; the monitoring whether the behavior of creating the preset file exists in the preset directory of the login node comprises the following steps:
monitoring, through an inotify tool, whether the custom file is created under the custom subdirectory contained in the /tmp directory.
In the embodiments of the application, since the inotify tool can monitor all changes to files, running it on the login node makes it possible to detect accurately whether the preset file is created under the preset directory of the login node. When such creation is detected, a resource control policy is configured for the user according to the identifier of the user, and the user's available resources on the login node are managed according to that policy. Resource limits on the login node are thus applied to every user without missing any, which ensures normal operation of the cluster.
In one embodiment, the identification of the user includes an ID of the user and a login process ID of the user logging in the login node; before the resource control strategy is configured for the user according to the identification of the user, the method further comprises the following steps:
and acquiring the ID of the user and the login process ID of the user for logging in the login node from the custom file.
In the embodiments of the present application, when a user logs in to a login node of the cluster, the login node creates the custom file under the preset directory based on the ID of the user and the login process ID generated when the user logs in at the login node, so the cluster can obtain both IDs from the custom file. This makes it convenient to subsequently configure a resource control policy for the user based on the user's ID and login process ID, and to manage the user's available resources on the login node according to that policy.
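As an illustration of reading the two IDs back from the custom file, the following is a minimal sketch. The text does not specify the file's content, so the `uid=<n> pid=<n>` layout, the `LoginIdentity` type, and the `parse_marker` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginIdentity:
    uid: int        # ID of the user
    login_pid: int  # ID of the user's login process on the login node


def parse_marker(text: str) -> LoginIdentity:
    """Parse marker content in the assumed 'uid=<n> pid=<n>' layout."""
    # Split on whitespace, then split each 'key=value' token once.
    fields = dict(item.split("=", 1) for item in text.split())
    return LoginIdentity(uid=int(fields["uid"]), login_pid=int(fields["pid"]))
```

For example, `parse_marker("uid=1001 pid=4242\n")` yields an identity with `uid` 1001 and `login_pid` 4242.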
In one embodiment, the configuring the resource control policy for the user according to the identification of the user includes:
judging whether a resource control group corresponding to the user identifier exists on the login node or not, wherein the resource control group comprises a resource control strategy;
if so, configuring the resource control strategy in the resource control group to the identification of the user.
In the embodiments of the application, when creation of the preset file under the preset directory of the login node is detected, the user ID of the user and the ID of the login process generated when the user logged in to the login node are obtained. It is then determined whether a resource control group corresponding to the identifier of the user exists on the login node; if so, the resource control policy in that group is configured for the identifier of the user, that is, the user ID is added to the resource control group. Subsequently, the IDs of all sub-processes under the login process ID corresponding to the user ID are obtained, and the available resources of all those sub-processes on the login node are managed based on the resource control policy preset in the resource control group.
Therefore, resource restriction can be carried out on all sub-processes corresponding to each user at the login node, and normal operation of the cluster is ensured.
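The step of collecting all sub-processes under a login process can be sketched as a breadth-first walk over a process table. This is an illustrative sketch only: the `descendant_pids` function and the `parent_of` mapping (PID to parent PID) are hypothetical names, and how the mapping is built (on Linux it could, for instance, be read from `/proc`) is left as an assumption.

```python
from collections import defaultdict, deque


def descendant_pids(login_pid, parent_of):
    """Return every PID below login_pid, given a {pid: parent_pid} mapping."""
    # Invert the mapping so each parent lists its direct children.
    children = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)

    found = []
    queue = deque([login_pid])
    while queue:
        current = queue.popleft()
        for child in sorted(children.get(current, [])):
            found.append(child)
            queue.append(child)  # visit grandchildren on a later pass
    return found
```

With the table `{100: 1, 101: 100, 102: 100, 103: 101, 200: 1}`, the sub-processes of login process 100 are 101, 102 and 103; process 200 belongs to a different login and is not included.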
In one embodiment, the configuring the resource control policy for the user according to the identification of the user further includes:
if not, a new resource control group is created, and a corresponding relation between the user identification and the new resource control group is established;
and configuring the resource control strategy in the new resource control group to the identification of the user.
In the embodiments of the application, when creation of the preset file under the preset directory of the login node is detected, the user ID of the user and the ID of the login process generated when the user logged in to the login node are obtained. It is then determined whether a resource control group corresponding to the identifier of the user exists on the login node. If so, the resource control policy in that group is configured for the identifier of the user, that is, the user ID is added to the resource control group. If not, a new resource control group is created, a correspondence between the identifier of the user and the new resource control group is established, and the resource control policy in the new group is configured for the identifier of the user. In this way the user ID is added to a resource control group whether or not a group for that user already existed on the login node. Subsequently, the IDs of all sub-processes under the login process ID corresponding to the user ID are obtained, and the available resources of all those sub-processes on the login node are managed based on the resource control policy preset in the resource control group.
Therefore, resource restriction can be carried out on all sub-processes corresponding to each user at the login node, and normal operation of the cluster is ensured.
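The check-then-create flow for resource control groups can be sketched against a directory tree that stands in for a cgroup hierarchy. This is a hedged sketch rather than the method of the application: the `ensure_user_group` function and the `user_<uid>` directory naming are hypothetical, and the file names `cgroup.procs` and `memory.max` follow the cgroup v2 interface, which is an assumption because the text does not state which cgroup version is used. On a real cgroup filesystem the kernel creates these files itself and additional setup (such as enabling controllers in the parent group) may be required.

```python
from pathlib import Path


def ensure_user_group(cgroup_root, uid, pids, memory_max_bytes):
    """Attach pids to the user's group, creating the group if needed.

    Returns True when a new group had to be created.
    """
    group = Path(cgroup_root) / f"user_{uid}"  # hypothetical naming scheme
    created = not group.is_dir()
    if created:
        # No group for this user yet: create one and set its quota.
        group.mkdir(parents=True)
        (group / "memory.max").write_text(f"{memory_max_bytes}\n")
    for pid in pids:
        # One PID per write, as the cgroup.procs interface expects.
        with open(group / "cgroup.procs", "a") as procs:
            procs.write(f"{pid}\n")
    return created
```

Calling the function a second time for the same user reuses the existing group and only appends the new process IDs, mirroring the "if so" and "if not" branches described above.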
In one embodiment, the user comprises a bash type user or a csh type user.
In the embodiment of the present application, the provided cluster resource control method is applicable to a bash type user or a csh type user, which is not limited in this application. Therefore, cluster resource control can be performed for various users, and the applicability of the cluster resource control method is improved.
In one embodiment, a cluster resource control device is provided, where the cluster includes a login node, and the device includes:
the monitoring module is used for monitoring whether a behavior of creating a preset file exists under the preset directory of the login node; creating a preset file under a preset directory as mandatory behavior of each user in the process of logging in the cluster;
the resource control module is used for configuring a resource control strategy for the user according to the user identification and managing available resources of the user at the login node according to the resource control strategy if the user identification exists; wherein the resource control policy includes an available resource quota for the user.
A cluster comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to perform the steps of the cluster resource control method as described above.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the cluster resource control method as described above.
In the cluster resource control method and device, the cluster, and the computer readable storage medium described above, the cluster comprises a login node, and the method comprises: monitoring whether a preset file is created under a preset directory of the login node, where creating the preset file under the preset directory is a mandatory action performed during environment loading whenever a user logs in to the cluster; and, if such creation is detected, configuring a resource control policy for the user according to the identifier of the user and managing the user's available resources on the login node according to that policy, where the resource control policy includes the user's available resource quota.
Creating the preset file under the preset directory is a mandatory action that must be executed when the login node performs environment loading as each user logs in to the cluster. Monitoring whether a file is created under the preset directory of the login node therefore monitors user logins, and no user's login is missed. When creation of the preset file under the preset directory is detected, a resource control policy is configured for the user according to the identifier of the user, and the user's available resources on the login node are managed according to that policy. Resource limits on the login node can thus be applied to every user without missing any, which ensures normal operation of the cluster.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of an application environment for a cluster resource control method in one embodiment;
FIG. 2 is a flow diagram of a method of cluster resource control in one embodiment;
FIG. 3 is a flow chart of a method of cluster resource control in another embodiment;
FIG. 4 is a flow diagram of a method of configuring a resource control policy for a user based on the identity of the user, in one embodiment;
FIG. 5 is a flow chart of a method of configuring a resource control policy for a user based on the identity of the user in another embodiment;
FIG. 6 is a flow chart of a method of cluster resource control in one embodiment;
FIG. 7 is a block diagram of a cluster resource control device in one embodiment;
FIG. 8 is a block diagram illustrating a cluster resource control device in accordance with another embodiment;
FIG. 9 is a schematic diagram of an internal structure of a cluster in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
A cluster is mainly composed of five types of computing devices and three types of networks. The five types of computing devices mainly refer to a management node, a login node, a computing node, a switching device, an I/O device and a storage device. The login node serves as the gateway through which users access the cluster system, and the computing nodes are the computing core of the whole cluster. Users typically perform only simple compilation on the login node and submit computing programs to the computing nodes, which run them. Large-scale compilation and parallel compilation cannot be performed on the login node; they are performed on the computing nodes.
After logging in to the cluster, some users who are unfamiliar with cluster operation mistakenly run computing programs on the login node, and some malicious users deliberately run computing programs on the login node to avoid the cost incurred by running them on the computing nodes. Computing programs generally occupy substantial resources, so running one on the login node drives the load of the login node too high, which affects both the normal use of the node by other logged-in users and the normal operation of the cluster system.
To solve the above problem, conventional methods limit users' available resources on the login node through cgroup technology, where cgroup is a Linux kernel feature used to limit, control, and isolate the resources (such as CPU, memory, and disk input/output) of a process group. However, cgroup is mainly used there to limit the resources shared by users on the login node, that is, the resources consumed collectively by all users, and it is difficult to limit the available resources of each individual user. Limiting the resource allocation of a single user by command has drawbacks; for example, the limit does not take effect for the first login session after the command is enabled. Limiting the resource allocation of a single user through a configuration file frequently misses some users.
Therefore, the embodiment of the application provides a cluster resource control method, which can prevent users from being missed when each user performs resource restriction on a login node, so that normal operation of a cluster is ensured.
Fig. 1 is an application scenario diagram of a cluster resource control method in one embodiment. As shown in fig. 1, the application environment includes a cluster 100, and the cluster 100 includes a login node 120 and a computing node 140. The user logs in to the cluster through the login node from the terminal 160 at which the user is located, and performs simple compilation at the login node. Alternatively, the user establishes a communication connection with the computing node through the login node, and performs complex compilation or other operations on the computing node. Here, the cluster 100 may be an HPC (High Performance Computing) cluster, although this application is not limited thereto. The login node 120 and the computing node 140 each comprise a plurality of computer devices (not shown).
FIG. 2 is a flow chart of a method of cluster resource control in one embodiment. The cluster resource control method in this embodiment is described taking the cluster 100 in fig. 1 as an example. The method includes the following steps 220 through 240, wherein,
Step 220, monitoring whether a preset file is created under a preset directory of the login node, where creating the preset file under the preset directory is a mandatory action performed during environment loading when each user logs in to the cluster.
The login node serves as the gateway through which a user accesses the cluster system, so a user logging in to the cluster means the user logging in to the login node of the cluster. In general, a user first needs to perform environment loading in the process of logging in to a cluster. In the embodiments of the present application, a custom step of creating a preset file under a preset directory is added to the environment loading process, and the preset directory is a directory used for creating temporary cache files. Therefore, whenever a user logs in to the cluster and performs environment loading, the preset file is necessarily created under the preset directory.
Because creating the preset file under the preset directory is a mandatory action during environment loading, it is a step that every user must pass through when logging in to the cluster. Therefore, by monitoring whether the preset file is created under the preset directory of the login node, each user's login to the cluster can be accurately detected.
Step 240, if so, configuring a resource control strategy for the user according to the user identification, and managing available resources of the user at the login node according to the resource control strategy; wherein the resource control policy includes a quota of available resources for the user.
Specifically, when creation of the preset file under the preset directory of the login node is detected, a corresponding resource control policy is configured for the user according to the identifier of the user. The identifier of the user comprises the ID of the user and the login process ID with which the user logged in to the login node. The resource control policy specifies which resources the user may use and how they may be used. For example, the resource control policy sets a preset type of cluster resource that the user may use at the login node, a preset usage limit for that resource, and optionally a preset time during which the user may use it. The preset type of cluster resource comprises at least one of CPU, memory, disk IO and the like; the preset usage limit comprises at least one of a maximum CPU utilization of 80%, a maximum memory footprint of 200G, a maximum of 8 CPUs and the like; and the preset time comprises, for example, 10 p.m. to 12 midnight as the period in which the user may use the cluster resource. The application is not limited to these examples.
Because the resource control policy sets the resource which can be used by the user and the policy how to use the resource, the available resource of the user on the login node can be managed based on the resource control policy. That is, it manages the resources that the user can use at the login node and how to use the resources.
For example, the resource control policy sets a preset type of cluster resource that the user may use at the login node, a preset usage limit for that resource, and optionally a preset time during which the user may use it. The user is then restricted to resources of the preset type at the login node, the amount of the preset resource occupied by the user is kept within the preset usage limit, and the user's use of the preset resource is confined to the preset time range.
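A resource control policy of this kind can be sketched as a small record plus a check. This is an illustrative sketch under stated assumptions: the `ResourcePolicy` type and `is_allowed` function are hypothetical, the default values are the example figures from the text (80% CPU utilization, 200G of memory interpreted here as GiB, 8 CPUs, 10 p.m. to midnight), and actual enforcement on a login node would be done by the kernel's resource control groups rather than by a check like this.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class ResourcePolicy:
    max_cpu_percent: float = 80.0           # maximum CPU utilization
    max_memory_gib: int = 200               # "200G" read as GiB (assumption)
    max_cpus: int = 8                       # maximum number of CPUs
    window_start: time = time(22, 0)        # 10 p.m.
    window_end: time = time(23, 59, 59)     # up to midnight


def is_allowed(policy, cpu_percent, memory_gib, cpus, now):
    """Return True if the usage fits the quota and falls in the time window."""
    in_window = policy.window_start <= now <= policy.window_end
    within_quota = (
        cpu_percent <= policy.max_cpu_percent
        and memory_gib <= policy.max_memory_gib
        and cpus <= policy.max_cpus
    )
    return in_window and within_quota
```

Under the default policy, a request for 50% CPU, 64 GiB and 4 CPUs at 22:30 is allowed, while the same request at 09:00 or with 90% CPU is rejected.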
The main steps for managing the available resources of the user at the login node are as follows:
In the embodiments of the application, creating the preset file under the preset directory is a mandatory action that must be executed when the login node performs environment loading as each user logs in to the cluster. Monitoring whether a file is created under the preset directory of the login node therefore monitors user logins, and no user's login is missed. When creation of the preset file under the preset directory is detected, a resource control policy is configured for the user according to the identifier of the user, and the user's available resources on the login node are managed according to that policy. Resource limits on the login node can thus be applied to every user without missing any, which ensures normal operation of the cluster.
In one embodiment, the preset directory includes a custom subdirectory contained in the /tmp directory.
The /etc/profile.d directory in a Linux system contains the environment variables that a user needs for environment loading. In a Linux system, an ordinary user logging in to the cluster must load the environment variables under the /etc/profile.d directory, and a preset file is created under a preset directory based on the loaded environment variables to complete environment loading. The preset directory is a custom subdirectory contained in the /tmp directory, and the preset file created under that custom subdirectory is a temporary file.
Specifically, the operating system adopted in the cluster is a Linux system, and whenever an ordinary user logs in to the cluster, every login node must load the environment variables under the /etc/profile.d directory. The user can therefore be forced to create a temporary file under the custom subdirectory contained in the /tmp directory while the environment variables are being loaded.
In the embodiments of the application, an ordinary user logging in to a cluster running a Linux system must load the environment variables under the /etc/profile.d directory, so the user can be forced to create a temporary file under the custom subdirectory contained in the /tmp directory while the environment variables are being loaded. By monitoring whether the temporary file is created under that custom subdirectory on the login node, the login of each user can be accurately detected and no login is missed. Resource management can then be performed per user, which avoids the situation where a single user maliciously runs a complex computing program on the login node, or mistakenly runs one there through unfamiliarity with cluster operation, so that the login node is overloaded for a long time, the cluster hangs, and normal operation of the cluster is affected.
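The forced file creation can be sketched as a small login hook. This is only an illustration: in the scheme described here the hook would run as part of the shell environment loaded from /etc/profile.d, whereas this sketch uses Python, and the `create_login_marker` function, the `login.<uid>.<pid>` file name, and the `uid=<n> pid=<n>` content are all hypothetical choices.

```python
from pathlib import Path


def create_login_marker(watch_dir, uid, login_pid):
    """Create a marker file for one login under the monitored directory."""
    watch_dir = Path(watch_dir)
    # The custom subdirectory (for example under /tmp) may not exist yet.
    watch_dir.mkdir(parents=True, exist_ok=True)
    marker = watch_dir / f"login.{uid}.{login_pid}"
    # Record the user ID and login process ID so a monitor can read them.
    marker.write_text(f"uid={uid} pid={login_pid}\n")
    return marker
```

A real hook would typically pass the current user's ID and the login shell's process ID, so that each login produces its own file creation event in the monitored subdirectory.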
In one embodiment, the preset file is a custom file; monitoring whether a behavior of creating a preset file exists in a preset directory of a login node comprises the following steps:
monitoring, through the inotify tool, whether a custom file is created under the custom subdirectory of the /tmp directory.
The inotify tool, also called the inotify file monitoring tool, is a powerful, fine-grained, asynchronous file system monitoring mechanism that can meet a wide range of file monitoring requirements. It can monitor access, read/write, and permission attribute changes as well as deletion, creation, movement, and other file system operations, which covers almost every change to a file. inotify-tools is a C library for the inotify file monitoring mechanism on Linux, together with a set of command-line tools that can be used to monitor file system events. Because inotify-tools is written in C, it depends on no software other than kernel support.
Specifically, when an ordinary user logs in to the cluster under Linux, the environment variables are loaded and a temporary file is created under the /tmp directory as part of environment loading. The temporary file is a custom file whose content is not limited. Therefore, on the login node, the inotify tool monitors whether a custom file is created under the custom subdirectory of /tmp, so that every user login to the cluster is detected accurately and no login is missed.
The temporary files are custom files, for example the sshcgroup.sh file and the sshcgroup.csh file, and this application does not limit them. These two files are created under the custom subdirectory of /tmp during environment loading when each user logs in to the cluster.
The main step of monitoring, through the inotify tool, whether a custom file is created under the custom subdirectory of /tmp is as follows:
inotifywait -mq --format '%f' -e create "$cgtmp_path" | while read -r file
In this embodiment, because the inotify tool can monitor file changes, it can accurately monitor, on the login node, whether a preset file is created under the preset directory. If such a creation is detected, a resource control policy is configured for the user according to the user's identifier, and the resources available to the user on the login node are managed according to that policy. Resource limits are thus applied to every user on the login node with no user missed, which ensures the normal operation of the cluster.
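The monitoring step can be fleshed out around the `inotifywait` pipeline from the source. The sketch below is illustrative only: `is_login_marker`, `on_create`, and `watch_logins` are hypothetical names, the `login.*` naming rule is an assumption, and the handler merely reports the event instead of configuring a policy.

```shell
# cgtmp_path is the variable used in the source command; the default is assumed.
cgtmp_path="${cgtmp_path:-/tmp/sshcgroup}"

# Decide whether a newly created file name looks like a login marker.
is_login_marker() {
    case "$1" in
        login.*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Handle one "create" event; here it only reports the marker file.
on_create() {
    if is_login_marker "$1"; then
        printf 'login detected: %s\n' "$1"
    fi
}

# Blocks indefinitely; requires inotifywait from inotify-tools.
# -m keeps monitoring, -q suppresses status output, --format '%f' prints
# only the file name, -e create restricts events to file creation.
watch_logins() {
    inotifywait -mq --format '%f' -e create "$cgtmp_path" |
    while read -r file; do
        on_create "$file"
    done
}
```

Keeping the per-event logic in `on_create` means it can be exercised without a running watcher, for example `on_create login.1000.4242`.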
In one embodiment, as shown in FIG. 3, the identifier of the user includes the user's ID and the ID of the login process created when the user logs in to the login node; before the resource control policy is configured for the user according to the user's identifier, the method further comprises:
Step 230, obtaining, from the custom file, the user's ID and the login process ID of the user's login to the login node.
The identifier of the user includes the user's ID and the login process ID. Specifically, the preset directory of the login node is monitored for the creation of a custom file; if such a creation is detected, the user's ID and the login process ID are obtained from the custom file. Because the login node creates the custom file under the preset directory based on the user's ID and the login process ID generated at login, the cluster can obtain both IDs from the custom file.
The user ID (identification) is the unique identifier of a user who logs in to a login node of the cluster through a terminal; it distinguishes different users and can be represented by a UID. The login process ID identifies the login process generated when the user logs in at the login node and can be represented by a PID. From the login process ID, the IDs of all child processes of the login process can be obtained.
The main steps of the cluster for acquiring the UID and the PID from the custom file are as follows:
In this embodiment, when a user logs in to a login node of the cluster, the login node creates the custom file under the preset directory based on the user's ID and the login process ID generated at login, so the cluster can obtain both IDs from the custom file. The resource control policy can then be configured for the user according to these two IDs, and the resources available to the user on the login node can be managed according to that policy.
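The source does not reproduce the commands that read the UID and PID back from the custom file. As a purely hypothetical sketch, assuming the file's first line has the form `<uid> <pid>` (a layout the source does not specify), the extraction and a basic sanity check could look like this; `read_login_ids` is an illustrative name.

```shell
# Read "<uid> <pid>" from the first line of a marker file (assumed layout)
# and print both values, rejecting anything that is not purely numeric.
read_login_ids() (
    # $1: path to the marker file
    read -r uid pid < "$1" || exit 1
    # Both fields must be present.
    [ -n "$uid" ] && [ -n "$pid" ] || exit 1
    # Neither field may contain a non-digit character.
    case "$uid$pid" in
        *[!0-9]*) exit 1 ;;
    esac
    printf 'UID=%s PID=%s\n' "$uid" "$pid"
)
```

Validating the values matters because /tmp is writable by ordinary users, so the watcher should not trust the file content blindly.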
In one embodiment, as shown in FIG. 4, step 240, configuring a resource control policy for the user according to the user's identifier, includes:
Step 242, determining whether a resource control group corresponding to the user's identifier exists on the login node, where the resource control group contains a resource control policy;
Step 244, if so, applying the resource control policy in the resource control group to the user's identifier.
Specifically, the preset directory of the login node is monitored for the creation of a preset file. If such a creation is detected, the user's identifier is obtained. The identifier includes the user ID and the ID of the login process generated when the user logs in to the login node.
Further, it is determined whether a resource control group (cgroup) corresponding to the user's identifier exists on the login node. The resource control group contains a resource control policy, and a unique resource control group is configured for each user in advance. Resource control groups (cgroups) are a Linux kernel feature used mainly to isolate, limit, and account for shared resources. Controlling the resources allocated to different containers keeps them from competing for host resources when several containers run at the same time. Different resource control groups can thus limit and account for the memory, CPU, disk I/O, and other resources of different containers separately.
Because a unique resource control group is configured for each user in advance, if a resource control group corresponding to the user's identifier exists on the login node, the resource control policy in that group can be applied to the user's identifier, that is, the identifier is added to the corresponding resource control group. The resources available to the user on the login node can then be managed individually under that policy, which avoids the unbalanced resource allocation of the traditional approach, where all users can only be made to share all resources.
In this embodiment, if the creation of a preset file under the preset directory of the login node is detected, the user ID and the ID of the login process generated when the user logs in to the login node are obtained. It is then determined whether a resource control group corresponding to the user's identifier exists on the login node; if so, the resource control policy in that group is applied to the user's identifier, that is, the user ID is added to the resource control group. Subsequently, once the IDs of all child processes of the login process corresponding to the user ID have been obtained, the resources of the login node available to all of those child processes can be managed under the policy preset in the resource control group.
Therefore, resource restriction can be carried out on all sub-processes corresponding to each user at the login node, and normal operation of the cluster is ensured.
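The "group exists" branch can be illustrated at the file system level. This sketch assumes a cgroup v1 hierarchy with one directory per controller and per user, and uses the kernel's `cgroup.procs` file to move a process into a group; the names `user_cgroup_exists` and `attach_login_pid` and the overridable `cgroup_root` are hypothetical, and the source itself does not show this step.

```shell
# On a cgroup v1 host this would normally be /sys/fs/cgroup; it is left
# overridable so the sketch can be tried against a scratch directory.
cgroup_root="${cgroup_root:-/sys/fs/cgroup}"
controllers="memory cpu cpuset"

# Succeeds only if the user's group exists under every controller.
user_cgroup_exists() {
    for c in $controllers; do
        [ -d "$cgroup_root/$c/$1" ] || return 1
    done
    return 0
}

# Move one process (by PID) into the user's existing group.
# $1: group name (assumed to be the user name), $2: login process ID
attach_login_pid() {
    user_cgroup_exists "$1" || return 1
    for c in $controllers; do
        printf '%s\n' "$2" > "$cgroup_root/$c/$1/cgroup.procs" || return 1
    done
}
```

Processes forked after the move start in their parent's cgroup, which is why attaching the login process is enough to cover the commands the user runs later in that session.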
In one embodiment, as shown in FIG. 5, configuring the resource control policy for the user according to the user's identifier further includes:
Step 246, if not, creating a new resource control group and establishing a correspondence between the user's identifier and the new resource control group;
Step 248, applying the resource control policy in the new resource control group to the user's identifier.
Specifically, the preset directory of the login node is monitored for the creation of a preset file. If such a creation is detected, the user's identifier is obtained. The identifier includes the user ID and the ID of the login process generated when the user logs in to the login node.
Further, it is determined whether a resource control group (cgroup) corresponding to the user's identifier exists on the login node. If no such group exists, the user is logging in to the cluster for the first time, so the login node has not stored a resource control group for the user in advance. A new resource control group can therefore be created for the user, and a correspondence between the user's identifier and the new group is established. As before, the resource control group contains a resource control policy.
After the new resource control group has been created for the user, the resource control policy in the new group is applied to the user's identifier, that is, the identifier is added to the corresponding resource control group. The resources available to the user on the login node can then be managed individually under that policy, which avoids the unbalanced resource allocation of the traditional approach, where all users can only be made to share all resources.
The main steps for creating the resource control group for the user are as follows:
cgcreate -g memory,cpu,cpuset:$login_user
cgset -r cpu.cfs_quota_us=180000 -r memory.limit_in_bytes=4G -r memory.oom_control=1 -r cpuset.cpus=$cpus_set -r cpuset.mems=$mem_node $login_user
In this embodiment, if the creation of a preset file under the preset directory of the login node is detected, the user ID and the ID of the login process generated when the user logs in to the login node are obtained. It is then determined whether a resource control group corresponding to the user's identifier exists on the login node; if so, the resource control policy in that group is applied to the user's identifier, that is, the user ID is added to the resource control group. If not, a new resource control group is created, a correspondence between the user's identifier and the new group is established, and the resource control policy in the new group is applied to the user's identifier. The user ID can thus be added to a resource control group whether or not a group for that user already exists on the login node. Subsequently, once the IDs of all child processes of the login process corresponding to the user ID have been obtained, the resources of the login node available to all of those child processes can be managed under the policy preset in the resource control group.
Therefore, resource restriction can be carried out on all sub-processes corresponding to each user at the login node, and normal operation of the cluster is ensured.
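The numeric limits in the `cgset` command above can be read as follows. With the cgroup v1 CFS bandwidth controller, a group may run for `cpu.cfs_quota_us` microseconds in every `cpu.cfs_period_us` microseconds, and the kernel default period is 100000, so a quota of 180000 corresponds to 1.8 CPU cores if the period is left at its default (the source does not set it). The helper names below are hypothetical.

```shell
# Kernel default for cpu.cfs_period_us (microseconds).
cfs_period_us=100000

# Convert a CPU allowance in percent of one core (180 = 1.8 cores)
# into a value for cpu.cfs_quota_us.
cfs_quota_for_percent() {
    echo $(( $cfs_period_us * $1 / 100 ))
}

# Convert GiB into bytes, the unit in which memory.limit_in_bytes is stored.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

cfs_quota_for_percent 180   # prints 180000
gib_to_bytes 4              # prints 4294967296
```

In cgroup v1, setting `memory.oom_control` to 1 disables the OOM killer for the group, so a process that reaches the memory limit is paused rather than killed.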
In one embodiment, the user comprises a bash type user or a csh type user.
bash and csh are both Linux shells. bash, short for Bourne Again Shell, is the standard default shell on Linux. The C shell (csh) offers interactive features that the Bourne shell lacks, such as command completion, command aliases, and history substitution, but it is not compatible with the Bourne shell.
Specifically, a Linux user can choose among shell environments, and the usual default is bash. When the user runs a command or an executable script that does not name its own interpreter, the user's default shell determines the environment that interprets it into instructions the system can recognize. For example, the default shell of a bash-type user is bash, and the default shell of a csh-type user is csh.
In the embodiment of the present application, the provided cluster resource control method is applicable to a bash type user or a csh type user, which is not limited in this application. Therefore, cluster resource control can be performed for various users, and the applicability of the cluster resource control method is improved.
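One way to picture support for both user types is a lookup from the user's login shell to the matching custom file. The source names the two files, sshcgroup.sh and sshcgroup.csh, and the two user types, but pairing them by extension is an assumption here, as is the function name `hook_for_shell`.

```shell
# Map a login shell path to the custom file assumed to belong to it.
# Returns non-zero for shells the sketch does not cover.
hook_for_shell() {
    # ${1##*/} strips the directory part, e.g. /bin/bash -> bash
    case "${1##*/}" in
        bash) echo "sshcgroup.sh" ;;
        csh)  echo "sshcgroup.csh" ;;
        *)    return 1 ;;
    esac
}
```

For example, `hook_for_shell "$SHELL"` would select the file for the current user, assuming `SHELL` holds the path of the login shell.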
In a specific embodiment, as shown in FIG. 6, a cluster resource control method is provided for an HPC (High Performance Computing) cluster that includes a login node and a computing node. The method comprises:
Step 602, monitoring, through the inotify tool, whether a custom file is created under the custom subdirectory of the /tmp directory of the login node;
Step 604, if so, obtaining from the custom file the user's ID and the login process ID of the user's login to the login node;
Step 606, determining whether a resource control group corresponding to the user's identifier exists on the login node, where the resource control group contains a resource control policy;
Step 608, if so, applying the resource control policy in the resource control group to the user's identifier;
Step 610, if not, creating a new resource control group, establishing a correspondence between the user's identifier and the new resource control group, and applying the resource control policy in the new resource control group to the user's identifier;
Step 612, managing the resources available to the user on the login node according to the resource control policy.
In this embodiment, because the inotify tool can monitor all changes to a file, it can accurately monitor, on the login node, whether a preset file is created under the preset directory. If such a creation is detected, it is further determined whether a resource control group corresponding to the user's identifier exists on the login node, where the resource control group contains a resource control policy. If so, the resource control policy in the resource control group is applied to the user's identifier. If not, a new resource control group is created, a correspondence between the user's identifier and the new group is established, and the resource control policy in the new group is applied to the user's identifier. Finally, the resources available to the user on the login node are managed under the resource control policy in the resource control group. Resource limits are thus applied to each user on the login node, which ensures the normal operation of the cluster.
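Steps 606 to 612 can be summarised as a dry run that prints the libcgroup commands for one login instead of executing them. This is a sketch under assumptions: the group is named after the user, existence is checked through the cgroup v1 memory hierarchy, the defaults for `cpus_set` and `mem_node` are invented, and the use of `cgclassify` for the final step is not shown in the source.

```shell
cgroup_root="${cgroup_root:-/sys/fs/cgroup}"
cpus_set="${cpus_set:-0-1}"   # assumed CPU list for cpuset.cpus
mem_node="${mem_node:-0}"     # assumed NUMA node for cpuset.mems

# Print, without running them, the commands for one detected login.
# $1: user name (used as the group name), $2: login process ID
plan_for_login() {
    if [ ! -d "$cgroup_root/memory/$1" ]; then
        # Step 610: no group yet, so create one and set its limits.
        echo "cgcreate -g memory,cpu,cpuset:$1"
        echo "cgset -r cpu.cfs_quota_us=180000 -r memory.limit_in_bytes=4G -r cpuset.cpus=$cpus_set -r cpuset.mems=$mem_node $1"
    fi
    # Steps 608 and 612: place the login process under the group's limits.
    echo "cgclassify -g memory,cpu,cpuset:$1 $2"
}
```

Printing the plan first makes it possible to review what would be changed for a user before any cgroup is touched.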
In one embodiment, as shown in fig. 7, there is provided a cluster resource control device 700, wherein a cluster includes a login node, the device comprising:
The monitoring module 720 is configured to monitor whether a preset file is created under a preset directory of the login node, where creating the preset file under the preset directory is a mandatory action performed during environment loading when each user logs in to the cluster;
The resource control module 740 is configured to, if such a creation is detected, configure a resource control policy for the user according to the user's identifier and manage the resources available to the user on the login node according to that policy, where the resource control policy includes the user's available resource quota.
In one embodiment, the preset directory includes a custom subdirectory contained in the /tmp directory.
In one embodiment, the preset file is a custom file; the monitoring module 720 is further configured to monitor, through the inotify tool, whether a custom file is created under the custom subdirectory of the /tmp directory.
In one embodiment, as shown in FIG. 8, a cluster resource control device is provided, where the identifier of the user includes the user's ID and the login process ID of the user's login to the login node; the device further comprises:
The ID obtaining module 730, configured to obtain, from the custom file, the user's ID and the login process ID of the user's login to the login node before the resource control policy is configured for the user according to the user's identifier.
In one embodiment, the resource control module 740 is further configured to determine whether a resource control group corresponding to the identifier of the user exists on the login node, where the resource control group includes a resource control policy; if so, the resource control policy in the resource control group is configured to the identity of the user.
In one embodiment, the resource control module 740 is further configured to create a new resource control group if the new resource control group does not exist, and establish a correspondence between the identifier of the user and the new resource control group; the resource control policies in the new resource control group are configured to the identity of the user.
In one embodiment, the user comprises a bash type user or a csh type user.
It should be understood that, although the steps in the flowcharts of the above figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to that order and may be performed in other orders. Moreover, at least some of the steps in the above figures may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The above-mentioned division of each module in the cluster resource control device is only used for illustration, and in other embodiments, the cluster resource control device may be divided into different modules as needed to complete all or part of the functions of the above-mentioned cluster resource control device.
For specific limitations of the cluster resource control device, reference may be made to the limitations of the cluster resource control method above, which are not repeated here. The modules in the cluster resource control device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.
In one embodiment, a cluster is further provided, including a memory and a processor, where the memory stores a computer program, and the computer program when executed by the processor causes the processor to perform the steps of a cluster resource control method provided in the foregoing embodiments.
FIG. 9 is a schematic diagram of the internal structure of a cluster in one embodiment. As shown in FIG. 9, the cluster includes a processor and a memory connected by a system bus. The processor provides computing and control capabilities to support the operation of the entire cluster. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the cluster resource control method provided by the above embodiments. The internal memory provides a cached running environment for the operating system and the computer program in the non-volatile storage medium. The cluster may be any terminal device such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, a vehicle-mounted computer, or a wearable device.
Each module in the cluster resource control device provided in the embodiments of the present application may be implemented in the form of a computer program. The computer program may run on a cluster, and the program modules of the computer program may be stored in the memory of the cluster. When the computer program is executed by a processor, the steps of the methods described in the embodiments of the present application are performed.
Embodiments of the present application also provide a computer-readable storage medium: one or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the steps of the cluster resource control method.
Embodiments of the present application also provide a computer program product containing instructions that, when run on a computer, cause the computer to perform the cluster resource control method.
Any reference to memory, storage, a database, or another medium used in the embodiments of the present application may include non-volatile and/or volatile memory. Suitable non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above cluster resource control examples represent only a few embodiments of the present application. They are described specifically and in detail, but they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the spirit of the present application, and all of these fall within its scope of protection. Accordingly, the scope of protection of the present application is determined by the appended claims.

Claims (10)

1. A method of cluster resource control, wherein the cluster includes a login node, the method comprising:
monitoring whether a preset file is created under a preset directory of the login node, wherein creating the preset file under the preset directory is a mandatory action performed by each user in the process of logging in to the cluster;
if yes, judging whether a resource control group corresponding to the user identification exists on the login node, wherein the resource control group comprises a resource control strategy;
if so, configuring a resource control strategy in the resource control group to the identifier of the user;
if not, a new resource control group is created, and a corresponding relation between the user identification and the new resource control group is established;
Configuring a resource control policy in the new resource control group to an identity of the user;
managing available resources of the user at the login node according to the resource control strategy; wherein the resource control policy includes an available resource quota for the user.
2. The cluster resource control method according to claim 1, wherein the preset directory includes a custom subdirectory contained in the /tmp directory.
3. The cluster resource control method according to claim 2, wherein the preset file is a custom file; the monitoring whether a preset file is created under the preset directory of the login node comprises:
monitoring, through an inotify tool, whether a custom file is created under the custom subdirectory contained in the /tmp directory.
4. A cluster resource control method according to claim 3, wherein the identification of the user comprises an ID of the user and a login process ID of the user logging into the login node; before the resource control strategy is configured for the user according to the identification of the user, the method further comprises the following steps:
and acquiring the ID of the user and the login process ID of the user for logging in the login node from the custom file.
5. The cluster resource control method of claim 1, wherein the user comprises a bash type user or a csh type user.
6. The method according to any one of claims 1-5, wherein the resource control policy further includes a preset type and a preset time of a cluster resource that can be used by the user at the login node, where the preset type of the cluster resource includes at least one of a CPU, a memory, and a disk IO.
7. The method of claim 3 or 4, wherein the custom file comprises a sshcgroup.sh file or a sshcgroup.csh file.
8. A cluster resource control device, wherein the cluster includes a login node, the device comprising:
the monitoring module is used for monitoring whether a preset file is created under the preset directory of the login node, wherein creating the preset file under the preset directory is a mandatory action performed by each user in the process of logging in to the cluster;
the resource control module is used for judging whether a resource control group corresponding to the user identifier exists on the login node or not if so, wherein the resource control group comprises a resource control strategy; if so, configuring a resource control strategy in the resource control group to the identifier of the user; if not, a new resource control group is created, and a corresponding relation between the user identification and the new resource control group is established; configuring a resource control policy in the new resource control group to an identity of the user; managing available resources of the user at the login node according to the resource control strategy; wherein the resource control policy includes an available resource quota for the user.
9. A cluster resource control system comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to perform the steps of the cluster resource control method of any of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the cluster resource control method according to any of claims 1 to 7.
CN202110942743.0A 2021-08-17 2021-08-17 Cluster resource control method and device, cluster and computer readable storage medium Active CN113806011B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110942743.0A CN113806011B (en) 2021-08-17 2021-08-17 Cluster resource control method and device, cluster and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113806011A CN113806011A (en) 2021-12-17
CN113806011B true CN113806011B (en) 2023-12-19

Family

ID=78893673

Country Status (1)

Country Link
CN (1) CN113806011B (en)

Also Published As

Publication number Publication date
CN113806011A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN108829459B (en) Nginx server-based configuration method and device, computer equipment and storage medium
WO2020232884A1 (en) Data table migration method, apparatus, computer device and storage medium
US20160139949A1 (en) Virtual machine resource management system and method thereof
CN106775946B (en) Virtual machine creation method
CN111723079A (en) Data migration method and device, computer equipment and storage medium
CN110427258B (en) Resource scheduling control method and device based on cloud platform
CN111338779B (en) Resource allocation method, device, computer equipment and storage medium
CN112612988A (en) Page processing method and device, computer equipment and storage medium
CN111885184A (en) Method and device for processing hot spot access keywords in high concurrency scene
CN111045802B (en) Redis cluster component scheduling system and method and platform equipment
CN112230857A (en) Hybrid cloud system, hybrid cloud disk application method and data storage method
US10684895B1 (en) Systems and methods for managing containerized applications in a flexible appliance platform
CN112860412B (en) Service data processing method and device, electronic equipment and storage medium
CN107276998B (en) OpenSSL-based performance optimization method and device
US20210389994A1 (en) Automated performance tuning using workload profiling in a distributed computing environment
CN111399999B (en) Computer resource processing method, device, readable storage medium and computer equipment
CN111858020B (en) User resource limiting method and device and computer storage medium
CN113806011B (en) Cluster resource control method and device, cluster and computer readable storage medium
US20230171179A1 (en) Method for testing pressure, electronic device and storage medium
CN111104198A (en) Method, equipment and medium for improving operation efficiency of scanning system plug-in
CN111159271A (en) Data processing method and device, computer equipment and storage medium
KR102456017B1 (en) Apparatus and method for file sharing between applications
CN115951845A (en) Disk management method, device, equipment and storage medium
CN116450165A (en) Method, system, terminal and storage medium for quickly building environment and deploying program
CN112564979B (en) Execution method and device of construction task, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant