CN111209107A - Multi-cluster operation method - Google Patents

Publication number
CN111209107A
Authority
CN
China
Prior art keywords
cluster
user
authority
operates
administrator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911362939.1A
Other languages
Chinese (zh)
Inventor
胡梦龙
张涛
原帅
吕灼恒
王家尧
胡辰
王新雷
李斌
沙超群
厉军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Co Ltd
Original Assignee
Dawning Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Co Ltd filed Critical Dawning Information Industry Co Ltd
Priority to CN201911362939.1A priority Critical patent/CN111209107A/en
Publication of CN111209107A publication Critical patent/CN111209107A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a multi-cluster operation method comprising the following steps: adding an attribute to a user; and having an administrator set the value of the attribute to determine whether the user operates a single cluster, the currently logged-in cluster, or multiple clusters. This technical scheme solves the problem of cluster information being exposed to the user.

Description

Multi-cluster operation method
Technical Field
The invention relates to the technical field of computer clusters, in particular to a multi-cluster operation method.
Background
SLURM is an open-source cluster job scheduling system with good fault tolerance and high scalability. Its key functions are: allocating computing resources to perform work tasks; providing a framework for starting, executing, and monitoring jobs on the allocated set of nodes; and arbitrating contention for resources. A cluster consists of all nodes managed by one slurmctld daemon.
SLURM provides the ability to target commands at other clusters instead of, or in addition to, the local cluster on which the command is invoked. With this behavior enabled, a user may submit jobs to one or more clusters and receive status from those remote clusters. Some of the client commands provide a "-M, --clusters=" option that takes a comma-separated list of clusters to communicate with.
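As an illustration of this existing behavior, the following minimal sketch builds the command line a user would have to supply. The helper name build_sbatch_argv, the script name job.sh, and the cluster names are hypothetical; only the sbatch command and its "--clusters" option come from the description above.

```python
from typing import List


def build_sbatch_argv(script: str, clusters: List[str]) -> List[str]:
    """Build an sbatch argument list that names the target clusters explicitly."""
    if not clusters:
        # No cluster list given: the job is submitted to the local cluster.
        return ["sbatch", script]
    # The option takes a comma-separated cluster list.
    return ["sbatch", "--clusters=" + ",".join(clusters), script]


print(build_sbatch_argv("job.sh", ["clusterA", "clusterB"]))
```

The user has to know the names clusterA and clusterB to write this command, which is the exposure of cluster information discussed next.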
At present, a user must explicitly specify the cluster list with the "-M, --clusters" option, so the administrator must expose cluster information to the user, which does not meet the administrator's need to control that information. SLURM currently provides no function for shielding cluster information from the user.
Disclosure of Invention
In view of the above problems in the related art, the present invention provides a multi-cluster operation method, which can eliminate the need for explicitly specifying cluster names.
The technical scheme of the invention is realized as follows:
according to an aspect of the present invention, there is provided a multi-cluster operation method, including:
adding an attribute to the user;
the administrator determines, by setting the value of the attribute, whether the user operates a single cluster, the currently logged-in cluster, or multiple clusters.
According to an embodiment of the present invention, adding an attribute to a user comprises: a field is added to the database for the user table, which is a list of cluster names operable by the user.
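A minimal sketch of such a schema change is given below. It uses Python's built-in sqlite3 module purely for illustration; the table name user_table, the column name operable_clusters, and the default value 'all' are assumptions made for the example and are not taken from the SLURM database schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_table (name TEXT PRIMARY KEY)")
conn.execute("INSERT INTO user_table (name) VALUES ('alice')")

# Add the new field; existing users fall back to the assumed default 'all'.
conn.execute(
    "ALTER TABLE user_table "
    "ADD COLUMN operable_clusters TEXT NOT NULL DEFAULT 'all'"
)


def set_operable_clusters(conn, user, value):
    """Administrator-side update of the cluster list for one user."""
    conn.execute(
        "UPDATE user_table SET operable_clusters = ? WHERE name = ?",
        (value, user),
    )


def get_operable_clusters(conn, user):
    """Return the stored cluster list for a user, or None if the user is unknown."""
    row = conn.execute(
        "SELECT operable_clusters FROM user_table WHERE name = ?", (user,)
    ).fetchone()
    return None if row is None else row[0]


print(get_operable_clusters(conn, "alice"))
set_operable_clusters(conn, "alice", "clusterA")
print(get_operable_clusters(conn, "alice"))
```

Storing the list as a single text value keeps the change to one added column; a separate user-to-cluster mapping table would be an equally plausible design.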
According to an embodiment of the present invention, the administrator setting the value of the attribute includes: for a user having authority only for a first cluster, when the administrator sets the value of the field to the name of the first cluster, the user operates the first cluster if the user logs in to the first cluster, for which the user has authority, and the user also operates the first cluster if the user logs in to a second cluster, for which the user has no authority.
According to an embodiment of the present invention, the administrator setting the value of the attribute includes: for a user having authority for both the first cluster and the second cluster, when the administrator sets the value of the field to the name of the first cluster, the user operates the first cluster if the user logs in to the first cluster, and the user also operates the first cluster if the user logs in to the second cluster.
According to an embodiment of the present invention, the administrator setting the value of the attribute includes: for a user having authority for both the first cluster and the second cluster, when the administrator sets the value of the field to the current cluster name, the user operates only the first cluster if the user logs in to the first cluster, and the user operates only the second cluster if the user logs in to the second cluster.
According to an embodiment of the present invention, the administrator setting the value of the attribute includes: for a user having authority for both the first cluster and the second cluster, when the administrator sets the value of the field to the first cluster name and the second cluster name, or to all cluster names, the user operates the first cluster and the second cluster if the user logs in to the first cluster, and the user likewise operates the first cluster and the second cluster if the user logs in to the second cluster.
According to an embodiment of the present invention, the user operating a single cluster, the currently logged-in cluster, or multiple clusters comprises: submitting jobs to a single cluster, the currently logged-in cluster, or multiple clusters.
According to an embodiment of the present invention, when a job is submitted and a multi-cluster operation needs to be executed, the user information and the cluster information in the database are queried in sequence and a cluster list is returned, so that a cluster can be selected from all available clusters to receive the job.
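The sequential query can be sketched as follows. This is an illustration under stated assumptions, not the SLURM implementation: it uses Python's sqlite3 module, and the tables user_table and cluster_table, the column operable_clusters, and the value 'all' are hypothetical names chosen for the example.

```python
import sqlite3
from typing import List


def query_cluster_list(conn: sqlite3.Connection, user: str) -> List[str]:
    """Query the user record first, then the cluster records, and return the list."""
    # Step 1: look up the cluster list stored for the user.
    row = conn.execute(
        "SELECT operable_clusters FROM user_table WHERE name = ?", (user,)
    ).fetchone()
    if row is None:
        return []
    wanted = row[0]
    # Step 2: look up the clusters registered in the database.
    registered = [
        r[0] for r in conn.execute("SELECT name FROM cluster_table ORDER BY name")
    ]
    if wanted == "all":
        return registered
    names = {n.strip() for n in wanted.split(",")}
    # Keep only the clusters that are both requested and registered.
    return [c for c in registered if c in names]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_table (
        name TEXT PRIMARY KEY,
        operable_clusters TEXT NOT NULL DEFAULT 'all'
    );
    CREATE TABLE cluster_table (name TEXT PRIMARY KEY);
    INSERT INTO cluster_table VALUES ('A'), ('B');
    INSERT INTO user_table VALUES ('alice', 'A'), ('bob', 'all');
    """
)
print(query_cluster_list(conn, "alice"))
print(query_cluster_list(conn, "bob"))
```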
The technical scheme of the invention realizes a dynamically configurable multi-cluster operation method for SLURM. An attribute is added to the user, and the administrator sets its value to determine whether the user submits jobs to a single cluster, the currently logged-in cluster, or multiple clusters. The user therefore does not need to be concerned with cluster information, which solves the prior-art problem of cluster information being exposed to the user. By default, the user may submit jobs to all clusters. In addition, the control available to the cluster administrator is strengthened, meeting the administrator's need to protect cluster information.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method of multi-cluster operation according to an embodiment of the invention;
FIG. 2 is a flow diagram of a batch submit job command according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
FIG. 1 is a flow chart of a method of multi-cluster operation according to an embodiment of the invention. As shown in fig. 1, the multi-cluster operation method of the embodiment of the present invention may include the following steps:
S11, adding an attribute to the user;
S12, the administrator determines, by setting the value of the attribute, whether the user operates a single cluster, the currently logged-in cluster, or multiple clusters.
According to this technical scheme, an attribute is added, and the administrator sets its value to determine whether the user operates a single cluster, the currently logged-in cluster, or multiple clusters. In contrast to the prior art, the user therefore does not need to specify the cluster name explicitly.
Specifically, a feandmulticluster field may be added to the user table in the database. The field holds the list of cluster names that the user can operate. The specific settings include:
(1) For a user having authority only for one cluster A (the first cluster), the administrator sets feandmulticluster=A.
If the user logs in to cluster A, for which the user has authority, the user can operate cluster A;
if the user logs in to cluster B (the second cluster), for which the user has no authority, the user can operate cluster A.
(2) For a user having authority for both clusters A and B, the administrator sets feandmulticluster=A.
If the user logs in to cluster A, the user can operate cluster A;
if the user logs in to cluster B, the user can operate cluster A.
(3) For a user having authority for both clusters A and B, the administrator sets feandmulticluster=current.
If the user logs in to cluster A, the user can operate only cluster A;
if the user logs in to cluster B, the user can operate only cluster B.
(4) For a user having authority for both clusters A and B, the administrator sets either feandmulticluster=A,B or feandmulticluster=all.
If the user logs in to cluster A, the user can operate clusters A and B;
if the user logs in to cluster B, the user can operate clusters A and B.
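The four settings above can be summarized in a short sketch. The function name resolve_clusters and its parameters are hypothetical, and the handling of the values "current" and "all" follows the cases listed above; it is a reading of those cases, not code from the SLURM source.

```python
from typing import List


def resolve_clusters(
    field_value: str, login_cluster: str, all_clusters: List[str]
) -> List[str]:
    """Map the administrator-set field value to the clusters the user operates."""
    value = field_value.strip()
    if value == "current":
        # Case (3): only the cluster the user is logged in to.
        return [login_cluster]
    if value == "all":
        # Case (4): every cluster.
        return list(all_clusters)
    # Cases (1), (2) and (4): an explicit comma-separated list of names,
    # applied regardless of which cluster the user is logged in to.
    return [name.strip() for name in value.split(",") if name.strip()]


clusters = ["A", "B"]
print(resolve_clusters("A", "B", clusters))        # cases (1) and (2), login on B
print(resolve_clusters("current", "B", clusters))  # case (3), login on B
print(resolve_clusters("A,B", "A", clusters))      # case (4), explicit list
print(resolve_clusters("all", "B", clusters))      # case (4), all clusters
```

In every case the result depends only on the field value and the login cluster, so the user never has to supply a cluster name.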
In one embodiment, SLURM has many multi-cluster operation commands; the principle of the dynamic configuration is described here taking the batch job submission command sbatch as an example. The sbatch processing flow is shown in fig. 2 and includes:
1) the sbatch command starts executing: the configuration file (slurm.conf) is parsed first and some key parameters are stored;
2) the parameters passed in from sources such as the job script, environment variables, and the command line are parsed and stored, and the "-M, --clusters" option is handled;
3) a job structure is filled in from the parameters obtained in the two steps above; the structure contains all the information needed to execute one job;
4) whether to execute a multi-cluster operation is judged according to opt. If there are multiple clusters, slurmdb_get_first_avail_cluster is executed. This function interacts with the database daemon slurmdbd and executes job_will_run, slurm_job_will_run2, and job_will_run_cluster in turn; its purpose is to select one suitable cluster from all available clusters to which the job is submitted. Inside the function, slurmdb_get_info_cluster is called, which queries the user information and the cluster information in the MySQL database in sequence and returns the appropriate cluster list;
5) if not, slurm_submit_batch_job is executed;
6) both step 4) and step 5) call slurm_send_recv_controller_msg, which packs the job information and sends it to the management node daemon slurmctld of the cluster configured for the user, where the job waits for scheduling and execution.
It should be understood that other commands related to multi-cluster operations may be processed similarly to fig. 2.
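The branch between the multi-cluster path and the single-cluster path in the flow above can be sketched as follows. All names in the sketch (Job, first_available_cluster, submit, will_run, send) are hypothetical stand-ins rather than SLURM functions, and choosing the cluster with the earliest estimated start time is an assumed criterion; the description only says that a suitable cluster is selected.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    script: str
    clusters: List[str]  # clusters resolved for the user (hypothetical field)


def first_available_cluster(
    clusters: List[str], will_run: Callable[[str], Optional[float]]
) -> Optional[str]:
    """Return the cluster with the earliest estimated start, or None if none can run the job."""
    best, best_start = None, None
    for cluster in clusters:
        start = will_run(cluster)  # None means the cluster cannot run the job
        if start is None:
            continue
        if best_start is None or start < best_start:
            best, best_start = cluster, start
    return best


def submit(
    job: Job,
    local_cluster: str,
    will_run: Callable[[str], Optional[float]],
    send: Callable[[str, Job], None],
) -> str:
    """Choose the target cluster, then hand the job to that cluster's controller."""
    if len(job.clusters) > 1:
        # Multi-cluster path: pick one suitable cluster from the candidates.
        target = first_available_cluster(job.clusters, will_run)
        if target is None:
            raise RuntimeError("no available cluster can run the job")
    elif job.clusters:
        target = job.clusters[0]
    else:
        target = local_cluster
    # Both paths end with the same send step.
    send(target, job)
    return target


sent = []
estimates = {"A": 30.0, "B": 10.0, "C": None}
chosen = submit(Job("job.sh", ["A", "B", "C"]), "A", estimates.get,
                lambda cluster, job: sent.append(cluster))
print(chosen, sent)
```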
In summary, the technical solution of the present invention realizes a dynamically configurable multi-cluster operation method for SLURM. An attribute is added to the user, and the administrator sets its value to determine whether the user submits jobs to a single cluster, the currently logged-in cluster, or multiple clusters. The user therefore does not need to be concerned with cluster information, which solves the prior-art problem of cluster information being exposed to the user. By default, the user may submit jobs to all clusters. In addition, the control available to the cluster administrator is strengthened, meeting the administrator's need to protect cluster information.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A method of multi-cluster operation, comprising:
adding an attribute to the user;
and the administrator determines, by setting the value of the attribute, whether the user operates a single cluster, the currently logged-in cluster, or multiple clusters.
2. The method of claim 1, wherein adding an attribute to the user comprises:
a field is added to the database for the user table, which is a list of cluster names operable by the user.
3. The multi-cluster operation method of claim 1, wherein setting the value of the attribute by an administrator comprises:
for a user having authority only for the first cluster, when the administrator sets the value of the field to the first cluster name,
if the user logs in to the first cluster, for which the user has authority, the user operates the first cluster,
and if the user logs in to a second cluster, for which the user has no authority, the user operates the first cluster.
4. The multi-cluster operation method of claim 1, wherein setting the value of the attribute by an administrator comprises:
for a user having authority for both the first cluster and the second cluster, when the administrator sets the value of the field to the first cluster name,
if the user logs in to the first cluster, the user operates the first cluster,
and if the user logs in to the second cluster, the user operates the first cluster.
5. The multi-cluster operation method of claim 1, wherein setting the value of the attribute by an administrator comprises:
for a user having authority for both the first cluster and the second cluster, when the administrator sets the value of the field to the current cluster name,
if the user logs in to the first cluster, the user operates only the first cluster,
and if the user logs in to the second cluster, the user operates only the second cluster.
6. The multi-cluster operation method of claim 1, wherein setting the value of the attribute by an administrator comprises:
for a user having authority for both the first cluster and the second cluster, when the administrator sets the value of the field to the first cluster name and the second cluster name, or to all cluster names,
if the user logs in to the first cluster, the user operates the first cluster and the second cluster,
and if the user logs in to the second cluster, the user operates the first cluster and the second cluster.
7. The multi-cluster operation method of claim 1, wherein the user operating a single cluster, the currently logged-in cluster, or multiple clusters comprises:
submitting jobs to a single cluster, the currently logged-in cluster, or multiple clusters.
8. The multi-cluster operation method of claim 2, wherein, when a job is submitted
and a multi-cluster operation needs to be executed, the user information and the cluster information in the database are queried in sequence and a cluster list is returned, so that a cluster can be selected from all available clusters to receive the job.
CN201911362939.1A 2019-12-26 2019-12-26 Multi-cluster operation method Pending CN111209107A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911362939.1A CN111209107A (en) 2019-12-26 2019-12-26 Multi-cluster operation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911362939.1A CN111209107A (en) 2019-12-26 2019-12-26 Multi-cluster operation method

Publications (1)

Publication Number Publication Date
CN111209107A true CN111209107A (en) 2020-05-29

Family

ID=70782533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911362939.1A Pending CN111209107A (en) 2019-12-26 2019-12-26 Multi-cluster operation method

Country Status (1)

Country Link
CN (1) CN111209107A (en)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination