WO2022105337A1 - Task Scheduling Method and System - Google Patents

Task Scheduling Method and System

Info

Publication number
WO2022105337A1
WO2022105337A1 (PCT/CN2021/114299)
Authority
WO
WIPO (PCT)
Prior art keywords
job
scheduling
name
executed
performance computer
Prior art date
Application number
PCT/CN2021/114299
Other languages
English (en)
French (fr)
Inventor
吴璨
王小宁
肖海力
迟学斌
和荣
卢莎莎
Original Assignee
中国科学院计算机网络信息中心
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 中国科学院计算机网络信息中心 (Computer Network Information Center, Chinese Academy of Sciences)
Priority to US17/635,260 (published as US20230342191A1)
Publication of WO2022105337A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 - Task transfer initiation or dispatching
    • G06F 9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 - Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/54 - Interprogram communication
    • G06F 9/546 - Message passing systems or structures, e.g. queues
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/30 - Monitoring
    • G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3409 - Recording or statistical evaluation of computer activity for performance assessment
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 - Task transfer initiation or dispatching
    • G06F 9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 - Indexing scheme relating to G06F9/00
    • G06F 2209/48 - Indexing scheme relating to G06F9/48
    • G06F 2209/482 - Application
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 - Indexing scheme relating to G06F9/00
    • G06F 2209/54 - Indexing scheme relating to G06F9/54
    • G06F 2209/547 - Messaging middleware
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 - Indexing scheme relating to G06F9/00
    • G06F 2209/54 - Indexing scheme relating to G06F9/54
    • G06F 2209/548 - Queue
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of high-performance computing, and more specifically, to a task scheduling method and system.
  • the cross-cluster computing service environment aggregates the computing resources of clusters distributed in different regions or even belonging to different organizations, and provides a unified computing service environment for users.
  • the cross-cluster computing service environment shields the heterogeneity of underlying computing resources, job management systems, access methods, and management systems, and provides users with high-level computing application services with unified access portals, usage methods, and user technical support.
  • the various task scheduling algorithms in the cross-cluster computing service environment differ from the traditional notion of job scheduling: they perform resource selection and matching between clusters on top of the cluster job resource management systems, and therefore belong to application-layer task scheduling.
  • for cross-cluster computing under high concurrency, multiple computing models, and big-data storage, how to reasonably allocate computing tasks, make full use of computing resources, and achieve the best energy efficiency while meeting user application requirements is the most basic and urgent problem that a task scheduling strategy must solve.
  • the purpose of this application is to solve the problems existing in the prior art, and to quickly integrate different task scheduling algorithms into a cross-cluster computing environment by means of software configuration, without affecting the running services.
  • the present application proposes a task scheduling system, which includes: a job request collection and distribution module, at least one scheduling service module, and at least one job execution service module. The job request collection and distribution module is used to receive a job execution request of a job to be executed, the job execution request including request description information of the job to be executed, and the request description information including a job scheduling algorithm name and a global identifier of the job to be executed. The current scheduling service module, among the at least one scheduling service module, that matches the job scheduling algorithm name is used to determine a job scheduling result according to the request description information and the computing resource information of at least one available computing cluster, where the job scheduling result includes the job global identifier, the device identifier of the high-performance computer used to execute the job to be executed, and a job execution service name. The current job execution service module, among the at least one job execution service module, that matches the job execution service name is used to receive the scheduling result determined by the current scheduling service module and, according to the device identifier and the global identifier of the job to be executed contained in the scheduling result, submit the job to be executed to the high-performance computer used to execute it.
  • the current scheduling service module is further configured to generate job description information by using the request description information and the job scheduling result, and provide the job description information to the job request collection and distribution module;
  • the job request collection and distribution module is further configured to distribute the job description information to the current job execution service module according to the job execution service name carried in the job scheduling result included in the job description information.
  • the request description information further includes the name of the application required by the job, the name of the queue required by the job, and the number of cores of the high-performance computer required by the job;
  • the current scheduling service module in the at least one scheduling service module that matches the job scheduling algorithm name can also be used to obtain computing resource information;
  • the computing resource information includes an application list and application resources, where the application list is used to indicate at least one application program deployed on each of at least one high-performance computer of the at least one available computing cluster, the application resources are used to indicate at least one computing queue included in each high-performance computer of the at least one computing cluster, and each computing queue includes the number of cores of its corresponding high-performance computer.
  • the job request description information further includes one or more of the job name, the version of the application required for the job, and the expected running time of the job.
  • the present application provides a task scheduling method. The method includes: receiving a job execution request of a job to be executed, the job execution request including request description information of the job to be executed, and the request description information including a job scheduling algorithm name and the global identifier of the job to be executed; and determining, through the current scheduling service module in at least one scheduling service module that matches the job scheduling algorithm name, a job scheduling result according to the request description information and the computing resource description information of at least one available computing cluster, where the job scheduling result includes a job global identifier, a device identifier of a high-performance computer used to execute the job to be executed, and a job execution service name;
  • and receiving, through the current job execution service module in at least one job execution service module that matches the job execution service name, the scheduling result determined by the current scheduling service module, and submitting the to-be-executed job to the high-performance computer used to execute it according to the device identifier and the global identifier contained in the scheduling result.
  • the method further comprises: generating job description information using the request description information and the job scheduling result through the current scheduling service module, and providing the job description information to the job request collection and distribution module;
  • the job request collection and distribution module distributes the job description information to the current job execution service module according to the job execution service name carried in the job scheduling result included in the job description information.
  • the request description information further includes the name of the application required by the job, the name of the queue required by the job, and the number of cores of the high-performance computer required by the job;
  • the method may further include: obtaining computing resource information through the current scheduling service module; the computing resource information includes an application list and application resources, where the application list is used to indicate at least one application program deployed on each of at least one high-performance computer of the at least one available computing cluster, the application resources are used to indicate at least one computing queue included in each high-performance computer of the at least one computing cluster, and each computing queue includes the number of cores of its corresponding high-performance computer.
  • the job request description information in the method further includes the job name, the version of the application required for the job, and the estimated running time of the job.
  • the task scheduling algorithm can be developed in strict accordance with the standards, and finally an independent service is formed, that is, an independent scheduling service module is formed, and multiple scheduling services do not affect each other. After registration, it can be deployed directly to the computing cluster environment without modifying the original code, without affecting existing services, and with high scalability.
  • FIG. 1 is a schematic diagram of a task scheduling system provided in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of another task scheduling system provided in an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a task scheduling method provided in an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of another task scheduling method provided in an embodiment of the present application.
  • the device for integrating multiple task scheduling algorithms for a cross-cluster computing service environment decouples multiple task scheduling algorithms from the computing environment, and each task scheduling algorithm is an independent service. It provides a cluster computing resource information query interface for scheduling algorithm developers, and clearly defines the standard format of input and output of task scheduling services.
  • Each task scheduling algorithm is developed in strict accordance with the standard.
  • the developers of the scheduling algorithm do not need to know the implementation details of the system code, but only need to exchange information in a standard format, and then the scheduling algorithm can be integrated into the cross-cluster computing service environment.
  • Each task scheduling algorithm is encapsulated as a service with unified service input and output information.
  • service registration must be performed first, and the authorized service will get an authorization code.
  • Each service is a jar file, and the service is started with the Java startup command java -jar ***.jar.
  • FIG. 1 is a schematic diagram of a task scheduling system provided by the present application.
  • the device for integrating multiple task scheduling algorithms for cross-cluster computing services includes at least one job request collection and distribution module 101 , at least one scheduling service module 102 and at least one job execution service module 103 .
  • the number of the job request collection and distribution module 101 , the scheduling service module 102 and the job execution service module 103 may be equal or different.
  • when the task scheduling system adopts a distributed, cross-domain multi-cluster environment, multiple instances of the same above-mentioned modules can be deployed in the system.
  • since HPCs are usually distributed across different locations, the implementation form and geographic location of the other modules also vary according to requirements.
  • the job submission service, the scheduling service, and the job execution service are all basic computer services, which can be implemented as jar packages.
  • the job request collection and distribution module may be message middleware. As shown in FIG. 2 , the job submission service, the message middleware, at least one scheduling service and at least one job execution service are respectively deployed on different servers.
  • the job submission service, the message middleware, the at least one scheduling service, and the at least one job execution service may all be deployed on one server, or may be deployed on multiple servers respectively.
  • the deployment manner shown in FIG. 2 is only a specific implementation manner provided by the embodiment of the present application, and does not limit the deployment of each service and the message middleware in the embodiment of the present application.
  • both the scheduling service and the job execution service need to be registered.
  • when registering, the administrator assigns a service name to each registered service and creates, in the message middleware, a message queue named after that service name for each service.
  • the algorithm name of the scheduling algorithm used by the scheduling service may be used as the name of the service.
  • each job execution service corresponds to a high-performance computer
  • when assigning a service name to a job execution service, the upper-level management program can use the name of the high-performance computer corresponding to that job execution service as the name of the job execution service.
  • the user submits a job request through the job submission service, and the job submission service checks the validity of the job description information of the job to be executed included in the job request, and sends the qualified job description information to the message middleware.
  • the message middleware stores the job description information in the message queue matching the job scheduling algorithm name, according to the job scheduling algorithm name carried in the job description information.
  • the scheduling service periodically receives job description information from its corresponding message queue, and determines the job scheduling result according to the received job description information and the computing resource description information of at least one available computing cluster, where the job scheduling result includes the name of the high-performance computer .
  • the scheduling service stores the job scheduling results and job description information in message queues that match the names of the high-performance computers included in the job scheduling results.
  • the job execution service periodically receives job description information and scheduling result information from its corresponding message queue, and submits the job to the specified HPC for running according to the HPC name assigned in the scheduling result.
  • the request collection and distribution module 101 receives a job execution request of a job to be executed, where the job execution request includes request description information of the job to be executed.
  • Table 1 illustrates an example of the request description information.
  • the request description information at least includes the name of the job scheduling algorithm and the global identifier of the job to be executed.
  • the description information further includes one or more of the job name, the application name required by the job, the application version required by the job, the queue name required by the job, the number of cores required by the job, and the estimated running time of the job.
  • the application refers to the application provided by the high-performance computing environment
  • the queue refers to the computing queue that can be used by each application in the high-performance computing environment
  • the number of cores refers to the number of cores of each computer in the computing queue
  • the request collection and distribution module 101 determines a scheduling service module 102 from at least one scheduling service module 102 according to the job scheduling algorithm name in the description information, and sends a job execution request to the scheduling service module 102 .
  • if the job scheduling algorithm name is absent, the scheduling service module 102 can be determined according to other information in the description information, or can be arbitrarily specified.
  • Each of the at least one scheduling service modules 102 runs its own job scheduling algorithm, and different scheduling service modules 102 run different job scheduling algorithms; each is configured with a cluster computing resource information query interface.
  • the device can integrate different scheduling algorithms; the scheduling algorithms integrated so far include AWFS (Application Weight First Schedule, a load-priority scheduling algorithm) and ATFS (Application Time First Schedule, a time-priority scheduling algorithm).
  • the scheduling service module in at least one scheduling service module 102 that matches the name of the job scheduling algorithm determines the job scheduling result according to the request description information and the computing resource description information of at least one available computing cluster .
  • the scheduling service module determines the computing resources required by the job to be executed according to the description information, then obtains the currently available cluster computing resource information through the interfaces of the cross-cluster computing service environment, and determines the job scheduling result accordingly.
  • the cross-cluster computing service environment provides an interface for querying cluster computing resource information.
  • the interface includes one or more of a high performance computer (HPC) list query interface, an application list query interface, an application resource query interface, a job query interface, or a combination thereof.
  • a high performance computing environment may deploy different applications, each of which may be identified by an application name. If the description information specifies the application name of the application required by the job, the scheduling service module can query the application required by the job through the application resource query interface according to the application name. In some cases, applications have different versions; if the description information specifies the application version of the application required by the job, the scheduling service module can query the corresponding version of the application required by the job through the application resource query interface.
  • there may be different computing queues on the HPCs of the high-performance computing environment, each computing queue having a queue name. If the description information specifies the name of the queue required by the job, the scheduling service module can query the computing queues that can be used to execute the job through the application resource query interface according to the queue name.
  • each compute queue has a different number of available compute cores.
  • the determined computing queue should have a number of cores not less than the number of high-performance computer cores required by the job.
  • the job scheduling result of the scheduling service module 102 may be as shown in Table 3.
  • the job scheduling result includes a job global identifier, a device identifier of a high-performance computer to be used to execute the job to be executed, and a job execution service name.
  • the job execution service is named after the machine name, so the scheduling result is essentially a machine name: scheduling amounts to assigning the job to a particular machine.
  • the description information may further include the name of the application required by the job, the name of the queue required by the job, and the number of cores of the high-performance computer required by the job
  • the scheduling service module is further used to obtain computing resource information.
  • the computing resource information includes: an application list and application resources; wherein the application list is used to indicate at least one application program respectively deployed in at least one high-performance computer of the at least one available computing cluster, and the application resources are used for Indicates at least one computing queue included in each high performance computer (High Performance Computer, HPC) in the at least one computing cluster, and each computing queue may include the number of cores of its corresponding high performance computer.
  • a computing cluster can have multiple HPCs, and an HPC can have multiple queues.
  • the current scheduling service module may be further configured to generate job description information using the request description information and the job scheduling result, and provide the job description to the job request collection and distribution module information. Then, the job request collection and distribution module distributes the job description information to a job execution service module matching the job execution service name according to the job description information.
  • the current job execution service module in at least one job execution service module 103 that matches the job execution service name is configured to receive the scheduling result determined by the scheduling service module and, according to the device identifier and the global identifier of the job to be executed contained in the scheduling result, submit the job to be executed to a high-performance computer for executing it.
  • the job execution service can be named with the name of the high-performance computer, and the scheduling result information includes the name of the high-performance computer, that is, the high-performance computer on which the job is scheduled to be executed.
  • the job execution service named after this name will receive the message and submit the job to the high-performance computer for execution.
  • the function of the job execution service is to receive job information, and then submit the job information to the high-performance computer for execution. This is because the service cannot be deployed on the high-performance computer and cannot receive messages, so the front-end service needs to receive the information and then submit it to the high-performance computer for execution.
  • the job request collection and distribution module decouples the tightly coupled scheduling service module and the job execution service module, which can improve the speed of problem solving, and most importantly, can reduce the possibility of hidden dangers in the future.
  • FIG. 3 is a flowchart of a task scheduling method according to an embodiment of the present application, and the method may be implemented in the system shown in FIG. 1 .
  • the method may at least include the following steps 301 , 302 and 305 .
  • in step 301, a job execution request of a job to be executed is received; the request includes request description information of the job to be executed, and the request description information includes the name of the job scheduling algorithm and the global identifier of the job to be executed.
  • in step 302, a job scheduling result is determined by the job scheduling algorithm according to the request description information and the computing resource description information of at least one available computing cluster.
  • the job scheduling result includes a job global identifier, a device identifier of a high-performance computer for executing the job to be executed, and a job execution service name.
  • in step 305, the scheduling result is received through the current job execution service module that matches the job execution service name in the job scheduling result, and the job to be executed is submitted, according to the device identifier and the global identifier of the job contained in the scheduling result, to the high-performance computer used to execute it.
  • FIG. 4 is a schematic flowchart of another task scheduling method provided in an embodiment of the present application. Compared with the flow of Fig. 3, before step 305, Fig. 4 further includes step 303 and step 304.
  • step 303 the scheduling service module currently executing the scheduling uses the request description information and the job scheduling result to generate job description information, and provides the job description information to the job request collection and distribution module.
  • step 304 the job description information is distributed to the job execution service module matching the job execution service name according to the job description information through the job request collection and distribution module.
  • step 305 may specifically include step 3051: the job execution service module matching the job execution service name receives the job description information and, according to the device identifier and the global identifier contained in the job description information, submits the job to be executed to a high-performance computer for executing it.
  • a job submission service module may be deployed on the user terminal, the job submission service module checks the validity of the job description information, and sends the qualified job description information to the job request collection and distribution module.
  • the task scheduling method and system provided by the present application provide a cluster computing resource information query interface for scheduling algorithm developers, and clearly define the input and output standard format of the task scheduling service.
  • Each task scheduling algorithm is developed in strict accordance with the standard, and the scheduling framework oriented to the high-performance computing environment acts as a task scheduling device containing multiple task scheduling services; the algorithms can be integrated by following the standard process without knowledge of their implementation details, which gives the system high scalability.
  • the developers of the scheduling algorithm do not need to know the implementation details of the system code, but only need to exchange information in a standard format, and then the scheduling algorithm can be integrated into the cross-cluster computing service environment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Stored Programmes (AREA)

Abstract

The present invention provides a task scheduling method and system. The system includes a job request collection and distribution module, at least one scheduling service module, and at least one job execution service module. The job request collection and distribution module receives first description information of a job to be executed from a user terminal. The current scheduling service module, among the at least one scheduling service module, that matches a job scheduling algorithm name determines the computing resources required by the job to be executed according to the first description information, and then determines a job scheduling result according to the required computing resources and the currently available cluster computing resources. The current job execution service module, among the at least one job execution service module, that matches a job execution service name submits the job to be executed to a high-performance computer according to the device identifier contained in the scheduling result and the global identifier of the job to be executed.

Description

Task Scheduling Method and System
This application claims priority to Chinese patent application No. 202011322687.2, entitled "Task Scheduling Method and Apparatus" and filed with the China National Intellectual Property Administration on November 23, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of high-performance computing, and more specifically, to a task scheduling method and system.
Background Art
A cross-cluster computing service environment aggregates the computing resources of clusters distributed in different regions, and even belonging to different organizations, and provides users with a unified computing service environment. It shields the heterogeneity of the underlying computing resources, job management systems, access methods, and management policies, and offers users high-level computing application services with a unified access portal, unified usage methods, and unified user technical support.
The various task scheduling algorithms in a cross-cluster computing service environment differ from the traditional notion of job scheduling: they perform resource selection and matching between clusters on top of the cluster job resource management systems and therefore belong to application-layer task scheduling. For cross-cluster computing under high concurrency, multiple computing models, and big-data storage, how to allocate computing tasks reasonably, make full use of computing resources, and achieve the best energy efficiency while meeting user application requirements is the most fundamental and urgent problem that a task scheduling strategy must solve.
Summary of the Invention
The purpose of the present application is to solve the problems existing in the prior art by enabling different task scheduling algorithms to be quickly integrated into a cross-cluster computing environment by means of software configuration, without affecting running services.
In a first aspect, the present application proposes a task scheduling system, which includes a job request collection and distribution module, at least one scheduling service module, and at least one job execution service module. The job request collection and distribution module is configured to receive a job execution request of a job to be executed, the job execution request including request description information of the job to be executed, and the request description information including a job scheduling algorithm name and a global identifier of the job to be executed. The current scheduling service module, among the at least one scheduling service module, that matches the job scheduling algorithm name is configured to determine a job scheduling result according to the request description information and the computing resource information of at least one available computing cluster, where the job scheduling result includes the job global identifier, the device identifier of the high-performance computer to be used to execute the job to be executed, and a job execution service name. The current job execution service module, among the at least one job execution service module, that matches the job execution service name is configured to receive the scheduling result determined by the current scheduling service module and, according to the device identifier and the global identifier of the job to be executed contained in the scheduling result, submit the job to be executed to the high-performance computer used to execute it.
Preferably, the current scheduling service module is further configured to generate job description information using the request description information and the job scheduling result, and to provide the job description information to the job request collection and distribution module;
and the job request collection and distribution module is further configured to distribute the job description information to the current job execution service module according to the job execution service name carried in the job scheduling result contained in the job description information.
Preferably, the request description information further includes the name of the application required by the job, the name of the queue required by the job, and the number of high-performance computer cores required by the job;
and the current scheduling service module, among the at least one scheduling service module, that matches the job scheduling algorithm name may further be used to obtain computing resource information. The computing resource information includes an application list and application resources, where the application list indicates at least one application program deployed on each of at least one high-performance computer of the at least one available computing cluster, the application resources indicate at least one computing queue contained in each high-performance computer of the at least one computing cluster, and each computing queue includes the number of cores of its corresponding high-performance computer.
Preferably, the job request description information further includes one or more of the job name, the version of the application required by the job, and the estimated running time of the job.
In a second aspect, the present application proposes a task scheduling method. The method includes: receiving a job execution request of a job to be executed, the job execution request including request description information of the job to be executed, and the request description information including a job scheduling algorithm name and a global identifier of the job to be executed; determining, by the current scheduling service module, among at least one scheduling service module, that matches the job scheduling algorithm name, a job scheduling result according to the request description information and the computing resource description information of at least one available computing cluster, where the job scheduling result includes the job global identifier, the device identifier of the high-performance computer used to execute the job to be executed, and a job execution service name; and receiving, by the current job execution service module, among at least one job execution service module, that matches the job execution service name, the scheduling result determined by the current scheduling service module, and submitting the job to be executed to the high-performance computer used to execute it according to the device identifier and the global identifier of the job to be executed contained in the scheduling result.
Preferably, the method further includes: generating, by the current scheduling service module, job description information using the request description information and the job scheduling result, and providing the job description information to the job request collection and distribution module; and distributing, by the job request collection and distribution module, the job description information to the current job execution service module according to the job execution service name carried in the job scheduling result contained in the job description information.
Preferably, the request description information further includes the name of the application required by the job, the name of the queue required by the job, and the number of high-performance computer cores required by the job;
and the method may further include obtaining computing resource information by the current scheduling service module. The computing resource information includes an application list and application resources, where the application list indicates at least one application program deployed on each of at least one high-performance computer of the at least one available computing cluster, the application resources indicate at least one computing queue contained in each high-performance computer of the at least one computing cluster, and each computing queue includes the number of cores of its corresponding high-performance computer.
Preferably, in the method, the job request description information further includes the job name, the version of the application required by the job, and the estimated running time of the job.
With the task scheduling method and system provided by the present application, a task scheduling algorithm can be developed in strict accordance with the standard and finally packaged as an independent service, that is, an independent scheduling service module. Multiple scheduling services do not affect one another; after registration, each scheduling service can be deployed directly into the computing cluster environment without modifying the original code and without affecting existing services, giving the system high scalability.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a task scheduling system provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of another task scheduling system provided in an embodiment of the present application;
FIG. 3 is a schematic flowchart of a task scheduling method provided in an embodiment of the present application;
FIG. 4 is a schematic flowchart of another task scheduling method provided in an embodiment of the present application.
Detailed Description of the Embodiments
The technical solutions provided by the present invention are described in further detail below with reference to the drawings and embodiments.
In the embodiments of the present application, the device for integrating multiple task scheduling algorithms for a cross-cluster computing service environment decouples the task scheduling algorithms from the computing environment: each task scheduling algorithm is an independent service. The device provides scheduling algorithm developers with a cluster computing resource information query interface and clearly defines the standard input and output format of a task scheduling service. Every task scheduling algorithm is developed in strict accordance with this standard. When the integration device oriented to the high-performance computing environment is used to integrate different scheduling algorithms, there is no need to understand the implementation details of the algorithms; integration simply follows the standard process, which gives the system high scalability. Likewise, scheduling algorithm developers do not need to know the implementation details of the system code; as long as they exchange information in the standard format, their scheduling algorithms can be integrated into the cross-cluster computing service environment.
Each task scheduling algorithm is encapsulated as a service with unified service input and output information. During integration, service registration must be performed first; an authorized service receives an authorization code, which is written into the service's configuration file, and the service can then be started. Each service is a jar file and is started with the Java startup command java -jar ***.jar.
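As an illustration of this packaging model (not part of the original disclosure), the following minimal Java sketch shows how such a service might read its assigned service name and authorization code from a configuration file at startup before connecting to the message middleware; the file name config.properties and the property keys are assumptions made for the example.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

// Minimal sketch of a scheduling-service entry point packaged as a jar and
// launched with "java -jar scheduling-service.jar". The configuration file
// name and property keys below are illustrative assumptions.
public class SchedulingServiceMain {

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        try (FileInputStream in = new FileInputStream("config.properties")) {
            config.load(in);
        }

        // Authorization code obtained during service registration.
        String authCode = config.getProperty("auth.code");
        // Service name assigned by the administrator; for a scheduling
        // service this is the name of its scheduling algorithm (e.g. AWFS).
        String serviceName = config.getProperty("service.name");

        if (authCode == null || authCode.isEmpty()) {
            throw new IllegalStateException("Service is not registered: missing auth.code");
        }

        System.out.printf("Starting service '%s' with authorization code %s%n",
                serviceName, authCode);
        // ... connect to the message middleware and begin polling the message
        // queue named after serviceName (see the flow sketch later in this section).
    }
}
```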
FIG. 1 is a schematic diagram of a task scheduling system provided by the present application. As shown in FIG. 1, the device for integrating multiple task scheduling algorithms for cross-cluster computing services includes at least one job request collection and distribution module 101, at least one scheduling service module 102, and at least one job execution service module 103. The numbers of job request collection and distribution modules 101, scheduling service modules 102, and job execution service modules 103 may be equal or different. When the task scheduling system adopts a distributed, cross-domain multi-cluster environment, multiple instances of the above modules can be deployed in the system. In addition, since HPCs are usually distributed across different locations, the implementation form and geographic location of the other modules also vary according to requirements.
In a possible embodiment, the job submission service, the scheduling services, and the job execution services are all basic computer services and can be implemented as jar packages. The job request collection and distribution module may be message middleware. As shown in FIG. 2, the job submission service, the message middleware, the at least one scheduling service, and the at least one job execution service are deployed on different servers.
It should be noted that the job submission service, the message middleware, the at least one scheduling service, and the at least one job execution service may all be deployed on a single server, or may be deployed on multiple servers respectively. The deployment shown in FIG. 2 is only one specific implementation provided by the embodiments of the present application and does not limit how the services and the message middleware may be deployed.
During integration, both the scheduling services and the job execution services need to be registered. At registration time, the administrator assigns a service name to each registered service and creates, in the message middleware, a message queue named after that service name for each service.
In one example, when assigning a service name to a scheduling service, the administrator may use the algorithm name of the scheduling algorithm used by that scheduling service as the service name.
In one example, since each job execution service corresponds to one high-performance computer, the upper-level management program may, when assigning a service name to a job execution service, use the name of the high-performance computer corresponding to that job execution service as the name of the job execution service.
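A registration step following these naming rules could look like the sketch below; the RegisteredService record, the UUID-based authorization code, and the in-memory queue registry are illustrative assumptions standing in for the administrator tooling and the real message middleware.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative registration step: assign a service name following the naming
// rules above (scheduling services are named after their algorithm, job
// execution services after their HPC) and create a message queue with that
// name. The in-memory registry and the UUID authorization code are assumptions.
public class ServiceRegistry {

    public record RegisteredService(String serviceName, String authorizationCode) {}

    private final Map<String, BlockingQueue<String>> queues = new ConcurrentHashMap<>();

    public RegisteredService registerSchedulingService(String algorithmName) {
        return register(algorithmName);
    }

    public RegisteredService registerJobExecutionService(String hpcName) {
        return register(hpcName);
    }

    private RegisteredService register(String serviceName) {
        // One message queue per registered service, named after the service.
        queues.putIfAbsent(serviceName, new LinkedBlockingQueue<>());
        // Authorized services receive a code to write into their configuration file.
        return new RegisteredService(serviceName, UUID.randomUUID().toString());
    }

    public BlockingQueue<String> queueOf(String serviceName) {
        return queues.get(serviceName);
    }
}
```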
In operation, a user submits a job request through the job submission service. The job submission service checks the validity of the job description information of the job to be executed contained in the job request and sends the qualified job description information to the message middleware. The message middleware stores the job description information, according to the job scheduling algorithm name it carries, into the message queue matching that job scheduling algorithm name.
Each scheduling service periodically receives job description information from its corresponding message queue and determines a job scheduling result according to the received job description information and the computing resource description information of at least one available computing cluster; the job scheduling result includes the name of a high-performance computer. The scheduling service stores the job scheduling result and the job description information in the message queue matching the name of the high-performance computer contained in the job scheduling result.
Each job execution service periodically receives job description information and scheduling result information from its corresponding message queue and, according to the HPC name assigned in the scheduling result, submits the job to the specified HPC for execution.
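The submit, schedule, and execute hand-offs described above can be illustrated with the following self-contained sketch, in which in-memory queues stand in for the message middleware; the HPC name hpc-era, the application VASP, and the job identifier are invented example values, and only the algorithm name AWFS is taken from the text.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch of the submit -> schedule -> execute flow described above.
// In-memory queues stand in for the message middleware; in the described system
// these would be queues in real middleware, one per registered service, named
// after the service (scheduling algorithm name or HPC name).
public class SchedulingFlowSketch {

    // queue name (service name) -> message queue
    static final Map<String, BlockingQueue<Map<String, String>>> QUEUES = new ConcurrentHashMap<>();

    static BlockingQueue<Map<String, String>> queue(String serviceName) {
        return QUEUES.computeIfAbsent(serviceName, k -> new LinkedBlockingQueue<>());
    }

    public static void main(String[] args) throws InterruptedException {
        // 1. Job submission service publishes validated job description information;
        //    the middleware routes it to the queue named after the scheduling algorithm.
        Map<String, String> jobDescription = new ConcurrentHashMap<>(Map.of(
                "globalJobId", "job-0001",
                "schedulingAlgorithm", "AWFS",
                "application", "VASP",
                "requiredCores", "64"));
        queue(jobDescription.get("schedulingAlgorithm")).put(jobDescription);

        // 2. Scheduling service consumes from the queue named after its algorithm,
        //    picks an HPC (selection logic omitted) and republishes to that HPC's queue.
        Map<String, String> scheduled = queue("AWFS").take();
        scheduled.put("deviceId", "hpc-era");          // device identifier of the chosen HPC
        scheduled.put("executionService", "hpc-era");  // job execution service name
        queue(scheduled.get("executionService")).put(scheduled);

        // 3. Job execution service consumes from the queue named after its HPC
        //    and would now submit the job to that machine.
        Map<String, String> toRun = queue("hpc-era").take();
        System.out.println("Submitting " + toRun.get("globalJobId") + " to " + toRun.get("deviceId"));
    }
}
```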
In the task scheduling system shown in FIG. 1, the request collection and distribution module 101 receives a job execution request of a job to be executed, and the job execution request includes request description information of the job to be executed. Table 1 illustrates an example of the request description information. As shown in Table 1, the request description information contains at least the job scheduling algorithm name and the global identifier of the job to be executed. Optionally, the description information further contains one or more of the job name, the name of the application required by the job, the version of the application required by the job, the name of the queue required by the job, the number of cores required by the job, and the estimated running time of the job. Here, an application refers to an application provided by the high-performance computing environment; a queue refers to a computing queue that each application in the high-performance computing environment can use; the number of cores refers to the number of cores of each computer in the computing queue; and the estimated running time of the job refers to the minimum running time required to run the job with the specified application, on the specified job queue, and with the specified number of computer cores.
Table 1
[Table 1 is published as an image (PCTCN2021114299-appb-000001) in the original and is not reproduced here.]
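Based on the fields listed above, the request description information could be modeled as in the following sketch; the field names and example values are assumptions made for illustration rather than the exact layout of Table 1.

```java
// Illustrative data model for the request description information of Table 1.
// Field names and the example values are assumptions for this sketch.
public record JobRequestDescription(
        String globalJobId,            // global identifier of the job to be executed
        String schedulingAlgorithm,    // job scheduling algorithm name, e.g. "AWFS"
        String jobName,                // optional
        String applicationName,        // optional: application required by the job
        String applicationVersion,     // optional
        String queueName,              // optional: computing queue required by the job
        int requiredCores,             // optional: number of HPC cores required
        int estimatedRuntimeMinutes) { // optional: estimated running time

    public static JobRequestDescription example() {
        return new JobRequestDescription(
                "job-0001", "AWFS", "lattice-relaxation",
                "VASP", "6.3", "normal", 64, 120);
    }
}
```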
The request collection and distribution module 101 determines one scheduling service module 102 from the at least one scheduling service module 102 according to the job scheduling algorithm name in the description information, and sends the job execution request to that scheduling service module 102. If the job scheduling algorithm name is absent, the scheduling service module 102 may be determined according to other information in the description information, or may be designated arbitrarily.
Each of the at least one scheduling service modules 102 runs its own job scheduling algorithm, and different scheduling service modules 102 run different job scheduling algorithms; each is configured with a cluster computing resource information query interface. The device can integrate different scheduling algorithms; the algorithms integrated so far include AWFS (Application Weight First Schedule, a load-priority scheduling algorithm) and ATFS (Application Time First Schedule, a time-priority scheduling algorithm).
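The internals of AWFS and ATFS are not disclosed in this text, so the sketch below only illustrates the general shape of such a pluggable algorithm: a load-first strategy that, among queues able to run the job, prefers the one with the most free cores. The selection rule is an assumption and is not the patented algorithms.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative, load-first selection strategy. This is NOT the AWFS/ATFS logic
// of the application (which is not disclosed here); it only shows how a
// scheduling algorithm can be plugged in as an independent service.
public class LoadFirstScheduler {

    // Simplified view of one computing queue on one HPC.
    public record QueueInfo(String hpcDeviceId, String queueName,
                            String applicationName, int freeCores) {}

    public Optional<QueueInfo> select(List<QueueInfo> queues,
                                      String requiredApplication,
                                      int requiredCores) {
        return queues.stream()
                .filter(q -> q.applicationName().equals(requiredApplication))
                .filter(q -> q.freeCores() >= requiredCores)
                // "load first": prefer the queue with the most free cores
                .max(Comparator.comparingInt(QueueInfo::freeCores));
    }
}
```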
After receiving the job execution request, the scheduling service module, among the at least one scheduling service module 102, that matches the job scheduling algorithm name determines a job scheduling result according to the request description information and the computing resource description information of at least one available computing cluster.
In one example, the scheduling service module determines the computing resources required by the job to be executed according to the description information, obtains the currently available cluster computing resource information through the interfaces of the cross-cluster computing service environment, and then determines the job scheduling result from the two.
In one example, the cross-cluster computing service environment provides interfaces for querying cluster computing resource information. The interfaces include one or more of a high-performance computer (HPC) list query interface, an application list query interface, an application resource query interface, and a job query interface, or a combination thereof. The detailed description and usage of each interface are shown in Table 2.
Table 2
[Table 2 is published as an image (PCTCN2021114299-appb-000002) in the original and is not reproduced here.]
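Since Table 2 is only available as an image, the following Java interface is a hypothetical rendering of the four query interfaces named in the text; the method names and return types are assumptions.

```java
import java.util.List;

// Hypothetical shape of the cluster computing resource query interfaces named
// in the text (HPC list, application list, application resources, jobs). The
// method names and return types are assumptions; Table 2, which defines the
// real interfaces, is only published as an image.
public interface ClusterResourceQuery {

    // HPC list query interface: the high-performance computers in the environment.
    List<String> listHpcNames();

    // Application list query interface: applications deployed on a given HPC.
    List<String> listApplications(String hpcName);

    // Application resource query interface: computing queues usable by an
    // application on a given HPC, including the number of cores of each queue.
    List<QueueResource> listApplicationResources(String hpcName, String applicationName);

    // Job query interface: status of a previously submitted job.
    String queryJobStatus(String globalJobId);

    record QueueResource(String queueName, int totalCores, int freeCores) {}
}
```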
In one example, different applications may be deployed in the high-performance computing environment, and each application can be identified by an application name. If the description information specifies the application name of the application required by the job, the scheduling service module can query that application through the application resource query interface according to the application name. In some cases an application has different versions; if the description information specifies the application version required by the job, the scheduling service module can query the corresponding version of the required application through the application resource query interface.
In one example, an HPC in the high-performance computing environment may have different computing queues, each with a queue name. If the description information specifies the name of the queue required by the job, the scheduling service module can query, through the application resource query interface and according to that queue name, the computing queues that can be used to execute the job.
In one example, the number of available computing cores differs from queue to queue. When the computing queue required by the job is determined, the selected queue should have a number of cores that is not less than the number of high-performance computer cores required by the job.
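Resolving the optional constraints of the request (required application, required queue name, minimum core count) against those query interfaces could look like the sketch below, which reuses the hypothetical ClusterResourceQuery interface from the previous example; the matching rules follow the text, while the code itself is illustrative.

```java
import java.util.List;

// Sketch of resolving the optional constraints in the request description
// against the query interfaces above. ClusterResourceQuery is the hypothetical
// interface sketched earlier; the code is illustrative only.
public class CandidateQueueResolver {

    private final ClusterResourceQuery query;

    public CandidateQueueResolver(ClusterResourceQuery query) {
        this.query = query;
    }

    public List<ClusterResourceQuery.QueueResource> candidates(
            String hpcName, String requiredApplication,
            String requiredQueueName, int requiredCores) {
        return query.listApplicationResources(hpcName, requiredApplication).stream()
                // if a queue name was requested, keep only that queue
                .filter(q -> requiredQueueName == null || q.queueName().equals(requiredQueueName))
                // the selected queue must offer at least the requested number of cores
                .filter(q -> q.totalCores() >= requiredCores)
                .toList();
    }
}
```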
The job scheduling result of the scheduling service module 102 may be as shown in Table 3. In Table 3, the job scheduling result includes the job global identifier, the device identifier of the high-performance computer intended to execute the job to be executed, and the job execution service name. The job execution service is named after the machine name, so the scheduling result is essentially a machine name: scheduling amounts to assigning the job to a particular machine.
Table 3
[Table 3 is published as an image (PCTCN2021114299-appb-000003) in the original and is not reproduced here.]
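Continuing the illustrative data model, the scheduling result of Table 3 could be represented as follows; the field names are assumptions based on the surrounding text, since the table itself is only published as an image.

```java
// Illustrative representation of the job scheduling result of Table 3;
// field names are assumptions based on the surrounding text.
public record JobSchedulingResult(
        String globalJobId,           // job global identifier
        String hpcDeviceId,           // device identifier of the HPC chosen for the job
        String executionServiceName) { // job execution service name (here, the HPC's name)
}
```

Because the job execution service is named after the machine, routing this result simply means publishing it to the message queue whose name equals executionServiceName.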
In one example, the description information may further include the name of the application required by the job, the name of the queue required by the job, and the number of high-performance computer cores required by the job, and the scheduling service module is further used to obtain computing resource information. The computing resource information includes an application list and application resources, where the application list indicates at least one application program deployed on each of at least one high-performance computer of the at least one available computing cluster, the application resources indicate at least one computing queue contained in each high-performance computer (High Performance Computer, HPC) of the at least one computing cluster, and each computing queue may include the number of cores of its corresponding high-performance computer. In one example, a computing cluster may have multiple high-performance computers, and a high-performance computer may have multiple queues.
In some possible embodiments, the current scheduling service module may further be used to generate job description information from the request description information and the job scheduling result, and to provide the job description information to the job request collection and distribution module. The job request collection and distribution module then distributes the job description information, according to that job description information, to the job execution service module matching the job execution service name.
The job description information is shown in Table 4.
Table 4
[Table 4 is published as an image (PCTCN2021114299-appb-000004) in the original and is not reproduced here.]
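Table 4 is likewise published only as an image; since the text states that the job description information is generated from the request description information and the job scheduling result, one plausible, purely illustrative composition is the following.

```java
// Illustrative combination of the request description information and the job
// scheduling result into the job description information of Table 4. The exact
// layout of Table 4 is not reproduced (it is published as an image), so this
// composition is an assumption based on the surrounding text.
public record JobDescription(
        JobRequestDescription request,          // what the user asked for
        JobSchedulingResult schedulingResult) { // where the job was placed

    // The collection and distribution module routes this record to the queue
    // named after the job execution service carried in the scheduling result.
    public String targetExecutionService() {
        return schedulingResult.executionServiceName();
    }
}
```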
The current job execution service module, among the at least one job execution service module 103, that matches the job execution service name is used to receive the scheduling result determined by the scheduling service module and, according to the device identifier and the global identifier of the job to be executed contained in the scheduling result, submit the job to be executed to the high-performance computer used to execute it.
In one example, a job execution service can be named after a high-performance computer, and the scheduling result information includes the name of the high-performance computer, that is, the high-performance computer on which the job is to be executed. The job execution service named after that computer receives the message and then submits the job to the high-performance computer for execution. The function of the job execution service is simply to receive job information and submit it to the high-performance computer for execution; this is needed because services cannot be deployed on the high-performance computer itself and it cannot receive messages, so a front-end service must receive the information and then submit the job to the high-performance computer.
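How the front-end job execution service actually hands the job to the HPC's own job management system is not specified here; the sketch below assumes remote submission to a Slurm batch scheduler over SSH, which is purely an illustrative assumption, and reuses the JobSchedulingResult record from the earlier example.

```java
import java.io.IOException;
import java.util.List;

// Sketch of the "front-end" job execution service: it receives the scheduling
// result and forwards the job to its HPC. The submission mechanism is not
// specified in the text; this sketch assumes remote submission to a Slurm
// batch system over SSH, which is an illustrative assumption only.
public class JobExecutionForwarder {

    private final String hpcLoginHost; // login node of the HPC this service represents

    public JobExecutionForwarder(String hpcLoginHost) {
        this.hpcLoginHost = hpcLoginHost;
    }

    public void submit(JobSchedulingResult result, String jobScriptPath)
            throws IOException, InterruptedException {
        // Tag the submission with the global job identifier so its status can be
        // traced back later through the job query interface.
        List<String> command = List.of(
                "ssh", hpcLoginHost,
                "sbatch", "--job-name=" + result.globalJobId(), jobScriptPath);

        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Job submission failed with exit code " + exitCode);
        }
    }
}
```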
In the above embodiments, the job request collection and distribution module decouples the otherwise tightly coupled scheduling service modules and job execution service modules, which speeds up problem resolution and, most importantly, reduces the likelihood of latent problems surfacing in the future.
FIG. 3 is a flowchart of a task scheduling method according to an embodiment of the present application; the method can be implemented in the system shown in FIG. 1.
As shown in FIG. 3, the method may include at least the following steps 301, 302, and 305.
First, in step 301, a job execution request of a job to be executed is received; the request includes request description information of the job to be executed, and the request description information includes the job scheduling algorithm name and the global identifier of the job to be executed.
Next, in step 302, a job scheduling result is determined by the job scheduling algorithm according to the request description information and the computing resource description information of at least one available computing cluster. In one example, the job scheduling result includes the job global identifier, the device identifier of the high-performance computer used to execute the job to be executed, and the job execution service name.
Then, in step 305, the scheduling result is received by the current job execution service module matching the job execution service name in the job scheduling result, and the job to be executed is submitted, according to the device identifier and the global identifier of the job contained in the scheduling result, to the high-performance computer used for the job.
Finally, the designated high-performance computer executes the requested job.
FIG. 4 is a schematic flowchart of another task scheduling method provided in an embodiment of the present application. Compared with the flow of FIG. 3, FIG. 4 further includes step 303 and step 304 before step 305.
In step 303, the scheduling service module currently performing the scheduling generates job description information from the request description information and the job scheduling result, and provides the job description information to the job request collection and distribution module.
In step 304, the job request collection and distribution module distributes the job description information, according to that job description information, to the job execution service module matching the job execution service name.
Correspondingly, step 305 may specifically include step 3051: the job execution service module matching the job execution service name receives the job description information and, according to the device identifier and global identifier contained in the job description information, submits the job to be executed to the high-performance computer used to execute it.
Finally, the designated high-performance computer executes the requested job.
In some possible embodiments, a job submission service module may be deployed on the user terminal; the job submission service module checks the validity of the job description information and sends the qualified job description information to the job request collection and distribution module.
The task scheduling method and system provided by the present application give scheduling algorithm developers a cluster computing resource information query interface and clearly define the standard input and output format of the task scheduling service. Every task scheduling algorithm is developed in strict accordance with this standard, and the scheduling framework oriented to the high-performance computing environment acts as a task scheduling device that contains multiple task scheduling services; users do not need to understand the implementation details of the various task scheduling algorithms and can integrate them by following the standard process, which gives the system high scalability. Scheduling algorithm developers likewise do not need to know the implementation details of the system code; as long as they exchange information in the standard format, their scheduling algorithms can be integrated into the cross-cluster computing service environment.
Numerous specific details are set forth in the description provided herein. It will be understood, however, that the embodiments of the present invention can be implemented without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to encompass them.

Claims (7)

  1. A task scheduling system, comprising:
    a job request collection and distribution module, configured to receive a job execution request of a job to be executed, the job execution request comprising request description information of the job to be executed, and the request description information comprising a job scheduling algorithm name and a global identifier of the job to be executed;
    at least one scheduling service module, each configured with a cluster computing resource information query interface and each running a job scheduling algorithm, each job scheduling algorithm having a job scheduling algorithm name, wherein the scheduling service module matching the job scheduling algorithm name obtains computing resource information of at least one available computing cluster through its cluster computing resource information query interface and determines a job scheduling result of the job to be executed by means of its job scheduling algorithm, the job scheduling result comprising the job global identifier, a device identifier of a high-performance computer in the at least one available computing cluster that is intended to execute the job to be executed, and a job execution service name; and
    at least one job execution service module, wherein the job execution service module matching the job execution service name receives the job scheduling result and submits the job to be executed, identified by the job global identifier, to the high-performance computer designated by the device identifier.
  2. The task scheduling system according to claim 1, wherein the current scheduling service module is further configured to generate job description information using the request description information and the job scheduling result, and to provide the job description information to the job request collection and distribution module;
    and the job request collection and distribution module is further configured to distribute the job description information to the job execution service module designated by the job execution service name carried in the job scheduling result.
  3. The task scheduling system according to claim 1, wherein the request description information further comprises one or more of a name of an application required by the job, a name of a queue required by the job, and a number of high-performance computer cores required by the job;
    the scheduling service module, among the at least one scheduling service module, that matches the job scheduling algorithm name is further configured to obtain computing resource information, the computing resource information comprising an application list and application resources, wherein the application list indicates at least one application program deployed on each of at least one high-performance computer of the at least one available computing cluster, the application resources indicate at least one computing queue contained in each high-performance computer of the at least one computing cluster, and each computing queue comprises the number of cores of its corresponding high-performance computer;
    the scheduling service module matching the job scheduling algorithm name is further configured to determine the job scheduling result of the job to be executed according to one or more of the name of the application required by the job, the name of the queue required by the job, and the number of high-performance computer cores required by the job;
    wherein the scheduling service module matching the job scheduling algorithm name determines a high-performance computer that contains the computing queue designated by the name of the queue required by the job as the high-performance computer for executing the job to be executed; and/or
    the scheduling service module matching the job scheduling algorithm name determines a high-performance computer whose number of available cores is not less than the number of high-performance computer cores required by the job as the high-performance computer for executing the job to be executed; and/or
    the scheduling service module matching the job scheduling algorithm name determines a high-performance computer on which the application program designated by the name of the application required by the job is deployed as the high-performance computer for executing the job to be executed.
  4. The task scheduling system according to claim 1, wherein the request description information further comprises one or more of a job name, a version of the application required by the job, and an estimated running time of the job.
  5. A task scheduling method, comprising:
    receiving a job execution request of a job to be executed, the job execution request comprising request description information of the job to be executed, and the request description information comprising a job scheduling algorithm name and a global identifier of the job to be executed, wherein the job scheduling algorithm name is the name of a job scheduling algorithm and the job scheduling algorithm is configured with a cluster computing resource information query interface;
    obtaining, through the cluster computing resource information query interface of the job scheduling algorithm matching the job scheduling algorithm name, computing resource description information of at least one available computing cluster, and determining, by means of the job scheduling algorithm and according to the computing resource description information, a job scheduling result of the job to be executed, the job scheduling result comprising the global identifier of the job to be executed, a device identifier of a high-performance computer in the at least one available computing cluster that is intended to execute the job to be executed, and a job execution service name; and
    submitting, by a job execution service matching the job execution service name and according to the device identifier and the global identifier of the job to be executed contained in the scheduling result, the job to be executed identified by the job global identifier to the high-performance computer designated by the device identifier for executing the job.
  6. The method according to claim 5, wherein the request description information further comprises one or more of a name of an application required by the job, a name of a queue required by the job, and a number of high-performance computer cores required by the job;
    the computing resource information comprises an application list and application resources, wherein the application list indicates at least one application program deployed on each of at least one high-performance computer of the at least one available computing cluster, the application resources indicate at least one computing queue contained in each high-performance computer of the at least one computing cluster, and each computing queue comprises the number of cores of its corresponding high-performance computer;
    and the determining of the job scheduling result of the job to be executed according to the computing resource description information comprises:
    determining a high-performance computer that contains the computing queue designated by the name of the queue required by the job as the high-performance computer for executing the job to be executed; and/or
    determining a high-performance computer whose number of available cores is not less than the number of high-performance computer cores required by the job as the high-performance computer for executing the job to be executed; and/or
    determining a high-performance computer on which the application program designated by the name of the application required by the job is deployed as the high-performance computer for executing the job to be executed.
  7. The method according to claim 5, wherein the request description information further comprises one or more of a job name, a version of the application required by the job, and an estimated running time of the job.
PCT/CN2021/114299 2020-11-23 2021-08-24 Task scheduling method and system WO2022105337A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/635,260 US20230342191A1 (en) 2020-11-23 2021-08-24 Task Scheduling Method and System

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011322687.2 2020-11-23
CN202011322687.2A CN112306719B (zh) 2020-11-23 2020-11-23 一种任务调度方法与装置

Publications (1)

Publication Number Publication Date
WO2022105337A1 true WO2022105337A1 (zh) 2022-05-27

Family

ID=74336157

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/114299 WO2022105337A1 (zh) 2020-11-23 2021-08-24 一种任务调度方法与系统

Country Status (3)

Country Link
US (1) US20230342191A1 (zh)
CN (1) CN112306719B (zh)
WO (1) WO2022105337A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794355A (zh) * 2023-01-29 2023-03-14 中国空气动力研究与发展中心计算空气动力研究所 任务处理方法、装置、终端设备及存储介质
CN116866438A (zh) * 2023-09-04 2023-10-10 金网络(北京)数字科技有限公司 一种跨集群任务调度方法、装置、计算机设备及存储介质

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112306719B (zh) * 2020-11-23 2022-05-31 中国科学院计算机网络信息中心 一种任务调度方法与装置
CN115454450B (zh) * 2022-09-15 2024-04-30 北京火山引擎科技有限公司 一种针对数据作业的资源管理的方法、装置、电子设备和存储介质
CN117056061B (zh) * 2023-10-13 2024-01-09 浙江远算科技有限公司 一种基于容器分发机制的跨超算中心任务调度方法和系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599026A (zh) * 2009-07-09 2009-12-09 浪潮电子信息产业股份有限公司 一种具有弹性架构的集群作业调度系统
CN103324534A (zh) * 2012-03-22 2013-09-25 阿里巴巴集团控股有限公司 作业调度方法及其调度器
CN104123182A (zh) * 2014-07-18 2014-10-29 西安交通大学 基于主从架构的MapReduce任务跨数据中心调度系统及方法
US20160098292A1 (en) * 2014-10-03 2016-04-07 Microsoft Corporation Job scheduling using expected server performance information
CN111126895A (zh) * 2019-11-18 2020-05-08 青岛海信网络科技股份有限公司 一种复杂场景下调度智能分析算法的管理仓库及调度方法
CN112306719A (zh) * 2020-11-23 2021-02-02 中国科学院计算机网络信息中心 一种任务调度方法与装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11334806B2 (en) * 2017-12-22 2022-05-17 International Business Machines Corporation Registration, composition, and execution of analytics in a distributed environment
CN108965024B (zh) * 2018-08-01 2021-08-13 重庆邮电大学 一种5g网络切片基于预测的虚拟网络功能调度方法
CN109976894B (zh) * 2019-04-03 2023-07-25 中国科学技术大学苏州研究院 一种平台无关的可扩展的分布式系统任务调度支撑框架
CN110333939B (zh) * 2019-06-17 2023-11-14 腾讯科技(成都)有限公司 任务混合调度方法、装置、调度服务器及资源服务器
CN110795223A (zh) * 2019-10-29 2020-02-14 浪潮云信息技术有限公司 一种针对资源统一管理的集群调度系统及方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599026A (zh) * 2009-07-09 2009-12-09 浪潮电子信息产业股份有限公司 一种具有弹性架构的集群作业调度系统
CN103324534A (zh) * 2012-03-22 2013-09-25 阿里巴巴集团控股有限公司 作业调度方法及其调度器
CN104123182A (zh) * 2014-07-18 2014-10-29 西安交通大学 基于主从架构的MapReduce任务跨数据中心调度系统及方法
US20160098292A1 (en) * 2014-10-03 2016-04-07 Microsoft Corporation Job scheduling using expected server performance information
CN111126895A (zh) * 2019-11-18 2020-05-08 青岛海信网络科技股份有限公司 一种复杂场景下调度智能分析算法的管理仓库及调度方法
CN112306719A (zh) * 2020-11-23 2021-02-02 中国科学院计算机网络信息中心 一种任务调度方法与装置

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794355A (zh) * 2023-01-29 2023-03-14 中国空气动力研究与发展中心计算空气动力研究所 任务处理方法、装置、终端设备及存储介质
CN116866438A (zh) * 2023-09-04 2023-10-10 金网络(北京)数字科技有限公司 一种跨集群任务调度方法、装置、计算机设备及存储介质
CN116866438B (zh) * 2023-09-04 2023-11-21 金网络(北京)数字科技有限公司 一种跨集群任务调度方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
CN112306719B (zh) 2022-05-31
US20230342191A1 (en) 2023-10-26
CN112306719A (zh) 2021-02-02

Similar Documents

Publication Publication Date Title
WO2022105337A1 (zh) 一种任务调度方法与系统
US11875173B2 (en) Execution of auxiliary functions in an on-demand network code execution system
US10817331B2 (en) Execution of auxiliary functions in an on-demand network code execution system
JP7197612B2 (ja) オンデマンドネットワークコード実行システム上での補助機能の実行
JP7275171B2 (ja) オンデマンドネットワークコード実行システムにおけるオペレーティングシステムカスタマイゼーション
US11243953B2 (en) Mapreduce implementation in an on-demand network code execution system and stream data processing system
WO2018149221A1 (zh) 一种设备管理方法及网管系统
US11119813B1 (en) Mapreduce implementation using an on-demand network code execution system
US9262210B2 (en) Light weight workload management server integration
TW201826120A (zh) 一種應用資源管理方法、使用方法及裝置
CN106933664B (zh) 一种Hadoop集群的资源调度方法及装置
WO2021227999A1 (zh) 云计算服务系统和方法
CN109117252B (zh) 基于容器的任务处理的方法、系统及容器集群管理系统
Somasundaram et al. CARE Resource Broker: A framework for scheduling and supporting virtual resource management
JP2005056391A (ja) コンピューティング環境の作業負荷を均衡させる方法およびシステム
CN103873534A (zh) 一种应用集群迁移方法及装置
CN110806928A (zh) 一种作业提交方法及系统
CN112532683A (zh) 一种基于微服务架构下的边缘计算方法和设备
WO2020108337A1 (zh) 一种cpu资源调度方法及电子设备
US11144359B1 (en) Managing sandbox reuse in an on-demand code execution system
CN111913784B (zh) 任务调度方法及装置、网元、存储介质
US11861386B1 (en) Application gateways in an on-demand network code execution system
US20200293376A1 (en) Database process categorization
Li et al. A two-stage approach for virtual resources adaptive scheduling in container cloud
CN113296968A (zh) 地址列表更新方法、装置、介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21893489

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.08.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21893489

Country of ref document: EP

Kind code of ref document: A1