CN116804941A - Batch serverless job scheduling system and method - Google Patents

Publication number
CN116804941A
CN116804941A CN202310678914.2A CN202310678914A CN116804941A CN 116804941 A CN116804941 A CN 116804941A CN 202310678914 A CN202310678914 A CN 202310678914A CN 116804941 A CN116804941 A CN 116804941A
Authority
CN
China
Prior art keywords
batch
job
server
serverless
container
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310678914.2A
Other languages
Chinese (zh)
Inventor
施经纬
沈力
白佳乐
沈震宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310678914.2A
Publication of CN116804941A
Legal status: Pending

Classifications

    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F16/176 Support for shared access to files; File sharing support
    • G06F16/182 Distributed file systems
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The specification relates to the technical field of distributed batch scheduling and discloses a batch serverless job scheduling system and method. The system comprises a batch serverless scheduling server, a distributed batch scheduling platform, and a container cloud platform. The distributed batch scheduling platform writes batch jobs into a batch registry, and the batch registry updates the job state of each batch job according to its execution result. The batch serverless scheduling server monitors the batch jobs in the batch registry; when it detects a new serverless job, it sends a resource application request to the container cloud platform, and when a job is completed, it sends a container destruction request to the container cloud platform. The container cloud platform responds to the resource application request by generating a corresponding serverless container in the serverless resource pool, and responds to the container destruction request by destroying the corresponding serverless container. The system can improve resource utilization.

Description

Batch serverless job scheduling system and method
Technical Field
The specification relates to the technical field of distributed batch scheduling, in particular to a batch serverless job scheduling system and method.
Background
Existing batch scheduling is based on a batch scheduling flow: batch jobs are scheduled in sequence according to fixed times, predecessor and successor dependencies, logic branches, and the like; they read the data to be processed and distribute summarized data so that subsequent online or batch business can run normally. In addition, after distributed transformation, the databases of many applications are designed with database and table sharding, and after each database has processed its data independently, further processing such as sorting and summarizing is still needed. The application must therefore apply for a separate set of databases dedicated to this processing in order to process the sharded data centrally.
Because of the nature of batch processing, batch execution in application systems is concentrated mainly at night, and the batch jobs of some applications are concentrated in only one or two hours in the early morning. Nevertheless, the batch executor role requires fixed resources (such as virtual machines or containers) to be applied for in order to deploy the batch programs, so overall resource utilization is very low. In addition, after distributed transformation, a dedicated database must be applied for, or one of the sharded databases must be reused, to perform centralized summarizing, sorting, splitting, and similar processing, which leads to mismatched performance capacity and wasted equipment resources. Moreover, high availability can rely only on the high-availability mechanism of that database, so there is a risk that batch jobs cannot be supported in extreme cases.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the specification provides a batch serverless job scheduling system and method, which are used for solving the problem of low resource utilization rate in a batch scheduling process in the prior art.
The embodiment of the specification provides a batch serverless job scheduling system, which comprises: a batch serverless scheduling server, a distributed batch scheduling platform and a container cloud platform;
the distributed batch scheduling platform is used for writing batch jobs into a batch registry; the batch registry is further used for updating the job state of a batch job according to the execution result of the batch job;
the batch serverless scheduling server is used for monitoring the batch jobs in the batch registry; when a newly added batch job is detected, judging whether the newly added batch job is a serverless job; if so, sending a resource application request to the container cloud platform, wherein the resource application request comprises identification information of the serverless job; the batch serverless scheduling server is further used for reading the job state of serverless jobs in the batch registry and sending a container destruction request to the container cloud platform, wherein the container destruction request carries identification information of the completed serverless job;
the container cloud platform is used for responding to the resource application request by generating, in a serverless resource pool, a serverless container corresponding to the serverless job; the serverless container is used for monitoring and reading the job information of the corresponding serverless job in the batch registry, triggering execution of the serverless job, and sending the execution result to the batch registry; the container cloud platform is further used for responding to the container destruction request by destroying, in the serverless resource pool, the serverless container corresponding to the completed serverless job.
In one embodiment, the distributed batch scheduling platform is configured to read the job definition of a batch job, generate batch instance information according to the job definition, and write the batch job into the batch registry according to the scheduling logic.
In one embodiment, the job information of the batch job includes serverless identification information, and the batch serverless scheduling server determines whether the batch job is a serverless job according to that serverless identification information.
In one embodiment, the job information of the batch job further includes at least one of the following: batch job ID, batch job name, batch executor group identifier, planned execution time of the batch job, ID of the job on which the batch job depends, ID of the job triggered after the batch job executes, and corresponding shard ID.
In one embodiment, the batch instance information includes at least one of: job instance ID, batch job ID, batch instance name, batch executor group identifier, planned execution time of the batch job, serverless container resource ID, corresponding shard ID, serverless identification information, and job execution status.
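The job and instance fields listed above can be pictured as two record types. The following is a minimal sketch rather than the patent's schema: the class names, field names, types, and the `INITIALIZED` status value are assumptions chosen for illustration, and only a subset of the fields is shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchJob:
    """Job definition; field names are illustrative, not from the patent."""
    job_id: str
    name: str
    executor_group: str
    planned_time: str                     # planned execution time
    serverless: bool = False              # serverless identification information
    depends_on: Optional[str] = None      # ID of the job this job depends on
    shard_id: Optional[str] = None        # corresponding shard ID


@dataclass
class BatchInstance:
    """One scheduled run of a BatchJob."""
    instance_id: str
    job_id: str
    serverless: bool
    status: str = "INITIALIZED"           # job execution status (assumed value)
    container_resource_id: Optional[str] = None  # filled once a container exists


def new_instance(job: BatchJob, instance_id: str) -> BatchInstance:
    # Generate batch instance information from the job definition.
    return BatchInstance(instance_id=instance_id, job_id=job.job_id,
                         serverless=job.serverless)


job = BatchJob("J001", "daily-summary", "grp-a", "02:00", serverless=True)
inst = new_instance(job, "I001")
print(inst)
```

Carrying the serverless flag on both records lets a scheduler decide from the instance alone whether container resources need to be requested.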
In one embodiment, the container cloud platform stores a serverless resource definition table, which stores at least one of the following: the image ID (mirror ID) corresponding to the batch job, batch job ID, resource specification, deployment park ID, timeout, and resource policy.
In one embodiment, the identification information of the serverless job is the batch job ID;
the container cloud platform is used for querying the serverless resource definition table by the batch job ID of the serverless job to find the corresponding image ID (mirror ID) and deployment park ID, and thereby obtaining the batch image information corresponding to the serverless job; the container cloud platform is further used for generating, according to the batch image information, a serverless container corresponding to the serverless job in the serverless resource pool corresponding to the deployment park ID, and returning the resource ID of the serverless container to the batch serverless scheduling server; the batch serverless scheduling server writes the serverless container resource ID corresponding to the serverless job into the job information in the batch registry.
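The lookup described in this embodiment can be sketched with in-memory dictionaries. This is an illustration under stated assumptions: the table contents, the `res-N` resource ID format, and the function name `create_serverless_container` are all hypothetical, and no real container is started.

```python
import itertools

# Hypothetical serverless resource definition table, keyed by batch job ID.
RESOURCE_DEFINITIONS = {
    "J001": {"image_id": "img-42", "park_id": "park-east", "spec": "4C8G"},
}

_counter = itertools.count(1)


def create_serverless_container(job_id, definitions, pools):
    """Look up image and park by job ID, record a container, return its resource ID."""
    definition = definitions.get(job_id)
    if definition is None:
        raise KeyError(f"no serverless resource definition for job {job_id}")
    resource_id = f"res-{next(_counter)}"
    # Place the container in the resource pool of the park named in the table.
    pool = pools.setdefault(definition["park_id"], {})
    pool[resource_id] = {"job_id": job_id, "image_id": definition["image_id"]}
    return resource_id


pools = {}                       # park ID -> containers in that park's pool
registry = {"J001": {"status": "INITIALIZED", "container_resource_id": None}}

# The scheduling server writes the returned resource ID into the registry.
rid = create_serverless_container("J001", RESOURCE_DEFINITIONS, pools)
registry["J001"]["container_resource_id"] = rid
print(rid, pools)
```

Writing the resource ID back into the job information is what later allows a container to pick out the job assigned to it.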
In one embodiment, the batch serverless scheduling server is further configured to send a container destruction request to the container cloud platform when the job state of a serverless job read from the batch registry is abnormally completed, where the container destruction request carries identification information of that serverless job; the batch serverless scheduling server is further used for judging whether the number of executions of the abnormally completed serverless job is smaller than a preset retry count and, if so, sending a resource application request to the container cloud platform so that the job is re-executed; the resource application request comprises identification information of the abnormally completed serverless job.
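The destroy-then-retry decision in this embodiment can be sketched as follows. The function and class names are hypothetical, and `RecordingCloud` is a stand-in that only records the requests it receives instead of talking to a container platform.

```python
def handle_abnormal_completion(job_id, attempts, max_retries, cloud):
    """Destroy the failed job's container, then re-apply if retries remain.

    `attempts` is how many times the job has already run. Returns True when
    a new resource application request was sent.
    """
    cloud.destroy(job_id)
    if attempts < max_retries:
        cloud.apply(job_id)
        return True
    return False


class RecordingCloud:
    """Stand-in for the container cloud platform; it only records requests."""

    def __init__(self):
        self.requests = []

    def apply(self, job_id):
        self.requests.append(("apply", job_id))

    def destroy(self, job_id):
        self.requests.append(("destroy", job_id))


cloud = RecordingCloud()
retried = handle_abnormal_completion("J001", attempts=1, max_retries=3, cloud=cloud)
gave_up = handle_abnormal_completion("J001", attempts=3, max_retries=3, cloud=cloud)
print(retried, gave_up, cloud.requests)
```

The container is destroyed in both branches, so a job that has exhausted its retries does not keep holding resources.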
In one embodiment, the batch job is a centralized batch file processing job; correspondingly, the serverless container accesses a distributed file system by means of a pre-built file processing component in order to execute the centralized batch file processing job; the file processing component is used for performing one of the following operations: merging, sorting, deduplication and splitting.
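The four operations can be illustrated on small in-memory record lists. This sketch makes several assumptions that are not in the patent: records are comma-separated text lines with a numeric key in the first field, shard outputs are already sorted, and Python lists stand in for files on the distributed file system.

```python
import heapq


def merge_sorted(*shards):
    """Merge already-sorted shard outputs into one sorted list."""
    return list(heapq.merge(*shards))


def deduplicate(records):
    """Drop repeated records, keeping first occurrences in order."""
    return list(dict.fromkeys(records))


def split_by_shard(records, shard_count, key=lambda r: int(r.split(",")[0])):
    """Distribute records to shards by a numeric key modulo shard_count."""
    shards = [[] for _ in range(shard_count)]
    for record in records:
        shards[key(record) % shard_count].append(record)
    return shards


shard_a = ["1,alice", "4,dave"]
shard_b = ["2,bob", "4,dave", "5,eve"]
merged = deduplicate(merge_sorted(shard_a, shard_b))
print(merged)
print(split_by_shard(merged, 2))
```

Unsorted input would first need `sorted(records)`; `heapq.merge` only interleaves inputs that are each sorted already.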
The embodiment of the specification also provides a batch serverless job scheduling method, which comprises the following steps:
the distributed batch scheduling platform writes batch jobs into a batch registry;
the batch serverless scheduling server monitors the batch jobs in the batch registry; when a newly added batch job is detected, it judges whether the newly added batch job is a serverless job; if so, it sends a resource application request to the container cloud platform, wherein the resource application request comprises identification information of the serverless job;
the container cloud platform responds to the resource application request by generating, in a serverless resource pool, a serverless container corresponding to the serverless job;
the serverless container monitors and reads the job information of the corresponding serverless job in the batch registry, triggers execution of the serverless job, and sends the execution result to the batch registry;
the batch registry updates the job state of the serverless job according to the execution result;
the batch serverless scheduling server reads the job state of the serverless job in the batch registry and sends a container destruction request to the container cloud platform, wherein the container destruction request carries identification information of the completed serverless job;
and the container cloud platform responds to the container destruction request by destroying, in the serverless resource pool, the serverless container corresponding to the completed serverless job.
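The method steps above can be walked through with a small in-memory simulation. Everything here is a hypothetical stand-in: the class names, the `NEW`/`SCHEDULED`/`COMPLETED` state names, and the polling style are assumptions, and the "container" is just a label in a dictionary.

```python
class Registry:
    """Stand-in for the batch registry."""

    def __init__(self):
        self.jobs = {}           # job ID -> {"serverless": bool, "status": str}

    def write(self, job_id, serverless):
        self.jobs[job_id] = {"serverless": serverless, "status": "NEW"}


class ContainerCloud:
    """Stand-in for the container cloud platform and its serverless pool."""

    def __init__(self):
        self.pool = {}           # job ID -> container (here: just a label)

    def apply(self, job_id):
        self.pool[job_id] = f"container-for-{job_id}"

    def destroy(self, job_id):
        self.pool.pop(job_id, None)


def scheduler_pass(registry, cloud):
    """One monitoring pass of the batch serverless scheduling server."""
    for job_id, job in registry.jobs.items():
        if not job["serverless"]:
            continue                         # ordinary jobs are not handled here
        if job["status"] == "NEW":
            cloud.apply(job_id)              # resource application request
            job["status"] = "SCHEDULED"
        elif job["status"] == "COMPLETED" and job_id in cloud.pool:
            cloud.destroy(job_id)            # container destruction request


def container_run(registry, job_id):
    """The serverless container executes the job and reports the result."""
    registry.jobs[job_id]["status"] = "COMPLETED"


registry, cloud = Registry(), ContainerCloud()
registry.write("J001", serverless=True)      # platform writes the job
scheduler_pass(registry, cloud)              # container is created
created = dict(cloud.pool)
container_run(registry, "J001")              # job runs, state is updated
scheduler_pass(registry, cloud)              # container is destroyed
print(created, cloud.pool)
```

The container exists only between the two scheduler passes, which is the on-demand behaviour the method aims for.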
In the embodiment of the specification, a batch serverless job scheduling system is provided that integrates batch logic scheduling with container resource scheduling, so that batch resources are allocated and used on demand and resource utilization can be greatly improved. Because the batch serverless scheduling server supports both the original form of batch scheduling and serverless scheduling, the system allows applications to migrate gradually to the new form, which makes application transformation easy to carry out and keeps development cost low. Further, by combining with a distributed file system, the system can replace centralized data processing in a higher-specification database or a separate database with processing of the data as files inside the batch executor, which further reduces resource consumption. In addition, by relying on the file processing component, the distributed file system, and the batch serverless scheduling mechanism, the system improves availability in extreme cases, for example when the primary and standby databases fail at the same time in a scenario where a single database processes the centralized data: the distributed file system provides multiple copies of the data, multiple parks are supported, and batch resource creation and scheduling can be triggered on demand, at any time, in the serverless container resource domain of any park.
Drawings
The accompanying drawings are included to provide a further understanding of the specification, and are incorporated in and constitute a part of this specification. In the drawings:
FIG. 1 illustrates a flow diagram of a container cloud platform for resource object creation, updating, and deletion in one embodiment of the present description;
FIG. 2 illustrates a functional schematic of a batch serverless dispatch server in one embodiment of the present disclosure;
FIG. 3 illustrates a functional schematic of a batch serverless dispatch server in one embodiment of the present disclosure;
FIG. 4 shows a flow diagram of file processing in an embodiment of the present description;
FIG. 5 shows a flow diagram of file processing in an embodiment of the present description;
FIG. 6 illustrates a schematic diagram of a batch serverless job identification and distribution flow in one embodiment of the present disclosure;
FIG. 7 illustrates a batch serverless job scheduling flow diagram in one embodiment of the present disclosure;
FIG. 8 illustrates a flow diagram of batch file centralization processing in one embodiment of the present description;
FIG. 9 is a schematic diagram showing the configuration of a batch serverless job scheduling system in one embodiment of the present disclosure;
FIG. 10 is a flowchart of a batch serverless job scheduling method in one embodiment of the present disclosure.
Detailed Description
The principles and spirit of the present specification will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable one skilled in the art to better understand and practice the present description, and are not intended to limit the scope of the present description in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that the embodiments of the present description may be implemented as a system, apparatus, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: complete hardware, complete software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
The embodiment of the specification provides a batch serverless job scheduling system. In one scenario example, the batch serverless job scheduling system is based primarily on container resource scheduling and batch job scheduling, that is, on a container cloud platform (PaaS, Platform as a Service) and a distributed batch scheduling platform (Distributed Batch Framework). The container cloud platform provides container (Pod) resource provisioning and orchestration services with Kubernetes as the scheduling core; an application can reserve PaaS container resources through the equipment application management process, and when a version is put into production, the corresponding containers are generated in one step from the container template settings. The distributed batch scheduling platform provides timed or condition-triggered scheduling of batch jobs (for example, shell or Java programs), arranges them according to business logic into an ordered batch job schedule, and supports the daytime and night-time batches of application-specific services.
The batch executor computing resources provided by the container cloud platform can be supplied according to the container specification that the application defines in the container template, for example 4 CPUs and 8 GB of memory, and the container cloud resource pool can provide containers of that specification as resident resources from the moment production starts. The container cloud platform also provides serverless computing capability: it can scale container resources from 1 to 0, releasing previously applied container resources under selected conditions, and re-apply for new resources to create containers when they are needed.
Referring to FIG. 1, a flow chart of resource object creation, update, and deletion on the container cloud platform is shown. As shown in FIG. 1, the flow has three parts.
The core controller processes requests. The scaling target policy is the resource object of a PaaS elastic rule, and the core controller receives operation commands on this rule resource object (elastic policy). When the elastic policy is deleted, the container replica count reverts to the default replica count of the resource at deployment time. When a rule resource object (elastic policy) is created, an elastic policy and a general horizontal container scaling instruction are created (subsequent scaling of container resources is controlled by Kubernetes). At the same time, a looping scaling-trigger coroutine is started, which continuously checks whether the scaling trigger is active.
The scaling trigger monitors its event source in a loop. The trigger polls an external event source and, according to the active state, changes the replica count from 0 to 1 or from 1 to 0 (when the current replica count is greater than 0 and the trigger is active, the general horizontal container scaling mechanism takes over scaling), and passes the metrics obtained from the event source to the metrics adapter component.
Metrics are processed and general horizontal container scaling is performed. The metrics adapter receives the metrics passed by the scaling trigger, converts them into a format recognized by Kubernetes, and passes them to the general horizontal container scaling mechanism, which finally scales the container resources.
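The trigger's handling of the boundary between zero and one replica can be sketched as a single decision function. This is a simplified reading of the description of FIG. 1, not the platform's implementation: the function name, the `min_replicas` parameter, and the use of `None` to mean "leave the replica count alone" are assumptions.

```python
def next_replicas(current, active, min_replicas=1):
    """Decide the replica count the trigger should set, or None to do nothing.

    The trigger only handles the edge between 0 and 1 replicas; when replicas
    are already above 0 and the event source is active, the general horizontal
    scaling mechanism takes over.
    """
    if current == 0 and active:
        return min_replicas      # wake up: 0 -> 1
    if current > 0 and not active:
        return 0                 # idle: scale down to zero
    return None                  # no change here; horizontal scaling may act


for current, active in [(0, True), (1, False), (3, True), (0, False)]:
    print(current, active, next_replicas(current, active))
```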
Because a batch job as a whole reads imported files, processes data, and outputs results, it usually exchanges data with upstream and downstream business systems in the form of files; data collected from and split across the different sharded databases inside the system is likewise stored and processed as files. Batch systems therefore generally rely on the file-sharing capability of a distributed file system to support cross-park file sharing, cross-park high-availability active-active deployment, and so on.
The batch scheduling process in the prior art cannot use the serverless computing capability provided by the container cloud platform. The distributed batch scheduling mechanism cannot interact with Kubernetes in the container cloud platform; it can only schedule batch jobs and trigger the batch execution of the business. After a batch finishes, it cannot notify the container cloud platform to reclaim the container resources, and for subsequent batches it cannot re-apply for container resources and redeploy the batch executor to run the batch programs.
The embodiment of the specification provides a batch serverless job scheduling system that may include a batch serverless scheduling server. The batch serverless scheduling server can identify the start and end of batch job execution and, according to the job state, generate scaling requests that Kubernetes in the container cloud platform can recognize. In this way the execution results and states of batch jobs trigger container creation and destruction on the container cloud platform, server resources are used only for the actual execution time of each batch, and resource utilization can be greatly improved.
Considering that a large number of existing batch jobs, as well as special jobs that run for a long time, still remain, the differences between executing batch jobs in an ordinary container and in a serverless container must be handled at the same time. To achieve the above object, the following functions are needed.
First, a serverless identifier can be added to the original batch job definition so that the two kinds of job are processed differently. The information included in the job information of a batch job is shown in Table 1 below.
TABLE 1
Second, resources and jobs can be scheduled together by the batch serverless scheduling server, which acts as the core module and triggers the steps in the order of resource pre-creation, batch start, batch end, and resource reclamation.
The batch serverless scheduling server can generate a container scaling rule resource object (elastic policy) according to the execution state of a batch job: when the batch job is completed, it generates a policy that scales down to 0; when a batch needs to be scheduled, it generates a policy that scales up to the minimum replica count, after which the distributed batch scheduling module initiates batch job scheduling.
Referring to FIG. 2, a functional schematic of the batch serverless scheduling server in an embodiment of the present disclosure is shown. As shown in FIG. 2, the batch serverless scheduling server may read the batch job execution state from the batch registry and, when the state is completed, generate a resource rule object that scales down to 0 (the scaling policy corresponding to the batch container, that is, a policy that sets the replica count of the batch executor to 0). In response to a batch job scheduling instruction, the batch serverless scheduling server may generate a resource rule object that scales up to the minimum replica count (a scaling policy that sets the replica count of the batch executor to the minimum replica count). The batch serverless scheduling server may send the resource rule object to the container cloud platform. After receiving the resource rule object, the container cloud platform can trigger the scaling operation to scale the PaaS container cluster up or down.
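The mapping from job state to resource rule object described for FIG. 2 can be sketched as below. The dictionary layout, the `batch-executor-<job ID>` target name, and the state names `COMPLETED` and `TO_BE_SCHEDULED` are assumptions for illustration; the patent does not specify the format of the rule object.

```python
def build_scaling_policy(job_id, job_status, min_replicas=1):
    """Return a resource rule object (elastic policy) as a plain dict, or None."""
    if job_status == "COMPLETED":
        replicas = 0                      # release the batch executor
    elif job_status == "TO_BE_SCHEDULED":
        replicas = min_replicas           # pre-create resources for the batch
    else:
        return None                       # other states need no new policy
    return {"target": f"batch-executor-{job_id}", "replicas": replicas}


print(build_scaling_policy("J001", "TO_BE_SCHEDULED"))
print(build_scaling_policy("J001", "COMPLETED"))
print(build_scaling_policy("J001", "RUNNING"))
```

In this sketch the scheduling server would send the returned object to the container cloud platform, which then performs the actual scaling.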
Third, resource exceptions and batch job exceptions can be handled together. This prevents the situation in which resources and logic, residing in different systems, run into problems and interfere with each other's scheduling, so that resources cannot be reclaimed normally and batches cannot be scheduled normally.
Referring to FIG. 3, a functional schematic of the batch serverless scheduling server in an embodiment of the present disclosure is shown. As shown in FIG. 3, the batch serverless scheduling server may read the batch job execution state from the batch registry. When the execution state of a batch job is abnormal, the batch serverless scheduling server can, according to the configured number of batch retries, first trigger the resource reclamation policy, that is, notify the container cloud platform to reduce the container replica count to 0. At the same time, the batch serverless scheduling server restarts the resource scheduling flow for the batch job. After the resources have been applied for, the batch job can be executed again, until the batch job is completed or the number of retries reaches the preset number.
In the case where the batch job is a batch file processing job, a distributed file system and a dedicated file processing component may be used to support distribution, summarizing, sorting, deduplication, and similar processing of the data of each sharded database. Referring to FIG. 4 and FIG. 5, flow charts of file processing in the embodiment of the present specification are shown. As shown in FIG. 4, files may be split and deduplicated. As shown in FIG. 5, files may be merged and deduplicated.
With continued reference to FIG. 6, a schematic diagram of the batch serverless job identification and distribution flow is shown. A batch job must explicitly identify whether it is a serverless job, and a batch job instance table is added to record information such as the correspondence between batch job instances and serverless containers and their states, as shown in Table 1 above. The information contained in the serverless job instance information is shown in Table 2 below.
TABLE 2
As shown in FIG. 6, after a batch job is deployed, the distributed batch scheduling platform (that is, the batch scheduling system) may initialize the instance table according to the job definition, and then write the information of the batch jobs to be executed into the batch registry in a unified manner according to scheduling logic, time, and so on.
In addition, for serverless computing on the container cloud platform, the image information corresponding to the application's batch jobs must be stored in advance in the corresponding batch serverless resource definition table. The information contained in the serverless resource definition table is shown in Table 3 below.
TABLE 3
Mirror ID | Job definition ID | Resource specification | Deployment park | Timeout time | Resource policy
vchar(20) | vchar(200) | vchar(20) | vchar(100) | vchar(10) | vchar(20)
Referring to FIG. 7, a batch serverless job scheduling flow chart is shown. As shown in FIG. 7, the application has as many batch executors as it has database shards, or another number that meets the application's high-availability requirement. The batch serverless scheduling server monitors the batch job instance information in the batch registry. When a new batch job appears, the server first judges whether it is a serverless job. If so, the server continues to read the job information, applies to the container cloud platform for serverless computing resources, and synchronously notifies the container cloud platform of the batch job ID. The container cloud platform obtains the corresponding job image according to the batch job ID and generates the corresponding container in the designated park according to the agreed specification. If a timeout, an error, or the like occurs during the resource application, the resource application is retried according to the resource policy, or the error is reported directly. By integrating the parsing capability for the batch scheduling protocol with the elastic resource policy generation capability of the container cloud platform, the batch serverless scheduling server can scale serverless containers elastically in combination with the execution status of the batch jobs.
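A minimal sketch of the scheduling decision just described is given below. The names `handle_new_job`, `ResourcePolicy`, and `request_container`, as well as the dictionary keys, are assumptions made for illustration; the specification only says that a failed resource application is retried according to the resource policy or reported directly.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ResourcePolicy:
    max_attempts: int = 3  # hypothetical: attempts before the error is reported


def handle_new_job(job: dict,
                   request_container: Callable[[str], str],
                   policy: ResourcePolicy) -> Optional[str]:
    """Return the container resource ID, or None for a non-serverless job."""
    if job.get("serverless_flag") != 1:
        return None  # handled by the original batch executors
    last_error = None
    for _ in range(policy.max_attempts):
        try:
            # Apply for serverless resources, passing the batch job ID along.
            return request_container(job["batch_job_id"])
        except (TimeoutError, RuntimeError) as exc:
            last_error = exc  # retry according to the resource policy
    raise RuntimeError(
        f"resource application failed for {job['batch_job_id']}"
    ) from last_error


# Usage: a stand-in platform that times out once, then succeeds.
calls = {"n": 0}


def flaky_platform(batch_job_id: str) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("container creation timed out")
    return f"res-{batch_job_id}"


resource_id = handle_new_job({"batch_job_id": "J1", "serverless_flag": 1},
                             flaky_platform, ResourcePolicy())
skipped = handle_new_job({"batch_job_id": "J2", "serverless_flag": 0},
                         flaky_platform, ResourcePolicy())
```

Non-serverless jobs never reach the platform, which is how the original batch form and the serverless form can coexist behind one registry.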
When the batch serverless scheduling server triggers batch container resource scheduling and the container resources have been deployed, the serverless container starts to monitor and read the related job information in the batch registry (the job state being the state after job initialization), filters it by the corresponding resource ID, triggers the batch job according to the original batch flow, and executes the existing batch processing logic in the container, such as reading a database, business processing, and file processing. After the job has been executed, the batch executor in the serverless container obtains the corresponding status. If the batch returns successfully, the job state in the batch registry is updated to completed. If it fails, a retry or a direct alarm is arranged according to the original batch job exception handling policy. The batch job flow then ends, and the container cloud platform recovers the resources.
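From the container's side, the behaviour above amounts to filtering registry entries by resource ID, running the existing batch logic, and reporting the result. The sketch below models the batch registry as an in-memory list of dictionaries; the field names and state strings are illustrative assumptions rather than the schema of Table 1 or Table 2.

```python
from typing import Callable, Dict, List


def run_container(registry: List[Dict], resource_id: str,
                  batch_logic: Callable[[Dict], bool]) -> None:
    """Execute the initialized jobs bound to this container's resource ID."""
    for job in registry:
        # Filter by resource ID and pick only jobs in the initialized state.
        if job["resource_id"] != resource_id or job["state"] != "initialized":
            continue
        job["state"] = "executing"
        try:
            succeeded = batch_logic(job)  # existing batch processing logic
        except Exception:
            succeeded = False
        # Report the result; a failure is left to the exception handling policy.
        job["state"] = "completed" if succeeded else "failed"


registry = [
    {"job_id": "J1", "resource_id": "res-1", "state": "initialized"},
    {"job_id": "J2", "resource_id": "res-2", "state": "initialized"},
]
run_container(registry, "res-1", batch_logic=lambda job: True)
```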
Referring to FIG. 8, a flow chart of batch file centralized processing in an embodiment of the present specification is shown. Under a distributed system, the business still needs to split, summarize, sort, and otherwise process data, but on a single database shard the data is incomplete, which causes considerable trouble for data processing. The related data processing functions are separated out of the processing logic to form a file processing tool packaged as a jar component, and the application uses this batch file processing tool to access the distributed file system in a unified manner, so that all batch executors share the data. No matter at which data processing step a single batch executor fails, new container resources can be quickly applied for through the serverless module of the container cloud platform and the batch job can be executed. In this mode, a general file processing module needs to be constructed to support functions such as splitting, summarizing, and sorting files, as shown in Table 4 below.
TABLE 4
The batch serverless job scheduling system in the embodiments of this specification combines existing container cloud, batch job scheduling, and file access processing technologies, so that batch job logic scheduling and batch container resource scheduling are fused and resource waste is greatly reduced. Meanwhile, the shared access capability of the distributed file system and the common file processing component address the high-availability pain point of centralized data processing by application systems under a distributed architecture. First, with this scheme, batch logic scheduling and container resource scheduling can be integrated, batch resources can be allocated and used on demand, and resource utilization is greatly improved. Second, through the design of the batch serverless scheduling server, the whole system supports both original-form batch scheduling and serverless scheduling, can support the gradual transition of applications to the new form, is friendly to application modification, and has a low development cost. Third, by combining a distributed file system, centralized data processing that originally required a higher-specification database or a separate database can be changed to processing in file form in a batch executor, which further reduces resource consumption.
The system in this embodiment relies on the file processing component, the distributed file system, and the batch serverless scheduling mechanism. In extreme cases, for example when the primary and standby databases fail at the same time in a scenario where a single database processes centralized data, the system can draw on the multi-copy capability provided by the distributed file system. Multi-park active-active deployment can also be supported, and batch resource creation and scheduling can be triggered on demand in the serverless container resource domain of any park, which improves availability.
Based on the foregoing, embodiments of the present disclosure provide a batch serverless job scheduling system. Referring to fig. 9, a schematic diagram of a batch serverless job scheduling system according to an embodiment of the present disclosure is shown. As shown in fig. 9, batch serverless job scheduling system 90 may include: a distributed batch scheduling platform 901, a batch serverless scheduling server 902, and a container cloud platform 903.
The distributed batch scheduling platform 901 may be used to write batch jobs to a batch registry. The batch registry may register the job information of batch jobs. The batch registry may also be used to update the job status of a batch job based on the execution result of the batch job. In one embodiment, the job status may include: waiting for execution, executing, and execution completed. Execution completed may include: normal completion and abnormal completion.
In some embodiments of the present description, the distributed batch scheduling platform 901 may be used to read the job definitions of batch jobs, generate batch instance information according to the job definitions, and write the batch jobs to the batch registry according to the scheduling logic.
The batch serverless scheduling server 902 may be used to monitor batch jobs in the batch registry. When it detects a newly added batch job, the batch serverless scheduling server 902 determines whether the new batch job is a serverless job. If so, it sends a resource application request to the container cloud platform 903. The resource application request includes the identification information of the serverless job.
The batch serverless scheduling server 902 may also be configured to read the job status of serverless jobs in the batch registry and send a container destruction request to the container cloud platform 903. The container destruction request carries the identification information of the serverless job whose job status is completed.
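The job states named above can be modelled as a small state machine. The enumeration values follow the states listed in the text; the table of allowed transitions is an assumption added for illustration, since the specification lists the states but not which changes between them are legal.

```python
from enum import Enum


class JobState(Enum):
    WAITING = "waiting for execution"
    EXECUTING = "executing"
    COMPLETED_NORMAL = "normal completion"
    COMPLETED_ABNORMAL = "abnormal completion"


# Assumed transition table (not stated in the specification).
ALLOWED = {
    JobState.WAITING: {JobState.EXECUTING},
    JobState.EXECUTING: {JobState.COMPLETED_NORMAL, JobState.COMPLETED_ABNORMAL},
    JobState.COMPLETED_NORMAL: set(),
    JobState.COMPLETED_ABNORMAL: set(),
}


def transition(current: JobState, new: JobState) -> JobState:
    """Return the new state, rejecting transitions outside the assumed table."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.name} -> {new.name}")
    return new


def is_completed(state: JobState) -> bool:
    """Execution completed covers both normal and abnormal completion."""
    return state in (JobState.COMPLETED_NORMAL, JobState.COMPLETED_ABNORMAL)


state = JobState.WAITING
state = transition(state, JobState.EXECUTING)
state = transition(state, JobState.COMPLETED_ABNORMAL)
```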
The container cloud platform 903 may be configured to generate a serverless container corresponding to a serverless job in a serverless resource pool in response to a resource application request. The serverless container may be configured to monitor and read job information of a corresponding serverless job in the batch registry, trigger execution of the serverless job, and send an execution result to the batch registry.
The container cloud platform 903 may also be configured to destroy, in response to a container destroy request, a serverless container corresponding to a completed serverless job in a serverless resource pool.
The system in this embodiment can integrate batch logic scheduling and container resource scheduling, allocate and use batch resources on demand, and greatly improve resource utilization. Because a batch serverless scheduling server is provided in the system, the system supports both original-form batch scheduling and serverless scheduling, can support the gradual transition of applications to the new form, is friendly to application modification, and has a low development cost.
In some embodiments of the present disclosure, the job information of the batch job may include serverless identification information, and the batch serverless scheduling server may determine whether the batch job is a serverless job according to the serverless identification information of the batch job. For example, a value of 1 indicates that the batch job is a serverless job, and a value of 0 indicates that it is not.
In some embodiments of the present disclosure, the job information of the batch job may further include at least one of the following: batch job ID, batch job name, batch executor group identifier, planned execution time of the batch job, predecessor (front dependent) job ID of the batch job, post-execution trigger job ID of the batch job, and shard correspondence ID. The batch job ID is the identification information of the batch job. The batch job name is the name given to the batch job. The batch executor group identifier identifies the batch executor group used to execute the batch job. The predecessor job ID identifies the batch job on which this batch job depends, and the post-execution trigger job ID identifies the batch job that is triggered after this batch job completes. The shard correspondence ID indicates the identifier of the corresponding file shard in the distributed file system.
In some embodiments of the present description, the batch instance information includes at least one of: job instance ID, batch job ID, batch instance name, batch executor group identifier, planned execution time of the batch job, serverless container resource ID, shard correspondence ID, serverless identification information, and job execution status. The distributed batch scheduling platform may generate batch job instances from the job information of batch jobs. The batch instance information corresponding to a batch job instance may include a job instance ID, i.e., the identification information of the batch job instance. The serverless container resource ID is the identification information of the serverless container that executes the batch job.
In some embodiments of the present disclosure, a serverless resource definition table may be stored in the container cloud platform, and at least one of the following items may be stored in the serverless resource definition table: the mirror ID corresponding to the batch job, batch job ID, resource specification, deployment park ID, timeout time, and resource policy. The mirror ID corresponding to the batch job identifies the mirror image information (container image) corresponding to the batch job. The deployment park ID identifies the park in which the serverless container corresponding to the batch job is located. The resource specification is the resource specification information of the serverless container. The timeout time may refer to the average run length of the batch job over its historical runs. The resource policy may refer to the policy for re-applying for serverless executor container resources when a batch execution exception occurs, or to the policy for correcting the resource specification after an error is reported.
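One way to picture the generation of batch instance information from job information is the sketch below. The field names mirror the items listed above, but the Python types, the dictionary keys, and the way the instance ID and name are derived are hypothetical choices; the specification does not define them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchInstance:
    job_instance_id: str
    batch_job_id: str
    instance_name: str
    executor_group: str
    planned_time: str
    serverless_flag: int                         # 1 = serverless job, 0 = not
    shard_id: Optional[str] = None
    container_resource_id: Optional[str] = None  # set once a container exists
    state: str = "waiting for execution"


def make_instance(job_info: dict, run_date: str) -> BatchInstance:
    """Derive one instance per job per run date (assumed naming scheme)."""
    return BatchInstance(
        job_instance_id=f"{job_info['batch_job_id']}-{run_date}",
        batch_job_id=job_info["batch_job_id"],
        instance_name=f"{job_info['name']}@{run_date}",
        executor_group=job_info["executor_group"],
        planned_time=f"{run_date} {job_info['planned_time']}",
        serverless_flag=job_info["serverless_flag"],
        shard_id=job_info.get("shard_id"),
    )


instance = make_instance(
    {"batch_job_id": "J1", "name": "daily-settlement", "executor_group": "grp-a",
     "planned_time": "02:00", "serverless_flag": 1, "shard_id": "shard-03"},
    run_date="2023-06-08",
)
```

The container resource ID starts empty and is filled in later, matching the description that the scheduling server writes it back once the container cloud platform has returned it.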
In some embodiments of the present disclosure, the container cloud platform may be configured to query, in the serverless resource definition table, a mirror ID and a deployment park ID corresponding to the serverless job according to a batch job ID of the serverless job, so as to obtain batch mirror information corresponding to the serverless job. The container cloud platform can be used for generating a serverless container corresponding to the serverless job in a serverless resource pool corresponding to the deployment park ID according to the batch mirror image information, and returning the resource ID of the serverless container to the batch serverless scheduling server. The batch serverless dispatch server may write the serverless container resource ID corresponding to the serverless job into job information of the batch registry.
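The lookup described above can be sketched with an in-memory stand-in for the container cloud platform. The record fields follow the serverless resource definition table; the class `ContainerCloudPlatform`, its method name, and the format of the generated resource ID are hypothetical and do not correspond to any real container platform API.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ResourceDefinition:
    mirror_id: str
    batch_job_id: str
    resource_spec: str
    deployment_park_id: str
    timeout: str
    resource_policy: str


class ContainerCloudPlatform:
    """In-memory stand-in for the platform (illustrative only)."""

    def __init__(self, definitions: Iterable[ResourceDefinition]) -> None:
        self._by_job = {d.batch_job_id: d for d in definitions}
        self.pools: Dict[str, Dict[str, str]] = {}  # park ID -> {resource ID: mirror ID}
        self._ids = itertools.count(1)

    def create_container(self, batch_job_id: str) -> str:
        # Query the definition table by batch job ID (KeyError if undefined).
        definition = self._by_job[batch_job_id]
        resource_id = f"{definition.deployment_park_id}-c{next(self._ids)}"
        # Create the container in the pool of the deployment park.
        pool = self.pools.setdefault(definition.deployment_park_id, {})
        pool[resource_id] = definition.mirror_id
        return resource_id


platform = ContainerCloudPlatform([
    ResourceDefinition("img-42", "J1", "2C4G", "park-a", "600", "retry"),
])
job_registry = {"J1": {"serverless_flag": 1, "resource_id": None}}
# The scheduling server writes the returned resource ID back to the registry.
job_registry["J1"]["resource_id"] = platform.create_container("J1")
```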
In some embodiments of the present disclosure, the batch serverless scheduling server may be further configured to send a container destruction request to the container cloud platform when the job status of a serverless job read from the batch registry is abnormal completion, where the container destruction request carries the identification information of the serverless job whose job status is abnormal completion. The batch serverless scheduling server may also be configured to determine whether the number of executions of that abnormally completed serverless job is less than a preset number of retries. If so, the batch serverless scheduling server sends a resource application request to the container cloud platform to re-execute the abnormally completed serverless job. The resource application request includes the identification information of the abnormally completed serverless job.
In some embodiments of the present description, a batch job may be a batch file centralized processing job. Accordingly, the serverless container may access the distributed file system using a pre-built file processing component to perform the batch file centralized processing job. The file processing component may be used to perform one of the following operations: merging, sorting, deduplication, and splitting.
Fig. 10 is a flowchart of a batch serverless job scheduling method in one embodiment of the present disclosure. Although the present description provides methods and apparatus structures as shown in the following examples or figures, more or fewer steps or modular units may be included in the methods or apparatus based on conventional or non-inventive labor. In the steps or the structures of the apparatuses, which logically do not have the necessary cause and effect relationship, the execution order or the structure of the modules of the apparatuses are not limited to the execution order or the structure of the modules shown in the drawings and described in the embodiments of the present specification. The described methods or module structures may be implemented sequentially or in parallel (e.g., in a parallel processor or multithreaded environment, or even in a distributed processing environment) in accordance with the embodiments or the method or module structure connection illustrated in the figures when implemented in a practical device or end product application.
Specifically, as shown in fig. 10, the batch serverless job scheduling method provided in one embodiment of the present specification may include the following steps.
In step S1001, the distributed batch scheduling platform writes the batch job to the batch registry.
Step S1002, the batch serverless scheduling server monitors batch jobs in the batch registry; when a newly added batch job is detected, it judges whether the newly added batch job is a serverless job; if so, it sends a resource application request to the container cloud platform, where the resource application request includes the identification information of the serverless job.
In step S1003, the container cloud platform responds to the resource application request by generating a serverless container corresponding to the serverless job in a serverless resource pool.
Step S1004, the serverless container monitors and reads job information of a corresponding serverless job in the batch registry, triggers execution of the serverless job, and sends an execution result to the batch registry.
In step S1005, the batch registry updates the job status of the serverless job according to the execution result.
Step S1006, the batch serverless scheduling server reads the job status of the serverless job in the batch registry and sends a container destruction request to the container cloud platform, where the container destruction request carries the identification information of the serverless job whose job status is completed.
Step S1007, the container cloud platform responds to the container destruction request to destroy the serverless container corresponding to the completed serverless job in the serverless resource pool.
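Steps S1001 to S1007 can be traced end to end with a toy, single-process simulation. This is only a sketch under strong simplifying assumptions: the registry and the resource pool are plain dictionaries, monitoring is replaced by direct function calls, and all function names, field names, and state strings are hypothetical.

```python
from typing import Callable, Dict

registry: Dict[str, Dict] = {}   # batch registry: job ID -> job information
containers: Dict[str, str] = {}  # serverless resource pool: job ID -> container ID


def s1001_write_job(job_id: str, serverless: bool) -> None:
    registry[job_id] = {"serverless": serverless, "state": "waiting"}


def s1003_create_container(job_id: str) -> None:
    containers[job_id] = f"c-{job_id}"


def s1002_schedule(job_id: str) -> bool:
    """Request a container only when the new job is a serverless job."""
    if not registry[job_id]["serverless"]:
        return False
    s1003_create_container(job_id)
    return True


def s1004_execute(job_id: str, logic: Callable[[], bool]) -> bool:
    registry[job_id]["state"] = "executing"
    return logic()


def s1005_update_state(job_id: str, succeeded: bool) -> None:
    registry[job_id]["state"] = "completed" if succeeded else "abnormal"


def s1007_destroy(job_id: str) -> None:
    containers.pop(job_id, None)


def s1006_request_destroy(job_id: str) -> None:
    if registry[job_id]["state"] in ("completed", "abnormal"):
        s1007_destroy(job_id)


s1001_write_job("J1", serverless=True)
scheduled = s1002_schedule("J1")
had_container = "J1" in containers
result = s1004_execute("J1", logic=lambda: True)
s1005_update_state("J1", result)
s1006_request_destroy("J1")
```

After the last step the container entry is gone while the registry still records the final job status, which reflects the intent that compute resources exist only for the duration of the job.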
The method in this embodiment can integrate batch logic scheduling and container resource scheduling, allocate and use batch resources on demand, and greatly improve resource utilization. Because a batch serverless scheduling server is provided in the system, the system supports both original-form batch scheduling and serverless scheduling, can support the gradual transition of applications to the new form, is friendly to application modification, and has a low development cost. Further, by combining a distributed file system, centralized data processing that originally required a higher-specification database or a separate database can be converted into processing in file form in a batch executor, which further reduces resource consumption. In addition, the method relies on the file processing component, the distributed file system, and the batch serverless scheduling mechanism. In extreme cases, for example when the primary and standby databases fail at the same time in a scenario where a single database processes centralized data, the multi-copy capability provided by the distributed file system can be drawn on. Multi-park active-active deployment can also be supported, and batch resource creation and scheduling can be triggered at any time and on demand in the serverless container resource domain of any park, which improves availability.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. Specific reference may be made to the foregoing description of related embodiments of the related process, which is not described herein in detail.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
From the above description, it can be seen that the embodiments of the present specification achieve the following technical effects. Batch logic scheduling and container resource scheduling can be integrated, batch resources can be allocated and used on demand, and resource utilization is greatly improved. Because a batch serverless scheduling server is provided in the system, the system supports both original-form batch scheduling and serverless scheduling, can support the gradual transition of applications to the new form, is friendly to application modification, and has a low development cost. Further, by combining a distributed file system, centralized data processing that originally required a higher-specification database or a separate database can be converted into processing in file form in a batch executor, which further reduces resource consumption. In addition, the scheme relies on the file processing component, the distributed file system, and the batch serverless scheduling mechanism. In extreme cases, for example when the primary and standby databases fail at the same time in a scenario where a single database processes centralized data, the multi-copy capability provided by the distributed file system can be drawn on. Multi-park active-active deployment can also be supported, and batch resource creation and scheduling can be triggered at any time and on demand in the serverless container resource domain of any park, which improves availability.
It will be apparent to those skilled in the art that the modules or steps of the embodiments described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may alternatively be implemented in program code executable by computing devices, so that they may be stored in a storage device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than herein, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps within them may be fabricated into a single integrated circuit module. Thus, embodiments of the present specification are not limited to any specific combination of hardware and software.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many embodiments and many applications other than the examples provided will be apparent to those of skill in the art upon reading the above description. The scope of the disclosure should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the embodiments of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the protection scope of the present specification.

Claims (10)

1. A batch serverless job scheduling system, comprising: a batch serverless dispatch server, a distributed batch dispatch platform and a container cloud platform;
the distributed batch scheduling platform is used for writing batch jobs into a batch registry; the batch registry is also used for updating the job status of the batch job according to the execution result of the batch job;
the batch serverless scheduling server is used for monitoring batch jobs in the batch registry; judging, when a newly added batch job is detected, whether the newly added batch job is a serverless job; and if so, sending a resource application request to the container cloud platform, wherein the resource application request comprises identification information of the serverless job; the batch serverless scheduling server is further used for reading the job status of serverless jobs in the batch registry and sending a container destruction request to the container cloud platform, wherein the container destruction request carries identification information of the serverless job whose job status is completed;
the container cloud platform is used for responding to the resource application request and generating a serverless container corresponding to the serverless job in a serverless resource pool; the serverless container is used for monitoring and reading the job information of the corresponding serverless job in the batch registry, triggering the execution of the serverless job, and sending the execution result to the batch registry; the container cloud platform is further used for responding to the container destruction request and destroying the serverless container corresponding to the completed serverless job in the serverless resource pool.
2. The batch serverless job scheduling system of claim 1, wherein the distributed batch scheduling platform is configured to read job definitions for batch jobs; generating batch instance information according to the job definition of the batch job; and writing the batch job into a batch job center according to the scheduling logic.
3. The batch serverless job scheduling system of claim 1, wherein job information of the batch job includes serverless identification information, and the batch serverless scheduling server determines whether the batch job is a serverless job based on the serverless identification information of the batch job.
4. The batch serverless job scheduling system of claim 3, wherein the job information of the batch job further comprises at least one of: batch job ID, batch job name, batch executor grouping identification, batch job plan execution time, batch job front dependent job ID, batch job execution post trigger job ID, and fragment correspondence ID.
5. The batch serverless job scheduling system of claim 2, wherein the batch instance information comprises at least one of: job instance ID, batch job ID, batch instance name, batch executor group identifier, planned execution time of the batch job, serverless container resource ID, shard correspondence ID, serverless identification information, and job execution status.
6. The batch serverless job scheduling system of claim 1, wherein the container cloud platform has a serverless resource definition table stored therein, the serverless resource definition table having at least one of the following items stored therein: mirror ID corresponding to the batch job, batch job ID, resource specification, deployment park ID, timeout time, and resource policy.
7. The batch serverless job scheduling system of claim 6, wherein the identification information of the serverless job is the batch job ID;
the container cloud platform is used for querying, in the serverless resource definition table, the mirror ID and the deployment park ID corresponding to the serverless job according to the batch job ID of the serverless job, so as to acquire the batch mirror image information corresponding to the serverless job; the container cloud platform is used for generating, according to the batch mirror image information, a serverless container corresponding to the serverless job in the serverless resource pool corresponding to the deployment park ID, and returning the resource ID of the serverless container to the batch serverless scheduling server; and the batch serverless scheduling server writes the serverless container resource ID corresponding to the serverless job into the job information of the batch registry.
8. The batch serverless job scheduling system according to claim 1, wherein the batch serverless scheduling server is further configured to send a container destruction request to the container cloud platform when the job status of a serverless job read from the batch registry is abnormal completion, the container destruction request carrying identification information of the serverless job whose job status is abnormal completion; the batch serverless scheduling server is further used for judging whether the number of executions of the serverless job whose job status is abnormal completion is smaller than a preset number of retries, and if so, sending a resource application request to the container cloud platform so as to re-execute the serverless job whose job status is abnormal completion; the resource application request comprises identification information of the serverless job whose job status is abnormal completion.
9. The batch serverless job scheduling system of claim 1, wherein the batch job is a batch file centralized processing job; correspondingly, the serverless container accesses a distributed file system by using a pre-built file processing component so as to execute the batch file centralized processing job; the file processing component is used for performing one of the following operations: merging, sorting, deduplication, and splitting.
10. A batch serverless job scheduling method, comprising:
the distributed batch scheduling platform writes batch jobs into a batch registry;
the batch serverless scheduling server monitors batch jobs in the batch registry; judges, when a newly added batch job is detected, whether the newly added batch job is a serverless job; and if so, sends a resource application request to the container cloud platform, wherein the resource application request comprises identification information of the serverless job;
the container cloud platform responds to the resource application request and generates a serverless container corresponding to the serverless job in a serverless resource pool;
the serverless container monitors and reads the job information of the corresponding serverless job in the batch registry, triggers the execution of the serverless job, and sends the execution result to the batch registry;
the batch registry updates the job status of the serverless job according to the execution result;
the batch serverless scheduling server reads the job status of the serverless job in the batch registry and sends a container destruction request to the container cloud platform, wherein the container destruction request carries identification information of the serverless job whose job status is completed;
and the container cloud platform responds to the container destruction request and destroys the serverless container corresponding to the completed serverless job in the serverless resource pool.
CN202310678914.2A 2023-06-08 2023-06-08 Batch serverless job scheduling system and method Pending CN116804941A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310678914.2A CN116804941A (en) 2023-06-08 2023-06-08 Batch serverless job scheduling system and method

Publications (1)

Publication Number Publication Date
CN116804941A true CN116804941A (en) 2023-09-26

Family

ID=88079249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310678914.2A Pending CN116804941A (en) 2023-06-08 2023-06-08 Batch serverless job scheduling system and method

Country Status (1)

Country Link
CN (1) CN116804941A (en)

Similar Documents

Publication Publication Date Title
US11144415B2 (en) Storage system and control software deployment method
US8307363B2 (en) Virtual machine system, restarting method of virtual machine and system
US8074222B2 (en) Job management device, cluster system, and computer-readable medium storing job management program
US8949188B2 (en) Efficient backup and restore of a cluster aware virtual input/output server (VIOS) within a VIOS cluster
EP2539820B1 (en) System and method for failing over cluster unaware applications in a clustered system
US9886260B2 (en) Managing software version upgrades in a multiple computer system environment
US9031917B2 (en) Efficient backup and restore of virtual input/output server (VIOS) cluster
US9389976B2 (en) Distributed persistent memory using asynchronous streaming of log records
JP4204769B2 (en) System and method for handling failover
CN112035293A (en) Virtual machine cluster backup
US9501544B1 (en) Federated backup of cluster shared volumes
US20170168756A1 (en) Storage transactions
US9398092B1 (en) Federated restore of cluster shared volumes
JP6520448B2 (en) INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING DEVICE, AND INFORMATION PROCESSING DEVICE CONTROL METHOD
US11803412B2 (en) Containerized application management system and management method
CN112235405A (en) Distributed storage system and data delivery method
CN108108119B (en) Configuration method and device for extensible storage cluster things
JP2009080705A (en) Virtual machine system and method for restoring virtual machine in the system
CN113986450A (en) Virtual machine backup method and device
US11079960B2 (en) Object storage system with priority meta object replication
CN116804941A (en) Batch serverless job scheduling system and method
WO2022227719A1 (en) Data backup method and system, and related device
CN115378800A (en) Distributed fault-tolerant system, method, apparatus, device and medium without server architecture
CN115292408A (en) Master-slave synchronization method, device, equipment and medium for MySQL database
CN113934575A (en) Big data backup system and method based on distributed copy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination