CN110362390B - Distributed data integration job scheduling method and device - Google Patents

Distributed data integration job scheduling method and device

Info

Publication number
CN110362390B
CN110362390B
Authority
CN
China
Prior art keywords
job
scheduling
information
module
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910489422.2A
Other languages
Chinese (zh)
Other versions
CN110362390A (en)
Inventor
李建元
刘飞黄
王超群
刘兴田
贾建涛
温晓岳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yinjiang Technology Co.,Ltd.
Original Assignee
Enjoyor Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Enjoyor Co Ltd filed Critical Enjoyor Co Ltd
Priority to CN201910489422.2A priority Critical patent/CN110362390B/en
Publication of CN110362390A publication Critical patent/CN110362390A/en
Application granted granted Critical
Publication of CN110362390B publication Critical patent/CN110362390B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485Task life-cycle, e.g. stopping, restarting, resuming execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a distributed data integration job scheduling method and device aimed at the special scenarios that data integration may face. A job scheduling device is responsible for issuing data integration jobs to a job running device; the job running device receives the scheduled tasks, starts job execution, sends job running state information to a job management module, feeds back working-node computing resources to a resource scheduling module, and feeds back loss-of-connection or fault information to a job preloading module. The invention combines the following characteristics: (1) high availability, fault tolerance and weak consistency; (2) low latency for quasi-real-time job scheduling; (3) multi-tenant concurrency control for cloud service applications; (4) computing-resource isolation and parallel scheduling of multiple jobs; (5) a priority scheduling mechanism.

Description

Distributed data integration job scheduling method and device
Technical Field
The invention relates to the technical field of big data infrastructure, and in particular to a distributed data integration job scheduling method and device.
Background
With the evolution of the digital economy, service digitization has developed rapidly across industries, and digital services have gradually become a new center of gravity. However, service digitization produces a large number of data silos, which have become a common pain point in delivering digital services. Industries therefore urgently need data integration to break down and avoid data silos and to integrate and manage data resources, so that the associative value among data can be exploited effectively.
Data integration often involves scheduling thousands of jobs, including job types such as data exchange and data preprocessing, and the design of a scheduling system must consider various complex scenarios. For example, some scenarios not only have a large number of jobs but also require parallel processing; some scenarios require quasi-real-time jobs, so priority scheduling must be considered; some jobs occupy more computing resources, or occupy them for longer, so resource isolation must be considered to avoid affecting other jobs; some jobs may suffer failures such as downtime of data sources and targets, network interruption, or downtime of running nodes, so a fault-tolerance mechanism is needed; in a multi-tenant scenario, concurrency control must be handled; and so on.
The prior art fails to meet these complex data integration job scheduling requirements. For example, the traditional LTS scheduling system isolates jobs by thread: if the execution thread of one job exhausts all the memory of the current process, every job in that process fails. It lacks the ability to schedule data integration jobs and is better suited to lightweight tasks. Chinese patent CN201610800080 discloses a distributed task scheduling system and method for reducing the amount of code developers must write and the heavy development burden of parallel computing programs; it is essentially scheduling for a single large-scale distributed computing job and does not consider parallel scheduling of multiple tasks. Chinese patent CN201610197298 discloses a task scheduling method, apparatus and system that provides multi-channel, multi-task distributed scheduling and solves the starvation of other jobs caused by a single task occupying the scheduler for too long, but it does not consider low-latency access to job metadata or how to ensure job metadata consistency. Chinese patent CN201410748604 discloses a distributed task scheduling system and method that ensures the reliability of the system itself, supports independent or associated tasks, and supports rollback of task distribution; however, it is not suitable for complex data integration scenarios, where task rollback is not a key concern and data integration tasks need not be associated with one another, and it does not consider the high risk of node downtime under large-scale, complex data integration workloads, the need for high availability, how to reduce latency as much as possible when quasi-real-time tasks exist, or how to ensure consistency of job metadata.
Disclosure of Invention
Aimed at the special scenarios that data integration may face, in the invention a job scheduling device is responsible for issuing data integration jobs to a job running device; the job running device receives the scheduled tasks, starts job execution, sends job running state information to a job management module, feeds back working-node computing resources to a resource scheduling module, and feeds back loss-of-connection or fault information to a job preloading module. The invention combines the following characteristics: (1) high availability, fault tolerance and weak consistency; (2) low latency for quasi-real-time job scheduling; (3) multi-tenant concurrency control for cloud service applications; (4) computing-resource isolation and parallel scheduling of multiple jobs; (5) a priority scheduling mechanism.
The invention achieves the aim through the following technical scheme: a distributed data integration job scheduling method comprises the following steps:
(1) the job scheduling device issues the data integration job to the job running device, wherein the job scheduling device comprises a job management module, a job preloading module and a resource scheduling module: (1.1) the job management module receives, caches and stores job-related meta-information and performs concurrency control;
(1.2) the job preloading module acquires the jobs to be processed from the job management module and determines the scheduling priority order;
(1.3) the resource scheduling module completes resource allocation and schedule distribution by acquiring the job preloading information and the computing-resource information of the job running device;
(2) the job running device receives the scheduled task, starts job execution, feeds back job running state information to the job management module, feeds back working-node computing resources to the resource scheduling module, and feeds back loss-of-connection or fault information to the job preloading module.
Preferably, the job management module includes an information receiving unit, an information caching unit, a persistent storage unit and a concurrency control unit, which operate as follows:
(i) the information receiving unit receives job submissions, job meta-information modifications and scheduling policy updates; receives information on jobs with unallocated resources fed back by the resource scheduling module and updates job states; and receives job state information fed back by the job running device and updates job states;
(ii) the information caching unit locally caches job meta-information and state information and supports frequent real-time queries;
(iii) the persistent storage unit persists job metadata from the cached content and maintains data consistency between the cache layer and the storage layer;
(iv) the concurrency control unit assigns a read-write lock to the access of each job resource.
Preferably, in step (iii), the data consistency between the cache layer and the storage layer is maintained as follows:
(a) fault-tolerant storage is used: updated data is written to a local file and written back to the storage layer after the network returns to normal;
(b) a circuit breaker (fuse) is used: when the fault-tolerance mechanism is triggered up to a preset threshold, the breaker opens, the service degrades, and no new task is scheduled;
(c) the job state interface of the job running device is encapsulated; each time a job scheduling node starts, the job running states are obtained and audited to ensure that the job states in the job running device are consistent with the states in the metadata storage layer.
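As an illustration only, a minimal Java sketch of items (a) and (b) could look like the following; the class and method names (MetadataPersistenceGuard, MetadataStore, and so on) are hypothetical and not part of the patent. Persistence failures are appended to a local fallback file, a failure counter drives the breaker, and once the breaker opens the scheduler degrades and stops accepting new tasks until the buffered records are replayed.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the "fuse" (circuit breaker) guarding metadata persistence. */
public class MetadataPersistenceGuard {
    private final Path fallbackFile;          // local fault-tolerant storage, replayed later
    private final int openThreshold;          // preset threshold that opens the breaker
    private final AtomicInteger failures = new AtomicInteger();
    private volatile boolean open = false;    // open breaker => degraded, no new scheduling

    public MetadataPersistenceGuard(Path fallbackFile, int openThreshold) {
        this.fallbackFile = fallbackFile;
        this.openThreshold = openThreshold;
    }

    /** Try to persist one job-metadata record; fall back to the local file on failure. */
    public void persist(String record, MetadataStore store) {
        try {
            store.write(record);                       // write-through to the storage layer
            failures.set(0);                           // healthy again, reset the counter
        } catch (Exception networkOrDbError) {
            appendToLocalFile(record);                 // (a) fault-tolerant local storage
            if (failures.incrementAndGet() >= openThreshold) {
                open = true;                           // (b) breaker opens, service degrades
            }
        }
    }

    /** The scheduler checks this before accepting a new task. */
    public boolean acceptingNewTasks() {
        return !open;
    }

    /** Replay the locally buffered records once the network/storage recovers. */
    public void replay(MetadataStore store) throws Exception {
        if (!Files.exists(fallbackFile)) return;
        for (String line : Files.readAllLines(fallbackFile, StandardCharsets.UTF_8)) {
            store.write(line);
        }
        Files.delete(fallbackFile);
        failures.set(0);
        open = false;                                  // close the breaker after recovery
    }

    private void appendToLocalFile(String record) {
        try {
            Files.writeString(fallbackFile, record + System.lineSeparator(),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (Exception ignored) { /* last resort: rely on the startup audit in (c) */ }
    }

    /** Hypothetical storage-layer abstraction (relational database, MongoDB, ...). */
    public interface MetadataStore {
        void write(String record) throws Exception;
    }
}
```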
Preferably, the job preloading module includes a real-time query unit, a job preloading unit and a fault processing unit, which operate as follows:
(I) the real-time query unit queries the job metadata cache in real time to acquire unlocked jobs waiting to be scheduled;
(II) the job preloading unit adds the unlocked jobs to be scheduled to a bounded ordered queue, sorted by job scheduling time and job priority;
(III) the fault processing unit receives fault information from the job running device and performs fault-tolerance processing; fault-tolerance processing means that when a working node goes down or its network is disconnected for a long time while a job is running, the job running device notifies the job preloading module, which connects to the working node to check whether the connection is unavailable; if it is unavailable, the job is put directly back into the queue, and the lost job is eventually scheduled to another available node; if the disconnected node later recovers, the job running device directly kills the job process, ensuring that the same job never runs on two nodes of one job running device at the same time.
Preferably, the bounded ordered blocking queue is used to load all jobs waiting to be scheduled, where bounded means that an upper limit on the number of jobs is guaranteed and the upper-limit parameter can be set from a scenario evaluation; ordered means that jobs with earlier trigger times and higher priorities are placed toward the front of the queue for preferential scheduling; when a job is stopped or deleted, removal of the specified waiting job from the queue is supported; a producer-consumer model is adopted, and a thread blocking-wakeup approach is used to reduce CPU load.
Preferably, the resource scheduling module includes a resource obtaining unit, a resource allocation unit and a scheduling/distributing unit, which operate as follows:
1) the resource obtaining unit obtains the computing resources of the job running cluster and caches them in memory;
2) the resource allocation unit takes all jobs from the bounded ordered queue and allocates computing resources to each job in order of job priority;
3) the scheduling/distributing unit assigns an executor to each scheduled job and sends the job meta-information, the allocated computing resources and the executor configuration to the job running cluster.
Preferably, in step (2), the job running device includes a master control node and working nodes; the master control node is responsible for management and coordination, and the working nodes are responsible for executing the data integration jobs. The master control node receives the job meta-information, job resource allocation information and job executor information distributed by the resource scheduling module and starts job execution. The agent program on a working node collects job state information and sends it to the master control node, which sends it to the job management module; the agent collects working-node computing resources and sends them to the master control node, which sends them to the resource scheduling module; the agent sends heartbeat information to the master control node, and the master control node sends loss-of-connection or fault information to the job preloading module. The executor on a working node has a retry mechanism: once the data-flow source or target goes down or loses its connection, the executor performs timed retries so that normal operation can continue after the source and target recover.
Preferably, the job running device performs distributed resource management based on a Mesos cluster system: the master control node provides low-latency local metadata management using a RAM + WAL log approach, maintains job-state synchronization across a large number of working nodes using the Paxos algorithm, and pushes specific physical resources to the job scheduling system based on the master control node's unified cluster physical-resource management interface and a specific resource-sharing policy. The job running cluster provides two modes, a multi-language driver package and JSON RPC, for the job scheduling system to register and to obtain specific callback events. The agent program is responsible for collecting working-node resources, running specific scheduled tasks through the executor, and returning the executor's execution results and task states to the master control node, which then forwards them to the job scheduling device.
A distributed data integration job scheduling apparatus comprises a job scheduling device and a job running device that exchange information with each other. The job scheduling device comprises a job management module, a job preloading module and a resource scheduling module: the job management module receives, caches and stores job-related meta-information and performs concurrency control; the job preloading module acquires the jobs to be processed from the job management module and determines the scheduling priority order; the resource scheduling module completes resource allocation and schedule distribution by acquiring the job preloading information and the computing-resource information of the job running device. The job running device comprises a master control node responsible for management and coordination and working nodes responsible for executing the data integration jobs.
Preferably, both the job scheduling device and the job running device are registered in ZooKeeper. The job scheduling device adopts an active-standby mode: once the active device goes down, ZooKeeper elects the standby device to take over the job scheduling work. The master control nodes in the job running device likewise adopt an active-standby mode: once the active master control node goes down, ZooKeeper elects the standby master control node to take over the management and coordination work.
Preferably, the job scheduling device audits and maintains its state based on the job state information fed back by the job running device, and the job metadata database maintains consistency with the job metadata cache of the job scheduling device. When the job scheduling device fails, the standby job scheduling device, upon taking over, interacts with the job metadata database once to rebuild the job metadata cache, and then audits and maintains the cached metadata by receiving job-state feedback from the job running device, so that the metadata cache stays consistent across the distributed system.
The invention has the following beneficial effects: (1) in terms of the CAP theorem, the method satisfies the two properties of high availability and partition (fault) tolerance, and adopts mechanisms that keep job metadata as consistent as possible; (2) multi-tenant concurrency control is realized with distributed read-write locks, providing multi-tenant data integration services in a cloud-service mode; (3) because data integration jobs require frequent scheduling, a caching mechanism is used for job metadata, which effectively reduces the latency and interruption risks caused by frequent metadata access.
Drawings
FIG. 1 is a schematic flow diagram of the apparatus of the present invention;
FIG. 2 is a schematic diagram of the high availability mechanism of the apparatus of the present invention;
FIG. 3 is a schematic flow diagram of the method of the present invention;
FIG. 4 is a schematic flow diagram of a job management module of the present invention;
FIG. 5 is a schematic representation of the operation of the job management module of the present invention;
FIG. 6 is a process flow diagram of a job preloading module of the present invention;
FIG. 7 is a schematic flow diagram of a resource scheduling module of the present invention;
FIG. 8 is a schematic flow diagram of the job running device of the present invention.
Detailed Description
The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto:
Example: as shown in fig. 1, a distributed data integration job scheduling apparatus is composed of a job scheduling device and a job running device, which exchange information with each other. The job scheduling device comprises a job management module, a job preloading module and a resource scheduling module: the job management module receives, caches and stores job-related meta-information and performs concurrency control; the job preloading module acquires the jobs to be processed from the job management module and determines the scheduling priority order; the resource scheduling module completes resource allocation and schedule distribution by acquiring the job preloading information and the computing-resource information of the job running device. The job running device comprises a master control node responsible for management and coordination and working nodes responsible for executing the data integration jobs.
As shown in fig. 2, both the job scheduling device and the job running device are registered in ZooKeeper. The job scheduling device adopts an active-standby mode: once the active device goes down, ZooKeeper elects the standby device to take over the job scheduling work. The master control nodes in the job running device likewise adopt an active-standby mode: once the active master control node goes down, ZooKeeper elects the standby master control node to take over the management and coordination work.
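The patent does not name a client library for this election; as one possible sketch, Apache Curator's LeaderLatch recipe can implement the active-standby election against ZooKeeper described above. The connection string, latch path and instance id below are illustrative.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

/** Illustrative active-standby election for the job scheduling device via ZooKeeper. */
public class SchedulerElection {
    public static void main(String[] args) throws Exception {
        CuratorFramework zk = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",           // hypothetical ZooKeeper ensemble
                new ExponentialBackoffRetry(1000, 3));
        zk.start();

        // Every scheduler instance registers under the same election path;
        // ZooKeeper grants leadership to exactly one of them at a time.
        LeaderLatch latch = new LeaderLatch(zk, "/scheduler/leader", "scheduler-1");
        latch.start();

        latch.await();                                  // blocks until this node is elected
        // From here on this instance is the active job scheduling device:
        // rebuild the job-metadata cache from the metadata database, then audit it
        // against job-state feedback from the job running device before scheduling.
        System.out.println("Elected as active scheduler: " + latch.getId());
    }
}
```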
The job scheduling device audits and maintains its state based on the job state information fed back by the job running device, and the job metadata database maintains consistency with the job metadata cache of the job scheduling device. When the job scheduling device fails, the standby job scheduling device, upon taking over, interacts with the job metadata database once to rebuild the job metadata cache, and then audits and maintains the cached metadata by receiving job-state feedback from the job running device, so that the metadata cache stays consistent across the distributed system; weak consistency is thereby ensured.
As shown in fig. 3, a distributed data integration job scheduling method includes the following steps:
s100: the job scheduling device issues the data integration job to the job running device, and the job scheduling device consists of a job management module, a job preloading module and a resource scheduling module, and specifically comprises the following parts:
s101: and the operation management module receives, caches and stores the operation related meta information and performs concurrency control. The job management module consists of an information receiving unit, a storage processing unit and a concurrency control unit. As shown in fig. 4, the specific operations are as follows:
(1) the information receiving unit S101-1 is responsible for receiving job submission, job meta-information modification and scheduling policy update; receiving unallocated resource job information fed back by the resource scheduling module, and updating job states; receiving operation state information fed back by the operation running device and updating the operation state;
(2) the information caching unit S101-2 is responsible for locally caching the operation meta information and the state information and supporting frequent real-time query;
(3) the persistent storage unit S101-3 maintains the data consistency of the cache layer and the storage layer according to the metadata information of the persistent job of the cache content;
(4) the concurrency control unit S101-4 is responsible for assigning a read-write lock to access of each job resource.
Specifically, as shown in fig. 5, the job management module of the invention is responsible for receiving jobs and maintaining the job state machine; the main job states may include: not started, waiting to be scheduled, suspended, running, stopped, abnormal and finished. The job management module provides job-state operation interfaces, such as interfaces for stopping a running job, suspending a job, scheduling a job, normally stopping a job and abnormally stopping a job. It maintains various job scheduling policies, such as repeated jobs, timed jobs, cron jobs and one-off jobs, and internally maintains a history storage module that records all scheduling history. A read-write lock is added to operations on the cache layer and the metadata persistence layer to realize concurrency control: if a concurrent thread performs a write operation, the lock is taken in exclusive mode and other threads cannot acquire it; conversely, if the concurrent thread performs a read operation, the lock is taken in shared mode and other threads can acquire it concurrently. The job management module adds a cache layer on top of the job metadata storage layer to serve frequent metadata queries and calls and to support the frequent access and frequent scheduling of quasi-real-time data integration tasks; at the implementation level the cache layer is abstracted into an SPI interface, supporting cache implementations such as Caffeine, JDK, Guava and Redis. The persistence layer is likewise abstracted into an SPI interface and supports relational databases, MongoDB and other databases. Because writing job data to the persistence layer may lead to data inconsistency due to network problems and other instabilities, the job management module uses a "triple insurance" at the implementation level to keep the metadata as consistent as possible: (1) fault-tolerant storage: updated data is written to a local database/file and written back to the storage layer after the network returns to normal; (2) a circuit breaker (fuse): when the fault-tolerance mechanism is triggered up to a certain threshold, the breaker opens, the service degrades, and no new task is scheduled; (3) the job state interface of the job running system is encapsulated, and each time a job scheduling node starts, the job running states are obtained and audited to ensure that the job states in the job running system are consistent with the states in the metadata storage layer.
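A minimal sketch of the per-job-resource read-write locking described above, assuming a registry keyed by job id (the class JobLockRegistry and its methods are illustrative, not from the patent): reads take the shared lock, writes take the exclusive lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Illustrative per-job-resource read-write locking for concurrency control. */
public class JobLockRegistry {
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String jobId) {
        return locks.computeIfAbsent(jobId, id -> new ReentrantReadWriteLock());
    }

    /** Reads take the shared lock: concurrent readers may hold it together. */
    public <T> T read(String jobId, Supplier<T> readOp) {
        ReentrantReadWriteLock.ReadLock lock = lockFor(jobId).readLock();
        lock.lock();
        try {
            return readOp.get();                 // e.g. query job meta-information from cache
        } finally {
            lock.unlock();
        }
    }

    /** Writes take the exclusive lock: no other reader or writer can hold it. */
    public void write(String jobId, Runnable writeOp) {
        ReentrantReadWriteLock.WriteLock lock = lockFor(jobId).writeLock();
        lock.lock();
        try {
            writeOp.run();                       // e.g. update job state in cache and storage
        } finally {
            lock.unlock();
        }
    }
}
```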
S102: as shown in fig. 6, the job preloading module acquires the jobs to be processed from the job management module and determines the scheduling priority order. The job preloading module consists of a real-time query unit, a job preloading unit and a fault processing unit. The specific operations are as follows:
(1) the real-time query unit S102-1 is responsible for querying the job metadata cache in real time and acquiring unlocked jobs waiting to be scheduled;
(2) the job preloading unit S102-2 is responsible for adding the unlocked jobs to be scheduled to the bounded ordered queue and sorting them by job scheduling time and job priority;
(3) the fault processing unit S102-3 is responsible for receiving fault information from the job running cluster and performing fault-tolerance processing.
Specifically, a bounded ordered blocking queue is built to load all jobs waiting to be scheduled. Bounded means that an upper limit on the number of jobs is guaranteed and the upper-limit parameter can be set from a scenario evaluation; ordered means that jobs with earlier trigger times and higher priorities are placed toward the front of the queue for preferential scheduling. When a job is stopped or deleted, removal of the specified waiting job from the queue is supported. A producer-consumer model is adopted, and a thread blocking-wakeup approach is used to reduce CPU load.
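One way the bounded ordered blocking queue could be realized, sketched in Java with hypothetical names: a capacity-bounded priority queue ordered by trigger time and then priority, with blocking put/take following the producer-consumer model so threads sleep instead of spinning, plus removal of a specific job when it is stopped or deleted.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative bounded, ordered, blocking queue of jobs waiting to be scheduled. */
public class BoundedOrderedJobQueue {

    /** Minimal job descriptor; fields are hypothetical. */
    public static final class PendingJob {
        final String jobId;
        final long triggerTimeMillis;   // earlier trigger time => scheduled first
        final int priority;             // higher priority => scheduled first
        public PendingJob(String jobId, long triggerTimeMillis, int priority) {
            this.jobId = jobId;
            this.triggerTimeMillis = triggerTimeMillis;
            this.priority = priority;
        }
    }

    private static final Comparator<PendingJob> ORDER =
            Comparator.comparingLong((PendingJob j) -> j.triggerTimeMillis)
                      .thenComparingInt(j -> -j.priority);   // larger priority sorts earlier

    private final PriorityQueue<PendingJob> queue = new PriorityQueue<>(ORDER);
    private final int capacity;                  // "bounded": upper limit from scenario evaluation
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Condition notFull = lock.newCondition();

    public BoundedOrderedJobQueue(int capacity) {
        this.capacity = capacity;
    }

    /** Producer side: the job preloading unit adds an unlocked job, blocking if the queue is full. */
    public void put(PendingJob job) throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (queue.size() >= capacity) {
                notFull.await();                           // thread blocks instead of spinning
            }
            queue.offer(job);
            notEmpty.signal();                             // wake one waiting consumer
        } finally {
            lock.unlock();
        }
    }

    /** Consumer side: the resource scheduling module takes the head job, blocking if empty. */
    public PendingJob take() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (queue.isEmpty()) {
                notEmpty.await();
            }
            PendingJob head = queue.poll();
            notFull.signal();
            return head;
        } finally {
            lock.unlock();
        }
    }

    /** Stop/delete support: remove a specific job that is still waiting to be scheduled. */
    public boolean remove(String jobId) {
        lock.lock();
        try {
            boolean removed = queue.removeIf(j -> j.jobId.equals(jobId));
            if (removed) {
                notFull.signal();
            }
            return removed;
        } finally {
            lock.unlock();
        }
    }
}
```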
Fault-tolerance processing means that when a working node goes down or its network is disconnected for a long time while a job is running, the job running device notifies the job preloading module, which connects to the working node to check whether the connection is unavailable; if it is unavailable, the job is put directly back into the queue, and the lost job is eventually scheduled to another available node. If the disconnected node later recovers, the job running device directly kills the job process, ensuring that the same job never runs on two nodes of one job running device at the same time.
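A compact sketch of that fault-tolerance path, reusing the BoundedOrderedJobQueue sketched above (again with hypothetical names, and with a plain TCP probe standing in for whatever connectivity check is actually used): on a loss-of-connection report, the preloading module probes the worker once more, and only if it is truly unreachable does the job go back into the queue to be rescheduled elsewhere.

```java
/** Illustrative handling of a reported worker failure in the job preloading module. */
public class FailureHandler {
    private final BoundedOrderedJobQueue queue;

    public FailureHandler(BoundedOrderedJobQueue queue) {
        this.queue = queue;
    }

    /**
     * Called when the job running device reports that a worker node is down or its
     * network has been disconnected for a long time while a job was running on it.
     */
    public void onWorkerLost(String workerAddress, BoundedOrderedJobQueue.PendingJob lostJob)
            throws InterruptedException {
        if (!isReachable(workerAddress)) {
            // The connection really is unavailable: requeue the job so it is eventually
            // scheduled onto another available node. If the lost worker later recovers,
            // the job running device kills the stale process so the same job never runs
            // on two nodes of one job running device at the same time.
            queue.put(lostJob);
        }
    }

    /** Hypothetical connectivity probe (a TCP connect with a timeout). */
    private boolean isReachable(String workerAddress) {
        try (java.net.Socket socket = new java.net.Socket()) {
            String[] hostPort = workerAddress.split(":");
            socket.connect(new java.net.InetSocketAddress(hostPort[0],
                    Integer.parseInt(hostPort[1])), 3000);
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```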
S103: the resource scheduling module completes resource allocation and schedule distribution by acquiring the job preloading information and the computing-resource information of the job running device. The resource scheduling module consists of a resource obtaining unit, a resource allocation unit and a scheduling/distributing unit, as shown in fig. 7:
(1) the resource obtaining unit S103-1 is responsible for obtaining the computing resources of the job running cluster and caching them in memory;
(2) the resource allocation unit S103-2 is responsible for taking all jobs from the bounded ordered queue and allocating computing resources to each job in order of job priority;
(3) the scheduling/distributing unit S103-3 is responsible for assigning an executor to each scheduled job and sending the job meta-information, the allocated computing resources and the executor configuration to the job running cluster.
The Executor may be implemented as a Linux container executor, a Docker executor or another executor; such container executors achieve isolation of computing resources. A combined sketch of one scheduling round over these three units is given below.
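The following is a hedged sketch only, with all types and fields hypothetical: jobs already ordered by trigger time and priority are matched against cached node resources and dispatched with their executor configuration, while jobs that cannot be placed are reported back so the job management module can update their state.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative resource scheduling round: allocate resources to jobs and dispatch them. */
public class ResourceScheduler {

    /** Cached view of one worker node's resources, refreshed from the job running device. */
    public static final class NodeResources {
        final String node;
        double freeCpus;
        long freeMemMb;
        NodeResources(String node, double freeCpus, long freeMemMb) {
            this.node = node; this.freeCpus = freeCpus; this.freeMemMb = freeMemMb;
        }
    }

    /** What a job asks for; names are hypothetical. */
    public record JobRequest(String jobId, double cpus, long memMb, String executorImage) {}

    private final Map<String, NodeResources> resourceCache = new ConcurrentHashMap<>();

    /** Resource obtaining unit: cache the cluster's computing resources in memory. */
    public void updateNodeResources(NodeResources r) {
        resourceCache.put(r.node, r);
    }

    /** Resource allocation unit: pick the first node with enough spare CPU and memory. */
    private Optional<NodeResources> allocate(JobRequest job) {
        return resourceCache.values().stream()
                .filter(n -> n.freeCpus >= job.cpus() && n.freeMemMb >= job.memMb())
                .findFirst();
    }

    /** One scheduling round over jobs already ordered by trigger time and priority. */
    public void scheduleRound(List<JobRequest> jobsInPriorityOrder, Dispatcher dispatcher) {
        for (JobRequest job : jobsInPriorityOrder) {
            Optional<NodeResources> node = allocate(job);
            if (node.isPresent()) {
                NodeResources n = node.get();
                n.freeCpus -= job.cpus();               // reserve the allocated resources
                n.freeMemMb -= job.memMb();
                // Scheduling/distributing unit: send meta-info, allocation and executor config.
                dispatcher.dispatch(job, n.node);
            } else {
                // No resources available: feed back to the job management module so the
                // job state is updated and the job is retried in a later round.
                dispatcher.reportUnallocated(job);
            }
        }
    }

    /** Hypothetical hook to the job running cluster / job management module. */
    public interface Dispatcher {
        void dispatch(JobRequest job, String node);
        void reportUnallocated(JobRequest job);
    }
}
```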
S200: the job running device receives the scheduled task, starts job execution, sends job running state information to the job management module, feeds back working-node computing resources to the resource scheduling module, and feeds back loss-of-connection or fault information to the job preloading module.
The job running device has a master control node and working nodes; the master control node is responsible for management and coordination, and the working nodes are responsible for executing the data integration jobs. The master control node receives the job meta-information, job resource allocation information and job executor information distributed by the resource scheduling module and starts job execution. The agent program on a working node collects job state information and sends it to the master control node, which forwards it to the job management module; the agent collects working-node computing resources and sends them to the master control node, which forwards them to the resource scheduling module; the agent sends heartbeat information to the master control node, and the master control node sends loss-of-connection or fault information to the job preloading module. The executor on a working node has a retry mechanism: once the data-flow source or target goes down or loses its connection, the executor performs timed retries so that normal operation can continue after the source and target recover.
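The executor's timed-retry behaviour when a data-flow source or target goes down might be sketched as follows (the Endpoint abstraction and the retry interval are assumptions, not from the patent): retry at a fixed interval and resume the job once both ends are reachable again.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustrative timed-retry loop inside an executor when a data source/target is lost. */
public class DataFlowRetry {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    /** Hypothetical endpoint abstraction for the data-flow source or target. */
    public interface Endpoint {
        boolean isReachable();
    }

    /**
     * Called when the source or target goes down or loses its connection mid-job.
     * Retries every retrySeconds and resumes the job once both ends are back.
     */
    public void retryUntilRecovered(Endpoint source, Endpoint target,
                                    long retrySeconds, Runnable resumeJob) {
        timer.scheduleWithFixedDelay(() -> {
            if (source.isReachable() && target.isReachable()) {
                resumeJob.run();        // both ends recovered: continue normal operation
                timer.shutdown();       // stop the timed retries
            }
            // otherwise: do nothing and let the next scheduled attempt fire
        }, retrySeconds, retrySeconds, TimeUnit.SECONDS);
    }
}
```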
As shown in fig. 8, the job running device may perform distributed resource management based on a Mesos cluster system: the master control node provides low-latency local metadata management using a RAM + WAL log approach, maintains job-state synchronization across a large number of working nodes using the Paxos algorithm, and pushes specific physical resources to the job scheduling system based on the master control node's unified cluster physical-resource management interface and a specific resource-sharing policy. The job running cluster provides two modes, a multi-language driver package and JSON RPC, for the job scheduling system to register and to obtain specific callback events. The agent program is responsible for collecting working-node resources, running specific scheduled tasks through the executor, and returning the executor's execution results and task states to the master control node, which then forwards them to the job scheduling device.
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A distributed data integration job scheduling method, characterized by comprising the following steps: (1) a job scheduling device issues the data integration job to a job running device, wherein the job scheduling device comprises a job management module, a job preloading module and a resource scheduling module:
(1.1) the job management module receives, caches and stores job-related meta-information and performs concurrency control; the job management module comprises an information receiving unit, an information caching unit, a persistent storage unit and a concurrency control unit, which operate as follows:
(1.1.1) the information receiving unit receives job submissions, job meta-information modifications and scheduling policy updates; receives information on jobs with unallocated resources fed back by the resource scheduling module and updates job states; and receives job state information fed back by the job running device and updates job states;
(1.1.2) the information caching unit locally caches job meta-information and state information and supports frequent real-time queries;
(1.1.3) the persistent storage unit persists job metadata from the cached content and maintains data consistency between the cache layer and the storage layer; the data consistency between the cache layer and the storage layer is maintained as follows:
(a) fault-tolerant storage is used: updated data is written to a local file and written back to the storage layer after the network returns to normal;
(b) a circuit breaker (fuse) is used: when the fault-tolerance mechanism is triggered up to a preset threshold, the breaker opens, the service degrades, and no new task is scheduled;
(c) the job state interface of the job running device is encapsulated; each time a job scheduling node starts, the job running states are obtained and audited to ensure that the job states in the job running device are consistent with the states in the metadata storage layer;
(1.1.4) the concurrency control unit assigns a read-write lock to the access of each job resource;
(1.2) the job preloading module acquires the jobs to be processed from the job management module and determines a bounded ordered queue and scheduling priorities; the bounded ordered queue is used to load all jobs waiting to be scheduled, where bounded means that an upper limit on the number of jobs is guaranteed and the upper-limit parameter is set from a scenario evaluation; ordered means that jobs with earlier trigger times and higher priorities are placed toward the front of the queue for preferential scheduling; when a job is stopped or deleted, removal of the specified waiting job from the queue is supported; a producer-consumer model is adopted, and a thread blocking-wakeup approach is used to reduce CPU load;
(1.3) the resource scheduling module completes resource allocation and schedule distribution by acquiring the job preloading information and the computing-resource information of the job running device;
(2) the job running device receives the scheduled task, starts job execution, feeds back job running state information to the job management module, feeds back working-node computing resources to the resource scheduling module, and feeds back loss-of-connection or fault information to the job preloading module.
2. The distributed data integration job scheduling method according to claim 1, wherein the job preloading module comprises a real-time query unit, a job preloading unit and a fault processing unit, and step (1.2) operates as follows:
(I) the real-time query unit queries the job metadata cache in real time to acquire unlocked jobs waiting to be scheduled;
(II) the job preloading unit adds the unlocked jobs to be scheduled to the bounded ordered queue, sorted by job scheduling time and job priority;
(III) the fault processing unit receives fault information from the job running device and performs fault-tolerance processing; fault-tolerance processing means that when a working node goes down or its network is disconnected for a long time while a job is running, the job running device notifies the job preloading module, which connects to the working node to check whether the connection is unavailable; if it is unavailable, the job is put directly back into the queue, and the lost job is eventually scheduled to another available node; if the disconnected node later recovers, the job running device directly kills the job process, ensuring that the same job never runs on two nodes of one job running device at the same time.
3. The distributed data integration job scheduling method according to claim 1, wherein the resource scheduling module comprises a resource obtaining unit, a resource allocation unit and a scheduling/distributing unit, and step (1.3) operates as follows:
1) the resource obtaining unit obtains the computing resources of the job running cluster and caches them in memory;
2) the resource allocation unit takes all jobs from the bounded ordered queue and allocates computing resources to each job in order of job priority;
3) the scheduling/distributing unit assigns an executor to each scheduled job and sends the job meta-information, the allocated computing resources and the executor configuration to the job running cluster.
4. The distributed data integration job scheduling method according to claim 1, wherein in step (2), the job running device comprises a master control node and working nodes; the master control node is responsible for management and coordination, and the working nodes are responsible for executing the data integration jobs; the master control node receives the job meta-information, job resource allocation information and job executor information distributed by the resource scheduling module and starts job execution; the agent program on a working node collects job state information and sends it to the master control node, which sends it to the job management module; the agent program collects working-node computing resources and sends them to the master control node, which sends them to the resource scheduling module; the agent program sends heartbeat information to the master control node, and the master control node sends loss-of-connection or fault information to the job preloading module; the executor on a working node has a retry mechanism, so that once the data-flow source or target goes down or loses its connection, the executor performs timed retries to ensure that normal operation continues after the source and target recover.
5. The distributed data integration job scheduling method according to claim 1, wherein the job running device performs distributed resource management based on a Mesos cluster system; the master control node provides low-latency local metadata management using a RAM + WAL log approach, maintains job-state synchronization across the working nodes using the Paxos algorithm, and pushes specific physical resources to the job scheduling system based on a unified cluster physical-resource management interface and a specific resource-sharing policy; the job running cluster provides two modes, a multi-language driver package and JSON RPC, for the job scheduling system to register and to obtain specific callback events; the agent program is responsible for collecting working-node resources, running specific scheduled tasks through the executor, and returning the executor's execution results and task states to the master control node, which then forwards them to the job scheduling device.
6. A distributed data integration job scheduling apparatus to which the method of claim 1 is applied, comprising a job scheduling device and a job running device that exchange information with each other; the job scheduling device comprises a job management module, a job preloading module and a resource scheduling module; the job management module is used to receive, cache and store job-related meta-information and perform concurrency control; the job preloading module is used to acquire the jobs to be processed from the job management module and determine the scheduling priorities; the resource scheduling module is used to complete resource allocation and schedule distribution by acquiring the job preloading information and the computing-resource information of the job running device; the job running device comprises a master control node responsible for management and coordination and working nodes responsible for executing the data integration jobs.
7. The distributed data integration job scheduling apparatus according to claim 6, wherein both the job scheduling device and the job running device are registered in ZooKeeper; the job scheduling device adopts an active-standby mode, and once the active device goes down, ZooKeeper elects the standby device to take over the job scheduling work; the master control nodes in the job running device adopt an active-standby mode, and once the active master control node goes down, ZooKeeper elects the standby master control node to take over the management and coordination work.
8. The distributed data integration job scheduling apparatus according to claim 7, wherein the job scheduling device audits and maintains its state based on job state information fed back by the job running device, and the job metadata database maintains consistency with the job metadata cache of the job scheduling device; when the job scheduling device fails, the standby job scheduling device, upon taking over, interacts with the job metadata database once to rebuild the job metadata cache, and then audits and maintains the cached metadata by receiving job-state feedback from the job running device, so that the metadata cache stays consistent across the distributed system.
CN201910489422.2A 2019-06-06 2019-06-06 Distributed data integration job scheduling method and device Active CN110362390B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910489422.2A CN110362390B (en) 2019-06-06 2019-06-06 Distributed data integration job scheduling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910489422.2A CN110362390B (en) 2019-06-06 2019-06-06 Distributed data integration job scheduling method and device

Publications (2)

Publication Number Publication Date
CN110362390A CN110362390A (en) 2019-10-22
CN110362390B true CN110362390B (en) 2021-09-07

Family

ID=68215696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910489422.2A Active CN110362390B (en) 2019-06-06 2019-06-06 Distributed data integration job scheduling method and device

Country Status (1)

Country Link
CN (1) CN110362390B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111045802B (en) * 2019-11-22 2024-01-26 中国联合网络通信集团有限公司 Redis cluster component scheduling system and method and platform equipment
CN111124806B (en) * 2019-11-25 2023-09-05 山东鲁软数字科技有限公司 Method and system for monitoring equipment state in real time based on distributed scheduling task
CN111338770A (en) * 2020-02-12 2020-06-26 咪咕文化科技有限公司 Task scheduling method, server and computer readable storage medium
CN111580990A (en) * 2020-05-08 2020-08-25 中国建设银行股份有限公司 Task scheduling method, scheduling node, centralized configuration server and system
CN112200534A (en) * 2020-09-24 2021-01-08 中国建设银行股份有限公司 Method and device for managing time events
CN112328383A (en) * 2020-11-19 2021-02-05 湖南智慧畅行交通科技有限公司 Priority-based job concurrency control and scheduling algorithm
CN112131318B (en) * 2020-11-30 2021-03-16 北京优炫软件股份有限公司 Pre-written log record ordering system in database cluster
CN112527488A (en) * 2020-12-21 2021-03-19 浙江百应科技有限公司 Distributed high-availability task scheduling method and system
CN112835717A (en) * 2021-02-05 2021-05-25 远光软件股份有限公司 Integrated application processing method and device for cluster
CN113778676B (en) * 2021-09-02 2023-05-23 山东派盟网络科技有限公司 Task scheduling system, method, computer device and storage medium
CN113986507A (en) * 2021-11-01 2022-01-28 佛山技研智联科技有限公司 Job scheduling method and device, computer equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6490572B2 (en) * 1998-05-15 2002-12-03 International Business Machines Corporation Optimization prediction for industrial processes
CN101309208A (en) * 2008-06-21 2008-11-19 华中科技大学 Job scheduling system suitable for grid environment and based on reliable expense
CN101599026A (en) * 2009-07-09 2009-12-09 浪潮电子信息产业股份有限公司 A kind of cluster job scheduling system with resilient infrastructure
US9141433B2 (en) * 2009-12-18 2015-09-22 International Business Machines Corporation Automated cloud workload management in a map-reduce environment
CN104317650A (en) * 2014-10-10 2015-01-28 北京工业大学 Map/Reduce type mass data processing platform-orientated job scheduling method
CN104462370A (en) * 2014-12-09 2015-03-25 北京百度网讯科技有限公司 Distributed task scheduling system and method
CN109327509A (en) * 2018-09-11 2019-02-12 武汉魅瞳科技有限公司 A kind of distributive type Computational frame of the lower coupling of master/slave framework

Also Published As

Publication number Publication date
CN110362390A (en) 2019-10-22

Similar Documents

Publication Publication Date Title
CN110362390B (en) Distributed data integration job scheduling method and device
US11561841B2 (en) Managing partitions in a scalable environment
US10817478B2 (en) System and method for supporting persistent store versioning and integrity in a distributed data grid
US8799248B2 (en) Real-time transaction scheduling in a distributed database
CN104793988B (en) The implementation method and device of integration across database distributed transaction
Lin et al. Towards a non-2pc transaction management in distributed database systems
JP5214105B2 (en) Virtual machine monitoring
US20040158549A1 (en) Method and apparatus for online transaction processing
US20090172142A1 (en) System and method for adding a standby computer into clustered computer system
EP3198430A1 (en) System and method for supporting dynamic thread pool sizing in a distributed data grid
US9128895B2 (en) Intelligent flood control management
JP2015506523A (en) Dynamic load balancing in a scalable environment
EP2673711A1 (en) Method and system for reducing write latency for database logging utilizing multiple storage devices
US11550820B2 (en) System and method for partition-scoped snapshot creation in a distributed data computing environment
CN114064414A (en) High-availability cluster state monitoring method and system
US9703634B2 (en) Data recovery for a compute node in a heterogeneous database system
CN111580951A (en) Task allocation method and resource management platform
CN113342511A (en) Distributed task management system and method
JP2010218159A (en) Management device, database management method and program
Lev-Ari et al. Quick: a queuing system in cloudkit
JPH01112444A (en) Data access system
Martin et al. Near real-time analytics with ibm db2 analytics accelerator
WO2016048831A1 (en) System and method for supporting dynamic thread pool sizing in a distributed data grid
CN117931302A (en) Parameter file saving and loading method, device, equipment and storage medium
CN114416372A (en) Request processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province

Patentee after: Yinjiang Technology Co.,Ltd.

Address before: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province

Patentee before: ENJOYOR Co.,Ltd.