CN110362390A - Distributed data integration job scheduling method and device

Publication number
CN110362390A
Authority
CN
China
Prior art keywords: job, scheduling, resource, node, information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910489422.2A
Other languages
Chinese (zh)
Other versions
CN110362390B (en
Inventor
李建元
刘飞黄
王超群
刘兴田
贾建涛
温晓岳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yinjiang Technology Co.,Ltd.
Original Assignee
Enjoyor Co Ltd
Application filed by Enjoyor Co Ltd
Priority to CN201910489422.2A
Publication of CN110362390A
Application granted
Publication of CN110362390B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a distributed data integration job scheduling method and device. Addressing the special scenarios that data integration may face, a job scheduling device issues data integration jobs to a job running device. The job running device receives the scheduled tasks and starts job execution, feeds the status information of running jobs back to the job management module, feeds worker-node computing resources back to the resource scheduling module, and feeds loss-of-connection or fault information back to the job preloading module. The invention combines the following characteristics: (1) high availability and fault tolerance, with weak consistency; (2) low latency for quasi-real-time job scheduling; (3) multi-tenant concurrency control for cloud service applications; (4) computing resource isolation and parallel scheduling of multiple jobs; (5) a priority scheduling mechanism.

Description

Distributed data integration job scheduling method and device
Technical field
The present invention relates to the field of basic big data technology, and in particular to a distributed data integration job scheduling method and device.
Background art
With the evolution of the digital economy, the digitalization of business in many industries has developed fully, and turning digital assets into services is gradually becoming the new focus. However, business digitalization has produced a large number of data silos, which have become a common pain point in realizing digital services. Industries urgently need data integration to connect and avoid data silos, to integrate and govern data resources, and thereby to develop the associated value among data effectively.
Data integration often involves scheduling thousands of jobs, of types including data exchange and data preprocessing, and the design of the scheduling system must consider a variety of complex scenarios. For example: in some scenarios the jobs are numerous and must be processed in parallel; some scenarios require quasi-real-time operation and therefore priority scheduling; some jobs occupy more computing resources or hold them for longer, so resource isolation is needed to avoid affecting other jobs; some jobs may encounter failures such as data source or target downtime, network interruption, or running-node downtime, and need a fault-tolerance mechanism; in multi-tenant scenarios, concurrency control must be handled; and so on.
The prior art cannot meet complex data integration job scheduling requirements. For example, in the traditional LTS scheduling system, jobs are isolated at the thread level; if the execution thread of one job exhausts all the memory of the current process, all jobs in that process become abnormal. LTS therefore lacks the ability to schedule data integration jobs and is better suited to scheduling lightweight tasks. Chinese invention patent CN201610800080 discloses a distributed task scheduling system and method that addresses the large amount of code and the heavy development workload in parallel computing program development; in essence it schedules a single large-scale distributed computing job and does not consider parallel scheduling of multiple tasks. Chinese invention patent CN201610197298 discloses a task scheduling method, device and system that proposes a multi-channel, multi-task distributed scheduling method and solves the starvation of other jobs caused by an overly long single-task scheduling time, but it does not consider problems such as low-latency access to job metadata or how to ensure job metadata consistency. Chinese invention patent CN201410748604 discloses a distributed task scheduling system and method that guarantees the reliability of the system itself, supports independent or associated tasks, and supports task rollback, but it is not suited to complex data integration scenarios: in these scenarios task rollback is not the focus, and data integration tasks need not be associated with one another. It also does not consider that downtime is costly for large-scale, complex data integration tasks and that high availability is therefore required, nor how to minimize latency when quasi-real-time tasks are present, nor how to guarantee job metadata consistency.
Summary of the invention
To overcome the above shortcomings, the object of the present invention is to provide a distributed data integration job scheduling method and device. Addressing the special scenarios that data integration may face, a job scheduling device issues data integration jobs to a job running device. The job running device receives the scheduled tasks and starts job execution, feeds the status information of running jobs back to the job management module, feeds worker-node computing resources back to the resource scheduling module, and feeds loss-of-connection or fault information back to the job preloading module. The invention combines the following characteristics: (1) high availability and fault tolerance, with weak consistency; (2) low latency for quasi-real-time job scheduling; (3) multi-tenant concurrency control for cloud service applications; (4) computing resource isolation and parallel scheduling of multiple jobs; (5) a priority scheduling mechanism.
The present invention achieves the above object through the following technical solution. A distributed data integration job scheduling method comprises the following steps:
(1) a job scheduling device issues data integration jobs to a job running device, wherein the job scheduling device comprises a job management module, a job preloading module, and a resource scheduling module:
(1.1) the job management module receives, caches, and stores job-related meta-information and performs concurrency control;
(1.2) the job preloading module obtains pending jobs from the job management module and determines their scheduling priority order;
(1.3) the resource scheduling module obtains the job preloading information and the computing resource information of the job running device, and completes resource allocation and scheduling distribution;
(2) the job running device receives the scheduled tasks and starts job execution, feeds the status information of running jobs back to the job management module, feeds worker-node computing resources back to the resource scheduling module, and feeds loss-of-connection or fault information back to the job preloading module.
Preferably, the job management module comprises an information receiving unit, an information caching unit, a persistent storage unit, and a concurrency control unit, which operate as follows:
(i) the information receiving unit receives job submissions, job meta-information modifications, and scheduling policy updates; receives information on jobs that were not allocated resources, fed back by the resource scheduling module, and updates the job state; and receives job status information fed back by the job running device, and updates the job state;
(ii) the information caching unit caches job meta-information and status information locally to support frequent real-time queries;
(iii) the persistent storage unit persists job metadata according to the cache contents and maintains data consistency between the cache layer and the storage layer;
(iv) the concurrency control unit assigns read-write locks to accesses to each job resource.
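The per-job read-write locking in item (iv) can be illustrated with a minimal sketch. The patent does not specify a lock implementation; the class names `ReadWriteLock` and `JobLockTable` are hypothetical, and the sketch makes no fairness guarantee (a steady stream of readers could delay a writer).

```python
import threading


class ReadWriteLock:
    """Shared for readers, exclusive for one writer (illustrative only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self, timeout=None):
        with self._cond:
            # Readers may enter as long as no writer holds the lock.
            ok = self._cond.wait_for(lambda: not self._writer, timeout)
            if ok:
                self._readers += 1
            return ok

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self, timeout=None):
        with self._cond:
            # A writer needs the lock entirely to itself.
            ok = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout
            )
            if ok:
                self._writer = True
            return ok

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class JobLockTable:
    """Hands out one read-write lock per job id (hypothetical helper)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, job_id):
        with self._guard:
            return self._locks.setdefault(job_id, ReadWriteLock())
```

Keeping one lock per job, rather than one global lock, lets tenants working on different jobs proceed without blocking each other.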
Preferably, in step (iii), data consistency between the cache layer and the storage layer is maintained as follows:
(a) fault-tolerant storage is used: updated data is written to a local file and then written to the storage layer once the network has recovered;
(b) a circuit breaker is used: when the fault-tolerance mechanism has been triggered and a predetermined threshold is reached, the circuit breaker opens, the service is degraded, and no new tasks are scheduled;
(c) the job-state interface of the job running device is wrapped; each time a job scheduling node starts, the running state of the jobs is obtained and audited, so that the job states in the job running device are consistent with the states in the metadata storage layer.
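Measures (a) and (b) can be sketched together as follows. This is a minimal illustration under assumptions: `FallbackWriter`, `StorageUnavailable`, the JSON-lines spool file, and the failure-count threshold are all hypothetical choices, since the patent does not say how the threshold is measured or how the local file is laid out.

```python
import json
import os


class StorageUnavailable(Exception):
    """Raised by the storage callable when the storage layer is unreachable."""


class FallbackWriter:
    def __init__(self, store, spool_path, trip_threshold=3):
        self.store = store                  # callable(record); may raise StorageUnavailable
        self.spool_path = spool_path        # local file used while the network is down
        self.trip_threshold = trip_threshold
        self.failures = 0
        self.circuit_open = False           # True means degraded: no new tasks

    def write(self, record):
        try:
            self.store(record)
            return True
        except StorageUnavailable:
            # (a) fault-tolerant storage: keep the update in a local file.
            with open(self.spool_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(record) + "\n")
            # (b) circuit breaker: open once the threshold is reached.
            self.failures += 1
            if self.failures >= self.trip_threshold:
                self.circuit_open = True
            return False

    def can_schedule_new_tasks(self):
        return not self.circuit_open

    def replay(self):
        """After recovery, flush spooled records to the storage layer in order."""
        if not os.path.exists(self.spool_path):
            return 0
        with open(self.spool_path, encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
        for record in records:
            self.store(record)              # if this raises, the spool file is kept
        os.remove(self.spool_path)
        self.failures = 0
        self.circuit_open = False
        return len(records)
```

The sketch assumes that writes to the storage layer are idempotent, because a replay interrupted halfway would resend the earlier records on the next attempt.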
Preferably, the job preloading module comprises a real-time query unit, a job preloading unit, and a fault handling unit, which operate as follows:
(I) the real-time query unit queries the job metadata cache in real time to obtain unlocked jobs awaiting scheduling;
(II) the job preloading unit adds the unlocked jobs awaiting scheduling to a bounded ordered queue, sorted by scheduled time and job priority;
(III) the fault handling unit receives fault information from the job running device and performs fault-tolerance handling. Fault-tolerance handling means the following: when a worker node goes down or loses its network connection for a long time while a job is running, the job running device notifies the job preloading module; the job preloading module tries to connect to the worker node to determine whether it is unreachable; if the node is confirmed unreachable, the job is put straight back into the queue, and the lost job will eventually be scheduled onto another available node. If the lost node then recovers, the job running device kills the job's process on that node directly, which guarantees that the same job never runs on two nodes at the same time under one job running device.
Preferably, a bounded, ordered blocking queue is used to load all jobs awaiting scheduling. Bounded means that the number of jobs has an upper limit, a parameter that can be set by evaluating the scenario. Ordered means that jobs with earlier trigger times and higher priorities are placed nearer the front of the queue and scheduled first. When a job is stopped or deleted, removing the specified job awaiting scheduling from the queue is supported. A producer-consumer model is used, and thread blocking and wake-up reduce the CPU load.
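One way to picture such a queue is the sketch below, built on a heap and a condition variable. The class name `BoundedOrderedQueue` and its method names are hypothetical, and the tie-breaking rule (trigger time first, then priority) is an assumption about how the two sort keys combine.

```python
import heapq
import threading


class BoundedOrderedQueue:
    def __init__(self, capacity):
        self._capacity = capacity          # upper limit on queued jobs
        self._heap = []                    # entries: (trigger_time, -priority, job_id)
        self._cond = threading.Condition()

    def put(self, job_id, trigger_time, priority, timeout=None):
        """Producer side: blocks while the queue is full."""
        with self._cond:
            has_room = self._cond.wait_for(
                lambda: len(self._heap) < self._capacity, timeout
            )
            if not has_room:
                return False
            heapq.heappush(self._heap, (trigger_time, -priority, job_id))
            self._cond.notify_all()
            return True

    def take(self, timeout=None):
        """Consumer side: blocks (no busy-waiting) while the queue is empty."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._heap, timeout):
                return None
            _, _, job_id = heapq.heappop(self._heap)
            self._cond.notify_all()
            return job_id

    def remove(self, job_id):
        """Drop a specific queued job, e.g. when it is stopped or deleted."""
        with self._cond:
            before = len(self._heap)
            self._heap = [e for e in self._heap if e[2] != job_id]
            heapq.heapify(self._heap)
            self._cond.notify_all()
            return len(self._heap) < before
```

Blocking on the condition variable is what keeps idle producers and consumers off the CPU, which matches the thread block and wake-up behaviour the text describes.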
Preferably, the resource scheduling module comprises a resource acquisition unit, a resource allocation unit, and a scheduling distribution unit, which operate as follows:
1) the resource acquisition unit obtains the computing resources of the job running cluster and caches them in memory;
2) the resource allocation unit obtains all jobs from the bounded ordered queue and allocates computing resources to each job in job priority order;
3) the scheduling distribution unit specifies an executor for each scheduled job and sends the job meta-information, the allocated computing resources, and the executor configuration to the job running cluster.
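Step 2) can be sketched as a simple first-fit pass over the cached node resources. The patent does not name an allocation algorithm, so first-fit is an assumption, as are the `Node` and `Job` record shapes and the choice of CPU and memory as the resource dimensions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Job:
    job_id: str
    cpus: float
    mem_mb: int


def allocate(jobs_in_priority_order, nodes):
    """Assign each job to the first node that can hold it.

    Returns (assignments, unallocated); node capacities are reduced in place.
    """
    assignments, unallocated = {}, []
    for job in jobs_in_priority_order:
        for node in nodes:
            if node.cpus >= job.cpus and node.mem_mb >= job.mem_mb:
                node.cpus -= job.cpus
                node.mem_mb -= job.mem_mb
                assignments[job.job_id] = node.name
                break
        else:
            # No node had room: report the job back as unallocated.
            unallocated.append(job.job_id)
    return assignments, unallocated
```

The `unallocated` list corresponds to the unallocated-resource job information that the resource scheduling module feeds back to the job management module.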
Preferably, in step (2), the job running device comprises a master node and worker nodes; the master node is responsible for management and coordination, and the worker nodes are responsible for executing data integration jobs. The master node receives the job meta-information, job resource allocation information, and job executor information distributed by the resource scheduling module, and starts job execution. The agent program on each worker node collects the status information of running jobs and sends it to the master node, which forwards it to the job management module. The agent program on each worker node collects the node's computing resources and sends them to the master node, which forwards them to the resource scheduling module. The agent program on each worker node sends heartbeat messages to the master node, and the master node sends loss-of-connection or fault information to the job preloading module. The executor on each worker node has a retry mechanism: if the source or target of a data flow goes down or becomes unreachable, the executor retries at timed intervals so that the job can continue to run normally once the data source and data target recover.
Preferably, the job running device performs distributed system resource management based on a Mesos cluster system. The master node uses RAM plus a WAL (write-ahead log) to provide low-latency local metadata management, uses the Paxos algorithm to keep the job states of a large number of worker nodes synchronized, and pushes specific physical resources to the job scheduling system based on its unified cluster physical resource management interface and a specific resource-sharing policy. The job running cluster provides two modes, multi-language driver packages and JSON RPC, through which the job scheduling system registers for and receives specific callback events. The agent program is responsible for collecting worker-node resources, runs specific scheduled tasks through the executor, and returns the executor's results and task status to the master node, which then forwards them to the job scheduling device.
A distributed data integration job scheduling device comprises a job scheduling device and a job running device, which exchange information with each other. The job scheduling device comprises a job management module, a job preloading module, and a resource scheduling module. The job management module receives, caches, and stores job-related meta-information and performs concurrency control. The job preloading module obtains pending jobs from the job management module and determines their scheduling priority order. The resource scheduling module obtains the job preloading information and the computing resource information of the job running device, and completes resource allocation and scheduling distribution. The job running device comprises a master node and worker nodes; the master node is responsible for management and coordination, and the worker nodes are responsible for executing data integration jobs.
Preferably, the job scheduling device and the job running device are both registered in ZooKeeper. The job scheduling device uses an active-standby mode: once the active device goes down, ZooKeeper elects a standby device to take over job scheduling. The master node in the job running device also uses an active-standby mode: once the master node goes down, ZooKeeper elects a standby master node to take over management and coordination.
Preferably, the job scheduling device audits and maintains job state based on the job status information fed back by the job running device, and the job metadata database maintains consistency based on the job metadata cache of the job scheduling device. When the job scheduling device fails, the standby job scheduling device, once it takes over, must interact with the job metadata database to rebuild the job metadata cache, and must audit and maintain the cached metadata by receiving job status feedback from the job running device, so that the metadata stays consistent across the distributed system.
The beneficial effects of the present invention are: (1) in terms of the CAP theorem, the method satisfies the two criteria of high availability and fault tolerance, and adopts mechanisms that ensure job metadata consistency on a best-effort basis; (2) multi-tenant concurrency control is implemented with distributed read-write locks, which supports providing multi-tenant data integration services as a cloud service; (3) because data integration jobs need frequent scheduling, job metadata uses a caching mechanism, which effectively reduces the latency and interruption risk caused by frequent metadata access.
Brief description of the drawings
Fig. 1 is a schematic diagram of the device of the present invention;
Fig. 2 is a schematic diagram of the high-availability mechanism of the device of the present invention;
Fig. 3 is a schematic flow chart of the method of the present invention;
Fig. 4 is a schematic flow chart of the job management module of the present invention;
Fig. 5 is a schematic diagram of the operation of the job management module of the present invention;
Fig. 6 is a schematic flow chart of the job preloading module of the present invention;
Fig. 7 is a schematic flow chart of the resource scheduling module of the present invention;
Fig. 8 is a schematic diagram of the operating flow of the job running device of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to a specific embodiment, but the scope of protection of the invention is not limited thereto:
Embodiment: as shown in Fig. 1, a distributed data integration job scheduling device consists of a job scheduling device and a job running device, which exchange information with each other. The job scheduling device comprises a job management module, a job preloading module, and a resource scheduling module. The job management module receives, caches, and stores job-related meta-information and performs concurrency control. The job preloading module obtains pending jobs from the job management module and determines their scheduling priority order. The resource scheduling module obtains the job preloading information and the computing resource information of the job running device, and completes resource allocation and scheduling distribution. The job running device comprises a master node and worker nodes; the master node is responsible for management and coordination, and the worker nodes are responsible for executing data integration jobs.
As shown in Fig. 2, the job scheduling device and the job running device are both registered in ZooKeeper. The job scheduling device uses an active-standby mode: once the active device goes down, ZooKeeper elects a standby device to take over job scheduling. The master node in the job running device also uses an active-standby mode: once the master node goes down, ZooKeeper elects a standby master node to take over management and coordination.
The job scheduling device audits and maintains job state based on the job status information fed back by the job running device, and the job metadata database maintains consistency based on the job metadata cache of the job scheduling device. When the job scheduling device fails, the standby job scheduling device, once it takes over, must interact with the job metadata database to rebuild the job metadata cache, and must audit and maintain the cached metadata by receiving job status feedback from the job running device, so that the metadata stays consistent across the distributed system. This ensures weak consistency.
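The takeover audit described above can be sketched as a small reconciliation function. This is an illustration under assumptions: the function name `rebuild_and_audit`, the plain-dictionary data shapes, and the rule that the state reported by the job running device wins over the stored state are not specified in the patent.

```python
def rebuild_and_audit(metadata_db, reported_states):
    """Rebuild the metadata cache on takeover and reconcile it with feedback.

    metadata_db:      job_id -> state as stored in the job metadata database
    reported_states:  job_id -> state fed back by the job running device
    Returns (cache, corrections), where corrections maps each job whose state
    was changed to a (stored_state, reported_state) pair.
    """
    cache = dict(metadata_db)              # step 1: rebuild cache from the database
    corrections = {}
    for job_id, actual in reported_states.items():
        stored = cache.get(job_id)
        if stored != actual:               # step 2: audit against live feedback
            corrections[job_id] = (stored, actual)
            cache[job_id] = actual         # assumption: the running device is authoritative
    return cache, corrections
```

The returned `corrections` would then be persisted back to the metadata database so that both layers converge, which is the weak-consistency behaviour the embodiment describes.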
As shown in Fig. 3, a distributed data integration job scheduling method comprises the following steps:
S100: the job scheduling device issues data integration jobs to the job running device. The job scheduling device consists of three parts, a job management module, a job preloading module, and a resource scheduling module, as follows:
S101: the job management module receives, caches, and stores job-related meta-information and performs concurrency control. The job management module consists of three parts: an information receiving unit, a storage processing unit, and a concurrency control unit. As shown in Fig. 4, the specific operations are as follows:
(1) the information receiving unit S101-1 receives job submissions, job meta-information modifications, and scheduling policy updates; receives information on jobs that were not allocated resources, fed back by the resource scheduling module, and updates the job state; and receives job status information fed back by the job running device, and updates the job state;
(2) the information caching unit S101-2 caches job meta-information and status information locally to support frequent real-time queries;
(3) the persistent storage unit S101-3 persists job metadata according to the cache contents and maintains data consistency between the cache layer and the storage layer;
(4) the concurrency control unit S101-4 assigns read-write locks to accesses to each job resource.
Specifically, as shown in Fig. 5, the job management module of the present invention is responsible for receiving jobs and maintaining the job state machine. The main job states may include: not started, awaiting scheduling, paused, running, stopping, abnormal, and completed. The job management module provides operation interfaces on job state, such as stopping a running job, pausing a job, blocking a job, scheduling a job, stopping a job normally, and stopping a job abnormally. It maintains multiple job scheduling policies, such as repeated jobs, timed jobs, Cron jobs, and one-off jobs. It internally maintains a history storage module that records all scheduling history. Read-write locks are added to operations on the cache layer and the metadata persistence layer to implement concurrency control: if a concurrent thread performs a write operation, the lock is upgraded to an exclusive lock and other threads cannot hold it; conversely, if a concurrent thread performs a read operation, the lock is upgraded to a shared lock and other threads can hold it at the same time. The job management module adds a cache layer on top of the job metadata storage layer to serve frequent metadata query calls and to support the frequent access and frequent scheduling of quasi-real-time data integration tasks. At the implementation level the cache layer is abstracted as an SPI interface and supports cache implementations such as Caffeine, JDK, Guava, and Redis. The data persistence layer is likewise abstracted as an SPI interface at the implementation level and supports relational databases and databases such as MongoDB. Because writing job data to the persistence layer can leave data inconsistent owing to unstable factors such as the network, at the implementation level the job management module uses a "fallback" approach to keep metadata consistent as far as possible: (1) fault-tolerant storage is used: updated data is written to a local database or file and then written to the storage layer once the network has recovered; (2) a circuit breaker is used: when the fault-tolerance mechanism has been triggered and a certain threshold is reached, the circuit breaker opens, the service is degraded, and no new tasks are scheduled; (3) the job-state interface of the job running system is wrapped; each time a job scheduling node starts, the running state of the jobs is obtained and audited, so that the job states in the job running system are consistent with the states in the metadata storage layer.
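The job state machine can be sketched as an enumeration plus a transition table. The seven states come from the description above; the allowed transitions in `ALLOWED` are an assumption made for illustration, since the text lists the states and operation interfaces but does not spell out which transitions are legal (Fig. 5 is not reproduced here).

```python
from enum import Enum


class JobState(Enum):
    NOT_STARTED = "not started"
    PENDING = "awaiting scheduling"
    PAUSED = "paused"
    RUNNING = "running"
    STOPPING = "stopping"
    ABNORMAL = "abnormal"
    COMPLETED = "completed"


# Hypothetical transition table: each state maps to the states it may move to.
ALLOWED = {
    JobState.NOT_STARTED: {JobState.PENDING},
    JobState.PENDING: {JobState.RUNNING, JobState.PAUSED, JobState.STOPPING},
    JobState.PAUSED: {JobState.PENDING, JobState.STOPPING},
    JobState.RUNNING: {
        JobState.PENDING,      # e.g. a repeated or Cron job waiting for its next run
        JobState.STOPPING,
        JobState.ABNORMAL,
        JobState.COMPLETED,
    },
    JobState.STOPPING: {JobState.NOT_STARTED, JobState.ABNORMAL},
    JobState.ABNORMAL: {JobState.PENDING},
    JobState.COMPLETED: {JobState.PENDING},
}


def transition(current, target):
    """Return the new state, or raise ValueError for an illegal transition."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Centralising the table in one place makes it straightforward to reject inconsistent status feedback before it reaches the cache or the persistence layer.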
S102: as shown in Fig. 6, the job preloading module obtains pending jobs from the job management module and determines their scheduling priority order. The job preloading module consists of a real-time query unit, a job preloading unit, and a fault handling unit. The specific operations are as follows:
(1) the real-time query unit S102-1 queries the job metadata cache in real time to obtain unlocked jobs awaiting scheduling;
(2) the job preloading unit S102-2 adds the unlocked jobs awaiting scheduling to a bounded ordered queue, sorted by scheduled time and job priority;
(3) the fault handling unit S102-3 receives fault information from the job running cluster and performs fault-tolerance handling.
Specifically, a bounded, ordered blocking queue is constructed to load all jobs awaiting scheduling. Bounded means that the number of jobs has an upper limit, a parameter that can be set by evaluating the scenario. Ordered means that jobs with earlier trigger times and higher priorities are placed nearer the front of the queue and scheduled first. When a job is stopped or deleted, removing the specified job awaiting scheduling from the queue is supported. A producer-consumer model is used, and thread blocking and wake-up reduce the CPU load.
Fault-tolerance handling means the following: when a worker node goes down or loses its network connection for a long time while a job is running, the job running device notifies the job preloading module; the job preloading module tries to connect to the worker node to determine whether it is unreachable; if the node is confirmed unreachable, the job is put straight back into the queue, and the lost job will eventually be scheduled onto another available node. If the lost node then recovers, the job running device kills the job's process on that node directly, which guarantees that the same job never runs on two nodes at the same time under one job running device.
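The fault-tolerance flow above can be sketched with two small functions. The names `handle_node_lost` and `stale_processes`, the dictionary that tracks which node owns each job, and the injected `is_reachable` probe are all hypothetical; a real deployment would probe the node over the network and kill actual processes.

```python
def handle_node_lost(node, running, queue, is_reachable):
    """Requeue the jobs of a lost node once it is confirmed unreachable.

    running: job_id -> node name currently responsible for the job
    queue:   list of job ids awaiting scheduling (stands in for the bounded queue)
    Returns the list of job ids that were put back into the queue.
    """
    if is_reachable(node):
        return []                          # false alarm: leave the jobs where they are
    lost = [job_id for job_id, owner in running.items() if owner == node]
    for job_id in lost:
        del running[job_id]
        queue.append(job_id)               # will be scheduled onto another node
    return lost


def stale_processes(node, processes_on_node, running):
    """When a lost node recovers, list the job processes that must be killed.

    A process is stale if the job is no longer assigned to this node, which
    prevents the same job from running on two nodes at once.
    """
    return [job_id for job_id in processes_on_node if running.get(job_id) != node]
```

Confirming unreachability before requeueing avoids needless duplicate runs, and the kill on recovery covers the case where the original node was only temporarily partitioned.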
S103: the resource scheduling module obtains the job preloading information and the computing resource information of the job running device, and completes resource allocation and scheduling distribution. The resource scheduling module consists of a resource acquisition unit, a resource allocation unit, and a scheduling distribution unit, as shown in Fig. 7:
(1) the resource acquisition unit S103-1 obtains the computing resources of the job running cluster and caches them in memory;
(2) the resource allocation unit S103-2 obtains all jobs from the bounded ordered queue and allocates computing resources to each job in job priority order;
(3) the scheduling distribution unit S103-3 specifies an executor for each scheduled job and sends the job meta-information, the allocated computing resources, and the executor configuration to the job running cluster.
The executor may be implemented as a Linux Container Executor or a Docker Executor, or as another executor; these container executors provide computing resource isolation.
S200: the job running device receives the scheduled tasks and starts job execution, feeds the status information of running jobs back to the job management module, feeds worker-node computing resources back to the resource scheduling module, and feeds loss-of-connection or fault information back to the job preloading module.
The job running device has a master node and worker nodes; the master node is responsible for management and coordination, and the worker nodes are responsible for executing data integration jobs. The master node receives the job meta-information, job resource allocation information, and job executor information distributed by the resource scheduling module, and starts job execution. The agent program on each worker node collects the status information of running jobs and sends it to the master node, which forwards it to the job management module. The agent program on each worker node collects the node's computing resources and sends them to the master node, which forwards them to the resource scheduling module. The agent program on each worker node sends heartbeat messages to the master node, and the master node sends loss-of-connection or fault information to the job preloading module. The executor on each worker node has a retry mechanism: if the source or target of a data flow goes down or becomes unreachable, the executor retries at timed intervals so that the job can continue to run normally once the data source and data target recover.
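The executor's timed retry can be sketched as follows. The function name `run_with_retry`, the fixed retry interval, the optional attempt cap, and the use of `ConnectionError` to signal an unreachable source or target are assumptions; the patent only states that retries happen at timed intervals.

```python
import time


def run_with_retry(step, retry_interval_s=5.0, max_attempts=None, sleep=time.sleep):
    """Run one data-transfer step, retrying while the source or target is down.

    step:          callable that raises ConnectionError when an endpoint is unreachable
    max_attempts:  None means keep retrying until the endpoint recovers
    sleep:         injectable so the wait can be replaced in tests
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return step()
        except ConnectionError:
            if max_attempts is not None and attempt >= max_attempts:
                raise                      # give up and let the failure be reported
            sleep(retry_interval_s)        # wait, then try the same step again
```

A fixed interval keeps the sketch simple; a backoff schedule could be substituted without changing the calling code.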
As shown in Figure 8, the job running device can be implemented on a Mesos cluster system for distributed system resource management. The master node provides low-latency local metadata management using RAM plus a WAL (write-ahead log), and uses the Paxos algorithm to keep the job states of a large number of working nodes synchronized. Based on its unified management interface for cluster physical resources and on specific resource sharing, it pushes specific physical resources to the job scheduling system. The job running cluster provides two mechanisms, multi-language driver packages and JSON RPC, through which the job scheduling system registers and receives specific callback events. The agent program is responsible for collecting working-node resources, running specific scheduling tasks through the executor, and returning the executor's results and task status to the master node, which then forwards them to the job scheduling device.
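The "RAM plus WAL" approach to local metadata management can be illustrated with a minimal Python sketch: state is served from memory, and every update is appended to a log file before it is applied, so that the state can be rebuilt after a restart. The class name `WalStore`, the JSON-lines log format, and the single-file layout are assumptions for this sketch and do not describe how Mesos itself stores its state.

```python
import json
import os
from typing import Dict


class WalStore:
    """In-memory key-value store backed by an append-only log file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.data: Dict[str, str] = {}
        # Replay the log so that state survives a restart.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        rec = json.loads(line)
                        self.data[rec["k"]] = rec["v"]

    def put(self, key: str, value: str) -> None:
        # Write-ahead: persist the record before updating memory.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"k": key, "v": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def get(self, key: str) -> str:
        # Reads are served from RAM only, which keeps them low-latency.
        return self.data[key]
```

Replication of the log across master nodes (for example with Paxos, as the text mentions) is outside the scope of this sketch.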
The above describes specific embodiments of the present invention and the technical principles applied. Any change made according to the conception of the present invention whose resulting function does not go beyond the spirit covered by the specification and the drawings shall fall within the protection scope of the present invention.

Claims (11)

1. A distributed data integration job scheduling method, characterized by comprising the following steps:
(1) a job scheduling device issues data integration jobs to a job running device, wherein the job scheduling device comprises a job management module, a job preloading module, and a resource scheduling module:
(1.1) the job management module receives, caches, and stores job-related meta-information and performs concurrency control;
(1.2) the job preloading module obtains the jobs to be processed from the job management module and determines their scheduling priority order;
(1.3) the resource scheduling module completes resource allocation and scheduling distribution by obtaining the job preloading information and the computing resource information of the job running device;
(2) the job running device receives the scheduling task and starts job execution, feeds back the job running status information to the job management module, feeds back the working-node computing resources to the resource scheduling module, and feeds back loss-of-contact or fault information to the job preloading module.
2. The distributed data integration job scheduling method according to claim 1, characterized in that the job management module comprises an information receiving unit, an information caching unit, a persistent storage unit, and a concurrency control unit, wherein step (1.1) specifically operates as follows:
(i) the information receiving unit receives job submissions, job meta-information modifications, and scheduling policy updates; receives the information on jobs with unallocated resources fed back by the resource scheduling module and updates the job state; and receives the job status information fed back by the job running device and updates the job state;
(ii) the information caching unit caches job meta-information and status information locally to support frequent real-time queries;
(iii) the persistent storage unit persists the job metadata according to the cache contents and maintains data consistency between the cache layer and the storage layer;
(iv) the concurrency control unit assigns a read-write lock to the access of each job resource.
3. The distributed data integration job scheduling method according to claim 2, characterized in that the data consistency between the cache layer and the storage layer in step (iii) is maintained by the following methods:
(a) fault-tolerant storage is used: updated data is written to a local file and is written to the storage layer after the network returns to normal;
(b) a circuit breaker is used: when the fault-tolerance mechanism has been triggered and reaches a predetermined threshold, the circuit breaker opens, the service is degraded, and no new tasks are scheduled;
(c) the job state interface of the job running device is wrapped; when each job scheduling node starts, the job running states are obtained and audited to ensure that the job states in the job running device are consistent with the states in the metadata storage layer.
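Purely as an illustration of methods (a) and (b) of claim 3, the following Python sketch buffers updates locally when the storage layer is unreachable and opens a circuit breaker once the buffer reaches a threshold. The class name `MetadataWriter`, the in-memory list standing in for the local file, and the choice to count buffered updates against the threshold are assumptions; the claim does not say what quantity the threshold measures.

```python
from typing import Callable, List, Tuple


class MetadataWriter:
    """Writes updates to the storage layer, buffering locally on failure."""

    def __init__(self, store: Callable[[str, str], None], threshold: int) -> None:
        self.store = store          # raises ConnectionError when unreachable
        self.threshold = threshold  # buffered updates that open the breaker
        self.pending: List[Tuple[str, str]] = []  # stand-in for the local file
        self.open = False           # True = degraded, no new scheduling

    def write(self, key: str, value: str) -> None:
        if not self.pending:
            try:
                self.store(key, value)
                return
            except ConnectionError:
                pass
        # Storage unreachable, or older updates still queued: keep order locally.
        self.pending.append((key, value))
        if len(self.pending) >= self.threshold:
            self.open = True

    def can_schedule(self) -> bool:
        return not self.open

    def recover(self) -> None:
        """Replay buffered updates in order once the network is back."""
        while self.pending:
            key, value = self.pending[0]
            self.store(key, value)  # may raise again; the entry stays queued
            self.pending.pop(0)
        self.open = False
```

A scheduler built on this sketch would check `can_schedule()` before dispatching a new task and call `recover()` when connectivity to the storage layer returns.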
4. The distributed data integration job scheduling method according to claim 1, characterized in that the job preloading module comprises a real-time query unit, a job preloading unit, and a fault handling unit, wherein step (1.2) specifically operates as follows:
(I) the real-time query unit queries the job metadata cache in real time to obtain the unlocked jobs awaiting scheduling;
(II) the job preloading unit adds the unlocked jobs awaiting scheduling to a bounded ordered queue, sorted by job scheduling time and job priority;
(III) the fault handling unit receives fault information from the job running device and performs fault-tolerant handling; wherein the fault-tolerant handling is as follows: when a working node goes down or loses network contact for a long time while a job is running, the job running device notifies the job preloading module, and the job preloading module tries to connect to the working node to determine whether it is reachable; if the node is confirmed to be unreachable, the job is put directly back into the queue, and the lost job is eventually scheduled onto another available node; if the lost node subsequently recovers, the job running device directly kills the job's process on that node, to guarantee that the same job never runs on two nodes simultaneously under one job running device.
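The fault-tolerant handling of step (III) might be sketched as follows. This is a simplified, single-process illustration: the class `FailureHandler`, the `can_connect` probe, and the plain list standing in for the bounded ordered queue are hypothetical, and in the claimed method the process is killed by the job running device rather than by the preloading module.

```python
from typing import Callable, Dict, List, Set


class FailureHandler:
    def __init__(self, can_connect: Callable[[str], bool]) -> None:
        self.can_connect = can_connect
        self.requeued: List[str] = []         # stand-in for the bounded ordered queue
        self.stale: Dict[str, Set[str]] = {}  # node -> jobs rescheduled elsewhere

    def on_worker_lost(self, node: str, jobs: List[str]) -> None:
        """Called when the job running device reports a lost working node."""
        if self.can_connect(node):
            return  # node is reachable after all: leave its jobs in place
        self.requeued.extend(jobs)
        self.stale.setdefault(node, set()).update(jobs)

    def on_worker_recovered(self, node: str) -> List[str]:
        """Return the jobs whose leftover processes on `node` must be killed."""
        return sorted(self.stale.pop(node, set()))
```

Tracking the rescheduled jobs per node is what allows the leftover processes to be killed on recovery, so that one job does not run on two nodes at once.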
5. The distributed data integration job scheduling method according to claim 4, characterized in that the bounded ordered queue holds all jobs awaiting scheduling, wherein: bounded means that the number of jobs is guaranteed to have an upper limit, and the upper-limit parameter can be determined by assessing the scenario; ordered means that jobs with an earlier trigger time and a higher priority are placed toward the front of the queue and scheduled first; when a job is stopped or deleted, removing the specified job awaiting scheduling from the queue is supported; and a producer-consumer model with thread blocking and wake-up is used to reduce the CPU load.
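A bounded ordered queue with the properties listed in claim 5 could, for example, be sketched in Python with a sorted list guarded by a condition variable. The class name `BoundedOrderedQueue`, the tie-breaking rule (trigger time first, then priority), and the convention that a larger number means higher priority are assumptions of this sketch.

```python
import bisect
import threading
from typing import List, Tuple

Entry = Tuple[float, int, str]  # (trigger time, negated priority, job id)


class BoundedOrderedQueue:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: List[Entry] = []
        self.cond = threading.Condition()

    def put(self, job_id: str, trigger_time: float, priority: int) -> None:
        """Insert in sorted position; block while the queue is full."""
        with self.cond:
            while len(self.items) >= self.capacity:
                self.cond.wait()
            bisect.insort(self.items, (trigger_time, -priority, job_id))
            self.cond.notify_all()

    def get(self) -> str:
        """Remove and return the front job; block while the queue is empty."""
        with self.cond:
            while not self.items:
                self.cond.wait()
            job_id = self.items.pop(0)[2]
            self.cond.notify_all()
            return job_id

    def remove(self, job_id: str) -> bool:
        """Drop a job that was stopped or deleted before being scheduled."""
        with self.cond:
            for i, entry in enumerate(self.items):
                if entry[2] == job_id:
                    del self.items[i]
                    self.cond.notify_all()
                    return True
            return False
```

Blocked producers and consumers sleep inside `wait()` instead of polling, which is the thread block and wake-up behaviour that the claim associates with lower CPU load.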
6. The distributed data integration job scheduling method according to claim 1, characterized in that the resource scheduling module comprises a resource acquisition unit, a resource allocation unit, and a scheduling distribution unit, wherein step (1.3) specifically operates as follows:
1) the resource acquisition unit obtains the computing resources of the job running cluster and caches them in memory;
2) the resource allocation unit obtains all jobs from the bounded ordered queue and allocates computing resources to each job in order of job priority;
3) the scheduling distribution unit assigns an executor to each scheduled job and sends the job meta-information, the allocated computing resources, and the executor configuration to the job running cluster.
7. The distributed data integration job scheduling method according to claim 1, characterized in that in step (2) the job running device comprises a master node and working nodes, the master node being responsible for management and coordination and the working nodes being responsible for executing data integration jobs; the master node receives the job meta-information, job resource allocation information, and job executor information distributed by the resource scheduling module, and starts job execution; the agent program on each working node collects the job running status information and sends it to the master node, which forwards it to the job management module; the agent program on each working node collects the working node's computing resources and sends them to the master node, which forwards them to the resource scheduling module; the agent program on each working node sends heartbeat messages to the master node, and the master node sends loss-of-contact or fault information to the job preloading module; the executor on each working node has a retry mechanism: if the source or target of the data flow goes down or becomes unreachable, the executor retries at timed intervals so that the job can continue to run normally once the data source and data target recover.
8. The distributed data integration job scheduling method according to claim 7, characterized in that the job running device performs distributed system resource management based on a Mesos cluster system; the master node provides low-latency local metadata management using RAM plus a WAL (write-ahead log), uses the Paxos algorithm to keep the job states of a large number of working nodes synchronized, and, based on its unified management interface for cluster physical resources and on specific resource sharing, pushes specific physical resources to the job scheduling system; the job running cluster provides two mechanisms, multi-language driver packages and JSON RPC, through which the job scheduling system registers and receives specific callback events; the agent program is responsible for collecting working-node resources, running specific scheduling tasks through the executor, and returning the executor's results and task status to the master node, which then forwards them to the job scheduling device.
9. A distributed data integration job scheduling device, characterized by comprising a job scheduling device and a job running device; the job scheduling device and the job running device exchange information with each other; the job scheduling device comprises a job management module, a job preloading module, and a resource scheduling module; the job management module is used to receive, cache, and store job-related meta-information and to perform concurrency control; the job preloading module is used to obtain the jobs to be processed from the job management module and to determine their scheduling priority order; the resource scheduling module is used to complete resource allocation and scheduling distribution by obtaining the job preloading information and the computing resource information of the job running device; the job running device comprises a master node and working nodes, the master node being responsible for management and coordination and the working nodes being responsible for executing data integration jobs.
10. The distributed data integration job scheduling device according to claim 9, characterized in that the job scheduling device and the job running device are registered in ZooKeeper; the job scheduling device uses an active-standby mode, and once the active device goes down, ZooKeeper elects a standby device to take over job scheduling; the master node in the job running device uses an active-standby mode, and once the master node goes down, ZooKeeper elects a standby master node to take over management and coordination.
11. The distributed data integration job scheduling device according to claim 10, characterized in that the job scheduling device performs auditing and maintenance based on the job status information fed back by the job running device, and the job metadata store maintains consistency based on the job metadata cache of the job scheduling device; when the job scheduling device fails, the standby job scheduling device, as soon as it takes over, interacts with the job metadata store to rebuild the job metadata cache, and audits and maintains the metadata cache information by receiving job state feedback from the job running device, so that the information remains consistent across the distributed system.
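The takeover procedure of claim 11 (rebuild the cache from the job metadata store, then audit it against feedback from the job running device) can be illustrated with a short Python sketch. The function name `rebuild_and_audit`, the dictionary representation of job states, and the rule that states reported by the job running device override cached ones are assumptions of this sketch.

```python
from typing import Dict, List, Tuple


def rebuild_and_audit(metadata_db: Dict[str, str],
                      runtime_states: Dict[str, str]
                      ) -> Tuple[Dict[str, str], List[str]]:
    """Rebuild the cache from the metadata store, then reconcile it.

    Returns (rebuilt cache, ids of jobs whose cached state was corrected).
    """
    cache = dict(metadata_db)  # step 1: reload the persisted metadata
    corrected: List[str] = []
    # Step 2: audit against the states reported by the job running device.
    for job_id, state in runtime_states.items():
        if job_id in cache and cache[job_id] != state:
            cache[job_id] = state  # assumed rule: runtime feedback wins
            corrected.append(job_id)
    return cache, sorted(corrected)
```

Jobs reported by the job running device but unknown to the metadata store are ignored here; how such jobs should be treated is not specified in the claim.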
CN201910489422.2A 2019-06-06 2019-06-06 Distributed data integration job scheduling method and device Active CN110362390B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910489422.2A CN110362390B (en) 2019-06-06 2019-06-06 Distributed data integration job scheduling method and device

Publications (2)

Publication Number Publication Date
CN110362390A true CN110362390A (en) 2019-10-22
CN110362390B CN110362390B (en) 2021-09-07

Family

ID=68215696

Country Status (1)

Country Link
CN (1) CN110362390B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province

Patentee after: Yinjiang Technology Co.,Ltd.

Address before: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province

Patentee before: ENJOYOR Co.,Ltd.