CN110362390A

CN110362390A - Distributed data integration job scheduling method and device

Info

Publication number: CN110362390A
Application number: CN201910489422.2A
Authority: CN
Inventors: 李建元; 刘飞黄; 王超群; 刘兴田; 贾建涛; 温晓岳
Original assignee: Enjoyor Co Ltd
Current assignee: Yinjiang Technology Co.,Ltd.
Priority date: 2019-06-06
Filing date: 2019-06-06
Publication date: 2019-10-22
Anticipated expiration: 2039-06-06
Also published as: CN110362390B

Abstract

The present invention relates to a distributed data integration job scheduling method and device. To address the special scenarios that data integration jobs may face, a job scheduling device issues data integration jobs to a job running device. The job running device receives the scheduled tasks and starts job execution, feeds back job running status information to the job management module, feeds back worker node computing resources to the resource scheduling module, and feeds back loss-of-contact or fault information to the job preloading module. The invention has the following combined characteristics: (1) high availability and fault tolerance, with weak consistency; (2) low latency for near-real-time job scheduling; (3) multi-tenant concurrency control for cloud-service-oriented applications; (4) computing resource isolation and parallel scheduling of multiple jobs; (5) a priority scheduling mechanism.

Description

Distributed data integration job scheduling method and device

Technical field

The present invention relates to the field of basic big data technology, and in particular to a distributed data integration job scheduling method and device.

Background art

With the evolution of the digital economy, the digitization of services in many industries has developed fully, and building services on digital data is gradually becoming the new focus. However, service digitization has produced a large number of data islands, which have become a common pain point in realizing digital services. Every industry urgently needs data integration to break through and avoid data silos and to consolidate and govern data resources, so as to effectively unlock the value of the associations between data.

Data integration often involves scheduling thousands of jobs of various types, including data exchange and data preprocessing, so the design of the scheduling system must account for many complex scenarios. For example: in some scenarios the jobs are numerous and must be processed in parallel; some scenarios require near-real-time operation and call for priority scheduling; some jobs occupy large amounts of computing resources or hold them for long periods, so resource isolation is needed to avoid affecting other jobs; some jobs may encounter failures such as a data source or target going down, a network interruption, or a running node going down, so a fault-tolerance mechanism is needed; and multi-tenant scenarios require concurrency control.

The prior art cannot meet the demands of complex data integration job scheduling. For example, in the traditional LTS scheduling system, jobs are isolated at thread level: if the execution thread of one job exhausts all the memory of the current process, every job in that process fails. LTS therefore lacks the ability to schedule data integration jobs and is better suited to lightweight tasks. Chinese invention patent CN201610800080 discloses a distributed task scheduling system and method that addresses the large amount of code and the heavy developer workload involved in developing parallel computing programs; in essence it schedules a single large-scale distributed computing job and does not consider parallel scheduling of multiple tasks. Chinese invention patent CN201610197298 discloses a task scheduling method, device and system that proposes a multi-channel, multi-task distributed scheduling method and solves the starvation of other jobs caused by a single task being scheduled for too long, but it does not consider low-latency access to job metadata or how to ensure job metadata consistency. Chinese invention patent CN201410748604 discloses a distributed task scheduling system and method that guarantees the reliability of the system itself, supports independent or associated tasks, and supports task rollback. It is not suited to complex data integration scenarios, however: in these scenarios task rollback is not the focus, and data integration tasks do not need to be associated with one another. Nor does it consider that downtime is costly for large-scale, complex data integration tasks, so that high availability is required, or how to minimize latency and guarantee job metadata consistency when near-real-time tasks are present.

Summary of the invention

To overcome the above shortcomings, the object of the present invention is to provide a distributed data integration job scheduling method and device. To address the special scenarios that data integration jobs may face, a job scheduling device issues data integration jobs to a job running device. The job running device receives the scheduled tasks and starts job execution, feeds back job running status information to the job management module, feeds back worker node computing resources to the resource scheduling module, and feeds back loss-of-contact or fault information to the job preloading module. The invention has the following combined characteristics: (1) high availability and fault tolerance, with weak consistency; (2) low latency for near-real-time job scheduling; (3) multi-tenant concurrency control for cloud-service-oriented applications; (4) computing resource isolation and parallel scheduling of multiple jobs; (5) a priority scheduling mechanism.

The present invention achieves the above object through the following technical solution: a distributed data integration job scheduling method, comprising the following steps:

(1) a job scheduling device issues data integration jobs to a job running device, wherein the job scheduling device comprises a job management module, a job preloading module, and a resource scheduling module:

(1.1) the job management module receives, caches, and stores job-related meta-information and performs concurrency control;

(1.2) the job preloading module obtains pending jobs from the job management module and determines their scheduling priority order;

(1.3) the resource scheduling module completes resource allocation and scheduling distribution by obtaining the job preloading information and the computing resource information of the job running device;

(2) the job running device receives the scheduled tasks and starts job execution, feeds back job running status information to the job management module, feeds back worker node computing resources to the resource scheduling module, and feeds back loss-of-contact or fault information to the job preloading module.

Preferably, the job management module comprises an information receiving unit, an information caching unit, a persistent storage unit, and a concurrency control unit, which operate as follows:

(i) the information receiving unit receives job submissions, job meta-information modifications, and scheduling policy updates; receives from the resource scheduling module information on jobs that were not allocated resources, and updates the job state; and receives the job status information fed back by the job running device, and updates the job state;

(ii) the information caching unit caches job meta-information and status information locally to support frequent real-time queries;

(iii) the persistent storage unit persists job metadata according to the cache contents and maintains data consistency between the cache layer and the storage layer;

(iv) the concurrency control unit assigns a read-write lock to each access to a job resource.
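As a non-limiting illustration of unit (iv), the following Python sketch keeps one read-write lock per job, so that several readers may hold a job's lock together while a writer holds it exclusively. The class names `ReadWriteLock` and `JobLockTable` and the `timeout` parameter are hypothetical and do not appear in the specification; a distributed deployment would need a distributed lock service rather than this in-process version.

```python
import threading


class ReadWriteLock:
    """Minimal in-process read-write lock (hypothetical helper).

    Any number of readers may hold the lock together; a writer needs it alone.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self, timeout=None):
        with self._cond:
            # Readers wait only while a writer holds the lock.
            ok = self._cond.wait_for(lambda: not self._writer, timeout)
            if ok:
                self._readers += 1
            return ok

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self, timeout=None):
        with self._cond:
            # A writer waits until there is no writer and no reader.
            ok = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout)
            if ok:
                self._writer = True
            return ok

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class JobLockTable:
    """Hands out one ReadWriteLock per job id, created on first use."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, job_id):
        with self._guard:
            if job_id not in self._locks:
                self._locks[job_id] = ReadWriteLock()
            return self._locks[job_id]
```

Keying the locks by job id means that tenants working on different jobs never contend with one another; only concurrent access to the same job resource is serialized.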

Preferably, in step (iii), data consistency between the cache layer and the storage layer is maintained as follows:

(a) fault-tolerant storage is used: updated data is written to a local file and is written to the storage layer once the network has recovered;

(b) a circuit breaker is used: when the fault-tolerance mechanism has been triggered and a predetermined threshold is reached, the circuit breaker opens, the service is degraded, and no new tasks are scheduled;

(c) the job state interface of the job running device is wrapped; each time a job scheduling node starts, it obtains the job running states and audits them, ensuring that the job states in the job running device are consistent with the states in the metadata storage layer.
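Measures (a) and (b) can be sketched together as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `remote_write` is a hypothetical callable standing in for the storage layer, `StoreUnavailable` is a hypothetical error type, the spill file uses one JSON record per line, and the threshold counts how many times the fallback has been triggered.

```python
import json
import os


class StoreUnavailable(Exception):
    """Hypothetical error raised when the storage layer cannot be reached."""


class FaultTolerantWriter:
    """Writes to the storage layer and spills to a local file on failure."""

    def __init__(self, remote_write, spill_path, failure_threshold=3):
        self._remote_write = remote_write      # assumed storage-layer call
        self._spill_path = spill_path          # local fallback file
        self._threshold = failure_threshold    # assumed breaker threshold
        self._failures = 0

    @property
    def breaker_open(self):
        # While the breaker is open the scheduler should dispatch no new jobs.
        return self._failures >= self._threshold

    def write(self, record):
        """Return True if stored remotely, False if spilled to the local file."""
        try:
            self._remote_write(record)
        except StoreUnavailable:
            self._failures += 1
            with open(self._spill_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(record) + "\n")
            return False
        return True

    def replay(self):
        """After recovery, push spilled records to the storage layer in order.

        Assumes remote writes are idempotent, because a failure part-way
        through leaves the spill file in place for a later full replay.
        """
        if not os.path.exists(self._spill_path):
            return 0
        with open(self._spill_path, encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
        for record in records:
            self._remote_write(record)
        os.remove(self._spill_path)
        self._failures = 0                     # close the breaker again
        return len(records)
```

A scheduler loop would check `breaker_open` before dispatching each new job and call `replay()` once connectivity to the storage layer is restored.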

Preferably, the job preloading module comprises a real-time query unit, a job preloading unit, and a fault handling unit, which operate as follows:

(I) the real-time query unit queries the job metadata cache in real time to obtain unlocked jobs awaiting scheduling;

(II) the job preloading unit adds the unlocked jobs awaiting scheduling to a bounded ordered queue, sorted by job scheduling time and job priority;

(III) the fault handling unit receives fault information from the job running device and performs fault-tolerance handling. Fault-tolerance handling applies when a worker node goes down or loses network contact for a long time while a job is running: the job running device notifies the job preloading module, which tries to connect to the worker node to determine whether it is unreachable. If the node is confirmed to be unreachable, the job is put directly back into the queue, and the lost job is eventually scheduled onto another available node. If the lost node then recovers, the job running device directly kills the job process on it, ensuring that the same job never runs on two nodes at the same time under one job running device.

Preferably, a bounded, ordered blocking queue is used to load all jobs awaiting scheduling. Bounded means that the number of jobs has an upper limit, whose value can be set by assessing the scenario. Ordered means that jobs with an earlier trigger time and a higher priority are placed toward the front of the queue and scheduled first. When a job is stopped or deleted, the specified job awaiting scheduling can be removed from the queue. A producer-consumer model is used, and the thread block/wake-up mechanism reduces the CPU load.
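The queue described above can be sketched in Python as follows. The class name `BoundedOrderedQueue`, its method names, and the tie-breaking rule (trigger time first, then priority, then arrival order) are assumptions made for illustration; the specification states only that earlier trigger times and higher priorities go to the front.

```python
import heapq
import threading


class BoundedOrderedQueue:
    """Bounded, ordered blocking queue of jobs awaiting scheduling (sketch)."""

    def __init__(self, capacity):
        self._capacity = capacity      # upper limit on queued jobs
        self._heap = []                # (trigger_time, -priority, seq, job_id)
        self._seq = 0                  # arrival counter, breaks ties
        self._cond = threading.Condition()

    def put(self, job_id, trigger_time, priority, timeout=None):
        """Producer side: blocks while the queue is full."""
        with self._cond:
            has_room = self._cond.wait_for(
                lambda: len(self._heap) < self._capacity, timeout)
            if not has_room:
                return False
            heapq.heappush(
                self._heap, (trigger_time, -priority, self._seq, job_id))
            self._seq += 1
            self._cond.notify_all()    # wake any blocked consumer
            return True

    def take(self, timeout=None):
        """Consumer side: blocks while the queue is empty."""
        with self._cond:
            if not self._cond.wait_for(lambda: len(self._heap) > 0, timeout):
                return None
            job_id = heapq.heappop(self._heap)[3]
            self._cond.notify_all()    # wake any blocked producer
            return job_id

    def remove(self, job_id):
        """Drop a specific job, for example when it is stopped or deleted."""
        with self._cond:
            kept = [entry for entry in self._heap if entry[3] != job_id]
            found = len(kept) != len(self._heap)
            self._heap = kept
            heapq.heapify(self._heap)
            self._cond.notify_all()
            return found
```

Blocking on a condition variable, rather than polling, is what lets idle producer and consumer threads sleep until they are woken, which is the CPU saving that the block/wake-up mechanism refers to.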

Preferably, the resource scheduling module comprises a resource acquisition unit, a resource allocation unit, and a scheduling distribution unit, which operate as follows:

1) the resource acquisition unit obtains the computing resources of the job running cluster and caches them in memory;

2) the resource allocation unit obtains all jobs from the bounded ordered queue and allocates computing resources to each job in order of job priority;

3) the scheduling distribution unit designates an executor for each scheduled job and sends the job meta-information, the allocated computing resources, and the executor configuration to the job running cluster.
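Step 2) can be illustrated with a simple greedy allocator. The specification does not prescribe an allocation algorithm, so the first-fit strategy, the `Job` fields, and the resource keys `cpus` and `mem_mb` below are all assumptions chosen for the sketch. Jobs that cannot be placed are returned separately, mirroring the unallocated-resource feedback that the resource scheduling module sends to the job management module.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    priority: int     # larger value means higher priority (assumption)
    cpus: float
    mem_mb: int


def allocate(jobs, free):
    """First-fit allocation in descending priority order.

    `free` maps node name -> {"cpus": ..., "mem_mb": ...} and is not modified.
    Returns (assignments, unallocated_job_ids).
    """
    remaining = {node: dict(res) for node, res in free.items()}
    assignments, unallocated = {}, []
    for job in sorted(jobs, key=lambda j: -j.priority):
        for node, res in remaining.items():
            if res["cpus"] >= job.cpus and res["mem_mb"] >= job.mem_mb:
                res["cpus"] -= job.cpus
                res["mem_mb"] -= job.mem_mb
                assignments[job.job_id] = node
                break
        else:
            # No node had enough free resources for this job.
            unallocated.append(job.job_id)
    return assignments, unallocated
```

For example, with two nodes offering (2 CPUs, 2048 MB) and (1 CPU, 1024 MB), a high-priority job needing the whole first node and a medium-priority job needing the second would both be placed, and a low-priority job would be reported as unallocated.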

Preferably, in step (2), the job running device comprises a master node and worker nodes; the master node is responsible for management and coordination, and the worker nodes are responsible for executing data integration jobs. The master node receives the job meta-information, job resource allocation information, and job executor information distributed by the resource scheduling module, and starts job execution. The agent program on each worker node collects job running status information and sends it to the master node, which forwards it to the job management module. The agent program also collects the worker node's computing resources and sends them to the master node, which forwards them to the resource scheduling module. The agent program sends heartbeat messages to the master node, and the master node sends loss-of-contact or fault information to the job preloading module. The executor on each worker node has a retry mechanism: if the source or target of the data flow goes down or loses contact, the executor retries at timed intervals, so that the job can continue to run normally once the data source and data target recover.
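The executor's timed retry can be sketched as below. The function name `run_with_retry`, the `EndpointDown` exception, the fixed retry interval, and the optional attempt cap are hypothetical; the specification says only that retries are timed and continue until the data source and target recover.

```python
import time


class EndpointDown(Exception):
    """Hypothetical error: the data source or target is down or unreachable."""


def run_with_retry(step, interval_s=5.0, max_attempts=None, sleep=time.sleep):
    """Run `step`, retrying at a fixed interval while an endpoint is down.

    With max_attempts=None the call retries until `step` succeeds.
    The `sleep` parameter exists so callers can substitute their own wait.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return step()
        except EndpointDown:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(interval_s)
```

Only the endpoint-unavailable error is retried here; any other exception propagates immediately, on the assumption that it signals a job defect rather than a transient outage.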

Preferably, the job running device performs distributed system resource management based on a Mesos cluster. The master node provides low-latency local metadata management using RAM plus a write-ahead log (WAL), uses the Paxos algorithm to keep the job states of a large number of worker nodes synchronized, and pushes specific physical resources to the job scheduling system through its unified cluster physical-resource management interface and a specific resource-sharing mechanism. The job running cluster offers two ways, multi-language driver packages and JSON RPC, for the job scheduling system to register and receive specific callback events. The agent program is responsible for collecting worker node resources, runs the specific scheduled tasks through the executor, and returns the executor's results and task states to the master node, which forwards them to the job scheduling device.

A distributed data integration job scheduling device, comprising a job scheduling device and a job running device, which exchange information with each other. The job scheduling device comprises a job management module, a job preloading module, and a resource scheduling module. The job management module is used to receive, cache, and store job-related meta-information and to perform concurrency control. The job preloading module is used to obtain pending jobs from the job management module and determine their scheduling priority order. The resource scheduling module is used to complete resource allocation and scheduling distribution by obtaining the job preloading information and the computing resource information of the job running device. The job running device comprises a master node and worker nodes; the master node is responsible for management and coordination, and the worker nodes are responsible for executing data integration jobs.

Preferably, the job scheduling device and the job running device are registered in ZooKeeper. The job scheduling device runs in active-standby mode: if the active device goes down, ZooKeeper elects a standby device to take over job scheduling. The master node in the job running device also runs in active-standby mode: if the master node goes down, ZooKeeper elects a standby master node to take over management and coordination.

Preferably, the job scheduling device audits and maintains job states based on the job status information fed back by the job running device, and the job metadata database maintains consistency based on the job metadata cache of the job scheduling device. When the job scheduling device fails, the standby job scheduling device, once it takes over, must interact with the job metadata database to rebuild the job metadata cache, and then audits and maintains the cached metadata using the job state feedback received from the job running device, so that the metadata stays consistent across the distributed system.

The beneficial effects of the present invention are: (1) with reference to the CAP theorem, the method of the present invention satisfies the two criteria of high availability and fault tolerance, and adopts mechanisms that preserve job metadata consistency as far as possible; (2) multi-tenant concurrency control is implemented with distributed read-write locks, which supports providing multi-tenant data integration services as a cloud service; (3) because data integration jobs must be scheduled frequently, job metadata is cached, which effectively reduces the latency and the interruption risk caused by frequent metadata access.

Brief description of the drawings

Fig. 1 is a schematic flow diagram of the device of the present invention;

Fig. 2 is a schematic diagram of the high-availability mechanism of the device of the present invention;

Fig. 3 is a schematic flow diagram of the method of the present invention;

Fig. 4 is a schematic flow diagram of the job management module of the present invention;

Fig. 5 is a schematic diagram of the operation of the job management module of the present invention;

Fig. 6 is a schematic flow diagram of the job preloading module of the present invention;

Fig. 7 is a schematic flow diagram of the resource scheduling module of the present invention;

Fig. 8 is a schematic diagram of the running process of the job running device of the present invention.

Specific embodiment

The present invention is further described below with reference to specific embodiments, but the scope of protection of the present invention is not limited thereto:

Embodiment: as shown in Fig. 1, a distributed data integration job scheduling device consists of a job scheduling device and a job running device, which exchange information with each other. The job scheduling device comprises a job management module, a job preloading module, and a resource scheduling module. The job management module is used to receive, cache, and store job-related meta-information and to perform concurrency control. The job preloading module is used to obtain pending jobs from the job management module and determine their scheduling priority order. The resource scheduling module is used to complete resource allocation and scheduling distribution by obtaining the job preloading information and the computing resource information of the job running device. The job running device comprises a master node and worker nodes; the master node is responsible for management and coordination, and the worker nodes are responsible for executing data integration jobs.

As shown in Fig. 2, the job scheduling device and the job running device are registered in ZooKeeper. The job scheduling device runs in active-standby mode: if the active device goes down, ZooKeeper elects a standby device to take over job scheduling. The master node in the job running device also runs in active-standby mode: if the master node goes down, ZooKeeper elects a standby master node to take over management and coordination.

The job scheduling device audits and maintains job states based on the job status information fed back by the job running device, and the job metadata database maintains consistency based on the job metadata cache of the job scheduling device. When the job scheduling device fails, the standby job scheduling device, once it takes over, must interact with the job metadata database to rebuild the job metadata cache, and then audits and maintains the cached metadata using the job state feedback received from the job running device, so that the metadata stays consistent across the distributed system. Weak consistency is thereby ensured.
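The takeover sequence can be pictured with the following simplified sketch, in which plain dictionaries stand in for the job metadata database, the rebuilt cache, and the state feedback from the job running device. The function names and the rule that a reported state overrides a cached one are assumptions drawn from the description of the audit, not a definitive implementation.

```python
def rebuild_cache(metadata_db):
    """On takeover, start the new cache as a copy of the persisted job states."""
    return dict(metadata_db)


def audit(cache, reported):
    """Reconcile cached states with those reported by the job running device.

    The reported state is treated as authoritative (assumption).
    Returns the job ids whose cached state was corrected or added.
    """
    corrected = []
    for job_id, state in reported.items():
        if cache.get(job_id) != state:
            cache[job_id] = state
            corrected.append(job_id)
    return corrected
```

The corrected entries would then be persisted back to the job metadata database, which is how the database follows the cache in this design. Between a failure and the completion of the audit the two may briefly disagree, which is why the resulting guarantee is weak rather than strong consistency.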

As shown in Fig. 3, a distributed data integration job scheduling method comprises the following steps:

S100: the job scheduling device issues data integration jobs to the job running device. The job scheduling device consists of three parts, a job management module, a job preloading module, and a resource scheduling module, as follows:

S101: the job management module receives, caches, and stores job-related meta-information and performs concurrency control. The job management module consists of three parts: an information receiving unit, a storage processing unit, and a concurrency control unit. As shown in Fig. 4, the operations are as follows:

(1) the information receiving unit S101-1 is responsible for receiving job submissions, job meta-information modifications, and scheduling policy updates; for receiving from the resource scheduling module information on jobs that were not allocated resources, and updating the job state; and for receiving the job status information fed back by the job running device, and updating the job state;

(2) the information caching unit S101-2 is responsible for caching job meta-information and status information locally to support frequent real-time queries;

(3) the persistent storage unit S101-3 persists job metadata according to the cache contents and maintains data consistency between the cache layer and the storage layer;

(4) the concurrency control unit S101-4 is responsible for assigning a read-write lock to each access to a job resource.

Specifically, as shown in Fig. 5, the job management module of the present invention is responsible for receiving jobs and maintaining the job state machine. The main job states may include: not started, awaiting scheduling, paused, running, stopping, abnormal, and completed. The job management module provides job state operation interfaces, such as stopping a running job, pausing a job, blocking a job, scheduling a job, stopping a job normally, and terminating a job abnormally. It maintains multiple job scheduling policies, such as repeated jobs, timed jobs, Cron jobs, and one-off jobs. It maintains an internal history storage module that records all scheduling history. Read-write locks are applied to operations on the cache layer and the metadata persistence layer to implement concurrency control: if a concurrent thread performs a write operation, the lock is upgraded to an exclusive lock and no other thread can hold it; conversely, if the concurrent threads perform read operations, the lock is upgraded to a shared lock and other threads can hold it at the same time. The job management module adds a cache layer on top of the job metadata storage layer to serve frequent metadata queries and to support the frequent access and frequent scheduling of near-real-time data integration tasks. At the implementation level the cache layer is abstracted as an SPI interface and supports cache implementations such as Caffeine, the JDK, Guava, and Redis. The data persistence layer is likewise abstracted as an SPI interface and supports relational databases and databases such as MongoDB. Because unstable factors such as the network may cause data inconsistency when job data is written to the data persistence layer, the job management module uses a "fallback" approach at the implementation level to keep the metadata as consistent as possible: (1) fault-tolerant storage is used: updated data is written to a local database or file and is written to the storage layer once the network has recovered; (2) a circuit breaker is used: when the fault-tolerance mechanism has been triggered and a certain threshold is reached, the circuit breaker opens, the service is degraded, and no new tasks are scheduled; (3) the job state interface of the job running system is wrapped; each time a job scheduling node starts, it obtains the job running states and audits them, ensuring that the job states in the job running system are consistent with the states in the metadata storage layer.

S102: as shown in Fig. 6, the job preloading module obtains pending jobs from the job management module and determines their scheduling priority order. The job preloading module consists of a real-time query unit, a job preloading unit, and a fault handling unit, which operate as follows:

(1) the real-time query unit S102-1 is responsible for querying the job metadata cache in real time to obtain unlocked jobs awaiting scheduling;

(2) the job preloading unit S102-2 is responsible for adding the unlocked jobs awaiting scheduling to a bounded ordered queue, sorted by job scheduling time and job priority;

(3) the fault handling unit S102-3 is responsible for receiving fault information from the job running cluster and performing fault-tolerance handling.

Specifically, a bounded, ordered blocking queue is constructed to load all jobs awaiting scheduling. Bounded means that the number of jobs has an upper limit, whose value can be set by assessing the scenario. Ordered means that jobs with an earlier trigger time and a higher priority are placed toward the front of the queue and scheduled first. When a job is stopped or deleted, the specified job awaiting scheduling can be removed from the queue. A producer-consumer model is used, and the thread block/wake-up mechanism reduces the CPU load.

Fault-tolerance handling applies when a worker node goes down or loses network contact for a long time while a job is running. The job running device notifies the job preloading module, which tries to connect to the worker node to determine whether it is unreachable. If the node is confirmed to be unreachable, the job is put directly back into the queue, and the lost job is eventually scheduled onto another available node. If the lost node then recovers, the job running device directly kills the job process on it, ensuring that the same job never runs on two nodes at the same time under one job running device.
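The handling sequence just described can be sketched as follows. In the specification the job preloading module requeues the job and the job running device kills the stale process; this sketch folds both into one hypothetical `FaultHandler` class and models the connection probe, the queue, and the kill action as injected callables, purely to show the order of decisions.

```python
class FaultHandler:
    """Requeues jobs from unreachable nodes and removes stale duplicates."""

    def __init__(self, can_connect, requeue, kill):
        self._can_connect = can_connect   # probe: node -> bool (assumed)
        self._requeue = requeue           # puts a job id back in the queue
        self._kill = kill                 # kill(node, job_id) on a recovered node
        self._moved = {}                  # node -> job ids rescheduled elsewhere

    def on_node_lost(self, node, job_ids):
        """Called when the job running device reports a lost node."""
        if self._can_connect(node):
            return []                     # node is reachable after all
        for job_id in job_ids:
            self._requeue(job_id)
        self._moved.setdefault(node, set()).update(job_ids)
        return list(job_ids)

    def on_node_recovered(self, node):
        """Kill leftover processes so no job runs on two nodes at once."""
        stale = sorted(self._moved.pop(node, set()))
        for job_id in stale:
            self._kill(node, job_id)
        return stale
```

Confirming unreachability before requeuing guards against rescheduling on a false alarm, and killing the stale process on recovery is what upholds the rule that a job runs on only one node at a time.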

S103: the resource scheduling module completes resource allocation and scheduling distribution by obtaining the job preloading information and the computing resource information of the job running device. The resource scheduling module consists of a resource acquisition unit, a resource allocation unit, and a scheduling distribution unit, as shown in Fig. 7:

(1) the resource acquisition unit S103-1 is responsible for obtaining the computing resources of the job running cluster and caching them in memory;

(2) the resource allocation unit S103-2 is responsible for obtaining all jobs from the bounded ordered queue and allocating computing resources to each job in order of job priority;

(3) the scheduling distribution unit S103-3 is responsible for designating an executor for each scheduled job and sending the job meta-information, the allocated computing resources, and the executor configuration to the job running cluster.

The executor may be implemented as a Linux Container Executor or a Docker Executor, or as another executor; these container executors provide computing resource isolation.

S200: the job running device receives the scheduled tasks and starts job execution, feeds back job running status information to the job management module, feeds back worker node computing resources to the resource scheduling module, and feeds back loss-of-contact or fault information to the job preloading module.

The job running device has a master node and worker nodes; the master node is responsible for management and coordination, and the worker nodes are responsible for executing data integration jobs. The master node receives the job meta-information, job resource allocation information, and job executor information distributed by the resource scheduling module, and starts job execution. The agent program on each worker node collects job running status information and sends it to the master node, which forwards it to the job management module. The agent program also collects the worker node's computing resources and sends them to the master node, which forwards them to the resource scheduling module. The agent program sends heartbeat messages to the master node, and the master node sends loss-of-contact or fault information to the job preloading module. The executor on each worker node has a retry mechanism: if the source or target of the data flow goes down or loses contact, the executor retries at timed intervals, so that the job can continue to run normally once the data source and data target recover.
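Heartbeat-based loss detection on the master node can be illustrated with the sketch below. The class name `HeartbeatMonitor`, the timeout value, and the injectable clock are assumptions; the specification states only that agents send heartbeat messages and that the master node reports loss of contact.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per worker node and flags silent nodes."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout = timeout_s      # silence longer than this means lost
        self._clock = clock            # injectable time source
        self._last_seen = {}

    def beat(self, node):
        """Record a heartbeat message received from a node's agent program."""
        self._last_seen[node] = self._clock()

    def lost_nodes(self):
        """Return the nodes whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            node for node, seen in self._last_seen.items()
            if now - seen > self._timeout
        )
```

The master node would call `lost_nodes()` periodically and forward any result to the job preloading module as loss-of-contact information. A monotonic clock is used by default so that wall-clock adjustments cannot produce false timeouts.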

As shown in Fig. 8, the job running device may be implemented on a Mesos cluster for distributed system resource management. The master node provides low-latency local metadata management using RAM plus a write-ahead log (WAL), uses the Paxos algorithm to keep the job states of a large number of worker nodes synchronized, and pushes specific physical resources to the job scheduling system through its unified cluster physical-resource management interface and a specific resource-sharing mechanism. The job running cluster offers two ways, multi-language driver packages and JSON RPC, for the job scheduling system to register and receive specific callback events. The agent program is responsible for collecting worker node resources, runs the specific scheduled tasks through the executor, and returns the executor's results and task states to the master node, which forwards them to the job scheduling device.
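The RAM plus WAL pattern mentioned above can be illustrated in general terms as follows. This sketch does not reproduce Mesos internals: the class `WalStateStore`, the JSON-lines log format, and the per-write `fsync` are assumptions chosen to show why the combination gives fast reads together with recoverable state.

```python
import json
import os


class WalStateStore:
    """In-memory key-value state backed by an append-only write-ahead log."""

    def __init__(self, log_path):
        self._log_path = log_path
        self._state = {}
        # Recovery: replay every logged update in order to rebuild the state.
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        key, value = json.loads(line)
                        self._state[key] = value

    def set(self, key, value):
        # Log first and force it to disk, then apply the update in memory.
        with open(self._log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._state[key] = value

    def get(self, key, default=None):
        # Reads are served from memory only, which keeps lookups fast.
        return self._state.get(key, default)
```

Because every update reaches the log before it is applied, a restarted process can rebuild the same in-memory state by replaying the log, while reads never touch the disk.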

The above describes specific embodiments of the present invention and the technical principles employed. Any change made in accordance with the concept of the present invention whose resulting function does not go beyond the spirit covered by the description and the drawings shall fall within the scope of protection of the present invention.

Claims

1. A distributed data integration job scheduling method, characterized by comprising the following steps:

(1) a job scheduling device issues data integration jobs to a job running device, wherein the job scheduling device comprises a job management module, a job preloading module, and a resource scheduling module:

(1.3) the resource scheduling module completes resource allocation and scheduling distribution by obtaining the job preloading information and the computing resource information of the job running device;

(2) the job running device receives the scheduled tasks and starts job execution, feeds back job running status information to the job management module, feeds back worker node computing resources to the resource scheduling module, and feeds back loss-of-contact or fault information to the job preloading module.

2. The distributed data integration job scheduling method according to claim 1, characterized in that: the job management module comprises an information receiving unit, an information caching unit, a persistent storage unit, and a concurrency control unit, wherein step (1.1) specifically operates as follows:

(i) the information receiving unit receives job submissions, job meta-information modifications, and scheduling policy updates; receives from the resource scheduling module information on jobs that were not allocated resources, and updates the job state; and receives the job status information fed back by the job running device, and updates the job state;

(ii) the information caching unit caches job meta-information and status information locally to support frequent real-time queries;

(iii) the persistent storage unit persists job metadata according to the cache contents and maintains data consistency between the cache layer and the storage layer;

3. The distributed data integration job scheduling method according to claim 2, characterized in that: in step (iii), data consistency between the cache layer and the storage layer is maintained by the following methods:

(a) fault-tolerant storage is used: updated data is written to a local file and then written to the storage layer after the network returns to normal;

(b) a circuit breaker (fuse) is used: when the fault-tolerance mechanism has been triggered and a predetermined threshold is reached, the circuit breaker opens, the service is degraded, and no new tasks are scheduled;

(c) the job status interface of the job running device is wrapped; when each job scheduling node starts, the job running status is obtained and audited, to guarantee that the job status in the job running device is consistent with the status in the metadata storage layer.
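Methods (a) and (b) of claim 3 can be illustrated with a minimal Python sketch. It assumes a storage layer exposing a `put(key, value)` call that raises an error while the network is down; the names `FaultTolerantWriter`, `FlakyStore`, `StorageUnavailable`, the JSON-lines spool file, and the failure-count threshold are all hypothetical and not taken from the patent.

```python
import json
import os
import tempfile


class StorageUnavailable(Exception):
    """Raised by the (hypothetical) storage layer while the network is down."""


class FlakyStore:
    """Stand-in for the metadata storage layer, used only for illustration."""

    def __init__(self):
        self.data = {}
        self.online = True

    def put(self, key, value):
        if not self.online:
            raise StorageUnavailable(key)
        self.data[key] = value


class FaultTolerantWriter:
    def __init__(self, store, spool_path, failure_threshold=3):
        self.store = store
        self.spool_path = spool_path          # local file buffering failed writes
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.breaker_open = False             # True means degraded: no new scheduling

    def write(self, key, value):
        """Write to the storage layer; on failure, spool locally and count it."""
        try:
            self.store.put(key, value)
            return True
        except StorageUnavailable:
            with open(self.spool_path, "a", encoding="utf-8") as f:
                f.write(json.dumps({"key": key, "value": value}) + "\n")
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.breaker_open = True      # circuit breaker opens
            return False

    def can_schedule(self):
        return not self.breaker_open

    def replay(self):
        """After the network recovers, flush spooled updates in their original order."""
        if not os.path.exists(self.spool_path):
            return 0
        with open(self.spool_path, encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
        for rec in records:
            self.store.put(rec["key"], rec["value"])  # raises again if still down
        os.remove(self.spool_path)
        self.failures = 0
        self.breaker_open = False
        return len(records)
```

Counting consecutive spooled writes is only one possible reading of "predetermined threshold"; a time window or an error rate would fit the claim wording equally well.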

4. The distributed data integration job scheduling method according to claim 1, characterized in that: the job preloading module comprises a real-time query unit, a job preloading unit, and a fault handling unit, wherein step (1.2) specifically operates as follows:

(II) the job preloading unit adds unlocked jobs to be scheduled to a bounded ordered queue, sorted by job scheduling time and job priority;

(III) the fault handling unit receives fault information from the job running device and performs fault-tolerance handling; wherein the fault-tolerance handling means that, when a working node goes down or loses network contact for a long time while a job is running, the job running device notifies the job preloading module; the job preloading module attempts to connect to the working node to determine whether it is reachable; if the node is confirmed to be unreachable, the job is put directly back into the queue, and the lost job is eventually scheduled onto another available node; if the lost node then recovers, the job running device directly kills that job process, so as to guarantee that the same job never runs on two nodes at the same time under one job running device.
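The fault-tolerance flow of item (III) can be sketched in Python as follows. The function names and the callback parameters (`is_reachable`, `requeue`, `kill_process`) are hypothetical; the patent does not specify how the connectivity probe or the process kill is carried out.

```python
def handle_node_lost(node, jobs_on_node, is_reachable, requeue):
    """Called when the job running device reports a lost node.

    Returns the ids of the jobs that were put back into the queue.
    """
    if is_reachable(node):
        return []                 # false alarm: the node answered the probe
    requeued = []
    for job_id in jobs_on_node:
        requeue(job_id)           # job will later be scheduled on another node
        requeued.append(job_id)
    return requeued


def handle_node_recovered(node, requeued_jobs, kill_process):
    """Kill stale processes on a recovered node so no job runs on two nodes."""
    for job_id in requeued_jobs:
        kill_process(node, job_id)
```

For example, with `queue = []`, calling `handle_node_lost("w1", ["j1", "j2"], lambda n: False, queue.append)` leaves both job ids in `queue`.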

5. The distributed data integration job scheduling method according to claim 4, characterized in that: the bounded ordered queue holds all jobs to be scheduled, wherein bounded means that an upper limit on the number of jobs is guaranteed, and the upper limit parameter can be set by assessing the scenario; ordered means that jobs with an earlier trigger time and a higher priority are placed toward the front of the queue and scheduled first; when a job is stopped or deleted, removing the specified job to be scheduled from the queue is supported; a producer-consumer model is used, in which thread blocking and wake-up reduce the CPU load.
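A minimal Python sketch of such a bounded ordered queue is given below. It assumes that jobs are ordered first by trigger time and then by priority (higher first), which is one reading of the claim; the class name `BoundedOrderedQueue` and its method names are hypothetical.

```python
import bisect
import threading


class BoundedOrderedQueue:
    def __init__(self, max_jobs):
        self._max = max_jobs
        self._items = []                     # sorted (trigger_time, -priority, job_id)
        self._cond = threading.Condition()

    def put(self, job_id, trigger_time, priority):
        """Producer side: blocks (without spinning) while the queue is full."""
        with self._cond:
            while len(self._items) >= self._max:
                self._cond.wait()
            bisect.insort(self._items, (trigger_time, -priority, job_id))
            self._cond.notify_all()          # wake consumers waiting for work

    def get(self):
        """Consumer side: blocks while the queue is empty, returns the front job."""
        with self._cond:
            while not self._items:
                self._cond.wait()
            _, _, job_id = self._items.pop(0)
            self._cond.notify_all()          # wake producers waiting for space
            return job_id

    def remove(self, job_id):
        """Drop a queued job that was stopped or deleted; True if it was found."""
        with self._cond:
            for i, (_, _, queued) in enumerate(self._items):
                if queued == job_id:
                    del self._items[i]
                    self._cond.notify_all()
                    return True
            return False
```

Waiting on a condition variable rather than polling is what keeps idle threads off the CPU. A sorted list keeps removal simple; a heap would be a reasonable alternative for large queues.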

6. The distributed data integration job scheduling method according to claim 1, characterized in that: the resource scheduling module comprises a resource acquisition unit, a resource allocation unit, and a scheduling distribution unit, wherein step (1.3) specifically operates as follows:

2) the resource allocation unit obtains all jobs from the bounded ordered queue and allocates computing resources to each job in order of job priority;

3) the scheduling distribution unit designates an executor for each scheduled job and sends the job meta-information, the allocated computing resources, and the executor configuration to the job running cluster.
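The priority-ordered allocation in item 2) can be sketched in Python as follows. The first-fit placement rule, the CPU and memory fields, and the names `Job`, `Node`, and `allocate` are assumptions made for illustration; the claim only states that resources are allocated in order of job priority.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    priority: int
    cpus: float
    mem_mb: int


@dataclass
class Node:
    name: str
    free_cpus: float
    free_mem_mb: int


def allocate(jobs, nodes):
    """Assign jobs to nodes, highest priority first, using first-fit placement.

    Returns (assignments, unallocated): a job_id -> node name mapping and the
    ids of jobs for which no node had enough free resources.
    """
    assignments, unallocated = {}, []
    for job in sorted(jobs, key=lambda j: -j.priority):
        for node in nodes:
            if node.free_cpus >= job.cpus and node.free_mem_mb >= job.mem_mb:
                node.free_cpus -= job.cpus
                node.free_mem_mb -= job.mem_mb
                assignments[job.job_id] = node.name
                break
        else:
            # No node fits: such jobs are reported back as having no resources.
            unallocated.append(job.job_id)
    return assignments, unallocated
```

The `unallocated` list corresponds to the unallocated-resource job information that claim 2 says the resource scheduling module feeds back to the job management module.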

7. The distributed data integration job scheduling method according to claim 1, characterized in that: in step (2), the job running device comprises a master node and working nodes; the master node is responsible for management and coordination, and the working nodes are responsible for executing data integration jobs; the master node receives the job meta-information, job resource allocation information, and job executor information distributed by the resource scheduling module, and starts job execution; the agent program on a working node collects the status information of the running job and sends it to the master node, which forwards it to the job management module; the agent program on a working node collects the working node computing resources and sends them to the master node, which forwards them to the resource scheduling module; the agent program on a working node sends heartbeat information to the master node, and the master node sends lost-contact or fault information to the job preloading module; the executor on a working node has a retry mechanism: once the data flow source or target goes down or loses contact, the executor retries at timed intervals, to guarantee that it can continue to run normally after the data source and data target recover.
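The executor retry mechanism at the end of claim 7 can be sketched in Python as a fixed-interval retry loop. The function name `run_with_retry`, the use of `ConnectionError` to signal an unreachable source or target, and the optional attempt limit are assumptions; the claim only states that retries are timed.

```python
import time


def run_with_retry(transfer, retry_interval_s=5.0, max_attempts=None, sleep=time.sleep):
    """Run `transfer` until it succeeds, waiting a fixed interval between attempts.

    `transfer` is a hypothetical callable that moves data from source to target
    and raises ConnectionError while either end is down or unreachable.
    With max_attempts=None the executor keeps retrying until recovery.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return transfer()
        except ConnectionError:
            if max_attempts is not None and attempt >= max_attempts:
                raise                      # give up and surface the failure
            sleep(retry_interval_s)        # timed wait before the next attempt
```

Passing `sleep` as a parameter is a design convenience so the loop can be exercised without real delays.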

8. The distributed data integration job scheduling method according to claim 7, characterized in that: the job running device operates on a Mesos cluster system for distributed system resource management; the master node uses RAM plus a WAL log to provide low-latency local metadata management, and uses the PAXOS algorithm to keep the job status of a large number of working nodes synchronized; based on its unified cluster physical resource management interface and specific resource sharing, it pushes specific physical resources to the job scheduling system; the job running cluster provides two means, multi-language driver packages and JSON RPC, for the job scheduling system to register for and obtain specific callback events; the agent program is responsible for collecting working node resources, runs specific scheduled tasks through the executor, and returns the executor's execution results and task status to the master node, which then forwards them to the job scheduling device.

9. A distributed data integration job scheduling device, characterized by comprising: a job scheduling device and a job running device; the job scheduling device and the job running device exchange information with each other; the job scheduling device comprises a job management module, a job preloading module, and a resource scheduling module; the job management module is used to receive, cache, and store job-related meta-information and to perform concurrency control; the job preloading module is used to obtain pending jobs from the job management module and determine the scheduling priority order; the resource scheduling module is used to complete resource allocation and schedule distribution by obtaining the job preloading information and the computing resource information of the job running device; the job running device comprises a master node and working nodes, the master node being responsible for management and coordination and the working nodes being responsible for executing data integration jobs.

10. The distributed data integration job scheduling device according to claim 9, characterized in that: the job scheduling device and the job running device are registered in ZooKeeper; the job scheduling device uses an active-standby mode, and once the active device goes down, ZooKeeper elects a standby device to take over job scheduling; the master node in the job running device uses an active-standby mode, and once the master node goes down, ZooKeeper elects a standby master node to take over management and coordination.

11. The distributed data integration job scheduling device according to claim 10, characterized in that: the job scheduling device performs auditing and maintenance based on the job status information fed back by the job running device, and the job metadata store maintains consistency based on the job metadata cache of the job scheduling device; when the job scheduling device fails, the standby job scheduling device, once it takes over, needs to interact with the job metadata store to rebuild the job metadata cache, and, by receiving job status feedback from the job running device, audits and maintains the metadata cache information so that it stays consistent across the distributed system.
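The takeover sequence of claim 11 can be sketched in Python as two steps: rebuild the cache from the job metadata store, then audit it against the status fed back by the job running device. Treating the running device's report as the source of truth, representing both stores as dictionaries, and the function names `rebuild_cache` and `audit` are all assumptions made for illustration.

```python
def rebuild_cache(metadata_store):
    """Standby device takes over: load job metadata from the persistent store."""
    return {job_id: dict(meta) for job_id, meta in metadata_store.items()}


def audit(cache, metadata_store, feedback):
    """Reconcile cached job status with the status reported by the running device.

    `feedback` maps job_id -> reported status. Returns the ids that were corrected.
    """
    corrected = []
    for job_id, reported in feedback.items():
        entry = cache.get(job_id)
        if entry is None:
            continue                        # unknown job: out of scope for this sketch
        if entry.get("status") != reported:
            entry["status"] = reported      # fix the cache layer
            metadata_store.setdefault(job_id, {})["status"] = reported  # and the store
            corrected.append(job_id)
    return corrected
```

For example, if the store still records a job as `RUNNING` but the running device reports `FINISHED`, `audit` updates both the cache and the store and returns that job id.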