CN109284324A - The dispatching device of flow tasks based on Apache Oozie frame processing big data - Google Patents

The dispatching device of flow tasks based on Apache Oozie frame processing big data

Info

Publication number
CN109284324A
Authority
CN
China
Prior art keywords
task
big data
oozie
module
frame processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811204278.5A
Other languages
Chinese (zh)
Inventor
桂艳军
张金桥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shun Yi Nationwide Financial Services Inc
Original Assignee
Shenzhen Shun Yi Nationwide Financial Services Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shun Yi Nationwide Financial Services Inc
Priority to CN201811204278.5A
Publication of CN109284324A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a scheduling device for big data processing workflow tasks based on the Apache Oozie framework. The device comprises a client (front end) and a server (back end); the client includes an interface operation module, and the server includes a server-side operation module. The interface operation module comprises three modules: task submission, task operation, and task monitoring, and it serves as the entry point for user operations. The server comprises a control layer (Controller), a business layer (Service), and a storage layer.

Description

Scheduling device for big data processing workflow tasks based on the Apache Oozie framework
Technical field
The present invention relates to the technical field of data processing, and in particular to a scheduling device for the workflow tasks of big data processing based on the Apache Oozie framework.
Background art
Apache Oozie is an open-source framework built on a workflow engine. It was contributed to Apache by Cloudera and provides scheduling and coordination of Hadoop MapReduce and Pig jobs. Oozie must be deployed and run in a Java Servlet container. As an open-source workflow engine, it provides task submission, start, kill, suspend, resume, monitoring, rerun, and scheduling, and the Oozie project also provides a simple query interface. Its architecture is shown in Figure 1. Oozie provides three kinds of engines:
1. Workflow: executes workflow nodes in order. The Oozie client submits a workflow description file to the server, and the Oozie server parses the file and executes the nodes in the order the workflow defines.
2. Coordinator: the coordinator engine. Oozie uses a Coordinator to manage workflows, starting a workflow on a schedule at predefined times or when data conditions are met.
3. Bundle: Oozie uses a Bundle to organize multiple Coordinators into one set, which makes multiple Coordinators easier to manage.
The smallest execution unit in Oozie is the node. Action nodes include Hadoop map-reduce, Hadoop file system, Pig, SSH, HTTP, email, and Oozie sub-workflow; control nodes include start, end, kill, fork, join, and decision. Oozie also supports user-defined nodes. Oozie organizes the nodes of a workflow as a directed acyclic graph (DAG) and describes nodes and workflows in XML files. The current state of developing and operating big data jobs with Oozie is shown in Figure 2. The development process is as follows:
Step 1: the user writes the workflow task locally in XML. Different workflow nodes refer to different schema constraints, and their attribute configurations differ greatly, so the file structure of a workflow is relatively complex. An example is a workflow.xml file with only five workflow nodes:
Step 2: the user uploads the finished workflow file to HDFS with the hdfs tool.
Step 3: the user submits the task with the Oozie client, usually by running a shell command.
Step 4: the user checks the task's running status and running log, either in the ext-based interface provided by the project (the ext dependency must be downloaded separately) or on the command line.
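The manual steps 2 to 4 above can be sketched as the command lines a user would type. This is a minimal illustration only: it builds the argument lists without running them, and the file names, HDFS directory, server URL, and job id are hypothetical placeholders that do not come from the patent. The subcommands and flags shown (`hdfs dfs -put`, `oozie job -config ... -run`, `-info`, `-log`) are those of the standard Hadoop and Oozie command-line tools.

```python
from typing import List


def manual_oozie_commands(local_xml: str, hdfs_dir: str, oozie_url: str,
                          properties_file: str, job_id: str) -> List[List[str]]:
    """Return the argument lists for the manual upload, submit, and inspect steps."""
    return [
        # Step 2: upload the workflow definition to HDFS (overwrite if present).
        ["hdfs", "dfs", "-put", "-f", local_xml, hdfs_dir],
        # Step 3: submit and start the job with the Oozie command-line client.
        ["oozie", "job", "-oozie", oozie_url, "-config", properties_file, "-run"],
        # Step 4: check the running status and the running log of the job.
        ["oozie", "job", "-oozie", oozie_url, "-info", job_id],
        ["oozie", "job", "-oozie", oozie_url, "-log", job_id],
    ]


# Hypothetical values, for illustration only.
commands = manual_oozie_commands(
    local_xml="workflow.xml",
    hdfs_dir="/user/demo/apps/daily_summary/",
    oozie_url="http://localhost:11000/oozie",
    properties_file="job.properties",
    job_id="<job-id-returned-by-step-3>",
)
for argv in commands:
    print(" ".join(argv))
```

Every change to the workflow repeats the upload and submit commands, which is the overhead the background section describes.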
In real big data development, data flows are usually divided into modules by business line, and each typically runs through data collection, data cleaning, data analysis, data summarization, and data presentation. Most of these data flows run on a schedule at a specified frequency (by minute, hour, day, week, month, and so on), and the order of execution reflects data dependencies, so the data processing workflow is inherently very complex. Because Oozie defines workflows and coordinators in XML, every scheduled task a user develops requires at least two files, coordinator.xml and workflow.xml, written according to the Oozie XML schemas. Within workflow.xml, different workflow nodes refer to different schema constraints and their attribute configurations differ greatly, which makes development complicated and error-prone. Once the number of workflow tasks reaches a certain scale, changing the nodes a workflow executes requires the user to download the workflow description file from HDFS, modify it, and upload it again; if coordinator.xml is modified, the coordinator task must also be restarted. The whole procedure is very cumbersome.
Oozie describes workflows in XML files. Big data development usually involves data flows across multiple levels and business lines, with very complex dependencies among them. With the existing workflow engine, workflows cannot be grouped by business or level, and the dependencies among multiple workflows cannot be viewed directly.
In addition, although the Oozie client provides command-line operations for task submission, start, kill, suspend, resume, monitoring, rerun, and scheduling, every operation must be performed on the server's command line. Once the number of tasks grows, this is clearly impractical for day-to-day task operations.
Summary of the invention
The purpose of the present invention is to provide a scheduling device for big data processing workflow tasks based on the Apache Oozie framework, which can greatly increase the speed of developing workflow tasks that process big data.
The technical solution of the present invention is a scheduling device for big data processing workflow tasks based on the Apache Oozie framework, comprising a client (front end) and a server (back end). The client includes an interface operation module, and the server includes a server-side operation module. The interface operation module comprises three modules: task submission, task operation, and task monitoring.
The front-end interface operation module serves as the entry point for user operations. The server comprises a control layer (Controller), a business layer (Service), and a storage layer.
The control layer calls the Service layer, sending requests to it according to the parameters the user submits.
The Service layer comprises three modules: task generation, task operation, and task monitoring. It handles the core business logic of the whole big data processing workflow and forwards the user's final request to the Oozie server, which ultimately executes the task. The task monitoring module periodically queries and aggregates task states, analyzes how tasks are running to produce a task monitoring report, and displays the report on the task monitoring interface; if a task fails, the task monitoring module sends an email alert to the submitter.
The storage layer comprises a MySQL storage module and an HDFS storage module. The MySQL storage module stores the metadata of the workflow tasks, and the HDFS storage module stores their definition files. The definition files of all generated tasks are stored on HDFS, and the metadata of all task runs, including run result records, is stored in the MySQL database.
Further, the task operation module of the server includes start, suspend, modify, pause, resume, rerun, kill, delete, and submit units.
Further, the interface operation module includes task grouping, task dependency query, and batch processing units.
Preferably, the control layer uses Spring MVC.
Preferably, the task generation module uses Velocity as its template engine.
Preferably, the client is implemented with Element UI + Vue.js + ECharts.
Preferably, the server wraps Oozie in a secondary encapsulation built with Spring Boot + Spring MVC + Spring + MyBatis.
Preferably, the client and the server use Maven as the build and packaging tool.
The modules of the present invention cooperate as follows:
On the task submission interface, the user selects the required workflow nodes and fills in the form attributes. For a Coordinator task, the user must also specify key attributes such as the task's input and output paths and its running frequency. The user submits the form to the Spring MVC control layer, which calls the task generation module. The task generation module uses Velocity templates to generate a different Action definition for each task type and, once all nodes are processed, merges them into the workflow.xml that the Oozie workflow requires; for a Coordinator task it also generates coordinator.xml. It then calls the HDFS storage module API to upload the generated files to HDFS. If the user chose to run the task immediately when submitting it, the control layer calls the task operation module, which calls OozieClient to send a submit command to the Oozie server module. The Oozie server starts the task, and at the same time the task's state is updated and its metadata is stored in the MySQL database.
For task operation, the user only needs to select tasks in the task list on the interface and click the desired operation button. The request is sent by Ajax to the control layer, which calls the task operation module. The task operation module calls OozieClient to send the submit command to the Oozie server module, the Oozie server starts the task, and the state of the task and of the corresponding task data in MySQL is updated.
For task monitoring, the front end polls the task monitoring module by Ajax every five minutes. The task monitoring module reads the task run data from the MySQL database, groups and aggregates it along different dimensions, sorts it, and returns the result to the front end, which renders the report dynamically. If a task has failed, an email notification is sent to the task submitter.
The beneficial effects of the present invention are as follows. For the development process:
The user only needs to fill in a form on the interface, and many attributes are selected from drop-down lists. When the user submits the form, the workflow definition file workflow.xml is generated automatically and saved to HDFS; when the user needs to modify the workflow file, it can be edited directly online. Task development and submission are completed entirely on the interface, with no need to call the HDFS API or Oozie client commands manually.
For the operations process:
The present invention provides rich interface operations and supports task grouping and task dependency queries. Batch processing is available for all operations, so even as the number of tasks grows, tasks can be operated on quickly in batches, making large-scale task operations simple and efficient.
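The task grouping, dependency query, and batch operation described above might look roughly like the following sketch. The patent does not specify data structures, so the dependency map, the group map, and both function names are hypothetical; the sketch only shows that a dependency query amounts to a graph traversal and a batch operation to applying one action to every task in a group.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def upstream_tasks(depends_on: Dict[str, List[str]], task: str) -> Set[str]:
    """Return every task that `task` depends on, directly or transitively."""
    seen: Set[str] = set()
    queue = deque(depends_on.get(task, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited through another path
        seen.add(current)
        queue.extend(depends_on.get(current, []))
    return seen


def batch_apply(groups: Dict[str, List[str]], group: str,
                operation: Callable[[str], str]) -> List[str]:
    """Apply one operation to every task in a group and collect the results."""
    return [operation(task) for task in groups.get(group, [])]


# Hypothetical pipeline: collect -> clean -> summary -> report.
depends_on = {"report": ["summary"], "summary": ["clean"], "clean": ["collect"]}
groups = {"risk": ["clean", "summary"]}

report_upstream = upstream_tasks(depends_on, "report")
suspended = batch_apply(groups, "risk", lambda task: f"suspend:{task}")
```

In the device itself the operation passed to `batch_apply` would be a call into the task operation module rather than a string formatter.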
Brief description of the drawings
Fig. 1 is an architecture diagram of the Apache Oozie framework described in the background art;
Fig. 2 is a schematic diagram of workflow task scheduling during development and operations with the Apache Oozie framework, as described in the background art;
Fig. 3 is an overall architecture diagram of the scheduling method of the present invention;
Fig. 4 is a schematic diagram of the operation interface of one embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and an embodiment, as shown in Figures 3 and 4.
A scheduling device for big data processing workflow tasks based on the Apache Oozie framework comprises a client (front end) and a server (back end). The client includes an interface operation module, and the server includes a server-side operation module. The interface operation module comprises three modules: task submission, task operation, and task monitoring.
The front-end interface operation module serves as the entry point for user operations. The server comprises a control layer (Controller), a business layer (Service), and a storage layer.
The control layer calls the Service layer, sending requests to it according to the parameters the user submits.
The Service layer comprises three modules: task generation, task operation, and task monitoring. It handles the core business logic of the whole big data processing workflow and forwards the user's final request to the Oozie server, which ultimately executes the task. The task monitoring module periodically queries and aggregates task states, analyzes how tasks are running to produce a task monitoring report, and displays the report on the task monitoring interface; if a task fails, the task monitoring module sends an email alert to the submitter.
The storage layer comprises a MySQL storage module and an HDFS storage module. The MySQL storage module stores the metadata of the workflow tasks, and the HDFS storage module stores their definition files. The definition files of all generated tasks are stored on HDFS, and the metadata of all task runs, including run result records, is stored in the MySQL database.
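The split between metadata and definition files can be illustrated with a small sketch. Everything here is an assumption made for illustration: the table and column names are hypothetical, the HDFS path is a placeholder, and Python's built-in `sqlite3` stands in for MySQL so that the example runs on its own. The point is only that each metadata row records the task's state together with the HDFS location of its definition file.

```python
import sqlite3


def open_metadata_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the metadata database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE task (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            status TEXT NOT NULL,
            definition_path TEXT NOT NULL  -- where workflow.xml lives on HDFS
        )
        """
    )
    return conn


def record_task(conn: sqlite3.Connection, name: str, definition_path: str,
                status: str = "SUBMITTED") -> int:
    """Store the metadata of a newly generated task and return its row id."""
    cursor = conn.execute(
        "INSERT INTO task (name, status, definition_path) VALUES (?, ?, ?)",
        (name, status, definition_path),
    )
    conn.commit()
    return cursor.lastrowid


def update_status(conn: sqlite3.Connection, task_id: int, status: str) -> None:
    """Record a state change of the task, for example after it has been started."""
    conn.execute("UPDATE task SET status = ? WHERE id = ?", (status, task_id))
    conn.commit()


conn = open_metadata_store()
task_id = record_task(conn, "daily_summary", "/user/demo/apps/daily_summary/workflow.xml")
update_status(conn, task_id, "RUNNING")
```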
Further, the task operation module of the server includes start, suspend, modify, pause, resume, rerun, kill, delete, and submit units.
Further, the interface operation module includes task grouping, task dependency query, and batch processing units.
Preferably, the control layer uses Spring MVC. The task generation module uses Velocity as its template engine. The client is implemented with Element UI + Vue.js + ECharts. The server wraps Oozie in a secondary encapsulation built with Spring Boot + Spring MVC + Spring + MyBatis. The client and the server use Maven as the build and packaging tool.
The modules of the present invention cooperate as follows:
On the task submission interface, the user selects the required workflow nodes and fills in the form attributes. For a Coordinator task, the user must also specify key attributes such as the task's input and output paths and its running frequency. The user submits the form to the Spring MVC control layer, which calls the task generation module. The task generation module uses Velocity templates to generate a different Action definition for each task type and, once all nodes are processed, merges them into the workflow.xml that the Oozie workflow requires; for a Coordinator task it also generates coordinator.xml. It then calls the HDFS storage module API to upload the generated files to HDFS. If the user chose to run the task immediately when submitting it, the control layer calls the task operation module, which calls OozieClient to send a submit command to the Oozie server module. The Oozie server starts the task, and at the same time the task's state is updated and its metadata is stored in the MySQL database.
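The template-driven generation step can be sketched as follows. The patent names Velocity as the template engine; this sketch swaps in Python's `string.Template` purely so the example is self-contained, and the node types, form fields, and function name are hypothetical. Each submitted node is rendered from the template for its type, and the rendered actions are merged into a single workflow.xml whose nodes run in sequence.

```python
from string import Template
from xml.sax.saxutils import quoteattr

# One template per node type; only a file-system "mkdir" action is sketched here.
ACTION_TEMPLATES = {
    "fs-mkdir": Template(
        "  <action name=$name>\n"
        "    <fs><mkdir path=$path/></fs>\n"
        "    <ok to=$next_node/>\n"
        '    <error to="fail"/>\n'
        "  </action>\n"
    ),
}

WORKFLOW_TEMPLATE = Template(
    '<workflow-app name=$name xmlns="uri:oozie:workflow:0.5">\n'
    "  <start to=$first/>\n"
    "$actions"
    '  <kill name="fail">\n'
    "    <message>Workflow failed</message>\n"
    "  </kill>\n"
    '  <end name="end"/>\n'
    "</workflow-app>\n"
)


def generate_workflow(app_name, nodes):
    """Render each form node with its template and merge them into one workflow.xml."""
    if not nodes:
        raise ValueError("a workflow needs at least one action node")
    rendered = []
    for index, node in enumerate(nodes):
        # Chain the nodes in order; the last one transitions to the end node.
        following = nodes[index + 1]["name"] if index + 1 < len(nodes) else "end"
        rendered.append(ACTION_TEMPLATES[node["type"]].substitute(
            name=quoteattr(node["name"]),
            path=quoteattr(node["path"]),
            next_node=quoteattr(following),
        ))
    return WORKFLOW_TEMPLATE.substitute(
        name=quoteattr(app_name),
        first=quoteattr(nodes[0]["name"]),
        actions="".join(rendered),
    )


# Hypothetical form submission with two nodes.
workflow_xml = generate_workflow("daily_summary", [
    {"type": "fs-mkdir", "name": "make_raw_dir", "path": "${nameNode}/data/raw"},
    {"type": "fs-mkdir", "name": "make_out_dir", "path": "${nameNode}/data/out"},
])
```

Supporting another node type means adding one more entry to `ACTION_TEMPLATES`, which mirrors the per-type Action definitions described above.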
For task operation, the user only needs to select tasks in the task list on the interface and click the desired operation button. The request is sent by Ajax to the control layer, which calls the task operation module. The task operation module calls OozieClient to send the submit command to the Oozie server module, the Oozie server starts the task, and the state of the task and of the corresponding task data in MySQL is updated.
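How a button click could be turned into a request to the Oozie server is sketched below. The patent states that the task operation module calls OozieClient; this sketch instead builds the equivalent request for Oozie's HTTP web-services job endpoint, which is a swapped-in technique chosen to keep the example dependency-free. The base URL and job id are hypothetical, only the five operations known to have a matching `action` value are mapped, and the endpoint shape should be checked against the Oozie version in use.

```python
from urllib.parse import quote, urlencode

# Operations from the interface mapped to the `action` parameter of the
# Oozie job endpoint. Modify, pause, delete, and submit are not covered here.
OOZIE_ACTIONS = {
    "start": "start",
    "suspend": "suspend",
    "resume": "resume",
    "kill": "kill",
    "rerun": "rerun",
}


def build_job_request(base_url, job_id, operation):
    """Return the HTTP method and URL for one operation on one job."""
    if operation not in OOZIE_ACTIONS:
        raise ValueError(f"unsupported operation: {operation}")
    query = urlencode({"action": OOZIE_ACTIONS[operation]})
    return "PUT", f"{base_url.rstrip('/')}/v1/job/{quote(job_id)}?{query}"


# Hypothetical server address and job id.
method, url = build_job_request(
    "http://localhost:11000/oozie", "0000001-181016000000000-oozie-W", "suspend"
)
```

After the request succeeds, the task operation module would update the corresponding task row in MySQL, as the paragraph above describes.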
For task monitoring, the front end polls the task monitoring module by Ajax every five minutes. The task monitoring module reads the task run data from the MySQL database, groups and aggregates it along different dimensions, sorts it, and returns the result to the front end, which renders the report dynamically. If a task has failed, an email notification is sent to the task submitter.
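The aggregation performed on each poll can be sketched as follows. The record fields, the set of statuses treated as failures, and the function name are assumptions made for illustration; the patent only says that run data is grouped, aggregated, and sorted, and that submitters of failed tasks are notified by email. Sending the email is left out, so the sketch returns the list of recipients instead.

```python
from collections import Counter

# Statuses treated as failures in this sketch (an assumption, not from the patent).
FAILED_STATUSES = {"FAILED", "KILLED"}


def build_monitor_report(runs):
    """Aggregate task runs by status and list who must be notified of failures."""
    by_status = Counter(run["status"] for run in runs)
    to_notify = sorted(
        {(run["submitter"], run["task"]) for run in runs
         if run["status"] in FAILED_STATUSES}
    )
    return {"by_status": dict(by_status), "notify": to_notify}


# Hypothetical rows as they might be read from the metadata database.
runs = [
    {"task": "clean", "status": "SUCCEEDED", "submitter": "a@example.com"},
    {"task": "summary", "status": "FAILED", "submitter": "b@example.com"},
    {"task": "report", "status": "RUNNING", "submitter": "a@example.com"},
]
report = build_monitor_report(runs)
```

The front end would call the endpoint backed by this function every five minutes and redraw its charts from `by_status`.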
The above description covers only certain specific embodiments of the present invention. Any replacement or improvement made by a person skilled in the art in the spirit of the present invention falls within its scope of protection, which is defined by the claims.

Claims (8)

1. A scheduling device for big data processing workflow tasks based on the Apache Oozie framework, comprising a client (front end) and a server (back end), wherein the client includes an interface operation module and the server includes a server-side operation module; the interface operation module comprises three modules: task submission, task operation, and task monitoring;
the front-end interface operation module serves as the entry point for user operations; the server comprises a control layer (Controller), a business layer (Service), and a storage layer;
the control layer calls the Service layer, sending requests to the Service layer according to the parameters the user submits;
the Service layer comprises three modules: task generation, task operation, and task monitoring; the Service layer handles the core business logic of the whole big data processing workflow and forwards the user's final request to the Oozie server, which ultimately executes the task; the task monitoring module periodically queries and aggregates task states, analyzes how tasks are running to produce a task monitoring report, and displays the report on the task monitoring interface; if a task fails, the task monitoring module sends an email alert to the submitter;
the storage layer comprises a MySQL storage module and an HDFS storage module; the MySQL storage module stores the metadata of the workflow tasks; the HDFS storage module stores the definition files of the workflow tasks; the definition files of all generated tasks are stored on HDFS, and the metadata of all task runs, including run result records, is stored in the MySQL database.
2. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1, characterized in that the task operation module of the server includes start, suspend, modify, pause, resume, rerun, kill, delete, and submit units.
3. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1 or 2, characterized in that the interface operation module includes task grouping, task dependency query, and batch processing units.
4. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1 or 2, characterized in that the control layer uses Spring MVC.
5. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1 or 2, characterized in that the task generation module uses Velocity as its template engine.
6. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1 or 2, characterized in that the client is implemented with Element UI + Vue.js + ECharts.
7. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1 or 2, characterized in that the server wraps Oozie in a secondary encapsulation built with Spring Boot + Spring MVC + Spring + MyBatis.
8. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1 or 2, characterized in that the client and the server use Maven as the build and packaging tool.
CN201811204278.5A 2018-10-16 2018-10-16 The dispatching device of flow tasks based on Apache Oozie frame processing big data Pending CN109284324A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811204278.5A CN109284324A (en) 2018-10-16 2018-10-16 The dispatching device of flow tasks based on Apache Oozie frame processing big data

Publications (1)

Publication Number Publication Date
CN109284324A true CN109284324A (en) 2019-01-29

Family

ID=65177737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811204278.5A Pending CN109284324A (en) 2018-10-16 2018-10-16 The dispatching device of flow tasks based on Apache Oozie frame processing big data

Country Status (1)

Country Link
CN (1) CN109284324A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694650A (en) * 2020-06-17 2020-09-22 科技谷(厦门)信息技术有限公司 Multidimensional data job scheduling system
CN111708751A (en) * 2019-12-27 2020-09-25 山东鲁能软件技术有限公司 Method, system, equipment and readable storage medium for realizing data loading based on Hue
CN113220438A (en) * 2021-06-02 2021-08-06 中国邮政储蓄银行股份有限公司 System for executing operation, method and device for testing batch operation
CN113326117A (en) * 2021-07-15 2021-08-31 中国电子科技集团公司第十五研究所 Task scheduling method, device and equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101478431A (en) * 2009-02-10 2009-07-08 浪潮通信信息系统有限公司 Task scheduling system for management by visible process
CN101694709A (en) * 2009-09-27 2010-04-14 华中科技大学 Service-oriented distributed work flow management system
CN103577256A (en) * 2013-11-21 2014-02-12 五八同城信息技术有限公司 Distributed timed task dispatching system
CN104536809A (en) * 2014-11-26 2015-04-22 上海瀚之友信息技术服务有限公司 Distributed timing task scheduling system based on client and server system
US9172608B2 (en) * 2012-02-07 2015-10-27 Cloudera, Inc. Centralized configuration and monitoring of a distributed computing cluster
CN105446812A (en) * 2016-01-04 2016-03-30 中国南方电网有限责任公司 Multitask scheduling configuration method
CN105867907A (en) * 2016-03-23 2016-08-17 沈阳师范大学 JSS multi-layer Web development framework design method removing service coupling
CN108037919A (en) * 2017-12-01 2018-05-15 北京博宇通达科技有限公司 A kind of visualization big data workflow configuration method and system based on WEB

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
万川梅: "《云技术应用》", 31 August 2013, 成都:西南交通大学出版社 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111708751A (en) * 2019-12-27 2020-09-25 山东鲁能软件技术有限公司 Method, system, equipment and readable storage medium for realizing data loading based on Hue
CN111708751B (en) * 2019-12-27 2024-02-02 山东鲁能软件技术有限公司 Method, system, equipment and readable storage medium for realizing data loading based on Hue
CN111694650A (en) * 2020-06-17 2020-09-22 科技谷(厦门)信息技术有限公司 Multidimensional data job scheduling system
CN113220438A (en) * 2021-06-02 2021-08-06 中国邮政储蓄银行股份有限公司 System for executing operation, method and device for testing batch operation
CN113326117A (en) * 2021-07-15 2021-08-31 中国电子科技集团公司第十五研究所 Task scheduling method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190129
