CN109284324A - Scheduling device for big data processing workflow tasks based on the Apache Oozie framework - Google Patents
Scheduling device for big data processing workflow tasks based on the Apache Oozie framework
- Publication number
- CN109284324A (application CN201811204278.5A)
- Authority
- CN
- China
- Prior art keywords
- task
- big data
- oozie
- module
- frame processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a scheduling device for big data processing workflow tasks based on the Apache Oozie framework, comprising a client (front end) and a server (back end). The client includes an interface operation module, and the server includes a server-side operation module. The interface operation module includes three modules: task submission, task operation, and task monitoring. The front-end interface operation module serves as the entry point for user operations. The server includes a control layer (Controller), a service layer (Service), and a storage layer.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a scheduling device for big data processing workflow tasks based on the Apache Oozie framework.
Background art
Apache Oozie is an open-source framework built on a workflow engine, contributed to Apache by Cloudera. It schedules and coordinates data processing tasks such as Hadoop MapReduce and Pig jobs. Oozie must be deployed and run in a Java Servlet container. As an open-source workflow engine, it provides task submission, start, kill, suspend, resume, monitoring, rerun, and scheduling functions, and Oozie officially provides a simple query interface. Its architecture is shown in Fig. 1. Oozie provides three kinds of engines:
1. Workflow: executes workflow nodes in sequence. The Oozie client submits a workflow description file to the server, and the Oozie server parses the file and executes the nodes in workflow order.
2. Coordinator: the coordinator engine. Oozie uses a Coordinator to manage workflows, starting them on a schedule at predefined times or when data conditions are met.
3. Bundle: Oozie uses a Bundle to organize multiple Coordinators into a set, which makes managing multiple Coordinators more convenient.
The smallest execution unit in Oozie is the node. Nodes include action nodes such as Hadoop map-reduce, Hadoop file system, Pig, SSH, HTTP, eMail, and Oozie sub-workflow, and control nodes such as start, end, kill, fork, join, and decision. Oozie also supports user-defined nodes. Oozie organizes the workflow nodes into a workflow as a directed acyclic graph (DAG), and describes nodes and workflows in XML documents. The current state of developing and operating big data tasks with Oozie is shown in Fig. 2. The development process is as follows:
Step 1: The user writes the workflow task locally in XML. Because different workflow nodes refer to different schema constraints and their attribute configurations differ greatly, the file structure of a workflow can be relatively complex, even for a workflow.xml file with only five workflow nodes.
Step 2: The user uploads the finished workflow file to HDFS using the hdfs tool.
Step 3: The user submits the task using the Oozie client, usually by executing a shell command.
Step 4: The user checks the task's running status and logs using the officially provided ext-based interface (the ext dependency must be downloaded separately) or the command line.
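Steps 2 to 4 above can be sketched as the command lines a user would typically type by hand. This is a minimal illustration, not part of the patent: the paths, the server URL, and the job ID are hypothetical placeholders, and the function only builds the argument lists without running anything.

```python
from typing import List


def manual_oozie_commands(local_dir: str, hdfs_dir: str, oozie_url: str,
                          properties_file: str, job_id: str) -> List[List[str]]:
    """Build (but do not run) the command lines for Steps 2 to 4.

    All arguments are placeholders chosen by the caller; job_id is only
    known after the submit command in Step 3 has returned it.
    """
    return [
        # Step 2: upload the hand-written workflow files to HDFS.
        ["hdfs", "dfs", "-put", local_dir, hdfs_dir],
        # Step 3: submit and start the job through the Oozie client.
        ["oozie", "job", "-oozie", oozie_url, "-config", properties_file, "-run"],
        # Step 4: check the running status and the logs from the command line.
        ["oozie", "job", "-oozie", oozie_url, "-info", job_id],
        ["oozie", "job", "-oozie", oozie_url, "-log", job_id],
    ]


if __name__ == "__main__":
    for command in manual_oozie_commands(
            "./orders_wf", "/user/demo/orders_wf",
            "http://oozie.example.com:11000/oozie",
            "job.properties", "<job-id>"):
        print(" ".join(command))
```

Every change to the workflow repeats this whole sequence, which is the overhead the background section describes.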
In real big data development, data flows are usually divided into modules by business line, and typically run through stages such as data acquisition, data cleaning, data analysis, data aggregation, and data presentation. Most of these data flows run on a schedule at a specified frequency (by minute, hour, day, week, month, and so on), and there are data dependencies between successive executions, so developing the data processing workflow is itself extremely complex. Oozie defines workflows and coordinators in XML, so for every scheduled task the user must provide at least two files, coordinator.xml and workflow.xml, written according to the Oozie XML schema documentation. In workflow.xml, different workflow nodes refer to different schema constraints and their attribute configurations differ greatly, which makes workflow development complicated and error-prone. This is especially true once workflow tasks reach a certain scale: to change a node that a workflow executes, the user must download the workflow description file from HDFS again, modify it, and upload it again. If coordinator.xml is modified, the coordinator task must also be restarted. The whole process is extremely cumbersome.
Oozie describes workflows in XML documents. Big data development usually involves data flows across multiple levels and multiple business lines, with extremely complex dependencies between them. With the existing workflow engine, workflows cannot be grouped by business and level, and the dependencies between multiple workflows cannot be viewed directly.
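To make the missing capability concrete, the following sketch shows one way grouping and dependency lookup across workflows could be modelled. It is an assumption-laden illustration: the workflow names, the "business/level" group labels, and the helper functions are all hypothetical and do not come from the patent or from Oozie.

```python
from graphlib import TopologicalSorter

# Hypothetical workflows: each has a "business/level" group and a list of
# upstream workflows whose output it consumes.
WORKFLOWS = {
    "collect_orders": {"group": "sales/ods", "depends_on": []},
    "clean_orders": {"group": "sales/dwd", "depends_on": ["collect_orders"]},
    "summarize_orders": {"group": "sales/dws", "depends_on": ["clean_orders"]},
    "collect_users": {"group": "user/ods", "depends_on": []},
    "order_user_report": {
        "group": "report/ads",
        "depends_on": ["summarize_orders", "collect_users"],
    },
}


def group_by_business(workflows):
    """Group workflow names by the business part of their group label."""
    groups = {}
    for name, meta in workflows.items():
        business = meta["group"].split("/")[0]
        groups.setdefault(business, []).append(name)
    return groups


def upstream_of(workflows, name):
    """Return every direct and transitive upstream workflow of `name`."""
    seen = set()
    stack = list(workflows[name]["depends_on"])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(workflows[current]["depends_on"])
    return seen


def execution_order(workflows):
    """Return an order in which each workflow follows all of its upstreams."""
    graph = {name: set(meta["depends_on"]) for name, meta in workflows.items()}
    return list(TopologicalSorter(graph).static_order())
```

With metadata like this kept outside the XML files, a question such as "what does order_user_report depend on?" becomes a simple lookup instead of a manual read through several workflow definitions.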
In addition, although the Oozie client provides command-line operations for task submission, start, kill, suspend, resume, monitoring, rerun, and scheduling, once the number of tasks grows large, every operation has to be performed on the server's command line. For day-to-day task operation and maintenance, this approach is clearly impractical.
Summary of the invention
The purpose of the present invention is to provide a scheduling device for big data processing workflow tasks based on the Apache Oozie framework, which can greatly speed up the development of workflow tasks that process big data.
The technical solution of the present invention is a scheduling device for big data processing workflow tasks based on the Apache Oozie framework, comprising a client (front end) and a server (back end). The client includes an interface operation module, and the server includes a server-side operation module. The interface operation module includes three modules: task submission, task operation, and task monitoring.
The front-end interface operation module serves as the entry point for user operations. The server includes a control layer (Controller), a service layer (Service), and a storage layer.
The control layer calls the Service layer, sending requests to the Service layer according to the parameters submitted by the user.
The Service layer includes three modules: task generation, task operation, and task monitoring. The Service layer handles the core business logic of the entire big data processing workflow and sends the user's final request to the Oozie server, which ultimately executes the task. The task monitoring module periodically queries and aggregates task states, generates a task monitoring report from a statistical analysis of task runs, and displays it on the task monitoring interface. If a task fails, the task monitoring module sends an e-mail to alert the submitter.
The storage layer includes a MySQL storage module and an HDFS storage module. The MySQL storage module stores the metadata of the workflow tasks, and the HDFS storage module stores their definition files. The definition files of all generated tasks are stored on HDFS, and all task-run metadata, including task run results, is stored in the MySQL database.
Further, the task operation module of the server includes start, suspend, modify, pause, resume, rerun, kill, delete, and submit units.
Further, the interface operation module includes task grouping, task dependency query, and batch processing units.
Preferably, the control layer uses Spring MVC.
Preferably, the task generation module uses Velocity as its template engine.
Preferably, the client is implemented with ElementUI + Vue.js + ECharts.
Preferably, the server uses Spring Boot + Spring MVC + Spring + MyBatis to wrap Oozie in a second layer of encapsulation.
Preferably, the client and the server use Maven as the build and packaging tool.
The modules of the present invention cooperate as follows:
On the task submission interface, the user selects the required workflow nodes and fills in the form attributes. For a Coordinator task, the user must also specify key attributes such as the task's input and output paths and its running frequency. The user submits the form to the Spring MVC control layer, which calls the task generation module. Using Velocity templates, the task generation module generates a different action definition for each task type and, once all nodes have been processed, merges them into the workflow.xml that the Oozie workflow requires. For a Coordinator task it also generates coordinator.xml. It then calls the HDFS storage module API to upload the generated files to HDFS. If the user chose to run the task immediately when submitting it, the control layer calls the task operation module, which calls OozieClient to send a submit command to the Oozie server. The Oozie server starts the task, and the task's state is updated and its metadata is stored in the MySQL database.
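The template-driven generation step can be illustrated with a small sketch. It is not the patented implementation: Python's `string.Template` stands in for Velocity, the two file-system action templates and the `generate_workflow` and `action_names` helpers are hypothetical, and only a linear chain of nodes is handled.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

NS = "uri:oozie:workflow:0.5"

# One template per node type; string.Template stands in for Velocity here.
ACTION_TEMPLATES = {
    "fs_mkdir": Template(
        '  <action name="$name">\n'
        '    <fs><mkdir path="$path"/></fs>\n'
        '    <ok to="$next"/>\n'
        '    <error to="fail"/>\n'
        '  </action>\n'
    ),
    "fs_delete": Template(
        '  <action name="$name">\n'
        '    <fs><delete path="$path"/></fs>\n'
        '    <ok to="$next"/>\n'
        '    <error to="fail"/>\n'
        '  </action>\n'
    ),
}

WORKFLOW_TEMPLATE = Template(
    '<workflow-app name="$name" xmlns="' + NS + '">\n'
    '  <start to="$first"/>\n'
    '$actions'
    '  <kill name="fail">\n'
    '    <message>Workflow failed</message>\n'
    '  </kill>\n'
    '  <end name="end"/>\n'
    '</workflow-app>\n'
)


def _attr(value):
    """Escape a form value so it is safe inside a double-quoted XML attribute."""
    return escape(value, {'"': "&quot;"})


def generate_workflow(name, nodes):
    """Render each form node with its template and merge into one document.

    `nodes` is a list of dicts with 'name', 'type', and 'path', already in
    execution order; each node transitions to the next, the last to 'end'.
    """
    if not nodes:
        raise ValueError("a workflow needs at least one node")
    parts = []
    for index, node in enumerate(nodes):
        next_name = nodes[index + 1]["name"] if index + 1 < len(nodes) else "end"
        parts.append(ACTION_TEMPLATES[node["type"]].substitute(
            name=_attr(node["name"]), path=_attr(node["path"]), next=_attr(next_name)))
    return WORKFLOW_TEMPLATE.substitute(
        name=_attr(name), first=_attr(nodes[0]["name"]), actions="".join(parts))


def action_names(xml_text):
    """Parse the generated document and list its action names in order."""
    root = ET.fromstring(xml_text)
    return [action.get("name") for action in root.findall("{%s}action" % NS)]
```

Keeping one template per node type means that supporting a new node type only requires a new template, while the merge step stays unchanged. A coordinator.xml could be produced the same way from a separate template.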
For task operation, the user only needs to select tasks in the task list on the interface and click the appropriate operation button. The request is sent to the control layer via Ajax, and the control layer calls the task operation module, which calls OozieClient to send the command to the Oozie server. The Oozie server executes it, and the task's state, including the state of the corresponding task record in MySQL, is updated.
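The operation path described above can be sketched as follows. Everything here is a stand-in under stated assumptions: `FakeOozieClient` only records what would be sent and is not the real Oozie client API, the in-memory `task_store` dictionary replaces the MySQL table, and the operation-to-state mapping is a simplified guess.

```python
class FakeOozieClient:
    """Records commands instead of contacting a real Oozie server."""

    def __init__(self):
        self.sent = []

    def send(self, operation, job_id):
        self.sent.append((operation, job_id))


# Assumed state recorded for a task after each operation succeeds.
STATE_AFTER = {
    "start": "RUNNING",
    "suspend": "SUSPENDED",
    "resume": "RUNNING",
    "rerun": "RUNNING",
    "kill": "KILLED",
}


def operate_task(client, task_store, task_id, operation):
    """Forward one operation to the Oozie client and update the stored state."""
    if operation not in STATE_AFTER:
        raise ValueError("unsupported operation: " + operation)
    task = task_store[task_id]
    client.send(operation, task["oozie_job_id"])
    task["state"] = STATE_AFTER[operation]
    return task["state"]


def operate_tasks(client, task_store, task_ids, operation):
    """Apply the same operation to every selected task in the task list."""
    return {task_id: operate_task(client, task_store, task_id, operation)
            for task_id in task_ids}
```

Because the state change lives in one function, selecting several tasks in the list reduces to calling it once per task.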
For task monitoring, the front end polls the task monitoring module via Ajax every five minutes. The task monitoring module reads the task-run data from the MySQL database, groups, aggregates, and sorts it by different dimensions, and returns the results to the front end, which renders the report dynamically. If a task has failed, an e-mail notification is sent to the task's submitter.
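The aggregation performed by the monitoring module might look like the sketch below. It rests on several assumptions: an in-memory SQLite database stands in for MySQL, the `task_run` table and its columns are hypothetical, and the notices are returned as strings rather than actually mailed.

```python
import sqlite3


def demo_connection():
    """Create an in-memory stand-in for the task-run metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_run (task_name TEXT, task_group TEXT, "
        "submitter TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO task_run VALUES (?, ?, ?, ?)",
        [
            ("clean_orders", "sales", "alice@example.com", "SUCCEEDED"),
            ("summarize_orders", "sales", "alice@example.com", "FAILED"),
            ("collect_users", "user", "bob@example.com", "SUCCEEDED"),
        ])
    return conn


def build_report(conn):
    """Group and sort task runs by status and by task group."""
    by_status = dict(conn.execute(
        "SELECT status, COUNT(*) FROM task_run GROUP BY status").fetchall())
    by_group = conn.execute(
        "SELECT task_group, COUNT(*), SUM(status = 'FAILED') AS failed "
        "FROM task_run GROUP BY task_group "
        "ORDER BY failed DESC, task_group").fetchall()
    failed = conn.execute(
        "SELECT task_name, submitter FROM task_run "
        "WHERE status = 'FAILED' ORDER BY task_name").fetchall()
    return {"by_status": by_status, "by_group": by_group, "failed": failed}


def failure_notices(report):
    """Build one notice per failed task, addressed to its submitter."""
    return [(submitter, "Task %s failed" % task_name)
            for task_name, submitter in report["failed"]]
```

The front end would request such a report on each polling cycle and render `by_status` and `by_group` as charts, while the notices feed the mail step.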
The beneficial effects of the present invention are as follows. For the development process:
The user only needs to fill in a form on the interface, and many attributes are drop-down selections. When the user submits the form, the workflow definition file workflow.xml is generated automatically and saved on HDFS. When the user needs to modify the workflow file, it can be edited directly online. Task development and submission are completed entirely on the interface, with no need to call the HDFS API or Oozie client commands manually.
For the operation and maintenance process:
The present invention provides a rich set of interface operations and supports task grouping and task dependency query. Batch processing is available for all operations, so even as the number of tasks grows, tasks can still be operated on quickly in batches, making large-scale task operation and maintenance simple and efficient.
Brief description of the drawings
Fig. 1 is an architecture diagram of the Apache Oozie framework described in the background art;
Fig. 2 is a schematic diagram of workflow task scheduling when developing and operating with the Apache Oozie framework, as described in the background art;
Fig. 3 is an overall architecture diagram of the scheduling method of the present invention;
Fig. 4 is a schematic diagram of the operation interface of an embodiment of the present invention.
Detailed description of the embodiments
An embodiment of the present invention is further described below with reference to the drawings, as shown in Fig. 3 and Fig. 4.
A scheduling device for big data processing workflow tasks based on the Apache Oozie framework comprises a client (front end) and a server (back end). The client includes an interface operation module, and the server includes a server-side operation module. The interface operation module includes three modules: task submission, task operation, and task monitoring.
The front-end interface operation module serves as the entry point for user operations. The server includes a control layer (Controller), a service layer (Service), and a storage layer.
The control layer calls the Service layer, sending requests to the Service layer according to the parameters submitted by the user.
The Service layer includes three modules: task generation, task operation, and task monitoring. The Service layer handles the core business logic of the entire big data processing workflow and sends the user's final request to the Oozie server, which ultimately executes the task. The task monitoring module periodically queries and aggregates task states, generates a task monitoring report from a statistical analysis of task runs, and displays it on the task monitoring interface. If a task fails, the task monitoring module sends an e-mail to alert the submitter.
The storage layer includes a MySQL storage module and an HDFS storage module. The MySQL storage module stores the metadata of the workflow tasks, and the HDFS storage module stores their definition files. The definition files of all generated tasks are stored on HDFS, and all task-run metadata, including task run results, is stored in the MySQL database.
Further, the task operation module of the server includes start, suspend, modify, pause, resume, rerun, kill, delete, and submit units.
Further, the interface operation module includes task grouping, task dependency query, and batch processing units.
Preferably, the control layer uses Spring MVC. The task generation module uses Velocity as its template engine. The client is implemented with ElementUI + Vue.js + ECharts. The server uses Spring Boot + Spring MVC + Spring + MyBatis to wrap Oozie in a second layer of encapsulation. The client and the server use Maven as the build and packaging tool.
The modules of the present invention cooperate as follows:
On the task submission interface, the user selects the required workflow nodes and fills in the form attributes. For a Coordinator task, the user must also specify key attributes such as the task's input and output paths and its running frequency. The user submits the form to the Spring MVC control layer, which calls the task generation module. Using Velocity templates, the task generation module generates a different action definition for each task type and, once all nodes have been processed, merges them into the workflow.xml that the Oozie workflow requires. For a Coordinator task it also generates coordinator.xml. It then calls the HDFS storage module API to upload the generated files to HDFS. If the user chose to run the task immediately when submitting it, the control layer calls the task operation module, which calls OozieClient to send a submit command to the Oozie server. The Oozie server starts the task, and the task's state is updated and its metadata is stored in the MySQL database.
For task operation, the user only needs to select tasks in the task list on the interface and click the appropriate operation button. The request is sent to the control layer via Ajax, and the control layer calls the task operation module, which calls OozieClient to send the command to the Oozie server. The Oozie server executes it, and the task's state, including the state of the corresponding task record in MySQL, is updated.
For task monitoring, the front end polls the task monitoring module via Ajax every five minutes. The task monitoring module reads the task-run data from the MySQL database, groups, aggregates, and sorts it by different dimensions, and returns the results to the front end, which renders the report dynamically. If a task has failed, an e-mail notification is sent to the task's submitter.
The above description relates only to certain specific embodiments of the present invention. Any replacement or improvement made by a person skilled in the art in the spirit of the present invention shall fall within the scope of protection of the present invention, which shall be determined by the claims.
Claims (8)
1. A scheduling device for big data processing workflow tasks based on the Apache Oozie framework, comprising a client (front end) and a server (back end), wherein the client includes an interface operation module and the server includes a server-side operation module; the interface operation module includes three modules: task submission, task operation, and task monitoring;
the front-end interface operation module serves as the entry point for user operations; the server includes a control layer (Controller), a service layer (Service), and a storage layer;
the control layer calls the Service layer, sending requests to the Service layer according to the parameters submitted by the user;
the Service layer includes three modules: task generation, task operation, and task monitoring; the Service layer handles the core business logic of the entire big data processing workflow and sends the user's final request to the Oozie server, which ultimately executes the task; the task monitoring module periodically queries and aggregates task states, generates a task monitoring report from a statistical analysis of task runs, and displays it on the task monitoring interface; if a task fails, the task monitoring module sends an e-mail to alert the submitter;
the storage layer includes a MySQL storage module and an HDFS storage module; the MySQL storage module stores the metadata of the workflow tasks; the HDFS storage module stores the definition files of the workflow tasks; the definition files of all generated tasks are stored on HDFS, and all task-run metadata, including task run results, is stored in the MySQL database.
2. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1, characterized in that the task operation module of the server includes start, suspend, modify, pause, resume, rerun, kill, delete, and submit units.
3. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1 or 2, characterized in that the interface operation module includes task grouping, task dependency query, and batch processing units.
4. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1 or 2, characterized in that the control layer uses Spring MVC.
5. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1 or 2, characterized in that the task generation module uses Velocity as its template engine.
6. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1 or 2, characterized in that the client is implemented with ElementUI + Vue.js + ECharts.
7. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1 or 2, characterized in that the server uses Spring Boot + Spring MVC + Spring + MyBatis to wrap Oozie in a second layer of encapsulation.
8. The scheduling device for big data processing workflow tasks based on the Apache Oozie framework according to claim 1 or 2, characterized in that the client and the server use Maven as the build and packaging tool.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811204278.5A CN109284324A (en) | 2018-10-16 | 2018-10-16 | Scheduling device for big data processing workflow tasks based on the Apache Oozie framework |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109284324A true CN109284324A (en) | 2019-01-29 |
Family
ID=65177737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811204278.5A Pending CN109284324A (en) | 2018-10-16 | 2018-10-16 | Scheduling device for big data processing workflow tasks based on the Apache Oozie framework |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109284324A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111694650A (en) * | 2020-06-17 | 2020-09-22 | 科技谷(厦门)信息技术有限公司 | Multidimensional data job scheduling system |
CN111708751A (en) * | 2019-12-27 | 2020-09-25 | 山东鲁能软件技术有限公司 | Method, system, equipment and readable storage medium for realizing data loading based on Hue |
CN113220438A (en) * | 2021-06-02 | 2021-08-06 | 中国邮政储蓄银行股份有限公司 | System for executing operation, method and device for testing batch operation |
CN113326117A (en) * | 2021-07-15 | 2021-08-31 | 中国电子科技集团公司第十五研究所 | Task scheduling method, device and equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101478431A (en) * | 2009-02-10 | 2009-07-08 | 浪潮通信信息系统有限公司 | Task scheduling system for management by visible process |
CN101694709A (en) * | 2009-09-27 | 2010-04-14 | 华中科技大学 | Service-oriented distributed work flow management system |
CN103577256A (en) * | 2013-11-21 | 2014-02-12 | 五八同城信息技术有限公司 | Distributed timed task dispatching system |
CN104536809A (en) * | 2014-11-26 | 2015-04-22 | 上海瀚之友信息技术服务有限公司 | Distributed timing task scheduling system based on client and server system |
US9172608B2 (en) * | 2012-02-07 | 2015-10-27 | Cloudera, Inc. | Centralized configuration and monitoring of a distributed computing cluster |
CN105446812A (en) * | 2016-01-04 | 2016-03-30 | 中国南方电网有限责任公司 | Multitask scheduling configuration method |
CN105867907A (en) * | 2016-03-23 | 2016-08-17 | 沈阳师范大学 | JSS multi-layer Web development framework design method removing service coupling |
CN108037919A (en) * | 2017-12-01 | 2018-05-15 | 北京博宇通达科技有限公司 | A kind of visualization big data workflow configuration method and system based on WEB |
Non-Patent Citations (1)
Title |
---|
万川梅: "《云技术应用》", 31 August 2013, 成都:西南交通大学出版社 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111708751A (en) * | 2019-12-27 | 2020-09-25 | 山东鲁能软件技术有限公司 | Method, system, equipment and readable storage medium for realizing data loading based on Hue |
CN111708751B (en) * | 2019-12-27 | 2024-02-02 | 山东鲁能软件技术有限公司 | Method, system, equipment and readable storage medium for realizing data loading based on Hue |
CN111694650A (en) * | 2020-06-17 | 2020-09-22 | 科技谷(厦门)信息技术有限公司 | Multidimensional data job scheduling system |
CN113220438A (en) * | 2021-06-02 | 2021-08-06 | 中国邮政储蓄银行股份有限公司 | System for executing operation, method and device for testing batch operation |
CN113326117A (en) * | 2021-07-15 | 2021-08-31 | 中国电子科技集团公司第十五研究所 | Task scheduling method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190129 |