CN111625269A - Web-based universal Spark task submission system and method - Google Patents
- Publication number
- CN111625269A (application number CN202010408018.0A)
- Authority
- CN
- China
- Prior art keywords
- task
- spark
- submission
- database
- web
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/71—Version control; Configuration management
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a Web-based universal Spark task submission system and method. The system comprises a Web interface module, a submission service module, a Spark task uploading module and a database. The Web interface module displays pages and exchanges data with the database: it saves the Spark task submission information configured by a user to the database and displays the task processing monitoring information held in the database. The submission service module exchanges data with the database, reads the Spark task submission information stored there and generates a submission command. The Spark task uploading module exchanges data with the submission service module and the database; after acquiring a submission command, it reads the Spark task submission information, generates the jar package of the corresponding Spark task and uploads it to a Spark cluster, and at the same time stores the task processing monitoring information to the database. Developers need not write code or submit tasks by hand; they only configure the corresponding data processing flow and task operating parameters on the Web, and the task is submitted to the Spark cluster automatically.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a Web-based universal Spark task submission system and a Web-based universal Spark task submission method.
Background
Spark is a general-purpose in-memory parallel computing framework developed by the AMPLab (Algorithms, Machines, and People Lab) at the University of California, Berkeley. It is an open-source cluster computing framework used to build large-scale, low-latency data analysis applications, and a fast, general-purpose computing engine designed for large-scale data processing. It can perform many kinds of operations, including SQL queries, text processing and machine learning, and it can run in several modes such as Standalone, YARN and Mesos (common Spark installation and deployment modes). A developer has to receive, process and output data following Spark's particular programming model, then manually build a jar package and submit it to the corresponding Spark cluster so that Spark can allocate cluster resources to run the task.
Against this background, developers need to write a separate Spark project for each task and package it as a jar to submit to the Spark cluster. The programming pattern of these projects is almost always the same: the project reads data from an input source, produces new data through RDD, DataFrame or Dataset transformations and outputs it, after which a jar package is built manually and submitted to the Spark cluster. As the number of projects grows, a large amount of duplicated code accumulates, much time is spent building jar packages and submitting tasks, and development cost increases.
Disclosure of Invention
In the prior art, a Spark project has to be written, packaged as a jar and submitted to the Spark cluster separately for each task, which increases development cost. To address this problem, the invention provides a Web-based universal Spark task submission system and method suitable for most data processing tasks: developers only configure the corresponding data processing flow and task operating parameters on the Web, and the task is submitted to the Spark cluster automatically, without writing code or submitting the task by hand.
In order to achieve this purpose, the invention provides a Web-based universal Spark task submission system, which comprises a Web interface module, a submission service module, a Spark task uploading module and a database, wherein the Web interface module, the submission service module, the Spark task uploading module and the database are all arranged on a Web server;
the Web interface module is used for displaying pages and exchanging data with the database, saving the Spark task submission information configured by a user on the Web interface module to the database, and displaying the task processing monitoring information in the database;
the submission service module exchanges data with the database, reads the Spark task submission information stored in the database and generates a submission command;
after acquiring a submission command, the Spark task uploading module reads the Spark task submission information in the database, generates the jar package of the corresponding Spark task, uploads the jar package to a Spark cluster for task processing, and stores the task processing monitoring information to the database.
As a further improvement of the above technical solution, the Spark task submission information includes a task data source, a task flow, a task output source, and task submission parameters.
As a further improvement of the above technical solution, the task flow includes, but is not limited to, at least one of filtering, splitting, and splicing.
As a further improvement of the above technical solution, the task processing monitoring information includes, but is not limited to, whether the task started successfully, consumed memory, remaining memory, running time, and the number of CPU cores consumed.
In order to achieve the above object, the present invention further provides a Web-based generic Spark task submission method, which is characterized by comprising the following steps:
step 1, editing Spark task submission information on a page of a Web interface module, and storing the Spark task submission information to a database;
step 2, writing a Spark project, determining a main class as the entry point of the Spark task, and passing the task ID associated with the Spark task submission information of step 1 to the main class;
step 3, searching the database for the corresponding Spark task submission information based on the task ID, generating the data processing logic in the Spark project according to the Spark task submission information, and then packaging the Spark project as a jar package stored on the Web server;
step 4, the submission service module generates a submission command based on the Spark task submission information and the task ID;
step 5, after acquiring the submission command sent by the submission service module, the Spark task uploading module uploads the jar package of step 3 to a Spark cluster for task processing, and at the same time acquires the corresponding task processing monitoring information and stores it to the database;
step 6, the Web interface module reads the task processing monitoring information in the database, generates a visual chart and displays it on a page of the Web interface module.
As a further improvement of the above technical solution, in step 1, the Spark task submission information includes a task data source, a task flow, a task output source, and task submission parameters.
As a further improvement of the above technical solution, in step 3, while the Spark project is being written, a timed monitoring thread is written in the main method of the project's main class, and the SparkContext is passed to that thread so that it can acquire the task processing monitoring information and finally store it in the database.
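The timed monitoring thread described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the patent's thread reads from the SparkContext, whereas here a stand-in callable `collect_metrics` supplies the sample, and the function name `start_monitor`, the table `task_monitor` and its columns are all hypothetical.

```python
import sqlite3
import threading
import time


def start_monitor(collect_metrics, db_path, task_id, interval_s, stop_event):
    """Poll collect_metrics() every interval_s seconds and store each sample.

    collect_metrics is a stand-in for whatever would query the SparkContext;
    it must return a dict with a "used_memory_mb" entry (assumed layout).
    """

    def run():
        # The worker opens its own connection, because sqlite3 connections
        # are not shared across threads by default.
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS task_monitor ("
            "task_id INTEGER, ts REAL, used_memory_mb REAL, run_time_s REAL)"
        )
        started = time.time()
        while not stop_event.is_set():
            sample = collect_metrics()
            now = time.time()
            conn.execute(
                "INSERT INTO task_monitor VALUES (?, ?, ?, ?)",
                (task_id, now, sample["used_memory_mb"], now - started),
            )
            conn.commit()
            # Sleep until the next tick, waking early if asked to stop.
            stop_event.wait(interval_s)
        conn.close()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

The caller keeps the `threading.Event` and sets it when the task finishes, so the loop exits cleanly instead of being killed mid-write.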
The invention provides a Web-based universal Spark task submission system and method. To address the prior-art problem that a Spark project has to be written, packaged as a jar and submitted to the Spark cluster separately for each task, which increases development cost, the invention adopts a B/S (Browser/Server) architecture to provide visual configuration of Spark data flow processing. It is suitable for most data processing tasks: developers only configure the corresponding data processing flow and task operating parameters on the Web, and the task is submitted to the Spark cluster automatically, without writing code or submitting the task by hand.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
FIG. 1 is a schematic block diagram of a Web-based generic Spark task submission system according to an embodiment of the present invention;
FIG. 2 is a schematic flow diagram of a Web-based generic Spark task submission method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that all the directional indicators (such as up, down, left, right, front, and rear … …) in the embodiment of the present invention are only used to explain the relative position relationship between the components, the movement situation, etc. in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indicator is changed accordingly.
In addition, the descriptions related to "first", "second", etc. in the present invention are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "connected," "secured," and the like are to be construed broadly, and for example, "secured" may be a fixed connection, a removable connection, or an integral part; the connection can be mechanical connection, electrical connection, physical connection or wireless communication connection; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In addition, the technical solutions in the embodiments of the present invention may be combined with each other, but it must be based on the realization of those skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination of technical solutions should not be considered to exist, and is not within the protection scope of the present invention.
As shown in FIG. 1, in the Web-based universal Spark task submission system of this embodiment, a Web interface module, a submission service module, a Spark task uploading module and a database are all arranged on a Web server;
the Web interface module is used for displaying pages and exchanging data with the database, saving the Spark task submission information configured by a user on the Web interface module to the database, and displaying the task processing monitoring information in the database, wherein the Spark task submission information comprises a task data source, a task flow, a task output source and task submission parameters, and the task flow comprises at least one of filtering, splitting and splicing;
the submission service module exchanges data with the database, reads the Spark task submission information stored in the database and generates a submission command;
the Spark task uploading module exchanges data with the submission service module and the database; after acquiring a submission command, it reads the Spark task submission information in the database, generates the jar package of the corresponding Spark task, uploads the jar package to a Spark cluster for task processing, and stores the task processing monitoring information to the database, wherein the task processing monitoring information includes but is not limited to whether the task started successfully, consumed memory, remaining memory, running time and the number of CPU cores consumed.
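As an illustration of how the Spark task submission information might be kept in the database and retrieved by task ID, the following Python sketch stores one configuration record as JSON in SQLite. The patent does not specify a schema or a database product, so the class `SparkTaskConfig`, the table `spark_task`, the helper names and the use of SQLite are all assumptions made for the example.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class SparkTaskConfig:
    """Hypothetical record for one task configured on the Web page."""
    data_source: str      # task data source
    flow: list            # ordered task flow steps (filtering, splitting, splicing)
    output_source: str    # task output source
    submit_params: dict = field(default_factory=dict)  # task submission parameters


def save_task(conn, cfg):
    """Store the configuration as JSON and return the generated task ID."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spark_task ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, config TEXT NOT NULL)"
    )
    cur = conn.execute(
        "INSERT INTO spark_task (config) VALUES (?)",
        (json.dumps(asdict(cfg)),),
    )
    conn.commit()
    return cur.lastrowid


def load_task(conn, task_id):
    """Look up a configuration by task ID, as the Spark main class would."""
    row = conn.execute(
        "SELECT config FROM spark_task WHERE id = ?", (task_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no task with id {task_id}")
    return SparkTaskConfig(**json.loads(row[0]))
```

Keeping the whole configuration in one JSON column is a simplification; a real deployment would more likely split the flow steps and submission parameters into their own tables so the Web page can edit them individually.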
Referring to FIG. 2, the Web-based universal Spark task submission method of this embodiment includes the following steps:
step 1, building a Web project whose Web interface module page displays information such as the task data source, task flow (filtering, splitting, splicing and the like), task output source and task submission parameters; a developer edits and saves this information, which is then stored in a database according to a set rule;
step 2, writing a Spark project and determining a main class as the entry point of the Spark task, such as cn.com.SparkApp, and passing the task ID associated with the Spark task submission information of step 1 to the main class;
step 3, searching the database for the corresponding Spark task submission information based on the task ID; after the Spark project generates its data processing logic from the Spark task submission information, the project is packaged as a jar and stored at a designated location on the Web server, such as /home/spark/spark.jar;
while the Spark project is being written, a timed monitoring thread is written in the main method of the project's main class, and the SparkContext is passed to that thread so that it can acquire the task processing monitoring information and finally store it in the database;
step 4, writing a service that submits the Spark jar package; the service polls the database periodically, queries the task IDs to be started and their task submission parameters, and assembles a spark-submit command from the directory where the spark-submit file is located; the command is approximately as follows:
/home/bin/spark-submit --class cn.com.SparkApp \
--master master \
--deploy-mode yarn \
--driver-memory 1g \
--executor-memory 2g \
--executor-cores 10 \
--queue default \
/home/spark/spark.jar 4
where /home/bin/spark-submit is the path where spark-submit is located; --class cn.com.SparkApp specifies the main class run by the Spark project; --master master specifies the master mode; --deploy-mode yarn specifies running on YARN; --driver-memory 1g specifies 1g of running memory for the driver; --executor-memory 2g specifies 2g of memory for each executor; --executor-cores 10 specifies 10 cores for each executor to run tasks; --queue default specifies the queue named default; /home/spark/spark.jar is the path of the Spark jar package; and the final 4 is the ID of the task to run; the assembled command is then executed;
step 5, after acquiring the submission command sent by the submission service module, the Spark task uploading module uploads the jar package of step 3 to a Spark cluster for task processing, while the monitoring thread acquires the corresponding task processing monitoring information and stores it to the database;
step 6, the Web interface module reads the task processing monitoring information in the database, generates a visual chart and displays it on a page of the Web interface module.
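Step 4 assembles the submission command from stored parameters. A minimal Python sketch of that assembly follows, assuming the parameters arrive as a dictionary keyed by option name; the function name `build_submit_command` and the dictionary layout are illustrative and do not come from the patent. The paths, main class and task ID are the ones used in the embodiment. The sketch passes `--master yarn` with `--deploy-mode cluster`, which is the usual spark-submit way of targeting YARN, in place of the `--master master` and `--deploy-mode yarn` values shown in the embodiment's listing.

```python
import shlex


def build_submit_command(spark_submit, main_class, jar_path, task_id, params):
    """Assemble a spark-submit argument list from stored submission parameters."""
    cmd = [spark_submit, "--class", main_class]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    # The application jar follows all options; the task ID is handed to the
    # main class as its only application argument.
    cmd += [jar_path, str(task_id)]
    return cmd


# Hypothetical parameter set, as it might be read from the database.
params = {
    "master": "yarn",
    "deploy-mode": "cluster",
    "driver-memory": "1g",
    "executor-memory": "2g",
    "executor-cores": 10,
    "queue": "default",
}
cmd = build_submit_command(
    "/home/bin/spark-submit", "cn.com.SparkApp", "/home/spark/spark.jar", 4, params
)
print(shlex.join(cmd))
```

Building the command as a list and running it with `subprocess.run(cmd, check=True)` avoids shell quoting problems when a parameter value contains spaces; `shlex.join` is used here only to display the command.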
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (7)
1. A Web-based universal Spark task submission system, characterized by comprising a Web interface module, a submission service module, a Spark task uploading module and a database, wherein the Web interface module, the submission service module, the Spark task uploading module and the database are all arranged on a Web server;
the Web interface module is used for displaying pages and exchanging data with the database, saving the Spark task submission information configured by a user on the Web interface module to the database, and displaying the task processing monitoring information in the database;
the submission service module exchanges data with the database, reads the Spark task submission information stored in the database and generates a submission command;
after acquiring a submission command, the Spark task uploading module reads the Spark task submission information in the database, generates the jar package of the corresponding Spark task, uploads the jar package to a Spark cluster for task processing, and stores the task processing monitoring information to the database.
2. The Web-based generic Spark task submission system of claim 1, wherein the Spark task submission information includes task data sources, task flows, task output sources, and task submission parameters.
3. The Web-based generic Spark task submission system of claim 2, wherein the task flow includes, but is not limited to, at least one of filtering, splitting, and splicing.
4. The Web-based generic Spark task submission system of claim 1, wherein the task processing monitoring information includes, but is not limited to, whether a task was successfully started, consumed memory, remaining memory, run time, and number of cpu cores consumed.
5. A Web-based universal Spark task submission method is characterized by comprising the following steps:
step 1, editing Spark task submission information on a page of a Web interface module, and storing the Spark task submission information to a database;
step 2, writing a Spark project, determining a main class as the entry point of the Spark task, and passing the task ID associated with the Spark task submission information of step 1 to the main class;
step 3, searching the database for the corresponding Spark task submission information based on the task ID, generating the data processing logic in the Spark project according to the Spark task submission information, and then packaging the Spark project as a jar package stored on the Web server;
step 4, the submission service module generates a submission command based on the Spark task submission information and the task ID;
step 5, after acquiring the submission command sent by the submission service module, the Spark task uploading module uploads the jar package of step 3 to a Spark cluster for task processing, and at the same time acquires the corresponding task processing monitoring information and stores it to the database;
step 6, the Web interface module reads the task processing monitoring information in the database, generates a visual chart and displays it on a page of the Web interface module.
6. The Web-based universal Spark task submission method according to claim 5, wherein in step 1, the Spark task submission information includes a task data source, a task flow, a task output source, and task submission parameters.
7. The Web-based universal Spark task submission method according to claim 5, wherein in step 3, while the Spark project is being written, a timed monitoring thread is written in the main method of the project's main class, and the SparkContext is passed to that thread so that it can acquire the task processing monitoring information and finally store it in the database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010408018.0A CN111625269A (en) | 2020-05-14 | 2020-05-14 | Web-based universal Spark task submission system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010408018.0A CN111625269A (en) | 2020-05-14 | 2020-05-14 | Web-based universal Spark task submission system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111625269A (en) | 2020-09-04 |
Family
ID=72271946
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010408018.0A Pending CN111625269A (en) | 2020-05-14 | 2020-05-14 | Web-based universal Spark task submission system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111625269A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112486501A (en) * | 2020-11-17 | 2021-03-12 | 中国人寿保险股份有限公司 | Spark application deployment management method and related equipment |
CN112612514A (en) * | 2020-12-31 | 2021-04-06 | 青岛海尔科技有限公司 | Program development method and device, storage medium and electronic device |
CN113448657A (en) * | 2021-09-01 | 2021-09-28 | 深圳市信润富联数字科技有限公司 | Method for generating and executing dynamic spark task |
CN113742040A (en) * | 2021-08-09 | 2021-12-03 | 广州市易工品科技有限公司 | Method and device for quickly generating distributed batch processing tasks based on visual interface |
CN115529306A (en) * | 2022-07-22 | 2022-12-27 | 四川启睿克科技有限公司 | Spark jar package remote submission method based on springboot |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1642105A (en) * | 2004-01-05 | 2005-07-20 | 华为技术有限公司 | Method for realizing task management for network system |
CN106777101A (en) * | 2016-12-14 | 2017-05-31 | 深圳天源迪科信息技术股份有限公司 | Data processing engine |
CN107977260A (en) * | 2017-11-23 | 2018-05-01 | 北京神州泰岳软件股份有限公司 | Task submits method and device |
CN110008242A (en) * | 2019-03-12 | 2019-07-12 | 广州亚美信息科技有限公司 | One kind being based on Spark streaming program generator and program data processing method |
CN110781007A (en) * | 2019-10-31 | 2020-02-11 | 广州市网星信息技术有限公司 | Task processing method, device, server, client, system and storage medium |
CN110889108A (en) * | 2019-11-26 | 2020-03-17 | 网易(杭州)网络有限公司 | spark task submitting method and device and server |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112486501A (en) * | 2020-11-17 | 2021-03-12 | 中国人寿保险股份有限公司 | Spark application deployment management method and related equipment |
CN112486501B (en) * | 2020-11-17 | 2024-06-25 | 中国人寿保险股份有限公司 | Spark application deployment management method and related equipment |
CN112612514A (en) * | 2020-12-31 | 2021-04-06 | 青岛海尔科技有限公司 | Program development method and device, storage medium and electronic device |
CN112612514B (en) * | 2020-12-31 | 2023-11-28 | 青岛海尔科技有限公司 | Program development method and device, storage medium and electronic device |
CN113742040A (en) * | 2021-08-09 | 2021-12-03 | 广州市易工品科技有限公司 | Method and device for quickly generating distributed batch processing tasks based on visual interface |
CN113742040B (en) * | 2021-08-09 | 2024-04-19 | 广州市易工品科技有限公司 | Method and device for quickly generating distributed batch processing task based on visual interface |
CN113448657A (en) * | 2021-09-01 | 2021-09-28 | 深圳市信润富联数字科技有限公司 | Method for generating and executing dynamic spark task |
CN113448657B (en) * | 2021-09-01 | 2021-11-30 | 深圳市信润富联数字科技有限公司 | Method for generating and executing dynamic spark task |
CN115529306A (en) * | 2022-07-22 | 2022-12-27 | 四川启睿克科技有限公司 | Spark jar package remote submission method based on springboot |
CN115529306B (en) * | 2022-07-22 | 2024-05-17 | 四川启睿克科技有限公司 | Springboot-based remote submitting method for spark jar packets |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200904 |