CN111625269A - Web-based universal Spark task submission system and method - Google Patents

Publication number
CN111625269A
Authority
CN
China
Prior art keywords
task
spark
submission
database
web
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010408018.0A
Other languages
Chinese (zh)
Inventor
贺群雄
匡岳锋
胡鹏
傅苗
曹林
刘湘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Power Industry Internet Co ltd
Original Assignee
China Power Industry Internet Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Power Industry Internet Co ltd
Priority to CN202010408018.0A
Publication of CN111625269A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/70: Software maintenance or management
    • G06F8/71: Version control; Configuration management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a Web-based universal Spark task submission system and method. The system comprises a Web interface module, a submission service module, a Spark task uploading module and a database. The Web interface module displays pages and exchanges data with the database: it saves the Spark task submission information configured by a user to the database and displays the task processing monitoring information held in the database. The submission service module exchanges data with the database, reads the stored Spark task submission information and generates a submission command. The Spark task uploading module exchanges data with the submission service module and the database; after acquiring a submission command it reads the Spark task submission information, generates the JAR package of the corresponding Spark task and uploads it to a Spark cluster, and at the same time stores the task processing monitoring information in the database. Developers do not need to write code or submit tasks by hand; they only need to configure the data processing flow and task running parameters on the Web page, and the task is submitted to the Spark cluster automatically.

Description

Web-based universal Spark task submission system and method
Technical Field
The invention relates to the technical field of data processing, in particular to a Web-based universal Spark task submission system and a Web-based universal Spark task submission method.
Background
Spark is a general-purpose, in-memory parallel computing framework developed by the AMPLab (Algorithms, Machines, and People Lab) at the University of California, Berkeley. It is an open-source cluster computing framework and a fast, general computing engine designed for large-scale data processing, used to build large-scale, low-latency data analysis applications. It supports a variety of workloads, including SQL queries, text processing and machine learning, and can run in several common deployment modes such as Standalone, YARN and Mesos. A developer has to write code that receives, processes and outputs data according to Spark's programming model, then manually build a JAR package and submit it to the corresponding Spark cluster so that Spark can allocate cluster resources to run the task.
Against this background, developers must write a separate Spark project for each task and build it into a JAR package to submit to the Spark cluster. These projects follow almost the same pattern: read data from an input source, transform it through RDD, DataFrame or Dataset operations, output the newly generated data, and then manually build the JAR package and submit it to the Spark cluster. As the number of projects grows, a large amount of duplicated code accumulates and much time is spent building JAR packages and submitting tasks, which increases development cost.
Disclosure of Invention
To address the problem in the prior art that development cost rises because a separate Spark project must be written for each task, built into a JAR package and submitted to a Spark cluster, the invention provides a Web-based universal Spark task submission system and method suitable for most data processing tasks. Developers do not need to write code or submit tasks by hand; they only need to configure the data processing flow and task running parameters on the Web page, and the task is submitted to the Spark cluster automatically.
To achieve this purpose, the invention provides a Web-based universal Spark task submission system comprising a Web interface module, a submission service module, a Spark task uploading module and a database, all of which are arranged on a Web server;
the Web interface module displays pages and exchanges data with the database, saves the Spark task submission information configured by a user on the Web interface module to the database, and displays the task processing monitoring information held in the database;
the submission service module exchanges data with the database, reads the Spark task submission information stored in the database and generates a submission command;
after acquiring a submission command, the Spark task uploading module reads the Spark task submission information in the database, generates the JAR package of the corresponding Spark task, uploads the JAR package to a Spark cluster for task processing, and stores the task processing monitoring information in the database.
As a further improvement of the above technical solution, the Spark task submission information includes a task data source, a task flow, a task output source and task submission parameters.
As a further improvement of the above technical solution, the task flow includes, but is not limited to, at least one of filtering, dividing and splicing.
As a further improvement of the above technical solution, the task processing monitoring information includes, but is not limited to, whether the task started successfully, consumed memory, remaining memory, running time and the number of CPU cores consumed.
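The two kinds of record named above (Spark task submission information and task processing monitoring information) could be modeled as follows. This is a minimal sketch in Python; the class names, field names, units and the JSON encoding are illustrative assumptions and are not specified by the disclosure.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class SparkTaskSubmission:
    """Submission information configured on the Web page (hypothetical field names)."""
    task_id: int
    data_source: Dict[str, str]      # where the task reads its input
    flow: List[Dict[str, str]]       # ordered steps such as filter, split, splice
    output_source: Dict[str, str]    # where the task writes its result
    submit_params: Dict[str, str] = field(default_factory=dict)  # e.g. driver memory, queue

    def to_json(self) -> str:
        # Serialize to a string that can be stored in a single database column.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SparkTaskSubmission":
        return cls(**json.loads(text))


@dataclass
class TaskMonitoringInfo:
    """Monitoring values listed in the description (units are assumptions)."""
    task_id: int
    started: bool
    consumed_memory_mb: float
    remaining_memory_mb: float
    run_time_s: float
    consumed_cpu_cores: int
```

Keeping the whole configuration in one serializable object means the Web interface module and the Spark project only need to agree on the task ID and on this one record layout.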
To achieve the above object, the invention further provides a Web-based universal Spark task submission method comprising the following steps:
step 1, editing Spark task submission information on a page of the Web interface module and storing it in the database;
step 2, writing a Spark project, determining a main class as the entry point of the Spark task, and passing the task ID associated with the Spark task submission information of step 1 to the main class;
step 3, looking up the corresponding Spark task submission information in the database based on the task ID, generating in the Spark project the data processing logic to be used according to that information, and then building the Spark project into a JAR package stored on the Web server;
step 4, the submission service module generating a submission command based on the Spark task submission information and the task ID;
step 5, after acquiring the submission command sent by the submission service module, the Spark task uploading module uploading the JAR package of step 3 to a Spark cluster for task processing, while acquiring the corresponding task processing monitoring information and storing it in the database;
step 6, the Web interface module reading the task processing monitoring information from the database, generating a visual chart and displaying it on a page of the Web interface module.
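Step 3 turns stored configuration into data processing logic. The sketch below illustrates that idea with plain Python lists of dictionaries standing in for Spark RDD or DataFrame operations, so that it runs without a cluster. The operation names (filter, split, splice) follow the task flow named in the text; the step schema (`op`, `field`, `equals`, `sep`, `into`, `fields`) is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Step = Dict[str, Any]


def _filter(records: List[Record], step: Step) -> List[Record]:
    # Keep only records whose field equals the configured value.
    return [r for r in records if r.get(step["field"]) == step["equals"]]


def _split(records: List[Record], step: Step) -> List[Record]:
    # Divide one string field into several named fields.
    out = []
    for r in records:
        parts = str(r[step["field"]]).split(step["sep"])
        out.append({**r, **dict(zip(step["into"], parts))})
    return out


def _splice(records: List[Record], step: Step) -> List[Record]:
    # Concatenate several fields into one new field.
    return [
        {**r, step["into"]: step["sep"].join(str(r[f]) for f in step["fields"])}
        for r in records
    ]


# Dispatch table: configured operation name -> implementation.
OPERATIONS: Dict[str, Callable[[List[Record], Step], List[Record]]] = {
    "filter": _filter,
    "split": _split,
    "splice": _splice,
}


def run_flow(records: List[Record], flow: List[Step]) -> List[Record]:
    """Apply each configured step in order; an unknown operation raises KeyError."""
    for step in flow:
        records = OPERATIONS[step["op"]](records, step)
    return records
```

Because the flow is data rather than code, one generic project can serve many tasks; supporting a new operation only requires a new entry in the dispatch table.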
As a further improvement of the above technical solution, in step 1, the Spark task submission information includes a task data source, a task flow, a task output source and task submission parameters.
As a further improvement of the above technical solution, in step 3, while the Spark project is being written, a timed monitoring thread is written in the main method of the main class of the Spark project, and the SparkContext is passed into the thread so that it can acquire the task processing monitoring information, which is finally stored in the database.
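A timed monitoring thread of the kind described above could look like the following sketch. It is written in Python for illustration only: `collect` and `store` are hypothetical callables that stand in for reading metrics from the SparkContext and for writing a row to the database, and the sampling interval is an assumption.

```python
import threading
from typing import Callable, Dict


class TimedMonitor(threading.Thread):
    """Periodically samples metrics and hands them to a storage callback."""

    def __init__(self, collect: Callable[[], Dict], store: Callable[[Dict], None],
                 interval_s: float = 5.0) -> None:
        super().__init__(daemon=True)
        self._collect = collect          # hypothetical: would query the SparkContext
        self._store = store              # hypothetical: would insert into the database
        self._interval_s = interval_s
        self._stop_event = threading.Event()

    def run(self) -> None:
        # Sample once immediately, then once per interval until stop() is called.
        while True:
            self._store(self._collect())
            if self._stop_event.wait(self._interval_s):
                break

    def stop(self) -> None:
        self._stop_event.set()
```

Waiting on an event rather than sleeping lets `stop()` end the thread promptly when the task finishes, instead of after a full interval.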
The Web-based universal Spark task submission system and method provided by the invention address the problem in the prior art that development cost rises because a separate Spark project must be written for each task and built into a JAR package to submit to a Spark cluster. A B/S (Browser/Server) architecture is adopted to realize visual configuration of Spark data flow processing. The system is suitable for most data processing tasks: developers do not need to write code or submit tasks by hand, and only need to configure the data processing flow and task running parameters on the Web page for the task to be submitted to the Spark cluster automatically.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
FIG. 1 is a schematic block diagram of a Web-based generic Spark task submission system according to an embodiment of the present invention;
FIG. 2 is a schematic flow diagram of a Web-based generic Spark task submission method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that all the directional indicators (such as up, down, left, right, front, rear, etc.) in the embodiment of the present invention are only used to explain the relative position relationship between the components, the movement situation, etc. in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indicator is changed accordingly.
In addition, the descriptions related to "first", "second", etc. in the present invention are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "connected," "secured," and the like are to be construed broadly, and for example, "secured" may be a fixed connection, a removable connection, or an integral part; the connection can be mechanical connection, electrical connection, physical connection or wireless communication connection; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In addition, the technical solutions in the embodiments of the present invention may be combined with each other, but it must be based on the realization of those skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination of technical solutions should not be considered to exist, and is not within the protection scope of the present invention.
As shown in FIG. 1, in the Web-based universal Spark task submission system, the Web interface module, the submission service module, the Spark task uploading module and the database are all arranged on a Web server;
the Web interface module displays pages and exchanges data with the database, saves the Spark task submission information configured by a user on the Web interface module to the database, and displays the task processing monitoring information held in the database, wherein the Spark task submission information comprises a task data source, a task flow, a task output source and task submission parameters, and the task flow comprises at least one of filtering, dividing and splicing;
the submission service module exchanges data with the database, reads the Spark task submission information stored in the database and generates a submission command;
the Spark task uploading module exchanges data with the submission service module and the database; after acquiring a submission command, it reads the Spark task submission information in the database, generates the JAR package of the corresponding Spark task, uploads the JAR package to a Spark cluster for task processing, and stores the task processing monitoring information in the database, wherein the task processing monitoring information includes, but is not limited to, whether the task started successfully, consumed memory, remaining memory, running time and the number of CPU cores consumed.
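In this arrangement the database is the only channel between the modules. A minimal sketch of that hand-off is given below, using Python's built-in `sqlite3` module as a stand-in for whatever database the system uses. The table names, column names and status values are assumptions; the text does not specify a schema.

```python
import json
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS spark_task (
    task_id INTEGER PRIMARY KEY AUTOINCREMENT,
    config  TEXT NOT NULL,                  -- JSON: data source, flow, output source, parameters
    status  TEXT NOT NULL DEFAULT 'PENDING'
);
CREATE TABLE IF NOT EXISTS task_monitor (
    task_id INTEGER NOT NULL REFERENCES spark_task(task_id),
    metrics TEXT NOT NULL                   -- JSON: memory, running time, CPU cores, ...
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_task(conn: sqlite3.Connection, config: dict) -> int:
    """Web interface module: store one configured task and return its task ID."""
    cur = conn.execute("INSERT INTO spark_task (config) VALUES (?)", (json.dumps(config),))
    conn.commit()
    return cur.lastrowid


def pending_tasks(conn: sqlite3.Connection) -> List[Tuple[int, dict]]:
    """Submission service module: read tasks that still need a submission command."""
    rows = conn.execute(
        "SELECT task_id, config FROM spark_task WHERE status = 'PENDING' ORDER BY task_id"
    ).fetchall()
    return [(task_id, json.loads(config)) for task_id, config in rows]


def mark_submitted(conn: sqlite3.Connection, task_id: int) -> None:
    conn.execute("UPDATE spark_task SET status = 'SUBMITTED' WHERE task_id = ?", (task_id,))
    conn.commit()


def save_metrics(conn: sqlite3.Connection, task_id: int, metrics: dict) -> None:
    """Monitoring thread: append one sample of task processing monitoring information."""
    conn.execute("INSERT INTO task_monitor (task_id, metrics) VALUES (?, ?)",
                 (task_id, json.dumps(metrics)))
    conn.commit()


def load_metrics(conn: sqlite3.Connection, task_id: int) -> List[dict]:
    """Web interface module: read the samples to display for one task."""
    rows = conn.execute("SELECT metrics FROM task_monitor WHERE task_id = ? ORDER BY rowid",
                        (task_id,)).fetchall()
    return [json.loads(m) for (m,) in rows]
```

A status column is one simple way to let a periodically polling service find tasks that have not yet been submitted.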
Referring to FIG. 2, the Web-based universal Spark task submission method of this embodiment includes the following steps:
step 1, a Web project is built; the page of the Web interface module displays information such as the task data source, the task flow (filtering, dividing, splicing and the like), the task output source and the task submission parameters; a developer edits and saves this information, which is then stored in the database according to a set rule;
step 2, a Spark project is written and a main class is determined as the entry point of the Spark task, such as cn.com.SparkApp; the task ID associated with the Spark task submission information of step 1 is passed to the main class;
step 3, the corresponding Spark task submission information is looked up in the database based on the task ID; after the Spark project generates the data processing logic to be used according to the Spark task submission information, the Spark project is built into a JAR package and stored at a designated location on the Web server, such as /home/spark/spark.jar;
while the Spark project is being written, a timed monitoring thread is written in the main method of the main class of the Spark project, and the SparkContext is passed into the thread so that it can acquire the task processing monitoring information, which is finally stored in the database;
step 4, a service for submitting the Spark JAR package is written; the service polls the database periodically, queries the IDs of the tasks to be started and their task submission parameters, and assembles a spark-submit command based on the directory where the spark-submit file is located; the command is approximately as follows:
/home/bin/spark-submit --class cn.com.SparkApp \
--master master \
--deploy-mode yarn \
--driver-memory 1g \
--executor-memory 2g \
--executor-cores 10 \
--queue default \
/home/spark/spark.jar 4
where /home/bin/spark-submit is the path of spark-submit; --class cn.com.SparkApp specifies the main class that the Spark project runs; --master master specifies the master mode; --deploy-mode yarn specifies running on YARN; --driver-memory 1g specifies 1g of memory for the driver; --executor-memory 2g allocates 2g of memory to each executor; --executor-cores 10 allocates 10 cores to each executor; --queue default specifies the queue named default; /home/spark/spark.jar is the path of the Spark JAR package; and the trailing 4 is the ID of the task to run; finally, the command is executed;
step 5, after acquiring the submission command sent by the submission service module, the Spark task uploading module uploads the JAR package of step 3 to a Spark cluster for task processing, while the monitoring thread acquires the corresponding task processing monitoring information and stores it in the database;
step 6, the Web interface module reads the task processing monitoring information from the database, generates a visual chart and displays it on a page of the Web interface module.
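The command assembly of step 4 can be sketched as below. The function name and the parameter dictionary are hypothetical. Note also that the sample command in the text passes `master` to `--master` and `yarn` to `--deploy-mode`, whereas the stock spark-submit tool expects the cluster manager (for example `yarn`) on `--master` and either `client` or `cluster` on `--deploy-mode`; the sketch uses `yarn` and `cluster` as an assumption about the intended setup.

```python
import shlex
from typing import Dict, List


def build_submit_command(spark_submit: str, main_class: str, jar_path: str,
                         task_id: int, params: Dict[str, str]) -> List[str]:
    """Assemble a spark-submit argument list from stored task submission parameters."""
    cmd = [spark_submit, "--class", main_class]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    # The application JAR follows all options; the task ID is passed to the main class.
    cmd += [jar_path, str(task_id)]
    return cmd


params = {
    "master": "yarn",           # assumption: cluster manager goes on --master
    "deploy-mode": "cluster",   # assumption: spark-submit accepts client or cluster here
    "driver-memory": "1g",
    "executor-memory": "2g",
    "executor-cores": "10",
    "queue": "default",
}
command = build_submit_command("/home/bin/spark-submit", "cn.com.SparkApp",
                               "/home/spark/spark.jar", 4, params)
print(shlex.join(command))
```

Building the command as a list and passing it to `subprocess.run(command, check=True)` avoids shell quoting problems when parameter values come from user input on the Web page.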
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (7)

1. A Web-based universal Spark task submission system, characterized by comprising a Web interface module, a submission service module, a Spark task uploading module and a database, all of which are arranged on a Web server;
the Web interface module displays pages and exchanges data with the database, saves the Spark task submission information configured by a user on the Web interface module to the database, and displays the task processing monitoring information held in the database;
the submission service module exchanges data with the database, reads the Spark task submission information stored in the database and generates a submission command;
after acquiring a submission command, the Spark task uploading module reads the Spark task submission information in the database, generates the JAR package of the corresponding Spark task, uploads the JAR package to a Spark cluster for task processing, and stores the task processing monitoring information in the database.
2. The Web-based universal Spark task submission system of claim 1, wherein the Spark task submission information includes a task data source, a task flow, a task output source and task submission parameters.
3. The Web-based universal Spark task submission system of claim 2, wherein the task flow includes, but is not limited to, at least one of filtering, dividing and splicing.
4. The Web-based universal Spark task submission system of claim 1, wherein the task processing monitoring information includes, but is not limited to, whether the task started successfully, consumed memory, remaining memory, running time and the number of CPU cores consumed.
5. A Web-based universal Spark task submission method, characterized by comprising the following steps:
step 1, editing Spark task submission information on a page of the Web interface module and storing it in the database;
step 2, writing a Spark project, determining a main class as the entry point of the Spark task, and passing the task ID associated with the Spark task submission information of step 1 to the main class;
step 3, looking up the corresponding Spark task submission information in the database based on the task ID, generating in the Spark project the data processing logic to be used according to that information, and then building the Spark project into a JAR package stored on the Web server;
step 4, the submission service module generating a submission command based on the Spark task submission information and the task ID;
step 5, after acquiring the submission command sent by the submission service module, the Spark task uploading module uploading the JAR package of step 3 to a Spark cluster for task processing, while acquiring the corresponding task processing monitoring information and storing it in the database;
step 6, the Web interface module reading the task processing monitoring information from the database, generating a visual chart and displaying it on a page of the Web interface module.
6. The Web-based universal Spark task submission method according to claim 5, wherein in step 1, the Spark task submission information includes a task data source, a task flow, a task output source and task submission parameters.
7. The Web-based universal Spark task submission method according to claim 5, wherein in step 3, while the Spark project is being written, a timed monitoring thread is written in the main method of the main class of the Spark project, and the SparkContext is passed into the thread so that it can acquire the task processing monitoring information, which is finally stored in the database.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010408018.0A CN111625269A (en) 2020-05-14 2020-05-14 Web-based universal Spark task submission system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010408018.0A CN111625269A (en) 2020-05-14 2020-05-14 Web-based universal Spark task submission system and method

Publications (1)

Publication Number Publication Date
CN111625269A (en) 2020-09-04

Family

ID=72271946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010408018.0A Pending CN111625269A (en) 2020-05-14 2020-05-14 Web-based universal Spark task submission system and method

Country Status (1)

Country Link
CN (1) CN111625269A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1642105A (en) * 2004-01-05 2005-07-20 华为技术有限公司 Method for realizing task management for network system
CN106777101A (en) * 2016-12-14 2017-05-31 深圳天源迪科信息技术股份有限公司 Data processing engine
CN107977260A (en) * 2017-11-23 2018-05-01 北京神州泰岳软件股份有限公司 Task submits method and device
CN110008242A (en) * 2019-03-12 2019-07-12 广州亚美信息科技有限公司 One kind being based on Spark streaming program generator and program data processing method
CN110781007A (en) * 2019-10-31 2020-02-11 广州市网星信息技术有限公司 Task processing method, device, server, client, system and storage medium
CN110889108A (en) * 2019-11-26 2020-03-17 网易(杭州)网络有限公司 spark task submitting method and device and server

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112486501A (en) * 2020-11-17 2021-03-12 中国人寿保险股份有限公司 Spark application deployment management method and related equipment
CN112486501B (en) * 2020-11-17 2024-06-25 中国人寿保险股份有限公司 Spark application deployment management method and related equipment
CN112612514A (en) * 2020-12-31 2021-04-06 青岛海尔科技有限公司 Program development method and device, storage medium and electronic device
CN112612514B (en) * 2020-12-31 2023-11-28 青岛海尔科技有限公司 Program development method and device, storage medium and electronic device
CN113742040A (en) * 2021-08-09 2021-12-03 广州市易工品科技有限公司 Method and device for quickly generating distributed batch processing tasks based on visual interface
CN113742040B (en) * 2021-08-09 2024-04-19 广州市易工品科技有限公司 Method and device for quickly generating distributed batch processing task based on visual interface
CN113448657A (en) * 2021-09-01 2021-09-28 深圳市信润富联数字科技有限公司 Method for generating and executing dynamic spark task
CN113448657B (en) * 2021-09-01 2021-11-30 深圳市信润富联数字科技有限公司 Method for generating and executing dynamic spark task
CN115529306A (en) * 2022-07-22 2022-12-27 四川启睿克科技有限公司 Spring jar package remote submission method based on springboot
CN115529306B (en) * 2022-07-22 2024-05-17 四川启睿克科技有限公司 Springboot-based remote submitting method for spark jar packets

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200904)