CN110209486A

CN110209486A - Interface-based Spark task flow construction method and computer-readable storage medium

Info

Publication number: CN110209486A
Application number: CN201910490107.1A
Authority: CN
Inventors: 陈光淙; 涂建群
Original assignee: Linewell Software Co Ltd
Current assignee: Linewell Software Co Ltd
Priority date: 2019-06-06
Filing date: 2019-06-06
Publication date: 2019-09-06

Abstract

The present invention provides an interface-based Spark task flow construction method and a computer-readable storage medium. The method comprises: establishing a task component library containing a plurality of task components that encapsulate Spark operators, and defining the configuration attributes of the task components, the predecessor/successor relationships of the task components and the task execution relationships between task components; providing a visualization interface that displays the task components, and obtaining the result of the user's operations on the task components in the visualization interface, the operation result being a directed acyclic graph; traversing the directed acyclic graph with a topological sorting algorithm to obtain a Spark task execution queue; parsing the Spark task execution queue in order to obtain a Spark executable operator queue; and executing the Spark executable operator queue to obtain an execution result, which is displayed on the visualization interface. By encapsulating Spark operators as task components and providing a visualization interface, the invention makes the Spark computing engine convenient to use, simple to operate, easy to learn and less error-prone.

Description

Interface-based Spark task flow construction method and computer-readable storage medium

Technical field

The present invention relates to the field of visual computer programming, and in particular to an interface-based Spark task flow construction method and a computer-readable storage medium.

Background art

With the rapid development of big data technology, large enterprises, and Internet companies in particular, collect, store, process, share, retrieve, analyze, display and mine data from every angle in order to extract the commercial value behind it, and they address full-business-chain big data analysis by building one-stop big data analysis platforms. As business applications become more refined, lowering the difficulty of data analysis has become a key research focus for major companies.

Spark is a fast, general-purpose computing engine designed for large-scale data processing. It is efficient and stable, has strong community support, and is a mainstream data analysis technology. At present, the vast majority of developers build Spark task flows directly by writing code. For data workers who cannot program, the barrier to using Spark is relatively high; even for programmers, building Spark tasks by writing code is not intuitive and is error-prone.

Summary of the invention

One technical problem to be solved by the present invention is to provide an interface-based Spark task flow construction method that allows a user to build a Spark task flow by dragging components on an interface and to execute the Spark task as the user specifies.

This first technical problem is solved as follows:

Step 10: establish a task component library, the task component library including a plurality of task components that encapsulate Spark operators, and define the configuration attributes of the task components;

Step 20: define the predecessor/successor relationships of the task components and the task execution relationships between the task components;

Step 30: provide a visualization interface, display the task components, and obtain the result of the user's operations on the task components in the visualization interface, the operation result being a directed acyclic graph;

Step 40: traverse the directed acyclic graph using a topological sorting algorithm to obtain a Spark task execution queue;

Step 50: parse the Spark task execution queue in order to obtain a Spark executable operator queue;

Step 60: execute the Spark executable operator queue to obtain an execution result and display it on the visualization interface.

Further, in step 10, the Spark operators include intersection, union, sorting, data merging, data reading and data writing.

Further, the configuration attributes of the task components, the predecessor/successor relationships of the task components and the task execution relationships between the task components are defined in JSON or XML.

Further, in step 30, the user's operations on the task components in the visualization interface include dragging the task components and connecting the task components with lines.

Further, step 60 specifically comprises: executing each component in the order of the Spark executable operator queue, the output of each component serving as the input of the next component, until the queue is finished; and obtaining the execution result and displaying it on the visualization interface.

A second technical problem to be solved by the present invention is to provide a computer-readable storage medium storing a computer program (instructions) which, when executed by a processor, implements the following steps:

The present invention has the following advantages: by encapsulating Spark operators as task components and providing a visual operation interface, the user can build an analysis flow that meets actual needs according to his or her own data processing requirements, constructing the Spark task on the visualization interface by dragging the task components, connecting them with lines and similar operations. This makes the Spark computing engine convenient to use, lets the user understand each operation step more intuitively, and is simple to operate, easy to learn and less error-prone.

Description of the drawings

The present invention is further described below with reference to the accompanying drawings and embodiments.

Fig. 1 is an execution flowchart of the interface-based Spark task flow construction method according to an embodiment of the present invention.

Fig. 2 is a first schematic diagram of task component attribute parameter configuration in the interface-based Spark task flow construction method according to an embodiment of the present invention.

Fig. 3 is a second schematic diagram of task component attribute parameter configuration in the interface-based Spark task flow construction method according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of the visualization interface and the operation result in the interface-based Spark task flow construction method according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of the core Spark execution code in the interface-based Spark task flow construction method according to an embodiment of the present invention.

Detailed description of the embodiments

Referring to Figs. 1 to 5, an embodiment of the present invention provides an interface-based Spark task flow construction method, comprising:

Step 10: establish a task component library, the task component library including a plurality of task components that encapsulate Spark operators, and define the configuration attributes of the task components. The Spark operators include intersection (e.g., "intersection comparison"), union (e.g., "data merge"), sorting (e.g., "field display"), data merging (e.g., "data deduplication"), data reading (e.g., "data source") and data writing (e.g., "file output").
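A task component library can be pictured as a registry that maps component names to operations. The sketch below is a minimal, hypothetical illustration, not the patent's implementation: datasets are plain Python lists of row dicts so it runs without a Spark cluster, and the registry keys (INTERSECTION, UNION, SORT, DEDUPLICATE) are invented names. In PySpark, these operations roughly correspond to `DataFrame.intersect`, `DataFrame.union`, `DataFrame.orderBy` and `DataFrame.dropDuplicates`.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for Spark-backed task components. Each dataset is a
# list of rows (dicts with hashable values) so the sketch runs without Spark.
Row = Dict[str, object]
Dataset = List[Row]


def _key(row: Row) -> tuple:
    # Canonical, hashable form of a row, independent of key order.
    return tuple(sorted(row.items()))


def deduplicate(data: Dataset) -> Dataset:
    """Deduplication: keep the first occurrence of each distinct row."""
    seen, out = set(), []
    for row in data:
        k = _key(row)
        if k not in seen:
            seen.add(k)
            out.append(row)
    return out


def intersect(left: Dataset, right: Dataset) -> Dataset:
    """Intersection: distinct rows of `left` that also occur in `right`."""
    right_keys = {_key(r) for r in right}
    return deduplicate([r for r in left if _key(r) in right_keys])


def union(left: Dataset, right: Dataset) -> Dataset:
    """Union: all rows of both inputs, duplicates kept (like Spark's union)."""
    return left + right


def sort_by(data: Dataset, field: str) -> Dataset:
    """Sorting: order rows by one field."""
    return sorted(data, key=lambda r: r[field])


# The "library": hypothetical component names mapped to their operations.
COMPONENT_LIBRARY: Dict[str, Callable[..., Dataset]] = {
    "INTERSECTION": intersect,
    "UNION": union,
    "SORT": sort_by,
    "DEDUPLICATE": deduplicate,
}
```

For example, with `a = [{"id": 1}, {"id": 2}]` and `b = [{"id": 2}, {"id": 3}]`, `COMPONENT_LIBRARY["INTERSECTION"](a, b)` returns `[{"id": 2}]`. Data reading and writing components are omitted here because they depend on real storage.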

The configuration attributes of the task components may be defined in JSON or XML.

JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy to read and quick to write, and it can be used to exchange data between different platforms. JSON is highly compatible: it uses a text format that is completely independent of any programming language, while following conventions familiar from the C family of languages (including C, C++, C#, Java, JavaScript, Perl, Python, etc.).

XML (Extensible Markup Language) is a markup language used to mark up electronic documents so that they have structure. It can be used to mark data and define data types, and it is a meta-language that allows users to define their own markup languages.
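Since the configuration attributes may also be written in XML, the following sketch shows one component's attributes in XML and reads them with Python's standard `xml.etree.ElementTree`. The element and attribute names (`component`, `condition`, `separator`, `outputField`) are assumptions chosen to mirror the configuration items used in this embodiment; they are not a published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of one component's configuration attributes.
# Element/attribute names are illustrative assumptions, not a real schema.
COMPONENT_XML = """
<component id="node-3" name="fileOutput" nameCn="文件输出" type="OUT">
  <condition>
    <separator>,</separator>
    <outputField>name</outputField>
    <outputField>age</outputField>
  </condition>
</component>
"""


def parse_component(xml_text: str) -> dict:
    """Read the configuration attributes of one component from XML."""
    root = ET.fromstring(xml_text.strip())
    condition = root.find("condition")
    return {
        "id": root.get("id"),
        "name": root.get("name"),
        "nameCn": root.get("nameCn"),
        "type": root.get("type"),
        "separator": condition.findtext("separator"),
        "outputFields": [e.text for e in condition.findall("outputField")],
    }
```

The same information fits equally well in JSON, which is the format this embodiment actually chooses.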

In this embodiment, the configuration attributes of the task components are defined in JSON. Each task component defines different attribute parameters according to its function; the common configuration items are as follows:

Id: the ID of each component instance;

Name: the component name;

NameCn: the component's Chinese name;

Type: the component type, for example data source node: IN, data output node: OUT, data transformation node: TRANSFORM, data merge node: ASSOCIATION, etc.;

Rendering: the data used to render the component's attributes, usually obtained from the preceding node when the model is built. The data to be rendered differs from component to component: it may be a plain input box, a drop-down box (combobox) or an array. For example, an output node needs [output field] to be configured, and its initial value is a list;

Condition: the final result once attribute configuration is complete, i.e. the parameters the backend needs to complete the calculation. The parameters differ between components: a parameter may be a plain input value or a list. For example, the attribute [separator] is a plain input value (see Fig. 2), and the attribute [output field] is the list of selected fields (see Fig. 3).
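To make the Rendering/Condition split concrete, here is a hedged sketch of one "file output" component instance as JSON, loaded with Python's `json` module. The top-level keys follow the configuration items above; the nested `outputField` and `separator` keys and all values are illustrative assumptions. The check that selected fields must come from the candidates offered upstream is one plausible use of the two items, not something the source specifies.

```python
import json

# Hypothetical JSON for one "file output" component instance. Top-level keys
# follow the configuration items above; nested keys and values are invented.
COMPONENT_JSON = """
{
  "Id": "node-3",
  "Name": "fileOutput",
  "NameCn": "文件输出",
  "Type": "OUT",
  "Rendering": {"outputField": ["name", "age", "city"]},
  "Condition": {"separator": ",", "outputField": ["name", "age"]}
}
"""

KNOWN_TYPES = {"IN", "OUT", "TRANSFORM", "ASSOCIATION"}


def load_component(text: str) -> dict:
    """Parse a component and sanity-check its Condition against Rendering."""
    comp = json.loads(text)
    if comp["Type"] not in KNOWN_TYPES:
        raise ValueError(f"unknown component type: {comp['Type']}")
    # Rendering lists the candidate fields offered by the preceding node;
    # Condition holds the fields the user actually selected.
    offered = set(comp["Rendering"].get("outputField", []))
    chosen = comp["Condition"].get("outputField", [])
    not_offered = [f for f in chosen if f not in offered]
    if not_offered:
        raise ValueError(f"selected fields not offered upstream: {not_offered}")
    return comp
```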

The predecessor/successor relationships of the task components and the task execution relationships between task components may be defined in JSON or XML. In this embodiment they are defined in JSON, in the following format:

Name: each type may have many different implementations, which are distinguished by name. For example, data source nodes may include a relational database data source: DATA_RESOURCE, an HDFS data source: HDFS_RESOURCE, an ES data source: ES_RESOURCE, etc.;

NameCn: the Chinese name of the node;

Icon: the component icon;

Input: the type and number of predecessor nodes. DataType indicates the output type of the predecessor node, which may be a data stream: dataSet. num indicates the number of predecessor nodes: -1 means any number of predecessor nodes is allowed; 0 means no predecessor node is allowed; any other specific number is the exact number of predecessor nodes;

Output: the type and number of successor nodes. DataType indicates the output type of this node, which may be a data stream: dataSet. num indicates the number of successor nodes: -1 means any number of successor nodes is allowed; 0 means no successor node is allowed; any other specific number is the exact number of successor nodes.
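The Input/Output definitions are enough to validate how components are wired together. The sketch below assumes three hypothetical node types written in that format and checks each link and node against the `num` rules (-1 any, 0 none, n exactly n) and the `dataType` match between a predecessor's output and a successor's input. The type names and the structure of `nodes`/`links` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical node-type definitions in the Input/Output format above.
# num: -1 = any number allowed, 0 = none allowed, n > 0 = exactly n.
NODE_TYPES = {
    "HDFS_RESOURCE": {"Input": {"dataType": "dataSet", "num": 0},
                      "Output": {"dataType": "dataSet", "num": -1}},
    "INTERSECTION": {"Input": {"dataType": "dataSet", "num": 2},
                     "Output": {"dataType": "dataSet", "num": -1}},
    "FILE_OUTPUT": {"Input": {"dataType": "dataSet", "num": 1},
                    "Output": {"dataType": "dataSet", "num": 0}},
}


def count_allowed(actual: int, num: int) -> bool:
    return num == -1 or actual == num


def check_links(nodes: Dict[str, str], links: List[Tuple[str, str]]) -> List[str]:
    """nodes: node id -> type name; links: (source id, target id) pairs.
    Returns human-readable violations; an empty list means the wiring is valid."""
    indegree = {n: 0 for n in nodes}
    outdegree = {n: 0 for n in nodes}
    errors = []
    for source, target in links:
        outdegree[source] += 1
        indegree[target] += 1
        produced = NODE_TYPES[nodes[source]]["Output"]["dataType"]
        expected = NODE_TYPES[nodes[target]]["Input"]["dataType"]
        if produced != expected:
            errors.append(f"{source}->{target}: {produced} does not match {expected}")
    for node_id, type_name in nodes.items():
        spec = NODE_TYPES[type_name]
        if not count_allowed(indegree[node_id], spec["Input"]["num"]):
            errors.append(f"{node_id}: needs {spec['Input']['num']} predecessor(s), has {indegree[node_id]}")
        if not count_allowed(outdegree[node_id], spec["Output"]["num"]):
            errors.append(f"{node_id}: needs {spec['Output']['num']} successor(s), has {outdegree[node_id]}")
    return errors
```

For instance, an intersection node wired to only one data source yields `["x: needs 2 predecessor(s), has 1"]` when the node id is `x`.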

Step 30: provide a visualization interface (such as a web browser), display the task components, and obtain the result of the user's operations on the task components in the visualization interface. The user's operations include dragging the task components and connecting them with lines, and the final operation result is a directed acyclic graph (see Fig. 4).
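For the operation result to remain a directed acyclic graph, the drawing board has to refuse any connecting line that would close a cycle. The source does not say how this is enforced; the sketch below is one hypothetical approach: before adding a line from `source` to `target`, a depth-first search checks whether `source` is already reachable from `target`. The `Canvas` class and its methods are invented for illustration.

```python
from typing import Dict, List


class Canvas:
    """Hypothetical model of the drawing board: each connecting line the user
    draws is a directed edge from a source component id to a target id."""

    def __init__(self) -> None:
        self.edges: Dict[str, List[str]] = {}

    def _reaches(self, start: str, goal: str) -> bool:
        # Iterative depth-first search: can `goal` be reached from `start`?
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges.get(node, []))
        return False

    def connect(self, source: str, target: str) -> bool:
        """Add the line source -> target unless it would close a cycle.
        A path target -> ... -> source plus the new line would form a cycle;
        a self-loop is rejected too, because start == goal."""
        if self._reaches(target, source):
            return False
        self.edges.setdefault(source, []).append(target)
        return True
```

In a browser front end the same check would run in JavaScript; Python is used here only to keep all examples in one language.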

Step 40: traverse the directed acyclic graph using a topological sorting algorithm to obtain a Spark task execution queue. The user's operation result on the visualization interface is built with JavaScript; after the front end has built the flow in JavaScript, it passes the flow configuration to the backend. The core JSON format is as follows:

Node: the component array;

Node.x, Node.y: the starting coordinates of the component on the canvas;

Name: the component name;

Type: the component type;

Condition: the component's parameter configuration;

Link: the array of connecting lines between components; source and target correspond to the id field in the Node structure.
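The topological sort in step 40 can be illustrated with Kahn's algorithm applied to a flow in the Node/Link format above. The flow below (two data sources feeding an association node, then a file output) and its values are invented; only the field names come from the format description. If some nodes never reach in-degree zero, the graph has a cycle and no execution queue exists.

```python
import json
from collections import deque
from typing import List

# Hypothetical flow posted by the front end, using the Node/Link format above.
FLOW_JSON = """
{
  "Node": [
    {"id": "n1", "x": 40,  "y": 60,  "Name": "hdfsSource", "Type": "IN",  "Condition": {}},
    {"id": "n2", "x": 40,  "y": 200, "Name": "esSource",   "Type": "IN",  "Condition": {}},
    {"id": "n3", "x": 240, "y": 130, "Name": "intersect",  "Type": "ASSOCIATION", "Condition": {}},
    {"id": "n4", "x": 440, "y": 130, "Name": "fileOutput", "Type": "OUT", "Condition": {}}
  ],
  "Link": [
    {"source": "n1", "target": "n3"},
    {"source": "n2", "target": "n3"},
    {"source": "n3", "target": "n4"}
  ]
}
"""


def execution_queue(flow: dict) -> List[dict]:
    """Kahn's algorithm: order nodes so each one follows all its predecessors."""
    nodes = {n["id"]: n for n in flow["Node"]}
    indegree = {nid: 0 for nid in nodes}
    successors = {nid: [] for nid in nodes}
    for link in flow["Link"]:
        successors[link["source"]].append(link["target"])
        indegree[link["target"]] += 1
    ready = deque(nid for nid in nodes if indegree[nid] == 0)
    order = []
    while ready:
        nid = ready.popleft()
        order.append(nodes[nid])
        for nxt in successors[nid]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("flow contains a cycle; it is not a DAG")
    return order
```

Here `[n["id"] for n in execution_queue(json.loads(FLOW_JSON))]` gives `["n1", "n2", "n3", "n4"]`: both sources come before the association node that consumes them.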

Step 50: the backend analysis engine parses the Spark task execution queue in order to obtain the Spark executable operator queue. Separate interfaces are implemented for the different component categories, including an association analysis component interface, a data source component interface, an output component interface and a transformation component interface, and each component implements its own interface using Spark syntax. For example, the data source component obtains data by calling Spark's method for reading CSV files from HDFS.
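The per-category interfaces can be sketched as an abstract base class with one subclass per category. This is a hypothetical, Spark-free illustration: each `execute` receives the outputs of its predecessor nodes and returns a list of row dicts. The data source parses CSV text held in an invented `text` key so the example runs offline; a real PySpark version might instead call `spark.read.csv(path, header=True)`. The `key`, `outputField` and `separator` condition keys are likewise assumptions.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, List

Dataset = List[Dict[str, str]]


class Component(ABC):
    """Common interface: every component category implements execute()."""

    def __init__(self, condition: dict) -> None:
        self.condition = condition  # the component's Condition configuration

    @abstractmethod
    def execute(self, inputs: List[Dataset]) -> Dataset:
        """Take the outputs of the predecessor nodes, return this node's output."""


class DataSourceComponent(Component):
    # Stand-in: parses CSV text held in the hypothetical "text" key instead of
    # reading a file from HDFS.
    def execute(self, inputs: List[Dataset]) -> Dataset:
        sep = self.condition.get("separator", ",")
        return list(csv.DictReader(io.StringIO(self.condition["text"]), delimiter=sep))


class TransformComponent(Component):
    # Keeps only the selected output fields ("field display").
    def execute(self, inputs: List[Dataset]) -> Dataset:
        fields = self.condition["outputField"]
        return [{f: row[f] for f in fields} for row in inputs[0]]


class AssociationComponent(Component):
    # Inner join of two predecessor outputs on the configured key field.
    def execute(self, inputs: List[Dataset]) -> Dataset:
        left, right = inputs
        key = self.condition["key"]
        index: Dict[str, Dataset] = {}
        for row in right:
            index.setdefault(row[key], []).append(row)
        return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


class OutputComponent(Component):
    # Writes its input as delimited text (kept in self.text) and passes it on.
    def execute(self, inputs: List[Dataset]) -> Dataset:
        rows = inputs[0]
        buf = io.StringIO()
        if rows:
            writer = csv.DictWriter(buf, fieldnames=list(rows[0]),
                                    delimiter=self.condition.get("separator", ","),
                                    lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
        self.text = buf.getvalue()
        return rows
```

A shared `execute(inputs)` signature lets the engine run any component without knowing its category.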

Step 60: execute the Spark executable operator queue to obtain an execution result and display it on the visualization interface. Specifically, each component is executed in the order of the Spark executable operator queue, the output of each component serving as the input of the next, until the queue is finished; the execution result is then obtained and displayed on the visualization interface. The core Spark execution code is shown in Fig. 5.
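The core loop of step 60 can be sketched as follows. This is an assumption-laden illustration, not the code of Fig. 5: the queue is taken to be already topologically sorted, and each node receives the list of its predecessors' outputs. In a linear chain this reduces exactly to "the output of the previous component is the input of the next"; passing a list also covers association nodes with two predecessors. The `Step` type and `run_queue` name are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Each entry: (node id, function taking the list of predecessor outputs).
Step = Tuple[str, Callable[[list], object]]


def run_queue(queue: List[Step], predecessors: Dict[str, List[str]]) -> Dict[str, object]:
    """Execute components strictly in queue order. A node's input is the list of
    its predecessors' outputs, so in a linear chain the output of the previous
    component becomes the input of the next one. Assumes `queue` is already
    topologically sorted, so every predecessor has run before it is needed."""
    outputs: Dict[str, object] = {}
    for node_id, execute in queue:
        inputs = [outputs[p] for p in predecessors.get(node_id, [])]
        outputs[node_id] = execute(inputs)
    return outputs
```

For example, a chain of a source producing `[3, 1, 2]`, a sort step and an output step that joins values with commas ends with `"1,2,3"` as the output node's result, which is what would be shown on the interface.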

Referring again to Figs. 1 to 5, the present invention provides a computer-readable storage medium storing a computer program (instructions) which, when executed by a processor, implements the following steps:

By encapsulating Spark operators as task components and providing a visual operation interface, the present invention enables users to build analysis flows that meet their actual data processing needs, constructing Spark tasks on the visualization interface by dragging task components, connecting them with lines and similar operations. This makes the Spark computing engine convenient to use, lets users understand each operation step more intuitively, and is simple to operate, easy to learn and less error-prone.

Although specific embodiments of the present invention have been described above, those skilled in the art should understand that the described embodiments are merely illustrative and are not intended to limit the scope of the present invention. Equivalent modifications and variations made by those skilled in the art in accordance with the spirit of the present invention shall fall within the scope of protection claimed by the present invention.

Claims

1. An interface-based Spark task flow construction method, characterized by comprising the following steps:

Step 10: establishing a task component library, the task component library including a plurality of task components that encapsulate Spark operators, and defining the configuration attributes of the task components;

Step 30: providing a visualization interface, displaying the task components, and obtaining the result of the user's operations on the task components in the visualization interface, the operation result being a directed acyclic graph;

Step 60: executing the Spark executable operator queue, obtaining an execution result and displaying it on the visualization interface.

2. The interface-based Spark task flow construction method according to claim 1, characterized in that, in step 10, the Spark operators include intersection, union, sorting, data merging, data reading and data writing.

3. The interface-based Spark task flow construction method according to claim 1, characterized in that the configuration attributes of the task components, the predecessor/successor relationships of the task components and the task execution relationships between the task components are defined in JSON or XML.

4. The interface-based Spark task flow construction method according to claim 1, characterized in that, in step 30, the user's operations on the task components in the visualization interface include dragging the task components and connecting the task components with lines.

5. The interface-based Spark task flow construction method according to claim 1, characterized in that step 60 specifically comprises: executing each component in the order of the Spark executable operator queue, the output of each component serving as the input of the next component, until the queue is finished, and obtaining the execution result and displaying it on the visualization interface.

6. A computer-readable storage medium storing a computer program (instructions), characterized in that the program (instructions), when executed by a processor, implements the following steps:

7. The computer-readable storage medium according to claim 6, characterized in that, in step 10, the Spark operators include intersection, union, sorting, data merging, data reading and data writing.

8. The computer-readable storage medium according to claim 6, characterized in that the configuration attributes of the task components, the predecessor/successor relationships of the task components and the task execution relationships between the task components are defined in JSON or XML.

9. The computer-readable storage medium according to claim 6, characterized in that, in step 30, the user's operations on the task components in the visualization interface include dragging the task components and connecting the task components with lines.

10. The computer-readable storage medium according to claim 6, characterized in that step 60 specifically comprises: executing each component in the order of the Spark executable operator queue, the output of each component serving as the input of the next component, until the queue is finished, and obtaining the execution result and displaying it on the visualization interface.