CN115291872A - Data processing method, electronic device and storage medium

Info

Publication number: CN115291872A
Application number: CN202210999201.1A
Authority: CN
Inventors: 廉斌
Original assignee: Sipic Technology Co Ltd
Current assignee: Sipic Technology Co Ltd
Priority date: 2022-08-19
Filing date: 2022-08-19
Publication date: 2022-11-04

Abstract

The invention discloses a data processing method, a data processing apparatus and an electronic device. The method comprises: in response to a user entering a task management platform, displaying a task creation interface on the task management platform, wherein the task creation interface comprises a stream task creation component and a batch task creation component; in response to an operation of the user on any creation component on the task creation interface, entering a configuration interface of the corresponding task; and in response to a configuration operation and a submission instruction of the user on the configuration interface, submitting for execution the information configured by the user together with the dependency packages, stored by the task management platform, of the plug-ins related to that information. According to the embodiment of the invention, the components used by the user are created and stored in advance; when they are needed, data development is carried out simply by dragging components on the page, and the background uses the dependency packages of the related plug-ins as required. This lowers the threshold for big data development and is more convenient and friendly for the user.

Description

Data processing method, electronic device and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data processing method, an electronic device, and a storage medium.

Background

Comparable data processing technologies currently on the market include DolphinScheduler (a data processing platform) and a certain cloud's E-MapReduce. Apache DolphinScheduler is a distributed, decentralized, easily extensible visual DAG (directed acyclic graph) workflow task scheduling platform. It aims to resolve the complicated dependency relationships in data processing flows, so that the scheduling system can be used out of the box. A certain cloud's E-MapReduce (EMR) is built on the cloud server ECS and is based on the open-source Apache Hadoop (distributed system infrastructure) and Apache Spark (distributed computing engine), so that a user can conveniently use the other peripheral systems in the Hadoop and Spark ecosystems to analyze and process data. EMR can also exchange data with other cloud data storage systems and database systems.

In the prior art, DolphinScheduler and a certain cloud's E-MapReduce mainly rely on a big data cluster to run tasks. Functionally, they only support users writing programs and executing them, and do not support operation through a workflow interface. Each development carries its own set of version dependencies; users often find that a task debugs normally locally but behaves abnormally online, and spend a great deal of time debugging version conflicts. A certain cloud's E-MapReduce is bound to one set of big data components and does not support using multiple versions of a computing framework.

The inventor finds that DolphinScheduler and E-MapReduce are each bound to the versions of their own computing frameworks, which makes them difficult to operate when a user wants to use multiple versions.

Disclosure of Invention

The embodiment of the invention aims to solve at least one of the technical problems.

In a first aspect, an embodiment of the present invention provides a data processing method, including: responding to a user entering a task management platform, and displaying a task creation interface on the task management platform, wherein the task creation interface comprises a stream task creation component and a batch task creation component; responding to the operation of the user on any creation component on the task creation interface, and entering a configuration interface of a corresponding task; and in response to the configuration operation and a submission instruction of the user on the configuration interface, submitting the information configured by the user and a dependency package of the plug-in stored by the task management platform and related to the information configured by the user to execute.

In a second aspect, an embodiment of the present invention provides a data processing apparatus, including: the task creating module is used for responding to the fact that a user enters a task management platform and displaying a task creating interface on the task management platform, wherein the task creating interface comprises a stream task creating component and a batch task creating component; the task configuration module is used for responding to the operation of the user on any creation component on the task creation interface and entering a configuration interface of a corresponding task; and the execution module is used for responding to the configuration operation and the submission instruction of the user on the configuration interface, submitting and executing the information configured by the user and the dependency package of the plug-in which is stored by the task management platform and is related to the information configured by the user.

In a third aspect, an embodiment of the present invention provides an electronic device, including: at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute any one of the data processing methods of the invention.

In a fourth aspect, the embodiment of the present invention provides a storage medium, in which one or more programs including execution instructions are stored, and the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any one of the above data processing methods of the present invention.

In a fifth aspect, the present invention further provides a computer program product, which includes a computer program stored on a storage medium, the computer program including program instructions, which when executed by a computer, cause the computer to execute any one of the data processing methods described above.

According to the embodiment of the invention, the components used by the user are created and stored in advance; when they are needed, data development is carried out simply by dragging components on the page, and the background uses the dependency packages of the related plug-ins as required. This lowers the threshold for big data development and is more convenient and friendly for the user.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of an embodiment of a data processing method provided by the present invention;

FIG. 2 is a flow chart of another embodiment of a data processing method according to the present invention;

FIG. 3 is a schematic structural diagram of an embodiment of a data processing apparatus according to the present invention;

FIG. 4 is a block diagram of a data processing method according to the present invention;

FIG. 5 is a flow chart illustrating implementation steps of a data processing method according to the present invention;

FIG. 6 is a schematic structural diagram of an embodiment of an electronic device according to the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

As used in this disclosure, "module," "device," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., from a data packet interacting with another element in a local system, distributed system, and/or across a network in the internet with other systems by way of the signal.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of another like element in a process, method, article, or apparatus that comprises the element.

The embodiment of the invention provides a data processing method which can be applied to electronic equipment. The electronic device may be a computer, a server, or other electronic products, and the invention is not limited thereto.

Referring to fig. 1, a data processing method according to an embodiment of the invention is shown.

As shown in fig. 1, in step 101, in response to a user entering a task management platform, a task creation interface is presented on the task management platform, where the task creation interface includes a stream task creation component and a batch task creation component;

in step 102, in response to an operation of the user on any creation component on the task creation interface, a configuration interface of the corresponding task is entered;

in step 103, in response to the configuration operation and the submission instruction of the user on the configuration interface, submitting the information configured by the user and the dependency package of the plug-in stored by the task management platform and related to the information configured by the user for execution.

In this embodiment, for step 101, the user enters the task management platform and creates a task through the task creation interface displayed on the platform, where the task creation interface includes a stream task creation component and a batch task creation component. After creating the task through the task management platform, the user edits it. The task management platform mainly includes task management, DAG (directed acyclic graph) management, data source management, target source management, and Transform (data processing) plug-in management.

Then, for step 102, the user configures the created task on the task management platform: in response to the user's operation on any creation component on the task creation interface, the configuration interface of the corresponding task is entered. During task configuration, the user can select one or more data sources, fill in the related metadata information, and map it to a table name of the user's choosing. The user can also select one or more Transform (data processing) plug-ins to perform the corresponding operations. Finally, the user selects the type and version of the computing framework used by the task, which completes the task configuration. For example, the user selects at least one data processing plug-in to operate on the mapped table, where the plug-ins include a data processing module plug-in, a data processing function plug-in, a code block plug-in, a delayed-data processing plug-in, a state judgment plug-in, a result selection plug-in, and a table processing plug-in.

Finally, for step 103, in response to the user's configuration operation and submission instruction on the configuration interface, the task management platform submits for execution the information configured by the user together with the stored dependency packages of the plug-ins corresponding to that information. For example, the task management platform packages the task configuration information and the scripts of the data processing module and the data processing function, and submits them in the form of parameters, together with the computing plug-in package held in storage, for execution; the plug-in package parses the parameters and performs the corresponding operations.
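The parameter hand-off described for step 103 can be pictured with a minimal sketch. It assumes the configuration and scripts travel as one Base64-encoded JSON parameter; the flag name `--task-payload` and the functions `build_submit_args`, `parse_submit_args` and `run_plugin` are hypothetical illustrations, not the encoding actually used by the platform.

```python
import base64
import json


def build_submit_args(task_config: dict, scripts: dict) -> list:
    """Platform side: pack the task configuration and user scripts into parameters."""
    payload = {"config": task_config, "scripts": scripts}
    # Base64 keeps quotes and newlines in SQL/UDF text intact on a command line.
    encoded = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return ["--task-payload", encoded]  # hypothetical flag name


def parse_submit_args(argv: list) -> dict:
    """Plug-in package side: decode the parameters it was started with."""
    encoded = argv[argv.index("--task-payload") + 1]
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def run_plugin(argv: list) -> list:
    """Decode the payload and list the configured operations in execution order."""
    payload = parse_submit_args(argv)
    return [step["plugin"] for step in payload["config"]["transforms"]]


args = build_submit_args(
    {"name": "daily_report", "transforms": [{"plugin": "sql"}, {"plugin": "udf"}]},
    {"udf": "def upper(s): return s.upper()"},
)
print(run_plugin(args))  # ['sql', 'udf']
```

Passing everything as parameters means the stored plug-in package itself never changes between tasks; only the payload does.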

According to the method, the components used by the user are created and stored in advance; when they are needed, data development is carried out simply by dragging on the page, and the background uses the dependency packages of the related plug-ins as required. This lowers the threshold for big data development and is more convenient and friendly for the user.

In some optional embodiments, after a task has been created and configured on the task management platform, the user issues a request to view the created task, and the state and log of the task corresponding to the request are displayed on the task management platform. For example, after submitting a task created on the task management platform for execution, the user views the state of that task through the task management platform.

By letting the user check the state of created and submitted tasks on the task management platform, the method helps ensure that the created tasks and the configured information are correct.

In some optional embodiments, the configuration interface includes a data source plug-in configuration sub-interface, a data processing plug-in configuration sub-interface, and a target source plug-in configuration sub-interface. The dependency packages of the optional data source plug-ins, of the optional data processing plug-ins, and of the optional target source plug-ins offered in the respective sub-interfaces are stored in the task management platform. The three types of plug-ins that users ordinarily need are written in advance and put into storage, so that the user only needs to care about business logic and can carry out data development by dragging on the page, while the background uses the dependency packages of the related plug-ins as required.

According to the method, the three types of plug-ins that users ordinarily need are written in advance and placed in storage, and the dependency packages of the related plug-ins can be used as required. This spares operation and maintenance or development personnel the trouble caused by dependency conflicts and makes the platform more convenient for the user.

Referring to fig. 2, another data processing method according to an embodiment of the invention is shown.

As shown in fig. 2, in step 201, acquiring selection and configuration information of the user on the data source plug-in configuration sub-interface for the data source plug-in;

in step 202, acquiring the selection and processing logic information of the user on the data processing plug-in configuration sub-interface;

in step 203, acquiring selection and configuration information of the user on the target source plug-in configuration sub-interface;

in step 204, in response to the submission instruction of the user, acquiring a dependency package stored by the task management platform and corresponding to the plug-in selected by the user;

in step 205, the configuration information of the user for the data source plug-in, the processing logic information of the user for the data processing plug-in, the configuration information of the user for the target source plug-in and the obtained dependency package are submitted for execution.
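Steps 201 to 205 can be sketched as follows. This is an illustration under stated assumptions only: `PLUGIN_STORE`, the package file names, `TaskConfig`, `resolve_dependencies` and `build_submission` are hypothetical names, and an in-memory dictionary stands in for the platform's plug-in storage.

```python
from dataclasses import dataclass, field

# Hypothetical plug-in store: plug-in name -> dependency packages kept by the platform.
PLUGIN_STORE = {
    "kafka": ["kafka-connector.jar", "common-io.jar"],
    "sql": ["sql-transform.jar"],
    "udf": ["udf-transform.jar"],
    "hive": ["hive-connector.jar", "common-io.jar"],
}


@dataclass
class TaskConfig:
    sources: dict = field(default_factory=dict)     # step 201: plug-in -> configuration
    transforms: dict = field(default_factory=dict)  # step 202: plug-in -> processing logic
    sinks: dict = field(default_factory=dict)       # step 203: plug-in -> configuration


def resolve_dependencies(config, store):
    """Step 204: collect the stored packages of every selected plug-in, without duplicates."""
    packages = []
    for name in [*config.sources, *config.transforms, *config.sinks]:
        if name not in store:
            raise KeyError(f"no dependency package stored for plug-in {name!r}")
        for package in store[name]:
            if package not in packages:
                packages.append(package)
    return packages


def build_submission(config, store):
    """Step 205: bundle the three kinds of configuration with the resolved packages."""
    return {
        "sources": config.sources,
        "transforms": config.transforms,
        "sinks": config.sinks,
        "dependencies": resolve_dependencies(config, store),
    }


config = TaskConfig(
    sources={"kafka": {"topic": "events", "table": "events"}},
    transforms={"sql": "SELECT user_id, COUNT(*) AS n FROM events GROUP BY user_id"},
    sinks={"hive": {"table": "dw.event_counts"}},
)
submission = build_submission(config, PLUGIN_STORE)
print(submission["dependencies"])
# ['kafka-connector.jar', 'common-io.jar', 'sql-transform.jar', 'hive-connector.jar']
```

Only the packages of the plug-ins actually selected are attached, which is one way to read "the background can use the dependency packages of the related plug-ins as required".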

In this embodiment, for step 201, in the course of submitting for execution the information configured by the user and the stored dependency packages of the related plug-ins in response to the user's configuration operation and submission instruction on the configuration interface, the user's selection and configuration information for the data source plug-in on the data source plug-in configuration sub-interface is acquired. The data sources include the distributed message queue Kafka, the Hadoop-based (distributed system infrastructure) data warehouse tool Hive, the distributed file system HDFS, the analytical database Doris, the full-text retrieval and analysis engine ES, and the columnar distributed database Kudu.

Then, for step 202, in the same submission process, the user's selection and processing logic information for the data processing plug-in on the data processing plug-in configuration sub-interface is acquired. The data processing plug-ins include the Spark/Flink-based (distributed computing engine) data processing module sql, the Spark/Flink-based data processing function udf, the java/scala-based code block script, the Spark/Flink-based delayed-data handling mechanism Watermark, the state judgment plug-in Status, the result selection plug-in Switch, and the table processing plug-in Table.

Then, for step 203, in the same submission process, the user's selection and configuration information for the target source plug-in on the target source plug-in configuration sub-interface is acquired. The target sources include a relational database, the distributed message queue Kafka, the Hadoop-based (distributed system infrastructure) data warehouse tool Hive, the distributed file system HDFS, the analytical database Doris, the full-text retrieval and analysis engine ES, and the columnar distributed database Kudu.

Then, for step 204, when the information configured by the user and the stored dependency packages of the related plug-ins are to be submitted for execution in response to the user's configuration operation and submission instruction on the configuration interface, the dependency packages stored by the task management platform that correspond to the plug-ins selected by the user are acquired. Finally, for step 205, the configuration information of the user for the data source plug-in, the processing logic information of the user for the data processing plug-in, the configuration information of the user for the target source plug-in, and the acquired dependency packages are submitted for execution. For example, when the user submits the task for execution, the task management platform submits the task configuration information and the packaged Spark/Flink-based data processing function udf and java/scala-based code block script, in the form of parameters, together with the computing plug-in package in storage, for execution; the plug-in package parses the parameters and performs the corresponding operations.

In the method provided by this embodiment of the application, the configuration information of the user for the data source plug-in, the processing logic information of the user for the data processing plug-in, the configuration information of the user for the target source plug-in, and the acquired dependency packages are submitted for execution together, so that the background can call the dependency packages of the related plug-ins whenever it needs them.

It should be noted that the data source plug-ins include a relational database, the distributed message queue Kafka, the Hadoop-based data warehouse tool Hive, the distributed file system HDFS, the analytical database Doris, the full-text retrieval and analysis engine ES, and the columnar distributed database Kudu; the data processing plug-ins include the Spark/Flink-based data processing module sql, the Spark/Flink-based data processing function udf, the java/scala-based code block script, the Spark/Flink-based delayed-data handling mechanism Watermark, the state judgment plug-in Status, the result selection plug-in Switch, and the table processing plug-in Table; the target source plug-ins include a relational database, the distributed message queue Kafka, the Hadoop-based data warehouse tool Hive, the distributed file system HDFS, the analytical database Doris, the full-text retrieval and analysis engine ES, and the columnar distributed database Kudu.

In some optional embodiments, based on Spark/Flink (distributed computing engine), the data read by the data source plug-ins is mapped into tables, the data processing plug-ins are encapsulated by means of reflection and/or functional interfaces, and the target source plug-ins are implemented by writing tables; the data source plug-ins, the data processing plug-ins and the target source plug-ins are stored in the task management platform. For example, all data read from the data sources is first mapped into tables based on Spark/Flink, a number of Transform (data processing) plug-ins are then encapsulated by means such as reflection and functional interfaces, and finally a number of target source plug-ins are implemented by writing tables. The plug-ins are put into storage, namely HDFS (distributed file system).
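The two encapsulation styles mentioned above can be illustrated in miniature. The sketch below is plain Python rather than Spark or Flink code: in-memory lists of rows stand in for mapped tables, a callable type stands in for a functional interface, and `getattr` stands in for reflection. `BuiltinTransforms`, `load_transform` and `run` are hypothetical names.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]
Transform = Callable[[Table], Table]  # the "functional interface": table in, table out


class BuiltinTransforms:
    """Hypothetical pre-written plug-ins, looked up by name at run time."""

    @staticmethod
    def drop_nulls(table: Table) -> Table:
        return [row for row in table if all(v is not None for v in row.values())]

    @staticmethod
    def uppercase_names(table: Table) -> Table:
        return [{**row, "name": str(row["name"]).upper()} for row in table]


def load_transform(name: str) -> Transform:
    """Reflection-style lookup: resolve a plug-in from its configured name."""
    transform = None if name.startswith("_") else getattr(BuiltinTransforms, name, None)
    if not callable(transform):
        raise ValueError(f"unknown transform plug-in: {name}")
    return transform


def run(tables: Dict[str, Table], source: str, steps: List[str], target: str) -> None:
    """Read a mapped source table, apply the named transforms, write the target table."""
    table = tables[source]
    for name in steps:
        table = load_transform(name)(table)
    tables[target] = table


tables = {"users": [{"id": 1, "name": "ann"}, {"id": 2, "name": None}]}
run(tables, "users", ["drop_nulls", "uppercase_names"], "clean_users")
print(tables["clean_users"])  # [{'id': 1, 'name': 'ANN'}]
```

Because every plug-in shares one signature, a task configuration only needs to carry plug-in names; new plug-ins can be added to storage without changing the code that runs them.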

It should be noted that, the configuration operation of the user on the configuration interface further includes the selection of the type and version of the computing framework used by the user for the task, and the task configuration is completed after the selection of the type and version of the computing framework is completed.
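One way to picture the framework selection is a lookup from (type, version) to a stored computing plug-in package. In the sketch below the catalogue `SUPPORTED`, the version numbers listed, the `hdfs:///plugins` root and the file name `compute-plugin.jar` are all assumptions made for illustration; the source does not specify how packages are laid out.

```python
# Hypothetical catalogue: the framework versions for which a plug-in package is stored.
SUPPORTED = {
    "spark": {"2.4.8", "3.2.1"},
    "flink": {"1.13.6", "1.14.4"},
}


def plugin_package_path(framework: str, version: str, root: str = "hdfs:///plugins") -> str:
    """Return the storage path of the plug-in package built for the chosen framework version."""
    framework = framework.lower()
    if framework not in SUPPORTED:
        raise ValueError(f"unsupported computing framework: {framework}")
    if version not in SUPPORTED[framework]:
        raise ValueError(f"no plug-in package stored for {framework} {version}")
    return f"{root}/{framework}/{version}/compute-plugin.jar"


print(plugin_package_path("Spark", "3.2.1"))
# hdfs:///plugins/spark/3.2.1/compute-plugin.jar
```

Keeping one package per framework version is what would let two tasks on the same platform run against different versions without sharing dependencies.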

Fig. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention. The apparatus can execute the data processing method according to any of the above embodiments and is configured in a terminal.

The data processing apparatus 100 provided by this embodiment comprises: a task creation module 110, a task configuration module 120, and an execution module 130.

The task creation module 110 is configured to display, in response to a user entering a task management platform, a task creation interface on the task management platform, where the task creation interface includes a stream task creation component and a batch task creation component; the task configuration module 120 is configured to enter, in response to an operation of the user on any creation component on the task creation interface, a configuration interface of the corresponding task; and the execution module 130 is configured to submit for execution, in response to a configuration operation and a submission instruction of the user on the configuration interface, the information configured by the user and the dependency packages, stored by the task management platform, of the plug-ins related to that information.

The application also provides an alternative scheme in which the task package is bound and executed by means of a configuration file; the task package can then be used directly on a machine without extra development cost.
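The source does not specify the format of such a configuration file. As a minimal sketch, assuming a JSON file with the same sections a user would fill in on the platform, a loader could look like this; `REQUIRED_KEYS`, `load_task_file` and every field in `example` are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Assumed sections of a task file; they mirror what is configured on the platform.
REQUIRED_KEYS = ("framework", "version", "sources", "transforms", "sinks")


def load_task_file(path) -> dict:
    """Read a task definition from a JSON file and check that it is complete."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"task file is missing: {', '.join(missing)}")
    return config


example = {
    "framework": "flink",
    "version": "1.14.4",
    "sources": [{"plugin": "kafka", "topic": "events", "table": "events"}],
    "transforms": [{"plugin": "sql", "query": "SELECT * FROM events"}],
    "sinks": [{"plugin": "hive", "table": "dw.events"}],
}

# Write the example to a temporary file and load it back, as a stand-in for a file on disk.
with tempfile.TemporaryDirectory() as workdir:
    task_file = Path(workdir) / "task.json"
    task_file.write_text(json.dumps(example), encoding="utf-8")
    loaded = load_task_file(task_file)

print(loaded["framework"], loaded["version"])  # flink 1.14.4
```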

It should be noted that the scheme itself is already used in a production environment: more than 300 resident streaming tasks and more than 500 timed batch tasks run every day. Analysts use the platform to analyze 400 reports every day; the development time of 95% of tasks is shortened from 3-4 hours to 5-10 minutes, and the stability of the scheme is above 99.9%. The invention meets the data processing requirements of more than 95% of analysts, turns complicated code into functions and interface operations, and frees analysts from the effort of data cleaning. It also solves the difficulty that operation and maintenance personnel face in maintaining multiple versions and multiple environments.

Please refer to fig. 4, which shows a schematic block diagram of a data processing method according to the present invention.

As shown in fig. 4, the method is mainly divided into 4 parts:

1. the task management platform, which mainly comprises task management, DAG management, data source management, target source management and Transform plug-in management;

2. the data sources, which comprise a relational database, the distributed message queue Kafka, the Hadoop-based data warehouse tool Hive, the distributed file system HDFS, the analytical database Doris, the full-text retrieval and analysis engine ES and the columnar distributed database Kudu;

3. the Transform plug-ins, which comprise the Spark/Flink-based data processing module sql, the Spark/Flink-based data processing function udf, the java/scala-based code block script, the Spark/Flink-based delayed-data handling mechanism Watermark, the state judgment plug-in Status, the result selection plug-in Switch and the table processing plug-in Table;

4. the target sources, which comprise a relational database, the distributed message queue Kafka, the Hadoop-based data warehouse tool Hive, the distributed file system HDFS, the analytical database Doris, the full-text retrieval and analysis engine ES and the columnar distributed database Kudu.

Referring to fig. 5, a flow chart of the data processing method according to the present invention is shown.

As shown in fig. 5, Step 1: based on Spark/Flink, the scheme first maps all data read from the data sources into tables, then encapsulates a number of Transform plug-ins by means such as reflection and functional interfaces, and finally implements a number of target source plug-ins by writing tables. These plug-ins are put into storage; the plug-ins are stored in HDFS.

Step 2: The user creates a task through the task management platform and then edits it. The front end can support drawing the task flow chart using jsPlumb and other related technologies. Task editing proceeds as follows. First, the user can select one or more data sources, fill in the relevant metadata information, and map it to a table name required by the user. Second, the user can select one or more Transform plug-ins, such as sql, udf, script, watermark, status, switch and table: sql performs SQL processing on the mapped tables, udf is used to enrich sql, script is used to process unstructured data, status is used to determine the flow direction of the flow chart, switch is used to select the flow direction of the flow chart, and table performs related operations on the relevant data source or target source. Third, the user selects one or more target sources. Fourth, the user selects the type and version of the computing framework that the task uses. This completes the task configuration.

Step 3: The user submits the task for execution. The task management platform submits the task configuration information and the packaged udf and script, in the form of parameters, together with the computing plug-in package in storage, for execution; the plug-in package parses the parameters and performs the corresponding operations.

Step 4: The status and log of the task can be viewed through the task management platform.
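The flow chart edited in Step 2 is a DAG, so one plausible way to derive an execution order is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The node names in `FLOW` are hypothetical, and the `status` and `switch` functions encode assumed semantics, since the source only says that status determines and switch selects the flow direction.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical flow chart: each node maps to the set of nodes it depends on.
FLOW = {
    "events": set(),               # data source
    "enrich_sql": {"events"},      # sql Transform
    "check_rows": {"enrich_sql"},  # status Transform
    "to_hive": {"check_rows"},     # target source
    "to_kafka": {"check_rows"},    # target source
}


def execution_order(flow):
    """Return the nodes so that every node comes after the nodes it depends on."""
    return list(TopologicalSorter(flow).static_order())


def is_valid_dag(flow):
    """A flow chart that contains a cycle cannot be scheduled."""
    try:
        execution_order(flow)
        return True
    except CycleError:
        return False


def status(table):
    """Assumed Status semantics: report whether the upstream step produced rows."""
    return "non_empty" if table else "empty"


def switch(state, branches, default):
    """Assumed Switch semantics: pick the next node for the reported state."""
    return branches.get(state, default)


order = execution_order(FLOW)
next_node = switch(status([{"user_id": 1}]), {"non_empty": "to_hive"}, default="to_kafka")
print(order.index("events") < order.index("to_hive"), next_node)  # True to_hive
```

The relative order of the two target nodes is not fixed by the graph, so the example only checks that upstream nodes come first.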

It should be noted that the present invention uses the two current mainstream computing frameworks, Spark and Flink, and divides a data processing workflow into Source (data source input) -> Transform (data processing; there may be several) -> Sink (result output). The three types of plug-ins that users ordinarily need are written in advance and put into storage, so that the user only needs to care about business logic and can carry out data development by dragging on the page, while the background uses the dependency packages of the related plug-ins as required. On the one hand this lowers the threshold for big data development, and on the other hand it spares operation and maintenance or development personnel the trouble caused by dependency conflicts.
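The Source -> Transform -> Sink division can be shown as plain function composition. This is a minimal sketch in ordinary Python, not Spark or Flink code; `run_pipeline` and the example source, transforms and sink are hypothetical stand-ins for the three plug-in types.

```python
from functools import reduce


def run_pipeline(source, transforms, sink):
    """Feed the source rows through each transform in order, then hand them to the sink."""
    rows = reduce(lambda data, transform: transform(data), transforms, source())
    return sink(rows)


def read_orders():
    # Stand-in for a data source plug-in.
    return [
        {"order_id": 1, "amount": 120.0},
        {"order_id": 2, "amount": 15.5},
        {"order_id": 3, "amount": 310.0},
    ]


def only_large(rows):
    # Stand-in for a filtering Transform plug-in.
    return [row for row in rows if row["amount"] >= 100]


def add_tier(rows):
    # Stand-in for an enriching Transform plug-in.
    return [{**row, "tier": "gold" if row["amount"] >= 300 else "silver"} for row in rows]


written = []


def write_rows(rows):
    # Stand-in for a target source (Sink) plug-in; returns the number of rows written.
    written.extend(rows)
    return len(rows)


count = run_pipeline(read_orders, [only_large, add_tier], write_rows)
print(count, [row["tier"] for row in written])  # 2 ['silver', 'gold']
```

Any number of transforms can be placed between the source and the sink, which matches the "(there may be several)" in the workflow description.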

It should be noted that for simplicity of explanation, the foregoing method embodiments are described as a series of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated ordering of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention. In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.

In some embodiments, the present invention provides a non-transitory computer readable storage medium, in which one or more programs including execution instructions are stored, and the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any one of the above data processing methods of the present invention.

In some embodiments, the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform any of the data processing methods described above.

In some embodiments, an embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a data processing method.

Fig. 6 is a schematic diagram of a hardware structure of an electronic device executing a data processing method according to another embodiment of the present application, and as shown in fig. 6, the electronic device includes:

one or more processors 610 and a memory 620, with one processor 610 being an example in fig. 6.

The apparatus performing the data processing method may further include: an input device 630 and an output device 640.

The processor 610, the memory 620, the input device 630, and the output device 640 may be connected by a bus or by other means; Fig. 6 takes a bus connection as an example.

The memory 620, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the data processing method in the embodiments of the present application. The processor 610 executes various functional applications of the server and data processing by executing nonvolatile software programs, instructions and modules stored in the memory 620, that is, implements the data processing method of the above-described method embodiment.

The memory 620 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the data processing apparatus, and the like. Further, the memory 620 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 620 optionally includes memory located remotely from processor 610, which may be connected to a data processing device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 630 may receive input numeric or character information and generate signals related to user settings and function control of the data processing apparatus. The output device 640 may include a display device such as a display screen.

The one or more modules are stored in the memory 620 and, when executed by the one or more processors 610, perform the data processing method of any of the method embodiments described above.

The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices, which are characterized by mobile communication capabilities and whose primary purpose is to provide voice and data communication. Such terminals include smart phones, multimedia phones, feature phones, and low-end phones, among others.

(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID, and UMPC devices, among others.

(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players, handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.

(4) Other onboard electronic devices with data interaction functions, such as a vehicle-mounted device mounted on a vehicle.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and can of course also be implemented by hardware. Based on such understanding, the above technical solutions, in essence or in the part that contributes over the related art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A method of data processing, comprising:

responding to a user entering a task management platform, and displaying a task creation interface on the task management platform, wherein the task creation interface comprises a stream task creation component and a batch task creation component;

responding to the operation of the user on any creating component on the task creating interface, and entering a configuration interface of a corresponding task;

and in response to the configuration operation and a submission instruction of the user on the configuration interface, submitting the information configured by the user and a dependency package of the plug-in stored by the task management platform and related to the information configured by the user to execute.

2. The method of claim 1, further comprising:

responding to a task viewing request of the user on the task management platform, and displaying a state and a log of a task corresponding to the task viewing request on the task management platform.

3. The method according to claim 1, wherein the configuration interface includes a data source plug-in configuration sub-interface, a data processing plug-in configuration sub-interface, and a target source plug-in configuration sub-interface, and the task management platform stores therein a dependency package of various data source plug-ins selectable in the data source plug-in configuration sub-interface, a dependency package of various data processing plug-ins selectable in the data processing plug-in configuration sub-interface, and a dependency package of various target source plug-ins selectable in the target source plug-in configuration sub-interface.

4. The method of claim 3, wherein the submitting the user-configured information and the dependency package of the plug-in stored by the task management platform in relation to the user-configured information for execution in response to the configuration operation and the submit instruction by the user on the configuration interface comprises:

acquiring selection and configuration information of the user on the data source plug-in configuration sub-interface for the data source plug-in;

acquiring the selection and processing logic information of the user on the data processing plug-in configuration sub-interface;

acquiring selection and configuration information of the user on the target source plug-in configuration sub-interface for the target source plug-in;

responding to a submission instruction of the user, and acquiring a dependency package which is stored by the task management platform and corresponds to the plug-in selected by the user;

and submitting the configuration information of the user on the data source plug-in, the processing logic information of the user on the data processing plug-in, the configuration information of the user on the target source plug-in and the acquired dependency package for execution.

5. The method of claim 3, wherein the data source plug-in comprises a relational database, a distributed message queue, a Hadoop-based data warehouse tool, a distributed file system, an analytical database, a full-text search and analysis engine, and a column-oriented distributed database; the data processing plug-in comprises a Spark/Flink-based data processing module, a Spark/Flink-based data processing function, a Java/Scala-based code block, a Spark/Flink-based delay data processing mode, a state judgment name, a result selection name and a table processing name; the target source plug-in comprises a relational database, a distributed message queue, a Hadoop-based data warehouse tool, a distributed file system, an analytical database, a full-text search and analysis engine, and a column-oriented distributed database.

6. The method of claim 3, wherein, prior to the responding to a user entering a task management platform, the method further comprises:

mapping the read data source plug-in into a table based on Spark/Flink;

packaging the data processing plug-in by a reflection mode and/or a functional interface mode;

completing the processing of the target source plug-in according to a table writing mode;

and storing the data source plug-in, the data processing plug-in and the target source plug-in in the task management platform.

7. The method of any of claims 1-6, wherein the configuration operations of the user on the configuration interface further comprise the user's selection of a computing framework type and version to use for a task.

8. A data processing apparatus comprising:

the task creating module is used for responding to a user entering a task management platform and displaying a task creation interface on the task management platform, wherein the task creation interface comprises a stream task creation component and a batch task creation component;

the task configuration module is used for responding to the operation of the user on any creation component on the task creation interface and entering a configuration interface of a corresponding task;

and the execution module is used for responding to the configuration operation and the submission instruction of the user on the configuration interface, submitting and executing the information configured by the user and the dependency package of the plug-in which is stored by the task management platform and is related to the information configured by the user.

9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1 to 7.

10. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.