CN114443025A - Modular ETL (extract transform load) task processing system and ETL task processing method for data governance platform - Google Patents

Info

Publication number
CN114443025A
Authority
CN
China
Prior art keywords
task
component
data
configuration
etl
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210109047.6A
Other languages
Chinese (zh)
Other versions
CN114443025B (en)
Inventor
张吉林
史亚雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yuejin Digital Technology Shanghai Co ltd
Original Assignee
Riking Software System Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Riking Software System Shanghai Co ltd
Priority to CN202210109047.6A
Publication of CN114443025A
Application granted
Publication of CN114443025B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/34 Graphical or visual programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/323 Visualisation of programs or trace data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/36 Software reuse
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a modular ETL (extract, transform, load) task processing system for a data governance platform, comprising a data source management module, a configuration management module, a task management module, a task monitoring module and a data analysis module. The data source management module provides support for the data source configuration required for task operation; the configuration management module provides support for the global configuration items required for task operation; the task management module is the main module of the system and is used for task management configuration; the task monitoring module is the monitoring interface during task operation and is responsible for monitoring task execution progress and providing an error early-warning interface; the data analysis module provides association analysis for task-related fields and a visual graphical analysis interface for the data after task execution. The invention also discloses an ETL task processing method implemented with the ETL task processing system.

Description

Modular ETL (extract transform load) task processing system and ETL task processing method for data governance platform
Technical Field
The invention belongs to the technical field of data processing and relates to a modular ETL (extract, transform, load) task processing system and an ETL task processing method for a data governance platform.
Background
ETL is the abbreviation of Extract-Transform-Load and describes the process of extracting, transforming and loading data from a source to a destination. Data from business systems is extracted, cleaned and transformed and then loaded into a data warehouse, with the aim of integrating an enterprise's scattered, disordered data with inconsistent standards and providing a basis for analysis in enterprise decision making. There are generally three ways to implement ETL. The first is hand-written code in Python, SQL or another programming language. Data processing logic implemented this way is difficult to reuse, costly to develop and hard to debug; the overall process is not transparent enough and cannot be monitored in real time; and implementing extraction, transformation, loading and monitoring together requires a large labor cost. The second is to use a graphical ETL tool. Although such a tool provides visualization and relatively rich functionality, it cannot manage tasks centrally or manage permissions, its task monitoring is limited to a single mode, and execution records cannot be played back. The third is to use a common data processing tool such as DataX, which implements data extraction, simple transformation and data loading through text configuration. This approach is only suitable for simple processing logic; scenarios that need complex transformations require plug-in development, which poses a technical threshold for ordinary enterprises.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention aims to provide a modular ETL task processing system and an ETL task processing method for a data governance platform.
The modular ETL task processing system provided by the invention integrates data extraction, transformation, verification, loading, monitoring and permission control. Components are flexibly combined through visual component arrangement to implement complex data processing logic. The system can be used by users at different technical levels: a novice user only needs to configure tasks with the default parameters the system provides, while an advanced user can implement complex business logic with the programmable facilities the system provides, such as hundreds of built-in functions and JavaScript and SQL components. System operation is visualized, and the running of each component is monitored in detail in real time: the running state, the overall progress and the input/output record counts of each component can be seen through a unified monitoring interface, and historical task executions can be traced, replayed and checked for errors.
The invention provides a modular ETL (extract, transform, load) task processing system for a data governance platform, comprising a data source management module, a configuration management module, a task management module, a task monitoring module and a data analysis module;
the data source management module provides support for the data source configuration required for task operation, including database configuration and FTP (File Transfer Protocol) configuration, wherein the database configuration comprises database type, connection mode, login account and connection pool configuration, and the FTP configuration comprises protocol, login account and data transfer encoding configuration;
the configuration management module provides support for the global configuration items required for task operation, including dictionary resource configuration, mail server configuration and verification rule configuration, wherein the mail server configuration comprises SMTP (Simple Mail Transfer Protocol) server, SMTP port, SMTP security protocol and login account configuration, and the verification rule configuration comprises data type, field length, dictionary value, data range and regular expression configuration;
the task management module is the main module of the system and is used for task management configuration, including data migration management, online drawing management, task copy management and locking/unlocking management;
the task monitoring module is the monitoring interface during task operation and is responsible for monitoring task execution progress and providing an error early-warning interface; task monitoring comprises task execution log playback, per-component run log playback, data sampling, visual task progress monitoring and error data extraction;
the data analysis module provides association analysis for task-related fields and a visual graphical analysis interface for the data after task execution; data analysis comprises field-to-task association analysis, field-to-data-source association analysis and execution log analysis.
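As an illustration of the verification rule configuration listed for the configuration management module, the following is a minimal Python sketch of how the five rule kinds (data type, field length, dictionary value, data range, regular expression) could be applied to a single field value. The `FieldRule` record and the `validate` function are hypothetical names introduced here; the patent does not disclose how rules are stored or evaluated.

```python
import re
from dataclasses import dataclass
from typing import Any, List, Optional, Set, Tuple


@dataclass
class FieldRule:
    """Hypothetical rule record covering the five rule kinds named in the text."""
    data_type: type = str                          # data type configuration
    max_length: Optional[int] = None               # field length configuration
    allowed_values: Optional[Set[Any]] = None      # dictionary value configuration
    value_range: Optional[Tuple[Any, Any]] = None  # data range configuration (min, max)
    pattern: Optional[str] = None                  # regular expression configuration


def validate(value: Any, rule: FieldRule) -> List[str]:
    """Return the list of violations for one value; an empty list means valid."""
    # A wrong type makes the remaining checks meaningless, so stop early.
    if not isinstance(value, rule.data_type):
        return [f"expected {rule.data_type.__name__}, got {type(value).__name__}"]
    errors = []
    if rule.max_length is not None and len(str(value)) > rule.max_length:
        errors.append(f"length exceeds {rule.max_length}")
    if rule.allowed_values is not None and value not in rule.allowed_values:
        errors.append("value not in dictionary")
    if rule.value_range is not None:
        low, high = rule.value_range
        if not (low <= value <= high):
            errors.append(f"value outside [{low}, {high}]")
    if rule.pattern is not None and re.fullmatch(rule.pattern, str(value)) is None:
        errors.append("value does not match pattern")
    return errors
```

Returning a list of violations rather than a boolean keeps every failed rule visible, which fits the error data extraction described for the task monitoring module.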
The invention also provides an ETL task processing method implemented with the ETL task processing system, comprising the following steps:
step one, selecting and initializing the ETL task components required for task execution according to the task requirements;
step two, pulling the data required by the task from a data source;
step three, calculating the number of threads required, distributing the initialized ETL task components to the threads and starting each thread, then distributing the data from step two to the threads as required and processing it with the ETL components in the threads;
step four, storing the processed data in a database for use in subsequent stages, or updating the task state and repeating the above operations.
The task flow is designed through graphical custom flows, visual attribute editing and batch attribute import/export. The flow design area supports operations such as copy, cut, paste, back and forward. Data flow and path branching between components are controlled by graphical drag and drop, and data stream distribution and copying are supported between different components. Complex business logic can be implemented by freely combining components.
In step one, the ETL task components fall into five categories: input components, conversion components, output components, flow components and application components;
the input components are responsible for reading a data source and generating an input stream, and comprise a table input component, an Excel input component, a TXT input component, a fixed-length text input component, an XML input component, a constant input component, a read file list component and a result set input component;
the conversion components are responsible for the various transformations of data streams and implement business logic through the arrangement and combination of different components, and comprise a join component, an aggregation component, a sorting component, a cleaning component, a deduplication component, a row-to-column component, a column-to-row component, a formula component, a difference comparison component and an input stream merging component;
the output components are responsible for storing the input stream to a data source, and comprise a table output component, a table insert/update component, an Excel output component, a TXT output component, a fixed-length text output component, an XML output component, a result set output component and a variable setting component;
the flow components are responsible for condition judgment, state transition and path selection, and comprise a check component, a Boolean filter component, an enumeration filter component, a conditional suspension component, a step wait component and a data delay component;
the application components are responsible for handling individual business operations of various types and do not involve data stream input, and comprise an SQL execution component, a stored procedure component, a shell component, an SSH component, an HTTP component, an email sending component, an FTP download component, an FTP upload component, a file decompression component, a file compression component, a file encoding conversion component, a time delay component, a file check component, a file creation component, a file deletion component, a file transfer component and a JavaScript component.
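The way components with different roles can be combined into one flow can be sketched in Python as a chain of generator functions. This is only an illustrative sketch: the function names, the row format (a dictionary per record) and the use of generators are assumptions made here, not the component interfaces of the patented system.

```python
from typing import Dict, Iterable, Iterator, List, Set

Row = Dict[str, object]


def constant_input(rows: List[Row]) -> Iterator[Row]:
    """Reads a source (here an in-memory list) and produces an input stream."""
    yield from rows


def deduplicate(stream: Iterable[Row], key: str) -> Iterator[Row]:
    """Transforms the stream: keeps the first row seen for each key value."""
    seen = set()
    for row in stream:
        if row[key] not in seen:
            seen.add(row[key])
            yield row


def enumeration_filter(stream: Iterable[Row], field: str, allowed: Set[object]) -> Iterator[Row]:
    """Selects a path: forwards only rows whose field value is in `allowed`."""
    return (row for row in stream if row[field] in allowed)


def collect_output(stream: Iterable[Row]) -> List[Row]:
    """Stores the stream; a list stands in for a table or file target."""
    return list(stream)


def run_pipeline(rows: List[Row]) -> List[Row]:
    """Chains the four components in the order they would be drawn on the canvas."""
    stream = constant_input(rows)
    stream = deduplicate(stream, key="id")
    stream = enumeration_filter(stream, field="status", allowed={"active"})
    return collect_output(stream)
```

Because each stage only consumes and yields rows, stages can be reordered or swapped without touching the others, which is the property the free combination of components relies on.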
In step two, the data source may be a database, JSON, TXT, a REST API, XML, CSV, Excel or fixed-length text; the data sources of the invention are applicable across industries, and no specific source is required.
In step three, the number of threads, their execution order and their execution strategy can be chosen flexibly according to business needs;
the ETL task system calculates the different thread branches from the arrangement of the components in the task and stores the components belonging to each thread branch in separate thread parameters, which are held in a HashMap structure; after start-up, the system creates a separate thread for each set of thread parameters to initialize the components and run the corresponding task.
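The branch calculation described above can be sketched in Python, with a dictionary standing in for the HashMap of thread parameters. The rule used here for what counts as a branch (start at every component with no upstream component, follow single successors, stop before a merge point) is an assumption made for illustration; the patent does not specify the algorithm, and the names `compute_branches` and `run_branches` are hypothetical.

```python
import threading
from typing import Dict, List


def compute_branches(edges: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map a thread name to the ordered components that thread should run.

    `edges` maps each component to its downstream components.
    """
    indegree = {node: 0 for node in edges}
    for targets in edges.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1

    branches: Dict[str, List[str]] = {}
    roots = [node for node in edges if indegree[node] == 0]
    for number, root in enumerate(roots, start=1):
        chain, node = [], root
        while True:
            chain.append(node)
            successors = edges.get(node, [])
            # Stop at a fork, at the end of the flow, or just before a merge point.
            if len(successors) != 1 or indegree[successors[0]] > 1:
                break
            node = successors[0]
        branches[f"thread-{number}"] = chain
    return branches


def run_branches(branches: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Start one thread per branch and record which components each one ran."""
    executed: Dict[str, List[str]] = {}
    lock = threading.Lock()

    def worker(name: str, components: List[str]) -> None:
        for component in components:
            # A real component would process data here; this sketch only records the call.
            with lock:
                executed.setdefault(name, []).append(component)

    threads = [
        threading.Thread(target=worker, args=(name, components), name=name)
        for name, components in branches.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return executed
```

For a flow in which a TXT input and a table input both feed a difference comparison, this yields two single-component branches, one per input, and leaves the comparison to be scheduled after both have finished.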
The processing method further comprises a task monitoring stage that runs throughout the whole flow of the processing method and monitors its operation. The task monitoring stage comprises task execution log playback, per-component run log playback, visual task progress monitoring, data sampling and error data extraction. While a task is running, the current execution progress of the task, the running status of each thread and the input and output data streams of each component can be viewed through a visual interface; during operation each component can also sample data in real time and store the samples for verifying the accuracy of its data.
Task execution log playback means that, after task execution has finished, a detailed log covering the whole run of the task can be viewed; the log content comprises time, state and prompt information, as shown in FIG. 9.
Per-component run log playback means that, after task execution has finished, clicking a component shows that component's detailed log for the whole run of the task; the log content comprises time, state and prompt information, as shown in FIG. 7.
Visual task progress monitoring means viewing the running state, progress and error information of all tasks through the task monitoring interface, as shown in FIG. 11, so that the running state of the whole task system can be followed from the monitoring interface.
Data sampling means sampling the data of each component in order to check and debug whether the parameters set for the component take effect at run time, as shown in FIG. 7.
Error data extraction means that, if some data does not conform to a component's parameter rules while a task is running, the erroneous data is extracted into a log table so that it can be viewed centrally in the monitoring interface, as shown in FIG. 12.
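Data sampling, record counting and error data extraction for a single component can be sketched together in Python. The function name `run_with_monitoring`, the shape of the returned report and the default sample size are assumptions for illustration; in the described system the rejected rows would go to a log table rather than an in-memory list.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def run_with_monitoring(
    rows: List[Row],
    check: Callable[[Row], Optional[str]],
    sample_size: int = 10,
) -> Dict[str, object]:
    """Run one component over `rows` and collect the data a monitor would show.

    `check` returns None for a valid row, or a reason string for a row that
    violates the component's parameter rules.
    """
    passed: List[Row] = []
    errors: List[Dict[str, object]] = []
    for index, row in enumerate(rows):
        reason = check(row)
        if reason is None:
            passed.append(row)
        else:
            # Stand-in for writing the rejected row to an error log table.
            errors.append({"row_index": index, "row": row, "reason": reason})
    return {
        "input_count": len(rows),
        "output_count": len(passed),
        "sample": passed[:sample_size],  # first rows only, kept for later inspection
        "errors": errors,
        "output": passed,
    }
```

Keeping the sample separate from the full output means the monitoring side can store a bounded amount of data per component regardless of how many rows flow through it.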
The APIs, operation buttons and list data involved in running the task system are controlled by a permission subsystem, so that only users with the corresponding permission can operate a given button, API or data item. The system automatically configures the system modules and operation buttons according to the permissions a user holds; modules and operation buttons for which the user has no permission are automatically removed.
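The removal of buttons for which a user lacks permission can be sketched in Python as a simple filter. The permission strings, the button labels and the function name `visible_buttons` are hypothetical; the patent does not describe the data model of the permission subsystem.

```python
from typing import Dict, List, Set


def visible_buttons(all_buttons: Dict[str, str], user_permissions: Set[str]) -> List[str]:
    """Return the labels of the buttons a user may see.

    `all_buttons` maps a button label to the permission it requires;
    buttons whose permission the user does not hold are dropped.
    """
    return [
        label
        for label, required in all_buttons.items()
        if required in user_permissions
    ]
```

The same check would also need to be enforced on the API side, since hiding a button alone does not prevent a direct call.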
The beneficial effects of the invention include the following. The ETL task processing system adopts a graphical, component-based, flow-oriented low-code development mode that lowers the threshold for users, and custom flows make operations such as transformation, aggregation, filtering and cleaning of business data more transparent, flexible and convenient. The system also provides a what-you-see-is-what-you-get interactive experience, freeing users from tedious system configuration so that they can focus on top-level business design and the design of data processing logic.
Drawings
FIG. 1 is a functional block diagram of an ETL task processing system of the present invention.
FIG. 2 is an architecture diagram of the ETL task processing system of the present invention.
FIG. 3 is a general flow chart of the ETL task processing system of the present invention involving multiple threads.
FIG. 4 is a visual example diagram of multi-threaded execution of the ETL processing method of the present invention.
FIG. 5 is a flow visualization example diagram of the ETL processing method of the present invention.
FIG. 6 is a schematic diagram of a task component interface in the ETL processing method of the present invention.
FIG. 7 is a schematic diagram of a task playback interface in the ETL processing method of the present invention.
FIG. 8 is a schematic view of a visual monitoring interface in the ETL processing method of the present invention.
FIG. 9 is a schematic diagram of a task log interface in the ETL processing method of the present invention.
FIG. 10 is an exemplary diagram of a process visualization in a specific service instance in an embodiment of the present invention.
FIG. 11 is a schematic diagram of a task monitoring interface for checking the running status, progress and error information of all tasks in the present invention.
FIG. 12 is a schematic diagram of a task monitoring interface viewing error data in the present invention.
Detailed Description
The present invention is described in further detail below with reference to specific examples and the accompanying drawings. Except for the content specifically mentioned below, the procedures, conditions and experimental methods for carrying out the invention are common general knowledge in the art, and the invention is not particularly limited in these respects.
The invention provides a modular ETL task processing system for a data governance platform and an ETL processing method realized by using the system.
The processing method comprises the steps of selecting ETL task components required by task execution according to task requirements and initializing the ETL task components; selecting data required by a pull task from a data source; calculating the number of required threads, distributing the initialized ETL task components to each thread and starting each thread, distributing the data to each thread according to the requirement, and processing by utilizing the ETL components in the threads; and storing the processed data in a database for subsequent links to use, or updating the task state, and repeating the operation to complete the flow of the whole processing method.
In the present invention, multithreading starts when the task starts. For example, as shown in FIG. 4, two threads are started to perform TXT input and table input respectively, and the input data is then compared. Once the threads have started, the system waits for each thread to finish its work or to proceed to the next component, and this cycle repeats until the task ends.
A full run proceeds as illustrated in FIG. 5. The system starts two threads: thread 1 loads the TXT text and thread 2 loads data from the data table. When the two threads finish, each hands its data over to the main thread for safekeeping and is destroyed automatically. The main thread then starts thread 3 to perform the difference comparison and goes on to run the routing component. The routing component evaluates the current data against the attributes configured for it and then starts three threads, which handle the business logic for additions, deletions and modifications respectively. When all three threads have finished, the overall task state is updated and the three threads are destroyed, and the task run ends.
At the task start stage, the ETL task processing system stores the task parameters, component parameters, component arrangement diagram and thread sequence diagram in a database in JSON format and generates a unique UUID for the record. While the system runs, it updates the component parameters, sample data and task execution state under that UUID, which makes the task execution log playback function possible, as shown for example in FIG. 7. FIG. 7 uses four components: a formula component, an enumeration filter component, a no-op component and a table output component. The [formula] component generates one row of data with three fields: a, an integer with value 1; b, a date-time with value 2021-12-03 00:00:00; and c, a string with value '1'. The [enumeration filter] component passes the rows whose value of a is 1 to the [no-op] component, and the [table output] component finally stores them in the database. Throughout this process, the sampled data flowing through each component (the first 10 records) and its input/output record counts can be inspected, and if a component fails during processing, the error is printed to the log in real time for the operator to check.
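Storing a task snapshot as JSON under a generated UUID, updating its state during the run and reading it back for playback can be sketched in Python with the standard `json`, `uuid` and `sqlite3` modules. The table name `task_run`, its columns and the function names are assumptions for illustration; the patent does not disclose the database schema, and an in-memory SQLite database stands in for the real store.

```python
import json
import sqlite3
import uuid
from typing import Any, Dict, Tuple


def create_store() -> sqlite3.Connection:
    """Create an in-memory store with one hypothetical table for task runs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_run ("
        "run_id TEXT PRIMARY KEY, snapshot TEXT NOT NULL, state TEXT NOT NULL)"
    )
    return conn


def save_snapshot(conn: sqlite3.Connection, snapshot: Dict[str, Any]) -> str:
    """Serialize the snapshot to JSON and store it under a fresh UUID."""
    run_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO task_run (run_id, snapshot, state) VALUES (?, ?, ?)",
        (run_id, json.dumps(snapshot), "waiting"),
    )
    return run_id


def update_state(conn: sqlite3.Connection, run_id: str, state: str) -> None:
    """Update the execution state of a stored run, addressed by its UUID."""
    conn.execute("UPDATE task_run SET state = ? WHERE run_id = ?", (state, run_id))


def load_snapshot(conn: sqlite3.Connection, run_id: str) -> Tuple[Dict[str, Any], str]:
    """Fetch a run by UUID and return (restored snapshot, state) for playback."""
    row = conn.execute(
        "SELECT snapshot, state FROM task_run WHERE run_id = ?", (run_id,)
    ).fetchone()
    if row is None:
        raise KeyError(run_id)
    return json.loads(row[0]), row[1]
```

Because the snapshot is self-contained, a playback view can rebuild the design canvas from the stored JSON alone, without depending on the current definition of the task.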
For task execution playback, the ETL task system retrieves the JSON data, such as the task parameters, component arrangement diagram and thread sequence diagram, by UUID and restores it into the system design interface, sets a different state color for each component according to its actual run result, and shows a component's detailed run data when the component is clicked, including the number of input records, the number of output records and the status log. The system divides the task running state into four states: waiting to be executed, in progress, success and failure, and presents real-time statistics as a visual chart, as shown in FIG. 8.
In the log execution part, each component can print various logs during the run stage to help the user complete the task design. The logs are formatted and stored in a database for use in visual task progress monitoring and task execution log playback. The system provides four log levels, ERROR, WARN, INFO and DEBUG, which correspond to logs of different levels of detail, and the user can choose a log level according to the type of business. For example, a task consisting mainly of application components may use the DEBUG level, while a task consisting mainly of input and output components may use the ERROR level. During the run stage the system prints log information according to the configured level, as shown for example in FIG. 9.
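Level-based filtering of component logs can be sketched in Python as follows. The numeric ordering of the four levels, the `ComponentLogger` class and the record fields are assumptions for illustration; the records list stands in for the database table that the formatted logs are written to.

```python
from datetime import datetime, timezone
from typing import Dict, List

# Assumed ordering: a higher number means a more severe, less verbose level.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


class ComponentLogger:
    """Keeps only the records at or above the level configured for the task."""

    def __init__(self, level: str) -> None:
        self.threshold = LEVELS[level]
        self.records: List[Dict[str, str]] = []  # stand-in for a log table

    def log(self, level: str, component: str, message: str) -> None:
        if LEVELS[level] >= self.threshold:
            self.records.append(
                {
                    "time": datetime.now(timezone.utc).isoformat(),
                    "level": level,
                    "component": component,
                    "message": message,
                }
            )
```

A task configured at ERROR therefore stores only failures, while one configured at DEBUG stores every message, which matches the trade-off between detail and volume described above.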
The present invention is described below with a specific service example, as shown in fig. 10.
Step 1: import the txt file from disk into memory.
Step 2: import the customer table from the database into memory.
Step 3: compare the data produced in step 1 and step 2, and divide the comparison result into 3 types: add, modify and delete.
Step 4: store the comparison result in the diff_result database table for subsequent processing.
Detailed process description: after the task starts, 2 threads are started first; they import the txt file from disk into memory and the customer table from the database into memory, respectively. Once both have been imported, the data in memory is compared according to the rules set in the difference comparison component, and the comparison result is divided into 3 categories, add, modify and delete, with the category stored in a field named Result. After classification, the component that routes by state rule (actually an enumeration filter component) distributes the differently classified results to different path branches according to its configured rules; the add, modify and delete components represent the different path branches. On each of the 3 branches, a table output component stores the classified data in the diff_result database table for subsequent processing (the components named for storing added data, storing modified data and storing deleted data are aliases of the table output component). When the table output components on all 3 paths have finished, the threads are destroyed automatically, and the main thread updates the task state to success and returns the state to the front-end interface, at which point the whole task execution is complete. If an error is encountered during task execution, the main thread sends a stop signal to each task thread, and after the task threads have finished, the main thread updates the task state to failure and returns the state to the front-end interface.
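The comparison and routing steps of this example can be sketched in Python. The sketch assumes that the txt file holds the newer data and the customer table the older data, and that rows are matched on a single key field; neither assumption is stated in the patent, and the function names are hypothetical. Only the field name Result and the three categories come from the text.

```python
from typing import Dict, List

Row = Dict[str, object]


def diff_compare(source_rows: List[Row], target_rows: List[Row], key: str) -> List[Row]:
    """Classify differences between the file rows and the table rows.

    Each returned row carries a Result field of "add", "modify" or "delete";
    identical rows are omitted.
    """
    target_by_key = {row[key]: row for row in target_rows}
    source_keys = set()
    results: List[Row] = []
    for row in source_rows:
        source_keys.add(row[key])
        existing = target_by_key.get(row[key])
        if existing is None:
            results.append({**row, "Result": "add"})
        elif existing != row:
            results.append({**row, "Result": "modify"})
    for row_key, row in target_by_key.items():
        if row_key not in source_keys:
            results.append({**row, "Result": "delete"})
    return results


def route_by_result(results: List[Row]) -> Dict[str, List[Row]]:
    """Split classified rows into the three path branches by their Result value."""
    branches: Dict[str, List[Row]] = {"add": [], "modify": [], "delete": []}
    for row in results:
        branches[str(row["Result"])].append(row)
    return branches
```

Each list returned by `route_by_result` corresponds to one of the three branches whose table output component writes to the diff_result table.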
The protection of the present invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art may be incorporated into the invention without departing from the spirit and scope of the inventive concept, which is set forth in the following claims.

Claims (9)

1. A modular ETL task processing system for a data governance platform, characterized by comprising a data source management module, a configuration management module, a task management module, a task monitoring module and a data analysis module;
the data source management module provides support for the data source configuration required for task operation, including database configuration and File Transfer Protocol (FTP) configuration, wherein the database configuration comprises database type, connection mode, login account and connection pool configuration, and the FTP configuration comprises protocol, login account and data transfer encoding configuration;
the configuration management module provides support for the global configuration items required for task operation, including dictionary resource configuration, mail server configuration and verification rule configuration, wherein the mail server configuration comprises Simple Mail Transfer Protocol (SMTP) server, SMTP port, SMTP security protocol and login account configuration, and the verification rule configuration comprises data type, field length, dictionary value, data range and regular expression configuration;
the task management module is the main module of the system and is used for task management configuration, including data migration management, online drawing management, task copy management and locking/unlocking management;
the task monitoring module is the monitoring interface during task operation and is responsible for monitoring task execution progress and providing an error early-warning interface; task monitoring comprises task execution log playback, per-component run log playback, data sampling, visual task progress monitoring and error data extraction;
the data analysis module provides association analysis for task-related fields and a visual graphical analysis interface for the data after task execution; data analysis comprises field-to-task association analysis, field-to-data-source association analysis and execution log analysis.
2. An ETL task processing method implemented with the ETL task processing system according to claim 1, characterized in that the method comprises the following steps:
step one, selecting and initializing the ETL task components required for task execution according to the task requirements;
step two, pulling the data required by the task from a data source;
step three, calculating the number of threads required, distributing the initialized ETL task components to the threads and starting each thread, then distributing the data from step two to the threads as required and processing it with the ETL components in the threads;
step four, storing the processed data in a database for use in subsequent stages, or updating the task state and repeating the above operations.
3. The ETL task processing method according to claim 2, characterized in that the task flow is designed through graphical custom flows, visual attribute editing and batch attribute import/export; the flow design area supports copy, cut, paste, back and forward operations; data flow and path branching between components are controlled by graphical drag and drop, and data stream distribution and copying are supported between different components; and complex business logic can be implemented by freely combining components.
4. The ETL task processing method according to claim 2, wherein in step one, the ETL task components comprise five major classes: input class components, conversion class components, output class components, flow class components and application class components;
the input class component is responsible for reading a data source and generating an input stream, and comprises a table input component, an Excel input component, a TXT input component, a fixed-length text input component, an XML input component, a constant input component, a read file list component and a result set input component;
the conversion class component is responsible for the various conversions of data streams and completes the processing of business logic through the permutation and combination of different components, and comprises a join component, an aggregation component, a sorting component, a cleaning component, a deduplication component, a row-to-column component, a column-to-row component, a formula component, a difference comparison component and an input stream merging component;
the output class component is responsible for storing the input stream to a data source, and comprises a table output component, a table insert/update component, an Excel output component, a TXT output component, a fixed-length text output component, an XML output component, a result set output component and a variable setting component;
the flow class component is responsible for condition judgment, state conversion and path selection operations, and comprises a checking component, a Boolean filtering component, an enumeration filtering component, a condition suspension component, a step waiting component and a data delay component;
the application class component is responsible for handling various kinds of single business operations and does not involve data stream input, and comprises an SQL execution component, a stored procedure component, a SHELL component, an SSH component, an HTTP component, an Email sending component, an FTP download component, an FTP upload component, a file decompression component, a file compression component, a file encoding conversion component, a time delay component, a file checking component, a file creation component, a file deletion component, a file transfer component and a JavaScript component.
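To make the division of responsibilities concrete, the sketch below chains one component for each stream-handling role: reading into an input stream, converting the stream, selecting a path, and storing the stream. It is an illustration only. Components are modelled as Python generator functions, which is an assumption; `normalize_amount` and `collect_output` are hypothetical names not taken from the claim, and application class components are omitted because they do not take a data stream.

```python
def constant_input(rows):
    """Input role: turn constant rows into an input stream."""
    yield from rows


def normalize_amount(stream):
    """Conversion role (hypothetical): parse the "amount" field as an integer."""
    for row in stream:
        yield {**row, "amount": int(row["amount"].strip())}


def boolean_filter(stream, predicate):
    """Flow role: pass on only the rows for which the predicate holds."""
    return (row for row in stream if predicate(row))


def collect_output(stream):
    """Output role (hypothetical): store the stream, here in a list."""
    return list(stream)


# Components are combined freely by passing one stream into the next.
stream = constant_input([{"amount": " 10 "}, {"amount": "-3"}, {"amount": "7"}])
result = collect_output(
    boolean_filter(normalize_amount(stream), lambda row: row["amount"] > 0)
)
```

Because each component consumes and produces a stream, rows flow through the chain one at a time rather than being materialized between components.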
5. The ETL task processing method according to claim 2, wherein in step two, the data sources include databases, JSON, TXT, REST API, XML, CSV, Excel and fixed-length text.
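Pulling rows from several of these source types can be sketched as a dispatch table of reader functions. The example below covers only CSV, JSON and fixed-length text, uses the Python standard library, and reads from in-memory strings; the `READERS` table, the `pull` function and the `layout` parameter are assumptions made for the illustration.

```python
import csv
import io
import json


def read_csv(text):
    # The first line is treated as the header row.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_json(text):
    # Expects a JSON array of objects.
    return json.loads(text)


def read_fixed_length(text, layout):
    """`layout` is a list of (field_name, width) pairs, in column order."""
    rows = []
    for line in text.splitlines():
        row, position = {}, 0
        for field, width in layout:
            row[field] = line[position:position + width].strip()
            position += width
        rows.append(row)
    return rows


READERS = {"CSV": read_csv, "JSON": read_json, "FIXED": read_fixed_length}


def pull(kind, text, **options):
    """Select the reader for the source type and pull the rows."""
    return READERS[kind](text, **options)


csv_rows = pull("CSV", "id,name\n1,Ann\n2,Bo\n")
json_rows = pull("JSON", '[{"id": 1}]')
fixed_rows = pull("FIXED", "001Ann   \n002Bo    ",
                  layout=[("id", 3), ("name", 6)])
```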
6. The ETL task processing method according to claim 2, wherein in step three, the number of threads, the thread execution order and the execution policy can be selected flexibly according to business needs; the ETL task processing system calculates the different thread branches from the arrangement of the components in the task, then stores the components belonging to each thread branch in separate thread parameters, namely a HashMap structure, and, after the system is started, creates a separate thread from each set of thread parameters to initialize the components and run the corresponding task.
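One way to read this step is sketched below. The claim does not define how a thread branch is derived, so the sketch assumes that the task is an acyclic flow graph and that each path from an input component to a final component is one branch. A Python dict stands in for the HashMap of thread parameters, and `compute_branches` and `run_branches` are hypothetical names.

```python
import threading


def compute_branches(flow):
    """Map a branch name to the ordered components on that branch.

    `flow` maps each component to the components directly downstream of it.
    Assumes the flow graph has no cycles.
    """
    downstream = {c for targets in flow.values() for c in targets}
    roots = [c for c in flow if c not in downstream]
    thread_params = {}

    def walk(node, path):
        path = path + [node]
        targets = flow.get(node, [])
        if not targets:
            # A component with nothing downstream closes a branch.
            thread_params[f"branch-{len(thread_params) + 1}"] = path
        for target in targets:
            walk(target, path)

    for root in roots:
        walk(root, [])
    return thread_params


def run_branches(thread_params, run_component):
    """Create and start one thread per entry in the thread parameters."""
    results = {}

    def worker(name, components):
        results[name] = [run_component(c) for c in components]

    threads = [threading.Thread(target=worker, args=(name, components))
               for name, components in thread_params.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


flow = {
    "table_input": ["sort"],
    "sort": ["table_output", "excel_output"],
    "table_output": [],
    "excel_output": [],
}
params = compute_branches(flow)
results = run_branches(params, lambda component: component + ":done")
```

In this simplification a component shared by two branches runs once per branch; a real system would need to decide whether shared upstream work is executed once and its stream copied.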
7. The ETL task processing method according to claim 2, wherein the processing method further comprises a task monitoring stage that runs throughout the entire processing method and monitors the running process; the task monitoring stage comprises: task execution log playback, playback of each component's run log, visual task progress monitoring, data sampling and error data extraction; while the task is running, the current task execution progress, the running status of each thread and the status of the input and output data streams of each component can be viewed through a visual interface; and during operation, each component can sample data in real time and store the samples for verifying the accuracy of that component's data.
8. The ETL task processing method according to claim 7, wherein:
the task execution log playback means that, after the task has finished executing, a detailed log covering the whole running period of the task can be viewed, and the log content comprises time, state and prompt information;
the playback of each component's run log means that, after the task has finished executing, the detailed log of a component over the whole running period of the task can be viewed by clicking that component, and the log content comprises time, state and prompt information;
the visual task progress monitoring means that the running states, progress and error information of all tasks can be viewed through a task monitoring interface, so that the running state of the whole task system can be tracked through the monitoring interface;
the data sampling means that the data sampled from a component can be used to check and debug whether the parameters set for that component take effect at run time;
the error data extraction means that, if some data do not conform to a component's parameter rules while the task is running, the erroneous data are extracted into a log table so that they can be viewed together in the monitoring interface.
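The component-level monitoring features (a run log with time, state and prompt information, data sampling, and error data extraction) can be sketched as a wrapper around a component. This is an assumed design rather than the claimed one: the class `MonitoredComponent`, its `rule` and `sample_rate` parameters, and the in-memory lists standing in for the log and the error log table are all hypothetical.

```python
import random
from datetime import datetime


class MonitoredComponent:
    def __init__(self, name, transform, rule, sample_rate=0.1, seed=None):
        self.name = name
        self.transform = transform      # row -> row
        self.rule = rule                # row -> bool, the parameter rule
        self.sample_rate = sample_rate  # fraction of output rows to sample
        self._rng = random.Random(seed)
        self.log = []      # (time, state, message) entries for log playback
        self.samples = []  # sampled output rows for checking parameters
        self.errors = []   # stand-in for the error log table

    def _record(self, state, message):
        timestamp = datetime.now().isoformat(timespec="seconds")
        self.log.append((timestamp, state, message))

    def run(self, rows):
        self._record("RUNNING", f"{self.name} started")
        output = []
        for row in rows:
            if not self.rule(row):
                # Rows that break the rule are extracted, not processed.
                self.errors.append({"component": self.name, "row": row})
                continue
            result = self.transform(row)
            if self._rng.random() < self.sample_rate:
                self.samples.append(result)
            output.append(result)
        self._record("FINISHED",
                     f"{self.name}: {len(output)} rows ok, "
                     f"{len(self.errors)} error rows")
        return output


component = MonitoredComponent(
    "amount_check",
    transform=lambda row: {**row, "amount": float(row["amount"])},
    rule=lambda row: row["amount"].replace(".", "", 1).isdigit(),
    sample_rate=1.0,  # sample every row so the example is deterministic
)
output = component.run([{"amount": "12.5"}, {"amount": "n/a"}, {"amount": "3"}])
```

A monitoring interface could read `component.log` for playback, `component.samples` for debugging parameters, and `component.errors` for the unified error view.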
9. The task processing system according to claim 1, further comprising a permission subsystem, wherein the APIs, operation buttons and list data involved in operating the task processing system are controlled by the permission subsystem, and only a user holding the relevant permission can operate the corresponding buttons, APIs and associated data; the system configures the system modules and operation buttons according to the permissions held by the user, and modules and operation buttons for which the user has no permission are removed automatically.
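The permission behaviour can be sketched as two checks: filtering the modules and buttons shown to a user, and guarding API calls. The data shapes and the names `visible_ui` and `call_api` are assumptions made for this illustration; the claim does not describe how permissions are stored.

```python
def visible_ui(modules, user_permissions):
    """Return only the modules and buttons the user is allowed to see.

    `modules` maps a module name to {button name: required permission}.
    Modules left with no permitted button are removed entirely.
    """
    visible = {}
    for module, buttons in modules.items():
        allowed = [button for button, permission in buttons.items()
                   if permission in user_permissions]
        if allowed:
            visible[module] = allowed
    return visible


def call_api(api_permissions, user_permissions, api_name):
    """Reject the call unless the user holds the permission the API requires."""
    required = api_permissions[api_name]
    if required not in user_permissions:
        raise PermissionError(f"{api_name} requires permission {required!r}")
    return f"{api_name}: ok"


# Hypothetical modules, buttons and permissions.
modules = {
    "task_design": {"save": "task.edit", "run": "task.run"},
    "task_monitor": {"view_log": "monitor.view"},
}
user_permissions = {"task.run"}
ui = visible_ui(modules, user_permissions)
```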
CN202210109047.6A 2022-01-28 2022-01-28 Modularized ETL task processing system and ETL task processing method for data management platform Active CN114443025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210109047.6A CN114443025B (en) 2022-01-28 2022-01-28 Modularized ETL task processing system and ETL task processing method for data management platform

Publications (2)

Publication Number Publication Date
CN114443025A (en) 2022-05-06
CN114443025B (en) 2023-10-24

Family

ID=81371117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210109047.6A Active CN114443025B (en) 2022-01-28 2022-01-28 Modularized ETL task processing system and ETL task processing method for data management platform

Country Status (1)

Country Link
CN (1) CN114443025B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116166162A (en) * 2023-04-20 2023-05-26 紫金诚征信有限公司 Visual operation method and device of database and computer readable medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180011912A1 (en) * 2016-07-11 2018-01-11 Al-Elm Information Security Co. Methods and systems for multi-dynamic data retrieval and data disbursement
CN107948254A (en) * 2017-11-10 2018-04-20 上海华讯网络系统有限公司 Big data processing framework orchestration system and method for hybrid cloud platform
CN110232085A (en) * 2019-04-30 2019-09-13 中国科学院计算机网络信息中心 Orchestration method and system for big data ETL tasks
US20200175027A1 (en) * 2018-12-04 2020-06-04 International Business Machines Corporation Mining data transformation flows in spreadsheets
KR20200103133A (en) * 2019-02-07 2020-09-02 한국전자통신연구원 Method and apparatus for performing extract-transform-load procedures in a hadoop-based big data processing system
CN111694888A (en) * 2020-06-12 2020-09-22 谷云科技(广州)有限责任公司 Distributed ETL data exchange system and method based on micro-service architecture
CN113918636A (en) * 2021-10-21 2022-01-11 中通服公众信息产业股份有限公司 ETL-based data throughput analysis method

Also Published As

Publication number Publication date
CN114443025B (en) 2023-10-24

Similar Documents

Publication Publication Date Title
US11645250B2 (en) Detection and enrichment of missing data or metadata for large data sets
US7984426B2 (en) Graphical representation of dependencies between changes of source code
US8332811B2 (en) Systems and methods for generating source code for workflow platform
Koehler et al. Process anti-patterns: How to avoid the common traps of business process modeling
US20140180754A1 (en) Workflow System and Method for Single Call Batch Processing of Collections of Database Records
AU2011213842B2 (en) A system and method of managing mapping information
CN105354239A (en) Configuration data processing model based processing center data stream processing method
EP1810131A2 (en) Services oriented architecture for data integration services
CN109635024A (en) A kind of data migration method and system
US10275234B2 (en) Selective bypass of code flows in software program
EP1725922A2 (en) Methods and systems for automated data processing
CN103077192A (en) Data processing method and system thereof
CN114443025B (en) Modularized ETL task processing system and ETL task processing method for data management platform
CN109522005A (en) Cross-platform GRAPHICAL PROGRAMMING method
US20070156742A1 (en) Visual modeling method and apparatus
Hmami et al. Enhancing change mining from a collection of event logs: Merging and Filtering approaches
CN116150152A (en) Method and device for determining risk-control feature lineage relationships
CN115564373A (en) Project information data processing method, system, device and medium
Vetter Detecting operator errors in cloud maintenance operations
US20070156755A1 (en) Data source mapping method and apparatus
US11726792B1 (en) Methods and apparatus for automatically transforming software process recordings into dynamic automation scripts
Tok et al. Microsoft SQL Server 2012 Integration Services
Sah Mastering Microsoft Dynamics NAV 2016
CN116226788B (en) Modeling method integrating multiple data types and related equipment
EP4109287A1 (en) A collaborative system and method for multi-user data management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 200062 8th floor, building D, No. 1006, Jinshajiang Road, Putuo District, Shanghai

Applicant after: Yuejin Digital Technology (Shanghai) Co.,Ltd.

Address before: 200062 8th floor, building D, No. 1006, Jinshajiang Road, Putuo District, Shanghai

Applicant before: RIKING SOFTWARE SYSTEM (SHANGHAI) Co.,Ltd.

GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 201, 2nd Floor, Building 19, No. 2177 Shenkun Road, Minhang District, Shanghai, 201106

Patentee after: Yuejin Digital Technology (Shanghai) Co.,Ltd.

Country or region after: China

Address before: 200062 8th floor, building D, No. 1006, Jinshajiang Road, Putuo District, Shanghai

Patentee before: Yuejin Digital Technology (Shanghai) Co.,Ltd.

Country or region before: China
