CN111813845A - ETL task-based incremental data extraction method, device, equipment and medium - Google Patents

ETL task-based incremental data extraction method, device, equipment and medium

Info

Publication number
CN111813845A
Authority
CN
China
Prior art keywords
task
target
preset
etl
target task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010610186.8A
Other languages
Chinese (zh)
Inventor
熊汉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Saiante Technology Service Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd
Priority to CN202010610186.8A
Publication of CN111813845A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the field of data processing and provides an ETL task-based incremental data extraction method, device, equipment and medium. The method comprises the following steps: obtaining an ETL task; screening, according to the basic fields contained in the ETL task, the ETL tasks that contain a preset field as target tasks; analyzing the target task and determining the data type corresponding to the target task; judging whether the target task is running for the first time; if the target task is running for the first time, determining a target increment limit value corresponding to the target task according to the data type; if the target task is not running for the first time, acquiring a target value corresponding to the preset field as the target increment limit value corresponding to the target task; and extracting incremental data of the ETL task according to the target increment limit value. The invention also relates to blockchain technology: the ETL tasks can be stored in a blockchain. The technical scheme of the invention improves the accuracy of incremental data extraction for ETL tasks.

Description

ETL task-based incremental data extraction method, device, equipment and medium
Technical Field
The invention relates to the field of data processing, in particular to an incremental data extraction method, device, equipment and medium based on an ETL task.
Background
ETL tasks are a common business process for extracting incremental data. However, because business processing logic differs from task to task (for example, increments from database to database, or from file to database), each ETL task must first be configured, and its data conversion processing set up, according to the business, and a third party must intervene to configure queries when incremental data is extracted. Moreover, when an ETL task is abnormal, the abnormality cannot be located and repaired, which reduces the accuracy of incremental data extraction.
Disclosure of Invention
The embodiments of the invention provide an incremental data extraction method, device, equipment and medium based on an ETL (extract, transform, load) task, to solve the problem that the accuracy of incremental data extraction for ETL tasks is low in the prior art.
An ETL task-based incremental data extraction method comprises the following steps:
obtaining an ETL task from a preset task library, wherein the ETL task comprises a plurality of basic fields;
screening, according to the plurality of basic fields, the ETL tasks containing a preset field as target tasks;
analyzing the target task, and determining a data type corresponding to the target task;
judging whether the target task is running for the first time;
if the target task is running for the first time, determining a target increment limit value corresponding to the target task according to the data type;
if the target task is not running for the first time, acquiring a target value corresponding to the preset field as the target increment limit value corresponding to the target task;
and extracting incremental data of the ETL task according to the target increment limit value.
An ETL task-based incremental data extraction device, comprising:
the acquisition module is used for obtaining an ETL task from a preset task library, wherein the ETL task comprises a plurality of basic fields;
the matching module is used for screening, according to the plurality of basic fields, the ETL tasks containing a preset field as target tasks;
the analysis module is used for analyzing the target task and determining the data type corresponding to the target task;
the judging module is used for judging whether the target task is running for the first time;
the first determining module is used for determining a target increment limit value corresponding to the target task according to the data type if the target task is running for the first time;
the second determining module is used for acquiring a target value corresponding to the preset field as the target increment limit value corresponding to the target task if the target task is not running for the first time;
and the incremental data extraction module is used for extracting incremental data of the ETL task according to the target increment limit value.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the ETL task based incremental data extraction method when executing the computer program.
A computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the steps of the above-described ETL task-based incremental data extraction method.
According to the ETL task-based incremental data extraction method, device, equipment and medium, matching the basic fields against the preset field effectively screens out the target tasks that meet the condition, which improves the accuracy of subsequent incremental data extraction. Analyzing the target task makes the data easier for the server to recognize, so that the analysis result is effectively recognized and the data type is obtained accurately. Finally, incremental data of the ETL task is extracted according to the target increment limit value. Because the target increment limit value is determined in different ways under different conditions, it is obtained more precisely, which avoids third-party intervention; and when an ETL task is abnormal, the abnormal position can be located quickly and accurately from the target increment limit value, which effectively improves the accuracy of incremental data extraction.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive labor.
FIG. 1 is a flowchart of an ETL task-based incremental data extraction method according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S3 in the ETL task-based incremental data extraction method according to an embodiment of the present invention;
FIG. 3 is a flowchart of step S32 in the ETL task-based incremental data extraction method according to an embodiment of the present invention;
FIG. 4 is a flowchart of step S5 in the ETL task-based incremental data extraction method according to an embodiment of the present invention;
FIG. 5 is a flowchart of step S6 in the ETL task-based incremental data extraction method according to an embodiment of the present invention;
FIG. 6 is a flowchart of step S65 in the ETL task-based incremental data extraction method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an ETL task-based incremental data extraction device according to an embodiment of the present invention;
FIG. 8 is a block diagram of the basic structure of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The incremental data extraction method based on the ETL task is applied to the server side, and the server side can be specifically realized by an independent server or a server cluster consisting of a plurality of servers. In one embodiment, as shown in fig. 1, an ETL task-based incremental data extraction method is provided, which includes the following steps:
S1: Obtaining an ETL task from a preset task library, wherein the ETL task comprises a plurality of basic fields.
In the embodiment of the invention, the ETL task is obtained from a preset task library, wherein the preset task library is a database dedicated to storing ETL tasks, and each ETL task comprises at least two basic fields.
It should be emphasized that, to further ensure the privacy and security of the ETL task, the ETL task can also be stored in a node of a blockchain.
S2: Screening, according to the plurality of basic fields, the ETL tasks containing a preset field as target tasks.
In the embodiment of the invention, the ETL task comprises a plurality of basic fields, and these basic fields are matched against the preset field. If a basic field identical to the preset field exists, the ETL task is configured with the preset field and is determined to be a target task; if no basic field identical to the preset field exists, the ETL task is not configured with the preset field and is not processed.
The preset field is a field preset by a user, and may specifically be the increment field incrementallfield.
The increment field is the field corresponding to incremental data, where incremental data is data that has been changed or updated.
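By way of illustration only, the screening in step S2 can be sketched in Python as follows. The task layout (a dictionary with an `id` and a `fields` mapping), the function name `screen_target_tasks`, and the field name `incremental_field` are assumptions made for this sketch and do not appear in the source.

```python
from typing import Dict, Iterable, List

# Hypothetical name of the preset (increment) field; the source only says it is user-defined.
PRESET_FIELD = "incremental_field"


def screen_target_tasks(tasks: Iterable[Dict], preset_field: str = PRESET_FIELD) -> List[Dict]:
    """Return the tasks whose basic fields include the preset field (step S2)."""
    return [task for task in tasks if preset_field in task["fields"]]


# Two illustrative tasks: only the first is configured with the preset field.
tasks = [
    {"id": "t1", "fields": {"source": "orders", "incremental_field": "updated_at"}},
    {"id": "t2", "fields": {"source": "customers", "target": "dw_customers"}},
]
target_tasks = screen_target_tasks(tasks)
```

In this sketch, task `t2` is simply left out of the result, which mirrors the statement that a task without the preset field is not processed.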
S3: Analyzing the target task and determining the data type corresponding to the target task.
In the embodiment of the invention, the target task is imported into a preset analysis port for analysis, and the data type corresponding to the target task is determined from the analysis result in a preset manner.
The preset analysis port is a processing port dedicated to analyzing target tasks.
The preset manner is a processing manner, preset by a user, for determining the data type from the analysis result.
S4: Judging whether the target task is running for the first time.
In the embodiment of the invention, the target task comprises a task id, and whether the target task is running for the first time is judged by matching the task id against the historical task ids in a preset history table.
The preset history table is a data table dedicated to storing historical task ids.
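As a minimal sketch of the check in step S4, the preset history table can be modelled as a set of task ids. The set representation and the function names `is_first_run` and `record_run` are assumptions for illustration; the registration of a first-run task id follows the description given under step S5.

```python
from typing import Set


def is_first_run(task_id: str, history_ids: Set[str]) -> bool:
    """A task is running for the first time if its id is absent from the history table."""
    return task_id not in history_ids


def record_run(task_id: str, history_ids: Set[str]) -> None:
    """Register the task id as a historical task id once the task has run."""
    history_ids.add(task_id)


history = {"t1"}  # stand-in for the preset history table
first = is_first_run("t2", history)   # no matching historical id yet
record_run("t2", history)
second = is_first_run("t2", history)  # the id is now in the history table
```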
S5: If the target task is running for the first time, determining the target increment limit value corresponding to the target task according to the data type.
In the embodiment of the invention, the target increment limit value represents the node at which the ETL task was last updated, and the processing position of the ETL task can be quickly located from this node.
Specifically, according to the matching in step S4, if no historical task id identical to the task id is found, the target task corresponding to the task id has no run record and is running for the first time, and the target increment limit value corresponding to the target task is determined according to the data type and a preset requirement. The preset requirement is a processing requirement, preset by a user, for setting the target increment limit value of a target task according to its data type.
Further, if the target task is running for the first time, the task id corresponding to the target task is added to the preset history table as a historical task id.
S6: If the target task is not running for the first time, acquiring the target value corresponding to the preset field as the target increment limit value corresponding to the target task.
In the embodiment of the invention, according to the matching in step S4, if a historical task id identical to the task id is found, the target task corresponding to the task id has a run record and is not running for the first time; the target value corresponding to the preset field is then queried from a preset record table and used as the target increment limit value corresponding to the target task.
The preset record table is a data table dedicated to recording the target value corresponding to the preset field.
S7: Extracting incremental data of the ETL task according to the target increment limit value.
Specifically, the target increment limit value indicates the node at which the ETL task was last updated. Incremental data extraction is performed by locating the node of the ETL task according to the target increment limit value and extracting the data after that node.
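The extraction in step S7 can be illustrated with an in-memory SQLite table. This is only a sketch under stated assumptions: the table `orders`, the increment column `seq`, the function `extract_increment`, and the reading of "data after the node" as a strict greater-than comparison are all hypothetical and not specified by the source.

```python
import sqlite3
from typing import List, Tuple


def extract_increment(conn: sqlite3.Connection, table: str, field: str, limit: int) -> List[Tuple]:
    """Fetch the rows whose increment field lies strictly after the limit value."""
    # Table and field names are assumed to come from trusted task configuration;
    # the limit value itself is passed as a bound parameter.
    query = f"SELECT * FROM {table} WHERE {field} > ? ORDER BY {field}"
    return conn.execute(query, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, seq INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

# With a target increment limit value of 10, only the rows after seq 10 are extracted.
rows = extract_increment(conn, "orders", "seq", 10)
```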
In this embodiment, matching the basic fields against the preset field effectively screens out the target tasks that meet the condition, which improves the accuracy of subsequent incremental data extraction. Analyzing the target task makes the data easier for the server to recognize, so that the analysis result is effectively recognized and the data type is obtained accurately. Finally, incremental data of the ETL task is extracted according to the target increment limit value. Because the target increment limit value is determined in different ways under different conditions, it is obtained more precisely, which avoids third-party intervention; and when an ETL task is abnormal, the abnormal position can be located quickly and accurately from the target increment limit value, which effectively improves the accuracy of incremental data extraction.
In an embodiment, as shown in fig. 2, step S3, namely analyzing the target task and determining the data type corresponding to the target task, comprises the following steps:
S31: Analyzing the target task to obtain output data corresponding to the target task.
In the embodiment of the invention, the target task is imported into the preset analysis port and analyzed to obtain m data streams and the identification information corresponding to each data stream. The identification information is then checked, and if it is the preset information, the data stream corresponding to that identification information is taken as the output data corresponding to the target task, where m is a positive integer greater than 1.
The preset information is information preset by a user to mark output data.
It should be noted that the output data may also comprise different legal fields, and that the output data may also refer to an output step corresponding to the target task.
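A minimal sketch of the selection in step S31 is given below, assuming that each analyzed data stream is a dictionary carrying an `identifier` and a `data` payload. The stream layout, the marker value `"output"`, and the function name `select_output_data` are hypothetical; the source does not specify how the analysis port represents its results.

```python
from typing import Any, Dict, List

PRESET_INFO = "output"  # hypothetical marker that identifies output data


def select_output_data(streams: List[Dict[str, Any]], preset_info: str = PRESET_INFO) -> List[Any]:
    """Keep the data of every stream whose identification information is the preset information."""
    return [stream["data"] for stream in streams if stream["identifier"] == preset_info]


# Two analyzed data streams (m = 2); only the second is marked as output data.
streams = [
    {"identifier": "input", "data": {"table": "orders"}},
    {"identifier": "output", "data": {"table": "dw_orders", "legal_fields": ["id", "updated_at"]}},
]
output_data = select_output_data(streams)
```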
S32: Identifying whether the output data contains a preset type, and determining the data type corresponding to the target task according to the identification result.
Specifically, the output data comprises different legal fields, and the preset type comprises an increment field corresponding to the preset type. Whether the output data contains the preset type is identified by matching the increment field against the legal fields, and the data type corresponding to the target task is determined from the identification result according to a preset rule.
The preset rule is a rule, set by a user according to actual requirements, for determining the data type corresponding to the target task from the identification result.
In this embodiment, the target task is analyzed to obtain the corresponding output data, whether the output data contains a preset type is identified, and the data type corresponding to the target task is determined according to the identification result. This analysis makes the data easier for the server to recognize and ensures that the data type is identified accurately, which improves the accuracy of subsequently determining the target increment limit value according to the data type.
In an embodiment, as shown in fig. 3, step S32, namely identifying whether the output data contains a preset type and determining the data type corresponding to the target task according to the identification result, comprises the following steps:
S321: Identifying whether the output data contains a preset type.
In the embodiment of the invention, the output data comprises different legal fields; the preset type is an increment type preset by a user, and the preset type comprises an increment field corresponding to the preset type.
It should be noted that an increment type is the data type of an increment, and may specifically be a timestamp, a sequence, or a character string; each increment type comprises a corresponding increment field.
Specifically, whether the output data contains the preset type is identified by matching the increment field against the legal fields.
S322: If the preset type is identified, taking the preset type as the data type corresponding to the target task.
In the embodiment of the invention, according to the matching in step S321, if a legal field identical to the increment field exists, the output data contains the preset type; the preset type corresponding to the increment field is obtained and taken as the data type corresponding to the target task.
S323: If the preset type is not identified, acquiring a basic type from a preset type table, and taking the basic type as the data type corresponding to the target task.
In the embodiment of the invention, according to the matching in step S321, if no legal field identical to the increment field exists, the output data does not contain the preset type; the basic type preset by the user is obtained from the preset type table and taken as the data type corresponding to the target task.
The preset type table is a data table dedicated to storing basic types preset by a user, and the basic types are mainly increment types.
In this embodiment, whether the output data contains a preset type is identified. If the preset type is identified, it is taken as the data type corresponding to the target task; otherwise, the basic type is obtained from the preset type table and taken as the data type corresponding to the target task. Identifying whether the output data contains the preset type effectively determines whether the target task has been given a preset type and accurately identifies the data type corresponding to the target task, which improves the accuracy of subsequently determining the target increment limit value according to the data type.
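Steps S321 to S323 can be sketched as a lookup with a fallback. The mapping `PRESET_TYPES`, its increment field names, the fallback `BASE_TYPE`, and the function `determine_data_type` are assumptions for illustration; the source names timestamp, sequence and character string as increment types but gives no concrete field names, and it does not say which type wins if several increment fields match (this sketch returns the first match).

```python
from typing import Dict, Iterable

# Hypothetical preset types, each mapped to its increment field name.
PRESET_TYPES: Dict[str, str] = {
    "timestamp": "updated_at",
    "sequence": "seq_no",
    "string": "batch_code",
}
BASE_TYPE = "timestamp"  # hypothetical basic type read from the preset type table


def determine_data_type(
    legal_fields: Iterable[str],
    preset_types: Dict[str, str] = PRESET_TYPES,
    base_type: str = BASE_TYPE,
) -> str:
    """Return the preset type whose increment field is among the legal fields, else the basic type."""
    fields = set(legal_fields)
    for type_name, increment_field in preset_types.items():
        if increment_field in fields:   # S322: preset type identified
            return type_name
    return base_type                    # S323: fall back to the basic type


matched = determine_data_type(["id", "seq_no"])
fallback = determine_data_type(["id", "name"])
```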
In an embodiment, as shown in fig. 4, step S5, namely determining the target increment limit value corresponding to the target task according to the data type if the target task is running for the first time, comprises the following steps:
S51: If the target task is running for the first time, querying a preset configuration table for whether the target task has a configuration value.
In the embodiment of the invention, if the target task is running for the first time, the preset configuration table is queried for a configuration value, wherein the preset configuration table is a data table dedicated to storing configuration values.
It should be noted that, when no configuration value exists, the preset configuration table is empty.
S52: When a configuration value is found, taking the configuration value as the target increment limit value.
Specifically, when a configuration value is found in the preset configuration table, the configuration value is directly used as the target increment limit value.
S53: When no configuration value is found, querying a preset database for the minimum basic value corresponding to the data type, and taking the minimum basic value as the target increment limit value corresponding to the target task.
Specifically, when the query on the preset configuration table returns empty, that is, no configuration value is found, the data type corresponding to the target task is matched against the basic types in the preset database. All basic values corresponding to the successfully matched basic type are selected and sorted from small to large, the smallest of them is selected as the minimum basic value corresponding to the data type, and the minimum basic value is used as the target increment limit value corresponding to the target task.
The preset database is a database dedicated to storing different basic types and the basic values corresponding to each basic type, and it must contain a basic type identical to the data type.
For example, the preset database holds basic types A and B, where the basic values corresponding to A are 3, 1 and 2, and the basic values corresponding to B are 2, 4 and 3. If the data type corresponding to the target task is A, matching the data type against the basic types finds that it is the same as basic type A, so the basic values 3, 1 and 2 corresponding to A are obtained. Sorting them from small to large gives a minimum basic value of 1, so 1 is taken as the target increment limit value corresponding to the target task.
In this embodiment, if the target task is running for the first time, the preset configuration table is queried for whether the target task has a configuration value. When a configuration value is found, it is used as the target increment limit value; otherwise, the minimum basic value corresponding to the data type is obtained and used as the target increment limit value corresponding to the target task. When the target task is running for the first time, querying for a configuration value allows the target increment limit value to be determined from different values under different conditions, which effectively improves the precision of the target increment limit value and the accuracy with which it is obtained.
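The first-run logic of steps S51 to S53 can be sketched with two dictionaries standing in for the preset configuration table and the preset database. The function name `first_run_limit`, the per-task keying of the configuration table, and the dictionary layout are assumptions for illustration; the basic values for types A and B are those of the example in the text.

```python
from typing import Dict, List, Optional


def first_run_limit(
    task_id: str,
    data_type: str,
    config_table: Dict[str, int],
    base_values: Dict[str, List[int]],
) -> int:
    """Use the configured value if one exists, else the minimum basic value of the data type."""
    configured: Optional[int] = config_table.get(task_id)
    if configured is not None:          # S52: a configuration value was found
        return configured
    # S53: the source states that a basic type identical to the data type must exist.
    return min(base_values[data_type])


base_values = {"A": [3, 1, 2], "B": [2, 4, 3]}   # basic values from the example in the text

limit_default = first_run_limit("t1", "A", {}, base_values)            # empty configuration table
limit_configured = first_run_limit("t1", "A", {"t1": 7}, base_values)  # configured value takes priority
```

Taking `min` directly gives the same result as sorting the basic values from small to large and selecting the first one.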
In an embodiment, as shown in fig. 5, step S6, namely acquiring the target value corresponding to the preset field as the target increment limit value corresponding to the target task if the target task is not running for the first time, comprises the following steps:
S61: If the target task is not running for the first time, acquiring the detection state of the target task from a preset log table.
In the embodiment of the invention, if the target task is not running for the first time, the detection state of the target task is obtained from the preset log table. The preset log table is a data table dedicated to recording the detection state corresponding to the target task.
S62: Screening out, by matching the detection state against a preset state, the target task whose detection state matches the preset state as an initial task.
Specifically, the detection state obtained in step S61 is matched against the preset state. If the detection state is the same as the preset state, the target task corresponding to the detection state is in the increment limit detection state and is taken as the initial task.
The preset state is a processing state, preset by a user, for judging whether increment limit detection is being performed.
Increment limit detection is a detection method, preset by a user, that checks a task using the increment limit value.
S63: Judging whether the output data corresponding to the initial task is output to a database.
Specifically, the target end identifier of the output data corresponding to the initial task is obtained from a preset monitoring table and identified, and whether the output data is output to a database is judged according to the identification result. The preset monitoring table is a data table dedicated to storing the target end identifier of the output data corresponding to the initial task.
It should be noted that the target end identifier indicates the output object of the output data, where the output object may be a database or another platform preset by a user, such as a file. If the target end identifier is a database, the output data is output to the database; otherwise, the output data is output to another platform.
S64: If the output data is output to the database, querying the maximum value corresponding to the preset field in the database and the historical cache value corresponding to the database, wherein the database comprises the preset field.
Specifically, according to the judgment in step S63, when the target end identifier is identified as a database, the output data is output to the database. The n data values corresponding to the preset field of the database are obtained from a preset value table, and the largest of the n data values is selected as the maximum value; the historical cache value corresponding to the database is obtained from a preset cache table. Here n is a positive integer greater than 1.
The preset value table is a data table dedicated to storing the preset field of the database and the n data values corresponding to the preset field.
The preset cache table is a data table dedicated to recording the historical cache value corresponding to the database.
S65: Comparing the maximum value with the historical cache value, and determining the target increment limit value corresponding to the target task according to the comparison result.
Specifically, the maximum value obtained in step S64 is compared with the historical cache value, and the target increment limit value corresponding to the target task is determined from the comparison result in a preset determination manner.
S66: If the output data is not output to the database, acquiring the updated cache value corresponding to the output data as the target increment limit value.
Specifically, according to the judgment in step S63, when the target end identifier is not a database, the output data is not output to the database; the updated cache value corresponding to the output data is obtained from a preset target table and used as the target increment limit value.
The preset target table is a data table dedicated to recording the updated cache value corresponding to the output data.
In this embodiment, if the target task is not running for the first time, the detection state corresponding to the target task is obtained and matched against the preset state to screen out the initial task, and whether the output data corresponding to the initial task is output to a database is judged. If the output data is output to the database, the maximum value corresponding to the preset field in the database and the historical cache value corresponding to the database are obtained and compared, and the target increment limit value is determined according to the comparison result; if the output data is not output to the database, the updated cache value is obtained as the target increment limit value. When the target task is not running for the first time, further screening the initial task and judging whether its output data is output to a database allows the target task to be analyzed in depth and ties the determination to the output object, which further improves the accuracy of determining the target increment limit value and ensures the accuracy of extracting incremental data according to it.
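The branch structure of steps S61 to S66 can be sketched as follows, with one dictionary per preset table. All names here are hypothetical (`non_first_run_limit`, the state value `"limit_check"`, the identifier value `"database"`, and the per-task keying of each table). The source does not say what happens to a task whose detection state does not match the preset state, so returning `None` in that case is an assumption of this sketch.

```python
from typing import Dict, List, Optional

PRESET_STATE = "limit_check"  # hypothetical value of the preset state


def non_first_run_limit(
    task_id: str,
    log_table: Dict[str, str],          # task id -> detection state (S61)
    monitor_table: Dict[str, str],      # task id -> target end identifier (S63)
    value_table: Dict[str, List[int]],  # task id -> data values of the preset field (S64)
    cache_table: Dict[str, int],        # task id -> historical cache value (S64)
    target_table: Dict[str, int],       # task id -> updated cache value (S66)
) -> Optional[int]:
    """Determine the target increment limit value for a task that has run before."""
    if log_table.get(task_id) != PRESET_STATE:
        return None  # assumption: tasks outside the preset state are not handled here
    if monitor_table.get(task_id) == "database":
        maximum = max(value_table[task_id])
        cached = cache_table[task_id]
        # S65: the smaller of the maximum value and the historical cache value is used.
        return maximum if maximum < cached else cached
    return target_table[task_id]  # S66: output went to another platform, such as a file


log = {"t1": "limit_check", "t2": "limit_check", "t3": "idle"}
monitor = {"t1": "database", "t2": "file"}
limit_db = non_first_run_limit("t1", log, monitor, {"t1": [5, 9, 7]}, {"t1": 8}, {})
limit_file = non_first_run_limit("t2", log, monitor, {}, {}, {"t2": 42})
limit_skipped = non_first_run_limit("t3", log, monitor, {}, {}, {})
```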
In one embodiment, as shown in fig. 6, the step S65 of comparing the maximum value with the historical cache value and determining the target increment limit value corresponding to the target task according to the comparison result includes the following steps:
S651: the maximum value is compared with the historical cache value.
Specifically, the maximum value is compared with the historical cache value.
S652: and if the maximum value is smaller than the historical cache value, taking the maximum value as a target increment limit value.
Specifically, according to the comparison in step S651, if the maximum value is smaller than the historical cache value, the maximum value is used as the target increment limit value.
S653: and if the maximum value is larger than or equal to the historical cache value, taking the historical cache value as a target increment limit value.
Specifically, according to the comparison in step S651, if the maximum value is greater than or equal to the historical cache value, the historical cache value is used as the target increment limit value.
In this embodiment, comparing the maximum value with the historical cache value allows different values to be used as the target increment limit value under different conditions. This avoids the loss of accuracy that would result from always determining the target increment limit value from a single source, and ensures the accuracy of subsequently extracting the incremental data according to the target increment limit value.
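As a non-limiting illustration, the comparison of steps S651 to S653 can be sketched as follows. The function name `resolve_increment_limit` is hypothetical, and the sketch assumes both values are of a mutually comparable type (for example integers or timestamps), which the description does not specify.

```python
def resolve_increment_limit(max_value, historical_cache_value):
    """Choose the target increment limit value per steps S651 to S653.

    Hypothetical helper: both arguments are assumed to be comparable
    values of the same type (e.g. integers or timestamps).
    """
    # S652: the maximum value is smaller, so it becomes the limit value.
    if max_value < historical_cache_value:
        return max_value
    # S653: the maximum value is greater than or equal to the cache value,
    # so the historical cache value becomes the limit value.
    return historical_cache_value
```

In effect, the smaller of the two values is selected as the target increment limit value.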
In an embodiment, after step S1, the method for extracting incremental data based on ETL task further includes the following steps:
The ETL task is stored in the blockchain.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, an incremental data extraction device based on an ETL task is provided, which corresponds one-to-one to the ETL task-based incremental data extraction method in the above embodiments. As shown in fig. 7, the incremental data extraction device based on an ETL task includes an obtaining module 71, a matching module 72, an analyzing module 73, a judging module 74, a first determining module 75, a second determining module 76 and an incremental data extraction module 77. The functional modules are explained in detail as follows:
an obtaining module 71, configured to obtain an ETL task from a preset task library, where the ETL task includes a plurality of basic fields; it should be emphasized that, in order to further ensure the privacy and security of the ETL task, the ETL task may also be stored in a node of a block chain;
a matching module 72, configured to screen out, from the multiple basic fields, an ETL task including a preset field as a target task;
the analysis module 73 is configured to perform analysis processing on the target task and determine a data type corresponding to the target task;
a judging module 74, configured to judge whether the target task belongs to a first operation;
a first determining module 75, configured to determine, if the target task belongs to the first operation, a target increment limit value corresponding to the target task according to the data type;
a second determining module 76, configured to, if the target task does not belong to the first operation, obtain a target value corresponding to the preset field, as a target increment limit value corresponding to the target task;
and an incremental data extraction module 77, configured to perform incremental data extraction on the ETL task according to the target incremental limit value.
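As a non-limiting illustration, the cooperation of the matching module, the determining modules, and the extraction module can be sketched as follows. All names here (`PRESET_FIELD`, `select_target_tasks`, `choose_limit`, `extract_incremental`, the dictionary layout of a task) are hypothetical, and the sketch assumes that incremental extraction keeps the rows whose preset field is strictly greater than the limit value; the description does not fix the comparison direction.

```python
PRESET_FIELD = "update_time"  # hypothetical name of the preset field


def select_target_tasks(etl_tasks, preset_field=PRESET_FIELD):
    """Matching module: keep the ETL tasks whose basic fields contain the preset field."""
    return [task for task in etl_tasks if preset_field in task["basic_fields"]]


def choose_limit(is_first_run, first_run_limit, target_value):
    """Judging module plus determining modules: pick the limit for this run."""
    return first_run_limit if is_first_run else target_value


def extract_incremental(rows, preset_field, limit):
    """Extraction module (assumed semantics): rows newer than the limit value."""
    return [row for row in rows if row[preset_field] > limit]
```

For example, with a limit value of 10 only the rows whose `update_time` exceeds 10 would be extracted on that run.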
Further, the parsing module 73 includes:
the output data acquisition submodule is used for analyzing the target task and acquiring output data corresponding to the target task;
and the data type determining submodule is used for identifying whether the output data contains a preset type or not and determining the data type corresponding to the target task according to the identification result.
Further, the data type determining submodule comprises:
the identification unit is used for identifying whether the output data contains a preset type;
the first identification unit is used for taking the preset type as a data type corresponding to the target task if the preset type is identified;
and the second identification unit is used for acquiring the basic type from the preset type table if the preset type is not identified, and taking the basic type as the data type corresponding to the target task.
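As a non-limiting illustration, the identification units above can be sketched as follows. The contents of `PRESET_TYPES` and `PRESET_TYPE_TABLE`, the `"type"` key, and the function name are assumptions made only for this sketch; the description does not enumerate the preset types or the layout of the preset type table.

```python
# Hypothetical preset types and preset type table (not taken from the description).
PRESET_TYPES = ("timestamp", "integer")
PRESET_TYPE_TABLE = {"basic_type": "string"}


def determine_data_type(output_data):
    """Return the data type corresponding to the target task.

    `output_data` is assumed to be a dict that may carry a "type" entry.
    """
    declared = output_data.get("type")
    # First identification unit: a preset type was identified.
    if declared in PRESET_TYPES:
        return declared
    # Second identification unit: fall back to the basic type from the table.
    return PRESET_TYPE_TABLE["basic_type"]
```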
Further, the first determining module 75 includes:
the query submodule is used for querying whether the target task has a configuration value from a preset configuration table if the target task is operated for the first time;
the first query submodule is used for taking the configuration value as a target increment limit value when the configuration value is queried;
and the second query submodule is used for querying the minimum basic value corresponding to the data type from the preset database when the configuration value is not queried, and taking the minimum basic value as a target increment limit value corresponding to the target task.
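As a non-limiting illustration, the query submodules above can be sketched as follows. The in-memory dictionaries stand in for the preset configuration table and the preset database, and their contents, like the function name `first_run_limit`, are hypothetical.

```python
# Hypothetical stand-in for the preset configuration table.
CONFIG_TABLE = {"task_a": 1000}
# Hypothetical stand-in for the minimum basic values held in the preset database.
MIN_BASIC_VALUES = {"integer": 0, "timestamp": "1970-01-01 00:00:00"}


def first_run_limit(task_id, data_type):
    """Determine the target increment limit value for a first-run target task."""
    configured = CONFIG_TABLE.get(task_id)
    # First query submodule: a configuration value was found.
    if configured is not None:
        return configured
    # Second query submodule: use the minimum basic value for the data type.
    return MIN_BASIC_VALUES[data_type]
```

Starting from the minimum basic value on a first run means that, absent a configured value, the first extraction is not narrowed by a previously recorded limit.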
Further, the second determination module 76 includes:
the detection state acquisition submodule is used for acquiring the detection state of the target task from a preset log table if the target task does not belong to the first operation;
the screening submodule is used for screening out a target task of which the detection state is matched with the preset state as an initial task in a mode of matching the detection state with the preset state;
the output judgment submodule is used for judging whether the output data corresponding to the initial task is output to the database or not;
the first output judgment submodule is used for inquiring the maximum value corresponding to the preset field in the database and the historical cache value corresponding to the database if the output data is output to the database, wherein the database comprises the preset field;
the first comparison submodule is used for comparing the maximum value with the historical cache value and determining a target increment limit value corresponding to the target task according to the comparison result;
and the second output judgment submodule is used for acquiring an updated cache value corresponding to the output data as a target increment limit value if the output data is not output to the database.
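As a non-limiting illustration, the submodules of the second determining module can be sketched as follows. The `TargetTask` record, the `"success"` preset state, and the choice to return `None` for a task whose detection state does not match are assumptions of this sketch; the description does not state what the preset state is or how non-matching tasks are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TargetTask:
    """Hypothetical record holding the values the submodules would look up."""
    task_id: str
    outputs_to_database: bool
    max_field_value: Optional[int] = None         # maximum of the preset field in the database
    historical_cache_value: Optional[int] = None  # cache value recorded for the database
    updated_cache_value: Optional[int] = None     # value from the preset target table


def non_first_run_limit(task, log_table, preset_state="success"):
    """Determine the limit value for a target task that has run before."""
    # Screening submodule: only tasks whose detection state matches proceed.
    if log_table.get(task.task_id) != preset_state:
        return None  # assumption: non-matching tasks yield no limit value
    if task.outputs_to_database:
        # Comparison submodule: the smaller of the two values is used.
        return min(task.max_field_value, task.historical_cache_value)
    # Output is not written to the database: use the updated cache value.
    return task.updated_cache_value
```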
Further, the first comparison sub-module includes:
the second comparison unit is used for comparing the maximum value with the historical cache value;
the first comparison result unit is used for taking the maximum value as a target increment limit value if the maximum value is smaller than the historical cache value;
and the second comparison result unit is used for taking the historical cache value as the target increment limit value if the maximum value is greater than or equal to the historical cache value.
Further, the ETL task-based incremental data extraction device further includes:
and the storage module is used for storing the ETL tasks into the block chain.
Some embodiments of the present application disclose a computer device. Referring specifically to fig. 8, a basic structure block diagram of a computer device 90 according to an embodiment of the present application is shown.
As illustrated in fig. 8, the computer device 90 includes a memory 91, a processor 92, and a network interface 93 communicatively connected to each other through a system bus. It is noted that only a computer device 90 having components 91-93 is shown in FIG. 8, but it is understood that not all of the shown components are required to be implemented, and that more or fewer components may alternatively be implemented. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 91 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 91 may be an internal storage unit of the computer device 90, such as a hard disk or a memory of the computer device 90. In other embodiments, the memory 91 may also be an external storage device of the computer device 90, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 90. Of course, the memory 91 may also include both internal and external storage units of the computer device 90. In this embodiment, the memory 91 is generally used for storing an operating system installed on the computer device 90 and various types of application software, such as program codes of the ETL task-based incremental data extraction method. Further, the memory 91 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 92 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 92 is typically used to control the overall operation of the computer device 90. In this embodiment, the processor 92 is configured to run a program code stored in the memory 91 or process data, for example, run a program code of the ETL task-based incremental data extraction method.
The network interface 93 may include a wireless network interface or a wired network interface, and the network interface 93 is generally used to establish a communication connection between the computer device 90 and other electronic devices.
The present application further provides another embodiment, namely a computer-readable storage medium storing an ETL task information entry program, where the ETL task information entry program is executable by at least one processor, so as to cause the at least one processor to perform the steps of any of the ETL task-based incremental data extraction methods described above.
It is emphasized that, to further ensure the privacy and security of the ETL task, the ETL task can also be stored in a node of a blockchain.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a computer device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
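As a non-limiting illustration of data blocks linked by a cryptographic method, the following minimal sketch chains blocks by SHA-256 digest using the Python standard library. It is an assumption-laden toy, not the storage scheme of this application: it has no consensus mechanism or peer-to-peer layer, and the names `make_block`, `is_valid_chain`, and the payload layout are hypothetical.

```python
import hashlib
import json


def _digest(payload, previous_hash):
    """SHA-256 over a canonical JSON encoding of the block body."""
    body = {"payload": payload, "previous_hash": previous_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(payload, previous_hash):
    """Create a block whose hash covers its payload and the previous block's hash."""
    return {
        "payload": payload,
        "previous_hash": previous_hash,
        "hash": _digest(payload, previous_hash),
    }


def is_valid_chain(chain):
    """Check every block's own hash and its link to the preceding block."""
    for index, block in enumerate(chain):
        if block["hash"] != _digest(block["payload"], block["previous_hash"]):
            return False
        if index > 0 and block["previous_hash"] != chain[index - 1]["hash"]:
            return False
    return True
```

Because each hash covers the previous block's hash, altering a stored ETL task in an earlier block invalidates the check for that block unless every later hash is recomputed.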
Finally, it should be noted that the above-mentioned embodiments illustrate only some of the embodiments of the present application and are therefore not to be considered limiting of its scope. This application can be embodied in many different forms, and the embodiments are provided so that the disclosure of the application can be thoroughly understood. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or some of their features replaced with equivalents. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, are within the protection scope of the present application.

Claims (10)

1. An ETL task-based incremental data extraction method is characterized by comprising the following steps:
obtaining an ETL task from a preset task library, wherein the ETL task comprises a plurality of basic fields;
screening ETL tasks containing preset fields from the basic fields as target tasks;
analyzing the target task, and determining a data type corresponding to the target task;
judging whether the target task belongs to the first operation;
if the target task is operated for the first time, determining a target increment limit value corresponding to the target task according to the data type;
if the target task does not belong to the first operation, acquiring a target value corresponding to the preset field as a target increment limit value corresponding to the target task;
and extracting incremental data of the ETL task according to the target incremental limit value.
2. The ETL task-based incremental data extraction method of claim 1, wherein the step of analyzing the target task and determining the data type corresponding to the target task comprises:
analyzing the target task to obtain output data corresponding to the target task;
and identifying whether the output data contains a preset type, and determining the data type corresponding to the target task according to the identification result.
3. The ETL task-based incremental data extraction method of claim 2, wherein the step of identifying whether the output data contains a preset type and determining the data type corresponding to the target task according to the identification result comprises:
identifying whether the output data contains a preset type;
if a preset type is identified, taking the preset type as a data type corresponding to the target task;
and if the preset type is not identified, acquiring a basic type from a preset type table, and taking the basic type as a data type corresponding to the target task.
4. The ETL task-based incremental data extraction method of claim 1, wherein if the target task is run for the first time, the step of determining the target increment limit value corresponding to the target task according to the data type comprises:
if the target task is operated for the first time, inquiring whether the target task has a configuration value from a preset configuration table;
when the configuration value is inquired, taking the configuration value as the target increment limit value;
and when the configuration value is not inquired, inquiring a minimum basic value corresponding to the data type from a preset database, and taking the minimum basic value as a target increment limit value corresponding to the target task.
5. The ETL task-based incremental data extraction method of claim 2, wherein if the target task does not belong to the first operation, the step of obtaining the target value corresponding to the preset field as the target increment limit value corresponding to the target task comprises:
if the target task does not belong to the first operation, acquiring the detection state of the target task from a preset log table;
screening out a target task with the detection state matched with the preset state as an initial task in a mode of matching the detection state with the preset state;
judging whether the output data corresponding to the initial task is output to a database or not;
if the output data is output to a database, inquiring a maximum value corresponding to a preset field in the database and a historical cache value corresponding to the database, wherein the database comprises the preset field;
comparing the maximum value with the historical cache value, and determining a target increment limit value corresponding to the target task according to a comparison result;
and if the output data is not output to the database, acquiring an updated cache value corresponding to the output data as the target increment limit value.
6. The ETL task based incremental data extraction method of claim 5, wherein the step of comparing the maximum value with the historical cache value and determining a target increment limit value corresponding to the target task according to the comparison result comprises:
comparing the maximum value to the historical cache value;
if the maximum value is smaller than the historical cache value, taking the maximum value as the target increment limit value;
and if the maximum value is larger than or equal to the historical cache value, taking the historical cache value as the target increment limit value.
7. The ETL task-based incremental data extraction method according to claim 1, wherein after the step of obtaining the ETL task from the preset task library, the ETL task-based incremental data extraction method further comprises:
storing the ETL task into a blockchain.
8. An ETL task-based incremental data extraction device, which is characterized by comprising:
the acquisition module is used for acquiring an ETL task from a preset task library, wherein the ETL task comprises a plurality of basic fields;
the matching module is used for screening out ETL tasks containing preset fields from the basic fields as target tasks;
the analysis module is used for analyzing the target task and determining the data type corresponding to the target task;
the judging module is used for judging whether the target task is operated for the first time;
the first determining module is used for determining a target increment limit value corresponding to the target task according to the data type if the target task runs for the first time;
the second determining module is used for acquiring a target value corresponding to the preset field as a target increment limit value corresponding to the target task if the target task does not belong to the first operation;
and the incremental data extraction module is used for extracting incremental data of the ETL task according to the target incremental limit value.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the ETL task based incremental data extraction method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the ETL task based incremental data extraction method according to any one of claims 1 to 7.
CN202010610186.8A 2020-06-29 2020-06-29 ETL task-based incremental data extraction method, device, equipment and medium Pending CN111813845A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010610186.8A CN111813845A (en) 2020-06-29 2020-06-29 ETL task-based incremental data extraction method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010610186.8A CN111813845A (en) 2020-06-29 2020-06-29 ETL task-based incremental data extraction method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN111813845A true CN111813845A (en) 2020-10-23

Family

ID=72856231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010610186.8A Pending CN111813845A (en) 2020-06-29 2020-06-29 ETL task-based incremental data extraction method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN111813845A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915336A (en) * 2012-09-18 2013-02-06 北京金和软件股份有限公司 Incremental data capturing and extraction method based on timestamps and logs
CN103823797A (en) * 2012-11-16 2014-05-28 镇江诺尼基智能技术有限公司 FTP (file transfer protocol) based real-time industry database data synchronization system
CN105069142A (en) * 2015-08-18 2015-11-18 山大地纬软件股份有限公司 System and method for extraction, transformation and distribution of data increments
CN105488187A (en) * 2015-12-02 2016-04-13 北京四达时代软件技术股份有限公司 Method and device for extracting multi-source heterogeneous data increment
CN105677536A (en) * 2016-01-08 2016-06-15 上海斐讯数据通信技术有限公司 Implementing method for task messages and task system for implementing task messages
CN108681590A (en) * 2018-05-15 2018-10-19 普信恒业科技发展(北京)有限公司 Incremental data processing method and processing device, computer equipment, computer storage media
CN109271435A (en) * 2018-09-14 2019-01-25 南威软件股份有限公司 A kind of data pick-up method and system for supporting breakpoint transmission
CN109992621A (en) * 2019-04-11 2019-07-09 郭承湘 Foods supervision information resources increment ETL system and method
CN110765091A (en) * 2019-09-09 2020-02-07 上海陆家嘴国际金融资产交易市场股份有限公司 Account checking method and system
CN111061554A (en) * 2019-12-17 2020-04-24 深圳前海环融联易信息科技服务有限公司 Intelligent task scheduling method and device, computer equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113641742A (en) * 2021-08-05 2021-11-12 广东电网有限责任公司 Data extraction method, device, equipment and storage medium
CN113886478A (en) * 2021-09-30 2022-01-04 杭州数梦工场科技有限公司 Data processing method and device applied to ETL (extract transform load) and electronic equipment
CN113961572A (en) * 2021-12-23 2022-01-21 中电云数智科技有限公司 Database synchronization method and synchronization device based on increment field
CN115292021A (en) * 2022-09-28 2022-11-04 江西萤火虫微电子科技有限公司 Task scheduling method, system, electronic device and readable storage medium
CN116303702A (en) * 2022-12-27 2023-06-23 易方达基金管理有限公司 ETL-based data parallel processing method, device, equipment and storage medium
CN116303702B (en) * 2022-12-27 2024-04-05 易方达基金管理有限公司 ETL-based data parallel processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right
Effective date of registration: 20210218
Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)
Applicant after: Shenzhen saiante Technology Service Co.,Ltd.
Address before: 1-34 / F, Qianhai free trade building, 3048 Xinghai Avenue, Mawan, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong 518000
Applicant before: Ping An International Smart City Technology Co.,Ltd.
SE01 Entry into force of request for substantive examination