Disclosure of Invention
Aiming at the defects in the prior art, the invention provides the automatic ETL task processing method and device, which can effectively liberate manpower, save the manpower time cost and improve the efficiency.
In order to achieve the purpose, the invention provides the following technical scheme:
in a first aspect, the present invention provides an ETL task automatic processing method, including:
putting data collected from a source database and a LOG LOG into an STG layer data sheet;
and detecting whether the STG layer data table has corresponding ODS layer data tables, if not, creating the corresponding ODS layer data tables, if so, detecting whether the data table fields in the STG layer data table have corresponding fields in the ODS layer data tables, and if not, supplementing the corresponding fields in the ODS layer data tables.
Further, when creating a corresponding ODS layer data table, the table name of the ODS layer data table follows the following naming convention: data layer name _ source library name _ source table name _ update frequency and extraction mode.
Further, when the corresponding fields are supplemented in the ODS layer data table, the field names are consistent with those in the standard data dictionary, which contains the following fields: STG layer table name, STG layer table field English name, STG layer table field Chinese name, STG layer table field data type, ODS layer table name, ODS layer table field standard English name, ODS layer table field standard Chinese name, ODS layer table field standard data type, and ODS layer table field default values.
Further, the method further comprises: data is transferred from the STG layer data table to the ODS layer data table.
Further, the method further comprises: when the ETL task fails, judging whether the failure reason is that an ETL scheduling system is unstable or not, if so, starting the ETL task for retry, otherwise, judging whether the ETL task is modified by a programmer, if so, rolling back the ETL task to a task version before modification to ensure the normal execution of the ETL task, sending an email to inform a responsible person corresponding to the ETL task that the task version is rolled back, otherwise, judging whether the ETL task which fails is the ETL task with the data asset level higher than a first preset threshold or the downstream influence range larger than a second preset threshold or not, if so, sending prompt information to prompt an operation and maintenance attendant to process in time, otherwise, storing the failure task to be processed by the operation and maintenance attendant or the ETL responsible person at normal working time.
In a second aspect, the present invention further provides an ETL task automatic processing device, including:
the transfer module is used for putting data collected from the source database and the LOG LOG into an STG layer data table;
and the detection module is used for detecting whether the STG layer data table has the corresponding ODS layer data table, if not, creating the corresponding ODS layer data table, if so, detecting whether the data table fields in the STG layer data table have the corresponding fields in the ODS layer data table, and if not, supplementing the corresponding fields in the ODS layer data table.
Further, when the detection module creates the corresponding ODS layer data table, the table name of the ODS layer data table follows the following naming convention: data layer name _ source library name _ source table name _ update frequency and extraction mode.
Further, when the detection module supplements corresponding fields in the ODS layer data table, field names are consistent with those in a standard data dictionary, wherein the standard data dictionary comprises the following fields: STG layer table name, STG layer table field English name, STG layer table field Chinese name, STG layer table field data type, ODS layer table name, ODS layer table field standard English name, ODS layer table field standard Chinese name, ODS layer table field standard data type, and ODS layer table field default values.
In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the ETL task automatic processing method according to the first aspect are implemented.
In a fourth aspect, the present invention also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the ETL task automatic processing method according to the first aspect.
According to the technical scheme, the ETL task automatic processing method provided by the invention comprises the steps of firstly putting data collected from a source database and a LOG LOG into an STG layer data table, then detecting whether the STG layer data table has a corresponding ODS layer data table, if not, creating the corresponding ODS layer data table, if so, detecting whether the data table fields in the STG layer data table have corresponding fields in the ODS layer data table, and if not, supplementing the corresponding fields in the ODS layer data table. Therefore, the invention can realize the automatic processing of the ETL task, thereby effectively liberating the manpower, saving the manpower time cost and simultaneously improving the efficiency.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides an ETL task automatic processing method, referring to fig. 1 and fig. 2, the method includes the following steps:
step 101: and putting the data collected from the source database and the LOG LOG into an STG layer data table.
Step 102: and detecting whether the STG layer data table has corresponding ODS layer data tables, if not, creating the corresponding ODS layer data tables, if so, detecting whether the data table fields in the STG layer data table have corresponding fields in the ODS layer data tables, and if not, supplementing the corresponding fields in the ODS layer data tables.
As can be seen from the above description, in the ETL task automatic processing method provided in this embodiment, first, data collected from a source database and a LOG are put into an STG layer data table, and then whether the STG layer data table has a corresponding ODS layer data table is detected, if not, a corresponding ODS layer data table is created, if yes, whether data table fields in the STG layer data table have corresponding fields in the ODS layer data table is detected, and if not, corresponding fields are supplemented in the ODS layer data table. Therefore, the automatic processing of the ETL task can be realized, so that the manpower can be effectively liberated, the manpower time cost is saved, and the efficiency is improved.
In a preferred embodiment, the table names of the ODS layer data tables, when creating the corresponding ODS layer data tables, follow the following naming convention: data layer name _ source library name _ source table name _ update frequency and extraction mode. Such as ods _ uc _ user _ da, where the naming information is in the metadata of the STG layer data table.
In a preferred embodiment, the field names are kept consistent with those in a standard data dictionary containing the following fields when the corresponding fields are supplemented in the ODS layer data table: STG layer table name, STG layer table field English name, STG layer table field Chinese name, STG layer table field data type, ODS layer table name, ODS layer table field standard English name, ODS layer table field standard Chinese name, ODS layer table field standard data type, and ODS layer table field default values.
It can be understood that the data in the current standard data dictionary is used as a training set for machine learning, and based on the algorithm such as k-nearest neighbor in machine learning, the naming, data type and default values of the ODS layer table field are obtained, and finally the ETL task of creating the table or adding the field is completed.
In a preferred embodiment, after performing the above step 101-102, the method further includes:
step 103: data is transferred from the STG layer data table to the ODS layer data table.
It can be understood that after the above processing of steps 101-102 is completed, the data cleaning and standardization work from the STG layer to the ODS layer is completed, so that the data can be more easily transferred from the STG layer data table to the ODS layer data table.
In the actual operation and maintenance process, ETL tasks fail, which is always a nightmare troubling operators on duty. The ETL failure tasks must be processed frequently overnight, otherwise the downstream ETL tasks are influenced, and the work of decision support, analysts and operators and the like is influenced. However, it is not reasonable to immediately process the ETL when the ETL reports an error, and for this reason, in a preferred embodiment, the automatic processing method for the ETL task further includes the following processing procedures:
when the ETL task fails, judging whether the failure reason is that an ETL scheduling system is unstable or not, if so, starting the ETL task for retry, otherwise, judging whether the ETL task is modified by a programmer, if so, rolling back the ETL task to a task version before modification to ensure the normal execution of the ETL task, sending an email to inform a responsible person corresponding to the ETL task that the task version is rolled back, otherwise, judging whether the ETL task which fails is the ETL task with the data asset level higher than a first preset threshold or the downstream influence range larger than a second preset threshold or not, if so, sending prompt information to prompt an operation and maintenance attendant to process in time, otherwise, storing the failure task to be processed by the operation and maintenance attendant or the ETL responsible person at normal working time.
It can be understood that, through the above judgment and processing procedures, unnecessary night work of the operation and maintenance attendant can be avoided, so that the working intensity of the operation and maintenance attendant can be reduced.
Therefore, the ETL task automatic processing method provided by the embodiment can automatically establish the table and add the field, name the field and standardize the data type, thereby improving the efficiency, liberating the manpower and saving the time cost of expensive engineers. In addition, according to the automatic processing method for the ETL task, which is provided by the embodiment, because the failed ETL task can be automatically processed, the influence range of the task can be reduced, and frequent night visits of operation and maintenance operators can be reduced.
Based on the same inventive concept, another embodiment of the present invention provides an ETL task automatic processing device, referring to fig. 3, the device including: a transfer module 21 and a detection module 22, wherein:
the transfer module 21 is configured to put data collected from the source database and the LOG into the STG layer data table;
the detection module 22 is configured to detect whether the STG layer data table has a corresponding ODS layer data table, if not, create a corresponding ODS layer data table, if so, detect whether the data table fields in the STG layer data table have corresponding fields in the ODS layer data table, and if not, supplement the corresponding fields in the ODS layer data table.
In a preferred embodiment, when the detection module 22 creates a corresponding ODS layer data table, the table name of the ODS layer data table follows the following naming convention: data layer name _ source library name _ source table name _ update frequency and extraction mode.
In a preferred embodiment, the detection module 22 is configured to keep the field names consistent with those in a standard data dictionary when the corresponding fields are supplemented in the ODS layer data table, where the standard data dictionary includes the following fields: STG layer table name, STG layer table field English name, STG layer table field Chinese name, STG layer table field data type, ODS layer table name, ODS layer table field standard English name, ODS layer table field standard Chinese name, ODS layer table field standard data type, and ODS layer table field default values.
In a preferred embodiment, the apparatus further comprises a processing module for transferring data from the STG layer data table to the ODS layer data table.
The ETL task automatic processing device provided by the embodiment of the present invention can be used for executing the ETL task automatic processing method described in the above embodiment, and the specific working principle and the beneficial effects thereof are similar, and are not described in detail herein.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 4: a processor 701, a memory 702, a communication interface 703 and a bus 704;
the processor 701, the memory 702 and the communication interface 703 complete mutual communication through the bus 704; the communication interface 703 is used for realizing information transmission between related devices such as modeling software, an intelligent manufacturing equipment module library and the like;
the processor 701 is configured to call the computer program in the memory 702, and the processor implements all the steps in the first embodiment when executing the computer program, for example, the processor implements the following steps when executing the computer program:
step 101: and putting the data collected from the source database and the LOG LOG into an STG layer data table.
Step 102: and detecting whether the STG layer data table has corresponding ODS layer data tables, if not, creating the corresponding ODS layer data tables, if so, detecting whether the data table fields in the STG layer data table have corresponding fields in the ODS layer data tables, and if not, supplementing the corresponding fields in the ODS layer data tables.
Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium, having a computer program stored thereon, where the computer program is executed by a processor to implement all the steps of the first embodiment, for example, when the processor executes the computer program, the processor implements the following steps:
step 101: and putting the data collected from the source database and the LOG LOG into an STG layer data table.
Step 102: and detecting whether the STG layer data table has corresponding ODS layer data tables, if not, creating the corresponding ODS layer data tables, if so, detecting whether the data table fields in the STG layer data table have corresponding fields in the ODS layer data tables, and if not, supplementing the corresponding fields in the ODS layer data tables.
In the description of the present invention, it should be noted that the terms "upper", "lower", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, which are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and operate, and thus, should not be construed as limiting the present invention. Unless expressly stated or limited otherwise, the terms "mounted," "connected," and "connected" are intended to be inclusive and mean, for example, that they may be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above examples are only for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.