CN116186065A - Batch script generation method and device for data extraction-conversion-loading ETL - Google Patents

Publication number
CN116186065A
Authority
CN
China
Prior art keywords
target
script
type
etl
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310014437.XA
Other languages
Chinese (zh)
Inventor
陈伟江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd filed Critical China Construction Bank Corp
Priority to CN202310014437.XA priority Critical patent/CN116186065A/en
Publication of CN116186065A publication Critical patent/CN116186065A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Stored Programmes (AREA)

Abstract

The disclosure relates to a batch script generation method and device for data extraction-conversion-loading ETL, and belongs to the technical field of automatic programming. The batch script generation method of the data extraction-conversion-loading ETL comprises the following steps: abstracting a general structure of the ETL batch processing program to obtain an instantiation document of structured data; receiving an input target script type, and acquiring a target script generation template matched with the target script type; performing statement processing according to the data item information in the instantiation document to obtain a target Structured Query Language (SQL) statement; and filling the target script generation template with the target SQL statement to generate the ETL batch script. Because the instantiation document defines a tag system for the ETL script logic, ETL script code can be generated automatically, which reduces the complexity of the processing logic of ETL batch scripts, improves program readability, and lowers the development threshold; the generated ETL scripts can also support multiple script types and database types.

Description

Batch script generation method and device for data extraction-conversion-loading ETL
Technical Field
The disclosure relates to the technical field of programming, in particular to a batch script generation method and device for data extraction-conversion-loading ETL.
Background
In the related art, extraction-transformation-loading (ETL) automation generally obtains metadata information of a source database and a target database and uses that metadata to assist in defining processing mappings, so as to reduce the workload of developing data processing code. In big-data processing scenarios, however, the processing is considerably more complex, and the mapping definition is difficult to complete with metadata information alone. As for supporting multiple database types, scripts are developed for specific databases after the types of the source database and the target database have been determined, and they cannot be quickly adjusted to support other database types.
Disclosure of Invention
The present disclosure provides a batch script generation method, apparatus, electronic device, computer readable storage medium and computer program product for data extraction-conversion-loading ETL, so as to at least solve the problem that processing logic of ETL script processing in the related art is complex, and ETL scripts do not support multiple database types.
The technical scheme of the present disclosure is as follows:
According to a first aspect of an embodiment of the present disclosure, there is provided a batch script generation method of data extraction-conversion-loading ETL, including: abstracting a general structure of the ETL batch processing program to obtain an instantiation document of structured data; receiving an input target script type, and acquiring a target script generation template matched with the target script type; performing statement processing according to the data item information in the instantiation document to obtain a target Structured Query Language (SQL) statement; and filling the target script generation template with the target SQL statement to generate an ETL batch script.
In one embodiment of the present disclosure, the processing of the statement according to the data item information in the instantiated document to obtain the target structured query language SQL statement includes: traversing the instantiation document according to groups to acquire data item information of each group; and carrying out statement processing on the data item information of the same group to obtain the target SQL statements of each group.
In one embodiment of the present disclosure, the sentence processing on the data item information of the same group, to obtain the target SQL sentence of each group, includes: acquiring a tag type from the data item information, and determining an SQL sentence structure according to the tag type; and acquiring the sub-tag type included in the tag type, and writing other data items into the SQL statement structure according to the sub-tag type to generate the target SQL statement.
In one embodiment of the disclosure, the determining the SQL statement structure according to the tag type includes: receiving an input target database type; determining whether the instantiated document comprises an extension type label according to the target database type; and if the extension type label exists, determining a target SQL statement structure matched with the target database type from SQL statement structures of a plurality of databases according to the extension type label.
In one embodiment of the present disclosure, the obtaining the sub-tag type included in the tag type, and writing a data item in the SQL statement structure according to the sub-tag type, generating the target SQL statement includes: determining an extended sub-label type according to the extended type label and the sub-label type; and writing data items into the target SQL statement structure according to the extension sub-label type, and generating the target SQL statement.
In one embodiment of the present disclosure, the filling the target script template with the target SQL statement generates an ETL batch script, including: acquiring condition control labels among groups from the instantiation document; generating execution logic sentences among the target SQL sentences of each group according to the condition control labels; and combining the target SQL sentences of each group according to the execution logic sentences, filling the target SQL sentences into the target script template, and generating the ETL batch processing script.
In one embodiment of the present disclosure, the method further comprises: extracting the description information of the instantiation document, and generating annotation information of the ETL batch script according to the description information; and annotating the file name and the header file of the ETL batch script based on the annotation information.
According to a second aspect of the embodiments of the present disclosure, there is provided a batch script generating apparatus of data extraction-conversion-loading ETL, including: the first acquisition module is configured to abstract the general structure of the ETL batch processing program and acquire an instantiation document of the structured data; the second acquisition module is configured to receive an input target script type and acquire a target script generation template matched with the target script type; the processing module is configured to process sentences according to the data item information in the instantiation document to obtain a target structured query language SQL sentence; and the generating module is configured to fill the target script template by using the target SQL statement to generate an ETL batch script.
In one embodiment of the present disclosure, the processing module is further configured to: traversing the instantiation document according to groups to acquire data item information of each group; and carrying out statement processing on the data item information of the same group to obtain the target SQL statements of each group.
In one embodiment of the present disclosure, the processing module is further configured to: acquiring a tag type from the data item information, and determining an SQL sentence structure according to the tag type; and acquiring the sub-tag type included in the tag type, and writing other data items into the SQL statement structure according to the sub-tag type to generate the target SQL statement.
In one embodiment of the present disclosure, the processing module is further configured to: acquiring SQL statement fragments corresponding to the sub-tag types and data items of the sub-tag types; writing data items of the sub-tag types in SQL statement fragments corresponding to the sub-tag types; and combining the SQL statement fragments according to the logical relation of the sub-label types to generate the SQL statement.
In one embodiment of the present disclosure, the processing module is further configured to: receiving an input target database type; determining whether the instantiated document comprises an extension type label according to the target database type; and if the extension type label exists, determining a target SQL statement structure matched with the target database type from SQL statement structures of a plurality of databases according to the extension type label.
In one embodiment of the present disclosure, the processing module is further configured to: determining an extended sub-label type according to the extended type label and the sub-label type; and writing data items into the target SQL statement structure according to the extension sub-label type, and generating the target SQL statement.
In one embodiment of the present disclosure, the generating module is further configured to: acquiring condition control labels among groups from the instantiation document; generating execution logic sentences among the target SQL sentences of each group according to the condition control labels; and combining the target SQL sentences of each group according to the execution logic sentences, filling the target SQL sentences into the target script template, and generating the ETL batch processing script.
In one embodiment of the present disclosure, the apparatus is further configured to: extracting the description information of the instantiation document, and generating annotation information of the ETL batch script according to the description information; and annotating the file name and the header file of the ETL batch script based on the annotation information.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the batch script generation method of the data extraction-conversion-loading ETL of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the batch script generation method of data extraction-conversion-loading ETL of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the batch script generation method of the data extraction-transformation-loading ETL of the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects: abstracting a general structure of the ETL batch processing program, obtaining an instantiation document of structured data, receiving an input target script type, obtaining a target script generation template matched with the target script type, carrying out sentence processing according to data item information in the instantiation document to obtain a target Structured Query Language (SQL) sentence, and filling the target script template by using the target SQL sentence to generate the ETL batch processing script. Therefore, the batch processing script generation method for the data extraction-conversion-loading ETL can support the automatic generation of ETL script codes by acquiring the instantiation document of the structured data and defining the label system of the ETL script logic by the instantiation document, reduce the complexity of generating the ETL batch processing script processing logic, improve the readability of a program and reduce the development threshold, and simultaneously the ETL script can support various script types and database types.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
Fig. 1 is a flow diagram of a batch script generation method of data extraction-conversion-loading ETL according to a first embodiment of the present disclosure.
Fig. 2 is a flow diagram of a batch script generation method of data extraction-transformation-loading ETL according to a second embodiment of the present disclosure.
Fig. 3 is a flow chart of a batch script generation method of data extraction-conversion-loading ETL according to a third embodiment of the present disclosure.
Fig. 4 is a flow chart of a batch script generation method of data extraction-conversion-loading ETL according to a fourth embodiment of the present disclosure.
Fig. 5 is a flow diagram of a batch script generation method of data extraction-transformation-loading ETL provided in accordance with the present disclosure.
Fig. 6 is a block diagram of a batch script generating apparatus of data extraction-conversion-loading ETL according to a first embodiment of the present disclosure.
Fig. 7 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that, in the technical scheme of the application, the acquisition, storage, use, processing and the like of the data all conform to the relevant regulations of national laws and regulations.
Fig. 1 is a flow diagram of a batch script generation method of data extraction-conversion-loading ETL according to a first embodiment of the present disclosure.
As shown in fig. 1, the batch script generation method of the data extraction-conversion-loading ETL of the first embodiment of the present disclosure includes the steps of:
In step S101, the general structure of the ETL batch program is abstracted, and an instantiation document of the structured data is obtained.
It should be noted that the specific manner of abstracting the general structure of the ETL batch processing program to obtain the instantiation document of the structured data is not limited, and can be selected according to actual situations.
Alternatively, a structured data manipulation mapping instruction may be received that abstracts the general structure of the ETL batch to obtain an instantiated document of structured data.
Wherein, the instantiation document of the structured data describes the processing logic of ETL script processing.
In step S102, an input target script type is received, and a target script generation template matching the target script type is acquired.
Optionally, the target script type may be any of several types, such as Practical Extraction and Report Language (Perl), Shell, or Python.
In the embodiment of the disclosure, after receiving the input target script type, the target script generation template matched with the target script type may be returned.
For example, when the received target script type is Perl, a Perl script generation template is acquired; when it is Shell, a Shell script generation template is acquired; and when it is Python, a Python script generation template is acquired.
It should be noted that, after the target script generating template matching the target script type is acquired, a common variable related to the target script type may also be acquired, for example: a line feed identifier, a number of indentation characters, etc.
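The template lookup of step S102 can be sketched as follows. This is only an illustration under stated assumptions: the disclosure does not give the template contents, so the template strings, the `{body}` placeholder, and the names `ScriptTemplate`, `TEMPLATES` and `get_template` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScriptTemplate:
    """A script generation template plus its common variables (hypothetical layout)."""
    text: str     # template text with a {body} placeholder for generated statements
    newline: str  # line feed identifier used by the target script type
    indent: int   # number of indentation characters


# Hypothetical templates keyed by lower-cased script type.
TEMPLATES = {
    "perl": ScriptTemplate("#!/usr/bin/perl\nuse strict;\n{body}\n", "\n", 4),
    "shell": ScriptTemplate("#!/bin/sh\nset -e\n{body}\n", "\n", 2),
    "python": ScriptTemplate("#!/usr/bin/env python3\n{body}\n", "\n", 4),
}


def get_template(script_type: str) -> ScriptTemplate:
    """Return the template matching the input script type, ignoring case and padding."""
    key = script_type.strip().lower()
    if key not in TEMPLATES:
        raise ValueError(f"unsupported script type: {script_type!r}")
    return TEMPLATES[key]
```

Keeping the common variables next to the template text means later steps can indent and join generated statements without knowing which script type was requested.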
In step S103, sentence processing is performed according to the data item information in the instantiated document, and the target structured query language SQL sentence is obtained.
The structured query language (Structured Query Language, abbreviated as SQL) is a database language with various functions such as data manipulation and data definition.
Optionally, the data item information in the instantiation document may include "target table", "target field type", "group sequence number", "mapping rule", "source table", "source field type", and so forth.
Optionally, statement processing is performed on the data item information in the instantiation document in "group order", where one group represents one complete processing statement.
In the embodiment of the disclosure, after the data item information in the instantiation document is acquired, the instantiation document can be traversed according to groups, the data item information of each group is acquired, and sentence processing is performed on the data item information of the same group, so that the target SQL sentence of each group is obtained.
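The group-wise traversal can be pictured with a short sketch. It assumes, purely for illustration, that the instantiation document has already been parsed into a list of dictionaries, one per data item; the key names (`group`, `target_table`, and so on) and the function `group_data_items` are hypothetical.

```python
from collections import defaultdict


def group_data_items(rows):
    """Bucket data-item rows by group number, keeping row order inside each group.

    Groups are returned in ascending group order, so each group can later be
    turned into one complete processing statement in sequence.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["group"]].append(row)
    return [(number, groups[number]) for number in sorted(groups)]


# Hypothetical data items; rows of one group need not be adjacent in the document.
rows = [
    {"group": 2, "target_table": "dw.cust", "target_field": "name",
     "mapping_rule": "TRIM(s.name)", "source_table": "ods.cust"},
    {"group": 1, "target_table": "dw.acct", "target_field": "id",
     "mapping_rule": "s.id", "source_table": "ods.acct"},
    {"group": 1, "target_table": "dw.acct", "target_field": "bal",
     "mapping_rule": "s.bal", "source_table": "ods.acct"},
]

grouped = group_data_items(rows)
```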
In step S104, the target script template is filled with the target SQL statement to generate an ETL batch script.
Alternatively, the target SQL sentences of each group can be combined according to the execution logic sentences and filled into the target script template to generate the ETL batch script.
It should be noted that the instantiation document may further include annotation information such as "job name", "job file name", "job frequency", "job description", "job author", and the like, and this annotation information may be used to annotate the file name of the generated ETL batch script and the header file of the ETL batch script.
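As an illustration of how such annotation information might become a header comment, the sketch below renders the available fields as comment lines. The function name `build_header`, the default `#` comment prefix, and the one-field-per-line layout are assumptions; the disclosure does not prescribe a header format.

```python
# Annotation fields named in the description, in the order they are listed there.
ANNOTATION_FIELDS = [
    "job name", "job file name", "job frequency", "job description", "job author",
]


def build_header(meta: dict, comment_prefix: str = "#") -> str:
    """Render the annotation fields present in `meta` as comment lines."""
    lines = [
        f"{comment_prefix} {name}: {meta[name]}"
        for name in ANNOTATION_FIELDS
        if name in meta
    ]
    return "\n".join(lines)
```

Passing a different `comment_prefix` (for example `--` for SQL-style comments) adapts the same header to another target script type.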
According to the batch script generation method of the data extraction-conversion-loading ETL provided by the embodiment of the disclosure, the general structure of an ETL batch processing program is abstracted to obtain an instantiation document of structured data; an input target script type is received, and a target script generation template matched with the target script type is obtained; statement processing is carried out according to the data item information in the instantiation document to obtain a target Structured Query Language (SQL) statement; and the target script generation template is filled with the target SQL statement to generate an ETL batch script. By acquiring the instantiation document of the structured data and letting it define the tag system of the ETL script logic, the method supports automatic generation of ETL script code, reduces the complexity of the processing logic of ETL batch scripts, improves program readability, and lowers the development threshold, while the generated ETL scripts can support multiple script types and database types.
Fig. 2 is a flow diagram of a batch script generation method of data extraction-transformation-loading ETL according to a second embodiment of the present disclosure.
As shown in fig. 2, the batch script generation method of the data extraction-conversion-loading ETL of the second embodiment of the present disclosure includes the steps of:
In step S201, the general structure of the ETL batch program is abstracted, and an instantiation document of the structured data is obtained.
In step S202, an input target script type is received, and a target script generation template matching the target script type is acquired.
The relevant content of steps S201-S202 can be seen in the above embodiments, and will not be described here again.
In step S203, the instantiated documents are traversed group by group, and the data item information of each group is acquired.
It should be noted that, because the data items corresponding to different groups are different, the instantiation document may be traversed by groups to obtain the data item information of each group.
In step S204, the same group of data item information is subjected to sentence processing, so as to obtain a target SQL sentence of each group.
As a possible implementation manner, as shown in fig. 3, based on the implementation manner, the specific process of performing sentence processing on the same group of data item information in step S204 to obtain the target SQL sentence of each group includes the following steps:
In step S301, a tag type is acquired from the data item information, and an SQL statement structure is determined according to the tag type.
It should be noted that the tag type may be a type tag, an extension type tag, a sub-tag type, an extension sub-tag type, etc., and the SQL statement may be generated according to the tag type of the group and the definitions in its sub-tag types.
As a possible implementation manner, as shown in fig. 4, on the basis of the implementation manner, the specific process of determining the SQL statement structure according to the tag type in step S301 includes the following steps:
In step S401, an input target database type is received.
It should be noted that, in the embodiment of the present disclosure, the setting of the target database type is not limited, and may be selected according to actual situations.
Optionally, the target database type may be any of several database types, such as Oracle, MySQL, or PostgreSQL.
In step S402, it is determined whether an extension type tag is included in the instantiated document according to the target database type.
It should be noted that, in order to support multiple database types to generate the ETL batch script, the type tag may be extended to obtain an extended type tag.
In embodiments of the present disclosure, after the target database type is obtained, a determination may be made as to whether an extension type tag is included in the instantiated document.
In step S403, if the extension type tag exists, a target SQL statement structure matching the target database type is determined from the SQL statement structures of the plurality of databases according to the extension type tag.
It should be noted that if there is no extension type tag, the SQL statement structure may be determined according to the tag type.
In the embodiment of the disclosure, after determining that the extension type tag exists, the extension sub-tag type can be determined according to the extension type tag and the sub-tag type, and the data item is written in the target SQL statement structure according to the extension sub-tag type to generate the target SQL statement.
For example, as shown in Table 1, for an INSERT tag and the sub-tag types it includes, an INSERT extension type tag and INSERT extension sub-tag types may be obtained; for a MERGE tag and the sub-tag types it includes, a MERGE extension type tag and MERGE extension sub-tag types may be obtained; and for an UPDATE tag and the sub-tag types it includes, an UPDATE extension type tag and UPDATE extension sub-tag types may be obtained.
TABLE 1
Figure BDA0004039793410000081
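Steps S401 to S403 can be sketched as a lookup with a fallback. The statement structures below are hypothetical stand-ins (Table 1 is an image and its contents are not reproduced in the text); they merely use the fact that an upsert is commonly written as `MERGE` in Oracle, `INSERT ... ON DUPLICATE KEY UPDATE` in MySQL, and `INSERT ... ON CONFLICT ... DO UPDATE` in PostgreSQL. The representation of extension type tags as `(tag type, database type)` pairs and the name `select_structure` are also assumptions.

```python
# Default structure used when the document carries no extension type tag (hypothetical).
DEFAULT_STRUCTURES = {
    "MERGE": (
        "MERGE INTO {target} t USING {source} s ON ({on}) "
        "WHEN MATCHED THEN UPDATE SET {set} "
        "WHEN NOT MATCHED THEN INSERT ({columns}) VALUES ({values})"
    ),
}

# Database-specific structures, selected through extension type tags (hypothetical).
EXTENDED_STRUCTURES = {
    ("MERGE", "mysql"): (
        "INSERT INTO {target} ({columns}) SELECT {values} FROM {source} s "
        "ON DUPLICATE KEY UPDATE {set}"
    ),
    ("MERGE", "postgresql"): (
        "INSERT INTO {target} ({columns}) SELECT {values} FROM {source} s "
        "ON CONFLICT ({key}) DO UPDATE SET {set}"
    ),
}


def select_structure(tag_type, db_type, document_extension_tags):
    """Pick the statement structure for a tag type and target database type.

    `document_extension_tags` is the set of (tag type, database type) pairs
    found in the instantiation document. Without a matching extension tag,
    the structure is determined by the plain tag type.
    """
    ext = (tag_type.upper(), db_type.lower())
    if ext in document_extension_tags and ext in EXTENDED_STRUCTURES:
        return EXTENDED_STRUCTURES[ext]
    return DEFAULT_STRUCTURES[tag_type.upper()]
```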
In step S302, the sub-tag type included in the tag type is obtained, and according to the sub-tag type, other data items are written into the SQL statement structure to generate the target SQL statement.
It should be noted that the sub-tag types included in different tag types may be the same or different.
Optionally, the tag type may be an INSERT tag, a MERGE tag, or an UPDATE tag, after the tag type is obtained, the basic structure of the SQL statement may be determined, and according to the sub-tag type included in the tag type, other data items may be written into the SQL statement structure, so that the corresponding target SQL statement may be generated.
Optionally, the SQL sentence fragments corresponding to the sub-tag types and the data items of the sub-tag types can be obtained, the data items of the sub-tag types are written in the SQL sentence fragments corresponding to each sub-tag type, and the SQL sentence fragments are combined according to the logic relationship of the sub-tag types to generate the SQL sentence.
For example, as shown in table 2, for the tag type being an INSERT tag and the sub-tag type included in the INSERT tag, other data items may be written in the SQL statement structure to generate the target SQL statement; aiming at the tag type being a MERGE tag and the sub tag type included in the MERGE tag, other data items can be written in the SQL statement structure to generate a target SQL statement; for the label type being an UPDATE label and the sub-label type included in the UPDATE label, other data items can be written in the SQL statement structure to generate a target SQL statement.
TABLE 2
Figure BDA0004039793410000091
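The fragment-based assembly described above can be illustrated for an INSERT tag. The sub-tag names (`INTO`, `COLUMNS`, `SELECT`, `FROM`, `WHERE`), their fragments, and the function `build_insert` are hypothetical, since Table 2 is an image whose contents are not available in the text; the sketch only shows the mechanism of writing each data item into its fragment and joining the fragments in a fixed logical order.

```python
# Hypothetical SQL statement fragments, one per sub-tag type of an INSERT tag.
FRAGMENTS = {
    "INTO": "INSERT INTO {value}",
    "COLUMNS": "({value})",
    "SELECT": "SELECT {value}",
    "FROM": "FROM {value}",
    "WHERE": "WHERE {value}",
}

# Logical relationship between the sub-tag types: the order of the fragments.
FRAGMENT_ORDER = ["INTO", "COLUMNS", "SELECT", "FROM", "WHERE"]


def build_insert(items: dict) -> str:
    """Write each sub-tag's data item into its fragment and join the fragments.

    Sub-tag types missing from `items` (for example an optional WHERE) are skipped.
    """
    parts = [
        FRAGMENTS[sub_tag].format(value=items[sub_tag])
        for sub_tag in FRAGMENT_ORDER
        if sub_tag in items
    ]
    return " ".join(parts) + ";"


statement = build_insert({
    "INTO": "dw.acct",
    "COLUMNS": "id, bal",
    "SELECT": "s.id, s.bal",
    "FROM": "ods.acct s",
})
```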
In step S205, the condition control tags between the groups are acquired from the instantiation document.
It should be noted that groups can be combined with one another through condition control tags.
In step S206, execution logic statements between the target SQL statements of each group are generated according to the condition control tags.
Optionally, the condition control tag may be an IF ... ELSE IF ... ELSE ... END IF tag.
For example, as shown in Table 3, when the condition control tag is an IF ... ELSE IF ... ELSE ... END IF tag, execution logic statements between the target SQL statements of each group may be generated.
TABLE 3
Figure BDA0004039793410000101
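One way to picture the execution logic generated from condition control tags is the sketch below, which renders an IF / ELSE IF / ELSE / END IF sequence as Shell control flow around each group's SQL statement. The tuple layout `(tag, condition, sql)`, the function `render_shell_control`, and the shell helper `run_sql` called by the generated script are all hypothetical; Table 3 is an image and its actual mapping is not available. The sketch also assumes the SQL text contains no double quotes.

```python
def render_shell_control(blocks, indent=2):
    """Render condition control tags and group SQL statements as Shell lines.

    Each block is (tag, condition, sql). The tags IF, ELSE IF and ELSE open a
    branch that runs one group's SQL statement; END IF closes the construct.
    """
    pad = " " * indent
    openers = {
        "IF": "if {cond}; then",
        "ELSE IF": "elif {cond}; then",
        "ELSE": "else",
    }
    lines = []
    for tag, condition, sql in blocks:
        if tag == "END IF":
            lines.append("fi")
        elif tag in openers:
            lines.append(openers[tag].format(cond=condition))
            lines.append(f'{pad}run_sql "{sql}"')
        else:
            raise ValueError(f"unknown condition control tag: {tag!r}")
    return "\n".join(lines)


script_body = render_shell_control([
    ("IF", '[ "$MODE" = "full" ]', "DELETE FROM dw.acct"),
    ("ELSE", None, "SELECT 1"),
    ("END IF", None, None),
])
```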
In step S207, the target SQL statements of each group are combined according to the execution logic statement and filled into the target script template to generate an ETL batch script.
In the embodiment of the disclosure, after different groups of target SQL sentences are obtained, the target SQL sentences of each group can be combined according to the execution logic sentences and filled into the target script template to generate the ETL batch script.
According to the batch script generation method of the data extraction-conversion-loading ETL provided by the embodiment of the disclosure, the general structure of the ETL batch processing program is abstracted to obtain an instantiation document of structured data; an input target script type is received, and a target script generation template matched with the target script type is obtained; the instantiation document is traversed by group to obtain the data item information of each group; statement processing is carried out on the data item information of the same group to obtain the target SQL statement of each group; and the target script generation template is filled with the target SQL statements to generate the ETL batch script. By obtaining the instantiation document of structured data, the method reduces the complexity of the processing logic of ETL batch scripts, improves program readability, and lowers the development threshold, while the generated ETL scripts can support multiple script types and database types.
The batch script generation method of the data extraction-conversion-loading ETL provided in the present disclosure is explained below.
For example, as shown in fig. 5, a script type, a database type and an instantiation document may be input. A script generation template matching the script type is obtained, and the data item information in the instantiation document is read. The instantiation document is then traversed by group to obtain the data item information of each group, so that all data item information in the same group is read together. For each group, the tag type corresponding to the group is identified, and the sub-tag types included in that tag type are obtained and processed logically. According to the database type, an extended sub-tag type is determined from the extension type tag and the sub-tag type, and is processed accordingly. The data item information of the same group is then assembled to generate an SQL statement. After the traversal is finished, the ETL batch script is generated from the script generation template and the processed statements.
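The group traversal described above can be sketched as follows. This is an illustration only: it assumes the instantiation document has already been parsed into rows with the hypothetical keys `group`, `tag`, `sub_tag` and `value`, and that each tag type maps to one statement structure whose placeholders are sub-tag names. None of these names come from the disclosure.

```python
# Hypothetical statement structures keyed by tag type; placeholders are sub-tag names.
STRUCTURES = {
    "DELETE": "DELETE FROM {TABLE} WHERE {WHERE}",
    "INSERT_SELECT": "INSERT INTO {TARGET} SELECT {COLUMNS} FROM {SOURCE}",
}


def group_rows(rows):
    """Collect data item rows by group id, preserving first-seen group order."""
    groups = {}
    for row in rows:
        groups.setdefault(row["group"], []).append(row)
    return groups


def build_statements(rows):
    """Return one (group id, SQL statement) pair per group, in document order."""
    statements = []
    for group_id, items in group_rows(rows).items():
        tag_type = items[0]["tag"]  # assumption: one tag type per group
        values = {item["sub_tag"]: item["value"] for item in items}
        statements.append((group_id, STRUCTURES[tag_type].format(**values)))
    return statements
```

Here the tag type selects the statement structure and the sub-tag types decide where each data item is written, mirroring the two-level lookup in the paragraph above.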
In summary, in the batch script generation method for data extraction-conversion-loading (ETL) provided by the embodiments of the present disclosure, the instantiation document defines a tag system for the ETL script logic, which supports automatic generation of ETL script code. The instantiation document also allows the database type to be defined: an extension type tag marks the processing logic that differs between database types in the ETL. As a result, a single set of instantiation documents can be used to generate, on demand, the ETL processing script for the corresponding database type, and multiple database types can be supported while maintaining only that one set. This meets the need for rapid adaptation when customers select different database types during project implementation. In addition, the ETL metadata can be applied more broadly: expressing the ETL batch process in structured form in effect produces structured ETL metadata, which can later be used for further applications such as analysis of data lineage in data processing and analysis of dependencies in data job scheduling.
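The disclosure does not describe how such later analysis would be carried out. Purely as an illustration of why structured ETL metadata makes it straightforward, the sketch below derives source-to-target table pairs from parsed rows. The row keys and the sub-tag names `SOURCE` and `TARGET` are assumptions made for this example.

```python
def lineage_edges(rows):
    """Return sorted (source table, target table) pairs found within each group.

    rows is assumed to be a list of dicts with the hypothetical keys
    "group", "sub_tag" and "value".
    """
    by_group = {}
    for row in rows:
        tables = by_group.setdefault(row["group"], {"SOURCE": set(), "TARGET": set()})
        if row["sub_tag"] in tables:
            tables[row["sub_tag"]].add(row["value"])
    edges = set()
    for tables in by_group.values():
        for source in tables["SOURCE"]:
            for target in tables["TARGET"]:
                edges.add((source, target))
    return sorted(edges)
```

Because the tables are read from tagged data items rather than parsed out of generated SQL text, the same rows that drive script generation could also feed a lineage or scheduling-dependency view.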
Fig. 6 is a block diagram of a batch script generating apparatus of data extraction-conversion-loading ETL according to a first embodiment of the present disclosure.
As shown in fig. 6, a batch script generating apparatus 600 of a data extraction-conversion-loading ETL of an embodiment of the present disclosure includes: a first acquisition module 601, a second acquisition module 602, a processing module 603, and a generation module 604.
a first acquisition module 601, configured to abstract a general structure of the ETL batch program to obtain an instantiation document of structured data;
a second acquisition module 602, configured to receive an input target script type and acquire a target script generation template matching the target script type;
a processing module 603, configured to perform statement processing according to the data item information in the instantiation document to obtain a target structured query language (SQL) statement; and
a generating module 604, configured to fill the target script template with the target SQL statement to generate an ETL batch script.
In one embodiment of the present disclosure, the processing module 603 is further configured to: traverse the instantiation document by group to acquire the data item information of each group; and perform statement processing on the data item information of the same group to obtain the target SQL statement of each group.
In one embodiment of the present disclosure, the processing module 603 is further configured to: acquire a tag type from the data item information, and determine an SQL statement structure according to the tag type; and acquire the sub-tag types included in the tag type, and write the other data items into the SQL statement structure according to the sub-tag types to generate the target SQL statement.
In one embodiment of the present disclosure, the processing module 603 is further configured to: acquire the SQL statement fragments corresponding to the sub-tag types and the data items of the sub-tag types; write the data items of the sub-tag types into the corresponding SQL statement fragments; and combine the SQL statement fragments according to the logical relationship of the sub-tag types to generate the SQL statement.
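The fragment-based assembly can be sketched as below, assuming each sub-tag type owns one SQL statement fragment and that the "logical relationship" is simply the clause order of a SELECT statement. The sub-tag names and the function `build_select` are hypothetical.

```python
# Hypothetical sub-tag types, listed in the order their fragments must appear.
FRAGMENTS = [
    ("SELECT", "SELECT {}"),
    ("FROM", "FROM {}"),
    ("WHERE", "WHERE {}"),
    ("GROUP_BY", "GROUP BY {}"),
]


def build_select(items):
    """Write each data item into its fragment and join the fragments in clause order.

    items maps a sub-tag type to its data item; absent sub-tags are skipped.
    """
    parts = [
        pattern.format(items[sub_tag])
        for sub_tag, pattern in FRAGMENTS
        if sub_tag in items
    ]
    return " ".join(parts)
```

The order of the input mapping does not matter, since the fragment table alone fixes how the pieces are combined.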
In one embodiment of the present disclosure, the processing module 603 is further configured to: receive an input target database type; determine, according to the target database type, whether the instantiation document includes an extension type tag; and if the extension type tag exists, determine, according to the extension type tag, a target SQL statement structure matching the target database type from the SQL statement structures of a plurality of databases.
In one embodiment of the present disclosure, the processing module 603 is further configured to: determine an extended sub-tag type according to the extension type tag and the sub-tag type; and write data items into the target SQL statement structure according to the extended sub-tag type to generate the target SQL statement.
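A minimal sketch of selecting a database-specific statement structure is given below. It assumes the extension type tags found in the instantiation document are available as a set of database type names; the dictionary layout and the function `select_structure` are hypothetical. The two dialect entries reflect real syntax differences (Db2 requires `IMMEDIATE` on `TRUNCATE TABLE`, and SQLite has no `TRUNCATE` statement), but they are examples chosen here, not ones named in the disclosure.

```python
# Default structure for a hypothetical "truncate table" tag type.
DEFAULT_STRUCTURE = "TRUNCATE TABLE {table}"

# Hypothetical database-specific structures, used only when an extension
# type tag for that database type is present in the instantiation document.
EXTENSION_STRUCTURES = {
    "db2": "TRUNCATE TABLE {table} IMMEDIATE",
    "sqlite": "DELETE FROM {table}",
}


def select_structure(target_db, extension_tags):
    """Return the statement structure matching the target database type."""
    if target_db in extension_tags and target_db in EXTENSION_STRUCTURES:
        return EXTENSION_STRUCTURES[target_db]
    return DEFAULT_STRUCTURE
```

With this arrangement one instantiation document can yield different scripts simply by changing the `target_db` input.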
In one embodiment of the present disclosure, the generating module 604 is further configured to: acquire condition control tags between groups from the instantiation document; generate execution logic statements between the target SQL statements of each group according to the condition control tags; and combine the target SQL statements of each group according to the execution logic statements and fill them into the target script template to generate the ETL batch script.
In one embodiment of the present disclosure, the apparatus 600 is further configured to: extract the description information of the instantiation document, and generate annotation information of the ETL batch script according to the description information; and annotate the file name and the file header of the ETL batch script based on the annotation information.
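As an illustration of the annotation step, the sketch below derives a file name and a header comment from description information. The description keys (`job`, `author`, `summary`), the script type names and the function `annotate` are assumptions made for this example; the disclosure does not specify them.

```python
def annotate(description, script_type, body):
    """Return (file name, script text with a header comment).

    description is assumed to be a dict with the hypothetical keys
    "job", "author" and "summary".
    """
    # Hypothetical mapping from script type to file extension; SQL is the fallback.
    extension = {"shell": "sh", "python": "py"}.get(script_type, "sql")
    comment = "--" if extension == "sql" else "#"
    file_name = f"{description['job']}.{extension}"
    header = "\n".join(
        f"{comment} {key}: {description[key]}" for key in ("job", "author", "summary")
    )
    return file_name, header + "\n" + body
```

The comment marker follows the script type, so the same description information can annotate any of the supported script types.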
The specific manner in which the modules of the apparatus in the above embodiments perform their operations has been described in detail in the method embodiments and will not be repeated here.
In summary, the batch script generating device for data extraction-conversion-loading (ETL) provided by the embodiment of the present disclosure abstracts a general structure of an ETL batch program to obtain an instantiation document of structured data, receives an input target script type and obtains a target script generation template matching the target script type, performs statement processing according to the data item information in the instantiation document to obtain a target structured query language (SQL) statement, and fills the target script template with the target SQL statement to generate an ETL batch script. By obtaining the instantiation document of structured data and using it to define the tag system of the ETL script logic, the device supports automatic generation of ETL script code, reduces the complexity of the processing logic of the generated ETL batch script, improves program readability, and lowers the development threshold. At the same time, the ETL script can support multiple script types and database types.
Fig. 7 is a block diagram of an electronic device, according to an example embodiment.
As shown in fig. 7, the electronic device 700 includes:
a memory 710, a processor 720, and a bus 730 connecting the different components (including the memory 710 and the processor 720). The memory 710 stores a computer program which, when executed by the processor 720, implements the batch script generation method of data extraction-conversion-loading ETL of the first aspect of the present disclosure.
Bus 730 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 700 typically includes a variety of electronic device readable media. Such media can be any available media that is accessible by electronic device 700 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 710 may also include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 740 and/or cache memory 750. Electronic device 700 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 760 may be used to read from or write to non-removable, non-volatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard disk drive"). Although not shown in fig. 7, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 730 through one or more data medium interfaces. Memory 710 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the various embodiments of the disclosure.
A program/utility 780 having a set (at least one) of program modules 770 may be stored in, for example, memory 710, such program modules 770 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 770 typically carry out the functions and/or methods of the embodiments described in this disclosure.
The electronic device 700 may also communicate with one or more external devices 790 (e.g., keyboard, pointing device, display 791, etc.), one or more devices that enable a user to interact with the electronic device 700, and/or any device (e.g., network card, modem, etc.) that enables the electronic device 700 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 792. Also, the electronic device 700 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through a network adapter 793. As shown in fig. 7, the network adapter 793 communicates with other modules of the electronic device 700 over the bus 730. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 700, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processor 720 executes various functional applications and data processing by running programs stored in the memory 710.
It should be noted that, for the implementation process and technical principle of the electronic device in this embodiment, reference may be made to the foregoing explanation of the batch script generation method of data extraction-conversion-loading ETL in the embodiments of the disclosure, which is not repeated here.
In summary, the electronic device provided by the embodiment of the present disclosure may execute the batch script generation method of data extraction-conversion-loading ETL described above: it abstracts a general structure of an ETL batch program to obtain an instantiation document of structured data, receives an input target script type and obtains a target script generation template matching the target script type, performs statement processing according to the data item information in the instantiation document to obtain a target structured query language (SQL) statement, and fills the target script template with the target SQL statement to generate an ETL batch script. By obtaining the instantiation document of structured data and using it to define the tag system of the ETL script logic, the method supports automatic generation of ETL script code, reduces the complexity of the processing logic of the generated ETL batch script, improves program readability, and lowers the development threshold. At the same time, the ETL script can support multiple script types and database types.
To achieve the above embodiments, the present disclosure also proposes a computer-readable storage medium.
The instructions in the computer-readable storage medium, when executed by a processor of the electronic device, enable the electronic device to perform the batch script generation method of data extraction-transformation-loading ETL as previously described. Optionally, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
To achieve the above embodiments, the present disclosure further provides a computer program product, including a computer program, wherein the computer program when executed by a processor implements the batch script generation method of data extraction-transformation-loading ETL according to the first aspect.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (19)

1. A batch script generation method for data extraction-transformation-loading ETL, the method comprising:
abstracting a general structure of the ETL batch processing program to obtain an instantiation document of the structured data;
receiving an input target script type, and acquiring a target script generation template matched with the target script type;
performing statement processing according to the data item information in the instantiation document to obtain a target structured query language (SQL) statement;
and filling the target script template by using the target SQL statement to generate an ETL batch script.
2. The method according to claim 1, wherein the performing statement processing according to the data item information in the instantiation document to obtain the target structured query language (SQL) statement comprises:
traversing the instantiation document according to groups to acquire data item information of each group;
and carrying out statement processing on the data item information of the same group to obtain the target SQL statements of each group.
3. The method according to claim 2, wherein the carrying out statement processing on the data item information of the same group to obtain the target SQL statements of each group comprises:
acquiring a tag type from the data item information, and determining an SQL statement structure according to the tag type;
and acquiring the sub-tag type included in the tag type, and writing other data items into the SQL statement structure according to the sub-tag type to generate the target SQL statement.
4. The method according to claim 3, wherein the writing other data items into the SQL statement structure according to the sub-tag type to generate the target SQL statement comprises:
acquiring SQL statement fragments corresponding to the sub-tag types and data items of the sub-tag types;
writing data items of the sub-tag types in SQL statement fragments corresponding to the sub-tag types;
and combining the SQL statement fragments according to the logical relationship of the sub-tag types to generate the SQL statement.
5. The method according to claim 3, wherein the determining an SQL statement structure according to the tag type comprises:
receiving an input target database type;
determining, according to the target database type, whether the instantiation document comprises an extension type tag;
and if the extension type tag exists, determining, according to the extension type tag, a target SQL statement structure matching the target database type from the SQL statement structures of a plurality of databases.
6. The method of claim 5, wherein the acquiring the sub-tag type included in the tag type and writing other data items into the SQL statement structure according to the sub-tag type to generate the target SQL statement comprises:
determining an extended sub-tag type according to the extension type tag and the sub-tag type;
and writing data items into the target SQL statement structure according to the extended sub-tag type to generate the target SQL statement.
7. The method of any of claims 2-6, wherein the populating the target script template with the target SQL statement to generate an ETL batch script comprises:
acquiring condition control tags between groups from the instantiation document;
generating execution logic statements between the target SQL statements of each group according to the condition control tags;
and combining the target SQL statements of each group according to the execution logic statements, and filling them into the target script template to generate the ETL batch script.
8. The method according to any one of claims 2-6, further comprising:
extracting the description information of the instantiation document, and generating annotation information of the ETL batch script according to the description information;
and annotating the file name and the file header of the ETL batch script based on the annotation information.
9. A batch script generating apparatus for data extraction-transformation-loading ETL, comprising:
the first acquisition module is configured to abstract the general structure of the ETL batch processing program and acquire an instantiation document of the structured data;
the second acquisition module is configured to receive an input target script type and acquire a target script generation template matched with the target script type;
the processing module is configured to perform statement processing according to the data item information in the instantiation document to obtain a target structured query language (SQL) statement;
and the generating module is configured to fill the target script template by using the target SQL statement to generate an ETL batch script.
10. The apparatus of claim 9, wherein the processing module is further configured to:
traversing the instantiation document according to groups to acquire data item information of each group;
and carrying out statement processing on the data item information of the same group to obtain the target SQL statements of each group.
11. The apparatus of claim 10, wherein the processing module is further configured to:
acquiring a tag type from the data item information, and determining an SQL statement structure according to the tag type;
and acquiring the sub-tag type included in the tag type, and writing other data items into the SQL statement structure according to the sub-tag type to generate the target SQL statement.
12. The apparatus of claim 11, wherein the processing module is further configured to:
acquiring SQL statement fragments corresponding to the sub-tag types and data items of the sub-tag types;
writing data items of the sub-tag types in SQL statement fragments corresponding to the sub-tag types;
and combining the SQL statement fragments according to the logical relationship of the sub-tag types to generate the SQL statement.
13. The apparatus of claim 11, wherein the processing module is further configured to:
receiving an input target database type;
determining, according to the target database type, whether the instantiation document comprises an extension type tag;
and if the extension type tag exists, determining, according to the extension type tag, a target SQL statement structure matching the target database type from the SQL statement structures of a plurality of databases.
14. The apparatus of claim 13, wherein the processing module is further configured to:
determining an extended sub-tag type according to the extension type tag and the sub-tag type;
and writing data items into the target SQL statement structure according to the extended sub-tag type to generate the target SQL statement.
15. The apparatus of any of claims 10-14, wherein the generating module is further configured to:
acquiring condition control tags between groups from the instantiation document;
generating execution logic statements between the target SQL statements of each group according to the condition control tags;
and combining the target SQL statements of each group according to the execution logic statements, and filling them into the target script template to generate the ETL batch script.
16. The apparatus of any of claims 10-14, wherein the apparatus is further configured to:
extracting the description information of the instantiation document, and generating annotation information of the ETL batch script according to the description information;
and annotating the file name and the file header of the ETL batch script based on the annotation information.
17. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the batch script generation method of data extraction-transformation-loading ETL of any one of claims 1-8.
18. A computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the batch script generation method of data extraction-conversion-loading ETL of any of claims 1-8.
19. A computer program product comprising a computer program, characterized in that the computer program when executed by a processor implements the batch script generation method of data extraction-transformation-loading ETL of any one of claims 1-8.
CN202310014437.XA 2023-01-05 2023-01-05 Batch script generation method and device for data extraction-conversion-loading ETL Pending CN116186065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310014437.XA CN116186065A (en) 2023-01-05 2023-01-05 Batch script generation method and device for data extraction-conversion-loading ETL

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310014437.XA CN116186065A (en) 2023-01-05 2023-01-05 Batch script generation method and device for data extraction-conversion-loading ETL

Publications (1)

Publication Number Publication Date
CN116186065A true CN116186065A (en) 2023-05-30

Family

ID=86435851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310014437.XA Pending CN116186065A (en) 2023-01-05 2023-01-05 Batch script generation method and device for data extraction-conversion-loading ETL

Country Status (1)

Country Link
CN (1) CN116186065A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination