CN111597243A - Data warehouse-based abstract data loading method and system - Google Patents

Data warehouse-based abstract data loading method and system

Info

Publication number
CN111597243A
Authority
CN
China
Prior art keywords
loading
script
actual service
data
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010411049.1A
Other languages
Chinese (zh)
Other versions
CN111597243B (en)
Inventor
李湘玲
聂冬琴
唐一帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202010411049.1A
Publication of CN111597243A
Application granted
Publication of CN111597243B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The invention provides a method and a system for abstracting data loading based on a data warehouse. The method comprises the following steps: preprocessing a job script to obtain an actual service script; extracting the association relationships among the target table, the source table fields, and the inserted content involved in statement parsing of the actual service script, and extracting predicate information from the actual service script; determining the association relationship between the loading program and the loading algorithm by using the predicate information and the loading algorithm standard information; and loading the data of an input job according to the association relationships among the target table, the source table fields, and the inserted content, and the association relationship between the loading program and the loading algorithm. The invention addresses the disorderly management of the large number of manually written loading programs that a data warehouse accumulates on different technical platforms, and the problem that the logical correspondence of the data warehouse model can only be found by manual searching and cannot be obtained accurately. It shortens the development time of loading programs, reduces labor cost, and improves both the speed and the accuracy of problem location and analysis.

Description

Data warehouse-based abstract data loading method and system
Technical Field
The invention relates to the technical field of data loading, in particular to a method and a system for abstract data loading based on a data warehouse.
Background
Big data technology is developing rapidly and database software is updated continuously, so the data of a data warehouse is frequently migrated among the SAS, Teradata, Hadoop, and GaussDB platforms. During the migration and conversion work in the early stage of operation, large numbers of loading programs inevitably have to be written and modified by hand to suit the characteristics of each new technical platform. A unified template loading tool is formed once a technical platform matures, but the large number of early manual loading programs remain hard to maintain: each one has to be located, analyzed, and modified individually by hand, and common concerns such as loading standards and technical platform characteristics cannot be planned and managed in a unified way.
Disclosure of Invention
To solve the above problem, an embodiment of the present invention provides a method for abstracting data loading based on a data warehouse, the method comprising:
preprocessing a job script to obtain an actual service script;
extracting the association relationships among the target table, the source table fields, and the inserted content involved in statement parsing of the actual service script, and extracting predicate information from the actual service script;
determining the association relationship between the loading program and the loading algorithm by using the predicate information and the loading algorithm standard information;
and loading the data of an input job according to the association relationships among the target table, the source table fields, and the inserted content, and the association relationship between the loading program and the loading algorithm.
Optionally, in an embodiment of the present invention, preprocessing the job script to obtain the actual service script includes: reading the content of the job script into a variable, and removing unnecessary information from the variable by using regular expressions, to obtain an actual service script that contains only the actual business functions.
Optionally, in an embodiment of the present invention, extracting the association relationships and the predicate information includes: using regular expressions to extract, one by one, the association relationships among the target table, the source table fields, and the inserted content involved in statement parsing of the actual service script, together with the SQL statement predicate information in the actual service script.
Optionally, in an embodiment of the present invention, loading the data of the input job includes: obtaining a templated loading program from the job name, table name, loading algorithm, and field-level logical correspondence of the input job, according to the field-level association relationship between the target table and the inserted content and the field-level association relationship between the target table and the source table fields; and loading the data by using the templated loading program.
An embodiment of the present invention further provides a system for abstracting data loading based on a data warehouse, the system comprising:
a script processing module, configured to preprocess a job script to obtain an actual service script;
a script parsing module, configured to extract the association relationships among the target table, the source table fields, and the inserted content involved in statement parsing of the actual service script, to extract predicate information from the actual service script, and to determine the association relationship between the loading program and the loading algorithm by using the predicate information and the loading algorithm standard information;
and a data loading module, configured to load the data of an input job according to the association relationships among the target table, the source table fields, and the inserted content, and the association relationship between the loading program and the loading algorithm.
Optionally, in an embodiment of the present invention, the script processing module includes: a statement extraction unit, configured to read the content of the job script into a variable and remove unnecessary information from the variable by using regular expressions, to obtain an actual service script that contains only the actual business functions.
Optionally, in an embodiment of the present invention, the script parsing module includes: a script parsing unit, configured to use regular expressions to extract, one by one, the association relationships among the target table, the source table fields, and the inserted content involved in statement parsing of the actual service script, together with the SQL statement predicate information in the actual service script.
Optionally, in an embodiment of the present invention, the data loading module includes: a loading program unit, configured to obtain a templated loading program from the job name, table name, loading algorithm, and field-level logical correspondence of the input job, according to the field-level association relationship between the target table and the inserted content and the field-level association relationship between the target table and the source table fields; and a data loading unit, configured to load the data by using the templated loading program.
Optionally, in an embodiment of the present invention, the system further includes: a source data module, configured to store the job scripts of the application systems.
Optionally, in an embodiment of the present invention, the system further includes: a loading algorithm standard module, configured to store the loading algorithm standard information.
Optionally, in an embodiment of the present invention, the system further includes: a storage module, configured to store the association relationships among the target table, the source table fields, and the inserted content, and the association relationship between the loading program and the loading algorithm.
An embodiment of the present invention further provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
preprocessing a job script to obtain an actual service script;
extracting the association relationships among the target table, the source table fields, and the inserted content involved in statement parsing of the actual service script, and extracting predicate information from the actual service script;
determining the association relationship between the loading program and the loading algorithm by using the predicate information and the loading algorithm standard information;
and loading the data of an input job according to the association relationships among the target table, the source table fields, and the inserted content, and the association relationship between the loading program and the loading algorithm.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the following steps:
preprocessing a job script to obtain an actual service script;
extracting the association relationships among the target table, the source table fields, and the inserted content involved in statement parsing of the actual service script, and extracting predicate information from the actual service script;
determining the association relationship between the loading program and the loading algorithm by using the predicate information and the loading algorithm standard information;
and loading the data of an input job according to the association relationships among the target table, the source table fields, and the inserted content, and the association relationship between the loading program and the loading algorithm.
The invention addresses the disorderly management of the large number of manually written loading programs that a data warehouse accumulates on different technical platforms, and the problem that the logical correspondence of the data warehouse model can only be found by manual searching and cannot be obtained accurately. It shortens the development time of loading programs, reduces labor cost, and improves both the speed and the accuracy of problem location and analysis.
Drawings
To illustrate the embodiments of the present invention and the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described here show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for abstracting data loading based on a data warehouse, according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a system for abstract data loading based on a data warehouse according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a script file structure of a source data application system according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an SQL statement and a corresponding extraction result in an actual service script according to an embodiment of the present invention;
FIG. 5 is a flowchart of a procedure for extracting association relation according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method and a system for abstract data loading based on a data warehouse.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the present invention.
A data warehouse is a subject-oriented, integrated, relatively stable data set that reflects historical changes, and it provides data support for business analysis and management decisions. To embody these characteristics, the loading jobs of the subject model of a data warehouse use the main data algorithms listed in Table 1 (note: these are the algorithms currently in general use in the industry for data warehouse modeling).
TABLE 1
[Table 1 is an image in the original publication (BDA0002493250000000041) and is not reproduced here.]
In recent years big data technology has developed rapidly and database software has been updated continuously, so the data of a data warehouse is frequently migrated among the SAS, Teradata, Hadoop, and GaussDB platforms. As a result, during the migration and conversion work in the early stage of operation, large numbers of loading programs inevitably have to be written and modified by hand to suit the characteristics of each new technical platform.
The invention relates to the fields of big data platforms and of banking and Internet financial technology, and in particular to a method for abstracting the manual data loading programs of a data warehouse into a tool-based loading template. Fig. 1 is a flowchart of a method for abstracting data loading based on a data warehouse according to an embodiment of the present invention. The method includes:
Step S1: preprocess the job script to obtain an actual service script. The content of the job script file is read into a variable; line comments, block comments, variables, and prompt messages are removed by regular expression matching; and the SQL statements that remain, which carry the actual business functions, are exported to a file and stored at a designated location for use in relationship extraction. The location is not fixed: any file system location with enough space to hold the file may be designated.
Step S2: extract the association relationships among the target table, the source table fields, and the inserted content involved in statement parsing of the actual service script, and extract predicate information from the actual service script. Regular expressions are used to extract, one by one, the association relationships among the target table, intermediate temporary tables, subqueries, the fields of each source table, and the inserted content (including the various field transformation operations) involved in statement parsing of the script, together with the SQL statement predicate information in the script.
Step S3: determine the association relationship between the loading program and the loading algorithm by using the predicate information and the loading algorithm standard information.
Step S4: load the data of the input job according to the association relationships among the target table, the source table fields, and the inserted content, and the association relationship between the loading program and the loading algorithm. A templated loading program is generated from input information such as the job name, table name, loading algorithm, and field-level logical correspondence of the relevant job. It is used for data loading and makes program maintenance and field-level logic analysis easier for developers.
As an embodiment of the present invention, preprocessing the job script to obtain the actual service script includes: reading the content of the job script into a variable, and removing unnecessary information from the variable by using regular expressions, to obtain an actual service script that contains only the actual business functions.
In this embodiment, preprocessing the job script may specifically be: reading the content of the job script file into a variable, removing line comments, block comments, variables, and prompt messages by regular expression matching, so that only the SQL statements carrying the actual business functions remain in the variable, and exporting those statements to a file.
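The preprocessing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the comment syntax (`--` and `/* ... */`), the `.SET` variable lines, and the `ECHO` prompt lines are assumptions chosen for the example, and the function name `strip_to_business_sql` is hypothetical. A real script dialect would need its own patterns.

```python
import re


def strip_to_business_sql(script_text: str) -> str:
    """Keep only the SQL statements of a job script (illustrative sketch)."""
    # Block ("segment") comments, possibly spanning several lines.
    text = re.sub(r"/\*.*?\*/", "", script_text, flags=re.DOTALL)
    # Line comments: everything from "--" to the end of the line.
    # Limitation: a "--" inside a string literal would also be removed.
    text = re.sub(r"--[^\n]*", "", text)

    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # Assumed conventions: ".SET name = value" declares a variable and
        # "ECHO ..." prints a prompt; neither carries business logic.
        if re.match(r"^(\.set\b|echo\b)", stripped, flags=re.IGNORECASE):
            continue
        kept.append(stripped)
    return "\n".join(kept)


sample = """-- load job
.SET run_date = '20200101'
/* clear the
   target */
DELETE FROM tgt_table;
ECHO 'inserting'
INSERT INTO tgt_table (a) SELECT x FROM src_table; -- trailing
"""
print(strip_to_business_sql(sample))
```

The cleaned text could then be written to a file at whatever location is designated for the later extraction step.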
As an embodiment of the present invention, extracting the association relationships and the predicate information includes: using regular expressions to extract, one by one, the association relationships among the target table, the source table fields, and the inserted content involved in statement parsing of the actual service script, together with the SQL statement predicate information in the actual service script.
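A minimal sketch of this kind of regular-expression extraction is shown below. It assumes statements of the simple form `INSERT INTO t (cols) SELECT exprs FROM s` separated by semicolons; the pattern, the helper names, and the predicate list are illustrative assumptions, and intermediate temporary tables, subqueries, and joins are not handled.

```python
import re

# Assumed statement shape: INSERT INTO <target> (<cols>) SELECT <exprs> FROM <source>
INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+(?P<target>[\w.]+)\s*\((?P<cols>[^)]*)\)\s*"
    r"SELECT\s+(?P<exprs>.*?)\s+FROM\s+(?P<source>[\w.]+)",
    flags=re.IGNORECASE | re.DOTALL,
)
# Leading keyword of a statement, treated here as its predicate.
PREDICATE_RE = re.compile(
    r"^\s*(INSERT|DELETE|UPDATE|MERGE|CREATE|DROP|TRUNCATE)\b", re.IGNORECASE
)


def split_top_level(expr_list: str) -> list[str]:
    """Split a select list on commas that are not nested in parentheses."""
    parts, depth, current = [], 0, []
    for ch in expr_list:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def extract_relations(sql_script: str):
    """Return (predicate sequence, field-level mappings) for a cleaned script."""
    # Naive split: assumes no semicolons inside string literals.
    statements = [s.strip() for s in sql_script.split(";") if s.strip()]
    predicates, mappings = [], []
    for stmt in statements:
        pred = PREDICATE_RE.match(stmt)
        if pred:
            predicates.append(pred.group(1).upper())
        ins = INSERT_RE.search(stmt)
        if ins:
            cols = [c.strip() for c in ins.group("cols").split(",")]
            exprs = split_top_level(ins.group("exprs"))
            for col, expr in zip(cols, exprs):
                # (target table, target field, inserted content, source table)
                mappings.append((ins.group("target"), col, expr, ins.group("source")))
    return predicates, mappings
```

Each mapping tuple pairs a target field with the expression inserted into it, so field transformations such as `COALESCE(...)` are kept verbatim as the inserted content.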
As an embodiment of the present invention, loading the data of the input job according to the association relationships among the target table, the source table fields, and the inserted content, and the association relationship between the loading program and the loading algorithm includes: obtaining a templated loading program from the job name, table name, loading algorithm, and field-level logical correspondence of the input job, according to the field-level association relationship between the target table and the inserted content and the field-level association relationship between the target table and the source table fields; and loading the data by using the templated loading program.
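One way to picture the templated loading program is a text template per loading algorithm that is filled in from the job name, table names, and field-level correspondence. The sketch below is an assumption-laden illustration: the template text, the `LOADER_TEMPLATES` dictionary, and `render_loader` are hypothetical, and only an F1-style template (clear the target table, then insert) is shown.

```python
from string import Template

# Hypothetical template for an F1-style load: clear the whole target table,
# then insert the mapped fields from the source table.
LOADER_TEMPLATES = {
    "F1": Template(
        "-- job: $job_name\n"
        "DELETE FROM $target_table;\n"
        "INSERT INTO $target_table ($columns)\n"
        "SELECT $expressions\n"
        "FROM $source_table;"
    ),
}


def render_loader(job_name, target_table, source_table, algorithm, field_map):
    """Render loader SQL from (target field, inserted content) pairs."""
    template = LOADER_TEMPLATES[algorithm]  # KeyError for an unknown algorithm
    return template.substitute(
        job_name=job_name,
        target_table=target_table,
        source_table=source_table,
        columns=", ".join(col for col, _ in field_map),
        expressions=", ".join(expr for _, expr in field_map),
    )


print(render_loader("JOB_A", "tgt", "src", "F1",
                    [("id", "s_id"), ("name", "UPPER(s_nm)")]))
```

Because the field-level correspondence is plain data, the same mapping can drive both program generation and later field-level analysis.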
For the specific implementation of the method, refer to the implementation of the system for abstracting data loading based on a data warehouse.
The invention addresses the disorderly management of the large number of manually written loading programs that a data warehouse accumulates on different technical platforms, and the problem that the field-level logical correspondence of the data warehouse model can only be found by manual searching and cannot be obtained accurately. The invention extracts the algorithm information and the field-level logical relationship information contained in the manual loading programs of the data warehouse model and stores them in physical tables for unified use and management. Manual loading programs can therefore be abstracted accurately and quickly into loading program elements, and all of them can eventually be converted to a template-based, tool-driven loading mode. This shortens the development time of loading programs, reduces labor cost, and improves both the speed and the accuracy of problem location and analysis.
Fig. 2 is a schematic structural diagram of a system for abstracting data loading based on a data warehouse according to an embodiment of the present invention. The system includes:
The script processing module 2, configured to preprocess the job script to obtain an actual service script. The content of the job script file is read into a variable; line comments, block comments, variables, and prompt messages are removed by regular expression matching; and the SQL statements that remain, which carry the actual business functions, are exported to a file and stored at a designated location for use in relationship extraction. The location is not fixed: any file system location with enough space to hold the file may be designated.
The script parsing module 4, configured to extract the association relationships among the target table, the source table fields, and the inserted content involved in statement parsing of the actual service script, to extract predicate information from the actual service script, and to determine the association relationship between the loading program and the loading algorithm by using the predicate information and the loading algorithm standard information. Regular expressions are used to extract, one by one, the association relationships among the target table, intermediate temporary tables, subqueries, the fields of each source table, and the inserted content (including the various field transformation operations) involved in statement parsing of the script, together with the SQL statement predicate information in the script.
The data loading module 6, configured to load the data of the input job according to the association relationships among the target table, the source table fields, and the inserted content, and the association relationship between the loading program and the loading algorithm. A templated loading program is generated from input information such as the job name, table name, loading algorithm, and field-level logical correspondence of the relevant job. It is used for data loading and makes program maintenance and field-level logic analysis easier for developers.
As an embodiment of the invention, the script processing module includes: a statement extraction unit, configured to read the content of the job script into a variable and remove unnecessary information from the variable by using regular expressions, to obtain an actual service script that contains only the actual business functions.
In this embodiment, preprocessing the job script may specifically be: reading the content of the job script file into a variable, removing line comments, block comments, variables, and prompt messages by regular expression matching, so that only the SQL statements carrying the actual business functions remain in the variable, and exporting those statements to a file.
As an embodiment of the invention, the script parsing module includes: a script parsing unit, configured to use regular expressions to extract, one by one, the association relationships among the target table, the source table fields, and the inserted content involved in statement parsing of the actual service script, together with the SQL statement predicate information in the actual service script.
As an embodiment of the present invention, the data loading module includes: a loading program unit, configured to obtain a templated loading program from the job name, table name, loading algorithm, and field-level logical correspondence of the input job, according to the field-level association relationship between the target table and the inserted content and the field-level association relationship between the target table and the source table fields; and a data loading unit, configured to load the data by using the templated loading program.
As an embodiment of the present invention, the system further includes a source data module 1 for storing the job scripts of the application systems. The source data module covers a plurality of application systems, whose scripts are stored in a specific directory of the file system and are distinguished by system name.
As an embodiment of the invention, the system further includes a loading algorithm standard module 3 for storing the loading algorithm standard information. For example, the F1 algorithm in Table 1 is characterized by an operation that clears all data in the target table followed by an operation that inserts data into the target table; the two operations occur in that order. See the algorithm descriptions in Table 1 for details.
As an embodiment of the invention, the system further includes a storage module 5 for storing the association relationships among the target table, the source table fields, and the inserted content, and the association relationship between the loading program and the loading algorithm.
In an embodiment of the present invention, as shown in Fig. 2, the source data module 1 covers a plurality of application systems, whose scripts are stored in a specific directory of the file system and are distinguished by system name. Fig. 3 is a schematic diagram of the script file structure of a source data application system.
The system job SQL statement extraction module, that is, the script processing module 2, preprocesses the job scripts of the source data application systems. It reads the content of the job script file into a variable and removes line comments, block comments, variables, and prompt messages by regular expression matching, so that only the SQL statements carrying the actual business functions remain in the variable. These statements are exported to a file and stored at a designated location for the script parsing module 4 to use in relationship extraction. The location is not fixed: any file system location with enough space to hold the file may be designated.
The loading algorithm standard information module, that is, the loading algorithm standard module 3, is used to extract the standard predicate feature information corresponding to each algorithm. For example, the F1 algorithm is characterized by an operation that clears all data in the target table followed by an operation that inserts data into the target table; the two operations occur in that order. See the algorithm descriptions in Table 1 for details.
The number of algorithms that is theoretically possible is given by the following formula:
[Formula is an image in the original publication (BDA0002493250000000081) and is not reproduced here.]
where n is the number of corresponding syntax predicates.
The system job script SQL parsing module, that is, the script parsing module 4, uses regular expressions to extract, one by one, the association relationships among the target table, intermediate temporary tables, subqueries, the fields of each source table, and the inserted content (including the various field transformation operations) involved in statement parsing of the script, together with the SQL statement predicate information in the script.
FIG. 4 shows SQL statements from an actual script and the corresponding extraction result. Element 41 in the figure is a set of target table operation statements from the T00_APP_FIELD_CD_H_ZG0_A job script, and element 42 is the extraction result. Element 42 shows part of the field-level logical correspondence between the target table and the inserted content in the statements, together with the predicate order of the statements in the job program.
The script parsing module 4 extracts the field-level logical correspondence between the target table fields and the inserted content, as well as the correspondence between the job program and the algorithm. The logic used to determine the algorithm is as follows:
if
[Condition formula is an image in the original publication (BDA0002493250000000082) and is not reproduced here.]
then
JOB is Algorithm_i
else
JOB is UNKNOW-Algorithm
where JOB(x, y, z) denotes, respectively, the statement order, the predicate, and the condition in the job.
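The matching rule can be illustrated with a small sketch. The exact condition is given only as an image in the original, so the comparison below is an assumption: a job is assigned to an algorithm when its ordered statements match that algorithm's standard predicate pattern, and otherwise falls back to `UNKNOW-Algorithm`. The `Statement` type, the `STANDARDS` table, and the encoding of F1 as "DELETE without a condition, then INSERT" are hypothetical choices based on the F1 description in the text.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    order: int                # x: position of the statement in the job
    predicate: str            # y: leading SQL predicate, e.g. DELETE or INSERT
    condition: Optional[str]  # z: filter condition text, or None if absent


# Assumed standard information. Each entry is (predicate, needs_condition),
# where needs_condition is True, False, or None for "does not matter".
STANDARDS = {
    "F1": [("DELETE", False), ("INSERT", None)],
}


def classify_job(statements, standards=STANDARDS) -> str:
    """Return the name of the first algorithm whose pattern the job matches."""
    ordered = sorted(statements, key=lambda s: s.order)
    for name, pattern in standards.items():
        if len(pattern) != len(ordered):
            continue
        if all(
            stmt.predicate.upper() == pred
            and (needs_cond is None or (stmt.condition is not None) == needs_cond)
            for stmt, (pred, needs_cond) in zip(ordered, pattern)
        ):
            return name
    return "UNKNOW-Algorithm"
```

For instance, a job consisting of an unconditional `DELETE` followed by an `INSERT` would be classified as F1 under these assumptions, while a `DELETE` with a filter condition would not.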
Fig. 5 illustrates how the SQL statements are scanned and parsed in a loop until the association relationships are finally found.
After extraction, the field-level logical correspondence between the target table fields and the inserted content and the correspondence between the job program and the algorithm are stored in model tables in the database, that is, in the storage module 5. The complete flow of the relationship extraction program, shown in Fig. 5, is as follows.
Step 1: process the manual loading programs one by one. Check whether all programs have been parsed; if so, exit the program; otherwise, fetch the next manual loading program and start parsing it.
Step 2: check whether the current SQL statement is the last one in the manual loading program. If so, parsing of this loading program is complete, and the association relationships and the predicate order are obtained and stored in an array for the next stage of processing; if not, continue parsing the next statement until parsing is complete.
Step 3: for each field in the target table, extract from the field association relationships the association between the target table field and the source table fields. If the last field has been processed, go to the next step; otherwise, continue extracting until the last field has been handled.
Step 4: once the field association relationships between the target table and the source table have been extracted, obtain the field-level logical correspondence between the target fields and the source table fields by regular expression matching. Finally, store this field-level logical correspondence and the relationship between the loading job and the loading algorithm in database tables, after which the program exits.
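The four steps above can be sketched end to end as a pair of nested loops that write their results to database tables. This is a simplified illustration under stated assumptions: SQLite stands in for the warehouse database, the table names `field_map` and `job_predicates` are hypothetical, and the regular expression handles only flat `INSERT INTO t (cols) SELECT exprs FROM s` statements with no commas inside expressions.

```python
import re
import sqlite3

# Assumed statement shape; real loading programs would need richer parsing.
INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+([\w.]+)\s*\(([^)]*)\)\s*SELECT\s+(.*?)\s+FROM\s+([\w.]+)",
    re.IGNORECASE | re.DOTALL,
)


def analyze_programs(programs: dict, conn: sqlite3.Connection) -> None:
    """Parse every manual loading program and store the extracted relations."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS field_map ("
        "job TEXT, target_table TEXT, target_field TEXT, "
        "source_table TEXT, inserted_content TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_predicates ("
        "job TEXT, seq INTEGER, predicate TEXT)"
    )
    # Step 1: take the manual loading programs one at a time.
    for job, script in programs.items():
        statements = [s.strip() for s in script.split(";") if s.strip()]
        # Step 2: walk the statements in order, recording the predicate sequence.
        for seq, stmt in enumerate(statements, start=1):
            predicate = stmt.split(None, 1)[0].upper()
            conn.execute(
                "INSERT INTO job_predicates VALUES (?, ?, ?)", (job, seq, predicate)
            )
            match = INSERT_RE.search(stmt)
            if not match:
                continue
            target, cols, exprs, source = match.groups()
            # Step 3: pair each target field with its inserted content.
            for col, expr in zip(cols.split(","), exprs.split(",")):
                # Step 4: persist the field-level correspondence.
                conn.execute(
                    "INSERT INTO field_map VALUES (?, ?, ?, ?, ?)",
                    (job, target, col.strip(), source, expr.strip()),
                )
    conn.commit()
```

The stored predicate sequence is what a later step would compare against the loading algorithm standard information to decide which algorithm a job implements.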
The model loading template program, i.e. the data loading module 6, takes as input the job name, table name, loading algorithm and field-level logical mapping of the relevant job and generates a templated loading program. This program is used for data loading, and it makes program maintenance and analysis of field-level processing logic easier for developers.
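As a rough illustration of how such a templated loading program might be generated from those four inputs, consider the following Python sketch. The function name `render_loader` and the algorithm names `append` and `full_replace` are hypothetical; the source does not enumerate the loading algorithms or the template format.

```python
def render_loader(job_name, target_table, source_table, algorithm, field_map):
    """Build a loading script from job metadata.

    field_map maps each target field to the expression inserted into it.
    The algorithm names below are illustrative assumptions only.
    """
    cols = ", ".join(field_map)
    exprs = ", ".join(field_map.values())
    steps = [f"-- job: {job_name}, algorithm: {algorithm}"]
    if algorithm == "full_replace":
        # Clear the target first, then reload it completely.
        steps.append(f"DELETE FROM {target_table};")
    elif algorithm != "append":
        raise ValueError(f"unknown loading algorithm: {algorithm}")
    steps.append(
        f"INSERT INTO {target_table} ({cols}) SELECT {exprs} FROM {source_table};"
    )
    return "\n".join(steps)


script = render_loader(
    "job_cust_load",
    "dw.cust",
    "ods.cust s",
    "full_replace",
    {"id": "s.cust_id", "name": "TRIM(s.name)"},
)
print(script)
```

Because every generated script follows the same layout, the field-level logic of a job can be read directly from the stored mapping instead of from hand-written code.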
The invention addresses the disorderly management of the large number of manual loading programs that a data warehouse maintains on different technical platforms, and the defect that the field-level logical mappings of the data warehouse model depend on manual lookup and cannot be obtained accurately. The invention extracts the algorithm information and the field-level logical relationship information contained in the manual loading programs of the data warehouse model and stores them in physical tables for unified use and management. Manual loading programs can therefore be abstracted accurately and quickly into loading program elements, and all manual loading programs can eventually be converted to a templated, tool-based loading mode. This shortens loading program development, reduces labor cost, and makes problem location and analysis both faster and more accurate.
An embodiment of the present invention further provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented:
preprocessing the job script to obtain an actual service script;
extracting the association relationship among the target table, source table fields and inserted content used in statement parsing in the actual service script, and extracting predicate information from the actual service script;
determining the association relationship between the loading program and the loading algorithm by using the predicate information and the loading algorithm standard information;
and loading data for the input job according to the association relationship among the target table, source table fields and inserted content and the association relationship between the loading program and the loading algorithm.
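The first three steps can be sketched in Python as follows. This is only an illustration under stated assumptions: it treats "predicate information" as the leading SQL keyword of each statement, and the `ALGORITHM_STANDARD` table with its algorithm names is hypothetical, since the section does not list the actual loading algorithm standard information. The comment stripping is deliberately naive and would also remove `--` or `;` that appear inside string literals.

```python
import re


def preprocess(job_script):
    """Strip comments and blank fragments, keeping only business statements."""
    text = re.sub(r"/\*.*?\*/", "", job_script, flags=re.DOTALL)  # block comments
    text = re.sub(r"--[^\n]*", "", text)                          # line comments
    return [stmt.strip() for stmt in text.split(";") if stmt.strip()]


# Assumption: the "predicate" of a statement is its leading SQL keyword.
PREDICATE = re.compile(r"^\s*(INSERT|DELETE|UPDATE|MERGE|TRUNCATE)\b", re.IGNORECASE)


def predicate_sequence(statements):
    """Return the ordered tuple of predicates found in the statements."""
    sequence = []
    for stmt in statements:
        match = PREDICATE.match(stmt)
        if match:
            sequence.append(match.group(1).upper())
    return tuple(sequence)


# Hypothetical loading algorithm standard: predicate sequence -> algorithm name.
ALGORITHM_STANDARD = {
    ("INSERT",): "append",
    ("DELETE", "INSERT"): "full_replace",
    ("UPDATE", "INSERT"): "upsert",
}


def determine_algorithm(job_script):
    """Match the script's predicate sequence against the standard table."""
    sequence = predicate_sequence(preprocess(job_script))
    return ALGORITHM_STANDARD.get(sequence, "unknown")


example = """
/* nightly customer load */
DELETE FROM dw.cust;  -- clear the target
INSERT INTO dw.cust (id) SELECT s.cust_id FROM ods.cust s;
"""
print(determine_algorithm(example))
```

A sequence that matches no entry is reported as `unknown` rather than guessed, so such jobs can be reviewed by hand.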
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the following steps:
preprocessing the job script to obtain an actual service script;
extracting the association relationship among the target table, source table fields and inserted content used in statement parsing in the actual service script, and extracting predicate information from the actual service script;
determining the association relationship between the loading program and the loading algorithm by using the predicate information and the loading algorithm standard information;
and loading data for the input job according to the association relationship among the target table, source table fields and inserted content and the association relationship between the loading program and the loading algorithm.
The computer device and the computer-readable storage medium are provided on the basis of the same inventive concept as the data warehouse-based abstract data loading method. Because they solve the problem on a principle similar to that of the method, their implementation can refer to the implementation of the method, and repeated parts are not described again.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Specific embodiments are used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and core idea of the invention. A person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (13)

1. A data warehouse-based abstract data loading method, the method comprising:
preprocessing the job script to obtain an actual service script;
extracting the association relationship among the target table, source table fields and inserted content used in statement parsing in the actual service script, and extracting predicate information from the actual service script;
determining the association relationship between the loading program and the loading algorithm by using the predicate information and the loading algorithm standard information;
and loading data for the input job according to the association relationship among the target table, source table fields and inserted content and the association relationship between the loading program and the loading algorithm.
2. The method of claim 1, wherein preprocessing the job script to obtain the actual service script comprises:
reading the content of the job script into a variable, and removing unnecessary information from the variable by using a regular expression to obtain an actual service script containing only the actual service functions.
3. The method of claim 1, wherein extracting the association relationship among the target table, source table fields and inserted content used in statement parsing in the actual service script, and extracting the predicate information from the actual service script, comprises:
extracting, one by one using regular expressions, the association relationship among the target table, source table fields and inserted content used in statement parsing in the actual service script and the SQL statement predicate information in the actual service script.
4. The method of claim 1, wherein loading data for the input job according to the association relationship among the target table, source table fields and inserted content and the association relationship between the loading program and the loading algorithm comprises:
obtaining a templated loading program by using the job name, the table name, the loading algorithm and the field-level logical mapping of the input job, according to the field-level association relationship between the target table and the inserted content and the field-level association relationship between the target table and the source table fields;
and loading data by using the templated loading program.
5. A data warehouse-based abstract data loading system, the system comprising:
a script processing module, configured to preprocess the job script to obtain an actual service script;
a script parsing module, configured to extract the association relationship among the target table, source table fields and inserted content used in statement parsing in the actual service script, to extract predicate information from the actual service script, and to determine the association relationship between the loading program and the loading algorithm by using the predicate information and the loading algorithm standard information;
and a data loading module, configured to load data for the input job according to the association relationship among the target table, source table fields and inserted content and the association relationship between the loading program and the loading algorithm.
6. The system of claim 5, wherein the script processing module comprises:
a statement extraction unit, configured to read the content of the job script into a variable and to remove unnecessary information from the variable by using a regular expression to obtain an actual service script containing only the actual service functions.
7. The system of claim 5, wherein the script parsing module comprises:
a script parsing unit, configured to extract, one by one using regular expressions, the association relationship among the target table, source table fields and inserted content used in statement parsing in the actual service script and the SQL statement predicate information in the actual service script.
8. The system of claim 5, wherein the data loading module comprises:
a loading program unit, configured to obtain a templated loading program by using the job name, the table name, the loading algorithm and the field-level logical mapping of the input job, according to the field-level association relationship between the target table and the inserted content and the field-level association relationship between the target table and the source table fields;
and a data loading unit, configured to load data by using the templated loading program.
9. The system of claim 5, further comprising: a source data module, configured to store the job scripts of the application system.
10. The system of claim 5, further comprising: a loading algorithm standard module, configured to store the loading algorithm standard information.
11. The system of claim 5, further comprising: a storage module, configured to store the association relationship among the target table, source table fields and inserted content and the association relationship between the loading program and the loading algorithm.
12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 4 when executing the computer program.
13. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method of any of claims 1 to 4.
CN202010411049.1A 2020-05-15 2020-05-15 Method and system for abstract data loading based on data warehouse Active CN111597243B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010411049.1A CN111597243B (en) 2020-05-15 2020-05-15 Method and system for abstract data loading based on data warehouse

Publications (2)

Publication Number Publication Date
CN111597243A true CN111597243A (en) 2020-08-28
CN111597243B CN111597243B (en) 2023-09-15

Family

ID=72192176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010411049.1A Active CN111597243B (en) 2020-05-15 2020-05-15 Method and system for abstract data loading based on data warehouse

Country Status (1)

Country Link
CN (1) CN111597243B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6151608A (en) * 1998-04-07 2000-11-21 Crystallize, Inc. Method and system for migrating data
US20110055147A1 (en) * 2009-08-25 2011-03-03 International Business Machines Corporation Generating extract, transform, and load (etl) jobs for loading data incrementally
US20120239612A1 (en) * 2011-01-25 2012-09-20 Muthian George User defined functions for data loading
CN103425762A (en) * 2013-08-05 2013-12-04 南京邮电大学 Telecom operator mass data processing method based on Hadoop platform
US20170249361A1 (en) * 2016-02-29 2017-08-31 International Business Machines Corporation Detecting logical relationships based on structured query statements
CN110019442A (en) * 2017-09-04 2019-07-16 华为技术有限公司 Access method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张迎周等: "一种新型形式化程序切片方法" *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434115A (en) * 2020-11-23 2021-03-02 京东数字科技控股股份有限公司 Data processing method and device, electronic equipment and readable storage medium
CN112434115B (en) * 2020-11-23 2024-02-06 京东科技控股股份有限公司 Data processing method and device, electronic equipment and readable storage medium
CN112612783A (en) * 2020-12-22 2021-04-06 航天信息股份有限公司 Method for realizing cross-platform data sharing
CN112685325A (en) * 2021-01-22 2021-04-20 中信银行股份有限公司 ETL software research and development test management method and system
CN112685325B (en) * 2021-01-22 2023-07-28 中信银行股份有限公司 ETL software research and development test management method and system
CN113051176A (en) * 2021-04-20 2021-06-29 中国工商银行股份有限公司 Method and device for processing automatic test data, electronic equipment and storage medium
CN113051176B (en) * 2021-04-20 2024-03-26 中国工商银行股份有限公司 Automatic test data processing method and device, electronic equipment and storage medium
CN114003230A (en) * 2021-09-28 2022-02-01 厦门国际银行股份有限公司 SQL script rapid compiling method and system
CN115904487A (en) * 2022-11-29 2023-04-04 广发银行股份有限公司 Analytical data aperture management method, system, equipment and storage medium
CN115904487B (en) * 2022-11-29 2023-08-18 广发银行股份有限公司 Analytical data caliber management method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN111597243B (en) 2023-09-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant