CN112685325A - ETL software research and development test management method and system - Google Patents

ETL software research and development test management method and system Download PDF

Info

Publication number
CN112685325A
CN112685325A CN202110090036.3A CN202110090036A CN112685325A CN 112685325 A CN112685325 A CN 112685325A CN 202110090036 A CN202110090036 A CN 202110090036A CN 112685325 A CN112685325 A CN 112685325A
Authority
CN
China
Prior art keywords
data
information
test
obtaining
script
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110090036.3A
Other languages
Chinese (zh)
Other versions
CN112685325B (en
Inventor
汪名森
徐迎田
王琴
席艳秋
肖威
信勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Citic Bank Corp Ltd
Original Assignee
China Citic Bank Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Citic Bank Corp Ltd filed Critical China Citic Bank Corp Ltd
Priority to CN202110090036.3A priority Critical patent/CN112685325B/en
Publication of CN112685325A publication Critical patent/CN112685325A/en
Application granted granted Critical
Publication of CN112685325B publication Critical patent/CN112685325B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Abstract

The invention discloses a method and a system for researching, developing, testing and managing ETL software, which are used for acquiring field information according to metadata information of a data table; obtaining test data according to the field information; obtaining the data type according to the data information of the data table; obtaining a checking script according to the data type; acquiring first input information; obtaining a first program script according to the first input information; according to the first program script, acquiring first extraction data comprising source table information and target table information; obtaining first packing information according to the test data, the check script and the first extraction data; obtaining test case information according to the first packing information; and obtaining a regression testing case base according to the testing case information, wherein the regression testing case base supports continuous integration of programs. The technical problems that a test management platform is single in function and cannot meet various requirements of research and development tests are solved. The integration of test data, program scripts and check scripts is achieved, and the technical effect of reducing the workload of manual testing is achieved.

Description

ETL software research and development test management method and system
Technical Field
The invention relates to the technical field of computers, in particular to a method and a system for researching, developing, testing and managing ETL software.
Background
ETL, an abbreviation used in english Extract-Transform-Load, is used to describe the process of extracting (Extract), converting (Transform), and loading (Load) data from a source end to a destination end. The procedure described by ETL, which is generally common practice, includes ETL or ELT (Extract-Load-Transform), and is used in combination. Generally, the larger the amount of data, the more complex the conversion logic, and the more powerful the destination is, the better the ELT is used to utilize the parallel processing capability of the destination database.
At present, the technical schemes of ETL research and development and test management mainly comprise: and processing a target data table by calling an ETL smoking test storage process, checking the data distribution, the null value proportion, the maximum value, the minimum value and the like of the target data table, processing a checking report, and further checking the quality of the ETL program. The method has high automation program and can well check the data quality of the target table. However, for the ETL script test, it is not enough to just smoke the test, and the system test, the performance test, the business test, and the regression test all need a lot of test data and test case preparation. The range covered by this patent is too narrow. Or checking the data quality of the test result table by methods such as item-by-item comparison of the expected result table and the test result table, and further checking the quality of the ETL script. Although the method has high accuracy, a large amount of data preparation work is required, and when large data amount processing is faced, preparation of an expected result table (namely theoretical values) requires a large amount of work, and continuous accumulation of test cases and automatic testing cannot be achieved.
However, in the process of implementing the technical solution of the invention in the embodiments of the present application, the inventors of the present application find that the above-mentioned technology has at least the following technical problems:
the technical problems that a test management platform is single in function and cannot meet various requirements of research and development tests exist in the prior art.
Disclosure of Invention
The embodiment of the application provides a method and a system for research, development, test and management of ETL software, and solves the technical problems that in the prior art, a test management platform is single in function and can not meet various requirements of research, development and test. The integration of test data, program scripts and check scripts is achieved, the test complexity is effectively reduced, and the technical effect of manual test workload is reduced.
In view of the foregoing problems, embodiments of the present application provide a method and a system for developing, testing, and managing ETL software.
In a first aspect, an embodiment of the present application provides a method for developing, testing, and managing ETL software, to obtain metadata information of a data table; acquiring field information according to the metadata information of the data table; obtaining test data according to the field information; obtaining the data type according to the data information of the data table; obtaining a checking script according to the data type; acquiring first input information, wherein the first input information comprises source table information, target table information and a field mapping relation between a source table and a target table; acquiring a first program script according to the first input information; obtaining first extraction data according to the first program script, wherein the first extraction data comprises the source table information and the target table information; obtaining first packing information according to the test data, the check script and the first extraction data; obtaining test case information according to the first packing information; and obtaining a regression testing case base according to the testing case information, wherein the regression testing case base supports continuous integration of programs.
Preferably, the obtaining test data according to the field information includes: obtaining a preset rule; obtaining a preset algorithm; obtaining various types of data corresponding to the fields according to the field information, the preset rule and the preset algorithm; obtaining a date field according to the field information; obtaining the date information of the data according to the date field; and obtaining the test data according to the various types of data and the data date information.
Preferably, the method comprises: obtaining first test data, wherein the first test data is the test data generated by a first field; obtaining second test data, wherein the second test data is the test data generated by a second field, and the first field is different from the second field; wherein the types of data in the first test data are associated with the types of data in the second test data; and obtaining a test database according to the first test data and the second test data.
Preferably, after the check script is obtained according to the data type, the method includes obtaining a first check script, where the first check script is generated at a first time; obtaining a second check script, wherein the second check script is generated at a second time; obtaining an Nth checking script by analogy, wherein the Nth checking script is generated at the Nth time, and N is a natural number greater than 1; and obtaining a checking script set according to the first checking script, the second checking script and the Nth checking script.
Preferably, after obtaining the check script according to the data type, the method includes: obtaining a checking information list according to the checking script; obtaining the information of the checking personnel according to the checking information list; obtaining a checking date according to the checking personnel information; and obtaining a checking result according to the checking date and the checking personnel information.
Preferably, the method comprises: obtaining a first test requirement; obtaining data to be tested from the test database according to the first test requirement; obtaining a first test date; obtaining first replacement information according to the first test date and the data to be tested, wherein the first replacement information is used for replacing the date in the data to be tested according to the first test date; obtaining first test information according to the data to be tested, wherein the first test information comprises a first test result and the data to be tested; and acquiring first recovery information according to the first test information, wherein the first recovery information is used for recovering the date in the data to be tested and storing the date in the test database.
Preferably, the method comprises: obtaining the test case information according to the source table information and the target table information; acquiring a data table check script according to the test case information; obtaining a first binding instruction according to the source table information, the target table information, the test case information and the data table check script, wherein the first binding instruction is used for binding the source table information and the target table information with the test case information and the data table check script; obtaining test data date, test data files and check scripts according to the test case information and the data table check scripts; obtaining a data test script according to the test data date, the test data file and the check script, wherein the data test script is a repeatable execution type shell script; and obtaining a first execution instruction according to the data test script, wherein the first execution instruction is used for issuing the data test script to a test environment for execution.
On the other hand, the application also provides an ETL software research, development, test and management system, which comprises:
a first obtaining unit configured to obtain data table metadata information;
a second obtaining unit, configured to obtain field information according to the data table metadata information;
a third obtaining unit, configured to obtain test data according to the field information;
a fourth obtaining unit, configured to obtain a data type according to the data information in the data table;
a fifth obtaining unit, configured to obtain a check script according to the data type;
a sixth obtaining unit, configured to obtain first entry information, where the first entry information includes source table information, target table information, and a field mapping relationship between the source table and the target table;
a seventh obtaining unit, configured to obtain a first program script according to the first entry information;
an eighth obtaining unit, configured to obtain first extracted data according to the first program script, where the first extracted data includes the source table information and the target table information;
a ninth obtaining unit, configured to obtain first package information according to the test data, the check script, and the first extraction data;
a tenth obtaining unit, configured to obtain test case information according to the first packing information;
an eleventh obtaining unit, configured to obtain a regression test case base according to the test case information, where the regression test case base supports continuous integration of programs.
In a third aspect, the present invention provides an ETL software development, test and management system, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of the first aspect when executing the program.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
the embodiment of the application provides an ETL software research, development, test and management method and system, which are used for acquiring metadata information of a data table; acquiring field information according to the metadata information of the data table; obtaining test data according to the field information; obtaining the data type according to the data information of the data table; obtaining a checking script according to the data type; acquiring first input information, wherein the first input information comprises source table information, target table information and a field mapping relation between a source table and a target table; acquiring a first program script according to the first input information; obtaining first extraction data according to the first program script, wherein the first extraction data comprises the source table information and the target table information; obtaining first packing information according to the test data, the check script and the first extraction data; obtaining test case information according to the first packing information; and obtaining a regression testing case base according to the testing case information, wherein the regression testing case base supports continuous integration of programs. The integrated ETL test system has the advantages that a unified ETL research and development and test management platform is built, unified management of a data table, a batch processing script, test data, a test script, a quality checking script and a test suite is achieved, full-flow scripting of data loading, program running, data checking, data cleaning and the like is achieved by combining three elements of the test data, the program script and the checking script, and the technical effects of continuous accumulation and effective management of the test suite are achieved by accumulating the test script. Therefore, the technical problems that the test management platform has single function and can not meet various requirements of research and development tests in the prior art are solved.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
Fig. 1 is a schematic flowchart of a method for developing, testing, and managing ETL software according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of an ETL software development, test and management system according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an exemplary electronic device according to an embodiment of the present application.
Description of reference numerals: a first obtaining unit 11, a second obtaining unit 12, a third obtaining unit 13, a fourth obtaining unit 14, a fifth obtaining unit 15, a sixth obtaining unit 16, a seventh obtaining unit 17, an eighth obtaining unit 18, a ninth obtaining unit 19, a tenth obtaining unit 20, an eleventh obtaining unit 21, a bus 300, a receiver 301, a processor 302, a transmitter 303, a memory 304, and a bus interface 306.
Detailed Description
The embodiment of the application provides a method and a system for research, development, test and management of ETL software, and solves the technical problems that in the prior art, a test management platform is single in function and can not meet various requirements of research, development and test.
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are merely some embodiments of the present application and not all embodiments of the present application, and it should be understood that the present application is not limited to the example embodiments described herein.
Summary of the application
ETL, an abbreviation used in english Extract-Transform-Load, is used to describe the process of extracting (Extract), converting (Transform), and loading (Load) data from a source end to a destination end. The procedure described by ETL, which is generally common practice, includes ETL or ELT (Extract-Load-Transform), and is used in combination. Generally, the larger the amount of data, the more complex the conversion logic, and the more powerful the destination is, the better the ELT is used to utilize the parallel processing capability of the destination database. However, the technical problems that the test management platform has single function and can not meet various requirements of research and development tests exist in the prior art.
In view of the above technical problems, the technical solution provided by the present application has the following general idea:
the embodiment of the application provides a method for researching, developing, testing and managing ETL software, which comprises the following steps: obtaining metadata information of a data table; acquiring field information according to the metadata information of the data table; obtaining test data according to the field information; obtaining the data type according to the data information of the data table; obtaining a checking script according to the data type; acquiring first input information, wherein the first input information comprises source table information, target table information and a field mapping relation between a source table and a target table; acquiring a first program script according to the first input information; obtaining first extraction data according to the first program script, wherein the first extraction data comprises the source table information and the target table information; obtaining first packing information according to the test data, the check script and the first extraction data; obtaining test case information according to the first packing information; and obtaining a regression testing case base according to the testing case information, wherein the regression testing case base supports continuous integration of programs. The integrated ETL test system has the advantages that a unified ETL research and development and test management platform is built, unified management of a data table, a batch processing script, test data, a test script, a quality checking script and a test suite is achieved, full-flow scripting of data loading, program running, data checking, data cleaning and the like is achieved by combining three elements of the test data, the program script and the checking script, and the technical effects of continuous accumulation and effective management of the test suite are achieved by accumulating the test script.
Having thus described the general principles of the present application, various non-limiting embodiments thereof will now be described in detail with reference to the accompanying drawings.
Example one
Fig. 1 is a schematic flow chart of an ETL software development, test and management method according to an embodiment of the present disclosure, and as shown in fig. 1, an embodiment of the present disclosure provides an ETL software development, test and management method, where the method includes:
step S100: data table metadata information is obtained.
Step S200: and obtaining field information according to the metadata information of the data table.
Step S300: and obtaining test data according to the field information.
Further, the obtaining of the test data according to the field information includes obtaining a preset rule; obtaining a preset algorithm; obtaining various types of data corresponding to the fields according to the field information, the preset rule and the preset algorithm; obtaining a date field according to the field information; obtaining the date information of the data according to the date field; and obtaining the test data according to the various types of data and the data date information.
Further, the method comprises: obtaining first test data, wherein the first test data is the test data generated by a first field; obtaining second test data, wherein the second test data is the test data generated by a second field, and the first field is different from the second field; wherein the types of data in the first test data are associated with the types of data in the second test data; and obtaining a test database according to the first test data and the second test data.
Specifically, the test data is automatically generated, the test data of each field is automatically generated according to the metadata information of the input data table, such as the information of a primary key, a data type, a dictionary enumeration value and the like, and various types of test data are generated according to a Cartesian product method. For example, for the char (3) field, various types of data such as "111, 112, … … 119, 11a … … 11 z" are automatically generated, and for the varchar (5) field, various types of data such as "1, 11, 111, 1111, 11111, … … zzzz" are automatically generated. The data generation rules are consistent, the algorithms are consistent, and the data are generated according to the sequence of 1-9-a-z, so that the data among the data tables can be correctly associated. Meanwhile, the different levels of the required test data are considered under different test scenes, so that unit test cases, integrated test cases and the like have different selection quantities of the test cases. Meanwhile, the data date field in the system is considered, and the data date field needs to be particularly noted when the metadata of the data table is maintained, so that the data date field is replaced by the specified data date information. In an actual test scene, part of test data is desensitized according to production data, so the platform also provides convenient data import and export functions, supports replacing the data with data of specified date, saves various test data versions, and realizes the multiplexing of test cases. The table structure of the test file storage in the embodiment of the application is field, field name, type and remark. And automatic generation of various test data is realized. The system supports various test data management such as unit test, system test, performance test, service test and the like, allows multiple sets of data to be reused, and simultaneously, the test data among team members do not interfere with each other.
Step S400: and obtaining the data type according to the data information of the data table.
Step S500: and obtaining a checking script according to the data type.
Further, after obtaining the check script according to the data type, the method includes obtaining a first check script, where the first check script is generated at a first time; obtaining a second check script, wherein the second check script is generated at a second time; obtaining an Nth checking script by analogy, wherein the Nth checking script is generated at the Nth time, and N is a natural number greater than 1; and obtaining a checking script set according to the first checking script, the second checking script and the Nth checking script.
Further, after obtaining the check script according to the data type, the method includes: obtaining a checking information list according to the checking script; obtaining the information of the checking personnel according to the checking information list; obtaining a checking date according to the checking personnel information; and obtaining a checking result according to the checking date and the checking personnel information.
Further, the method comprises: obtaining a first test requirement; obtaining data to be tested from the test database according to the first test requirement; obtaining a first test date; obtaining first replacement information according to the first test date and the data to be tested, wherein the first replacement information is used for replacing the date in the data to be tested according to the first test date; obtaining first test information according to the data to be tested, wherein the first test information comprises a first test result and the data to be tested; and acquiring first recovery information according to the first test information, wherein the first recovery information is used for recovering the date in the data to be tested and storing the date in the test database.
Specifically, the check script is automatically generated, and according to the entered metadata information of the data table, such as a main key, a non-null value, an enumerated value, and the like, the SQL check script is automatically generated, and for the ID check with the type of the main key, the check script is obtained as a select count (1) from (select ID, count (1) from table a where data _ date ═ date $ group by ID having count (1) > 1); for type enumerated value check, the check script is select count (1) from tabley a where data _ date $ and type not in (select type from dic). Besides automatic generation, the checking SQL can also be used for checking other types, and meanwhile, manual maintenance is supported, so that continuous accumulation of checking scripts is achieved, for example, checking on processing time and checking relation among data tables is achieved. The table structure design of the data quality checking table is as follows, supports the dynamic replacement of data dates, and supports the tests under different environments, different personnel and different dates. And automatic generation of batch scripts is realized. And realizing a framework program template, unifying coding style, log specification and the like based on the input data sheet and the output data sheet. And simultaneously, the functions of version management, release management and the like are embedded, so that the version management of the program script is realized. Different personnel can draw corresponding database from the test database according to the check requirement of oneself, and the database can multiplex, and first user uses the back promptly, with test data reduction to the test database in, the second user can draw the use, and the date can carry out automatic replacement simultaneously, resumes after the replacement, avoids the interact between the different personnel. And managing the relation between personnel and data dates. The corresponding relation between main maintenance personnel and data dates avoids the conflict of the data dates in the team when test data are subsequently issued, and data files are mutually covered.
Step S600: and acquiring first input information, wherein the first input information comprises source table information, target table information and a field mapping relation between the source table and the target table.
Step S700: and acquiring a first program script according to the first input information.
Step S800: and obtaining first extraction data according to the first program script, wherein the first extraction data comprises the source table information and the target table information.
Specifically, the program script is automatically generated according to input table information and output table information input by a user, the system provides two ways for editing the program script, the first way is to input a field mapping relation between a source table and a target table in an excel according to an embedded excel editing module, and the system can automatically generate MYSQL, Teradata or other types of program scripts. And in the second mode, a developer selects an input source table and an input target table, a program script template is automatically generated, and secondary program editing is supported. Generally, a transition from a first to a second is possible, but the reverse is not possible. And the system extracts the source table and the target table according to the final program script so as to be associated with the test data and the check script.
The program metadata table is designed as follows:
field(s) Name of field Type (B) Remarks for note
PROG_ID Self-increment ID Int
PROG_NAME Name of program varchar(50)
PROG_TAB_TYP Program table type Char(1) 1-Source table, 2-target table
TAB_NAME Name of data table varchar(50)
Step S900: obtaining first packing information according to the test data, the check script and the first extraction data;
step S1000: obtaining test case information according to the first packing information;
step S1100: and obtaining a regression testing case base according to the testing case information, wherein the regression testing case base supports continuous integration of programs.
Specifically, the combination of test data, test scripts, check scripts and the like realizes the continuous accumulation of test cases, and solves the problems of high test cost, incomplete test coverage, high regression test cost and the like in the development and development of ETL. And binding the source table and target table information of the program script with a data table test case and a data table check script, selecting a test data date, a test data file, a check method and the like to form a repeatedly executable shell-like script, and releasing the repeatedly executable shell-like script to a specified mysql, td or other data environment together to realize ETL processing and finally check the accuracy of a result. According to different test scenes, the magnitude of required test data is different, the test scenes can be divided into multiple levels, such as unit test cases, integrated test cases and the like, the selection quantity of the test cases is different, the test data is correspondingly selected, meanwhile, data date fields in the system are considered, and special attention needs to be paid when metadata of a data table is maintained, so that the metadata can be replaced by the specified data date information. The test cases are stored, in an actual test scene, part of test data is desensitized according to production data, the embodiment of the application also provides convenient data import and export functions, supports replacement of the test cases with data on a specified date, saves various test data versions, and realizes multiplexing of the test cases. The integrated ETL test system has the advantages that a unified ETL research and development and test management platform is built, unified management of a data table, a batch processing script, test data, a test script, a quality checking script and a test suite is achieved, full-flow scripting of data loading, program running, data checking, data cleaning and the like is achieved by combining three elements of the test data, the program script and the checking script, and continuous accumulation and effective management of the test suite are achieved by accumulating the test script. The technical problems that in the prior art, a test management platform is single in function and can not meet various requirements of research and development tests are solved.
The test case table is designed as follows:
field(s) Name of field Type (B) Remarks for note
TEST_ID Test case ID Int
PROG_NAME Name of program varchar(50)
Data_DATE Date of data
The detail information table of the test case library is designed as follows:
Figure BDA0002912107270000131
further, the method comprises: obtaining the test case information according to the source table information and the target table information; acquiring a data table check script according to the test case information; obtaining a first binding instruction according to the source table information, the target table information, the test case information and the data table check script, wherein the first binding instruction is used for binding the source table information and the target table information with the test case information and the data table check script; obtaining test data date, test data files and check scripts according to the test case information and the data table check scripts; obtaining a data test script according to the test data date, the test data file and the check script, wherein the data test script is a repeatable execution type shell script; and obtaining a first execution instruction according to the data test script, wherein the first execution instruction is used for issuing the data test script to a test environment for execution.
Specifically, the system searches the corresponding test data file in the test file storage table according to the name of the data table and the file type so as to load the test data file. And according to the name of the data table, removing the corresponding check SQL from the data quality check table. The data test shell script is formed and issued to the test environment for execution according to the following algorithm.
Load A temporary Table A.dat
Dat temporary Load B table
The data date in the Update A temporary table is the specified data date
The data date in the Update B temporary table is the specified data date
Respectively filling the A temporary table and the B temporary table into A, B tables
Operating a designated program to process the results of the C table according to the A, B table
Running the checking script, checking the data quality of C table, and inserting the access into the quality checking result table
Clearing data (optional) in A temporary table, B temporary table, A table, B table and C table
And exiting the program.
In summary, the embodiment of the present application has the following service modules, which are combined with each other to form a complete batch script management platform.
1. And managing a database. The method mainly maintains basic information of a local database, a test database and a production database, and is used for automatically connecting the local database when a test is subsequently issued, further initializing test data, issuing a program script, running a check program and the like.
2. And managing a data table. The method mainly maintains metadata information of the data table, including table building statements, primary key fields, non-empty fields, enumeration fields, data date fields and the like, and is used for automatically generating test cases and checking scripts subsequently. The metadata information can realize bidirectional synchronization, namely the metadata information can be read from the test database, and the metadata information maintained by the platform can also be synchronized into the specified test database.
3. And managing the relation between personnel and data dates. The corresponding relation between main maintenance personnel and data dates avoids the conflict of the data dates in the team when test data are subsequently issued, and data files are mutually covered.
4. And (5) managing the test files. And automatically generating test sample data of the table according to the database and the data table information maintained at the early stage. According to different requirements of a test scene, unit test data, system test data, performance test data and the like can be generated and can be exported and stored, and case sets can be conveniently accumulated.
5. And (5) checking rule management. According to the metadata information of the data table, the checking rules of the table are automatically generated, such as key checking, non-null checking, enumerated value checking and the like, and the checking information of audit relationship, fluctuation rate and the like can also be manually maintained by a tester. Meanwhile, various kinds of inspection information can be configured and issued to different test scenes such as unit test, system test, performance test and the like, so that the running condition and the result of the ETL script are monitored.
6. And managing batch scripts. The method supports selection of information such as a data source table and a target data table, automatically generates a program template, and supports manual editing and program script publishing.
7. And managing test cases. Key elements such as batch processing scripts, test files, data dates, verification rules and the like are combined to form test cases, and the test cases are automatically cleaned. And supporting various different combinations to further generate a test case library.
Furthermore, the embodiment of the application realizes integration of test data, program scripts and check scripts by building a full-flow management platform for batch processing scripts, effectively reduces test complexity and reduces manual test workload. The method and the device realize automatic generation of the test data, effectively solve the problems of complex test data preparation, incomplete test data coverage and the like, and reduce the complexity of test case preparation. Continuous integrated management of the batch processing scripts is realized, continuous checking of the batch processing scripts is realized through the precipitation test cases, and the robustness of the ETL program scripts is ensured. Simultaneously has the following characteristics:
1. according to the invention, by building an ETL research and development and test platform, combination of test data, test scripts, check scripts and the like is realized, continuous accumulation of test cases is realized, and the problems of high test cost, incomplete test coverage, high regression test cost and the like in the ETL research and development are solved.
2. The system technical method can be realized by using a BS or CS framework, namely a browser or a client form can be used.
3. Other Web servers, database servers, network deployments, and the like can be expanded according to the team size and the number of users, for example, distributed expansion is used to meet the needs of team usage, but the application architecture remains unchanged.
Furthermore, the embodiment of the application has been partially put into practical use in financial institutions
1. At present, a financial management domain of a management information development department of a software development center is responsible for research and development work of a financial system, a great amount of analysis type reports and ETL scripts exist for the financial analysis system, and a data quality check module is popularized in advance in order to reduce research and development burden of developers according to in-line research and development requirements, namely, the data quality check SQL is maintained in the system in advance and passes through a check result every day, so that rapid iteration of team delivery is realized.
2. The system brings the following benefits after being online:
1) the coding style is unified, and the operation and maintenance cost is reduced. By automatically generating the codes, the code specification information is unified, the readability of the script is enhanced, and the operation and maintenance cost is reduced.
2) The research and development efficiency is improved, and the test data preparation work is reduced. By automatically generating different types of test cases, the requirement of rapid research and development is met, and by checking the data quality of the target table, the data inspection of grammar and basic semantic level is realized.
3) And improving the iteration quality of the code. The test case construction is realized by combining the requirements of test data, program scripts, check scripts, cleaning scripts and the like; by means of the platform, test cases are continuously accumulated, automatic development of various work such as subsequent regression test is guaranteed, and code quality is improved.
Example two
Based on the same inventive concept as the ETL software development, test and management method in the foregoing embodiment, the present invention further provides an ETL software development, test and management system, as shown in fig. 2, the system includes:
a first obtaining unit 11, wherein the first obtaining unit 11 is used for obtaining data table metadata information;
a second obtaining unit 12, where the second obtaining unit 12 is configured to obtain field information according to the data table metadata information;
a third obtaining unit 13, where the third obtaining unit 13 is configured to obtain test data according to the field information;
a fourth obtaining unit 14, where the fourth obtaining unit 14 is configured to obtain a data type according to the data information in the data table;
a fifth obtaining unit 15, where the fifth obtaining unit 15 is configured to obtain a check script according to the data type;
a sixth obtaining unit 16, where the sixth obtaining unit 16 is configured to obtain first entry information, where the first entry information includes source table information, target table information, and a field mapping relationship between the source table and the target table;
a seventh obtaining unit 17, where the seventh obtaining unit 17 is configured to obtain a first program script according to the first entry information;
an eighth obtaining unit 18, where the eighth obtaining unit 18 is configured to obtain first extracted data according to the first program script, where the first extracted data includes the source table information and the target table information;
a ninth obtaining unit 19, where the ninth obtaining unit 19 is configured to obtain first package information according to the test data, the check script, and the first extraction data;
a tenth obtaining unit 20, where the tenth obtaining unit 20 is configured to obtain test case information according to the first package information;
an eleventh obtaining unit 21, where the eleventh obtaining unit 21 is configured to obtain a regression testing case base according to the testing case information, and the regression testing case base supports continuous integration of programs.
Further, the system further comprises:
a twelfth obtaining unit, configured to obtain a preset rule;
a thirteenth obtaining unit configured to obtain a preset algorithm;
a fourteenth obtaining unit, configured to obtain various types of data corresponding to the fields according to the field information, the preset rule, and the preset algorithm;
a fifteenth obtaining unit, configured to obtain a date field according to the field information;
a sixteenth obtaining unit, configured to obtain the data date information according to the date field;
a seventeenth obtaining unit, configured to obtain the test data according to the various types of data and the data date information.
Further, the system further comprises:
an eighteenth obtaining unit, configured to obtain first test data, where the first test data is the test data generated by the first field;
a nineteenth obtaining unit, configured to obtain second test data, where the second test data is the test data generated by a second field, and the first field is different from the second field; wherein the types of data in the first test data are associated with the types of data in the second test data;
a twentieth obtaining unit, configured to obtain a test database according to the first test data and the second test data.
Further, the system further comprises:
a twenty-first obtaining unit, configured to obtain a first check script, where the first check script is generated at a first time;
a twenty-second obtaining unit, configured to obtain a second check script, where the second check script is the check script generated at a second time; obtaining an Nth checking script by analogy, wherein the Nth checking script is generated at the Nth time, and N is a natural number greater than 1;
a twenty-third obtaining unit, configured to obtain a check script set according to the first check script, the second check script, and up to the nth check script.
Further, the system further comprises:
a twenty-fourth obtaining unit, configured to obtain a check information list according to the check script;
a twenty-fifth obtaining unit, configured to obtain the information of the examiners according to the examination information list;
a twenty-sixth obtaining unit, configured to obtain a check date according to the information of the check staff;
and the twenty-seventh obtaining unit is used for obtaining a checking result according to the checking date and the checking personnel information.
Further, the system further comprises:
a twenty-eighth obtaining unit for obtaining a first test requirement;
a twenty-ninth obtaining unit, configured to obtain data to be tested from the test database according to the first test requirement;
a thirtieth obtaining unit for obtaining a first test date;
a thirty-first obtaining unit, configured to obtain first replacement information according to the first test date and data to be tested, where the first replacement information is used to replace a date in the data to be tested according to the first test date;
a thirty-second obtaining unit, configured to obtain first test information according to the to-be-tested data, where the first test information includes a first test result and the to-be-tested data;
a thirty-third obtaining unit, configured to obtain first recovery information according to the first test information, where the first recovery information is used to recover and store a date in the to-be-tested data in the test database.
Further, the system further comprises:
a thirty-fourth obtaining unit, configured to obtain the test case information according to the source table information and the target table information;
a thirty-fifth obtaining unit, configured to obtain a data table check script according to the test case information;
a thirty-sixth obtaining unit, configured to obtain a first binding instruction according to the source table information, the target table information, the test case information, and the data table check script, where the first binding instruction is used to bind the source table information and the target table information with the test case information and the data table check script;
a thirty-seventh obtaining unit, configured to obtain a test data date, a test data file, and a check script according to the test case information and the data table check script;
a thirty-eighth obtaining unit, configured to obtain a data test script according to the test data date, the test data file, and a check script, where the data test script is a repeatable execution shell-like script;
a thirty-ninth obtaining unit, configured to obtain, according to the data test script, a first execution instruction, where the first execution instruction is used to issue the data test script to a test environment for execution.
Various changes and specific examples of the ETL software development, test and management method in the first embodiment of fig. 1 are also applicable to the ETL software development, test and management system in the present embodiment, and through the detailed description of the ETL software development, test and management method, those skilled in the art can clearly know the implementation method of the ETL software development, test and management system in the present embodiment, so for the brevity of the description, detailed descriptions are omitted here.
Exemplary electronic device
The electronic device of the embodiment of the present application is described below with reference to fig. 3.
Fig. 3 illustrates a schematic structural diagram of an electronic device according to an embodiment of the present application.
Based on the inventive concept of the ETL software development, test and management method in the foregoing embodiments, the present invention further provides an ETL software development, test and management system, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps of any one of the methods of the ETL software development, test and management method described above.
Where in fig. 3 a bus architecture (represented by bus 300), bus 300 may include any number of interconnected buses and bridges, bus 300 linking together various circuits including one or more processors, represented by processor 302, and memory, represented by memory 304. The bus 300 may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface 306 provides an interface between the bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e., a transceiver, providing a means for communicating with various other systems over a transmission medium.
The processor 302 is responsible for managing the bus 300 and general processing, and the memory 304 may be used for storing data used by the processor 302 in performing operations.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
the embodiment of the application provides an ETL software research, development, test and management method and system, which are used for acquiring metadata information of a data table; acquiring field information according to the metadata information of the data table; obtaining test data according to the field information; obtaining the data type according to the data information of the data table; obtaining a checking script according to the data type; acquiring first input information, wherein the first input information comprises source table information, target table information and a field mapping relation between a source table and a target table; acquiring a first program script according to the first input information; obtaining first extraction data according to the first program script, wherein the first extraction data comprises the source table information and the target table information; obtaining first packing information according to the test data, the check script and the first extraction data; obtaining test case information according to the first packing information; and obtaining a regression testing case base according to the testing case information, wherein the regression testing case base supports continuous integration of programs. The integrated ETL test system has the advantages that a unified ETL research and development and test management platform is built, unified management of a data table, a batch processing script, test data, a test script, a quality checking script and a test suite is achieved, full-flow scripting of data loading, program running, data checking, data cleaning and the like is achieved by combining three elements of the test data, the program script and the checking script, and the technical effects of continuous accumulation and effective management of the test suite are achieved by accumulating the test script. Therefore, the technical problems that the test management platform has single function and can not meet various requirements of research and development tests in the prior art are solved.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a system for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. An ETL software development test management method, wherein the method comprises the following steps:
obtaining metadata information of a data table;
acquiring field information according to the metadata information of the data table;
obtaining test data according to the field information;
obtaining the data type according to the data information of the data table;
obtaining a checking script according to the data type;
acquiring first input information, wherein the first input information comprises source table information, target table information and a field mapping relation between a source table and a target table;
acquiring a first program script according to the first input information;
obtaining first extraction data according to the first program script, wherein the first extraction data comprises the source table information and the target table information;
obtaining first packing information according to the test data, the check script and the first extraction data;
obtaining test case information according to the first packing information;
and obtaining a regression testing case base according to the testing case information, wherein the regression testing case base supports continuous integration of programs.
2. The method of claim 1, wherein the obtaining test data according to the field information comprises:
obtaining a preset rule;
obtaining a preset algorithm;
obtaining various types of data corresponding to the fields according to the field information, the preset rule and the preset algorithm;
obtaining a date field according to the field information;
obtaining the date information of the data according to the date field;
and obtaining the test data according to the various types of data and the data date information.
3. The method of claim 2, wherein the method comprises:
obtaining first test data, wherein the first test data is the test data generated by a first field;
obtaining second test data, wherein the second test data is the test data generated by a second field, and the first field is different from the second field; wherein the types of data in the first test data are associated with the types of data in the second test data;
and obtaining a test database according to the first test data and the second test data.
4. The method of claim 1, wherein the obtaining a check script according to the data type comprises:
obtaining a first check script, wherein the first check script is generated at a first time;
obtaining a second check script, wherein the second check script is generated at a second time;
obtaining an Nth checking script by analogy, wherein the Nth checking script is generated at the Nth time, and N is a natural number greater than 1;
and obtaining a checking script set according to the first checking script, the second checking script and the Nth checking script.
5. The method of claim 1, wherein the obtaining a check script according to the data type comprises:
obtaining a checking information list according to the checking script;
obtaining the information of the checking personnel according to the checking information list;
obtaining a checking date according to the checking personnel information;
and obtaining a checking result according to the checking date and the checking personnel information.
6. The method of claim 3, wherein the method comprises:
obtaining a first test requirement;
obtaining data to be tested from the test database according to the first test requirement;
obtaining a first test date;
obtaining first replacement information according to the first test date and the data to be tested, wherein the first replacement information is used for replacing the date in the data to be tested according to the first test date;
obtaining first test information according to the data to be tested, wherein the first test information comprises a first test result and the data to be tested;
and acquiring first recovery information according to the first test information, wherein the first recovery information is used for recovering the date in the data to be tested and storing the date in the test database.
7. The method of claim 1, wherein the method comprises:
obtaining the test case information according to the source table information and the target table information;
acquiring a data table check script according to the test case information;
obtaining a first binding instruction according to the source table information, the target table information, the test case information and the data table check script, wherein the first binding instruction is used for binding the source table information and the target table information with the test case information and the data table check script;
obtaining test data date, test data files and check scripts according to the test case information and the data table check scripts;
obtaining a data test script according to the test data date, the test data file and the check script, wherein the data test script is a repeatable execution type shell script;
and obtaining a first execution instruction according to the data test script, wherein the first execution instruction is used for issuing the data test script to a test environment for execution.
8. An ETL software development test management system, wherein the system comprises:
a first obtaining unit configured to obtain data table metadata information;
a second obtaining unit, configured to obtain field information according to the data table metadata information;
a third obtaining unit, configured to obtain test data according to the field information;
a fourth obtaining unit, configured to obtain a data type according to the data information in the data table;
a fifth obtaining unit, configured to obtain a check script according to the data type;
a sixth obtaining unit, configured to obtain first entry information, where the first entry information includes source table information, target table information, and a field mapping relationship between the source table and the target table;
a seventh obtaining unit, configured to obtain a first program script according to the first entry information;
an eighth obtaining unit, configured to obtain first extracted data according to the first program script, where the first extracted data includes the source table information and the target table information;
a ninth obtaining unit, configured to obtain first package information according to the test data, the check script, and the first extraction data;
a tenth obtaining unit, configured to obtain test case information according to the first packing information;
an eleventh obtaining unit, configured to obtain a regression test case base according to the test case information, where the regression test case base supports continuous integration of programs.
9. An ETL software development test management system comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-7 when executing the program.
CN202110090036.3A 2021-01-22 2021-01-22 ETL software research and development test management method and system Active CN112685325B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110090036.3A CN112685325B (en) 2021-01-22 2021-01-22 ETL software research and development test management method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110090036.3A CN112685325B (en) 2021-01-22 2021-01-22 ETL software research and development test management method and system

Publications (2)

Publication Number Publication Date
CN112685325A true CN112685325A (en) 2021-04-20
CN112685325B CN112685325B (en) 2023-07-28

Family

ID=75458947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110090036.3A Active CN112685325B (en) 2021-01-22 2021-01-22 ETL software research and development test management method and system

Country Status (1)

Country Link
CN (1) CN112685325B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1601483A (en) * 2004-10-22 2005-03-30 中国工商银行 Automation software testing system based on script explanatory tool
CN105589874A (en) * 2014-10-22 2016-05-18 阿里巴巴集团控股有限公司 ETL task dependence relationship detecting method and device and ETL tool
CN109308258A (en) * 2018-08-21 2019-02-05 中国平安人寿保险股份有限公司 Building method, device, computer equipment and the storage medium of test data
CN109474488A (en) * 2018-10-31 2019-03-15 中国银行股份有限公司 Interface test method, device and computer equipment
CN109634846A (en) * 2018-11-16 2019-04-16 武汉达梦数据库有限公司 A kind of ETL method for testing software and device
CN110704475A (en) * 2019-09-29 2020-01-17 中国银行股份有限公司 Method and system for comparing ETL loading table structures
CN111597243A (en) * 2020-05-15 2020-08-28 中国工商银行股份有限公司 Data warehouse-based abstract data loading method and system
CN111930617A (en) * 2020-07-31 2020-11-13 中国工商银行股份有限公司 Automatic testing method and device based on data objectification

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1601483A (en) * 2004-10-22 2005-03-30 中国工商银行 Automation software testing system based on script explanatory tool
CN105589874A (en) * 2014-10-22 2016-05-18 阿里巴巴集团控股有限公司 ETL task dependence relationship detecting method and device and ETL tool
CN109308258A (en) * 2018-08-21 2019-02-05 中国平安人寿保险股份有限公司 Building method, device, computer equipment and the storage medium of test data
CN109474488A (en) * 2018-10-31 2019-03-15 中国银行股份有限公司 Interface test method, device and computer equipment
CN109634846A (en) * 2018-11-16 2019-04-16 武汉达梦数据库有限公司 A kind of ETL method for testing software and device
CN110704475A (en) * 2019-09-29 2020-01-17 中国银行股份有限公司 Method and system for comparing ETL loading table structures
CN111597243A (en) * 2020-05-15 2020-08-28 中国工商银行股份有限公司 Data warehouse-based abstract data loading method and system
CN111930617A (en) * 2020-07-31 2020-11-13 中国工商银行股份有限公司 Automatic testing method and device based on data objectification

Also Published As

Publication number Publication date
CN112685325B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
US9411712B2 (en) Generating test data
AU2014233672B2 (en) System for metadata management
US8141029B2 (en) Method and system for executing a data integration application using executable units that operate independently of each other
JP2020510925A (en) Method and apparatus for performing a test using a test case
US7418449B2 (en) System and method for efficient enrichment of business data
US20130041900A1 (en) Script Reuse and Duplicate Detection
Bhatele et al. Hatchet: Pruning the overgrowth in parallel profiles
US20140019941A1 (en) Data selection
CN112579586A (en) Data processing method, device, equipment and storage medium
CN110032594A (en) The data pick-up method, apparatus and storage medium of the Various database of customizable
US20220207039A1 (en) Query content-based data generation
Barberis et al. The ATLAS EventIndex: a BigData catalogue for all ATLAS experiment events
CN111382198A (en) Data recovery method, device, equipment and storage medium
CN112685325B (en) ETL software research and development test management method and system
CN114936154A (en) Test case and test data generation method and device
CN109101267B (en) Application release management method and device, electronic equipment and storage medium
CN112596806A (en) Data lake data loading script generation method and system
CN111666291A (en) Financial tool accounting processing method and device, electronic equipment and medium
CN109669668B (en) Method and device for realizing simulated transaction execution in system performance test
CN113590841B (en) Intelligent rapid examination and intelligent early warning system and method based on knowledge graph
CN112381228B (en) Knowledge system construction and application method based on business
Karpenko et al. ATLAS grid workload on NDGF resources: Analysis, modeling, and workload generation
Smith Joining and aggregating datasets using CouchDB
Tölli Risk Detector-testing advanced analytics to support audits
CN112598375A (en) Property inspection analysis method and device for intelligent park

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant