CN109634846B - ETL software testing method and device - Google Patents

ETL software testing method and device Download PDF

Info

Publication number
CN109634846B
CN109634846B CN201811366641.3A CN201811366641A CN109634846B CN 109634846 B CN109634846 B CN 109634846B CN 201811366641 A CN201811366641 A CN 201811366641A CN 109634846 B CN109634846 B CN 109634846B
Authority
CN
China
Prior art keywords
data
matching
result
test
test result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811366641.3A
Other languages
Chinese (zh)
Other versions
CN109634846A (en
Inventor
付博文
余院兰
冯源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Dream Database Co Ltd
Original Assignee
Wuhan Dream Database Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Dream Database Co Ltd filed Critical Wuhan Dream Database Co Ltd
Priority to CN201811366641.3A priority Critical patent/CN109634846B/en
Publication of CN109634846A publication Critical patent/CN109634846A/en
Application granted granted Critical
Publication of CN109634846B publication Critical patent/CN109634846B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3684Test management for test design, e.g. generating new test cases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3692Test management for test results analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the technical field of software testing, and provides an ETL software testing method and device, wherein the method comprises the following steps: importing source data into a preset test flow for processing, wherein the test flow comprises data reading, data interaction conversion and data loading; acquiring a processed test result, matching the test result with an expected result, and recording the matching result; feeding back the recorded matching result to a tester; the matching process specifically comprises the following steps: firstly, matching a test result with a data structure of an expected result; after the data structure is successfully matched, matching the line number of the test result with the line number of the expected result; and after the line number matching is successful, continuously matching each line of data in the test result. The invention completes the acquisition of data source, data conversion design and result matching verification in an automatic mode, realizes the automatic test of ETL software, adopts step-by-step matching from simple to difficult in matching verification, greatly saves the test operation time and improves the matching efficiency.

Description

ETL software testing method and device
[ technical field ] A method for producing a semiconductor device
The invention relates to the technical field of software testing, in particular to an ETL software testing method and device.
[ background of the invention ]
ETL (Extract-Transform-Load, data extraction, transformation, and loading) is a complete process of extracting data from an online transaction database, performing transformation processing, and then loading the data into a data warehouse, and is responsible for completing the transformation of data from a data source to a target data warehouse, which is an important step in implementing a data warehouse.
In the ETL test process, the data after the test is finished is matched with an expected result to verify the result, and the ETL test process can be determined to be successful only if the data is completely matched; when the matching fails, the testing process is proved to have problems, and the testing process needs to be adjusted and improved. However, the volume of ETL test data is usually very large, and if the whole data is matched one by one, the test process is very complex, the task volume is heavy, the matching efficiency is low, and the running time of the test process is long; moreover, when adjustment is performed, the corresponding differentiated data when matching fails is not effectively utilized, so that adjustment of the test process lacks pertinence and pertinence, and time is consumed.
In view of the above, it is an urgent problem in the art to overcome the above-mentioned drawbacks of the prior art.
[ summary of the invention ]
The technical problems to be solved by the invention are as follows:
in the ETL test process, the data volume is usually huge, and if the whole data is matched one by one, the task volume is heavy, the matching efficiency is low, and the running time of the test process is long; moreover, when the adjustment improvement is carried out, the adjustment of the test process is lack of pertinence and purpose and is time-consuming.
The invention achieves the above purpose by the following technical scheme:
in a first aspect, the present invention provides an ETL software testing method, for source data, generating expected results in advance, the method comprising:
importing source data into a preset test flow for processing; the test process comprises data reading, data interaction conversion and data loading;
acquiring a processed test result, matching the test result with an expected result step by step, and recording the matching result;
feeding back the recorded matching result to a tester;
the test result and the expected result are both composed of two-dimensional tables, and the step-by-step matching of the test result and the expected result is specifically as follows: matching the data structure of the test result with the data structure of the expected result; and after the data structure is successfully matched, matching the line number of the test result with the line number of the expected result, and after the line number is successfully matched, continuously matching each line of data in the test result with the corresponding line of data in the expected result line by line.
Preferably, the matching the data structure of the test result and the data structure of the expected result specifically includes: matching the column number of the test result with the column number of the expected result; and after the column number is successfully matched, continuously matching the data definition of each column of the test result with the data definition of each column of the expected result.
Preferably, when the row matching between the test result and the expected result is successful, the method further comprises: and counting the total data volume of the test result, matching the total data volume of the test result with the total data volume of the expected result, and continuously matching each line of data in the test result with the corresponding line of data in the expected result after the total data volume is successfully matched.
Preferably, the matching of each row of data in the test result with the corresponding row of data in the expected result specifically includes:
counting the data size of each row in the test result;
matching the data size of each line in the test result with the data size of the corresponding line in the expected result line by line;
and when the data volume of each row is successfully matched, continuously matching the specific data of each row in the test result with the specific data of the corresponding row in the expected result row by row.
Preferably, the matching of each row of data in the test result with the corresponding row of data in the expected result specifically includes:
counting the data volume of each line in the test result, and arranging the lines in the two-dimensional table according to the sequence of the data volume from small to large;
matching the data size of each line in the test result with the data size of the corresponding line in the expected result line by line;
and when the data volume of each row is successfully matched, sequentially matching the specific data of each row in the test result with the specific data of the corresponding row in the expected result line by line according to the arrangement sequence of the rows.
Preferably, when there is a data structure mismatch between the test result and the expected result, or there is a row number mismatch, or there is a data mismatch of any row between the test result and the expected result, the method further includes:
outputting differential data between the test result and the expected result according to a data matching result;
importing the source data into a preset test flow for processing again, and recording intermediate data generated in each link in the processing process;
and respectively matching the differential data with intermediate data generated in each link, further determining the links of the differential data in the test flow, and generating a test report to feed back to a tester.
Preferably, in the process of matching each line of data in the test result with the corresponding line of data in the expected result, when any line of data in the test result is not matched with the data in the corresponding line of the expected result for the first time, stopping data matching; or stopping data matching when the proportion of the differentiated data between the test result and the expected result reaches a preset threshold value.
Preferably, the method for acquiring the source data specifically includes:
connecting a data source of software to be detected, acquiring relevant information of a data source table by reading a system table of the data source, and writing the relevant information into an ETL system table; wherein the related information comprises one or more items of a data structure, a field type, and primary key information.
Preferably, the importing the source data into the preset test flow for processing specifically includes:
creating a test flow of data synchronization, adding a data reading component, a data cleaning and converting component and a data loading component in the test flow, and setting a source table needing synchronization; the source table is used for storing source data needing synchronization;
carrying out different conversion designs on the test flow according to the test function, wherein the conversion designs comprise one or more of incremental data synchronous design, data filtering design and data cleaning conversion design;
and through the added data components, the source data is subjected to data synchronization processing according to the designed test flow.
Preferably, after the recorded matching result is fed back to a tester, the method further comprises writing and executing an ant script; the ant script is specifically written as follows:
compiling a script of preset conditions of the test flow;
calling codes compiled when the source data are imported into the test flow and the test result is obtained for matching, and completing the test flow and data matching;
calling a code compiled when the matching result is fed back, wherein the code is used for finishing the feedback of the matching result;
compiling a recovery script for recovering the source data and the expected result for the test to an initial state;
the execution of the ant script is specifically as follows:
and executing the ant script according to a preset period, and further completing the periodic ETL software test.
In a second aspect, the present invention further provides an ETL software testing apparatus, including at least one processor and a memory, where the at least one processor and the memory are connected through a data bus, and the memory stores instructions executable by the at least one processor, and the instructions are used to complete the ETL software testing method according to the first aspect after being executed by the processor.
Compared with the prior art, the invention has the beneficial effects that:
the ETL software testing method provided by the invention can realize the automatic testing of the ETL software, the processes of data source acquisition, data interaction conversion, result matching inspection and the like are completed in an automatic mode, and the simple and difficult step-by-step matching can be carried out during result matching verification, so that the testing operation time is greatly saved, and the matching efficiency is improved. Meanwhile, unmatched differential data are effectively utilized, and a link causing matching failure can be determined by matching the differential data with intermediate data, so that a tester can quickly and accurately lock problems in testing, and can make adjustment in a targeted manner, and the debugging efficiency is improved.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a flowchart of an ETL software testing method according to an embodiment of the present invention;
FIG. 2 is a two-dimensional representation of test results and expected results provided by an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a step-by-step matching of test results with expected results according to an embodiment of the present invention;
FIG. 4 is a flow chart of one method of implementing the second step of data matching of FIG. 3;
FIG. 5 is a flow chart of another method of implementing the second step of data matching in FIG. 3;
FIG. 6 is a flowchart illustrating an embodiment of step 10 of FIG. 1;
fig. 7 is an architecture diagram of an ETL software testing apparatus according to an embodiment of the present invention.
[ detailed description ] embodiments
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other. The invention will be described in detail below with reference to the figures and examples.
Example 1:
the embodiment of the invention provides an ETL software testing method, as shown in FIG. 1, the method specifically comprises the following steps:
step 10, importing source data into a preset test flow for processing; the test process comprises data reading, data interaction conversion and data loading. The testing process is customized by a research and development staff according to specific scene requirements and/or customer requirements, the invention tests the customized and differentiated ETL software, takes the DMETL of the dream database as an example, and specifically calls an API (Application Programming Interface) of the DMETL through java codes to pre-create and set the testing process. Before the step 10 is executed, source data is also acquired in an automated manner, specifically: connecting a data source of software to be detected through jdbc, acquiring relevant information of a source table by reading a system table of the data source, writing the relevant information into a system table of an ETL (extract transform load), and acquiring source data for testing through the relevant information; wherein the related information comprises one or more items of a data structure, a field type, and primary key information. In the embodiment of the present invention, the method for testing ETL software is described by taking DMETL in dameng database as an example, but the present invention is not limited thereto.
And 20, acquiring the processed test result, matching the test result with an expected result, and recording the matching result.
Wherein, for the source data, an expected result needs to be generated in advance, and the expected result is used as a matching standard, and the test result and the expected result in the embodiment of the present invention are both composed of a two-dimensional table, as shown in fig. 2: the columns represent attribute items, which are predetermined, such as "name", "sex", "age", "telephone", "address" and the like in fig. 2, and the rows represent result items, which correspond to specific data under the attribute items. In this step, the processed test result (including the data structure and the data set result) can be obtained by writing the java code, and the expected result is read at the same time, so that the test result is matched with the expected result, and the matching result can be written into result. Here, both the results of successful matching and failed matching can be recorded, and usually, only the result of failed matching can be recorded for later reference.
In making the data match, the test flow is deemed successful if and only if the test result matches the expected result. Assuming that the tester stops matching when the tester is currently set to find that matching fails, in order to improve matching efficiency, the matching process may specifically include three steps, with reference to fig. 3: firstly, matching a data structure of the test result with a data structure of the expected result; secondly, matching the line number of the test result with the line number of the expected result; and thirdly, matching each line of data in the test result with the corresponding line of data in the expected result. The matching of the data structure is specifically as follows: matching the column number of the test result with the column number of the expected result; after the column number matching is successful, the data definition of each column of the test result is continuously matched with the data definition of each column of the expected result, taking fig. 3 as an example, that is, whether each column is respectively defined by data such as name, gender, age, and the like is compared. Because the matching of the data structure and the matching of the row number are much simpler and the matching speed is much faster than the matching of the specific data of each row, the second step can be carried out only after the first step is successfully matched; if the matching fails in the first step, the testing process is proved to have problems, the second step of matching is not needed, and the matching process is directly stopped; similarly, if the first matching is successful and the second matching is failed, the matching process can be directly stopped without performing the third matching. Therefore, the matching efficiency can be greatly improved and the test running time can be saved by the simple step-by-step matching method.
And step 30, feeding back the recorded matching result to a tester. In the step, reading of the recorded result in result.xml can be realized by writing java specifically, and then the function of writing the mail to the tester by calling the mail server interface is called, so that the tester can obtain the recorded result of the matching failure, and further the problem can be checked, the reason can be found out, and the test flow can be adjusted.
The ETL software testing method provided by the invention can realize the automatic testing of the ETL software, and the processes of data acquisition, data interaction conversion, result matching inspection and the like are completed in an automatic mode; when the result matching is verified, the step matching from simple to difficult can be carried out, the matching process can be stopped when the first step matching fails, further matching is not needed, the test running time is greatly saved, and the matching efficiency is improved.
After the step 30, an ant script needs to be written and executed, so that the steps 10 to 30 are effectively connected in series, wherein the writing of the ant script specifically comprises: 1) compiling a script of preset conditions of the test flow; 2) calling codes compiled when the source data are imported into the test flow and the test result is obtained for matching, and completing the test flow and data matching; calling the entry of the code written in the steps 10 and 20 to run the program of the automatic test; 3) calling a code compiled when the matching result is fed back, wherein the code is used for finishing the feedback of the matching result; calling the main program of the program written in the step 30 to complete the acquisition of the matching result by the tester; 4) and writing a recovery script for recovering the source data and the expected result for the test to the initial state. The execution of the ant script is specifically as follows: and executing the ant script according to a preset period, and further completing the periodic ETL software test. For example, ETL software testing needs to be performed once a day, and herein an ant script can be executed once a day in a timed manner using a timer and a bat script, so as to automatically perform ETL testing once a day.
With reference to the embodiment of the present invention, there is also a preferred implementation scheme, in step 20, after the line number matching in the first step is performed and the line number matching is successful, before performing the second step matching, the method further includes: and analyzing and counting the total data volume of the test result, matching the total data volume of the test result with the total data volume of the expected result, and continuing to match each line of data in the test result with the corresponding line of data in the expected result after the total data volume is successfully matched, namely executing the matching of the second step. When the matching of the total data volume fails, the testing process is proved to have problems, and the process of matching each row of data line by line in the second step is not needed. The matching of the total data size is also fast, so that the matching efficiency can be improved to a certain extent by increasing the matching of the total data size before the second step of matching, and the test running time is saved.
With reference to fig. 4, a preferred implementation scheme also exists in combination with the embodiment of the present invention, and for the third step matching process in step 20, the method specifically includes the following steps:
step 201, counting the data size of each line in the test result. Wherein the data size of each line in the expected result is counted in advance.
Step 202, matching the data size of each line in the test result with the data size of the corresponding line in the expected result line by line.
And 203, continuing to match the specific data of each row in the test result with the specific data of the corresponding row in the expected result row by row after the data amount of each row is successfully matched. Compared with the matching of each specific data, the matching of the data volume of each row is simple, the matching speed is high, if the data volume of any row is not matched, the matching process can be directly stopped, the specific data of each row does not need to be matched, the matching efficiency can be improved to a certain extent, and the test running time is saved.
In a more preferred embodiment, referring to fig. 5, for the third matching process in step 20, the following steps may be specifically included:
step 201', counting the data size of each line in the test result, and arranging the lines in the two-dimensional table according to the sequence of the data size from small to large. Compared with the step 201, steps of sorting the rows according to the data volume size are added, for example, the data volume sizes of the first row to the fifth row are respectively 10M, 12M, 13M, 15M and 20M, and are arranged from small to large; the expected results can also be arranged in advance according to the sequence of the line data from small to large, and then correspond to the test results line by line.
Step 202', the data size of each line in the test result is matched with the data size of the corresponding line in the expected result line by line. Taking the five-row data as an example, the sequence formed by the data size of each row of the test result is as follows: 10. 12, 13, 15, 20, only if the sequence formed by the data size of each row of the expected result is also the above-mentioned number sequence, the data size of each row is successfully matched, and if any value is not matched, the matching is failed.
And 203', when the data volumes of the rows are successfully matched, sequentially matching the specific data of each row in the test result with the specific data of the corresponding row in the expected result line by line according to the arrangement sequence of the rows. Still taking the above five-row data as an example, after the data volumes of the rows are successfully matched, the first row with the smallest data volume is matched in sequence, and then the second row, the third row, the fourth row and the fifth row are matched in sequence, wherein the smaller the data volume, the smaller the byte occupied by the row of data is, the faster the matching speed is. Therefore, according to the matching from small to large, the data matching of the rows as many as possible can be completed in the same time, and the unmatched rows can be found conveniently in time, so that the matching speed is increased, and the test running time is saved.
With reference to the embodiment of the present invention, there is also a preferred implementation scheme, when any one of the three matching steps in step 20 fails, that is, when the data structure between the test result and the expected result is not matched, or the number of rows is not matched, or the data in any row is not matched, the problem of matching failure can be found by using the unmatched differentiated data, and the specific method is as follows:
firstly, according to a data matching result, outputting differentiated data between the test result and the expected result, and further feeding the differentiated data back to a tester.
Then, the source data is imported into the preset test flow again for processing, that is, step 20 is executed again, except that intermediate data generated in each link in the retest process is recorded, for example, data after data reading, data after data filtering, and data after different data cleansing or data conversion, which are intermediate data to be recorded.
And finally, matching the differentiated data with intermediate data generated in each link respectively, further determining the links of the differentiated data in the test flow, and generating a test report to feed back to a tester. For example, when the differential data is successfully matched with the data after the data cleaning, the differential data appears in the data cleaning process, which proves that the data cleaning design has problems, and by generating a corresponding test report and feeding the test report back to a tester, the tester can know the error link in time, so that the data cleaning process can be adjusted in a targeted manner, and the whole test flow can be prevented from being adjusted blindly.
In the method, the link causing the matching failure can be determined by utilizing the differentiated data, and the link is effectively fed back to the tester, so that the tester can quickly and accurately lock the problem in the test flow, and further can make targeted adjustment, and the debugging efficiency is improved.
In step 20, when the line-by-line matching in the third step is performed, stopping data matching when a situation that any line of data in the test result is not matched with data in a corresponding line of the expected result occurs for the first time, and recording a corresponding matching failure result; or stopping data matching when the ratio of the differentiated data between the test result and the expected result reaches a preset threshold value, and recording a corresponding matching failure result; the preset threshold value can be adjusted by a tester according to actual needs. For example, when the requirement on the test precision is not high and a slight error is allowed in the test process, the preset threshold value may be set to 2%, the matching may continue as long as the differentiated data is controlled within 2%, and if the differentiated data exceeds 2%, the matching is considered to be failed, and the test process may be ended. For another example, the tester may set that the matching process is continued after the matching fails, and the matching is ended when a preset threshold is reached, and the data of the matching failure is fed back to the tester so that the tester can analyze the data.
Referring to fig. 6, the step 10 is further described in detail, and specifically includes the following steps:
step 101, creating a test flow of data synchronization, adding a data reading component, a data cleaning and converting component and a data loading component in the test flow, and setting a source table needing synchronization; wherein, the source table is used for storing the source data needing synchronization. The data reading assembly is used for extracting data from a data source to a source table, the data cleaning and converting assembly is used for cleaning and converting the data, the data loading assembly is used for loading the data to a target table, and each data assembly is a functional assembly of the DMETL and can be directly called.
Before a test flow is created, a server connected with the DMETL is usually required to be tested, and if the DMETL cannot be connected, abnormal information is thrown out; then, a corresponding project and a corresponding conversion flow are created, for example, a project named as an automatic test project can be created, and a conversion flow named as test data synchronization is created under the project; adding a data reading component, a data cleaning conversion component and a data loading component of the DMETL in a test data synchronization conversion process; and setting a source table to be synchronized according to the relevant information written into the ETL system table, and setting the size of the sub-batch custom cache for the source data to be synchronized in the source table preferably, thereby improving the synchronization efficiency. For example, if the data size to be synchronized is 1G, it can be set to complete 4 times, 256M for each synchronization, which is faster than the 1G data at a time.
And 102, carrying out different conversion designs on the test flow according to the test function, wherein the conversion designs comprise one or more of incremental data synchronous design, data filtering design and data cleaning conversion design.
According to different incremental data acquisition modes, the incremental data synchronization design further comprises a data synchronization design of a trigger increment, an MD5 increment, a shadow table increment and an increment comparison component, and the trigger increment is designed as follows: creating a trigger on a data source to capture incremental change data and operation, and recording the incremental change data and operation into a system table of the DMETL so as to synchronize the incremental data; the MD5 increment is designed as follows: calculating the MD5 value of each row of data, recording the MD5 value in an MD5 table created by the DMETL, obtaining incremental data and operation through primary key matching, and recording the incremental data and the operation in a system table of the DMETL, thereby synchronizing the incremental data; the shadow table increment is designed as follows: copying the data of the source table to a shadow table created by the DMETL, obtaining incremental data and operation through primary key matching, and recording the incremental data and the operation in a system table of the DMETL, so as to synchronize the incremental data; the design of the increment comparison component is as follows: and sequencing the source table and the target table in a database layer, comparing the conditions such as a unique matching example and a comparison column by configuring, obtaining incremental data and operation, converting the incremental data into sql by a code, sending the sql to a target database, and completing the synchronization of the incremental data.
The data filtering is designed as follows: if condition judgment can be adopted, the source data in the source table is filtered according to the filtering condition, and only the data meeting the filtering condition is synchronized to the target table.
The data cleansing conversion is divided into three categories, wherein for field deletion, merging and splitting: the data cleaning conversion component of the DMETL can be used for setting, and deleting, merging or splitting fields according to conditions such as positions or separators; for field content cleaning: the field content can be cleaned through a java function; for date time string formatting: and setting through a data cleaning conversion component of the DMETL, calling a format () method of date conversion, and throwing abnormal information if an error occurs, thereby formatting the date-time character string.
And 103, performing data synchronization processing on the source data according to the designed test flow through the added data components. According to the flow designed in step 102, the synchronization of the source data from the source table to the target table is completed through data reading, data exchange conversion and data loading, and then the corresponding synchronous test flow is completed.
In conclusion, the ETL software testing method provided by the invention can realize the automatic testing of the ETL software, including the acquisition of data sources, the data interaction conversion, the result matching inspection and the like, and can carry out the step-by-step matching from simple to difficult during the result matching verification, thereby greatly saving the testing running time and improving the matching efficiency. Meanwhile, unmatched differential data can be effectively utilized and timely fed back to a tester, and a link leading to matching failure can be determined by matching the differential data with intermediate data, so that the tester can quickly and accurately lock problems in a testing process, and can make adjustment in a targeted manner, and the debugging efficiency is improved.
Example 2:
on the basis of the ETL software testing method provided in embodiment 1, the present invention further provides an ETL software testing apparatus for implementing the method, as shown in fig. 7, which is a schematic diagram of an apparatus architecture in an embodiment of the present invention. The ETL software testing apparatus of the present embodiment includes one or more processors 21 and a memory 22. In fig. 7, one processor 21 is taken as an example.
The processor 21 and the memory 22 may be connected by a bus or other means, and fig. 7 illustrates the connection by a bus as an example.
The memory 22, which is a non-volatile computer-readable storage medium for the ETL software testing method, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the ETL software testing method in embodiment 1. The processor 21 executes various functional applications and data processing of the ETL software testing apparatus by running the nonvolatile software programs, instructions and modules stored in the memory 22, that is, implements the ETL software testing method of embodiment 1.
The memory 22 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22, and when executed by the one or more processors 21, perform the ETL software testing method in the above embodiment 1, for example, perform the steps shown in fig. 1, 3-6 described above.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by associated hardware as instructed by a program, which may be stored on a computer-readable storage medium, which may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1. An ETL software testing method, characterized in that, for source data, expected results are generated in advance, the method comprises:
importing source data into a preset test flow for processing; the test process comprises data reading, data interaction conversion and data loading;
acquiring a processed test result, matching the test result with an expected result step by step, and recording the matching result;
feeding back the recorded matching result to a tester;
the test result and the expected result are both composed of two-dimensional tables, and the step-by-step matching of the test result and the expected result is specifically as follows: matching the data structure of the test result with the data structure of the expected result; after the data structure is successfully matched, matching the line number of the test result with the line number of the expected result; after the line number matching is successful, continuously matching each line of data in the test result with the corresponding line of data in the expected result line by line;
wherein the matching of the data structure of the test result and the data structure of the expected result specifically comprises: matching the column number of the test result with the column number of the expected result; and after the column number is successfully matched, continuously matching the data definition of each column of the test result with the data definition of each column of the expected result.
2. The ETL software testing method according to claim 1, wherein when the row number matching between said test result and said expected result is successful, said method further comprises: and counting the total data volume of the test result, matching the total data volume of the test result with the total data volume of the expected result, and continuously matching each line of data in the test result with the corresponding line of data in the expected result after the total data volume is successfully matched.
3. The ETL software testing method according to claim 1 or 2, wherein the matching of each row of data in the test result with the corresponding row of data in the expected result is specifically:
counting the data size of each row in the test result;
matching the data size of each line in the test result with the data size of the corresponding line in the expected result line by line;
and when the data volume of each row is successfully matched, continuously matching the specific data of each row in the test result with the specific data of the corresponding row in the expected result row by row.
4. The ETL software testing method according to claim 1 or 2, wherein the matching of each row of data in the test result with the corresponding row of data in the expected result is specifically:
counting the data volume of each line in the test result, and arranging the lines in the two-dimensional table according to the sequence of the data volume from small to large;
matching the data size of each line in the test result with the data size of the corresponding line in the expected result line by line;
and when the data volume of each row is successfully matched, sequentially matching the specific data of each row in the test result with the specific data of the corresponding row in the expected result line by line according to the arrangement sequence of the rows.
5. The ETL software testing method according to claim 1, wherein when there is a data structure mismatch between said test result and said expected result, or a row number mismatch, or there is any row data mismatch between said test result and said expected result, said method further comprises:
outputting differential data between the test result and the expected result according to a data matching result;
importing the source data into a preset test flow for processing again, and recording intermediate data generated in each link in the processing process;
and respectively matching the differential data with intermediate data generated in each link, further determining the links of the differential data in the test flow, and generating a test report to feed back to a tester.
6. The ETL software testing method of claim 1, wherein in the process of matching each row of data in the test result with the corresponding row of data in the expected result, when any row of data in the test result is not matched with the data of the corresponding row in the expected result for the first time, data matching is stopped; or stopping data matching when the proportion of the differentiated data between the test result and the expected result reaches a preset threshold value.
7. The ETL software testing method of claim 1, wherein the importing the source data into a preset testing process for processing specifically comprises:
creating a test flow of data synchronization, adding a data reading component, a data cleaning and converting component and a data loading component in the test flow, and setting a source table needing synchronization; the source table is used for storing source data needing synchronization;
carrying out different conversion designs on the test flow according to the test function, wherein the conversion designs comprise one or more of incremental data synchronous design, data filtering design and data cleaning conversion design;
and through the added data components, the source data is subjected to data synchronization processing according to the designed test flow.
8. The ETL software testing method according to claim 1, wherein after said feeding back the recorded matching results to a tester, said method further comprises the writing and execution of an ant script; the ant script is specifically written as follows:
compiling a script of preset conditions of the test flow;
calling codes compiled when the source data are imported into the test flow and the test result is obtained for matching, and completing the test flow and data matching;
calling a code compiled when the matching result is fed back, wherein the code is used for finishing the feedback of the matching result;
compiling a recovery script for recovering the source data and the expected result for the test to an initial state;
the execution of the ant script is specifically as follows:
and executing the ant script according to a preset period, and further completing the periodic ETL software test.
9. An ETL software testing apparatus, comprising at least one processor and a memory, wherein the at least one processor and the memory are connected through a data bus, and the memory stores instructions executable by the at least one processor, and the instructions are used to complete the ETL software testing method according to any one of claims 1 to 8 after being executed by the processor.
CN201811366641.3A 2018-11-16 2018-11-16 ETL software testing method and device Active CN109634846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811366641.3A CN109634846B (en) 2018-11-16 2018-11-16 ETL software testing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811366641.3A CN109634846B (en) 2018-11-16 2018-11-16 ETL software testing method and device

Publications (2)

Publication Number Publication Date
CN109634846A CN109634846A (en) 2019-04-16
CN109634846B true CN109634846B (en) 2021-10-19

Family

ID=66068184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811366641.3A Active CN109634846B (en) 2018-11-16 2018-11-16 ETL software testing method and device

Country Status (1)

Country Link
CN (1) CN109634846B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781070B (en) * 2019-09-06 2024-08-23 平安科技(深圳)有限公司 Big data test verification method and device, computer equipment and storage medium
CN114448869A (en) * 2020-10-19 2022-05-06 南京中兴新软件有限责任公司 Data table synchronization method and device, data exchange equipment and storage medium
CN112597221B (en) * 2020-12-17 2023-04-11 四川新网银行股份有限公司 Test environment data extraction optimization execution method based on cross section data
CN112685325B (en) * 2021-01-22 2023-07-28 中信银行股份有限公司 ETL software research and development test management method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105653445A (en) * 2015-12-25 2016-06-08 中电科航空电子有限公司 Implementation method capable of meeting DO-178C test result
CN106293977A (en) * 2015-05-15 2017-01-04 阿里巴巴集团控股有限公司 A kind of data verification method and equipment
CN106815100A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Interface test method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10042903B2 (en) * 2015-09-02 2018-08-07 International Business Machines Corporation Automating extract, transform, and load job testing

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106293977A (en) * 2015-05-15 2017-01-04 阿里巴巴集团控股有限公司 A kind of data verification method and equipment
CN106815100A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Interface test method and device
CN105653445A (en) * 2015-12-25 2016-06-08 中电科航空电子有限公司 Implementation method capable of meeting DO-178C test result

Also Published As

Publication number Publication date
CN109634846A (en) 2019-04-16

Similar Documents

Publication Publication Date Title
CN109634846B (en) ETL software testing method and device
CN108829584B (en) Service logic interface mock test method and system
CN102831052B (en) Test exemple automation generating apparatus and method
CN110109897A (en) Database script generation method, device, computer equipment and storage medium
CN112286941B (en) Big data synchronization method and device based on Binlog + HBase + Hive
CN104376082A (en) Method for importing data in data source file to database
CN109063005B (en) Data migration method and system, storage medium and electronic device
CN112948473A (en) Data processing method, device and system of data warehouse and storage medium
US11416512B2 (en) Systems and methods for facilitating data transformation
CN111382198B (en) Data recovery method, device, equipment and storage medium
CN111159020B (en) Method and device applied to synchronous software test
CN111125067B (en) Data maintenance method and device
JP2007226780A (en) Method and system for derivation of missing data objects from test data
CN109697161A (en) A kind of test method of storing process, storage medium and database server
CN110633462A (en) Excel two-dimensional table importing method
CN113886590A (en) Data summarizing method and device, computer equipment and storage medium
CN107168883B (en) API data automatic testing method
CN116126846A (en) Data quality detection method, device and medium
CN114691519A (en) Interface automation test method, device, equipment and storage medium
CN114356454A (en) Account checking data processing method, account checking data processing device, account checking data storage medium and program product
CN112612773A (en) Database synchronization test method and device, computer equipment and storage medium
CN110147355B (en) Data synchronization method, device and server
CN116775767A (en) Data synchronization method, device, electronic equipment and storage medium
CN116107889A (en) Regression testing method and device for application program, computer equipment and storage medium
CN117743298A (en) Data tracing method and system based on snapshot

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant after: Wuhan dream database Co., Ltd

Address before: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant before: WUHAN DAMENG DATABASE Co.,Ltd.

CB02 Change of applicant information
CB03 Change of inventor or designer information

Inventor after: Fu Bowen

Inventor after: Yu Yuanlan

Inventor after: Feng Yuan

Inventor before: Fu Bowen

Inventor before: Yu Yuanlan

Inventor before: Feng Yuan

Inventor before: Fu Quan

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant