[summary of the invention]
The technical problem to be solved by the invention is as follows:
In ETL testing the data volume is usually very large. If all of the data is matched item by item, the workload is heavy, matching efficiency is low, and the test takes a long time to run. Moreover, adjustments and improvements to the test process lack focus and purpose, which is also time-consuming.
The invention achieves the above object through the following technical solutions:
In a first aspect, the invention provides an ETL software testing method in which an expected result is generated in advance for the source data. The method comprises:
importing the source data into a preset test flow for processing, wherein the test flow comprises data reading, data interaction and conversion, and data loading;
obtaining the processed test result, matching the test result against the expected result step by step, and recording the matching result;
feeding the recorded matching result back to the tester;
wherein the test result and the expected result each consist of a two-dimensional table, and matching the test result against the expected result step by step specifically comprises: matching the data structure of the test result against the data structure of the expected result; after the data structures match, matching the row count of the test result against the row count of the expected result; and, after the row counts match, matching each row of data in the test result against the corresponding row of data in the expected result, row by row.
Preferably, matching the data structure of the test result against the data structure of the expected result specifically comprises: matching the column count of the test result against the column count of the expected result; and, after the column counts match, matching the data definition of each column of the test result against the data definition of the corresponding column of the expected result.
Preferably, after the row counts of the test result and the expected result match, the method further comprises: computing the total data volume of the test result and matching it against the total data volume of the expected result; and, after the total data volumes match, proceeding to match each row of data in the test result against the corresponding row of data in the expected result.
Preferably, matching each row of data in the test result against the corresponding row of data in the expected result specifically comprises:
computing the data volume of every row in the test result;
matching, row by row, the data volume of each row in the test result against the data volume of the corresponding row in the expected result;
after the data volumes of all rows match, proceeding to match, row by row, the actual data of each row in the test result against the actual data of the corresponding row in the expected result.
Preferably, matching each row of data in the test result against the corresponding row of data in the expected result specifically comprises:
computing the data volume of every row in the test result, and arranging the rows of the test result in the two-dimensional table in ascending order of data volume;
matching, row by row, the data volume of each row in the test result against the data volume of the corresponding row in the expected result;
after the data volumes of all rows match, matching, row by row and in the sorted order, the actual data of each row in the test result against the actual data of the corresponding row in the expected result.
Preferably, when the data structures or the row counts of the test result and the expected result do not match, or when the data of any row differs between the test result and the expected result, the method further comprises:
outputting, according to the data matching result, the difference data between the test result and the expected result;
importing the source data into the preset test flow again for processing, and recording the intermediate data generated by each stage of the processing;
matching the difference data against the intermediate data generated by each stage, thereby determining the stage of the test flow in which the difference data appears, and generating a test report that is fed back to the tester.
Preferably, while each row of data in the test result is being matched against the corresponding row of data in the expected result, data matching stops as soon as any row of data in the test result fails to match the corresponding row in the expected result; alternatively, data matching stops when the proportion of difference data between the test result and the expected result reaches a preset threshold.
Preferably, the source data is obtained as follows:
connecting to the data source of the software under test, obtaining the relevant information of the source tables by reading the system tables of the data source, and writing the relevant information into the system tables of the ETL; wherein the relevant information includes one or more of data structure, field type, and primary key information.
Preferably, importing the source data into the preset test flow for processing specifically comprises:
creating a test flow for data synchronization, adding a data reading component, a data cleansing and conversion component, and a data loading component to the test flow, and configuring the source table to be synchronized; wherein the source table stores the source data to be synchronized;
applying different conversion designs to the test flow according to the function under test, the conversion designs including one or more of an incremental data synchronization design, a data filtering design, and a data cleansing and conversion design;
using the added data components to synchronize the source data according to the designed test flow.
Preferably, after the recorded matching result is fed back to the tester, the method further comprises writing and executing an Ant script. Writing the Ant script specifically comprises:
writing a script that sets up the prerequisites of the test flow;
calling the code written for importing the source data into the test flow and for obtaining and matching the test result, so as to complete the test flow and the data matching;
calling the code written for feeding back the matching result, so as to complete the feedback of the matching result;
writing a recovery script that restores the source data and the expected result used in the test to their initial state.
Executing the Ant script specifically comprises:
executing the Ant script at a preset interval, thereby running the ETL software test periodically.
In a second aspect, the invention further provides an ETL software testing device comprising at least one processor and a memory connected to the at least one processor through a data bus. The memory stores instructions executable by the at least one processor, and the instructions, when executed by the processor, carry out the ETL software testing method of the first aspect.
Compared with the prior art, the beneficial effects of the invention are as follows:
The ETL software testing method provided by the invention automates the testing of ETL software: acquisition of the data source, data interaction and conversion, and result matching and checking are all completed automatically. When results are matched and verified, the matching proceeds step by step from simple to difficult, which greatly shortens the test run time and improves matching efficiency. The unmatched difference data is also put to use: by matching the difference data against the intermediate data, the stage that caused the match to fail can be determined, which helps the tester locate the problem in the test quickly and accurately and then make targeted adjustments, improving debugging efficiency.
Embodiment 1:
An embodiment of the invention provides an ETL software testing method. As shown in Fig. 1, the method specifically comprises the following steps:
Step 10: import the source data into the preset test flow for processing, where the test flow comprises data reading, data interaction and conversion, and data loading. The test flow is customized by developers for a specific scenario and/or customer requirement, and the invention tests exactly this kind of customized, differentiated ETL software. Taking DMETL of the Dameng database as an example, the test flow can be created in advance by Java code that calls the DMETL API (Application Programming Interface). Before step 10 is executed, the source data is also obtained automatically, specifically: connect to the data source of the software under test through JDBC, obtain the relevant information of the source tables by reading the system tables of the data source, and write that information into the system tables of the ETL; the source data for the test can then be obtained from this information. The relevant information includes one or more of data structure, field type, and primary key information. The embodiments of the invention use DMETL of the Dameng database to illustrate the ETL software testing method, but this is not intended to limit the invention.
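The metadata acquisition described above can be sketched as follows. This is only an illustration under stated assumptions: it is written in Python rather than the Java and JDBC that the embodiment mentions, an in-memory SQLite database stands in for the data source, and the function name `read_table_metadata` is hypothetical. A real data source would expose its own system tables.

```python
import sqlite3


def read_table_metadata(conn, table):
    """Return column names, declared types and primary-key columns of a table."""
    # SQLite stand-in for "reading the system tables of the data source".
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {
        "columns": [(row[1], row[2]) for row in info],
        "primary_key": [row[1] for row in info if row[5] > 0],
    }


# Hypothetical source table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
meta = read_table_metadata(conn, "person")
```

The returned dictionary holds the three kinds of relevant information named above (data structure, field type, primary key) and could then be written into the ETL's own system tables.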
Step 20: obtain the processed test result, match the test result against the expected result, and record the matching result.
The expected result must be generated in advance for the source data and serves as the matching standard. In the embodiments of the invention, the test result and the expected result each consist of a two-dimensional table, as shown in Fig. 2: the columns represent attribute items and are determined in advance, such as "name", "gender", "age", "phone", and "address" in Fig. 2, while the rows represent result items, that is, the actual data under the corresponding attributes. In this step, the processed test result (including the data structure and the data result set) can be obtained by Java code, which also reads the expected result and then matches the two; the matching result can be written to result.xml for recording. Both successful and failed matches may be recorded here; usually it is sufficient to record only the failed matches for later reference.
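A minimal sketch of recording match results as XML is shown below. The source only states that results are written to result.xml; the element and attribute names (`results`, `match`, `status`, `step`) and the function name are assumptions made for illustration, and Python is used here instead of the Java mentioned in the embodiment.

```python
import xml.etree.ElementTree as ET


def build_result_xml(records):
    """Serialize match records as XML text; element names are hypothetical."""
    root = ET.Element("results")
    for rec in records:
        item = ET.SubElement(root, "match", status=rec["status"], step=rec["step"])
        item.text = rec["detail"]
    return ET.tostring(root, encoding="unicode")


# Example record for a failed row-count check.
xml_text = build_result_xml([
    {"status": "failed", "step": "row_count", "detail": "expected 5 rows, got 4"},
])
```

Writing `xml_text` to a file named result.xml would give the later feedback step something to read.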
During data matching, the test flow is considered successful only when the test result matches the expected result exactly. Assume the tester has configured matching to stop as soon as a mismatch is found. To improve matching efficiency, the matching process, shown in Fig. 3, is then divided into three steps. First, the data structure of the test result is matched against the data structure of the expected result. Second, the row count of the test result is matched against the row count of the expected result. Third, each row of data in the test result is matched against the corresponding row of data in the expected result. The data structure is matched as follows: the column count of the test result is first matched against the column count of the expected result; after the column counts match, the data definition of each column of the test result is matched against the data definition of the corresponding column of the expected result. Taking the table of Fig. 2 as an example, this checks whether the columns are defined as name, gender, age, and so on, in turn. Matching the data structure and matching the row count are both much simpler, and therefore much faster, than comparing the actual data of every row, so the second step is carried out only after the first step succeeds. If the first step fails, the test flow is shown to have a problem, and the matching process stops without carrying out the second step. Similarly, if the first step succeeds but the second step fails, the matching process stops without carrying out the third step. This step-by-step matching from simple to difficult therefore greatly improves matching efficiency and shortens the test run time.
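The three-step matching above can be sketched as follows. This is an illustrative Python sketch, not the Java implementation of the embodiment; representing a column definition as a (name, type) pair and a row as a tuple, and the function name `match_stepwise`, are assumptions.

```python
from typing import List, Sequence, Tuple

Column = Tuple[str, str]   # assumed form: (column name, declared type)
Row = Tuple                # one row of cell values


def match_stepwise(test_cols: Sequence[Column], test_rows: List[Row],
                   exp_cols: Sequence[Column], exp_rows: List[Row]) -> Tuple[bool, str]:
    """Compare a test result with the expected result, cheapest check first."""
    # Step 1a: column count.
    if len(test_cols) != len(exp_cols):
        return False, "column count mismatch"
    # Step 1b: per-column data definition.
    for i, (t, e) in enumerate(zip(test_cols, exp_cols)):
        if t != e:
            return False, f"column {i} definition mismatch"
    # Step 2: row count.
    if len(test_rows) != len(exp_rows):
        return False, "row count mismatch"
    # Step 3: row-by-row data, stopping at the first mismatch.
    for i, (t, e) in enumerate(zip(test_rows, exp_rows)):
        if tuple(t) != tuple(e):
            return False, f"row {i} data mismatch"
    return True, "match"
```

Each early `return` corresponds to stopping the matching process before the more expensive later steps are reached.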
Step 30: feed the recorded matching result back to the tester. In this step, Java code can read the results recorded in result.xml and then call a mail server interface to compose an e-mail and send it to the tester. The tester thereby receives the record of failed matches and can investigate the problem, find the cause, and adjust the test flow.
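As a rough illustration of the feedback step, the sketch below composes the report e-mail from a list of failure descriptions. It uses Python's standard `email` package rather than the Java mail interface the embodiment refers to; the addresses, subject wording, and function name are hypothetical, and the message is only built, not sent.

```python
from email.message import EmailMessage


def build_report_mail(sender, recipient, failures):
    """Compose a plain-text report listing the failed matches."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"ETL test: {len(failures)} match failure(s)"
    msg.set_content("\n".join(failures) or "All matches succeeded.")
    return msg


# Placeholder addresses for the example only.
mail = build_report_mail("etl-test@example.com", "tester@example.com",
                         ["row count mismatch: expected 5 rows, got 4"])
```

Sending would then be a matter of handing `mail` to an SMTP client such as `smtplib.SMTP(...).send_message(mail)`, with the server details supplied by the test environment.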
The ETL software testing method provided by the invention automates the testing of ETL software: data acquisition, data interaction and conversion, and result matching and checking are all completed automatically. When results are matched and verified, the matching proceeds step by step from simple to difficult; if the first step fails, the matching process stops without any further matching, which greatly shortens the test run time and improves matching efficiency.
After step 30, an Ant script also needs to be written and executed so that steps 10 to 30 are chained together. Writing the Ant script specifically comprises: 1) writing a script that sets up the prerequisites of the test flow; 2) calling the code written for importing the source data into the test flow and for obtaining and matching the test result, so as to complete the test flow and the data matching, that is, calling the entry program written in steps 10 and 20 to run the automated test; 3) calling the code written for feeding back the matching result, so as to complete the feedback, that is, calling the main program written in step 30 so that the tester receives the matching result; 4) writing a recovery script that restores the source data and the expected result used in the test to their initial state. Executing the Ant script specifically comprises: executing the Ant script at a preset interval, thereby running the ETL software test periodically. For example, if an ETL software test is required once a day, a timer and a bat script can be used to execute the Ant script once a day at a fixed time, so that the ETL test is completed automatically every day.
In combination with the embodiment of the invention, there is a further preferred implementation. In step 20, after the second-step row-count matching has been carried out and has succeeded, and before the third-step matching is carried out, the method further comprises: computing the total data volume of the test result and matching it against the total data volume of the expected result; after the total data volumes match, proceeding to match each row of data in the test result against the corresponding row of data in the expected result, that is, carrying out the third step. If the total data volumes do not match, the test flow is shown to have a problem, and the row-by-row matching of the third step is no longer needed. Matching the total data volume is also very fast, so adding it before the third step can likewise improve matching efficiency to some extent and shorten the test run time.
In combination with the embodiment of the invention, there is a further preferred implementation in which the third-step matching process in step 20, with reference to Fig. 4, specifically comprises the following steps:
Step 201: compute the data volume of every row in the test result. The data volume of every row in the expected result has already been computed in advance.
Step 202: match, row by row, the data volume of each row in the test result against the data volume of the corresponding row in the expected result.
Step 203: after the data volumes of all rows match, proceed to match, row by row, the actual data of each row in the test result against the actual data of the corresponding row in the expected result. Compared with matching every individual data item, matching the data volume of each row is simple and fast. If the data volume of any row does not match, the matching process can stop immediately without matching the actual data of any row, which also improves matching efficiency to some extent and shortens the test run time.
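Steps 201 to 203 can be sketched as below. The source does not define how the data volume of a row is measured, so the sketch assumes it is the sum of the UTF-8 byte lengths of the cells rendered as text; the function names are hypothetical and Python is used for brevity.

```python
def row_size(row):
    """Assumed size measure: sum of the UTF-8 byte lengths of the cells as text."""
    return sum(len(str(cell).encode("utf-8")) for cell in row)


def match_rows_with_size_precheck(test_rows, exp_rows):
    """Compare per-row sizes first; compare cell data only if all sizes agree."""
    test_sizes = [row_size(r) for r in test_rows]   # step 201
    exp_sizes = [row_size(r) for r in exp_rows]     # computed in advance in practice
    if test_sizes != exp_sizes:                     # step 202
        return False, "row size mismatch"
    for i, (t, e) in enumerate(zip(test_rows, exp_rows)):  # step 203
        if t != e:
            return False, f"row {i} data mismatch"
    return True, "match"
```

The size comparison touches one number per row, so a failure there avoids the cell-by-cell comparison entirely.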
In a more preferred scheme, with reference to Fig. 5, the third-step matching process in step 20 may specifically comprise the following steps:
Step 201': compute the data volume of every row in the test result, and arrange the rows of the test result in the two-dimensional table in ascending order of data volume. Compared with step 201, this adds sorting the rows by data volume. For example, the data volumes of the first to fifth rows are 10M, 12M, 13M, 15M, and 20M respectively, arranged from small to large. The expected result can likewise be arranged in advance in ascending order of row data volume, so that it corresponds to the test result row by row.
Step 202': match, row by row, the data volume of each row in the test result against the data volume of the corresponding row in the expected result. Taking the five rows above as an example, the row data volumes of the test result form the sequence 10, 12, 13, 15, 20. The row data volumes are considered to match only when the row data volumes of the expected result form the same sequence; if any value differs, the match fails.
Step 203': after the data volumes of all rows match, match, row by row and in the sorted order, the actual data of each row in the test result against the actual data of the corresponding row in the expected result. Continuing with the five rows above, once the data volumes of all rows match, the first row, which has the smallest data volume, is matched first, followed in turn by the second, third, fourth, and fifth rows. The smaller the data volume, the fewer bytes the row occupies and the faster it can be matched. Matching in ascending order therefore completes as many rows as possible within a given time, which makes it easier to find unmatched rows early, improves matching speed, and shortens the test run time.
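A sketch of steps 201' to 203' follows. The size function is passed in as a parameter because the source does not fix how data volume is measured. The source also does not say how rows of equal size are ordered; the sketch assumes ties are broken by the text form of the row so that both tables sort the same way. All names are hypothetical.

```python
def order_by_size(rows, size_of):
    """Step 201': sort rows by ascending size; ties broken by the row's text form."""
    return sorted(rows, key=lambda r: (size_of(r), str(r)))


def match_sorted(test_rows, exp_rows, size_of):
    t_sorted = order_by_size(test_rows, size_of)
    e_sorted = order_by_size(exp_rows, size_of)
    # Step 202': the two size sequences must be identical.
    if [size_of(r) for r in t_sorted] != [size_of(r) for r in e_sorted]:
        return False, "row size sequence mismatch"
    # Step 203': compare data from the smallest row upward.
    for i, (t, e) in enumerate(zip(t_sorted, e_sorted)):
        if t != e:
            return False, f"sorted row {i} data mismatch"
    return True, "match"
```

In a real table a primary key would be a more natural tie-breaker than the text form used here.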
In combination with the embodiment of the invention, there is a further preferred implementation. When any of the three matching steps of step 20 fails, that is, when the data structures or the row counts of the test result and the expected result do not match, or when the data of any row does not match, the unmatched difference data is also used to find where the failure originated. The specific method is as follows:
First, according to the data matching result, the difference data between the test result and the expected result is output and then fed back to the tester.
Then, the source data is imported into the preset test flow again for processing, that is, step 10 is executed again. The difference is that, during the retest, the intermediate data generated by each stage is recorded, for example the data after data reading, the data after data filtering, and the data after each data cleansing or data conversion operation. All of these are intermediate data that need to be recorded.
Finally, the difference data is matched against the intermediate data generated by each stage, thereby determining the stage of the test flow in which the difference data appears, and a test report is generated and fed back to the tester. For example, when the difference data matches the data after data cleansing, the difference data appeared during data cleansing, which shows that the data cleansing design has a problem. By generating a corresponding test report and feeding it back, the tester learns promptly which stage went wrong and can adjust the data cleansing process specifically, instead of blindly adjusting the entire test flow.
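The localization of difference data can be sketched as follows. This Python sketch assumes that the intermediate data of each stage is available as an ordered list of (stage name, rows) pairs and that a difference row is attributed to the first stage whose output contains it; the stage names and the function name are hypothetical.

```python
def locate_first_appearance(diff_rows, intermediates):
    """Map each difference row to the first stage whose output contains it.

    `intermediates` is an ordered list of (stage name, rows produced by that stage).
    A row that appears in no stage output is mapped to None.
    """
    located = {}
    for row in diff_rows:
        located[row] = next(
            (stage for stage, rows in intermediates if row in rows), None)
    return located


# Hypothetical intermediate data: the age value is corrupted during cleansing.
stages = [
    ("read", [("Ann", " 30")]),
    ("filter", [("Ann", " 30")]),
    ("cleanse", [("Ann", "3O")]),
]
report = locate_first_appearance([("Ann", "3O")], stages)
```

Here `report` attributes the bad row to the "cleanse" stage, which is the information the test report would pass on to the tester.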
In the above method, the difference data is used to determine the stage that caused the match to fail, and this is fed back to the tester. This helps the tester locate the problem in the test flow quickly and accurately and then make targeted adjustments, improving debugging efficiency.
In step 20, during the row-by-row matching of the third step, data matching stops as soon as any row of data in the test result fails to match the corresponding row of data in the expected result, and the corresponding failure is recorded. Alternatively, data matching stops when the proportion of difference data between the test result and the expected result reaches a preset threshold, and the corresponding failure is recorded. The preset threshold can be adjusted by the tester according to actual needs. For example, when the accuracy requirement is low and small errors are tolerated, the preset threshold can be set to 2%: as long as the difference data stays within 2%, matching continues; once it exceeds 2%, the match is considered failed and the test process can end. As another example, the tester can configure matching to continue after a mismatch and to end only when the preset threshold is reached, with the unmatched data then fed back to the tester for analysis.
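The threshold-based stop can be sketched as below. The sketch assumes the proportion of difference data is measured as mismatched rows divided by total rows, and, following the 2% example, that matching stops once the proportion exceeds the threshold; the function name is hypothetical.

```python
def match_with_threshold(test_rows, exp_rows, threshold=0.02):
    """Compare rows until the share of mismatched rows exceeds `threshold`.

    Returns (passed, indices of mismatched rows seen so far).
    """
    total = len(exp_rows)
    mismatched = []
    for i, (t, e) in enumerate(zip(test_rows, exp_rows)):
        if t != e:
            mismatched.append(i)
            if len(mismatched) / total > threshold:
                return False, mismatched   # stop early
    return True, mismatched
```

Setting `threshold=0.0` reproduces the stricter policy of stopping at the first mismatch.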
Step 10 is described in more detail below with reference to Fig. 6. It specifically comprises the following steps:
Step 101: create a test flow for data synchronization, add a data reading component, a data cleansing and conversion component, and a data loading component to the test flow, and configure the source table to be synchronized; the source table stores the source data to be synchronized. The data reading component extracts data from the data source into the source table, the data cleansing and conversion component cleanses and converts the data, and the data loading component loads the data into the target table. Each of these data components is a functional component of DMETL and can be called directly.
Before the test flow is created, the connection to the DMETL server usually needs to be tested first; if the connection fails, exception information is thrown. The corresponding project and conversion flow are then created. For example, a project named "automated test project" can be created, with a conversion flow named "test data synchronization" under it. The DMETL data reading component, data cleansing and conversion component, and data loading component are added to the "test data synchronization" conversion flow. The source table to be synchronized is configured according to the relevant information written into the ETL system tables. Preferably, a custom batch cache size is also set for the source data to be synchronized in the source table, which improves synchronization efficiency. For example, if the data to be synchronized is 1G in size, synchronization can be configured to complete in 4 batches of 256M each, which is faster than synchronizing the 1G of data in one pass.
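The batch arithmetic in the example above (1G split into four 256M batches) can be illustrated with a small helper. This sketch only computes the batch boundaries; the function name is hypothetical and it says nothing about how DMETL itself configures its cache.

```python
def plan_batches(total_bytes, batch_bytes):
    """Return (offset, length) pairs that cover total_bytes in batch_bytes chunks."""
    return [(offset, min(batch_bytes, total_bytes - offset))
            for offset in range(0, total_bytes, batch_bytes)]


MB = 1024 * 1024
batches = plan_batches(1024 * MB, 256 * MB)   # 1G in 256M batches
```

When the total is not a multiple of the batch size, the last batch is simply shorter.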
Step 102: apply different conversion designs to the test flow according to the function under test. The conversion designs include one or more of an incremental data synchronization design, a data filtering design, and a data cleansing and conversion design.
Depending on how the incremental data is obtained, the incremental data synchronization design is further divided into trigger increment, MD5 increment, shadow table increment, and incremental comparison component designs. The trigger increment design is: create triggers on the data source to capture the changed data and operations, and record them in the system tables of DMETL so that the incremental data can be synchronized. The MD5 increment design is: compute the MD5 value of every row of data and record it in an MD5 table created by DMETL, obtain the incremental data and operations by matching on the primary key, and record them in the system tables of DMETL so that the incremental data can be synchronized. The shadow table increment design is: copy the source table data into a shadow table created by DMETL, obtain the incremental data and operations by matching on the primary key, and record them in the system tables of DMETL so that the incremental data can be synchronized. The incremental comparison component design is: sort the source table and the target table at the database level, compare them using configured conditions such as the unique matching column and the comparison columns to obtain the incremental data and operations, and convert these in code into SQL that is sent to the target database to complete the incremental data synchronization.
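The MD5 increment design can be sketched as follows. This is an illustration, not DMETL's implementation: the MD5 table is modeled as a dictionary from primary key to the digest stored at the previous synchronization, a row is hashed from its cells joined with a separator assumed not to occur in the data, and all names are hypothetical.

```python
import hashlib


def row_md5(row):
    """MD5 digest of a row; cells are joined with a separator assumed absent from data."""
    return hashlib.md5("\x1f".join(map(str, row)).encode("utf-8")).hexdigest()


def md5_increment(source_rows, md5_table, key_index=0):
    """Classify source rows as inserts or updates and find deleted keys.

    `md5_table` maps primary key -> digest recorded at the previous synchronization.
    """
    inserts, updates = [], []
    seen = set()
    for row in source_rows:
        key = row[key_index]
        seen.add(key)
        if key not in md5_table:
            inserts.append(row)            # new primary key
        elif md5_table[key] != row_md5(row):
            updates.append(row)            # same key, changed content
    deletes = [key for key in md5_table if key not in seen]
    return inserts, updates, deletes
```

The three returned lists correspond to the "incremental data and operations" that the design records for later synchronization.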
The data filtering design is: use if-condition checks to filter the source data in the source table according to the filter condition, so that only the data satisfying the filter condition is synchronized to the target table.
The data cleansing and conversion design falls into three categories. For deleting, merging, and splitting fields: the DMETL data cleansing and conversion component is configured to delete, merge, or split fields according to conditions such as position or separator. For cleansing field content: the field content can be cleansed by a Java function. For formatting date-time strings: the DMETL data cleansing and conversion component is configured to call the format() method for date conversion, and exception information is thrown if an error occurs.
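The three categories of cleansing can be illustrated with plain functions. These are generic Python stand-ins, not the DMETL component or the Java format() call mentioned above; the function names, the whitespace-collapsing rule, and the date formats are all assumptions chosen for the example.

```python
from datetime import datetime


def split_field(value, separator):
    """Split one field into several by a separator."""
    return value.split(separator)


def merge_fields(values, separator=" "):
    """Merge several fields into one."""
    return separator.join(values)


def clean_content(value):
    """Example content cleanup: trim and collapse internal whitespace."""
    return " ".join(value.split())


def format_datetime(text, in_fmt="%Y/%m/%d %H:%M", out_fmt="%Y-%m-%d %H:%M:%S"):
    """Re-format a date-time string; raises ValueError on malformed input."""
    return datetime.strptime(text, in_fmt).strftime(out_fmt)
```

As in the description, a malformed date-time string surfaces as an exception rather than being silently passed through.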
Step 103: use the added data components to synchronize the source data according to the designed test flow. Following the flow designed in step 102, data reading, data interaction and conversion, and data loading synchronize the source data from the source table to the target table, which completes the corresponding synchronization test.
In summary, the ETL software testing method provided by the invention automates the testing of ETL software, including acquisition of the data source, data interaction and conversion, and result matching and checking. When results are matched and verified, the matching proceeds step by step from simple to difficult, which greatly shortens the test run time and improves matching efficiency. The unmatched difference data is also put to use and fed back to the tester promptly: by matching the difference data against the intermediate data, the stage that caused the match to fail can be determined, which helps the tester locate the problem in the test flow quickly and accurately and then make targeted adjustments, improving debugging efficiency.