CN116737573A

CN116737573A - Data testing method, device, equipment and medium for big data platform

Info

Publication number: CN116737573A
Application number: CN202310721422.7A
Authority: CN
Inventors: 钱家欣; 高俊; 唐琳; 海彤; 汤定定
Original assignee: Bank of China Ltd
Current assignee: Bank of China Ltd
Priority date: 2023-06-16
Filing date: 2023-06-16
Publication date: 2023-09-12

Abstract

The application discloses a data testing method, device, equipment and medium for a big data platform, which can be applied to the big data field or the financial field. The method comprises: acquiring a source system data table that has an association relationship with the data table to be tested, and determining a field mapping rule of the data table to be tested by using field information of the source system data table and the association relationship between the data table to be tested and the source system data table; determining first processing data of the data table to be tested based on test data of the source system data table and the field mapping rule; executing the test case of the data table to be tested and acquiring second processing data; and determining the test result of the test case by comparing the first processing data with the second processing data. Because the first processing data and the second processing data are compared automatically, whether the test case written by the developer is correct can be determined without manually checking the data, which improves the efficiency and accuracy of the test.

Description

Data testing method, device, equipment and medium for big data platform

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a data testing method, apparatus, device, and medium for a big data platform.

Background

When a bank handles its various businesses, a large volume of data is involved, so a big data platform can be used to manage the data sets. When standardized data sets are integrated, there are complex logical relationships between the data sets, and different business systems may process data according to different standards. In this case, data testing typically has to be completed manually. However, manual testing takes a long time and is inefficient, and it depends on the ability of the tester, which may lead to poor test accuracy.

Disclosure of Invention

In view of the above, the present application provides a data testing method, device, apparatus and medium for a big data platform, so as to improve the efficiency and accuracy of the test.

In a first aspect, the present application provides a data testing method for a big data platform, the method comprising:

acquiring a source system data table with an association relationship with a data table to be tested;

determining a field mapping rule of the data table to be tested based on field information of the data table of the source system and an association relationship between the data table to be tested and the data table of the source system;

determining first processing data of the data table to be tested based on the test data of the source system data table and the field mapping rule;

executing the test case of the data table to be tested, and acquiring second processing data of the data table to be tested;

comparing the first processing data with the second processing data to determine a test result of the test case.
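The steps of the first aspect can be sketched end to end as follows. This is a minimal illustration, not the applicant's implementation: the function names, the `amount` and `rate` fields, and the use of exact equality for the comparison are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def run_data_test(
    source_test_data: List[Row],
    field_mapping_rule: Callable[[Row], Row],
    test_case: Callable[[List[Row]], List[Row]],
) -> bool:
    """Return True when the test case reproduces the expected data."""
    # First processing data: expected rows derived from the mapping rule.
    first = [field_mapping_rule(row) for row in source_test_data]
    # Second processing data: rows produced by executing the test case.
    second = test_case(source_test_data)
    # Comparison: exact equality is assumed here for simplicity.
    return first == second


def interest_rule(row: Row) -> Row:
    # Hypothetical mapping rule: target field "interest" = amount * rate.
    return {"interest": row["amount"] * row["rate"]}


def correct_case(rows: List[Row]) -> List[Row]:
    return [{"interest": r["rate"] * r["amount"]} for r in rows]


def buggy_case(rows: List[Row]) -> List[Row]:
    # Deliberate defect: adds instead of multiplying.
    return [{"interest": r["amount"] + r["rate"]} for r in rows]


source = [{"amount": 100.0, "rate": 0.05}, {"amount": 200.0, "rate": 0.1}]
print(run_data_test(source, interest_rule, correct_case))
print(run_data_test(source, interest_rule, buggy_case))
```

Keeping the mapping rule and the test case as separate callables mirrors the idea that the reference data and the developer's program are produced independently and only meet at the comparison step.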

In one possible implementation manner, the determining the field mapping rule of the data table to be tested based on the field information of the data table of the source system and the association relationship between the data table to be tested and the data table of the source system includes:

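for any field in the data table to be tested, determining association relationships between the field and each field of the source system data table;

and determining a field mapping rule of the field based on field information of the source system data table and the association relationships between the field and each field of the source system data table.

In one possible implementation manner, the determining the field mapping rule of the data table to be tested based on the field information of the data table of the source system and the association relationship between the data table to be tested and the data table of the source system includes:

determining a field mapping rule of the data table to be tested based on the field information of the source system data table, the association relationship between the data table to be tested and the source system data table, and a preset screening condition of the data table to be tested.

In one possible implementation manner, the determining the field mapping rule of the data table to be tested based on the field information of the data table of the source system and the association relationship between the data table to be tested and the data table of the source system includes:

determining a field mapping rule of the data table to be tested based on the field information of the source system data table, the association relationship between the data table to be tested and the source system data table, and a test scenario of the data table to be tested.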

In one possible implementation, the test data of the source system data table is generated according to data-generation constraint conditions determined for a test scenario, the test data including at least one of normal data, abnormal data, and boundary data.

In one possible implementation, the test case is a test program determined based on field information of the source system data table and an association relationship between the data table to be tested and the source system data table.

In one possible implementation, the comparing the first processed data and the second processed data to determine the test result of the test case includes:

when the similarity between the first processing data and the second processing data meets a preset condition, determining that the test result of the test case is qualified; otherwise, determining that the test result of the test case is unqualified.

In a second aspect, the present application provides a data testing apparatus for a big data platform, the apparatus comprising:

the first acquisition unit is used for acquiring a source system data table which has an association relationship with the data table to be tested;

a first determining unit, configured to determine a field mapping rule of the data table to be tested based on field information of the data table of the source system and an association relationship between the data table to be tested and the data table of the source system;

a second determining unit, configured to determine first processing data of the data table to be tested based on test data of the source system data table and the field mapping rule;

the second acquisition unit is used for executing the test cases of the data table to be tested and acquiring second processing data of the data table to be tested;

and the test unit is used for comparing the first processing data with the second processing data and determining the test result of the test case.

In a possible implementation manner, the first determining unit is specifically configured to determine, for any field in the data table to be tested, an association relationship between the field and each field of the source system data table; and determining a field mapping rule of the field based on field information of the source system data table and association relations between the field and each field of the source system data table.

In a possible implementation manner, the first determining unit is specifically configured to determine a field mapping rule of the data table to be tested based on field information of the source system data table, an association relationship between the data table to be tested and the source system data table, and a preset screening condition of the data table to be tested.

In a possible implementation manner, the first determining unit is specifically configured to determine a field mapping rule of the data table to be tested based on field information of the source system data table, an association relationship between the data table to be tested and the source system data table, and a test scenario of the data table to be tested.

In a possible implementation manner, the test unit is specifically configured to determine that the test result of the test case is qualified when the similarity between the first processing data and the second processing data meets a preset condition, and otherwise to determine that the test result of the test case is unqualified.

In a third aspect, the present application provides data testing equipment for a big data platform, the equipment comprising: a memory and a processor;

the memory is used for storing related program codes;

the processor is configured to invoke the program code to execute the data testing method of the big data platform according to any one of the implementation manners of the first aspect.

In a fourth aspect, the present application provides a computer readable storage medium, where the computer readable storage medium is used to store a computer program, where the computer program is used to execute the data testing method of the big data platform according to any implementation manner of the first aspect.

From this, the application has the following beneficial effects:

In the above implementations of the present application, in order to test the data table to be tested, a source system data table that has an association relationship with the data table to be tested is first acquired. A field mapping rule of the data table to be tested, that is, a rule describing how the fields in the data table to be tested are derived, is then determined by using field information of the source system data table and the association relationship between the data table to be tested and the source system data table. First processing data of the data table to be tested, that is, the expected data of the data table to be tested, is determined based on the test data of the source system data table and the field mapping rule. The test case of the data table to be tested is executed, and second processing data of the data table to be tested is acquired. The test result of the test case is determined by comparing the first processing data with the second processing data. In this way, the field mapping rule can be determined from the relationship between the data table to be tested and the source system data tables, and the first processing data of the data table to be tested is obtained as a reference. The second processing data is obtained by automatically executing the test case, and the first processing data and the second processing data are compared automatically, so that whether the test case written by the developer is correct can be determined without manually checking the data, which improves the efficiency and accuracy of the test.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments provided in the present application, and other drawings may be obtained according to these drawings for those of ordinary skill in the art.

FIG. 1 is a flow chart of a data testing method for a big data platform according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a data testing device of a big data platform according to an embodiment of the present application;

FIG. 3 is a schematic diagram of data testing equipment of a big data platform according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are merely exemplary implementations, not all implementations of the application. Those skilled in the art can combine embodiments of the application to obtain other embodiments without inventive effort, and such embodiments are also within the scope of the application.

When a bank handles its various businesses, a large volume of data is involved, so a big data platform can be used to manage the data sets. When standardized data sets are integrated, there are complex logical relationships between the data sets, and different business systems may process data according to different standards, so data testing typically has to be completed manually. However, manual testing takes a long time and is inefficient, and it depends on the ability of the tester, which may lead to poor test accuracy.

Based on the above, an embodiment of the present application provides a data testing method for a big data platform, so as to improve the efficiency and accuracy of data testing. Specifically, in order to test a data table to be tested, a source system data table that has an association relationship with the data table to be tested is first acquired. A field mapping rule of the data table to be tested, that is, a rule describing how the fields in the data table to be tested are derived, is then determined by using field information of the source system data table and the association relationship between the data table to be tested and the source system data table. First processing data of the data table to be tested, that is, the expected data of the data table to be tested, is determined based on the test data of the source system data table and the field mapping rule. The test case of the data table to be tested is executed, and second processing data of the data table to be tested is acquired. The test result of the test case is determined by comparing the first processing data with the second processing data. In this way, the field mapping rule can be determined from the relationship between the data table to be tested and the source system data tables, and the first processing data of the data table to be tested is obtained as a reference. The second processing data is obtained by automatically executing the test case, and the first processing data and the second processing data are compared automatically, so that whether the test case written by the developer is correct can be determined without manually checking the data, which improves the efficiency and accuracy of the test.

In one possible implementation, the method provided by the embodiments of the present application can test data managed by a big data platform. The big data platform may store data in an MPP (massively parallel processing) database, that is, store data in data tables. An MPP database is a large distributed database system with high scalability and high availability, and is suitable for storing and querying massive data on a big data platform. In this embodiment, when test cases written by a developer for different test scenarios are tested, the data in the source system data tables is test data prepared in advance according to the test requirements; that is, data actually generated in real business scenarios is not used.

The field mapping rule of the data table to be tested may be set in a suitable language, for example as SQL statements of the database, and the first processing data is determined according to the field mapping rule and used as the reference data for the test case. A developer then writes a test case for the data table to be tested; the second processing data is obtained by executing the test case, and whether the written test case is correct is determined by comparing the first processing data with the second processing data. The test case may be written with reference to the field mapping rule of the data table to be tested, that is, the association relationship between the data table to be tested and the source system data table is expressed in a programming language.
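As an illustration of a field mapping rule expressed in SQL, the sketch below derives the reference data from two associated source tables. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the MPP database; the table names, column names, and sample rows are hypothetical and are not taken from the application.

```python
import sqlite3

# In-memory SQLite stands in for the MPP database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE src_customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE src_account  (acct_id INTEGER PRIMARY KEY,
                               cust_id INTEGER, balance REAL);
    -- Test data prepared in advance, not real business data.
    INSERT INTO src_customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO src_account  VALUES (10, 1, 100.0), (11, 2, 250.5);
    """
)

# Hypothetical field mapping rule: each target field is derived from the
# source tables through their association (the cust_id join key).
FIELD_MAPPING_RULE = """
    SELECT a.acct_id, c.name AS cust_name, a.balance
    FROM src_account AS a
    JOIN src_customer AS c ON a.cust_id = c.cust_id
    ORDER BY a.acct_id
"""

# Running the rule against the test data yields the first processing data.
first_processing_data = conn.execute(FIELD_MAPPING_RULE).fetchall()
print(first_processing_data)  # [(10, 'Alice', 100.0), (11, 'Bob', 250.5)]
```

The explicit `ORDER BY` keeps the reference rows in a stable order, which simplifies a later row-by-row comparison with the output of the test case.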

In order to facilitate understanding of the technical solution provided by the embodiments of the present application, the following description will be given with reference to the accompanying drawings.

Referring to fig. 1, fig. 1 is a flowchart of a data testing method of a big data platform according to an embodiment of the present application.

The method may comprise the steps of:

S101: Acquiring a source system data table that has an association relationship with the data table to be tested.

In a data table generated when a service is processed, there may be an association relationship between different data tables, for example, data corresponding to a certain field in the data table may be calculated according to fields of other data tables associated with the field of the data table. Therefore, in order to test the data table to be tested, it is first required to acquire each source system data table having an association relationship with the data table to be tested. The source system data table may be a data table from a different source system, or may be a data table from the same source system, and the number of data tables of each source system may be multiple, which is not limited in this embodiment.

S102: Determining a field mapping rule of the data table to be tested based on the field information of the source system data table and the association relationship between the data table to be tested and the source system data table.

The source system data table includes a plurality of fields. The field mapping rule of the data table to be tested can be determined according to the field information of the source system data table and the association relationship between the data table to be tested and the source system data table, where the field information may indicate the meaning of a field and the data corresponding to that field. In other words, the rule determines how the fields of the data table to be tested are obtained.

In one possible implementation, for any field in the data table to be tested, an association relationship between the field and each field of the source system data table may be determined, and then, based on field information of the source system data table and the association relationship between the field and each field of the source system data table, a field mapping rule of the field is determined. That is, the fields in the data table to be tested may be determined by the fields of other associated source system data tables.

Taking one possible application scenario as an example: field 1 in the data table to be tested represents the unit price of item A; in source system data table 1, which is associated with the data table to be tested, field A represents the total sales amount of item A and field B represents the sales volume of item A. The field mapping rule of field 1 can then be determined from the association relationship between field 1 and fields A and B, for example as the ratio of field A to field B.
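The ratio rule in this example can be written as a small function. This is a sketch only: the key names `field_a` and `field_b`, the sample values, and the choice to return `None` when field B is zero are assumptions, since the application does not say how a zero divisor is handled.

```python
from typing import Dict, List, Optional


def unit_price(total_sales: float, sales_volume: float) -> Optional[float]:
    """Mapping rule for field 1: field A divided by field B."""
    if sales_volume == 0:
        # Assumption: an undefined unit price is represented as None.
        return None
    return total_sales / sales_volume


# Hypothetical test rows from source system data table 1.
source_rows: List[Dict[str, float]] = [
    {"field_a": 200.0, "field_b": 4},
    {"field_a": 90.0, "field_b": 3},
    {"field_a": 50.0, "field_b": 0},
]

expected_field_1 = [unit_price(r["field_a"], r["field_b"]) for r in source_rows]
print(expected_field_1)  # [50.0, 30.0, None]
```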

In one possible implementation, the method may be applied to different test scenarios when the data table to be tested is tested. Optionally, the test scenario may be an abnormal test scenario, that is, a scenario in which the data generated for the data table to be tested is abnormal data. For example, when a field in the data table to be tested must not be null, the field data in the associated source system data table and the field mapping rule can be set so that the field generated in the data table to be tested is null. Therefore, the field mapping rule may be determined based on the field information of the source system data table, the association relationship between the data table to be tested and the source system data table, and the test scenario of the data table to be tested.
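A minimal sketch of such an abnormal scenario follows. The `name` and `customer_name` fields, the trimming rule, and the helper names are hypothetical; the point is only that source test data can be chosen so that the mapping rule yields a null in a field that is meant to be non-null.

```python
from typing import Dict, List, Optional


def map_customer_name(source_row: Dict[str, Optional[str]]) -> Optional[str]:
    """Hypothetical mapping rule: trim the source name, pass nulls through."""
    name = source_row.get("name")
    return name.strip() if name is not None else None


def null_violations(rows: List[Dict[str, Optional[str]]], field: str) -> List[int]:
    """Return the indexes of rows whose `field` is null."""
    return [i for i, row in enumerate(rows) if row.get(field) is None]


# The second source row is deliberately null to trigger the abnormal case.
source_rows = [{"name": " Alice "}, {"name": None}]
target_rows = [{"customer_name": map_customer_name(r)} for r in source_rows]

print(null_violations(target_rows, "customer_name"))  # [1]
```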

In one possible implementation, different test scenarios have different test requirements, so a screening condition corresponding to each test scenario can be preset. Accordingly, the field mapping rule of the data table to be tested can be determined based on the field information of the source system data table, the association relationship between the data table to be tested and the source system data table, and the preset screening condition of the data table to be tested.

For example, when the preset screening condition is to select the data from the 1st to the 10th of a month in the data table to be tested, the data dated from the 1st to the 10th must be selected for all fields according to the field that indicates the date in the data table to be tested.
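The date-based screening condition can be sketched as a simple row filter. The `txn_date` and `amount` field names and the sample rows are assumptions made for the example.

```python
from datetime import date
from typing import Dict, List, Union

Row = Dict[str, Union[date, float]]


def in_first_ten_days(d: date) -> bool:
    """Preset screening condition: the 1st to the 10th of a month, inclusive."""
    return 1 <= d.day <= 10


rows: List[Row] = [
    {"txn_date": date(2023, 6, 1), "amount": 10.0},
    {"txn_date": date(2023, 6, 10), "amount": 20.0},
    {"txn_date": date(2023, 6, 11), "amount": 30.0},
]

# Whole rows are kept or dropped, so every field is screened by the date field.
screened = [r for r in rows if in_first_ten_days(r["txn_date"])]
print([r["amount"] for r in screened])  # [10.0, 20.0]
```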

S103: Determining first processing data of the data table to be tested based on the test data of the source system data table and the field mapping rule.

After determining the field mapping rule of the data table to be tested, the first processing data of the data table to be tested can be determined according to the test data of the source system data table and the field mapping rule. Optionally, when determining the field mapping rule, the setting may be performed by using a database language, so that after determining the field mapping rule, the first processing data of the data table to be tested may be automatically obtained according to the test data of the source system data table and the field mapping rule.

In one possible implementation, the test data of the source system data table is generated according to data-generation constraint conditions determined for each test scenario; that is, it is not data from a real business scenario. In other words, in order to meet the requirements of different test scenarios, data-generation constraint conditions for generating data in the source system data table may be preset. A data-generation constraint condition may constrain the data itself in a data table, or may constrain the association relationship between different data. For example, the data of a certain field may be constrained not to exceed a threshold, the value of field a may be constrained to be smaller than the value of field b, or the association relationship between fields of different data tables may be constrained. After the data-generation constraint conditions are determined, normal data, abnormal data, boundary data, and the like can be generated for testing according to them, which makes the test data more varied and meets the test requirements of different test scenarios.
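The sketch below shows one way the constraint conditions used to generate test data could drive normal, boundary, and abnormal rows. The `Constraint` class, the two rules it encodes (a threshold on each field and field a smaller than field b), and the specific values are assumptions chosen to match the examples in the text.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, int]


@dataclass(frozen=True)
class Constraint:
    max_value: int  # no field value may exceed this threshold


def satisfies(row: Row, c: Constraint) -> bool:
    """Check both rules: threshold on each field, and a < b."""
    return (
        row["a"] <= c.max_value
        and row["b"] <= c.max_value
        and row["a"] < row["b"]
    )


def generate_test_data(c: Constraint) -> Dict[str, List[Row]]:
    """Deterministically build one data set per category."""
    return {
        "normal": [{"a": c.max_value // 4, "b": c.max_value // 2}],
        # Boundary data sits exactly on the threshold.
        "boundary": [{"a": c.max_value - 1, "b": c.max_value}],
        "abnormal": [
            {"a": c.max_value, "b": c.max_value + 1},  # b exceeds the threshold
            {"a": 5, "b": 5},                          # violates a < b
        ],
    }


constraint = Constraint(max_value=100)
data = generate_test_data(constraint)
for category, rows in data.items():
    print(category, [satisfies(r, constraint) for r in rows])
```

Generating the rows deterministically rather than randomly keeps the reference data reproducible between test runs.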

S104: Executing the test case of the data table to be tested, and acquiring second processing data of the data table to be tested.

To test the data table to be tested, a developer can write a test case, that is, a test program for testing the data table to be tested. In one possible implementation, the test case may be determined based on the field information of the source system data table and the association relationship between the data table to be tested and the source system data table; that is, the field mapping rule of the data table to be tested is written in a programming language. After the test case is executed, the second processing data of the data table to be tested can be obtained.

S105: Comparing the first processing data with the second processing data to determine a test result of the test case.

By comparing the first processing data with the second processing data, it can be judged whether the test case of the data table to be tested is correct. Specifically, when the similarity between the first processing data and the second processing data meets a preset condition, the test result of the test case is determined to be qualified; otherwise, the test result of the test case is determined to be unqualified. The preset condition may be, for example, that the similarity is greater than a threshold, and the specific value of the threshold may be set according to actual requirements. For example, the threshold may be set to 90%; that is, when the similarity between the first processing data and the second processing data is greater than 90%, the test result of the test case is qualified.
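The threshold check can be sketched as follows. The application does not define how similarity is computed, so the metric here (the fraction of row positions at which the two data sets hold identical rows) is an assumption, as are the function names.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, int]


def similarity(first: Sequence[Row], second: Sequence[Row]) -> float:
    """Assumed metric: share of positions where both data sets have the same row."""
    if not first and not second:
        return 1.0
    matches = sum(1 for a, b in zip(first, second) if a == b)
    # Dividing by the longer length penalizes missing or extra rows.
    return matches / max(len(first), len(second))


def judge_test_case(
    first: Sequence[Row], second: Sequence[Row], threshold: float = 0.9
) -> str:
    """Qualified only when the similarity is strictly greater than the threshold."""
    return "qualified" if similarity(first, second) > threshold else "unqualified"


first_data: List[Row] = [(i, i * 2) for i in range(20)]
one_wrong = first_data[:19] + [(19, -1)]             # 19 of 20 rows match
two_wrong = first_data[:18] + [(18, -1), (19, -1)]   # 18 of 20 rows match

print(judge_test_case(first_data, one_wrong))  # qualified
print(judge_test_case(first_data, two_wrong))  # unqualified
```

With a 90% threshold and a strict comparison, a result that matches exactly 90% of the rows is still unqualified, which follows the "greater than 90%" wording in the example.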

With the method provided by the embodiments of the present application, the field mapping rule can be determined from the relationship between the data table to be tested and the source system data tables, and the first processing data of the data table to be tested is obtained as a reference. The second processing data is obtained by automatically executing the test case, and the first processing data and the second processing data are compared automatically, so that whether the test case written by the developer is correct can be determined without manually checking the data, which improves the efficiency and accuracy of the test.

Based on the method embodiment, the embodiment of the application also provides a data testing device of the big data platform. The following will make a detailed description with reference to the accompanying drawings.

Referring to fig. 2, fig. 2 is a schematic diagram of a data testing device of a big data platform according to an embodiment of the present application.

The apparatus 200 comprises:

a first obtaining unit 201, configured to obtain a source system data table having an association relationship with a data table to be tested;

a first determining unit 202, configured to determine a field mapping rule of the data table to be tested based on field information of the data table of the source system and an association relationship between the data table to be tested and the data table of the source system;

a second determining unit 203, configured to determine first processing data of the data table to be tested based on test data of the source system data table and the field mapping rule;

a second obtaining unit 204, configured to execute a test case of the data table to be tested, and obtain second processing data of the data table to be tested;

and a test unit 205, configured to compare the first processing data and the second processing data, and determine a test result of the test case.

In a possible implementation manner, the first determining unit 202 is specifically configured to determine, for any field in the data table to be tested, an association relationship between the field and each field of the source system data table; and determining a field mapping rule of the field based on field information of the source system data table and association relations between the field and each field of the source system data table.

In a possible implementation manner, the first determining unit 202 is specifically configured to determine a field mapping rule of the data table to be tested based on field information of the source system data table, an association relationship between the data table to be tested and the source system data table, and a preset screening condition of the data table to be tested.

In a possible implementation manner, the first determining unit 202 is specifically configured to determine a field mapping rule of the data table to be tested based on field information of the source system data table, an association relationship between the data table to be tested and the source system data table, and a test scenario of the data table to be tested.

In a possible implementation manner, the test unit 205 is specifically configured to determine that the test result of the test case is qualified when the similarity between the first processing data and the second processing data meets a preset condition, and otherwise to determine that the test result of the test case is unqualified.

The beneficial effects of the data testing device of the big data platform provided by the embodiment of the application can be seen in the above method embodiments, and are not described herein.

Based on the method embodiment and the device embodiment, the embodiment of the application also provides data testing equipment of the big data platform. The following description will be made with reference to the accompanying drawings.

Referring to FIG. 3, FIG. 3 is a schematic diagram of data testing equipment of a big data platform according to an embodiment of the present application.

The equipment 300 comprises: a memory 301 and a processor 302;

the memory 301 is used for storing relevant program codes;

the processor 302 is arranged to invoke the program code to perform the method described in the method embodiments above.

In addition, the embodiment of the application also provides a computer readable storage medium for storing a computer program, wherein the computer program is used for executing the data testing method of the big data platform.

It should be noted that the data testing method, device, equipment and medium of the big data platform provided by the application can be used in the big data field or the financial field. The foregoing is merely an example, and the application fields of the data testing method, the device, the equipment and the medium of the big data platform provided by the application are not limited.

It should be noted that the embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. In particular, since the system or apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts can be found in the description of the method embodiments. The apparatus embodiments described above are merely illustrative: units or modules described as separate components may or may not be physically separate, and components shown as units or modules may or may not be physical modules, that is, they may be located in one place or distributed over multiple network units. Some or all of the units or modules may be selected according to actual needs to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.

It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression means any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.

It is further noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A data testing method for a big data platform, the method comprising:

executing a test case of the data table to be tested, and acquiring second processing data of the data table to be tested, wherein the test case is a test program for testing the data table to be tested;

2. The method according to claim 1, wherein the determining the field mapping rule of the data table to be tested based on the field information of the data table of the source system and the association relationship between the data table to be tested and the data table of the source system includes:

3. The method according to claim 1, wherein the determining the field mapping rule of the data table to be tested based on the field information of the data table of the source system and the association relationship between the data table to be tested and the data table of the source system includes:

4. The method according to claim 1, wherein the determining the field mapping rule of the data table to be tested based on the field information of the data table of the source system and the association relationship between the data table to be tested and the data table of the source system includes:

5. The method of any of claims 1 to 4, wherein the test data of the source system data table is generated according to data-generation constraint conditions determined for a test scenario, the test data comprising at least one of normal data, abnormal data, and boundary data.

6. The method according to any one of claims 1 to 4, wherein the test case is a test program determined based on field information of the source system data table and an association relationship between the data table to be tested and the source system data table.

7. The method of claim 1, wherein the comparing the first process data and the second process data to determine the test results for the test case comprises:

8. A data testing apparatus for a big data platform, the apparatus comprising:

9. Data testing equipment for a big data platform, the equipment comprising: a memory and a processor;

the memory is used for storing related program codes;

the processor is configured to invoke the program code to perform the data testing method of the big data platform of any of claims 1 to 7.

10. A computer readable storage medium for storing a computer program for executing the data testing method of the big data platform according to any one of claims 1 to 7.