CN115525575A

CN115525575A - Data automation testing method and system based on Dataworks platform

Info

Publication number: CN115525575A
Application number: CN202211341016.XA
Authority: CN
Inventors: 胡苗青; 胡少君; 李景哲
Original assignee: China Pacific Life Insurance Co Ltd
Current assignee: China Pacific Life Insurance Co Ltd
Priority date: 2022-10-30
Filing date: 2022-10-30
Publication date: 2022-12-27

Abstract

The invention relates to a data automation test method and system based on a Dataworks platform. The method comprises the following steps: acquiring a data model document and analyzing it to obtain analysis information; acquiring table structure actual information of the data model from an API (application programming interface) of the Dataworks platform, and comparing the table structure analysis information with the table structure actual information item by item; generating test data, determining a plurality of test environments of the data model, acquiring an open interface of the Dataworks platform, automatically configuring the data model in the plurality of test environments, and loading the test data; and acquiring a newly written script or selecting an existing script, executing the test of the data model in each test environment, and checking the test result. Compared with the prior art, the method decomposes the basic actions of data model testing, automates specific manual operations, improves the efficiency of testers, and raises the quality bar of the data model.

Description

Data automation testing method and system based on Dataworks platform

Technical Field

The invention relates to the technical field of testing, in particular to a data automation testing method and system based on a Dataworks platform.

Background

Dataworks is a PaaS platform for big data released by Aliyun (Alibaba Cloud). It provides comprehensive product services such as data integration, data development, data management and data governance, and is widely used in the digital transformation of finance, internet and traditional industries. Various data models can be developed and designed with Dataworks. How to ensure the quality of data models developed on Dataworks (integrity, consistency, accuracy, timeliness and so on), improve testing efficiency, and make testing activities standardized and sustainable is a problem the industry urgently needs to solve.

At present, testing in the industry is mainly done by writing SQL on the Alibaba platform, running full-volume tests on the data results and spot-checking sampled data results, among other tests. The granularity and depth of the tests depend heavily on the tester's technical ability and understanding of the data model. Several pain points arise during testing:

1) The testing process contains many manual comparison steps, such as table structure comparison and table association comparison;

2) Testing across the whole data link is hard to track and is not systematic;

3) Test assets are hard to associate with models, which makes it difficult to build a good test knowledge sharing system.

Another pain point is data preparation. For regulatory and other reasons, the test data in a test environment may lose its correlation and verifiability after deletion or desensitization, so test data has to be constructed in the test environment with reference to field meanings, table structures and the like.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a data automation testing method and system based on a Dataworks platform.

The purpose of the invention can be achieved by the following technical solution:

A data automation test method based on a Dataworks platform is used for testing a data model deployed on the Dataworks platform and comprises the following steps:

S1. obtaining a data model document, and analyzing the data model document to obtain analysis information, wherein the analysis information comprises table structure analysis information, process node names, table association relations and field value logic of the data model;

S2. acquiring table structure actual information of the data model from an API (application programming interface) of the Dataworks platform, comparing the table structure analysis information with the table structure actual information item by item, and if they differ, adjusting the data model document or the data model until the table structure analysis information is consistent with the table structure actual information;

S3. generating test data, determining a plurality of test environments of the data model, acquiring an open interface of the Dataworks platform, automatically configuring the data model in the plurality of test environments, and loading the test data;

S4. acquiring a newly written script or selecting an existing script, executing the test of the data model in each test environment, and checking the test result.

Further, after the step S2, the method further includes: presetting an outline rule template, and substituting the process node name, the table association relation and the field value logic into the outline rule template to generate a test outline.

Further, in step S3, if historical test data of the data model exists, the historical test data is synchronized as the test data; if no historical test data of the data model exists, the test data is obtained through data generation.

Further, the data generation process is as follows:

fields requiring generated data are extracted according to the table structure analysis information, the field generation rules preset for each field are parsed, and data is generated based on the field generation rules and the table association information.

Further, the method also comprises the step S5:

for a newly written script, storing the script, adding labels including but not limited to a subject domain dimension, a model dimension, a function dimension, a project dimension and a function point dimension to the script, and calculating a score for the script; for an existing script, updating its labels in the model dimension, function dimension, project dimension and function point dimension, and updating its score; the score is based on the click rate and usage frequency of the script.

Further, in step S4, a script is selected from existing scripts as a script to be recommended based on the correlation between the tag of the script and the data model, a recommended script is selected from the scripts to be recommended based on the score of the script to be recommended, and the recommended script is displayed to the user.

Further, in step S4, the checking manner of the test result includes one or more of constraint checking, enumeration checking, range checking, consistency checking, data amount checking, non-null checking, data fluctuation checking, uniqueness checking, and supply file checking.

Further, the step S4 further includes: calling an interface of a Dataworks platform to collect logs in a testing process, analyzing the log file to obtain error information, performing similarity matching on the error information and error information records in a scheme library, and recommending a solution according to a matching result, wherein a plurality of error information records and corresponding solutions are prestored in the scheme library.

Further, step S1 is preceded by:

presetting a trigger condition, monitoring a data model on a Dataworks platform, and executing the steps S1-S4 if the data model meets the trigger condition.

A data automation test system based on a Dataworks platform is used for executing the data automation test method and comprises:

the data model document analysis module is used for acquiring a data model document and analyzing the data model document to obtain analysis information, wherein the analysis information comprises table structure analysis information, process node names, table association relations and field value logic of the data model;

the table structure comparison module is used for acquiring the table structure actual information of the data model from the API of the Dataworks platform and comparing the table structure analysis information with the table structure actual information one by one;

the test preparation module is used for generating test data, determining a plurality of test environments of the data model, acquiring an open interface of a Dataworks platform, automatically configuring the data model in the plurality of test environments, and substituting the test data;

and the test module is used for acquiring the newly written script or selecting the existing script, executing the test of the data model in each test environment and checking the test result.

Compared with the prior art, the invention has the following beneficial effects:

The basic actions of data model testing are decomposed, and specific manual operations are automated, such as table structure comparison, test outline writing, test data preparation, result checking and automatic execution of Dataworks node tasks. Script management, solution recommendation and the like are integrated to provide a unified, systematic and extensible one-stop test platform, which improves the efficiency of testers, raises the quality bar of the data model and reduces the testers' workload.

Drawings

FIG. 1 is a flow chart of a method for automated testing of data;

FIG. 2 is a schematic diagram of testing with the data automation test system.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments. The invention is not limited to these specific embodiments and may be implemented in other embodiments without departing from its scope. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

Reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic may be included in at least one implementation of the invention. In describing the present invention, it is to be understood that the terms "first," "second," and "third," etc. in the description and claims of the present invention and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The present specification provides method steps as in the examples or flow diagrams, but may include more or fewer steps based on routine or non-inventive labor. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. In actual system or server product execution, the steps in the method according to the embodiment or the figures may be executed sequentially or in parallel (for example, in the context of parallel processors or multi-thread processing), or the execution order of the steps without timing limitation may be adjusted.

Example 1:

S1. Acquire a data model document and analyze it to obtain analysis information, wherein the analysis information comprises table structure analysis information, process node names, table association relations and field value logic of the data model;

The data model document is designed by a data analyst in a fixed Excel format. It records information such as the table structures designed for the layered data model, and comprises DWD table mappings, DWS table mappings and ADM table mappings. Analyzing the data model document yields the table structure analysis information, process node names, table association relations, field value logic and other content, wherein the table structure analysis information comprises table names, field types, field lengths, comments and the like.
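As an illustration only, the parsing step can be sketched as grouping the rows of one sheet by table name. The column names (`table`, `field`, `type`, `length`, `comment`) and the sample rows are assumptions made for this sketch; the source does not specify the layout of the fixed Excel format, and reading the workbook itself is left out.

```python
from typing import Dict, List, Sequence


def parse_table_structure(rows: Sequence[Sequence]) -> Dict[str, List[dict]]:
    """Group sheet rows into {table_name: [field descriptions]}.

    The first row is treated as the header; the column names are hypothetical.
    """
    header, *data = rows
    col = {name: i for i, name in enumerate(header)}
    tables: Dict[str, List[dict]] = {}
    for row in data:
        if not any(row):  # skip fully blank rows
            continue
        raw_length = row[col["length"]]
        tables.setdefault(row[col["table"]], []).append({
            "field": row[col["field"]],
            "type": row[col["type"]],
            "length": int(raw_length) if raw_length not in (None, "") else None,
            "comment": row[col["comment"]],
        })
    return tables


# Hypothetical rows, as they might be read from one mapping sheet.
SAMPLE_ROWS = [
    ("table", "field", "type", "length", "comment"),
    ("dwd_policy", "policy_no", "string", 20, "policy number"),
    ("dwd_policy", "insured_age", "bigint", None, "age of the insured"),
    (None, None, None, None, None),
    ("dws_policy_sum", "branch_code", "string", 8, "branch office code"),
]
```

The resulting dictionary is the kind of structure that the later comparison and data generation steps could consume.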

In step S2, the data model is implemented in Dataworks by deploying a code package, and the table structure actual information of the data model, including table names, field types, field lengths, comments and the like, can be obtained from the API of Dataworks. The table structure analysis information is compared with the table structure actual information item by item; if the actual table structure differs from the table structure in the document, the test cannot be started and the difference must be fed back to the tester.
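The item-by-item comparison can be sketched as follows. This is a minimal sketch, not the patented implementation: the `Field` type and `compare_table` function are hypothetical names, and the call that fetches the actual structure from the Dataworks API is deliberately omitted, so both sides are passed in as plain lists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Field:
    name: str
    type: str
    length: Optional[int]
    comment: str


def compare_table(table: str, doc_fields: List[Field], actual_fields: List[Field]) -> List[str]:
    """Return human-readable differences; an empty list means the two sides agree."""
    diffs = []
    doc = {f.name: f for f in doc_fields}
    actual = {f.name: f for f in actual_fields}
    for name in sorted(doc.keys() - actual.keys()):
        diffs.append(f"{table}.{name}: in document but missing on platform")
    for name in sorted(actual.keys() - doc.keys()):
        diffs.append(f"{table}.{name}: on platform but missing in document")
    for name in sorted(doc.keys() & actual.keys()):
        for attr in ("type", "length", "comment"):
            d, a = getattr(doc[name], attr), getattr(actual[name], attr)
            if d != a:
                diffs.append(f"{table}.{name}: {attr} differs (document={d!r}, platform={a!r})")
    return diffs
```

A non-empty result would block the test and be fed back to the tester, as the paragraph above describes.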

Furthermore, to guide the test, after step S2 the method further comprises: presetting an outline rule template, and substituting the process node names, table association relations and field value logic into the outline rule template to generate a test outline. Specifically, an outline rule template in the form of a mind map can be configured in advance, in which verification points are set from the node level to the table level to the field level, including verification points for table association relations and field value logic. The process node names, table association relations, field value logic and other content obtained by analyzing the data model document are combined with the outline rule template to generate a test outline in the form of an online mind map that guides the test work. The outline covers the nodes contained in the document, the association relations of the tables, and the value logic, types, enumeration values and function test methods of the fields, so the online test outline is generated automatically and the time spent manually editing it is saved.
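A minimal sketch of the template substitution is given below, assuming a three-level template (node, table, field) rendered as indented text rather than as an online mind map. The template wording, the `build_outline` function and the input layout are all hypothetical.

```python
# Hypothetical outline rule template: verification points per level.
OUTLINE_TEMPLATE = {
    "node": ["Node '{node}' runs successfully in every test environment"],
    "table": ["Table '{table}' is joined with {joins} as documented"],
    "field": ["Field '{table}.{field}' follows the documented logic: {logic}"],
}


def build_outline(nodes, template=OUTLINE_TEMPLATE):
    """Render node -> table -> field verification points as an indented outline."""
    lines = []
    for node in nodes:
        lines.append(node["name"])
        for text in template["node"]:
            lines.append("  - " + text.format(node=node["name"]))
        for table in node["tables"]:
            joins = ", ".join(table["joins"]) or "no other table"
            lines.append("  " + table["name"])
            for text in template["table"]:
                lines.append("    - " + text.format(table=table["name"], joins=joins))
            for field, logic in table["fields"].items():
                for text in template["field"]:
                    lines.append("      - " + text.format(
                        table=table["name"], field=field, logic=logic))
    return "\n".join(lines)
```

Keeping the verification points in a data structure rather than in code means the outline rules can be changed without touching the generator.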

In step S3, if historical test data of the data model exists in the sandbox, it is synchronized directly as the test data of the current test; if no historical test data of the data model exists, the test data is obtained through data generation. The data generation process is as follows: fields requiring generated data are extracted according to the table structure analysis information, the field generation rules preset for each field are parsed, and data is generated based on the field generation rules and the table association information.

Specifically, the generation rules of the fields, such as uniqueness and value range, can be parsed from the table structure verified in step S2 and combined with the association relations of the tables, so that the tables are matched automatically during data generation. Key fields are parameterized, such as mobile phone numbers (number segment, number of digits, etc.) and identity card numbers (gender, age group, etc.).

In addition to the field generation rules, the application also takes the association relations of the tables into account when generating data. For example, the salesperson number associates the salesperson monthly table with the salesperson performance table. The test data therefore conforms to the table association relations of the data model, which improves the efficiency of data construction, standardizes the test data and makes the generated data more effective.
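The idea of generating data from per-field rules while respecting a table association can be sketched as below. The field names, the `138` number segment and the helper names are assumptions for illustration; the source does not define a concrete rule format.

```python
import random


def make_phone(rng, prefix="138", digits=11):
    """Parameterized phone number: a fixed number segment plus random digits."""
    return prefix + "".join(str(rng.randint(0, 9)) for _ in range(digits - len(prefix)))


def generate_rows(count, rules, rng):
    """Apply one rule per field; each rule receives the RNG and the row index."""
    return [{field: rule(rng, i) for field, rule in rules.items()} for i in range(count)]


def build_example(seed=0):
    """Generate a parent table and a child table linked by emp_no."""
    rng = random.Random(seed)
    monthly = generate_rows(5, {
        "emp_no": lambda rng, i: f"E{i:05d}",        # uniqueness rule
        "phone": lambda rng, i: make_phone(rng),     # parameterized key field
        "age": lambda rng, i: rng.randint(18, 60),   # value-range rule
    }, rng)
    emp_nos = [row["emp_no"] for row in monthly]
    performance = generate_rows(8, {
        # Association rule: every performance row refers to an existing salesperson.
        "emp_no": lambda rng, i: rng.choice(emp_nos),
        "premium": lambda rng, i: round(rng.uniform(0, 100000), 2),
    }, rng)
    return monthly, performance
```

Generating the parent table first and drawing the child keys from it is one simple way to keep joins between the two tables non-empty.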

In addition, the data model needs to run in different environments, such as an SIT (System Integration Test) environment and a sandbox environment. A conventional test needs to deploy code in each test environment, configure the nodes of the data model, and obtain the node task information and node code information of each node of the data model from the different test environments. By using the interfaces opened by Dataworks, the application can synchronize the process node information of multiple test environments and provides a comparison function that compares the process node information across environments and finds the differences between them. The node task information includes execution frequency, execution time and the like; the node code information includes the data script code of the node; and the process node information includes the upstream and downstream relations of the nodes and the like.
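The multi-environment comparison can be sketched as a diff over node attributes that have already been collected per environment. The attribute names (`cron`, `upstream`) and the function name are hypothetical, and the retrieval through the Dataworks open interface is not shown.

```python
def diff_environments(env_nodes):
    """Report node attributes whose values differ between environments.

    env_nodes maps environment name -> node name -> attribute dict.
    Returns node name -> attribute -> {environment: value}; a node missing
    from an environment shows up with the value None for that environment.
    """
    all_nodes = set()
    for nodes in env_nodes.values():
        all_nodes.update(nodes)
    report = {}
    for node in sorted(all_nodes):
        attrs = set()
        for nodes in env_nodes.values():
            attrs.update(nodes.get(node, {}))
        for attr in sorted(attrs):
            values = {env: nodes.get(node, {}).get(attr) for env, nodes in env_nodes.items()}
            # repr() lets unhashable values such as lists be compared in a set.
            if len({repr(v) for v in values.values()}) > 1:
                report.setdefault(node, {})[attr] = values
    return report
```

An empty report indicates that the collected node information is identical in every environment.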

S4. Acquire a newly written script or select an existing script, execute the test of the data model in each test environment, and check the test result.

Check items, such as checking the format and desensitization rules of identity card numbers, can be designed according to the data model, business requirements and data test specifications, and multidimensional check points are set for automatic checking, such as constraint checking, enumeration checking, range checking, consistency checking, data volume checking, non-null checking, data fluctuation checking, uniqueness checking and data supply file checking.

Constraint checking: for example, checking the associations between tables. Enumeration checking: for example, the sales channel field of a certain insurance product can only take the values WeChat mini program, official website, offline sales staff and group. Range checking: for example, the age of the insured for a certain insurance product ranges from 18 to 80. Consistency checking: values are consistent or a logical equality holds, for example head office data = sum of branch office data, or attendance rate = number of attendees / total number of people. Data volume checking: for example, in an activity report, the number of participants does not change after the table is joined with other tables for calculation. Non-null checking: for example, in a policy-related table, the policy number cannot be null. Data fluctuation checking: for example, the daily data volume fluctuates within a certain range. Uniqueness checking: for example, in a policy-related table, the policy number is unique.

Data supply file checking is a check proposed by the application specifically for data model testing. The process is as follows: Dataworks provides a process node that generates a data supply file; the data file generated by this node is acquired and automatically compared with the source table data. Checking the data supply file ensures the accuracy of the data files generated by the data model.
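Several of the check types listed above can be sketched as small functions over rows already fetched from a result table. The function names, field names and sample rows are hypothetical; in practice such checks would more likely be expressed as SQL run on the platform.

```python
from collections import Counter


def check_not_null(rows, field):
    """Indices of rows whose field is missing, None or empty."""
    return [i for i, row in enumerate(rows) if row.get(field) in (None, "")]


def check_unique(rows, field):
    """Values of the field that occur more than once."""
    counts = Counter(row.get(field) for row in rows)
    return [value for value, n in counts.items() if n > 1]


def check_enum(rows, field, allowed):
    """Indices of rows whose field value is outside the allowed set."""
    return [i for i, row in enumerate(rows) if row.get(field) not in allowed]


def check_range(rows, field, low, high):
    """Indices of rows whose field value is missing or outside [low, high]."""
    bad = []
    for i, row in enumerate(rows):
        value = row.get(field)
        if value is None or not (low <= value <= high):
            bad.append(i)
    return bad


def check_sum_consistency(total, parts, tolerance=1e-9):
    """True when the total equals the sum of its parts, e.g. head office vs branches."""
    return abs(total - sum(parts)) <= tolerance


# Hypothetical sample rows and enumeration values.
POLICIES = [
    {"policy_no": "P001", "channel": "official website", "insured_age": 35},
    {"policy_no": "P002", "channel": "telesales", "insured_age": 17},
    {"policy_no": "P002", "channel": "group", "insured_age": 80},
    {"policy_no": None, "channel": "WeChat mini program", "insured_age": 42},
]
ALLOWED_CHANNELS = {"WeChat mini program", "official website", "offline sales staff", "group"}
```

Each function returns the offending rows or values rather than a bare pass/fail flag, which makes the result easier to report back to the tester.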

In addition, an interface of the Dataworks platform is called during the test to collect logs, and the log files are parsed to obtain error information, for example: table association errors and whether a new association field is used; inconsistent field types and whether a field type was modified; whether the execution time exceeds the warning duration; and so on. In a conventional test, a tester reads the error information, judges it from business experience and proposes a solution. In the application, a solution library is established in which a plurality of error information records and corresponding solutions are prestored, so the error information can be matched by similarity against the records in the library and a solution recommended according to the matching result, which greatly lowers the technical threshold for testers and reduces their workload. Moreover, log collection makes it possible to track the test process across the whole data link, and the solution library enables intelligent recommendation of solutions to test problems.
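The similarity matching against the solution library can be sketched with the standard library's `difflib.SequenceMatcher`. The source does not name a similarity measure, so the character-based ratio, the 0.5 threshold and the library entries below are assumptions chosen only to make the sketch concrete.

```python
from difflib import SequenceMatcher

# Hypothetical solution library: (error record, recommended solution).
SOLUTION_LIBRARY = [
    ("field type mismatch in join condition", "Cast both join keys to the same type."),
    ("table not found", "Check that the upstream node has run and the table is deployed."),
    ("execution time exceeds warning threshold", "Review partition filters and data skew."),
]


def recommend_solution(error, library=SOLUTION_LIBRARY, threshold=0.5):
    """Return (record, solution, similarity) for the best match, or None if too dissimilar."""
    scored = [
        (SequenceMatcher(None, error.lower(), record.lower()).ratio(), record, solution)
        for record, solution in library
    ]
    best = max(scored, key=lambda item: item[0], default=None)
    if best is None or best[0] < threshold:
        return None
    similarity, record, solution = best
    return record, solution, similarity
```

Returning `None` below the threshold avoids recommending an unrelated solution when the error is new; such an error could then be added to the library once a tester has resolved it.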

The application also adds an automatic script recommendation function, which comprises the following steps:

S5. For a newly written script, the script is stored, labels including but not limited to a subject domain dimension, a model dimension, a function dimension, a project dimension and a function point dimension are added to it, and its score is calculated; for an existing script, its labels in the model dimension, function dimension, project dimension and function point dimension are updated, and its score is updated. The score is based on the click rate and usage frequency of the script. As user interaction data accumulates, test scripts can be recommended efficiently, accurately and automatically, so data test assets are accumulated and put to effective use, and the testers' script-writing workload is reduced.

The subject domain dimension can be finance, regulatory reporting, anti-fraud and the like; the function dimension can be summation, indicator calculation, conditional selection and the like; the project dimension corresponds to the development project to which the data model belongs; and the function point dimension corresponds to the function of the data model.
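A script record with labels and a score can be sketched as below. The source only states that the score is based on click rate and usage frequency; the weighted sum and the weights used here are assumptions, as are the class and attribute names.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Script:
    name: str
    # Label dimension -> value, e.g. {"subject_domain": "finance"}.
    tags: Dict[str, str] = field(default_factory=dict)
    clicks: int = 0
    uses: int = 0

    def record_click(self) -> None:
        self.clicks += 1

    def record_use(self) -> None:
        self.uses += 1

    def score(self, click_weight: float = 1.0, use_weight: float = 3.0) -> float:
        """Assumed scoring rule: a use counts more than a click."""
        return click_weight * self.clicks + use_weight * self.uses
```

Any monotonic combination of the two counters would serve the same purpose; the weights are a tuning choice.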

The application also provides indicator management, covering the subject domain, model, table and field lineage of each indicator. A knowledge base is established to maintain indicator information, including the subject domain, model, table and field lineage of each indicator, which makes it convenient to sort out data assets and to quickly identify the parties affected when a model is modified.

The application also provides a service that automatically generates data lake ingestion application forms, maintains the lake ingestion information of the data, and connects the data model with the business database, thereby improving ingestion efficiency. The user can directly select the business data table to be ingested and the table name or file name after ingestion to generate an ingestion application form. A desensitization function is provided for the selected business data table: sensitive business data such as names, mobile phone numbers and certificate numbers can be desensitized, and a desensitization script can be generated with a desensitization tool. Ingestion application records are kept, and when the ingestion source is switched or changed, downstream associated nodes are notified and their automatic execution is triggered.

Corresponding to step S5, in step S4 the user can obtain a script as a test reference by searching labels. An automatic recommendation system is also established. It obtains the subject domain and related functions of the data model, finds scripts to be recommended that are highly relevant to the data model among the existing scripts by similarity search, association search or the like, then selects the scripts with high scores among them as recommended scripts, and displays the recommended scripts to the user for selection. For example, searching for the standard premium recommends test scripts related to standard premium calculation logic; because the standard premium is an important indicator in the finance subject domain, these scripts can also be suggested for reference when a data model in the finance subject domain is tested.
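The two-stage selection (relevance first, then score) can be sketched as follows. Relevance is approximated here by counting label dimensions that match the data model exactly, which is only one possible reading of "similarity search or association search"; the dictionary layout and function name are hypothetical.

```python
def recommend_scripts(scripts, model_tags, min_overlap=1, top_k=3):
    """Pick scripts whose labels match the data model, then rank them by score.

    scripts: list of {"name": str, "tags": {dimension: value}, "score": float}
    model_tags: {dimension: value} describing the data model under test
    """
    candidates = []
    for script in scripts:
        overlap = sum(
            1 for dimension, value in model_tags.items()
            if script["tags"].get(dimension) == value
        )
        if overlap >= min_overlap:  # stage 1: scripts to be recommended
            candidates.append(script)
    candidates.sort(key=lambda s: s["score"], reverse=True)  # stage 2: highest score first
    return [s["name"] for s in candidates[:top_k]]
```

Filtering before ranking keeps a popular but unrelated script from crowding out relevant ones.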

The application also designs an automatic test trigger mechanism: a trigger condition can be preset, the data model on the Dataworks platform is monitored, and steps S1-S5 are executed if the data model meets the trigger condition.

For example, each task node of the data model on the Dataworks platform can be monitored, and a test task is triggered automatically as soon as the code of any task node of the data model changes; alternatively, the test is executed periodically on a schedule, re-comparing the table structure, executing the test of each node and pushing the test results.
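One way to detect that the code of a task node has changed is to keep a fingerprint of each node's code between polls. This is a sketch under that assumption; how the node code is fetched from the platform is not shown, and the function names are hypothetical.

```python
import hashlib


def fingerprint(code: str) -> str:
    """Stable digest of a node's code text."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def changed_nodes(known: dict, current: dict) -> list:
    """Return names of new or modified nodes and remember their fingerprints.

    known: node name -> fingerprint from the previous poll (updated in place)
    current: node name -> code text fetched in this poll
    """
    changed = []
    for node, code in sorted(current.items()):
        digest = fingerprint(code)
        if known.get(node) != digest:
            changed.append(node)
            known[node] = digest
    return changed
```

A non-empty result is the trigger condition in this sketch; the caller would then start the test flow for the data model that owns the changed nodes.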

In addition, Dataworks currently provides scheduling only by model or by node, which is coarse-grained. The application organizes the tables and fields under each node of the data model, so that when the test of the data model is executed in each test environment through scripts, test tasks targeting individual tables and fields are included. Considering that testing the whole data model is a heavy workload, a test triggered by a data model update can execute only the tasks related to the updated tables and indicator fields.
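Selecting only the test tasks that relate to updated tables and fields can be sketched as below, assuming each task declares a single target written either as `table` or as `table.field`. That naming convention and the function names are assumptions for this sketch.

```python
def is_affected(target: str, changed: set) -> bool:
    """Decide whether a task target is touched by any changed table or field."""
    table = target.split(".", 1)[0]
    for item in changed:
        changed_table = item.split(".", 1)[0]
        if item == target:            # exactly this table or field changed
            return True
        if item == table:             # the whole table changed, so its fields are affected
            return True
        if target == changed_table:   # table-level task and one of its fields changed
            return True
    return False


def select_tasks(tasks, changed):
    """Keep the names of tasks whose target is affected, preserving task order."""
    return [task["name"] for task in tasks if is_affected(task["target"], changed)]
```

A change to one field therefore reruns that field's tasks and the table-level tasks, but not the tasks of sibling fields or other tables.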

The application decomposes the basic actions of data model testing and automates specific manual operations, such as table structure comparison, test outline writing, test data preparation, result checking and automatic execution of Dataworks node tasks. Script management, solution recommendation and the like are integrated to provide a unified, systematic and extensible one-stop test platform, which improves the efficiency of testers, raises the quality bar of the data model and reduces the testers' workload.

It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.

Additionally, some portions of the present application may be applied as a computer program product, such as computer program instructions, which, when executed by a computer, may invoke or provide the method and/or solution according to the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.

The application also provides a data automation test system based on the Dataworks platform, which is used for executing the data automation test method and comprises:

the data model document analysis module, which is used for acquiring a data model document and analyzing it to obtain analysis information, wherein the analysis information comprises table structure analysis information, process node names, table association relations and field value logic of the data model;

In addition, corresponding to the data automation test method, the system can be extended with a script recommendation module, a solution recommendation module, a test task triggering module and the like.

The functions of the modules of the system are the same as the corresponding steps of the data automation test method and are not repeated here. Referring to FIG. 2, when the system is used for testing:

In the pre-test preparation phase: the metadata in the project space is synchronized, that is, the data model is obtained from the API (application programming interface) of the Dataworks platform; the data model document is uploaded and the table structures are compared; and a test outline is then generated from the process node names, table association relations, field value logic and other content. The nodes of the data model that need to be tested are determined, and the code of these nodes is compared with the code in the CICD test package to confirm that the code to be tested is the correct version.

In the in-test execution phase: the node tasks corresponding to the nodes to be tested in the data model are determined through task orchestration; test data is then prepared from existing test data or by data generation; the node tasks are executed to run the test; multidimensional checks such as consistency checking and data supply file checking are performed on the test results; and the logs of the test process are collected and parsed so that solutions can be recommended intelligently. It can be understood that, during task orchestration, the tasks may include some temporary SQL tasks in addition to the node tasks, and these can be executed as well.

In the post-test review phase: the corresponding indicators are added for the data model and managed, including the subject domain, model, table and field lineage of each indicator; labels are added to scripts, which are put into the knowledge base; solutions are recorded for error information to support operation and maintenance management; and the lake ingestion information of the data is managed.

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations can be devised by those skilled in the art in light of the above teachings. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims

1. A data automation test method based on a Dataworks platform is characterized by being used for testing a data model deployed on the Dataworks platform and comprising the following steps of:

S2. obtaining table structure actual information of the data model from an API (application programming interface) of the Dataworks platform, comparing the table structure analysis information with the table structure actual information item by item, and if they differ, adjusting the data model document or the data model until the table structure analysis information is consistent with the table structure actual information;

2. The data automation test method based on the Dataworks platform as claimed in claim 1, wherein after the step S2 the method further comprises: presetting an outline rule template, and substituting the process node names, table association relations and field value logic into the outline rule template to generate a test outline.

3. The method according to claim 1, wherein in step S3, if historical test data of the data model exists, the historical test data is synchronized as the test data, and if no historical test data of the data model exists, the test data is obtained through data generation.

4. The data automation test method based on the Dataworks platform as claimed in claim 1, wherein the data generation process is as follows:

5. The data automation test method based on the Dataworks platform as claimed in claim 1, further comprising the step S5:

for a newly written script, storing the script, adding labels including but not limited to a subject domain dimension, a model dimension, a function dimension, a project dimension and a function point dimension to the script, and calculating a score for the script; for an existing script, updating its labels in the model dimension, function dimension, project dimension and function point dimension, and updating its score; the score is based on the click rate and usage frequency of the script.

6. The data automation test method based on the Dataworks platform as claimed in claim 5, wherein in the step S4, a script is selected from existing scripts as a script to be recommended based on the correlation between the label of the script and the data model, a recommended script is selected from the scripts to be recommended based on the score of the script to be recommended, and the recommended script is displayed to the user.

7. The method for automatically testing data based on the Dataworks platform as claimed in claim 1, wherein in step S4, the manner of checking the test result includes one or more of constraint checking, enumeration checking, range checking, consistency checking, data volume checking, non-null checking, data fluctuation checking, uniqueness checking, and data supply file checking.

8. The method for data automation test based on Dataworks platform as claimed in claim 1, wherein said step S4 further comprises: calling an interface of a Dataworks platform to collect logs in a testing process, analyzing the log file to obtain error information, performing similarity matching on the error information and error information records in a scheme library, and recommending a solution according to a matching result, wherein a plurality of error information records and corresponding solutions are prestored in the scheme library.

9. The method for automatically testing data based on the Dataworks platform as claimed in claim 1, wherein before said step S1 the method further comprises:

10. A Dataworks platform based data automated testing system for performing the data automated testing method of any of claims 1-9, comprising: