CN113434414A - Data testing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113434414A
Authority
CN
China
Prior art keywords
data
data set
sample data
training sample
similarity
Prior art date
Legal status
Pending
Application number
CN202110721429.XA
Other languages
Chinese (zh)
Inventor
向乾
尤薇
李桂芸
Current Assignee
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Application filed by Ping An Bank Co Ltd
Priority to CN202110721429.XA
Publication of CN113434414A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3676 Test management for coverage analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The invention relates to the technical field of testing and discloses a data testing method comprising: identifying the data version of sample data in a training sample data set; when the sample data is of a first version, calling the test case corresponding to the first version as the test case of the training sample data set; when the sample data is of a second version, calculating a first similarity between the training sample data and reference data, and a second similarity between a plurality of standard fields and the field-level data in the reference data whose first similarity meets the condition; screening out a target test case according to the second similarity; and using a test engine to execute the target test case on the data to be tested to obtain a test result. The invention also relates to blockchain technology: the first similarity can be stored in a node of a blockchain. The invention further provides a data testing device, electronic equipment and a computer-readable storage medium. The invention can solve the problem of low data testing efficiency.

Description

Data testing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of test technologies, and in particular, to a data testing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the advent of the big data era, the volume of data of all kinds keeps growing, and the requirements on data accuracy have grown with it, so data testing is required. Traditional data testing generally requires test cases to be written manually; manual writing depends on the tester's experience and understanding of the business, so test coverage cannot be guaranteed. Moreover, data testing lags behind data development, immediate testing cannot be achieved, and efficiency needs to be improved. A better data testing method is therefore needed.
Disclosure of Invention
The invention provides a data testing method, a data testing device and a computer readable storage medium, and mainly aims to solve the problem of low data testing efficiency.
In order to achieve the above object, the present invention provides a data testing method, which includes:
acquiring a historical data set, and screening the historical data set according to a pre-compiled check rule to obtain a training sample data set;
identifying a data version of sample data in the training sample data set;
when the sample data in the training sample data set is a first version, calling a test case corresponding to the first version as a test case of the training sample data set;
when the sample data in the training sample data set is a second version, calculating a first similarity between the training sample data set and pre-acquired reference data, and taking test cases corresponding to one or more groups of reference data with the first similarity being greater than or equal to a preset similarity threshold as the test cases of the training sample data set;
extracting field-level data in one or more groups of corresponding reference data of which the first similarity is greater than or equal to a preset similarity threshold, calculating second similarities between the field-level data and a plurality of standard fields in a standard field library, and taking test cases corresponding to the standard fields of which the second similarities are greater than or equal to the field threshold as field-level test cases;
determining at least one item in the test cases of the training sample data set, the field level test cases and the table level test cases corresponding to the field level test cases as a target test case;
and inputting the target test case into a preset test engine, and executing the target test case to test the pre-acquired data to be tested by using the test engine to obtain a test result.
Optionally, the screening the historical data set according to a pre-written verification rule to obtain a training sample data set, including:
verifying the historical data set by using a basic data verification rule in the verification rules to obtain first data which accords with the basic data verification rule;
screening out second data which accord with the user-defined check rule in the check rule from the historical data set;
and summarizing the first data and the second data to obtain a training sample data set.
Optionally, the verifying the historical data set by using a basic data verification rule in the verification rules to obtain first data meeting the basic data verification rule includes:
determining whether there is duplicate historical data in the historical data set;
and if the repeated historical data exist, deleting the repeated historical data, and screening out the historical data with the acquisition time being greater than or equal to a preset time threshold value in the historical data set as first data.
Optionally, the calculating a first similarity between the training sample data set and pre-acquired reference data includes:
calculating a covariance between the training sample data set and the reference data;
and calculating according to the covariance and a preset Pearson correlation formula to obtain a first similarity between the training sample data set and the pre-acquired reference data.
Optionally, said calculating covariance between said training sample data set and said reference data comprises:
calculating the covariance between the training sample data set and the reference data using the following formula:
$$\operatorname{cov}(X, Y) = E\left[(X - \mu)(Y - \upsilon)\right]$$
wherein cov(X, Y) is the covariance, X is the training sample data set, Y is the reference data, μ is the mathematical expectation of the training sample data set, and υ is the mathematical expectation of the reference data.
Optionally, the calculating according to the covariance and a preset pearson correlation formula to obtain a first similarity between the training sample data set and the pre-acquired reference data includes:
and calculating to obtain a first similarity between the training sample data set and the pre-acquired reference data by using the following formula:
$$\rho_{x,y} = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y}$$
where $\rho_{x,y}$ is the first similarity, cov(X, Y) is the covariance, and $\sigma_x$ and $\sigma_y$ are the standard deviations of the training sample data set and the pre-acquired reference data, respectively.
Optionally, the identifying a data version of sample data in the training sample data set comprises:
acquiring a version information corresponding table;
and identifying the data version corresponding to the sample data according to the mapping relation between the sample data and the data version in the version information corresponding table.
In order to solve the above problem, the present invention also provides a data testing apparatus, comprising:
the data screening module is used for acquiring a historical data set, and screening the historical data set according to a pre-compiled check rule to obtain a training sample data set;
the data version identification module is used for identifying the data version of the sample data in the training sample data set;
a test case generation module, configured to: when the sample data in the training sample data set is a first version, call a test case corresponding to the first version as the test case of the training sample data set; when the sample data in the training sample data set is a second version, calculate a first similarity between the training sample data set and pre-acquired reference data, and use the test cases corresponding to one or more groups of reference data whose first similarity is greater than or equal to a preset similarity threshold as the test cases of the training sample data set; extract field-level data from the one or more groups of reference data whose first similarity is greater than or equal to the preset similarity threshold; calculate second similarities between the field-level data and a plurality of standard fields in a standard field library; and use the test cases corresponding to the standard fields whose second similarity is greater than or equal to the field threshold as field-level test cases;
and the test execution module is used for determining at least one of the obtained test cases of the training sample data set, the field level test cases and the table level test cases corresponding to the field level test cases as a target test case, inputting the target test case into a preset test engine, and executing the target test case on the pre-obtained data to be tested by using the test engine to obtain a test result.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
and the processor executes the instructions stored in the memory to realize the data testing method.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, which stores at least one instruction, and the at least one instruction is executed by a processor in an electronic device to implement the data testing method.
In the invention, a pre-written check rule is applied to the historical data set to obtain a training sample data set, so that only data meeting the requirements is used as training sample data. The data version of the sample data in the training sample data set is then identified, and processing proceeds according to that version. When the sample data is of the first version, the test case corresponding to the first version is called as the test case of the training sample data set; because the first version is an old version for which preset test cases already exist, calling them directly improves the efficiency of data testing. When the sample data is of the second version, the first similarity between the training sample data set and the pre-acquired reference data is calculated, and the test cases corresponding to the one or more groups of reference data screened out by the first similarity are used as the test cases of the training sample data set. Field-level data is then extracted from that reference data, the second similarity is calculated, and the test cases corresponding to standard fields whose second similarity is greater than or equal to the field threshold are used as field-level test cases; screening at the field level ensures the accuracy of the test cases. Finally, the target test case is input into the test engine, which executes it on the data to be tested to obtain a test result. Therefore, the data testing method, the data testing device, the electronic equipment and the computer-readable storage medium provided by the invention can solve the problem of low data testing efficiency.
Drawings
Fig. 1 is a schematic flow chart of a data testing method according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of a data testing apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device implementing the data testing method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a data testing method. The execution subject of the data testing method includes, but is not limited to, at least one electronic device, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of the present application. In other words, the data testing method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Fig. 1 is a schematic flow chart of a data testing method according to an embodiment of the present invention.
In this embodiment, the data testing method includes:
S1, obtaining a historical data set, and screening the historical data set according to a pre-written check rule to obtain a training sample data set.
In the embodiment of the invention, the historical data set can be obtained from the database that stores it by using Java statements with a data calling function, wherein the historical data set comprises data information such as data source tables, calculation rules, calculation cycles, release cycles and fields.
Specifically, the screening the historical data set according to a pre-written verification rule to obtain a training sample data set includes:
verifying the historical data set by using a basic data verification rule in the verification rules to obtain first data which accords with the basic data verification rule;
screening out second data which accord with the user-defined check rule in the check rule from the historical data set;
and summarizing the first data and the second data to obtain a training sample data set.
In detail, the pre-written verification rule includes a basic data verification rule and a user-defined verification rule, wherein the basic data verification rule performs simple checks and data processing on the basic properties of the historical data set.
In the embodiment of the present invention, the basic data verification rule includes, but is not limited to, deduplication and a check on acquisition time. The user-defined check rule is a check rule written according to the job purpose.
For example, if the job purpose is "obtain a list of female clients whose daily average assets over the past week reached 10,000 yuan", the user-defined check rule may be that the daily average assets are greater than or equal to 10,000 yuan, that the client's gender is female, and so on.
Further, the verifying the historical data set by using a basic data verification rule in the verification rules to obtain first data meeting the basic data verification rule includes:
determining whether there is duplicate historical data in the historical data set;
and if the repeated historical data exist, deleting the repeated historical data, and screening out the historical data with the acquisition time being greater than or equal to a preset time threshold value in the historical data set as first data.
The acquisition time in the historical data set refers to the time at which the data was called from the database.
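The screening in S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout (dictionaries with an `acquired_at` key), the function name `screen_history`, and the reading of "summarizing" as an order-preserving union are all assumptions. The custom rule is also applied to the deduplicated records, which is a simplification.

```python
from datetime import datetime


def screen_history(records, time_threshold, custom_rule):
    """Return a training sample set from historical records (hypothetical helper)."""
    # Basic rule, part 1: drop exact duplicate records, keeping first occurrences.
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # Basic rule, part 2: keep records acquired at or after the time threshold.
    first_data = [r for r in unique if r["acquired_at"] >= time_threshold]

    # User-defined rule: a predicate written for the job purpose.
    second_data = [r for r in unique if custom_rule(r)]

    # Summarize: union of first and second data without repeating a record.
    merged = list(first_data)
    for rec in second_data:
        if rec not in merged:
            merged.append(rec)
    return merged


def female_with_assets(rec):
    """Example custom rule: female client with daily average assets >= 10,000 yuan."""
    return rec["gender"] == "F" and rec["daily_avg_assets"] >= 10_000
```

Passing the custom rule as a predicate keeps the basic checks reusable across jobs while the job-specific condition varies.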
And S2, identifying the data version of the sample data in the training sample data set.
In this embodiment of the present invention, the identifying the data version of the sample data in the training sample data set includes:
acquiring a version information corresponding table;
and identifying the data version corresponding to the sample data according to the mapping relation between the sample data and the data version in the version information corresponding table.
In detail, the version information correspondence table includes mapping relationships between a plurality of pieces of sample data and data versions, and the data versions of the sample data in the training sample data set can be identified according to the version information correspondence table.
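A version information correspondence table can be pictured as a simple mapping. The sketch below assumes that sample data is keyed by an identifier and that versions are labelled `"first"` and `"second"`; both the labels and the function names are hypothetical.

```python
def identify_version(sample_id, version_table):
    """Look up the data version of one sample in the correspondence table."""
    try:
        return version_table[sample_id]
    except KeyError:
        # The source does not say how unmapped samples are handled; fail loudly.
        raise KeyError(f"no data version recorded for sample {sample_id!r}") from None


def split_by_version(sample_ids, version_table):
    """Group sample identifiers by their data version, preserving input order."""
    groups = {}
    for sample_id in sample_ids:
        version = identify_version(sample_id, version_table)
        groups.setdefault(version, []).append(sample_id)
    return groups
```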
And S3, when the sample data in the training sample data set is a first version, calling a test case corresponding to the first version as the test case of the training sample data set.
In the embodiment of the present invention, the first version is a version corresponding to reference data that is calculated and processed in advance, and when the sample data in the training sample data set is the first version, a test case corresponding to the first version is called as the test case of the training sample data set.
And S4, when the sample data in the training sample data set is a second version, calculating a first similarity between the training sample data set and the pre-acquired reference data, and taking the test cases corresponding to one or more groups of reference data with the first similarity being greater than or equal to a preset similarity threshold as the test cases of the training sample data set.
In the embodiment of the present invention, the second version refers to a version corresponding to reference data that is not subjected to calculation and processing, and when the sample data in the training sample data set is the second version, the corresponding test case cannot be directly obtained, so that similarity calculation and screening are required. The reference data is data which is stored in the database in advance and used for comparison, and can be obtained by calling from the database through a high-level program with a data calling function.
Specifically, the calculating a first similarity between the training sample data set and the pre-acquired reference data includes:
calculating a covariance between the training sample data set and the reference data;
and calculating according to the covariance and a preset Pearson correlation formula to obtain a first similarity between the training sample data set and the pre-acquired reference data.
In detail, said calculating covariance between said training sample data set and said reference data comprises:
calculating the covariance between the training sample data set and the reference data using the following formula:
$$\operatorname{cov}(X, Y) = E\left[(X - \mu)(Y - \upsilon)\right]$$
wherein cov(X, Y) is the covariance, X is the training sample data set, Y is the reference data, μ is the mathematical expectation of the training sample data set, and υ is the mathematical expectation of the reference data.
In detail, the covariance measures how the two variables deviate together from their respective expectations.
Further, the calculating according to the covariance and a preset pearson correlation formula to obtain a first similarity between the training sample data set and the pre-acquired reference data includes:
and calculating to obtain a first similarity between the training sample data set and the pre-acquired reference data by using the following formula:
$$\rho_{x,y} = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y}$$
where $\rho_{x,y}$ is the first similarity, cov(X, Y) is the covariance, and $\sigma_x$ and $\sigma_y$ are the standard deviations of the training sample data set and the pre-acquired reference data, respectively.
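The covariance and Pearson correlation steps can be sketched in a few lines. The source does not say how a data set is turned into numbers, so this example assumes both inputs are already equal-length numeric sequences; the function names are illustrative only.

```python
import math


def covariance(xs, ys):
    """Population covariance E[(X - mu)(Y - upsilon)] of two equal-length sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    mu = sum(xs) / len(xs)
    upsilon = sum(ys) / len(ys)
    return sum((x - mu) * (y - upsilon) for x, y in zip(xs, ys)) / len(xs)


def first_similarity(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    sigma_x = math.sqrt(covariance(xs, xs))
    sigma_y = math.sqrt(covariance(ys, ys))
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance(xs, ys) / (sigma_x * sigma_y)
```

The result lies in [-1, 1]; values close to 1 indicate that the sample data and the reference data vary together.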
Specifically, the first similarity is compared with the preset similarity threshold, and the test cases corresponding to the one or more groups of reference data whose first similarity is greater than or equal to the preset similarity threshold are used as the test cases of the training sample data set.
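The threshold screening can be illustrated with a short sketch. It assumes the first similarities have already been computed per group of reference data and that each group's test cases are stored in a dictionary; the identifiers and case names below are hypothetical.

```python
def select_sample_set_cases(similarities, cases_by_reference, threshold):
    """Collect test cases of every reference group whose similarity meets the threshold."""
    selected = []
    for ref_id, score in similarities.items():
        if score >= threshold:  # "greater than or equal to", as in the text
            selected.extend(cases_by_reference.get(ref_id, []))
    return selected
```

With similarities `{"ref_a": 0.92, "ref_b": 0.40, "ref_c": 0.80}` and a threshold of 0.8, the cases of `ref_a` and `ref_c` are selected and those of `ref_b` are skipped.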
S5, extracting field level data in one or more groups of corresponding reference data with the first similarity being greater than or equal to a preset similarity threshold, calculating second similarities between the field level data and a plurality of standard fields in a standard field library, and taking the test case corresponding to the standard field with the second similarity being greater than or equal to the field threshold as the field level test case.
In the embodiment of the invention, the field level data in the corresponding one or more reference data with the first similarity being greater than or equal to the preset similarity threshold is extracted, the second similarity between the field level data and a plurality of standard fields in a standard field library is calculated, the test cases are screened according to the second similarity, the corresponding test cases can be further screened from the field angle based on the field level similarity calculation, and the accuracy of the test cases is ensured.
Specifically, the field-level data in the corresponding one or more reference data with the first similarity greater than or equal to the preset similarity threshold is extracted by tracing the source of the fields in the table through SQL parsing.
Further, calculating a second similarity between the field-level data and a plurality of standard fields in a standard field library, comprising:
and respectively calculating second similarity between the field-level data and a plurality of standard fields in the standard field library by using a similarity formula.
In detail, the embodiment of the present invention may use various calculation methods to calculate the second similarity between the field-level data and the plurality of standard fields in the standard field library, including, but not limited to, a cosine similarity formula, Euclidean distance, and the like.
Optionally, in an embodiment of the present invention, the calculating the second similarity between the field-level data and the plurality of standard fields in the standard field library includes:
calculating a second similarity between the field-level data and a plurality of standard fields in the standard field library using the following formula:
$$\cos(a, b) = \frac{a \cdot b}{|a|\,|b|}$$
wherein cos(a, b) is the second similarity, a is the field vector, b is the standard vector, and |a| and |b| are the moduli of the field vector and the standard vector, respectively.
The field level data and the plurality of standard fields in the standard field library can be vectorized according to a preset word2vec algorithm to obtain the field vector and the standard vector.
Specifically, the test case corresponding to the standard field with the second similarity greater than or equal to the field threshold is used as the field-level test case.
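The field-level screening can be sketched as below. The vectors are assumed to come from a word2vec-style embedding that is not shown here; the two-dimensional vectors, field names and case names in the usage note are made up purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def field_level_cases(field_vector, standard_fields, field_threshold):
    """standard_fields maps a field name to (standard vector, list of test cases)."""
    matched = []
    for _name, (standard_vector, cases) in standard_fields.items():
        if cosine_similarity(field_vector, standard_vector) >= field_threshold:
            matched.extend(cases)
    return matched
```

For a field vector `[1.0, 0.0]`, a standard field with vector `[0.9, 0.1]` passes a threshold of 0.8, while one with vector `[0.0, 1.0]` does not.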
S6, determining at least one item of the test cases of the training sample data set, the field level test cases and the table level test cases corresponding to the field level test cases as target test cases.
In the embodiment of the present invention, the table-level test case corresponding to a field-level test case refers to the test case of the reference data from which the field-level data was extracted. The target test case is determined according to the obtained response request: when the request is a full test, the test case of the training sample data set is determined as the target test case; when the request is a field test, the field-level test case is determined as the target test case; and when the request is a table-level test, the table-level test case corresponding to the field-level test case is determined as the target test case.
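The request-driven choice in S6 amounts to a three-way dispatch. In the sketch below the request labels `"full"`, `"field"` and `"table"` are hypothetical names for the three request types described above; the source does not specify how requests are encoded.

```python
def determine_target_cases(request, sample_set_cases, field_cases, table_cases):
    """Pick the target test cases according to the type of response request."""
    by_request = {
        "full": sample_set_cases,   # full test -> test cases of the training sample data set
        "field": field_cases,       # field test -> field-level test cases
        "table": table_cases,       # table-level test -> table-level test cases
    }
    if request not in by_request:
        raise ValueError(f"unknown request type: {request!r}")
    # Return a copy so callers cannot mutate the stored case lists.
    return list(by_request[request])
```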
S7, inputting the target test case into a preset test engine, and executing the target test case to test the pre-acquired data to be tested by using the test engine to obtain a test result.
In the embodiment of the invention, the Hive engine is used for the test cases of the training sample data set and the field-level test cases to ensure the stability of the SQL queries, and the Presto engine is used for the table-level test cases corresponding to the field-level test cases to improve test efficiency.
In detail, the Hive engine is a data warehouse infrastructure tool for processing structured data in Hadoop. It is built on top of Hadoop to facilitate query and analysis, provides a simple SQL query function, and can convert SQL statements into MapReduce tasks for execution. The Presto engine is an open-source distributed SQL query engine suited to real-time interactive analysis and query over massive data, which addresses the problem of low processing speed.
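The engine assignment described above can be expressed as a routing table. This sketch only plans which engine would run which query; it makes no calls to Hive or Presto, and the case-kind labels and SQL strings are illustrative assumptions.

```python
# Routing taken from the text: sample-set and field-level cases go to Hive,
# table-level cases go to Presto. The kind labels are hypothetical.
ENGINE_BY_KIND = {"sample_set": "hive", "field": "hive", "table": "presto"}


def plan_execution(cases):
    """Group (kind, sql) test cases by the engine that should execute them."""
    plan = {}
    for kind, sql in cases:
        engine = ENGINE_BY_KIND[kind]
        plan.setdefault(engine, []).append(sql)
    return plan
```

Keeping the routing in one table makes it easy to change the engine for a kind of test case without touching the code that submits queries.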
The data to be tested can be data which needs to be detected and evaluated and is obtained in a daily test environment.
In this embodiment, after the data to be tested is obtained, the test engine executes the target test case on the pre-obtained data to be tested to obtain a test result.
In the invention, a pre-written check rule is applied to the historical data set to obtain a training sample data set, so that only data meeting the requirements is used as training sample data. The data version of the sample data in the training sample data set is then identified, and processing proceeds according to that version. When the sample data is of the first version, the test case corresponding to the first version is called as the test case of the training sample data set; because the first version is an old version for which preset test cases already exist, calling them directly improves the efficiency of data testing. When the sample data is of the second version, the first similarity between the training sample data set and the pre-acquired reference data is calculated, and the test cases corresponding to the one or more groups of reference data screened out by the first similarity are used as the test cases of the training sample data set. Field-level data is then extracted from that reference data, the second similarity is calculated, and the test cases corresponding to standard fields whose second similarity is greater than or equal to the field threshold are used as field-level test cases; screening at the field level ensures the accuracy of the test cases. Finally, the target test case is input into the test engine, which executes it on the data to be tested to obtain a test result. Therefore, the data testing method provided by the invention can solve the problem of low data testing efficiency.
Fig. 2 is a functional block diagram of a data testing apparatus according to an embodiment of the present invention.
The data testing device 100 of the present invention can be installed in an electronic device. According to the implemented functions, the data testing apparatus 100 may include a data screening module 101, a data version identification module 102, a test case generation module 103, and a test execution module 104. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the data screening module 101 is configured to obtain a historical data set, and screen the historical data set according to a pre-compiled check rule to obtain a training sample data set;
the data version identification module 102 is configured to identify a data version of sample data in the training sample data set;
the test case generating module 103 is configured to: when the sample data in the training sample data set is a first version, call a test case corresponding to the first version as the test case of the training sample data set; when the sample data in the training sample data set is a second version, calculate a first similarity between the training sample data set and pre-acquired reference data, and use the test cases corresponding to one or more groups of reference data whose first similarity is greater than or equal to a preset similarity threshold as the test cases of the training sample data set; extract field-level data from the one or more groups of reference data whose first similarity is greater than or equal to the preset similarity threshold; calculate second similarities between the field-level data and a plurality of standard fields in a standard field library; and use the test cases corresponding to the standard fields whose second similarity is greater than or equal to the field threshold as field-level test cases;
the test execution module 104 is configured to determine at least one of the obtained test cases of the training sample data set, the field-level test cases, and the table-level test cases corresponding to the field-level test cases as a target test case, input the target test case into a preset test engine, and execute the target test case on the pre-obtained data to be tested by using the test engine to obtain a test result.
In detail, the modules of the data testing apparatus 100 perform the following steps:
Step one: obtaining a historical data set, and screening the historical data set according to a pre-written check rule to obtain a training sample data set.
In the embodiment of the invention, the historical data set can be obtained from the database that stores it by using Java statements with a data calling function, wherein the historical data set comprises data information such as data source tables, calculation rules, calculation cycles, release cycles and fields.
Specifically, the screening the historical data set according to a pre-written verification rule to obtain a training sample data set includes:
verifying the historical data set by using a basic data verification rule in the verification rules to obtain first data which accords with the basic data verification rule;
screening out second data which accord with the user-defined check rule in the check rule from the historical data set;
and summarizing the first data and the second data to obtain a training sample data set.
In detail, the pre-written verification rule includes a basic data verification rule and a user-defined verification rule, wherein the basic data verification rule covers simple verification and data processing of the basic properties of the historical data set.
In the embodiment of the present invention, the basic data verification rule includes, but is not limited to, deduplication processing and checking against acquisition time. The user-defined verification rule is a rule written according to the job purpose.
For example, if the job purpose is "obtain a list of female clients whose daily-average assets over the past week reach 10,000 yuan", the user-defined verification rule may require that the daily-average assets be greater than or equal to 10,000 yuan and that the gender of the client be female.
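The user-defined rule in this example can be sketched as a simple predicate. This is a minimal illustration only: the record type `CustomerRecord`, its field names, and the function names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    # Hypothetical record layout, assumed for illustration only.
    customer_id: str
    gender: str
    daily_avg_assets: float  # yuan, averaged over the past week


def meets_custom_rule(record: CustomerRecord, min_assets: float = 10_000.0) -> bool:
    """User-defined rule: female client with daily-average assets >= min_assets."""
    return record.gender == "female" and record.daily_avg_assets >= min_assets


def screen_second_data(history: list) -> list:
    """Keep only the records that satisfy the user-defined rule (the 'second data')."""
    return [r for r in history if meets_custom_rule(r)]


history = [
    CustomerRecord("c1", "female", 12_500.0),
    CustomerRecord("c2", "male", 50_000.0),
    CustomerRecord("c3", "female", 9_999.0),
    CustomerRecord("c4", "female", 10_000.0),
]
second_data = screen_second_data(history)
```

Here the threshold is inclusive, matching the "greater than or equal to" wording of the rule.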
Further, the verifying the historical data set by using a basic data verification rule in the verification rules to obtain first data meeting the basic data verification rule includes:
determining whether there is duplicate historical data in the historical data set;
and if the repeated historical data exist, deleting the repeated historical data, and screening out the historical data with the acquisition time being greater than or equal to a preset time threshold value in the historical data set as first data.
The acquisition time in the historical data set refers to the time at which the data was retrieved from the database.
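The basic data verification rule (deduplication followed by an acquisition-time check) might be sketched as follows. The dictionary layout, the `acquired_at` key, and the notion that two records are duplicates when all of their fields are equal are assumptions made for this sketch; the patent does not specify them.

```python
from datetime import datetime


def screen_first_data(history: list, time_threshold: datetime) -> list:
    """Drop exact duplicates, then keep rows acquired at or after the threshold."""
    seen = set()
    first_data = []
    for row in history:
        # Assumption: a duplicate is a row whose fields all match an earlier row.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        if row["acquired_at"] >= time_threshold:
            first_data.append(row)
    return first_data


history = [
    {"field": "daily_avg_assets", "acquired_at": datetime(2021, 6, 1)},
    {"field": "daily_avg_assets", "acquired_at": datetime(2021, 6, 1)},  # duplicate
    {"field": "gender", "acquired_at": datetime(2021, 1, 1)},  # before threshold
    {"field": "gender", "acquired_at": datetime(2021, 6, 20)},
]
first_data = screen_first_data(history, datetime(2021, 6, 1))
```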
Step two, identifying the data version of the sample data in the training sample data set.
In this embodiment of the present invention, the identifying the data version of the sample data in the training sample data set includes:
acquiring a version information corresponding table;
and identifying the data version corresponding to the sample data according to the mapping relation between the sample data and the data version in the version information corresponding table.
In detail, the version information correspondence table includes mapping relationships between a plurality of pieces of sample data and data versions, and the data versions of the sample data in the training sample data set can be identified according to the version information correspondence table.
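As a rough illustration, the version information correspondence table can be modelled as a key-value mapping from a sample identifier to its data version. The identifiers, the labels "first" and "second", and the function name below are hypothetical.

```python
from typing import Optional

# Hypothetical version information correspondence table:
# sample identifier -> data version label.
VERSION_TABLE = {
    "sample_001": "first",
    "sample_002": "second",
}


def identify_version(sample_id: str, table: dict) -> Optional[str]:
    """Look up the data version of a sample; returns None if it is not in the table."""
    return table.get(sample_id)


version = identify_version("sample_002", VERSION_TABLE)
```

How a sample missing from the table should be handled is not stated in the source; returning `None` is simply one possible choice.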
Step three, when the sample data in the training sample data set is a first version, calling a test case corresponding to the first version as the test case of the training sample data set.
In the embodiment of the present invention, the first version is a version corresponding to reference data that is calculated and processed in advance, and when the sample data in the training sample data set is the first version, a test case corresponding to the first version is called as the test case of the training sample data set.
Step four, when the sample data in the training sample data set is a second version, calculating a first similarity between the training sample data set and the pre-acquired reference data, and taking the test cases corresponding to one or more groups of reference data whose first similarity is greater than or equal to a preset similarity threshold as the test cases of the training sample data set.
In the embodiment of the present invention, the second version refers to a version corresponding to reference data that is not subjected to calculation and processing, and when the sample data in the training sample data set is the second version, the corresponding test case cannot be directly obtained, so that similarity calculation and screening are required. The reference data is data which is stored in the database in advance and used for comparison, and can be obtained by calling from the database through a high-level program with a data calling function.
Specifically, the calculating a first similarity between the training sample data set and the pre-acquired reference data includes:
calculating a covariance between the training sample data set and the reference data;
and calculating according to the covariance and a preset Pearson correlation formula to obtain a first similarity between the training sample data set and the pre-acquired reference data.
In detail, said calculating covariance between said training sample data set and said reference data comprises:
calculating the covariance between the training sample data set and the reference data using the following formula:
$$\operatorname{cov}(X,Y)=E\left[(X-\mu)(Y-\upsilon)\right]$$
wherein cov(X, Y) is the covariance, X is the training sample data set, Y is the reference data, μ is the mathematical expectation of the training sample data set, and υ is the mathematical expectation of the reference data.
In detail, the covariance measures how the two variables vary together.
Further, the calculating according to the covariance and a preset pearson correlation formula to obtain a first similarity between the training sample data set and the pre-acquired reference data includes:
and calculating to obtain a first similarity between the training sample data set and the pre-acquired reference data by using the following formula:
$$\rho_{X,Y}=\frac{\operatorname{cov}(X,Y)}{\sigma_X\,\sigma_Y}$$
where ρ_{X,Y} is the first similarity, cov(X, Y) is the covariance, and σ_X and σ_Y are the standard deviations of the training sample data set and the pre-acquired reference data, respectively.
Specifically, the first similarity is compared with the preset similarity threshold, and the test cases corresponding to the one or more groups of reference data whose first similarity is greater than or equal to the preset similarity threshold are used as the test cases of the training sample data set.
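The covariance, Pearson correlation, and threshold screening described above can be sketched as follows. The sketch assumes that the training sample data set and each group of reference data have already been reduced to equal-length, non-constant numeric sequences; the patent does not say how that reduction is done, and the group names and threshold value are hypothetical.

```python
from math import sqrt


def covariance(x: list, y: list) -> float:
    """Population covariance: E[(X - mu)(Y - upsilon)]."""
    mu = sum(x) / len(x)
    upsilon = sum(y) / len(y)
    return sum((a - mu) * (b - upsilon) for a, b in zip(x, y)) / len(x)


def pearson(x: list, y: list) -> float:
    """First similarity: cov(X, Y) / (sigma_X * sigma_Y)."""
    sigma_x = sqrt(covariance(x, x))
    sigma_y = sqrt(covariance(y, y))
    return covariance(x, y) / (sigma_x * sigma_y)


def select_reference_groups(sample: list, references: dict, threshold: float) -> list:
    """Names of reference groups whose first similarity is >= threshold."""
    return [name for name, ref in references.items()
            if pearson(sample, ref) >= threshold]


sample = [1.0, 2.0, 3.0, 4.0]
references = {
    "ref_a": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with the sample
    "ref_b": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated
    "ref_c": [1.0, 3.0, 2.0, 4.0],  # moderately correlated
}
selected = select_reference_groups(sample, references, threshold=0.9)
```

The test cases registered for the selected groups would then be taken as the test cases of the training sample data set.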
Step five, extracting field-level data from the one or more groups of reference data whose first similarity is greater than or equal to the preset similarity threshold, calculating second similarities between the field-level data and a plurality of standard fields in a standard field library, and taking the test case corresponding to a standard field whose second similarity is greater than or equal to the field threshold as the field-level test case.
In the embodiment of the invention, field-level data is extracted from the one or more groups of reference data whose first similarity is greater than or equal to the preset similarity threshold, second similarities between the field-level data and a plurality of standard fields in a standard field library are calculated, and the test cases are screened according to the second similarities. Screening on field-level similarity further narrows the test cases from the field perspective and helps ensure their accuracy.
Specifically, the field-level data in the one or more groups of reference data whose first similarity is greater than or equal to the preset similarity threshold is extracted by tracing the source of each field in the table through SQL parsing.
Further, calculating a second similarity between the field-level data and a plurality of standard fields in a standard field library, comprising:
and respectively calculating second similarity between the field-level data and a plurality of standard fields in the standard field library by using a similarity formula.
In detail, the embodiment of the present invention may employ various calculation methods to obtain the second similarity between the field-level data and the plurality of standard fields in the standard field library, including, but not limited to, a cosine similarity formula and the Euclidean distance.
Optionally, in an embodiment of the present invention, the calculating the second similarity between the field-level data and the plurality of standard fields in the standard field library includes:
calculating a second similarity between the field-level data and a plurality of standard fields in the standard field library using the following formula:
$$\cos(a,b)=\frac{a\cdot b}{|a|\,|b|}$$
wherein cos(a, b) is the second similarity, a is the field vector, b is the standard vector, and |a| and |b| are the moduli of the field vector and the standard vector, respectively.
The field level data and the plurality of standard fields in the standard field library can be vectorized according to a preset word2vec algorithm to obtain the field vector and the standard vector.
Specifically, the test case corresponding to the standard field with the second similarity greater than or equal to the field threshold is used as the field-level test case.
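A minimal sketch of the cosine-similarity screening is given below. The vectors are small hand-written stand-ins for the word2vec embeddings mentioned above, and the standard field names and the field threshold are hypothetical.

```python
from math import sqrt


def cosine_similarity(a: list, b: list) -> float:
    """Second similarity: (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_standard_fields(field_vector: list, standard_vectors: dict,
                          field_threshold: float) -> list:
    """Names of standard fields whose second similarity is >= field_threshold."""
    return [name for name, vec in standard_vectors.items()
            if cosine_similarity(field_vector, vec) >= field_threshold]


# Toy vectors standing in for word2vec output (assumed, not from the patent).
field_vector = [1.0, 0.0, 1.0]
standard_vectors = {
    "customer_gender": [1.0, 0.0, 1.0],
    "account_balance": [0.0, 1.0, 0.0],
    "customer_name": [1.0, 1.0, 1.0],
}
matched = match_standard_fields(field_vector, standard_vectors, field_threshold=0.9)
```

The test cases associated with the matched standard fields would then serve as the field-level test cases.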
Step six, determining at least one of the test cases of the training sample data set, the field-level test cases, and the table-level test cases corresponding to the field-level test cases as a target test case.
In the embodiment of the present invention, the table-level test case corresponding to a field-level test case refers to the test case of the reference data from which the field-level data was extracted. The target test case is further determined according to the obtained request: when the request is for a full test, the test cases of the training sample data set are determined as the target test case; when the request is for a field test, the field-level test cases are determined as the target test case; and when the request is for a table-level test, the table-level test cases corresponding to the field-level test cases are determined as the target test case.
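The request-driven selection of the target test case could be sketched as a simple dispatch. The request labels "full", "field" and "table" and the test case names are hypothetical placeholders for whatever the request actually carries.

```python
def select_target_cases(request_type: str, dataset_cases: list,
                        field_cases: list, table_cases: list) -> list:
    """Pick the target test cases that correspond to the requested test scope."""
    by_request = {
        "full": dataset_cases,   # full test -> test cases of the training sample data set
        "field": field_cases,    # field test -> field-level test cases
        "table": table_cases,    # table-level test -> table-level test cases
    }
    if request_type not in by_request:
        raise ValueError(f"unknown request type: {request_type!r}")
    return by_request[request_type]


target = select_target_cases(
    "field",
    dataset_cases=["case_dataset_1"],
    field_cases=["case_field_gender"],
    table_cases=["case_table_customers"],
)
```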
Step seven, inputting the target test case into a preset test engine, and using the test engine to execute the target test case on the pre-acquired data to be tested to obtain a test result.
In the embodiment of the invention, a Hive engine is used for the test cases of the training sample data set and the field-level test cases to ensure the stability of the SQL queries, and a Presto engine is used for the table-level test cases corresponding to the field-level test cases to improve the test efficiency.
In detail, the Hive engine is a data warehouse tool used in Hadoop to process structured data. It is built on top of Hadoop to facilitate query and analysis, provides a simple SQL query function, and can convert SQL statements into MapReduce tasks for execution. The Presto engine is an open-source distributed SQL query engine that is suitable for real-time interactive analysis and query, supports massive data, and can solve the problem of low processing speed.
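The engine assignment described above amounts to a routing rule keyed on the test case level. The sketch below only returns an engine name; the level labels are hypothetical, and it does not call any Hive or Presto client API.

```python
def choose_engine(case_level: str) -> str:
    """Route a test case to an engine name according to its level."""
    if case_level in ("dataset", "field"):
        # Data-set and field-level test cases: Hive, for stable SQL queries.
        return "hive"
    if case_level == "table":
        # Table-level test cases: Presto, for faster interactive queries.
        return "presto"
    raise ValueError(f"unknown test case level: {case_level!r}")


engines = {level: choose_engine(level) for level in ("dataset", "field", "table")}
```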
The data to be tested can be data which needs to be detected and evaluated and is obtained in a daily test environment.
In this embodiment, after the data to be tested is obtained, the test engine may be used to execute the target test case on the pre-obtained data to be tested, so as to obtain a test result.
According to the invention, a pre-written check rule is executed on the historical data set, and the data that meets the requirements is screened out as the training sample data set. The data version of the sample data in the training sample data set is then identified, and processing proceeds according to that version. When the sample data is a first version, the test case corresponding to the first version is called as the test case of the training sample data set; the first version is an old version for which a preset test case already exists, so calling that test case directly improves the efficiency of data testing. When the sample data is a second version, a first similarity between the training sample data set and the pre-acquired reference data is calculated, and the test cases corresponding to the one or more groups of reference data screened out by the first similarity are used as the test cases of the training sample data set. Field-level data is then extracted from that reference data, second similarities are calculated and screened, and the test cases corresponding to the standard fields whose second similarity is greater than or equal to the field threshold are used as field-level test cases; screening from the perspective of field-level data helps ensure the accuracy of the test cases. Finally, the target test case is input into the test engine, and the engine executes it on the data to be tested to obtain a test result. Therefore, the data testing device provided by the invention can solve the problem of low data testing efficiency.
Fig. 3 is a schematic structural diagram of an electronic device implementing a data testing method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11, a communication interface 12 and a bus 13, and may further comprise a computer program, such as a data testing program, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of a data test program, but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (e.g., data test programs, etc.) stored in the memory 11 and calling data stored in the memory 11.
The communication interface 12 is used for communication between the electronic device and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), which are typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable, among other things, for displaying information processed in the electronic device and for displaying a visualized user interface.
The bus 13 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 13 may be divided into an address bus, a data bus, a control bus, etc. The bus 13 is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The data test program stored in the memory 11 of the electronic device 1 is a combination of instructions, which when executed in the processor 10, can implement:
acquiring a historical data set, and screening the historical data set according to a pre-compiled check rule to obtain a training sample data set;
identifying a data version of sample data in the training sample data set;
when the sample data in the training sample data set is a first version, calling a test case corresponding to the first version as a test case of the training sample data set;
when the sample data in the training sample data set is a second version, calculating a first similarity between the training sample data set and pre-acquired reference data, and taking test cases corresponding to one or more groups of reference data with the first similarity being greater than or equal to a preset similarity threshold as the test cases of the training sample data set;
extracting field-level data in one or more groups of corresponding reference data of which the first similarity is greater than or equal to a preset similarity threshold, calculating second similarities between the field-level data and a plurality of standard fields in a standard field library, and taking test cases corresponding to the standard fields of which the second similarities are greater than or equal to the field threshold as field-level test cases;
determining at least one item in the test cases of the training sample data set, the field level test cases and the table level test cases corresponding to the field level test cases as a target test case;
and inputting the target test case into a preset test engine, and executing the target test case to test the pre-acquired data to be tested by using the test engine to obtain a test result.
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. The computer readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
acquiring a historical data set, and screening the historical data set according to a pre-compiled check rule to obtain a training sample data set;
identifying a data version of sample data in the training sample data set;
when the sample data in the training sample data set is a first version, calling a test case corresponding to the first version as a test case of the training sample data set;
when the sample data in the training sample data set is a second version, calculating a first similarity between the training sample data set and pre-acquired reference data, and taking test cases corresponding to one or more groups of reference data with the first similarity being greater than or equal to a preset similarity threshold as the test cases of the training sample data set;
extracting field-level data in one or more groups of corresponding reference data of which the first similarity is greater than or equal to a preset similarity threshold, calculating second similarities between the field-level data and a plurality of standard fields in a standard field library, and taking test cases corresponding to the standard fields of which the second similarities are greater than or equal to the field threshold as field-level test cases;
determining at least one item in the test cases of the training sample data set, the field level test cases and the table level test cases corresponding to the field level test cases as a target test case;
and inputting the target test case into a preset test engine, and executing the target test case to test the pre-acquired data to be tested by using the test engine to obtain a test result.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method for data testing, the method comprising:
acquiring a historical data set, and screening the historical data set according to a pre-compiled check rule to obtain a training sample data set;
identifying a data version of sample data in the training sample data set;
when the sample data in the training sample data set is a first version, calling a test case corresponding to the first version as a test case of the training sample data set;
when the sample data in the training sample data set is a second version, calculating a first similarity between the training sample data set and pre-acquired reference data, and taking test cases corresponding to one or more groups of reference data with the first similarity being greater than or equal to a preset similarity threshold as the test cases of the training sample data set;
extracting field-level data in one or more groups of corresponding reference data of which the first similarity is greater than or equal to a preset similarity threshold, calculating second similarities between the field-level data and a plurality of standard fields in a standard field library, and taking test cases corresponding to the standard fields of which the second similarities are greater than or equal to the field threshold as field-level test cases;
determining at least one item in the test cases of the training sample data set, the field level test cases and the table level test cases corresponding to the field level test cases as a target test case;
and inputting the target test case into a preset test engine, and executing the target test case to test the pre-acquired data to be tested by using the test engine to obtain a test result.
2. The data testing method of claim 1, wherein the screening the historical data set according to a pre-written verification rule to obtain a training sample data set, comprises:
verifying the historical data set by using a basic data verification rule in the verification rules to obtain first data which accords with the basic data verification rule;
screening out second data which accord with the user-defined check rule in the check rule from the historical data set;
and summarizing the first data and the second data to obtain a training sample data set.
3. The data testing method of claim 2, wherein the verifying the historical data set using a base data verification rule of the verification rules to obtain first data that conforms to the base data verification rule comprises:
determining whether there is duplicate historical data in the historical data set;
and if the repeated historical data exist, deleting the repeated historical data, and screening out the historical data with the acquisition time being greater than or equal to a preset time threshold value in the historical data set as first data.
4. The data testing method of claim 1, wherein said calculating a first similarity between the training sample data set and pre-acquired reference data comprises:
calculating a covariance between the training sample data set and the reference data;
and calculating according to the covariance and a preset Pearson correlation formula to obtain a first similarity between the training sample data set and the pre-acquired reference data.
5. The data testing method of claim 4, wherein said calculating covariance between said training sample data set and said reference data comprises:
calculating the covariance between the training sample data set and the reference data using the following formula:
$$\operatorname{cov}(X,Y)=E\left[(X-\mu)(Y-\upsilon)\right]$$
wherein cov (X, Y) is the covariance, X is the training sample data set, Y is the reference data, μ represents the mathematical expectation of the training sample data set, and υ is the mathematical expectation of the reference data.
6. The data testing method of claim 4, wherein the calculating according to the covariance and a preset Pearson correlation formula to obtain a first similarity between the training sample data set and pre-acquired reference data comprises:
and calculating to obtain a first similarity between the training sample data set and the pre-acquired reference data by using the following formula:
$$\rho_{X,Y}=\frac{\operatorname{cov}(X,Y)}{\sigma_X\,\sigma_Y}$$
where ρ_{X,Y} is the first similarity, cov(X, Y) is the covariance, and σ_X and σ_Y are the standard deviations of the training sample data set and the pre-acquired reference data, respectively.
7. The data testing method of claim 1, wherein said identifying data versions of sample data in the training sample data set comprises:
acquiring a version information corresponding table;
and identifying the data version corresponding to the sample data according to the mapping relation between the sample data and the data version in the version information corresponding table.
8. A data testing apparatus, characterized in that the apparatus comprises:
the data screening module is used for acquiring a historical data set, and screening the historical data set according to a pre-compiled check rule to obtain a training sample data set;
the data version identification module is used for identifying the data version of the sample data in the training sample data set;
a test case generation module, configured to: when the sample data in the training sample data set is a first version, call a test case corresponding to the first version as the test case of the training sample data set; when the sample data in the training sample data set is a second version, calculate a first similarity between the training sample data set and pre-acquired reference data, and use the test cases corresponding to one or more groups of reference data whose first similarity is greater than or equal to a preset similarity threshold as the test cases of the training sample data set; extract field-level data from the one or more groups of reference data whose first similarity is greater than or equal to the preset similarity threshold, calculate a second similarity between the field-level data and a plurality of standard fields in a standard field library, and use the test case corresponding to a standard field whose second similarity is greater than or equal to a field threshold as a field-level test case;
and the test execution module is used for determining at least one of the obtained test cases of the training sample data set, the field level test cases and the table level test cases corresponding to the field level test cases as a target test case, inputting the target test case into a preset test engine, and executing the target test case on the pre-obtained data to be tested by using the test engine to obtain a test result.
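The version-dependent branching performed by the test case generation module can be sketched as follows. This is a simplified illustration rather than the patented implementation: the field-level step is omitted, the similarity measure is passed in as a callable, and every name (`generate_test_cases`, the version labels, the case identifiers) is hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def generate_test_cases(
    version: str,
    sample: Sequence[float],
    first_version_cases: List[str],
    reference_data: Dict[str, Sequence[float]],
    reference_cases: Dict[str, List[str]],
    similarity: Callable[[Sequence[float], Sequence[float]], float],
    threshold: float,
) -> List[str]:
    """Pick test cases for a training sample data set based on its version."""
    if version == "first":
        # First version: reuse the test cases already tied to that version.
        return list(first_version_cases)
    if version == "second":
        # Second version: reuse the cases of every sufficiently similar
        # group of reference data.
        selected: List[str] = []
        for name, values in reference_data.items():
            if similarity(sample, values) >= threshold:
                selected.extend(reference_cases.get(name, []))
        return selected
    raise ValueError(f"unknown data version: {version!r}")


# Toy similarity used only for this example: 1.0 for identical data, else 0.0.
def exact_match(a, b):
    return 1.0 if list(a) == list(b) else 0.0


reference_data = {"ref_1": [1.0, 2.0], "ref_2": [5.0, 6.0]}
reference_cases = {"ref_1": ["case_a"], "ref_2": ["case_b"]}
first = generate_test_cases(
    "first", [1.0, 2.0], ["case_v1"], reference_data, reference_cases, exact_match, 0.8
)
second = generate_test_cases(
    "second", [1.0, 2.0], ["case_v1"], reference_data, reference_cases, exact_match, 0.8
)
```

Passing the similarity function as a parameter keeps the branching logic independent of the particular measure, so a correlation-based measure could be substituted without changing the dispatch.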
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a data testing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a data testing method according to any one of claims 1 to 7.
CN202110721429.XA 2021-06-28 2021-06-28 Data testing method and device, electronic equipment and storage medium Pending CN113434414A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110721429.XA CN113434414A (en) 2021-06-28 2021-06-28 Data testing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113434414A true CN113434414A (en) 2021-09-24

Family

ID=77754973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110721429.XA Pending CN113434414A (en) 2021-06-28 2021-06-28 Data testing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113434414A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111045916A (en) * 2018-10-12 2020-04-21 伊姆西Ip控股有限责任公司 Automated software defect verification
CN111538663A (en) * 2020-04-26 2020-08-14 中国工商银行股份有限公司 Test case generation method and device, computing device and medium
CN112100359A (en) * 2020-10-14 2020-12-18 北京嘀嘀无限科技发展有限公司 Test case searching method, device, equipment and storage medium
CN112231224A (en) * 2020-10-30 2021-01-15 平安银行股份有限公司 Business system testing method, device, equipment and medium based on artificial intelligence
CN112685324A (en) * 2021-01-21 2021-04-20 三一重工股份有限公司 Method and system for generating test scheme
CN113032275A (en) * 2021-04-08 2021-06-25 平安国际智慧城市科技股份有限公司 Method and device for testing field, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113961473A (en) Data testing method and device, electronic equipment and computer readable storage medium
CN114979120B (en) Data uploading method, device, equipment and storage medium
CN112528616A (en) Business form generation method and device, electronic equipment and computer storage medium
CN112732567A (en) Mock data testing method and device based on ip, electronic equipment and storage medium
CN113806434A (en) Big data processing method, device, equipment and medium
CN113434542B (en) Data relationship identification method and device, electronic equipment and storage medium
CN112486957B (en) Database migration detection method, device, equipment and storage medium
CN112104662B (en) Far-end data read-write method, device, equipment and computer readable storage medium
CN112541688A (en) Service data checking method and device, electronic equipment and computer storage medium
CN113434397B (en) Task system testing method and device, electronic equipment and storage medium
CN115033489A (en) Code resource detection method and device, electronic equipment and storage medium
CN114911479A (en) Interface generation method, device, equipment and storage medium based on configuration
CN113469649A (en) Project progress analysis method and device, electronic equipment and storage medium
CN114116488A (en) Method, device and equipment for acquiring test coverage rate information and storage medium
CN113051171A (en) Interface test method, device, equipment and storage medium
CN113434414A (en) Data testing method and device, electronic equipment and storage medium
CN114138243A (en) Function calling method, device, equipment and storage medium based on development platform
CN112686759A (en) Account checking monitoring method, device, equipment and medium
CN112527655A (en) Software version quality abnormity detection method and device, electronic equipment and storage medium
CN111859985A (en) AI customer service model testing method, device, electronic equipment and storage medium
CN114721744A (en) Interface modification evaluation method and device, electronic equipment and readable storage medium
CN113886246A (en) O2O project flow management system testing method, device, equipment and storage medium
CN114398277A (en) Test information marking method, device, equipment and readable storage medium
CN114840438A (en) Text code detection and evaluation method, device, equipment and storage medium
CN115454864A (en) Automatic test method, device, equipment and storage medium for commission calculation system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210924)