CA3144122A1 - Data verifying method, device and system - Google Patents


Info

Publication number
CA3144122A1
Authority
CA
Canada
Prior art keywords
data
task
verifying
target
database
Prior art date
Legal status
Pending
Application number
CA3144122A
Other languages
French (fr)
Inventor
Haiyang Cao
Zhenzhen Wang
Qian Sun
Wenping GUO
Wei Xu
Current Assignee
10353744 Canada Ltd
Original Assignee
10353744 Canada Ltd
Priority date
Filing date
Publication date
Application filed by 10353744 Canada Ltd filed Critical 10353744 Canada Ltd
Publication of CA3144122A1 publication Critical patent/CA3144122A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Pertaining to the field of big data processing technology, the present invention makes public a data verifying method, and corresponding device and system. The method comprises: creating a data verifying task based on an offline data task, the offline data task including extracting target data from a source database and writing the target data in a target database; determining an execution order of the data verifying task relative to the offline data task; executing the data verifying task according to the execution order; and judging during execution whether the target data is abnormal according to an abnormality judging condition, if abnormality is verified, interrupting the data verifying task and generating verification information, and, after receiving data amendment information provided by a user according to the verification information, continuing to execute the data verifying task.

Description

DATA VERIFYING METHOD, DEVICE AND SYSTEM
BACKGROUND OF THE INVENTION
Technical Field [0001] The present invention relates to the field of big data processing technology, and more particularly to a data verifying method, and corresponding device and system.
Description of Related Art
[0002] The data warehouse storage technique (ETL, Extract-Transform-Load) extracts, cleans and transforms data of business systems, then loads the data into a data warehouse for storage and administration, so as to provide basic data for subsequent online analytical processing and data mining. To ensure the quality of the incoming data, data verification should be performed on the data extracted from a data source before the data is loaded into the data warehouse. Data verification is mainly directed to the data type and valuation range of the data, to such bad-point data as invalid and repetitive data, and to checks on uniqueness, relevance, consistency, precision, single fields, and statistic types of record rows, etc. In the state of the art, since quality appraisal criteria differ for different data, a new verifying method must be introduced for each day's data verification; with the increasing volume of incoming data, the pressure on data verification grows ever greater, and it is therefore required to consider a technical solution enabling quick data verification.
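By way of illustration only, the extract-verify-load sequence described in this background can be sketched in a few lines of Python; this is not the patented method, and every name here is hypothetical:

```python
def etl_with_verification(extract, verify, transform, load):
    """Run a minimal ETL pipeline with a verification step between
    extraction and loading, as the related-art discussion describes."""
    data = extract()
    bad = [row for row in data if not verify(row)]
    if bad:
        raise ValueError(f"{len(bad)} record(s) failed verification")
    load([transform(row) for row in data])

warehouse = []
etl_with_verification(
    lambda: [{"id": 1, "qty": 3}, {"id": 2, "qty": 5}],         # extract
    lambda r: isinstance(r.get("qty"), int) and r["qty"] >= 0,  # type & range check
    lambda r: {**r, "total": r["qty"] * 2},                     # transform
    warehouse.extend)                                           # load
```

The point of the sketch is only the ordering: verification sits on the path between the source and the warehouse, so invalid records never reach storage.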
SUMMARY OF THE INVENTION
[0003] In order to overcome the problems pending in the state of the art, embodiments of the present invention provide a data verifying method, and corresponding device and system.

Date Reçue/Date Received 2021-12-29
The technical solutions are as follows:
[0004] According to the first aspect, there is provided a data verifying method that comprises:
[0005] creating a data verifying task based on an offline data task, the offline data task including extracting target data from a source database and writing the target data in a target database;
[0006] determining an execution order of the data verifying task relative to the offline data task;
[0007] executing the data verifying task according to the execution order; and
[0008] judging during execution whether the target data is abnormal according to an abnormality judging condition, if abnormality is verified, interrupting the data verifying task and generating verification information, and, after receiving data amendment information provided by a user according to the verification information, continuing to execute the data verifying task.
[0009] Further, the step of executing the data verifying task according to the execution order includes:
[0010] if the data verifying task is a task being executed, extracting the target data from the source database and writing the target data in a temporary database, and performing synchronous data verification on the target data in the temporary database;
[0011] if the target data passes verification, synchronously writing in the target database the target data in the temporary database, and deleting the temporary database after the target data extracted from the source database has all passed verification and been written in the target database; and
[0012] if the target data does not pass verification, deleting the temporary database.
[0013] Further, the step of executing the data verifying task according to the execution order includes:
[0014] if the data verifying task is a predecessor task, executing the data verifying task in the source database before the target data is extracted; if the target data passes verification, extracting it from the source database and writing it in the target database.
[0015] Further, the step of executing the data verifying task according to the execution order includes:
[0016] if the data verifying task is a successor task, executing the data verifying task in the target database after the target data has been extracted from the source database and written in the target database.
[0017] Further, the step of creating a data verifying task based on an offline data task includes:
[0018] obtaining the offline data task;
[0019] judging whether the offline data task has a corresponding data verifying rule, if yes, configuring the data verifying rule for the offline data task, and obtaining resource metadata and a verification parameter table; and
[0020] creating the data verifying task according to the data verifying rule, the resource metadata and the verification parameter table, wherein the data verifying rule includes the abnormality judging condition and the execution order.
[0021] Further, the step of judging whether the offline data task has a corresponding data verifying rule includes:
[0022] reading a verification rule table and a task ID of the offline data task, wherein the verification rule table contains the data verifying rule to which various task IDs correspond; and
[0023] matching the task IDs with the verification rule table, and determining the data verifying rule to which the offline data task corresponds.
[0024] Further, generation of the verification parameter table includes:
[0025] obtaining tables and/or fields contained in the source database and the target database to which the offline data task corresponds; and
[0026] generating a verification parameter table corresponding to the offline data task according to verification parameters configured by the user for the tables and/or fields.
[0027] Further, the step of obtaining tables and/or fields contained in the source database and the target database to which the offline data task corresponds includes:
[0028] automatically obtaining and analyzing a task script of the offline data task, if analysis succeeds, obtaining tables and/or fields contained in the source database and the target database, if analysis fails, receiving tables and/or fields input by the user.
[0029] According to the second aspect, there is provided a data verifying device that comprises:
[0030] a data verifying task obtaining module, for obtaining a data verifying task based on an offline data task, the offline data task including extracting target data from a source database and writing the target data in a target database;
[0031] an execution order judging module, for determining an execution order of the data verifying task relative to the offline data task; and
[0032] a verifying module, for executing the data verifying task according to the execution order, judging during execution whether the target data is abnormal according to an abnormality judging condition, if abnormality is verified, interrupting the data verifying task and generating verification information, and, after receiving data amendment information provided by a user according to the verification information, continuing to execute the data verifying task.
[0033] Further, the verifying module is specifically employed for:
[0034] if it is judged that the data verifying task is a task being executed, extracting the target data from the source database and writing the target data in a temporary database, and performing synchronous data verification on the target data in the temporary database;
[0035] if the target data passes verification, synchronously writing in the target database the target data in the temporary database, and deleting the temporary database after the target data extracted from the source database has all passed verification and been written in the target database; and
[0036] if the target data does not pass verification, deleting the temporary database.
[0037] Further, the verifying module is specifically employed for:
[0038] if it is judged that the data verifying task is a predecessor task, executing the data verifying task in the source database before the target data is extracted;
[0039] if the target data passes verification, extracting it from the source database and writing it in the target database.
[0040] Further, the verifying module is specifically employed for:
[0041] if it is judged that the data verifying task is a successor task, executing the data verifying task in the target database after the target data has been extracted from the source database and written in the target database.
[0042] Further, the data verifying task obtaining module includes:
[0043] an offline data task obtaining module, for obtaining the offline data task;
[0044] a data verifying task creating module, for judging whether the offline data task has a corresponding data verifying rule, if yes, configuring the data verifying rule for the offline data task, and obtaining resource metadata and a verification parameter table;
and creating the data verifying task according to the data verifying rule, the resource metadata and the verification parameter table, wherein the data verifying rule includes the abnormality judging condition and the execution order.
[0045] Further, the data verifying task creating module includes:
[0046] a verification rule table determining module for:
[0047] reading a verification rule table and a task ID of the offline data task, wherein the verification rule table contains the data verifying rule to which various task IDs correspond; and
[0048] matching the task IDs with the verification rule table, and determining the data verifying rule to which the offline data task corresponds.
[0049] Further, the data verifying task creating module further includes:
[0050] a verification parameter table determining module for:
[0051] obtaining tables and fields contained in the source database and the target database to which the offline data task corresponds; and
[0052] generating a verification parameter table corresponding to the offline data task according to verification parameters configured by the user for tables and/or fields.
[0053] Further, the data verifying task creating module further includes:
[0054] an analyzing module, for automatically obtaining and analyzing a task script of the offline data task, if analysis succeeds, obtaining tables and/or fields contained in the source database and the target database, if analysis fails, receiving tables and/or fields input by the user.
[0055] According to the third aspect, there is provided a computer system that comprises:
[0056] one or more processor(s); and
[0057] a memory, associated with the one or more processor(s), wherein the memory is employed to store a program instruction, and the program instruction executes the method according to any one of the aforementioned first aspect when it is read and executed by the one or more processor(s).
[0058] The technical solutions provided by the embodiments of the present invention bring about the following advantageous effects:
[0059] The technical solutions disclosed by the present invention provide several possibilities for the execution order of data verification relative to the offline data task, realize setup of the execution order through preconfiguration by the user or based on a default configuration generated by automatically obtaining and analyzing scripts, and make it possible to automatically judge the execution order based on the offline data task, whereby both the flexibility and the efficiency of the data verifying operation are enhanced.
[0060] The technical solutions disclosed by the present invention supply a technical solution in which the verifying task is interrupted to generate verification information when abnormality occurs to the data, and the data verifying task continues to be executed after the user has made the amendment; upstream and downstream tasks can be called during the continued execution, so it is not required to notify or operate downstream tasks on a one-by-one basis, nor to perform sorting, appraising and checking operations back and forth, whereby production accidents of data verification are better avoided.
[0061] The technical solutions disclosed by the present invention also contain a task being executed, that is to say, the data verifying task is executed at the same time as the offline data task is being executed, and pressure on data verification is reduced with a temporary database serving as a buffer.
[0062] The technical solutions disclosed by the present invention economize on production machine resources, and prevent abnormal data tasks from being executed and wasting CPU, memory and magnetic disk space, whereby machine cost is further reduced.
BRIEF DESCRIPTION OF THE DRAWINGS
[0063] To more clearly describe the technical solutions in the embodiments of the present invention, drawings required to illustrate the embodiments will be briefly introduced below. Apparently, the drawings introduced below are merely directed to some embodiments of the present invention, while persons ordinarily skilled in the art may further acquire other drawings on the basis of these drawings without spending creative effort in the process.

[0064] Fig. 1 is a flowchart illustrating a data verifying method provided by an embodiment of the present invention;
[0065] Fig. 2 is a view schematically illustrating the structure of a data verifying device provided by an embodiment of the present invention; and
[0066] Fig. 3 is a view schematically illustrating the structure of a computer system provided by an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0067] To make more lucid and clear the objectives, technical solutions and advantages of the present invention, the technical solutions in the embodiments of the present invention will be clearly and comprehensively described below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the embodiments as described are merely partial, rather than the entire, embodiments of the present invention.
Any other embodiments makeable by persons ordinarily skilled in the art on the basis of the embodiments in the present invention without creative effort shall all fall within the protection scope of the present invention.
[0068] As noted in the Description of Related Art, in the data warehouse storage technique, data should be verified in the process of writing the data from the source database in the target database, so as to ensure validity of the incoming data. A main objective of the technical solutions disclosed by the present invention is to propose a data verifying method capable of enhancing flexibility and execution efficiency of data verification, with technical solutions specified as follows:
[0069] S1 - creating a data verifying task based on an offline data task, the offline data task including extracting target data from a source database and writing the target data in a target database.
[0070] The offline data task mainly indicates an offline data task in the data warehouse storage technique (ETL), and the task not only includes extracting data from a source database and writing the data in a target database, but can also include analyzing and processing the data. Specifically, the offline data task can be Sqoop, Datax, Spark, PySpark, SparkSql, Hive, and MR, etc. Accordingly, the above step S1 also includes judging whether the offline task is of a preset offline data task type.
[0071] Taking the SparkSql task for example, Job Schedule Service receives the task from a T_WAIT_FOR_TAKE table, and judges whether the task type is the SparkSql task; specifically, each offline data task type has a task ID, so it is possible to judge the task type according to the task ID.
[0072] In one embodiment, step S1 includes:
[0073] S11 - obtaining the offline data task;
[0074] S12 - judging whether the offline data task has a corresponding data verifying rule, if yes, configuring the data verifying rule for the offline data task, and obtaining resource metadata and a verification parameter table; and creating the data verifying task according to the data verifying rule, the resource metadata and the verification parameter table, wherein the data verifying rule includes the abnormality judging condition and the execution order.
[0075] The data verifying rule, the resource metadata and the verification parameter table are all preconfigured. The abnormality judging condition is a condition to judge whether data is valid, and can specifically judge: whether the data is void; the valuation range of the data;
and the enumeration range of data valuation, etc. The execution order is an execution order of the data verifying rule relative to the offline data task, and includes a predecessor task, a task being executed, and a successor task, of which the predecessor task means that the data verifying task is executed before the data is extracted, the task being executed means that the data verifying task is executed during the process of extracting and importing the data, and the successor task means that the data verifying task is executed after the data has been imported. The resource metadata file is data resource required to execute the data verifying task, and specifically is task configuration or dependency jar resource, and the verification parameter table contains task parameters such as time parameters and frequencies, etc., that are required to execute the data verifying task.
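The abnormality judging conditions (void check, valuation range, enumeration range) and the three execution orders just described can be sketched as follows. This is a hypothetical illustration, not the patent's implementation, and every name is invented:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class ExecutionOrder(Enum):
    """When the verifying task runs relative to the offline data task."""
    PREDECESSOR = "predecessor"  # verify in the source database before extraction
    RUNNING = "running"          # verify in a temporary database during extraction
    SUCCESSOR = "successor"      # verify in the target database after loading

@dataclass
class DataVerifyingRule:
    """A rule pairs an abnormality judging condition with an execution order."""
    name: str
    execution_order: ExecutionOrder
    is_abnormal: Callable[[dict], bool]  # returns True when a record is abnormal

# Example abnormality judging conditions on a record (a plain dict here):
not_null_rule = DataVerifyingRule(
    "amount_not_void", ExecutionOrder.PREDECESSOR,
    lambda row: row.get("amount") is None)
range_rule = DataVerifyingRule(
    "amount_valuation_range", ExecutionOrder.RUNNING,
    lambda row: not (0 <= row.get("amount", -1) <= 10_000))
enum_rule = DataVerifyingRule(
    "status_enumeration_range", ExecutionOrder.SUCCESSOR,
    lambda row: row.get("status") not in {"OK", "FAIL"})
```

A rule is then evaluated at whichever of the three points its `execution_order` selects.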
[0076] Likewise taking the SparkSql task for example, the data verifying rule is stored in a T_JOB_DATA_QUALITY_RULE table. The verification parameter table is specifically a T_JOB_PRAMAS table. The resource metadata is read by reading the configuration value whose key is dataquality.file.id in a T_SYSTEM_CONFIG table as an id to enquire a T_FILE_RESOURCE table, and the resource metadata as read is added to the T_FILE_RESOURCE list to wait for being downloaded. The specific constructing process is as follows:
[0077] writing an environment variable whose key is DATE_QUALITY_JARNAME in the resource name of the resource metadata;
[0078] writing in the data verifying rule according to T_JOB_DATA_QUALITY_RULE;
[0079] reading the T_JOB_PRAMAS table to obtain verification parameters;
[0080] reading the T_FILE_RESOURCE table to obtain the resource metadata, waiting for the resource metadata to be downloaded to completion, and completing creation of the data verifying task.
[0081] In one embodiment, step S12 of judging whether the offline data task has a corresponding data verifying rule includes:
[0082] reading a verification rule table and a task ID of the offline data task, wherein the verification rule table contains the data verifying rule to which various task IDs correspond; and
[0083] matching the task IDs with the verification rule table, and determining the data verifying rule to which the offline data task corresponds.
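A minimal sketch of the matching and creation steps just described: the task ID is looked up in the verification rule table and, if a rule exists, the verifying task is assembled from the rule, the verification parameter table, and the resource metadata. The dictionaries merely stand in for the database tables named in the text, and their contents are invented:

```python
# In-memory stand-ins for the tables named in the description (hypothetical data).
T_JOB_DATA_QUALITY_RULE = {   # verification rule table, keyed by task ID
    "job-7": {"execution_order": "predecessor", "condition": "amount IS NOT NULL"},
}
T_JOB_PRAMAS = {"job-7": {"schedule_time": "02:00", "frequency": "daily"}}
T_SYSTEM_CONFIG = {"dataquality.file.id": "42"}
T_FILE_RESOURCE = {"42": {"resource_name": "DATE_QUALITY_JARNAME"}}

def create_verifying_task(task_id):
    # Match the task ID against the verification rule table.
    rule = T_JOB_DATA_QUALITY_RULE.get(task_id)
    if rule is None:
        return None  # the offline data task has no corresponding verifying rule
    params = T_JOB_PRAMAS.get(task_id, {})            # verification parameter table
    file_id = T_SYSTEM_CONFIG["dataquality.file.id"]  # id used to enquire resources
    resource = T_FILE_RESOURCE[file_id]               # resource metadata to download
    return {"task_id": task_id, "rule": rule,
            "params": params, "resource": resource}
```

A task without a configured rule simply yields no verifying task, mirroring the "if yes" branch in step S12.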
[0084] In one embodiment, generation of the verification parameter table includes:
[0085] obtaining tables and/or fields contained in the source database and the target database to which the offline data task corresponds; and
[0086] generating a verification parameter table corresponding to the offline data task according to verification parameters configured by the user for the tables and/or fields.
[0087] Specifically, taking the SparkSql task for example, verification parameters are configured through the T_JOB_PRAMAS table.
[0088] Preferably, the data verifying rule to which the offline data task corresponds is firstly matched from the verification rule table through the task ID of the offline data task, including predecessor, being executed, and successor, and verification rules are thereafter configured according to the execution order.
[0089] In one embodiment, the step of obtaining tables and/or fields contained in the source database and the target database to which the offline data task corresponds includes:
[0090] automatically obtaining and analyzing a task script of the offline data task, if analysis succeeds, obtaining tables and/or fields contained in the source database and the target database, if analysis fails, receiving tables and/or fields input by the user.
[0091] Specifically, the user develops such offline data tasks as Sqoop, Datax, SparkSql, Hive and MR at the foreground page, and reading/writing script information is verified in real time at the foreground. With respect to such jar-type tasks as SparkSql and so on, the script is captured via external script parameters. Information on the tables and fields of the source database and the target database contained in the offline data task is analyzed via an automatic, real-time sql script lineage (blood) analyzing module. Based on the automatic analysis, data verifying rules and verification parameters of tables or fields of interest to the user are configured.
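The analyze-then-fall-back behaviour can be sketched with a crude regular-expression pass over the script; the real-time lineage analyzing module described above is certainly more sophisticated, so treat this only as an illustration of the control flow:

```python
import re

def tables_from_script(sql_script, ask_user=None):
    """Try to analyze the task script for source/target tables; if analysis
    fails, fall back to tables supplied by the user (a toy regex sketch)."""
    reads = re.findall(r"\bfrom\s+([\w.]+)", sql_script, re.IGNORECASE)
    writes = re.findall(r"\binsert\s+(?:overwrite\s+table|into)\s+([\w.]+)",
                        sql_script, re.IGNORECASE)
    if reads or writes:
        return {"source_tables": reads, "target_tables": writes}
    # Analysis failed: receive tables and/or fields input by the user.
    return ask_user() if ask_user else None

script = "INSERT OVERWRITE TABLE dw.orders_clean SELECT * FROM ods.orders"
```

Joins, subqueries, CTEs and jar-type tasks would defeat a regex like this, which is exactly why the fallback to user input exists.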
[0092] S2 - determining an execution order of the data verifying task relative to the offline data task.
[0093] S3 - executing the data verifying task according to the execution order.
[0094] As previously mentioned, the execution order of the data verifying task relative to the offline data task includes: predecessor, being executed, and successor.
Accordingly, executing the data verifying task according to the execution order can specifically include one or more circumstances in the following embodiment:
[0095] In one embodiment, step S3 of executing the data verifying task according to the execution order includes:
[0096] if the data verifying task is a task being executed, extracting the target data from the source database and writing the target data in a temporary database, and performing synchronous data verification on the target data in the temporary database;
[0097] if the target data passes verification, synchronously writing in the target database the target data in the temporary database, and deleting the temporary database after the target data extracted from the source database has all been written in the target database; and
[0098] if the target data does not pass verification, deleting the temporary database.
[0099] The temporary database includes a temporary datasheet, and the specific writing-in is in the temporary datasheet.
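The "task being executed" flow above, with the temporary database acting as a buffer, might look like this in outline; plain Python lists stand in for the temporary and target databases, and the whole sketch is hypothetical:

```python
def run_with_temporary_buffer(source_rows, is_abnormal, target_db):
    """Stage extracted data in a temporary table, verify it there, and only
    copy verified data into the target database; the temporary table is
    deleted whether verification passes or fails."""
    temp_table = list(source_rows)          # extract into the temporary database
    if any(is_abnormal(r) for r in temp_table):
        temp_table.clear()                  # verification failed: delete temp db
        return False
    target_db.extend(temp_table)            # synchronously write verified data
    temp_table.clear()                      # all written: delete the temp db
    return True

target = []
ok = run_with_temporary_buffer(
    [{"amount": 10}, {"amount": 20}],
    lambda r: r["amount"] is None,          # example abnormality condition
    target)
```

Failed data never touches the target database, which is the point of the buffer.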
[0100] In one embodiment, step S3 of executing the data verifying task according to the execution order includes:
[0101] if the data verifying task is a predecessor task, executing the data verifying task in the source database before the target data is extracted;

[0102] if the target data passes verification, writing the target data extracted from the source database in the target database.
[0103] In one embodiment, step S3 of executing the data verifying task according to the execution order includes:
[0104] if the data verifying task is a successor task, executing the data verifying task in the target database after the target data has been extracted from the source database and written in the target database.
[0105] Among the aforementioned three execution orders of the data verifying task, preferably, the predecessor and successor execution orders are the default execution orders of the data verifying task, while the task being executed execution order must first be enabled by the user before it is configured in the data verifying task.
[0106] During specific execution, it can be sequentially judged in step S2 whether the data verifying task is a predecessor task, a successor task, or a being executed task, taking for example the data verifying task in the SparkSql task:
[0107] judging whether there is a PER_CHECK variable (predecessor) in the environment variable in the data verifying task;
[0108] if there is a PER_CHECK variable, submitting the predecessor data verifying task, and continuing to execute the SparkSql task when the predecessor task is successfully executed;
[0109] if there is no PER_CHECK variable, directly executing the SparkSql task, and judging whether there is a POST_CHECK variable (successor) in the environment variable after the SparkSql task has been executed;
[0110] if yes, submitting the successor data verifying task, and executing the data verifying task; if not, completing task execution.
[0111] judging whether there is a RUNNING_CHECK variable (being executed) in the environment variable in the data verifying task;

[0112] if yes, upgrading the task as a Spark execution engine to execute the Spark task, importing the data from the source database into the temporary database, executing the data verifying task in the temporary database, and writing the data in the target database after verification has succeeded;
[0113] if not, directly executing the SparkSql task.
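The sequential judging of the PER_CHECK and POST_CHECK environment variables can be sketched as follows (the RUNNING_CHECK upgrade path is omitted for brevity; the callables are hypothetical stand-ins for the real verification engine and the SparkSql engine):

```python
def run_sparksql_job(env, run_verify, run_job):
    """Dispatch the verifying task by environment variable: predecessor
    check first, then the SparkSql job, then the successor check."""
    if "PER_CHECK" in env:
        if not run_verify("predecessor"):
            return "interrupted"     # predecessor verification failed
    run_job()                        # execute the SparkSql task itself
    if "POST_CHECK" in env:
        run_verify("successor")      # verify in the target database
    return "done"

calls = []
result = run_sparksql_job(
    {"PER_CHECK": "1", "POST_CHECK": "1"},
    lambda order: calls.append(order) or True,  # record + pass verification
    lambda: calls.append("sparksql"))
```

Absent both variables, the job runs unverified, matching the "directly executing the SparkSql task" branch.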
[0114] In another circumstance, it is sequentially judged in step S2 whether the data verifying task is a predecessor task, a being executed task, or a successor task, taking for example the data verifying task in the SparkSql task:
[0115] judging whether there is a PER_CHECK variable (predecessor) in the environment variable in the data verifying task;
[0116] if there is a PER_CHECK variable, submitting the predecessor data verifying task, and continuing to execute the SparkSql task when the predecessor task is successfully executed;
[0117] if there is no PER_CHECK variable, directly executing the SparkSql task, and judging whether there is a RUNNING_CHECK variable (being executed) in the environment variable after the SparkSql task has been executed;
[0118] if yes, upgrading the task as a Spark execution engine to execute the Spark task, importing the data from the source database into the temporary database, executing the data verifying task in the temporary database, and writing the data in the target database after verification has succeeded;
[0119] if not, judging whether there is a POST_CHECK variable (successor) in the environment variable;
[0120] if yes, submitting the successor data verifying task, and executing the data verifying task; if not, completing task execution.
[0121] S4 - judging during execution whether the target data is abnormal according to an abnormality judging condition, if abnormality is verified, interrupting the data verifying task and generating verification information, and, after receiving data amendment information provided by a user according to the verification information, continuing to execute the data verifying task.
[0122] The verification information mainly indicates a quality report. After it has been judged that abnormality occurs to the data, the data verifying task disclosed by the embodiments of the present invention can be interrupted, and execution of the data verifying task is resumed after the user has processed the abnormal data; upstream and downstream tasks are automatically called during the continued execution, and it is not required to notify or operate downstream tasks on a one-by-one basis. After the data verifying task has been executed to completion, the user is notified to timely check the result. By checking the circumstance of the analyzed data quality in the quality report, by starting an offline quality report analyzing module, and by automatically collecting the quality reports stored on hdfs, it is made possible to analyze, along the user dimension, abnormality details of specific tables and fields for the user, to categorize and summarize abnormality index types by keyword, to issue common data quality abnormality reports, and to feed these back to the data user. Taking the data verifying task in the SparkSql task for example, after the data verifying task has been executed to completion, the SparkSql offline task execution engine stores the quality report via the hdfs-api interface in a path specified by hdfs, for analysis by the SparkSql offline task. The user can precisely locate any problematic data on the basis of the quality report; after the problematic data has been amended, tasks are pulled up by resetting data verifying tasks of the predecessor, being executed, and successor quality rules to ensure normal execution of downstream tasks, thus providing a reliable guarantee for precise administration of data.
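In skeleton form, the interrupt-and-resume behaviour (stop on abnormal data, emit verification information, continue after the user's amendment) reduces to a loop like the following sketch; the quality-report content here is invented:

```python
def verify_with_resume(rows, is_abnormal, amend):
    """On each abnormal record: interrupt, record verification information
    (a quality-report entry), apply the user's amendment, then resume from
    the same record so the amended data is re-verified."""
    report = []
    i = 0
    while i < len(rows):
        if is_abnormal(rows[i]):
            report.append({"index": i, "row": dict(rows[i])})  # quality report
            rows[i] = amend(rows[i])   # user's data amendment information
            continue                   # resume: re-verify the amended record
        i += 1
    return report

rows = [{"amount": 5}, {"amount": None}, {"amount": 7}]
report = verify_with_resume(
    rows,
    lambda r: r["amount"] is None,     # example abnormality condition
    lambda r: {"amount": 0})           # stand-in for the user's amendment
```

Because the loop resumes in place, downstream records are processed exactly once and need no separate notification, which is the advantage paragraph [0060] claims.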
[0123] As shown in Fig. 2, based on the aforementioned data verifying method, an embodiment of the present invention further provides a data verifying device that comprises the following modules.
[0124] A data verifying task obtaining module 201 is employed for obtaining a data verifying task based on an offline data task, the offline data task including extracting target data from a source database and writing the target data in a target database.
[0125] The offline data task mainly indicates an offline extract-transform-load (ETL) data task in data warehousing; the task not only includes extracting data from a source database and writing the data in a target database, but can also include analyzing and processing the data. Specifically, the offline data task can be a Sqoop, DataX, Spark, PySpark, SparkSql, Hive or MR (MapReduce) task, etc. Accordingly, the data verifying task obtaining module is further employed for judging whether the offline data task is of a preset offline data task type, preferably by judging the type of the offline data task according to its task ID.
[0126] In one embodiment, the data verifying task obtaining module 201 includes:
[0127] an offline data task obtaining module, for obtaining the offline data task; and
[0128] a data verifying task creating module, for judging whether the offline data task has a corresponding data verifying rule, if yes, configuring the data verifying rule for the offline data task, and obtaining resource metadata and a verification parameter table;
and creating the data verifying task according to the data verifying rule, the resource metadata and the verification parameter table, wherein the data verifying rule includes the abnormality judging condition and the execution order.
[0129] The data verifying rule, the resource metadata and the verification parameter table are all preconfigured. The abnormality judging condition is a condition for judging whether data is valid, and can specifically judge: whether the data is void (null); whether the data value falls within a permitted range; and whether the data value falls within a permitted enumeration, etc. The execution order is the execution order of the data verifying rule relative to the offline data task, and includes a predecessor task, a task being executed, and a successor task. The resource metadata file is the data resource required to execute the data verifying task, and the verification parameter table contains the task parameters required to execute the data verifying task.
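The three kinds of abnormality judging condition named above — the void check, the value range, and the enumeration range — can be sketched as a single checker; the dict-based rule encoding (keys `not_null`, `min`, `max`, `enum`) is an illustrative assumption, not the encoding used by the embodiments.

```python
def judge_abnormal(value, condition):
    """Apply one abnormality judging condition to one value.

    Returns None when the value is valid, otherwise a short description
    of the abnormality. The rule keys are hypothetical.
    """
    if value is None:
        # void check: only abnormal when the rule demands a non-null value
        return "value is void" if condition.get("not_null") else None
    if "min" in condition and value < condition["min"]:
        return f"value {value} below minimum {condition['min']}"
    if "max" in condition and value > condition["max"]:
        return f"value {value} above maximum {condition['max']}"
    if "enum" in condition and value not in condition["enum"]:
        return f"value {value!r} outside enumeration {condition['enum']}"
    return None
```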

[0130] In one embodiment, the data verifying task creating module includes:
[0131] a verification rule table determining module for:
[0132] reading a verification rule table and a task ID of the offline data task, wherein the verification rule table contains verifying rules to which various task IDs correspond; and
[0133] matching the task ID against the verification rule table, and determining the data verifying rule to which the offline data task corresponds.
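A minimal sketch of the two steps above — matching the task ID against the verification rule table, then assembling the data verifying task from the rule, the resource metadata, and the verification parameter table. The dict shapes and key names are hypothetical.

```python
def create_verifying_task(task_id, verification_rule_table,
                          resource_metadata, verification_parameters):
    """Match a task ID against the rule table and build a verifying task.

    verification_rule_table: {task_id: {"condition": ..., "execution_order": ...}}
    Returns None when no rule is configured for the task.
    """
    rule = verification_rule_table.get(task_id)
    if rule is None:
        return None  # no corresponding data verifying rule
    return {
        "task_id": task_id,
        "condition": rule["condition"],            # abnormality judging condition
        "execution_order": rule["execution_order"],
        "resource_metadata": resource_metadata,
        "parameters": verification_parameters,
    }
```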
[0134] In one embodiment, the data verifying task creating module further includes:
[0135] a verification parameter table determining module for:
[0136] obtaining tables and/or fields contained in the source database and the target database to which the offline data task corresponds; and
[0137] generating a verification parameter table corresponding to the offline data task according to verification parameters configured by the user for tables and/or fields.
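The generation of the verification parameter table can be sketched as below, assuming (hypothetically) that the tables/fields touched by the offline data task and the user-configured verification parameters are supplied as plain dicts.

```python
def build_parameter_table(tables_and_fields, user_configured):
    """Generate the verification parameter table for an offline data task.

    tables_and_fields: {table: [field, ...]} from the source/target databases
    user_configured:   {(table, field): {...verification parameters...}}
    Only tables/fields the user actually configured receive an entry.
    """
    parameter_table = {}
    for table, fields in tables_and_fields.items():
        for field in fields:
            params = user_configured.get((table, field))
            if params is not None:  # unconfigured fields are not verified
                parameter_table[(table, field)] = params
    return parameter_table
```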
[0138] In one embodiment, the data verifying task creating module further includes:
[0139] an analyzing module, for automatically obtaining and analyzing a task script of the offline data task; if analysis succeeds, obtaining the tables and/or fields contained in the source database and the target database; if analysis fails, receiving tables and/or fields input by the user.
[0140] An execution order judging module 202 is employed for determining an execution order of the data verifying task relative to the offline data task.
[0141] The execution order judging module judges the execution order of the data verifying task relative to the offline data task mainly through an environment variable in the data verifying task.
[0142] The execution order of the data verifying task relative to the offline data task includes:
predecessor, being executed, and successor. Of these three execution orders, preferably, the predecessor and successor execution orders are default execution orders of the data verifying task, whereas the "being executed" execution order must be explicitly opened by the user before it is configured in the data verifying task.
[0143] During specific execution, it is possible to sequentially judge whether the data verifying task is a predecessor task, a successor task, or a task being executed, or to sequentially judge whether the data verifying task is a predecessor task, a task being executed, or a successor task.
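A sketch of judging the execution order from an environment variable in the data verifying task, with the sequential check and the user opt-in for the "being executed" order described above. The variable names `DATA_VERIFY_ORDER` and `DATA_VERIFY_OPEN` are invented for illustration; the patent only states that an environment variable carries the execution order.

```python
import os

ORDER_CANDIDATES = ("predecessor", "being_executed", "successor")

def judge_execution_order(env=None):
    """Sequentially judge the execution order of a data verifying task.

    "being_executed" is honoured only when the user has explicitly opened
    it; otherwise the task falls back to the default (predecessor) order.
    """
    env = os.environ if env is None else env
    order = env.get("DATA_VERIFY_ORDER", "").lower()
    for candidate in ORDER_CANDIDATES:  # sequential judgment of the three orders
        if order == candidate:
            if candidate == "being_executed" and env.get("DATA_VERIFY_OPEN") != "1":
                break  # not opened by the user: use the default order
            return candidate
    return "predecessor"  # default execution order
```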
[0144] A verifying module 203 is employed for executing the data verifying task according to the execution order, judging during execution whether the target data is abnormal according to an abnormality judging condition, if abnormality is verified, interrupting the data verifying task and generating verification information, and, after receiving data amendment information provided by a user according to the verification information, continuing to execute the data verifying task.
[0145] In one embodiment, the verifying module 203 is specifically employed for:
[0146] if it is judged that the data verifying task is a task being executed, extracting the target data from the source database and writing the target data in a temporary database, and performing synchronous data verification on the target data in the temporary database;
[0147] if the target data passes verification, synchronously writing in the target database the target data in the temporary database, and deleting the temporary database after the target data extracted from the source database has all been written in the target database; and
[0148] if the target data does not pass verification, deleting the temporary database.
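The "task being executed" flow of paragraphs [0146]-[0148] can be sketched as follows: target data is first staged in a temporary database, verified synchronously there, and written to the target database only when verification passes; the temporary database is deleted in either case. Python lists stand in for the real databases, and `judge(row)` is any callable returning an abnormality description or None when the row is valid.

```python
def verify_while_executing(source_rows, target_db, judge):
    """Stage rows in a temporary store, verify, then commit or discard.

    Returns (passed, abnormal_rows). Lists model the databases; a real
    implementation would operate on actual database tables.
    """
    temp_db = list(source_rows)  # extract target data into the temporary database
    abnormal = [r for r in temp_db if judge(r) is not None]  # synchronous verification
    if abnormal:
        temp_db.clear()          # verification failed: delete the temporary database
        return False, abnormal
    target_db.extend(temp_db)    # all rows verified: write to the target database
    temp_db.clear()              # delete the temporary database
    return True, []
```

The temporary database acts as a buffer, so abnormal data never reaches the target database.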
[0149] In one embodiment, the verifying module 203 is specifically employed for:
[0150] if it is judged that the data verifying task is a predecessor task, executing the data verifying task in the source database before the target data is extracted;

[0151] if the target data passes verification, writing the data extracted from the source database in the target database.
[0152] In one embodiment, the verifying module 203 is specifically employed for:
[0153] if it is judged that the data verifying task is a successor task, executing the data verifying task in the target database after the target data has been extracted from the source database and written in the target database.
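The predecessor and successor orders of paragraphs [0150]-[0153] can be sketched around a minimal extract-and-write task. As a hypothetical model, lists stand in for the databases and `judge(row)` returns an abnormality description or None when the row is valid.

```python
def run_with_verification(source_db, target_db, order, judge):
    """Run an extract-and-write task with predecessor or successor verification.

    Returns (passed, abnormal_rows).
    """
    if order == "predecessor":  # verify the source database before extraction
        abnormal = [r for r in source_db if judge(r) is not None]
        if abnormal:
            return False, abnormal  # interrupt: nothing is extracted or written
    target_db.extend(source_db)     # extract from source, write to target
    if order == "successor":        # verify the target database after writing
        abnormal = [r for r in target_db if judge(r) is not None]
        if abnormal:
            return False, abnormal
    return True, []
```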
[0154] Based on the aforementioned data verifying method, the present invention further provides a computer system that comprises:
[0155] one or more processor(s); and
[0156] a memory, associated with the one or more processor(s), wherein the memory is employed to store a program instruction, and the program instruction executes the aforementioned data verifying method when it is read and executed by the one or more processor(s).
[0157] Fig. 3 exemplarily illustrates the framework of the computer system that can specifically include a processor 310, a video display adapter 311, a magnetic disk driver 312, an input/output interface 313, a network interface 314, and a memory 320. The processor 310, the video display adapter 311, the magnetic disk driver 312, the input/output interface 313, the network interface 314, and the memory 320 can be communicably connected with one another via a communication bus 330.
[0158] The processor 310 can be embodied as a general CPU (Central Processing Unit), a microprocessor, an ASIC (Application Specific Integrated Circuit), or one or more integrated circuit(s) for executing relevant program(s) to realize the technical solutions provided by the present application.
[0159] The memory 320 can be embodied in such a form as a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, or a dynamic storage device.

The memory 320 can store an operating system 321 for controlling the running of the electronic equipment 300, and a basic input/output system (BIOS) 322 for controlling lower-level operations of the electronic equipment 300. In addition, the memory 320 can also store a web browser 323, a data storage administration system 324, and an equipment identification information processing system 325, etc. The equipment identification information processing system 325 can be an application program that specifically realizes the aforementioned step operations in the embodiments of the present application. To sum up, when the technical solutions provided by the present application are realized via software or firmware, the relevant program codes are stored in the memory 320 and are invoked and executed by the processor 310.
[0160] The input/output interface 313 is employed to connect with an input/output module to realize input and output of information. The input/output module can be equipped in the device as a component part (not shown in the drawings), and can also be externally connected with the device to provide corresponding functions. The input means can include a keyboard, a mouse, a touch screen, a microphone, and various sensors etc., and the output means can include a display screen, a loudspeaker, a vibrator, an indicator light etc.
[0161] The network interface 314 is employed to connect to a communication module (not shown in the drawings) to realize intercommunication between the current device and other devices. The communication module can realize communication in a wired mode (via USB or network cable, for example) or in a wireless mode (via mobile network, Wi-Fi, Bluetooth, etc.).
[0162] The bus 330 includes a passageway transmitting information between various component parts of the device (such as the processor 310, the video display adapter 311, the magnetic disk driver 312, the input/output interface 313, the network interface 314, and the memory 320).
[0163] Additionally, the electronic equipment 300 may further obtain information of specific collection conditions from a virtual resource object collection condition information database for judgment on conditions, and so on.
[0164] As should be noted, although merely the processor 310, the video display adapter 311, the magnetic disk driver 312, the input/output interface 313, the network interface 314, the memory 320, and the bus 330 are illustrated for the aforementioned device, during specific implementation the device may further include other components necessary for normal operation. In addition, as can be understood by persons skilled in the art, the aforementioned device may also include only the components necessary for realizing the solutions of the present application, rather than all of the components illustrated.
[0165] As can be known from the description of the aforementioned embodiments, persons skilled in the art will clearly understand that the present application can be realized through software plus a general hardware platform. Based on such understanding, the technical solutions of the present application, or the contributions made thereby over the state of the art, can essentially be embodied in the form of a software product; such a computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes a plurality of instructions enabling a computer device (such as a personal computer, a server, or a network device, etc.) to execute the methods described in the various embodiments, or in some sections of the embodiments, of the present application.
[0166] The various embodiments in this Description are described progressively; identical or similar sections among the various embodiments can be inferred from one another, and each embodiment stresses what differs from the other embodiments. Particularly, with respect to the system or device embodiment, since it is essentially similar to the method embodiment, its description is relatively simple, and the relevant sections thereof can be inferred from the corresponding sections of the method embodiment. The system or device embodiment described above is merely exemplary in nature: units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is to say, they can be located at a single site or distributed over a plurality of network units. Some or all of the modules can be selected according to practical requirements to realize the objectives of the embodied solutions, which is understandable and implementable by persons ordinarily skilled in the art without creative effort.
[0167] The technical solutions provided by the embodiments of the present invention bring about the following advantageous effects:
[0168] The technical solutions disclosed by the present invention provide several possibilities for the execution order of data verification relative to the offline data task, realize setup of the execution order through user preconfiguration or through a default configuration generated by automatically obtaining and analyzing task scripts, and make it possible to automatically judge the execution order based on the offline data task, whereby the flexibility of the data verifying operation and the verification efficiency are both enhanced.
[0169] The technical solutions disclosed by the present invention supply a technical solution in which the verifying task is interrupted and verification information is generated when the data is abnormal, and the data verifying task continues to execute after the user has made amendments; upstream and downstream tasks can be called automatically during the continued execution, so it is not required to notify or operate downstream tasks one by one, nor to perform repeated sorting, appraising and checking operations, whereby production accidents in data verification are better avoided.

[0170] The technical solutions disclosed by the present invention also provide a "task being executed" order, that is to say, the data verifying task is executed at the same time as the offline data task, and the pressure of data verification is reduced with a temporary database serving as a buffer.
[0171] The technical solutions disclosed by the present invention economize on production machine resources, and prevent abnormal data tasks from being executed and wasting CPU, memory, and magnetic disk resources, whereby machine cost is further reduced.
[0172] All of the aforementioned optional technical solutions can be combined in any manner to form optional embodiments of the present invention, which are not repeated here.
[0173] What is described above is merely directed to preferred embodiments of the present invention, and is not meant to restrict the present invention. Any amendment, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.


Claims (10)

What is claimed is:
1. A data verifying method, characterized in comprising:
creating a data verifying task based on an offline data task, wherein the offline data task includes extracting target data from a source database and writing the target data in a target database;
determining an execution order of the data verifying task relative to the offline data task;
executing the data verifying task according to the execution order; and judging during execution whether the target data is abnormal according to an abnormality judging condition, if abnormality is verified, interrupting the data verifying task and generating verification information, and, after receiving data amendment information provided by a user according to the verification information, continuing to execute the data verifying task.
2. The method according to Claim 1, characterized in that the step of executing the data verifying task according to the execution order includes:
if the data verifying task is a task being executed, extracting the target data from the source database and writing the target data in a temporary database, and performing synchronous data verification on the target data in the temporary database;
if the target data passes verification, synchronously writing in the target database the target data in the temporary database, and deleting the temporary database after the target data extracted from the source database has all passed verification and been written in the target database; and if the target data does not pass verification, deleting the temporary database.
3. The method according to Claim 1, characterized in that the step of executing the data verifying task according to the execution order includes:

if the data verifying task is a predecessor task, executing the data verifying task in the source database before the target data is extracted; if the target data passes verification, extracting it from the source database and writing it in the target database.
4. The method according to Claim 1, characterized in that the step of executing the data verifying task according to the execution order includes:
if the data verifying task is a successor task, executing the data verifying task in the target database after the target data has been extracted from the source database and written in the target database.
5. The method according to any one of Claims 1 to 4, characterized in that the step of creating a data verifying task based on an offline data task includes:
obtaining the offline data task;
judging whether the offline data task has a corresponding data verifying rule, if yes, configuring the data verifying rule for the offline data task, and obtaining resource metadata and a verification parameter table; and creating the data verifying task according to the data verifying rule, the resource metadata and the verification parameter table, wherein the data verifying rule includes the abnormality judging condition and the execution order.
6. The method according to Claim 5, characterized in that the step of judging whether the offline data task has a corresponding data verifying rule includes:
reading a verification rule table and a task ID of the offline data task, wherein the verification rule table contains the data verifying rule to which various task IDs correspond; and matching the task IDs with the verification rule table, and determining the data verifying rule to which the offline data task corresponds.
7. The method according to Claim 5, characterized in that the generation of the verification parameter table includes:
obtaining tables and/or fields contained in the source database and the target database to which the offline data task corresponds; and generating a verification parameter table corresponding to the offline data task according to verification parameters configured by the user for the tables and/or fields.
8. The method according to Claim 7, characterized in that the step of obtaining tables and/or fields contained in the source database and the target database to which the offline data task corresponds includes: automatically obtaining and analyzing a task script of the offline data task, if analysis succeeds, obtaining tables and/or fields contained in the source database and the target database, if analysis fails, receiving tables and/or fields input by the user.
9. A data verifying device, characterized in comprising:
a data verifying task obtaining module, for obtaining a data verifying task based on an offline data task, the offline data task including extracting target data from a source database and writing the target data in a target database;
an execution order judging module, for determining an execution order of the data verifying task relative to the offline data task; and a verifying module, for executing the data verifying task according to the execution order, judging during execution whether the target data is abnormal according to an abnormality judging condition, if abnormality is verified, interrupting the data verifying task and generating verification information, and, after receiving data amendment information provided by a user according to the verification information, continuing to execute the data verifying task.
10. A computer system, characterized in comprising:
one or more processor(s); and a memory, associated with the one or more processor(s), wherein the memory is employed to store a program instruction, and the program instruction executes the method according to any one of Claims 1 to 8 when it is read and executed by the one or more processor(s).

CA3144122A 2020-12-31 2021-12-29 Data verifying method, device and system Pending CA3144122A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011625467.7A CN112632174A (en) 2020-12-31 2020-12-31 Data inspection method, device and system
CN202011625467.7 2020-12-31

Publications (1)

Publication Number Publication Date
CA3144122A1 true CA3144122A1 (en) 2022-06-30

Family

ID=75290345

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3144122A Pending CA3144122A1 (en) 2020-12-31 2021-12-29 Data verifying method, device and system

Country Status (2)

Country Link
CN (1) CN112632174A (en)
CA (1) CA3144122A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113641739B (en) * 2021-07-05 2022-09-06 南京联创信息科技有限公司 Spark-based intelligent data conversion method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918403A (en) * 2019-02-02 2019-06-21 中国银行股份有限公司 Data verification method, device, computer equipment and storage medium
CN110457371A (en) * 2019-08-13 2019-11-15 杭州有赞科技有限公司 Data managing method, device, storage medium and system
CN110851539A (en) * 2019-10-25 2020-02-28 东软集团股份有限公司 Metadata verification method and device, readable storage medium and electronic equipment
CN111367886B (en) * 2020-03-02 2024-01-19 中国邮政储蓄银行股份有限公司 Method and device for data migration in database
CN112148788A (en) * 2020-08-25 2020-12-29 珠海市卓轩科技有限公司 Data synchronization method and system for heterogeneous data source

Also Published As

Publication number Publication date
CN112632174A (en) 2021-04-09

Similar Documents

Publication Publication Date Title
US20180349257A1 (en) Systems and methods for test prediction in continuous integration environments
CN109558746B (en) Data desensitization method and device, electronic equipment and storage medium
CN108628748B (en) Automatic test management method and automatic test management system
CN112491602B (en) Behavior data monitoring method and device, computer equipment and medium
CN110956269A (en) Data model generation method, device, equipment and computer storage medium
CN109542764B (en) Webpage automatic testing method and device, computer equipment and storage medium
CN111045935A (en) Automatic version auditing method, device, equipment and storage medium
CN113014445A (en) Operation and maintenance method, device and platform for server and electronic equipment
US20150302089A1 (en) Recovery of Information from Commercial Web Portals
CA3144122A1 (en) Data verifying method, device and system
CN113836237A (en) Method and device for auditing data operation of database
CN115391655A (en) Information query method and device, electronic equipment and computer readable storage medium
WO2019062087A1 (en) Attendance check data testing method, terminal and device, and computer readable storage medium
CN114064510A (en) Function testing method and device, electronic equipment and storage medium
JPWO2016067391A1 (en) Electronic apparatus, system and method
CN114693116A (en) Method and device for detecting code review validity and electronic equipment
CN114357032A (en) Data quality monitoring method and device, electronic equipment and storage medium
CN113918525A (en) Data exchange scheduling method, system, electronic device, medium, and program product
CN114003497A (en) Method, device and equipment for testing service system and storage medium
CN113434382A (en) Database performance monitoring method and device, electronic equipment and computer readable medium
CN112650679B (en) Test verification method, device and computer system
US11645136B2 (en) Capturing referenced information in a report to resolve a computer problem
CN117112415A (en) Business process monitoring method based on EDA model and related equipment thereof
CN113868095A (en) Data monitoring method, system, server and storage medium
CN112395197A (en) Data processing method, data processing device and electronic equipment

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20220916
