CN111291990B - Quality monitoring processing method and device - Google Patents


Info

Publication number
CN111291990B
Authority
CN
China
Prior art keywords
sql statement
quality
quality rule
rule
sql
Legal status
Active
Application number
CN202010080019.7A
Other languages
Chinese (zh)
Other versions
CN111291990A (en)
Inventor
刘文之
何林强
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010080019.7A
Publication of CN111291990A
Application granted
Publication of CN111291990B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395: Quality analysis or management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/242: Query formulation
    • G06F16/2433: Query languages
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2455: Query execution
    • G06F16/24552: Database cache management

Abstract

The application provides a data quality monitoring processing method and device. The method includes: executing a quality rule SQL statement through a Spark SQL task in a predetermined time period, wherein the quality rule SQL statement is an SQL statement that has been trial-run successfully after passing basic syntax verification; and performing quality monitoring on the data through the execution result of the quality rule SQL statement. This solves the problems of the related art, in which executing quality rules in Map/Reduce mode requires writing complicated Map/Reduce code according to the rule SQL, making development time-consuming, increasing workload, and slowing execution. The Map/Reduce framework is replaced by the Spark SQL framework: the Spark SQL computing framework is memory-based while the Map/Reduce framework computes on disk, so a large amount of disk read/write overhead is avoided, quality rule code development efficiency is improved, and quality rule execution is accelerated.

Description

Quality monitoring processing method and device
Technical Field
The application relates to the technical field of data quality, in particular to a data quality monitoring and processing method and device.
Background
Data quality management technology is mainly used to carry out a series of management activities, such as identification, measurement, monitoring and early warning, on data quality problems, so as to ensure characteristics such as data integrity, normalization, consistency, accuracy and uniqueness.
In the related art, one data quality management method adopts a Map/Reduce framework: information such as field names and attributes of physical table metadata is maintained, a rule base is then established, one or more monitoring rules are configured for each attribute field of the metadata together with a rule running period, and finally the quality rules are run to generate an audit report.
Data quality rules for a relational database or data warehouse can be executed directly through SQL statements. In a big data environment, however, executing quality rules in Map/Reduce mode requires writing complex Map/Reduce code for each rule SQL; if the quality rule SQL is complex, the Map/Reduce code is even more time-consuming to develop, adding extra workload for developers. Meanwhile, the Map/Reduce framework computes on disk: it can process massive data, but computation takes relatively long and real-time performance is poor, which may even affect the service request processing of the original data source.
Aiming at the problems in the related art that executing quality rules in Map/Reduce mode requires writing complicated Map/Reduce code according to the rule SQL, making development time-consuming and increasing workload, no effective solution has yet been proposed.
Disclosure of Invention
The embodiment of the application provides a data quality monitoring processing method and device, which at least solve the problems in the related art that executing quality rules in Map/Reduce mode requires writing complicated Map/Reduce code according to the rule SQL (Structured Query Language), making development time-consuming and increasing workload.
According to an embodiment of the present application, there is provided a data quality monitoring processing method including:
executing a quality rule SQL statement through a Spark SQL task in a preset time period, wherein the quality rule SQL statement is an SQL statement that has been trial-run successfully after passing basic syntax verification;
and carrying out quality monitoring on the data through the execution result of the quality rule SQL statement.
Optionally, executing the quality rule SQL statement through the Spark SQL task for a predetermined period of time includes:
creating a first Spark SQL task for the quality rule SQL statement in the predetermined time period;
and submitting the first Spark SQL task to a resource management container Yarn cluster to execute the quality rule SQL statement.
Optionally, after submitting the first Spark SQL task to a resource management container Yarn cluster to execute the quality rule SQL statement, the method further comprises:
returning an execution result of the quality rule SQL statement;
judging whether the execution result is in a preset rule normal value range or not;
and determining whether the execution result is normal or abnormal according to the judgment result, and storing the execution result into a rule execution result record table of a system background database.
Optionally, before executing the quality rule SQL statement by the Spark SQL task for a predetermined period of time, the method further comprises:
receiving a quality rule SQL statement of data;
and, under the condition that the basic syntax verification of the quality rule SQL statement passes, trial-running the quality rule SQL statement.
Optionally, trial-running the quality rule SQL statement includes:
creating a second Spark SQL task for the quality rule SQL statement;
and submitting the second Spark SQL task to a resource management container Yarn cluster to trial-run the quality rule SQL statement.
Optionally, after submitting the second Spark SQL task to a resource management container Yarn cluster to trial-run the quality rule SQL statement, the method further comprises:
if running the quality rule SQL statement throws an exception, returning first prompt information indicating a syntax error in the quality rule SQL statement;
if the running time of the quality rule SQL statement exceeds the preset time, returning second prompt information suggesting that the quality rule SQL statement be optimized;
and if the quality rule SQL statement completes within the preset time, returning third prompt information indicating that the trial run of the quality rule SQL statement succeeded.
Optionally, receiving the quality rule SQL statement of the data includes:
receiving the quality rule SQL statement completed by editing a pre-stored quality rule template through interactive operation, wherein the quality rule template at least includes: the number of table rows, the number of field unique values, the number of field null values, the number of field repeated values, the number of abnormal values of a field relative to an associated dictionary set, and the number of abnormal values where a field fails a regular expression rule;
and receiving the quality rule SQL statement input through interactive operation.
Optionally, after the quality monitoring of the data by the execution result of the quality rule SQL statement, the method further comprises:
receiving a query instruction for querying a monitoring index corresponding to the execution result;
judging, according to the query instruction, whether a Key corresponding to the monitoring index exists in a Redis cache database;
acquiring the Value corresponding to the Key from the Redis cache database under the condition that the Key corresponding to the monitoring index exists in the Redis cache database, wherein monitoring indexes in the Redis cache database are stored as Key-Value pairs;
and under the condition that the Key corresponding to the monitoring index does not exist in the Redis cache database, acquiring the execution result from a rule execution result record table of the system background database, storing the execution result into the Redis cache database as the monitoring index, and setting the failure (expiry) duration of the Key corresponding to the monitoring index.
Optionally, storing the execution result in the Redis cache database in the manner of the monitoring index includes:
starting a third Spark SQL task for counting the execution result;
acquiring a monitoring index corresponding to the execution result from a rule execution record table of a system background database;
and submitting the third Spark SQL task to a Yarn cluster for execution, so as to store the monitoring index into the Value of the Key corresponding to the monitoring index in the Redis database.
According to another embodiment of the present application, there is also provided a data quality monitoring processing apparatus including:
the execution module is used for executing the quality rule SQL statement through the Spark SQL task in a preset time period, wherein the quality rule SQL statement is an SQL statement that has been trial-run successfully after passing basic syntax verification;
and the quality monitoring module is used for monitoring the quality of the data according to the execution result of the quality rule SQL statement.
Optionally, the execution module includes:
the first creating sub-module is used for creating a first Spark SQL task for the quality rule SQL statement in the preset time period;
and the execution sub-module is used for submitting the first Spark SQL task to a resource management container Yarn cluster so as to execute the quality rule SQL statement.
Optionally, the apparatus further comprises:
the first return sub-module is used for returning an execution result of the quality rule SQL statement;
the judging submodule is used for judging whether the execution result is in a preset rule normal value range or not;
and the determining submodule is used for determining whether the execution result is normal or abnormal according to the judging result and storing the execution result into a rule execution result record table of a system background database.
Optionally, the apparatus further comprises:
the first receiving module is used for receiving the quality rule SQL statement of the data;
and the trial run module is used for trial-running the quality rule SQL statement under the condition that the basic syntax verification of the quality rule SQL statement passes.
Optionally, the trial run module includes:
the second creating sub-module is used for creating a second Spark SQL task for the quality rule SQL statement;
and the trial run sub-module is used for submitting the second Spark SQL task to a resource management container Yarn cluster so as to trial-run the quality rule SQL statement.
Optionally, the apparatus further comprises:
the second return sub-module is used for returning first prompt information indicating a syntax error in the quality rule SQL statement if running the quality rule SQL statement throws an exception;
the third return sub-module is used for returning second prompt information suggesting that the quality rule SQL statement be optimized if the running time of the quality rule SQL statement exceeds the preset time;
and the fourth return sub-module is used for returning third prompt information indicating a successful trial run if the quality rule SQL statement completes within the preset time.
Optionally, the first receiving module includes:
the first receiving sub-module is used for receiving the quality rule SQL statement completed by editing a pre-stored quality rule template through interactive operation, wherein the quality rule template at least includes: the number of table rows, the number of field unique values, the number of field null values, the number of field repeated values, the number of abnormal values of a field relative to an associated dictionary set, and the number of abnormal values where a field fails a regular expression rule;
and the second receiving sub-module is used for receiving the quality rule SQL statement input through the interactive operation.
Optionally, the apparatus further comprises:
the second receiving module is used for receiving a query instruction for querying the monitoring index corresponding to the execution result;
the judging module is used for judging, according to the query instruction, whether a Key corresponding to the monitoring index exists in the Redis cache database;
the first acquisition module is used for acquiring the Value corresponding to the Key from the Redis cache database under the condition that the Key corresponding to the monitoring index exists in the Redis cache database, wherein monitoring indexes in the Redis cache database are stored as Key-Value pairs;
the storage module is used for acquiring the execution result from a rule execution result record table of the system background database under the condition that the Key corresponding to the monitoring index does not exist in the Redis cache database, storing the execution result into the Redis cache database as the monitoring index, and setting the failure (expiry) duration of the Key corresponding to the monitoring index.
Optionally, the storage module includes:
the starting sub-module is used for starting a third Spark SQL task for counting the execution result;
the acquisition sub-module is used for acquiring monitoring indexes corresponding to the execution results from a rule execution record table of a system background database;
and the storage sub-module is used for submitting the third Spark SQL task to the Yarn cluster for execution so as to store the monitoring index into the Value of the Key corresponding to the monitoring index in the Redis database.
According to a further embodiment of the application, there is also provided a computer-readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the application, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the application, a quality rule SQL statement is executed through a Spark SQL task in a preset time period, wherein the quality rule SQL statement is an SQL statement that has been trial-run successfully after passing basic syntax verification, and quality monitoring is performed on the data through the execution result of the quality rule SQL statement. This solves the problems that executing quality rules in Map/Reduce mode requires writing complicated Map/Reduce code according to the rule SQL, making development time-consuming, increasing workload, and slowing execution. The Map/Reduce framework is replaced by the Spark SQL framework: the Spark SQL computing framework is memory-based while the Map/Reduce framework computes on disk, so a large amount of disk read/write overhead is avoided, quality rule code development efficiency is improved, and quality rule execution is accelerated.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
fig. 1 is a block diagram of a hardware structure of a mobile terminal of a data quality monitoring processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of a data quality monitoring processing method according to an embodiment of the present application;
FIG. 3 is a block diagram of a data quality management system according to an embodiment of the present application;
FIG. 4 is a flow chart of data quality rule execution according to an embodiment of the present application;
FIG. 5 is a flow chart of execution result acquisition according to an embodiment of the present application;
fig. 6 is a block diagram of a data quality monitoring processing apparatus according to an embodiment of the present application.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
Example 1
The method according to the first embodiment of the present application may be implemented in a mobile terminal, a computer terminal or a similar computing device. Taking a mobile terminal as an example, fig. 1 is a block diagram of a hardware structure of a mobile terminal according to an embodiment of the present application, as shown in fig. 1, a mobile terminal 10 may include one or more (only one is shown in fig. 1) processors 102 (the processors 102 may include, but are not limited to, a microprocessor MCU or a programmable logic device FPGA, etc.) and a memory 104 for storing data, and optionally, a transmission device 106 for communication functions and an input/output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting of the structure of the mobile terminal described above. For example, the mobile terminal 10 may also include more or fewer components than shown in FIG. 1 or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to the data quality monitoring processing method in an embodiment of the present application; the processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, to implement the method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 106 is arranged to receive or transmit data via a network. The specific examples of networks described above may include wireless networks provided by the communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
In this embodiment, a data quality monitoring processing method running on the mobile terminal or the network architecture is provided, and fig. 2 is a flowchart of the data quality monitoring processing method according to an embodiment of the present application, as shown in fig. 2, where the flowchart includes the following steps:
step S202, executing a quality rule SQL statement through a Spark SQL task in a preset time period, wherein the quality rule SQL statement is an SQL statement that has been trial-run successfully after passing basic syntax verification;
and step S204, performing quality monitoring on the data through the execution result of the quality rule SQL statement.
Through steps S202 to S204, a quality rule SQL statement is executed through a Spark SQL task in a predetermined time period, wherein the quality rule SQL statement is an SQL statement that has been trial-run successfully after passing basic syntax verification, and quality monitoring is performed on the data through the execution result of the quality rule SQL statement. This solves the problems that executing quality rules in Map/Reduce mode requires writing complicated Map/Reduce code according to the rule SQL, making development time-consuming, increasing workload, and slowing execution. The Map/Reduce framework is replaced by the Spark SQL framework: Spark SQL computes in memory while the Map/Reduce framework computes on disk, so a large amount of disk read/write overhead is avoided, quality rule code development efficiency is improved, and quality rule execution is accelerated.
Optionally, the step S202 may specifically include:
s2021, creating a first Spark SQL task for the quality rule SQL statement in the preset time period;
s2022, submitting the first Spark SQL task to a resource management container Yarn cluster to execute the quality rule SQL statement.
In an alternative embodiment, after step S2022, the execution result of the quality rule SQL statement is returned; whether the execution result is within the pre-configured rule normal value range is judged; the execution result is determined to be normal or abnormal according to the judgment result; and the execution result is stored in a rule execution result record table of the system background database.
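The range check and result recording just described can be sketched in a few lines. This is an illustrative stand-in, not the patent's implementation: the function name, the list standing in for the rule execution result record table, and the sample values are all hypothetical.

```python
def record_execution(result_value, normal_range, record_table):
    """Judge a rule execution result against the user-configured normal
    value range, label it normal/abnormal, and append it to the record
    table (a list standing in for the background-database table)."""
    low, high = normal_range
    status = "normal" if low <= result_value <= high else "abnormal"
    record_table.append({"result": result_value, "status": status})
    return status

record_table = []
status = record_execution(120, (100, 200), record_table)  # within range
```

In the described system this judgment runs after the Spark SQL task returns; here the returned result value is simply passed in as a number.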
In the embodiment of the present application, before step S2021, a quality rule SQL statement of the data is received. Further, the quality rule SQL statement completed by editing a pre-stored quality rule template through interactive operation is received, wherein the quality rule template at least includes: the number of table rows, the number of field unique values, the number of field null values, the number of field repeated values, the number of abnormal values of a field relative to an associated dictionary set, and the number of abnormal values where a field fails a regular expression rule; alternatively, the quality rule SQL statement entered through interactive operation is received, but reception is not limited to the above.
Under the condition that the basic syntax verification of the quality rule SQL statement passes, the quality rule SQL statement is trial-run. Specifically, a second Spark SQL task is created for the quality rule SQL statement, and the second Spark SQL task is submitted to a resource management container Yarn cluster to trial-run the quality rule SQL statement. Further, if running the quality rule SQL statement throws an exception, first prompt information indicating a syntax error in the quality rule SQL statement is returned; if the running time of the quality rule SQL statement exceeds the preset time, second prompt information suggesting that the quality rule SQL statement be optimized is returned; and if the quality rule SQL statement completes within the preset time, third prompt information indicating a successful trial run is returned.
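The three trial-run outcomes above reduce to a simple branch on the observed execution. The sketch below is an assumption for illustration only; the prompt strings and function name are invented, and in the real system the exception and elapsed time would come from the submitted Spark SQL task.

```python
def trial_run_outcome(raised_exception, elapsed_s, timeout_s):
    """Map a trial run's observed behavior to one of the three prompts:
    exception -> first prompt (syntax error), too slow -> second prompt
    (optimize), otherwise -> third prompt (success)."""
    if raised_exception:
        return "first prompt: quality rule SQL syntax error"
    if elapsed_s > timeout_s:
        return "second prompt: please optimize the quality rule SQL"
    return "third prompt: trial run succeeded"
```

Only a rule that reaches the third prompt is eligible for scheduled execution.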
Optionally, after step S204, an execution result may be queried. Specifically, a query instruction for querying the monitoring index corresponding to the execution result is received, and whether a Key corresponding to the monitoring index exists in a Redis cache database is judged according to the query instruction. If the Key exists, the Value corresponding to the Key is acquired from the Redis cache database, wherein monitoring indexes in the Redis cache database are stored as Key-Value pairs. If the Key does not exist, the execution result is acquired from a rule execution result record table of the system background database, stored into the Redis cache database as the monitoring index, and the failure (expiry) duration of the Key corresponding to the monitoring index is set. Further, a third Spark SQL task for counting the execution result is started; the monitoring index corresponding to the execution result is acquired from the rule execution record table of the system background database; and the third Spark SQL task is submitted to the Yarn cluster for execution, so as to store the monitoring index into the Value of the corresponding Key in the Redis database.
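This lookup is a classic cache-aside pattern. The sketch below uses a plain dict with expiry timestamps as a stand-in for Redis (with redis-py this would be roughly `get` on a hit and `setex` on a miss); the key name, loader callback, and TTL are hypothetical.

```python
import time

cache = {}  # key -> (value, expires_at); stand-in for Redis

def get_metric(key, load_from_record_table, ttl_s=60):
    """Return the cached Value for a monitoring-index Key; on a miss,
    load it from the rule execution result record table, write it back
    as a Key-Value pair, and set its failure (expiry) duration."""
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                       # cache hit
    value = load_from_record_table(key)       # miss: read background DB
    cache[key] = (value, time.time() + ttl_s)
    return value
```

The expiry bounds how stale a displayed statistic can be while keeping repeated front-end queries off the background database.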
The data quality management system, based on a B/S architecture, allows a user to directly enter quality rule SQL at the front end as required, configure the normal value range of the rule execution result at the front end, and trial-run the rule; after the trial run succeeds, a quality rule timed scheduling task can be submitted. If a rule executes normally but its execution result is outside the normal value range configured by the user, the system issues alarm information. The system stores commonly used quality rules as rule SQL templates; while entering a quality rule at the front end, the user can directly search the templates, copy the template SQL, and then adapt it as needed. After a quality rule scheduling task has been submitted, the user can view overall statistics of the quality rule execution records over the most recent period and evaluate data quality based on these statistics. The data source monitored by the system may be a common relational database such as MySQL, Oracle or PostgreSQL, or a Hive or MPP data warehouse. Fig. 3 is a block diagram of a data quality management system according to an embodiment of the application; as shown in fig. 3, it comprises: a rule template management module 32, a rule development management module 34, a rule scheduling management module 36, and an execution record statistics module 38, wherein,
the rule template management module 32 is used to maintain common quality rule templates. The template records include, but are not limited to: the number of table rows, the number of field unique values, the number of field null values, the number of field repeated values, the number of abnormal values of a field relative to an associated dictionary set, and the number of abnormal values where a field fails a regular expression rule (verification of field formats such as mailbox, telephone number, and character date/time can be achieved by directly replacing the relevant regular expression in a template). Before entering a specific quality monitoring rule SQL at the system front end, a user can directly search the templates, copy a template sample SQL statement, and submit it to the system for trial run after simple modification.
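Three of the templates listed above (table row count, field null count, field duplicate count) can be expressed as plain SQL. The runnable sketch below executes them against an in-memory SQLite table; the table and field names are hypothetical, and in the described system such statements would be submitted as Spark SQL tasks instead.

```python
import sqlite3

# Build a tiny sample table to run the template queries against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, phone TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "123"), (2, "123"), (3, None)])

# Template: number of table rows.
row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
# Template: number of field null values.
null_count = conn.execute(
    "SELECT COUNT(*) FROM person WHERE phone IS NULL").fetchone()[0]
# Template: number of field repeated values (non-null duplicates).
dup_count = conn.execute(
    "SELECT COUNT(*) - COUNT(DISTINCT phone) FROM person "
    "WHERE phone IS NOT NULL").fetchone()[0]
```

Each query returns a single number, which is exactly the shape the normal-value-range check expects.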
The rule development management module 34 receives the quality rule SQL statement submitted by the user, verifies the basic syntax of the rule SQL statement, creates a Spark SQL task to execute the rule SQL on Yarn, and returns the rule SQL execution result or a prompt that the SQL execution threw an exception.
The user enters a specific rule SQL statement in the front-end rule development management module 34 and trial-runs the rule. First, the rule SQL submitted by the user is obtained and its basic syntax is judged; when the basic syntax is not satisfied (for example, the statement lacks a SELECT keyword, a FROM keyword, or a table name), an exception is returned and thrown, prompting a basic SQL syntax error. After the basic SQL syntax passes verification, a Spark SQL task is created for the quality SQL and submitted to the Yarn cluster for execution. If the Spark SQL framework throws an exception, the system returns a prompt that the quality rule SQL syntax is incorrect. If the SQL statement takes too long to execute, the system returns a prompt that the quality rule needs optimization; in this case the SQL statement can be adjusted to limit the monitored data range and the rule retried. If the quality rule SQL executes within a reasonable time range, the system returns the quality rule execution result to the front end, prompts that the trial run succeeded, records the trial-run result value, judges whether it is within the rule normal value range configured by the user, and records the result value as normal or abnormal. Only when the trial run of a rule succeeds and the trial-run result is returned successfully is scheduled execution of the rule allowed. Fig. 4 is a flowchart of data quality rule trial execution according to an embodiment of the present application; as shown in fig. 4, it includes:
step S401, receiving a quality rule SQL statement;
step S402, checking the basic syntax of the quality rule SQL statement;
step S403, judging whether the verification is passed, if yes, executing step S404, otherwise executing step S407;
step S404, creating Spark SQL task (corresponding to the second Spark SQL task);
step S405, submitting a task to a Yarn for trial execution;
step S406, judging whether the quality rule execution is abnormal, if yes, executing step S407, otherwise executing step S408;
step S407, prompting a syntax error in the quality rule SQL statement, which may specifically be prompted through the first prompt information;
step S408, judging whether the execution time is too long, if yes, executing step S409, otherwise executing step S410;
step S409, prompting that the quality rule SQL statement should be optimized, specifically through the second prompt information;
step S410, returning the quality rule test execution result.
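The trial-run flow of steps S401 to S410 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the timeout threshold, and the use of a plain callable in place of a real Spark SQL task submitted to Yarn are all assumptions.

```python
import re
import time

# Step S402: the basic syntax pre-check -- the rule SQL must contain a
# SELECT keyword, a FROM keyword, and a table name after FROM.
def basic_syntax_check(sql: str) -> bool:
    pattern = re.compile(r"\bselect\b.+\bfrom\b\s+\S+", re.IGNORECASE | re.DOTALL)
    return bool(pattern.search(sql))

# Steps S404-S410: submit the rule for trial execution and classify the outcome.
# `run_sql` stands in for submitting a Spark SQL task to the Yarn cluster
# (e.g. spark.sql(sql).collect()); `max_seconds` is the "execution time too
# long" threshold, whose actual value the patent does not specify.
def trial_run(sql, run_sql, normal_range, max_seconds=600):
    if not basic_syntax_check(sql):
        return "SYNTAX_ERROR"           # step S407: basic syntax failed
    start = time.time()
    try:
        result = run_sql(sql)
    except Exception:
        return "SYNTAX_ERROR"           # step S407: the SQL framework threw
    if time.time() - start > max_seconds:
        return "NEEDS_OPTIMIZATION"     # step S409: narrow the monitored range
    low, high = normal_range
    status = "NORMAL" if low <= result <= high else "ABNORMAL"
    return ("SUCCESS", result, status)  # step S410: trial run succeeded
```

For example, `trial_run("select count(*) from person", lambda s: 42, (0, 100))` would return `("SUCCESS", 42, "NORMAL")`, after which scheduled execution of the rule could be enabled.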
The rule schedule management module 36 is responsible for executing, at a fixed frequency and on a timed basis within a configured date and time range, the quality rules whose trial run has succeeded. The user can configure the start and end dates of the scheduled task through the system front end, as well as the rule scheduling frequency. According to the schedule information configured by the user, the system submits Spark SQL tasks to Yarn at the scheduled times within the specified date and time range to execute the rules, and saves the execution result of each rule, where the result value is either normal or abnormal (abnormal: the quality rule result value is not within the normal value set configured by the user, while the SQL syntax itself is correct). The front end also provides a separate rule history execution record list for the user to view.
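A minimal sketch of the scheduling and result-recording logic described above (the function names, the timedelta representation of the frequency, and the dict-based history record are illustrative assumptions):

```python
from datetime import datetime, timedelta

# Decide whether a scheduled rule is due to run: the rule fires only inside
# the user-configured date range, and only once per frequency period.
def is_due(now, start, end, frequency, last_run):
    if not (start <= now <= end):
        return False                 # outside the configured date range
    return last_run is None or now - last_run >= frequency

# Save one scheduled execution result as normal or abnormal (abnormal: the
# result value is outside the user-configured normal value range).
def record_result(history, rule_id, value, normal_range):
    low, high = normal_range
    status = "normal" if low <= value <= high else "abnormal"
    history.append({"rule": rule_id, "value": value, "status": status})
    return status
```

The `history` list stands in for the rule history execution record list shown at the front end.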
The execution record statistics module 38 is used to count and present the execution records of scheduled rules. The statistics include, but are not limited to: the number of times each quality rule was executed per day over the most recent cycle, and the number of exceptions per day in those executions (counted from midnight of the current day back over the previous cycle); and the top-N quality rules ranked by number of execution exceptions over the most recent cycle (likewise counted from midnight of the current day back over the previous cycle). The module is implemented with the Redis in-memory database: the statistical execution records are organized into different monitoring indexes, the collected monitoring value of each index is stored in Redis as a key-value pair, and a different expiration time is set for the data of each index. For the sample monitoring indexes mentioned in this scheme, the expiration times are set as shown in Table 1.
TABLE 1
When a user queries the execution record statistics, the system first looks up whether the monitored index exists in the Redis cache. If the key of the monitored index is found in the cache, the value corresponding to that key is obtained directly and displayed to the user. If Redis has no corresponding record, a Spark SQL task is started to count the values of the various indexes from the rule execution record tables in the system background database; the task is submitted to Yarn for execution, and after counting, the result is stored as the value of the corresponding key in the Redis database and returned to the front end. For an index whose expiration time is the current day, the expiration time is calculated as the difference between the current time and midnight of the next day; using this difference as the index's expiration time ensures that the database query statistics are performed at most once per day. Fig. 5 is a flowchart of execution result acquisition according to an embodiment of the present application; as shown in Fig. 5, the flow includes:
step S501, reading a monitoring record in Redis according to a key;
step S502, judging whether the monitoring record is null; if yes, executing step S503, otherwise executing step S507;
step S503, creating Spark SQL task and counting the execution result;
step S504, submitting Spark SQL tasks to the Yarn for execution;
step S505, obtaining an execution result and generating a Redis cache object;
step S506, setting the object expiration time (namely the expiration time of the key) and storing the object expiration time into a Redis cache system;
step S507, the execution result record is returned.
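Steps S501 to S507 follow a cache-aside pattern, and the "expire at next midnight" TTL can be computed directly. The sketch below is an illustration under stated assumptions: `DictCache` is an in-memory stand-in for the Redis client, and `compute` stands in for the Spark SQL statistics task submitted to Yarn; none of these names come from the patent.

```python
from datetime import datetime, timedelta

# TTL for "current day" indexes: the seconds remaining until the next
# midnight, so the database statistics run at most once per day (step S506).
def seconds_until_midnight(now):
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return int((next_midnight - now).total_seconds())

class DictCache:
    """In-memory stand-in for the Redis client (TTL recorded, not enforced)."""
    def __init__(self):
        self.data, self.ttls = {}, {}
    def get(self, key):
        return self.data.get(key)
    def set(self, key, value, ttl):
        self.data[key] = value
        self.ttls[key] = ttl

# Cache-aside lookup, steps S501-S507: read by key; on a miss, run the
# statistics task, store the result with the midnight TTL, and return it.
def get_metric(cache, key, compute, now):
    value = cache.get(key)
    if value is not None:        # step S507: cache hit, return directly
        return value
    value = compute()            # steps S503-S504: run the statistics task
    cache.set(key, value, ttl=seconds_until_midnight(now))  # steps S505-S506
    return value
```

With a real Redis client the `set` call would be the equivalent of `SETEX key ttl value`; repeated queries within the same day then hit the cache without touching the background database.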
In addition to the more complex statistical indicators presented in Table 1, statistics for some simple indicators may be added, such as the number of tables monitored by the rules a user has submitted, and the number of rules the user has submitted. Compared with the execution flow for complex indicators, the execution of simple indicators does not require a Spark SQL task: the system background database is connected directly via JDBC, the query statements are executed, and the results are written to Redis, while the record expiration time is set to a shorter value (for example, 1 minute). The other steps are identical to those for complex indicators.
The embodiment of the present application proposes using the Spark SQL framework in place of the Map/Reduce framework, which avoids the complex process of writing Map/Reduce code for each quality rule SQL, improves quality rule development efficiency, and makes quality rule execution faster. The front end is opened directly to the user, who is allowed to input quality rules directly, with templates provided for reference; this improves the flexibility of quality rule configuration, and the rule trial-run function helps the user conveniently verify the correctness of the input quality rules. In addition to providing query access to rule execution results, the system counts the execution results of scheduled tasks and displays the statistics, making it convenient for the user to evaluate the quality of the data as a whole.
Example 2
According to another embodiment of the present application, there is also provided a data quality monitoring processing apparatus. Fig. 6 is a block diagram of the data quality monitoring processing apparatus according to an embodiment of the present application; as shown in Fig. 6, the apparatus includes:
an execution module 62, configured to execute a quality rule SQL statement through a Spark SQL task in a predetermined time period, where the quality rule SQL statement is an SQL statement that has been trial-run successfully after its basic syntax passed verification;
and the quality monitoring module 64 is used for monitoring the quality of the data according to the execution result of the quality rule SQL statement.
Optionally, the execution module 62 includes:
the first creating sub-module is used for creating a first Spark SQL task for the quality rule SQL statement in the preset time period;
and the execution sub-module is used for submitting the first Spark SQL task to a resource management container Yarn cluster so as to execute the quality rule SQL statement.
Optionally, the apparatus further comprises:
the first return sub-module is used for returning an execution result of the quality rule SQL statement;
the judging submodule is used for judging whether the execution result is in a preset rule normal value range or not;
and the determining submodule is used for determining whether the execution result is normal or abnormal according to the judging result and storing the execution result into a rule execution result record table of a system background database.
Optionally, the apparatus further comprises:
the first receiving module is used for receiving the quality rule SQL statement of the data;
and the trial run module is used for trial-running the quality rule SQL statement when the basic syntax verification of the quality rule SQL statement passes.
Optionally, the trial run module includes:
the second creating sub-module is used for creating a second Spark SQL task for the quality rule SQL statement;
and the trial run sub-module is used for submitting the second Spark SQL task to a resource management container Yarn cluster so as to trial-run the quality rule SQL statement.
Optionally, the apparatus further comprises:
the second return sub-module is used for returning first prompt information indicating a syntax error in the quality rule SQL statement if the execution of the quality rule SQL statement is abnormal;
the third return sub-module is used for returning second prompt information for optimizing the quality rule SQL statement if the running time of the quality rule SQL statement exceeds the preset time;
and the fourth return sub-module is used for returning third prompt information for successful trial running of the quality rule SQL statement if the quality rule SQL statement runs in the preset time.
Optionally, the first receiving module includes:
the first receiving sub-module is used for receiving the quality rule SQL statement obtained by editing a pre-stored quality rule template through interactive operation, where the quality rule template at least includes: the number of table rows, the number of unique values of a field, the number of null values of a field, the number of duplicate values of a field, the number of abnormal values relative to a field's associated dictionary set, and the number of values that do not satisfy a field's regular expression rule;
and the second receiving sub-module is used for receiving the quality rule SQL statement input through the interactive operation.
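As an illustration of the template types enumerated above, a hypothetical set of rule SQL templates might look as follows. The table/column placeholders and the SQL text itself are assumptions: the patent names the template categories but does not fix their SQL.

```python
# Illustrative SQL shapes for the quality rule templates listed above.
# RLIKE is Spark SQL's regular-expression match operator.
RULE_TEMPLATES = {
    "table_row_count":     "SELECT COUNT(*) FROM {table}",
    "field_unique_values": "SELECT COUNT(DISTINCT {col}) FROM {table}",
    "field_null_values":   "SELECT COUNT(*) FROM {table} WHERE {col} IS NULL",
    "field_duplicate_values":
        "SELECT COUNT(*) - COUNT(DISTINCT {col}) FROM {table}",
    "dict_set_outliers":
        "SELECT COUNT(*) FROM {table} WHERE {col} NOT IN ({dict_values})",
    "regex_rule_outliers":
        "SELECT COUNT(*) FROM {table} WHERE {col} NOT RLIKE '{pattern}'",
}

def render(name, **params):
    """Fill a template with the identifiers chosen by the user."""
    return RULE_TEMPLATES[name].format(**params)
```

For example, `render("field_null_values", table="person", col="id_card")` yields a rule counting null ID-card fields, whose result the user would then bound with a normal value range.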
Optionally, the apparatus further comprises:
the second receiving module is used for receiving a query instruction for querying the monitoring index corresponding to the execution result;
the judging module is used for judging whether Key corresponding to the monitoring index exists in the Redis cache database according to the query instruction;
the first acquisition module is used for acquiring the Value corresponding to the Key from the Redis cache database when the Key corresponding to the monitoring index exists in the Redis cache database, where the monitoring indexes in the Redis cache database are stored in the form of Key-Value pairs;
the storage module is used for acquiring the execution result from a rule execution result record table of a system background database when the Key corresponding to the monitoring index does not exist in the Redis cache database, storing the execution result into the Redis cache database in the form of the monitoring index, and setting the expiration time of the Key corresponding to the monitoring index.
Optionally, the storage module includes:
the starting sub-module is used for starting a third Spark SQL task for counting the execution result;
the acquisition sub-module is used for acquiring monitoring indexes corresponding to the execution results from a rule execution record table of a system background database;
and the storage sub-module is used for submitting the third Spark SQL task to the Yarn cluster for execution so as to store the monitoring index into the Value of the Key corresponding to the monitoring index in the Redis database.
It should be noted that each of the above modules may be implemented by software or hardware, and for the latter, it may be implemented by, but not limited to: the modules are all located in the same processor; alternatively, the above modules may be located in different processors in any combination.
Example 3
Embodiments of the present application also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
S1, executing a quality rule SQL statement through a Spark SQL task in a predetermined time period, where the quality rule SQL statement is an SQL statement that has been trial-run successfully after its basic syntax passed verification;
S2, performing quality monitoring on the data through the execution result of the quality rule SQL statement.
Alternatively, in the present embodiment, the storage medium may include, but is not limited to: a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing a computer program.
Example 4
An embodiment of the application also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, executing a quality rule SQL statement through a Spark SQL task in a predetermined time period, where the quality rule SQL statement is an SQL statement that has been trial-run successfully after its basic syntax passed verification;
S2, performing quality monitoring on the data through the execution result of the quality rule SQL statement.
Alternatively, specific examples in this embodiment may refer to examples described in the foregoing embodiments and optional implementations, and this embodiment is not described herein.
It will be appreciated by those skilled in the art that the modules or steps of the application described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices; optionally, they may be implemented in program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein; alternatively, they may be fabricated separately as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principle of the present application should be included in the protection scope of the present application.

Claims (11)

1. A data quality monitoring processing method, comprising:
executing a quality rule SQL statement through a Spark SQL task in a predetermined time period, wherein the quality rule SQL statement is an SQL statement that has been trial-run successfully after its basic syntax passed verification;
performing quality monitoring on the data through the execution result of the quality rule SQL statement;
wherein, before the executing the quality rule SQL statement by Spark SQL task in the predetermined time period, the method further comprises:
receiving a quality rule SQL statement of data;
and performing a trial run on the quality rule SQL statement under the condition that the basic syntax verification of the quality rule SQL statement passes.
2. The method of claim 1, wherein executing the quality rule SQL statement through a Spark SQL task for a predetermined period of time comprises:
creating a first Spark SQL task for the quality rule SQL statement in the predetermined time period;
and submitting the first Spark SQL task to a resource management container Yarn cluster to execute the quality rule SQL statement.
3. The method of claim 2, wherein after said submitting said first Spark SQL task to a resource management container Yarn cluster to execute said quality rules SQL statement, said method further comprises:
returning an execution result of the quality rule SQL statement;
judging whether the execution result is in a preset rule normal value range or not;
and determining whether the execution result is normal or abnormal according to the judgment result, and storing the execution result into a rule execution result record table of a system background database.
4. The method of claim 1, wherein the trial-running of the quality rule SQL statement comprises:
creating a second Spark SQL task for the quality rule SQL statement;
and submitting the second Spark SQL task to a resource management container Yarn cluster so as to trial-run the quality rule SQL statement.
5. The method of claim 4, wherein after said submitting said second Spark SQL task to a resource management container Yarn cluster to trial-run said quality rule SQL statement, said method further comprises:
if the operation of the quality rule SQL statement is abnormal, returning first prompt information of grammar errors of the quality rule SQL statement;
if the running time of the quality rule SQL statement exceeds the preset time, returning second prompt information for optimizing the quality rule SQL statement;
and if the quality rule SQL statement is operated within the preset time, returning a third prompt message for successful trial operation of the quality rule SQL statement.
6. The method of any one of claims 1 to 5, wherein the receiving the quality rule SQL statement of data comprises:
receiving the quality rule SQL statement obtained by editing a pre-stored quality rule template through interactive operation, wherein the quality rule template at least comprises: the number of table rows, the number of unique values of a field, the number of null values of a field, the number of duplicate values of a field, the number of abnormal values relative to a field's associated dictionary set, and the number of values that do not satisfy a field's regular expression rule;
and receiving the quality rule SQL statement input through interactive operation.
7. The method according to any one of claims 1 to 5, wherein after the quality monitoring of the data by the execution result of the quality rule SQL statement, the method further comprises:
receiving a query instruction for querying a monitoring index corresponding to the execution result;
judging whether a Key corresponding to the monitoring index exists in a Redis cache database according to the query instruction;
acquiring a Value corresponding to the Key from the Redis cache database under the condition that the Key corresponding to the monitoring index exists in the Redis cache database, wherein the monitoring indexes in the Redis cache database are stored in the form of Key-Value pairs;
and under the condition that the Key corresponding to the monitoring index does not exist in the Redis cache database, acquiring an execution result from a rule execution result record table of a system background database, storing the execution result into the Redis cache database in the form of the monitoring index, and setting the expiration duration of the Key corresponding to the monitoring index.
8. The method of claim 7, wherein storing the execution results in the Redis cache database in the manner of the monitor metrics comprises:
starting a third Spark SQL task for counting the execution result;
acquiring a monitoring index corresponding to the execution result from a rule execution record table of a system background database;
and submitting the third Spark SQL task to a Yarn cluster for execution, so as to store the monitoring index into the Value of the Key corresponding to the monitoring index in the Redis database.
9. A data quality monitoring processing apparatus, comprising:
the execution module is used for executing the quality rule SQL statement through a Spark SQL task in a predetermined time period, wherein the quality rule SQL statement is an SQL statement that has been trial-run successfully after its basic syntax passed verification;
the quality monitoring module is used for monitoring the quality of the data according to the execution result of the quality rule SQL statement;
wherein the apparatus further comprises:
the first receiving module is used for receiving the quality rule SQL statement of the data;
and the test run module is used for testing and running the quality rule SQL statement under the condition that the basic grammar verification of the quality rule SQL statement is passed.
10. A computer-readable storage medium, characterized in that the storage medium has stored therein a computer program, wherein the computer program is arranged to execute the method of any of the claims 1 to 8 when run.
11. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the method of any of the claims 1 to 8.
CN202010080019.7A 2020-02-04 2020-02-04 Quality monitoring processing method and device Active CN111291990B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010080019.7A CN111291990B (en) 2020-02-04 2020-02-04 Quality monitoring processing method and device


Publications (2)

Publication Number Publication Date
CN111291990A CN111291990A (en) 2020-06-16
CN111291990B true CN111291990B (en) 2023-11-07

Family

ID=71028455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010080019.7A Active CN111291990B (en) 2020-02-04 2020-02-04 Quality monitoring processing method and device

Country Status (1)

Country Link
CN (1) CN111291990B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737233A (en) * 2020-06-24 2020-10-02 浙江大华技术股份有限公司 Data monitoring method and device
CN113760945A (en) * 2020-08-11 2021-12-07 北京沃东天骏信息技术有限公司 Method and device for auditing SQL (structured query language) statements
CN112162980A (en) * 2020-11-26 2021-01-01 成都数联铭品科技有限公司 Data quality control method and system, storage medium and electronic equipment
CN115563150B (en) * 2022-12-02 2023-04-18 浙江大华技术股份有限公司 Method, equipment and storage medium for mapping Hive SQL (structured query language) and execution engine DAG (direct current)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102053961A (en) * 2009-10-27 2011-05-11 中兴通讯股份有限公司 Method and device for SQL statements and system for improving database reliability
KR20160123913A (en) * 2015-04-17 2016-10-26 영남대학교 산학협력단 Method for Hash-Join Using Sorting calculation, and computer program, and storage medium operating thereof
CN107291550A (en) * 2017-06-22 2017-10-24 华中科技大学 A kind of Spark platform resources dynamic allocation method and system for iterated application
CN107958049A (en) * 2017-11-28 2018-04-24 航天科工智慧产业发展有限公司 A kind of quality of data checking and administration system
CN108958726A (en) * 2017-05-19 2018-12-07 北京京东尚科信息技术有限公司 A kind of method and apparatus of data mining
CN109101632A (en) * 2018-08-15 2018-12-28 中国人民解放军海军航空大学 Product quality abnormal data retrospective analysis method based on manufacture big data
CN109284282A (en) * 2018-10-22 2019-01-29 北京极数云舟科技有限公司 One kind being based on MySQL database O&M method and system
CN109656917A (en) * 2018-12-18 2019-04-19 深圳前海微众银行股份有限公司 Data detection method, device, equipment and the readable storage medium storing program for executing of multi-data source
CN109669931A (en) * 2018-12-29 2019-04-23 上海携程商务有限公司 Historical data exception analysis method, system, equipment and storage medium
CN110716912A (en) * 2018-07-12 2020-01-21 马上消费金融股份有限公司 SQL performance detection method and server

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8700602B2 (en) * 2008-09-29 2014-04-15 Oracle International Corporation Multi-database, runtime database query performance monitoring
US20100153431A1 (en) * 2008-12-11 2010-06-17 Louis Burger Alert triggered statistics collections
US9311345B2 (en) * 2012-10-09 2016-04-12 Sap Se Template based database analyzer


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Cai Li et al. The historical evolution and development trends of data quality. Computer Science. 2018, (04), full text. *
Lv Peng. Design and implementation of a rule-based data quality management system on Alibaba Cloud. China Master's Theses Full-text Database, Information Science and Technology; full text. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant