CN114281807A - Data quality auditing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN114281807A
Authority
CN
China
Legal status
Pending
Application number
CN202111376117.6A
Other languages
Chinese (zh)
Inventor
王庆
Current Assignee
China United Network Communications Group Co Ltd
Unicom Big Data Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Unicom Big Data Co Ltd
Application filed by China United Network Communications Group Co Ltd and Unicom Big Data Co Ltd
Priority to CN202111376117.6A
Publication of CN114281807A

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The application provides a data quality auditing method, device, equipment and storage medium. The method acquires data to be audited and determines a preset auditing mode, which is a sliding time window or a sliding event window. The data to be audited is sampled according to the sliding time window or the sliding event window to obtain audit sampling data, and the audit sampling data is verified according to a preset verification rule to obtain an audit verification result. This addresses the problems that existing full-volume data auditing consumes considerable computing resources and is slow.

Description

Data quality auditing method, device, equipment and storage medium
Technical Field
The present application relates to the field of data verification technologies, and in particular, to a data quality auditing method, apparatus, device, and storage medium.
Background
Data quality auditing is the control of data quality at every link of the data life cycle, such as production, processing, transmission, storage, use and exchange. A computing platform generally checks whether data quality meets requirements according to data quality rules, and handles data that fails the quality check accordingly.
When a computing platform audits data quality, it usually checks the full volume of data.
Take data auditing in a data warehouse as an example. In enterprise informatization, the data warehouse plays an increasingly important role in medium- and long-term management decisions. As data volumes grow, data formats become richer and management decisions impose ever stricter requirements, which further drives the development of data warehouses. However, existing data-warehouse auditing generally uses full auditing, which consumes considerable computing resources and is slow.
Disclosure of Invention
The application provides a data quality auditing method, device, equipment and storage medium, which address the problems that existing data auditing consumes considerable computing resources and is slow.
In a first aspect, the present application provides a data quality auditing method, including the following steps:
acquiring data to be audited;
determining a preset auditing mode, wherein the preset auditing mode comprises a sliding time window or a sliding event window;
performing data sampling on the data to be audited according to the sliding time window or the sliding event window to obtain audit sampling data; and
and verifying the audit sampling data according to a preset verification rule to obtain an audit verification result.
In a possible implementation manner, both the step size and the window size of the sliding time window are adjustable, and both the step size and the window size of the sliding event window are adjustable.
In a possible implementation manner, the acquiring data to be audited includes:
determining a data topic to be audited, and acquiring source data from a preset message queue according to the data topic; and
parsing the source data according to its data structure to obtain the data to be audited.
In a possible implementation manner, determining the preset auditing mode includes:
determining whether the preset auditing mode is cached in memory; and
if the preset auditing mode is cached in memory, acquiring the cached preset auditing mode, wherein the cache is refreshed, after a preset time has elapsed, from the auditing mode stored in a preset database.
In a possible implementation manner, after the audit sampling data is verified according to a preset verification rule to obtain an audit verification result, the method further includes:
performing warehousing processing on the data to be audited according to the audit verification result and a preset abnormal data processing rule.
In a possible implementation manner, the audit verification result comprises a normal verification result or an abnormal verification result, and the preset abnormal data processing rule comprises normal warehousing, discarding or separate warehousing;
the step of performing warehousing processing on the data to be audited according to the audit verification result and a preset abnormal data processing rule comprises the following steps:
performing normal warehousing, discarding or separate warehousing on the data to be audited according to the normal or abnormal verification result.
In a possible implementation manner, after the warehousing processing is performed on the data to be audited according to the audit verification result and the preset abnormal data processing rule, the method further includes:
generating an alarm and/or a quality report according to the warehousing processing result.
In a possible implementation manner, the preset check rule includes one or more of a null value check, a regular-expression check, a date check, a numeric check, an enumeration check and an identity check.
In a second aspect, the present application provides a data quality auditing apparatus, including:
an acquisition module, configured to acquire data to be audited;
a determining module, configured to determine a preset auditing mode, wherein the preset auditing mode comprises a sliding time window or a sliding event window;
a sampling module, configured to perform data sampling on the data to be audited according to the sliding time window or the sliding event window to obtain audit sampling data; and
a checking module, configured to check the audit sampling data according to a preset check rule to obtain an audit check result.
In a possible implementation manner, both the step size and the window size of the sliding time window are adjustable, and both the step size and the window size of the sliding event window are adjustable.
In a possible implementation manner, the acquisition module is specifically configured to:
determine a data topic to be audited, and acquire source data from a preset message queue according to the data topic; and
parse the source data according to its data structure to obtain the data to be audited.
In a possible implementation manner, the determining module is specifically configured to:
determine whether the preset auditing mode is cached in memory; and
if the preset auditing mode is cached in memory, acquire the cached preset auditing mode, wherein the cache is refreshed, after a preset time has elapsed, from the auditing mode stored in a preset database.
In a possible implementation manner, the apparatus further includes an exception handling module configured to:
perform warehousing processing on the data to be audited according to the audit verification result and a preset abnormal data processing rule.
In a possible implementation manner, the audit verification result comprises a normal verification result or an abnormal verification result, and the preset abnormal data processing rule comprises normal warehousing, discarding or separate warehousing;
the exception handling module is specifically configured to:
perform normal warehousing, discarding or separate warehousing on the data to be audited according to the normal or abnormal verification result.
In one possible implementation, the apparatus further includes a result processing module configured to:
generate an alarm and/or a quality report according to the warehousing processing result.
In a possible implementation manner, the preset check rule includes one or more of a null value check, a regular-expression check, a date check, a numeric check, an enumeration check and an identity check.
In a third aspect, the present application provides data quality auditing equipment, including:
a processor;
a memory; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor, the computer program comprising instructions for performing the method of the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program for causing a server to execute the method of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising computer instructions that, when executed by a processor, perform the method of the first aspect.
According to this method, data to be audited is acquired and a preset auditing mode, a sliding time window or a sliding event window, is determined. The data to be audited is sampled according to the sliding time window or the sliding event window to obtain audit sampling data, and the audit sampling data is verified according to a preset verification rule to obtain an audit verification result. This addresses the problems that existing full-volume data auditing consumes considerable computing resources and is slow.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a data quality auditing system architecture according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a data quality auditing method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of another data quality auditing method according to an embodiment of the present application;
Fig. 4 is a schematic diagram of data quality auditing according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a data quality auditing apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of another data quality auditing apparatus according to an embodiment of the present application;
Fig. 7A is a schematic diagram of a basic hardware architecture of a data quality auditing apparatus provided in the present application; and
Fig. 7B is a schematic diagram of a basic hardware architecture of another data quality auditing apparatus provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," if any, in the description, claims and accompanying drawings of this application are used to distinguish between similar objects and do not necessarily describe a particular order or sequence. It should be understood that terms so used are interchangeable under appropriate circumstances, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, system, article or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article or apparatus.
Take data auditing in a data warehouse as an example. In enterprise informatization, the data warehouse plays an increasingly important role in medium- and long-term management decisions. As data volumes grow, data formats become richer and management decisions impose ever stricter requirements, which further drives the development of data warehouses. However, existing data-warehouse auditing generally uses full auditing, which consumes considerable computing resources and is slow.
Therefore, the embodiments of the present application provide a data quality auditing method that samples the data to be audited according to a sliding time window or a sliding event window to obtain audit sampling data, and verifies the audit sampling data according to a preset verification rule to obtain an audit verification result. This addresses the problems that existing full-volume data auditing consumes considerable computing resources and is slow.
Optionally, the data quality auditing method provided by the present application may be applied to the data quality auditing system architecture shown in Fig. 1. As shown in Fig. 1, the system may include a receiving device 101, a processing device 102 and a display device 103.
In a specific implementation process, the receiving device 101 may be an input/output interface or a communication interface, and may be configured to receive data to be audited.
The processing device 102 may obtain the data to be audited through the receiving device 101, sample it according to a sliding time window or a sliding event window to obtain audit sampling data, and check the audit sampling data according to a preset check rule to obtain an audit check result, thereby addressing the problems that existing full-volume auditing consumes considerable computing resources and is slow. In addition, the step size and window size of the sliding time window may both be made adjustable, and likewise the step size and window size of the sliding event window; that is, the sampling frequency is adjustable, which meets the requirements of various applications.
The display device 103 may be configured to display the data to be audited, the preset check rule, the audit check result, and the like.
The display device may also be a touch display screen for receiving user instructions while displaying the above-mentioned content to enable interaction with a user.
It should be understood that the processing device may be implemented by a processor reading instructions in a memory and executing the instructions, or may be implemented by a chip circuit.
This system is merely exemplary; in practice it can be configured according to application requirements.
It should be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the architecture of the data quality auditing system. In other possible embodiments, the architecture may include more or fewer components than shown, combine or split some components, or arrange the components differently, depending on the actual application scenario; this is not limited herein. The components shown in Fig. 1 may be implemented in hardware, software, or a combination of software and hardware.
In addition, the system architecture described in the embodiments of the present application is intended to illustrate the technical solutions more clearly and does not limit them. A person skilled in the art will appreciate that, as system architectures evolve and new service scenarios emerge, the technical solutions provided in the embodiments are equally applicable to similar technical problems.
The technical solutions of the present application are described below with several embodiments as examples, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 2 is a schematic flow chart of a data quality auditing method provided in an embodiment of the present application, where an execution subject of the embodiment may be the processing device in the embodiment shown in fig. 1, and as shown in fig. 2, the method may include:
S201: Acquire data to be audited.
For example, the processing device may determine a data topic to be audited, obtain source data from a preset message queue according to the data topic, and parse the source data according to its data structure to obtain the data to be audited.
The message queue stores source data, and different source data belong to different data topics. For example, the processing device may pre-store a correspondence between data topics and source data, use it to determine the source data corresponding to the data topic to be audited, and acquire that source data from the preset message queue that stores it. Because different source data have different data structures, the processing device parses the source data acquired from the preset message queue according to its data structure to obtain the data to be audited.
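As a minimal sketch of S201, assuming the message queue can be stood in for by an in-memory dict keyed by data topic, the code below looks up a hypothetical topic-to-structure correspondence and parses each message accordingly. The topic names, field lists and the JSON/comma-delimited formats are illustrative assumptions, not part of the application; a real deployment would read through an actual message queue client.

```python
import json

# Hypothetical correspondence between data topics and the data structure of
# their source data; in practice this would come from configuration.
TOPIC_STRUCTURES: dict[str, dict] = {
    "user_profile": {"format": "json"},
    "call_record": {
        "format": "csv",
        "fields": ["caller", "callee", "start_time", "duration"],
    },
}


def parse_source(topic: str, raw: str) -> dict:
    """Parse one raw message according to the data structure of its topic."""
    structure = TOPIC_STRUCTURES[topic]
    if structure["format"] == "json":
        return json.loads(raw)
    # Comma-delimited text: pair each value with its configured field name.
    return dict(zip(structure["fields"], raw.split(",")))


def acquire(topic: str, queue: dict[str, list[str]]) -> list[dict]:
    """Fetch every message for `topic` from an in-memory stand-in for the queue."""
    return [parse_source(topic, message) for message in queue.get(topic, [])]


if __name__ == "__main__":
    queue = {"call_record": ["13800000000,13900000000,2021-11-19 10:00:00,60"]}
    print(acquire("call_record", queue))
```

Keeping the structure lookup separate from the queue read means a new topic only needs a new entry in the correspondence table.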
S202: Determine a preset auditing mode, wherein the preset auditing mode comprises a sliding time window or a sliding event window.
In this embodiment, the processing device samples the data to be audited according to a sliding time window or a sliding event window to obtain audit sampling data, and verifies the audit sampling data according to a preset verification rule to obtain an audit verification result. This consumes fewer computing resources and is faster than full auditing.
A sliding window can be understood as a frame that divides a time series into segments of a specified unit length so that a statistical index can be computed within each frame. It is analogous to a slider of a specified length moving along a scale: each time the slider moves by one unit, the data inside the slider is reported. The specified unit length may be a length of time, a number of events, and so on. A sliding time window frames the time series by a specified length of time, and a sliding event window frames it by a specified number of events; in both cases the statistical index is computed within each frame.
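The framing described above can be illustrated with a short sketch, assuming that window k covers the half-open interval [k * slide, k * slide + width) and using the record count as the statistical index. The function name `window_counts` and its parameters are hypothetical. For a sliding event window, the same code applies if each timestamp is replaced by the record's position in the stream.

```python
import math
from collections import defaultdict


def window_counts(timestamps: list[float], width: float, slide: float) -> dict[float, int]:
    """Count records per sliding window.

    Window k covers the half-open interval [k * slide, k * slide + width);
    timestamps are assumed to be non-negative.
    """
    if width <= 0 or slide <= 0:
        raise ValueError("width and slide must be positive")
    counts = defaultdict(int)
    for t in timestamps:
        # t lies in window k exactly when k * slide <= t < k * slide + width.
        k_last = math.floor(t / slide)
        k_first = max(math.floor((t - width) / slide) + 1, 0)
        for k in range(k_first, k_last + 1):
            counts[k * slide] += 1  # key each window by its start time
    return dict(sorted(counts.items()))
```

For example, `window_counts([0, 1, 2, 3, 4, 5], width=4, slide=2)` gives `{0: 4, 2: 4, 4: 2}`; the windows overlap because slide is smaller than width, so records 2 through 5 are counted twice.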
In addition, the step size and window size of the sliding time window may both be made adjustable, and likewise the step size and window size of the sliding event window; that is, the sampling frequency is adjustable, which meets the requirements of various applications. The processing device may define a step size (slide) and a window size (width) with which the sliding time window or sliding event window samples data. When slide equals width, consecutive windows tile the stream and every record falls into exactly one window, which is equivalent to a full audit; when slide is less than width, consecutive windows overlap and part of the data is audited repeatedly. Slide is therefore kept no smaller than width, so that no data is sampled and audited twice. That is, the step size of the sliding time window is greater than or equal to its window size, and the step size of the sliding event window is greater than or equal to its window size.
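A minimal sketch of the sampling rule, assuming records arrive as an ordered sequence and that sampling keeps the records inside each window: the event-window variant keeps the first `width` records out of every `slide` records, and the time-window variant keeps records whose timestamp falls in the first `width` time units of each `slide`-long period. Both reject `slide < width`, following the constraint stated above. The function names and the `timestamp` accessor are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def _validate(width: float, slide: float) -> None:
    if width <= 0 or slide <= 0:
        raise ValueError("width and slide must be positive")
    if slide < width:
        # Overlapping windows would sample, and audit, some records twice.
        raise ValueError("slide must be greater than or equal to width")


def sample_event_window(records: Sequence[T], width: int, slide: int) -> list[T]:
    """Keep the first `width` records out of every `slide` records."""
    _validate(width, slide)
    return [r for i, r in enumerate(records) if i % slide < width]


def sample_time_window(records: Sequence[T], width: float, slide: float,
                       timestamp: Callable[[T], float], start: float = 0.0) -> list[T]:
    """Keep records whose timestamp falls in the first `width` units of each `slide` period."""
    _validate(width, slide)
    return [r for r in records
            if timestamp(r) >= start and (timestamp(r) - start) % slide < width]
```

With `width=2, slide=5`, `sample_event_window(list(range(10)), 2, 5)` keeps records 0, 1, 5 and 6, i.e. 40% of the stream; setting `slide == width` keeps every record, which corresponds to the full audit.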
For example, when determining the preset auditing mode, the processing device may first determine whether the preset auditing mode is cached in memory. If it is, the processing device can directly acquire the cached preset auditing mode; the cache is refreshed, after a preset time has elapsed, from the auditing mode stored in the preset database.
Because the processing device refreshes the cache from the auditing mode stored in the preset database after the preset time, rules can be modified dynamically, meeting the requirements of real-time data auditing.
The auditing mode stored in the preset database may be set by a user according to different requirements. The preset time may be determined according to actual conditions, for example 1 hour.
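The in-memory cache with periodic refresh can be sketched as below, assuming a `loader` callable that reads the auditing mode (or, equally, a check rule or abnormal data processing rule) from the preset database. The application does not say what happens when nothing is cached yet; this sketch assumes the item is then loaded from the database. The class name, the one-hour default and the injectable clock are illustrative choices.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class RefreshingCache(Generic[T]):
    """In-memory copy of a configuration item, reloaded after a preset interval."""

    def __init__(self, loader: Callable[[], T], refresh_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader              # reads the item from the preset database
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._refresh_seconds:
            # Not cached yet, or the preset time has elapsed: reload from the database.
            self._value = self._loader()
            self._loaded_at = now
        return self._value  # type: ignore[return-value]
```

Injecting the clock keeps the refresh logic testable without waiting an hour; a change written to the database becomes visible at the first `get()` after the interval.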
S203: Sample the data to be audited according to the sliding time window or the sliding event window to obtain audit sampling data.
S204: Verify the audit sampling data according to a preset verification rule to obtain an audit verification result.
The preset verification rule comprises one or more of null value verification, regular-expression verification, date verification, numeric verification, enumeration verification and identity verification.
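The listed verification types can be sketched as small predicate factories combined into a per-field rule table. The field names, the 11-digit phone pattern, the date format and the age range are hypothetical examples. Identity verification is not sketched because the application does not define what it validates.

```python
import re
from datetime import datetime
from typing import Any, Callable

Check = Callable[[Any], bool]


def not_null() -> Check:
    """Null value check: rejects None and blank strings."""
    return lambda v: v is not None and str(v).strip() != ""


def matches(pattern: str) -> Check:
    """Regular-expression check: the whole value must match `pattern`."""
    compiled = re.compile(pattern)
    return lambda v: v is not None and compiled.fullmatch(str(v)) is not None


def is_date(fmt: str = "%Y-%m-%d") -> Check:
    """Date check: the value must parse with `fmt`."""
    def check(v: Any) -> bool:
        try:
            datetime.strptime(str(v), fmt)
            return True
        except ValueError:
            return False
    return check


def in_range(low: float, high: float) -> Check:
    """Numeric check: the value must be a number within [low, high]."""
    def check(v: Any) -> bool:
        try:
            return low <= float(v) <= high
        except (TypeError, ValueError):
            return False
    return check


def one_of(*allowed: Any) -> Check:
    """Enumeration check: the value must be one of `allowed`."""
    return lambda v: v in allowed


# Hypothetical rule table: field name -> checks that must all pass.
RULES: dict[str, list[Check]] = {
    "user_id": [not_null()],
    "phone": [not_null(), matches(r"1\d{10}")],
    "reg_date": [is_date()],
    "age": [in_range(0, 150)],
    "gender": [one_of("M", "F")],
}


def verify(record: dict, rules: dict[str, list[Check]]) -> tuple[str, list[str]]:
    """Return ("normal" or "abnormal", names of fields that failed a check)."""
    failed = [field for field, checks in rules.items()
              if not all(check(record.get(field)) for check in checks)]
    return ("abnormal" if failed else "normal"), failed
```

Returning the failed field names alongside the verdict gives the later quality report something more specific than a pass/fail flag.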
Before verifying the audit sampling data according to the preset verification rule to obtain the audit verification result, the processing device may first determine whether the preset verification rule is cached in memory. If it is, the processing device may directly acquire the cached preset verification rule; the cache is refreshed, after a preset time has elapsed, from the verification rule stored in the preset database.
The verification rule stored in the preset database may be set by a user according to different needs. The preset time may be determined according to actual conditions, for example 1 hour.
In this embodiment, the data to be audited is acquired and a preset auditing mode, a sliding time window or a sliding event window, is determined for it. The data to be audited is sampled according to that window to obtain audit sampling data, and the audit sampling data is verified according to the preset verification rule to obtain an audit verification result. This addresses the problems that existing full-volume data auditing consumes considerable computing resources and is slow. Moreover, the step size and window size of both the sliding time window and the sliding event window are adjustable, so the sampling frequency is adjustable and various application requirements can be met.
In addition, in this embodiment, after the audit sampling data is verified according to the preset verification rule to obtain the audit verification result, warehousing processing may be performed on the data to be audited according to the audit verification result and a preset abnormal data processing rule, so that relevant personnel can later obtain the data directly from the warehouse. After warehousing, the processing device may also generate an alarm and/or a quality report according to the warehousing processing result, making it convenient for relevant personnel to review.
Fig. 3 is a schematic flowchart of another data quality auditing method according to an embodiment of the present application. As shown in Fig. 3, the method includes:
S301: Acquire data to be audited.
S302: Determine a preset auditing mode, wherein the preset auditing mode comprises a sliding time window or a sliding event window.
The step size and the window size of the sliding time window may be set to be both adjustable, and similarly, the step size and the window size of the sliding event window may also be set to be both adjustable.
S303: Sample the data to be audited according to the sliding time window or the sliding event window to obtain audit sampling data.
S304: Verify the audit sampling data according to a preset verification rule to obtain an audit verification result.
For steps S301 to S304, refer to the descriptions of steps S201 to S204; details are not repeated here.
S305: Perform warehousing processing on the data to be audited according to the audit verification result and a preset abnormal data processing rule.
The audit verification result is either a normal verification result or an abnormal verification result, and the preset abnormal data processing rule is normal warehousing, discarding or separate warehousing.
For example, the processing device may perform normal warehousing, discarding or separate warehousing on the data to be audited according to whether the verification result is normal or abnormal.
For example, under normal warehousing, the processing device may write data whose check result is normal into the corresponding queue and discard data whose check result is abnormal. Under separate warehousing, the processing device may write data whose check result is normal into the normal queue and data whose check result is abnormal into the abnormal queue.
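A sketch of the warehousing step, assuming each verified record carries the verdict `"normal"` or `"abnormal"` and that the queues are plain lists. The description above of normal warehousing coincides with discarding abnormal data, so the behaviour given here to `Policy.NORMAL` (write every record to the normal queue regardless of its verdict) is an assumption; `DISCARD` and `SEPARATE` follow the description directly.

```python
from enum import Enum
from typing import Any, Iterable


class Policy(Enum):
    NORMAL = "normal warehousing"
    DISCARD = "discard"
    SEPARATE = "separate warehousing"


def warehouse(results: Iterable[tuple[Any, str]], policy: Policy) -> tuple[list, list]:
    """Route (record, verdict) pairs into a normal queue and an abnormal queue."""
    normal_queue: list = []
    abnormal_queue: list = []
    for record, verdict in results:
        if verdict == "normal" or policy is Policy.NORMAL:
            # Normal verdicts always go to the normal queue; under the assumed
            # NORMAL semantics, abnormal ones do too.
            normal_queue.append(record)
        elif policy is Policy.SEPARATE:
            abnormal_queue.append(record)
        # Policy.DISCARD: abnormal records are dropped.
    return normal_queue, abnormal_queue
```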
In addition, before performing warehousing processing on the data to be audited according to the audit verification result and the preset abnormal data processing rule, the processing device may first determine whether the preset abnormal data processing rule is cached in memory. If it is, the processing device may directly acquire the cached preset abnormal data processing rule; the cache is refreshed, after a preset time has elapsed, from the abnormal data processing rule stored in the preset database.
Because the processing device refreshes the cache from the abnormal data processing rule stored in the preset database after the preset time, the rule can be modified dynamically, meeting the requirements of real-time data auditing.
S306: Generate an alarm and/or a quality report according to the warehousing processing result.
For example, the processing device may generate an alarm according to the data stored in the abnormal queue, and/or generate a quality report according to the data stored in the normal queue and the abnormal queue.
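One way to derive the alarm and quality report from the two queues is sketched below. The `QualityReport` fields, the pass-rate metric and the `max_abnormal` alarm threshold are hypothetical, since the application does not specify the report contents. Under the discard rule the abnormal queue stays empty, so abnormal counts would have to be tracked elsewhere.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    total: int
    normal: int
    abnormal: int
    pass_rate: float
    alarm: bool


def build_report(normal_queue: list, abnormal_queue: list, max_abnormal: int = 0) -> QualityReport:
    """Summarise one warehousing run; raise an alarm when abnormal records exceed the threshold."""
    normal, abnormal = len(normal_queue), len(abnormal_queue)
    total = normal + abnormal
    pass_rate = normal / total if total else 1.0  # an empty run counts as fully passing
    return QualityReport(total, normal, abnormal, pass_rate, alarm=abnormal > max_abnormal)
```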
In this embodiment, after the audit sampling data is verified according to the preset verification rule to obtain the audit verification result, warehousing processing may be performed on the data to be audited according to the audit verification result and the preset abnormal data processing rule, so that relevant personnel can later obtain the data directly from the warehouse. After warehousing, the processing device may also generate an alarm and/or a quality report according to the warehousing processing result, making it convenient for relevant personnel to review. As before, the data to be audited is sampled according to the sliding time window or the sliding event window to obtain audit sampling data, which is verified according to the preset verification rule to obtain the audit verification result; this addresses the problems that existing full-volume auditing consumes considerable computing resources and is slow. The step size and window size of both the sliding time window and the sliding event window are adjustable, so the sampling frequency is adjustable and various application requirements can be met.
In this embodiment, as shown in Fig. 4, the processing device may first determine a data topic to be audited, obtain source data from a preset message queue according to the data topic, and parse the source data according to its data structure to obtain the data to be audited. Then, if the configured auditing mode is a sliding time window or a sliding event window, the processing device samples the data to be audited according to that window to obtain audit sampling data; data outside the window (non-window data) is sent directly to the sink node.
The processing device then checks the audit sampling data according to a preset check rule to obtain an audit check result. The preset check rule includes null value checks, regular-expression checks, date checks, numeric checks, enumeration checks, identity checks and the like. Next, the processing device processes the audited data according to the check result and the abnormal data processing rule and sends it to the sink node, through which the data to be audited is warehoused. Under normal warehousing, data whose check result is normal is written into the corresponding queue and data whose check result is abnormal is discarded. Under separate warehousing, data whose check result is normal is written into the normal queue and data whose check result is abnormal is written into the abnormal queue. After warehousing, the processing device may also generate an alarm and/or a quality report according to the warehousing processing result.
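Putting the Fig. 4 flow together for a sliding event window, a minimal sketch might look as follows. It assumes that non-window data sent directly to the sink lands in the normal queue unchecked, and that a single `check` predicate stands in for the preset check rules; both are illustrative assumptions, as are the function and parameter names.

```python
from typing import Any, Callable

Record = dict[str, Any]


def audit_stream(records: list[Record], width: int, slide: int,
                 check: Callable[[Record], bool],
                 separate: bool = True) -> tuple[list[Record], list[Record]]:
    """Event-window audit: sampled records are checked, others go straight to the sink."""
    if width <= 0 or slide < width:
        raise ValueError("require 0 < width <= slide")
    sink_normal: list[Record] = []
    sink_abnormal: list[Record] = []
    for i, record in enumerate(records):
        if i % slide >= width:
            sink_normal.append(record)    # non-window data: sent to the sink unchecked
        elif check(record):
            sink_normal.append(record)    # sampled and passed the check
        elif separate:
            sink_abnormal.append(record)  # separate warehousing of abnormal data
        # else: sampled, failed the check, and discarded
    return sink_normal, sink_abnormal


if __name__ == "__main__":
    data = [{"v": x} for x in [1, -1, 2, -2, 3, -3]]
    print(audit_stream(data, width=2, slide=3, check=lambda r: r["v"] > 0))
```

The trade-off of sampling is visible here: with `width=2, slide=3`, the record `{"v": -3}` falls outside the window and reaches the normal queue without being checked.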
The processing device can cache the preset auditing mode, the preset verification rule, and the abnormal data processing rule in memory, and refresh the cache from the auditing mode, verification rule, and abnormal data processing rule stored in a preset database each time a preset interval elapses. The rules can therefore be modified dynamically, which suits practical applications.
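One way to realize in-memory caching with periodic refresh is a time-to-live (TTL) cache. The sketch below assumes the refresh happens lazily on the first access after the preset interval (the application does not say whether refresh is lazy or scheduled); `RuleCache` and the `load_from_db` callable, which stands in for the query against the preset database, are hypothetical names. The clock is injectable so the refresh behavior can be tested.

```python
import time
from typing import Any, Callable, Dict, Optional


class RuleCache:
    """Caches audit configuration in memory and reloads it after `ttl_seconds`."""

    def __init__(self, load_from_db: Callable[[], Dict[str, Any]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db  # hypothetical loader for the preset database
        self._ttl = ttl_seconds
        self._clock = clock
        self._config: Optional[Dict[str, Any]] = None
        self._loaded_at = 0.0

    def get(self) -> Dict[str, Any]:
        now = self._clock()
        # Cache miss, or the preset interval has elapsed: reload from the database.
        if self._config is None or now - self._loaded_at >= self._ttl:
            self._config = self._load()
            self._loaded_at = now
        return self._config
```

Edits made in the database take effect at most one interval later, without restarting the auditing process.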
Compared with the prior art, the processing device samples the data to be audited with a sliding time window or a sliding event window to obtain audit sampling data and verifies only that sample against the preset verification rule to obtain the audit verification result, which avoids the high computing-resource consumption and slow auditing speed of existing full auditing. Because the step size and window size of both the sliding time window and the sliding event window are adjustable, the sampling frequency is adjustable and can meet the requirements of different applications. In addition, after obtaining the audit verification result, the processing device can warehouse the data to be audited according to that result and the preset abnormal data processing rule, so that relevant personnel can retrieve the data directly from the warehouse. After warehousing, it can also generate an alarm and/or a quality report from the warehousing result for those personnel to review.
Fig. 5 is a schematic structural diagram of a data quality auditing apparatus according to an embodiment of the present application; the apparatus corresponds to the data quality auditing method of the embodiments. For convenience of explanation, only the parts related to the embodiments of the present application are shown. The data quality auditing apparatus 50 includes an obtaining module 501, a determining module 502, a sampling module 503, and a checking module 504. The apparatus may be the processing device itself, or a chip or integrated circuit that implements the functions of the processing device. Note that the division into the obtaining, determining, sampling, and checking modules is only a division of logical functions; physically, these modules may be integrated or separate.
The obtaining module 501 is configured to obtain data to be audited.
The determining module 502 is configured to determine a preset auditing manner, where the preset auditing manner includes a sliding time window or a sliding event window.
The sampling module 503 is configured to perform data sampling on the data to be audited according to the sliding time window or the sliding event window to obtain audit sampling data.
The checking module 504 is configured to check the audit sampling data according to a preset checking rule to obtain an audit checking result.
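Because the four modules are only a logical division, they can be wired together as interchangeable callables. The following is a hypothetical Python skeleton (the `DataQualityAuditor` class and its field names are assumptions) showing the order in which the obtaining, determining, sampling, and checking modules are invoked.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class DataQualityAuditor:
    obtain: Callable[[], List[Record]]                              # obtaining module 501
    determine_mode: Callable[[], Dict[str, Any]]                    # determining module 502
    sample: Callable[[List[Record], Dict[str, Any]], List[Record]]  # sampling module 503
    verify: Callable[[Record], bool]                                # checking module 504

    def run(self) -> List[Tuple[Record, bool]]:
        records = self.obtain()
        mode = self.determine_mode()
        sampled = self.sample(records, mode)
        # Only the sampled records are verified, not the full data set.
        return [(record, self.verify(record)) for record in sampled]
```

Swapping any one callable (for example, a time-window sampler for an event-window sampler) leaves the other modules unchanged, which mirrors the note that the modules may be integrated or separate.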
In a possible implementation manner, both the step size and the window size of the sliding time window are adjustable, and both the step size and the window size of the sliding event window are adjustable.
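As a rough illustration of why adjustable step and window sizes make the sampling frequency adjustable, assume (hypothetically) that a new window of size $w$ opens every step $s$ and that records arrive at a roughly uniform rate. The fraction of records falling into at least one window is then approximately

$$\rho \approx \min\left(1, \frac{w}{s}\right)$$

For example, with a 10 s window opened every 60 s, about $1/6 \approx 16.7\%$ of the records are audited; when $s \le w$ the windows overlap, every record is covered, and the audit degenerates into a full audit.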
In a possible implementation manner, the obtaining module 501 is specifically configured to:
determine the data topic to be audited, and obtain source data from a preset message queue according to the data topic; and
parse the source data according to its data structure to obtain the data to be audited.
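The obtaining step can be sketched with an in-memory stand-in for the message queue. In this hypothetical Python example, the queue is a dict mapping topic names to raw messages, and the "data structure" of the source data is described by a small dict giving a format (`json` or `delimited`), the field names, and an optional separator; none of these names come from the application, and a real deployment would consume from its actual message-queue client instead.

```python
import json
from typing import Any, Dict, List, Union

RawMessage = Union[bytes, str]


def fetch_source_data(queue: Dict[str, List[RawMessage]], topic: str) -> List[RawMessage]:
    """Stand-in for consuming one topic from a message queue (topic -> messages)."""
    return list(queue.get(topic, []))


def parse_source(messages: List[RawMessage], structure: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Parse raw messages into records according to a hypothetical structure description."""
    fields = structure["fields"]
    records = []
    for raw in messages:
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if structure["format"] == "json":
            obj = json.loads(text)
            records.append({f: obj.get(f) for f in fields})
        elif structure["format"] == "delimited":
            values = text.split(structure.get("sep", ","))
            records.append(dict(zip(fields, values)))
        else:
            raise ValueError(f"unsupported format: {structure['format']}")
    return records
```

Missing JSON fields are kept as `None` so that a later null-value check can flag them instead of the parser silently dropping the record.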
In a possible implementation manner, the determining module 502 is specifically configured to:
determine whether the preset auditing mode is cached in memory; and
if it is, obtain the cached preset auditing mode, where the cache is refreshed from the auditing mode stored in a preset database each time a preset interval elapses.
In a possible implementation manner, the preset check rule includes one or more of a null-value check, a regular-expression check, a date check, a numerical check, an enumeration check, and an identity check.
The apparatus provided in the embodiment of the present application may be used to implement the technical solution of the method embodiment in fig. 2, which has similar implementation principles and technical effects, and is not described herein again in the embodiment of the present application.
Fig. 6 is a schematic structural diagram of another data quality auditing apparatus according to an embodiment of the present application. On the basis of fig. 5, the data quality auditing device 50 further includes: an exception handling module 505 and a result handling module 506.
The exception handling module 505 is configured to:
perform warehousing processing on the data to be audited according to the audit verification result and a preset abnormal data processing rule.
In a possible implementation manner, the audit verification result includes that the verification result is normal or the verification result is abnormal, and the preset abnormal data processing rule includes normal warehousing, discarding or separate warehousing.
The exception handling module 505 is specifically configured to:
perform normal warehousing, discarding, or separate warehousing on the data to be audited according to whether the verification result is normal or abnormal.
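The three abnormal data processing rules can be modeled as routing decisions between a normal queue and an abnormal queue. In this hypothetical Python sketch, records with a normal check result always go to the normal queue; for abnormal records, the assumed behavior is that discarding drops them, separate warehousing sends them to the abnormal queue, and normal warehousing stores them in the normal queue as well. The enum and function names are illustrative only.

```python
from enum import Enum
from typing import Any, Iterable, List, Tuple


class ProcessingRule(Enum):
    NORMAL_WAREHOUSING = "normal"
    DISCARD = "discard"
    SEPARATE_WAREHOUSING = "separate"


def warehouse(results: Iterable[Tuple[Any, bool]],
              rule: ProcessingRule) -> Tuple[List[Any], List[Any]]:
    """Route (record, is_normal) pairs into (normal_queue, abnormal_queue)."""
    normal_queue: List[Any] = []
    abnormal_queue: List[Any] = []
    for record, is_normal in results:
        if is_normal or rule is ProcessingRule.NORMAL_WAREHOUSING:
            # Assumption: normal warehousing stores abnormal records too.
            normal_queue.append(record)
        elif rule is ProcessingRule.SEPARATE_WAREHOUSING:
            abnormal_queue.append(record)
        # ProcessingRule.DISCARD: abnormal records are dropped.
    return normal_queue, abnormal_queue
```

Keeping the rule as data rather than code lets it be stored in the preset database and refreshed along with the auditing mode.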
In one possible implementation, the result processing module 506 is configured to:
generate an alarm and/or a quality report according to the warehousing processing result.
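A quality report can be as simple as counts, a pass rate, and per-check failure tallies, with an alarm raised when the pass rate drops below a threshold. The Python sketch below is hypothetical: the `build_quality_report` name, the `(is_normal, failed_checks)` input shape, and the default 95% alarm threshold are assumptions, not values given in the application.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def build_quality_report(results: List[Tuple[bool, List[str]]],
                         alarm_threshold: float = 0.95) -> Dict[str, Any]:
    """Summarize (is_normal, failed_checks) pairs and flag an alarm on a low pass rate."""
    total = len(results)
    normal = sum(1 for is_normal, _ in results if is_normal)
    failures = Counter(label for _, labels in results for label in labels)
    # An empty batch is treated as fully passing so that it raises no alarm.
    pass_rate = normal / total if total else 1.0
    return {
        "total": total,
        "normal": normal,
        "abnormal": total - normal,
        "pass_rate": pass_rate,
        "failures_by_check": dict(failures),
        "alarm": pass_rate < alarm_threshold,
    }
```

The per-check tallies show which rule fails most often, which is usually more actionable for relevant personnel than the pass rate alone.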
The apparatus provided in the embodiment of the present application may be used to implement the technical solution of the method embodiment in fig. 3, which has similar implementation principles and technical effects, and is not described herein again in the embodiment of the present application.
Optionally, Figs. 7A and 7B each schematically show a possible basic hardware architecture of the data quality auditing device described in the present application.
Referring to fig. 7A and 7B, the data quality auditing apparatus includes at least one processor 701 and a communication interface 703. Further optionally, a memory 702 and a bus 704 may also be included.
In the data quality auditing device, the number of the processors 701 may be one or more, and fig. 7A and 7B only illustrate one of the processors 701. Alternatively, the processor 701 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Digital Signal Processor (DSP). If the data quality auditing apparatus has multiple processors 701, the types of the multiple processors 701 may be different, or may be the same. Optionally, the plurality of processors 701 of the data quality auditing apparatus may also be integrated into a multi-core processor.
Memory 702 stores computer instructions and data; the memory 702 may store computer instructions and data required to implement the above-described data quality auditing methods provided herein, e.g., the memory 702 stores instructions for implementing the steps of the above-described data quality auditing methods. Memory 702 can be any one or any combination of the following storage media: nonvolatile memory (e.g., Read Only Memory (ROM), Solid State Disk (SSD), hard disk (HDD), optical disk), volatile memory.
The communication interface 703 may provide information input/output for the at least one processor, and may include any one or any combination of the following devices with a network access function: a network interface (e.g., an Ethernet interface), a wireless network card, and the like.
Optionally, the communication interface 703 may also be used for data communication between the data quality auditing device and other computing devices or terminals.
Further optionally, the bus 704 is shown as a thick line in Figs. 7A and 7B. The bus 704 may connect the processor 701 with the memory 702 and the communication interface 703. Thus, via the bus 704, the processor 701 may access the memory 702 and may also interact with other computing devices or terminals through the communication interface 703.
In the present application, the data quality auditing device executes the computer instructions in the memory 702, so that the data quality auditing device implements the data quality auditing method provided by the present application, or the data quality auditing device deploys the data quality auditing apparatus.
From the viewpoint of logical functional division, as shown in fig. 7A, the memory 702 may include an obtaining module 501, a determining module 502, a sampling module 503, and a checking module 504. The inclusion herein merely refers to that instructions stored in the memory may, when executed, implement the functionality of the acquisition module, the determination module, the sampling module, and the verification module, respectively, and is not limited to physical structures.
In one possible design, as shown in fig. 7B, the exception handling module 505 and the result handling module 506 are included in the memory 702, and the inclusion merely refers to that the instructions stored in the memory can implement the functions of the exception handling module and the result handling module, respectively, when executed, and is not limited to a physical structure.
In addition, the data quality auditing device can be implemented by software as shown in fig. 7A and 7B, or can be implemented by hardware as a hardware module or as a circuit unit.
The present application provides a computer-readable storage medium storing computer instructions that instruct a computing device to perform the above-mentioned data quality auditing method provided herein.
An embodiment of the present application provides a computer program product, which includes computer instructions, where the computer instructions are executed by a processor to perform the data quality auditing method provided in the present application.
The present application provides a chip comprising at least one processor and a communication interface providing information input and/or output for the at least one processor. Further, the chip may also include at least one memory for storing computer instructions. The at least one processor is used for calling and running the computer instructions to execute the data quality auditing method provided by the application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

Claims (10)

1. A data quality auditing method is characterized by comprising the following steps:
acquiring data to be audited;
determining a preset auditing mode, wherein the preset auditing mode comprises a sliding time window or a sliding event window;
according to the sliding time window or the sliding event window, carrying out data sampling on the data to be audited to obtain audited sampling data;
and verifying the audit sampling data according to a preset verification rule to obtain an audit verification result.
2. The method of claim 1, wherein the step size and window size of the sliding time window are adjustable, and wherein the step size and window size of the sliding event window are adjustable.
3. The method according to claim 1 or 2, wherein the obtaining data to be audited comprises:
determining a data topic to be audited, and acquiring source data from a preset message queue according to the data topic; and
parsing the source data according to the data structure of the source data to obtain the data to be audited.
4. The method according to claim 1 or 2, wherein the determining the predetermined audit mode comprises:
determining whether the preset auditing mode is cached in a memory; and
if the preset auditing mode is cached in the memory, acquiring the cached preset auditing mode, wherein the cache is refreshed, each time a preset interval elapses, based on the auditing mode stored in a preset database.
5. The method according to claim 1 or 2, wherein after the audit sampling data is verified according to a preset verification rule to obtain an audit verification result, the method further comprises:
performing warehousing processing on the data to be audited according to the audit verification result and a preset abnormal data processing rule.
6. The method according to claim 5, wherein the audit verification result comprises a normal verification result or an abnormal verification result, and the preset abnormal data processing rule comprises normal warehousing, discarding, or separate warehousing;
the performing warehousing processing on the data to be audited according to the audit verification result and a preset abnormal data processing rule comprises:
performing normal warehousing, discarding, or separate warehousing on the data to be audited according to the normal verification result or the abnormal verification result.
7. A data quality auditing apparatus, comprising:
the acquisition module is used for acquiring data to be audited;
the determining module is used for determining a preset auditing mode, wherein the preset auditing mode comprises a sliding time window or a sliding event window;
the sampling module is used for carrying out data sampling on the data to be audited according to the sliding time window or the sliding event window to obtain audited sampling data;
and the checking module is used for checking the audit sampling data according to a preset checking rule to obtain an audit checking result.
8. A data quality auditing apparatus, comprising:
a processor;
a memory; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1-6.
9. A computer-readable storage medium, characterized in that it stores a computer program that causes a server to execute the method of any one of claims 1-6.
10. A computer program product comprising computer instructions for executing the method of any one of claims 1-6 by a processor.
CN202111376117.6A 2021-11-19 2021-11-19 Data quality auditing method, device, equipment and storage medium Pending CN114281807A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111376117.6A CN114281807A (en) 2021-11-19 2021-11-19 Data quality auditing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111376117.6A CN114281807A (en) 2021-11-19 2021-11-19 Data quality auditing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114281807A true CN114281807A (en) 2022-04-05

Family

ID=80869458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111376117.6A Pending CN114281807A (en) 2021-11-19 2021-11-19 Data quality auditing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114281807A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880312A (en) * 2022-05-17 2022-08-09 三峡高科信息技术有限责任公司 Flexibly-set application system service data auditing method
CN114880312B (en) * 2022-05-17 2023-02-28 三峡高科信息技术有限责任公司 Flexibly-set application system service data auditing method

Similar Documents

Publication Publication Date Title
CN112162965B (en) Log data processing method, device, computer equipment and storage medium
CN108874672B (en) Application program exception positioning method, device, equipment and storage medium
CN110727556A (en) BMC health state monitoring method, system, terminal and storage medium
US20120089724A1 (en) Diagnosis of application server performance problems via thread level pattern analysis
CN108845914A (en) Generation method, electronic device and the readable storage medium storing program for executing of performance test report
CN112817831A (en) Application performance monitoring method, device, computer system and readable storage medium
CN110825731A (en) Data storage method and device, electronic equipment and storage medium
CN110807050B (en) Performance analysis method, device, computer equipment and storage medium
CN112463254A (en) Method, device and equipment for acquiring webpage loading time and storage medium
CN114281807A (en) Data quality auditing method, device, equipment and storage medium
US9569614B2 (en) Capturing correlations between activity and non-activity attributes using N-grams
CN112860762A (en) Method and apparatus for detecting time period overlap
CN110619541B (en) Application program management method, device, computer equipment and storage medium
CN116781568A (en) Data monitoring alarm method, device, equipment and storage medium
CN115619261A (en) Job label portrait data processing method and device and computer equipment
CN110020166A (en) A kind of data analysing method and relevant device
CN105245380B (en) Message propagation mode identification method and device
CN107122284A (en) Using monitoring method, device, electronic equipment and storage medium
CN113568773B (en) Abnormal service classification method, device, equipment and storage medium
CN115314404B (en) Service optimization method, device, computer equipment and storage medium
CN113905400B (en) Network optimization processing method and device, electronic equipment and storage medium
US20220269579A1 (en) Performance metric monitoring and feedback system
CN115757373A (en) Data warehouse cleaning method and device, computer equipment and storage medium
CN115344212A (en) Data desensitization method, device and system, electronic equipment and storage medium
CN114429360A (en) Conversion rate determination method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination