CN117851397A - Data quality detection method, device, equipment and storage medium - Google Patents

Data quality detection method, device, equipment and storage medium

Info

Publication number
CN117851397A
Authority
CN
China
Prior art keywords
detection
data
detected
data table
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410223333.4A
Other languages
Chinese (zh)
Inventor
胡迁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Geely Holding Group Co Ltd
Zhejiang Zeekr Intelligent Technology Co Ltd
Original Assignee
Zhejiang Geely Holding Group Co Ltd
Zhejiang Zeekr Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Geely Holding Group Co Ltd, Zhejiang Zeekr Intelligent Technology Co Ltd filed Critical Zhejiang Geely Holding Group Co Ltd
Priority to CN202410223333.4A priority Critical patent/CN117851397A/en
Publication of CN117851397A publication Critical patent/CN117851397A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/23 Updating
    • G06F16/25 Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data quality detection method, apparatus, device, and storage medium, applied in the field of big data. The method comprises: in response to a trigger instruction, acquiring, from a preset metadata module and according to the identification of a data table to be detected, the update time of the data table; acquiring, from a preset data quality module, the detection time of the last quality detection of the data table; determining whether the data table has been updated after the detection time; and, if it has not, generating first prompt information to notify the user that the data table has not been updated, and incrementing by one, in the data quality module, the number of unexecuted detections of the data table. The method achieves efficient data quality detection and avoids the resource waste caused by repeating a detection whose result would be unchanged.

Description

Data quality detection method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of big data technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting data quality.
Background
With the development of information technology and the popularization of the internet, we have entered an era of data explosion in which large amounts of data are generated at every moment. Big data technology emerged so that these data can be better utilized and value can be obtained from them. Big data can be defined as data that is huge in volume, diverse in source, complex in structure, and in need of efficient processing, analysis, and management. To extract high-quality value from big data, the quality of the data is therefore particularly important.
In the prior art, quality detection of a data table is usually performed by a timed detection task set by the user.
However, a data table with a large data volume and a slow update rate may not have been updated when the timed task fires, yet the detection is repeated at the interval set by the user. This wastes a large amount of resources and reduces the efficiency of quality detection for the data table.
Disclosure of Invention
The application provides a data quality detection method, a device, equipment and a storage medium, which are used for solving the problem of how to improve the data quality detection efficiency.
In a first aspect, the present application provides a data quality detection method, including:
responding to a trigger instruction, acquiring the update time of a data table to be detected in a preset metadata module according to the identification of the data table to be detected, and acquiring the detection time of the last quality detection of the data table to be detected in a preset data quality module, wherein the trigger instruction is used for indicating the quality detection of the data table to be detected, the metadata module comprises at least one data table, the identification of each data table, the update time and the detection result of the at least one data table, and the data quality module comprises the detection unexecuted times of the at least one data table, the detection time of each quality detection and the detection result;
determining, according to the trigger instruction, the update time and the detection time, whether the data table to be detected has been updated after the detection time;
if it is determined that the data table to be detected has not been updated after the detection time, incrementing by one, in the data quality module, the number of unexecuted detections of the data table to be detected, and generating first prompt information, wherein the first prompt information is used for prompting a user that the data table to be detected has not been updated;
and sending the first prompt information to the user.
With reference to the first aspect, in some embodiments, the method further includes:
if the data table to be detected is updated after the detection time, performing quality detection on the data table to be detected according to the trigger instruction, the identification of the data table to be detected and a preset data standard to obtain new detection time and a new detection result;
and storing the new detection time and the detection result in the data quality module, and sending the detection result to the user.
With reference to the first aspect, in some embodiments, before the acquiring, in response to the trigger instruction, of the update time of the data table to be detected from the preset metadata module and of the detection time of the last quality detection of the data table to be detected from the preset data quality module, the method further includes:
receiving a quality detection request sent by the user, wherein the quality detection request comprises the identification of the data table to be detected.
With reference to the first aspect, in some embodiments, the quality detection request further includes a detection instruction for a timed detection task and a detection interval duration, and the method further includes:
predicting, according to the identification of the data table to be detected and the metadata module, the detection duration of the data table to be detected, to obtain a target detection duration corresponding to the data table to be detected;
determining whether the detection interval duration is greater than or equal to the target detection duration;
and if the detection interval duration is smaller than the target detection duration, generating second prompt information, and sending the second prompt information to the user, wherein the second prompt information comprises the target detection duration and is used for prompting the user to modify the detection interval duration.
With reference to the first aspect, in some embodiments, the quality detection request further includes the number of executions of the timed detection task, and the method further includes:
if it is determined that the detection interval duration is greater than or equal to the target detection duration, acquiring the number of unexecuted detections of the data table to be detected from the data quality module, and determining whether the number of unexecuted detections reaches a preset count threshold;
if the number of unexecuted detections reaches the count threshold, generating third prompt information, and sending the third prompt information to the user, wherein the third prompt information is used for prompting the user to modify the number of executions.
With reference to the first aspect, in some embodiments, the method further includes:
and if the detection unexecuted times do not reach the times threshold, generating the trigger instruction.
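The checks applied to a timed detection task in the preceding embodiments can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the names `CheckOutcome` and `check_timed_task`, the prompt wording, and the use of seconds as the time unit are assumptions, and "reaches the threshold" is read here as "greater than or equal to".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckOutcome:
    """Result of validating a timed detection task (hypothetical type)."""
    trigger: bool                 # True when a trigger instruction should be generated
    prompt: Optional[str] = None  # second or third prompt information, if any


def check_timed_task(interval_seconds: float,
                     target_seconds: float,
                     unexecuted_count: int,
                     count_threshold: int) -> CheckOutcome:
    # Interval shorter than the predicted detection duration:
    # second prompt, which carries the target detection duration.
    if interval_seconds < target_seconds:
        return CheckOutcome(
            False,
            f"Second prompt: one detection is estimated to take "
            f"{target_seconds:.0f} s; please modify the detection interval.")
    # Too many skipped detections: third prompt.
    if unexecuted_count >= count_threshold:
        return CheckOutcome(
            False,
            "Third prompt: detection has been skipped repeatedly; "
            "please modify the number of executions.")
    # Otherwise the trigger instruction is generated.
    return CheckOutcome(True)
```

Returning a small result object instead of sending prompts directly keeps the sketch free of any assumed messaging interface.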
With reference to the first aspect, in some embodiments, the data quality module further includes the data type of each data table and, for each data type, the data amount that can be processed per unit time, and the predicting, according to the identification of the data table to be detected and the metadata module, of the detection duration of the data table to be detected, to obtain the target detection duration corresponding to the data table to be detected, includes:
determining the data type and the data amount of the data table to be detected according to the identification of the data table to be detected and the metadata module;
and calculating the target detection duration according to the data type, the data amount, and the data amount that can be processed per unit time for that data type.
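One plausible reading of this calculation is a simple division of the table's data amount by the per-type throughput. The sketch below assumes that reading; the text does not state the formula, and the names `TableMetadata`, `THROUGHPUT_PER_SECOND`, and `target_detection_seconds`, the data type labels, and the throughput figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TableMetadata:
    """Per-table facts assumed to be looked up by table identifier."""
    table_id: str
    data_type: str    # hypothetical label, e.g. "structured"
    data_amount: int  # e.g. number of rows


# Hypothetical throughput per data type (rows processed per second),
# which the text says is derived from previously processed data.
THROUGHPUT_PER_SECOND: Dict[str, float] = {
    "structured": 50_000,
    "semi_structured": 10_000,
}


def target_detection_seconds(meta: TableMetadata,
                             throughput: Dict[str, float] = THROUGHPUT_PER_SECOND) -> float:
    """Estimate the detection duration as data amount / throughput (assumed formula)."""
    rate = throughput[meta.data_type]
    if rate <= 0:
        raise ValueError("throughput must be positive")
    return meta.data_amount / rate
```

For a structured table of 1,000,000 rows, the assumed figures give an estimate of 20 seconds, which would then be compared with the detection interval duration set by the user.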
In a second aspect, the present application provides a data quality detection apparatus, comprising:
the first acquisition module is used for responding to a trigger instruction, acquiring the update time of the data table to be detected in a preset metadata module according to the identification of the data table to be detected, and acquiring the detection time of the last quality detection of the data table to be detected in a preset data quality module, wherein the trigger instruction is used for indicating the quality detection of the data table to be detected, the metadata module comprises at least one data table, the identification and the update time of each data table, and the data quality module comprises the detection unexecuted times of the at least one data table, the detection time of each quality detection and the detection result;
the first determining module is used for determining whether the data table to be detected is updated or not after the detection time according to the trigger instruction, the update time and the detection time;
the first generation module is used for, if it is determined that the data table to be detected has not been updated after the detection time, incrementing by one, in the data quality module, the number of unexecuted detections of the data table to be detected, and generating first prompt information, wherein the first prompt information is used for prompting a user that the data table to be detected has not been updated;
and the first sending module is used for sending the first prompt information to the user.
With reference to the second aspect, in some embodiments, the apparatus further includes:
the detection module is used for carrying out quality detection on the data table to be detected according to the trigger instruction, the identification of the data table to be detected and a preset data standard if the data table to be detected is updated after the detection time, so as to obtain new detection time and a detection result;
and the second sending module is used for storing the new detection time and the detection result in the data quality module and sending the detection result to the user.
With reference to the second aspect, in some embodiments, the apparatus further includes:
and the receiving module is used for receiving a quality detection request sent by the user, wherein the quality detection request comprises the identification of the data table to be detected.
With reference to the second aspect, in some embodiments, the quality detection request further includes a detection instruction for timing a detection task and a detection interval time, and the apparatus further includes:
the calculation module is used for carrying out prediction calculation on the detection duration of the data table to be detected according to the identification of the data table to be detected and the metadata module to obtain the target detection duration corresponding to the data table to be detected;
the second determining module is used for determining whether the detection interval duration is greater than or equal to the target detection duration according to the target detection duration;
and the third sending module is used for generating second prompt information when the detection interval duration is smaller than the target detection duration, and sending the second prompt information to the user, wherein the second prompt information comprises the target detection duration and is used for prompting the user to modify the detection interval duration.
With reference to the second aspect, in some embodiments, the detection request further includes a number of times the timing detection task is executed, and the apparatus further includes:
the second acquisition module is used for determining that the detection interval time length is greater than or equal to the target detection time length, acquiring the detection unexecuted times of the data table to be detected in the data quality module, and determining whether the detection unexecuted times reach the times threshold according to a preset times threshold;
And the second generation module is used for generating third prompt information and sending the third prompt information to the user if the detection non-execution times reach the times threshold, wherein the third prompt information is used for prompting the user to modify the execution times.
With reference to the second aspect, in some embodiments, the apparatus further includes:
and the third generation module is used for generating the trigger instruction if the detection unexecuted times are not up to the times threshold.
With reference to the second aspect, in some embodiments, the data quality module further includes a data type of each data table and a corresponding data amount that can be processed in a unit time of each type, and the calculating module includes:
the determining unit is used for determining the data type and the data quantity of the data table to be detected according to the identification of the data table to be detected and the metadata module;
and the calculating unit is used for calculating the target detection duration according to the data type, the data quantity and the data quantity which can be processed in the unit time corresponding to the data type.
In a third aspect, the present application provides an electronic device, comprising: a processor, a memory communicatively coupled to the processor, and a display;
The memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of the first aspects above.
In a fourth aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions for implementing the data quality detection method of any one of the first aspects when executed by a processor.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the data quality detection method of any of the first aspects.
The application provides a data quality detection method, apparatus, device, and storage medium. In response to a trigger instruction, the update time of the data table to be detected is acquired from a preset metadata module according to the identification of the data table, and the detection time of the last quality detection of the data table is acquired from a preset data quality module. It is then determined whether the data table has been updated after the detection time. If it has not, first prompt information is generated to notify the user that the data table has not been updated, and the number of unexecuted detections of the data table is incremented by one in the data quality module. The method achieves efficient data quality detection and avoids the resource waste caused by repeating a detection whose result would be unchanged.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is an application system architecture diagram of a data quality detection method provided in an embodiment of the present application;
Fig. 2 is a schematic flow chart of a first embodiment of a data quality detection method provided in the embodiment of the present application;
Fig. 3 is a schematic flow chart of a second embodiment of a data quality detection method provided in the embodiment of the present application;
Fig. 4 is a schematic flow chart of a third embodiment of a data quality detection method provided in the embodiment of the present application;
Fig. 5 is a schematic flow chart of a fourth embodiment of a data quality detection method provided in the embodiment of the present application;
Fig. 6 is a schematic structural diagram of a first embodiment of a data quality detecting apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a second embodiment of a data quality detecting apparatus according to the embodiment of the present application;
Fig. 8 is a schematic structural diagram of a third embodiment of a data quality detecting apparatus according to the embodiment of the present application;
Fig. 9 is a schematic structural diagram of a fourth embodiment of a data quality detecting apparatus according to the embodiment of the present application;
Fig. 10 is a schematic structural diagram of a fifth embodiment of a data quality detecting apparatus according to the embodiment of the present application;
Fig. 11 is a schematic structural diagram of an electronic device provided in the present application.
Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
With the development of information technology and the popularization of the internet, we have entered an era of data explosion in which large amounts of data are generated at every moment. Big data technology emerged so that these data can be better utilized and value can be obtained from them. Big data can be defined as data that is huge in volume, diverse in source, complex in structure, and in need of efficient processing, analysis, and management. To extract high-quality value from big data, the quality of the data is therefore particularly important. In the prior art, quality detection of a data table is usually performed by a timed detection task set by the user. However, a data table with a large data volume and a slow update rate may not have been updated when the timed task fires, yet the detection is repeated at the interval set by the user. This wastes a large amount of resources and reduces the efficiency of quality detection for the data table.
In view of the above problems, the present application provides a data quality detection method, apparatus, device, and storage medium that achieve efficient data quality detection and save resources. Specifically, quality detection of data is usually performed by a timed task, for example by detecting at a fixed time every day, or by setting the number of detections per day and the interval between them. However, for data that fluctuates little and has a long update period, detecting at the time set by the user every day wastes a large amount of resources. For data with a short update period, the detection interval set by the user may be shorter than the time a detection takes, so that a new detection starts before the previous one has finished, which seriously affects the efficiency and accuracy of data quality detection and occupies resources. In view of these problems, the inventor considered the following approach: when the user sets a timed task, the data amount that can be processed per unit time for the data type is calculated from previously processed data, the detection duration of the data table to be detected is calculated from it, and the calculated detection duration is sent to the user. Whether the data table needs to be detected is further determined from the update time of the data table and the time of its last detection. This saves resources and improves both the efficiency and the accuracy of data detection.
Fig. 1 is an application system architecture diagram of a data quality detection method according to an embodiment of the present application. As shown in Fig. 1, the system includes a data standard management module, a metadata management module, and a data quality module. The data standard management module manages data standard specifications so that other modules can query and use the relevant data standards. Data standards may be formulated, or imported in batches, with reference to prescribed standards and to the standards of various industries. The metadata management module manages the metadata of various types of data sources, including metadata collection, metadata storage, metadata maintenance, and metadata analysis, and standardizes the metadata. The data quality module checks the quality of the data in existing data tables using check rules preset by the system, verifying whether the data meets the standards defined in the data standard management module. During the check, the metadata of the data table is queried from the metadata management module.
After the user sets a data detection task, the data quality module obtains the data table to be detected and its information from the metadata management module according to that task, and uses the information to determine whether the data table has been updated since the last detection. If it has not, the user is prompted that the detection was not executed and can view the last detection result. If the data table has been updated since the last detection, the data standard is acquired from the data standard management module, quality detection is performed on the data table according to that standard, and the resulting detection result is sent to the user.
It should be noted that, if the user sets the timing detection task, the data quality module may further detect the timing task parameter set by the user according to the metadata in the metadata management module, so as to prompt the user, thereby improving accuracy and detection efficiency of data detection and saving resources.
The technical solutions of the present application, and how they solve the above technical problems, are described in detail below with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 2 is a schematic flow chart of a first embodiment of a data quality detection method provided in an embodiment of the present application, as shown in fig. 2, where the method includes:
S201: In response to the trigger instruction, acquire the update time of the data table to be detected from a preset metadata module according to the identification of the data table to be detected, and acquire the detection time of the last quality detection of the data table to be detected from a preset data quality module.
In this step, the data detection device generates a trigger instruction and, in response to it, obtains the data table to be detected and its update time from the metadata module according to the identification of the data table, and obtains the detection time of the last quality detection of the data table from the data quality module. The trigger instruction is used for indicating that quality detection is to be performed on the data table to be detected. The metadata module includes at least one data table and the identification and update time of each data table. The data quality module includes the number of unexecuted detections of the at least one data table, and the detection time and detection result of each quality detection.
Specifically, the data detection device may generate the trigger instruction in response to a manual click by the user, or may generate it automatically on a timer.
The user may click a control of the data detection device to generate the trigger instruction, or may operate the data detection device through a terminal device, in which case the terminal device sends a request to the data detection device and the trigger instruction is generated according to that request. The specific manner in which the data detection device generates the trigger instruction is not specifically limited in this embodiment.
S202: Determine, according to the trigger instruction, the update time and the detection time, whether the data table to be detected has been updated after the detection time.
In this step, after the update time of the data table to be detected has been obtained from the metadata module and the detection time of its last quality detection has been obtained from the data quality module, it is determined, according to the trigger instruction, whether the data table has been updated after the detection time of the last quality detection. This avoids invalid detections caused by a long update period of the data table.
Specifically, the update time of the data table to be detected is compared with the detection time of the last quality detection. If the update time is later than the detection time, the data table has been updated since the last quality detection; if the update time is earlier than the detection time, the data table has not been updated since the last quality detection.
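The comparison in S202 reduces to a timestamp check, sketched below. The function name `updated_since_last_detection` is hypothetical, and two cases the text leaves open are decided here by assumption: equal timestamps are treated as "not updated", and a table with no previous detection is treated as needing detection.

```python
from datetime import datetime
from typing import Optional


def updated_since_last_detection(update_time: datetime,
                                 last_detection_time: Optional[datetime]) -> bool:
    """Return True when the table changed after its last quality detection."""
    if last_detection_time is None:
        # Assumption: a table that was never detected should be detected.
        return True
    # Assumption: equal timestamps count as "not updated".
    return update_time > last_detection_time
```

Both timestamps should come from the same clock and carry the same timezone handling; Python raises a `TypeError` when a naive `datetime` is compared with a timezone-aware one.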
S203: If it is determined that the data table to be detected has not been updated after the detection time, increment by one, in the data quality module, the number of unexecuted detections of the data table to be detected, and generate first prompt information.
In this step, if it is determined that the data table to be detected has not been updated after the detection time, the result of the last quality detection still holds. To save resources and improve detection efficiency, first prompt information is generated, and the number of unexecuted detections of the data table is incremented by one in the data quality module. The first prompt information is used for prompting the user that the data table to be detected has not been updated.
For example, the first prompt information may be "The table has not changed since the last task was executed; you can view the last detection result." This wording is merely an example. Any content that tells the user the data table to be detected has not been updated is acceptable, and the specific content of the first prompt information is not specifically limited in this embodiment.
S204: Send the first prompt information to the user.
In this step, once it has been determined that the data table to be detected has not been updated since the last quality detection, the current quality detection is not performed and the first prompt information is generated. The first prompt information is sent to the user so that the user learns of this outcome.
Specifically, the data detection device may send the first prompt information to the user's terminal device through a communication interface, or may display it to the user directly on a display screen of the data detection device.
Optionally, before step S201, the method further includes: receiving a quality detection request sent by the user.
The detection task for the data table to be detected is set by the user, so before the data detection device generates and responds to a trigger instruction, it must first receive a quality detection request sent by the user. The quality detection request includes the identification of the data table to be detected.
Optionally, if the detection task set by the user is a timing detection task, the quality detection request may further include a detection instruction and a detection interval duration of the timing detection task, and may further include the number of executions of the timing detection task.
In one possible design, the quality detection request sent by the user may further include identifiers of a plurality of data tables to be detected. The specific number of the data tables to be detected is not specifically limited in this embodiment.
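The contents of such a request could be modelled as follows. This is only a sketch: the class name, field names and the validation rules are assumptions made for illustration, and the application does not prescribe any particular representation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QualityDetectionRequest:
    """Hypothetical shape of the quality detection request."""
    table_ids: List[str]                       # one or more tables to detect
    timed: bool = False                        # detection instruction of a timing task
    interval_seconds: Optional[float] = None   # detection interval duration
    execution_count: Optional[int] = None      # optional number of executions

    def validate(self) -> None:
        # A request must name at least one data table.
        if not self.table_ids:
            raise ValueError("at least one data table identification is required")
        # A timing task needs an interval to schedule against (assumed rule).
        if self.timed and self.interval_seconds is None:
            raise ValueError("a timing detection task needs an interval duration")
```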
According to the data quality detection method provided by this embodiment, in response to a trigger instruction, the update time of the data table to be detected is obtained from a preset metadata module according to the identification of the data table to be detected, and the detection time of the last quality detection of the data table is obtained from a preset data quality module. Whether the data table to be detected has been updated after the detection time is then determined. If not, first prompt information is generated to remind the user that the data table to be detected has not been updated, and the detection non-execution count of the data table is incremented by one in the data quality module. In this way, data quality detection is performed efficiently, and the resource waste caused by repeating a detection whose result would be unchanged is avoided.
Fig. 3 is a schematic flow chart of a second embodiment of a data quality detection method provided in the embodiment of the present application, as shown in fig. 3, where on the basis of the foregoing embodiment, the method further includes:
S301: if it is determined that the data table to be detected has been updated after the detection time, performing quality detection on the data table to be detected according to the trigger instruction, the identification of the data table to be detected and a preset data standard, to obtain a new detection time and a new detection result.
In this step, if it is determined that the data table to be detected has been updated after the detection time, the data in the table has changed and its quality may differ from the last quality detection result. A new quality detection therefore needs to be performed on the data table to be detected, and the detection time and detection result of the data table need to be updated in the data quality module.
Specifically, after the fact that the data table to be detected is updated after the last quality detection is determined, a preset data standard is obtained in the data standard management module, and then quality detection is carried out on the data table to be detected according to the data standard, so that a detection result is obtained.
S302: storing the new detection time and the detection result in the data quality module, and sending the detection result to the user.
In this step, after the quality detection of the data table to be detected is completed, the obtained detection result and the new detection time are stored in the data quality module, and the detection result is sent to the user.
Specifically, the detection result can be sent to the terminal equipment of the user through the communication interface, and then the user can check the detection result through the terminal equipment, and the detection result can also be directly displayed to the user through the display screen of the data detection equipment.
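Steps S301 and S302 can be sketched in Python as below. The sketch assumes that the preset data standard can be expressed as a per-record predicate and that the data quality module behaves like a dictionary keyed by table identification; both are simplifications chosen for illustration, and run_quality_detection is a hypothetical name.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable, Optional


def run_quality_detection(table_id: str,
                          rows: Iterable[dict],
                          standard: Callable[[dict], bool],
                          quality_store: Dict[str, dict],
                          now: Optional[datetime] = None) -> dict:
    """Check every row against the data standard and store the outcome."""
    now = now or datetime.now()
    checked = 0
    violations = 0
    for row in rows:
        checked += 1
        if not standard(row):
            violations += 1
    result = {"checked": checked, "violations": violations}
    # S302: keep the new detection time and result for the next comparison.
    quality_store[table_id] = {"detection_time": now, "result": result}
    return result
```

The returned result is what would be sent to the user's terminal device or shown on the display.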
Optionally, on the basis of the foregoing embodiment, if it is determined that the data table to be detected has not been updated after the detection time, the user is prompted to directly view the detection result of the last quality detection. It may then be further determined whether the preset data standard in the data standard management module has been reset since the last quality detection of the data table to be detected. If so, the last detection result is no longer accurate; in that case, quality detection is performed on the data table to be detected according to the newly set data standard, and the detection result is sent to the user.
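The optional check on the data standard can be combined with the table update check in a small decision function. The sketch below assumes that the data standard management module records the time at which the standard was last set; the application does not state how a reset is recognised, so standard_set_time and the returned labels are hypothetical.

```python
from datetime import datetime


def decide_detection(update_time: datetime,
                     standard_set_time: datetime,
                     last_detection_time: datetime) -> str:
    """Return why a detection should run, or 'skip' if nothing changed."""
    if update_time > last_detection_time:
        return "detect_table_updated"
    if standard_set_time > last_detection_time:
        # Table unchanged, but the standard was reset after the last detection,
        # so the stored result no longer reflects the current standard.
        return "detect_standard_reset"
    return "skip"
```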
According to the data quality detection method provided by this embodiment, if it is determined that the data table to be detected has been updated after the detection time, quality detection is performed on the data table to be detected according to the trigger instruction, the identification of the data table to be detected and the preset data standard, to obtain a new detection time and a new detection result; the new detection time and detection result are stored in the data quality module, and the detection result is sent to the user. In this way, quality detection is performed only when the data table to be detected has been updated since the last quality detection, which improves the efficiency of data detection and avoids the resource waste caused by repeated detection.
Fig. 4 is a schematic flow chart of a third embodiment of the data quality detection method provided in the embodiment of the present application. As shown in fig. 4, on the basis of the foregoing embodiments, the quality detection request may further include a detection instruction of a timing detection task and a detection interval duration, and the method further includes:
S401: performing prediction calculation on the detection duration of the data table to be detected according to the identification of the data table to be detected and the metadata module, to obtain the target detection duration corresponding to the data table to be detected.
In this step, when the user sets a timing detection task for the data table to be detected, and the quality detection request received by the data detection device includes a detection instruction of the timing detection task and a detection interval duration, the data detection device performs prediction calculation on the detection duration of the data table to be detected according to the detection instruction, so as to determine whether the detection interval set by the user allows normal quality detection of the data table.
Specifically, according to the identification of the data table to be detected and the metadata module, the data type and the data quantity of the data table to be detected are determined, and then the target detection duration is calculated according to the data type, the data quantity and the data quantity which can be processed in the unit time corresponding to the data type.
S402: determining whether the detection interval duration is greater than or equal to the target detection duration.
In this step, after the detection duration of the data table to be detected has been predicted and the target detection duration obtained, the target detection duration is compared with the detection interval duration set by the user to determine whether the detection interval duration is greater than or equal to the target detection duration.
Specifically, after the user sets a timing detection task, the data detection device automatically triggers quality detection of the data table to be detected at the detection interval duration set by the user. If the detection interval duration is greater than or equal to the target detection duration, the next quality detection is triggered only after the current detection of the data table has completed and the detection interval has elapsed. If the detection interval duration is less than the target detection duration, the next quality detection is triggered before the current detection of the data table has completed, which affects the accuracy of data detection and seriously wastes resources.
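The effect of an interval that is too short can be seen with a small simulation. The sketch below assumes, purely for illustration, that every run starts exactly on schedule and takes the predicted target detection duration; overlapping_runs is a hypothetical helper, not part of the application.

```python
from typing import List


def overlapping_runs(interval_s: float, duration_s: float, runs: int) -> List[int]:
    """Indices of scheduled runs that start before the previous run has finished."""
    overlaps = []
    for i in range(1, runs):
        start = i * interval_s                         # scheduled start of run i
        previous_end = (i - 1) * interval_s + duration_s
        if start < previous_end:
            overlaps.append(i)
    return overlaps
```

With an interval of 600 seconds and a predicted duration of 1,000 seconds, every run after the first starts while its predecessor is still running; with an interval of 1,000 seconds or more, no run overlaps.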
S403: if it is determined that the detection interval duration is less than the target detection duration, generating second prompt information and sending the second prompt information to the user.
In this step, if it is determined that the detection interval duration is less than the target detection duration, the next quality detection would be triggered before the detection of the data table to be detected has completed. The detection interval duration of the timing task set by the user is therefore unreasonable, and the accuracy of data detection cannot be guaranteed. Second prompt information is therefore generated; it includes the target detection duration and is used for prompting the user to modify the detection interval duration.
It should be noted that the second prompt information may further include any other content that prompts the user to modify the detection interval duration, which is not specifically limited in this embodiment.
S404: if it is determined that the detection interval duration is greater than or equal to the target detection duration, acquiring the detection non-execution count of the data table to be detected from the data quality module, and determining, according to a preset number threshold, whether the detection non-execution count reaches the number threshold.
In this step, if it is determined that the detection interval duration is greater than or equal to the target detection duration, the detection interval duration set by the user allows the data table to be detected normally. The quality detection request may also include the number of executions of the timing detection task. To improve the efficiency of data quality detection and avoid the resource waste caused by executing the task repeatedly, whether the number of executions of the timing detection task is reasonable is verified after it is determined that the detection interval duration is greater than or equal to the target detection duration.
Specifically, the detection non-execution count of the data table is obtained from the data quality module according to the identification of the data table to be detected, and is compared with the preset number threshold to determine whether it reaches the threshold. If the detection non-execution count reaches the number threshold, the detection frequency set for the data table to be detected is too high and therefore unreasonable; if it does not reach the threshold, the number of executions of the timing detection task set by the user is reasonable.
S405: if the detection non-execution times reach the times threshold, generating third prompt information and sending the third prompt information to the user.
In this step, the detection non-execution count is compared with the number threshold as described above. If it is determined that the detection non-execution count reaches the number threshold, the detection frequency set by the user for the data table to be detected is too high, is unreasonable, and needs to be modified. Third prompt information is therefore generated, which is used for prompting the user to modify the number of executions.
Optionally, the data detection device may calculate the update frequency of the data table to be detected according to its update times, and the third prompt information may further include this update frequency, so that the user can refer to it when modifying the number of executions.
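One way to derive an update frequency is to average the gaps between consecutive update times. The sketch below assumes that a history of update times is available for the table; the description only states that the metadata module holds an update time, so the history and the function name are hypothetical.

```python
from datetime import datetime
from typing import Optional, Sequence


def mean_update_interval_seconds(update_times: Sequence[datetime]) -> Optional[float]:
    """Average gap in seconds between consecutive updates; None if fewer than two."""
    times = sorted(update_times)
    if len(times) < 2:
        return None
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)
```

A table updated on 1, 3 and 7 January has gaps of two and four days, so its mean update interval is three days; a task scheduled more often than that will mostly find the table unchanged.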
S406: if it is determined that the detection non-execution count does not reach the number threshold, generating a trigger instruction.
In this step, the detection non-execution count is compared with the number threshold. If it is determined that the count does not reach the threshold, the number of executions, and hence the timing detection task set by the user, is reasonable and can be executed. A trigger instruction is therefore generated, and the data detection device can trigger the detection task for the data table to be detected according to the trigger instruction.
Illustratively, for a mysql type data source, a value range check is performed on the "gener" field of the data table table1 under the database dataBase1 of the data source (that is, checking for records whose "gener" value is neither "male" nor "female"), and the task is scheduled to execute at 0:00 every day. The number threshold is set to 3. If the detection non-execution count reaches 3, that is, the table has not changed at all across those scheduled runs, prompt information is generated to tell the user that the data table changes infrequently and that the execution frequency of the timed task should be reduced, namely by modifying the number of executions. If the detection non-execution count is less than 3, a trigger instruction is generated, and the data detection device may execute the technical solution in the foregoing embodiments in response to the trigger instruction.
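The decision flow of steps S402 to S406 can be summarised in a short Python sketch. The enum, the function name and the message wording are hypothetical; the default threshold of 3 is taken from the example above.

```python
from enum import Enum
from typing import Optional, Tuple


class Outcome(Enum):
    SECOND_PROMPT = "interval too short"
    THIRD_PROMPT = "task runs too often"
    TRIGGER = "generate trigger instruction"


def validate_timing_task(interval_s: float,
                         target_s: float,
                         not_executed_count: int,
                         threshold: int = 3) -> Tuple[Outcome, Optional[str]]:
    """Validate a timing detection task before generating a trigger instruction."""
    if interval_s < target_s:
        # S403: the second prompt carries the predicted target detection duration.
        return (Outcome.SECOND_PROMPT,
                f"Predicted detection duration is {target_s:.0f} s; "
                f"please set an interval of at least that length.")
    if not_executed_count >= threshold:
        # S405: the table rarely changes, so the task is scheduled too often.
        return (Outcome.THIRD_PROMPT,
                "The table changes infrequently; please reduce the number of executions.")
    # S406: the task parameters are reasonable.
    return Outcome.TRIGGER, None
```

Checking the interval first mirrors the order in the description: the non-execution count is only consulted once the interval is known to be long enough.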
According to the data quality detection method provided by this embodiment, the detection duration of the data table to be detected is predicted according to the identification of the data table to be detected and the metadata module, to obtain the target detection duration corresponding to the data table, and whether the detection interval duration is greater than or equal to the target detection duration is determined. If the detection interval duration is less than the target detection duration, second prompt information is generated and sent to the user. If the detection interval duration is greater than or equal to the target detection duration, the detection non-execution count of the data table to be detected is obtained from the data quality module and compared with a preset number threshold: if the count reaches the threshold, third prompt information is generated and sent to the user; if it does not, a trigger instruction is generated. In this way, the detection parameters of the timing detection task set by the user are verified, and the detection duration of the data table to be detected is predicted from its data amount and data type. This assists the user and improves the efficiency and accuracy of data detection.
Fig. 5 is a schematic flow chart of a fourth embodiment of a data quality detection method provided in the embodiment of the present application, as shown in fig. 5, based on the foregoing embodiments, step S401 specifically includes:
S501: determining the data type and the data amount of the data table to be detected according to the identification of the data table to be detected and the metadata module.
In this step, in order to accurately predict the detection duration of the data table to be detected, the data type of the data table and the amount of data it contains are determined from the metadata module according to the identification of the data table to be detected.
Specifically, the metadata module stores a huge amount of data tables, and each data table has the data type and the data volume of the table.
S502: calculating the target detection duration according to the data type, the data amount, and the data amount that can be processed per unit time for that data type.
In this step, after the data type and the data amount of the data table to be detected are determined, the data amount that can be processed per unit time for that data type is obtained from the data quality module, and the target detection duration of the data table to be detected is then calculated from this rate and the data amount of the table.
Specifically, for each data table detected in the history, the data detection device records the data type and data amount of the table and its detection duration. It classifies the detected data tables by data type, calculates the data amount processed per unit time for each table of a given data type, averages these values within each data type, and stores the resulting data amount that can be processed per unit time for each data type.
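The per-type averaging described above can be sketched as follows. The tuple layout of the history records and the function name are assumptions made for illustration; the application does not specify how the history is stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def average_throughput(history: Iterable[Tuple[str, int, float]]) -> Dict[str, float]:
    """Average records processed per second for each data type.

    Each history entry is (data_type, record_count, detection_seconds).
    """
    rates = defaultdict(list)
    for data_type, record_count, seconds in history:
        if seconds > 0:                      # ignore unusable timings
            rates[data_type].append(record_count / seconds)
    # Mean of the per-table rates within each data type.
    return {t: sum(r) / len(r) for t, r in rates.items()}
```

Note that this averages the per-table rates, as the description states, rather than dividing total records by total time; the two can differ when table sizes vary widely.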
Illustratively, suppose that for a mysql type data source, the null value check rule on the column1 field of the data table table1 under the database dataBase1 (that is, checking for records whose column1 value in table1 is null) can check 10,000 records per second; the data amount checked per unit time for this null value check rule is then 10,000 records/second. If the data type of the data table to be detected is the mysql type and its data amount is 10 million records, the target detection duration required for quality detection of the table is 10,000,000 / 10,000 = 1,000 seconds, i.e. 1000/60 ≈ 16.67 minutes.
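The arithmetic in this example can be reproduced with a few lines of Python. The function name is hypothetical; the calculation is simply the data amount divided by the per-second processing rate for the data type.

```python
def target_detection_seconds(record_count: int, records_per_second: float) -> float:
    """Predicted detection duration in seconds for a table of the given size."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return record_count / records_per_second


# Figures from the example: 10 million records at 10,000 records per second.
seconds = target_detection_seconds(10_000_000, 10_000)
minutes = seconds / 60
print(f"{seconds:.0f} s = {minutes:.2f} min")  # prints "1000 s = 16.67 min"
```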
According to the data quality detection method provided by this embodiment, the data type and the data amount of the data table to be detected are determined according to the identification of the data table to be detected and the metadata module, and the target detection duration is calculated according to the data type, the data amount, and the data amount that can be processed per unit time for that data type. In this method, the time the current data quality detection will take is predicted from the data amount of the table and the time consumed by previous quality detections, which improves data detection efficiency, prevents repeated detection, and saves resources.
Fig. 6 is a schematic structural diagram of a first embodiment of a data quality detection device according to an embodiment of the present application, as shown in fig. 6, a data quality detection device 600 includes:
The first obtaining module 601 is configured to, in response to a trigger instruction, obtain the update time of a data table to be detected from a preset metadata module according to the identifier of the data table to be detected, and obtain the detection time of the last quality detection of the data table to be detected from a preset data quality module, where the trigger instruction is used to instruct that quality detection be performed on the data table to be detected, the metadata module includes at least one data table and the identifier and update time of each data table, and the data quality module includes the detection non-execution count of the at least one data table, and the detection time and detection result of each quality detection.
The first determining module 602 is configured to determine, according to the trigger instruction, the update time and the detection time, whether the data table to be detected has been updated after the detection time.
The first generating module 603 is configured to, if it is determined that the data table to be detected has not been updated after the detection time, increment the detection non-execution count of the data table to be detected by one in the data quality module and generate first prompt information, where the first prompt information is used to prompt the user that the data table to be detected has not been updated.
The first sending module 604 is configured to send the first prompt information to the user.
Fig. 7 is a schematic structural diagram of a second embodiment of a data quality detection device according to the embodiment of the present application, as shown in fig. 7, a data quality detection device 600 further includes:
The detection module 701 is configured to, if it is determined that the data table to be detected has been updated after the detection time, perform quality detection on the data table to be detected according to the trigger instruction, the identifier of the data table to be detected, and a preset data standard, so as to obtain a new detection time and a new detection result.
The second sending module 702 is configured to store the new detection time and the detection result in the data quality module, and send the detection result to the user.
The receiving module 703 is configured to receive a quality detection request sent by a user, where the quality detection request includes an identifier of a data table to be detected.
Fig. 8 is a schematic structural diagram of a third embodiment of the data quality detection apparatus according to the embodiment of the present application. As shown in fig. 8, on the basis of the foregoing apparatus embodiments, the quality detection request further includes a detection instruction of a timing detection task and a detection interval duration, and the data quality detection apparatus 600 further includes:
The calculating module 801 is configured to perform prediction calculation on the detection duration of the data table to be detected according to the identifier of the data table to be detected and the metadata module, so as to obtain a target detection duration corresponding to the data table to be detected.
The second determining module 802 is configured to determine whether the detection interval duration is greater than or equal to the target detection duration.
The third sending module 803 is configured to, if it is determined that the detection interval duration is less than the target detection duration, generate second prompt information and send it to the user, where the second prompt information includes the target detection duration and is used to prompt the user to modify the detection interval duration.
Fig. 9 is a schematic structural diagram of a fourth embodiment of the data quality detection apparatus according to the embodiment of the present application. As shown in fig. 9, on the basis of the foregoing apparatus embodiments, the quality detection request further includes the number of executions of the timing detection task, and the data quality detection apparatus 600 further includes:
The second obtaining module 901 is configured to, if it is determined that the detection interval duration is greater than or equal to the target detection duration, obtain the detection non-execution count of the data table to be detected from the data quality module, and determine, according to a preset number threshold, whether the detection non-execution count reaches the number threshold.
The second generating module 902 is configured to, if it is determined that the detection non-execution count reaches the number threshold, generate third prompt information and send it to the user, where the third prompt information is used to prompt the user to modify the number of executions.
Optionally, the data quality detection apparatus 600 further includes:
The third generating module 903 is configured to generate a trigger instruction if it is determined that the detection non-execution count does not reach the number threshold.
Fig. 10 is a schematic structural diagram of a fifth embodiment of the data quality detection apparatus provided in the embodiment of the present application. As shown in fig. 10, on the basis of the foregoing apparatus embodiments, the data quality module further includes the data type of each data table and the data amount that can be processed per unit time for each data type, and the calculation module 801 includes:
The determining unit 1001 is configured to determine the data type and the data amount of the data table to be detected according to the identifier of the data table to be detected and the metadata module.
The calculating unit 1002 is configured to calculate, according to the data type, the data amount, and the data amount that can be processed in a unit time corresponding to the data type, a target detection duration.
The data quality detection device provided by the application can execute the data quality detection method in the method embodiment, and the implementation principle and the technical effect are similar, and are not repeated here.
Fig. 11 is a schematic structural diagram of an electronic device provided in the present application. As shown in fig. 11, the electronic device 1100 includes a processor 1102, and a memory 1101 and a display 1103 communicatively connected to the processor 1102.
The memory 1101 stores computer-executable instructions. Specifically, the computer-executable instructions may comprise program code, and the program code comprises computer operation instructions. The memory 1101 may include a high-speed RAM memory, and may further include a non-volatile memory, such as at least one magnetic disk memory.
The processor 1102 executes computer-executable instructions stored in the memory 1101 to implement the data quality detection method described in the foregoing method embodiments. The processor 1102 may be a central processing unit (Central Processing Unit, abbreviated as CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The display 1103 is used for displaying prompt information to a user.
The electronic device 1100 may also include a communication interface through which communication interactions with external devices may occur. The external device may be, for example, an electronic device such as a computer.
In a specific implementation, if the communication interface, the memory 1101, and the processor 1102 are implemented independently, they may be connected to each other and communicate with each other through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on; this does not mean that there is only one bus or only one type of bus.
Alternatively, in a specific implementation, if the communication interface, the memory 1101, and the processor 1102 are implemented on a single chip, the communication interface, the memory 1101, and the processor 1102 may complete communication through an internal interface.
The present application also provides a computer-readable storage medium having computer-executable instructions stored therein. The computer-readable storage medium may comprise various media capable of storing computer-executable instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk. When executed by a processor, the computer-executable instructions implement the data quality detection method in the above embodiments.
The present application also provides a computer program product comprising a computer program stored in a readable storage medium. At least one processor of the electronic device 1100 may read a computer program from a readable storage medium, the at least one processor executing the computer program to cause the electronic device 1100 to implement the data quality detection methods provided by the various embodiments described above.
The application also provides a chip, on which a computer program is stored, which when executed by the chip, implements the data quality detection method provided by the various embodiments.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for detecting data quality, comprising:
responding to a trigger instruction, acquiring the update time of a data table to be detected in a preset metadata module according to the identification of the data table to be detected, and acquiring the detection time of the last quality detection of the data table to be detected in a preset data quality module, wherein the trigger instruction is used for indicating the quality detection of the data table to be detected, the metadata module comprises at least one data table, the identification of each data table and the update time, and the data quality module comprises the detection unexecuted times of the at least one data table, the detection time of each quality detection and the detection result;
determining whether the data table to be detected is updated after the detection time according to the trigger instruction, the update time and the detection time;
if it is determined that the data table to be detected is not updated after the detection time, incrementing by one, in the data quality module, the detection unexecuted times of the data table to be detected, and generating first prompt information, wherein the first prompt information is used for prompting a user that the data table to be detected is not updated;
and sending the first prompt information to the user.
2. The method according to claim 1, wherein the method further comprises:
if it is determined that the data table to be detected has been updated after the detection time, performing quality detection on the data table to be detected according to the trigger instruction, the identification of the data table to be detected and a preset data standard to obtain a new detection time and a new detection result;
and storing the new detection time and the new detection result in the data quality module, and sending the new detection result to the user.
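A rough sketch of the branch in claim 2 follows. The claims do not say what the preset data standard contains, so the rule used here (required columns must be non-null) is purely an assumed stand-in, and `run_quality_detection`, `detect_and_store` and the dictionary layout are hypothetical.

```python
from datetime import datetime


def run_quality_detection(rows, required_columns):
    """Check rows against an assumed standard: required columns must be non-null."""
    violations = [
        (index, column)
        for index, row in enumerate(rows)
        for column in required_columns
        if row.get(column) is None
    ]
    return {"passed": not violations, "violations": violations}


def detect_and_store(table_id, rows, required_columns, quality_store, now):
    """Run the detection, then record the new detection time and result."""
    result = run_quality_detection(rows, required_columns)
    quality_store[table_id] = {"detection_time": now, "result": result}
    return result  # this is what would be sent to the user


rows = [{"id": 1, "vin": "A1"}, {"id": 2, "vin": None}]
store = {}
result = detect_and_store(
    "vehicles", rows, ["id", "vin"], store, datetime(2024, 2, 28, 9, 0)
)
```

The stored detection time is what a later trigger would compare against the table's update time.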
3. The method according to claim 2, wherein before acquiring the update time of the data table to be detected in the metadata module set in advance according to the identification of the data table to be detected in response to the trigger instruction, and acquiring the detection time of the last quality detection of the data table to be detected in the data quality module set in advance, the method further comprises:
and receiving a quality detection request sent by the user, wherein the quality detection request comprises the identification of the data table to be detected.
4. The method according to claim 3, wherein the quality detection request further includes a detection instruction for a timed detection task and a detection interval duration, and the method further includes:
predicting the detection duration of the data table to be detected according to the identification of the data table to be detected and the metadata module, to obtain the target detection duration corresponding to the data table to be detected;
determining whether the detection interval duration is greater than or equal to the target detection duration according to the target detection duration;
and if the detection interval duration is smaller than the target detection duration, generating second prompt information, and sending the second prompt information to the user, wherein the second prompt information comprises the target detection duration and is used for prompting the user to modify the detection interval duration.
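The interval check of claim 4 reduces to a single comparison, sketched below. The function name, the use of seconds as the unit and the prompt wording are assumptions for illustration only.

```python
def check_interval(interval_seconds, target_seconds):
    """Return None if the interval is long enough, otherwise a second prompt."""
    if interval_seconds >= target_seconds:
        return None
    # The prompt carries the target detection duration so the user can
    # choose a workable interval.
    return (
        f"Estimated detection duration is {target_seconds} s, but the "
        f"detection interval is {interval_seconds} s; please lengthen the interval."
    )


ok = check_interval(interval_seconds=3600, target_seconds=600)
too_short = check_interval(interval_seconds=300, target_seconds=600)
```

Rejecting intervals shorter than the predicted duration keeps one scheduled run from overlapping the next.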
5. The method of claim 4, wherein the quality detection request further includes a number of execution times of the timed detection task, and the method further includes:
if it is determined that the detection interval duration is greater than or equal to the target detection duration, acquiring the detection unexecuted times of the data table to be detected from the data quality module, and determining, according to a preset times threshold, whether the detection unexecuted times reach the times threshold;
if the detection unexecuted times reach the times threshold, generating third prompt information, and sending the third prompt information to the user, wherein the third prompt information is used for prompting the user to modify the number of execution times.
6. The method of claim 5, wherein the method further comprises:
and if the detection unexecuted times do not reach the times threshold, generating the trigger instruction.
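Claims 5 and 6 together describe a two-way decision on the skipped-detection counter, which might look like the sketch below. Treating "reach" as greater than or equal is an interpretation, and `schedule_decision` and its return values are hypothetical.

```python
def schedule_decision(unexecuted_count, times_threshold):
    """Return ("prompt", message) or ("trigger", None) for a scheduled run."""
    if unexecuted_count >= times_threshold:
        # Too many runs were skipped because the table had not changed.
        message = (
            f"Detection was skipped {unexecuted_count} times; "
            "consider modifying the number of executions."
        )
        return ("prompt", message)
    # Below the threshold: generate the trigger instruction for this run.
    return ("trigger", None)


below = schedule_decision(unexecuted_count=1, times_threshold=3)
reached = schedule_decision(unexecuted_count=3, times_threshold=3)
```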
7. The method according to any one of claims 4 to 6, wherein the data quality module further includes a data type of each data table and a data amount that can be processed per unit time corresponding to each data type, and the predicting, according to the identification of the data table to be detected and the metadata module, of the detection duration of the data table to be detected to obtain the target detection duration corresponding to the data table to be detected includes:
determining the data type and the data amount of the data table to be detected according to the identification of the data table to be detected and the metadata module;
and calculating the target detection duration according to the data type, the data amount and the data amount that can be processed per unit time corresponding to the data type.
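One plausible reading of the calculation in claim 7 is a simple division of the table's data amount by the throughput for its data type. The claim does not give a formula, so the division, the throughput figures and all names below are assumptions.

```python
# Assumed throughput per data type, in rows per second (made-up figures).
THROUGHPUT_PER_SECOND = {"numeric": 50_000, "text": 10_000}


def target_detection_duration(data_type, data_amount, throughput):
    """Estimate detection duration in seconds as data amount / throughput."""
    per_second = throughput[data_type]
    if per_second <= 0:
        raise ValueError("throughput must be positive")
    return data_amount / per_second


duration = target_detection_duration("text", 600_000, THROUGHPUT_PER_SECOND)
```

With these figures, a text table of 600,000 rows yields an estimate of 60 seconds, which would then be compared with the detection interval duration.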
8. A data quality detection apparatus, comprising:
the first acquisition module is used for responding to a trigger instruction, acquiring the update time of the data table to be detected in a preset metadata module according to the identification of the data table to be detected, and acquiring the detection time of the last quality detection of the data table to be detected in a preset data quality module, wherein the trigger instruction is used for indicating the quality detection of the data table to be detected, the metadata module comprises at least one data table, the identification and the update time of each data table, and the data quality module comprises the detection unexecuted times of the at least one data table, the detection time of each quality detection and the detection result;
the first determining module is used for determining whether the data table to be detected is updated or not after the detection time according to the trigger instruction, the update time and the detection time;
the first generation module is used for incrementing by one the detection unexecuted times of the data table to be detected in the data quality module if it is determined that the data table to be detected has not been updated after the detection time, and generating first prompt information, wherein the first prompt information is used for prompting a user that the data table to be detected has not been updated;
and the first sending module is used for sending the first prompt information to the user.
9. An electronic device, comprising: a processor, and a memory and a display communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the data quality detection method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the data quality detection method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410223333.4A CN117851397A (en) 2024-02-28 2024-02-28 Data quality detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410223333.4A CN117851397A (en) 2024-02-28 2024-02-28 Data quality detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117851397A true CN117851397A (en) 2024-04-09

Family

ID=90536448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410223333.4A Pending CN117851397A (en) 2024-02-28 2024-02-28 Data quality detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117851397A (en)

Similar Documents

Publication Publication Date Title
CN112597263B (en) Pipe network detection data abnormity judgment method and system
CN109871251B (en) Response data processing method and device, storage medium and terminal equipment
CN109670091B (en) Metadata intelligent maintenance method and device based on data standard
CN107316156B (en) Data processing method, device, server and storage medium
CN109213476B (en) Installation package generation method, computer readable storage medium and terminal equipment
CN111090593A (en) Method, device, electronic equipment and storage medium for determining crash attribution
CN112434308A (en) Application vulnerability detection method and device, electronic equipment and computer storage medium
CN111488736B (en) Self-learning word segmentation method, device, computer equipment and storage medium
CN111427784A (en) Data acquisition method, device, equipment and storage medium
CN116302902A (en) Method and device for generating test case, electronic equipment and storage medium
CN117851397A (en) Data quality detection method, device, equipment and storage medium
CN116774673A (en) Data calibration method and device, electronic equipment and storage medium
CN114564502A (en) Electric power data complementary copying method and system based on Redis cache technology
CN111309623B (en) Coordinate class data classification test method and device
CN109902067B (en) File processing method and device, storage medium and computer equipment
CN114996519B (en) Data processing method, device, electronic equipment, storage medium and product
CN111782479A (en) Log processing method and device, electronic equipment and computer readable storage medium
CN117609881B (en) Metal overlap detection method and system based on artificial intelligence
CN112783732B (en) Database table capacity monitoring method and device
CN107391330B (en) Method and system for testing computer performance under Itanium platform
CN115409647A (en) Energy router service life prediction method and device based on artificial intelligence
CN117236313A (en) Test data analysis method and device, electronic equipment and storage medium
CN118226120A (en) Power consumption determination method and device, electronic equipment and storage medium
CN117370158A (en) Test processing method, system, electronic equipment and medium
CN117609064A (en) Unit test method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination