CN116028481A - Data quality detection method, device, equipment and storage medium - Google Patents


Info

Publication number
CN116028481A
Authority
CN
China
Prior art keywords
target
detected
data
target data
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310330194.0A
Other languages
Chinese (zh)
Other versions
CN116028481B (en)
Inventor
王锦胤
杨培
刘海涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zijincheng Credit Investigation Co ltd
Original Assignee
Zijincheng Credit Investigation Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zijincheng Credit Investigation Co ltd
Priority to CN202310330194.0A
Publication of CN116028481A
Application granted
Publication of CN116028481B
Legal status: Active
Anticipated expiration

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The embodiments of the present application provide a data quality detection method, apparatus, device and storage medium. The method comprises: acquiring data groups to be detected, wherein each data group to be detected comprises a plurality of data values to be detected of different data types, data values of different data types correspond to different tag identifiers, and the tag identifiers contained in one data group to be detected form a tag identification group; determining, based on a target tag identification group, a target data index group and a target data group to be detected that correspond to the target tag identification group; and determining a detection result of the target data to be detected included in the target data group to be detected, based on the target data index group and the tag identifiers corresponding to that target data. The embodiments enable data quality detection of simple data and effectively improve the practicability of data quality detection.

Description

Data quality detection method, device, equipment and storage medium
Technical Field
The embodiments of the present application relate to the technical field of data processing, and in particular to a data quality detection method, apparatus, device and storage medium.
Background
With the development of enterprise business, data types and data sources are increasingly diverse, data volumes are growing rapidly, and enterprises face more and more data quality problems in data management and data flow.
In the related art, data quality can be monitored, but non-lightweight data quality monitoring methods must deal with issues such as version compatibility and depend heavily on specific versions. Such implementations are therefore not cost-effective when all that is needed is index statistics monitoring, that is, data quality detection, of simple data.
Disclosure of Invention
In view of the above problems, embodiments of the present application provide a data quality detection method, apparatus, device, and storage medium that overcome them.
In a first aspect, an embodiment of the present disclosure provides a data quality detection method, including:
acquiring data groups to be detected, wherein each data group to be detected comprises a plurality of data values to be detected of different data types, data values of different data types correspond to different tag identifiers, and the tag identifiers contained in one data group to be detected form a tag identification group;
determining, based on a target tag identification group, a target data index group and a target data group to be detected that correspond to the target tag identification group; and
determining a detection result of the target data to be detected included in the target data group to be detected, based on the target data index group and the tag identifiers corresponding to that target data.
In an optional manner, the determining, based on the target tag identification group, of the target data index group and the target data group to be detected that correspond to the target tag identification group includes:
screening, from the data groups to be detected, the target data groups to be detected whose tag identification group is the same as the target tag identification group; and
searching a database, based on the target tag identification group, for the target data index group corresponding to the target tag identification group.
In an optional manner, the determining of the detection result of the target data to be detected included in the target data group to be detected, based on the target data index group and the tag identifiers corresponding to that target data, includes:
determining the correspondence between the target data group to be detected and the target data index group based on the tag identifiers corresponding to the target data to be detected and the target tag identification group; and
sequentially comparing the target data to be detected of the target data group against the target data indexes of the target data index group, and determining the detection result of the target data group to be detected based on the comparison result.
In an optional manner, there are a plurality of target data groups to be detected;
the sequentially comparing of the target data to be detected of the target data group against the target data indexes of the target data index group, and the determining of the detection result of the target data group to be detected based on the comparison result, include:
comparing, tag identifier by tag identifier, the target data to be detected of every target data group under the same tag identifier with the corresponding target data index of the target data index group; and
after the comparison has been completed for every tag identifier of the target data to be detected, determining the detection results of the target data groups to be detected based on the comparison results.
In an optional manner, there are a plurality of target data groups to be detected;
the sequentially comparing of the target data to be detected of the target data group against the target data indexes of the target data index group, and the determining of the detection result of the target data group to be detected based on the comparison result, include:
comparing, group by group in the data order of the target data groups to be detected, the target data to be detected of each target data group with the target data indexes of the target data index group; and
after every tag identifier of each target data group to be detected has been compared, determining the detection results of the target data groups to be detected based on the comparison results.
In an optional manner, the sequentially comparing of the target data to be detected of the target data group against the target data indexes of the target data index group, and the determining of the detection result of the target data group to be detected based on the comparison result, include:
in response to a task trigger operation, sequentially comparing the target data to be detected of the target data group against the target data indexes of the target data index group, and determining the detection result of the target data group to be detected based on the comparison result;
wherein the trigger operation includes an automatic trigger operation and a timed trigger operation.
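The two trigger modes can be sketched in Python as follows. This is a minimal sketch under assumptions, not the patented implementation: the class `DetectionTrigger` and its methods are hypothetical names, the automatic trigger is modelled as a callback invoked when new data arrives, and the timed trigger uses the standard-library `sched` module (a deployment might use cron or a job scheduler instead).

```python
import sched
import time
from typing import Callable, Dict, List


class DetectionTrigger:
    """Hypothetical wrapper that runs a detection task on an event or on a timer."""

    def __init__(self, task: Callable[[], Dict[str, object]]) -> None:
        self.task = task  # the comparison step; returns one detection result
        self.results: List[Dict[str, object]] = []

    def on_data_ready(self) -> Dict[str, object]:
        # Automatic trigger: call this from whatever signals that new data has landed.
        result = self.task()
        self.results.append(result)
        return result

    def run_timed(self, interval_s: float, runs: int) -> None:
        # Timed trigger: schedule `runs` executions spaced `interval_s` seconds apart
        # and block until all of them have completed.
        scheduler = sched.scheduler(time.monotonic, time.sleep)
        for i in range(runs):
            scheduler.enter(i * interval_s, 1, self.on_data_ready)
        scheduler.run()
```

For example, `DetectionTrigger(task).run_timed(3600, 24)` would run `task` hourly for a day; both modes funnel into the same `on_data_ready` method, so results are collected in one place regardless of how detection was started.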
In an optional manner, the method further comprises:
sending the detection result of the target data group to be detected to a target object; or
sending the detection result of the target data group to be detected to a target terminal.
In a second aspect, an embodiment of the present disclosure provides a data quality detection apparatus, including:
an acquisition module, configured to acquire data groups to be detected, wherein each data group to be detected comprises a plurality of data values to be detected of different data types, data values of different data types correspond to different tag identifiers, and the tag identifiers included in one data group to be detected form a tag identification group;
a target data determining module, configured to determine, based on a target tag identification group, a target data index group and a target data group to be detected that correspond to the target tag identification group; and
a detection module, configured to determine a detection result of the target data to be detected included in the target data group to be detected, based on the target data index group and the tag identifiers corresponding to that target data.
In a third aspect, an embodiment of the present disclosure provides a computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the method of any one of the above embodiments.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method of any one of the above embodiments.
The data quality detection method, apparatus, device and storage medium provided by the embodiments of the present disclosure acquire data groups to be detected; determine, based on a target tag identification group, the target data index group and the target data groups to be detected that correspond to it; and determine the detection result of the target data to be detected based on the target data index group and the tag identifiers of that data. In other words, the target tag identification group is used to fetch the matching target data index group and target data to be detected from a database, and the target data groups are then checked according to their tag identifiers. This enables data quality detection of simple data and effectively improves the practicability of data quality detection.
The foregoing is only an overview of the technical solutions of the embodiments of the present application. So that the technical means of the embodiments can be understood more clearly and implemented according to this specification, and so that the above and other objects, features and advantages become more apparent, specific embodiments of the present application are described in detail below.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a flow chart of a data quality detection method provided in this embodiment;
Fig. 2 is a schematic structural diagram of a data quality detection apparatus provided in this embodiment;
Fig. 3 is a schematic structural diagram of a computer device provided in this embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without inventive effort fall within the protection scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to limit the application; the terms "comprising" and "having" and any variations thereof in the description, the claims and the drawings of the present application are intended to cover a non-exclusive inclusion.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of the phrase "an embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent three cases: A alone, B alone, or both A and B. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following associated objects.
Furthermore, the terms first, second and the like in the description and in the claims of the present application or in the above-described figures, are used for distinguishing between different objects and not for describing a particular sequential order, and may be used to expressly or implicitly include one or more such features.
In the description of the present application, unless otherwise indicated, the meaning of "plurality" means two or more (including two), and similarly, "plural sets" means two or more (including two).
In order to better understand the technical solutions of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings.
Fig. 1 is a flow chart of a data quality detection method according to an embodiment; the data quality detection method may include the following steps.
S110, acquiring data groups to be detected.
Each data group to be detected comprises a plurality of data values to be detected of different data types; data values of different data types correspond to different tag identifiers, and the tag identifiers included in a data group to be detected form its tag identification group.
In a specific embodiment, Table 1 below shows three example data groups to be detected. In data group 1, the data values to be detected are A11, B12, C13 and D14, whose tag identifiers are tag 11, tag 12, tag 13 and tag 14 respectively. In data group 2, the data values to be detected are A21, B22, C23 and D24, whose tag identifiers are tag 21, tag 22, tag 23 and tag 24 respectively. In data group 3, the data values to be detected are A31, B32, C33 and D34, whose tag identifiers are tag 31, tag 32, tag 33 and tag 34 respectively. The tag identification group formed by data group 1 is therefore [tag 11; tag 12; tag 13; tag 14], that formed by data group 2 is [tag 21; tag 22; tag 23; tag 24], that formed by data group 3 is [tag 31; tag 32; tag 33; tag 34], and that formed by data group 4 is [tag 41; tag 42; tag 43; tag 44].
[Table 1: image not reproduced]
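The data group structure described above can be modelled with a small Python data class. This is a sketch under stated assumptions: `DataGroup` is a hypothetical name, each data value is keyed by its tag identifier in a dict, and the tag identifiers are written `tag1` to `tag4` (following the later walkthrough that uses tag identifiers 1 to 4) rather than the per-group labels of Table 1. A tuple is used for the tag identification group so that column order is kept and two groups can be compared for equality.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DataGroup:
    """One data group to be detected: each data value is keyed by its tag identifier."""

    group_id: int
    values: Dict[str, object] = field(default_factory=dict)

    @property
    def tag_group(self) -> Tuple[str, ...]:
        # The tag identifiers, in insertion order, form the tag identification group.
        return tuple(self.values)


group1 = DataGroup(1, {"tag1": "A11", "tag2": "B12", "tag3": "C13", "tag4": "D14"})
print(group1.tag_group)  # ('tag1', 'tag2', 'tag3', 'tag4')
```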
It should be noted that the above embodiment shows only some of the data groups to be detected; in a specific embodiment there may be other data groups to be detected, and the data groups are not limited to the tag identifiers shown in Table 1.
In addition, the data groups to be detected in the embodiments of the present disclosure may be stored as data tables or as text; the storage form is not specifically limited in the embodiments of the present disclosure.
In a specific implementation, before the target data to be detected is checked, a script program is first executed to obtain the data groups to be detected from a database.
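As a sketch of that loading step, the snippet below reads data groups from a relational table with Python's built-in `sqlite3` module. The table name `data_to_detect` and its columns (`group_id`, `position`, `tag`, `value`) are hypothetical, since the source does not specify a schema or database engine. Each row holds one data value, and rows are regrouped into one `{tag: value}` dict per data group, ordered by `position` so that tag order is preserved.

```python
import sqlite3
from collections import defaultdict
from typing import Dict


def load_data_groups(conn: sqlite3.Connection) -> Dict[int, Dict[str, object]]:
    """Read (group_id, tag, value) rows and rebuild one {tag: value} dict per data group."""
    groups: Dict[int, Dict[str, object]] = defaultdict(dict)
    rows = conn.execute(
        "SELECT group_id, tag, value FROM data_to_detect ORDER BY group_id, position"
    )
    for group_id, tag, value in rows:
        groups[group_id][tag] = value
    return dict(groups)


# Demo with an in-memory database holding the three groups of Table 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_to_detect (group_id INTEGER, position INTEGER, tag TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO data_to_detect VALUES (?, ?, ?, ?)",
    [(g, p, f"tag{p}", f"{'ABCD'[p - 1]}{g}{p}") for g in (1, 2, 3) for p in (1, 2, 3, 4)],
)
groups = load_data_groups(conn)
```

A long (one row per value) layout is only one option; a wide table with one column per tag would work equally well and would make the tag identification group simply the column list.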
S120, determining, based on a target tag identification group, a target data index group and a target data group to be detected that correspond to the target tag identification group.
In a specific embodiment, to perform data quality detection, a target tag identification group is first determined. Based on it, the target data groups to be detected whose tag identification group is the same as the target tag identification group are selected from the data groups to be detected, and the target data index group stored in the system for the target tag identification group is determined.
In a specific embodiment, the target tag identification group may be user-defined, or one tag identification group may be selected from preset tag identification groups as the target tag identification group; this is not specifically limited in the embodiments of the present disclosure.
Illustratively, if the tag identification group to be obtained from the data table is [tag identifier 1; tag identifier 2; tag identifier 3; tag identifier 4], the target tag identification group is determined to be [tag identifier 1; tag identifier 2; tag identifier 3; tag identifier 4]. After the target tag identification group is determined, the database is searched for target data to be detected having the same tag identification group; that is, the target data groups to be detected found using [tag identifier 1; tag identifier 2; tag identifier 3; tag identifier 4] are the data shown in Table 1.
In a specific embodiment, the target data groups to be detected whose tag identification group is the same as the target tag identification group are screened from the data groups to be detected.
Illustratively, as shown in Tables 2 and 3 below, the data table to be detected further includes the following data groups to be detected.
[Tables 2 and 3: images not reproduced]
It should be noted that, in the above embodiment, data groups having the same tag identification group are shown in the same table for illustration only. In a specific embodiment, all data groups to be detected are stored in one table, and when data quality detection is required, the data table is searched, based on the target tag identification group, for the data having the same tag identification group, which then form the target data groups to be detected.
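The screening step can be sketched as an index from tag identification group to data group IDs. This assumes a `{group_id: {tag: value}}` layout (a hypothetical representation, not one given in the source) and treats two tag identification groups as the same only when they contain the same tags in the same order; using a `frozenset` key instead would make the match order-insensitive. Group 4 in the demo is invented purely to show a non-matching group.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

DataGroups = Dict[int, Dict[str, object]]


def index_by_tag_group(groups: DataGroups) -> Dict[Tuple[str, ...], List[int]]:
    """Map each tag identification group to the IDs of the data groups that have it."""
    index: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for group_id, values in groups.items():
        index[tuple(values)].append(group_id)
    return dict(index)


def screen_target_groups(groups: DataGroups, target_tag_group: Sequence[str]) -> List[int]:
    """Return the IDs of data groups whose tag identification group equals the target."""
    return index_by_tag_group(groups).get(tuple(target_tag_group), [])


groups = {
    1: {"tag1": "A11", "tag2": "B12", "tag3": "C13", "tag4": "D14"},
    2: {"tag1": "A21", "tag2": "B22", "tag3": "C23", "tag4": "D24"},
    3: {"tag1": "A31", "tag2": "B32", "tag3": "C33", "tag4": "D34"},
    4: {"tag5": "E41", "tag6": "F42"},  # hypothetical group with a different tag group
}
print(screen_target_groups(groups, ["tag1", "tag2", "tag3", "tag4"]))  # [1, 2, 3]
```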
In a specific embodiment, the database is searched, based on the target tag identification group, for the target data index group corresponding to the target tag identification group.
In a specific embodiment, a data index group is composed of a plurality of data indexes. Example data indexes include a null-value index, a repeated-value index, a threshold index, a data-volume period-over-period (ring ratio) index and a data-volume year-on-year index; the embodiments of the present disclosure are not limited thereto.
Specifically, each tag identification group has a corresponding target data index group, which is used to measure the data quality of the target data groups to be detected that correspond to that tag identification group.
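The example data indexes listed above can be expressed as small check functions. These are illustrative sketches only: the source names the index types but not their definitions, so the null rule (None or a blank string), the closed threshold range and the relative-change formula below are assumptions. The ring ratio and year-on-year indexes are both modelled as a relative change in data volume; they differ only in the reference volume (the previous period versus the same period of the previous year).

```python
from typing import Optional, Sequence


def is_null(value: object) -> bool:
    """Null-value index: treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def duplicate_count(values: Sequence[object]) -> int:
    """Repeated-value index: how many values repeat an earlier one (values must be hashable)."""
    return len(values) - len(set(values))


def within_threshold(value: float, low: float, high: float) -> bool:
    """Threshold index: the value must lie in the closed range [low, high]."""
    return low <= value <= high


def change_ratio(current: int, reference: int) -> Optional[float]:
    """Relative change of a data volume against a reference volume.

    Previous period as reference -> ring ratio; same period last year -> year-on-year.
    """
    if reference == 0:
        return None  # undefined when the reference volume is zero
    return (current - reference) / reference


def volume_change_within(current: int, reference: int, max_abs_change: float) -> bool:
    """Pass when the relative volume change stays within +/- max_abs_change."""
    ratio = change_ratio(current, reference)
    return ratio is not None and abs(ratio) <= max_abs_change
```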
As a specific implementation, the correspondence between a tag identification group and a data index group may be configured. For example, if the tag identification group is [tag identifier 1; tag identifier 2; tag identifier 3; tag identifier 4] and the data indexes corresponding to these tag identifiers are index 1, index 2, index 3 and index 4 respectively, then index 1 to index 4 form the data index group, and the correspondence is as follows: in target data group 1, the target data to be detected under tag identifier 1 corresponds to index 1, the data under tag identifier 2 corresponds to index 2, the data under tag identifier 3 corresponds to index 3, and the data under tag identifier 4 corresponds to index 4.
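The configured correspondence can be sketched as a lookup followed by a positional pairing. In this sketch a Python dict stands in for the database table that stores index groups, and the names `INDEX_GROUPS`, `index1` to `index4` and `tag_to_index` are hypothetical placeholders.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical stand-in for the database table holding one data index group
# per tag identification group.
INDEX_GROUPS: Dict[Tuple[str, ...], Tuple[str, ...]] = {
    ("tag1", "tag2", "tag3", "tag4"): ("index1", "index2", "index3", "index4"),
}


def tag_to_index(target_tag_group: Sequence[str]) -> Dict[str, str]:
    """Pair each tag identifier of the target group with the index at the same position."""
    key = tuple(target_tag_group)
    index_group = INDEX_GROUPS[key]  # raises KeyError if no index group is configured
    if len(index_group) != len(key):
        raise ValueError("tag identification group and data index group differ in length")
    return dict(zip(key, index_group))


print(tag_to_index(["tag1", "tag2", "tag3", "tag4"]))
# {'tag1': 'index1', 'tag2': 'index2', 'tag3': 'index3', 'tag4': 'index4'}
```

Because the mapping is keyed by tag identifier rather than by group, the same result applies unchanged to every target data group that shares the tag identification group.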
S130, determining a detection result of the target data to be detected included in the target data group to be detected, based on the target data index group and the tag identifiers corresponding to that target data.
The detection result is determined by comparing, in turn, each piece of data to be detected in each target data group with its corresponding index.
As an embodiment, determining, based on the target data index group and the tag identifiers, the detection result of the target data to be detected included in the target data group to be detected includes:
determining, based on the tag identifiers, the correspondence between the target data group to be detected and the target data index group; and
sequentially comparing the target data to be detected of the target data group against the target data indexes of the target data index group, and determining the detection result of the target data group to be detected based on the comparison result.
For example, suppose the determined target tag identification group is [tag identifier 1; tag identifier 2; tag identifier 3; tag identifier 4], and the target data groups to be detected found with it are the data shown in Table 1. The correspondence between the target data groups and the target data index group is then determined through the tag identifiers: in target data group 1, the data to be detected A11, B12, C13 and D14 correspond to index 1, index 2, index 3 and index 4 respectively (via tag identifiers 1 to 4); in target data group 2, the data to be detected A21, B22, C23 and D24 correspond to index 1, index 2, index 3 and index 4 respectively; and in target data group 3, the data to be detected A31, B32, C33 and D34 correspond to index 1, index 2, index 3 and index 4 respectively.
By comparing, in turn, data A11 of target data group 1 against index 1, B12 against index 2, C13 against index 3 and D14 against index 4, the detection result of target data group 1, namely whether it contains abnormal data, can be determined. Likewise, comparing A21, B22, C23 and D24 of target data group 2 against index 1 to index 4 determines whether target data group 2 contains abnormal data, and comparing A31, B32, C33 and D34 of target data group 3 against index 1 to index 4 determines whether target data group 3 contains abnormal data.
As a specific implementation, when there are a plurality of target data groups to be detected, sequentially comparing the target data to be detected of the target data groups against the target data indexes of the target data index group and determining the detection result based on the comparison result includes: comparing, tag identifier by tag identifier, the target data to be detected of every target data group under the same tag identifier with the corresponding target data index; and, after the comparison has been completed for every tag identifier, determining the detection results of the target data groups to be detected based on the comparison results.
Specifically, suppose the determined target tag identification group is [tag identifier 1; tag identifier 2; tag identifier 3; tag identifier 4] and the target data groups to be detected found with it are the data shown in Table 1. After the correspondence between the target data groups and the target data index group has been determined through the tag identifiers, one implementation proceeds tag by tag: data A11 of target data group 1, A21 of target data group 2 and A31 of target data group 3 are compared against index 1; once all data under tag identifier 1 has been compared, data B12, B22 and B32 are compared against index 2 in the data order of the target data groups; and so on until the data under every tag identifier included in the target data groups has been compared, after which the detection results of the target data groups are determined based on the comparison results.
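The tag-by-tag order can be sketched as a loop over tag identifiers on the outside and data groups on the inside. This is a sketch under assumptions: `checks` maps each tag identifier to a predicate that returns True when a value satisfies that tag's data index (for instance derived from the configured tag-to-index correspondence), the result lists the failing tags per group, and `trace` is a hypothetical optional parameter that records the visiting order.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Check = Callable[[object], bool]


def detect_tag_major(
    groups: Dict[int, Dict[str, object]],
    tag_order: Sequence[str],
    checks: Dict[str, Check],
    trace: Optional[List[Tuple[int, str]]] = None,
) -> Dict[int, List[str]]:
    """Compare all groups under one tag before moving on to the next tag.

    Returns {group_id: [failing tags]}; a non-empty list means the group has abnormal data.
    """
    failures: Dict[int, List[str]] = {group_id: [] for group_id in groups}
    for tag in tag_order:  # outer loop: tag identifiers
        for group_id, values in groups.items():  # inner loop: groups in data order
            if trace is not None:
                trace.append((group_id, tag))
            if not checks[tag](values[tag]):
                failures[group_id].append(tag)
    # Results are final only after every tag has been compared.
    return failures
```

Walking one tag at a time touches a single column across all groups, which fits checks that need the whole column at once (such as a repeated-value index), but no group's result is final until the last tag is done.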
As another specific implementation manner, when the target to-be-detected data set includes a plurality of pieces, sequentially comparing target to-be-detected data of the target to-be-detected data set with target data indexes of the target data index set, determining a detection result of the target to-be-detected data set based on the comparison result, including: sequentially comparing the target data to be detected of each target data set to be detected with the target data indexes of the target data index set according to the data sequence of the target data sets to be detected; and after the label identification comparison of each piece of target to-be-detected data is finished, determining the detection result of the target to-be-detected data set based on the comparison result.
Specifically, again taking the target tag identification group [tag identification 1; tag identification 2; tag identification 3; tag identification 4] and the data shown in Table 1, after the correspondence between the target to-be-detected data sets and the target data index set has been determined through the tag identifications, another implementation proceeds set by set in the data order of the target to-be-detected data sets. First, the to-be-detected data A11, B12, C13, and D14 in target to-be-detected data set 1 are compared against index 1, index 2, index 3, and index 4, respectively, which determines the detection result of target to-be-detected data set 1, that is, whether target to-be-detected data set 1 contains abnormal data. After the comparison of target to-be-detected data set 1 is completed, the to-be-detected data A21, B22, C23, and D24 in target to-be-detected data set 2 are compared against index 1, index 2, index 3, and index 4, respectively, which determines the detection result of target to-be-detected data set 2, that is, whether target to-be-detected data set 2 contains abnormal data.
Finally, the to-be-detected data A31, B32, C33, and D34 in target to-be-detected data set 3 are compared against index 1, index 2, index 3, and index 4, respectively, which determines the detection result of target to-be-detected data set 3, that is, whether target to-be-detected data set 3 contains abnormal data.
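The set-by-set order can be sketched the same way. As an assumption for illustration only, each data set is a dict from tag identification to value and each index is a predicate; `detect_set_wise` and the four example predicates are hypothetical and do not come from the source.

```python
from typing import Callable, Dict, List

# Hypothetical model: tag identification -> value, and tag identification -> validity predicate.
DataSet = Dict[str, float]
IndexGroup = Dict[str, Callable[[float], bool]]


def detect_set_wise(
    tag_group: List[str], data_sets: List[DataSet], indexes: IndexGroup
) -> List[dict]:
    """Compare set by set: A11, B12, C13, D14, then A21, B22, C23, D24, ..."""
    results = []
    for set_no, data_set in enumerate(data_sets, start=1):  # outer loop: data sets in order
        # Inner loop: every tag of the current set before moving on to the next set.
        abnormal_tags = [tag for tag in tag_group if not indexes[tag](data_set[tag])]
        results.append({"set": set_no, "abnormal": bool(abnormal_tags), "tags": abnormal_tags})
    return results


tags = ["tag1", "tag2", "tag3", "tag4"]
indexes = {
    "tag1": lambda v: v >= 0,         # e.g. a non-negative amount
    "tag2": lambda v: 0 <= v <= 1,    # e.g. a ratio
    "tag3": lambda v: v == int(v),    # e.g. an integer count
    "tag4": lambda v: v < 1_000_000,  # e.g. an upper bound
}
sets = [
    {"tag1": 10, "tag2": 0.5, "tag3": 3, "tag4": 100},
    {"tag1": -1, "tag2": 0.5, "tag3": 3, "tag4": 100},
    {"tag1": 10, "tag2": 1.5, "tag3": 2.5, "tag4": 100},
]
for result in detect_set_wise(tags, sets, indexes):
    print(result)
```

Because each set is finished before the next begins, a per-set verdict is available as soon as that set's last tag has been checked, which suits reporting results set by set.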
As one implementation, the method provided by the embodiments of the present disclosure further includes: sending the detection result of the target to-be-detected data set to a target object; or sending the detection result of the target to-be-detected data set to a target terminal.
Sending the detection result of the target to-be-detected data set to the target object allows the target object to receive, in a timely manner, information about the abnormal target to-be-detected data in the set and to process the abnormal data set further, which improves the efficiency with which staff handle abnormal to-be-detected data sets. In addition, by sending the detection result of the target to-be-detected data set to the target terminal, the abnormal information of the set can be displayed on the target terminal, so that staff can conveniently view the detailed information of the abnormal target to-be-detected data.
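One way to picture the two delivery options is a report builder plus a pluggable channel per recipient. The sketch below rests on assumptions for illustration: `DetectionResult`, `build_report`, `send_result`, and the channel names are hypothetical, and real delivery (mail, message queue, terminal display) is replaced by in-memory lists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DetectionResult:
    set_no: int
    abnormal_tags: List[str] = field(default_factory=list)


def build_report(results: List[DetectionResult]) -> str:
    """Keep only the data sets that contain abnormal data."""
    lines = [
        f"data set {r.set_no}: abnormal tags {', '.join(r.abnormal_tags)}"
        for r in results
        if r.abnormal_tags
    ]
    return "\n".join(lines) if lines else "no abnormal data found"


def send_result(
    results: List[DetectionResult],
    target: str,
    channels: Dict[str, Callable[[str], None]],
) -> str:
    """Deliver the report through the channel registered for `target`."""
    report = build_report(results)
    channels[target](report)  # raises KeyError for an unregistered target
    return report


# In-memory stand-ins for a target object (e.g. a staff inbox) and a target terminal.
inbox: List[str] = []
terminal_screen: List[str] = []
channels = {"target_object": inbox.append, "target_terminal": terminal_screen.append}

results = [DetectionResult(1), DetectionResult(2, ["tag1"]), DetectionResult(3, ["tag2", "tag3"])]
send_result(results, "target_object", channels)
print(inbox[0])
```

Keeping the report format separate from the channel lets the same detection result go to either recipient without changing the detection code.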
According to the data quality detection method provided by the embodiments of the present disclosure, after the to-be-detected data sets are obtained, the target data index group and the target to-be-detected data groups corresponding to the target tag identification group are determined, and the detection result of the target to-be-detected data included in the target to-be-detected data groups is then determined based on the target data index group and the tag identifications, thereby realizing detection of the target to-be-detected data.
As one implementation, in response to a task trigger operation, the target to-be-detected data of the target to-be-detected data set are sequentially compared with the target data indexes of the target data index set, and the detection result of the target to-be-detected data set is determined based on the comparison result; the trigger operation includes an automatic trigger operation and a timed trigger operation.
Specifically, the comparison of the target to-be-detected data of the target to-be-detected data set with the target data indexes of the target data index set can be started by a task trigger operation; as one implementation, the task may be triggered automatically or manually.
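A task that can be started either by an explicit trigger or on a timed schedule might look like the sketch below. It is an assumption-laden illustration: `DetectionTask`, `trigger`, `tick`, and the injectable clock are hypothetical names, and the detection itself is passed in as a plain callable.

```python
import time
from typing import Callable, Optional


class DetectionTask:
    """Runs a detection callable on an explicit trigger or on a fixed interval."""

    def __init__(
        self,
        run_detection: Callable[[], object],
        interval_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.run_detection = run_detection
        self.interval_s = interval_s
        self.clock = clock  # injectable so the timing logic can be tested
        self.last_run: Optional[float] = None
        self.runs = 0

    def trigger(self) -> object:
        """Event-style trigger, e.g. fired automatically on new data or by an operator."""
        return self._run()

    def tick(self) -> Optional[object]:
        """Timed trigger: call periodically; runs only once the interval has elapsed."""
        now = self.clock()
        if self.last_run is None or now - self.last_run >= self.interval_s:
            return self._run()
        return None

    def _run(self) -> object:
        self.last_run = self.clock()
        self.runs += 1
        return self.run_detection()


# Usage with a fake clock instead of real time.
fake_now = [0.0]
task = DetectionTask(lambda: "detection done", interval_s=60, clock=lambda: fake_now[0])
print(task.tick())     # detection done  (first tick always runs)
fake_now[0] = 30.0
print(task.tick())     # None            (interval not yet elapsed)
fake_now[0] = 61.0
print(task.tick())     # detection done
print(task.trigger())  # detection done  (explicit trigger ignores the interval)
```

In practice `tick` would be called by whatever scheduler the deployment already uses; the class only decides whether a run is due.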
Fig. 2 is a schematic structural diagram of a data quality detection device according to the present embodiment, where the data quality detection device may include:
The obtaining module 210 is configured to obtain to-be-detected data sets, where one to-be-detected data set includes a plurality of to-be-detected data values of different data types, to-be-detected data values of different data types correspond to different tag identifications, and the tag identifications included in one to-be-detected data set form a tag identification group.
The target data determining module 220 is configured to determine, based on the tag identification group, the target data index group and the target to-be-detected data group corresponding to the tag identification group.
The detection module 230 is configured to determine, based on the target data index group and the tag identifications, the detection result of the target to-be-detected data included in the target to-be-detected data group.
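The three modules of Fig. 2 can be pictured as a small pipeline. This is a sketch under stated assumptions, not the patented apparatus: the class names loosely mirror modules 210, 220, and 230, the index database is replaced by an in-memory dict keyed by a frozenset of tags, and the tags `age` and `score` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

DataSet = Dict[str, float]
Check = Callable[[float], bool]
TagGroup = FrozenSet[str]


class ObtainingModule:  # loosely mirrors obtaining module 210
    def obtain(self, records: List[DataSet]) -> List[Tuple[TagGroup, DataSet]]:
        # The tag identifications of one data set form its tag identification group.
        return [(frozenset(record), record) for record in records]


class TargetDataDeterminingModule:  # loosely mirrors module 220
    def __init__(self, index_db: Dict[TagGroup, Dict[str, Check]]) -> None:
        self.index_db = index_db  # in-memory stand-in for the index database

    def determine(
        self, target_group: TagGroup, obtained: List[Tuple[TagGroup, DataSet]]
    ) -> Tuple[Dict[str, Check], List[DataSet]]:
        target_sets = [ds for group, ds in obtained if group == target_group]
        return self.index_db[target_group], target_sets


class DetectionModule:  # loosely mirrors module 230
    def detect(self, indexes: Dict[str, Check], target_sets: List[DataSet]) -> List[List[str]]:
        # For each target data set, list the tags whose value fails its index.
        return [sorted(t for t, v in ds.items() if not indexes[t](v)) for ds in target_sets]


group = frozenset({"age", "score"})
index_db = {group: {"age": lambda v: 0 <= v <= 120, "score": lambda v: 0 <= v <= 100}}
records = [{"age": 30, "score": 88}, {"age": 150, "score": 70}, {"age": 40}]

obtained = ObtainingModule().obtain(records)
indexes, target_sets = TargetDataDeterminingModule(index_db).determine(group, obtained)
print(DetectionModule().detect(indexes, target_sets))  # [[], ['age']]
```

The third record is dropped by module 220 because its tag identification group (`age` only) differs from the target group.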
In some embodiments, optionally, the target data determining module 220 includes a first determining unit and a second determining unit.
The first determining unit is configured to select, based on the target tag identification group, the target to-be-detected data groups whose tag identification group is the same as the target tag identification group from the to-be-detected data sets.
The second determining unit is configured to look up, based on the target tag identification group, the target data index group corresponding to the target tag identification group in a database.
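The two determining units can be sketched with Python's built-in `sqlite3` module. The schema is an assumption made for illustration: a hypothetical `data_index` table stores one row per tag with a `[min_value, max_value]` range, and a tag identification group is keyed by its sorted, comma-joined tags (`group_key` is likewise hypothetical).

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple


def group_key(tags: Iterable[str]) -> str:
    """Order-insensitive key for a tag identification group (hypothetical encoding)."""
    return ",".join(sorted(tags))


def screen_target_sets(
    data_sets: List[Dict[str, float]], target_tags: Iterable[str]
) -> List[Dict[str, float]]:
    """First determining unit: keep data sets whose tag group equals the target group."""
    target = set(target_tags)
    return [ds for ds in data_sets if set(ds) == target]


def load_index_group(
    conn: sqlite3.Connection, target_tags: Iterable[str]
) -> Dict[str, Tuple[float, float]]:
    """Second determining unit: look up the index group for the target tag group."""
    rows = conn.execute(
        "SELECT tag, min_value, max_value FROM data_index WHERE tag_group = ? ORDER BY tag",
        (group_key(target_tags),),
    ).fetchall()
    return {tag: (lo, hi) for tag, lo, hi in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_index (tag_group TEXT, tag TEXT, min_value REAL, max_value REAL)")
key = group_key(["tag1", "tag2"])
conn.executemany(
    "INSERT INTO data_index VALUES (?, ?, ?, ?)",
    [(key, "tag1", 0, 10), (key, "tag2", 0, 100)],
)
sets = [{"tag1": 1, "tag2": 2}, {"tag1": 1}, {"tag2": 5, "tag1": 7}]
print(screen_target_sets(sets, ["tag2", "tag1"]))  # [{'tag1': 1, 'tag2': 2}, {'tag2': 5, 'tag1': 7}]
print(load_index_group(conn, ["tag2", "tag1"]))    # {'tag1': (0.0, 10.0), 'tag2': (0.0, 100.0)}
```

Sorting the tags before building the key makes the lookup independent of the order in which a data set lists its tags.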
In some embodiments, optionally, the detection module 230 includes a correspondence determining unit and a detection result determining unit.
The correspondence determining unit is configured to determine the correspondence between the target to-be-detected data group and the target data index group based on the tag identifications corresponding to the target to-be-detected data included in the target to-be-detected data group and the target tag identification group.
The detection result determining unit is configured to sequentially compare the target to-be-detected data of the target to-be-detected data group with the target data indexes of the target data index group, and to determine the detection result of the target to-be-detected data group based on the comparison result.
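Pairing values with indexes through their shared tag identification, and then comparing each pair, can be sketched as follows. The assumptions are that each index is a hypothetical `[min, max]` range and that tags present on only one side are reported rather than compared; `build_correspondence` and `compare` are illustrative names.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]


def build_correspondence(
    data_set: Dict[str, float], index_group: Dict[str, Range]
) -> Tuple[Dict[str, Tuple[float, Range]], List[str]]:
    """Pair each value with the index that has the same tag identification."""
    pairs = {tag: (data_set[tag], index_group[tag]) for tag in index_group if tag in data_set}
    # Tags present on only one side cannot be compared and are reported separately.
    unmatched = sorted(set(data_set) ^ set(index_group))
    return pairs, unmatched


def compare(pairs: Dict[str, Tuple[float, Range]]) -> Dict[str, bool]:
    """True means the value lies within its [min, max] index range."""
    return {tag: lo <= value <= hi for tag, (value, (lo, hi)) in pairs.items()}


index_group = {"tag1": (0.0, 10.0), "tag2": (0.0, 100.0), "tag3": (1.0, 5.0)}
data_set = {"tag1": 4, "tag2": 120, "tag4": 9}
pairs, unmatched = build_correspondence(data_set, index_group)
print(compare(pairs))  # {'tag1': True, 'tag2': False}
print(unmatched)       # ['tag3', 'tag4']
```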
In some embodiments, the detection result determining unit is specifically configured to:
sequentially compare, under the same tag identification, the target to-be-detected data of each target to-be-detected data set with the corresponding target data index of the target data index set;
and, after the comparison for every tag identification of the target to-be-detected data is finished, determine the detection result of the target to-be-detected data set based on the comparison result.
In some embodiments, the detection result determining unit is further specifically configured to:
sequentially compare, in the data order of the target to-be-detected data sets, the target to-be-detected data of each target to-be-detected data set with the target data indexes of the target data index set;
and, after the comparison of all tag identifications of each piece of target to-be-detected data is finished, determine the detection result of the target to-be-detected data set based on the comparison result.
In some embodiments, the detection result determining unit is further specifically configured to:
in response to a task trigger operation, sequentially compare the target to-be-detected data of the target to-be-detected data set with the target data indexes of the target data index set, and determine the detection result of the target to-be-detected data set based on the comparison result;
where the trigger operation includes an automatic trigger operation and a timed trigger operation.
In some embodiments, the data quality detection apparatus further includes a sending module.
The sending module is configured to send the detection result of the target to-be-detected data set to the target object; or
to send the detection result of the target to-be-detected data set to the target terminal.
After receiving a first rule trigger request, the rule processing device provided by the embodiments of the present invention responds to the request by executing, in the database, the corresponding target trigger operation on the target trigger rule carried in the request, thereby updating the rules stored in the database. A compound rule is then determined from the stored rules, so that rules can be used in combination, and the resulting compound rule is stored in the memory unit. In this way, the memory unit can hold compound rules formed by combining multiple single rules, which makes it easier to implement different risk-control rules in an integrated manner and effectively improves the usability of the risk-control rules.
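The update-then-combine flow for rules can be sketched as below. Several assumptions are made that the text does not state: rules are predicates over a record dict, the only trigger operations are `add` and `delete`, a compound rule passes only when all of its single rules pass, and both the database and the memory unit are plain dicts. `RuleProcessor`, `handle_trigger`, `build_compound`, and the example rules are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Rule = Callable[[dict], bool]


class RuleProcessor:
    def __init__(self) -> None:
        self.database: Dict[str, Rule] = {}  # stand-in for the rule database
        self.memory: Dict[str, Rule] = {}    # stand-in for the memory unit

    def handle_trigger(self, operation: str, name: str, rule: Optional[Rule] = None) -> None:
        """Apply the target trigger operation carried by a rule trigger request."""
        if operation == "add":
            if rule is None:
                raise ValueError("an 'add' request must carry a rule")
            self.database[name] = rule
        elif operation == "delete":
            self.database.pop(name, None)
        else:
            raise ValueError(f"unknown operation: {operation}")

    def build_compound(self, compound_name: str, rule_names: List[str]) -> Rule:
        """Combine stored single rules into one rule that passes only if all pass."""
        rules = [self.database[n] for n in rule_names]

        def compound(record: dict) -> bool:
            return all(rule(record) for rule in rules)

        self.memory[compound_name] = compound
        return compound


processor = RuleProcessor()
processor.handle_trigger("add", "min_age", lambda r: r["age"] >= 18)
processor.handle_trigger("add", "max_debt", lambda r: r["debt"] < 50_000)
combined = processor.build_compound("loan_check", ["min_age", "max_debt"])
print(combined({"age": 30, "debt": 10_000}))  # True
print(combined({"age": 16, "debt": 10_000}))  # False
```

A logical AND is only one possible combination; other compositions (any-of, weighted scores) would change `compound` but not the update-then-combine flow.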
An embodiment of the present application further provides a computer device. Refer to Fig. 3, which is a block diagram of the basic structure of the computer device according to this embodiment.
The computer device includes a memory 410 and a processor 420 that are communicatively connected to each other via a system bus. It should be noted that the figure shows only a computer device having components 410-420, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will appreciate that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to predetermined or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device, and the like.
The computer device may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The computer device can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 410 includes at least one type of readable storage medium, which may be non-volatile memory or volatile memory, such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, an optical disk, and the like, where the RAM may be static or dynamic. In some embodiments, the memory 410 may be an internal storage unit of the computer device, such as a hard disk or internal memory of the computer device. In other embodiments, the memory 410 may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device. Of course, the memory 410 may also include both an internal storage unit and an external storage device of the computer device. In this embodiment, the memory 410 is typically used to store the operating system and various types of application software installed on the computer device, such as the program code of the above method. In addition, the memory 410 may be used to temporarily store various types of data that have been output or are to be output.
The processor 420 is typically used to perform the overall operations of the computer device. In this embodiment, the memory 410 is used for storing program codes or instructions, the program codes include computer operation instructions, and the processor 420 is used for executing the program codes or instructions stored in the memory 410 or processing data, such as the program codes for executing the above-mentioned method.
Here, the bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus system may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
Another embodiment of the present application further provides a computer-readable medium, which may be a computer-readable signal medium or a computer-readable storage medium. A processor in a computer reads the computer-readable program code stored in the computer-readable medium, so that the processor can perform the functional actions specified in each step, or combination of steps, of the above method, and generate means for implementing the functional actions specified in each block, or combination of blocks, of the block diagram.
The computer-readable medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. The memory stores program code or instructions, the program code includes computer operation instructions, and the processor executes the program code or instructions of the above method stored in the memory.
The definition of memory and processor may refer to the description of the embodiments of the computer device described above, and will not be repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
The functional units or modules in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of first, second, third, etc. does not denote any order, and the words are to be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A method for detecting data quality, comprising:
acquiring a data set to be detected, wherein one data set to be detected comprises a plurality of data values to be detected with different data types, the data values to be detected with different data types correspond to different tag identifications, and the tag identifications contained in one data set to be detected form a tag identification group;
determining a target data index group and a target data group to be detected, which correspond to a target tag identification group, based on the target tag identification group;
and determining a detection result of the target to-be-detected data included in the target to-be-detected data set based on the target data index set and the tag identification corresponding to the target to-be-detected data included in the target to-be-detected data set.
2. The method of claim 1, wherein the determining, based on the target tag identification group, the target data index group and the target data to be detected group corresponding to the target tag identification group comprises:
selecting, based on the target tag identification group, target to-be-detected data groups whose tag identification group is the same as the target tag identification group from the to-be-detected data sets;
and looking up, based on the target tag identification group, a target data index group corresponding to the target tag identification group in a database.
3. The method according to claim 1, wherein the determining, based on the target data index set and the tag identifier corresponding to the target to-be-detected data included in the target to-be-detected data set, a detection result of the target to-be-detected data included in the target to-be-detected data set includes:
determining the corresponding relation between the target to-be-detected data set and the target data index set based on a tag identifier corresponding to the target to-be-detected data included in the target to-be-detected data set and the target tag identifier set;
and sequentially comparing the target to-be-detected data of the target to-be-detected data set with the target data indexes of the target data index set, and determining the detection result of the target to-be-detected data set based on the comparison result.
4. The method according to claim 3, wherein the target to-be-detected data set comprises a plurality of entries;
the sequentially comparing the target to-be-detected data of the target to-be-detected data set with the target data indexes of the target data index set, and determining the detection result of the target to-be-detected data set based on the comparison result comprises:
sequentially comparing, under the same tag identification, the target to-be-detected data of each target to-be-detected data set with the target data index of the target data index set;
and after the comparison for every tag identification of the target to-be-detected data is finished, determining the detection result of the target to-be-detected data set based on the comparison result.
5. The method according to claim 3, wherein the target to-be-detected data set comprises a plurality of entries;
the sequentially comparing the target to-be-detected data of the target to-be-detected data set with the target data indexes of the target data index set, and determining the detection result of the target to-be-detected data set based on the comparison result comprises:
sequentially comparing, in the data order of the target to-be-detected data sets, the target to-be-detected data of each target to-be-detected data set with the target data indexes of the target data index set;
and after the comparison of all tag identifications of each piece of target to-be-detected data is finished, determining the detection result of the target to-be-detected data set based on the comparison result.
6. The method according to claim 3, wherein the sequentially comparing the target to-be-detected data of the target to-be-detected data set with the target data indexes of the target data index set, and determining the detection result of the target to-be-detected data set based on the comparison result comprises:
in response to a task trigger operation, sequentially comparing the target to-be-detected data of the target to-be-detected data set with the target data indexes of the target data index set, and determining the detection result of the target to-be-detected data set based on the comparison result;
wherein the trigger operation comprises an automatic trigger operation and a timed trigger operation.
7. The method according to claim 1, wherein the method further comprises:
sending the detection result of the target to-be-detected data set to a target object; or
sending the detection result of the target to-be-detected data set to a target terminal.
8. A data quality detection apparatus, comprising:
an acquisition module, configured to acquire to-be-detected data sets, wherein one to-be-detected data set comprises a plurality of to-be-detected data values of different data types, to-be-detected data values of different data types correspond to different tag identifications, and the tag identifications included in one to-be-detected data set form a tag identification group;
the target data determining module is used for determining a target data index group and a target data group to be detected, which correspond to the tag identification group, based on the tag identification group;
the detection module is used for determining a detection result of the target to-be-detected data included in the target to-be-detected data set based on the target data index set and the tag identification corresponding to the target to-be-detected data included in the target to-be-detected data set.
9. A computer device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1-7.
CN202310330194.0A 2023-03-30 2023-03-30 Data quality detection method, device, equipment and storage medium Active CN116028481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310330194.0A CN116028481B (en) 2023-03-30 2023-03-30 Data quality detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310330194.0A CN116028481B (en) 2023-03-30 2023-03-30 Data quality detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116028481A CN116028481A (en) 2023-04-28
CN116028481B CN116028481B (en) 2023-06-27

Family

ID=86089780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310330194.0A Active CN116028481B (en) 2023-03-30 2023-03-30 Data quality detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116028481B (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411600A (en) * 2011-08-02 2012-04-11 暨南大学 Data quality automatic detection method based on implication rule
CN108073720A (en) * 2017-12-30 2018-05-25 广州明动软件股份有限公司 Data quality management system and method applied to big data system
CN108830554A * 2018-05-29 2018-11-16 农业部规划设计研究院 Intelligent detection method and system for the information quality of result data based on a task model
CN108898264A * 2018-04-26 2018-11-27 深圳大学 Method and device for calculating quality measurement indexes of overlapping community sets
CN109102329A * 2018-07-27 2018-12-28 索信市场咨询(北京)有限公司 Data acquisition, processing and analysis application method and device
CN109491990A * 2018-09-17 2019-03-19 武汉达梦数据库有限公司 Method and device for detecting data quality
CN109524070A (en) * 2018-11-12 2019-03-26 北京懿医云科技有限公司 Data processing method and device, electronic equipment, storage medium
CN109656812A (en) * 2018-11-19 2019-04-19 平安科技(深圳)有限公司 Data quality checking method, apparatus and storage medium
CN110309309A * 2019-07-03 2019-10-08 中国搜索信息科技股份有限公司 Method and system for assessing the quality of manually labeled data
CN111563074A (en) * 2020-04-28 2020-08-21 厦门市美亚柏科信息股份有限公司 Data quality detection method and system based on multi-dimensional label
CN112506901A (en) * 2020-11-30 2021-03-16 深圳微众信用科技股份有限公司 Data quality measuring method, device and medium
CN112650762A (en) * 2021-03-15 2021-04-13 腾讯科技(深圳)有限公司 Data quality monitoring method and device, electronic equipment and storage medium
US20210263900A1 (en) * 2020-02-26 2021-08-26 Ab Initio Technology Llc Generating rules for data processing values of data fields from semantic labels of the data fields
CN114860699A (en) * 2022-04-06 2022-08-05 深圳坐标软件集团有限公司 Data quality detection method, device, equipment and storage medium
CN115017969A (en) * 2022-04-21 2022-09-06 平安科技(深圳)有限公司 Data quality monitoring method and device for numerical label and electronic equipment
CN115146530A (en) * 2022-06-15 2022-10-04 蕴硕物联技术(上海)有限公司 Method, apparatus, medium, and program product for constructing welding quality detection model
CN115357572A (en) * 2022-08-30 2022-11-18 云南电网有限责任公司信息中心 Data quality inspection rule construction method, storage medium and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
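李安然 (Li Anran): "Efficient Quality Assessment of Large-Scale Datasets for Specific Tasks" (面向特定任务的大规模数据集质量高效评估), China Doctoral Dissertations Full-text Database, Information Science and Technology Series, no. 09, pages 138 - 21 *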

Also Published As

Publication number Publication date
CN116028481B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
CN113220657B (en) Data processing method and device and computer equipment
CN109933502B (en) Electronic device, user operation record processing method and storage medium
CN112181835A (en) Automatic testing method and device, computer equipment and storage medium
CN114428677B (en) Task processing method, processing device, electronic equipment and storage medium
CN116028481B (en) Data quality detection method, device, equipment and storage medium
CN112486841A (en) Method and device for checking data collected by buried point
CN116955856A (en) Information display method, device, electronic equipment and storage medium
CN114116811B (en) Log processing method, device, equipment and storage medium
CN115544558A (en) Sensitive information detection method and device, computer equipment and storage medium
CN111045983B (en) Nuclear power station electronic file management method, device, terminal equipment and medium
CN109977992B (en) Electronic device, method for identifying batch registration behaviors and storage medium
CN114329164A (en) Method, apparatus, device, medium and product for processing data
CN109885710B (en) User image depicting method based on differential evolution algorithm and server
CN113064984A (en) Intention recognition method and device, electronic equipment and readable storage medium
CN112835991B (en) System, method, device and storage medium for monitoring data
CN112417310B (en) Method for establishing intelligent service index and recommending intelligent service
CN117520610A (en) Data searching method, device, equipment and storage medium based on graph calculation
CN113536788B (en) Information processing method, device, storage medium and equipment
CN110851346B (en) Query statement boundary problem detection method, device, equipment and storage medium
CN112948328A (en) Retrieval method, device, equipment and medium of log data
CN117331786A (en) Service system alarm convergence method, device, computer equipment and storage medium
CN117093582A (en) Service test data checking method and device, electronic equipment and storage medium
CN117555619A (en) Data preprocessing method, device, equipment and medium
CN117278298A (en) Domain name detection method, device, equipment and storage medium based on artificial intelligence
CN115038089A (en) Multi-terminal data monitoring and collecting method based on information extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant