CN111209538A - Table data quality probing method and device - Google Patents

Table data quality probing method and device

Info

Publication number
CN111209538A
Authority
CN
China
Prior art keywords
field
check rule
preset
rule
check
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010004964.9A
Other languages
Chinese (zh)
Inventor
堵新政
张毅然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN202010004964.9A
Publication of CN111209538A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a table data quality probing method and device. The method comprises: acquiring check rules respectively corresponding to a plurality of preset attribute categories; establishing a mapping relation between each check rule and a preset field set; determining, based on the edit distances between a table field to be probed and a plurality of table fields in the preset field set, the target table field in the preset field set that corresponds to the table field to be probed; determining the check rule corresponding to the target table field according to the mapping relation between each check rule and the preset field set; and using the check rule corresponding to the target table field to probe the quality of the instance data of the table field to be probed. The method and device can complete quality probing of table data quickly and accurately and reduce its cost.

Description

Table data quality probing method and device
Technical Field
The application relates to the technical field of table data quality exploration, in particular to a table data quality exploration method and device.
Background
Data exploration aims to understand the characteristics and quality of data by querying and analyzing a data set. It is an important part of data quality work and a critical step in establishing data correctness. Data exploration can guide the subsequent ETL of the data, for example by indicating how many cleansing mechanisms are needed, the proportion of dirty data, the data volume, and the data structure.
For data exploration in structured database tables, the conventional approach is to perform statistical analysis by manually writing SQL queries. This approach is laborious, not intelligent enough, and inefficient; hand-written SQL queries are error-prone and require a large amount of manual input, which increases the cost of exploring table data quality.
Disclosure of Invention
In view of this, an object of the present application is to provide a method and an apparatus for investigating table data quality, which can achieve the effects of quickly and accurately completing table data quality investigation and saving table data quality investigation cost.
In a first aspect, an embodiment of the present application provides a method for investigating table data quality, including:
acquiring a plurality of check rules respectively corresponding to the preset attribute categories;
establishing a mapping relation between each type of check rule and a preset field set;
determining a target table field in the preset field set corresponding to the table field to be probed based on the editing distance between the table field to be probed and the plurality of table fields in the preset field set;
determining a check rule corresponding to the target table field according to the mapping relation between each check rule and a preset field set;
and utilizing a check rule corresponding to the target table field to carry out quality detection on the example data of the table field to be detected.
With reference to the first aspect, an embodiment of the present application provides a first possible implementation manner of the first aspect, where before obtaining the check rules corresponding to the multiple preset attribute categories, the method further includes:
and establishing a check rule base by using check rules respectively corresponding to a plurality of preset attribute categories.
With reference to the first possible implementation manner of the first aspect, an embodiment of the present application provides a second possible implementation manner of the first aspect, where after the building of the verification rule base, the method further includes:
receiving an editing instruction for the verification rule base, and editing the verification rule base based on the editing instruction;
wherein the editing instructions comprise: modification instructions for modifying the verification rules of the verification rule base and/or addition instructions for adding custom verification rules.
With reference to the first aspect, an embodiment of the present application provides a third possible implementation manner of the first aspect, where determining, based on an edit distance between a table field to be explored and a plurality of table fields in the preset field set, a target table field in the preset field set corresponding to the table field to be explored includes:
determining editing distances between the table fields to be probed and the plurality of table fields in the preset field set;
determining a minimum edit distance from a plurality of the edit distances;
if the minimum editing distance corresponds to one table field in the preset field set, determining the table field as a target table field;
and if the minimum editing distance corresponds to at least two table fields in the preset field set, randomly selecting one table field of the at least two table fields as a target table field.
With reference to the first aspect, an embodiment of the present application provides a fourth possible implementation manner of the first aspect, where determining, according to a mapping relationship between each type of the check rule and a preset field set, a check rule corresponding to a field of the target table includes:
recommending a check rule for the target table field according to the mapping relation between each check rule and a preset field set;
and determining the recommended check rule as the check rule corresponding to the target table field.
With reference to the first aspect, an embodiment of the present application provides a fifth possible implementation manner of the first aspect, where determining, according to a mapping relationship between each of the check rules and a preset field set, a check rule corresponding to a field of the target table includes:
recommending a check rule for the target table field according to the mapping relation between each check rule and a preset field set;
receiving an adjusting instruction for the recommended verification rule, and adjusting the recommended verification rule based on the adjusting instruction;
and determining the adjusted check rule as the check rule corresponding to the target table field.
With reference to the first aspect, an embodiment of the present application provides a sixth possible implementation manner of the first aspect, where performing quality inspection on example data of the table field to be inspected by using a check rule corresponding to the target table field includes:
utilizing a check rule corresponding to the target table field to perform quality detection on the example data of the table field to be detected one by one;
after all the table data are subjected to quality detection, at least one of the following quality detection results of the table data is obtained: the total amount of table data, the number of illegal fields, the proportion of illegal fields and the corresponding check rule of the table fields.
In a second aspect, an embodiment of the present application further provides a table data quality detection apparatus, including:
the rule obtaining module is used for obtaining a plurality of check rules respectively corresponding to the preset attribute categories;
the mapping establishing module is used for establishing the mapping relation between each check rule and a preset field set;
a first determining module, configured to determine, based on an edit distance between a table field to be explored and a plurality of table fields in the preset field set, a target table field in the preset field set corresponding to the table field to be explored;
the second determining module is used for determining the check rule corresponding to the target table field according to the mapping relation between each check rule and a preset field set;
and the quality probing module is used for probing the quality of the example data of the table field to be probed by using the check rule corresponding to the target table field.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, when the electronic device runs, the processor communicates with the storage medium through the bus, and the processor executes the machine-readable instructions to perform the steps of any one of the possible implementation manners of the first aspect.
In a fourth aspect, this application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of any one of the possible implementation manners in the first aspect.
According to the table data quality probing method and device provided by the embodiment of the application, the mapping relation between various check rules and the preset field set is established, the target table field is determined based on the editing distance between the table field to be probed and the table field in the preset field set, the check rule corresponding to the target table field is determined based on the mapping relation, and the quality probing is performed on the example data of the table field to be probed by using the check rule. Compared with the prior art that statistical analysis is carried out by manually writing the SQL query instruction, the method and the device can quickly and accurately complete the quality exploration of the table data, thereby improving the data exploration efficiency and the data exploration accuracy, and saving the exploration cost of the table data quality without manually writing the SQL query instruction for statistical analysis.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a flowchart illustrating a table data quality probing method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram illustrating a table data quality inspection apparatus according to an embodiment of the present application;
FIG. 3 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and steps without logical context may be performed in reverse order or simultaneously. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
In order to enable a person skilled in the art to use the present disclosure, the following embodiments are given in connection with the specific application scenario "personal information table data quality exploration". It will be apparent to those skilled in the art that the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the application. Although the present application is described primarily in the context of "personal information table data quality exploration", it should be understood that this is merely one exemplary embodiment.
It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
Referring to fig. 1, fig. 1 is a flowchart of a table data quality probing method according to an embodiment of the present disclosure. As shown in fig. 1, the method may include:
S101, obtaining check rules respectively corresponding to a plurality of preset attribute categories.
In one possible embodiment, before step S101, the method further comprises: establishing a check rule base from the check rules respectively corresponding to the plurality of preset attribute categories. Here, the preset attribute categories generally refer to commonly used attribute categories. The check rule base is roughly divided into two classes of rules: character-type attribute rules and numerical-type attribute rules. The character-type attribute categories include: binary attributes (e.g., gender, yes/no, etc.), enumerated attributes (province, country, marital status, etc.), ordinal attributes (professional title, satisfaction, etc.), identity card attributes, mobile phone number attributes, MAC address attributes, license plate number attributes, non-null attributes, etc. The numerical-type attribute categories include: interval attributes (e.g., temperature, score, age, etc.) and ratio attributes (percentages, etc.).
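The attribute categories above could be sketched as small rule factories. This is only an illustration, not the patent's implementation: the type alias `CheckRule`, the factory names, the key names in `RULE_BASE`, the allowed gender values, the age bounds, and both regular expressions (in particular the simplified 11-digit mobile number pattern) are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, Iterable, Optional

# Assumption: a check rule is modelled as a predicate over one raw string value.
CheckRule = Callable[[Optional[str]], bool]


def non_null_rule(value: Optional[str]) -> bool:
    """Non-null attribute: the value must contain a non-whitespace character."""
    return value is not None and value.strip() != ""


def enumerated_rule(allowed: Iterable[str]) -> CheckRule:
    """Binary or enumerated attribute: the value must be one of the allowed values."""
    allowed_set = frozenset(allowed)
    return lambda value: value in allowed_set


def pattern_rule(pattern: str) -> CheckRule:
    """Format-based attribute: the whole value must match the pattern."""
    compiled = re.compile(pattern)
    return lambda value: value is not None and compiled.fullmatch(value) is not None


def interval_rule(low: float, high: float) -> CheckRule:
    """Interval attribute: the value must parse as a number within [low, high]."""
    def check(value: Optional[str]) -> bool:
        try:
            return low <= float(value) <= high
        except (TypeError, ValueError):
            return False
    return check


# Hypothetical rule base; names, value sets and patterns are illustrative only.
RULE_BASE: Dict[str, CheckRule] = {
    "non_null": non_null_rule,
    "gender": enumerated_rule(["male", "female"]),
    "mac_address": pattern_rule(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}"),
    "mobile_number": pattern_rule(r"1\d{10}"),  # assumed simplified format
    "age": interval_rule(0, 150),
}
```

Modelling every rule as a predicate keeps the later probing step uniform: each value is simply passed to the rule chosen for its field.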
In a possible embodiment, after the building the check rule base, the method further includes: and receiving an editing instruction for the verification rule base, and editing the verification rule base based on the editing instruction. Wherein the editing instructions comprise: modification instructions for modifying the verification rules of the verification rule base and/or addition instructions for adding custom verification rules. In specific implementation, an open interface can be provided for a user, and the user is supported to develop a self-defined check rule or modify the existing check rule according to the service requirement of the user.
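A minimal sketch of how modification and addition instructions might be handled, assuming the rule base is an in-memory registry keyed by rule name. The class `RuleBase`, its methods `add`, `modify` and `get`, and the example rules (including the 6-digit postcode rule) are hypothetical and are not described in the application.

```python
from typing import Callable, Dict

# Assumption: a check rule is a predicate over one string value.
CheckRule = Callable[[str], bool]


class RuleBase:
    """Hypothetical editable check rule base."""

    def __init__(self) -> None:
        self._rules: Dict[str, CheckRule] = {}

    def add(self, name: str, rule: CheckRule) -> None:
        """Addition instruction: register a new (possibly custom) rule."""
        if name in self._rules:
            raise KeyError(f"rule {name!r} already exists; use modify()")
        self._rules[name] = rule

    def modify(self, name: str, rule: CheckRule) -> None:
        """Modification instruction: replace the logic of an existing rule."""
        if name not in self._rules:
            raise KeyError(f"rule {name!r} does not exist; use add()")
        self._rules[name] = rule

    def get(self, name: str) -> CheckRule:
        return self._rules[name]


rules = RuleBase()
rules.add("age", lambda v: v.isdecimal() and int(v) <= 150)
# A user narrows the built-in rule to fit a business requirement.
rules.modify("age", lambda v: v.isdecimal() and 18 <= int(v) <= 65)
# A user adds a custom rule that the base did not ship with.
rules.add("postcode", lambda v: len(v) == 6 and v.isdecimal())
```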
S102, establishing a mapping relation between each check rule and a preset field set.
In step S102, the mapping relation between a check rule and the preset field set may be represented as a pair <R, [F1, F2, …, Fi]>, where R is the name of the check rule and F1, F2, …, Fi are the common English names of the fields in the preset field set to which the rule applies.
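The pair representation can be sketched as a list of tuples, together with an inverted index from field name to rule name for later lookup. The field names are taken from the worked example later in the description; the English rule identifiers, the variable names and the helper `build_field_index` are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Each entry mirrors one pair <R, [F1, F2, ..., Fi]>.
RULE_MAPPINGS: List[Tuple[str, List[str]]] = [
    ("id_card_rule", ["sfz", "sfzh", "zjhm"]),
    ("age_rule", ["nl", "age", "nj"]),
    ("gender_rule", ["xb", "sex"]),
]


def build_field_index(mappings: List[Tuple[str, List[str]]]) -> Dict[str, str]:
    """Invert the pairs so that a preset field name leads to its rule name."""
    index: Dict[str, str] = {}
    for rule_name, fields in mappings:
        for field in fields:
            index[field] = rule_name
    return index


FIELD_TO_RULE = build_field_index(RULE_MAPPINGS)
```

The inverted index assumes each preset field name belongs to exactly one rule; if a name were listed under two rules, the later entry would silently win in this sketch.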
S103, determining a target table field in a preset field set corresponding to the table field to be probed based on the editing distance between the table field to be probed and a plurality of table fields in the preset field set.
In one possible embodiment, step S103 comprises: determining the edit distances between the table field to be probed and the plurality of table fields in the preset field set; determining the minimum edit distance among these edit distances; if the minimum edit distance corresponds to one table field in the preset field set, determining that table field as the target table field; and if the minimum edit distance corresponds to at least two table fields in the preset field set, randomly selecting one of the at least two table fields as the target table field. Specifically, an edit distance algorithm is used to calculate the edit distances D = {Lcf1, Lcf2, …, Lcfi} between the table field c to be probed and the common fields F = {f1, f2, …, fi} in the preset field set, where Lcfi denotes the edit distance between c and the common field fi; the smaller the edit distance, the more similar the fields. The minimum value Dmin of D is taken and the common field f corresponding to Dmin is found; if several fields share the same Dmin, one of them is selected at random as the target table field.
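The selection in step S103 can be sketched as follows. The description does not specify which edit distance is used, so this example assumes the standard Levenshtein distance (unit-cost insertion, deletion and substitution); the function names and the optional `rng` parameter are hypothetical.

```python
import random
from typing import Optional, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def select_target_field(
    candidate: str,
    preset_fields: Sequence[str],
    rng: Optional[random.Random] = None,
) -> str:
    """Return the preset field closest to `candidate`; ties are broken at random.

    Raises ValueError if `preset_fields` is empty.
    """
    rng = rng or random.Random()
    distances = [edit_distance(candidate, field) for field in preset_fields]
    d_min = min(distances)
    tied = [f for f, d in zip(preset_fields, distances) if d == d_min]
    return rng.choice(tied)
```

Passing a seeded `random.Random` makes the tie-break reproducible, which may matter when probing results need to be compared across runs.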
S104, determining the check rule corresponding to the target table field according to the mapping relation between each check rule and the preset field set.
In one possible embodiment, step S104 includes: recommending a check rule for the target table field according to the mapping relation between each check rule and a preset field set; and determining the recommended check rule as the check rule corresponding to the target table field. The embodiment can automatically recommend the corresponding check rule for the target table field.
In another possible embodiment, step S104 includes: recommending a check rule for the target table field according to the mapping relation between each check rule and a preset field set; receiving an adjusting instruction for the recommended verification rule, and adjusting the recommended verification rule based on the adjusting instruction; and determining the adjusted check rule as the check rule corresponding to the target table field. According to the embodiment, the user can adjust the recommended verification rule, and the requirements of the user can be met more flexibly. Optionally, the user may not use the recommended verification rule, and may select the target verification rule according to an operation instruction of the user.
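The two variants of step S104 can be sketched together, assuming that an adjustment is modelled simply as the user supplying a different rule name. The functions `recommend_rule` and `determine_rule` and the `user_choice` parameter are hypothetical; the application does not define how an adjusting instruction is encoded.

```python
from typing import Dict, Optional


def recommend_rule(target_field: str, field_to_rule: Dict[str, str]) -> str:
    """Look up the rule mapped to the target table field."""
    return field_to_rule[target_field]


def determine_rule(
    target_field: str,
    field_to_rule: Dict[str, str],
    user_choice: Optional[str] = None,
) -> str:
    """Use the recommended rule unless the user has chosen another one."""
    recommended = recommend_rule(target_field, field_to_rule)
    return recommended if user_choice is None else user_choice
```

For example, `determine_rule("nl", {"nl": "age_rule"})` yields the recommended rule name, while passing `user_choice="custom_age_rule"` yields the user's choice instead.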
S105, probing the quality of the instance data of the table field to be probed by using the check rule corresponding to the target table field. If the instance data satisfies the check logic, the check passes and the instance data is legal data; otherwise, the instance data is illegal data.
In one possible embodiment, step S105 comprises: utilizing a check rule corresponding to the target table field to perform quality detection on the example data of the table field to be detected one by one; after all the table data are subjected to quality detection, at least one of the following quality detection results of the table data is obtained: the total amount of table data, the number of illegal fields, the proportion of illegal fields and the corresponding check rule of the table fields.
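A minimal sketch of the probing pass and its summary, assuming rows are dictionaries keyed by column name and each column has already been assigned one rule. The `ProbeResult` dataclass, its field names and the `probe` function are assumptions; the proportion is computed here over all checked values, which matches the 5/15 figure in the worked example but is not stated as a general definition in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProbeResult:
    total_records: int    # number of rows scanned
    illegal_values: int   # number of field values that failed their rule
    illegal_ratio: float  # illegal values divided by all checked values
    rules_loaded: int     # number of rules applied


def probe(
    rows: List[Dict[str, str]],
    rules: Dict[str, Callable[[str], bool]],
) -> ProbeResult:
    """Check every value of every ruled column, one by one, then summarise."""
    illegal = 0
    checked = 0
    for row in rows:
        for column, rule in rules.items():
            checked += 1
            # A missing column is treated as an empty value (an assumption).
            if not rule(row.get(column, "")):
                illegal += 1
    ratio = illegal / checked if checked else 0.0
    return ProbeResult(len(rows), illegal, ratio, len(rules))
```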
The table data quality probing method is described in detail below with specific embodiments.
1) Assuming a person information table A, the table fields and data are shown in Table-A:
[Table-A: image of the personal information table, not reproduced here]
TABLE-A
2) Obtaining a rule base R = { <identity card rule, [sfz, sfzh, zjhm]>, <age rule, [nl, age, nj]>, <gender rule, [xb, sex]>, <mobile phone number rule, [sjhm, sjh, sj, dhhm]>, <province rule, [province, sf, shengfen, szsf]> };
3) establishing a mapping relation between each check rule in the rule base R and the common field set, wherein the identity card rule corresponds to the field "identity card number", the age rule corresponds to the field "age", the gender rule corresponds to the field "gender", the mobile phone number rule corresponds to the field "mobile phone number", and the province rule corresponds to the field "province";
4) calculating the similarity, namely the edit distance, between the common field sets [sfz, sfzh, zjhm], [nl, age, nj], [xb, sex], [sjhm, sjh, sj, dhhm], [province, sf, shengfen, szsf] and the table fields identity card number (sfzh), age (nl), gender (xb), mobile phone number (sjh) and province (szsf), to determine the target table fields;
5) according to the calculation result, recommending the identity card rule, the age rule, the gender rule, the mobile phone number rule and the province rule for the identity card number, age, gender, mobile phone number and province fields respectively;
6) scanning all the data and checking it against the rules, which finds 2 illegal identity card numbers, 1 illegal age, 0 illegal genders, 1 illegal mobile phone number and 1 illegal province;
7) the statistical result of probing table A is: 3 records in total, 5 illegal field values, an illegal proportion of 33.3% (5/15), 5 rules loaded, and other analysis indexes.
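The figures in the example can be checked with a few lines of arithmetic: 3 records with 5 checked fields each give 15 values, of which 2 + 1 + 0 + 1 + 1 = 5 are illegal. The variable names below are illustrative; the per-field counts are the ones stated in the example.

```python
# Illegal counts per field as reported in the example.
illegal_per_field = {"sfzh": 2, "nl": 1, "xb": 0, "sjh": 1, "szsf": 1}
total_records = 3

total_values = total_records * len(illegal_per_field)  # 3 * 5 = 15
illegal_values = sum(illegal_per_field.values())       # 5
ratio = illegal_values / total_values

print(f"{illegal_values}/{total_values} = {ratio:.1%}")  # prints "5/15 = 33.3%"
```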
According to the table data quality probing method provided by the embodiment of the application, the mapping relation between various check rules and a preset field set is established, a target table field is determined based on the editing distance between a table field to be probed and the table field in the preset field set, the check rule corresponding to the target table field is determined based on the mapping relation, and quality probing is performed on example data of the table field to be probed by using the check rule. Compared with the prior art that statistical analysis is carried out by manually writing the SQL query instruction, the method and the device can quickly and accurately complete the quality exploration of the table data, thereby improving the data exploration efficiency and the data exploration accuracy, and saving the exploration cost of the table data quality without manually writing the SQL query instruction for statistical analysis.
Based on the same technical concept, embodiments of the present application further provide a table data quality detection apparatus, an electronic device, a computer-readable storage medium, and the like, which can be seen in the following embodiments.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a table data quality detecting apparatus according to an embodiment of the present application. As shown in fig. 2, the table data quality inspection apparatus includes:
A rule obtaining module 210, configured to obtain check rules respectively corresponding to a plurality of preset attribute categories.
A mapping establishing module 220, configured to establish a mapping relationship between each of the check rules and a preset field set.
A first determining module 230, configured to determine a target table field in the preset field set corresponding to a table field to be explored, based on an edit distance between the table field to be explored and a plurality of table fields in the preset field set.
A second determining module 240, configured to determine, according to the mapping relationship between each of the check rules and the preset field set, the check rule corresponding to the target table field.
A quality probing module 250, configured to perform quality probing on the instance data of the table field to be probed according to the check rule corresponding to the target table field.
In one possible embodiment, the table data quality detecting device further includes:
the rule building module 260 is configured to build a verification rule base by using the verification rules respectively corresponding to the plurality of preset attribute categories.
In one possible embodiment, the table data quality detecting device further includes:
a rule editing module 270, configured to receive an editing instruction for the verification rule base, and edit the verification rule base based on the editing instruction;
wherein the editing instructions comprise: modification instructions for modifying the verification rules of the verification rule base and/or addition instructions for adding custom verification rules.
In a possible embodiment, the first determining module 230 is specifically configured to: determining editing distances between the table fields to be probed and the plurality of table fields in the preset field set; determining a minimum edit distance from a plurality of the edit distances; when the minimum editing distance corresponds to one table field in the preset field set, determining the table field as a target table field; and when the minimum editing distance corresponds to at least two table fields in the preset field set, randomly selecting one table field of the at least two table fields as a target table field.
In a possible embodiment, the second determining module 240 is specifically configured to: recommending a check rule for the target table field according to the mapping relation between each check rule and a preset field set; and determining the recommended check rule as the check rule corresponding to the target table field.
In a possible embodiment, the second determining module 240 is specifically configured to: recommending a check rule for the target table field according to the mapping relation between each check rule and a preset field set; receiving an adjusting instruction for the recommended verification rule, and adjusting the recommended verification rule based on the adjusting instruction; and determining the adjusted check rule as the check rule corresponding to the target table field.
In one possible embodiment, the quality probing module 250 is specifically configured to: utilizing a check rule corresponding to the target table field to perform quality detection on the example data of the table field to be detected one by one; after all the table data are subjected to quality detection, at least one of the following quality detection results of the table data is obtained: the total amount of table data, the number of illegal fields, the proportion of illegal fields and the corresponding check rule of the table fields.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 3, includes: the processor 301, the storage medium 302, and the bus 303, where the storage medium 302 stores machine-readable instructions executable by the processor 301, when the electronic device runs, the processor 301 and the storage medium 302 communicate via the bus 303, and the processor 301 executes the machine-readable instructions to perform the method described in the foregoing method embodiment.
The computer program product of the table data quality probing method provided in the embodiment of the present application includes a computer-readable storage medium storing a nonvolatile program code executable by a processor, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, and is not described herein again.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium as program code executable by a processor. Based on this understanding, the technical solution of the present application, or the portion of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope of the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A table data quality probing method, comprising:
obtaining check rules respectively corresponding to a plurality of preset attribute categories;
establishing a mapping relation between each check rule and a preset field set;
determining, based on the edit distance between a table field to be probed and a plurality of table fields in the preset field set, a target table field in the preset field set corresponding to the table field to be probed;
determining a check rule corresponding to the target table field according to the mapping relation between each check rule and the preset field set;
and performing quality probing on the instance data of the table field to be probed by using the check rule corresponding to the target table field.
2. The method according to claim 1, further comprising, before obtaining the check rules respectively corresponding to the plurality of preset attribute categories:
establishing a check rule base by using the check rules respectively corresponding to the plurality of preset attribute categories.
3. The method according to claim 2, further comprising, after establishing the check rule base:
receiving an editing instruction for the check rule base, and editing the check rule base based on the editing instruction;
wherein the editing instruction comprises: a modification instruction for modifying a check rule in the check rule base and/or an addition instruction for adding a custom check rule.
4. The method according to claim 1, wherein determining, based on the edit distance between the table field to be probed and the plurality of table fields in the preset field set, the target table field in the preset field set corresponding to the table field to be probed comprises:
determining the edit distances between the table field to be probed and the plurality of table fields in the preset field set;
determining a minimum edit distance among the edit distances;
if the minimum edit distance corresponds to one table field in the preset field set, determining that table field as the target table field;
and if the minimum edit distance corresponds to at least two table fields in the preset field set, randomly selecting one of the at least two table fields as the target table field.
5. The method according to claim 1, wherein determining the check rule corresponding to the target table field according to the mapping relation between each check rule and the preset field set comprises:
recommending a check rule for the target table field according to the mapping relation between each check rule and the preset field set;
and determining the recommended check rule as the check rule corresponding to the target table field.
6. The method according to claim 1, wherein determining the check rule corresponding to the target table field according to the mapping relation between each check rule and the preset field set comprises:
recommending a check rule for the target table field according to the mapping relation between each check rule and the preset field set;
receiving an adjusting instruction for the recommended check rule, and adjusting the recommended check rule based on the adjusting instruction;
and determining the adjusted check rule as the check rule corresponding to the target table field.
7. The method according to claim 1, wherein performing quality probing on the instance data of the table field to be probed by using the check rule corresponding to the target table field comprises:
performing quality probing on the instance data of the table field to be probed item by item by using the check rule corresponding to the target table field;
and after quality probing has been performed on all the table data, obtaining at least one of the following quality probing results for the table data: the total amount of table data, the number of invalid fields, the proportion of invalid fields, and the check rule corresponding to the table field.
8. A table data quality probing apparatus, comprising:
a rule obtaining module, configured to obtain check rules respectively corresponding to a plurality of preset attribute categories;
a mapping establishing module, configured to establish a mapping relation between each check rule and a preset field set;
a first determining module, configured to determine, based on the edit distance between a table field to be probed and a plurality of table fields in the preset field set, a target table field in the preset field set corresponding to the table field to be probed;
a second determining module, configured to determine a check rule corresponding to the target table field according to the mapping relation between each check rule and the preset field set;
and a quality probing module, configured to perform quality probing on the instance data of the table field to be probed by using the check rule corresponding to the target table field.
9. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
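
For illustration only, the matching and checking steps recited in claims 1, 4 and 7 can be sketched in Python as follows. This is a minimal sketch and not the applicant's implementation: the Levenshtein distance is used as the edit distance, and the rule names, regular expressions, field names and the function names `edit_distance`, `match_target_field` and `probe_field` are all hypothetical. For brevity the mapping is stored inverted, from each preset field to the name of its check rule.

```python
import random
import re
from typing import Callable, Dict, List

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

def match_target_field(field: str, preset_fields: List[str]) -> str:
    """Pick the preset field with the minimum edit distance (claim 4).
    If several preset fields share the minimum, choose one at random."""
    distances = {p: edit_distance(field, p) for p in preset_fields}
    smallest = min(distances.values())
    candidates = [p for p, d in distances.items() if d == smallest]
    return candidates[0] if len(candidates) == 1 else random.choice(candidates)

# Hypothetical check rules, one per assumed attribute category.
CHECK_RULES: Dict[str, Callable[[str], bool]] = {
    "phone": lambda v: re.fullmatch(r"\d{11}", v) is not None,
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}

# Hypothetical mapping, stored as preset field -> check rule name.
FIELD_TO_RULE: Dict[str, str] = {
    "phone_number": "phone",
    "mobile": "phone",
    "email_address": "email",
}

def probe_field(field: str, values: List[str]) -> dict:
    """Check every instance value of a field and summarise the result (claim 7)."""
    target = match_target_field(field, list(FIELD_TO_RULE))
    rule_name = FIELD_TO_RULE[target]
    rule = CHECK_RULES[rule_name]
    invalid = sum(1 for v in values if not rule(v))
    total = len(values)
    return {
        "target_field": target,
        "rule": rule_name,
        "total": total,
        "invalid": invalid,
        "invalid_ratio": invalid / total if total else 0.0,
    }

report = probe_field("phone_num", ["13800000000", "12345", "abc"])
```

Here the unseen field name `phone_num` is closest to the preset field `phone_number`, so the hypothetical `phone` rule is applied and the report counts two invalid values out of three. A production system would presumably let the user adjust the recommended rule before the check runs, as claim 6 describes.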
CN202010004964.9A 2020-01-03 2020-01-03 Table data quality probing method and device Pending CN111209538A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010004964.9A CN111209538A (en) 2020-01-03 2020-01-03 Table data quality probing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010004964.9A CN111209538A (en) 2020-01-03 2020-01-03 Table data quality probing method and device

Publications (1)

Publication Number Publication Date
CN111209538A true CN111209538A (en) 2020-05-29

Family

ID=70785781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010004964.9A Pending CN111209538A (en) 2020-01-03 2020-01-03 Table data quality probing method and device

Country Status (1)

Country Link
CN (1) CN111209538A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813837A (en) * 2020-09-11 2020-10-23 成都寻道科技有限公司 Method for intelligently detecting data quality
CN112182116A (en) * 2020-09-17 2021-01-05 支付宝(杭州)信息技术有限公司 Data probing method and device
CN112256689A (en) * 2020-11-26 2021-01-22 杭州数梦工场科技有限公司 Service data cleaning method and device and electronic equipment
CN112699103A (en) * 2020-12-04 2021-04-23 国泰新点软件股份有限公司 Data rule probing method and device based on data pre-analysis
CN112749164A (en) * 2020-12-30 2021-05-04 北京知因智慧科技有限公司 Data quality analysis method and device and electronic equipment
CN113722333A (en) * 2021-09-10 2021-11-30 拉卡拉支付股份有限公司 Data checking method, device, electronic equipment, storage medium and program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140282830A1 (en) * 2013-03-15 2014-09-18 International Business Machines Corporation Firewall Packet Filtering
CN108647358A (en) * 2018-05-17 2018-10-12 东软集团股份有限公司 Quality of data method of calibration, device, storage medium and electronic equipment
CN110287383A (en) * 2019-06-28 2019-09-27 深圳前海微众银行股份有限公司 A kind of field information method of inspection and device
CN110427375A (en) * 2019-07-29 2019-11-08 北京明略软件系统有限公司 The recognition methods of field classification and device

Similar Documents

Publication Publication Date Title
CN111209538A (en) Table data quality probing method and device
CN106708909B (en) Data quality detection method and device
CN110008193B (en) Data standardization method and device
CN109885597B (en) User grouping processing method and device based on machine learning and electronic terminal
CN110427375B (en) Method and device for identifying field type
CN104756113A (en) Method, apparatus and computer program for detecting deviations in data sources
CN111552690A (en) Data generation method, device, terminal and storage medium
CN111309586A (en) Command testing method, device and storage medium thereof
CN116452329A (en) Abnormal behavior monitoring method and device, electronic equipment and storage medium
CN110795464B (en) Method, device, terminal and storage medium for checking field of object marker data
CN110413596A (en) Field processing method and processing device, storage medium, electronic device
CN109144999B (en) Data positioning method, device, storage medium and program product
CN111222923A (en) Method and device for judging potential customer, electronic equipment and storage medium
CN116228374A (en) Logistics industry market single data early warning method, device, equipment and storage medium
CN115809228A (en) Data comparison method and device, storage medium and electronic equipment
CN109597828A (en) A kind of off-line data checking method, device and server
CN112001792B (en) Configuration information consistency detection method and device
CN113434653A (en) Method, device and equipment for processing query statement and storage medium
CN115422180A (en) Data verification method and system
CN113986762A (en) Test case generation method and device
CN113469696A (en) User abnormality degree evaluation method and device and computer readable storage medium
CN112860722A (en) Data checking method and device, electronic equipment and readable storage medium
US20220261666A1 (en) Leveraging big data, statistical computation and artificial intelligence to determine a likelihood of object renunciation prior to a resource event
CN117349358B (en) Data matching and merging method and system based on distributed graph processing framework
CN117194500A (en) Data index verification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination