CN106708909A - Data quality detection method and apparatus - Google Patents

Data quality detection method and apparatus

Info

Publication number
CN106708909A
Authority
CN
China
Prior art keywords
field
data
testing result
value
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510796894.4A
Other languages
Chinese (zh)
Other versions
CN106708909B (en)
Inventor
曲丹鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201510796894.4A
Publication of CN106708909A
Application granted
Publication of CN106708909B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Abstract

The invention provides a data quality detection method and apparatus. The method comprises the following steps: obtaining metadata of to-be-detected data, wherein the metadata comprises N fields of the to-be-detected data and the data type of each field, and N is a positive integer; searching a pre-established detection rule library, according to a preset matching policy and the data type of each field, for at least one detection rule corresponding to each field; and performing quality detection on each field according to the at least one detection rule corresponding to that field, thereby obtaining and displaying detection results for the N fields. The method can effectively improve the coverage and universality of detection, improve detection efficiency, and reduce manpower and time costs.

Description

Method and apparatus for detecting data quality
Technical field
The present application relates to the technical field of data processing, and in particular to a method and an apparatus for detecting data quality.
Background
With the arrival of the data age, data is becoming massive and diverse. Data anomalies may cause serious losses or accidents, so detecting data quality is a very important step in the data development process.
At present, detecting data quality usually requires testers to construct comprehensive test cases for each field according to the business logic. That is, testers write the code and execution logic of each test case and then judge from the execution results whether an anomaly exists and whether the business requirements are met. In addition, once manual testing is complete, if abnormal data has to be repaired, manual regression testing is needed after the repair, so test cases are written and executed repeatedly. Because this approach relies on manual work, abnormal data may be missed, so test coverage is low. And because test cases must be rewritten and rerun for different fields and for repaired data, efficiency is low and cost is high.
Summary of the invention
The present application aims to solve the above technical problems at least to some extent.
To this end, a first object of the present application is to propose a method for detecting data quality.
A second object of the present application is to propose an apparatus for detecting data quality.
To achieve the above objects, an embodiment of the first aspect of the present application proposes a method for detecting data quality, comprising the following steps: obtaining metadata of data to be detected, where the metadata includes N fields of the data to be detected and the data type of each field, and N is a positive integer; searching a pre-established detection rule library, according to a preset matching policy and the data type of each field, for at least one detection rule corresponding to each field; and performing quality detection on each field according to the at least one detection rule corresponding to that field, so as to obtain and display detection results for the N fields.
The method for detecting data quality of the embodiments of the present application can search the detection rule library for the detection rules corresponding to each field, according to the preset matching policy and the data types of the fields in the metadata of the data to be detected, and perform quality detection on each field of the data to be detected to obtain quality detection results. The detection rules in the library are applicable to different databases and fields and have wide coverage, and both rule matching and quality detection can be completed automatically. Therefore, compared with the traditional approach of writing test cases by hand, the method can effectively improve the coverage and universality of detection, improve detection efficiency, and reduce labor and time costs.
An embodiment of the second aspect of the present application provides an apparatus for detecting data quality, comprising: an acquisition module, configured to obtain metadata of data to be detected, where the metadata includes N fields of the data to be detected and the data type of each field, and N is a positive integer; a search module, configured to search a pre-established detection rule library, according to a preset matching policy and the data type of each field, for at least one detection rule corresponding to each field; and a detection module, configured to perform quality detection on each field according to the at least one detection rule corresponding to that field, so as to obtain and display detection results for the N fields.
The apparatus for detecting data quality of the embodiments of the present application can search the detection rule library for the detection rules corresponding to each field, according to the preset matching policy and the data types of the fields in the metadata of the data to be detected, and perform quality detection on each field of the data to be detected to obtain quality detection results. The detection rules in the library are applicable to different databases and fields and have wide coverage, and both rule matching and quality detection can be completed automatically. Therefore, compared with the traditional approach of writing test cases by hand, the apparatus can effectively improve the coverage and universality of detection, improve detection efficiency, and reduce labor and time costs.
Additional aspects and advantages of the present application will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the present application.
Brief description of the drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a method for detecting data quality according to one embodiment of the present application;
Fig. 2 is a flowchart of a method for detecting data quality according to one specific embodiment of the present application;
Fig. 3 is a flowchart of a method for detecting data quality according to another specific embodiment of the present application;
Fig. 4a is a schematic diagram of a data quality detection method according to one embodiment of the present application;
Fig. 4b is the gender enumeration distribution chart of the detection results in the embodiment shown in Fig. 4a;
Fig. 4c is the age interval distribution chart of the detection results in the embodiment shown in Fig. 4a;
Fig. 5 is a schematic structural diagram of an apparatus for detecting data quality according to one embodiment of the present application.
Detailed description
Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary. They are intended only to explain the present application and should not be construed as limiting it.
The method and apparatus for detecting data quality according to the embodiments of the present application are described below with reference to the drawings.
Fig. 1 is a flowchart of a method for detecting data quality according to one embodiment of the present application.
As shown in Fig. 1, the method for detecting data quality according to the embodiment of the present application comprises the following steps.
S101: obtain metadata of the data to be detected, where the metadata includes N fields of the data to be detected and the data type of each field, and N is a positive integer.
Specifically, a table name entered by a user can be received, and the data table corresponding to that table name is used as the data to be detected.
Further, a partition selected by the user for detection can also be received, and the partition table corresponding to that partition is used as the data to be detected. A partition table is a subset of a data table: a large data table can be divided into multiple partitions, and each partition is one partition table.
Metadata is data that describes data (data about data). It mainly describes data properties and supports functions such as indicating storage location, historical data, resource lookup, and file records. In other words, the metadata of the data to be detected contains information such as the fields of the data to be detected and their field types, key values, and creation time.
A field is the basic storage unit of the data to be detected. For example, a field can be one column of the data to be detected, in which case the value of N depends on the total number of columns of the data to be detected.
Taking the syntax of a distributed cluster as an example, the data type of a field may be Boolean, Double (double-precision floating point), Bigint (integer), Decimal (exact numeric), String (character string), Datetime (date), and so on. For example, the data type of the field corresponding to gender can be Boolean, the data type of the field corresponding to age can be Bigint, and the data type of the field corresponding to login time can be Datetime.
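The metadata obtained in S101 can be pictured as a small record per table. The following is a minimal sketch only: the class names `FieldMeta` and `TableMeta`, the in-memory `_CATALOG`, and the field types assigned to `nick`, `score`, and `userid` are assumptions made for illustration, not part of the patent. A real system would read this information from the metadata service of the data platform.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FieldMeta:
    name: str
    data_type: str  # lowercase type label, e.g. "bigint", "string", "datetime"


@dataclass(frozen=True)
class TableMeta:
    table: str
    partition: Optional[str]          # None means the whole table is detected
    fields: Tuple[FieldMeta, ...]     # the N fields of the data to be detected
    primary_key: Optional[str] = None


# Hypothetical in-memory catalog standing in for a real metadata store.
_CATALOG = {
    "a_user": {
        "fields": [
            ("userid", "string"),
            ("nick", "string"),
            ("age", "bigint"),
            ("score", "double"),
            ("logintime", "datetime"),
        ],
        "primary_key": "userid",
    }
}


def get_metadata(table: str, partition: Optional[str] = None) -> TableMeta:
    """Return the fields and data types for a table or one of its partitions."""
    entry = _CATALOG[table]
    fields = tuple(FieldMeta(name, dtype) for name, dtype in entry["fields"])
    return TableMeta(table, partition, fields, entry["primary_key"])


meta = get_metadata("a_user", "pt=20150801")
n = len(meta.fields)  # N, the number of fields to detect
```

Keeping the partition as part of the metadata record lets the later steps restrict every computation to the selected partition without changing the rule definitions.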
S102: search the pre-established detection rule library, according to the preset matching policy and the data type of each field, for at least one detection rule corresponding to each field.
In one embodiment of the present application, multiple detection rules can be extracted from the test cases corresponding to each of the different data types, and the detection rule library is built from the extracted rules. Specifically, the core computation logic of commonly used data quality test cases can be abstracted in advance into rule templates, and the general expected value of each test case is used as the expected value of the corresponding rule template, so as to build the detection rule library. The detection rule library includes multiple detection rules, and each detection rule includes a rule template and an expected value.
It should be understood that some detection rules of the detection class have no expected value, so their expected value can be a default value, for example N/A. For instance, for the age field in a data table there is no particular requirement on the duplicate count, so the expected value of the duplicate-count detection rule can be set to N/A.
The preset matching policy is obtained in advance by statistical analysis of commonly used data quality test cases. Specifically, for each detection rule, it can be determined which data types need that rule during quality detection, and labels for those data types are added to the rule.
Thus, after the data type of a field is obtained, all detection rules carrying the label of that data type can be found in the detection rule library.
For example, for a field of Boolean type, such as a field representing gender, the corresponding detection rules may include: null-value count (expected = 0), valid-value count (expected > 0), null-value rate (expected = 0%), and so on.
For a field of a numeric type such as Double, Bigint, or Decimal, such as a field representing age, the corresponding detection rules may include: null-value count (expected = 0), valid-value count (expected > 0), null-value rate (expected = 0%), maximum (expected > 0), minimum (expected < 0), average, percentile (expected: the preset quantile), negative-value count (expected = 0), null-value detection, and so on.
For a field of String type, such as a field representing a user ID, the corresponding detection rules may include: null-value count (expected = 0), valid-value count (expected > 0), null-value rate (expected = 0%), enumeration value count, maximum field length, minimum field length, and so on.
For a field of Datetime type, such as a field representing user login time, the corresponding detection rules may include: null-value count (expected = 0), valid-value count (expected > 0), null-value rate (expected = 0%), and the count of values that satisfy the preset date format (such as the yyMMdd format).
It should be understood that the expected value of a detection rule can be the default value in the detection rule library, or it can be set by the tester as needed.
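The label-based lookup described for S102 can be sketched as follows. This is an illustrative sketch rather than the patented implementation: the `Rule` class, the SQL-style template strings, the `{col}` placeholder, and the function name `find_rules` are all assumptions, and only a handful of the rules listed above are included.

```python
from dataclasses import dataclass

ALL_TYPES = frozenset({"boolean", "double", "bigint", "decimal", "string", "datetime"})
NUMERIC = frozenset({"double", "bigint", "decimal"})
STRING = frozenset({"string"})


@dataclass(frozen=True)
class Rule:
    name: str
    template: str           # rule template; {col} is replaced by the field name
    expected: str           # expected value, "N/A" when the rule has none
    type_labels: frozenset  # data types that need this rule (the matching policy)


# A small, assumed subset of a detection rule library.
RULE_LIBRARY = [
    Rule("null_count", "SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END)", "=0", ALL_TYPES),
    Rule("valid_count", "COUNT({col})", ">0", ALL_TYPES),
    Rule("max", "MAX({col})", ">0", NUMERIC),
    Rule("negative_count", "SUM(CASE WHEN {col} < 0 THEN 1 ELSE 0 END)", "=0", NUMERIC),
    Rule("distinct_count", "COUNT(DISTINCT {col})", "N/A", STRING),
    Rule("max_length", "MAX(LENGTH({col}))", "N/A", STRING),
]


def find_rules(data_type: str) -> list:
    """Return every rule whose labels include the given data type."""
    label = data_type.lower()
    return [rule for rule in RULE_LIBRARY if label in rule.type_labels]


bigint_rules = [rule.name for rule in find_rules("Bigint")]
string_rules = [rule.name for rule in find_rules("String")]
```

Storing the type labels on each rule, rather than a list of rules per type, means a new rule is registered once and is picked up by every data type it is labelled with.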
S103: perform quality detection on each field according to the at least one detection rule corresponding to that field, so as to obtain and display detection results for the N fields.
Specifically, for each field, the detected value of the field under each of its detection rules can be computed, and each detected value is compared with the expected value of the corresponding rule. If every detected value meets its expected value, the field has no anomaly; otherwise the field is abnormal. If no field in the data to be detected is abnormal, the data to be detected has no anomaly; otherwise the data to be detected is abnormal.
For example, the data to be detected is abnormal if any of the following problems exists: the primary key contains duplicates and is not unique; a core field contains null values or its null-value rate exceeds the preset range; a metric field (such as an amount or age field) contains negative values; the arithmetic relationship between fields is abnormal (for example, page views < unique visitors); a field length exceeds business expectations (for example, an ID card number longer than 18 characters); an enumerable field (such as gender) contains abnormal enumeration values or the distribution of its enumeration values is abnormal; or a field format does not match the preset format.
In one embodiment of the present application, when an anomaly is detected, the corresponding field and detected value can be recorded and provided to the tester. The detection result of each field can be displayed in various forms such as tables and distribution charts, and abnormal fields and their problems can be highlighted or otherwise specially marked, so that the tester can find and handle anomalies promptly.
It should be understood that after the tester has handled the abnormal data, S101 to S103 can be repeated automatically to detect the processed data again.
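The comparison of detected values with expected values in S103 can be sketched as below. The rule names, the `(operator, target)` encoding of expected values, and the use of `None` to stand for N/A are assumptions chosen for this example. The sample ages are invented and are not the data behind Table 1.

```python
import operator

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt,
       ">=": operator.ge, "<=": operator.le}


def meets(detected, expected):
    """expected is None (N/A, nothing to compare) or an (operator, target) pair."""
    if expected is None:
        return True
    op, target = expected
    return OPS[op](detected, target)


# Assumed rule table: name -> (function computing the detected value, expected value).
RULES = {
    "null_count": (lambda vals: sum(v is None for v in vals), ("=", 0)),
    "valid_count": (lambda vals: sum(v is not None for v in vals), (">", 0)),
    "negative_count": (lambda vals: sum(v is not None and v < 0 for v in vals), ("=", 0)),
    "distinct_count": (lambda vals: len({v for v in vals if v is not None}), None),
}


def check_field(values, rule_names):
    """Compute each detected value and record whether it meets the expectation."""
    results = {}
    for name in rule_names:
        compute, expected = RULES[name]
        detected = compute(values)
        results[name] = (detected, meets(detected, expected))
    return results


def field_is_abnormal(results):
    """A field is abnormal as soon as one detected value misses its expectation."""
    return not all(ok for _, ok in results.values())


ages = [10, -1, 5]  # invented sample values
age_results = check_field(ages, ["null_count", "valid_count", "negative_count"])
age_abnormal = field_is_abnormal(age_results)
```

Because each result keeps the detected value alongside the pass or fail flag, the same structure can feed a display that highlights abnormal values for the tester.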
The method for detecting data quality of the embodiments of the present application can search the detection rule library for the detection rules corresponding to each field, according to the preset matching policy and the data types of the fields in the metadata of the data to be detected, and perform quality detection on each field of the data to be detected to obtain quality detection results. The detection rules in the library are applicable to different databases and fields and have wide coverage, and both rule matching and quality detection can be completed automatically. Therefore, compared with the traditional approach of writing test cases by hand, the method can effectively improve the coverage and universality of detection, improve detection efficiency, and reduce labor and time costs.
In another embodiment of the present application, to further improve detection efficiency, the detection rules can be merged when quality detection is performed on the data to be detected, and the fields are detected in parallel according to the merged rules. Specifically, Fig. 2 is a flowchart of a method for detecting data quality according to one specific embodiment of the present application. As shown in Fig. 2, the method for detecting data quality of the embodiment of the present application comprises the following steps:
S201: obtain metadata of the data to be detected, where the metadata includes N fields of the data to be detected and the data type of each field, and N is a positive integer.
S202: search the pre-established detection rule library, according to the preset matching policy and the data type of each field, for at least one detection rule corresponding to each field.
S201 and S202 are the same as S101 and S102 and are not described again here.
S203: merge the rule templates of the multiple detection rules corresponding to each field to obtain a merged rule template.
It should be understood that the present application does not limit how rule templates are merged. In the embodiments of the present application, the multiple detection rule templates corresponding to one field can be merged into one merged rule template, or the detection rules corresponding to all fields in the data to be detected can be merged into one merged rule template.
Specifically, when the detection rules corresponding to all fields are merged, the fields need to be distinguished. For example, if the detection rule templates corresponding to column1 are rule 1(), rule 2(), and rule 3(), and the detection rule templates corresponding to column2 are rule 1(), rule 3(), rule 4(), and rule 5(), then column1 and column2 need to be distinguished when the templates are merged, and the resulting merged rule template is: rule 1(column1), rule 2(column1), rule 3(column1), rule 1(column2), rule 3(column2), rule 4(column2), rule 5(column2).
S204: detect the N fields using the merged rule template to obtain multiple detection results, where the multiple detection results correspond to the detection rules of the N fields.
Specifically, when detection is performed using the merged rule template, the multiple rule templates contained in the merged rule template can be processed in parallel, so that quality detection of a field, or even of an entire data set, can be completed in a short time.
S205: compare each of the multiple detection results with the expected value of the corresponding detection rule to determine whether they are consistent.
S206: if a detection result is consistent with the expected value of the corresponding detection rule, determine that the field corresponding to the detection result is normal.
S207: if a detection result is inconsistent with the expected value of the corresponding detection rule, determine that the field corresponding to the detection result is abnormal.
The method for detecting data quality of the embodiments of the present application can merge multiple detection rule templates and perform quality detection using the merged rule template, so that the detected values of multiple detection rules are obtained in a single computation. Detection results are therefore obtained more quickly, which further improves detection efficiency.
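One way to picture the merged rule template of S203 and S204 is a single aggregate query in which every rule of every field becomes one output column. The patent does not state that templates are SQL, so the SQL templates, the `{col}` placeholder, the alias scheme, and the function `merge_templates` are all assumptions. SQLite from the Python standard library is used here only so that the sketch runs, and the three sample rows are invented.

```python
import sqlite3

# Assumed rule templates; {col} is replaced by the field name when merging.
TEMPLATES = {
    "null_count": "SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END)",
    "valid_count": "COUNT({col})",
    "min": "MIN({col})",
}


def merge_templates(table, rules_by_field):
    """Merge the rules of all fields into one query, keeping fields distinguishable."""
    aliases, items = [], []
    for col, rule_names in rules_by_field.items():
        for rule_name in rule_names:
            alias = f"{col}__{rule_name}"  # the alias tells column1 apart from column2
            aliases.append(alias)
            items.append(f"{TEMPLATES[rule_name].format(col=col)} AS {alias}")
    return f"SELECT {', '.join(items)} FROM {table}", aliases


# Invented sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_user (userid TEXT, age INTEGER)")
conn.executemany("INSERT INTO a_user VALUES (?, ?)",
                 [("u1", 10), ("u2", -1), (None, 5)])

sql, aliases = merge_templates(
    "a_user",
    {"userid": ["null_count", "valid_count"], "age": ["null_count", "min"]},
)
row = conn.execute(sql).fetchone()   # one scan yields every detected value
merged_results = dict(zip(aliases, row))
conn.close()
```

With this layout the table is scanned once regardless of how many rules are matched, which is the efficiency gain the merged template is meant to deliver. Any parallelism inside that scan is left to the query engine.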
Fig. 3 is a flowchart of a method for detecting data quality according to another specific embodiment of the present application.
As shown in Fig. 3, the method for detecting data quality of the embodiment of the present application comprises the following steps:
S301: obtain metadata of the data to be detected, where the metadata includes N fields of the data to be detected and the data type of each field, and N is a positive integer.
S302: search the pre-established detection rule library, according to the preset matching policy and the data type of each field, for at least one detection rule corresponding to each field.
S303: perform quality detection on each field according to the at least one detection rule corresponding to that field.
S301 to S303 are the same as S101 to S103 and are not described again here.
S304: if the detection results of the N fields include a detection result for an enumeration value count, determine whether the enumeration value count satisfies a first preset condition.
The enumeration value count is the number of field values after deduplication.
The first preset condition is: enumeration value count < preset number, and enumeration value count / total number of field values <= preset ratio. The preset number and the preset ratio can be adjusted according to the business. For example, the preset number can be 100 and the preset ratio can be 0.01.
If the enumeration value count of a field satisfies the first preset condition, the field is an enumerable discrete field, so enumeration value distribution detection can be performed on it.
S305: if the condition is satisfied, obtain the enumeration value distribution ratio according to the enumeration value count.
Specifically, the frequency with which each enumeration value occurs can be computed, that is, the number of occurrences of each enumeration value divided by the total number of field values, to obtain the enumeration value distribution ratio.
S306: if the enumeration value distribution ratio does not satisfy a preset ratio condition, determine that the field corresponding to the detection result that includes the enumeration value count is abnormal.
In the embodiments of the present application, the preset ratio condition can be determined according to commonly used evaluation criteria or set by the tester. Of course, in the embodiments of the present application, the obtained enumeration value distribution ratio can also be displayed to the tester, so that the tester judges from the distribution ratio whether the data is abnormal.
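Steps S304 and S305 can be sketched as follows, using the example thresholds from the text (preset number 100, preset ratio 0.01). The function names and the sample gender values are assumptions made for illustration. How null values count toward the total is not specified in the text, so the sketch simply counts every value it is given.

```python
from collections import Counter


def is_enumerable(values, max_distinct=100, max_ratio=0.01):
    """First preset condition: distinct < max_distinct and distinct / total <= max_ratio."""
    total = len(values)
    if total == 0:
        return False
    distinct = len(set(values))
    return distinct < max_distinct and distinct / total <= max_ratio


def enum_distribution(values):
    """Share of each enumeration value: occurrences / total number of field values."""
    total = len(values)
    return {value: count / total for value, count in Counter(values).items()}


# Invented sample: 800 male and 200 female records.
gender = ["M"] * 800 + ["F"] * 200
gender_enumerable = is_enumerable(gender)
gender_shares = enum_distribution(gender) if gender_enumerable else {}

# A field where every value is different is not treated as enumerable.
userids = [f"u{i}" for i in range(50)]
userid_enumerable = is_enumerable(userids)
```

The ratio test is what keeps a small table with a near-unique column from being treated as enumerable just because it has fewer than 100 distinct values.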
S307: if the detection results of the N fields include a detection result for a maximum and a minimum, determine whether the difference between the maximum and the minimum satisfies a second preset condition.
The second preset condition can be set according to business requirements. For example, the second preset condition can be that the difference is <= 100.
S308: if the condition is satisfied, collect statistics by interval on the field values of the field corresponding to the detection result that includes the maximum and the minimum, to obtain the interval distribution data of that field.
Specifically, the field values of the corresponding field can be divided into preset intervals according to the maximum and the minimum, and the number of field values in each interval is counted, so as to obtain the interval distribution data of the field.
S309: if the interval distribution data does not satisfy a preset distribution, determine that the field corresponding to the detection result that includes the maximum and the minimum is abnormal.
In the embodiments of the present application, the preset distribution can be determined according to commonly used evaluation criteria or set by the tester. Of course, in the embodiments of the present application, the obtained interval distribution data can also be displayed to the tester, so that the tester judges from the interval distribution data whether the data is abnormal.
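Steps S307 and S308 amount to a range check followed by a histogram. The sketch below assumes equal-width intervals and uses invented ages. The function names, the choice to skip null and out-of-range values, and the convention of placing the upper bound in the last interval are assumptions, since the text does not fix them.

```python
def range_qualifies(min_value, max_value, max_span=100):
    """Second preset condition from the example: maximum - minimum <= 100."""
    return (max_value - min_value) <= max_span


def interval_distribution(values, low, high, bins=10):
    """Count field values per equal-width interval between low and high."""
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if value is None or not (low <= value <= high):
            continue  # nulls and out-of-range values are skipped in this sketch
        index = min(int((value - low) / width), bins - 1)  # high lands in the last bin
        counts[index] += 1
    return counts


# Invented sample ages.
ages = [5, 15, 55, 55, 58, 100]
if range_qualifies(min(ages), max(ages)):
    age_counts = interval_distribution(ages, 0, 100, bins=10)
else:
    age_counts = []
```

The resulting list of counts is the kind of data an interval distribution chart would plot, one bar per interval.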
The data quality detection method of the embodiments of the present application is illustrated below with reference to Table 1 and Figs. 4a to 4c.
Fig. 4a is a schematic diagram of a data quality detection method according to one embodiment of the present application.
Table 1 is the field detection result table of the data quality detection method according to one embodiment of the present application.
Fig. 4b is the gender enumeration distribution chart of the detection results in the embodiment shown in Fig. 4a.
Fig. 4c is the age interval distribution chart of the detection results in the embodiment shown in Fig. 4a.
As shown in Fig. 4a, the user selects the pt=20150801 partition table of the user table a_user as the data to be detected. The data table includes fields such as userid, nick, age, score, and logintime. Detection rules are matched automatically according to the field type of each field (the figure shows only the rules matched for userid, age, and logintime), detection is then performed according to the matched rules, and detection results are obtained.
The field detection result table of Table 1, the gender enumeration distribution chart shown in Fig. 4b, and the age interval distribution chart shown in Fig. 4c are then generated from the detection results.
As shown in Table 1, the table lists the abnormal fields userid, age, and logintime, where the values in bold italics are the abnormal detected values, that is, detected values that are inconsistent with the corresponding expected values.
Table 1
Field        Null-value count    Valid-value count    Maximum    Minimum    Duplicate count
userid       100                 102                  N/A        N/A        10
age          0                   102                  10         -1         N/A
logintime    1                   101                  N/A        N/A        50
As shown in Fig. 4b, males and females account for 80% and 20% respectively. The proportion of males is very high, and the tester can judge whether this is reasonable according to the specific business.
As shown in Fig. 4c, the age range [0, 100] can be divided into 10 intervals, and the number of records whose age falls in each interval is counted, giving the age interval distribution chart shown in Fig. 4c. It can be seen that people aged 50 to 60 are the most numerous, and the tester can judge whether this result is reasonable according to the specific business.
Thus, after preliminary detection results are obtained, further detection can be performed depending on whether a field is enumerable, whether it can be detected by interval, and so on, to obtain an enumeration distribution chart or interval detection results. Data quality can then be judged more intuitively from the distribution of the data, and data problems can be found in greater depth.
It should be understood that in the embodiments of the present application, steps S304 to S306 and steps S307 to S309 are optional, and steps S304 to S306 and steps S307 to S309 can be performed in any order.
In one embodiment of the present application, the metadata may also include primary key information of the data to be detected. Optionally, after quality detection is performed on each field according to the at least one detection rule corresponding to that field, the method may further include:
S310: determine the field corresponding to the primary key information.
The primary key information identifies which field in the data table is the primary key. The primary key information is recorded in the metadata of the data table, so the corresponding field can be determined from it.
S311: obtain the duplicate-count detection result of the field corresponding to the primary key information.
Duplicate-count detection checks whether the field values of the detected field contain duplicates. Taking the user ID field userid as an example, if the IDs of two or more users are xiaoming, the field contains duplicate values.
S312: if the field corresponding to the primary key information contains duplicate field values, determine that the primary key information or the field corresponding to the primary key information is abnormal.
Because the primary key uniquely identifies a record in the data table, a duplicated value can no longer identify a record uniquely, so the data can be determined to be abnormal.
Thus, the primary key of the data set can be further detected using the primary key information, so that data problems are found more comprehensively.
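The primary key check of S310 to S312 reduces to looking for values that occur more than once. A minimal sketch follows, in which the function names and the sample user IDs other than xiaoming are assumptions.

```python
from collections import Counter


def duplicate_values(values):
    """Return each value that occurs more than once, with its number of occurrences."""
    counts = Counter(values)
    return {value: count for value, count in counts.items() if count > 1}


def primary_key_is_abnormal(values):
    """A primary key field is abnormal as soon as any value is repeated."""
    return bool(duplicate_values(values))


# Two users share the ID "xiaoming", so the key no longer identifies records uniquely.
userids = ["xiaoming", "lily", "xiaoming"]
repeated = duplicate_values(userids)
pk_abnormal = primary_key_is_abnormal(userids)
```

Returning the repeated values, not just a flag, gives the tester the specific IDs that need to be repaired.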
It should be appreciated that in embodiments herein, step S310-S312, step S304-S306, step S307-S309 Execution sequence is in no particular order.
In order to realize above-described embodiment, the application also proposes a kind of detection means of the quality of data.
Fig. 5 is the structural representation of the detection means of the quality of data according to the application one embodiment.
As shown in figure 5, the detection means of the quality of data according to the embodiment of the present application, including:Acquisition module 10, lookup mould Block 20 and detection module 30.
Specifically, acquisition module 10 is used to obtain the metadata of data to be tested, wherein, the metadata includes number to be detected According to N number of field and each field data type, wherein, N is positive integer.
More specifically, acquisition module 10 can receive the table name of user input, and using the corresponding tables of data of the table name as to be detected Data.
Further, acquisition module 10 can also receive user selection detection subregion, and using the corresponding partition table of the subregion as Data to be tested.Wherein, partition table is a subset for tables of data, you can a big data table is divided into multiple subregions, often Individual subregion is a partition table.
Metadata is data that describes data (data about data). It mainly describes data properties and supports functions such as indicating storage location, recording historical data, resource lookup, and file recording. That is, the metadata of the data to be detected includes information such as its fields and their corresponding field types, key values, and creation time.
A field is the basic storage unit of the data to be detected. For example, a field may be one column of the data to be detected, in which case the value of N depends on the total number of columns of the data to be detected.
Taking the syntax of a distributed cluster as an example, the data type of a field may be Boolean, Double (double-precision floating point), Bigint (integer), Decimal (exact numeric), String (character string), Datetime (date), and so on. For example, the data type of the field for gender may be Boolean, the data type of the field for age may be Bigint, and the data type of the field for login time may be Datetime.
The lookup module 20 is configured to look up, in a pre-established detection rule library, at least one detection rule corresponding to each field according to a preset matching strategy and the data type of each field.
In one embodiment of the present application, the lookup module 20 may extract multiple detection rules from the test cases corresponding to each of the different data types and build the detection rule library from the extracted rules. Specifically, the core calculation logic of commonly used data quality test cases may be abstracted in advance into rule templates, and the general expected value of each test case taken as the expected value of the corresponding rule template, thereby building the detection rule library. The detection rule library includes multiple detection rules, each comprising a rule template and an expected value.
It should be understood that some detection rules merely probe the data and have no expected value; their expected value may therefore be a default such as N/A.
The preset matching strategy is obtained in advance by statistical analysis of commonly used data quality test cases. Specifically, for each detection rule, it can be counted which data types require the rule during quality detection, and labels for those data types are added to the rule.
Thus, once the data type of a field is obtained, all detection rules carrying the label of that data type can be found in the detection rule library.
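A rule library with data type labels, and the lookup by data type, might be sketched as follows. Everything concrete here is an assumption made for illustration: the class DetectionRule, the rule names, the SQL-like template strings with a {field} placeholder, and the choice of which data types each rule is labelled with are not taken from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionRule:
    name: str               # hypothetical rule name
    template: str           # rule template; {field} is a placeholder
    expected: str           # expected value; "N/A" when the rule has none
    type_labels: frozenset  # data types this rule is labelled with


NUMERIC = frozenset({"double", "bigint", "decimal"})
ALL_TYPES = NUMERIC | {"boolean", "string", "datetime"}

# Hypothetical library; a real one would be derived from existing test cases.
RULE_LIBRARY = [
    DetectionRule("null_count",
                  "SUM(CASE WHEN {field} IS NULL THEN 1 ELSE 0 END)", "0", ALL_TYPES),
    DetectionRule("distinct_count", "COUNT(DISTINCT {field})", "N/A", ALL_TYPES),
    DetectionRule("max_value", "MAX({field})", "N/A", NUMERIC | {"datetime"}),
    DetectionRule("min_value", "MIN({field})", "N/A", NUMERIC | {"datetime"}),
    DetectionRule("negative_count",
                  "SUM(CASE WHEN {field} < 0 THEN 1 ELSE 0 END)", "0", NUMERIC),
]


def find_rules(data_type, library=RULE_LIBRARY):
    """Return every rule whose labels include the given data type."""
    key = data_type.lower()
    return [rule for rule in library if key in rule.type_labels]


string_rules = [rule.name for rule in find_rules("String")]
bigint_rules = [rule.name for rule in find_rules("Bigint")]
print(string_rules)
print(bigint_rules)
```

Storing the labels on the rule keeps the matching strategy declarative: supporting a new data type only requires adding its label to the relevant rules.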
It should be understood that the expected value of a detection rule may be the default value in the detection rule library or may be set by the tester as needed.
The detection module 30 is configured to perform quality detection on each field according to the at least one detection rule corresponding to that field, so as to obtain and display the detection results of the N fields.
Specifically, for each field, the detection module 30 may calculate the detection value of the field under each of its detection rules and compare each detection value with the expected value of the corresponding rule. If every detection value meets its expected value, the field is normal; otherwise the field is abnormal. If no field in the data to be detected is abnormal, the data to be detected is normal; otherwise, the data to be detected is abnormal.
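The comparison of detection values with expected values can be sketched as follows. This is a minimal sketch under stated assumptions: the detection values are assumed to be already computed, "meets the expected value" is interpreted as simple equality, and the names failed_rules and evaluate_dataset are hypothetical.

```python
NO_EXPECTATION = "N/A"  # default used by rules that have no expected value


def failed_rules(detected, expected):
    """Map each failing rule of one field to (detected value, expected value)."""
    failures = {}
    for rule, value in detected.items():
        target = expected.get(rule, NO_EXPECTATION)
        if target != NO_EXPECTATION and value != target:
            failures[rule] = (value, target)
    return failures


def evaluate_dataset(detected_by_field, expected):
    """Return the abnormal fields and the per-field failure report."""
    report = {field: failed_rules(values, expected)
              for field, values in detected_by_field.items()}
    abnormal_fields = [field for field, failures in report.items() if failures]
    return abnormal_fields, report


# Hypothetical expected values and detection values.
expected = {"null_count": 0, "negative_count": 0, "max_value": "N/A"}
detected = {
    "userid": {"null_count": 0},
    "age": {"null_count": 3, "negative_count": 0, "max_value": 97},
}
abnormal_fields, report = evaluate_dataset(detected, expected)
print(abnormal_fields, report["age"])
```

The data to be detected counts as abnormal exactly when abnormal_fields is non-empty, and the report keeps the detection values that would be shown to the tester.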
In one embodiment of the present application, when an anomaly is detected, the detection module 30 may record the corresponding field and detection value and provide them to the tester. The detection module 30 may display the detection result of each field in various forms such as tables and distribution charts, and may mark abnormal fields and their corresponding problems with highlighting or other special marks so that the tester can notice and handle anomalies promptly.
In another embodiment of the present application, to further improve detection efficiency, the detection module 30 may merge the detection rules when performing quality detection on the data to be detected and detect the fields in parallel according to the merged rules.
Specifically, the detection module 30 may be configured to: merge the rule templates of the multiple detection rules corresponding to each field to obtain a merged rule template; detect the N fields using the merged rule template to obtain multiple detection results, where the multiple detection results correspond to the detection rules of the N fields; compare each detection result with the expected value of the corresponding detection rule; if a detection result is consistent with the expected value of the corresponding detection rule, determine that the field corresponding to the detection result is normal; and if a detection result is inconsistent with the expected value of the corresponding detection rule, determine that the field corresponding to the detection result is abnormal.
It should be understood that the present application does not limit how rule templates are merged. In the embodiments of the present application, the multiple detection rule templates corresponding to one field may be merged into one merged rule template, or the detection rules corresponding to all fields in the data to be detected may be merged into one merged rule template. When the merged rule template is used for detection, the multiple rule templates contained in it can be processed in parallel, so the quality detection of a field, or even of an entire data set, can be completed in a short time.
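One possible reading of template merging is to combine the templates of all fields into a single aggregate query, so that every detection value is computed in one scan of the table. The sketch below illustrates that reading using Python's sqlite3 module and an in-memory table. The SQL-style templates, the table users, and the result naming scheme field__rule are assumptions for illustration; a single query is a stand-in for the parallel processing described above, not a reproduction of it.

```python
import sqlite3

# Hypothetical rule templates; {field} is replaced by the column name.
TEMPLATES = {
    "null_count": "SUM(CASE WHEN {field} IS NULL THEN 1 ELSE 0 END)",
    "distinct_count": "COUNT(DISTINCT {field})",
    "max_value": "MAX({field})",
}


def merged_query(table, field_rules):
    """Merge the templates of all fields into one SELECT statement."""
    columns = []
    for field, rules in field_rules.items():
        for rule in rules:
            expression = TEMPLATES[rule].format(field=field)
            columns.append(f'{expression} AS "{field}__{rule}"')
    # Identifiers are interpolated directly, so they must come from trusted metadata.
    return f"SELECT {', '.join(columns)} FROM {table}"


def run_merged(conn, table, field_rules):
    """Run the merged query once and return {"field__rule": detection value}."""
    cursor = conn.execute(merged_query(table, field_rules))
    row = cursor.fetchone()
    names = [column[0] for column in cursor.description]
    return dict(zip(names, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("xiaoming", 20), ("xiaohong", None), ("xiaoming", 31)])
results = run_merged(conn, "users", {
    "userid": ["null_count", "distinct_count"],
    "age": ["null_count", "max_value"],
})
print(results)
```

Each entry of results can then be compared with the expected value of its rule, exactly as for rules that are evaluated one at a time.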
Further, in one embodiment of the present application, the detection module 30 is also configured to: after quality detection has been performed on each field according to the at least one detection rule corresponding to that field, if the detection results of the N fields include a detection result containing an enumerated value count, determine whether the enumerated value count satisfies a first preset condition; if so, obtain the enumerated value distribution ratio according to the enumerated value count; and if the enumerated value distribution ratio does not satisfy a preset ratio condition, determine that the field corresponding to the detection result containing the enumerated value count is abnormal.
Here, the enumerated value count is the number of field values after deduplication.
The first preset condition is: enumerated value count < preset number, and enumerated value count / total number of field values <= preset ratio. The preset number and the preset ratio can be adjusted according to the business; for example, the preset number may be 100 and the preset ratio may be 0.01. The preset ratio condition may be determined according to conventional evaluation criteria or set by the tester.
If the enumerated value count of a field satisfies the first preset condition, the field is an enumerable discrete field, so enumerated value distribution detection can be performed on it. Specifically, the frequency of each enumerated value, that is, the number of occurrences of that enumerated value divided by the total number of field values, can be calculated to obtain the enumerated value distribution ratio.
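The first preset condition and the distribution ratio can be sketched as follows. The defaults 100 and 0.01 are the example values given above; the function names and the sample data are hypothetical, and the preset ratio condition itself is left to the caller because the text does not fix it.

```python
from collections import Counter


def is_enumerable(values, preset_number=100, preset_ratio=0.01):
    """First preset condition: few distinct values relative to the total."""
    total = len(values)
    if total == 0:
        return False
    enumerated_count = len(set(values))  # number of field values after deduplication
    return enumerated_count < preset_number and enumerated_count / total <= preset_ratio


def enum_distribution(values):
    """Share of each enumerated value: occurrences / total number of field values."""
    total = len(values)
    return {value: n / total for value, n in Counter(values).items()}


# Hypothetical gender column with 500 values and 2 distinct values.
genders = ["male"] * 300 + ["female"] * 200
if is_enumerable(genders):
    distribution = enum_distribution(genders)
    print(distribution)
```

A column of mostly unique values, such as user IDs, fails the first condition and is skipped, which avoids computing a meaningless distribution over a very large number of distinct values.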
In the embodiments of the present application, the obtained enumerated value distribution ratio may also be shown to the tester, who judges from it whether the data is abnormal.
In one embodiment of the present application, the detection module 30 is also configured to: after quality detection has been performed on each field according to the at least one detection rule corresponding to that field, if the detection results of the N fields include a detection result containing a maximum value and a minimum value, determine whether the difference between the maximum value and the minimum value satisfies a second preset condition; if so, obtain the field values of the field corresponding to that detection result and count them by interval, to obtain the interval distribution data of that field; and if the interval distribution data does not satisfy a preset distribution, determine that the field corresponding to the detection result containing the maximum value and the minimum value is abnormal.
The second preset condition may be set according to business requirements; for example, the second preset condition may be that the difference is <= 100. The preset distribution may be determined according to conventional evaluation criteria or set by the tester.
Specifically, the field values of the field corresponding to the detection result can be divided into preset intervals according to the maximum value and the minimum value, and the number of field values in each interval counted, to obtain the interval distribution data of the field.
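Interval counting can be sketched as follows. The threshold of 100 is the example value given above; the use of equal-width intervals, the default of five intervals, and the function name interval_distribution are assumptions for illustration. The sketch also assumes a non-empty list of numeric field values.

```python
def interval_distribution(values, bucket_count=5, max_span=100):
    """Count field values per equal-width interval between the minimum and maximum.

    Returns None when max - min exceeds max_span, i.e. when the second
    preset condition is not satisfied.
    """
    lowest, highest = min(values), max(values)
    if highest - lowest > max_span:
        return None
    # Fall back to width 1 when all values are equal, to avoid dividing by zero.
    width = (highest - lowest) / bucket_count or 1
    counts = [0] * bucket_count
    for value in values:
        # The maximum value would land one past the end, so clamp it into the last interval.
        index = min(int((value - lowest) / width), bucket_count - 1)
        counts[index] += 1
    return counts


# Hypothetical age column: span 58 - 18 = 40, so each interval is 8 wide.
ages = [18, 22, 25, 31, 40, 58]
counts = interval_distribution(ages)
print(counts)
```

The resulting counts can be compared with a preset distribution or shown to the tester as a chart.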
In the embodiments of the present application, the obtained interval distribution data may also be shown to the tester, who judges from it whether the data is abnormal.
Thus, after the preliminary detection results are obtained, further detection can be performed according to whether a field is enumerable, whether it can be detected by interval, and so on, to obtain an enumeration distribution chart or interval detection results. Data quality can then be judged more intuitively from the distribution of the data, and data problems found in greater depth.
In one embodiment of the present application, the metadata includes primary key information of the data to be detected, and the detection module 30 may also be configured to: after quality detection has been performed on each field according to the at least one detection rule corresponding to that field, determine the field corresponding to the primary key information; obtain the duplicate count detection result of the field corresponding to the primary key information; and if the field corresponding to the primary key information contains duplicate field values, determine that the primary key information, or the field corresponding to the primary key information, is abnormal.
Here, primary key information is information identifying which field of the data table is the primary key. Because the primary key information is recorded in the metadata of the data table, the corresponding field can be determined from it. Duplicate count detection checks whether the field values of the detected field contain duplicates. Taking the user ID field userid as an example, if two or more users have the ID xiaoming, the field contains duplicate values.
Because the primary key uniquely identifies each record in the data table, a duplicated value can no longer identify a record uniquely, so the data can be judged abnormal.
Thus, the primary key of the data can be further checked using the primary key information, so that problems in the data are found more comprehensively.
The data quality detection apparatus of the embodiment of the present application can look up, in the detection rule library, the detection rules corresponding to each field according to the preset matching strategy and the data types of the fields in the metadata of the data to be detected, and perform quality detection on each field of the data to be detected to obtain quality detection results. The detection rules in the detection rule library are applicable to different databases and fields and have wide coverage, and rule matching and quality detection can be completed automatically. Therefore, compared with the traditional approach of manually writing test cases, the data quality detection method of the embodiment of the present application can effectively improve the coverage and generality of detection, improve detection efficiency, and reduce labor and time costs.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps carried by the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and purpose of the present application; the scope of the present application is defined by the claims and their equivalents.

Claims (16)

1. A method for detecting data quality, characterized by comprising the following steps:
obtaining metadata of data to be detected, wherein the metadata includes N fields of the data to be detected and the data type of each field, N being a positive integer;
looking up, in a pre-established detection rule library, at least one detection rule corresponding to each field according to a preset matching strategy and the data type of each field;
performing quality detection on each field according to the at least one detection rule corresponding to that field, so as to obtain and display the detection results of the N fields.
2. The method for detecting data quality according to claim 1, characterized in that each detection rule in the detection rule library includes a rule name, a rule template, and a rule expected value.
3. The method for detecting data quality according to claim 2, characterized in that performing quality detection on each field according to the at least one detection rule corresponding to that field specifically includes:
merging the rule templates of the multiple detection rules corresponding to each field to obtain a merged rule template;
detecting the N fields using the merged rule template to obtain multiple detection results, wherein the multiple detection results correspond to the detection rules of the N fields;
comparing each of the multiple detection results with the expected value of the corresponding detection rule to determine whether they are consistent;
if a detection result is consistent with the expected value of the corresponding detection rule, determining that the field corresponding to the detection result is normal;
if a detection result is inconsistent with the expected value of the corresponding detection rule, determining that the field corresponding to the detection result is abnormal.
4. The method for detecting data quality according to claim 1, characterized in that, after performing quality detection on each field according to the at least one detection rule corresponding to that field, the method further includes:
if the detection results of the N fields include a detection result containing an enumerated value count, determining whether the enumerated value count satisfies a first preset condition;
if so, obtaining an enumerated value distribution ratio according to the enumerated value count;
if the enumerated value distribution ratio does not satisfy a preset ratio condition, determining that the field corresponding to the detection result containing the enumerated value count is abnormal.
5. The method for detecting data quality according to claim 1, characterized in that, after performing quality detection on each field according to the at least one detection rule corresponding to that field, the method further includes:
if the detection results of the N fields include a detection result containing a maximum value and a minimum value, determining whether the difference between the maximum value and the minimum value satisfies a second preset condition;
if so, obtaining the field values of the field corresponding to the detection result containing the maximum value and the minimum value and counting them by interval, to obtain the interval distribution data of that field;
if the interval distribution data does not satisfy a preset distribution, determining that the field corresponding to the detection result containing the maximum value and the minimum value is abnormal.
6. The method for detecting data quality according to claim 1, characterized in that the metadata includes primary key information of the data to be detected, and, after performing quality detection on each field according to the at least one detection rule corresponding to that field, the method further includes:
determining the field corresponding to the primary key information;
obtaining the duplicate count detection result of the field corresponding to the primary key information;
if the field corresponding to the primary key information contains duplicate field values, determining that the primary key information, or the field corresponding to the primary key information, is abnormal.
7. The method for detecting data quality according to claim 1, characterized in that the data type includes: the Boolean type, the String (character string) type, the Datetime (date) type, the Double (double-precision floating point) type, the Bigint (integer) type, or the Decimal (exact numeric) type.
8. The method for detecting data quality according to claim 1, characterized in that multiple detection rules are extracted from the test cases corresponding to each of the different data types, and the detection rule library is built from the multiple detection rules.
9. An apparatus for detecting data quality, characterized by comprising:
an acquisition module, configured to obtain metadata of data to be detected, wherein the metadata includes N fields of the data to be detected and the data type of each field, N being a positive integer;
a lookup module, configured to look up, in a pre-established detection rule library, at least one detection rule corresponding to each field according to a preset matching strategy and the data type of each field;
a detection module, configured to perform quality detection on each field according to the at least one detection rule corresponding to that field, so as to obtain and display the detection results of the N fields.
10. The apparatus for detecting data quality according to claim 9, characterized in that each detection rule in the detection rule library includes a rule name, a rule template, and a rule expected value.
11. The apparatus for detecting data quality according to claim 10, characterized in that the detection module is specifically configured to:
merge the rule templates of the multiple detection rules corresponding to each field to obtain a merged rule template;
detect the N fields using the merged rule template to obtain multiple detection results, wherein the multiple detection results correspond to the detection rules of the N fields;
compare each of the multiple detection results with the expected value of the corresponding detection rule to determine whether they are consistent;
if a detection result is consistent with the expected value of the corresponding detection rule, determine that the field corresponding to the detection result is normal;
if a detection result is inconsistent with the expected value of the corresponding detection rule, determine that the field corresponding to the detection result is abnormal.
12. The apparatus for detecting data quality according to claim 9, characterized in that the detection module is further configured to:
after performing quality detection on each field according to the at least one detection rule corresponding to that field, if the detection results of the N fields include a detection result containing an enumerated value count, determine whether the enumerated value count satisfies a first preset condition;
if so, obtain an enumerated value distribution ratio according to the enumerated value count;
if the enumerated value distribution ratio does not satisfy a preset ratio condition, determine that the field corresponding to the detection result containing the enumerated value count is abnormal.
13. The apparatus for detecting data quality according to claim 9, characterized in that the detection module is further configured to:
after performing quality detection on each field according to the at least one detection rule corresponding to that field, if the detection results of the N fields include a detection result containing a maximum value and a minimum value, determine whether the difference between the maximum value and the minimum value satisfies a second preset condition;
if so, obtain the field values of the field corresponding to the detection result containing the maximum value and the minimum value and count them by interval, to obtain the interval distribution data of that field;
if the interval distribution data does not satisfy a preset distribution, determine that the field corresponding to the detection result containing the maximum value and the minimum value is abnormal.
14. The apparatus for detecting data quality according to claim 9, characterized in that the metadata includes primary key information of the data to be detected, and the detection module is further configured to:
after performing quality detection on each field according to the at least one detection rule corresponding to that field, determine the field corresponding to the primary key information;
obtain the duplicate count detection result of the field corresponding to the primary key information;
if the field corresponding to the primary key information contains duplicate field values, determine that the primary key information, or the field corresponding to the primary key information, is abnormal.
15. The apparatus for detecting data quality according to claim 9, characterized in that the data type includes: the Boolean type, the String (character string) type, the Datetime (date) type, the Double (double-precision floating point) type, the Bigint (integer) type, or the Decimal (exact numeric) type.
16. The apparatus for detecting data quality according to claim 9, characterized in that multiple detection rules are extracted from the test cases corresponding to each of the different data types, and the detection rule library is built from the multiple detection rules.
CN201510796894.4A 2015-11-18 2015-11-18 Data quality detection method and device Active CN106708909B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510796894.4A CN106708909B (en) 2015-11-18 2015-11-18 Data quality detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510796894.4A CN106708909B (en) 2015-11-18 2015-11-18 Data quality detection method and device

Publications (2)

Publication Number Publication Date
CN106708909A true CN106708909A (en) 2017-05-24
CN106708909B CN106708909B (en) 2020-12-08

Family

ID=58933526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510796894.4A Active CN106708909B (en) 2015-11-18 2015-11-18 Data quality detection method and device

Country Status (1)

Country Link
CN (1) CN106708909B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102760096A (en) * 2011-04-27 2012-10-31 阿里巴巴集团控股有限公司 Test data generation method, unit testing method and unit testing system
CN103020269A (en) * 2012-12-26 2013-04-03 广州市西美信息科技有限公司 Method and device for verifying data
CN104391934A (en) * 2014-11-21 2015-03-04 深圳市银雁金融配套服务有限公司 Data calibration method and device
CN104933096A (en) * 2015-05-22 2015-09-23 北京奇虎科技有限公司 Abnormal key recognition method of database, abnormal key recognition device of database and data system

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766510A (en) * 2017-10-23 2018-03-06 中国银行股份有限公司 A kind of data processing method, data query method and device
CN108170371A (en) * 2017-12-07 2018-06-15 北京空间技术研制试验中心 Interpretation conclusion compression and storage method
CN108536777A (en) * 2018-03-28 2018-09-14 联想(北京)有限公司 A kind of data processing method, server cluster and data processing equipment
CN108536777B (en) * 2018-03-28 2022-03-25 联想(北京)有限公司 Data processing method, server cluster and data processing device
CN108595563A (en) * 2018-04-13 2018-09-28 林秀丽 A kind of data quality management method and device
CN108959374A (en) * 2018-05-24 2018-12-07 北京三快在线科技有限公司 Data storage method, device and electronic equipment
CN109241043A (en) * 2018-08-13 2019-01-18 蜜小蜂智慧(北京)科技有限公司 A kind of data quality checking method and device
CN109241043B (en) * 2018-08-13 2022-10-14 蜜小蜂智慧(北京)科技有限公司 Data quality detection method and device
CN109491990A (en) * 2018-09-17 2019-03-19 武汉达梦数据库有限公司 A kind of method of detection data quality and the device of detection data quality
CN109510824A (en) * 2018-11-12 2019-03-22 中国银行股份有限公司 A kind of method of calibration and device of interface packets
CN109510824B (en) * 2018-11-12 2021-06-04 中国银行股份有限公司 Interface message checking method and device
CN109656812A (en) * 2018-11-19 2019-04-19 平安科技(深圳)有限公司 Data quality checking method, apparatus and storage medium
CN110019566A (en) * 2019-03-13 2019-07-16 平安信托有限责任公司 Data checking, device, computer equipment and storage medium based on data warehouse
CN110362563A (en) * 2019-07-19 2019-10-22 北京明略软件系统有限公司 The processing method and processing device of tables of data, storage medium, electronic device
CN110413596A (en) * 2019-07-30 2019-11-05 北京明略软件系统有限公司 Field processing method and processing device, storage medium, electronic device
CN110737650A (en) * 2019-09-27 2020-01-31 北京明略软件系统有限公司 Data quality detection method and device
CN110908987A (en) * 2019-11-12 2020-03-24 北京中数智汇科技股份有限公司 Data detection method and device
CN111078545A (en) * 2019-12-05 2020-04-28 贝壳技术有限公司 Method and system for automatically generating test data
CN111159181A (en) * 2019-12-18 2020-05-15 东软集团股份有限公司 Medical data screening method and device, storage medium and electronic equipment
CN110990447A (en) * 2019-12-19 2020-04-10 北京锐安科技有限公司 Data probing method, device, equipment and storage medium
CN110990447B (en) * 2019-12-19 2023-09-15 北京锐安科技有限公司 Data exploration method, device, equipment and storage medium
CN113127482B (en) * 2019-12-31 2024-03-26 奇安信科技集团股份有限公司 Data quality analysis method, device, computer equipment and storage medium
CN113127482A (en) * 2019-12-31 2021-07-16 奇安信科技集团股份有限公司 Data quality analysis method and device, computer equipment and storage medium
CN111400365B (en) * 2020-02-26 2023-09-19 杭州美创科技股份有限公司 Service system data quality detection method based on standard SQL
CN111400365A (en) * 2020-02-26 2020-07-10 杭州美创科技有限公司 Business system data quality detection method based on standard SQL
CN111400282B (en) * 2020-03-17 2023-06-09 北京锐安科技有限公司 Data processing strategy adjustment method, device, equipment and storage medium
CN111400282A (en) * 2020-03-17 2020-07-10 北京锐安科技有限公司 Data processing strategy adjusting method, device, equipment and storage medium
CN113434490A (en) * 2020-03-23 2021-09-24 北京京东振世信息技术有限公司 Quality detection method and device for offline imported data
CN113434490B (en) * 2020-03-23 2024-04-12 北京京东振世信息技术有限公司 Quality detection method and device for offline imported data
CN111427928A (en) * 2020-03-26 2020-07-17 京东数字科技控股有限公司 Data quality detection method and device
CN111563074A (en) * 2020-04-28 2020-08-21 厦门市美亚柏科信息股份有限公司 Data quality detection method and system based on multi-dimensional label
CN111563074B (en) * 2020-04-28 2022-05-31 厦门市美亚柏科信息股份有限公司 Data quality detection method and system based on multi-dimensional label
CN113328981A (en) * 2020-07-09 2021-08-31 深信服科技股份有限公司 Rule quality detection method, device and equipment and readable storage medium
WO2021147559A1 (en) * 2020-08-31 2021-07-29 平安科技(深圳)有限公司 Service data quality measurement method, apparatus, computer device, and storage medium
CN112256682B (en) * 2020-10-22 2022-09-20 佳都科技集团股份有限公司 Data quality detection method and device for multi-dimensional heterogeneous data
CN112256682A (en) * 2020-10-22 2021-01-22 佳都新太科技股份有限公司 Data quality detection method and device for multi-dimensional heterogeneous data
CN112347095B (en) * 2020-11-16 2023-04-21 建信金融科技有限责任公司 Data table processing method, device and server
CN112347095A (en) * 2020-11-16 2021-02-09 建信金融科技有限责任公司 Data table processing method and device and server
CN112699103A (en) * 2020-12-04 2021-04-23 国泰新点软件股份有限公司 Data rule probing method and device based on data pre-analysis
WO2022121337A1 (en) * 2020-12-11 2022-06-16 北京锐安科技有限公司 Data exploration method and apparatus, and electronic device and storage medium
CN112559523A (en) * 2020-12-11 2021-03-26 北京锐安科技有限公司 Data detection method and device, electronic equipment and storage medium
CN113342791A (en) * 2021-05-31 2021-09-03 中国工商银行股份有限公司 Data quality monitoring method and device
CN113591485A (en) * 2021-06-17 2021-11-02 国网浙江省电力有限公司 Intelligent data quality auditing system and method based on data science
CN113569006A (en) * 2021-06-17 2021-10-29 国家电网有限公司 Large-scale data quality anomaly detection method based on data characteristics
CN113468158A (en) * 2021-07-13 2021-10-01 广域铭岛数字科技有限公司 Data repair method, system, electronic device and medium
CN113468158B (en) * 2021-07-13 2023-10-31 广域铭岛数字科技有限公司 Data restoration method, system, electronic equipment and medium
CN113505159A (en) * 2021-07-16 2021-10-15 马上消费金融股份有限公司 Data detection method, device and equipment
CN113987190A (en) * 2021-11-16 2022-01-28 全球能源互联网研究院有限公司 Data quality check rule extraction method and system
CN113987190B (en) * 2021-11-16 2023-02-28 国网智能电网研究院有限公司 Data quality check rule extraction method and system

Also Published As

Publication number Publication date
CN106708909B (en) 2020-12-08

Similar Documents

Publication Publication Date Title
CN106708909A (en) Data quality detection method and apparatus
CN110413506A (en) Test case recommended method, device, equipment and storage medium
US9152620B2 (en) System, method, and computer-readable medium for plagiarism detection
CN105446864B (en) Method and device for verifying influence of deletion of cache file and mobile terminal
CN106127505A (en) The single recognition methods of a kind of brush and device
CN104756106A (en) Characterizing data sources in a data storage system
CN107423613A (en) The method, apparatus and server of device-fingerprint are determined according to similarity
CN106919957A (en) The method and device of processing data
CN107545043A (en) A kind of data application method and device based on data quality checking
CN106557420B (en) Test DB data creation method and device
CN104778179A (en) Data migration test method and system
CN112686388A (en) Data set partitioning method and system under federated learning scene
CN102521713B (en) Data processing equipment and data processing method
Bittmann et al. Decision‐making method using a visual approach for cluster analysis problems; indicative classification algorithms and grouping scope
CN108897765A (en) A kind of batch data introduction method and its system
CN111460315A (en) Social portrait construction method, device and equipment and storage medium
CN110413596A (en) Field processing method and processing device, storage medium, electronic device
CN109582906A (en) Determination method, apparatus, equipment and the storage medium of data reliability
CN105701086A (en) Method and system for detecting literature through sliding window
CN105677641B (en) A kind of paper self checking method and system
CN110941638B (en) Application classification rule base construction method, application classification method and device
CN106776348A (en) Testing case management and device
CN101587511A (en) Appraising method for short answer questions in computer auxiliary test system
CN103136187A (en) Method and system for extraction of patent rejection information
Lipke Further Study of the Normality of CPI and SPI (t)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant