CN109992576A - Government data quality assessment and abnormal data repair technique based on big data technology
- Publication number: CN109992576A
- Application number: CN201910156894.6A
- Authority: CN (China)
- Prior art keywords: data, quality, rule, library, inspection
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a government data quality assessment and abnormal data repair technique based on big data technology, in the field of data analysis. A database is established first, data quality assessment is carried out next, and data quality repair is performed last. The invention applies null-value, value-range, format, logic, referential-integrity and duplicate-data checks to data fields and comprehensively assesses data quality along six dimensions: integrity, relevance, uniqueness, accuracy, consistency and normativeness. It generates a data quality assessment report, after which the user repairs the data manually, by rule, or by deep learning. This helps the government break down internal data barriers, revitalize data assets, raise the value of data, provide unified intelligent data services externally, and further tap and release the value dividend of big data.
Description
Technical field
The present invention relates to the field of data analysis, and in particular to a government data quality assessment and abnormal data repair technique based on big data technology.
Background art
The project builds on four foundations: the PDCA (Plan, Do, Check, Act) quality control method, proposed by the American quality control expert Dr. Shewhart and later adopted and popularized by Deming; the DQAF (Data Quality Assessment Framework) data quality assessment model, an international data quality assessment framework published by the IMF jointly with the World Bank; the data management functional framework of DAMA (the international data management association); and abnormal data repair technology based on deep learning. On this basis it establishes a complete, scientific big data governance system and set of standards that safeguards data quality, improves the service efficiency of government, and keeps the "highway" of data infrastructure efficient and unobstructed, laying the foundation for building a smart government.
At present, each government department has its own database system, and these systems are managed separately, so government information does not flow smoothly. The data in government databases are disorderly, and the many problems hidden in the large volume of data are hard to locate and identify, which readily leads to missing and inaccurate data. For example, ID card numbers in the population library are left blank or filled in incorrectly, and information in the legal-person library is incomplete or wrong. The big data analysis and assessment method described here integrates the government information platforms and comprehensively assesses data quality along six dimensions: integrity, relevance, uniqueness, accuracy, consistency and normativeness. On this basis, the present invention provides a government data quality assessment and abnormal data repair technique based on big data technology to solve the above problems.
Summary of the invention
The object of the present invention is to provide a government data quality assessment and abnormal data repair technique based on big data technology, so as to solve the problems raised in the background art above.
To achieve this object, the invention provides the following technical solution: a government data quality assessment and abnormal data repair technique based on big data technology, with the following specific steps:
Step 1: establish the database
The database comprises a base library and theme libraries. The base library solution addresses the problems present in current government data and follows the principles of overall planning, unified construction and sharing. Taking the departments that are the main data sources as its basis, it uses data collection and exchange, processing, information integration, and mining and analysis to integrate data from commissions, offices and bureaus such as human resources and social security, civil affairs, credit, public security, industry and commerce, health, education and transport. A system of standards and specifications is planned, the base library is built, and data sharing services for government departments and the public are provided on top of it. Corresponding clients include the Development and Reform Commission, the Economy and Information Technology Commission, and the Big Data Bureau;
The theme libraries are based on overall strategic planning and object-oriented methodology. Combined with the client's business characteristics, they are built through data collection and exchange, data integration and association analysis, yielding distinctive theme libraries that revitalize data assets and lay the foundation for innovative thematic applications. Examples are the legal-person library, special equipment library, food library, drug library and license library of the Market Supervision Administration; the population library, license library, criminal investigation library, public security administration library and entry-exit library of the Public Security Bureau; and the population library, social organization library, elderly library, welfare library and marriage library of the Civil Affairs Bureau;
Step 2: data quality assessment
(1) General rule management
General rule management comprises five groups of rules: general, network, date, character and numerical. The general group covers the ID card, mobile phone number, e-mail address, postcode and landline telephone. The regular expression for the ID card is
^[1-9]\d{7}((0\d)|(1[0-2]))(([0|1|2]\d)|3[0-1])\d{3}$|^[1-9]\d{5}[1-9]\d{3}((0\d)|(1[0-2]))(([0|1|2]\d)|3[0-1])\d{3}([0-9]|X)$
The ID card rule is described as follows:
Second-generation Chinese ID card, e.g. 420106198311136666: fixed length of 18, the first 17 characters are digits, the last character is a digit or the letter X, and the value must be a legally valid ID card number. First-generation ID card, e.g. 420106831113666: fixed length of 15, with characters 7 to 12 forming a six-digit date;
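The ID card rule above can be sketched in Python as follows. The regular expression is transcribed from the rule text. The check-digit routine is an assumption added for illustration of what "legally valid" could mean for the 18-digit form: it uses the weights and mapping of GB 11643-1999, which the source does not cite, and the function names are hypothetical.

```python
import re

# Pattern transcribed from the rule text: 15-digit form | 18-digit form.
ID_CARD_RE = re.compile(
    r"^[1-9]\d{7}((0\d)|(1[0-2]))(([0|1|2]\d)|3[0-1])\d{3}$"
    r"|^[1-9]\d{5}[1-9]\d{3}((0\d)|(1[0-2]))(([0|1|2]\d)|3[0-1])\d{3}([0-9]|X)$",
    re.ASCII,
)

# Assumption (not in the source): GB 11643-1999 check-digit weights and mapping.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"


def matches_id_format(value: str) -> bool:
    """Return True if the value has the 15- or 18-character ID card format."""
    return ID_CARD_RE.fullmatch(value) is not None


def has_valid_check_digit(value: str) -> bool:
    """Return True if an 18-character ID passes the format and check-digit test."""
    if len(value) != 18 or not matches_id_format(value):
        return False
    total = sum(int(digit) * weight for digit, weight in zip(value[:17], WEIGHTS))
    return CHECK_CHARS[total % 11] == value[17]
```

Keeping the format test and the check-digit test separate lets a quality report distinguish "wrong shape" from "right shape, wrong number", which are different kinds of data defect.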
The regular expression for the mobile phone number is
^1([38][0-9]|4[579]|5[0-3,5-9]|6[6]|7[0135678]|9[89])\d{8}$
The mobile phone number rule is described as follows:
e.g. 13666666666: starts with the digit 1 and has a fixed length of 11;
The regular expression for the e-mail address is
^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$
The e-mail rule is described as follows:
e.g. 123@mail.com: only English letters, digits and underscores may appear in the mailbox name, which must not begin with an underscore, and the address ends with a suffix such as .com, .cn or .edu;
The regular expression for the postcode is [1-9]\d{5}(?!\d), and the postcode rule is described as follows:
must not begin with 0; six digits in total;
The regular expression for the landline telephone is \d{3}-\d{8}|\d{4}-\d{7}, and the landline rule is described as follows:
e.g. 027-88880808-1, where 027 is the area code and 1 is the extension number, separated by "-"; the area code and extension number may be omitted;
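A minimal sketch of how the contact rules above might be held in a small rule table and applied to field values. The patterns follow the rule text, with the backslashes and dots restored as in the standard forms of these expressions; the dictionary layout and the `check` helper are assumptions for illustration only.

```python
import re

# Rule name -> regular expression, following the rule descriptions above.
CONTACT_RULES = {
    "mobile": r"^1([38][0-9]|4[579]|5[0-3,5-9]|6[6]|7[0135678]|9[89])\d{8}$",
    "email": r"^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$",
    "postcode": r"[1-9]\d{5}(?!\d)",
    "landline": r"\d{3}-\d{8}|\d{4}-\d{7}",
}

# Compile once so repeated checks over large tables stay cheap.
COMPILED_RULES = {name: re.compile(pattern) for name, pattern in CONTACT_RULES.items()}


def check(rule_name: str, value: str) -> bool:
    """Return True if the whole value conforms to the named rule."""
    return COMPILED_RULES[rule_name].fullmatch(value) is not None
```

For example, `check("mobile", "13666666666")` and `check("email", "123@mail.com")` both hold, while `check("postcode", "030000")` does not because the value begins with 0.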
The network group covers the IPv4 address, IPv6 address and MAC address. The regular expression for the IPv4 address is
^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
The IPv4 address rule is described as follows:
e.g. 000.000.000.000: composed of four values from 0 to 255, separated by ".";
The regular expression for the IPv6 address is
^([\da-fA-F]{1,4}:){7}[\da-fA-F]{1,4}$
The IPv6 address rule is described as follows:
e.g. CDCD:910A:222:9:8475:11:390:2020: composed of eight hexadecimal values of up to four digits each, separated by ":"; shorthand and mixed notations are also supported, but the standard notation is recommended;
The regular expression for the MAC address is
[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}
The MAC address rule is described as follows:
e.g. 00-00-00-00-00-00: composed of six two-digit hexadecimal numbers, separated by "-";
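The network rules can be sketched as below. The three patterns follow the rule text (note that the MAC pattern, as written, uses ":" as its separator). Because the IPv6 pattern only covers the full eight-group notation, the sketch also shows Python's standard `ipaddress` module as one possible way to accept the shorthand notation mentioned in the rule description; that fallback and all function names are assumptions, not part of the source.

```python
import ipaddress
import re

IPV4_RE = re.compile(r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$")
IPV6_RE = re.compile(r"^([\da-fA-F]{1,4}:){7}[\da-fA-F]{1,4}$")
MAC_RE = re.compile(r"[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}")


def is_ipv4(value: str) -> bool:
    """Four dot-separated values, each in the range 0 to 255."""
    return IPV4_RE.fullmatch(value) is not None


def is_full_ipv6(value: str) -> bool:
    """Standard notation only: eight colon-separated hexadecimal groups."""
    return IPV6_RE.fullmatch(value) is not None


def is_mac(value: str) -> bool:
    """Six two-digit hexadecimal numbers, colon-separated as in the pattern."""
    return MAC_RE.fullmatch(value) is not None


def is_any_ipv6(value: str) -> bool:
    """Assumed fallback: accept shorthand forms via the standard library."""
    try:
        ipaddress.IPv6Address(value)
        return True
    except ValueError:
        return False
```

A check could first try `is_full_ipv6` and report a value that only passes `is_any_ipv6` as valid but not in the recommended standard notation.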
The date include YYYY.MM.DD, YYYYMMDD, YYYY/MM/DD, YYYY MM month DD day, YYYY and
YYYYMM, wherein the YYYY is the specific time, and the MM is specific month, and the DD is exact date, the YYYY/
The regular expression of MM/DD be (d { 4 })/(d { 1,2 })/(d { 1,2 });
The numerical value includes nonnegative integer, integer, non-negative floating number, floating number, integer band percentage sign, floating number percentage
Number, integer band per thousand sign and floating number per thousand sign, the regular expression of the nonnegative integer be ^ [1-9] d* | 0 $, it is described non-negative
The rule of integer is described as the character string of nonnegative integer format, and such as 28;
Does is the regular expression of the integer ^-? [1-9] d* $, the rule of the integer is described as the character of integer data format
String;
The regular expression ^ of the non-negative floating number d+ (d+)? $, the rule of the non-negative floating number be described as
The character string of non-negative floating number format;
The regular expression of the floating number be ^ (-? d+) (d+)? the rule of $, the floating number are described as floating-point
The character string of number format;
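The numerical and date rules above can be exercised with a short sketch. The patterns are those given in the rule text. The extra calendar check with `datetime.date` is an assumption added here, since the date pattern alone accepts values such as 2019/13/40; the helper names are hypothetical.

```python
import re
from datetime import date

NUMERIC_RULES = {
    "non_negative_integer": r"^[1-9]\d*|0$",
    "integer": r"^-?[1-9]\d*$",
    "non_negative_float": r"^\d+(\.\d+)?$",
    "float": r"^(-?\d+)(\.\d+)?$",
}
DATE_SLASH_RE = re.compile(r"(\d{4})\/(\d{1,2})\/(\d{1,2})")


def matches(rule_name: str, value: str) -> bool:
    """Return True if the whole string conforms to the named numeric rule."""
    return re.fullmatch(NUMERIC_RULES[rule_name], value) is not None


def parse_slash_date(value: str):
    """Parse YYYY/MM/DD; return a date, or None if malformed or not a real day."""
    found = DATE_SLASH_RE.fullmatch(value)
    if found is None:
        return None
    year, month, day = (int(group) for group in found.groups())
    try:
        return date(year, month, day)  # assumed extra step: calendar validity
    except ValueError:
        return None
```

The capture groups in the date pattern make the year, month and day available separately, which is what allows the calendar check to be layered on top of the format check.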
(2) Data quality model
A data quality model is established from the association relationships of general rule management. The data quality model is a DQAF-based data quality assessment model comprising entity tables, association relationships and rule descriptions. The entity table names are selected from the database; the association relationship is the relationship between a master table and its child tables; and the rule descriptions fall into six rule types: null-value check, value-range check, format check, logic check, duplicate-data check and referential-integrity check;
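One way to picture the model described above is as a small set of record types. The sketch below is an assumption about how entity tables, master/child associations and the six rule types could be represented; none of the class or field names come from the source.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class RuleType(Enum):
    """The six rule types named in the rule description."""
    NULL_CHECK = "null-value check"
    VALUE_RANGE_CHECK = "value-range check"
    FORMAT_CHECK = "format check"
    LOGIC_CHECK = "logic check"
    DUPLICATE_CHECK = "duplicate-data check"
    REFERENTIAL_INTEGRITY_CHECK = "referential-integrity check"


@dataclass
class Association:
    """Relationship between a master table and one child table (hypothetical fields)."""
    master_table: str
    child_table: str
    master_key: str
    child_key: str


@dataclass
class Rule:
    """One rule description attached to columns of an entity table."""
    rule_type: RuleType
    table: str
    columns: List[str]


@dataclass
class DataQualityModel:
    """Entity tables, associations and rule descriptions grouped under one name."""
    name: str
    entity_tables: List[str]
    associations: List[Association] = field(default_factory=list)
    rules: List[Rule] = field(default_factory=list)

    def rules_of_type(self, rule_type: RuleType) -> List[Rule]:
        """Return the rules of one type, e.g. all null-value checks."""
        return [rule for rule in self.rules if rule.rule_type is rule_type]
```

Grouping rules by type in this way matches how the assessment later reports results per check type.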
(3) Quality monitoring task
According to the name of the data quality model, all data of that model are exported, and the model is assessed on the basis of the quality model name, quality model description, execution strategy, most recent execution status, most recent execution time and so on, completing the quality inspection task;
(4) Quality monitoring report
A quality inspection report is generated from the quality inspection task;
(5) Quality assessment report
Based on the content of the quality inspection report, data quality is comprehensively assessed along six dimensions (integrity, relevance, uniqueness, accuracy, consistency and normativeness), and a quality assessment report organized by database category and database name is generated for the data in the database. The quality assessment report contains the quality score, quality score charts and a data quality model score ranking. The quality score comprises the overall quality score, the number of quality models and the number of model rules; the quality score charts comprise a quality score trend chart and a total data volume distribution chart; the data quality model score ranking ranks the data quality models by score.
Preferably, the data quality assessment allows a custom length range for fixed characters. The fixed-character rule supports the wildcards "*" and "?": "*" stands for any number of arbitrary characters and "?" stands for exactly one arbitrary character. For example, ABC, ABCD and ABCDE all match the expression ABC*; for A?C, strings such as ABC and ADC match, whereas ABDC does not.
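The wildcard semantics described above can be sketched by translating the pattern into a regular expression. This is a minimal illustration, assuming that "*" may also match zero characters (as the ABC example implies); the function name is hypothetical.

```python
import re


def wildcard_match(pattern: str, value: str) -> bool:
    """Match value against a pattern where '*' is any run of characters
    (including none) and '?' is exactly one character."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")
        elif char == "?":
            parts.append(".")
        else:
            parts.append(re.escape(char))  # treat everything else literally
    return re.fullmatch("".join(parts), value, re.DOTALL) is not None
```

With this sketch, `wildcard_match("ABC*", "ABCDE")` and `wildcard_match("A?C", "ADC")` hold, while `wildcard_match("A?C", "ABDC")` does not.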
Preferably, in the data quality model a detection cycle is set manually, and data quality is checked on a schedule according to that cycle.
Preferably, the DQAF-based data quality assessment model draws on DQAF and on Part 12 (data quality model) and Part 24 (data quality measurement) of the Chinese standard GB/T 25000.24-2017 "Systems and software engineering: Systems and software Quality Requirements and Evaluation (SQuaRE)". General and thematic data quality assessment models are established for the urban data center, supporting advanced, comprehensive and extensible quality evaluation algorithms, including null-value, value-range, format, logic, duplicate-data, referential-integrity, outlier, timeliness, missing-data, fluctuation and balance checks. These cover the definition of the various rules needed in data center construction and data governance, establish a scientific assessment model, and finally assess the urban data center comprehensively along the six dimensions of integrity, normativeness, consistency, accuracy, uniqueness and relevance.
Preferably, abnormal data are identified through the data quality assessment, inspected, and then subjected to data quality repair. The repair methods are manual repair, rule repair and deep repair. In manual repair, an operator corrects the abnormal data in the database and enters the correct information by keyboard. In rule repair, the correct data are entered according to the general rules. Deep repair is a deep-learning-based abnormal data repair technique: it makes full use of mature Hadoop/Spark big data technology and uses deep learning algorithms to automate large-scale data governance, screening and repairing abnormal data. Abnormal data are repaired with deep learning methods, which mainly comprise mean imputation, the K-nearest-neighbor method, regression, maximum likelihood estimation and multiple imputation; combined with manual review, this achieves efficient and accurate data governance and improves data quality.
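Of the repair methods listed above, mean imputation is the simplest to illustrate. The sketch below fills missing values in a single numeric column with the mean of the observed values. It is a plain-Python assumption for illustration, not the Hadoop/Spark implementation the source refers to, and the function name is hypothetical.

```python
from statistics import mean
from typing import List, Optional


def fill_with_mean(values: List[Optional[float]]) -> List[float]:
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [value for value in values if value is not None]
    if not observed:
        raise ValueError("cannot impute: the column has no observed values")
    fill_value = mean(observed)
    return [fill_value if value is None else value for value in values]
```

For a column `[1.0, None, 3.0]` the missing entry is replaced by 2.0. The other listed methods (K-nearest-neighbor, regression, maximum likelihood estimation, multiple imputation) use information from other columns or records and would need correspondingly more context.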
Compared with the prior art, the beneficial effects of the present invention are:
(1) By binding the database systems opened to this data platform, the invention applies null-value, value-range, format, logic, referential-integrity and duplicate-data checks to data fields, comprehensively assesses data quality along six dimensions (integrity, relevance, uniqueness, accuracy, consistency and normativeness), and generates a data quality assessment report; the user then repairs the data manually, by rule, or by deep learning.
(2) The invention provides an integrated solution for building and governing a smart-city big data center. Covering data from its initial creation through standards formulation, secure storage, exchange and sharing, and applied analysis to scientific decision-making and accurate prediction, it forms a complete urban big data governance and management system and builds a city big data platform. From data standards, management and operation through to decision-making it forms a complete data ecosystem chain, continuously extends data boundaries, improves data quality and data utilization, and builds a healthy urban ecosystem. It helps the government break down internal data barriers, revitalize data assets, raise the value of data, provide unified intelligent data services externally, and further tap and release the value dividend of big data.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic table of the general rules of the present invention.
Fig. 2 is a schematic table of the data quality assessment model of the present invention.
Detailed description of embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
Referring to Figs. 1-2, the invention provides a technical solution: a government data quality assessment and abnormal data repair technique based on big data technology, with the following specific steps:
Step 1: establish the database
The database comprises a base library and theme libraries. The base library solution addresses the problems present in current government data and follows the principles of overall planning, unified construction and sharing. Taking the departments that are the main data sources as its basis, it uses data collection and exchange, processing, information integration, and mining and analysis to integrate data from commissions, offices and bureaus such as human resources and social security, civil affairs, credit, public security, industry and commerce, health, education and transport. A system of standards and specifications is planned, the base library is built, and data sharing services for government departments and the public are provided on top of it. Corresponding clients include the Development and Reform Commission, the Economy and Information Technology Commission, and the Big Data Bureau;
The theme libraries are based on overall strategic planning and object-oriented methodology. Combined with the client's business characteristics, they are built through data collection and exchange, data integration and association analysis, yielding distinctive theme libraries that revitalize data assets and lay the foundation for innovative thematic applications. Examples are the legal-person library, special equipment library, food library, drug library and license library of the Market Supervision Administration; the population library, license library, criminal investigation library, public security administration library and entry-exit library of the Public Security Bureau; and the population library, social organization library, elderly library, welfare library and marriage library of the Civil Affairs Bureau;
Second step, data quality accessment
(1) general rule management
General rule management includes general, network, five groups of date, character and numerical value rules, and general includes identity card, mobile phone
The regular expression of number, mailbox, postcode and fixed-line telephone, identity card is
^[1-9]\d{7}((0\d)|(1[0-2]))(([0|1|2]\d)|3[0-1])\d{3}$|^[1-9]\d{5}[1-
9] d { 3 } ((0 d) | (1 [0-2])) (([0 | 1 | 2] d) | 3 [0-1]) d { 3 } ([0-9] | X) $
The rule of identity card is described as
China second-generation identity card, such as 420106198311136666, regular length are 18, and first 17 are number, last position
For number or letter x, and it is necessary for legal effective ID card No.;One generation ID: such as 420106831113666, Gu
Measured length is 15, and 7 to 12 are six dates;
The regular expression of phone number is
^1 ([38] [0-9] | 4 [579] | 5 [0-3,5-9] | 6 [6] | 7 [0135678] | 9 [89]) d { 8 } $,
The rule of phone number is described as
Such as 13666666666, started with number 1, regular length is 11;
The regular expression of mailbox is
^ w+ ([-+] w+) *@w+ ([-] w+) * w+ ([-] w+) * $,
The rule of mailbox is described as
Such as 123 mail.com, English alphabet, number and underscore can only occur in mailbox names and cannot be with underscore
Beginning, and with the ending of the characters such as .com .cn .edu;
The regular expression of postcode be [1-9] d { 5 } (?!D), the rule of postcode is described as
Beginning cannot be 0, totally 6 numbers;
The regular expression of fixed-line telephone be d { 3 }-d { 8 } | d { 4 }-d { 7 }, the rule of fixed-line telephone is described as
Such as 027-88880808-1, wherein 027 is area code, 1 is extension number, is separated with "-", and area code and extension number can not
It fills out;
Network includes the address IPv4, the address IPv6 and MAC Address, and the regular expression of the address IPv4 is
^ ((25 [0-5] | 2 [0-4] d | [01]? d d?)) { 3 } (and 25 [0-5] | 2 [0-4] d | [01]? d d?) $,
The rule of the address IPv4 is described as
It such as 000.000.000.000, is made of 4 0~255 numerical value, is separated with " ";
The regular expression of the address IPv6 is
^ ([da-fA-F] { Isosorbide-5-Nitrae } :) { 7 } [da-fA-F] { Isosorbide-5-Nitrae } $,
The rule of the address IPv6 is described as
Such as CDCD:910A:222:9:8475:11:390:2020, it is made of 8 four hexadecimal numerical value, with ": "
It separates, while supporting to write a Chinese character in simplified form or mix literary style, it is recommended that using standard literary style;
The regular expression of MAC Address is
[0-9a-fA-F] { 2 } (: [0-9a-fA-F] { 2 }) { 5 },
The rule of MAC Address is described as
Such as 00-00-00-00-00-00, is formed with 6 two hexadecimal numbers, separated with "-";
Date includes YYYY.MM.DD, YYYYMMDD, YYYY/MM/DD, YYYY MM month DD day, YYYY and YYYYMM,
Wherein, YYYY is the specific time, and MM is specific month, and DD is exact date, the regular expression of YYYY/MM/DD be (d
{4})\/(\d{1,2})\/(\d{1,2});
Numerical value includes nonnegative integer, integer, non-negative floating number, floating number, integer band percentage sign, floating number percentage sign, whole
Number band per thousand sign and floating number per thousand signs, the regular expression of nonnegative integer be ^ [1-9] d* | the rule of 0 $, nonnegative integer are retouched
It states as the character string of nonnegative integer format, such as 28;
Does is the regular expression of integer ^-? [1-9] d* $, the rule of integer is described as the character string of integer data format;
The regular expression ^ of non-negative floating number d+ (d+)? $, the rule of non-negative floating number are described as non-negative floating-point
The character string of number format;
The regular expression of floating number be ^ (-? d+) (d+)? the rule of $, floating number are described as floating number format
Character string;
(2) Data quality model
A data quality model is established from the association relationships of general rule management. The data quality model is a DQAF-based data quality assessment model comprising entity tables, association relationships and rule descriptions. The entity table names are selected from the database; the association relationship is the relationship between a master table and its child tables; and the rule descriptions fall into six rule types: null-value check, value-range check, format check, logic check, duplicate-data check and referential-integrity check;
(3) Quality monitoring task
According to the name of the data quality model, all data of that model are exported, and the model is assessed on the basis of the quality model name, quality model description, execution strategy, most recent execution status, most recent execution time and so on, completing the quality inspection task;
(4) Quality monitoring report
A quality inspection report is generated from the quality inspection task;
(5) Quality assessment report
Based on the content of the quality inspection report, data quality is comprehensively assessed along six dimensions (integrity, relevance, uniqueness, accuracy, consistency and normativeness), and a quality assessment report organized by database category and database name is generated for the data in the database. The quality assessment report contains the quality score, quality score charts and a data quality model score ranking. The quality score comprises the overall quality score, the number of quality models and the number of model rules; the quality score charts comprise a quality score trend chart and a total data volume distribution chart; the data quality model score ranking ranks the data quality models by score;
Here, the data quality assessment allows a custom length range for fixed characters. The fixed-character rule supports the wildcards "*" and "?": "*" stands for any number of arbitrary characters and "?" stands for exactly one arbitrary character. For example, ABC, ABCD and ABCDE all match the expression ABC*; for A?C, strings such as ABC and ADC match, whereas ABDC does not.
In the data quality model a detection cycle is set manually, and null-value, value-range, format, logic, referential-integrity and duplicate-data checks are run on the data on a schedule according to that cycle.
The DQAF-based data quality assessment model draws on DQAF and on Part 12 (data quality model) and Part 24 (data quality measurement) of the Chinese standard GB/T 25000.24-2017 "Systems and software engineering: Systems and software Quality Requirements and Evaluation (SQuaRE)". General and thematic data quality assessment models are established for the urban data center, supporting advanced, comprehensive and extensible quality evaluation algorithms, including null-value, value-range, format, logic, duplicate-data, referential-integrity, outlier, timeliness, missing-data, fluctuation and balance checks. These cover the definition of the various rules needed in data center construction and data governance, establish a scientific assessment model, and finally assess the urban data center comprehensively along the six dimensions of integrity, normativeness, consistency, accuracy, uniqueness and relevance.
Abnormal data are identified through the data quality assessment, inspected, and then subjected to data quality repair. The repair methods are manual repair, rule repair and deep repair. In manual repair, an operator corrects the abnormal data in the database and enters the correct information by keyboard. In rule repair, the correct data are entered according to the general rules. Deep repair is a deep-learning-based abnormal data repair technique: it makes full use of mature Hadoop/Spark big data technology and uses deep learning algorithms to automate large-scale data governance, screening and repairing abnormal data. Abnormal data are repaired with deep learning methods, which mainly comprise mean imputation, the K-nearest-neighbor method, regression, maximum likelihood estimation and multiple imputation; combined with manual review, this achieves efficient and accurate data governance and improves data quality.
One concrete application of the present embodiment are as follows:
The data quality assessment model comprises entity tables, association relationships, and rule
descriptions. The entity table names are selected from the database. Let the database contain table1,
table2, table3, table4, and table5, where table1 has columns C11 to C18, table2 has columns C21 to C28,
table3 has columns C31 to C38, table4 has columns C41 to C48, and table5 has columns C51 to C58; the
data volumes of table1, table2, table3, and table4 are 1000, 800, 1500, and 2000 respectively.
An association relationship is the relationship between a master table and a child table. The rule
descriptions are divided into six groups of rule types: null value check, value range check,
normalization check, logic check, duplicate data check, and referential integrity check.
Let the number of null value check rules be 3, applied respectively to check columns C11, C12, and C13
of table1, check columns C24 and C25 of table2, and check column C38 of table3. Entering
"select count(*) from table1 where C11 is null or C12 is null or C13 is null" gives, for check columns
C11, C12, and C13 of table1, a problem level of important (level 5) and a weight of 9; the check
condition is that all columns are non-null, and the number of records violating the rule is 100.
Entering "select count(*) from table2 where C24 is null and C25 is null" gives, for check columns C24
and C25 of table2, a problem level of serious (level 3) and a weight of 7; the check condition is that
at least one column is non-null, and the number of records violating the rule is 80. Entering
"select count(*) from table3 where C38 is null" gives, for check column C38 of table3, a problem level
of general (level 1) and a weight of 3; the check condition is that the column is non-null, and the
number of records violating the rule is 50. The weighted value of the null value check is 2400, the
total weight is 28, and the weighted average is 85.71428571.
Let the number of value range check rules be 2, applied respectively to check column C18 of table1 and
check column C34 of table3. Entering
"select count(*) from table1 where not (C18 > minimum value and C18 <= maximum value)" gives, for check
column C18 of table1, a problem level of serious (level 3) and a weight of 6; the number of records
violating the rule is 300. Entering
"select count(*) from table3 where not (C34 > minimum value and C34 <= maximum value)" gives, for check
column C34 of table3, a problem level of general (level 1) and a weight of 2; the number of records
violating the rule is 350. The weighted value of the value range check is 3750, the total weight is 12,
and the weighted average is 312.5.
Let the number of normalization check rules be 3, applied respectively to check column C15 of table1
and check columns C44 and C48 of table4. Entering
"select count(*) from table1 where not (C15 regexp 'ID card regular expression')" gives, for check
column C15 of table1, a problem level of serious (level 3) and a weight of 8; the check condition is
ID card number, and the number of records violating the rule is 7. Entering
"select count(*) from table4 where not (C44 regexp 'mobile phone number regular expression')" gives, for
check column C44 of table4, a problem level of general (level 1) and a weight of 3; the check condition
is mobile phone number, and the number of records violating the rule is 9. Entering
"select count(*) from table4 where not (C48 regexp 'mailbox regular expression')" gives, for check
column C48 of table4, a problem level of general (level 1) and a weight of 4; the check condition is
mailbox, and the number of records violating the rule is 20. The weighted value of the normalization
check is 213, the total weight is 20, and the weighted average is 10.65.
Let the number of logic check rules be 2, applied respectively to table1 and table2. The check formula
for table1 is "case when C14 is null then C15 in ('A', 'B') end"; entering
"select count(*) from table1 where not (case when C14 is null then C15 in ('A', 'B') end)" gives, for
table1, a problem level of serious (level 3) and a weight of 7; the number of records violating the rule
is 10. The check formula for table2 is "if(C24 is not null, C24 in ('1', '2'), 1=1)"; entering
"select count(*) from table2 where not if(C24 is not null, C24 in ('1', '2'), 1=1)" gives, for table2,
a problem level of general (level 1) and a weight of 1; the number of records violating the rule is 30.
The weighted value of the logic check is 160, the total weight is 12, and the weighted average is
13.3333333.
Let the number of duplicate data check rules be 3, applied respectively to check columns C11, C12, and
C13 of table2, check columns C34 and C35 of table3, and check columns C41 and C42 of table4. Entering
"select count(*) from table2 group by C11, C12, C13 having count(*) > 1" gives, for check columns C11,
C12, and C13 of table2, a problem level of serious (level 3) and a weight of 6; the number of records
violating the rule is 100. Entering
"select count(*) from table3 group by C34, C35 having count(*) > 1" gives, for check columns C34 and C35
of table3, a problem level of serious (level 3) and a weight of 7; the number of records violating the
rule is 150. Entering "select count(*) from table4 group by C41, C42 having count(*) > 1" gives, for
check columns C41 and C42 of table4, a problem level of general (level 1) and a weight of 3; the number
of records violating the rule is 130. The weighted value of the duplicate data check is 2920, the total
weight is 23, and the weighted average is 126.9565217.
Let the number of referential integrity check rules be 3, applied respectively to check column C18 of
table1, check column C28 of table2, and check column C38 of table3. Entering
"select count(*) from table1 where C18 not in ('A', 'B', 'C')" gives, for check column C18 of table1, a
problem level of important (level 5) and a weight of 9; the number of records violating the rule is 130.
Entering "select count(*) from table2 where C28 not in ('A', 'B', 'C')" gives, for check column C28 of
table2, a problem level of serious (level 3) and a weight of 6; the number of records violating the rule
is 170. Entering "select count(*) from table3 where C38 not in ('A', 'B', 'C')" gives, for check column
C38 of table3, a problem level of general (level 1) and a weight of 3; the number of records violating
the rule is 190. The weighted value of the referential integrity check is 4110, the total weight is 27,
and the weighted average is 152.2222222.
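The check statements above can be tried out on a small in-memory database. The sketch below uses Python's standard `sqlite3` module purely for illustration; the embodiment does not name a database engine, and the table contents, the range bounds 0 and 10, and the helper name `violations` are assumptions made for the example. Because a "group by ... having" query returns one row per duplicated group, the sketch wraps it in an outer count to obtain a single number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; only the columns used by the checks are created.
conn.executescript("""
    create table table1 (C11 text, C12 text, C13 text, C18 integer);
    insert into table1 values
        ('a', 'b', 'c', 5),
        ('a', null, 'c', 50),
        (null, null, null, 7),
        ('a', 'b', 'c', 0);
    create table table2 (C28 text);
    insert into table2 values ('A'), ('D'), ('B'), (null);
""")


def violations(sql, params=()):
    """Run a counting query and return its single integer result."""
    return conn.execute(sql, params).fetchone()[0]


# Null value check: every one of C11, C12, C13 must be non-null.
null_count = violations(
    "select count(*) from table1 "
    "where C11 is null or C12 is null or C13 is null"
)

# Value range check: C18 must lie in (minimum, maximum].
range_count = violations(
    "select count(*) from table1 where not (C18 > ? and C18 <= ?)", (0, 10)
)

# Duplicate data check: number of (C11, C12, C13) groups occurring more than once.
duplicate_groups = violations(
    "select count(*) from ("
    "  select 1 from table1 group by C11, C12, C13 having count(*) > 1"
    ")"
)

# Referential integrity check: C28 must be one of the allowed codes.
ref_count = violations(
    "select count(*) from table2 where C28 not in ('A', 'B', 'C')"
)
```

Note that a null C28 is not counted by the "not in" query, since the comparison evaluates to null rather than true; null values are the concern of the null value check.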
The comprehensive score is calculated as 100 - (SUM(weighted averages of the six checks, from the null
value check to the referential integrity check) / SUM(data volumes of table1 to table4)) * 100
= 100 - (SUM(85.71428571:152.2222222) / SUM(1000:2000)) * 100 = 100 - (701.37636 / 5300) * 100
= 86.76648372.
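The arithmetic of this embodiment can be reproduced with a short script. The text does not state how the weighted value and total weight are derived from the level, weight, and violation count of each rule; every figure above is reproduced if each rule's effective weight is taken as its level plus its weight, so the sketch adopts that as an assumption inferred from the numbers. The names `Rule`, `weighted_average`, and `comprehensive_score` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    level: int       # problem level: 5 important, 3 serious, 1 general
    weight: int      # weight assigned to the rule
    violations: int  # number of records violating the rule


def weighted_average(rules):
    """Return (weighted value, total weight, weighted average) for one check.
    Assumption: a rule's effective weight is level + weight."""
    total_weight = sum(r.level + r.weight for r in rules)
    weighted_value = sum(r.violations * (r.level + r.weight) for r in rules)
    return weighted_value, total_weight, weighted_value / total_weight


# Figures taken from the embodiment.
checks = {
    "null value": [Rule(5, 9, 100), Rule(3, 7, 80), Rule(1, 3, 50)],
    "value range": [Rule(3, 6, 300), Rule(1, 2, 350)],
    "normalization": [Rule(3, 8, 7), Rule(1, 3, 9), Rule(1, 4, 20)],
    "logic": [Rule(3, 7, 10), Rule(1, 1, 30)],
    "duplicate data": [Rule(3, 6, 100), Rule(3, 7, 150), Rule(1, 3, 130)],
    "referential integrity": [Rule(5, 9, 130), Rule(3, 6, 170), Rule(1, 3, 190)],
}
data_volumes = [1000, 800, 1500, 2000]  # table1 to table4


def comprehensive_score(checks, data_volumes):
    """100 - (sum of weighted averages / sum of data volumes) * 100."""
    total = sum(weighted_average(rules)[2] for rules in checks.values())
    return 100 - total / sum(data_volumes) * 100


score = comprehensive_score(checks, data_volumes)
print(f"{score:.8f}")
```

With these inputs the null value check yields a weighted value of 2400 over a total weight of 28, and the comprehensive score agrees with the 86.76648372 stated in the embodiment.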
By binding the database systems opened to this data platform, null value, value range, normalization,
logic, referential, and duplicate data checks are performed on the data fields; data quality is
comprehensively assessed along the six dimensions of data integrity, relevance, uniqueness, accuracy,
consistency, and normalization, and a data quality assessment report is generated; the user then applies
manual repair, rule repair, or deep learning repair to the data. An integrated solution is provided for
building and governing a smart-city big data center: from the initial formation of data through standard
setting, secure storage, exchange and sharing, and application analysis to scientific decision-making
and precise prediction, a complete city big data governance and management system is formed and a city
big data platform is built. From data standards, management, and operation to decision-making, a
complete data ecosystem chain is formed, which continually extends the boundary of the data, raises data
quality and data utilization, and builds a healthy city ecosystem. This helps the government break down
internal data barriers, revitalize data assets, increase the value of data, provide unified intelligent
data services externally, and further mine and release the value dividend of big data.
In the description of this specification, reference to the terms "one embodiment", "example", "specific
example", and the like means that a particular feature, structure, material, or characteristic described
in connection with that embodiment or example is included in at least one embodiment or example of the
invention. In this specification, schematic uses of these terms do not necessarily refer to the same
embodiment or example. Moreover, the particular features, structures, materials, or characteristics
described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to help illustrate the
invention. The preferred embodiments do not describe every detail, nor do they limit the invention to
the specific embodiments described. Obviously, many modifications and variations can be made in light of
the content of this specification. These embodiments were chosen and described in detail in order to
better explain the principles and practical application of the invention, so that those skilled in the
art can better understand and use the invention. The invention is limited only by the claims and their
full scope and equivalents.
Claims (5)
1. A government data quality assessment technique based on big data technology, characterized in that
the specific steps are as follows:
Step 1: establish the database
The database comprises a basic library and a theme library. The construction scheme of the basic library
addresses the problems existing in current government data: following the approach of overall planning
with unified construction and sharing, and taking the departments that are the main data sources as the
basis, it integrates, by means of data collection and exchange, processing, information integration, and
mining analysis, the data of commissions, offices, and bureaus such as human resources and social
security, civil affairs, credit, public security, industry and commerce, health, education, and
transportation; it plans the standards and specification system, builds the basic library, and on that
basis provides data sharing services to government departments and the public; the corresponding clients
include the Development and Reform Commission, the Economic and Information Commission, and the Big Data
Bureau;
The theme library is based on overall strategic planning and object-oriented methodology; in line with
the business characteristics of the client, and by means of data collection and exchange, data
integration, and association analysis, distinctive theme libraries are established to revitalize data
assets and lay the foundation for building innovative thematic applications, such as the legal person
library, special equipment library, food library, drug library, and license library of the Market
Supervision Administration; the population library, license library, criminal investigation library,
public security administration library, and entry-exit library of the Public Security Bureau; and the
population library, social organization library, elderly library, welfare library, and marriage library
of the Civil Affairs Bureau;
Step 2: data quality assessment
(1) General rule management
The general rule management includes five groups of rules: general, network, date, character, and
numerical value. The general group includes ID card, mobile phone number, mailbox, postcode, and
fixed-line telephone. The regular expression of the ID card is
^[1-9]\d{7}((0\d)|(1[0-2]))(([0|1|2]\d)|3[0-1])\d{3}$|^[1-9]\d{5}[1-9]\d{3}((0\d)|(1[0-2]))(([0|1|2]\d)|3[0-1])\d{3}([0-9]|X)$,
The rule description of the ID card is
Chinese second-generation ID card, e.g. 420106198311136666: fixed length of 18, the first 17 characters
are digits and the last is a digit or the letter X, and it must be a legal, valid ID card number;
first-generation ID card, e.g. 420106831113666: fixed length of 15, and characters 7 to 12 are the
six-digit date;
The regular expression of the mobile phone number is
^1([38][0-9]|4[579]|5[0-3,5-9]|6[6]|7[0135678]|9[89])\d{8}$,
The rule description of the mobile phone number is
e.g. 13666666666: begins with the digit 1 and has a fixed length of 11;
The regular expression of the mailbox is
^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$,
The rule description of the mailbox is
e.g. 123@mail.com: the mailbox name may contain only English letters, digits, and underscores, must not
begin with an underscore, and ends with characters such as .com, .cn, or .edu;
The regular expression of the postcode is [1-9]\d{5}(?!\d), and the rule description of the postcode is
must not begin with 0; 6 digits in total;
The regular expression of the fixed-line telephone is \d{3}-\d{8}|\d{4}-\d{7}, and the rule description
of the fixed-line telephone is
e.g. 027-88880808-1, where 027 is the area code and 1 is the extension number, separated by "-"; the
area code and the extension number may be omitted;
The network group includes IPv4 address, IPv6 address, and MAC address. The regular expression of the
IPv4 address is
^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$,
The rule description of the IPv4 address is
e.g. 000.000.000.000: composed of four values from 0 to 255, separated by ".";
The regular expression of the IPv6 address is
^([\da-fA-F]{1,4}:){7}[\da-fA-F]{1,4}$,
The rule description of the IPv6 address is
e.g. CDCD:910A:222:9:8475:11:390:2020: composed of eight groups of four hexadecimal digits, separated by
":"; abbreviated and mixed notations are also supported, but the standard notation is recommended;
The regular expression of the MAC address is
[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5},
The rule description of the MAC address is
e.g. 00-00-00-00-00-00: composed of six two-digit hexadecimal numbers, separated by "-";
The date group includes YYYY.MM.DD, YYYYMMDD, YYYY/MM/DD, YYYY年MM月DD日, YYYY, and YYYYMM, where YYYY
is the year, MM is the month, and DD is the day; the regular expression of YYYY/MM/DD is
(\d{4})/(\d{1,2})/(\d{1,2});
The numerical value group includes non-negative integer, integer, non-negative floating-point number,
floating-point number, integer with percent sign, floating-point number with percent sign, integer with
per-mille sign, and floating-point number with per-mille sign. The regular expression of the
non-negative integer is ^[1-9]\d*|0$, and its rule description is a character string in non-negative
integer format, e.g. 28;
The regular expression of the integer is ^-?[1-9]\d*$, and its rule description is a character string in
integer format;
The regular expression of the non-negative floating-point number is ^\d+(\.\d+)?$, and its rule
description is a character string in non-negative floating-point format;
The regular expression of the floating-point number is ^(-?\d+)(\.\d+)?$, and its rule description is a
character string in floating-point format;
(2) Data quality model
A data quality model is established according to the general rule management and the association
relationships; the data quality model is a DQAF-based data quality assessment model, which comprises
entity tables, association relationships, and rule descriptions; the entity table names are selected
from the database, an association relationship is the relationship between a master table and a child
table, and the rule descriptions are divided into six groups of rule types: null value check, value
range check, normalization check, logic check, duplicate data check, and referential integrity check;
(3) Quality monitoring task
According to the name of the data quality model, all data of the data quality model are exported, and
the data quality model is evaluated according to the quality model name, quality model description,
execution strategy, most recent execution status, most recent execution time, and so on, completing the
quality inspection task;
(4) Quality monitoring report
According to the quality inspection task, a quality inspection report is generated;
(5) Quality assessment report
According to the content of the quality inspection report, data quality is comprehensively assessed
along six dimensions: data integrity, relevance, uniqueness, accuracy, consistency, and normalization,
and a quality assessment report of the data in the database is generated by database category and
database name; the quality assessment report includes the quality score, the quality score charts, and
the data quality model score ranking list; the quality score includes the overall quality score, the
number of quality models, and the number of model rules; the quality score charts include a quality
grade trend chart and a data aggregation distribution chart; the data quality model score ranking list
ranks the data quality models by their scores.
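The general rules of step (1) in claim 1 lend themselves to a direct check with a regular expression engine. The following minimal sketch uses Python's standard `re` module and a subset of the claim's expressions written with explicit backslash escapes; the dictionary keys and the helper names `conforms` and `violations` are illustrative and do not come from the claim.

```python
import re

# A subset of the general rules; key names are illustrative.
RULES = {
    "id_card": (
        r"^[1-9]\d{7}((0\d)|(1[0-2]))(([0|1|2]\d)|3[0-1])\d{3}$"
        r"|^[1-9]\d{5}[1-9]\d{3}((0\d)|(1[0-2]))(([0|1|2]\d)|3[0-1])\d{3}([0-9]|X)$"
    ),
    "mobile": r"^1([38][0-9]|4[579]|5[0-3,5-9]|6[6]|7[0135678]|9[89])\d{8}$",
    "ipv4": r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$",
    "ipv6": r"^([\da-fA-F]{1,4}:){7}[\da-fA-F]{1,4}$",
    "date_slash": r"(\d{4})/(\d{1,2})/(\d{1,2})",
    "float": r"^(-?\d+)(\.\d+)?$",
}
COMPILED = {name: re.compile(pattern) for name, pattern in RULES.items()}


def conforms(rule, value):
    """True if `value` satisfies the named rule."""
    return COMPILED[rule].search(value) is not None


def violations(rule, values):
    """Non-null values that break the rule (nulls belong to the null value check)."""
    return [v for v in values if v is not None and not conforms(rule, v)]


bad_floats = violations("float", ["3.14", "-2", "abc", None])
```

One design point worth noting: an alternation without grouping, such as the non-negative integer expression ^[1-9]\d*|0$, also matches strings like "28abc", because the end anchor applies only to the second branch; writing it as ^([1-9]\d*|0)$ avoids this.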
2. The government data quality assessment technique based on big data technology according to claim 1,
characterized in that: in the data quality assessment, the length range of fixed characters can be
customized, and the rule description of the fixed characters supports the wildcards "*" and "?"; "*"
represents multiple arbitrary characters and "?" represents one arbitrary character; for example, for
ABC*, the strings ABC, ABCD, and ABCDE all conform to the expression; for A?C, only strings such as ABC
and ADC conform, while ABDC does not.
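The wildcard semantics of claim 2 can be sketched by translating a fixed-character rule into a regular expression. This is one possible reading, not the claimed implementation: since ABC itself conforms to ABC*, "*" is taken to mean zero or more arbitrary characters, and the function names are illustrative.

```python
import re


def wildcard_to_regex(pattern):
    """Translate '*' (any run of characters) and '?' (exactly one character)
    into a compiled regular expression; all other characters are literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, value):
    """True if the whole of `value` conforms to the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(value) is not None


star_results = [matches("ABC*", s) for s in ("ABC", "ABCD", "ABCDE")]
question_results = [matches("A?C", s) for s in ("ABC", "ADC", "ABDC")]
```

Escaping every literal character with `re.escape` keeps characters such as "." or "+" in a rule from being interpreted as regular expression operators.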
3. The government data quality assessment technique based on big data technology according to claim 1,
characterized in that: in the data quality model, a detection cycle is set manually, and null value,
value range, normalization, logic, referential, and duplicate data checks are performed on the data at
the times given by the detection cycle.
4. The government data quality assessment technique based on big data technology according to claim 1,
characterized in that: the DQAF-based data quality assessment model is built on DQAF and the Chinese
national standard GB/T 25000.24-2017 "Systems and software engineering: Systems and software Quality
Requirements and Evaluation (SQuaRE)", Part 12 (data quality model) and Part 24 (data quality
measurement); general data quality assessment models and thematic data quality assessment models are
established for the urban data center, supporting advanced, comprehensive, and extensible quality
assessment algorithm technologies such as null value check, value range check, normalization check,
logic check, duplicate data check, referential integrity check, outlier check, timeliness check,
missing data check, fluctuation check, and balance check; the model meets the rule definitions required
in data center construction and data governance, establishes a scientific assessment model, and finally
carries out a comprehensive assessment of the urban data center along six dimensions: integrity,
normalization, consistency, accuracy, uniqueness, and relevance.
5. A government abnormal data recovery technique based on big data technology according to claim 1,
characterized in that: according to the data quality assessment, abnormal data is handled: the abnormal
data is examined and data quality repair is carried out on it; the data quality repair methods are
manual repair, rule repair, and deep repair; in the manual repair, an operator updates the abnormal data
in the database and enters the correct data information through a computer keyboard; in the rule repair,
the correct data information is entered according to the general rules; the deep repair is the
deep-learning abnormal data recovery technique, which makes full use of mature Hadoop/Spark big data
technology, achieves automated governance of large-scale data through deep learning algorithms, and both
screens out and repairs abnormal data; the deep learning methods used to repair abnormal data mainly
include mean-value filling, the K-nearest-neighbor method, regression, maximum likelihood estimation,
and multiple imputation, and, combined with manual review, achieve efficient and accurate data
governance and improve data quality.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910156894.6A CN109992576A (en) | 2019-03-01 | 2019-03-01 | A kind of government data quality evaluation and abnormal data recovery technique based on big data technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109992576A true CN109992576A (en) | 2019-07-09 |
Family
ID=67130091
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910156894.6A Pending CN109992576A (en) | 2019-03-01 | 2019-03-01 | A kind of government data quality evaluation and abnormal data recovery technique based on big data technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109992576A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103873298A (en) * | 2014-03-14 | 2014-06-18 | 浪潮通信信息系统有限公司 | Configurable method for automatically monitoring data quality of maintenance-center OMC (Operation and Maintenance Center) northbound interfaces |
CN106855962A (en) * | 2015-12-09 | 2017-06-16 | 星际空间(天津)科技发展有限公司 | A kind of method for building government affairs big data platform |
CN107368957A (en) * | 2017-07-04 | 2017-11-21 | 广西电网有限责任公司电力科学研究院 | A kind of construction method of equipment condition monitoring quality of data evaluation and test system |
CN107491381A (en) * | 2017-07-04 | 2017-12-19 | 广西电网有限责任公司电力科学研究院 | A kind of equipment condition monitoring quality of data evaluating system |
CN107545043A (en) * | 2017-08-09 | 2018-01-05 | 国政通科技股份有限公司 | A kind of data application method and device based on data quality checking |
CN107545349A (en) * | 2016-06-28 | 2018-01-05 | 国网天津市电力公司 | A kind of Data Quality Analysis evaluation model towards electric power big data |
CN108595563A (en) * | 2018-04-13 | 2018-09-28 | 林秀丽 | A kind of data quality management method and device |
CN108615115A (en) * | 2018-05-02 | 2018-10-02 | 山东汇贸电子口岸有限公司 | A kind of implementation method of government data collecting flowchart |
CN108830029A (en) * | 2017-11-29 | 2018-11-16 | 上海海洋大学 | A kind of quality evaluation of typhoon data and restorative procedure |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111400299A (en) * | 2020-06-04 | 2020-07-10 | 成都四方伟业软件股份有限公司 | Method and system for testing fusion quality of multiple data |
CN112035456B (en) * | 2020-08-31 | 2024-05-03 | 重庆长安汽车股份有限公司 | Real-time detection method for user behavior data quality and storage medium |
CN112035456A (en) * | 2020-08-31 | 2020-12-04 | 重庆长安汽车股份有限公司 | Real-time detection method for user behavior data quality and storage medium |
CN112380204A (en) * | 2020-11-16 | 2021-02-19 | 浙江大华技术股份有限公司 | Data quality evaluation method and device |
CN113360548A (en) * | 2021-06-29 | 2021-09-07 | 平安普惠企业管理有限公司 | Data processing method, device, equipment and medium based on data asset analysis |
CN113468158A (en) * | 2021-07-13 | 2021-10-01 | 广域铭岛数字科技有限公司 | Data repair method, system, electronic device and medium |
CN113468158B (en) * | 2021-07-13 | 2023-10-31 | 广域铭岛数字科技有限公司 | Data restoration method, system, electronic equipment and medium |
CN116341987A (en) * | 2023-04-11 | 2023-06-27 | 北京数字政通科技股份有限公司 | Configurable evaluation method and system thereof |
CN116341987B (en) * | 2023-04-11 | 2023-10-31 | 北京数字政通科技股份有限公司 | Configurable evaluation method and system thereof |
CN116777288A (en) * | 2023-06-28 | 2023-09-19 | 广东裕太科技有限公司 | Government system information integration system and application method thereof |
CN116777288B (en) * | 2023-06-28 | 2024-03-12 | 广东裕太科技有限公司 | Government system information integration system and application method thereof |
CN117743310A (en) * | 2023-12-19 | 2024-03-22 | 云宝宝大数据产业发展有限责任公司 | Full-period data management method, system and storage medium |
CN118297444A (en) * | 2024-06-06 | 2024-07-05 | 中国信息通信研究院 | Artificial intelligence-oriented data set quality general assessment method |
CN118503888A (en) * | 2024-07-18 | 2024-08-16 | 北京亚信数据有限公司 | Medical insurance data quality detection method and device, electronic equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190709 |