CN108304464A - Method and device for data cleansing - Google Patents

Method and device for data cleansing

Info

Publication number
CN108304464A
Authority
CN
China
Prior art keywords
cleaning rule
original
rule
object table
data cleansing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711437598.0A
Other languages
Chinese (zh)
Other versions
CN108304464B (en)
Inventor
张毅然
廖惠琳
冯是聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhizhi Heshu Technology Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201711437598.0A
Publication of CN108304464A
Application granted
Publication of CN108304464B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Abstract

A method and device for data cleansing. The method includes: configuring a corresponding cleaning rule for each original table and each target table; establishing mapping relations between the original tables and the target tables; synchronizing the cleaning rules of the original tables and the target tables into the mapping relations; and performing data cleansing using the synchronized cleaning rules. The scheme reduces the manual configuration time needed to add cleaning rules to the mappings from original tables to target tables, and it is simple and less error-prone.

Description

Method and device for data cleansing
Technical field
Embodiments of the present disclosure relate to, but are not limited to, the field of data processing, and in particular to a method and device for data cleansing.
Background
With the spread of the internet, data has become increasingly important: it can support a website's basic operations, help police handle cases, and even be used to predict stock prices. Because data comes from a very wide range of sources, data quality is critical. Data cleansing is an important means of ensuring data quality, and its results directly affect the models built from the data and the conclusions ultimately drawn.
The main data cleansing methods include removing or filling in missing data, removing or correcting data with format or content errors, removing or correcting data with logic errors, and relevance verification. In practice, cleansing is mainly implemented by adding rules. The data table to be processed is usually called the original table, the table produced by processing is called the target table, and the rules are added during the conversion from original table to target table. When the data volume of the original tables and target tables is very large, adding cleaning rules to the mappings from original tables to target tables is tedious and error-prone.
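To make the idea of rule-based cleansing concrete, the following minimal Python sketch models a few of the methods listed above as rules. The rule names (`drop_if_missing`, `fill_missing`, `strip_whitespace`), the dictionary row layout, and the convention that a rule returns `None` to drop a row are all assumptions made for illustration; the patent does not prescribe any particular rule representation.

```python
from typing import Callable, List, Optional

Row = dict
# Assumed convention: a rule returns the cleaned row, or None to drop it.
Rule = Callable[[Row], Optional[Row]]


def drop_if_missing(field: str) -> Rule:
    """Remove rows in which `field` is absent or empty."""
    def rule(row: Row) -> Optional[Row]:
        return row if row.get(field) not in (None, "") else None
    return rule


def fill_missing(field: str, default) -> Rule:
    """Fill in a default when `field` is absent or empty."""
    def rule(row: Row) -> Optional[Row]:
        if row.get(field) in (None, ""):
            return {**row, field: default}
        return row
    return rule


def strip_whitespace(field: str) -> Rule:
    """Correct a simple format error: surrounding whitespace in a string field."""
    def rule(row: Row) -> Optional[Row]:
        value = row.get(field)
        if isinstance(value, str):
            return {**row, field: value.strip()}
        return row
    return rule


def apply_rules(rows: List[Row], rules: List[Rule]) -> List[Row]:
    """Apply the rules in order to every row, skipping rows that a rule drops."""
    cleaned = []
    for row in rows:
        current: Optional[Row] = row
        for rule in rules:
            current = rule(current)
            if current is None:
                break
        if current is not None:
            cleaned.append(current)
    return cleaned
```

Each rule is an ordinary function, so a set of rules attached to a table is just a list that can be stored, copied, and combined with other lists.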
Summary of the invention
Embodiments of the present invention provide a method and device for data cleansing, to reduce the manual configuration time needed to add cleaning rules to the mappings from original tables to target tables.
A method of data cleansing, including:
configuring a corresponding cleaning rule for each original table and each target table;
establishing mapping relations between the original tables and the target tables;
synchronizing the cleaning rules of the original tables and the target tables into the mapping relations;
performing data cleansing using the synchronized cleaning rules.
Optionally, after the cleaning rules of the original tables and the target tables are synchronized into the mapping relations, the method further includes:
modifying the synchronized cleaning rules.
A device for data cleansing, including:
a configuration module, for configuring a corresponding cleaning rule for each original table and each target table;
an establishing module, for establishing mapping relations between the original tables and the target tables;
a synchronization module, for synchronizing the cleaning rules of the original tables and the target tables into the mapping relations;
a cleaning module, for performing data cleansing using the synchronized cleaning rules.
Optionally, the device further includes:
a modification module, for modifying the cleaning rules synchronized by the synchronization module.
A device for data cleansing, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the program:
configuring a corresponding cleaning rule for each original table and each target table;
establishing mapping relations between the original tables and the target tables;
synchronizing the cleaning rules of the original tables and the target tables into the mapping relations;
performing data cleansing using the synchronized cleaning rules.
In summary, embodiments of the present invention provide a method and device for data cleansing that can reduce the manual configuration time needed to add cleaning rules to the mappings from original tables to target tables.
Brief description of the drawings
Fig. 1 is a flow chart of a data cleansing method according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of a data cleansing device according to an embodiment of the present disclosure.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the invention are described in detail below with reference to the drawings. Note that, where no conflict arises, the embodiments in this application and the features in those embodiments may be combined with one another arbitrarily.
Fig. 1 is a flow chart of a data cleansing method according to an embodiment of the present disclosure. As shown in Fig. 1, the method of this embodiment includes:
Step 11: configure a corresponding cleaning rule for each original table and each target table;
Step 12: establish mapping relations between the original tables and the target tables;
Step 13: synchronize the cleaning rules of the original tables and the target tables into the mapping relations;
Step 14: perform data cleansing using the synchronized cleaning rules.
The method of this embodiment first adds cleaning rules to the original tables and the target tables separately, then establishes the mapping relations between the original tables and the target tables, and then synchronizes the rules on the original tables and target tables into those table-to-table mapping relations. This reduces the manual configuration time for adding cleaning rules to the mappings from original tables to target tables. Finally, the table-to-table mapping relations can be modified to add personalized rules, which completes the setup of the cleaning rules from original tables to target tables.
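Steps 11 to 14 can be sketched end to end in Python as follows. This is only one possible reading under stated assumptions: rules are modelled as functions on dictionary rows, the two example rules (`strip_name`, `upper_country`) are hypothetical, and applying the original table's rules before the target table's rules is an ordering chosen here, not one the patent specifies.

```python
from typing import Callable, Dict, List, Tuple

Row = dict
Rule = Callable[[Row], Row]


def strip_name(row: Row) -> Row:
    """Hypothetical rule configured on an original table."""
    return {**row, "name": row["name"].strip()}


def upper_country(row: Row) -> Row:
    """Hypothetical rule configured on a target table."""
    return {**row, "country": row["country"].upper()}


# Step 11: configure cleaning rules per table.
origin_rules: Dict[str, List[Rule]] = {"origin_table_1": [strip_name]}
target_rules: Dict[str, List[Rule]] = {"target_table_1": [upper_country]}

# Step 12: establish the mapping relations (original table -> target table).
mappings: List[Tuple[str, str]] = [("origin_table_1", "target_table_1")]


# Step 13: synchronize the table-level rules into each mapping relation.
def synchronize(mappings, origin_rules, target_rules):
    # List concatenation builds a new list, so each mapping owns its rules.
    return {(o, t): origin_rules[o] + target_rules[t] for o, t in mappings}


mapping_rules = synchronize(mappings, origin_rules, target_rules)


# Step 14: cleanse data using the synchronized rules of one mapping.
def cleanse(rows: List[Row], rules: List[Rule]) -> List[Row]:
    for rule in rules:
        rows = [rule(row) for row in rows]
    return rows


raw_rows = [{"name": " li ", "country": "cn"}]
cleaned = cleanse(raw_rows, mapping_rules[("origin_table_1", "target_table_1")])
```

Because the rules are attached once per table and copied into the mappings mechanically, nobody has to retype them for each mapping.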
Suppose a user needs to convert N original tables into M target tables. Mappings from the N original tables to the M target tables must be established, and cleaning rules must be added to the mapping between each pair of tables. A specific implementation proceeds through the following steps in order:
Step 101: add cleaning rules to each of the N original tables.
Cleaning rules are added to each of the N original tables (origin_table_1, origin_table_2, ..., origin_table_N). The cleaning rules are rule_origin_1, rule_origin_2, ..., rule_origin_N respectively, so rules are added N times, as shown in Table 1.
Table 1
origin_table_1 rule_origin_1
origin_table_2 rule_origin_2
…… ……
origin_table_N rule_origin_N
Step 102: add cleaning rules to each of the M target tables.
Cleaning rules are added to each of the M target tables (target_table_1, target_table_2, ..., target_table_M). The cleaning rules are rule_target_1, rule_target_2, ..., rule_target_M respectively, so rules are added M times, as shown in Table 2.
Table 2
target_table_1 target_table_2 …… target_table_M
rule_target_1 rule_target_2 …… rule_target_M
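The per-table configuration of steps 101 and 102 amounts to two lookup tables, one with N entries and one with M. A minimal Python sketch, assuming the values N = 3 and M = 2 and using the placeholder rule names from Tables 1 and 2 as plain strings:

```python
N, M = 3, 2  # assumed example sizes

# Step 101: one rule entry per original table (N additions).
origin_rules = {f"origin_table_{i}": f"rule_origin_{i}" for i in range(1, N + 1)}

# Step 102: one rule entry per target table (M additions).
target_rules = {f"target_table_{j}": f"rule_target_{j}" for j in range(1, M + 1)}

total_additions = len(origin_rules) + len(target_rules)  # N + M
```

In a real system each value would be a rule definition rather than a name; the names are kept here only to mirror the tables.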
Step 103: establish, between the N original tables and the M target tables, the mapping relations by which original tables generate target tables; there are at most N*M of them.
Between the N original tables (origin_table_1, origin_table_2, ..., origin_table_N) and the M target tables (target_table_1, target_table_2, ..., target_table_M), the mapping relations by which original tables generate target tables are established; there are N*M of them, as shown in Table 3.
Table 3
Step 104: synchronize the rules on the N original tables and the M target tables into the N*M mapping relations between original tables and target tables.
The rules on the N original tables (origin_table_1, origin_table_2, ..., origin_table_N) and the M target tables (target_table_1, target_table_2, ..., target_table_M) are synchronized into the N*M mapping relations between original tables and target tables, as shown in Table 4.
Table 4
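One plausible reading of steps 103 and 104 is sketched below in Python. It assumes the full N*M case, in which every original table maps to every target table, and it represents the synchronized rules of a mapping as a two-element list (the original table's rule followed by the target table's rule). Both the data layout and the ordering are assumptions; the example sizes N = 2 and M = 3 are arbitrary.

```python
from itertools import product

N, M = 2, 3  # assumed example sizes

origin_rules = {f"origin_table_{i}": f"rule_origin_{i}" for i in range(1, N + 1)}
target_rules = {f"target_table_{j}": f"rule_target_{j}" for j in range(1, M + 1)}

# Step 103: every (original table, target table) pair, at most N*M of them.
mappings = list(product(origin_rules, target_rules))

# Step 104: copy the table-level rules into each mapping relation.
synced = {
    (origin, target): [origin_rules[origin], target_rules[target]]
    for origin, target in mappings
}
```

If only some original tables feed a given target table, `mappings` would simply be a subset of the full product, and the synchronization step stays the same.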
Optionally, step 105: modify the table-level mapping relations between the N*M pairs of original tables and target tables to add personalized rules.
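A personalized rule in step 105 affects a single mapping relation, so it should not leak into the table-level rules or into other mappings. The Python sketch below illustrates this under assumptions: rules are plain strings, the helper `add_custom_rule` and the rule name `rule_custom_dedupe` are hypothetical, and each mapping holds its own copy of the synchronized rules.

```python
origin_rules = {"origin_table_1": ["rule_origin_1"]}
target_rules = {
    "target_table_1": ["rule_target_1"],
    "target_table_2": ["rule_target_2"],
}

# Synchronized rules; list concatenation gives each mapping its own list.
synced = {
    (origin, target): origin_rules[origin] + target_rules[target]
    for origin in origin_rules
    for target in target_rules
}


def add_custom_rule(synced, origin, target, rule):
    """Append a personalized rule to exactly one mapping relation."""
    synced[(origin, target)].append(rule)


add_custom_rule(synced, "origin_table_1", "target_table_2", "rule_custom_dedupe")
```

After the call, only the mapping from `origin_table_1` to `target_table_2` carries the extra rule; the other mapping and the table-level configuration are unchanged.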
As the above example shows, the data cleansing method of this embodiment reduces the number of configuration operations and the time needed to set up cleaning rules during data cleansing.
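The saving can be illustrated with a small calculation. The comparison below assumes that, without synchronization, a user would configure rules by hand once for each of the N*M mappings, whereas with synchronization rules are added once per table (N + M times). That baseline is an assumption made for illustration; the patent states only that the number of configurations and the time are reduced.

```python
def manual_configurations(n: int, m: int) -> int:
    """Assumed baseline: rules are configured by hand on each of the n*m mappings."""
    return n * m


def synced_configurations(n: int, m: int) -> int:
    """With synchronization: rules are added once per original and per target table."""
    return n + m


n, m = 10, 8  # assumed example sizes
saved = manual_configurations(n, m) - synced_configurations(n, m)
```

With 10 original tables and 8 target tables, this is 18 rule additions instead of 80. The gap widens as N and M grow, and for very small cases (for example N = M = 2) the two counts coincide.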
Fig. 2 is a schematic diagram of a data cleansing device according to an embodiment of the present disclosure. As shown in Fig. 2, the device of this embodiment includes:
a configuration module, for configuring a corresponding cleaning rule for each original table and each target table;
an establishing module, for establishing mapping relations between the original tables and the target tables;
a synchronization module, for synchronizing the cleaning rules of the original tables and the target tables into the mapping relations;
a cleaning module, for performing data cleansing using the synchronized cleaning rules.
In one embodiment, the data cleansing device may also include:
a modification module, for modifying the cleaning rules synchronized by the synchronization module.
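The module structure can be mirrored in software as one class with a method per module. The Python sketch below is an assumption-laden illustration: the class name `DataCleansingDevice`, its method names, and the representation of rules as functions on dictionary rows are all hypothetical and are not taken from the patent.

```python
class DataCleansingDevice:
    """Hypothetical software counterpart of the modules described above."""

    def __init__(self):
        self.table_rules = {}    # table name -> list of rules
        self.mappings = []       # (original table, target table) pairs
        self.mapping_rules = {}  # pair -> synchronized rule list

    def configure(self, table, rules):
        """Configuration module: attach cleaning rules to one table."""
        self.table_rules[table] = list(rules)

    def establish(self, origin, target):
        """Establishing module: record an original-to-target mapping."""
        self.mappings.append((origin, target))

    def synchronize(self):
        """Synchronization module: copy table rules into every mapping."""
        for origin, target in self.mappings:
            self.mapping_rules[(origin, target)] = (
                self.table_rules.get(origin, []) + self.table_rules.get(target, [])
            )

    def modify(self, origin, target, rule):
        """Modification module: add a personalized rule to one mapping."""
        self.mapping_rules[(origin, target)].append(rule)

    def cleanse(self, origin, target, rows):
        """Cleaning module: apply the mapping's rules to the rows in order."""
        for rule in self.mapping_rules[(origin, target)]:
            rows = [rule(row) for row in rows]
        return rows


device = DataCleansingDevice()
device.configure("origin_table_1", [lambda r: {**r, "name": r["name"].strip()}])
device.configure("target_table_1", [lambda r: {**r, "name": r["name"].title()}])
device.establish("origin_table_1", "target_table_1")
device.synchronize()
device.modify("origin_table_1", "target_table_1", lambda r: {**r, "checked": True})
output = device.cleanse("origin_table_1", "target_table_1", [{"name": "  ada lovelace "}])
```

Keeping the synchronized rules in a separate dictionary means a later `modify` call changes only the targeted mapping and leaves the table-level configuration intact.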
This embodiment also provides a data cleansing device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the program:
configuring a corresponding cleaning rule for each original table and each target table;
establishing mapping relations between the original tables and the target tables;
synchronizing the cleaning rules of the original tables and the target tables into the mapping relations;
performing data cleansing using the synchronized cleaning rules.
The device of this embodiment can reduce the manual configuration time needed to add cleaning rules to the mappings from original tables to target tables, and it is simple and less error-prone.
Embodiments of the present invention also provide a computer-readable storage medium storing computer-executable instructions which, when executed, implement the data cleansing method.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method can be completed by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments can also be implemented using one or more integrated circuits. Accordingly, each module or unit in the above embodiments can be implemented in hardware or as a software function module. The present invention is not limited to any particular combination of hardware and software.
The above are only preferred embodiments of the present invention. The invention may of course have other embodiments. Without departing from the spirit and essence of the invention, those skilled in the art can make various corresponding changes and variations in accordance with the invention, but all such changes and variations shall fall within the scope of protection of the appended claims.

Claims (5)

1. A method of data cleansing, comprising:
configuring a corresponding cleaning rule for each original table and each target table;
establishing mapping relations between the original tables and the target tables;
synchronizing the cleaning rules of the original tables and the target tables into the mapping relations;
performing data cleansing using the synchronized cleaning rules.
2. The method of claim 1, characterized in that, after the cleaning rules of the original tables and the target tables are synchronized into the mapping relations, the method further comprises:
modifying the synchronized cleaning rules.
3. A device for data cleansing, characterized by comprising:
a configuration module, for configuring a corresponding cleaning rule for each original table and each target table;
an establishing module, for establishing mapping relations between the original tables and the target tables;
a synchronization module, for synchronizing the cleaning rules of the original tables and the target tables into the mapping relations;
a cleaning module, for performing data cleansing using the synchronized cleaning rules.
4. The device of claim 3, characterized in that the device further comprises:
a modification module, for modifying the cleaning rules synchronized by the synchronization module.
5. A device for data cleansing, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the program:
configuring a corresponding cleaning rule for each original table and each target table;
establishing mapping relations between the original tables and the target tables;
synchronizing the cleaning rules of the original tables and the target tables into the mapping relations;
performing data cleansing using the synchronized cleaning rules.
CN201711437598.0A 2017-12-26 2017-12-26 Data cleaning method and device Active CN108304464B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711437598.0A CN108304464B (en) 2017-12-26 2017-12-26 Data cleaning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711437598.0A CN108304464B (en) 2017-12-26 2017-12-26 Data cleaning method and device

Publications (2)

Publication Number Publication Date
CN108304464A (en) 2018-07-20
CN108304464B (en) 2021-01-29

Family

ID=62867470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711437598.0A Active CN108304464B (en) 2017-12-26 2017-12-26 Data cleaning method and device

Country Status (1)

Country Link
CN (1) CN108304464B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120266254A1 (en) * 2010-12-14 2012-10-18 International Business Machines Corporation De-Identification of Data
CN102902750A (en) * 2012-09-20 2013-01-30 浪潮齐鲁软件产业有限公司 Universal data extraction and conversion method
US20130185349A1 (en) * 2002-09-06 2013-07-18 Oracle International Corporation Method and apparatus for a multiplexed active data window in a near real-time business intelligence system
CN103593352A (en) * 2012-08-15 2014-02-19 阿里巴巴集团控股有限公司 Method and device for cleaning mass data
CN104537103A (en) * 2015-01-12 2015-04-22 用友医疗卫生信息系统有限公司 Data processing method and device
CN105069033A (en) * 2015-07-22 2015-11-18 北京京东尚科信息技术有限公司 Method and device for creating database table model
CN105095327A (en) * 2014-05-23 2015-11-25 深圳市珍爱网信息技术有限公司 Distributed ELT system and scheduling method
CN105512283A (en) * 2015-12-04 2016-04-20 国网江西省电力公司信息通信分公司 Data quality management and control method and device
CN107229662A (en) * 2016-03-25 2017-10-03 阿里巴巴集团控股有限公司 Data cleaning method and device
CN107239581A (en) * 2017-07-07 2017-10-10 小草数语(北京)科技有限公司 Data cleaning method and device
CN107506383A (en) * 2017-07-25 2017-12-22 中国建设银行股份有限公司 A kind of audit data processing method and computer equipment

Also Published As

Publication number Publication date
CN108304464B (en) 2021-01-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220602

Address after: 15, second floor, east side of clean coal workshop, No. 68, Shijingshan Road, Shijingshan District, Beijing 100043 (cluster registration)

Patentee after: Beijing Zhizhi Heshu Technology Co.,Ltd.

Address before: 102218 5th floor, building 1, China Coal Construction Group building, 398 Zhongdong Road, Dongxiaokou Town, Changping District, Beijing

Patentee before: MININGLAMP SOFTWARE SYSTEMS Co.,Ltd.
