CN102411569A - Database conversion and cleaning information processing method - Google Patents
Database conversion and cleaning information processing method
- Publication number
- CN102411569A
- Authority
- CN
- China
- Prior art keywords
- update
- temp
- target matrix
- target
- database
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a database conversion and cleaning information processing method comprising the following steps: 1) connecting a target database to a data source; 2) selecting, in the target database, a target data table to be cleaned; 3) selecting an update mode, and executing step 4) for an incremental update or step 10) for a full update; 4) obtaining the maximum update time last_update in the target data table, with last_update defaulting to a preset time if the target data table is empty; 5) filtering all records in the data source whose update time is greater than last_update into a temporary table temp_table; 6) removing duplicate records from temp_table using the constraint fields of the target data table; 7) comparing the target data table with temp_table to obtain the records in temp_table that already exist in the target data table; and so on. Compared with the prior art, the method effectively avoids duplicated and omitted data during data cleaning and ensures the consistency and completeness of the data.
Description
Technical field
The present invention relates to database technology, and in particular to a database conversion and cleaning information processing method.
Background art
Data cleaning and conversion, also called ETL (Extract, Transform, Load), is a problem that frequently has to be solved in the database field, and especially in data warehousing. ETL extracts data from distributed, heterogeneous data sources, such as relational data and flat data files, into a temporary intermediate layer, where the data are cleaned, transformed, and integrated. Finally, the data are loaded into a target store (a data warehouse, a data mart, etc.), where they become the basis for online analytical processing and data mining.
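To make the three ETL stages concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not part of the patent: the CSV flat file, the relational `orders` table, the warehouse `fact_sales` table, and the cleaning rules are all hypothetical.

```python
import csv
import io
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, float]  # hypothetical record shape: (customer, amount)


def extract_flat_file(text: str) -> List[Row]:
    """Extract rows from a CSV flat file (passed in as text for the example)."""
    reader = csv.DictReader(io.StringIO(text))
    return [(r["customer"], float(r["amount"])) for r in reader]


def extract_relational(conn: sqlite3.Connection) -> List[Row]:
    """Extract rows from a relational source table named `orders` (hypothetical)."""
    return conn.execute("SELECT customer, amount FROM orders").fetchall()


def transform(rows: Iterable[Row]) -> List[Row]:
    """Clean and integrate: normalize names, drop non-positive amounts and exact duplicates."""
    cleaned = {(name.strip().upper(), amount) for name, amount in rows if amount > 0}
    return sorted(cleaned)


def load(warehouse: sqlite3.Connection, rows: List[Row]) -> None:
    """Load the cleaned rows into a warehouse fact table named `fact_sales` (hypothetical)."""
    warehouse.executemany("INSERT INTO fact_sales (customer, amount) VALUES (?, ?)", rows)
    warehouse.commit()
```

Here a plain Python list stands in for the temporary intermediate layer; a production pipeline would more likely stage the extracted rows in database tables.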
There are many professional data cleaning tools on the market, such as Ascential's DataStage, Informatica's PowerCenter, and NCR Teradata's ETL Automation. Most of these tools are powerful, but they are also comparatively complex to use. For general small and medium-sized applications they are too costly, so developers usually turn to lighter-weight tools such as SSIS, or implement the processing directly with stored procedures.
Summary of the invention
The object of the invention is to overcome the above defects of the prior art by providing a database conversion and cleaning information processing method.
The object of the invention can be achieved through the following technical solution:
A database conversion and cleaning information processing method, characterized in that it comprises the following steps:
1) connecting the target database to the data source;
2) selecting, in the target database, the target data table to be cleaned;
3) selecting the update mode: for an incremental update, executing step 4); for a full update, executing step 10);
4) obtaining the maximum update time last_update in the target data table; if the target data table is empty, last_update defaults to a preset time;
5) filtering all records in the data source whose update time is greater than last_update into a temporary table temp_table;
6) removing duplicate records from temp_table using the constraint fields of the target data table;
7) comparing the target data table with temp_table to obtain the records in temp_table that already exist in the target data table;
8) removing from temp_table the records that already exist in the target data table;
9) inserting all remaining records in temp_table into the target data table, and executing step 14);
10) organizing the data on the data source side into the structure of the target data table, and storing all of the records in temp_table;
11) removing duplicate records from temp_table using the constraint fields of the target data table;
12) emptying the target data table;
13) inserting all records in temp_table into the target data table;
14) recording an update log.
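The incremental branch (steps 4) to 9), followed by the log in step 14)) could be sketched in Python with the standard `sqlite3` module as below. This is a hypothetical illustration, not the patent's implementation: the tables `source_data`, `target_data`, and `update_log`, their columns, the single constraint field `id`, and ISO-8601 text timestamps are all assumptions. It also assumes that step 7) decides whether a record "already exists" by its constraint field, which the text leaves open.

```python
import sqlite3

# Preset time for step 4); the text says it may be 1 January 1900.
DEFAULT_LAST_UPDATE = "1900-01-01 00:00:00"


def incremental_update(conn: sqlite3.Connection) -> int:
    """Steps 4)-9) and 14) against hypothetical tables source_data, target_data, update_log.

    Assumes `id` is the only constraint field and update_time is ISO-8601 text,
    so string comparison matches chronological order.
    """
    cur = conn.cursor()
    # Step 4): largest update time already in the target, or the preset time if it is empty.
    (last_update,) = cur.execute(
        "SELECT COALESCE(MAX(update_time), ?) FROM target_data", (DEFAULT_LAST_UPDATE,)
    ).fetchone()
    # Step 5): stage every source record newer than last_update in temp_table.
    cur.execute("DROP TABLE IF EXISTS temp.temp_table")
    cur.execute("CREATE TEMP TABLE temp_table (id INTEGER, name TEXT, update_time TEXT)")
    cur.execute(
        "INSERT INTO temp_table (id, name, update_time) "
        "SELECT id, name, update_time FROM source_data WHERE update_time > ?",
        (last_update,),
    )
    # Step 6): keep one row per constraint-field value.
    cur.execute(
        "DELETE FROM temp_table WHERE rowid NOT IN "
        "(SELECT MIN(rowid) FROM temp_table GROUP BY id)"
    )
    # Steps 7)-8): drop staged rows whose key is already in the target.
    cur.execute("DELETE FROM temp_table WHERE id IN (SELECT id FROM target_data)")
    # Step 9): insert whatever remains.
    inserted = cur.execute(
        "INSERT INTO target_data (id, name, update_time) "
        "SELECT id, name, update_time FROM temp_table"
    ).rowcount
    # Step 14): record the run in the update log.
    cur.execute(
        "INSERT INTO update_log (mode, row_count, logged_at) "
        "VALUES ('incremental', ?, datetime('now'))",
        (inserted,),
    )
    conn.commit()
    return inserted
```

Under this key-based reading of steps 7) and 8), a changed source record whose `id` is already in the target is discarded rather than applied; a deployment that needs changed rows to overwrite existing ones would have to replace step 8) with an update.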
The preset time in step 4) may be 1 January 1900.
The constraint fields in step 6) may be one or more fields.
The constraint fields in step 11) may be one or more fields.
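Because steps 6) and 11) allow one or more constraint fields, duplicate removal amounts to grouping records on the tuple of those fields. A minimal, hypothetical Python helper that keeps the first record seen for each key might look like this; the record layout and field names are illustrative only.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Set, Tuple

Record = Dict[str, Hashable]


def drop_duplicates(records: Iterable[Record], constraint_fields: Sequence[str]) -> List[Record]:
    """Keep the first record for each distinct combination of constraint-field values."""
    seen: Set[Tuple[Hashable, ...]] = set()
    kept: List[Record] = []
    for record in records:
        # One field gives a 1-tuple key; several fields give a composite key.
        key = tuple(record[field] for field in constraint_fields)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


rows = [
    {"region": "SH", "code": 1, "value": "x"},
    {"region": "SH", "code": 1, "value": "y"},
    {"region": "BJ", "code": 1, "value": "z"},
]
print(drop_duplicates(rows, ["region", "code"]))  # keeps the "x" and "z" rows
print(drop_duplicates(rows, ["code"]))            # keeps only the "x" row
```

The choice of constraint fields therefore decides how aggressively records collapse: two fields treat rows as duplicates only when both match, while a single field merges more rows.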
Compared with the prior art, the present invention has the following advantages:
1. It defines the data conversion and cleaning workflow concretely, and can effectively perform data updates in both update modes, full and incremental;
2. Its business logic is clear, so it effectively avoids duplicated and omitted data during data cleaning and ensures the consistency and completeness of the data.
Description of drawings
Fig. 1 is a process flow diagram of the present invention;
Fig. 2 is a schematic diagram of the hardware configuration of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment.
Embodiment
As shown in Fig. 1 and Fig. 2, a database conversion and cleaning information processing method comprises the following steps:
1) connecting the target database 1 to the data source 2;
2) selecting, in the target database 1, the target data table to be cleaned;
3) selecting the update mode: for an incremental update, executing step 4); for a full update, executing step 10);
4) obtaining the maximum update time last_update in the target data table; if the target data table is empty, last_update defaults to a preset time;
5) filtering all records in the data source 2 whose update time is greater than last_update into a temporary table temp_table;
6) removing duplicate records from temp_table using the constraint fields of the target data table;
7) comparing the target data table with temp_table to obtain the records in temp_table that already exist in the target data table;
8) removing from temp_table the records that already exist in the target data table;
9) inserting all remaining records in temp_table into the target data table, and executing step 14);
10) organizing the data on the data source side into the structure of the target data table, and storing all of the records in temp_table;
11) removing duplicate records from temp_table using the constraint fields of the target data table;
12) emptying the target data table;
13) inserting all records in temp_table into the target data table;
14) recording an update log.
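The full-update branch (steps 10) to 14)) can be sketched the same way. Again, every name is hypothetical: `legacy_source` stands for a source whose columns (`src_id`, `src_name`, `src_time`) differ from the target's, so step 10) is shown as a column mapping plus a `TRIM`; `target_data`, `update_log`, and the constraint field `id` are assumptions as well.

```python
import sqlite3


def full_update(conn: sqlite3.Connection) -> int:
    """Steps 10)-14) against hypothetical tables legacy_source, target_data, update_log."""
    cur = conn.cursor()
    # Step 10): reshape all source records into the target layout and stage them.
    cur.execute("DROP TABLE IF EXISTS temp.temp_table")
    cur.execute("CREATE TEMP TABLE temp_table (id INTEGER, name TEXT, update_time TEXT)")
    cur.execute(
        "INSERT INTO temp_table (id, name, update_time) "
        "SELECT src_id, TRIM(src_name), src_time FROM legacy_source"
    )
    # Step 11): keep one row per constraint-field value (here only `id`).
    cur.execute(
        "DELETE FROM temp_table WHERE rowid NOT IN "
        "(SELECT MIN(rowid) FROM temp_table GROUP BY id)"
    )
    # Step 12): empty the target table (SQLite has no TRUNCATE).
    cur.execute("DELETE FROM target_data")
    # Step 13): reload the target from the staged records.
    inserted = cur.execute(
        "INSERT INTO target_data (id, name, update_time) "
        "SELECT id, name, update_time FROM temp_table"
    ).rowcount
    # Step 14): record the run in the update log.
    cur.execute(
        "INSERT INTO update_log (mode, row_count, logged_at) "
        "VALUES ('full', ?, datetime('now'))",
        (inserted,),
    )
    conn.commit()
    return inserted
```

With the `sqlite3` module's default transaction handling, the emptying in step 12) and the reload in step 13) run inside one transaction that is committed at the end, so if an error occurs before `commit()`, calling `conn.rollback()` restores the previous contents of the target table.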
The preset time in step 4) may be 1 January 1900.
The constraint fields in step 6) may be one or more fields.
The constraint fields in step 11) may be one or more fields.
Claims (4)
1. A database conversion and cleaning information processing method, characterized in that it comprises the following steps:
1) connecting the target database to the data source;
2) selecting, in the target database, the target data table to be cleaned;
3) selecting the update mode: for an incremental update, executing step 4); for a full update, executing step 10);
4) obtaining the maximum update time last_update in the target data table; if the target data table is empty, last_update defaults to a preset time;
5) filtering all records in the data source whose update time is greater than last_update into a temporary table temp_table;
6) removing duplicate records from temp_table using the constraint fields of the target data table;
7) comparing the target data table with temp_table to obtain the records in temp_table that already exist in the target data table;
8) removing from temp_table the records that already exist in the target data table;
9) inserting all remaining records in temp_table into the target data table, and executing step 14);
10) organizing the data on the data source side into the structure of the target data table, and storing all of the records in temp_table;
11) removing duplicate records from temp_table using the constraint fields of the target data table;
12) emptying the target data table;
13) inserting all records in temp_table into the target data table;
14) recording an update log.
2. The database conversion and cleaning information processing method according to claim 1, characterized in that the preset time in step 4) may be 1 January 1900.
3. The database conversion and cleaning information processing method according to claim 1, characterized in that the constraint fields in step 6) are one or more fields.
4. The database conversion and cleaning information processing method according to claim 1, characterized in that the constraint fields in step 11) are one or more fields.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010102879710A CN102411569A (en) | 2010-09-20 | 2010-09-20 | Database conversion and cleaning information processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010102879710A CN102411569A (en) | 2010-09-20 | 2010-09-20 | Database conversion and cleaning information processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102411569A true CN102411569A (en) | 2012-04-11 |
Family
ID=45913646
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2010102879710A Pending CN102411569A (en) | 2010-09-20 | 2010-09-20 | Database conversion and cleaning information processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102411569A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6208990B1 (en) * | 1998-07-15 | 2001-03-27 | Informatica Corporation | Method and architecture for automated optimization of ETL throughput in data warehousing applications |
CN101075304A (en) * | 2006-05-18 | 2007-11-21 | 河北全通通信有限公司 | Method for constructing decision supporting system of telecommunication industry based on database |
CN101183387A (en) * | 2007-12-14 | 2008-05-21 | 沈阳东软软件股份有限公司 | Increment data capturing method and system |
CN101504664A (en) * | 2009-03-18 | 2009-08-12 | 中国工商银行股份有限公司 | Apparatus and method for extracting, converting and loading total source data |
CN101621529A (en) * | 2008-06-30 | 2010-01-06 | 上海全成通信技术有限公司 | High-efficient and low-cost loading method for heterogeneous mass data |
CN101697126A (en) * | 2009-10-28 | 2010-04-21 | 山东中创软件商用中间件股份有限公司 | ETL realization method for incremental data of Excel file |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103473375A (en) * | 2013-09-29 | 2013-12-25 | 方正国际软件有限公司 | Data cleaning method and data cleaning system |
CN103530375A (en) * | 2013-10-15 | 2014-01-22 | 北京国双科技有限公司 | Method and device for data source matching |
CN103593447A (en) * | 2013-11-18 | 2014-02-19 | 北京国双科技有限公司 | Data processing method and device applied to database table |
CN103593447B (en) * | 2013-11-18 | 2017-02-08 | 北京国双科技有限公司 | Data processing method and device applied to database table |
WO2018127116A1 (en) * | 2017-01-09 | 2018-07-12 | 腾讯科技(深圳)有限公司 | Data cleaning method and apparatus, and computer-readable storage medium |
CN108287835A (en) * | 2017-01-09 | 2018-07-17 | 腾讯科技(深圳)有限公司 | A kind of data clearing method and device |
US11023448B2 (en) | 2017-01-09 | 2021-06-01 | Tencent Technology (Shenzhen) Company Limited | Data scrubbing method and apparatus, and computer readable storage medium |
CN108287835B (en) * | 2017-01-09 | 2022-06-21 | 腾讯科技(深圳)有限公司 | Data cleaning method and device |
CN107729222A (en) * | 2017-07-26 | 2018-02-23 | 上海壹账通金融科技有限公司 | User behavior statistical method, system, computer equipment and storage medium |
CN109634971A (en) * | 2018-11-07 | 2019-04-16 | 平安科技(深圳)有限公司 | Data-updating method, device, equipment and computer readable storage medium |
CN109634971B (en) * | 2018-11-07 | 2024-01-23 | 平安科技(深圳)有限公司 | Data updating method, device, equipment and computer readable storage medium |
CN110147362A (en) * | 2019-04-04 | 2019-08-20 | 中电科大数据研究院有限公司 | One kind is based on the acquisition of event driven DOC DATA and processing system and its method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20120411 |
|
RJ01 | Rejection of invention patent application after publication |