CN101504664A - Apparatus and method for extracting, converting and loading total source data - Google Patents

Apparatus and method for extracting, converting and loading total source data Download PDF

Info

Publication number
CN101504664A
CN101504664A CNA2009100800650A CN200910080065A CN101504664A CN 101504664 A CN101504664 A CN 101504664A CN A2009100800650 A CNA2009100800650 A CN A2009100800650A CN 200910080065 A CN200910080065 A CN 200910080065A CN 101504664 A CN101504664 A CN 101504664A
Authority
CN
China
Prior art keywords
data
database
temporary table
date
source data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2009100800650A
Other languages
Chinese (zh)
Other versions
CN101504664B (en
Inventor
陈慧
俞斌
赵亚华
张海军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN2009100800650A priority Critical patent/CN101504664B/en
Publication of CN101504664A publication Critical patent/CN101504664A/en
Application granted granted Critical
Publication of CN101504664B publication Critical patent/CN101504664B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a device and a method for extracting, converting and loading gross source data. The device comprises a data receiving unit, a data processing unit, a data export unit and a database, wherein the database is provided with a temporary working area and a foundation area; the foundation area of the database is provided with an object list inside; the data receiving unit is connected with the database and used for receiving gross source data files from a source system and loading the gross source data files into the temporary working area of the database; the data processing unit is connected with the database and used for extracting, converting and loading the gross source data files which are loaded into the temporary working area of the database and storing the processed gross source data into the database in the mode of historical chains; and the data export unit is connected with the database and used for exporting the gross source data which is stored into the database for other systems to use. The invention meets the requirement of the database on the reservation of historical track of the gross source data.

Description

Gross source data is extracted the device and method that conversion loads
Technical field
The present invention relates to the database source technical field of data processing, relate in particular to a kind of device and method that gross source data is extracted the conversion loading.
Background technology
For enterprise-level data warehouse (EDW), its most basic function is that the data from the service source system are extracted, change and load by certain rule (is Extract, Transform and Load, ETL).Because the diversity of origin system, also be panoramic from the data of origin system, can't be extracted conversion with single rule.The enterprise-level data warehouse be exactly to differ various attributes, standardized data message that the changeable data-switching of rule becomes to satisfy certain rule, meets the relevant normal form of database, this has mass data information for any one and as the basic enterprise of survival and development, all is very important.Particularly for large-scale state-run commercial bank,,, be the most important thing of EDW how with the data pick-up of this magnanimity and be organized into a whole set of orderly, normalized infosystem because its data volume is a magnanimity.
In the service source system, except a few data can keep historical information (for example, marketing system amount of money class data), most of data are not keep history in origin system, and only safeguard a current latest data value.That is to say, if in a single day variation has taken place the value of certain bar record, will upgrade at once in the origin system and cover original numerical value, just the original information in the record is no longer preserved.And the enterprise-level data warehouse can remedy this deficiency, and the processing conversion method of historical zipper form just can load the record that originally have only current last look and preserve in the database in the mode of historical time section.When the record attribute value changes, the data warehouse renewal of can on original record, not making amendment, but regenerate a record that has up-to-date property value, and time tag all stamped in new, old record.The effect of the historical slide fastener algorithm of standard that Here it is.
But, the historical slide fastener algorithm of standard has its limitation: the process object of the historical slide fastener algorithm of standard only limits to increment source data (this increment source data is the data relevant with source data), promptly can handle out its historical track effectively for newly-increased or data updated.But, when the source data downloading mode is full dose, because the historical slide fastener algorithm of standard can't be done " pass chain " to the record of having left out in the origin system and handle, the historical slide fastener algorithm of standard just can't be delineated out the historical track (the historical slide fastener algorithm process of standard process flow diagram is referring to Fig. 1) of data exactly.For the situation that this gross source data is downloaded, the historical slide fastener algorithm of standard can cause occurring many mistakes " current still effective's " data recording, and this algorithm no longer satisfies actual requirement.
Therefore, press for the technology that a kind of data pick-up conversion that can handle gross source data loads, to satisfy database keeps historical track to gross source data demand, each time full dose data are scanned and delineate out the relevant historical track, solve the problem that the historical slide fastener algorithm of the standard that runs in the enterprise-level database project can't satisfy the gross source data processing requirements.
Summary of the invention
(1) technical matters that will solve
In view of this, fundamental purpose of the present invention is to provide a kind of gross source data is extracted the device and method that conversion loads, to satisfy the demand of database, solve the problem that the historical slide fastener algorithm of the standard that runs in the enterprise-level database project can't satisfy the gross source data processing requirements to gross source data reservation historical track.
(2) technical scheme
Be an aspect that achieves the above object, the invention provides a kind of device that gross source data is extracted the conversion loading, this device comprises Data Receiving unit, data processing unit, data lead-out unit and database, this database has an odd-job district and basis district, and in the basis district of this database, has an object table, wherein
The Data Receiving unit is connected in database, is used to receive the gross source data file that comes from origin system, and with the odd-job district of this gross source data file load in the database;
Data processing unit is connected in database, be used for the gross source data file that is loaded into database odd-job district is extracted the conversion loading processing, and the gross source data after will handling with historical slide fastener form is saved in the database;
The data lead-out unit is connected in database, and the gross source data that is used for being saved in database derives, and uses for other system.
In the such scheme, the odd-job district that described database has can create temporary table for described data processing unit the space is provided.
In the such scheme, the temporary table that described data processing unit is created comprises: temporary table CUR, temporary table PRE, temporary table INS and temporary table UPD, the structure of described temporary table CUR, temporary table PRE, temporary table INS, temporary table UPD is consistent with the structure of object table, and temporary table CUR, temporary table PRE, temporary table INS and temporary table UPD are variable temporary tables, existing only in during the session, will be deleted automatically when conversation end.
In the such scheme, in described temporary table CUR, field " Start Date " and " Close Date " with the historical slide fastener of sign, when handling temporary table CUR table, on the unified work date for the present batch processing of " Start Date " field, " Close Date " then is unified into the maximum date of default setting.
In the such scheme, in described temporary table CUR, exist but in temporary table PRE non-existent data, be data newly-increased or that upgraded, the data of depositing among the temporary table CUR are to extract from the database temporary area and the gross source data on a same day of coming, and the data of depositing among the temporary table PRE are current effective data in the object table, adopt temporary table CUR to deduct temporary table PRE and can access the increment source data on the same day that increases newly or revise in the gross source data on the same day;
The data that exist in described temporary table PRE remain effective in object table at present; But non-existent data in temporary table CUR are deleted in gross source data, belong to the data of " expired ", " calcellation ".
In the such scheme, in described object table, the Close Date equals the data on maximum date, is the current effective data; Close Date is not equal to the data on maximum date, is current invalid data; For data invalid in the described object table, it need be carried out " pass chain ", detailed process is as follows: " Close Date " field of corresponding record is changed to the present batch date, and then the term of validity of this record is by the end of the current batch date; Adopt record among the temporary table UPD and the record in the object table to compare, can search which data is invalid datas of " expired ", " calcellation " in the object table; Adopt record and the record among the temporary table UPD in the object table to do comparison one by one, if the match is successful, then " Close Date " with this record changes to the present batch date, otherwise constant.
Be another aspect that achieves the above object, the invention provides a kind of method that gross source data is extracted the conversion loading, be applied to comprise the device of Data Receiving unit, data processing unit, data lead-out unit and database, this database has an odd-job district and basis district, and have an object table in the basis district of this database, this method comprises:
The Data Receiving unit receives the gross source data file come from origin system, and with the odd-job district of this gross source data file load in the database;
Data processing unit extracts the conversion loading processing to the gross source data file that is loaded in the database odd-job district, and the gross source data after will handling with historical slide fastener form is saved in the database.
In the such scheme, described data processing unit extracts the conversion loading processing to the gross source data file that is loaded in the database odd-job district, specifically comprises:
Be loaded into the temporary area of database at the gross source data file after, data processing unit is created temporary table CUR in database odd-job district, and each field in the database temporary area is inserted among the temporary table CUR according to certain extraction conversion loading rule;
Data processing unit is created temporary table PRE in database odd-job district, and the Close Date in the object table in Basis of Database district is equaled among the data insertion temporary table PRE on maximum date;
Data processing unit is created temporary table INS in database odd-job district, and will in temporary table CUR, exist but in temporary table PRE non-existent data be inserted among the temporary table INS;
Data processing unit is created temporary table UPD in database odd-job district, and will in temporary table PRE, exist but in temporary table CUR non-existent data be inserted among the temporary table UPD;
Data processing unit closes the chain operation to the object table that is arranged in the Basis of Database district; And
Data processing unit will increase newly or data updated is inserted in the object table in Basis of Database district.
In the such scheme, in described temporary table CUR, field " Start Date " and " Close Date " with the historical slide fastener of sign, when handling temporary table CUR table, on the unified work date for the present batch processing of " Start Date " field, " Close Date " then is unified into the maximum date of default setting.
In the such scheme, in described temporary table CUR, exist but in temporary table PRE non-existent data, be data newly-increased or that upgraded, the data of depositing among the temporary table CUR are to extract from the database temporary area and the gross source data on a same day of coming, and the data of depositing among the temporary table PRE are current effective data in the object table, adopt temporary table CUR to deduct temporary table PRE and can access the increment source data on the same day that increases newly or revise in the gross source data on the same day;
The data that exist in described temporary table PRE remain effective in object table at present; But non-existent data in temporary table CUR are deleted in gross source data, belong to the data of " expired ", " calcellation ".
In the such scheme, in described object table, the Close Date equals the data on maximum date, is the current effective data; Close Date is not equal to the data on maximum date, is current invalid data; For data invalid in the described object table, it need be carried out " pass chain ", detailed process is as follows: " Close Date " field of corresponding record is changed to the present batch date, and then the term of validity of this record is by the end of the current batch date; Adopt record among the temporary table UPD and the record in the object table to compare, can search which data is invalid datas of " expired ", " calcellation " in the object table; Adopt record and the record among the temporary table UPD in the object table to do comparison one by one, if the match is successful, then " Close Date " with this record changes to the present batch date, otherwise constant.
In the such scheme, described data processing unit closes the chain operation to the object table that is arranged in the Basis of Database district, specifically comprises: data processing unit is changed to the present batch date with expired in the object table with " Close Date " field of making dirty data institute corresponding record.
In the such scheme, described data processing unit will increase newly or data updated is inserted in the object table in Basis of Database district, further comprise: will increase newly or " Start Date " of the record of data updated correspondence is changed to the present batch date, " Close Date " is changed to the maximum date of system, then " term of validity " of described record came into effect from the present batch date, and from the same day these newly-increased or data updated all be " effectively ", be next day gross source data extract the process object of changing.
In the such scheme, the gross source data after described data processing unit will be handled with historical slide fastener form is saved in after the step in the database, further comprises: the gross source data that the data lead-out unit will be saved in the database derives, and uses for other system.
(3) beneficial effect
From technique scheme as can be seen, the device and method that gross source data is extracted the conversion loading provided by the invention, not only can't handle the shortcoming of gross source data at the historical slide fastener algorithm of standard, appropriate available solution has been proposed, and this solution is also still suitable under the situation of increment source data, has realized the unitarity of increment source data and gross source data processing mode.Compare with the historical slide fastener algorithm of existing standard, gross source data extract the effect of conversion method and advantage applies following some:
1), having satisfied in the source data presentation mode is under the situation of full dose, source system data is carried out the demand of reasonable slide fastener.
2), realized the unitarity of database side to increment source data and gross source data processing mode.
3), satisfied the dirigibility requirement of source system data presentation mode, the upstream source system can switch flexibly as required and increase/storage is for the number mode, realized the transparence with the EDW interactive interfacing.
Description of drawings
Fig. 1 is the processing flow chart of the historical slide fastener algorithm of existing standard;
Fig. 2 is the structural representation that gross source data is extracted the device of conversion loading provided by the invention;
Fig. 3 is the method flow diagram that gross source data is extracted the conversion loading provided by the invention;
To be data processing unit provided by the invention be inserted into synoptic diagram among the temporary table CUR with each field in the database temporary area according to certain extraction conversion loading rule to Fig. 4;
Fig. 5 data that to be data processing unit provided by the invention equal the maximum date with the Close Date in the object table in Basis of Database district are inserted the synoptic diagram among the temporary table PRE;
Fig. 6 be data processing unit provided by the invention will in temporary table CUR, exist but in temporary table PRE non-existent data be inserted into synoptic diagram among the temporary table INS;
Fig. 7 be data processing unit provided by the invention will in temporary table PRE, exist but in temporary table CUR non-existent data be inserted into synoptic diagram among the temporary table UPD;
Fig. 8 is data processing unit provided by the invention closes the chain operation to the object table that is arranged in the Basis of Database district a synoptic diagram;
Fig. 9 is that data processing unit provided by the invention will increase newly or data updated is inserted into synoptic diagram in the object table in Basis of Database district.
Embodiment
For making the purpose, technical solutions and advantages of the present invention clearer, below in conjunction with specific embodiment, and with reference to accompanying drawing, the present invention is described in more detail.
The present invention handles by the record of having deleted in the origin system being carried out " pass chain ", promptly the Close Date of corresponding record (End_Dt) is changed to the present batch date, the valid period of representing this record is at this end of day, realized that the same day, gross source data was loaded in the data warehouse, and preserve with the form of historical slide fastener, satisfy data warehouse keeps historical track to gross source data demand.
As shown in Figure 2, Fig. 2 provided by the inventionly extracts the structural representation of the device that conversion loads to gross source data, and this device comprises Data Receiving unit, data processing unit, data lead-out unit and database.This database has an odd-job district and basis district, and has an object table in the basis district of this database.Wherein, the Data Receiving unit is connected in database, is used to receive the gross source data file that comes from origin system, and with the odd-job district of this gross source data file load in the database.Data processing unit is connected in database, be used for the gross source data file that is loaded into database odd-job district is extracted the conversion loading processing, and the gross source data after will handling with historical slide fastener form is saved in the database.The data lead-out unit is connected in database, and the gross source data that is used for being saved in database derives, and uses for other system.
The odd-job district that database has can create temporary table for described data processing unit the space is provided.The temporary table that data processing unit is created comprises: temporary table CUR, temporary table PRE, temporary table INS and temporary table UPD, the structure of described temporary table CUR, temporary table PRE, temporary table INS, temporary table UPD is consistent with the structure of object table, and temporary table CUR, temporary table PRE, temporary table INS and temporary table UPD are variable temporary tables, existing only in during the session, will be deleted automatically when conversation end.
Wherein, CUR is the abbreviation of CURRENT, can be understood as " current ", is used for depositing the temporary table of downloading the data of coming the same day from origin system.PRE can be understood as " previous ", is used for depositing the temporary table of day valid data.INS is the abbreviation of INSERT, can be understood as " insertion ", is used for depositing (same day was than last day) newly-increased and data updated, promptly needs the data of " insertion ".UPD is the abbreviation of UPDATE, can be understood as " renewal ", is used for depositing the data that Japan and China need be updated (being " pass chain " operation) herein.
In temporary table CUR, have field " Start Date " and " Close Date " of the historical slide fastener of sign.When handling temporary table CUR table, on the unified work date for the present batch processing of " Start Date " field, " Close Date " then is unified into the maximum date of default setting.
In temporary table CUR, exist but in temporary table PRE non-existent data, be data newly-increased or that upgraded, the data of depositing among the temporary table CUR are to extract from the database temporary area and the gross source data on a same day of coming, and the data of depositing among the temporary table PRE are current effective data in the object table, adopt temporary table CUR to deduct temporary table PRE and can access the increment source data on the same day that increases newly or revise in the gross source data on the same day.The data that exist in temporary table PRE remain effective in object table at present; But non-existent data in temporary table CUR are deleted in gross source data, belong to the data of " expired ", " calcellation ".
In object table, the Close Date equals the data on maximum date, is the current effective data; Close Date is not equal to the data on maximum date, is current invalid data.For data invalid in the object table, it need be carried out " pass chain ", detailed process is as follows: " Close Date " field of corresponding record is changed to the present batch date, and then the term of validity of this record is by the end of the current batch date; Adopt record among the temporary table UPD and the record in the object table to compare, can search which data is invalid datas of " expired ", " calcellation " in the object table; Adopt record and the record among the temporary table UPD in the object table to do comparison one by one, if the match is successful, then " Close Date " with this record changes to the present batch date, otherwise constant.
As shown in Figure 3, Fig. 3 is that provided by the invention gross source data is extracted changed the method flow diagram that loads, and this method is applied to the device that comprises Data Receiving unit, data processing unit, data lead-out unit and database shown in Figure 2.This database has an odd-job district and basis district, and has an object table in the basis district of this database, and this method specifically may further comprise the steps:
Step 301: the Data Receiving unit receives the gross source data file come from origin system, and with the odd-job district of this gross source data file load in the database; This step is to carry out the work that data are prepared for follow-up data pick-up conversion loading processing.
Step 302: data processing unit extracts the conversion loading processing to the gross source data file that is loaded in the database odd-job district, and the gross source data after will handling with historical slide fastener form is saved in the database.
Step 303: the gross source data that the data lead-out unit will be saved in the database derives, and uses for other system.
Data processing unit described in the above-mentioned steps 302 extracts the conversion loading processing to the gross source data file that is loaded in the database odd-job district, specifically comprises:
Step 3021: be loaded into the temporary area of database at the gross source data file after, data processing unit is created temporary table CUR in database odd-job district, and each field in the database temporary area is inserted among the temporary table CUR according to certain extraction conversion loading rule;
In this step, after gross source data is loaded into the temporary area of data warehouse, data processing unit is created temporary table CUR in database space, and each field in the database temporary area is inserted among the temporary table CUR according to certain ETL rule, simultaneously, the field " Start Date " and " Close Date " that indicate historical slide fastener among the temporary table CUR in addition.When handling temporary table CUR table, on the unified work date for system's present batch processing of " Start Date " field, the maximum date that " Close Date " then is unified into default setting in the EDW system (for example: 9999-12-31).As shown in Figure 4, the structure of the structure of temporary table CUR and object table is consistent.Temporary table CUR is variable temporary table, exists only in during the session (session), and when conversation end, this table is deleted automatically.
Step 3022: data processing unit is created temporary table PRE in database odd-job district, and the Close Date in the object table in Basis of Database district is equaled among the data insertion temporary table PRE on maximum date;
In this step, data processing unit is created temporary table PRE in database odd-job district, and the Close Date in the object table is equaled among the data insertion temporary table PRE on maximum date.The structure of temporary table PRE is consistent with the structure of object table.The Close Date equals the data on maximum date in the object table, and expression is the current effective data.Be not equal to the data on maximum date for the Close Date, show it is current invalid data, therefore not as the object of handling.As shown in Figure 5, temporary table PRE is variable temporary table, exists only in during the session (session), and when conversation end, this table is deleted automatically.
Step 3023: data processing unit is created temporary table INS in database odd-job district, and will in temporary table CUR, exist but in temporary table PRE non-existent data be inserted among the temporary table INS;
In this step, data processing unit is created temporary table INS in database odd-job district, and will in temporary table CUR, exist but in temporary table PRE non-existent data be inserted among the temporary table INS; The structure of temporary table INS is consistent with the structure of object table.In temporary table CUR, exist but in temporary table PRE, do not exist, illustrate that these data are data newly-increased or that upgraded.Owing to the data among the temporary table CUR extract from the data warehouse temporary area, promptly represented gross source data on a same day, and what deposit among the temporary table PRE is current effective data in the object table, therefore deduct PRE with CUR, just can obtain in the gross source data on the same day part newly-increased or that revise, i.e. the increment source data on the same day.As shown in Figure 6, temporary table INS is variable temporary table, exists only in during the session (session), and when conversation end, this table is deleted automatically.
Step 3024: data processing unit is created temporary table UPD in database odd-job district, and will in temporary table PRE, exist but in temporary table CUR non-existent data be inserted among the temporary table UPD;
In this step, data processing unit is created temporary table UPD in database odd-job district, and will in temporary table PRE, exist but in temporary table CUR non-existent data be inserted among the temporary table UPD.The structure of temporary table UPD is consistent with the structure of object table.In temporary table PRE, exist, illustrate that these data in the object table remain effective at present; But in temporary table CUR, do not exist, illustrate that these data are deleted in gross source data, promptly discarded in data " source ".Therefore, these class data belong to the data of " expired ", " calcellation ", must do " pass chain " and handle in historical slide fastener, show that its valid period has ended.These data are put into temporary table UPD, wait for next step " pass chain " operation.As shown in Figure 7, temporary table UPD is variable temporary table, exists only in during the session (session), and when conversation end, this table is deleted automatically.
Step 3025: data processing unit closes the chain operation to the object table that is arranged in the Basis of Database district;
In this step, the object table that is arranged in data warehouse basis district is carried out " pass chain " operation.For the data of " expired ", " calcellation " in the object table, its " pass chain " promptly need be changed to the present batch date to " Close Date " field of corresponding record, the term of validity that shows this record is by the end of the current batch date.Compare with record among the temporary table UPD and the record in the object table, can search which data is " expired ", " calcellation " in the target.As shown in Figure 8, do comparison one by one with record in the object table and the record among the temporary table UPD, on coupling, then " Close Date " with this record changes to the present batch date, otherwise constant.
Step 3026: data processing unit will increase newly or data updated is inserted in the object table in Basis of Database district;
In this step, will increase newly or data updated is inserted in the object table.Record among the temporary table INS is inserted in the object table, be changed to the present batch date simultaneously " Start Date " with these records, " Close Date " is changed to the maximum date of system, " term of validity " that shows these records came into effect from the present batch date, and from the same day these data all be " effectively ", be next day gross source data extract the process object of conversion.What keep among the temporary table INS as shown in Figure 9, is exactly record newly-increased or that upgrade.
By above treatment step, realized a kind of method that realizes gross source data extraction conversion loading based on historical slide fastener algorithm, this method is handled by the record of having deleted in the origin system being carried out " pass chain ", promptly the Close Date of corresponding record (End_Dt) is changed to the present batch date, the valid period of representing this record is at this end of day.This method has realized that the same day, gross source data was loaded in the data warehouse, and preserves with the form of historical slide fastener, satisfies data warehouse keeps historical track to gross source data demand.
Above-described specific embodiment; purpose of the present invention, technical scheme and beneficial effect are further described; institute is understood that; the above only is specific embodiments of the invention; be not limited to the present invention; within the spirit and principles in the present invention all, any modification of being made, be equal to replacement, improvement etc., all should be included within protection scope of the present invention.

Claims (14)

1, a kind of device that gross source data is extracted the conversion loading, this device comprises Data Receiving unit, data processing unit, data lead-out unit and database, and this database has an odd-job district and basis district, and has an object table in the basis district of this database, wherein
The Data Receiving unit is connected in database, is used to receive the gross source data file that comes from origin system, and with the odd-job district of this gross source data file load in the database;
Data processing unit is connected in database, be used for the gross source data file that is loaded into database odd-job district is extracted the conversion loading processing, and the gross source data after will handling with historical slide fastener form is saved in the database;
The data lead-out unit is connected in database, and the gross source data that is used for being saved in database derives, and uses for other system.
2, according to claim 1 gross source data is extracted changed the device that loads, and it is characterized in that the odd-job district that described database has can create temporary table for described data processing unit the space is provided.
3, the device that gross source data is extracted the conversion loading according to claim 2, it is characterized in that, the temporary table that described data processing unit is created comprises: temporary table CUR, temporary table PRE, temporary table INS and temporary table UPD, the structure of described temporary table CUR, temporary table PRE, temporary table INS, temporary table UPD is consistent with the structure of object table, and temporary table CUR, temporary table PRE, temporary table INS and temporary table UPD are variable temporary tables, existing only in during the session, will be deleted automatically when conversation end.
4, the device that gross source data is extracted the conversion loading according to claim 3, it is characterized in that, in described temporary table CUR, field " Start Date " and " Close Date " with the historical slide fastener of sign, when handling temporary table CUR table, on the unified work date for the present batch processing of " Start Date " field, " Close Date " then is unified into the maximum date of default setting.
5, the device that gross source data is extracted the conversion loading according to claim 3, it is characterized in that, in described temporary table CUR, exist but in temporary table PRE non-existent data, be data newly-increased or that upgraded, the data of depositing among the temporary table CUR are to extract from the database temporary area and the gross source data on a same day of coming, and the data of depositing among the temporary table PRE are current effective data in the object table, adopt temporary table CUR to deduct temporary table PRE and can access the increment source data on the same day that increases newly or revise in the gross source data on the same day;
The data that exist in described temporary table PRE remain effective in object table at present; But non-existent data in temporary table CUR are deleted in gross source data, belong to the data of " expired ", " calcellation ".
6, according to claim 3 gross source data is extracted changed the device that loads, and it is characterized in that, in described object table, the Close Date equals the data on maximum date, is the current effective data; Close Date is not equal to the data on maximum date, is current invalid data; For data invalid in the described object table, it need be carried out " pass chain ", detailed process is as follows: " Close Date " field of corresponding record is changed to the present batch date, and then the term of validity of this record is by the end of the current batch date; Adopt record among the temporary table UPD and the record in the object table to compare, can search which data is invalid datas of " expired ", " calcellation " in the object table; Adopt record and the record among the temporary table UPD in the object table to do comparison one by one, if the match is successful, then " Close Date " with this record changes to the present batch date, otherwise constant.
7, a kind of method that gross source data is extracted the conversion loading, be applied to comprise the device of Data Receiving unit, data processing unit, data lead-out unit and database, this database has an odd-job district and basis district, and have an object table in the basis district of this database, this method comprises:
The Data Receiving unit receives the gross source data file come from origin system, and with the odd-job district of this gross source data file load in the database;
Data processing unit extracts the conversion loading processing to the gross source data file that is loaded in the database odd-job district, and the gross source data after will handling with historical slide fastener form is saved in the database.
8, according to claim 7 gross source data is extracted changed the method that loads, and it is characterized in that described data processing unit extracts the conversion loading processing to the gross source data file that is loaded in the database odd-job district, specifically comprises:
Be loaded into the temporary area of database at the gross source data file after, data processing unit is created temporary table CUR in database odd-job district, and each field in the database temporary area is inserted among the temporary table CUR according to certain extraction conversion loading rule;
Data processing unit is created temporary table PRE in database odd-job district, and the Close Date in the object table in Basis of Database district is equaled among the data insertion temporary table PRE on maximum date;
Data processing unit is created temporary table INS in database odd-job district, and will in temporary table CUR, exist but in temporary table PRE non-existent data be inserted among the temporary table INS;
Data processing unit is created temporary table UPD in database odd-job district, and will in temporary table PRE, exist but in temporary table CUR non-existent data be inserted among the temporary table UPD;
Data processing unit closes the chain operation to the object table that is arranged in the Basis of Database district; And
Data processing unit will increase newly or data updated is inserted in the object table in Basis of Database district.
9, the method that gross source data is extracted the conversion loading according to claim 8, it is characterized in that, in described temporary table CUR, field " Start Date " and " Close Date " with the historical slide fastener of sign, when handling temporary table CUR table, on the unified work date for the present batch processing of " Start Date " field, " Close Date " then is unified into the maximum date of default setting.
10, the method that gross source data is extracted the conversion loading according to claim 8, it is characterized in that, in described temporary table CUR, exist but in temporary table PRE non-existent data, be data newly-increased or that upgraded, the data of depositing among the temporary table CUR are to extract from the database temporary area and the gross source data on a same day of coming, and the data of depositing among the temporary table PRE are current effective data in the object table, adopt temporary table CUR to deduct temporary table PRE and can access the increment source data on the same day that increases newly or revise in the gross source data on the same day;
The data that exist in described temporary table PRE remain effective in object table at present; But non-existent data in temporary table CUR are deleted in gross source data, belong to the data of " expired ", " calcellation ".
11, according to claim 8 gross source data is extracted changed the method that loads, and it is characterized in that, in described object table, the Close Date equals the data on maximum date, is the current effective data; Close Date is not equal to the data on maximum date, is current invalid data; For data invalid in the described object table, it need be carried out " pass chain ", detailed process is as follows: " Close Date " field of corresponding record is changed to the present batch date, and then the term of validity of this record is by the end of the current batch date; Adopt record among the temporary table UPD and the record in the object table to compare, can search which data is invalid datas of " expired ", " calcellation " in the object table; Adopt record and the record among the temporary table UPD in the object table to do comparison one by one, if the match is successful, then " Close Date " with this record changes to the present batch date, otherwise constant.
12, according to claim 8 gross source data is extracted changed the method that loads, and it is characterized in that, described data processing unit closes the chain operation to the object table that is arranged in the Basis of Database district, specifically comprises:
Data processing unit is changed to the present batch date with expired in the object table with " Close Date " field of making dirty data institute corresponding record.
13, according to claim 8 gross source data is extracted changed the method that loads, and it is characterized in that described data processing unit will increase newly or data updated is inserted in the object table in Basis of Database district, further comprise:
With newly-increased or " Start Date " record that data updated is corresponding be changed to the present batch date, " Close Date " is changed to the maximum date of system, then " term of validity " of described record came into effect from the present batch date, and from the same day these newly-increased or data updated all be " effectively ", be next day gross source data extract the process object of changing.
14, according to claim 7 gross source data is extracted the method that conversion loads, it is characterized in that, the gross source data after described data processing unit will be handled with historical slide fastener form is saved in after the step in the database, further comprises:
The gross source data that the data lead-out unit will be saved in the database derives, and uses for other system.
CN2009100800650A 2009-03-18 2009-03-18 Apparatus and method for extracting, converting and loading total source data Active CN101504664B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100800650A CN101504664B (en) 2009-03-18 2009-03-18 Apparatus and method for extracting, converting and loading total source data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100800650A CN101504664B (en) 2009-03-18 2009-03-18 Apparatus and method for extracting, converting and loading total source data

Publications (2)

Publication Number Publication Date
CN101504664A true CN101504664A (en) 2009-08-12
CN101504664B CN101504664B (en) 2010-12-01

Family

ID=40976910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100800650A Active CN101504664B (en) 2009-03-18 2009-03-18 Apparatus and method for extracting, converting and loading total source data

Country Status (1)

Country Link
CN (1) CN101504664B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011026311A1 (en) * 2009-09-02 2011-03-10 中国银联股份有限公司 System and method for data transfer
CN102411569A (en) * 2010-09-20 2012-04-11 上海众融信息技术有限公司 Database conversion and cleaning information processing method
CN103530375A (en) * 2013-10-15 2014-01-22 北京国双科技有限公司 Method and device for data source matching
CN103593447A (en) * 2013-11-18 2014-02-19 北京国双科技有限公司 Data processing method and device applied to database table
CN103605776A (en) * 2013-11-28 2014-02-26 北京国双科技有限公司 Method and device for processing data of information database
CN104572122A (en) * 2015-01-28 2015-04-29 中国工商银行股份有限公司 Software application data generating device and method
CN105045881A (en) * 2015-07-21 2015-11-11 上海融甸信息科技有限公司 Historical data adding method
CN105045879A (en) * 2015-07-21 2015-11-11 上海融甸信息科技有限公司 Parallel data processing method
CN105260485A (en) * 2015-11-20 2016-01-20 杭州数梦工场科技有限公司 Method and device for loading data
CN106569731A (en) * 2015-10-10 2017-04-19 阿里巴巴集团控股有限公司 Limit storage method, limit storage device and limit storage equipment
CN106649355A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Method and device for processing data
CN107193985A (en) * 2017-05-27 2017-09-22 郑州云海信息技术有限公司 A kind of slide fastener table design method of record data change histories
CN108763259A (en) * 2018-04-03 2018-11-06 四川新网银行股份有限公司 A kind of big data platform data load converting algorithm
CN108829747A (en) * 2018-05-24 2018-11-16 新华三大数据技术有限公司 Data load method and device
CN110209662A (en) * 2018-02-13 2019-09-06 北京京东尚科信息技术有限公司 A kind of method and apparatus of automation load data
CN110442578A (en) * 2019-07-30 2019-11-12 新华三大数据技术有限公司 Zipper table updating method, device, server and computer readable storage medium
CN113535481A (en) * 2021-07-20 2021-10-22 北京京东振世信息技术有限公司 Data backtracking method and device and nonvolatile computer readable storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105787122A (en) * 2016-03-28 2016-07-20 北京科东电力控制系统有限责任公司 Data extracting, transforming and loading method

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8924342B2 (en) 2009-09-02 2014-12-30 China Unionpay Co., Ltd. System and method for data transfer
WO2011026311A1 (en) * 2009-09-02 2011-03-10 中国银联股份有限公司 System and method for data transfer
CN102411569A (en) * 2010-09-20 2012-04-11 上海众融信息技术有限公司 Database conversion and cleaning information processing method
CN103530375A (en) * 2013-10-15 2014-01-22 北京国双科技有限公司 Method and device for data source matching
CN103593447B (en) * 2013-11-18 2017-02-08 北京国双科技有限公司 Data processing method and device applied to database table
CN103593447A (en) * 2013-11-18 2014-02-19 北京国双科技有限公司 Data processing method and device applied to database table
CN103605776A (en) * 2013-11-28 2014-02-26 北京国双科技有限公司 Method and device for processing data of information database
CN103605776B (en) * 2013-11-28 2017-04-12 北京国双科技有限公司 Method and device for processing data of information database
CN104572122A (en) * 2015-01-28 2015-04-29 中国工商银行股份有限公司 Software application data generating device and method
CN104572122B (en) * 2015-01-28 2018-04-27 中国工商银行股份有限公司 A kind of generating means and method of software application data
CN105045879B (en) * 2015-07-21 2019-08-02 上海融甸信息科技有限公司 A kind of data parallel processing method
CN105045879A (en) * 2015-07-21 2015-11-11 上海融甸信息科技有限公司 Parallel data processing method
CN105045881A (en) * 2015-07-21 2015-11-11 上海融甸信息科技有限公司 Historical data adding method
CN106569731A (en) * 2015-10-10 2017-04-19 阿里巴巴集团控股有限公司 Limit storage method, limit storage device and limit storage equipment
CN106569731B (en) * 2015-10-10 2020-08-04 阿里巴巴集团控股有限公司 Limit storage method, device and equipment
CN106649355A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Method and device for processing data
CN105260485A (en) * 2015-11-20 2016-01-20 杭州数梦工场科技有限公司 Method and device for loading data
CN105260485B (en) * 2015-11-20 2019-05-31 杭州数梦工场科技有限公司 A kind of method and apparatus of data load
CN110083651A (en) * 2015-11-20 2019-08-02 杭州数梦工场科技有限公司 A kind of method and apparatus of data load
CN110083651B (en) * 2015-11-20 2021-06-29 杭州数梦工场科技有限公司 Data loading method and device
CN107193985A (en) * 2017-05-27 2017-09-22 郑州云海信息技术有限公司 A kind of slide fastener table design method of record data change histories
CN110209662A (en) * 2018-02-13 2019-09-06 北京京东尚科信息技术有限公司 A kind of method and apparatus of automation load data
CN108763259A (en) * 2018-04-03 2018-11-06 四川新网银行股份有限公司 A kind of big data platform data load converting algorithm
CN108829747A (en) * 2018-05-24 2018-11-16 新华三大数据技术有限公司 Data load method and device
CN110442578A (en) * 2019-07-30 2019-11-12 新华三大数据技术有限公司 Zipper table updating method, device, server and computer readable storage medium
CN113535481A (en) * 2021-07-20 2021-10-22 北京京东振世信息技术有限公司 Data backtracking method and device and nonvolatile computer readable storage medium
CN113535481B (en) * 2021-07-20 2024-05-17 北京京东振世信息技术有限公司 Data backtracking method and device and non-volatile computer readable storage medium

Also Published As

Publication number Publication date
CN101504664B (en) 2010-12-01

Similar Documents

Publication Publication Date Title
CN101504664B (en) Apparatus and method for extracting, converting and loading total source data
CN102542071B (en) Distributed data processing system and method
CN104462141B (en) Method, system and the storage engines device of a kind of data storage and inquiry
CN101464895A (en) Method, system and apparatus for updating internal memory data
CN107918621A (en) Daily record data processing method, device and operation system
CN102521145A (en) Java card system and space distribution processing method thereof
CN102375891A (en) Implementation tool for unloading and loading incremental data
CN105975502A (en) Method for realizing incremental data extract based on CDC (Change Data Capture) mode
CN102043789A (en) Method and device for updating data table
CN104504030B (en) A kind of indexing means towards power dispatching automation magnanimity message
CN103064908B (en) A kind of method by the quick duplicate removal list of internal memory
CN104636337B (en) A kind of data cleansing storage method for value-added tax
CN104298736A (en) Method and device for aggregating and connecting data as well as database system
CN102467525A (en) Document associating method and system
CN109522315A (en) A kind of data base processing method and system
CN107704573A (en) A kind of intelligent buffer method coupled with business
CN201465108U (en) Extraction, transformation and loading device for full source data
CN102831204B (en) A kind of file packing form possessing Version Control ability
CN107403012A (en) A kind of method for interchanging data and device
CN102096676B (en) Data updating and query control method and system
CN102081644A (en) Data storage method for storing data and data meanings separately
CN105556486B (en) Method and system for managing the memory dynamically distributed in computing unit automatically
CN103714068A (en) Method and device for just-in-time data updating
CN102486844A (en) Method and device for processing concurrent data in enterprise resource planning (ERP) system
CN102855288A (en) Method and device for processing difference data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant