CN105488187A - Method and device for extracting multi-source heterogeneous data increment

Info

Publication number
CN105488187A
Authority
CN
China
Prior art keywords
data
increment
time stamp
source database
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510867992.2A
Other languages
Chinese (zh)
Inventor
胡玉婷
刘伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
STAR SOFTWARE TECHNOLOGY CO LTD
Original Assignee
STAR SOFTWARE TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Abstract

The invention discloses a method and a device for multi-source heterogeneous incremental data extraction. The method comprises: parsing the transaction log of a source database, and obtaining from the parsing result the primary key information, data operation type, and operation timestamp information of the changed data tables in the source database; and determining the incrementally changed data according to the primary key information, the data operation type, and the operation timestamp information, and extracting the incrementally changed data into a data warehouse. This technical solution improves the accuracy, efficiency, and completeness of incremental data extraction.

Description

Method and device for multi-source heterogeneous incremental data extraction
Technical field
The present invention relates to the field of database technology, and in particular to a method and device for multi-source heterogeneous incremental data extraction.
Background art
With the development of information technology, more and more business data is being produced. For an enterprise, data is a form of wealth and an important strategic resource. Once an enterprise has accumulated a certain amount of data, mining and analyzing that data with data warehouse technology can uncover a great deal of valuable information. Building a data warehouse requires extracting data from various distributed, heterogeneous data sources, then cleaning, transforming, and loading it (Extraction, Transformation & Load, ETL) so that it finally enters the data warehouse. ETL is a very important step in implementing a data warehouse and usually accounts for about 60% to 80% of the total development time. The extraction efficiency and transformation quality of the ETL process directly affect the construction of the data warehouse and the validity of data mining results. Depending on the extraction mode, ETL extraction is broadly divided into full extraction and incremental extraction. Full extraction is generally used when the data warehouse has just been created or does not yet contain data, and it is very convenient to use. Incremental extraction, by contrast, operates only on data that was modified within a given time period, and it is an important part of later data warehouse maintenance. In the data integration field, incremental data extraction has become an indispensable key technology for improving data processing efficiency.
Because an enterprise's many kinds of business data reside on different platforms and use different data sources, incremental data in business databases across different regions is difficult to extract and integrate into a unified data warehouse. As a result, comprehensive and accurate business data cannot be provided, and decision makers cannot be given the information they need. Although existing ETL tools can extract, transform, and load data from multiple data sources, they have many shortcomings in efficiency and operability, and they cannot fully support heterogeneous data sources distributed across multiple regions. Establishing an efficient incremental ETL extraction process that supports multiple heterogeneous sources is therefore very important for later business data mining.
Incremental extraction must quickly and accurately obtain the data that has changed in the tables to be extracted since the last extraction, without placing too much pressure on the business system. In ETL practice, incremental extraction is more widely used than full extraction, and it is also more complex to implement. The methods commonly used today to capture changed data for incremental extraction are as follows:
(1) Trigger method
The main idea of the trigger method is to capture changed data by firing triggers. Triggers are created on the tables to be extracted, generally of three kinds: insert, update, and delete. When data in a source table changes, the trigger captures the changed data and saves it to an intermediate temporary table; an extraction thread then reads the changed data from the temporary table and loads it into the data warehouse.
(2) Timestamp method
The timestamp method captures modified records by date and time; its main idea is to identify records that have been operated on through updates to a timestamp. It requires that the source table have a timestamp field, or that one can be added. During extraction, the time of the last extraction is compared with the timestamp field of each current source table record, and the records that have been modified since then are loaded into the data warehouse.
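The timestamp comparison described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the field name `updated_at`, the sample rows, and the function name `changed_since` are hypothetical and do not come from the patent.

```python
from datetime import datetime


def changed_since(rows, last_extract_time):
    """Return the rows whose timestamp field is later than the last extraction."""
    return [row for row in rows if row["updated_at"] > last_extract_time]


# Hypothetical source-table rows, each carrying a timestamp field.
rows = [
    {"id": 1, "updated_at": datetime(2015, 1, 1, 8, 0)},
    {"id": 2, "updated_at": datetime(2015, 1, 2, 9, 30)},
]
last_run = datetime(2015, 1, 1, 12, 0)
changed = changed_since(rows, last_run)
```

A row that has been physically deleted no longer appears in `rows` at all, so this comparison has nothing to detect it with.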
(3) Full-table deletion method
The full-table deletion method is relatively simple to implement: before extraction, it first deletes all data in the full table and then reloads all data from the source table.
(4) Full-table comparison method
The full-table comparison method compares the records in the source table with the records in the target table one by one and captures new, inconsistent, and missing data, which correspond to insert, update, and delete operations respectively; the corresponding operation is then performed according to the comparison result. A newer variant has since been derived. It creates an MD5 temporary table with the same structure for the source table and for the target table. Each MD5 temporary table holds two fields: the primary key value of a record in the corresponding source or target table, and the MD5 checksum obtained by hashing the fields and data values of that record. During incremental extraction, the records and data in the source table and target table are first hashed with MD5, and the primary key values and hash results are stored in the corresponding MD5 temporary tables. The primary key values and checksums in the two temporary tables are then compared. If a primary key is present in both temporary tables but the MD5 checksums differ, the source data has changed and an Update operation is required. If a primary key exists in the MD5 temporary table of the source table but not in that of the target table, new data must be added and an Insert operation is performed. If a primary key does not exist in the MD5 temporary table of the source table but exists in that of the target table, the data has been deleted and a Delete operation is performed.
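The MD5 variant of the full-table comparison can be illustrated with a short Python sketch. It is a minimal illustration rather than the patented procedure: the two "MD5 temporary tables" are modeled as in-memory dictionaries, and the key column `id`, the sample rows, and the way a row is serialized before hashing are all assumptions.

```python
import hashlib


def md5_table(rows, key="id"):
    """Map each primary key to the MD5 checksum of the row's fields and values."""
    table = {}
    for row in rows:
        payload = "|".join(f"{name}={row[name]}" for name in sorted(row))
        table[row[key]] = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return table


def compare(source_rows, target_rows, key="id"):
    """Classify primary keys into Insert, Update, and Delete operations."""
    src = md5_table(source_rows, key)
    tgt = md5_table(target_rows, key)
    return {
        "insert": sorted(k for k in src if k not in tgt),
        "update": sorted(k for k in src if k in tgt and src[k] != tgt[k]),
        "delete": sorted(k for k in tgt if k not in src),
    }


# Hypothetical sample data: row 2 changed, row 3 was deleted, row 4 is new.
source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b2"}, {"id": 4, "name": "d"}]
target = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
result = compare(source, target)
```

Every row on both sides has to be hashed on each run, which is the performance cost attributed to this method.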
(5) Log table method
For a production database, a business log table can be created to record changes to specific business data. During incremental extraction, only the business log table needs to be read; the changed data is then captured and the corresponding operations are performed.
(6) Transaction log analysis method
The transaction log analysis method captures changed data by analyzing the database's own log files, thereby completing incremental extraction. During incremental extraction, the data manipulation language (DML) and data definition language (DDL) records in the source database's log files are obtained for a specified time, giving all operations performed since the last incremental extraction; the corresponding operations are then applied to the data warehouse to complete the current incremental extraction.
(7) Change data capture (CDC) method
The principle of this method is to identify changed data by analyzing the database's own logs. Oracle introduced Change Data Capture (CDC) for this purpose. CDC automatically captures the records that have been operated on since the last extraction: after insert, update, and delete operations on data tables in the source database, the affected data is extracted into a change table and is finally supplied to the data warehouse through views in a controlled manner to complete the extraction.
The shortcomings of each of the seven change-capture methods above are analyzed below:
(1) The trigger method requires three kinds of triggers to be created in the source business system, which means modifying the source database. If data changes very frequently, this places a heavy load on system performance, and users generally do not allow the source database to be modified.
(2) The timestamp method is relatively simple to implement, but it cannot capture delete operations and therefore does not support scenarios in which table rows are physically deleted. In addition, not every table in the source database has a timestamp field, so the timestamp method falls short in extraction completeness and cannot guarantee data accuracy.
(3) The full-table deletion method is the simplest to implement, but it suits only scenarios where extraction time is not critical or the data volume is small; for large data volumes the time cost is clearly unacceptable.
(4) The rules of the full-table comparison method are simple and easy to implement and are not intrusive to the source table structure, but records must be compared one by one, so performance is comparatively poor.
(5) The extraction rules of the log table method are simple and convenient and are not intrusive to the source table structure, but the log table must be created by the user when the tables are first built. This demands too much technical skill from the user, imposes a large system overhead, is not the best-performing option, and is troublesome to maintain later.
(6) The transaction log analysis method is similar to the log table method. Although it relies on the system's own transaction log and does not require the user to create anything, its performance is not the best either.
(7) The CDC method implements the ETL process by itself, but because business system database versions and products are not uniform, implementation is relatively complex and requires in-depth study. CDC products have been on the market only a short time and still contain some bugs, so they cannot yet be widely adopted.
From the above, existing incremental data extraction schemes have low accuracy and efficiency and insufficient completeness.
Summary of the invention
Embodiments of the present invention provide a method and device for multi-source heterogeneous incremental data extraction, in order to improve the accuracy, efficiency, and completeness of incremental extraction.
In one aspect, embodiments of the present invention provide a method for multi-source heterogeneous incremental data extraction, in order to improve the accuracy, efficiency, and completeness of incremental extraction. The method comprises:
Parsing the source database transaction log, and obtaining from the parsing result the primary key information, data operation type, and operation timestamp information of the changed data tables in the source database;
Determining the incrementally changed data according to the primary key information, data operation type, and operation timestamp information, and extracting the incrementally changed data into the data warehouse.
In another aspect, embodiments of the present invention further provide a device for multi-source heterogeneous incremental data extraction, in order to improve the accuracy, efficiency, and completeness of incremental extraction. The device comprises:
A parsing unit, configured to parse the source database transaction log and obtain, from the parsing result, the primary key information, data operation type, and operation timestamp information of the changed data tables in the source database;
An extraction unit, configured to determine the incrementally changed data according to the primary key information, data operation type, and operation timestamp information, and to extract the incrementally changed data into the data warehouse.
The traditional timestamp method in the prior art cannot capture physically deleted data, and current ETL tools have many shortcomings in efficiency and operability during extraction, transformation, and loading; they cannot fully support extraction and comparison across heterogeneous data sources distributed over multiple regions. The technical solution provided by the embodiments of the present invention is based on the database transaction log. The source database transaction log is first parsed to obtain a parsing result, from which the specific changed data of the database can be obtained. The primary key information, data operation type, operation timestamp information, and so on are read from that changed data; the incrementally changed data is then captured in combination with the traditional timestamp method; and finally batch operations are performed according to the operation type field to carry out the incremental extraction. Because the transaction log is parsed, batch operations can be set up according to the operation type field, which saves considerable time and system resources and greatly improves extraction efficiency. In addition, the transaction log method is not restricted to any particular type of database, so it supports multi-source heterogeneous incremental data extraction. It thereby ensures the accuracy and completeness of the data and reduces the pressure on the production system.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application; they do not limit the invention. In the drawings:
Fig. 1 is a schematic flowchart of the method for multi-source heterogeneous incremental data extraction in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the method for multi-source heterogeneous incremental data extraction in another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the table in the source database that records extraction timestamp information, in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the XML configuration method for query statements in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the Kettle transformation that performs incremental extraction in an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the device for multi-source heterogeneous incremental data extraction in an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the device for multi-source heterogeneous incremental data extraction in another embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and the accompanying drawings. The exemplary embodiments and their descriptions are intended to explain the present invention, not to limit it.
To address the problems that the traditional timestamp method in the prior art cannot capture physically deleted data, and that current ETL tools have many shortcomings in efficiency and operability during extraction, transformation, and loading and cannot fully support extraction from heterogeneous data sources distributed across multiple regions, this solution provides a method and device for multi-source heterogeneous incremental data extraction. The method is based on the database transaction log. With reliability assured, the database transaction log is first parsed to obtain a parsing result, from which the specific changed data of the database can be obtained. The primary key information, data operation type, operation timestamp information, and so on are read from that changed data. A temporary table is then created in the production database and the results are collected into it. The incrementally changed data is captured in combination with the traditional timestamp method, and batch operations are finally performed according to the operation type field to carry out the incremental extraction. The extracted data set is stored as a Zip file for subsequent transformation, loading, and data recovery. This method makes up for the timestamp method's inability to capture delete operations, ensures the accuracy and completeness of the data, reduces the pressure on the production system, saves considerable time and system resources, supports multi-source heterogeneous data extraction, and greatly improves the efficiency of incremental data extraction.
The technical solution provided by the present invention is described by the following steps:
1. Before data extraction, perform a reliability analysis of the database transaction log;
2. Parse the transaction log with the database tool LogMiner to obtain the primary key information, data operation type, operation timestamp information, and so on, and store them in a temporary table;
3. Judge incremental changes and perform the incremental extraction according to the timestamp and data operation type fields of the temporary table; the extracted data set is stored as a Zip file;
4. After the scheme has been executed, repeat from step 1.
The above steps form a virtuous cycle: adjusting the incremental extraction scheme at regular intervals makes it possible to track the running performance of the device dynamically and to improve the efficiency of incremental data extraction.
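Step 3 stores the extracted data set as a Zip file so that it can later be transformed, loaded, or used for recovery. The following Python sketch shows one way such an archive could be written and read back. The CSV layout, the member name `increment.csv`, and the column names are assumptions made for illustration; the patent does not specify the file layout inside the archive.

```python
import csv
import io
import zipfile


def write_extract_zip(buffer, rows, member="increment.csv"):
    """Serialize the extracted rows as CSV and store them in a Zip archive."""
    text = io.StringIO()
    writer = csv.DictWriter(text, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member, text.getvalue())


def read_extract_zip(buffer, member="increment.csv"):
    """Read the rows back, e.g. for later transformation, loading, or recovery."""
    with zipfile.ZipFile(buffer) as archive:
        text = archive.read(member).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical extracted rows; an in-memory buffer stands in for a file on disk.
rows = [{"id": "1", "action": "Insert"}, {"id": "2", "action": "Delete"}]
zip_buffer = io.BytesIO()
write_extract_zip(zip_buffer, rows)
restored = read_extract_zip(zip_buffer)
```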
The technical solution of the present invention is described in detail below.
In one aspect, embodiments of the present invention provide a method for multi-source heterogeneous incremental data extraction, in order to improve the accuracy, efficiency, and completeness of incremental extraction. Fig. 1 is a schematic flowchart of the method in one embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step 102: parse the source database transaction log, and obtain from the parsing result the primary key information, data operation type, and operation timestamp information of the changed data tables in the source database;
Step 103: determine the incrementally changed data according to the primary key information, data operation type, and operation timestamp information, and extract the incrementally changed data into the data warehouse.
Fig. 2 is a schematic flowchart of the method for multi-source heterogeneous incremental data extraction in another embodiment of the present invention. As shown in Fig. 2, the following may also be included before step 102:
Step 101: perform a reliability analysis of the source database transaction log and identify the source database transaction log that satisfies the reliability condition;
Parsing the source database transaction log then comprises: parsing the source database transaction log that satisfies the reliability condition.
In a specific implementation, reliability refers to the ability to complete a predetermined function under specified conditions and within a specified time. Because this solution is based on database transaction log analysis, the reliability of the transaction log determines the accuracy of operations on the target database. A reliability analysis of the database log is therefore required before data extraction.
The reliability condition is: the ratio of the speed of reading the source database transaction log to the speed of writing it is at least S/(S-L), where S is the physical space occupied by the source database transaction log file, L is the amount of physical space by which reading the log file lags behind writing it, and 1 ≤ L ≤ S-1. The analysis and derivation of this condition are as follows:
Suppose that the physical space occupied by the log file (all log files mentioned in the embodiments of the present invention are source database transaction log files) is Size(logfile), the log read speed is V_r, and the log write speed is V_w. Because log reads lag behind log writes, let Late(logfile) be the physical space by which the read operation lags behind the write operation. To ensure that database transaction log content is not lost and that log file records are read reliably, the following condition must be satisfied:
$$\frac{k \times \mathrm{Size}(\mathrm{logfile}) - \mathrm{Late}(\mathrm{logfile})}{V_w} \ge \frac{k \times \mathrm{Size}(\mathrm{logfile})}{V_r} \qquad (1)$$
Wherein, k is the number of times the log is read repeatedly (k ≥ 1, k is a positive integer).
Further derivation gives the following result:
$$\frac{V_r}{V_w} \ge \frac{k \times \mathrm{Size}(\mathrm{logfile})}{k \times \mathrm{Size}(\mathrm{logfile}) - \mathrm{Late}(\mathrm{logfile})} \;\Rightarrow\; \frac{V_r}{V_w} \ge \frac{1}{1 - \dfrac{\mathrm{Late}(\mathrm{logfile})}{k \times \mathrm{Size}(\mathrm{logfile})}} \qquad (2)$$
Let Size(logfile) = S and Late(logfile) = L; then:
$$\frac{V_r}{V_w} \ge \frac{1}{1 - \dfrac{\mathrm{Late}(\mathrm{logfile})}{k \times \mathrm{Size}(\mathrm{logfile})}} = \frac{1}{1 - \dfrac{L}{kS}} \qquad (3)$$
If k = 1, then:
$$\frac{V_r}{V_w} \ge \frac{1}{1 - \dfrac{L}{S}} \qquad (4)$$
$$\Rightarrow\; V_r \ge \frac{S}{S - L} \times V_w \qquad (5)$$
So, to ensure that database transaction log content is not lost, the log read speed must be at least S/(S-L) times the log write speed.
A file that is being written is locked against other read and write operations, hence the relation 1 ≤ L ≤ S-1. As the number of reads k increases, the required read/write speed ratio approaches 1. Only a log file obtained under the above condition is a reliable transaction log file, and only when the log file is reliable can the accuracy of data extraction based on it be guaranteed.
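As a numerical illustration of the reliability condition, the Python sketch below evaluates the lower bound kS/(kS-L) on the read/write speed ratio that follows from inequality (3). The function name and the sample values S = 100 and L = 20 (in any common unit of space) are assumptions chosen for the example.

```python
def min_read_write_ratio(S, L, k=1):
    """Lower bound on V_r / V_w so that no log content is lost: kS / (kS - L)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not 1 <= L <= S - 1:
        raise ValueError("L must satisfy 1 <= L <= S - 1")
    return (k * S) / (k * S - L)


# With k = 1 the bound reduces to S / (S - L), as in inequality (5).
ratio_k1 = min_read_write_ratio(S=100, L=20)         # 100 / 80
ratio_k10 = min_read_write_ratio(S=100, L=20, k=10)  # 1000 / 980
```

With these sample values the bound is 1.25 for a single read and falls toward 1 as k grows, which matches the remark that the ratio approaches 1 when the number of reads increases.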
With reliability assured, the parsing method for the database transaction log and the incremental extraction are set out below.
First, the parsing method for the database transaction log in step 102 is described.
In a specific implementation, every database transaction log mentioned in the embodiments of the present invention refers to the source database transaction log. The information recorded in it includes the change history of the database, the operation type (Insert, Update, Delete, etc.), the SCN (System Change Number, used to record the state and trajectory of the database over time) corresponding to each change, and information about the user who performed the operation. The raw log, however, is stored in binary form and cannot be understood by direct inspection. There are several ways to parse the transaction log in step 102; the following describes parsing with the database tool LogMiner.
The database transaction log is parsed with the database tool LogMiner. The log records not the original object names but only their internal numbers in the database. To make the parsed SQL easier to read, the original object names are needed, so LogMiner must be used to extract data dictionary information. LogMiner is introduced first:
LogMiner comprises two packages:
1. The dbms_logmnr_d package, which contains only one procedure, used to extract data dictionary information: dbms_logmnr_d.build(dictionary_filename varchar2, dictionary_location varchar2).
2. The dbms_logmnr package, which contains three stored procedures:
● dbms_logmnr.add_logfile(logfilename varchar2, options number), used to add or remove log files for analysis;
● dbms_logmnr.start_logmnr(start_scn number, end_scn number, start_time date, end_time date, dictfilename varchar2, options number), used to start the log analysis;
● dbms_logmnr.end_logmnr(), used to end the analysis session and reclaim the memory occupied by LogMiner.
The specific parsing procedure is shown in steps 1) to 6) below:
1) Create the data dictionary file directory
Here, the value /data/cyx/logmnr stored in the VALUE field is the directory in which the data dictionary file is stored; if VALUE is empty, the setting must be changed.
2) Create the data dictionary file
SQL> exec dbms_logmnr_d.build(dictionary_filename=>'dic.ora', dictionary_location=>'/data6/cyx/logmnr');
Here, dictionary_location is the location where the dictionary file is stored, i.e. the value of VALUE in step 1); dictionary_filename is the name of the dictionary file and can be chosen freely;
3) Add the transaction log file to be analyzed
SQL> exec dbms_logmnr.add_logfile(logfilename=>'/data6/cyx/rac1arch/arch_1_197.arc', options=>dbms_logmnr.new);
Here, logfilename is the path of the transaction log file;
The options parameter accepts three values:
NEW: creates a new log file list;
ADDFILE: adds a log file to the list;
REMOVEFILE: removes a log file from the list.
4) Analyze the log with LogMiner
SQL> exec dbms_logmnr.start_logmnr(dictfilename=>'/data6/cyx/logmnr/dic.ora');
5) View the log analysis result
SQL> select sql_redo from v$logmnr_contents where username='xxxx';
6) End the log analysis session and release memory
SQL> exec dbms_logmnr.end_logmnr();
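The ordering of steps 2) to 6) can be summarized with a small Python helper that only assembles the statement text; it does not connect to a database. The helper name `logminer_script` and the second archive log path (`arch_1_198.arc`) are hypothetical, and the plain string interpolation is acceptable only because the inputs here are fixed sample values. The first log file is added with the NEW option and later ones with ADDFILE, following the option descriptions above.

```python
def logminer_script(dict_dir, dict_file, log_files, username):
    """Assemble, in order, the statements of one LogMiner parsing session."""
    statements = [
        f"exec dbms_logmnr_d.build(dictionary_filename=>'{dict_file}', "
        f"dictionary_location=>'{dict_dir}');"
    ]
    for index, path in enumerate(log_files):
        # NEW starts a fresh log file list; ADDFILE appends to it.
        option = "dbms_logmnr.new" if index == 0 else "dbms_logmnr.addfile"
        statements.append(
            f"exec dbms_logmnr.add_logfile(logfilename=>'{path}', options=>{option});"
        )
    statements.append(
        f"exec dbms_logmnr.start_logmnr(dictfilename=>'{dict_dir}/{dict_file}');"
    )
    statements.append(
        f"select sql_redo from v$logmnr_contents where username='{username}';"
    )
    statements.append("exec dbms_logmnr.end_logmnr();")
    return statements


script = logminer_script(
    "/data6/cyx/logmnr",
    "dic.ora",
    ["/data6/cyx/rac1arch/arch_1_197.arc",
     "/data6/cyx/rac1arch/arch_1_198.arc"],  # second file is hypothetical
    "xxxx",
)
```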
The parsing result of the database transaction log is obtained as described above. From it, the specific changed data of the database can be obtained, and the primary key information, data operation type, and operation timestamp information are read from that changed data.
Next, a temporary table is created to store the primary key information, data operation type, operation timestamp information, and so on, for use when incremental changes are judged later. The table structure is shown in Table 1 below:
Table 1. Temporary table for database transaction log parsing results
Here, LOG_ACTION takes one of the values Insert, Update, or Delete and indicates the type of operation that occurred on the record.
When incremental data extraction is described later, the extraction decision is based on the operation timestamp field and the operation type field of this temporary table. Because the database transaction log is parsed, batch operations can be set up according to the operation type field, which saves considerable time and system resources and greatly improves extraction efficiency. In addition, the transaction log method is not restricted to any particular type of database and supports multi-source heterogeneous incremental data extraction.
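Batching by the operation type field can be sketched as follows. The example groups the primary keys of temporary-table records by LOG_ACTION so that each operation type can be applied in one batch. The column name `PK_VALUE` and the sample records are assumptions; of the temporary-table fields, only LOG_ACTION and LOG_TIMESTAMP are named in the text.

```python
from collections import defaultdict


def batch_by_action(records):
    """Group primary keys by operation type so each type can be applied in one batch."""
    batches = defaultdict(list)
    for record in records:
        batches[record["LOG_ACTION"]].append(record["PK_VALUE"])
    return dict(batches)


# Hypothetical parsing results as they might be held in the temporary table.
records = [
    {"PK_VALUE": 101, "LOG_ACTION": "Insert", "LOG_TIMESTAMP": "2015-12-01 10:00:00"},
    {"PK_VALUE": 102, "LOG_ACTION": "Update", "LOG_TIMESTAMP": "2015-12-01 10:05:00"},
    {"PK_VALUE": 103, "LOG_ACTION": "Delete", "LOG_TIMESTAMP": "2015-12-01 10:06:00"},
    {"PK_VALUE": 104, "LOG_ACTION": "Delete", "LOG_TIMESTAMP": "2015-12-01 10:07:00"},
]
batches = batch_by_action(records)
```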
Next, the incremental extraction of step 103 is described.
In a specific implementation, step 103 may include:
Determining the incrementally changed data and its type according to the data operation type and operation timestamp information; that is, judging whether an incremental change has occurred, identifying the changed data, and identifying its type (changed data from a delete operation, or changed data from an update or insert operation);
Extracting the incrementally changed data into the data warehouse according to the primary key information and the type of the incremental change.
Specifically, this may include:
Determining the incrementally changed data and its type according to the data operation type and operation timestamp information in the temporary table;
Extracting the incrementally changed data into the data warehouse according to the primary key information in the temporary table and the type of the incremental change.
First, how the incrementally changed data is determined (judged) is described:
In a specific implementation, the method for incremental extraction of multi-source heterogeneous data provided by the embodiment of the present invention further comprises creating the temporary table mentioned above and storing the primary key information, data operation type and operation timestamp in it;
Determining the incrementally changed data and its type according to the data operation type and the operation timestamp comprises:
When the data operation type field in the temporary table is a delete operation, comparing the operation timestamp in the temporary table with the timestamp in the source database table that records extraction timestamps; if the operation timestamp in the temporary table is greater, the target data is determined to be incrementally changed data of the delete type;
When the data operation type field in the temporary table is an update or insert operation, if the creation timestamp that already exists or has been configured in the source database is greater than the timestamp of the previous incremental extraction, or the modification timestamp is greater than the timestamp of the previous incremental extraction, the target data is determined to be incrementally changed data of the update or insert type.
The judgment of incremental changes is illustrated below with a small example:
Incremental changes are judged from the operation timestamp (LOG_TIMESTAMP) field of the temporary table described above. This field is compared with the table CDC_TIME, which records extraction timestamps in the source database and stores two timestamp fields, CURRENT_LOAD and LAST_LOAD. CURRENT_LOAD records the timestamp of the current incremental extraction, and LAST_LOAD records the timestamp of the previous incremental extraction. The fields of this table are of timestamp type, and the structure of CDC_TIME is shown in Fig. 3.
To improve extraction efficiency, this scheme also uses the timestamp method to judge incremental changes, so that LOG_TIMESTAMP in the temporary table does not have to be compared row by row with LAST_LOAD in CDC_TIME. Preferably, two judgment rules are applied, depending on the operation type field in the temporary table:
a) If the operation type field in the temporary table is Delete, LOG_TIMESTAMP in the temporary table is compared with LAST_LOAD in CDC_TIME; if LOG_TIMESTAMP is greater than LAST_LOAD, the row is considered incrementally changed data.
b) If the operation type field in the temporary table is Update or Insert, the data is judged to be incrementally changed simply when the creation timestamp that already exists or has been configured in the source database is greater than LAST_LOAD, or when the modification timestamp is greater than LAST_LOAD. Alternatively, incrementally changed data of the update or insert type can also be determined by comparing LOG_TIMESTAMP in the temporary table with LAST_LOAD in CDC_TIME.
In addition, "greater than" in the embodiments of the present invention means later in time. The creation timestamp mentioned above corresponds to an Insert operation, and the modification timestamp corresponds to an Update operation.
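The two judgment rules can be sketched as a pair of small predicates. This is a minimal sketch rather than the patented implementation; the function and parameter names are assumptions introduced for illustration, and "greater than" is taken to mean strictly later in time, as stated above.

```python
from datetime import datetime
from typing import Optional


def is_incremental_delete(log_timestamp: datetime, last_load: datetime) -> bool:
    """Rule a): a Delete row is incremental if LOG_TIMESTAMP is later than LAST_LOAD."""
    return log_timestamp > last_load


def is_incremental_upsert(created_at: Optional[datetime],
                          modified_at: Optional[datetime],
                          last_load: datetime) -> bool:
    """Rule b): an Insert/Update row is incremental if its creation timestamp or
    its modification timestamp is later than LAST_LOAD."""
    return any(ts is not None and ts > last_load for ts in (created_at, modified_at))
```

For example, with LAST_LOAD at 2015-12-01, a row created on 2015-11-01 and modified on 2015-12-02 counts as incremental under rule b), while a row created on 2015-11-01 and never modified does not.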
Combining database transaction log parsing with the timestamp method makes up for the timestamp method's inability to capture delete operations and supports scenarios in which data is physically deleted. At the same time, it retains the simplicity and speed of the timestamp method and supports the concurrent incremental extraction, configured through XML, that is described later, so that incremental data is captured quickly, efficiently and accurately.
Once data has been judged to be incrementally changed, incremental extraction is performed in one of two ways, according to the rules above:
In a specific implementation, extracting the incrementally changed data into the data warehouse according to the primary key information and the type of the incremental change may comprise:
When the target data is determined to be incrementally changed data of the delete type, extracting the primary key information parsed into the temporary table and, before loading data, performing a delete operation on the data warehouse according to the primary key information.
When the target data is determined to be incrementally changed data of the update or insert type, configuring the query statement in an Extensible Markup Language (XML) file and performing the extraction, then performing a data update or insert operation on the data warehouse according to the primary key information.
Incremental extraction is illustrated below with a small example:
If the operation type field in the temporary table is Delete and LOG_TIMESTAMP is greater than LAST_LOAD, the primary keys parsed into the temporary table are extracted, and before data is loaded a delete operation is performed on the data warehouse according to those primary keys.
If the operation type field in the temporary table is Update or Insert, the data is judged to be incrementally changed simply when the creation timestamp that already exists or has been configured in the source database is greater than LAST_LOAD, or when the modification timestamp is greater than LAST_LOAD. The query statement is configured in an XML file and the extraction is performed, after which a data update or insert operation is performed on the data warehouse according to the primary key.
The XML configuration format and an example of a Kettle transformation that performs incremental extraction are shown in Fig. 4 and Fig. 5: Fig. 4 is a schematic diagram of the XML configuration method for query statements in the embodiment of the present invention, and Fig. 5 is a schematic diagram of a Kettle transformation that performs incremental extraction in the embodiment of the present invention. Kettle is a third-party data extraction tool deployed on the ETL server, and is mentioned again in the later description of the device for incremental extraction of multi-source heterogeneous data.
When incremental extraction is performed in Kettle, delete operations must be carried out first, followed by insert or update operations, so as to avoid primary key conflicts.
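The delete-first ordering can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database as a stand-in for the data warehouse and does not reproduce Kettle itself; the table name `dw_table`, its columns, and the function `apply_increment` are hypothetical names used only for this example.

```python
import sqlite3


def apply_increment(conn, deleted_keys, upsert_rows):
    """Apply deletions first, then inserts/updates, all keyed on the primary key."""
    with conn:  # commit on success, roll back on error
        conn.executemany("DELETE FROM dw_table WHERE id = ?",
                         [(key,) for key in deleted_keys])
        # SQLite's INSERT OR REPLACE stands in for "update or insert by primary key".
        conn.executemany("INSERT OR REPLACE INTO dw_table (id, name) VALUES (?, ?)",
                         upsert_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO dw_table VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.commit()

apply_increment(conn, deleted_keys=[2], upsert_rows=[(1, "a2"), (3, "c")])
rows = conn.execute("SELECT id, name FROM dw_table ORDER BY id").fetchall()
# rows is now [(1, 'a2'), (3, 'c')]
```

Running the deletes before the upserts means that a key that was deleted and later re-created in the source never collides with a stale warehouse row.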
Storage of the incremental data is described below.
In a specific implementation, as shown in Fig. 2, the technical scheme provided by the embodiment of the present invention may further comprise step 104:
Storing the extracted incrementally changed data in a text (txt) file;
Storing the txt file containing the incrementally changed data as a Zip archive.
In one embodiment, storing the extracted incrementally changed data in a txt file in step 104 comprises:
Storing the incrementally changed data of the delete type in a first txt file;
Storing the incrementally changed data of the update or insert type in a second txt file.
Specifically, after the extraction has been performed, the extracted incrementally changed data is placed in txt files and stored as a Zip archive. Following the two rules of incremental extraction, the primary keys of deleted (Delete) data are stored in one txt file whose name takes the form "D_XXXLOG_TIMESTAMP", and inserted (Insert) and updated (Update) data are stored together in another txt file. When the data is loaded, the file names can therefore be checked first: if there is a file whose name begins with "D_", the delete operation is performed first, followed by the insert or update operations, so that no primary key conflict arises.
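The storage step and the "D_"-first loading order can be sketched as follows. This is an illustrative sketch under assumptions: the file names `D_example.txt` and `IU_example.txt` are placeholders (the text only specifies the "D_" prefix for the delete file), and the one-value-per-line layout inside each txt file is also assumed.

```python
import io
import zipfile


def pack_increment(delete_keys, upsert_lines,
                   delete_name="D_example.txt", upsert_name="IU_example.txt"):
    """Write the delete keys and the insert/update rows to two txt files inside one Zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(delete_name, "\n".join(delete_keys))
        zf.writestr(upsert_name, "\n".join(upsert_lines))
    return buf.getvalue()


def load_order(names):
    """Return file names with the "D_" (delete) files first, so deletes are applied before upserts."""
    return sorted(names, key=lambda name: (not name.startswith("D_"), name))
```

A loader would open the archive, call `load_order` on `ZipFile.namelist()`, and process the files in the returned order.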
In a specific implementation, as shown in Fig. 2, the method for incremental extraction of multi-source heterogeneous data provided by the embodiment of the present invention may further comprise step 105: updating the timestamp information in the source database table that records extraction timestamps.
Specifically, after the extraction has finished, the temporary table that was created in the source database from the transaction log is deleted, and LAST_LOAD is updated to the timestamp of this incremental extraction.
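Step 105 can be sketched against an in-memory SQLite database standing in for the source database. CDC_TIME, CURRENT_LOAD and LAST_LOAD are the names used in the text; the temporary table name `TMP_LOG_RESULT` and the function `finish_extraction` are hypothetical, and the sketch assumes that CURRENT_LOAD already holds the timestamp of the extraction that has just completed.

```python
import sqlite3


def finish_extraction(conn, temp_table="TMP_LOG_RESULT"):
    """Drop the temporary log table and roll LAST_LOAD forward to this run's timestamp."""
    with conn:  # commit on success, roll back on error
        conn.execute(f'DROP TABLE IF EXISTS "{temp_table}"')
        conn.execute("UPDATE CDC_TIME SET LAST_LOAD = CURRENT_LOAD")
```

Updating LAST_LOAD only after the extraction has finished means that a failed run leaves the old value in place, so the next run picks up the same changes again.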
Based on the same inventive concept, an embodiment of the present invention also provides a device for incremental extraction of multi-source heterogeneous data, as described in the following embodiments. Because the device solves the problem on a principle similar to that of the method for incremental extraction of multi-source heterogeneous data, the implementation of the device may refer to the implementation of the method, and repeated parts are not described again. As used below, the term "unit" or "module" may be a combination of software and/or hardware that realizes a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 6 is a schematic structural diagram of the device for incremental extraction of multi-source heterogeneous data in an embodiment of the present invention. As shown in Fig. 6, the device comprises:
Parsing unit 20, configured to parse the source database transaction log and obtain, from the parsing result, the primary key information, data operation type and operation timestamp of the changed data tables in the source database;
Extracting unit 30, configured to determine the incrementally changed data according to the primary key information, data operation type and operation timestamp, and to extract the incrementally changed data into the data warehouse.
In a specific implementation, parsing unit 20 parses the transaction log file of the database to obtain the specific changed data of the database, and reads from that data the primary key information, data operation type, operation timestamp and so on. Among the data operation types, Insert indicates that the change is an insertion, Update that it is an update, and Delete that it is a deletion. A temporary table is also created in the source database to store the transaction log information obtained above.
In a specific implementation, extraction is performed according to the operation timestamp and operation type fields in the temporary table produced by parsing unit 20. The timestamp is compared with the table CDC_TIME, which records extraction timestamps in the source database and stores two timestamp fields, CURRENT_LOAD and LAST_LOAD. The judgment rules applied by extracting unit 30 are as follows:
a) If the operation type field in the temporary table is Delete, the LOG_TIMESTAMP of the row is compared with LAST_LOAD in the CDC_TIME table; if LOG_TIMESTAMP is greater than LAST_LOAD, the primary keys parsed into the temporary table are extracted, and before data is loaded a delete operation is performed on the data warehouse according to those primary keys.
b) If the operation type field in the temporary table is Update or Insert, the data is judged to be incrementally changed simply when the creation timestamp that already exists or has been configured in the source database is greater than LAST_LOAD, or when the modification timestamp is greater than LAST_LOAD. The query statement is configured in an XML file and the extraction is performed, after which a data update or insert operation is performed on the data warehouse according to the primary key.
Fig. 7 is a schematic structural diagram of the device for incremental extraction of multi-source heterogeneous data in another embodiment of the present invention. As shown in Fig. 7, the device may further comprise:
Transaction log reliability analysis unit 10, configured to perform reliability analysis on the source database transaction log and identify the source database transaction log that satisfies the reliability condition;
Parsing unit 20 is specifically configured to parse the source database transaction log that satisfies the reliability condition.
In one embodiment, the reliability condition is that the ratio of the speed of reading the source database transaction log to the speed of writing the source database transaction log is S/(S-L), where S is the physical space occupied by the source database transaction log file, L is the amount of physical space by which reading of the source database transaction log file lags behind writing, and 1≤L≤S-1.
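As a small worked illustration of the ratio S/(S-L), the helper below computes it exactly for given S and L. This is only a sketch: the function name is hypothetical, and S and L are assumed to be integers expressed in the same unit of physical space.

```python
from fractions import Fraction


def read_write_speed_ratio(s: int, l: int) -> Fraction:
    """Return S/(S-L), the read-to-write speed ratio stated in the reliability condition."""
    if not 1 <= l <= s - 1:
        raise ValueError("L must satisfy 1 <= L <= S - 1")
    return Fraction(s, s - l)
```

For instance, with S = 100 and L = 20 in the same unit, the ratio is 100/80 = 5/4. The constraint 1≤L≤S-1 keeps the denominator positive, so the ratio is always defined and greater than 1.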
In one embodiment, extracting unit 30 may comprise:
An incremental change determining unit, configured to determine the incrementally changed data and its type according to the data operation type and the operation timestamp;
An incremental change extracting unit, configured to extract the incrementally changed data into the data warehouse according to the primary key information and the type of the incremental change.
In one embodiment, the device for incremental extraction of multi-source heterogeneous data provided by the embodiment of the present invention further comprises a temporary table creation unit, configured to create a temporary table and store the primary key information, data operation type and operation timestamp in it;
The incremental change determining unit may comprise:
A first determining unit, configured to, when the data operation type field in the temporary table is a delete operation, compare the operation timestamp in the temporary table with the timestamp in the source database table that records extraction timestamps; if the operation timestamp in the temporary table is greater, the target data is determined to be incrementally changed data of the delete type;
A second determining unit, configured to, when the data operation type field in the temporary table is an update or insert operation, determine that the target data is incrementally changed data of the update or insert type if the creation timestamp that already exists or has been configured in the source database is greater than the timestamp of the previous incremental extraction, or the modification timestamp is greater than the timestamp of the previous incremental extraction.
In one embodiment, the incremental change extracting unit may comprise:
A first extracting unit, configured to, when the target data is determined to be incrementally changed data of the delete type, extract the primary key information parsed into the temporary table and, before data is loaded, perform a delete operation on the data warehouse according to the primary key information.
A second extracting unit, configured to, when the target data is determined to be incrementally changed data of the update or insert type, configure the query statement in an Extensible Markup Language (XML) file and perform the extraction, then perform a data update or insert operation on the data warehouse according to the primary key information.
In one embodiment, as shown in Fig. 7, the device for incremental extraction of multi-source heterogeneous data in the embodiment of the present invention may further comprise storage unit 40, configured to store the extracted incrementally changed data in a text (txt) file and to store that txt file as a Zip archive.
In one embodiment, storage unit 40 is specifically configured to:
Store the incrementally changed data of the delete type in a first txt file;
Store the incrementally changed data of the update or insert type in a second txt file.
In a specific implementation, after the extraction has been performed, the extracted incrementally changed data is placed in txt files and stored as a Zip archive. Following the extraction rules of the extracting unit, the primary keys of deleted (Delete) data are stored in one txt file, and inserted (Insert) and updated (Update) data are stored together in another txt file, which makes it easy to store the two kinds of data separately and to load them again later.
In one embodiment, as shown in Fig. 7, the device for incremental extraction of multi-source heterogeneous data in the embodiment of the present invention may further comprise timestamp update unit 50, configured to update the timestamp information in the source database table that records extraction timestamps.
In a specific implementation, after the extraction has finished, the temporary table that was created in the source database from the transaction log is deleted, and LAST_LOAD is updated to the timestamp of this incremental extraction.
Of course, the device for incremental extraction of multi-source heterogeneous data in the embodiment of the present invention may further comprise:
A configuration unit. According to the above method for incremental extraction of multi-source heterogeneous data, the third-party data extraction tool Kettle needs to be deployed on each ETL server and configured with the JNDI database connection information of the source databases and the data warehouse, so that distributed extraction across multiple cities is supported. At the same time, it must be ensured, or configured, that the source database has a "creation timestamp (Creatdt)" field and a "last modification timestamp (Modifydt)" field, that both fields are of timestamp type, and that their precision is to the millisecond.
In summary, the technical scheme provided by the present invention provides: a method of obtaining incremental changes in multi-source heterogeneous data by combining the database transaction log with timestamps; a way of querying data in the source database through XML file configuration, performing concurrent incremental extraction with the Kettle tool, and storing the generated data as Zip archives; an analysis method for verifying the reliability of extraction based on the database transaction log; and an incremental data extraction method and device.
From the above description of the embodiments of the present invention, the technical scheme provided by the embodiments has the following advantageous technical effects:
(1) Based on the database transaction log file, the obtained log file is parsed to obtain the detailed operations on the data and their timestamps, in particular delete operations. This makes up for the inability of the timestamp method to capture delete operations and supports scenarios in which data is physically deleted. In addition, the transaction log method is not restricted to any particular type of database, so it supports incremental extraction of multi-source heterogeneous data. Combining the two methods allows incremental data to be extracted quickly, efficiently and accurately;
(2) Incremental extraction is performed concurrently through XML configuration, which reduces the load on the production database system. The generated extraction files are stored as Zip files, which makes reloading and data recovery convenient;
(3) This scheme performs incremental extraction by combining database transaction log analysis with timestamps, and the reliability of the transaction log determines the accuracy of the operations performed on the data warehouse;
(4) The accuracy and completeness of the data can be ensured, the load on the production system is reduced, considerable time and system resources are saved, extraction of multi-source heterogeneous data is supported, and the efficiency of incremental data extraction is greatly improved.
Obviously, those skilled in the art should understand that the modules, devices or steps of the above embodiments of the present invention can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be performed in an order different from the one given here, or they can each be made into separate integrated circuit modules, or several of the modules or steps can be made into a single integrated circuit module. Thus, the embodiments of the present invention are not restricted to any specific combination of hardware and software.
The specific embodiments described above further explain the purpose, technical scheme and beneficial effects of the present invention. It should be understood that the foregoing are only specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (15)

1. A method for incremental extraction of multi-source heterogeneous data, characterized by comprising:
Parsing the source database transaction log, and obtaining from the parsing result the primary key information, data operation type and operation timestamp of the changed data tables in the source database;
Determining the incrementally changed data according to the primary key information, data operation type and operation timestamp, and extracting the incrementally changed data into the data warehouse.
2. The method for incremental extraction of multi-source heterogeneous data as claimed in claim 1, characterized in that, before parsing the source database transaction log, the method further comprises: performing reliability analysis on the source database transaction log and identifying the source database transaction log that satisfies a reliability condition;
Parsing the source database transaction log comprises: parsing the source database transaction log that satisfies the reliability condition.
3. The method for incremental extraction of multi-source heterogeneous data as claimed in claim 2, characterized in that the reliability condition is that the ratio of the speed of reading the source database transaction log to the speed of writing the source database transaction log is S/(S-L), where S is the physical space occupied by the source database transaction log file, L is the amount of physical space by which reading of the source database transaction log file lags behind writing, and 1≤L≤S-1.
4. The method for incremental extraction of multi-source heterogeneous data as claimed in claim 1, characterized in that determining the incrementally changed data according to the primary key information, data operation type and operation timestamp, and extracting the incrementally changed data into the data warehouse, comprises:
Determining the incrementally changed data and its type according to the data operation type and the operation timestamp;
Extracting the incrementally changed data into the data warehouse according to the primary key information and the type of the incremental change.
5. The method for incremental extraction of multi-source heterogeneous data as claimed in claim 4, characterized by further comprising: creating a temporary table, and storing the primary key information, data operation type and operation timestamp in the temporary table;
Determining the incrementally changed data and its type according to the data operation type and the operation timestamp comprises:
When the data operation type field in the temporary table is a delete operation, comparing the operation timestamp in the temporary table with the timestamp in the source database table that records extraction timestamps; if the operation timestamp in the temporary table is greater, determining that the target data is incrementally changed data of the delete type;
When the data operation type field in the temporary table is an update or insert operation, if the creation timestamp that already exists or has been configured in the source database is greater than the timestamp of the previous incremental extraction, or the modification timestamp is greater than the timestamp of the previous incremental extraction, determining that the target data is incrementally changed data of the update or insert type.
6. The method for incremental extraction of multi-source heterogeneous data as claimed in claim 5, characterized in that extracting the incrementally changed data into the data warehouse according to the primary key information and the type of the incremental change comprises:
When the target data is determined to be incrementally changed data of the delete type, extracting the primary key information parsed into the temporary table and, before loading data, performing a delete operation on the data warehouse according to the primary key information;
When the target data is determined to be incrementally changed data of the update or insert type, configuring the query statement in an Extensible Markup Language (XML) file and performing the extraction, then performing a data update or insert operation on the data warehouse according to the primary key information.
7. The method for incremental extraction of multi-source heterogeneous data as claimed in claim 1, characterized by further comprising:
Storing the extracted incrementally changed data in a text (txt) file;
Storing the txt file containing the incrementally changed data as a Zip archive.
8. The method for incremental extraction of multi-source heterogeneous data as claimed in claim 1, characterized by further comprising: updating the timestamp information in the source database table that records extraction timestamps.
9. A device for incremental extraction of multi-source heterogeneous data, characterized by comprising:
A parsing unit, configured to parse the source database transaction log and obtain, from the parsing result, the primary key information, data operation type and operation timestamp of the changed data tables in the source database;
An extracting unit, configured to determine the incrementally changed data according to the primary key information, data operation type and operation timestamp, and to extract the incrementally changed data into the data warehouse.
10. The device for incremental extraction of multi-source heterogeneous data as claimed in claim 9, characterized by further comprising: a transaction log reliability analysis unit, configured to perform reliability analysis on the source database transaction log and identify the source database transaction log that satisfies a reliability condition;
The parsing unit is specifically configured to parse the source database transaction log that satisfies the reliability condition.
11. The device for incremental extraction of multi-source heterogeneous data as claimed in claim 10, characterized in that the reliability condition is that the ratio of the speed of reading the source database transaction log to the speed of writing the source database transaction log is S/(S-L), where S is the physical space occupied by the source database transaction log file, L is the amount of physical space by which reading of the source database transaction log file lags behind writing, and 1≤L≤S-1.
12. The device for incremental extraction of multi-source heterogeneous data as claimed in claim 9, characterized in that the extracting unit comprises:
An incremental change determining unit, configured to determine the incrementally changed data and its type according to the data operation type and the operation timestamp;
An incremental change extracting unit, configured to extract the incrementally changed data into the data warehouse according to the primary key information and the type of the incremental change.
13. The device for incremental extraction of multi-source heterogeneous data as claimed in claim 12, characterized by further comprising: a temporary table creation unit, configured to create a temporary table and store the primary key information, data operation type and operation timestamp in the temporary table;
The incremental change determining unit comprises:
A first determining unit, configured to, when the data operation type field in the temporary table is a delete operation, compare the operation timestamp in the temporary table with the timestamp in the source database table that records extraction timestamps; if the operation timestamp in the temporary table is greater, the target data is determined to be incrementally changed data of the delete type;
A second determining unit, configured to, when the data operation type field in the temporary table is an update or insert operation, determine that the target data is incrementally changed data of the update or insert type if the creation timestamp that already exists or has been configured in the source database is greater than the timestamp of the previous incremental extraction, or the modification timestamp is greater than the timestamp of the previous incremental extraction.
14. The device for incremental extraction of multi-source heterogeneous data as claimed in claim 13, characterized in that the incremental change extracting unit comprises:
A first extracting unit, configured to, when the target data is determined to be incrementally changed data of the delete type, extract the primary key information parsed into the temporary table and, before data is loaded, perform a delete operation on the data warehouse according to the primary key information;
A second extracting unit, configured to, when the target data is determined to be incrementally changed data of the update or insert type, configure the query statement in an Extensible Markup Language (XML) file and perform the extraction, then perform a data update or insert operation on the data warehouse according to the primary key information.
15. The device for incremental extraction of multi-source heterogeneous data as claimed in claim 9, characterized by further comprising: a timestamp update unit, configured to update the timestamp information in the source database table that records extraction timestamps.
CN201510867992.2A 2015-12-02 2015-12-02 Method and device for extracting multi-source heterogeneous data increment Pending CN105488187A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510867992.2A CN105488187A (en) 2015-12-02 2015-12-02 Method and device for extracting multi-source heterogeneous data increment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510867992.2A CN105488187A (en) 2015-12-02 2015-12-02 Method and device for extracting multi-source heterogeneous data increment

Publications (1)

Publication Number Publication Date
CN105488187A (en) 2016-04-13

Family

ID=55675161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510867992.2A Pending CN105488187A (en) 2015-12-02 2015-12-02 Method and device for extracting multi-source heterogeneous data increment

Country Status (1)

Country Link
CN (1) CN105488187A (en)

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126753A (en) * 2016-08-23 2016-11-16 易联众信息技术股份有限公司 The method of increment extractions based on big data
CN106326470A (en) * 2016-08-31 2017-01-11 无锡雅座在线科技发展有限公司 Streaming big data processing method and device
CN106407360A (en) * 2016-09-07 2017-02-15 广州视源电子科技股份有限公司 Data processing method and device
CN106682140A (en) * 2016-12-20 2017-05-17 华北计算技术研究所(中国电子科技集团公司第十五研究所) Multi-system user incremental synchronization method based on timestamps and mapping strategies
CN106844507A (en) * 2016-12-27 2017-06-13 星环信息科技(上海)有限公司 A kind of method and apparatus of data batch processing
CN107122424A (en) * 2017-04-07 2017-09-01 南京南瑞集团公司 A kind of relational database daily record abstracting method
CN107347062A (en) * 2017-06-19 2017-11-14 北京开数科技有限公司 A kind of method, electronic equipment and the readable storage medium storing program for executing of daily record data processing
CN107436902A (en) * 2016-05-27 2017-12-05 北京京东尚科信息技术有限公司 Data pick-up method and system based on mass data
CN107748752A (en) * 2017-09-05 2018-03-02 新智云数据服务有限公司 A kind of data processing method and device
CN108399256A (en) * 2018-03-06 2018-08-14 北京慧萌信安软件技术有限公司 Heterogeneous database content synchronization method, device and middleware
CN108563658A (en) * 2017-12-29 2018-09-21 邵阳学院 A kind of method and apparatus of multi-platform data synchronization updating
CN108681590A (en) * 2018-05-15 2018-10-19 普信恒业科技发展(北京)有限公司 Incremental data processing method and processing device, computer equipment, computer storage media
CN108829830A (en) * 2018-06-15 2018-11-16 四川众之金科技有限公司 Data processing method and device
CN109241156A (en) * 2018-07-31 2019-01-18 安徽四创电子股份有限公司 The method updated based on ETL tool from relevant database to non-relational database
CN109254967A (en) * 2018-08-29 2019-01-22 河南智慧云大数据有限公司 A kind of depth analysis method and device based on multi-source heterogeneous mass data
CN109408480A (en) * 2018-09-29 2019-03-01 武汉达梦数据库有限公司 The method and system read based on ORACLE multinode RAC log based on SCN alignment
CN110019477A (en) * 2017-12-27 2019-07-16 航天信息股份有限公司 A kind of method and system carrying out big data processing using HIVE backup table
CN110019254A (en) * 2017-07-17 2019-07-16 中兴通讯股份有限公司 Processing method, device and the computer readable storage medium of planning region increment record
CN110147362A (en) * 2019-04-04 2019-08-20 中电科大数据研究院有限公司 One kind is based on the acquisition of event driven DOC DATA and processing system and its method
CN110222121A (en) * 2019-06-14 2019-09-10 浪潮软件股份有限公司 A kind of SQL Server database increment synchronization realization method and system based on CDC mode
CN110457358A (en) * 2019-07-30 2019-11-15 新华三大数据技术有限公司 A kind of information collecting method, device, server and computer readable storage medium
CN110569222A (en) * 2019-08-23 2019-12-13 浙江大搜车软件技术有限公司 link tracking method and device, computer equipment and readable storage medium
CN110609860A (en) * 2018-05-29 2019-12-24 中国移动通信集团重庆有限公司 Data ETL processing method, device, equipment and storage medium
CN110674154A (en) * 2019-09-26 2020-01-10 浪潮软件股份有限公司 Spark-based method for inserting, updating and deleting data in Hive
CN110727724A (en) * 2019-09-09 2020-01-24 上海陆家嘴国际金融资产交易市场股份有限公司 Data extraction method and device, computer equipment and storage medium
CN111104445A (en) * 2019-12-06 2020-05-05 杭州数梦工场科技有限公司 Data synchronization method, device and equipment
CN111813845A (en) * 2020-06-29 2020-10-23 平安国际智慧城市科技股份有限公司 ETL task-based incremental data extraction method, device, equipment and medium
CN111881136A (en) * 2020-07-29 2020-11-03 山东健康医疗大数据有限公司 Method for realizing incremental data management in medical industry
CN112231301A (en) * 2020-10-21 2021-01-15 黄河水利委员会黄河水利科学研究院 Yellow river water sand change data warehouse
CN112286892A (en) * 2020-07-01 2021-01-29 上海柯林布瑞信息技术有限公司 Real-time data synchronization method and device for post-relational database, storage medium and terminal
CN112486924A (en) * 2020-12-17 2021-03-12 深圳软牛科技有限公司 Method and device for searching file deletion time in NTFS (New technology File System) and electronic equipment
WO2021174696A1 (en) * 2020-03-06 2021-09-10 平安科技(深圳)有限公司 Data updating method and apparatus, computer device, and storage medium
CN113641742A (en) * 2021-08-05 2021-11-12 广东电网有限责任公司 Data extraction method, device, equipment and storage medium
CN113779048A (en) * 2020-06-18 2021-12-10 北京沃东天骏信息技术有限公司 Data processing method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216987A (en) * 2014-09-04 2014-12-17 浪潮通用软件有限公司 Timestamp-based method for capturing incremental data and supporting delete operation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
贾艳凯: "多源异构增量数据抽取方法研究与设计", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107436902A (en) * 2016-05-27 2017-12-05 北京京东尚科信息技术有限公司 Data pick-up method and system based on mass data
CN107436902B (en) * 2016-05-27 2019-05-03 北京京东尚科信息技术有限公司 Data pick-up method and system based on mass data
CN106126753B (en) * 2016-08-23 2019-03-05 易联众信息技术股份有限公司 The method of increment extraction based on big data
CN106126753A (en) * 2016-08-23 2016-11-16 易联众信息技术股份有限公司 The method of increment extractions based on big data
CN106326470A (en) * 2016-08-31 2017-01-11 无锡雅座在线科技发展有限公司 Streaming big data processing method and device
CN106407360A (en) * 2016-09-07 2017-02-15 广州视源电子科技股份有限公司 Data processing method and device
CN106407360B (en) * 2016-09-07 2020-07-24 广州视源电子科技股份有限公司 Data processing method and device
CN106682140A (en) * 2016-12-20 2017-05-17 华北计算技术研究所(中国电子科技集团公司第十五研究所) Multi-system user incremental synchronization method based on timestamps and mapping strategies
CN106844507B (en) * 2016-12-27 2019-07-26 星环信息科技(上海)有限公司 A kind of method and apparatus of data batch processing
CN106844507A (en) * 2016-12-27 2017-06-13 星环信息科技(上海)有限公司 A kind of method and apparatus of data batch processing
CN107122424B (en) * 2017-04-07 2019-11-05 南京南瑞集团公司 A kind of relational database log abstracting method
CN107122424A (en) * 2017-04-07 2017-09-01 南京南瑞集团公司 A kind of relational database daily record abstracting method
CN107347062A (en) * 2017-06-19 2017-11-14 北京开数科技有限公司 A kind of method, electronic equipment and the readable storage medium storing program for executing of daily record data processing
CN110019254A (en) * 2017-07-17 2019-07-16 中兴通讯股份有限公司 Processing method, device and the computer readable storage medium of planning region increment record
CN107748752A (en) * 2017-09-05 2018-03-02 新智云数据服务有限公司 A kind of data processing method and device
CN110019477A (en) * 2017-12-27 2019-07-16 航天信息股份有限公司 A kind of method and system carrying out big data processing using HIVE backup table
CN108563658A (en) * 2017-12-29 2018-09-21 邵阳学院 A kind of method and apparatus of multi-platform data synchronization updating
CN108399256B (en) * 2018-03-06 2020-08-04 北京慧萌信安软件技术有限公司 Heterogeneous database content synchronization method and device and middleware
CN108399256A (en) * 2018-03-06 2018-08-14 北京慧萌信安软件技术有限公司 Heterogeneous database content synchronization method, device and middleware
CN108681590A (en) * 2018-05-15 2018-10-19 普信恒业科技发展(北京)有限公司 Incremental data processing method and processing device, computer equipment, computer storage media
CN110609860A (en) * 2018-05-29 2019-12-24 中国移动通信集团重庆有限公司 Data ETL processing method, device, equipment and storage medium
CN108829830A (en) * 2018-06-15 2018-11-16 四川众之金科技有限公司 Data processing method and device
CN109241156A (en) * 2018-07-31 2019-01-18 安徽四创电子股份有限公司 The method updated based on ETL tool from relevant database to non-relational database
CN109254967A (en) * 2018-08-29 2019-01-22 河南智慧云大数据有限公司 A kind of depth analysis method and device based on multi-source heterogeneous mass data
CN109408480A (en) * 2018-09-29 2019-03-01 武汉达梦数据库有限公司 The method and system read based on ORACLE multinode RAC log based on SCN alignment
CN110147362A (en) * 2019-04-04 2019-08-20 中电科大数据研究院有限公司 One kind is based on the acquisition of event driven DOC DATA and processing system and its method
CN110222121A (en) * 2019-06-14 2019-09-10 浪潮软件股份有限公司 A kind of SQL Server database increment synchronization realization method and system based on CDC mode
CN110457358A (en) * 2019-07-30 2019-11-15 新华三大数据技术有限公司 A kind of information collecting method, device, server and computer readable storage medium
CN110569222A (en) * 2019-08-23 2019-12-13 浙江大搜车软件技术有限公司 link tracking method and device, computer equipment and readable storage medium
CN110569222B (en) * 2019-08-23 2022-11-15 浙江大搜车软件技术有限公司 Link tracking method and device, computer equipment and readable storage medium
CN110727724A (en) * 2019-09-09 2020-01-24 上海陆家嘴国际金融资产交易市场股份有限公司 Data extraction method and device, computer equipment and storage medium
CN110727724B (en) * 2019-09-09 2023-03-24 未鲲(上海)科技服务有限公司 Data extraction method and device, computer equipment and storage medium
CN110674154B (en) * 2019-09-26 2023-04-07 浪潮软件股份有限公司 Spark-based method for inserting, updating and deleting data in Hive
CN110674154A (en) * 2019-09-26 2020-01-10 浪潮软件股份有限公司 Spark-based method for inserting, updating and deleting data in Hive
CN111104445A (en) * 2019-12-06 2020-05-05 杭州数梦工场科技有限公司 Data synchronization method, device and equipment
WO2021174696A1 (en) * 2020-03-06 2021-09-10 平安科技(深圳)有限公司 Data updating method and apparatus, computer device, and storage medium
CN113779048A (en) * 2020-06-18 2021-12-10 北京沃东天骏信息技术有限公司 Data processing method and device
CN111813845A (en) * 2020-06-29 2020-10-23 平安国际智慧城市科技股份有限公司 ETL task-based incremental data extraction method, device, equipment and medium
CN112286892A (en) * 2020-07-01 2021-01-29 上海柯林布瑞信息技术有限公司 Real-time data synchronization method and device for post-relational database, storage medium and terminal
CN112286892B (en) * 2020-07-01 2024-04-05 上海柯林布瑞信息技术有限公司 Data real-time synchronization method and device of post-relation database, storage medium and terminal
CN111881136A (en) * 2020-07-29 2020-11-03 山东健康医疗大数据有限公司 Method for realizing incremental data management in medical industry
CN112231301A (en) * 2020-10-21 2021-01-15 黄河水利委员会黄河水利科学研究院 Yellow river water sand change data warehouse
CN112486924A (en) * 2020-12-17 2021-03-12 深圳软牛科技有限公司 Method and device for searching file deletion time in NTFS (New technology File System) and electronic equipment
CN113641742A (en) * 2021-08-05 2021-11-12 广东电网有限责任公司 Data extraction method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN105488187A (en) Method and device for extracting multi-source heterogeneous data increment
US11544347B2 (en) System for synchronization of changes in edited websites and interactive applications
CN110879813B (en) Binary log analysis-based MySQL database increment synchronization implementation method
Gousios The GHTorrent dataset and tool suite
Barmpis et al. Hawk: Towards a scalable model indexing architecture
RU2599538C2 (en) Methods and systems for loading data into temporal data warehouse
US7610317B2 (en) Synchronization with derived metadata
US7904488B2 (en) Time stamp methods for unified plant model
US20140279903A1 (en) Version control system using commit manifest database tables
CN101923566A (en) Data increment extraction method based on trigger
CN102521225A (en) Incremental data extraction device and incremental data extraction method
CN101882135B (en) Data processing method and device
CN102110102A (en) Data processing method and device, and file identifying method and tool
Rousseau et al. Software provenance tracking at the scale of public source code
CN105224527A (en) General ETL method applicable to multiple target table update modes
US20070088766A1 (en) Method and system for capturing and storing multiple versions of data item definitions
Maymala PostgreSQL for data architects
CN112835918A (en) MySQL database increment synchronization implementation method
CN104166739B (en) Index file processing method and device for an analytical database
CN112817931B (en) Incremental version file generation method and device
Rose et al. Concordance: A framework for managing model integrity
Silva et al. Assisting data warehousing populating processes design through modelling using coloured petri nets
Guo et al. Study on Large-Scale Embedded Databases Evolution
Huber Enabling data citation for XML data
CN117743298A (en) Data tracing method and system based on snapshot

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160413