CN108829747A - Data load method and device - Google Patents

Publication number: CN108829747A
Application number: CN201810510384.XA
Authority: CN (China)
Other versions: CN108829747B
Other languages: Chinese (zh)
Inventors: 李鹏, 丁杉
Original and current assignee: New H3C Big Data Technologies Co Ltd
Legal status: Granted, active (as listed by Google Patents; an assumption, not a legal conclusion)
Prior art keywords: data record, data, cache database, life, major key
Classification: Information Retrieval, DB Structures and FS Structures Therefor

Abstract

This disclosure relates to a data loading method and device that address the problem of low data loading efficiency. The method includes: when a scan period arrives, reading data records from a data source into a data record set, as first data records; traversing the first data records in the data record set and searching a cache database according to the primary key of each first data record; when no data record matching the primary key is found in the cache database, adding the first data record to a newly added data record set; when a data record matching the primary key is found in the cache database and the life cycle of the first data record has ended, adding the first data record to a life-cycle-terminated data record set; inserting the data records in the newly added data record set into the full data table of a target database; and replacing, with the data records in the life-cycle-terminated data record set, the data records in the full data table that have the same primary keys. System processing efficiency is thereby improved.

Description

Data load method and device
Technical field
The present disclosure relates to the technical field of data processing, and in particular to a data loading method and device.
Background
ETL (Extract-Transform-Load) describes the process of extracting data from a source, transforming it, and loading it into a destination. In some ETL scenarios, new data records are generated continuously; once extracted, they are written directly to the database and historical data records are never updated. In other scenarios, an updatable data record (RecSet) has a life cycle: for a time T after the record is first written, individual attribute fields may still be updated, until the life cycle of the data ends, at which point its end-of-life marker is updated. Fig. 1 shows the general structure of such a data record. As shown in Fig. 1, the structure includes an entity identifier, a start time, an end time (the life cycle end marker, initially empty), an update time, and other attributes. The entity identifier together with the start time serves as the primary key. The life cycle of a data record comprises the following states. Initial: the start time is t1, the end time is empty, and the related parameters hold their initial values. State 1: the start time is t1, the end time is empty, and related parameter x is updated according to business conditions. State 2: an intermediate state in which related parameter x continues to be updated according to business conditions. End: the start time is t1, the end time is t2, and the updates to related parameter x are complete.
Take hotel accommodation records as an example, where the entity identifier is the guest's ID card number. A record is generated when the guest checks in; its start time and update time are both the check-in time, and its end time is empty. N days later, when the guest checks out, the end time field of the record is set to the check-out time and the update time field is refreshed at the same time. Fig. 2 shows the structure of such a record. In Fig. 2, in the data record at check-in, the entity identifier is the guest's ID card number 411002xxxx, the start time is the check-in time 2018-01-01 10:00, the end time is empty, and the update time is 2018-01-01 10:00. In the data record at check-out, the entity identifier is the guest's ID card number 411002xxxx, the start time is the check-in time 2018-01-01 10:00, the end time is the check-out time 2018-01-03 12:00, and the update time is the check-out time 2018-01-03 12:00.
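The record structure of Fig. 1 and the hotel example of Fig. 2 can be sketched as a small Python class. This is only an illustration: the class name `DataRecord`, its field names, and the two helper properties are assumptions made here, not names taken from the patent.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class DataRecord:
    """Hypothetical model of the updatable data record of Fig. 1."""
    entity_id: str                        # entity identifier, e.g. an ID card number
    begin_time: datetime                  # start time
    update_time: datetime                 # refreshed on every update
    end_time: Optional[datetime] = None   # empty until the life cycle ends

    @property
    def primary_key(self) -> Tuple[str, datetime]:
        # The primary key is entity identifier + start time.
        return (self.entity_id, self.begin_time)

    @property
    def life_cycle_ended(self) -> bool:
        # A non-empty end time marks the end of the life cycle.
        return self.end_time is not None


# The two states of the hotel record shown in Fig. 2.
check_in = DataRecord(
    entity_id="411002xxxx",
    begin_time=datetime(2018, 1, 1, 10, 0),
    update_time=datetime(2018, 1, 1, 10, 0),
)
check_out = DataRecord(
    entity_id="411002xxxx",
    begin_time=datetime(2018, 1, 1, 10, 0),
    update_time=datetime(2018, 1, 3, 12, 0),
    end_time=datetime(2018, 1, 3, 12, 0),
)
```

Both states share one primary key, which is what allows a later version of a record to be matched against an earlier one.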
When performing ETL on updatable data, newly added records must be read periodically, once per scan period, according to the "update time" field, and the full data table stored in the target database is then searched by primary key. However, the data records read within a scan period must be stored in a temporary table created in the target database (TmpTable-RecSet), which incurs additional disk I/O (input/output). A join by primary key is also required between the temporary table and the full data table. When many data records have accumulated in the full data table, this join occupies considerable processing resources of the target database, which affects the overall responsiveness of the system and reduces its response efficiency.
Summary of the invention
In view of this, the present disclosure proposes a data loading method and device to address the low data loading efficiency encountered in the related art when performing ETL on data.
According to a first aspect of the present disclosure, a data loading method is provided, including: when a scan period arrives, reading data records from a data source into a data record set, as first data records in the data record set; traversing the first data records in the data record set and, for any first data record traversed, searching a cache database according to the primary key of the first data record; when no data record matching the primary key is found in the cache database, adding the first data record to a newly added data record set; when a data record matching the primary key is found in the cache database and the life cycle of the first data record has ended, adding the first data record to a life-cycle-terminated data record set; inserting the data records in the newly added data record set into the full data table of a target database; and replacing, with the data records in the life-cycle-terminated data record set, the data records in the full data table whose primary keys are identical to theirs.
According to a second aspect of the present disclosure, a data loading device is provided, including: a reading module, configured to read, when a scan period arrives, data records from a data source into a data record set, as first data records in the data record set; a retrieval module, configured to traverse the first data records in the data record set and, for any first data record traversed, search a cache database according to the primary key of the first data record; a first adding module, configured to add the first data record to a newly added data record set when no data record matching the primary key is found in the cache database; a second adding module, configured to add the first data record to a life-cycle-terminated data record set when a data record matching the primary key is found in the cache database and the life cycle of the first data record has ended; an insertion module, configured to insert the data records in the newly added data record set into the full data table of a target database; and a replacement module, configured to replace, with the data records in the life-cycle-terminated data record set, the data records in the full data table whose primary keys are identical to theirs.
With the data loading scheme of the aspects of the present disclosure, data records are cached in a cache database during ETL instead of being staged in a temporary table, which reduces disk I/O. Because no temporary table needs to be created during ETL, no join between a temporary table and the full data table needs to be performed in the target database, which improves system processing efficiency.
Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the present disclosure together with the specification, and serve to explain the principles of the present disclosure.
Fig. 1 shows the general structure of an updatable data record;
Fig. 2 shows the structure of a data record, taking a hotel accommodation record as an example;
Fig. 3 is a flowchart of a data loading method according to an exemplary embodiment;
Fig. 4 is a schematic diagram of the data loading method of the present disclosure implemented on an electronic device, according to an exemplary embodiment;
Fig. 5 is a flowchart of a data loading method according to an exemplary embodiment;
Fig. 6 is a block diagram of a data loading device according to an exemplary embodiment.
Detailed description
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. Identical reference numerals in the drawings denote elements with identical or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" as used herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or superior to other embodiments.
To better illustrate the present disclosure, numerous specific details are given in the following detailed description. Those skilled in the art will understand that the present disclosure can be practiced without certain of these details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present disclosure.
Fig. 3 is a flowchart of a data loading method according to an exemplary embodiment. The method can be applied to a server. As shown in Fig. 3, the method includes the following steps:
Step 301: when a scan period arrives, read data records from a data source into a data record set, as first data records in the data record set;
The data read from the data source may be, for example, updatable data whose life cycle is limited, such as hotel accommodation data with a life cycle of n days, or Internet cafe access data with a life cycle of x hours. To load updatable data periodically, data records can be read from the data source into the data record set once per scan period. The scan period is a fixed time interval at which the data loading process of steps 301 to 306 is executed periodically. It can also be understood as one processing cycle of the data loading method of this embodiment; that is, steps 301 to 306 are executed once each time the scan period arrives.
Step 302: traverse the first data records in the data record set and, for any first data record traversed, search the cache database according to the primary key of the first data record;
In step 302, all first data records in the data record set can be traversed, and the cache database is searched, according to the primary key of each first data record, for a data record with the same primary key. The cache database is used to cache the data records whose life cycle has not ended. It may hold data records whose life cycle had not ended as of the previous scan period; alternatively, if the system to which this data loading method is applied is newly brought online, the cache database contains no data records.
Step 303: when no data record matching the primary key is found in the cache database, add the first data record to the newly added data record set;
For example, when the cache database is searched for the current first data record and no data record with the same primary key is found, the first data record is considered a newly added data record and can be added to the newly added data record set.
Step 304: when a data record matching the primary key is found in the cache database and the life cycle of the first data record has ended, add the first data record to the life-cycle-terminated data record set;
For example, when the cache database is searched for the current first data record, a data record with the same primary key is found, and the life cycle of the first data record has ended, the first data record is added to the life-cycle-terminated data record set.
Step 305: insert the data records in the newly added data record set into the full data table of the target database;
Because the newly added data record set holds the newly added data records read in the current scan period, these records are inserted into the full data table of the target database in order to load the data into the target database. The target database is the database at the destination to which data is loaded during ETL.
Step 306: replace, with the data records in the life-cycle-terminated data record set, the data records in the full data table whose primary keys are identical to theirs.
The life cycle of the data records in the life-cycle-terminated data record set has ended, so these records will receive no further updates. The data record in the full data table whose primary key is identical to that of a life-cycle-terminated data record can therefore be deleted, and the life-cycle-terminated data record then inserted into the full data table, which updates the data record in the full data table.
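Steps 305 and 306 can be sketched against an in-memory SQLite database standing in for the target database. The table name `table_data`, the column names, and the sample rows are assumptions for illustration; the patent does not prescribe a particular database or schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the target database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_data ("
    " entity_id TEXT, begin_time TEXT, end_time TEXT, update_time TEXT,"
    " PRIMARY KEY (entity_id, begin_time))"
)
# A record loaded in an earlier scan period whose life cycle was still open.
conn.execute(
    "INSERT INTO table_data VALUES (?, ?, ?, ?)",
    ("A", "2018-01-01 10:00", None, "2018-01-01 10:00"),
)
conn.commit()


def load(conn, new_records, ended_records):
    """Insert new records, then replace ended ones by delete-then-insert."""
    with conn:  # commits on success, rolls back on error
        # Step 305: insert the newly added records into the full data table.
        conn.executemany("INSERT INTO table_data VALUES (?, ?, ?, ?)", new_records)
        # Step 306: delete the rows with the same primary key ...
        conn.executemany(
            "DELETE FROM table_data WHERE entity_id = ? AND begin_time = ?",
            [(r[0], r[1]) for r in ended_records],
        )
        # ... then insert the life-cycle-terminated versions.
        conn.executemany("INSERT INTO table_data VALUES (?, ?, ?, ?)", ended_records)


new_records = [("B", "2018-01-02 09:00", None, "2018-01-02 09:00")]
ended_records = [("A", "2018-01-01 10:00", "2018-01-03 12:00", "2018-01-03 12:00")]
load(conn, new_records, ended_records)

rows = conn.execute(
    "SELECT entity_id, end_time FROM table_data ORDER BY entity_id"
).fetchall()
```

No temporary table and no join appear in this sketch: the target database only sees keyed inserts and deletes.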
The data loading method of this embodiment caches data records in a cache database during ETL, so data records no longer need to be staged in a temporary table and no join against the full data table needs to be performed in the target database, which improves the processing efficiency of the system.
In one possible implementation, the data loading method of the present disclosure may further include: before the first scan period arrives, importing into the cache database the valid data records in the full data table whose life cycle has not ended. For example, if the system to which the data loading method is applied is newly brought online but the target database system is already in operation, the valid data records whose life cycle has not ended, within a period of time chosen from experience, can be imported from the full data table of the target database into the cache database. As another example, if the target database system is in operation and the system to which the method of this application is applied was shut down during operation and then restarted, the valid data records whose life cycle has not ended, within a period of time chosen from experience, are imported from the full data table of the target database into the cache database, overwriting the data records originally in the cache database. The first scan period may be the first period in which the system to which the present data loading method is applied executes the method.
In one possible implementation, reading data records from the data source into the data record set may include: when the scan period arrives, reading from the data source, according to the value of a time variable set in the previous scan period, the one or more data records whose update time is later than that value. The time variable is, for example, the "last maximum update time" last_max_update_time, such as the update time of the most recently updated data record in the cache database in the previous scan period. In one example, before data records are read from the data source, the cache database contains 3 data records with update times of 2018-04-01 14:00, 2018-04-02 13:00, and 2018-04-02 15:00, so the initial value of last_max_update_time in the current scan period is 2018-04-02 15:00. When the scan period arrives, for example at 2018-05-04 00:00, the data records whose update time is later than 2018-04-02 15:00 must be read from the data source, that is, the data records generated between 2018-04-02 15:00 and 2018-05-04 00:00.
In one possible implementation, the data loading method of the present disclosure may further include: after the one or more data records whose update time is later than the value of the time variable have been read from the data source, updating the value of the time variable to the latest update time among the data records read from the data source in the current scan period. Continuing the example above, if the latest update time among the data records read from the data source in the current scan period is 2018-05-03 17:00, then last_max_update_time can be updated to 2018-05-03 17:00 after the data records have been read.
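The incremental read driven by last_max_update_time can be sketched as follows. The function name `read_updated_since`, the in-memory list standing in for the data source, and the record with key "b" are assumptions for illustration; the two boundary times come from the example above.

```python
from datetime import datetime


def read_updated_since(source, last_max_update_time):
    """Return the records updated after the variable, plus its new value."""
    rec_set = [r for r in source if r["update_time"] > last_max_update_time]
    if rec_set:
        # Advance the variable to the latest update time read in this period.
        last_max_update_time = max(r["update_time"] for r in rec_set)
    return rec_set, last_max_update_time


# Hypothetical data source contents.
source = [
    {"key": "a", "update_time": datetime(2018, 4, 2, 13, 0)},
    {"key": "b", "update_time": datetime(2018, 4, 20, 9, 0)},
    {"key": "c", "update_time": datetime(2018, 5, 3, 17, 0)},
]

# The variable starts at 2018-04-02 15:00, as in the example above.
rec_set, last_max_update_time = read_updated_since(
    source, datetime(2018, 4, 2, 15, 0)
)
```

If nothing was updated since the previous period, the sketch leaves the variable unchanged, which is one reasonable reading of the text rather than something the patent states explicitly.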
In one possible implementation, when the valid data records in the full data table whose life cycle has not ended are imported into the cache database, the value of the time variable can be set to the latest update time among all data records in the cache database. Continuing the example above, if the latest update time among the data records imported into the cache database is 2018-05-03 17:00, then last_max_update_time can be set to 2018-05-03 17:00.
In one possible implementation, the data loading method of the present disclosure may further include: when no data record matching the primary key is found in the cache database, storing the first data record in the cache database; and when a data record matching the primary key is found in the cache database and the life cycle of the first data record has ended, deleting the found data record from the cache database. Based on the first data records read from the data source, newly added data records are added to the cache database, and the data records in the cache database whose primary key matches a first data record whose life cycle has ended are deleted. As a result, the cache database holds only data records whose life cycle has not ended, which ensures that the number of data records in the cache database does not grow without bound.
In one implementation, the data loading method of the present disclosure may further include: deleting the data record set after the traversal of the first data records in the data record set is complete; deleting the newly added data record set after its data records have been inserted into the full data table of the target database; and deleting the life-cycle-terminated data record set after its data records have replaced the data records in the full data table with identical primary keys. The data record set, the newly added data record set, and the life-cycle-terminated data record set can be created on the above server or electronic device when the data loading method needs them. Once they have been used, they can be deleted to save storage space on the server or electronic device.
In one possible implementation, the cache database is a key-value in-memory database such as redis or Memcache. If the life cycle of the data records is long and the data volume is large, hbase can also be used as the cache database. Because the primary-key query response time of an in-memory database is on the order of milliseconds, an in-memory database is used as the cache database; and because the number of data records in the cache database does not grow without bound, the scale of the cached data is controllable.
The data loading method provided by the present disclosure can be implemented on an electronic device. Fig. 4 is a schematic diagram of the data loading method of the present disclosure implemented on an electronic device, according to an exemplary embodiment. As shown in Fig. 4, the data record set 41, the newly added data record set 42, and the life-cycle-terminated data record set 43 are created in the electronic device 40 and stored in its local memory. The target database 44, the data source 47, and the cache database 46 can each be deployed on the electronic device or deployed separately on other electronic devices; the present disclosure does not limit this.
Fig. 5 is a flowchart of a data loading method according to an exemplary embodiment. As shown in Fig. 5, the method includes the following steps:
Step 601: import into the cache database the valid data records in the full data table of the target database whose life cycle has not ended. For ease of description, the data records in the cache database are called historical data records.
This step covers the following cases:
Case 1: no data records exist in the full data table of the target database.
For example, the target database is a newly online system; no data exists in the full data table, and none exists in the data source either. In this case the step need not be executed, and the cache database is accordingly empty.
Case 2: data records exist in the full data table of the target database.
For example, if the system to which the method of this application is applied is newly brought online but the target database system is already in operation, the valid data records whose life cycle has not ended, within a period of time chosen from experience, can be imported from the full data table of the target database into the cache database. As another example, if the target database system is in operation and the system to which the method of this application is applied was shut down during operation and then restarted, the valid data records whose life cycle has not ended, within a period of time chosen from experience, are imported from the full data table of the target database into the cache database, overwriting the data records originally in the cache database.
The cache database in this embodiment is illustrated using redis as an example. A corresponding hash structure is created in redis, defined as follows:
Key (primary key):
The primary key of the data record (entity identifier + start time)
Fields (attributes):
Begin_time: YYYY-mm-dd hh24:mi:ss (start time)
End_time: YYYY-mm-dd hh24:mi:ss (end time)
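The hash layout above can be sketched in plain Python by building the key and the field mapping for one record. The ":" separator in the key, the empty string for an empty end time, and the function name `to_hash_entry` are assumptions made here; the patent only states that the key is the entity identifier plus the start time.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple

# Python equivalent of the YYYY-mm-dd hh24:mi:ss layout.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_hash_entry(
    entity_id: str, begin_time: datetime, end_time: Optional[datetime]
) -> Tuple[str, Dict[str, str]]:
    """Build the hash key and fields for one cached data record."""
    begin_text = begin_time.strftime(TIME_FORMAT)
    # Key: entity identifier + start time (the separator is an assumption).
    key = f"{entity_id}:{begin_text}"
    fields = {
        "Begin_time": begin_text,
        # An empty end time is stored as an empty string (assumption).
        "End_time": end_time.strftime(TIME_FORMAT) if end_time else "",
    }
    return key, fields


key, fields = to_hash_entry("411002xxxx", datetime(2018, 1, 1, 10, 0), None)
```

With a client such as redis-py, a mapping like `fields` could plausibly be written with `hset(key, mapping=fields)`, though the patent does not name a client library.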
Step 602: determine whether the scan period has arrived. If it has, execute step 603; if it has not, return to step 602;
Step 603: according to the value of the variable "last maximum update time" last_max_update_time, read from the data source into the data record set (RecSet) the one or more data records whose update time is later than last_max_update_time. After the read completes, update the value of last_max_update_time to the latest update time among the data records read from the data source in the current scan period.
For ease of the following description, the data records in RecSet are denoted as first data records.
When the first scan period arrives, data must be read from the data source according to the initial value of the variable last_max_update_time. The initial value can be determined as follows:
For case 1 in step 601, the cache database is empty, and the initial value of last_max_update_time can be the time at which the system went online. Accordingly, when the first scan period arrives, all data records generated in the data source from the time the system went online to the current time (that is, the time at which the first scan period arrives) can be read.
For case 2 in step 601, the cache database is not empty, and the initial value of last_max_update_time can be the latest update time among all data records in the cache database. Accordingly, when the first scan period arrives, the original records whose update time is later than the initial value of last_max_update_time are read from the data source.
For example, the cache database contains 3 data records with update times of 2018-04-01 14:00, 2018-04-02 13:00, and 2018-04-02 15:00, so the initial value of last_max_update_time is 2018-04-02 15:00. Accordingly, when the first scan period arrives, for example at 2018-05-04 00:00, the data records whose update time is later than 2018-04-02 15:00 must be read from the data source, that is, the data records generated between 2018-04-02 15:00 and 2018-05-04 00:00.
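The choice of initial value for the two cases can be sketched as a small function. The function name `initial_last_max_update_time`, the assumption that the update times of the cached records are available as a list, and the go-live time used below are all hypothetical; the three cached update times are those of the example above.

```python
from datetime import datetime


def initial_last_max_update_time(cached_update_times, go_live_time):
    """Pick the initial value of last_max_update_time before the first scan."""
    if not cached_update_times:
        # Case 1: the cache database is empty, so start from the go-live time.
        return go_live_time
    # Case 2: start from the latest update time among the cached records.
    return max(cached_update_times)


cached_update_times = [
    datetime(2018, 4, 1, 14, 0),
    datetime(2018, 4, 2, 13, 0),
    datetime(2018, 4, 2, 15, 0),
]
go_live_time = datetime(2018, 1, 1, 0, 0)  # hypothetical go-live time

case_2_value = initial_last_max_update_time(cached_update_times, go_live_time)
case_1_value = initial_last_max_update_time([], go_live_time)
```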
After the data records have been read from the data source in the first scan period, the value of last_max_update_time can be updated. The updated value is the latest update time among the data records read in the current scan period. After last_max_update_time has been updated, the method waits for the next scan period while executing the subsequent steps.
When the next scan period arrives, data records are read from the data source again according to the current value of last_max_update_time, that is, the latest update time among the data records read from the data source in the previous scan period. After the read completes, last_max_update_time is updated.
The first data records in the data record set (RecSet) are traversed, and the following steps 604 to 607 are executed for any first data record traversed.
Step 604: search redis by the primary key of the first data record.
Step 605: if no data record matching the primary key of the first data record is found in redis, the first data record is a newly added data record. Add it to the newly added data record set (RecSet-New) and insert it into redis;
Note that when the first scan period arrives and redis contains no data records, the search in this step finds that all data records in RecSet are newly added data records, and all of them are added to RecSet-New and to redis.
Step 606: if a data record matching the primary key of the first data record is found in redis and the end_time field of the first data record is not empty, the life cycle of the first data record is determined to have ended. Add this life-cycle-terminated data record to the life-cycle-terminated data record set (RecSet-Over), and delete from redis the found data record that matches the primary key of the first data record.
Step 607: if a data record matching the primary key of the first data record is found in redis and the end_time field of the first data record is empty, the life cycle of the data record in redis that matches the primary key has not ended, and no processing is performed.
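Steps 604 to 607 can be sketched with an ordinary Python dict standing in for redis. The function name `classify`, the dictionary-based record layout, and the sample records are assumptions for illustration only.

```python
def classify(rec_set, cache):
    """Split RecSet into RecSet-New and RecSet-Over, maintaining the cache."""
    rec_set_new, rec_set_over = [], []
    for record in rec_set:
        # Step 604: look the record up by primary key.
        key = (record["entity_id"], record["begin_time"])
        if key not in cache:
            # Step 605: not cached, so the record is newly added.
            rec_set_new.append(record)
            cache[key] = record
        elif record["end_time"] is not None:
            # Step 606: cached and ended, so replace it later and evict it.
            rec_set_over.append(record)
            del cache[key]
        # Step 607: cached and not ended, so nothing is done.
    return rec_set_new, rec_set_over


# Hypothetical cache contents left over from earlier scan periods.
cache = {
    ("A", "2018-01-01 10:00"): {"entity_id": "A", "begin_time": "2018-01-01 10:00", "end_time": None},
    ("C", "2018-01-01 11:00"): {"entity_id": "C", "begin_time": "2018-01-01 11:00", "end_time": None},
}
# Hypothetical records read in the current scan period.
rec_set = [
    {"entity_id": "A", "begin_time": "2018-01-01 10:00", "end_time": "2018-01-03 12:00"},
    {"entity_id": "B", "begin_time": "2018-01-02 09:00", "end_time": None},
    {"entity_id": "C", "begin_time": "2018-01-01 11:00", "end_time": None},
]

rec_set_new, rec_set_over = classify(rec_set, cache)
```

After the traversal the cache again holds only records whose life cycle is still open, here B and C.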
Step 608: after the traversal of the first data records in RecSet is complete, delete RecSet, return to step 602, and execute steps 609 and 610.
Steps 609 and 610 can be executed simultaneously, or either of the two can be executed first. While steps 609 and 610 are being executed, the scan period detection of step 602 can run at the same time.
Step 609: insert all newly added data records contained in RecSet-New into the full data table Table-Data in the target database, then delete RecSet-New.
Step 610: replace, with the data records in RecSet-Over, the data records in Table-Data whose primary keys are identical to theirs, then delete RecSet-Over.
In summary, during ETL the present disclosure uses a cache database to cache the updatable data in the target database (that is, the data records whose life cycle has not ended), and keeps the data obtained in the current scan period in local memory. Local memory together with a dedicated in-memory database enables fast and efficient data retrieval and processing. Only after processing is complete is the updated portion of the data synchronized to the target database, which greatly reduces disk I/O operations and also avoids the prior-art join against the full data table in the target database, improving processing efficiency.
Fig. 6 is the block diagram of data loading device shown according to an exemplary embodiment, as shown in fig. 6, the device 70 wraps It includes:
Read module 71, when being reached for the scan period, from data source reads data log to data record set, as The first data record in the data record set;
Retrieval module 72, for traversing first data record in the data record set, for what is traversed Any first data record is retrieved in cache database according to the major key of first data record;
First adding module 73, for when not retrieving in the cache database and the matched data of the major key are remembered When record, first data record is added to newly-increased data record set;
Second adding module 74, for when retrieving in the cache database and the matched data of the major key are remembered When record and the end of life of first data record, first data record is added to end of life number According to set of records ends;
It is inserted into module 75, for the data record in the newly-increased data record set to be inserted into the full dose of target database Tables of data;
Replacement module 76, for being replaced and its major key with the data record in the end of life data record set Data record in the consistent full dose tables of data.
In one possible implementation, the data loading device of the present disclosure may further include: an import module, configured to import, before the first scan cycle arrives, the valid data records in the full data table whose life cycle has not ended into the cache database.
In one possible implementation, the read module is configured to: when the scan cycle arrives, read from the data source, according to the value of a time variable set in the previous scan cycle, one or more data records whose update time is later than that value.
In one possible implementation, the data loading device of the present disclosure further includes: an update module, configured to, after the one or more data records whose update time is later than the value have been read from the data source, update the value of the time variable to the most recent update time among the data records read from the data source in the current scan cycle.
In one possible implementation, the data loading device of the present disclosure further includes: a setup module, configured to, when the valid data records in the full data table whose life cycle has not ended are imported into the cache database, set the value of the time variable to the most recent update time among all data records in the cache database.
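The time variable described above behaves like a watermark for incremental reads. The following Python sketch shows one way it could work; the function names, the `(primary key, update time)` row layout, and the choice to leave the variable unchanged when a cycle reads nothing are assumptions, not details given in the disclosure.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Row = Tuple[str, datetime]  # hypothetical layout: (primary key, update time)


def initial_watermark(cache: Dict[str, datetime]) -> datetime:
    """After the initial import, start from the latest update time in the cache.

    Assumes the cache is non-empty; max() raises ValueError otherwise.
    """
    return max(cache.values())


def read_incremental(source: List[Row],
                     watermark: datetime) -> Tuple[List[Row], datetime]:
    """Return the rows updated after the watermark and the advanced watermark."""
    batch = [row for row in source if row[1] > watermark]
    if batch:
        # Advance to the most recent update time read in this cycle.
        watermark = max(updated for _, updated in batch)
    return batch, watermark


cache = {"a": datetime(2018, 5, 1), "b": datetime(2018, 5, 3)}
source = [("a", datetime(2018, 5, 1)),
          ("b", datetime(2018, 5, 3)),
          ("c", datetime(2018, 5, 4)),
          ("a", datetime(2018, 5, 6))]

watermark = initial_watermark(cache)
batch, watermark = read_incremental(source, watermark)
next_batch, next_watermark = read_incremental(source, watermark)
```

Because each cycle reads only rows newer than the stored value, a second cycle over an unchanged source returns an empty batch in this sketch.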
In one possible implementation, the data loading device of the present disclosure further includes: a storage module, configured to store the first data record in the cache database when no data record matching the primary key is found in the cache database; and a first deletion module, configured to delete the retrieved data record from the cache database when a data record matching the primary key is found in the cache database and the life cycle of the first data record has ended.
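The cache maintenance performed by the storage module and the first deletion module can be sketched as follows. The dictionary-backed cache, the function name `maintain_cache`, and the returned status strings are hypothetical and serve only to make the two rules concrete.

```python
from typing import Dict


def maintain_cache(cache: Dict[str, str], key: str,
                   payload: str, ended: bool) -> str:
    """Apply the store/delete rules to one scanned record."""
    if key not in cache:
        cache[key] = payload   # cache miss: store the record
        return "stored"
    if ended:
        del cache[key]         # cache hit and life cycle ended: delete it
        return "deleted"
    return "kept"              # case not covered by this passage


cache = {"a": "open"}
first = maintain_cache(cache, "b", "open", ended=False)
second = maintain_cache(cache, "a", "closed", ended=True)
third = maintain_cache(cache, "b", "open", ended=False)
```

Deleting ended records on a hit keeps the cache from growing with records that will not change again, which appears consistent with importing only records whose life cycle has not ended.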
In one possible implementation, the data loading device of the present disclosure further includes:
a second deletion module, configured to delete the data record set after traversal of the first data records in the data record set is complete; a third deletion module, configured to delete the new data record set after the data records in the new data record set have been inserted into the full data table of the target database; and a fourth deletion module, configured to delete the end-of-life data record set after the data records in the end-of-life data record set have replaced the data records in the full data table that have the same primary keys.
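The order in which the three temporary sets are released can be made explicit with a short Python sketch. The row layout, the dictionary stand-ins for the cache database and the full data table, and the log messages are assumptions introduced for illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, bool]  # hypothetical: (primary key, payload, life cycle ended)


def run_cycle(record_set: List[Row],
              cache: Dict[str, str],
              full_table: Dict[str, str]) -> List[str]:
    """Run one scan cycle and log when each temporary set is released."""
    log: List[str] = []
    new_set: List[Row] = []
    ended_set: List[Row] = []

    for row in record_set:
        key, _, ended = row
        if key not in cache:
            new_set.append(row)
        elif ended:
            ended_set.append(row)
    record_set.clear()               # released once traversal is complete
    log.append("record_set cleared")

    for key, payload, _ in new_set:
        full_table[key] = payload    # insert into the full data table
    new_set.clear()                  # released once insertion is complete
    log.append("new_set cleared")

    for key, payload, _ in ended_set:
        full_table[key] = payload    # replace the row with the same primary key
    ended_set.clear()                # released once replacement is complete
    log.append("ended_set cleared")
    return log


records = [("a", "closed", True), ("b", "open", False)]
full_table = {"a": "open"}
log = run_cycle(records, {"a": "open"}, full_table)
```

Releasing each set as soon as its stage finishes means no temporary set outlives the step that consumes it.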
The embodiments of the present disclosure have been described above. The foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (14)

1. A data loading method, comprising:
when a scan cycle arrives, reading data records from a data source into a data record set, as first data records in the data record set;
traversing the first data records in the data record set and, for each first data record traversed, searching a cache database by the primary key of the first data record;
when no data record matching the primary key is found in the cache database, adding the first data record to a new data record set;
when a data record matching the primary key is found in the cache database and the life cycle of the first data record has ended, adding the first data record to an end-of-life data record set;
inserting the data records in the new data record set into a full data table of a target database; and
replacing, with each data record in the end-of-life data record set, the data record in the full data table that has the same primary key.
2. The method according to claim 1, wherein the method further comprises:
before the first scan cycle arrives, importing the valid data records in the full data table whose life cycle has not ended into the cache database.
3. The method according to claim 2, wherein reading data records from the data source into the data record set comprises:
when the scan cycle arrives, reading from the data source, according to the value of a time variable set in the previous scan cycle, one or more data records whose update time is later than the value.
4. The method according to claim 3, wherein, after reading from the data source the one or more data records whose update time is later than the value, the method further comprises:
updating the value of the time variable to the most recent update time among the data records read from the data source in the current scan cycle.
5. The method according to claim 3, wherein, when the valid data records in the full data table whose life cycle has not ended are imported into the cache database,
the value of the time variable is set to the most recent update time among all data records in the cache database.
6. The method according to claim 1, wherein the method further comprises:
when no data record matching the primary key is found in the cache database, storing the first data record in the cache database; and
when a data record matching the primary key is found in the cache database and the life cycle of the first data record has ended, deleting the retrieved data record from the cache database.
7. The method according to claim 1, wherein the method further comprises:
after traversal of the first data records in the data record set is complete, deleting the data record set;
after the data records in the new data record set have been inserted into the full data table of the target database, deleting the new data record set; and
after the data records in the end-of-life data record set have replaced the data records in the full data table that have the same primary keys, deleting the end-of-life data record set.
8. A data loading device, comprising:
a read module, configured to, when a scan cycle arrives, read data records from a data source into a data record set, as first data records in the data record set;
a retrieval module, configured to traverse the first data records in the data record set and, for each first data record traversed, search a cache database by the primary key of the first data record;
a first adding module, configured to add the first data record to a new data record set when no data record matching the primary key is found in the cache database;
a second adding module, configured to add the first data record to an end-of-life data record set when a data record matching the primary key is found in the cache database and the life cycle of the first data record has ended;
an insertion module, configured to insert the data records in the new data record set into a full data table of a target database; and
a replacement module, configured to replace, with each data record in the end-of-life data record set, the data record in the full data table that has the same primary key.
9. The device according to claim 8, wherein the device further comprises:
an import module, configured to import, before the first scan cycle arrives, the valid data records in the full data table whose life cycle has not ended into the cache database.
10. The device according to claim 9, wherein the read module is configured to:
when the scan cycle arrives, read from the data source, according to the value of a time variable set in the previous scan cycle, one or more data records whose update time is later than the value.
11. The device according to claim 10, wherein the device further comprises:
an update module, configured to, after the one or more data records whose update time is later than the value have been read from the data source, update the value of the time variable to the most recent update time among the data records read from the data source in the current scan cycle.
12. The device according to claim 10, wherein the device further comprises:
a setup module, configured to, when the valid data records in the full data table whose life cycle has not ended are imported into the cache database, set the value of the time variable to the most recent update time among all data records in the cache database.
13. The device according to claim 8, wherein the device further comprises:
a storage module, configured to store the first data record in the cache database when no data record matching the primary key is found in the cache database; and
a first deletion module, configured to delete the retrieved data record from the cache database when a data record matching the primary key is found in the cache database and the life cycle of the first data record has ended.
14. The device according to claim 8, wherein the device further comprises:
a second deletion module, configured to delete the data record set after traversal of the first data records in the data record set is complete;
a third deletion module, configured to delete the new data record set after the data records in the new data record set have been inserted into the full data table of the target database; and
a fourth deletion module, configured to delete the end-of-life data record set after the data records in the end-of-life data record set have replaced the data records in the full data table that have the same primary keys.
CN201810510384.XA 2018-05-24 2018-05-24 Data load method and device Active CN108829747B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810510384.XA CN108829747B (en) 2018-05-24 2018-05-24 Data load method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810510384.XA CN108829747B (en) 2018-05-24 2018-05-24 Data load method and device

Publications (2)

Publication Number Publication Date
CN108829747A true CN108829747A (en) 2018-11-16
CN108829747B CN108829747B (en) 2019-09-17

Family

ID=64145497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810510384.XA Active CN108829747B (en) 2018-05-24 2018-05-24 Data load method and device

Country Status (1)

Country Link
CN (1) CN108829747B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101504664A (en) * 2009-03-18 2009-08-12 中国工商银行股份有限公司 Apparatus and method for extracting, converting and loading total source data
CN102663114A (en) * 2012-04-17 2012-09-12 中国人民大学 Database inquiry processing method facing concurrency OLAP (On Line Analytical Processing)
CN104484400A (en) * 2014-12-12 2015-04-01 北京国双科技有限公司 Method and device for data query processing
CN104573128A (en) * 2014-10-28 2015-04-29 北京国双科技有限公司 Business data processing method, a business data processing device and server
CN104731791A (en) * 2013-12-18 2015-06-24 东阳艾维德广告传媒有限公司 Marketing analysis data market system
CN105426292A (en) * 2015-10-29 2016-03-23 网易(杭州)网络有限公司 Game log real-time processing system and method
CN105512201A (en) * 2015-11-26 2016-04-20 晶赞广告(上海)有限公司 Data collection and processing method and device
CN105956123A (en) * 2016-05-03 2016-09-21 无锡雅座在线科技发展有限公司 Local updating software-based data processing method and apparatus
CN106407321A (en) * 2016-08-31 2017-02-15 东软集团股份有限公司 Data synchronization method and device
CN107229721A (en) * 2017-06-02 2017-10-03 泰华智慧产业集团股份有限公司 A kind of method and device for changing data pick-up
CN107330003A (en) * 2017-06-12 2017-11-07 上海藤榕网络科技有限公司 Method of data synchronization, system, memory and data syn-chronization equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PANOS VASSILIADIS et al.: "ARKTOS: TOWARDS THE MODELING, DESIGN, CONTROL AND EXECUTION OF ETL PROCESSES", 《INFORMATION SYSTEMS》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635032A (en) * 2018-12-24 2019-04-16 福建凯米网络科技有限公司 A kind of method and terminal of data conversion
CN112256775A (en) * 2020-09-27 2021-01-22 建信金融科技有限责任公司 Method and device for timed data loading of Oracle database
CN113139081A (en) * 2021-04-27 2021-07-20 中山亿联智能科技有限公司 Method for reporting and reading user online playing information with high efficiency and low delay
CN113139081B (en) * 2021-04-27 2023-10-27 中山亿联智能科技有限公司 Method for reporting online playing information of reading user with high efficiency and low delay

Also Published As

Publication number Publication date
CN108829747B (en) 2019-09-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant