WO2011026311A1 - Data transfer system and method - Google Patents

Data transfer system and method

Info

Publication number
WO2011026311A1
WO2011026311A1 PCT/CN2010/001280 CN2010001280W WO2011026311A1 WO 2011026311 A1 WO2011026311 A1 WO 2011026311A1 CN 2010001280 W CN2010001280 W CN 2010001280W WO 2011026311 A1 WO2011026311 A1 WO 2011026311A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
database
transfer
target
service module
Prior art date
Application number
PCT/CN2010/001280
Other languages
English (en)
French (fr)
Inventor
陈林
茅毓铭
庄晓
鲁志军
杨燕明
白玫
Original Assignee
中国银联股份有限公司
Priority date
Filing date
Publication date
Application filed by 中国银联股份有限公司
Priority to EP10813233.3A (EP2474918A4)
Priority to US13/393,205 (US8924342B2)
Publication of WO2011026311A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275 Synchronous replication

Definitions

  • The present invention relates to data transfer systems and methods and, more particularly, to data transfer systems and methods for multi-platform databases.
  • Background art
  • Existing technology for transferring data between databases has the following disadvantages: high cost; a large impact on the database or data table; tight coupling (that is, data transfer software adapted to a type A database or data table does not adapt to a type B database or data table), sometimes even requiring triggers on the source database or relying on specific functions of a particular database product, and thus lacking generality and scalability; limited functionality, with no support for both the near-real-time (i.e., minute-level) and real-time (i.e., second-level) transfer modes; and difficulty or even impossibility of transferring data between heterogeneous databases, that is, of filtering and deforming data, which leaves the database system with poor disaster tolerance and recoverability.
  • Method (1) requires extra disk space, and the input/output operations when saving files are time-consuming; in method (2), although multiple processes execute concurrently, each process submits only one record at a time, so it is inefficient; in method (3), submitting a large amount of data at once requires a large log space, and a single failed submission causes the whole operation to fail. A batch insertion technique that makes full use of database performance is therefore also needed.
  • Summary of the invention
  • The present invention proposes a data transfer system and method for transferring data between databases (both homogeneous and heterogeneous). The system and method support both quasi-real-time (i.e., minute-level) and real-time (i.e., second-level) transfer modes.
  • A data transfer system includes at least one host, at least one source database, at least one target database, a parameter configuration database, and a control database, wherein:
    • the parameter configuration database is connected to the at least one host, for storing configuration parameters and providing the configuration parameters to the at least one host in response to a request by the at least one host;
    • the control database is connected to the at least one host, for storing control information and providing the control information to the at least one host in response to a request by the at least one host;
    • the at least one source database is respectively connected to the at least one host, for providing the source data to be transferred;
    • the at least one target database is respectively connected to the at least one host, for receiving the target data to be transferred.
  • Each of the hosts includes a master control module, a source data acquisition service module, and a target table update service module. The master control module is configured to schedule transfer tasks and load the configuration parameters. The source data acquisition service module accepts the scheduling of the master control module to acquire the source data, generate the target data, and invoke the target table update service module.
  • The source data acquisition service module is divided into a main body and an attachment. The main body reads the parameters of the transfer task, loads the attachment, and calls the sub-process corresponding to the transfer task. The attachment is a dynamic function library that encapsulates the source data acquisition and data deformation sub-processes.
  • The target table update service module accepts the scheduling of the source data acquisition service module to insert or update the target data into a target table. The target table update service module is divided into a main body and an attachment. The main body reads the parameters of the transfer task, loads the attachment, and calls the sub-process corresponding to the transfer task. The attachment is a dynamic function library that encapsulates the insertion and update sub-processes for the target data.
  • The master control module is a minute-level transfer master control module or a second-level transfer master control module. The minute-level transfer master control module completes quasi-real-time data transfer tasks, and the second-level transfer master control module completes real-time data transfer tasks.
  • The target table update service module further includes a batch database update sub-module, which performs the insert and/or update process in the following manner: (C1) dividing the data to be inserted into blocks, each block storing multiple pieces of data; (C2) each of a plurality of database operation processes reads one block of data at a time in an exclusive manner, and then inserts or updates the data into the target database by batch insertion or batch update; (C3) if step (C2) succeeds, the process continues with other data blocks, and if step (C2) fails, the process inserts or updates the data of that block into the target database one record at a time, records an error log entry for each record whose operation fails, and continues with other data blocks.
  • The master control module is deployed on each of the hosts, and only one of the hosts has the task scheduling function.
  • The source data acquisition service module includes a deformation processing sub-module, which completes the deformation processing of the source data by calling a dynamic deformation function, thereby generating the target data.
  • The at least one source database and the at least one target database are heterogeneous.
  • The parameter configuration database and the control database coexist in one database.
  • A data transfer method comprises the following steps:
    • (D1) initializing the master control module in the host, and reading the transfer control information and the parameter configuration information;
    • (D2) initializing the source data acquisition service module and the target table update service module in the host, the target table update service module creating a data storage area;
    • (D3) the master control module invokes the source data acquisition service module according to the transfer control information and the parameter configuration information that were read, and passes the transfer task parameters to the source data acquisition service module;
    • (D4) the source data acquisition service module connects to at least one source database to obtain the environment variables of the at least one source database, and reads the corresponding database records in the source database according to the transfer task parameters;
    • (D5) the source data acquisition service module deforms the database record that was read to obtain a target record, and writes the target record into the data storage area;
    • (D6) repeating steps (D4)-(D5) until the data storage area is full;
    • (D7) the source data acquisition service module calls the target table update service module, and passes the transfer task parameters to the target table update service module;
    • (D8) the target table update service module connects to at least one target database and to the data storage area, thereby acquiring the environment variables of the at least one target database, and parses the received transfer task parameters;
    • (D9) the target table update service module performs batch insertion and update on the at least one target database according to the parsing result until the transfer task is completed;
    • (D10) the target table update service module releases the data storage area.
  • In this method, the source data acquisition service module accepts the scheduling of the master control module to obtain the source data, generate the target data, and call the target table update service module. The source data acquisition service module is divided into a main body and an attachment: the main body reads the parameters of the transfer task, loads the attachment, and calls the sub-process corresponding to the transfer task, and the attachment is a dynamic function library that encapsulates the source data acquisition and data deformation sub-processes.
  • The target table update service module accepts the scheduling of the source data acquisition service module to insert or update the target data into a target table. The target table update service module is likewise divided into a main body and an attachment: the main body reads the parameters of the transfer task, loads the attachment, and calls the sub-process corresponding to the transfer task, and the attachment is a dynamic function library that encapsulates the insertion and update sub-processes for the target data.
  • Step (D9) of the data transfer method further comprises the following steps: (E1) dividing the data to be inserted into blocks, each block storing multiple pieces of data; (E2) each of a plurality of database operation processes reads one block of data at a time in an exclusive manner, and then inserts or updates the data into the at least one target database by batch insertion or batch update; (E3) if step (E2) succeeds, the process continues with other data blocks, and if step (E2) fails, the process inserts or updates the data of that block into the target database one record at a time, records an error log entry for each record whose operation fails, and continues with other data blocks.
  • In the method, the master control module is likewise a minute-level transfer master control module or a second-level transfer master control module. The minute-level transfer master control module completes quasi-real-time data transfer tasks, and the second-level transfer master control module completes real-time data transfer tasks.
  • The master control module is deployed on each of the hosts, and only one of the hosts has the task scheduling function.
  • The source data acquisition service module includes a deformation processing sub-module, which completes the deformation processing of the source data by calling a dynamic deformation function, thereby generating the target data.
  • The at least one source database and the at least one target database are heterogeneous.
  • The data transfer system and method disclosed by the present invention have the following advantages. Because source data selection and data deformation are encapsulated in a dynamic function library, the system and method are low in complexity and can respond flexibly to different transfer tasks; that is, data can be transferred between homogeneous or heterogeneous databases. In addition, the system and method support both quasi-real-time (i.e., minute-level) and real-time (i.e., second-level) transfer modes.
  • The disclosed data transfer system and method are general-purpose, flexible, highly encapsulated, and highly stable.
  • FIG. 1 is a block diagram of a data transfer system in accordance with an embodiment of the present invention.
  • FIG. 2 is a flowchart showing the operation of a source data acquisition service module according to an embodiment of the present invention.
  • FIG. 3 is a flowchart showing the operation of a target table update service module according to an embodiment of the present invention.
  • A data transfer system comprises at least one host H, preferably a plurality of hosts H1-H3 (i.e., servers); at least one source database S, preferably a plurality of source databases S1-SN; at least one target database P, preferably a plurality of target databases P1-PN; a parameter configuration database 1; and a control database 2.
  • The parameter configuration database 1 is connected to the hosts H1-H3, for storing configuration parameters and providing the configuration parameters to the hosts H1-H3 in response to requests from the hosts H1-H3.
  • The control database 2 is connected to the hosts H1-H3, for storing control information (i.e., data transfer program parameters and dynamic transfer information) and providing the control information to the hosts H1-H3 in response to requests from the hosts H1-H3. Even if the transfer process is abnormally interrupted, all control information can still be obtained from the control database 2 on restart and the transfer operation can continue, thereby ensuring data integrity and reliability.
  • The source databases S1-SN are respectively connected to the hosts H1-H3, for providing the source data to be transferred.
  • The target databases P1-PN are respectively connected to the hosts H1-H3, for receiving the target data to be transferred.
  • The parameter configuration database 1 and the control database 2 may coexist in one database.
  • Each of the hosts H1-H3 includes a master control module 3 (a minute-level transfer master control module or a second-level transfer master control module), a source data acquisition service module 4, and a target table update service module 5.
  • The master control module 3 is used for overall task control and for loading the configuration parameters into memory. The minute-level transfer master control module and the second-level transfer master control module are deployed separately: the minute-level master control service module is started when a minute-level transfer service is required, and the second-level master control service module is started when a second-level transfer service is required.
  • The master control module 3 is present on each host, but through parameter control only one host in the host group has the task scheduling function. Once the host with the task scheduling function becomes abnormal, the task scheduling function of the master control module can be started on a backup host. Meanwhile, the source data acquisition service module 4 and the target table update service module 5 on every host are in a working state, which improves the processing performance and disaster tolerance of the system.
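  • The scheduling arrangement described above can be sketched as follows. This is a minimal illustration only: the function name `elect_scheduler`, the priority-ordered host list, and the `is_healthy` callback are assumptions made for the example, since the text only says that the scheduling role is enabled on one host by parameter control and can be started on a backup host.

```python
def elect_scheduler(hosts, is_healthy):
    """Return the first healthy host in priority order, or None.

    `hosts` is a priority-ordered list (hypothetical); `is_healthy` is a
    caller-supplied check (hypothetical). Every host keeps running its
    acquisition and update services; only the returned host schedules tasks.
    """
    for host in hosts:
        if is_healthy(host):
            return host
    return None


if __name__ == "__main__":
    hosts = ["H1", "H2", "H3"]
    # H1 is down, so the scheduling role moves to the backup host H2.
    print(elect_scheduler(hosts, lambda h: h != "H1"))
```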
  • The minute-level master control service module is configured to initialize the transfer control information, pass the transfer control information to the transfer services of the plurality of hosts, and then control the advancement of the data transfer time slice and the update of the status.
  • Through configuration parameters, the minute-level master control service module can also distribute transfer tasks and report on the transfer process during data transfer.
  • The meaning of the time slice is as follows: in the data transfer system disclosed by the present invention, a time period is divided into logical segments according to the configuration, and each such segment is a time slice.
  • The configuration information includes: the transfer start time, the transfer time slice, the number of times the current transfer end time is updated in each cycle, the interval between the transfer end time and the current system time, the database connection information, and the number of hosts.
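  • As an illustration of the time slice concept and of the interval between the transfer end time and the current system time, the following sketch divides a period into consecutive slices. The function names and the use of half-open windows are assumptions made for the example, not details taken from the text.

```python
from datetime import datetime, timedelta


def time_slices(start, end, slice_length):
    """Yield consecutive half-open (slice_start, slice_end) windows covering [start, end)."""
    if slice_length <= timedelta(0):
        raise ValueError("slice_length must be positive")
    cursor = start
    while cursor < end:
        upper = min(cursor + slice_length, end)
        yield cursor, upper
        cursor = upper


def transfer_end_time(now, lag):
    """Transfer end time kept `lag` behind the current system time (assumed meaning)."""
    return now - lag


if __name__ == "__main__":
    now = datetime(2010, 1, 1, 10, 6)
    end = transfer_end_time(now, timedelta(minutes=1))
    for lower, upper in time_slices(datetime(2010, 1, 1, 10, 0), end, timedelta(minutes=2)):
        print(lower.time(), "to", upper.time())
```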
  • The second-level master control service module is configured to transfer data from before n seconds ago (for example, n ⁇ 10) from a source data table to a target data table; that is, according to the information in the data transfer control table in the database, it passes the transfer task to the transfer service and advances the time slice progress of the transfer.
  • Through configuration parameters, the second-level master control service module can also distribute transfer tasks and report on the transfer process during data transfer.
  • The configuration information includes: the transfer start time, the transfer time slice, the number of times the current transfer end time is updated in each cycle, the interval between the transfer end time and the current system time, the database connection information, and the number of hosts.
  • FIG. 2 is a flow chart showing the operation of the source data acquisition service module 4 in accordance with an embodiment of the present invention.
  • The working process of the source data acquisition service module 4 is as follows:
    • (A1) initialize and connect to the source databases S1-SN;
    • (A2) receive a call from the master control module 3 (the minute-level transfer master control module or the second-level transfer master control module);
    • (A3) acquire the transfer task parameters according to the call information of the master control module 3;
    • (A4) determine the task indicated by the task parameters;
    • (A5) if the task is "to be updated", update the transfer status in the control table to "updated"; if the task is "transfer", update the transfer status in the control table to "transfer";
    • (A6) acquire the environment variables of the source databases S1-SN;
    • (A7) obtain the first record of the source data according to the acquired environment variables of the source databases S1-SN, and determine whether there is no record;
    • (A8) if there is no record and the task is "to be updated", update the transfer status in the control table [...] update the transfer status in the control table to "transfer completed" and end the current source data acquisition service;
    • (A9) if there is a record, determine whether the records have been completely processed; if they have, call the target table update service module 5 and end the source data acquisition service;
    • (A10) if there is still unprocessed data, determine whether the record requires deformation processing;
    • (A11) if the record does not require deformation processing, invoke the default target table update service;
    • (A12) if the record requires deformation processing, perform the deformation processing; if the head of the deformation result set is empty (i.e., the deformation failed), return to step (A9);
    • (A13) if the head of the deformation result set is not empty (i.e., the deformation succeeded), acquire the target record pointed to by the pointer and call the specific target table update service 5;
    • (A14) write the target record into the data storage area corresponding to the target table;
    • (A15) repeat steps (A9)-(A14) until the data storage area is full;
    • (A16) call the target table update service module 5, record the number of times the subsequent service should be processed, and call the subsequent service.
  • The meaning of deformation is as follows: when the source data record is inconsistent with the target data record, data conversion and target table positioning are required, and this process of data conversion and target table positioning is called deformation.
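  • A deformation function in this sense might look like the sketch below: it converts a source record into a target record and also decides which target table the record belongs to. The field names (`txn_id`, `amount_cents`) and table names (`payments`, `refunds`) are hypothetical. Returning an empty list stands in for an empty deformation result set, i.e., a failed deformation.

```python
def deform(source_record):
    """Convert one source record into (target_table, target_record) pairs.

    All field and table names here are illustrative assumptions.
    An empty list means the deformation produced no result.
    """
    amount = source_record.get("amount_cents")
    if amount is None:
        return []  # nothing usable in this record: empty result set
    # Target table positioning: negative amounts go to a separate table.
    target_table = "refunds" if amount < 0 else "payments"
    # Data conversion: rename the key and convert cents to currency units.
    target_record = {"id": source_record["txn_id"], "amount": abs(amount) / 100}
    return [(target_table, target_record)]


if __name__ == "__main__":
    print(deform({"txn_id": 7, "amount_cents": -250}))
    print(deform({"txn_id": 8}))
```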
  • The source data acquisition service module 4 includes the following sub-modules:
    • a source data selection sub-module, configured to select the data to be transferred from the source data table;
    • a deformation processing sub-module, which completes the deformation processing by calling an external deformation function;
    • a target table update service determining sub-module, configured to determine, according to the target database and the target data table, the target update service name, the key and mutual-exclusion semaphore of the data storage area where the target is located, the key of the idle control message queue, and the like;
    • a sub-module that writes the deformed data, under semaphore mutual exclusion, to the corresponding segment of the data storage area for the task, so that it can be read by the target table update service; when all memory segments are unavailable, this sub-module blocks on the idle control message queue corresponding to the task and, after reading a message, continues to search for an idle data storage segment under semaphore protection;
    • a target table update service calling sub-module, used to create the update data storage area, count the number of subsequent service calls required according to the result of the target table update service determining sub-module, and write the target record into the temporary data storage area.
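  • The interplay between the segmented data storage area, the mutual-exclusion semaphore, and the idle control message queue can be sketched with threads, as below. This is only an analogy under stated assumptions: the text describes shared storage between services, whereas the example uses one in-process class, and the class and method names are hypothetical.

```python
import queue
import threading


class DataStorageArea:
    """Segmented buffer guarded by a semaphore, with an idle-notification queue."""

    def __init__(self, segment_count):
        self._segments = [None] * segment_count  # None marks an idle segment
        self._mutex = threading.Semaphore(1)     # mutual-exclusion semaphore
        self._idle_messages = queue.Queue(maxsize=segment_count)

    def write(self, record):
        """Store a record in an idle segment, blocking while all segments are busy."""
        while True:
            with self._mutex:
                for index, content in enumerate(self._segments):
                    if content is None:
                        self._segments[index] = record
                        return index
            # Every segment is busy: wait for a reader to announce a free one,
            # then search again under the semaphore.
            self._idle_messages.get()

    def take(self, index):
        """Read and free a segment, then announce that a segment is idle."""
        with self._mutex:
            record = self._segments[index]
            self._segments[index] = None
        try:
            self._idle_messages.put_nowait(index)
        except queue.Full:
            pass  # enough pending notifications already; a waiting writer will wake
        return record


if __name__ == "__main__":
    area = DataStorageArea(2)
    first = area.write({"id": 1})
    print(area.take(first))
```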
  • The source data acquisition service module 4 further includes an exception processing sub-module, which calls the public error service and writes a log. It handles the following cases:
    • if the connection to the database fails, the sub-module reconnects to the database and writes a log;
    • if acquiring the source data fails, the sub-module reacquires the source data; if this is still unsuccessful after a certain number of attempts, the processing is skipped, a log is written, and the public error service is called;
    • if the deformation processing fails, the deformation processing is performed again; if this is still unsuccessful after a certain number of attempts, the processing is skipped, a log is written, and the public error service is called;
    • if writing to the data storage area fails, the write is retried; if this is still unsuccessful after a certain number of attempts, the processing is skipped, a log is written, and the public error service is called.
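  • The retry-then-skip behaviour shared by these cases can be sketched as a small helper. The helper name, its parameters, and the `report_error` callback standing in for the public error service are all assumptions made for illustration.

```python
import logging

log = logging.getLogger("transfer")


def run_with_retries(step_name, operation, max_attempts, report_error):
    """Try `operation` up to `max_attempts` times.

    Returns (True, result) on success. After the last failed attempt the step
    is skipped: `report_error` (hypothetical stand-in for the public error
    service) is called and (False, None) is returned.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return True, operation()
        except Exception as exc:  # a sketch; real code would catch narrower errors
            log.warning("%s failed (attempt %d of %d): %s",
                        step_name, attempt, max_attempts, exc)
    report_error(step_name)
    return False, None


if __name__ == "__main__":
    def always_fails():
        raise RuntimeError("source unavailable")

    print(run_with_retries("acquire", always_fails, 2, lambda name: None))
```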
  • The source data acquisition service module 4 accepts the scheduling of the master control module 3 (the minute-level transfer master control module or the second-level transfer master control module), acquires the original data of the transfer task, generates the target data, and calls the target table update service module.
  • The source data acquisition service module can be divided into two parts: a main body and an attachment. The main body reads the parameters of the transfer task, loads the attachment, and calls the sub-function corresponding to the transfer task, so that the user does not need to modify the main body.
  • The attachment is a dynamic function library that encapsulates sub-processes such as task identification, original data acquisition, data deformation, and maintenance of the transfer task state information. The operations of acquiring each database's environment variables, selecting data, and deforming data can be customized according to the user's needs, so as to serve different users and different data tables as well as data replication, extraction, and synchronization.
  • The data deformation operation is optional: if the user has no need to filter or deform the data, the user only needs to customize the operation of selecting the original data from the source database.
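  • The division into a fixed main body and a user-supplied attachment resembles a plugin pattern, sketched below. The text describes the attachment as a dynamic function library; here a plain Python object with optional attributes plays that role, and the names `run_transfer_task`, `select_source_rows`, and `deform` are assumptions made for the example.

```python
from types import SimpleNamespace


def run_transfer_task(task, attachment):
    """Main body: read the task, then call the sub-functions the attachment provides.

    `attachment` must offer select_source_rows(task); deform(row) is optional,
    matching the statement that deformation can be left out.
    """
    rows = attachment.select_source_rows(task)
    deform = getattr(attachment, "deform", None)
    targets = []
    for row in rows:
        if deform is None:
            targets.append(row)          # no deformation customized
        else:
            targets.extend(deform(row))  # deformation may yield 0..n records
    return targets


if __name__ == "__main__":
    # A user-supplied attachment that only customizes data selection.
    attachment = SimpleNamespace(
        select_source_rows=lambda task: [{"id": i} for i in range(task["count"])]
    )
    print(run_transfer_task({"count": 2}, attachment))
```

  • Only the attachment changes between users or tables in this sketch; the main body stays the same, which is the property the text attributes to the design.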
  • FIG. 3 is a flowchart showing the operation of the target table update service module 5 according to an embodiment of the present invention.
  • The working process of the target table update service module 5 is as follows:
    • (B1) initialize, read the basic configuration information, and connect to the database and the data storage area (the data storage area is created if it does not exist);
    • (B2) accept the call of the source data acquisition service module 4, and parse the parameter information obtained;
    • (B3) determine whether the transfer task is a minute-level transfer or a second-level transfer, and if it is a minute-level transfer, further determine whether the number of transfers exceeds the rated number;
    • (B4) if the transfer task is a minute-level transfer and exceeds the rated number, write a log and perform exception handling;
    • (B5) if the transfer task is a minute-level transfer and does not exceed the rated number, or the transfer task is a second-level transfer, perform batch insertion and update on the target database P;
    • (B6) if the insert and update operations are [...]
  • The target table update service module 5 includes the following sub-modules:
    • a process initialization sub-module, which initializes the data storage area and the semaphore and connects to the target database and the control database;
    • a main control flow sub-module, which obtains and parses parameter information from the caller (i.e., the source data acquisition service module 4), selects data from the data storage area, and calls the batch database update sub-module and the task status processing sub-module, thereby completing the task delivered by the source data acquisition service module 4;
    • a batch database update sub-module, which selects data from the data storage area and batch-updates it to the target table;
    • a task status processing sub-module, which updates the processing status in the temporary data storage area and writes to the idle control message queue under semaphore mutual exclusion and, when all tasks have been processed, updates the transfer status of the corresponding time slice in the transfer status control table to completed and tidies up the data storage segment corresponding to the current task.
  • The target table update service module 5 further includes an exception processing sub-module, which discards illegal delivery parameters and records the error information.
  • If the database connection fails, the sub-module actively reconnects to the database until the connection succeeds, and then proceeds to the next step.
  • If a database insert/update exception occurs during a second-level transfer, the exception is ignored and the subsequent operations proceed directly. During a minute-level transfer, the number of errors is counted and a log is written; once the rated number is exceeded, the insert/update operation is no longer repeated.
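  • The difference between the two modes when an insert/update fails can be written as a small decision function. The mode strings, the function name, and the return convention are hypothetical; the sketch only mirrors the rule stated above.

```python
def handle_update_error(mode, error_count, rated_limit):
    """Decide what to do after a failed insert/update.

    Returns (new_error_count, retry). Second-level transfers ignore the error
    and move on; minute-level transfers count it and retry only while the
    count has not exceeded the rated number.
    """
    if mode == "second":
        return error_count, False
    error_count += 1
    return error_count, error_count <= rated_limit


if __name__ == "__main__":
    count = 0
    for _ in range(4):
        count, retry = handle_update_error("minute", count, rated_limit=3)
        print(count, retry)
```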
  • The target table update service module 5 accepts the scheduling of the source data acquisition service module 4 to update the target data into the target table, and updates the information of the transfer task to complete the current transfer task.
  • The target table update service module can be divided into two parts: a main body and an attachment.
  • The main body reads the parameters of the transfer task, loads the attachment, and calls the sub-function corresponding to the transfer task, so that the user does not need to modify the main body.
  • The attachment is a dynamic function library that encapsulates sub-processes such as task identification, insertion and update of the target data, and maintenance of the transfer task state information. The operations of acquiring each database's environment variables and of inserting and updating the target data can be customized according to the user's needs, so as to serve different users and different data tables.
  • The insert/update process of the batch database update sub-module is as follows: (C1) dividing the data to be inserted into blocks, each block storing multiple pieces of data; (C2) each of a plurality of database operation processes reads one block of data at a time in an exclusive manner, and then inserts or updates the data into the target database by batch insertion or batch update; (C3) if step (C2) succeeds, the process continues with other data blocks; if step (C2) fails, the process inserts or updates the data of that block into the target database one record at a time, records an error log entry for each record whose operation fails, and continues with other data blocks.
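  • Steps (C1)-(C3) can be sketched as follows. This is a simplified, single-process illustration that uses Python's `sqlite3` module as a stand-in target database. The table name `target`, its columns, and the function names are assumptions, and the concurrent processes of step (C2) are left out so that the batch-then-fallback logic stays visible.

```python
import sqlite3

# Hypothetical target table and columns, used only for this example.
INSERT_SQL = "INSERT INTO target (id, payload) VALUES (?, ?)"


def split_into_blocks(rows, block_size):
    """(C1) Divide the rows into blocks of at most block_size records."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def insert_block(conn, block, error_log):
    """(C2)/(C3) Batch-insert one block; on failure, fall back to single inserts."""
    try:
        with conn:  # one transaction per block, rolled back if any row fails
            conn.executemany(INSERT_SQL, block)
    except sqlite3.Error:
        for row in block:
            try:
                with conn:
                    conn.execute(INSERT_SQL, row)
            except sqlite3.Error as exc:
                error_log.append((row, str(exc)))  # log only the failing record


def bulk_load(conn, rows, block_size):
    """Load all rows block by block and return the error log."""
    error_log = []
    for block in split_into_blocks(rows, block_size):
        insert_block(conn, block, error_log)
    return error_log


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, payload TEXT)")
    rows = [(1, "a"), (2, "b"), (2, "dup"), (3, "c")]
    print(bulk_load(conn, rows, block_size=2))
```

  • Because each block is committed on its own, a bad record costs one rolled-back block and a handful of single inserts rather than the whole load, which is the trade-off the text describes.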
  • The insertion/update process disclosed by the present invention combines multi-process concurrent processing with batch insertion/update, so that a large amount of log space is not required and the submissions of individual processes are allowed to fail. This reduces the communication cost on the server side and fundamentally improves the insertion speed.
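  • The data transfer method disclosed by the present invention includes the following steps:
    • (D1) initializing the master control module 3 in the host, and reading the transfer control information and the parameter configuration information;
    • (D2) initializing the source data acquisition service module 4 and the target table update service module 5 in the host, the target table update service module 5 creating a data storage area;
    • (D3) the master control module 3 calls the source data acquisition service module 4 according to the transfer control information and the parameter configuration information that were read, and passes the transfer task parameters to the source data acquisition service module 4;
    • (D4) the source data acquisition service module 4 connects to the source databases S1-SN to acquire the environment variables of the source databases S1-SN, and reads the corresponding database records in the source databases S1-SN according to the transfer task parameters;
    • (D5) the source data acquisition service module 4 deforms the database record that was read to obtain a target record, and writes the target record into the data storage area corresponding to the target table;
    • (D6) repeating steps (D4)-(D5) until the data storage area is full;
    • (D7) the source data acquisition service module 4 calls the target table update service module 5, and passes the transfer task parameters to the target table update service module 5;
    • (D8) the target table update service module 5 connects to the target databases and to the data storage area, thereby acquiring the environment variables of the target databases, and parses the received transfer task parameters;
    • (D9) the target table update service module 5 performs batch insertion and update on the target databases according to the parsing result until the transfer task is completed;
    • (D10) the target table update service module 5 releases the data storage area.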
  • the total control module 3 is a minute-level transfer master control module or a second-level transfer master control module.
  • the source data obtaining service module 4 can be divided into two parts: a main body and an accessory.
  • the main part is used to read the parameters of the transfer task, load the attachment, and call the sub-function corresponding to the transfer task, so that the user does not need to modify the main part when using.
  • the attachment part is a dynamic function library, and the dynamic function library encapsulates task identification, original data acquisition, data deformation, and transfer task status.
  • Various sub-processes such as information maintenance, in which the operations of acquiring each database environment variable and performing data selection and data deformation can be customized according to the needs of the user, thereby satisfying different users, different data tables, and data copying and extraction box synchronization. Need.
  • the operation of the data deformation is optional, that is, if the user does not have the need for filtering and deformation of the data, the user only needs to customize the operation of selecting the original data from the source database.
  • the target table update service module 5 can also be divided into two parts: a main body and an attachment.
  • the main body reads the parameters of the transfer task, loads the attachment, and calls the sub-function corresponding to the transfer task, so that the user does not need to modify the main body.
  • the attachment is a dynamic function library that encapsulates sub-processes such as task identification, insertion and update of target data, and maintenance of transfer task status information. The operations of acquiring each database's environment variables and of inserting and updating the target data can be customized to the user's needs, thereby meeting the needs of different users and different data tables.
  • the step (D9) in the data transfer method disclosed by the present invention further comprises the following steps: (E1) dividing the data to be inserted into blocks, each of which can hold multiple records; (E2) each of a plurality of database operation processes reading one block of data at a time in an exclusive manner, and then inserting or updating the data into the target databases P1-PN by batch insertion or batch update; (E3) if step (E2) succeeds, the process may continue with other data blocks; if step (E2) fails, the data is inserted or updated into the target database one record at a time, an error log is written for each record whose operation failed, and processing continues with other data blocks.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

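Data Transfer System and Method

Technical Field

The present invention relates to data transfer systems and methods and, more particularly, to data transfer systems and methods for multi-platform databases.

Background Art

At present, as database systems are applied ever more widely, data transfer among multiple databases is becoming increasingly common and important. A single database application system often contains databases on several platforms, so the market urgently needs a technique for smooth data transfer between homogeneous and heterogeneous databases. Commercial databases offer some replication capability, but they work only under ideal conditions with many restrictions and are therefore quite limited. Standalone data replication software has a point-to-point database replication architecture and therefore cannot flexibly handle data transfer among multiple homogeneous or heterogeneous databases with complex topologies. There is also data transfer software developed specifically for particular application systems, but such software lacks generality. In summary, existing inter-database data transfer techniques have the following drawbacks: high cost; a large impact on the database or data tables and a high degree of coupling (that is, data transfer software suited to class A databases or data tables does not suit class B databases or data tables), sometimes even requiring triggers on the source database or relying on specific features of a particular database product, and therefore a lack of generality and extensibility; limited functionality, with no support for both near-real-time (minute-level) and real-time (second-level) transfer modes at once; and difficulty or inability to transfer data between heterogeneous databases, that is, to filter and deform data, which leaves the database system with poor disaster tolerance and recovery capability.

In addition, with the rapid development of database technology, large database systems all support concurrent operations in order to meet a growing range of application requirements. To further improve insertion performance, many database products have also begun to support batch insertion. Existing methods for quickly inserting a large amount of dynamic data into a specified target database are as follows: (1) save the dynamic data to be inserted as a database file and import it in bulk through a database backup or a loading tool provided by the database; (2) use multiple concurrent processes to insert the data into the database over multiple connections; (3) use batch insertion, that is, commit multiple records at a time. These methods have the following drawbacks. Method (1) requires extra disk space, and the input/output operations involved in saving the file are time-consuming. In method (2), although multiple processes run concurrently, each process commits only one record at a time, so efficiency is low. In method (3), committing a large amount of data at once requires a large log space, and a failed commit causes the whole operation to fail. The market therefore also urgently needs a batch insertion technique that makes the fullest use of database performance.

Summary of the Invention

To overcome the above deficiencies of the prior art, the present invention proposes a data transfer system and method that can transfer data between databases (both homogeneous and heterogeneous) and that support both near-real-time (minute-level) and real-time (second-level) transfer modes.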
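The object of the present invention is achieved by the following technical solution:

A data transfer system, comprising at least one host, at least one source database, at least one target database, a parameter configuration database, and a control database; wherein the parameter configuration database is connected to the at least one host and is configured to store configuration parameters and to provide the configuration parameters to the at least one host in response to a request from the at least one host; the control database is connected to the at least one host and is configured to store control information and to provide the control information to the at least one host in response to a request from the at least one host; the at least one source database is connected to the at least one host and is configured to provide the source data to be transferred; the at least one target database is connected to the at least one host and is configured to receive the target data to be transferred; and each host comprises a master control module, a source data acquisition service module, and a target table update service module, the master control module being configured to control transfer tasks overall and to load the configuration parameters; characterized in that the source data acquisition service module is scheduled by the master control module to acquire the source data, generate the target data, and invoke the target table update service module; the source data acquisition service module is divided into a main body and an attachment; the main body is configured to read the parameters of a transfer task, load the attachment, and invoke the sub-process corresponding to the transfer task; and the attachment is a dynamic function library that encapsulates the source data acquisition and data deformation sub-processes.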
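In the solution disclosed above, preferably, the target table update service module is scheduled by the source data acquisition service module to insert or update the target data into a target table, and the target table update service module is divided into a main body and an attachment; the main body is configured to read the parameters of a transfer task, load the attachment, and invoke the sub-process corresponding to the transfer task; and the attachment is a dynamic function library that encapsulates the sub-processes for inserting and updating the target data.

In the solution disclosed above, preferably, the master control module is a minute-level transfer master control module or a second-level transfer master control module, wherein the minute-level transfer master control module performs near-real-time data transfer tasks and the second-level transfer master control module performs real-time data transfer tasks.

In the solution disclosed above, preferably, the target table update service module further comprises a batch database update sub-module, which performs the insert and/or update process as follows:

(C1) dividing the data to be inserted into blocks, each of which can hold multiple records;

(C2) each of a plurality of database operation processes reading one block of data at a time in an exclusive manner, and then inserting or updating the data into the target database by batch insertion or batch update;

(C3) if step (C2) succeeds, the process may continue with other data blocks; if step (C2) fails, the data is inserted or updated into the target database one record at a time, an error log is written for each record whose operation failed, and processing continues with other data blocks.

In the solution disclosed above, preferably, the master control module is deployed on every host, and only one of the at least one host has the task scheduling function.

In the solution disclosed above, preferably, the source data acquisition service module comprises a deformation processing sub-module, which deforms the source data by calling a dynamic deformation function and thereby generates the target data.

In the solution disclosed above, preferably, the at least one source database and the at least one target database are heterogeneous.

In the solution disclosed above, preferably, the parameter configuration database and the control database coexist in a single database.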
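The object of the present invention is also achieved by the following technical solution:

A data transfer method, comprising the following steps:

(D1) initializing the master control module in at least one host, and reading transfer control information and parameter configuration information;

(D2) initializing the source data acquisition service module and the target table update service module in the at least one host, the target table update service module creating a data storage area;

(D3) the master control module invoking the source data acquisition service module according to the transfer control information and parameter configuration information that were read, and passing the transfer task parameters to the source data acquisition service module;

(D4) the source data acquisition service module connecting to at least one source database to acquire the environment variables of the at least one source database, and reading the corresponding database records in the source database according to the transfer task parameters;

(D5) the source data acquisition service module deforming the database records that were read to obtain target records, and writing the target records into the data storage area;

(D6) repeating steps (D4)-(D5) until the data storage area is full;

(D7) once the data storage area is full, the source data acquisition service module invoking the target table update service module and passing the transfer task parameters to the target table update service module;

(D8) the target table update service module connecting to at least one target database and to the data storage area, acquiring the environment variables of the at least one target database, and parsing the received transfer task parameters;

(D9) the target table update service module performing batch insertion and update on the at least one target database according to the parsing result until the transfer task is complete;

(D10) after the transfer task is complete, the target table update service module releasing the data storage area;

characterized in that the source data acquisition service module is scheduled by the master control module to acquire the source data, generate the target data, and invoke the target table update service module; the source data acquisition service module is divided into a main body and an attachment; the main body is configured to read the parameters of a transfer task, load the attachment, and invoke the sub-process corresponding to the transfer task; and the attachment is a dynamic function library that encapsulates the source data acquisition and data deformation sub-processes.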
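In the solution disclosed above, preferably, the target table update service module is scheduled by the source data acquisition service module to insert or update the target data into a target table, and the target table update service module is divided into a main body and an attachment; the main body is configured to read the parameters of a transfer task, load the attachment, and invoke the sub-process corresponding to the transfer task; and the attachment is a dynamic function library that encapsulates the sub-processes for inserting and updating the target data.

In the solution disclosed above, preferably, step (D9) of the data transfer method further comprises the following steps: (E1) dividing the data to be inserted into blocks, each of which can hold multiple records; (E2) each of a plurality of database operation processes reading one block of data at a time in an exclusive manner, and then inserting or updating the data into the at least one target database by batch insertion or batch update; (E3) if step (E2) succeeds, the process may continue with other data blocks; if step (E2) fails, the data is inserted or updated into the target database one record at a time, an error log is written for each record whose operation failed, and processing continues with other data blocks.

In the solution disclosed above, preferably, the master control module is a minute-level transfer master control module or a second-level transfer master control module, wherein the minute-level transfer master control module performs near-real-time data transfer tasks and the second-level transfer master control module performs real-time data transfer tasks.

In the solution disclosed above, preferably, the master control module is deployed on every host, and only one of the at least one host has the task scheduling function.

In the solution disclosed above, preferably, the source data acquisition service module comprises a deformation processing sub-module, which deforms the source data by calling a dynamic deformation function and thereby generates the target data.

In the solution disclosed above, preferably, the at least one source database and the at least one target database are heterogeneous.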
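The data transfer system and method disclosed by the present invention have the following advantages. Because a dynamic function library encapsulates the source data selection and data deformation processes, the system and method are low in complexity and can flexibly handle transfer tasks under different conditions, that is, they can transfer data between homogeneous and heterogeneous databases. They also support both near-real-time (minute-level) and real-time (second-level) transfer modes. Furthermore, because concurrent processing is combined with batch operations during data insertion and update, database performance can be exploited to the fullest. In summary, the disclosed data transfer system and method are general, flexible, highly encapsulated, and highly stable.

Brief Description of the Drawings

The technical features and advantages of the present invention will be better understood by those skilled in the art with reference to the accompanying drawings, in which:

Fig. 1 is a structural diagram of a data transfer system according to an embodiment of the present invention;
Fig. 2 is a workflow diagram of the source data acquisition service module according to an embodiment of the present invention;
Fig. 3 is a workflow diagram of the target table update service module according to an embodiment of the present invention;
Fig. 4 is a flowchart of a data transfer method according to an embodiment of the present invention.

Detailed Description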
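Fig. 1 is a structural diagram of a data transfer system according to an embodiment of the present invention. As shown in Fig. 1, the data transfer system disclosed by the present invention comprises at least one host H1, preferably multiple hosts H1-H3 (i.e., servers); at least one source database S1, preferably multiple source databases S1-SN; at least one target database P1, preferably multiple target databases P1-PN; a parameter configuration database 1; and a control database 2. The parameter configuration database 1 is connected to the hosts H1-H3 and is used to store configuration parameters and to provide them to the hosts H1-H3 on request. The control database 2 is connected to the hosts H1-H3 and is used to store control information (i.e., data transfer program parameters and dynamic transfer information) and to provide it to the hosts H1-H3 on request. Even if a transfer process is interrupted by an exception, all control information can still be retrieved from the control database 2 on restart and the transfer can continue, which ensures the integrity and reliability of the data. The source databases S1-SN are each connected to the hosts H1-H3 and provide the source data to be transferred. The target databases P1-PN are each connected to the hosts H1-H3 and receive the target data to be transferred. Optionally, the parameter configuration database 1 and the control database 2 may coexist in a single database.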
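As shown in Fig. 1, each of the hosts H1-H3 includes a master control module 3 (a minute-level transfer master control module or a second-level transfer master control module), a source data acquisition service module 4, and a target table update service module 5. The master control module 3 handles overall task control and loads the configuration parameters into memory. The minute-level and second-level transfer master control modules are deployed separately: the minute-level master control service module is started when a minute-level transfer service is needed, and the second-level master control service module is started when a second-level transfer service is needed.

As shown in Fig. 1, every host has a master control module 3, but parameter settings ensure that only one host in the host group has the task scheduling function. If the host with the task scheduling function fails, the task scheduling function of the master control module on a backup host can be started. The source data acquisition service module 4 and the target table update service module 5 on every host are therefore all in a working state, which improves the processing performance and disaster tolerance of the system.

As shown in Fig. 1, the minute-level master control service module initializes the transfer control information and distributes it evenly to the transfer services on the multiple hosts; it then controls the advance of the data transfer time slices and the updating of their status. Through configuration parameters, the minute-level master control service module can also distribute transfer tasks and report transfer progress during the data transfer. A time slice is defined as follows: in the data transfer system disclosed by the present invention, a period of time is divided, according to the configuration, into several logical segments, and each such segment is one time slice. The configuration information includes: the transfer start time, the transfer time slice, the number of times the current transfer end time is updated per cycle, the interval between the transfer end time and the current system time, the database connection information, and the number of hosts.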
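The time-slice bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: slices have a fixed length, they are handed to hosts round-robin, and the names `make_time_slices` and `assign_slices` are hypothetical rather than taken from the patent.

```python
from datetime import datetime, timedelta


def make_time_slices(start, end, slice_length):
    """Split the period [start, end) into consecutive logical time slices."""
    slices = []
    cursor = start
    while cursor < end:
        upper = min(cursor + slice_length, end)
        slices.append((cursor, upper))
        cursor = upper
    return slices


def assign_slices(slices, hosts):
    """Spread the slices evenly over the hosts (round-robin is an assumption)."""
    plan = {host: [] for host in hosts}
    for index, time_slice in enumerate(slices):
        plan[hosts[index % len(hosts)]].append(time_slice)
    return plan


# Ten one-minute slices distributed over three hosts.
start = datetime(2010, 8, 24, 0, 0)
slices = make_time_slices(start, start + timedelta(minutes=10), timedelta(minutes=1))
plan = assign_slices(slices, ["H1", "H2", "H3"])
```

In a real deployment the slice boundaries and their status would be persisted in the control database so that an interrupted transfer can resume from the last completed slice.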
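As shown in Fig. 1, the second-level master control service module transfers data from n seconds ago (for example, n<10) from the source data table to the target data table. That is, according to the information in the data transfer control table in the database, it assigns transfer tasks to the transfer services and advances the transfer time slices. Through configuration parameters, the second-level master control service module can also distribute transfer tasks and report transfer progress during the data transfer. The configuration information includes: the transfer start time, the transfer time slice, the number of times the current transfer end time is updated per cycle, the interval between the transfer end time and the current system time, the database connection information, and the number of hosts.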
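Fig. 2 is a workflow diagram of the source data acquisition service module 4 according to an embodiment of the present invention. As shown in Fig. 2, the source data acquisition service module 4 works as follows:

(A1) initialize and connect to the source databases S1-SN;
(A2) accept a call from the master control module 3 (the minute-level or second-level transfer master control module);
(A3) obtain the transfer task parameters from the call information of the master control module 3;
(A4) determine the task indicated by the task parameters;
(A5) if the task is "to be updated", set the transfer status in the control table to "updating"; if the task is "transfer", set the transfer status in the control table to "transferring";
(A6) acquire the environment variables of the source databases S1-SN;
(A7) using the acquired environment variables of the source databases S1-SN, fetch the first record of the source data and determine whether there are no records;
(A8) if there are no records and the task is "to be updated", set the transfer status in the control table to "all completed" and end this run of the source data acquisition service; if there are no records and the task is "transfer", set the transfer status in the control table to "transfer completed" and end this run of the source data acquisition service;
(A9) if there are records, determine whether all records have been processed; if they have, invoke the target table update service module 5 and end this run of the source data acquisition service;
(A10) if unprocessed data remains, determine whether the record requires deformation;
(A11) if the record does not require deformation, invoke the default target table update service;
(A12) if the record requires deformation, deform it; if the head pointer of the deformation result set is null (the deformation failed), return to step (A9);
(A13) if the head pointer of the deformation result set is not null (the deformation succeeded), fetch the target record the pointer refers to and invoke the specific target table update service 5;
(A14) write the target record into the data storage area corresponding to the target table;
(A15) repeat steps (A9)-(A14) until the data storage area is full;
(A16) invoke the target table update service module 5, record the number of times the subsequent service should process, and invoke the subsequent service.

Deformation is defined as follows: when the source data record and the target data record are inconsistent, data conversion and target table location are required, and this process of data conversion and target table location is called deformation.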
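The source data acquisition service module 4 includes the following sub-modules:

an initialization sub-module, used to obtain parameter information, initialize the global control variables and the semaphores of the temporary data storage area, set the host group status information, and so on;

a source data selection sub-module, used to select the data to be transferred from the source data table;

a deformation processing sub-module, which completes deformation by calling an external deformation function;

a target table update service determination sub-module, used to determine, from the target database and the target data table, the name of the target update service, the key and mutex semaphore of the data storage area where the target resides, the key of the idle control message queue, and so on;

a data storage area writing sub-module, which writes the deformed data, under semaphore mutual exclusion, into the corresponding segment of the data storage area for the task so that the target table update service sub-module can read it; when no memory segment is available, this sub-module reads the task's idle control message queue in blocking mode and, once a message has been read, continues looking for a free data storage area under semaphore protection;

a target table update service invocation sub-module, used to create the update data storage area, to count the number of times subsequent services should be invoked according to the result of the target table update service determination sub-module, and to write the target record into the temporary data storage area.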
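The interplay between the mutex-protected storage segments and the idle control message queue can be sketched as follows. This is a simplified single-process illustration: a `threading.Lock` stands in for the semaphore, a `queue.Queue` stands in for the idle control message queue, and the class name `SegmentedStore` is hypothetical.

```python
import queue
import threading


class SegmentedStore:
    """A fixed number of storage segments guarded by a mutex."""

    def __init__(self, segment_count):
        self._segments = [None] * segment_count  # None marks a free segment
        self._mutex = threading.Lock()           # stands in for the semaphore
        self._idle_queue = queue.Queue()         # stands in for the idle control message queue

    def write(self, records):
        """Write records into a free segment, blocking while every segment is busy."""
        while True:
            with self._mutex:
                for index, content in enumerate(self._segments):
                    if content is None:
                        self._segments[index] = list(records)
                        return index
            # No free segment: block until a reader reports that one was released.
            self._idle_queue.get()

    def release(self, index):
        """Take the records out of a segment and signal that it is free again."""
        with self._mutex:
            records = self._segments[index]
            self._segments[index] = None
        self._idle_queue.put(index)
        return records
```

The writer re-checks the segments after every message instead of trusting the message content, so a stale message only costs one extra scan.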
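The source data acquisition service module 4 also includes an exception handling sub-module. When initialization fails, this sub-module calls the common error reporting service and writes a log entry. When a database connection fails, it reconnects to the database and writes a log entry. When data selection fails, it fetches the source data again; if the operation still fails after a certain number of attempts, it skips the operation, writes a log entry, and calls the common error reporting service. When deformation fails, the deformation is retried; if it still fails after a certain number of attempts, the operation is skipped, a log entry is written, and the common error reporting service is called. When writing to the data storage area fails, the write is retried; if it still fails after a certain number of attempts, the operation is skipped, a log entry is written, and the common error reporting service is called.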
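The retry-then-skip policy described above can be sketched as a small helper. The function name `run_with_retries`, the attempt limit, and the `report_error` callback (standing in for the common error reporting service) are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("transfer")


def run_with_retries(step, max_attempts, report_error):
    """Run step(); retry on failure, then skip the step, log, and report the error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return True, step()
        except Exception as exc:
            logger.warning("attempt %d of %d failed: %s", attempt, max_attempts, exc)
    # Every attempt failed: skip this step and notify the error reporting service.
    report_error("step skipped after %d attempts" % max_attempts)
    return False, None
```

The helper returns a success flag together with the result so that the caller can move on to the next record when a step is skipped.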
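The source data acquisition service module 4 is scheduled by the master control module 3 (the minute-level or second-level transfer master control module). According to the call parameters, it carries a transfer task from acquisition of the original data through generation of the target data, and then invokes the target table update service module. The source data acquisition service module can be divided into a main body and an attachment. The main body reads the parameters of the transfer task, loads the attachment, and calls the sub-function corresponding to the transfer task, so that the user does not need to modify the main body. The attachment is a dynamic function library that encapsulates sub-processes such as task identification, original data acquisition, data deformation, and maintenance of transfer task status information. The operations of acquiring each database's environment variables, selecting data, and deforming data can be customized to the user's needs, thereby meeting the needs of different users and different data tables as well as of data replication, extraction, and synchronization. The data deformation operation is optional: if the user does not need to filter or deform the data, the user only needs to customize the operation that selects the original data from the source database.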
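The split between a fixed main body and a user-customizable attachment can be sketched as follows. The patent describes the attachment as a dynamic function library; here a plain Python object holding two callables stands in for it, and the names `Attachment`, `main_body`, and the task names are hypothetical.

```python
def identity(record):
    """Default deformation used when no filtering or deformation is needed."""
    return [record]


class Attachment:
    """User-customizable part: how to select source rows and how to deform them."""

    def __init__(self, select, deform=identity):
        self.select = select
        self.deform = deform


def main_body(task_params, attachments):
    """Fixed part: read the task parameters, load the attachment, dispatch to it."""
    attachment = attachments[task_params["task"]]
    target_records = []
    for record in attachment.select(task_params):
        # A deformation may yield zero, one, or several target records.
        target_records.extend(attachment.deform(record))
    return target_records


source_rows = [{"id": 1, "amount": "10"}, {"id": 2, "amount": "-5"}]


def keep_positive(row):
    """Example deformation: convert the amount to int and drop non-positive rows."""
    amount = int(row["amount"])
    return [{"id": row["id"], "amount": amount}] if amount > 0 else []


attachments = {
    "copy": Attachment(select=lambda params: source_rows),
    "positive_only": Attachment(select=lambda params: source_rows, deform=keep_positive),
}
```

Adding a new kind of transfer task then means registering another attachment; `main_body` itself stays unchanged, which mirrors the stated goal that users never modify the main body.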
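Fig. 3 is a workflow diagram of the target table update service module 5 according to an embodiment of the present invention. As shown in Fig. 3, the target table update service module 5 works as follows:

(B1) initialize and read the basic configuration information, and connect to the database and the data storage area (creating the data storage area if it does not exist);
(B2) accept a call from the source data acquisition service module 4 and parse the parameter information obtained;
(B3) determine whether the transfer task is a minute-level or a second-level transfer; for a minute-level transfer, further determine whether the number of transfers exceeds the rated number;
(B4) if the transfer task is a minute-level transfer and the rated number is exceeded, record this in the log and perform exception handling;
(B5) if the transfer task is a minute-level transfer and the rated number is not exceeded, or if the transfer task is a second-level transfer, perform batch insertion and update on the target databases P1-PN;
(B6) if the insert and update operations succeed, update the task completion status in the temporary monitoring memory;
(B7) check the task completion status; if not all tasks are complete, release the data storage segment and return to step (B2);
(B8) if all tasks are complete, set the data transfer status in the control table to "completed" and clear the temporary monitoring memory;
(B9) release the data storage segment and return to step (B2).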
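The target table update service module 5 includes the following sub-modules:

a process initialization sub-module, which initializes the data storage area and the semaphores and connects to the target database and the control database;

a main control flow sub-module, which obtains and parses parameter information from the caller (the source data acquisition service module 4), selects data from the data storage area, and calls the batch database update sub-module and the task status processing sub-module, thereby completing the task issued by the source data acquisition service module 4;

a batch database update sub-module, which selects data from the data storage area and updates it into the target table in batches;

a task status processing sub-module, which, under semaphore mutual exclusion, updates the processing status in the temporary data storage area and writes to the idle control message queue; when all tasks have been processed, it sets the transfer status of the corresponding time slice in the transfer status control table to completed and tidies up the data storage segment for the current task.

The target table update service module 5 also includes an exception handling sub-module. This sub-module discards illegal parameters and records the error information. When the database connection is abnormal, it actively reconnects to the database and proceeds to the next step only after the connection succeeds. When a database insert/update fails, a second-level transfer ignores the exception and proceeds directly to the following operations, whereas a minute-level transfer counts the errors and writes a log entry; once the rated number is exceeded, the insert/update operation is no longer repeated.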
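The target table update service module 5 is scheduled by the source data acquisition service module 4 to update the target data into the target table and to update the transfer task information so that the current transfer task can be completed. The target table update service module can be divided into a main body and an attachment. The main body reads the parameters of the transfer task, loads the attachment, and calls the sub-function corresponding to the transfer task, so that the user does not need to modify the main body. The attachment is a dynamic function library that encapsulates sub-processes such as task identification, insertion and update of target data, and maintenance of transfer task status information. The operations of acquiring each database's environment variables and of inserting and updating the target data can be customized to the user's needs, thereby meeting the needs of different users and different data tables.

In the target table update service module 5, the insert/update process of the batch database update sub-module is as follows: (C1) divide the data to be inserted into blocks, each of which can hold multiple records; (C2) each of a plurality of database operation processes reads one block of data at a time in an exclusive manner, and then inserts or updates the data into the target database by batch insertion or batch update; (C3) if step (C2) succeeds, the process may continue with other data blocks; if step (C2) fails, the data is inserted or updated into the target database one record at a time, an error log is written for each record whose operation failed, and processing continues with other data blocks. Compared with the prior art, the insert/update process disclosed by the present invention combines multi-process concurrency with batch insert/update. It therefore does not need a large log space and tolerates commit failures of individual processes, which reduces the cost of communication with the server and fundamentally improves insertion speed.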
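Steps (C1)-(C3) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: threads stand in for the database operation processes, the `InMemoryTarget` class is a hypothetical stand-in for a target database with a unique key, and a `queue.Queue` provides the exclusive hand-out of blocks.

```python
import queue
import threading


class InMemoryTarget:
    """Stand-in for a target database table with a unique key (an assumption)."""

    def __init__(self):
        self.rows = {}
        self._lock = threading.Lock()

    def insert_batch(self, records):
        """All-or-nothing batch insert; rejected if any key is duplicated."""
        with self._lock:
            keys = [key for key, _ in records]
            if len(set(keys)) != len(keys) or any(k in self.rows for k in keys):
                raise ValueError("batch rejected")
            self.rows.update(records)

    def insert_one(self, record):
        key, value = record
        with self._lock:
            if key in self.rows:
                raise ValueError("duplicate key %r" % (key,))
            self.rows[key] = value


def split_into_blocks(records, block_size):
    """(C1) Divide the records into blocks of at most block_size records."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def worker(blocks, target, errors):
    while True:
        try:
            block = blocks.get_nowait()   # each block goes to exactly one worker
        except queue.Empty:
            return
        try:
            target.insert_batch(block)    # (C2) try the whole block at once
        except ValueError:
            for record in block:          # (C3) fall back to single inserts
                try:
                    target.insert_one(record)
                except ValueError as exc:
                    errors.append((record, str(exc)))  # error log entry


def transfer(records, target, block_size=3, workers=4):
    blocks = queue.Queue()
    for block in split_into_blocks(records, block_size):
        blocks.put(block)
    errors = []
    threads = [threading.Thread(target=worker, args=(blocks, target, errors))
               for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors
```

Because a failed block is retried record by record, one bad record costs only the extra single inserts for its own block, while every other block still benefits from batching.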
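Fig. 4 is a flowchart of a data transfer method according to an embodiment of the present invention. As shown in Fig. 4, the data transfer method disclosed by the present invention includes the following steps:

(D1) initialize the master control module 3 in the host, and read the transfer control information and parameter configuration information;
(D2) initialize the source data acquisition service module 4 and the target table update service module 5 in the host, the target table update service module 5 creating a data storage area;
(D3) the master control module 3 invokes the source data acquisition service module 4 according to the transfer control information and parameter configuration information that were read, and passes the transfer task parameters to the source data acquisition service module 4;
(D4) the source data acquisition service module 4 connects to the source databases S1-SN to acquire their environment variables, and reads the corresponding database records in the source databases S1-SN according to the transfer task parameters;
(D5) the source data acquisition service module 4 deforms the database records that were read to obtain target records, and writes the target records into the data storage area corresponding to the target table;
(D6) steps (D4)-(D5) are repeated until the data storage area is full;
(D7) once the data storage area is full, the source data acquisition service module 4 invokes the target table update service module 5 and passes the transfer task parameters to it;
(D8) the target table update service module 5 connects to the target databases P1-PN and the data storage area, acquires the environment variables of the target databases P1-PN, and parses the received transfer task parameters;
(D9) the target table update service module 5 performs batch insertion and update on the target databases P1-PN according to the parsing result until the transfer task is complete;
(D10) after the transfer task is complete, the target table update service module 5 releases the data storage area.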
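The fill-then-flush rhythm of steps (D4)-(D10) can be sketched as a single loop. This is a minimal single-process illustration: the function name `run_transfer` and its callback parameters are hypothetical, and flushing a final partial batch when the source runs out is an assumption that the steps above do not spell out.

```python
def run_transfer(read_source, deform, update_target, capacity):
    """Fill a bounded storage area with target records and flush it when full."""
    storage = []                          # (D2) the data storage area
    flushed_batches = 0
    for record in read_source():          # (D4) read source records
        storage.extend(deform(record))    # (D5) deform and store target records
        if len(storage) >= capacity:      # (D6)/(D7) the storage area is full
            update_target(storage)        # (D8)/(D9) batch insert or update
            storage = []
            flushed_batches += 1
    if storage:                           # assumption: flush the last partial batch
        update_target(storage)
        flushed_batches += 1
    storage = None                        # (D10) release the storage area
    return flushed_batches


# Five source records, a storage area of two records: three flushes in total.
target_table = []
batches = run_transfer(
    read_source=lambda: iter(range(5)),
    deform=lambda record: [record * 10],
    update_target=target_table.extend,
    capacity=2,
)
```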
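In the data transfer method disclosed above, the master control module 3 is a minute-level transfer master control module or a second-level transfer master control module. The source data acquisition service module 4 can be divided into a main body and an attachment. The main body reads the parameters of the transfer task, loads the attachment, and calls the sub-function corresponding to the transfer task, so that the user does not need to modify the main body. The attachment is a dynamic function library that encapsulates sub-processes such as task identification, original data acquisition, data deformation, and maintenance of transfer task status information. The operations of acquiring each database's environment variables, selecting data, and deforming data can be customized to the user's needs, thereby meeting the needs of different users and different data tables as well as of data replication, extraction, and synchronization. The data deformation operation is optional: if the user does not need to filter or deform the data, the user only needs to customize the operation that selects the original data from the source database.

Likewise, in the data transfer method disclosed by the present invention, the target table update service module 5 can be divided into a main body and an attachment. The main body reads the parameters of the transfer task, loads the attachment, and calls the sub-function corresponding to the transfer task, so that the user does not need to modify the main body. The attachment is a dynamic function library that encapsulates sub-processes such as task identification, insertion and update of target data, and maintenance of transfer task status information. The operations of acquiring each database's environment variables and of inserting and updating the target data can be customized to the user's needs, thereby meeting the needs of different users and different data tables.

Step (D9) of the data transfer method disclosed by the present invention further includes the following steps: (E1) divide the data to be inserted into blocks, each of which can hold multiple records; (E2) each of a plurality of database operation processes reads one block of data at a time in an exclusive manner, and then inserts or updates the data into the target databases P1-PN by batch insertion or batch update; (E3) if step (E2) succeeds, the process may continue with other data blocks; if step (E2) fails, the data is inserted or updated into the target database one record at a time, an error log is written for each record whose operation failed, and processing continues with other data blocks.

Although the present invention has been described by way of the preferred embodiments above, its implementation is not limited to those embodiments. It should be recognized that those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope.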

Claims

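1. A data transfer system, comprising at least one host, at least one source database, at least one target database, a parameter configuration database, and a control database; wherein the parameter configuration database is connected to the at least one host and is configured to store configuration parameters and to provide the configuration parameters to the at least one host in response to a request from the at least one host; the control database is connected to the at least one host and is configured to store control information and to provide the control information to the at least one host in response to a request from the at least one host; the at least one source database is connected to the at least one host and is configured to provide the source data to be transferred; the at least one target database is connected to the at least one host and is configured to receive the target data to be transferred; and each host comprises a master control module, a source data acquisition service module, and a target table update service module, the master control module being configured to control transfer tasks overall and to load the configuration parameters;

characterized in that the source data acquisition service module is scheduled by the master control module to acquire the source data, generate the target data, and invoke the target table update service module; the source data acquisition service module is divided into a main body and an attachment; the main body is configured to read the parameters of a transfer task, load the attachment, and invoke the sub-process corresponding to the transfer task; and the attachment is a dynamic function library that encapsulates the source data acquisition and data deformation sub-processes.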
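2. The data transfer system according to claim 1, characterized in that the target table update service module is scheduled by the source data acquisition service module to insert or update the target data into a target table, and the target table update service module is divided into a main body and an attachment; the main body is configured to read the parameters of a transfer task, load the attachment, and invoke the sub-process corresponding to the transfer task; and the attachment is a dynamic function library that encapsulates the sub-processes for inserting and updating the target data.

3. The data transfer system according to any one of claims 1-2, characterized in that the master control module is a minute-level transfer master control module or a second-level transfer master control module, wherein the minute-level transfer master control module performs near-real-time data transfer tasks and the second-level transfer master control module performs real-time data transfer tasks.

4. The data transfer system according to any one of claims 1-3, characterized in that the target table update service module further comprises a batch database update sub-module, which performs the insert and/or update process as follows:

(C1) dividing the data to be inserted into blocks, each of which can hold multiple records;

(C2) each of a plurality of database operation processes reading one block of data at a time in an exclusive manner, and then inserting or updating the data into the target database by batch insertion or batch update;

(C3) if step (C2) succeeds, the process may continue with other data blocks; if step (C2) fails, the data is inserted or updated into the target database one record at a time, an error log is written for each record whose operation failed, and processing continues with other data blocks.

5. The data transfer system according to any one of claims 1-4, characterized in that the master control module is deployed on every host, and only one of the at least one host has the task scheduling function.

6. The data transfer system according to any one of claims 1-5, characterized in that the source data acquisition service module comprises a deformation processing sub-module, which deforms the source data by calling a dynamic deformation function and thereby generates the target data.

7. The data transfer system according to any one of claims 1-6, characterized in that the at least one source database and the at least one target database are heterogeneous.

8. The data transfer system according to any one of claims 1-7, characterized in that the parameter configuration database and the control database coexist in a single database.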
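9. A data transfer method, comprising the following steps:

(D1) initializing the master control module in at least one host, and reading transfer control information and parameter configuration information;

(D2) initializing the source data acquisition service module and the target table update service module in the at least one host, the target table update service module creating a data storage area;

(D3) the master control module invoking the source data acquisition service module according to the transfer control information and parameter configuration information that were read, and passing the transfer task parameters to the source data acquisition service module;

(D4) the source data acquisition service module connecting to at least one source database to acquire the environment variables of the at least one source database, and reading the corresponding database records in the source database according to the transfer task parameters;

(D5) the source data acquisition service module deforming the database records that were read to obtain target records, and writing the target records into the data storage area;

(D6) repeating steps (D4)-(D5) until the data storage area is full;

(D7) once the data storage area is full, the source data acquisition service module invoking the target table update service module and passing the transfer task parameters to the target table update service module;

(D8) the target table update service module connecting to at least one target database and to the data storage area, acquiring the environment variables of the at least one target database, and parsing the received transfer task parameters;

(D9) the target table update service module performing batch insertion and update on the at least one target database according to the parsing result until the transfer task is complete;

(D10) after the transfer task is complete, the target table update service module releasing the data storage area;

characterized in that the source data acquisition service module is scheduled by the master control module to acquire the source data, generate the target data, and invoke the target table update service module; the source data acquisition service module is divided into a main body and an attachment; the main body is configured to read the parameters of a transfer task, load the attachment, and invoke the sub-process corresponding to the transfer task; and the attachment is a dynamic function library that encapsulates the source data acquisition and data deformation sub-processes.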
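10. The data transfer method according to claim 9, characterized in that the target table update service module is scheduled by the source data acquisition service module to insert or update the target data into a target table, and the target table update service module is divided into a main body and an attachment; the main body is configured to read the parameters of a transfer task, load the attachment, and invoke the sub-process corresponding to the transfer task; and the attachment is a dynamic function library that encapsulates the sub-processes for inserting and updating the target data.

11. The data transfer method according to any one of claims 9-10, characterized in that step (D9) of the data transfer method further comprises the following steps: (E1) dividing the data to be inserted into blocks, each of which can hold multiple records; (E2) each of a plurality of database operation processes reading one block of data at a time in an exclusive manner, and then inserting or updating the data into the at least one target database by batch insertion or batch update; (E3) if step (E2) succeeds, the process may continue with other data blocks; if step (E2) fails, the data is inserted or updated into the target database one record at a time, an error log is written for each record whose operation failed, and processing continues with other data blocks.

12. The data transfer method according to any one of claims 9-11, characterized in that the master control module is a minute-level transfer master control module or a second-level transfer master control module, wherein the minute-level transfer master control module performs near-real-time data transfer tasks and the second-level transfer master control module performs real-time data transfer tasks.

13. The data transfer method according to any one of claims 9-12, characterized in that the master control module is deployed on every host, and only one of the at least one host has the task scheduling function.

14. The data transfer method according to any one of claims 9-13, characterized in that the source data acquisition service module comprises a deformation processing sub-module, which deforms the source data by calling a dynamic deformation function and thereby generates the target data.

15. The data transfer method according to any one of claims 9-14, characterized in that the at least one source database and the at least one target database are heterogeneous.

PCT/CN2010/001280 2009-09-02 2010-08-24 Data transfer system and method WO2011026311A1 (zh)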

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP10813233.3A EP2474918A4 (en) 2009-09-02 2010-08-24 SYSTEM AND METHOD FOR DATA TRANSFER
US13/393,205 US8924342B2 (en) 2009-09-02 2010-08-24 System and method for data transfer

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200910195025.0 2009-09-02
CN200910195025.0A CN102004745B (zh) 2009-09-02 2009-09-02 Data transfer system and method

Publications (1)

Publication Number Publication Date
WO2011026311A1 true WO2011026311A1 (zh) 2011-03-10

Family

ID=43648847

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/001280 WO2011026311A1 (zh) 2009-09-02 2010-08-24 Data transfer system and method

Country Status (4)

Country Link
US (1) US8924342B2 (zh)
EP (1) EP2474918A4 (zh)
CN (1) CN102004745B (zh)
WO (1) WO2011026311A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104348618A (zh) * 2013-07-30 2015-02-11 中国银联股份有限公司 Security information interaction method associated with resource transfer
CN107038192A (zh) * 2016-11-17 2017-08-11 阿里巴巴集团控股有限公司 Database disaster recovery method and device

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102999537B (zh) * 2011-09-19 2017-01-18 阿里巴巴集团控股有限公司 Data migration system and method
US8782101B1 (en) * 2012-01-20 2014-07-15 Google Inc. Transferring data across different database platforms
US20130297627A1 (en) * 2012-05-07 2013-11-07 Sandeep J. Shah Business intelligence engine
US9305067B2 (en) * 2013-07-19 2016-04-05 International Business Machines Corporation Creation of change-based data integration jobs
CN105574051B (zh) * 2014-11-06 2019-11-22 阿里巴巴集团控股有限公司 Method and processing system for updating the rules satisfied by a user
CN106682017B (zh) * 2015-11-09 2020-07-31 阿里巴巴(中国)有限公司 Database update method and device
CN108139987B (zh) * 2016-08-30 2020-06-02 华为技术有限公司 Method, device, and system for calculating data transfer progress
US10915542B1 (en) * 2017-12-19 2021-02-09 Palantir Technologies Inc. Contextual modification of data sharing constraints in a distributed database system that uses a multi-master replication scheme
CN109189761A (zh) * 2018-08-31 2019-01-11 中国农业银行股份有限公司 Data migration method and device
US11100087B2 (en) * 2019-04-26 2021-08-24 Microsoft Technology Licensing, Llc Data tokenization system maintaining data integrity
CN110175115B (zh) * 2019-04-30 2022-12-27 中国航空无线电电子研究所 Variable-based dynamic data operation and management system
CN111190622A (zh) * 2019-12-25 2020-05-22 哈尔滨安天科技集团股份有限公司 Low-bandwidth online upgrade method, device, electronic equipment, and storage medium
US20210248162A1 (en) * 2020-02-12 2021-08-12 Roblox Corporation Parallel data transfer from one database to another database
CN111611244B (zh) * 2020-05-20 2023-07-28 浩云科技股份有限公司 Method and device for cascading database data
CN111767332B (zh) * 2020-06-12 2021-07-30 上海森亿医疗科技有限公司 Data integration method, system, and terminal for heterogeneous data sources
WO2022006652A1 (en) * 2020-07-07 2022-01-13 Chand Rachelle Data transfer between databases in real time, via qrcode or barcode
CN113835685B (zh) * 2021-11-26 2022-02-18 之江实验室 Network operating system design method based on a mimic database
CN114780654B (zh) * 2022-05-27 2022-11-15 河北省科学技术情报研究院(河北省科技创新战略研究院) Processing method for modular construction of multi-source primary and secondary entity structures

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070174308A1 (en) * 2006-01-10 2007-07-26 Sas Institute Inc. Data warehousing systems and methods having reusable user transforms
CN101504664A (zh) * 2009-03-18 2009-08-12 中国工商银行股份有限公司 Device and method for extracting, transforming, and loading full source data

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE59501924D1 (de) * 1994-05-10 1998-05-20 Siemens Ag Datenverwaltungssystem eines realzeitsystems
US5706434A (en) * 1995-07-06 1998-01-06 Electric Classifieds, Inc. Integrated request-response system and method generating responses to request objects formatted according to various communication protocols
US6016501A (en) * 1998-03-18 2000-01-18 Bmc Software Enterprise data movement system and method which performs data load and changed data propagation operations
US6499036B1 (en) 1998-08-12 2002-12-24 Bank Of America Corporation Method and apparatus for data item movement between disparate sources and hierarchical, object-oriented representation
US7774299B2 (en) * 2005-05-09 2010-08-10 Microsoft Corporation Flow computing
CN100371900C (zh) * 2006-01-19 2008-02-27 华为技术有限公司 Data synchronization method and system
US9176975B2 (en) * 2006-05-31 2015-11-03 International Business Machines Corporation Method and system for transformation of logical data objects for storage
CN101364186B (zh) * 2008-09-27 2012-01-25 腾讯科技(深圳)有限公司 Data migration method, data migration server, and data interface server

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070174308A1 (en) * 2006-01-10 2007-07-26 Sas Institute Inc. Data warehousing systems and methods having reusable user transforms
CN101504664A (zh) * 2009-03-18 2009-08-12 中国工商银行股份有限公司 对全量源数据进行抽取转换加载的装置及方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104348618A (zh) * 2013-07-30 2015-02-11 中国银联股份有限公司 与资源的转移相关联的安全性信息交互方法
CN107038192A (zh) * 2016-11-17 2017-08-11 阿里巴巴集团控股有限公司 数据库容灾方法和装置

Also Published As

Publication number Publication date
CN102004745A (zh) 2011-04-06
US8924342B2 (en) 2014-12-30
EP2474918A1 (en) 2012-07-11
US20120221536A1 (en) 2012-08-30
CN102004745B (zh) 2013-06-12
EP2474918A4 (en) 2014-04-09


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10813233

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2010813233

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 13393205

Country of ref document: US