WO2017114199A1 - Data synchronization method and device - Google Patents

Data synchronization method and device

Info

Publication number
WO2017114199A1
WO2017114199A1 PCT/CN2016/110658 CN2016110658W WO2017114199A1 WO 2017114199 A1 WO2017114199 A1 WO 2017114199A1 CN 2016110658 W CN2016110658 W CN 2016110658W WO 2017114199 A1 WO2017114199 A1 WO 2017114199A1
Authority
WO
WIPO (PCT)
Prior art keywords
synchronization
data
threads
thread
synchronized
Prior art date
Application number
PCT/CN2016/110658
Other languages
English (en)
French (fr)
Inventor
贾元乔
苏艳
Original Assignee
阿里巴巴集团控股有限公司
贾元乔
苏艳
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 贾元乔, 苏艳
Publication of WO2017114199A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30076 Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30087 Synchronisation or serialisation instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • the present application relates to the field of Internet technologies, and in particular, to a data synchronization method and a data synchronization device.
  • a data synchronization task is a series of periodically scheduled tasks created for data synchronization between different databases.
  • data of HDFS (Hadoop Distributed File System)
  • MySQL (a relational database management system)
  • data is synchronized from HDFS to HBase (Hadoop Database, a distributed storage system).
  • When using a data synchronization task to synchronize data from the source database to the target database, the user usually needs to set the number of synchronization threads in advance. The synchronization controller invokes the corresponding threads according to this preset number and distributes them to one or more synchronization processing devices, which perform the data processing.
  • a data synchronization method including:
  • the thread configuration is performed according to the number of synchronization threads that are expected to run, and the configured thread is used to synchronize the data to be synchronized to the target database.
  • performing thread configuration according to the number of synchronization threads that are expected to run includes:
  • the corresponding thread is supplementally configured according to the difference between the number of synchronization threads that are expected to run and the number of synchronization threads required for the operation.
  • Before generating the number of synchronization threads expected to run according to the data volume of the data to be synchronized in the source database and the priority of the synchronization task, the method further includes:
  • Generating the number of synchronization threads expected to run according to the data volume of the data to be synchronized in the source database and the priority of the synchronization task includes:
  • the number of synchronization threads expected to run is calculated using the average synchronization rate, the maximum synchronization rate of the synchronization processing device, and the number of threads that the synchronization processing device has available for synchronization processing.
  • the method further includes:
  • the number of synchronization threads required for the operation is obtained from the target database to be synchronized with the source database.
  • Before using the configured thread to synchronize the data to be synchronized to the target database, the method further includes:
  • Using the configured thread to synchronize the data to be synchronized to the target database is specifically:
  • a synchronization thread for synchronizing the respective data blocks to be synchronized is scheduled to a synchronization processing device to be processed by the synchronization processing device.
  • Using the configured thread to synchronize the data to be synchronized to the target database includes:
  • At least one synchronization thread whose thread attribute satisfies a preset condition is preferentially transmitted to the synchronization processing device.
  • the preset condition includes at least one of the following:
  • the at least one synchronization thread belongs to the same data synchronization task, its to-be-processed time is greater than the preset time threshold, the priority of the corresponding synchronization task is greater than the preset priority threshold, and it can be processed synchronously by a single synchronization processing device.
  • Using the configured thread to synchronize the data to be synchronized to the target database includes:
  • the synchronization thread is preferentially sent to a synchronization processing device whose maximum number of threads that can be processed synchronously is greater than a preset thread number threshold.
  • a data synchronization device including:
  • a synchronization thread number generation module configured to generate, according to the data volume of the data to be synchronized in the source database and the priority of the synchronization task, the number of synchronization threads that are expected to run;
  • the synchronization processing module is configured to perform thread configuration according to the number of synchronization threads that are expected to run, and synchronize the data to be synchronized to the target database by using the configured thread.
  • the synchronization processing module includes:
  • a synchronization thread number judging submodule configured to determine whether the number of synchronization threads expected to run is greater than the number of synchronization threads required for the operation, and if so, to invoke the thread supplementing submodule;
  • the thread supplementing submodule is configured to supplement the configuration of the corresponding thread according to the difference between the number of synchronization threads that are expected to run and the number of synchronization threads required for the operation.
  • the device further includes:
  • the priority and data volume obtaining module is configured to obtain the priority of the synchronization task from the data synchronization task submitted by the user, and obtain the data volume of the data to be synchronized from the source database that stores the data to be synchronized.
  • the synchronization thread number generation module includes:
  • An average synchronization rate finding sub-module configured to find an average synchronization rate for the data amount of the data to be synchronized and the priority of the synchronization task
  • a synchronization thread number calculation submodule configured to calculate the number of synchronization threads that are expected to run by using the average synchronization rate, a maximum synchronization rate of the synchronization processing device, and a number of threads that the synchronization processing device can process synchronously.
  • the device further includes:
  • the running synchronization thread number obtaining module is configured to obtain the number of synchronization threads required for the operation from the target database to be synchronized with the source database.
  • the device further includes:
  • the to-be-synchronized data splitting module is configured to split the data to be synchronized into a plurality of data blocks to be synchronized according to the number of synchronization threads required for the operation;
  • the synchronization processing module is specifically configured to:
  • a synchronization thread for synchronizing the respective data blocks to be synchronized is scheduled to a synchronization processing device to be processed by the synchronization processing device.
  • the synchronization processing module is specifically configured to:
  • At least one synchronization thread whose thread attribute satisfies a preset condition is preferentially transmitted to the synchronization processing device.
  • the preset condition includes at least one of the following:
  • the at least one synchronization thread belongs to the same data synchronization task, its to-be-processed time is greater than the preset time threshold, the priority of the corresponding synchronization task is greater than the preset priority threshold, and it can be processed synchronously by a single synchronization processing device.
  • There are multiple synchronization processing devices, and the synchronization processing module is specifically configured to:
  • the synchronization thread is preferentially sent to a synchronization processing device whose maximum number of threads that can be processed synchronously is greater than a preset thread number threshold.
  • The number of synchronization threads expected to run is generated according to the data volume of the data to be synchronized in the source database and the priority of the synchronization task, and a corresponding number of threads is configured for data synchronization according to that number. Threads are thus configured dynamically according to the actual situation of the synchronization task, which avoids the case where some threads have completed the synchronization task while other threads of the same task are still waiting, and improves the efficiency and stability of data synchronization.
  • the number of synchronization threads that are expected to run is dynamically adjusted according to the priority of the synchronization task, so that the synchronization tasks with higher importance can be processed preferentially.
  • FIG. 1 is a flow chart of the steps of a first embodiment of a data synchronization method according to the present application;
  • FIG. 2 is a flow chart of the steps of a second embodiment of a data synchronization method according to the present application;
  • FIG. 3 is a structural block diagram of a first embodiment of a data synchronization apparatus according to the present application;
  • FIG. 4 is a structural block diagram of a second embodiment of a data synchronization apparatus according to the present application.
  • Referring to FIG. 1, a flow chart of a first embodiment of a data synchronization method of the present application is shown, which may specifically include the following steps:
  • Step 101: Generate, according to the data volume of the data to be synchronized in the source database and the priority of the synchronization task, the number of synchronization threads that are expected to run.
  • The number of synchronization threads that are expected to run may be a number of first-round or multi-round synchronization threads preset by the user. Because the user cannot know the state of the data to be synchronized by the synchronization task or the operating state of the synchronization processing device, or does not know how to set the value, the number of synchronization threads set may not match the number of synchronization threads actually required when running the synchronization, thereby causing the synchronization thread to be distributed to the synchronization.
  • the data volume of the data to be synchronized in the source database and the priority of the synchronization task may be utilized to generate the number of synchronization threads that are expected to run.
  • a user can submit a data synchronization task, which usually includes information about the priority of the synchronization task, so that the priority of the synchronization task can be obtained therefrom.
  • The data volume of the data to be synchronized can be obtained from the source database that stores it.
  • A comparison table of data volume, priority, and average synchronization rate may be preset; each combination of data volume and priority in the table corresponds to a specific average synchronization rate, and the corresponding average synchronization rate is found according to the data volume and the priority.
  • From the average synchronization rate found and the synchronization rate that each CPU (Central Processing Unit) on the synchronization processing device can sustain, the number of CPUs required for synchronization can be determined, and this number of CPUs can be used as the number of synchronization threads expected to run.
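  • The table lookup and the division by the per-CPU rate can be sketched in Python as follows. The table rows, the names `RATE_TABLE`, `find_average_rate` and `expected_thread_count`, the use of minimum-priority bands, and the rounding up to a whole CPU are illustrative assumptions; the application does not specify them.

```python
import math

# Hypothetical comparison table. Each row is
# (upper bound of data volume in GB, minimum priority, average rate).
# The rows are illustrative assumptions, not values from the application.
RATE_TABLE = [
    (10, 8, 30.0),
    (10, 0, 20.0),
    (100, 8, 60.0),
    (100, 0, 40.0),
]


def find_average_rate(volume_gb, priority):
    """Return the average rate of the first row matching volume and priority."""
    for upper_gb, min_priority, rate in RATE_TABLE:
        if volume_gb <= upper_gb and priority >= min_priority:
            return rate
    raise LookupError("no table row covers this volume/priority combination")


def expected_thread_count(average_rate, per_cpu_rate):
    """Number of CPUs needed to reach average_rate, used as the thread count.

    Rounding up is an assumption: a fractional CPU cannot be allocated.
    """
    if per_cpu_rate <= 0:
        raise ValueError("per_cpu_rate must be positive")
    return math.ceil(average_rate / per_cpu_rate)
```

  • For instance, `expected_thread_count(find_average_rate(5, 8), 10.0)` looks up the row for a small, high-priority task and divides its rate by a per-CPU rate of 10.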
  • Step 102: Perform thread configuration according to the number of synchronization threads that are expected to run, and synchronize the data to be synchronized to the target database by using the configured thread.
  • Synchronization threads can be configured according to the number of synchronization threads expected to run, and there are several ways to do so. For example, corresponding virtual threads can be configured according to the difference between the number of synchronization threads expected to run and the number of synchronization threads actually required for running; the configured virtual threads put some CPUs on the synchronization processing device into a standby state and suspend their processing of other synchronization tasks, to ensure that the synchronization processing device has sufficient resources to process the data synchronization task. When the number of synchronization threads expected to run is the same as the number actually required for running, the corresponding entity threads can be configured directly according to the number of synchronization threads expected to run.
  • the data to be synchronized can be processed synchronously by the configured entity thread and/or virtual thread.
  • the data to be synchronized may be first split into a plurality of data blocks to be synchronized, and each configured thread is used to synchronize a plurality of data blocks to be synchronized.
  • the plurality of threads are scheduled to one or more synchronization processing devices according to the remaining processing resources of the synchronization processing device, and the plurality of to-be-synchronized data blocks are synchronized by the synchronization processing device to the target database.
  • When the synchronization controller schedules the synchronization threads to the synchronization processing devices, it can determine whether the number of synchronization threads actually required for running is greater than the number of synchronization threads expected to run, and schedule in different manners according to the determination result.
  • The synchronization controller may adopt a multi-machine, multi-thread data synchronization mode and schedule the first round of entity threads configured for the synchronization task, preferentially sending the entity threads that have the longest waiting time, have the highest priority, and belong to the same synchronization task to multiple synchronization processing devices. This ensures that every thread of the synchronization task is processed with priority, and avoids the case where one thread has completed synchronization while other threads of the same synchronization task are still waiting to be processed, which would lower the average synchronization rate of the synchronization task.
  • The synchronization controller may adopt a single-machine, multi-thread data synchronization mode and schedule the entity threads and virtual threads configured for the synchronization task, preferentially sending the threads that have the longest waiting time, have the highest priority, and belong to the same synchronization task to a synchronization processing device that can support the number of synchronization threads expected to run. If no synchronization processing device satisfies this condition, the synchronization processing device with the most remaining processing resources can be preferentially selected as the transmission target.
  • Load-balanced scheduling may be performed according to the processing resources of the synchronization processing devices, or scheduling may be random; this embodiment of the present application does not limit this.
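  • One possible reading of the single-machine selection rule is sketched below in Python. The `Device` class, its `free_threads` field as the measure of remaining processing resources, and the choice of the first capable device in list order are hypothetical; the application only states the preference and the fallback.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    free_threads: int  # hypothetical measure of remaining processing resources


def pick_device(devices, expected_threads):
    """Prefer a device that can host all expected threads; otherwise fall
    back to the device with the most remaining resources."""
    if not devices:
        raise ValueError("no synchronization processing device available")
    for device in devices:
        if device.free_threads >= expected_threads:
            return device  # first capable device (an assumption)
    # No device can host every thread: take the least loaded one.
    return max(devices, key=lambda d: d.free_threads)
```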
  • The number of synchronization threads expected to run is generated according to the data volume of the data to be synchronized in the source database and the priority of the synchronization task, and a corresponding number of threads is configured for data synchronization according to that number. Threads are thus configured dynamically according to the actual situation of the synchronization task, which avoids the case where some threads have completed the synchronization task while other threads of the same task are still waiting, and improves the efficiency and stability of data synchronization.
  • the number of synchronization threads that are expected to run is dynamically adjusted according to the priority of the synchronization task, so that the synchronization tasks with higher importance can be processed preferentially.
  • the method may include the following steps:
  • Step 201: Obtain the priority of the synchronization task from the data synchronization task submitted by the user, and obtain the data volume of the data to be synchronized from the source database that holds the data to be synchronized.
  • When data synchronization is required, the user can create a data synchronization task on the client and submit it to the corresponding synchronization processing server.
  • When creating a data synchronization task, the client can generate a corresponding priority according to the importance of the data to be synchronized and include it in the data synchronization task, so the synchronization processing server can obtain the priority of the synchronization task from the submitted data synchronization task.
  • The data volume of the data to be synchronized can also be read from the metadata of the source database in which the data to be synchronized is stored. For example, when HDFS data is synchronized to HBase, the data volume of the data to be synchronized is obtained by reading the HDFS information; other types of databases can obtain the data volume in a similar manner.
  • Step 202: Find an average synchronization rate for the data volume of the data to be synchronized and the priority of the synchronization task.
  • Step 203: Calculate the number of synchronization threads that are expected to run by using the average synchronization rate, the maximum synchronization rate of the synchronization processing device, and the number of threads that the synchronization processing device can use for synchronization processing.
  • The average synchronization rate of the synchronization task may be initialized in advance according to the data volume and the priority, for example:
  • A synchronization task with a data volume of 0-10G and a priority of 8 is found to have a corresponding average synchronization rate of 30 mb/s.
  • Each synchronization processing device has 6 CPU cores and a maximum synchronization rate of 60 mb/s, so the maximum synchronization rate that each CPU can sustain is calculated as 10 mb/s.
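  • Read together with step 101, these figures suggest the following worked calculation. The final division and the symbols r_cpu, r_avg and n_expected are illustrative assumptions, since the text gives the per-CPU rate but does not state the resulting thread count.

```latex
\[
r_{\text{cpu}} = \frac{60\ \text{mb/s}}{6} = 10\ \text{mb/s},
\qquad
n_{\text{expected}} = \frac{r_{\text{avg}}}{r_{\text{cpu}}}
                    = \frac{30\ \text{mb/s}}{10\ \text{mb/s}} = 3
\]
```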
  • Step 204: Obtain the number of synchronization threads required for the operation from a target database to be synchronized with the source database.
  • The number of synchronization threads required for the operation can be read from the metadata of the target database.
  • For example, the number of Regions of the HBase target table can be read and used as the number of synchronization threads required for the operation.
  • A Region is the basic unit of HBase data storage and management. Usually, the number of Regions determines the number of threads that the target database uses when synchronizing.
  • Step 205: Determine whether the number of synchronization threads that are expected to run is greater than the number of synchronization threads required for the operation; if yes, proceed to step 206.
  • Step 206: Supplement the configuration of the corresponding threads according to the difference between the number of synchronization threads expected to run and the number of synchronization threads required for the operation.
  • The number of synchronization threads required for the operation may be subtracted from the number of synchronization threads expected to run, and the corresponding number of virtual threads may be supplementally configured.
  • The virtual threads put some of the CPUs on the synchronization processing device into standby and suspend their processing of other synchronization tasks, to ensure that the synchronization processing device has sufficient resources to handle the data synchronization task.
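  • Steps 205 and 206 amount to a comparison followed by a subtraction, which can be sketched in Python as below. The function name `configure_threads`, the returned pair, and the behaviour when the expected number is below the required number are assumptions; the application only describes the case where the expected number is greater than or equal to the required number.

```python
def configure_threads(expected, required):
    """Return (entity_threads, virtual_threads) for one synchronization task.

    expected: number of synchronization threads expected to run.
    required: number of synchronization threads required for the operation.
    """
    if expected < 0 or required < 0:
        raise ValueError("thread counts must be non-negative")
    if expected > required:
        # Step 206: supplement the difference with virtual threads.
        return required, expected - required
    # Equal counts: entity threads only. The expected < required case is
    # not covered by the text; using `expected` here is an assumption.
    return expected, 0
```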
  • Step 207: Split the data to be synchronized into a plurality of data blocks to be synchronized according to the number of synchronization threads required for the operation.
  • Step 208: Synchronize the data to be synchronized to the target database by using the configured thread.
  • Step 208 may specifically be:
  • a synchronization thread for synchronizing the respective data blocks to be synchronized is scheduled to a synchronization processing device to be processed by the synchronization processing device.
  • the data to be synchronized can be split into a plurality of data blocks to be synchronized, and each configured thread is used to synchronize a plurality of data blocks to be synchronized.
  • the plurality of threads are scheduled to one or more synchronization processing devices according to the remaining processing resources of the synchronization processing device, and the plurality of to-be-synchronized data blocks are synchronized by the synchronization processing device to the target database.
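  • A minimal sketch of the splitting in step 207, in Python, assuming the data can be addressed as a contiguous row range and that blocks of near-equal size are acceptable. The function name `split_into_blocks` and the row-range representation are hypothetical; the application does not say how block boundaries are chosen.

```python
def split_into_blocks(total_rows, block_count):
    """Split the range [0, total_rows) into block_count contiguous
    half-open ranges whose sizes differ by at most one row."""
    if block_count <= 0:
        raise ValueError("block_count must be positive")
    base, extra = divmod(total_rows, block_count)
    blocks = []
    start = 0
    for index in range(block_count):
        # The first `extra` blocks each take one additional row.
        size = base + (1 if index < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks
```

  • Each resulting range would then be handed to one synchronization thread.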
  • Multiple threads corresponding to the data synchronization task may be submitted to the synchronization controller. The synchronization controller determines whether the number of synchronization threads expected to run is greater than the number of synchronization threads required for the operation and, according to the determination result, supplements the thread configuration and schedules the threads to multiple synchronization processing devices.
  • Step 208 may include:
  • Preferentially sending at least one synchronization thread whose thread attribute satisfies a preset condition to the synchronization processing device, where the preset condition may include at least one of the following:
  • the at least one synchronization thread belongs to the same data synchronization task, its to-be-processed time is greater than the preset time threshold, the priority of the corresponding synchronization task is greater than the preset priority threshold, and it can be processed synchronously by a single synchronization processing device.
  • The synchronization controller may adopt a single-machine, multi-thread data synchronization mode and schedule the entity threads and virtual threads configured for the synchronization task: threads whose to-be-processed time is greater than the preset time threshold, whose priority is greater than the preset priority threshold, and which belong to the same data synchronization task are preferentially sent to a synchronization processing device that can process them synchronously on its own. If no synchronization processing device satisfies this condition, the synchronization processing device with the most remaining processing resources can be preferentially selected as the transmission target.
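  • The thread-side part of this preference can be sketched in Python as a filter over pending threads. The `SyncThread` class, its field names, and the ordering by waiting time are hypothetical, and the sketch applies the task, time, and priority conditions together, although the text allows any subset of the preset conditions; the single-device condition is left out because it depends on the device rather than on the thread.

```python
from dataclasses import dataclass


@dataclass
class SyncThread:
    task_id: str
    waiting_seconds: float  # to-be-processed time
    priority: int           # priority of the owning synchronization task


def preferred_threads(threads, task_id, time_threshold, priority_threshold):
    """Return the threads of one task that exceed both thresholds,
    longest-waiting first."""
    chosen = [
        t for t in threads
        if t.task_id == task_id
        and t.waiting_seconds > time_threshold
        and t.priority > priority_threshold
    ]
    return sorted(chosen, key=lambda t: t.waiting_seconds, reverse=True)
```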
  • There are multiple synchronization processing devices, and step 208 may include:
  • the synchronization thread is preferentially sent to a synchronization processing device whose maximum number of threads that can be processed synchronously is greater than a preset thread number threshold.
  • That is, the synchronization thread can be preferentially sent to a synchronization processing device whose maximum number of threads available for synchronization processing is greater than the preset thread number threshold.
  • Load-balanced scheduling may be performed according to the processing resources of the synchronization processing devices, or scheduling may be random; this embodiment of the present application does not limit this.
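  • A sketch of the multi-device case in Python, assuming each device advertises its maximum synchronous thread count and its currently free capacity. The `Device` fields, the function `choose_device`, the tie-breaking by largest maximum, and the optional `rng` argument that switches the fallback from load balancing to random choice are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    max_sync_threads: int  # maximum threads the device can process synchronously
    free_threads: int      # hypothetical measure of remaining resources


def choose_device(devices, thread_threshold, rng=None):
    """Prefer devices above the thread number threshold; otherwise balance
    by free capacity, or pick randomly when an rng is supplied."""
    if not devices:
        raise ValueError("no synchronization processing device available")
    preferred = [d for d in devices if d.max_sync_threads > thread_threshold]
    if preferred:
        # Among qualifying devices, take the largest one (an assumption).
        return max(preferred, key=lambda d: d.max_sync_threads)
    if rng is not None:
        return rng.choice(devices)  # random scheduling
    return max(devices, key=lambda d: d.free_threads)  # load balancing
```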
  • The corresponding virtual threads are supplementally configured according to the difference between the number of synchronization threads expected to run and the number of synchronization threads required for the operation, and the virtual threads are used to put some CPUs on the synchronization processing device into a standby state and suspend their processing of other synchronization tasks. This ensures that the synchronization processing device has sufficient resources to process the data synchronization task, thereby improving the efficiency and stability of data synchronization.
  • Referring to FIG. 3, a structural block diagram of a first embodiment of a data synchronization apparatus of the present application is shown, which may specifically include the following modules:
  • the synchronization thread number generation module 301 is configured to generate the number of synchronization threads expected to run according to the data volume of the data to be synchronized in the source database and the priority of the synchronization task.
  • the synchronization processing module 302 is configured to perform thread configuration according to the number of synchronization threads that are expected to run, and synchronize the data to be synchronized to the target database by using the configured thread.
  • The number of synchronization threads expected to run is generated according to the data volume of the data to be synchronized in the source database and the priority of the synchronization task, and a corresponding number of threads is configured for data synchronization according to that number. Threads are thus configured dynamically according to the actual situation of the synchronization task, which avoids the case where some threads have completed the synchronization task while other threads of the same task are still waiting, and improves the efficiency and stability of data synchronization.
  • the number of synchronization threads that are expected to run is dynamically adjusted according to the priority of the synchronization task, so that the synchronization tasks with higher importance can be processed preferentially.
  • Referring to FIG. 4, a structural block diagram of a second embodiment of a data synchronization apparatus of the present application is shown, which may specifically include the following modules:
  • the priority and data volume obtaining module 401 is configured to obtain the priority of the synchronization task from the data synchronization task submitted by the user, and obtain the data volume of the data to be synchronized from the source database that stores the data to be synchronized.
  • the running synchronization thread number obtaining module 402 is configured to obtain the number of synchronization threads required for the operation from the target database to be synchronized with the source database.
  • the synchronization thread number generation module 403 is configured to generate the number of synchronization threads expected to run according to the data volume of the data to be synchronized in the source database and the priority of the synchronization task.
  • the synchronization processing module 404 is configured to perform thread configuration according to the number of synchronization threads that are expected to run, and synchronize the data to be synchronized to the target database by using the configured thread.
  • the synchronization thread number generation module 403 may include:
  • An average synchronization rate finding sub-module configured to find an average synchronization rate for the data amount of the data to be synchronized and the priority of the synchronization task
  • a synchronization thread number calculation submodule configured to calculate the number of synchronization threads that are expected to run by using the average synchronization rate, a maximum synchronization rate of the synchronization processing device, and a number of threads that the synchronization processing device can process synchronously.
  • the synchronization processing module 404 may include:
  • the synchronization thread number judging submodule is configured to determine whether the number of synchronization threads expected to run is greater than the number of synchronization threads required for the operation, and if so, to invoke the thread supplementing submodule.
  • the thread supplementing submodule is configured to supplement the configuration of the corresponding thread according to the difference between the number of synchronization threads that are expected to run and the number of synchronization threads required for the operation.
  • the apparatus may further include:
  • the to-be-synchronized data splitting module is configured to split the data to be synchronized into a plurality of data blocks to be synchronized according to the number of synchronization threads required for the operation.
  • the synchronization processing module 404 may be specifically configured to:
  • a synchronization thread for synchronizing the respective data blocks to be synchronized is scheduled to a synchronization processing device to be processed by the synchronization processing device.
  • The synchronization processing module 404 may be specifically configured to:
  • Preferentially send at least one synchronization thread whose thread attributes satisfy a preset condition to the synchronization processing device.
  • The preset condition includes at least one of the following:
  • The at least one synchronization thread belongs to the same data synchronization task; its pending time is greater than a preset time threshold; the priority of its synchronization task is greater than a preset priority threshold; it can be processed synchronously by a single synchronization processing device.
  • There are multiple synchronization processing devices, and the synchronization processing module 404 may be specifically configured to:
  • Preferentially send the synchronization threads to a synchronization processing device whose maximum number of threads available for synchronization is greater than a preset thread number threshold.
  • According to the embodiments of the present application, the corresponding virtual threads are supplementally configured according to the difference between the expected number of synchronization threads and the number of synchronization threads required for running, and the virtual threads are used to put some CPUs on the synchronization processing device into a standby state and suspend processing of other synchronization tasks. This ensures that the synchronization processing device has sufficient resources to process the data synchronization task, thereby improving the efficiency and stability of data synchronization.
  • Because the apparatus embodiment is basically similar to the method embodiment, its description is relatively brief; for related details, refer to the description of the method embodiment.
  • Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
  • In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM).
  • Memory is an example of a computer-readable medium.
  • Computer-readable media include both permanent and non-permanent, removable and non-removable media.
  • Information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device.
  • As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
  • The embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

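A data synchronization method: an expected number of synchronization threads is generated according to the data amount of the data to be synchronized in a source database and the priority of the synchronization task (101); threads are configured according to the expected number of synchronization threads, and the configured threads are used to synchronize the data to be synchronized to a target database (102). The method can configure threads dynamically according to the actual situation of the synchronization task, avoiding the case where some threads have completed their synchronization work while other threads of the same task are still waiting, and thereby improves the efficiency and stability of data synchronization.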

Description

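Data synchronization method and apparatus
Technical Field
The present application relates to the field of Internet technologies, and in particular to a data synchronization method and a data synchronization apparatus.
Background
Data synchronization tasks are a series of periodically scheduled tasks created to synchronize data between different databases. In a large data scheduling system, a large number of data synchronization tasks often run at the same time, for example synchronizing data from HDFS (Hadoop Distributed File System) to MySQL (a relational database management system), or from HDFS to HBase (Hadoop Database, a distributed storage system).
When a data synchronization task is used to synchronize data from a source database to a target database, the user usually has to preset a number of synchronization threads. A synchronization controller invokes that many threads and dispatches them to one or more synchronization processing devices, which perform the data synchronization. However, when a synchronization task has a large amount of data to synchronize and its threads are dispatched to a busy synchronization processing device, that device cannot run the preset number of synchronization threads. Some threads may have completed their synchronization work while other threads of the same task are still waiting, so the average synchronization speed of the whole task is low. The current data synchronization approach therefore cannot make effective use of the processing capacity of the synchronization processing devices, and data synchronization efficiency is low.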
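Summary of the Invention
In view of the above problems, the embodiments of the present application are proposed to provide a data synchronization method and a corresponding data synchronization apparatus that overcome the above problems or at least partially solve them.
To solve the above problems, the present application discloses a data synchronization method, including:
generating an expected number of synchronization threads according to the data amount of the data to be synchronized in a source database and the priority of the synchronization task;
configuring threads according to the expected number of synchronization threads, and using the configured threads to synchronize the data to be synchronized to a target database.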
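Optionally, configuring threads according to the expected number of synchronization threads includes:
determining whether the expected number of synchronization threads is greater than the number of synchronization threads required for running;
if so, supplementally configuring the corresponding threads according to the difference between the expected number of synchronization threads and the number of synchronization threads required for running.
Optionally, before generating the expected number of synchronization threads according to the data amount of the data to be synchronized in the source database and the priority of the synchronization task, the method further includes:
obtaining the priority of the synchronization task from the data synchronization task submitted by a user, and obtaining the data amount of the data to be synchronized from the source database in which that data is stored.
Optionally, generating the expected number of synchronization threads according to the data amount of the data to be synchronized in the source database and the priority of the synchronization task includes:
looking up the average synchronization rate that corresponds to the data amount of the data to be synchronized and the priority of the synchronization task;
calculating the expected number of synchronization threads from the average synchronization rate, the maximum synchronization rate of a synchronization processing device, and the number of threads the synchronization processing device has available for synchronization.
Optionally, before determining whether the expected number of synchronization threads is greater than the number of synchronization threads required for running, the method further includes:
obtaining the number of synchronization threads required for running from the target database that is to be synchronized with the source database.
Optionally, before using the configured threads to synchronize the data to be synchronized to the target database, the method further includes:
splitting the data to be synchronized into a plurality of data blocks according to the number of synchronization threads required for running;
using the configured threads to synchronize the data to be synchronized to the target database is then:
scheduling the synchronization threads used to synchronize the respective data blocks to a synchronization processing device, so that the synchronization processing device processes those synchronization threads.
Optionally, using the configured threads to synchronize the data to be synchronized to the target database includes:
preferentially sending at least one synchronization thread whose thread attributes satisfy a preset condition to the synchronization processing device.
Optionally, the preset condition includes at least one of the following:
the at least one synchronization thread belongs to the same data synchronization task; its pending time is greater than a preset time threshold; the priority of its synchronization task is greater than a preset priority threshold; it can be processed synchronously by a single synchronization processing device.
Optionally, using the configured threads to synchronize the data to be synchronized to the target database includes:
preferentially sending the synchronization threads to a synchronization processing device whose maximum number of threads available for synchronization is greater than a preset thread number threshold.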
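To solve the above problems, the present application further discloses a data synchronization apparatus, including:
a synchronization thread number generation module, configured to generate an expected number of synchronization threads according to the data amount of the data to be synchronized in a source database and the priority of the synchronization task;
a synchronization processing module, configured to configure threads according to the expected number of synchronization threads and to use the configured threads to synchronize the data to be synchronized to a target database.
Optionally, the synchronization processing module includes:
a synchronization thread number judging submodule, configured to determine whether the expected number of synchronization threads is greater than the number of synchronization threads required for running and, if so, to invoke the thread supplementing submodule;
a thread supplementing submodule, configured to supplementally configure the corresponding threads according to the difference between the expected number of synchronization threads and the number of synchronization threads required for running.
Optionally, the apparatus further includes:
a priority and data amount obtaining module, configured to obtain the priority of the synchronization task from the data synchronization task submitted by a user, and to obtain the data amount of the data to be synchronized from the source database in which that data is stored.
Optionally, the synchronization thread number generation module includes:
an average synchronization rate lookup submodule, configured to look up the average synchronization rate that corresponds to the data amount of the data to be synchronized and the priority of the synchronization task;
a synchronization thread number calculation submodule, configured to calculate the expected number of synchronization threads from the average synchronization rate, the maximum synchronization rate of a synchronization processing device, and the number of threads the synchronization processing device has available for synchronization.
Optionally, the apparatus further includes:
a required synchronization thread number obtaining module, configured to obtain the number of synchronization threads required for running from the target database that is to be synchronized with the source database.
Optionally, the apparatus further includes:
a to-be-synchronized data splitting module, configured to split the data to be synchronized into a plurality of data blocks according to the number of synchronization threads required for running;
the synchronization processing module is specifically configured to:
schedule the synchronization threads used to synchronize the respective data blocks to a synchronization processing device, so that the synchronization processing device processes those synchronization threads.
Optionally, the synchronization processing module is specifically configured to:
preferentially send at least one synchronization thread whose thread attributes satisfy a preset condition to the synchronization processing device.
Optionally, the preset condition includes at least one of the following:
the at least one synchronization thread belongs to the same data synchronization task; its pending time is greater than a preset time threshold; the priority of its synchronization task is greater than a preset priority threshold; it can be processed synchronously by a single synchronization processing device.
Optionally, there are multiple synchronization processing devices, and the synchronization processing module is specifically configured to:
preferentially send the synchronization threads to a synchronization processing device whose maximum number of threads available for synchronization is greater than a preset thread number threshold.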
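The embodiments of the present application include the following advantages:
According to the embodiments of the present application, an expected number of synchronization threads is generated according to the data amount of the data to be synchronized in the source database and the priority of the synchronization task, and a corresponding number of threads is configured for data synchronization according to that expected number. Threads can therefore be configured dynamically according to the actual situation of the synchronization task, which avoids the case where some threads have completed their synchronization work while other threads of the same task are still waiting, and improves the efficiency and stability of data synchronization. Moreover, the expected number of synchronization threads is adjusted dynamically according to the priority of the synchronization task, so that more important synchronization tasks can be processed first.
Brief Description of the Drawings
FIG. 1 is a flowchart of the steps of Embodiment 1 of a data synchronization method of the present application;
FIG. 2 is a flowchart of the steps of Embodiment 2 of a data synchronization method of the present application;
FIG. 3 is a structural block diagram of Embodiment 1 of a data synchronization apparatus of the present application;
FIG. 4 is a structural block diagram of Embodiment 2 of a data synchronization apparatus of the present application.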
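Detailed Description
To make the above objects, features, and advantages of the present application clearer and easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to FIG. 1, a flowchart of the steps of Embodiment 1 of a data synchronization method of the present application is shown, which may specifically include the following steps:
Step 101: generate an expected number of synchronization threads according to the data amount of the data to be synchronized in the source database and the priority of the synchronization task.
It should be noted that the expected number of synchronization threads may be a first-round or multi-round synchronization thread count preset by the user. Because the user cannot know the state of the data to be synchronized in the current task or the running state of the synchronization processing devices, or does not know how to choose a setting, the configured thread count may not match the number of synchronization threads actually required when the synchronization runs, so that synchronization threads are dispatched to busy synchronization processing devices. Alternatively, the user sets the expected number of synchronization threads to a fixed value, so that the synchronization task is not allocated reasonable processing resources, which hurts synchronization efficiency. In addition, synchronization tasks for different data differ in importance, yet the synchronization controller treats all synchronization tasks equally, so important synchronization threads cannot be synchronized first because they do not get processing resources. Performing data synchronization in the current way may therefore lead to the above problems, ultimately resulting in low synchronization efficiency and unstable processing of synchronization tasks.
In a specific implementation of the embodiments of the present application, the expected number of synchronization threads can be generated from the data amount of the data to be synchronized in the source database and the priority of the synchronization task. In practice, a user can submit a data synchronization task, which usually contains information about the priority of the synchronization task, so the priority can be obtained from it. For the data synchronization task submitted by the user, the data amount of the data to be synchronized can be obtained from the source database from which that data comes. Of course, those skilled in the art may obtain the data amount and priority in other ways, for example by embedding the data amount and the priority in the data synchronization task so that both can be read directly from the submitted task.
There are many specific ways to generate the expected number of synchronization threads from the data amount and priority. For example, a lookup table of data amount, priority, and average synchronization rate can be preset, in which each combination of data amount and priority corresponds to a specific average synchronization rate. The corresponding average synchronization rate is looked up according to the data amount and priority. Dividing the looked-up average synchronization rate by the synchronization rate that each CPU (Central Processing Unit) on the synchronization processing device can sustain gives the number of CPUs needed for synchronization, and that number of CPUs can be used as the expected number of synchronization threads.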
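The lookup table described above might be sketched as follows. This is only an illustration: the band boundaries, the priorities, and every rate except the 30 MB/s entry for a 0-10 GB task with priority 8 (which the description uses as its worked example) are hypothetical, as are the names `BAND_UPPER_GB`, `RATE_MB_S`, and `lookup_average_rate`.

```python
from bisect import bisect_left

# Upper bound (in GB) of each data-amount band. Hypothetical values.
BAND_UPPER_GB = [10, 100, 1000]

# For each band, a mapping from task priority to average rate in MB/s.
# Only the (0-10 GB, priority 8) -> 30 entry comes from the description;
# the remaining numbers are placeholders.
RATE_MB_S = [
    {8: 30, 5: 20},
    {8: 60, 5: 40},
    {8: 90, 5: 60},
]


def lookup_average_rate(data_gb: float, priority: int) -> int:
    """Return the preset average synchronization rate for a task."""
    band = bisect_left(BAND_UPPER_GB, data_gb)
    # Amounts beyond the last band reuse the last band (an assumption).
    band = min(band, len(BAND_UPPER_GB) - 1)
    return RATE_MB_S[band][priority]
```

A priority that is missing from the table raises `KeyError` here; a real table would presumably cover every supported priority.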
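Step 102: configure threads according to the expected number of synchronization threads, and use the configured threads to synchronize the data to be synchronized to the target database.
Synchronization threads can be configured according to the expected number of synchronization threads. There are many ways to do this. For example, corresponding virtual threads can be configured according to the difference between the expected number of synchronization threads and the number of synchronization threads actually required for the synchronization run; the configured virtual threads are used to put some CPUs on the synchronization processing device into a standby state and suspend processing of other synchronization tasks, ensuring that the device has sufficient resources to process this data synchronization task. When the expected number of synchronization threads equals the number actually required, the corresponding physical threads can be configured directly according to the expected number.
The configured physical threads and/or virtual threads can be used to synchronize the data to be synchronized. In practice, the data to be synchronized can first be split into a plurality of data blocks, with each configured thread used to synchronize its share of those blocks. According to the remaining processing resources of the synchronization processing devices, the threads are scheduled to one or more synchronization processing devices, which synchronize the data blocks to the target database.
When the synchronization controller schedules synchronization threads to the synchronization processing devices, it can determine whether the number of synchronization threads actually required for the synchronization run is greater than the expected number of synchronization threads, and schedule differently depending on the result.
For example, when the number actually required is greater than the expected number, the synchronization controller can use a multi-machine, multi-thread data synchronization mode: it prepares the scheduling of the first round of physical threads configured for the synchronization task and preferentially sends the physical threads that have waited longest, have the highest priority, and belong to the same synchronization task to multiple synchronization processing devices. This ensures that all threads of the synchronization task are processed with priority and avoids the case where one thread has finished synchronizing while other threads of the same task are still waiting, which would lower the task's average synchronization rate.
When the number actually required is less than the expected number, the synchronization controller can use a single-machine, multi-thread data synchronization mode: it prepares the scheduling of the physical and virtual threads configured for the synchronization task and preferentially sends the threads that have waited longest, have the highest priority, and belong to the same synchronization task to a single synchronization processing device whose remaining processing resources can support the expected number of synchronization threads. If no synchronization processing device satisfies this condition, the device with the most remaining processing resources can be chosen as the destination.
Of course, those skilled in the art may, depending on the actual situation, schedule using other data synchronization modes such as single-machine single-thread or multi-machine single-thread. Scheduling may be load-balanced according to the processing resources of the synchronization processing devices or may be random; the embodiments of the present application place no restriction on this.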
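The mode decision and the single-device fallback described above could look roughly like the sketch below. `Device`, `free_threads`, `choose_mode`, and `pick_single_device` are hypothetical names; the description does not say how remaining processing resources are measured, nor what happens when the two thread counts are equal, so both points are assumptions here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    name: str
    free_threads: int  # assumed measure of remaining processing resources


def choose_mode(required: int, expected: int) -> str:
    """Pick a synchronization mode by comparing the two thread counts."""
    if required > expected:
        return "multi-machine multi-thread"
    # The description only covers required < expected; handling equality
    # the same way is an assumption of this sketch.
    return "single-machine multi-thread"


def pick_single_device(devices: List[Device], expected: int) -> Device:
    """Prefer a device that can host all expected threads, else the freest."""
    capable = [d for d in devices if d.free_threads >= expected]
    if capable:
        # Which capable device to take is not specified; use the first.
        return capable[0]
    return max(devices, key=lambda d: d.free_threads)
```

With devices offering 2, 5, and 3 free threads, an expected count of 4 selects the 5-thread device, and an expected count of 8 falls back to that same device because it has the most remaining capacity.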
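According to the embodiments of the present application, an expected number of synchronization threads is generated according to the data amount of the data to be synchronized in the source database and the priority of the synchronization task, and a corresponding number of threads is configured for data synchronization according to that expected number. Threads can therefore be configured dynamically according to the actual situation of the synchronization task, which avoids the case where some threads have completed their synchronization work while other threads of the same task are still waiting, and improves the efficiency and stability of data synchronization. Moreover, the expected number of synchronization threads is adjusted dynamically according to the priority of the synchronization task, so that more important synchronization tasks can be processed first.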
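Referring to FIG. 2, a flowchart of the steps of Embodiment 2 of a data synchronization method of the present application is shown, which may specifically include the following steps:
Step 201: obtain the priority of the synchronization task from the data synchronization task submitted by a user, and obtain the data amount of the data to be synchronized from the source database in which that data is stored.
In a specific implementation, when data synchronization is needed, the user can create a data synchronization task on a client and submit it to the corresponding synchronization processing server. When the task is created, the client can generate a priority according to the importance of the data to be synchronized and include it in the task, so the synchronization processing server can obtain the priority from the submitted task. In addition, the data amount of the data to be synchronized can be read from the metadata of the source database in which that data is stored. For example, when synchronizing from HDFS to HBase, the data amount is obtained by reading HDFS information; other types of databases can obtain the data amount in a similar way.
Step 202: look up the average synchronization rate that corresponds to the data amount of the data to be synchronized and the priority of the synchronization task.
Step 203: calculate the expected number of synchronization threads from the average synchronization rate, the maximum synchronization rate of the synchronization processing device, and the number of threads the synchronization processing device has available for synchronization.
The average synchronization rate of the synchronization task can be looked up according to the obtained data amount and priority. Specifically, the expected average synchronization rate of synchronization tasks can be initialized in advance according to data amount and priority, for example:
Figure PCTCN2016110658-appb-000001
A lookup is performed according to the data amount and priority of the current synchronization task. For example, for a synchronization task with a data amount of 0-10 GB and a priority of 8, the corresponding average synchronization rate found is 30 MB/s.
The number of CPUs needed for synchronization is then calculated according to the processing resources of each synchronization processing device. For example, if each synchronization processing device has 6 CPU cores and a maximum synchronization rate of 60 MB/s, the maximum synchronization rate that each CPU can sustain is calculated as 10 MB/s.
Dividing the looked-up expected average synchronization rate of 30 MB/s by the per-CPU maximum synchronization rate of 10 MB/s gives 3 as the number of CPUs needed, which is the expected number of synchronization threads.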
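The arithmetic in this example can be written as a small function. This is a minimal sketch: the function name and parameters are hypothetical, and rounding up to a whole thread is an assumption, since the worked example divides evenly and the description does not say how a fractional result is handled.

```python
import math


def expected_thread_count(avg_rate: float,
                          device_max_rate: float,
                          device_cores: int) -> int:
    """Expected threads = average rate / per-CPU maximum rate.

    avg_rate and device_max_rate must share a unit (MB/s in the example).
    """
    # Equivalent to avg_rate / (device_max_rate / device_cores), written
    # as a single division to limit floating-point error.
    count = math.ceil(avg_rate * device_cores / device_max_rate)
    # Rounding up and keeping at least one thread are assumptions.
    return max(1, count)


# Values from the description: 30 MB/s, 6 cores, 60 MB/s per device.
print(expected_thread_count(30, 60, 6))  # 3
```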
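Step 204: obtain the number of synchronization threads required for running from the target database that is to be synchronized with the source database.
The number of synchronization threads required for running can be read from the metadata of the target database. For example, when synchronizing from HDFS to HBase, the number of Regions of the HBase target table can be read and used as the number of synchronization threads required for running. A Region is the basic unit of data storage and management in HBase, and the number of threads the target database uses during synchronization is usually determined by its Regions.
Step 205: determine whether the expected number of synchronization threads is greater than the number of synchronization threads required for running; if so, proceed to step 206.
Step 206: supplementally configure the corresponding threads according to the difference between the expected number of synchronization threads and the number of synchronization threads required for running.
When the expected number of synchronization threads is greater than the number actually required for the synchronization run, the corresponding virtual threads can be supplementally configured according to the difference obtained by subtracting the required number from the expected number. The configured virtual threads are used to put some CPUs on the synchronization processing device into a standby state and suspend processing of other synchronization tasks, ensuring that the device has sufficient resources to process this data synchronization task.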
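Steps 205 and 206 reduce to a comparison and a subtraction, sketched below. The name `configure_threads` is hypothetical, and setting the number of physical threads equal to the required count is an assumption; the description only states that the shortfall is made up with virtual threads. How a virtual thread actually places a CPU in standby is not specified and is not modeled here.

```python
from typing import Dict


def configure_threads(expected: int, required: int) -> Dict[str, int]:
    """Return how many physical and virtual threads to configure."""
    # Step 205: supplement only when expected exceeds required.
    virtual = expected - required if expected > required else 0
    # Step 206: the difference becomes supplementary virtual threads.
    # Using `required` as the physical thread count is an assumption.
    return {"physical": required, "virtual": virtual}
```

For example, an expected count of 5 against a required count of 3 yields 3 physical threads and 2 virtual threads.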
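Step 207: split the data to be synchronized into a plurality of data blocks according to the number of synchronization threads required for running.
Step 208: use the configured threads to synchronize the data to be synchronized to the target database.
As a preferred example of the embodiments of the present application, step 208 may specifically be:
scheduling the synchronization threads used to synchronize the respective data blocks to a synchronization processing device, so that the synchronization processing device processes those synchronization threads.
The data to be synchronized can be split into a plurality of data blocks, with each configured thread used to synchronize its share of those blocks. According to the remaining processing resources of the synchronization processing devices, the threads are scheduled to one or more synchronization processing devices, which synchronize the data blocks to the target database.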
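One way to picture the split in step 207 is as a division of a record range into contiguous blocks, one per required thread. The sketch below assumes the data can be addressed by record index and that blocks should be as even as possible; neither detail is stated in the description, and `split_into_blocks` is a hypothetical name.

```python
from typing import List, Tuple


def split_into_blocks(total_records: int, n_blocks: int) -> List[Tuple[int, int]]:
    """Split [0, total_records) into n_blocks half-open (start, end) ranges."""
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    base, extra = divmod(total_records, n_blocks)
    blocks = []
    start = 0
    for i in range(n_blocks):
        # The first `extra` blocks take one additional record each.
        size = base + (1 if i < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks
```

With 10 records and 3 required threads this gives the ranges (0, 4), (4, 7), and (7, 10). If there are fewer records than blocks, the trailing ranges are empty.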
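In practical applications, the multiple threads corresponding to a data synchronization task can be submitted to the synchronization controller, which determines whether the expected number of synchronization threads is greater than the number required for running and, according to the result, supplementally configures threads and schedules the threads to multiple synchronization processing devices.
As a preferred example of the embodiments of the present application, step 208 may include:
preferentially sending at least one synchronization thread whose thread attributes satisfy a preset condition to the synchronization processing device, where the preset condition may include at least one of the following:
the at least one synchronization thread belongs to the same data synchronization task; its pending time is greater than a preset time threshold; the priority of its synchronization task is greater than a preset priority threshold; it can be processed synchronously by a single synchronization processing device.
For the case where the expected number of synchronization threads is greater than the number actually required for the synchronization run, the synchronization controller can use the single-machine, multi-thread data synchronization mode: it prepares the scheduling of the physical and virtual threads configured for the synchronization task and preferentially sends the threads whose pending time is greater than the preset time threshold, whose priority is greater than the preset priority threshold, and which belong to the same data synchronization task to a synchronization processing device that can process them on its own. If no synchronization processing device satisfies this condition, the device with the most remaining processing resources can be chosen as the destination.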
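The thread selection in this paragraph can be sketched as a filter over pending threads. `SyncThread` and its fields are hypothetical, and the ordering of the selected threads (highest priority first, then longest pending time) is an assumption; the paragraph only says that threads meeting all three conditions are sent first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SyncThread:
    task_id: str
    pending_seconds: float
    priority: int


def select_priority_threads(threads: List[SyncThread],
                            task_id: str,
                            time_threshold: float,
                            priority_threshold: int) -> List[SyncThread]:
    """Return threads of one task that exceed both preset thresholds."""
    chosen = [
        t for t in threads
        if t.task_id == task_id
        and t.pending_seconds > time_threshold
        and t.priority > priority_threshold
    ]
    # Assumed ordering: higher priority first, then longer waiting time.
    return sorted(chosen, key=lambda t: (-t.priority, -t.pending_seconds))
```

The selected threads would then be sent together to one device, for instance a device chosen by remaining capacity.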
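As a preferred example of the embodiments of the present application, there are multiple synchronization processing devices, and step 208 may include:
preferentially sending the synchronization threads to a synchronization processing device whose maximum number of threads available for synchronization is greater than a preset thread number threshold.
In practice, there may temporarily be no synchronization processing device that satisfies the preset condition of being able to process the threads on a single device. In that case, the synchronization threads can be preferentially sent to multiple synchronization processing devices whose maximum number of threads available for synchronization is greater than the preset thread number threshold; that is, devices with more remaining CPU processing resources are selected for the first round or multiple rounds of data synchronization. Scheduling across multiple synchronization processing devices may be load-balanced according to their processing resources or may be random; the embodiments of the present application place no restriction on this.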
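A load-balanced variant of this multi-device dispatch might look like the sketch below. The greedy "send to the device with the most remaining threads" policy is one possible reading of load-balanced scheduling, not the policy the description mandates (it also allows random scheduling), and `dispatch` and its arguments are hypothetical names.

```python
from typing import Dict, List


def dispatch(thread_ids: List[str],
             free_threads: Dict[str, int],
             threshold: int) -> Dict[str, str]:
    """Map each thread id to a device whose capacity exceeds the threshold."""
    # Keep only devices with more available threads than the threshold.
    remaining = {d: n for d, n in free_threads.items() if n > threshold}
    if not remaining:
        raise ValueError("no device exceeds the thread number threshold")
    assignment = {}
    for tid in thread_ids:
        # Greedy balancing: pick the device with the most capacity left;
        # ties go to the alphabetically first device name.
        target = max(sorted(remaining), key=remaining.get)
        assignment[tid] = target
        remaining[target] -= 1
    return assignment
```

With devices a, b, and c offering 4, 6, and 2 threads and a threshold of 3, device c is excluded and four threads are spread over b and a.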
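According to the embodiments of the present application, the corresponding virtual threads are supplementally configured according to the difference between the expected number of synchronization threads and the number of synchronization threads required for running, and the virtual threads are used to put some CPUs on the synchronization processing device into a standby state and suspend processing of other synchronization tasks. This ensures that the synchronization processing device has sufficient resources to process the data synchronization task and improves the efficiency and stability of data synchronization. Moreover, CPU usage does not need to be controlled at the system level, which improves the flexibility of data synchronization.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of action combinations, but those skilled in the art should understand that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.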
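Referring to FIG. 3, a structural block diagram of Embodiment 1 of a data synchronization apparatus of the present application is shown, which may specifically include the following modules:
a synchronization thread number generation module 301, configured to generate an expected number of synchronization threads according to the data amount of the data to be synchronized in the source database and the priority of the synchronization task.
a synchronization processing module 302, configured to configure threads according to the expected number of synchronization threads and to use the configured threads to synchronize the data to be synchronized to the target database.
According to the embodiments of the present application, an expected number of synchronization threads is generated according to the data amount of the data to be synchronized in the source database and the priority of the synchronization task, and a corresponding number of threads is configured for data synchronization according to that expected number. Threads can therefore be configured dynamically according to the actual situation of the synchronization task, which avoids the case where some threads have completed their synchronization work while other threads of the same task are still waiting, and improves the efficiency and stability of data synchronization. Moreover, the expected number of synchronization threads is adjusted dynamically according to the priority of the synchronization task, so that more important synchronization tasks can be processed first.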
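Referring to FIG. 4, a structural block diagram of Embodiment 2 of a data synchronization apparatus of the present application is shown, which may specifically include the following modules:
a priority and data amount obtaining module 401, configured to obtain the priority of the synchronization task from the data synchronization task submitted by a user, and to obtain the data amount of the data to be synchronized from the source database in which that data is stored.
a required synchronization thread number obtaining module 402, configured to obtain the number of synchronization threads required for running from the target database that is to be synchronized with the source database.
a synchronization thread number generation module 403, configured to generate an expected number of synchronization threads according to the data amount of the data to be synchronized in the source database and the priority of the synchronization task.
a synchronization processing module 404, configured to configure threads according to the expected number of synchronization threads and to use the configured threads to synchronize the data to be synchronized to the target database.
As a preferred example of the embodiments of the present application, the synchronization thread number generation module 403 may include:
an average synchronization rate lookup submodule, configured to look up the average synchronization rate that corresponds to the data amount of the data to be synchronized and the priority of the synchronization task;
a synchronization thread number calculation submodule, configured to calculate the expected number of synchronization threads from the average synchronization rate, the maximum synchronization rate of the synchronization processing device, and the number of threads the synchronization processing device has available for synchronization.
As a preferred example of the embodiments of the present application, the synchronization processing module 404 may include:
a synchronization thread number judging submodule, configured to determine whether the expected number of synchronization threads is greater than the number of synchronization threads required for running and, if so, to invoke the thread supplementing submodule.
a thread supplementing submodule, configured to supplementally configure the corresponding threads according to the difference between the expected number of synchronization threads and the number of synchronization threads required for running.
As a preferred example of the embodiments of the present application, the apparatus may further include:
a to-be-synchronized data splitting module, configured to split the data to be synchronized into a plurality of data blocks according to the number of synchronization threads required for running.
As preferred example 1 of the embodiments of the present application, the synchronization processing module 404 may be specifically configured to:
schedule the synchronization threads used to synchronize the respective data blocks to a synchronization processing device, so that the synchronization processing device processes those synchronization threads.
As preferred example 2 of the embodiments of the present application, the synchronization processing module 404 may be specifically configured to:
preferentially send at least one synchronization thread whose thread attributes satisfy a preset condition to the synchronization processing device.
As a preferred example of the embodiments of the present application, the preset condition includes at least one of the following:
the at least one synchronization thread belongs to the same data synchronization task; its pending time is greater than a preset time threshold; the priority of its synchronization task is greater than a preset priority threshold; it can be processed synchronously by a single synchronization processing device.
As preferred example 3 of the embodiments of the present application, there are multiple synchronization processing devices, and the synchronization processing module 404 may be specifically configured to:
preferentially send the synchronization threads to a synchronization processing device whose maximum number of threads available for synchronization is greater than a preset thread number threshold.
According to the embodiments of the present application, the corresponding virtual threads are supplementally configured according to the difference between the expected number of synchronization threads and the number of synchronization threads required for running, and the virtual threads are used to put some CPUs on the synchronization processing device into a standby state and suspend processing of other synchronization tasks. This ensures that the synchronization processing device has sufficient resources to process the data synchronization task and improves the efficiency and stability of data synchronization. Moreover, CPU usage does not need to be controlled at the system level, which improves the flexibility of data synchronization.
Because the apparatus embodiment is basically similar to the method embodiment, its description is relatively brief; for related details, refer to the corresponding parts of the description of the method embodiment.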
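The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data processing terminal device, so that a series of operational steps is performed on the computer or other programmable terminal device to produce computer-implemented processing, and the instructions executed on the computer or other programmable terminal device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the embodiments of the present application have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The data synchronization method and data synchronization apparatus provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.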

Claims (18)

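  1. A data synchronization method, characterized by comprising:
    generating an expected number of synchronization threads according to the data amount of the data to be synchronized in a source database and the priority of the synchronization task;
    configuring threads according to the expected number of synchronization threads, and using the configured threads to synchronize the data to be synchronized to a target database.
  2. The method according to claim 1, characterized in that configuring threads according to the expected number of synchronization threads comprises:
    determining whether the expected number of synchronization threads is greater than the number of synchronization threads required for running;
    if so, supplementally configuring the corresponding threads according to the difference between the expected number of synchronization threads and the number of synchronization threads required for running.
  3. The method according to claim 1, characterized in that, before generating the expected number of synchronization threads according to the data amount of the data to be synchronized in the source database and the priority of the synchronization task, the method further comprises:
    obtaining the priority of the synchronization task from the data synchronization task submitted by a user, and obtaining the data amount of the data to be synchronized from the source database in which that data is stored.
  4. The method according to claim 1, characterized in that generating the expected number of synchronization threads according to the data amount of the data to be synchronized in the source database and the priority of the synchronization task comprises:
    looking up the average synchronization rate that corresponds to the data amount of the data to be synchronized and the priority of the synchronization task;
    calculating the expected number of synchronization threads from the average synchronization rate, the maximum synchronization rate of a synchronization processing device, and the number of threads the synchronization processing device has available for synchronization.
  5. The method according to claim 2, characterized in that, before determining whether the expected number of synchronization threads is greater than the number of synchronization threads required for running, the method further comprises:
    obtaining the number of synchronization threads required for running from the target database that is to be synchronized with the source database.
  6. The method according to claim 1, characterized in that, before using the configured threads to synchronize the data to be synchronized to the target database, the method further comprises:
    splitting the data to be synchronized into a plurality of data blocks according to the number of synchronization threads required for running;
    using the configured threads to synchronize the data to be synchronized to the target database is:
    scheduling the synchronization threads used to synchronize the respective data blocks to a synchronization processing device, so that the synchronization processing device processes those synchronization threads.
  7. The method according to claim 1, characterized in that using the configured threads to synchronize the data to be synchronized to the target database comprises:
    preferentially sending at least one synchronization thread whose thread attributes satisfy a preset condition to the synchronization processing device.
  8. The method according to claim 7, wherein the preset condition includes at least one of the following:
    the at least one synchronization thread belongs to the same data synchronization task; its pending time is greater than a preset time threshold; the priority of its synchronization task is greater than a preset priority threshold; it can be processed synchronously by a single synchronization processing device.
  9. The method according to claim 1, characterized in that there are multiple synchronization processing devices, and using the configured threads to synchronize the data to be synchronized to the target database comprises:
    preferentially sending the synchronization threads to a synchronization processing device whose maximum number of threads available for synchronization is greater than a preset thread number threshold.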
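  10. A data synchronization apparatus, characterized by comprising:
    a synchronization thread number generation module, configured to generate an expected number of synchronization threads according to the data amount of the data to be synchronized in a source database and the priority of the synchronization task;
    a synchronization processing module, configured to configure threads according to the expected number of synchronization threads and to use the configured threads to synchronize the data to be synchronized to a target database.
  11. The apparatus according to claim 10, characterized in that the synchronization processing module comprises:
    a synchronization thread number judging submodule, configured to determine whether the expected number of synchronization threads is greater than the number of synchronization threads required for running and, if so, to invoke the thread supplementing submodule;
    a thread supplementing submodule, configured to supplementally configure the corresponding threads according to the difference between the expected number of synchronization threads and the number of synchronization threads required for running.
  12. The apparatus according to claim 10, characterized in that the apparatus further comprises:
    a priority and data amount obtaining module, configured to obtain the priority of the synchronization task from the data synchronization task submitted by a user, and to obtain the data amount of the data to be synchronized from the source database in which that data is stored.
  13. The apparatus according to claim 10, characterized in that the synchronization thread number generation module comprises:
    an average synchronization rate lookup submodule, configured to look up the average synchronization rate that corresponds to the data amount of the data to be synchronized and the priority of the synchronization task;
    a synchronization thread number calculation submodule, configured to calculate the expected number of synchronization threads from the average synchronization rate, the maximum synchronization rate of a synchronization processing device, and the number of threads the synchronization processing device has available for synchronization.
  14. The apparatus according to claim 11, characterized in that the apparatus further comprises:
    a required synchronization thread number obtaining module, configured to obtain the number of synchronization threads required for running from the target database that is to be synchronized with the source database.
  15. The apparatus according to claim 10, characterized in that the apparatus further comprises:
    a to-be-synchronized data splitting module, configured to split the data to be synchronized into a plurality of data blocks according to the number of synchronization threads required for running;
    the synchronization processing module is specifically configured to:
    schedule the synchronization threads used to synchronize the respective data blocks to a synchronization processing device, so that the synchronization processing device processes those synchronization threads.
  16. The apparatus according to claim 10, characterized in that the synchronization processing module is specifically configured to:
    preferentially send at least one synchronization thread whose thread attributes satisfy a preset condition to the synchronization processing device.
  17. The apparatus according to claim 16, wherein the preset condition includes at least one of the following:
    the at least one synchronization thread belongs to the same data synchronization task; its pending time is greater than a preset time threshold; the priority of its synchronization task is greater than a preset priority threshold; it can be processed synchronously by a single synchronization processing device.
  18. The apparatus according to claim 10, characterized in that there are multiple synchronization processing devices, and the synchronization processing module is specifically configured to:
    preferentially send the synchronization threads to a synchronization processing device whose maximum number of threads available for synchronization is greater than a preset thread number threshold.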
PCT/CN2016/110658 2015-12-31 2016-12-19 Data synchronization method and apparatus WO2017114199A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201511032646.9 2015-12-31
CN201511032646.9A CN106933534B (zh) 2015-12-31 2015-12-31 Data synchronization method and apparatus

Publications (1)

Publication Number Publication Date
WO2017114199A1 true WO2017114199A1 (zh) 2017-07-06

Family

ID=59224514

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/110658 WO2017114199A1 (zh) 2015-12-31 2016-12-19 Data synchronization method and apparatus

Country Status (2)

Country Link
CN (1) CN106933534B (zh)
WO (1) WO2017114199A1 (zh)


Also Published As

Publication number Publication date
CN106933534B (zh) 2020-07-28
CN106933534A (zh) 2017-07-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16880988

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16880988

Country of ref document: EP

Kind code of ref document: A1