WO2017097124A1 - Task transmission method, apparatus and system based on database and table sharding (基于分库分表的任务传输方法、装置及系统)


Info

Publication number: WO2017097124A1
Authority: WIPO (PCT)
Application number: PCT/CN2016/107409
Other languages: English (en), French (fr)
Inventor: 洪鲛
Original Assignee: 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2017097124A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 — Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/25 — Integrating or interfacing systems involving database management systems
    • G06F 16/254 — Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Definitions

  • The present invention relates to the field of database technologies, and in particular to a task transmission method, apparatus and system based on database and table sharding (sub-databases and sub-tables).
  • Data synchronization, especially offline data synchronization, plays a very important role as the export and import channel of data warehouses.
  • A single task may need to synchronize hundreds of GB or even TB of data, which places very high stability requirements on the data synchronization tool.
  • Because the extraction-side database (DB, Database) supports concurrent reads, the pressure on both the extraction-side and write-side DBs is also very large. Precisely because large amounts of data are now synchronized, users can conveniently examine data that was previously ignored, which produces more long-tail tasks during data synchronization; the extraction-side DB pressure and the long tail thus become the bottlenecks of data synchronization.
  • The prior-art solution targets a single database (that is, one task extracts from only one database): a service layer is placed above the underlying synchronization tool, and scheduling control in that service prevents too many tasks from concurrently extracting from one database at the same time.
  • A single database can no longer meet the demands of large data volumes, so a single database must be split into multiple databases and multiple tables to store the data.
  • The extraction strategy across the sharded databases directly determines the task extraction speed, so the single-database solution above is no longer applicable.
  • The embodiments of the invention provide a task transmission method, device and system based on sharded tables, so as to at least solve the prior-art technical problem that, when tasks of sharded tables are transmitted concurrently, the pressure of the extraction side concurrently reading data from the sharded DB is too large, resulting in low efficiency of concurrently transmitting the sharded tables' tasks.
  • A task transmission method based on sharded tables includes: extracting a task set to be transmitted from the sharded tables, where the task set includes a plurality of sub-databases and the sub-tables contained in each sub-database; retrieving n sub-tables from the plurality of sub-databases according to a pre-configured total concurrent granularity, where n equals the total concurrent granularity; hashing the n retrieved sub-tables to different scheduling units by hash allocation, where the pre-configured unit concurrent granularity of each scheduling unit is the same; and concurrently transmitting the sub-tables contained in each scheduling unit to the target location according to the unit concurrent granularity.
  • A task transmission apparatus based on sharded tables includes: an extraction module, configured to extract the task set to be transmitted from the sharded tables, where the task set includes a plurality of sub-databases and the sub-tables contained in each sub-database; a retrieving module, configured to retrieve n sub-tables from the plurality of sub-databases according to a pre-configured total concurrent granularity, where n equals the total concurrent granularity; a module configured to hash the n retrieved sub-tables to different scheduling units by hash allocation, where the pre-configured unit concurrent granularity of each scheduling unit is the same; and a concurrency module, configured to concurrently transmit the sub-tables contained in each scheduling unit to the target location according to the unit concurrent granularity.
  • A task transmission system based on sharded tables includes: a source data terminal, configured to store the sharded tables; a scheduling terminal in communication with the source data terminal, configured to extract the task set to be transmitted from the sharded tables, where the task set includes a plurality of sub-databases and the sub-tables contained in each sub-database, to retrieve n sub-tables from the plurality of sub-databases according to a pre-configured total concurrent granularity, where n equals the total concurrent granularity, to hash the n retrieved sub-tables to different scheduling units, and to concurrently transmit the sub-tables contained in each scheduling unit according to the unit concurrent granularity, where the pre-configured unit concurrent granularity of each scheduling unit is the same; and a target terminal in communication with the scheduling terminal, configured to receive the task set transmitted concurrently by the scheduling terminal.
  • In the embodiments, n sub-tables may be retrieved from the plurality of sub-databases according to the pre-configured total concurrent granularity, the retrieved n sub-tables are hashed to different scheduling units by hash allocation, and the sub-tables contained in each scheduling unit are concurrently transmitted to the target location according to the unit concurrent granularity. This scheme splits the task set to be transmitted according to the total concurrent granularity and the unit concurrent granularity so as to balance the tasks of transmitting the sub-tables in parallel.
  • This not only realizes task transmission for sharded tables, but also balances the sub-table transmission tasks while satisfying the pre-configured total concurrent granularity. It can therefore balance the multiple tasks across the sharded tables, reduce the pressure of concurrent data reading, and improve the efficiency of concurrent transmission.
  • The solution of the foregoing embodiments provided by the present application thus solves the prior-art technical problem that the extraction side's concurrent reads from the sharded DB cause excessive pressure during concurrent task transmission, resulting in low efficiency of concurrently transmitting the sharded tables' tasks.
  • FIG. 1 is a block diagram of the hardware structure of a computer terminal for the task transmission method based on sharded tables according to an embodiment of the present application;
  • FIG. 2 is a flowchart of a task transmission method based on sharded tables according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of sharded tables corresponding to scheduling management units according to an embodiment of the present application;
  • FIG. 4 is a schematic diagram of hashing sharded tables to scheduling management units according to an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a scheduling management unit obtaining an optimal solution according to an embodiment of the present application;
  • FIG. 6 is an interaction flowchart of an optional task transmission method based on sharded tables according to an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a task transmission apparatus based on sharded tables according to an embodiment of the present application;
  • FIG. 8 is a schematic diagram of an optional task transmission apparatus based on sharded tables according to an embodiment of the present application;
  • FIG. 9 is a schematic diagram of an optional task transmission apparatus based on sharded tables according to an embodiment of the present application;
  • FIG. 10 is a schematic diagram of an optional task transmission apparatus based on sharded tables according to an embodiment of the present application;
  • FIG. 11 is a schematic diagram of an optional task transmission apparatus based on sharded tables according to an embodiment of the present application;
  • FIG. 12 is a schematic diagram of an optional task transmission apparatus based on sharded tables according to an embodiment of the present application;
  • FIG. 13 is a schematic diagram of a task transmission system based on sharded tables according to an embodiment of the present application;
  • FIG. 14 is a structural block diagram of a computer terminal according to an embodiment of the present application.
  • ETL: short for Extract-Transform-Load, used to describe the process of extracting data from a source, transforming it, and loading it into a destination.
  • ETL is an important part of building a data warehouse: users extract the required data from the data source and, after data cleaning, load it into the data warehouse according to the predefined data warehouse model.
  • Sub-database sub-table (sharding): storing data that was stored in one database across multiple databases, and storing data that was stored in one table across multiple tables.
  • DB (Database): a repository that organizes, stores, and manages data according to its data structure.
  • A task transmission method based on sharded tables is provided.
  • The steps shown in the flowcharts of the drawings may be executed in a computer system as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order than described herein.
  • FIG. 1 is a block diagram of the hardware structure of a computer terminal for the task transmission method based on sharded tables according to an embodiment of the present application.
  • The computer terminal 10 may include one or more (only one is shown) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission module 106 providing the communication function.
  • FIG. 1 is merely illustrative and does not limit the structure of the above electronic device.
  • computer terminal 10 may also include more or fewer components than those shown in FIG. 1, or have a different configuration than that shown in FIG.
  • The memory 104 can be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the task transmission method based on sharded tables in the embodiments of the present invention. The processor 102 runs the software programs and modules stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above task transmission method.
  • Memory 104 may include high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • memory 104 may further include memory remotely located relative to processor 102, which may be coupled to computer terminal 10 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • Transmission device 106 is for receiving or transmitting data via a network.
  • Specific examples of the above network may include a wireless network provided by a communication provider of the computer terminal 10.
  • the transmission device 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station to communicate with the Internet.
  • the transmission device 106 can be a Radio Frequency (RF) module for communicating with the Internet wirelessly.
  • NIC: Network Interface Controller.
  • RF: Radio Frequency.
  • FIG. 2 is a flowchart of a task transmission method based on sharded tables according to an embodiment of the present application.
  • the method shown in FIG. 2 may include the following steps:
  • Step S21: Extract the task set to be transmitted from the sharded tables, where the task set includes a plurality of sub-databases and the sub-tables contained in each sub-database.
  • A task set may include multiple sub-databases, each sub-database may include multiple sub-tables, and each sub-table may record multiple pieces of data information, such as registered-user data, webpage-access data, product-purchase data, and so on. After the task set to be transmitted is extracted from the sharded tables, the sub-tables can be numbered according to the extraction order.
  • In an optional implementation, a configuration file may be obtained that records the names of the plurality of sub-databases to be transmitted and the names of the sub-tables contained in each sub-database.
  • After the configuration file is read, the plurality of sub-databases to be transmitted and the sub-tables contained in each sub-database are extracted from the sharded tables of the source database according to the sub-database names and sub-table names that were read.
  • For example, the task set may include three sub-databases, namely sub-database A, sub-database B, and sub-database C.
  • Sub-database A may include four sub-tables: sub-table T1, sub-table T2, sub-table T3 and sub-table T4;
  • sub-database B may contain two sub-tables: sub-table T1 and sub-table T2;
  • sub-database C may contain four sub-tables: sub-table T1, sub-table T2, sub-table T3 and sub-table T4.
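As a minimal sketch, the example task set above can be represented as a mapping from sub-database name to its sub-tables (the dict layout and names are purely illustrative, not the patent's actual data format):

```python
# Example task set from the description above: three sub-databases,
# each listing the sub-tables (tasks) it contains.
task_set = {
    "A": ["T1", "T2", "T3", "T4"],
    "B": ["T1", "T2"],
    "C": ["T1", "T2", "T3", "T4"],
}

# Total number of sub-tables (i.e., tasks) across all sub-databases.
total_tables = sum(len(tables) for tables in task_set.values())
print(total_tables)  # → 10
```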
  • Step S23: Retrieve n sub-tables from the plurality of sub-databases according to the pre-configured total concurrent granularity, where n equals the total concurrent granularity.
  • the total concurrent granularity indicates the number of simultaneous concurrent tasks, and n sub-tables satisfying the total concurrent granularity may be sequentially extracted from the task set according to the pre-configured total concurrent granularity.
  • the pre-configured total concurrent granularity may be configured according to the actual needs of the user, or may be configured according to the concurrent capability of the database.
  • For example, the pre-configured total concurrent granularity may be 9: according to the total concurrent granularity, sub-tables T1, T2, T3 and T4 are extracted from sub-database A, sub-tables T1 and T2 are extracted from sub-database B, and sub-tables T1, T2 and T3 are extracted from sub-database C.
  • The number of extracted sub-tables is the same as the total concurrent granularity.
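The retrieval step can be sketched as flattening the sub-tables in extraction order and taking the first n, where n is the total concurrent granularity; `retrieve_tables` is a hypothetical helper, not a name from the original:

```python
def retrieve_tables(task_set, total_granularity):
    """Flatten the sub-tables in extraction order and take the first n,
    where n equals the pre-configured total concurrent granularity."""
    ordered = [
        (db, table)
        for db, tables in task_set.items()
        for table in tables
    ]
    return ordered[:total_granularity]

task_set = {
    "A": ["T1", "T2", "T3", "T4"],
    "B": ["T1", "T2"],
    "C": ["T1", "T2", "T3", "T4"],
}
# With total granularity 9, A's T1-T4, B's T1-T2 and C's T1-T3 are
# retrieved, matching the example; C's T4 remains untransmitted.
retrieved = retrieve_tables(task_set, 9)
```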
  • Step S25: Hash the n retrieved sub-tables to different scheduling units by hash allocation, where the pre-configured unit concurrent granularity of each scheduling unit is the same.
  • the n sub-tables are sequentially hashed into different scheduling units according to the hash allocation manner.
  • The pre-configured unit concurrent granularity (tgChannel) of each scheduling unit indicates the number of simultaneously concurrent tasks of that scheduling unit; the unit concurrent granularity is less than or equal to the total concurrent granularity.
  • The scheduling unit may be a scheduling management module (TG, short for taskGroup), and the sharded scheduling task may include sub-database A, sub-database B, and sub-database C.
  • Each sub-database can include n sub-tables, each of which is a task to be transmitted; the 3n sub-tables are hashed into n scheduling management units, and each scheduling management unit can include three sub-tables.
  • For example, the unit concurrent granularity of each scheduling unit may be configured as 3, and the extracted 9 sub-tables may be hashed to different scheduling units:
  • scheduling unit 1 may include AT3, BT2 and CT3;
  • scheduling unit 2 may include AT1, AT4 and CT1;
  • scheduling unit 3 may include AT2, BT1 and CT2.
  • The above hashing method hashes the sub-tables of the same sub-database into different scheduling units, tries to keep the distribution of sub-tables across different sub-databases consistent, and reduces the concurrent pressure on the extraction-side DB.
  • Step S27: Concurrently transmit the sub-tables contained in each scheduling unit to the target location according to the unit concurrent granularity.
  • The target location may be a target database for storing the extracted data information.
  • Each scheduling unit concurrently transmits its hashed sub-tables to the target database according to the unit concurrent granularity.
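Concurrent transmission at unit granularity can be sketched with a thread pool; this is an illustrative model only, and `transmit` is a placeholder for the real extract-and-write step:

```python
from concurrent.futures import ThreadPoolExecutor

def transmit(table):
    # Placeholder for extracting a sub-table and writing it to the target DB.
    return f"{table} transmitted"

def run_scheduling_unit(tables, unit_granularity):
    """Transmit one scheduling unit's sub-tables with at most
    `unit_granularity` transmissions running at the same time."""
    with ThreadPoolExecutor(max_workers=unit_granularity) as pool:
        return list(pool.map(transmit, tables))

# Scheduling unit 1 from the example, with unit concurrent granularity 3.
results = run_scheduling_unit(["AT3", "BT2", "CT3"], 3)
```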
  • In summary, n sub-tables may be retrieved from the plurality of sub-databases according to the pre-configured total concurrent granularity.
  • The n sub-tables are hashed to different scheduling units by hash allocation, and the sub-tables contained in each scheduling unit are concurrently transmitted to the target location according to the unit concurrent granularity.
  • The task set to be transmitted is split according to the total concurrent granularity and the unit concurrent granularity to balance the tasks of transmitting the sub-tables in parallel.
  • This not only realizes task transmission for sharded tables but also, when transmitting the sharded tables' tasks, balances the sub-table transmission tasks while satisfying the pre-configured total concurrent granularity.
  • It can therefore balance the multiple tasks across the sharded tables, reduce the pressure of concurrent data reading, and improve the efficiency of concurrent transmission.
  • The solution of the above embodiment provided by the present application thus solves the prior-art problem that the pressure of the extraction side concurrently reading data from the sharded DB is too large during concurrent transmission.
  • In an optional implementation, step S25, in which the n retrieved sub-tables are hashed to different scheduling units by hash allocation, includes the following steps S251 to S255:
  • Step S251: Determine the number of scheduling units according to the total concurrent granularity and the unit concurrent granularity, and assign a corresponding number to each scheduling unit.
  • For example, the total concurrent granularity T may be 9 and the unit concurrent granularity t may be 3, so the number of scheduling units is 9 / 3 = 3.
  • Step S253: For each sub-table Ti in any sub-database, calculate its hash allocation value by the following formula, where the hash allocation value represents the number Tpos of the scheduling unit to which the sub-table Ti is hashed:
  • Tpos = (TCount + offset) % tgCount, where tgCount = totalChannel / tgChannel is the number of scheduling units, totalChannel is the total concurrent granularity, tgChannel is the unit concurrent granularity, TCount is the position of the sub-table within its sub-database, and offset (initial value 0) is increased by the number of sub-tables of each sub-database once that sub-database has been allocated. Tpos is the number of the scheduling unit TGi to which the sub-table is assigned.
  • For example, the numbers of sub-database A, sub-database B, and sub-database C are 0, 1, and 2, respectively, and the offset of sub-database A is 0.
  • Step S255: Hash each of the n sub-tables to its corresponding scheduling unit according to the calculated hash allocation value.
  • For example, sub-tables T1, T2, T3 and T4 of sub-database A, sub-tables T1 and T2 of sub-database B, and sub-tables T1, T2, T3 and T4 of sub-database C are hashed into the corresponding scheduling units. The hash result may be: sub-tables AT3, BT2 and CT3 are hashed to scheduling unit TG0;
  • sub-tables AT1, AT4, CT1 and CT4 are hashed to scheduling unit TG1; and sub-tables AT2, BT1 and CT2 are hashed to scheduling unit TG2.
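The allocation can be reproduced in code. The garbled formula is read here as Tpos = (TCount + offset) % tgCount, with tgCount = totalChannel / tgChannel and offset equal to the number of sub-tables in the previously processed sub-databases; this reading is an assumption, but it reproduces the worked assignment above exactly:

```python
def hash_allocate(task_set, total_channel, tg_channel):
    """Assign each sub-table the scheduling-unit number
    Tpos = (TCount + offset) % tgCount, where tgCount is the number of
    scheduling units (totalChannel / tgChannel), TCount is the table's
    position within its sub-database, and offset is the count of tables
    in the sub-databases processed so far (reconstructed reading)."""
    tg_count = total_channel // tg_channel
    units = {i: [] for i in range(tg_count)}
    offset = 0
    for db, tables in task_set.items():
        for t_count, table in enumerate(tables, start=1):
            tpos = (t_count + offset) % tg_count
            units[tpos].append(f"{db}{table}")
        offset += len(tables)
    return units

task_set = {
    "A": ["T1", "T2", "T3", "T4"],
    "B": ["T1", "T2"],
    "C": ["T1", "T2", "T3", "T4"],
}
# totalChannel = 9, tgChannel = 3 → 3 scheduling units TG0, TG1, TG2.
units = hash_allocate(task_set, 9, 3)
```

Note that sub-tables of the same sub-database land in different scheduling units, which is exactly the load-spreading property the description claims.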
  • In the above steps, the number of scheduling units may be determined according to the pre-configured total concurrent granularity and unit concurrent granularity, a corresponding number may be assigned to each scheduling unit, and the number of the scheduling unit corresponding to each sub-table in each sub-database may be calculated by the above formula, so as to split the tasks of the sharded tables.
  • The tasks of the sharded tables are evenly distributed to different scheduling units, thereby avoiding the waste of resources caused by the sequential transmission of sharded-table tasks in the prior art.
  • In an optional implementation, step S27, in which the sub-tables contained in each scheduling unit are concurrently transmitted to the target location according to the unit concurrent granularity, includes: step S2701, obtaining in real time the m sub-tables of the plurality of sub-databases that have not yet been transmitted;
  • step S2703, hashing the m sub-tables to different scheduling units by hash allocation according to the total concurrent granularity; and step S2705, if the threads of a scheduling unit are fully occupied, placing the sub-table allocated to that scheduling unit in its waiting queue, and after at least one thread of the scheduling unit is released, scheduling the sub-table in the waiting queue to the corresponding scheduling unit.
  • The sub-tables that have not yet been transmitted in the plurality of sub-databases are hashed to different scheduling units.
  • If the threads of a scheduling unit are occupied, the hashed sub-table is placed in the waiting queue until any thread of the scheduling unit is released, after which it can be concurrently transmitted by the scheduling unit to the target location; if the threads of the scheduling unit are not full, the hashed sub-table is directly transmitted by the scheduling unit to the target location.
  • For example, the pre-configured total concurrent granularity may be 9, and the sub-tables extracted according to the total concurrent granularity are AT1, AT2, AT3, AT4, BT1, BT2, CT1, CT2 and CT3. At this time, the remaining untransmitted sub-table of the three sub-databases is CT4.
  • According to the hash allocation, sub-table CT4 should be allocated to scheduling unit TG1. Since the threads of scheduling unit TG1 are occupied, sub-table CT4 first enters the waiting queue of scheduling unit TG1, as shown by the dashed line in the figure. After any one of sub-tables AT1, AT4 and CT1 is transmitted to the target location and its thread is released, sub-table CT4 in the waiting queue is dispatched to scheduling unit TG1 and transmitted to the target location.
  • In the above steps, the remaining m sub-tables of the plurality of sub-databases are acquired and allocated to the corresponding scheduling units after hash allocation. If a scheduling unit's threads are full, the sub-table enters the waiting queue until at least one thread of the scheduling unit is released. This achieves the purpose of splitting the allocation of the sub-tables: a sub-table enters the waiting queue when all threads are occupied and enters the scheduling unit after at least one thread is released, ensuring maximum resource utilization when concurrently transmitting the sharded tables' tasks.
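The waiting-queue behavior can be sketched as follows. This is a simplified single-threaded model of the bookkeeping only (the real implementation would dispatch to actual worker threads); class and method names are illustrative:

```python
from collections import deque

class SchedulingUnit:
    """Sketch of a scheduling unit with a fixed thread budget and a
    waiting queue for sub-tables that arrive while all threads are busy."""
    def __init__(self, unit_granularity):
        self.capacity = unit_granularity
        self.running = []        # sub-tables currently being transmitted
        self.waiting = deque()   # sub-tables queued for a free thread

    def dispatch(self, table):
        if len(self.running) < self.capacity:
            self.running.append(table)   # a thread is free: start at once
        else:
            self.waiting.append(table)   # all threads busy: queue it

    def release(self, table):
        """Called when a transmission finishes; pull the next waiter."""
        self.running.remove(table)
        if self.waiting:
            self.running.append(self.waiting.popleft())

# The CT4 example above: TG1 holds AT1, AT4, CT1; CT4 must wait.
tg1 = SchedulingUnit(3)
for t in ["AT1", "AT4", "CT1"]:
    tg1.dispatch(t)
tg1.dispatch("CT4")   # TG1 is full: CT4 enters the waiting queue
tg1.release("AT1")    # a thread is released: CT4 is scheduled into TG1
```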
  • The above embodiment can balance the DB extraction pressure of the sharded-table tasks, but only in the ideal situation: in most cases the tasks of the same sub-database will not have the same number of records, and the extraction time cannot be predicted. At this time the scheduling module needs to perform control scheduling and select the optimal solution.
  • In an optional implementation, the method further includes steps S271 to S275:
  • Step S271: If there is an idle thread in any scheduling unit, obtain the current concurrency number of each sub-database, where the current concurrency number represents the number of sub-tables of that sub-database that have been scheduled into the corresponding scheduling units.
  • The current concurrency number of each sub-database may be maintained in a thread concurrency manager (TC-MGR, short for Thread-Concurrency-Manager). When a thread is started, the concurrency count of the corresponding sub-database in the scheduling module is increased by 1; when the thread finishes executing, the concurrency count in the thread concurrency manager is decreased by 1.
  • The two operations are marked as synchronized, so threads access them mutually exclusively; they are called hold channel and release channel.
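A minimal sketch of such a thread concurrency manager, with the hold/release operations guarded by a lock for mutual exclusion (class and method names are illustrative, not from the original):

```python
import threading

class ThreadConcurrencyManager:
    """Per-sub-database concurrency counter: incremented when a thread
    starts (hold channel), decremented when it finishes (release channel).
    Both operations are mutually exclusive, mirroring `synchronized`."""
    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def hold_channel(self, db):
        with self._lock:
            self._counts[db] = self._counts.get(db, 0) + 1

    def release_channel(self, db):
        with self._lock:
            self._counts[db] -= 1

    def current_concurrency(self, db):
        with self._lock:
            return self._counts.get(db, 0)

# Reproduce the example state: A has 3 running threads, B has 1, C has 2.
mgr = ThreadConcurrencyManager()
for db in ["A", "A", "A", "B", "C", "C"]:
    mgr.hold_channel(db)
```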
  • A greedy algorithm is used: when a thread needs to be started, the current concurrency number of each sub-database is obtained from the scheduling module.
  • For example, when at least one of its three threads is idle (for example, thread 1), scheduling unit TG0 obtains the current concurrency number of each sub-database from the scheduling module.
  • Step S273: Schedule a corresponding number of sub-tables from the waiting queue of the scheduling unit according to the current concurrency number of each sub-database.
  • For example, if the scheduling unit releases two threads, two sub-tables in its waiting queue are scheduled according to the current concurrency number of each sub-database.
  • For example, the current concurrency number of sub-database A recorded in the thread concurrency manager may be 3, that of sub-database B may be 1, and that of sub-database C may be 2. If the number of threads to be started is 2, two sub-tables of sub-database B can be scheduled from the waiting queue of the scheduling unit; if there is only one sub-table of sub-database B in the waiting queue, a sub-table of sub-database C is further scheduled from the waiting queue.
  • For example, the waiting queue of scheduling unit TG1 holds sub-table T4 of sub-database B and sub-tables of other sub-databases. In this case, according to the proportion of the concurrency numbers of the sub-databases, when a thread is released, sub-table T4 is preferentially scheduled into scheduling unit TG1. If all threads of TG1 are released, the other sub-tables in the waiting queue besides T4 can also be retrieved.
  • The retrieval rules of any other scheduling unit are the same.
  • Step S275: Schedule the corresponding number of sub-tables from the corresponding waiting queue and concurrently transmit them to the target location.
  • In an optional implementation, step S273, in which a corresponding number of sub-tables are scheduled from the waiting queue of any scheduling unit according to the current concurrency number of each sub-database, includes the following steps S2731 to S2735:
  • Step S2731: Sort the current concurrency numbers of the sub-databases to determine the scheduling priority of the sub-tables belonging to different sub-databases, where the lower the current concurrency number of a sub-database, the higher the scheduling priority of the sub-tables of that sub-database.
  • Each sub-database is sorted in ascending order of current concurrency number: the sub-database with the lowest current concurrency number has the highest priority, the sub-database with the highest current concurrency number has the lowest priority, and the scheduling priority of each sub-table in the waiting queue of the scheduling unit is determined accordingly.
  • Step S2733: Determine the scheduling quantity according to the number of idle threads in the scheduling unit.
  • When the number of idle threads is less than or equal to the number of sub-tables in the waiting queue of the scheduling unit, the scheduling quantity is the number of idle threads; when the number of idle threads is greater than the number of sub-tables in the waiting queue, the scheduling quantity is the number of sub-tables in the waiting queue of the scheduling unit.
  • Step S2735: After determining that the sub-tables of the first sub-database have the highest scheduling priority, schedule the sub-tables belonging to the first sub-database from the waiting queue according to the scheduling quantity.
  • The sub-tables that satisfy the scheduling quantity and the scheduling priority are scheduled from the waiting queue of the scheduling unit.
  • For example, the current concurrency number of sub-database A may be 3, that of sub-database B may be 1, and that of sub-database C may be 2. By comparing the concurrency numbers of the three sub-databases, it can be determined that the current concurrency number of sub-database B is the lowest, indicating that in the previous concurrency process the distribution or synchronization efficiency of the sub-tables of sub-database B was the lowest; therefore the sub-tables of sub-database B have the highest scheduling priority and the system needs to process them as soon as possible, while the current concurrency number of sub-database A is the highest, so the sub-tables of sub-database A have the lowest scheduling priority.
  • For example, the waiting queue of the scheduling unit holds sub-tables AT5, AT6, BT3 and CT5. When one thread is idle, according to the scheduling-priority analysis, the scheduling unit dispatches sub-table BT3 from the waiting queue for synchronization.
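The greedy selection can be sketched as sorting the waiting queue by each sub-database's current concurrency number and taking as many sub-tables as there are idle threads. This is a simplified model (names are illustrative); note it also naturally falls back to higher-concurrency sub-databases when the first sub-database has fewer queued tables than the scheduling quantity:

```python
def schedule_from_queue(waiting, concurrency, idle_threads):
    """Pick sub-tables for the idle threads, lowest-concurrency
    sub-database first (the greedy choice described above)."""
    count = min(idle_threads, len(waiting))            # scheduling quantity
    ranked = sorted(waiting, key=lambda t: concurrency[t[0]])  # priority sort
    scheduled = ranked[:count]
    for entry in scheduled:
        waiting.remove(entry)
    return scheduled

# The example above: queue holds AT5, AT6, BT3, CT5; A/B/C currently
# run 3/1/2 threads. With one idle thread, BT3 is picked first.
waiting = [("A", "T5"), ("A", "T6"), ("B", "T3"), ("C", "T5")]
concurrency = {"A": 3, "B": 1, "C": 2}
picked = schedule_from_queue(waiting, concurrency, 1)
```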
  • In the above steps, the scheduling priority of each sub-table in the scheduling unit is determined according to the sorting result, and sub-tables of the corresponding scheduling priority and quantity are scheduled from the waiting queue of the scheduling unit according to the number of idle threads. In this way, when the sub-tables have different record counts, the optimal solution is selected according to the current concurrency number of each sub-database, so that the extraction pressure of the sharded-table tasks is balanced.
  • To determine the scheduling priority of each sub-table according to the sorting result, the sub-database usage of each scheduling management unit needs to be maintained in the thread concurrency manager.
• Before step S2735, in which the corresponding number of sub-tables are scheduled from the waiting queue according to the scheduling quantity, the method further includes steps S27351 to S27353:
• In step S27351, the number of sub-tables belonging to the first sub-database in the waiting queue is read.
• In step S27353, it is determined whether the number of sub-tables belonging to the first sub-database is greater than or equal to the scheduling quantity.
• In step S2735, if the number is greater than or equal to the scheduling quantity, the corresponding quantity of sub-tables is scheduled from the waiting queue according to the scheduling quantity; if it is determined that the number of sub-tables belonging to the first sub-database is smaller than the scheduling quantity, sub-tables belonging to the first sub-database and to other sub-databases are scheduled from the waiting queue according to the scheduling quantity, where the other sub-databases are sub-databases whose current concurrency number is greater than that of the first sub-database.
• For example, the current concurrency number of sub-database A may be 3, that of sub-database B may be 1, and that of sub-database C may be 2. By analyzing the ratio of the concurrency degrees of the three sub-databases, it can be determined that the current concurrency number of sub-database B is the lowest, so the sub-tables of sub-database B have the highest scheduling priority; the current concurrency number of sub-database A is the highest, so the sub-tables of sub-database A have the lowest scheduling priority; and the current concurrency number of sub-database C lies between the two, so the scheduling priority of the sub-tables of sub-database C also lies between the two.
  • the waiting queue of the scheduling unit TG0 has a sub-table AT5, a sub-table AT6, a sub-table BT3 and a sub-table CT5.
• From the waiting queue it can be read that the number of sub-tables belonging to sub-database B is 1, the number of sub-tables belonging to sub-database C is 1, and the number of sub-tables belonging to sub-database A is 2.
• For example, the current concurrency number of sub-database A may be 3, that of sub-database B may be 1, and that of sub-database C may be 2. By analyzing the ratio of the concurrency degrees of the three sub-databases, it can be determined that the current concurrency number of sub-database B is the lowest, so the sub-tables of sub-database B have the highest scheduling priority; the current concurrency number of sub-database A is the highest, so the scheduling priority of sub-database A is the lowest; and the current concurrency number of sub-database C lies between the two, so the scheduling priority of the sub-tables of sub-database C also lies between the two.
  • the waiting queue of the scheduling unit has a sub-table AT5, a sub-table AT6, a sub-table BT3 and a sub-table CT5.
• The number of sub-tables in the waiting queue belonging to sub-database B can be read as 1, the number belonging to sub-database C as 1, and the number belonging to sub-database A as 2. In the case where scheduling unit TG0 has three idle threads, scheduling unit TG0 schedules sub-table BT3, sub-table CT5, and sub-table AT5 from the waiting queue.
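• The balanced selection in this example can be sketched as a small greedy loop in Python (an illustrative sketch; the names are hypothetical, and the rule of counting already-picked sub-tables toward the load is an assumption consistent with the example above):

```python
from collections import defaultdict, deque

def schedule_tables(waiting_queue, concurrency, n_idle):
    """Schedule up to n_idle sub-tables, always taking the next table from
    the sub-database with the lowest effective concurrency."""
    by_db = defaultdict(deque)            # group waiting tables per sub-database
    for db, tbl in waiting_queue:
        by_db[db].append((db, tbl))
    load = dict(concurrency)              # current concurrency per sub-database
    picked = []
    while len(picked) < n_idle and any(by_db.values()):
        # Greedily choose the non-empty sub-database with the smallest load.
        db = min((d for d in by_db if by_db[d]), key=lambda d: load[d])
        picked.append(by_db[db].popleft())
        load[db] += 1                     # count the picked table as running
    return picked

# Concurrency A=3, B=1, C=2; queue AT5, AT6, BT3, CT5; three idle threads.
queue = [("A", "T5"), ("A", "T6"), ("B", "T3"), ("C", "T5")]
print(schedule_tables(queue, {"A": 3, "B": 1, "C": 2}, 3))
# → [('B', 'T3'), ('C', 'T5'), ('A', 'T5')]
```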
• As another example, the current concurrency number of sub-database A may be 3, that of sub-database B may be 2, and that of sub-database C may be 2. By analyzing the ratio of the concurrency degrees of the three sub-databases, it can be determined that the current concurrency numbers of sub-database B and sub-database C are the lowest, so the sub-tables of sub-database B and sub-database C have the highest scheduling priority; the current concurrency number of sub-database A is the highest, so the sub-tables of sub-database A have the lowest scheduling priority.
  • the waiting queue of the scheduling unit has a sub-table AT5, a sub-table AT6, a sub-table BT3, a sub-table BT4, a sub-table BT5, a sub-table CT5 and a sub-table CT6.
• The number of sub-tables in the waiting queue belonging to sub-database B can be read as 3, the number belonging to sub-database C as 2, and the number belonging to sub-database A as 2. In the case where scheduling unit TG0 has three idle threads, scheduling unit TG0 schedules sub-table BT3, sub-table BT4, and sub-table CT5 from the waiting queue.
• In the above embodiment, if it is determined that the number of sub-tables belonging to the first sub-database is greater than or equal to the scheduling quantity, the corresponding number of sub-tables is scheduled from the waiting queue according to the scheduling quantity; if the number of sub-tables belonging to the first sub-database is less than the scheduling quantity, sub-tables belonging to the first sub-database and to other sub-databases are scheduled from the waiting queue according to the scheduling quantity. Dynamic scheduling equalization is thereby realized, the pressure exerted by the sub-database and sub-table tasks on the extraction-end DB is reduced, and long-tail tasks are reduced.
• Optionally, the sub-tables hashed to different scheduling units are marked with sub-database identifiers, where a sub-database identifier is used to represent the sub-database to which a sub-table corresponds.
• Taking a task set that includes multiple sub-databases and multiple sub-tables as the application scenario, an optional task transmission method based on sub-databases and sub-tables is provided, and the method may include the following steps S61 to S67:
  • Step S61 Acquire a task set of the sub-library table.
• The scheduling terminal 133 reads the names of the sub-databases and sub-tables from the configuration file and extracts the task set to be transmitted from the sub-databases and sub-tables of the source data terminal 131, the task set including multiple sub-databases and the sub-tables corresponding to the sub-databases.
  • the configuration file includes a sub-database A, a sub-database B, and a sub-library C.
• Sub-database A can include four sub-tables, namely sub-table T1, sub-table T2, sub-table T3, and sub-table T4; sub-database B can contain two sub-tables, namely sub-table T1 and sub-table T2; and sub-database C can contain four sub-tables, namely sub-table T1, sub-table T2, sub-table T3, and sub-table T4.
• In step S63, the task set of the sub-databases and sub-tables is hashed into a plurality of scheduling management units.
• The task of the sub-databases and sub-tables is divided into n single-table-granularity tasks (that is, the above-mentioned n sub-tables), and the tasks are uniformly allocated to the scheduling management units according to their sub-databases, with each task marked with the sub-database to which it belongs.
• The scheduling terminal 133 retrieves n sub-tables from the multiple sub-databases according to the total concurrent granularity configured by the user, determines the number of scheduling management units according to the total concurrent granularity configured by the user and the concurrent granularity of a single scheduling management unit, numbers each scheduling management unit, calculates the number of the scheduling management unit corresponding to each of the n single-table-granularity tasks, and hashes the n single-sub-table-granularity tasks to the corresponding scheduling management units.
  • step S63 is the same as the implementation of the step S23 and the step S25 in the foregoing embodiment of the present application, and details are not described herein again.
• In step S65, the plurality of scheduling units obtain the optimal solution from the scheduling module.
• The optimal solution is the N sub-databases with the smallest concurrency numbers that still have unconsumed tasks in the current TG, with the results arranged by concurrency number from small to large (that is, among the above multiple sub-databases, the m sub-tables other than the n sub-tables); N is the request parameter of the hold channel, generally the number of currently hungry channels in the scheduling management unit.
• The scheduling module saves the current concurrency number of each sub-database in the task set of the sub-databases and sub-tables and the usage of each sub-database by each scheduling unit. The scheduling terminal 133 obtains the current concurrency number of each sub-database from the scheduling module, sorts the current concurrency numbers to determine the scheduling priority of each sub-database, and reads the number of idle threads of the scheduling unit and the number of untransmitted sub-tables in the sub-database with the smallest current concurrency number. When the number of idle threads is greater than the number of those sub-tables, the sub-tables of the sub-database with the smallest current concurrency number and sub-tables of other sub-databases are retrieved; when the number of idle threads is less than or equal to the number of those sub-tables, only sub-tables of the sub-database with the smallest current concurrency number are retrieved.
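• The query for the optimal solution described above can be sketched in Python as follows (an illustrative sketch; function and parameter names are hypothetical): it returns the N sub-databases with the smallest current concurrency numbers that still have untransmitted sub-tables, in ascending order of concurrency.

```python
def optimal_solution(concurrency, pending, n_channels):
    """Return the n_channels sub-databases with the smallest current
    concurrency number that still have unconsumed tasks, ascending."""
    candidates = [db for db in concurrency if pending.get(db, 0) > 0]
    candidates.sort(key=lambda db: concurrency[db])  # smallest concurrency first
    return candidates[:n_channels]

# A=3, B=1, C=2 concurrent; every sub-database still has pending sub-tables;
# two channels are currently hungry.
print(optimal_solution({"A": 3, "B": 1, "C": 2},
                       {"A": 2, "B": 1, "C": 1}, 2))  # → ['B', 'C']
```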
• In step S67, the task set is concurrently transmitted to the target location.
• The task set is concurrently transmitted, and when the threads of a scheduling management unit are occupied, the untransmitted tasks are placed in the waiting queue of the scheduling management unit until the scheduling management unit has an idle thread, after which the optimal solution is transmitted in parallel.
  • the implementation manner of the foregoing step S63 is the same as the implementation manner of the S27 in the foregoing embodiment of the present application, and details are not described herein again.
• The task transmission method based on sub-databases and sub-tables according to the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present invention, in essence or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc), including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present invention.
• A task transmission device based on sub-databases and sub-tables for implementing the above task transmission method based on sub-databases and sub-tables is further provided.
• The device includes: an extracting module 71, a retrieving module 73, a processing module 75, and a concurrency module 77.
  • the extracting module 71 is configured to extract a task set to be transmitted from the sub-database table, where the task set includes: a plurality of sub-libraries, and a sub-table included in each sub-library.
  • the retrieving module 73 is configured to retrieve a total of n sub-tables from a plurality of sub-databases according to a pre-configured total concurrent granularity, where n is equal to the total concurrent granularity.
• The processing module 75 is configured to hash the retrieved n sub-tables into different scheduling units in a hash allocation manner, where the unit concurrent granularity of each scheduling unit is pre-configured.
  • the concurrency module 77 is configured to concurrently transmit the sub-tables included in each scheduling unit to the target location according to the unit concurrent granularity.
• The foregoing extracting module 71, retrieving module 73, processing module 75, and concurrency module 77 correspond to steps S21 to S27 in the first embodiment, and the examples and application scenarios implemented by the four modules and the corresponding steps are the same, but are not limited to the content disclosed in the first embodiment. It should be noted that the above modules can be operated as part of the device in the computer terminal 10 provided in the first embodiment.
• In the above embodiment, n sub-tables may be retrieved from the plurality of sub-databases according to the pre-configured total concurrent granularity, the n sub-tables are hashed to different scheduling units by hash allocation, and the sub-tables included in each scheduling unit are concurrently transmitted to the target location according to the unit concurrent granularity. The task set to be transmitted is split according to the total concurrent granularity and the unit concurrent granularity to balance the parallel transmission of the sub-table tasks. With the solution provided by this embodiment of the present application, not only is the task transmission of the sub-databases and sub-tables realized, but the sub-table tasks are also balanced while satisfying the pre-configured total concurrent granularity, so that multiple tasks in the sub-databases and sub-tables can be balanced, the pressure of concurrently reading data is reduced, and the concurrent transmission efficiency is improved. The solution of the foregoing Embodiment 2 provided by the present application thus solves the technical problem in the prior art that excessive pressure is generated on the extraction-end DB when data is concurrently transmitted from the sub-database and sub-table DB.
  • the foregoing apparatus further includes: a first determining module 81 and a calculating module 83, wherein the processing module 75 includes: a first hash allocating module 85.
  • the first determining module 81 is configured to determine the number of scheduling units according to the total concurrent granularity and the unit concurrent granularity, and assign a corresponding number to each scheduling unit.
• Here, the initial value is 0, totalChannel is the total concurrent granularity, and tgChannel is the unit concurrent granularity.
  • the first hash allocation module 85 is configured to hash the n hash tables according to the calculated hash allocation values to corresponding scheduling units.
• The first determining module 81, the calculating module 83, and the first hash allocating module 85 correspond to steps S251 to S255 in the first embodiment, and the examples and application scenarios implemented by these modules and the corresponding steps are the same, but are not limited to the content disclosed in the first embodiment. It should be noted that the above modules can be operated as part of the device in the computer terminal 10 provided in the first embodiment.
  • the foregoing apparatus further includes: a first obtaining module 91, a second hash allocating module 93, and a sub-processing module 95.
  • the first obtaining module 91 is configured to acquire, in real time, m sub-tables other than n sub-tables in the plurality of sub-databases.
• The second hash allocation module 93 is configured to hash the m sub-tables to different scheduling units by hash allocation according to the total concurrent granularity.
• The sub-processing module 95 is configured to: if the threads of a scheduling unit are full, place the sub-tables allocated to that scheduling unit in its waiting queue, and after at least one thread of the scheduling unit is released, schedule the sub-tables in the waiting queue to the corresponding scheduling unit.
  • the first obtaining module 91, the second hash allocating module 93 and the sub-processing module 95 correspond to the steps S2701 to S2705 in the first embodiment, and the examples and corresponding steps implemented by the module and the corresponding steps are The scene is the same, but is not limited to the content disclosed in the first embodiment. It should be noted that the above module can be operated as part of the device in the computer terminal 10 provided in the first embodiment.
  • the foregoing apparatus further includes: a second obtaining module 101, a scheduling module 103, and a transmitting module 105.
• The second obtaining module 101 is configured to obtain the current concurrency number of each sub-database when there is an idle thread in any one of the scheduling units, where the current concurrency number is used to represent the number of sub-tables in the sub-database that have been scheduled to the corresponding scheduling unit. The scheduling module 103 is configured to schedule a corresponding number of sub-tables from the waiting queue of any one of the scheduling units according to the current concurrency number of each sub-database.
  • the transmission module 105 is configured to concurrently transmit a corresponding number of sub-tables from the corresponding waiting queue to the target location.
  • the foregoing second obtaining module 101, the scheduling module 103, and the transmission module 105 correspond to steps S271 to S275 in the first embodiment, and the module is the same as the example and application scenario implemented by the corresponding steps, but It is not limited to the contents disclosed in the above embodiment 1. It should be noted that the above module can be operated as part of the device in the computer terminal 10 provided in the first embodiment.
  • the scheduling module 103 includes: a sorting module 111, a second determining module 113, and a sub-scheduling module 115.
• The sorting module 111 is configured to sort according to the current concurrency number of each sub-database and determine the scheduling priority of the sub-tables belonging to different sub-databases, wherein the lower the current concurrency number of a sub-database, the higher the scheduling priority of the sub-tables in that sub-database.
  • the second determining module 113 is configured to determine the number of scheduling according to the number of idle threads existing in any one of the scheduling units.
• The sub-scheduling module 115 is configured to, after determining that the sub-tables in the first sub-database are the sub-tables with the highest scheduling priority, schedule the sub-tables belonging to the first sub-database from the waiting queue according to the scheduling quantity.
  • the foregoing sorting module 111, the second determining module 113, and the sub-scheduling module 115 correspond to steps S2731 to S2735 in the first embodiment, which are the same as the examples and application scenarios implemented in the corresponding steps, but It is not limited to the contents disclosed in the above embodiment 1. It should be noted that the above module can be operated as part of the device in the computer terminal 10 provided in the first embodiment.
  • the foregoing apparatus further includes: a reading module 121, a determining module 123, a first executing module 125, and a second executing module 127.
• The reading module 121 is configured to read the number of sub-tables belonging to the first sub-database in the waiting queue.
  • the judging module 123 is configured to determine whether the number of the sub-tables belonging to the first sub-database is greater than or equal to the number of scheduling.
  • the first execution module 125 is configured to perform the function of the sub-scheduling module if it is greater than or equal to the number of schedulings.
  • the second execution module 127 is configured to, if less than the number of schedulings, schedule the sub-tables belonging to the first sub-library and belong to other sub-libraries from the waiting queue according to the scheduling quantity, where The other sub-libraries are sub-libraries whose current concurrency is greater than the first sub-library.
• The above reading module 121, determining module 123, first executing module 125, and second executing module 127 correspond to steps S27351 to S27353 in the first embodiment, and the examples and application scenarios implemented by these modules and the corresponding steps are the same, but are not limited to the content disclosed in the first embodiment. It should be noted that the above modules can be operated as part of the device in the computer terminal 10 provided in the first embodiment.
• A task transmission system based on sub-databases and sub-tables is further provided.
  • the system may include: a source data terminal 131, a scheduling terminal 133, and a target terminal 135.
• The source data terminal 131 is configured to store the sub-databases and sub-tables.
• The scheduling terminal 133 communicates with the source data terminal 131 and is configured to: extract the task set to be transmitted from the sub-databases and sub-tables, where the task set includes a plurality of sub-databases and the sub-tables included in each sub-database; retrieve n sub-tables from the multiple sub-databases according to the pre-configured total concurrent granularity, n being equal to the total concurrent granularity; and, after hashing the n sub-tables to different scheduling units in a hash allocation manner, concurrently transmit the sub-tables included in each scheduling unit according to the unit concurrent granularity, where the unit concurrent granularity of each scheduling unit is pre-configured.
• A task set may include multiple sub-databases, each sub-database may include multiple sub-tables, and each sub-table may record multiple pieces of data information, which may be registered-user data information, webpage-access data information, product-purchase data information, and so on. After the task set to be transmitted is extracted from the sub-databases and sub-tables, the tasks can be numbered according to the extraction order.
• Before the task set to be transmitted is extracted from the sub-databases and sub-tables, a configuration file may be obtained, in which the names of the plurality of sub-databases to be transmitted and the names of the sub-tables included in each sub-database are recorded. After the configuration file is read, according to the names of the plurality of sub-databases and the names of the plurality of sub-tables that are read, the plurality of sub-databases to be transmitted and the sub-tables contained in those sub-databases are extracted from the sub-databases and sub-tables of the source database.
  • the task set may include three sub-databases, namely, sub-database A, sub-database B, and sub-library C.
• Sub-database A may include four sub-tables, namely sub-table T1, sub-table T2, sub-table T3, and sub-table T4; sub-database B can contain two sub-tables, namely sub-table T1 and sub-table T2; and sub-database C can contain four sub-tables, namely sub-table T1, sub-table T2, sub-table T3, and sub-table T4.
• The total concurrent granularity indicates the number of simultaneously concurrent tasks, and n sub-tables satisfying the total concurrent granularity may be sequentially extracted from the task set according to the pre-configured total concurrent granularity.
  • the pre-configured total concurrent granularity may be configured according to the actual needs of the user, or may be configured according to the concurrent capability of the database.
• The pre-configured total concurrent granularity may be 9. According to the total concurrent granularity, sub-table T1, sub-table T2, sub-table T3, and sub-table T4 are extracted from sub-database A; sub-table T1 and sub-table T2 are extracted from sub-database B; and sub-table T1, sub-table T2, and sub-table T3 are extracted from sub-database C. The number of extracted sub-tables is the same as the total concurrent granularity.
  • the n sub-tables are sequentially hashed into different scheduling units according to the hash allocation manner.
• The pre-configured unit concurrent granularity (tgChannel) of each scheduling unit is used to indicate the number of concurrent tasks of each scheduling unit, and the unit concurrent granularity is less than or equal to the total concurrent granularity.
  • the scheduling unit may be a scheduling management module (TG, a shorthand for taskGroup), and the sub-database scheduling task may include a sub-database A, a sub-database B, and a sub-database C.
  • Each sub-database can include n sub-tables, each of which is a task to be transmitted, and 3n sub-tables are hashed into n scheduling management units, and each scheduling management unit can include three sub-tables. .
• The unit concurrent granularity of each scheduling unit may be configured as 3, and the extracted 9 sub-tables may be hashed to different scheduling units: scheduling unit 1 may include AT3, BT2, and CT3; scheduling unit 2 may include AT1, AT4, and CT1; and scheduling unit 3 may include AT2, BT1, and CT2.
• The hash allocation hashes the sub-tables in the same sub-database into different scheduling units, so that the multiple sub-tables in different sub-databases achieve balanced concurrency and the concurrency pressure on the extraction-end DB is reduced.
  • the target terminal 135 is in communication with the scheduling terminal 133 and is configured to receive a set of tasks transmitted by the scheduling terminal concurrently.
  • the target terminal may be a target database for storing the extracted data information.
  • each scheduling unit concurrently transmits the hashed n sub-tables to the target database according to the unit concurrent granularity.
  • the plurality of sub-databases may be retrieved according to the pre-configured total concurrent granularity.
  • the sub-tables are hashed to different scheduling units by hashing the n sub-tables, and the sub-tables included in each scheduling unit are concurrently transmitted to the target location according to the unit concurrent granularity.
  • the task set to be transmitted is split according to the total concurrent granularity and unit concurrency granularity to balance the task of parallel transfer of the sub-table.
  • the task transmission based on the sub-database table can be realized by the solution provided by the embodiment of the present application, so that not only the task transmission of the sub-database sub-table is realized, but also, when the task of the sub-database sub-table is transmitted In the case of satisfying the pre-configured total concurrent granularity, the task of transferring the sub-tables is balanced, so that multiple tasks in the sub-division table can be balanced, the pressure of concurrently reading data is reduced, and the concurrent transmission efficiency is improved.
  • the solution of the foregoing embodiment 3 provided by the present application solves the problem that the data extracted by the extracting end from the sub-database sub-table DB is excessively generated when the data is concurrently transmitted by the sub-database sub-table DB in the prior art.
• Here, totalChannel is the total concurrent granularity, tgChannel is the unit concurrent granularity, and the n sub-tables are respectively hashed to the corresponding scheduling units according to the calculated hash allocation values.
  • the total concurrent granularity T may be 9
  • the unit concurrent granularity t may be 3
• Tpos = (TCount + offset) % tgCount, which gives the number of the scheduling unit TGi to which each sub-table is assigned.
• The numbers of sub-database A, sub-database B, and sub-database C are 0, 1, and 2, respectively, and the offset of sub-database A is 0.
• The n sub-tables are hashed to the corresponding scheduling units.
• The sub-table T1, sub-table T2, sub-table T3, and sub-table T4 in sub-database A, the sub-table T1 and sub-table T2 in sub-database B, and the sub-table T1, sub-table T2, sub-table T3, and sub-table T4 in sub-database C are hashed into the corresponding scheduling units. The hash result may be: sub-table AT3, sub-table BT2, and sub-table CT3 are hashed to scheduling unit TG0; sub-table AT1, sub-table AT4, sub-table CT1, and sub-table CT4 are hashed to scheduling unit TG1; and sub-table AT2, sub-table BT1, and sub-table CT2 are hashed to scheduling unit TG2.
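• A minimal Python sketch of this hash allocation, reproducing the example result above, is given below. The rule that each sub-database's offset equals the cumulative count of sub-tables in the preceding sub-databases is an assumption (the text only states that the offset of sub-database A is 0), chosen because it reproduces the stated assignment; the function name is illustrative.

```python
def assign_units(sub_databases, total_channel, tg_channel):
    """Hash each sub-table to a scheduling unit via
    Tpos = (table_index + offset) % tgCount, tgCount = totalChannel // tgChannel."""
    tg_count = total_channel // tg_channel
    units = {i: [] for i in range(tg_count)}
    offset = 0                                        # offset of sub-database A is 0
    for db, tables in sub_databases:
        for idx, tbl in enumerate(tables, start=1):   # 1-based table index
            units[(idx + offset) % tg_count].append(db + tbl)
        offset += len(tables)                         # assumed cumulative offset rule
    return units

dbs = [("A", ["T1", "T2", "T3", "T4"]),
       ("B", ["T1", "T2"]),
       ("C", ["T1", "T2", "T3", "T4"])]
units = assign_units(dbs, 9, 3)
print(units[0])  # → ['AT3', 'BT2', 'CT3']
print(units[1])  # → ['AT1', 'AT4', 'CT1', 'CT4']
print(units[2])  # → ['AT2', 'BT1', 'CT2']
```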
• In the above embodiment, the number of scheduling units may be determined according to the pre-configured total concurrent granularity and the unit concurrent granularity, a corresponding number may be assigned to each scheduling unit, and the number of the scheduling unit corresponding to each sub-table in each sub-database may be calculated by the above formula, so as to split the tasks of the sub-databases and sub-tables. The tasks of the sub-databases and sub-tables are thereby evenly distributed to different scheduling units, avoiding the waste of resources caused by the sequential transmission of sub-database and sub-table tasks in the prior art.
• Optionally, the scheduling terminal 133 is further configured to: in the process of concurrently transmitting the sub-tables included in each scheduling unit to the target location according to the unit concurrent granularity, acquire in real time the m sub-tables other than the n sub-tables in the multiple sub-databases, and hash the m sub-tables to different scheduling units by hash allocation according to the total concurrent granularity. If the threads of a scheduling unit are full, the sub-tables allocated to that scheduling unit are placed in its waiting queue, and after at least one thread of the scheduling unit is released, the sub-tables in the waiting queue are scheduled to the corresponding scheduling unit.
• The sub-tables that have not been transmitted in the plurality of sub-databases are hashed to different scheduling units. If the threads of a scheduling unit are occupied, the hashed sub-table is placed in the waiting queue until any thread of the scheduling unit is released, after which it can be concurrently transmitted by the scheduling unit to the target location; if the threads of the scheduling unit are not full, the hashed sub-table is directly transmitted by the scheduling unit to the target location.
• The pre-configured total concurrent granularity may be 9, and the sub-tables extracted according to the total concurrent granularity are sub-table AT1, sub-table AT2, sub-table AT3, sub-table AT4, sub-table BT1, sub-table BT2, sub-table CT1, sub-table CT2, and sub-table CT3. At this time, the remaining untransmitted sub-table of the three sub-databases is sub-table CT4.
• Sub-table CT4 should be assigned to scheduling unit TG1, but because the threads of scheduling unit TG1 are fully occupied, sub-table CT4 first enters the waiting queue of scheduling unit TG1, as shown by the dashed line in FIG. After any one of sub-table AT1, sub-table AT4, and sub-table CT1 is transmitted to the target location and its thread is released, sub-table CT4 in the waiting queue is dispatched to scheduling unit TG1 and transmitted to the target location.
• In the above embodiment, the remaining m sub-tables of the plurality of sub-databases are acquired and allocated to the corresponding scheduling units after hash allocation. If the threads of a scheduling unit are full, the sub-table enters the waiting queue until at least one thread of the scheduling unit is released, implementing the splitting and allocation of the sub-tables: a sub-table enters the waiting queue when all threads are occupied and enters the scheduling unit after at least one thread is released, ensuring maximum resource utilization when the tasks of the sub-databases are concurrently transmitted.
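• The thread-budget-plus-waiting-queue behaviour of a scheduling unit can be sketched in Python as follows (an illustrative sketch using the TG1 example above; the class and method names are hypothetical):

```python
from collections import deque

class SchedulingUnit:
    """A scheduling unit (TG) with a fixed thread budget and a waiting queue."""
    def __init__(self, tg_channel):
        self.tg_channel = tg_channel    # unit concurrent granularity
        self.running = []               # sub-tables currently occupying threads
        self.waiting = deque()          # overflow sub-tables

    def submit(self, table):
        if len(self.running) < self.tg_channel:
            self.running.append(table)  # an idle thread exists: run immediately
        else:
            self.waiting.append(table)  # threads full: enter the waiting queue

    def on_finished(self, table):
        self.running.remove(table)      # a thread is released
        if self.waiting:
            self.running.append(self.waiting.popleft())

# TG1 holds AT1, AT4, and CT1; CT4 must wait until a thread is released.
tg1 = SchedulingUnit(3)
for t in ("AT1", "AT4", "CT1"):
    tg1.submit(t)
tg1.submit("CT4")
print(list(tg1.waiting))   # → ['CT4']
tg1.on_finished("AT1")
print(tg1.running)         # → ['AT4', 'CT1', 'CT4']
```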
• The above embodiment can balance the DB extraction pressure of the sharded tasks, but only in an ideal situation: in most tasks the sub-tables of the same sub-database do not hold the same number of records, and the extraction time cannot be predicted. At this time, the scheduling module needs to perform control scheduling and select the optimal solution.
• Optionally, the scheduling terminal 133 is further configured to: after the sub-tables included in each scheduling unit are concurrently transmitted to the target location according to the unit concurrent granularity, in a case where there is an idle thread in any one of the scheduling units, obtain the current concurrency number of each sub-database, wherein the current concurrency number is used to represent the number of sub-tables in the sub-database that have been scheduled to the corresponding scheduling units; schedule a corresponding number of sub-tables from the waiting queue of that scheduling unit according to the current concurrency number of each sub-database; and concurrently transmit the scheduled sub-tables from the waiting queue to the target location.
• Optionally, the current concurrency number of each of the foregoing sub-databases may be saved in a thread concurrency manager (TC-MGR, short for Thread-Concurrency-Manager). When a thread is started, the concurrency number of the thread's corresponding sub-database in the thread concurrency manager is incremented by 1; when the thread finishes executing, the concurrency number in the thread concurrency manager is decremented by 1. The two operations are marked as synchronized so that threads access them mutually exclusively; they may be called the hold channel and the release channel.
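• The hold/release bookkeeping of the thread concurrency manager can be sketched as follows (an illustrative sketch only; a lock stands in for the synchronized marking, and the class and method names are hypothetical):

```python
import threading


class ThreadConcurrencyManager:
    """Tracks the current concurrency number per sub-database, as described for TC-MGR."""

    def __init__(self):
        self._lock = threading.Lock()  # hold and release are mutually exclusive
        self._concurrency = {}         # sub-database name -> current concurrency number

    def hold(self, sub_db):
        """Called when a thread starts transmitting a sub-table of sub_db: increment by 1."""
        with self._lock:
            self._concurrency[sub_db] = self._concurrency.get(sub_db, 0) + 1

    def release(self, sub_db):
        """Called when the thread finishes executing: decrement by 1."""
        with self._lock:
            self._concurrency[sub_db] -= 1

    def current(self, sub_db):
        """Current concurrency number of sub_db (0 if it has never been scheduled)."""
        with self._lock:
            return self._concurrency.get(sub_db, 0)
```

The lock guarantees the mutually exclusive access mentioned above; in a real synchronization tool the counters would be read by the scheduling module each time a thread needs to be started.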
• Further, a greedy algorithm may be used: when a thread needs to be started, the current concurrency number of each sub-database is obtained from the scheduling module.
• For example, in a case where at least one of the three threads of the scheduling unit TG0 is idle, for example thread 1 is idle, the scheduling unit TG0 acquires the current concurrency number of each sub-database from the scheduling module. If the scheduling unit releases two threads, two sub-tables in the waiting queue of the scheduling unit are scheduled according to the current concurrency number of each sub-database.
• For example, the current concurrency number of sub-database A recorded in the thread concurrency manager may be 3, the current concurrency number of sub-database B may be 1, and the current concurrency number of sub-database C may be 2. If the number of threads to be started is 2, two sub-tables of sub-database B can be scheduled from the waiting queue of the scheduling unit; if there is only one sub-table of sub-database B in the waiting queue, a sub-table of sub-database C is further scheduled from the waiting queue.
• For example, the waiting queue of the scheduling unit TG1 saves a sub-table T4 of sub-database B and sub-tables of other sub-databases. At this time, according to the proportion of the concurrency number of each sub-database, when a thread is released, the sub-table T4 preferentially enters the scheduling unit TG1. If the threads of TG1 are all released, the other sub-tables in the waiting queue besides T4 can also be retrieved. The retrieval rule of any other scheduling unit is the same.
• Finally, the corresponding number of sub-tables dispatched from the waiting queue of the scheduling unit are concurrently transmitted to the target location.
• In the foregoing process, the greedy algorithm starts the starved channel first when a scheduling unit has channel thread starvation, thereby moving from a local optimum toward the global optimum and achieving the purpose of balanced concurrent transmission of the sub-tables.
• Optionally, the scheduling terminal 133 is further configured to sort according to the current concurrency number of each sub-database and determine the scheduling priority of the sub-tables belonging to different sub-databases, wherein the lower the current concurrency number of a sub-database, the higher the scheduling priority of the sub-tables in that sub-database; determine the scheduling number according to the number of idle threads in any one of the scheduling units; and, after determining that the sub-tables in a first sub-database are the sub-tables with the highest scheduling priority, schedule the sub-tables belonging to the first sub-database from the waiting queue according to the scheduling number.
• Optionally, the sub-databases are sorted in ascending order by current concurrency number: the sub-tables of the sub-database with the lowest current concurrency number have the highest priority, and the sub-tables of the sub-database with the highest current concurrency number have the lowest priority. The scheduling priority of each sub-table in the waiting queue of the scheduling unit is thereby determined.
• Optionally, when the number of idle threads in the scheduling unit is less than or equal to the number of sub-tables in its waiting queue, the scheduling number is determined to be the number of idle threads; when the number of idle threads is greater than the number of sub-tables in the waiting queue, the scheduling number is determined to be the number of sub-tables in the waiting queue of the scheduling unit.
• After determining the scheduling priority and the scheduling number of the sub-tables belonging to different sub-databases, the scheduling unit dispatches, from its waiting queue, the sub-tables that match the scheduling number and the scheduling priority.
• For example, the current concurrency number of sub-database A may be 3, the current concurrency number of sub-database B may be 1, and the current concurrency number of sub-database C may be 2. By analyzing the ratio of the concurrency numbers of the three sub-databases, it can be determined that the current concurrency number of sub-database B is the lowest, indicating that in the previous concurrency process the distribution or synchronization efficiency of the sub-tables in sub-database B was the lowest, so the sub-tables of sub-database B have the highest scheduling priority and the system needs to process the sub-tables in sub-database B as soon as possible; the current concurrency number of sub-database A is the highest, so the sub-tables of sub-database A have the lowest scheduling priority.
• For example, the waiting queue of the scheduling unit has sub-table AT5, sub-table AT6, sub-table BT3 and sub-table CT5. When one thread is idle, according to the analysis result of the scheduling priority, the scheduling unit dispatches sub-table BT3 from the waiting queue for synchronization.
• In the foregoing process, the scheduling priority of each sub-table in the scheduling unit is determined according to the sorting result, and sub-tables matching the scheduling priority and of the corresponding number are dispatched from the waiting queue of the scheduling unit according to the number of idle threads, so that, in a case where the numbers of records of the sub-tables differ, the optimal solution is selected according to the current concurrency number of each sub-database and the extraction pressure of the sharded task is balanced.
• It should be noted that, to determine the scheduling priority of each sub-table in the scheduling unit according to the sorting result, the sub-database usage of each scheduling management unit needs to be saved in the thread concurrency manager.
• Optionally, the scheduling terminal 133 is further configured to: before scheduling the corresponding number of sub-tables from the waiting queue according to the scheduling number, read the number of sub-tables in the waiting queue that belong to the first sub-database, and determine whether the number of sub-tables belonging to the first sub-database is greater than or equal to the scheduling number. If it is determined that the number of sub-tables belonging to the first sub-database is greater than or equal to the scheduling number, the corresponding number of sub-tables are scheduled from the waiting queue according to the scheduling number; if it is determined that the number of sub-tables belonging to the first sub-database is less than the scheduling number, the sub-tables belonging to the first sub-database together with sub-tables belonging to other sub-databases are scheduled from the waiting queue according to the scheduling number, wherein the other sub-databases are sub-databases whose current concurrency number is greater than that of the first sub-database.
• For example, the current concurrency number of sub-database A may be 3, the current concurrency number of sub-database B may be 1, and the current concurrency number of sub-database C may be 2. By analyzing the ratio of the concurrency numbers of the three sub-databases, it can be determined that the current concurrency number of sub-database B is the lowest, so the sub-tables of sub-database B have the highest scheduling priority; the current concurrency number of sub-database A is the highest, so the sub-tables of sub-database A have the lowest scheduling priority; the current concurrency number of sub-database C lies between the two, so the scheduling priority of the sub-tables of sub-database C also lies between the two.
• Suppose the waiting queue of the scheduling unit TG0 has sub-table AT5, sub-table AT6, sub-table BT3 and sub-table CT5. It can be read that the number of sub-tables in the waiting queue belonging to sub-database B is 1, the number belonging to sub-database C is 1, and the number belonging to sub-database A is 2. In a case where the scheduling unit TG0 has three idle threads, the scheduling unit TG0 schedules sub-table BT3, sub-table CT5 and sub-table AT5 from the waiting queue.
• As another example, the current concurrency number of sub-database A may be 3, the current concurrency number of sub-database B may be 2, and the current concurrency number of sub-database C may be 2. By analyzing the ratio of the concurrency numbers of the three sub-databases, it can be determined that the current concurrency numbers of sub-database B and sub-database C are the lowest, so the sub-tables of sub-database B and sub-database C have the highest scheduling priority, and the current concurrency number of sub-database A is the highest, so the sub-tables of sub-database A have the lowest scheduling priority.
• Suppose the waiting queue of the scheduling unit has sub-table AT5, sub-table AT6, sub-table BT3, sub-table BT4, sub-table BT5, sub-table CT5 and sub-table CT6. It can be read that the number of sub-tables in the waiting queue belonging to sub-database B is 3, the number belonging to sub-database C is 2, and the number belonging to sub-database A is 2. In a case where the scheduling unit TG0 has three idle threads, the scheduling unit TG0 schedules sub-table BT3, sub-table BT4 and sub-table CT5 from the waiting queue.
• In the foregoing process, before the corresponding number of sub-tables are scheduled from the waiting queue according to the scheduling number, it is determined whether the number of sub-tables belonging to the first sub-database is greater than or equal to the scheduling number: if so, the corresponding number of sub-tables are scheduled from the waiting queue according to the scheduling number; if the number of sub-tables belonging to the first sub-database is less than the scheduling number, the sub-tables belonging to the first sub-database together with sub-tables belonging to other sub-databases are scheduled from the waiting queue according to the scheduling number, thereby realizing dynamic scheduling equalization, reducing the pressure that the sharded task places on the extraction-side DB, and reducing long-tail tasks.
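• The greedy selection described above, with priority by ascending current concurrency number, the scheduling number capped by both the idle threads and the queue length, and fallback to higher-concurrency sub-databases when the first sub-database has too few queued sub-tables, can be sketched as follows (the function name and data shapes are illustrative assumptions):

```python
def select_from_waiting_queue(waiting, concurrency, idle_threads):
    """Pick the sub-tables to schedule from a unit's waiting queue.

    waiting:      list of (sub_database, sub_table) pairs currently queued
    concurrency:  current concurrency number per sub-database (from the TC-MGR)
    idle_threads: number of threads just released in this scheduling unit
    """
    # Scheduling number: the idle-thread count, capped by the queue length.
    n = min(idle_threads, len(waiting))
    # Sort queued sub-tables by their sub-database's current concurrency,
    # ascending: lowest concurrency means highest scheduling priority.
    ordered = sorted(waiting, key=lambda item: concurrency[item[0]])
    # Taking the first n implements the fallback automatically: once the
    # highest-priority sub-database runs out, higher-concurrency ones follow.
    return [table for _, table in ordered[:n]]
```

With concurrency A=3, B=1, C=2 and the queue AT5, AT6, BT3, CT5 from the example above, three idle threads select BT3, CT5 and AT5, matching the behaviour described in the text.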
• Optionally, the scheduling terminal 133 is further configured to label the sub-tables hashed to different scheduling units with a sub-database identifier, where the sub-database identifier is used to represent the sub-database to which the sub-table originally belongs.
• Embodiments of the present invention may provide a computer terminal, which may be any computer terminal in a computer terminal group.
• Optionally, the foregoing computer terminal may also be replaced with a terminal device such as a mobile terminal.
• Optionally, the computer terminal may be located in at least one of a plurality of network devices of a computer network.
• In this embodiment, the computer terminal may execute program code of the following steps of the task transmission method based on the sub-databases and sub-tables of the present application: extracting a task set to be transmitted from the sub-databases and sub-tables, wherein the task set includes a plurality of sub-databases and the sub-tables included in each sub-database; retrieving n sub-tables from the plurality of sub-databases according to a pre-configured total concurrent granularity, n being equal to the total concurrent granularity; hashing the retrieved n sub-tables to different scheduling units by hash allocation, wherein each scheduling unit is pre-configured with the same unit concurrent granularity; and concurrently transmitting the sub-tables included in each scheduling unit to the target location according to the unit concurrent granularity.
  • FIG. 14 is a structural block diagram of a computer terminal according to an embodiment of the present invention.
  • the computer terminal A may include one or more (only one shown in the figure) processor, memory, and transmission means.
• The memory can be used to store software programs and modules, such as the program instructions/modules corresponding to the task transmission method and apparatus based on the sub-databases and sub-tables in the embodiments of the present invention. The processor executes various function applications and data processing by running the software programs and modules stored in the memory, that is, implements the above task transmission method.
  • the memory may include a high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • the memory can further include memory remotely located relative to the processor, which can be connected to terminal A via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
• The processor may call the information stored in the memory and the application program through the transmission device to perform the following steps: extracting a task set to be transmitted from the sub-databases and sub-tables, wherein the task set includes a plurality of sub-databases and the sub-tables included in each sub-database; retrieving n sub-tables from the plurality of sub-databases according to a pre-configured total concurrent granularity, n being equal to the total concurrent granularity; hashing the retrieved n sub-tables to different scheduling units by hash allocation, wherein each scheduling unit is pre-configured with the same unit concurrent granularity; and concurrently transmitting the sub-tables included in each scheduling unit to the target location according to the unit concurrent granularity.
• Optionally, the foregoing processor may further execute the following program code: obtaining in real time the m sub-tables of the plurality of sub-databases other than the n sub-tables, and hashing the m sub-tables to different scheduling units by hash allocation according to the total concurrent granularity, wherein, if the threads of a scheduling unit are full, the sub-table allocated to that scheduling unit is placed in its waiting queue, and after at least one thread of the scheduling unit is released, the sub-table in the waiting queue is dispatched to the corresponding scheduling unit.
• Optionally, the foregoing processor may further execute the following program code: if there is an idle thread in any one of the scheduling units, obtaining the current concurrency number of each sub-database, wherein the current concurrency number is used to represent the number of sub-tables in the sub-database that have been scheduled to the corresponding scheduling units; scheduling the corresponding number of sub-tables from the waiting queue of that scheduling unit according to the current concurrency number of each sub-database; and concurrently transmitting the corresponding number of sub-tables to the target location.
• Optionally, the foregoing processor may further execute the following program code: sorting according to the current concurrency number of each sub-database and determining the scheduling priority of the sub-tables belonging to different sub-databases, wherein the lower the current concurrency number of a sub-database, the higher the scheduling priority of its sub-tables; determining the scheduling number according to the number of idle threads in any one of the scheduling units; and, after determining that the sub-tables in the first sub-database are the sub-tables with the highest scheduling priority, scheduling the sub-tables belonging to the first sub-database from the waiting queue according to the scheduling number.
• Optionally, the foregoing processor may further execute the following program code: reading the number of sub-tables in the waiting queue that belong to the first sub-database; determining whether the number of sub-tables belonging to the first sub-database is greater than or equal to the scheduling number; if it is greater than or equal to the scheduling number, scheduling the corresponding number of sub-tables from the waiting queue according to the scheduling number; if it is less than the scheduling number, scheduling the sub-tables belonging to the first sub-database together with sub-tables belonging to other sub-databases from the waiting queue according to the scheduling number, wherein the other sub-databases are sub-databases whose current concurrency number is greater than that of the first sub-database.
• By means of the foregoing embodiments, after the task set to be transmitted is extracted, n sub-tables may be retrieved from the plurality of sub-databases according to the pre-configured total concurrent granularity, the retrieved n sub-tables are hashed to different scheduling units by hash allocation, and the sub-tables included in each scheduling unit are concurrently transmitted to the target location according to the unit concurrent granularity. The scheme can split the task set to be transmitted according to the total concurrent granularity and the unit concurrent granularity, so as to balance the parallel transmission of the sub-table tasks.
• The solution provided by the embodiments of the present application thus realizes task transmission based on sub-databases and sub-tables; moreover, when the sharded tasks are transmitted, the transmission of the sub-tables is balanced while satisfying the pre-configured total concurrent granularity, so that multiple sharded tasks can be balanced, the pressure of concurrently reading data is reduced, and the concurrent transmission efficiency is improved.
• The solutions of the foregoing embodiments provided by the present application solve the technical problem in the prior art that, when tasks based on sub-databases and sub-tables are transmitted concurrently, the pressure of the extraction end concurrently reading data from the sharded DB is excessive, resulting in low efficiency of concurrently transmitting the sharded tasks.
• Those skilled in the art can understand that the structure shown in FIG. 14 is only for illustration, and the computer terminal can also be a terminal device such as a smart phone (such as an Android mobile phone or an iOS mobile phone), a tablet computer, a palmtop computer, or a mobile Internet device (MID, PAD).
• FIG. 14 does not limit the structure of the above electronic device.
• For example, the computer terminal may also include more or fewer components (such as a network interface, a display device, etc.) than shown in FIG. 14, or have a configuration different from that shown in FIG. 14.
  • Embodiments of the present invention also provide a storage medium.
• Optionally, in this embodiment, the foregoing storage medium may be used to save the program code executed by the task transmission method based on the sub-databases and sub-tables provided in the first embodiment.
  • the foregoing storage medium may be located in any one of the computer terminal groups in the computer network, or in any one of the mobile terminal groups.
• Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: extracting a task set to be transmitted from the sub-databases and sub-tables, wherein the task set includes a plurality of sub-databases and the sub-tables included in each sub-database; retrieving n sub-tables from the plurality of sub-databases according to a pre-configured total concurrent granularity, n being equal to the total concurrent granularity; hashing the retrieved n sub-tables to different scheduling units by hash allocation, wherein each scheduling unit is pre-configured with the same unit concurrent granularity; and concurrently transmitting the sub-tables included in each scheduling unit to the target location according to the unit concurrent granularity.
• Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: determining the number of scheduling units according to the total concurrent granularity and the unit concurrent granularity, and computing an assignment value for each sub-table from an offset, wherein offset is the offset assigned by each sub-database to the corresponding scheduling unit with an initial value of 0, totalChannel is the total concurrent granularity, and tgChannel is the unit concurrent granularity; and hashing each of the n sub-tables to the corresponding scheduling unit according to its calculated assignment value.
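• The allocation step can be sketched as follows. The text does not spell out the exact hash expression, so the round-robin rule below (a running offset with initial value 0 taken modulo the number of units) is an assumption; only the roles of totalChannel and tgChannel follow the text:

```python
def allocate(sub_tables, total_channel, tg_channel):
    """Hash sub-tables to scheduling units.

    The number of scheduling units follows from the total concurrent
    granularity (totalChannel) and the unit concurrent granularity
    (tgChannel); the concrete hash rule used here is illustrative only.
    """
    num_units = total_channel // tg_channel      # e.g. 9 // 3 = 3 scheduling units
    units = {i: [] for i in range(num_units)}
    offset = 0                                   # initial value 0, as in the text
    for table in sub_tables:
        units[offset % num_units].append(table)  # assumed hash: round-robin over units
        offset += 1
    return units
```

With totalChannel 9 and tgChannel 3, the nine sub-tables of the earlier example are spread evenly, three per scheduling unit.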
• Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: in the process of concurrently transmitting the sub-tables included in each scheduling unit to the target location according to the unit concurrent granularity, obtaining in real time the m sub-tables of the plurality of sub-databases other than the n sub-tables, and hashing the m sub-tables to different scheduling units by hash allocation according to the total concurrent granularity, wherein, if the threads of a scheduling unit are full, the sub-table allocated to that scheduling unit is placed in its waiting queue, and after at least one thread of the scheduling unit is released, the sub-table in the waiting queue is scheduled into the corresponding scheduling unit.
  • the storage medium is configured to store program code for performing the following steps: in the case where there is an idle thread in any one of the scheduling units, the current concurrency number of each sub-database is obtained, wherein The current concurrency number is used to represent the number of sub-tables in the sub-database that have been scheduled to the corresponding scheduling unit; and according to the current concurrency number of each sub-database, the corresponding number of sub-tables are scheduled from the waiting queue of any one scheduling unit; A corresponding number of sub-tables are scheduled to be concurrently transmitted from the corresponding waiting queue to the target location.
• Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: sorting according to the current concurrency number of each sub-database and determining the scheduling priority of the sub-tables belonging to different sub-databases, wherein the lower the current concurrency number of a sub-database, the higher the scheduling priority of its sub-tables; determining the scheduling number according to the number of idle threads in any one of the scheduling units; and, after determining that the sub-tables in the first sub-database are the sub-tables with the highest scheduling priority, scheduling the sub-tables belonging to the first sub-database from the waiting queue according to the scheduling number.
• Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: reading the number of sub-tables in the waiting queue that belong to the first sub-database; and determining whether the number of sub-tables belonging to the first sub-database is greater than or equal to the scheduling number; wherein, if it is greater than or equal to the scheduling number, the corresponding number of sub-tables are scheduled from the waiting queue according to the scheduling number; if it is less than the scheduling number, the sub-tables belonging to the first sub-database together with sub-tables belonging to other sub-databases are scheduled from the waiting queue according to the scheduling number, wherein the other sub-databases are sub-databases whose current concurrency number is greater than that of the first sub-database.
• Optionally, in this embodiment, the storage medium is configured to store program code for labeling the sub-tables hashed to different scheduling units with a sub-database identifier, the sub-database identifier being used to represent the sub-database to which the sub-table originally belongs.
  • the disclosed technical contents may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
• For example, the division of the units is only a logical function division; in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, unit or module, and may be electrical or otherwise.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
• Based on such understanding, the part of the technical solution of the present invention that is essential, or that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
• The software product includes a number of instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
• The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Abstract

The present invention discloses a task transmission method, apparatus and system based on sub-databases and sub-tables. The method includes: extracting a task set to be transmitted from the sub-databases and sub-tables, wherein the task set includes a plurality of sub-databases and the sub-tables included in each sub-database; retrieving n sub-tables from the plurality of sub-databases according to a pre-configured total concurrent granularity, n being equal to the total concurrent granularity; hashing the retrieved n sub-tables to different scheduling units by hash allocation, wherein each scheduling unit is pre-configured with the same unit concurrent granularity; and concurrently transmitting the sub-tables included in each scheduling unit to the target location according to the unit concurrent granularity. The present invention solves the technical problem in the prior art that, when tasks based on sub-databases and sub-tables are transmitted concurrently, the pressure of the extraction end concurrently reading data from the sharded DB is excessive, resulting in low efficiency of concurrently transmitting the sharded tasks.

Description

Task transmission method, apparatus and system based on sub-databases and sub-tables
This application claims priority to Chinese Patent Application No. 201510888403.9, filed on December 7, 2015 and entitled "Task transmission method, apparatus and system based on sub-databases and sub-tables", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of database technologies, and in particular, to a task transmission method, apparatus and system based on sub-databases and sub-tables.
Background
With the wide spread of Internet applications, a large Internet application has billions of page views per day, and a large amount of data is stored in databases. Before a big-data platform processes the data, the data needs to be imported into the storage system of the big-data platform, generally using ETL (Extract-Transform-Load) technology.
In ETL technology, data synchronization, as the exit and entrance of the data warehouse, plays a very important role, especially offline data synchronization: a single task often needs to synchronize hundreds of GB or even TB of data, which places very high requirements on the stability of the data synchronization tool. At the same time, because the extraction-side database (DB) supports concurrent reading, the pressure on both the extraction-side and the write-side DB is also great. It is precisely the synchronization of large amounts of data that lets people pay attention more conveniently to data that was previously overlooked, which produces more long-tail tasks in the data synchronization process, so that the pressure on the extraction-side DB and the long tail gradually become the bottleneck of data synchronization.
The solutions in the prior art are all directed to a single database (that is, one task extracts only one database): a service layer is built on top of the underlying synchronization tool, and scheduling control is performed in the service to avoid too many concurrent tasks extracting from one database at the same time.
As the amount of data increases, a single database can no longer meet the demand for large amounts of data, and a single database needs to be split into multiple databases and multiple tables to store the data. For a single sharded task, since the task extracts from multiple sub-databases, the extraction strategy for the sub-databases directly determines the extraction speed of the task, so the above single-database solution is no longer applicable.
For the technical problem in the prior art that, when tasks based on sub-databases and sub-tables are transmitted concurrently, the pressure of the extraction end concurrently reading data from the sharded DB is excessive, resulting in low efficiency of concurrently transmitting the sharded tasks, no effective solution has yet been proposed.
Summary of the Invention
The embodiments of the present invention provide a task transmission method, apparatus and system based on sub-databases and sub-tables, so as at least to solve the technical problem in the prior art that, when tasks based on sub-databases and sub-tables are transmitted concurrently, the pressure of the extraction end concurrently reading data from the sharded DB is excessive, resulting in low efficiency of concurrently transmitting the sharded tasks.
According to one aspect of the embodiments of the present invention, a task transmission method based on sub-databases and sub-tables is provided, including: extracting a task set to be transmitted from the sub-databases and sub-tables, wherein the task set includes a plurality of sub-databases and the sub-tables included in each sub-database; retrieving n sub-tables from the plurality of sub-databases according to a pre-configured total concurrent granularity, n being equal to the total concurrent granularity; hashing the retrieved n sub-tables to different scheduling units by hash allocation, wherein each scheduling unit is pre-configured with the same unit concurrent granularity; and concurrently transmitting the sub-tables included in each scheduling unit to the target location according to the unit concurrent granularity.
According to another aspect of the embodiments of the present invention, a task transmission apparatus based on sub-databases and sub-tables is further provided, including: an extraction module configured to extract a task set to be transmitted from the sub-databases and sub-tables, wherein the task set includes a plurality of sub-databases and the sub-tables included in each sub-database; a retrieval module configured to retrieve n sub-tables in total from the plurality of sub-databases according to a pre-configured total concurrent granularity, n being equal to the total concurrent granularity; a processing module configured to hash the retrieved n sub-tables to different scheduling units by hash allocation, wherein each scheduling unit is pre-configured with the same unit concurrent granularity; and a concurrency module configured to concurrently transmit the sub-tables included in each scheduling unit to the target location according to the unit concurrent granularity.
According to another aspect of the embodiments of the present invention, a task transmission system based on sub-databases and sub-tables is further provided, including: a source data terminal configured to store the sub-databases and sub-tables; a scheduling terminal in communication with the source data terminal, configured to extract a task set to be transmitted from the sub-databases and sub-tables, wherein the task set includes a plurality of sub-databases and the sub-tables included in each sub-database, retrieve n sub-tables in total from the plurality of sub-databases according to a pre-configured total concurrent granularity, n being equal to the total concurrent granularity, and, after hashing the retrieved n sub-tables to different scheduling units by hash allocation, concurrently transmit the sub-tables included in each scheduling unit according to the unit concurrent granularity, wherein each scheduling unit is pre-configured with the same unit concurrent granularity; and a target terminal in communication with the scheduling terminal, configured to receive the task set concurrently transmitted by the scheduling terminal.
In the embodiments of the present invention, in order to reduce the pressure of parallel transmission of the sharded tasks, after the task set to be transmitted is extracted, n sub-tables may be retrieved from the plurality of sub-databases according to the pre-configured total concurrent granularity; by hashing the retrieved n sub-tables to different scheduling units by hash allocation, the sub-tables included in each scheduling unit are concurrently transmitted to the target location according to the unit concurrent granularity. This scheme can split the task set to be transmitted according to the total concurrent granularity and the unit concurrent granularity, so as to balance the parallel transmission of the sharded tasks.
It is easy to note that, after the task set to be transmitted is extracted, n sub-tables need to be retrieved from the plurality of sub-databases according to the pre-configured total concurrent granularity, and, by hashing the retrieved n sub-tables to different scheduling units by hash allocation, the task set can be sent in parallel according to the hashed sub-tables, achieving balanced concurrent transmission of the sharded tasks. Therefore, the solution provided by the embodiments of the present application can realize task transmission based on sub-databases and sub-tables: not only is the transmission of the sharded tasks realized, but also, when the sharded tasks are transmitted, the transmission of the sub-tables is balanced while satisfying the pre-configured total concurrent granularity, so that multiple sharded tasks can be balanced, the pressure of concurrently reading data is reduced, and the concurrent transmission efficiency is improved.
Therefore, the solutions of the above embodiments provided by the present application solve the technical problem in the prior art that, when tasks based on sub-databases and sub-tables are transmitted concurrently, the pressure of the extraction end concurrently reading data from the sharded DB is excessive, resulting in low efficiency of concurrently transmitting the sharded tasks.
Brief Description of the Drawings
The drawings described here are used to provide a further understanding of the present invention and constitute a part of this application; the illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:
FIG. 1 is a hardware structural block diagram of a computer terminal for a task transmission method based on sub-databases and sub-tables according to an embodiment of the present application;
FIG. 2 is a flowchart of a task transmission method based on sub-databases and sub-tables according to an embodiment of the present application;
FIG. 3 is a schematic diagram of sub-databases and sub-tables corresponding to scheduling management units according to an embodiment of the present application;
FIG. 4 is a schematic diagram of sub-databases and sub-tables hashed to scheduling management units according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a scheduling management unit obtaining an optimal solution according to an embodiment of the present application;
FIG. 6 is an interaction flowchart of an optional task transmission method based on sub-databases and sub-tables according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a task transmission apparatus based on sub-databases and sub-tables according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an optional task transmission apparatus based on sub-databases and sub-tables according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an optional task transmission apparatus based on sub-databases and sub-tables according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an optional task transmission apparatus based on sub-databases and sub-tables according to an embodiment of the present application;
FIG. 11 is a schematic diagram of an optional task transmission apparatus based on sub-databases and sub-tables according to an embodiment of the present application;
FIG. 12 is a schematic diagram of an optional task transmission apparatus based on sub-databases and sub-tables according to an embodiment of the present application;
FIG. 13 is a schematic diagram of a task transmission system based on sub-databases and sub-tables according to an embodiment of the present application; and
FIG. 14 is a structural block diagram of a computer terminal according to an embodiment of the present application.
Detailed Description of the Embodiments
In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings of the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second" and the like in the specification, claims and the above drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the data so used are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in an order other than those illustrated or described here. In addition, the terms "include" and "have" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device including a series of steps or units is not necessarily limited to those steps or units clearly listed, but may include other steps or units not clearly listed or inherent to the process, method, product or device.
首先,在对本申请实施例进行描述的过程中出现的部分名词或术语适用于如下解释:
ETL:Extract-Transform-Load的简称,用来描述将数据从来源端经过抽取(extract)、转换(transform)、加载(load)至目的端的过程。ETL是构建数据仓库的重要一环,用户从数据源抽取出所需的数据,经过数据清洗,最终按照预先定义好的数据仓库模型,将数据加载到数据仓库中去。
分库分表:把原本存储于一个库的数据分块存储到多个库上,把原本存储于一个表的数据分块存储到多个表上。分库分表有垂直切分和水平切分两种,其中,垂直切分是指将表按照功能模块、关系密切程度划分出来,部署到不同的库上,例如,可以建立商品数据库存储商品定义表,建立用户数据库存储用户数据表等。水平切分是指当一个表中的数据量过大时,可以把该表的数据按照某种规则进行划分,然后存储到多个结构相同的表和不同的库上。
DB:(Database),数据库,是按照数据结构来组织、存储和管理的数据仓库。
实施例1
根据本发明实施例,还提供了一种基于分库分表的任务传输方法实施例,需要说明的是,在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行,并且,虽然在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。
本申请实施例一所提供的方法实施例可以在移动终端、计算机终端或者类似的运算装置中执行。以运行在计算机终端上为例,图1是根据本申请实施例的一种基于分库分表的任务传输方法的计算机终端的硬件结构框图。如图1所示,计算机终端10可以包括一个或多个(图中仅示出一个)处理器102(处理器102可以包括但不限于微处理器MCU或可编程逻辑器件FPGA等的处理装置)、用于存储数据的存储器104、以及用于通信功能的传输模块106。本领域普通技术人员可以理解,图1所示的结构仅为示意,其并不对上述电子装置的结构造成限定。例如,计算机终端10还可包括比图1中所示更多或者更少的组件,或者具有与图1所示不同的配置。
存储器104可用于存储应用软件的软件程序以及模块,如本发明实施例中的基于分库分表的任务传输方法对应的程序指令/模块,处理器102通过运行存储在存储器104内的软件程序以及模块,从而执行各种功能应用以及数据处理,即实现上述的基于分库分表的任务传输方法。存储器104可包括高速随机存储器,还可包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器104可进一步包括相对于处理器102远程设置的存储器,这些远程存储器可以通过网络连接至计算机终端10。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
传输装置106用于经由一个网络接收或者发送数据。上述的网络具体实例可包括计算机终端10的通信供应商提供的无线网络。在一个实例中,传输装置106包括一个网络适配器(Network Interface Controller,NIC),其可通过基站与其他网络设备相连从而可与互联网进行通讯。在一个实例中,传输装置106可以为射频(Radio Frequency,RF)模块,其用于通过无线方式与互联网进行通讯。
在上述运行环境下,本申请提供了如图2所示的基于分库分表的任务传输方法。图2是根据本申请实施例的一种基于分库分表的任务传输方法的流程图,如图2所示的方法可以包括如下步骤:
步骤S21,从分库分表中抽取待传输的任务集合,其中,任务集合包括:多个分库,以及每个分库所包含的分表。
可选地,一个任务集合可以包括多个分库,每个分库可以包含多个分表,且每个分表可以记录有多条数据信息,该数据信息可以是注册用户数据信息、访问网页数据信息、购买商品数据信息等等。从分库分表中抽取待传输的任务集合之后,可以根据抽取顺序进行编号。
在一种可选的方案中,在从分库分表中抽取待传输的任务集合之前,可以获取配置文件,该配置文件中记录待传输的多个分库的名称以及每个分库中包含的分表的名称。在读取配置文件之后,按照读取到的多个分库的名称以及多个分表的名称,从源数据库的分库分表中抽取待传输的多个分库以及每个分库包含的分表。
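结合上述配置文件的描述,下面给出一段读取此类配置并展开任务集合的Python示意代码。其中配置文件采用JSON格式,字段名totalChannel、tgChannel、databases等均为便于说明而作的假设,并非本申请限定的配置格式:

```python
import json

# 假设的配置文件内容:记录待传输的分库名称及每个分库包含的分表名称
# (JSON 格式与字段名均为示意性假设)
CONFIG_TEXT = '''
{
  "totalChannel": 9,
  "tgChannel": 3,
  "databases": [
    {"name": "A", "tables": ["T1", "T2", "T3", "T4"]},
    {"name": "B", "tables": ["T1", "T2"]},
    {"name": "C", "tables": ["T1", "T2", "T3", "T4"]}
  ]
}
'''

config = json.loads(CONFIG_TEXT)
# 任务集合:每个 (分库名, 分表名) 即一个待传输的单分表任务,按抽取顺序排列
tasks = [(db["name"], table)
         for db in config["databases"]
         for table in db["tables"]]
```

按该示例配置展开后,任务集合共包含10个单分表任务。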
结合图3所示,在一个可选的实施例中,任务集合可以包含3个分库,分别是分库A、分库B和分库C,分库A可以包含4个分表,分别是分表T1、分表T2、分表T3和分表T4,分库B可以包含2个分表,分别是分表T1和分表T2,分库C可以包含4个分表,分别是分表T1、分表T2、分表T3和分表T4。
步骤S23,按照预先配置的总并发粒度从多个分库中调取n个分表,n等于总并发粒度。
可选地,总并发粒度(totalChannel)表示同时并发的任务个数,可以按照预先配置的总并发粒度,依次从任务集合中顺序抽取满足总并发粒度数的n个分表。上述预先配置的总并发粒度可以按照用户的实际需求进行配置,也可以根据数据库的并发能力进行配置。
仍以上述抽取的任务集合为例,在一种可选的方案中,预先配置的总并发粒度可以为9,按照总并发粒度依次从分库A中提取分表T1、分表T2、分表T3和分表T4,从分库B中提取分表T1和分表T2,从分库C中提取分表T1、分表T2和分表T3。提取的分表个数与总并发粒度相同。
步骤S25,将调取到的n个分表采用散列分配方式散列至不同的调度单元,其中,预先配置每个调度单元的单元并发粒度相同。
可选地,在按照预先配置的总并发粒度从多个分库中调取n个分表之后,按照散列分配方式,将n个分表依次散列至不同的调度单元中。上述预先配置的每个调度单元的单元并发粒度(tgChannel)用于表示每个调度单元同时并发的任务个数,单元并发粒度小于等于总并发粒度。
如图4所示,在一种可选的方案中,上述调度单元可以是调度管理模块(TG,taskGroup的简写),分库分表任务可以包括分库A、分库B和分库C,每个分库都可以包括n个分表,每一个分表都是一个待传输的任务,将3n个分表散列到n个调度管理单元中,每个调度管理单元可以包括三个分表。
仍以上述抽取的任务集合为例,在一种可选的方案中,预先配置每个调度单元的单元并发粒度可以为3,将提取的9个分表散列至不同的调度单元,调度单元1可以包括:AT3、BT2和CT3,调度单元2可以包括:AT1、AT4和CT1,调度单元3可以包括:AT2、BT1和CT2。
需要说明的是,上述散列方式将相同分库中的分表散列在不同的调度单元中,尽量满足不同分库中的多个分表均衡并发,减小抽取端DB的并发压力。
步骤S27,将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置。
可选地,上述目标位置可以是目标数据库,用于存储提取的数据信息。将调取到的n个分表采用散列分配方式散列至不同的调度单元之后,每个调度单元将散列后的n个分表按照单元并发粒度并发传输至目标数据库。
本申请上述实施例一公开的方案中,如果希望减小分库分表任务的并行传输,可以在抽取待传输的任务集合之后,按照预先配置的总并发粒度从多个分库中调取n个分表,通过将调取到的n个分表采用散列分配方式散列至不同的调度单元,将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置,本方案可以根据总并发粒度和单元并发粒度对待传输的任务集合进行拆分,以均衡并行传输分库分表的任务。
容易注意到,由于在抽取待传输的任务集合之后,需要按照预先配置的总并发粒度从多个分库中调取n个分表,并通过将调取到的n个分表采用散列分配方式散列至不同的调度单元,就可以根据散列后的分表并行发送任务集合,达到均衡并发传输分库分表的任务,因此,通过本申请实施例所提供的方案,可以实现基于分库分表的任务传输,这样不仅实现了分库分表的任务传输,而且,在传输分库分表的任务时,在满足预先配置的总并发粒度的情况下,均衡传输分库分表的任务,因此,可以均衡分库分表中的多个任务,降低并发读取数据的压力,提高并发传输效率。
由此,本申请提供的上述实施例一的方案解决了现有技术中在基于分库分表的任务进行并发传输时,抽取端从分库分表DB中并发读取数据压力过大,导致并发传输分库分表的任务效率低的技术问题。
在本申请上述实施例中,在步骤S25,将调取到的n个分表采用散列分配方式散列至不同的调度单元之前,方法还包括如下步骤S251至步骤S255:
步骤S251,根据总并发粒度和单元并发粒度,确定调度单元的数量,并对每个调度单元分配对应的编号。
可选地,在按照预先配置的总并发粒度从多个分库中调取n个分表之后,可以根据预先配置的总并发粒度和单元并发粒度,根据公式tgCount=totalChannel/tgChannel得到调度单元的数量,然后对每个调度单元分配对应的编号。
结合图3所示,在一种可选的方案中,总并发粒度T可以为9,单元并发粒度t可以为3,则调度单元数量tgCount=totalChannel/tgChannel=3,3个调度单元的编号分别为TG0、TG1和TG2。
步骤S253,通过如下公式计算得到任意一个分库中的每个分表Ti的散列分配值,其中,散列分配值用于表征分表Ti所散列至对应的调度单元的编号Tpos:
Tpos=(TCount+offset)%tgCount,tgCount=totalChannel/tgChannel,其中,TCount为任意一个分库中每个分表Ti的编号,offset为每个分库向对应的调度单元分配的偏移量,初始值为0,totalChannel为总并发粒度,tgChannel为单元并发粒度。
可选地,任意一个分库向对应的调度单元分配的偏移量可以为该分库对应前一个分库最后一个分表Tn的lastOffset,例如A分库最后一个分表T4的lastOffset为1,则B分库的偏移量BOffset=lastOffset=1,在计算得到调度单元的数量之后,可以根据公式Tpos=(TCount+offset)%tgCount,计算每个分库中每个分表对应的调度单元的编号TGi。
结合图3所示,在一种可选的实施例中,分库A、分库B和分库C的编号分别为0、1和2,分库A的偏移量Aoffset=0,分表AT1对应的调度单元的编号为TGi=(1+Aoffset)%3=1,依次计算出分库A中其他分表对应的调度单元的编号,lastOffset=4%3=1,分库B的偏移量Boffset=lastOffset=1,则分表BT1对应的调度单元的编号为TGi=(1+Boffset)%3=2,依次可以得到3个分库中每个分表对应的调度单元的编号。
步骤S255,n个分表分别按照计算得到的散列分配值散列至对应的调度单元。
可选地,在计算得到n个分表对应的调度单元分配的编号TGi之后,将n个分表散列至对应的调度单元。
结合图3所示,在一种可选的实施例中,基于步骤S253的计算方法,可以将分库A中的分表T1、分表T2、分表T3和分表T4,分库B中的分表T1和分表T2,分库C中的分表T1、分表T2、分表T3和分表T4散列到对应的调度单元中,散列结果可以是分表AT3、分表BT2和分表CT3对应散列至调度单元TG0,分表AT1、分表AT4、分表CT1和分表CT4对应散列至调度单元TG1,分表AT2、分表BT1和分表CT2对应散列至调度单元TG2。
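上述按公式Tpos=(TCount+offset)%tgCount进行散列分配的过程,可以概括为如下Python示意代码(函数名与数据结构为便于说明而作的假设,offset的传递方式按文中lastOffset的描述实现):

```python
def hash_assign(databases, total_channel, tg_channel):
    """按 Tpos=(TCount+offset)%tgCount 将各分库的分表散列到调度单元。

    databases: [(分库名, [分表名, ...]), ...],按抽取顺序排列
    返回: {调度单元编号: [分表标识, ...]}
    """
    tg_count = total_channel // tg_channel   # tgCount = totalChannel / tgChannel
    groups = {i: [] for i in range(tg_count)}
    offset = 0                               # 首个分库的偏移量初始值为 0
    for db_name, tables in databases:
        for t_count, table in enumerate(tables, start=1):   # TCount 从 1 起编号
            tpos = (t_count + offset) % tg_count
            groups[tpos].append(db_name + table)
        # lastOffset:本分库最后一个分表的散列值,作为下一个分库的偏移量
        offset = (len(tables) + offset) % tg_count
    return groups

dbs = [("A", ["T1", "T2", "T3", "T4"]),
       ("B", ["T1", "T2"]),
       ("C", ["T1", "T2", "T3", "T4"])]
groups = hash_assign(dbs, total_channel=9, tg_channel=3)
# TG0: AT3/BT2/CT3;TG1: AT1/AT4/CT1/CT4;TG2: AT2/BT1/CT2,与图3示例一致
```

该示意代码对图3示例的散列结果与文中描述逐一吻合,可用于验证公式中偏移量的传递方式。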
由上可知,在本申请上述实施例中,按照预先配置的总并发粒度从多个分库中调取n个分表之后,可以根据预先配置的总并发粒度和单元并发粒度,确定调度单元的数量,并对每个调度单元分配对应的编号,并通过如下公式计算得到每个分库中的每个分表所对应的调度单元的编号,实现将分库分表的任务进行切分的目的,将分库分表的任务均匀的分散到不同的调度单元中,避免现有技术中分库分表的任务顺序传输造成的资源浪费。
在本申请上述实施例中,在执行步骤S27,将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置的过程中,步骤S2701,实时获取多个分库中除n个分表之外的m个分表,步骤S2703,按照总并发粒度将m个分表采用散列分配方式散列至不同的调度单元,步骤S2705,在调度单元的线程被占满的情况下,将分配至对应的调度单元的分表放置在对应的调度单元的等待队列中,在调度单元的至少一个线程被释放之后,将等待队列中的分表调度至对应的调度单元中。
可选地,将按照预先配置的总并发粒度从所述多个分库中调取n个分表并发传输至目标位置时,将多个分库中未传输的多个分表散列至不同的调度单元,如果调度单元的线程被占满,则将散列后的分表放置在等待队列中,直到调度单元中任意一个线程被释放之后,才能被调度单元并发传输至目标位置,如果调度单元的线程未被占满,则散列后的分表直接被调度单元并发传输至目标位置。
仍旧结合图3所示,在一种可选的方案中,预先配置的总并发粒度可以为9,按照总并发粒度提取的分表为分表AT1、分表AT2、分表AT3、分表AT4、分表BT1、分表BT2、分表CT1、分表CT2和分表CT3。此时,三个分库中剩余的未被传输的分表为分表CT4,经过散列分配,该分表CT4应该被分配至调度单元TG1,由于调度单元TG1线程被占满,因此先将该分表CT4放入调度单元TG1的等待队列,如图3中虚线所示。等待分表AT1、AT4和CT1中任意一个分表被传输至目标位置,线程释放之后,等待队列中的分表CT4被调度至调度单元TG1中,并发传输至目标位置。
由上可知,在本申请上述实施例中,在n个分表并发传输至目标位置的过程中,获取多个分库中其余的m个分表,在散列分配之后分配给对应的调度单元,如果调度单元线程被占满,则进入等待队列等待调度单元的至少一个线程被释放,实现切分分配分表的目的,在线程全部被占用的情况下进入等待队列,在至少一个线程释放后进入调度单元,保证并发传输分库分表的任务时资源最大化利用。
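线程占满时分表进入等待队列、线程释放后再被调度的行为,可以用如下简化的Python模型说明(TaskGroup类及其方法名为示意性假设,真实实现中传输在独立线程中并发进行,此处仅模拟线程占用与释放的状态变化):

```python
from collections import deque

class TaskGroup:
    """调度管理单元(TG)的简化模型:最多 tg_channel 个并发线程。"""

    def __init__(self, tg_channel):
        self.tg_channel = tg_channel
        self.running = []          # 正在并发传输的分表
        self.wait_queue = deque()  # 线程占满时分表在此排队

    def dispatch(self, table):
        if len(self.running) < self.tg_channel:
            self.running.append(table)     # 有空闲线程,立即开始并发传输
        else:
            self.wait_queue.append(table)  # 线程占满,进入等待队列

    def release(self, table):
        self.running.remove(table)         # 某分表传输完成,线程被释放
        if self.wait_queue:                # 等待队列中的分表被调度进来
            self.running.append(self.wait_queue.popleft())

tg1 = TaskGroup(tg_channel=3)
for t in ["AT1", "AT4", "CT1"]:
    tg1.dispatch(t)      # 三个线程被占满
tg1.dispatch("CT4")      # CT4 进入 TG1 的等待队列
tg1.release("AT1")       # AT1 传输完成后,CT4 被调度进 TG1
```

末尾的调用序列复现了上文分表CT4先排队、待AT1传输完成后进入调度单元TG1的示例。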
需要说明的是,如果相同分库不同分表的表的记录数基本相同,抽取时间相差不大,那么通过上述实施例就可以达到目的,使分库分表任务的DB抽取压力均衡,但是这种任务只存在于理想情况下,大多数任务相同分库不同分表的记录数不会保持一致,抽取时间也无法预测,这时候需要调度模块去做一个控制调度,选出最优解。
在本申请上述实施例中,在步骤S27,将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置之后,上述方法还包括步骤S271至步骤S275:
步骤S271,在任意一个调度单元中存在空闲的线程的情况下,获取每个分库的当前并发数,其中,当前并发数用于表征分库中已经被调度至对应的调度单元中的分表数量。
可选地,上述每个分库的当前并发数可以保存在线程并发管理器(TC-MGR,Thread-Concurrency-Manager的简写)中,在启动线程时,将调度模块中线程对应的分库的并发数加1,如果线程执行完毕,则将线程并发管理器中对应的分库的并发数减1,这两个过程标记为同步(synchronized),线程之间互斥访问,为方便起见,将上述两个过程分别称为启动通道(holdChannel)和释放通道(releaseChannel)。采用贪心算法,在需要启动线程时,获取调度模块中的每个分库的当前并发数。
如图5所示,在一种可选的方案中,线程并发管理器中记录了分库A、分库B和分库C的当前并发数TAC、TBC和TCC,分库A的当前并发数TAC=X,分库B的当前并发数TBC=Y,分库C的当前并发数TCC=Z,调度管理单元(即上述的调度单元)包括3个通道(即上述的线程),通道1、通道2和通道3。调度单元TG0在3个线程中有至少一个线程空闲的情况下,例如线程1空闲,获取调度模块中每个分库的当前并发数。
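holdChannel与releaseChannel这两个互斥过程可以用如下Python片段示意(类名与方法名为便于说明而作的假设,对应文中的线程并发管理器TC-MGR):

```python
import threading

class ThreadConcurrencyManager:
    """TC-MGR 的极简示意:互斥地维护每个分库的当前并发数。"""

    def __init__(self, db_names):
        self._lock = threading.Lock()            # 对应文中的 synchronized 互斥访问
        self._count = {db: 0 for db in db_names}

    def hold_channel(self, db):
        with self._lock:                         # 启动通道:对应分库并发数加 1
            self._count[db] += 1

    def release_channel(self, db):
        with self._lock:                         # 释放通道:对应分库并发数减 1
            self._count[db] -= 1

    def snapshot(self):
        with self._lock:                         # 读取各分库当前并发数的一致快照
            return dict(self._count)

mgr = ThreadConcurrencyManager(["A", "B", "C"])
mgr.hold_channel("A")
mgr.hold_channel("A")
mgr.hold_channel("B")
mgr.release_channel("A")
# 此时各分库的当前并发数为 A:1、B:1、C:0
```

以互斥锁保护计数的原因与文中一致:多个通道线程会同时启动和结束,对并发数的增减必须互斥进行。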
步骤S273,根据每个分库的当前并发数,从任意一个调度单元的等待队列中调度对应数量的分表。
可选地,当线程被释放之后,获取每个分库的当前并发数,并根据每个分库的当前并发数,从调度单元的等待队列中调度相应数量的分表,例如,如果调度单元释放了两个线程,则根据每个分库的当前并发数,从该调度单元的等待队列中调度两个分表。
在一种可选的方案中,如果线程并发管理器中记录的当下分库A的当前并发数可以为3,分库B的当前并发数可以为1,分库C的当前并发数可以为2,此时如果需要启动的线程个数为2,则可以从调度单元的等待队列中调度两个分库B的分表,如果等待队列中只有一个分库B的分表,则进一步调度等待队列中一个分库C的分表。如果调度单元TG1的等待队列中保存了分库B的一个分表T4和其他分库的分表,此时,可以根据每个分库的并发数的比例,以及线程被释放,优先选择调取分表T4进入调度单元TG1,如果TG1的线程被全部释放,则可以调取等待队列中除T4以外的其他分表。其他任意一个调度单元的调取规则同理。
步骤S275,将从对应的等待队列中调度对应数量的分表并发传输至目标位置。
可选地,在从任意一个调度单元的等待队列中调度对应数量的分表之后,将该对应数量的分表并发传输至目标位置。
由上可知,在本申请上述实施例中,在任意一个调度单元中存在空闲的线程的情况下,获取每个分库的当前并发数,并根据每个分库的当前并发数,从任意一个调度单元的等待队列中调度对应数量的分表,并发传输至目标位置,采用贪心算法,每当调度单元有通道线程饥饿时就先做启动通道,从而实现从局部最优到全局最优,进一步实现均衡并发传输分库分表的任务。
在本申请上述实施例中,步骤S273,根据每个分库的当前并发数,从任意一个调度单元的等待队列中调度对应数量的分表,包括如下步骤S2731至步骤S2735:
步骤S2731,按照每个分库的当前并发数进行排序,确定属于不同分库的分表的调度优先级,其中,分库的当前并发数越低,该分库中的分表的调度优先级越高。
可选地,获取到每个分库的当前并发数之后,将每个分库按照当前并发数进行升序排序,当前并发数最低的分库中的分表优先级最高,当前并发数最高的分库中的分表的优先级最低,确定调度单元的等待队列中每个分库分表的调度优先级。
步骤S2733,根据任意一个调度单元中存在的空闲线程数量,确定调度数量。
可选地,在空闲线程数量小于等于调度单元的等待队列中的分表个数时,确定调度数量为空闲线程数量,在空闲线程数量大于调度单元的等待队列中的分表个数时,确定调度数量为调度单元的等待队列中的分表个数。
步骤S2735,在确定第一分库中的分表为调度优先级最高的分表之后,按照调度数量从等待队列中调度属于第一分库的分表。
可选地,在确定属于不同分库的分表的调度优先级和调度数量之后,从调度单元的等待队列中调度符合调度数量和调度优先级的分表。
在一种可选的方案中,分库A的当前并发数可以为3,分库B的当前并发数可以为1,分库C的当前并发数可以为2,通过分析三个分库的并发度的比例可以确定分库B的当前并发数最低,表示在之前的并发过程中,分库B中分表的分发或同步效率最低,因此分库B的分表的调度优先级最高,系统需要尽快处理分库B中的分表,而分库A的当前并发数最高,因此分库A的分表的调度优先级最低。调度单元的等待队列中有分表AT5,分表AT6,分表BT3和分表CT5,在有一个线程空闲的情况下,按照调度优先级的分析结果,此时调度单元从等待队列中调度分表BT3进行同步。
由上可知,在本申请上述实施例中,获取每个分库的当前并发数之后,根据排序结果确定调度单元中每个分表的调度优先级,并根据空闲线程数从调度单元的等待队列中调度对应调度优先级和对应数量的分表,从而实现在分表记录数均不相同的情况下,根据每个分库的当前并发数选取最优解,使分库分表任务的抽取压力均衡。
需要说明的是,为了在获取每个分库的当前并发数之后,根据排序结果确定调度单元中每个分表的调度优先级,线程并发管理器中还需要保存每个调度管理单元的分库使用情况。
在本申请上述实施例中,在步骤S2735,按照调度数量从等待队列中调度对应数量的分表之前,上述方法还包括步骤S27351至步骤S27353:
步骤S27351,读取等待队列中属于第一分库的分表的数量。
步骤S27353,判断属于第一分库的分表的数量是否大于等于调度数量。
可选地,如果判断属于第一分库的分表的数量大于等于调度数量,则进入步骤S2735,按照调度数量从等待队列中调度对应数量的分表;如果判断属于第一分库的分表的数量小于调度数量,则按照调度数量从等待队列中调度属于第一分库和属于其他分库的分表,其中,其他分库为当前并发数大于第一分库的分库。
在一种可选的方案中,分库A的当前并发数可以为3,分库B的当前并发数可以为1,分库C的当前并发数可以为2,通过分析三个分库的并发度的比例可以确定分库B的当前并发数最低,因此分库B的分表的调度优先级最高,分库A的当前并发数最高,因此分库A的分表的调度优先级最低,分库C的当前并发数位于两者之间,因此分库C的分表的调度优先级也位于两者之间。调度单元TG0的等待队列中有分表AT5,分表AT6,分表BT3和分表CT5,按照上述一种可选方案所确定的调度优先级的调度规则,可以读取等待队列中属于分库B的分表数量为1,属于分库C的分表数量为1,属于分库A的分表数量为2,在调度单元TG0有两个线程空闲的情况下,调度单元TG0从等待队列中调度分表BT3和CT5。
在另一种可选的方案中,分库A的当前并发数可以为3,分库B的当前并发数可以为1,分库C的当前并发数可以为2,通过分析三个分库的并发度的比例可以确定分库B的当前并发数最低,因此分库B的分表的调度优先级最高,分库A的当前并发数最高,因此分库A的分表的调度优先级最低,分库C的当前并发数位于两者之间,因此分库C的分表的调度优先级也位于两者之间。调度单元的等待队列中有分表AT5,分表AT6,分表BT3和分表CT5。按照上述另一种可选方案所确定的调度优先级的调度规则,可以读取等待队列中属于分库B的分表数量为1,属于分库C的分表数量为1,属于分库A的分表数量为2,在调度单元TG0有三个线程空闲的情况下,调度单元TG0从等待队列中调度分表BT3、分表CT5和分表AT5。
在又一种可选的方案中,分库A的当前并发数可以为3,分库B的当前并发数可以为2,分库C的当前并发数可以为2,通过分析三个分库的并发度的比例可以确定分库B和分库C的当前并发数最低,因此分库B和分库C的分表的调度优先级最高,分库A的当前并发数最高,因此分库A的分表的调度优先级最低。调度单元的等待队列中有分表AT5,分表AT6,分表BT3,分表BT4,分表BT5,分表CT5和分表CT6。按照上述又一种可选方案所确定的调度优先级的调度规则,可以读取等待队列中属于分库B的分表数量为3,属于分库C的分表数量为2,属于分库A的分表数量为2,在调度单元TG0有三个线程空闲的情况下,调度单元TG0从等待队列中调度分表BT3、分表BT4和分表CT5。
由上可知,在本申请上述实施例中,在判断读取到的属于第一分库的分表的数量大于等于调度数量的情况下,按照调度数量从等待队列中调度对应数量的分表;在判断读取到的属于第一分库的分表的数量小于调度数量的情况下,按照调度数量从等待队列中调度属于第一分库和属于其他分库的分表,从而实现动态调度均衡,降低分库分表任务对抽取端DB的压力,减少长尾任务。
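上述按当前并发数确定调度优先级、第一分库分表不足时向其他分库回退的贪心选取过程,可以概括为如下Python示意代码(每启动一个线程即重新按各分库当前并发数取最小值,是对文中贪心思路的一种示意性实现,函数名为假设):

```python
def pick_from_wait_queue(wait_queue, concurrency, free_threads):
    """从等待队列中贪心选取最多 free_threads 个分表。

    wait_queue: [(分库名, 分表名), ...],按入队顺序排列,选中即出队
    concurrency: {分库名: 当前并发数},选中即执行 holdChannel(加 1)
    """
    picked = []
    for _ in range(min(free_threads, len(wait_queue))):
        # 每次选取当前并发数最低的分库中最早入队的分表;
        # 若该分库在队列中已无分表,min 自然回退到并发数次低的分库
        best = min(wait_queue, key=lambda task: concurrency[task[0]])
        wait_queue.remove(best)
        concurrency[best[0]] += 1   # holdChannel:该分库当前并发数加 1
        picked.append(best)
    return picked

# 复现文中示例:分库 A/B/C 的当前并发数为 3/1/2,等待队列中有 AT5、AT6、BT3、CT5
queue = [("A", "T5"), ("A", "T6"), ("B", "T3"), ("C", "T5")]
picked = pick_from_wait_queue(queue, {"A": 3, "B": 1, "C": 2}, free_threads=2)
# 两个空闲线程时,先调度 BT3,再调度 CT5
```

每选中一个分表即更新该分库的并发数,再做下一次选取,对应文中"从局部最优到全局最优"的贪心策略。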
根据本申请上述实施例,对散列至不同的所述调度单元的分表标记分库标识,所述分库标识用于表征所述分表原始对应的分库。
下面结合图4、图5和图6详细介绍本申请的一种优选实施例。
如图4,图5和图6所示,以任务集合包含多个分库和多个分表为应用场景,提供了一种可选的基于分库分表的任务传输方法,该方法可以包括如下步骤S61至步骤S67:
步骤S61,获取分库分表的任务集合。
可选地,为了获取分库分表的任务集合,调度终端133从配置文件中读取分库分表的名称,并从源数据终端131的分库分表中提取待传输的任务集合,任务集合中包括多个分库以及分库对应的分表。
在一种可选的方案中,配置文件中包含分库A、分库B和分库C,分库A可以包含4个分表,分别是分表T1、分表T2、分表T3和分表T4,分库B可以包含2个分表,分别是分表T1和分表T2,分库C可以包含4个分表,分别是分表T1、分表T2、分表T3和分表T4。
步骤S63,将分库分表的任务集合散列至多个调度管理单元中。
可选地,将分库分表的任务切分成n个单个分表粒度的任务(即上述的n个分表),并按照用户配置的总并发粒度和单个调度管理单元(即上述的调度单元)的并发粒度(即上述的预先配置的总并发粒度和单元并发粒度),将任务按照分库均匀地分配到调度管理单元中,并对每个任务属于哪个分库打标。
在一种可选的方案中,调度终端133按照用户配置的总并发粒度从多个分库中调取n个分表,按照用户配置的总并发粒度和单个调度管理单元的并发粒度确定调度管理单元的个数,并对每个调度管理单元进行编号;计算n个单个分表粒度的任务中每个任务对应的调度管理单元的编号,并将n个单个分表粒度的任务散列分配到对应调度管理单元中。
在该实施例中,上述步骤S63的实现方式与本申请上述实施例中的步骤S23和步骤S25的实现方式一致,在此不再赘述。
步骤S65,多个调度单元从调度模块中获取最优解。
可选地,最优解等于当前并发数最小的N个分库,且这N个分库在当前TG中还存在未消费的任务(结果按照并发数从小到大排列)(即上述的多个分库中除n个分表之外的m个分表),N为保持通道的请求参数,一般为调度管理单元中当前饥饿的通道数。
在一种可选的方案中,调度模块中保存了分库分表的任务集合中每个分库的当前并发数和每个调度单元对每个分库的使用情况,调度终端133从调度模块中获取每个分库的当前并发数,并将每个分库的当前的并发数进行排序确定每个分库的调度优先级,读取调度单元的空闲线程数量和当前并发数最小的分库中未传输的分表的数量,在空闲线程数量大于分表的数量时,调取当前并发数最小的分库的分表和其他分库的分表;在空闲线程数量小于等于分表的数量时,调取当前并发数最小的分库的分表。
步骤S67,并发传输任务集合至目标位置。
可选地,按照用户配置的总并发粒度和单个调度管理单元的并发粒度,并发传输任务集合,在调度管理单元线程被占满的情况下,将未传输的任务放置在调度管理单元的等待队列中,直到调度管理单元存在空闲线程,则并行发送最优解。
在该实施例中,上述步骤S67的实现方式与本申请上述实施例中的步骤S27的实现方式一致,在此不再赘述。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本发明并不受所描述的动作顺序的限制,因为依据本发明,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本发明所必须的。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到根据上述实施例的基于分库分表的任务传输方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,或者网络设备等)执行本发明各个实施例所述的方法。
实施例2
根据本发明实施例,还提供了一种用于实施上述基于分库分表的任务传输方法的基于分库分表的任务传输装置,如图7所示,该装置包括:抽取模块71,调取模块73,处理模块75和并发模块77。
其中,抽取模块71用于从分库分表中抽取待传输的任务集合,其中,任务集合包括:多个分库,以及每个分库所包含的分表。调取模块73用于按照预先配置的总并发粒度从多个分库中共调取n个分表,n等于总并发粒度。处理模块75用于将调取到的n个分表采用散列分配方式散列至不同的调度单元,其中,预先配置每个调度单元的单元并发粒度相同。并发模块77用于将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置。
此处需要说明的是,上述抽取模块71,调取模块73,处理模块75和并发模块77对应于实施例一中的步骤S21至步骤S27,四个模块与对应的步骤所实现的实例和应用场景相同,但不限于上述实施例一所公开的内容。需要说明的是,上述模块作为装置的一部分可以运行在实施例一提供的计算机终端10中。
本申请上述实施例二公开的方案中,如果希望减小分库分表任务的并行传输,可以在抽取待传输的任务集合之后,按照预先配置的总并发粒度从多个分库中调取n个分表,通过将调取到的n个分表采用散列分配方式散列至不同的调度单元,将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置,本方案可以根据总并发粒度和单元并发粒度对待传输的任务集合进行拆分,以均衡并行传输分库分表的任务。
容易注意到,由于在抽取待传输的任务集合之后,需要按照预先配置的总并发粒度从多个分库中调取n个分表,并通过将调取到的n个分表采用散列分配方式散列至不同的调度单元,就可以根据散列后的分表并行发送任务集合,达到均衡并发传输分库分表的任务,因此,通过本申请实施例所提供的方案,可以实现基于分库分表的任务传输,这样不仅实现了分库分表的任务传输,而且,在传输分库分表的任务时,在满足预先配置的总并发粒度的情况下,均衡传输分库分表的任务,因此,可以均衡分库分表中的多个任务,降低并发读取数据的压力,提高并发传输效率。
由此,本申请提供的上述实施例二的方案解决了现有技术中在基于分库分表的任务进行并发传输时,抽取端从分库分表DB中并发读取数据压力过大,导致并发传输分库分表的任务效率低的技术问题。
在本申请上述实施例中,如图8所示,上述装置还包括:第一确定模块81和计算模块83,其中,处理模块75包括:第一散列分配模块85。
其中,第一确定模块81用于根据总并发粒度和单元并发粒度,确定调度单元的数量,并对每个调度单元分配对应的编号。计算模块83用于通过如下公式计算得到任意一个分库中的每个分表Ti的散列分配值,其中,散列分配值用于表征分表Ti所散列至对应的调度单元的编号Tpos:Tpos=(TCount+offset)%tgCount,tgCount=totalChannel/tgChannel,其中,TCount为任意一个分库中每个分表Ti的编号,offset为每个分库向对应的调度单元分配的偏移量,初始值为0,totalChannel为总并发粒度,tgChannel为单元并发粒度。第一散列分配模块85用于n个分表分别按照计算得到的散列分配值散列至对应的调度单元。
此处需要说明的是,上述第一确定模块81,计算模块83和第一散列分配模块85对应于实施例一中的步骤S251至步骤S255,该模块与对应的步骤所实现的实例和应用场景相同,但不限于上述实施例一所公开的内容。需要说明的是,上述模块作为装置的一部分可以运行在实施例一提供的计算机终端10中。
在本申请上述实施例中,如图9所示,上述装置还包括:第一获取模块91,第二散列分配模块93和子处理模块95。
其中,第一获取模块91用于实时获取多个分库中除n个分表之外的m个分表。第二散列分配模块93用于按照总并发粒度将m个分表采用散列分配方式散列至不同的调度单元。子处理模块95用于如果调度单元的线程被占满的情况下,将分配至对应的调度单元的分表放置在对应的调度单元的等待队列中,在调度单元的至少一个线程被释放之后,将等待队列中的分表调度至对应的调度单元中。
此处需要说明的是,上述第一获取模块91,第二散列分配模块93和子处理模块95对应于实施例一中的步骤S2701至步骤S2705,该模块与对应的步骤所实现的实例和应用场景相同,但不限于上述实施例一所公开的内容。需要说明的是,上述模块作为装置的一部分可以运行在实施例一提供的计算机终端10中。
在本申请上述实施例中,如图10所示,上述装置还包括:第二获取模块101,调度模块103和传输模块105。
其中,第二获取模块101用于在任意一个调度单元中存在空闲的线程的情况下,获取每个分库的当前并发数,其中,当前并发数用于表征分库中已经被调度至对应的调度单元中的分表数量。调度模块103用于根据每个分库的当前并发数,从任意一个调度单元的等待队列中调度对应数量的分表。传输模块105用于将从对应的等待队列中调度对应数量的分表并发传输至目标位置。
此处需要说明的是,上述第二获取模块101,调度模块103和传输模块105对应于实施例一中的步骤S271至步骤S275,该模块与对应的步骤所实现的实例和应用场景相同,但不限于上述实施例一所公开的内容。需要说明的是,上述模块作为装置的一部分可以运行在实施例一提供的计算机终端10中。
在本申请上述实施例中,如图11所示,调度模块103包括:排序模块111,第二确定模块113和子调度模块115。
其中,排序模块111用于按照每个分库的当前并发数进行排序,确定属于不同分库的分表的调度优先级,其中,分库的当前并发数越低,当前并发数越低的分库中的分表的调度优先级越高。第二确定模块113用于根据任意一个调度单元中存在的空闲线程数量,确定调度数量。子调度模块115用于在确定第一分库中的分表为调度优先级最高的分表之后,按照调度数量从等待队列中调度属于第一分库的分表。
此处需要说明的是,上述排序模块111,第二确定模块113和子调度模块115对应于实施例一中的步骤S2731至步骤S2735,该模块与对应的步骤所实现的实例和应用场景相同,但不限于上述实施例一所公开的内容。需要说明的是,上述模块作为装置的一部分可以运行在实施例一提供的计算机终端10中。
在本申请上述实施例中,如图12所示,上述装置还包括:读取模块121,判断模块123,第一执行模块125和第二执行模块127。
其中,读取模块121用于读取等待队列中属于第一分库的分表的数量。判断模块123用于判断属于第一分库的分表的数量是否大于等于调度数量。第一执行模块125用于如果大于等于调度数量,则执行子调度模块的功能。第二执行模块127用于如果小于调度数量,则按照调度数量从等待队列中调度属于第一分库和属于其他分库的分表,其中,其他分库为当前并发数大于第一分库的分库。
此处需要说明的是,上述读取模块121,判断模块123,第一执行模块125和第二执行模块127对应于实施例一中的步骤S27351至步骤S27353,该模块与对应的步骤所实现的实例和应用场景相同,但不限于上述实施例一所公开的内容。需要说明的是,上述模块作为装置的一部分可以运行在实施例一提供的计算机终端10中。
实施例3
根据本申请实施例,还提供了一种基于分库分表的任务传输系统,如图13所示,该系统可以包括:源数据终端131,调度终端133和目标终端135。
其中,源数据终端131用于存储分库分表。
调度终端133,与源数据终端131通信,用于从分库分表中抽取待传输的任务集合,其中,任务集合包括:多个分库,以及每个分库所包含的分表,并按照预先配置的总并发粒度从多个分库中共调取n个分表,n等于总并发粒度,在将调取到的n个分表采用散列分配方式散列至不同的调度单元之后,将每个调度单元中包含的分表按照单元并发粒度并发传输,其中,预先配置每个调度单元的单元并发粒度相同。
可选地,一个任务集合可以包括多个分库,每个分库可以包含多个分表,且每个分表可以记录有多条数据信息,该数据信息可以是注册用户数据信息、访问网页数据信息、购买商品数据信息等等。从分库分表中抽取待传输的任务集合之后,可以根据抽取顺序进行编号。
在一种可选的方案中,在从分库分表中抽取待传输的任务集合之前,可以获取配置文件,该配置文件中记录待传输的多个分库的名称以及每个分库中包含的分表的名称。在读取配置文件之后,按照读取到的多个分库的名称以及多个分表的名称,从源数据库的分库分表中抽取待传输的多个分库以及每个分库包含的分表。
结合图3所示,在一个可选的实施例中,任务集合可以包含3个分库,分别是分库A、分库B和分库C,分库A可以包含4个分表,分别是分表T1、分表T2、分表T3和分表T4,分库B可以包含2个分表,分别是分表T1和分表T2,分库C可以包含4个分表,分别是分表T1、分表T2、分表T3和分表T4。
可选地,总并发粒度(totalChannel)表示同时并发的任务个数,可以按照预先配置的总并发粒度,依次从任务集合中顺序抽取满足总并发粒度数的n个分表。上述预先配置的总并发粒度可以按照用户的实际需求进行配置,也可以根据数据库的并发能力进行配置。
仍以上述抽取的任务集合为例,在一种可选的方案中,预先配置的总并发粒度可以为9,按照总并发粒度依次从分库A中提取分表T1、分表T2、分表T3和分表T4,从分库B中提取分表T1和分表T2,从分库C中提取分表T1、分表T2和分表T3。提取的分表个数与总并发粒度相同。
可选地,在按照预先配置的总并发粒度从多个分库中调取n个分表之后,按照散列分配方式,将n个分表依次散列至不同的调度单元中。上述预先配置的每个调度单元的单元并发粒度(tgChannel)用于表示每个调度单元同时并发的任务个数,单元并发粒度小于等于总并发粒度。
如图4所示,在一种可选的方案中,上述调度单元可以是调度管理模块(TG,taskGroup的简写),分库分表任务可以包括分库A、分库B和分库C,每个分库都可以包括n个分表,每一个分表都是一个待传输的任务,将3n个分表散列到n个调度管理单元中,每个调度管理单元可以包括三个分表。
仍以上述抽取的任务集合为例,在一种可选的方案中,预先配置每个调度单元的单元并发粒度可以为3,将提取的9个分表散列至不同的调度单元,调度单元1可以包括:AT3、BT2和CT3,调度单元2可以包括:AT1、AT4和CT1,调度单元3可以包括:AT2、BT1和CT2。
需要说明的是,上述散列方式将相同分库中的分表散列在不同的调度单元中,尽量满足不同分库中的多个分表均衡并发,减小抽取端DB的并发压力。
目标终端135,与调度终端133通信,用于接收调度终端并发传输来的任务集合。
可选地,上述目标终端可以是目标数据库,用于存储提取的数据信息。将调取到的n个分表采用散列分配方式散列至不同的调度单元之后,每个调度单元将散列后的n个分表按照单元并发粒度并发传输至目标数据库。
本申请上述实施例三公开的方案中,如果希望减小分库分表任务的并行传输,可以在抽取待传输的任务集合之后,按照预先配置的总并发粒度从多个分库中调取n个分表,通过将调取到的n个分表采用散列分配方式散列至不同的调度单元,将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置,本方案可以根据总并发粒度和单元并发粒度对待传输的任务集合进行拆分,以均衡并行传输分库分表的任务。
容易注意到,由于在抽取待传输的任务集合之后,需要按照预先配置的总并发粒度从多个分库中调取n个分表,并通过将调取到的n个分表采用散列分配方式散列至不同的调度单元,就可以根据散列后的分表并行发送任务集合,达到均衡并发传输分库分表的任务,因此,通过本申请实施例所提供的方案,可以实现基于分库分表的任务传输,这样不仅实现了分库分表的任务传输,而且,在传输分库分表的任务时,在满足预先配置的总并发粒度的情况下,均衡传输分库分表的任务,因此,可以均衡分库分表中的多个任务,降低并发读取数据的压力,提高并发传输效率。
由此,本申请提供的上述实施例三的方案解决了现有技术中在基于分库分表的任务进行并发传输时,抽取端从分库分表DB中并发读取数据压力过大,导致并发传输分库分表的任务效率低的技术问题。
在本申请提供的一种可选实施例中,上述调度终端133还用于根据总并发粒度和单元并发粒度,确定调度单元的数量,并对每个调度单元分配对应的编号,通过如下公式计算得到任意一个分库中的每个分表Ti的散列分配值,其中,散列分配值用于表征分表Ti所散列至对应的调度单元的编号Tpos:Tpos=(TCount+offset)%tgCount,tgCount=totalChannel/tgChannel,其中,TCount为任意一个分库中每个分表Ti的编号,offset为每个分库向对应的调度单元分配的偏移量,初始值为0,totalChannel为总并发粒度,tgChannel为单元并发粒度,并将n个分表分别按照计算得到的散列分配值散列至对应的调度单元。
可选地,在按照预先配置的总并发粒度从多个分库中调取n个分表之后,可以根据预先配置的总并发粒度和单元并发粒度,根据公式tgCount=totalChannel/tgChannel得到调度单元的数量,然后对每个调度单元分配对应的编号。
结合图3所示,在一种可选的方案中,总并发粒度T可以为9,单元并发粒度t可以为3,则调度单元数量tgCount=totalChannel/tgChannel=3,3个调度单元的编号分别为TG0、TG1和TG2。
可选地,任意一个分库向对应的调度单元分配的偏移量可以为该分库对应前一个分库最后一个分表Tn的lastOffset,例如A分库最后一个分表T4的lastOffset为1,则B分库的偏移量BOffset=lastOffset=1,在计算得到调度单元的数量之后,可以根据公式Tpos=(TCount+offset)%tgCount,计算每个分库中每个分表对应的调度单元的编号TGi。
结合图3所示,在一种可选的实施例中,分库A、分库B和分库C的编号分别为0、1和2,分库A的偏移量Aoffset=0,分表AT1对应的调度单元的编号为TGi=(1+Aoffset)%3=1,依次计算出分库A中其他分表对应的调度单元的编号,lastOffset=4%3=1,分库B的偏移量Boffset=lastOffset=1,则分表BT1对应的调度单元的编号为TGi=(1+Boffset)%3=2,依次可以得到3个分库中每个分表对应的调度单元的编号。
可选地,在计算得到n个分表对应的调度单元分配的编号TGi之后,将n个分表散列至对应的调度单元。
结合图3所示,在一种可选的实施例中,基于步骤S253的计算方法,可以将分库A中的分表T1、分表T2、分表T3和分表T4,分库B中的分表T1和分表T2,分库C中的分表T1、分表T2、分表T3和分表T4散列到对应的调度单元中,散列结果可以是分表AT3、分表BT2和分表CT3对应散列至调度单元TG0,分表AT1、分表AT4、分表CT1和分表CT4对应散列至调度单元TG1,分表AT2、分表BT1和分表CT2对应散列至调度单元TG2。
由上可知,在本申请上述实施例中,按照预先配置的总并发粒度从多个分库中调取n个分表之后,可以根据预先配置的总并发粒度和单元并发粒度,确定调度单元的数量,并对每个调度单元分配对应的编号,并通过如下公式计算得到每个分库中的每个分表所对应的调度单元的编号,实现将分库分表的任务进行切分的目的,将分库分表的任务均匀的分散到不同的调度单元中,避免现有技术中分库分表的任务顺序传输造成的资源浪费。
在本申请提供的一种可选实施例中,上述调度终端133还用于在将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置的过程中,实时获取多个分库中除n个分表之外的m个分表,并按照总并发粒度将m个分表采用散列分配方式散列至不同的调度单元,在调度单元的线程被占满的情况下,将分配至对应的调度单元的分表放置在对应的调度单元的等待队列中,在调度单元的至少一个线程被释放之后,将等待队列中的分表调度至对应的调度单元中。
可选地,将按照预先配置的总并发粒度从所述多个分库中调取n个分表并发传输至目标位置时,将多个分库中未传输的多个分表散列至不同的调度单元,如果调度单元的线程被占满,则将散列后的分表放置在等待队列中,直到调度单元中任意一个线程被释放之后,才能被调度单元并发传输至目标位置,如果调度单元的线程未被占满,则散列后的分表直接被调度单元并发传输至目标位置。
仍旧结合图3所示,在一种可选的方案中,预先配置的总并发粒度可以为9,按照总并发粒度提取的分表为分表AT1、分表AT2、分表AT3、分表AT4、分表BT1、分表BT2、分表CT1、分表CT2和分表CT3。此时,三个分库中剩余的未被传输的分表为分表CT4,经过散列分配,该分表CT4应该被分配至调度单元TG1,由于调度单元TG1线程被占满,因此先将该分表CT4放入调度单元TG1的等待队列,如图3中虚线所示。等待分表AT1、AT4和CT1中任意一个分表被传输至目标位置,线程释放之后,等待队列中的分表CT4被调度至调度单元TG1中,并发传输至目标位置。
由上可知,在本申请上述实施例中,在n个分表并发传输至目标位置的过程中,获取多个分库中其余的m个分表,在散列分配之后分配给对应的调度单元,如果调度单元线程被占满,则进入等待队列等待调度单元的至少一个线程被释放,实现切分分配分表的目的,在线程全部被占用的情况下进入等待队列,在至少一个线程释放后进入调度单元,保证并发传输分库分表的任务时资源最大化利用。
需要说明的是,如果相同分库不同分表的表的记录数基本相同,抽取时间相差不大,那么通过上述实施例就可以达到目的,使分库分表任务的DB抽取压力均衡,但是这种任务只存在于理想情况下,大多数任务相同分库不同分表的记录数不会保持一致,抽取时间也无法预测,这时候需要调度模块去做一个控制调度,选出最优解。
在本申请提供的一种可选实施例中,上述调度终端133还用于将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置之后,在任意一个调度单元中存在空闲的线程的情况下,获取每个分库的当前并发数,其中,当前并发数用于表征分库中已经被调度至对应的调度单元中的分表数量,并根据每个分库的当前并发数,从任意一个调度单元的等待队列中调度对应数量的分表,将从对应的等待队列中调度对应数量的分表并发传输至目标位置。
可选地,上述每个分库的当前并发数可以保存在线程并发管理器(TC-MGR,Thread-Concurrency-Manager的简写)中,在启动线程时,将调度模块中线程对应的分库的并发数加1,如果线程执行完毕,则将线程并发管理器中对应的分库的并发数减1,这两个过程标记为同步(synchronized),线程之间互斥访问,为方便起见,将上述两个过程分别称为启动通道(holdChannel)和释放通道(releaseChannel)。采用贪心算法,在需要启动线程时,获取调度模块中的每个分库的当前并发数。
如图5所示,在一种可选的方案中,线程并发管理器中记录了分库A、分库B和分库C的当前并发数TAC、TBC和TCC,分库A的当前并发数TAC=X,分库B的当前并发数TBC=Y,分库C的当前并发数TCC=Z,调度管理单元(即上述的调度单元)包括3个通道(即上述的线程),通道1、通道2和通道3。调度单元TG0在3个线程中有至少一个线程空闲的情况下,例如线程1空闲,获取调度模块中每个分库的当前并发数。
可选地,当线程被释放之后,获取每个分库的当前并发数,并根据每个分库的当前并发数,从调度单元的等待队列中调度相应数量的分表,例如,如果调度单元释放了两个线程,则根据每个分库的当前并发数,从该调度单元的等待队列中调度两个分表。
在一种可选的方案中,如果线程并发管理器中记录的当下分库A的当前并发数可以为3,分库B的当前并发数可以为1,分库C的当前并发数可以为2,此时如果需要启动的线程个数为2,则可以从调度单元的等待队列中调度两个分库B的分表,如果等待队列中只有一个分库B的分表,则进一步调度等待队列中一个分库C的分表。如果调度单元TG1的等待队列中保存了分库B的一个分表T4和其他分库的分表,此时,可以根据每个分库的并发数的比例,以及线程被释放,优先选择调取分表T4进入调度单元TG1,如果TG1的线程被全部释放,则可以调取等待队列中除T4以外的其他分表。其他任意一个调度单元的调取规则同理。
可选地,在从任意一个调度单元的等待队列中调度对应数量的分表之后,将该对应数量的分表并发传输至目标位置。
由上可知,在本申请上述实施例中,在任意一个调度单元中存在空闲的线程的情况下,获取每个分库的当前并发数,并根据每个分库的当前并发数,从任意一个调度单元的等待队列中调度对应数量的分表,并发传输至目标位置,采用贪心算法,每当调度单元有通道线程饥饿时就先做启动通道,从而实现从局部最优到全局最优,进一步实现均衡并发传输分库分表的任务。
在本申请提供的一种可选实施例中,上述调度终端133还用于按照每个分库的当前并发数进行排序,确定属于不同分库的分表的调度优先级,其中,分库的当前并发数越低,该分库中的分表的调度优先级越高,并根据任意一个调度单元中存在的空闲线程数量,确定调度数量,在确定第一分库中的分表为调度优先级最高的分表之后,按照调度数量从等待队列中调度属于第一分库的分表。
可选地,获取到每个分库的当前并发数之后,将每个分库按照当前并发数进行升序排序,当前并发数最低的分库中的分表优先级最高,当前并发数最高的分库中的分表的优先级最低,确定调度单元的等待队列中每个分库分表的调度优先级。
可选地,在空闲线程数量小于等于调度单元的等待队列中的分表个数时,确定调度数量为空闲线程数量,在空闲线程数量大于调度单元的等待队列中的分表个数时,确定调度数量为调度单元的等待队列中的分表个数。
可选地,在确定属于不同分库的分表的调度优先级和调度数量之后,从调度单元的等待队列中调度符合调度数量和调度优先级的分表。
在一种可选的方案中,分库A的当前并发数可以为3,分库B的当前并发数可以为1,分库C的当前并发数可以为2,通过分析三个分库的并发度的比例可以确定分库B的当前并发数最低,表示在之前的并发过程中,分库B中分表的分发或同步效率最低,因此分库B的分表的调度优先级最高,系统需要尽快处理分库B中的分表,而分库A的当前并发数最高,因此分库A的分表的调度优先级最低。调度单元的等待队列中有分表AT5,分表AT6,分表BT3和分表CT5,在有一个线程空闲的情况下,按照调度优先级的分析结果,此时调度单元从等待队列中调度分表BT3进行同步。
由上可知,在本申请上述实施例中,获取每个分库的当前并发数之后,根据排序结果确定调度单元中每个分表的调度优先级,并根据空闲线程数从调度单元的等待队列中调度对应调度优先级和对应数量的分表,从而实现在分表记录数均不相同的情况下,根据每个分库的当前并发数选取最优解,使分库分表任务的抽取压力均衡。
需要说明的是,为了在获取每个分库的当前并发数之后,根据排序结果确定调度单元中每个分表的调度优先级,线程并发管理器中还需要保存每个调度管理单元的分库使用情况。
在本申请提供的一种可选实施例中,上述调度终端133还用于按照调度数量从等待队列中调度对应数量的分表之前,读取等待队列中属于第一分库的分表的数量,并判断属于第一分库的分表的数量是否大于等于调度数量,如果判断属于第一分库的分表的数量大于等于调度数量,则按照调度数量从等待队列中调度对应数量的分表;如果判断属于第一分库的分表的数量小于调度数量,则按照调度数量从等待队列中调度属于第一分库和属于其他分库的分表,其中,其他分库为当前并发数大于第一分库的分库。
在一种可选的方案中,分库A的当前并发数可以为3,分库B的当前并发数可以为1,分库C的当前并发数可以为2,通过分析三个分库的并发度的比例可以确定分库B的当前并发数最低,因此分库B的分表的调度优先级最高,分库A的当前并发数最高,因此分库A的分表的调度优先级最低,分库C的当前并发数位于两者之间,因此分库C的分表的调度优先级也位于两者之间。调度单元TG0的等待队列中有分表AT5,分表AT6,分表BT3和分表CT5,按照上述一种可选方案所确定的调度优先级的调度规则,可以读取等待队列中属于分库B的分表数量为1,属于分库C的分表数量为1,属于分库A的分表数量为2,在调度单元TG0有两个线程空闲的情况下,调度单元TG0从等待队列中调度分表BT3和CT5。
在另一种可选的方案中,分库A的当前并发数可以为3,分库B的当前并发数可以为1,分库C的当前并发数可以为2,通过分析三个分库的并发度的比例可以确定分库B的当前并发数最低,因此分库B的分表的调度优先级最高,分库A的当前并发数最高,因此分库A的分表的调度优先级最低,分库C的当前并发数位于两者之间,因此分库C的分表的调度优先级也位于两者之间。调度单元的等待队列中有分表AT5,分表AT6,分表BT3和分表CT5。按照上述另一种可选方案所确定的调度优先级的调度规则,可以读取等待队列中属于分库B的分表数量为1,属于分库C的分表数量为1,属于分库A的分表数量为2,在调度单元TG0有三个线程空闲的情况下,调度单元TG0从等待队列中调度分表BT3、分表CT5和分表AT5。
在又一种可选的方案中,分库A的当前并发数可以为3,分库B的当前并发数可以为2,分库C的当前并发数可以为2,通过分析三个分库的并发度的比例可以确定分库B和分库C的当前并发数最低,因此分库B和分库C的分表的调度优先级最高,分库A的当前并发数最高,因此分库A的分表的调度优先级最低。调度单元的等待队列中有分表AT5,分表AT6,分表BT3,分表BT4,分表BT5,分表CT5和分表CT6。按照上述又一种可选方案所确定的调度优先级的调度规则,可以读取等待队列中属于分库B的分表数量为3,属于分库C的分表数量为2,属于分库A的分表数量为2,在调度单元TG0有三个线程空闲的情况下,调度单元TG0从等待队列中调度分表BT3、分表BT4和分表CT5。
由上可知,在本申请上述实施例中,在判断读取到的属于第一分库的分表的数量大于等于调度数量的情况下,按照调度数量从等待队列中调度对应数量的分表;在判断读取到的属于第一分库的分表的数量小于调度数量的情况下,按照调度数量从等待队列中调度属于第一分库和属于其他分库的分表,从而实现动态调度均衡,降低分库分表任务对抽取端DB的压力,减少长尾任务。
在本申请提供的一种可选实施例中,上述调度终端133还用于对散列至不同的所述调度单元的分表标记分库标识,所述分库标识用于表征所述分表原始对应的分库。
实施例4
本发明的实施例可以提供一种计算机终端,该计算机终端可以是计算机终端群中的任意一个计算机终端设备。可选地,在本实施例中,上述计算机终端也可以替换为移动终端等终端设备。
可选地,在本实施例中,上述计算机终端可以位于计算机网络的多个网络设备中的至少一个网络设备。
在本实施例中,上述计算机终端可以执行基于分库分表的任务传输方法中以下步骤的程序代码:从分库分表中抽取待传输的任务集合,其中,任务集合包括:多个分库,以及每个分库所包含的分表;按照预先配置的总并发粒度从多个分库中调取n个分表,n等于总并发粒度;将调取到的n个分表采用散列分配方式散列至不同的调度单元,其中,预先配置每个调度单元的单元并发粒度相同;将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置。
可选地,图14是根据本发明实施例的一种计算机终端的结构框图。如图14所示,该计算机终端A可以包括:一个或多个(图中仅示出一个)处理器、存储器、以及传输装置。
其中,存储器可用于存储软件程序以及模块,如本发明实施例中的基于分库分表的任务传输方法和装置对应的程序指令/模块,处理器通过运行存储在存储器内的软件程序以及模块,从而执行各种功能应用以及数据处理,即实现上述的基于分库分表的任务传输方法。存储器可包括高速随机存储器,还可以包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器可进一步包括相对于处理器远程设置的存储器,这些远程存储器可以通过网络连接至终端A。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
处理器可以通过传输装置调用存储器存储的信息及应用程序,以执行下述步骤:从分库分表中抽取待传输的任务集合,其中,任务集合包括:多个分库,以及每个分库所包含的分表;按照预先配置的总并发粒度从多个分库中调取n个分表,n等于总并发粒度;将调取到的n个分表采用散列分配方式散列至不同的调度单元,其中,预先配置每个调度单元的单元并发粒度相同;将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置。
可选的,上述处理器还可以执行如下步骤的程序代码:根据总并发粒度和单元并发粒度,确定调度单元的数量,并对每个调度单元分配对应的编号;通过如下公式计算得到任意一个分库中的每个分表Ti的散列分配值,其中,散列分配值用于表征分表Ti所散列至对应的调度单元的编号Tpos:Tpos=(TCount+offset)%tgCount,tgCount=totalChannel/tgChannel,其中,TCount为任意一个分库中每个分表Ti的编号,offset为每个分库向对应的调度单元分配的偏移量,初始值为0,totalChannel为总并发粒度,tgChannel为单元并发粒度;其中,n个分表分别按照计算得到的散列分配值散列至对应的调度单元。
可选的,上述处理器还可以执行如下步骤的程序代码:实时获取多个分库中除n个分表之外的m个分表,并按照总并发粒度将m个分表采用散列分配方式散列至不同的调度单元,其中,如果调度单元的线程被占满的情况下,将分配至对应的调度单元的分表放置在对应的调度单元的等待队列中,在调度单元的至少一个线程被释放之后,将等待队列中的分表调度至对应的调度单元中。
可选的,上述处理器还可以执行如下步骤的程序代码:在任意一个调度单元中存在空闲的线程的情况下,获取每个分库的当前并发数,其中,当前并发数用于表征分库中已经被调度至对应的调度单元中的分表数量;根据每个分库的当前并发数,从任意一个调度单元的等待队列中调度对应数量的分表;将从对应的等待队列中调度对应数量的分表并发传输至目标位置。
可选的,上述处理器还可以执行如下步骤的程序代码:按照每个分库的当前并发数进行排序,确定属于不同分库的分表的调度优先级,其中,分库的当前并发数越低,当前并发数越低的分库中的分表的调度优先级越高;根据任意一个调度单元中存在的空闲线程数量,确定调度数量;在确定第一分库中的分表为调度优先级最高的分表之后,按照调度数量从等待队列中调度属于第一分库的分表。
可选的,上述处理器还可以执行如下步骤的程序代码:读取等待队列中属于第一分库的分表的数量;判断属于第一分库的分表的数量是否大于等于调度数量;其中,如果大于等于调度数量,则进入按照调度数量从等待队列中调度对应数量的分表的步骤;如果小于调度数量,则按照调度数量从等待队列中调度属于第一分库和属于其他分库的分表,其中,其他分库为当前并发数大于第一分库的分库。
采用本发明实施例,如果希望减小分库分表任务的并行传输,可以在抽取待传输的任务集合之后,按照预先配置的总并发粒度从多个分库中调取n个分表,通过将调取到的n个分表采用散列分配方式散列至不同的调度单元,将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置,本方案可以根据总并发粒度和单元并发粒度对待传输的任务集合进行拆分,以均衡并行传输分库分表的任务。
容易注意到,由于在抽取待传输的任务集合之后,需要按照预先配置的总并发粒度从多个分库中调取n个分表,并通过将调取到的n个分表采用散列分配方式散列至不同的调度单元,就可以根据散列后的分表并行发送任务集合,达到均衡并发传输分库分表的任务,因此,通过本申请实施例所提供的方案,可以实现基于分库分表的任务传输,这样不仅实现了分库分表的任务传输,而且,在传输分库分表的任务时,在满足预先配置的总并发粒度的情况下,均衡传输分库分表的任务,因此,可以均衡分库分表中的多个任务,降低并发读取数据的压力,提高并发传输效率。
由此,本申请提供的上述实施例的方案解决了现有技术中在基于分库分表的任务进行并发传输时,抽取端从分库分表DB中并发读取数据压力过大,导致并发传输分库分表的任务效率低的技术问题。
本领域普通技术人员可以理解,图14所示的结构仅为示意,计算机终端也可以是智能手机(如Android手机、iOS手机等)、平板电脑、掌上电脑以及移动互联网设备(Mobile Internet Devices,MID)、PAD等终端设备。图14并不对上述电子装置的结构造成限定。例如,计算机终端10还可包括比图14中所示更多或者更少的组件(如网络接口、显示装置等),或者具有与图14所示不同的配置。
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令终端设备相关的硬件来完成,该程序可以存储于一计算机可读存储介质中,存储介质可以包括:闪存盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁盘或光盘等。
实施例5
本发明的实施例还提供了一种存储介质。可选地,在本实施例中,上述存储介质可以用于保存上述实施例一所提供的基于分库分表的任务传输方法所执行的程序代码。
可选地,在本实施例中,上述存储介质可以位于计算机网络中计算机终端群中的任意一个计算机终端中,或者位于移动终端群中的任意一个移动终端中。
可选地,在本实施例中,存储介质被设置为存储用于执行以下步骤的程序代码:从分库分表中抽取待传输的任务集合,其中,任务集合包括:多个分库,以及每个分库所包含的分表;按照预先配置的总并发粒度从多个分库中调取n个分表,n等于总并发粒度;将调取到的n个分表采用散列分配方式散列至不同的调度单元,其中,预先配置每个调度单元的单元并发粒度相同;将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置。
可选地,在本实施例中,存储介质被设置为存储用于执行以下步骤的程序代码:根据总并发粒度和单元并发粒度,确定调度单元的数量,并对每个调度单元分配对应的编号;通过如下公式计算得到任意一个分库中的每个分表Ti的散列分配值,其中,散列分配值用于表征分表Ti所散列至对应的调度单元的编号Tpos:Tpos=(TCount+offset)%tgCount,tgCount=totalChannel/tgChannel,其中,TCount为任意一个分库中每个分表Ti的编号,offset为每个分库向对应的调度单元分配的偏移量,初始值为0,totalChannel为总并发粒度,tgChannel为单元并发粒度;其中,n个分表分别按照计算得到的散列分配值散列至对应的调度单元。
可选地,在本实施例中,存储介质被设置为存储用于执行以下步骤的程序代码:在将每个调度单元中包含的分表按照单元并发粒度并发传输至目标位置的过程中,实时获取多个分库中除n个分表之外的m个分表,并按照总并发粒度将m个分表采用散列分配方式散列至不同的调度单元,其中,如果调度单元的线程被占满的情况下,将分配至对应的调度单元的分表放置在对应的调度单元的等待队列中,在调度单元的至少一个线程被释放之后,将等待队列中的分表调度至对应的调度单元中。
可选地,在本实施例中,存储介质被设置为存储用于执行以下步骤的程序代码:在任意一个调度单元中存在空闲的线程的情况下,获取每个分库的当前并发数,其中,当前并发数用于表征分库中已经被调度至对应的调度单元中的分表数量;根据每个分库的当前并发数,从任意一个调度单元的等待队列中调度对应数量的分表;将从对应的等待队列中调度对应数量的分表并发传输至目标位置。
可选地,在本实施例中,存储介质被设置为存储用于执行以下步骤的程序代码:按照每个分库的当前并发数进行排序,确定属于不同分库的分表的调度优先级,其中,分库的当前并发数越低,当前并发数越低的分库中的分表的调度优先级越高;根据任意一个调度单元中存在的空闲线程数量,确定调度数量;在确定第一分库中的分表为调度优先级最高的分表之后,按照调度数量从等待队列中调度属于第一分库的分表。
可选地,在本实施例中,存储介质被设置为存储用于执行以下步骤的程序代码:读取等待队列中属于第一分库的分表的数量;判断属于第一分库的分表的数量是否大于等于调度数量;其中,如果大于等于调度数量,则进入按照调度数量从等待队列中调度对应数量的分表的步骤;如果小于调度数量,则按照调度数量从等待队列中调度属于第一分库和属于其他分库的分表,其中,其他分库为当前并发数大于第一分库的分库。
可选地,在本实施例中,存储介质被设置为存储:对散列至不同的调度单元的分表标记分库标识,分库标识用于表征分表原始对应的分库。
上述本发明实施例序号仅仅为了描述,不代表实施例的优劣。
在本发明的上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本申请所提供的几个实施例中,应该理解到,所揭露的技术内容,可通过其它的方式实现。其中,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,单元或模块的间接耦合或通信连接,可以是电性或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述仅是本发明的优选实施方式,应当指出,对于本技术领域的普通技术人员来说,在不脱离本发明原理的前提下,还可以做出若干改进和润饰,这些改进和润饰也应视为本发明的保护范围。

Claims (14)

  1. 一种基于分库分表的任务传输方法,其特征在于,包括:
    从分库分表中抽取待传输的任务集合,其中,所述任务集合包括:多个分库,以及每个分库所包含的分表;
    按照预先配置的总并发粒度从所述多个分库中调取n个分表,n等于所述总并发粒度;
    将调取到的所述n个分表采用散列分配方式散列至不同的调度单元,其中,预先配置每个调度单元的单元并发粒度相同;
    将所述每个调度单元中包含的分表按照所述单元并发粒度并发传输至目标位置。
  2. 根据权利要求1所述的方法,其特征在于,在将调取到的所述n个分表采用散列分配方式散列至不同的调度单元之前,所述方法还包括:
    根据所述总并发粒度和所述单元并发粒度,确定所述调度单元的数量,并对所述每个调度单元分配对应的编号;
    通过如下公式计算得到任意一个分库中的每个分表Ti的散列分配值,其中,所述散列分配值用于表征所述分表Ti所散列至对应的调度单元的编号Tpos:
    Tpos=(TCount+offset)%tgCount,tgCount=totalChannel/tgChannel,其中,TCount为所述任意一个分库中每个分表Ti的编号,offset为每个分库向对应的调度单元分配的偏移量,初始值为0,totalChannel为所述总并发粒度,tgChannel为所述单元并发粒度;
    其中,所述n个分表分别按照计算得到的散列分配值散列至对应的调度单元。
  3. 根据权利要求2所述的方法,其特征在于,在将所述每个调度单元中包含的分表按照所述单元并发粒度并发传输至目标位置的过程中,实时获取所述多个分库中除所述n个分表之外的m个分表,并按照所述总并发粒度将所述m个分表采用所述散列分配方式散列至所述不同的调度单元,其中,如果所述调度单元的线程被占满的情况下,将分配至对应的调度单元的分表放置在所述对应的调度单元的等待队列中,在所述调度单元的至少一个线程被释放之后,将所述等待队列中的分表调度至所述对应的调度单元中。
  4. 根据权利要求1至3中任意一项所述的方法,其特征在于,在将所述每个调度单元中包含的分表按照所述单元并发粒度并发传输至目标位置之后,所述方法还包括:
    在任意一个调度单元中存在空闲的线程的情况下,获取每个分库的当前并发数,其中,所述当前并发数用于表征分库中已经被调度至对应的调度单元中的分表数量;
    根据所述每个分库的当前并发数,从所述任意一个调度单元的等待队列中调度对应数量的分表;
    将从对应的等待队列中调度对应数量的分表并发传输至所述目标位置。
  5. 根据权利要求4所述的方法,其特征在于,根据所述每个分库的当前并发数,从所述任意一个调度单元的等待队列中调度对应数量的分表,包括:
    按照所述每个分库的当前并发数进行排序,确定属于不同分库的分表的调度优先级,其中,分库的当前并发数越低,所述当前并发数越低的分库中的分表的调度优先级越高;
    根据所述任意一个调度单元中存在的空闲线程数量,确定调度数量;
    在确定第一分库中的分表为所述调度优先级最高的分表之后,按照所述调度数量从所述等待队列中调度属于所述第一分库的分表。
  6. 根据权利要求5所述的方法,其特征在于,在按照所述调度数量从所述等待队列中调度对应数量的分表之前,所述方法还包括:
    读取所述等待队列中属于所述第一分库的分表的数量;
    判断所述属于所述第一分库的分表的数量是否大于等于所述调度数量;
    其中,如果大于等于所述调度数量,则进入按照所述调度数量从所述等待队列中调度对应数量的分表的步骤;
    如果小于所述调度数量,则按照所述调度数量从所述等待队列中调度属于所述第一分库和属于其他分库的分表,其中,所述其他分库为当前并发数大于所述第一分库的分库。
  7. 根据权利要求1所述的方法,其特征在于,对散列至不同的所述调度单元的分表标记分库标识,所述分库标识用于表征所述分表原始对应的分库。
  8. 一种基于分库分表的任务传输装置,其特征在于,包括:
    抽取模块,用于从分库分表中抽取待传输的任务集合,其中,所述任务集合包括:多个分库,以及每个分库所包含的分表;
    调取模块,用于按照预先配置的总并发粒度从所述多个分库中共调取n个分表,n等于所述总并发粒度;
    处理模块,用于将调取到的所述n个分表采用散列分配方式散列至不同的调度单元,其中,预先配置每个调度单元的单元并发粒度相同;
    并发模块,用于将所述每个调度单元中包含的分表按照所述单元并发粒度并发传输至目标位置。
  9. 根据权利要求8所述的装置,其特征在于,所述装置还包括:
    第一确定模块,用于根据所述总并发粒度和所述单元并发粒度,确定所述调度单元的数量,并对所述每个调度单元分配对应的编号;
    计算模块,用于通过如下公式计算得到任意一个分库中的每个分表Ti的散列分配值,其中,所述散列分配值用于表征所述分表Ti所散列至对应的调度单元的编号Tpos:
    Tpos=(TCount+offset)%tgCount,tgCount=totalChannel/tgChannel,其中,TCount为所述任意一个分库中每个分表Ti的编号,offset为每个分库向对应的调度单元分配的偏移量,初始值为0,totalChannel为所述总并发粒度,tgChannel为所述单元并发粒度;
    其中,所述处理模块包括:第一散列分配模块,用于所述n个分表分别按照计算得到的散列分配值散列至对应的调度单元。
  10. 根据权利要求9所述的装置,其特征在于,所述装置还包括,
    第一获取模块,用于实时获取所述多个分库中除所述n个分表之外的m个分表;
    第二散列分配模块,用于按照所述总并发粒度将所述m个分表采用所述散列分配方式散列至所述不同的调度单元;
    子处理模块,用于如果所述调度单元的线程被占满的情况下,将分配至对应的调度单元的分表放置在所述对应的调度单元的等待队列中,在所述调度单元的至少一个线程被释放之后,将所述等待队列中的分表调度至所述对应的调度单元中。
  11. 根据权利要求8至10中任意一项所述的装置,其特征在于,所述装置还包括:
    第二获取模块,用于在任意一个调度单元中存在空闲的线程的情况下,获取每个分库的当前并发数,其中,所述当前并发数用于表征分库中已经被调度至对应的调度单元中的分表数量;
    调度模块,用于根据所述每个分库的当前并发数,从所述任意一个调度单元的等待队列中调度对应数量的分表;
    传输模块,用于将从对应的等待队列中调度对应数量的分表并发传输至所述目标位置。
  12. 根据权利要求11所述的装置,其特征在于,所述调度模块包括:
    排序模块,用于按照所述每个分库的当前并发数进行排序,确定属于不同分库的分表的调度优先级,其中,分库的当前并发数越低,所述当前并发数越低的分库中的分表的调度优先级越高;
    第二确定模块,用于根据所述任意一个调度单元中存在的空闲线程数量,确定调度数量;
    子调度模块,用于在确定第一分库中的分表为所述调度优先级最高的分表之后,按照所述调度数量从所述等待队列中调度属于所述第一分库的分表。
  13. 根据权利要求12所述的装置,其特征在于,所述装置还包括:
    读取模块,用于读取所述等待队列中属于所述第一分库的分表的数量;
    判断模块,用于判断所述属于所述第一分库的分表的数量是否大于等于所述调度数量;
    第一执行模块,用于如果大于等于所述调度数量,则执行所述子调度模块的功能;
    第二执行模块,用于如果小于所述调度数量,则按照所述调度数量从所述等待队列中调度属于所述第一分库和属于其他分库的分表,其中,所述其他分库为当前并发数大于所述第一分库的分库。
  14. 一种基于分库分表的任务传输系统,其特征在于,包括:
    源数据终端,用于存储分库分表;
    调度终端,与所述源数据终端通信,用于从所述分库分表中抽取待传输的任务集合,其中,所述任务集合包括:多个分库,以及每个分库所包含的分表,并按照预先配置的总并发粒度从所述多个分库中共调取n个分表,n等于所述总并发粒度,在将调取到的所述n个分表采用散列分配方式散列至不同的调度单元之后,将每个调度单元中包含的分表按照单元并发粒度并发传输,其中,预先配置所述每个调度单元的所述单元并发粒度相同;
    目标终端,与所述调度终端通信,用于接收所述调度终端并发传输来的所述任务集合。
PCT/CN2016/107409 2015-12-07 2016-11-28 基于分库分表的任务传输方法、装置及系统 WO2017097124A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510888403.9A CN106844397B (zh) 2015-12-07 2015-12-07 基于分库分表的任务传输方法、装置及系统
CN201510888403.9 2015-12-07

Publications (1)

Publication Number Publication Date
WO2017097124A1 true WO2017097124A1 (zh) 2017-06-15

Family

ID=59012685

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/107409 WO2017097124A1 (zh) 2015-12-07 2016-11-28 基于分库分表的任务传输方法、装置及系统

Country Status (2)

Country Link
CN (1) CN106844397B (zh)
WO (1) WO2017097124A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112965956A (zh) * 2021-03-18 2021-06-15 上海东普信息科技有限公司 数据库水平扩容方法、装置、设备和存储介质

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766169B (zh) * 2017-11-10 2023-07-04 阿里巴巴集团控股有限公司 Method for scheduling data access volume, control component, device and computer storage medium
CN109962951B (zh) * 2017-12-25 2022-04-15 航天信息股份有限公司 Cloud platform monitoring data system
CN111597041B (zh) * 2020-04-27 2023-01-10 深圳市金证科技股份有限公司 Invocation method and apparatus for a distributed system, terminal device and server
CN111930741A (zh) * 2020-07-15 2020-11-13 中国银行股份有限公司 Database sharding method and apparatus, and transaction request data read/write system
CN113065084B (zh) * 2021-03-08 2022-12-23 南京苏宁软件技术有限公司 Data loading method and apparatus, computer device and storage medium
CN112765184A (zh) * 2021-04-07 2021-05-07 四川新网银行股份有限公司 Real-time collection method based on MySQL database and table sharding
CN114238333A (zh) * 2021-12-17 2022-03-25 中国邮政储蓄银行股份有限公司 Data splitting method, apparatus and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678408A (zh) * 2012-09-21 2014-03-26 阿里巴巴集团控股有限公司 Method and apparatus for querying data
CN103942209A (zh) * 2013-01-18 2014-07-23 阿里巴巴集团控股有限公司 Data processing method
CN104317749A (zh) * 2014-10-31 2015-01-28 小米科技有限责任公司 Information writing method and apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014160029A1 (en) * 2013-03-14 2014-10-02 Gamesys Ltd Systems and methods for dynamic sharding


Also Published As

Publication number Publication date
CN106844397B (zh) 2020-05-12
CN106844397A (zh) 2017-06-13

Similar Documents

Publication Publication Date Title
WO2017097124A1 (zh) Task transmission method, apparatus and system based on database sharding and table sharding
CN102831120B (zh) Data processing method and system
CN103516536B (zh) Method and system for parallel processing of server service requests based on a thread count limit
CN107391629B (zh) Inter-cluster data migration method and system, server, and computer storage medium
KR101994021B1 (ko) File manipulation method and apparatus
CN107241281B (zh) Data processing method and apparatus
CN104969213A (zh) Data stream splitting for low-latency data access
TWI680404B (zh) Data virtualization storage method and apparatus
CN104348859B (zh) File synchronization method, apparatus, server, terminal and system
CN107515784A (zh) Method and device for computing resources in a distributed system
CN103324487A (zh) Implementation method for providing a process engine as SaaS
CN109062697A (zh) Method and apparatus for providing spatial analysis services
CN103701653B (zh) Processing method for interface hot-plug configuration data and network configuration server
TW201738781A (zh) Data table join method and apparatus
CN111813573A (zh) Communication method between a management platform and robot software, and related devices
CN113419846A (zh) Resource allocation method and apparatus, electronic device, and computer-readable storage medium
US11947534B2 (en) Connection pools for parallel processing applications accessing distributed databases
CN108399175B (zh) Data storage and query method and apparatus
CN104731660B (zh) Data distribution method, apparatus and system
CN101833585A (zh) Database server operation control system, method and device
CN116069493A (zh) Data processing method, apparatus, device, and readable storage medium
CN102970349A (zh) Storage load balancing method for a DHT network
CN116304390B (zh) Time-series data processing method and apparatus, storage medium, and electronic device
CN111291045A (zh) Data transmission method and apparatus for service isolation, computer device, and storage medium
US10572486B2 (en) Data communication in a distributed data grid

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 16872316
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 16872316
    Country of ref document: EP
    Kind code of ref document: A1