WO2018001200A1 - Data processing method, cluster manager, resource manager, and data processing system - Google Patents

Data processing method, cluster manager, resource manager, and data processing system

Info

Publication number
WO2018001200A1
WO2018001200A1 (PCT/CN2017/090001)
Authority
WO
WIPO (PCT)
Prior art keywords
node
data
chase
import
increment
Prior art date
Application number
PCT/CN2017/090001
Other languages
English (en)
French (fr)
Inventor
张宗禹
丁岩
徐宜良
郭龙波
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Priority to EP17819197.9A (granted as EP3480686B1)
Publication of WO2018001200A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F3/0617 Improving the reliability of storage systems in relation to availability
    • G06F3/0647 Migration mechanisms
    • G06F3/0607 Improving or facilitating administration by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • G06F3/061 Improving I/O performance
    • G06F3/0643 Management of files
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present application relates to, but is not limited to, the field of communications, and more particularly to a data processing method, a cluster manager, a resource manager, and a data processing system.
  • Because distributed databases involve a large number of distributed nodes, the amount of data they must support is much larger than that of a single-machine database. Distributed databases therefore face data redistribution problems that stand-alone databases do not.
  • One of the most common online redistribution modes for distributed databases is data pre-distribution. This mode requires the data distribution rules to be designed in advance: when a table is designed, it is pre-segmented into a large number of pre-allocated sub-tables placed in different databases, and when the table needs to be redistributed online, one or several of these pieces are migrated (copied) to a specified node.
  • In this mode, incremental data can only be handled at the granularity of a whole database. The data must be divided in advance according to its characteristics, and its routing rules must be specified. If this design is unreasonable, some databases become too large; and if a database's own data is already very large, this method cannot be used to redistribute that database.
  • Embodiments of the present invention provide a data processing method, a cluster manager, a resource manager, and a data processing system, so that data redistribution can be completed even when the amount of database data is large.
  • A data processing method includes: receiving a data redistribution request; instructing, according to the data redistribution request, a first resource manager to split the original data stored in a first node and to import each piece of the split sub-original data into the second node corresponding to that piece, wherein the first resource manager is configured to manage data stored in the first node; and instructing the first resource manager to import the catch-up incremental data stored in the first node during the splitting and importing process into the second node corresponding to that catch-up incremental data.
  • Instructing the first resource manager to import the catch-up incremental data stored in the first node during the splitting and importing process into the corresponding second node includes: acquiring the logical transaction log of the second node corresponding to the catch-up incremental data after the sub-original data has been imported, wherein the logical transaction log describes the sub-original data stored in the second node; and sending a catch-up increment indication to the first resource manager according to the logical transaction log, wherein the indication instructs the first resource manager to determine, according to the logical transaction log, the catch-up incremental data stored in the first node during the splitting and importing process, and to import that data into the corresponding second node.
  • After the first resource manager is instructed to import the catch-up incremental data stored in the first node during the splitting and importing process into the corresponding second node, the method further includes: receiving a first check value, an import time, and an import result of the catch-up incremental data from the first resource manager, and receiving a second check value of the catch-up incremental data from a second resource manager, wherein the second resource manager is configured to manage data stored in the second node; when the import result indicates a successful import, determining whether the first check value and the second check value are the same; and performing at least one of the following: if the first check value and the second check value are not the same, re-executing the process of instructing the first resource manager to import the catch-up incremental data into the corresponding second node; if they are the same, determining whether the import time is less than a predetermined threshold and acting according to the result of that determination.
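  • The check-and-decide step above can be sketched as follows. This is an illustrative, non-authoritative sketch: the names `check_value` and `verify_catch_up` and the threshold value are assumptions for illustration, not part of the patent text.

```python
import hashlib

# Hypothetical sketch of the cluster manager's verification step.
# check_value(), verify_catch_up(), and IMPORT_TIME_THRESHOLD are
# illustrative assumptions, not names from the patent.

IMPORT_TIME_THRESHOLD = 5.0  # seconds; an arbitrary illustrative value


def check_value(rows):
    """Order-independent check value over a set of rows."""
    digest = hashlib.md5()
    for row in sorted(map(str, rows)):
        digest.update(row.encode("utf-8"))
    return digest.hexdigest()


def verify_catch_up(first_check, second_check, import_ok, import_time):
    """Decide the next action after one catch-up import round."""
    if not import_ok:
        return "retry"            # import failed: redo the catch-up import
    if first_check != second_check:
        return "retry"            # check values differ: redo the import
    if import_time >= IMPORT_TIME_THRESHOLD:
        return "another_round"    # too much new delta: run another round
    return "finalize"             # delta is small: stop writes and finish
```

When `verify_catch_up` returns `"finalize"`, the cluster manager would terminate new writes on the first node and import the final small delta.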
  • A data processing method includes: receiving a data redistribution indication from a cluster manager; splitting the original data stored in a first node according to the data redistribution indication and importing each piece of the split sub-original data into the second node corresponding to that piece; receiving a catch-up increment indication from the cluster manager; and importing, according to the catch-up increment indication, the catch-up incremental data stored in the first node during the splitting and importing process into the second node corresponding to that data.
  • Importing each piece of the split sub-original data into the corresponding second node includes: exporting the original data from the first node according to the data redistribution indication; splitting the original data according to a data distribution rule into a plurality of files, wherein one file corresponds to one second node; and uploading the plurality of files to their corresponding second nodes for import.
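  • The export-and-split step can be sketched as follows. This is an assumed illustration: the byte-sum hash used as the distribution rule and the in-memory buckets standing in for per-node files are both stand-ins, not the patent's actual rule.

```python
# Sketch of splitting exported rows into one bucket (file) per target
# second node according to a data distribution rule. The byte-sum hash
# is an assumed example rule.

def hash_key(key):
    """Stable hash so the same key always routes to the same node."""
    return sum(str(key).encode("utf-8"))


def split_rows(rows, num_target_nodes):
    """Group exported (key, value) rows into one bucket per target node."""
    buckets = {node: [] for node in range(num_target_nodes)}
    for key, value in rows:
        node = hash_key(key) % num_target_nodes  # data distribution rule
        buckets[node].append((key, value))
    return buckets  # each bucket would be written to a file and uploaded
```

Because the hash is stable, repeating the split over the same rows always yields the same routing, which is what lets later catch-up rounds send each delta row to the same second node.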
  • Importing the catch-up incremental data stored in the first node during the splitting and importing process into the corresponding second node according to the catch-up increment indication includes: determining the catch-up incremental data according to the logical transaction log carried in the catch-up increment indication, wherein the logical transaction log describes the sub-original data stored in the second node; generating data manipulation language (DML) statements from the catch-up incremental data; and importing the catch-up incremental data into the corresponding second node by executing the DML statements.
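  • Generating DML statements from catch-up records might look like the following sketch. The (operation, table, row) record format is an assumption for illustration; a real implementation would use parameterized statements rather than string formatting.

```python
# Turn a catch-up incremental record into a DML statement to replay on
# the second node. The (operation, table, row) record format is assumed.

def to_dml(record):
    op, table, row = record
    if op == "insert":
        cols = ", ".join(row)
        vals = ", ".join(repr(v) for v in row.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals})"
    if op == "delete":
        cond = " AND ".join(f"{k} = {v!r}" for k, v in row.items())
        return f"DELETE FROM {table} WHERE {cond}"
    raise ValueError(f"unsupported operation: {op}")
```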
  • After the catch-up incremental data stored in the first node during the splitting and importing process is imported into the corresponding second node according to the catch-up increment indication, the method further includes: determining a first check value of the catch-up incremental data, together with the import time and import result of importing the catch-up incremental data into the second node; and sending the first check value, the import time, and the import result to the cluster manager.
  • The method further includes at least one of the following: receiving a catch-up re-import indication from the cluster manager and, according to it, re-executing the process of importing the catch-up incremental data into the corresponding second node; receiving a termination indication from the cluster manager, terminating the storage of new data on the first node according to the termination indication, and importing the data stored in the first node that has not yet been imported into its corresponding second node; and receiving a repeated import indication from the cluster manager and, according to it, repeating the step of importing the not-yet-imported data into the corresponding second node until the time taken to import that data is less than a predetermined threshold.
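  • The repeat-until-fast-enough behaviour above converges because each round only has to import the delta that accumulated during the previous, shorter round. The following toy model illustrates this; all rates and the threshold are invented numbers, not values from the patent.

```python
# Toy model of iterative catch-up: each round imports the delta that
# accumulated during the previous round; rounds repeat until one round
# finishes under the threshold, after which writes are terminated and a
# final round imports the last small delta.

def catch_up_rounds(initial_delta, import_rate, write_rate, threshold):
    """Rounds needed until a round's import time drops below threshold.

    import_rate: rows imported per second; write_rate: new rows arriving
    per second during a round. Converges only when write_rate < import_rate.
    """
    delta = initial_delta
    rounds = 0
    while delta / import_rate >= threshold:
        round_time = delta / import_rate
        delta = write_rate * round_time  # delta accumulated during round
        rounds += 1
    return rounds + 1  # plus one final round with new writes terminated
```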
  • A cluster manager includes: a first receiving module configured to receive a data redistribution request; a first indication module configured to instruct, according to the data redistribution request, a first resource manager to split the original data stored in a first node and to import each piece of the split sub-original data into the second node corresponding to that piece, wherein the first resource manager is configured to manage data stored in the first node; and a second indication module configured to instruct the first resource manager to import the catch-up incremental data stored in the first node during the splitting and importing process into the second node corresponding to that data.
  • The second indication module includes: an acquiring unit configured to acquire the logical transaction log of the second node corresponding to the catch-up incremental data after the sub-original data has been imported, wherein the logical transaction log describes the sub-original data stored in the second node; and a sending unit configured to send a catch-up increment indication to the first resource manager according to the logical transaction log, wherein the indication instructs the first resource manager to determine, according to the logical transaction log, the catch-up incremental data stored in the first node during the splitting and importing process and to import it into the corresponding second node.
  • The cluster manager further includes a first processing module configured to: after the first resource manager imports the catch-up incremental data stored in the first node during the splitting and importing process into the corresponding second node, receive a first check value, an import time, and an import result of the catch-up incremental data from the first resource manager, and receive a second check value of the catch-up incremental data from a second resource manager, wherein the second resource manager is configured to manage data stored in the second node; when the import result indicates a successful import, determine whether the first check value and the second check value are the same; and perform at least one of the following: if the first check value and the second check value are not the same, re-execute the process of instructing the first resource manager to import the catch-up incremental data into the corresponding second node; if they are the same, determine whether the import time is less than a predetermined threshold and act according to the result of that determination.
  • A resource manager includes: a second receiving module configured to receive a data redistribution indication from a cluster manager; a first import module configured to split the original data stored in a first node according to the data redistribution indication and to import each piece of the split sub-original data into the second node corresponding to that piece; a third receiving module configured to receive a catch-up increment indication from the cluster manager; and a second import module configured to import, according to the catch-up increment indication, the catch-up incremental data stored in the first node during the splitting and importing process into the second node corresponding to that data.
  • The resource manager further includes a second processing module configured to: after the catch-up incremental data stored in the first node during the splitting and importing process is imported into the corresponding second node according to the catch-up increment indication, determine a first check value of the catch-up incremental data, together with the import time and import result of importing the catch-up incremental data into the second node; and send the first check value, the import time, and the import result to the cluster manager.
  • The resource manager further includes a third processing module configured to: after the first check value, the import time, and the import result are sent to the cluster manager, receive a catch-up re-import indication from the cluster manager and, according to it, re-execute the process of importing the catch-up incremental data into the corresponding second node; or receive a termination indication from the cluster manager, terminate the storage of new data on the first node according to the termination indication, and import the data stored in the first node that has not yet been imported into its corresponding second node.
  • A data processing system includes a cluster manager and a first resource manager, wherein the cluster manager is connected to the first resource manager and is configured to send a data redistribution indication and a catch-up increment indication to the first resource manager according to a received data redistribution request; and the first resource manager is configured to split the original data stored in a first node according to the data redistribution indication from the cluster manager, import each piece of the split sub-original data into the second node corresponding to that piece, and import the catch-up incremental data stored in the first node during the splitting and importing process into the second node corresponding to that data.
  • The data processing system further includes a second resource manager connected to the cluster manager, wherein the second resource manager is configured to manage data stored in the second node. The cluster manager is further configured to: after sending the catch-up increment indication to the first resource manager, receive a first check value, an import time, and an import result of the catch-up incremental data from the first resource manager, and receive a second check value of the catch-up incremental data from the second resource manager; when the import result indicates a successful import, determine whether the first check value and the second check value are the same; and perform at least one of the following: if the first check value and the second check value are not the same, send a catch-up re-import indication to the first resource manager; if they are the same, determine whether the import time is less than a predetermined threshold, and if it is, send a termination indication to the first resource manager.
  • The first resource manager is configured to: determine the catch-up incremental data according to the logical transaction log carried in the catch-up increment indication, wherein the logical transaction log describes the sub-original data stored in the second node; generate data manipulation language (DML) statements from the catch-up incremental data; and import the catch-up incremental data into the corresponding second node by executing the DML statements.
  • a storage medium is also provided.
  • the storage medium is arranged to store program code for performing the above steps.
  • Online data redistribution is adopted, which avoids service interruption; and because the incremental data itself is redistributed, only that incremental data needs to be redistributed, without redistributing the entire database in which it is located. Data redistribution is thereby ensured even when the amount of database data is large. Thus service interruption is avoided while data redistribution is guaranteed for large databases.
  • FIG. 1 is a block diagram showing the hardware configuration of a computer terminal of a data processing method according to an embodiment of the present invention.
  • FIG. 2 is a flow chart of a data processing method according to an embodiment of the present invention.
  • FIG. 3 is a flow chart of another data processing method according to an embodiment of the present invention.
  • FIG. 4 is a timing diagram of a distributed database online redistribution catch-up increment according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a redistribution increment of a regional payment record according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram showing the redistribution increment of a train ticket record according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of distributed database online redistribution catch-up incremental data and verification according to an embodiment of the present invention.
  • FIG. 8 is a structural block diagram of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 9 is a block diagram showing the structure of another data processing apparatus according to an embodiment of the present invention.
  • FIG. 1 is a hardware block diagram of a computer terminal of a data processing method according to an embodiment of the present invention.
  • The computer terminal 10 may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)).
  • FIG. 1 is merely illustrative and does not limit the structure of the above electronic device.
  • computer terminal 10 may also include more or fewer components than those shown in FIG. 1, or have a different configuration than that shown in FIG.
  • The memory 104 can be configured to store software programs and modules of application software, such as program instructions/modules corresponding to the data processing method in the embodiment of the present invention. The processor 102 executes the software programs and modules stored in the memory 104, thereby performing various functional applications and data processing, that is, implementing the above method.
  • Memory 104 may include high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • memory 104 may further include memory remotely located relative to processor 102, which may be coupled to computer terminal 10 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • Transmission device 106 is arranged to receive or transmit data via a network.
  • the network examples described above may include a wireless network provided by a communication provider of computer terminal 10.
  • the transmission device 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station to communicate with the Internet.
  • the transmission device 106 can be a Radio Frequency (RF) module configured to communicate with the Internet wirelessly.
  • FIG. 2 is a flowchart of a data processing method according to an embodiment of the present invention. As shown in FIG. 2, the process includes the following steps:
  • Step S202: receiving a data redistribution request.
  • Step S204: instructing, according to the data redistribution request, the first resource manager to split the original data stored in the first node and to import each piece of the split sub-original data into the second node corresponding to that piece, wherein the first resource manager is configured to manage data stored in the first node.
  • Step S206: instructing the first resource manager to import the catch-up incremental data stored in the first node during the splitting and importing process into the second node corresponding to that data.
  • The entity performing the above operations may be a cluster manager.
  • The cluster manager can receive cluster-related requests from upper-layer services (such as the data redistribution request described above), manage the distributed cluster, coordinate status reports from each resource manager's database (DB), and notify resource managers of commands such as switching, backup, and redistribution.
  • The first resource manager described above is usually an upper-layer agent of a database (including the first node mentioned above): a local database monitoring program that responds to upper-layer requests for complex operations on the database.
  • the first resource manager may be configured to execute a redistribution process in response to a redistribution request of the cluster manager.
  • The first node and the second node described above are both basic nodes configured to store data.
  • Instructing the first resource manager to import the catch-up incremental data stored in the first node during the splitting and importing process into the corresponding second node includes: acquiring the logical transaction log of the second node corresponding to the catch-up incremental data after the sub-original data has been imported, wherein the logical transaction log describes the sub-original data stored in the second node; and sending a catch-up increment indication to the first resource manager according to the logical transaction log, wherein the indication instructs the first resource manager to determine, according to the logical transaction log, the catch-up incremental data stored in the first node during the splitting and importing process and to import it into the corresponding second node.
  • The first resource manager may determine, according to the logical transaction log, which data in the first node has already been backed up to the second node, and thereby determine the data that has not been backed up, that is, the catch-up incremental data.
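  • In other words, the logical transaction log acts as a high-water mark. A minimal sketch follows; the log-as-list-of-(position, entry) structure is an assumption for illustration.

```python
# The logical transaction log records the last position already present
# on the second node; everything recorded after it on the first node is
# the catch-up incremental data. The log structure here is assumed.

def delta_since(first_node_log, imported_position):
    """Entries recorded on the first node after the imported position."""
    return [entry for pos, entry in first_node_log
            if pos > imported_position]
```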
  • The method further includes: receiving the first check value, import time, and import result of the catch-up incremental data from the first resource manager, and receiving the second check value of the catch-up incremental data from the second resource manager, wherein the second resource manager is configured to manage data stored in the second node; when the import result indicates a successful import, determining whether the first check value and the second check value are the same; if they are different, re-executing the process of instructing the first resource manager to import the catch-up incremental data into the corresponding second node; and/or, if they are the same and the import time is less than a predetermined threshold, controlling the first node to terminate the storage of new data and instructing the first resource manager to import the data stored in the first node and not yet imported into the second node into the corresponding second node; and/or, if the import time is greater than or equal to the predetermined threshold, continuing to perform the step of instructing the first resource manager to import the not-yet-imported data into the corresponding second node, until the time for importing that data is less than the predetermined threshold.
• when it is determined that the import time is greater than or equal to the predetermined threshold, this indicates that the amount of new data accumulated in the first node is too large; in order to ensure service continuity, the processing of importing the data that is stored in the first node but not yet imported into the second node is executed cyclically (each loop iteration includes the above determination of the import result, the comparison of the two check values, and the judgment of the import time), that is, each loop performs a new chase increment operation on the data stored in the first node but not yet imported into the second node; and,
• once a loop iteration finds the import time below the threshold, the operation of controlling the first node to terminate the storage of new data and instructing the first resource manager to import the remaining not-yet-imported data into the corresponding second node is performed.
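The check-and-loop control flow described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper `run_chase_round`, its return tuple, and the threshold value are hypothetical names introduced for the example.

```python
# Hypothetical sketch of the cluster manager's chase-increment control loop.
# run_chase_round is an assumed callable that performs one chase round and
# returns (import_ok, old_node_check, new_node_check, import_secs).

def chase_until_converged(run_chase_round, max_rounds=100, threshold_secs=5.0):
    """Repeat chase-increment rounds until one round's import time falls
    below the threshold, then signal that the locked final round may begin."""
    for _ in range(max_rounds):
        ok, check_old, check_new, secs = run_chase_round()
        if not ok or check_old != check_new:
            # import failed or check values differ: replay this round
            continue
        if secs < threshold_secs:
            # residual increment is small enough: lock table, final chase
            return "lock_and_finish"
        # otherwise loop: a new round chases the data written meanwhile
    return "gave_up"
```

Each iteration mirrors the verification steps above: judge the import result, compare the two check values, then judge the import time against the threshold.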
• before receiving the second check value of the above-mentioned chase increment data from the second resource manager, the following operation is performed: sending a chase increment data check request to the second resource manager, wherein the chase increment data check request carries the above logical transaction log.
• the second resource manager generates, according to the chase increment data check request, a query statement for querying the data in the second node, thereby acquiring the data, calculating a check value, and returning the calculated check value.
• the data redistribution method in the embodiment of the present invention does not need to analyze the data in advance, avoids the problem of unreasonably designed data distribution rules, and does not require distribution rules to be designed for library-level data. To ensure load balancing across nodes, reasonable distribution rules for each node can be designed according to the current scenario. Moreover, once a certain node grows too large and needs to be redistributed, the client notifies the cluster manager, and the cluster manager controls the resource manager of each DB to perform the redistribution process.
  • the original data of the table is exported, re-split according to the latest data distribution rules, and imported into each new data node.
  • the incremental data flow is started.
  • the cluster manager collects the incremental results for each node and initiates an incremental verification process to the new node.
  • the cluster manager compares the check values of the old and new nodes and completes the incremental check.
• in the related art, data redistribution does not support verification of incremental data. Moreover, the chase increment in the related art directly parses and replays the log using the database's native tools, so there is no way to verify whether the newly added data is correct. As a result, the data redistribution process may cause data errors and affect subsequent normal services.
• in the embodiment, a data verification operation is added: the first check value of the chase increment data reported by the first resource manager is compared with the second check value of the chase increment data reported by the second resource manager, to determine whether the redistributed data has been correctly imported into the second node, which ensures the integrity of the data and provides a guarantee for the normal operation of subsequent services.
  • FIG. 3 is a flowchart of another data processing method according to an embodiment of the present invention. As shown in FIG. 3, the process includes the following steps:
• Step S302: receiving a data redistribution indication from the cluster manager;
• Step S304: splitting the original data stored in the first node according to the data redistribution indication, and importing each piece of the split sub-original data into the second node corresponding to that piece;
• Step S306: receiving a chase increment indication from the cluster manager;
• Step S308: importing, according to the chase increment indication, the chase increment data stored in the first node during the splitting and importing process into the second node corresponding to the chase increment data.
• the execution body of the above operations may be the first resource manager, which is configured to manage the data in the first node.
• splitting the original data stored in the first node according to the data redistribution indication, and importing each piece of the split sub-original data into the corresponding second node, includes: exporting the original data from the first node according to the data redistribution indication; splitting the original data according to a data distribution rule into multiple files, wherein one file corresponds to one second node; and uploading the multiple files to their corresponding second nodes for importing.
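The split step can be sketched as follows. The hash-by-key rule and the node count are assumptions for the example; the patent leaves the distribution rule configurable (hash, range, administrative region, and so on).

```python
# Illustrative sketch of splitting exported records into one bucket (file)
# per second node under an assumed hash distribution rule.
import hashlib

def split_records(records, key_of, node_count):
    """Group records into node_count buckets by hashing each record's
    distribution key; one bucket later becomes one per-node import file."""
    buckets = {i: [] for i in range(node_count)}
    for rec in records:
        key = key_of(rec)
        node = int(hashlib.md5(key.encode()).hexdigest(), 16) % node_count
        buckets[node].append(rec)
    return buckets
```

In the patent's flow each bucket would be written to its own file and uploaded to the matching second node for import.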
• the above splitting rules may be flexibly adjusted. As the distributed database runs, its data grows; once the previously determined splitting rules are found to be problematic, they may be adjusted at any time. The splitting rules may correspond to the distribution rules of the original data described above; similarly, the distribution rules of the original data may be flexibly adjusted.
• importing, according to the above-mentioned chase increment indication, the chase increment data stored in the first node during the splitting and importing process into the second node corresponding to the chase increment data includes: determining the chase increment data according to the logical transaction log carried in the chase increment indication, where the logical transaction log is used to describe the information of the sub-original data stored in the second node; generating a Data Manipulation Language (DML) statement according to the chase increment data; and using the DML statement to import the chase increment data into the second node corresponding to the chase increment data.
• DML: Data Manipulation Language
• when generating the DML statement, an initial DML statement for importing the data into the first node may first be generated, and then modified to obtain the above DML statement. The main modifications include: replacing the old node name (i.e., the name of the first node) in the initial DML statement with the new node name (i.e., the name of the second node), and specifying a new distribution key field after the WHERE keyword, where the distribution key field is used to point to the second node.
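The two modifications just described (table-name replacement, plus a new distribution key predicate after WHERE) can be sketched as simple string surgery. This is purely illustrative: a production implementation would use a real SQL parser, and the function and parameter names here are invented for the example.

```python
# Hypothetical sketch of rewriting a replayed DML statement for the new node:
# the old table name is swapped for the new one, and the new distribution key
# is appended to the WHERE clause as the text describes.

def rewrite_dml(stmt, old_table, new_table, dist_key, dist_value):
    rewritten = stmt.replace(old_table, new_table)
    predicate = "{} = '{}'".format(dist_key, dist_value)
    if " WHERE " in rewritten.upper():
        return rewritten + " AND " + predicate
    return rewritten + " WHERE " + predicate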
• when executing the DML statement, a temporary connection may first be established with the second node, and the generated DML statement is then executed over that connection.
• the method further includes: determining a first check value of the chase increment data, and the import time and the import result of importing the chase increment data into the second node; and sending the first check value, the import time, and the import result to the cluster manager.
• the cluster manager may perform the related verification according to the import result, the first check value, and the import time (during the verification, the check is performed in conjunction with the second check value from the second resource manager), and perform the corresponding judgment operation.
• the actual operations have been stated in the foregoing embodiments and are not described here again.
• the first check value, the import time, and the import result may be used by the cluster manager to perform the related verification and judgment operations; according to the different verification and/or judgment results, different indications may be sent by the cluster manager, including: receiving a chase increment re-import indication from the cluster manager, and re-executing, according to that indication, the step of importing the chase increment data into the second node corresponding to the chase increment data; and/or receiving a termination indication from the cluster manager, terminating, according to the termination indication, the storage of new data by the first node, and importing the data that is stored in the first node but not yet imported into the corresponding second node; and/or receiving a repeated import indication from the cluster manager, and continuing to execute, according to that indication, the step of importing the not-yet-imported data into the corresponding second node, until the time taken to import such data is less than the predetermined threshold.
• after that, a termination indication from the cluster manager is also received; according to the termination indication, the first node is made to terminate storing new data, and the data that is stored in the first node but not yet imported into the corresponding second node is imported into the corresponding second node.
• when performing the chase increment process, the resource manager (i.e., the first resource manager described above) first acquires the database transaction log of the old node (i.e., the first node described above), filters out the transaction log blocks according to the last backup or last replayed position, and filters the Structured Query Language (SQL) statements related to the old table. It uses these SQL statements, over a database connection, to acquire the data and calculate the incremental check value. These SQL statements are then reconstructed according to the new table name, the new distribution rules, and other node information, sent to the corresponding machines, and submitted in parallel.
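The check value that both the old and new nodes compute over the chased data could be built along the following lines. The hashing scheme here is an assumption: the patent only states that a check value is calculated over the data, so the example picks one scheme (per-row digests folded with XOR) that makes the value independent of row order, letting the two nodes agree even if their queries return rows differently ordered.

```python
# Illustrative, assumed check-value scheme for a set of fetched rows.
import hashlib

def data_check_value(rows):
    """Fold per-row SHA-256 digests with XOR so the result is the same
    regardless of the order in which rows are returned."""
    total = 0
    for row in rows:
        row_bytes = "|".join(str(c) for c in row).encode()
        total ^= int(hashlib.sha256(row_bytes).hexdigest(), 16)
    return format(total, "064x")
```

The old node would compute this over the data selected from the filtered log SQL, the new node over the data it replayed, and the cluster manager would compare the two strings.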
• SQL: Structured Query Language
  • the playback result, check value, and playback time are reported to the cluster manager.
• the new node (corresponding to the second node mentioned above) can obtain the SQL statements of the current replay through the transaction log positions at the start and end of the incremental replay, thereby acquiring the relevant data, calculating the check value, and reporting it to the cluster manager.
  • a data processing method including the following steps:
  • the second check value is returned to the cluster manager.
  • the execution body of the above operation may be a second resource manager for managing data in the second node.
• the second check value is used by the cluster manager to verify whether the chase increment data imported into the second node by the first resource manager is correct, thus ensuring the integrity of the data.
• when the second check value of the chase increment data in the second node is acquired according to the foregoing chase increment data verification request, a query statement for querying the chase increment data may first be generated according to the request, and the query statement is then used to obtain the chase increment data and calculate the check value.
  • the data processing process in the embodiment of the present invention is described below with reference to FIG. 4 from the cluster manager, the first resource manager, and the second resource manager.
  • the overall process includes the following steps:
• Step 1: after receiving the data redistribution request, the cluster manager performs a full export of the table that needs to be redistributed, splits it according to the distribution rule, uploads the split files to the corresponding nodes for import, and thereby completes the redistribution of the table's initial full data.
• Step 2: because the service continues to submit data on the initial node during the initial full redistribution, the data accumulated during this time must be appended (chased) to the new nodes.
• the cluster manager first queries and records each new node's current logical transaction log (binlog) position (corresponding to steps 1, 2, 3, and 4 in FIG. 4), and then sends a chase increment request to the resource manager of the old node (corresponding to step 5 in FIG. 4). The resource manager scans the logical transaction log from the backed-up position to obtain the corresponding SQL statements, uses those statements to fetch the data and compute the check value (corresponding to steps 6 and 7 in FIG. 4), modifies the SQL statements according to the new distribution rules, and calculates the target node of each SQL statement.
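The log-scanning step (steps 6 and 7 in FIG. 4) can be sketched with a toy log model. Real binlog parsing is far more involved; here log entries are simply modeled as (position, sql) pairs, an assumption made only for illustration.

```python
# Hypothetical sketch of scanning the transaction log from the recorded
# position onward and keeping only statements that touch the old table.

def filter_chase_statements(log_entries, start_pos, old_table):
    """Return the SQL statements logged after start_pos that reference
    old_table; these are the statements to be checked and rewritten."""
    return [sql for pos, sql in log_entries
            if pos > start_pos and old_table in sql]
```

The surviving statements are then used both to fetch data for the old-node check value and, after rewriting, to replay the increment on the new nodes.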
• after receiving the reply, the cluster manager initiates a verification request for the current chase increment to the new nodes (corresponding to steps 13 and 14 in FIG. 4).
• the resource manager of the new node obtains the SQL statements of the current chase increment from the transaction log position recorded before the increment, connects to the DB to fetch the data, calculates the check value, and returns the response to the cluster manager (corresponding to steps 15 and 16 in FIG. 4).
• the cluster manager compares the check values of the old node and the new node to determine whether the current chase increment data is correct, and initiates the next round of the chase increment process; according to the chase increment time, it is determined whether to initiate the lock table and launch the last chase increment operation (corresponding to step 17 in FIG. 4).
  • FIG. 5 illustrates an online payment system based on the MySQL distributed cluster database:
• in this system, a record whose area is Shanghai is distributed by the cluster manager to node 1's table ShanghaiPayRecord, and a record whose area is Nanjing is distributed by the cluster manager to node 2's table NanjingPayRecord. When the Shanghai data grows too large, redistribution decomposes the data of the original single-node Shanghai table into the tables of multiple nodes.
• this redistribution first redistributes the initial data of the Shanghai table according to newly established distribution rules (for example, hash distribution by the administrative districts of the Shanghai area, so that the records of Shanghai's Putuo District are distributed to node 1's PutuoPayRecord table or ShanghaiPayRecord1 table, and so on). That is, the data of the original Shanghai table is first exported through the MySQL export command mysqldump, after which the resource manager splits all the records of the data file according to the new distribution rule into multiple files, each file corresponding to one node; these files are respectively imported into the MySQL database node to which the new table belongs under the new rule (the tables have been created in advance). This completes the redistribution of the initial payment record data.
• the online chase increment is then started, so that the new payment records generated in the Shanghai area during the previous round of redistribution are redistributed to the new table nodes.
• the cluster manager sends a chase increment request to the resource manager of the node hosting the old Shanghai table, including the position up to which the previous round of redistributed data has been restored.
• after receiving the online chase increment request, the resource manager sequentially analyzes the MySQL transaction log (binlog) file from the delivered position, obtains the payment record data, calculates the check value over the data, and uses the data to generate DML statements.
• the new table name replaces the old table name, the new distribution key field is specified after WHERE, and the new DML statements are generated; the statements are saved into different files by node according to the new distribution rule, and finally these new DML statements are sent to the corresponding new table nodes for execution, completing the redistribution of the chase data of the old Shanghai table. Upon completion, the check value of the current round of data, the chase increment result, and the chase increment time are returned to the cluster manager; for the verification operation, refer to FIG. 7.
  • the cluster manager initiates the current round-up incremental data check to the resource manager of these new table nodes, and carries the binlog location that was queried before the current round-up increment.
• after receiving the chase increment check request, the resource manager of the new node analyzes the binlog according to the delivered position and table name to generate query SQL statements, thereby obtaining the data, calculating the check value, and replying to the cluster manager.
• the cluster manager integrates the check values of the new nodes and compares them with the check value of the old node. If they differ, the replay has failed and must be repeated. Otherwise, the cluster manager judges the chase increment time of this round: if it is less than the chase increment threshold, it notifies the resource manager to initiate the lock table operation and complete the last round of the chase increment; otherwise, it continues a new round of the chase increment until some round's chase increment time is less than the threshold.
  • FIG. 6 illustrates a ticket online ticketing system based on the MariaDB distributed cluster database:
• in this system, ticket purchase records are saved in tables according to ticket type. When the purchase records of a certain ticket type suddenly increase, or the number of ticket-purchasing accounts grows, the amount of data received by a certain table increases to the extent that it needs to be redistributed.
• an online redistribution operation of the distributed database is then initiated, and the ticket purchase information is segmented according to new distribution rules (for example, by arrival station).
• before the redistribution, the ticket purchase records of all trains are kept in a single table. Once the train ticket data has accumulated, or suddenly and explosively increases, the single table will be overwhelmed and must be redistributed. It is then desirable to modify the distribution rule (in this case, determining the node where the data resides according to the ticket type: for example, the purchase records of EMU tickets correspond to node 1's table DongcheRecord, and the records of high-speed rail tickets correspond to node 2's table GaotieRecord), so that the redistribution decomposes the data of the original single-node train ticket table onto the tables of multiple nodes.
• this redistribution first redistributes the initial raw data of the EMU table according to the newly established distribution rules (for example, hashing the data by arrival station: an EMU ticket record arriving at Nanjing is distributed by the cluster manager to node 1's table DongcheRecordOfNanjing, and so on). That is, the data of the original train ticket table is first exported through the MariaDB export command mysqldump, after which the resource manager splits all the records of the data file according to the new distribution rule into multiple files, each file corresponding to one node; these files are respectively imported into the MariaDB database node to which the new table belongs under the new rule (the tables have been created in advance), completing the redistribution of the initial train ticket data.
  • the online chase incremental data is started, and the new ticket purchase record of the train ticket generated during the last redistribution is redistributed to the new table node.
• the cluster manager sends a chase increment request to the resource manager of the node hosting the old train ticket table, including the position up to which the previous round of redistributed data has been restored.
• after receiving the online chase increment request, the resource manager sequentially analyzes the MariaDB transaction log (binlog) file from the delivered position, obtains the ticket purchase record data, and calculates the check value over all column data; it uses the data to generate DML statements, replaces the old table name with the new table name, specifies the new distribution key field after WHERE to create the new DML statements, and saves the statements into different files by node according to the new distribution rule. Finally, these new DML statements are sent to the corresponding new table nodes for execution, completing the redistribution of the chase data of the old train ticket table.
  • the verification operation can refer to Figure 7.
  • the cluster manager initiates the current round-up incremental data check to the resource manager of these new table nodes, and carries the binlog location that was queried before the current round-up increment.
  • the resource manager of the new node After receiving the incremental check request, the resource manager of the new node analyzes the binlog log to generate a query SQL statement according to the delivered location and the table name, thereby obtaining the data calculation check value, and replying to the cluster manager.
• the cluster manager integrates the check values of the new nodes and compares them with the check value of the node holding the old ticket purchase records. If they differ, the replay has failed and must be repeated. Otherwise, the cluster manager judges the chase increment time of this round: if it is less than the chase increment threshold, it notifies the resource manager to initiate the lock table operation and complete the last round of the chase increment; otherwise, it continues a new round of the chase increment until some round's chase increment time is less than the threshold.
• in this system, shopping records are saved in tables classified by region. When the amount of data received by a certain table increases to the extent that it needs to be redistributed, an online redistribution operation of the distributed database is initiated, and the shopping information of the region is segmented according to the new distribution rule.
• this redistribution first redistributes the initial raw data of the Nanjing shopping data table according to the newly established distribution rules (for example, by administrative division). That is, the original shopping record table is first exported through the Oracle export command, after which the resource manager splits all the records of the data file according to the new distribution rule into multiple files, each file corresponding to one node; these files are respectively imported into the Oracle database node to which the new table belongs under the new rule (the tables have been created in advance), completing the redistribution of the initial Nanjing shopping data.
  • the online tracking incremental data is started, and the new shopping data of the Nanjing area generated during the last redistribution is redistributed to the new table node.
  • the cluster manager sends a chase increment request to the resource manager of the node corresponding to the old table of the shopping data in the Nanjing area, including the location where the previous round of redistributed data is restored.
• after receiving the online chase increment request, the resource manager sequentially analyzes the Oracle transaction log file from the delivered position, obtains the shopping record data, and calculates the check value over all column data; it uses the data to generate DML statements, replaces the old table name with the new table name, specifies the new distribution key field after WHERE to create the new DML statements, and saves the statements into different files by node according to the new distribution rule. Finally, these new DML statements are sent to the corresponding new table nodes for execution, completing the redistribution of the Nanjing shopping data from the old table. Upon completion, the check value of the current round, the chase increment result, and the chase increment time are returned to the cluster manager; for the verification operation, refer to FIG. 7.
  • the cluster manager initiates this round of incremental data verification to the resource manager of these new table nodes, carrying the transaction log location queried before the current round of the increment.
  • the resource manager of the new node After receiving the incremental check request, the resource manager of the new node analyzes the transaction log to generate a query SQL statement according to the delivered location and the table name, thereby obtaining the data calculation check value, and replying to the cluster manager.
• the cluster manager integrates the check values of the new nodes and compares them with the check value of the node holding the old Nanjing shopping records. If they differ, the replay has failed and must be repeated. Otherwise, the cluster manager judges the chase increment time of this round: if it is less than the chase increment threshold, it notifies the resource manager to initiate the lock table operation and complete the last round of the chase increment; otherwise, it continues a new round of the chase increment until some round's chase increment time is less than the threshold.
• the online stock trading system based on the PostgreSQL distributed cluster database is explained as follows:
• this redistribution first redistributes the initial raw data of the large-cap stock transaction record table according to the newly established distribution rules (for example, hashing by stock code). That is, the original large-cap stock transaction record table is first exported through the PostgreSQL export command, after which the resource manager splits all the records of the data file according to the new distribution rule into multiple files, each file corresponding to one node; these files are respectively imported into the PostgreSQL database node to which the new table belongs under the new rule (the tables have been created in advance), completing the redistribution of the initial large-cap stock transaction record data.
  • the online chase incremental data is started, and the new transaction records of the large-cap stocks generated during the last redistribution are redistributed to the new table node.
  • the cluster manager sends a chase increment request to the resource manager of the corresponding node of the large-cap stock transaction record, including the location where the previous round of redistributed data is restored.
• after receiving the online chase increment request, the resource manager sequentially analyzes the PostgreSQL transaction log file from the delivered position, obtains the transaction record data, and calculates the check value over all column data; it then uses the data to generate DML statements.
  • the cluster manager initiates this round of incremental data verification to the resource manager of these new table nodes, carrying the transaction log location queried before the current round of the increment.
  • the resource manager of the new node After receiving the incremental check request, the resource manager of the new node analyzes the transaction log to generate a query SQL statement according to the delivered location and the table name, thereby obtaining the data calculation check value, and replying to the cluster manager.
• the cluster manager integrates the check values of the new nodes and compares them with the check value of the node holding the old large-cap stock transaction records. If they differ, the replay has failed and must be repeated. Otherwise, the cluster manager judges the chase increment time of this round: if it is less than the chase increment threshold, it notifies the resource manager to initiate the lock table operation and complete the last round of the chase increment; otherwise, it continues a new round of the chase increment until some round's chase increment time is less than the threshold.
• the method according to the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation.
• the technical solution of the embodiments of the present invention may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and including a plurality of instructions for making a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) perform the methods described in the embodiments of the present invention.
• a data processing device is also provided, which is used to implement the foregoing embodiments and implementation manners; details that have already been described are not repeated.
  • the term "module” may implement a combination of software and/or hardware of a predetermined function.
  • the devices described in the following embodiments may be implemented in software, hardware, or a combination of software and hardware, is also possible and contemplated.
  • FIG. 8 is a structural block diagram of a cluster manager according to an embodiment of the present invention. As shown in FIG. 8, the device includes a first receiving module 82, a first indicating module 84, and a second indicating module 86. The device is described below. :
• the first receiving module 82 is configured to receive a data redistribution request; the first indication module 84, connected to the first receiving module 82, is configured to instruct the first resource manager to split the original data stored in the first node according to the data redistribution request, and to import each piece of the split sub-original data into the second node corresponding to that piece, wherein the first resource manager is configured to manage the data stored in the first node; the second indication module 86, connected to the first indication module 84, is configured to instruct the first resource manager to import the chase increment data stored in the first node during the splitting and importing process into the second node corresponding to the chase increment data.
• the second indication module 86 includes an acquiring unit and a sending unit, wherein the acquiring unit is configured to acquire, after the sub-original data has been imported, the logical transaction log of the second node corresponding to the chase increment data, where the logical transaction log is used to describe the information of the sub-original data stored in the second node; the sending unit is configured to send a chase increment indication to the first resource manager according to the logical transaction log, where the chase increment indication is used to instruct the first resource manager to determine, according to the logical transaction log, the chase increment data stored in the first node during the splitting and importing process, and to import the chase increment data into the corresponding second node.
  • the apparatus further includes a first processing module configured to, after the first resource manager has been instructed to import the chase-increment data stored in the first node during the splitting and importing into the corresponding second node: receive, from the first resource manager, a first check value, an import time, and an import result for the chase-increment data, and receive, from a second resource manager, a second check value for the chase-increment data, where the second resource manager is configured to manage the data stored in the second node; when the import result indicates success, judge whether the first check value and the second check value are the same; if they differ, re-execute the process of instructing the first resource manager to import the chase-increment data into the corresponding second node; and/or, if they are the same, judge whether the import time is less than a predetermined threshold; if the import time is less than the threshold, control the first node to stop storing new data and instruct the first resource manager to import the data stored in the first node but not yet imported into its corresponding second node; and/or, if the import time is greater than or equal to the threshold, return to the step of instructing the first resource manager to import the not-yet-imported data, until the time taken to import the remaining data is less than the predetermined threshold.
  • FIG. 9 is a structural block diagram of a resource manager according to an embodiment of the present invention.
  • the device includes a second receiving module 92, a first importing module 94, a third receiving module 96, and a second importing module 98.
  • the second receiving module 92 is configured to receive a data redistribution indication from the cluster manager; the first importing module 94, connected to the second receiving module 92, is configured to split the original data stored in the first node according to the data redistribution indication and to import each split portion of sub-original data into the second node corresponding to that portion; the third receiving module 96, connected to the first importing module 94, is configured to receive a chase-increment indication from the cluster manager; the second importing module 98, connected to the third receiving module 96, is configured to import, according to the chase-increment indication, the chase-increment data stored in the first node during the splitting and importing into the second node corresponding to the chase-increment data.
  • the first importing module 94 may split the original data stored in the first node according to the data redistribution indication and import each split portion of sub-original data into its corresponding second node as follows: export the original data from the first node according to the data redistribution indication; split the original data into multiple files according to the data distribution rule, where one file corresponds to one second node; and upload the multiple files to their corresponding second nodes for import.
  • the second importing module 98 may import the chase-increment data stored in the first node during the splitting and importing into the corresponding second node according to the chase-increment indication as follows: determine the chase-increment data according to the logical transaction log carried in the chase-increment indication, where the logical transaction log describes the sub-original data stored in the second node; generate Data Manipulation Language (DML) statements from the chase-increment data; and use the DML statements to import the chase-increment data into the second node corresponding to the chase-increment data.
  • the apparatus further includes a second processing module configured to, after the chase-increment data stored in the first node during the splitting and importing has been imported into the corresponding second node according to the chase-increment indication: determine the first check value of the chase-increment data, and the import time and import result of importing the chase-increment data into the second node; and send the first check value, the import time, and the import result to the cluster manager.
  • the apparatus further includes a third processing module configured to, after the first check value, the import time, and the import result are sent to the cluster manager: receive a chase-increment re-import indication from the cluster manager and, according to it, re-execute the process of importing the chase-increment data into the corresponding second node; and/or receive a termination indication from the cluster manager and, according to it, stop the first node from storing new data and import the data stored in the first node but not yet imported into its corresponding second node; and/or receive a repeated-import indication from the cluster manager and, according to it, return to the step of importing the not-yet-imported data into the corresponding second node, until the time taken to import the remaining data is less than the predetermined threshold.
  • a data processing system is provided, comprising a cluster manager and a first resource manager. The cluster manager is connected to the first resource manager and configured to send a data redistribution indication and a chase-increment indication to the first resource manager according to a received data redistribution request. The first resource manager is configured to split the original data stored in the first node according to the data redistribution indication from the cluster manager, to import each split portion of sub-original data into the second node corresponding to that portion, and to import, according to the chase-increment indication from the cluster manager, the chase-increment data stored in the first node during the splitting and importing into the second node corresponding to the chase-increment data.
  • the cluster manager may send the chase-increment indication to the first resource manager as follows: acquire the logical transaction log of the second node corresponding to the chase-increment data after the sub-original data has been imported, where the logical transaction log describes the sub-original data stored in the second node; and send the chase-increment indication to the first resource manager according to the logical transaction log.
  • the data processing system further includes a second resource manager connected to the cluster manager (the cluster manager is further configured to manage the second resource manager). In this embodiment, after sending the chase-increment indication to the first resource manager, the cluster manager is further configured to: receive, from the first resource manager, the first check value, the import time, and the import result of the chase-increment data, and receive, from the second resource manager, the second check value of the chase-increment data; when the import result indicates success, judge whether the two check values are the same; if they differ, send a chase-increment re-import indication to the first resource manager; if they are the same and the import time is less than a predetermined threshold, send a termination indication; and if they are the same but the import time is greater than or equal to the threshold, send a repeated-import indication.
  • correspondingly, after importing the chase-increment data stored in the first node during the splitting and importing into the corresponding second node according to the chase-increment indication from the cluster manager, the first resource manager is further configured to: determine the first check value of the chase-increment data, and the import time and import result of importing it into the second node; send the first check value, the import time, and the import result to the cluster manager; and then, on receiving a chase-increment re-import indication, re-execute the import of the chase-increment data accordingly.
  • the first resource manager may split the original data stored in the first node according to the data redistribution indication and import each split portion of sub-original data into its corresponding second node as follows: export the original data from the first node according to the data redistribution indication; split the original data into multiple files according to the data distribution rule, where one file corresponds to one second node; and upload the multiple files to their corresponding second nodes for import.
  • the foregoing first resource manager may import the chase-increment data as follows: determine the chase-increment data according to the logical transaction log carried in the chase-increment indication, where the logical transaction log describes the sub-original data stored in the second node; generate Data Manipulation Language (DML) statements from the chase-increment data; and use the DML statements to import the chase-increment data into the second node corresponding to the chase-increment data.
  • a storage medium is also provided in an embodiment of the present invention.
  • the above storage medium may be arranged to store program code for performing the steps in the above method embodiments.
  • the foregoing storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium capable of storing program code.
  • the processor executes each of the above steps in accordance with the program code stored in the storage medium.
  • the embodiment of the present invention has the following beneficial effects:
  • the modules or steps of the above-described embodiments of the present invention may be implemented with a general-purpose computing device; they may be centralized on a single computing device or distributed over a network of computing devices; they may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein; alternatively, the modules or steps may each be made into an individual integrated circuit module, or a plurality of the modules or steps may be fabricated into a single integrated circuit module.
  • embodiments of the invention are not limited to any specific combination of hardware and software.
  • online data redistribution is adopted, which avoids service interruption; moreover, when the chase-increment data is redistributed, only that incremental data needs to be redistributed, without redistributing the entire database in which it resides, so data redistribution can be completed even when the database holds a large volume of data. The effect achieved is therefore that service interruption is avoided while data redistribution remains feasible for very large databases.

Abstract

Disclosed herein are a data processing method, a cluster manager, a resource manager, and a data processing system. The method includes: receiving a data redistribution request; instructing, according to the data redistribution request, a first resource manager to split the original data stored in a first node and to import each split portion of sub-original data into the second node corresponding to that portion, where the first resource manager is configured to manage the data stored in the first node; and instructing the first resource manager to import the chase-increment data stored in the first node during the splitting and importing into the second node corresponding to the chase-increment data.

Description

Data Processing Method, Cluster Manager, Resource Manager, and Data Processing System

Technical Field

The present application relates to, but is not limited to, the field of communications, and in particular to a data processing method, a cluster manager, a resource manager, and a data processing system.

Background

Because a distributed database involves a large number of dispersed nodes and must support a data volume far larger than that of a single-machine database, it also faces data redistribution problems different from those of a single machine.

Implementing redistribution, however, raises many difficulties, a key one being how to achieve online redistribution, that is, how to complete the data redistribution of a distributed database with almost no interruption of service.

There are currently two main data redistribution methods: simple offline redistribution, and pre-partitioning into sub-databases and shards in advance. Under these models, however, chasing incremental data during redistribution has obvious defects and limitations.

The online redistribution mode most commonly used by distributed databases at present is data pre-distribution. This mode requires the data distribution rules to be designed in advance: at table-design time the data is pre-sharded into a large number of pre-allocated tables placed in different databases, and when a table needs to be redistributed, the data of one or several shards is migrated and copied online to a designated node.
Under this architecture, when online redistribution later chases incremental data, redistribution is performed with one shard (database) as the unit, so the data belonging to that database is exported with the database's existing tools and sent to the new node for replay.

The way incremental data is chased in this mode, however, has obvious limitations:

Chasing incremental data in this mode can only be performed on a database as a whole. The data must be partitioned into databases in advance according to its characteristics, with routing rules specified for the data. If such a design turns out to be unreasonable, some databases will become excessively large, and once a database itself holds a huge amount of data, its redistribution cannot be completed in this way.
Summary

The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the protection scope of the claims.
Embodiments of the present invention provide a data processing method, a cluster manager, a resource manager, and a data processing system, so that data redistribution can be completed even when the database holds a large volume of data.

According to an embodiment of the present invention, a data processing method is provided, including: receiving a data redistribution request; instructing, according to the data redistribution request, a first resource manager to split the original data stored in a first node and to import each split portion of sub-original data into the second node corresponding to that portion, where the first resource manager is configured to manage the data stored in the first node; and instructing the first resource manager to import the chase-increment data stored in the first node during the splitting and importing into the second node corresponding to the chase-increment data.
In an implementation, instructing the first resource manager to import the chase-increment data stored in the first node during the splitting and importing into the corresponding second node includes: acquiring the logical transaction log of the second node corresponding to the chase-increment data after the sub-original data has been imported, where the logical transaction log describes the sub-original data stored in the second node; and sending a chase-increment indication to the first resource manager according to the logical transaction log, where the indication instructs the first resource manager to determine, according to the logical transaction log, the chase-increment data stored in the first node during the splitting and importing and to import it into the corresponding second node.

In an implementation, after instructing the first resource manager to import the chase-increment data into the corresponding second node, the method further includes: receiving, from the first resource manager, a first check value, an import time, and an import result for the chase-increment data, and receiving, from a second resource manager, a second check value for the chase-increment data, where the second resource manager is configured to manage the data stored in the second node; when the import result indicates success, judging whether the first check value and the second check value are the same; and performing at least one of the following: if they differ, re-executing the process of instructing the first resource manager to import the chase-increment data into the corresponding second node; if they are the same, judging whether the import time is less than a predetermined threshold, and if it is, controlling the first node to stop storing new data and instructing the first resource manager to import the data stored in the first node but not yet imported into its corresponding second node; if they are the same but the import time is greater than or equal to the predetermined threshold, returning to the step of instructing the first resource manager to import the not-yet-imported data into the corresponding second node, until the time taken to import the remaining data is less than the predetermined threshold.
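The check-value comparison and import-time threshold described above amount to a small decision procedure. The sketch below illustrates it in Python; the function name `verify_round`, the return labels, and the threshold value are illustrative assumptions, not part of the patent.

```python
# Sketch of the cluster manager's decision after one chase-increment round.
# THRESHOLD_SECONDS and the return labels are assumed for illustration.

THRESHOLD_SECONDS = 5.0  # the "predetermined threshold" on the import time


def verify_round(first_check, second_check, import_time, import_ok):
    """Decide the next action after one chase-increment round.

    Returns: 'reimport'   - import failed or check values differ: replay
             'finalize'   - checks match and the round was fast enough:
                            lock the first node and do the final catch-up
             'next_round' - checks match but the round was too slow:
                            start another chase-increment round
    """
    if not import_ok or first_check != second_check:
        # The new node's data does not (verifiably) match the old node's.
        return "reimport"
    if import_time < THRESHOLD_SECONDS:
        # Remaining increment is small: stop writes and import the rest.
        return "finalize"
    return "next_round"
```

Each "next_round" outcome triggers a fresh chase-increment operation on whatever the first node accumulated meanwhile, so the loop converges once a round finishes under the threshold.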
According to another embodiment of the present invention, a data processing method is further provided, including: receiving a data redistribution indication from a cluster manager; splitting, according to the data redistribution indication, the original data stored in a first node and importing each split portion of sub-original data into the second node corresponding to that portion; receiving a chase-increment indication from the cluster manager; and importing, according to the chase-increment indication, the chase-increment data stored in the first node during the splitting and importing into the second node corresponding to the chase-increment data.

In an implementation, splitting the original data stored in the first node according to the data redistribution indication and importing each split portion of sub-original data into its corresponding second node includes: exporting the original data from the first node according to the data redistribution indication; splitting the original data into multiple files according to a data distribution rule, where one file corresponds to one second node; and uploading the multiple files to their corresponding second nodes for import.

In an implementation, importing the chase-increment data stored in the first node during the splitting and importing into the corresponding second node according to the chase-increment indication includes: determining the chase-increment data according to the logical transaction log carried in the chase-increment indication, where the logical transaction log describes the sub-original data stored in the second node; generating Data Manipulation Language (DML) statements from the chase-increment data; and importing the chase-increment data into the corresponding second node by means of the DML statements.
In an implementation, after importing the chase-increment data into the corresponding second node according to the chase-increment indication, the method further includes: determining a first check value of the chase-increment data, and the import time and import result of importing the chase-increment data into the second node; and sending the first check value, the import time, and the import result to the cluster manager.

In an implementation, after sending the first check value, the import time, and the import result to the cluster manager, the method further includes at least one of the following: receiving a chase-increment re-import indication from the cluster manager and, according to it, re-executing the process of importing the chase-increment data into the corresponding second node; receiving a termination indication from the cluster manager and, according to it, stopping the first node from storing new data and importing the data stored in the first node but not yet imported into its corresponding second node; receiving a repeated-import indication from the cluster manager and, according to it, returning to the step of importing the not-yet-imported data into the corresponding second node, until the time taken to import the remaining data is less than a predetermined threshold.
According to another embodiment of the present invention, a cluster manager is further provided, including: a first receiving module configured to receive a data redistribution request; a first indication module configured to instruct, according to the data redistribution request, a first resource manager to split the original data stored in a first node and to import each split portion of sub-original data into the second node corresponding to that portion, where the first resource manager is configured to manage the data stored in the first node; and a second indication module configured to instruct the first resource manager to import the chase-increment data stored in the first node during the splitting and importing into the second node corresponding to the chase-increment data.

In an implementation, the second indication module includes: an acquiring unit configured to acquire the logical transaction log of the second node corresponding to the chase-increment data after the sub-original data has been imported, where the logical transaction log describes the sub-original data stored in the second node; and a sending unit configured to send a chase-increment indication to the first resource manager according to the logical transaction log, where the indication instructs the first resource manager to determine, according to the logical transaction log, the chase-increment data stored in the first node during the splitting and importing and to import it into the corresponding second node.

In an implementation, the cluster manager further includes a first processing module configured to, after the first resource manager has been instructed to import the chase-increment data into the corresponding second node: receive, from the first resource manager, a first check value, an import time, and an import result for the chase-increment data, and receive, from a second resource manager, a second check value for the chase-increment data, where the second resource manager is configured to manage the data stored in the second node; when the import result indicates success, judge whether the first check value and the second check value are the same; and perform at least one of the following: if they differ, re-execute the process of instructing the first resource manager to import the chase-increment data into the corresponding second node; if they are the same and the import time is less than a predetermined threshold, control the first node to stop storing new data and instruct the first resource manager to import the data stored in the first node but not yet imported into its corresponding second node; if they are the same but the import time is greater than or equal to the predetermined threshold, return to the step of instructing the first resource manager to import the not-yet-imported data, until the time taken to import the remaining data is less than the predetermined threshold.
According to another embodiment of the present invention, a resource manager is further provided, including: a second receiving module configured to receive a data redistribution indication from a cluster manager; a first importing module configured to split, according to the data redistribution indication, the original data stored in a first node and to import each split portion of sub-original data into the second node corresponding to that portion; a third receiving module configured to receive a chase-increment indication from the cluster manager; and a second importing module configured to import, according to the chase-increment indication, the chase-increment data stored in the first node during the splitting and importing into the second node corresponding to the chase-increment data.

In an implementation, the resource manager further includes a second processing module configured to, after the chase-increment data has been imported into the corresponding second node according to the chase-increment indication: determine a first check value of the chase-increment data, and the import time and import result of importing the chase-increment data into the second node; and send the first check value, the import time, and the import result to the cluster manager.

In an implementation, the resource manager further includes a third processing module configured to, after the first check value, the import time, and the import result are sent to the cluster manager: receive a chase-increment re-import indication from the cluster manager and, according to it, re-execute the process of importing the chase-increment data into the corresponding second node; or receive a termination indication from the cluster manager and, according to it, stop the first node from storing new data and import the data stored in the first node but not yet imported into its corresponding second node; or receive a repeated-import indication from the cluster manager and, according to it, return to the step of importing the not-yet-imported data into the corresponding second node, until the time taken to import the remaining data is less than a predetermined threshold.
According to another embodiment of the present invention, a data processing system is further provided, including a cluster manager and a first resource manager. The cluster manager is connected to the first resource manager and configured to send a data redistribution indication and a chase-increment indication to the first resource manager according to a received data redistribution request. The first resource manager is configured to split, according to the data redistribution indication from the cluster manager, the original data stored in a first node and to import each split portion of sub-original data into the second node corresponding to that portion, and to import, according to the chase-increment indication from the cluster manager, the chase-increment data stored in the first node during the splitting and importing into the second node corresponding to the chase-increment data.

In an implementation, the data processing system further includes a second resource manager connected to the cluster manager. After sending the chase-increment indication to the first resource manager, the cluster manager is further configured to: receive, from the first resource manager, the first check value, the import time, and the import result of the chase-increment data, and receive, from the second resource manager, the second check value of the chase-increment data; when the import result indicates success, judge whether the first check value and the second check value are the same; and perform at least one of the following: if they differ, send a chase-increment re-import indication to the first resource manager; if they are the same and the import time is less than a predetermined threshold, send a termination indication to the first resource manager; if they are the same but the import time is greater than or equal to the threshold, send a repeated-import indication to the first resource manager. The first resource manager is further configured to: after importing the chase-increment data into the corresponding second node according to the chase-increment indication, determine the first check value of the chase-increment data and the import time and import result of importing it into the second node, and send them to the cluster manager; and then receive a chase-increment re-import indication from the cluster manager and re-execute the import of the chase-increment data accordingly; or receive a termination indication and, according to it, stop the first node from storing new data and import the data stored in the first node but not yet imported into its corresponding second node; or receive a repeated-import indication and return to the step of importing the not-yet-imported data, until the time taken to import the remaining data is less than the predetermined threshold.

In an implementation, the first resource manager is configured to: determine the chase-increment data according to the logical transaction log carried in the chase-increment indication, where the logical transaction log describes the sub-original data stored in the second node; generate Data Manipulation Language (DML) statements from the chase-increment data; and use the DML statements to import the chase-increment data into the second node corresponding to the chase-increment data.
According to yet another embodiment of the present invention, a storage medium is further provided. The storage medium is configured to store program code for performing the above steps.

Through the embodiments of the present invention, online data redistribution is adopted, which avoids service interruption; moreover, when the chase-increment data is redistributed, only that incremental data needs to be redistributed, without redistributing the entire database in which it resides, so data redistribution can be completed even when the database holds a large volume of data. The effect achieved is that service interruption is avoided while data redistribution remains feasible for very large databases.

Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief Description of the Drawings

FIG. 1 is a block diagram of the hardware structure of a computer terminal for a data processing method according to an embodiment of the present invention;

FIG. 2 is a flowchart of a data processing method according to an embodiment of the present invention;

FIG. 3 is a flowchart of another data processing method according to an embodiment of the present invention;

FIG. 4 is a sequence diagram of chasing increments during online redistribution of a distributed database according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of chasing increments during redistribution of regional payment records according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of chasing increments during redistribution of high-speed-train ticket records according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of chasing and verifying incremental data during online redistribution of a distributed database according to an embodiment of the present invention;

FIG. 8 is a structural block diagram of a data processing device according to an embodiment of the present invention;

FIG. 9 is a structural block diagram of another data processing device according to an embodiment of the present invention.
Detailed Description

The present application will be described in detail below with reference to the drawings and in conjunction with embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.

It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence.
In one embodiment, the data processing method of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a computer terminal as an example, FIG. 1 is a block diagram of the hardware structure of a computer terminal for a data processing method according to an embodiment of the present invention. As shown in FIG. 1, the computer terminal 10 may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microcontroller (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. A person of ordinary skill in the art will understand that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the above electronic device. For example, the computer terminal 10 may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.

The memory 104 may be configured to store software programs and modules of application software, such as the program instructions/modules corresponding to the data processing method in the embodiments of the present invention. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the above method. The memory 104 may include high-speed random-access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 through a network. Examples of such a network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The transmission device 106 is configured to receive or send data via a network. Examples of the network may include a wireless network provided by the communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network Interface Controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 106 may be a Radio Frequency (RF) module configured to communicate with the Internet wirelessly.
This embodiment provides a data processing method running on the above computer terminal. FIG. 2 is a flowchart of a data processing method according to an embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:

Step S202: receive a data redistribution request;

Step S204: instruct, according to the data redistribution request, a first resource manager to split the original data stored in a first node and to import each split portion of sub-original data into the second node corresponding to that portion, where the first resource manager is configured to manage the data stored in the first node;

Step S206: instruct the first resource manager to import the chase-increment data stored in the first node during the splitting and importing into the second node corresponding to the chase-increment data.
The above operations may be performed by a cluster manager. The cluster manager can receive cluster-related requests from upper-layer services (such as the above data redistribution request), manage the distributed cluster, coordinate database (DB) status reporting by the resource managers, and notify the resource managers of commands such as switchover, backup, and redistribution. The first resource manager is typically an upper-layer agent of a database (including the first node): a local database monitoring program that performs complex operations on the database in response to upper-layer requests. In this embodiment of the present invention, the first resource manager may be configured to respond to the cluster manager's redistribution request and execute the redistribution flow. Both the first node and the second node are basic nodes configured to store data.

Through the above steps, online data redistribution is adopted, which avoids service interruption; moreover, when the chase-increment data is redistributed, only that incremental data needs to be redistributed, without redistributing the entire database in which it resides, so data redistribution can be completed even when the database holds a large volume of data. The effect achieved is that service interruption is avoided while data redistribution remains feasible for very large databases.
In one embodiment, instructing the first resource manager to import the chase-increment data stored in the first node during the splitting and importing into the corresponding second node includes: acquiring the logical transaction log of the second node corresponding to the chase-increment data after the sub-original data has been imported, where the logical transaction log describes the sub-original data stored in the second node; and sending a chase-increment indication to the first resource manager according to the logical transaction log, where the indication instructs the first resource manager to determine, according to the logical transaction log, the chase-increment data stored in the first node during the splitting and importing and to import it into the corresponding second node. In this embodiment, the first resource manager can determine from the logical transaction log which data in the first node has already been copied to the second node, and thereby determine the not-yet-copied data, that is, the chase-increment data.
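Determining the chase-increment data from a recorded log position can be illustrated with a minimal sketch: keep only the log events past the position that was already imported and that touch the table being redistributed. The event dictionaries and field names (`pos`, `table`) are assumptions for illustration; a real logical transaction log (e.g. a MySQL binlog) requires genuine binary parsing.

```python
# Sketch: select chase-increment events from a logical transaction log,
# given the position recorded before/after the full-data import.
# Event shape {"pos": ..., "table": ..., "sql": ...} is assumed.

def chase_increment_entries(log_events, last_position, table):
    """Return events after `last_position` that touch `table`."""
    return [
        e for e in log_events
        if e["pos"] > last_position and e["table"] == table
    ]
```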
In one embodiment, after instructing the first resource manager to import the chase-increment data into the corresponding second node, the method further includes: receiving, from the first resource manager, a first check value, an import time, and an import result for the chase-increment data, and receiving, from a second resource manager, a second check value for the chase-increment data, where the second resource manager is configured to manage the data stored in the second node; when the import result indicates success, judging whether the first check value and the second check value are the same; if they differ, re-executing the process of instructing the first resource manager to import the chase-increment data into the corresponding second node; and/or,

if they are the same, judging whether the import time is less than a predetermined threshold; if the import time is less than the threshold, controlling the first node to stop storing new data and instructing the first resource manager to import the data stored in the first node but not yet imported into its corresponding second node; and/or, if the import time is greater than or equal to the threshold, returning to the step of instructing the first resource manager to import the not-yet-imported data, until the time taken to import the remaining data is less than the threshold.

In this embodiment, an import time greater than or equal to the predetermined threshold indicates that the volume of data newly stored in the first node is still large. To keep the service running, the process of importing the new data stored in the first node but not yet imported into the second node is executed in a loop (each iteration includes the above operations of determining the import result, comparing the two check values, and judging the import time; that is, each iteration performs a fresh chase-increment operation on the data stored in the first node but not yet imported). When the loop ends, the same final operations are performed: the first node is controlled to stop storing new data, and the first resource manager is instructed to import the remaining not-yet-imported data into the corresponding second nodes.
In the above embodiment, before receiving the second check value of the chase-increment data from the second resource manager, the following operation is performed: sending a chase-increment data verification request to the second resource manager, where the request carries the above logical transaction log. According to the verification request, the second resource manager generates query statements for querying the data in the second node, thereby obtains the data, computes the check value, and returns the computed check value.
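For the comparison to be meaningful, the check value computed independently on the old and new nodes must agree for the same set of rows regardless of the order in which the rows are read back. A minimal sketch, assuming MD5 as the digest (the patent does not fix an algorithm):

```python
# Sketch: order-independent check value over the rows of one chase-increment
# round, so old and new nodes can compute it independently and compare.
# MD5 and the "|"-joined row encoding are assumptions for illustration.
import hashlib


def check_value(rows):
    """Checksum over all column data of the given rows, order-independent."""
    digests = sorted(
        hashlib.md5("|".join(map(str, row)).encode("utf-8")).hexdigest()
        for row in rows
    )
    return hashlib.md5("".join(digests).encode("utf-8")).hexdigest()
```

Sorting the per-row digests before the final hash is what makes the value independent of row order, which matters because the new node may return rows in a different order than the old node's log replay produced them.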
The above embodiments are described mainly from the cluster manager side. As these embodiments show, the data redistribution method in the embodiments of the present invention no longer requires analyzing the data in advance, no longer needs to worry about poorly designed distribution rules, and does not need distribution rules designed at the database level. It suffices to design reasonable distribution rules for each node for the current scenario while keeping the nodes load-balanced. Moreover, once a table on some node becomes too large and needs redistribution, the client notifies the cluster manager, and the cluster manager controls the resource manager of each DB to execute the redistribution process. While online, the existing data of the table is exported, re-split according to the latest distribution rules, and imported into each new data node. After the full-data redistribution finishes, the chase-increment flow begins. The cluster manager collects the chase-increment result of each node and initiates an increment verification flow toward the new nodes. The cluster manager compares the check values of the old and new nodes to complete the increment verification.

In the related art, verification of incremental data is not supported during data redistribution. Furthermore, in the related art, later increment chasing directly uses the database's existing tools for parsing and replay, so there is no way to verify whether the newly added data is correct, and data errors introduced during redistribution can affect subsequent normal service. In the embodiments of the present invention, a data verification operation is added: the first check value of the chase-increment data reported by the first resource manager is compared with the second check value reported by the second resource manager, so as to determine whether the redistributed data has been imported into the second node correctly. This guarantees data integrity and safeguards the normal operation of subsequent services.
FIG. 3 is a flowchart of another data processing method according to an embodiment of the present invention. As shown in FIG. 3, the flow includes the following steps:

Step S302: receive a data redistribution indication from a cluster manager;

Step S304: split, according to the data redistribution indication, the original data stored in a first node, and import each split portion of sub-original data into the second node corresponding to that portion;

Step S306: receive a chase-increment indication from the cluster manager;

Step S308: import, according to the chase-increment indication, the chase-increment data stored in the first node during the splitting and importing into the second node corresponding to the chase-increment data.
The above operations may be performed by the first resource manager, which manages the data in the first node.

Through the above steps, online data redistribution is adopted, which avoids service interruption; moreover, when the chase-increment data is redistributed, only that incremental data needs to be redistributed, without redistributing the entire database in which it resides, so data redistribution can be completed even when the database holds a large volume of data.
In one embodiment, splitting the original data stored in the first node according to the data redistribution indication and importing each split portion of sub-original data into its corresponding second node includes: exporting the original data from the first node according to the data redistribution indication; splitting the original data into multiple files according to a data distribution rule, where one file corresponds to one second node; and uploading the multiple files to their corresponding second nodes for import. In this embodiment, the splitting rule can be adjusted flexibly: as the distributed database runs, its data keeps growing, and once a previously chosen splitting rule proves problematic, it can be adjusted at any time. The splitting rule may correspond to the above distribution rule for the original data, and that distribution rule can likewise be adjusted flexibly.
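The split step above can be sketched as grouping exported rows into one bucket per second node under a hash distribution rule; each bucket then corresponds to the file uploaded to that node. The CRC32-modulo rule and the function name are illustrative assumptions, not the patent's prescribed rule.

```python
# Sketch: split exported rows into one bucket (file) per target node,
# hashing the distribution-key column. CRC32 modulo node count is an
# assumed rule; any deterministic hash distribution rule works the same way.
import zlib


def split_by_rule(rows, key_index, node_count):
    """Group rows by target node: hash the distribution key, modulo node count."""
    buckets = {n: [] for n in range(node_count)}
    for row in rows:
        node = zlib.crc32(str(row[key_index]).encode("utf-8")) % node_count
        buckets[node].append(row)
    return buckets  # one bucket per file / per second node
```

CRC32 is used here (rather than Python's built-in `hash`) because it is stable across processes, which the old and new nodes need when they must agree on routing.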
In one embodiment, importing the chase-increment data stored in the first node during the splitting and importing into the corresponding second node according to the chase-increment indication includes: determining the chase-increment data according to the logical transaction log carried in the chase-increment indication, where the logical transaction log describes the sub-original data stored in the second node; generating Data Manipulation Language (DML) statements from the chase-increment data; and using the DML statements to import the chase-increment data into the second node corresponding to the chase-increment data. In this embodiment, when generating the DML statements, initial DML statements for importing the data into the first node may be generated first and then transformed into the above DML statements. The transformation mainly consists of replacing the old node name (i.e., the name of the first node) with the new node name (i.e., the name of the second node) in the initial DML statement, and specifying the new distribution-key field after the WHERE clause, the distribution-key field pointing to the second node. In this embodiment, when the DML statements are used to import the chase-increment data into the corresponding second node, a temporary connection to the second node may first be established, and the generated DML statements executed over that connection.
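The DML transformation described above (substitute the new table name, pin the new distribution-key field after WHERE) can be sketched with naive string handling. A real implementation would parse the SQL properly; the sketch is shown for UPDATE/DELETE-style replay statements that already carry a WHERE clause, and the function name is an assumption.

```python
# Sketch: rebuild a replayed DML statement for the new node by substituting
# the new table name and appending the distribution-key predicate.
# Naive string manipulation, for illustration only; real SQL needs parsing.

def rewrite_dml(stmt, old_table, new_table, dist_key, dist_value):
    """Point the statement at the new table and pin the distribution key."""
    out = stmt.replace(old_table, new_table).rstrip().rstrip(";")
    joiner = "AND" if "WHERE" in out.upper() else "WHERE"
    return f"{out} {joiner} {dist_key} = '{dist_value}';"
```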
In one embodiment, after importing the chase-increment data into the corresponding second node according to the chase-increment indication, the method further includes: determining the first check value of the chase-increment data, and the import time and import result of importing the chase-increment data into the second node; and sending the first check value, the import time, and the import result to the cluster manager. In this embodiment, after receiving them, the cluster manager can perform the related verification (when verifying the check value, the second check value from the second resource manager is also needed) and judgment operations; these have been described in the preceding embodiments and are not repeated here.

In one embodiment, the first check value, import time, and import result may be used by the cluster manager for the related verification and judgment operations, and depending on the verification and/or judgment results, different indications are received from the cluster manager, including: receiving a chase-increment re-import indication and, according to it, re-executing the import of the chase-increment data into the corresponding second node; and/or receiving a termination indication and, according to it, stopping the first node from storing new data and importing the data stored in the first node but not yet imported into its corresponding second node; and/or receiving a repeated-import indication and, according to it, returning to the step of importing the not-yet-imported data into the corresponding second node, until the time taken to import the remaining data is less than the predetermined threshold. In this embodiment, when the loop terminates, a termination indication is likewise received from the cluster manager, in response to which the first node stops storing new data and the remaining not-yet-imported data is imported into the corresponding second nodes.

FIG. 3 and its related embodiments are described from the side of the first resource manager. As these embodiments show, during the chase-increment flow, the resource manager (the first resource manager) first obtains the database transaction log of the old node (the first node), filters the transaction-log blocks from the position last backed up to or last replayed to, and extracts the Structured Query Language (SQL) statements related to the old table. It connects to the database and executes these SQL statements to obtain the data and compute the increment check value. It then reconstructs these SQL statements according to the new table name, the new distribution rules, and other node information, sends them to the corresponding machines, and submits them for parallel execution. The replay result, check value, and replay time are reported to the cluster manager. After receiving an increment verification request from the cluster manager, the new node (corresponding to the second node) can obtain the SQL statements of this replay from the transaction-log positions at which this increment replay started and ended, thereby obtain the relevant data, compute the check value, and report it to the cluster manager.
In one embodiment of the present invention, a data processing method is further provided, including the following steps:

receiving a chase-increment data verification request from a cluster manager;

obtaining, according to the verification request, a second check value of the chase-increment data in a second node;

returning the second check value to the cluster manager.

The above operations may be performed by a second resource manager, which manages the data in the second node.

The second check value is used by the cluster manager to verify whether the chase-increment data imported into the second node by the first resource manager is correct, thereby guaranteeing data integrity.

In one embodiment, when obtaining the second check value of the chase-increment data in the second node according to the verification request, query statements for querying the data in the second node may first be generated from the chase-increment data, and the query statements then used to obtain the chase-increment data and compute the check value.
The data processing flow in the embodiments of the present invention is described below as a whole, covering the cluster manager, the first resource manager, and the second resource manager, with reference to FIG. 4. The overall flow includes the following steps:

Step 1: after receiving a data redistribution request, the cluster manager exports the full data of the table to be redistributed, splits it according to the distribution rules, uploads the split files to the corresponding nodes for import, and thus controls the completion of the redistribution of the table's initial full data.

Step 2: because the service continues committing data on the initial node during the initial full redistribution, the data of that period must be appended to the new nodes. The cluster controller first queries and records the current logical transaction log (binlog) position of each new node (corresponding to steps 1, 2, 3, and 4 in FIG. 4), then sends a chase-increment request to the resource controller of the old node (step 5 in FIG. 4). From the backed-up position, the resource controller scans and filters the logical transaction log to obtain the relevant SQL statements, uses these SQL statements to obtain the data and compute the check value (steps 6 and 7 in FIG. 4), transforms the SQL statements according to the new distribution rules, and computes the node corresponding to each SQL statement.

Step 3: each new SQL statement is replayed concurrently on its corresponding new node (steps 8 and 9 in FIG. 4); after execution, the replay time, replay result, and check value are reported to the cluster manager (steps 10, 11, and 12 in FIG. 4).

Step 4: after receiving the reply, the cluster manager initiates a verification request for this chase-increment round toward the new nodes (steps 13 and 14 in FIG. 4).

Step 5: after receiving the verification request, the resource manager obtains the SQL statements of this chase-increment round from the transaction-log position before the round, connects to the DB to obtain the data, computes the check value, and replies to the cluster manager (steps 15 and 16 in FIG. 4).

Step 6: the cluster manager compares the check values of the old node and the new node to determine whether this round's chase-increment data is correct, and starts the next chase-increment round. According to the replay result, it decides whether to initiate table locking and the final chase-increment operation (step 17 in FIG. 4).

The above steps guarantee the reliability and general applicability of online redistribution of a distributed database, ensuring the maintainability and data safety of the distributed database.
The embodiments of the present invention are described below in conjunction with different databases.

Embodiment 1
This embodiment, with reference to FIG. 5, describes an online payment system based on a MySQL distributed cluster database.

Suppose the system stores payment records in tables partitioned by region. If payment records or payment accounts for some region suddenly surge, the data volume borne by some table grows to the point of requiring redistribution. At that point an online redistribution of the distributed database is initiated, and the payment information of that region is split according to new distribution rules.

Take the payment records of the Shanghai region as an example. Suppose that before redistribution all payment records related to Shanghai are stored in a single Shanghai-region table. Once the Shanghai data keeps accumulating or suddenly grows explosively, that single table becomes overloaded and must be redistributed. The aim is to redistribute after modifying the original hash distribution rule (in this example, the node holding the data is determined by the region field: for instance, records whose region is Shanghai are distributed by the cluster manager to table ShanghaiPayRecord on node 1, and records whose region is Nanjing to table NanjingPayRecord on node 2), so that the data of the originally single-node Shanghai region is decomposed onto tables on multiple nodes.

This redistribution first redistributes the initial original data of the Shanghai table according to the newly formulated distribution rule (for example, hash distribution by the administrative districts of Shanghai: records of Shanghai's Putuo district go to table PutuoPayRecord or ShanghaiPayRecord1 on node 1, and so on). That is, the original Shanghai-region table is first exported with MySQL's export command mysqldump; the resource manager then splits all records of the data file according to the new distribution rule into multiple files, one file per node, and imports these files into the MySQL database nodes to which their new tables belong under the new rule (the tables having been created in advance). This completes the redistribution of the initial original payment-record data.

After this step, online increment chasing begins, redistributing the new Shanghai payment records produced during the previous redistribution to the new table nodes. The cluster manager sends a chase-increment request, containing the position to which the previous round of redistribution data was restored, to the resource manager of the node holding the old Shanghai table.

After receiving the online chase-increment request, the resource manager sequentially analyzes MySQL's binlog transaction-log file from the delivered position, obtains the payment-record data, and computes a check value over it; it uses the data to produce DML statements, replaces the old table name with the new table name, specifies the new distribution-key field after WHERE, assembles the new DML statements, saves them in separate files per node and per distribution field under the new distribution rule, and finally sends these new DML statements to the corresponding new table nodes for execution, completing the redistribution of the payment data of the old Shanghai table. When finished, it replies to the cluster manager with this round's check value, chase-increment result, and chase-increment time; for the verification operation, see FIG. 7.

The cluster manager initiates this round's chase-increment data verification toward the resource managers of these new table nodes, carrying the binlog positions queried before this round.

After receiving the increment verification request, the resource manager of a new node analyzes the binlog according to the delivered position and table name, generates query SQL statements, thereby obtains the data, computes the check value, and replies to the cluster manager.

The cluster manager combines the check values of the new nodes and compares them with that of the old node. If they differ, this replay is invalid and must be redone; otherwise, the cluster manager judges this round's chase-increment time. If it is less than the chase-increment threshold, it notifies the resource manager to initiate a table-lock operation and complete the final round of increment chasing; otherwise, a new chase-increment round continues, until the time of some round is less than the threshold.
Embodiment 2

This embodiment, with reference to FIG. 6, describes an online train-ticket booking system based on a MariaDB distributed cluster database.

Suppose the system stores booking records by ticket type. When booking records of some ticket type suddenly increase or the number of booking accounts keeps growing, the data volume borne by some table grows to the point of requiring redistribution. An online redistribution of the distributed database is then initiated, and the booking information of that ticket type is split according to new distribution rules (for example, by arrival region).

Take high-speed-train (Dongche) tickets as an example. Suppose that before redistribution all Dongche booking records are stored in a single table. Once the data keeps accumulating or suddenly grows explosively, that single table becomes overloaded and must be redistributed. The aim is to redistribute after modifying the distribution rule (in this example, the node holding the data is determined by the ticket type: for instance, Dongche booking records correspond to table DongcheRecord on node 1, and Gaotie high-speed-rail records to table GaotieRecord on node 2), so that the data of the originally single-node Dongche tickets is decomposed onto tables on multiple nodes.

This redistribution first redistributes the initial original data of the Dongche table according to the newly formulated distribution rule (for example, hash distribution by arrival place: booking records for trains arriving in Nanjing are distributed by the cluster manager to table DongcheRecordOfNanjing on node 1, and so on). That is, the original ticket table is first exported with MariaDB's export command mysqldump; the resource manager then splits all records of the data file according to the new distribution rule into multiple files, one per node, and imports these files into the MariaDB database nodes to which their new tables belong under the new rule (the tables having been created in advance), completing the redistribution of the initial ticket data.

After this step, online increment chasing begins, redistributing the new booking records produced during the previous redistribution to the new table nodes. The cluster manager sends a chase-increment request, containing the position to which the previous round of redistribution data was restored, to the resource manager of the node holding the old ticket table.

After receiving the online chase-increment request, the resource manager sequentially analyzes MariaDB's binlog transaction-log file from the delivered position, obtains the booking-record data, and computes a check value over all column data; it uses the data to produce DML statements, replaces the old table name with the new one, specifies the new distribution-key field after WHERE, assembles the new DML statements, saves them in separate files per node and per distribution field under the new rule, and finally sends them to the corresponding new table nodes for execution, completing the redistribution of the booking data of the old ticket table. When finished, it replies to the cluster manager with this round's check value, chase-increment result, and chase-increment time; for the verification operation, see FIG. 7.

The cluster manager initiates this round's chase-increment data verification toward the resource managers of these new table nodes, carrying the binlog positions queried before this round.

After receiving the increment verification request, the resource manager of a new node analyzes the binlog according to the delivered position and table name, generates query SQL statements, obtains the data, computes the check value, and replies to the cluster manager.

The cluster manager combines the check values of the new nodes and compares them with that of the node holding the old booking records. If they differ, this replay is invalid and must be redone; otherwise, the cluster manager judges this round's chase-increment time. If it is less than the chase-increment threshold, it notifies the resource manager to initiate a table lock and complete the final chase-increment round; otherwise, a new round continues until the time of some round is less than the threshold.
Embodiment 3

This embodiment describes an online shopping system based on an Oracle distributed cluster database.

Suppose the system stores shopping records in tables partitioned by region. When shopping records suddenly increase, the number of shoppers keeps growing, or the variety of goods keeps increasing, the data volume borne by some table grows to the point of requiring redistribution. An online redistribution of the distributed database is then initiated, and the shopping information of that region is split according to new distribution rules.

Take the shopping data of the Nanjing region as an example. Suppose that before redistribution all Nanjing-region shopping records are stored in a single table. Once the Nanjing shopping data keeps accumulating or suddenly grows explosively, that single table becomes overloaded and must be redistributed. The aim is to redistribute after modifying the distribution rules, so that the shopping data of the originally single-node Nanjing region is decomposed onto tables on multiple nodes.

This redistribution first redistributes the initial original data of the Nanjing shopping-data table according to the newly formulated distribution rule (for example, by administrative district). That is, the original shopping-record table is first exported with Oracle's export command; the resource manager then splits all records of the data file according to the new distribution rule into multiple files, one per node, and imports these files into the Oracle database nodes to which their new tables belong under the new rule (the tables having been created in advance), completing the redistribution of the initial Nanjing shopping data.

After this step, online increment chasing begins, redistributing the new Nanjing shopping data produced during the previous redistribution to the new table nodes. The cluster manager sends a chase-increment request, containing the position to which the previous round of redistribution data was restored, to the resource manager of the node holding the old Nanjing shopping-data table.

After receiving the online chase-increment request, the resource manager sequentially analyzes Oracle's transaction-log files from the delivered position, obtains the shopping-record data, and computes a check value over all column data; it uses the data to produce DML statements, replaces the old table name with the new one, specifies the new distribution-key field after WHERE, assembles the new DML statements, saves them in separate files per node and per distribution field under the new rule, and finally sends them to the corresponding new table nodes for execution, completing the redistribution of the shopping data of the old Nanjing table. When finished, it replies to the cluster manager with this round's check value, chase-increment result, and chase-increment time; for the verification operation, see FIG. 7.

The cluster manager initiates this round's chase-increment data verification toward the resource managers of these new table nodes, carrying the transaction-log positions queried before this round.

After receiving the increment verification request, the resource manager of a new node analyzes the transaction log according to the delivered position and table name, generates query SQL statements, obtains the data, computes the check value, and replies to the cluster manager.

The cluster manager combines the check values of the new nodes and compares them with that of the node holding the old Nanjing shopping records. If they differ, this replay is invalid and must be redone; otherwise, the cluster manager judges this round's chase-increment time. If it is less than the chase-increment threshold, it notifies the resource manager to initiate a table lock and complete the final chase-increment round; otherwise, a new round continues until the time of some round is less than the threshold.
Embodiment 4

This embodiment describes an online stock-trading system based on a PostgreSQL distributed cluster database.

Suppose the system stores stock records in tables partitioned by stock category. When trading records suddenly increase, the number of stocks in a category keeps growing, or the number of holders of some category of stock keeps increasing, the data volume borne by some table grows to the point of requiring redistribution. An online redistribution of the distributed database is then initiated, and the related stock information is split according to new distribution rules.

Take large-cap stocks as an example. Suppose that before redistribution all trading records of large-cap stocks are stored in a single table. Once the trading-record data keeps accumulating or suddenly grows explosively, that single table becomes overloaded and must be redistributed. The aim is to redistribute after modifying the distribution rules, so that the trading-record data of the originally single-node large-cap stocks is decomposed onto tables on multiple nodes.

This redistribution first redistributes the initial original data of the large-cap trading-record table according to the newly formulated distribution rule (for example, hash partitioning by stock code). That is, the original large-cap trading-record table is first exported with PostgreSQL's export command; the resource manager then splits all records of the data file according to the new distribution rule into multiple files, one per node, and imports these files into the PostgreSQL database nodes to which their new tables belong under the new rule (the tables having been created in advance), completing the redistribution of the initial large-cap trading-record data.

After this step, online increment chasing begins, redistributing the new large-cap trading records produced during the previous redistribution to the new table nodes. The cluster manager sends a chase-increment request, containing the position to which the previous round of redistribution data was restored, to the resource manager of the node holding the old large-cap trading-record table.

After receiving the online chase-increment request, the resource manager sequentially analyzes PostgreSQL's transaction-log files from the delivered position, obtains the trading-record data, and computes a check value over all column data; it uses the data to produce DML statements, replaces the old table name with the new one, specifies the new distribution-key field after WHERE, assembles the new DML statements, saves them in separate files per node and per distribution field under the new rule, and finally sends them to the corresponding new table nodes for execution, completing the redistribution of the trading data of the old large-cap table. When finished, it replies to the cluster manager with this round's check value, chase-increment result, and chase-increment time; for the verification operation, see FIG. 7.

The cluster manager initiates this round's chase-increment data verification toward the resource managers of these new table nodes, carrying the transaction-log positions queried before this round.

After receiving the increment verification request, the resource manager of a new node analyzes the transaction log according to the delivered position and table name, generates query SQL statements, obtains the data, computes the check value, and replies to the cluster manager.

The cluster manager combines the check values of the new nodes and compares them with that of the node holding the old large-cap trading records. If they differ, this replay is invalid and must be redone; otherwise, the cluster manager judges this round's chase-increment time. If it is less than the chase-increment threshold, it notifies the resource manager to initiate a table lock and complete the final chase-increment round; otherwise, a new round continues until the time of some round is less than the threshold.
From the description of the above implementations, a person skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the embodiments of the present invention may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device) to perform the method described in each embodiment of the present invention.

This embodiment further provides a data processing device, used to implement the above embodiments and implementations; what has already been explained is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments may be implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 8 is a structural block diagram of a cluster manager according to an embodiment of the present invention. As shown in FIG. 8, the device includes a first receiving module 82, a first indication module 84, and a second indication module 86, described below.

The first receiving module 82 is configured to receive a data redistribution request. The first indication module 84, connected to the first receiving module 82, is configured to instruct, according to the data redistribution request, a first resource manager to split the original data stored in a first node and to import each split portion of sub-original data into the second node corresponding to that portion, where the first resource manager is configured to manage the data stored in the first node. The second indication module 86, connected to the first indication module 84, is configured to instruct the first resource manager to import the chase-increment data stored in the first node during the splitting and importing into the second node corresponding to the chase-increment data.

In one embodiment, the second indication module 86 includes an acquiring unit and a sending unit. The acquiring unit is configured to acquire the logical transaction log of the second node corresponding to the chase-increment data after the sub-original data has been imported, where the logical transaction log describes the sub-original data stored in the second node. The sending unit is configured to send a chase-increment indication to the first resource manager according to the logical transaction log, where the indication instructs the first resource manager to determine, according to the logical transaction log, the chase-increment data stored in the first node during the splitting and importing and to import it into the corresponding second node.

In one embodiment, the device further includes a first processing module configured to, after the first resource manager has been instructed to import the chase-increment data into the corresponding second node: receive, from the first resource manager, a first check value, an import time, and an import result for the chase-increment data, and receive, from a second resource manager, a second check value for the chase-increment data, where the second resource manager is configured to manage the data stored in the second node; when the import result indicates success, judge whether the first check value and the second check value are the same; if they differ, re-execute the process of instructing the first resource manager to import the chase-increment data into the corresponding second node; and/or, if they are the same, judge whether the import time is less than a predetermined threshold; if the import time is less than the threshold, control the first node to stop storing new data and instruct the first resource manager to import the data stored in the first node but not yet imported into its corresponding second node; and/or, if the import time is greater than or equal to the threshold, return to the step of instructing the first resource manager to import the not-yet-imported data, until the time taken to import the remaining data is less than the predetermined threshold.
图9是根据本发明实施例的资源管理器的结构框图,如图9所示,该装置包括第二接收模块92、第一导入模块94、第三接收模块96和第二导入模块98,下面对该装置进行说明:
第二接收模块92,设置为接收来自集群管理器的数据重分布指示;第一导入模块94,连接至上述第二接收模块92,设置为根据上述数据重分布指示对第一节点中存储的原始数据进行拆分并将拆分后的每部分子原始数据分别导入到与每部分子原始数据对应的第二节点中;第三接收模块96,连接至上述第一导入模块94,设置为接收来自上述集群管理器的追增量指示;第二导入模块98,连接至上述第三接收模块96,设置为根据上述追增量指示将在进行拆分及导入处理期间存储到第一节点中的追增量数据导入到与上述追增量数据对应的第二节点中。
在一个实施例中,上述第一导入模块94可以通过如下方式根据数据重分布指示对第一节点中存储的原始数据进行拆分并将拆分后的每部分子原始数据分别导入到与每部分子原始数据对应的第二节点中:根据上述数据重分布指示从第一节点中导出上述原始数据;根据数据分发规则对上述原始数据进行拆分,将上述原始数据拆分成多个文件,其中,一个文件对应一个第二节点;将上述多个文件分别上传到与多个文件对应的第二节点进行导入。
In one embodiment, the second importing module 98 may import, according to the chase-increment instruction, the chase-increment data stored into the first node during the splitting and importing into the second node corresponding to that data as follows: determine the chase-increment data according to the logical transaction log carried in the chase-increment instruction, where the logical transaction log describes the sub-original data stored in the second node; generate data manipulation language (DML) statements from the chase-increment data; and use the DML statements to import the chase-increment data into the corresponding second node.
In one embodiment, the apparatus further includes a second processing module configured to, after the chase-increment data stored into the first node during the splitting and importing has been imported into the corresponding second node according to the chase-increment instruction, determine a first checksum of the chase-increment data as well as the import time and import result of importing the chase-increment data into the second node, and send the first checksum, the import time, and the import result to the cluster manager.
In one embodiment, the apparatus further includes a third processing module configured to, after the first checksum, the import time, and the import result have been sent to the cluster manager: receive a chase-increment re-import instruction from the cluster manager and, according to it, re-execute the process of importing the chase-increment data into the corresponding second node; and/or receive a termination instruction from the cluster manager and, according to it, stop the first node from storing new data and import the data stored in the first node but not yet imported into the corresponding second node into the corresponding second node; and/or receive a repeat-import instruction from the cluster manager and, according to it, return to and continue the step of importing the data stored in the first node but not yet imported into the corresponding second node into the corresponding second node, until the time taken to import that not-yet-imported data is less than the predetermined threshold.
In one embodiment, a data processing system is also provided, including a cluster manager and a first resource manager. The cluster manager is connected to the first resource manager and is configured to send, according to a received data redistribution request, a data redistribution instruction and a chase-increment instruction to the first resource manager. The first resource manager is configured to split, according to the data redistribution instruction from the cluster manager, the original data stored in the first node and import each part of the split sub-original data into the corresponding second node, and to import, according to the chase-increment instruction from the cluster manager, the chase-increment data stored into the first node during the splitting and importing into the corresponding second node.
In one embodiment, the cluster manager may send the chase-increment instruction to the first resource manager as follows: acquire the logical transaction log of the second node corresponding to the chase-increment data after the sub-original data has been imported, where the logical transaction log describes the sub-original data stored in the second node; and send the chase-increment instruction to the first resource manager according to the logical transaction log.
In one embodiment, the data processing system further includes a second resource manager, where the second resource manager is connected to the cluster manager (the cluster manager is further configured to manage the second resource manager). In this embodiment, after sending the chase-increment instruction to the first resource manager, the cluster manager is further configured to: receive from the first resource manager a first checksum of the chase-increment data, an import time, and an import result, and receive from the second resource manager a second checksum of the chase-increment data; when the import result indicates success, determine whether the first checksum and the second checksum are the same; if they are not the same, send a chase-increment re-import instruction to the first resource manager; and/or, if they are the same, determine whether the import time is less than a predetermined threshold, and if it is, send a termination instruction to the first resource manager; and/or, if the import time is greater than or equal to the predetermined threshold, send a repeat-import instruction to the first resource manager. In this embodiment, after importing, according to the chase-increment instruction from the cluster manager, the chase-increment data stored into the first node during the splitting and importing into the corresponding second node, the first resource manager is further configured to: determine the first checksum of the chase-increment data as well as the import time and import result of importing the chase-increment data into the second node; send the first checksum, the import time, and the import result to the cluster manager; receive the chase-increment re-import instruction from the cluster manager and, according to it, re-execute the process of importing the chase-increment data into the corresponding second node; and/or receive the termination instruction from the cluster manager and, according to it, stop the first node from storing new data and import the data stored in the first node but not yet imported into the corresponding second node into the corresponding second node; and/or receive the repeat-import instruction from the cluster manager and, according to it, return to and continue the step of importing that not-yet-imported data into the corresponding second node, until the time taken to import it is less than the predetermined threshold.
In one embodiment, the first resource manager may split the original data stored in the first node according to the data redistribution instruction and import each part of the split sub-original data into the corresponding second node as follows: export the original data from the first node according to the data redistribution instruction; split the original data into multiple files according to a data distribution rule, where one file corresponds to one second node; and upload each of the multiple files to its corresponding second node for import.
In one embodiment, the first resource manager may import, according to the chase-increment instruction, the chase-increment data into the second node corresponding to that data as follows: determine the chase-increment data according to the logical transaction log carried in the chase-increment instruction, where the logical transaction log describes the sub-original data stored in the second node; generate data manipulation language (DML) statements from the chase-increment data; and use the DML statements to import the chase-increment data into the corresponding second node.
An embodiment of the present invention further provides a storage medium. In this embodiment, the storage medium may be configured to store program code for executing the steps of the above method embodiments.
In this embodiment, the storage medium may include, but is not limited to, media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
In this embodiment, a processor executes each of the above steps according to the program code stored in the storage medium.
For examples in this embodiment, reference may be made to the examples described in the above embodiments and implementations; they are not repeated here.
Compared with related-art schemes for online redistribution of chase-increment data in distributed databases, the embodiments of the present invention have the following beneficial effects:
Broader applicability. There is no longer any need to design a sharded architecture for the data in advance. The scheme provides key support for feasible online redistribution of distributed databases that do not themselves support redistribution or that have no redistribution plan; it is therefore more broadly applicable to distributed databases.
Guaranteed reliability and maintainability of the distributed database. With the chase-increment approach of this scheme, redistribution is no longer constrained by the chase-increment limits of each shard (database), so the initially defined data distribution rules do not become a limitation, and the distributed database is more maintainable later on.
Guaranteed data safety and accuracy. A new incremental data verification method ensures the accuracy of each chase-increment round, thereby ensuring the safety and efficiency of the redistributed data.
The modules or steps of the above embodiments of the present invention may be implemented with a general-purpose computing device; they may be centralized on a single computing device or distributed over a network formed by multiple computing devices. They may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps shown or described may be performed in an order different from that given here, or the modules or steps may each be made into an individual integrated-circuit module, or multiple of them may be made into a single integrated-circuit module. The embodiments of the present invention are thus not limited to any particular combination of hardware and software.
The above are merely embodiments of the present invention and are not intended to limit the present application; those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its scope of protection.
Industrial Applicability
The embodiments of the present invention use online data redistribution, thereby avoiding service interruption. Moreover, when redistributing chase-increment data, only the chase-increment data itself needs to be redistributed, without redistributing the entire database that holds it, so data redistribution can be completed even when the database is very large. The effect achieved is therefore that service interruption is avoided and data redistribution remains feasible even for very large databases.

Claims (17)

  1. A data processing method, comprising:
    receiving a data redistribution request;
    instructing, according to the data redistribution request, a first resource manager to split original data stored in a first node and to import each part of the split sub-original data into a second node corresponding to that part of sub-original data, wherein the first resource manager is configured to manage the data stored in the first node; and
    instructing the first resource manager to import chase-increment data, stored into the first node while the splitting and importing were being performed, into a second node corresponding to the chase-increment data.
  2. The method according to claim 1, wherein instructing the first resource manager to import the chase-increment data, stored into the first node while the splitting and importing were being performed, into the second node corresponding to the chase-increment data comprises:
    acquiring a logical transaction log of the second node corresponding to the chase-increment data after the sub-original data has been imported, wherein the logical transaction log describes the sub-original data stored in the second node; and
    sending a chase-increment instruction to the first resource manager according to the logical transaction log, wherein the chase-increment instruction instructs the first resource manager to determine, according to the logical transaction log, the chase-increment data stored into the first node while the splitting and importing were being performed, and to import the chase-increment data into the second node corresponding to the chase-increment data.
  3. The method according to claim 1 or 2, wherein, after the first resource manager is instructed to import the chase-increment data, stored into the first node while the splitting and importing were being performed, into the second node corresponding to the chase-increment data, the method further comprises:
    receiving, from the first resource manager, a first checksum, an import time, and an import result of the chase-increment data, and receiving, from a second resource manager, a second checksum of the chase-increment data, wherein the second resource manager is configured to manage data stored in the second node;
    when the import result indicates a successful import, determining whether the first checksum and the second checksum are the same; and
    performing at least one of the following:
    when the first checksum and the second checksum are not the same, re-executing the process of instructing the first resource manager to import the chase-increment data into the second node corresponding to the chase-increment data;
    when the first checksum and the second checksum are the same, determining whether the import time is less than a predetermined threshold; and, when the import time is less than the predetermined threshold, controlling the first node to stop storing new data and instructing the first resource manager to import the data stored in the first node but not yet imported into a second node into the corresponding second node;
    when the first checksum and the second checksum are the same, determining whether the import time is less than the predetermined threshold; and, when the import time is greater than or equal to the predetermined threshold, returning to and continuing the step of instructing the first resource manager to import the data stored in the first node but not yet imported into the corresponding second node into the corresponding second node, until the time taken to import the not-yet-imported data into the corresponding second node is less than the predetermined threshold.
  4. A data processing method, comprising:
    receiving a data redistribution instruction from a cluster manager;
    splitting, according to the data redistribution instruction, original data stored in a first node and importing each part of the split sub-original data into a second node corresponding to that part;
    receiving a chase-increment instruction from the cluster manager; and
    importing, according to the chase-increment instruction, chase-increment data, stored into the first node while the splitting and importing were being performed, into a second node corresponding to the chase-increment data.
  5. The method according to claim 4, wherein splitting the original data stored in the first node according to the data redistribution instruction and importing each part of the split sub-original data into the corresponding second node comprises:
    exporting the original data from the first node according to the data redistribution instruction;
    splitting the original data into multiple files according to a data distribution rule, wherein one file corresponds to one second node; and
    uploading each of the multiple files to its corresponding second node for import.
  6. The method according to claim 4, wherein importing, according to the chase-increment instruction, the chase-increment data, stored into the first node while the splitting and importing were being performed, into the second node corresponding to the chase-increment data comprises:
    determining the chase-increment data according to a logical transaction log carried in the chase-increment instruction, wherein the logical transaction log describes the sub-original data stored in the second node;
    generating data manipulation language (DML) statements from the chase-increment data; and
    importing, by means of the DML statements, the chase-increment data into the second node corresponding to the chase-increment data.
  7. The method according to any one of claims 4 to 6, wherein, after the chase-increment data stored into the first node while the splitting and importing were being performed is imported, according to the chase-increment instruction, into the second node corresponding to the chase-increment data, the method further comprises:
    determining a first checksum of the chase-increment data, and an import time and an import result of importing the chase-increment data into the second node; and
    sending the first checksum, the import time, and the import result to the cluster manager.
  8. The method according to claim 7, wherein, after the first checksum, the import time, and the import result are sent to the cluster manager, the method further comprises at least one of the following:
    receiving a chase-increment re-import instruction from the cluster manager, and re-executing, according to the chase-increment re-import instruction, the process of importing the chase-increment data into the second node corresponding to the chase-increment data;
    receiving a termination instruction from the cluster manager, stopping, according to the termination instruction, the first node from storing new data, and importing the data stored in the first node but not yet imported into the corresponding second node into the corresponding second node;
    receiving a repeat-import instruction from the cluster manager, and returning to and continuing, according to the repeat-import instruction, the step of importing the data stored in the first node but not yet imported into the corresponding second node into the corresponding second node, until the time taken to import the not-yet-imported data into the corresponding second node is less than a predetermined threshold.
  9. A cluster manager, comprising:
    a first receiving module, configured to receive a data redistribution request;
    a first instructing module, configured to instruct, according to the data redistribution request, a first resource manager to split original data stored in a first node and to import each part of the split sub-original data into a second node corresponding to that part, wherein the first resource manager is configured to manage the data stored in the first node; and
    a second instructing module, configured to instruct the first resource manager to import chase-increment data, stored into the first node while the splitting and importing were being performed, into a second node corresponding to the chase-increment data.
  10. The cluster manager according to claim 9, wherein the second instructing module comprises:
    an acquiring unit, configured to acquire a logical transaction log of the second node corresponding to the chase-increment data after the sub-original data has been imported, wherein the logical transaction log describes the sub-original data stored in the second node; and
    a sending unit, configured to send a chase-increment instruction to the first resource manager according to the logical transaction log, wherein the chase-increment instruction instructs the first resource manager to determine, according to the logical transaction log, the chase-increment data stored into the first node while the splitting and importing were being performed, and to import the chase-increment data into the second node corresponding to the chase-increment data.
  11. The cluster manager according to claim 9 or 10, further comprising:
    a first processing module, configured to, after the first resource manager is instructed to import the chase-increment data, stored into the first node while the splitting and importing were being performed, into the second node corresponding to the chase-increment data, receive, from the first resource manager, a first checksum, an import time, and an import result of the chase-increment data, and receive, from a second resource manager, a second checksum of the chase-increment data, wherein the second resource manager is configured to manage data stored in the second node;
    when the import result indicates a successful import, determine whether the first checksum and the second checksum are the same; and
    perform at least one of the following:
    when the first checksum and the second checksum are not the same, re-execute the process of instructing the first resource manager to import the chase-increment data into the second node corresponding to the chase-increment data;
    when the first checksum and the second checksum are the same, determine whether the import time is less than a predetermined threshold; and, when the import time is less than the predetermined threshold, control the first node to stop storing new data and instruct the first resource manager to import the data stored in the first node but not yet imported into a second node into the corresponding second node;
    when the first checksum and the second checksum are the same, determine whether the import time is less than the predetermined threshold; and, when the import time is greater than or equal to the predetermined threshold, return to and continue the step of instructing the first resource manager to import the data stored in the first node but not yet imported into the corresponding second node into the corresponding second node, until the time taken to import the not-yet-imported data into the corresponding second node is less than the predetermined threshold.
  12. A resource manager, comprising:
    a second receiving module, configured to receive a data redistribution instruction from a cluster manager;
    a first importing module, configured to split, according to the data redistribution instruction, original data stored in a first node and import each part of the split sub-original data into a second node corresponding to that part;
    a third receiving module, configured to receive a chase-increment instruction from the cluster manager; and
    a second importing module, configured to import, according to the chase-increment instruction, chase-increment data, stored into the first node while the splitting and importing were being performed, into a second node corresponding to the chase-increment data.
  13. The resource manager according to claim 12, further comprising:
    a second processing module, configured to, after the chase-increment data stored into the first node while the splitting and importing were being performed is imported, according to the chase-increment instruction, into the second node corresponding to the chase-increment data, determine a first checksum of the chase-increment data, and an import time and an import result of importing the chase-increment data into the second node, and send the first checksum, the import time, and the import result to the cluster manager.
  14. The resource manager according to claim 13, further comprising:
    a third processing module, configured to, after the first checksum, the import time, and the import result are sent to the cluster manager, receive a chase-increment re-import instruction from the cluster manager and re-execute, according to the chase-increment re-import instruction, the process of importing the chase-increment data into the second node corresponding to the chase-increment data; or
    receive a termination instruction from the cluster manager, stop, according to the termination instruction, the first node from storing new data, and import the data stored in the first node but not yet imported into the corresponding second node into the corresponding second node; or
    receive a repeat-import instruction from the cluster manager, and return to and continue, according to the repeat-import instruction, the step of importing the data stored in the first node but not yet imported into the corresponding second node into the corresponding second node, until the time taken to import the not-yet-imported data into the corresponding second node is less than a predetermined threshold.
  15. A data processing system, comprising a cluster manager and a first resource manager, wherein
    the cluster manager is connected to the first resource manager and is configured to send, according to a received data redistribution request, a data redistribution instruction and a chase-increment instruction to the first resource manager; and
    the first resource manager is configured to split, according to the data redistribution instruction from the cluster manager, original data stored in a first node and import each part of the split sub-original data into a second node corresponding to that part, and to import, according to the chase-increment instruction from the cluster manager, chase-increment data, stored into the first node while the splitting and importing were being performed, into a second node corresponding to the chase-increment data.
  16. The data processing system according to claim 15, further comprising a second resource manager, wherein the second resource manager is connected to the cluster manager;
    the cluster manager is further configured to: after sending the chase-increment instruction to the first resource manager, receive, from the first resource manager, a first checksum, an import time, and an import result of the chase-increment data, and receive, from the second resource manager, a second checksum of the chase-increment data; when the import result indicates a successful import, determine whether the first checksum and the second checksum are the same; and perform at least one of the following: when the first checksum and the second checksum are not the same, send a chase-increment re-import instruction to the first resource manager; when the first checksum and the second checksum are the same, determine whether the import time is less than a predetermined threshold, and, when the import time is less than the predetermined threshold, send a termination instruction to the first resource manager; when the first checksum and the second checksum are the same, determine whether the import time is less than the predetermined threshold, and, when the import time is greater than or equal to the predetermined threshold, send a repeat-import instruction to the first resource manager; and
    the first resource manager is further configured to: after importing, according to the chase-increment instruction from the cluster manager, the chase-increment data stored into the first node while the splitting and importing were being performed into the second node corresponding to the chase-increment data, determine the first checksum of the chase-increment data, and the import time and the import result of importing the chase-increment data into the second node; send the first checksum, the import time, and the import result to the cluster manager; and receive the chase-increment re-import instruction from the cluster manager and re-execute, according to the chase-increment re-import instruction, the process of importing the chase-increment data into the second node corresponding to the chase-increment data; or receive the termination instruction from the cluster manager, stop, according to the termination instruction, the first node from storing new data, and import the data stored in the first node but not yet imported into the corresponding second node into the corresponding second node; or receive the repeat-import instruction from the cluster manager, and return to and continue, according to the repeat-import instruction, the step of importing the data stored in the first node but not yet imported into the corresponding second node into the corresponding second node, until the time taken to import the not-yet-imported data into the corresponding second node is less than the predetermined threshold.
  17. The data processing system according to claim 15, wherein the first resource manager is configured to: determine the chase-increment data according to a logical transaction log carried in the chase-increment instruction, wherein the logical transaction log describes the sub-original data stored in the second node; generate data manipulation language (DML) statements from the chase-increment data; and import, by means of the DML statements, the chase-increment data into the second node corresponding to the chase-increment data.
PCT/CN2017/090001 2016-06-29 2017-06-26 Data processing method, cluster manager, resource manager, and data processing system WO2018001200A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP17819197.9A EP3480686B1 (en) 2016-06-29 2017-06-26 Data processing method, cluster manager, resource manager and data processing system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610494245.3 2016-06-29
CN201610494245.3A CN107547606B (zh) Data processing method, cluster manager, resource manager, and data processing system

Publications (1)

Publication Number Publication Date
WO2018001200A1 true WO2018001200A1 (zh) 2018-01-04

Family

ID=60785934

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/090001 WO2018001200A1 (zh) Data processing method, cluster manager, resource manager, and data processing system

Country Status (3)

Country Link
EP (1) EP3480686B1 (zh)
CN (1) CN107547606B (zh)
WO (1) WO2018001200A1 (zh)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388481B (zh) Transaction information transmission method, system, apparatus, computing device, and medium
CN111291112B (zh) Read/write control method and apparatus for a distributed database, and electronic device
CN110427383A (zh) Data-redundancy-based online data redistribution method for MySQL clusters and related device
CN110597879B (zh) Time-series data processing method and apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103580898A (zh) * Network coordination method and apparatus
CN104407807A (zh) * Storage capacity expansion method for an RS-coded storage cluster
US20160080490A1 (en) * 2014-09-15 2016-03-17 Microsoft Corporation Online data movement without compromising data integrity

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9805108B2 (en) * 2010-12-23 2017-10-31 Mongodb, Inc. Large distributed database clustering systems and methods
CN102752372A (zh) * File-based database synchronization method
CN103793424B (zh) * Database data migration method and system
US9083724B2 (en) * 2013-05-30 2015-07-14 Netapp, Inc. System iteratively reducing I/O requests during migration of virtual storage system
CN103473334B (zh) * Data storage and query method and system
CN105069149B (zh) * Distributed parallel data import method for structured columnar data


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3480686A4 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108259394A (zh) * Signal acquisition method, apparatus, and electronic device
CN108259394B (zh) Signal acquisition method, apparatus, and electronic device
WO2019166940A3 (en) * 2018-02-28 2019-10-17 International Business Machines Corporation Transactional operations in multi-master distributed data management systems
CN111801661A (zh) Transactional operations in a multi-master distributed data management system
GB2586373A (en) 2018-02-28 2021-02-17 Ibm Transactional operations in multi-master distributed data management systems
US11119678B2 (en) 2018-02-28 2021-09-14 International Business Machines Corporation Transactional operations in multi-master distributed data management systems
GB2586373B (en) 2018-02-28 2022-04-27 Ibm Transactional operations in multi-master distributed data management systems
US11042522B2 (en) 2018-06-11 2021-06-22 International Business Machines Corporation Resolving versions in an append-only large-scale data store in distributed data management systems
US11487727B2 (en) 2018-06-11 2022-11-01 International Business Machines Corporation Resolving versions in an append-only large-scale data store in distributed data management systems
CN110647509A (zh) Method and apparatus for rapid verification and sharing of enterprise-customer data
CN110647509B (zh) Method and apparatus for rapid verification and sharing of enterprise-customer data

Also Published As

Publication number Publication date
CN107547606A (zh) 2018-01-05
CN107547606B (zh) 2021-01-26
EP3480686A1 (en) 2019-05-08
EP3480686A4 (en) 2020-03-04
EP3480686B1 (en) 2023-05-24

Similar Documents

Publication Publication Date Title
WO2018001200A1 (zh) Data processing method, cluster manager, resource manager, and data processing system
CN108932282B (zh) Database migration method, apparatus, and storage medium
WO2017049764A1 (zh) Data read/write method and distributed storage system
CN101334797B (zh) Distributed file system and method for managing data-block consistency therein
US9648059B2 (en) System and methods for multi-user CAx editing conflict management
US7895501B2 (en) Method for auditing data integrity in a high availability database
WO2017162032A1 (zh) Method and apparatus for performing a data recovery operation
US10248709B2 (en) Promoted properties in relational structured data
US20150032695A1 (en) Client and server integration for replicating data
US10599676B2 (en) Replication control among redundant data centers
US11928089B2 (en) Data processing method and device for distributed database, storage medium, and electronic device
WO2020108289A1 (zh) Database system, node, and method
JP2018200683A (ja) Method and design of an automated test system
US20180101589A1 (en) High-performance database replication systems and methods
US20170169091A1 (en) Replication of structured data records among partitioned data storage spaces
CN107153609B (zh) Automated testing method and apparatus
WO2020258674A1 (zh) Script file verification method, apparatus, server, and storage medium
CN111367994A (zh) Method and system for synchronous backup of database incremental data
CN110334484B (zh) Copyright verification method, apparatus, computer device, and storage medium
WO2021109777A1 (zh) Method and apparatus for importing data files
CN112835885A (zh) Processing method, apparatus, and system for distributed table storage
CN111459913B (zh) Capacity expansion method and apparatus for a distributed database, and electronic device
EP3349416B1 (en) Relationship chain processing method and system, and storage medium
US11960369B2 (en) Efficient creation of a secondary database system
CN116186082A (zh) Distributed data aggregation method, first server, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17819197

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017819197

Country of ref document: EP

Effective date: 20190129