EP4293510A1 - Data migration method and apparatus, and device, medium and computer product

Data migration method and apparatus, and device, medium and computer product

Info

Publication number
EP4293510A1
Authority
EP
European Patent Office
Prior art keywords
node
identity
data
primary key
load balancing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP22868914.7A
Other languages
German (de)
French (fr)
Inventor
Liangchun XIONG
Anqun PAN
Hailin LEI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Publication of EP4293510A1 publication Critical patent/EP4293510A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/214Database migration support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2379Updates performed during online database operations; commit processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278Data partitioning, e.g. horizontal or vertical partitioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • G06F9/5088Techniques for rebalancing the load in a distributed system involving task migration

Definitions

  • This disclosure relates to the field of Internet technologies and the field of traffic, and in particular, to data migration.
  • the LVS forwards a user request according to an address and a port, and performs data migration according to a connection processed by a current cluster database service process.
  • Data migration may provide the distributed cache system with high scalability.
  • Embodiments of this disclosure provide a data migration method and apparatus, a device, a storage medium, and a computer product.
  • the data to be migrated can be determined by using the primary key identity, and the node to which the data is to be migrated can be determined by using the partition identity, thereby completing migration of the data between the multiple nodes without forwarding a user request multiple times between the multiple nodes. Therefore, data migration efficiency is improved.
  • a first aspect of this disclosure provides a data migration method.
  • the data migration method is applied to a load balancing system.
  • the load balancing system includes a first node and a second node.
  • the method includes: receiving, by the first node, a first instruction carrying a first primary key identity and a first transaction identity, the first primary key identity indicating first data that is to be processed, the first transaction identity indicating a first transaction, the first node being used for processing the first transaction; obtaining, by the first node and using a first route assignment table, a first partition identity based on the first primary key identity, the first route assignment table containing a mapping relationship between the first primary key identity and the first partition identity, the first partition identity indicating the second node for processing the first transaction; determining, by the first node, the second node based on the first partition identity; and transmitting, by the first node, the first data to the second node.
  • a second aspect of this disclosure provides a data migration apparatus, including: an obtaining module, configured to obtain a first route assignment table, the first route assignment table including a mapping relationship between a first primary key identity and a first partition identity, the first primary key identity being used for uniquely identifying first data, and the first partition identity indicating a second node; a receiving module, configured to receive a first instruction, the first instruction carrying the first primary key identity and a first transaction identity, the first transaction identity indicating a first transaction, and a first node being configured to process the first transaction; the obtaining module being further configured to obtain, by using the first route assignment table, the first partition identity based on the first primary key identity carried in the first instruction; a processing module, configured to determine the second node based on the first partition identity, the second node being configured to process the first transaction; and a transmission module, configured to transmit the first data to the second node.
  • a third aspect of this disclosure provides a computer-readable storage medium.
  • the computer-readable storage medium stores instructions which, when run on a computer, enable the computer to perform the method as described in each of the foregoing aspects.
  • a fourth aspect of this disclosure provides a computer program product or computer program.
  • the computer program product or computer program includes computer instructions.
  • the computer instructions are stored in a computer-readable storage medium.
  • a processor of a computer device reads the computer instructions from the computer-readable storage medium.
  • the processor executes the computer instructions to cause the computer device to perform the method provided in each of the foregoing aspects.
  • the embodiments of this disclosure have the following advantages.
  • the embodiments of this disclosure are applied to the load balancing system.
  • the load balancing system includes the first node and the second node. Based on this, the first route assignment table is first obtained by using the first node.
  • the first route assignment table includes the mapping relationship between the first primary key identity and the first partition identity.
  • the first primary key identity is used for uniquely identifying the first data.
  • the first partition identity indicates the second node.
  • the first node receives the first instruction.
  • the first instruction carries the first primary key identity and the first transaction identity.
  • the first transaction identity indicates the first transaction.
  • the first node is configured to process the first transaction.
  • the first node obtains, by using the first route assignment table, the first partition identity based on the first primary key identity carried in the first instruction, and determines the second node based on the first partition identity.
  • the second node is configured to process the first transaction. Then, the first node transmits the first data uniquely identified by the first primary key identity to the second node.
  • the data to be migrated can be determined by using the primary key identity, and the node to which the data is to be migrated can be determined by using the partition identity, thereby completing migration of the data between the multiple nodes without forwarding a user request multiple times between the multiple nodes. Therefore, data migration efficiency is improved.
  • the embodiments of this disclosure provide a data migration method and apparatus, a computer device, and a storage medium.
  • the data to be migrated may be determined by using the primary key identity, and the node to which the data is to be migrated may be determined by using the partition identity, thereby completing migration of the data between the multiple nodes without forwarding a user request multiple times between the multiple nodes. Therefore, data migration efficiency is improved.
  • data migration may be implemented in a distributed cache system by using an LVS.
  • the LVS forwards a user request according to a target address and a target port.
  • the LVS only forwards the user request without generating traffic, and performs data migration according to a connection processed by a current cluster database service process, thereby implementing load balancing. Based on this, data migration may provide the distributed cache system with high scalability.
  • when a user transaction is simple (for example, data interaction is completed on one node), load balancing may be performed according to the connection processed by the current cluster database service process.
  • when a user transaction is complex (for example, there is data interaction between multiple nodes), data migration based on the LVS may require the user request to be forwarded multiple times between the multiple nodes, which reduces data migration efficiency. Therefore, how to improve data migration efficiency in a scenario in which the user transaction is complex becomes an urgent problem to be solved.
  • the embodiments of this disclosure provide a data migration method.
  • the data to be migrated and the node to which the data is to be migrated may be determined based on a mapping relationship between a primary key identity and a partition identity, thereby completing migration of the data between the multiple nodes without forwarding a user request multiple times between the multiple nodes. Therefore, data migration efficiency is improved.
  • a load balancing node uniformly distributes requests sent by a user client to the database side among back-end database service processes to provide a data service.
  • multiple peer database service processes in the distributed cache system may externally provide services. Based on this, a user transaction load is uniformly distributed in all database service processes, such that an entire database can provide a maximum data service capability.
  • the node includes a computing node and a storage node.
  • the computing node is configured to process a specific computing request of a user.
  • the computing node is specifically a node capable of executing a user request.
  • the storage node is a storage device in the distributed cache system, that is, is configured to store data.
  • the storage node is specifically a node completing execution and committing of a distributed transaction in the distributed cache system.
  • the computing node and the storage node are specifically servers.
  • the node including the computing node and the storage node belongs to a database in a load balancing system.
  • 2PC is an algorithm designed in the fields of computer networks and databases to keep consistency of nodes under a distributed cache system architecture during transaction committing.
  • the consistent hashing algorithm is applied extensively to a distributed system.
  • consistent hashing can minimize a change in an existing mapping relationship between a service request and a request processing server when a server is removed or added, to maximally meet a requirement for monotonicity.
  • FIG. 1 is a schematic diagram of the consistent hashing algorithm according to an embodiment of this disclosure.
  • A1 to A4 denote nodes
  • B1 to B5 denote data.
  • the first node found clockwise whose hash value is greater than that of the data is the node in which the data is stored.
  • a hash value calculated based on a partition identity of the data is compared with hash values of the nodes, and a node corresponding to a first hash value that is found clockwise and that is greater than the hash value calculated based on the partition identity of the data is a node in which the data is.
  • a first node found clockwise for the data B1 is the node A1, so that the data B1 is stored in the node A1, and a user request corresponding to the data B1 is executed to complete execution and committing of a transaction corresponding to the data B1.
  • a first node found clockwise for the data B2 is the node A2
  • a first node found clockwise for the data B3 is the node A3
  • first nodes found clockwise for the data B4 and the data B5 are the node A4.
  • FIG. 2 is a schematic diagram of a case that load balancing is performed based on the consistent hashing algorithm according to an embodiment of this disclosure.
  • C1 to C5 denote nodes
  • D1 to D5 denote data. Based on this, if a computing capability is currently desired to be expanded, and a node currently with a maximum computational load is the node C4, the node C5 is added between the node C3 and the node C4.
  • the data D4 in the data D4 and the data D5 that are originally assigned to the node C4 according to an assignment rule of the consistent hashing algorithm described with reference to FIG. 1 may be assigned to the added node C5, to expand the computing capability, thereby completing load balancing. Then, if the computing capability is desired to be reduced, it is only necessary to remove the node C5. In this case, the data D4 assigned to the node C5 may be reassigned to the node C4 according to the assignment rule of the consistent hashing algorithm described with reference to FIG. 1 , to reduce the computing capability, thereby completing dynamic load balancing.
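The ring lookup and rebalancing behavior described with reference to FIG. 1 and FIG. 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the hash function, ring size, and node/key names are assumptions chosen for the example.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map a key onto the hash ring (a fixed 32-bit space in this sketch).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

class ConsistentHashRing:
    """Minimal consistent-hash ring: data is assigned to the first node
    found clockwise whose hash value is >= the data's hash value."""

    def __init__(self, nodes=()):
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str):
        bisect.insort(self._ring, (_hash(node), node))

    def remove_node(self, node: str):
        self._ring.remove((_hash(node), node))

    def locate(self, data_key: str) -> str:
        hashes = [h for h, _ in self._ring]
        # First node clockwise with hash >= the data's hash; wrap to the start.
        i = bisect.bisect_left(hashes, _hash(data_key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["A1", "A2", "A3", "A4"])
before = {k: ring.locate(k) for k in ["B1", "B2", "B3", "B4", "B5"]}
ring.add_node("A5")  # expand the computing capability, as with node C5 in FIG. 2
after = {k: ring.locate(k) for k in before}
moved = [k for k in before if before[k] != after[k]]
print(moved)  # only keys that now hash to the new node move; the rest stay put
```

This illustrates the monotonicity requirement mentioned above: adding a node can only move keys onto the new node, and removing it restores the original assignment.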
  • FIG. 3 is a schematic diagram of an architecture of a load balancing system according to an embodiment of this disclosure.
  • a node in FIG. 3 specifically includes multiple computing nodes and a storage node communicating with each computing node.
  • a user specifically sends a request corresponding to a transaction to the computing node by using a terminal device.
  • data corresponding to the transaction is specifically called from the storage node.
  • the data is executed and committed by the storage node, such that the computing node completes execution of the request corresponding to the transaction.
  • the node may output log information to the load balancing system, and then the load balancing node analytically processes the log information to determine whether load balancing (that is, data migration) is desired.
  • Each of the load balancing node and the node in FIG. 3 may be a server, a server cluster including multiple servers, a cloud computing center, or the like. This is not specifically limited herein.
  • a client is specifically deployed in the terminal device.
  • the terminal device may be a tablet computer, a notebook computer, a palm computer, a mobile phone, a personal computer (PC), or a voice interaction device shown in FIG. 3 .
  • the terminal device may communicate with the node by using a wireless network, a wired network, or a removable storage medium.
  • the wireless network uses a standard communication technology and/or protocol.
  • the wireless network is generally the Internet, but may alternatively be any network, including, but not limited to, any combination of Bluetooth, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile private network, or a virtual private network.
  • the foregoing data communication technology may be replaced or supplemented with a custom or private data communication technology.
  • the removable storage medium may be a universal serial bus (USB) flash drive, a mobile hard disk, another removable storage medium, or the like.
  • Only five terminal devices, one node, and one load balancing node are shown in FIG. 3 . However, it is to be understood that the example in FIG. 3 is used only to understand this solution, and specific quantities of terminal devices, nodes, and load balancing nodes are flexibly determined in combination with an actual situation.
  • the node may transmit the log information to the load balancing node by adapting a log generated by the node (for example, adding some output fields). Specific implementation complexity is low, and impact on the node is reduced.
  • the data migration method of this solution may be performed only by adapting an output of the log in the node and adding a corresponding processing logic to the load balancing node, so that implementability is high.
  • the node outputs the log information to transfer computation for load balancing to the load balancing node, so that contention with data generated during transaction execution for a data resource, when a load balancing algorithm is executed on the node, is reduced. Therefore, a resource utilization is increased, and data migration efficiency is improved.
  • migrated data may be traffic-related data, for example, real-time road condition data, vehicle travel data, or driver need data. Therefore, the method provided in the embodiments of this disclosure may be applied to the field of traffic. Based on this, the following describes an intelligent traffic system (ITS) and an intelligent vehicle infrastructure cooperative system (IVICS).
  • the ITS also referred to as an intelligent transportation system, effectively integrates and applies advanced sciences and technologies (an information technology, a computer technology, a data communication technology, a sensor technology, an electronic control technology, an automatic control technology, operational research, artificial intelligence, and the like) to transportation, service control, and vehicle manufacturing to strengthen a connection between a vehicle, infrastructure, and a user, thereby forming an integrated transportation system to ensure safety, improve efficiency and an environment, and save energy.
  • the IVICS is referred to as a cooperative vehicle infrastructure system for short, and is a development direction of the ITS.
  • the cooperative vehicle infrastructure system comprehensively implements dynamic real-time vehicle-vehicle and vehicle-infrastructure information interaction by using advanced wireless communication and new-generation Internet technologies and the like, and develops active vehicle safety control and cooperative infrastructure management based on full space-time dynamic traffic information acquisition and fusion to fully implement effective cooperation of a person, a vehicle, and infrastructure, ensure traffic safety, and improve traffic efficiency, thereby forming a safe, efficient, and environment-friendly road traffic system.
  • the data migration method provided in the embodiments of this disclosure specifically involves a cloud technology.
  • the cloud technology is a hosting technology that unifies a series of hardware, software, and network resources, and the like in a WAN or a LAN to implement calculation, storage, processing, and sharing of data.
  • the cloud technology is a generic term of a network technology, information technology, integration technology, management platform technology, application technology, and the like based on commercial-mode application of cloud computing.
  • a resource pool may be formed, and is flexibly and conveniently used on demand.
  • a cloud computing technology will become an important support.
  • a background service of a technical network system requires a large quantity of computing and storage resources, for example, a video website, a picture website, and more portals.
  • each item may have its own identification mark in the future, which is desired to be transmitted to a background system for logical processing.
  • Data of different levels may be processed separately. All kinds of industry data require a strong system support, which can only be realized by cloud computing.
  • Cloud computing distributes computing tasks over a resource pool including a large quantity of computers, such that various application systems can obtain computing power, storage space, and information services as desired.
  • a network providing resources is referred to as a "cloud”.
  • the resources in the "cloud” appear to a user to be infinitely extensible and available at any time.
  • the resources are available on demand and extensible at any time, and the user pays for use.
  • a basic capability provider of cloud computing may construct a cloud computing resource pool platform (referred to as a cloud platform for short, generally referred to as infrastructure as a service (IaaS)), and multiple types of virtual resources are deployed in a resource pool for an external client to select and use.
  • the cloud computing resource pool mainly includes a computing device (a virtual machine, including an operating system), a storage device, and a network device.
  • a platform as a service (PaaS) layer may be deployed on an IaaS layer, and then a software as a service (SaaS) layer is deployed on the PaaS layer.
  • the SaaS layer may be directly deployed on the IaaS layer.
  • PaaS is a platform on which software is run, for example, a database or a web container.
  • SaaS is various transaction software, for example, a web portal or a mass texting device.
  • SaaS and PaaS are upper layers relative to IaaS.
  • a distributed cloud storage system (referred to as a storage system hereinafter) is a storage system that integrates a number of different types of storage devices (the storage device is also referred to as a storage node) in the network through application software or application interfaces by using a function such as a cluster application, a grid technology, or a distributed storage file system to cooperate to externally provide data storage and transaction access functions.
  • a storage method of the storage system is as follows.
  • a logical volume is created.
  • physical storage space is allocated to each logical volume.
  • the physical storage space may include a disk of one or more storage devices.
  • the file system divides the data into many portions, each portion being an object.
  • the object includes not only data but also additional information such as a data identity (ID).
  • the file system writes each object to physical storage space of the logical volume.
  • the file system records storage position information of each object. Therefore, when the client requests to access the data, the file system may enable the client to access the data according to the storage position information of each object.
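The storage flow described above (split data into objects, give each object an ID, write it to the logical volume, and record its storage position for later access) can be sketched as a toy example. The class and field names here are illustrative assumptions, not part of the disclosed storage system.

```python
class ToyFileSystem:
    """Toy sketch: data is divided into objects, each object carries an
    ID, and the file system records where each object was written so a
    client can be directed there on a later read."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.volume = []     # stands in for the logical volume's storage space
        self.positions = {}  # object ID -> storage position information

    def write(self, data: bytes):
        ids = []
        for offset in range(0, len(data), self.chunk_size):
            obj_id = f"obj-{len(self.positions)}"
            self.positions[obj_id] = len(self.volume)  # record the position
            self.volume.append(data[offset:offset + self.chunk_size])
            ids.append(obj_id)
        return ids

    def read(self, obj_id: str) -> bytes:
        # Access the data according to the recorded position information.
        return self.volume[self.positions[obj_id]]

fs = ToyFileSystem()
ids = fs.write(b"hello world!")
print(ids, fs.read(ids[0]))  # ['obj-0', 'obj-1', 'obj-2'] b'hell'
```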
  • the storage system specifically allocates the physical storage space to the logical volume through the following process: pre-dividing physical storage space into stripes according to a capacity estimate for an object to be stored in the logical volume (the estimate often has a large margin with respect to an actual capacity for the object to be stored) and a group of a redundant array of independent disks (RAID), one logical volume being understood as one stripe, thereby allocating the physical storage space to the logical volume.
  • a blockchain is a novel application mode of a computer technology such as distributed data storage, point-to-point transmission, a consensus mechanism, or an encryption algorithm.
  • the blockchain is essentially a decentralized database, and is a string of data blocks associatively generated by using a cryptographic method. Each data block includes information of a batch of network transactions, and is used for verifying validity of the information (anti-counterfeiting) and generating a next block.
  • the blockchain may include a blockchain underlying platform, a platform product service layer, and an application service layer.
  • the blockchain underlying platform may include processing modules such as a user management module, a basic service module, a smart contract module, and an operation management module.
  • the user management module is configured to manage identity information of all blockchain participants, including maintenance of public and private key generation (account management), key management, maintenance of a correspondence between a real identity of a user and a blockchain address (authority management), and the like, and when authorized, determine and audit transaction conditions of some real identities and provide a rule configuration for risk control (risk control auditing).
  • the basic service module is deployed on all blockchain node devices, and is configured to verify validity of a transaction request, and record a valid request on a storage after consensus about the valid request is completed.
  • For a new transaction request, the basic service module first performs interface adaptation parsing and authentication processing (interface adaptation), then encrypts transaction information by using a consensus algorithm (consensus management), completely and uniformly transmits the transaction information to a shared ledger after encryption (network communication), and performs recording and storage.
  • the smart contract module is configured for registration, issuance, triggering, and execution of a contract.
  • a developer may define a contract logic by using a specific programming language, issue the contract logic on the blockchain (contract registration), and call a key or another event according to a logic of a contract term to trigger execution to complete the contract logic.
  • a contract upgrade and cancellation function is further provided.
  • the operation management module is mainly configured for deployment in a release process of a product, modification of a configuration, contract setting, cloud adaptation, and visual output of a real-time state during running of the product, for example, alarming, managing a network condition, or managing a health condition of the node device.
  • the platform product service layer provides a basic capability and implementation framework of a typical application. The developer may superimpose characteristics of a transaction based on the basic capability to complete blockchain implementation of a transaction logic.
  • the application service layer provides a blockchain-solution-based application service for a transaction participant to use.
  • FIG. 4 is a schematic diagram of a data migration method according to an embodiment of this disclosure.
  • the data migration method in this embodiment of this disclosure is applied to the load balancing system shown in FIG. 3 .
  • the load balancing system may further include more nodes, and elaborations are omitted herein.
  • the data migration method in this embodiment of this disclosure includes the following steps 101 to 105.
  • the first node obtains a first route assignment table, the first route assignment table including a mapping relationship between a first primary key identity and a first partition identity, the first primary key identity being used for uniquely identifying first data, and the first partition identity indicating the second node.
  • the first node obtains the first route assignment table.
  • the first route assignment table includes the mapping relationship between the first primary key identity and the first partition identity.
  • the first primary key identity is used for uniquely identifying first data.
  • the first partition identity indicates the second node.
  • one primary key identity may uniquely identify only one piece of data.
  • a primary key identity 1 is used for uniquely identifying data 1
  • a primary key identity 2 is used for uniquely identifying data 2.
  • One partition identity may indicate only one node, but one node may be indicated by multiple partition identities.
  • a partition identity 1 is used for indicating a node 1
  • a partition identity 2 is used for indicating the node 1
  • a partition identity 3 is used for indicating a node 2.
  • the node 1 is indicated by the partition identity 1 and the partition identity 2
  • the node 2 is indicated by the partition identity 3.
  • the mapping relationship between the first primary key identity and the first partition identity is, for example, the mapping relationship between the partition identity 1 and the primary key identity 1 or the mapping relationship between the partition identity 2 and the primary key identity 2.
  • the first route assignment table may include mapping relationships between multiple primary key identities, multiple transaction identities, and multiple partition identities; not all mapping relationships are enumerated herein.
  • the example in this embodiment is not to be understood as a limitation on this disclosure.
  • a primary key identity and a partition identity carried in each piece of data may be flexibly determined according to an actual situation.
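The two mappings described above can be sketched with plain dictionaries: each primary key identity maps to exactly one partition identity, each partition identity indicates exactly one node, and one node may be indicated by multiple partition identities. All identifier names below are hypothetical, chosen only to mirror the example of primary key identities 1 and 2, partition identities 1 to 3, and nodes 1 and 2.

```python
# Hypothetical route assignment table: primary key identity -> partition identity.
route_assignment_table = {
    "primary_key_1": "partition_1",
    "primary_key_2": "partition_2",
    "primary_key_3": "partition_3",
}

# Hypothetical partition map: partition identity -> node.
partition_to_node = {
    "partition_1": "node_1",  # node_1 is indicated by two partition identities
    "partition_2": "node_1",
    "partition_3": "node_2",
}

def node_for(primary_key_identity: str) -> str:
    # One partition identity indicates only one node, so two lookups
    # suffice to find the node holding the data.
    partition_identity = route_assignment_table[primary_key_identity]
    return partition_to_node[partition_identity]

print(node_for("primary_key_1"))  # node_1
print(node_for("primary_key_3"))  # node_2
```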
  • the first node receives a first instruction, the first instruction carrying the first primary key identity and a first transaction identity, the first transaction identity indicating a first transaction, and the first node being configured to process the first transaction.
  • the first node receives the first instruction.
  • the first instruction carries the first primary key identity and the first transaction identity.
  • the first transaction identity indicates the first transaction.
  • the first transaction identity may be carried in a log of the first transaction generated by the first node by executing the first transaction.
  • the first node may learn based on the first transaction identity in the first instruction that a user initiates an operation on the first transaction at this time, and data desired to be called for this operation on the first transaction is the first data indicated by the first primary key identity.
  • the first node obtains, by using the first route assignment table, the first partition identity based on the first primary key identity carried in the first instruction.
  • the first node may obtain the first primary key identity carried in the first instruction by obtaining the first instruction in step 102, and then determine, by using the mapping relationship between the first primary key identity and the first partition identity in the first route assignment table, the first partition identity based on the first primary key identity.
  • an example in which the first route assignment table includes the mapping relationship between the partition identity 1 and the primary key identity 1 and the mapping relationship between the partition identity 2 and the primary key identity 2 is used for description.
  • the first primary key identity carried in the first instruction is the primary key identity 1
  • the first primary key identity carried in the first instruction is the primary key identity 2
  • the first node determines the second node based on the first partition identity, the second node being configured to process the first transaction.
  • First, the first node may determine the second node based on the first partition identity. Second, since the first instruction received in step 102 further carries the first transaction identity, the determined second node is a node capable of processing the first transaction indicated by the first transaction identity.
  • one node may be indicated by multiple partition identities.
  • the partition identity 1 indicates the node 1
  • the partition identity 2 indicates the node 2
  • the partition identity 3 indicates the node 1.
  • both the partition identity 1 and the partition identity 3 may indicate that corresponding data is processed by the node 1.
  • FIG. 5 is a schematic diagram of a case in which the partition identity uniquely identifies the node according to an embodiment of this disclosure.
  • E11 and E12 denote nodes
  • E21 to E26 denote data. Based on this, hash calculation is first performed on the node E11 to obtain a hash value 1 corresponding to the node E11. Similarly, hash calculation is performed on the node E12 to obtain a hash value 2 corresponding to the node E12.
  • hash calculation is performed on the partition identity 1 of the data E21 to obtain a hash value 3 corresponding to the partition identity 1 of the data E21.
  • hash calculation is performed on the partition identity 2 of the data E22 to obtain a hash value 4 corresponding to the partition identity 2 of the data E22.
  • Hash calculation is performed on the partition identity 3 of the data E23 to obtain a hash value 5 corresponding to the partition identity 3 of the data E23.
  • Hash calculation is performed on a partition identity 4 of the data E24 to obtain a hash value 6 corresponding to the partition identity 4 of the data E24.
  • Hash calculation is performed on a partition identity 5 of the data E25 to obtain a hash value 7 corresponding to the partition identity 5 of the data E25.
  • Hash calculation is performed on a partition identity 6 of the data E26 to obtain a hash value 8 corresponding to the partition identity 6 of the data E26.
  • the node E11 is a node in which the data E21 is stored, and the partition identity 1 of the data E21 may uniquely indicate the node E11.
  • the node E11 is a node in which the data E22 is stored, and the partition identity 2 of the data E22 may uniquely indicate the node E11.
  • the node E11 is a node in which the data E23 is stored, and the partition identity 3 of the data E23 may uniquely indicate the node E11.
  • the node E12 is a node in which the data E24 is stored, and the partition identity 4 of the data E24 may uniquely indicate the node E12.
  • the node E12 is a node in which the data E25 is stored, and the partition identity 5 of the data E25 may uniquely indicate the node E12.
  • the node E12 is a node in which the data E26 is stored, and the partition identity 6 of the data E26 may uniquely indicate the node E12.
  • FIG. 5 and the corresponding example are used only to understand how to uniquely indicate a node by using a partition identity, but the example is not to be understood as a limitation on this disclosure.
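The hash placement of FIG. 5 can be sketched as a simple hash ring: each node and each partition identity is hashed, and a partition is stored on the node whose hash position follows it on the ring. This is only one possible hashing scheme, assumed for illustration; the disclosure does not prescribe a specific hash function, and `md5` here is arbitrary:

```python
import hashlib

def stable_hash(identity):
    """Stable integer hash of an identity string (md5 chosen only for illustration)."""
    return int(hashlib.md5(identity.encode("utf-8")).hexdigest(), 16)

def owning_node(partition_identity, nodes):
    """Place the partition on the node whose hash is the first one at or after
    the partition's hash on the ring; wrap around if no node hash follows."""
    ring = sorted((stable_hash(node), node) for node in nodes)
    position = stable_hash(partition_identity)
    for node_hash, node in ring:
        if position <= node_hash:
            return node
    return ring[0][1]  # wrap around to the first node on the ring
```

Under such a scheme each partition identity deterministically maps to exactly one node (as FIG. 5 requires), although which node a given partition lands on depends on the hash function chosen.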
  • the first node transmits the first data to the second node.
  • the first node, after receiving the first instruction in step 102, determines, by using the first primary key identity, the first data uniquely identified by the first primary key identity, and may determine through the foregoing steps that the first data desired to be called by the first instruction is currently not data processed by the first node. Therefore, the first data has to be transmitted to the second node, and the second node processes the first data to complete processing the first transaction.
  • This embodiment of this disclosure provides a data migration method.
  • data desired to be migrated may be determined by using the primary key identity, and a node to which the data is to be migrated may be determined by using the partition identity, thereby completing migration of the data between the multiple nodes without forwarding a user request multiple times between the multiple nodes. Therefore, data migration efficiency is improved.
  • the first route assignment table further includes a mapping relationship between a first index identity and the first partition identity.
  • the first instruction further includes the first index identity. That the first node obtains, by using the first route assignment table, the first partition identity based on the first primary key identity carried in the first instruction specifically includes that: the first node determines, by using the first route assignment table, N partition identities based on the first index identity carried in the first instruction, the N partition identities including the first partition identity, and N being an integer greater than or equal to 1; and the first node determines, by using the first route assignment table, the first partition identity from the N partition identities based on the first primary key identity carried in the first instruction.
  • the first route assignment table further includes the mapping relationship between the first index identity and the first partition identity
  • the first instruction further includes the first index identity.
  • a calculation amount may be large if partition identities are determined by using primary key identities.
  • One index identity may correspond to at least one partition.
  • the first node specifically determines, by using the mapping relationship between the first index identity and the first partition identity in the first route assignment table, the N partition identities based on the first index identity carried in the first instruction.
  • the N partition identities include the first partition identity, and N is an integer greater than or equal to 1. Then, the first partition identity corresponding to the first primary key identity in the mapping relationship is desired to be determined from the N partition identities.
  • the first node determines, by using the mapping relationship between the first primary key identity and the first partition identity in the first route assignment table, the first partition identity from the determined N partition identities based on the first primary key identity carried in the first instruction.
  • the first route assignment table includes the mapping relationship between the partition identity 1 and the primary key identity 1, a mapping relationship between the partition identity 1 and an index identity 1, and a mapping relationship between the partition identity 2 and the index identity 1.
  • the partition identity 1 and the partition identity 2 may be determined based on the index identity 1, and then the partition identity 1 is determined from the partition identity 1 and the partition identity 2 as the first partition identity based on the primary key identity 1 by using the mapping relationship between the partition identity 1 and the primary key identity 1.
  • the first instruction may include multiple primary key identities, and the first index identity may indicate multiple primary key identities. That is, the operation initiated by the user on the first transaction requires multiple pieces of data indicated by the multiple primary key identities to be called.
  • the first route assignment table includes the mapping relationship between the partition identity 1 and the primary key identity 1, the mapping relationship between the partition identity 2 and the primary key identity 2, the mapping relationship between the partition identity 1 and the index identity 1, and the mapping relationship between the partition identity 2 and the index identity 1, and the first instruction includes the primary key identity 1, the primary key identity 2, and the index identity 1.
  • the partition identity 1 and the partition identity 2 may be determined based on the index identity 1, or the partition identity 1 and the partition identity 2 may be determined by using the foregoing mapping relationships based on the primary key identity 1 and the primary key identity 2. That is, the data is distributed on the first node and the second node.
  • the first node completes calling data indicated by the primary key identity 2, and sends the first data indicated by the primary key identity 1 to the second node to enable the second node to call the first data, thereby completing the first transaction.
  • the first transaction is the 2PC transaction described above.
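The two-step lookup described above — first narrowing to N candidate partition identities via the index identity, then selecting the one mapped to the primary key identity — can be sketched as follows (all identities and table names are hypothetical):

```python
# Illustrative first route assignment table carrying both kinds of mappings.
pk_to_partition = {"primary_key_1": "partition_1", "primary_key_2": "partition_2"}
index_to_partitions = {"index_1": ["partition_1", "partition_2"]}  # N = 2 here

def resolve_partition(index_identity, primary_key_identity):
    """Step 1: the index identity yields the N candidate partition identities.
    Step 2: the primary key identity selects the first partition identity
    from among those N candidates."""
    candidates = index_to_partitions[index_identity]
    partition_identity = pk_to_partition[primary_key_identity]
    if partition_identity not in candidates:
        raise KeyError("primary key identity not covered by this index identity")
    return partition_identity
```

The point of the index step is that it restricts the candidate set before the per-primary-key lookup, which is why it can reduce the calculation amount when many primary key identities share one index identity.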
  • This embodiment of this disclosure provides another data migration method.
  • at least one partition identity is first determined by using the first index identity.
  • the first partition identity corresponding to the first primary key identity in the mapping relationship is determined from the at least one partition identity based on the first primary key identity.
  • the data migration method may further include that: the first node obtains a second route assignment table at a first time point, the second route assignment table including a mapping relationship between the first primary key identity and a second partition identity, and the second partition identity indicating the first node. That the first node obtains a first route assignment table includes that: the first node obtains the first route assignment table at a second time point, the second time point being later than the first time point. After the first node transmits the first data uniquely identified by the first primary key identity to the second node, the method further includes that: the first node deletes the second route assignment table.
  • the first node obtains the second route assignment table at the first time point.
  • the second route assignment table includes the mapping relationship between the first primary key identity and the second partition identity.
  • the second partition identity indicates the first node.
  • the first node is configured to process the first transaction. That is, in this case, based on the mapping relationship in the second route assignment table, the first data uniquely identified by the first primary key identity is data to be managed by the first node.
  • the first node obtains the first route assignment table at the second time point, and the second time point is later than the first time point. That is, the first route assignment table is obtained by updating the mapping relationship between the first primary key identity and the second partition identity in the second route assignment table. Based on this, after the first node transmits the first data to the second node in step 105, the first data has been migrated, according to the latest route assignment table, to the second node managing the first data. In this case, the second route assignment table may be deleted.
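The time-ordering of the two tables — the second route assignment table obtained at the first time point, the first route assignment table obtained at the later second time point, with the older table deleted once migration completes — might be sketched as follows (class and method names are assumptions for illustration):

```python
class NodeRouteState:
    """Holds the older (second) route assignment table until migration
    of the affected data completes, then deletes it."""

    def __init__(self, second_table):
        self.second_table = second_table  # obtained at the first time point
        self.first_table = None           # obtained at the later second time point

    def receive_updated_table(self, first_table):
        self.first_table = first_table

    def lookup(self, primary_key_identity):
        """Prefer the newest table; fall back to the retained older one."""
        table = self.first_table if self.first_table is not None else self.second_table
        return table[primary_key_identity]

    def on_migration_complete(self):
        self.second_table = None  # the second route assignment table is deleted
```

Retaining the older table until migration completes is what lets a node still locate data that has not yet physically moved, as the surrounding steps describe.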
  • the first instruction may be a first statement instruction generated for the first transaction when the user initiates the operation on the first transaction. That is, after the user initiates the operation on the first transaction, the first node receives the first statement instruction (that is, the first instruction) for the first transaction, and determines based on the first instruction that the first transaction is desired to be executed, that is, the first data is desired to be called. In this case, after the first data and the second node are determined in the foregoing step, the first data is sent to the second node.
  • the first statement instruction that is, the first instruction
  • the first data is sent to the second node.
  • a first statement instruction generated for the first transaction when the user initiates the operation on the first transaction is sent to the second node, and the second node may determine the first data and the second node (that is, the current node) through steps similar to the foregoing.
  • the first data has not yet been migrated to the second node. Since the second node may retain the obtained second route assignment table when migration of the first data is not completed, the second node determines, based on the first primary key identity by using the mapping relationship between the first primary key identity and the second partition identity in the second route assignment table, that the first data is currently on the first node indicated by the second partition identity.
  • the second node is desired to generate a second statement instruction for the first transaction, and send the second statement instruction (that is, the first instruction) for the first transaction to the first node to enable the first node to determine the first data and the second node based on the first instruction.
  • the second node receives the first data sent by the first node, and completes the first transaction.
  • the second node deletes the second route assignment table.
  • FIG. 6 is a schematic flowchart of data migration according to an embodiment of this disclosure.
  • the second node receives the first statement instruction for the first transaction.
  • the first statement instruction for the first transaction carries a primary key identity capable of identifying data and the first transaction identity.
  • the second node determines whether the data identified by the primary key identity is data to be processed by the current node (that is, the second node). That is, in the foregoing manner, a partition identity is obtained by using the first route assignment table based on the primary key identity carried in the data.
  • If the partition identity indicates another node, it is determined that the data is not data to be processed by the second node, and step F3 is performed; or if the partition identity indicates the second node, it is determined that the data is data to be processed by the second node, and step F4 is performed.
  • In step F3, since the partition identity indicates another node, the second node transmits the data to that node.
  • In step F4, the second node is desired to further determine whether the data identified by the primary key identity is on the second node. If the data identified by the primary key identity is on the second node, step F5 is performed; or if the data identified by the primary key identity is not on the second node, step F6 is performed.
  • In step F5, the second node calls the data identified by the primary key identity to complete the first transaction.
  • In step F6, since the data identified by the primary key identity is not on the second node, the second node is desired to determine, based on the second route assignment table that is not updated, a node storing the data identified by the primary key identity, and send the second statement instruction for the first transaction to that node. Then, that node transmits the data identified by the primary key identity to the second node by using a method similar to the foregoing, such that the second node obtains the data identified by the primary key identity, and performs step F5. Finally, in step F7, the second node performs transaction committing on the first transaction. It is to be understood that the example in FIG. 6 is used only to understand this solution, and is not to be understood as a limitation on this disclosure.
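The decision flow of FIG. 6 (steps F2 to F7) can be condensed into a small routing function. The return values and names below are illustrative assumptions, not the disclosed implementation:

```python
def route_statement(primary_key_identity, route_table, current_node, local_store):
    """F2: does the route assignment table assign the data to the current node?
    F3: no  -> forward the data/statement to the owning node.
    F6: yes, but the data is not stored locally yet -> fetch it from the old owner.
    F5: yes, and stored locally -> call the data, then commit (F7)."""
    owning_node = route_table[primary_key_identity]
    if owning_node != current_node:
        return ("forward", owning_node)                     # step F3
    if primary_key_identity not in local_store:
        return ("fetch_then_commit", primary_key_identity)  # steps F6 -> F5 -> F7
    return ("commit", local_store[primary_key_identity])    # steps F5 -> F7
```

The key branch is the second one: ownership per the new table and physical presence of the data are checked separately, which is exactly what steps F4 to F6 distinguish.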
  • This embodiment of this disclosure provides another data migration method.
  • whether received data is data to be processed by a current node is further determined based on a mapping relationship. If the received data is data to be processed by the current node, data processing is performed. If the received data is not data to be processed by the current node, the data is migrated to a node corresponding to the data. In this way, processing of data of the same transaction type and migration between the multiple nodes are completed. Therefore, the data migration efficiency is further improved.
  • the load balancing system further includes a load balancing node.
  • when the load balancing node determines that load balancing is desired, or determines that the first node is a hotspot (that is, a load bearable by the first node is exceeded), or determines that the first node fails, the foregoing update step may be performed.
  • the following describes how the load balancing node determines that a route assignment table is desired to be updated and how to update.
  • the load balancing system further includes a load balancing node.
  • the data migration method further includes that: the load balancing node obtains a second route assignment table, the second route assignment table including a mapping relationship between the first primary key identity and a second partition identity, the second partition identity indicating the first node, and the first node being configured to process the first transaction; the load balancing node determines the first data and the second node in response to determining that a data migration condition is satisfied, the first data being data desired to be migrated to the second node; the load balancing node replaces the second partition identity in the second route assignment table with the first partition identity to obtain the first route assignment table, the first partition identity being used for uniquely identifying the second node; and the load balancing node transmits the first route assignment table to the first node and the second node.
  • the load balancing system further includes the load balancing node.
  • the load balancing node may obtain the second route assignment table through system initialization.
  • the second route assignment table includes the mapping relationship between the first primary key identity and the second partition identity.
  • the second partition identity is used for uniquely identifying the first node.
  • the second route assignment table is a route assignment table obtained through initialization.
  • a node is used for the first time and data is imported, or a node completes data migration, or the like.
  • a specific initialization scenario is not limited herein.
  • a node may generate corresponding log information during data processing.
  • the load balancing node receives log information transmitted by each node, and determines a result obtained by statistically analyzing the log information to specifically perform load balancing processing, hotspot data migration, data migration of a failing node, or the like by using the result.
  • log information is added to each component of a node, or corresponding field information is added to original log information, so that impact on a processing logic of an existing system and impact on performance of each node are reduced.
  • a node is desired to generate the following log information.
  • log information is generated in this solution by using the primary key recording format or using the primary key recording format and the secondary index recording format.
  • the load balancing node may receive the log information transmitted by each node, statistically analyze the log information to obtain a statistical result, and determine, based on the statistical result, whether the data migration condition is satisfied. If the data migration condition is not satisfied, the system is not desired to perform data migration between nodes. If the data migration condition is satisfied, the load balancing node determines data desired to be migrated and a node receiving the data desired to be migrated.
  • the load balancing node determines the first data, and the first data is data desired to be migrated to the second node. That is, the first data and the second node may be determined. Since the second route assignment table includes the mapping relationship between the first primary key identity and the second partition identity, the second partition identity indicates the first node, and the first data uniquely indicated by the first primary key identity is desired to be migrated to the second node, the mapping relationship in the second route assignment table is desired to be updated. Based on this, the load balancing node replaces the second partition identity in the second route assignment table with the first partition identity to obtain the first route assignment table. That is, the obtained first route assignment table includes the mapping relationship between the first primary key identity and the first partition identity.
  • the first partition identity is used for uniquely identifying the second node.
  • the load balancing node transmits the first route assignment table to the first node and the second node, such that after receiving the first data, the first node or the second node may migrate or perform other processing on the first data in the manner described in the foregoing embodiments by using the mapping relationship in the first route assignment table.
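The update performed by the load balancing node — replacing the second partition identity mapped to the first primary key identity with the first partition identity — amounts to a single-entry rewrite of the table, sketched below with hypothetical names:

```python
def build_first_route_table(second_table, first_primary_key, first_partition):
    """Copy the second route assignment table and remap the first primary key
    identity from the second partition identity to the first partition identity."""
    first_table = dict(second_table)  # the second table itself is left intact
    first_table[first_primary_key] = first_partition
    return first_table
```

The load balancing node would then transmit the resulting table to both the first node and the second node; leaving the second table intact matches the earlier steps in which a node retains it until migration completes.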
  • This embodiment of this disclosure provides a route assignment table update method.
  • load balancing may be desired, a hotspot may be eliminated, or there may be a failing node.
  • data migration is performed between nodes. Data desired to be migrated and a node receiving the data desired to be migrated are determined based on a transaction identity.
  • an updated route assignment table is sent to each node, to ensure that each node may perform data processing based on the updated route assignment table and ensure processing accuracy of the node. Therefore, stability and data processing efficiency of the system are improved.
  • the load balancing node may obtain log information about each transaction. After the load balancing node completes analyzing the log information, a cross-node transaction graph of 2PC transaction execution in the system may be constructed.
  • log information specifically used in this embodiment of this disclosure includes the starting transaction log information, the starting statement log information, the creation, read, update and deletion record log information, and the transaction committing log information or transaction rollback log information.
  • an example in which the load balancing system includes the node 1, the node 2, a node 3, a node 4, and the load balancing node and is applied to a cross-node data interaction scenario is used for description.
  • the node 1 is desired to deduct [300] from numerical information [1500] indicated by the primary key identity [100000001]
  • the node 3 is desired to add [600] to numerical information [300] indicated by the primary key identity [075500567].
  • the numerical information indicated by the primary key identity [100000001] on the node 1 changes to [1200]
  • the numerical information indicated by the primary key identity [075500567] on the node 3 changes to [600].
  • FIG. 7 is a schematic diagram of the cross-node transaction graph according to an embodiment of this disclosure.
  • G1 denotes the node 1
  • G2 denotes the node 2
  • G3 denotes the node 3
  • G4 denotes the node 4
  • G5 denotes a data node storing data indicated by the primary key identity [100000001] in the node G1
  • G6 denotes a data node storing data indicated by the primary key identity [075500567] in the node G3
  • G7 denotes a cross-node edge.
  • the load balancing node determines a node sending out a transaction execution statement as a starting point of the edge (that is, the node G5 in FIG. 7 ) in the cross-node transaction graph, and determines a node receiving the transaction execution statement as an ending point of the edge (that is, the node G6 in FIG. 7 ). If the data indicated by the primary key identity [100000001] is a data node in the node G1, and the data indicated by the primary key identity [075500567] is a data node in the node G3, a directed arrow between the two data nodes represents an execution sequence as the edge in the cross-node transaction graph. If the edge crosses two nodes, that is, the cross-node edge G7 shown in FIG. 7 , it indicates that the transaction indicated by the transaction identity [4567896128452] is a 2PC transaction.
  • the cross-node transaction graph shown in FIG. 7 may be constructed in the load balancing system in a manner similar to the foregoing by using log information generated by a computing node when executing multiple transactions.
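The construction of the graph's edges from log information can be sketched as follows: each logged statement execution yields a directed edge from the sending node to the receiving node, and any edge whose endpoints differ marks its transaction as a 2PC transaction. The flat record layout is a simplifying assumption about the log format:

```python
def build_cross_node_graph(log_records):
    """log_records: iterable of (transaction_identity, sending_node, receiving_node).
    Returns the directed edges and the set of transactions having a cross-node edge."""
    edges = []
    two_pc_transactions = set()
    for transaction_identity, sending_node, receiving_node in log_records:
        edges.append((sending_node, receiving_node))
        if sending_node != receiving_node:  # the edge crosses two nodes
            two_pc_transactions.add(transaction_identity)
    return edges, two_pc_transactions
```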
  • the following describes various cases in which the load balancing node determines, by using log information about each transaction, whether the data migration condition is satisfied.
  • the data migration condition is that a ratio of a 2PC transaction processing throughput to a total transaction processing throughput of a node is greater than a first preset threshold.
  • the operation of determining that a data migration condition is satisfied specifically includes: the load balancing node obtains a 2PC transaction identity in a case that log information transmitted by the first node is received in a first preset period, the log information transmitted by the first node including the 2PC transaction identity, and the 2PC transaction identity indicating that the log information is generated after the first node processes a 2PC transaction; the load balancing node statistically obtains a total transaction processing throughput of the first node based on the log information transmitted by the first node; the load balancing node statistically obtains a 2PC transaction processing throughput of the first node based on the 2PC transaction identity; and the load balancing node determines, in a case that a ratio of the 2PC transaction processing throughput of the first node to the total transaction processing throughput of the first node is greater than the first preset threshold, that the data migration condition is satisfied.
  • the data migration condition is that a ratio of a 2PC transaction processing throughput to a total transaction processing throughput of a node is greater than the first preset threshold.
  • when the load balancing node receives, in the first preset period, the log information transmitted by the first node, it can be seen from the foregoing embodiments that the log information transmitted by the first node may include the first transaction identity. If the first node processes the 2PC transaction, the log information transmitted by the first node further includes the 2PC transaction identity. The 2PC transaction identity indicates that the log information is generated after the first node processes the 2PC transaction.
  • the load balancing node statistically obtains the total transaction processing throughput of the first node based on the log information transmitted by the first node.
  • the total transaction processing throughput of the first node is a total quantity of transactions executed by the first node in the first preset period.
  • the load balancing node statistically obtains the 2PC transaction processing throughput of the first node based on the 2PC transaction identity.
  • the 2PC transaction processing throughput of the first node is a total quantity of 2PC transactions executed by the first node in the first preset period. Then, the ratio of the 2PC transaction processing throughput of the first node to the total transaction processing throughput of the first node is calculated. If the ratio is greater than the first preset threshold, the load balancing node determines that the data migration condition is satisfied.
  • the foregoing preset period may be, for example, 60 seconds or 5 minutes. Whether the preset period is desired to be adjusted is determined according to a running status of the system. Alternatively, the preset period may be adjusted on line as desired by the user. The preset period is mainly to make a compromise between a load balancing adjustment frequency and an adjustment delay acceptable by the user, so as to achieve a best load balancing effect. Obtaining the load balancing adjustment frequency also requires analysis and organization of a user load of each node.
  • the first preset threshold may be 5%, 10%, or the like.
  • the first preset threshold is 5%
  • load balancing adjustment is desired to be performed on the entire system by taking the first transaction type as a unit, to reduce the throughput of the data related to the first transaction type during execution within a range of the first preset threshold (5%). Stability of the load balancing adjustment may be improved based on the first preset threshold.
  • the load balancing node may determine, during load balancing by using the first preset threshold, a data migration degree at which data migration may be stopped. In this way, a load balancing effect is improved.
  • FIG. 7 is used as an example, and the first preset threshold is 10%.
  • a total transaction processing throughput of the node G1 is 100, and a 2PC transaction processing throughput of the node G1 is 25.
  • a ratio of the 2PC transaction processing throughput of the node G1 to the total transaction processing throughput of the node G1 is 25%, exceeding the first preset threshold, so that it is determined that load balancing is desired.
  • a total transaction processing throughput of the node G3 is 50, while a 2PC transaction processing throughput of the node G3 is 49.
  • a ratio of the 2PC transaction processing throughput of the node G3 to the total transaction processing throughput of the node G3 is 98%, also exceeding the first preset threshold, so that it is determined that load balancing is desired.
  • both the preset period and the first preset threshold that are described in the foregoing example are desired to be flexibly determined according to the running status of the system and an actual running requirement of the system, and specific numerical values are not to be understood as a limitation on this disclosure.
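The numeric check in the example above reduces to a single ratio comparison; a sketch using the example's figures (the function name and default threshold are illustrative):

```python
def data_migration_needed(total_throughput, two_pc_throughput, first_preset_threshold=0.10):
    """The data migration condition: the node's 2PC transaction processing
    throughput exceeds the first preset threshold share of its total
    transaction processing throughput (10% in the FIG. 7 example)."""
    return two_pc_throughput / total_throughput > first_preset_threshold
```

For the node G1, 25 / 100 = 25% exceeds 10%; for the node G3, 49 / 50 = 98% also exceeds 10%; both therefore trigger load balancing.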
  • This embodiment of this disclosure provides a method for determining that the data migration condition is satisfied.
  • a data throughput during execution of a same transaction is determined based on a transaction identity, and whether data migration is desired may be determined based on the first preset threshold, thereby improving feasibility of this solution.
  • the load balancing node may determine, during load balancing by using the first preset threshold, a data migration degree at which data migration may be stopped. In this way, the load balancing effect is improved. That is, fluctuation of the processing capability of each node due to continuous migration of data between the nodes is avoided, thereby improving the data processing efficiency of the node.
  • the data migration condition is that total memory usage of a node is greater than a second preset threshold.
  • the operation of determining that a data migration condition is satisfied specifically includes: receiving, in a second preset period, total memory usage of the first node transmitted by the first node, the total memory usage indicating a memory resource occupied by multiple transactions processed by the first node; and determining, by the load balancing node in a case that the total memory usage of the first node is greater than the second preset threshold, that the data migration condition is satisfied.
  • the data migration condition is that total memory usage of a node is greater than the second preset threshold, and is used for determining whether the node is a hotspot. Based on this, if the load balancing node can receive, in the second preset period, the total memory usage of the first node transmitted by the first node, the total memory usage indicating the memory resource occupied by the multiple transactions processed by the first node, and the total memory usage of the first node is greater than the second preset threshold, the load balancing node determines that the data migration condition is satisfied.
  • whether the node is a hotspot may be determined in another manner, for example, a ratio of a total quantity of transactions processed by the node to a total quantity of transactions processed by each computing node in the system, a memory resource utilization of the node, or management on another resource utilization in the node. Therefore, a manner provided in this embodiment, in which the node is determined as a hotspot in response to determining that a resource utilization of the node is greater than the second preset threshold, is not to be understood as the only implementation of determining a hotspot.
  • the second preset period may be, for example, 60 seconds or 5 minutes, and may have duration the same as or different from that of the first preset period.
  • a specific numerical value of the second preset period is not limited herein.
  • the second preset threshold may be 85%, 90%, or the like. In an example in which the second preset threshold is 85%, the load balancing system includes the node 1 and the node 2, total memory usage of the node 1 is 95%, and total memory usage of the node 2 is 60%.
  • the node 1 may be determined as a hotspot, and it is determined that the data migration condition is satisfied. Subsequent data migration desires data on the node 1 to be migrated to the node 2 or another non-hotspot node.
  • FIG. 8 is a schematic flowchart of determining that the data migration condition is satisfied according to an embodiment of this disclosure.
  • In step H1, the load balancing node receives, in the second preset period, the total memory usage of the first node transmitted by the first node.
  • In step H2, the load balancing node determines whether the total memory usage of the first node is greater than the second preset threshold. If the total memory usage of the first node is not greater than the second preset threshold, step H3 is performed; or if the total memory usage of the first node is greater than the second preset threshold, step H4 is performed.
  • In step H3, the load balancing node receives total memory usage of the first node transmitted by the first node in a next second preset period, and processes the total memory usage based on a method similar to step H1 and step H2.
  • In step H4, since the total memory usage of the first node is greater than the second preset threshold, that is, the first node is a hotspot, the load balancing node determines that the data migration condition is satisfied.
  • In step H5, the load balancing node determines, based on the manner described in the foregoing embodiments, data desired to be migrated and a node receiving the data desired to be migrated, and updates a mapping relationship in a route assignment table. Further, in step H6, the load balancing node sends an updated route assignment table to each node in the system, such that the first node migrates, based on the updated route assignment table, some data borne by the first node. After data migration is completed, the load balancing node may further continue to determine, by using the method in step H2, whether current total memory usage of the first node is greater than the second preset threshold, until the hotspot is eliminated. It is to be understood that the example in FIG. 8 is used only to understand this solution, and a specific process and implementation steps may be flexibly adjusted according to an actual situation.
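The hotspot determination of steps H1 to H4 can be sketched as below. The node names and the 85% threshold follow the earlier example; the function name and the dictionary layout are assumptions made for illustration.

```python
# Sketch of the hotspot check: in each second preset period, the total memory
# usage reported by each node is compared against the second preset threshold;
# a node above the threshold is a hotspot, so the data migration condition is
# satisfied for it.

SECOND_PRESET_THRESHOLD = 0.85  # e.g. 85%

def is_hotspot(total_memory_usage: float) -> bool:
    return total_memory_usage > SECOND_PRESET_THRESHOLD

usage_by_node = {"node1": 0.95, "node2": 0.60}
hotspots = [node for node, usage in usage_by_node.items() if is_hotspot(usage)]
print(hotspots)  # ['node1']: the data migration condition is satisfied for node 1
```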
  • This embodiment of this disclosure provides another method for determining that the data migration condition is satisfied.
  • the total memory usage of the first node is obtained, and whether data migration is desired may be determined based on the second preset threshold, thereby improving feasibility of this solution.
  • the load balancing system further includes a third node.
  • the log information transmitted by the first node further includes the first transaction identity and the first primary key identity.
  • the data migration method further includes that: the load balancing node receives log information transmitted by the second node and log information transmitted by the third node, the log information transmitted by the second node including the first transaction identity and the first primary key identity, and the log information transmitted by the third node including the first transaction identity and the first primary key identity.
  • That the load balancing node determines the first data and the second node specifically includes that: the load balancing node collects statistics on the log information transmitted by the first node, the log information transmitted by the second node, and the log information transmitted by the third node, to obtain that a quantity of times the first node initiates the first transaction to the second node is L and that a quantity of times the first node initiates the first transaction to the third node is M, L and M being integers greater than or equal to 1; and the load balancing node determines the second node in a case that L is greater than M, and determines the first data by using the second route assignment table based on the first primary key identity.
  • the load balancing system further includes the third node, and the log information transmitted by the first node further includes the first transaction identity and the first primary key identity.
  • the load balancing node receives the log information transmitted by the second node and the log information transmitted by the third node.
  • the log information transmitted by the second node includes the first transaction identity and the first primary key identity
  • the log information transmitted by the third node includes the first transaction identity and the first primary key identity.
  • the load balancing node collects statistics on the log information transmitted by the first node, the log information transmitted by the second node, and the log information transmitted by the third node, to obtain that the quantity of times the first node initiates the first transaction to the second node is L and that the quantity of times the first node initiates the first transaction to the third node is M, L and M being integers greater than or equal to 1. If L is greater than M, it indicates that the first node initiates the first transaction to the second node more times than to the third node. Therefore, the second node is determined as a node capable of receiving migrated data. Then, after data of the 2PC transaction is migrated to the second node, a quantity of interactions for the 2PC transaction between the first node and the second node may be reduced by L.
  • the quantity of times the first node initiates the first transaction to the second node is 100
  • the quantity of times the first node initiates the first transaction to the third node is 50. If all data related to the first transaction in the first node is migrated to the third node, 50 interactions for the data between the first node and the third node may be eliminated, but 100 interactions for the data between the first node and the second node may not be eliminated. Therefore, if all the data related to the first transaction in the first node is migrated to the second node, the 100 interactions for the data between the first node and the second node may be eliminated. Although the 50 interactions for the data between the first node and the third node may not be eliminated, compared with the foregoing migration manner, this manner has the advantage that 2PC transactions are maximally eliminated, thereby reducing a system load.
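The comparison of L and M above can be sketched as follows. The log-entry shape (transaction identity, target node) and the helper name are assumptions introduced for illustration, not part of the claimed method.

```python
# Sketch of choosing the node that receives migrated data: count how many times
# the first node initiated the first transaction toward each candidate node,
# then pick the candidate with the larger count (L > M), since migrating the
# transaction's data there eliminates the most 2PC interactions.

from collections import Counter

def choose_target(log_entries, txn_id):
    counts = Counter(node for tid, node in log_entries if tid == txn_id)
    # (node, interaction count) for the best candidate.
    return counts.most_common(1)[0]

# L = 100 interactions with node 2, M = 50 interactions with node 3.
log = [("T1", "node2")] * 100 + [("T1", "node3")] * 50
target, eliminated = choose_target(log, "T1")
print(target, eliminated)  # node2 100
```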
  • this embodiment may also provide another method for determining to-be-migrated data.
  • the load balancing node finds all edges crossing computing nodes in FIG. 7, calculates a difference between an amount of all data processed by a data node and an amount of cross-node data processed by the data node, and determines data requiring cross-node processing in a data node corresponding to a maximum difference as the first data. For example, for the data node G5 storing the data indicated by the primary key identity [100000001] in the node G1 in FIG. 7, an amount of all data processed by the data node G5 is 100, and an amount of cross-node data processed by the data node G5 is 25.
  • a difference is 75 (100-25).
  • an amount of all data processed by the data node G6 is 50, and an amount of cross-node data processed by the data node G6 is 49.
  • a difference is 1 (50-49).
  • The foregoing calculation may be performed for each data node in the system that processes cross-node data, and the node corresponding to the maximum difference desires to migrate the data in this node to another node. The corresponding data in the node is determined based on a transaction identity.
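The difference calculation above can be sketched as follows. The dictionary of per-node statistics mirrors the figures given for the data nodes G5 and G6; the variable names are assumptions for illustration.

```python
# Sketch of the alternative selection: for every data node that processes
# cross-node data, compute (amount of all data processed) minus (amount of
# cross-node data processed); the node with the maximum difference is chosen,
# and its cross-node data becomes the to-be-migrated data.

stats = {
    "G5": {"total": 100, "cross_node": 25},  # difference 75, as in the example
    "G6": {"total": 50, "cross_node": 49},   # difference 1
}

differences = {name: s["total"] - s["cross_node"] for name, s in stats.items()}
chosen = max(differences, key=differences.get)
print(differences)  # {'G5': 75, 'G6': 1}
print(chosen)       # G5
```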
  • FIG. 9 is a schematic flowchart of determining to-be-migrated data according to an embodiment of this disclosure.
  • In step I1, log information transmitted by each node in a preset period is received in the manner described in the foregoing embodiments to determine that there is a node desiring to perform data migration, that is, to determine that the data migration condition is satisfied.
  • In step I2, to-be-migrated data desired to be migrated is determined by using the method in this embodiment.
  • In step I3, multiple nodes capable of processing the to-be-migrated data are determined, and a node receiving the data desired to be migrated is determined by using the method in this embodiment.
  • In step I4, a mapping relationship in a route assignment table is updated.
  • In step I5, a route assignment table including an updated mapping relationship is transmitted to all nodes in the load balancing system, such that each node performs data migration based on the route assignment table including the updated mapping relationship.
  • a method for data migration between the nodes is similar to that described in the foregoing embodiments, and will not be elaborated herein.
  • In step I6, the load balancing node desires to redetermine whether the data migration condition is satisfied. If the data migration condition is satisfied, steps similar to step I1 to step I4 are performed; or if the data migration condition is not satisfied, step I7 is performed.
  • In step I7, the load balancing node receives log information transmitted by each node in a next preset period, and processes the log information in a manner similar to that in the foregoing embodiments. It is to be understood that the example in FIG. 9 is used only to understand this solution, and a specific process and implementation steps may be flexibly adjusted according to an actual situation.
  • This embodiment of this disclosure provides another data migration method.
  • quantities of 2PC transactions processed by different nodes are determined based on quantities of times the nodes process data in the same transaction, a node that processes a larger amount of data of 2PC transactions is determined as a node desiring to migrate data, and data on this node desired to be migrated is determined based on a transaction identity.
  • 2PC transactions are maximally eliminated, thereby improving the load balancing effect, that is, improving reliability of data migration in this solution.
  • the data migration condition is that a node fails.
  • the load balancing system further includes a third node.
  • the operation of determining that a data migration condition is satisfied specifically includes: the load balancing node determines, in a case that the first node does not transmit log information of the first node to the load balancing node in a first preset period, that the data migration condition is satisfied.
  • the data migration method further includes that: the load balancing node receives, in the first preset period, log information transmitted by the second node and log information transmitted by the third node, the log information transmitted by the second node including the first transaction identity and the first primary key identity, and the log information transmitted by the third node including the first transaction identity and the first primary key identity.
  • That the load balancing node determines the first data and the second node specifically includes that: the load balancing node obtains a partition identity set corresponding to the first node, the partition identity set corresponding to the first node including the first partition identity; the load balancing node obtains, based on the partition identity set corresponding to the first node, a primary key identity set corresponding to the first node, the primary key identity set corresponding to the first node including the first primary key identity; and the load balancing node determines the first data and the second node based on the first primary key identity, the first transaction identity, the log information transmitted by the second node, and the log information transmitted by the third node.
  • the data migration condition is that a node fails. Based on this, if the load balancing node does not receive, in the first preset period, the log information transmitted by the first node, that is, the first node may have failed and cannot generate the corresponding log information, it is determined that the data migration condition is satisfied. Second, if the load balancing node can receive, in the first preset period, the log information transmitted by the second node and the log information transmitted by the third node, it indicates that the second node and the third node are both nodes that operate normally. Specifically, the log information transmitted by the second node includes the first transaction identity and the first primary key identity, and the log information transmitted by the third node includes the first transaction identity and the first primary key identity.
  • the load balancing node obtains the partition identity set corresponding to the first node.
  • the partition identity set corresponding to the first node includes the first partition identity.
  • the primary key identity set corresponding to the first node is obtained based on the partition identity set corresponding to the first node.
  • the primary key identity set corresponding to the first node includes the first primary key identity.
  • the first data and the second node may be determined based on the first primary key identity, the first transaction identity, the log information transmitted by the second node, and the log information transmitted by the third node.
  • the load balancing node is prevented from considering the failing node as a node capable of bearing data, thereby improving reliability of this solution. The following specifically describes how to determine the first data and the second node based on the foregoing information.
  • That the load balancing node determines the first data and the second node based on the first primary key identity, the first transaction identity, the log information transmitted by the second node, and the log information transmitted by the third node specifically includes that: the load balancing node determines, by using the second route assignment table, the first data based on the first primary key identity; the load balancing node collects statistics on the log information transmitted by the second node and the log information transmitted by the third node, to obtain that a quantity of times the second node initiates the first transaction is Q and that a quantity of times the third node initiates the first transaction is P, Q and P being integers greater than or equal to 1; and the load balancing node determines the second node in a case that Q is greater than P, and determines the first data by using the second route assignment table based on the first primary key identity.
  • the load balancing node determines, by using the second route assignment table, the first data based on the first primary key identity. Second, the load balancing node collects, in a manner similar to that in the foregoing embodiments, statistics on the log information transmitted by the second node and the log information transmitted by the third node, to obtain that the quantity of times the second node initiates the first transaction is Q and that the quantity of times the third node initiates the first transaction is P, Q and P being integers greater than or equal to 1. When Q is greater than P, it indicates that a quantity of data interactions performed by the second node is larger. Therefore, the load balancing node determines the second node as a node capable of bearing the first data.
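The Q-versus-P choice above can be sketched as follows. The dictionary of initiation counts, its values, and the helper name are assumptions introduced for illustration only.

```python
# Sketch of the failure-handling choice: when the first node fails, the first
# data is assigned to the surviving node that initiated the first transaction
# more often (Q > P), so that the most 2PC interactions are avoided after the
# migration.

def pick_receiver(initiation_counts: dict) -> str:
    """Return the surviving node with the most initiations of the transaction."""
    return max(initiation_counts, key=initiation_counts.get)

counts = {"node2": 30, "node3": 12}  # Q = 30 for node 2, P = 12 for node 3
print(pick_receiver(counts))  # node2
```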
  • the load balancing node may add a node to migrate the data on the failing first node to the new node, and then update the route assignment table, such that the corresponding data may be processed by the new node.
  • This manner is similar to the manner of initializing the route assignment table, and updating a related mapping of the partition identity is also similar to that described in the foregoing embodiments. Therefore, this is not limited herein.
  • This embodiment of this disclosure provides another data migration method.
  • all data on the failing node is desired to be migrated.
  • a node to which each piece of data may be migrated is determined based on a mapping relationship between a primary key identity of each piece of data and a transaction identity, and a node that processes fewer 2PC transactions is determined, based on quantities of times different nodes process data in a transaction, as a node capable of receiving the data.
  • a hotspot or a system load imbalance is avoided. Therefore, the load balancing effect is improved.
  • That the load balancing node obtains a second route assignment table specifically includes that: the load balancing node obtains a first primary key identity set, the first primary key identity set including multiple primary key identities, and one primary key identity being used for uniquely identifying the first data; the load balancing node divides the first primary key identity set into S second primary key identity sets, each primary key identity in the second primary key identity set being managed by a same node, and S being an integer greater than or equal to 1; the load balancing node assigns the S second primary key identity sets to S nodes; the load balancing node determines, from the S nodes, the first node managing a third primary key identity set, the third primary key identity set including the first primary key identity; the load balancing node obtains a partition identity set corresponding to the first node; the load balancing node determines, from the partition identity set, the second partition identity corresponding to the first data; and the load balancing node establishes the mapping relationship between the first primary key identity and the second partition identity, to generate the second route assignment table.
  • User-specified partitioning is desired to specify a partition identity for each user table, that is, to determine a partition identity for each column of data in a user table of a user.
  • the user notifies the load balancing node of a transaction assignment logic in advance, such that the load balancing node may perform load balancing most efficiently.
  • the user may specify partitioning to be performed according to a province to which the numerical information account belongs.
  • data of numerical information accounts in the same province is all managed by a same node. It is generally considered that most transactions related to numerical information may be directly executed in numerical information accounts in the same province. In this way, taking the province as a partition identity may implement load balancing processing well.
  • Load-balancing-based partitioning is desired to be based on a primary key identity corresponding to each piece of data, and is a partitioning mode in which a ratio of a throughput of the 2PC transaction in the entire system is specified based on a running effect of load balancing and adjustment is performed according to the ratio. Since transaction processing is generally recorded based on the primary key identity corresponding to each piece of data, the node in which data is located, or the transaction corresponding to the data, may be determined by using a mapping relationship during running in a transaction dimension.
  • a user-specified parameter "USER_SPECIFIED_PARITITON_KEYS" is used for indicating that the user specifies a mapping relationship between a primary key identity and a partition identity
  • a load balancing parameter "USER_SPECIFIED_ALGORITHM" is used for instructing partitioning to be performed based on load balancing. Therefore, when the load balancing system is started, a program may detect whether there is the user-specified parameter "USER_SPECIFIED_PARITITON_KEYS".
  • partitioning is performed in the user-specified partitioning mode, thereby establishing the mapping relationship between the primary key identity and the partition identity, and generating the second route assignment table, to complete initialization.
  • the program may also detect whether there is the load balancing parameter "USER_SPECIFIED_ALGORITHM". If there is the load balancing parameter "USER_SPECIFIED_ALGORITHM", partitioning is performed in the load-balancing-based partitioning mode. The following describes in detail how to perform partitioning based on load balancing.
  • the load balancing node obtains the first primary key identity set.
  • the first primary key identity set includes the multiple primary key identities.
  • One primary key identity is used for uniquely identifying the first data.
  • Initialization may be performed in a case that a node is used for the first time and data is imported, a node completes data migration, or the like.
  • the load balancing system has started importing the data, and each piece of data corresponds to a primary key identity uniquely identifying the data. Therefore, primary key identities corresponding to all data may be obtained, and the first primary key identity set including the multiple primary key identities is further obtained.
  • the load balancing node is desired to determine how many nodes are available in the load balancing system, and then equally divides the first primary key identity set into a corresponding quantity of second primary key identity sets. Based on this, the load balancing node divides the first primary key identity set into the S second primary key identity sets. Each primary key identity in the second primary key identity set is managed by a same node. S is an integer greater than or equal to 1. Then, the load balancing node assigns the S second primary key identity sets to the S nodes, such that each node may manage the data identified by multiple primary key identities in the second primary key identity set.
  • a second primary key identity set including the first primary key identity may be first determined from the S second primary key identity sets.
  • the second primary key identity set including the first primary key identity is determined as the third primary key identity set. Since the S second primary key identity sets are assigned to the S nodes, the first node managing the third primary key identity set may further be determined. Further, the load balancing node obtains a partition identity set corresponding to the first node, and determines the second partition identity corresponding to the first data from partition identities which are in the partition identity set corresponding to the first node and for which mapping relationships are not established.
  • mapping relationships cannot be established between other data and the second partition identity, unless the first data no longer corresponds to the second partition identity. Then, the load balancing node may establish the mapping relationship between the first primary key identity and the second partition identity. It is to be understood that, for another data identity, a mapping relationship between a primary key identity and a partition identity may also be established in a similar manner, thereby generating the second route assignment table.
  • FIG. 10 is a schematic flowchart of obtaining the second route assignment table according to an embodiment of this disclosure.
  • In step J1, load balancing is started.
  • the node is used for the first time and the data is imported, or the node completes data migration. This is not limited herein.
  • In step J2, a corresponding parameter is loaded for initialization, that is, input of the parameter corresponding to the load balancing node is completed, and an initialization phase is entered.
  • Input of the parameter may include inputting the load balancing parameter "USER_SPECIFIED_ALGORITHM" or the user-specified parameter "USER_SPECIFIED_PARITITON_KEYS". It is to be understood that the input parameter in this embodiment necessarily includes "USER_SPECIFIED_ALGORITHM", but does not necessarily include "USER_SPECIFIED_PARITITON_KEYS".
  • In step J3, whether the user specifies the partition identity is determined, that is, whether the user-specified parameter "USER_SPECIFIED_PARITITON_KEYS" can be detected is determined. If the user-specified parameter "USER_SPECIFIED_PARITITON_KEYS" is detected, it is determined that the user specifies the partition identity, and step J4 is performed; otherwise, step J6 is performed.
  • In step J4, a primary key identity and a partition identity of each user table are collected, and consistent hash division is performed on the partition identities according to node values corresponding to the nodes to obtain a node specifically indicated by each partition identity. After identity information of all user tables is collected, mapping relationships are established based on the primary key identity and the partition identity of each user table. Therefore, in step J5, the second route assignment table is generated.
  • In step J6, load-balancing-based partitioning is started. Specifically, a related data structure is first initialized: partition information is initialized for each user table, and statistical primary key information is collected for each user table. Therefore, in step J7, the first primary key identity set is obtained. For ease of understanding this solution, the following uses establishment of the mapping relationship between the first primary key identity and the second partition identity as an example for description. Based on this, in step J8, the first primary key identity set is divided into the S second primary key identity sets, and the S second primary key identity sets are assigned to the S nodes.
  • In step J9, the first node managing the third primary key identity set is determined from the S nodes, and the partition identity set corresponding to the first node is obtained. Based on this, in step J10, the second partition identity corresponding to the first data is determined from the partition identity set. Processing of steps J8 to J10 is performed on each row of data in each user table, and then the mapping relationships between the primary key identities and the partition identities are established. In this way, the second route assignment table may be generated in step J5.
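Steps J7 to J10 can be sketched as below. The equal division by round-robin slicing and the partition-identity naming scheme are assumptions made for illustration; the embodiment itself leaves the concrete division and naming open.

```python
# Sketch of load-balancing-based initialization: divide the first primary key
# identity set into S second primary key identity sets, assign one set per
# node, and map each primary key identity to an unused partition identity of
# the node managing it, yielding a route assignment table.

def build_route_assignment_table(primary_keys, nodes):
    s = len(nodes)
    # Equal division into S second primary key identity sets (round-robin).
    buckets = {node: primary_keys[i::s] for i, node in enumerate(nodes)}
    table = {}
    for node, keys in buckets.items():
        for i, pk in enumerate(keys):
            # One partition identity of the managing node per primary key.
            table[pk] = f"{node}-partition-{i}"
    return table

table = build_route_assignment_table(
    [100000001, 100000002, 100000003, 100000004], ["node1", "node2"])
print(table[100000001])  # node1-partition-0
```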
  • This embodiment of this disclosure provides a method for obtaining the route assignment table through initialization.
  • the route assignment table is obtained in different initialization manners. Therefore, feasibility and flexibility of this solution are improved.
  • That the load balancing node obtains a first route assignment table specifically includes that: the load balancing node obtains a third route assignment table; the load balancing node receives a data addition instruction, the data addition instruction carrying the mapping relationship between the first primary key identity and the second partition identity; the load balancing node obtains the mapping relationship between the first primary key identity and the second partition identity according to the data addition instruction; and the load balancing node adds the mapping relationship between the first primary key identity and the second partition identity to the third route assignment table to obtain the first route assignment table.
  • the load balancing node obtains the third route assignment table.
  • the third route assignment table may be a route assignment table obtained through initialization, or a route assignment table obtained by updating the mapping relationship. This is not specifically limited herein.
  • the load balancing node receives the data addition instruction.
  • a user table is desired to be created for the new user account.
  • the user table specifically includes multiple primary key identities corresponding to the related data of the new transaction, and the nodes to which the user specifies the related data of the new transaction to belong in the user table (that is, partition identities are obtained).
  • the data addition instruction carries the mapping relationship between the first primary key identity and the second partition identity.
  • the load balancing node may obtain the mapping relationship between the first primary key identity and the second partition identity according to the data addition instruction, and add the mapping relationship between the first primary key identity and the second partition identity to the third route assignment table to obtain the first route assignment table. That is, the related data of the new transaction is added by using mapping relationships between the primary key identities and the partition identities that are specified by the user.
  • the load balancing node may assign the related data of the new transaction for uniform addition and distribution to different nodes. Then, the existing third route assignment table is initialized in a manner similar to the foregoing. In this way, the first route assignment table may be obtained.
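The addition path above can be sketched as follows. Representing the route assignment table as a dictionary from primary key identity to partition identity is an assumption for illustration, as are the function name and sample values.

```python
# Sketch of the data addition instruction: the instruction carries the mapping
# between the first primary key identity and the second partition identity, and
# the load balancing node inserts it into the existing (third) route assignment
# table to obtain the first route assignment table.

def apply_data_addition(route_table: dict, primary_key, partition_id) -> dict:
    updated = dict(route_table)          # keep the third route assignment table intact
    updated[primary_key] = partition_id  # add the new mapping relationship
    return updated

third_table = {200000001: "p1"}
first_table = apply_data_addition(third_table, 100000001, "p7")
print(first_table[100000001])  # p7
```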
  • That the load balancing node obtains a second route assignment table specifically includes that: the load balancing node obtains a fourth route assignment table, the fourth route assignment table including a mapping relationship between a second primary key identity and the first partition identity; the load balancing node receives a data deletion instruction, the data deletion instruction carrying the second primary key identity; the load balancing node obtains the second primary key identity according to the data deletion instruction; the load balancing node determines, from the fourth route assignment table, the mapping relationship between the second primary key identity and the first partition identity based on the second primary key identity; and the load balancing node deletes the mapping relationship between the second primary key identity and the first partition identity in the fourth route assignment table to obtain the second route assignment table.
  • the load balancing node obtains the fourth route assignment table.
  • the fourth route assignment table includes the mapping relationship between the second primary key identity and the first partition identity.
  • the fourth route assignment table may be a route assignment table obtained through initialization, or a route assignment table obtained by updating the mapping relationship. This is not specifically limited herein.
  • the load balancing node may obtain the second primary key identity according to the data deletion instruction, determine, from the fourth route assignment table, the mapping relationship between the second primary key identity and the first partition identity based on the second primary key identity, and delete the mapping relationship between the second primary key identity and the first partition identity in the fourth route assignment table to obtain the second route assignment table. Then, the second route assignment table no longer includes the foregoing mapping relationship.
  • mapping relationships corresponding to the multiple pieces of data in the transaction are deleted from the route assignment table in a manner similar to the foregoing.
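The deletion flow, including the case of a transaction carrying several primary key identities, can be sketched the same way; again, the dictionary representation and the identity strings are assumptions made for illustration:

```python
# Sketch of the deletion flow: the load balancing node removes the
# mapping for every primary key identity carried by the instruction.

def apply_deletion(route_table, deletion_instruction):
    """Delete the mappings for the primary key identities carried by
    the data deletion instruction (a transaction may carry several)."""
    updated = dict(route_table)  # fourth route assignment table unchanged
    for primary_key in deletion_instruction:
        updated.pop(primary_key, None)
    return updated

fourth_table = {"pk_1": "partition_A", "pk_2": "partition_A"}
second_table = apply_deletion(fourth_table, ["pk_1"])
```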
  • a specific node in the load balancing system may update the route assignment table by using its own computing capability, to complete addition or deletion of a mapping relationship.
  • the node further needs to transmit an updated route assignment table to another node to ensure consistency of the route assignment table in the load balancing system.
  • This embodiment of this disclosure provides another method for obtaining the second route assignment table.
  • the mapping relationship in the existing route assignment table may be updated by using the data addition instruction or the data deletion instruction.
  • an obtained route assignment table can accurately reflect a mapping relationship between a primary key identity and a partition identity of each piece of data, thereby ensuring data processing accuracy of each node.
  • FIG. 11 is a schematic diagram of a structure of the data migration apparatus according to an embodiment of this disclosure.
  • the data migration apparatus 1100 includes: an obtaining module 1101, configured to obtain a first route assignment table, the first route assignment table including a mapping relationship between a first primary key identity and a first partition identity, the first primary key identity being used for uniquely identifying first data, and the first partition identity indicating a second node; a receiving module 1102, configured to receive a first instruction, the first instruction carrying the first primary key identity and a first transaction identity, the first transaction identity indicating a first transaction, and a first node being configured to process the first transaction; the obtaining module 1101 being further configured to obtain, by using the first route assignment table, the first partition identity based on the first primary key identity carried in the first instruction; a processing module 1103, configured to determine the second node based on the first partition identity, the second node being configured to process the first transaction; and a transmission module 1104, configured to transmit the first data uniquely identified by the first primary key identity to the second node.
  • the first route assignment table further includes a mapping relationship between a first index identity and the first partition identity.
  • the first instruction further includes the first index identity.
  • the obtaining module 1101 is specifically configured to: determine, by using the first route assignment table, N partition identities based on the first index identity carried in the first instruction, the N partition identities including the first partition identity, and N being an integer greater than or equal to 1; and determine, by using the first route assignment table, the first partition identity from the N partition identities based on the first primary key identity carried in the first instruction.
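The two-step lookup performed by the obtaining module 1101 can be sketched as follows; the split into two dictionaries and the identity strings are illustrative assumptions, not the disclosure's actual table structure:

```python
# Step 1: the first index identity yields N candidate partition identities.
# Step 2: the first primary key identity selects the first partition
# identity from among those candidates.

index_to_partitions = {"index_1": ["partition_A", "partition_B"]}  # N candidates
key_to_partition = {"pk_1": "partition_A"}

def lookup_partition(index_identity, primary_key_identity):
    candidates = index_to_partitions[index_identity]    # N partition identities
    partition = key_to_partition[primary_key_identity]  # narrow by primary key
    return partition if partition in candidates else None

first_partition = lookup_partition("index_1", "pk_1")
```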
  • the data migration apparatus 1100 further includes a deletion module 1105.
  • the obtaining module 1101 is further configured to obtain a second route assignment table at a first time point, the second route assignment table including a mapping relationship between the first primary key identity and a second partition identity, and the second partition identity indicating the first node.
  • the obtaining module 1101 is specifically configured to obtain the first route assignment table at a second time point, the second time point being later than the first time point.
  • the deletion module 1105 is configured to delete the second route assignment table after the transmission module transmits the first data uniquely identified by the first primary key identity to the second node.
  • the data migration apparatus 1100 further includes a determining module 1106 and a replacement module 1107.
  • the obtaining module 1101 is further configured to obtain a second route assignment table, the second route assignment table including a mapping relationship between the first primary key identity and a second partition identity, the second partition identity indicating the first node, and the first node being configured to process the first transaction.
  • the determining module 1106 is configured to determine the first data and the second node in response to determining that a data migration condition is satisfied, the first data being data desired to be migrated to the second node.
  • the replacement module 1107 is configured to replace the second partition identity in the second route assignment table with the first partition identity to obtain the first route assignment table, the first partition identity being used for uniquely identifying the second node.
  • the transmission module 1104 is further configured to transmit the first route assignment table to the first node and the second node.
  • the data migration condition is that a ratio of a 2PC transaction processing throughput to a total transaction processing throughput of a node is greater than the first preset threshold.
  • the determining module 1106 is specifically configured to: obtain a 2PC transaction identity in a case that log information transmitted by the first node is received in a first preset period, the log information transmitted by the first node including the 2PC transaction identity, and the 2PC transaction identity indicating that the log information is generated after the first node processes a 2PC transaction; statistically obtain a total transaction processing throughput of the first node based on the log information transmitted by the first node; statistically obtain a 2PC transaction processing throughput of the first node based on the 2PC transaction identity; and determine, in a case that a ratio of the 2PC transaction processing throughput of the first node to the total transaction processing throughput of the first node is greater than the first preset threshold, that the data migration condition is satisfied.
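The throughput-ratio check above might be sketched as follows; the log entry format and the threshold value are assumptions made for the example:

```python
# Count the total transactions and the 2PC-tagged transactions in the
# log information received in the first preset period, then compare the
# ratio with the first preset threshold.

FIRST_PRESET_THRESHOLD = 0.5  # assumed value for illustration

def migration_condition_met(log_entries, threshold=FIRST_PRESET_THRESHOLD):
    total = len(log_entries)  # total transaction processing throughput
    two_pc = sum(1 for entry in log_entries if entry.get("2pc"))
    return total > 0 and two_pc / total > threshold

logs = [
    {"txn": 1, "2pc": True},
    {"txn": 2, "2pc": True},
    {"txn": 3, "2pc": False},
]
satisfied = migration_condition_met(logs)  # ratio 2/3 exceeds 0.5
```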
  • the data migration condition is that total memory usage of a node is greater than a second preset threshold.
  • the determining module 1106 is specifically configured to: receive, in a second preset period, total memory usage of the first node transmitted by the first node, the total memory usage indicating a memory resource occupied by multiple transactions processed by the first node; and determine, in a case that the total memory usage of the first node is greater than the second preset threshold, that the data migration condition is satisfied.
  • the log information transmitted by the first node further includes the first transaction identity and the first primary key identity.
  • the receiving module 1102 is further configured to receive log information transmitted by the second node and log information transmitted by a third node, the log information transmitted by the second node including the first transaction identity and the first primary key identity, and the log information transmitted by the third node including the first transaction identity and the first primary key identity.
  • the determining module 1106 is specifically configured to: collect statistics on the log information transmitted by the first node, the log information transmitted by the second node, and the log information transmitted by the third node, to obtain that a quantity of times the first node initiates the first transaction to the second node is L and that a quantity of times the first node initiates the first transaction to the third node is M, L and M being integers greater than or equal to 1; and determine the second node in a case that L is greater than M, and determine the first data by using the second route assignment table based on the first primary key identity.
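The comparison of L and M above amounts to migrating toward the peer the first node talks to most often; a minimal sketch, with node names and counts invented for the example:

```python
# Count how many times the first node initiates the first transaction
# toward each peer, then pick the peer with the larger count (L > M).

def choose_target(initiation_log, first_transaction):
    """initiation_log: (transaction identity, peer node) pairs collected
    from the log information of the involved nodes."""
    counts = {}
    for txn, peer in initiation_log:
        if txn == first_transaction:
            counts[peer] = counts.get(peer, 0) + 1
    return max(counts, key=counts.get)

log = [("txn_1", "node_2"), ("txn_1", "node_2"), ("txn_1", "node_3")]
target = choose_target(log, "txn_1")  # L = 2 for node_2, M = 1 for node_3
```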
  • the data migration condition is that a node fails.
  • the determining module 1106 is specifically configured to determine, in a case that the first node does not transmit log information of the first node to a load balancing node in a first preset period, that the data migration condition is satisfied.
  • the receiving module 1102 is further configured to receive, in the first preset period, log information transmitted by the second node and log information transmitted by a third node, the log information transmitted by the second node including the first transaction identity and the first primary key identity, and the log information transmitted by the third node including the first transaction identity and the first primary key identity.
  • the determining module 1106 is specifically configured to: obtain a partition identity set corresponding to the first node, the partition identity set corresponding to the first node including the first partition identity; obtain, based on the partition identity set corresponding to the first node, a primary key identity set corresponding to the first node, the primary key identity set corresponding to the first node including the first primary key identity; and determine the first data and the second node based on the first primary key identity, the first transaction identity, the log information transmitted by the second node, and the log information transmitted by the third node.
  • the determining module 1106 is specifically configured to: determine, by using the second route assignment table, the first data based on the first primary key identity; collect statistics on the log information transmitted by the second node and the log information transmitted by the third node, to obtain that a quantity of times the second node initiates the first transaction is Q and that a quantity of times the third node initiates the first transaction is P, Q and P being integers greater than or equal to 1; and determine the second node in a case that Q is greater than P, and determine the first data by using the second route assignment table based on the first primary key identity.
  • the obtaining module 1101 is specifically configured to: obtain a first primary key identity set, the first primary key identity set including multiple primary key identities, and one primary key identity being used for uniquely identifying the first data; divide the first primary key identity set into S second primary key identity sets, each primary key identity in the second primary key identity set being managed by a same node, and S being an integer greater than or equal to 1; assign the S second primary key identity sets to S nodes; determine, from the S nodes, the first node managing a third primary key identity set, the third primary key identity set including the first primary key identity; obtain a partition identity set corresponding to the first node; determine, from the partition identity set, the second partition identity corresponding to the first data; and establish the mapping relationship between the first primary key identity and the second partition identity, and generate the first route assignment table.
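The initialization above, dividing the first primary key identity set into S subsets and assigning them to S nodes, can be sketched as follows. The round-robin split policy and the identity formats are assumptions; the disclosure does not prescribe how the S subsets are formed:

```python
# Divide a primary key identity set among S nodes and record, for each
# key, the partition identity of the node that manages it.

def build_route_table(primary_keys, s_nodes):
    table = {}
    for i, pk in enumerate(sorted(primary_keys)):
        node = s_nodes[i % len(s_nodes)]  # forms S second primary key identity sets
        table[pk] = f"partition_{node}"   # partition identity of that node
    return table

keys = {"pk_1", "pk_2", "pk_3", "pk_4"}
first_table = build_route_table(keys, ["node_1", "node_2"])
```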
  • the obtaining module 1101 is specifically configured to: obtain a third route assignment table; receive a data addition instruction, the data addition instruction carrying the mapping relationship between the first primary key identity and the second partition identity; obtain the mapping relationship between the first primary key identity and the second partition identity according to the data addition instruction; and add the mapping relationship between the first primary key identity and the second partition identity to the third route assignment table to obtain the first route assignment table.
  • the obtaining module 1101 is specifically configured to: obtain a fourth route assignment table, the fourth route assignment table including a mapping relationship between a second primary key identity and the first partition identity; receive a data deletion instruction, the data deletion instruction carrying the second primary key identity; obtain the second primary key identity according to the data deletion instruction; determine, from the fourth route assignment table, the mapping relationship between the second primary key identity and the first partition identity based on the second primary key identity; and delete the mapping relationship between the second primary key identity and the first partition identity in the fourth route assignment table to obtain the second route assignment table.
  • FIG. 12 is a schematic diagram of an embodiment of a server according to an embodiment of this disclosure.
  • the server 1000 may vary considerably with configuration or performance. It may include one or more central processing units (CPUs) 1022 (for example, one or more processors), memories 1032, and one or more storage media 1030 (for example, one or more mass storage devices) that store application programs 1042 or data 1044.
  • the memory 1032 and the storage medium 1030 may provide temporary storage or persistent storage.
  • the program stored in the storage medium 1030 may include one or more modules (not shown in the figure), each of which may include a series of instruction operations in the server. Furthermore, the CPU 1022 may be configured to communicate with the storage medium 1030 to execute, on the server 1000, the series of instruction operations in the storage medium 1030.
  • the server 1000 may further include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041, for example, Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
  • the steps performed by the server in the foregoing embodiments may be based on the structure of the server shown in FIG. 12 .
  • the CPU 1022 of the server is configured to execute the embodiment shown in FIG. 4 and each embodiment corresponding to FIG. 4 .
  • An embodiment of this disclosure also provides a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program which, when run in a computer, enables the computer to perform the steps performed by a node of the server or the load balancing node in the method described in the embodiment shown in FIG. 4 and the method described in each embodiment corresponding to FIG. 4 .
  • An embodiment of this disclosure also provides a computer program product including a program.
  • the computer program product, when run in a computer, enables the computer to perform the steps performed by a node of the server or the load balancing node in the method described in the embodiment shown in FIG. 4.
  • the disclosed system, apparatus, and method may be implemented in another manner.
  • the apparatus embodiment described above is merely schematic.
  • division of the units is merely logical function division, and other division manners may be used in actual implementations.
  • multiple units or components may be combined or integrated into another system, or some characteristics may be neglected or not executed.
  • coupling or direct coupling or communication connection between displayed or discussed components may be indirect coupling or communication connection between the apparatuses or units through some interfaces, and may be in electrical, mechanical, or other forms.
  • the units described as separate parts may or may not be physically separated. Parts displayed as units may or may not be physical units, that is, may be located in the same place or distributed to multiple network units. Some or all of the units may be selected as actually required to achieve an objective of the solution of this embodiment.
  • each function unit in each embodiment of this disclosure may be integrated into a processing unit.
  • each unit may physically exist independently.
  • two or more than two units may be integrated into a unit.
  • the integrated unit may be implemented in a hardware form or in a form of a software function unit.
  • When implemented in the form of the software function unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium, including multiple instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps in the method as described in each embodiment of this disclosure.
  • the foregoing storage medium includes various media capable of storing program code, for example, a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Abstract

Disclosed in the present application are a data migration method and apparatus, and a device, a medium and a computer product. The method relates to the field of traffic, and is applied to a load balancing system. The load balancing system comprises a first node and a second node. The method comprises: a first node acquiring a first route allocation table; the first node receiving a first instruction; the first node acquiring a first partition identifier by means of the first route allocation table and on the basis of a first primary key identifier carried in the first instruction; the first node determining a second node on the basis of the first partition identifier; and the first node transmitting, to the second node, first data uniquely identified by the first primary key identifier. By means of the method, in a scenario where transactions need to perform data interaction between a plurality of nodes, on the basis of a mapping relationship between a primary key identifier and a partition identifier, data that needs to be migrated and which node the data is migrated to are determined, without the need to forward user requests multiple times between the plurality of nodes, thereby improving the efficiency of data migration.

Description

  • This application claims priority to Chinese Patent Application No. 202111076493.3, entitled "DATA MIGRATION METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM" filed with the China National Intellectual Property Administration on September 14, 2021, which is incorporated by reference in its entirety.
  • FIELD OF THE TECHNOLOGY
  • This disclosure relates to the field of Internet technologies and the field of traffic, and in particular, to data migration.
  • BACKGROUND OF THE DISCLOSURE
  • With development of Internet services, more and more application programs are required to provide services through the Internet. With continuous growth of services, data migration may be implemented in a distributed cache system by using a Linux virtual server (LVS).
  • As a virtual layer-four switch cluster system, the LVS forwards a user request according to an address and a port, and performs data migration according to a connection processed by a current cluster database service process. Data migration may provide the distributed cache system with high scalability.
  • However, in a scenario in which a user transaction is complex, there is data interaction between multiple nodes, and the efficiency of performing data migration by using the LVS is reduced to some extent. Therefore, how to improve the data migration efficiency in the scenario in which the user transaction is complex becomes an urgent problem to be solved.
  • SUMMARY
  • Embodiments of this disclosure provide a data migration method and apparatus, a device, a storage medium, and a computer product. In a scenario in which a transaction involves data interaction between multiple nodes, based on a mapping relationship between a primary key identity and a partition identity, data desired to be migrated can be determined by using the primary key identity, and a node to which the data is to be migrated can be determined by using the partition identity, thereby completing migration of the data between the multiple nodes without forwarding a user request many times between the multiple nodes. Therefore, data migration efficiency is improved.
  • In view of this, a first aspect of this disclosure provides a data migration method. The data migration method is applied to a load balancing system. The load balancing system includes a first node and a second node. The method includes: receiving, by the first node, a first instruction carrying a first primary key identity and a first transaction identity, the first primary key identity indicating first data that is to be processed, the first transaction identity indicating a first transaction, the first node being used for processing the first transaction; obtaining, by the first node and using a first route assignment table, a first partition identity based on the first primary key identity, the first route assignment table containing a mapping relationship between the first primary key identity and the first partition identity, the first partition identity indicating the second node for processing the first transaction; determining, by the first node, the second node based on the first partition identity; and transmitting, by the first node, the first data to the second node.
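The claimed method on the first node — resolve the first partition identity from the first primary key identity, map it to the second node, and transmit the first data — can be sketched as follows. All identities, the partition-to-node mapping, and the in-memory "transport" are illustrative assumptions standing in for real tables and network calls:

```python
# Minimal sketch of the first-aspect method on the first node.

first_route_table = {"pk_1": "partition_2"}    # primary key -> partition identity
partition_to_node = {"partition_2": "node_2"}  # partition identity -> node
node_storage = {"node_2": {}}                  # stands in for real transmission

def handle_first_instruction(instruction, data_store):
    primary_key, _transaction_id = instruction
    partition = first_route_table[primary_key]   # obtain first partition identity
    second_node = partition_to_node[partition]   # determine the second node
    # transmit the first data uniquely identified by the primary key identity
    node_storage[second_node][primary_key] = data_store[primary_key]
    return second_node

local_data = {"pk_1": "first_data"}
chosen = handle_first_instruction(("pk_1", "txn_1"), local_data)
```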
  • A second aspect of this disclosure provides a data migration apparatus, including: an obtaining module, configured to obtain a first route assignment table, the first route assignment table including a mapping relationship between a first primary key identity and a first partition identity, the first primary key identity being used for uniquely identifying first data, and the first partition identity indicating a second node; a receiving module, configured to receive a first instruction, the first instruction carrying the first primary key identity and a first transaction identity, the first transaction identity indicating a first transaction, and a first node being configured to process the first transaction; the obtaining module being further configured to obtain, by using the first route assignment table, the first partition identity based on the first primary key identity carried in the first instruction; a processing module, configured to determine the second node based on the first partition identity, the second node being configured to process the first transaction; and a transmission module, configured to transmit the first data to the second node.
  • A third aspect of this disclosure provides a computer-readable storage medium. The computer-readable storage medium stores instructions which, when run in a computer, enable the computer to perform the method as described in each of the foregoing aspects.
  • A fourth aspect of this disclosure provides a computer program product or computer program. The computer program product or computer program includes computer instructions. The computer instructions are stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium. The processor executes the computer instructions to cause the computer device to perform the method provided in each of the foregoing aspects.
  • According to the foregoing technical solutions, it can be seen that the embodiments of this disclosure have the following advantages. The embodiments of this disclosure are applied to the load balancing system. The load balancing system includes the first node and the second node. Based on this, the first route assignment table is first obtained by using the first node. The first route assignment table includes the mapping relationship between the first primary key identity and the first partition identity. The first primary key identity is used for uniquely identifying the first data. The first partition identity indicates the second node. Then, the first node receives the first instruction. The first instruction carries the first primary key identity and the first transaction identity. The first transaction identity indicates the first transaction. The first node is configured to process the first transaction. Based on this, the first node obtains, by using the first route assignment table, the first partition identity based on the first primary key identity carried in the first instruction, and determines the second node based on the first partition identity. The second node is configured to process the first transaction. Then, the first node transmits the first data uniquely identified by the first primary key identity to the second node. In the foregoing manner, in a scenario in which a transaction involves data interaction between multiple nodes, based on a mapping relationship between a primary key identity and a partition identity, data desired to be migrated can be determined by using the primary key identity, and a node to which the data is to be migrated can be determined by using the partition identity, thereby completing migration of the data between the multiple nodes without forwarding a user request many times between the multiple nodes. Therefore, data migration efficiency is improved.
  • BRIEF DESCRIPTION OF THE DRAWINGS
    • FIG. 1 is a schematic diagram of a consistent hashing algorithm according to an embodiment of this disclosure.
    • FIG. 2 is a schematic diagram of a case that load balancing is performed based on a consistent hashing algorithm according to an embodiment of this disclosure.
    • FIG. 3 is a schematic diagram of an architecture of a load balancing system according to an embodiment of this disclosure.
    • FIG. 4 is a schematic diagram of a data migration method according to an embodiment of this disclosure.
    • FIG. 5 is a schematic diagram of a case that a partition identity uniquely identifies a node according to an embodiment of this disclosure.
    • FIG. 6 is a schematic flowchart of data migration according to an embodiment of this disclosure.
    • FIG. 7 is a schematic diagram of a cross-node transaction graph according to an embodiment of this disclosure.
    • FIG. 8 is a schematic flowchart of determining that a data migration condition is satisfied according to an embodiment of this disclosure.
    • FIG. 9 is a schematic flowchart of determining to-be-migrated data according to an embodiment of this disclosure.
    • FIG. 10 is a schematic flowchart of obtaining a second route assignment table according to an embodiment of this disclosure.
    • FIG. 11 is a schematic diagram of a structure of a data migration apparatus according to an embodiment of this disclosure.
    • FIG. 12 is a schematic diagram of a server according to an embodiment of this disclosure.
    DESCRIPTION OF EMBODIMENTS
  • The embodiments of this disclosure provide a data migration method and apparatus, a computer device, and a storage medium. In a scenario in which a transaction involves data interaction between multiple nodes, based on a mapping relationship between a primary key identity and a partition identity, data desired to be migrated may be determined by using the primary key identity, and a node to which the data is to be migrated may be determined by using the partition identity, thereby completing migration of the data between the multiple nodes without forwarding a user request many times between the multiple nodes. Therefore, data migration efficiency is improved.
  • Terms "first", "second", "third", "fourth", and the like (if existing) in the specification, the claims, and the drawings of this disclosure are used to distinguish between similar objects, but are not used to describe a specific order. It is to be understood that data used like this may be interchanged as appropriate, such that the embodiments of this disclosure described herein may be implemented according to, for example, sequences in addition to those illustrated or described herein. In addition, terms "include", "corresponding to", and any other variants are intended to cover non-exclusive inclusions. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those expressly listed steps or units, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
  • With development of Internet services, more and more application programs are required to provide services through the Internet. With continuous growth of services, data migration may be implemented in a distributed cache system by using an LVS. As a virtual layer-four switch cluster system, the LVS forwards a user request according to a target address and a target port. In addition, the LVS only forwards the user request without generating traffic, and performs data migration according to a connection processed by a current cluster database service process, thereby implementing load balancing. Based on this, data migration may provide the distributed cache system with high scalability.
  • However, based on the characteristic that the LVS can only forward the user request, in a scenario in which a user transaction is simple (for example, data interaction is completed on one node), load balancing may be performed according to the connection processed by the current cluster database service process. In a scenario in which a user transaction is complex, for example, there is data interaction between multiple nodes, data migration based on the LVS may require the user request to be forwarded many times between the multiple nodes, which reduces data migration efficiency. Therefore, how to improve the data migration efficiency in a scenario in which the user transaction is complex becomes an urgent problem to be solved.
  • Based on this, the embodiments of this disclosure provide a data migration method. In a scenario in which data of a same transaction type involves data interaction between multiple nodes, data desired to be migrated and a node to which the data is to be migrated may be determined based on a mapping relationship between a primary key identity and a partition identity, thereby completing migration of the data between the multiple nodes without forwarding a user request many times between the multiple nodes. Therefore, data migration efficiency is improved.
  • For ease of understanding, some terms or concepts involved in the embodiments of this disclosure are explained first.
  • First: load balancing
  • In a distributed cache system, a load balancing node uniformly distributes requests sent by user clients to the database side among back-end database service processes to provide a data service. In general, multiple peer database service processes in the distributed cache system may externally provide services. Based on this, a user transaction load is uniformly distributed across all database service processes, such that the entire database can provide a maximum data service capability.
  • Second: node
  • In this embodiment, the node includes a computing node and a storage node. The computing node is configured to process a specific computing request of a user. The computing node is specifically a node capable of executing a user request. The storage node is a storage device in the distributed cache system, that is, is configured to store data. The storage node is specifically a node completing execution and committing of a distributed transaction in the distributed cache system. In this embodiment, the computing node and the storage node are specifically servers. In addition, in this embodiment, the node including the computing node and the storage node belongs to a database in a load balancing system.
  • Third: two-phase commit (2PC)
  • 2PC is an algorithm designed in the fields of computer networks and databases to keep the nodes under a distributed cache system architecture consistent when a transaction is committed.
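As a brief illustration of the 2PC flow just defined, the following Python sketch models a coordinator driving its participants through the prepare and commit/abort phases; the class names, the vote representation, and the state strings are illustrative assumptions, not part of this disclosure:

```python
class Participant:
    """Toy participant: votes in the prepare phase, then commits or aborts."""

    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local part of the transaction can commit.
        self.state = "prepared" if self.can_commit else "abort-voted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (prepare): the coordinator collects a vote from every node.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit/abort): commit only if every node voted yes;
    # otherwise all nodes abort, keeping them consistent.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


nodes = [Participant("storage-1"), Participant("storage-2")]
result = two_phase_commit(nodes)  # -> "committed"
```

A single "no" vote in phase 1 forces every participant to abort, which is how 2PC keeps the nodes consistent.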
  • Fourth: consistent hashing algorithm
  • The consistent hashing algorithm is applied extensively in distributed systems. As a hash algorithm, consistent hashing can minimize changes to the existing mapping relationship between service requests and request-processing servers when a server is removed or added, thereby maximally meeting the requirement for monotonicity.
  • Specifically, a hash value is calculated for each node by using the consistent hashing algorithm. As shown in FIG. 1, which is a schematic diagram of the consistent hashing algorithm according to an embodiment of this disclosure, A1 to A4 denote nodes, and B1 to B5 denote data. The first node found clockwise whose hash value is greater than that of the data is the node in which the data is located. Specifically, a hash value calculated based on a partition identity of the data is compared with the hash values of the nodes, and the node corresponding to the first hash value found clockwise that is greater than the hash value calculated based on the partition identity of the data is the node in which the data is located. For example, the first node found clockwise for the data B1 is the node A1, so that the data B1 is stored in the node A1, and a user request corresponding to the data B1 is executed on the node A1 to complete execution and committing of a transaction corresponding to the data B1. Similarly, the first node found clockwise for the data B2 is the node A2, the first node found clockwise for the data B3 is the node A3, and the first node found clockwise for both the data B4 and the data B5 is the node A4.
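The clockwise lookup described above can be sketched as follows; the concrete hash function (MD5 reduced to a 32-bit ring) and the node names are assumptions chosen for illustration:

```python
import hashlib


def hash_key(key: str) -> int:
    """Map a string to a point on a 32-bit hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)


class ConsistentHashRing:
    def __init__(self, nodes):
        # Sort nodes by their position on the ring.
        self.ring = sorted((hash_key(n), n) for n in nodes)

    def locate(self, partition_identity: str) -> str:
        # The first node clockwise whose hash value is greater than
        # (or equal to) the data's hash value is where the data lives.
        h = hash_key(partition_identity)
        for node_hash, node in self.ring:
            if node_hash >= h:
                return node
        # Past the last node: wrap around to the first node on the ring.
        return self.ring[0][1]


ring = ConsistentHashRing(["A1", "A2", "A3", "A4"])
owner = ring.locate("B1")  # one of the four nodes, chosen by ring position
```

Which node actually owns "B1" depends on the chosen hash function, but the lookup is deterministic: the same partition identity always resolves to the same node.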
  • Fifth: load balancing based on the consistent hashing algorithm
  • For ease of understanding, based on the embodiment shown in FIG. 1, how to perform load balancing based on the consistent hashing algorithm is briefly described with reference to FIG. 2. FIG. 2 is a schematic diagram of a case in which load balancing is performed based on the consistent hashing algorithm according to an embodiment of this disclosure. As shown in FIG. 2, C1 to C5 denote nodes, and D1 to D5 denote data. Based on this, if the computing capability currently needs to be expanded, and the node currently with the maximum computational load is the node C4, the node C5 is added between the node C3 and the node C4. In this case, of the data D4 and the data D5 that are originally assigned to the node C4 according to the assignment rule of the consistent hashing algorithm described with reference to FIG. 1, the data D4 may be assigned to the added node C5, to expand the computing capability, thereby completing load balancing. Then, if the computing capability needs to be reduced, it is only necessary to remove the node C5. In this case, the data D4 assigned to the node C5 may be reassigned to the node C4 according to the assignment rule of the consistent hashing algorithm described with reference to FIG. 1, to reduce the computing capability, thereby completing dynamic load balancing.
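The node addition and removal described above can be sketched as follows; the hash function and the node and data names are illustrative assumptions. The point of the sketch is that adding a node only remaps the keys that fall between that node and its ring predecessor (those keys all move to the new node), and removing the node restores the original assignment:

```python
import hashlib
from bisect import bisect_left, insort


def hring(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** 32)


class Ring:
    def __init__(self, nodes):
        self.ring = sorted((hring(n), n) for n in nodes)

    def add(self, node):
        # Insert the new node at its hash position on the ring.
        insort(self.ring, (hring(node), node))

    def remove(self, node):
        self.ring.remove((hring(node), node))

    def locate(self, key):
        # First node clockwise at or after the key's hash position.
        i = bisect_left(self.ring, (hring(key), ""))
        return self.ring[i % len(self.ring)][1]


ring = Ring(["C1", "C2", "C3", "C4"])
keys = ["D1", "D2", "D3", "D4", "D5"]
before = {k: ring.locate(k) for k in keys}

ring.add("C5")                    # expand the computing capability
after = {k: ring.locate(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]  # every moved key now lives on C5

ring.remove("C5")                 # reduce the computing capability
restored = {k: ring.locate(k) for k in keys}        # identical to `before`
```

Keys whose hash does not fall in the arc captured by C5 keep their original node, which is the monotonicity property mentioned above.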
  • It is to be understood that in actual applications, there are many methods for smaller-granularity or more uniform data assignment based on the consistent hashing algorithm, which will not be described herein. In the data migration method proposed in the embodiments of this disclosure, data in a transaction needs to be assigned based on the consistent hashing algorithm.
  • The above explains some terms or concepts involved in the embodiments of this disclosure. The following describes a virtual layered framework in the embodiments of this disclosure, so as to better understand this solution. Refer to FIG. 3. FIG. 3 is a schematic diagram of an architecture of a load balancing system according to an embodiment of this disclosure. As shown in FIG. 3, a node in FIG. 3 specifically includes multiple computing nodes and a storage node communicating with each computing node. A user specifically sends a request corresponding to a transaction to the computing node by using a terminal device. Then, data corresponding to the transaction is specifically called from the storage node. The data is executed and committed by the storage node, such that the computing node completes execution of the request corresponding to the transaction. In a process in which the request is completed, the node may output log information to the load balancing system, and then the load balancing node analytically processes the log information to determine whether load balancing (that is, data migration) is needed.
  • Each of the load balancing node and the node in FIG. 3 may be a server, a server cluster including multiple servers, a cloud computing center, or the like. This is not specifically limited herein. A client is specifically deployed in the terminal device. The terminal device may be a tablet computer, a notebook computer, a palm computer, a mobile phone, a personal computer (PC), or a voice interaction device shown in FIG. 3.
  • The terminal device may communicate with the node by using a wireless network, a wired network, or a removable storage medium. The wireless network uses a standard communication technology and/or protocol. The wireless network is generally the Internet, but may alternatively be any network, including, but not limited to, any combination of Bluetooth, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile private network, or a virtual private network. In some embodiments, the foregoing data communication technology may be replaced or supplemented with a custom or private data communication technology. The removable storage medium may be a universal serial bus (USB) flash drive, a mobile hard disk, another removable storage medium, or the like.
  • Only five terminal devices, one node, and one load balancing node are shown in FIG. 3. However, it is to be understood that an example in FIG. 3 is used only to understand this solution, and specific quantities of terminal devices, nodes, and load balancing nodes are flexibly determined in combination with an actual situation.
  • It can be seen from the system framework shown in FIG. 3 that the node may transmit the log information to the load balancing node by adapting a log generated by the node (for example, adding some output fields). Specific implementation complexity is low, and impact on the node is reduced. Second, the data migration method of this solution may be performed only by adapting the output of the log in the node and adding a corresponding processing logic to the load balancing node, so that implementability is high. The node outputs the log information to transfer the computation for load balancing to the load balancing node, so that the contention for data resources with the data generated during transaction execution, which would occur if the load balancing algorithm were executed on the node, is reduced. Therefore, resource utilization is increased, and data migration efficiency is improved.
  • In the data migration method provided in the embodiments of this disclosure, migrated data may be traffic-related data, for example, real-time road condition data, vehicle travel data, or driver demand data. Therefore, the method provided in the embodiments of this disclosure may be applied to the field of traffic. Based on this, the following describes an intelligent traffic system (ITS) and an intelligent vehicle infrastructure cooperative system (IVICS). First, the ITS, also referred to as an intelligent transportation system, effectively integrates and applies advanced sciences and technologies (an information technology, a computer technology, a data communication technology, a sensor technology, an electronic control technology, an automatic control technology, operational research, artificial intelligence, and the like) to transportation, service control, and vehicle manufacturing to strengthen a connection between a vehicle, infrastructure, and a user, thereby forming an integrated transportation system to ensure safety, improve efficiency and the environment, and save energy. Second, the IVICS is referred to as a cooperative vehicle infrastructure system for short, and is a development direction of the ITS. The cooperative vehicle infrastructure system comprehensively implements dynamic real-time vehicle-vehicle and vehicle-infrastructure information interaction by using advanced wireless communication and new-generation Internet technologies and the like, and develops active vehicle safety control and cooperative infrastructure management based on full space-time dynamic traffic information acquisition and fusion to fully implement effective cooperation of a person, a vehicle, and infrastructure, ensure traffic safety, and improve traffic efficiency, thereby forming a safe, efficient, and environment-friendly road traffic system.
  • The data migration method provided in the embodiments of this disclosure specifically involves a cloud technology. The following further describes the cloud technology. The cloud technology is a hosting technology that unifies a series of hardware, software, and network resources, and the like in a WAN or a LAN to implement calculation, storage, processing, and sharing of data. The cloud technology is a generic term of a network technology, information technology, integration technology, management platform technology, application technology, and the like based on commercial-mode application of cloud computing. A resource pool may be formed, and is flexibly and conveniently used on demand. The cloud computing technology will become an important support. A background service of a technical network system requires a large quantity of computing and storage resources, for example, for video websites, picture websites, and other portals. As the Internet industry is highly developed and applied, each item may have its own identification mark in the future, which needs to be transmitted to a background system for logical processing. Data of different levels may be processed separately. All kinds of industry data require strong system support, which can only be realized by cloud computing.
  • Cloud computing distributes computing tasks over a resource pool including a large quantity of computers, such that various application systems can obtain computing power, storage space, and information services as needed. A network providing resources is referred to as a "cloud". The resources in the "cloud" appear to a user to be infinitely extensible and available at any time. The resources are available on demand and extensible at any time, and the user pays for use.
  • A basic capability provider of cloud computing may construct a cloud computing resource pool platform (referred to as a cloud platform for short, generally referred to as infrastructure as a service (IaaS)), and multiple types of virtual resources are deployed in a resource pool for an external client to select and use. The cloud computing resource pool mainly includes a computing device (a virtual machine, including an operating system), a storage device, and a network device.
  • According to logical functions, a platform as a service (PaaS) layer may be deployed on an IaaS layer, and then a software as a service (SaaS) layer is deployed on the PaaS layer. Alternatively, the SaaS layer may be directly deployed on the IaaS layer. PaaS is a platform on which software is run, for example, a database or a web container. SaaS is various transaction software, for example, a web portal or a mass texting device. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.
  • Second, cloud storage is a novel concept extending and developing based on the concept of cloud computing. A distributed cloud storage system (referred to as a storage system hereinafter) is a storage system that integrates a number of different types of storage devices (the storage device is also referred to as a storage node) in the network through application software or application interfaces by using a function such as a cluster application, a grid technology, or a distributed storage file system to cooperate to externally provide data storage and transaction access functions.
  • At present, a storage method of the storage system is as follows. A logical volume is created. When the logical volume is created, physical storage space is allocated to each logical volume. The physical storage space may include a disk of one or more storage devices. When a client stores data in a specific logical volume, that is, stores the data in a file system, the file system divides the data into many portions, each portion being an object. The object includes not only data but also additional information such as a data identity (ID). The file system writes each object to physical storage space of the logical volume. In addition, the file system records storage position information of each object. Therefore, when the client requests to access the data, the file system may enable the client to access the data according to the storage position information of each object.
  • The storage system specifically allocates the physical storage space to the logical volume through the following process: pre-dividing physical storage space into stripes according to a capacity estimate for an object to be stored in the logical volume (the estimate often has a large margin with respect to an actual capacity for the object to be stored) and a group of a redundant array of independent disks (RAID), one logical volume being understood as one stripe, thereby allocating the physical storage space to the logical volume.
  • Further, a blockchain is a novel application mode of a computer technology such as distributed data storage, point-to-point transmission, a consensus mechanism, or an encryption algorithm. The blockchain is essentially a decentralized database, and is a string of data blocks associatively generated by using a cryptographic method. Each data block includes information of a batch of network transactions, and is used for verifying validity of the information (anti-counterfeiting) and generating a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, and an application service layer.
  • The blockchain underlying platform may include processing modules such as a user management module, a basic service module, a smart contract module, and an operation management module. The user management module is configured to manage identity information of all blockchain participants, including maintenance of public and private key generation (account management), key management, maintenance of a correspondence between a real identity of a user and a blockchain address (authority management), and the like, and when authorized, determine and audit transaction conditions of some real identities and provide a rule configuration for risk control (risk control auditing). The basic service module is deployed on all blockchain node devices, and is configured to verify validity of a transaction request, and record a valid request on a storage after consensus about the valid request is completed. For a new transaction request, the basic service module first performs interface adaptation parsing and authentication processing (interface adaptation), then encrypts transaction information by using a consensus algorithm (consensus management), completely and uniformly transmits the transaction information to a shared ledger after encryption (network communication), and performs recording and storage. The smart contract module is configured for registration, issuance, triggering, and execution of a contract. A developer may define a contract logic by using a specific programming language, issue the contract logic on the blockchain (contract registration), and call a key or another event according to a logic of a contract term to trigger execution to complete the contract logic. In addition, a contract upgrade and cancellation function is further provided.
The operation management module is mainly configured for deployment in a release process of a product, modification of a configuration, contract setting, cloud adaptation, and visual output of a real-time state during running of the product, for example, alarming, managing a network condition, or managing a health condition of the node device.
  • The platform product service layer provides a basic capability and implementation framework of a typical application. The developer may superimpose characteristics of a transaction based on the basic capability to complete blockchain implementation of a transaction logic. The application service layer provides a blockchain-solution-based application service for a transaction participant to use.
  • With reference to the above descriptions, refer to FIG. 4. FIG. 4 is a schematic diagram of a data migration method according to an embodiment of this disclosure. As shown in FIG. 4, the data migration method in this embodiment of this disclosure is applied to the load balancing system shown in FIG. 3. It is to be understood that, for ease of understanding, an example in which the load balancing system includes a first node and a second node is used in this embodiment for description. In actual applications, the load balancing system may further include more nodes, and elaborations are omitted herein. Based on this, the data migration method in this embodiment of this disclosure includes the following steps 101 to 105.
  • 101: The first node obtains a first route assignment table, the first route assignment table including a mapping relationship between a first primary key identity and a first partition identity, the first primary key identity being used for uniquely identifying first data, and the first partition identity indicating the second node.
  • In this embodiment, the first node obtains the first route assignment table. The first route assignment table includes the mapping relationship between the first primary key identity and the first partition identity. The first primary key identity is used for uniquely identifying first data. The first partition identity indicates the second node. Specifically, one primary key identity may uniquely identify only one piece of data. For example, a primary key identity 1 is used for uniquely identifying data 1, and a primary key identity 2 is used for uniquely identifying data 2. One partition identity may indicate only one node, but one node may be indicated by multiple partition identities. For example, a partition identity 1 is used for indicating a node 1, a partition identity 2 is used for indicating the node 1, and a partition identity 3 is used for indicating a node 2. In this case, the node 1 is indicated by the partition identity 1 and the partition identity 2, and the node 2 is indicated by the partition identity 3.
  • Further, the mapping relationship between the first primary key identity and the first partition identity is, for example, a mapping relationship between the partition identity 1 and the primary key identity 1 or a mapping relationship between the partition identity 2 and the primary key identity 2. It is to be understood that, in actual applications, the first route assignment table may include mapping relationships between multiple primary key identities, multiple transaction identities, and multiple partition identities. Not all mapping relationships are exhausted herein. The example in this embodiment is not to be understood as a limitation on this disclosure. A primary key identity and a partition identity carried in each piece of data may be flexibly determined according to an actual situation.
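As a minimal sketch of the mapping relationships described above (a primary key identity maps to exactly one partition identity, and a partition identity indicates exactly one node, while one node may be indicated by several partition identities), assuming a plain in-memory representation of the route assignment table; all names are illustrative:

```python
# Hypothetical in-memory form of the first route assignment table.
primary_key_to_partition = {
    "primary_key_1": "partition_1",
    "primary_key_2": "partition_2",
}

# One partition identity indicates exactly one node, but one node
# (node_1 below) may be indicated by multiple partition identities.
partition_to_node = {
    "partition_1": "node_1",
    "partition_2": "node_1",
    "partition_3": "node_2",
}


def node_for(primary_key: str) -> str:
    # Primary key identity -> partition identity -> node.
    return partition_to_node[primary_key_to_partition[primary_key]]


node_for("primary_key_1")  # -> "node_1"
```

The two-level indirection is what lets the system remap a partition to a different node without touching every primary key entry.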
  • 102: The first node receives a first instruction, the first instruction carrying the first primary key identity and a first transaction identity, the first transaction identity indicating a first transaction, and the first node being configured to process the first transaction.
  • In this embodiment, the first node receives the first instruction. The first instruction carries the first primary key identity and the first transaction identity. The first transaction identity indicates the first transaction. The first transaction identity may be carried in a log of the first transaction that the first node generates by executing the first transaction.
  • The first node may learn, based on the first transaction identity in the first instruction, that a user initiates an operation on the first transaction at this time, and that the data that needs to be called for this operation on the first transaction is the first data indicated by the first primary key identity.
  • 103: The first node obtains, by using the first route assignment table, the first partition identity based on the first primary key identity carried in the first instruction.
  • In this embodiment, since the first route assignment table includes the mapping relationship between the first primary key identity and the first partition identity, the first node may read the first primary key identity carried in the first instruction received in step 102, and then determine, by using the mapping relationship between the first primary key identity and the first partition identity in the first route assignment table, the first partition identity based on the first primary key identity.
  • Exemplarily, an example in which the first route assignment table includes the mapping relationship between the partition identity 1 and the primary key identity 1 and the mapping relationship between the partition identity 2 and the primary key identity 2 is used for description. If the first primary key identity carried in the first instruction is the primary key identity 1, it may be determined by using the mapping relationship between the partition identity 1 and the primary key identity 1 that the first partition identity is the partition identity 1. Similarly, if the first primary key identity carried in the first instruction is the primary key identity 2, it may be determined by using the mapping relationship between the partition identity 2 and the primary key identity 2 that the first partition identity is the partition identity 2. It is to be understood that the foregoing example is not to be understood as a limitation on this disclosure, and the determined partition identity needs to be determined according to the specific mapping relationships in the first route assignment table.
  • 104: The first node determines the second node based on the first partition identity, the second node being configured to process the first transaction.
  • In this embodiment, since one partition identity may indicate only one node, the first node may determine the second node based on the first partition identity. Second, since the first instruction received in step 102 further carries the first transaction identity, the determined second node is a node capable of processing the first transaction indicated by the first transaction identity.
  • Specifically, in actual applications, the first node may be indicated by multiple partition identities. For example, the partition identity 1 indicates the node 1, the partition identity 2 indicates the node 2, and the partition identity 3 indicates the node 1. In this case, both the partition identity 1 and the partition identity 3 may indicate that corresponding data is processed by the node 1. For ease of understanding, the following describes, in detail based on the consistent hashing algorithm shown in FIG. 1, how a partition identity identifies a node. FIG. 5 is a schematic diagram of a case in which the partition identity uniquely identifies the node according to an embodiment of this disclosure. As shown in FIG. 5, E11 and E12 denote nodes, and E21 to E26 denote data. Based on this, hash calculation is first performed on the node E11 to obtain a hash value 1 corresponding to the node E11. Similarly, hash calculation is performed on the node E12 to obtain a hash value 2 corresponding to the node E12.
  • Then, hash calculation is performed on the partition identity 1 of the data E21 to obtain a hash value 3 corresponding to the partition identity 1 of the data E21. Similarly, hash calculation is performed on the partition identities 2 to 6 of the data E22 to E26 to obtain hash values 4 to 8 corresponding to the partition identities 2 to 6, respectively.
  • Further, for the data E21, it is determined by clockwise searching that the hash value 1 is greater than the hash value 3. In this case, it may be determined that the node E11 is the node in which the data E21 is stored, and the partition identity 1 of the data E21 may uniquely indicate the node E11. Similarly, the node E11 is the node in which the data E22 is stored, and the partition identity 2 of the data E22 may uniquely indicate the node E11. The node E11 is the node in which the data E23 is stored, and the partition identity 3 of the data E23 may uniquely indicate the node E11.
  • Second, it may be obtained through calculation similar to the foregoing that the node E12 is the node in which the data E24 is stored, and the partition identity 4 of the data E24 may uniquely indicate the node E12. The node E12 is the node in which the data E25 is stored, and the partition identity 5 of the data E25 may uniquely indicate the node E12. The node E12 is the node in which the data E26 is stored, and the partition identity 6 of the data E26 may uniquely indicate the node E12. It is to be understood that FIG. 5 and the corresponding example are used only to understand how to uniquely indicate a node by using a partition identity, but the example is not to be understood as a limitation on this disclosure.
  • 105: The first node transmits the first data to the second node.
  • In this embodiment, after receiving the first instruction in step 102, the first node determines, by using the first primary key identity, the first data uniquely identified by the first primary key identity, and may determine through the foregoing steps that the first data that needs to be called by the first instruction is currently not data processed by the first node. Therefore, the first data has to be transmitted to the second node, and the second node processes the first data to complete processing of the first transaction.
  • This embodiment of this disclosure provides a data migration method. In the foregoing manner, in a scenario in which a transaction requires data interaction between multiple nodes, based on a mapping relationship between a primary key identity and a partition identity, the data to be migrated may be determined by using the primary key identity, and the node to which the data is to be migrated may be determined by using the partition identity, thereby completing migration of the data between the multiple nodes without forwarding a user request many times between the multiple nodes. Therefore, data migration efficiency is improved.
  • Optionally, based on the embodiment corresponding to FIG. 4, in an optional embodiment of the data migration method provided in this disclosure, the first route assignment table further includes a mapping relationship between a first index identity and the first partition identity. The first instruction further includes the first index identity. That the first node obtains, by using the first route assignment table, the first partition identity based on the first primary key identity carried in the first instruction specifically includes that: the first node determines, by using the first route assignment table, N partition identities based on the first index identity carried in the first instruction, the N partition identities including the first partition identity, and N being an integer greater than or equal to 1; and the first node determines, by using the first route assignment table, the first partition identity from the N partition identities based on the first primary key identity carried in the first instruction.
  • In this embodiment, the first route assignment table further includes the mapping relationship between the first index identity and the first partition identity, and the first instruction further includes the first index identity. Specifically, when multiple pieces of data are stored in a node, the calculation amount may be large if partition identities are determined directly by using primary key identities. One index identity may correspond to at least one partition. Based on this, the first node specifically determines, by using the mapping relationship between the first index identity and the first partition identity in the first route assignment table, the N partition identities based on the first index identity carried in the first instruction. The N partition identities include the first partition identity, and N is an integer greater than or equal to 1. Then, the first partition identity corresponding to the first primary key identity in the mapping relationship needs to be determined from the N partition identities.
  • Further, the first node determines, by using the mapping relationship between the first primary key identity and the first partition identity in the first route assignment table, the first partition identity from the determined N partition identities based on the first primary key identity carried in the first instruction. In this way, a range for searching for mapping relationships by using the primary key identity can be reduced, and it can be ensured that the N partition identities include the partition identity corresponding to the primary key identity in the mapping relationship. For example, the first route assignment table includes the mapping relationship between the partition identity 1 and the primary key identity 1, a mapping relationship between the partition identity 1 and an index identity 1, and a mapping relationship between the partition identity 2 and the index identity 1. In this case, the partition identity 1 and the partition identity 2 may be determined based on the index identity 1, and then the partition identity 1 is determined from the partition identity 1 and the partition identity 2 as the first partition identity based on the primary key identity 1 by using the mapping relationship between the partition identity 1 and the primary key identity 1.
  • It is to be understood that, in actual applications, the first instruction may include multiple primary key identities, and the first index identity may indicate multiple primary key identities. That is, the operation initiated by the user on the first transaction requires multiple pieces of data indicated by the multiple primary key identities to be called. For example, the first route assignment table includes the mapping relationship between the partition identity 1 and the primary key identity 1, the mapping relationship between the partition identity 2 and the primary key identity 2, the mapping relationship between the partition identity 1 and the index identity 1, and the mapping relationship between the partition identity 2 and the index identity 1, and the first instruction includes the primary key identity 1, the primary key identity 2, and the index identity 1. In this case, the partition identity 1 and the partition identity 2 may be determined based on the index identity 1, or the partition identity 1 and the partition identity 2 may be determined by using the foregoing mapping relationships based on the primary key identity 1 and the primary key identity 2. That is, the data is distributed across the first node and the second node. In this case, the first node completes calling the data indicated by the primary key identity 2, and sends the first data indicated by the primary key identity 1 to the second node to enable the second node to call the first data, thereby completing the first transaction. In this case, the first transaction is a 2PC transaction as described above.
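The two-stage lookup described above (index identity first, then primary key identity) can be sketched as follows, reusing the illustrative identities from the example; the in-memory table layout is an assumption:

```python
# Illustrative tables mirroring the example above.
index_to_partitions = {
    "index_1": ["partition_1", "partition_2"],  # N = 2 candidate partitions
}
primary_key_to_partition = {
    "primary_key_1": "partition_1",
    "primary_key_2": "partition_2",
}


def resolve_partition(index_identity: str, primary_key: str) -> str:
    # Stage 1: the index identity narrows the search to N partitions,
    # reducing the range over which primary keys must be matched.
    candidates = index_to_partitions[index_identity]
    # Stage 2: the primary key identity picks the single matching
    # partition out of the (smaller) candidate set.
    partition = primary_key_to_partition[primary_key]
    assert partition in candidates
    return partition


resolve_partition("index_1", "primary_key_1")  # -> "partition_1"
```

Stage 1 is what bounds the lookup cost when a node stores many primary keys: only the candidate partitions indicated by the index identity need to be examined.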
  • This embodiment of this disclosure provides another data migration method. In the foregoing manner, at least one partition identity is first determined by using the first index identity. Then, the first partition identity corresponding to the first primary key identity in the mapping relationship is determined from the at least one partition identity based on the first primary key identity. In this way, a range for searching for multiple mapping relationships by using the first primary key identity can be reduced, and efficiency of determining the first partition identity based on the first primary key identity can be improved. Therefore, data migration efficiency is improved.
  • Optionally, based on the embodiment corresponding to FIG. 4, in an optional embodiment of the data migration method provided in this disclosure, the data migration method may further include that: the first node obtains a second route assignment table at a first time point, the second route assignment table including a mapping relationship between the first primary key identity and a second partition identity, and the second partition identity indicating the first node. That the first node obtains a first route assignment table includes that: the first node obtains the first route assignment table at a second time point, the second time point being later than the first time point. After the first node transmits the first data uniquely identified by the first primary key identity to the second node, the method further includes that: the first node deletes the second route assignment table.
  • In this embodiment, the first node obtains the second route assignment table at the first time point. In this case, the second route assignment table includes the mapping relationship between the first primary key identity and the second partition identity. The second partition identity indicates the first node. The first node is configured to process the first transaction. That is, in this case, based on the mapping relationship in the second route assignment table, the first data uniquely identified by the first primary key identity is data to be managed by the first node. Further, the first node obtains the first route assignment table at the second time point, and the second time point is later than the first time point. That is, the first route assignment table is obtained by updating the mapping relationship between the first primary key identity and the second partition identity in the second route assignment table. Based on this, after the first node transmits the first data to the second node in step 105, the first data has been migrated, according to the latest route assignment table, to the second node managing the first data. In this case, the second route assignment table may be deleted.
  • Specifically, in actual applications, the first instruction may be a first statement instruction generated for the first transaction when the user initiates the operation on the first transaction. That is, after the user initiates the operation on the first transaction, the first node receives the first statement instruction (that is, the first instruction) for the first transaction, and determines based on the first instruction that the first transaction is desired to be executed, that is, the first data is desired to be called. In this case, after the first data and the second node are determined in the foregoing step, the first data is sent to the second node.
  • In another possibility, a first statement instruction generated for the first transaction when the user initiates the operation on the first transaction is sent to the second node, and the second node may determine the first data and the second node (that is, the current node) through steps similar to the foregoing. However, in such a case, the first data has not yet been migrated to the second node. Since the second node may retain the obtained second route assignment table when migration of the first data is not completed, the second node determines, based on the first primary key identity by using the mapping relationship between the first primary key identity and the second partition identity in the second route assignment table, that the first data is currently on the first node indicated by the second partition identity. Therefore, the second node is desired to generate a second statement instruction for the first transaction, and send the second statement instruction (that is, the first instruction) for the first transaction to the first node to enable the first node to determine the first data and the second node based on the first instruction. The second node receives the first data sent by the first node, and completes the first transaction. In addition, after the second node receives the first data sent by the first node, the second node deletes the second route assignment table.
  • In order to describe the second case, refer to FIG. 6. FIG. 6 is a schematic flowchart of data migration according to an embodiment of this disclosure. As shown in FIG. 6, in step F1, the second node receives the first statement instruction for the first transaction. The first statement instruction for the first transaction carries a primary key identity capable of identifying data and the first transaction identity. In step F2, the second node determines whether data identified by the primary key identity is data to be processed by the current node (that is, the second node). That is, in the foregoing manner, a partition identity is obtained by using the first route assignment table based on the primary key identity carried in the data. If the partition identity indicates another node, it is determined that the data is not data to be processed by the second node, and step F3 is performed; or if the partition identity indicates the second node, it is determined that the data is data to be processed by the second node, and step F4 is performed. In step F3, since the partition identity indicates the another node, the second node transmits the data to the another node. In step F4, the second node is desired to further determine whether the data identified by the primary key identity is on the second node. If the data identified by the primary key identity is on the second node, step F5 is performed; or if the data identified by the primary key identity is not on the second node, step F6 is performed. In step F5, the second node calls the data identified by the primary key identity to complete the first transaction.
  • If the data identified by the primary key identity is not on the second node, in step F6, since the data identified by the primary key identity is not on the second node, the second node is desired to determine, based on the second route assignment table that is not updated, a node storing the data identified by the primary key identity, and send the second statement instruction for the first transaction to the node. Then, the node transmits the data identified by the primary key identity to the second node by using a method similar to the foregoing, such that the second node obtains the data identified by the primary key identity, and performs step F5. Finally, in step F7, the second node performs transaction committing on the first transaction. It is to be understood that the example in FIG. 6 is used only to understand this solution, and is not to be understood as a limitation on this disclosure.
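The decision flow of steps F1 to F6 can be sketched as follows. The node names, table layouts, and return-value conventions are assumptions for illustration, not the claimed protocol:

```python
def handle_statement(node, pk_id, first_table, second_table, local_data):
    """Decide, as in steps F2-F6, what the receiving node should do with a
    statement carrying a primary key identity."""
    # F2: look up the owning node in the updated (first) route assignment table.
    owner = first_table[pk_id]
    if owner != node:
        # F3: the data is not to be processed by this node; forward it.
        return ("forward_to_owner", owner)
    # F4: the data should be processed here; check whether it is physically present.
    if pk_id in local_data:
        # F5: call the data and complete the transaction.
        return ("commit", local_data[pk_id])
    # F6: the data has not been migrated yet; request it from the node recorded
    # in the retained, not-yet-updated (second) route assignment table.
    return ("request_from", second_table[pk_id])

first_table = {"pk1": "node2"}   # updated table: pk1 now belongs to node 2
second_table = {"pk1": "node1"}  # stale table: pk1 still stored on node 1
print(handle_statement("node2", "pk1", first_table, second_table, {}))
# ('request_from', 'node1')
```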
  • This embodiment of this disclosure provides another data migration method. In the foregoing manner, in a scenario in which data interaction between multiple nodes is desired by a same transaction type, whether received data is data to be processed by a current node is further determined based on a mapping relationship. If the received data is data to be processed by the current node, data processing is performed. If the received data is not data to be processed by the current node, the data is migrated to a node corresponding to the data. In this way, processing of data of the same transaction type and migration between the multiple nodes are completed. Therefore, the data migration efficiency is further improved.
  • It is to be understood that the load balancing system further includes a load balancing node. When the load balancing node determines that load balancing is desired, or determines that the first node is a hotspot (that is, a load bearable by the first node is exceeded), or determines that the first node fails, the foregoing update step may be performed. The following describes how the load balancing node determines that a route assignment table is desired to be updated and how to update.
  • Optionally, based on the embodiment corresponding to FIG. 4, in an optional embodiment of the data migration method provided in this disclosure, the load balancing system further includes a load balancing node. The data migration method further includes that: the load balancing node obtains a second route assignment table, the second route assignment table including a mapping relationship between the first primary key identity and a second partition identity, the second partition identity indicating the first node, and the first node being configured to process the first transaction; the load balancing node determines the first data and the second node in response to determining that a data migration condition is satisfied, the first data being data desired to be migrated to the second node; the load balancing node replaces the second partition identity in the second route assignment table with the first partition identity to obtain the first route assignment table, the first partition identity being used for uniquely identifying the second node; and the load balancing node transmits the first route assignment table to the first node and the second node.
  • In this embodiment, it can be seen from the system architecture shown in FIG. 3 that the load balancing system further includes the load balancing node. Based on this, the load balancing node may obtain the second route assignment table through system initialization. The second route assignment table includes the mapping relationship between the first primary key identity and the second partition identity. The second partition identity is used for uniquely identifying the first node. In this case, the second route assignment table is a route assignment table obtained through initialization. Scenarios in which initialization is desired include a node being used for the first time with data imported, a node completing data migration, and the like. A specific initialization scenario is not limited herein.
  • Further, a node may generate corresponding log information during data processing. The load balancing node receives log information transmitted by each node, statistically analyzes the log information to obtain a result, and uses the result to perform load balancing processing, hotspot data migration, data migration of a failing node, or the like. In this embodiment, log information is added to each component of a node, or corresponding field information is added to original log information, so that impact on a processing logic of an existing system and impact on performance of each node are reduced. Specifically, in this embodiment, when data corresponding to a transaction of each transaction type is processed, a node is desired to generate the following log information.
    1. Starting transaction log information: the starting transaction log information includes transaction tag information, a transaction identity, a node identity, and other field information. In this embodiment, the transaction tag information marks log information as relevant to this solution; log information carrying the transaction tag information is log information desired in this solution. The load balancing node does not analytically process log information that does not carry the transaction tag information. Therefore, log information processing efficiency is improved.
    2. Starting statement log information: the starting statement log information includes the transaction tag information, the transaction identity, the node identity, a statement identity, and other field information. The statement identity instructs the node to perform a specific operation of a starting statement.
    3. Creation, read, update and deletion record log information: the creation, read, update and deletion record log information specifically includes a primary key recording format, and further includes a secondary index recording format in some scenarios. Log information of the primary key recording format includes the transaction tag information, the transaction identity, the node identity, the statement identity, a primary key identity, a partition identity, and other field information. Second, log information of the secondary index recording format includes the transaction tag information, the transaction identity, the node identity, the statement identity, a primary key identity, a partition identity, an index identity, and other field information. There is a mapping relationship between the index identity, the primary key identity, and the partition identity. Therefore, when the data carries the index identity and the primary key identity, the node corresponding to the data may be accurately determined.
    4. Transaction committing log information or transaction rollback log information: the transaction committing log information or transaction rollback log information includes the transaction tag information, the transaction identity, the node identity, and other field information, and further carries a 2PC transaction identity if the transaction is a 2PC transaction. It may be determined by using the log information that data interaction between multiple nodes has been performed for the data in the transaction.
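The four record formats above can be sketched as a single record type whose optional fields are populated per format. The field names and the `relevant` filter are illustrative assumptions matching the described formats, not the actual on-disk layout:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    tag: str                           # transaction tag information, e.g. "REBALANCE"
    txn_id: str                        # transaction identity
    node_id: str                       # node identity
    stmt_id: Optional[str] = None      # statement identity (statement and record logs)
    pk_id: Optional[str] = None        # primary key identity (record logs)
    partition_id: Optional[str] = None # partition identity (record logs)
    index_id: Optional[str] = None     # secondary index recording format only
    is_2pc: bool = False               # set on commit/rollback of a 2PC transaction

def relevant(record: LogRecord) -> bool:
    # The load balancing node only analytically processes log information
    # that carries the transaction tag information.
    return record.tag == "REBALANCE"

print(relevant(LogRecord("REBALANCE", "4567896128452", "1")))  # True
```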
  • It is to be understood that the above uses execution of one transaction in this solution to describe the log information desired to be obtained in this solution. The other field information may output more other information as desired for more flexible analysis on the user transaction in more dimensions. Second, for each operation in a transaction, including creation, read, update, deletion or the like, log information is generated in this solution by using the primary key recording format or using the primary key recording format and the secondary index recording format.
  • Further, since each node may generate corresponding log information when processing data, the load balancing node may receive the log information transmitted by each node, statistically analyze the log information to obtain a statistical result, and determine, based on the statistical result, whether the data migration condition is satisfied. If the data migration condition is not satisfied, the system is not desired to perform data migration between nodes. If the data migration condition is satisfied, the load balancing node determines data desired to be migrated and a node receiving the data desired to be migrated.
  • Specifically, the load balancing node determines the first data, and the first data is data desired to be migrated to the second node. That is, the first data and the second node may be determined. Since the second route assignment table includes the mapping relationship between the first primary key identity and the second partition identity, the second partition identity indicates the first node, and the first data uniquely indicated by the first primary key identity is desired to be migrated to the second node, the mapping relationship in the second route assignment table is desired to be updated. Based on this, the load balancing node replaces the second partition identity in the second route assignment table with the first partition identity to obtain the first route assignment table. That is, the obtained first route assignment table includes the mapping relationship between the first primary key identity and the first partition identity. The first partition identity is used for uniquely identifying the second node. Finally, the load balancing node transmits the first route assignment table to the first node and the second node, such that after receiving the first data, the first node or the second node may migrate or perform other processing on the first data in the manner described in the foregoing embodiments by using the mapping relationship in the first route assignment table.
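The table update described above, in which the second partition identity is replaced with the first partition identity for the migrated primary key, can be sketched as follows. The table representation and identifier strings are illustrative assumptions:

```python
def update_route_table(second_table, pk_id, first_partition_id):
    """Return the first route assignment table: a copy of the second route
    assignment table with pk_id remapped to the partition identity of the
    node receiving the migrated data."""
    first_table = dict(second_table)      # leave the second table untouched
    first_table[pk_id] = first_partition_id
    return first_table

# Second route assignment table: both keys currently map to the first node.
second_table = {"pk1": "partition_node1", "pk2": "partition_node1"}
# The first data (pk1) is desired to be migrated to the second node.
first_table = update_route_table(second_table, "pk1", "partition_node2")
print(first_table)  # {'pk1': 'partition_node2', 'pk2': 'partition_node1'}
```

The load balancing node would then transmit `first_table` to both nodes, so each resolves `pk1` to the second node from that point on.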
  • This embodiment of this disclosure provides a route assignment table update method. In the foregoing manner, after a route assignment table is obtained through initialization, in actual running of the system, load balancing may be desired, a hotspot may be eliminated, or there may be a failing node. In this case, data migration is performed between nodes. Data desired to be migrated and a node receiving the data desired to be migrated are determined based on a transaction identity. After a mapping relationship in the route assignment table is updated, an updated route assignment table is sent to each node, to ensure that each node may perform data processing based on the updated route assignment table and ensure processing accuracy of the node. Therefore, stability and data processing efficiency of the system are improved.
  • Further, in a process of executing each transaction, the load balancing node may obtain log information about each transaction. After the load balancing node completes analytically processing the log information, a cross-node transaction graph of 2PC transaction execution in the system may be constructed. First, it can be seen based on the foregoing embodiments that log information specifically used in this embodiment of this disclosure includes the starting transaction log information, the starting statement log information, the creation, read, update and deletion record log information, and the transaction committing log information or transaction rollback log information. For ease of understanding, exemplarily, an example in which the load balancing system includes the node 1, the node 2, a node 3, a node 4, and the load balancing node and is applied to a cross-node data interaction scenario is used for description.
  • First, for a user A, there is [100000001][1500][Beijing], where [100000001] represents a primary key identity of numerical information of the user A, [1500] represents specific numerical information of the user A, and [Beijing] represents a partition identity of the numerical information of the user A. For a user B, there is [075500567][300][Guangdong], where [075500567] represents a primary key identity of numerical information of the user B, [300] represents specific numerical information of the user B, and [Guangdong] represents a partition identity of the numerical information of the user B. During initialization, the numerical information of the user A is assigned to the node 1, and the numerical information of the user B is assigned to the node 3. In this case, when 300 is desired to be transferred from an account of the user A to an account of the user B, the following log information may be obtained:
    • [REBALANCE][4567896128452][1][BEGIN]
    • [REBALANCE][4567896128452][1][01][UPDATE]
    • [REBALANCE][4567896128452][1][100000001][BALANCE:-300][UPDATE]
    • [REBALANCE][4567896128452][3][02][UPDATE]
    • [REBALANCE][4567896128452][3][075500567][BALANCE:+300][UPDATE]
    • [REBALANCE][4567896128452][1][2PC][COMMIT].
  • First, it can be seen from "[2PC]" in "[REBALANCE][4567896128452][1][2PC][COMMIT]" in the log information that this transaction is a 2PC transaction. Secondly, it can be seen from a transaction identity "[4567896128452]" in the log information that the log information is all log information generated when the same transaction is executed. Further, it can be seen from the log information "[REBALANCE][4567896128452][1][100000001][BALANCE:-300][UPDATE]" and "[REBALANCE][4567896128452][3][075500567][BALANCE:+300][UPDATE]" that on the node 1, [300] in data indicated by a primary key identity [100000001] is desired to be migrated to data indicated by a primary key identity [075500567], and the data indicated by the primary key identity [075500567] is on the node 3. Thus, it can be seen that the node 1 is desired to deduct [300] from numerical information [1500] indicated by the primary key identity [100000001], and the node 3 is desired to add [300] to numerical information [300] indicated by the primary key identity [075500567]. After both the node 1 and the node 3 complete data committing, the numerical information indicated by the primary key identity [100000001] on the node 1 changes to [1200], and the numerical information indicated by the primary key identity [075500567] on the node 3 changes to [600].
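The analysis above can be reproduced mechanically. The sketch below parses the example log lines, applies the balance deltas, and detects the 2PC commit; the bracket format is taken from the example, while the regular expression and variable names are illustrative assumptions:

```python
import re

log_lines = [
    "[REBALANCE][4567896128452][1][BEGIN]",
    "[REBALANCE][4567896128452][1][01][UPDATE]",
    "[REBALANCE][4567896128452][1][100000001][BALANCE:-300][UPDATE]",
    "[REBALANCE][4567896128452][3][02][UPDATE]",
    "[REBALANCE][4567896128452][3][075500567][BALANCE:+300][UPDATE]",
    "[REBALANCE][4567896128452][1][2PC][COMMIT]",
]

# Initial numerical information: user A on node 1, user B on node 3.
balances = {"100000001": 1500, "075500567": 300}

# Only the creation/read/update/deletion record lines carry a BALANCE field.
record = re.compile(r"\[REBALANCE\]\[(\d+)\]\[(\d+)\]\[(\d+)\]\[BALANCE:([+-]\d+)\]")
for line in log_lines:
    m = record.match(line)
    if m:
        _txn_id, _node_id, pk_id, delta = m.groups()
        balances[pk_id] += int(delta)

# The commit line carrying [2PC] marks the transaction as a 2PC transaction.
is_2pc = any("[2PC][COMMIT]" in line for line in log_lines)
print(balances, is_2pc)  # {'100000001': 1200, '075500567': 600} True
```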
  • Further, the load balancing node may process the log information about the transaction to obtain FIG. 7. FIG. 7 is a schematic diagram of the cross-node transaction graph according to an embodiment of this disclosure. As shown in FIG. 7, G1 denotes the node 1, G2 denotes the node 2, G3 denotes the node 3, G4 denotes the node 4, G5 denotes a data node storing data indicated by the primary key identity [100000001] in the node G1, G6 denotes a data node storing data indicated by the primary key identity [075500567] in the node G3, and G7 denotes a cross-node edge. Based on this, the load balancing node determines a node sending out a transaction execution statement as a starting point of the edge (that is, the node G5 in FIG. 7) in the cross-node transaction graph, and determines a node receiving the transaction execution statement as an ending point of the edge (that is, the node G6 in FIG. 7). If the data indicated by the primary key identity [100000001] is a data node in the node G1, and the data indicated by the primary key identity [075500567] is a data node in the node G3, a directed arrow between the two data nodes represents an execution sequence as the edge in the cross-node transaction graph. If the edge crosses two nodes, that is, the cross-node edge G7 shown in FIG. 7, it indicates that the transaction indicated by the transaction identity [4567896128452] is a 2PC transaction.
  • Second, (100, 25) in the data node G5 storing the data indicated by the primary key identity [100000001] in the node G1 represents that the data node G5 processes 100 transactions, and 25 of the 100 transactions are 2PC transactions. Similarly, (50, 49) in the data node G6 storing the data indicated by the primary key identity [075500567] in the node G3 represents that the data node G6 processes 50 transactions, and 49 of the 50 transactions are 2PC transactions. Therefore, the cross-node transaction graph, shown in FIG. 7, in the load balancing system may be constructed in a manner similar to the foregoing by using log information generated by a computing node when executing multiple transactions.
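The per-data-node counters shown in FIG. 7 can be sketched as an accumulation over log records. The `(total transactions, 2PC transactions)` convention follows the figure; the counter structure itself is an illustrative assumption:

```python
from collections import defaultdict

# (node, primary key identity) -> [total transactions, 2PC transactions]
counters = defaultdict(lambda: [0, 0])

def record_transaction(node, pk_id, is_2pc):
    counters[(node, pk_id)][0] += 1
    if is_2pc:
        counters[(node, pk_id)][1] += 1

# Reproduce the figures from FIG. 7: data node G5 in node G1 processes
# 100 transactions (25 of them 2PC); data node G6 in node G3 processes
# 50 transactions (49 of them 2PC).
for i in range(100):
    record_transaction("G1", "100000001", is_2pc=i < 25)
for i in range(50):
    record_transaction("G3", "075500567", is_2pc=i < 49)

print(counters[("G1", "100000001")])  # [100, 25]
print(counters[("G3", "075500567")])  # [50, 49]
```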
  • The following describes various cases in which the load balancing node determines, by using log information about each transaction, whether the data migration condition is satisfied.
  • Optionally, based on the embodiment corresponding to FIG. 4, in an optional embodiment of the data migration method provided in this disclosure, the data migration condition is that a ratio of a 2PC transaction processing throughput to a total transaction processing throughput of a node is greater than a first preset threshold. The operation of determining that a data migration condition is satisfied specifically includes: the load balancing node obtains a 2PC transaction identity in a case that log information transmitted by the first node is received in a first preset period, the log information transmitted by the first node including the 2PC transaction identity, and the 2PC transaction identity indicating that the log information is generated after the first node processes a 2PC transaction; the load balancing node statistically obtains a total transaction processing throughput of the first node based on the log information transmitted by the first node; the load balancing node statistically obtains a 2PC transaction processing throughput of the first node based on the 2PC transaction identity; and the load balancing node determines, in a case that a ratio of the 2PC transaction processing throughput of the first node to the total transaction processing throughput of the first node is greater than the first preset threshold, that the data migration condition is satisfied.
  • In this embodiment, the data migration condition is that a ratio of a 2PC transaction processing throughput to a total transaction processing throughput of a node is greater than the first preset threshold. Based on this, if the load balancing node can receive, in the first preset period, the log information transmitted by the first node, it can be seen from the foregoing embodiments that the log information transmitted by the first node may include the first transaction identity. If the first node processes the 2PC transaction, the log information transmitted by the first node further includes the 2PC transaction identity. The 2PC transaction identity indicates that the log information is generated after the first node processes the 2PC transaction. Therefore, the load balancing node statistically obtains the total transaction processing throughput of the first node based on the log information transmitted by the first node. The total transaction processing throughput of the first node is a total quantity of transactions executed by the first node in the first preset period. Second, the load balancing node statistically obtains the 2PC transaction processing throughput of the first node based on the 2PC transaction identity. The 2PC transaction processing throughput of the first node is a total quantity of 2PC transactions executed by the first node in the first preset period. Then, the ratio of the 2PC transaction processing throughput of the first node to the total transaction processing throughput of the first node is calculated. If the ratio is greater than the first preset threshold, the load balancing node determines that the data migration condition is satisfied.
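The throughput-ratio condition described above can be sketched as follows. The record layout (a transaction identity plus a 2PC flag) and the helper name `migration_needed` are illustrative assumptions:

```python
def migration_needed(records, threshold=0.10):
    """records: log records received from one node in the first preset period;
    each record is a dict carrying a txn_id and an is_2pc flag. Returns True
    when the ratio of the 2PC transaction processing throughput to the total
    transaction processing throughput exceeds the first preset threshold."""
    txns = {r["txn_id"] for r in records}                   # total throughput
    twopc = {r["txn_id"] for r in records if r["is_2pc"]}   # 2PC throughput
    if not txns:
        return False
    return len(twopc) / len(txns) > threshold

# Node G1 from FIG. 7: 100 transactions, 25 of them 2PC -> ratio 0.25 > 0.10.
records = [{"txn_id": i, "is_2pc": i < 25} for i in range(100)]
print(migration_needed(records, threshold=0.10))  # True
```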
  • Specifically, the foregoing preset period may be, for example, 60 seconds or 5 minutes. Whether the preset period is desired to be adjusted is determined according to a running status of the system. Alternatively, the preset period may be adjusted on line as desired by the user. The preset period is mainly to make a compromise between a load balancing adjustment frequency and an adjustment delay acceptable by the user, so as to achieve a best load balancing effect. Obtaining the load balancing adjustment frequency also requires analysis and organization of a user load of each node.
  • Second, the first preset threshold may be 5%, 10%, or the like. In an example in which the first preset threshold is 5%, when a throughput of data related to a first transaction type exceeds 5% of a data processing throughput of each node, it is determined that the data migration condition is satisfied. In this case, load balancing adjustment is desired to be performed on the entire system by taking the first transaction type as a unit, to reduce the throughput of the data related to the first transaction type during execution within a range of the first preset threshold (5%). Stability of the load balancing adjustment may be improved based on the first preset threshold. In general, when the distributed system is optimized to a specific extent, some data associated with a transaction cannot be assigned to one computing node for execution, and then a 2PC transaction is generated. That is, data carrying a same transaction identity may be distributed in different nodes, and a transaction type indicated by the transaction identity is a 2PC transaction. Based on this, continuing to perform optimization after a specific data throughput in the distributed system is reached may cause the processing capability of each node to fluctuate due to continuous migration of data between the nodes, and further reduce data processing efficiency of the node. Therefore, the load balancing node may determine, during load balancing by using the first preset threshold, a data migration degree at which data migration may be stopped. In this way, a load balancing effect is improved.
  • For ease of understanding, FIG. 7 is used as an example, and the first preset threshold is 10%. In this case, for the node G1, a total transaction processing throughput of the node G1 is 100, and a 2PC transaction processing throughput of the node G1 is 25. A ratio of the 2PC transaction processing throughput of the node G1 to the total transaction processing throughput of the node G1 is 25%, exceeding the first preset threshold, so that it is determined that load balancing is desired. Second, for the node G3, a total transaction processing throughput of the node G3 is 50, while a 2PC transaction processing throughput is 49. A ratio of the 2PC transaction processing throughput of the node G3 to the total transaction processing throughput of the node G3 is 98%, also exceeding the first preset threshold, so that it is determined that load balancing is desired.
  • It may be understood that both the preset period and the first preset threshold that are described in the foregoing example are desired to be flexibly determined according to the running status of the system and an actual running requirement of the system, and specific numerical values are not to be understood as a limitation on this disclosure.
  • This embodiment of this disclosure provides a method for determining that the data migration condition is satisfied. In the foregoing manner, a data throughput during execution of a same transaction is determined based on a transaction identity, and whether data migration is desired may be determined based on the first preset threshold, thereby improving feasibility of this solution. Second, the load balancing node may determine, during load balancing by using the first preset threshold, a data migration degree at which data migration may be stopped. In this way, the load balancing effect is improved. That is, fluctuation of the processing capability of each node due to continuous migration of data between the nodes is avoided, thereby improving the data processing efficiency of the node.
  • Optionally, based on the embodiment corresponding to FIG. 4, in an optional embodiment of the data migration method provided in this disclosure, the data migration condition is that total memory usage of a node is greater than a second preset threshold. The operation of determining that a data migration condition is satisfied specifically includes: receiving, in a second preset period, total memory usage of the first node transmitted by the first node, the total memory usage indicating a memory resource occupied by multiple transactions processed by the first node; and determining, by the load balancing node in a case that the total memory usage of the first node is greater than the second preset threshold, that the data migration condition is satisfied.
  • In this embodiment, the data migration condition is that total memory usage of a node is greater than the second preset threshold, and is used for determining whether the node is a hotspot. Based on this, if the load balancing node can receive, in the second preset period, the total memory usage of the first node transmitted by the first node, the total memory usage indicating the memory resource occupied by the multiple transactions processed by the first node, and the total memory usage of the first node is greater than the second preset threshold, the load balancing node determines that the data migration condition is satisfied.
  • It is to be understood that, in actual applications, whether the node is a hotspot may be determined in another manner, for example, a ratio of a total quantity of transactions processed by the node to a total quantity of transactions processed by each computing node in the system, a memory resource utilization of the node, or management on another resource utilization in the node. Therefore, a manner provided in this embodiment, in which the node is determined as a hotspot in response to determining that a resource utilization of the node is greater than the second preset threshold, is not to be understood as the only implementation of determining a hotspot.
  • Specifically, the second preset period may be 60 seconds or 5 minutes, and may have duration the same as or different from that of the first preset period. A specific numerical value of the second preset period is not limited herein. Second, the second preset threshold may be 85%, 90%, or the like. In an example in which the second preset threshold is 85%, the load balancing system includes the node 1 and the node 2, total memory usage of the node 1 is 95%, and total memory usage of the node 2 is 60%. In this case, the node 1 may be determined as a hotspot, and it is determined that the data migration condition is satisfied. Subsequent data migration requires data on the node 1 to be migrated to the node 2 or another non-hotspot node.
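The threshold check described above can be sketched in a few lines. This is an illustrative sketch, not the patent's implementation; the function and variable names and the dictionary-based usage report are assumptions:

```python
# Hotspot check: a node satisfies the data migration condition when its
# total memory usage, reported each second preset period, exceeds the
# second preset threshold. All names here are illustrative.

SECOND_PRESET_THRESHOLD = 0.85  # e.g. 85%

def is_hotspot(total_memory_usage: float,
               threshold: float = SECOND_PRESET_THRESHOLD) -> bool:
    """Return True if the data migration condition is satisfied."""
    return total_memory_usage > threshold

# Example from the text: node 1 reports 95%, node 2 reports 60%.
usage = {"node1": 0.95, "node2": 0.60}
hotspots = [n for n, u in usage.items() if is_hotspot(u)]
```

With the example values, only node 1 exceeds the 85% threshold and would be selected as the migration source.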
  • For ease of understanding, refer to FIG. 8. FIG. 8 is a schematic flowchart of determining that the data migration condition is satisfied according to an embodiment of this disclosure. As shown in FIG. 8, in step H1, the load balancing node receives, in the second preset period, the total memory usage of the first node transmitted by the first node. In step H2, the load balancing node determines whether the total memory usage of the first node is greater than the second preset threshold. If the total memory usage of the first node is not greater than the second preset threshold, step H3 is performed; or if the total memory usage of the first node is greater than the second preset threshold, step H4 is performed. In step H3, the load balancing node receives, in a next second preset period, total memory usage of the first node transmitted by the first node, and processes it based on a method similar to step H1 and step H2. In step H4, since the total memory usage of the first node is greater than the second preset threshold, that is, the first node is a hotspot, the load balancing node determines that the data migration condition is satisfied.
  • Therefore, in step H5, the load balancing node determines, based on the manner described in the foregoing embodiments, the data that needs to be migrated and a node to receive the data, and updates a mapping relationship in a route assignment table. Further, in step H6, the load balancing node sends an updated route assignment table to each node in the system, such that the first node migrates, based on the updated route assignment table, some data borne by the first node. After data migration is completed, the load balancing node may further continue to determine whether current total memory usage of the first node is greater than the second preset threshold by using the method in step H2, until the hotspot is eliminated. It is to be understood that the example in FIG. 8 is used only to understand this solution, and a specific process and implementation steps may be flexibly adjusted according to an actual situation.
  • This embodiment of this disclosure provides another method for determining that the data migration condition is satisfied. In the foregoing manner, the total memory usage of the first node is obtained, and whether data migration is needed may be determined based on the second preset threshold, thereby improving feasibility of this solution. Second, when load balancing is performed, if there is a hotspot in the system, the hotspot may be eliminated by data migration, thereby ensuring stability of the load balancing system.
  • Optionally, based on the embodiment corresponding to FIG. 4, in an optional embodiment of the data migration method provided in this disclosure, the load balancing system further includes a third node. The log information transmitted by the first node further includes the first transaction identity and the first primary key identity. The data migration method further includes that: the load balancing node receives log information transmitted by the second node and log information transmitted by the third node, the log information transmitted by the second node including the first transaction identity and the first primary key identity, and the log information transmitted by the third node including the first transaction identity and the first primary key identity. That the load balancing node determines the first data and the second node specifically includes that: the load balancing node collects statistics on the log information transmitted by the first node, the log information transmitted by the second node, and the log information transmitted by the third node, to obtain that a quantity of times the first node initiates the first transaction to the second node is L and that a quantity of times the first node initiates the first transaction to the third node is M, L and M being integers greater than or equal to 1; and the load balancing node determines the second node in a case that L is greater than M, and determines the first data by using the second route assignment table based on the first primary key identity.
  • In this embodiment, the load balancing system further includes the third node, and the log information transmitted by the first node further includes the first transaction identity and the first primary key identity. Based on this, the load balancing node receives the log information transmitted by the second node and the log information transmitted by the third node. In this case, the log information transmitted by the second node includes the first transaction identity and the first primary key identity, and the log information transmitted by the third node includes the first transaction identity and the first primary key identity. Thus, it can be seen that the first node, the second node, and the third node all have processed the first transaction indicated by the first transaction identity, and steps related to the first data indicated by the first primary key identity are completed during processing of the first transaction.
  • Further, the load balancing node collects statistics on the log information transmitted by the first node, the log information transmitted by the second node, and the log information transmitted by the third node, to obtain that the quantity of times the first node initiates the first transaction to the second node is L and that the quantity of times the first node initiates the first transaction to the third node is M, L and M being integers greater than or equal to 1. If L is greater than M, it indicates that the first node initiates the first transaction to the second node more frequently than to the third node. Therefore, the second node is determined as a node capable of receiving migrated data. Then, after data of the 2PC transaction is migrated to the second node, a quantity of interactions for the 2PC transaction between the first node and the second node may be reduced by L.
  • Exemplarily, the quantity of times the first node initiates the first transaction to the second node is 100, and the quantity of times the first node initiates the first transaction to the third node is 50. If all data related to the first transaction in the first node is migrated to the third node, 50 interactions for the data between the first node and the third node may be eliminated, but the 100 interactions for the data between the first node and the second node are not eliminated. Therefore, if all the data related to the first transaction in the first node is migrated to the second node, the 100 interactions for the data between the first node and the second node may be eliminated. Although the 50 interactions for the data between the first node and the third node are not eliminated, compared with the foregoing migration manner, this manner has the advantage that 2PC transactions are maximally eliminated, thereby reducing a system load.
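The selection logic in this example can be sketched as follows. The tuple-based log format and the function name are assumptions for illustration; the text only specifies that initiation counts toward candidate nodes are compared and the most frequent target is chosen:

```python
from collections import Counter

def choose_receiver(log_entries):
    """log_entries: (initiator, target, transaction_id) tuples collected
    from the log information of all nodes. Returns the target node that
    the first node initiated the transaction to most often."""
    counts = Counter(target for _, target, _ in log_entries)
    # Migrating to the most frequent target eliminates the most
    # cross-node (2PC) interactions.
    return counts.most_common(1)[0][0]

# 100 initiations to node 2 (L) and 50 to node 3 (M), as in the example.
logs = [("node1", "node2", "txn1")] * 100 + [("node1", "node3", "txn1")] * 50
receiver = choose_receiver(logs)
```

Since L = 100 is greater than M = 50, node 2 is chosen, eliminating 100 interactions rather than 50.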
  • Second, this embodiment may also provide another method for determining to-be-migrated data. Refer back to FIG. 7. The load balancing node finds all edges crossing computing nodes in FIG. 7, calculates a difference between an amount of all data processed by one data node and an amount of cross-node data processed by the data node, and determines data requiring cross-node processing in a data node corresponding to a maximum difference as the first data. For example, for the data node G5 storing the data indicated by the primary key identity [100000001] in the node G1 in FIG. 7, an amount of all data processed by the data node G5 is 100, and an amount of cross-node data processed by the data node G5 is 25. Then, it may be obtained that the difference is 75 (100-25). For the data node G6 storing the data indicated by the primary key identity [075500567] in the node G3 in FIG. 7, an amount of all data processed by the data node G6 is 50, and an amount of cross-node data processed by the data node G6 is 49. Then, it may be obtained that the difference is 1 (50-49). The foregoing calculation may be performed for each data node in the system that holds cross-node data, and a node corresponding to a maximum difference needs to migrate data on this node to another node. The corresponding data in the node is determined based on a transaction identity.
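Following the arithmetic of the worked example (amount of all data minus amount of cross-node data, with the maximum difference selected), a minimal sketch might look like this; the dictionary-based interface and names are assumptions:

```python
def pick_migration_source(stats):
    """stats: {data_node: (total_processed, cross_node_processed)}.
    Following the worked example above, compute total - cross_node for
    each data node and return the node with the maximum difference,
    together with all computed differences."""
    diffs = {node: total - cross for node, (total, cross) in stats.items()}
    return max(diffs, key=diffs.get), diffs

# Data nodes G5 and G6 from the example: (100, 25) and (50, 49).
source, diffs = pick_migration_source({"G5": (100, 25), "G6": (50, 49)})
```

With the example figures this reproduces the differences 75 for G5 and 1 for G6.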
  • For ease of understanding, refer to FIG. 9. FIG. 9 is a schematic flowchart of determining to-be-migrated data according to an embodiment of this disclosure. As shown in FIG. 9, in step I1, log information transmitted by each node in a preset period is received in the manner described in the foregoing embodiments to determine that there is a node that needs to perform data migration, that is, determine that the data migration condition is satisfied. In step I2, the to-be-migrated data is determined by using the method in this embodiment. In step I3, multiple nodes capable of processing the to-be-migrated data are determined, and a node to receive the to-be-migrated data is determined by using the method in this embodiment. Then, in step I4, a mapping relationship in a route assignment table is updated. In step I5, a route assignment table including an updated mapping relationship is transmitted to all nodes in the load balancing system, such that the nodes perform data migration based on the route assignment table including the updated mapping relationship. A method for data migration between the nodes is similar to that described in the foregoing embodiments, and will not be elaborated herein. Then, in step I6, the load balancing node needs to redetermine whether the data migration condition is satisfied. If the data migration condition is satisfied, steps similar to step I1 to step I4 are performed; or if the data migration condition is not satisfied, step I7 is performed. That is, the load balancing node receives log information transmitted by each node in a next preset period, and processes the log information based on a manner similar to that in the foregoing embodiments. It is to be understood that the example in FIG. 9 is used only to understand this solution, and a specific process and implementation steps may be flexibly adjusted according to an actual situation.
  • This embodiment of this disclosure provides another data migration method. In the foregoing manner, quantities of 2PC transactions processed by different nodes are determined based on quantities of times the nodes process data in the same transaction, a node that processes a larger amount of data of 2PC transactions is determined as a node that needs to migrate data, and the to-be-migrated data on this node is determined based on a transaction identity. In this way, 2PC transactions are maximally eliminated, thereby improving the load balancing effect, that is, improving reliability of data migration in this solution.
  • Optionally, based on the embodiment corresponding to FIG. 4, in an optional embodiment of the data migration method provided in this disclosure, the data migration condition is that a node fails. The load balancing system further includes a third node. The operation of determining that a data migration condition is satisfied specifically includes: the load balancing node determines, in a case that the first node does not transmit log information of the first node to the load balancing node in a first preset period, that the data migration condition is satisfied. The data migration method further includes that: the load balancing node receives, in the first preset period, log information transmitted by the second node and log information transmitted by the third node, the log information transmitted by the second node including the first transaction identity and the first primary key identity, and the log information transmitted by the third node including the first transaction identity and the first primary key identity. That the load balancing node determines the first data and the second node specifically includes that: the load balancing node obtains a partition identity set corresponding to the first node, the partition identity set corresponding to the first node including the first partition identity; the load balancing node obtains, based on the partition identity set corresponding to the first node, a primary key identity set corresponding to the first node, the primary key identity set corresponding to the first node including the first primary key identity; and the load balancing node determines the first data and the second node based on the first primary key identity, the first transaction identity, the log information transmitted by the second node, and the log information transmitted by the third node.
  • In this embodiment, the data migration condition is that a node fails. Based on this, if the load balancing node does not receive, in the first preset period, the log information of the first node transmitted by the first node, that is, the first node may fail, and cannot generate the corresponding log information, it is determined that the data migration condition is satisfied. Second, if the load balancing node can receive, in the first preset period, the log information transmitted by the second node and the log information transmitted by the third node, it indicates that the second node and the third node are both nodes that operate normally. Specifically, the log information transmitted by the second node includes the first transaction identity and the first primary key identity, and the log information transmitted by the third node includes the first transaction identity and the first primary key identity.
  • Further, the load balancing node obtains the partition identity set corresponding to the first node. In this case, the partition identity set corresponding to the first node includes the first partition identity. Then, the primary key identity set corresponding to the first node is obtained based on the partition identity set corresponding to the first node. In this case, the primary key identity set corresponding to the first node includes the first primary key identity. Then, the first data and the second node may be determined based on the first primary key identity, the first transaction identity, the log information transmitted by the second node, and the log information transmitted by the third node. The load balancing node is prevented from considering the failing node as a node capable of bearing data, thereby improving reliability of this solution. The following specifically describes how to determine the first data and the second node based on the foregoing information.
  • Optionally, based on the embodiment corresponding to FIG. 4, in an optional embodiment of the data migration method provided in this disclosure, that the load balancing node determines the first data and the second node based on the first transaction identity, the log information transmitted by the second node, and the log information transmitted by the third node specifically includes that: the load balancing node determines, by using the second route assignment table, the first data based on the first primary key identity; the load balancing node collects statistics on the log information transmitted by the second node and the log information transmitted by the third node, to obtain that a quantity of times the second node initiates the first transaction is Q and that a quantity of times the third node initiates the first transaction is P, Q and P being integers greater than or equal to 1; and the load balancing node determines the second node in a case that Q is greater than P, and determines the first data by using the second route assignment table based on the first primary key identity.
  • In this embodiment, since the first node is a failing node, all data borne by the first node needs to be migrated. Therefore, the load balancing node determines, by using the second route assignment table, the first data based on the first primary key identity. Second, the load balancing node collects, in a manner similar to that in the foregoing embodiments, statistics on the log information transmitted by the second node and the log information transmitted by the third node, to obtain that the quantity of times the second node initiates the first transaction is Q and that the quantity of times the third node initiates the first transaction is P, Q and P being integers greater than or equal to 1. When Q is greater than P, it indicates that a quantity of data interactions performed by the second node is larger. Therefore, the load balancing node determines the second node as a node capable of bearing the first data.
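Comparing Q and P reduces to picking the surviving node with the largest initiation count for the failed node's transaction. A minimal sketch, with an assumed log format (node name mapped to the list of transaction initiations found in its log):

```python
def choose_bearer(surviving_logs):
    """surviving_logs: {node: [transaction ids it initiated]}.
    The surviving node that initiated the first transaction most often
    (Q > P in the text) is chosen to bear the migrated data."""
    counts = {node: len(txns) for node, txns in surviving_logs.items()}
    return max(counts, key=counts.get)

# Second node initiated the first transaction Q = 3 times,
# third node P = 1 time, so the second node bears the data.
bearer = choose_bearer({"node2": ["txn1"] * 3, "node3": ["txn1"]})
```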
  • In actual applications, the load balancing node may add a node to migrate the data on the failing first node to the new node, and then update the route assignment table, such that the corresponding data may be processed by the new node. This manner is similar to the manner of initializing the route assignment table, and updating a related mapping of the partition identity is also similar to that described in the foregoing embodiments. Therefore, this is not limited herein.
  • This embodiment of this disclosure provides another data migration method. In the foregoing manner, when a node fails, all data on the failing node needs to be migrated. A node to which each piece of data may be migrated is determined based on a mapping relationship between a primary key identity of each piece of data and a transaction identity, and a node that processes fewer 2PC transactions is determined, based on quantities of times different nodes process data in a transaction, as a node capable of receiving the data. After the node obtains the migrated data, a hotspot or a system load imbalance is avoided. Therefore, the load balancing effect is improved.
  • Optionally, based on the embodiment corresponding to FIG. 4, in an optional embodiment of the data migration method provided in this disclosure, that the load balancing node obtains a second route assignment table specifically includes that: the load balancing node obtains a first primary key identity set, the first primary key identity set including multiple primary key identities, and one primary key identity being used for uniquely identifying the first data; the load balancing node divides the first primary key identity set into S second primary key identity sets, each primary key identity in the second primary key identity set being managed by a same node, and S being an integer greater than or equal to 1; the load balancing node assigns the S second primary key identity sets to S nodes; the load balancing node determines, from the S nodes, the first node managing a third primary key identity set, the third primary key identity set including the first primary key identity; the load balancing node obtains a partition identity set corresponding to the first node; the load balancing node determines, from the partition identity set, the second partition identity corresponding to the first data; and the load balancing node establishes the mapping relationship between the first primary key identity and the second partition identity, and generates the second route assignment table.
  • In this embodiment, how the load balancing node obtains the second route assignment table through initialization is specifically described. For ease of understanding, optional partitioning modes in this solution are first described. The first is user-specified partitioning. The second is load-balancing-based partitioning. The following describes the two partitioning modes respectively.
  • 1: User-specified partitioning
  • User-specified partitioning requires a partition identity to be specified for each user table, that is, a partition identity is determined for each column of data in a user table of a user. The user notifies the load balancing node of a transaction assignment logic in advance, such that the load balancing node may perform load balancing most efficiently. For example, if the user is a numerical information account, the user may specify partitioning to be performed according to a province to which the numerical information account belongs. In this case, data of numerical information accounts in the same province is all managed by a same node. It is generally considered that most transactions related to numerical information may be directly executed in numerical information accounts in the same province. In this way, taking the province as a partition identity may implement load balancing processing well.
  • 2: Load-balancing-based partitioning
  • Load-balancing-based partitioning needs to be based on a primary key identity corresponding to each piece of data, and is a partitioning mode in which a ratio of a throughput of the 2PC transaction in the entire system is specified based on a running effect of load balancing and adjustment is performed according to the ratio. Since transaction processing is generally recorded based on the primary key identity corresponding to each piece of data, a node in which data is located or a transaction corresponding to the data may be determined by using a mapping relationship during running in a transaction dimension.
  • Specifically, a user-specified parameter "USER_SPECIFIED_PARITITON_KEYS" is used for indicating that the user specifies a mapping relationship between a primary key identity and a partition identity, and a load balancing parameter "USER_SPECIFIED_ALGORITHM" is used for instructing partitioning to be performed based on load balancing. Therefore, when the load balancing system is started, a program may detect whether there is the user-specified parameter "USER_SPECIFIED_PARITITON_KEYS". If there is the user-specified parameter "USER_SPECIFIED_PARITITON_KEYS", partitioning is performed in the user-specified partitioning mode, thereby establishing the mapping relationship between the primary key identity and the partition identity, and generating the second route assignment table, to complete initialization. Second, the program may also detect whether there is the load balancing parameter "USER_SPECIFIED_ALGORITHM". If there is the load balancing parameter "USER_SPECIFIED_ALGORITHM", partitioning is performed in the load-balancing-based partitioning mode. The following describes in detail how to perform partitioning based on load balancing.
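The startup detection described above can be sketched as a simple parameter check. The parameter names are quoted verbatim from the text; the dictionary-based interface and the return values are illustrative assumptions:

```python
def choose_partitioning_mode(params: dict) -> str:
    """If the user-specified parameter is present, use user-specified
    partitioning; otherwise, if the load balancing parameter is present,
    partition based on load balancing."""
    if "USER_SPECIFIED_PARITITON_KEYS" in params:
        return "user-specified"
    if "USER_SPECIFIED_ALGORITHM" in params:
        return "load-balancing"
    raise ValueError("no partitioning parameter supplied")

# Only the load balancing parameter is supplied at startup here.
mode = choose_partitioning_mode({"USER_SPECIFIED_ALGORITHM": True})
```

Note that, as in the text, the user-specified parameter takes precedence when both are present.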
  • The load balancing node obtains the first primary key identity set. The first primary key identity set includes the multiple primary key identities. One primary key identity is used for uniquely identifying the first data. In general, the initialization scenario is one in which a node is used for the first time and data is imported, in which the node completes data migration, or the like. In this case, the load balancing system has started importing the data, and each piece of data corresponds to a primary key identity uniquely identifying the data. Therefore, primary key identities corresponding to all data may be obtained, and the first primary key identity set including the multiple primary key identities is further obtained. Then, the load balancing node needs to determine how many nodes are available in the load balancing system, and then equally divides the first primary key identity set into a corresponding quantity of second primary key identity sets. Based on this, the load balancing node divides the first primary key identity set into the S second primary key identity sets. Each primary key identity in the second primary key identity set is managed by a same node. S is an integer greater than or equal to 1. Then, the load balancing node assigns the S second primary key identity sets to the S nodes, such that each node may manage the data identified by multiple primary key identities in the second primary key identity set.
  • Establishment of the mapping relationship between the first primary key identity and the second partition identity is used as an example. A second primary key identity set including the first primary key identity may be first determined from the S second primary key identity sets. The second primary key identity set including the first primary key identity is determined as the third primary key identity set. Since the S second primary key identity sets are assigned to the S nodes, the first node managing the third primary key identity set may further be determined. Further, the load balancing node obtains a partition identity set corresponding to the first node, and determines the second partition identity corresponding to the first data from partition identities which are in the partition identity set corresponding to the first node and for which mapping relationships are not established. It is to be understood that, after this step is completed, mapping relationships cannot be established between other data and the second partition identity, unless the first data no longer corresponds to the second partition identity. Then, the load balancing node may establish the mapping relationship between the first primary key identity and the second partition identity. It is to be understood that, for another data identity, a mapping relationship between a primary key identity and a partition identity may also be established in a similar manner, thereby generating the second route assignment table.
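The initialization steps above (divide the first primary key identity set into S second sets, assign one set per node, then map each primary key identity to an unused partition identity of its node) can be sketched as follows. All names, the round-robin division, and the data layout are illustrative assumptions, not the patent's implementation:

```python
def initialize_route_table(primary_keys, nodes, partitions_per_node):
    """Divide the primary key identities into S roughly equal sets
    (S = number of nodes), assign one set per node, then map each
    primary key identity to an unused partition identity of its node."""
    s = len(nodes)
    # Roughly equal division of the first primary key identity set.
    subsets = [primary_keys[i::s] for i in range(s)]
    route_table = {}
    for node, subset in zip(nodes, subsets):
        free_partitions = iter(partitions_per_node[node])
        for pk in subset:
            # Each partition identity is mapped to at most one primary
            # key identity, as required above.
            route_table[pk] = (node, next(free_partitions))
    return route_table

# Primary key identities from FIG. 7's examples plus two placeholders.
table = initialize_route_table(
    ["100000001", "075500567", "200300400", "300400500"],
    ["G1", "G3"],
    {"G1": ["p1", "p2"], "G3": ["p3", "p4"]},
)
```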
  • In actual applications, all the data imported in the system is processed as above, thereby generating the second route assignment table. The second route assignment table includes mapping relationships between primary key identities of all the data and partition identities. The following describes, in detail based on FIG. 10, how to obtain the route assignment table through initialization, so as to further understand the initialization steps in this solution. FIG. 10 is a schematic flowchart of obtaining the second route assignment table according to an embodiment of this disclosure. As shown in FIG. 10, in step J1, load balancing is started. A scenario in which load balancing is started and initialization is needed includes a case in which the node is used for the first time and the data is imported, or the node completes data migration. This is not limited herein. In step J2, a corresponding parameter is loaded for initialization, that is, input of the parameter corresponding to the load balancing node is completed, and an initialization phase is entered. Input of the parameter may include inputting the load balancing parameter "USER_SPECIFIED_ALGORITHM" or the user-specified parameter "USER_SPECIFIED_PARITITON_KEYS". It is to be understood that the input parameter in this embodiment necessarily includes "USER_SPECIFIED_ALGORITHM", but does not necessarily include "USER_SPECIFIED_PARITITON_KEYS". In step J3, whether the user specifies the partition identity is determined, that is, whether the user-specified parameter "USER_SPECIFIED_PARITITON_KEYS" is detected is determined. If the user-specified parameter "USER_SPECIFIED_PARITITON_KEYS" is detected, it is determined that the user specifies the partition identity, and step J4 is performed.
In step J4, a primary key identity and a partition identity of each user table are collected, and consistent hash division is performed on the partition identities according to node values corresponding to the nodes to obtain a node specifically indicated by each partition identity. After identity information of all user tables is collected, mapping relationships are established based on the primary key identity and the partition identity of each user table. Therefore, in step J5, the second route assignment table is generated.
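Step J4's consistent hash division might be sketched as a minimal hash ring. The ring construction below is a generic consistent-hashing illustration under assumed names, not the patent's specific algorithm:

```python
import hashlib

def _ring_position(key: str) -> int:
    # A stable hash so placement is deterministic across runs
    # (Python's built-in hash() is salted per process).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def consistent_assign(partition_ids, nodes):
    """Minimal consistent-hash sketch: each partition identity is owned
    by the first node clockwise from its position on the ring."""
    ring = sorted((_ring_position(n), n) for n in nodes)
    assignment = {}
    for pid in partition_ids:
        pos = _ring_position(pid)
        owner = next((n for p, n in ring if p >= pos), ring[0][1])
        assignment[pid] = owner
    return assignment

assignment = consistent_assign(["p1", "p2", "p3"], ["node1", "node2"])
```

A practical advantage of consistent hashing here is that adding or removing one node remaps only the partition identities adjacent to it on the ring, rather than reshuffling every mapping.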
  • If the user-specified parameter "USER_SPECIFIED_PARITITON_KEYS" is not detected, it is determined that the user does not specify the partition identity, and step J6 is performed. In step J6, load-balancing-based partitioning is started. Specifically, a related data structure is first initialized. Partition information is initialized for each user table, and statistical primary key information is collected for each user table. Therefore, in step J7, the first primary key identity set is obtained. For ease of understanding this solution, the following uses establishment of the mapping relationship between the first primary key identity and the second partition identity as an example for description. Based on this, in step J8, the first primary key identity set is divided into the S second primary key identity sets, and the S second primary key identity sets are assigned to the S nodes. In step J9, the first node managing the third primary key identity set is determined from the S nodes, and the partition identity set corresponding to the first node is obtained. Based on this, in step J10, the second partition identity corresponding to the first data is determined from the partition identity set. Processing of steps J8 to J10 is performed on each row of data in each user table, and then the mapping relationships between the primary key identities and the partition identities are established. In this way, the second route assignment table may be generated in step J5.
  • It is to be understood that the example in FIG. 10 is used only to explain a process in which the route assignment table is obtained through initialization, and is not to be understood as a limitation on a specific process in this solution. Second, a specific implementation has been described in the foregoing embodiments, and will not be elaborated herein.
  • This embodiment of this disclosure provides a method for obtaining the route assignment table through initialization. In the foregoing manner, the route assignment table is obtained in different initialization manners. Therefore, feasibility and flexibility of this solution are improved.
  • Optionally, based on the embodiment corresponding to FIG. 4, in an optional embodiment of the data migration method provided in this disclosure, that the load balancing node obtains a second route assignment table specifically includes that: the load balancing node obtains a third route assignment table; the load balancing node receives a data addition instruction, the data addition instruction carrying the mapping relationship between the first primary key identity and the second partition identity; the load balancing node obtains the mapping relationship between the first primary key identity and the second partition identity according to the data addition instruction; and the load balancing node adds the mapping relationship between the first primary key identity and the second partition identity to the third route assignment table to obtain the first route assignment table.
  • In this embodiment, the load balancing node obtains the third route assignment table. The third route assignment table may be a route assignment table obtained through initialization, or a route assignment table obtained by updating the mapping relationship. This is not specifically limited herein. When a new user account gets online and related data of a new transaction needs to be added, the load balancing node receives the data addition instruction. A user table needs to be created for the new user account. The user table specifically includes multiple primary key identities corresponding to the related data of the new transaction and the nodes to which the user specifies that the related data of the new transaction belongs in the user table (that is, partition identities are obtained). If the related data of the new transaction includes the first data, the data addition instruction carries the mapping relationship between the first primary key identity and the second partition identity. Clearly, the load balancing node may obtain the mapping relationship between the first primary key identity and the second partition identity according to the data addition instruction, and add the mapping relationship between the first primary key identity and the second partition identity to the third route assignment table to obtain the first route assignment table. That is, the related data of the new transaction is added by using mapping relationships between the primary key identities and the partition identities that are specified by the user.
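Handling the data addition instruction amounts to merging the carried mapping into the existing table. A sketch under an assumed instruction format (the dictionary layout and names are illustrative):

```python
def apply_data_addition(route_table: dict, instruction: dict) -> dict:
    """The instruction carries primary-key-identity -> partition-identity
    mappings; adding them to the existing (third) route assignment table
    yields the updated (first) table. The original table is not mutated."""
    updated = dict(route_table)
    updated.update(instruction["mappings"])
    return updated

third_table = {"075500567": "partition_3"}
first_table = apply_data_addition(
    third_table, {"mappings": {"100000001": "partition_7"}}
)
```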
  • Alternatively, if the user does not specify the mapping relationships, the load balancing node may distribute the related data of the new transaction uniformly across different nodes, and then update the existing third route assignment table in a manner similar to the foregoing initialization. In this way, the first route assignment table may be obtained.
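The addition flow described above can be sketched as follows. This is an illustrative sketch only, not an implementation from the disclosure; the class name `RouteTable` and the instruction shape are assumptions made for the example.

```python
# Illustrative sketch: a route assignment table maintained by the load
# balancing node, updated on a data addition instruction that carries
# primary-key-identity -> partition-identity mappings.
class RouteTable:
    def __init__(self, mappings=None):
        # primary key identity -> partition identity
        self.mappings = dict(mappings or {})

    def handle_addition(self, instruction):
        """Add the mappings carried by a data addition instruction."""
        for pk_id, partition_id in instruction["mappings"].items():
            self.mappings[pk_id] = partition_id
        return self

# The third route assignment table, then a data addition instruction
# for a new user's transaction data, yielding the first table.
table = RouteTable({"pk_1": "part_A"})
table.handle_addition({"mappings": {"pk_2": "part_B"}})
assert table.mappings == {"pk_1": "part_A", "pk_2": "part_B"}
```

After the update, the node would still need to propagate the new table to the other nodes, as noted later in the description.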
  • Optionally, based on the embodiment corresponding to FIG. 4, in an optional embodiment of the data migration method provided in this disclosure, that the load balancing node obtains a second route assignment table specifically includes that: the load balancing node obtains a fourth route assignment table, the fourth route assignment table including a mapping relationship between a second primary key identity and the first partition identity; the load balancing node receives a data deletion instruction, the data deletion instruction carrying the second primary key identity; the load balancing node obtains the second primary key identity according to the data deletion instruction; the load balancing node determines, from the fourth route assignment table, the mapping relationship between the second primary key identity and the first partition identity based on the second primary key identity; and the load balancing node deletes the mapping relationship between the second primary key identity and the first partition identity in the fourth route assignment table to obtain the second route assignment table.
  • In this embodiment, the load balancing node obtains the fourth route assignment table. The fourth route assignment table includes the mapping relationship between the second primary key identity and the first partition identity. The fourth route assignment table may be a route assignment table obtained through initialization, or a route assignment table obtained by updating the mapping relationship. This is not specifically limited herein. When a user gets off-line, or the user stops a specific transaction, the load balancing node receives the data deletion instruction. The data deletion instruction carries the second primary key identity. The second primary key identity indicates the data in the transaction that the user desires to stop. Accordingly, the load balancing node may obtain the second primary key identity according to the data deletion instruction, determine, from the fourth route assignment table, the mapping relationship between the second primary key identity and the first partition identity based on the second primary key identity, and delete the mapping relationship between the second primary key identity and the first partition identity in the fourth route assignment table to obtain the second route assignment table. Then, the second route assignment table no longer includes the foregoing mapping relationship.
  • It is to be understood that since one transaction may include multiple pieces of data, when the user stops a specific transaction, all mapping relationships corresponding to the multiple pieces of data in the transaction are deleted from the route assignment table in a manner similar to the foregoing.
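The deletion flow can be sketched in the same illustrative style. Again, the `RouteTable` name and the instruction shape are assumptions for the example, not from the disclosure; the instruction carries every primary key identity of the stopped transaction, since one transaction may span multiple pieces of data.

```python
# Illustrative sketch: deleting mapping relationships from the fourth
# route assignment table on a data deletion instruction, yielding the
# second route assignment table.
class RouteTable:
    def __init__(self, mappings=None):
        # primary key identity -> partition identity
        self.mappings = dict(mappings or {})

    def handle_deletion(self, instruction):
        """Delete every mapping whose primary key identity is carried
        by the data deletion instruction."""
        for pk_id in instruction["primary_keys"]:
            self.mappings.pop(pk_id, None)  # tolerate already-absent keys
        return self

# A stopped transaction covering two pieces of data (two primary keys).
table = RouteTable({"pk_2": "part_A", "pk_3": "part_A", "pk_4": "part_B"})
table.handle_deletion({"primary_keys": ["pk_2", "pk_3"]})
assert "pk_2" not in table.mappings and "pk_3" not in table.mappings
```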
  • It is to be understood that when receiving the data addition instruction or the data deletion instruction, a specific node in the load balancing system may update the route assignment table by using its own computing capability, to complete addition or deletion of a mapping relationship. However, the node also needs to transmit the updated route assignment table to the other nodes to ensure consistency of the route assignment table in the load balancing system.
  • This embodiment of this disclosure provides another method for obtaining the second route assignment table. In the foregoing manner, the mapping relationship in the existing route assignment table may be updated by using the data addition instruction or the data deletion instruction. In this way, when a transaction and corresponding data in the load balancing system are updated in real time, it is ensured that an obtained route assignment table can accurately reflect a mapping relationship between a primary key identity and a partition identity of each piece of data, thereby ensuring data processing accuracy of each node.
  • The following describes a data migration apparatus in this disclosure in detail. Refer to FIG. 11. FIG. 11 is a schematic diagram of a structure of the data migration apparatus according to an embodiment of this disclosure. As shown in FIG. 11, the data migration apparatus 1100 includes: an obtaining module 1101, configured to obtain a first route assignment table, the first route assignment table including a mapping relationship between a first primary key identity and a first partition identity, the first primary key identity being used for uniquely identifying first data, and the first partition identity indicating a second node; a receiving module 1102, configured to receive a first instruction, the first instruction carrying the first primary key identity and a first transaction identity, the first transaction identity indicating a first transaction, and a first node being configured to process the first transaction; the obtaining module 1101 being further configured to obtain, by using the first route assignment table, the first partition identity based on the first primary key identity carried in the first instruction; a processing module 1103, configured to determine the second node based on the first partition identity, the second node being configured to process the first transaction; and a transmission module 1104, configured to transmit the first data to the second node.
  • Optionally, based on the embodiment corresponding to FIG. 11, the first route assignment table further includes a mapping relationship between a first index identity and the first partition identity. The first instruction further includes the first index identity. The obtaining module 1101 is specifically configured to: determine, by using the first route assignment table, N partition identities based on the first index identity carried in the first instruction, the N partition identities including the first partition identity, and N being an integer greater than or equal to 1; and determine, by using the first route assignment table, the first partition identity from the N partition identities based on the first primary key identity carried in the first instruction.
  • Optionally, based on the embodiment corresponding to FIG. 11, the data migration apparatus 1100 further includes a deletion module 1105. The obtaining module 1101 is further configured to obtain a second route assignment table at a first time point, the second route assignment table including a mapping relationship between the first primary key identity and a second partition identity, and the second partition identity indicating the first node. The obtaining module 1101 is specifically configured to obtain the first route assignment table at a second time point, the second time point being later than the first time point. The deletion module 1105 is configured to delete the second route assignment table after the transmission module transmits the first data uniquely identified by the first primary key identity to the second node.
  • Optionally, based on the embodiment corresponding to FIG. 11, the data migration apparatus 1100 further includes a determining module 1106 and a replacement module 1107. The obtaining module 1101 is further configured to obtain a second route assignment table, the second route assignment table including a mapping relationship between the first primary key identity and a second partition identity, the second partition identity indicating the first node, and the first node being configured to process the first transaction. The determining module 1106 is configured to determine the first data and the second node in response to determining that a data migration condition is satisfied, the first data being data desired to be migrated to the second node. The replacement module 1107 is configured to replace the second partition identity in the second route assignment table with the first partition identity to obtain the first route assignment table, the first partition identity being used for uniquely identifying the second node. The transmission module 1104 is further configured to transmit the first route assignment table to the first node and the second node.
  • Optionally, based on the embodiment corresponding to FIG. 11, the data migration condition is that a ratio of a 2PC transaction processing throughput to a total transaction processing throughput of a node is greater than a first preset threshold. The determining module 1106 is specifically configured to: obtain a 2PC transaction identity in a case that log information transmitted by the first node is received in a first preset period, the log information transmitted by the first node including the 2PC transaction identity, and the 2PC transaction identity indicating that the log information is generated after the first node processes a 2PC transaction; statistically obtain a total transaction processing throughput of the first node based on the log information transmitted by the first node; statistically obtain a 2PC transaction processing throughput of the first node based on the 2PC transaction identity; and determine, in a case that a ratio of the 2PC transaction processing throughput of the first node to the total transaction processing throughput of the first node is greater than the first preset threshold, that the data migration condition is satisfied.
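The 2PC-ratio migration condition above can be sketched as a simple check over a node's log records. The record shape (`is_2pc` flag) and the 0.5 threshold value are illustrative assumptions; the disclosure does not fix a particular threshold.

```python
# Illustrative sketch: decide whether the data migration condition is
# satisfied based on the ratio of 2PC transactions to all transactions
# observed in a node's log information during one preset period.
def migration_condition_met(log_records, threshold):
    """Return True when 2PC throughput / total throughput > threshold."""
    total = len(log_records)
    if total == 0:
        return False  # no log information received in the period
    two_pc = sum(1 for r in log_records if r.get("is_2pc"))
    return two_pc / total > threshold

logs = [{"is_2pc": True}, {"is_2pc": True}, {"is_2pc": False}]
assert migration_condition_met(logs, threshold=0.5)  # 2/3 > 0.5
```

The total-memory-usage condition in the following paragraph would be an analogous single-value comparison against a second preset threshold.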
  • Optionally, based on the embodiment corresponding to FIG. 11, the data migration condition is that total memory usage of a node is greater than a second preset threshold. The determining module 1106 is specifically configured to: receive, in a second preset period, total memory usage of the first node transmitted by the first node, the total memory usage indicating a memory resource occupied by multiple transactions processed by the first node; and determine, in a case that the total memory usage of the first node is greater than the second preset threshold, that the data migration condition is satisfied.
  • Optionally, based on the embodiment corresponding to FIG. 11, the log information transmitted by the first node further includes the first transaction identity and the first primary key identity. The receiving module 1102 is further configured to receive log information transmitted by the second node and log information transmitted by a third node, the log information transmitted by the second node including the first transaction identity and the first primary key identity, and the log information transmitted by the third node including the first transaction identity and the first primary key identity. The determining module 1106 is specifically configured to: collect statistics on the log information transmitted by the first node, the log information transmitted by the second node, and the log information transmitted by the third node, to obtain that a quantity of times the first node initiates the first transaction to the second node is L and that a quantity of times the first node initiates the first transaction to the third node is M, L and M being integers greater than or equal to 1; and determine the second node in a case that L is greater than M, and determine the first data by using the second route assignment table based on the first primary key identity.
  • Optionally, based on the embodiment corresponding to FIG. 11, the data migration condition is that a node fails. The determining module 1106 is specifically configured to determine, in a case that the first node does not transmit log information of the first node to a load balancing node in a first preset period, that the data migration condition is satisfied. The receiving module 1102 is further configured to receive, in the first preset period, log information transmitted by the second node and log information transmitted by a third node, the log information transmitted by the second node including the first transaction identity and the first primary key identity, and the log information transmitted by the third node including the first transaction identity and the first primary key identity. The determining module 1106 is specifically configured to: obtain a partition identity set corresponding to the first node, the partition identity set corresponding to the first node including the first partition identity; obtain, based on the partition identity set corresponding to the first node, a primary key identity set corresponding to the first node, the primary key identity set corresponding to the first node including the first primary key identity; and determine the first data and the second node based on the first primary key identity, the first transaction identity, the log information transmitted by the second node, and the log information transmitted by the third node.
  • Optionally, based on the embodiment corresponding to FIG. 11, the determining module 1106 is specifically configured to: determine, by using the second route assignment table, the first data based on the first primary key identity; collect statistics on the log information transmitted by the second node and the log information transmitted by the third node, to obtain that a quantity of times the second node initiates the first transaction is Q and that a quantity of times the third node initiates the first transaction is P, Q and P being integers greater than or equal to 1; and determine the second node in a case that Q is greater than P, and determine the first data by using the second route assignment table based on the first primary key identity.
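The statistics-based selection above (comparing the counts Q and P, or L and M) can be sketched as a per-node count over the collected log information. The record shape is an assumption for the example; only the counting-and-comparison logic follows the description.

```python
from collections import Counter

# Illustrative sketch: from the log information transmitted by the
# candidate nodes, count how many times each node initiated the given
# transaction and pick the node with the largest count.
def choose_target_node(log_records, transaction_id):
    counts = Counter(
        r["node"] for r in log_records if r["txn"] == transaction_id
    )
    node, _ = counts.most_common(1)[0]
    return node

logs = [
    {"node": "node_2", "txn": "T1"},  # second node initiated T1 twice (Q = 2)
    {"node": "node_2", "txn": "T1"},
    {"node": "node_3", "txn": "T1"},  # third node initiated T1 once (P = 1)
]
assert choose_target_node(logs, "T1") == "node_2"  # Q > P
```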
  • Optionally, based on the embodiment corresponding to FIG. 11, the obtaining module 1101 is specifically configured to: obtain a first primary key identity set, the first primary key identity set including multiple primary key identities, and one primary key identity being used for uniquely identifying the first data; divide the first primary key identity set into S second primary key identity sets, each primary key identity in the second primary key identity set being managed by a same node, and S being an integer greater than or equal to 1; assign the S second primary key identity sets to S nodes; determine, from the S nodes, the first node managing a third primary key identity set, the third primary key identity set including the first primary key identity; obtain a partition identity set corresponding to the first node; determine, from the partition identity set, the second partition identity corresponding to the first data; and establish the mapping relationship between the first primary key identity and the second partition identity, and generate the first route assignment table.
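The initialization above (dividing a primary key identity set into S subsets, assigning them to S nodes, and recording the resulting mappings) can be sketched as follows. Hash-based division is only one possible strategy, assumed for the example; the disclosure does not prescribe how the set is divided.

```python
# Illustrative sketch: build an initial route assignment table by
# partitioning a primary key identity set across S nodes.
def build_route_table(primary_keys, nodes):
    """Assign each primary key identity to one of the S nodes (here by
    hashing) and record a primary-key-to-partition mapping."""
    s = len(nodes)
    table = {}
    for pk in primary_keys:
        node = nodes[hash(pk) % s]          # subset managed by one node
        table[pk] = f"partition_of_{node}"  # hypothetical partition identity
    return table

table = build_route_table(["pk_1", "pk_2", "pk_3"], ["n1", "n2"])
assert set(table) == {"pk_1", "pk_2", "pk_3"}
```

Note that Python's string `hash` is randomized per process, so the concrete assignment varies between runs; a production system would use a stable hash or an explicit assignment.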
  • Optionally, based on the embodiment corresponding to FIG. 11, the obtaining module 1101 is specifically configured to: obtain a third route assignment table; receive a data addition instruction, the data addition instruction carrying the mapping relationship between the first primary key identity and the second partition identity; obtain the mapping relationship between the first primary key identity and the second partition identity according to the data addition instruction; and add the mapping relationship between the first primary key identity and the second partition identity to the third route assignment table to obtain the first route assignment table.
  • Optionally, based on the embodiment corresponding to FIG. 11, the obtaining module 1101 is specifically configured to: obtain a fourth route assignment table, the fourth route assignment table including a mapping relationship between a second primary key identity and the first partition identity; receive a data deletion instruction, the data deletion instruction carrying the second primary key identity; obtain the second primary key identity according to the data deletion instruction; determine, from the fourth route assignment table, the mapping relationship between the second primary key identity and the first partition identity based on the second primary key identity; and delete the mapping relationship between the second primary key identity and the first partition identity in the fourth route assignment table to obtain the second route assignment table.
  • An embodiment of this disclosure also provides another data migration apparatus. An example in which the data migration apparatus is deployed in a server is used in this disclosure for description. Refer to FIG. 12. FIG. 12 is a schematic diagram of an embodiment of a server according to an embodiment of this disclosure. As shown in this figure, the server 1000 may vary greatly due to different configurations or performance. It may include one or more central processing units (CPUs) 1022 (for example, one or more processors) and memories 1032, and one or more storage media 1030 (for example, one or more mass storage devices) that store application programs 1042 or data 1044. The memory 1032 and the storage medium 1030 may provide temporary storage or persistent storage. The program stored in the storage medium 1030 may include one or more modules (not shown in the figure), each of which may include a series of instruction operations in the server. Furthermore, the CPU 1022 may be configured to communicate with the storage medium 1030 to execute, on the server 1000, the series of instruction operations in the storage medium 1030.
  • The server 1000 may further include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041, for example, Windows Server, Mac OS X, Unix, Linux, and FreeBSD.
  • The steps performed by the server in the foregoing embodiments may be based on the structure of the server shown in FIG. 12.
  • The CPU 1022 of the server is configured to execute the embodiment shown in FIG. 4 and each embodiment corresponding to FIG. 4.
  • An embodiment of this disclosure also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when run in a computer, enables the computer to perform the steps performed by a node of the server or the load balancing node in the method described in the embodiment shown in FIG. 4 and the method described in each embodiment corresponding to FIG. 4.
  • An embodiment of this disclosure also provides a computer program product including a program. The computer program product, when run in a computer, enables the computer to perform the steps performed by a node of the server or the load balancing node in the method described in the embodiment shown in FIG. 4.
  • A person skilled in the art may clearly learn that for ease and brevity of description, specific working processes of the foregoing system, apparatus, and unit may refer to the corresponding processes in the method embodiment, and will not be elaborated herein.
  • In some embodiments provided in this disclosure, it is to be understood that the disclosed system, apparatus, and method may be implemented in another manner. For example, the apparatus embodiment described above is merely schematic. For example, division of the units is merely a logic function division, and other division manners may be used in actual implementations. For example, multiple units or components may be combined or integrated into another system, or some characteristics may be neglected or not executed. In addition, coupling or direct coupling or communication connection between displayed or discussed components may be indirect coupling or communication connection, implemented through some interfaces, of the apparatus or the units, and may be electrical, mechanical, or in other forms.
  • The units described as separate parts may or may not be physically separated. Parts displayed as units may or may not be physical units, that is, may be located in the same place or distributed to multiple network units. Some or all of the units may be selected as actually required to achieve an objective of the solution of this embodiment.
  • In addition, each function unit in each embodiment of this disclosure may be integrated into a processing unit. Alternatively, each unit may physically exist independently. Alternatively, two or more than two units may be integrated into a unit. The integrated unit may be implemented in a hardware form or in a form of a software function unit.
  • When implemented in the form of the software function unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this disclosure substantially, or the parts thereof contributing to the related art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium, including multiple instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps in the method as described in each embodiment of this disclosure. The foregoing storage medium includes various media capable of storing program code, for example, a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • The foregoing embodiments are merely used to describe, not to limit, the technical solutions of this disclosure. Although this disclosure is described in detail with reference to the foregoing embodiments, it is to be understood by a person of ordinary skill in the art that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some technical features. The essences of the corresponding technical solutions obtained through these modifications or replacements remain within the spirit and scope of the technical solutions of the embodiments of this disclosure.

Claims (16)

  1. A data migration method, the method being applied to a load balancing system, the load balancing system comprising a first node and a second node, and the method comprising:
    receiving, by the first node, a first instruction carrying a first primary key identity and a first transaction identity, the first primary key identity indicating first data that is to be processed, the first transaction identity indicating a first transaction, the first node being used for processing the first transaction;
    obtaining, by the first node and using a first route assignment table, a first partition identity based on the first primary key identity, the first route assignment table containing a mapping relationship between the first primary key identity and the first partition identity, the first partition identity indicating a second node for processing the first transaction;
    determining, by the first node, the second node based on the first partition identity; and
    transmitting, by the first node, the first data to the second node.
  2. The method according to claim 1, wherein the first route assignment table further comprises a mapping relationship between a first index identity and the first partition identity;
    the first instruction further comprises the first index identity; and
    the obtaining, by the first node and using a first route assignment table, a first partition identity based on the first primary key identity comprises:
    determining, by the first node by using the first route assignment table, N partition identities based on the first index identity carried in the first instruction, the N partition identities comprising the first partition identity, and N being an integer greater than or equal to 1; and
    determining, by the first node by using the first route assignment table, the first partition identity from the N partition identities based on the first primary key identity carried in the first instruction.
  3. The method according to claim 1, further comprising:
    obtaining, by the first node, a second route assignment table at a first time point, the second route assignment table comprising a mapping relationship between the first primary key identity and a second partition identity, and the second partition identity indicating the first node; wherein
    obtaining, by the first node, the first route assignment table comprises:
    obtaining, by the first node, the first route assignment table at a second time point, the second time point being later than the first time point; and
    after the transmitting, by the first node, the first data to the second node, the method further comprises:
    deleting, by the first node, the second route assignment table.
  4. The method according to claim 1, wherein the load balancing system further comprises a load balancing node; and
    the method further comprises:
    obtaining, by the load balancing node, a second route assignment table, the second route assignment table comprising a mapping relationship between the first primary key identity and a second partition identity, and the second partition identity indicating the first node;
    determining, by the load balancing node, the first data and the second node in response to determining that a data migration condition is satisfied, the first data being data desired to be migrated to the second node;
    replacing, by the load balancing node, the second partition identity in the second route assignment table with the first partition identity to obtain the first route assignment table, the first partition identity being used for uniquely identifying the second node; and
    transmitting, by the load balancing node, the first route assignment table to the first node and the second node.
  5. The method according to claim 4, wherein the data migration condition is that a ratio of a two-phase commit (2PC) transaction processing throughput to a total transaction processing throughput of a node is greater than a first preset threshold; and
    the determining that a data migration condition is satisfied comprises:
    obtaining, by the load balancing node, a 2PC transaction identity in a case that log information transmitted by the first node is received in a first preset period, the log information transmitted by the first node comprising the 2PC transaction identity, and the 2PC transaction identity indicating that the log information is generated after the first node processes a 2PC transaction;
    obtaining, statistically by the load balancing node, a total transaction processing throughput of the first node based on the log information transmitted by the first node;
    obtaining, statistically by the load balancing node, a 2PC transaction processing throughput of the first node based on the 2PC transaction identity; and
    determining, by the load balancing node in a case that a ratio of the 2PC transaction processing throughput of the first node to the total transaction processing throughput of the first node is greater than the first preset threshold, that the data migration condition is satisfied.
  6. The method according to claim 4, wherein the data migration condition is that total memory usage of a node is greater than a second preset threshold;
    the determining that a data migration condition is satisfied comprises:
    receiving, in a second preset period, total memory usage of the first node transmitted by the first node, the total memory usage indicating a memory resource occupied by a plurality of transactions processed by the first node; and
    determining, by the load balancing node in a case that the total memory usage of the first node is greater than the second preset threshold, that the data migration condition is satisfied.
  7. The method according to any one of claims 4 to 6, wherein the load balancing system further comprises a third node;
    the log information transmitted by the first node further comprises the first transaction identity and the first primary key identity;
    the method further comprises:
    receiving, by the load balancing node, log information transmitted by the second node and log information transmitted by the third node, the log information transmitted by the second node comprising the first transaction identity and the first primary key identity, and the log information transmitted by the third node comprising the first transaction identity and the first primary key identity; and
    the determining, by the load balancing node, the first data and the second node comprises:
    collecting, by the load balancing node, statistics on the log information transmitted by the first node, the log information transmitted by the second node, and the log information transmitted by the third node, to obtain a result that a quantity of times the first node initiates the first transaction to the second node is L and that a quantity of times the first node initiates the first transaction to the third node is M, L and M being integers greater than or equal to 1; and
    determining, by the load balancing node, the second node in a case that L is greater than M, and determining the first data by using the second route assignment table based on the first primary key identity.
  8. The method according to claim 4, wherein the data migration condition is that a node fails;
    the load balancing system further comprises a third node;
    the determining that a data migration condition is satisfied comprises:
    determining, by the load balancing node in a case that the first node does not transmit log information of the first node to the load balancing node in a first preset period, that the data migration condition is satisfied;
    the method further comprises:
    receiving, by the load balancing node in the first preset period, log information transmitted by the second node and log information transmitted by the third node, the log information transmitted by the second node comprising the first transaction identity and the first primary key identity, and the log information transmitted by the third node comprising the first transaction identity and the first primary key identity; and
    the determining, by the load balancing node, the first data and the second node comprises:
    obtaining, by the load balancing node, a partition identity set corresponding to the first node, the partition identity set corresponding to the first node comprising the first partition identity;
    obtaining, by the load balancing node based on the partition identity set corresponding to the first node, a primary key identity set corresponding to the first node, the primary key identity set corresponding to the first node comprising the first primary key identity; and
    determining, by the load balancing node, the first data and the second node based on the first primary key identity, the first transaction identity, the log information transmitted by the second node, and the log information transmitted by the third node.
  9. The method according to claim 8, wherein the determining, by the load balancing node, the first data and the second node based on the first primary key identity, the first transaction identity, the log information transmitted by the second node, and the log information transmitted by the third node comprises:
    determining, by the load balancing node by using the second route assignment table, the first data based on the first primary key identity;
    collecting, by the load balancing node, statistics on the log information transmitted by the second node and the log information transmitted by the third node, to obtain a result that a quantity of times the second node initiates the first transaction is Q and that a quantity of times the third node initiates the first transaction is P, Q and P being integers greater than or equal to 1; and
    determining, by the load balancing node, the second node in a case that Q is greater than P, and determining the first data by using the second route assignment table based on the first primary key identity.
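The node-selection procedure of claims 8 and 9 can be sketched as follows. This is a minimal illustrative sketch, not the claimed implementation; all names (`choose_target_node`, `node_logs`, the tuple layout of a log entry) are assumptions introduced here, not terms from the claims:

```python
from collections import Counter

def choose_target_node(route_table, primary_key_id, transaction_id, node_logs):
    """Pick the migration target by comparing transaction-initiation counts.

    node_logs maps a node identity to the (transaction identity, primary key
    identity) log entries that node transmitted to the load balancing node.
    """
    # Collect statistics: how many times did each node initiate the
    # first transaction for the given primary key identity?
    counts = Counter()
    for node_id, entries in node_logs.items():
        for tx_id, pk_id in entries:
            if tx_id == transaction_id and pk_id == primary_key_id:
                counts[node_id] += 1
    # The node with the larger count (Q > P in the claim) becomes the target.
    target_node = max(counts, key=counts.get)
    # Resolve the first data from the route assignment table by primary key.
    first_data = route_table[primary_key_id]
    return target_node, first_data
```

Under this sketch, the node whose log reports the transaction more often (Q versus P in claim 9) is chosen, so the data migrates toward the node that actually drives the workload.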
  10. The method according to claim 4, wherein the obtaining, by the load balancing node, a second route assignment table comprises:
    obtaining, by the load balancing node, a first primary key identity set, the first primary key identity set comprising a plurality of primary key identities, and each primary key identity being used for uniquely identifying one piece of data;
    dividing, by the load balancing node, the first primary key identity set into S second primary key identity sets, primary key identities in a same second primary key identity set being managed by a same node, and S being an integer greater than or equal to 1;
    assigning, by the load balancing node, the S second primary key identity sets to S nodes;
    determining, by the load balancing node from the S nodes, the first node managing a third primary key identity set, the third primary key identity set comprising the first primary key identity;
    obtaining, by the load balancing node, a partition identity set corresponding to the first node;
    determining, by the load balancing node from the partition identity set, the second partition identity corresponding to the first data; and
    establishing, by the load balancing node, the mapping relationship between the first primary key identity and the second partition identity, and generating the second route assignment table.
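The table-construction steps of claim 10 can be sketched as below. The round-robin split and every name (`build_route_table`, the partition-identity format) are illustrative assumptions; the claim does not prescribe how the S subsets are formed:

```python
def build_route_table(primary_key_ids, nodes):
    """Divide the primary key identity set among S nodes and record mappings."""
    s = len(nodes)
    # Divide the first primary key identity set into S second sets
    # (round-robin here; any partitioning scheme would do).
    subsets = [primary_key_ids[i::s] for i in range(s)]
    route_table = {}
    # Assign each second primary key identity set to one node and establish
    # a mapping from each primary key identity to a partition identity.
    for node, subset in zip(nodes, subsets):
        for idx, pk in enumerate(subset):
            route_table[pk] = (node, f"{node}-partition-{idx}")
    return route_table
```

The resulting dictionary plays the role of the route assignment table: given a primary key identity, it yields the owning node and the partition identity of the data.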
  11. The method according to claim 4, wherein the obtaining, by the load balancing node, a second route assignment table comprises:
    obtaining, by the load balancing node, a third route assignment table;
    receiving, by the load balancing node, a data addition instruction, the data addition instruction carrying the mapping relationship between the first primary key identity and the second partition identity;
    obtaining, by the load balancing node, the mapping relationship between the first primary key identity and the second partition identity according to the data addition instruction; and
    adding, by the load balancing node, the mapping relationship between the first primary key identity and the second partition identity to the third route assignment table to obtain the second route assignment table.
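Claim 11 derives a new route assignment table from an existing one by adding the mapping carried in a data addition instruction. A minimal sketch, assuming the table is a dictionary and the instruction a small record (both representations are assumptions made here for illustration):

```python
def apply_data_addition(route_table, instruction):
    """Return a new route table with the instruction's mapping added."""
    pk_id = instruction["primary_key_id"]
    partition_id = instruction["partition_id"]
    updated = dict(route_table)      # leave the prior table intact
    updated[pk_id] = partition_id    # add the carried mapping relationship
    return updated
```

Copying rather than mutating in place mirrors the claim's framing, in which the prior table plus the addition yields a distinct, updated route assignment table.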
  12. The method according to claim 4, wherein the obtaining, by the load balancing node, a second route assignment table comprises:
    obtaining, by the load balancing node, a fourth route assignment table, the fourth route assignment table comprising a mapping relationship between a second primary key identity and the first partition identity;
    receiving, by the load balancing node, a data deletion instruction, the data deletion instruction carrying the second primary key identity;
    obtaining, by the load balancing node, the second primary key identity according to the data deletion instruction;
    determining, by the load balancing node from the fourth route assignment table, the mapping relationship between the second primary key identity and the first partition identity based on the second primary key identity; and
    deleting, by the load balancing node, the mapping relationship between the second primary key identity and the first partition identity in the fourth route assignment table to obtain the second route assignment table.
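Claim 12 is the deletion counterpart: a data deletion instruction carries a primary key identity, and removing that key's mapping yields the new route assignment table. A hedged sketch with the same assumed dictionary representation as above:

```python
def apply_data_deletion(route_table, instruction):
    """Return a new route table with the instruction's mapping removed."""
    pk_id = instruction["primary_key_id"]
    updated = dict(route_table)
    updated.pop(pk_id, None)   # drop the mapping for the carried primary key
    return updated
```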
  13. A data migration apparatus, comprising:
    an obtaining module, configured to obtain a first route assignment table, the first route assignment table comprising a mapping relationship between a first primary key identity and a first partition identity, the first primary key identity being used for uniquely identifying first data, the first partition identity indicating a second node, and the second node and a first node being in a load balancing system;
    a receiving module, configured to receive a first instruction, the first instruction carrying the first primary key identity and a first transaction identity, the first transaction identity indicating a first transaction, and the first node being configured to process the first transaction;
    the obtaining module being further configured to obtain, by using the first route assignment table, the first partition identity based on the first primary key identity carried in the first instruction;
    a processing module, configured to determine the second node based on the first partition identity, the second node being configured to process the first transaction; and
    a transmission module, configured to transmit the first data to the second node.
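The module structure of the apparatus in claim 13 can be sketched as a single class whose methods stand in for the obtaining, processing, and transmission modules. Every name here (`DataMigrationApparatus`, `partition_to_node`, `sent`) is an illustrative assumption, not the patent's implementation:

```python
class DataMigrationApparatus:
    def __init__(self, route_table, partition_to_node, data_store):
        self.route_table = route_table              # primary key id -> partition id
        self.partition_to_node = partition_to_node  # partition id -> node id
        self.data_store = data_store                # primary key id -> first data
        self.sent = []                              # recorded (node, data) transmissions

    def handle_instruction(self, primary_key_id, transaction_id):
        """Receive a first instruction and migrate the addressed data."""
        # Obtaining module: look up the partition identity by primary key.
        partition_id = self.route_table[primary_key_id]
        # Processing module: determine the second node from the partition.
        node = self.partition_to_node[partition_id]
        # Transmission module: transmit the first data to the second node.
        data = self.data_store[primary_key_id]
        self.sent.append((node, data))
        return node
```

The transaction identity is accepted but unused in this sketch; in the claims it identifies the first transaction that the determined second node goes on to process.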
  14. A computer device, comprising a memory, a transceiver, a processor, and a bus system,
    the memory being configured to store a program;
    the processor being configured to execute the program in the memory to implement the method according to any one of claims 1 to 12; and
    the bus system being configured to connect the memory to the processor to enable the memory to communicate with the processor.
  15. A computer-readable storage medium, comprising instructions which, when run on a computer, enable the computer to perform the method according to any one of claims 1 to 12.
  16. A computer program product comprising instructions, the computer program product, when run on a computer, enabling the computer to perform the method according to any one of claims 1 to 12.
EP22868914.7A 2021-09-14 2022-08-15 Data migration method and apparatus, and device, medium and computer product Pending EP4293510A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111076493.3A CN113515364B (en) 2021-09-14 2021-09-14 Data migration method and device, computer equipment and storage medium
PCT/CN2022/112395 WO2023040538A1 (en) 2021-09-14 2022-08-15 Data migration method and apparatus, and device, medium and computer product

Publications (1)

Publication Number Publication Date
EP4293510A1 true EP4293510A1 (en) 2023-12-20

Family

ID=78063363

Family Applications (1)

Application Number Title Priority Date Filing Date
EP22868914.7A Pending EP4293510A1 (en) 2021-09-14 2022-08-15 Data migration method and apparatus, and device, medium and computer product

Country Status (4)

Country Link
US (1) US20230367749A1 (en)
EP (1) EP4293510A1 (en)
CN (1) CN113515364B (en)
WO (1) WO2023040538A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515364B (en) * 2021-09-14 2022-03-01 腾讯科技(深圳)有限公司 Data migration method and device, computer equipment and storage medium
CN113778692B (en) * 2021-11-10 2022-03-08 腾讯科技(深圳)有限公司 Data processing method and device, computer equipment and storage medium
CN117235185B (en) * 2023-11-09 2024-01-16 腾讯科技(深圳)有限公司 Data balance processing method and device and electronic equipment
CN117349267B (en) * 2023-12-04 2024-03-22 和元达信息科技有限公司 Database migration processing method and system

Family Cites Families (19)

Publication number Priority date Publication date Assignee Title
US8849749B2 (en) * 2010-05-14 2014-09-30 Oracle International Corporation Load balancing in parallel database systems using multi-reordering
US9183102B2 (en) * 2011-09-30 2015-11-10 Alcatel Lucent Hardware consumption architecture
WO2013117002A1 (en) * 2012-02-09 2013-08-15 华为技术有限公司 Method, device and system for data reconstruction
CN105468473B (en) * 2014-07-16 2019-03-01 北京奇虎科技有限公司 Data migration method and data migration device
CN104219163B * 2014-08-28 2016-08-17 杭州天宽科技有限公司 Load balancing method for dynamic node forward migration based on a dynamic replica method and a virtual node method
CN106446111B (en) * 2016-09-14 2020-01-14 Oppo广东移动通信有限公司 Data migration method and terminal
CN106973021A * 2017-02-27 2017-07-21 华为技术有限公司 Method and node for load balancing in a network system
US10956197B2 (en) * 2017-12-13 2021-03-23 Citrix Systems, Inc. Virtual machine with an emulator manager for migration of synchronized streams of state data
CN109995813B (en) * 2017-12-29 2021-02-26 华为技术有限公司 Partition expansion method, data storage method and device
CN109766190A (en) * 2019-01-15 2019-05-17 无锡华云数据技术服务有限公司 Cloud resource dispatching method, device, equipment and storage medium
CN111984395B (en) * 2019-05-22 2022-12-13 中移(苏州)软件技术有限公司 Data migration method, system and computer readable storage medium
WO2021046750A1 (en) * 2019-09-11 2021-03-18 华为技术有限公司 Data redistribution method, device, and system
CN112578997B (en) * 2019-09-30 2022-07-22 华为云计算技术有限公司 Data migration method, system and related equipment
CN111190883A (en) * 2019-12-03 2020-05-22 腾讯科技(深圳)有限公司 Data migration method and device, computer readable storage medium and computer equipment
CN111563070A (en) * 2020-05-15 2020-08-21 苏州浪潮智能科技有限公司 Method and device for storing result of Hash algorithm
CN113297166A (en) * 2020-07-27 2021-08-24 阿里巴巴集团控股有限公司 Data processing system, method and device
CN113342783A (en) * 2021-06-30 2021-09-03 招商局金融科技有限公司 Data migration method and device, computer equipment and storage medium
CN113312339B (en) * 2021-07-28 2021-10-29 腾讯科技(深圳)有限公司 Data migration method and device, computer equipment and storage medium
CN113515364B (en) * 2021-09-14 2022-03-01 腾讯科技(深圳)有限公司 Data migration method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113515364A (en) 2021-10-19
US20230367749A1 (en) 2023-11-16
WO2023040538A1 (en) 2023-03-23
CN113515364B (en) 2022-03-01

Similar Documents

Publication Publication Date Title
EP4293510A1 (en) Data migration method and apparatus, and device, medium and computer product
US9052962B2 (en) Distributed storage of data in a cloud storage system
JP6353924B2 (en) Reduced data volume durability status for block-based storage
JP5841177B2 (en) Method and system for synchronization mechanism in multi-server reservation system
CN110737442A (en) edge application management method and system
CN112532675B (en) Method, device and medium for establishing network edge computing system
Edwin et al. An efficient and improved multi-objective optimized replication management with dynamic and cost aware strategies in cloud computing data center
JP2018514018A (en) Timely resource migration to optimize resource placement
Torabi et al. Data replica placement approaches in fog computing: a review
CN107085539B (en) cloud database system and dynamic cloud database resource adjustment method
US20060200469A1 (en) Global session identifiers in a multi-node system
EP2875439A1 (en) Migrating applications between networks
CN103200020A (en) Resource allocating method and resource allocating system
EP3442201B1 (en) Cloud platform construction method and cloud platform
US10761869B2 (en) Cloud platform construction method and cloud platform storing image files in storage backend cluster according to image file type
CN112835977A (en) Database management method and system based on block chain
EP4068725A1 (en) Load balancing method and related device
Sotiriadis et al. Advancing inter-cloud resource discovery based on past service experiences of transient resource clustering
US11881996B2 (en) Input and output for target device communication
JP2024514467A (en) Geographically distributed hybrid cloud cluster
US11121981B1 (en) Optimistically granting permission to host computing resources
CN112910796A (en) Traffic management method, apparatus, device, storage medium, and program product
KR101681651B1 (en) System and method for managing database
US11381468B1 (en) Identifying correlated resource behaviors for resource allocation
US20240086559A1 (en) Permission synchronization across computing sites

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20230913

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR