WO2022001942A1 - Data migration method, apparatus, network device and storage medium - Google Patents

Data migration method, apparatus, network device and storage medium

Info

Publication number
WO2022001942A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
migration
incremental
partition
migrated
Prior art date
Application number
PCT/CN2021/102712
Other languages
English (en)
French (fr)
Inventor
张启军
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Priority to EP21832075.2A priority Critical patent/EP4170510A4/en
Priority to JP2022581476A priority patent/JP2023531805A/ja
Publication of WO2022001942A1 publication Critical patent/WO2022001942A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/21 - Design, administration or maintenance of databases
    • G06F 16/214 - Database migration support
    • G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F 16/278 - Data partitioning, e.g. horizontal or vertical partitioning

Definitions

  • The present disclosure relates to, but is not limited to, the technical field of database management.
  • In one aspect, the present disclosure provides a data migration method, including: determining incremental migration data according to a global transaction identifier value in a data storage information table, where the global transaction identifier value is used to indicate the frequency of business processing; and migrating the incremental migration data.
  • In another aspect, the present disclosure provides a data migration apparatus, including: a determining module configured to determine incremental migration data according to a global transaction identifier value in a data storage information table, where the global transaction identifier value is used to indicate the frequency of business processing; and a migration module configured to migrate the incremental migration data.
  • In another aspect, the present disclosure provides a network device, including: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any one of the data migration methods of the present disclosure.
  • In another aspect, the present disclosure provides a storage medium storing a computer program which, when executed by a processor, implements any one of the data migration methods of the present disclosure.
  • FIG. 1 shows a schematic flowchart of the data migration method of the present disclosure.
  • FIG. 2 shows a schematic flowchart of the data migration method provided by the present disclosure.
  • FIG. 3 shows a schematic structural diagram of a data migration apparatus provided by the present disclosure.
  • FIG. 4 shows a block diagram of the composition of the data migration system in the present disclosure.
  • FIG. 5 shows a flow chart of the working method of the data migration system in the present disclosure.
  • FIG. 6 shows a schematic diagram of migrating data in the partition to be migrated in the data migration system of the present disclosure.
  • FIG. 7 shows a schematic diagram of migrating incremental migration data in the data migration system of the present disclosure.
  • FIG. 8 shows a block diagram of an exemplary hardware architecture of a computing device capable of implementing methods and apparatuses according to the present disclosure.
  • In the same production environment, the offline redistribution method migrates the data as a whole first and then redeploys the migrated data; although this saves the processing time for newly added hotspot data, it interrupts the existing business on the specified table.
  • The online redistribution method allows the data to be migrated as a whole without interrupting business processing, after which the migrated data is deployed.
  • However, whichever of the above redistribution methods is used to migrate data, a large amount of time is spent processing the overall data, resulting in low data migration efficiency; in addition, extra storage resources are needed to store intermediate temporary data, causing unnecessary waste of storage resources.
  • The present disclosure provides a data migration method and apparatus, a network device and a storage medium, at least to solve the problem of low migration efficiency when migrating data in a distributed database.
  • FIG. 1 is a schematic flowchart of a data migration method of the present disclosure, which can be applied to a data migration apparatus. As shown in FIG. 1, the method may include the following steps 110 and 120.
  • In step 110, incremental migration data is determined according to the global transaction identifier value in the data storage information table.
  • The global transaction identifier (Global Transaction Identifiers, GTID) value is used to indicate the frequency of business processing: when a business is executed frequently, its GTID value increases with the number of executions, and when the GTID value exceeds a preset range, the business is a hotspot business.
  • In some implementations, step 110 may be implemented in the following manner: determine an active data list according to a preset identifier range and the global transaction identifier value in the data storage information table, where the active data list includes active data; and filter the active data according to an incremental data migration decision to determine the incremental migration data, where the incremental data migration decision is determined according to the dynamic processing requirements of the business.
  • It should be noted that the preset identifier range is variable. For the first incremental data migration, the GTID list and the maximum GTID value (maxgtid) recorded on the current group database are obtained, and the smallest GTID value in the list (min_gtid) is taken as the minimum catch-up value (chase_min_gtid_1); the preset identifier range at this point is: less than or equal to min_gtid. For the second incremental data migration, chase_min_gtid_1 from the first incremental data migration is used as min_gtid, and the preset identifier range is greater than or equal to chase_min_gtid_1 and less than chase_min_gtid_2. By analogy, for the i-th incremental data migration, chase_min_gtid_i-1 from the most recent (i.e., the (i-1)-th) incremental data migration is used as min_gtid, and the preset identifier range is greater than or equal to chase_min_gtid_i-1 and less than chase_min_gtid_i, where i may be an integer greater than 1. All data whose GTID value in the data storage information table falls within the corresponding preset identifier range is then taken as the active data in the active data list.
  • Filtering the active data through the incremental data migration decision to determine the incremental migration data allows the incremental migration data corresponding to the business to be migrated quickly, meeting the dynamic processing requirements of the business, shortening the processing time of online data migration, and optimizing the system performance of the distributed database.
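  • As an illustrative aid (not part of the patent text), the selection logic described above can be sketched in Python; the table layout, the field names and the example decision predicate below are all assumptions made for the sketch:

    # Toy data storage information table: one row per stored record.
    storage_info = [
        {"key": 8,  "gtid": 325,  "partition": "P3", "group": "g1"},
        {"key": 12, "gtid": 1003, "partition": "P5", "group": "g1"},
        {"key": 15, "gtid": 1004, "partition": "P5", "group": "g1"},
    ]

    def active_data(table, min_gtid, chase_min_gtid):
        # Active data list: rows whose GTID lies in the preset identifier
        # range [min_gtid, chase_min_gtid) used from the second round onward.
        return [r for r in table if min_gtid <= r["gtid"] < chase_min_gtid]

    def incremental_migration_data(table, min_gtid, chase_min_gtid, decision):
        # Filter the active data with an incremental data migration decision,
        # here an arbitrary predicate standing in for business requirements.
        return [r for r in active_data(table, min_gtid, chase_min_gtid)
                if decision(r)]

    rows = incremental_migration_data(storage_info, 1003, 1005,
                                      lambda r: r["partition"] == "P5")
    print([r["key"] for r in rows])  # [12, 15]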
  • In step 120, the incremental migration data is migrated.
  • In some implementations, step 120 may be implemented in the following manner: obtain the source storage address and primary key value of the incremental migration data in the data storage information table, where the source storage address includes a partition address and a group address; and migrate the incremental migration data from the source storage address to a target storage address according to the source storage address and the primary key value.
  • For example, by looking up the data storage information table according to the GTID value, the primary key value corresponding to the incremental migration data and the source storage address of that data (i.e., the source partition address and source group address) can be obtained. For instance, when the GTID value is 325, looking up the data storage information table finds that the corresponding primary key value is 8 and that the incremental migration data with primary key value 8 is stored in the third partition of the first group; the source storage address of the incremental migration data is therefore the third partition of the first group.
  • The incremental migration data in the third partition of the first group is then fetched and sent to the target storage address.
  • Alternatively, a migration file is generated from the fetched incremental migration data and sent to the target storage address.
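  • A minimal sketch of this lookup-and-move step follows, again with assumed names; send_to stands in for whatever transport or migration-file mechanism the system actually uses:

    def send_to(target, payload):
        # Stand-in transport: a real system would stream rows or ship a
        # generated migration file to the target storage address.
        print("sending", payload, "to", target)

    def migrate_by_gtid(table, gtid, target):
        # Look up the row by GTID (e.g. 325 -> primary key 8), read its
        # source storage address, move it, and point the table at the target.
        row = next(r for r in table if r["gtid"] == gtid)
        source = (row["group"], row["partition"])
        send_to(target, {"key": row["key"], "from": source})
        row["group"], row["partition"] = target

    table = [{"key": 8, "gtid": 325, "partition": "P3", "group": "g1"}]
    migrate_by_gtid(table, 325, ("g2", "P4"))
    print(table[0])  # source address replaced by the target storage address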
  • In this implementation, the incremental migration data is determined according to the global transaction identifier value in the data storage information table. Since the global transaction identifier value indicates the frequency of business processing, the more frequently the database processes certain business data, the faster the corresponding global transaction identifier value increases, so that the data corresponding to that value becomes incremental migration data. Because the business requires it, this incremental migration data is migrated, so that data can be quickly migrated to the devices that need it, reducing data migration time, improving data migration efficiency, and improving the user experience.
  • In another possible implementation, before step 110, the method further includes: dividing the data storage space into N groups according to the primary key value in the data storage information table, each group having a unique group address; dividing each group into M partitions, each partition having a unique partition address, where all data on a partition is stored only in the one group corresponding to that partition, M is an integer greater than or equal to 1, and N is an integer greater than or equal to 1; and generating the data storage information table according to the primary key value, the global transaction identifier value, the partition addresses and the group addresses.
  • It should be noted that the primary key value is the unique identifier by which the distributed database can locate stored data; it can generally be set to be greater than or equal to 1 and less than or equal to the maximum number of data items the distributed database can store, and it can be used as an index value to look up data in the distributed database.
  • For example, the data storage space is divided into 4 groups, namely group g1, group g2, group g3 and group g4, each with a unique group address.
  • Each group can then be divided into a different number of partitions: group g1 is divided into 3 partitions (partition P1, partition P2 and partition P3); group g2 into 4 partitions (partitions P4, P5, P6 and P7); group g3 into 5 partitions (partitions P8, P9, P10, P11 and P12); and group g4 into 6 partitions (partitions P13, P14, P15, P16, P17 and P18).
  • Each partition has a unique partition address, and all data on a partition is stored only in the one group corresponding to that partition.
  • For example, the data in partition P1 is stored only in group g1 and cannot be stored in two different groups at the same time. If group g1 and group g2 both owned the data in partition P1, that data would be truncated and would be lost when the data in partition P1 was migrated. To avoid this, all data in partition P1 is stored only in group g1, ensuring data integrity.
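  • The ownership rule in this example can be made concrete with a short sketch (the dictionary layout is an illustrative assumption, not the patented data structure):

    # Four groups, eighteen partitions; each partition belongs to exactly one
    # group, so a partition's data is never split across groups.
    layout = {
        "g1": ["P1", "P2", "P3"],
        "g2": ["P4", "P5", "P6", "P7"],
        "g3": ["P8", "P9", "P10", "P11", "P12"],
        "g4": ["P13", "P14", "P15", "P16", "P17", "P18"],
    }

    partition_owner = {}
    for group, partitions in layout.items():
        for p in partitions:
            # A partition listed under two groups would violate integrity.
            assert p not in partition_owner, p + " stored in two groups"
            partition_owner[p] = group

    print(partition_owner["P1"])  # g1 -- all of P1's data lives only in g1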
  • In another possible implementation, after step 120, the method further includes: clearing the data in the source storage address.
  • By clearing the data in the source storage address, the source storage address can provide storage services for other data, increasing the storage space of the distributed database and avoiding waste of storage resources.
  • In another possible implementation, after step 120, the method further includes: updating the data storage information table according to the target storage address.
  • For example, after the migration is completed, the source storage address of the incremental migration data is replaced with the target storage address, so that the incremental migration data can be found quickly during the next data migration.
  • In another possible implementation, after step 120, the method further includes: counting the migration time of the incremental migration data; and stopping the migration of the incremental migration data according to the migration time and a preset time threshold.
  • The migration time is the maximum time for the data migration apparatus to process the incremental migration data, and the preset time threshold may be a duration such as 1 minute or 5 minutes; if the migration time is less than or equal to 5 minutes, the migration of the incremental migration data is stopped.
  • The above preset time thresholds are only examples and may be set according to the specific situation in a specific implementation; other unstated preset time thresholds also fall within the protection scope of the present disclosure and are not repeated here.
  • By counting the migration time of the incremental migration data, it can be determined whether the migration proceeds smoothly, and by comparing the migration time with the preset time threshold, it can be determined whether to end the data migration process. This ensures that the incremental migration data completes its migration successfully to meet the processing requirements of the business, shortening the processing time of online data migration and optimizing the system performance of the distributed database.
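  • A one-function sketch of this stopping rule (the 300-second threshold is an assumed example value, matching the 5-minute example above):

    def is_final_round(tuse_s, tpre_s=300.0):
        # When one migration pass completes within the preset threshold
        # (e.g. 5 minutes), the remaining increment is small enough that a
        # last catch-up round is run and incremental migration then stops.
        return tuse_s <= tpre_s

    print(is_final_round(238.0))  # True: run the final round, then stop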
  • FIG. 2 shows a schematic flowchart of a data migration method provided by the present disclosure, which can be applied to a data migration apparatus. As shown in FIG. 2, the method may include the following steps 210 to 240.
  • In step 210, a data migration policy is obtained.
  • The data migration policy is a policy determined by the database administrator according to data storage requirements. For example, when the storage space of the first partition in the distributed database cannot meet the data storage requirements but newly obtained data still needs to be stored in the first partition, part of the long-unused data in the first partition may be migrated to other partitions, thereby generating a data migration policy, so that the first partition obtains enough storage space to store the newly obtained data.
  • In step 220, the data to be migrated is fully migrated according to the data migration policy.
  • The data to be migrated includes all data in the k-th partition, where k is an integer greater than or equal to 1 and less than or equal to M.
  • For example, the data to be migrated is all data in the first partition of the distributed database; migrating all of it to the second partition gives the first partition free storage space, so that the first partition can store other data that urgently needs processing.
  • In some implementations, step 220 may be implemented in the following manner: determining, according to the data migration policy, the partitions to be migrated corresponding to the data to be migrated; and migrating the data in the partitions to be migrated to the target partition.
  • For example, the data migration policy is a data merging policy, i.e., the data of the first partition and the data of the second partition need to be merged to generate new data; the data in the partitions to be migrated (i.e., the first partition and the second partition) is then migrated to the target partition (e.g., the third partition). Through full migration, only all the data in the specified partitions to be migrated needs to be moved from the source group to the target partition in the target group, without migrating other, unspecified data, which increases the speed of data migration and saves migration time.
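  • A hedged sketch of applying such a policy Rk = {Pk, gs, gd} in the full-migration stage; the dictionary-based information table is an illustrative stand-in, not the patented implementation:

    partition_owner = {"P1": "g1", "P2": "g1", "P3": "g2"}

    def apply_full_migration(owner, policy):
        pk, gs, gd = policy["Pk"], policy["gs"], policy["gd"]
        # Sanity-check the policy against the data storage information table,
        # then move all of Pk's data to gd; no other data is touched.
        assert owner[pk] == gs, "policy disagrees with the information table"
        owner[pk] = gd

    apply_full_migration(partition_owner, {"Pk": "P1", "gs": "g1", "gd": "g2"})
    print(partition_owner)  # {'P1': 'g2', 'P2': 'g1', 'P3': 'g2'}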
  • In step 230, incremental migration data is determined according to the global transaction identifier value in the data storage information table.
  • In step 240, the incremental migration data is migrated.
  • It should be noted that steps 230 and 240 in this implementation are the same as steps 110 and 120 in the above implementation and are not repeated here.
  • In this implementation, fully migrating the data to be migrated according to the data migration policy allows the data in specified partitions to be migrated, achieving data migration between partitions without crossing groups.
  • Determining the incremental migration data according to the global transaction identifier value in the data storage information table and migrating it ensures that the incremental migration data can be migrated during dynamic business processing without affecting normal business handling, guaranteeing the continuity of business processing and shortening the execution time of data migration.
  • Moreover, only one data storage information table needs to be maintained to meet the needs of data migration, which reduces the requirements on hardware storage resources, saves user costs, and improves product competitiveness.
  • In some implementations, simulation data is stored in the data storage information table; processing the simulation data with the data migration method of the present disclosure and migrating the same simulation data with a conventional data migration method yields different technical performance.
  • For example, the data migration method of the present disclosure is implemented as follows.
  • According to the primary key value, the storage space of the distributed database is formatted into 128 partitions of different ranges, namely partition P1, partition P2, ..., partition P128.
  • Each partition can store 1 million pieces of data.
  • As shown in Table 1, the data range of partition P1 is [1, 1000001), the data range of partition P2 is [1000001, 2000001), ..., the data range of partition P127 is [126000001, 127000001), and the data range of partition P128 is [127000001, MAXVALUE), where MAXVALUE represents the maximum primary key value in the distributed database and can be defined when the database is created.
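  • The range-partition layout of Table 1 maps a primary key to its partition as in the following sketch (the bounds reproduce the table; the function name is illustrative):

    import bisect

    # Upper bounds of P1..P127; P128 is open-ended up to MAXVALUE.
    BOUNDS = [i * 1_000_000 + 1 for i in range(1, 128)]

    def partition_for(key):
        # bisect_right counts how many upper bounds the key has passed.
        return "P" + str(bisect.bisect_right(BOUNDS, key) + 1)

    assert partition_for(1) == "P1"            # [1, 1000001)
    assert partition_for(1_000_000) == "P1"
    assert partition_for(1_000_001) == "P2"    # [1000001, 2000001)
    assert partition_for(127_000_001) == "P128"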
  • The above 128 partitions are divided into 3 groups, denoted group g1, group g2 and group g3. Each group owns and manages several corresponding partitions in the data storage information table and, as shown in Table 2, stores the valid data in the corresponding partitions.
  • The valid data partitions of group g1 are partitions P1 to P10; those of group g2 are partitions P11 to P70; and those of group g3 are partitions P71 to P128.
  • By analyzing the data in each partition, the database administrator formulates data migration policies according to data storage requirements.
  • For example, the merge migration policy shown in Table 3 merges the data in the partitions to be migrated P1 to P10 of group g1 with the data in the partitions to be migrated P11 to P70 of group g2 and stores the result in group g3; or, the split migration policy shown in Table 4 stores the data in the partitions to be migrated P1 to P10 of group g3 into group g1 and the data in the partitions to be migrated P11 to P70 of group g3 into group g2.
  • After executing the merge migration policy in Table 3, the first merge time of the data is counted; after executing the split migration policy in Table 4, the first split time of the data is counted.
  • Then, using the same data, the data in the distributed database is migrated with a conventional data migration method.
  • For example, the database statement distributed by range(key)(g3 values less than MAXVALUE) is executed to implement the data merging of Table 3, i.e., the data in the partitions to be migrated P1 to P10 of group g1 and the data in the partitions to be migrated P11 to P70 of group g2 are merged, and the second merge time of this conventional data merging method is counted.
  • The database statement distributed by range(key)(g2 values less than(10000000), g1 values less than(70000000), g3 values less than MAXVALUE) is executed to implement the data splitting of Table 4, i.e., the data in the partitions to be migrated P1 to P10 of group g3 is stored into group g1 and the data in the partitions to be migrated P11 to P70 of group g3 is stored into group g2, and the second split time of this conventional data splitting method is counted.
  • As shown in Table 5, comparing the technical performance (e.g., merge time and split time) of the data migration method of the present disclosure with that of the conventional data migration method shows that the second split time using the conventional method is 3722 seconds and the second merge time is 1205 seconds, while the first split time using the data migration method of the present disclosure is 208 seconds and the first merge time is 238 seconds.
  • The first split time is much shorter than the second split time, and the first merge time is much shorter than the second merge time, so database performance is improved.
  • In this implementation, the data in the distributed database is migrated using the data migration method of the present disclosure, and the first split time and first merge time are obtained statistically; the data is also migrated using the conventional data migration method, and the second split time and second merge time are obtained statistically.
  • FIG. 3 shows a schematic structural diagram of a data migration apparatus provided by the present disclosure.
  • As shown in FIG. 3, the data migration apparatus may include the following modules.
  • The determining module 310 is configured to determine incremental migration data according to the global transaction identifier value in the data storage information table, where the global transaction identifier value is a value that increases monotonically as the business is processed; the migration module 320 is configured to migrate the incremental migration data.
  • In this implementation, the determining module determines the incremental migration data according to the global transaction identifier value in the data storage information table. Since the global transaction identifier value indicates the frequency of business processing, the more frequently the database processes certain business data, the faster the corresponding global transaction identifier value increases, so that the data corresponding to that value becomes incremental migration data. Because the business requires it, the migration module migrates this incremental migration data, so that data can be quickly migrated to the devices that need it, reducing data migration time, improving data migration efficiency, and improving the user experience.
  • It is worth mentioning that the modules involved in this implementation are all logical modules; in practical applications, a logical unit may be a physical unit, part of a physical unit, or a combination of multiple physical units.
  • In addition, in order to highlight the innovative part of the present disclosure, units not closely related to solving the technical problem raised by the present disclosure are not introduced in this implementation, but this does not mean that no other units exist in this implementation.
  • FIG. 4 shows a block diagram of the composition of the data migration system in the present disclosure.
  • As shown in FIG. 4, the data migration system includes an application unit 410, a control unit 420 and a service processing unit 430, where the service processing unit 430 includes N groups, each containing a control component and a data processing component; for example, group 1 includes a data processing component 432-1 and a file generation component 433-1, group 2 includes a data processing component 432-2 and a file generation component 433-2, ..., and group N includes a data processing component 432-N and a file generation component 433-N, where N is an integer greater than or equal to 1.
  • In some implementations, the application unit 410 can be implemented with a metadata management (Oracle Metadata Management, OMM) component; the database administrator (Database Administrator, DBA) can formulate a data migration policy Rk according to data storage requirements through a management page in the OMM component and send the policy to the control unit 420.
  • The control unit 420 can be implemented with a metadata server (Metadata Server, MDS). The MDS is mainly used to query the partition addresses and group addresses of the data stored in the data storage information table and then filter out the partition addresses of empty partitions that store no data. It receives and processes the data migration policy Rk sent by the OMM component, sends the policy to the service processing unit 430, receives the feedback result from the service processing unit, and forwards that result to the OMM component so that the OMM component updates the data migration policy Rk.
  • The control component 431 in the service processing unit 430 is configured to receive the decisions issued by the MDS component, forward them by multicast to the groups it manages (for example, group 1, group 2, ..., group N), and feed the processing results of each group back to the MDS component.
  • For example, on receiving the incremental data migration decision issued by the control component 431, the data processing component 432-1 in group 1 first determines the incremental migration data according to the decision and performs a data export operation on it; the exported incremental migration data is then sent to the file generation component 433-1 in group 1, so that the file generation component 433-1 can generate a data transmission file from the exported data and import the file to the target storage address; when the import operation is complete, the processing result is fed back to the control component 431.
  • FIG. 5 shows a flowchart of the working method of the data migration system in the present disclosure. Without affecting the business, the data of the specified partition in the data storage information table is processed first; then certain active data in that specified partition is processed; finally, when processing is complete, the data storage information table is updated.
  • It should be noted that the service processing unit 430 uses a distributed database for data storage. The distributed database is divided into M partitions according to the primary key value, each partition having a unique partition address, e.g., partition P1, partition P2, ..., partition PM; several partitions are then grouped into one group, e.g., into N groups, each group having a unique group address, e.g., group g1, group g2, ..., group gN; the valid data of a partition is stored only in the one group corresponding to that partition. N is an integer greater than or equal to 1, and M is an integer greater than or equal to 1. The primary key value, the global transaction identifier value, each partition address and each group address are then recorded in the data storage information table.
  • Migrating the data may include three stages, as shown in FIG. 5, comprising the following steps.
  • The first stage is full data migration, comprising steps 501 to 508.
  • In step 501, the application unit 410 sends the data migration policy formulated by the DBA according to data storage requirements to the control unit 420.
  • For example, through the management page in the OMM component, the DBA formulates the data migration policy Rk={Pk, gs, gd}, which means that the data in the k-th partition Pk in the data storage information table is migrated from its source group (gs) to the destination group (gd). The management page displays only the partition identifiers and group identifiers where valid data is located, and filters out partition or group identifiers whose data is empty.
  • In step 502, the control unit 420 forwards the data migration policy Rk to the control component 431 in the service processing unit 430.
  • It should be noted that, after receiving the data migration policy Rk, the control component 431 first parses it to obtain the information therein, for example, the partition identifier Pk where the data to be migrated is located, the source group gs, and the destination group gd. The parsed information is then distributed to each group by multicast, so that the data processing component in the corresponding group can migrate the data in its group.
  • In step 503, the service processing unit 430 processes the data to be migrated and generates a data transmission file.
  • For example, the data processing component in the service processing unit 430 queries the data storage information table to obtain the GTID list and the maximum value (maxgtid), takes the minimum value in the GTID list, and records it as the minimum full-load ID (full_min_gtid).
  • The data processing component obtains and exports the data to be migrated according to full_min_gtid and the GTID list; for example, this can be done with a database filter statement. The data to be migrated is then sent to the file generation component, which generates the data transmission file Ts.k.
  • In step 504, the service processing unit 430 reports the data export result (i.e., the data transmission file Ts.k) and full_min_gtid to the control unit 420.
  • In step 505, the control unit 420 reports the data export result to the application unit 410. At the same time, step 506 is performed.
  • In step 506, the control unit 420 issues a data import decision to the service processing unit 430.
  • In step 507, after receiving the data import decision, the control component in the service processing unit 430 forwards it to gd by multicast.
  • After receiving the data import decision, gd parses the data transmission file Ts.k and imports the data to be migrated; when the import is complete, gd feeds the data import result back to the control component.
  • In step 508, the control component in the service processing unit 430 feeds the data import result back to the control unit 420. At this point, the full data migration is complete.
  • The second stage is incremental data migration, i.e., updating the hotspot data in the specified partition, comprising steps 509 to 521.
  • In step 509, the control unit 420 issues an incremental data migration decision Hk={Pk, gs, gd, min_gtid} to the control component 431 in the service processing unit 430.
  • Here, min_gtid represents the GTID value of gs; for the first incremental data migration, min_gtid is equal to full_min_gtid.
  • In step 510, the service processing unit 430 migrates the incremental migration data according to the incremental data migration decision Hk.
  • For example, the control component 431 sends the incremental data migration decision Hk to gs by multicast. After receiving the decision, gs records the GTID list and maxgtid of the data storage information table on the current group database and takes the smallest GTID value in the list as the minimum catch-up value (chase_min_gtid_i), where i may be an integer greater than 1. The data in partition Pk whose GTID value is greater than or equal to min_gtid and less than chase_min_gtid_i is then taken as the incremental migration data and exported; for example, this can be done with a database filter statement. The incremental migration data is saved into an incremental data transmission file Tsql.k and sent to gd, so that gd can parse the file and import the incremental migration data into its local database; gs then feeds back the export result, the import result and the information {gs, chase_min_gtid_i} to the control component 431.
  • In step 511, the service processing unit 430 forwards the export result, the import result and the information {gs, chase_min_gtid_i} of the incremental migration data to the control unit 420.
  • In step 512, after receiving the export result, the import result and the information {gs, chase_min_gtid_i} fed back by the service processing unit 430, the control unit 420 updates min_gtid to chase_min_gtid_i. The control unit 420 then determines, according to the dynamic processing requirements of the business, whether the incremental data needs to be migrated again; if so, steps 509 to 511 are executed repeatedly (for example, as steps 513 to 515). Steps 513 to 515 in FIG. 5 are implemented in the same way as steps 509 to 511 and are not described again here.
  • It should be noted that, for incremental data migration from the second round onward, each group exports the corresponding incremental migration data according to the most recent chase_min_gtid_i, until incremental data migration is no longer required.
  • In step 516, the control unit 420 reports the migration result of the incremental migration data to the application unit 410.
  • In step 517, the control unit 420 counts the migration time of a single round of incremental data migration, i.e., the maximum time Tuse taken by the components, and compares Tuse with a preset time threshold Tpre (for example, 1 minute or 15 minutes); when Tuse is less than or equal to Tpre, the last incremental data migration is performed, i.e., steps 518 to 520 are executed.
  • In step 518, the control unit 420 issues the last incremental data migration decision Hk={Pk, gs, gd, chase_min_gtid_n} to the service processing unit 430.
  • In step 519, the service processing unit 430 migrates the incremental migration data according to the last incremental data migration decision Hk.
  • In step 520, the service processing unit 430 forwards the export result, the import result and the information {gs, chase_min_gtid_n} of the last incremental migration to the control unit 420. The migration of the incremental migration data then stops.
  • In step 521, the control unit 420 reports a migration completion message to the application unit 410, i.e., the migration of the incremental migration data is complete.
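  • Steps 509 to 520 amount to a catch-up loop; a hedged sketch follows, where read_active_gtids and export_range are assumed helpers (the first returns the current GTID list of gs, the second migrates rows with GTID in [lo, hi) to gd):

    import time

    def incremental_catch_up(read_active_gtids, export_range, full_min_gtid,
                             tpre_s=60.0):
        min_gtid = full_min_gtid                       # first round (step 509)
        while True:
            start = time.monotonic()
            chase_min_gtid = min(read_active_gtids())  # smallest active GTID
            export_range(min_gtid, chase_min_gtid)     # one round (step 510)
            min_gtid = chase_min_gtid                  # advance (step 512)
            if time.monotonic() - start <= tpre_s:     # Tuse <= Tpre (step 517)
                export_range(min_gtid, min(read_active_gtids()))  # last round
                return                                 # stop (step 520)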
  • The third stage is the table definition switching process, i.e., clearing the data in the source storage address (including the partition address and the group address), comprising steps 522 to 527.
  • In step 522, the control unit 420 issues a cleanup decision Ck={Pk, gs} to the service processing unit 430.
  • In step 523, the service processing unit 430 cleans up the data in the source storage address according to the cleanup decision Ck.
  • For example, after receiving the cleanup decision Ck, the service processing unit 430 forwards it to gs by multicast; according to the cleanup decision Ck, gs clears the data on partition Pk in the local data storage information table and then feeds the cleanup result back to the control component 431.
  • In step 524, the service processing unit 430 forwards the cleanup result to the control unit 420.
  • In step 525, the control unit 420 updates the data storage information table according to the migrated target storage address (e.g., the address of gd); the updated table is denoted Dt1={Pk, gd}. In step 526, the control unit 420 feeds the cleanup result, i.e., the updated data storage information table Dt1={Pk, gd}, back to the application unit 410, and in step 527, the application unit 410 receives it and displays it on the management page.
  • FIG. 6 shows a schematic diagram of migrating data in the partition to be migrated in the data migration system of the present disclosure.
  • As shown in FIG. 6, the database administrator sets a classification policy (R) and sends it to the control unit 420; the control unit 420 then parses the classification policy (R) and sends the parsing result to the service processing unit 430.
  • For example, the classification policy (R) includes the following migration rules: 1) P5: g1 to g5; 2) P6: g2 to g5; 3) P7: g2 to g3; 4) P12: g3 to g5; 5) P17-P18: g4 to g5.
  • Here, P5, P6, P7, P12 and P17-P18 all denote partitions to be migrated.
  • Since the newly created group g5 can share some storage pressure for the distributed database system, the data in the 5th partition P5 is migrated from source group g1 to target group g5 according to the first rule in the classification policy (R); the data in the 6th partition P6 is migrated from source group g2 to target group g5 according to the second rule; the data in the 7th partition P7 is migrated from source group g2 to target group g3 according to the third rule; the data in the 12th partition P12 is migrated from source group g3 to target group g5 according to the fourth rule; and the data in the 17th partition P17 and the 18th partition P18 is migrated from source group g4 to target group g5 according to the fifth rule.
  • In this implementation, migrating the data in different partitions from the source groups to the newly created group relieves the storage pressure of the source groups and improves the online business processing capability of each source group, so that the data to be migrated can be migrated automatically without interrupting business processing, speeding up data processing, improving the processing efficiency of the distributed database, and improving the user experience.
  • FIG. 7 shows a schematic diagram of migrating incremental migration data in the data migration system according to an embodiment of the present disclosure.
  • As shown in FIG. 7, the OMM component sends the data migration policy to the MDS component.
  • After receiving the policy, the MDS component parses it to obtain the incremental data migration decision, namely that the active data in the active data lists of group g1, group g2, group g3 and group g4 is to be filtered to obtain the incremental migration data, which is then migrated.
  • The MDS component then sends the incremental data migration decision to the corresponding groups g1, g2, g3 and g4, respectively.
  • It should be noted that FIG. 7 can be displayed on the management page of the OMM component, so that the database administrator can dynamically observe the data migration situation of each group.
  • As shown in FIG. 7, the maximum GTID value in all four groups is 308555713.
  • Group g1 stores multiple pieces of data with GTID values 1001, 1004, 1002 and 1003, among which the data with GTID value 1004 and the data with GTID value 1003 are active data. The active data list of group g1 (i.e., mgtidlist) therefore includes the data with GTID values 1004 and 1003, and the GTID value 1003 is the smallest value in the active data list. Accordingly, when obtaining the incremental migration data in group g1, the incremental data migration decision requires exporting the transactions corresponding to the data whose GTID value is less than 1003, i.e., the data with GTID value 1001 and the data with GTID value 1002.
  • For group g2, the data stored in the group includes multiple pieces of data with GTID values 1004, 3001, 2308 and 1003, among which the data with GTID values 3001, 2308 and 1003 are active data; the active data list of group g2 therefore includes the data with GTID values 3001, 2308 and 1003.
  • When obtaining the incremental migration data in group g2 according to the incremental data migration decision (i.e., exporting the transactions corresponding to the data whose GTID value is less than 1003), there is no data in the active data list of group g2 with a GTID value less than 1003, so the incremental migration data obtained from group g2 is empty; that is, group g2 has no incremental migration data to migrate.
  • For group g3, the data stored in the group includes multiple pieces of data with GTID values 8802, 7045, 2331 and 6665, all of which are active data; the active data list of group g3 therefore includes the data with GTID values 8802, 7045, 2331 and 6665.
  • When obtaining the incremental migration data in group g3 according to the incremental data migration decision (i.e., exporting the transactions corresponding to the data whose GTID value is less than 2331), there is no data in the active data list of group g3 with a GTID value less than 2331, so the incremental migration data obtained from group g3 is empty; that is, group g3 has no incremental migration data to migrate.
  • For group g4, no data is stored in the group; it is an empty group with no active data, so the maximum GTID value (i.e., 308555713) is taken as the minimum value of the active data. According to the incremental data migration decision (i.e., exporting the transactions corresponding to the data whose GTID value is less than 308555713), the incremental migration data obtained from group g4 is empty; that is, group g4 has no incremental migration data to migrate.
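  • The FIG. 7 example can be replayed numerically; the sketch below reproduces the GTID values from the figure and applies the rule that each group exports the transactions whose GTID is below the smallest value in its active data list, falling back to the global maximum for an empty group:

    MAX_GTID = 308555713

    groups = {
        "g1": {"stored": [1001, 1004, 1002, 1003], "active": [1004, 1003]},
        "g2": {"stored": [1004, 3001, 2308, 1003], "active": [3001, 2308, 1003]},
        "g3": {"stored": [8802, 7045, 2331, 6665], "active": [8802, 7045, 2331, 6665]},
        "g4": {"stored": [], "active": []},
    }

    for name, g in groups.items():
        cutoff = min(g["active"]) if g["active"] else MAX_GTID
        increment = [gtid for gtid in g["stored"] if gtid < cutoff]
        print(name, "->", increment)
    # g1 -> [1001, 1002]; g2, g3 and g4 -> [] (nothing left to migrate)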
  • In this implementation, filtering the active data lists of different groups in parallel according to the incremental data migration decision allows the active data in each group to be processed quickly, shortening data processing time, optimizing the system performance of the distributed database, and improving product competitiveness.
  • Moreover, graphically displaying the data storage status of each group on the management page of the OMM component makes it possible to view the data migration changes of each group intuitively, simplifying the system complexity of the distributed database and reducing operation and maintenance costs.
  • FIG. 8 shows a block diagram of an exemplary hardware architecture of a computing device capable of implementing methods and apparatuses according to embodiments of the present disclosure.
  • As shown in FIG. 8, the computing device 800 includes an input device 801, an input interface 802, a central processing unit 803, a memory 804, an output interface 805 and an output device 806.
  • The input interface 802, the central processing unit 803, the memory 804 and the output interface 805 are connected to one another through a bus 807, and the input device 801 and the output device 806 are connected to the bus 807 through the input interface 802 and the output interface 805, respectively, and thereby to the other components of the computing device 800.
  • Specifically, the input device 801 receives input information from the outside and transmits it to the central processing unit 803 through the input interface 802; the central processing unit 803 processes the input information based on computer-executable instructions stored in the memory 804 to generate output information, stores the output information temporarily or permanently in the memory 804, and then transmits it to the output device 806 through the output interface 805; the output device 806 outputs the output information to the outside of the computing device 800 for the user to use.
  • In one implementation, the computing device shown in FIG. 8 may be implemented as a network device, which may include: a memory configured to store a program; and a processor configured to run the program stored in the memory to execute the data migration method described in the above implementations.
  • The data migration method and apparatus, network device and storage medium provided by the present disclosure determine incremental migration data according to the global transaction identifier value in the data storage information table. Since the global transaction identifier value is used to indicate the frequency of business processing, the more frequently the database processes certain business data, the faster the corresponding global transaction identifier value increases, so that the data corresponding to that value becomes incremental migration data. Because the business requires it, this incremental migration data is migrated, so that data can be quickly migrated to the devices that need it, reducing data migration time, improving data migration efficiency, and improving the user experience.
  • In general, the various implementations of the present disclosure may be implemented in hardware or special-purpose circuits, software, logic, or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software executable by a controller, microprocessor or other computing device, although the present disclosure is not limited thereto.
  • Implementations of the present disclosure may be realized by a data processor of a mobile device executing computer program instructions, for example in a processor entity, or by hardware, or by a combination of software and hardware.
  • Computer program instructions may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages.
  • The block diagram of any logic flow in the figures of the present disclosure may represent program steps, or may represent interconnected logic circuits, modules and functions, or may represent a combination of program steps and logic circuits, modules and functions. Computer programs can be stored on a memory.
  • The memory may be of any type suitable for the local technical environment and may be implemented using any suitable data storage technology, such as but not limited to read-only memory (ROM), random access memory (RAM), and optical memory devices and systems (digital versatile discs (DVDs) or CDs). Computer-readable media may include non-transitory storage media.
  • The data processor may be of any type suitable for the local technical environment, such as, but not limited to, a general-purpose computer, a special-purpose computer, a microprocessor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a programmable logic device (FPGA), and processors based on multi-core processor architectures.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data migration method and apparatus, a network device and a storage medium. The data migration method includes: determining incremental migration data according to a global transaction identifier value in a data storage information table (110), where the global transaction identifier value is used to indicate the frequency of business processing; and migrating the incremental migration data (120).

Description

Data migration method, apparatus, network device and storage medium
CROSS-REFERENCE TO RELATED APPLICATION
This application claims priority to Patent Application No. 202010602593.4, filed with the China Patent Office on June 28, 2020, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to, but is not limited to, the technical field of database management.
BACKGROUND
With the continuous evolution of distributed databases, high-speed networks and virtualization technology, cloud technology has come into wide use, for example, cloud desktops, cloud storage and cloud gaming. Because the business expansion needs of various industries keep changing, business data accumulates over time, and when an existing distributed database is used to process the accumulated business data, the problem of insufficient distributed database performance arises.
SUMMARY
In one aspect, the present disclosure provides a data migration method, including: determining incremental migration data according to a global transaction identifier value in a data storage information table, where the global transaction identifier value is used to indicate the frequency of business processing; and migrating the incremental migration data.
In another aspect, the present disclosure provides a data migration apparatus, including: a determining module configured to determine incremental migration data according to a global transaction identifier value in a data storage information table, where the global transaction identifier value is used to indicate the frequency of business processing; and a migration module configured to migrate the incremental migration data.
In another aspect, the present disclosure provides a network device, including: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any one of the data migration methods of the present disclosure.
In another aspect, the present disclosure provides a storage medium storing a computer program which, when executed by a processor, implements any one of the data migration methods of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic flowchart of a data migration method of the present disclosure.
FIG. 2 is a schematic flowchart of a data migration method provided by the present disclosure.
FIG. 3 is a schematic structural diagram of a data migration apparatus provided by the present disclosure.
FIG. 4 is a block diagram of a data migration system of the present disclosure.
FIG. 5 is a flowchart of a working method of the data migration system of the present disclosure.
FIG. 6 is a schematic diagram of migrating the data in partitions to be migrated in the data migration system of the present disclosure.
FIG. 7 is a schematic diagram of migrating incremental migration data in the data migration system of the present disclosure.
FIG. 8 is a structural diagram of an exemplary hardware architecture of a computing device capable of implementing the methods and apparatuses according to the present disclosure.
DETAILED DESCRIPTION
To make the objectives, technical solutions and advantages of the present disclosure clearer, implementations of the present disclosure are described in detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the implementations of the present disclosure and the features therein may be combined with one another arbitrarily.
With the continuous evolution of distributed databases, high-speed networks and virtualization technology, cloud technology has come into wide use, for example, cloud desktops, cloud storage and cloud gaming. Because the business expansion needs of various industries keep changing, business data accumulates over time, and when an existing distributed database is used to process the accumulated business data, the problem of insufficient distributed database performance arises.
In the existing technology, this can be addressed by expanding the data storage servers or redistributing the accumulated data, where redistribution may include an offline redistribution method and an online redistribution method. However, these redistribution methods consume a large amount of time to process the data, resulting in a poor user experience.
In the same production environment, the offline redistribution method migrates the data as a whole first and then redeploys the migrated data; although this saves the processing time for newly added hotspot data, it interrupts the existing business on the specified table. The online redistribution method allows the data to be migrated as a whole without interrupting business processing, after which the migrated data is deployed. However, whichever of the above redistribution methods is used to migrate the data, a large amount of time is spent processing the overall data, resulting in low data migration efficiency; moreover, additional storage resources are needed to store intermediate temporary data, causing unnecessary waste of storage resources.
The present disclosure provides a data migration method and apparatus, a network device and a storage medium, at least to solve the problem of low migration efficiency when migrating data in a distributed database.
FIG. 1 is a schematic flowchart of a data migration method of the present disclosure, which can be applied to a data migration apparatus. As shown in FIG. 1, the method may include the following steps 110 and 120.
In step 110, incremental migration data is determined according to a global transaction identifier value in a data storage information table.
The global transaction identifier (Global Transaction Identifiers, GTID) value is used to indicate the frequency of business processing. For example, when a business is executed frequently, the global transaction identifier value corresponding to the business increases gradually with the number of executions; when the GTID value corresponding to the business exceeds a preset range, the business is a hotspot business.
In some implementations, step 110 may be implemented in the following manner: determining an active data list according to a preset identifier range and the global transaction identifier value in the data storage information table, where the active data list includes active data; and filtering the active data according to an incremental data migration decision to determine the incremental migration data, where the incremental data migration decision is a decision determined according to the dynamic processing requirements of the business.
It should be noted that the preset identifier range is variable. For the first incremental data migration, the GTID list and the maximum GTID value (maxgtid) recorded on the current group database are obtained, and the smallest GTID value in the GTID list (min_gtid) is taken as the minimum catch-up value (chase_min_gtid_1); the preset identifier range at this point is: less than or equal to min_gtid. For the second incremental data migration, chase_min_gtid_1 from the first incremental data migration is used as min_gtid, and the preset identifier range is greater than or equal to chase_min_gtid_1 and less than chase_min_gtid_2. By analogy, for the i-th incremental data migration, chase_min_gtid_i-1 from the most recent (i.e., the (i-1)-th) incremental data migration is used as min_gtid, and the preset identifier range is greater than or equal to chase_min_gtid_i-1 and less than chase_min_gtid_i, where i may be an integer greater than 1. All data whose GTID value in the data storage information table falls within the corresponding preset identifier range is then taken as the active data in the active data list.
Filtering the active data through the incremental data migration decision to determine the incremental migration data allows the incremental migration data corresponding to the business to be migrated quickly, meeting the dynamic processing requirements of the business, shortening the processing time of online data migration, and optimizing the system performance of the distributed database.
In step 120, the incremental migration data is migrated.
In some implementations, step 120 may be implemented in the following manner: obtaining the source storage address and primary key value of the incremental migration data in the data storage information table, where the source storage address includes a partition address and a group address; and migrating the incremental migration data from the source storage address to a target storage address according to the source storage address and the primary key value.
For example, by looking up the data storage information table according to the GTID value, the primary key value corresponding to the incremental migration data and the source storage address of that data (i.e., the source partition address and source group address) can be obtained. For instance, when the GTID value is 325, looking up the data storage information table finds that the corresponding primary key value is 8 and that the incremental migration data with primary key value 8 is stored in the third partition of the first group; the source storage address of the incremental migration data is therefore the third partition of the first group. The incremental migration data in the third partition of the first group is then fetched and sent to the target storage address. Alternatively, a migration file is generated from the fetched incremental migration data and sent to the target storage address.
In this implementation, the incremental migration data is determined according to the global transaction identifier value in the data storage information table. Since the global transaction identifier value indicates the frequency of business processing, the more frequently the database processes certain business data, the faster the corresponding global transaction identifier value increases, so that the data corresponding to that value becomes incremental migration data. Because the business requires it, this incremental migration data is migrated, so that data can be quickly migrated to the devices that need it, reducing data migration time, improving data migration efficiency, and improving the user experience.
The present disclosure provides another possible implementation, in which, before step 110, the method further includes: dividing the data storage space into N groups according to the primary key value in the data storage information table, each group having a unique group address; dividing each group into M partitions, each partition having a unique partition address, where all data on a partition is stored only in the one group corresponding to that partition, M is an integer greater than or equal to 1, and N is an integer greater than or equal to 1; and generating the data storage information table according to the primary key value, the global transaction identifier value, the partition addresses and the group addresses.
It should be noted that the primary key value is the unique identifier by which the distributed database can locate stored data; it can generally be set to be greater than or equal to 1 and less than or equal to the maximum number of data items the distributed database can store, and it can be used as an index value to look up data in the distributed database. For example, the data storage space is divided into 4 groups, namely group g1, group g2, group g3 and group g4, each with a unique group address. Each group can then be divided into a different number of partitions; for example, group g1 is divided into 3 partitions (partition P1, partition P2 and partition P3), group g2 into 4 partitions (partitions P4, P5, P6 and P7), group g3 into 5 partitions (partitions P8, P9, P10, P11 and P12), and group g4 into 6 partitions (partitions P13, P14, P15, P16, P17 and P18). Each partition has a unique partition address, and all data on a partition is stored only in the one group corresponding to that partition; for example, the data in partition P1 is stored only in group g1 and cannot be stored in two different groups at the same time. If group g1 and group g2 both owned the data in partition P1, that data would be truncated and would be lost when the data in partition P1 was migrated. To avoid this, all data in partition P1 is stored only in group g1, ensuring data integrity.
The present disclosure provides another possible implementation, in which, after step 120, the method further includes: clearing the data in the source storage address.
In this implementation, clearing the data in the source storage address allows the source storage address to provide storage services for other data, increasing the storage space of the distributed database and avoiding waste of storage resources.
The present disclosure provides another possible implementation, in which, after step 120, the method further includes: updating the data storage information table according to the target storage address.
For example, after the data migration is completed, the source storage address of the incremental migration data is replaced with the target storage address, so that the incremental migration data can be found quickly during the next data migration.
In this implementation, updating the data storage information table prepares for the next data migration, ensuring that the incremental migration data can be found quickly next time, improving data processing efficiency and optimizing the system performance of the distributed database.
The present disclosure provides another possible implementation, in which, after step 120, the method further includes: counting the migration time of the incremental migration data; and stopping the migration of the incremental migration data according to the migration time and a preset time threshold.
The migration time is the maximum time for the data migration apparatus to process the incremental migration data, and the preset time threshold may be a duration such as 1 minute or 5 minutes; if the migration time is less than or equal to 5 minutes, the migration of the incremental migration data is stopped. The above preset time thresholds are only examples and may be set according to the specific situation in a specific implementation; other unstated preset time thresholds also fall within the protection scope of the present disclosure and are not repeated here.
In this implementation, counting the migration time of the incremental migration data makes it possible to determine whether the migration proceeds smoothly, and comparing the migration time with the preset time threshold makes it possible to determine whether to end the data migration process, ensuring that the incremental migration data completes its migration smoothly to meet the processing requirements of the business, shortening the processing time of online data migration and optimizing the system performance of the distributed database.
FIG. 2 is a schematic flowchart of a data migration method provided by the present disclosure, which can be applied to a data migration apparatus. As shown in FIG. 2, the method may include the following steps 210 to 240.
In step 210, a data migration policy is obtained.
The data migration policy is a policy determined by the database administrator according to data storage requirements. For example, when the storage space of the first partition in the distributed database cannot meet the data storage requirements but newly obtained data still needs to be stored in the first partition, part of the long-unused data in the first partition may be migrated to other partitions, thereby generating a data migration policy, so that the first partition obtains enough storage space to store the newly obtained data.
In step 220, the data to be migrated is fully migrated according to the data migration policy.
The data to be migrated includes all data in the k-th partition, where k is an integer greater than or equal to 1 and less than or equal to M. For example, the data to be migrated is all data in the first partition of the distributed database; migrating all of it to the second partition gives the first partition free storage space, so that the first partition can store other data that urgently needs processing.
In some implementations, step 220 may be implemented in the following manner: determining, according to the data migration policy, the partitions to be migrated corresponding to the data to be migrated; and migrating the data in the partitions to be migrated to the target partition.
For example, the data migration policy is a data merging policy, i.e., the data of the first partition and the data of the second partition need to be merged to generate new data; the data in the partitions to be migrated (i.e., the first partition and the second partition) is then migrated to the target partition (e.g., the third partition). Through full migration, only all the data in the specified partitions to be migrated needs to be moved from the source group to the target partition in the target group, without migrating other, unspecified data, which increases the speed of data migration and saves migration time.
In step 230, incremental migration data is determined according to the global transaction identifier value in the data storage information table.
In step 240, the incremental migration data is migrated.
It should be noted that steps 230 and 240 in this implementation are the same as steps 110 and 120 in the above implementation and are not repeated here.
In this implementation, fully migrating the data to be migrated according to the data migration policy allows the data in specified partitions to be migrated, achieving data migration between partitions without crossing groups. Determining the incremental migration data according to the global transaction identifier value in the data storage information table and migrating it ensures that the incremental migration data can be migrated during dynamic business processing without affecting normal business handling, guaranteeing the continuity of business processing and shortening the execution time of data migration. Moreover, only one data storage information table needs to be maintained to meet the needs of data migration, which reduces the requirements on hardware storage resources, saves user costs, and improves product competitiveness.
In some implementations, simulation data is stored in the data storage information table; processing the simulation data with the data migration method of the present disclosure and migrating the same simulation data with a conventional data migration method yields different technical performance.
For example, the data migration method of the present disclosure is implemented as follows. According to the primary key value, the storage space of the distributed database is formatted into 128 partitions of different ranges, namely partition P1, partition P2, ..., partition P128, and each partition can store 1 million pieces of data. As shown in Table 1, the data range of partition P1 is [1, 1000001), the data range of partition P2 is [1000001, 2000001), ..., the data range of partition P127 is [126000001, 127000001), and the data range of partition P128 is [127000001, MAXVALUE), where MAXVALUE represents the maximum primary key value in the distributed database and can be defined when the database is created.

Table 1: Partition table
Partition name    Range
P1                [1, 1000001)
P2                [1000001, 2000001)
...               ...
P127              [126000001, 127000001)
P128              [127000001, MAXVALUE)

The above 128 partitions are divided into 3 groups, denoted group g1, group g2 and group g3. Each group owns and manages several corresponding partitions in the data storage information table and, as shown in Table 2, stores the valid data in the corresponding partitions; for example, the valid data partitions of group g1 are partitions P1 to P10, those of group g2 are partitions P11 to P70, and those of group g3 are partitions P71 to P128. By analyzing the data in each partition, the database administrator formulates data migration policies according to data storage requirements.
For example, the merge migration policy shown in Table 3 merges the data in the partitions to be migrated P1 to P10 of group g1 with the data in the partitions to be migrated P11 to P70 of group g2 and stores the result in group g3; or, the split migration policy shown in Table 4 stores the data in the partitions to be migrated P1 to P10 of group g3 into group g1 and the data in the partitions to be migrated P11 to P70 of group g3 into group g2.
After executing the merge migration policy in Table 3, the first merge time of the data is counted; after executing the split migration policy in Table 4, the first split time of the data is counted.
Table 2: Group-partition table
Group name    Valid data partitions
g1            P1-P10
g2            P11-P70
g3            P71-P128

Table 3: Merge migration policy
Figure PCTCN2021102712-appb-000001

Table 4: Split migration policy
Figure PCTCN2021102712-appb-000002

Then, using the same data as above, the data in the distributed database is migrated with a conventional data migration method. For example, the database statement distributed by range(key)(g3 values less than MAXVALUE) is executed to implement the data merging of Table 3, i.e., the data in the partitions to be migrated P1 to P10 of group g1 and the data in the partitions to be migrated P11 to P70 of group g2 are merged, and the second merge time of this conventional data merging method is counted.
The database statement distributed by range(key)(g2 values less than(10000000), g1 values less than(70000000), g3 values less than MAXVALUE) is executed to implement the data splitting of Table 4, i.e., the data in the partitions to be migrated P1 to P10 of group g3 is stored into group g1 and the data in the partitions to be migrated P11 to P70 of group g3 is stored into group g2, and the second split time of this conventional data splitting method is counted.
As shown in Table 5, comparing the technical performance (e.g., merge time and split time) of the data migration method of the present disclosure with that of the conventional data migration method shows that the second split time using the conventional method is 3722 seconds and the second merge time is 1205 seconds, while the first split time using the data migration method of the present disclosure is 208 seconds and the first merge time is 238 seconds. The first split time is much shorter than the second split time, and the first merge time is much shorter than the second merge time, so database performance is improved.

Table 5: Technical performance comparison of the two data migration methods
Figure PCTCN2021102712-appb-000003

In this implementation, the data in the distributed database is migrated using the data migration method of the present disclosure, and the first split time and first merge time are obtained statistically; the data is also migrated using the conventional data migration method, and the second split time and second merge time are obtained statistically. The comparison shows intuitively that the data processing time is greatly reduced and the performance of the database is greatly improved, enhancing the user experience.
The apparatus according to the present disclosure is described in detail below with reference to the accompanying drawings. FIG. 3 is a schematic structural diagram of a data migration apparatus provided by the present disclosure. As shown in FIG. 3, the data migration apparatus may include the following modules.
The determining module 310 is configured to determine incremental migration data according to the global transaction identifier value in the data storage information table, where the global transaction identifier value is a value that increases monotonically as the business is processed; the migration module 320 is configured to migrate the incremental migration data.
In this implementation, the determining module determines the incremental migration data according to the global transaction identifier value in the data storage information table. Since the global transaction identifier value indicates the frequency of business processing, the more frequently the database processes certain business data, the faster the corresponding global transaction identifier value increases, so that the data corresponding to that value becomes incremental migration data. Because the business requires it, the migration module migrates this incremental migration data, so that data can be quickly migrated to the devices that need it, reducing data migration time, improving data migration efficiency, and improving the user experience.
It is worth mentioning that the modules involved in this implementation are all logical modules; in practical applications, a logical unit may be a physical unit, part of a physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present disclosure, units not closely related to solving the technical problem raised by the present disclosure are not introduced in this implementation, but this does not mean that no other units exist in this implementation.
FIG. 4 is a block diagram of a data migration system of the present disclosure. As shown in FIG. 4, the data migration system includes an application unit 410, a control unit 420 and a service processing unit 430, where the service processing unit 430 includes N groups, each containing a control component and a data processing component; for example, group 1 includes a data processing component 432-1 and a file generation component 433-1, group 2 includes a data processing component 432-2 and a file generation component 433-2, ..., and group N includes a data processing component 432-N and a file generation component 433-N, where N is an integer greater than or equal to 1.
In some implementations, the application unit 410 can be implemented with a metadata management (Oracle Metadata Management, OMM) component; the database administrator (Database Administrator, DBA) can formulate a data migration policy Rk according to data storage requirements through a management page in the OMM component and send the policy to the control unit 420. The control unit 420 can be implemented with a metadata server (Metadata Server, MDS). The MDS is mainly used to query the partition addresses and group addresses of the data stored in the data storage information table and then filter out the partition addresses of empty partitions that store no data. It receives and processes the data migration policy Rk sent by the OMM component, sends the policy to the service processing unit 430, receives the feedback result from the service processing unit, and forwards that result to the OMM component so that the OMM component updates the data migration policy Rk.
The control component 431 in the service processing unit 430 is configured to receive the decisions issued by the MDS component, forward them by multicast to the groups it manages (for example, group 1, group 2, ..., group N), and feed the processing results of each group back to the MDS component. For example, on receiving the incremental data migration decision issued by the control component 431, the data processing component 432-1 in group 1 first determines the incremental migration data according to the decision and performs a data export operation on it, and then sends the exported incremental migration data to the file generation component 433-1 in group 1, so that the file generation component 433-1 can generate a data transmission file from the exported data and import the file to the target storage address; when the import operation is complete, the processing result is fed back to the control component 431.
FIG. 5 is a flowchart of a working method of the data migration system of the present disclosure. Without affecting the business, the data of the specified partition in the data storage information table is processed first; then certain active data in that specified partition is processed; finally, when processing is complete, the data storage information table is updated.
It should be noted that the service processing unit 430 uses a distributed database for data storage. The distributed database is divided into M partitions according to the primary key value, each partition having a unique partition address, e.g., partition P1, partition P2, ..., partition PM; several partitions are then grouped into one group, e.g., into N groups, each group having a unique group address, e.g., group g1, group g2, ..., group gN; the valid data of a partition is stored only in the one group corresponding to that partition. N is an integer greater than or equal to 1, and M is an integer greater than or equal to 1. The primary key value, the global transaction identifier value, each partition address and each group address are then recorded in the data storage information table.
Migrating the data may include three stages, as shown in FIG. 5, comprising the following steps.
The first stage is full data migration, comprising steps 501 to 508.
In step 501, the application unit 410 sends the data migration policy formulated by the DBA according to data storage requirements to the control unit 420.
For example, through the management page in the OMM component, the DBA formulates the data migration policy Rk={Pk, gs, gd} according to data storage requirements, which means that the data in the k-th partition Pk in the data storage information table is migrated from its source group (gs) to the destination group (gd). The management page displays only the partition identifiers and group identifiers where valid data is located, and filters out partition or group identifiers whose data is empty.
In step 502, the control unit 420 forwards the data migration policy Rk to the control component 431 in the service processing unit 430.
It should be noted that, after receiving the data migration policy Rk, the control component 431 first parses it to obtain the information therein, for example, the partition identifier Pk where the data to be migrated is located, the source group gs, and the destination group gd. The parsed information is then distributed to each group by multicast, so that the data processing component in the corresponding group can migrate the data in its group.
In step 503, the service processing unit 430 processes the data to be migrated and generates a data transmission file.
For example, the data processing component in the service processing unit 430 queries the data storage information table to obtain the GTID list and the maximum value (maxgtid), takes the minimum value in the GTID list, and records it as the minimum full-load ID (full_min_gtid). The data processing component obtains and exports the data to be migrated according to full_min_gtid and the GTID list; for example, this can be done with a database filter statement. The data to be migrated is then sent to the file generation component, which generates the data transmission file Ts.k.
In step 504, the service processing unit 430 reports the data export result (i.e., the data transmission file Ts.k) and full_min_gtid to the control unit 420.
In step 505, the control unit 420 reports the data export result to the application unit 410. At the same time, step 506 is performed.
In step 506, the control unit 420 issues a data import decision to the service processing unit 430.
In step 507, after receiving the data import decision, the control component in the service processing unit 430 forwards it to gd by multicast. After receiving the data import decision, gd parses the data transmission file Ts.k and imports the data to be migrated; when the import is complete, gd feeds the data import result back to the control component.
In step 508, the control component in the service processing unit 430 feeds the data import result back to the control unit 420. At this point, the full data migration is complete.
The second stage is incremental data migration, i.e., updating the hotspot data in the specified partition, comprising steps 509 to 521.
In step 509, the control unit 420 issues the incremental data migration decision Hk={Pk, gs, gd, min_gtid} to the control component 431 in the service processing unit 430.
Here, min_gtid represents the GTID value of gs; for the first incremental data migration, min_gtid is equal to full_min_gtid.
In step 510, the service processing unit 430 migrates the incremental migration data according to the incremental data migration decision Hk.
For example, the control component 431 sends the incremental data migration decision Hk to gs by multicast. After receiving the decision, gs records the GTID list and maxgtid of the data storage information table on the current group database and takes the smallest GTID value in the list as the minimum catch-up value (chase_min_gtid_i), where i may be an integer greater than 1. The data in partition Pk whose GTID value is greater than or equal to min_gtid and less than chase_min_gtid_i is then taken as the incremental migration data and exported; for example, this can be done with a database filter statement. The incremental migration data is saved into an incremental data transmission file Tsql.k, which is sent to gd so that gd can parse the file and import the incremental migration data into its local database. gs feeds back the export result, the import result and the information {gs, chase_min_gtid_i} of the incremental migration data to the control component 431.
In step 511, the service processing unit 430 forwards the export result, the import result and the information {gs, chase_min_gtid_i} of the incremental migration data to the control unit 420.
In step 512, after receiving the export result, the import result and the information {gs, chase_min_gtid_i} fed back by the service processing unit 430, the control unit 420 updates min_gtid to chase_min_gtid_i. The control unit 420 then determines, according to the dynamic processing requirements of the business, whether the incremental data needs to be migrated again; if so, steps 509 to 511 are executed repeatedly (for example, as steps 513 to 515). Steps 513 to 515 in FIG. 5 are implemented in the same way as steps 509 to 511 and are not described again here.
It should be noted that, for incremental data migration from the second round onward, each group exports the corresponding incremental migration data according to the most recent chase_min_gtid_i, until incremental data migration is no longer required.
In step 516, the control unit 420 reports the migration result of the incremental migration data to the application unit 410.
In step 517, the control unit 420 counts the migration time of a single round of incremental data migration, i.e., the maximum time Tuse taken by the components, and compares Tuse with a preset time threshold Tpre (for example, 1 minute or 15 minutes); when Tuse is less than or equal to Tpre, the last incremental data migration is performed, i.e., steps 518 to 520 are executed.
In step 518, the control unit 420 issues the last incremental data migration decision Hk={Pk, gs, gd, chase_min_gtid_n} to the service processing unit 430.
In step 519, the service processing unit 430 migrates the incremental migration data according to the last incremental data migration decision Hk.
In step 520, the service processing unit 430 forwards the export result, the import result and the information {gs, chase_min_gtid_n} of the last incremental migration to the control unit 420. The migration of the incremental migration data then stops.
In step 521, the control unit 420 reports a migration completion message to the application unit 410, i.e., the migration of the incremental migration data is complete.
The third stage is the table definition switching process, i.e., clearing the data in the source storage address (including the partition address and the group address), comprising steps 522 to 527.
In step 522, the control unit 420 issues a cleanup decision Ck={Pk, gs} to the service processing unit 430.
In step 523, the service processing unit 430 cleans up the data in the source storage address according to the cleanup decision Ck.
For example, after receiving the cleanup decision Ck, the service processing unit 430 forwards it to gs by multicast; according to the cleanup decision Ck, gs clears the data on partition Pk in the local data storage information table and then feeds the cleanup result back to the control component 431.
In step 524, the service processing unit 430 forwards the cleanup result to the control unit 420.
In step 525, the control unit 420 updates the data storage information table according to the migrated target storage address (e.g., the address of gd).
For example, the data of partition Pk originally stored in group gs is now recorded under the new group gd, and the updated data storage information table is denoted Dt1={Pk, gd}.
In step 526, the control unit 420 feeds the cleanup result back to the application unit 410, i.e., sends the updated data storage information table Dt1={Pk, gd} to the application unit 410.
In step 527, the application unit 410 receives the updated data storage information table Dt1={Pk, gd} and displays it on the management page.
In this implementation, without affecting business processing, the full data migration, incremental data migration and table definition switching processes make it possible to migrate the changed data, or incremental data, in the distributed data to meet the processing requirements of the business, shortening the processing time of online data migration and greatly optimizing the system performance of the distributed database. Meanwhile, during the data migration process only one data storage information table needs to be maintained, without any other intermediate temporary tables, avoiding waste of storage space.
FIG. 6 is a schematic diagram of migrating the data in partitions to be migrated in the data migration system of the present disclosure. As shown in FIG. 6, the database administrator sets a classification policy (R) and sends it to the control unit 420; the control unit 420 then parses the classification policy (R) and sends the parsing result to the service processing unit 430. For example, the classification policy (R) includes the following migration rules: 1) P5: g1 to g5; 2) P6: g2 to g5; 3) P7: g2 to g3; 4) P12: g3 to g5; 5) P17-P18: g4 to g5, where P5, P6, P7, P12 and P17-P18 all denote partitions to be migrated.
As shown in FIG. 6, since the newly created group g5 can share some storage pressure for the distributed database system, the data in the 5th partition P5 is migrated from source group g1 to target group g5 according to the first rule in the classification policy (R); the data in the 6th partition P6 is migrated from source group g2 to target group g5 according to the second rule; the data in the 7th partition P7 is migrated from source group g2 to target group g3 according to the third rule; the data in the 12th partition P12 is migrated from source group g3 to target group g5 according to the fourth rule; and the data in the 17th partition P17 and the 18th partition P18 is migrated from source group g4 to target group g5 according to the fifth rule.
In this implementation, migrating the data in different partitions from the source groups to the newly created group relieves the storage pressure of the source groups and improves the online business processing capability of each source group, so that the data to be migrated can be migrated automatically without interrupting business processing, speeding up data processing, improving the processing efficiency of the distributed database, and improving the user experience.
图7示出本公开实施方式中的数据迁移系统中对增量迁移数据进行迁移的示意图。如图7所示,OMM组件将数据迁移策略发送给MDS组件,MDS组件在接收到数据迁移策略后,通过对该数据迁移策略的解析,获得增量数据迁移决策,即需要分别对分组g1、分组g2、分组g3和分组g4中的活跃数据列表中的活跃数据进行筛选,以获得增量迁移数据,并对该增量迁移数据进行迁移。然后,MDS组件将增量数据迁移决策分别发送至对应的分组g1、分组g2、分组g3和分组g4中。
需要说明的是,图7可以显示在OMM组件的管理页面中,以方便数据库管理员动态的获得各个分组的数据迁移情况。
As shown in Fig. 7, the maximum GTID value of all four groups is 308555713. Group g1 stores data records with GTID values 1001, 1004, 1002, and 1003, among which the records with GTID values 1004 and 1003 are active data; the active data list of g1 (i.e., mgtidlist) therefore contains the records with GTID values 1004 and 1003, and 1003 is the minimum value in the active data list. Thus, when obtaining the incremental migration data of group g1, the incremental data migration decision requires exporting the transactions corresponding to data with GTID values smaller than 1003, i.e., the records with GTID values 1001 and 1002.
For group g2, the stored data includes records with GTID values 1004, 3001, 2308, and 1003, among which the records with GTID values 3001, 2308, and 1003 are active data, so the active data list of g2 contains these three records. When obtaining the incremental migration data of g2 according to the incremental data migration decision (i.e., exporting the transactions corresponding to data with GTID values smaller than 1003), group g2 holds no data with a GTID value smaller than 1003, so the incremental migration data obtained from g2 is empty; that is, g2 has no incremental migration data to migrate.
For group g3, the stored data includes records with GTID values 8802, 7045, 2331, and 6665, all of which are active data, so the active data list of g3 contains the records with GTID values 8802, 7045, 2331, and 6665. When obtaining the incremental migration data of g3 according to the incremental data migration decision (i.e., exporting the transactions corresponding to data with GTID values smaller than 2331), group g3 holds no data with a GTID value smaller than 2331, so the incremental migration data obtained from g3 is empty; that is, g3 has no incremental migration data to migrate.
Group g4 stores no data and is an empty group, so it has no active data either; in this case the maximum GTID value (i.e., 308555713) is taken as the minimum value of the active data. According to the incremental data migration decision (i.e., exporting the transactions corresponding to data with GTID values smaller than 308555713), the incremental migration data obtained from g4 is still empty; that is, g4 has no incremental migration data to migrate.
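The selection logic of Fig. 7 can be reproduced in a few lines of Python. The data values are taken from the figure; each group's export threshold is the minimum GTID of its active data list, falling back to the global maximum GTID for an empty group:

MAX_GTID = 308555713

groups = {
    "g1": {"stored": [1001, 1004, 1002, 1003], "active": [1004, 1003]},
    "g2": {"stored": [1004, 3001, 2308, 1003], "active": [3001, 2308, 1003]},
    "g3": {"stored": [8802, 7045, 2331, 6665], "active": [8802, 7045, 2331, 6665]},
    "g4": {"stored": [], "active": []},
}

for name, g in groups.items():
    threshold = min(g["active"]) if g["active"] else MAX_GTID
    increment = [gtid for gtid in g["stored"] if gtid < threshold]
    print(name, threshold, increment)
# g1 1003 [1001, 1002]   -> transactions 1001 and 1002 are exported
# g2 1003 []             -> no incremental migration data
# g3 2331 []             -> no incremental migration data
# g4 308555713 []        -> empty group, nothing to export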
In this embodiment, filtering the active data lists of the different groups in parallel according to the incremental data migration decision makes it possible to process the active data of each group quickly, which shortens data processing time, improves the system performance of the distributed database, and enhances product competitiveness. Furthermore, displaying the data storage status of each group graphically on the management page of the OMM component gives an intuitive view of the data migration changes of each group, simplifying the system complexity of the distributed database and reducing operation and maintenance costs.
It should be made clear that the present disclosure is not limited to the specific configurations and processing described in the above embodiments and shown in the figures. For convenience and brevity of description, detailed descriptions of known methods are omitted here, and for the specific working processes of the systems, modules, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
Fig. 8 is a structural diagram of an exemplary hardware architecture of a computing device capable of implementing the methods and apparatuses according to the embodiments of the present disclosure.
As shown in Fig. 8, the computing device 800 includes an input device 801, an input interface 802, a central processing unit 803, a memory 804, an output interface 805, and an output device 806. The input interface 802, the central processing unit 803, the memory 804, and the output interface 805 are interconnected through a bus 807, while the input device 801 and the output device 806 are connected to the bus 807 through the input interface 802 and the output interface 805 respectively, and thereby to the other components of the computing device 800.
Specifically, the input device 801 receives input information from outside and transfers it to the central processing unit 803 through the input interface 802; the central processing unit 803 processes the input information based on computer-executable instructions stored in the memory 804 to generate output information, stores the output information temporarily or permanently in the memory 804, and then transfers it to the output device 806 through the output interface 805; the output device 806 outputs the information outside the computing device 800 for use by the user.
In one embodiment, the computing device shown in Fig. 8 may be implemented as a network device, which may include: a memory configured to store a program, and a processor configured to run the program stored in the memory so as to perform the data migration method described in the above embodiments.
With the data migration method, apparatus, network device, and storage medium provided by the present disclosure, incremental migration data is determined according to the global transaction identifier value in the data storage information table. Since the global transaction identifier value indicates the frequency of business processing, the more frequently the database processes a given piece of business data, the faster the corresponding global transaction identifier value grows, and the data associated with that value becomes incremental migration data. When business demands require this incremental migration data to be migrated, it can be moved quickly to the device where it is needed, which reduces migration time, improves migration efficiency, and improves the user experience.
In general, the various embodiments of the present disclosure may be implemented in hardware or dedicated circuits, software, logic, or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software executable by a controller, a microprocessor, or another computing device, although the disclosure is not limited thereto.
Embodiments of the present disclosure may be implemented by a data processor of a mobile device executing computer program instructions, for example in a processor entity, by hardware, or by a combination of software and hardware. The computer program instructions may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages.
Any block diagram of a logic flow in the figures of the present disclosure may represent program steps, or interconnected logic circuits, modules, and functions, or a combination of program steps with logic circuits, modules, and functions. A computer program may be stored on a memory. The memory may be of any type suited to the local technical environment and may be implemented using any suitable data storage technology, such as, but not limited to, read-only memory (ROM), random access memory (RAM), and optical memory devices and systems (digital versatile discs (DVDs) or compact discs (CDs)). Computer-readable media may include non-transitory storage media. The data processor may be of any type suited to the local technical environment, such as, but not limited to, a general-purpose computer, a special-purpose computer, a microprocessor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a programmable logic device (FPGA), or a processor based on a multi-core processor architecture.
A detailed description of exemplary embodiments of the present disclosure has been provided above by way of exemplary and non-limiting examples. Considered together with the accompanying drawings and the claims, however, various modifications and adaptations of the above embodiments will be apparent to those skilled in the art without departing from the scope of the present disclosure. Accordingly, the proper scope of the present disclosure is to be determined according to the claims.

Claims (12)

  1. A data migration method, comprising:
    determining incremental migration data according to a global transaction identifier value in a data storage information table, wherein the global transaction identifier value is used to indicate the frequency of business processing; and
    migrating the incremental migration data.
  2. The method according to claim 1, wherein determining the incremental migration data according to the global transaction identifier value in the data storage information table comprises:
    determining an active data list according to a preset identifier range and the global transaction identifier value in the data storage information table, wherein the active data list includes active data; and
    filtering the active data according to an incremental data migration decision to determine the incremental migration data, wherein the incremental data migration decision is a decision determined according to dynamic processing demands of the business.
  3. The method according to claim 2, wherein migrating the incremental migration data comprises:
    obtaining a source storage address and a primary key value of the incremental migration data in the data storage information table, wherein the source storage address includes a partition address and a group address; and
    migrating the incremental migration data from the source storage address to a target storage address according to the source storage address and the primary key value.
  4. The method according to claim 3, wherein after migrating the incremental migration data, the method further comprises:
    clearing the data at the source storage address.
  5. The method according to claim 3, wherein after migrating the incremental migration data, the method further comprises:
    updating the data storage information table according to the target storage address.
  6. The method according to any one of claims 1 to 5, wherein after migrating the incremental migration data, the method further comprises:
    measuring a migration time of the incremental migration data; and
    stopping the migration of the incremental migration data according to the migration time and a preset time threshold.
  7. The method according to claim 1, wherein before determining the incremental migration data according to the global transaction identifier value in the data storage information table, the method further comprises:
    dividing a data storage space into N groups according to a primary key value in the data storage information table, each group having a unique group address, wherein N is an integer greater than or equal to 1;
    dividing the groups into M partitions, each partition having a unique partition address, wherein all data on a partition is stored only on the one group corresponding to that partition, and M is an integer greater than or equal to 1; and
    generating the data storage information table according to the primary key value, the global transaction identifier value, the partition addresses, and the group addresses.
  8. The method according to claim 7, wherein before determining the incremental migration data according to the global transaction identifier value in the data storage information table, the method further comprises:
    obtaining a data migration policy, wherein the data migration policy is a policy determined by a database administrator according to data storage demands; and
    performing a full migration of data to be migrated according to the data migration policy, wherein the data to be migrated includes all data in a k-th partition, and k is an integer greater than or equal to 1 and less than or equal to M.
  9. The method according to claim 8, wherein performing the full migration of the data to be migrated according to the data migration policy comprises:
    determining, according to the data migration policy, a partition to be migrated corresponding to the data to be migrated; and
    migrating the data in the partition to be migrated to a target partition.
  10. A data migration apparatus, comprising:
    a determination module configured to determine incremental migration data according to a global transaction identifier value in a data storage information table, wherein the global transaction identifier value is used to indicate the frequency of business processing; and
    a migration module configured to migrate the incremental migration data.
  11. A network device, comprising:
    one or more processors; and
    a memory having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 9.
  12. A storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.