WO2021017269A1 - Data migration method and apparatus, computer device, and storage medium - Google Patents

Data migration method and apparatus, computer device, and storage medium

Info

Publication number
WO2021017269A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
source database
business
database
attribute
Prior art date
Application number
PCT/CN2019/116706
Other languages
French (fr)
Chinese (zh)
Inventor
包晓华
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021017269A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • This application relates to the field of computers, in particular to methods, devices, computer equipment and storage media for migrating data.
  • The main purpose of this application is to provide a data migration method that solves the existing technical problem that data cannot be stably migrated from an Oracle database to a Cassandra database.
  • This application proposes a data migration method, including:
  • the source database is a database storing data to be migrated, and the source database includes a first index table
  • the target database is a database storing the migration data after migration
  • each piece of data is migrated from the source database to the target database in a preset migration manner.
  • This application also provides a data migration device, including:
  • a first obtaining module configured to obtain the business attributes of a source database, where the source database is a database storing data to be migrated, and the source database includes a first index table;
  • a dividing module configured to divide the data of the source database into a specified number of data fragments in a preset dividing manner, based on the partitions of the first index table and the business attributes of the source database;
  • a second acquisition module configured to acquire the correspondence between each data fragment and the data structure in the target database, where the target database is a database storing the migrated data;
  • a migration module configured to migrate each data fragment from the source database to the target database in a preset migration manner according to the correspondence.
  • The present application also provides a computer device, including a memory and a processor, where the memory stores a computer program and the processor implements the steps of the above method when executing the computer program.
  • The present application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above method.
  • This application transfers data between databases of different types, such as from a relational database to a non-relational database, by transferring the data in fragments, and establishes the correspondence between the data stored in the two databases according to the mapping relationship of the data structure.
  • The data division standard of the first index table is selected based on the priority ranking of the business attributes. For example, if the computer room corresponding to the source database has the highest priority and corresponds to three partitions, the source database is fragmented by those three partitions, and the data of the source database is divided into three data fragments.
  • FIG. 1 is a schematic flowchart of a data migration method according to an embodiment of the present application;
  • FIG. 2 is a schematic structural diagram of a data migration device according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of the internal structure of a computer device according to an embodiment of the present application.
  • a data migration method includes:
  • S1 Obtain business attributes of a source database, where the source database is a database storing data to be migrated, and the source database includes a first index table.
  • the business attributes of this embodiment include, but are not limited to: the computer room, network environment, link role of the service corresponding to the source database, professional company to which the service belongs, system to which the service belongs, service registration code, service domain name, etc.
  • The source database is a database storing data to be migrated, such as an Oracle relational database.
  • The index table of the database includes index item attributes, information catalogs, and address links, which makes it convenient to query data in the database. Data is retrieved by first traversing the index table and then following the address link in the index table.
  • The data in the source database is divided into fragments according to the partitions of the first index table and the business attributes, and each data fragment is marked with its business attribute so that the fragments can be managed and distinguished.
  • The above-mentioned preset dividing manner includes partitioning the first index table into sub-indexes and then dividing the data in the source database by sub-index to form the data fragments.
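The sub-index approach described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the index entries, the `room` attribute, the `address` field, and the in-memory `storage` list are all hypothetical stand-ins for an index table, a business attribute, an address link, and the source database rows.

```python
def split_index(index_table, attribute):
    """Partition index entries into sub-indexes keyed by the value of `attribute`."""
    sub_indexes = {}
    for entry in index_table:
        sub_indexes.setdefault(entry[attribute], []).append(entry)
    return sub_indexes


def load_fragment(sub_index, storage):
    """Follow each entry's address link to collect the rows of one data fragment."""
    return [storage[entry["address"]] for entry in sub_index]


# Hypothetical index table: each entry carries one business attribute and an address link.
index_table = [
    {"room": "r1", "address": 0},
    {"room": "r2", "address": 1},
    {"room": "r1", "address": 2},
]
storage = ["row0", "row1", "row2"]  # stand-in for the rows of the source database

# One data fragment per sub-index, labelled by its business attribute value.
fragments = {
    partition: load_fragment(sub_index, storage)
    for partition, sub_index in split_index(index_table, "room").items()
}
print(fragments)
```

The index is split first and the rows are fetched second, so each fragment stays labelled with the attribute value that produced it.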
  • The target database in this embodiment is a database storing the migrated data, such as the non-relational database Cassandra, so as to meet the requirement of migrating data from an Oracle database to a Cassandra database.
  • This embodiment divides the data in the source database into data fragments and transfers the fragments separately from the source database to the target database. This prevents continuous data migration from hindering normal business in the system, and prevents a transfer failure in one local time period, caused by accidental interference, from failing the entire data transfer transaction.
  • The above-mentioned correspondence is formed according to a preset mapping relationship and includes the association between each data fragment of the source database and its storage structure position in the target database.
  • This embodiment transfers data between databases of different types, such as from a relational database to a non-relational database, by transferring the data fragments in sequence, and achieves a one-to-one correspondence between the data stored in the two databases according to the mapping relationship of the data structure.
  • Step S2, dividing the data of the source database into a specified number of data fragments in a preset dividing manner based on the partitions of the first index table and the business attributes of the source database, includes:
  • S21 Obtain the priority ranking of all the business attributes of the source database, where the priority ranking orders the attributes from high priority to low priority.
  • S22 Select the partitions corresponding to the first business attributes, which are the attributes ranked before a designated sequence number in the priority ranking, as the data division standard of the first index table, where the first business attributes are included in all the business attributes of the source database and include index item attributes.
  • S24 Determine whether the data volume of the source database corresponding to each sub-index is within a preset single transmission volume.
  • The data division standard is selected based on the priority ranking of the business attributes. For example, if the computer room where the service corresponding to the source database is located has the highest priority and corresponds to three partitions, the source database is fragmented by those three partitions, and the data of the source database is divided into three data fragments.
  • The above-mentioned first business attribute can include multiple attributes at the same time, such as business attribute A and business attribute B. If business attribute A has three partitions A1, A2, and A3, and business attribute B has two partitions B1 and B2, the data division standard of the first index table has 6 partitions, sorted by priority as A1B1, A1B2, A2B1, A2B2, A3B1, and A3B2.
  • The above business attributes include the index item attributes and other attributes in the first index table, and the index item attributes are preferred for partitioning. Because an index is built to divide the data in a balanced manner, fragmenting the migration data by index item attributes yields more balanced data fragments. In this embodiment, the index table is first divided into sub-indexes, and the data referenced by each sub-index then becomes a data fragment.
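The combination of partitions from several business attributes in the A/B example is a Cartesian product. The following Python sketch reproduces that example; the function name `composite_partitions` is an illustrative assumption and does not come from the application.

```python
from itertools import product


def composite_partitions(attribute_partitions):
    """Combine the partitions of several business attributes into composite labels.

    `attribute_partitions` lists the partition labels of each attribute,
    ordered from the highest-priority attribute to the lowest.
    """
    return ["".join(combo) for combo in product(*attribute_partitions)]


# Attribute A has three partitions and attribute B has two, as in the text.
shards = composite_partitions([["A1", "A2", "A3"], ["B1", "B2"]])
print(shards)
```

Because `itertools.product` varies the last argument fastest, the higher-priority attribute A forms the outer grouping, which matches the ordering given in the text.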
  • the method includes:
  • If the data volume of a data fragment is not within the single transmission volume, a single fast transfer cannot be achieved. In that case, attributes other than the index item attributes can be added and used together with the index item attributes to fragment the data in the source database, so that the data volume of each fragment falls within the preset single transmission volume. This achieves rapid data transfer without affecting the normal business operation of the system.
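A minimal Python sketch of this refinement follows, assuming that data volume is measured as a row count and that rows are plain dictionaries. The names `shard_rows`, `shard_within_limit`, and `max_rows` are hypothetical.

```python
from collections import defaultdict


def shard_rows(rows, attributes):
    """Group rows by the tuple of values they hold for `attributes`."""
    shards = defaultdict(list)
    for row in rows:
        shards[tuple(row[a] for a in attributes)].append(row)
    return dict(shards)


def shard_within_limit(rows, index_attributes, extra_attributes, max_rows):
    """Add extra attributes one at a time until every shard fits in `max_rows`."""
    attributes = list(index_attributes)
    remaining = list(extra_attributes)
    shards = shard_rows(rows, attributes)
    # Refine only while some shard is too large and attributes remain to add.
    while remaining and any(len(s) > max_rows for s in shards.values()):
        attributes.append(remaining.pop(0))
        shards = shard_rows(rows, attributes)
    return attributes, shards


rows = [
    {"room": "r1", "company": "tech"}, {"room": "r1", "company": "tech"},
    {"room": "r1", "company": "life"}, {"room": "r1", "company": "life"},
    {"room": "r2", "company": "tech"}, {"room": "r2", "company": "life"},
]
used, shards = shard_within_limit(rows, ["room"], ["company"], max_rows=2)
print(used, {key: len(value) for key, value in shards.items()})
```

Sharding by `room` alone leaves four rows in `r1`, which exceeds the limit of two, so the sketch adds `company` and ends with four shards of at most two rows each.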
  • Step S21, obtaining the priority ranking of all the business attributes of the source database, includes:
  • S211 Collect a first number of data sets from the source database according to a preset collection rule.
  • S214 Calculate the closeness between the aggregation result sets corresponding to each business attribute of the source database and the division capability coefficient, where the aggregation result sets are the aggregation classification results of the data sets, and the number of aggregation result sets equals the number of partitions of the business attribute.
  • S215 Determine the priority ranking according to the closeness values, where the higher the closeness, the higher the priority corresponding to the aggregation result set.
  • a specified number of data sets are collected from the source database to evaluate the division capability coefficient of each business attribute, so as to realize the optimization of the fragmented data.
  • The foregoing preset collection rule includes, for example, collecting one data set every specified time period, so that the collected data sets are more representative for analysis. For example, a total of 100 data sets are collected and used as samples, and the number of aggregated result sets equals the number of data categories produced by the fragmentation. If the 100 data sets are distributed among three professional companies (technology, property insurance, and life insurance), the number of aggregated result sets is three, and the corresponding counts for technology, property insurance, and life insurance are 30, 10, and 60.
  • The above division capability coefficient is expressed as an evenly divided result set.
  • The number of data sets in the evenly divided result set equals the total number of collected data sets divided by the number of aggregated result sets, that is, 100 / 3 ≈ 33.3333.
  • The division capability coefficient is expressed as an evenly divided result set, whose size is the total number of data sets collected according to the preset collection rule divided by the number of aggregation result sets, and step S215 of determining the priority ranking according to the closeness includes:
  • S2151 Determine whether there are a third business attribute and a fourth business attribute that have the same closeness to the division capability coefficient, where the third business attribute and the fourth business attribute are included in all the business attributes of the source database.
  • The smaller the Manhattan distance, the better the division capability.
  • In the above example, the 100 data sets are divided among 3 professional companies according to the index column index1, namely technology, property insurance, and life insurance, with corresponding counts of 30, 10, and 60.
  • The Manhattan distance is |30 - 33.3| + |10 - 33.3| + |60 - 33.3| = 53.3, where 33.3 is the number of data sets in the evenly divided result set; the Manhattan distance is a sum of absolute differences.
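The Manhattan distance in the example above can be checked with a short Python sketch. It uses the distance directly as a ranking key, which is a simplification: the text applies it as a tie-breaker between attributes of equal closeness. The `computer_room` counts are invented for comparison and do not appear in the application.

```python
def manhattan_to_even_split(counts):
    """Sum of absolute deviations of each result-set size from the even share."""
    even_share = sum(counts) / len(counts)
    return sum(abs(count - even_share) for count in counts)


def rank_attributes(aggregated_counts):
    """Order attribute names from the smallest distance (best division) to the largest."""
    return sorted(
        aggregated_counts,
        key=lambda name: manhattan_to_even_split(aggregated_counts[name]),
    )


aggregated_counts = {
    "professional_company": [30, 10, 60],  # counts from the example in the text
    "computer_room": [35, 33, 32],         # hypothetical counts for comparison
}
distance = manhattan_to_even_split(aggregated_counts["professional_company"])
ranking = rank_attributes(aggregated_counts)
print(round(distance, 1), ranking)
```

An attribute whose partitions are closer to the even share of about 33.3 gets a smaller distance and is ranked first.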
  • The data structure in the target database is a multi-level data nesting structure, and step S3 of obtaining the correspondence between each data fragment and the data structure in the target database includes:
  • S31 Obtain the designated business attributes in the priority ranking that were used when the source database was divided into fragments, where the designated business attributes are included in all the business attributes of the source database.
  • S32 Establish a one-to-one mapping between the priority order of the designated business attributes and the levels of the multi-level data nesting structure in the target database, where the designated business attribute with the highest priority corresponds to the outermost layer of the multi-level data nesting structure.
  • The target database in this embodiment includes non-relational databases, such as the Cassandra database.
  • The data structure of the Cassandra database is built on the primary key.
  • The data fragments determined in the data to be migrated are used as the primary key reference of the Cassandra database.
  • The primary key defines how the data is organized.
  • The data in Cassandra is stored nested in the order of the primary key.
  • If the primary key is K1, K2, K3, ..., this can be understood as K1 data nesting K2 data, and K2 data nesting K3 data. This embodiment maps the priority order of the business attributes to the primary key sequence in Cassandra.
  • The data fragments corresponding to the business attributes with high priority are mapped to the outermost nesting layer of the data in Cassandra.
  • The file data is written in order, which improves the efficiency of data writing and facilitates the retrieval and management of data.
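The mapping from attribute priority to nesting level can be pictured with an in-memory analogy in Python. It only mirrors the K1, K2, K3 nesting described above with plain dictionaries and does not model Cassandra's actual storage engine; `nest_by_priority` and the sample rows are hypothetical.

```python
def nest_by_priority(rows, ordered_attributes):
    """Build a nested dict whose outermost level is the highest-priority attribute."""
    root = {}
    for row in rows:
        level = root
        # Descend through one dict level per attribute except the last.
        for attribute in ordered_attributes[:-1]:
            level = level.setdefault(row[attribute], {})
        # The last attribute keys the list of rows at the innermost level.
        level.setdefault(row[ordered_attributes[-1]], []).append(row)
    return root


rows = [
    {"k1": "a", "k2": "x", "value": 1},
    {"k1": "a", "k2": "y", "value": 2},
    {"k1": "b", "k2": "x", "value": 3},
]
nested = nest_by_priority(rows, ["k1", "k2"])
print(nested)
```

Here `k1` plays the role of the highest-priority business attribute and forms the outermost layer, with `k2` nested inside it.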
  • Step S4, migrating each data fragment from the source database to the target database in a preset migration manner according to the correspondence, includes:
  • S43 Store the data to be migrated in the cache server, and convert the data format.
  • This embodiment migrates data in orderly batches by fragmenting the data.
  • Each batch can be performed independently without mutual dependence, and the migration volume of each batch matches the single transmission volume supported by the system. If a single migration fails, only the failed transmission needs to be repeated, without affecting the overall data migration. Migrating in batches also allows the idle time fragments left by the system's other business to be used for data migration, which improves the efficiency of the system in processing transactions.
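The independence of batches means a failure only needs a retry of the batch that failed. A minimal Python sketch follows, assuming a caller-supplied `migrate_one` callable and a fixed retry limit; both are illustrative choices that the application does not specify.

```python
def migrate_in_batches(shards, migrate_one, max_attempts=3):
    """Migrate each shard independently and retry only the shard that failed."""
    failed = []
    for shard_id, rows in shards.items():
        for attempt in range(1, max_attempts + 1):
            try:
                migrate_one(shard_id, rows)
                break  # this shard is done; other shards are unaffected
            except Exception:
                if attempt == max_attempts:
                    failed.append(shard_id)
    return failed


attempts = {}


def flaky_migrate(shard_id, rows):
    """Simulated transfer that fails once for shard A2B1, then succeeds."""
    attempts[shard_id] = attempts.get(shard_id, 0) + 1
    if shard_id == "A2B1" and attempts[shard_id] == 1:
        raise ConnectionError("simulated transient failure")


failed = migrate_in_batches({"A1B1": [1, 2], "A2B1": [3]}, flaky_migrate)
print(failed, attempts)
```

Only `A2B1` is attempted twice; `A1B1` is transferred once and is not repeated.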
  • This embodiment judges whether the current business is at a low period or a peak period by identifying the operating load status of the system.
  • If the business flow is within a preset threshold, the business is in a low period, and the data migration thread is started to perform data migration; if the business flow is not within the preset threshold, the business is in a peak period, and the data migration thread is suspended to stop the data migration. This supports controlling migration tasks according to business attributes, ensures that business data is migrated completely by category, and avoids business peaks. After the data fragments are formed in this embodiment, threads can be started to begin data migration.
  • The data is queried and cut out from the Oracle database according to the fragmentation information, the resulting data is stored in the cache server, converted in format, and then injected into the Cassandra database; in this way, each data fragment is processed in stages by a single thread. Different data fragments can also be allocated to different threads for parallel processing to improve the efficiency of fragment migration, but the number of threads running in parallel needs to be reasonably controlled to avoid overloading the databases at both ends.
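The staged, load-aware migration described above might be sketched in Python as follows. This is a sketch under stated assumptions, not the patented implementation: the callables `fetch`, `convert`, `write`, and `current_load`, the dictionary used as a cache, and the `max_workers` cap are hypothetical stand-ins for the Oracle query, format conversion, Cassandra write, load probe, cache server, and thread limit. The load is checked only once before starting, whereas the text also suspends a running thread during peak periods.

```python
from concurrent.futures import ThreadPoolExecutor


def migrate_shard(shard_id, fetch, cache, convert, write):
    """Run the three stages for one shard on a single worker thread."""
    cache[shard_id] = fetch(shard_id)                        # stage 1: query and cache
    converted = [convert(row) for row in cache[shard_id]]    # stage 2: convert format
    write(shard_id, converted)                               # stage 3: inject into target
    return shard_id


def migrate_parallel(shard_ids, fetch, convert, write,
                     current_load, load_threshold, max_workers=2):
    """Start migration only in a low period, with a bounded number of threads."""
    if current_load() > load_threshold:
        return []  # peak period: do not start migration threads
    cache = {}  # stand-in for the cache server
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(migrate_shard, shard_id, fetch, cache, convert, write)
            for shard_id in shard_ids
        ]
        return [future.result() for future in futures]


source = {"s1": [("a", 1)], "s2": [("b", 2)]}  # stand-in for the source database
target = {}                                     # stand-in for the target database


def to_document(row):
    """Hypothetical format conversion from a tuple to a keyed record."""
    return {"key": row[0], "value": row[1]}


done = migrate_parallel(list(source), source.__getitem__, to_document,
                        target.__setitem__, lambda: 10, load_threshold=50)
skipped = migrate_parallel(list(source), source.__getitem__, to_document,
                           {}.__setitem__, lambda: 90, load_threshold=50)
print(done, skipped, target)
```

Capping `max_workers` is one way to keep parallel fragments from overloading the databases at both ends, which is the concern raised in the text.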
  • a data migration device includes:
  • the first obtaining module 1 is configured to obtain the business attributes of a source database, where the source database is a database storing data to be migrated, and the source database includes a first index table.
  • the business attributes of this embodiment include, but are not limited to: the computer room, network environment, link role of the service corresponding to the source database, professional company to which the service belongs, system to which the service belongs, service registration code, service domain name, etc.
  • The source database is a database storing data to be migrated, such as an Oracle relational database.
  • The index table of the database includes index item attributes, information catalogs, and address links, which makes it convenient to query data in the database. Data is retrieved by first traversing the index table and then following the address link in the index table.
  • The dividing module 2 is configured to divide the data of the source database into a specified number of data fragments in a preset dividing manner, based on the partitions of the first index table and the business attributes of the source database.
  • The data in the source database is divided into fragments according to the partitions of the first index table and the business attributes, and each data fragment is marked with its business attribute so that the fragments can be managed and distinguished.
  • The above-mentioned preset dividing manner includes partitioning the first index table into sub-indexes and then dividing the data in the source database by sub-index to form the data fragments.
  • The second acquisition module 3 is configured to acquire the correspondence between each data fragment and the data structure in the target database, where the target database is a database storing the migrated data.
  • The target database in this embodiment is a database storing the migrated data, such as the non-relational database Cassandra, so as to meet the requirement of migrating data from an Oracle database to a Cassandra database.
  • This embodiment divides the data in the source database into data fragments and transfers the fragments separately from the source database to the target database. This prevents continuous data migration from hindering normal business in the system, and prevents a transfer failure in one local time period, caused by accidental interference, from failing the entire data transfer transaction.
  • The above-mentioned correspondence is formed according to a preset mapping relationship and includes the association between each data fragment of the source database and its storage structure position in the target database.
  • The migration module 4 is configured to migrate each data fragment from the source database to the target database in a preset migration manner according to the correspondence.
  • This embodiment transfers data between databases of different types, such as from a relational database to a non-relational database, by transferring the data fragments in sequence, and achieves a one-to-one correspondence between the data stored in the two databases according to the mapping relationship of the data structure.
  • the dividing module 2 includes:
  • The first obtaining sub-module is configured to obtain the priority ranking of all the business attributes of the source database, where the priority ranking orders the attributes from high priority to low priority.
  • The selection sub-module is configured to select the partitions corresponding to the first business attributes, which are the attributes ranked before a designated sequence number in the priority ranking, as the data division standard of the first index table, where the first business attributes are included in all the business attributes of the source database and include index item attributes.
  • The first division sub-module is configured to divide the first index table into sub-indexes corresponding to each partition according to the data division standard of the first index table.
  • The first judging sub-module is configured to judge whether the data volume of the source database corresponding to each of the sub-indexes is within a preset single transmission volume.
  • The second division sub-module is configured to, if yes, divide the data of the source database into a first specified number of first data fragments according to the sub-indexes, where the first specified number is the number of partitions corresponding to the first business attribute.
  • The data division standard is selected based on the priority ranking of the business attributes. For example, if the computer room where the service corresponding to the source database is located has the highest priority and corresponds to three partitions, the source database is fragmented by those three partitions, and the data of the source database is divided into three data fragments.
  • The above-mentioned first business attribute can include multiple attributes at the same time, such as business attribute A and business attribute B. If business attribute A has three partitions A1, A2, and A3, and business attribute B has two partitions B1 and B2, the data division standard of the first index table has 6 partitions, sorted by priority as A1B1, A1B2, A2B1, A2B2, A3B1, and A3B2.
  • The above business attributes include the index item attributes and other attributes in the first index table, and the index item attributes are preferred for partitioning. Because an index is built to divide the data in a balanced manner, fragmenting the migration data by index item attributes yields more balanced data fragments. In this embodiment, the index table is first divided into sub-indexes, and the data referenced by each sub-index then becomes a data fragment.
  • the division module 2 includes:
  • the second business attribute includes at least one attribute.
  • The third division sub-module is configured to divide the data of the source database into a second specified number of second data fragments according to the sub-indexes and the partitions corresponding to the second business attribute, so that the data volume of each second data fragment is within the preset single transmission volume, where the second specified number is the product of the number of partitions of the first business attribute and the number of partitions of the second business attribute.
  • If the data volume of a data fragment is not within the single transmission volume, a single fast transfer cannot be achieved. In that case, attributes other than the index item attributes can be added and used together with the index item attributes to fragment the data in the source database, so that the data volume of each fragment falls within the preset single transmission volume. This achieves rapid data transfer without affecting the normal business operation of the system.
  • the obtaining sub-module includes:
  • the collection unit is configured to collect a first number of data sets from the source database according to a preset collection rule.
  • The acquisition unit is configured to acquire the data volume of a single migration of the business system.
  • The obtaining unit is configured to obtain the division capability coefficient by dividing the data volume of the single migration by the first number.
  • The calculation unit is configured to calculate the closeness between the aggregation result sets corresponding to each business attribute of the source database and the division capability coefficient, where the aggregation result sets are the aggregation classification results of the data sets, and the number of aggregation result sets equals the number of partitions of the business attribute.
  • The determining unit is configured to determine the priority ranking according to the closeness values, where the higher the closeness, the higher the priority corresponding to the aggregation result set.
  • a specified number of data sets are collected from the source database to evaluate the division capability coefficient of each business attribute, so as to realize the optimization of the fragmented data.
  • The foregoing preset collection rule includes, for example, collecting one data set every specified time period, so that the collected data sets are more representative for analysis. For example, a total of 100 data sets are collected and used as samples, and the number of aggregated result sets equals the number of data categories produced by the fragmentation. If the 100 data sets are distributed among three professional companies (technology, property insurance, and life insurance), the number of aggregated result sets is three, and the corresponding counts for technology, property insurance, and life insurance are 30, 10, and 60.
  • The above division capability coefficient is expressed as an evenly divided result set.
  • The number of data sets in the evenly divided result set equals the total number of collected data sets divided by the number of aggregated result sets, that is, 100 / 3 ≈ 33.3333.
  • The division capability coefficient is expressed as an evenly divided result set, whose size is the total number of data sets collected according to the preset collection rule divided by the number of aggregation result sets, and the determining unit includes:
  • The first judgment subunit is configured to judge whether there are a third business attribute and a fourth business attribute that have the same closeness to the division capability coefficient, where the third business attribute and the fourth business attribute are included in all the business attributes of the source database.
  • The acquiring subunit is configured to, if so, acquire a first Manhattan distance of the third business attribute relative to the number of data sets in the evenly divided result set, and a second Manhattan distance of the fourth business attribute relative to the number of data sets in the evenly divided result set.
  • The second judgment subunit is configured to judge whether the first Manhattan distance is greater than the second Manhattan distance.
  • The sorting subunit is configured to, if yes, rank the fourth business attribute corresponding to the second Manhattan distance ahead of the third business attribute corresponding to the first Manhattan distance in the priority ranking.
  • The smaller the Manhattan distance, the better the division capability.
  • In the above example, the 100 data sets are divided among 3 professional companies according to the index column index1, namely technology, property insurance, and life insurance, with corresponding counts of 30, 10, and 60.
  • The Manhattan distance is |30 - 33.3| + |10 - 33.3| + |60 - 33.3| = 53.3, where 33.3 is the number of data sets in the evenly divided result set; the Manhattan distance is a sum of absolute differences.
  • The data structure in the target database is a multi-level data nesting structure, and the second acquisition module 3 includes:
  • The second acquisition sub-module is configured to acquire the designated business attributes in the priority ranking that were used when the source database was divided into fragments, where the designated business attributes are included in all the business attributes of the source database.
  • The mapping sub-module is configured to establish a one-to-one mapping between the priority order of the designated business attributes and the levels of the multi-level data nesting structure in the target database, where the designated business attribute with the highest priority corresponds to the outermost layer of the multi-level data nesting structure.
  • The target database in this embodiment includes non-relational databases, such as the Cassandra database.
  • The data structure of the Cassandra database is built on the primary key.
  • The data fragmentation point determined in the data to be migrated is used as the primary key reference of the Cassandra database.
  • The primary key defines how the data is organized.
  • The data in Cassandra is stored in the order of the primary key.
  • If the primary key is K1, K2, K3, ..., this can be understood as K1 data nesting K2 data, and K2 data in turn nesting K3 data.
  • This embodiment maps the priority order of the business attributes to the primary key sequence in Cassandra.
  • The data fragments corresponding to the business attributes with high priority are mapped to the outermost nesting layer of the data in Cassandra.
  • The file data is written in order, which improves the efficiency of data writing and facilitates the retrieval and management of data.
  • the migration module 4 includes:
  • the second judgment sub-module is used to judge whether the business flow at the current moment is within a preset threshold.
  • the start sub-module is used to start a preset migration thread if it is, and search for data to be migrated from the source database.
  • the storage sub-module is used to store the data to be migrated in the cache server and convert the data format.
  • the running sub-module is used to run the preset migration thread according to the preset thread mode, and sequentially inject the data to be migrated into the target database according to the mode of fragmented data.
  • This embodiment migrates data in ordered batches by fragmenting the data.
  • Each batch can be performed independently, without mutual dependence, and the migration volume of each batch matches the single transmission volume supported by the system. If a single migration fails, only the failed transmission needs to be repeated, without affecting the overall data migration. Migrating in batches also lets the system use the idle time between its other business operations to complete the data migration, which improves the efficiency with which the system processes transactions.
  • This embodiment judges whether the current business is at a low period or a peak period by identifying the operating load status of the system.
  • If the business flow is within a preset threshold, the business is in a low period, and the data migration thread is started to perform data migration; if the business flow is not within the preset threshold, the business is in a peak period, and the data migration thread is suspended to halt data migration. This supports controlling migration tasks according to business attributes, ensures that business data is migrated completely by category, and avoids business peak periods. Once the fragmented data has been formed in this embodiment, threads can be started to begin data migration.
  • the data is queried and cut out from the oracle database according to the fragmentation information, the cut data is stored in the cache server, converted to the target format, and then injected into the cassandra database; in this way, each piece of fragmented data is processed in stages by a single thread.
  • Different pieces of fragmented data can also be allocated to different threads for parallel processing to improve the efficiency of fragment migration, but the number of threads running in parallel needs to be controlled to avoid overloading the databases at both ends.
  • an embodiment of the present application also provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a readable storage medium and an internal memory.
  • the readable storage medium stores an operating system, computer readable instructions, and a database.
  • the above-mentioned readable storage medium includes a non-volatile readable storage medium and a volatile readable storage medium.
  • the memory provides an environment for the operation of the operating system and computer readable instructions in the non-volatile storage medium.
  • the database of the computer equipment is used to store data such as migration data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • when executed, the computer-readable instructions carry out the processes of the above-mentioned method embodiments.
  • FIG. 3 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • An embodiment of the present application further provides a computer-readable storage medium on which computer-readable instructions are stored.
  • the processes of the foregoing method embodiments are executed.
  • the above-mentioned readable storage medium includes non-volatile readable storage medium and volatile readable storage medium.
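The batch-wise migration behaviour described in the list above (start only while business flow is within the threshold, move one fragment at a time through a cache, and repeat only the failed transmission) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `migrate_fragments`, `current_load`, `read_fragment`, `write_fragment`, and `max_retries` are hypothetical names, the format conversion is a placeholder copy, and real database clients are replaced by plain callables.

```python
def migrate_fragments(fragment_ids, current_load, load_threshold,
                      read_fragment, write_fragment, max_retries=3):
    """Migrate fragments one by one; return (migrated, deferred) fragment ids.

    Assumption: a load above load_threshold means a business peak period,
    so the fragment is deferred to a later run instead of being migrated.
    """
    migrated, deferred = [], []
    for fragment_id in fragment_ids:
        if current_load() > load_threshold:
            deferred.append(fragment_id)  # peak period: do not migrate now
            continue
        for _attempt in range(max_retries):
            try:
                cached = read_fragment(fragment_id)        # query source, hold in cache
                converted = [dict(row) for row in cached]  # placeholder format conversion
                write_fragment(fragment_id, converted)     # inject into target
                migrated.append(fragment_id)
                break
            except OSError:
                continue  # repeat only this failed transmission
        else:
            deferred.append(fragment_id)  # still failing after max_retries
    return migrated, deferred
```

Because each fragment is handled independently, a deferred or failed fragment does not roll back fragments that were already written; the caller can simply pass the deferred ids to a later run.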

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data migration method and apparatus, a computer device, and a storage medium. The data migration method comprises: acquiring a service attribute of a source database, wherein the source database stores data to be migrated and the source database comprises a first index table (S1); dividing, according to a partition of the first index table and the service attribute, the data in the source database into a specified number of data slices by using a preconfigured division method (S2); acquiring a correspondence between each data slice and a data structure in a target database (S3); and migrating, according to the correspondence, each data slice from the source database to the target database by using a preconfigured migration method (S4). By using the data slice migration method, data is migrated between databases of different types, for example, from a relational database to a non-relational database. A correspondence between data stored in two databases is established according to a mapping relationship between data structures.

Description

Method, apparatus, computer device, and storage medium for migrating data
This application claims priority to Chinese patent application No. 201910696304.9, filed with the Chinese Patent Office on July 30, 2019 and entitled "Method, apparatus, computer device, and storage medium for migrating data", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of computers, and in particular to a method, an apparatus, a computer device, and a storage medium for migrating data.
Background
Owing to the sharp increase in the volume of data called within the system, the original relational database can no longer meet business needs in terms of read/write efficiency and storage hardware and software costs, so a new non-relational database has been selected, which requires switching data between the two databases. In production, however, it must be ensured that no valid data is lost and that platform users notice nothing. The industry mostly deals with data migration between databases of the same type; there is no migration scheme for databases whose data structures differ substantially, for example from an oracle database to a cassandra database. If the migration from oracle to cassandra has to be completed continuously in one pass, the old and new systems carry heavy network IO for a long period, which interferes with the system's normal business. If the data is simply migrated in small fragments of equal size, without carrying the business attributes of the data, then neither the differing importance of the data to the business nor the format differences between the old and new databases are taken into account; large amounts of data are written randomly to the new database and read randomly from the old one, which greatly degrades migration performance. A safe and stable data source switching scheme is therefore needed: one that completes the migration as a single continuous operation, transactional in operation and continuous in time, that does not interfere with the system's normal business, and that prevents a minor failure partway through from causing the whole transactional migration to fail.
Technical problem
The main purpose of this application is to provide a method for migrating data, aiming to solve the technical problem that stable data migration from an oracle database to a cassandra database cannot currently be achieved.
Technical solution
This application proposes a method for migrating data, including:
Acquiring business attributes of a source database, wherein the source database is a database storing the data to be migrated, and the source database includes a first index table;
Dividing the data of the source database into a specified number of pieces of fragmented data in a preset division manner, according to the partitions of the first index table and the business attributes of the source database;
Acquiring the correspondence between each piece of fragmented data and the data structure in a target database, wherein the target database is a database storing the data after migration;
Migrating each piece of fragmented data from the source database to the target database in a preset migration manner according to the correspondence.
This application also provides an apparatus for migrating data, including:
A first acquisition module, configured to acquire business attributes of a source database, wherein the source database is a database storing the data to be migrated, and the source database includes a first index table;
A division module, configured to divide the data of the source database into a specified number of pieces of fragmented data in a preset division manner, according to the partitions of the first index table and the business attributes of the source database;
A second acquisition module, configured to acquire the correspondence between each piece of fragmented data and the data structure in a target database, wherein the target database is a database storing the data after migration;
A migration module, configured to migrate each piece of fragmented data from the source database to the target database in a preset migration manner according to the correspondence.
This application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program and the processor implements the steps of the above method when executing the computer program.
This application also provides a computer-readable storage medium on which a computer program is stored, wherein the steps of the above method are implemented when the computer program is executed by a processor.
Beneficial effects
By transferring fragmented data, this application transfers data between databases of different types, for example from a relational database to a non-relational database, and establishes the correspondence between the data stored in the two databases according to the mapping between their data structures. The data division standard of the first index table is selected by ranking the business attributes by priority. For example, if the machine room hosting the service corresponding to the source database has the highest priority and corresponds to three partitions, the source database is fragmented by those three partitions, and its data is divided into three pieces of fragmented data. A specified number of data sets is collected from the source database to evaluate the division capability coefficient of each business attribute, so as to optimize the fragmented data. Index item attributes near J = N/K are preferred; when the J values are equal, the division capability is further analyzed using the Manhattan distance.
Description of the drawings
FIG. 1 is a schematic flowchart of a method for migrating data according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an apparatus for migrating data according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the internal structure of a computer device according to an embodiment of the present application.
Best mode of the invention
Referring to FIG. 1, a method for migrating data according to an embodiment of the present application includes:
S1: Acquire business attributes of a source database, wherein the source database is a database storing the data to be migrated, and the source database includes a first index table.
The business attributes of this embodiment include, but are not limited to: the machine room, network environment, and link role of the service corresponding to the source database, the professional company to which the service belongs, the system to which the service belongs, the service registration code, the service domain name, and so on. The source database is a database storing the data to be migrated, such as an oracle relational database. The index table of the database includes index item attributes, an information directory, and address links, which facilitates querying data in the database: the index table is traversed first, and the data is then fetched through the address links in the index table.
S2: According to the partitions of the first index table and the business attributes of the source database, divide the data of the source database into a specified number of pieces of fragmented data in a preset division manner.
This embodiment fragments the data in the source database according to the partitions of the first index table and the business attributes, and labels each piece of fragmented data with its business attributes so that the pieces can be managed and distinguished. The preset division manner includes partitioning the first index table into sub-indexes and using the sub-indexes to fragment the data in the source database, forming the fragmented data.
S3: Acquire the correspondence between each piece of fragmented data and the data structure in a target database, wherein the target database is a database storing the data after migration.
The target database of this embodiment is a database storing the data after migration, such as the non-relational database cassandra, so that data can be migrated from an oracle database to a cassandra database. This embodiment fragments the data in the source database and transfers the pieces of fragmented data separately from the source database to the target database. This prevents a continuous migration from obstructing normal business in the system, and prevents a transfer failure in a local time period, caused by incidental interference, from invalidating the entire data transfer transaction. The correspondence is formed according to a preset mapping and includes the association between each piece of fragmented data of the source database and its storage structure position in the target database.
S4: According to the correspondence, migrate each piece of fragmented data from the source database to the target database in a preset migration manner.
By transferring the pieces of fragmented data one after another, this embodiment transfers data between databases of different types, for example from a relational database to a non-relational database, and establishes a one-to-one correspondence between the data stored in the two databases according to the mapping between their data structures.
Further, step S2 of dividing the data of the source database into a specified number of pieces of fragmented data in a preset division manner, according to the partitions of the first index table and the business attributes of the source database, includes:
S21: Acquire the priority ranking of all the business attributes of the source database, wherein the priority ranking orders the attributes from the highest priority level to the lowest.
S22: Select, from the priority ranking, the partitions corresponding to the first business attribute ranked before a designated sequence number as the data division standard of the first index table, wherein the first business attribute is included in all the business attributes of the source database and includes index item attributes.
S23: According to the data division standard of the first index table, divide the first index table into sub-indexes, one for each partition.
S24: Judge whether the data volume of the source database corresponding to each sub-index is within a preset single transmission volume.
S25: If yes, divide the data of the source database into a first specified number of pieces of first fragmented data according to the sub-indexes, wherein the first specified number is the number of partitions corresponding to the first business attribute.
This embodiment selects the data division standard by ranking the business attributes by priority. For example, if the machine room hosting the service corresponding to the source database has the highest priority and corresponds to three partitions, the source database is fragmented by those three partitions, and its data is divided into three pieces of fragmented data. The first business attribute may comprise several attributes at once. For example, if it comprises business attribute A with three partitions A1, A2, and A3 and business attribute B with two partitions B1 and B2, the data division standard of the first index table has 6 partitions, ordered by priority as A1B1, A1B2, A2B1, A2B2, A3B1, and A3B2. The business attributes include the index item attributes in the first index table as well as other attributes, and index item attributes are preferred for partitioning: because indexes are built on the principle of dividing data evenly, fragmenting the data to be migrated by index item attributes yields better-balanced fragments. In this embodiment, the index table is first divided into sub-indexes, and each sub-index then leads its corresponding data to form a piece of fragmented data.
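The A/B example above amounts to a Cartesian product of the partition lists taken in priority order. The following minimal sketch illustrates that reading; the function names and the representation of index rows as dictionaries are assumptions for illustration and do not appear in the application.

```python
from collections import defaultdict
from itertools import product


def composite_partitions(attributes):
    """Combine partition labels; the highest-priority attribute varies slowest.

    attributes: list of (attribute_name, [partition labels]) ordered from
    highest to lowest priority.
    """
    partition_lists = [partitions for _name, partitions in attributes]
    return ["".join(combo) for combo in product(*partition_lists)]


def split_into_sub_indexes(index_rows, attribute_names):
    """Group index rows into sub-indexes keyed by their combined partition label."""
    sub_indexes = defaultdict(list)
    for row in index_rows:
        key = "".join(row[name] for name in attribute_names)
        sub_indexes[key].append(row)
    return dict(sub_indexes)


labels = composite_partitions([("A", ["A1", "A2", "A3"]), ("B", ["B1", "B2"])])
print(labels)
```

With attribute A listed first, the labels come out in the same order as in the text (A1B1, A1B2, A2B1, A2B2, A3B1, A3B2).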
Further, after step S24 of judging whether the data volume of the source database corresponding to each sub-index is within the preset single transmission volume, the method includes:
S241: If the data volume corresponding to each sub-index is not within the preset single transmission volume, add the partitions corresponding to a second business attribute, wherein the second business attribute is an attribute, among all the business attributes of the source database, other than the index item attributes, and there is at least one second business attribute.
S242: According to the sub-indexes and the partitions corresponding to the second business attribute, divide the data of the source database into a second specified number of pieces of second fragmented data, so that the data volume of each piece of second fragmented data is within the preset single transmission volume, wherein the second specified number is the product of the number of partitions of the first business attribute and the number of partitions of the second business attribute.
In this embodiment, if the fragments produced by the index item attributes exceed the single transmission volume, they cannot be transferred quickly in a single pass. Attributes other than the index item attributes can then be added and used together with the index item attributes to fragment the data in the source database, so that the data volume of each fragment falls within the preset single transmission volume. This allows the data to be transferred quickly without affecting the business running normally in the system.
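Steps S24 to S242 can be read as a refinement loop: fragment by the index item attributes first and add further attributes only while some fragment is still too large. The sketch below illustrates this under stated assumptions: data volume is measured as a row count, rows are dictionaries, and `fragment_within_limit` and its parameters are hypothetical names. Note that it only creates fragments for partition combinations that actually occur in the data, so the count can be lower than the product stated in S242.

```python
from collections import defaultdict


def fragment(rows, attribute_names):
    """Group rows by the tuple of their values for the given attributes."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[name] for name in attribute_names)].append(row)
    return dict(groups)


def fragment_within_limit(rows, index_attributes, extra_attributes, max_rows):
    """Add extra attributes one at a time until every fragment fits max_rows."""
    used = list(index_attributes)
    fragments = fragment(rows, used)
    for extra in extra_attributes:
        if all(len(part) <= max_rows for part in fragments.values()):
            break
        used.append(extra)
        fragments = fragment(rows, used)
    return used, fragments


combos = [(room, company) for room in ("r1", "r2") for company in ("tech", "life")] * 2
rows = [{"id": i, "room": room, "company": company}
        for i, (room, company) in enumerate(combos)]
used, fragments = fragment_within_limit(rows, ["room"], ["company"], max_rows=3)
```

In this small example each machine room alone holds four rows, which exceeds the limit of three, so the company attribute is added and the data ends up in four fragments of two rows each.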
Further, step S21 of acquiring the priority ranking of all the business attributes of the source database includes:
S211: Collect a first number of data sets from the source database according to a preset collection rule.
S212: Acquire the data volume that the service system migrates in a single pass.
S213: Divide the first number by the single-pass migration data volume to obtain a division capability coefficient.
S214: Calculate how close the aggregated result sets corresponding to each business attribute of the source database are to the division capability coefficient, wherein the aggregated result sets are the result of aggregating and classifying the data sets, and the number of aggregated result sets equals the number of partitions of the business attribute;
S215: Determine the priority ranking according to the degrees of closeness, wherein the closer the aggregated result sets, the higher the corresponding priority.
This embodiment collects a specified number of data sets from the source database to evaluate the division capability coefficient of each business attribute, so as to optimize the fragmented data. The preset collection rule includes, for example, collecting one data set at every specified time interval, so that the collected data sets are more representative for analysis. For example, suppose 100 data sets are collected in total and used as a sample, and are batched by the index column index1 (for example, the professional company code); the number of aggregated result sets obtained equals the number of data categories into which the fragments are divided. If the 100 data sets belong to 3 professional companies, for example technology, property insurance, and life insurance, the number of aggregated result sets is three. The numbers of data sets corresponding to technology, property insurance, and life insurance are 30, 10, and 60 respectively. The division capability coefficient is expressed as an evenly divided result set: the number of data sets in the evenly divided result set equals the total number of collected data sets divided by the number of aggregated result sets, that is, 100 divided by 3, or 33.3333 data sets. The total number of data sets is examined first. If the total is N and the system can sustain K per single migration, then index item attributes whose number of aggregated result sets is near J = N/K are preferred, because such attributes divide the fragmented data in a more balanced and reasonable way and come as close as possible to meeting the single-pass migration requirement.
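The preference for attributes whose number of aggregated result sets is near J = N/K can be illustrated with a short ranking sketch. The application does not spell out how closeness is measured, so the absolute difference between the group count and J used here is an assumption, as are the function name, the sample layout, and the attribute names `room` and `system`.

```python
def rank_by_partition_count(sample, attribute_names, single_migration_volume):
    """Rank attributes by how close their group count is to J = N / K.

    Assumption: closeness is |number of aggregated result sets - J|.
    """
    j = len(sample) / single_migration_volume

    def gap(name):
        group_count = len({row[name] for row in sample})
        return abs(group_count - j)

    return sorted(attribute_names, key=gap)


# 100 sampled data sets: 30 technology, 10 property insurance, 60 life insurance.
companies = ["technology"] * 30 + ["property"] * 10 + ["life"] * 60
sample = [
    {"company": company, "room": f"room{i % 2}", "system": f"sys{i % 10}"}
    for i, company in enumerate(companies)
]
ranking = rank_by_partition_count(sample, ["room", "system", "company"], 30)
```

With N = 100 and K = 30, J is about 3.33, so the attribute with three groups (`company`) ranks ahead of the ones with two or ten groups.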
Further, the division capability coefficient is expressed as an evenly divided result set, the evenly divided result set containing a number of data sets equal to the total number of data sets collected at intervals under the preset collection rule divided by the number of aggregated result sets, and step S215 of determining the priority ranking according to the degrees of closeness includes:
S2151: Judge whether there exist a third business attribute and a fourth business attribute that are equally close to the division capability coefficient, wherein the third business attribute and the fourth business attribute are included in all the business attributes of the source database.
S2152: If so, acquire a first Manhattan distance of the third business attribute relative to the number of data sets in the evenly divided result set, and a second Manhattan distance of the fourth business attribute relative to the number of data sets in the evenly divided result set.
S2153: Judge whether the first Manhattan distance is greater than the second Manhattan distance.
S2154: If yes, rank the fourth business attribute corresponding to the second Manhattan distance ahead of the third business attribute corresponding to the first Manhattan distance in the priority order.
In this embodiment, when several business attributes have the same J value, the division capability is further analyzed using the Manhattan distance. For the same number of aggregated result sets, the smaller the Manhattan distance, the better the division capability. In the above example, the 100 data sets, grouped by the index column index1, belong to 3 professional companies, namely technology, property insurance, and life insurance, with 30, 10, and 60 data sets respectively. Denoting the Manhattan distance for the index column index1 as D, D = |30 - 33.3| + |10 - 33.3| + |60 - 33.3| = 53.3, where 33.3 is the number of data sets in the evenly divided result set; the Manhattan distance is an absolute-value distance.
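The tie-break can be reproduced in a few lines. The sketch below computes the Manhattan distance between an attribute's group sizes and the even split, and orders tied attributes by that distance; the function names and the second attribute `region` (with group sizes 33, 33, and 34) are made up for illustration.

```python
from collections import Counter


def manhattan_to_even_split(sample, attribute_name):
    """Sum of |group size - N / number of groups| over all groups of the attribute."""
    counts = Counter(row[attribute_name] for row in sample)
    even_size = len(sample) / len(counts)
    return sum(abs(count - even_size) for count in counts.values())


def break_tie(sample, tied_attributes):
    """Order attributes with equal J so the smaller Manhattan distance comes first."""
    return sorted(tied_attributes, key=lambda name: manhattan_to_even_split(sample, name))


companies = ["technology"] * 30 + ["property"] * 10 + ["life"] * 60
regions = ["north"] * 33 + ["south"] * 33 + ["east"] * 34
sample = [{"company": c, "region": r} for c, r in zip(companies, regions)]

distance = manhattan_to_even_split(sample, "company")
order = break_tie(sample, ["company", "region"])
```

For the 30/10/60 split the distance is about 53.3, matching the value of D in the text, while the nearly even `region` split has a much smaller distance and is therefore ranked first.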
Further, the data structure in the target database is a multi-level nested data structure, and step S3 of obtaining the correspondence between each data shard and the data structure in the target database includes:
S31: Obtain the designated business attributes in the priority ranking that were used when the source database was divided into data shards, where the designated business attributes are among the business attributes of the source database.
S32: Establish a one-to-one mapping between the priority order of the designated business attributes and the levels of the multi-level nested data structure in the target database, where the designated business attribute with the highest priority corresponds to the outermost level of the nested structure.
The target database in this embodiment includes a non-relational database, such as a Cassandra database, whose data structure is built around the primary key. This embodiment uses the data shards determined in the data to be migrated as the reference for the primary key of the Cassandra database. In Cassandra the primary key defines how data is organized, and data is stored nested in primary-key order: for a primary key K1, K2, K3, ..., K2 data can be understood as nested within K1 data, and K3 data as nested within K2 data. This embodiment maps the priority order of the business attributes to the primary-key order in Cassandra; for example, the shard data corresponding to a high-priority business attribute is mapped to the outermost nesting level in Cassandra. As a result, data migrated between different databases has a clear and reasonable correspondence, file data is written sequentially, which improves write efficiency, and the data is easier to retrieve and manage.
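As a rough illustration of steps S31 and S32, the sketch below turns a priority-ordered list of business attributes into a CQL PRIMARY KEY clause. The column names are hypothetical, and placing the highest-priority attribute in the partition key with the remaining attributes as clustering columns is an assumption; the description only states that data is nested in primary-key order.

```python
def build_primary_key(prioritized_attributes):
    """Return a CQL PRIMARY KEY clause from attributes sorted by descending priority."""
    if not prioritized_attributes:
        raise ValueError("at least one business attribute is required")
    outermost, *inner = prioritized_attributes
    # Assumption: the highest-priority attribute becomes the partition key (K1),
    # and the remaining attributes become clustering columns (K2, K3, ...).
    parts = [f"({outermost})"] + list(inner)
    return "PRIMARY KEY (" + ", ".join(parts) + ")"


# Hypothetical column names, ordered from highest to lowest priority.
key_clause = build_primary_key(["machine_room", "company_code", "system_id"])
print(key_clause)  # PRIMARY KEY ((machine_room), company_code, system_id)
```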
Further, step S4 of migrating each data shard from the source database to the target database in a preset migration manner according to the correspondence includes:
S41: Determine whether the business traffic at the current moment is within a preset threshold.
S42: If so, start a preset migration thread and query the data to be migrated from the source database.
S43: Store the data to be migrated in a cache server and convert its data format.
S44: Run the preset migration thread in a preset threading mode and inject the data to be migrated into the target database shard by shard, in sequence.
In this embodiment, sharding lets the migration proceed in ordered batches. Batches are independent of one another, and the size of each batch matches the single-transfer volume the system supports. If a single migration fails, only the failed transfer needs to be repeated, and the overall migration is unaffected. Migrating in batches also lets the system use idle time between other business operations to complete the migration, which improves the efficiency with which the system handles transactions. This embodiment determines whether the system is in a business trough or a business peak by identifying its operating load. If business traffic is within the preset threshold, the system is in a trough, and the data migration thread is started to migrate data; if business traffic is outside the preset threshold, the system is at a peak, and the data migration thread is suspended so that migration stops. This supports controlling migration tasks by business attribute, ensures that business data is migrated completely by category, and avoids business peaks. Once the shard data has been formed, threads can be started to perform the migration. During migration, data is queried and cut out of the Oracle database according to the shard information, the resulting data is placed in a cache server, converted to the target format, and then injected into the Cassandra database. This cycle repeats, with the same thread processing the shards one at a time in stages. Alternatively, different shards can be assigned to different threads and processed in parallel to improve migration efficiency, but the number of parallel threads must be kept under control to avoid overloading the databases at either end.
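Steps S41 to S44 can be sketched in Python as follows. This is a minimal in-memory illustration, not the patented implementation: the dictionaries stand in for the Oracle source and the Cassandra target, `traffic` is a hypothetical callable that reports current load, and "within the preset threshold" is assumed to mean at or below a numeric limit.

```python
from concurrent.futures import ThreadPoolExecutor


def migrate_shard(shard, source, target, convert, traffic, threshold, max_attempts=3):
    """Return 'deferred', 'done', or 'failed' for a single shard."""
    if traffic() > threshold:
        return "deferred"  # business peak: leave this shard for a later run
    for _ in range(max_attempts):
        try:
            cached = list(source[shard])  # stand-in for query + cache server
            target[shard] = [convert(row) for row in cached]  # convert, then inject
            return "done"
        except Exception:
            continue  # retry only this shard; other shards are unaffected
    return "failed"


def migrate_all(source, target, convert, traffic, threshold, workers=2):
    """Migrate shards with a bounded number of parallel threads."""
    shards = list(source)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = pool.map(
            lambda shard: migrate_shard(shard, source, target, convert, traffic, threshold),
            shards,
        )
        return dict(zip(shards, statuses))


source = {"A1B1": [1, 2], "A1B2": [3]}
target = {}
status = migrate_all(source, target, str, traffic=lambda: 10, threshold=100)
print(status)  # {'A1B1': 'done', 'A1B2': 'done'}
```

The `workers` parameter caps the number of threads running in parallel, which reflects the caution about overloading the databases at either end; setting it to 1 gives the single-thread, shard-by-shard variant.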
Referring to FIG. 2, an apparatus for migrating data according to an embodiment of the present application includes:
A first obtaining module 1, configured to obtain business attributes of a source database, where the source database is the database storing the data to be migrated and includes a first index table.
The business attributes in this embodiment include, but are not limited to, the machine room, network environment, and link role of the service corresponding to the source database, the specialized company to which the service belongs, the system to which the service belongs, the service registration code, and the service domain name. The source database is the database storing the data to be migrated, such as an Oracle relational database. The index table of the database includes index item attributes, an information directory, and address links, which make it convenient to query data in the database: the index table is traversed first, and the data is then retrieved through the address links in the index table.
A dividing module 2, configured to divide the data of the source database into a specified number of data shards in a preset division manner, according to the partitions of the first index table and the business attributes of the source database.
In this embodiment, the data in the source database is divided into shards according to the partitions of the first index table and the business attributes, and each shard is labeled with its business attributes so that the shards can be managed and distinguished. The preset division manner includes partitioning the first index table into sub-indexes and using the sub-indexes to divide the data in the source database into data shards.
A second obtaining module 3, configured to obtain the correspondence between each data shard and the data structure in the target database, where the target database is the database that stores the migrated data after migration.
The target database in this embodiment is the database that stores the migrated data, such as the non-relational database Cassandra, so that data can be migrated from an Oracle database to a Cassandra database. This embodiment divides the data in the source database into data shards and transfers the shards separately from the source database to the target database. This avoids a continuous migration process that would interfere with normal business in the system, and prevents a transfer failure in one time window, caused by incidental interference, from invalidating the entire data transfer transaction. The correspondence is formed according to a preset mapping relationship and includes the association between each data shard of the source database and its position in the storage structure of the target database.
A migration module 4, configured to migrate each data shard from the source database to the target database in a preset migration manner according to the correspondence.
This embodiment transfers the data shards one after another to move data between databases of different types, for example from a relational database to a non-relational database, and uses the mapping between data structures to establish a one-to-one correspondence between the data stored in the two databases.
Further, the dividing module 2 includes:
A first obtaining submodule, configured to obtain the priority ranking of all business attributes of the source database, where the priority ranking orders the attributes from highest to lowest priority.
A selection submodule, configured to select from the priority ranking the partitions corresponding to the first business attribute, which is ranked before a designated sequence number, as the data division standard of the first index table, where the first business attribute is among the business attributes of the source database and includes an index item attribute.
A first division submodule, configured to divide the first index table into sub-indexes, one per partition, according to the data division standard of the first index table.
A first judgment submodule, configured to determine whether the data volume of the source database corresponding to each sub-index is within a preset single-transfer volume.
A second division submodule, configured to, if so, divide the data of the source database into a first specified number of first data shards according to the sub-indexes, where the first specified number is the number of partitions corresponding to the first business attribute.
In this embodiment, the data division standard is chosen according to the priority ranking of the business attributes. For example, if the machine room hosting the service corresponding to the source database has the highest priority and corresponds to three partitions, the source database is sharded by those three partitions, and its data is divided into three data shards. The first business attribute may comprise several attributes at once. For example, with business attribute A having three partitions A1, A2, and A3, and business attribute B having two partitions B1 and B2, the data division standard of the first index table has 6 partitions, ordered by priority as A1B1, A1B2, A2B1, A2B2, A3B1, and A3B2. The business attributes include the index item attributes in the first index table as well as other attributes, and index item attributes are preferred for partitioning: because indexes are built to divide data evenly, sharding the data to be migrated by index item attributes yields better-balanced shards. In this embodiment, the index table is first divided into sub-indexes, and each sub-index then leads its corresponding data to form a data shard.
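The six combined partitions in the A/B example are a Cartesian product taken in priority order. A minimal Python sketch, assuming the attributes are held in a dictionary ordered from highest to lowest priority:

```python
from itertools import product

# Partitions per business attribute, listed from highest to lowest priority.
attribute_partitions = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2"],
}

# product() varies the last (lowest-priority) attribute fastest.
shard_keys = ["".join(combo) for combo in product(*attribute_partitions.values())]
print(shard_keys)  # ['A1B1', 'A1B2', 'A2B1', 'A2B2', 'A3B1', 'A3B2']
```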
Further, the dividing module 2 includes:
An adding submodule, configured to add the partitions corresponding to a second business attribute if the data volume corresponding to each sub-index is not within the preset single-transfer volume, where the second business attribute is an attribute, other than the index item attributes, among the business attributes of the source database, and there is at least one second business attribute.
A third division submodule, configured to divide the data of the source database into a second specified number of second data shards according to the sub-indexes and the partitions corresponding to the second business attribute, so that the data volume of each second data shard is within the preset single-transfer volume, where the second specified number is the product of the number of partitions of the first business attribute and the number of partitions of the second business attribute.
In this embodiment, if the shards produced from the index item attributes exceed the single-transfer volume, a fast single transfer is not possible. Attributes other than the index item attributes can then be added and used together with the index item attributes to shard the data in the source database, so that each shard falls within the preset single-transfer volume and the data can be transferred quickly without affecting business running normally in the system.
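The refinement described above can be sketched as follows. Rows are modeled as dictionaries, data volume is assumed to be measured as a row count, and the attribute names are hypothetical; the description does not fix any of these details.

```python
from collections import defaultdict


def shard_rows(rows, attributes):
    """Group rows by the tuple of their values for the given attributes."""
    shards = defaultdict(list)
    for row in rows:
        shards[tuple(row[a] for a in attributes)].append(row)
    return dict(shards)


def shard_within_limit(rows, index_attributes, extra_attributes, limit):
    """Add non-index attributes one at a time until every shard fits the limit."""
    attributes = list(index_attributes)
    shards = shard_rows(rows, attributes)
    for extra in extra_attributes:
        if all(len(shard) <= limit for shard in shards.values()):
            break
        attributes.append(extra)
        shards = shard_rows(rows, attributes)
    return attributes, shards


rows = [{"company": "life", "room": r} for r in ("r1", "r1", "r2", "r2")]
rows.append({"company": "tech", "room": "r1"})
attributes, shards = shard_within_limit(rows, ["company"], ["room"], limit=2)
print(attributes)  # ['company', 'room']
```

Sharding by `company` alone leaves a four-row shard, which exceeds the limit of two, so `room` is added and every resulting shard then fits within a single transfer.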
Further, the obtaining submodule includes:
A collection unit, configured to collect a first quantity of data sets from the source database according to a preset collection rule.
An obtaining unit, configured to obtain the data volume that the service system migrates in a single migration.
A deriving unit, configured to obtain the division capability coefficient by dividing the first quantity by the data volume of a single migration.
A calculation unit, configured to calculate, for each business attribute of the source database, how close its aggregated result sets are to the division capability coefficient, where the aggregated result sets are the result of aggregating and classifying the data sets, and the number of aggregated result sets equals the number of partitions of that business attribute;
A determining unit, configured to determine the priority ranking according to the degrees of closeness, where the closer an aggregated result set is, the higher the corresponding priority.
In this embodiment, a specified number of data sets is collected from the source database to evaluate the division capability coefficient of each business attribute and thereby optimize the data shards. The preset collection rule includes, for example, collecting one data set every specified time period, so that the collected data sets are more representative for analysis. Suppose 100 data sets are collected and used as the sample, and they are grouped by index column index1 (for example, the specialized company code). The number of aggregated result sets equals the number of data categories produced by the sharding: if the 100 data sets belong to 3 specialized companies, such as technology, property insurance, and life insurance, there are three aggregated result sets. Technology, property insurance, and life insurance account for 30, 10, and 60 data sets respectively. The division capability coefficient is expressed as the evenly divided result set: the number of data sets in an evenly divided result set equals the total number of collected data sets divided by the number of aggregated result sets, that is, 100 divided by 3, or 33.3333. The total volume of data is considered first: if the total is N and the system can sustain K per single migration, then index item attributes whose number of aggregated result sets is close to J=N/K are preferred. Such attributes divide the data into more balanced and reasonable shards and come closest to satisfying the single-migration requirement.
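The preference for attributes whose number of aggregated result sets is near J=N/K can be sketched as a simple sort. Measuring closeness as the absolute difference from J is an assumption, as are the attribute names, the single-transfer value K=25, and every partition count except the three companies from the example.

```python
def rank_attributes(partition_counts, total, single_transfer):
    """Order attributes by how close their partition count is to J = N / K."""
    j = total / single_transfer
    return sorted(partition_counts, key=lambda name: abs(partition_counts[name] - j))


# Number of aggregated result sets (partitions) per hypothetical attribute.
counts = {"company_code": 3, "machine_room": 8, "service_domain": 40}
ranking = rank_attributes(counts, total=100, single_transfer=25)
print(ranking)  # ['company_code', 'machine_room', 'service_domain']
```

Here J is 4, so the attribute with 3 partitions ranks first. Attributes that tie on this measure would then be compared by Manhattan distance.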
Further, the division capability coefficient is expressed as the evenly divided result set, the size of the evenly divided result set being the total number of data sets collected at intervals according to the preset collection rule divided by the number of aggregated result sets, and the determining unit includes:
A first judgment subunit, configured to determine whether there are a third business attribute and a fourth business attribute that are equally close to the division capability coefficient, where the third business attribute and the fourth business attribute are among the business attributes of the source database.
An obtaining subunit, configured to, if so, obtain a first Manhattan distance between the third business attribute and the number of data sets in the evenly divided result set, and a second Manhattan distance between the fourth business attribute and the number of data sets in the evenly divided result set.
A second judgment subunit, configured to determine whether the first Manhattan distance is greater than the second Manhattan distance.
A sorting subunit, configured to, if so, rank the fourth business attribute, which corresponds to the second Manhattan distance, ahead of the third business attribute, which corresponds to the first Manhattan distance, in the priority order.
In this embodiment, when several business attributes yield the same J value, their division capability is compared further using the Manhattan distance. For the same number of aggregated result sets, the smaller the Manhattan distance, the better the division capability. In the example above, the 100 data sets, grouped by index column index1, belong to 3 specialized companies (technology, property insurance, and life insurance), which account for 30, 10, and 60 data sets respectively. Denoting the Manhattan distance by D, the distance for index column index1 is D=|30-33.3|+|10-33.3|+|60-33.3|=53.3, where 33.3 is the number of data sets in each evenly divided result set. The Manhattan distance is the sum of absolute differences.
Further, the data structure in the target database is a multi-level nested data structure, and the second obtaining module 3 includes:
A second obtaining submodule, configured to obtain the designated business attributes in the priority ranking that were used when the source database was divided into data shards, where the designated business attributes are among the business attributes of the source database.
A mapping submodule, configured to establish a one-to-one mapping between the priority order of the designated business attributes and the levels of the multi-level nested data structure in the target database, where the designated business attribute with the highest priority corresponds to the outermost level of the nested structure.
The target database in this embodiment includes a non-relational database, such as a Cassandra database, whose data structure is built around the primary key. This embodiment uses the shard division points determined in the data to be migrated as the reference for the primary key of the Cassandra database. In Cassandra the primary key defines how data is organized, and data is stored nested in primary-key order: for a primary key K1, K2, K3, ..., K2 data can be understood as nested within K1 data, and K3 data as nested within K2 data. This embodiment maps the priority order of the business attributes to the primary-key order in Cassandra; for example, the shard data corresponding to a high-priority business attribute is mapped to the outermost nesting level in Cassandra. As a result, data migrated between different databases has a clear and reasonable correspondence, file data is written sequentially, which improves write efficiency, and the data is easier to retrieve and manage.
Further, the migration module 4 includes:
A second judgment submodule, configured to determine whether the business traffic at the current moment is within a preset threshold.
A starting submodule, configured to, if so, start a preset migration thread and query the data to be migrated from the source database.
A storage submodule, configured to store the data to be migrated in a cache server and convert its data format.
A running submodule, configured to run the preset migration thread in a preset threading mode and inject the data to be migrated into the target database shard by shard, in sequence.
In this embodiment, sharding lets the migration proceed in ordered batches. Batches are independent of one another, and the size of each batch matches the single-transfer volume the system supports. If a single migration fails, only the failed transfer needs to be repeated, and the overall migration is unaffected. Migrating in batches also lets the system use idle time between other business operations to complete the migration, which improves the efficiency with which the system handles transactions. This embodiment determines whether the system is in a business trough or a business peak by identifying its operating load. If business traffic is within the preset threshold, the system is in a trough, and the data migration thread is started to migrate data; if business traffic is outside the preset threshold, the system is at a peak, and the data migration thread is suspended so that migration stops. This supports controlling migration tasks by business attribute, ensures that business data is migrated completely by category, and avoids business peaks. Once the shard data has been formed, threads can be started to perform the migration. During migration, data is queried and cut out of the Oracle database according to the shard information, the resulting data is placed in a cache server, converted to the target format, and then injected into the Cassandra database. This cycle repeats, with the same thread processing the shards one at a time in stages. Alternatively, different shards can be assigned to different threads and processed in parallel to improve migration efficiency, but the number of parallel threads must be kept under control to avoid overloading the databases at either end.
Referring to FIG. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions, and a database, and includes non-volatile and volatile readable storage media. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database of the computer device stores data such as the migration data. The network interface of the computer device communicates with external terminals over a network connection. When executed, the computer-readable instructions carry out the processes of the method embodiments described above. Those skilled in the art will understand that the structure shown in FIG. 3 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied.
An embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions that, when executed, carry out the processes of the method embodiments described above. The readable storage medium includes non-volatile and volatile readable storage media. The above are only preferred embodiments of the present application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims (20)

  1. 一种迁徙数据的方法,其特征在于,包括:A method for data migration, which is characterized in that it includes:
    获取源数据库的业务属性,其中,所述源数据库为存放待迁徙数据的数据库,所述源数据库包括第一索引表;Acquiring business attributes of a source database, where the source database is a database storing data to be migrated, and the source database includes a first index table;
    根据所述第一索引表的分区以及所述源数据库的业务属性,按照预设划分方式将所述源数据库的数据划分成指定数量的分片数据;According to the partition of the first index table and the business attribute of the source database, dividing the data of the source database into a specified number of fragmented data according to a preset dividing manner;
    获取各所述分片数据分别与目标数据库中的数据结构的对应关系,其中,所述目标数据库为存放迁徙后的所述迁徙数据的数据库;Acquiring the corresponding relationship between each of the fragmented data and the data structure in the target database, wherein the target database is a database storing the migration data after migration;
    根据所述对应关系,将各所述分片数据按照预设迁徙方式从所述源数据库迁徙至所述目标数据库。According to the corresponding relationship, each piece of data is migrated from the source database to the target database in a preset migration manner.
  2. The data migration method according to claim 1, characterized in that the step of dividing the data of the source database into a specified number of data shards in a preset division manner, according to the partitions of the first index table and the business attributes of the source database, comprises:
    obtaining the priority ranking of all the business attributes of the source database, wherein the priority ranking orders the attributes from highest to lowest priority;
    selecting, from the priority ranking, the partitions corresponding to a first business attribute ranked before a specified ordinal position as the data division criterion of the first index table, wherein the first business attribute is included among all the business attributes of the source database and includes an index-entry attribute;
    dividing the first index table into sub-indexes, one per partition, according to the data division criterion of the first index table;
    determining whether the data volume of the source database corresponding to each sub-index is within a preset single-transfer volume;
    if so, dividing the data of the source database into a first specified number of first data shards according to the sub-indexes, wherein the first specified number is the number of partitions corresponding to the first business attribute.
  3. The data migration method according to claim 2, characterized in that, after the step of determining whether the data volume of the source database corresponding to each sub-index is within a preset single-transfer volume, the method comprises:
    if the data volume corresponding to each sub-index is not within the preset single-transfer volume, adding the partitions corresponding to a second business attribute, wherein the second business attribute is an attribute, other than the index-entry attribute, among all the business attributes of the source database, and there is at least one second business attribute;
    dividing the data of the source database into a second specified number of second data shards according to the sub-indexes and the partitions corresponding to the second business attribute, so that the data volume of each second data shard is within the preset single-transfer volume, wherein the second specified number is the product of the number of partitions of the first business attribute and the number of partitions of the second business attribute.
  4. The data migration method according to claim 2, characterized in that the step of obtaining the priority ranking of all the business attributes of the source database comprises:
    collecting a first number of data sets from the source database according to a preset collection rule;
    obtaining the data volume that the service system migrates in a single migration;
    dividing the first number by the single-migration data volume to obtain a partitioning capability coefficient;
    calculating, for each business attribute of the source database, the closeness between its corresponding aggregation result sets and the partitioning capability coefficient, wherein the aggregation result sets are the result of aggregating and classifying the data sets, and the number of aggregation result sets equals the number of partitions of the respective business attribute;
    determining the priority ranking according to the closeness values, wherein aggregation result sets with a higher closeness correspond to a higher priority.
  5. The data migration method according to claim 4, characterized in that the partitioning capability coefficient is expressed as an equal-share result set, the equal-share result set comprising a number of data sets equal to the total number of data sets collected at intervals under the preset collection rule divided by the number of aggregation result sets, and the step of determining the priority ranking according to the closeness values comprises:
    determining whether there exist a third business attribute and a fourth business attribute that have the same closeness to the partitioning capability coefficient;
    if so, obtaining a first Manhattan distance of the third business attribute relative to the number of data sets in the equal-share result set, and a second Manhattan distance of the fourth business attribute relative to the number of data sets in the equal-share result set;
    determining whether the first Manhattan distance is greater than the second Manhattan distance;
    if so, ranking the fourth business attribute, which corresponds to the second Manhattan distance, ahead of the third business attribute, which corresponds to the first Manhattan distance, in the priority order.
  6. The data migration method according to claim 2, characterized in that the data structure in the target database is a multi-level nested data structure, and the step of obtaining the correspondence between each data shard and the data structure in the target database comprises:
    obtaining all specified business attributes in the priority ranking that were invoked when the source database was divided into data shards, wherein the specified business attributes are included among all the business attributes of the source database;
    establishing a one-to-one mapping between the priority order of the specified business attributes and the levels of the multi-level nested data structure in the target database, wherein the specified business attribute with the highest priority corresponds to the outermost level of the multi-level nested data structure.
  7. The data migration method according to claim 1, characterized in that the step of migrating each data shard from the source database to the target database in a preset migration manner according to the correspondence comprises:
    determining whether the service traffic at the current moment is within a preset threshold;
    if so, starting a preset migration thread and retrieving the data to be migrated from the source database;
    storing the data to be migrated in a cache server and converting its data format;
    running the preset migration thread in a preset threading mode, and writing the data to be migrated into the target database shard by shard, in sequence.
  8. A data migration apparatus, characterized by comprising:
    a first acquiring module, configured to obtain business attributes of a source database, wherein the source database is the database storing the data to be migrated, and the source database includes a first index table;
    a dividing module, configured to divide the data of the source database into a specified number of data shards in a preset division manner, according to the partitions of the first index table and the business attributes of the source database;
    a second acquiring module, configured to obtain the correspondence between each data shard and a data structure in a target database, wherein the target database is the database that stores the migrated data after migration;
    a migration module, configured to migrate each data shard from the source database to the target database in a preset migration manner according to the correspondence.
  9. The data migration apparatus according to claim 8, characterized in that the dividing module comprises:
    a first acquiring sub-module, configured to obtain the priority ranking of all the business attributes of the source database, wherein the priority ranking orders the attributes from highest to lowest priority;
    a selecting sub-module, configured to select, from the priority ranking, the partitions corresponding to a first business attribute ranked before a specified ordinal position as the data division criterion of the first index table, wherein the first business attribute is included among all the business attributes of the source database and includes an index-entry attribute;
    a first dividing sub-module, configured to divide the first index table into sub-indexes, one per partition, according to the data division criterion of the first index table;
    a first judging sub-module, configured to determine whether the data volume of the source database corresponding to each sub-index is within a preset single-transfer volume;
    a second dividing sub-module, configured to, if so, divide the data of the source database into a first specified number of first data shards according to the sub-indexes, wherein the first specified number is the number of partitions corresponding to the first business attribute.
  10. The data migration apparatus according to claim 9, characterized in that the dividing module comprises:
    an adding sub-module, configured to add the partitions corresponding to a second business attribute if the data volume corresponding to each sub-index is not within the preset single-transfer volume, wherein the second business attribute is an attribute, other than the index-entry attribute, among all the business attributes of the source database, and there is at least one second business attribute;
    a third dividing sub-module, configured to divide the data of the source database into a second specified number of second data shards according to the sub-indexes and the partitions corresponding to the second business attribute, so that the data volume of each second data shard is within the preset single-transfer volume, wherein the second specified number is the product of the number of partitions of the first business attribute and the number of partitions of the second business attribute.
  11. The data migration apparatus according to claim 9, characterized in that the acquiring sub-module comprises:
    a collecting unit, configured to collect a first number of data sets from the source database according to a preset collection rule;
    an acquiring unit, configured to obtain the data volume that the service system migrates in a single migration;
    an obtaining unit, configured to divide the first number by the single-migration data volume to obtain a partitioning capability coefficient;
    a calculating unit, configured to calculate, for each business attribute of the source database, the closeness between its corresponding aggregation result sets and the partitioning capability coefficient, wherein the aggregation result sets are the result of aggregating and classifying the data sets, and the number of aggregation result sets equals the number of partitions of the respective business attribute;
    a determining unit, configured to determine the priority ranking according to the closeness values, wherein aggregation result sets with a higher closeness correspond to a higher priority.
  12. The data migration apparatus according to claim 11, characterized in that the partitioning capability coefficient is expressed as an equal-share result set, the equal-share result set comprising a number of data sets equal to the total number of data sets collected at intervals under the preset collection rule divided by the number of aggregation result sets, and the determining unit comprises:
    a first judging subunit, configured to determine whether there exist a third business attribute and a fourth business attribute that have the same closeness to the partitioning capability coefficient;
    an acquiring subunit, configured to, if so, obtain a first Manhattan distance of the third business attribute relative to the number of data sets in the equal-share result set, and a second Manhattan distance of the fourth business attribute relative to the number of data sets in the equal-share result set;
    a second judging subunit, configured to determine whether the first Manhattan distance is greater than the second Manhattan distance;
    a sorting subunit, configured to, if so, rank the fourth business attribute, which corresponds to the second Manhattan distance, ahead of the third business attribute, which corresponds to the first Manhattan distance, in the priority order.
  13. The data migration apparatus according to claim 9, characterized in that the data structure in the target database is a multi-level nested data structure, and the second acquiring module comprises:
    a second acquiring sub-module, configured to obtain all specified business attributes in the priority ranking that were invoked when the source database was divided into data shards, wherein the specified business attributes are included among all the business attributes of the source database;
    a mapping sub-module, configured to establish a one-to-one mapping between the priority order of the specified business attributes and the levels of the multi-level nested data structure in the target database, wherein the specified business attribute with the highest priority corresponds to the outermost level of the multi-level nested data structure.
  14. The data migration apparatus according to claim 8, characterized in that the migration module comprises:
    a second judging sub-module, configured to determine whether the service traffic at the current moment is within a preset threshold;
    a starting sub-module, configured to, if so, start a preset migration thread and retrieve the data to be migrated from the source database;
    a storing sub-module, configured to store the data to be migrated in a cache server and convert its data format;
    a running sub-module, configured to run the preset migration thread in a preset threading mode and write the data to be migrated into the target database shard by shard, in sequence.
  15. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements a data migration method, the data migration method comprising:
    obtaining business attributes of a source database, wherein the source database is the database storing the data to be migrated, and the source database includes a first index table;
    dividing the data of the source database into a specified number of data shards in a preset division manner, according to the partitions of the first index table and the business attributes of the source database;
    obtaining the correspondence between each data shard and a data structure in a target database, wherein the target database is the database that stores the migrated data after migration;
    migrating each data shard from the source database to the target database in a preset migration manner according to the correspondence.
  16. The computer device according to claim 15, characterized in that the step of dividing the data of the source database into a specified number of data shards in a preset division manner, according to the partitions of the first index table and the business attributes of the source database, comprises:
    obtaining the priority ranking of all the business attributes of the source database, wherein the priority ranking orders the attributes from highest to lowest priority;
    selecting, from the priority ranking, the partitions corresponding to a first business attribute ranked before a specified ordinal position as the data division criterion of the first index table, wherein the first business attribute is included among all the business attributes of the source database and includes an index-entry attribute;
    dividing the first index table into sub-indexes, one per partition, according to the data division criterion of the first index table;
    determining whether the data volume of the source database corresponding to each sub-index is within a preset single-transfer volume;
    if so, dividing the data of the source database into a first specified number of first data shards according to the sub-indexes, wherein the first specified number is the number of partitions corresponding to the first business attribute.
  17. The computer device according to claim 16, characterized in that, after the step of determining whether the data volume of the source database corresponding to each sub-index is within a preset single-transfer volume, the method comprises:
    if the data volume corresponding to each sub-index is not within the preset single-transfer volume, adding the partitions corresponding to a second business attribute, wherein the second business attribute is an attribute, other than the index-entry attribute, among all the business attributes of the source database, and there is at least one second business attribute;
    dividing the data of the source database into a second specified number of second data shards according to the sub-indexes and the partitions corresponding to the second business attribute, so that the data volume of each second data shard is within the preset single-transfer volume, wherein the second specified number is the product of the number of partitions of the first business attribute and the number of partitions of the second business attribute.
  18. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements a data migration method, the data migration method comprising:
    obtaining business attributes of a source database, wherein the source database is the database storing the data to be migrated, and the source database includes a first index table;
    dividing the data of the source database into a specified number of data shards in a preset division manner, according to the partitions of the first index table and the business attributes of the source database;
    obtaining the correspondence between each data shard and a data structure in a target database, wherein the target database is the database that stores the migrated data after migration;
    migrating each data shard from the source database to the target database in a preset migration manner according to the correspondence.
  19. The computer-readable storage medium according to claim 18, characterized in that the step of dividing the data of the source database into a specified number of data shards in a preset division manner, according to the partitions of the first index table and the business attributes of the source database, comprises:
    obtaining the priority ranking of all the business attributes of the source database, wherein the priority ranking orders the attributes from highest to lowest priority;
    selecting, from the priority ranking, the partitions corresponding to a first business attribute ranked before a specified ordinal position as the data division criterion of the first index table, wherein the first business attribute is included among all the business attributes of the source database and includes an index-entry attribute;
    dividing the first index table into sub-indexes, one per partition, according to the data division criterion of the first index table;
    determining whether the data volume of the source database corresponding to each sub-index is within a preset single-transfer volume;
    if so, dividing the data of the source database into a first specified number of first data shards according to the sub-indexes, wherein the first specified number is the number of partitions corresponding to the first business attribute.
  20. The computer-readable storage medium according to claim 19, characterized in that, after the step of determining whether the data volume of the source database corresponding to each sub-index is within a preset single-transfer volume, the method comprises:
    if the data volume corresponding to each sub-index is not within the preset single-transfer volume, adding the partitions corresponding to a second business attribute, wherein the second business attribute is an attribute, other than the index-entry attribute, among all the business attributes of the source database, and there is at least one second business attribute;
    dividing the data of the source database into a second specified number of second data shards according to the sub-indexes and the partitions corresponding to the second business attribute, so that the data volume of each second data shard is within the preset single-transfer volume, wherein the second specified number is the product of the number of partitions of the first business attribute and the number of partitions of the second business attribute.
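
Illustrative sketch (not part of the claims). The ranking and sharding steps recited in claims 2 to 5 can be pictured with the minimal Python sketch below. It rests on several assumptions that the claims do not state: data volume is counted in rows, "closeness" is taken as the absolute difference between an attribute's partition count and the partitioning capability coefficient, shards are keyed by the attribute values actually observed, and the function names, the `region`/`product`/`year` attributes, and the sample rows are all hypothetical.

```python
from collections import Counter, defaultdict


def division_capability(sample_size, single_transfer):
    """Claim 4: number of sampled data sets divided by the single-migration volume."""
    return sample_size / single_transfer


def rank_attributes(rows, attributes, coefficient):
    """Rank attributes from highest to lowest priority (assumes rows is non-empty).

    Assumed metric: an attribute is "closer" when its partition count differs
    less from the coefficient. Ties are broken as in claim 5, by the Manhattan
    distance between the partition sizes and an even share of the sample.
    """
    def sort_key(attr):
        sizes = Counter(row[attr] for row in rows).values()
        even_share = len(rows) / len(sizes)
        closeness_gap = abs(len(sizes) - coefficient)
        manhattan = sum(abs(size - even_share) for size in sizes)
        return (closeness_gap, manhattan)

    return sorted(attributes, key=sort_key)


def shard(rows, ranked_attributes, max_rows):
    """Claims 2-3: shard by the top-ranked attribute, then add further
    attributes while any shard exceeds the single-transfer limit."""
    shards = {}
    for depth in range(1, len(ranked_attributes) + 1):
        keys = ranked_attributes[:depth]
        shards = defaultdict(list)
        for row in rows:
            shards[tuple(row[k] for k in keys)].append(row)
        if all(len(part) <= max_rows for part in shards.values()):
            break
    return dict(shards)


# Hypothetical sample: 8 rows, 2 regions, 4 products, a skewed year column.
rows = [{"region": r, "product": p, "year": 2018}
        for r in ("north", "south")
        for p in ("loan", "card", "fund", "bond")]
rows[-1]["year"] = 2019

coefficient = division_capability(sample_size=len(rows), single_transfer=4)
ranked = rank_attributes(rows, ["product", "year", "region"], coefficient)
first_shards = shard(rows, ranked, max_rows=4)
```

In this sample, `region` and `year` both have two partitions and so tie on closeness; the evenly split `region` wins the tie-break, and sharding by `region` alone already keeps every shard within four rows. Lowering `max_rows` forces the sketch to add further attributes, in the spirit of claim 3.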
PCT/CN2019/116706 2019-07-30 2019-11-08 Data migration method and apparatus, computer device, and storage medium WO2021017269A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910696304.9 2019-07-30
CN201910696304.9A CN110580246B (en) 2019-07-30 2019-07-30 Method, device, computer equipment and storage medium for migrating data

Publications (1)

Publication Number Publication Date
WO2021017269A1 true WO2021017269A1 (en) 2021-02-04

Family

ID=68810517

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116706 WO2021017269A1 (en) 2019-07-30 2019-11-08 Data migration method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110580246B (en)
WO (1) WO2021017269A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204538A (en) * 2021-04-27 2021-08-03 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for data migration
CN113485981A (en) * 2021-08-12 2021-10-08 北京青云科技股份有限公司 Data migration method and device, computer equipment and storage medium
CN113596153A (en) * 2021-07-28 2021-11-02 新华智云科技有限公司 Data equalization method and system

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN113760858B (en) * 2020-06-05 2024-03-19 中国移动通信集团湖北有限公司 Dynamic migration method and device for memory database data, computing equipment and storage equipment
CN116205397B (en) * 2023-02-10 2023-10-20 广州市中大信息技术有限公司 Digital enterprise management system and method based on big data
CN116401435B (en) * 2023-02-22 2023-11-10 北京麦克斯泰科技有限公司 Method and device for calculating and scheduling heat of daily active columns

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102308297A (en) * 2011-07-13 2012-01-04 华为技术有限公司 Data migration method, data migration device and data migration system
CN105868343A (en) * 2016-03-28 2016-08-17 上海携程商务有限公司 Database migration method and system
CN106055698A (en) * 2016-06-14 2016-10-26 智者四海(北京)技术有限公司 Data migration method, agent node and database instance
CN106933859A (en) * 2015-12-30 2017-07-07 中国移动通信集团公司 The moving method and device of a kind of medical data

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US8768495B2 (en) * 2010-06-09 2014-07-01 Adelphoi Limited System and method for media recognition
WO2014120380A1 (en) * 2013-02-04 2014-08-07 Olsen David Allen System and method for grouping segments of data sequences into clusters
US20140280375A1 (en) * 2013-03-15 2014-09-18 Ryan Rawson Systems and methods for implementing distributed databases using many-core processors
CN107346312A (en) * 2016-05-05 2017-11-14 中国移动通信集团内蒙古有限公司 A kind of big data processing method and system
CN108304553B (en) * 2018-02-01 2021-04-27 平安普惠企业管理有限公司 Data migration method and device, computer equipment and storage medium
CN109885256B (en) * 2019-01-23 2022-07-08 平安科技(深圳)有限公司 Data storage method, device and medium based on data slicing

Also Published As

Publication number Publication date
CN110580246A (en) 2019-12-17
CN110580246B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
WO2021017269A1 (en) Data migration method and apparatus, computer device, and storage medium
CN104794123B (en) A kind of method and device building NoSQL database indexes for semi-structured data
US10289718B2 (en) Partition access method for query optimization
CN103020204B (en) A kind of method and its system carrying out multi-dimensional interval query to distributed sequence list
US7890541B2 (en) Partition by growth table space
US9535956B2 (en) Efficient set operation execution using a single group-by operation
US8543596B1 (en) Assigning blocks of a file of a distributed file system to processing units of a parallel database management system
CN103246749B (en) The matrix database system and its querying method that Based on Distributed calculates
US20160350302A1 (en) Dynamically splitting a range of a node in a distributed hash table
US8965941B2 (en) File list generation method, system, and program, and file list generation device
CN108897761A (en) A kind of clustering storage method and device
CN104111936B (en) Data query method and system
JP2004070403A (en) File storage destination volume control method
CN101916280A (en) Parallel computing system and method for carrying out load balance according to query contents
CN108509437A (en) A kind of ElasticSearch inquiries accelerated method
US10592153B1 (en) Redistributing a data set amongst partitions according to a secondary hashing scheme
CN104054071A (en) Method for accessing storage device and storage device
CN107590257A (en) A kind of data base management method and device
CN107193898A (en) The inquiry sharing method and system of log data stream based on stepped multiplexing
CN106599091A (en) Storage and indexing method of RDF graph structures stored based on key values
WO2021016050A1 (en) Multi-record index structure for key-value stores
CN114281989A (en) Data deduplication method and device based on text similarity, storage medium and server
US9275091B2 (en) Database management device and database management method
CN113760190A (en) Small file merging system and method based on Ceph storage
Doulkeridis et al. On saying" enough already!" in mapreduce

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19939110; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19939110; Country of ref document: EP; Kind code of ref document: A1)