CN111125059A - Data migration method and device, storage medium and server - Google Patents

Data migration method and device, storage medium and server

Info

Publication number
CN111125059A
Authority
CN
China
Prior art keywords
data
migration
migration object
thread
migrated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911301449.0A
Other languages
Chinese (zh)
Other versions
CN111125059B (en
Inventor
杨帆
王杰
尚应
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaoshi Technology Jiangsu Co ltd
Original Assignee
Nanjing Zhenshi Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Zhenshi Intelligent Technology Co Ltd filed Critical Nanjing Zhenshi Intelligent Technology Co Ltd
Priority to CN201911301449.0A priority Critical patent/CN111125059B/en
Publication of CN111125059A publication Critical patent/CN111125059A/en
Application granted granted Critical
Publication of CN111125059B publication Critical patent/CN111125059B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/214Database migration support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof

Abstract

The embodiments of the application disclose a data migration method, a data migration device, a storage medium and a server, belonging to the technical field of databases. The method comprises: determining migration objects in a data table to be migrated, wherein a migration object is either a minimum-granularity partition of a partitioned data table or an unpartitioned data table; for each migration object, calculating the number of threads the migration object requires during migration; fragmenting the migration object according to the thread number to obtain a plurality of data fragments, each data fragment corresponding to one thread; and migrating the corresponding data fragments in parallel using the threads. Because the data table is divided into data fragments that are migrated in parallel by multiple threads, the time consumed by data migration is reduced and the efficiency of data migration is improved.

Description

Data migration method and device, storage medium and server
Technical Field
The embodiment of the application relates to the technical field of databases, in particular to a data migration method, a data migration device, a storage medium and a server.
Background
A large amount of data is stored in a database, and in some application scenarios the data in the database needs to be migrated from a source device to a destination device, where the source device is the device that provides the migrated data and the destination device is the device that receives the migrated data.
In the related art, a server may create a process, call the exp export tool through the process, and obtain data from the source device through the exp export tool; the data is then sent to the destination device through the imp import tool.
Since the data in the database is typically stored in tabular form, the server may migrate the entire data table through a single thread. When the amount of data in the data table is large, the time consumption of data migration is long, and the efficiency of data migration is affected.
Disclosure of Invention
The embodiment of the application provides a data migration method, a data migration device, a storage medium and a server, which are used for solving the problem that the efficiency of data migration is affected due to long time consumption of data migration caused by single-thread migration of the whole data table. The technical scheme is as follows:
in one aspect, a data migration method is provided, and the method includes:
determining migration objects in a data table to be migrated, wherein a migration object is either a minimum-granularity partition of a partitioned data table or an unpartitioned data table;
for each migration object, calculating the number of threads required to be used by the migration object in the migration process;
fragmenting the migration object according to the thread number to obtain a plurality of data fragments, wherein each data fragment corresponds to one thread;
and migrating the corresponding data fragments in parallel by utilizing each thread.
In one aspect, an apparatus for data migration is provided, the apparatus comprising:
the determining module is used for determining migration objects in a data table to be migrated, wherein a migration object is either a minimum-granularity partition of a partitioned data table or an unpartitioned data table;
the calculation module is used for calculating the number of threads required to be used by each migration object in the migration process;
the fragmentation module is used for fragmenting the migration object according to the thread number to obtain a plurality of data fragments, and each data fragment corresponds to one thread;
and the migration module is used for migrating the corresponding data fragments in parallel by utilizing all the threads.
In one aspect, a computer-readable storage medium is provided having stored therein at least one instruction, at least one program, set of codes, or set of instructions that is loaded and executed by a processor to implement a data migration method as described above.
In one aspect, a server is provided, which includes a processor and a memory, where at least one instruction is stored in the memory, and the instruction is loaded and executed by the processor to implement the data migration method described above.
The technical scheme provided by the embodiment of the application has the beneficial effects that at least:
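after the migration objects in the data table to be migrated are determined, the number of threads required by each migration object in the migration process can be calculated, and the migration object is then fragmented according to the thread number to obtain a plurality of data fragments, which are migrated in parallel by the threads. Because the data table is divided into data fragments that are migrated in parallel by multiple threads, the time consumed by data migration is reduced and the efficiency of data migration is improved.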
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method of data migration according to an embodiment of the present application;
FIG. 2 is a flow chart of a method of data migration according to another embodiment of the present application;
FIG. 3 is a block diagram of a data migration apparatus according to still another embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be further described in detail with reference to the accompanying drawings.
The application scenario of the present application is introduced below.
The present application applies to scenarios in which data in a database is migrated, where the database may be an Oracle database or another database, which is not limited in this embodiment.
When data migration is performed, three devices are involved, one is a source device for providing data, one is a server for relaying data, and the other is a destination device for receiving data. That is, the server may obtain data to be migrated from the source device and send the data to the destination device. The following describes a flow of data migration performed by the server.
Referring to fig. 1, a flowchart of a data migration method provided in an embodiment of the present application is shown, where the data migration method may be applied in a server. The data migration method may include:
step 101, determining migration objects in a data table to be migrated, where a migration object is either a minimum-granularity partition of a partitioned data table or an unpartitioned data table.
Before the embodiment is executed, the server needs to be configured. In one implementation, a server may be configured with first connection information of a first database in a source device and second connection information of a second database in a destination device, where the first connection information includes a database identifier, a user name, and a password of the first database, so as to facilitate the server to establish a connection with the first database in the source device, and the second connection information includes a database identifier, a user name, and a password of the second database, so as to facilitate the server to establish a connection with the second database in the destination device.
In this embodiment, the data configuration information required for migrating the data table may also be configured in the server. Since data needs to be migrated from a data table in the first database to a data table in the second database, for convenience of distinction, the data table holding the data to be migrated is hereinafter referred to as the first data table, and the data table that receives the data is hereinafter referred to as the second data table.
For example, the data configuration information may include a first data table user, a first data table name, a first data table partition name, a first data table sub-partition name, a second data table user, a second data table name, a unique index field or a field with many distinct data values in the first data table, a physical size of the first data table, a data amount of the first data table, a number of data blocks used by the first data table, whether the first data table contains a BLOB (Binary Large Object) field, whether the first data table contains a CLOB (Character Large Object) field, and the like, which is not limited in this embodiment.
Optionally, after the server obtains the data configuration information, a first data migration preparation table may be created, and the data configuration information may be written in the first data migration preparation table.
The data is typically stored in a database in the form of a data table. When the data amount in the data table is large, the data table can be partitioned to obtain data partitions, so that the operation efficiency of the data in the data table is improved. Further, when the data size of the data partition is still large, the data partition may be partitioned to obtain a data sub-partition, and the granularity of partitioning the data table is not limited in this embodiment. Of course, when the amount of data in the data table is not large, the data table may not be partitioned.
Based on the partitioning described above, the server may determine the migration objects after acquiring the data configuration information. For example, when the first data table does not include partitions, the first data table itself is taken as the migration object; when the first data table includes partitions and no partition includes sub-partitions, each data partition in the first data table is taken as a migration object; when the first data table includes partitions and the partitions include sub-partitions, each data sub-partition in the first data table is taken as a migration object.
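The three cases above can be sketched as follows. This is only an illustrative sketch: the TableInfo class, its fields, and the "table:partition:sub-partition" naming are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TableInfo:
    """Hypothetical description of a first data table (not from the application)."""
    name: str
    # Maps partition name -> list of sub-partition names (empty list if none).
    partitions: Dict[str, List[str]] = field(default_factory=dict)


def migration_objects(table: TableInfo) -> List[str]:
    """Return the minimum-granularity units of a table as migration objects."""
    if not table.partitions:
        # Unpartitioned table: the whole table is the migration object.
        return [table.name]
    objects = []
    for part, subparts in table.partitions.items():
        if subparts:
            # Partition with sub-partitions: each sub-partition is an object.
            objects.extend(f"{table.name}:{part}:{sub}" for sub in subparts)
        else:
            # Partition without sub-partitions: the partition is the object.
            objects.append(f"{table.name}:{part}")
    return objects
```

For instance, a table with partitions P1 and P2 and no sub-partitions would yield two migration objects, one per partition.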
Step 102, for each migration object, calculating the thread number required by the migration object in the migration process.
The server may calculate a thread count for each migrated object. For example, the number of threads 3 is calculated for the first migration object, the number of threads 5 is calculated for the second migration object, and so on. The process of calculating the thread number is described in detail below, and is not described herein.
Step 103, fragmenting the migration object according to the thread number to obtain a plurality of data fragments, wherein each data fragment corresponds to one thread.
The server can fragment the migration object according to the thread number, so that the number of the obtained data fragments is equal to the thread number. For example, if the number of threads corresponding to the migration object is 3, the migration object is sliced into 3 data slices.
Step 104, migrating the corresponding data fragments in parallel by using each thread.
The server may execute each thread in parallel, and thus, the server may migrate the corresponding data fragment in parallel by using each thread, for example, if one migration object includes three data fragments, the three threads may migrate the three data fragments at the same time, so that the efficiency of data migration may be improved.
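One thread per data fragment can be sketched with a thread pool, as below. The migrate_fragment function is a hypothetical placeholder that only counts the values in a fragment; the application does not specify how a thread is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Fragment = Tuple[int, int]  # (start value, end value) of one data fragment, inclusive


def migrate_fragment(fragment: Fragment) -> int:
    """Hypothetical placeholder for migrating one fragment; returns rows moved."""
    start, end = fragment
    # A real implementation would export from the source device and
    # import into the destination device here.
    return end - start + 1


def migrate_in_parallel(fragments: List[Fragment]) -> List[int]:
    """Run one thread per data fragment and collect per-fragment row counts."""
    if not fragments:
        return []
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        # map() returns results in the same order as the input fragments.
        return list(pool.map(migrate_fragment, fragments))
```

With three fragments, three worker threads run at the same time, matching the example in the text.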
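To sum up, according to the data migration method provided in the embodiment of the present application, after the migration objects in the data table to be migrated are determined, the number of threads required by each migration object in the migration process may be calculated, and the migration object is then fragmented according to the thread number to obtain a plurality of data fragments, which are migrated in parallel by the threads. Because the data table is divided into data fragments that are migrated in parallel by multiple threads, the time consumed by data migration is reduced and the efficiency of data migration is improved.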
Referring to fig. 2, a flowchart of a data migration method provided in another embodiment of the present application is shown, where the data migration method may be applied in a server. The data migration method may include:
step 201, determining migration objects in a data table to be migrated, where a migration object is either a minimum-granularity partition of a partitioned data table or an unpartitioned data table.
Before the embodiment is executed, the server needs to be configured. In one implementation, a server may be configured with first connection information of a first database in a source device and second connection information of a second database in a destination device, where the first connection information includes a database identifier, a user name, and a password of the first database, so as to facilitate the server to establish a connection with the first database in the source device, and the second connection information includes a database identifier, a user name, and a password of the second database, so as to facilitate the server to establish a connection with the second database in the destination device.
In this embodiment, the data configuration information required for migrating the data table may also be configured in the server. Since data needs to be migrated from a data table in the first database to a data table in the second database, for convenience of distinction, the data table holding the data to be migrated is hereinafter referred to as the first data table, and the data table that receives the data is hereinafter referred to as the second data table.
For example, the data configuration information may include a first data table user, a first data table name, a first data table partition name, a first data table sub-partition name, a second data table user, a second data table name, a unique index field or a field with many different data values in the first data table, a physical size of the first data table, a data amount of the first data table, a number of data blocks used by the first data table, whether the first data table includes a BLOB field, and the like, which is not limited in this embodiment.
Optionally, after the server obtains the data configuration information, a first data migration preparation table may be created, and the data configuration information may be written in the first data migration preparation table.
The data is typically stored in a database in the form of a data table. When the data amount in the data table is large, the data table can be partitioned to obtain data partitions, so that the operation efficiency of the data in the data table is improved. Further, when the data size of the data partition is still large, the data partition may be partitioned to obtain a data sub-partition, and the granularity of partitioning the data table is not limited in this embodiment. Of course, when the amount of data in the data table is not large, the data table may not be partitioned.
Based on the partitioning described above, the server may determine the migration objects after acquiring the data configuration information. For example, when the first data table does not include partitions, the first data table itself is taken as the migration object; when the first data table includes partitions and no partition includes sub-partitions, each data partition in the first data table is taken as a migration object; when the first data table includes partitions and the partitions include sub-partitions, each data sub-partition in the first data table is taken as a migration object.
Step 202, obtaining object information of each migration object, where the object information is field information of predetermined N fields in the migration object.
Wherein N is a positive integer.
The server may obtain the object information from the data configuration information. In this embodiment, the object information includes the physical size of the first data table, the data amount of the first data table, the number of data blocks used by the first data table, and whether the first data table includes a BLOB field. The way in which these fields are determined is described in detail below.
Step 203, for each migration object, inputting the object information of the migration object into a preset scoring model to obtain the expected migration time, wherein the scoring model is used for calculating the expected migration time of each migration object according to the field information of the preset N fields.
In this embodiment, the server may obtain the scoring model from other devices, or may train the scoring model by itself, and the following introduces a process of training the scoring model by the server.
In one implementation, before inputting the object information of the migration object into the preset scoring model, the method may further include the following substeps:
step 2031, obtaining a preset training set and a test set, where the training set includes a data table for training the scoring model, and the test set includes a data table for testing the scoring model.
The server may obtain the training set and the test set by using pandas, or may obtain the training set and the test set by using other manners, which is not limited in this embodiment.
Step 2032, quantizing the data of M fields in the data tables in the training set and the test set, wherein M is larger than or equal to N.
The server may determine each migrated object in the data table of the training set and each migrated object in the data table of the test set by using the method described in step 201, and then obtain a field of each migrated object.
For example, the M fields in the data table may include the physical size of the migration object, the data amount of the migration object, the table type of the migration object, whether the migration object is a partition table, the partition type of the migration object, whether the migration object is a sub-partition table, the sub-partition type of the migration object, the number of data blocks used by the migration object, the number of free data blocks of the migration object, the average row length of the migration object, whether table compression is enabled, the time consumed by single-thread database migration, whether the migration object includes a BLOB field, and whether the migration object includes a CLOB field.
Since the fields in the migration object have different units and wide numerical ranges, the data in the data table can be quantized. For example, for the field recording the time consumed by single-thread database migration, the server reads the value of that field in the migration object: when the migration takes more than 30 minutes, the data is classified as "long time consumption" and marked as 1; when the migration takes no more than 30 minutes, the data is classified as "short time consumption" and marked as 0, so that a specific duration is quantized into the two values 0 and 1. For another example, if the data in the migration object is complete with no missing values, the server may apply unsupervised equal-width binning and convert the data into WOE (Weight of Evidence) codes to achieve quantization.
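The two quantization examples can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the WOE sign convention (logarithm of the share of 0-labels over the share of 1-labels) is one of several in common use, and the additive smoothing term is an assumption added here so that empty cells stay finite.

```python
import math
from typing import Dict, List


def label_duration(minutes: float, threshold: float = 30.0) -> int:
    """Return 1 ("long") if migration took more than the threshold, else 0 ("short")."""
    return 1 if minutes > threshold else 0


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Assign each value to one of n_bins equal-width bins (0-based index)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant fields
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def woe_by_bin(bins: List[int], labels: List[int], smooth: float = 0.5) -> Dict[int, float]:
    """Compute WOE = ln(share of 0-labels / share of 1-labels) for each bin.

    The smoothing term is an assumption of this sketch, not part of the source.
    """
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    result = {}
    for b in sorted(set(bins)):
        pos = sum(1 for x, y in zip(bins, labels) if x == b and y == 1)
        neg = sum(1 for x, y in zip(bins, labels) if x == b and y == 0)
        pos_share = (pos + smooth) / (total_pos + smooth)
        neg_share = (neg + smooth) / (total_neg + smooth)
        result[b] = math.log(neg_share / pos_share)
    return result
```

A bin that contains mostly "short" migrations gets a positive WOE under this convention, and a bin that contains mostly "long" migrations gets a negative one.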
Step 2033, select N fields from the M fields according to the quantized data table.
After quantizing the data in each migration object, the server may first screen out K fields (K is a positive integer) that satisfy the following conditions: the IV (Information Value) is calculated from the WOE codes, the fields are sorted by IV, and fields with IV > 0.02 are selected; multi-field correlation analysis is performed using corrcoef of numpy, and fields whose correlation is below the threshold of 0.6 are retained; fields with a lower VIF (Variance Inflation Factor) are identified using variance_inflation_factor of statsmodels, which is sensitive to collinearity; and the importance of each field is calculated using XGBoost (eXtreme Gradient Boosting).
After obtaining the K fields, the server may put them into a logit model of statsmodels and calculate the coefficients and significance of the logit model, where the K fields are the physical size of the migration object, the data amount of the migration object, the number of data blocks used by the migration object, and whether the migration object includes a BLOB field. The server then continues to add fields to and remove fields from the logit model until the logit model is not significant or the coefficients are positive, thereby determining the finally selected N fields. In this embodiment, the finally selected N fields are the physical size of the migration object, the data amount of the migration object, the number of data blocks used by the migration object, whether the migration object includes a BLOB field, and whether the migration object includes a CLOB field.
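The first screening condition (IV > 0.02) can be sketched as below. This is a hedged illustration in plain Python rather than the numpy, statsmodels and XGBoost calls named above; the function names and the smoothing term are assumptions of this sketch, and the other three conditions are not shown.

```python
import math
from typing import Dict, List


def information_value(bins: List[int], labels: List[int], smooth: float = 0.5) -> float:
    """IV = sum over bins of (share of 0-labels - share of 1-labels) * WOE."""
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    iv = 0.0
    for b in set(bins):
        pos = sum(1 for x, y in zip(bins, labels) if x == b and y == 1)
        neg = sum(1 for x, y in zip(bins, labels) if x == b and y == 0)
        # Additive smoothing (an assumption) keeps the logarithm finite.
        pos_share = (pos + smooth) / (total_pos + smooth)
        neg_share = (neg + smooth) / (total_neg + smooth)
        iv += (neg_share - pos_share) * math.log(neg_share / pos_share)
    return iv


def select_by_iv(binned_fields: Dict[str, List[int]], labels: List[int],
                 threshold: float = 0.02) -> List[str]:
    """Keep fields whose IV exceeds the threshold, sorted by IV in descending order."""
    ivs = {name: information_value(b, labels) for name, b in binned_fields.items()}
    return [n for n in sorted(ivs, key=ivs.get, reverse=True) if ivs[n] > threshold]
```

A field whose bins separate the "long" and "short" labels well has a high IV and is kept, while a field whose bins carry no information about the label has an IV near zero and is dropped.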
Step 2034, training the scoring model according to the data of the N fields in the training set, and stopping training until the verification result of the scoring model according to the data of the N fields in the test set meets the preset condition.
After determining the N fields, the server may train the scoring model based on the data of the N fields in the training set, generate a ROC (Receiver Operating Characteristic) curve and a KS (Kolmogorov-Smirnov) curve using the data of the N fields in the test set, and stop training the scoring model once the ROC curve and the KS curve show no over-fitting.
After obtaining the scoring model, the server may input the object information of the migration object into the scoring model, and the output of the scoring model is the expected migration time of the migration object. Each field of a migration object obtains a corresponding score, and the scores of the N fields are aggregated to obtain the expected migration time of the migration object.
Step 204, calculating the number of threads required to be used by each migration object according to the expected time-consuming score of each migration object.
Calculating the number of threads required by each migration object according to its expected time-consuming score may include the following substeps:
step 2041, obtaining the first CPU thread number of the source device and the second CPU thread number of the destination device, where the source device is the device providing the migrated data and the destination device is the device receiving the migrated data.
The server may obtain the first CPU thread count from the first connection information, and obtain the second CPU thread count from the second connection information.
Step 2042, calculating the difference between the maximum value and the minimum value in the expected time-consuming scores of all the migrated objects, taking the minimum value in the first CPU thread number and the second CPU thread number, and dividing the difference by the minimum value to obtain a first intermediate number.
The server can traverse the expected migration time of each migration object to obtain the maximum value and the minimum value in the expected time-consuming scores.
Step 2043, for each migration object, multiplying the first intermediate numerical value by the expected time consumption score of the migration object, and rounding down the obtained second intermediate numerical value to obtain the thread number required by the migration object.
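The server may use the floor function to round down, so that the number of threads = floor(((max - min) / min(first CPU thread number, second CPU thread number)) × expected time-consuming score of the migration object), where max and min are the maximum and minimum expected time-consuming scores over all migration objects.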
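Steps 2042 and 2043 can be written out directly, as in the sketch below. The function and parameter names are hypothetical, and the input scores are made-up illustration values.

```python
import math
from typing import List


def thread_counts(scores: List[float], src_cpu_threads: int, dst_cpu_threads: int) -> List[int]:
    """Compute the thread number of each migration object from its expected time-consuming score."""
    # Step 2042: (max - min) divided by the smaller of the two CPU thread numbers.
    first_intermediate = (max(scores) - min(scores)) / min(src_cpu_threads, dst_cpu_threads)
    # Step 2043: multiply by each object's score and round down.
    return [math.floor(first_intermediate * s) for s in scores]


# Scores 2, 5 and 10 with 8 source and 4 destination CPU threads:
# first intermediate value = (10 - 2) / 4 = 2, so the thread numbers are 4, 10 and 20.
print(thread_counts([2, 5, 10], 8, 4))
```

Taken literally, the formula yields 0 when all migration objects have the same score; the text does not say how that case is handled, so a practical implementation would presumably need a lower bound of one thread.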
In this way, the server calculates a thread count for each migration object. For example, a thread count of 3 is calculated for the first migration object, a thread count of 5 for the second migration object, and so on.
After obtaining the thread count for each migration object, the server may update each thread count into the first data migration preparation table.
Step 205, the migration object is fragmented according to the thread number to obtain a plurality of data fragments, and each data fragment corresponds to one thread.
The fragmenting the migration object according to the thread number to obtain a plurality of data fragments may include the following substeps:
step 2051, selecting a predetermined field from the plurality of fields of the migration object, where the data values in the predetermined field differ from one another.
When all the data values in a field of a database are different, the field can be used as a unique index field.
The server may first check whether a unique index field exists. If one exists, it is used as the predetermined field; if none exists (a field whose data values are all different may exist without a unique index having been created on it), the server may instead look for a field with many distinct data values and use it as the predetermined field.
Step 2052, fragmenting the migration object according to the thread number and the data values in the predetermined field to obtain a plurality of data fragments, where the data values of the predetermined field in each data fragment form one part of a division of the data values of the predetermined field in the migration object.
For each migration object, the data values of the unique index field, or of the field with many distinct data values, are split according to the thread number used during migration, forming data fragments based on that field. For example, if the data values in the predetermined field are 1-100 and the thread number is 5, the data values 1-20, 21-40, 41-60, 61-80, and 81-100 may each be regarded as one data fragment.
Each data fragment obtained by splitting has a start value and an end value, and the server may merge the start value and the end value of each data fragment with the first data migration preparation table to form a second data migration preparation table.
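The splitting in the 1-100 example can be sketched as follows. The sketch assumes an integer-valued predetermined field with a contiguous value range, which the application does not require; the function name is hypothetical.

```python
from typing import List, Tuple


def split_range(lo: int, hi: int, n_threads: int) -> List[Tuple[int, int]]:
    """Split the inclusive value range [lo, hi] into n_threads contiguous fragments."""
    total = hi - lo + 1
    base, extra = divmod(total, n_threads)
    fragments = []
    start = lo
    for i in range(n_threads):
        # The first `extra` fragments take one additional value each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        fragments.append((start, end))
        start = end + 1
    return fragments


# Values 1-100 with 5 threads give (1, 20), (21, 40), (41, 60), (61, 80), (81, 100).
print(split_range(1, 100, 5))
```

Each tuple holds the start value and end value of one data fragment, which is the information merged into the second data migration preparation table.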
Step 206, migrating the corresponding data fragments in parallel by using each thread.
The migrating the corresponding data fragment in parallel by using each thread may include the following substeps:
step 2061, for each thread running in parallel, the corresponding data fragment is obtained from the source device through the data export tool called by the thread, and the data fragment is stored in the pipeline file of the system.
The server starts a plurality of processes, and each process calls the data export tool (exp). Each process reads, from the second data migration preparation table, the start value and end value of its data fragment together with the unique index field or the field with many distinct data values, generates a query condition of the form "where field >= start value and field < end value", and passes the query condition as the query parameter of exp, thereby achieving multi-threaded parallel migration of the data.
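Building the per-fragment query condition can be sketched as a simple string template. The function name and the field name ID are hypothetical, and the sketch only builds the condition text; it does not invoke the export tool.

```python
def fragment_condition(field: str, start: int, end: int) -> str:
    """Build the query condition for one data fragment (end value exclusive, as in the text)."""
    return f"where {field} >= {start} and {field} < {end}"


# One condition per fragment; the end value here is one past the last value in the fragment.
for start, end in [(1, 21), (21, 41), (41, 61)]:
    print(fragment_condition("ID", start, end))
```

Because the upper bound is written as a strict inequality, this sketch assumes each end value is one past the last value that belongs to the fragment; otherwise the boundary rows would be skipped.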
In this embodiment, when the server uses the data export tool, the data fragment may be cached in a system pipeline file, so that the exported data fragment is never written to disk and lightweight data migration is achieved. A pipeline file is unidirectional and sequential, and its contents are read as soon as they are written.
Step 2062, sending the data fragments in the pipeline file to the destination device through the data import tool.
The server may start as many data import tools as there are data export tools, point the data import tools at the pipeline file, and import the corresponding data fragments in the pipeline file into the second data table in the destination device through each data import tool.
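The export-to-pipe and import-from-pipe pattern can be illustrated with a POSIX named pipe, as sketched below. This assumes that the pipeline file behaves like a named pipe (FIFO); the two functions are hypothetical stand-ins for the export and import tools and do not call exp or imp.

```python
import os
import tempfile
import threading
from typing import List


def export_to_pipe(pipe_path: str, rows: List[str]) -> None:
    """Stand-in for the export tool: write rows into the pipe."""
    with open(pipe_path, "w") as pipe:  # blocks until a reader opens the pipe
        for row in rows:
            pipe.write(row + "\n")


def import_from_pipe(pipe_path: str) -> List[str]:
    """Stand-in for the import tool: read rows until the writer closes the pipe."""
    with open(pipe_path) as pipe:
        return [line.rstrip("\n") for line in pipe]


def migrate_through_pipe(rows: List[str]) -> List[str]:
    """Move rows from the exporter to the importer without a regular intermediate file."""
    with tempfile.TemporaryDirectory() as tmp:
        pipe_path = os.path.join(tmp, "fragment.pipe")
        os.mkfifo(pipe_path)  # POSIX only
        writer = threading.Thread(target=export_to_pipe, args=(pipe_path, rows))
        writer.start()
        received = import_from_pipe(pipe_path)
        writer.join()
        return received
```

The data passes through the pipe in order and in one direction, and nothing is stored in a regular file, which corresponds to the properties of the pipeline file described in the text.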
Optionally, the method provided in this embodiment further includes: creating a data migration reconciliation table and a data migration monitoring table; for each successfully migrated data fragment, writing the migrated-out data volume and the migrated-in data volume of the data fragment into the data migration reconciliation table; and for each data fragment that fails to migrate, writing the exception information of the data fragment into the data migration monitoring table.
In this embodiment, the server may create an empty data migration reconciliation table and an empty data migration monitoring table when creating the first data migration preparation table. After each data fragment is successfully migrated, the migrated-out data volume and the migrated-in data volume of the data fragment may be written into the data migration reconciliation table to implement reconciliation, thereby ensuring that no data is lost in the data migration process; after the migration of a data fragment fails, the name of the migration object, the migration error information and the like may be written into the data migration monitoring table, so that a worker can handle the failure manually.
In this embodiment, the worker may determine which migration object is currently migrated, the current migration progress, whether there is an exception in the migration process, and the like according to the two tables, so that monitoring and exception handling of data migration may be implemented.
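A minimal sketch of the two tables, using an in-memory SQLite database purely for illustration. The table names, column names, and helper functions are hypothetical, and the sketch assumes that the two volumes recorded for reconciliation are the amount exported from the source and the amount imported at the destination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reconciliation table: one row per successfully migrated data fragment.
conn.execute(
    "CREATE TABLE migration_reconciliation ("
    "object_name TEXT, fragment_id INTEGER, rows_out INTEGER, rows_in INTEGER)"
)
# Monitoring table: one row per data fragment whose migration failed.
conn.execute(
    "CREATE TABLE migration_monitoring ("
    "object_name TEXT, fragment_id INTEGER, error_info TEXT)"
)


def record_success(object_name, fragment_id, rows_out, rows_in):
    """Store the exported and imported volumes of one fragment."""
    conn.execute(
        "INSERT INTO migration_reconciliation VALUES (?, ?, ?, ?)",
        (object_name, fragment_id, rows_out, rows_in),
    )


def record_failure(object_name, fragment_id, error_info):
    """Store the error information of one failed fragment."""
    conn.execute(
        "INSERT INTO migration_monitoring VALUES (?, ?, ?)",
        (object_name, fragment_id, error_info),
    )


def unreconciled_fragments():
    """Return fragments whose exported and imported volumes disagree."""
    return conn.execute(
        "SELECT object_name, fragment_id FROM migration_reconciliation "
        "WHERE rows_out <> rows_in ORDER BY object_name, fragment_id"
    ).fetchall()
```

A query such as `unreconciled_fragments()` is one way staff could spot data loss, while the monitoring table lists the fragments that need manual handling.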
To sum up, according to the data migration method provided in the embodiment of the present application, after determining a migration object in a data table to be migrated, for each migration object, the number of threads that the migration object needs to use in the migration process may be calculated, and then the migration object is fragmented according to the number of threads to obtain a plurality of data fragments.
When the server uses the data export tool, the data fragments can be cached in a system pipeline file (pipeline), so the exported data fragments are never written to disk, which makes the data migration lightweight. Because no intermediate files are generated, the data fragments cannot leak through such files, which helps ensure the security of the data migration.
Referring to fig. 3, a block diagram of a data migration apparatus provided in an embodiment of the present application is shown, where the data migration apparatus may be applied to a server. The data migration device may include:
a determining module 310, configured to determine a migration object in a data table to be migrated, where the migration object is a minimum-granularity partition of a partitioned data table, or a data table without partitions;
a calculating module 320, configured to calculate, for each migration object, a number of threads that the migration object needs to use in the migration process;
the fragmentation module 330 is configured to fragment the migration object according to the number of threads to obtain a plurality of data fragments, where each data fragment corresponds to one thread;
and the migration module 340 is configured to migrate the corresponding data segments in parallel by using each thread.
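The one-thread-per-fragment behavior of the migration module can be sketched with Python's standard thread pool. `migrate_fragment` is a hypothetical placeholder that only pretends to move rows; a real implementation would call the export and import tools for the fragment.

```python
from concurrent.futures import ThreadPoolExecutor


def migrate_fragment(fragment):
    """Placeholder for exporting and importing one data fragment."""
    start, end = fragment
    return end - start  # pretend this many rows were moved


def migrate_object(fragments):
    """Run one thread per data fragment and collect per-fragment results."""
    if not fragments:
        return []
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        # map() preserves the input order of the fragments in its results.
        return list(pool.map(migrate_fragment, fragments))
```

Sizing the pool to the number of fragments mirrors the statement that each data fragment corresponds to one thread.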
In an alternative embodiment, the calculation module 320 is further configured to:
acquiring object information of each migration object, wherein the object information is field information of preset N fields in the migration object, and N is a positive integer;
for each migration object, inputting the object information of the migration object into a preset scoring model to obtain expected migration time, wherein the scoring model is used for calculating the expected migration time of each migration object according to the field information of the preset N fields;
and calculating the number of threads required to be used by each migration object according to the expected time-consuming score of each migration object.
In an alternative embodiment, the calculation module 320 is further configured to:
acquiring a first CPU thread number of a source device and a second CPU thread number of a target device, wherein the source device is a device for providing migrated data, and the target device is a device for receiving migrated data;
calculating a difference value obtained by subtracting the minimum value from the maximum value in the expected time-consuming scores of all the migration objects, taking the minimum value in the first CPU thread number and the second CPU thread number, and dividing the difference value by the minimum value to obtain a first intermediate value;
and for each migration object, multiplying the first intermediate numerical value by the expected time-consuming score of the migration object, and rounding the obtained second intermediate numerical value downwards to obtain the thread number required by the migration object.
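The arithmetic above can be sketched as follows. The sketch follows the wording literally and assumes that the divisor is the smaller of the two CPU thread numbers; the function and parameter names are hypothetical.

```python
import math


def thread_counts(scores, source_cpu_threads, target_cpu_threads):
    """Compute a thread number per migration object from its score."""
    # Difference between the largest and smallest expected time-consuming score.
    spread = max(scores.values()) - min(scores.values())
    # Assumption: the divisor is the smaller of the two CPU thread numbers.
    cpu_limit = min(source_cpu_threads, target_cpu_threads)
    first_intermediate = spread / cpu_limit
    # Second intermediate value = first intermediate value * score, rounded down.
    return {
        name: math.floor(first_intermediate * score)
        for name, score in scores.items()
    }
```

For example, with scores of 2 and 6 and CPU thread numbers of 8 and 4, the first intermediate value is (6 - 2) / 4 = 1, so the two migration objects receive 2 and 6 threads respectively.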
In an alternative embodiment, the calculation module 320 is further configured to:
before the object information of the migration object is input into a preset scoring model, a preset training set and a test set are obtained, wherein the training set comprises a data table for training the scoring model, and the test set comprises a data table for testing the scoring model;
respectively quantizing the data of M fields in the data tables in the training set and the test set, wherein M is more than or equal to N;
selecting N fields from the M fields according to the quantized data table;
and training the scoring model according to the data of the N fields in the training set, and stopping training until the verification result of the scoring model according to the data of the N fields in the test set meets the preset condition.
In an optional embodiment, the fragmentation module 330 is further configured to:
selecting a preset field from a plurality of fields of the migration object, wherein the preset field holds a plurality of different data values;
and fragmenting the migration object according to the thread number and the data values in the preset field to obtain a plurality of data fragments, wherein the data values of the preset field in each data fragment form one subrange of the data values of the preset field in the migration object.
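One way to derive the fragments is to cut the value range of the preset field into as many half-open ranges as there are threads. The sketch below assumes an integer-valued preset field; the function name and the equal-width split are illustrative choices, not details confirmed by the application.

```python
def split_ranges(min_value, max_value, thread_count):
    """Split [min_value, max_value] into thread_count half-open ranges."""
    upper = max_value + 1  # exclusive bound so the largest value is covered
    total = upper - min_value
    # Integer arithmetic keeps the boundaries exact and contiguous.
    bounds = [
        min_value + (total * i) // thread_count
        for i in range(thread_count + 1)
    ]
    return [(bounds[i], bounds[i + 1]) for i in range(thread_count)]
```

Each returned pair is a start value (inclusive) and an end value (exclusive), so the ranges cover every value of the preset field exactly once.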
In an optional embodiment, the apparatus further comprises:
the creating module is used for creating a data migration reconciliation table and a data migration monitoring table;
the first writing module is used for writing, for each successfully migrated data fragment, the data volume migrated out and the data volume migrated in for the data fragment into the data migration reconciliation table;
and the second writing module is used for writing the abnormal information of the data fragment into the data migration monitoring table for each data fragment failed in migration.
In an alternative embodiment, the migration module 340 is further configured to:
for each thread running in parallel, acquiring corresponding data fragments from source equipment through a data export tool called by the thread, and storing the data fragments in a pipeline file of the system;
and transmitting the data fragments in the pipeline file to the destination equipment through a data import tool.
To sum up, the data migration apparatus provided in this embodiment of the present application, after determining a migration object in a data table to be migrated, may calculate, for each migration object, a thread number that needs to be used by the migration object in a migration process, and then fragment the migration object according to the thread number to obtain a plurality of data fragments.
When the server uses the data export tool, the data fragments can be cached in a system pipeline file (pipeline), so the exported data fragments are never written to disk, which makes the data migration lightweight. Because no intermediate files are generated, the data fragments cannot leak through such files, which helps ensure the security of the data migration.
One embodiment of the present application provides a computer-readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions that is loaded and executed by a processor to implement a data migration method as described above.
One embodiment of the present application provides a server comprising a processor and a memory, wherein the memory stores at least one instruction, and the instruction is loaded and executed by the processor to implement the data migration method as described above.
It should be noted that the division into functional modules described above is only an example of how the data migration apparatus of the foregoing embodiment performs data migration. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the data migration apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the data migration apparatus and the data migration method provided in the above embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description should not be taken as limiting the embodiments of the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the embodiments of the present application should be included in the scope of the embodiments of the present application.

Claims (10)

1. A method of data migration, the method comprising:
determining a migration object in a data table to be migrated, wherein the migration object is a minimum-granularity partition of a partitioned data table, or a data table without partitions;
for each migration object, calculating the number of threads required to be used by the migration object in the migration process;
fragmenting the migration object according to the thread number to obtain a plurality of data fragments, wherein each data fragment corresponds to one thread;
and migrating the corresponding data fragments in parallel by utilizing each thread.
2. The method according to claim 1, wherein for each migration object, calculating the number of threads required to be used by the migration object in the migration process comprises:
acquiring object information of each migration object, wherein the object information is field information of preset N fields in the migration object, and N is a positive integer;
for each migration object, inputting the object information of the migration object into a preset scoring model to obtain expected migration time, wherein the scoring model is used for calculating the expected migration time of each migration object according to the field information of the preset N fields;
and calculating the number of threads required to be used by each migration object according to the expected time-consuming score of each migration object.
3. The method according to claim 2, wherein calculating the number of threads required to be used by each migration object according to the expected time-consuming score of each migration object comprises:
acquiring a first Central Processing Unit (CPU) thread number of a source device and a second CPU thread number of a target device, wherein the source device is a device for providing migrated data, and the target device is a device for receiving migrated data;
calculating a difference value obtained by subtracting a minimum value from a maximum value in the expected time-consuming scores of all the migration objects, taking a minimum value in the first CPU thread number and the second CPU thread number, and dividing the difference value by the minimum value to obtain a first intermediate value;
and for each migration object, multiplying the first intermediate numerical value by the expected time-consuming score of the migration object, and rounding the obtained second intermediate numerical value downwards to obtain the thread number required by the migration object.
4. The method according to claim 2, wherein before the inputting the object information of the migration object into a preset scoring model, the method further comprises:
acquiring a preset training set and a test set, wherein the training set comprises a data table for training the scoring model, and the test set comprises a data table for testing the scoring model;
respectively quantizing the data of M fields in the data tables in the training set and the test set, wherein M is more than or equal to N;
selecting N fields from the M fields according to the quantized data table;
and training the scoring model according to the data of the N fields in the training set until a verification result of the scoring model according to the data of the N fields in the testing set meets a preset condition.
5. The method according to claim 1, wherein the fragmenting the migration object according to the thread number to obtain a plurality of data fragments comprises:
selecting a preset field from a plurality of fields of the migration object, wherein the preset field holds a plurality of different data values;
and fragmenting the migration object according to the thread number and the data values in the preset field to obtain a plurality of data fragments, wherein the data values of the preset field in each data fragment form one subrange of the data values of the preset field in the migration object.
6. The method of claim 1, further comprising:
creating a data migration reconciliation table and a data migration monitoring table;
for each successfully migrated data fragment, writing the data volume migrated out and the data volume migrated in for the data fragment into the data migration reconciliation table;
and for each data fragment which fails to migrate, writing the abnormal information of the data fragment into the data migration monitoring table.
7. The method according to any one of claims 1 to 6, wherein migrating the corresponding data fragments in parallel by using each thread comprises:
for each thread running in parallel, acquiring a corresponding data fragment from source equipment through a data export tool called by the thread, and storing the data fragment in a pipeline file of a system;
and sending the data fragments in the pipeline file to a destination device through a data import tool.
8. An apparatus for data migration, the apparatus comprising:
the determining module is used for determining a migration object in a data table to be migrated, wherein the migration object is a minimum-granularity partition of a partitioned data table, or a data table without partitions;
the calculation module is used for calculating the number of threads required to be used by each migration object in the migration process;
the fragmentation module is used for fragmenting the migration object according to the thread number to obtain a plurality of data fragments, and each data fragment corresponds to one thread;
and the migration module is used for migrating the corresponding data fragments in parallel by utilizing all the threads.
9. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a data migration method according to any one of claims 1 to 7.
10. A server, comprising a processor and a memory, the memory having stored therein at least one instruction, the instruction being loaded and executed by the processor to implement a data migration method according to any one of claims 1 to 7.
CN201911301449.0A 2019-12-17 2019-12-17 Data migration method and device, storage medium and server Active CN111125059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911301449.0A CN111125059B (en) 2019-12-17 2019-12-17 Data migration method and device, storage medium and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911301449.0A CN111125059B (en) 2019-12-17 2019-12-17 Data migration method and device, storage medium and server

Publications (2)

Publication Number Publication Date
CN111125059A true CN111125059A (en) 2020-05-08
CN111125059B CN111125059B (en) 2022-08-12

Family

ID=70499219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911301449.0A Active CN111125059B (en) 2019-12-17 2019-12-17 Data migration method and device, storage medium and server

Country Status (1)

Country Link
CN (1) CN111125059B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111708763A (en) * 2020-06-18 2020-09-25 北京金山云网络技术有限公司 Data migration method and device of fragment cluster and fragment cluster system
CN112015716A (en) * 2020-08-04 2020-12-01 北京人大金仓信息技术股份有限公司 Database data migration method, device, medium and electronic equipment
CN112817742A (en) * 2021-01-12 2021-05-18 平安科技(深圳)有限公司 Data migration method, device, equipment and storage medium
CN113760858A (en) * 2020-06-05 2021-12-07 中国移动通信集团湖北有限公司 Dynamic migration method and device for memory database data, computing equipment and storage equipment
CN114237519A (en) * 2022-02-23 2022-03-25 苏州浪潮智能科技有限公司 Method, device, equipment and medium for migrating object storage data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189756A (en) * 2018-06-29 2019-01-11 平安科技(深圳)有限公司 Electronic device, the method for Data Migration and storage medium
CN109753493A (en) * 2019-01-04 2019-05-14 中国银行股份有限公司 The method, apparatus and equipment of Data Migration are carried out between database
CN110287197A (en) * 2019-06-28 2019-09-27 微梦创科网络科技(中国)有限公司 A kind of date storage method, moving method and device
CN110297813A (en) * 2019-05-22 2019-10-01 平安银行股份有限公司 Data migration method, device, computer equipment and storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113760858A (en) * 2020-06-05 2021-12-07 中国移动通信集团湖北有限公司 Dynamic migration method and device for memory database data, computing equipment and storage equipment
CN113760858B (en) * 2020-06-05 2024-03-19 中国移动通信集团湖北有限公司 Dynamic migration method and device for memory database data, computing equipment and storage equipment
CN111708763A (en) * 2020-06-18 2020-09-25 北京金山云网络技术有限公司 Data migration method and device of fragment cluster and fragment cluster system
CN111708763B (en) * 2020-06-18 2023-12-01 北京金山云网络技术有限公司 Data migration method and device of sliced cluster and sliced cluster system
CN112015716A (en) * 2020-08-04 2020-12-01 北京人大金仓信息技术股份有限公司 Database data migration method, device, medium and electronic equipment
CN112015716B (en) * 2020-08-04 2024-02-09 北京人大金仓信息技术股份有限公司 Database data migration method, device, medium and electronic equipment
CN112817742A (en) * 2021-01-12 2021-05-18 平安科技(深圳)有限公司 Data migration method, device, equipment and storage medium
WO2022151614A1 (en) * 2021-01-12 2022-07-21 平安科技(深圳)有限公司 Data migration method and apparatus, device, and storage medium
CN112817742B (en) * 2021-01-12 2024-03-01 平安科技(深圳)有限公司 Data migration method, device, equipment and storage medium
CN114237519A (en) * 2022-02-23 2022-03-25 苏州浪潮智能科技有限公司 Method, device, equipment and medium for migrating object storage data

Also Published As

Publication number Publication date
CN111125059B (en) 2022-08-12

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 210000 Longmian Avenue 568, High-tech Park, Jiangning District, Nanjing City, Jiangsu Province

Patentee after: Xiaoshi Technology (Jiangsu) Co.,Ltd.

Address before: 210000 Longmian Avenue 568, High-tech Park, Jiangning District, Nanjing City, Jiangsu Province

Patentee before: NANJING ZHENSHI INTELLIGENT TECHNOLOGY Co.,Ltd.