CN111881116A

CN111881116A - Data migration method, data migration system, computer system, and storage medium

Info

Publication number: CN111881116A
Application number: CN202010780437.7A
Authority: CN
Inventors: 徐鹏飞; 刘轲; 金之华; 俞丽萍
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2020-08-05
Filing date: 2020-08-05
Publication date: 2020-11-03

Abstract

The present disclosure provides a data migration method, a data migration system, a computer system, and a computer-readable storage medium that can be used in the big data field or other fields. The method comprises: determining a data migration time for stock data and an acquisition time for incremental data, wherein the acquisition time is earlier than the data migration time; when the current time reaches the acquisition time of the incremental data, acquiring the incremental data and writing it into a message processing platform; when the current time reaches the data migration time of the stock data, acquiring the stock data stored in a host system database and migrating it to a platform system database in batches; and, after at least one batch of stock data has been migrated to the platform system database, storing the incremental data held in the message processing platform into the platform system database.

Description

Data migration method, data migration system, computer system, and storage medium

Technical Field

The present disclosure relates to the field of big data technologies, and more particularly, to a data migration method, a data migration system, a computer system, and a computer-readable storage medium.

Background

With the rapid development of the internet and the arrival of the big data era, data has become one of the most valuable key assets. In terms of data storage, conventional host systems find it increasingly difficult to meet the rapidly growing demand for processing massive data volumes. Therefore, for data that is large in scale, spans regions, serves complex businesses, and is highly diverse, a big data (open platform) management system is required to manage the data. To achieve convenient, unified management of all data, the original data still stored in the current host system needs to be migrated to the big data (open platform) management system.

In the process of implementing the concept of the present disclosure, the inventors found that the related art has at least the following problems: because no specific data migration rules are set, data migration suffers from a slow migration response rate, the smoothness and integrity of the migration are difficult to guarantee, data types are incompatible, and the amount of data to be handled during migration is large.

Disclosure of Invention

In view of the above, the present disclosure provides a data migration method, a data migration system, a computer system, and a computer-readable storage medium.

One aspect of the present disclosure provides a data migration method, including: determining data migration time of stock data and acquisition time of incremental data, wherein the acquisition time is earlier than the data migration time; under the condition that the current time reaches the acquisition time of the incremental data, acquiring the incremental data and writing the incremental data into a message processing platform; under the condition that the current time reaches the data migration time of the stock data, acquiring the stock data stored in a host system database, and migrating the stock data to a platform system database in batches; and after at least one batch of the stock data is migrated to a platform system database, storing the incremental data in the message processing platform into the platform system database.

According to the embodiment of the disclosure, the host system database realizes data storage through an initial coding format, and the platform system database realizes data storage through a target coding format; and/or the host system database is a database with a centralized architecture, and the platform system database is a database with a distributed architecture.

According to the embodiment of the present disclosure, acquiring the incremental data and writing the incremental data into the message processing platform when the current time reaches the acquisition time of the incremental data includes: acquiring a message related to the incremental data; sending the message related to the incremental data to a cache region in a single thread processing mode; analyzing the message related to the incremental data in the cache region in a multithreading parallel processing mode to obtain the incremental data; recording the incremental data with the same key value into the same queue according to the key value of the incremental data to obtain queue data; and writing the queue data to the message processing platform.
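The capture steps just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: messages are assumed to be JSON with a `key` field, an in-memory `Queue` stands in for the cache region, plain lists stand in for the queues of the message processing platform, and all function names and `NUM_QUEUES` are hypothetical.

```python
import json
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

NUM_QUEUES = 4  # hypothetical number of queues on the message platform


def parse_message(raw: bytes) -> dict:
    """Parse one raw message (assumed here to be UTF-8 JSON) into a record."""
    return json.loads(raw.decode("utf-8"))


def queue_index(key: str, num_queues: int = NUM_QUEUES) -> int:
    """Map a key to a queue; a stable hash keeps equal keys together."""
    return zlib.crc32(key.encode("utf-8")) % num_queues


def capture(raw_messages, num_queues: int = NUM_QUEUES, workers: int = 4) -> dict:
    # Step 1: a single thread hands the raw messages to the cache region.
    buffer = Queue()
    for raw in raw_messages:
        buffer.put(raw)

    pending = []
    while not buffer.empty():
        pending.append(buffer.get())

    # Step 2: parse in parallel; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        records = list(pool.map(parse_message, pending))

    # Step 3: records with the same key value go to the same queue,
    # so the relative order of changes to one key is preserved.
    queues = defaultdict(list)
    for record in records:
        queues[queue_index(record["key"], num_queues)].append(record)
    return dict(queues)
```

Routing by key is what allows the later replay to apply changes to one key in the order they occurred, even though parsing runs in parallel.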

According to an embodiment of the present disclosure, in a case that a current time reaches a data migration time of the stock data, acquiring the stock data stored in a host system database, and migrating the stock data to a platform system database in batches includes: acquiring part of stock data with an initial coding format to obtain a file with the initial coding format; converting the file with the initial coding format into a file with a target coding format; splitting the file with the target coding format according to the key values in the file with the target coding format to obtain subfiles formed by the target coding format; storing the subfiles formed by the target coding format into a memory of a distributed file storage system; and importing the subfiles composed of the target coding format in the distributed file storage system into the platform system database in batches.

According to an embodiment of the present disclosure, after migrating at least one batch of the stock data to a platform system database, storing the incremental data in the message processing platform into the platform system database includes: acquiring the stock data and the incremental data with the same key value; when the update time of the incremental data is later than the maximum time of the stock data, acquiring the current operation instruction for the incremental data; when the operation instruction is an update instruction, judging whether stock data with the same key value as the incremental data exists in the current platform system database, and if so, updating the stock data to the incremental data, and if not, inserting the incremental data into the platform system database; when the operation instruction is an insertion instruction, directly inserting the incremental data into the platform system database; and when the operation instruction is a deletion instruction, setting the flag state of the incremental data to deleted.
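The decision logic for applying one incremental record can be sketched as below. This is a minimal sketch, assuming the platform system database is modeled as a Python dict keyed by key value. The field names (`key`, `op`, `value`, `update_time`) and the return strings are hypothetical. The text does not say what happens when the update time is not later than the stock maximum, or when a deleted key is absent; skipping the record and writing a flagged placeholder row are assumptions of this sketch.

```python
def apply_increment(db: dict, record: dict, stock_max_time: int) -> str:
    """Apply one incremental record to `db` and report the action taken."""
    # Assumption: records not newer than the stock snapshot are skipped.
    if record["update_time"] <= stock_max_time:
        return "skipped"

    key, op = record["key"], record["op"]

    if op == "update":
        # Update the existing row if present, otherwise insert (an upsert).
        existed = key in db
        db[key] = {"value": record["value"], "deleted": False}
        return "updated" if existed else "inserted"

    if op == "insert":
        db[key] = {"value": record["value"], "deleted": False}
        return "inserted"

    if op == "delete":
        # Soft delete: only the flag changes, the row is kept.
        # Assumption: an absent key gets a placeholder row carrying the flag.
        db.setdefault(key, {"value": None})["deleted"] = True
        return "flagged_deleted"

    raise ValueError(f"unknown operation: {op!r}")
```

Treating an update for a missing key as an insert makes the replay tolerant of records whose stock counterpart belongs to a batch that has not been loaded yet.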

According to an embodiment of the disclosure, the method further comprises: acquiring the stock data stored in the platform system database; and deleting the stock data which is stored in the platform system database in the host system database.

According to an embodiment of the disclosure, the method further comprises: before the stock data and the incremental data are migrated, acquiring the stock data and the incremental data in binary form that are stored in a platform system database; when the stock data and the incremental data are migrated, reading the stock data and the incremental data in binary form by means of a thread; and copying the stock data and the incremental data in binary form to a backup database for backup.

Another aspect of the present disclosure provides a data migration system, including: a first determining module, configured to determine a data migration time of stock data and an acquisition time of incremental data, wherein the acquisition time is earlier than the data migration time; a first acquisition module, configured to acquire the incremental data and write the incremental data into a message processing platform when the current time reaches the acquisition time of the incremental data; a second acquisition module, configured to acquire the stock data stored in a host system database and migrate the stock data to a platform system database in batches when the current time reaches the data migration time of the stock data; and a storage module, configured to store the incremental data in the message processing platform into the platform system database after at least one batch of the stock data is migrated to the platform system database.

Another aspect of the present disclosure provides a computer system comprising: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.

Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.

Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.

According to the embodiment of the disclosure, a data migration time of the stock data and an acquisition time of the incremental data are determined, wherein the acquisition time is earlier than the data migration time; when the current time reaches the acquisition time of the incremental data, the incremental data is acquired and written into the message processing platform; when the current time reaches the data migration time of the stock data, the stock data stored in the host system database is acquired and migrated to the platform system database in batches; and after at least one batch of stock data is migrated to the platform system database, the incremental data in the message processing platform is stored into the platform system database. Because the acquisition time and the landing time of the incremental data are set relative to the data migration time of the stock data, the technical problems of a slow migration response rate, integrity that is difficult to guarantee, and growth of the data volume during migration are at least partially solved. The risk of losing incremental data during the stock data migration is reduced, migration efficiency is improved, the impact on normal external services is reduced, and high availability of the migrated data is ensured.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an exemplary system architecture to which a data migration method may be applied, according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates a flow diagram of a method of data migration according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a flow diagram for acquiring stock data stored in a host system database and migrating it to a platform system database in batches according to an embodiment of the disclosure;

FIG. 4 schematically illustrates an example diagram of a stock data heterogeneous system migration flow, according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates a flow diagram for obtaining and writing incremental data to a message processing platform according to an embodiment of the disclosure;

FIG. 6 schematically illustrates a flow diagram for storing incremental data in a message processing platform into a platform system database, according to an embodiment of the present disclosure;

FIG. 7 schematically illustrates another flow diagram of a method of data migration in accordance with an embodiment of the present disclosure;

FIG. 8 schematically illustrates an example graph of incremental data heterogeneous system migration flow, in accordance with an embodiment of the present disclosure;

FIG. 9 schematically illustrates a flow diagram for building a backup database and server according to an embodiment of the present disclosure;

FIG. 10 schematically illustrates a master-slave database semi-synchronization flow diagram according to the present disclosure;

FIG. 11 schematically illustrates a block diagram of a data migration system according to an embodiment of the present disclosure; and

FIG. 12 schematically illustrates a block diagram of a computer system suitable for implementing the data migration method described above, in accordance with an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).

In the big data era, some enterprises (for example, in the financial and banking industries), as strong supporters of the digital economy, should make better use of big data to mine its value and improve the efficiency of transaction services, so as to enhance their business innovation and core competitiveness. As the business of these enterprises keeps growing, it is increasingly difficult for the traditional host system to meet the demand for processing rapidly growing, massive data volumes. Therefore, as big data grows explosively, many service-specific information applications in the current host system, together with their databases, need to be migrated to the big data open platform system.

In implementing the disclosed concept, the inventors found that when a large amount of stock data is stored in the host system, the stock data may need to be migrated over multiple time periods. Even if the migration is scheduled outside the peak periods of important business services, data of transactions that are in transit at the migration time point may still be missed because of delays in the migration procedure.

In the process of implementing the present disclosure, the inventors also found that, for the stock data stored in the host system and the incremental data generated while the stock data is being migrated, the absence of specific migration rules may cause data to be written repeatedly. When the repeatedly written data is of the insertion type, this may further lead to poor migration and storage efficiency and longer migration times.

Embodiments of the present disclosure provide a data migration method, a data migration system, a computer system, and a computer-readable storage medium. Determining data migration time of stock data and acquisition time of incremental data, wherein the acquisition time is earlier than the data migration time; acquiring incremental data under the condition that the current time reaches the acquisition time of the incremental data, and writing the incremental data into the message processing platform; under the condition that the current time reaches the data migration time of the stock data, the stock data stored in the host system database is obtained, and the stock data are migrated to the platform system database in batches; and after migrating at least one batch of stock data to the platform system database, storing the incremental data in the message processing platform into the platform system database.

Fig. 1 schematically illustrates an exemplary system architecture 100 to which a data migration method may be applied, according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.

As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various client applications, such as a control application, a web browser application, a cloud platform application, a system application, an application capable of implementing a remote control function, and/or platform software capable of editing and sending script instructions, may be installed on the terminal devices 101, 102, and 103.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server providing support for requests made by users with the terminal devices 101, 102, 103. The background management server may analyze and process data such as the received user request, and feed back a processing result (for example, control or instruction information returned for the user request, and the like) to the terminal device or other external devices related to the user request, where the external devices may be, for example, various types of data storage systems and the like.

It should be noted that the data migration method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the data migration system provided by the embodiments of the present disclosure may be generally disposed in the server 105. The data migration method provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the data migration system provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Alternatively, the data migration method provided by the embodiment of the present disclosure may also be executed by the terminal device 101, 102, or 103, or may also be executed by another terminal device different from the terminal device 101, 102, or 103. Accordingly, the data migration system provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103, or in another terminal device different from the terminal device 101, 102, or 103.

For example, the control instruction related to data migration may be originally stored in any one of the terminal devices 101, 102, or 103 (for example, but not limited to, the terminal device 101), or stored on an external storage device and imported into the terminal device 101. Then, the terminal device 101 may locally execute the data migration method provided by the embodiment of the present disclosure, or send the control instruction related to data migration to another terminal device, a server, or a server cluster, so that the data migration method provided by the embodiment of the present disclosure is executed by the terminal device, server, or server cluster that receives the control instruction.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, or any desired external devices, such as data storage systems, as required by the implementation.

It should be noted that the data migration method, the data migration system, the computer system, and the computer-readable storage medium disclosed in the present disclosure may be used in the field of big data, and may also be used in any field other than the field of big data.

FIG. 2 schematically shows a flow chart of a data migration method according to an embodiment of the present disclosure.

As shown in fig. 2, the method includes operations S210 to S240.

In operation S210, a data migration time of the stock data and an acquisition time of the incremental data are determined, wherein the acquisition time is earlier than the data migration time.

According to an embodiment of the present disclosure, data migration may refer to the process of transferring enterprise data from an initial database system to a target database system. An enterprise administrator typically performs the migration at a suitable time, for example when business traffic is relatively low. However, when business volume is large, a large amount of data is generated every second, so new data is inevitably generated while the migration is in progress. The stock data refers to data originally stored in the database system. The incremental data refers to new data generated by ongoing business during the migration, and may include update-type data related to the stock data and newly added data unrelated to the stock data. To improve migration efficiency and reduce the impact on normal external services, the stock data and the incremental data are processed separately.

According to the embodiment of the disclosure, the data migration time of the stock data can be defined by the enterprise administrator according to the actual traffic volume. It should be noted that, in theory, the data migration time of the stock data and the acquisition time of the incremental data cannot be joined seamlessly, so to prevent data omission the acquisition time of the incremental data needs to be set earlier than the data migration time of the stock data. In this embodiment, the starting point for tracing incremental data (i.e., the acquisition time of the incremental data) is set about 10 minutes before the host migration time point (i.e., the data migration time of the stock data), so that transactions in transit during the migration period can be committed to the log record, ensuring that no data is missed.
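The timing rule above amounts to a small calculation. The following sketch is illustrative only: the 10-minute lead is taken from this embodiment, while the function names, the action labels, and the example timestamp are hypothetical.

```python
from datetime import datetime, timedelta

# The embodiment starts incremental capture about 10 minutes early.
CAPTURE_LEAD = timedelta(minutes=10)


def plan_times(migration_time: datetime, lead: timedelta = CAPTURE_LEAD):
    """Return (acquisition_time, migration_time), acquisition strictly earlier."""
    if lead <= timedelta(0):
        raise ValueError("lead must be positive so capture starts before migration")
    return migration_time - lead, migration_time


def due_actions(now: datetime, acquisition_time: datetime, migration_time: datetime):
    """List the phases whose start time has been reached at `now`."""
    actions = []
    if now >= acquisition_time:
        actions.append("capture_incremental")
    if now >= migration_time:
        actions.append("migrate_stock")
    return actions


# Hypothetical example: stock migration scheduled for 23:00.
acq, mig = plan_times(datetime(2020, 8, 5, 23, 0))
```

Between the two times only capture is active, so any transaction committed in that window is already in the log when the stock snapshot is taken.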

In operation S220, in case that the current time reaches the acquisition time of the incremental data, the incremental data is acquired and written to the message processing platform.

According to the embodiment of the disclosure, the message processing platform may be Kafka (an open-source stream processing platform), which serves as an intermediate platform (a transfer station) that provides a temporary storage area for the incremental data during migration.

In operation S230, in the case where the current time reaches the data migration time of the stock data, the stock data stored in the host system database is acquired and batch-wise migrated into the platform system database.

According to an embodiment of the present disclosure, the host system database serves as the initial database system, and the data it holds is the stock data. As historical enterprise data, the stock data is large in volume, so enterprise administrators migrate it in batches according to its volume or the length of a suitable migration window.

In operation S240, after migrating at least one batch of the stock data to the platform system database, storing the incremental data in the message processing platform into the platform system database.

According to the embodiment of the disclosure, the platform system database serves as the target database system, into which both the stock data and the incremental data are finally migrated. It should be noted that the landing time of the incremental data (i.e., the time when the incremental data can actually be stored in the platform system database) must be set after loading of the stock data is completed. The stock data is loaded in units of batches, and completion of loading may mean, for example, that one or more batches of stock data have been completely stored in the platform system database.

According to the embodiment of the disclosure, migration of the stock data is triggered by the daily end-of-day batch run, and Data Replication Platform (DRP) technology is used to replicate the incremental data generated during the migration to the platform system database.

Through this specific embodiment, because the acquisition time of the incremental data is set earlier than the data migration time of the stock data, the stock data and the incremental data can be distinguished and processed separately, and the risk of losing in-transit transactions that arrive as incremental data during the stock data migration can be effectively reduced. Meanwhile, because the landing time of the incremental data is set after loading of the stock data is finished, problems such as poor landing efficiency caused by repeated writes from the stock migration and the incremental replication can be effectively avoided. Therefore, efficient and stable data migration from the host system database to the platform system database can be realized, the migration process can be monitored in real time, and high availability of the migrated data can be further ensured.

In addition, to prevent the files downloaded for migration from occupying so much space that the normal end-of-day batch data cannot be downloaded successfully, the download time can be chosen flexibly so as to avoid the normal end-of-day business batch run.

According to embodiments of the present disclosure, the host system database and the platform system database described above have the following characteristics.
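The ordering constraint discussed above (stock batches first, staged incremental data afterwards) can be illustrated with a toy model. This is a sketch under simplifying assumptions: a dict stands in for the platform system database, a `deque` stands in for the message processing platform, every increment is treated as a plain overwrite, and the sketch waits for all batches, which is the simplest case of "at least one batch". The function name is hypothetical.

```python
from collections import deque


def run_migration(stock_rows, staged_increments, batch_size):
    """Load stock rows in batches, then replay the staged increments."""
    platform = {}
    staged = deque(staged_increments)  # stand-in for the message platform
    batches_loaded = 0

    # Phase 1: migrate the stock data batch by batch.
    for start in range(0, len(stock_rows), batch_size):
        for key, value in stock_rows[start:start + batch_size]:
            platform[key] = value
        batches_loaded += 1

    # Phase 2: land the increments only once stock loading has completed,
    # so a newer value is never overwritten by an older stock row.
    if batches_loaded >= 1:
        while staged:
            key, value = staged.popleft()
            platform[key] = value
    return platform
```

For example, with stock rows `[("A", 1), ("B", 2), ("C", 3)]` and staged increments `[("B", 20), ("D", 4)]`, key `B` ends with the newer value 20 because the increment is written last.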

The host system database realizes data storage through an initial coding format, and the platform system database realizes data storage through a target coding format. The initial encoding format may be, for example, EBCDIC code (mainframe code), and the target encoding format may be, for example, GB18130 code (ASCII code, a common data encoding format).

The host system database is a database with a centralized architecture, and the platform system database is a database with a distributed architecture.

It should be noted that the initial encoding format and the target encoding format named above are only specific examples. In practical embodiments, the two encoding formats may be the same, or may be other, different encoding formats. The databases may include a hardware database or a cloud platform database.

Through the specific embodiment, a specific data storage architecture for the data migration method is provided, and data migration between heterogeneous systems can be effectively achieved through the method, so that the problems that the traditional architecture expansion capability and storage performance are gradually limited and the like can be directly solved through a data migration mode, meanwhile, the enterprise IT architecture system can be better promoted to be transformed from a centralized architecture to a distributed architecture, and efficient development of enterprises is further promoted.

The method shown in fig. 2 is further described with reference to fig. 3-9 in conjunction with specific embodiments.

FIG. 3 schematically illustrates a flow diagram for acquiring stock data stored in a host system database and migrating it to a platform system database in batches according to an embodiment of the disclosure.

As shown in fig. 3, the flow includes operations S310 to S350.

In operation S310, part of the stock data having the initial encoding format is acquired, resulting in a file having the initial encoding format.

According to an embodiment of the present disclosure, the character encoding format (i.e., the initial encoding format) of the host system database is EBCDIC. When the amount of stock data is too large, it is migrated in batches; that is, a part of the stock data is first acquired as one batch to be migrated. Specifically, acquiring part of the stock data comprises: on the server of the host system database, using a read instruction to read the table data stored in the host system database by means of SQL statements, and executing a write instruction to write the table data into a newly created file on the file server, thereby obtaining a stock data file in EBCDIC.
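The read-then-write step can be sketched as follows. All specifics here are assumptions for illustration: `sqlite3` stands in for the host system database, the `accounts` table and its columns are hypothetical, `LIMIT`/`OFFSET` is one possible way to select a batch, and Python's `cp037` codec is used as one example of an EBCDIC code page.

```python
import sqlite3


def export_batch(conn, out_path, offset, limit, encoding="cp037"):
    """Read one batch of rows with SQL and write them to a new EBCDIC file."""
    rows = conn.execute(
        "SELECT customer_no, balance FROM accounts "
        "ORDER BY customer_no LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()

    # newline="\n" keeps line endings untranslated before encoding.
    with open(out_path, "w", encoding=encoding, newline="\n") as out:
        for customer_no, balance in rows:
            out.write(f"{customer_no},{balance}\n")
    return len(rows)
```

Calling `export_batch` repeatedly with an increasing `offset` yields one file per batch. A stable `ORDER BY` matters here, because without it consecutive batches could overlap or skip rows.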

In operation S320, the file having the initial encoding format is converted into a file having a target encoding format.

According to an embodiment of the present disclosure, the EBCDIC-encoded stock data file obtained above is downloaded by FTP (File Transfer Protocol) from the file server in the host system to the unified data exchange platform, which serves as a temporary staging area. After the download is completed, the file containing the stock data is split according to factors such as region and account type, so as to obtain a plurality of small EBCDIC-encoded stock data files.

According to an embodiment of the present disclosure, the target encoding format is ASCII. The unified data exchange platform serves as a temporary staging area for data, and its main function is to perform the mapping and decoding of the data format. In this embodiment, each small EBCDIC-encoded stock data file is decoded into a small stock data file encoded in ASCII (GB18030), a format generally suitable for storage in an ordinary database or platform.
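As a minimal sketch of the decoding step, the following Python snippet transcodes a byte string from an EBCDIC code page into the target encoding. The code page `cp037` is an assumption, chosen only because it ships with Python; the disclosure does not name the host's EBCDIC code page. The target codec `gb18030` is likewise an assumption about the encoding the text intends, and the function name is hypothetical. A real host file containing packed-decimal or binary fields would need field-level handling rather than whole-file transcoding.

```python
def ebcdic_to_target(raw: bytes,
                     source_codec: str = "cp037",
                     target_codec: str = "gb18030") -> bytes:
    """Decode EBCDIC bytes to text, then re-encode in the target format.

    Both codec names are assumptions for illustration only.
    """
    text = raw.decode(source_codec)
    return text.encode(target_codec)


if __name__ == "__main__":
    # Simulate a host record by encoding sample text as EBCDIC.
    host_bytes = "CUST0001,SHANGHAI".encode("cp037")
    print(ebcdic_to_target(host_bytes))
```

For plain ASCII characters the GB18030 output is byte-identical to ASCII, which is consistent with the text treating the two together.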

It should be noted that the splitting process described above is optional and may be performed either before or after decoding; generally, the splitting process is performed after decoding.

In operation S330, the file having the target encoding format is split (a second time) according to the key value in the file, so as to obtain subfiles in the target encoding format.

According to an embodiment of the present disclosure, the ASCII-encoded stock data subfiles obtained by splitting and decoding in the unified data exchange platform are transmitted by FTP to the file server of the open platform system. The subfiles are then split a second time with a hash algorithm, according to the rules that apply to each table's data, for example by using the customer number as the primary key (key value), so as to obtain ASCII-encoded stock data BIN files.
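The secondary split by key value can be sketched as follows. The disclosure only says that "a hash algorithm" is used, so CRC32 is an assumption here, chosen because it gives the same result across processes (Python's built-in `hash()` does not for strings). The comma-delimited record layout, the function names, and the bucket count are all hypothetical.

```python
import zlib
from collections import defaultdict


def bucket_for(key: str, bucket_count: int) -> int:
    """Map a key to a bucket using a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % bucket_count


def split_by_key(lines, key_index: int, bucket_count: int, sep: str = ","):
    """Group delimited records into buckets by the hash of their key field.

    Records that share a key always land in the same bucket, and the
    original order of records inside each bucket is preserved.
    """
    buckets = defaultdict(list)
    for line in lines:
        key = line.split(sep)[key_index]
        buckets[bucket_for(key, bucket_count)].append(line)
    return dict(buckets)
```

Each resulting bucket would correspond to one output subfile; writing the buckets to disk is omitted to keep the sketch short.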

In operation S340, the subfiles composed of the target encoding format are stored in the memory of the distributed file storage system.

According to an embodiment of the present disclosure, the distributed file storage system includes different fragments corresponding to the same or different actuators, and each actuator has a set routing rule. The ASCII-encoded stock data BIN files obtained by the secondary splitting are stored in the memory of the distributed file storage system according to the corresponding routing rule.

In operation S350, the subfiles in the target encoding format in the distributed file storage system are imported into the platform system database in batches.

According to an embodiment of the present disclosure, the character encoding format of the open platform system is ASCII, which is consistent with the character encoding format of the platform system database, and the memory storing the ASCII-encoded stock data BIN files is mounted on the corresponding fragment of the actuator in the distributed file storage system. The stock data BIN files are therefore imported in batches, in a LoadData manner, into the platform system database or a server associated with the platform system database.

Fig. 4 schematically illustrates an example graph of an inventory data heterogeneous system migration flow according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, stock data in the host system is moved to the open platform system through the unified data exchange platform. Specifically, in the host system, the stock data of the host system database is copied to a temporary table, the data in the temporary table is saved as a stock data file on the file server, and transmission of the stock data file is then started. In the unified data exchange platform, the conversion from EBCDIC to ASCII is completed through operations such as splitting, decoding, and translating. In the open platform system, the ASCII-encoded stock data subfiles are first split a second time and stored in the corresponding memory of the distributed file storage system, and are then imported in batches into the platform system database or a server associated with the platform system database. A more specific implementation of this embodiment has already been described with reference to fig. 3 and is not repeated here.

Through the above embodiment, the unified data exchange platform adds a mandatory control point for data type mapping and conversion between the two heterogeneous systems. This ensures the consistency of data between the heterogeneous systems and allows data migration between them to be implemented flexibly.

FIG. 5 schematically illustrates a flow diagram for obtaining and writing incremental data to a message processing platform according to an embodiment of the disclosure.

As shown in fig. 5, the flow includes operations S510 to S550.

In operation S510, a message related to the incremental data is acquired.

According to an embodiment of the present disclosure, a message providing incremental data to the host system is captured in a targeted manner by a CAPTURE program.

In operation S520, the message related to the incremental data is sent to the cache region in a single-thread processing manner.

According to an embodiment of the present disclosure, the message information is read by a single thread in a preview reading mode and placed into a cache. Using a single thread ensures that the messages placed into the cache remain in time order, and the preview reading mode helps ensure that the messages can be recovered later.

In operation S530, the message related to the incremental data in the cache region is parsed in a multi-thread parallel processing manner, so as to obtain the incremental data.

According to an embodiment of the present disclosure, the cached data is parsed by multiple threads. The parsing threads are responsible for operations such as disassembling and decoding the message format; because this processing is complex and time-consuming, multiple threads are used to process it concurrently.

In operation S540, incremental data having the same key value are recorded in the same queue according to the key value of the incremental data, so as to obtain queue data.

According to the embodiment of the disclosure, records of the same key value are put into the same local queue for processing, and records of different key values are put into different queues for concurrent processing.

In operation S550, the queue data is written to the message processing platform.

According to an embodiment of the present disclosure, the data threads write to KAFKA concurrently through the queues, and the information on processed messages is registered periodically (for example, every 10 milliseconds).
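Operations S530 and S540 can be sketched in Python as follows. This is only an illustration under stated assumptions: the message layout `key|op|payload` is hypothetical, CRC32 stands in for the unspecified hash, and plain in-memory lists stand in for the local queues and for the message processing platform. Parsing runs on several threads, while `Executor.map` hands the results back in input order, so records with the same key reach their queue in arrival order.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def parse_message(raw: bytes) -> dict:
    """Parse a message in the hypothetical format b'<key>|<op>|<payload>'."""
    key, op, payload = raw.decode("ascii").split("|", 2)
    return {"key": key, "op": op, "payload": payload}


def build_queues(raw_messages, queue_count: int = 4, workers: int = 4):
    """Parse messages in parallel, then route them to per-key queues."""
    queues = [[] for _ in range(queue_count)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order even though the
        # parsing itself runs concurrently on several threads.
        for record in pool.map(parse_message, raw_messages):
            index = zlib.crc32(record["key"].encode("ascii")) % queue_count
            queues[index].append(record)
    return queues
```

Because every key maps to exactly one queue, each queue can then be drained by its own writer without reordering the operations that belong to one key.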

FIG. 6 schematically illustrates a flow diagram for storing incremental data in a message processing platform into a platform system database according to an embodiment of the disclosure.

It should be noted that the migration and storage of the incremental data starts after the stock data has been migrated and stored, at which point the database-loading switch is turned on (i.e., the path from the KAFKA platform to the platform system database is opened). However, the information in KAFKA is unordered. When the application pulls incremental data from KAFKA to load it into the database, it must decide whether the incremental data needs to be stored at all, and whether to overwrite the stock data so as to update the data information or to discard the incremental data so as to keep the latest data information. The incremental data and stock data with the same key value therefore need to be evaluated according to corresponding rules before further processing.

As shown in FIG. 6, the rule enforcement process includes operations S610-S650.

In operation S610, inventory data and incremental data having the same key value are acquired.

In operation S620, in the case where the update time of the incremental data is greater than the maximum time of the stock data, the operation instruction currently directed to the incremental data is acquired.

In operation S630, in the case where the operation instruction is an update instruction, it is determined whether stock data having the same key value as the incremental data exists in the current platform system database; if so, the stock data is updated to the incremental data, and if not, the incremental data is inserted into the platform system database.

In operation S640, in the case where the operation instruction is an insert instruction, the incremental data is directly inserted into the platform system database.

In operation S650, in the case where the operation instruction is a delete instruction, the flag state of the incremental data is set to deleted.

According to an embodiment of the present disclosure, the operation instruction currently directed to the incremental data may include one or more of an update instruction, an insert instruction, and a delete instruction. Operations S630 to S650 may be performed sequentially or simultaneously, or only one or more of them may be performed.

According to an embodiment of the present disclosure, to guard against erroneous deletion and modification, incremental data marked as deleted may be retained for a certain period of time as required.

Based on the above embodiments, it should be noted that when the update time of the record currently to be stored (i.e., the incremental data) is less than the maximum time of the record already stored (i.e., the stock data), the incremental data is invalid and no operation is performed. Such a situation may arise, for example, from the period in which acquisition of the incremental data has already begun but migration of the stock data has not.
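The rules of operations S610 to S650, together with the stale-record case, can be sketched as a single Python function. The in-memory dictionary standing in for the platform system database, the field names, and the choice to compare against the stored row with the same key are assumptions made for illustration. The text does not say how equal timestamps are handled; this sketch treats them as stale.

```python
def apply_increment(table: dict, record: dict) -> bool:
    """Apply one incremental record to `table`; return False if discarded.

    `table` maps key -> {"value", "updated_at", "deleted"}.
    `record` has "key", "op" ("insert" | "update" | "delete"),
    "updated_at", and (for insert/update) "value".
    All field names are hypothetical.
    """
    key = record["key"]
    existing = table.get(key)

    # Stale increment: not newer than what is already stored, so skip it.
    # (Treating equal timestamps as stale is an assumption.)
    if existing is not None and record["updated_at"] <= existing["updated_at"]:
        return False

    if record["op"] == "delete":
        # Logical delete: keep the row but flag it, so it can be recovered.
        row = dict(existing) if existing else {"value": None}
        row.update(updated_at=record["updated_at"], deleted=True)
        table[key] = row
    elif record["op"] in ("insert", "update"):
        # An update falls back to an insert when no row with this key exists.
        table[key] = {
            "value": record["value"],
            "updated_at": record["updated_at"],
            "deleted": False,
        }
    else:
        raise ValueError(f"unknown operation: {record['op']}")
    return True
```

Keeping a flagged row instead of removing it mirrors the statement that data marked as deleted may be retained for a period of time.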

It should be noted that, after the database-loading switch (the path from KAFKA to the platform system database) is turned on, the incremental data is caught up until the timestamp of the incremental data being processed is the latest timestamp. Meanwhile, owing to the double-write mechanism between the host system and the platform system, the incremental data generated between the opening of the switch and the completion of catch-up is also captured. When the next batch of data is migrated, the currently open database-loading switch only needs to be closed temporarily and reopened after the stock data has been migrated.

FIG. 7 schematically illustrates another flow diagram of a method of data migration according to an embodiment of the present disclosure.

As shown in fig. 7, the method includes operations S710 to S720.

In operation S710, inventory data stored in the platform system database is acquired.

In operation S720, the stock data that has been stored in the platform system database is deleted from the host system database.

According to other embodiments of the present disclosure, when incremental data is generated, it is stored in the host system database on the one hand and shared to KAFKA by a resource sharing method on the other. When incremental data is migrated from KAFKA to the platform system database, measures may also be taken to prevent the host system from running short of storage resources, for example: limiting the concurrency of the migration operation on the host system side, cleaning up the migrated host files, or deleting from the host system database the incremental data that has been migrated to the platform system database.

FIG. 8 is a diagram schematically illustrating an example migration flow of an incremental data heterogeneous system according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, host data is captured from the host system database by a CAPTURE program and placed in the MQ (message queue). Specifically, the INSERT/UPDATE/DELETE records are first read from the host system's log, and, while the replication subscription/data publication is active, each individual row operation is organized into transaction units in the host system's memory. A transaction in progress remains in memory until it is interrupted or its log record is committed. Interrupted transactions in memory are discarded, and committed transactions are written to the MQ send queue.

The MQ message information is read by a single thread and written into a RingBuffer (ring buffer). The MSGs (messages) in the RingBuffer are read by multiple threads. Processing is performed per table with a single partition: each table corresponds to one topic, different applications use different consumer groups, and data can be retained for 3 days. Each of the multiple threads reads MSGs from the RingBuffer, extracts the key value from each MSG, performs a HASH calculation on it, and stores the MSG in the queue selected by the key value.
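For readers unfamiliar with the data structure, a ring buffer can be sketched as below. The disclosure does not describe how its RingBuffer is implemented, so this is a generic, single-threaded illustration with hypothetical names; a buffer shared between one writer thread and several reader threads, as in the text, would additionally need locking or a lock-free design.

```python
class RingBuffer:
    """Fixed-capacity FIFO buffer backed by a preallocated list."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._head = 0   # index of the next item to read
        self._size = 0   # number of items currently stored

    def put(self, item) -> bool:
        """Append an item; return False when the buffer is full."""
        if self._size == len(self._slots):
            return False
        tail = (self._head + self._size) % len(self._slots)
        self._slots[tail] = item
        self._size += 1
        return True

    def get(self):
        """Remove and return the oldest item, or None when empty."""
        if self._size == 0:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._size -= 1
        return item
```

The write position wraps around to the start of the list once the end is reached, which is what lets the buffer reuse a fixed block of memory.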

Through the above embodiment, KAFKA determines from the processed-information table that the related information has been processed, and the MQ queue is then cleaned periodically (for example, every 10 seconds) to remove the MQ messages that have been successfully sent to KAFKA, thereby ensuring that the storage resources of KAFKA are sufficient. After the messages are taken from the MQ, messages with the same key value are sent to KAFKA in order, so that downstream consumers can load them into the database directly without considering ordering.

For a more specific implementation manner of the above embodiment, related descriptions are already provided in the embodiments described in fig. 5 to 7, and are not repeated here.

Through the above embodiment, before the stock data of the host system is downloaded, the QR (replication technology) switch from the host system to KAFKA is turned on through the DRP platform for the incremental data; after loading of the stock data into the database is completed, the incremental data replication switch from KAFKA to the open platform system is turned on to load the incremental data into the database. During this period data accumulates in the KAFKA cluster, so the KAFKA capacity and the amount of incrementally replicated data need to be monitored to prevent the loss that would result if KAFKA ran short of storage capacity and deleted the oldest data. When there is a risk of insufficient space, incremental loading can be opened in advance for clusters whose migration is already complete, so that early data in KAFKA is consumed as soon as possible.
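The capacity check described above amounts to simple arithmetic, sketched below under a linear-growth assumption. The function names, the units, and the rule of comparing against the remaining migration time are all hypothetical; the disclosure states only that capacity and the amount of replicated data are monitored. In practice the inputs would come from the cluster's own metrics.

```python
def hours_until_full(capacity_gb: float, used_gb: float,
                     ingest_gb_per_hour: float) -> float:
    """Estimate how long the backlog can grow before storage is exhausted."""
    if ingest_gb_per_hour <= 0:
        return float("inf")
    return max(capacity_gb - used_gb, 0.0) / ingest_gb_per_hour


def should_open_loading_early(capacity_gb: float, used_gb: float,
                              ingest_gb_per_hour: float,
                              remaining_migration_hours: float) -> bool:
    """True when the backlog would outgrow storage before migration ends."""
    headroom = hours_until_full(capacity_gb, used_gb, ingest_gb_per_hour)
    return headroom < remaining_migration_hours
```

For instance, with 1000 GB of capacity, 400 GB used, and 50 GB arriving per hour, the headroom is 12 hours, so a migration with 24 hours remaining would call for opening incremental loading early.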

In addition, the QR-to-KAFKA switches of the host system are turned on in batches according to the time points of host stock data migration, which effectively reduces problems such as poor loading efficiency for stock data and slow incremental catch-up caused by repeated writes during stock migration and incremental replication. In the message parsing process, the cached data is parsed, including message format disassembly and decoding, by multiple threads concurrently, which effectively mitigates the complex and time-consuming processing during migration. Processing efficiency is thereby greatly improved, and the impact on high-frequency online services is reduced.

FIG. 9 schematically shows a flowchart for building a backup database and a server according to an embodiment of the present disclosure.

As shown in fig. 9, the method includes operations S910 to S930.

In operation S910, before migration of the stock data and the incremental data is completed, the stock data and the incremental data that have already been stored in the platform system database are acquired in binary form.

According to an embodiment of the present disclosure, the storage of the open platform system is made fault-tolerant in a one-to-three manner, in which the primary database server (i.e., the server associated with the platform system database) operates on the internally stored data. Before migration and update of the big data (including stock data and incremental data) are completed, the big data needs to be written serially into the platform system database in binary form; after the storage is completed, the primary database server notifies the execution engine to commit the next instruction.

In operation S920, in the case where migration of the stock data and the incremental data is completed, the stock data and the incremental data in binary form are read by a thread.

In operation S930, the binary form of the stock data and the incremental data are copied to the backup database for backup.

According to an embodiment of the present disclosure, after the next instruction is committed, the standby database server copies the binary big data of the primary database server into its own directory as a backup. The data in the standby database server is then read by an SQL thread and replayed, i.e., the data in the primary database server is converted and synchronized to the standby database server.
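The copy-then-replay sequence can be illustrated with a small in-memory model. The class and attribute names are hypothetical, the "log" is a plain Python list rather than a binary file, and the acknowledgement that a semi-synchronous scheme would send back to the primary is omitted; the sketch only shows why replaying a copied change log in order reproduces the primary's state.

```python
class Primary:
    """Stand-in for the primary database server."""

    def __init__(self):
        self.data = {}
        self.binlog = []          # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))


class Standby:
    """Stand-in for a standby database server."""

    def __init__(self):
        self.data = {}
        self.relay_log = []       # local copy of the primary's log
        self.applied = 0          # number of relay-log entries replayed

    def copy_log(self, primary: Primary):
        """Copy the log entries not yet received from the primary."""
        self.relay_log.extend(primary.binlog[len(self.relay_log):])

    def replay(self):
        """Apply copied entries in order to rebuild the primary's state."""
        for key, value in self.relay_log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.relay_log)
```

Copying and replaying are separate steps here, matching the text's distinction between backing up the binary data and having a thread read and replay it.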

It should be noted that, in implementing semi-synchronization of the primary and standby databases, the amount of transmitted data is large, so restoring the primary and standby databases may occupy a large amount of network bandwidth and affect other applications. It may therefore be considered to restore the primary and standby databases in batches according to an evaluation of the amount of transmitted data.

FIG. 10 schematically illustrates a master-slave database semi-synchronization flow diagram according to the present disclosure.

According to an embodiment of the present disclosure, the online core business of some enterprises (such as banks) requires uninterrupted 24-hour operation. The stock data volume is therefore large and the demand for database read and write operations is high; as the number of users grows, waiting times lengthen and server load gives rise to concurrency problems. With the semi-synchronization mechanism between the primary and standby databases of the platform system, when the primary database becomes abnormal, service can be switched automatically to the data of the standby database, so that business risk is avoided within the shortest possible fault time and service continues to be provided externally. A more specific implementation of this embodiment has already been described with reference to fig. 9 and is not repeated here.

FIG. 11 schematically shows a block diagram of a data migration system according to an embodiment of the present disclosure.

As shown in FIG. 11, the data migration system 1100 includes a determining module 1110, a first obtaining module 1120, a second obtaining module 1130, and a storing module 1140.

The determining module 1110 is configured to determine a data migration time of the stock data and an acquisition time of the incremental data, where the acquisition time is earlier than the data migration time.

The first obtaining module 1120 is configured to, when the current time reaches the obtaining time of the incremental data, obtain the incremental data, and write the incremental data into the message processing platform.

The second obtaining module 1130 is configured to, when the current time reaches the data migration time of the stock data, obtain the stock data stored in the host system database, and migrate the stock data to the platform system database in batches.

The storing module 1140 is configured to store the incremental data in the message processing platform into the platform system database after at least one batch of the stock data has been migrated to the platform system database.

According to an embodiment of the present disclosure, the data migration system further includes a host system database sub-module and a platform system database sub-module.

The host system database submodule stores data in the initial encoding format, and/or the host system database submodule is a database with a centralized architecture.

The platform system database submodule stores data in the target encoding format, and/or the platform system database submodule is a database with a distributed architecture.

According to an embodiment of the present disclosure, the data migration system further includes a first obtaining unit, a sending unit, an analyzing unit, a recording unit, and a writing unit.

The first obtaining unit is configured to obtain the message related to the incremental data.

The sending unit is configured to send the message related to the incremental data to the cache region in a single-thread processing manner.

The analyzing unit is configured to parse the message related to the incremental data in the cache region in a multi-thread parallel processing manner to obtain the incremental data.

The recording unit is configured to record incremental data with the same key value into the same queue according to the key value of the incremental data to obtain queue data.

The writing unit is configured to write the queue data into the message processing platform.

According to an embodiment of the present disclosure, the data migration system further includes a second obtaining unit, a splitting unit, a converting unit, a secondary splitting unit, a storage unit, and an importing unit.

The second obtaining unit is configured to obtain part of the stock data having the initial encoding format to obtain a file having the initial encoding format.

The converting unit is configured to convert the file having the initial encoding format into a file having the target encoding format.

The splitting unit is configured to split the file having the target encoding format according to the key value in the file to obtain subfiles in the target encoding format.

The storage unit is configured to store the subfiles in the target encoding format into the memory of the distributed file storage system.

The importing unit is configured to import the subfiles in the target encoding format in the distributed file storage system into the platform system database in batches.

According to an embodiment of the present disclosure, the data migration system further includes a third obtaining unit, a fourth obtaining unit, a first implementing unit, a second implementing unit, and a third implementing unit.

The third obtaining unit is configured to obtain stock data and incremental data with the same key value.

The fourth obtaining unit is configured to obtain the operation instruction currently directed to the incremental data in the case where the update time of the incremental data is greater than the maximum time of the stock data.

The first implementing unit is configured to, in the case where the operation instruction is an update instruction, determine whether stock data with the same key value as the incremental data exists in the current platform system database; if so, update the stock data to the incremental data, and if not, insert the incremental data into the platform system database.

The second implementing unit is configured to directly insert the incremental data into the platform system database in the case where the operation instruction is an insert instruction.

The third implementing unit is configured to set the flag state of the incremental data to deleted in the case where the operation instruction is a delete instruction.

According to an embodiment of the present disclosure, the data migration system further includes a fifth obtaining unit and a deleting unit.

The fifth obtaining unit is configured to obtain the stock data stored in the platform system database.

The deleting unit is configured to delete, from the host system database, the stock data that has been stored in the platform system database.

According to an embodiment of the present disclosure, the data migration system further includes a sixth obtaining unit, a reading unit, and a copying unit.

The sixth obtaining unit is configured to obtain, before migration of the stock data and the incremental data is completed, the stock data and the incremental data in binary form that have already been stored in the platform system database.

The reading unit is configured to read the stock data and the incremental data in binary form by a thread in the case where migration of the stock data and the incremental data is completed.

The copying unit is configured to copy the stock data and the incremental data in binary form to a backup database for backup.

Any number of modules, sub-modules, units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units according to the embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging the circuit, or in any one of three implementations of software, hardware, and firmware, or in any suitable combination of any of them. Alternatively, one or more of the modules, sub-modules, units according to embodiments of the disclosure may be implemented at least partly as computer program modules, which, when executed, may perform corresponding functions.

For example, any number of the determining module 1110, the first obtaining module 1120, the second obtaining module 1130, and the storing module 1140 may be combined and implemented in one module/sub-module/unit, or any one of the modules/sub-modules/units may be divided into a plurality of modules/sub-modules/units. Alternatively, at least part of the functionality of one or more of these modules/sub-modules/units may be combined with at least part of the functionality of other modules/sub-modules/units and implemented in one module/sub-module/unit. According to an embodiment of the present disclosure, at least one of the determining module 1110, the first obtaining module 1120, the second obtaining module 1130, and the storing module 1140 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or may be implemented in any one of three implementations of software, hardware, and firmware, or in a suitable combination of any of them. Alternatively, at least one of the determining module 1110, the first obtaining module 1120, the second obtaining module 1130, and the storing module 1140 may be at least partially implemented as a computer program module that, when executed, may perform a corresponding function.

It should be noted that the data migration system part of the embodiments of the present disclosure corresponds to the data migration method part; for details of the data migration system, reference may be made to the description of the data migration method, which is not repeated here.

FIG. 12 schematically illustrates a block diagram of a computer system suitable for implementing the data migration method described above, in accordance with an embodiment of the present disclosure. The computer system illustrated in FIG. 12 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the disclosure.

As shown in fig. 12, a computer system 1200 according to an embodiment of the present disclosure includes a processor 1201, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. The processor 1201 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1201 may also include on-board memory for caching purposes. The processor 1201 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present disclosure.

In the RAM 1203, various programs and data necessary for the operation of the system 1200 are stored. The processor 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. The processor 1201 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1202 and/or the RAM 1203. Note that the programs may also be stored in one or more memories other than the ROM 1202 and the RAM 1203. The processor 1201 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the system 1200 may also include an input/output (I/O) interface 1205, which is also connected to the bus 1204. The system 1200 may also include one or more of the following components connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1210 as needed, so that a computer program read from it can be installed into the storage section 1208 as needed.

According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211. The computer program, when executed by the processor 1201, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.

According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 1202 and/or the RAM 1203 and/or one or more memories other than the ROM 1202 and the RAM 1203 described above.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or associated in various ways, even if such combinations or associations are not expressly recited in the present disclosure. In particular, such combinations and/or associations may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.

The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be advantageously combined. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims

1. A method of data migration, comprising:

determining data migration time of stock data and acquisition time of incremental data, wherein the acquisition time is earlier than the data migration time;

under the condition that the current time reaches the acquisition time of the incremental data, acquiring the incremental data and writing the incremental data into a message processing platform;

under the condition that the current time reaches the data migration time of the stock data, acquiring the stock data stored in a host system database, and migrating the stock data to a platform system database in batches; and

after at least one batch of the stock data is migrated to the platform system database, storing the incremental data in the message processing platform into the platform system database.
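The ordering of the four steps of claim 1 can be illustrated with a small simulation. This is only a sketch under stated assumptions: time is modeled as integer ticks, the message processing platform as an in-memory queue, and both databases as plain Python structures. The name `run_migration` and its parameters are hypothetical and do not come from the disclosure. The rule that a stock row never overwrites an already applied increment is also an assumption made to keep the sketch consistent.

```python
from collections import deque


def run_migration(stock_rows, changes_by_tick, acquisition_time,
                  migration_time, batch_size=2):
    """Simulate the claimed ordering with integer clock ticks (hypothetical model).

    stock_rows: list of (key, value) pairs held in the host system database.
    changes_by_tick: dict mapping a tick to the (key, value) changes arriving then.
    """
    if acquisition_time >= migration_time:
        raise ValueError("acquisition time must be earlier than migration time")

    message_platform = deque()  # stands in for the message processing platform
    platform_db = {}            # stands in for the platform system database
    pending = list(stock_rows)
    n_batches = -(-len(pending) // batch_size)  # ceiling division
    last_tick = max(changes_by_tick, default=0)
    horizon = max(last_tick, migration_time + n_batches - 1) + 1
    batches_done = 0

    for tick in range(horizon):
        # From the acquisition time on, capture incremental data.
        if tick >= acquisition_time:
            message_platform.extend(changes_by_tick.get(tick, []))
        # From the migration time on, move stock data one batch per tick.
        if tick >= migration_time and pending:
            batch, pending = pending[:batch_size], pending[batch_size:]
            for key, value in batch:
                # Assumption: an increment that was already applied wins
                # over the older stock row.
                platform_db.setdefault(key, value)
            batches_done += 1
        # Once at least one batch has landed, drain the buffered increments.
        if batches_done >= 1:
            while message_platform:
                key, value = message_platform.popleft()
                platform_db[key] = value
    return platform_db
```

In this model, a change that arrives before the acquisition time is not captured, since it is assumed to be part of the stock data already.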

2. The method of claim 1, wherein:

the host system database realizes data storage through an initial coding format, and the platform system database realizes data storage through a target coding format; and/or

3. The method of claim 2, wherein acquiring the incremental data and writing the incremental data into a message processing platform under the condition that the current time reaches the acquisition time of the incremental data comprises:

acquiring a message related to the incremental data;

sending the message related to the incremental data to a cache region in a single thread processing mode;

analyzing the message related to the incremental data in the cache region in a multithreading parallel processing mode to obtain the incremental data;

recording the incremental data with the same key value into the same queue according to the key value of the incremental data to obtain queue data; and

writing the queue data into the message processing platform.
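The steps of claim 3 can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the message layout (a JSON object with `key` and `value` fields), the function names, and the use of an in-memory `queue.Queue` as the cache region are all assumptions. Writing the resulting queue data into the message processing platform is left to the caller.

```python
import json
import queue
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def parse_message(raw):
    """Parse one message; the JSON layout with 'key' and 'value' is hypothetical."""
    record = json.loads(raw)
    return record["key"], record["value"]


def collect_incremental(raw_messages, workers=4):
    # Single-threaded step: one loop sends the messages to the cache region
    # in arrival order.
    cache = queue.Queue()
    for raw in raw_messages:
        cache.put(raw)

    # Multi-threaded step: parse the cached messages in parallel.
    # Executor.map yields results in input order, so arrival order is kept.
    cached = []
    while not cache.empty():
        cached.append(cache.get())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parsed = list(pool.map(parse_message, cached))

    # Records with the same key value go into the same queue.
    queues = defaultdict(list)
    for key, value in parsed:
        queues[key].append(value)
    return dict(queues)
```

Keeping all records of one key in a single queue preserves their relative order, which matters when several changes to the same record are replayed later.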

4. The method of claim 2, wherein acquiring the stock data stored in a host system database and migrating the stock data to a platform system database in batches under the condition that the current time reaches the data migration time of the stock data comprises:

acquiring part of stock data with an initial coding format to obtain a file with the initial coding format;

converting the file with the initial coding format into a file with a target coding format;

splitting the file with the target coding format according to the key values in the file with the target coding format to obtain subfiles formed by the target coding format;

storing the subfiles formed by the target coding format into a memory of a distributed file storage system; and

importing the subfiles composed of the target coding format in the distributed file storage system into the platform system database in batches.
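The conversion, splitting, and batch import steps of claim 4 can be sketched as below. Several assumptions are made for illustration only: the initial coding format is taken to be EBCDIC (Python codec `cp037`) and the target coding format UTF-8, each record is a `key,value` line, subfiles are chosen by a CRC32 hash of the key, and dictionaries stand in for the distributed file storage system and the platform system database. None of these specifics are stated in the claim.

```python
import zlib
from collections import defaultdict


def convert_and_split(raw_bytes, source_encoding="cp037",
                      target_encoding="utf-8", parts=2):
    """Convert a host file to the target encoding and split it by key value."""
    text = raw_bytes.decode(source_encoding)
    buckets = defaultdict(list)
    for line in text.splitlines():
        if not line:
            continue
        key = line.split(",", 1)[0]  # hypothetical 'key,value' record layout
        # The same key always hashes to the same subfile.
        bucket = zlib.crc32(key.encode(target_encoding)) % parts
        buckets[bucket].append(line)
    # The returned dict stands in for subfiles stored in a distributed file system.
    return {name: ("\n".join(lines) + "\n").encode(target_encoding)
            for name, lines in buckets.items()}


def import_subfiles(subfiles, platform_db, target_encoding="utf-8"):
    """Import the subfiles one by one, treating each subfile as one batch."""
    for name in sorted(subfiles):
        for line in subfiles[name].decode(target_encoding).splitlines():
            key, value = line.split(",", 1)
            platform_db.setdefault(key, []).append(value)
```

Splitting by key keeps every record of a given key inside one subfile, so each batch can be imported independently of the others.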

5. The method of claim 1, wherein storing the incremental data in the message processing platform into the platform system database after at least one batch of the stock data is migrated to the platform system database comprises:

acquiring the stock data and the incremental data with the same key value;

under the condition that the updating time of the incremental data is greater than the maximum time of the stock data, acquiring a current operation instruction for the incremental data;

under the condition that the operation instruction is an updating instruction, determining whether stock data with the same key value as the incremental data exists in the current platform system database, and if so, updating the stock data to the incremental data, and if not, inserting the incremental data into the platform system database;

under the condition that the operation instruction is an insertion instruction, directly inserting the incremental data into the platform system database; and

under the condition that the operation instruction is a deletion instruction, setting the flag state of the incremental data to deleted.
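The branching in claim 5 can be sketched as a single function. This is an illustration under assumptions, not the disclosed implementation: the record layout (`key`, `op`, `updated_at`, `value`), the `deleted` flag field, and the function name `apply_increment` are hypothetical. The claim does not say what happens when the updating time is not greater than the maximum time of the stock data, or when a deletion targets a key that is absent; the sketch skips the former and stores a flagged placeholder for the latter, both as assumptions.

```python
def apply_increment(platform_db, increment, stock_max_time):
    """Apply one incremental record to the platform database (dict keyed by key value)."""
    # Assumption: an increment that is not newer than the stock data is skipped.
    if increment["updated_at"] <= stock_max_time:
        return "skipped"

    key, op = increment["key"], increment["op"]
    if op == "update":
        # Update the row if the key exists, otherwise insert it.
        action = "updated" if key in platform_db else "inserted"
        platform_db[key] = {"value": increment.get("value"), "deleted": False}
        return action
    if op == "insert":
        platform_db[key] = {"value": increment.get("value"), "deleted": False}
        return "inserted"
    if op == "delete":
        # Logical delete: only the flag state changes.
        # Assumption: a missing key gets a flagged placeholder row.
        row = platform_db.setdefault(key, {"value": None, "deleted": False})
        row["deleted"] = True
        return "marked_deleted"
    raise ValueError(f"unknown operation instruction: {op!r}")
```

Marking a record as deleted instead of removing it keeps the operation repeatable if the same message is replayed.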

6. The method of claim 1, further comprising:

acquiring the stock data stored in the platform system database; and

deleting, from the host system database, the stock data that has been stored in the platform system database.

7. The method of claim 1, further comprising:

before the stock data and the incremental data are migrated, acquiring the stock data and the incremental data in binary form which are stored in the platform system database;

under the condition that the stock data and the incremental data are migrated, reading the stock data and the incremental data in binary form by means of a thread; and

copying the stock data and the incremental data in binary form to a backup database for backup.

8. A data migration system, comprising:

a first determining module, configured to determine data migration time of stock data and acquisition time of incremental data, wherein the acquisition time is earlier than the data migration time;

a first acquisition module, configured to acquire the incremental data and write the incremental data into a message processing platform under the condition that the current time reaches the acquisition time of the incremental data;

a second acquisition module, configured to acquire the stock data stored in a host system database and migrate the stock data to a platform system database in batches under the condition that the current time reaches the data migration time of the stock data; and

a storage module, configured to store the incremental data in the message processing platform into the platform system database after at least one batch of the stock data is migrated to the platform system database.

9. A computer system, comprising:

one or more processors;

a memory for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.

10. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 7.