CN106909595B

CN106909595B - Data migration method and device

Info

Publication number: CN106909595B
Application number: CN201610445610.1A
Authority: CN
Inventors: 赵振林
Original assignee: Advanced New Technologies Co Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2016-06-20
Filing date: 2016-06-20
Publication date: 2020-12-29
Anticipated expiration: 2036-06-20
Also published as: CN106909595A

Abstract

The present application relates to the field of database technologies, and in particular to a data migration method and device that solve the problem, in the prior art, that storing and accessing binary files degrades the data transmission speed and stability of a database. An embodiment of the present application provides a data migration method comprising the following steps: determining a first file to be migrated based on a data record to be migrated in a first database; storing the first file to be migrated in a designated file system; and storing identification information of the first file to be migrated in a second database, where the identification information identifies the storage location of the first file to be migrated in the file system.

Description

Data migration method and device

Technical Field

The present application relates to the field of database technologies, and in particular, to a data migration method and apparatus.

Background

Computer files fall broadly into two categories: binary files and ASCII files (also called plain-text files). Graphic files, word-processing documents, and computer programs are all binary files; they contain special formats and machine code. An ASCII file is a simple text file that can be read by any word-processing program.

In some cases, for example when a service is split, or when the current database can no longer meet the storage requirements of binary files (which typically occupy a large amount of storage space), data in the current database (hereinafter the old database) needs to be migrated to another database (hereinafter the new database), and online services are switched from the old database to the new database once the data migration is complete. During the migration, the old database continues to provide data services to the online services.

New data continues to arrive during the migration and must be stored. To avoid disrupting online services, a synchronous double-write mechanism is generally used for new data: new data is first written into the pending queue of the old database, the data in the pending queue is then written synchronously into both the old database and the new database, and the data already in the old database before double-writing started is migrated to the new database separately.
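The double-write flow above can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not a production design: the class `DoubleWriteMigrator` and its methods are hypothetical, both databases are modeled as plain dictionaries, and the pending queue is a `collections.deque`.

```python
from collections import deque


class DoubleWriteMigrator:
    """Hypothetical in-memory model of synchronous double-writing."""

    def __init__(self, old_db: dict) -> None:
        self.old_db = old_db            # old database: key -> record
        self.new_db: dict = {}          # new database, empty when migration starts
        self.pending: deque = deque()   # pending queue of the old database
        # Keys already in the old database when double-writing starts.
        self.backlog = list(old_db)

    def submit(self, key, record) -> None:
        # New data is always written into the old database's pending queue first.
        self.pending.append((key, record))

    def drain_pending(self) -> None:
        # Each queued record is written synchronously into both databases.
        while self.pending:
            key, record = self.pending.popleft()
            self.old_db[key] = record
            self.new_db[key] = record

    def migrate_backlog(self) -> None:
        # Data that predates double-writing is copied separately;
        # setdefault leaves any value already double-written untouched.
        for key in self.backlog:
            self.new_db.setdefault(key, self.old_db[key])
        self.backlog.clear()


m = DoubleWriteMigrator({"a": 1})
m.submit("b", 2)
m.drain_pending()
m.migrate_backlog()
print(m.new_db)  # {'b': 2, 'a': 1}
```

In a real deployment the two writes in `drain_pending` would need a transaction or a retry/compensation step so that a failure on one side does not leave the databases inconsistent; the source does not say how this is handled.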

At present, storing and accessing binary files has become a bottleneck for the data transmission speed and stability of databases. If binary files are still migrated with the existing method, both the efficiency of the data migration and data access performance after the migration suffer seriously.

Disclosure of Invention

The embodiments of the present application provide a data migration method and device to solve the problem, in the prior art, that storing and accessing binary files degrades the data transmission speed and stability of a database.

An embodiment of the present application provides a data migration method, including:

determining a first file to be migrated based on a data record to be migrated in a first database;

storing the first file to be migrated in a designated file system, and storing the identification information of the first file to be migrated in a second database;

the identification information of the first file to be migrated is information for identifying a storage location of the first file to be migrated in the file system.
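The core of the method (store the file in a file system, keep only a locator in the second database) can be sketched as follows. Everything here is an illustrative assumption: `InMemoryFileSystem` stands in for the designated file system, the locator format `/files/<sha256>` is invented for the example, and the `<field>_locator` column naming is hypothetical.

```python
import hashlib


class InMemoryFileSystem:
    """Hypothetical stand-in for the designated (e.g. distributed) file system."""

    def __init__(self) -> None:
        self.blobs: dict = {}

    def put(self, data: bytes) -> str:
        # Assumed locator scheme: a path derived from the content hash.
        locator = "/files/" + hashlib.sha256(data).hexdigest()
        self.blobs[locator] = data
        return locator

    def get(self, locator: str) -> bytes:
        return self.blobs[locator]


def migrate_record(record_id, record: dict, file_field: str,
                   fs: InMemoryFileSystem, second_db: dict) -> str:
    """Move one file out of a record: bytes go to fs, the locator goes to second_db."""
    locator = fs.put(record[file_field])
    row = {k: v for k, v in record.items() if k != file_field}
    row[file_field + "_locator"] = locator  # identification information
    second_db[record_id] = row
    return locator


fs = InMemoryFileSystem()
db: dict = {}
loc = migrate_record(7, {"name": "r", "pdf": b"%PDF-1.4"}, "pdf", fs, db)
```

Content hashing is only one possible choice; a distributed file system would normally assign its own file ID or path, and that is what the second database would then store.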

Optionally, the file system is a distributed file system.

Optionally, after the first file to be migrated is stored in the designated file system and its identification information is stored in the second database, the method further includes:

when an access request of a client for the first file to be migrated is received, acquiring the first file to be migrated from the file system based on its identification information in the second database, and returning the acquired file to the client.
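A read path for such an access request might look like the sketch below. It assumes, hypothetically, that each second-database row keeps the locator under a `<field>_locator` key and that the file system can be read as a mapping from locator to bytes; neither detail comes from the source.

```python
def handle_access_request(record_id, file_field: str,
                          second_db: dict, fs_blobs: dict) -> bytes:
    """Return the migrated file for record_id, looked up via the second database."""
    row = second_db.get(record_id)
    if row is None:
        raise KeyError(f"record {record_id!r} has not been migrated")
    locator = row[file_field + "_locator"]  # identification information
    return fs_blobs[locator]                # bytes sent back to the client


second_db = {1: {"pdf_locator": "/files/x"}}
fs_blobs = {"/files/x": b"file bytes"}
print(handle_access_request(1, "pdf", second_db, fs_blobs))  # b'file bytes'
```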

Optionally, determining the first file to be migrated based on the data record to be migrated in the first database includes:

extracting a binary file from the data record to be migrated;

determining the extracted binary file, a text file converted from the binary file, or both, as the first file to be migrated.
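Extracting the binary part of a record and pairing it with a text rendition could look like this. The source names no conversion routine at this point, so `toy_text_converter` is a deliberately crude placeholder (it keeps runs of printable ASCII, similar to the Unix `strings` tool) and is not a real PDF or document text extractor; `split_record` is likewise a hypothetical name. Treating fields of type `bytes` as binary files is itself an assumption about how records are represented.

```python
import re
from typing import Callable, Dict, Tuple


def toy_text_converter(data: bytes) -> str:
    # Placeholder converter: keep runs of 4+ printable ASCII bytes.
    runs = re.findall(rb"[ -~]{4,}", data)
    return " ".join(run.decode("ascii") for run in runs)


def split_record(record: dict,
                 to_text: Callable[[bytes], str] = toy_text_converter
                 ) -> Tuple[Dict[str, Tuple[bytes, str]], dict]:
    """Split a record into first files (binary + text) and non-binary fields."""
    first_files: Dict[str, Tuple[bytes, str]] = {}
    non_binary: dict = {}
    for name, value in record.items():
        if isinstance(value, (bytes, bytearray)):
            data = bytes(value)
            first_files[name] = (data, to_text(data))
        else:
            non_binary[name] = value
    return first_files, non_binary


record = {"title": "Q3 report", "pdf": b"\x00\x01Hello PDF\xff\x02ab"}
first, rest = split_record(record)
print(first["pdf"][1])  # Hello PDF
print(rest)             # {'title': 'Q3 report'}
```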

Optionally, if the first file to be migrated includes the binary file, after the binary file is stored in the designated file system and its identification information is stored in the second database, the method further includes:

when a download request of a client for the binary file is received, acquiring the binary file from the file system based on the identification information of the binary file in the second database, and caching the acquired binary file in a content delivery network (CDN);

returning the access address of the binary file in the CDN to the client.
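The download path with CDN caching is sketched below. `FakeCdn`, its methods, the `https://cdn.example.com` base URL, and the `binary_locator` column are all hypothetical; real CDNs are usually populated by origin pull or a vendor-specific API, which this in-memory dictionary does not model. Skipping the file-system read on a cache hit is an optimization added here, not something the source states.

```python
class FakeCdn:
    """Hypothetical in-memory stand-in for a CDN cache."""

    def __init__(self, base_url: str = "https://cdn.example.com") -> None:
        self.base_url = base_url
        self.cache: dict = {}

    def address_of(self, key: str) -> str:
        return f"{self.base_url}/{key}"

    def cache_file(self, key: str, data: bytes) -> str:
        self.cache[key] = data
        return self.address_of(key)


def handle_download(record_id, second_db: dict, fs_blobs: dict, cdn: FakeCdn) -> str:
    """Return a CDN access address for the binary file of record_id."""
    locator = second_db[record_id]["binary_locator"]
    key = locator.rsplit("/", 1)[-1]
    if key in cdn.cache:  # already cached: no file-system read needed
        return cdn.address_of(key)
    return cdn.cache_file(key, fs_blobs[locator])


cdn = FakeCdn()
db = {5: {"binary_locator": "/files/abc"}}
blobs = {"/files/abc": b"data"}
print(handle_download(5, db, blobs, cdn))  # https://cdn.example.com/abc
```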

Optionally, if the data record to be migrated is a new data record located in the pending queue of the first database, the method further includes:

storing the text file converted from the binary file in the first database.

Optionally, the method further comprises:

extracting a second file to be migrated from the data record to be migrated, wherein the second file to be migrated is a non-binary file;

if the data record to be migrated is a new data record in the pending queue of the first database, storing the second file to be migrated in both the first database and the second database;

if the data record to be migrated is a historical data record in the stored list of the first database, storing the second file to be migrated in the second database.
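The routing rule for non-binary files reduces to one branch on whether the record is new or historical. In this hedged sketch the databases are dictionaries keyed by record ID and `store_non_binary` is a hypothetical helper.

```python
def store_non_binary(record_id, fields: dict, is_new: bool,
                     first_db: dict, second_db: dict) -> None:
    """Write non-binary fields according to whether the record is new or historical."""
    # Every migrated record ends up in the second database.
    second_db.setdefault(record_id, {}).update(fields)
    if is_new:
        # New records are double-written: the first database still serves traffic.
        first_db.setdefault(record_id, {}).update(fields)


first_db = {2: {"title": "old"}}  # record 2 is historical, already stored
second_db: dict = {}
store_non_binary(1, {"title": "new"}, True, first_db, second_db)
store_non_binary(2, {"title": "old"}, False, first_db, second_db)
print(second_db)  # {1: {'title': 'new'}, 2: {'title': 'old'}}
```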

Optionally, the method further comprises:

writing the identification information of each historical data record in the stored list of the first database into a preset data table, where the identification information of each historical data record identifies the storage location of that record in the first database;

based on the identification information of each historical data record in the preset data table, extracting each unprocessed historical data record from the first database as the data record to be migrated, and recording the processing state of the historical data record in the preset data table.

An embodiment of the present application provides a data migration apparatus, including:

the determining module is used for determining a first file to be migrated based on the data record to be migrated in the first database;

the storage module is used for storing the first file to be migrated in a designated file system and storing the identification information of the first file to be migrated in a second database;

When a database migration is needed, the first file to be migrated is determined from the data records to be migrated in the first database, the first file to be migrated is stored in the designated file system, and the identification information of the first file to be migrated is stored in the second database. Because only the identification information, rather than the file itself, is stored in the second database, the efficiency of migrating data into the second database can be improved. In addition, the file system may be a distributed file system, which provides highly reliable, highly concurrent storage access and can therefore improve data access performance after the migration. As a result, the storage and access capacity for the first file to be migrated, such as a binary file, is improved.

In addition, in a preferred embodiment of the present application, format conversion is performed when the binary file is stored; that is, the text content of the binary file is extracted and stored. After the conversion, when a page access request of a client for the binary file is received, the page can be rendered quickly from the text content of the binary file.

Drawings

FIG. 1 is a schematic diagram illustrating data migration according to an embodiment of the present application;

FIG. 2 is a flowchart of a data migration method according to an embodiment of the present application;

FIG. 3 is a flowchart of a data migration method according to a second embodiment of the present application;

FIG. 4 is a flowchart of a data migration method according to a third embodiment of the present application;

FIG. 5 is a schematic structural diagram of a data migration apparatus according to an embodiment of the present application.

Detailed Description

FIG. 1 is a schematic diagram illustrating data migration according to an embodiment of the present application. In the embodiments of the present application, new data received after the decision to migrate is first written into the pending queue of the old database; its binary files and non-binary files are then processed differently and synchronously double-written (that is, written into both the new database and the old database). When a binary file is written into the new database, the first file to be migrated (the binary file and/or a text file converted from it) is stored in the distributed file system, and identification information identifying the storage location of the first file to be migrated is stored in the new database. The historical data already stored in the old database is likewise divided into binary files and non-binary files, which are processed differently and then stored in the new database.

The embodiments of the present application will be described in further detail with reference to the drawings attached hereto.

Example one

FIG. 2 is a flowchart of the data migration method provided in Example one of the present application. The method includes the following steps:

S201: Determine a first file to be migrated based on the data record to be migrated in the first database.

Here, the first database may contain multiple data records to be migrated; a first file to be migrated is determined for each of them.

In the embodiments of the present application, binary files occupy a large amount of storage space, whereas the non-binary files (that is, text files) in a data record to be migrated occupy little; step S202 may therefore be applied only to the binary files.

Specifically, a binary file may be extracted from the data record to be migrated, and the extracted binary file, a text file converted from it, or both may be determined as the first file to be migrated.

S202: Store the first file to be migrated in a designated file system, and store the identification information of the first file to be migrated in a second database; the identification information identifies the storage location of the first file to be migrated in the file system.

The binary files in the embodiments of the present application include graphic files, word-processing documents, and the like; they are non-text files. Because binary files occupy a large amount of storage space, a binary file can be extracted from the data record to be migrated and stored in the file system, while only its identification information is stored in the database.

Preferably, the file system is a distributed file system. A distributed file system is a file system built on a network of multiple nodes. It offers high performance, high reliability, and strong scalability: adding servers lets it support highly concurrent access, large storage capacity, and high throughput, and it effectively avoids single points of failure.

S203: When an access request of a client for the first file to be migrated is received, acquire the first file to be migrated from the file system based on its identification information in the second database, and return the acquired file to the client.

In specific implementation, if the client needs to access the first file to be migrated, the corresponding first file to be migrated is acquired from the distributed file system based on the identification information of the first file to be migrated, and is returned to the client.

Example two

The method is further described below using an example in which the first file to be migrated is a binary file and the second file to be migrated is a non-binary file.

FIG. 3 is a flowchart of the data migration method provided in Example two of the present application. The method includes the following steps:

S301: Distinguish the binary file and the non-binary file in the data record to be migrated in the first database; perform S302 for the binary file and S303 for the non-binary file.

Here, the data record to be migrated may be new data (newly received and not yet stored) written into the pending queue of the first database (the old database) after it is determined that migration is necessary, or historical data already stored in the first database.

For historical data, the identification information (identity, ID) of each historical data record in the stored list of the first database may be written into a preset data table. The preset data table is a temporary table created for the migration and may be deleted after the historical data has been migrated. The identification information of each historical data record identifies that record's storage location in the first database; it may be a storage address, or an identifier recorded in the first database for that record. Based on the identification information in the preset data table, each unprocessed historical data record is extracted from the first database as a data record to be migrated, and its processing state is recorded in the preset data table. In practice, a timed task may be set up that, based on the identification information in the preset data table, periodically takes out the historical data records in the first database that have not yet been written into the second database and processes them (that is, writes them into the second database). Because the preset data table records the processing state, historical data records are not processed repeatedly, so the processing is idempotent.
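The preset-table bookkeeping that makes the historical migration resumable and idempotent can be sketched as follows. The state names, `build_preset_table`, `run_batch`, and the batch size are hypothetical; the timed task is represented by repeated calls to `run_batch` rather than by a real scheduler.

```python
PENDING, DONE = "pending", "done"


def build_preset_table(first_db: dict) -> dict:
    # Temporary table: one row per historical record ID plus its processing state.
    return {record_id: PENDING for record_id in first_db}


def run_batch(preset_table: dict, first_db: dict, second_db: dict,
              batch_size: int = 100) -> int:
    """One tick of the timed task: migrate up to batch_size unprocessed records."""
    todo = [rid for rid, state in preset_table.items() if state == PENDING]
    for rid in todo[:batch_size]:
        second_db[rid] = dict(first_db[rid])  # write into the second database
        preset_table[rid] = DONE              # record the processing state
    return min(len(todo), batch_size)


first_db = {i: {"v": i} for i in range(5)}
table = build_preset_table(first_db)
second_db: dict = {}
print([run_batch(table, first_db, second_db, 2) for _ in range(4)])  # [2, 2, 1, 0]
```

In a real system the write to the second database and the state update should be made atomic, or the write made an upsert, so that a crash between the two steps causes at most a harmless re-write rather than a lost record.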

For each new data record or history data record, the binary file (such as a PDF-formatted binary file) and the non-binary file are distinguished and processed separately.

S302: Convert the binary file to obtain a text file; store the binary file and the converted text file in a distributed file system, and store in a second database the identification information identifying the storage location of the binary file in the distributed file system and the identification information identifying the storage location of the text file in the distributed file system; if the data record to be migrated is new data, also store the converted text file in the first database.

In S302, both the binary file and the text file converted from it are stored in the distributed file system, and the second database stores the identification information identifying the storage location of the binary file in the distributed file system (which may be a storage address, or an identifier recorded in the distributed file system for the binary file), together with the identification information identifying the storage location of the text file.

In this way, after the migration is completed (the database providing data services for the user is migrated from the first database to the second database), when a download request for the binary file by the client is received, the corresponding binary file can be obtained in the distributed file system and fed back to the client based on the identification information of the binary file stored in the second database. In addition, in order to further increase the download speed of the binary file, after the corresponding binary file is obtained from the distributed file system based on the identification information of the binary file, the obtained binary file may be cached in a Content Delivery Network (CDN), and an access address of the binary file in the CDN is returned to the client.

Here, when a CDN cache is used for binary file downloads, the client receives the access address of the binary file in the CDN. Based on comprehensive information such as network traffic, the connection and load status of each node, the distance to the client, and response time, the CDN system can redirect the client's request in real time to the service node closest to the client. The client can therefore fetch content from a nearby node, network congestion is avoided, and users get faster responses when accessing the website.

In addition, after the migration is completed, if a page access request of the client for the binary file is received, the corresponding text file can be obtained from the distributed file system based on the identification information of the text file, and the access page can be rendered from the text file and returned to the client.

In addition, if the data record to be migrated is new data, the embodiment of the present application does not store the binary file in the first database, because the binary file occupies a large amount of storage space; that is, the binary file is discarded on the first-database side. To keep the online service running normally (that is, to perform synchronous double-writing), the text content extracted from the binary file is stored in the first database instead. Thus, before the migration is completed, when a page access request of the client for the binary file is received, the access page can be rendered from the text file in the first database and returned.
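The page-access path before and after cutover can be sketched as follows. The `text` and `text_locator` columns, the `migrated` flag, and the trivial HTML template are assumptions for illustration; the source says only that the page is rendered from the text file rather than from the binary file.

```python
import html


def render_page(record_id, migrated: bool, first_db: dict,
                second_db: dict, fs_blobs: dict) -> str:
    """Render a page for a binary file from its extracted text."""
    if migrated:
        # After cutover: locate the text file via the second database.
        locator = second_db[record_id]["text_locator"]
        text = fs_blobs[locator].decode("utf-8")
    else:
        # Before cutover: the text was double-written into the first database.
        text = first_db[record_id]["text"]
    return f"<html><body><pre>{html.escape(text)}</pre></body></html>"


print(render_page(1, False, {1: {"text": "a<b"}}, {}, {}))
# <html><body><pre>a&lt;b</pre></body></html>
```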

In a specific implementation, when the binary file is in PDF format, the PDFBox tool may be used to extract text content from the PDF file.

In the embodiments of the present application, format conversion is performed when the binary file is stored; that is, the text content of the binary file is extracted and stored. After this conversion, when an access request of a client for the binary file is received, the content of the binary file can be rendered and returned quickly, which improves response speed compared with prior-art approaches that perform no format conversion.

S303: If the data record to be migrated is a new data record in the pending queue of the first database, store the non-binary file in both the first database and the second database; if the data record to be migrated is a historical data record in the stored list of the first database, store the non-binary file in the second database.

Here, the non-binary file in the data record to be migrated is stored in the second database after performing model conversion processing (for example, splitting a data table), and the non-binary file is also stored in the first database when the data record to be migrated is new data.
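As an example of model conversion by table splitting, the sketch below splits one wide row of an old table into a main row and an extension row that share the same key. The column names, the `id` key, and the choice of which columns go where are hypothetical.

```python
def split_row(row: dict, main_columns: set) -> tuple:
    """Split one old-table row into (main_row, extension_row) keyed by row['id']."""
    key = row["id"]
    main = {"id": key}
    extension = {"id": key}
    for column, value in row.items():
        if column == "id":
            continue
        target = main if column in main_columns else extension
        target[column] = value
    return main, extension


main, ext = split_row({"id": 1, "name": "a", "note": "x", "tag": "t"}, {"name"})
print(main)  # {'id': 1, 'name': 'a'}
print(ext)   # {'id': 1, 'note': 'x', 'tag': 't'}
```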

S304: Switch the database serving online services from the first database to the second database.

Specifically, a database interface for providing service for online services is switched from an access interface of the first database to an access interface of the second database.

Here, the database providing data services to clients is switched to the second database; that is, data services are provided through the data service interface of the second database, while the interface used by clients remains unchanged.
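The idea that clients keep the same interface while the backing database changes can be sketched as a small facade. `DataService` and its methods are hypothetical; in practice the switch is often a configuration change or routing rule rather than an in-process flag.

```python
class DataService:
    """Client-facing data interface whose backing database can be switched."""

    def __init__(self, first_db: dict, second_db: dict) -> None:
        self._backends = {"first": first_db, "second": second_db}
        self._active = "first"

    def switch_to_second(self) -> None:
        # S304: route all subsequent reads to the second database.
        self._active = "second"

    def get(self, record_id):
        # Same signature before and after the switch.
        return self._backends[self._active].get(record_id)


svc = DataService({1: "from first"}, {1: "from second"})
print(svc.get(1))  # from first
svc.switch_to_second()
print(svc.get(1))  # from second
```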

When a database needs to be migrated, the binary file in each data record to be migrated and the text file converted from it are stored in the distributed file system, and the identification information identifying their storage locations in the distributed file system is stored in the new database. Because only this identification information, rather than the binary file and the converted text file themselves, is stored in the new database, the efficiency of migrating data into the new database can be greatly improved. In addition, the distributed file system provides highly reliable, highly concurrent storage access, which improves data access performance after the migration; the storage and access capacity for binary files is therefore improved. Furthermore, the embodiments perform format conversion when the binary file is stored, that is, they extract and store the text content of the binary file. After this conversion, when a page access request of a client for the binary file is received, the page can be rendered quickly from the file's content, which improves response speed compared with prior-art approaches that perform no format conversion.

Example three

Referring to the data migration diagram shown in FIG. 1, Example three of the present application provides the following specific implementation process:

S401a to S404a and S405 to S407 below are performed for new data received after the decision to migrate, and S401b to S404b and S405 to S407 are performed for historical data stored in the first database before that decision.

S401a: Write each received new data record into the pending queue of the first database.

S402a: Take each new data record out of the pending queue of the first database in order, distinguish the binary file and the non-binary file in it, perform S403a for the binary file, and perform S404a for the non-binary file.

S403a: Convert the binary file to obtain a text file; store the text file and the binary file in the distributed file system, and store in the second database the identification information that identifies the storage location of each in the distributed file system; in addition, synchronously store the text file in the first database.

S404a: After model conversion, synchronously store the non-binary file in both the first database and the second database.

S401b: Write the identification information of each historical data record in the first database into the preset data table.

S402b: Based on the identification information of each historical data record in the preset data table, extract each unprocessed historical data record from the first database, distinguish the binary file and the non-binary file in it, perform S403b for the binary file and S404b for the non-binary file, and record the processing state of the historical data record in the preset data table.

S403b: Extract the text from the binary file, store the text and the binary file in the distributed file system, and store in the second database the identification information identifying their storage locations in the distributed file system.

S404b: After model conversion, store the non-binary file in the second database.

S405: After the data migration is finished, switch the database serving online services from the first database to the second database.

S406: when a page access request of a client aiming at a specified binary file is received, a text file stored in a distributed file system is obtained based on identification information of a text file corresponding to the specified binary file in a second database, and rendering feedback is carried out on an access page based on the text file.

S407: when a downloading request of a client for a specified binary file is received, the specified binary file stored in the distributed file system is obtained based on the identification information of the specified binary file in the second database, the specified binary file is cached in the CDN, and the corresponding access address of the CDN is returned to the client.

Based on the same inventive concept, an embodiment of the present application further provides a data migration apparatus corresponding to the data migration method. Because the apparatus solves the problem on a principle similar to that of the data migration method, its implementation may refer to the implementation of the method, and repeated details are omitted.

FIG. 5 is a schematic structural diagram of the data migration apparatus provided in an embodiment of the present application. The apparatus includes:

a determining module 51, configured to determine a first file to be migrated based on a data record to be migrated in a first database;

a storage module 52, configured to store the first file to be migrated in a designated file system and store the identification information of the first file to be migrated in a second database;

Optionally, the apparatus further comprises:

a first obtaining module 53, configured to, when receiving an access request of a client for the first file to be migrated, obtain, based on the identification information of the first file to be migrated in the second database, the first file to be migrated stored in the file system, and return the first file to be migrated to the client.

Optionally, the determining module 51 is specifically configured to:

extract a binary file from a data record to be migrated in the first database, and determine the extracted binary file, a text file converted from the binary file, or both, as the first file to be migrated.

Optionally, the apparatus further comprises:

a second obtaining module 54, configured to, when a download request of a client for the binary file is received, obtain the binary file from the file system based on the identification information of the binary file in the second database, cache the obtained binary file in a content delivery network (CDN), and return the access address of the binary file in the CDN to the client.

Optionally, if the data record to be migrated is a new data record located in the pending queue of the first database, the storage module 52 is further configured to:

store the text file converted from the binary file in the first database.

Optionally, the determining module 51 is further configured to:

extracting a second file to be migrated from a data record to be migrated in a first database, wherein the second file to be migrated is a non-binary file;

the storage module 52 is further configured to: if the data record to be migrated is a new data record in the pending queue of the first database, store the second file to be migrated in both the first database and the second database; if the data record to be migrated is a historical data record in the stored list of the first database, store the second file to be migrated in the second database.

Optionally, the apparatus further comprises:

a recording module 55, configured to write the identification information of each historical data record in the stored list of the first database into a preset data table; the identification information of each historical data record is used for identifying the storage position of the historical data record in the first database; and respectively extracting each unprocessed historical data record from the first database as the data record to be migrated based on the identification information of each historical data record in the preset data table, and recording the processing state of the historical data record in the preset data table.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method for data migration, the method comprising:

determining a first file to be migrated based on a data record to be migrated in a first database; the first file to be migrated is historical data written into the first database or new data to be written into the first database after the database migration is determined;

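storing the first file to be migrated in a set file system, and storing identification information of the first file to be migrated in a second database; the identification information of the first file to be migrated is used for identifying the storage position of the first file to be migrated in the file system;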

after the data migration is completed, migrating the online service from the first database to the second database.
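As a rough illustration of claim 1, the sketch below moves one record's binary payload out of a first database into a file system and keeps only the file's location in a second database. It is a minimal sketch under stated assumptions: both databases are in-memory SQLite, and the `records`/`files` tables, the `InMemoryFileSystem` class and its content-hash path scheme are hypothetical stand-ins rather than anything specified in the application.

```python
import hashlib
import sqlite3


class InMemoryFileSystem:
    """Hypothetical stand-in for the 'set file system' (for example a distributed FS)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Assumed location scheme: a content-hash path; a real file system returns its own ID.
        location = "/migrated/" + hashlib.sha256(data).hexdigest()
        self._blobs[location] = data
        return location

    def get(self, location: str) -> bytes:
        return self._blobs[location]


def migrate_record(record_id, first_db, fs, second_db):
    """Store a record's binary payload in the file system; keep only its location in the second DB."""
    (payload,) = first_db.execute(
        "SELECT payload FROM records WHERE id = ?", (record_id,)
    ).fetchone()
    location = fs.put(payload)  # the "first file to be migrated"
    second_db.execute(
        "INSERT INTO files (record_id, location) VALUES (?, ?)", (record_id, location)
    )
    second_db.commit()
    return location


first_db = sqlite3.connect(":memory:")
first_db.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload BLOB)")
first_db.execute("INSERT INTO records VALUES (1, ?)", (b"\x89PNG\r\n",))

second_db = sqlite3.connect(":memory:")
second_db.execute("CREATE TABLE files (record_id INTEGER PRIMARY KEY, location TEXT)")

fs = InMemoryFileSystem()
location = migrate_record(1, first_db, fs, second_db)
```

The second database ends up holding a short location string instead of the binary itself, which is the property the claim relies on to keep large binaries out of the database.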

2. The method of claim 1, wherein the file system is a distributed file system.

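3. The method of claim 1, wherein after storing the first file to be migrated in a set file system and storing identification information of the first file to be migrated in a second database, the method further comprises:

when an access request of a client for the first file to be migrated is received, obtaining the first file to be migrated stored in the file system based on the identification information of the first file to be migrated in the second database, and returning the first file to be migrated to the client.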

4. The method of claim 1, wherein determining the first file to be migrated based on the data records to be migrated in the first database comprises:

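extracting a binary file from the data record to be migrated; and

determining at least one of the extracted binary file and a text file converted based on the binary file as the first file to be migrated.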

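5. The method of claim 4, wherein if the first file to be migrated comprises the binary file, after storing the binary file in a set file system and storing identification information of the binary file in a second database, the method further comprises:

when a download request of a client for the binary file is received, obtaining the binary file from the file system based on the identification information of the binary file in the second database, and caching the obtained binary file in a Content Delivery Network (CDN); and

returning the access address of the binary file in the CDN to the client.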

6. The method of claim 4, wherein if the data record to be migrated is a new data record located in a queue to be stored of the first database, the method further comprises:

7. The method of claim 4, wherein the method further comprises:

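8. The method of claim 1, wherein the method further comprises:

writing the identification information of each historical data record in the stored list of the first database into a preset data table, the identification information of each historical data record being used for identifying the storage position of the historical data record in the first database; and extracting each unprocessed historical data record from the first database, one by one, as the data record to be migrated based on the identification information in the preset data table, and recording the processing state of each historical data record in the preset data table.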

9. A data migration apparatus, comprising:

the determining module is used for determining a first file to be migrated based on the data record to be migrated in the first database; the first file to be migrated is historical data written into the first database or new data to be written into the first database after the database migration is determined;

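the storage module is used for storing the first file to be migrated in a set file system, and storing the identification information of the first file to be migrated in a second database; the identification information of the first file to be migrated is used for identifying the storage position of the first file to be migrated in the file system;

after the data migration is completed, the online service is migrated from the first database to the second database.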

10. The apparatus of claim 9, wherein the apparatus further comprises:

the first obtaining module is used for, when an access request of a client for the first file to be migrated is received, obtaining the first file to be migrated stored in the file system based on the identification information of the first file to be migrated in the second database, and returning the first file to be migrated to the client.
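The access path of claim 10 amounts to one lookup in the second database followed by one read from the file system. The following is a minimal sketch assuming a dictionary as the file system and an in-memory SQLite table named `files`; the `handle_access_request` function and the sample data are hypothetical.

```python
import sqlite3

# Hypothetical stand-ins: a dict keyed by storage location plays the file system,
# and an in-memory SQLite table plays the second database.
file_system = {"/migrated/report-42": b"%PDF-1.4 ..."}
second_db = sqlite3.connect(":memory:")
second_db.execute("CREATE TABLE files (record_id INTEGER PRIMARY KEY, location TEXT)")
second_db.execute("INSERT INTO files VALUES (42, '/migrated/report-42')")


def handle_access_request(record_id: int) -> bytes:
    """Resolve the file's location in the second database, then read it from the file system."""
    row = second_db.execute(
        "SELECT location FROM files WHERE record_id = ?", (record_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no migrated file for record {record_id}")
    return file_system[row[0]]


print(handle_access_request(42))  # b'%PDF-1.4 ...'
```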

11. The apparatus of claim 9, wherein the determining module is specifically configured to:

extracting a binary file from the data record to be migrated; and determining at least one of the extracted binary file and a text file converted based on the binary file as the first file to be migrated.
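Claim 11 leaves open how the binary file is located in a record and what the text conversion is. The sketch below assumes a record is a Python dictionary in which binary columns hold `bytes`, and models the "text file converted based on the binary file" as Base64; both choices, and the `extract_files` helper, are illustrative assumptions.

```python
import base64


def extract_files(record: dict, keep_binary: bool = True, keep_text: bool = False):
    """Split a record into files to migrate (binary and/or text form) and remaining fields."""
    if not (keep_binary or keep_text):
        raise ValueError("select at least one of the binary file and its text form")
    files, fields = {}, {}
    for column, value in record.items():
        if isinstance(value, (bytes, bytearray)):
            if keep_binary:
                files[column] = bytes(value)
            if keep_text:
                # Assumption: the converted text file is modelled as a Base64 encoding.
                files[column + ".b64"] = base64.b64encode(value).decode("ascii")
        else:
            fields[column] = value
    return files, fields


record = {"id": 7, "name": "contract", "scan": b"\x00\x01\x02"}
files, fields = extract_files(record, keep_text=True)
print(files)   # {'scan': b'\x00\x01\x02', 'scan.b64': 'AAEC'}
print(fields)  # {'id': 7, 'name': 'contract'}
```

The two flags mirror the claim's "at least one of" wording: a caller can select the binary file, its text form, or both.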

12. The apparatus of claim 11, wherein the apparatus further comprises:

the second obtaining module is used for obtaining the binary file from the file system based on the identification information of the binary file in the second database when a download request of a client for the binary file is received, caching the obtained binary file in a Content Delivery Network (CDN), and returning the access address of the binary file in the CDN to the client.
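A sketch of the download path in claim 12, assuming the CDN can be modelled as a dictionary of pushed objects. The domain `cdn.example.com`, the `cdn_cache` dictionary and `handle_download_request` are hypothetical; a real CDN would be populated through its own upload or origin-pull mechanism.

```python
import sqlite3

CDN_BASE_URL = "https://cdn.example.com"  # hypothetical CDN domain
cdn_cache = {}  # stand-in for objects cached on the CDN, keyed by CDN path
file_system = {"/migrated/img-9": b"\xff\xd8\xff..."}

second_db = sqlite3.connect(":memory:")
second_db.execute("CREATE TABLE files (record_id INTEGER PRIMARY KEY, location TEXT)")
second_db.execute("INSERT INTO files VALUES (9, '/migrated/img-9')")


def handle_download_request(record_id: int) -> str:
    """Look up the binary via the second DB, cache it on the CDN, and return its CDN address."""
    row = second_db.execute(
        "SELECT location FROM files WHERE record_id = ?", (record_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no migrated binary for record {record_id}")
    location = row[0]
    cdn_path = location.lstrip("/")
    if cdn_path not in cdn_cache:  # only the first download reads the file system
        cdn_cache[cdn_path] = file_system[location]
    return f"{CDN_BASE_URL}/{cdn_path}"


print(handle_download_request(9))  # https://cdn.example.com/migrated/img-9
```

Returning an address rather than the bytes lets repeated downloads be served from the CDN instead of the file system.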

13. The apparatus of claim 11, wherein if the data record to be migrated is a new data record located in a queue to be stored of the first database, the storage module is further configured to:

14. The apparatus of claim 11, wherein the determining module is further configured to:

the storage module is further configured to: if the data record to be migrated is a new data record in the queue to be stored of the first database, storing the second file to be migrated in the first database and the second database; and if the data record to be migrated is a historical data record in the stored list of the first database, storing the second file to be migrated in the second database.
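The routing rule in claim 14 can be sketched as below. The source does not say at this point how the second file to be migrated is determined, so the sketch takes it as given; the two databases are modelled as dictionaries, and `store_second_file` is a hypothetical name.

```python
def store_second_file(file_id, content, is_new_record, first_db, second_db):
    """Route a second file to be migrated according to where its data record came from."""
    if is_new_record:
        # New record from the queue to be stored: the first database still serves the
        # online service during migration, so the file is written to both databases.
        first_db[file_id] = content
        second_db[file_id] = content
    else:
        # Historical record from the stored list: it is already in the first database,
        # so only the second database receives a copy.
        second_db[file_id] = content


first_db, second_db = {}, {}
store_second_file("new-1", "summary A", True, first_db, second_db)
store_second_file("old-1", "summary B", False, first_db, second_db)
```

Writing new records to both databases keeps the first database complete while it continues to serve online traffic, so the later switch to the second database loses nothing.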

15. The apparatus of claim 9, wherein the apparatus further comprises:

the recording module is used for writing the identification information of each historical data record in the stored list of the first database into a preset data table, the identification information of each historical data record being used for identifying the storage position of the historical data record in the first database; and for extracting each unprocessed historical data record from the first database, one by one, as the data record to be migrated based on the identification information in the preset data table, and recording the processing state of each historical data record in the preset data table.
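The preset data table of claim 15 works like a checkpoint for a resumable batch job. The sketch below is one possible reading, assuming SQLite, a `migration_progress` table kept alongside the first database, a `pending`/`done` state column, and a caller-supplied `migrate` callback; none of these names come from the application.

```python
import sqlite3

# First database with historical records (the "stored list"); sample data is made up.
first_db = sqlite3.connect(":memory:")
first_db.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload BLOB)")
first_db.executemany(
    "INSERT INTO records VALUES (?, ?)", [(i, bytes([i])) for i in range(1, 6)]
)

# Preset data table: each record's identification info (its primary key) plus a state.
first_db.execute(
    "CREATE TABLE migration_progress ("
    " record_id INTEGER PRIMARY KEY,"
    " state TEXT NOT NULL DEFAULT 'pending')"
)
first_db.execute("INSERT INTO migration_progress (record_id) SELECT id FROM records")
first_db.commit()


def migrate_pending(migrate, batch_size=2):
    """Migrate every record still marked 'pending', committing progress after each batch."""
    migrated = 0
    while True:
        batch = first_db.execute(
            "SELECT record_id FROM migration_progress"
            " WHERE state = 'pending' ORDER BY record_id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not batch:
            return migrated
        for (record_id,) in batch:
            (payload,) = first_db.execute(
                "SELECT payload FROM records WHERE id = ?", (record_id,)
            ).fetchone()
            migrate(record_id, payload)
            first_db.execute(
                "UPDATE migration_progress SET state = 'done' WHERE record_id = ?",
                (record_id,),
            )
            migrated += 1
        first_db.commit()


copied = {}
print(migrate_pending(lambda rid, data: copied.__setitem__(rid, data)))  # 5
print(migrate_pending(lambda rid, data: copied.__setitem__(rid, data)))  # 0
```

Because the state is persisted per record, an interrupted run can simply be restarted: it resumes at the first record still marked `pending`.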