CN113051102A

CN113051102A - File backup method, device, system, storage medium and computer equipment

Info

Publication number: CN113051102A
Application number: CN201911364970.9A
Authority: CN
Inventors: 王有刚; 王云
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Yunnan Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Yunnan Co Ltd
Priority date: 2019-12-26
Filing date: 2019-12-26
Publication date: 2021-06-29
Anticipated expiration: 2039-12-26
Also published as: CN113051102B

Abstract

In the technical solution of the file backup method, apparatus, system, storage medium and computer device provided by the embodiments of the invention, a backup request for a file is sent to a command node, so that the command node queries the names and node storage addresses of the data blocks in the file according to the backup request and sends them to a backup server; the workload of each backup proxy server is generated according to the number of acquired data blocks and the number of backup proxy servers; and the node storage address of each data block is sent to the backup proxy server matched with the name of that data block, so that the backup proxy server acquires the data block from the data node corresponding to the node storage address and backs it up to a backup medium. This reduces the network resource consumption of the command node during file backup, shortens the backup time, improves the underlying input/output of the data nodes, and increases the operating speed of the whole big data platform.

Description

File backup method, device, system, storage medium and computer equipment

[ technical field ]

The present invention relates to the field of internet, and in particular, to a method, an apparatus, a system, a storage medium, and a computer device for file backup.

[ background of the invention ]

With the rapid development of information technology, big data is increasingly bound up with our lives and has a growing influence on them. As unstructured data grows rapidly, many enterprises have built their own big data platforms, most of which are based on the distributed system infrastructure Hadoop; examples include the internet enterprises BAT, Huashi and Google, large operators, and government systems in various regions. However, natural disasters or human factors may damage a big data platform and leave it inoperable for a long time. If effective data backup and recovery measures are not adopted, data is lost, and the resulting loss can sometimes be neither recovered nor measured.

Hadoop has a redundant structure in which multiple copies of a file can be placed on different servers, which greatly improves its reliability. However, this protection mechanism cannot address problems such as human misoperation or tracing historical versions of files, so the importance of file backup on a Hadoop-based big data platform is self-evident.

The traditional file backup method for a big data platform deploys a Hadoop client on a backup server and copies files to a backup medium by having the Hadoop client access the command node (NameNode). However, this method consumes a large amount of the command node's network resources, takes too long to back up, and degrades the underlying input/output of some data nodes (DataNodes), thereby affecting the operation of the whole big data platform.

[ summary of the invention ]

In view of this, embodiments of the present invention provide a file backup method, an apparatus, a system, a storage medium, and a computer device, which can solve the problems that the traditional file backup method consumes a large amount of the command node's network resources, takes too long to back up, and degrades the underlying input/output of some data nodes, thereby affecting the operation of the whole big data platform.

In a first aspect, an embodiment of the present invention provides a file backup method, where the method includes:

sending a backup request of a file to a command node, wherein the file comprises at least one data block, the backup request comprises the name of the file, so that the command node queries the name of the data block and the node storage address of the data block in the file according to the name of the file, and sends the name of the data block and the node storage address of the data block to the backup server;

generating the workload of the backup proxy server according to the number of the acquired data blocks and the number of the backup proxy servers;

and matching the names of the data blocks with different backup proxy servers according to the workload of the backup proxy server, and sending the node storage addresses of the data blocks to the backup proxy server matched with the names of the data blocks, so that the backup proxy server acquires the data blocks from the data nodes corresponding to the node storage addresses according to the node storage addresses and backs up the data blocks to a backup medium.

In another aspect, an embodiment of the present invention provides a file backup method, where the method includes:

the method comprises the steps that a backup server sends a backup request of a file to a command node, wherein the file comprises at least one data block, and the backup request comprises the name of the file;

the command node inquires out the name of the data block and the node storage address of the data block in the file according to the name of the file, and sends the name of the data block and the node storage address of the data block to the backup server;

the backup server generates the workload of the backup proxy server according to the number of the acquired data blocks and the number of the backup proxy servers;

the backup server matches the names of the data blocks with different backup proxy servers according to the workload of the backup proxy server, and sends the node storage addresses of the data blocks to the backup proxy server matched with the names of the data blocks;

and the backup proxy server acquires the data blocks from the data nodes corresponding to the node storage addresses according to the node storage addresses and backs up the data blocks to a backup medium.

Optionally, the generating, by the backup server, the workload of the backup proxy server according to the number of the acquired data blocks and the number of the backup proxy servers specifically includes:

acquiring the number m of the data blocks and the number n of the backup proxy servers, wherein m and n are positive integers;

judging whether the number m of the data blocks is larger than the number n of the backup proxy servers;

if the number m of the data blocks is larger than the number n of the backup proxy servers, setting the workload of the backup proxy servers to be larger than or equal to INT (m/n) data blocks and smaller than or equal to INT (m/n) +1 data blocks;

and if the number m of the data blocks is judged to be less than or equal to the number n of the backup proxy servers, setting the workload of the backup proxy servers to be 1 or 0 data blocks.
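The two cases above can be written compactly as follows. This is only a restatement under stated assumptions: the symbol w_i (the workload of the i-th backup proxy server) is introduced here for illustration, INT is read as the floor function (which equals truncation for positive integers), and the sum constraint assumes that every data block is assigned to exactly one backup proxy server.

```latex
w_i \in
\begin{cases}
\{\lfloor m/n \rfloor,\ \lfloor m/n \rfloor + 1\} & \text{if } m > n,\\[4pt]
\{0,\ 1\} & \text{if } m \le n,
\end{cases}
\qquad i = 1,\dots,n,
\qquad \sum_{i=1}^{n} w_i = m .
```

For instance, with m = 5 and n = 3, INT(5/3) = 1, so each backup proxy server handles 1 or 2 data blocks, and a split such as (2, 2, 1) satisfies both conditions.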

Optionally, the obtaining, by the backup proxy server, the data block from the data node corresponding to the node storage address according to the node storage address, and backing up the data block to a backup medium includes:

the backup proxy server acquires the data block from a data node corresponding to the node storage address according to the node storage address of the data block; judging whether the backup medium stores the data block or not;

if the backup proxy server judges that the data block is stored in the backup medium, acquiring a data block pointer and a backup medium storage address of the data block;

and if the backup proxy server judges that the data block is not stored in the backup medium, the backup proxy server backs up the data block to the backup medium and acquires a data block pointer and a backup medium storage address of the data block.

Optionally, after the backup proxy server obtains the data block from the data node corresponding to the node storage address according to the node storage address and backs up the data block to a backup medium, the method further includes:

the backup proxy server sends the backup result of the backup proxy server to the backup server;

and the backup server combines the backup results sent by the backup proxy server into a backup record and stores the backup record into a database.

Optionally, the backup result includes a name of the data block, the data block pointer, the backup medium storage address, and a backup completion condition.

In another aspect, an embodiment of the present invention provides a file backup apparatus, where the apparatus includes:

a transceiver module, configured to send a backup request of a file to a command node, wherein the file comprises at least one data block and the backup request comprises the name of the file, so that the command node queries the name of the data block and the node storage address of the data block in the file according to the name of the file and sends the name of the data block and the node storage address of the data block to the backup server;

a generation module, configured to generate the workload of the backup proxy server according to the number of the acquired data blocks and the number of the backup proxy servers;

a matching module, configured to match the names of the data blocks with different backup proxy servers according to the workload of the backup proxy servers;

wherein the transceiver module is further configured to send the node storage address of the data block to the backup proxy server matched with the name of the data block, so that the backup proxy server obtains the data block from the data node corresponding to the node storage address according to the node storage address, and backs up the data block to a backup medium.

In another aspect, an embodiment of the present invention provides a file backup system, where the system includes: the system comprises a backup server, a command node, at least one backup proxy server, at least one data node and a backup medium;

the backup server is used for sending a backup request of a file to a command node, wherein the file comprises at least one data block, and the backup request comprises the name of the file;

the command node is used for inquiring the name of the data block and the node storage address of the data block in the file according to the name of the file and sending the name of the data block and the node storage address of the data block to the backup server;

the backup server is also used for generating the workload of the backup proxy server according to the number of the acquired data blocks and the number of the backup proxy servers; matching the name of the data block with different backup proxy servers according to the workload of the backup proxy server, and sending the node storage address of the data block to the backup proxy server matched with the name of the data block;

and the backup proxy server is used for acquiring the data blocks from the data nodes corresponding to the node storage addresses according to the node storage addresses and backing up the data blocks to a backup medium.

In another aspect, an embodiment of the present invention provides a storage medium, where the storage medium includes a stored program, and when the program runs, a device in which the storage medium is located is controlled to execute the above file backup method.

In another aspect, an embodiment of the present invention provides a computer device, including a memory and a processor, where the memory is used to store information including program instructions, and the processor is used to control execution of the program instructions, where the program instructions are loaded by the processor and executed to implement the steps of the above-mentioned file backup method.

[ description of the drawings ]

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of a file backup method according to an embodiment of the present invention;

Fig. 2 is a flowchart of a file backup method according to another embodiment of the present invention;

Fig. 3 is a flowchart of a file backup method according to another embodiment of the present invention;

Fig. 4 is a flowchart of a file backup method according to another embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a file backup apparatus according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a file backup system according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of a computer device according to an embodiment of the present invention.

[ detailed description ]

For better understanding of the technical solutions of the present invention, the following detailed descriptions of the embodiments of the present invention are provided with reference to the accompanying drawings.

It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.

Fig. 1 is a flowchart of a file backup method according to an embodiment of the present invention, and as shown in fig. 1, the method includes:

Step 102, sending a backup request of a file to a command node, wherein the file comprises at least one data block, the backup request comprises a name of the file, so that the command node queries the name of the data block and a node storage address of the data block in the file according to the name of the file, and sends the name of the data block and the node storage address of the data block to a backup server.

Step 104, generating the workload of the backup proxy server according to the number of the acquired data blocks and the number of the backup proxy servers.

Step 106, matching the names of the data blocks with different backup proxy servers according to the workload of the backup proxy servers; and sending the node storage address of the data block to a backup proxy server matched with the name of the data block, so that the backup proxy server acquires the data block from the data node corresponding to the node storage address according to the node storage address and backs up the data block to a backup medium.

In the technical solution of the file backup method provided by this embodiment, a backup request for a file is sent to a command node, so that the command node queries the names and node storage addresses of the data blocks in the file according to the backup request and sends them to a backup server; the workload of each backup proxy server is generated according to the number of acquired data blocks and the number of backup proxy servers; and the node storage address of each data block is sent to the backup proxy server matched with the name of that data block, so that the backup proxy server acquires the data block from the data node corresponding to the node storage address and backs it up to a backup medium. This reduces the network resource consumption of the command node during file backup, shortens the backup time, improves the underlying input/output of the data nodes, and increases the operating speed of the whole big data platform.

Fig. 2 is a flowchart of a file backup method according to another embodiment of the present invention, as shown in fig. 2, the method includes:

Step 202, sending a backup request of a file to the command node, where the file includes at least one data block, and the backup request includes a name of the file, so that the command node queries the name of the data block and a node storage address of the data block in the file according to the name of the file, and sends the name of the data block and the node storage address of the data block to the backup server.

In this embodiment, each step is executed by the backup server.

In this embodiment, the backup server sends only a file backup request to the command node; the amount of data involved is very small and has little influence on the production network of the command node.

Specifically, the command node needs to form a distribution file containing the detailed information of the file, i.e., the distribution information of all data blocks in the file, and transmit the distribution file to the backup server. The distribution file comprises the names of the data blocks in the file and the node storage addresses of the data blocks. Because Hadoop records the distribution information of the data blocks when files are created and when redundancy is rebuilt, this information can be acquired through the command hadoop fsck /user/file -files -blocks -locations -racks without much underlying input/output.

Table 1 Distribution information of data blocks in a file

Data block name	Node storage address
b1	0x001 of data node 1
b2	0x002 of data node 1
b3	0x003 of data node 1
b4	0x004 of data node n
b5	0x005 of data node 1
b1’	0x001 of data node 2
b2’	0x002 of data node 2
b3’	0x003 of data node 2
b4’	0x004 of data node 1
b5’	0x005 of data node n
b1”	0x001 of data node 3
b2”	0x002 of data node 3
b3”	0x003 of data node 3
b4”	0x004 of data node 2
b5”	0x005 of data node 3

For example: according to the record information of the Hadoop Distributed File System (HDFS), the command node finds that the data blocks named b1, b2, b3, b4’ and b5 in the file are distributed on data node 1; the data blocks named b1’, b2’, b3’ and b4” are distributed on data node 2; the data blocks named b1”, b2”, b3” and b5” are distributed on data node 3; and the data blocks named b4 and b5’ are distributed on data node n. The file thus has a total of 15 data blocks distributed on 4 data nodes, and the node storage addresses of the data blocks are shown in Table 1.
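The distribution information in Table 1 can be pictured as a simple mapping from data block name to location. The following is a minimal sketch, not the format of the distribution file itself: the dictionary layout and the helper name blocks_by_node are assumptions made for illustration, and plain ASCII quotes stand in for the prime marks used in the table.

```python
from collections import defaultdict

# Replica locations copied from Table 1:
# block name -> (data node, node storage address).
DISTRIBUTION = {
    "b1": ("data node 1", "0x001"), "b2": ("data node 1", "0x002"),
    "b3": ("data node 1", "0x003"), "b4": ("data node n", "0x004"),
    "b5": ("data node 1", "0x005"),
    "b1'": ("data node 2", "0x001"), "b2'": ("data node 2", "0x002"),
    "b3'": ("data node 2", "0x003"), "b4'": ("data node 1", "0x004"),
    "b5'": ("data node n", "0x005"),
    "b1''": ("data node 3", "0x001"), "b2''": ("data node 3", "0x002"),
    "b3''": ("data node 3", "0x003"), "b4''": ("data node 2", "0x004"),
    "b5''": ("data node 3", "0x005"),
}


def blocks_by_node(distribution):
    """Group data block names by the data node that stores them."""
    grouped = defaultdict(list)
    for name, (node, _address) in distribution.items():
        grouped[node].append(name)
    return dict(grouped)


grouped = blocks_by_node(DISTRIBUTION)
for node, names in sorted(grouped.items()):
    print(node, names)
```

Grouping by data node reproduces the counts stated in the example: 15 data blocks spread over 4 data nodes.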

Step 204, acquiring the number m of the data blocks and the number n of the backup proxy servers, wherein m and n are positive integers.

In this embodiment, the backup server obtains the number of data blocks to be backed up according to the name of the data block and the node storage address of the data block sent by the command node. After receiving the name of the data block and the node storage address of the data block sent by the command node, the backup server automatically generates a corresponding backup task sequence number for the name and the node storage address of each data block.
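As a rough illustration of the numbering step, the sketch below attaches a sequence number to each (name, address) pair. The "total-index" format (for example "5-1") is only inferred from the sequence numbers that appear in the tables of this description; the function name make_backup_tasks, the record layout, and the sample addresses are hypothetical.

```python
def make_backup_tasks(block_locations):
    """Attach a backup task sequence number to each (block name, address) pair.

    Assumed format: "<number of blocks>-<1-based index>".
    """
    m = len(block_locations)
    return [
        {"task": f"{m}-{i}", "block": name, "address": address}
        for i, (name, address) in enumerate(block_locations, start=1)
    ]


# Hypothetical input: three data blocks with placeholder storage addresses.
tasks = make_backup_tasks([("1", "0x001"), ("2", "0x002"), ("3", "0x003")])
for task in tasks:
    print(task)
```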

Step 206, judging whether the number m of the data blocks is larger than the number n of the backup proxy servers, if so, executing step 208; if not, go to step 210.

Step 208, setting the workload of the backup proxy server to be greater than or equal to INT (m/n) data blocks and less than or equal to INT (m/n) +1 data blocks; and proceeds to step 212.

Step 210, the workload of the backup proxy server is set to 1 or 0 data blocks.
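Steps 204 to 210 can be sketched as follows. This is a minimal illustration that reads INT as integer division; the function names workload_bounds and workloads are hypothetical, and giving the extra data blocks to the first servers is just one way to stay within the stated bounds.

```python
def workload_bounds(m, n):
    """Return the (minimum, maximum) number of data blocks per backup proxy server."""
    if m <= 0 or n <= 0:
        raise ValueError("m and n must be positive integers")
    if m > n:
        q = m // n  # INT(m/n)
        return q, q + 1
    return 0, 1


def workloads(m, n):
    """One possible split of m data blocks over n servers within the bounds."""
    workload_bounds(m, n)  # validates the inputs
    q, r = divmod(m, n)
    # The first r servers take one extra data block.
    return [q + 1 if i < r else q for i in range(n)]


print(workload_bounds(5, 3), workloads(5, 3))
print(workload_bounds(2, 3), workloads(2, 3))
```

With m = 5 and n = 3 the bounds are 1 and 2 data blocks; with m = 2 and n = 3 one server receives nothing.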

Step 212, matching the name of the data block with different backup proxy servers according to the workload of the backup proxy servers; and sending the node storage address of the data block to a backup proxy server matched with the name of the data block, so that the backup proxy server acquires the data block from the data node corresponding to the node storage address according to the node storage address, backs up the data block to a backup medium, and sends a backup result of the backup proxy server to the backup server.

Specifically, the names of the data blocks are matched with different backup proxy servers through an optimal balancing algorithm according to the workload of the backup proxy servers. The allocation principle of the optimal balancing algorithm is to spread the names of the data blocks across different backup proxy servers as evenly as possible.

Table 2 Backup task assignment results

Backup task sequence number	Name of data block	Data node	Backup proxy server
5-1	1	Data node 1	Backup proxy server 1
5-2	2	Data node 1	Backup proxy server 1
5-3	3	Data node 2	Backup proxy server 2
5-4	4	Data node n	Backup proxy server 3
5-5	5	Data node 3	Backup proxy server 2

For example: the backup server receives the distribution file sent by the command node and learns from it that there are 5 data blocks distributed on 4 data nodes; there are 3 backup proxy servers. The backup server allocates the backup tasks through the optimal balancing algorithm, and the result is shown in Table 2, where the names of the 5 data blocks are 1, 2, 3, 4 and 5. Data blocks 1 and 2 are backed up to the backup medium by backup proxy server 1 through data node 1; data block 3 is backed up to the backup medium by backup proxy server 2 through data node 2; data block 4 is backed up to the backup medium by backup proxy server 2 through data node 3; and data block 5 is backed up to the backup medium by backup proxy server 3 through data node n.
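The description does not spell out the optimal balancing algorithm. As a stand-in, the sketch below uses a simple greedy rule (each data block goes to the backup proxy server that currently holds the fewest blocks). The function name assign_blocks is hypothetical, and the rule ignores which data node holds each block, so it yields the same 2/2/1 split but need not reproduce the exact pairing of the example.

```python
def assign_blocks(block_names, proxies):
    """Greedy balancing: give each block to the least-loaded backup proxy server."""
    assignment = {proxy: [] for proxy in proxies}
    for name in block_names:
        # min() returns the first proxy among those with the fewest blocks.
        target = min(proxies, key=lambda p: len(assignment[p]))
        assignment[target].append(name)
    return assignment


PROXIES = ["backup proxy server 1", "backup proxy server 2", "backup proxy server 3"]
result = assign_blocks(["1", "2", "3", "4", "5"], PROXIES)
for proxy, blocks in result.items():
    print(proxy, blocks)
```

Every block is assigned exactly once, and no server exceeds INT(m/n) + 1 blocks because the loads never differ by more than one.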

Step 214, combining the backup results sent by the backup proxy server into a backup record, and storing the backup record in the database.

In this embodiment, the backup result includes the name of the data block, the pointer of the data block, the storage address of the backup medium, and the backup completion condition. The data block pointers correspond to the data blocks and the storage addresses of the backup media one by one and are used for pointing to the storage addresses of the backup media of the data blocks.

The traditional file backup method is mainly characterized in that a Hadoop client is deployed on a backup server, and files are acquired to a backup medium through a Hadoop client access command node. The traditional file backup method has the following defects:

1) The file is read only through the command node, so the network resource consumption of the command node is large and may even affect the operation of the whole big data platform.

2) The command node transmits data from a single point, so in an environment with a large backup demand the overall backup time is too long and the backup performance is poor.

3) Transmission is file-based. Because a single file is backed up and the read strategy of the command node does not change during the whole backup process, multiple files are transmitted to the command node through one or a few data nodes, which strongly affects the underlying input/output of those data nodes.

In this embodiment, an elastic parallel framework is introduced: a plurality of backup proxy servers back up a file by reading its data blocks from a plurality of data nodes and storing them in a backup medium, and a complete backup record is finally assembled logically on the backup server. Because the data blocks of the file being backed up are transmitted through a plurality of data nodes, the network transmission pressure on the command node is reduced, the backup time is shortened, and the underlying input/output is balanced across the data nodes, thereby improving the backup efficiency of the whole big data platform.
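The parallel reading described above can be simulated in a few lines. This is a toy sketch under stated assumptions: data nodes and the backup medium are in-memory dictionaries, each backup proxy server is modelled as a worker thread, and the names DATA_NODES, PLAN, proxy_backup and run_backup, together with the block contents and addresses, are placeholders rather than part of the described system.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for data nodes: node -> {storage address: block content}.
DATA_NODES = {
    "data node 1": {"0x001": "D139", "0x002": "EF31"},
    "data node 2": {"0x003": "7C4E"},
}

# Hypothetical plan: proxy -> list of (block name, data node, storage address).
PLAN = {
    "backup proxy server 1": [("1", "data node 1", "0x001"),
                              ("2", "data node 1", "0x002")],
    "backup proxy server 2": [("3", "data node 2", "0x003")],
}


def proxy_backup(jobs):
    """Simulate one backup proxy server reading its blocks from the data nodes."""
    return {name: DATA_NODES[node][address] for name, node, address in jobs}


def run_backup(plan):
    """Run all backup proxy servers in parallel and collect what they stored."""
    backup_medium = {}
    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        for stored in pool.map(proxy_backup, plan.values()):
            backup_medium.update(stored)
    return backup_medium


medium = run_backup(PLAN)
print(medium)
```

Note that no block content passes through the command node in this model; the workers read directly from the data nodes.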

Fig. 3 is a flowchart of a file backup method according to another embodiment of the present invention, as shown in fig. 3, the method includes:

Step 302, the backup server sends a backup request for a file to the command node, where the file includes at least one data block, and the backup request includes a name of the file.

Step 304, the command node queries the name of the data block in the file and the node storage address of the data block according to the name of the file, and sends the name of the data block and the node storage address of the data block to the backup server.

Step 306, the backup server generates the workload of the backup proxy server according to the number of the acquired data blocks and the number of the backup proxy servers.

Step 308, the backup server matches the names of the data blocks with different backup proxy servers according to the workload of the backup proxy servers, and sends the node storage addresses of the data blocks to the backup proxy servers matched with the names of the data blocks.

Step 310, the backup proxy server acquires the data blocks from the data nodes corresponding to the node storage addresses according to the node storage addresses, and backs up the data blocks to the backup medium.

In the technical solution of the file backup method provided in this embodiment, a backup server sends a backup request for a file to a command node, where the file includes at least one data block, and the backup request includes a name of the file; the command node queries the name of a data block in the file and the node storage address of the data block according to the name of the file, and sends the name of the data block and the node storage address of the data block to the backup server; the backup server generates the workload of the backup proxy server according to the number of the acquired data blocks and the number of the backup proxy servers; the backup server matches the names of the data blocks with different backup proxy servers according to the workload of the backup proxy servers, and sends the node storage addresses of the data blocks to the backup proxy servers matched with the names of the data blocks; and the backup proxy server acquires the data blocks from the data nodes corresponding to the node storage addresses according to the node storage addresses, and backs up the data blocks to the backup medium. This reduces the network resource consumption of the command node during file backup, shortens the backup time, improves the underlying input/output of the data nodes, and increases the operating speed of the whole big data platform.

Fig. 4 is a flowchart of a file backup method according to another embodiment of the present invention, as shown in fig. 4, the method includes:

Step 402, the backup server sends a backup request of a file to the command node, wherein the file comprises at least one data block, and the backup request comprises a name of the file.

Step 404, the command node queries the name of the data block in the file and the node storage address of the data block according to the name of the file, and sends the name of the data block and the node storage address of the data block to the backup server.

Step 406, the backup server generates the workload of the backup proxy server according to the number of the acquired data blocks and the number of the backup proxy servers.

In this embodiment, step 406 specifically includes:

Step 4062, acquiring the number m of the data blocks and the number n of the backup proxy servers, where m and n are positive integers.

Step 4064, determine whether the number m of data blocks is greater than the number n of backup proxy servers, if yes, execute step 4066; if not, go to step 4068.

Step 4066, setting the workload of the backup proxy server to be greater than or equal to INT (m/n) data blocks and less than or equal to INT (m/n) +1 data blocks; and proceeds to step 408.

Step 4068 sets the workload of the backup proxy server to 1 or 0 data blocks.

Step 408, the backup server matches the names of the data blocks with different backup proxy servers according to the workload of the backup proxy servers, and sends the node storage addresses of the data blocks to the backup proxy servers matched with the names of the data blocks.

Step 410, the backup proxy server acquires the data blocks from the data nodes corresponding to the node storage addresses according to the node storage addresses, and backs up the data blocks to the backup medium.

In this embodiment, step 410 specifically includes:

Step 4102, the backup proxy server obtains the data block from the data node corresponding to the node storage address according to the node storage address of the data block, and judges whether the backup medium already stores the data block; if yes, step 4104 is executed; if not, step 4106 is executed.

Step 4104, the backup proxy server obtains the data block pointer and the backup media storage address of the data block, and continues to execute step 412.

Step 4106, the backup proxy server backs up the data block to the backup medium and obtains the data block pointer and the backup medium storage address of the data block.

For example: the file comprises 5 data blocks named 1, 2, 3, 4 and 5, whose contents are respectively D139, EF31, 7C4E, 876A and A38B. Backup proxy server 1 acquires the contents of data blocks 1 and 2 through data node 1; backup proxy server 2 acquires the content of data block 3 through data node 2; backup proxy server 2 acquires the content of data block 4 through data node 3; and backup proxy server 3 acquires the content of data block 5 through data node n.

Table 3 existing data block record table of backup medium

Table 3 is the record table of data blocks already present in the backup medium. As can be seen from Table 3, data blocks 1, 3 and 4 are already stored in the backup medium, while data blocks 2 and 5 are not. Thus, backup proxy server 1 retrieves the data block pointer fs11 and the backup medium storage address 0xaaa1 of data block 1 from the backup medium; backup proxy server 2 retrieves the data block pointer fs13 and the backup medium storage address 0xbbb3 of data block 3 from the backup medium; and backup proxy server 2 retrieves the data block pointer fs14 and the backup medium storage address 0xccc4 of data block 4 from the backup medium. Table 4 is the data block record table of the backup medium after the backup is completed. As shown in Table 4, backup proxy server 1 backs up the content EF31 of data block 2 to the backup medium and obtains the data block pointer fs12 and the backup medium storage address 0xeee2 of data block 2 from the backup medium; backup proxy server 3 backs up the content A38B of data block 5 to the backup medium and obtains the data block pointer fs15 and the backup medium storage address 0xeee3 of data block 5 from the backup medium.

Table 4 table of data block records in backup medium after completing backup

Each data block pointer corresponds one-to-one with a data block and with a backup medium storage address, and points to the backup medium storage address of that data block.

In this embodiment, deduplication of data blocks is added to the file backup process, that is, data blocks already stored in the backup medium are not backed up again, which saves storage space on the backup medium.
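The branch between step 4104 and step 4106 can be illustrated with a minimal sketch. It is not the patented implementation: the names `BlockRecord` and `BackupMedium` are hypothetical, the lookup by block content is an assumption (the text does not state how the backup medium recognises a stored block), and the pointer and address of a newly written block are passed in by the caller rather than allocated by the medium.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRecord:
    pointer: str  # data block pointer, e.g. "fs11"
    address: str  # backup medium storage address, e.g. "0xaaa1"


class BackupMedium:
    """Toy in-memory backup medium that stores each distinct content once."""

    def __init__(self, existing):
        # Assumption: stored blocks are looked up by their content.
        self._by_content = dict(existing)

    def backup(self, content, new_record):
        """Return (record, was_written) for one data block."""
        found = self._by_content.get(content)
        if found is not None:
            # Step 4104: the block is already stored, so reuse its record.
            return found, False
        # Step 4106: the block is new, so store it and keep its record.
        self._by_content[content] = new_record
        return new_record, True


# Blocks 1, 3 and 4 of the example are already on the backup medium.
medium = BackupMedium({
    "d139": BlockRecord("fs11", "0xaaa1"),
    "7C4E": BlockRecord("fs13", "0xbbb3"),
    "876A": BlockRecord("fs14", "0xccc4"),
})
reused, wrote_1 = medium.backup("d139", BlockRecord("unused", "unused"))
stored, wrote_2 = medium.backup("EF31", BlockRecord("fs12", "0xeee2"))
```

In this sketch data block 1 is not written again and keeps fs11 and 0xaaa1, while data block 2 is written and receives fs12 and 0xeee2, as in the example above.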

Step 412, the backup proxy server sends the backup result of the backup proxy server to the backup server.

Step 414, the backup server combines the backup results sent by the backup proxy server into a backup record, and stores the backup record in the database.

Table 5 backup records

Backup task sequence number	Data block name	Data block pointer	Backup medium storage address	Backup completed
5-1	1	fs11	0xaaa1	Yes
5-2	2	fs12	0xeee2	Yes
5-3	3	fs13	0xbbb3	Yes
5-4	4	fs14	0xccc4	Yes
5-5	5	fs15	0xeee3	Yes

For example, backup proxy server 1 sends to the backup server the data block pointer fs11, the backup medium storage address 0xaaa1 and the backup completion status of data block 1, together with the data block pointer fs12, the backup medium storage address 0xeee2 and the backup completion status of data block 2; backup proxy server 2 sends the data block pointer fs13, the backup medium storage address 0xbbb3 and the backup completion status of data block 3, together with the data block pointer fs14, the backup medium storage address 0xccc4 and the backup completion status of data block 4; backup proxy server 3 sends the data block pointer fs15, the backup medium storage address 0xeee3 and the backup completion status of data block 5. The backup server combines the backup results sent by all backup proxy servers into one backup record, as shown in Table 5. In the backup completion column, "Yes" indicates that the backup succeeded and "No" indicates that the backup failed.

In this embodiment, combining the backup results sent by the backup proxy servers into one backup record is purely a logical assembly, so the backup server completes it in a very short time and with very few system resources.
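The logical assembly of step 414 can be sketched as a plain merge of the per-server result lists. The function name `combine_backup_results` and the tuple layout (data block name, data block pointer, backup medium storage address, completion flag) are assumptions made for illustration. The backup task sequence numbers of Table 5 are left out because the text does not say how they are generated.

```python
def combine_backup_results(results_by_proxy):
    """Merge per-proxy result lists into one record ordered by block name.

    Each result is a tuple: (block_name, pointer, address, completed).
    """
    merged = []
    for results in results_by_proxy.values():
        merged.extend(results)
    # Sorting by block name gives the row order used in Table 5.
    merged.sort(key=lambda row: row[0])
    return merged


# Results as reported by the three backup proxy servers in the example.
record = combine_backup_results({
    "backup proxy server 1": [(1, "fs11", "0xaaa1", True),
                              (2, "fs12", "0xeee2", True)],
    "backup proxy server 2": [(3, "fs13", "0xbbb3", True),
                              (4, "fs14", "0xccc4", True)],
    "backup proxy server 3": [(5, "fs15", "0xeee3", True)],
})
```

No data block content is moved in this step; only the small result tuples are concatenated and sorted, which is why the assembly is cheap.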

On a big data platform, the data blocks of a file in this embodiment are acquired from different data nodes and backed up through different backup proxy servers. This relieves the network pressure that the traditional file backup method places on the command node, makes fuller use of the available bandwidth to improve backup efficiency, and reduces the occurrence of hot spots among the data nodes during backup.

This embodiment adopts a dual parallel framework: the data of the file to be backed up is acquired from a plurality of data nodes in parallel, the backup task as a whole is written to the backup medium by a plurality of backup proxy servers in parallel, and finally a complete backup record is logically assembled on the backup server. The technical solution of this embodiment retains the data protection that the traditional file backup method provides on a big data platform while remedying its shortcomings, such as the strain on the command node's network resources and the imbalance of data node input and output, so that file backup for the whole big data platform becomes more feasible and more efficient.

Fig. 5 is a schematic structural diagram of a file backup apparatus according to an embodiment of the present invention, and as shown in fig. 5, the apparatus includes: a transceiver module 51, a generating module 52 and a matching module 53.

The transceiving module 51 is configured to send a backup request of a file to the command node, where the file includes at least one data block, and the backup request includes a name of the file, so that the command node queries, according to the name of the file, names of the data blocks in the file and node storage addresses of the data blocks, and sends the names of the data blocks and the node storage addresses of the data blocks to the backup server.

The file backup device provided by the embodiment comprises a backup server.

The generating module 52 is configured to generate the workload of the backup proxy servers according to the number of the acquired data blocks and the number of the backup proxy servers.

In this embodiment, the generating module 52 specifically includes:

An obtaining submodule 521, configured to obtain the number m of data blocks and the number n of backup proxy servers.

A determining submodule 522, configured to determine whether the number m of data blocks is greater than the number n of backup proxy servers.

A setting submodule 523, configured to set, when the determining submodule 522 determines that the number m of data blocks is greater than the number n of backup proxy servers, the workload of each backup proxy server to at least INT(m/n) data blocks and at most INT(m/n)+1 data blocks, and then to proceed to the operation of matching the names of the data blocks with different backup proxy servers according to the workload of the backup proxy servers.

The setting submodule 523 is further configured to set the workload of each backup proxy server to 1 or 0 data blocks when the determining submodule 522 determines that the number m of data blocks is less than or equal to the number n of backup proxy servers, and then to proceed to the operation of matching the names of the data blocks with different backup proxy servers according to the workload of the backup proxy servers.

The matching module 53 is configured to match the names of the data blocks with different backup proxy servers according to the workload of the backup proxy servers.

The transceiver module 51 is further configured to send the node storage address of the data block to a backup proxy server matched with the name of the data block, so that the backup proxy server obtains the data block from the data node corresponding to the node storage address according to the node storage address, and backs up the data block to a backup medium.
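The workload rule applied by the generating module 52 can be written out as a short sketch. The function name `workload_bounds` is hypothetical, and INT is taken to mean integer (floor) division, which is an assumption about the notation.

```python
def workload_bounds(m, n):
    """Return (min_blocks, max_blocks) for one backup proxy server.

    m: number of data blocks in the file
    n: number of backup proxy servers
    """
    if n <= 0:
        raise ValueError("at least one backup proxy server is required")
    if m > n:
        base = m // n  # INT(m/n), assumed to be floor division
        return base, base + 1
    # m <= n: each backup proxy server handles one block or none.
    return 0, 1


bounds_example = workload_bounds(5, 3)
```

For the 5-block, 3-server example used in the method embodiment this gives a workload of between 1 and 2 data blocks per backup proxy server.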

The file backup apparatus provided in this embodiment may be used to implement the file backup method in fig. 1 to fig. 2, and for specific description, reference may be made to the embodiment of the file backup method described above, and a description thereof is not repeated here.

In the technical solution of the file backup apparatus provided by the embodiment of the invention, a backup request for a file is sent to a command node, so that the command node queries the names and node storage addresses of the data blocks in the file according to the backup request and sends them to the backup server; the workload of the backup proxy servers is generated according to the number of the acquired data blocks and the number of the backup proxy servers; and the node storage address of each data block is sent to the backup proxy server matched with the name of that data block, so that the backup proxy server acquires the data block from the data node corresponding to the node storage address and backs it up to a backup medium. This reduces the network resource consumption of the command node during file backup, shortens the backup time, improves the underlying input/output of the data nodes, and increases the running speed of the whole big data platform.

Fig. 6 is a schematic structural diagram of a file backup system according to an embodiment of the present invention, as shown in fig. 6, the system includes: backup server 61, command node 62, at least one backup proxy server 63, at least one data node 64, and backup media 64.

A backup server 61 for sending a backup request of a file to the command node 62, the file comprising at least one data block, the backup request comprising a name of the file.

The command node 62 is configured to query the names of the data blocks in the file and the node storage addresses of the data blocks according to the name of the file, and to send the names of the data blocks and the node storage addresses of the data blocks to the backup server 61.

The backup server 61 is further configured to generate the workload of the backup proxy servers 63 according to the number of the acquired data blocks and the number of the backup proxy servers 63, to match the names of the data blocks with different backup proxy servers 63 according to that workload, and to send the node storage address of each data block to the backup proxy server 63 matched with the name of that data block.

In this embodiment, the number of the acquired data blocks is m and the number of backup proxy servers is n. When m is greater than n, the workload of each backup proxy server is at least INT(m/n) data blocks and at most INT(m/n)+1 data blocks; when m is less than or equal to n, the workload of each backup proxy server is 1 or 0 data blocks.

Specifically, the names of the data blocks are matched with different backup proxy servers 63 through an optimal balancing algorithm according to the workload of the backup proxy servers 63. The allocation principle of the optimal balancing algorithm is to spread the names of the data blocks across different backup proxy servers as far as possible.
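The text names an optimal balancing algorithm without specifying it. One allocation that satisfies the workload bounds is a contiguous split in which the first m mod n backup proxy servers take one extra data block. The sketch below is an assumption chosen for illustration, not the algorithm of the embodiment, and the function name `match_blocks` is hypothetical.

```python
def match_blocks(block_names, proxies):
    """Assign data block names to backup proxy servers in contiguous runs.

    Each proxy receives INT(m/n) blocks, and the first (m mod n) proxies
    receive one more, so every workload stays within the stated bounds.
    """
    n = len(proxies)
    if n == 0:
        raise ValueError("at least one backup proxy server is required")
    base, extra = divmod(len(block_names), n)
    assignment = {}
    start = 0
    for index, proxy in enumerate(proxies):
        count = base + (1 if index < extra else 0)
        assignment[proxy] = list(block_names[start:start + count])
        start += count
    return assignment


assignment = match_blocks([1, 2, 3, 4, 5], ["proxy 1", "proxy 2", "proxy 3"])
# assignment == {"proxy 1": [1, 2], "proxy 2": [3, 4], "proxy 3": [5]}
```

With 5 data blocks and 3 backup proxy servers this split happens to reproduce the distribution in the method example, where backup proxy server 1 handles data blocks 1 and 2, backup proxy server 2 handles data blocks 3 and 4, and backup proxy server 3 handles data block 5.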

The backup proxy server 63 is configured to obtain the data blocks from the data nodes 64 corresponding to the node storage addresses according to the node storage addresses, and to back up the data blocks to the backup medium 64.

In this embodiment, the backup proxy server 63 is specifically configured to: acquire a data block from the data node corresponding to the node storage address of the data block; determine whether the data block is already stored in the backup medium; if the data block is already stored in the backup medium, obtain the data block pointer and the backup medium storage address of the data block and proceed to the operation of sending the backup result of the backup proxy server 63 to the backup server 61; and if the data block is not stored in the backup medium, back up the data block to the backup medium and obtain the data block pointer and the backup medium storage address of the data block.

In this embodiment, the backup proxy server 63 is further configured to send the backup result of the backup proxy server 63 to the backup server 61.

In this embodiment, the backup server 61 is further configured to combine the backup results sent by the backup proxy server into a backup record, and store the backup record in the database.

The file backup system provided by this embodiment may be used to implement the file backup method in fig. 3 to fig. 4, and for specific description, reference may be made to the embodiment of the file backup method described above, and a description thereof is not repeated here.

In the technical solution of the file backup system provided by the embodiment of the invention, the backup server sends a backup request for a file to the command node, where the file comprises at least one data block and the backup request comprises the name of the file; the command node queries the names of the data blocks in the file and the node storage addresses of the data blocks according to the name of the file, and sends them to the backup server; the backup server generates the workload of the backup proxy servers according to the number of the acquired data blocks and the number of the backup proxy servers; the backup server matches the names of the data blocks with different backup proxy servers according to that workload, and sends the node storage address of each data block to the backup proxy server matched with the name of that data block; and the backup proxy server acquires the data block from the data node corresponding to the node storage address and backs it up to the backup medium. This reduces the network resource consumption of the command node during file backup, shortens the backup time, improves the underlying input/output of the data nodes, and increases the running speed of the whole big data platform.

Fig. 7 is a schematic diagram of a computer device according to an embodiment of the present invention. As shown in fig. 7, the computer device 20 of this embodiment includes a processor 21, a memory 22, and a computer program 23 stored in the memory 22 and capable of running on the processor 21. When executed by the processor 21, the computer program 23 implements the file backup method of the embodiment; to avoid repetition, the details are not described again here. Alternatively, when executed by the processor 21, the computer program implements the functions of the modules/units of the file backup apparatus of the embodiment, which are likewise not described again here.

The computer device 20 includes, but is not limited to, the processor 21 and the memory 22. Those skilled in the art will appreciate that fig. 7 is only an example of the computer device 20 and does not limit it: the computer device may include more or fewer components than shown, may combine some components, or may use different components. For example, the computer device may also include input/output devices, network access devices, buses, and the like.

The Processor 21 may be a Central Processing Unit (CPU), other general-purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 22 may be an internal storage unit of the computer device 20, such as a hard disk or a memory of the computer device 20. The memory 22 may also be an external storage device of the computer device 20, such as a plug-in hard disk provided on the computer device 20, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 22 may include both an internal storage unit of the computer device 20 and an external storage device. The memory 22 is used for storing the computer program and other programs and data required by the computer device. The memory 22 may also be used to temporarily store data that has been output or is to be output.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is only one logical division, and other divisions are possible in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A method for file backup, the method comprising:

2. A method for file backup, the method comprising:

3. The method for file backup according to claim 2, wherein the backup server generates the workload of the backup proxy server according to the number of the acquired data blocks and the number of the backup proxy servers, and specifically comprises:

4. The method for backing up files according to claim 2, wherein the backup proxy server obtains the data blocks from the data nodes corresponding to the node storage addresses according to the node storage addresses, and backs up the data blocks to a backup medium, specifically comprising:

5. The method for backing up files according to claim 2 or 4, wherein the backup proxy server obtains the data blocks from the data nodes corresponding to the node storage addresses according to the node storage addresses, and further comprises, after backing up the data blocks to a backup medium:

6. The method of claim 5, wherein the backup result comprises a name of the data block, the data block pointer, the backup medium storage address, and a backup completion.

7. A file backup apparatus, characterized in that the apparatus comprises:

8. A file backup system, the system comprising: the system comprises a backup server, a command node, at least one backup proxy server, at least one data node and a backup medium;

9. A storage medium, characterized in that the storage medium includes a stored program, wherein, when the program runs, a device on which the storage medium is located is controlled to execute the file backup method according to claim 1.

10. A computer device comprising a memory for storing information including program instructions and a processor for controlling the execution of the program instructions, wherein the program instructions are loaded and executed by the processor to implement the steps of the file backup method of claim 1.