CN110795404B - Hadoop distributed file system and operation method and repair method thereof

Info

Publication number: CN110795404B
Application number: CN201911056278.XA
Authority: CN
Inventors: 樊林
Original assignee: BOE Technology Group Co Ltd
Current assignee: BOE Technology Group Co Ltd
Priority date: 2019-10-31
Filing date: 2019-10-31
Publication date: 2023-04-07
Anticipated expiration: 2039-10-31
Also published as: CN110795404A

Abstract

Embodiments of the invention provide a Hadoop distributed file system, an operation method thereof and a repair method thereof, which belong to the technical field of data processing and can save more metadata sequences at a higher frequency, enhancing data security. The Hadoop distributed file system comprises a first name node and a second name node connected with the first name node. The first name node is used for storing a metadata file, which comprises an image file and an edit log. The first name node is further used for merging the image file and the edit log every preset first duration to form a new image file and for starting a new edit log. The second name node is used for backing up the new image file on the first name node every preset first duration, and is further used for backing up the edit log on the first name node every preset second duration.

Description

Hadoop distributed file system and operation method and repair method thereof

Technical Field

The invention relates to the technical field of data processing, in particular to a Hadoop distributed file system and an operation method and a repair method thereof.

Background

The storage system of a Hadoop cluster is the Hadoop Distributed File System (HDFS), which effectively solves the problems of storing and managing massive data. Specifically, a file system fixed at one location is extended to any number of locations and file systems, and multiple nodes form a file system network. The nodes may be located in different places and communicate and transfer data over the network.

When HDFS is used, the user need not be concerned with which node data is stored on or retrieved from, and only needs to manage and store data in the file system as if using a local file system.

Disclosure of Invention

The embodiment of the invention provides a Hadoop distributed file system, an operation method and a repair method thereof, which can save more metadata sequences at a higher frequency and enhance the data security.

In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:

In one aspect, a Hadoop distributed file system is provided, including a first name node and a second name node connected with the first name node; the first name node is used for storing a metadata file; the metadata file comprises an image file and an edit log; the first name node is further used for merging the image file and the edit log every preset first duration to form a new image file and for starting a new edit log; the second name node is used for backing up the new image file on the first name node every preset first duration; the second name node is further used for backing up the edit log on the first name node every preset second duration; the second duration is less than the first duration; the first name node and the second name node are deployed on different hosts.

In another aspect, a method for operating the Hadoop distributed file system as described above is provided, including: starting a first name node, loading a metadata file, and simultaneously starting a second name node; the metadata file comprises a mirror image file and an editing log; combining the image file and the editing log by the first name node every other preset first time length to form a new image file and start a new editing log; the second name node backs up the new image file on the first name node; the second name node backs up the editing log on the first name node every other preset second time length; the second duration is less than the first duration; the first name node and the second name node are respectively deployed in different hosts.

In another aspect, a repair method for the Hadoop distributed file system is provided, where the repair method includes: stopping the damaged first name node and stopping the second name node at the same time; backing up the metadata file on the second name node; the metadata file comprises a mirror image file and an editing log which are backed up by the second name node from the damaged first name node for the last time; preparing a host for deploying a new first name node; the host for deploying the new first name node is different from the host for deploying the damaged first name node and the second name node; configuring the name, IP address, mutual trust login, operation environment and cluster of the host; performing formatting on the new first name node; sending the backup image file and the backup editing log on the second name node to the new first name node; and modifying the metadata sequence number of the new first name node.

Optionally, after modifying the metadata sequence number of the new first name node, further comprising: starting a new first name node and a corresponding Hadoop distributed file system; and executing Hadoop distributed file system check.

In yet another aspect, a Hadoop distributed file system is provided, comprising a first name node and a second name node connected with the first name node; the first name node is used for storing a metadata file; the metadata file comprises an image file and an edit log; the first name node is further used for starting a new edit log every preset first duration; the second name node is configured to back up the original image file and the original edit log on the first name node every preset first duration, merge the backed-up image file and edit log to form a new image file, and send the new image file back to the first name node to replace the original image file of the first name node; the second name node is further configured to back up the edit log on the first name node every preset second duration to form an intermediate edit log; the second duration is less than the first duration; the first name node and the second name node are deployed on different hosts.

In another aspect, a method for operating the Hadoop distributed file system as described above is provided, including: starting the first name node, loading the metadata file, and simultaneously starting the second name node; the metadata file comprises an image file and an edit log; the first name node starts a new edit log every preset first duration; the second name node backs up the original image file and the original edit log on the first name node, merges the backed-up image file and edit log to form a new image file, and sends the new image file back to the first name node to replace the original image file of the first name node; the second name node backs up the edit log on the first name node every preset second duration to form an intermediate edit log; the second duration is less than the first duration; the first name node and the second name node are deployed on different hosts.

In another aspect, a repair method for the Hadoop distributed file system is provided, including: stopping the damaged first name node and stopping the second name node at the same time; backing up the metadata file on the second name node; the metadata file comprises the image file and the edit log that the second name node last backed up from the damaged first name node, and the intermediate edit log that was last formed; preparing a host for deploying a new first name node; the host for deploying the new first name node is different from the hosts on which the damaged first name node and the second name node are deployed; configuring the name, IP address, mutual-trust login, operating environment and cluster of the host; formatting the new first name node; merging the backed-up image file, edit log and intermediate edit log on the second name node to generate a new image file, and sending the new image file to the new first name node; and modifying the metadata sequence number of the new first name node.

In yet another aspect, a computer device is provided that includes a storage unit and a processing unit; the storage unit stores therein a computer program executable on the processing unit and stores the result; when the processing unit executes the computer program, the operation method of the Hadoop distributed file system and/or the repair method of the Hadoop distributed file system are/is realized.

In a further aspect, a computer-readable medium is provided, which stores a computer program that, when executed by a processor, implements a method of operating a Hadoop distributed file system as described above, and/or a method of repairing a Hadoop distributed file system as described above.

The embodiments of the invention provide a Hadoop distributed file system, an operation method thereof and a repair method thereof. The first name node merges the image file and the edit log every preset first duration to form a new image file and starts a new edit log; on this basis, the second name node connected with the first name node backs up the edit log on the first name node every preset second duration, the second duration being shorter than the first duration. The edit log is thus backed up at a higher frequency, more metadata sequences are saved, and the security of data in the Hadoop distributed file system is enhanced.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a schematic structural diagram of a conventional Hadoop distributed file system;

FIG. 2 is a schematic flow chart illustrating a method for operating a Hadoop distributed file system according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart illustrating a repair method for a Hadoop distributed file system according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart illustrating a repair method for a Hadoop distributed file system according to an embodiment of the present invention;

FIG. 5 is a schematic flowchart of a method for operating a Hadoop distributed file system according to another embodiment of the present invention;

FIG. 6 is a schematic flow chart illustrating another repair method for a Hadoop distributed file system according to an embodiment of the present invention;

FIG. 7 is a schematic flow chart of a repair method for a Hadoop distributed file system according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the prior art, as shown in FIG. 1, a Hadoop distributed file system generally includes a first name node (NameNode), a second name node (Secondary NameNode) connected with the first name node, and a plurality of data nodes (DataNodes) connected with the first name node.

The first name node is used for storing a metadata file; the metadata file includes an image file and an edit log.

Metadata is defined as data that describes data: mainly information describing data attributes, used to support functions such as indicating storage locations, recording history, resource lookup and file recording. Metadata acts as an electronic catalog. The image file is the file formed by serializing all metadata; the edit log records each update that a client makes to the metadata.

The first name node is also used for metadata management, such as query and modification, and is responsible for clients' access to files.

The second name node is a cold backup of the first name node.

The data nodes provide data-block storage for the HDFS; data nodes are typically organized in racks, each of which connects all its systems together through a switch.

It should be noted that the first name node is the core of the entire Hadoop distributed file system: once it is damaged, the data security of the entire file system is endangered. To protect the first name node, the second name node is therefore added to periodically back up the metadata file of the first name node. If the first name node is damaged, the second name node does not take over its work, but it holds part of the first name node's information and can send it back to the first name node, thereby reducing the loss caused by the damage.

However, as the Hadoop distributed file system grows, the first name node becomes very busy and the metadata file becomes very large, so backing it up at high frequency is impractical and adds load. Yet if the backup interval is long and the first name node is suddenly damaged during the interval, all metadata changes made between the last backup and the moment of damage are lost.

Based on the above problem, an embodiment of the present invention provides a Hadoop distributed file system, including: the system comprises a first name node and a second name node connected with the first name node.

The first name node is used for storing a metadata file; the metadata file includes an image file and an edit log.

The first name node is further used for merging the image file and the edit log every preset first duration to form a new image file; at the same time, a new edit log is started.

The newly started edit log is an empty file.

The second name node is used for backing up the new image file on the first name node every preset first duration; the second name node is further used for backing up the edit log on the first name node every preset second duration; the second duration is less than the first duration.

The first name node and the second name node are respectively deployed in different hosts.

It should be noted that, when the second name node backs up the metadata file of the first name node, the metadata file backed up last time is deleted at the same time.

For example, suppose the preset first duration is 1 hour and the preset second duration is 10 minutes. At 2:00, the first name node merges the image file with the edit log to form a new image file A₂ and at the same time starts a new edit log B₂; the second name node backs up the new image file A₂ and the new edit log B₂.

At 2:10, the second name node backs up the edit log B₂′ on the first name node (at this time, the edit log B₂′ stores the change sequence of metadata from 2:00 to 2:10) and at the same time deletes the edit log B₂ backed up at 2:00.

At 2:20, the second name node backs up the edit log B₂″ on the first name node (at this time, the edit log B₂″ stores the change sequence of metadata from 2:00 to 2:20) and at the same time deletes the edit log B₂′ backed up at 2:10.

And so on until 3:00, when the first name node merges the image file A₂ and the edit log B₂″″′ (at this time, the edit log stores the change sequence of metadata from 2:00 to 3:00) to form a new image file A₃ and at the same time starts a new edit log B₃. The second name node backs up the new image file A₃ and the new edit log B₃ on the first name node, and at the same time deletes the image file A₂ backed up at 2:00 and the edit log B₂″″′ backed up at 2:50.

By setting different intervals for the second name node's backups of the image file and of the edit log on the first name node, the smaller edit log can be backed up independently of the image file at a higher frequency, so that more metadata sequences are saved and the loss caused by damage to the first name node is further reduced.
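
The backup schedule in the example above can be sketched in Python. This is only an illustrative simulation, not the patented implementation: the function name `simulate_backups`, the use of minutes counted from 2:00, and the assumption that the first duration is a whole multiple of the second duration are all introduced here.

```python
def simulate_backups(first_min, second_min, until_min):
    """Return (now, image_time, edit_log_time) held by the second name node
    after each backup tick; times are minutes counted from 2:00.

    Assumption (not from the source): first_min is a multiple of second_min.
    """
    if first_min % second_min != 0:
        raise ValueError("first_min must be a multiple of second_min")
    image_time = None
    timeline = []
    for now in range(0, until_min + 1, second_min):
        if now % first_min == 0:
            # The first name node has just merged image file + edit log,
            # so the second name node replaces its image-file backup.
            image_time = now
        # On every tick the newest edit-log copy replaces the previous one.
        edit_log_time = now
        timeline.append((now, image_time, edit_log_time))
    return timeline


timeline = simulate_backups(first_min=60, second_min=10, until_min=60)
for now, image_time, edit_log_time in timeline:
    print(f"t={now:2d} min: image from t={image_time}, edit log from t={edit_log_time}")
```

Between 2:00 and 2:50 the held image file stays the one from minute 0 while the edit-log copy advances every 10 minutes; at minute 60 both are replaced.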

The embodiment of the invention provides a Hadoop distributed file system in which the first name node merges the image file and the edit log every preset first duration to form a new image file and starts a new edit log. On this basis, the second name node connected with the first name node backs up the new image file on the first name node every preset first duration and backs up the edit log on the first name node every preset second duration, the second duration being less than the first duration. The edit log is thus backed up at a higher frequency, more metadata sequences are saved, and data security in the Hadoop distributed file system is enhanced.

The embodiment of the present invention further provides an operating method of a Hadoop distributed file system, as shown in fig. 2, including:

S10, starting the first name node, loading the metadata file, and simultaneously starting the second name node.

The metadata file includes an image file and an edit log.

It should be noted that if the first name node is being started for the first time, a new image file and a new edit log are created; otherwise, the existing image file and edit log are loaded directly.

S11, the first name node merges the image file and the edit log every preset first duration to form a new image file, and starts a new edit log.

S12, the second name node backs up the new image file on the first name node every preset first duration.

S13, the second name node backs up the edit log on the first name node every preset second duration.

Wherein the second duration is less than the first duration. The first name node and the second name node are respectively deployed in different hosts.

The operation method of the Hadoop distributed file system provided by the embodiment of the invention has the same beneficial effects as the Hadoop distributed file system, and is not repeated herein.

The embodiment of the present invention further provides a repair method for a Hadoop distributed file system, as shown in fig. 3, including:

S20, stopping the damaged first name node and stopping the second name node at the same time.

It can be understood that the related services of the damaged first name node and of the second name node are stopped to prevent the data blocks of the Hadoop distributed file system from being changed.

S21, backing up the metadata file on the second name node.

The metadata file includes the image file and the edit log that the second name node last backed up from the damaged first name node.

It can be appreciated that, because the image file and the edit log are backed up at different intervals, the time at which the image file was last backed up from the damaged first name node differs from the time at which the edit log was last backed up.

For example, suppose the preset first duration is 1 hour and the preset second duration is 10 minutes. At 2:00, the first name node merges the image file with the edit log to form a new image file A₂ and at the same time starts a new edit log B₂; the second name node backs up the new image file A₂ and the new edit log B₂. At 2:10, the second name node backs up the edit log B₂′ on the first name node (at this time, the edit log B₂′ stores the change sequence of metadata from 2:00 to 2:10) and at the same time deletes the edit log B₂ backed up at 2:00.

If the first name node is damaged at 2:19, the metadata file last backed up from the damaged first name node on the second name node consists of the image file A₂ backed up at 2:00 and the edit log B₂′ backed up at 2:10.
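
The amount of metadata at risk in this example can be worked out with a few lines of Python. The helper names `last_backup` and `unrecoverable_minutes` are hypothetical, and backups are assumed to fire exactly on multiples of each interval, counted in minutes from 2:00.

```python
def last_backup(failure_min, interval_min):
    """Minute of the most recent backup at or before the failure."""
    return (failure_min // interval_min) * interval_min


def unrecoverable_minutes(failure_min, first_min, second_min):
    """Return (image_backup_time, edit_log_backup_time, lost_minutes)."""
    image_at = last_backup(failure_min, first_min)
    edit_log_at = last_backup(failure_min, second_min)
    # Changes made after the last edit-log backup cannot be recovered.
    return image_at, edit_log_at, failure_min - edit_log_at


# Failure at 2:19, i.e. minute 19 counted from 2:00.
result = unrecoverable_minutes(failure_min=19, first_min=60, second_min=10)
print(result)  # (0, 10, 9)
```

Under these assumptions the second name node holds A₂ from 2:00 and B₂′ from 2:10, so 9 minutes of changes are lost; if the edit log were backed up only hourly, 19 minutes would be lost.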

S22, preparing to deploy the host of the new first name node.

The host for deploying the new first name node is different from the host for deploying the damaged first name node and the second name node.

S23, configuring the name, IP address, mutual-trust login, operating environment and cluster of the host.

S24, formatting the new first name node.

It should be noted that formatting clears the metadata sequence number on the new first name node.

S25, sending the backed-up image file and edit log on the second name node to the new first name node.

Illustratively, following the example in S21, the metadata file last backed up from the damaged first name node on the second name node consists of the image file A₂ backed up at 2:00 and the edit log B₂′ backed up at 2:10; the image file A₂ and the edit log B₂′ are sent to the new first name node.

S26, modifying the metadata sequence number of the new first name node.

It will be appreciated that the metadata sequence number of the new first name node is manually modified so that it can be concatenated with the metadata sequence in the edit log sent back, thereby continuing to extend the entire metadata sequence while the new first name node is running.
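
One way to picture S24 to S26 is the following Python sketch. The source does not say how the new sequence number is chosen; the sketch assumes that each edit-log entry carries a sequence number and that the new first name node should continue from the highest number it received. The class `NameNodeState` and the function `restore` are hypothetical names, not Hadoop APIs.

```python
class NameNodeState:
    """Toy model of a name node's metadata (not a Hadoop API)."""

    def __init__(self):
        self.image = {}      # path -> value, the serialized metadata
        self.edit_log = []   # list of (seq, path, value) entries
        self.next_seq = 1    # formatting clears the sequence number

    def record(self, path, value):
        """Append one metadata change using the next sequence number."""
        self.edit_log.append((self.next_seq, path, value))
        self.next_seq += 1


def restore(image, image_seq, edit_log):
    """Rebuild a freshly formatted node from backed-up metadata (S24 to S26)."""
    node = NameNodeState()            # S24: formatted, sequence cleared
    node.image = dict(image)          # S25: receive the backed-up image file
    node.edit_log = list(edit_log)    # S25: receive the backed-up edit log
    last_seq = edit_log[-1][0] if edit_log else image_seq
    node.next_seq = last_seq + 1      # S26: continue the existing sequence
    return node


node = restore({"/a": 1}, image_seq=5, edit_log=[(6, "/b", 2), (7, "/c", 3)])
node.record("/d", 4)
print(node.edit_log[-1])  # (8, '/d', 4)
```

Without the adjustment in S26, the first change recorded after recovery would reuse sequence number 1 and would no longer join the restored sequence.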

Optionally, after S26, as shown in fig. 4, the repair method for a Hadoop distributed file system further includes:

S27, starting the new first name node and the corresponding Hadoop distributed file system.

S28, performing a check of the Hadoop distributed file system.

If the check confirms that the Hadoop distributed file system has been repaired, the repair is finished; if it has not been repaired, steps S22 to S26 are executed again to repeat the repair.

An embodiment of the present invention further provides a Hadoop distributed file system, including: the system comprises a first name node and a second name node connected with the first name node.

The first name node is used for storing a metadata file; the metadata file comprises an image file and an edit log; the first name node is further used for starting a new edit log every preset first duration.

The second name node is used for backing up the original image file and the original edit log on the first name node every preset first duration, merging the backed-up image file and edit log to form a new image file, and sending the new image file back to the first name node to replace the original image file of the first name node.

The second name node is further used for backing up the edit log on the first name node every preset second duration to form an intermediate edit log; the second duration is less than the first duration.

For example, suppose the preset first duration is 1 hour and the preset second duration is 15 minutes. At 1:00, the first name node starts a new edit log b₁; the second name node backs up the original image file and the original edit log on the first name node, merges the backed-up image file and edit log to form a new image file a₁, and sends it back to the first name node to replace the original image file of the first name node; at the same time, the second name node backs up the new edit log b₁ on the first name node to form an intermediate edit log c₁.

At 1:15, the second name node backs up the edit log b₁′ on the first name node (at this time, the edit log b₁′ stores the change sequence of metadata from 1:00 to 1:15) to form an intermediate edit log c₁′, and at the same time deletes the previously backed-up intermediate edit log c₁.

At 1:30, the second name node backs up the edit log b₁″ on the first name node (at this time, the edit log b₁″ stores the change sequence of metadata from 1:00 to 1:30) to form an intermediate edit log c₁″, and at the same time deletes the previously backed-up intermediate edit log c₁′.

And so on until 2:00, when the first name node starts a new edit log b₂; the second name node backs up the image file a₁ and the edit log b₁″″ on the first name node (at this time, the edit log b₁″″ stores the change sequence of metadata from 1:00 to 2:00), merges the backed-up image file a₁ and edit log b₁″″ to form a new image file a₂, and sends it back to the first name node to replace the original image file a₁ of the first name node; at the same time, the second name node backs up the new edit log b₂ on the first name node to form an intermediate edit log c₂.

Because different intervals are set for the second name node's backups of the image file and of the edit log on the first name node, the smaller edit log can be backed up independently of the image file at a higher frequency, so that more metadata sequences are saved and the loss caused by damage to the first name node is reduced.
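
The division of work in this embodiment can be sketched in Python as follows. This is a toy in-memory model under stated assumptions: metadata is a dictionary, an edit log is a list of (path, value) pairs, and the names `FirstNameNode`, `SecondNameNode`, `checkpoint` and `copy_edit_log` are hypothetical, not Hadoop APIs.

```python
def merge(image, edit_log):
    """Apply edit-log entries to a copy of the image and return the result."""
    merged = dict(image)
    for path, value in edit_log:
        merged[path] = value
    return merged


class FirstNameNode:
    def __init__(self):
        self.image = {}
        self.edit_log = []

    def record(self, path, value):
        self.edit_log.append((path, value))

    def roll_edit_log(self):
        """Start a new, empty edit log and hand back the original one."""
        original, self.edit_log = self.edit_log, []
        return original


class SecondNameNode:
    def __init__(self):
        self.image = {}
        self.edit_log = []
        self.intermediate = []

    def checkpoint(self, primary):
        """Every first duration: back up, merge, send the new image back."""
        self.image = dict(primary.image)
        self.edit_log = primary.roll_edit_log()
        primary.image = merge(self.image, self.edit_log)
        self.copy_edit_log(primary)

    def copy_edit_log(self, primary):
        """Every second duration: replace the intermediate edit log."""
        self.intermediate = list(primary.edit_log)


primary, secondary = FirstNameNode(), SecondNameNode()
primary.record("/a", 1)
secondary.checkpoint(primary)        # e.g. at 1:00
primary.record("/b", 2)
secondary.copy_edit_log(primary)     # e.g. at 1:15
print(primary.image, secondary.intermediate)  # {'/a': 1} [('/b', 2)]
```

In this model the merge work runs on the second name node, and the first name node only rolls its edit log and receives the merged image file.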

The embodiment of the invention also provides a Hadoop distributed file system in which the first name node starts a new edit log every preset first duration. The second name node connected with the first name node backs up the original image file and the original edit log on the first name node every preset first duration, merges the backed-up image file and edit log to form a new image file, and sends the new image file back to the first name node to replace the original image file of the first name node. In addition, the second name node backs up the edit log on the first name node every preset second duration to form an intermediate edit log. Because the second duration is shorter than the first duration, the edit log is backed up independently at a higher frequency, more metadata sequences are saved, and the security of data in the Hadoop distributed file system is enhanced.

The embodiment of the present invention further provides an operating method of a Hadoop distributed file system, as shown in fig. 5, including:

S100, starting the first name node, loading the metadata file, and simultaneously starting the second name node.

The metadata file includes an image file and an edit log.

S110, the first name node starts a new edit log every preset first duration.

S120, the second name node backs up the original image file and the original edit log on the first name node every preset first duration, merges the backed-up image file and edit log to form a new image file, and sends the new image file back to the first name node to replace the original image file of the first name node.

S130, the second name node backs up the edit log on the first name node every preset second duration to form an intermediate edit log.

The operation method of the Hadoop distributed file system provided by the embodiment of the invention has the same beneficial effects as the Hadoop distributed file system, and is not described again here.

The embodiment of the present invention further provides a repair method for a Hadoop distributed file system, as shown in fig. 6, including:

S200, stopping the damaged first name node and stopping the second name node at the same time.

S210, backing up the metadata file on the second name node.

The metadata file comprises the image file and the edit log that the second name node last backed up from the damaged first name node, and the intermediate edit log that was last formed.

It can be understood that, because the image file and the edit log are backed up at one interval while the intermediate edit log is formed at another, the time at which the image file and the edit log were last backed up from the damaged first name node differs from the time at which the intermediate edit log was last formed.

For example, if the first name node is damaged at 1:25, the metadata file last backed up from the damaged first name node on the second name node consists of the image file a₀ and the edit log b₀ backed up at 1:00, and the intermediate edit log c₁′ formed at 1:15 by backing up the edit log b₁′ (at this time, the edit log b₁′ stores the change sequence of metadata from 1:00 to 1:15).

S220, preparing to deploy the host of the new first name node.

S230, configuring the name, IP address, mutual-trust login, operating environment and cluster of the host.

S240, formatting the new first name node.

S250, merging the backed-up image file, edit log and intermediate edit log on the second name node to generate a new image file, and sending the new image file to the new first name node.

Illustratively, following the example in S210, the metadata file last backed up from the damaged first name node on the second name node consists of the image file a₀ and the edit log b₀ backed up at 1:00, and the intermediate edit log c₁′ formed at 1:15 by backing up the edit log b₁′ (at this time, the edit log b₁′ stores the change sequence of metadata from 1:00 to 1:15).

Merging the image file a₀, the edit log b₀ and the intermediate edit log c₁′ is equivalent to first merging the image file a₀ and the edit log b₀ to generate the image file a₁ corresponding to 1:00, and then merging the image file a₁ with the intermediate edit log c₁′ to generate a new image file, which is sent to the new first name node.
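
The equivalence described above can be checked on a toy example. In this Python sketch an image file is modeled as a dictionary and an edit log as an ordered list of (path, value) changes, with `None` standing for a deletion; these representations and the function name `merge` are assumptions for illustration only.

```python
def merge(image, *edit_logs):
    """Replay one or more edit logs, in order, onto a copy of the image."""
    merged = dict(image)
    for log in edit_logs:
        for path, value in log:
            if value is None:
                merged.pop(path, None)   # deletion
            else:
                merged[path] = value     # creation or update
    return merged


a0 = {"/data": "v0"}                         # image file backed up at 1:00
b0 = [("/data", "v1"), ("/tmp", "x")]        # edit log backed up at 1:00
c1_prime = [("/tmp", None), ("/new", "y")]   # intermediate edit log from 1:15

one_pass = merge(a0, b0, c1_prime)
a1 = merge(a0, b0)                           # image file corresponding to 1:00
two_step = merge(a1, c1_prime)

print(one_pass == two_step, one_pass)  # True {'/data': 'v1', '/new': 'y'}
```

The two results agree because replaying the logs in order applies exactly the same changes whether or not an intermediate image is materialized in between.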

S260, modifying the metadata sequence number of the new first name node.

It will be appreciated that the metadata sequence number of the new first name node is manually modified so that it can be concatenated with the metadata sequence in the image file sent back, thereby continuing to extend the entire metadata sequence while the new first name node is running.

Optionally, after S260, as shown in fig. 7, the repair method for a Hadoop distributed file system further includes:

and S270, starting a new first name node and a corresponding Hadoop distributed file system.

S280, checking the data blocks of the distributed file system.

If the check confirms that the repair succeeded, the repair is complete; otherwise, steps S220 to S260 are executed again to repeat the repair.
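The check-and-retry flow of S270 and S280 can be sketched as a simple loop. The three callables are hypothetical stand-ins for the steps named in their comments, and the retry cap is an assumption added for the sketch, since the description does not state how many times the repair may be repeated.

```python
def repair_until_confirmed(redeploy, start, check, max_attempts=3):
    """Repeat redeploy -> start -> check until the check passes.

    redeploy: callable standing in for S220 to S260
    start:    callable standing in for S270
    check:    callable standing in for S280; returns True if the repair is confirmed
    max_attempts: assumed retry cap (not specified in the description)
    """
    for attempt in range(1, max_attempts + 1):
        redeploy()
        start()
        if check():
            return attempt
    raise RuntimeError(f"repair not confirmed after {max_attempts} attempts")

# Stub run: the first check fails and the second succeeds.
log = []
results = iter([False, True])
attempts = repair_until_confirmed(
    redeploy=lambda: log.append("S220-S260"),
    start=lambda: log.append("S270"),
    check=lambda: (log.append("S280"), next(results))[1],
)
print(attempts, log)
```

In a real deployment the stubs would be replaced by the actual deployment, start-up, and file system check procedures.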

An embodiment of the present invention further provides a computer device comprising a storage unit and a processing unit; the storage unit stores a computer program executable on the processing unit and stores the results; when the processing unit executes the computer program, the operation method of the Hadoop distributed file system and/or the repair method of the Hadoop distributed file system described above is implemented.

Embodiments of the present invention also provide a computer readable medium storing a computer program, which when executed by a processor implements a method for operating a Hadoop distributed file system as described above and/or a method for repairing a Hadoop distributed file system as described above.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A Hadoop distributed file system, comprising: a first name node and a second name node connected to the first name node;

the first name node is used for storing a metadata file; the metadata file comprises an image file and an editing log;

the first name node is further used for merging the image file and the editing log at intervals of a preset first time length to form a new image file and starting a new editing log;

the second name node is used for backing up the new image file on the first name node at intervals of the preset first duration, and is further used for backing up the editing log on the first name node at intervals of a preset second duration; the second duration is less than the first duration;

2. A method of operating the Hadoop distributed file system according to claim 1, comprising:

starting a first name node, loading a metadata file, and simultaneously starting a second name node; the metadata file comprises an image file and an editing log;

the first name node merges the image file and the editing log at intervals of a preset first duration to form a new image file, and starts a new editing log at the same time; the second name node backs up the new image file on the first name node;

the second name node backs up the editing log on the first name node at intervals of a preset second duration; the second duration is less than the first duration;

3. A repair method for a Hadoop distributed file system according to claim 1, comprising:

stopping the damaged first name node and stopping the second name node at the same time;

backing up the metadata file on the second name node; the metadata file comprises an image file and an editing log which are backed up by the second name node from the damaged first name node for the last time;

preparing a host for deploying a new first name node; the host for deploying the new first name node is different from the host for deploying the damaged first name node and the second name node;

configuring the name, IP address, mutual trust login, operating environment and cluster of the host;

performing formatting on the new first name node;

sending the image file and the editing log on the backed-up second name node to the new first name node;

and modifying the metadata sequence number of the new first name node.

4. The repair method for the Hadoop distributed file system according to claim 3, further comprising, after modifying the metadata sequence number of the new first name node:

starting a new first name node and a corresponding Hadoop distributed file system;

and executing a Hadoop distributed file system check.

5. A Hadoop distributed file system, comprising: a first name node and a second name node connected to the first name node;

the first name node is used for storing a metadata file; the metadata file comprises an image file and an editing log; the first name node is further used for starting a new editing log at intervals of a preset first duration;

the second name node is configured to back up the original image file and the original editing log on the first name node at intervals of the preset first duration, merge the backed-up image file and editing log to form a new image file, and send the new image file back to the first name node to replace the original image file of the first name node;

the second name node is further configured to back up the editing log on the first name node at intervals of a preset second duration to form an intermediate editing log; the second duration is less than the first duration;

6. A method of operating the Hadoop distributed file system of claim 5, comprising:

starting, by the first name node, a new editing log at intervals of a preset first duration; the second name node backs up the original image file and the original editing log on the first name node, merges the backed-up image file and editing log to form a new image file, and sends the new image file back to the first name node to replace the original image file of the first name node;

the second name node backs up the editing log on the first name node at intervals of a preset second duration to form an intermediate editing log; the second duration is less than the first duration;

wherein the first name node and the second name node are respectively deployed on different hosts.

7. A repair method for the Hadoop distributed file system as claimed in claim 5, comprising:

backing up the metadata file on the second name node; the metadata file comprises an image file and an editing log which are backed up by the second name node from the damaged first name node for the last time, and an intermediate editing log which is formed for the last time;

performing formatting on the new first name node;

merging the image file, the editing log, and the intermediate editing log backed up on the second name node to generate a new image file, and sending the new image file to the new first name node;

and modifying the metadata sequence number of the new first name node.

8. The repair method for the Hadoop distributed file system according to claim 7, further comprising, after modifying the metadata sequence number of the new first name node:

and executing a Hadoop distributed file system check.

9. A computer device, comprising a storage unit and a processing unit;

the storage unit stores therein a computer program executable on the processing unit and stores the result;

the processing unit, when executing the computer program, implements the method of operating a Hadoop distributed file system according to claim 2 and/or the method of repairing a Hadoop distributed file system according to any one of claims 3 to 4; or,

the processing unit, when executing the computer program, implements a method of operating a Hadoop distributed file system according to claim 6 and/or a method of repairing a Hadoop distributed file system according to any of claims 7-8.

10. A computer-readable medium storing a computer program which, when executed by a processor, implements the method of operating a Hadoop distributed file system according to claim 2 and/or the method of repairing a Hadoop distributed file system according to any one of claims 3 to 4; or,

the computer program, when executed by a processor, implements a method of operating a Hadoop distributed file system as claimed in claim 6 and/or a method of repairing a Hadoop distributed file system as claimed in any one of claims 7 to 8.