WO2018054079A1 - Method for storing a file, first virtual machine and name node

Info

Publication number
WO2018054079A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual machine
data
written
storage area
virtual
Prior art date
Application number
PCT/CN2017/085351
Other languages
English (en)
Chinese (zh)
Inventor
李亿
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2018054079A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Definitions

  • the present invention relates to the field of computer technologies, and in particular, to a method for storing a file, a first virtual machine, and a name node.
  • a distributed file system includes a client, a data node, and a name node; wherein the data node is used to store files and the name node is used to manage files stored on the data nodes.
  • the client can query the files stored in each data node through the name node and obtain the address of each data node, so as to read the file from the data node or write the file to the data node.
  • a data node in a distributed file system can be either a physical server or a virtual machine.
  • the virtual hard disk of the virtual machine is provided by a distributed block storage system. Writing a file to the virtual machine actually writes the file to the virtual machine's virtual hard disk, and writing to the virtual hard disk is in turn implemented by writing the file to the physical hard disks managed by the distributed block storage system.
  • the distributed file system uses a file copy mechanism when storing files on virtual hard disks: the same file is saved on N virtual hard disks in the distributed file system, where N is an integer greater than 1.
  • the distributed block storage system also uses a file copy mechanism: the files on one virtual hard disk are saved on M physical hard disks, where M is an integer greater than 1. Because both layers replicate, the same file ends up saved N*M times on physical hard disks. These redundant copies waste storage space and degrade the processing performance of the system.
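The multiplication of copies described above can be checked with simple arithmetic. The sketch below is illustrative only: the function name and the sample values N = 3 and M = 3 are assumptions, not figures taken from the patent.

```python
def physical_copies(n_fs_replicas: int, m_block_replicas: int) -> int:
    """Copies on physical disks when both layers replicate independently."""
    if n_fs_replicas < 1 or m_block_replicas < 1:
        raise ValueError("replica counts must be positive integers")
    return n_fs_replicas * m_block_replicas


# Hypothetical replica counts, chosen only for illustration.
N, M = 3, 3

# Both the distributed file system and the block storage system replicate.
both_layers = physical_copies(N, M)

# Only the block storage system replicates (one copy at the file system layer).
block_only = physical_copies(1, M)

print(both_layers, block_only)
```

With these assumed values, replicating at both layers stores nine physical copies, while replicating only at the block storage layer stores three.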
  • the first existing method stores each file in only one virtual machine of the distributed file system.
  • the file can then be accessed only through that virtual machine. If the virtual machine fails, the client must wait for it to return to normal before file read and write services are available again, which lowers the availability of the distributed file system.
  • the second existing method uses a hot standby mechanism for virtual machines: a hot standby virtual machine is configured for each primary virtual machine and writes files synchronously with the primary virtual machine.
  • when the primary virtual machine fails, the distributed file system switches to the hot standby virtual machine to continue providing file read and write services to the client.
  • switching to the hot standby virtual machine takes a certain waiting time, during which the distributed file system cannot provide file read and write services to the client, so its availability is reduced. In addition, the hot standby virtual machine provides no external service until it becomes the primary virtual machine, which wastes resources.
  • the existing methods for removing redundant file copies in a distributed file system therefore lower the availability of the distributed file system and do not solve the redundancy problem well.
  • the embodiments of the invention provide a method for storing a file, a first virtual machine, and a name node, which address the redundant file copies that arise when a distributed file system stores a file and improve the availability of the system.
  • an embodiment of the present invention provides a method for storing a file in a distributed file system. The distributed file system includes a name node and a plurality of virtual machines serving as data nodes, and the plurality of virtual machines share the same storage area. The method includes:
  • the first virtual machine receives, from the client, the data to be written and the address of the second virtual machine; it then writes the data to be written to the storage area shared by the plurality of virtual machines and generates or updates the metadata of the data to be written;
  • the first virtual machine sends the generated or updated metadata to the second virtual machine, using the received address of the second virtual machine.
  • the first virtual machine is the virtual machine, among the plurality of virtual machines, that the name node has designated as having permission to write data to the storage area.
  • the second virtual machine is a virtual machine, among the plurality of virtual machines, other than the first virtual machine.
  • the metadata of the data to be written includes, but is not limited to, a storage location of the data to be written, a file name of the data to be written, and a file directory of the data to be written.
  • the data that the first virtual machine writes to the storage area is saved only once in that storage area.
  • the data to be written is therefore replicated only by the file copy mechanism of the distributed block storage system, and the redundant copies that arise when both the distributed file system and the distributed block storage system apply a file copy mechanism are avoided.
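The write path on the first virtual machine can be sketched as follows. This is a minimal in-memory model under stated assumptions, not the patent's implementation: the class names, the dictionary standing in for the shared storage area, and the `peers` dictionary standing in for the network are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Stands in for the storage area on the shared virtual hard disk."""
    blocks: dict = field(default_factory=dict)

    def write(self, location: str, data: bytes) -> None:
        self.blocks[location] = data


@dataclass(frozen=True)
class Metadata:
    """Fields named in the text: file name, file directory, storage location."""
    file_name: str
    directory: str
    location: str


class VirtualMachine:
    def __init__(self, address: str, storage: SharedStorage, peers: dict):
        self.address = address
        self.storage = storage      # the same object for every virtual machine
        self.peers = peers          # address -> VirtualMachine (hypothetical network)
        self.metadata = {}          # location -> Metadata known to this machine
        peers[address] = self

    def handle_write(self, file_name, directory, data, second_vm_address):
        """First virtual machine: write once, then forward metadata only."""
        location = f"{directory}/{file_name}"
        self.storage.write(location, data)
        meta = Metadata(file_name, directory, location)
        self.metadata[location] = meta
        self.peers[second_vm_address].receive_metadata(meta)
        return meta

    def receive_metadata(self, meta: Metadata) -> None:
        """Second virtual machine: record metadata, no data copy is made."""
        self.metadata[meta.location] = meta


storage = SharedStorage()
network = {}
vm1 = VirtualMachine("vm1", storage, network)
vm2 = VirtualMachine("vm2", storage, network)
meta = vm1.handle_write("a.txt", "/docs", b"hello", "vm2")
```

Only the metadata travels to the second virtual machine; the data itself exists once in the shared storage area, which is the property the text relies on.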
  • among the plurality of virtual machines in the distributed file system, the first virtual machine has permission to write data to the storage area, and the second virtual machine, being a virtual machine other than the first virtual machine, has permission to read the data to be written from the storage area.
  • before the first virtual machine writes the data to be written to the storage area, the method further includes: the first virtual machine receives its write permission identifier from the client. The name node sends the write permission identifier to the client when the client requests the name node to write the data to be written to the distributed file system, and the identifier specifies that the first virtual machine has permission to write the data to be written to the storage area.
  • the plurality of virtual machines may share the storage area as follows: they mount the same virtual hard disk provided by the distributed block storage system, and the virtual hard disk includes the storage area shared by the plurality of virtual machines.
  • the metadata of the data to be written sent by the first virtual machine to the second virtual machine has the following two purposes:
  • the metadata is used by the second virtual machine to generate or update the file information recorded in its own operating system, and the operating system uses this file information to read the data to be written from the storage area.
  • the metadata is used by the second virtual machine to read the data to be written from the storage area.
  • the second virtual machine can read the data to be written in the storage area shared by the multiple virtual machines according to the metadata of the data to be written sent by the first virtual machine.
  • the second virtual machine may be designated by the name node to have the right to read the data to be written from the storage area.
  • an embodiment of the present invention provides a method for storing a file in a distributed file system, where the distributed file system includes a name node and a plurality of virtual machines serving as data nodes, and the plurality of virtual machines share the same storage area; the method includes:
  • after receiving a request message in which the client requests to write the data to be written to the distributed file system, the name node sends a response message corresponding to the request message to the client.
  • the response message sent by the name node to the client includes the address of the first virtual machine and the address of the second virtual machine.
  • the response message further indicates that the first virtual machine is the virtual machine, among the plurality of virtual machines, that has permission to write data to the storage area; the second virtual machine is a virtual machine, among the plurality of virtual machines, other than the first virtual machine.
  • the response message sent by the name node specifies that one of the plurality of virtual machines has permission to write data to the shared storage area, so data written to the shared storage area is saved only once in that storage area. That data is therefore replicated only by the file copy mechanism of the distributed block storage system, and the redundant copies that arise when both the distributed file system and the distributed block storage system apply a file copy mechanism are avoided.
  • among the plurality of virtual machines in the distributed file system, the first virtual machine has permission to write data to the storage area, and the second virtual machine, being a virtual machine other than the first virtual machine, has permission to read the data to be written from the storage area.
  • the response message further indicates that the second virtual machine has the right to read the data to be written from the storage area.
  • the name node may indicate the permissions of the first virtual machine and of the second virtual machine to the client through the response message in either of the following two manners:
  • in the first manner, the response message further includes a write permission identifier of the first virtual machine and a read permission identifier of the second virtual machine. The write permission identifier specifies that the first virtual machine has permission to write the data to be written to the storage area, and the read permission identifier specifies that the second virtual machine has permission to read the data to be written from the storage area.
  • in the second manner, the address of the first virtual machine and the address of the second virtual machine in the response message are arranged according to a preset rule, and the preset rule indicates the permissions of both virtual machines: the first virtual machine has permission to write the data to be written to the storage area, and the second virtual machine has permission to read the data to be written from the storage area.
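The two manners of conveying permissions can be compared in a short sketch. Everything here is an assumption made for illustration: the `Entry` type, the string identifiers, and in particular the concrete preset rule (the first address listed is the writer), since the text only says that the addresses are arranged according to a preset rule.

```python
from dataclasses import dataclass
from typing import List, Optional

WRITE, READ = "write", "read"


@dataclass
class Entry:
    address: str
    permission: Optional[str] = None  # None when the ordering carries the meaning


def build_response_explicit(writer: str, readers: List[str]) -> List[Entry]:
    """First manner: each address carries an explicit permission identifier."""
    return [Entry(writer, WRITE)] + [Entry(r, READ) for r in readers]


def build_response_ordered(writer: str, readers: List[str]) -> List[Entry]:
    """Second manner: no identifiers; assumed rule is 'first address writes'."""
    return [Entry(writer)] + [Entry(r) for r in readers]


def permissions_from(response: List[Entry]) -> dict:
    """Client-side decoding that accepts either manner."""
    result = {}
    for index, entry in enumerate(response):
        if entry.permission is not None:
            result[entry.address] = entry.permission
        else:
            result[entry.address] = WRITE if index == 0 else READ
    return result


explicit = permissions_from(build_response_explicit("vm1", ["vm2", "vm3"]))
ordered = permissions_from(build_response_ordered("vm1", ["vm2", "vm3"]))
```

Both encodings decode to the same permission map; the ordered form is more compact but depends on the client and the name node agreeing on the rule in advance.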
  • the plurality of virtual machines may share the storage area as follows: they mount the same virtual hard disk provided by the distributed block storage system, and the virtual hard disk includes the storage area.
  • the name node may handle a virtual machine failure in the following two cases:
  • in the first case, the first virtual machine fails and the name node sends first update information to the client. The first update information includes the address of the updated first virtual machine and specifies another virtual machine among the plurality of virtual machines, other than the failed first virtual machine, as the updated first virtual machine; that is, one of the second virtual machines is designated as the updated first virtual machine. The updated first virtual machine has permission to write data to the storage area.
  • in the second case, the second virtual machine fails and the name node sends second update information to the client. The second update information includes the address of the updated second virtual machine and specifies another virtual machine among the plurality of virtual machines, other than the failed second virtual machine, as the updated second virtual machine. The updated second virtual machine has permission to read the data to be written from the storage area.
  • because the name node designates another virtual machine to replace the failed one, the distributed file system can still provide services for the client to read and write data when the first virtual machine and/or the second virtual machine fails, which further improves the availability of the distributed file system.
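The failure handling above can be sketched as a small state update on the name node. This is a hedged illustration: the `NameNode` class, the dictionary shape of the update information, the choice of promoting the first listed reader, and the `spares` pool of otherwise unused virtual machines are all assumptions not specified by the text.

```python
class NameNode:
    def __init__(self, writer, readers, spares):
        self.writer = writer          # address of the first virtual machine
        self.readers = list(readers)  # addresses of second virtual machines
        self.spares = list(spares)    # hypothetical pool of replacement machines

    def handle_failure(self, failed: str) -> dict:
        """Return the update information that would be sent to the client."""
        if failed == self.writer:
            if not self.readers:
                raise RuntimeError("no second virtual machine left to promote")
            # One of the second virtual machines becomes the updated first one.
            self.writer = self.readers.pop(0)
            return {"type": "first_update", "first_vm": self.writer}
        if failed in self.readers:
            if not self.spares:
                raise RuntimeError("no virtual machine available as replacement")
            self.readers.remove(failed)
            replacement = self.spares.pop(0)
            self.readers.append(replacement)
            return {"type": "second_update", "second_vm": replacement}
        raise ValueError(f"unknown virtual machine: {failed}")


name_node = NameNode("vm1", ["vm2", "vm3"], ["vm4"])
first_update = name_node.handle_failure("vm1")    # writer fails
second_update = name_node.handle_failure("vm3")   # a reader fails
```

Because every virtual machine already sees the same storage area, promotion requires no data movement in this model, only a change of permissions announced to the client.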
  • an embodiment of the present invention provides a method for storing a file in a distributed file system, where the distributed file system includes a name node and a plurality of virtual machines serving as data nodes, and the plurality of virtual machines share the same storage area; the method includes:
  • the client sends the name node a request message requesting to write the data to be written to the distributed file system, and then receives the response message corresponding to the request message from the name node.
  • the response message includes the address of the first virtual machine and the address of the second virtual machine.
  • the response message further indicates that the first virtual machine is the virtual machine, among the plurality of virtual machines, that has permission to write data to the storage area.
  • the second virtual machine is a virtual machine, among the plurality of virtual machines, other than the first virtual machine.
  • the client sends the data to be written and the address of the second virtual machine to the first virtual machine, using the address of the first virtual machine included in the response message, and instructs the first virtual machine to: write the data to be written, generate or update the metadata of the data to be written, and send that metadata to the second virtual machine according to the address of the second virtual machine.
  • the distributed file system includes a plurality of virtual machines sharing the same storage area. The data that the client instructs the first virtual machine to write to the shared storage area is therefore saved only once in that storage area. The data to be written is replicated only by the file copy mechanism of the distributed block storage system, and the redundant copies that arise when both the distributed file system and the distributed block storage system apply a file copy mechanism are avoided.
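The client-side sequence described above (request, response, then a single send to the first virtual machine) can be sketched with injected stand-ins for the network. The function name, the dictionary keys, and the sample addresses are assumptions for illustration only.

```python
from typing import Callable, Tuple


def client_write(
    request_name_node: Callable[[str], dict],
    send_to_vm: Callable[[str, dict], None],
    file_name: str,
    data: bytes,
) -> Tuple[str, str]:
    """Ask the name node where to write, then send data to the first VM only."""
    response = request_name_node(file_name)
    first_vm = response["first_vm"]
    second_vm = response["second_vm"]
    # The second VM's address travels with the data so that the first VM
    # knows where to forward the metadata after writing.
    send_to_vm(first_vm, {"file_name": file_name, "data": data, "second_vm": second_vm})
    return first_vm, second_vm


sent = []


def fake_name_node(file_name: str) -> dict:
    # Hypothetical response; a real one would also convey permissions.
    return {"first_vm": "10.0.0.1", "second_vm": "10.0.0.2"}


def fake_send(address: str, payload: dict) -> None:
    sent.append((address, payload))


result = client_write(fake_name_node, fake_send, "a.txt", b"hello")
```

The client contacts exactly one data node per write in this sketch, in contrast to a scheme where it would push the same data to N replicas.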
  • among the plurality of virtual machines, the second virtual machine, being a virtual machine other than the first virtual machine, has permission to read the data to be written from the storage area.
  • the distributed file system therefore has multiple virtual machines available to provide the client with services for reading and writing the data to be written.
  • when one virtual machine fails, other virtual machines can still provide these services to the client, so the availability of the distributed file system is improved and the waste of resources caused by the hot standby mechanism in the prior art is avoided.
  • the client can obtain the permissions of the first virtual machine and of the second virtual machine from the response message received from the name node in either of the following two manners:
  • in the first manner, the response message received by the client further includes a write permission identifier of the first virtual machine and a read permission identifier of the second virtual machine, which indicate the permissions of the first virtual machine and of the second virtual machine respectively.
  • the write permission identifier specifies that the first virtual machine has permission to write the data to be written to the storage area.
  • the read permission identifier specifies that the second virtual machine has permission to read the data to be written from the storage area.
  • in the second manner, the address of the first virtual machine and the address of the second virtual machine in the response message received by the client are arranged according to a preset rule, and the preset rule indicates the permissions of both virtual machines: the first virtual machine has permission to write the data to be written to the storage area, and the second virtual machine has permission to read the data to be written from the storage area.
  • the plurality of virtual machines may share the storage area as follows: they mount the same virtual hard disk provided by the distributed block storage system, and the virtual hard disk includes the storage area.
  • the response message further indicates that the second virtual machine has the right to read the data to be written from the storage area.
  • an embodiment of the present invention provides a method for storing a file in a distributed file system, where the distributed file system includes a name node and a plurality of virtual machines serving as data nodes, and the plurality of virtual machines share the same storage area; the method includes:
  • the second virtual machine receives the metadata sent by the first virtual machine.
  • the first virtual machine is the virtual machine that the name node has designated as having permission to write data to the storage area.
  • the second virtual machine is a virtual machine, among the plurality of virtual machines, other than the first virtual machine.
  • the metadata is the metadata of the data to be written, generated or updated by the first virtual machine after it writes the data to be written to the storage area.
  • because the plurality of virtual machines share the same storage area, data written to the shared storage area is saved only once in that storage area.
  • the data written to the shared storage area is therefore replicated only by the file copy mechanism of the distributed block storage system, and the redundant copies that arise when both the distributed file system and the distributed block storage system apply a file copy mechanism are avoided.
  • among the plurality of virtual machines in the distributed file system, the first virtual machine has permission to write data to the storage area, and the second virtual machine, being a virtual machine other than the first virtual machine, has permission to read the data to be written from the storage area.
  • the distributed file system therefore has multiple virtual machines available to provide the client with services for reading and writing the data to be written.
  • when one virtual machine fails, other virtual machines can still provide these services to the client, so the availability of the distributed file system is improved and the waste of resources caused by the hot standby mechanism in the prior art is avoided.
  • before receiving the metadata sent by the first virtual machine, the second virtual machine may receive its read permission identifier from the client.
  • the read permission identifier is sent to the client when the client requests the name node to write the data to be written to the distributed file system, and it specifies that the second virtual machine has permission to read the data to be written from the storage area.
  • the plurality of virtual machines may share the storage area as follows: they mount the same virtual hard disk provided by the distributed block storage system, and the virtual hard disk includes the storage area.
  • after receiving the metadata, the second virtual machine may read the data to be written in the storage area shared by the plurality of virtual machines according to the received metadata, in either of the following two manners:
  • in the first manner, if the second virtual machine reads the data to be written through its own operating system, the second virtual machine generates or updates, according to the metadata, the file information recorded in its own operating system; the operating system can then use this file information to read the data to be written from the storage area.
  • in the second manner, the second virtual machine itself reads the data to be written from the storage area directly according to the metadata.
  • the second virtual machine can read the data to be written in the storage area shared by the multiple virtual machines according to the metadata of the data to be written sent by the first virtual machine.
  • the second virtual machine is designated by the name node to have the right to read the data to be written from the storage area.
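The two read manners on the second virtual machine can be contrasted in a small model. The sketch assumes a dictionary as the shared storage area, a dictionary `os_file_table` as the file information kept by the operating system, and metadata passed as a plain dictionary; none of these names come from the patent.

```python
class SecondVM:
    def __init__(self, storage: dict):
        self.storage = storage     # shared storage area: location -> bytes
        self.os_file_table = {}    # hypothetical OS file information: path -> location
        self.raw_metadata = {}     # metadata kept as received: file name -> dict

    def receive_metadata(self, meta: dict) -> None:
        """Keep the metadata and also update the OS-level file information."""
        self.raw_metadata[meta["file_name"]] = meta
        path = f'{meta["directory"]}/{meta["file_name"]}'
        self.os_file_table[path] = meta["location"]

    def read_via_os(self, path: str) -> bytes:
        """First manner: the operating system resolves the path to a location."""
        return self.storage[self.os_file_table[path]]

    def read_direct(self, file_name: str) -> bytes:
        """Second manner: use the storage location in the metadata directly."""
        return self.storage[self.raw_metadata[file_name]["location"]]


# Data already written to the shared area by the first virtual machine.
shared = {"block-17": b"hello"}
vm2 = SecondVM(shared)
vm2.receive_metadata({"file_name": "a.txt", "directory": "/docs", "location": "block-17"})

via_os = vm2.read_via_os("/docs/a.txt")
direct = vm2.read_direct("a.txt")
```

Both manners return the same bytes from the single stored copy; they differ only in whether the lookup goes through file information maintained by the operating system.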
  • an embodiment of the present invention provides a first virtual machine in a distributed file system, where the distributed file system includes a name node and a plurality of virtual machines serving as data nodes, and the plurality of virtual machines share the same storage area.
  • the first virtual machine is the virtual machine, among the plurality of virtual machines, that the name node has designated as having permission to write data to the storage area; the first virtual machine includes:
  • a receiving module configured to receive, from the client, the data to be written and the address of the second virtual machine, where the second virtual machine is a virtual machine other than the first virtual machine;
  • a processing module configured to write the data to be written received by the receiving module to the storage area, and to generate or update the metadata of the data to be written;
  • a sending module configured to send the metadata generated or updated by the processing module to the second virtual machine, according to the address of the second virtual machine received by the receiving module.
  • the metadata of the data to be written includes, but is not limited to, a storage location of the data to be written, a file name of the data to be written, and a file directory of the data to be written.
  • the data that the processing module writes to the storage area is saved only once in that storage area.
  • the data to be written is therefore replicated only by the file copy mechanism of the distributed block storage system, and the redundant copies that arise when both the distributed file system and the distributed block storage system apply a file copy mechanism are avoided.
  • among the plurality of virtual machines in the distributed file system, the first virtual machine has permission to write data to the storage area, and the second virtual machine, being a virtual machine other than the first virtual machine, has permission to read the data to be written from the storage area.
  • the receiving module is further configured to: before the processing module writes the data to be written to the storage area, receive the write permission identifier of the first virtual machine sent by the client, where the name node sends the write permission identifier to the client when the client requests the name node to write the data to be written to the distributed file system.
  • the write permission identifier specifies that the first virtual machine has permission to write the data to be written to the storage area.
  • the plurality of virtual machines may share the storage area as follows: they mount the same virtual hard disk provided by the distributed block storage system, and the virtual hard disk includes the storage area.
  • the metadata of the data to be written sent by the sending module to the second virtual machine has the following two purposes:
  • the metadata is used by the second virtual machine to generate or update the file information recorded in its own operating system, and the operating system uses this file information to read the data to be written from the storage area.
  • the metadata is used by the second virtual machine to read the data to be written from the storage area.
  • the second virtual machine can read the data to be written in the storage area shared by the plurality of virtual machines according to the metadata of the data to be written sent by the sending module.
  • the second virtual machine is designated by the name node to have the right to read the data to be written from the storage area.
  • an embodiment of the present invention provides a name node in a distributed file system, where the distributed file system includes the name node and a plurality of virtual machines serving as data nodes, and the plurality of virtual machines share the same storage area; the name node includes:
  • a receiving module configured to receive a request message in which the client requests to write the data to be written to the distributed file system;
  • a sending module configured to send, to the client, a response message corresponding to the request message received by the receiving module, where the response message includes the address of the first virtual machine and the address of the second virtual machine. The response message further indicates that the first virtual machine is the virtual machine, among the plurality of virtual machines, that has permission to write data to the storage area; the second virtual machine is a virtual machine, among the plurality of virtual machines, other than the first virtual machine.
  • the response message sent by the name node specifies that one of the plurality of virtual machines has permission to write data to the shared storage area, so data written to the shared storage area is saved only once in that storage area. That data is therefore replicated only by the file copy mechanism of the distributed block storage system, and the redundant copies that arise when both the distributed file system and the distributed block storage system apply a file copy mechanism are avoided.
  • among the plurality of virtual machines in the distributed file system, the first virtual machine has permission to write data to the storage area, and the second virtual machine, being a virtual machine other than the first virtual machine, has permission to read the data to be written from the storage area.
  • the response message further indicates that the second virtual machine has the right to read the data to be written from the storage area.
  • the response message sent by the sending module may indicate the permissions of the first virtual machine and of the second virtual machine to the client in either of the following two manners:
  • in the first manner, the response message sent by the sending module to the client further includes a write permission identifier of the first virtual machine and a read permission identifier of the second virtual machine, which indicate the permissions of the first virtual machine and of the second virtual machine respectively.
  • the write permission identifier specifies that the first virtual machine has permission to write the data to be written to the storage area.
  • the read permission identifier specifies that the second virtual machine has permission to read the data to be written from the storage area.
  • in the second manner, the address of the first virtual machine and the address of the second virtual machine in the response message sent by the sending module to the client are arranged according to a preset rule, and the preset rule indicates the permissions of both virtual machines: the first virtual machine has permission to write the data to be written to the storage area, and the second virtual machine has permission to read the data to be written from the storage area.
  • the plurality of virtual machines may share the storage area as follows: they mount the same virtual hard disk provided by the distributed block storage system, and the virtual hard disk includes the storage area.
  • the sending module may handle a virtual machine failure in the following two cases:
  • in the first case, the first virtual machine fails and the sending module sends first update information to the client. The first update information includes the address of the updated first virtual machine and specifies another virtual machine among the plurality of virtual machines, other than the failed first virtual machine, as the updated first virtual machine. The updated first virtual machine has permission to write data to the storage area.
  • in the second case, the second virtual machine fails and the sending module sends second update information to the client. The second update information includes the address of the updated second virtual machine and specifies another virtual machine among the plurality of virtual machines as the updated second virtual machine. The updated second virtual machine has permission to read the data to be written from the storage area.
  • because the sending module designates another virtual machine to replace the failed one, the distributed file system can still provide services for the client to read and write data when the first virtual machine and/or the second virtual machine fails, which further improves the availability of the distributed file system.
  • an embodiment of the present invention provides a client of a distributed file system, where the distributed file system includes a name node and a plurality of virtual machines serving as data nodes, and the plurality of virtual machines share the same storage area; the client includes:
  • a sending module configured to send, to the name node, a request message requesting to write the data to be written to the distributed file system;
  • a receiving module configured to receive the response message corresponding to the request message from the name node.
  • the response message includes the address of the first virtual machine and the address of the second virtual machine.
  • the response message further indicates that the first virtual machine is the virtual machine, among the plurality of virtual machines, that has permission to write data to the storage area.
  • the second virtual machine is a virtual machine, among the plurality of virtual machines, other than the first virtual machine;
  • the sending module is further configured to send the data to be written and the address of the second virtual machine to the first virtual machine, using the address of the first virtual machine included in the response message received by the receiving module, and to instruct the first virtual machine to: write the data to be written, generate or update the metadata of the data to be written, and send that metadata to the second virtual machine according to the address of the second virtual machine included in the response message.
  • The distributed file system includes multiple virtual machines sharing the same storage area. Therefore, the data to be written that the sending module instructs the first virtual machine to write to the shared storage area is stored only once in the distributed file system. Multiple copies of the data to be written exist only because of the file copy mechanism adopted by the distributed block storage system; the redundant number of copies that results when both the distributed file system and the distributed block storage system apply a file copy mechanism does not arise.
  • The second virtual machine, that is, the virtual machine other than the first virtual machine among the plurality of virtual machines, has the right to read the data to be written from the storage area.
  • Therefore, multiple virtual machines in the distributed file system can provide the client with services for reading and writing the data to be written.
  • When one virtual machine fails, the other virtual machines can still provide the client with services for reading and writing the data to be written, which improves the availability of the distributed file system and avoids the waste of resources incurred by the virtual machine hot standby mechanism of the prior art.
  • the receiving module can obtain the rights of the first virtual machine and the rights of the second virtual machine by using the response message sent by the name node:
  • In one manner, the response message received by the receiving module further includes a write permission identifier of the first virtual machine and a read permission identifier of the second virtual machine, which indicate the permissions of the first virtual machine and of the second virtual machine respectively: the write permission identifier specifies that the first virtual machine has the right to write the data to be written to the storage area, and the read permission identifier specifies that the second virtual machine has the right to read the data to be written from the storage area.
  • In another manner, the address of the first virtual machine and the address of the second virtual machine in the response message received by the receiving module are arranged according to a preset rule, where the preset rule indicates the permissions of the first virtual machine and of the second virtual machine, that is, the first virtual machine has the right to write the data to be written to the storage area, and the second virtual machine has the right to read the data to be written from the storage area.
  • the receiving module provides two ways of obtaining the permission of the first virtual machine and the authority of the second virtual machine by receiving the response message.
  • The plurality of virtual machines may share a storage area in the following manner: the multiple virtual machines mount the same virtual hard disk provided by the distributed block storage system, and the virtual hard disk includes the storage area.
  • the response message further indicates that the second virtual machine has the right to read the data to be written from the storage area.
  • An eighth aspect of the present invention provides a second virtual machine in a distributed file system, where the distributed file system includes a name node and a plurality of virtual machines as data nodes, and the plurality of virtual machines share the same storage area;
  • the second virtual machine includes:
  • the receiving module is configured to receive metadata sent by the first virtual machine.
  • the first virtual machine is the virtual machine, among the plurality of virtual machines, that the name node has designated as having the right to write data to the storage area
  • the second virtual machine is a virtual machine other than the first virtual machine among the plurality of virtual machines.
  • the metadata is metadata of the data to be written generated or updated after the first virtual machine writes the data to be written to the storage area.
  • Because the plurality of virtual machines share the same storage area, data written to the shared storage area is stored only once in the distributed file system.
  • For the data written to the shared storage area, multiple copies exist only because of the file copy mechanism adopted by the distributed block storage system; the redundant copies that result when both the distributed file system and the distributed block storage system apply a file copy mechanism do not arise.
  • The first virtual machine of the plurality of virtual machines included in the distributed file system has the right to write data to the storage area, and the second virtual machine, that is, the virtual machine other than the first virtual machine among the plurality of virtual machines, has the right to read the data to be written from the storage area.
  • Therefore, multiple virtual machines in the distributed file system can provide the client with services for reading and writing the data to be written.
  • When one virtual machine fails, the other virtual machines can still provide the client with services for reading and writing the data to be written, which improves the availability of the distributed file system and avoids the waste of resources incurred by the virtual machine hot standby mechanism of the prior art.
  • the receiving module may obtain the permission of the second virtual machine in the following manner: the receiving module receives the read permission identifier of the second virtual machine sent by the client before receiving the metadata sent by the first virtual machine.
  • The name node sends the read permission identifier to the client when the client requests the name node to write the data to be written to the distributed file system; the read permission identifier specifies that the second virtual machine has the right to read the data to be written from the storage area.
  • The plurality of virtual machines may share a storage area in the following manner: the multiple virtual machines mount the same virtual hard disk provided by the distributed block storage system, and the virtual hard disk includes the storage area.
  • the second virtual machine further includes a processing module.
  • The processing module may read the data to be written from the storage area shared by the plurality of virtual machines according to the received metadata, in either of the following manners:
  • After the receiving module receives the metadata sent by the first virtual machine, if the second virtual machine reads the data to be written through its own operating system, the processing module generates or updates, according to the metadata, the file information recorded in its own operating system; the operating system uses the file information to read the data to be written from the storage area.
  • Alternatively, the processing module reads the data to be written from the storage area directly according to the metadata.
  • the processing module can read the data to be written in the storage area shared by the plurality of virtual machines according to the metadata of the data to be written sent by the first virtual machine.
  • the second virtual machine is designated by the name node to have the right to read the data to be written from the storage area.
  • A computer-readable storage medium storing computer-executable instructions; when at least one processor of a computing node executes the computer-executable instructions, the computing node performs the method provided by the first aspect or the various possible designs of the first aspect, or the method provided by the second aspect or the various possible designs of the second aspect, or the method provided by the third aspect or the various possible designs of the third aspect.
  • A computer program product comprising computer-executable instructions stored in a computer-readable storage medium.
  • At least one processor of the computing node can read the computer-executable instructions from the computer-readable storage medium and execute them, so that the computing node implements the method provided by the first aspect or the various possible designs of the first aspect, or the method provided by the second aspect or the various possible designs of the second aspect, or the method provided by the third aspect or the various possible designs of the third aspect.
  • FIG. 1 is a schematic diagram of a connection relationship between a name node, a client, and multiple data nodes in a distributed file system according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a connection relationship between a distributed file system and a distributed block storage system according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a method for storing a file in a distributed file system according to an embodiment of the present invention
  • FIG. 4 is a schematic structural diagram of a distributed file system and a distributed block storage system using the method for storing files shown in FIG. 3;
  • FIG. 5 is a schematic structural diagram of a first virtual machine according to an embodiment
  • FIG. 6 is a schematic structural diagram of another first virtual machine according to an embodiment
  • FIG. 7 is a schematic structural diagram of a name node according to an embodiment
  • FIG. 8 is a schematic structural diagram of another name node according to an embodiment
  • FIG. 9 is a schematic structural diagram of a client provided by the embodiment.
  • FIG. 10 is a schematic structural diagram of another client provided by the embodiment.
  • FIG. 11 is a schematic structural diagram of a second virtual machine according to an embodiment
  • FIG. 12 is a schematic structural diagram of another second virtual machine according to an embodiment
  • FIG. 13 is a schematic structural diagram of a distributed file system according to an embodiment of the present invention.
  • the embodiment of the invention relates to a distributed file system.
  • the distributed file system is described in detail below.
  • a distributed file system can include a name node and multiple data nodes.
  • the name node may also be referred to as a master server or other name.
  • The data node may also be referred to as a data server or by another name. It should be noted that FIG. 1 shows only the scenario in which the distributed file system includes one client. In practice, a distributed file system may include multiple clients.
  • The name node is used to manage the multiple data nodes; it records information about the files stored in each data node (such as file metadata), the service status of each data node, and so on. The data nodes are used to store files. When the client reads or writes a file, it first requests the index information of the data node from the name node, and then accesses the corresponding data node according to the obtained index information to read or write the file.
  • Files may be synchronized between multiple data nodes. For example, when a file needs to be written in two data nodes, one of the data nodes can be written first, and then the data node synchronizes the file to another data node.
  • information interaction between the name node and the data node is also possible.
  • The name node, the data node, and the client may each be implemented by configuring the corresponding functions on a device with computing capability.
  • The device with computing capability may be a physical device or a virtual device; for example, the physical device may be a personal computer, a notebook computer, a mainframe, a networked computer, a handheld computer, a personal digital assistant, a workstation, etc., and the virtual device may be a virtual machine or a container deployed on a physical device.
  • The virtual hard disk of a virtual machine is provided by the distributed block storage system, which manages multiple physical hard disks; a file written to the virtual hard disk of the virtual machine is actually written to the physical hard disks managed by the distributed block storage system.
  • To ensure its own reliability, the distributed file system generally uses a file copy mechanism when storing files.
  • For example, when storing a file, the distributed file system stores the file on two data nodes, that is, on virtual machine 1 and virtual machine 2. To ensure its own reliability, the distributed block storage system also adopts a file copy mechanism when storing the files of a virtual machine: the file of virtual machine 1 is stored on physical hard disk 1 of physical server 1, physical hard disk 3 of physical server 2, and physical hard disk 5 of physical server 3, and the file of virtual machine 2 is stored on physical hard disk 2 of physical server 1, physical hard disk 4 of physical server 2, and physical hard disk 6 of physical server 3.
  • As a result, the file is saved in six copies on the physical hard disks managed by the distributed block storage system. Such a redundant number of copies of the same file wastes storage space and affects the processing performance of the system.
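  • The copy count in the example above is simply the product of the replication factors of the two layers. The following minimal sketch makes that arithmetic explicit; the function name `physical_copies` and its parameters are illustrative assumptions, not terms from the embodiment.

```python
def physical_copies(dfs_replicas: int, block_replicas: int) -> int:
    """Copies on physical hard disks when both layers replicate independently."""
    return dfs_replicas * block_replicas


# Example above: the file is stored on two data nodes (virtual machines 1 and 2),
# and the block storage layer keeps three copies of each virtual machine's file.
two_layer_replication = physical_copies(dfs_replicas=2, block_replicas=3)

# If the file-system layer kept a single copy, only the block layer would replicate.
single_file_system_copy = physical_copies(dfs_replicas=1, block_replicas=3)

print(two_layer_replication, single_file_system_copy)  # 6 3
```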
  • the distributed file system can store multiple files, so the number of virtual machines included in the distributed file system is not limited, and the number of virtual hard disks included in each virtual machine is not limited; The number of physical servers included in the block storage system is not limited, and the number of physical hard disks included in each physical server is not limited.
  • the embodiment of the present invention provides a method for storing a file in a distributed file system, where the distributed file system includes a name node and multiple virtual machines as data nodes. Among them, multiple virtual machines share the same storage area. As shown in FIG. 3, the method includes:
  • S301 The client sends, to the name node, a request message for writing data to be written to the distributed file system.
  • the data to be written may be video data, audio data, document data or other binary data.
  • the granularity of the data to be written can be a file, a data block, or other granularity.
  • The data to be written may comprise one or more pieces of data. As long as one or more pieces of data are written to the distributed file system when the method shown in FIG. 3 is executed once, the one or more pieces of data can be regarded as the data to be written.
  • S302 The name node sends a response message corresponding to the request message to the client.
  • the response message includes an address of the first virtual machine and an address of the second virtual machine, and the response message further indicates that the first virtual machine is a virtual machine having the right to write data to the storage area among the plurality of virtual machines, and the second virtual The machine is a virtual machine other than the first virtual machine among the plurality of virtual machines.
  • the number of the first virtual machines must be one; the number of the second virtual machines may be one or more, and the number of the second virtual machines is not limited in the embodiment of the present invention.
  • Only one virtual machine of the plurality of virtual machines has the right to write data to the storage area, for the following reason: if multiple virtual machines could write the data to be written, then when the client wants to write data to be written to the distributed file system, multiple virtual machines would receive an instruction to write the data to be written. Because the multiple virtual machines share the same storage area, these instructions would direct multiple virtual machines to write the data to be written to the same storage area at the same time. It could then not be determined which virtual machine should write the data to be written, and the instruction to write the data to be written could not be executed.
  • The number of second virtual machines in the embodiment of the present invention may be more than one because, when multiple clients want to read the data to be written, they can read it through the second virtual machines, which improves the efficiency with which clients read the data to be written.
  • The second virtual machine may read the data to be written by directly accessing the distributed block storage system, which avoids the situation in the prior art in which the data to be written cannot be read when the first virtual machine fails.
  • The response message sent by the name node to the client may indicate only that the first virtual machine has the right to write data to the storage area, without indicating the permission of the second virtual machine.
  • The reason is that, since only one virtual machine of the plurality of virtual machines has the right to write data to the storage area, once the response message indicates that the first virtual machine has that right, the second virtual machine, that is, the virtual machine other than the first virtual machine among the plurality of virtual machines, has by default the right to read the data to be written from the storage area.
  • Alternatively, the response message further indicates that the second virtual machine has the right to read the data to be written from the storage area.
  • the rights of the first virtual machine indicated by the response message and the rights of the second virtual machine are only for the data to be written.
  • the first virtual machine and the second virtual machine share a storage area in which data to be written is written.
  • For example, for data to be written 1, the response message indicates that virtual machine 1 is the first virtual machine and virtual machine 2 is the second virtual machine. Virtual machine 1 writes data to be written 1 to storage area 1 of virtual machine 1 and then sends the metadata of data to be written 1 to virtual machine 2, where virtual machine 2 shares storage area 1 with virtual machine 1.
  • As another example, for data to be written 2, the response message indicates that virtual machine 1 is the first virtual machine and virtual machine 3 is the second virtual machine. Virtual machine 1 writes data to be written 2 to storage area 2 of virtual machine 1 and then sends the metadata of data to be written 2 to virtual machine 3, where virtual machine 3 shares storage area 2 with virtual machine 1.
  • S303 The client sends the data to be written and the address of the second virtual machine to the first virtual machine according to the address of the first virtual machine.
  • By sending the data to be written and the address of the second virtual machine to the first virtual machine, the client instructs the first virtual machine to write the data to be written, to generate or update the metadata of the data to be written, and to send the metadata of the data to be written to the second virtual machine according to the address of the second virtual machine.
  • S304 The first virtual machine writes data to be written to the storage area shared by the plurality of virtual machines, and generates or updates metadata of the data to be written.
  • The first virtual machine and the second virtual machine may use the metadata of the data to be written to read the data to be written from the storage area shared by the plurality of virtual machines. The metadata of the data to be written includes but is not limited to: the storage location of the data to be written, the name of the data to be written, and the directory of the data to be written.
  • S305 The first virtual machine sends the generated or updated metadata to the second virtual machine according to the address of the second virtual machine.
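  • Steps S301 to S305 can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the class names, the dictionary-shaped response message, the `network` lookup table, and the list-based storage area are all hypothetical stand-ins, not structures defined by the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    name: str        # name of the data to be written
    directory: str   # directory of the data to be written
    location: int    # storage location inside the shared storage area


@dataclass
class SharedStorageArea:
    blocks: list = field(default_factory=list)

    def append(self, data: bytes) -> int:
        self.blocks.append(data)
        return len(self.blocks) - 1


@dataclass
class VirtualMachine:
    address: str
    storage: SharedStorageArea
    metadata: dict = field(default_factory=dict)

    def write(self, name, directory, data, second_addresses, network):
        # S304: write once to the shared storage area, then generate metadata.
        meta = Metadata(name, directory, self.storage.append(data))
        self.metadata[name] = meta
        # S305: send the metadata to each second virtual machine by address.
        for address in second_addresses:
            network[address].metadata[name] = meta
        return meta

    def read(self, name) -> bytes:
        return self.storage.blocks[self.metadata[name].location]


class NameNode:
    def __init__(self, addresses):
        self.addresses = list(addresses)

    def handle_write_request(self) -> dict:
        # S302: exactly one writer (first virtual machine); the rest are readers.
        first, *second = self.addresses
        return {"first": first, "second": second}


def client_write(name_node, network, name, directory, data):
    response = name_node.handle_write_request()  # S301 and S302
    first_vm = network[response["first"]]
    # S303: send the data and the second addresses to the first virtual machine.
    return first_vm.write(name, directory, data, response["second"], network)


area = SharedStorageArea()
network = {a: VirtualMachine(a, area) for a in ("vm1", "vm2", "vm3")}
client_write(NameNode(network), network, "a.txt", "/docs", b"hello")
print(network["vm2"].read("a.txt"), len(area.blocks))  # b'hello' 1
```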
  • the distributed file system may generally include a client or may not include a client. If the distributed file system includes a client, the number of clients includes, but is not limited to, one. In the embodiment of the present invention, in order to more clearly describe the interaction between the client, the name node, the first virtual machine, and the second virtual machine, the client is included in the distributed file system. In actual implementation, the distributed file system may not include the client. In this case, the embodiment of the present invention may be regarded as interaction between the client and the distributed file system.
  • the distributed file system includes multiple virtual machines that mount the same virtual hard disk provided by the distributed block storage system, where the virtual hard disk includes a storage area shared by multiple virtual machines.
  • Therefore, the data to be written is stored only once in the distributed file system, in the shared storage area.
  • Multiple copies of the data to be written exist only because of the file copy mechanism adopted by the distributed block storage system; the redundant number of copies that results when both the distributed file system and the distributed block storage system apply a file copy mechanism does not arise.
  • The first virtual machine among the plurality of virtual machines included in the distributed file system has the right to write data to the storage area, and the second virtual machine, that is, the virtual machine other than the first virtual machine among the plurality of virtual machines, has the right to read the data to be written from the storage area.
  • Therefore, multiple virtual machines in the distributed file system can provide the client with services for reading and writing the data to be written.
  • When one virtual machine fails, the other virtual machines can still provide the client with services for reading and writing the data to be written, which improves the availability of the distributed file system and avoids the waste of resources incurred by the virtual machine hot standby mechanism of the prior art.
  • A distributed file system and a distributed block storage system using the method shown in FIG. 3 can be as shown in FIG. 4.
  • the distributed file system shown in FIG. 4 includes a first virtual machine, a second virtual machine, a client, and a name node. In actual implementation, there is no limit to the number of second virtual machines and the number of clients.
  • the distributed block storage system shown in Figure 4 contains three physical servers, each containing two physical hard disks.
  • the first virtual machine has the right to write data to the storage area
  • the second virtual machine has the right to read the data to be written from the storage area. Since the first virtual machine and the second virtual machine share the same storage area, the first virtual machine and the second virtual machine can share the same virtual hard disk 1.
  • The first virtual machine can write the data to be written to virtual hard disk 1.
  • The second virtual machine can read the data to be written from virtual hard disk 1. Therefore, the data to be written is stored only once in the distributed file system, that is, in virtual hard disk 1.
  • The data to be written is then stored in the distributed block storage system; for example, a copy is stored on physical hard disk 1 of physical server 1.
  • In this way, the data to be written is saved in only three copies on the physical hard disks.
  • By contrast, the same file in the distributed file system and the distributed block storage system shown in FIG. 2 is stored in six copies on the physical hard disks.
  • After the method shown in FIG. 3 is used, the data to be written in the distributed file system and the distributed block storage system shown in FIG. 4 is saved in only three copies on the physical hard disks, which greatly reduces the number of copies of the file and solves the problem of the redundant number of copies saved for a file in the distributed file system.
  • The first virtual machine can be used to write and to read the data to be written, and the second virtual machine can be used to read the data to be written, so that when one of the virtual machines fails, the other, non-failed virtual machine can provide the client with services for reading and writing the data to be written, improving system availability.
  • If the data to be written is written through the operating system of the first virtual machine, and the client needs to read the data to be written through the operating system of the second virtual machine, the file information recorded in the operating system of the second virtual machine needs to be generated or updated according to the metadata of the data to be written, so that the operating system of the second virtual machine can read the data to be written from the storage area.
  • The file information is used by the operating system of the second virtual machine to read the data to be written from the storage area. There are two ways to update the file information in the operating system of the second virtual machine. First, if the operating system of the second virtual machine can learn of changes to the data in the storage area, it can update the file information of the data to be written by itself. Second, the operating system of the second virtual machine can update the file information of the data to be written according to the metadata of the data to be written sent by the first virtual machine.
  • If the first virtual machine writes the data to be written directly to the storage area, rather than through the operating system of the first virtual machine, the client can, through the second virtual machine, read the data to be written in the shared storage area directly, without going through the operating system of the second virtual machine.
  • In this case, the second virtual machine does not need to generate or update the file information recorded in its operating system according to the metadata of the data to be written; the second virtual machine can read the data to be written from the storage area according to the metadata of the data to be written.
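  • The two read paths on the second virtual machine can be contrasted in a short sketch. Everything here is a simplifying assumption made for illustration: the shared storage area is modeled as a `bytearray`, the metadata carries an `offset` and `length` as its storage location, and the operating system's file information is a plain dictionary keyed by path.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    name: str
    directory: str
    offset: int   # assumed form of the storage location
    length: int


@dataclass
class SecondVirtualMachine:
    storage: bytearray                 # the shared storage area
    reads_through_os: bool
    os_file_info: dict = field(default_factory=dict)   # path -> (offset, length)
    received: dict = field(default_factory=dict)       # path -> Metadata

    def receive_metadata(self, meta: Metadata) -> None:
        path = f"{meta.directory}/{meta.name}"
        self.received[path] = meta
        if self.reads_through_os:
            # Operating-system path: generate or update the recorded file information.
            self.os_file_info[path] = (meta.offset, meta.length)

    def read(self, path: str) -> bytes:
        if self.reads_through_os:
            offset, length = self.os_file_info[path]
        else:
            # Direct path: locate the data from the metadata itself.
            meta = self.received[path]
            offset, length = meta.offset, meta.length
        return bytes(self.storage[offset:offset + length])


storage = bytearray(b"....hello")
meta = Metadata("a.txt", "/docs", offset=4, length=5)
os_vm = SecondVirtualMachine(storage, reads_through_os=True)
direct_vm = SecondVirtualMachine(storage, reads_through_os=False)
for vm in (os_vm, direct_vm):
    vm.receive_metadata(meta)
print(os_vm.read("/docs/a.txt"), direct_vm.read("/docs/a.txt"))  # b'hello' b'hello'
```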
  • The name node needs to indicate, by the response message, that the first virtual machine is the virtual machine having the right to write data to the storage area among the plurality of virtual machines, and that the second virtual machine is a virtual machine other than the first virtual machine among the plurality of virtual machines. That is, after S302, the client obtains not only the address of the first virtual machine and the address of the second virtual machine, but also learns that the first virtual machine has the right to write data to the storage area and that the second virtual machine has the right to read the data to be written from the storage area.
  • the manner in which the name node notifies the client of the rights of the first virtual machine and the second virtual machine includes but is not limited to the following two types:
  • In a first manner, the response message sent by the name node to the client further includes a write permission identifier of the first virtual machine and a read permission identifier of the second virtual machine, where the write permission identifier specifies that the first virtual machine has the right to write the data to be written to the storage area, and the read permission identifier specifies that the second virtual machine has the right to read the data to be written from the storage area.
  • The client learns the permissions of the first virtual machine and the second virtual machine according to the write permission identifier and the read permission identifier, sends the write permission identifier to the first virtual machine, and sends the read permission identifier to the second virtual machine; that is, the client sends each virtual machine its own permission.
  • The client may send the write permission identifier to the first virtual machine before S303, after S303, or simultaneously with S303; in the last case, the write permission identifier, the data to be written, and the address of the second virtual machine are sent to the first virtual machine together.
  • the embodiment of the present invention does not limit the order of execution of the two steps.
  • the client can send the read permission identifier to the second virtual machine.
  • In a second manner, the address of the first virtual machine and the address of the second virtual machine are arranged according to a preset rule in the response message, where the preset rule specifies that the first virtual machine has the right to write the data to be written to the storage area, and that the second virtual machine has the right to read the data to be written from the storage area.
  • The preset rule may be the order of the addresses of the virtual machines included in the response message.
  • For example, the name node and the client may agree in advance that the first address sent by the name node to the client is the address of the first virtual machine. After receiving the addresses of the multiple virtual machines included in the response message, the client can then determine that the first address is the address of the first virtual machine, which has the right to write data to the storage area, and that the remaining addresses are the addresses of second virtual machines, which have the right to read the data to be written from the storage area.
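  • The two ways of conveying permissions can be illustrated with a client-side parsing sketch. The message layouts below (a `vms` list with a `permission` field, and a bare `addresses` list) are hypothetical formats chosen for the example; the embodiment does not define a wire format.

```python
def parse_with_identifiers(response: dict) -> tuple:
    """First manner: each address carries an explicit permission identifier."""
    writers = [vm["address"] for vm in response["vms"] if vm["permission"] == "write"]
    readers = [vm["address"] for vm in response["vms"] if vm["permission"] == "read"]
    if len(writers) != 1:
        raise ValueError("exactly one virtual machine must hold the write permission")
    return writers[0], readers


def parse_by_order(response: dict) -> tuple:
    """Second manner: by prior agreement, the first address is the writer."""
    first, *second = response["addresses"]
    return first, second


explicit = {
    "vms": [
        {"address": "10.0.0.2", "permission": "read"},
        {"address": "10.0.0.1", "permission": "write"},
        {"address": "10.0.0.3", "permission": "read"},
    ]
}
ordered = {"addresses": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]}

print(parse_with_identifiers(explicit))
print(parse_by_order(ordered))
```

  • With explicit identifiers the addresses may appear in any order, whereas the order-based rule keeps the message smaller but depends on both sides agreeing on the convention in advance.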
  • The methods for detecting a failure of a virtual machine include but are not limited to the following three. First, the name node detects that a virtual machine has failed. Second, when the client reads or writes data through a virtual machine and the read or write cannot be completed, the client determines that the virtual machine is faulty and reports the failure of the virtual machine to the name node. Third, the virtual machine periodically performs a self-test; when the virtual machine discovers a fault, it reports a fault message to the name node, either directly or through the client. Therefore, when a virtual machine in the distributed file system fails, the name node can learn of the failure in any of the above three ways and then take corresponding measures, so as to avoid the situation in which the distributed file system cannot provide data read and write services to the client.
  • the failure of the virtual machine of the distributed file system may be classified into the following two cases:
  • When the first virtual machine fails, the name node sends a first update message to the client. The first update message includes the address of an updated first virtual machine and specifies that another virtual machine, other than the failed first virtual machine among the plurality of virtual machines, serves as the updated first virtual machine; that is, it indicates that the updated first virtual machine has the right to write data to the storage area shared by the plurality of virtual machines. Thus, when the client needs to write data, it can write the data through the updated first virtual machine.
  • the name node specifies another virtual machine, other than the failed first virtual machine among the plurality of virtual machines, as the updated first virtual machine; that is, it indicates that the updated first virtual machine has the right to write data to the storage area shared by the plurality of virtual machines. When the client wants to write data, it can write through the updated first virtual machine; when the client wants to read data, it can read through the second virtual machine or through the updated first virtual machine. Therefore, a failure of the first virtual machine does not affect the writing or reading of data by the client, which improves the availability of the system.
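The handover of the write right can be sketched as follows. The function name, the message fields, and the policy of picking the first remaining virtual machine are assumptions made for illustration; the description only requires that some other virtual machine among the plurality be designated.

```python
def fail_over_writer(vm_addresses, failed_writer):
    """Return a hypothetical first update message naming the updated first VM."""
    # Candidates are the other virtual machines that share the storage area.
    candidates = [addr for addr in vm_addresses if addr != failed_writer]
    if not candidates:
        raise RuntimeError("no other virtual machine can take over writing")
    updated_writer = candidates[0]  # assumed policy: first remaining VM
    return {"type": "first_update", "updated_first_vm": updated_writer}


vms = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
message = fail_over_writer(vms, failed_writer="10.0.0.1")
print(message)
```

Because every candidate already shares the same storage area, no data has to be copied when the write right moves to another virtual machine.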
  • the number of the second virtual machines may be one or more.
  • the following method may be further performed after executing the foregoing method.
  • the name node specifies one or more updated second virtual machines from outside the plurality of virtual machines included in the distributed file system; each updated second virtual machine has the right to read the data to be written from the storage area and shares the same storage area with the plurality of virtual machines included in the distributed file system; the name node indicates to the client that the updated second virtual machine has the right to read the data to be written from the storage area.
  • after receiving the indication of the name node, the client notifies the updated first virtual machine to send the metadata of the data to be written to the updated second virtual machine. In this way, when the client wants to read the data to be written, it can read not only through the second virtual machine but also through the updated second virtual machine.
  • because the name node specifies that the updated second virtual machine, from outside the plurality of virtual machines included in the distributed file system, has the right to read the data to be written from the storage area, the client can read the data to be written not only through the second virtual machine but also through the updated second virtual machine, which improves the efficiency with which the client reads the data to be written.
  • when the second virtual machine fails, the name node sends second update information to the client, where the second update information includes the address of the updated second virtual machine and specifies that another virtual machine, outside the plurality of virtual machines, serves as the updated second virtual machine; the updated second virtual machine has the right to read the data to be written from the storage area and shares the same storage area with the plurality of virtual machines included in the distributed file system. According to the indication of the name node, the client notifies the first virtual machine to send the metadata of the data to be written to the updated second virtual machine.
  • the first virtual machine may send the metadata of the data to be written to the updated second virtual machine according to the notification message. In this way, when the client wants to read the data to be written, it can read not only through a second virtual machine that has not failed or through the first virtual machine, but also through the updated second virtual machine.
  • the name node specifies that the updated second virtual machine has the right to read the data to be written from the storage area. When the client wants to write the data to be written, it can write through the first virtual machine; when the client wants to read the data to be written, it can read not only through the first virtual machine and a second virtual machine that has not failed, but also through the updated second virtual machine.
  • when the first virtual machine fails, the writing or reading of the data to be written by the client is not affected, thereby improving the availability of the system.
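Replacing a failed second virtual machine can be sketched as below. `ReaderVM`, `replace_failed_reader`, the message fields, and the metadata layout (offset and length within the shared storage area) are hypothetical; the description does not fix a metadata format.

```python
class ReaderVM:
    """Hypothetical second virtual machine holding only metadata, not file data."""

    def __init__(self, address):
        self.address = address
        self.metadata = {}  # file name -> location of the data in the shared area


def replace_failed_reader(readers, failed_address, spare_address, writer_metadata):
    # Drop the failed second VM and bring in one from outside the original set.
    survivors = [vm for vm in readers if vm.address != failed_address]
    updated = ReaderVM(spare_address)
    # The first VM sends its metadata so the newcomer can locate existing data.
    updated.metadata = {name: dict(meta) for name, meta in writer_metadata.items()}
    second_update = {"type": "second_update", "updated_second_vm": spare_address}
    return survivors + [updated], second_update


writer_metadata = {"report.txt": {"offset": 0, "length": 11}}
readers = [ReaderVM("10.0.0.2"), ReaderVM("10.0.0.3")]
readers, update = replace_failed_reader(
    readers, "10.0.0.3", "10.0.0.9", writer_metadata
)
print([vm.address for vm in readers], update)
```

Only metadata is transferred to the updated second virtual machine; the file data itself stays where it is in the shared storage area.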
  • the method for storing files in the distributed file system provided by the embodiment of the present invention can solve the problem of redundancy of files stored in the distributed file system.
  • when a virtual machine fails, the method for storing files in the distributed file system provided by the embodiment of the present invention does not affect the client's reading or writing of files, thereby improving system availability.
  • the embodiment of the present invention provides a first virtual machine in a distributed file system, where the distributed file system includes a name node and a plurality of virtual machines serving as data nodes, the plurality of virtual machines share the same storage area, and the first virtual machine is the virtual machine, among the plurality of virtual machines, that is designated by the name node to have the right to write data to the storage area.
  • the first virtual machine 500 includes:
  • the receiving module 501 is configured to receive data to be written sent by the client, and an address of the second virtual machine, where the second virtual machine is a virtual machine other than the first virtual machine.
  • the processing module 502 is configured to write, to the storage area, the data to be written received by the receiving module 501, and generate or update metadata of the data to be written;
  • the sending module 503 is configured to send the metadata generated or updated by the processing module 502 to the second virtual machine according to the address of the second virtual machine received by the receiving module 501.
  • the receiving module 501 is further configured to: before the processing module 502 writes the data to be written to the storage area, receive the write permission identifier of the first virtual machine sent by the client.
  • the write permission identifier is sent by the name node to the client when the client requests the name node to write the data to be written to the distributed file system, and the write permission identifier is used to specify that the first virtual machine has the right to write the data to be written to the storage area.
  • multiple virtual machines mount the same virtual hard disk provided by the distributed block storage system, and the virtual hard disk includes a storage area.
  • if the second virtual machine reads the data to be written through its own operating system, the metadata is used by the second virtual machine to generate or update file information recorded in its own operating system, and the file information is used by the operating system to read the data to be written from the storage area; or, if the second virtual machine itself reads the data to be written, the metadata is used by the second virtual machine to read the data to be written from the storage area.
  • the second virtual machine is designated by the name node to have read access to the data to be written from the storage area.
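The cooperation of the receiving, processing, and sending modules can be sketched as a single write handler. This is an illustration only: the `bytearray` stands in for the shared storage area, the `network` dictionary stands in for message delivery to other virtual machines, and the metadata fields `offset` and `length` are assumed.

```python
class FirstVM:
    """Hypothetical first virtual machine: the only writer to the shared area."""

    def __init__(self, storage, network):
        self.storage = storage  # bytearray standing in for the shared storage area
        self.network = network  # address -> list of metadata messages delivered
        self.metadata = {}

    def handle_write(self, file_name, data, second_vm_address, has_write_permission):
        # Receiving module: data, the second VM's address, and the write permission.
        if not has_write_permission:
            raise PermissionError("this virtual machine may not write to the storage area")
        # Processing module: append the data and generate or update its metadata.
        offset = len(self.storage)
        self.storage.extend(data)
        meta = {"offset": offset, "length": len(data)}
        self.metadata[file_name] = meta
        # Sending module: forward only the metadata to the second VM.
        self.network[second_vm_address].append((file_name, dict(meta)))
        return meta


storage = bytearray()
network = {"10.0.0.2": []}
vm = FirstVM(storage, network)
vm.handle_write("a.txt", b"hello", "10.0.0.2", True)
meta = vm.handle_write("b.txt", b"world!", "10.0.0.2", True)
print(bytes(storage), meta)
```

The data is written once, and the second virtual machine receives a small metadata record rather than a copy of the data.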
  • the first virtual machine 500 provided by the embodiment of the present invention can solve the problem of redundancy of files stored in the distributed file system.
  • the operation of the first virtual machine 500 provided by the embodiment of the present invention can prevent the failure of a virtual machine from affecting the client's reading or writing of files, thereby improving system availability.
  • the first virtual machine 500 provided by the embodiment of the present invention may be used to perform the operations performed by the first virtual machine in the method for storing files in the distributed file system shown in FIG. 3; for implementations of the first virtual machine 500 that are not explained and described in detail here, refer to the related description of the method for storing files in the distributed file system shown in FIG. 3.
  • each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the embodiment of the present invention further provides a first virtual machine, and the first virtual machine may perform the method provided by the embodiment corresponding to FIG. 3, which may be the same as the first virtual machine 500 shown in FIG. 5.
  • the device where the first virtual machine 600 is located includes at least one processor 601, a memory 602, and a communication interface 603; the at least one processor 601, the memory 602, and the communication interface 603 are all connected by a bus 604;
  • the memory 602 is configured to store a computer execution instruction
  • the at least one processor 601 is configured to execute the computer execution instruction stored in the memory 602, so that the first virtual machine 600 performs data interaction with other devices in the distributed file system through the communication interface 603 to perform the method for storing files in a distributed file system provided by the foregoing embodiment, or so that the first virtual machine 600 performs data interaction with other devices in the distributed file system through the communication interface 603 to implement some or all of the functions of the distributed file system.
  • the at least one processor 601 may include processors 601 of different types or of the same type; the processor 601 may be any one of the following: a central processing unit (CPU), an ARM processor, a field programmable gate array (FPGA), a dedicated processor, or another device with computational processing capability. In an optional implementation manner, the at least one processor 601 may also be integrated into a many-core processor.
  • the memory 602 may be any one or any combination of the following storage media: a random access memory (RAM), a read-only memory (ROM), a non-volatile memory (NVM), a solid state drive (SSD), a mechanical hard disk, a magnetic disk, a disk array, and the like.
  • the communication interface 603 is used by the first virtual machine 600 to perform data interaction with other devices, such as other devices in a distributed file system.
  • the communication interface 603 may be any one or any combination of the following: a network interface (such as an Ethernet interface), a wireless network card, or the like having a network access function.
  • the bus 604 may include an address bus, a data bus, a control bus, and the like; for ease of representation, FIG. 6 shows the bus as a thick line.
  • the bus 604 may be any one or any combination of the following devices for wired data transmission: an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, and the like.
  • the embodiment of the present invention provides a name node in a distributed file system, where the distributed file system includes a name node and multiple virtual machines serving as data nodes, and the multiple virtual machines share the same storage area; as shown in FIG. 7, the name node 700 includes:
  • the receiving module 701 is configured to receive a request message that the client requests to write data to be written to the distributed file system;
  • the sending module 702 is configured to send, to the client, a response message corresponding to the request message received by the receiving module 701, where the response message includes an address of the first virtual machine and an address of the second virtual machine, and the response message indicates that the first virtual machine is the virtual machine, among the multiple virtual machines, that has the right to write data to the storage area, and that the second virtual machine is a virtual machine other than the first virtual machine among the multiple virtual machines.
  • the response message further indicates that the second virtual machine has the right to read the data to be written from the storage area.
  • the response message further includes a write permission identifier of the first virtual machine and a read permission identifier of the second virtual machine, where the write permission identifier is used to specify that the first virtual machine has the right to write the data to be written to the storage area, and the read permission The identifier is used to specify that the second virtual machine has the right to read data to be written from the storage area.
  • the address of the first virtual machine and the address of the second virtual machine in the response message are arranged according to a preset rule, where the preset rule is used to specify that the first virtual machine has the right to write the data to be written to the storage area and that the second virtual machine has the right to read the data to be written from the storage area.
  • multiple virtual machines mount the same virtual hard disk provided by the distributed block storage system, and the virtual hard disk includes a storage area.
  • the sending module 702 is further configured to: when the first virtual machine fails, send first update information to the client, where the first update information includes the address of the updated first virtual machine and specifies another virtual machine, other than the failed first virtual machine among the multiple virtual machines, as the updated first virtual machine, and the updated first virtual machine has the right to write data to the storage area; and/or, when the second virtual machine fails, send second update information to the client, where the second update information includes the address of the updated second virtual machine and specifies another virtual machine, outside the multiple virtual machines, as the updated second virtual machine, and the updated second virtual machine has the right to read the data to be written from the storage area.
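The two ways of conveying permissions in the response message (explicit permission identifiers, or an ordering of addresses according to a preset rule) can be contrasted in a short sketch. The message layout and the rule "the first address is the writer, the rest are readers" are assumptions made for illustration; the description leaves the concrete rule open.

```python
def build_response(writer, readers, explicit=True):
    """Build a hypothetical response message for a write request."""
    if explicit:
        # Variant 1: each address carries a write or read permission identifier.
        permissions = {writer: "write"}
        permissions.update({reader: "read" for reader in readers})
        return {"addresses": permissions}
    # Variant 2: assumed preset rule, the first address is the writer.
    return {"addresses": [writer, *readers]}


def parse_response(response):
    """Recover (writer, readers) from either variant, as a client would."""
    addresses = response["addresses"]
    if isinstance(addresses, dict):
        writer = next(a for a, p in addresses.items() if p == "write")
        readers = [a for a, p in addresses.items() if p == "read"]
    else:
        writer, readers = addresses[0], list(addresses[1:])
    return writer, readers


explicit_result = parse_response(build_response("10.0.0.1", ["10.0.0.2", "10.0.0.3"]))
ordered_result = parse_response(
    build_response("10.0.0.1", ["10.0.0.2", "10.0.0.3"], explicit=False)
)
print(explicit_result, ordered_result)
```

The ordered variant keeps the message smaller, while the explicit variant does not depend on both sides agreeing on the rule in advance.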
  • the name node 700 provided by the embodiment of the present invention can solve the problem of redundancy of file shares stored in the distributed file system.
  • the operation of the name node 700 provided by the embodiment of the present invention can prevent the failure of a virtual machine from affecting the client's reading or writing of files, thereby improving system availability.
  • the name node 700 provided by the embodiment of the present invention can be used to perform the operations performed by the name node in the method for storing files in the distributed file system shown in FIG. 3; for implementations of the name node 700 that are not explained and described in detail here, refer to the related description of the method for storing files in the distributed file system shown in FIG. 3.
  • the embodiment of the present invention further provides a name node, which can perform the method provided by the embodiment corresponding to FIG. 3, and can be the same as the name node 700 shown in FIG. 7.
  • the name node 800 includes at least one processor 801, a memory 802, and a communication interface 803; the at least one processor 801, the memory 802, and the communication interface 803 are each connected by a bus 804;
  • the memory 802 is configured to store a computer execution instruction
  • the at least one processor 801 is configured to execute the computer execution instruction stored in the memory 802, so that the name node 800 performs data interaction with other devices in the distributed file system through the communication interface 803 to perform the method for storing files in a distributed file system provided by the foregoing embodiment.
  • the at least one processor 801 may include processors 801 of different types or of the same type; the processor 801 may be any one of the following: a CPU, an ARM processor, an FPGA, a dedicated processor, or another device with computational processing capability. In an optional implementation manner, the at least one processor 801 may also be integrated into a multi-core processor.
  • the memory 802 may be any one or any combination of the following: a storage medium such as a RAM, a ROM, an NVM, an SSD, a mechanical hard disk, a magnetic disk, a disk array, or the like.
  • Communication interface 803 is used by name node 800 to interact with other devices, such as other devices in a distributed file system.
  • the communication interface 803 may be any one or any combination of the following: a network interface (such as an Ethernet interface), a wireless network card, and the like having a network access function.
  • the bus 804 may include an address bus, a data bus, a control bus, and the like; for ease of representation, FIG. 8 shows the bus as a thick line.
  • the bus 804 may be any one or any combination of the following: a device for wired data transmission such as an ISA bus, a PCI bus, or an EISA bus.
  • the embodiment of the present invention provides a client, where the distributed file system in which the client is located includes a name node and multiple virtual machines serving as data nodes, and the multiple virtual machines share the same storage area; as shown in FIG. 9, the client 900 includes:
  • the sending module 901 is configured to send, to the name node, a request message that requests to write data to be written to the distributed file system;
  • the receiving module 902 is configured to receive a response message that corresponds to the request message and is sent by the name node, where the response message includes an address of the first virtual machine and an address of the second virtual machine, and the response message indicates that the first virtual machine is the virtual machine, among the multiple virtual machines, that has the right to write data to the storage area, and that the second virtual machine is a virtual machine other than the first virtual machine among the multiple virtual machines;
  • the sending module 901 is further configured to send, according to the address of the first virtual machine included in the response message received by the receiving module 902, the data to be written and the address of the second virtual machine to the first virtual machine, and to instruct the first virtual machine to write the data to be written to the storage area, generate or update the metadata of the data to be written, and send the metadata of the data to be written to the second virtual machine according to the address of the second virtual machine included in the response message received by the receiving module 902.
  • the response message further includes a write permission identifier of the first virtual machine and a read permission identifier of the second virtual machine, where the write permission identifier is used to specify that the first virtual machine has the right to write the data to be written to the storage area, and the read permission The identifier is used to specify that the second virtual machine has the right to read data to be written from the storage area.
  • the address of the first virtual machine and the address of the second virtual machine included in the response message are arranged according to a preset rule, where the preset rule is used to specify that the first virtual machine has the right to write the data to be written to the storage area and that the second virtual machine has the right to read the data to be written from the storage area.
  • multiple virtual machines mount the same virtual hard disk provided by the distributed block storage system, and the virtual hard disk includes a storage area.
  • the response message further indicates that the second virtual machine has the right to read the data to be written from the storage area.
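The client-side sequence (request to the name node, then a single message to the first virtual machine) can be sketched as follows. The callables `ask_name_node` and `send_to_vm` and all field names are hypothetical stand-ins for whatever transport an implementation uses.

```python
def client_write(ask_name_node, send_to_vm, file_name, data):
    """Hypothetical client write path: ask the name node, then contact the first VM."""
    # Sending module: request to write data to the distributed file system.
    response = ask_name_node({"op": "write", "file": file_name})
    # Receiving module: the response carries both virtual machine addresses.
    first_vm = response["first_vm"]
    payload = {
        "file": file_name,
        "data": data,
        # The first VM needs this address to forward the metadata afterwards.
        "second_vm": response["second_vm"],
    }
    send_to_vm(first_vm, payload)
    return first_vm


sent = []


def fake_name_node(request):
    # Stand-in name node that always designates the same two virtual machines.
    return {"first_vm": "10.0.0.1", "second_vm": "10.0.0.2"}


def fake_send(address, payload):
    # Stand-in transport that records what would have been sent.
    sent.append((address, payload))


target = client_write(fake_name_node, fake_send, "a.txt", b"hello")
print(target, sent[0][1]["second_vm"])
```

The client never contacts the second virtual machine during a write; it only passes that address along so that the first virtual machine can deliver the metadata.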
  • the client 900 provided by the embodiment of the present invention can solve the problem of redundancy of files stored in the distributed file system.
  • the operation of the client 900 provided by the embodiment of the present invention can prevent the failure of a virtual machine from affecting the client's reading or writing of files, thereby improving system availability.
  • the client 900 provided by the embodiment of the present invention may be used to perform the operations performed by the client in the method for storing files in the distributed file system shown in FIG. 3; for implementations of the client 900 that are not explained and described in detail here, refer to the related description of the method for storing files in the distributed file system shown in FIG. 3.
  • the embodiment of the present invention further provides a client, which can perform the method provided by the embodiment corresponding to FIG. 3 and can be the same as the client 900 shown in FIG. 9.
  • the device where the client 1000 is located includes at least one processor 1001, a memory 1002, and a communication interface 1003; the at least one processor 1001, the memory 1002, and the communication interface 1003 are all connected by a bus 1004;
  • the memory 1002 is configured to store a computer execution instruction
  • the at least one processor 1001 is configured to execute the computer execution instruction stored in the memory 1002, so that the client 1000 performs data interaction with devices in the distributed file system through the communication interface 1003 to perform the method for storing files in a distributed file system provided by the foregoing embodiment.
  • the at least one processor 1001 may include processors 1001 of different types or of the same type; the processor 1001 may be any one of the following: a CPU, an ARM processor, an FPGA, a dedicated processor, or another device with computational processing capability. In an optional implementation manner, the at least one processor 1001 may also be integrated into a many-core processor.
  • the memory 1002 may be any one or any combination of the following: a storage medium such as a RAM, a ROM, an NVM, an SSD, a mechanical hard disk, a magnetic disk, a disk array, or the like.
  • Communication interface 1003 is used by client 1000 to perform data interaction with other devices, such as other devices in a distributed file system.
  • the communication interface 1003 may be any one or any combination of the following: a network interface (such as an Ethernet interface), a wireless network card, and the like having a network access function.
  • the bus 1004 may include an address bus, a data bus, a control bus, and the like; for ease of representation, FIG. 10 shows the bus as a thick line.
  • the bus 1004 may be any one or any combination of the following: a device for wired data transmission such as an ISA bus, a PCI bus, or an EISA bus.
  • the embodiment of the present invention provides a second virtual machine in a distributed file system, where the distributed file system includes a name node and multiple virtual machines serving as data nodes, and the multiple virtual machines share the same storage area; as shown in FIG. 11, the second virtual machine 1100 includes:
  • the receiving module 1101 is configured to receive metadata sent by the first virtual machine, where the first virtual machine is the virtual machine, among the multiple virtual machines, that is designated by the name node to have the right to write data to the storage area, the second virtual machine is a virtual machine other than the first virtual machine among the multiple virtual machines, and the metadata is the metadata of the data to be written that is generated or updated after the first virtual machine writes the data to be written to the storage area.
  • the receiving module 1101 is further configured to: before receiving the metadata sent by the first virtual machine, receive a read permission identifier of the second virtual machine sent by the client, where the read permission identifier is sent by the name node to the client when the client requests the name node to write the data to be written to the distributed file system, and the read permission identifier is used to specify that the second virtual machine has the right to read the data to be written from the storage area.
  • multiple virtual machines mount the same virtual hard disk provided by the distributed block storage system, and the virtual hard disk includes a storage area.
  • the second virtual machine further includes a processing module 1102, configured to: after the receiving module 1101 receives the metadata sent by the first virtual machine, if the second virtual machine reads the data to be written through its own operating system, generate or update, according to the metadata, file information recorded in its own operating system, where the file information is used by the operating system to read the data to be written from the storage area; or, if the second virtual machine itself reads the data to be written, read the data to be written from the storage area according to the metadata.
  • the second virtual machine is designated by the name node to have read access to the data to be written from the storage area.
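The read path of the second virtual machine can be sketched as below, covering only the variant in which the virtual machine reads from the storage area according to the metadata. The class name, the `file_table` attribute, and the offset/length metadata layout are assumptions for illustration.

```python
class SecondVM:
    """Hypothetical second virtual machine: reads from the shared storage area."""

    def __init__(self, storage):
        self.storage = storage  # the same shared storage area the first VM writes to
        self.file_table = {}    # file name -> metadata received from the first VM

    def receive_metadata(self, file_name, meta):
        # Receiving module: generate or update the local record for this file.
        self.file_table[file_name] = dict(meta)

    def read(self, file_name):
        # Processing module: locate the data through the metadata and read it.
        meta = self.file_table[file_name]
        start = meta["offset"]
        return bytes(self.storage[start:start + meta["length"]])


shared = bytearray(b"hello world")
reader = SecondVM(shared)
reader.receive_metadata("greeting.txt", {"offset": 6, "length": 5})
content = reader.read("greeting.txt")
print(content)
```

A file becomes readable on the second virtual machine as soon as its metadata arrives, because the data is already present in the shared storage area.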
  • the second virtual machine 1100 provided by the embodiment of the present invention can solve the problem of redundancy of file shares stored in the distributed file system.
  • the operation of the second virtual machine 1100 provided by the embodiment of the present invention can prevent the failure of a virtual machine from affecting the client's reading or writing of files, thereby improving system availability.
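  • the second virtual machine 1100 provided by the embodiment of the present invention may be used to perform the operations performed by the second virtual machine in the method for storing files in the distributed file system shown in FIG. 3; for implementations of the second virtual machine 1100 that are not explained and described in detail here, refer to the related description of the method for storing files in the distributed file system shown in FIG. 3.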
  • the embodiment of the present invention further provides a second virtual machine, which may perform the method provided by the embodiment corresponding to FIG. 3 and may be the same as the second virtual machine 1100 shown in FIG. 11.
  • the device where the second virtual machine 1200 is located includes at least one processor 1201, a memory 1202, and a communication interface 1203; the at least one processor 1201, the memory 1202, and the communication interface 1203 are all connected by a bus 1204;
  • the memory 1202 is configured to store a computer execution instruction
  • the at least one processor 1201 is configured to execute the computer execution instruction stored in the memory 1202, so that the second virtual machine 1200 performs data interaction with other devices in the distributed file system through the communication interface 1203 to perform the method for storing files in a distributed file system provided by the foregoing embodiment, or so that the second virtual machine 1200 performs data interaction with other devices in the distributed file system through the communication interface 1203 to implement some or all of the functions of the distributed file system.
  • the at least one processor 1201 may include processors 1201 of different types or of the same type; the processor 1201 may be any one of the following: a CPU, an ARM processor, an FPGA, a dedicated processor, or another device with computational processing capability. In an optional implementation manner, the at least one processor 1201 may also be integrated into a many-core processor.
  • the memory 1202 may be any one or any combination of the following: a storage medium such as a RAM, a ROM, an NVM, an SSD, a mechanical hard disk, a magnetic disk, a disk array, or the like.
  • Communication interface 1203 is used by second virtual machine 1200 to perform data interaction with other devices, such as other devices in a distributed file system.
  • the communication interface 1203 may be any one or any combination of the following: a network interface (such as an Ethernet interface), a wireless network card, and the like having a network access function.
  • the bus 1204 may include an address bus, a data bus, a control bus, and the like; for ease of representation, FIG. 12 shows the bus as a thick line.
  • the bus 1204 may be any one or any combination of the following: a device for wired data transmission such as an ISA bus, a PCI bus, or an EISA bus.
  • the embodiment of the present invention provides a distributed file system.
  • the distributed file system 1300 includes a first virtual machine 1301, a name node 1302, a client 1303, and a second virtual machine 1304.
  • the first virtual machine 1301 in the distributed file system 1300 can be used to perform the operations performed by the first virtual machine in the method for storing files in the distributed file system shown in FIG. 3, and may specifically be the first virtual machine 500 shown in FIG. 5 or the first virtual machine 600 shown in FIG. 6; the name node 1302 in the distributed file system 1300 can be used to perform the operations performed by the name node in that method, and may specifically be the name node 700 shown in FIG. 7 or the name node 800 shown in FIG. 8; the client 1303 in the distributed file system 1300 can be used to perform the operations performed by the client in that method, and may specifically be the client 900 shown in FIG. 9 or the client 1000 shown in FIG. 10; the second virtual machine 1304 in the distributed file system 1300 can be used to perform the operations performed by the second virtual machine in that method, and may specifically be the second virtual machine 1100 shown in FIG. 11 or the second virtual machine 1200 shown in FIG. 12.
  • the data to be written is saved only in a storage area shared by a plurality of virtual machines, which solves the problem of redundancy of files stored in the distributed file system.
  • when a virtual machine in the distributed file system 1300 fails, the client can still perform a write operation or a read operation on the data to be written through a virtual machine that has not failed, thereby improving the availability of the distributed file system.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on a computer or other programmable device to produce computer-implemented processing for execution on a computer or other programmable device.
  • the instructions provide steps for implementing the functions specified in one or more of the flow or in a block or blocks of a flow diagram.
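
The availability point made in the excerpts above (the data is held once in a shared storage area and remains reachable through any non-failed virtual machine) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `VirtualMachine`, `VirtualMachineFailed`, and `read_with_failover`, the use of a plain dictionary as the shared storage area, and the `failed` flag are all hypothetical and do not come from the patent text.

```python
class VirtualMachineFailed(Exception):
    """Raised when a request is sent to a virtual machine that has failed."""


class VirtualMachine:
    """Hypothetical stand-in for a virtual machine in the distributed file system."""

    def __init__(self, name, shared_area, metadata):
        self.name = name
        # Assumption: every VM holds a reference to the SAME storage area,
        # so the file data exists only once.
        self.shared_area = shared_area
        # Each VM keeps its own copy of the metadata.
        self.metadata = metadata
        self.failed = False

    def read(self, file_name):
        if self.failed:
            raise VirtualMachineFailed(self.name)
        if file_name not in self.metadata:
            raise KeyError(file_name)
        return self.shared_area[file_name]


def read_with_failover(vms, file_name):
    """Try each VM in order and return (vm name, data) from the first non-failed one."""
    for vm in vms:
        try:
            return vm.name, vm.read(file_name)
        except VirtualMachineFailed:
            continue
    raise VirtualMachineFailed("all virtual machines have failed")
```

For example, if `vm1` and `vm2` share one storage dictionary and `vm1.failed` is set to `True`, `read_with_failover([vm1, vm2], "a.txt")` returns the data through `vm2` without any second copy of the file having been stored.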

Abstract

The invention relates to a method for storing a file, a first virtual machine, and a name node, which are used to solve the problem of redundant files existing when a distributed file system stores files, and which improve system availability. The method comprises the following steps: a client sends a name node a request message requesting that data to be written be written into a distributed file system; the name node sends the client a response message corresponding to the request message, the response message comprising the address of a first virtual machine and the address of a second virtual machine, and indicating that the first virtual machine is the virtual machine, among a plurality of virtual machines, that has the right to write data into a storage area, and that the second virtual machine is a virtual machine among the plurality of virtual machines other than the first virtual machine; the client sends the data to be written and the address of the second virtual machine to the first virtual machine; the first virtual machine writes the data to be written into the storage area shared by the plurality of virtual machines, and generates or updates the metadata of the data to be written; the first virtual machine sends the generated or updated metadata to the second virtual machine.
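
The message flow described in the abstract can be sketched in Python as follows. This is a simplified, in-process illustration and not the patented implementation: the class and function names (`SharedStorage`, `VirtualMachine`, `NameNode`, `client_write`), the metadata layout (`{"length": ...}`), and the lookup of virtual machines by address through the name node object (standing in for network transport) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Storage area shared by all virtual machines; each file is stored once."""
    blocks: dict = field(default_factory=dict)


@dataclass
class VirtualMachine:
    address: str
    storage: SharedStorage
    metadata: dict = field(default_factory=dict)

    def receive_metadata(self, name, meta):
        # A second virtual machine only records metadata; it stores no data copy.
        self.metadata[name] = meta

    def write(self, name, data, peers):
        # The first virtual machine writes the data into the shared storage area,
        self.storage.blocks[name] = data
        # generates (or updates) the metadata of the written data,
        meta = {"length": len(data)}  # hypothetical metadata layout
        self.metadata[name] = meta
        # and sends the metadata to each second virtual machine.
        for peer in peers:
            peer.receive_metadata(name, meta)


class NameNode:
    def __init__(self, vms, writer_address):
        self.vms = {vm.address: vm for vm in vms}
        # Address of the one VM that holds the right to write to the storage area.
        self.writer_address = writer_address

    def handle_write_request(self):
        # Response message: address of the first VM and addresses of the others.
        others = [addr for addr in self.vms if addr != self.writer_address]
        return self.writer_address, others


def client_write(name_node, name, data):
    first_addr, second_addrs = name_node.handle_write_request()
    # Address resolution via the name node object replaces real networking here.
    first_vm = name_node.vms[first_addr]
    peers = [name_node.vms[addr] for addr in second_addrs]
    first_vm.write(name, data, peers)
    return first_addr, second_addrs
```

With two virtual machines `vm-1` (the writer) and `vm-2` sharing one `SharedStorage`, a call such as `client_write(name_node, "f.txt", b"hello")` leaves a single copy of the data in the shared area and identical metadata on both virtual machines, which is the property the abstract relies on to avoid redundant files.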
PCT/CN2017/085351 2016-09-23 2017-05-22 Method for storing a file, first virtual machine, and name node WO2018054079A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610846967.0A CN106446159B (zh) 2016-09-23 2016-09-23 一种存储文件的方法、第一虚拟机及名称节点
CN201610846967.0 2016-09-23

Publications (1)

Publication Number Publication Date
WO2018054079A1 (fr) 2018-03-29

Family

ID=58167356

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/085351 WO2018054079A1 (fr) 2016-09-23 2017-05-22 Method for storing a file, first virtual machine, and name node

Country Status (2)

Country Link
CN (1) CN106446159B (fr)
WO (1) WO2018054079A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111443872A (zh) * 2020-03-26 2020-07-24 深信服科技股份有限公司 分布式存储系统构建方法、装置、设备、介质
CN113641467A (zh) * 2021-10-19 2021-11-12 杭州优云科技有限公司 一种虚拟机的分布式块存储实现方法

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446159B (zh) * 2016-09-23 2019-11-12 华为技术有限公司 一种存储文件的方法、第一虚拟机及名称节点
CN107704596B (zh) * 2017-10-13 2021-06-29 郑州云海信息技术有限公司 一种读取文件的方法、装置及设备
CN109753226A (zh) * 2017-11-07 2019-05-14 阿里巴巴集团控股有限公司 数据处理系统、方法及电子设备
CN110110003A (zh) * 2018-01-26 2019-08-09 广州中国科学院计算机网络信息中心 M2m平台的数据存储控制方法及装置
CN110688194B (zh) * 2018-07-06 2023-03-17 中兴通讯股份有限公司 基于云桌面的磁盘管理方法、虚拟机及存储介质
CN113037569A (zh) * 2021-04-19 2021-06-25 杭州和利时自动化有限公司 一种基于双服务器的冗余服务方法、装置、设备及介质
CN114138737B (zh) * 2022-02-08 2022-07-12 亿次网联(杭州)科技有限公司 文件存储方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130325812A1 (en) * 2012-05-30 2013-12-05 Spectra Logic Corporation System and method for archive in a distributed file system
CN103729250A (zh) * 2012-10-11 2014-04-16 国际商业机器公司 用于选择被配置为满足一组要求的数据节点的方法和系统
CN104731691A (zh) * 2013-12-18 2015-06-24 国际商业机器公司 动态调整分布式文件系统内文件副本数目的方法和系统
CN104838374A (zh) * 2012-12-06 2015-08-12 英派尔科技开发有限公司 分散hadoop集群
CN106446159A (zh) * 2016-09-23 2017-02-22 华为技术有限公司 一种存储文件的方法、第一虚拟机及名称节点

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521063B (zh) * 2011-11-30 2013-12-25 广东电子工业研究院有限公司 一种适用于虚拟机迁移和容错的共享存储方法
CN103797770B (zh) * 2012-12-31 2015-12-02 华为技术有限公司 一种共享存储资源的方法和系统

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111443872A (zh) * 2020-03-26 2020-07-24 深信服科技股份有限公司 分布式存储系统构建方法、装置、设备、介质
CN113641467A (zh) * 2021-10-19 2021-11-12 杭州优云科技有限公司 一种虚拟机的分布式块存储实现方法
CN113641467B (zh) * 2021-10-19 2022-02-11 杭州优云科技有限公司 一种虚拟机的分布式块存储实现方法

Also Published As

Publication number Publication date
CN106446159A (zh) 2017-02-22
CN106446159B (zh) 2019-11-12

Similar Documents

Publication Publication Date Title
WO2018054079A1 (fr) Procédé de stockage d'un fichier, première machine virtuelle et nœud de nom
US11354336B2 (en) Fault-tolerant key management system
US11157457B2 (en) File management in thin provisioning storage environments
US10382540B2 (en) Synchronizing storage state information
WO2020263765A1 (fr) Orchestrateur permettant l'orchestration d'opérations entre un environnement informatique hébergeant des machines virtuelles et un environnement de mémoire
JP2019101703A (ja) 記憶システム及び制御ソフトウェア配置方法
US10747673B2 (en) System and method for facilitating cluster-level cache and memory space
WO2019061352A1 (fr) Procédé et dispositif de chargement de données
WO2019148841A1 (fr) Système de stockage distribué, procédé de traitement de données et nœud de stockage
WO2016045428A1 (fr) Procédé de création d'une machine virtuelle et appareil de création d'une machine virtuelle
WO2021057108A1 (fr) Procédé de lecture de données, procédé d'écriture de données et serveur
US20160306550A1 (en) Constructing a scalable storage device, and scaled storage device
CN111147274B (zh) 为集群解决方案创建高度可用的仲裁集的系统和方法
CN114514500A (zh) 跨区复制块存储装置
WO2018157605A1 (fr) Procédé et dispositif de transmission de messages dans un système de fichiers en grappe
US20150074316A1 (en) Reflective memory bridge for external computing nodes
KR101601877B1 (ko) 분산 파일시스템에서 클라이언트가 데이터 저장에 참여하는 장치 및 방법
US10552067B2 (en) Method and system for delivering message in storage system
CN116389233A (zh) 容器云管理平台主备切换系统、方法、装置和计算机设备
WO2012171363A1 (fr) Procédé et équipement destinés à une opération de données dans un système de cache réparti
US10785295B2 (en) Fabric encapsulated resilient storage
WO2016046951A1 (fr) Système informatique et procédé de gestion de fichiers associé
US20200371849A1 (en) Systems and methods for efficient management of advanced functions in software defined storage systems
JPWO2015141219A1 (ja) ストレージシステム、制御装置、データアクセス方法およびプログラム
US11288004B1 (en) Consensus-based authority selection in replicated network-accessible block storage devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17852153

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17852153

Country of ref document: EP

Kind code of ref document: A1