WO2017167171A1 - Data operation method, server, and storage system - Google Patents


Info

Publication number
WO2017167171A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
block
address information
hdfs
Prior art date
Application number
PCT/CN2017/078387
Other languages
French (fr)
Chinese (zh)
Inventor
刘科佑
王�锋
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2017167171A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • The present invention relates to the field of storage, and in particular, to a data operation method, a server, and a storage system.
  • Hadoop is an open-source distributed computing platform whose core includes HDFS (Hadoop Distributed File System).
  • HDFS includes a name node and data nodes: the name node manages and processes metadata, and the data nodes store data in the form of files.
  • The name node and the data nodes can be dedicated devices or software running on ordinary computers. Usually, a dedicated machine runs the name node software, and each of the other machines runs data node software; multiple instances of the data node software can also run on a single machine. Each machine running data node software has a local file system.
  • HDFS is a logical file system built on top of the file systems of multiple machines, and its underlying data is stored in blocks. A data node stores HDFS data in its local file system without being aware of the HDFS files themselves: it stores each data block of an HDFS file as a separate file in the local file system.
  • In key-value storage, a piece of data is called a value, and each value corresponds to a unique key. Using this unique identifier, the location of the value can be found directly.
  • Key-value storage has no directory hierarchy like that of a file system; its namespace is completely flat. As a result, key-value storage is easier to scale in capacity than file storage, data can be read and written directly at the object layer, and reads and writes are more efficient than in storage organized as a directory structure.
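To make the contrast concrete, here is a minimal Python sketch that is not taken from the patent: nested dictionaries stand in for a directory hierarchy, and a single dictionary stands in for a flat key-value store. The function names `hierarchical_get` and `flat_get` are hypothetical.

```python
from typing import Dict, Union

# A toy directory tree: each directory maps names to sub-directories or file contents.
Tree = Dict[str, Union["Tree", bytes]]


def hierarchical_get(root: Tree, path: str) -> bytes:
    """Resolve a path one directory level at a time, as a file system would."""
    node: Union[Tree, bytes] = root
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict):
            raise NotADirectoryError(part)
        node = node[part]  # one lookup per level of the hierarchy
    if isinstance(node, dict):
        raise IsADirectoryError(path)
    return node


def flat_get(store: Dict[str, bytes], key: str) -> bytes:
    """A flat key-value store finds the value with a single lookup by key."""
    return store[key]


tree: Tree = {"a": {"b": {"c.txt": b"hello"}}}
flat = {"a/b/c.txt": b"hello"}
print(hierarchical_get(tree, "/a/b/c.txt") == flat_get(flat, "a/b/c.txt"))  # True
```

The cost of the hierarchical lookup grows with path depth, while the flat lookup is one step regardless of how the keys are named.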
  • The object of the present invention is to provide a data operation method, a server, and a storage system that allow Hadoop to use key-value storage while keeping full support for its functions.
  • According to a first aspect, a data operation method is provided. The method is applied to a storage system that includes a name node module, a data node module, and a key-value (KV) storage device. The method includes: the name node module receives an operation request message sent by a Hadoop Distributed File System (HDFS) client, where the operation request message requests the block address information, in HDFS, of target data to be operated on in a target file, so that the target data can be operated on;
  • the operation request message is based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform; the name node module determines a key according to the file name of the target file included in the operation request message, and determines, according to the key, the location of the storage space of the value, where the value is the data of the target file; the name node module acquires target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message;
  • and the name node module sends, to the HDFS client, a response message responding to the operation request message, where the response message includes the target block address information.
  • The response message is also based on the ClientProtocol communication protocol. In the first aspect, after receiving the response message sent by the name node module, the HDFS client may send an operation instruction including the target block address information to the data node module, based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, and the data node module can perform the operation indicated by the operation instruction on the target data according to the target block address information.
  • The name node module and the HDFS client, as well as the data node module and the HDFS client, communicate using the native protocols of the Hadoop platform, which ensures that other Hadoop functions remain supported.
  • At the bottom layer, the data is stored as key-value pairs, which improves data read/write efficiency and capacity scalability.
  • Determining a key according to the file name of the target file included in the operation request message includes: determining an index node (inode) number of the target file according to the file name, and determining the location of the storage space of the value by using the inode number as the key.
  • The inode number is the identifier that the computer uses to identify a file.
  • According to the implementation mechanism of key-value storage, the data of one HDFS file is one value in the KV storage device, and the inode number of the HDFS file is the key of that value.
  • Therefore, the name node module can locate the value directly by its key.
  • Acquiring the target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message includes: obtaining, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length within each of those blocks.
  • Here, a block is a physical storage unit in the KV storage device.
  • The block address information of the target data in HDFS includes the number of each logical block occupied by the target data in HDFS, and the logical block offset and logical block length within each logical block. Sending, to the HDFS client, the response message responding to the operation request message includes: sending the block number as the logical block number, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.
  • In the Hadoop platform, the response message returned by the name node to the HDFS client includes the block address information of the target data within the file to which the target data belongs.
  • This block address within the file is a logical address, so when a data node reads the data according to the logical block address, it has to go through the hierarchical structure of its local file system before finally obtaining the target data.
  • In the present invention, the name node module instead returns the physical block address information in the KV storage device to the HDFS client, so that the data node module, after receiving it, can operate on the target data directly in the KV storage device without passing through a file system, which improves data read/write efficiency.
  • The operation request message may be a read request message or a write request message.
  • According to a second aspect, a data operation method is provided. The method is applied to a storage system that includes a name node module, a data node module, and a key-value (KV) storage device. The method includes: the data node module receives an operation instruction sent by the HDFS client, where the operation instruction is used to operate on target data to be operated on in a target file and is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform.
  • The operation instruction includes the block address information at which the target data is stored in the KV storage device; the data node module performs the operation indicated by the operation instruction on the target data according to the block address information.
  • According to a third aspect, a name node module is provided. The name node module is applied to a storage system that further includes a data node module and a key-value (KV) storage device. The name node module includes: a receiving unit, configured to receive an operation request message sent by the HDFS client, where the operation request message requests the block address information, in HDFS, of target data to be operated on in a target file, so that the target data can be operated on, and is based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform; a determining unit, configured to determine a key according to the file name of the target file included in the operation request message, and determine, according to the key, the location of the storage space of the value, where the value is the data of the target file; an obtaining unit, configured to acquire target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message; and a sending unit, configured to send, to the HDFS client, a response message responding to the operation request message, where the response message includes the target block address information.
  • The determining unit is specifically configured to: determine the inode number of the target file according to the file name, and determine the location of the storage space of the value by using the inode number as the key.
  • The obtaining unit is specifically configured to obtain, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length within each of those blocks.
  • The block address information of the target data in HDFS includes the number of each logical block occupied by the target data in HDFS, and the logical block offset and logical block length within each logical block; the sending unit is specifically configured to send the block number as the logical block number, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.
  • According to a fourth aspect, a data node module is provided. The data node module is applied to a storage system that further includes a name node module and a key-value (KV) storage device. The data node module includes: a receiving unit, configured to receive an operation instruction sent by the HDFS client, where the operation instruction is used to operate on target data to be operated on in a target file and is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform,
  • and the operation instruction includes the block address information at which the target data is stored in the KV storage device; and an operation unit, configured to perform the operation indicated by the operation instruction on the target data according to the block address information.
  • According to a fifth aspect, a server is provided. The server includes the name node module according to the third aspect or any possible implementation of the third aspect, and/or the data node module according to the fourth aspect.
  • The server includes a processor, a first interface, a second interface, and a communication bus. The processor, the first interface, and the second interface communicate with each other through the communication bus; the first interface is used to communicate with the HDFS client, and the second interface is used to communicate with the KV storage device. The server runs name node software, and by running the name node software, the server performs the method according to the first aspect or any possible implementation of the first aspect.
  • The server may further run data node software, and by running the data node software, the server performs the following: receiving an operation instruction sent by the HDFS client, where the operation instruction is used to operate on the target data, is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, and includes the target block address information; and performing the operation indicated by the operation instruction on the target data according to the target block address information.
  • According to a sixth aspect, a storage system is provided. The storage system includes the name node module according to the third aspect or any possible implementation of the third aspect, the data node module according to the fourth aspect, and a KV storage device. The name node module is connected to the KV storage device, and the data node module is connected to the KV storage device.
  • The name node module and the data node module may be deployed on the same server.
  • According to a seventh aspect, a computer-readable medium is provided for storing a computer program, where the computer program includes instructions for performing the method according to the first aspect or any possible implementation of the first aspect.
  • A computer-readable medium is further provided for storing a computer program, where the computer program includes instructions for performing the method according to the second aspect.
  • FIG. 1 is a schematic diagram of an HDFS architecture according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a data operation method according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a data reading method according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of the mapping between a file and physical storage units in a KV storage device according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a name node module according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a data node module according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of another server according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic structural diagram of a storage system according to an embodiment of the present invention.
  • As shown in FIG. 1, the HDFS architecture includes an HDFS client, a name node, data node 1, and data node 2; the client is connected to the name node and to data node 1, and data node 1 is connected to data node 2.
  • The name node runs the HDFS file system, and each data node runs a local file system.
  • The data writing process based on the HDFS architecture shown in FIG. 1 is as follows: the HDFS client sends a write request message to the name node, where the write request message includes a file name, start address information, and data length information. After receiving the write request message, the name node first determines whether the file exists. If it does not, the name node creates a new file in the file system running on the name node, and after the file is created successfully, the file is divided into multiple fixed-size blocks.
  • The name node then determines, according to the start address information and the data length information, a list of data blocks to be written. The data block list includes the number of each data block and the offset and length of the data to be written in each data block. After obtaining the data block list, the HDFS client can send a write command to the data node to write the data.
  • The data reading process is as follows: the HDFS client sends a read request message to the name node, where the read request message includes a file name, start address information, and data length information. After receiving the read request message, the name node returns to the HDFS client the data block list of the data to be read; after receiving this list, the HDFS client sends a read command to read the data from the data node.
  • The interface that the name node provides to the HDFS client is an RPC (Remote Procedure Call) interface, and the interface that the data node provides to the HDFS client is also an RPC interface.
  • HDFS High Speed Transfer Protocol
  • The communication protocol between the client and the name node is ClientProtocol, and the communication protocol between the client and the data node is ClientDatanodeProtocol.
  • An embodiment of the present invention provides a data operation method applied to a storage system that includes a name node module, a data node module, and a KV (key-value) storage device. As shown in FIG. 2, the method includes:
  • The name node module receives an operation request message sent by the HDFS client.
  • The operation request message is used to request the block address information, in HDFS, of the target data to be operated on in the target file, so that the target data can be operated on; the operation request message is based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform.
  • The name node module provides an RPC interface to the HDFS client and receives, through this RPC interface, the operation request message sent by the HDFS client.
  • The operation request message may be a write request message for performing a write operation on the target data, that is, a request to write the target data to a specified location in the target file, or a read request message for performing a read operation on the target data, that is, a request to read the target data from a specified location in the target file.
  • The name node module determines a key according to the file name of the target file included in the operation request message, and determines the location of the storage space of the value according to the key, where the value is the data of the target file.
  • The data of a file stored in the KV storage device is called a value, and each value corresponds to a unique key; using this unique identifier, the location of the storage space of the value can be found directly.
  • Here, m is an integer, and each HashValue, such as HashValue[0] or HashValue[1], is one storage space; each storage space is used …
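The source does not say how a key selects one of the m storage spaces. As an assumption only, the sketch below uses the common approach of hashing the key and taking the result modulo m; the class `HashedKVStore` and its methods are hypothetical, and SHA-256 is simply an arbitrary deterministic hash.

```python
import hashlib
from typing import Dict, List


class HashedKVStore:
    """Toy KV store with m storage spaces HashValue[0] .. HashValue[m-1].

    ASSUMPTION: the storage space for a key is chosen as hash(key) mod m.
    """

    def __init__(self, m: int) -> None:
        if m <= 0:
            raise ValueError("m must be a positive integer")
        self.m = m
        # Each storage space holds a small dict so that colliding keys can coexist.
        self.hash_value: List[Dict[int, bytes]] = [{} for _ in range(m)]

    def space_index(self, key: int) -> int:
        """Deterministically map a key (for example an inode number) to a storage space."""
        digest = hashlib.sha256(str(key).encode("ascii")).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def put(self, key: int, value: bytes) -> None:
        self.hash_value[self.space_index(key)][key] = value

    def get(self, key: int) -> bytes:
        # The key alone locates the storage space; no directory traversal is needed.
        return self.hash_value[self.space_index(key)][key]


store = HashedKVStore(m=8)
store.put(1001, b"data of one file")
print(store.get(1001))  # b'data of one file'
```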
  • The index node (inode) number corresponding to the file name of the target file may be used as the key of the data (value) of the target file.
  • In this case, step S202 specifically includes: determining the inode number of the target file according to the file name, and determining the location of the storage space of the value by using the inode number as the key.
  • The name node includes a list of directory entries, each consisting of two parts: the file name of a file and the inode number corresponding to that file name. The name node module can therefore determine the inode number corresponding to the file name of the target file by querying the directory entry list. Note that a file system does not use file names internally; it identifies files by their inode numbers. To operate on the data of a file, the file system first finds the inode number corresponding to the file name, then obtains the inode information through the inode number, and finally finds the blocks holding the file data according to the inode information.
  • Because the inode number is the identifier of a file in the file system, using it as the key of the file data uniquely identifies the value.
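A minimal sketch of this lookup, assuming an in-memory dictionary for the directory entry list and another for the KV storage device. `NameNodeNamespace`, `create`, `inode_of`, and `locate_value` are hypothetical names, and inode numbers are simply allocated sequentially here.

```python
from typing import Dict


class NameNodeNamespace:
    """Directory entry list: each entry pairs a file name with its inode number."""

    def __init__(self) -> None:
        self.directory_entries: Dict[str, int] = {}
        self._next_inode = 1  # ASSUMPTION: sequential inode allocation

    def create(self, file_name: str) -> int:
        if file_name in self.directory_entries:
            raise FileExistsError(file_name)
        inode = self._next_inode
        self._next_inode += 1
        self.directory_entries[file_name] = inode
        return inode

    def inode_of(self, file_name: str) -> int:
        return self.directory_entries[file_name]


def locate_value(ns: NameNodeNamespace, kv: Dict[int, bytes], file_name: str) -> bytes:
    """File name -> inode number (used as the key) -> value, with no inode/block walk."""
    key = ns.inode_of(file_name)
    return kv[key]


ns = NameNodeNamespace()
inode = ns.create("first file")
kv_device = {inode: b"contents of first file"}
print(locate_value(ns, kv_device, "first file"))  # b'contents of first file'
```

Compared with the three-step file system path (file name, inode information, data blocks), the key-value path stops after the first step because the inode number directly addresses the value.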
  • The name node module acquires the target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message.
  • Specifically, the name node module obtains, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length within each block.
  • The file addresses that the HDFS file system presents to the HDFS client are contiguous; that is, the HDFS client perceives each file as being stored contiguously.
  • Within the KV storage device, however, the data of the file is stored in blocks in the storage space. Each block is a physical storage unit, and each physical storage unit has a pointer to the next unit. The target block address information is the location information of each physical storage unit occupied by the target data in the storage space.
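The pointer-linked layout can be sketched as follows. This is an illustration under stated assumptions rather than the patent's data structure: every physical storage unit has the same size `unit_size`, the number of the value's first unit is known, and `StorageUnit` and `occupied_units` are hypothetical names. The function follows the next-unit pointers and returns, for each unit the target range touches, the unit number, the offset within the unit, and the length.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StorageUnit:
    number: int
    next_number: Optional[int]  # pointer to the next physical unit; None marks the last one


def occupied_units(
    units: Dict[int, StorageUnit], first: int, start: int, length: int, unit_size: int
) -> List[Tuple[int, int, int]]:
    """Return (unit number, offset in unit, length in unit) for [start, start + length)."""
    extents: List[Tuple[int, int, int]] = []
    end = start + length
    pos = 0  # offset within the file at which the current unit begins
    current: Optional[int] = first
    while current is not None and pos < end:
        unit_end = pos + unit_size
        lo, hi = max(start, pos), min(end, unit_end)
        if hi > lo:  # this unit overlaps the requested range
            extents.append((current, lo - pos, hi - lo))
        pos = unit_end
        current = units[current].next_number
    if pos < end:
        raise ValueError("requested range runs past the end of the value")
    return extents


# Physical units need not be consecutive: the value occupies 7 -> 3 -> 9 -> 4.
chain = {7: StorageUnit(7, 3), 3: StorageUnit(3, 9), 9: StorageUnit(9, 4), 4: StorageUnit(4, None)}
print(occupied_units(chain, first=7, start=100, length=128, unit_size=64))
# [(3, 36, 28), (9, 0, 64), (4, 0, 36)]
```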
  • The name node module sends, to the HDFS client, a response message responding to the operation request message, where the response message includes the target block address information.
  • The operation request message may be a message sent by the HDFS client to the getblocklocation interface of the name node module; its input parameters are the file name of the target file, the start address of the target data in the target file, and the length of the target data.
  • The interface is required to return the number of each logical block occupied by the target data in the HDFS file system, and the logical block offset and logical block length within each logical block.
  • The name node module may therefore return the number of each block in the storage space as the logical block number, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.
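The field-by-field reuse described above can be sketched with two record types. The names `PhysicalBlockExtent`, `LogicalBlockLocation`, and `build_response` are hypothetical. For orientation only: in Apache Hadoop, the ClientProtocol method that the source appears to call getblocklocation is `getBlockLocations(String src, long offset, long length)`, which returns a `LocatedBlocks` object; this sketch does not model that type.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PhysicalBlockExtent:
    """Where part of the target data sits in the KV storage device."""
    block_number: int
    block_offset: int
    block_length: int


@dataclass(frozen=True)
class LogicalBlockLocation:
    """The fields the HDFS client expects in the name node's response."""
    logical_block_number: int
    logical_block_offset: int
    logical_block_length: int


def build_response(extents: List[PhysicalBlockExtent]) -> List[LogicalBlockLocation]:
    """Report physical block data in the logical-block fields, one entry per block.

    The client cannot tell whether these numbers are logical or physical,
    so the physical values are passed through unchanged.
    """
    return [
        LogicalBlockLocation(e.block_number, e.block_offset, e.block_length)
        for e in extents
    ]


extents = [PhysicalBlockExtent(1, 36, 28), PhysicalBlockExtent(2, 0, 64)]
for loc in build_response(extents):
    print(loc)
```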
  • The data node module receives an operation instruction sent by the HDFS client, where the operation instruction includes the target block address information.
  • The operation instruction is used to operate on the target data to be operated on in the target file, and is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform.
  • The data node module performs the operation indicated by the operation instruction on the target data according to the block address information.
  • When the operation instruction is a write operation instruction, the data node module writes the target data to the location specified by the target block address information; when the operation instruction is a read operation instruction, the data node module reads the target data from the location specified by the target block address information.
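A minimal sketch of this step, assuming the blocks of the KV storage device can be modeled as fixed-size `bytearray` objects keyed by block number, and that the block address information arrives as (block number, offset, length) triples. `DataNodeModule`, `read`, and `write` are hypothetical names.

```python
from typing import Dict, List, Tuple

Extent = Tuple[int, int, int]  # (block number, offset in block, length)


class DataNodeModule:
    """Operates directly on KV-storage blocks, with no local file system in between."""

    def __init__(self, blocks: Dict[int, bytearray]) -> None:
        self.blocks = blocks  # stand-in for the physical blocks of the KV storage device

    def read(self, extents: List[Extent]) -> bytes:
        out = bytearray()
        for number, offset, length in extents:
            out += self.blocks[number][offset:offset + length]
        return bytes(out)

    def write(self, extents: List[Extent], data: bytes) -> None:
        if sum(length for _, _, length in extents) != len(data):
            raise ValueError("extent lengths must add up to the data length")
        for number, offset, length in extents:  # validate before touching any block
            if offset + length > len(self.blocks[number]):
                raise ValueError(f"extent overruns block {number}")
        pos = 0
        for number, offset, length in extents:
            self.blocks[number][offset:offset + length] = data[pos:pos + length]
            pos += length


dn = DataNodeModule({0: bytearray(8), 1: bytearray(8)})
where = [(0, 6, 2), (1, 0, 3)]  # 2 bytes at the end of block 0, then 3 bytes of block 1
dn.write(where, b"abcde")
print(dn.read(where))  # b'abcde'
```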
  • The name node module communicates with the HDFS client based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform, and the data node module communicates with the HDFS client based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, which ensures that other Hadoop functions remain supported.
  • At the bottom layer, the data of the HDFS file is stored as key-value pairs, which improves data read/write efficiency and capacity scalability.
  • Suppose the HDFS client needs to read target data with a start address of 100M (megabytes) and a data length of 128M from the file named "first file". In this embodiment of the present invention, the data reading method is shown in FIG. 3 and includes:
  • The name node module receives a read request message sent by the HDFS client, where the read request message includes a file name, start address information, and data length information.
  • Here, the file name is "first file", the start address information is 100M, and the data length information is 128M.
  • The name node module determines the inode number of the file according to the file name.
  • The name node module calculates the location of the storage space of the file's data (value) according to the inode number (key).
  • For steps S302 and S303, reference may be made to the foregoing description of step S202; details are not repeated here.
  • The name node module acquires the target block address information of the target data in the storage space of the KV storage device.
  • The size of each block in the KV storage device can be set according to user requirements. If each block is 64M, then, as shown in FIG. 4, the target data starting at address 100M in the first file with a data length of 128M occupies block 1, block 2, and block 3 in the storage space: the offset in block 1 is 36M and the length is 28M, the offset in block 2 is 0 and the length is 64M, and the offset in block 3 is 0 and the length is 36M.
  • The target block address information may be list information as shown in the following table:
    Block number | Block offset | Block length
    1 | 36M | 28M
    2 | 0 | 64M
    3 | 0 | 36M
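The figures in this example follow from simple integer arithmetic. The sketch below assumes, as the example suggests, that the blocks of the value are numbered from 0 in file order and all have the same fixed size; it works in units of M, and `block_extents` is a hypothetical name.

```python
from typing import List, Tuple


def block_extents(start: int, length: int, block_size: int) -> List[Tuple[int, int, int]]:
    """Split the range [start, start + length) into (block number, offset, length) triples.

    ASSUMPTION: blocks are numbered from 0 and each holds exactly block_size units.
    """
    if block_size <= 0 or start < 0 or length < 0:
        raise ValueError("block_size must be positive; start and length non-negative")
    extents: List[Tuple[int, int, int]] = []
    pos, end = start, start + length
    while pos < end:
        number, offset = divmod(pos, block_size)
        take = min(block_size - offset, end - pos)  # stop at the block boundary or the range end
        extents.append((number, offset, take))
        pos += take
    return extents


# Start address 100M, data length 128M, 64M blocks (all values in M).
print(block_extents(100, 128, 64))  # [(1, 36, 28), (2, 0, 64), (3, 0, 36)]
```

The first extent starts mid-block (100 = 1 × 64 + 36), the middle extent fills a whole block, and the last extent holds the remaining 128 - 28 - 64 = 36M.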
  • The name node module sends a response message including the target block address information to the HDFS client.
  • The response message is based on the ClientProtocol communication protocol between the HDFS client and the name node in the Hadoop platform; for details, refer to the description of step S204 above, which is not repeated here.
  • The response message returned by the name node to the HDFS client includes the block address information of the target data in the HDFS file system, namely the block numbers, offsets, and lengths; HDFS does not distinguish whether the block address information returned by the name node is a logical address or a physical address. Therefore, the embodiment of the present invention can return the address information of the physical storage units in the storage space of the KV storage device to the HDFS client.
  • The data node module receives a read command sent by the HDFS client, where the read command includes the target block address information.
  • The read command is based on the ClientDatanodeProtocol communication protocol between the HDFS client and the data node in the Hadoop platform.
  • The data node module reads the target data from the KV storage device according to the target block address information.
  • The data node module sends the target data to the HDFS client.
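Putting the steps of this read example together, the following toy model runs the whole flow in one process. It is a sketch under assumptions, not the patent's implementation: the directory entry list and the KV storage device are in-memory dictionaries, each value is split into fixed-size physical blocks when stored, and the RPC exchanges based on ClientProtocol and ClientDatanodeProtocol are replaced by plain function calls. All class and function names are hypothetical, and a 4-byte block size keeps the example small.

```python
from typing import Dict, List, Tuple

Extent = Tuple[int, int, int]  # (physical block number, offset in block, length)


class KVStorageDevice:
    """Toy KV device: each value is stored as a list of fixed-size physical blocks."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self.blocks: Dict[int, bytes] = {}      # physical block number -> contents
        self.values: Dict[int, List[int]] = {}  # key (inode number) -> physical blocks of the value

    def put(self, key: int, data: bytes) -> None:
        numbers = []
        for i in range(0, len(data), self.block_size):
            number = len(self.blocks)  # next free physical block
            self.blocks[number] = data[i:i + self.block_size]
            numbers.append(number)
        self.values[key] = numbers


def name_node_locate(kv: KVStorageDevice, entries: Dict[str, int],
                     file_name: str, start: int, length: int) -> List[Extent]:
    """Name node side: file name -> inode (key) -> physical extents of the range."""
    layout = kv.values[entries[file_name]]
    extents: List[Extent] = []
    pos, end = start, start + length
    while pos < end:
        index, offset = divmod(pos, kv.block_size)
        take = min(kv.block_size - offset, end - pos)
        extents.append((layout[index], offset, take))
        pos += take
    return extents


def data_node_read(kv: KVStorageDevice, extents: List[Extent]) -> bytes:
    """Data node side: read each extent straight from the KV storage device."""
    return b"".join(kv.blocks[number][offset:offset + size] for number, offset, size in extents)


kv = KVStorageDevice(block_size=4)
kv.put(3, b"zzzz")           # another file occupies physical block 0
kv.put(7, b"abcdefghijkl")   # "first file" (inode 7) gets physical blocks 1, 2, 3
entries = {"first file": 7}
where = name_node_locate(kv, entries, "first file", start=2, length=7)
print(where)                      # [(1, 2, 2), (2, 0, 4), (3, 0, 1)]
print(data_node_read(kv, where))  # b'cdefghi'
```

Because the returned block numbers are physical, the data node never consults a local file system; it indexes the KV device directly.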
  • In this embodiment, the HDFS client still sees the HDFS file system, which guarantees support for other Hadoop functions, while the file data is stored as key-value pairs at the bottom layer without the HDFS client being aware of it. Reading data does not need to go through the complicated hierarchical mechanism of a file system, so reading efficiency improves, and the flat storage structure of key-value storage also improves capacity scalability.
  • An embodiment of the present invention further provides a name node module 50, which is applied to a storage system that further includes a data node module and a key-value (KV) storage device.
  • The name node module 50 is used to implement the corresponding steps in the foregoing method embodiments. As shown in FIG. 5, the name node module 50 includes:
  • a receiving unit 51, configured to receive an operation request message sent by the HDFS client, where the operation request message requests the block address information, in HDFS, of the target data to be operated on in the target file, so that the target data can be operated on, and is based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform;
  • a determining unit 52, configured to determine a key according to the file name of the target file included in the operation request message, and determine the location of the storage space of the value according to the key, where the value is the data of the target file;
  • an obtaining unit 53, configured to acquire the target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message; and
  • a sending unit 54, configured to send, to the HDFS client, a response message responding to the operation request message, where the response message includes the target block address information.
  • With the above name node module 50, the name node module 50 communicates with the HDFS client based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform, so the HDFS file system is still present from the perspective of the HDFS client, while the data of the HDFS file is stored as key-value pairs at the bottom layer, which improves data read/write efficiency and capacity scalability.
  • The determining unit 52 is specifically configured to: determine the inode number of the target file according to the file name, and determine the location of the storage space of the value by using the inode number as the key.
  • The obtaining unit 53 is specifically configured to obtain, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length within each block.
  • The block address information of the target data in HDFS includes the number of each logical block occupied by the target data in HDFS, and the logical block offset and logical block length within each logical block.
  • The sending unit 54 is specifically configured to send the block number as the logical block number, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.
  • The division of the name node module into units is merely a logical functional division, and other divisions are possible in an actual implementation; for example, the determining unit 52 and the obtaining unit 53 may be combined into one processing unit. Moreover, each of the above functional units may be physically implemented in multiple ways.
  • An embodiment of the present invention further provides a data node module 60, which is applied to a storage system that further includes a name node module and a key-value (KV) storage device. The data node module 60 is used to implement the foregoing method embodiments.
  • As shown in FIG. 6, the data node module 60 includes:
  • a receiving unit 61, configured to receive an operation instruction sent by the HDFS client, where the operation instruction is used to operate on the target data to be operated on in the target file, is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, and includes the block address information at which the target data is stored in the KV storage device; and
  • an operation unit 62, configured to perform the operation indicated by the operation instruction on the target data according to the block address information.
  • the server 80 includes:
  • a processor 81 and a first interface, the first interface being used to communicate with the HDFS client;
  • a second interface 83, used to communicate with the key-value (KV) storage device;
  • the server runs name node software, through which it performs the following operations:
  • receiving an operation request message sent by the HDFS client. The operation request message requests the block address information, in HDFS, of the target data to be operated on in the target file, so that the target data can be operated on;
  • the operation request message is based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform;
  • determining a key according to the file name of the target file included in the operation request message. This includes determining the inode number of the target file according to the file name, and using the inode number as the key to determine the location of the storage space of the value.
  • obtaining the target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message. This includes obtaining, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, together with the block offset and block length within each block.
  • the block address information of the target data in HDFS includes the number of each logical block occupied by the target data in HDFS, together with the logical block offset and logical block length within each logical block.
  • the server 80 may also run data node software, through which it performs the following operations:
    • receiving an operation instruction sent by the HDFS client. The operation instruction is used to operate on the target data. It is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, and it includes the target block address information;
    • performing the operation indicated by the operation instruction on the target data according to the target block address information.
  • the server 80 may also include other devices, such as a storage medium for storing program instructions, which are not shown in FIG.
  • the operations performed by the processor 81 may also be performed jointly with other devices.
  • the embodiments of the present invention uniformly describe these operations as being performed by the processor 81 for data sorting.
  • the processor 81 in the embodiments of the present invention may be a CPU (Central Processing Unit).
  • the processor 81 may also be an FPGA (Field Programmable Gate Array) or other hardware. Alternatively, the processor 81 may combine a CPU with an FPGA or other hardware. In that case, the FPGA or other hardware and the CPU each perform part of the operations in the embodiments of the present invention.
  • the embodiment of the present invention further provides a storage system 90.
  • the storage system 90 includes:
  • the name node module 50, the data node module 60, and the KV storage device 91. The name node module 50 and the data node module 60 are each connected to the KV storage device 91.
  • the name node module 50 is connected to the HDFS client, and the data node module 60 is connected to the HDFS client.
  • the name node module 50 includes an INTF_Namenode interface for providing an RPC interface to the HDFS client.
  • the name node module 50 can receive the metadata processing or management command sent by the HDFS client through the RPC interface.
  • the data node module 60 includes an INTF_Datanode interface that provides an RPC interface to the HDFS client. The data node module 60 receives operation instructions from the HDFS client through this RPC interface.
  • the KV storage device 91 provides a standard key-value interface, INTF_KV, for the name node module 50 and the data node module 60.
  • for details of the name node module 50, refer to the description of FIG. 5 above.
  • for details of the data node module 60, refer to the description of FIG. 6 above; details are not repeated here.
  • the name node module 50 and the data node module 60 may be deployed on the same server at the same time, or may be separately deployed on different servers.
  • the storage system 90 shown in FIG. 9 includes only one name node module and one data node module.
  • the numbers of name node modules and data node modules included in the storage system may be set according to actual requirements. When there are multiple name node modules and multiple data node modules, the HDFS client first obtains the address of one name node module through DNS (Domain Name System) round-robin when it connects to a name node module. The name node module then returns the addresses of multiple data node modules, and the HDFS client can select the nearest of these to connect to.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in an actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • in addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium.
  • the software functional units described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, server, or network device, etc.) to perform portions of the steps of the methods described in various embodiments of the present invention.
  • the foregoing storage medium includes any medium that can store data, such as a USB flash drive, a removable hard disk, a RAM (Random Access Memory), a magnetic disk, or an optical disc.
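The multi-module deployment described in the bullets above can be sketched as follows, under stated assumptions. Round-robin is really performed by the DNS server, and the text does not define how "nearest" is measured. The rotation helper below therefore only mimics the DNS behaviour. The `distance` table, for example hop counts, is a hypothetical input. All names (`dns_round_robin`, `nearest_data_node`, the addresses) are illustrative and not part of Hadoop.

```python
import itertools
from typing import Dict, Iterator, List

def dns_round_robin(name_node_addresses: List[str]) -> Iterator[str]:
    """Mimic a DNS server that hands out name node addresses in rotation."""
    return itertools.cycle(name_node_addresses)

def nearest_data_node(candidates: List[str], distance: Dict[str, int]) -> str:
    """Choose the data node with the smallest distance to the client.

    `distance` is a hypothetical metric (for example, network hop count);
    nodes with no known distance are treated as infinitely far away.
    """
    if not candidates:
        raise ValueError("the name node returned no data node addresses")
    return min(candidates, key=lambda node: distance.get(node, float("inf")))

# Example: two name node modules behind one DNS name, three data node modules.
resolver = dns_round_robin(["10.0.0.1", "10.0.0.2"])
first, second, third = next(resolver), next(resolver), next(resolver)
chosen = nearest_data_node(["dn-a", "dn-b", "dn-c"],
                           {"dn-a": 4, "dn-b": 1, "dn-c": 2})
```

Successive lookups alternate between the two name node addresses, and `chosen` is the data node with the lowest assumed distance.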

Abstract

A data operation method, a server and a storage system relate to the field of storage and allow Hadoop to use key-value storage while keeping its functions fully supported. The method comprises the following steps. A name node module receives an operation request message sent by an HDFS client; the operation request message is based on the communication protocol (ClientProtocol) between a name node and the HDFS client in a Hadoop platform. The name node module determines a key according to the file name of a target file included in the operation request message, and determines the location of the storage space of a value according to the key, the value being the data of the target file. The name node module obtains target block address information of the target data in the storage space according to the start address information and data length information in the operation request message. Finally, the name node module sends the HDFS client a response message for the operation request message, the response message comprising the target block address information.

Description

Data operation method, server and storage system
Technical field
The present invention relates to the field of storage, and in particular, to a data operation method, a server, and a storage system.
Background
In the prior art, big data is processed on Hadoop-based platforms. Hadoop is an open-source distributed computing platform whose core includes HDFS (Hadoop Distributed File System).
HDFS includes a name node and data nodes. The name node manages and processes metadata, and the data nodes store data in the form of files. The name node and data nodes can be dedicated devices, or software running on ordinary computers. Typically, one dedicated machine runs the name node software and each of the other machines runs one instance of the data node software; a single machine can also run multiple data node instances. Each machine running data node software has a local file system. HDFS is a logical file system built on top of the file systems of multiple machines, and its underlying data is stored in blocks. A data node stores HDFS data in its local file system. The data node is unaware of HDFS files, and it stores each data block of an HDFS file as a separate file in the local file system.
In key-value storage, a piece of data is called a value, and each value corresponds to a unique identifier, the key. The key locates the position of the value directly. Key-value storage therefore has no directory hierarchy like a file system, and storage is completely flat. As a result, key-value storage is easier to scale in capacity than file storage. Because reads and writes go directly to the object layer, it also reads and writes more efficiently than storage organized as a directory structure.
How to combine two advanced technologies, Hadoop and key-value storage, is a problem the industry urgently needs to solve. However, some Hadoop functions depend directly on HDFS, for example HBase (the Hadoop database) backup and the Impala query system. Replacing HDFS in Hadoop directly with a key-value storage system would therefore leave some Hadoop functions unsupported. The prior art offers no complete solution that combines a key-value storage system with HDFS.
Summary
An objective of the present invention is to provide a data operation method, a server and a storage system that allow Hadoop to use key-value storage while keeping its functions fully supported.
To achieve the above objective, the present invention adopts the following technical solutions:
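The contrast drawn above between flat key-value storage and a directory hierarchy can be illustrated with a toy model. This is a hypothetical sketch, not the storage device's actual interface. A dict keyed by an opaque key reaches a value in one lookup. A nested "directory tree" needs one lookup per path component. The class and function names are illustrative only.

```python
class FlatKVStore:
    """Toy flat store: every value is reached directly through its key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]  # a single lookup, no directory traversal


def hierarchical_get(root, path):
    """Walk a nested-dict 'directory tree', one path component per step."""
    node = root
    steps = 0
    for part in path.strip("/").split("/"):
        node = node[part]  # each component is a separate lookup
        steps += 1
    return node, steps


store = FlatKVStore()
store.put(1001, b"file contents")
tree = {"user": {"alice": {"report.txt": b"file contents"}}}
value, steps = hierarchical_get(tree, "/user/alice/report.txt")
```

Both structures return the same bytes. The hierarchical path needs three lookups where the flat store needs one, which is the efficiency argument the paragraph makes.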
According to a first aspect, a data operation method is provided. The method is applied to a storage system that includes a name node module, a data node module and a key-value (KV) storage device. The method includes the following steps. The name node module receives an operation request message sent by a Hadoop Distributed File System (HDFS) client. The operation request message requests the block address information, in HDFS, of the target data to be operated on in a target file, so that the target data can be operated on; it is based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform. The name node module determines a key according to the file name of the target file included in the operation request message, and determines, according to the key, the location of the storage space of the value, where the value is the data of the target file. It obtains target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message. It then sends the HDFS client a response message that responds to the operation request message and includes the target block address information. The response message is also based on the ClientProtocol communication protocol. After receiving the response message from the name node module, the HDFS client may send an operation instruction including the target block address information to the data node module. This instruction uses the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform. The data node module can then perform the operation indicated by the operation instruction on the target data according to that address information. In this way, both the name node module and the data node module communicate with the HDFS client using Hadoop's native protocols, which preserves support for other Hadoop functions. At the same time, because the data of HDFS files is stored as key-value pairs at the bottom layer, data read/write efficiency and capacity scalability are improved.
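The message sequence of the first aspect can be traced with a minimal sketch. The client asks the name node for block addresses, then forwards those addresses unchanged to the data node. The dataclasses below are hypothetical stand-ins for the ClientProtocol and ClientDatanodeProtocol messages; they are not Hadoop's actual classes. The toy name node and data node exist only to make the flow runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Extent = Tuple[int, int, int]  # (block number, block offset, block length)

@dataclass
class OperationRequest:       # HDFS client -> name node, over ClientProtocol
    file_name: str
    start: int                # start address information
    length: int               # data length information

@dataclass
class OperationResponse:      # name node -> HDFS client, over ClientProtocol
    extents: List[Extent]     # target block address information

@dataclass
class OperationInstruction:   # HDFS client -> data node, over ClientDatanodeProtocol
    op: str                   # e.g. "read" or "write"
    extents: List[Extent]

def client_read(name_node: Callable[[OperationRequest], OperationResponse],
                data_node: Callable[[OperationInstruction], bytes],
                file_name: str, start: int, length: int) -> bytes:
    # Step 1: ask the name node where the target data lives.
    response = name_node(OperationRequest(file_name, start, length))
    # Step 2: pass the returned block addresses unchanged to the data node.
    return data_node(OperationInstruction("read", response.extents))

# Toy stand-ins: the whole file sits in physical block 5 of the KV device.
kv_blocks = {5: b"0123456789"}

def toy_name_node(req: OperationRequest) -> OperationResponse:
    return OperationResponse([(5, req.start, req.length)])

def toy_data_node(ins: OperationInstruction) -> bytes:
    return b"".join(kv_blocks[b][o:o + n] for b, o, n in ins.extents)

data = client_read(toy_name_node, toy_data_node, "/logs/app.log", 2, 4)
```

The client never interprets the block addresses itself. That mirrors the point above: only the contents of the protocol messages change, not the protocols themselves.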
In a first possible implementation of the first aspect, determining the key according to the file name of the target file included in the operation request message includes: determining the index node (inode) number of the target file according to the file name, and using the inode number as the key to determine the location of the storage space of the value. An inode number is the identifier by which a computer identifies a file. In this implementation, the data of one HDFS file is one value in the KV storage device, and the inode number of the HDFS file is the key of that value. By the mechanism of key-value storage, the name node module can therefore locate the value directly through the key.
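The file name → inode number → value lookup described above can be shown with a toy model, under assumptions. The name node keeps a namespace table from HDFS path to inode number, and the KV device maps each key to the physical blocks that hold the value. Both tables, the starting inode number 16385, and every name here (`Namespace`, `KVDevice`, `locate_value`) are hypothetical. The patent does not say how either table is implemented.

```python
import itertools

class Namespace:
    """Hypothetical namespace table on the name node: path -> inode number."""

    def __init__(self):
        self._next_inode = itertools.count(16385)  # arbitrary starting number
        self._inodes = {}

    def create(self, path):
        if path in self._inodes:
            raise FileExistsError(path)
        self._inodes[path] = next(self._next_inode)
        return self._inodes[path]

    def inode_of(self, path):
        try:
            return self._inodes[path]
        except KeyError:
            raise FileNotFoundError(path) from None


class KVDevice:
    """Stand-in for the KV storage device: key -> physical blocks of the value."""

    def __init__(self):
        self._locations = {}

    def put_location(self, key, blocks):
        self._locations[key] = list(blocks)

    def locate(self, key):
        return self._locations[key]


def locate_value(ns, kv, file_name):
    key = ns.inode_of(file_name)   # file name -> inode number
    return key, kv.locate(key)     # the inode number is used directly as the key


ns, kv = Namespace(), KVDevice()
inode = ns.create("/data/a.log")
kv.put_location(inode, [7, 8, 9])
key, blocks = locate_value(ns, kv, "/data/a.log")
```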
With reference to the first aspect or its first possible implementation, in a second possible implementation of the first aspect, obtaining the target block address information includes the following. The target block address information of the target data in the storage space is obtained according to the start address information and the data length information in the operation request message. Specifically, the name node obtains, from these two pieces of information, the number of each block occupied by the target data in the storage space, together with the block offset and block length within each block. A block is a physical storage unit of the KV storage device. In this implementation, the name node first determines the location of the storage space of the value. It can then determine from the start address information and the data length information which physical storage units of that storage space hold the target data to be operated on.
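The block computation described above can be written as a short routine, assuming the following. The value occupies an ordered list of fixed-size physical blocks, and the start address is a byte offset into the value. The routine walks the requested byte range and emits one (block number, block offset, block length) triple per block touched. The fixed block size and all names are assumptions; the patent does not specify the block layout.

```python
from typing import List, NamedTuple

class BlockExtent(NamedTuple):
    block_no: int   # physical block number in the KV storage device
    offset: int     # byte offset of the target data inside this block
    length: int     # bytes of the target data inside this block

def target_block_addresses(value_blocks: List[int], block_size: int,
                           start: int, length: int) -> List[BlockExtent]:
    """Map the byte range [start, start + length) of a value onto its blocks.

    value_blocks lists, in order, the physical blocks holding the value;
    block_size is assumed to be fixed for the device.
    """
    if block_size <= 0 or start < 0 or length < 0:
        raise ValueError("block_size must be positive; start/length non-negative")
    if start + length > len(value_blocks) * block_size:
        raise ValueError("range extends past the end of the value")
    extents = []
    pos, end = start, start + length
    while pos < end:
        index, offset = divmod(pos, block_size)      # which block, where in it
        chunk = min(block_size - offset, end - pos)  # stop at block or range end
        extents.append(BlockExtent(value_blocks[index], offset, chunk))
        pos += chunk
    return extents

# 6000 bytes starting at byte 3000 of a value stored in blocks 7, 8, 9
# yields extents (7, 3000, 1096), (8, 0, 4096), (9, 0, 808).
extents = target_block_addresses([7, 8, 9], 4096, 3000, 6000)
```

Only the first and last extents can be partial. Every block in between is covered in full, so the total of the lengths always equals the requested data length.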
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the block address information of the target data in HDFS includes the number of each logical block occupied by the target data in HDFS, together with the logical block offset and logical block length within each logical block. Sending the HDFS client the response message for the operation request message then includes sending three values: the block number as the logical block number, the block offset as the logical block offset, and the block length as the logical block length. In the Hadoop platform, the response message that the name node returns to the HDFS client contains the block address information of the target data within the file it belongs to. This block address is a logical address. When a data node reads data according to the logical block address, it reaches the target data only after traversing the hierarchy of its local file system. In this implementation, the name node module instead returns the physical block address information in the KV storage device to the HDFS client. The data node module then receives an operation instruction from the HDFS client that includes this physical block address information. It can operate on the target data directly in the KV storage device without going through a file system, which improves data read/write efficiency.
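The substitution described above is small enough to sketch directly. The physical (block number, block offset, block length) triples are placed unchanged into the fields the client already expects for logical blocks. `LogicalBlock` and `build_response` are hypothetical names. In stock Hadoop, the corresponding ClientProtocol call is `getBlockLocations(src, offset, length)`, which returns a `LocatedBlocks` object; this sketch does not model those classes.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class LogicalBlock:
    """Per-block fields of the HDFS response (illustrative names only)."""
    block_id: int   # logical block number field
    offset: int     # logical block offset field
    length: int     # logical block length field

def build_response(physical_extents: List[Tuple[int, int, int]]) -> List[LogicalBlock]:
    """Fill the logical-block fields with physical KV-device addresses.

    Each (block number, block offset, block length) triple is copied
    unchanged, so the client later hands physical addresses to the data
    node without needing to know that they are physical.
    """
    return [LogicalBlock(block_id=b, offset=o, length=n)
            for b, o, n in physical_extents]

response = build_response([(7, 3000, 1096), (8, 0, 4096), (9, 0, 808)])
```

Because the field layout does not change, the client-side protocol handling can stay the same. This is why the scheme keeps compatibility with other Hadoop functions.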
In the first aspect or any of the foregoing possible implementations of the first aspect, the operation request message may be a read request message or a write request message.
According to a second aspect, a data operation method is provided. The method is applied to a storage system that includes a name node module, a data node module and a key-value (KV) storage device. The method includes the following steps. The data node module receives an operation instruction sent by a Hadoop Distributed File System (HDFS) client. The operation instruction is used to operate on the target data to be operated on in a target file. It is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, and it includes the block address information under which the target data is stored in the KV storage device. The data node module then performs the operation indicated by the operation instruction on the target data according to the block address information.
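The data-node side of the second aspect can be sketched with a toy device, assuming the following. The KV storage device exposes fixed-size physical blocks that can be read and written in place. The operation instruction carries the operation name and the (block number, offset, length) addresses. Everything here is hypothetical: `BlockDevice`, `execute`, and the deliberately tiny `BLOCK_SIZE`. The point is that no local file system sits between the instruction and the data.

```python
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 8  # hypothetical, tiny block size so the example stays readable

class BlockDevice:
    """Hypothetical stand-in for the physical blocks of the KV storage device."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: Dict[int, bytearray] = {
            n: bytearray(BLOCK_SIZE) for n in range(num_blocks)}

def execute(device: BlockDevice, op: str,
            extents: List[Tuple[int, int, int]],
            data: Optional[bytes] = None) -> Optional[bytes]:
    """Perform a read or write directly at (block number, offset, length)
    addresses; no local file system is involved."""
    for _, offset, length in extents:
        if offset < 0 or length < 0 or offset + length > BLOCK_SIZE:
            raise ValueError("extent does not fit inside a block")
    if op == "read":
        return b"".join(bytes(device.blocks[b][o:o + n]) for b, o, n in extents)
    if op == "write":
        if data is None or len(data) != sum(n for _, _, n in extents):
            raise ValueError("data length must equal the total extent length")
        pos = 0
        for b, o, n in extents:
            device.blocks[b][o:o + n] = data[pos:pos + n]  # in place, same size
            pos += n
        return None
    raise ValueError(f"unsupported operation: {op}")

device = BlockDevice(3)
extents = [(0, 6, 2), (1, 0, 3)]  # the target data spans two blocks
execute(device, "write", extents, b"hello")
read_back = execute(device, "read", extents)
```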
According to a third aspect, a name node module is provided. The name node module is applied to a storage system that further includes a data node module and a KV storage device. The name node module includes the following units:
a receiving unit, configured to receive an operation request message sent by an HDFS client. The operation request message requests the block address information, in HDFS, of the target data to be operated on in a target file, so that the target data can be operated on. It is based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform;
a determining unit, configured to determine a key according to the file name of the target file included in the operation request message, and to determine, according to the key, the location of the storage space of the value, where the value is the data of the target file;
an obtaining unit, configured to obtain target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message;
a sending unit, configured to send the HDFS client a response message that responds to the operation request message and includes the target block address information.
In a first possible implementation of the third aspect, the determining unit is specifically configured to: determine the inode number of the target file according to the file name, and use the inode number as the key to determine the location of the storage space of the value.
With reference to the third aspect or its first possible implementation, in a second possible implementation of the third aspect, the obtaining unit is specifically configured to obtain, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, together with the block offset and block length within each block.
With reference to the second possible implementation of the third aspect, in a third possible implementation of the third aspect, the block address information of the target data in HDFS includes the number of each logical block occupied by the target data in HDFS, together with the logical block offset and logical block length within each logical block. The sending unit is specifically configured to send the block number as the logical block number, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.
According to a fourth aspect, a data node module is provided. The data node module is applied to a storage system that further includes a name node module and a KV storage device. The data node module includes the following units:
a receiving unit, configured to receive an operation instruction sent by an HDFS client. The operation instruction is used to operate on the target data to be operated on in a target file. It is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, and it includes the block address information under which the target data is stored in the KV storage device;
an operation unit, configured to perform the operation indicated by the operation instruction on the target data according to the block address information.
According to a fifth aspect, a server is provided. The server includes the name node module according to the third aspect or any possible implementation of the third aspect, and/or the data node module according to the fourth aspect.
In another implementation, the server specifically includes a processor, a first interface, a second interface and a communication bus. The processor, the first interface and the second interface communicate with one another through the communication bus. The first interface is used to communicate with the HDFS client, and the second interface is used to communicate with the KV storage device. The server runs name node software, through which it performs the method according to the first aspect or any possible implementation of the first aspect. Optionally, the server may also run data node software, through which it performs the following steps. It receives an operation instruction sent by the HDFS client. The operation instruction is used to operate on the target data; it is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, and it includes the target block address information. The server then performs the operation indicated by the operation instruction on the target data according to the target block address information.
According to a sixth aspect, a storage system is provided. The storage system includes the name node module according to the third aspect or any possible implementation of the third aspect, the data node module according to the fourth aspect, and a KV storage device. The name node module and the data node module are each connected to the KV storage device.
In a first possible implementation of the sixth aspect, the name node module and the data node module are deployed on the same server.
According to a seventh aspect, a computer-readable medium is provided for storing a computer program. The computer program includes instructions for performing the method according to the first aspect or any possible implementation of the first aspect.
According to an eighth aspect, a computer-readable medium is provided for storing a computer program. The computer program includes instructions for performing the method according to the second aspect.
On the basis of the implementations provided in the foregoing aspects, the present invention may be further combined to provide more implementations.
Brief description of drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention. A person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a schematic diagram of an HDFS architecture according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a data operation method according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a data reading method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the mapping from a file to physical storage units in a KV storage device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a name node module according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a data node module according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of another server according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a storage system according to an embodiment of the present invention.
Detailed description
To help those skilled in the art understand the improvements that the embodiments of the present invention make over the prior art, the prior-art solution is first briefly introduced below.
FIG. 1 is a schematic diagram of an HDFS architecture. As shown in the figure, the architecture includes an HDFS client, a name node, data node 1 and data node 2. The client is connected to the name node and to data node 1, and data node 1 is connected to data node 2. The name node runs the HDFS file system, and each data node runs a local file system.
The data write process in the HDFS architecture shown in FIG. 1 is as follows. The HDFS client sends a write request message to the name node; the message includes a file name, start address information and data length information. On receiving the write request message, the name node first determines whether the file exists. If it does not exist, the name node creates a new file in the file system it runs. Once the file is created, it is divided into multiple fixed-size data blocks, and a data node is allocated for each block. Each data node stores each data block it receives as a file in its local file system, and several replicas of the same data block may be stored on different data nodes. If the file exists, the name node uses the start address information and the data length information to determine the list of data blocks into which the data will be written. The list includes the number of each data block and the offset and length of the data to be written within each block. After obtaining the data block list, the HDFS client can send a write instruction to the data node to write the data to the data node.
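The file-splitting and replica allocation in the write process above can be sketched as follows. The file is cut into fixed-size blocks, with the last block possibly shorter, and each block's replicas are placed on distinct data nodes. Round-robin placement is a simplification swapped in for illustration: real HDFS uses a rack-aware placement policy. The function names are hypothetical.

```python
from typing import Dict, List

def split_into_blocks(file_size: int, block_size: int) -> List[int]:
    """Return the length of each fixed-size block (the last may be shorter)."""
    if block_size <= 0 or file_size < 0:
        raise ValueError("block_size must be positive; file_size non-negative")
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def assign_replicas(num_blocks: int, data_nodes: List[str],
                    replication: int) -> Dict[int, List[str]]:
    """Place each block's replicas on distinct data nodes, round-robin.

    This is a simplification; it is not HDFS's rack-aware placement policy.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the replication factor")
    return {block: [data_nodes[(block + r) % len(data_nodes)]
                    for r in range(replication)]
            for block in range(num_blocks)}

lengths = split_into_blocks(300, 128)  # two full blocks plus a 44-byte tail
placement = assign_replicas(len(lengths), ["dn1", "dn2", "dn3"], 2)
```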
The data read flow is as follows. The HDFS client sends a read request message to the name node; the message includes a file name, start address information, and data length information. After receiving the read request message, the name node returns to the HDFS client the list of data blocks that hold the data to be read. After receiving this list, the HDFS client sends read instructions to read the data from the data nodes.
Note that in this HDFS architecture, the name node exposes an RPC (Remote Procedure Call) interface to the HDFS client, and so does each data node. The HDFS client communicates with the name node using the ClientProtocol protocol and with the data nodes using the ClientDatanodeProtocol protocol.
An embodiment of the present invention provides a data operation method applied to a storage system. The storage system includes a name node module, a data node module, and a KV (key-value) storage device. As shown in FIG. 2, the method includes the following steps:
S201. The name node module receives an operation request message sent by the HDFS client.
The operation request message requests the block address information, in HDFS, of the target data to be operated on in the target file, so that the operation can be performed on the target data. The message is based on the ClientProtocol communication protocol used between the name node and the HDFS client in the Hadoop platform.
The name node module provides an RPC interface to the HDFS client and receives the operation request message through that interface.
The operation request message may be a write request message, requesting that the target data be written to a specified location in the target file, or a read request message, requesting that the target data be read from a specified location in the target file.
S202. The name node module determines a key from the file name of the target file contained in the operation request message, and uses the key to determine the location of the storage space of the value, where the value is the data of the target file.
Note that the data of a file stored in the KV storage device is called a value, and each value corresponds to a unique identifier (the key); the key alone is sufficient to locate the storage space of the value. For example, a large ordered array of structures HashValue[m], where m is an integer, can be defined in the KV storage device. Each element, such as HashValue[0] or HashValue[1], is one storage space and holds the data (value) of one file. A hash function ChangeToHashValue(key) is constructed to convert the unique key of each value into a subscript x of HashValue[m], and each file's data is placed in HashValue[x]. Whenever the data in the file needs to be operated on again, applying ChangeToHashValue(key) to the file's key yields the same subscript, which determines the location of the storage space holding the file's data.
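A minimal sketch of the HashValue[m] idea follows. `change_to_hash_value` stands in for ChangeToHashValue(key); using SHA-256 modulo m is an assumption (the text does not specify the hash function), and collisions are simply rejected here, whereas a real KV store would have to resolve them.

```python
import hashlib
from typing import Hashable, List, Optional, Tuple

M = 1024  # hypothetical number of slots in HashValue[m]


def change_to_hash_value(key: Hashable, m: int = M) -> int:
    """Stand-in for ChangeToHashValue(key): map a key to a subscript x in [0, m)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


class HashValueArray:
    """Toy model of the HashValue[m] array: each slot stores one file's value."""

    def __init__(self, m: int = M):
        self.m = m
        self.slots: List[Optional[Tuple[str, bytes]]] = [None] * m

    def put(self, key: Hashable, value: bytes) -> int:
        x = change_to_hash_value(key, self.m)
        slot = self.slots[x]
        if slot is not None and slot[0] != str(key):
            # A real KV store would resolve collisions; this sketch just refuses.
            raise KeyError(f"slot {x} already holds another key")
        self.slots[x] = (str(key), value)
        return x

    def get(self, key: Hashable) -> bytes:
        # Recomputing the hash from the key locates the same slot again.
        x = change_to_hash_value(key, self.m)
        slot = self.slots[x]
        if slot is None or slot[0] != str(key):
            raise KeyError(key)
        return slot[1]
```

Because the subscript is recomputed from the key on every access, no separate index is needed to find a file's value once its key is known.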
Optionally, in this embodiment, the index node (inode) number corresponding to the file name of the target file may be used as the key of the target file's data (value). In this case, step S202 specifically includes: determining the inode number of the target file from the file name, and using the inode number as the key to determine the location of the storage space of the value.
The name node includes a directory entry list in which each entry consists of two parts: the name of a contained file and the inode number corresponding to that file name. The name node module can therefore determine the inode number of the target file by looking up its file name in this list. Note that a file system does not use file names internally; it identifies files by inode number. To operate on a file's data, the file system first finds the inode number corresponding to the file name, then obtains the inode information from the inode number, and finally locates the blocks holding the file's data from the inode information.
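The directory lookup that turns a file name into the key can be sketched as follows. The class and field names are hypothetical, and the linear scan simply mirrors "looking up the directory entry list"; a real name node would use an indexed structure.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DirEntry:
    file_name: str  # name of a file contained in the directory
    inode: int      # inode number corresponding to that file name


class NameNodeDirectory:
    """Hypothetical directory entry list held by the name node module."""

    def __init__(self, entries: Iterable[DirEntry]):
        self.entries = list(entries)

    def lookup_inode(self, file_name: str) -> int:
        for entry in self.entries:
            if entry.file_name == file_name:
                return entry.inode
        raise FileNotFoundError(file_name)

    def key_for(self, file_name: str) -> str:
        # The inode number, not the file name, serves as the KV key.
        return str(self.lookup_inode(file_name))
```

For example, with entries `DirEntry("first file", 1001)` and `DirEntry("second file", 1002)`, `key_for("first file")` returns `"1001"`.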
As this shows, the inode number identifies a file within the file system. In an optional implementation of this embodiment, the inode number is therefore used as the key of the file data (value) to identify the value uniquely.
S203. The name node module obtains the target block address information of the target data in the storage space from the start address information and the data length information in the operation request message.
Specifically, from the start address information and the data length information, the name node module obtains the number of each block that the target data occupies in the storage space, together with the block offset and block length within each of those blocks.
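One way to compute the per-block numbers, offsets, and lengths is sketched below, assuming fixed-size blocks numbered from 0 so that block n covers [n × block_size, (n + 1) × block_size). The function name `block_ranges` is hypothetical.

```python
from typing import List, Tuple


def block_ranges(start: int, length: int, block_size: int) -> List[Tuple[int, int, int]]:
    """Return (block number, offset in block, length in block) for every block
    that the range [start, start + length) touches. Blocks are numbered from 0."""
    if start < 0 or length < 0 or block_size <= 0:
        raise ValueError("start and length must be >= 0 and block_size > 0")
    ranges = []
    pos, end = start, start + length
    while pos < end:
        number = pos // block_size                   # block containing the current position
        offset = pos % block_size                    # where inside that block it starts
        chunk = min(block_size - offset, end - pos)  # stop at block end or range end
        ranges.append((number, offset, chunk))
        pos += chunk
    return ranges
```

With `MB = 1 << 20`, a request starting at 100M with length 128M and 64M blocks gives `[(1, 36 * MB, 28 * MB), (2, 0, 64 * MB), (3, 0, 36 * MB)]`.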
Note that the HDFS file system presents files to the HDFS client at contiguous addresses; that is, the HDFS client perceives each file as stored contiguously. The file data (value), however, is stored in blocks in the storage space of the KV storage device. Each block is a physical storage unit, and each physical storage unit holds a pointer to the next unit. The target block address information is the location information of each physical storage unit that the target data occupies in the storage space.
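The chained storage units can be modelled as a map from a physical unit id to its data and next pointer; the sketch below walks the chain to find the physical unit that holds a given logical block. `StorageUnit` and `unit_for_block` are hypothetical names, and the text does not say how the head of each chain is recorded.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StorageUnit:
    data: bytearray
    next_unit: Optional[int]  # physical id of the next unit; None ends the chain


def unit_for_block(units: Dict[int, StorageUnit], head: int, block_number: int) -> int:
    """Follow next pointers from the head unit to the physical id of the
    block_number-th unit (0-based) of one value."""
    if block_number < 0:
        raise IndexError("block number must be non-negative")
    current = head
    for _ in range(block_number):
        nxt = units[current].next_unit
        if nxt is None:
            raise IndexError("block number is beyond the end of the value")
        current = nxt
    return current
```

For a chain 7 → 3 → 9, logical block 2 resolves to physical unit 9, even though the client sees the file as one contiguous range.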
S204. The name node module sends the HDFS client a response message to the operation request message; the response message includes the target block address information.
The operation request message may be the message passed in when the HDFS client calls the getblocklocation interface of the name node module. The input parameters of this interface are the file name of the target file, the start address of the target data in the target file, and the length of the target data. The interface is required to return the number of each logical block that the target data occupies in the HDFS file system, together with the logical block offset and logical block length within each logical block.
In this embodiment, the name node module may return to the HDFS client the number of each block that the target data occupies in the storage space as the logical block number, the block offset as the logical block offset, and the block length as the logical block length.
S205. The data node module receives an operation instruction sent by the HDFS client; the operation instruction includes the target block address information.
The operation instruction is used to operate on the target data in the target file, and is based on the ClientDatanodeProtocol communication protocol used between the data node and the HDFS client in the Hadoop platform.
S206. The data node module performs the operation indicated by the operation instruction on the target data according to the target block address information.
If the operation instruction is a write instruction, the data node module writes the target data to the location specified by the target block address information; if it is a read instruction, the data node module reads the target data from that location.
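Steps S205 and S206 on the data node side can be sketched as follows. Beyond what the text states, the sketch assumes that the operation instruction carries the value's key alongside the (block number, offset, length) triples, and it uses an in-memory dict as a stand-in for the KV storage device; the class and method names are hypothetical.

```python
from typing import Dict, List, Tuple

BlockRange = Tuple[int, int, int]  # (block number, offset in block, length in block)


class DataNodeModule:
    """Toy data node module over an in-memory stand-in for the KV storage space.
    kv maps a key (e.g. an inode number) to that value's blocks by block number."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.kv: Dict[str, Dict[int, bytearray]] = {}

    def _block(self, key: str, number: int) -> bytearray:
        # Unwritten blocks are created zero-filled, so they read back as zeros.
        blocks = self.kv.setdefault(key, {})
        return blocks.setdefault(number, bytearray(self.block_size))

    def write(self, key: str, ranges: List[BlockRange], data: bytes) -> None:
        if sum(length for _, _, length in ranges) != len(data):
            raise ValueError("data length does not match the block address information")
        pos = 0
        for number, offset, length in ranges:
            if offset + length > self.block_size:
                raise ValueError("range exceeds the block size")
            self._block(key, number)[offset:offset + length] = data[pos:pos + length]
            pos += length

    def read(self, key: str, ranges: List[BlockRange]) -> bytes:
        out = bytearray()
        for number, offset, length in ranges:
            out += self._block(key, number)[offset:offset + length]
        return bytes(out)
```

For example, with 8-byte blocks, writing `b"hello"` to `[(1, 6, 2), (2, 0, 3)]` places `b"he"` at the end of block 1 and `b"llo"` at the start of block 2; reading the same ranges returns `b"hello"`.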
With this method, the name node module communicates with the HDFS client using the ClientProtocol communication protocol that the Hadoop platform uses between the name node and the HDFS client, and the data node module communicates using the ClientDatanodeProtocol communication protocol that the platform uses between the data node and the HDFS client. Support for other Hadoop functions is therefore preserved, while the HDFS file data is stored as key-value pairs at the bottom layer, improving read/write efficiency and capacity scalability.
To help a person of ordinary skill in the art understand the technical solution provided by the present invention, the case in which the operation request message is a read request message is described below by way of example.
For example, suppose the HDFS client needs to read target data with a start address of 100M (megabytes) and a data length of 128M from a file named "first file". In this embodiment, the data read method, shown in FIG. 3, includes:
S301. The name node module receives a read request message sent by the HDFS client; the message includes a file name, start address information, and data length information.
Here the file name is "first file", the start address information is 100M, and the data length information is 128M.
S302. The name node module determines the inode number of the file from the file name.
S303. The name node module uses the inode number (key) to compute the location of the storage space of the file's data (value).
For steps S302 and S303, refer to the description of step S202 above; details are not repeated here.
S304. The name node module obtains the target block address information of the target data within that storage space of the KV storage device.
The size of each block in the KV storage device can be set according to user requirements. If every block is 64M (with block 0 covering the first 64M), then, as shown in FIG. 4, the target data starting at 100M with length 128M in the first file occupies blocks 1, 2, and 3 of the storage space: offset 36M and length 28M in block 1, offset 0 and length 64M in block 2, and offset 0 and length 36M in block 3.
The target block address information may therefore be the list shown in the following table:
Number    Offset    Length
1         36M       28M
2         0         64M
3         0         36M
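The table can be checked with a short calculation, assuming blocks are numbered from 0 so that block n covers [nB, (n+1)B), with all sizes in megabytes. Here B is the block size, s the start address, and ℓ the data length; the first-block length equals B minus its offset only because the range extends past the end of that block.

```latex
\begin{aligned}
&B = 64\,\mathrm{M},\quad s = 100\,\mathrm{M},\quad \ell = 128\,\mathrm{M}\\
&n_{\mathrm{first}} = \lfloor s/B \rfloor = \lfloor 100/64 \rfloor = 1,\qquad
 o_{\mathrm{first}} = s - n_{\mathrm{first}}B = 100 - 64 = 36\,\mathrm{M}\\
&\ell_{\mathrm{first}} = B - o_{\mathrm{first}} = 64 - 36 = 28\,\mathrm{M}\quad(\text{since } \ell > B - o_{\mathrm{first}})\\
&n_{\mathrm{last}} = \lceil (s+\ell)/B \rceil - 1 = \lceil 228/64 \rceil - 1 = 3,\qquad
 \ell_{\mathrm{last}} = (s+\ell) - n_{\mathrm{last}}B = 228 - 192 = 36\,\mathrm{M}\\
&\ell_{\mathrm{first}} + (n_{\mathrm{last}} - n_{\mathrm{first}} - 1)\,B + \ell_{\mathrm{last}} = 28 + 1\cdot 64 + 36 = 128\,\mathrm{M} = \ell
\end{aligned}
```

The single middle block (block 2) is read in full from offset 0, which matches the second row of the table.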
S305. The name node module sends the HDFS client a response message containing the target block address information.
The response message is based on the ClientProtocol communication protocol used between the HDFS client and the name node in the Hadoop platform. Refer to the description of step S204 above; details are not repeated here.
In the native Hadoop platform, the response message that the name node returns to the HDFS client contains the block address information of the target data in the HDFS file system, namely the block numbers, offsets, and lengths. The HDFS client, however, cannot tell whether the block address information returned by the name node is a logical or a physical address. This embodiment can therefore return the address information of the physical storage units in the storage space of the KV storage device to the HDFS client.
S306. The data node module receives a read instruction sent by the HDFS client; the read instruction includes the target block address information.
The read instruction is based on the ClientDatanodeProtocol communication protocol used between the HDFS client and the data node in the Hadoop platform.
S307. The data node module reads the target data from the KV storage device according to the target block address information.
S308. The data node module sends the target data to the HDFS client.
With this method, the HDFS client still sees the HDFS file system as the underlying data store, so support for other Hadoop functions is preserved. Meanwhile, transparently to the HDFS client, the file data is stored as key-value pairs at the bottom layer: reads do not have to pass through the complex hierarchy of a file system, which improves read efficiency, and the flat structure of key-value storage also improves capacity scalability.
An embodiment of the present invention further provides a name node module 50, which is applied to a storage system that also includes a data node module and a key-value (KV) storage device. The name node module 50 is configured to perform the corresponding steps of the method embodiments above. As shown in FIG. 5, the name node module 50 includes:
a receiving unit 51, configured to receive an operation request message sent by a Hadoop Distributed File System (HDFS) client, where the operation request message requests the block address information, in HDFS, of the target data to be operated on in a target file, so that the target data can be operated on, and the operation request message is based on the ClientProtocol communication protocol used between the name node and the HDFS client in the Hadoop platform;
a determining unit 52, configured to determine a key from the file name of the target file contained in the operation request message, and to determine, from the key, the location of the storage space of the value, where the value is the data of the target file;
an obtaining unit 53, configured to obtain the target block address information of the target data in the storage space from the start address information and the data length information in the operation request message; and
a sending unit 54, configured to send the HDFS client a response message to the operation request message, where the response message includes the target block address information.
With the name node module 50, communication with the HDFS client uses the ClientProtocol communication protocol that the Hadoop platform uses between the name node and the HDFS client. Without changing the upper-layer communication interface, that is, while the name node still presents the HDFS file system to the HDFS client, the HDFS file data is stored as key-value pairs at the bottom layer, improving read/write efficiency and capacity scalability.
Optionally, the determining unit 52 is specifically configured to determine the index node (inode) number of the target file from the file name, and to use the inode number as the key to determine the location of the storage space of the value.
Optionally, the obtaining unit 53 is specifically configured to obtain, from the start address information and the data length information, the number of each block that the target data occupies in the storage space, together with the block offset and block length within each of those blocks.
Optionally, the block address information of the target data in HDFS includes the number of each logical block that the target data occupies in HDFS, together with the logical block offset and logical block length within each logical block; the sending unit 54 is specifically configured to send the HDFS client the block number as the logical block number, the block offset as the logical block offset, and the block length as the logical block length.
Note that the above division of the name node module into units is only a division by logical function; other divisions are possible in an actual implementation. For example, the determining unit 52 and the obtaining unit 53 may be combined into a single processing unit. The functional units may also be physically implemented in various ways.
In addition, a person skilled in the art will understand that, for convenience and brevity of description, the specific working process of each unit of the name node module described above can be found in the corresponding process in the method embodiments and is not repeated here.
An embodiment of the present invention further provides a data node module 60, which is applied to a storage system that also includes a name node module and a key-value (KV) storage device. The data node module 60 is configured to perform the corresponding steps of the method embodiments above and includes:
a receiving unit 61, configured to receive an operation instruction sent by a Hadoop Distributed File System (HDFS) client, where the operation instruction is used to operate on the target data to be operated on in a target file, the operation instruction is based on the ClientDatanodeProtocol communication protocol used between the data node and the HDFS client in the Hadoop platform, and the operation instruction includes the block address information under which the target data is stored in the KV storage device; and
an operating unit 62, configured to perform the operation indicated by the operation instruction on the target data according to the block address information.
With the data node module 60, communication uses the ClientDatanodeProtocol communication protocol that the Hadoop platform uses between the data node and the HDFS client, and the data is stored as key-value pairs in the KV storage device without changing the upper-layer communication interface, improving read/write efficiency and capacity scalability.
A person skilled in the art will understand that, for convenience and brevity of description, the specific working process of each unit of the data node module described above can be found in the corresponding process in the method embodiments and is not repeated here.
An embodiment of the present invention further provides a server. As shown in FIG. 7, the server includes the name node module 50 shown in FIG. 5 and/or the data node module 60 shown in FIG. 6; for details, refer to the descriptions of FIG. 5 and FIG. 6 above, which are not repeated here. In other words, the name node module and the data node module can be deployed flexibly on computers.
An embodiment of the present invention further provides another server 80. As shown in FIG. 8, the server 80 includes:
a processor 81, a first interface 82, a second interface 83, and a communication bus 84. The processor 81, the first interface 82, and the second interface 83 communicate with one another over the communication bus 84. The first interface 82 is used to communicate with a Hadoop Distributed File System (HDFS) client, and the second interface 83 is used to communicate with a key-value (KV) storage device. The server runs name node software, through which it performs the following operations:
receiving an operation request message sent by the HDFS client, where the operation request message requests the block address information, in HDFS, of the target data to be operated on in a target file, so that the target data can be operated on, and the operation request message is based on the ClientProtocol communication protocol used between the name node and the HDFS client in the Hadoop platform;
determining a key from the file name of the target file contained in the operation request message, and determining, from the key, the location of the storage space of the value, where the value is the data of the target file;
obtaining the target block address information of the target data in the storage space from the start address information and the data length information in the operation request message; and
sending the HDFS client a response message to the operation request message, where the response message includes the target block address information.
Optionally, determining the key from the file name of the target file contained in the operation request message includes: determining the index node (inode) number of the target file from the file name, and using the inode number as the key to determine the location of the storage space of the value.
Optionally, obtaining the target block address information of the target data in the storage space from the start address information and the data length information in the operation request message includes: obtaining, from the start address information and the data length information, the number of each block that the target data occupies in the storage space, together with the block offset and block length within each of those blocks.
Optionally, the block address information of the target data in HDFS includes the number of each logical block that the target data occupies in HDFS, together with the logical block offset and logical block length within each logical block; sending the HDFS client the response message to the operation request message includes: sending the HDFS client the block number as the logical block number, the block offset as the logical block offset, and the block length as the logical block length.
In a possible implementation of this embodiment, the server 80 may also run data node software, through which it performs: receiving an operation instruction sent by the HDFS client, where the operation instruction is used to operate on the target data, is based on the ClientDatanodeProtocol communication protocol used between the data node and the HDFS client in the Hadoop platform, and includes the target block address information; and performing the operation indicated by the operation instruction on the target data according to the target block address information.
The server 80 may also include other components not shown in FIG. 8, such as a storage medium for storing program instructions. A person skilled in the art will also understand that the operations performed by the processor 81 may be completed in cooperation with other components; for ease of description, this embodiment describes all of them as operations performed by the processor 81.
The processor 81 in this embodiment may be a CPU (Central Processing Unit). Alternatively, to save CPU computing resources, the processor 81 may be an FPGA (Field Programmable Gate Array) or other hardware, or a combination of a CPU with an FPGA or other hardware, in which case the FPGA or other hardware and the CPU each perform part of the operations of this embodiment.
An embodiment of the present invention further provides a storage system 90. As shown in FIG. 9, the storage system 90 includes:
a name node module 50, a data node module 60, and a KV storage device 91, where the name node module 50 and the data node module 60 are each connected to the KV storage device 91.
Specifically, as shown in FIG. 9, the name node module 50 and the data node module 60 are each connected to the HDFS client. The name node module 50 includes an INTF_Namenode interface that provides an RPC interface to the HDFS client, through which the name node module 50 receives metadata processing or management commands sent by the HDFS client. The data node module 60 includes an INTF_Datanode interface that provides an RPC interface to the client, through which the data node module 60 receives data processing commands sent by the client. The KV storage device 91 provides the name node module 50 and the data node module 60 with INTF_KV, a standard key-value interface.
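The text says only that INTF_KV is a standard key-value interface. A plausible shape, assumed here purely for illustration, is a put/get/delete contract that both the name node module and the data node module call; `IntfKV` and `InMemoryKV` are hypothetical names, and the in-memory class merely stands in for the KV storage device 91.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class IntfKV(ABC):
    """Hypothetical shape of a standard key-value interface such as INTF_KV."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class InMemoryKV(IntfKV):
    """Stand-in for the KV storage device, shared by both modules."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = bytes(value)

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)  # None when the key is absent

    def delete(self, key: str) -> None:
        self._data.pop(key, None)
```

Keeping the interface this narrow is what lets the name node and data node modules stay independent of how the KV device lays out values internally.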
For the name node module 50, refer to the detailed description of FIG. 5 above; for the data node module 60, refer to the detailed description of FIG. 6 above. Details are not repeated here.
In a possible implementation of this embodiment, the name node module 50 and the data node module 60 may be deployed on the same server or on different servers.
Note also that the storage system 90 shown in FIG. 9 includes only one name node module and one data node module. In a specific implementation, the numbers of data node modules and name node modules in the storage system can be set according to actual requirements. When there are multiple name node modules and multiple data node modules, an HDFS client that needs to connect to a name node module can first obtain the address of one name node module through DNS (Domain Name System) round-robin; from the addresses of the multiple data node modules returned by the name node module, the HDFS client can then select the nearest data node module to connect to.
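The client-side selection described above can be sketched as follows. A cycling iterator stands in for DNS round-robin, and "nearest" is modelled as the smallest value in a caller-supplied distance table (for example, network hops); both are assumptions, since the text does not define the distance metric. The names `RoundRobinResolver` and `nearest_datanode` are hypothetical.

```python
import itertools
from typing import Dict, Iterator, List


class RoundRobinResolver:
    """Stands in for DNS round-robin: each lookup returns the next name node address."""

    def __init__(self, addresses: List[str]):
        if not addresses:
            raise ValueError("at least one name node address is required")
        self._cycle: Iterator[str] = itertools.cycle(addresses)

    def resolve(self) -> str:
        return next(self._cycle)


def nearest_datanode(addresses: List[str], distance: Dict[str, int]) -> str:
    """Pick the data node with the smallest distance; ties go to the earliest in the list."""
    return min(addresses, key=lambda addr: distance[addr])
```

Successive lookups on `RoundRobinResolver(["nn-a", "nn-b"])` return `"nn-a"`, `"nn-b"`, `"nn-a"`, spreading clients across name node modules.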
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store data, such as a USB flash drive, a removable hard disk, a RAM (Random Access Memory), a magnetic disk, or an optical disc.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, can make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to cover them.

Claims (18)

  1. A data operation method, wherein the method is applied to a storage system, the storage system comprises a name node module, a data node module, and a key-value (KV) storage device, and the method comprises:
    receiving, by the name node module, an operation request message sent by a Hadoop Distributed File System (HDFS) client, wherein the operation request message is used to request block address information, in HDFS, of target data to be operated on in a target file, so that the target data can be operated on; and the operation request message is based on the ClientProtocol communication protocol between a name node and an HDFS client in the Hadoop platform;
    determining a key according to a file name of the target file included in the operation request message, and determining, according to the key, a location of a storage space of a value, wherein the value is data of the target file;
    obtaining target block address information of the target data in the storage space according to start address information and data length information in the operation request message; and
    sending, to the HDFS client, a response message in response to the operation request message, wherein the response message includes the target block address information.
  2. The method according to claim 1, wherein the determining a key according to a file name of the target file included in the operation request message comprises:
    determining an index node (inode) number of the target file according to the file name; and
    using the inode number as the key to determine the location of the storage space of the value.
  3. The method according to claim 1 or 2, wherein the obtaining target block address information of the target data in the storage space according to start address information and data length information in the operation request message comprises:
    obtaining, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length within each of the blocks.
  4. The method according to claim 3, wherein the block address information of the target data in HDFS comprises the number of each logical block occupied by the target data in HDFS, and the logical block offset and logical block length within each of the logical blocks; and the sending, to the HDFS client, a response message in response to the operation request message comprises:
    sending the block number as the logical block number, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.
  5. A data operation method, wherein the method is applied to a storage system, the storage system comprises a name node module, a data node module, and a key-value (KV) storage device, and the method comprises:
    receiving, by the data node module, an operation instruction sent by a Hadoop Distributed File System (HDFS) client, wherein the operation instruction is used to operate on target data to be operated on in a target file; the operation instruction is based on the ClientDatanodeProtocol communication protocol between a data node and an HDFS client in the Hadoop platform; and the operation instruction includes block address information at which the target data is stored in the KV storage device; and
    performing, on the target data according to the block address information, the operation indicated by the operation instruction.
  6. A name node module, wherein the name node module is applied to a storage system, the storage system further comprises a data node module and a key-value (KV) storage device, and the name node module comprises:
    a receiving unit, configured to receive an operation request message sent by a Hadoop Distributed File System (HDFS) client, wherein the operation request message is used to request block address information, in HDFS, of target data to be operated on in a target file, so that the target data can be operated on; and the operation request message is based on the ClientProtocol communication protocol between a name node and an HDFS client in the Hadoop platform;
    a determining unit, configured to determine a key according to a file name of the target file included in the operation request message, and determine, according to the key, a location of a storage space of a value, wherein the value is data of the target file;
    an obtaining unit, configured to obtain target block address information of the target data in the storage space according to start address information and data length information in the operation request message; and
    a sending unit, configured to send, to the HDFS client, a response message in response to the operation request message, wherein the response message includes the target block address information.
  7. The name node module according to claim 6, wherein the determining unit is specifically configured to:
    determine an index node (inode) number of the target file according to the file name; and
    use the inode number as the key to determine the location of the storage space of the value.
  8. The name node module according to claim 6 or 7, wherein the obtaining unit is specifically configured to:
    obtain, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length within each of the blocks.
  9. The name node module according to claim 8, wherein the block address information of the target data in HDFS comprises the number of each logical block occupied by the target data in HDFS, and the logical block offset and logical block length within each of the logical blocks; and the sending unit is specifically configured to:
    send the block number as the logical block number, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.
  10. A data node module, wherein the data node module is applied to a storage system, the storage system further comprises a name node module and a key-value (KV) storage device, and the data node module comprises:
    a receiving unit, configured to receive an operation instruction sent by a Hadoop Distributed File System (HDFS) client, wherein the operation instruction is used to operate on target data to be operated on in a target file; the operation instruction is based on the ClientDatanodeProtocol communication protocol between a data node and an HDFS client in the Hadoop platform; and the operation instruction includes block address information at which the target data is stored in the KV storage device; and
    an operation unit, configured to perform, on the target data according to the block address information, the operation indicated by the operation instruction.
  11. A server, wherein the server comprises the name node module according to any one of claims 6 to 9 and/or the data node module according to claim 10.
  12. A server, wherein the server comprises a processor, a first interface, a second interface, and a communication bus; the processor, the first interface, and the second interface communicate with each other through the communication bus; the first interface is configured to communicate with a Hadoop Distributed File System (HDFS) client, and the second interface is configured to communicate with a key-value (KV) storage device;
    the server runs name node software, and the server performs the following through the name node software:
    receiving an operation request message sent by the HDFS client, wherein the operation request message is used to request block address information, in HDFS, of target data to be operated on in a target file, so that the target data can be operated on; and the operation request message is based on the ClientProtocol communication protocol between a name node and an HDFS client in the Hadoop platform;
    determining a key according to a file name of the target file included in the operation request message, and determining, according to the key, a location of a storage space of a value, wherein the value is data of the target file;
    obtaining target block address information of the target data in the storage space according to start address information and data length information in the operation request message; and
    sending, to the HDFS client, a response message in response to the operation request message, wherein the response message includes the target block address information.
  13. The server according to claim 12, wherein the server performs the following through the name node software:
    determining an index node (inode) number of the target file according to the file name; and
    using the inode number as the key to determine the location of the storage space of the value.
  14. The server according to claim 12 or 13, wherein the server performs the following through the name node software:
    obtaining, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length within each of the blocks.
  15. The server according to claim 14, wherein the block address information of the target data in HDFS comprises the number of each logical block occupied by the target data in HDFS, and the logical block offset and logical block length within each of the logical blocks; and the server performs the following through the name node software:
    sending the block number as the logical block number, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.
  16. The server according to any one of claims 12 to 15, wherein the server runs data node software, and the server performs the following through the data node software:
    receiving an operation instruction sent by the HDFS client, wherein the operation instruction is used to operate on the target data; the operation instruction is based on the ClientDatanodeProtocol communication protocol between a data node and an HDFS client in the Hadoop platform; and the operation instruction includes the target block address information; and
    performing, on the target data according to the target block address information, the operation indicated by the operation instruction.
  17. A storage system, wherein the storage system comprises the name node module according to any one of claims 6 to 9, the data node module according to claim 10, and a key-value (KV) storage device; the name node module is connected to the KV storage device, and the data node module is connected to the KV storage device.
  18. The storage system according to claim 17, wherein the name node module and the data node module are deployed on the same server.
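The name node and data node steps recited in claims 1 to 5 can be sketched end to end in Python. The sketch rests on assumptions the claims do not state: inode numbers are allocated sequentially from 1, a file's value is held as one contiguous byte array under its inode key, blocks have a fixed size counted from byte 0 of that value, and the "start address" is a byte offset into the file. All class and method names are hypothetical; they are not Hadoop's ClientProtocol or ClientDatanodeProtocol APIs.

```python
from typing import Dict, List, NamedTuple


class BlockExtent(NamedTuple):
    """One block touched by the target data (hypothetical representation)."""
    block_no: int  # block number inside the file's value, reused as the HDFS logical block number
    offset: int    # byte offset of the target data inside this block
    length: int    # number of target-data bytes inside this block


class Response(NamedTuple):
    """Hypothetical response message: the KV key plus the target block address information."""
    key: int
    blocks: List[BlockExtent]


class NameNode:
    """Hypothetical name node: file name -> inode number (the KV key); byte range -> blocks."""

    def __init__(self, block_size: int) -> None:
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self.inodes: Dict[str, int] = {}
        self._next_inode = 1  # assumption: inode numbers are allocated sequentially from 1

    def create(self, path: str) -> int:
        """Register a new file and return its inode number."""
        self.inodes[path] = self._next_inode
        self._next_inode += 1
        return self.inodes[path]

    def handle_request(self, path: str, start: int, length: int) -> Response:
        """Answer an operation request for bytes [start, start + length) of `path`."""
        if start < 0 or length < 0:
            raise ValueError("start and length must be non-negative")
        key = self.inodes[path]  # claim 2: the inode number is used as the key
        blocks: List[BlockExtent] = []
        pos, end = start, start + length
        while pos < end:  # claim 3: split the range at block boundaries
            block_no, offset = divmod(pos, self.block_size)
            take = min(self.block_size - offset, end - pos)
            blocks.append(BlockExtent(block_no, offset, take))
            pos += take
        # claim 4: block numbers, offsets, and lengths are returned unchanged as logical-block info
        return Response(key, blocks)


class DataNode:
    """Hypothetical data node: executes read/write using the block addresses from the client."""

    def __init__(self, kv: Dict[int, bytearray], block_size: int) -> None:
        self.kv = kv  # stand-in for the KV storage device: inode key -> file bytes
        self.block_size = block_size

    def _begin(self, b: BlockExtent) -> int:
        """Absolute byte position of an extent within the file's value."""
        return b.block_no * self.block_size + b.offset

    def read(self, key: int, blocks: List[BlockExtent]) -> bytes:
        value = self.kv[key]
        out = bytearray()
        for b in blocks:
            begin = self._begin(b)
            out += value[begin:begin + b.length]
        return bytes(out)

    def write(self, key: int, blocks: List[BlockExtent], data: bytes) -> None:
        if sum(b.length for b in blocks) != len(data):
            raise ValueError("block extents must cover the data exactly")
        value = self.kv.setdefault(key, bytearray())
        pos = 0
        for b in blocks:
            begin = self._begin(b)
            end = begin + b.length
            if len(value) < end:
                value.extend(bytes(end - len(value)))  # zero-fill up to the end of this extent
            value[begin:end] = data[pos:pos + b.length]
            pos += b.length
```

For example, with `block_size=4`, a request for 5 bytes starting at byte 2 yields the extents `(0, 2, 2)` and `(1, 0, 3)`. Because those numbers are passed back unchanged as HDFS logical block information, the sketch needs no translation table between KV blocks and HDFS blocks.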

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610201356.0 2016-03-31
CN201610201356.0A CN105933376B (en) 2016-03-31 Data manipulation method, server and storage system

Publications (1)

Publication Number Publication Date
WO2017167171A1 true WO2017167171A1 (en) 2017-10-05

Family

ID=56840419

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/078387 WO2017167171A1 (en) 2016-03-31 2017-03-28 Data operation method, server, and storage system

Country Status (2)

Country Link
CN (1) CN105933376B (en)
WO (1) WO2017167171A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105933376B (en) * 2016-03-31 2019-09-03 华为技术有限公司 Data manipulation method, server and storage system
CN108021333B (en) * 2016-11-03 2021-08-24 阿里巴巴集团控股有限公司 System, device and method for randomly reading and writing data
CN106874481B (en) * 2017-02-20 2020-02-07 郑州云海信息技术有限公司 Method and system for reading metadata information of distributed file system
CN107704585A (en) * 2017-10-09 2018-02-16 郑州云海信息技术有限公司 One kind inquiry HDFS data methods and system
CN108052290A (en) * 2017-12-13 2018-05-18 北京百度网讯科技有限公司 For storing the method and apparatus of data
CN111522787B (en) * 2019-02-01 2023-04-07 阿里巴巴集团控股有限公司 Data processing method and device of distributed system and storage medium
EP3702920A1 (en) * 2019-03-01 2020-09-02 ABB Schweiz AG Heterogeneous execution engines in a network centric process control system
CN110247973B (en) * 2019-06-17 2021-09-24 华云数据控股集团有限公司 Data reading and writing method and file gateway
CN110262901B (en) * 2019-06-27 2023-06-20 深圳前海微众银行股份有限公司 Data processing method and data processing system
CN110502507B (en) * 2019-08-29 2022-02-08 上海达梦数据库有限公司 Management system, method, equipment and storage medium of distributed database
CN111831655B (en) * 2020-06-24 2024-04-09 北京字节跳动网络技术有限公司 Data processing method, device, medium and electronic equipment
CN113132233B (en) * 2021-04-06 2022-09-23 中国联合网络通信集团有限公司 Data processing method, software defined network controller and data processing system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577123A (en) * 2013-11-12 2014-02-12 河海大学 Small file optimization storage method based on HDFS
US20140047422A1 (en) * 2012-08-07 2014-02-13 Nec Laboratories America, Inc. Compiler-guided software accelerator for iterative hadoop jobs
CN103678520A (en) * 2013-11-29 2014-03-26 中国科学院计算技术研究所 Multi-dimensional interval query method and system based on cloud computing
CN103793442A (en) * 2012-11-05 2014-05-14 北京超图软件股份有限公司 Spatial data processing method and system
CN105933376A (en) * 2016-03-31 2016-09-07 华为技术有限公司 Data manipulation method, server and storage system

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11392544B2 (en) 2018-02-06 2022-07-19 Samsung Electronics Co., Ltd. System and method for leveraging key-value storage to efficiently store data and metadata in a distributed file system
CN110764688A (en) * 2018-07-27 2020-02-07 杭州海康威视数字技术股份有限公司 Method and device for processing data
CN110764688B (en) * 2018-07-27 2023-09-05 杭州海康威视数字技术股份有限公司 Method and device for processing data
CN110851399A (en) * 2019-09-22 2020-02-28 苏州浪潮智能科技有限公司 Method and system for optimizing file data block transmission efficiency of distributed file system
CN110851399B (en) * 2019-09-22 2022-11-25 苏州浪潮智能科技有限公司 Method and system for optimizing file data block transmission efficiency of distributed file system
CN113076552A (en) * 2020-01-03 2021-07-06 中国移动通信集团广东有限公司 HDFS (Hadoop distributed File System) resource access permission verification method and device and electronic equipment
CN113076552B (en) * 2020-01-03 2022-10-18 中国移动通信集团广东有限公司 HDFS (Hadoop distributed File System) resource access permission verification method and device and electronic equipment
CN113824812A (en) * 2021-08-27 2021-12-21 济南浪潮数据技术有限公司 Method, device and storage medium for HDFS service to acquire service node IP
CN113824812B (en) * 2021-08-27 2023-02-28 济南浪潮数据技术有限公司 Method, device and storage medium for HDFS service to acquire service node IP
CN115190124A (en) * 2022-06-24 2022-10-14 远光软件股份有限公司 Message transmission method and device based on distributed industrial control system, storage medium and scheduling server
CN115190124B (en) * 2022-06-24 2023-12-26 远光软件股份有限公司 Message transmission method and device based on distributed industrial control system, storage medium and scheduling server

Also Published As

Publication number Publication date
CN105933376A (en) 2016-09-07
CN105933376B (en) 2019-09-03

Legal Events

Code  Title  Description
NENP  Non-entry into the national phase  Ref country code: DE
121  EP: the EPO has been informed by WIPO that EP was designated in this application  Ref document number: 17773201; Country of ref document: EP; Kind code of ref document: A1
122  EP: PCT application non-entry in European phase  Ref document number: 17773201; Country of ref document: EP; Kind code of ref document: A1