WO2017167171A1

WO2017167171A1 - Data operation method, server, and storage system

Info

Publication number: WO2017167171A1
Application number: PCT/CN2017/078387
Authority: WO
Inventors: 刘科佑; 王�锋
Original assignee: 华为技术有限公司
Priority date: 2016-03-31
Filing date: 2017-03-28
Publication date: 2017-10-05
Also published as: CN105933376A; CN105933376B

Abstract

A data operation method, a server, and a storage system, which relate to the field of storage and can ensure that Hadoop uses key-value storage on the premise that its functions are fully supported. The method comprises: a name node module receives an operation request message sent by an HDFS client, the operation request message being based on the communication protocol (ClientProtocol) between a name node and an HDFS client in a Hadoop platform; the name node module determines a key according to a file name of a target file comprised in the operation request message, and determines the location of a storage space of a value according to the key, the value being data of the target file; the name node module obtains target block address information of target data in the storage space according to starting address information and data length information in the operation request message; and the name node module sends, to the HDFS client, a response message for responding to the operation request message, the response message comprising the target block address information.

Description

Data operation method, server and storage system

Technical field

The present invention relates to the field of storage, and in particular, to a data operation method, a server, and a storage system.

Background

In the prior art, big data is processed on a Hadoop-based platform. Hadoop is an open source distributed computing platform whose core includes HDFS (Hadoop Distributed File System).

HDFS includes a name node and data nodes. The name node manages and processes metadata, and a data node stores data in the form of files. The name node and the data node can be dedicated devices, or they can be software running on ordinary computers. Usually, one dedicated machine runs the name node software, and each of the other machines runs one instance of the data node software; multiple instances of the data node software can also run on a single machine. Each machine running the data node software has a local file system. HDFS is a logical file system built on top of the file systems of multiple machines, and its underlying data is stored in blocks. A data node stores HDFS data in its local file system; the data node is not aware of HDFS files, and it stores each data block of an HDFS file as a separate file in the local file system.

In key-value storage, the data is called a value, and each value corresponds to a unique identifier called a key. The position of the value can be located directly according to this unique identifier. A key-value store no longer has a directory hierarchy like that of a file system but is completely flat, so key-value storage is easier to expand in capacity than file storage. In addition, data can be read from and written to the object layer directly, so key-value storage reads and writes more efficiently than storage organized as a directory structure.

How to combine the two advanced technologies of Hadoop and key-value storage is an urgent problem to be solved in the industry. However, some Hadoop functions depend directly on HDFS, for example, HBase (Hadoop Database) backup and the Impala query system, so directly replacing the HDFS in Hadoop with a key-value storage system results in incomplete support for Hadoop functions. Therefore, the prior art offers no complete solution that combines a key-value storage system with HDFS.

Summary of the invention

The object of the present invention is to provide a data operation method, a server, and a storage system that can ensure that Hadoop uses key-value storage on the premise that its functions are fully supported.

In order to achieve the above object, the present invention adopts the following technical solutions:

In a first aspect, a data operation method is provided. The method is applied to a storage system, and the storage system comprises a name node module, a data node module, and a key-value (KV) storage device. The method comprises: receiving, by the name node module, an operation request message sent by a distributed file system (HDFS) client, where the operation request message is used to request block address information, in the HDFS, of target data to be operated on in a target file, so as to operate on the target data, and the operation request message is based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform; determining a key according to the file name of the target file included in the operation request message, and determining the location of the storage space of a value according to the key, where the value is the data of the target file; acquiring target block address information of the target data in the storage space according to start address information and data length information in the operation request message; and sending, to the HDFS client, a response message for responding to the operation request message, where the response message comprises the target block address information. The response message is also based on the ClientProtocol communication protocol. In the first aspect, after receiving the response message sent by the name node module, the HDFS client may send an operation instruction including the target block address information to the data node module based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, and the data node module can perform the operation indicated by the operation instruction on the target data according to the target block address information.
In this way, the name node module and the HDFS client, and the data node module and the HDFS client, communicate with each other based on the native protocols of the Hadoop platform, which ensures support for the other Hadoop functions. On this premise, because the data of HDFS files is stored as key-value pairs at the bottom layer, the efficiency of data reading and writing and the capacity scalability are improved.

In a first possible implementation of the first aspect, the determining a key according to the file name of the target file included in the operation request message includes: determining an index node (inode) number of the target file according to the file name; and the determining the location of the storage space of the value according to the key includes: determining the location of the storage space of the value by using the inode number as the key. The inode number is the identifier by which a computer identifies a file. In this possible implementation, the data of one HDFS file is one value in the KV storage device, and the inode number of the HDFS file is the key of that value. According to the implementation mechanism of key-value storage, the name node module can locate the value directly by the key.

With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the acquiring target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message includes: acquiring, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length in each block. A block is a physical storage unit in the KV storage device. In this possible implementation, after determining the location of the storage space of the value, the name node module can determine, according to the start address information and the data length information, which physical storage units of the storage space hold the target data to be operated on.

With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the block address information of the target data in the HDFS includes the number of each logical block occupied by the target data in the HDFS, and the logical block offset and logical block length in each logical block; and the sending, to the HDFS client, a response message for responding to the operation request message includes: sending the number of the block as the number of the logical block, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client. In the Hadoop platform, the response message returned by the name node to the HDFS client includes the block address information of the target data in the file to which the target data belongs. That block address is a logical address, and when reading data according to the logical block address, the data node finally obtains the target data through the hierarchical structure of its local file system. In this possible implementation, the name node module returns the physical block address information in the KV storage device to the HDFS client, so that after receiving an operation instruction that is sent by the HDFS client and includes the physical block address information, the data node module can operate on the target data directly in the KV storage device without going through a file system, which improves the efficiency of data reading and writing.

In any one of the foregoing possible implementation manners of the first aspect, the operation request message may be a read request message or a write request message.

In a second aspect, a data operation method is provided. The method is applied to a storage system, and the storage system comprises a name node module, a data node module, and a key-value (KV) storage device. The method comprises: receiving, by the data node module, an operation instruction sent by a distributed file system (HDFS) client, where the operation instruction is used to operate on target data to be operated on in a target file, the operation instruction is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, and the operation instruction includes block address information of the target data stored in the KV storage device; and performing, according to the block address information, the operation indicated by the operation instruction on the target data.

In a third aspect, a name node module is provided. The name node module is applied to a storage system, and the storage system further includes a data node module and a key-value (KV) storage device. The name node module includes: a receiving unit, configured to receive an operation request message sent by a distributed file system (HDFS) client, where the operation request message is used to request block address information, in the HDFS, of target data to be operated on in a target file, so as to operate on the target data, and the operation request message is based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform; a determining unit, configured to determine a key according to the file name of the target file included in the operation request message, and determine the location of the storage space of a value according to the key, where the value is the data of the target file; an obtaining unit, configured to acquire target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message; and a sending unit, configured to send, to the HDFS client, a response message for responding to the operation request message, where the response message includes the target block address information.

In a first possible implementation of the third aspect, the determining unit is specifically configured to: determine an inode number of the target file according to the file name; and determine the location of the storage space of the value by using the inode number as the key.

With reference to the third aspect or the first possible implementation of the third aspect, in a second possible implementation of the third aspect, the obtaining unit is specifically configured to: acquire, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length in each block.

With reference to the second possible implementation of the third aspect, in a third possible implementation of the third aspect, the block address information of the target data in the HDFS includes the number of each logical block occupied by the target data in the HDFS, and the logical block offset and logical block length in each logical block; and the sending unit is specifically configured to send the number of the block as the number of the logical block, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.

In a fourth aspect, a data node module is provided. The data node module is applied to a storage system, and the storage system further includes a name node module and a key-value (KV) storage device. The data node module includes: a receiving unit, configured to receive an operation instruction sent by a distributed file system (HDFS) client, where the operation instruction is used to operate on target data to be operated on in a target file, the operation instruction is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, and the operation instruction includes block address information of the target data stored in the KV storage device; and an operation unit, configured to perform, according to the block address information, the operation indicated by the operation instruction on the target data.

In a fifth aspect, a server is provided. The server comprises the name node module according to the third aspect or any possible implementation of the third aspect, and/or the server comprises the data node module according to the fourth aspect.

In another implementation, specifically, the server includes a processor, a first interface, a second interface, and a communication bus, where the processor, the first interface, and the second interface communicate with one another through the communication bus; the first interface is used to communicate with a distributed file system (HDFS) client, and the second interface is used to communicate with a key-value (KV) storage device; and the server runs name node software, through which the server performs the method according to the first aspect or any possible implementation of the first aspect. Optionally, the server may further run data node software, through which the server performs the following: receiving an operation instruction sent by the HDFS client, where the operation instruction is used to operate on the target data, the operation instruction is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, and the operation instruction includes the target block address information; and performing, according to the target block address information, the operation indicated by the operation instruction on the target data.

In a sixth aspect, a storage system is provided. The storage system includes the name node module according to the third aspect or any possible implementation of the third aspect, the data node module according to the fourth aspect, and a key-value (KV) storage device, where the name node module is connected to the KV storage device, and the data node module is connected to the KV storage device.

In a first possible implementation of the sixth aspect, the name node module and the data node module are deployed on the same server.

In a seventh aspect, a computer readable medium is provided for storing a computer program, the computer program comprising instructions for performing the method of the first aspect or any possible implementation of the first aspect.

In an eighth aspect, a computer readable medium is provided for storing a computer program, the computer program comprising instructions for performing the method of the second aspect.

The implementations provided in the above aspects may be further combined to provide additional implementations.

Brief description of the drawings

In order to illustrate the embodiments of the present invention or the prior art more clearly, the drawings used in the description of the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of an HDFS architecture according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a data operation method according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of a data reading method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the mapping of a file to physical storage units in a KV storage device according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a name node module according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of a data node module according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of another server according to an embodiment of the present disclosure;

FIG. 9 is a schematic structural diagram of a storage system according to an embodiment of the present invention.

Detailed description

In order to make it easier for those skilled in the art to understand the improvements that the embodiments of the present invention make over the prior art, the solutions in the prior art are briefly introduced first.

FIG. 1 is a schematic diagram of an HDFS architecture. As shown in the figure, the HDFS architecture includes an HDFS client, a name node, a data node 1, and a data node 2, where the HDFS client is connected to the name node and to data node 1, and data node 1 is connected to data node 2. The name node runs the HDFS file system, and each data node runs a local file system.

The data writing process based on the HDFS architecture shown in FIG. 1 is as follows: the HDFS client sends a write request message to the name node, where the write request message includes a file name, start address information, and data length information. After receiving the write request message, the name node first determines whether the file exists. If the file does not exist, the name node creates a new file in the file system running on the name node; after the file is created successfully, the file is divided into multiple data blocks of a fixed size, and a data node is allocated for each data block. The data node stores each divided data block as a file in its local file system, and the same data block may have multiple copies stored on different data nodes. The name node determines, according to the start address information and the data length information, a list of the data blocks to be written, where the data block list includes the number of each data block and the offset and length, within each data block, of the data to be written. After obtaining the data block list, the HDFS client can send a write command to the data node to write the data to be written to the data node.

The data reading process is as follows: the HDFS client sends a read request message to the name node, where the read request message includes a file name, start address information, and data length information. After receiving the read request message, the name node returns to the HDFS client a data block list of the data to be read. After receiving that data block list, the HDFS client sends a read command to the data node to read the data from the data node.

It is worth noting that, in the HDFS architecture, the interface provided by the name node to the HDFS client is an RPC (Remote Procedure Call) interface, and the interface provided by the data node to the HDFS client is also an RPC interface. The communication protocol between the HDFS client and the name node is ClientProtocol, and the communication protocol between the HDFS client and the data node is ClientDatanodeProtocol.

An embodiment of the present invention provides a data operation method. The method is applied to a storage system that includes a name node module, a data node module, and a KV (key-value) storage device. As shown in FIG. 2, the method includes:

S201. The name node module receives an operation request message sent by the HDFS client.

The operation request message is used to request block address information, in the HDFS, of the target data to be operated on in the target file, so as to operate on the target data. The operation request message is based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform.

The name node module provides an RPC interface to the HDFS client, and the name node module receives the operation request message sent by the HDFS client based on the RPC interface.

The operation request message may be a write request message for performing a write operation on the target data, which requests that the target data be written to a specified location in the target file, or may be a read request message for performing a read operation on the target data, which requests that the target data be read from a specified location in the target file.

S202. The name node module determines a key according to the file name of the target file included in the operation request message, and determines the location of the storage space of a value according to the key, where the value is the data of the target file.

It is worth noting that the data of a file stored in a KV storage device is called a value, and each value corresponds to a unique identifier called a key. The location of the storage space of the value can be located directly according to this unique identifier. For example, a large ordered structure array HashValue[m] is defined in the KV storage device, where m is an integer and each HashValue element, such as HashValue[0] or HashValue[1], is a storage space used to store the data (value) of one file. A hash function ChangeToHashValue(key) is constructed to convert the unique identifier (key) of each value into a subscript value x in HashValue[m], and the data of each file is then put into HashValue[x]. When the data in a file needs to be operated on again, the subscript value can be obtained from the key of the file by using the hash function ChangeToHashValue(key), which determines the location of the storage space of the data of the file.
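The HashValue[m] example above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the array size M, the choice of SHA-256 as the hash, and the helper names put and get are hypothetical and are not specified by the embodiment, and collisions between keys are deliberately ignored.

```python
import hashlib

M = 1024  # assumed number of storage spaces in HashValue[m]

# HashValue[m]: each element is the storage space for the data (value) of one file.
hash_value = [None] * M


def change_to_hash_value(key: int) -> int:
    """Convert a unique key into a subscript x in HashValue[m] (assumed hash)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % M


def put(key: int, value: bytes) -> None:
    """Store the data of a file in HashValue[x]; collisions are not handled here."""
    hash_value[change_to_hash_value(key)] = value


def get(key: int):
    """Locate the storage space directly from the key and return its value."""
    return hash_value[change_to_hash_value(key)]


put(1001, b"data of first file")
print(get(1001))
```

Because the subscript is computed from the key alone, no directory hierarchy has to be traversed to find the value.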

Optionally, in the embodiment of the present invention, the index node (inode) number corresponding to the file name of the target file may be used as the key of the data (value) of the target file. In this case, step S202 specifically includes: determining the inode number of the target file according to the file name, and determining the location of the storage space of the value by using the inode number as the key.

The name node holds a list of directory entries, and each directory entry consists of two parts: the file name of a file, and the inode number corresponding to that file name. Therefore, the name node module can determine the inode number corresponding to the file name of the target file by querying the directory entry list. It is worth noting that a file system does not use the file name internally but uses the inode number to identify a file. In a file system, operating on the data of a file requires first finding the inode number corresponding to the file name, then obtaining the inode information by the inode number, and finally finding the blocks where the file data is located according to the inode information.

It can be seen that the inode number is the identifier of the file in the file system. In an optional implementation of the embodiment of the present invention, the inode number is used as the key of the file data to uniquely identify the value.
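The lookup from file name to key described above can be sketched as follows. The directory entry contents, the inode numbers, and the function name inode_number_for are hypothetical and serve only to illustrate the query of the directory entry list.

```python
# Hypothetical directory entry list: each entry pairs a file name with its inode number.
directory_entries = {
    "first file": 1001,
    "second file": 1002,
}


def inode_number_for(file_name: str) -> int:
    """Query the directory entry list; the inode number found is used as the key."""
    try:
        return directory_entries[file_name]
    except KeyError:
        raise FileNotFoundError(file_name) from None


key = inode_number_for("first file")
print(key)
```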

S203. The name node module acquires target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message.

Specifically, the name node module acquires, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length in each block.

It is worth noting that the addresses of a file presented by the HDFS file system to the HDFS client are contiguous; that is, the HDFS client perceives the file as stored contiguously. The data of the file is stored in blocks in the storage space of the KV storage device. Each block is a physical storage unit, and each physical storage unit has a pointer to the next unit. The target block address information is the location information of each physical storage unit occupied by the target data in the storage space.
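One way to picture the relationship between the contiguous file addresses and the chained physical storage units is the following sketch. The Unit class, the unit numbers, and the unit size are assumptions made for illustration; the embodiment does not specify how the pointers are represented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    number: int              # number of the physical storage unit (assumed field)
    next: Optional["Unit"]   # pointer to the next unit holding the same value


def units_for_range(head: Optional[Unit], start: int, length: int,
                    unit_size: int) -> List[int]:
    """Return the numbers of the units that hold file bytes [start, start + length)."""
    if length <= 0:
        return []
    first = start // unit_size                  # index of the first unit touched
    last = (start + length - 1) // unit_size    # index of the last unit touched
    numbers = []
    unit, index = head, 0
    while unit is not None and index <= last:
        if index >= first:
            numbers.append(unit.number)
        unit = unit.next
        index += 1
    return numbers


# A value stored in three units whose numbers are not consecutive.
chain = Unit(7, Unit(3, Unit(9, None)))
print(units_for_range(chain, start=100, length=60, unit_size=64))
```

The client only supplies a start address and a length in the contiguous view; following the pointers yields the physical units that actually hold the range.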

S204. The name node module sends a response message to the HDFS client for responding to the operation request message, where the response message includes the target block address information.

The operation request message may be a message sent by the HDFS client to the getblocklocation interface of the name node module, where the parameters passed in are the file name of the target file, the start address of the target data in the target file, and the length of the target data. The parameters that the interface needs to return are the number of each logical block occupied by the target data in the HDFS file system, and the logical block offset and logical block length in each logical block.

In the embodiment of the present invention, the name node module may use the number of each block in the storage space as the number of the logical block, use the block offset as the logical block offset, and use the block length as the logical block length, and return them to the HDFS client.
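The substitution of physical block fields for logical block fields can be sketched as follows. The types PhysicalBlock and LogicalBlockEntry and the function build_response are hypothetical stand-ins; they are not the actual Hadoop response classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PhysicalBlock:
    number: int   # number of the block in the KV storage space
    offset: int   # block offset of the target data
    length: int   # block length of the target data


@dataclass(frozen=True)
class LogicalBlockEntry:
    logical_block_number: int
    logical_block_offset: int
    logical_block_length: int


def build_response(blocks: List[PhysicalBlock]) -> List[LogicalBlockEntry]:
    """Fill the logical block fields expected by the client with physical values."""
    return [LogicalBlockEntry(b.number, b.offset, b.length) for b in blocks]


response = build_response([PhysicalBlock(1, 36, 28), PhysicalBlock(2, 0, 64)])
print(response)
```

The response keeps the shape the client expects, so the client-side protocol handling does not need to change.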

S205. The data node module receives an operation instruction sent by the HDFS client, where the operation instruction includes the target block address information.

The operation instruction is used to operate on the target data to be operated on in the target file; the operation instruction is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform.

S206. The data node module performs the operation indicated by the operation instruction on the target data according to the target block address information.

In the case that the operation instruction is a write operation instruction, the data node module writes the target data to the location specified by the target block address information; in the case that the operation instruction is a read operation instruction, the data node module reads the target data from the location specified by the target block address information.
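The write and read cases of step S206 can be sketched as follows. The KVDevice class, the tiny BLOCK_SIZE, and the representation of the target block address information as (number, offset, length) tuples are assumptions for illustration only.

```python
BLOCK_SIZE = 8  # deliberately tiny block size for illustration


class KVDevice:
    """Hypothetical stand-in for the KV storage device: block number -> bytes."""

    def __init__(self):
        self.blocks = {}

    def block(self, number: int) -> bytearray:
        return self.blocks.setdefault(number, bytearray(BLOCK_SIZE))


def write_target(device: KVDevice, address_info, data: bytes) -> None:
    """Write data to the locations given as (number, offset, length) tuples."""
    if len(data) != sum(length for _, _, length in address_info):
        raise ValueError("data length does not match the block address information")
    position = 0
    for number, offset, length in address_info:
        if offset + length > BLOCK_SIZE:
            raise ValueError("address range exceeds the block size")
        device.block(number)[offset:offset + length] = data[position:position + length]
        position += length


def read_target(device: KVDevice, address_info) -> bytes:
    """Read the data back from the same (number, offset, length) locations."""
    out = bytearray()
    for number, offset, length in address_info:
        out += device.block(number)[offset:offset + length]
    return bytes(out)


device = KVDevice()
info = [(1, 5, 3), (2, 0, 8), (3, 0, 2)]
write_target(device, info, b"abcdefghijklm")
print(read_target(device, info))
```

Both operations address the blocks directly by number, offset, and length, without any path lookup in a file system.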

In the above method, the name node module communicates with the HDFS client based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform, and the data node module communicates with the HDFS client based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, which ensures support for the other Hadoop functions. On this premise, the data of HDFS files is stored as key-value pairs at the bottom layer, which improves data read and write efficiency and capacity scalability.

In order to make it easier for a person of ordinary skill in the art to understand the technical solution provided by the present invention, the following takes the case in which the operation request message is a read request message as an example.

For example, suppose the HDFS client needs to read target data with a start address of 100M (megabytes) and a data length of 128M in the file named "first file". In the embodiment of the present invention, the data reading method is shown in FIG. 3 and includes:

S301. The name node module receives a read request message sent by the HDFS client, where the read request message includes a file name, start address information, and data length information.

The file name is "first file", the start address information is 100M, and the data length information is 128M.

S302. The name node module determines an inode number of the file according to the file name.

S303. The name node module calculates a location of a storage space of data (value) of the file according to the inode number (key).

For the steps S302 and S303, reference may be made to the foregoing description of step S202, and details are not described herein again.

S304. The name node module acquires target block address information of the target data in the storage space in the KV storage device.

The size of each block in the KV storage device can be set according to user requirements. If the size of each block in the KV storage device is 64M, then, as shown in FIG. 4, the target data with a start address of 100M and a data length of 128M in the first file occupies block 1, block 2, and block 3 in the storage space, where the offset in block 1 is 36M and the length is 28M, the offset in block 2 is 0 and the length is 64M, and the offset in block 3 is 0 and the length is 36M.

Therefore, the target block address information may be list information as shown in the following table:

Number	Offset	Length
1	36M	28M
2	0	64M
3	0	36M
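The values in the table can be reproduced with a short calculation. The following sketch assumes that blocks are numbered from 0 (which is consistent with the table, since the start address of 100M then falls in block 1) and that M denotes 1024 × 1024 bytes; the function name target_block_addresses is hypothetical.

```python
MB = 1024 * 1024  # assumed meaning of "M" in the example


def target_block_addresses(start: int, length: int, block_size: int):
    """Split the range [start, start + length) into (number, offset, length) rows."""
    rows = []
    position, remaining = start, length
    while remaining > 0:
        number = position // block_size    # block index, counted from 0 (assumption)
        offset = position % block_size     # offset of the data inside that block
        chunk = min(block_size - offset, remaining)
        rows.append((number, offset, chunk))
        position += chunk
        remaining -= chunk
    return rows


for number, offset, chunk in target_block_addresses(100 * MB, 128 * MB, 64 * MB):
    print(number, offset // MB, chunk // MB)
```

The three lengths sum to the requested 128M, and only the first block has a nonzero offset because every later block is read from its beginning.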

S305. The name node module sends a response message including the target block address information to the HDFS client.

The response message is based on the ClientProtocol communication protocol between the HDFS client and the name node in the Hadoop platform. For details, refer to the description of step S204 above; they are not repeated here.

In the native Hadoop platform, the response message returned by the name node to the HDFS client includes block address information of the target data in the HDFS file system, and the block address information includes the number, the offset, and the length. However, the HDFS client does not perceive whether the block address information returned by the name node is a logical address or a physical address. Therefore, the embodiment of the present invention can return the address information of the physical storage units in the storage space of the KV storage device to the HDFS client.

S306. The data node module receives a read command sent by the HDFS client, where the read command includes the target block address information.

The read command is based on the ClientDatanodeProtocol communication protocol between the HDFS client and the data node in the Hadoop platform.

S307. The data node module reads the target data from the KV storage device according to the target block address information.

S308. The data node module sends the target data to the HDFS client.

In the above method, from the perspective of the HDFS client, the underlying data storage is still the HDFS file system, which guarantees support for the other functions of Hadoop, while at the bottom layer the data of the file is stored as key-value pairs without the HDFS client being aware of it. Data reading no longer needs to go through the complicated hierarchical mechanism of a file system, so reading efficiency is improved, and the flat structure of key-value storage also improves capacity scalability.

An embodiment of the present invention further provides a name node module 50. The name node module 50 is applied to a storage system that further includes a data node module and a key-value (KV) storage device. The name node module 50 is used to implement the corresponding steps in the foregoing method embodiments. As shown in FIG. 5, the name node module 50 includes:

The receiving unit 51 is configured to receive an operation request message sent by a distributed file system (HDFS) client, where the operation request message is used to request the block address information, in the HDFS, of the target data to be operated on in a target file, so that the target data can be operated on; the operation request message is based on the ClientProtocol communication protocol between a name node and an HDFS client in the Hadoop platform;

The determining unit 52 is configured to determine a key according to the file name of the target file included in the operation request message, and to determine the location of the storage space of a value according to the key, where the value is the data of the target file;

The obtaining unit 53 is configured to acquire target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message;

The sending unit 54 is configured to send, to the HDFS client, a response message for responding to the operation request message, where the response message includes the target block address information.

With the above name node module 50, the name node module 50 communicates with the HDFS client based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform. The upper-layer communication interface is unchanged, that is, what the name node presents to the HDFS client is still the HDFS file system, while the data of the HDFS file is stored at the bottom layer as key-value storage, which improves the efficiency of reading and writing data and the capacity scalability.

Optionally, the determining unit 52 is specifically configured to: determine the index node (inode) number of the target file according to the file name; and determine the location of the storage space of the value by using the inode number as the key.
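As a rough illustration of using the inode number as the key, the following Python sketch keeps a file-name-to-inode table and derives a key from it. The `Namespace` class, its method names, and the 8-byte big-endian key encoding are assumptions made for the example; the embodiment does not specify them.

```python
from typing import Dict


class Namespace:
    """Hypothetical in-memory namespace: file name -> inode number."""

    def __init__(self) -> None:
        self._inode_by_name: Dict[str, int] = {}
        self._next_inode = 1

    def create(self, file_name: str) -> int:
        """Assign a fresh inode number to a new file, or return the existing one."""
        if file_name not in self._inode_by_name:
            self._inode_by_name[file_name] = self._next_inode
            self._next_inode += 1
        return self._inode_by_name[file_name]

    def key_for(self, file_name: str) -> bytes:
        """Determine the key for a file: its inode number as 8 big-endian bytes (assumed encoding)."""
        inode = self._inode_by_name[file_name]  # raises KeyError if the file is unknown
        return inode.to_bytes(8, "big")
```

The key returned by `key_for` would then be handed to the KV storage device to locate the storage space of the value.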

Optionally, the obtaining unit 53 is configured to: obtain, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length within each block.
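The calculation performed by the obtaining unit can be sketched as follows. This is a minimal sketch that assumes fixed-size blocks of 64M numbered from 1; the names `BlockAddress` and `split_into_blocks` are hypothetical. Under those assumptions, a request starting at 36M with a length of 128M yields the same three rows as the example table above (1/36M/28M, 2/0/64M, 3/0/36M).

```python
from typing import List, NamedTuple

MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # assumed fixed block size


class BlockAddress(NamedTuple):
    number: int  # block number within the storage space (1-based, an assumption)
    offset: int  # offset of the target data inside the block
    length: int  # number of bytes of target data inside the block


def split_into_blocks(start: int, length: int, block_size: int = BLOCK_SIZE) -> List[BlockAddress]:
    """Map a (start address, data length) pair onto per-block (number, offset, length) triples."""
    if start < 0 or length < 0 or block_size <= 0:
        raise ValueError("start and length must be non-negative, block_size positive")
    blocks = []
    pos, end = start, start + length
    while pos < end:
        offset = pos % block_size
        chunk = min(block_size - offset, end - pos)  # stop at the block boundary or at the end of the data
        blocks.append(BlockAddress(pos // block_size + 1, offset, chunk))
        pos += chunk
    return blocks
```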

Optionally, the block address information of the target data in the HDFS includes the number of each logical block occupied by the target data in the HDFS, and the logical block offset and logical block length within each logical block. The sending unit 54 is specifically configured to: send the number of the block as the number of the logical block, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.

It is to be noted that the unit division of the name node module is only a logical function division. In actual implementation, there may be another division manner; for example, the determining unit 52 and the obtaining unit 53 may be combined into one processing unit. Moreover, each of the above functional units may be physically implemented in multiple ways.

In addition, it should be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of each unit of the name node module described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

The embodiment of the present invention further provides a data node module 60. The data node module 60 is applied to a storage system that further includes a name node module and a key-value (KV) storage device, and the data node module 60 is used to implement the corresponding steps in the foregoing method embodiment. The data node module 60 includes:

The receiving unit 61 is configured to receive an operation instruction sent by a distributed file system (HDFS) client, where the operation instruction is used to operate on the target data to be operated on in a target file; the operation instruction is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform; the operation instruction includes the block address information at which the target data is stored in the KV storage device;

The operation unit 62 is configured to perform an operation indicated by the operation instruction on the target data according to the block address information.
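A minimal sketch of such an operation unit is shown below. It assumes that the operation instruction carries an operation name (`"read"` or `"write"`) plus one (number, offset, length) triple, and it uses an in-memory dictionary in place of the KV storage device; the `OperationUnit` class and the instruction layout are hypothetical.

```python
from typing import Dict


class OperationUnit:
    """Hypothetical operation unit working on numbered blocks held in memory."""

    def __init__(self) -> None:
        self._blocks: Dict[int, bytearray] = {}  # stand-in for the KV storage device

    def execute(self, op: str, number: int, offset: int, length: int, payload: bytes = b"") -> bytes:
        """Perform the operation indicated by the instruction on the addressed data."""
        if offset < 0 or length < 0:
            raise ValueError("offset and length must be non-negative")
        if op == "read":
            block = self._blocks.get(number, bytearray())
            return bytes(block[offset:offset + length])
        if op == "write":
            if len(payload) != length:
                raise ValueError("payload size must equal the block length in the instruction")
            block = self._blocks.setdefault(number, bytearray())
            end = offset + length
            if len(block) < end:
                block.extend(bytes(end - len(block)))  # zero-fill up to the write range
            block[offset:end] = payload
            return b""
        raise ValueError(f"unsupported operation: {op}")
```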

With the data node module 60, the data node module 60 communicates based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, and stores the data in the KV storage device as key-value storage without changing the upper-layer communication interface, which improves data read and write efficiency and capacity scalability.

It should be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of each unit of the data node module described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

The embodiment of the present invention further provides a server. As shown in FIG. 7, the server includes the name node module 50 shown in FIG. 5 and/or the data node module 60 shown in FIG. 6. For details, refer to the descriptions of FIG. 5 and FIG. 6 above, which are not repeated here. That is to say, the name node module and the data node module can be flexibly deployed on computers.

Another embodiment of the present invention provides a server 80. As shown in FIG. 8, the server 80 includes:

a processor 81, a first interface 82, a second interface 83, and a communication bus 84; the processor 81, the first interface 82, and the second interface 83 communicate via the communication bus 84; the first interface 82 is for communicating with a distributed file system (HDFS) client, and the second interface 83 is for communicating with a key-value (KV) storage device; the server runs name node software, and by means of the name node software the server performs the following operations:

Receiving an operation request message sent by the HDFS client, where the operation request message is used to request the block address information, in the HDFS, of the target data to be operated on in a target file, so that the target data can be operated on; the operation request message is based on the ClientProtocol communication protocol between the name node and the HDFS client in the Hadoop platform;

Determining a key according to the file name of the target file included in the operation request message, and determining the location of the storage space of a value according to the key, where the value is the data of the target file;

Obtaining target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message;

Sending a response message for responding to the operation request message to the HDFS client, the response message including the target block address information.

Optionally, the determining a key according to the file name of the target file included in the operation request message includes: determining the index node (inode) number of the target file according to the file name; and determining the location of the storage space of the value by using the inode number as the key.

Optionally, the obtaining the target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message includes: obtaining, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length within each block.

Optionally, the block address information of the target data in the HDFS includes the number of each logical block occupied by the target data in the HDFS, and the logical block offset and logical block length within each logical block. The sending, to the HDFS client, a response message for responding to the operation request message includes: sending the number of the block as the number of the logical block, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.

In a possible implementation manner of the embodiment of the present invention, the server 80 may also run data node software, and by means of the data node software the server 80 performs the following: receiving an operation instruction sent by the HDFS client, where the operation instruction is used to operate on the target data, is based on the ClientDatanodeProtocol communication protocol between the data node and the HDFS client in the Hadoop platform, and includes the target block address information; and performing, according to the target block address information, the operation indicated by the operation instruction on the target data.

The server 80 may also include other components not shown in the figure, such as a storage medium for storing program instructions. In addition, it should be understood by those skilled in the art that the operations performed by the processor 81 may be completed in cooperation with other components. For convenience of description, the embodiments of the present invention uniformly describe these operations as being performed by the processor 81.

The processor 81 in the embodiment of the present invention may be a CPU (Central Processing Unit). In addition, in order to save the computing resources of the CPU, the processor 81 may instead be an FPGA (Field Programmable Gate Array) or other hardware, or the processor 81 may be a combination of a CPU and an FPGA or other hardware, in which case the FPGA or other hardware and the CPU each perform part of the operations in the embodiments of the present invention.

The embodiment of the present invention further provides a storage system 90. As shown in FIG. 9, the storage system 90 includes:

The name node module 50, the data node module 60, and the KV storage device 91, where the name node module 50 is connected to the KV storage device 91, and the data node module 60 is connected to the KV storage device 91.

Specifically, as shown in FIG. 9, the name node module 50 is connected to the HDFS client, and the data node module 60 is connected to the HDFS client. The name node module 50 includes an INTF_Namenode interface for providing an RPC interface to the HDFS client, through which the name node module 50 can receive the metadata processing or management commands sent by the HDFS client. The data node module 60 includes an INTF_Datanode interface for providing an RPC interface to the HDFS client, through which the data node module 60 receives the data processing commands sent by the HDFS client. The KV storage device 91 provides a standard key-value interface, INTF_KV, for the name node module 50 and the data node module 60.
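The three interfaces can be pictured as the Python protocol classes below. Only the interface names INTF_Namenode, INTF_Datanode, and INTF_KV come from the description; every method name and signature here is an assumption chosen for illustration and is not the actual method set of ClientProtocol, ClientDatanodeProtocol, or any particular KV product.

```python
from typing import Dict, List, Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class IntfKV(Protocol):
    """Assumed shape of INTF_KV: a plain get/put/delete key-value interface."""

    def get(self, key: bytes) -> Optional[bytes]: ...
    def put(self, key: bytes, value: bytes) -> None: ...
    def delete(self, key: bytes) -> None: ...


@runtime_checkable
class IntfNamenode(Protocol):
    """Assumed shape of INTF_Namenode: metadata requests from the HDFS client."""

    def get_block_locations(self, path: str, start: int, length: int) -> List[Tuple[int, int, int]]: ...


@runtime_checkable
class IntfDatanode(Protocol):
    """Assumed shape of INTF_Datanode: data requests from the HDFS client."""

    def read_block(self, number: int, offset: int, length: int) -> bytes: ...


class InMemoryKV:
    """Toy stand-in for the KV storage device 91, satisfying IntfKV structurally."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)
```

Because both modules depend only on the KV interface in this sketch, either module could be given any KV backend that offers the same three operations.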

For the name node module 50, refer to the specific description of FIG. 5 above. For the data node module 60, refer to the specific description of FIG. 6 above. Details are not described herein again.

In a possible implementation manner of the embodiment of the present invention, the name node module 50 and the data node module 60 may be deployed on the same server at the same time, or may be separately deployed on different servers.

In addition, it should be noted that the storage system 90 shown in FIG. 9 includes only one name node module and one data node module. In a specific implementation, the number of data node modules and the number of name node modules included in the storage system may be set according to actual requirements. When there are multiple name node modules and multiple data node modules, the HDFS client can first obtain the address of one name node module by means of DNS (Domain Name System) polling when connecting to a name node module. From the addresses of multiple data node modules returned by the name node module, the HDFS client can select the nearest data node module to connect to.
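The two selection steps can be sketched as follows. The sketch simulates DNS polling with a simple round-robin iterator rather than querying a real DNS server, and it assumes that "nearest" is decided by some caller-supplied distance measure; the function names and the example host names are hypothetical.

```python
from itertools import cycle
from typing import Callable, Iterable, Iterator, Sequence


def round_robin(addresses: Sequence[str]) -> Iterator[str]:
    """Simulate DNS polling: hand out name node addresses in rotation."""
    if not addresses:
        raise ValueError("at least one name node address is required")
    return cycle(addresses)


def nearest(candidates: Iterable[str], distance: Callable[[str], int]) -> str:
    """Pick the data node address with the smallest distance (distance measure is an assumption)."""
    return min(candidates, key=distance)
```

For example, with name nodes `["nn1", "nn2"]` successive connections would alternate between the two addresses, and with distances `{"dn1": 4, "dn2": 1, "dn3": 2}` the client would connect to `dn2`.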

In the several embodiments provided herein, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division of the units is only a logical function division, and in actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be in an electrical, mechanical, or other form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.

The above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform some of the steps of the methods described in the various embodiments of the present invention. The foregoing storage medium includes a USB flash drive, a removable hard disk, a RAM (Random Access Memory), a magnetic disk, an optical disk, and other media that can store data.

While the preferred embodiment of the invention has been described, it will be understood that Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and the modifications and

It is apparent that those skilled in the art can make various modifications and variations to the invention without departing from the spirit and scope of the invention. Thus, it is intended that the present invention cover the modifications and modifications of the invention

Claims

1. A data operation method, wherein the method is applied to a storage system, and the storage system includes a name node module, a data node module, and a key-value (KV) storage device; the method includes:

The name node module receives an operation request message sent by a distributed file system (HDFS) client, where the operation request message is used to request the block address information, in the HDFS, of the target data to be operated on in a target file, so that the target data can be operated on; the operation request message is based on the ClientProtocol communication protocol between a name node and an HDFS client in the Hadoop platform;

Determining a key according to the file name of the target file included in the operation request message, and determining the location of the storage space of a value according to the key, where the value is the data of the target file;

Obtaining target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message;

Sending a response message for responding to the operation request message to the HDFS client, the response message including the target block address information.
2. The method according to claim 1, wherein the determining a key according to the file name of the target file included in the operation request message comprises:

Determining an inode number of the target file according to the file name;

The inode number is used as the key to determine the location of the storage space of the value.
3. The method according to claim 1 or 2, wherein the obtaining the target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message includes:

Obtaining, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length within each block.
4. The method according to claim 3, wherein the block address information of the target data in the HDFS includes the number of each logical block occupied by the target data in the HDFS, and the logical block offset and logical block length within each logical block; the sending, to the HDFS client, a response message for responding to the operation request message includes:

Sending the number of the block as the number of the logical block, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.
5. A data operation method, wherein the method is applied to a storage system, and the storage system includes a name node module, a data node module, and a key-value (KV) storage device; the method includes:

The data node module receives an operation instruction sent by a distributed file system (HDFS) client, where the operation instruction is used to operate on the target data to be operated on in a target file; the operation instruction is based on the ClientDatanodeProtocol communication protocol between a data node and an HDFS client in the Hadoop platform; the operation instruction includes the block address information at which the target data is stored in the KV storage device;

Performing an operation indicated by the operation instruction on the target data according to the block address information.
6. A name node module, wherein the name node module is applied to a storage system, and the storage system further includes a data node module and a key-value (KV) storage device; the name node module includes:

a receiving unit, configured to receive an operation request message sent by a distributed file system (HDFS) client, where the operation request message is used to request the block address information, in the HDFS, of the target data to be operated on in a target file, so that the target data can be operated on; the operation request message is based on the ClientProtocol communication protocol between a name node and an HDFS client in the Hadoop platform;

a determining unit, configured to determine a key according to the file name of the target file included in the operation request message, and to determine the location of the storage space of a value according to the key, where the value is the data of the target file;

an obtaining unit, configured to acquire target block address information of the target data in the storage space according to start address information and data length information in the operation request message;

and a sending unit, configured to send, to the HDFS client, a response message for responding to the operation request message, where the response message includes the target block address information.
7. The name node module according to claim 6, wherein the determining unit is specifically configured to:

Determining an inode number of the target file according to the file name;

The inode number is used as the key to determine the location of the storage space of the value.
8. The name node module according to claim 6 or 7, wherein the obtaining unit is specifically configured to:

Obtain, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length within each block.
9. The name node module according to claim 8, wherein the block address information of the target data in the HDFS includes the number of each logical block occupied by the target data in the HDFS, and the logical block offset and logical block length within each logical block; the sending unit is specifically configured to:

Send the number of the block as the number of the logical block, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.
10. A data node module, wherein the data node module is applied to a storage system, and the storage system further includes a name node module and a key-value (KV) storage device; the data node module includes:

a receiving unit, configured to receive an operation instruction sent by a distributed file system (HDFS) client, where the operation instruction is used to operate on the target data to be operated on in a target file; the operation instruction is based on the ClientDatanodeProtocol communication protocol between a data node and an HDFS client in the Hadoop platform; the operation instruction includes the block address information at which the target data is stored in the KV storage device;

And an operation unit, configured to perform an operation indicated by the operation instruction on the target data according to the block address information.
11. A server, characterized in that the server comprises a name node module according to any one of claims 6 to 9, and/or a data node module according to claim 10.
12. A server, comprising: a processor, a first interface, a second interface, and a communication bus; wherein the processor, the first interface, and the second interface communicate via the communication bus; the first interface is for communicating with a distributed file system (HDFS) client, and the second interface is for communicating with a key-value (KV) storage device;

The server runs name node software, and by means of the name node software the server performs:

Receiving an operation request message sent by the HDFS client, where the operation request message is used to request the block address information, in the HDFS, of the target data to be operated on in a target file, so that the target data can be operated on; the operation request message is based on the ClientProtocol communication protocol between a name node and an HDFS client in the Hadoop platform;

Determining a key according to the file name of the target file included in the operation request message, and determining the location of the storage space of a value according to the key, where the value is the data of the target file;

Obtaining target block address information of the target data in the storage space according to the start address information and the data length information in the operation request message;

Sending a response message for responding to the operation request message to the HDFS client, the response message including the target block address information.
13. The server according to claim 12, wherein by means of the name node software the server performs:

Determining an inode number of the target file according to the file name;

The inode number is used as the key to determine the location of the storage space of the value.
14. The server according to claim 12 or 13, wherein by means of the name node software the server performs:

Obtaining, according to the start address information and the data length information, the number of each block occupied by the target data in the storage space, and the block offset and block length within each block.
15. The server according to claim 14, wherein the block address information of the target data in the HDFS includes the number of each logical block occupied by the target data in the HDFS, and the logical block offset and logical block length within each logical block; by means of the name node software the server performs:

Sending the number of the block as the number of the logical block, the block offset as the logical block offset, and the block length as the logical block length to the HDFS client.
16. The server according to any one of claims 12 to 15, wherein the server runs data node software, and by means of the data node software the server performs:

Receiving an operation instruction sent by the HDFS client, where the operation instruction is used to operate on the target data; the operation instruction is based on the ClientDatanodeProtocol communication protocol between a data node and an HDFS client in the Hadoop platform; the operation instruction includes the target block address information;

Performing an operation indicated by the operation instruction on the target data according to the target block address information.
17. A storage system, characterized in that the storage system comprises a name node module according to any one of claims 6 to 9, a data node module according to claim 10, and a key-value (KV) storage device, wherein the name node module is connected to the KV storage device, and the data node module is connected to the KV storage device.
18. The storage system according to claim 17, wherein the name node module and the data node module are deployed on the same server.