CN114201446A - Method and system for realizing HDFS (Hadoop distributed File System) remote storage mounting - Google Patents

Info

Publication number
CN114201446A
Authority
CN
China
Prior art keywords
information
metadata
remote storage
hdfs
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111389173.3A
Other languages
Chinese (zh)
Other versions
CN114201446B (en
Inventor
尹明俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202111389173.3A priority Critical patent/CN114201446B/en
Publication of CN114201446A publication Critical patent/CN114201446A/en
Application granted granted Critical
Publication of CN114201446B publication Critical patent/CN114201446B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • G06F16/119Details of migration of file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647Migration mechanisms

Abstract

The invention discloses a method and a system for realizing HDFS remote storage mounting, wherein the method comprises the following steps: receiving a request of data remote storage mount sent from a client, pulling metadata to be mounted to the remote storage, and establishing mapping relation information between the remote storage and the HDFS; and forwarding and persisting the mapping relation information to a database, reading the mapping relation information in the database according to a request for reading the remote storage metadata sent by the client, and returning the block information to be read to the client. The method realizes the dynamic mounting of the remote storage on the HDFS layer, and effectively improves the flexibility of reading the remote storage; the metadata information stored remotely is automatically constructed by utilizing the framework of the mounting management module, so that the operation and maintenance difficulty is reduced; and a metadata forwarding form and a data caching mechanism are adopted, so that the access pressure of the NameNode is reduced while remote storage is increased, and the reading performance of the HDFS is improved.

Description

Method and system for realizing HDFS (Hadoop distributed File System) remote storage mounting
Technical Field
The invention belongs to the technical field of data storage, and particularly relates to a method for realizing HDFS (Hadoop distributed File System) remote storage mounting.
Background
In a large-scale HDFS cluster, storage capacity becomes a problem: as the data volume grows, more storage space is required. Part of the data in HDFS can therefore be stored in external storage to relieve the pressure on HDFS storage space. Moreover, for reasons of storage cost and flexibility, migrating HDFS data to cloud storage is becoming increasingly popular. For example, migrating HDFS cold data (data from long ago that is accessed infrequently, whose storage cost should be minimized, but which must remain accessible at any time) to Amazon S3 (Amazon Simple Storage Service) can greatly reduce storage cost. However, after data is migrated to other storage, the upper-layer services must also change accordingly to adapt to the change in the underlying storage.
At present, the Provided Storage technology (which allows HDFS to map the data of an external file system into HDFS, so that the external file system can be addressed directly through HDFS and remote external data can be accessed) mainly works by constructing a metadata mapping of the external storage and maintaining it in the NameNode in the form of an AliasMap (essentially a LevelDB that runs as an independent service in the NameNode and stores the metadata related to the remote storage), so that externally stored data can be accessed directly through an HDFS client. However, the following problems remain:
firstly, when external storage is added, the external storage path must be configured in the DataNode service, and the storage can be used only after the HDFS service is restarted with the new configuration, which is unsuitable for production environments;
secondly, generating the metadata of externally stored data depends on a manually operated fsimage tool, which is complex to use and lacks flexibility for long-running clusters;
thirdly, compared with local HDFS reads, reads from external storage perform poorly, so reading data takes a long time and is inefficient;
and fourthly, when the externally stored data is no longer used, the relevant HDFS configuration must be modified and the service restarted, which increases the operation and maintenance difficulty and risk of the HDFS cluster.
Therefore, research on an efficient, flexible and easy-to-use HDFS remote storage mounting technology is of great significance, and realizing dynamic mounting of external storage in HDFS, so that data can be transferred flexibly between HDFS and external storage, is a technical problem that currently needs to be solved.
Disclosure of Invention
The invention aims to provide a method and a system for realizing HDFS (Hadoop distributed File System) remote storage mounting, so as to realize dynamic mounting of external storage in the HDFS and allow data to be transferred flexibly between the HDFS and the external storage.
To this end, in a first aspect of the present invention, a method for realizing HDFS remote storage mounting is provided, where the method includes:
receiving a request of data remote storage mount sent from a client, pulling metadata to be mounted to the remote storage, and establishing mapping relation information between the remote storage and the HDFS;
and persisting the mapping relation information to a database, reading the mapping relation information in the database according to a request for reading the metadata stored at the remote end sent by the client, and returning the block information to be read to the client.
Further, the method further comprises: and caching the metadata stored at the remote end to the DataNode node in a data copy mode for local storage.
Further, after the request initiated by the client is received, an authentication credential for the remote storage is obtained based on the authentication information contained in the request, and the authentication credential is used in the process of pulling the metadata information and forwarding the mapping relation information.
Further, pulling the metadata to be mounted from the remote storage and establishing the mapping relation information between the remote storage and the HDFS includes:
using the authentication credential to recursively acquire the metadata information of all files and directories through the URI of the remote storage;
and aggregating the acquired metadata information, constructing the URI of the remote storage and the HDFS mount directory into a first mapping pair, constructing the files of the remote storage and their block information into a second mapping pair, constructing the files and directories of the remote storage and their user group information and permission information into a third mapping pair, and forwarding and persisting the three kinds of mapping relation information to the corresponding databases.
Further, the information of the third mapping pair is sent to a Namespace database for storage; the information of the first mapping pair and the second mapping pair is sent to an AliasMap database for storage, and the information forwarded to the AliasMap database further includes the authentication credential. After the data is successfully forwarded to the databases, the mounting operation is complete.
Further, the request sent by the client includes creation, deletion and inquiry.
In a second aspect of the present invention, a system for implementing remote storage mount of an HDFS is provided, where the system includes:
the mount management module receives a data remote storage mount request sent from a client, pulls metadata to be mounted to the remote storage, establishes mapping relation information between the remote storage and the HDFS, and sends the mapping relation information to the metadata forwarding module;
and the metadata forwarding module is used for persisting the received mapping relation information to a database, acquiring the block information to be read by reading the mapping relation information in the database according to a request for reading the remote storage metadata sent by the client, and returning the block information to the client.
The system further comprises a data caching module, which caches the metadata stored at the remote end to the DataNode as a data copy for local storage, and serves it from the DataNode's cache when the client requests the same data again.
Further, the mount management module includes:
the authentication module is used for receiving a mount request from a client, parsing the authentication information contained in the mount request to obtain an authentication credential for the remote storage, and sending the obtained authentication credential to the metadata forwarding module and the remote storage access module;
the remote storage access module is used for pulling the metadata information to be mounted from the remote storage according to the obtained authentication credential and sending the metadata information to the metadata processing module;
and the metadata processing module is used for aggregating the acquired metadata information, constructing the URI of the remote storage and the HDFS mount directory into a first mapping pair, constructing the files of the remote storage and their block information into a second mapping pair, constructing the files and directories of the remote storage and their user group information and permission information into a third mapping pair, and sending the three kinds of mapping relation information to the metadata forwarding module.
Further, the metadata forwarding module includes: a first forwarding module and a second forwarding module, wherein,
the first forwarding module is used for sending the received information of the third mapping pair to a Namespace database;
and the second forwarding module is used for sending the received information of the first mapping pair and the second mapping pair to the AliasMap database.
Compared with the prior art, the method and the system for realizing the remote storage mount of the HDFS provided by the invention have the advantages that the mount management module is designed, the metadata information is pulled according to the authentication certificate, the metadata processing module is designed, the remote storage metadata conversion and construction are realized, and the flexibility of remote storage reading is improved; by designing a metadata forwarding module, shunting remote storage read requests, and persisting metadata information, the creation, query and deletion of mount metadata are realized, and the access pressure of a NameNode is reduced; by designing the data cache module, the remote data is stored in the local DataNode in a copy mode, so that the reading performance can be greatly optimized.
Drawings
Fig. 1 is a schematic flowchart of a method for implementing HDFS remote storage mount according to a first embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a system for implementing HDFS remote storage mount according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a mount management module in the second embodiment of the present invention.
Detailed Description
The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby. Certain terms are used throughout the description and claims to refer to particular components. As one skilled in the art will appreciate, manufacturers may refer to a component by different names. The present specification and claims do not intend to distinguish between components that differ in name but not function. The following description is of the preferred embodiment for carrying out the invention, and is made for the purpose of illustrating the general principles of the invention and not for the purpose of limiting the scope of the invention. The scope of the present invention is defined by the appended claims.
The invention is described in further detail below with reference to the figures and specific embodiments.
HDFS (Hadoop Distributed File System) is the distributed file system of Hadoop and has the following characteristics: 1) an HDFS cluster has two main roles: NameNode and DataNode; 2) the NameNode manages the metadata of the whole file system, performs file and directory operations on the file namespace, such as opening, closing and renaming, and determines the mapping between blocks and data nodes, such as the mapping between files and DataNodes and the mapping between files and file blocks; 3) the DataNode manages the user's file data blocks and serves read and write requests from file system clients; it also creates, deletes and replicates blocks on instruction from the NameNode; 4) a file is cut into blocks of a fixed size (blocksize), which are stored in a distributed manner on multiple DataNodes; 5) each file block can have multiple copies stored on different DataNodes; 6) each DataNode periodically reports the block information it stores to the NameNode, and the NameNode is responsible for maintaining the number of copies of each file; 7) the internal working mechanism of HDFS is transparent to the client, which accesses HDFS by sending requests to the NameNode.
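Characteristic 4) above can be illustrated with a minimal Python sketch. The function name `split_into_blocks` and the byte sizes are hypothetical and chosen only for illustration; real HDFS block sizes are configured per cluster and are far larger.

```python
from typing import List, Tuple


def split_into_blocks(file_size: int, block_size: int) -> List[Tuple[int, int]]:
    """Return an (offset, length) pair for each fixed-size block of a file."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    offset = 0
    while offset < file_size:
        # Every block is block_size bytes long except possibly the last one.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


# A 300-byte file with a 128-byte block size: two full blocks and a short tail.
blocks = split_into_blocks(300, 128)
```

The same (offset, length) shape reappears later in the description as the block information stored for remote files.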
Example one
Fig. 1 is a schematic flowchart of a method for implementing HDFS remote storage mount according to an embodiment of the present invention, where the method includes:
step S1, receiving a request for mounting the data remote storage sent from the client, pulling the metadata to be mounted to the remote storage, and establishing and forming the mapping relation information between the remote storage and the HDFS.
The HDFS client initiates a request instruction for mounting the remote storage to the HDFS directory through an administrator instruction, and the request can be creation, deletion or inquiry of mounting.
After the mount request initiated by the HDFS client is received, the metadata to be mounted is pulled from the remote storage; at the same time, an authentication credential for the remote storage is obtained based on the authentication information contained in the request and is used when pulling the metadata information and when handling the client's requests to read metadata. Specifically, the authentication information contained in the mount request is first parsed; this information comes from the authentication parameters entered in the client's administrator instruction. An authentication request is then sent to the remote storage with this information to obtain the authentication credential returned by the remote storage. Once the credential has been obtained, subsequent requests from the client can access the remote storage without carrying the authentication information. The credential is also used for the subsequent metadata pulls and for the client's metadata read requests, so that the metadata to be mounted can be obtained quickly without re-authenticating against the remote storage, and no authentication is needed when the HDFS client initiates a read request, which improves the efficiency of data transmission.
When the metadata to be mounted is pulled from the remote storage, the metadata information of all files and directories, specifically including user group information, permission information, block information and the like, is first obtained recursively through the URI of the remote storage, and the relevant metadata information is then aggregated to form the mapping relation information between the remote storage and the HDFS. Specifically, the URI of the remote storage and the HDFS mount directory are constructed into a first mapping pair, the files of the remote storage and their block information are constructed into a second mapping pair, the files and directories of the remote storage and their user group information and permission information are constructed into a third mapping pair, and the three kinds of mapping relation information are forwarded and persisted to the corresponding databases.
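The authenticate-once, reuse-afterwards behaviour described above can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: `Credential`, `FakeRemoteStore`, `AuthModule`, the key/secret pair and the URI `s3a://bucket/cold` are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    token: str


class FakeRemoteStore:
    """Hypothetical remote storage that counts authentication requests."""

    def __init__(self, valid_keys):
        self.valid_keys = dict(valid_keys)
        self.auth_calls = 0

    def authenticate(self, access_key, secret_key):
        self.auth_calls += 1
        if self.valid_keys.get(access_key) != secret_key:
            raise PermissionError("authentication rejected by remote storage")
        return Credential(token="token-for-" + access_key)


class AuthModule:
    """Obtains a credential on the mount request and reuses it afterwards."""

    def __init__(self, store):
        self.store = store
        self._credentials = {}  # remote URI -> Credential

    def credential_for(self, remote_uri, auth_info=None):
        if remote_uri not in self._credentials:
            if auth_info is None:
                raise PermissionError("no cached credential and no auth info")
            self._credentials[remote_uri] = self.store.authenticate(*auth_info)
        return self._credentials[remote_uri]


store = FakeRemoteStore({"admin": "s3cret"})
auth = AuthModule(store)
# Mount request: carries the authentication parameters from the admin command.
first = auth.credential_for("s3a://bucket/cold", ("admin", "s3cret"))
# Later metadata pull or read request: no authentication information needed.
second = auth.credential_for("s3a://bucket/cold")
```

In this sketch the remote storage is contacted for authentication only once per mounted URI; every later request is served from the stored credential.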
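The construction of the three mapping pairs can be sketched in Python as below. Everything here is an illustrative assumption: `REMOTE_LISTING` is a hypothetical in-memory stand-in for what a remote listing call might return, and `walk`, `build_mappings`, the paths and the 128-byte block size are invented for the example.

```python
# Hypothetical remote listing: path -> metadata (directories list their children).
REMOTE_LISTING = {
    "/": {"is_dir": True, "owner": "etl", "group": "data", "mode": 0o755,
          "children": ["/logs"]},
    "/logs": {"is_dir": True, "owner": "etl", "group": "data", "mode": 0o750,
              "children": ["/logs/a.log"]},
    "/logs/a.log": {"is_dir": False, "owner": "etl", "group": "data",
                    "mode": 0o640, "size": 300},
}


def walk(path):
    """Recursively yield (path, metadata) for a directory tree."""
    entry = REMOTE_LISTING[path]
    yield path, entry
    for child in entry.get("children", []):
        yield from walk(child)


def build_mappings(remote_uri, mount_dir, block_size=128):
    mount_pair = {mount_dir: remote_uri}  # first pair, keyed by mount directory
    block_pairs = {}                      # second pair: file -> block info
    namespace_pairs = {}                  # third pair: path -> owner/group/mode
    for path, meta in walk("/"):
        hdfs_path = mount_dir if path == "/" else mount_dir.rstrip("/") + path
        namespace_pairs[hdfs_path] = (meta["owner"], meta["group"], meta["mode"])
        if not meta["is_dir"]:
            remote_path = remote_uri.rstrip("/") + path
            block_pairs[hdfs_path] = [
                (remote_path, off, min(block_size, meta["size"] - off))
                for off in range(0, meta["size"], block_size)
            ]
    return mount_pair, block_pairs, namespace_pairs


mount, blocks, ns = build_mappings("s3a://bucket/cold", "/mnt/cold")
```

Directories contribute only to the third pair, while each file additionally receives a list of (path, offset, length) entries for the second pair.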
And step S2, forwarding and persisting the formed mapping relationship information to a database, and according to a request for reading remote storage metadata sent by the client, obtaining block information to be read by reading the mapping relationship information in the database, and returning the block information to the client.
The received information of the third mapping pair, namely the mapping between the files and directories of the remote storage and their user group information and permission information, is sent to the Namespace database for storage; the received information of the first mapping pair and the second mapping pair, namely the URI of the remote storage with the HDFS mount directory, and the files of the remote storage with their block information (which can take the form (path, offset, length)), is sent to the AliasMap database for storage. After the mounting is completed, on later accesses the client can obtain the permission information of the corresponding user directly through the Namespace database, access the remote storage directly through the first mapping pair in the AliasMap database, and read the data through the second mapping pair. In this way the distribution of information related to the remote storage is separated from the conventional read and write information, and normal read and write performance is not affected.
Data persistence is the general term for converting an in-memory data model into a storage model, and for converting the storage model back into an in-memory data model. The data model can be any data structure or object model, and the storage model can be a relational model, XML, a binary stream, and so on; because the object model and the relational model are the most widely used, data persistence in the usual sense means mapping an object model to a relational database. "Persistent" is the opposite of "temporary": data in a computer is generally stored in one of two places, and memory is temporary storage whose contents are lost when power is removed, so data that must be used repeatedly needs to be persisted. Persistence technology encapsulates the details of data access and provides an object-oriented API for most business logic; it can reduce the number of database accesses and increase the execution speed of the application; its code is highly reusable and covers most database operations; and it is loosely coupled, so that persistence does not depend on the underlying database or the upper-layer business logic, and changing the database requires modifying only the configuration file rather than the code.
The Namespace in this embodiment, also rendered as "name space", is a form of code organization used by the languages in VS.NET: code is classified by namespace to distinguish different code functions, and the namespace is also part of the full name of every class in VS.NET.
The AliasMap in this embodiment is essentially a LevelDB. LevelDB is a persistent, single-node key-value (KV) database open-sourced by Google with very high random-write and sequential read/write performance, which makes it well suited to scenarios with many writes and few queries. LevelDB uses the LSM (Log-Structured Merge) strategy: the LSM-tree defers and batches index changes and migrates updates to disk efficiently in a manner similar to merge sort, reducing the cost of index insertion.
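A minimal sketch of this routing follows, with plain dictionaries standing in for the Namespace and AliasMap databases. The class `MetadataForwarder`, its method names and the simplified permission check (only the owner-read and other-read bits are consulted) are assumptions made for illustration; the source does not specify how permissions are evaluated.

```python
class MetadataForwarder:
    """Routes mapping pairs to two stores and answers block lookups."""

    def __init__(self):
        self.namespace_db = {}  # stand-in for the Namespace database
        self.alias_map = {}     # stand-in for the AliasMap database

    def persist(self, mount_dir, remote_uri, block_pairs, namespace_pairs):
        # First and second mapping pairs go to the AliasMap stand-in.
        self.alias_map[("mount", mount_dir)] = remote_uri
        for path, blocks in block_pairs.items():
            self.alias_map[("blocks", path)] = list(blocks)
        # Third mapping pair goes to the Namespace stand-in.
        self.namespace_db.update(namespace_pairs)

    def read_blocks(self, path, user):
        owner, _group, mode = self.namespace_db[path]
        # Simplified check (assumption): owner-read bit for the owner,
        # other-read bit for everyone else.
        readable = bool(mode & 0o400) if user == owner else bool(mode & 0o004)
        if not readable:
            raise PermissionError(user + " may not read " + path)
        return self.alias_map[("blocks", path)]


fwd = MetadataForwarder()
fwd.persist(
    "/mnt/cold",
    "s3a://bucket/cold",
    {"/mnt/cold/a.log": [("s3a://bucket/cold/a.log", 0, 128)]},
    {"/mnt/cold/a.log": ("etl", "data", 0o640)},
)
blocks = fwd.read_blocks("/mnt/cold/a.log", user="etl")
```

Keeping the two stores separate means a permission lookup never touches the block mappings, which mirrors the separation of remote-storage information from ordinary read and write traffic described above.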
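The deferred, batched, merge-based update path of an LSM structure can be illustrated with a toy Python class. This is a teaching sketch only and not how LevelDB is implemented: `TinyLSM` keeps its sorted runs in memory instead of on disk and has no write-ahead log, levels, or deletion markers.

```python
import bisect


class TinyLSM:
    """Toy log-structured merge store: a memtable plus sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # newest first; each run is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch the in-memory table; sorting is deferred to flush.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:  # newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        merged = {}
        for run in reversed(self.runs):  # oldest first, so newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("b", 1)
db.put("a", 2)  # memtable is full: flushed as one sorted run
db.put("a", 3)
db.put("c", 4)  # second flush; the newer value of "a" shadows the older one
latest_a = db.get("a")
db.compact()
```

Each `put` is a cheap in-memory update, and the cost of ordering keys is paid once per batch at flush or compaction time, which is the property that favours write-heavy workloads.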
As a preferred embodiment of the present invention, the present application further caches metadata stored remotely to the DataNode node in the form of a data copy for local storage.
Specifically, after the remote storage mount is completed, when the HDFS client requests to read data, the authentication credential and the block information of the remote storage are obtained from the persisted mapping information; the DataNode, acting as a client of the remote storage, then pulls the data of the corresponding blocks, caches it locally on the DataNode as a data copy, and the data is then read. In addition, the cache in the DataNode service can be turned off or on through configuration as needed; for remote data that must be read frequently over a long period, this design can greatly optimize read performance and improve the efficiency of data communication.
The method for realizing HDFS remote storage mounting disclosed by this embodiment of the invention authenticates against the remote storage, pulls and processes its metadata information, and forwards and persists that information to the Namespace and the AliasMap. It thereby realizes the creation and deletion of mounts and constructs the mapping of the remote storage in the HDFS, which improves the flexibility of reading from remote storage. By diverting remote storage read requests and persisting the metadata information, it realizes the creation, query and deletion of mount metadata and reduces the access pressure on the NameNode. By providing cache space on the DataNode and caching remote data locally on the DataNode as copies, it can greatly optimize read performance.
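The configurable read-through cache described above can be sketched as follows. `CachingBlockReader`, `fetch_remote` and the in-memory `REMOTE` dictionary are hypothetical; a real DataNode would store the copy on local disk rather than in a Python dictionary.

```python
class CachingBlockReader:
    """Read-through cache keyed by (path, offset, length), with an on/off switch."""

    def __init__(self, fetch_remote, cache_enabled=True):
        self.fetch_remote = fetch_remote
        self.cache_enabled = cache_enabled
        self._cache = {}
        self.remote_fetches = 0

    def read(self, path, offset, length):
        key = (path, offset, length)
        if self.cache_enabled and key in self._cache:
            return self._cache[key]  # served from the local copy
        data = self.fetch_remote(path, offset, length)
        self.remote_fetches += 1
        if self.cache_enabled:
            self._cache[key] = data  # keep a local copy for later reads
        return data


# Hypothetical remote object and fetch function used only for this example.
REMOTE = {"s3a://bucket/cold/a.log": b"0123456789"}


def fetch_remote(path, offset, length):
    return REMOTE[path][offset:offset + length]


cached = CachingBlockReader(fetch_remote, cache_enabled=True)
first = cached.read("s3a://bucket/cold/a.log", 2, 4)
second = cached.read("s3a://bucket/cold/a.log", 2, 4)

uncached = CachingBlockReader(fetch_remote, cache_enabled=False)
uncached.read("s3a://bucket/cold/a.log", 2, 4)
uncached.read("s3a://bucket/cold/a.log", 2, 4)
```

With the cache enabled, the second identical read does not reach the remote storage; with it disabled, every read does.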
Example two
Fig. 2 is a schematic structural diagram of a system for implementing HDFS remote storage mount according to a second embodiment of the present invention, where the system includes: the device comprises a mounting management module, a metadata forwarding module and a data caching module. Firstly, a request is sent to a mount management module from an HDFS client through an administrator instruction, and a remote storage is mounted to an HDFS directory; then, the mount management module sends the pulled related metadata information stored at the remote end to the metadata forwarding module, and the metadata forwarding module then persists the metadata to the AliasMap and the Namespace. When the HDFS client initiates a read request to the remote storage, the NameNode forwards the request to the metadata forwarding module, and the metadata forwarding module reads the AliasMap and returns the block information to the client. And finally, the client sends a reading request to the DataNode by using the returned block information, a data cache module in the DataNode caches the remote storage data to the local, and the client reads the data from the cache. The HDFS remote storage mounting solution which is efficient, flexible and easy to use is achieved overall.
The mount management module receives a data remote storage mount request sent from a client, pulls metadata to be mounted to the remote storage, establishes mapping relation information between the remote storage and the HDFS, and sends the mapping relation information to the metadata forwarding module.
Specifically, the mount management module includes:
the authentication module is used for receiving a mounting request of a client, analyzing authentication information contained in the mounting request to acquire an authentication certificate for remote storage, and sending the acquired authentication certificate to the metadata forwarding module and the remote storage access module;
the remote storage access module is used for pulling the metadata information to be mounted from the remote storage according to the obtained authentication credential and sending the metadata information to the metadata processing module; after the remote storage access module obtains the authentication credential, it uses the credential to pull the metadata information of the data to be mounted from the remote storage, recursively obtaining the metadata information of all files and directories through the URI of the remote storage, where the metadata information specifically includes user group information, permission information, block information and the like.
And the metadata processing module is used for aggregating the acquired metadata information, constructing the URI of the remote storage and the HDFS mount directory into a first mapping pair, constructing the files of the remote storage and their block information into a second mapping pair, constructing the files and directories of the remote storage and their user group information and permission information into a third mapping pair, and sending the three kinds of mapping relation information to the metadata forwarding module.
And the metadata forwarding module is used for persisting the received mapping relation information to a database, completing mounting operation after persistence, persisting the authentication certificate to the corresponding AliasMap, acquiring the block information to be read by reading the mapping relation information in the database according to a request for reading remote storage metadata sent by the client side forwarded by the NameNode, and returning the block information to the client side. By setting the metadata forwarding module, the concurrent access pressure of the NameNode can be reduced, the information distribution related to remote storage is separated from the traditional read-write information, and the normal read-write performance is not influenced.
In this embodiment, the metadata forwarding module includes: the first forwarding module and the second forwarding module are respectively used for forwarding different metadata information to different databases for storage,
the first forwarding module is used for sending the received information of the third mapping pair to a Namespace database; the Namespace stores the mapping pairs between all files and directories under the remote storage mount directory and their user group information and permission information, so that the corresponding user permission information can be obtained directly through the Namespace on later client accesses.
And the second forwarding module is used for sending the received information of the first mapping pair and the second mapping pair to the AliasMap database. The AliasMap stores the mapping pair of the URI of the remote storage, the authentication credential and the HDFS mount directory, with the HDFS mount directory as the key, so that after mounting is completed the HDFS client can access the remote storage directly through this mapping pair; in addition, it stores the mapping pairs constructed from the files in the remote storage and their block information, where the block information takes the form (path, offset, length), so that the data can be read.
When mount deletion is executed, a delete request is initiated by the HDFS client, the mount management module sends the remote storage URI to be deleted and the corresponding HDFS mount directory to the metadata forwarding module, and the metadata forwarding module deletes the metadata information which is persisted in the AliasMap and the Namespace, so that mount deletion operation is completed.
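Mount deletion can be sketched as removing every persisted entry that belongs to the mount directory from both stores. The function `delete_mount`, the dictionary layout and the sample paths are assumptions for illustration; the sibling mount `/mnt/cold2` is included to show that a plain string-prefix match on `/mnt/cold` would be too broad.

```python
def delete_mount(alias_map, namespace_db, mount_dir):
    """Remove the persisted entries of one mount and return its remote URI."""
    prefix = mount_dir.rstrip("/") + "/"

    def under_mount(path):
        # Match the mount directory itself or anything strictly beneath it.
        return path == mount_dir or path.startswith(prefix)

    removed_uri = alias_map["mounts"].pop(mount_dir, None)
    for table in (alias_map["blocks"], namespace_db):
        for path in [p for p in table if under_mount(p)]:
            del table[path]
    return removed_uri


# Hypothetical persisted state with two mounts whose names share a prefix.
alias_map = {
    "mounts": {"/mnt/cold": "s3a://bucket/cold", "/mnt/cold2": "s3a://bucket/other"},
    "blocks": {
        "/mnt/cold/a.log": [("s3a://bucket/cold/a.log", 0, 128)],
        "/mnt/cold2/b.log": [("s3a://bucket/other/b.log", 0, 64)],
    },
}
namespace_db = {
    "/mnt/cold": ("etl", "data", 0o755),
    "/mnt/cold/a.log": ("etl", "data", 0o640),
    "/mnt/cold2": ("etl", "data", 0o755),
    "/mnt/cold2/b.log": ("etl", "data", 0o640),
}

removed = delete_mount(alias_map, namespace_db, "/mnt/cold")
```

After the call only the entries of the other mount remain, and no service restart or configuration change is involved in this sketch.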
And the data caching module is used for caching the metadata stored at the remote end to the DataNode as a data copy for local storage, and for serving it from the DataNode's cache when the client requests the same data again. In the DataNode service, once the remote storage mount is completed, the HDFS client obtains the authentication credential and the block information of the remote storage from the persisted mapping information when it requests to read data. The module, acting as a client of the remote storage, pulls the data of the corresponding blocks, caches it locally on the DataNode as a data copy, and the data is then read. The data caching module can be turned off or on through configuration, and for remote data that must be read frequently over a long period it can greatly optimize read performance.
According to the system for realizing HDFS remote storage mount disclosed in the embodiment of the invention, the mount management module pulls the metadata information according to the authentication credential, and the metadata processing module converts and constructs the remote storage metadata, which improves the flexibility of remote storage reads. The metadata forwarding module diverts remote storage read requests and persists the metadata information, implementing the creation, query, and deletion of mount metadata and reducing the access pressure on the NameNode. The data caching module stores remote data on the local DataNode as a copy, which can greatly improve read performance.
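The routing performed by the metadata forwarding module (third mapping pair to the Namespace, first and second mapping pairs to the AliasMap) can be sketched in a few lines. The function `forward_mappings`, the keys `"first"`, `"second"`, and `"third"`, and the dictionary-based stores are all assumptions made for illustration.

```python
from typing import Dict


def forward_mappings(mappings: Dict[str, dict],
                     alias_map: Dict[str, dict],
                     namespace: Dict[str, dict]) -> None:
    """Route the three kinds of mapping pairs to their target stores."""
    # First pair: HDFS mount directory -> remote URI (and credential).
    alias_map.setdefault("mounts", {}).update(mappings.get("first", {}))
    # Second pair: file -> list of (path, offset, length) blocks.
    alias_map.setdefault("blocks", {}).update(mappings.get("second", {}))
    # Third pair: file or directory -> user group and permission information.
    namespace.update(mappings.get("third", {}))
```

Keeping block lookups in the AliasMap and only ownership and permission data in the Namespace is what allows read requests to bypass part of the NameNode's metadata load, as the paragraph above describes.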
With regard to the system in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.
The present application may also provide the following embodiments:
A computing device, comprising: a processor and a memory;
the memory is configured to store computer program instructions;
when the computing device runs, the processor executes the computer program instructions in the memory to perform the operational steps of any of the methods described above.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A method for realizing remote storage mount of HDFS (Hadoop distributed File System), which is characterized by comprising the following steps:
receiving a request of data remote storage mount sent from a client, pulling metadata to be mounted to the remote storage, and establishing mapping relation information between the remote storage and the HDFS;
and forwarding and persisting the mapping relation information to a database, reading the mapping relation information in the database according to a request for reading the remote storage metadata sent by the client, and returning the block information to be read to the client.
2. The method of claim 1, wherein the method further comprises: caching the metadata stored at the remote end to the DataNode in the form of a data copy for local storage.
3. The method according to claim 1 or 2, wherein, after the request initiated by the client is received, an authentication credential for the remote storage is further obtained based on the authentication information contained in the request, and the authentication credential is used for pulling the metadata information and for forwarding the mapping relationship information.
4. The method of claim 3, wherein pulling metadata to be mounted to a remote storage and establishing information forming a mapping relationship between the remote storage and the HDFS comprises:
using the authentication credential to recursively acquire, through the URI of the remote storage, the metadata information of all files and directories;
and aggregating the acquired metadata information, constructing the URI of the remote storage and the HDFS mount directory into a first mapping pair, constructing the files in the remote storage and their corresponding block information into a second mapping pair, constructing the files and directories in the remote storage together with their user group information and permission information into a third mapping pair, and forwarding and persisting the three kinds of mapping relationship information to the corresponding databases.
5. The method of claim 4, wherein the received information of the third mapping pair is sent to a Namespace database for storage; and sending the received information of the first mapping pair and the second mapping pair to an AliasMap database for storage.
6. The method of claim 1, wherein the request sent by the client includes create, delete, and query requests.
7. A system for implementing HDFS remote storage mount, the system comprising:
the mount management module receives a data remote storage mount request sent from a client, pulls metadata to be mounted to the remote storage, establishes mapping relation information between the remote storage and the HDFS, and sends the mapping relation information to the metadata forwarding module;
and the metadata forwarding module is used for persisting the received mapping relation information to a database, acquiring the block information to be read by reading the mapping relation information in the database according to a request for reading the remote storage metadata sent by the client, and returning the block information to the client.
8. The system of claim 7, further comprising a data caching module for caching the metadata stored at the remote end to the DataNode in the form of a data copy for local storage, and for obtaining the result from the cache of the DataNode when the client requests to read the same data again.
9. The system of claim 7 or 8, wherein the mount management module comprises:
the authentication module is used for receiving a mount request from the client, parsing the authentication information contained in the mount request to obtain an authentication credential for the remote storage, and sending the obtained authentication credential to the metadata forwarding module and the remote storage access module;
the remote storage access module is used for pulling the corresponding metadata information to be mounted from the remote storage according to the obtained authentication credential and sending the metadata information to the metadata processing module;
and the metadata processing module is used for aggregating the acquired metadata information, constructing the URI of the remote storage and the HDFS mount directory into a first mapping pair, constructing the files in the remote storage and their corresponding block information into a second mapping pair, constructing the files and directories in the remote storage together with their user group information and permission information into a third mapping pair, and sending the three kinds of mapping relationship information to the metadata forwarding module.
10. The system of claim 9, wherein the metadata forwarding module comprises: a first forwarding module and a second forwarding module, wherein,
the first forwarding module is used for sending the received information of the third mapping pair to a Namespace database;
and the second forwarding module is used for sending the received information of the first mapping pair and the second mapping pair to the AliasMap database.
CN202111389173.3A 2021-11-22 2021-11-22 Method and system for realizing remote storage mounting of HDFS (Hadoop distributed File System) Active CN114201446B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111389173.3A CN114201446B (en) 2021-11-22 2021-11-22 Method and system for realizing remote storage mounting of HDFS (Hadoop distributed File System)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111389173.3A CN114201446B (en) 2021-11-22 2021-11-22 Method and system for realizing remote storage mounting of HDFS (Hadoop distributed File System)

Publications (2)

Publication Number Publication Date
CN114201446A true CN114201446A (en) 2022-03-18
CN114201446B CN114201446B (en) 2024-01-23

Family

ID=80648351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111389173.3A Active CN114201446B (en) 2021-11-22 2021-11-22 Method and system for realizing remote storage mounting of HDFS (Hadoop distributed File System)

Country Status (1)

Country Link
CN (1) CN114201446B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150169602A1 (en) * 2013-12-18 2015-06-18 Software Ag File metadata handler for storage and parallel processing of files in a distributed file system, and associated systems and methods
CN107992491A (en) * 2016-10-26 2018-05-04 中国移动通信有限公司研究院 A kind of method and device of distributed file system, data access and data storage
CN111694791A (en) * 2020-04-01 2020-09-22 新华三大数据技术有限公司 Data access method and device in distributed basic framework
CN113672584A (en) * 2021-08-30 2021-11-19 济南浪潮数据技术有限公司 HDFS protocol data mapping transmission method and device of distributed file system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115268797A (en) * 2022-09-26 2022-11-01 创云融达信息技术(天津)股份有限公司 Method for realizing system and object storage communication through WebDav
CN116991333A (en) * 2023-09-25 2023-11-03 苏州元脑智能科技有限公司 Distributed data storage method, device, electronic equipment and storage medium
CN116991333B (en) * 2023-09-25 2024-01-26 苏州元脑智能科技有限公司 Distributed data storage method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114201446B (en) 2024-01-23

Similar Documents

Publication Publication Date Title
US11816126B2 (en) Large scale unstructured database systems
US11093466B2 (en) Incremental out-of-place updates for index structures
CN109783438B (en) Distributed NFS system based on librados and construction method thereof
CN107169083B (en) Mass vehicle data storage and retrieval method and device for public security card port and electronic equipment
US10592488B2 (en) Application-centric object interfaces
US9043372B2 (en) Metadata subsystem for a distributed object store in a network storage system
CN114201446B (en) Method and system for realizing remote storage mounting of HDFS (Hadoop distributed File System)
CN108140040A (en) The selective data compression of database in memory
WO2011108695A1 (en) Parallel data processing system, parallel data processing method and program
CN103595797B (en) Caching method for distributed storage system
CN110287150A (en) A kind of large-scale storage systems meta-data distribution formula management method and system
CN107977446A (en) A kind of memory grid data load method based on data partition
CN103501319A (en) Low-delay distributed storage system for small files
Nguyen et al. Zing database: high-performance key-value store for large-scale storage service
US11886411B2 (en) Data storage using roaring binary-tree format
CN104158897A (en) Updating method of file layout in distributed file system
CN111159176A (en) Method and system for storing and reading mass stream data
CN116108057A (en) Distributed database access method, device, equipment and storage medium
US10146833B1 (en) Write-back techniques at datastore accelerators
US20230281211A1 (en) Adding a read-only query engine to perform queries to a point-in-time of a write-accessible database
WO2015049734A1 (en) Search system and search method
CN110109866B (en) Method and equipment for managing file system directory
Cheng et al. Optimizing small file storage process of the HDFS which based on the indexing mechanism
CN113051244B (en) Data access method and device, and data acquisition method and device
US11341163B1 (en) Multi-level replication filtering for a distributed database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant