CN105022779A - Method for realizing HDFS file access by utilizing Filesystem API - Google Patents


Info

Publication number
CN105022779A
Authority
CN
China
Prior art keywords
file
namenode
hadoop
hdfs
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510229757.2A
Other languages
Chinese (zh)
Inventor
杨莉
王森
沈映泉
赵薇
段嘉杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of Yunnan Power System Ltd
Original Assignee
Electric Power Research Institute of Yunnan Power System Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of Yunnan Power System Ltd filed Critical Electric Power Research Institute of Yunnan Power System Ltd
Priority to CN201510229757.2A priority Critical patent/CN105022779A/en
Publication of CN105022779A publication Critical patent/CN105022779A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10 - File systems; File servers
    • G06F 16/13 - File access structures, e.g. distributed indices
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10 - File systems; File servers
    • G06F 16/18 - File system types
    • G06F 16/182 - Distributed file systems

Abstract

A method for realizing HDFS file access by utilizing the FileSystem API is disclosed. The method comprises Hadoop cluster environment construction, HDFS file upload and download methods, and the HDFS file read and write flows, and enables efficient uploading and downloading of HDFS files.

Description

A method for realizing HDFS file access using the FileSystem API
Technical field
The invention belongs to the field of distributed file access in computing, and more particularly relates to a realization of HDFS file access based on the FileSystem API.
Background art
With the rapid development of network technology, many enterprises and organizations store, compute, and exchange data through data storage service providers. Hadoop is a distributed system framework developed by the Apache Foundation. It allows users to develop distributed programs without understanding the low-level details of distribution, making full use of a cluster's power for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System, abbreviated HDFS.
HDFS adopts a master/slave architecture: an HDFS cluster consists of one Namenode and multiple Datanodes. The Namenode is a central server responsible for operations on the file system namespace, such as opening, closing, and renaming files or directories. It maintains the mapping from file paths to data blocks and from data blocks to Datanodes, monitors the heartbeats of the Datanodes, and maintains the number of replicas of each data block. The Namenode keeps its metadata in main memory, where one record occupies roughly 150 bytes. The metadata mainly comprises: (1) the file list; (2) the blocks belonging to each file; (3) the Datanodes holding each block; (4) file attributes such as creation time and replica count. The Datanodes handle read and write requests from clients and, under the unified scheduling of the Namenode, create, delete, and replicate data blocks; in general, one Datanode is deployed per physical node (machine).
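The roughly 150 bytes per metadata record mentioned above allow a back-of-the-envelope estimate of Namenode heap usage. The sketch below is illustrative only and not part of the patent; the accounting of one record per file, per block, and per block replica location, as well as the example numbers, are assumptions:

```java
public class NamenodeMemoryEstimate {
    // Rough heap estimate: one ~150-byte record per file entry, per block
    // entry, and per block-location entry (hypothetical accounting).
    static long estimateBytes(long files, long blocksPerFile, int replicas) {
        long records = files                                // file list entries
                     + files * blocksPerFile                // block entries per file
                     + files * blocksPerFile * replicas;    // block-to-Datanode entries
        return records * 150L;
    }

    public static void main(String[] args) {
        // Example: 1 million files, 2 blocks each, 3 replicas.
        long bytes = estimateBytes(1_000_000L, 2, 3);
        System.out.println(bytes / (1024 * 1024) + " MB");
    }
}
```

This illustrates why the Namenode's memory, rather than disk, bounds the number of files an HDFS cluster can hold.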
HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications with very large data sets. HDFS accesses data in a streaming fashion, reading or writing a whole block at a time.
Summary of the invention
The object of the present invention is to provide an implementation method for an HDFS file access system based on the FileSystem API, achieving efficient uploading and downloading of HDFS files.
To achieve this goal, the invention provides a method for realizing HDFS file access using the FileSystem API, characterized in that it comprises Hadoop cluster environment construction, HDFS file upload and download methods, and the HDFS file read and write flows; wherein,
Hadoop cluster environment construction comprises the following steps:
Step S1: install a Linux system;
Step S2: create a Hadoop user group and user under the Linux system;
Step S3: install the JDK and configure environment variables;
Step S4: modify the host name of each host and configure the mappings between hosts;
Step S5: install the ssh service and set up passwordless ssh access;
Step S6: install and configure Hadoop on each host;
Step S7: start Hadoop after installation and verify that the installation is correct;
The HDFS file upload method comprises the following steps:
Step S1: the user requests to upload a file to the Hadoop distributed file system;
Step S2: create an input stream for the local file with the java.io.FileInputStream class;
Step S3: read the Hadoop file system configuration items with the org.apache.hadoop.conf.Configuration class; the settings configured in core-site.xml take precedence here;
Step S4: org.apache.hadoop.fs.FileSystem is the core class through which users operate HDFS; it is the abstract base class of a generic file system, from which distributed file systems inherit, and through it the HDFS file system corresponding to a file URI is obtained;
Step S5: open an output stream to the Hadoop file in create mode; this output stream points to the target HDFS file;
Step S6: copy the file from the local file system to the target HDFS file with the IOUtils utility;
Step S7: list the directory of all files at the current HDFS target location;
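The upload steps above can be sketched in Java as follows. This is a minimal sketch, not the patent's implementation: the cluster address hdfs://localhost:9000, the local path /tmp/local.txt, and the HDFS path /upload/local.txt are hypothetical, and running it requires the hadoop-client library and a live cluster:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();           // Step S3: reads core-site.xml
        conf.set("fs.defaultFS", "hdfs://localhost:9000");  // assumed cluster address
        FileSystem fs = FileSystem.get(conf);               // Step S4: FS for the URI
        InputStream in = new FileInputStream("/tmp/local.txt");        // Step S2
        FSDataOutputStream out = fs.create(new Path("/upload/local.txt")); // Step S5
        IOUtils.copyBytes(in, out, 4096, true);             // Step S6: copy and close streams
        for (FileStatus s : fs.listStatus(new Path("/upload"))) {      // Step S7
            System.out.println(s.getPath());
        }
        fs.close();
    }
}
```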
The HDFS file download method comprises the following steps:
Step S1: the user requests to download a file from the Hadoop distributed file system;
Step S2: read the Hadoop file system configuration items with the org.apache.hadoop.conf.Configuration class; the settings configured in core-site.xml take precedence here;
Step S3: obtain the HDFS file system corresponding to the file URI through the org.apache.hadoop.fs.FileSystem class;
Step S4: let the FileSystem open the FSDataInputStream input stream corresponding to the URI and read the file;
Step S5: save the selected file from the HDFS target location under the specified path of the local file system with the IOUtils utility;
Step S6: close the input and output streams;
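A matching download sketch follows; again the cluster address and the source/destination paths are hypothetical, and the hadoop-client library plus a running cluster are required:

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsDownload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();           // Step S2: reads core-site.xml
        conf.set("fs.defaultFS", "hdfs://localhost:9000");  // assumed cluster address
        FileSystem fs = FileSystem.get(conf);               // Step S3
        FSDataInputStream in = fs.open(new Path("/upload/local.txt")); // Step S4
        OutputStream out = new FileOutputStream("/tmp/copy.txt");      // Step S5 target
        IOUtils.copyBytes(in, out, 4096, true);             // copy; closes both (Step S6)
        fs.close();
    }
}
```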
The HDFS file read flow is as follows:
Step S1: the client sends a request to open the file through the open function of FileSystem;
Step S2: FileSystem sends the request to the Namenode over the RPC protocol;
Step S3: the Namenode checks the metadata and returns the data block locations of the corresponding file;
Step S4: FileSystem returns an FSDataInputStream to the client, from which the client reads the data;
Step S5: the client calls the read function of the FSDataInputStream;
Step S6: the client starts reading data from the Datanode in streaming fashion;
Step S7: after the data block on the current Datanode has been read, the connection between the stream and that Datanode is closed; the nearest Datanode holding the next data block of the file is then connected, and block reading continues;
Step S8: after the client has read all the data, it calls the close function of the FSDataInputStream to close the stream;
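From the client's perspective, steps S1, S5, and S8 of this flow reduce to three calls; the RPC to the Namenode and the Datanode switching of steps S2 through S7 happen inside the library. A sketch under the same hypothetical address and path assumptions as above (hadoop-client and a live cluster required):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadFlow {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");  // assumed cluster address
        FileSystem fs = FileSystem.get(conf);
        // Step S1: open triggers the RPC to the Namenode (steps S2-S4).
        FSDataInputStream in = fs.open(new Path("/upload/local.txt"));
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) > 0) {   // Steps S5-S7: streamed from Datanodes
            System.out.write(buf, 0, n);
        }
        in.close();                        // Step S8
        fs.close();
    }
}
```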
The HDFS file write flow is as follows:
Step S1: the client first splits the file to be uploaded into 64 MB blocks, namely block1, block2, ..., blockN, and at the same time sends a file creation request through the create function of FileSystem;
Step S2: FileSystem sends the request to the Namenode over the RPC protocol;
Step S3: a new file is created inside the Namespace of the Namenode, and the Namenode returns the available Datanodes;
Step S4: FileSystem returns an FSDataOutputStream to the client for writing the data;
Step S5: the client calls the write function of the FSDataOutputStream;
Step S6: the client starts writing block1 to the Datanodes in streaming fashion: (1) the 64 MB block1 is divided into 64 KB packets; (2) the first packet is sent to Datanode1; (3) after Datanode1 receives it, it forwards the first packet to Datanode2, while the client sends the second packet to Datanode1; (4) after Datanode2 receives the first packet, it forwards it to Datanode3, while receiving the second packet from Datanode1; (5) and so on, until block1 has been sent completely;
Step S7: Datanode1, Datanode2, and Datanode3 report the successful receipt of block1 to the NameNode, and Datanode1 reports success to the client; after the client receives the message from Datanode1, it sends a message to the Namenode; at this point the transfer of block1 is complete, and the flow jumps back to step S6 to write the remaining blocks block2, block3, ..., blockN, until blockN has been sent completely;
Step S8: after the client has finished writing the data, it calls the close function of the FSDataOutputStream to close the stream;
Step S9: FileSystem notifies the Namenode that the write is complete;
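The block and packet sizes in steps S1 and S6 fix the arithmetic of the pipeline: a full 64 MB block always yields the same number of 64 KB packets, and the block count follows from the file size. The sketch below is illustrative, not part of the patent:

```java
public class HdfsWriteArithmetic {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64 MB block, as in step S1
    static final long PACKET_SIZE = 64L * 1024;       // 64 KB packet, as in step S6

    // Number of packets that make up one full block.
    static long packetsPerBlock() {
        return BLOCK_SIZE / PACKET_SIZE;
    }

    // Number of blocks for a file of the given size (last block may be partial).
    static long blocksForFile(long fileBytes) {
        return (fileBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println("packets per block: " + packetsPerBlock());
        System.out.println("blocks for a 200 MB file: " + blocksForFile(200L * 1024 * 1024));
    }
}
```

So each iteration of step S6 pipelines 1024 packets per full block through the three Datanodes before step S7 acknowledges the block.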
When a file system client performs a write operation, the operation is first recorded in the edit log. The Namenode keeps the file system metadata in memory; after the edit log has been recorded, the Namenode modifies the in-memory data structures. Before each write operation succeeds, the edit log is synced to the file system. The fsimage file is an on-disk checkpoint of the in-memory metadata; it is a serialized form and cannot be modified directly on disk. When the Namenode fails, the metadata of the latest checkpoint is loaded into memory from the fsimage, and the operations in the edit log are then replayed one by one. The SecondaryNamenode exists to help the Namenode checkpoint the in-memory metadata to disk. The checkpoint process is as follows:
Step S1: the SecondaryNamenode notifies the Namenode to generate a new edit log file; subsequent log records are all written to the new file;
Step S2: the SecondaryNamenode fetches the fsimage file and the old edit log from the Namenode via HTTP GET;
Step S3: the SecondaryNamenode loads the fsimage into memory, replays the operations in the edit log, and then generates a new fsimage file;
Step S4: the SecondaryNamenode sends the new fsimage file back to the Namenode via HTTP POST;
Step S5: the Namenode replaces the old fsimage file and the old edit log with the new fsimage file and the new edit log generated in step S1, then updates the fstime file to record the time of this checkpoint;
Step S6: the fsimage file on the Namenode now holds the metadata of the latest checkpoint, and the edit log is emptied and begins recording modifications again.
The FSNamesystem of the present invention is the file system namespace system class, defined as follows:
public class FSNamesystem {
    public FSDirectory dir; // the file tree store
    final BlocksMap blocksMap = new BlocksMap(DEFAULT_INITIAL_MAP_CAPACITY, DEFAULT_MAP_LOAD_FACTOR);
    // BlocksMap maintains the mapping from a block to its metadata; the metadata
    // includes the inode the block belongs to and the Datanodes storing the block.
    public CorruptReplicasMap corruptReplicas = new CorruptReplicasMap(); // map of corrupt block replicas
    NavigableMap<String, DatanodeDescriptor> datanodeMap = new TreeMap<String, DatanodeDescriptor>();
    // mapping from Datanode to blocks
    ArrayList<DatanodeDescriptor> heartbeats = new ArrayList<DatanodeDescriptor>();
    // subset of datanodeMap containing only the DatanodeDescriptors believed alive;
    // the HeartbeatMonitor periodically removes expired elements
    private UnderReplicatedBlocks neededReplications = new UnderReplicatedBlocks();
    // describes blocks whose replica count is insufficient; assigns each block a
    // priority and manages the set of under-replicated blocks with a priority queue
    private PendingReplicationBlocks pendingReplications;
    // list of blocks whose replica replication has not yet completed
    public LeaseManager leaseManager = new LeaseManager(this); // manages file leases
    Daemon hbthread = null;
    // periodically calls the heartbeatCheck method defined by FSNamesystem to monitor
    // and handle the heartbeat status information sent by the Datanode nodes
    public Daemon lmthread = null; // LeaseMonitor thread
    Daemon smmthread = null;
    // periodically checks whether the conditions for leaving safe mode have been met;
    // this thread must therefore be started after entering safe mode (i.e. after the threshold is reached)
    public Daemon replthread = null;
    // periodically calls two methods: compute block replica counts, make a plan, and
    // schedule Datanode processing; and handle replicas whose pipelined block replication is incomplete
    private ReplicationMonitor replmon = null; // replication metrics
    private Host2NodesMap host2DataNodeMap = new Host2NodesMap();
    // keeps the mapping from a Datanode host to its DatanodeDescriptor array
    NetworkTopology clusterMap = new NetworkTopology();
    // represents a computer cluster as a tree network topology; a cluster may consist of
    // multiple data centers, each containing many racks of machines arranged for computation
    private DNSToSwitchMapping dnsToSwitchMapping;
    // a plug-in interface defining the resolver that maps DNS-name/IP-address to RackID
    ReplicationTargetChooser replicator;
    // implementation class that chooses the storage locations for the replicas of a given block
    private HostsFileReader hostsReader;
    // tracks the Datanodes: which Datanodes may connect to the Namenode and which may
    // not, as recorded in the specified include and exclude lists
}
Brief description of the drawings
Fig. 1 is a flow diagram of HDFS file upload based on the FileSystem API;
Fig. 2 is a flow diagram of HDFS file download based on the FileSystem API;
Fig. 3 is a flow diagram of the HDFS file read flow;
Fig. 4 is a flow diagram of the HDFS file write flow;
Fig. 5 is a flow diagram of the checkpoint process.
Detailed description of the embodiments
A method for realizing HDFS file access using the FileSystem API. The present invention comprises Hadoop cluster environment deployment, uploading local Windows files to the Hadoop distributed file system (HDFS), downloading HDFS files to a specified local directory, the file read flow, the file write flow, and the checkpoint process. First, a Linux system is installed and a Hadoop user group and user are created under it; in addition, the JDK is installed and environment variables are configured. Second, the host name of each host is modified and the mappings between hosts are configured. Third, the ssh service is installed and passwordless ssh access is set up. Finally, Hadoop is installed and configured on each host; after installation, Hadoop is started and the installation is verified.
Hadoop cluster environment construction comprises the following steps:
Step S1: install a Linux system;
Step S2: create a Hadoop user group and user under the Linux system;
Step S3: install the JDK and configure environment variables;
Step S4: modify the host name of each host and configure the mappings between hosts;
Step S5: install the ssh service and set up passwordless ssh access;
Step S6: install and configure Hadoop on each host;
Step S7: start Hadoop after installation and verify that the installation is correct;
The HDFS file upload method, shown in Fig. 1, comprises the following steps:
Step S1: the user requests to upload a file to the Hadoop distributed file system;
Step S2: create an input stream for the local file with the java.io.FileInputStream class;
Step S3: read the Hadoop file system configuration items with the org.apache.hadoop.conf.Configuration class; the settings configured in core-site.xml take precedence here;
Step S4: org.apache.hadoop.fs.FileSystem is the core class through which users operate HDFS; it is the abstract base class of a generic file system, from which distributed file systems inherit, and through it the HDFS file system corresponding to a file URI is obtained;
Step S5: open an output stream to the Hadoop file in create mode; this output stream points to the target HDFS file;
Step S6: copy the file from the local file system to the target HDFS file with the IOUtils utility;
Step S7: list the directory of all files at the current HDFS target location;
The HDFS file download method, shown in Fig. 2, comprises the following steps:
Step S1: the user requests to download a file from the Hadoop distributed file system;
Step S2: read the Hadoop file system configuration items with the org.apache.hadoop.conf.Configuration class; the settings configured in core-site.xml take precedence here;
Step S3: obtain the HDFS file system corresponding to the file URI through the org.apache.hadoop.fs.FileSystem class;
Step S4: let the FileSystem open the FSDataInputStream input stream corresponding to the URI and read the file;
Step S5: save the selected file from the HDFS target location under the specified path of the local file system with the IOUtils utility;
Step S6: close the input and output streams;
The HDFS file read flow is shown in Fig. 3 and proceeds as follows:
Step S1: the client sends a request to open the file through the open function of FileSystem;
Step S2: FileSystem sends the request to the Namenode over the RPC protocol;
Step S3: the Namenode checks the metadata and returns the data block locations of the corresponding file;
Step S4: FileSystem returns an FSDataInputStream to the client, from which the client reads the data;
Step S5: the client calls the read function of the FSDataInputStream;
Step S6: the client starts reading data from the Datanode in streaming fashion;
Step S7: after the data block on the current Datanode has been read, the connection between the stream and that Datanode is closed; the nearest Datanode holding the next data block of the file is then connected, and block reading continues;
Step S8: after the client has read all the data, it calls the close function of the FSDataInputStream to close the stream;
The HDFS file write flow is shown in Fig. 4 and proceeds as follows:
Step S1: the client first splits the file to be uploaded into 64 MB blocks, namely block1, block2, ..., blockN, and at the same time sends a file creation request through the create function of FileSystem;
Step S2: FileSystem sends the request to the Namenode over the RPC protocol;
Step S3: a new file is created inside the Namespace of the Namenode, and the Namenode returns the available Datanodes;
Step S4: FileSystem returns an FSDataOutputStream to the client for writing the data;
Step S5: the client calls the write function of the FSDataOutputStream;
Step S6: the client starts writing block1 to the Datanodes in streaming fashion: (1) the 64 MB block1 is divided into 64 KB packets; (2) the first packet is sent to Datanode1; (3) after Datanode1 receives it, it forwards the first packet to Datanode2, while the client sends the second packet to Datanode1; (4) after Datanode2 receives the first packet, it forwards it to Datanode3, while receiving the second packet from Datanode1; (5) and so on, until block1 has been sent completely;
Step S7: Datanode1, Datanode2, and Datanode3 report the successful receipt of block1 to the NameNode, and Datanode1 reports success to the client; after the client receives the message from Datanode1, it sends a message to the Namenode; at this point the transfer of block1 is complete, and the flow jumps back to step S6 to write the remaining blocks block2, block3, ..., blockN, until blockN has been sent completely;
Step S8: after the client has finished writing the data, it calls the close function of the FSDataOutputStream to close the stream;
Step S9: FileSystem notifies the Namenode that the write is complete;
When a file system client performs a write operation, the operation is first recorded in the edit log. The Namenode keeps the file system metadata in memory; after the edit log has been recorded, the Namenode modifies the in-memory data structures. Before each write operation succeeds, the edit log is synced to the file system. The fsimage file is an on-disk checkpoint of the in-memory metadata; it is a serialized form and cannot be modified directly on disk. When the Namenode fails, the metadata of the latest checkpoint is loaded into memory from the fsimage, and the operations in the edit log are then replayed one by one. The SecondaryNamenode exists to help the Namenode checkpoint the in-memory metadata to disk. The checkpoint process is shown in Fig. 5 and proceeds as follows:
Step S1: the SecondaryNamenode notifies the Namenode to generate a new edit log file; subsequent log records are all written to the new file;
Step S2: the SecondaryNamenode fetches the fsimage file and the old edit log from the Namenode via HTTP GET;
Step S3: the SecondaryNamenode loads the fsimage into memory, replays the operations in the edit log, and then generates a new fsimage file;
Step S4: the SecondaryNamenode sends the new fsimage file back to the Namenode via HTTP POST;
Step S5: the Namenode replaces the old fsimage file and the old edit log with the new fsimage file and the new edit log generated in step S1, then updates the fstime file to record the time of this checkpoint;
Step S6: the fsimage file on the Namenode now holds the metadata of the latest checkpoint, and the edit log is emptied and begins recording modifications again.

Claims (1)

1. A method for realizing HDFS file access using the FileSystem API, characterized in that it comprises Hadoop cluster environment construction, HDFS file upload and download methods, and the HDFS file read and write flows; wherein,
Hadoop cluster environment construction comprises the following steps:
Step S1: install a Linux system;
Step S2: create a Hadoop user group and user under the Linux system;
Step S3: install the JDK and configure environment variables;
Step S4: modify the host name of each host and configure the mappings between hosts;
Step S5: install the ssh service and set up passwordless ssh access;
Step S6: install and configure Hadoop on each host;
Step S7: start Hadoop after installation and verify that the installation is correct;
The HDFS file upload method comprises the following steps:
Step S1: the user requests to upload a file to the Hadoop distributed file system;
Step S2: create an input stream for the local file with the java.io.FileInputStream class;
Step S3: read the Hadoop file system configuration items with the org.apache.hadoop.conf.Configuration class; the settings configured in core-site.xml take precedence here;
Step S4: org.apache.hadoop.fs.FileSystem is the core class through which users operate HDFS; it is the abstract base class of a generic file system, from which distributed file systems inherit, and through it the HDFS file system corresponding to a file URI is obtained;
Step S5: open an output stream to the Hadoop file in create mode; this output stream points to the target HDFS file;
Step S6: copy the file from the local file system to the target HDFS file with the IOUtils utility;
Step S7: list the directory of all files at the current HDFS target location;
The HDFS file download method comprises the following steps:
Step S1: the user requests to download a file from the Hadoop distributed file system;
Step S2: read the Hadoop file system configuration items with the org.apache.hadoop.conf.Configuration class; the settings configured in core-site.xml take precedence here;
Step S3: obtain the HDFS file system corresponding to the file URI through the org.apache.hadoop.fs.FileSystem class;
Step S4: let the FileSystem open the FSDataInputStream input stream corresponding to the URI and read the file;
Step S5: save the selected file from the HDFS target location under the specified path of the local file system with the IOUtils utility;
Step S6: close the input and output streams;
The HDFS file read flow is as follows:
Step S1: the client sends a request to open the file through the open function of FileSystem;
Step S2: FileSystem sends the request to the Namenode over the RPC protocol;
Step S3: the Namenode checks the metadata and returns the data block locations of the corresponding file;
Step S4: FileSystem returns an FSDataInputStream to the client, from which the client reads the data;
Step S5: the client calls the read function of the FSDataInputStream;
Step S6: the client starts reading data from the Datanode in streaming fashion;
Step S7: after the data block on the current Datanode has been read, the connection between the stream and that Datanode is closed; the nearest Datanode holding the next data block of the file is then connected, and block reading continues;
Step S8: after the client has read all the data, it calls the close function of the FSDataInputStream to close the stream;
The HDFS file write flow is as follows:
Step S1: the client first splits the file to be uploaded into 64 MB blocks, namely block1, block2, ..., blockN, and at the same time sends a file creation request through the create function of FileSystem;
Step S2: FileSystem sends the request to the Namenode over the RPC protocol;
Step S3: a new file is created inside the Namespace of the Namenode, and the Namenode returns the available Datanodes;
Step S4: FileSystem returns an FSDataOutputStream to the client for writing the data;
Step S5: the client calls the write function of the FSDataOutputStream;
Step S6: the client starts writing block1 to the Datanodes in streaming fashion: (1) the 64 MB block1 is divided into 64 KB packets; (2) the first packet is sent to Datanode1; (3) after Datanode1 receives it, it forwards the first packet to Datanode2, while the client sends the second packet to Datanode1; (4) after Datanode2 receives the first packet, it forwards it to Datanode3, while receiving the second packet from Datanode1; (5) and so on, until block1 has been sent completely;
Step S7: Datanode1, Datanode2, and Datanode3 report the successful receipt of block1 to the NameNode, and Datanode1 reports success to the client; after the client receives the message from Datanode1, it sends a message to the Namenode; at this point the transfer of block1 is complete, and the flow jumps back to step S6 to write the remaining blocks block2, block3, ..., blockN, until blockN has been sent completely;
Step S8: after the client has finished writing the data, it calls the close function of the FSDataOutputStream to close the stream;
Step S9: FileSystem notifies the Namenode that the write is complete;
When a file system client performs a write operation, the operation is first recorded in the edit log. The Namenode keeps the file system metadata in memory; after the edit log has been recorded, the Namenode modifies the in-memory data structures. Before each write operation succeeds, the edit log is synced to the file system. The fsimage file is an on-disk checkpoint of the in-memory metadata; it is a serialized form and cannot be modified directly on disk. When the Namenode fails, the metadata of the latest checkpoint is loaded into memory from the fsimage, and the operations in the edit log are then replayed one by one. The SecondaryNamenode exists to help the Namenode checkpoint the in-memory metadata to disk. The checkpoint process is as follows:
Step S1: the SecondaryNamenode notifies the Namenode to generate a new edit log file; subsequent log records are all written to the new file;
Step S2: the SecondaryNamenode fetches the fsimage file and the old edit log from the Namenode via HTTP GET;
Step S3: the SecondaryNamenode loads the fsimage into memory, replays the operations in the edit log, and then generates a new fsimage file;
Step S4: the SecondaryNamenode sends the new fsimage file back to the Namenode via HTTP POST;
Step S5: the Namenode replaces the old fsimage file and the old edit log with the new fsimage file and the new edit log generated in step S1, then updates the fstime file to record the time of this checkpoint;
Step S6: the fsimage file on the Namenode now holds the metadata of the latest checkpoint, and the edit log is emptied and begins recording modifications again.
CN201510229757.2A 2015-05-07 2015-05-07 Method for realizing HDFS file access by utilizing Filesystem API Pending CN105022779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510229757.2A CN105022779A (en) 2015-05-07 2015-05-07 Method for realizing HDFS file access by utilizing Filesystem API

Publications (1)

Publication Number Publication Date
CN105022779A true CN105022779A (en) 2015-11-04

Family

ID=54412750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510229757.2A Pending CN105022779A (en) 2015-05-07 2015-05-07 Method for realizing HDFS file access by utilizing Filesystem API

Country Status (1)

Country Link
CN (1) CN105022779A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120182891A1 (en) * 2011-01-19 2012-07-19 Youngseok Lee Packet analysis system and method using hadoop based parallel computation
CN102073741A (en) * 2011-01-30 2011-05-25 宇龙计算机通信科技(深圳)有限公司 Method for realizing file reading and/or writing and data server
CN102902716A (en) * 2012-08-27 2013-01-30 苏州两江科技有限公司 Storage system based on Hadoop distributed computing platform
CN103793425A (en) * 2012-10-31 2014-05-14 国际商业机器公司 Data processing method and data processing device for distributed system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHNRLES AANA888: "Using the FileSystem API to perform Hadoop file read and write operations" (in Chinese), 《http://supercharles888.blog.51cto.com/609344/878921》 *
ZANGLU: "Hadoop Cluster (Part 8): A First Exploration of HDFS" (in Chinese), 《http://www.educity.cn/net/1618908.html》 *
风生水起: "A detailed illustrated guide to setting up a single-node Hadoop environment" (in Chinese), 《http://www.cnblogs.com/end/archive/2012/08/13/2636645.html》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106713493A (en) * 2017-01-20 2017-05-24 郑州云海信息技术有限公司 System and method for constructing distributed file system in cluster environment
CN106713493B (en) * 2017-01-20 2020-09-29 苏州浪潮智能科技有限公司 System and method for constructing a distributed file system in a computer cluster environment
CN107294771A (en) * 2017-05-17 2017-10-24 上海斐讯数据通信技术有限公司 Efficient deployment system and usage method for big data clusters
CN112199334A (en) * 2020-10-23 2021-01-08 东北大学 Method and device for storing data stream processing check point file based on message queue
CN112199334B (en) * 2020-10-23 2023-12-05 东北大学 Method and device for storing data stream processing checkpoint files based on a message queue
CN115495057A (en) * 2022-11-16 2022-12-20 江苏智云天工科技有限公司 Method and system for realizing communication between Windows and HDFS
CN115495057B (en) * 2022-11-16 2023-02-28 江苏智云天工科技有限公司 Method and system for realizing communication between Windows and HDFS

Similar Documents

Publication Publication Date Title
US10956601B2 (en) Fully managed account level blob data encryption in a distributed storage environment
US11954002B1 (en) Automatically provisioning mediation services for a storage system
US10764045B2 (en) Encrypting object index in a distributed storage environment
CN104731691B (en) The method and system of duplicate of the document number in dynamic adjustment distributed file system
US10659225B2 (en) Encrypting existing live unencrypted data using age-based garbage collection
US20210019063A1 (en) Utilizing data views to optimize secure data access in a storage system
US9436556B2 (en) Customizable storage system for virtual databases
CN107797767B (en) One kind is based on container technique deployment distributed memory system and its storage method
US20200174671A1 (en) Bucket views
WO2016180055A1 (en) Method, device and system for storing and reading data
CN102420854A (en) Distributed file system facing to cloud storage
US20210055885A1 (en) Enhanced data access using composite data views
CN103166785A (en) Distributed type log analysis system based on Hadoop
CN104462185A (en) Digital library cloud storage system based on mixed structure
JP5868986B2 (en) Recovery by item
CN104050248A (en) File storage system and storage method
CN104281980B (en) Thermal power generation unit remote diagnosis method and system based on Distributed Calculation
CN105095103A (en) Storage device management method and device used for cloud environment
CN105022779A (en) Method for realizing HDFS file access by utilizing Filesystem API
US20220214814A1 (en) Cross-platform replication of logical units
CN102281312A (en) Data loading method and system and data processing method and system
Won et al. Moving metadata from ad hoc files to database tables for robust, highly available, and scalable HDFS
CN109413130A (en) A kind of cloud storage system
EP3349416B1 (en) Relationship chain processing method and system, and storage medium
CN116389233A (en) Container cloud management platform active-standby switching system, method and device and computer equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2015-11-04