CN104408047A - Method for uploading text file to HDFS (hadoop distributed file system) in multi-machine parallel mode based on NFS (network file system) file server - Google Patents
- Publication number: CN104408047A
- Application number: CN201410584207.8A
- Authority: CN (China)
- Prior art keywords: node, file, uploading, hdfs, uploaded
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G: PHYSICS
- G06: COMPUTING; CALCULATING OR COUNTING
- G06F: ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10: File systems; File servers
- G06F16/18: File system types
- G06F16/182: Distributed file systems
- G06F16/1824: Distributed file systems implemented using Network-attached Storage [NAS] architecture
- G06F16/183: Provision of network file services by network file servers, e.g. by using NFS, CIFS
- G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601: Interfaces specially adapted for storage systems
- G06F3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061: Improving I/O performance
Abstract
The invention provides a method for uploading text files to HDFS (Hadoop Distributed File System) in parallel from multiple machines, based on an NFS (Network File System) file server. The method comprises the following steps: N hosts are selected from an HDFS cluster, any one of which serves as the master node while the other N-1 nodes serve as slave nodes; on the master node, the files in the to-be-uploaded directory of the NFS file server are obtained; each file is then uploaded in parallel, that is, all N hosts take part in the upload, with each host responsible for uploading one contiguous data block of 1/N of the file's size. The upload thereby proceeds in parallel and the upload speed is increased.
Description
Technical field
The present invention relates to the technical field of big data storage, and specifically to a method for uploading text files to HDFS in parallel from multiple machines, based on an NFS file server.
Background art
With the development of computer networks, the era of massive data has arrived. The Internet Data Center (IDC) predicts that global data usage will grow 44-fold by 2020, reaching 35.2 ZB.
Conventional technologies (including traditional relational databases) cannot cope with storing, analyzing, managing, and mining data sets of this size, and how best to analyze and understand this data is an urgent task facing everyone. Among existing technologies and tools, the most mature and most successful big data solution is the Hadoop file storage and computing framework, together with the components built on top of it. For the large volume of text files generated every day, the problem currently faced is how to upload them quickly to HDFS for subsequent processing. To solve this problem, this document proposes a method for uploading text files to HDFS in parallel from multiple machines, based on an NFS file server.
HDFS uses a three-replica mechanism by default. When a user writes data to HDFS through a client and a DataNode runs on that client machine, the NameNode prefers to store one replica of the written data on the client's own DataNode; the other two replicas are stored on other DataNodes in the cluster. Consequently, if only one client is writing, only 3 DataNodes in the whole cluster are working while the others sit idle, and the performance of the cluster as a whole is not exploited.
Summary of the invention
The object of the invention is to provide a method for uploading text files to HDFS in parallel from multiple machines, based on an NFS file server.
The object of the invention is achieved as follows. N hosts are selected in the HDFS cluster; any one node is chosen as the master node, and the other N-1 nodes serve as slave nodes. On the master node, the files under the to-be-uploaded directory of the NFS file server are obtained. Each file is uploaded in parallel, that is, all N hosts take part in the upload, with each host responsible for uploading one contiguous data block of 1/N of the file's size. The upload thereby proceeds in parallel and the upload speed is increased. The specific steps are:
1) The MainPut program on the master node computes, for each of the N nodes, the start and end byte offsets of the data block that node is to upload, and starts the BlockPut program on the N nodes so that they upload in parallel. On the first run, an executable program BlockPut, responsible for uploading the data block assigned to its node, is installed on each node; the master node then issues a command to each slave node to start BlockPut.
2) On each node, the BlockPut program uploads its data block to HDFS: BlockPut opens an input stream (InputStream) on the file to be uploaded, positions the InputStream at the start byte, then creates an independent file on HDFS and writes the bytes between the start and end offsets into that file.
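The offset calculation in step 1) can be sketched as follows. This is a minimal illustration, not the patent's MainPut program: the function name `block_ranges` and the rule for distributing the remainder when the file size is not divisible by N are assumptions, since the source only states that each host uploads a contiguous block of 1/N of the file.

```python
def block_ranges(file_size: int, n: int) -> list[tuple[int, int]]:
    """Split [0, file_size) into n contiguous half-open byte ranges.

    Hypothetical helper: the source does not say how a remainder is
    distributed, so boundaries are simply placed at file_size * i // n.
    The ranges are plain byte ranges; the source does not describe
    aligning them to line boundaries.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    # n + 1 boundaries; consecutive pairs form the (start, end) ranges.
    bounds = [file_size * i // n for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # One (start, end) pair per node; node i uploads bytes start..end-1.
    for node, (start, end) in enumerate(block_ranges(10, 3)):
        print(node, start, end)
```

Because each range ends where the next begins, the N blocks cover the file exactly once, which is what allows every node to read its part independently.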
The to-be-uploaded directory is mounted at the same default directory on all N nodes.
N is not greater than the number of parallel clients at which the NFS file server reaches its maximum read bandwidth.
The beneficial effects of the invention are as follows: N nodes in the cluster are explicitly chosen as clients, and a file is divided into N data blocks that are uploaded simultaneously. Each client is responsible for one block, and each block is stored on HDFS as an independent file, so the performance of the whole cluster can be used to the fullest. Uploading a text file in parallel blocks in this way brings the cluster's performance into full play and improves upload efficiency.
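The overall effect, one file split into N blocks that are written simultaneously as N independent files, can be illustrated with the sketch below. It runs on a single machine under loud assumptions: threads stand in for the N cluster nodes, a local directory stands in for HDFS, and the names `copy_range`, `parallel_upload` and the `.part-0000` naming scheme are hypothetical and do not come from the source.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_range(src: Path, start: int, end: int, dest: Path) -> Path:
    """Write bytes [start, end) of src into a new file dest."""
    # "xb" creates the file exclusively, so each block is its own new file.
    with src.open("rb") as fin, dest.open("xb") as fout:
        fin.seek(start)
        # Reads the whole block into memory; acceptable for a small sketch.
        fout.write(fin.read(end - start))
    return dest


def parallel_upload(src: Path, dest_dir: Path, n: int) -> list[Path]:
    """Copy src into n part files concurrently (threads stand in for nodes)."""
    size = src.stat().st_size
    bounds = [size * i // n for i in range(n + 1)]
    dest_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [
            pool.submit(
                copy_range,
                src,
                bounds[i],
                bounds[i + 1],
                dest_dir / f"{src.name}.part-{i:04d}",  # hypothetical naming
            )
            for i in range(n)
        ]
        # Results are collected in block order, not completion order.
        return [f.result() for f in futures]
```

Concatenating the returned part files in order reproduces the original file, which is a simple way to check that no byte was dropped or duplicated.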
Brief description of the drawings
Fig. 1 is a framework diagram of the multi-machine parallel upload process.
Detailed description
The method of the invention is described in detail below with reference to the accompanying drawing.
N hosts are selected in the HDFS cluster; any one node is chosen as the master node, and the other N-1 nodes serve as slave nodes. On the master node, the files under the to-be-uploaded directory of the NFS file server are obtained. Each file is uploaded in parallel, that is, all N hosts take part in the upload, with each host responsible for uploading one contiguous data block of 1/N of the file's size, so that the upload proceeds in parallel and the upload speed is increased. The overall flow of the method of the invention is:
1) The MainPut program on the master node computes, for each of the N nodes, the start and end byte offsets of the data block that node is to upload, and starts the BlockPut program on the N nodes so that they upload in parallel. On the first run, an executable program BlockPut, responsible for uploading the data block assigned to its node, is installed on each node; the master node then issues a command to each slave node to start BlockPut.
2) On each node, the BlockPut program uploads its data block to HDFS: BlockPut opens an input stream (InputStream) on the file to be uploaded, positions the InputStream at the start byte, then creates an independent file on HDFS and writes the bytes between the start and end offsets into that file.
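Step 2) can be sketched as below. This is an illustration under stated assumptions rather than the actual BlockPut program: the function name `block_put`, its parameters and the chunk size are hypothetical, and an ordinary local file stands in for the independent HDFS file, because the source does not specify which HDFS client calls are used.

```python
CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def block_put(src_path: str, start: int, end: int, dest_path: str,
              chunk_size: int = CHUNK_SIZE) -> int:
    """Copy bytes [start, end) of src_path into a new file at dest_path.

    src_path would be the file under the NFS mount; dest_path stands in
    for the independent file created on HDFS (assumption: local file).
    Returns the number of bytes actually written.
    """
    remaining = end - start
    written = 0
    # "xb" fails if dest_path already exists, mirroring "create a new file".
    with open(src_path, "rb") as src, open(dest_path, "xb") as dst:
        src.seek(start)  # position the input stream at the start byte
        while remaining > 0:
            data = src.read(min(chunk_size, remaining))
            if not data:  # source ended before the expected end offset
                break
            dst.write(data)
            written += len(data)
            remaining -= len(data)
    return written
```

Reading in bounded chunks keeps memory use constant regardless of block size, which matters when each 1/N block of a large text file is itself large.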
Apart from the technical features described in the specification, everything else is known technology to those skilled in the art.
Claims (3)
1. A method for uploading text files to HDFS in parallel from multiple machines based on an NFS file server, characterized in that N hosts are selected in an HDFS cluster, any one node is chosen as the master node, and the other N-1 nodes serve as slave nodes; on the master node, the files under the to-be-uploaded directory of the NFS file server are obtained; each file is uploaded in parallel, that is, all N hosts take part in the upload, each host being responsible for uploading one contiguous data block of 1/N of the file's size, so that the upload proceeds in parallel and the upload speed is increased; the specific steps are:
1) the MainPut program on the master node computes, for each of the N nodes, the start and end byte offsets of the data block that node is to upload, and starts the BlockPut program on the N nodes so that they upload in parallel; on the first run, an executable program BlockPut, responsible for uploading the data block assigned to its node, is installed on each node, and the master node then issues a command to each slave node to start BlockPut;
2) on each node, the BlockPut program uploads its data block to HDFS: BlockPut opens an input stream (InputStream) on the file to be uploaded, positions the InputStream at the start byte, then creates an independent file on HDFS and writes the bytes between the start and end offsets into that file.
2. The method according to claim 1, characterized in that the to-be-uploaded directory is mounted at the same default directory on all N nodes.
3. The method according to claim 1, characterized in that N is not greater than the number of parallel clients at which the NFS file server reaches its maximum read bandwidth.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410584207.8A CN104408047A (en) | 2014-10-28 | 2014-10-28 | Method for uploading text file to HDFS (hadoop distributed file system) in multi-machine parallel mode based on NFS (network file system) file server |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104408047A (en) | 2015-03-11 |
Family
ID=52645679
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410584207.8A (Pending) CN104408047A (en) | Method for uploading text file to HDFS (hadoop distributed file system) in multi-machine parallel mode based on NFS (network file system) file server | 2014-10-28 | 2014-10-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104408047A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030110237A1 (en) * | 2001-12-06 | 2003-06-12 | Hitachi, Ltd. | Methods of migrating data between storage apparatuses |
CN101227460A (en) * | 2007-01-19 | 2008-07-23 | 秦晨 | Method for uploading and downloading distributed document and apparatus and system thereof |
CN103530388A (en) * | 2013-10-22 | 2014-01-22 | 浪潮电子信息产业股份有限公司 | Performance improving data processing method in cloud storage system |
CN103544285A (en) * | 2013-10-28 | 2014-01-29 | 华为技术有限公司 | Data loading method and device |
CN103970881A (en) * | 2014-05-16 | 2014-08-06 | 浪潮(北京)电子信息产业有限公司 | Method and system for achieving file uploading |
CN103971066A (en) * | 2014-05-20 | 2014-08-06 | 浪潮电子信息产业股份有限公司 | Verification method for integrity of big data migration in HDFS |
Non-Patent Citations (1)
Title |
---|
杨锋 等: "基于Hadoop 的海量农业数据资源管理平台", 《计算机工程》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105357280A (en) * | 2015-10-19 | 2016-02-24 | 福建新大陆软件工程有限公司 | Hadoop distributed file system (HDFS) based file tracing file transfer protocol (FTP) system |
CN105357280B (en) * | 2015-10-19 | 2019-02-19 | 福建新大陆软件工程有限公司 | A kind of file based on HDFS is traced to the source FTP system |
CN105357317A (en) * | 2015-12-07 | 2016-02-24 | 金蝶软件(中国)有限公司 | Data uploading method and system based on multi-client polling queuing |
CN105357317B (en) * | 2015-12-07 | 2019-06-07 | 金蝶软件(中国)有限公司 | A kind of data uploading method and system based on multi-client repeating query queuing |
CN105610899A (en) * | 2015-12-10 | 2016-05-25 | 浪潮(北京)电子信息产业有限公司 | Text file parallel uploading method and device |
CN105610899B (en) * | 2015-12-10 | 2019-09-24 | 浪潮(北京)电子信息产业有限公司 | A kind of parallel method for uploading of text file and device |
CN106339473A (en) * | 2016-08-29 | 2017-01-18 | 北京百度网讯科技有限公司 | Method and device for copying file |
CN108280214A (en) * | 2017-02-02 | 2018-07-13 | 马志强 | Quick I/O systems applied to distributed genetic group analysis |
CN107800691A (en) * | 2017-10-12 | 2018-03-13 | 云巅(上海)网络科技有限公司 | The system and method for building application program on demand and accessing data trnascription is realized based on distributed storage mechanism |
CN109325002A (en) * | 2018-09-03 | 2019-02-12 | 北京京东金融科技控股有限公司 | Text file processing method, device, system, electronic equipment, storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20150311 |