CN106547910A - Method for implementing multi-level file storage in MooseFS based on Tachyon - Google Patents

Method for implementing multi-level file storage in MooseFS based on Tachyon

Info

Publication number
CN106547910A
CN106547910A CN201611053775.0A CN201611053775A CN106547910A CN 106547910 A CN106547910 A CN 106547910A CN 201611053775 A CN201611053775 A CN 201611053775A CN 106547910 A CN106547910 A CN 106547910A
Authority
CN
China
Prior art keywords
tachyon
moosefs
file
bin
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611053775.0A
Other languages
Chinese (zh)
Inventor
葛天成
曹苗苗
宋育千
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Business System Co Ltd
Original Assignee
Shandong Inspur Business System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Business System Co Ltd filed Critical Shandong Inspur Business System Co Ltd
Priority to CN201611053775.0A priority Critical patent/CN106547910A/en
Publication of CN106547910A publication Critical patent/CN106547910A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0875Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method for implementing multi-level file storage in MooseFS based on Tachyon, and belongs to the fields of cloud storage and big data. By introducing the Tachyon in-memory distributed file system into the MooseFS file system, files held in Tachyon can be accessed across the cluster at memory speed. Tachyon is a middleware layer that sits between the underlying distributed file storage and the various computing frameworks above it. Its main responsibility is to keep files that do not need to be written to the distributed disk file system in the distributed in-memory file system instead, so that they are shared in memory and efficiency improves. It can also reduce memory redundancy and GC time, and it integrates easily with the Hadoop MapReduce and Spark computing models.

Description

Method for implementing multi-level file storage in MooseFS based on Tachyon
Technical field
The present invention relates to the fields of cloud storage and big data, and more particularly to a method for implementing multi-level file storage in MooseFS based on Tachyon.
Background
MooseFS (MFS) is a networked distributed file system. It spreads data across multiple servers, but to the user it appears as a single resource. Like other Unix-like file systems, MFS has a hierarchical structure (a directory tree), stores file attributes (permissions, last access and modification times), and supports special files (block devices, character devices, pipes, sockets), symbolic links, and hard links.
Advantages of MooseFS:
1. Capacity can be expanded online, and the architecture is highly scalable.
2. Deployment is simple. It is a general-purpose file system, so upper-layer applications can use it without modification (through FUSE support). Some early kernel versions, such as 5.4, may need the fuse module to be added.
3. The architecture is highly available: no component other than the master is a single point of failure.
4. File objects are highly available: an arbitrary redundancy level (for example, three copies) can be set per file without affecting read or write performance.
5. It provides a trash feature similar to the Windows Recycle Bin.
6. It provides garbage collection (GC) similar to that of the Java language.
7. It provides snapshots like those of commercial storage systems from NetApp, EMC, and IBM.
8. It provides a web GUI for monitoring.
MooseFS is a mature distributed file storage system known for its stability. Despite the advantages above, its read/write performance has always been mediocre and is well below that of distributed file systems such as HDFS and GlusterFS.
Shortcomings of MooseFS:
1. The MooseFS instance that holds input and output data resides in a remote storage cluster. Network latency between the local computing cluster and the remote storage cluster is high, so frequent remote data exchange becomes a major bottleneck for the whole stream-processing pipeline.
2. MooseFS was designed around disks, so its I/O performance, write performance in particular, struggles to meet the latency requirements of stream computing.
Summary of the invention
To solve the problems above, the present invention proposes a method for implementing multi-level file storage in MooseFS based on Tachyon.
In this method, Tachyon is introduced into MooseFS and used as a memory-centric virtual distributed storage system. It connects diverse upper-layer computing frameworks with the underlying storage system and unifies the data access mode, which improves MooseFS read/write performance and allows integration with mature computing frameworks such as Spark and MapReduce.
MooseFS is implemented on top of FUSE. Once a mount point has been set up on the MFS client, Tachyon can operate on MooseFS just as it would on an ordinary Linux file system.
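Because the method depends on the MFS client mount point already being in place, a deployment script might check for it before starting Tachyon. The following is a minimal sketch, not part of the patent: the helper name `is_fuse_mount` and the path `/mnt/mfs` are hypothetical, and it assumes a Linux-style mounts table in which the MooseFS client mount reports a file system type beginning with `fuse`.

```shell
#!/bin/sh
# is_fuse_mount MOUNT_POINT [MOUNTS_FILE]
# Hypothetical helper: succeeds if MOUNT_POINT appears in the mounts table
# (default /proc/mounts) with a file system type that starts with "fuse".
is_fuse_mount() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    # Field 2 is the mount point and field 3 the file system type.
    awk -v mp="$mount_point" \
        '$2 == mp && $3 ~ /^fuse/ { found = 1 } END { exit !found }' \
        "$mounts_file"
}

# Example use (the path is an assumption, not taken from the patent):
#   if is_fuse_mount /mnt/mfs; then
#       ls /mnt/mfs    # ordinary file operations work on the mount
#   fi
```

Taking the mounts table as an optional second argument keeps the check easy to exercise against a sample file without a real MooseFS cluster.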
The steps are as follows:
1) Download and install Tachyon
$ wget http://tachyon-project.org/downloads/tachyon-0.3.0-bin.tar.gz
$ tar xvfz tachyon-0.3.0-bin.tar.gz
2) Configure the local file system
Copy the template to conf/tachyon-env.sh, then set the TACHYON_UNDERFS_ADDRESS parameter in that file:
$ cp conf/tachyon-env.sh.template conf/tachyon-env.sh
export TACHYON_UNDERFS_ADDRESS=/home/smallb/tachyon-0.3.0/tmp
3) Format and start
$ ./bin/format.sh
$ ./bin/start.sh local
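Step 2 can be scripted. The sketch below is an illustration under stated assumptions rather than the patent's own procedure: the function name `set_underfs` is hypothetical, it assumes the template already contains a line beginning with `export TACHYON_UNDERFS_ADDRESS=`, and the directory `/mnt/mfs/tachyon` in the usage comment stands for a directory under a MooseFS mount point (the patent's own example uses `/home/smallb/tachyon-0.3.0/tmp`).

```shell
#!/bin/sh
# set_underfs CONF_DIR UNDERFS_PATH
# Hypothetical helper: creates CONF_DIR/tachyon-env.sh from the template if
# needed, then rewrites the TACHYON_UNDERFS_ADDRESS line in place so that
# later lines in the file that reference the variable still see the new value.
# UNDERFS_PATH must not contain "|" or "&", which are special to the sed call.
set_underfs() {
    conf_dir=$1
    underfs=$2
    env_file="$conf_dir/tachyon-env.sh"

    [ -f "$env_file" ] || cp "$conf_dir/tachyon-env.sh.template" "$env_file" || return 1

    # Fail loudly instead of silently leaving the file unchanged.
    if ! grep -q '^export TACHYON_UNDERFS_ADDRESS=' "$env_file"; then
        echo "no TACHYON_UNDERFS_ADDRESS line in $env_file" >&2
        return 1
    fi

    sed "s|^export TACHYON_UNDERFS_ADDRESS=.*|export TACHYON_UNDERFS_ADDRESS=$underfs|" \
        "$env_file" > "$env_file.tmp" && mv "$env_file.tmp" "$env_file"
}

# Example use (the path is an assumption):
#   set_underfs conf /mnt/mfs/tachyon
```

The line is replaced where it stands rather than appended at the end of the file, so any settings defined further down that expand the variable are not evaluated with an empty value.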
Tachyon's memory-centric storage makes data access by upper-layer applications several orders of magnitude faster than with existing conventional schemes.
The main benefits Tachyon brings to the overall stream-processing system are:
1. Tachyon's tiered storage makes combined use of memory, SSD, and disk. The cache policies Tachyon provides, such as LRU and LFU, keep hot data in memory, while cold data is persisted to level-2 or even level-3 storage devices; MooseFS serves as the long-term file backup system.
2. Tachyon's support for multiple computing frameworks allows data to be shared between frameworks such as Spark and Zeppelin through Tachyon at memory-level transfer rates. In addition, we plan to migrate Flink and Presto workloads to Tachyon.
3. Tachyon's namespace feature makes the remote MooseFS underlying storage system easy to manage and presents a unified namespace to the upper layers, so computing frameworks and applications can access data from different sources uniformly through Tachyon.
4. The easy-to-use APIs Tachyon provides lower the learning cost for users, make it convenient to migrate the existing system to Tachyon, and make tuning and verification much easier.
5. Tachyon integrates closely with Spark. In Spark Streaming we store key data in Tachyon rather than in the JVM of the Spark executor. Because the storage location is still local memory, this does not slow down data processing and in fact reduces Java GC overhead. It also avoids out-of-memory errors caused by redundant copies of data blocks on the same node. We also store the intermediate results computed by Spark Streaming, that is, RDD checkpoints, in Tachyon.
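The hot/cold split in item 1 can be illustrated outside Tachyon with a much simpler, age-based rule. The sketch below is only an analogy and is not how Tachyon evicts data: it uses file access time as a stand-in for an LRU policy, the function name `demote_cold` is hypothetical, and the directories in the usage comment (a memory-backed directory and a directory on a MooseFS mount) are assumptions.

```shell
#!/bin/sh
# demote_cold HOT_DIR COLD_DIR DAYS
# Hypothetical helper: moves regular files at the top level of HOT_DIR that
# find's -atime test considers older than DAYS days into COLD_DIR.
# File names are kept as-is, so an existing file of the same name in COLD_DIR
# would be overwritten.
demote_cold() {
    hot_dir=$1
    cold_dir=$2
    days=$3
    mkdir -p "$cold_dir" || return 1
    find "$hot_dir" -maxdepth 1 -type f -atime +"$days" \
        -exec mv {} "$cold_dir"/ \;
}

# Example use (both paths are assumptions):
#   demote_cold /dev/shm/hot /mnt/mfs/cold 7
```

A real tiered store tracks access recency per block and evicts under memory pressure; this sketch only shows the direction of data movement, from the fast tier to the long-term backing store.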
Brief description of the drawings
Fig. 1 is a topology diagram of the system of the present invention;
Fig. 2 is a diagram of how Tachyon works.
Detailed description
The present invention is described in more detail below.
By introducing the Tachyon in-memory distributed file system into the MooseFS file system, the present invention allows files held in Tachyon to be accessed across the cluster at memory speed. Tachyon is a middleware layer that sits between the underlying distributed file storage and the various computing frameworks above it. Its main responsibility is to keep files that do not need to be written to the distributed disk file system in the distributed in-memory file system instead, so that they are shared in memory and efficiency improves. It can also reduce memory redundancy and GC time, and it integrates easily with the Hadoop MapReduce and Spark computing models.
MooseFS is implemented on top of FUSE. Once a mount point has been set up on the MFS client, Tachyon can operate on MooseFS just as it would on an ordinary Linux file system, which greatly simplifies deployment.
The steps are described below.
1. Download and install Tachyon
$ wget http://tachyon-project.org/downloads/tachyon-0.3.0-bin.tar.gz
$ tar xvfz tachyon-0.3.0-bin.tar.gz
2. Configure the local file system
Copy the template to conf/tachyon-env.sh, then set the TACHYON_UNDERFS_ADDRESS parameter in that file, for example:
$ cp conf/tachyon-env.sh.template conf/tachyon-env.sh
export TACHYON_UNDERFS_ADDRESS=/home/smallb/tachyon-0.3.0/tmp
3. Format and start
$ ./bin/format.sh
$ ./bin/start.sh local

Claims (3)

1. A method for implementing multi-level file storage in MooseFS based on Tachyon, characterized in that Tachyon is introduced into MooseFS and used as a memory-centric virtual distributed storage system that connects diverse upper-layer computing frameworks with the underlying storage system and unifies the data access mode, thereby improving MooseFS read/write performance and allowing integration with mature computing frameworks such as Spark and MapReduce.
2. The method according to claim 1, characterized in that
MooseFS is implemented on top of FUSE, and once a mount point has been set up on the MFS client, Tachyon can operate on MooseFS just as it would on an ordinary Linux file system.
3. The method according to claim 2, characterized in that
the steps are as follows:
1) Download and install Tachyon
$ wget http://tachyon-project.org/downloads/tachyon-0.3.0-bin.tar.gz
$ tar xvfz tachyon-0.3.0-bin.tar.gz
2) Configure the local file system
Copy the template to conf/tachyon-env.sh, then set the TACHYON_UNDERFS_ADDRESS parameter in that file:
$ cp conf/tachyon-env.sh.template conf/tachyon-env.sh
export TACHYON_UNDERFS_ADDRESS=/home/smallb/tachyon-0.3.0/tmp
3) Format and start
$ ./bin/format.sh
$ ./bin/start.sh local
CN201611053775.0A 2016-11-25 2016-11-25 Moosefs realizes the multistage storage method of file based on Tachyon Pending CN106547910A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611053775.0A CN106547910A (en) 2016-11-25 2016-11-25 Moosefs realizes the multistage storage method of file based on Tachyon

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611053775.0A CN106547910A (en) 2016-11-25 2016-11-25 Moosefs realizes the multistage storage method of file based on Tachyon

Publications (1)

Publication Number Publication Date
CN106547910A true CN106547910A (en) 2017-03-29

Family

ID=58395189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611053775.0A Pending CN106547910A (en) 2016-11-25 2016-11-25 Moosefs realizes the multistage storage method of file based on Tachyon

Country Status (1)

Country Link
CN (1) CN106547910A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220348A (en) * 2017-05-27 2017-09-29 郑州云海信息技术有限公司 A kind of method of data capture based on Flume and Alluxio
CN107483571A (en) * 2017-08-08 2017-12-15 柏域信息科技(上海)有限公司 A kind of dynamic cloud storage method and system
CN109740765A (en) * 2019-01-31 2019-05-10 成都品果科技有限公司 A kind of machine learning system building method based on Amazon server
CN111736776A (en) * 2020-06-24 2020-10-02 杭州海康威视数字技术股份有限公司 Data storage and reading method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268252A (en) * 2013-05-12 2013-08-28 南京载玄信息科技有限公司 Virtualization platform system based on distributed storage and achieving method thereof
CN103747064A (en) * 2013-12-26 2014-04-23 广东中科遥感技术有限公司 Mounting method, client and system based on MooseFS distributed file system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHINE_FOREVER: "分析分布式文件系统MooseFS架构(基于HDFS架构思想)", 《HTTP://F.DATAGURU.CN/THREAD-35134-1-1.HTML》 *
YIRENBOY: "Spark入门实战系列--10.分布式内存文件系统Tachyon介绍及安装部署", 《HTTPS://BLOG.CSDN.NET/YIRENBOY/ARTICLE/DETAILS/48368455》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220348A (en) * 2017-05-27 2017-09-29 郑州云海信息技术有限公司 A kind of method of data capture based on Flume and Alluxio
CN107483571A (en) * 2017-08-08 2017-12-15 柏域信息科技(上海)有限公司 A kind of dynamic cloud storage method and system
CN109740765A (en) * 2019-01-31 2019-05-10 成都品果科技有限公司 A kind of machine learning system building method based on Amazon server
CN109740765B (en) * 2019-01-31 2023-05-02 成都品果科技有限公司 Machine learning system building method based on Amazon network server
CN111736776A (en) * 2020-06-24 2020-10-02 杭州海康威视数字技术股份有限公司 Data storage and reading method and device
CN111736776B (en) * 2020-06-24 2023-10-10 杭州海康威视数字技术股份有限公司 Data storage and reading method and device

Similar Documents

Publication Publication Date Title
US10929428B1 (en) Adaptive database replication for database copies
KR102457611B1 (en) Method and apparatus for tenant-aware storage sharing platform
US8516159B2 (en) Asynchronous file operations in a scalable multi-node file system cache for a remote cluster file system
US20210141917A1 (en) Low latency access to physical storage locations by implementing multiple levels of metadata
US9087066B2 (en) Virtual disk from network shares and file servers
US9268652B1 (en) Cached volumes at storage gateways
US9251003B1 (en) Database cache survivability across database failures
US10725666B2 (en) Memory-based on-demand data page generation
Stoyanov et al. Efficient live migration of linux containers
US9559889B1 (en) Cache population optimization for storage gateways
US20210344772A1 (en) Distributed database systems including callback techniques for cache of same
US9959074B1 (en) Asynchronous in-memory data backup system
KR102288503B1 (en) Apparatus and method for managing integrated storage
US20220114064A1 (en) Online restore for database engines
US11048591B1 (en) Efficient name space organization in a global name space cluster
US10747677B2 (en) Snapshot locking mechanism
CN106547910A (en) Moosefs realizes the multistage storage method of file based on Tachyon
NO326041B1 (en) Procedure for managing data storage in a system for searching and retrieving information
US11567680B2 (en) Method and system for dynamic storage scaling
US10298709B1 (en) Performance of Hadoop distributed file system operations in a non-native operating system
US11494301B2 (en) Storage system journal ownership mechanism
US11341163B1 (en) Multi-level replication filtering for a distributed database
US20230169093A1 (en) Fast database scaling utilizing a decoupled storage and compute architecture
US11556473B2 (en) Cache memory management
US11341001B1 (en) Unlimited database change capture for online database restores

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170329
