CN103617177A - Stackable repeating data deletion file system - Google Patents
- Publication number
- CN103617177A (application number CN201310541623.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- file system
- deduplication
- service module
- storage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Storage Device Security (AREA)
Abstract
A stacked data deduplication file system is proposed. It comprises a file system service module and a deduplication service module. For normal data, the file system service module imports the data of the underlying file system into this file system through direct interface conversion; for data that has been deduplicated, it reads the corresponding data attribute identifier and redirects the IO flow, giving transparent and seamless access to deduplicated data. The deduplication service module reads the file system log data exported by the file system service module, parses the log content, computes data signatures, detects and deletes duplicate data, and marks the data once deduplication is complete. The system makes full use of the storage capacity of existing storage systems and requires no hardware upgrade, which saves investment as far as possible. Through its stacked software design it provides deduplication on top of an existing file system, optimizes the data storage structure, and reduces the space occupied by the storage system.
Description
Technical field
The invention relates to the field of computer storage, and in particular to a data deduplication file system implemented with stacked file system technology.
Background
In large-scale storage systems, the tension between rapid data growth and the comparatively slow upgrade of storage devices is acute. To ease the space growth of storage systems, shrink the space occupied by data, reduce costs, and make maximum use of existing resources, data deduplication has become an indispensable key technology in large systems.
By using data deduplication, users can obtain a marked data reduction, which can greatly reduce the bandwidth requirements of the storage system and lower operating and maintenance costs. Data reduction greatly shrinks the storage capacity actually needed at the back end, which makes storage management simpler and effectively reduces management costs.
However, most currently popular deduplication solutions target nearline storage and backup storage, and they are often tightly coupled to a backup system, so they cannot provide general-purpose file system services. Few products can provide deduplication directly in an online system, and all of them require a proprietary file system format. These proprietary file systems often have many limitations in performance, functionality, reliability, and scalability, which makes them difficult to apply directly in large online storage systems.
Existing large-scale storage systems are usually built on mature file systems such as ext3, ext4, xfs, and lustre. These file systems do not themselves provide deduplication. Using deduplication therefore means adopting a proprietary file system, tolerating a clearly perceptible drop in performance, and carrying out a large-scale data migration. This brings extremely high costs in time and space, and in a storage system that already holds a large amount of data it is essentially infeasible.
In view of this situation, the present invention designs a stacked data deduplication file system that provides deduplication on top of an existing mature file system, fully preserves the performance of the original storage system, and requires almost no data migration.
Summary of the invention
The present invention designs and implements a stacked data deduplication file system that makes full use of the storage capacity of existing storage systems and requires no hardware upgrade, which saves investment as far as possible. Through its stacked software design it provides deduplication on top of an existing file system, optimizes the data storage structure, and reduces the space occupied by the storage system.
The system includes:
A file system service module. For normal data, it imports the data of the underlying file system into this file system through direct interface conversion; for data that has been deduplicated, it reads the corresponding data attribute identifier and redirects the IO flow, giving transparent and seamless access to deduplicated data.
A deduplication service module. It reads the file system log data exported by the file system service module, parses the log content, computes data signatures, detects and deletes duplicate data, and marks the data once deduplication is complete.
The beneficial effects of the present invention are as follows. Because the design is based on a stacked file system, it can make full use of an existing storage system: simply installing the software system described in this patent enables the existing file system to support deduplication and thereby save storage space. No data migration is needed, the IO performance of the original storage system is preserved, and existing equipment is fully reused and the investment protected.
Description of drawings
Figure 1 is a schematic diagram of the architecture of the stacked data deduplication file system proposed in this patent.
Detailed description
With reference to Figure 1, the process of implementing this architecture is described below through a concrete example.
As described in the summary of the invention, the architecture of the present invention mainly comprises a file system service module and a deduplication service module.
The file system service module implements a file system with full POSIX support. It follows a stacked file system design: by mapping and rewriting calls at the file system interface layer, it exposes the complete set of services of the underlying file system. For normal data, the module imports the data of the underlying file system into this file system through direct interface conversion, so normal data is accessed seamlessly. For data that has been deduplicated, the module follows the conventions of the file system described in this invention: it reads the corresponding data attribute identifier and redirects the IO flow, giving transparent and seamless access to deduplicated data.
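The pass-through and redirection behaviour described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the patent does not specify how the data attribute identifier is stored or at what granularity deduplication works, so the sketch uses a hypothetical `markers` table that maps a deduplicated path to the path of the retained copy, works on whole files, and uses a plain dictionary in place of the underlying file system.

```python
class StackedFS:
    """Toy stacked layer over a dict that stands in for the underlying file system."""

    def __init__(self, backing, markers=None):
        self.backing = backing          # path -> bytes held by the underlying file system
        self.markers = markers or {}    # hypothetical marker: path -> path of retained copy

    def read(self, path):
        # Normal file: pass straight through. Deduplicated file: follow the marker.
        target = self.markers.get(path, path)
        return self.backing[target]

    def write(self, path, data):
        # Files that still point at this path get their own copy before it changes.
        for dup, target in list(self.markers.items()):
            if target == path:
                self.backing[dup] = self.backing[path]
                del self.markers[dup]
        # A write gives the file its own content again, so drop its marker if any.
        self.markers.pop(path, None)
        self.backing[path] = data


backing = {"/a.bin": b"payload", "/b.bin": b""}
fs = StackedFS(backing, markers={"/b.bin": "/a.bin"})
print(fs.read("/b.bin"))  # served from the retained copy of /a.bin
```

The copy-before-overwrite step in `write` is a design choice of this sketch, not something the patent describes; some such rule is needed so that changing the retained copy does not silently change the files that point at it.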
The deduplication service module runs independently and out of band. It is multithreaded so that it can exploit the parallel computing capability of multi-core systems and deduplicate at high speed. The module reads the file system log data exported by the file system service module, parses the log content, computes data signatures, detects and deletes duplicate data, and marks the data once deduplication is complete. The module can run at the same time as the file system service module; fine-grained locks designed into the file system service module guarantee the atomicity of data processing and provide reliable parallel data processing.
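As a rough illustration of this pipeline, the sketch below parses a log, computes signatures in a thread pool, and replaces duplicates with markers while holding a per-file lock. The log format (`<OP> <path>`), the use of SHA-256 as the signature, whole-file granularity, and the names `parse_log` and `deduplicate` are all assumptions; the patent names none of them. The sketch also treats equal signatures as equal content, which a production system might verify byte by byte.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


def parse_log(lines):
    """Return written paths in first-seen order (hypothetical '<OP> <path>' format)."""
    seen = []
    for line in lines:
        op, _, path = line.strip().partition(" ")
        if op == "WRITE" and path and path not in seen:
            seen.append(path)
    return seen


def deduplicate(log_lines, files, workers=4):
    """Deduplicate `files` (path -> bytes) in place; return markers (path -> retained path)."""
    paths = [p for p in parse_log(log_lines) if p in files]
    locks = {p: threading.Lock() for p in paths}  # one fine-grained lock per file

    def sign(path):
        with locks[path]:  # hash a consistent snapshot of the file
            return path, hashlib.sha256(files[path]).hexdigest()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        signed = list(pool.map(sign, paths))  # signatures computed in parallel

    index, markers = {}, {}
    for path, sig in signed:
        retained = index.setdefault(sig, path)  # first file with this signature is kept
        if retained != path:
            with locks[path]:
                # Re-check under the lock: the file may have changed since signing.
                if hashlib.sha256(files[path]).hexdigest() == sig:
                    files[path] = b""          # delete the duplicate data
                    markers[path] = retained   # mark the file as deduplicated
    return markers


files = {"/a": b"x" * 10, "/b": b"x" * 10, "/c": b"y"}
log = ["WRITE /a", "READ /c", "WRITE /b", "WRITE /c", "WRITE /a"]
markers = deduplicate(log, files)
```

In this toy setup the locks protect only the deduplication worker itself; in the system the patent describes, the same locks would also be taken by the file system service module on the IO path.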
In a typical configuration, the file system service module and the deduplication service module are installed on the host system as ordinary application software. Once the relevant software configuration is done, both modules can be started; at that point the file system described in this invention can be mounted on the host and data can be accessed. After a period of file system IO has completed, the deduplication service module automatically computes data signatures, detects and deletes duplicate data according to the configuration parameters, and marks the deduplicated data.
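The patent does not list the configuration parameters that steer detection and deletion. Purely as an illustration, the Python sketch below assumes two hypothetical ones, a minimum file size and an idle period since the last write, and shows how such parameters could decide whether a file is a deduplication candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DedupConfig:
    min_size: int = 4096          # hypothetical: skip files smaller than this many bytes
    idle_seconds: float = 300.0   # hypothetical: wait this long after the last write


def eligible(size, last_write, now, cfg):
    """Return True if a file of `size` bytes, last written at `last_write`, may be deduplicated."""
    return size >= cfg.min_size and (now - last_write) >= cfg.idle_seconds


cfg = DedupConfig()
print(eligible(size=8192, last_write=0.0, now=600.0, cfg=cfg))
```

An idle period of this kind is one way to model "after a period of file system IO has completed"; other policies, such as fixed scan intervals, would fit the description equally well.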
At this point the entire stacked data deduplication file system has been implemented. It provides a high-performance deduplication service on top of an existing file system, greatly improves the space utilization of the storage system, and effectively protects the customer's investment.
The present invention may of course have many other embodiments. Without departing from the spirit and essence of the invention, those skilled in the art may make various corresponding changes and modifications based on it, and all such changes and modifications fall within the scope of protection of the claims of the present invention.
Claims (1)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310541623.5A CN103617177A (en) | 2013-11-05 | 2013-11-05 | Stackable repeating data deletion file system |
| PCT/CN2014/089303 WO2015067128A1 (en) | 2013-11-05 | 2014-10-23 | Stackable data duplication file system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310541623.5A CN103617177A (en) | 2013-11-05 | 2013-11-05 | Stackable repeating data deletion file system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN103617177A (en) | 2014-03-05 |
Family
ID=50167880
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201310541623.5A Pending CN103617177A (en) | 2013-11-05 | 2013-11-05 | Stackable repeating data deletion file system |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN103617177A (en) |
| WO (1) | WO2015067128A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100082700A1 (en) * | 2008-09-22 | 2010-04-01 | Riverbed Technology, Inc. | Storage system for data virtualization and deduplication |
| US20100082547A1 (en) * | 2008-09-22 | 2010-04-01 | Riverbed Technology, Inc. | Log Structured Content Addressable Deduplicating Storage |
| CN101908073A (en) * | 2010-08-13 | 2010-12-08 | 清华大学 | A method for real-time deletion of duplicate data in a file system |
| CN103051671A (en) * | 2012-11-22 | 2013-04-17 | 浪潮电子信息产业股份有限公司 | Repeating data deletion method for cluster file system |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB0104227D0 (en) * | 2001-02-21 | 2001-04-11 | Ibm | Information component based data storage and management |
| CN103279502B (en) * | 2013-05-06 | 2016-01-20 | 北京赛思信安技术有限公司 | A kind of framework and method with the data de-duplication file system be combined with parallel file system |
| CN103617177A (en) * | 2013-11-05 | 2014-03-05 | 浪潮(北京)电子信息产业有限公司 | Stackable repeating data deletion file system |
- 2013-11-05: CN201310541623.5A, published as CN103617177A, status: active, pending
- 2014-10-23: PCT/CN2014/089303, published as WO2015067128A1, status: active, application filing
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2015067128A1 (en) * | 2013-11-05 | 2015-05-14 | 浪潮(北京)电子信息产业有限公司 | Stackable data duplication file system |
| CN104133888A (en) * | 2014-07-30 | 2014-11-05 | 宇龙计算机通信科技(深圳)有限公司 | Multi-system data processing method, device and terminal |
| CN104133888B (en) * | 2014-07-30 | 2019-08-02 | 宇龙计算机通信科技(深圳)有限公司 | A kind of multisystem data processing method, device and terminal |
| CN104391915A (en) * | 2014-11-19 | 2015-03-04 | 湖南国科微电子有限公司 | Duplicated data delete method |
| CN104391915B (en) * | 2014-11-19 | 2016-02-24 | 湖南国科微电子股份有限公司 | A kind of data heavily delete method |
| CN105205094A (en) * | 2015-08-12 | 2015-12-30 | 浪潮(北京)电子信息产业有限公司 | Multi-control share storage system |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2015067128A1 (en) | 2015-05-14 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20140305 |
