WO2013086969A1 - Method, device and system for finding duplicate data - Google Patents

Method, device and system for finding duplicate data

Info

Publication number
WO2013086969A1
Authority
WO
WIPO (PCT)
Prior art keywords
block data
fingerprint information
data
information
metadata
Prior art date
Application number
PCT/CN2012/086371
Other languages
English (en)
Chinese (zh)
Inventor
黄焰
谢勇
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2013086969A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1748 De-duplication implemented within the file system, e.g. based on file segments
    • G06F16/1752 De-duplication implemented within the file system, e.g. based on file segments based on file chunks

Definitions

  • Embodiments of the present invention relate to storage technologies, and in particular to a duplicate data search method, apparatus, and system.

Background

  • Deduplication, also known as smart compression or single-instance storage, automatically searches for duplicate data, keeps only one copy of identical data, and replaces the other copies with pointers to that single copy.
  • The search for duplicate data is therefore an important indicator of deduplication performance.
  • In the prior art, the file to be deduplicated is segmented based on its content to obtain first-level blocks, and each first-level block is then subdivided to obtain subdivided block data. One subdivided block is randomly sampled from the first-level block, its fingerprint information is calculated and used as the fingerprint information of the first-level block, and that fingerprint information determines the processing node of the first-level block. All subdivided blocks belonging to the first-level block are allocated to that node for processing, and each subdivided block is looked up for duplicate data in the database managed by the metadata server. If the same fingerprint information is not found, the subdivided block is considered not to be stored in the database, and its data is compressed and sent to the specified metadata server for storage; if the same fingerprint information is found, the subdivided block is considered to be already stored in the database, and the reference count of the subdivided block is updated.
  • The inventors found in their research that, when the prior art searches for duplicate data blocks and the subdivided block data is not stored in the database, the subdivided block data must be sent to the node that manages the database for storage. Even though the subdivided block data is compressed for transfer, once the amount of data is large the prior-art method still consumes considerable link overhead and reduces system performance.

Summary of the invention
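  • An embodiment of the present invention provides a duplicate data search method, including:
  • dividing a file into blocks and generating metadata information for each piece of block data, where the metadata information of the block data includes fingerprint information of the block data;
  • sending the fingerprint information of the block data to a metadata server according to that fingerprint information, so that the metadata server searches the database for whether the received fingerprint information is already stored in the database and returns a query result;
  • receiving the query result; when the query result indicates that the fingerprint information of the block data is not found in the database, storing the block data corresponding to the fingerprint information in a shared file system and instructing the metadata server to insert the metadata information corresponding to the fingerprint information into the database; when the query result indicates that the fingerprint information of the block data already exists in the database, instructing the metadata server to update the reference count of the corresponding block data in the database.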
  • An embodiment of the present invention further provides a duplicate data search device, including:
  • a data dividing module, configured to divide a file into blocks and generate metadata information for each piece of block data, where the metadata information of the block data includes fingerprint information of the block data;
  • a sending module, configured to send the fingerprint information of the block data to the metadata server according to that fingerprint information, so that the metadata server searches the database for whether the received fingerprint information is already stored in the database and returns a query result;
  • a query result processing module, configured to receive the query result; when the fingerprint information of the block data is not found in the database, store the block data corresponding to the fingerprint information in the shared file system and instruct the metadata server to insert the metadata information corresponding to the fingerprint information into the database; when the fingerprint information of the block data is found in the database, instruct the metadata server to update the reference count of the corresponding block data in the database.
  • An embodiment of the present invention further provides a duplicate data search system, including:
  • a duplicate data search device, configured to divide a file into blocks and generate metadata information for each piece of block data, where the metadata information of the block data includes fingerprint information of the block data; send the fingerprint information of the block data to the metadata server according to that fingerprint information for the duplicate data query; receive the query result returned by the metadata server; when the fingerprint information of the block data is not found in the database, store the block data corresponding to the fingerprint information in the shared file system and instruct the metadata server to insert the metadata information of the block data into the database; when the fingerprint information of the block data is found in the database, instruct the metadata server to update the reference count of the block data corresponding to the fingerprint information;
  • a metadata server, configured to receive the fingerprint information of the block data sent by the duplicate data search device, search the database for whether the received fingerprint information is already stored, and return a query result to the duplicate data search device; and to execute the instructions of the duplicate data search device to update the reference count of the block data corresponding to the searched fingerprint information and to insert the received metadata information into the database;
  • a database, configured to store metadata information of the block data, where the metadata information includes the fingerprint information of the block data and the number of times the block data is referenced.
  • In the embodiments of the present invention, the file to be deduplicated is divided into blocks, and the metadata server responsible for the query is determined by the fingerprint information of the block data. When the fingerprint information of the block data is found not to be stored in the database yet, the block data corresponding to that fingerprint information is sent to the shared file system rather than to the metadata server for storage.
  • Compared with the prior art, link overhead is greatly reduced and system performance is improved.
  • FIG. 1 is a flowchart of a duplicate data search method according to an embodiment of the present invention;
  • FIG. 2 is a flowchart of another duplicate data search method according to an embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of an embodiment of a duplicate data processing device according to the present invention;
  • FIG. 4 is a schematic structural diagram of another embodiment of a duplicate data processing device according to the present invention;
  • FIG. 5 is a schematic structural diagram of an embodiment of a duplicate data processing system according to the present invention.

Detailed description
  • The method in this embodiment may include:
  • Step 100: Divide the file into blocks and generate metadata information for each piece of block data, where the metadata information of the block data includes the fingerprint information of the block data;
  • Specifically, the file data is first divided into blocks, and the metadata information of each piece of block data, including its fingerprint information, is then calculated;
  • The metadata information of the block data may further include the serial number of the file to which the block data belongs in the shared file system, the starting position of the block data in that file, the block length, the number of the assigned processing node, block reference count information, and so on;
  • The fingerprint information may be computed with a hash algorithm, taking the hash value of each piece of block data as its fingerprint information; other algorithms such as MD5 may of course also be used, as long as a unique identifier of the block data can be obtained;
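Step 100 can be illustrated with a minimal sketch. It assumes fixed-size blocking and SHA-1 as the hash algorithm; the text prescribes neither, and the names `BlockMetadata`, `split_into_blocks` and `build_metadata` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BlockMetadata:
    fingerprint: bytes                 # hash value identifying the block content
    length: int                        # block length in bytes
    ref_count: int = 1                 # number of identical blocks this record stands for
    file_serial: Optional[int] = None  # shared-file-system file holding the block (set on store)
    offset: Optional[int] = None       # starting position inside that file (set on store)


def split_into_blocks(data: bytes, block_size: int = 4096) -> List[bytes]:
    """Fixed-size blocking (an assumption; the text does not fix a strategy)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def build_metadata(data: bytes, block_size: int = 4096) -> List[BlockMetadata]:
    """Generate one metadata record, including the fingerprint, per block."""
    return [
        BlockMetadata(fingerprint=hashlib.sha1(block).digest(), length=len(block))
        for block in split_into_blocks(data, block_size)
    ]
```

Identical blocks yield identical fingerprints, which is what the later lookup steps rely on.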
  • Step 102: Send the fingerprint information of the block data to the metadata server according to that fingerprint information, so that the metadata server searches the database for whether the received fingerprint information is already stored in the database and returns the query result;
  • The value range of the hash may be calculated in advance from the hash algorithm used and divided into m segments. Each metadata server corresponds to different hash segments, and each metadata server can correspond to m/n hash segments, where n is the number of metadata servers and m can usually be configured as an integer multiple of n.
  • For example, a common hash algorithm can generate a 20-byte hash value.
  • The last byte of the hash value is shifted to the right by 5 bits.
  • The result of this calculation takes one of 2^3 values, that is, 0 to 7, and can represent the hash segment.
  • It is then determined, according to the fingerprint information of the block data, which preset fingerprint information segment the fingerprint information falls into, and the fingerprint information of the block data is sent to the metadata server corresponding to that segment, so that the metadata server performs the further query.
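The segment calculation and routing described above can be sketched as follows. SHA-1 is assumed because it yields a 20-byte digest, and assigning consecutive segments to each server is also an assumption; the text only says each server corresponds to m/n segments. All function names are hypothetical.

```python
import hashlib

NUM_SEGMENTS = 8  # m = 2**3, from the top 3 bits of the last hash byte


def hash_segment(fingerprint: bytes) -> int:
    """Shift the last byte right by 5 bits, leaving a value in 0..7."""
    return fingerprint[-1] >> 5


def server_for_segment(segment: int, num_servers: int) -> int:
    """Give each of the n servers m/n consecutive segments (assumed layout)."""
    if NUM_SEGMENTS % num_servers != 0:
        raise ValueError("m should be an integer multiple of n")
    return segment // (NUM_SEGMENTS // num_servers)


def route(block: bytes, num_servers: int) -> int:
    """Return the index of the metadata server that handles this block."""
    fingerprint = hashlib.sha1(block).digest()
    return server_for_segment(hash_segment(fingerprint), num_servers)
```

With 4 servers, segments 0 and 1 go to server 0, segments 2 and 3 to server 1, and so on, so a given fingerprint is always queried on the same server.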
  • The metadata database is divided into several groups, and each metadata server can access every group. The data to be deduplicated is divided into several pieces of block data whose metadata is distributed across multiple metadata servers; each metadata server is responsible for looking up one or more pieces of metadata information to confirm whether they are already stored in the database, which improves search efficiency.
  • Step 104: Receive the query result. When the fingerprint information of the block data is not found in the database, store the block data corresponding to the fingerprint information in the shared file system and instruct the metadata server to insert the metadata information corresponding to the fingerprint information into the database; when the fingerprint information of the block data is found in the database, instruct the metadata server to update the reference count of the corresponding block data in the database.
  • The reference count of the block data indicates how many identical pieces of block data are recorded as a single piece of block data, where the content of the recorded piece is the same as the content of each of the identical pieces.
  • The updated reference count is the existing reference count plus the number of references added this time.
  • Each position where the block data occurs points to the saved single instance, and each time the block data is pointed to, it is considered to be referenced once. The number of times the single instance in the database is pointed to is therefore the number of times it is referenced, that is, how many identical pieces of block data are recorded as one piece.
  • The received query result indicates whether the same fingerprint information was found in the database: if so, the searched fingerprint information is already stored in the database; if not, the searched fingerprint information is not currently stored in the database;
  • When the query result is "Yes", the reference count of the block data corresponding to the searched fingerprint information is updated in the database; when the query result is "No", the block data corresponding to the searched fingerprint information is stored in the shared file system, and the metadata server inserts the metadata information corresponding to the fingerprint information into the database.
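The handling of the query result in step 104 can be sketched with in-memory stand-ins. The `MetadataServer` class, the dictionary used as the shared file system, and the SHA-1 fingerprint are all assumptions made for illustration, not interfaces defined by the text.

```python
import hashlib
from typing import Dict


class MetadataServer:
    """In-memory stand-in for a metadata server and its database."""

    def __init__(self) -> None:
        self.ref_counts: Dict[bytes, int] = {}

    def contains(self, fp: bytes) -> bool:
        return fp in self.ref_counts

    def insert(self, fp: bytes, ref_count: int) -> None:
        self.ref_counts[fp] = ref_count

    def add_references(self, fp: bytes, added: int) -> None:
        # New count = existing count + references added this time.
        self.ref_counts[fp] += added


def process_block(block: bytes, server: MetadataServer,
                  shared_fs: Dict[bytes, bytes], added_refs: int = 1) -> bool:
    """Return True if the block was new and had to be stored."""
    fp = hashlib.sha1(block).digest()
    if server.contains(fp):            # query result "Yes"
        server.add_references(fp, added_refs)
        return False
    shared_fs[fp] = block              # block data goes to the shared file system
    server.insert(fp, added_refs)      # only metadata goes to the metadata server
    return True
```

In this sketch the block payload never passes through `MetadataServer`, which is the link-overhead saving the embodiment describes.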
  • In this embodiment, the file to be deduplicated is divided into blocks, and the metadata server responsible for the query is determined by the fingerprint information of the block data. When the fingerprint information of the block data is found not to be stored in the database yet, the block data corresponding to that fingerprint information is sent to the shared file system rather than to the metadata server for storage, which greatly reduces link overhead and improves system performance.
  • An embodiment of the present invention further provides another duplicate data search method, including:
  • Step 200: Divide a file into blocks and generate metadata information for each piece of block data, where the metadata information of the block data includes fingerprint information of the block data;
  • In practice, the deduplication of multiple files may be processed at the same time, and after the files are divided into blocks, identical block data may exist; a single file may also contain multiple identical pieces of block data.
  • The embodiment of the present invention may therefore further include: Step 201: Find identical block data according to the fingerprint information of the block data, record the identical block data as one piece of block data, and update the reference count in the metadata information corresponding to that piece according to the number of identical pieces;
  • The file is first divided into blocks, and then only one instance of the block data with the same content is retained; the other positions where that block data is used point to the single instance by reference, which reduces the space occupied. For each single instance, the number of blocks in the original file that share its content is recorded as the number of times the single instance is pointed to. This value is the reference count of the block data before the query instruction is sent to the metadata server.
  • In step 201, before the query instruction is sent to the metadata server, identical block data is first integrated: the identical pieces are recorded as only one piece of block data, and the reference count in the metadata corresponding to that piece is updated according to the number of identical pieces. For example, if there are 6 pieces of block data with the same fingerprint information, the 6 pieces are recorded as one piece of block data, and the reference count of the recorded piece is updated to 6, indicating that the block data has been referenced 6 times.
  • After step 201, only one piece of fingerprint information needs to be sent to the metadata server for identical block data.
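The local integration of step 201 amounts to counting blocks per fingerprint before any query is sent. A minimal sketch, assuming SHA-1 fingerprints and a hypothetical helper name `integrate_blocks`:

```python
import hashlib
from collections import Counter
from typing import Dict, List


def integrate_blocks(blocks: List[bytes]) -> Dict[bytes, int]:
    """Map each distinct fingerprint to the number of blocks that share it.

    Each key is sent to the metadata server once; the value is the
    reference count recorded for that single instance.
    """
    return dict(Counter(hashlib.sha1(block).digest() for block in blocks))
```

For six identical blocks the result holds a single fingerprint with a count of 6, so one query replaces six.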
  • Step 202: Determine, according to the fingerprint information of the block data, which preset fingerprint information segment the fingerprint information falls into, and send the fingerprint information of the block data to the metadata server corresponding to the determined segment;
  • The fingerprint information segments are preset in the same way as in the embodiment corresponding to FIG. 1; refer to that embodiment for details.
  • Step 204: The metadata server searches the database for whether the received fingerprint information of the block data already exists in the database. If yes, step 206 is performed; if not, step 208 is performed;
  • Step 206: Send to the metadata server an instruction to update the reference count of the block data corresponding to the searched fingerprint information;
  • For example, the reference count stored in the metadata information corresponding to the block data in the database is updated by increasing it by 6;
  • Step 208: Store the block data corresponding to the searched fingerprint information in the shared file system, send the metadata information of that block data to the metadata server, and instruct the metadata server to insert the metadata information into the database;
  • In this embodiment, a shared file system is provided to store the real data, that is, the block data; the location where the shared file system stores the block data can be accessed by every duplicate data search device.
  • The embodiment of the present invention further includes: generating a record file to replace the file according to the storage information of the file's block data in the shared file system;
  • When the data of the file needs to be restored, the storage information of the file's block data in the shared file system is read, and the block data of the file is obtained from the specified file in the shared file system according to the offset position and block length, so as to recover the data as it was before deduplication.
  • The "specified file" refers to the file that stores the current block data. In the shared file system, block data belonging to different hash segments is stored in different files according to hash segment, which facilitates concurrent access during multi-node processing. The metadata corresponding to the block data stores the serial number of the file to which the block data belongs in the shared file system, and the file storing the block data can be found by this serial number.
  • When a file is deduplicated, a record file that records the block data information of the file is generated. After deduplication completes, the original file is replaced by this record file; because the file then no longer contains block data but only records, the space occupied is reduced.
  • When a file is accessed, the underlying driver determines the current file type. If the file has been deduplicated, the duplicate data search device is called to restore the file to its pre-deduplication form, which is then provided to the user.
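The record file and the recovery path can be sketched as a list of (file serial number, offset, block length) entries. The in-memory dictionary standing in for the shared file system and the names `store_block` and `restore_file` are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# One record per block of the original file: (file serial number, offset, block length).
Record = Tuple[int, int, int]


def store_block(shared_fs: Dict[int, bytearray], serial: int, block: bytes) -> Record:
    """Append a block to the shared file with the given serial number."""
    target = shared_fs.setdefault(serial, bytearray())
    offset = len(target)
    target.extend(block)
    return (serial, offset, len(block))


def restore_file(record_file: List[Record], shared_fs: Dict[int, bytearray]) -> bytes:
    """Rebuild the pre-deduplication data by reading each recorded block in order."""
    return b"".join(
        bytes(shared_fs[serial][offset:offset + length])
        for serial, offset, length in record_file
    )
```

A block that occurs twice in the original file simply appears twice in the record list while being stored once, which is how the record file stays smaller than the original.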
  • In this embodiment, the file to be deduplicated is divided into blocks, and the metadata server responsible for the query is determined by the fingerprint information of the block data. When the fingerprint information of the block data is found not to be stored in the database yet, the block data corresponding to that fingerprint information is sent to the shared file system rather than to the metadata server for storage. In addition, after the file to be deduplicated is divided into blocks, identical block data is integrated and a single query instruction is sent for it, which greatly reduces link overhead and improves system performance.
  • Based on the above methods, the embodiments of the present invention further provide a corresponding device and system, which are described below. Content consistent with the method is not described again in detail.
  • An embodiment of the present invention further provides a duplicate data search device.
  • The device includes: a data dividing module 300, configured to divide a file into blocks and generate metadata information for each piece of block data, where the metadata information of the block data includes the fingerprint information of the block data;
  • a sending module 302, configured to send the fingerprint information of the block data to the metadata server according to that fingerprint information, so that the metadata server searches the database for whether the received fingerprint information is already stored in the database and returns the query result;
  • a query result processing module 304, configured to receive the query result; when the fingerprint information of the block data is not found in the database, store the block data corresponding to the fingerprint information in the shared file system and instruct the metadata server to insert the metadata information corresponding to the fingerprint information into the database; when the fingerprint information of the block data is found in the database, instruct the metadata server to update the reference count of the corresponding block data in the database.
  • With the device provided by this embodiment, the file to be deduplicated is divided into blocks, and the metadata server responsible for the query is determined by the fingerprint information of the block data. When the fingerprint information of the block data is found not to be stored in the database yet, the block data corresponding to that fingerprint information is sent to the shared file system rather than to the metadata server for storage, which greatly reduces link overhead and improves system performance.
  • An embodiment of the present invention further provides another duplicate data search device. Referring to FIG.
  • The device includes: a data dividing module 400, configured to divide a file into blocks and generate metadata information for each piece of block data, where the metadata information of the block data includes fingerprint information of the block data;
  • an integration module 402, configured to, before the fingerprint information of the block data is sent to the metadata server, find identical block data according to the fingerprint information of the block data, record the identical block data as one piece of block data, and update the reference count in the metadata information corresponding to that piece according to the number of identical pieces;
  • a sending module 404, configured to send the fingerprint information of the block data to the metadata server according to that fingerprint information, so that the metadata server searches the database for whether the received fingerprint information is already stored in the database and returns a query result; the sending module determines, according to the fingerprint information of the block data, which preset fingerprint information segment the fingerprint information falls into, and sends the fingerprint information of the block data to the metadata server corresponding to the determined segment;
  • In this embodiment, a shared file system is provided to store the real data, that is, the block data; the location where the shared file system stores the block data can be accessed by every duplicate data search device.
  • a query result processing module 406, configured to receive the query result; when the fingerprint information of the block data is not found in the database, store the block data corresponding to the fingerprint information in the shared file system and instruct the metadata server to insert the metadata information corresponding to the fingerprint information into the database; when the fingerprint information of the block data is found in the database, instruct the metadata server to update the reference count of the block data corresponding to the fingerprint information in the database;
  • The query result processing module 406 is further configured to generate a record file to replace the file according to the storage information of the file's block data in the shared file system.
  • a data recovery module 408, configured to, when the data of the file needs to be restored, read the storage information of the file's block data in the shared file system by using the record file, obtain the block data of the file, and restore the data in the file.
  • With the device provided by this embodiment, the file to be deduplicated is divided into blocks, and the metadata server responsible for the query is determined by the fingerprint information of the block data. When the fingerprint information of the block data is found not to be stored in the database yet, the block data is sent to the shared file system rather than to the metadata server for storage. In addition, after the file to be deduplicated is divided into blocks, identical block data is integrated and a single query instruction is issued for it, which greatly reduces link overhead and improves system performance.
  • An embodiment of the present invention further provides a duplicate data search system, as shown in FIG. 5, including:
  • a duplicate data search device 500, configured to divide a file into blocks and generate metadata information for each piece of block data, where the metadata information of the block data includes fingerprint information of the block data; send the fingerprint information of the block data to the metadata server according to that fingerprint information for the duplicate data query; receive the query result returned by the metadata server; when the fingerprint information of the block data is not found in the database, store the block data corresponding to the fingerprint information in the shared file system and instruct the metadata server to insert the metadata information of the block data into the database; when the fingerprint information of the block data is found in the database, instruct the metadata server to update the reference count of the block data corresponding to the fingerprint information in the database.
  • a metadata server 502, configured to receive the fingerprint information of the block data sent by the duplicate data search device, search the database for whether the received fingerprint information is already stored, and return a query result to the duplicate data search device; and to execute the instructions of the duplicate data search device to update the reference count of the block data corresponding to the searched fingerprint information and to insert the received metadata information into the database.
  • a database 504, configured to store metadata information of the block data, where the metadata information includes the fingerprint information of the block data and the number of times the block data is referenced.
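The database component can be sketched as a single table keyed by fingerprint. SQLite, the table name `block_metadata` and the helper functions are assumptions chosen for a self-contained example; the text does not name a database product or schema.

```python
import sqlite3


def open_metadata_db() -> sqlite3.Connection:
    """Create an in-memory table holding fingerprint and reference count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE block_metadata ("
        " fingerprint BLOB PRIMARY KEY,"
        " ref_count INTEGER NOT NULL)"
    )
    return conn


def lookup(conn: sqlite3.Connection, fp: bytes) -> bool:
    """Query step: is this fingerprint already stored?"""
    row = conn.execute(
        "SELECT 1 FROM block_metadata WHERE fingerprint = ?", (fp,)
    ).fetchone()
    return row is not None


def insert_metadata(conn: sqlite3.Connection, fp: bytes, ref_count: int) -> None:
    """Insert instruction: record a new block with its initial count."""
    conn.execute("INSERT INTO block_metadata VALUES (?, ?)", (fp, ref_count))


def update_ref_count(conn: sqlite3.Connection, fp: bytes, added: int) -> None:
    """Update instruction: add the newly found references to the count."""
    conn.execute(
        "UPDATE block_metadata SET ref_count = ref_count + ? WHERE fingerprint = ?",
        (added, fp),
    )


def get_ref_count(conn: sqlite3.Connection, fp: bytes) -> int:
    return conn.execute(
        "SELECT ref_count FROM block_metadata WHERE fingerprint = ?", (fp,)
    ).fetchone()[0]
```

The three operations correspond to the query, insert and update instructions that the metadata server executes on behalf of the duplicate data search device.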
  • With the system provided by this embodiment, the file to be deduplicated is divided into blocks, and the metadata server responsible for the query is determined by the fingerprint information of the block data. When the fingerprint information of the block data is found not to be stored in the database yet, the block data corresponding to that fingerprint information is sent to the shared file system rather than to the metadata server for storage, which greatly reduces link overhead and improves system performance.
  • A person skilled in the art can understand that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments.
  • The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a method, device and system for finding duplicate data. The embodiments include: dividing a file to be deduplicated into blocks; determining, from the fingerprint information of the block data, the metadata server responsible for the query; and, when the fingerprint information of the block data is found not to be stored in the database yet, sending the block data corresponding to that fingerprint information to a shared file system instead of to the metadata server for storage. Compared with the prior art, the invention considerably reduces link overhead and improves system performance.
PCT/CN2012/086371 2011-12-12 2012-12-11 Method, device and system for finding duplicate data WO2013086969A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110412056.4 2011-12-12
CN2011104120564A CN102495894A (zh) 2011-12-12 2011-12-12 重复数据查找方法、装置及系统

Publications (1)

Publication Number Publication Date
WO2013086969A1 (fr) 2013-06-20

Family

ID=46187719

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/086371 WO2013086969A1 (en) 2011-12-12 2012-12-11 Method, device and system for finding duplicate data

Country Status (2)

Country Link
CN (1) CN102495894A (fr)
WO (1) WO2013086969A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2012389110B2 (en) * 2012-12-12 2016-03-17 Huawei Technologies Co., Ltd. Data processing method and apparatus in cluster system

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102495894A (zh) * 2011-12-12 2012-06-13 成都市华为赛门铁克科技有限公司 重复数据查找方法、装置及系统
CN102915278A (zh) * 2012-09-19 2013-02-06 浪潮(北京)电子信息产业有限公司 重复数据删除方法
CN103823807B (zh) * 2012-11-16 2018-06-15 深圳市腾讯计算机系统有限公司 一种去除重复数据的方法、装置及系统
CN103020174B (zh) 2012-11-28 2016-01-06 华为技术有限公司 相似性分析方法、装置及系统
CN103259729B (zh) * 2012-12-10 2018-03-02 上海德拓信息技术股份有限公司 基于零碰撞散列算法的网络数据精简传输方法
CN103064757A (zh) * 2012-12-12 2013-04-24 鸿富锦精密工业(深圳)有限公司 数据备份方法及系统
CN103019887B (zh) * 2012-12-12 2016-01-06 华为技术有限公司 数据备份方法及装置
CN103067129B (zh) * 2012-12-24 2015-10-28 中国科学院深圳先进技术研究院 网络数据传输方法和系统
CN103246730B (zh) * 2013-05-08 2016-08-10 网易(杭州)网络有限公司 文件存储方法和设备、文件发送方法和设备
CN104077338B (zh) 2013-06-25 2016-02-17 腾讯科技(深圳)有限公司 一种数据处理的方法及装置
CN103414759B (zh) * 2013-07-22 2016-12-28 华为技术有限公司 网盘文件传输方法和装置
CN103810297B (zh) * 2014-03-07 2017-02-01 华为技术有限公司 基于重删技术的写方法、读方法、写装置和读装置
CN105022741B (zh) * 2014-04-23 2018-09-28 苏宁易购集团股份有限公司 压缩方法和系统以及云存储方法和系统
CN103970875B (zh) * 2014-05-15 2017-02-15 华中科技大学 一种并行重复数据删除方法和系统
CN106610790B (zh) * 2015-10-26 2020-01-03 华为技术有限公司 一种重复数据删除方法及装置
US10255288B2 (en) * 2016-01-12 2019-04-09 International Business Machines Corporation Distributed data deduplication in a grid of processors
CN107122370A (zh) * 2016-02-25 2017-09-01 阿里巴巴集团控股有限公司 一种分布式检索方法及装置
CN107391761B (zh) * 2017-08-28 2020-03-06 苏州浪潮智能科技有限公司 一种基于重复数据删除技术的数据管理方法及装置
CN107506150A (zh) * 2017-08-30 2017-12-22 郑州云海信息技术有限公司 分布式存储装置、重删、写、删除、读取方法以及系统
CN107644081A (zh) * 2017-09-21 2018-01-30 锐捷网络股份有限公司 数据去重方法及装置
CN108134775B (zh) * 2017-11-21 2020-10-09 华为技术有限公司 一种数据处理方法和设备
CN109522283B (zh) * 2018-10-30 2021-09-21 深圳先进技术研究院 一种重复数据删除方法及系统
CN112286457B (zh) * 2020-10-28 2022-08-26 杭州宏杉科技股份有限公司 对象重删方法、装置、电子设备及机器可读存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216791A (zh) * 2008-01-04 2008-07-09 华中科技大学 基于指纹的文件备份方法
CN101599079A (zh) * 2009-07-22 2009-12-09 中国科学院计算技术研究所 一种备份数据集中存储的管理方法
CN102495894A (zh) * 2011-12-12 2012-06-13 成都市华为赛门铁克科技有限公司 重复数据查找方法、装置及系统

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101908077B (zh) * 2010-08-27 2012-11-21 华中科技大学 一种适用于云备份的重复数据删除方法

Also Published As

Publication number Publication date
CN102495894A (zh) 2012-06-13

Similar Documents

Publication Publication Date Title
WO2013086969A1 (en) Method, device and system for finding duplicate data
US11775392B2 (en) Indirect replication of a dataset
US20230359644A1 (en) Cloud-based replication to cloud-external systems
USRE49148E1 (en) Reclaiming space occupied by duplicated data in a storage system
US9201891B2 (en) Storage system
US9454476B2 (en) Logical sector mapping in a flash storage array
CN102782643B (zh) 使用布隆过滤器的索引搜索
US8954710B2 (en) Variable length encoding in a storage system
US9298726B1 (en) Techniques for using a bloom filter in a duplication operation
US9613046B1 (en) Parallel optimized remote synchronization of active block storage
US8370315B1 (en) System and method for high performance deduplication indexing
US9367448B1 (en) Method and system for determining data integrity for garbage collection of data storage systems
CN108255647B (zh) 一种samba服务器集群下的高速数据备份方法
US10254964B1 (en) Managing mapping information in a storage system
WO2015127083A2 (fr) Synchronisation de données dans un système distribué
US10210188B2 (en) Multi-tiered data storage in a deduplication system
WO2017020576A1 (fr) Procédé et appareil de compactage de fichiers dans un système de stockage clé/valeur
WO2014201696A1 (fr) Procédé de lecture de fichiers, dispositif de stockage et système de lecture
WO2014067063A1 (fr) Procédé et dispositif de récupération de données en double
WO2014000458A1 (fr) Procédé et dispositif de traitement de petits fichiers
CN108415671B (zh) 一种面向绿色云计算的重复数据删除方法及系统
US20180107404A1 (en) Garbage collection system and process
US9940069B1 (en) Paging cache for storage system
US11436088B2 (en) Methods for managing snapshots in a distributed de-duplication system and devices thereof
JP6113816B1 (ja) 情報処理システム、情報処理装置、及びプログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12858165

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12858165

Country of ref document: EP

Kind code of ref document: A1