WO2015078136A1 - Method and apparatus for recovering deduplicated data - Google Patents

Method and apparatus for recovering deduplicated data

Info

Publication number
WO2015078136A1
WO2015078136A1 PCT/CN2014/075850 CN2014075850W WO2015078136A1 WO 2015078136 A1 WO2015078136 A1 WO 2015078136A1 CN 2014075850 W CN2014075850 W CN 2014075850W WO 2015078136 A1 WO2015078136 A1 WO 2015078136A1
Authority
WO
WIPO (PCT)
Prior art keywords
data block
storage medium
threshold
access
file
Prior art date
Application number
PCT/CN2014/075850
Other languages
English (en)
French (fr)
Inventor
崔飞
程佳佳
程宁
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Priority to RU2016124319A (patent RU2665272C1)
Publication of WO2015078136A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/176 Support for shared access to files; File sharing support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • The present invention relates to the field of communications, and in particular to a method and apparatus for recovering deduplicated data.
  • Existing storage technologies can remove duplicate data only within a single server and cannot use an efficient network to remove duplicate data across the entire system. In addition, when a deduplicated database receives too much access, the data is simply restored to cope with the resulting drop in access performance, which still does not solve the problem effectively.
  • Moreover, recovery schemes for deduplicated data often cause excessively dense access to the same data block (chunk), which lowers access efficiency and affects the operating efficiency of the distributed file system.
  • A method for recovering deduplicated data, including: acquiring a first access count of a file corresponding to a first data block, where the first access count indicates the number of visitors currently accessing the file at the same time; comparing the first access count with a first threshold and a second threshold, where the first threshold is less than the second threshold; and restoring the first data block to a first storage medium or a second storage medium according to the comparison result. When the first access count is greater than the first threshold and less than the second threshold, the first data block is restored to the first storage medium; when the first access count is greater than the second threshold, the first data block is restored to the second storage medium. The access efficiency of the second storage medium is higher than that of the first storage medium.
  • Before acquiring the first access count of the file corresponding to the first data block, the method includes: acquiring a second access count of the first data block, where the second access count indicates the number of visitors currently accessing the first data block at the same time; and searching for the file corresponding to the first data block when the second access count is greater than a third threshold. Before acquiring the second access count of the first data block, the method includes: acquiring feature information of the first data block, where the feature information represents content that only the first data block has; and notifying the feature information to the current distributed file system and to other distributed file systems connected to it, where the feature information is used to perform deduplication on the current distributed file system and the other distributed file systems.
  • Notifying the feature information to the current distributed file system includes: notifying the feature information to a node server in the current distributed system.
  • Restoring the first data block to the first storage medium or the second storage medium includes: copying the first data block to obtain a second data block; and copying the second data block to the first storage medium or the second storage medium.
  • After the second data block is copied to the first storage medium or the second storage medium, the method further includes: subtracting the first access count from the second access count to obtain the latest access count of the first data block, and decrementing the reference count of the first data block by one.
  • A device for recovering deduplicated data, including: a first obtaining module, configured to acquire the first access count of the file corresponding to the first data block, where the first access count indicates the number of visitors currently accessing the file at the same time; a comparison module, configured to compare the first access count with the first threshold and the second threshold, where the first threshold is smaller than the second threshold; and a recovery module, configured to restore the first data block to the first storage medium or the second storage medium according to the comparison result. When the first access count is greater than the first threshold and less than the second threshold, the first data block is restored to the first storage medium; when the first access count is greater than the second threshold, the first data block is restored to the second storage medium. The access efficiency of the second storage medium is higher than that of the first storage medium.
  • The device further includes: a second obtaining module, configured to acquire the second access count of the first data block, where the second access count indicates the number of visitors currently accessing the first data block at the same time; and a query module, configured to search for the file corresponding to the first data block when the second access count is greater than the third threshold.
  • The device further includes: a third obtaining module, configured to acquire the feature information of the first data block, where the feature information represents content that only the first data block has; and a notification module, configured to notify the feature information to the current distributed file system and to other distributed file systems connected to it, where the feature information is used to perform deduplication on the current distributed file system and the other distributed file systems.
  • The device further includes: a counting module, configured to, after the second data block is copied to the first storage medium or the second storage medium, subtract the first access count from the second access count to obtain the latest access count of the first data block, and decrement the reference count of the first data block by one.
  • In the embodiments of the present invention, the access count of the file corresponding to the first data block is compared with the first threshold and the second threshold, and the comparison result determines whether the first data block is restored to the first storage medium or the second storage medium. This solves problems in the related art such as excessively dense access to the same data block, and thereby improves file access efficiency.
  • FIG. 1 is a flowchart of a method for recovering deduplicated data according to an embodiment of the present invention.
  • FIG. 2 is a structural block diagram of a device for recovering deduplicated data according to an embodiment of the present invention.
  • FIG. 3 is another structural block diagram of a device for recovering deduplicated data according to an embodiment of the present invention.
  • FIG. 4 is a structural block diagram of a distributed file system according to a preferred embodiment of the present invention.
  • FIG. 5 is a schematic diagram of the deduplication process for a data block according to a preferred embodiment of the present invention.
  • FIG. 6 is a schematic diagram of the recovery process for a data block according to a preferred embodiment of the present invention.
  • FIG. 7 is a flowchart of a method for recovering deduplicated data according to a preferred embodiment of the present invention.
  • FIG. 8 is another flowchart of a method for recovering deduplicated data according to a preferred embodiment of the present invention.
  • FIG. 1 is a flowchart of a method for recovering deduplicated data according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps.
  • Step S102: acquire the first access count of the file corresponding to the first data block, where the first access count indicates the number of visitors currently accessing the file at the same time.
  • To further improve access efficiency for the data block, the access count of the first data block also needs to be considered before step S102. Specifically: acquire the second access count of the first data block, where the second access count indicates the number of visitors currently accessing the first data block at the same time; when the second access count is greater than the third threshold, search for the file corresponding to the first data block; then acquire the first access count of that file.
  • Step S104: compare the first access count with the first threshold and the second threshold, where the first threshold is smaller than the second threshold.
  • Step S106: according to the comparison result, restore the first data block to the first storage medium or the second storage medium. When the first access count is greater than the first threshold and less than the second threshold, the first data block is restored to the first storage medium; when the first access count is greater than the second threshold, the first data block is restored to the second storage medium. The access efficiency of the second storage medium is higher than that of the first storage medium.
  • Restoring the first data block to the first storage medium or the second storage medium may be performed as follows: copy the first data block to obtain the second data block; copy the second data block to the first storage medium or the second storage medium.
  • This embodiment also provides a device for recovering deduplicated data, which implements the above embodiments and preferred implementations. What has already been described is not repeated.
  • As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function.
  • The device includes: a first obtaining module 20, configured to acquire the first access count of the file corresponding to the first data block, where the first access count indicates the number of visitors currently accessing the file at the same time;
  • a comparison module 22, connected to the first obtaining module 20 and configured to compare the first access count with the first threshold and the second threshold, where the first threshold is smaller than the second threshold;
  • a recovery module 24, connected to the comparison module 22 and configured to restore the first data block to the first storage medium or the second storage medium according to the comparison result. When the first access count is greater than the first threshold and less than the second threshold, the first data block is restored to the first storage medium; when the first access count is greater than the second threshold, the first data block is restored to the second storage medium. The access efficiency of the second storage medium is higher than that of the first storage medium.
  • The device further includes: a second obtaining module 26, configured to acquire the second access count of the first data block, where the second access count indicates the number of visitors currently accessing the first data block at the same time; and a query module 28, connected to the second obtaining module 26 and configured to search for the file corresponding to the first data block when the second access count is greater than the third threshold.
  • The device may further include the following processing modules: a third obtaining module 30, configured to acquire the feature information of the first data block, where the feature information represents content that only the first data block has;
  • and a notification module 32, configured to notify the feature information to the current distributed file system and to other distributed file systems connected to it, where the feature information is used to perform deduplication on the current distributed file system and the other distributed file systems.
  • Here, "connected" can mean that the two can communicate with each other.
  • The device may further include the following processing module: a counting module 34, configured to, after the second data block is copied to the first storage medium or the second storage medium, subtract the first access count from the second access count to obtain the latest access count of the first data block, and decrement the reference count of the first data block by one.
  • FIG. 4 is a structural block diagram of a distributed file system according to a preferred embodiment of the present invention. As shown in FIG. 4, the system includes the following components.
  • Metadata server 40: manages metadata such as the file names and data blocks of all files in the file system, and provides metadata write and query operations to the file access client. To support deduplication, a chunk reference counter is added to the existing chunk metadata. For example, if a chunk of file A has the same content as a chunk of file B and needs to be deduplicated, the chunk of file B can be deleted, file B is pointed to the chunk of file A, and the reference count of file A's chunk is incremented by 1. The server also counts the reads and writes in progress on the chunk to record how many operations are currently using it; when the configured threshold is exceeded, it asks the file location register to recover the deduplicated chunk, to cope with excessive access.
  • File access client 42: provides the applications served by the file system with interface calls similar to those of a standard file system.
  • File access server 44: interacts with the storage media in the file system and performs the actual reads and writes of data blocks. It responds to data read and write requests from the file access client, reading data from the storage medium and returning it to the client, and reading data from the client and writing it to the storage medium.
  • File location register 46: responsible for file access control, data file distribution, and management of various data. The file location register may also include a mapping database, which stores the file-to-chunk mapping table and counts the simultaneous accesses of each file.
  • Storage medium 48: generally one of the following: an ordinary Integrated Drive Electronics (IDE) disk, a Serial Advanced Technology Attachment (SATA) disk, a Secure Digital (SD) disk, or a Solid State Disk (SSD).
  • IDE: Integrated Drive Electronics
  • SATA: Serial Advanced Technology Attachment
  • SD: Secure Digital
  • SSD: Solid State Disk
  • When a user needs to read or write a file, the read or write instruction is sent to the file access client; the chunk information corresponding to the file is then obtained through the file location register and the metadata server; finally, the file access server returns the specific file disk information to the user. Specifically, this can be expressed as the following process.
  • Step A. The file location register periodically asks the metadata server whether any chunk has no fingerprint calculated. If so, the chunk that needs a fingerprint is returned, and the register tells the access server to calculate the fingerprint. The register then notifies the calculated fingerprint to the metadata server of this system and to the registers of other systems connected to it, so that any database connected to the register can perform deduplication, which achieves chunk deduplication across servers.
  • Step B. The metadata server counts the accesses of chunk A. If the simultaneous access count is greater than the threshold n, it notifies the file location register, and the register looks up the file corresponding to the chunk in the file mapping database.
  • Step C. The mapping database calculates the simultaneous access count of the file that was found. If that count is greater than the ordinary access threshold and less than the performance threshold, the register is told to restore the chunk to an ordinary disk; if it is greater than the performance threshold, the chunk is restored to an SSD disk.
  • Step D. The file location register tells the metadata server to add a new chunk and tells it the access count of the file to be restored. The metadata server creates chunk B, subtracts the file's access count from the access count of the original chunk, and decrements the reference count by 1.
  • Step E. Based on the information about what needs to be copied, the file location register tells the file access server to copy the file to the corresponding storage medium (such as an ordinary disk or an SSD disk).
  • As shown in FIG. 5, the file location register periodically asks the metadata server of this server whether any chunk needs a fingerprint calculated. If so, the chunk is returned, the register has the access server calculate the fingerprint, and the calculated fingerprint is then sent to the metadata server of this server and to the servers of adjacent nodes. Each server then achieves deduplication through its own metadata query.
  • The embodiment of the present invention therefore deduplicates not only within this server but, through message connections, across the servers of the entire system. This addresses the efficiency of deduplication and saves more space than the older single-server approach.
  • The specific flow is as follows:
  • Step S502: periodically query whether any chunk has no fingerprint calculated.
  • S504: return chunkA, which needs a fingerprint. S506: pass the returned chunkA on for fingerprint calculation. S508: notify the calculated fingerprint value.
  • S510: notify the calculated fingerprint value and request a lookup of whether any chunk needs to be deduplicated against chunkA. S512: return chunkB, which needs to be deleted. S514: notify the fingerprint value of chunkA and look up whether B holds the same one. S516: notify the calculated fingerprint value and request a lookup of whether any chunk needs to be deduplicated against chunkA.
  • S518: return chunkC, which needs to be deleted.
  • S520: notify that the file corresponding to chunkB is to be mapped to chunkA.
  • S522: the mapping succeeds.
  • S524: notify that chunkB is to be deleted.
  • S526: the deletion succeeds; notify that the file corresponding to chunkB is to be mapped to chunkA. S528: the mapping succeeds. S530: notify that chunkB is to be deleted. S532: the deletion succeeds.
  • As shown in FIG. 6, the metadata server counts accesses to the chunks it stores. When a chunk's access count exceeds the threshold n, the chunk information is reported to the file location register. The register uses the chunk to find the files that occupy it and sorts the access counts of all those files. If a file's access count is greater than the ordinary access threshold but less than the performance threshold, the chunk is restored directly to an ordinary disk; if the file's access count is greater than the performance threshold, the deduplicated chunk corresponding to the file is restored to a solid state disk with high access efficiency.
  • The specific flow is as follows:
  • S602: check whether the simultaneous access count of the chunk is greater than the threshold n (n is a natural number); if it is greater than n and the reference count is greater than 1, notify the register.
  • Embodiment 2: When a file is opened, the file access client sends the file information to the location register of the default file system, and the location register looks up the chunk corresponding to the file in the mapping database. If it finds that the chunk is not stored by this system, it tells the access client to query the file system where the chunk resides, which makes cross-system file access easy to implement.
  • One implementation of the present invention is illustrated by opening file A. As shown in FIG. 7, it includes the following processing steps:
  • Step S702: open file A.
  • Step S704: look up the chunk for file A and count the accesses of the file.
  • Step S706: return chunk A.
  • Step S708: request the simultaneous access count of chunk A.
  • Step S710: find that the simultaneous access count of chunk A is greater than the access threshold n and that the reference count is greater than 1.
  • Step S712: look up the files corresponding to chunk A.
  • Step S714: return file A and file B, which correspond to chunk A, where the access count of file A is greater than the performance threshold m.
  • Step S716: request a new chunk and report the access count of file A.
  • Step S718: return the newly added chunk B, subtract the access count of file A from the access count of chunk A, decrement the reference count of chunk A by 1, and release the mapping between file A and chunk A.
  • Step S720: notify that file A is to be mapped to chunk B.
  • Another implementation covers the case where, while file A is being opened, the access count of another file that corresponds to the same chunk is also found to exceed a threshold. As shown in FIG. 8:
  • Step S802: open file A.
  • Step S804: look up the chunk for file A and count the accesses of the file.
  • Step S806: return chunk A.
  • Step S808: request the simultaneous access count of chunk A.
  • Step S810: find that the simultaneous access count of chunk A is greater than the access threshold n and that the reference count is greater than 1.
  • Step S812: look up the files corresponding to chunk A.
  • Step S814: return file A and file B, which correspond to chunk A, where the access count of file B is greater than the performance threshold m and the access count of file A is greater than the ordinary threshold I.
  • Step S816: request a new chunk and report the access count of file A.
  • Step S818: return the newly added chunk B, subtract the access count of file A from the access count of chunk A, decrement the reference count of chunk A by 1, and release the mapping between file A and chunk A.
  • In summary, the embodiments of the present invention achieve the following beneficial effects:
  • In the related art, deduplication causes excessively dense access to the same data block and lowers access efficiency. The embodiments of the present invention address this by counting the simultaneous accesses of the chunk and the simultaneous accesses of the file to determine whether access is dense.
  • File access is graded into an ordinary access count and a performance access count. Based on these two values, the chunk is restored to an ordinary disk or an SSD disk, so that chunks are recovered in tiers according to access count, combining efficiency with economy.
  • In addition, deduplicating the metadata directly increases the granularity of deduplication and achieves cross-system deduplication, which saves more disk space.
  • Using reference counts also reduces how often the file mapping table has to be searched when a chunk is modified, which improves the operating efficiency of the system.
  • In another embodiment, software is also provided for performing the technical solutions described in the above embodiments and preferred implementations.
  • In another embodiment, a storage medium storing the above software is also provided, including but not limited to an optical disk, a floppy disk, a hard disk, and rewritable memory.
  • The modules or steps of the present invention described above can be implemented by general-purpose computing devices, and can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be performed in an order different from the one given here, or they can be made into individual integrated circuit modules, or several of the modules or steps can be made into a single integrated circuit module.
  • The invention is therefore not limited to any specific combination of hardware and software.
  • The above are only preferred embodiments of the present invention and are not intended to limit it; various modifications and changes can be made to the present invention.
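
The deduplication and recovery bookkeeping described in the embodiments can be sketched as follows. This is a minimal single-process sketch under stated assumptions, not the patented implementation: the `FileSystem` and `Chunk` classes, the one-chunk-per-file simplification, the use of SHA-256 as the chunk fingerprint, and the parameter names `n`, `low`, and `m` (chunk access threshold, ordinary threshold, performance threshold) are all hypothetical names chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Chunk:
    data: bytes
    ref_count: int = 1         # number of files mapped to this chunk
    access_count: int = 0      # visitors currently accessing the chunk
    medium: str = "ordinary disk"


@dataclass
class FileSystem:
    chunks: dict = field(default_factory=dict)          # chunk id -> Chunk
    file_chunk: dict = field(default_factory=dict)      # file name -> chunk id
    file_access: dict = field(default_factory=dict)     # file name -> concurrent visitors
    by_fingerprint: dict = field(default_factory=dict)  # fingerprint -> chunk id
    next_id: int = 0

    def _new_chunk(self, chunk: Chunk) -> int:
        cid = self.next_id
        self.next_id += 1
        self.chunks[cid] = chunk
        return cid

    def write(self, name: str, data: bytes) -> None:
        """Store a file as a single chunk, deduplicating on its fingerprint."""
        # Assumption: SHA-256 stands in for the unspecified "fingerprint".
        fp = hashlib.sha256(data).hexdigest()
        cid = self.by_fingerprint.get(fp)
        if cid is None:
            cid = self._new_chunk(Chunk(data))
            self.by_fingerprint[fp] = cid
        else:
            # Duplicate content: point the file at the existing chunk.
            self.chunks[cid].ref_count += 1
        self.file_chunk[name] = cid
        self.file_access[name] = 0

    def open_file(self, name: str) -> None:
        """Record one more concurrent visitor on the file and its chunk."""
        self.file_access[name] += 1
        self.chunks[self.file_chunk[name]].access_count += 1

    def recover(self, cid: int, n: int, low: int, m: int) -> list:
        """Give hot files a private copy of a shared chunk; return their names."""
        chunk = self.chunks[cid]
        recovered = []
        # Only a shared chunk with more than n concurrent visitors qualifies.
        if chunk.access_count <= n or chunk.ref_count <= 1:
            return recovered
        for name in [f for f, c in self.file_chunk.items() if c == cid]:
            count = self.file_access[name]
            if count <= low:
                continue  # file is not hot enough to be restored
            medium = "SSD" if count > m else "ordinary disk"
            new_cid = self._new_chunk(Chunk(chunk.data, 1, count, medium))
            chunk.access_count -= count  # chunk count minus file count
            chunk.ref_count -= 1
            self.file_chunk[name] = new_cid
            recovered.append(name)
        return recovered
```

In this sketch, two files written with identical content share one chunk with a reference count of 2; once one of them becomes hot, `recover` copies the chunk for that file, moves its visitors to the copy, and leaves the other file on the original chunk.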

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computing Systems (AREA)

Abstract

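A method and apparatus for recovering deduplicated data are provided. The method includes: acquiring a first access count of a file corresponding to a first data block, where the first access count indicates the number of visitors currently accessing the file at the same time; comparing the first access count with a first threshold and a second threshold, where the first threshold is smaller than the second threshold; and, according to the comparison result, restoring the first data block to a first storage medium or a second storage medium. When the first access count is greater than the first threshold and less than the second threshold, the first data block is restored to the first storage medium; when the first access count is greater than the second threshold, the first data block is restored to the second storage medium. The access efficiency of the second storage medium is higher than that of the first storage medium. This technical solution solves problems in the related art such as excessively dense access to the same data block, and thereby improves file access efficiency.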
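
The two-threshold comparison in the abstract can be sketched as a small function. This is a minimal sketch under stated assumptions: the names `Medium` and `choose_medium` are hypothetical, and because the source does not say what happens when the access count is exactly equal to the second threshold, the sketch assumes that case falls to the first storage medium.

```python
from enum import Enum
from typing import Optional


class Medium(Enum):
    FIRST = "first storage medium"    # e.g. an ordinary disk
    SECOND = "second storage medium"  # faster access, e.g. an SSD


def choose_medium(file_access_count: int,
                  first_threshold: int,
                  second_threshold: int) -> Optional[Medium]:
    """Return the medium to restore the data block to, or None for no recovery."""
    if first_threshold >= second_threshold:
        raise ValueError("first threshold must be smaller than second threshold")
    if file_access_count > second_threshold:
        return Medium.SECOND
    if file_access_count > first_threshold:
        # Assumption: a count equal to the second threshold also lands here.
        return Medium.FIRST
    return None  # access is not dense enough to restore the block
```

A count at or below the first threshold leaves the deduplicated block in place, which matches the source's intent of restoring only blocks whose files are under dense access.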

Description

去重复数据的恢复方法及装置
技术领域 本发明涉及通信领域, 具体而言, 涉及一种去重复数据的恢复方法及装置。 背景技术 目前现有的存储技术在去除重复数据时只能在本服务器内去除重复, 不能利用高 效的网络进行整个系统内的重复数据去除; 另外当已经去除重复的数据库在访问量过 大时, 只是简单的进行回复数据来应对访问量过大引起的访问性能下降问题, 这样仍 然不能有效解决上述问题。 并且, 在去重复数据的恢复方案中, 往外会造成对同一对数据块(chunk) 的访问 过度密集, 导致访问效率下降, 影响分布式文件系统的运行效率。 针对相关技术中的上述问题, 目前尚未提出有效的解决方案。 发明内容 针对相关技术中, 对同一数据块的访问过度密集等问题, 本发明实施例提供了一 种去重复数据的恢复方法及装置, 以至少解决上述问题。 根据本发明的一个实施例, 提供了一种去重复数据的恢复方法, 包括: 获取第一 数据块所对应文件的第一访问次数, 其中, 所述第一访问数表示当前同时访问所述文 件的访问者数量;将所述第一访问次数分别和第一阈值以及第二阈值进行比较,其中, 所述第一阈值小于第二阈值; 根据比较结果, 将所述第一数据块恢复到第一存储媒介 或第二存储媒介, 其中, 在所述第一访问数大于第一阈值且小于第二阈值时, 将所述 第一数据块恢复到第一存储媒介; 在所述第一访问数大于所述第二阈值时, 将所述第 一数据块恢复到第二存储媒介; 所述第二存储媒介的访问效率高于所述第一存储媒介 的访问效率。 获取第一数据块所对应文件的第一访问次数之前, 包括: 获取所述第一数据块的 第二访问次数, 其中, 第二访问数表示当前同时访问该第一数据块的访问者数量; 在 所述第二访问次数大于第三阈值时, 查找所述第一数据块所对应的文件。 获取所述第一数据块的第二访问次数之前, 包括: 获取所述第一数据块的特征信 息, 其中, 所述特征信息用于表示仅所述第一数据块具有的内容; 将所述特征信息通 知给当前分布式文件系统以及与所述当前分布式文件系统相连的其它分布式文件系 统, 其中, 所述特征信息用于对所述当前分布式文件系统以及所述其它分布式文件系 统进行消重处理。 将所述特征信息通知给当前分布式文件系统包括: 将所述特征信息通知给所述当 前分布式系统中的节点服务器。 将所述第一数据块恢复到第一存储媒介或第二存储媒介, 包括: 对所述第一数据 块进行复制, 得到第二数据块; 将所述第二数据块复制到所述第一存储媒介或第二存 储媒介。 将所述第二数据块复制到所述第一存储媒介或第二存储媒介之后, 还包括: 将所 述第二访问次数减去所述第一访问次数, 得到所述第一数据块的最新访问次数, 以及 将所述第一数据块的被引用计数减 1。 根据本发明的另一个实施例, 提供一种去重复数据的恢复装置, 包括: 第一获取 模块, 设置为获取第一数据块所对应文件的第一访问次数, 其中, 所述第一访问数表 示当前同时访问所述文件的访问者数量; 比较模块, 设置为将所述第一访问次数分别 和第一阈值以及第二阈值进行比较, 其中, 所述第一阈值小于第二阈值; 恢复模块, 设置为根据比较结果, 将所述第一数据块恢复到第一存储媒介或第二存储媒介,其中, 在所述第一访问数大于第一阈值且小于第二阈值时, 将所述第一数据块恢复到第一存 储媒介; 在所述第一访问数大于所述第二阈值时, 将所述第一数据块恢复到第二存储 媒介; 其中, 所述第二存储媒介的访问效率高于所述第一存储媒介的访问效率。 上述装置还包括: 第二获取模块, 设置为获取所述第一数据块的第二访问次数, 其中, 第二访问数表示当前同时访问该第一数据块的访问者数量; 查询模块, 设置为 在所述第二访问次数大于第三阈值时, 查找所述第一数据块所对应的文件。 上述装置还包括: 第三获取模块, 设置为获取所述第一数据块的特征信息, 其中, 所述特征信息用于表示仅所述第一数据块具有的内容; 通知模块, 设置为将所述特征 信息通知给当前分布式文件系统以及与所述当前分布式文件系统相连的其它分布式文 件系统, 其中, 所述特征信息用于对所述当前分布式文件系统以及所述其它分布式文 件系统进行消重处理。 上述装置还包括: 计数模块, 设置为在将所述第二数据块复制到所述第一存储媒 介或第二存储媒介之后, 将所述第二访问次数减去所述第一访问次数, 得到所述第一 数据块的最新访问次数, 以及将所述第一数据块的被引用计数减 1。 通过本发明实施例, 采用根据对第一数据块所对应文件的访问次数分别与第一阈 值和第二阈值进行比较, 根据比较结果确定将第一数据块恢复到第一存储媒体或第二 存储媒介的技术手段, 解决了相关技术中, 对同一数据块的访问过度密集等问题, 从 而提高了对文件的访问效率。 附图说明 
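The accompanying drawings described here provide a further understanding of the present invention and form a part of this application. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:

Figure 1 is a flowchart of a method for restoring deduplicated data according to an embodiment of the present invention;
Figure 2 is a structural block diagram of a device for restoring deduplicated data according to an embodiment of the present invention;
Figure 3 is another structural block diagram of the device for restoring deduplicated data according to an embodiment of the present invention;
Figure 4 is a structural block diagram of a distributed file system according to a preferred embodiment of the present invention;
Figure 5 is a schematic diagram of the data block deduplication flow according to a preferred embodiment of the present invention;
Figure 6 is a schematic diagram of the data block restoration flow according to a preferred embodiment of the present invention;
Figure 7 is a flowchart of the method for restoring deduplicated data according to a preferred embodiment of the present invention; and
Figure 8 is another flowchart of the method for restoring deduplicated data according to a preferred embodiment of the present invention.

Detailed Description

The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. Note that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another.

The following embodiments may be applied to a computer, for example a PC. They may also be applied to mobile terminals that run a smart operating system, and are not limited to these. There are no special requirements for the operating system of the computer or mobile terminal, as long as it supports running applications. For example, the following embodiments may be applied to the Windows operating system.

Figure 1 is a flowchart of a method for restoring deduplicated data according to an embodiment of the present invention. As shown in Figure 1, the method includes:

Step S102: obtain a first access count of the file corresponding to a first data block, where the first access count indicates the number of accessors currently accessing the file simultaneously.

In this embodiment, to further improve the efficiency of access to the data block, the access count of the first data block itself also needs to be considered before step S102. Specifically: obtain a second access count of the first data block, where the second access count indicates the number of accessors currently accessing the first data block simultaneously; when the second access count is greater than a third threshold, look up the file corresponding to the first data block; then obtain the first access count of that file.

To implement deduplication across systems and across servers, the following processing also needs to be performed: obtain feature information of the first data block, where the feature information represents content that only the first data block has; notify the current distributed file system, and the other distributed file systems connected to it, of the feature information, where the feature information is used to deduplicate the other distributed file systems. Before deduplication is performed across servers, the feature information needs to be notified to the node servers in the current distributed system.

Step S104: compare the first access count with a first threshold and a second threshold respectively, where the first threshold is less than the second threshold.

Step S106: according to the comparison result, restore the first data block to a first storage medium or a second storage medium. When the first access count is greater than the first threshold and less than the second threshold, the first data block is restored to the first storage medium; when the first access count is greater than the second threshold, the first data block is restored to the second storage medium. The access efficiency of the second storage medium is higher than that of the first storage medium.

Restoring the first data block to the first storage medium or the second storage medium may take the form of the following process: copy the first data block to obtain a second data block; copy the second data block to the first storage medium or the second storage medium.

After the second data block has been copied to the first storage medium or the second storage medium, the first access count is subtracted from the second access count to obtain the latest access count of the first data block, and the reference count of the first data block is decremented by 1.

This embodiment also provides a device for restoring deduplicated data, which implements the above embodiments and preferred implementations; what has already been described is not repeated. The modules involved in the device are described below. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Figure 2 is a structural block diagram of a device for restoring deduplicated data according to an embodiment of the present invention. As shown in Figure 2, the device includes:

a first acquisition module 20, configured to obtain a first access count of the file corresponding to a first data block, where the first access count indicates the number of accessors currently accessing the file simultaneously;

a comparison module 22, connected to the first acquisition module 20 and configured to compare the first access count with a first threshold and a second threshold respectively, where the first threshold is less than the second threshold;

a restoration module 24, connected to the comparison module 22 and configured to restore, according to the comparison result, the first data block to a first storage medium or a second storage medium. When the first access count is greater than the first threshold and less than the second threshold, the first data block is restored to the first storage medium; when the first access count is greater than the second threshold, the first data block is restored to the second storage medium. The access efficiency of the second storage medium is higher than that of the first storage medium.

In this embodiment, as shown in Figure 3, the device further includes: a second acquisition module 26, configured to obtain a second access count of the first data block, where the second access count indicates the number of accessors currently accessing the first data block simultaneously; and a query module 28, connected to the second acquisition module 26 and configured to look up the file corresponding to the first data block when the second access count is greater than a third threshold.

Optionally, as shown in Figure 3, the device may further include the following processing modules: a third acquisition module 30, configured to obtain feature information of the first data block, where the feature information represents content that only the first data block has; and a notification module 32, configured to notify the current distributed file system, and the other distributed file systems connected to it, of the feature information, where the feature information is used to deduplicate the current distributed file system and the other distributed file systems. Here "connected" may mean that the two can communicate with each other.

Optionally, as shown in Figure 3, the device may further include the following processing module: a counting module 34, configured to, after the second data block has been copied to the first storage medium or the second storage medium, subtract the first access count from the second access count to obtain the latest access count of the first data block, and decrement the reference count of the first data block by 1.

For a better understanding of the above embodiments, they are described in detail below in conjunction with preferred embodiments.

Embodiment 1

Figure 4 is a structural block diagram of a distributed file system according to a preferred embodiment of the present invention. As shown in Figure 4, the system includes:

Metadata server 40: manages metadata such as file names and data blocks for all files in this file system, and provides metadata write and query operations to the file access client. On top of the existing chunk metadata, a chunk reference counter is added to support deduplication. For example, if a chunk of file A has the same content as a chunk of file B and needs to be deduplicated, file B's chunk can be deleted and file B pointed to file A's chunk, with the reference count of file A's chunk incremented by 1. The server also counts the reads and writes in progress on each chunk, recording how many operations are currently active on it; when this number exceeds the configured threshold, the server asks the file location register to restore the deduplicated chunk, to cope with an excessive number of accesses.

File access client 42: provides the applications served by this file system with interface calls similar to those of a standard file system.

File access server 44: interacts with the storage media in this file system and performs the actual reading and writing of data blocks. It responds to data read and write requests from the file access client, reading data from the storage medium and returning it to the file access client, and reading data from the file access client and writing it to the storage medium.

File location register 46: responsible for file access control, data file distribution, and management of various data. The file location register may also include a file mapping database, which stores the mapping table between files and chunks and counts the simultaneous accesses to each file.

Storage medium 48: generally one of the following: an ordinary Integrated Drive Electronics (IDE) disk, a Serial Advanced Technology Attachment (SATA) disk, a Secure Digital (SD) disk, or a Solid State Disk (SSD).

When a user needs to read or write a file, the read or write instruction is sent to the file access client. The chunk information corresponding to the file is then obtained through the file location register and the metadata server, and finally the file access server returns the specific on-disk file information to the user. Specifically, this process may take the form of the following steps:

Step A. The file location register periodically asks the metadata server whether any chunk has no fingerprint computed. If so, the metadata server returns the chunk that needs a fingerprint, the register notifies the access server to compute the fingerprint, and the register then notifies the metadata server of this system, and the registers of the other systems connected to it, of the computed fingerprint. In this way any database connected to the register can perform deduplication, which achieves cross-server chunk deduplication.

Step B. The metadata server counts the accesses to chunk A. If the number of simultaneous accesses is greater than threshold n, it notifies the file location register, and the register looks up the file corresponding to the chunk in the file mapping database.

Step C. The mapping database computes the number of simultaneous accesses to the file it found. If that number is greater than the ordinary access threshold and less than the performance threshold, it notifies the register to restore the chunk to an ordinary disk; if it is greater than the performance threshold, the chunk is restored to an SSD.

Step D. The file location register notifies the metadata server to add a new chunk and tells it the access count of the file to be restored. The metadata server creates chunk B, subtracts the file's access count from the original chunk's access count, and decrements the original chunk's reference count by 1.

Step E. The file location register, according to the information to be copied, notifies the file access server to copy the file to the corresponding storage medium (for example, an ordinary disk or an SSD).

For a better understanding of the above implementation, it is described in detail below with reference to Figures 5 and 6.

As shown in Figure 5, the file location register periodically asks the metadata server of its own server whether any chunk needs a fingerprint computed. If so, the chunk to be processed is returned, the register has the access server compute the fingerprint, and the register then sends the computed fingerprint to the metadata server of its own server and to the servers of adjacent nodes. Each server then deduplicates itself through its own metadata queries. The embodiment therefore achieves deduplication not only on the local server but also, through message connections, on the servers of the whole system, which addresses the efficiency problem of deduplication and saves more space than the older single-server approach. The specific flow is as follows:

Step S502: periodically query whether any chunk has no fingerprint computed;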
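Step S504: return chunk A, which needs a fingerprint computed;
Step S506: pass the returned chunk A on for fingerprint computation;
Step S508: report the computed fingerprint value;
Step S510: report the computed fingerprint value and ask whether any chunk duplicates chunk A and needs to be deduplicated;
Step S512: return chunk B, which needs to be deleted;
Step S514: report the fingerprint value of chunk A and check whether B holds an identical one;
Step S516: report the computed fingerprint value and ask whether any chunk duplicates chunk A and needs to be deduplicated;
Step S518: return chunk C, which needs to be deleted;
Step S520: request that the file corresponding to chunk B be mapped to chunk A;
Step S522: mapping succeeded;
Step S524: request deletion of chunk B;
Step S526: deletion succeeded; request that the file corresponding to chunk B be mapped to chunk A;
Step S528: mapping succeeded;
Step S530: request deletion of chunk B;
Step S532: deletion succeeded.

As shown in Figure 6, the metadata server counts accesses to the chunks it stores. When a chunk's access count exceeds threshold n, the chunk information is reported to the file location register. The register uses the chunk to find the files that occupy it and sorts those files by access count. If a file's access count is greater than the ordinary access threshold but less than the performance threshold, the chunk is restored directly to an ordinary disk; if a file's access count is greater than the performance threshold, the deduplicated chunk corresponding to that file is restored to a solid-state disk, which has higher access efficiency. The specific flow is as follows: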
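Step S602: check whether the chunk's simultaneous access count is greater than threshold n (n is a natural number); if it is greater than n and the reference count is greater than 1, notify the register;
Step S604: find the corresponding file fileid from the chunk;
Step S606: if the number of accesses in progress on the fileid found is greater than I but less than m (m is a natural number), ask the register to restore the chunk to an ordinary disk; if it is greater than m, restore the chunk to an SSD;
Step S608: map chunk B to the file to be restored;
Step S610: return success;
Step S612: request that a new chunk be added, and decrement the original chunk's reference count by 1;
Step S614: return the newly added chunk B;
Step S616: copy the chunk information to the ordinary disk or the SSD;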
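Step S618: return success;
Step S620: perform the copy.

Embodiment 2

When a file is opened, the file access client sends the file information to the location register of the default file system. The location register looks up the chunk corresponding to the file in the mapping database. If it finds that the chunk is not stored in this system, it tells the access client to query the file system where the chunk resides. Cross-system file access is therefore easy to implement.

One implementation of the present invention is illustrated by opening file A. As shown in Figure 7, it includes the following steps:

Step S702: open file A;
Step S704: look up the chunk for file A, and count the accesses to the file;
Step S706: return chunk A;
Step S708: request a count of the simultaneous accesses to chunk A;
Step S710: find that the simultaneous access count of chunk A is greater than the access threshold n and that its reference count is greater than 1;
Step S712: look up the files corresponding to chunk A;
Step S714: return file A and file B, the files corresponding to chunk A, where the access count of file A is greater than the performance threshold m;
Step S716: request that a new chunk be added, and report the access count of file A;
Step S718: return the newly added chunk B, subtract the access count of file A from the access count of chunk A, decrement the reference count of chunk A by 1, and remove the mapping between file A and chunk A;
Step S720: request that file A be mapped to chunk B;
Step S722: the file database returns mapping success;
Step S724: request that the content of chunk A be copied to chunk B, and that chunk B be copied to the SSD to improve access efficiency;
Step S726: the file access server returns copy success;
Step S728: the location register returns chunk B.

A second implementation is illustrated by opening file A when, during the open, the access count of another file that shares the same chunk is also found to exceed a threshold. As shown in Figure 8:

Step S802: open file A;
Step S804: look up the chunk for file A, and count the accesses to the file;
Step S806: return chunk A;
Step S808: request a count of the simultaneous accesses to chunk A;
Step S810: find that the simultaneous access count of chunk A is greater than the access threshold n and that its reference count is greater than 1;
Step S812: look up the files corresponding to chunk A;
Step S814: return file A and file B, the files corresponding to chunk A, where the access count of file B is greater than the performance threshold m and the access count of file A is greater than the ordinary threshold I;
Step S816: request that a new chunk be added, and report the access count of file A;
Step S818: return the newly added chunk B, subtract the access count of file A from the access count of chunk A, decrement the reference count of chunk A by 1, and remove the mapping between file A and chunk A;
Step S820: request that file A be mapped to chunk B;
Step S822: the file database returns mapping success;
Step S824: request that a new chunk be added, and report the access count of file B;
Step S826: return the newly added chunk C, subtract the access count of file B from the access count of chunk A, decrement the reference count of chunk A by 1, and remove the mapping between file B and chunk A;
Step S828: request that file B be mapped to chunk C;
Step S830: the file database returns mapping success;
Step S832: request that the content of chunk A be copied to chunk B; because only the ordinary threshold is exceeded here, the copy is placed on an ordinary disk;
Step S834: the file access server returns copy success;
Step S836: the location register returns chunk B;
Step S838: while file A is being operated on, request that the content of chunk A be copied to chunk C; because the performance threshold is exceeded here, the copy is placed on the SSD;
Step S840: the file access server returns copy success.

In summary, the embodiments of the present invention achieve the following beneficial effects:

The embodiments address the problem in the related art that deduplication causes overly dense access to the same data block and so reduces access efficiency. They judge whether access is dense by counting the simultaneous accesses to the chunk and to the file, and grade file access into an ordinary access level and a performance access level. Based on these two values the chunk is restored to an ordinary disk or to an SSD, so chunks are restored in tiers according to access count, which balances efficiency against cost. In addition, deduplicating directly on the metadata increases the granularity of deduplication and enables cross-system deduplication, which saves more disk space. The use of reference counts also reduces the chance of having to search the file mapping table when a chunk is modified, which improves system efficiency.

In another embodiment, software is also provided, which is used to execute the technical solutions described in the above embodiments and preferred implementations.

In another embodiment, a storage medium is also provided, which stores the above software. The storage medium includes but is not limited to: an optical disc, a floppy disk, a hard disk, erasable memory, and so on.

Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases the steps shown or described can be performed in an order different from the one given here, or they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.

The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Industrial Applicability

The technical solution provided by the present invention can be applied to the restoration of deduplicated data. It compares the access count of the file corresponding to the first data block with the first threshold and the second threshold respectively, and uses the comparison result to decide whether to restore the first data block to the first storage medium or the second storage medium. This solves problems in the related art such as overly dense access to the same data block, and thereby improves file access efficiency.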
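The two-threshold comparison summarized above can be sketched in a few lines of Python. This is only an illustrative sketch, not the applicant's implementation: the names `Medium` and `choose_medium` are hypothetical, and the behaviour when the access count exactly equals a threshold is an assumption, since the text only specifies the strict "greater than" and "less than" cases.

```python
from enum import Enum
from typing import Optional


class Medium(Enum):
    ORDINARY_DISK = "ordinary disk"  # first storage medium
    SSD = "ssd"                      # second storage medium, higher access efficiency


def choose_medium(file_access_count: int,
                  first_threshold: int,
                  second_threshold: int) -> Optional[Medium]:
    """Pick the restore target for a deduplicated chunk.

    Returns None when no restore is triggered. Counts equal to a threshold
    also return None; that boundary choice is an assumption of this sketch.
    """
    if first_threshold >= second_threshold:
        raise ValueError("first threshold must be less than second threshold")
    if file_access_count > second_threshold:
        return Medium.SSD
    if first_threshold < file_access_count < second_threshold:
        return Medium.ORDINARY_DISK
    return None
```

For example, with a first threshold of 3 and a second threshold of 10, a file with 5 simultaneous accessors would have its chunk restored to the ordinary disk, and a file with 11 would have it restored to the SSD.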

Claims

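1. A method for restoring deduplicated data, comprising:
obtaining a first access count of the file corresponding to a first data block, wherein the first access count indicates the number of accessors currently accessing the file simultaneously;
comparing the first access count with a first threshold and a second threshold respectively, wherein the first threshold is less than the second threshold;
restoring, according to the comparison result, the first data block to a first storage medium or a second storage medium, wherein when the first access count is greater than the first threshold and less than the second threshold, the first data block is restored to the first storage medium; when the first access count is greater than the second threshold, the first data block is restored to the second storage medium; and the access efficiency of the second storage medium is higher than that of the first storage medium.
2. The method according to claim 1, wherein before obtaining the first access count of the file corresponding to the first data block, the method comprises: obtaining a second access count of the first data block, wherein the second access count indicates the number of accessors currently accessing the first data block simultaneously; and when the second access count is greater than a third threshold, looking up the file corresponding to the first data block.
3. The method according to claim 2, wherein before obtaining the second access count of the first data block, the method comprises:
obtaining feature information of the first data block, wherein the feature information represents content that only the first data block has;
notifying a current distributed file system, and other distributed file systems connected to the current distributed file system, of the feature information, wherein the feature information is used to deduplicate the current distributed file system and the other distributed file systems.
4. The method according to claim 3, wherein notifying the current distributed file system of the feature information comprises:
notifying node servers in the current distributed system of the feature information.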
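As a rough illustration of how feature information (a fingerprint) might be computed and announced to node servers so that each can find its own duplicates, consider the sketch below. The document does not name a fingerprint algorithm; SHA-256 is an assumption here, and `NodeServer`, `duplicates_of`, and `announce` are hypothetical names introduced only for this example.

```python
import hashlib
from typing import Dict, List


def fingerprint(chunk_bytes: bytes) -> str:
    """Feature information for a chunk. SHA-256 is an assumed choice."""
    return hashlib.sha256(chunk_bytes).hexdigest()


class NodeServer:
    """Hypothetical node server that remembers a fingerprint per chunk id."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.fingerprints: Dict[str, str] = {}  # chunk id -> fingerprint

    def store(self, chunk_id: str, data: bytes) -> None:
        self.fingerprints[chunk_id] = fingerprint(data)

    def duplicates_of(self, fp: str, origin_chunk_id: str) -> List[str]:
        """Local chunks whose fingerprint matches the announced one."""
        return [cid for cid, local_fp in self.fingerprints.items()
                if local_fp == fp and cid != origin_chunk_id]


def announce(fp: str, origin_chunk_id: str,
             servers: List[NodeServer]) -> Dict[str, List[str]]:
    """Notify every server of a fingerprint and collect its duplicate chunks."""
    report: Dict[str, List[str]] = {}
    for server in servers:
        dups = server.duplicates_of(fp, origin_chunk_id)
        if dups:
            report[server.name] = dups
    return report
```

Each server reports only its own matching chunks, so the decision to remap files and delete a duplicate stays local to the server that holds it.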
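5. The method according to claim 2, wherein restoring the first data block to the first storage medium or the second storage medium comprises:
copying the first data block to obtain a second data block;
copying the second data block to the first storage medium or the second storage medium.
6. The method according to claim 5, wherein after copying the second data block to the first storage medium or the second storage medium, the method further comprises:
subtracting the first access count from the second access count to obtain the latest access count of the first data block, and decrementing the reference count of the first data block by 1.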
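The counter bookkeeping recited above might look like the following minimal Python sketch. `ChunkMeta` and `split_off_copy` are hypothetical names. The text states only how the original block's counters change; giving the new block the file's access count and a reference count of 1 is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ChunkMeta:
    chunk_id: str
    access_count: int  # simultaneous accessors of this chunk (second access count)
    ref_count: int     # number of files that reference this chunk


def split_off_copy(original: ChunkMeta, file_access_count: int,
                   new_chunk_id: str) -> ChunkMeta:
    """Update counters after one file receives its own restored copy."""
    if original.ref_count <= 1:
        raise ValueError("chunk is not shared, so there is nothing to restore")
    if file_access_count > original.access_count:
        raise ValueError("file access count exceeds chunk access count")
    # Counter updates stated in the text for the original (first) data block.
    original.access_count -= file_access_count
    original.ref_count -= 1
    # Assumed initial counters for the restored (second) data block.
    return ChunkMeta(new_chunk_id, access_count=file_access_count, ref_count=1)
```

With a chunk shared by two files and accessed 12 times at once, splitting off a file that accounts for 9 of those accesses leaves the original chunk with an access count of 3 and a reference count of 1.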
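7. A device for restoring deduplicated data, comprising:
a first acquisition module, configured to obtain a first access count of the file corresponding to a first data block, wherein the first access count indicates the number of accessors currently accessing the file simultaneously;
a comparison module, configured to compare the first access count with a first threshold and a second threshold respectively, wherein the first threshold is less than the second threshold;
a restoration module, configured to restore, according to the comparison result, the first data block to a first storage medium or a second storage medium, wherein when the first access count is greater than the first threshold and less than the second threshold, the first data block is restored to the first storage medium; when the first access count is greater than the second threshold, the first data block is restored to the second storage medium; and the access efficiency of the second storage medium is higher than that of the first storage medium.
8. The device according to claim 7, further comprising: a second acquisition module, configured to obtain a second access count of the first data block, wherein the second access count indicates the number of accessors currently accessing the first data block simultaneously;
a query module, configured to look up the file corresponding to the first data block when the second access count is greater than a third threshold.
9. The device according to claim 8, further comprising: a third acquisition module, configured to obtain feature information of the first data block, wherein the feature information represents content that only the first data block has;
a notification module, configured to notify a current distributed file system, and other distributed file systems connected to the current distributed file system, of the feature information, wherein the feature information is used to deduplicate the current distributed file system and the other distributed file systems.
10. The device according to claim 9, further comprising: a counting module, configured to, after the second data block has been copied to the first storage medium or the second storage medium, subtract the first access count from the second access count to obtain the latest access count of the first data block, and decrement the reference count of the first data block by 1.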
PCT/CN2014/075850 2013-11-26 2014-04-21 Method and device for restoring deduplicated data WO2015078136A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
RU2016124319A RU2665272C1 (ru) 2013-11-26 2014-04-21 Method and device for restoring deduplicated data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310612870.X 2013-11-26
CN201310612870.XA CN104679746A (zh) 2013-11-26 2013-11-26 Method and device for restoring deduplicated data

Publications (1)

Publication Number Publication Date
WO2015078136A1 true WO2015078136A1 (zh) 2015-06-04

Family

ID=53198283

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/075850 WO2015078136A1 (zh) 2013-11-26 2014-04-21 Method and device for restoring deduplicated data

Country Status (3)

Country Link
CN (1) CN104679746A (zh)
RU (1) RU2665272C1 (zh)
WO (1) WO2015078136A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
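CN101877725A (zh) * 2010-06-25 2010-11-03 中兴通讯股份有限公司 Replica management method and device in a distributed storage system
CN102375893A (zh) * 2011-11-17 2012-03-14 浪潮(北京)电子信息产业有限公司 Distributed file system and method for creating replicas in it
CN102385554A (zh) * 2011-10-28 2012-03-21 华中科技大学 Optimization method for a data deduplication system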
US8489832B1 (en) * 2009-12-10 2013-07-16 Guidance-Tableau, Llc System and method for duplicating electronically stored data
CN103220367A (zh) * 2013-05-13 2013-07-24 深圳市中博科创信息技术有限公司 Data replication method and data storage system
US20130290274A1 (en) * 2012-04-25 2013-10-31 International Business Machines Corporation Enhanced reliability in deduplication technology over storage clouds

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2315349C1 (ru) * 2006-07-12 2008-01-20 Михаил ТОПР Method for replicating information in distributed databases and system for implementing it
US8458193B1 (en) * 2012-01-31 2013-06-04 Google Inc. System and method for determining active topics
CN103034592B (zh) * 2012-12-05 2016-09-28 华为技术有限公司 Data processing method and device

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10489403B2 (en) 2014-10-08 2019-11-26 International Business Machines Corporation Embracing and exploiting data skew during a join or groupby
US20160275171A1 (en) * 2015-03-20 2016-09-22 International Business Machines Corporation Efficient performance of insert and point query operations in a column store
US9922064B2 (en) 2015-03-20 2018-03-20 International Business Machines Corporation Parallel build of non-partitioned join hash tables and non-enforced N:1 join hash tables
US10387397B2 (en) 2015-03-20 2019-08-20 International Business Machines Corporation Parallel build of non-partitioned join hash tables and non-enforced n:1 join hash tables
US10394783B2 (en) 2015-03-20 2019-08-27 International Business Machines Corporation Parallel build of non-partitioned join hash tables and non-enforced N:1 join hash tables
US10650011B2 (en) * 2015-03-20 2020-05-12 International Business Machines Corporation Efficient performance of insert and point query operations in a column store
US11061878B2 (en) 2015-03-20 2021-07-13 International Business Machines Corporation Parallel build of non-partitioned join hash tables and non-enforced N:1 join hash tables
US10108653B2 (en) 2015-03-27 2018-10-23 International Business Machines Corporation Concurrent reads and inserts into a data structure without latching or waiting by readers
US10831736B2 (en) 2015-03-27 2020-11-10 International Business Machines Corporation Fast multi-tier indexing supporting dynamic update
US11080260B2 (en) 2015-03-27 2021-08-03 International Business Machines Corporation Concurrent reads and inserts into a data structure without latching or waiting by readers

Also Published As

Publication number Publication date
CN104679746A (zh) 2015-06-03
RU2016124319A (ru) 2018-01-09
RU2665272C1 (ru) 2018-08-28

Similar Documents

Publication Publication Date Title
US10489059B2 (en) Tier-optimized write scheme
US10545833B1 (en) Block-level deduplication
US20230259495A1 (en) Global deduplication
US9703803B2 (en) Replica identification and collision avoidance in file system replication
US10216740B2 (en) System and method for fast parallel data processing in distributed storage systems
US10417190B1 (en) Log-structured file system for zone block devices with small zones
US8112463B2 (en) File management method and storage system
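CN102594849B (zh) Data backup and recovery methods, and virtual machine snapshot deletion and rollback methods and devices
JP6026738B2 (ja) System and method for improving the scalability of a deduplicated storage system
US9678975B2 (en) Reducing digest storage consumption in a data deduplication system
CN106682186B (zh) File access control list management method and related device and system
US20160196320A1 (en) Replication to the cloud
WO2016041384A1 (zh) Data deduplication method and device
US9785643B1 (en) Systems and methods for reclaiming storage space in deduplicating data systems
WO2015078136A1 (zh) Method and device for restoring deduplicated data
US20190272260A1 (en) Remote Durable Logging for Journaling File Systems
WO2015054897A1 (zh) Data storage method, data storage apparatus, and storage device
US9892041B1 (en) Cache consistency optimization
CN106528338B (zh) Remote data replication method, storage device, and storage system
WO2015051641A1 (zh) Method and device for reclaiming disk image space
US9336250B1 (en) Systems and methods for efficiently backing up data
CN104965835A (zh) File read/write method and device for a distributed file system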
US20220342851A1 (en) File system event monitoring using metadata snapshots
Viji et al. Various data deduplication techniques of primary storage
US10789002B1 (en) Hybrid data deduplication for elastic cloud storage devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14865050

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2016124319

Country of ref document: RU

Kind code of ref document: A

122 Ep: pct application non-entry in european phase

Ref document number: 14865050

Country of ref document: EP

Kind code of ref document: A1