WO2016091138A1 - Data reduction method and apparatus - Google Patents

Data reduction method and apparatus

Info

Publication number
WO2016091138A1
Authority
WO
WIPO (PCT)
Prior art keywords
stored
data
data block
preset
storage address
Prior art date
Application number
PCT/CN2015/096568
Other languages
English (en)
French (fr)
Inventor
金添福
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2016091138A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems

Definitions

  • Embodiments of the present invention relate to storage technologies, and in particular, to a data reduction method and apparatus.
  • Data reduction mainly comprises three processes: blocking, deduplication, and compression. Deduplication in turn comprises fingerprint calculation and duplicate checking.
  • First, the storage server receives a write request sent by the client, where the write request includes the data to be stored. Second, the storage server segments the data to be stored through blocking, dividing it into to-be-stored data blocks of a preset size.
  • Then, for each to-be-stored data block, the storage server obtains the corresponding fingerprint identifier by using a fingerprint algorithm and, through duplicate checking, determines whether the obtained fingerprint identifier is the same as a fingerprint identifier already stored in the fingerprint table. If it is the same, the to-be-stored data block corresponding to the fingerprint identifier duplicates a data block already stored in the storage server and does not need to be stored. If it is different, the to-be-stored data block is compressed, the compressed data block is stored in the storage server, and the fingerprint identifier is added to the fingerprint table.
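  • The conventional flow described above (blocking, fingerprint calculation, duplicate check against a fingerprint table, compression of new blocks) can be sketched as follows. This is only an illustrative sketch: the source does not name a fingerprint or compression algorithm, so SHA-256 and zlib are assumptions, as are the 1024-byte block size and the names `split_into_blocks` and `reduce_baseline`.

```python
import hashlib
import zlib

BLOCK_SIZE = 1024  # assumed preset block size in bytes


def split_into_blocks(data, size=BLOCK_SIZE):
    """Blocking: cut the data to be stored into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def reduce_baseline(data, fingerprint_table, store):
    """Deduplicate every block and compress only new ones.

    fingerprint_table maps a fingerprint to the index of the stored block.
    Returns the number of blocks actually written to `store`.
    """
    written = 0
    for block in split_into_blocks(data):
        fingerprint = hashlib.sha256(block).hexdigest()  # fingerprint calculation
        if fingerprint in fingerprint_table:             # duplicate check
            continue                                     # duplicate: nothing to store
        store.append(zlib.compress(block))               # compress, then store
        fingerprint_table[fingerprint] = len(store) - 1  # record the new fingerprint
        written += 1
    return written
```

  • Note that every block pays for a hash computation and a table lookup, even a block that can never match; this is the CPU cost the document sets out to avoid.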
  • Embodiments of the present invention provide a data reduction method and apparatus to solve the problem of wasted CPU resources in a storage server or a memory.
  • According to a first aspect, an embodiment of the present invention provides a data reduction method, including: determining, according to feature information of data to be stored, whether deduplication processing needs to be performed on a to-be-stored data block in the data to be stored; if so, performing deduplication processing on the to-be-stored data block; otherwise, performing compression processing on the to-be-stored data block.
  • Determining, according to the feature information of the data to be stored, whether deduplication processing needs to be performed on the to-be-stored data block includes: determining, according to location information of the data to be stored and/or content information of the to-be-stored data block, whether deduplication processing needs to be performed on the to-be-stored data block.
  • Determining, according to the location information of the data to be stored, whether deduplication processing needs to be performed on the to-be-stored data block includes: determining, according to the relative position relationship between the storage address corresponding to the data to be stored and a preset storage address, whether deduplication processing needs to be performed on the to-be-stored data block.
  • The preset storage address includes a first preset storage address and a second preset storage address; data between the first preset storage address and the second preset storage address is data that does not need to be deduplicated.
  • The preset storage address further includes a third preset storage address and a fourth preset storage address; data between the third preset storage address and the fourth preset storage address is data that needs to be deduplicated.
  • The data that does not need to be deduplicated is metadata.
  • Determining, according to the content information of the to-be-stored data block, whether deduplication processing needs to be performed on the to-be-stored data block includes: determining, according to the matching relationship between preset content and the content between a first preset offset position and a second preset offset position of the to-be-stored data block, whether deduplication processing needs to be performed on the to-be-stored data block.
  • The preset content includes content contained in a to-be-stored data block that does not need to be deduplicated; the first preset offset position and the second preset offset position indicate the relative position of the preset content in the to-be-stored data block.
  • The preset content further includes content contained in a to-be-stored data block that needs to be deduplicated.
  • The content includes: a label.
  • With reference to the fifth to seventh possible implementations of the first aspect, for the content contained in a to-be-stored data block that does not need to be deduplicated: if the size of the to-be-stored data block is 1K, the first preset offset position is 0 and the second preset offset position is 3.
  • an embodiment of the present invention provides a device for data reduction, where the device is a storage server, or is a memory including a control unit, and the device includes:
  • a determining module configured to determine, according to the feature information of the data to be stored, whether to perform deduplication processing on the to-be-stored data block in the to-be-stored data;
  • a processing module configured to perform de-duplication processing on the to-be-stored data block when the determining module determines that de-duplication processing is to be performed on the to-be-stored data block; otherwise, perform compression processing on the to-be-stored data block.
  • The determining module is specifically configured to determine, according to location information of the data to be stored and/or content information of the to-be-stored data block, whether deduplication processing needs to be performed on the to-be-stored data block.
  • The determining module is specifically configured to determine, according to the relative position relationship between the storage address corresponding to the data to be stored and a preset storage address, whether deduplication processing needs to be performed on the to-be-stored data block.
  • The preset storage address includes a first preset storage address and a second preset storage address; data between the first preset storage address and the second preset storage address is data that does not need to be deduplicated.
  • The preset storage address further includes a third preset storage address and a fourth preset storage address; data between the third preset storage address and the fourth preset storage address is data that needs to be deduplicated.
  • The data that does not need to be deduplicated is metadata.
  • The determining module is specifically configured to determine, according to the matching relationship between preset content and the content between a first preset offset position and a second preset offset position of the to-be-stored data block, whether deduplication processing needs to be performed on the to-be-stored data block.
  • the preset content includes content that is to be included in a data block to be stored that does not need to be deduplicated;
  • the first preset offset position and the second preset offset position are used to indicate a relative position of the preset content in a data block to be stored.
  • the preset content further includes content that needs to be included in the data block to be stored that needs to be deduplicated.
  • the content includes: a label.
  • For the content contained in a to-be-stored data block that does not need to be deduplicated: if the size of the to-be-stored data block is 1K, the first preset offset position is 0 and the second preset offset position is 3.
  • Embodiments of the present invention provide a data reduction method and apparatus. Whether deduplication processing needs to be performed on a to-be-stored data block in the data to be stored is determined according to feature information of the data to be stored; if so, deduplication processing is performed on the to-be-stored data block; if not, compression processing is performed on it. The storage server or the memory therefore no longer performs deduplication processing on data blocks that cannot be deduplicated or that have a low deduplication rate. This avoids fingerprint calculation and duplicate checking for such data blocks, reduces the CPU resource consumption of the storage server or the memory, and thereby solves the problem of wasted CPU resources of the storage server or the memory.
  • FIG. 1 is a first schematic diagram of an application scenario of a data reduction method according to the present invention.
  • FIG. 2 is a second schematic diagram of an application scenario of a data reduction method according to the present invention.
  • FIG. 3 is a flowchart of Embodiment 1 of a method for data reduction according to the present invention.
  • FIG. 4 is a flowchart of Embodiment 2 of a method for data reduction according to the present invention.
  • FIG. 5 is a flowchart of Embodiment 4 of a method for data reduction according to the present invention.
  • FIG. 6 is a schematic structural diagram of Embodiment 1 of a data reduction device according to the present invention.
  • FIG. 7 is a schematic structural diagram of Embodiment 6 of the data reduction device of the present invention.
  • FIG. 1 is a schematic diagram 1 of an application scenario of a data reduction method according to the present invention.
  • A storage server 11 receives a write request sent by a client 12, and a CPU 111 in the storage server 11 performs blocking on the data to be stored.
  • For each to-be-stored data block, the CPU 111 obtains the corresponding fingerprint identifier by using a fingerprint algorithm and, through duplicate checking, determines whether the obtained fingerprint identifier is the same as a fingerprint identifier stored in the fingerprint table.
  • If it is the same, the to-be-stored data block corresponding to the fingerprint identifier duplicates a data block already stored in the storage server and does not need to be stored; if it is different, the to-be-stored data block is compressed, the compressed data block is stored in the memory 112 in the storage server 11, and the fingerprint identifier is added to the fingerprint table.
  • In the prior art, when a to-be-stored data block cannot be deduplicated (that is, its fingerprint identifier is necessarily different from every fingerprint identifier stored in the fingerprint table) or has a low deduplication rate, the block still has to undergo fingerprint calculation and duplicate checking during deduplication processing; CPU resources of the storage server 11 are therefore wasted.
  • The data reduction method of the present invention can also be applied to a scenario in which a first processing unit in the storage server sends a data write request to a second processing unit, and the second processing unit performs blocking and deduplication on the data to be stored; in the prior art, this scenario likewise wastes CPU resources of the storage server.
  • FIG. 2 is a second schematic diagram of an application scenario of the data reduction method of the present invention.
  • As shown in FIG. 2, the memory 21 receives a write request sent by the storage server 22, and the CPU 211 in the memory 21 performs blocking and deduplication on the data to be stored; similarly, in the prior art, CPU resources of the memory 21 are wasted.
  • The memory 21 is a memory including a control unit; for example, it may be a solid state drive (SSD) or a magnetic disk.
  • FIG. 3 is a flowchart of Embodiment 1 of a method for data reduction according to the present invention. As shown in FIG. 3, the method in this embodiment may include:
  • Step 301: Determine, according to the feature information of the data to be stored, whether deduplication processing needs to be performed on the to-be-stored data block in the data to be stored.
  • If so, step 302 is performed; otherwise, step 303 is performed.
  • the feature information of the data to be stored includes: location information of the data to be stored, and/or content information of the data block to be stored.
  • Step 302: Perform deduplication processing on the to-be-stored data block.
  • In step 302, if it is determined that the to-be-stored data block duplicates an already stored data block, the to-be-stored data block does not need to be stored; if it is determined that the to-be-stored data block does not duplicate any stored data block, the to-be-stored data block is compressed and the compressed data block is stored.
  • Step 303: Perform compression processing on the to-be-stored data block.
  • In the prior art, the storage server or the memory performs deduplication processing on all to-be-stored data blocks. Therefore, when a to-be-stored data block cannot be deduplicated or has a low deduplication rate, it still has to undergo fingerprint calculation and duplicate checking during deduplication processing, which wastes CPU resources of the storage server or the memory.
  • In this embodiment, the storage server or the memory determines, according to the feature information of the data to be stored, whether deduplication processing needs to be performed on the to-be-stored data block in the data to be stored; if so, deduplication processing is performed on the to-be-stored data block; if not, compression processing is performed on it. The storage server or the memory therefore no longer performs deduplication processing on data blocks that cannot be deduplicated or that have a low deduplication rate. This avoids fingerprint calculation and duplicate checking for such data blocks, reduces the CPU resource consumption of the storage server or the memory, and thereby solves the problem of wasted CPU resources of the storage server or the memory.
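  • Steps 301 to 303 can be sketched as a single dispatch function. This is a minimal sketch under stated assumptions: the decision of step 301 is passed in as a hypothetical `needs_dedup` callable, SHA-256 and zlib stand in for the unspecified fingerprint and compression algorithms, and the returned strings are illustrative labels only.

```python
import hashlib
import zlib


def reduce_block(block, needs_dedup, fingerprints, store):
    """Process one to-be-stored block following steps 301 to 303."""
    if needs_dedup(block):                           # step 301: decide
        fingerprint = hashlib.sha256(block).digest() # step 302: deduplicate
        if fingerprint in fingerprints:
            return "duplicate"                       # already stored, skip
        fingerprints.add(fingerprint)
        store.append(zlib.compress(block))           # new block: compress and store
        return "deduplicated-and-stored"
    store.append(zlib.compress(block))               # step 303: compress only
    return "compressed-only"
```

  • On the compress-only path no hash is computed and the fingerprint table is never touched, which is where the CPU saving described above comes from.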
  • FIG. 4 is a flowchart of Embodiment 2 of a method for data reduction according to the present invention. As shown in FIG. 4, the method in this embodiment may include:
  • Step 401: Determine, according to the location information of the data to be stored, whether deduplication processing needs to be performed on the to-be-stored data block in the data to be stored. Specifically, the determination may be made according to the relative position relationship between the storage address corresponding to the data to be stored and a preset storage address.
  • The preset storage address includes a first preset storage address and a second preset storage address; data between the first preset storage address and the second preset storage address is data that does not need to be deduplicated.
  • If deduplication processing is needed, step 402 is performed; otherwise, step 403 is performed.
  • The first preset storage address and the second preset storage address are the boundary values of the storage addresses corresponding to a first segment of storage space; the first segment of storage space is a segment of storage space in a storage medium that is used to store data that does not need to be deduplicated.
  • The storage medium includes a magnetic disk, a USB flash drive, an optical disc, and the like.
  • Metadata is system data that describes the characteristics of a file, such as access rights, access time, modification time, and modifier. Because any operation on a file changes its metadata, metadata is data that does not need to be deduplicated.
  • For example, the first preset storage address can be set to the start address of the first 1/8 of the storage space of a disk partition, and the second preset storage address can be set to the end address of that storage space. Whether deduplication processing needs to be performed on the to-be-stored data block is then determined by checking whether the storage address corresponding to the data to be stored lies between the first preset storage address and the second preset storage address. When it does, it is determined that deduplication processing does not need to be performed on the to-be-stored data block in the data to be stored.
  • The preset storage address may further include a third preset storage address and a fourth preset storage address; data between the third preset storage address and the fourth preset storage address is data that needs to be deduplicated.
  • The third preset storage address and the fourth preset storage address are the boundary values of the storage addresses corresponding to a second segment of storage space; the second segment of storage space is a segment of storage space in the storage medium that is used to store data that needs to be deduplicated.
  • Step 402: Perform deduplication processing on the to-be-stored data block.
  • Step 402 is the same as step 302, and details are not described herein again.
  • Step 403: Perform compression processing on the to-be-stored data block.
  • Step 403 is the same as step 303, and details are not described herein again.
  • In this embodiment, whether deduplication processing needs to be performed on the to-be-stored data block is determined according to the location information of the data to be stored; if so, deduplication processing is performed on the to-be-stored data block; if not, compression processing is performed on it. The storage server or the memory therefore no longer performs deduplication processing on data blocks that cannot be deduplicated or that have a low deduplication rate. This avoids fingerprint calculation and duplicate checking for such data blocks, reduces the CPU resource consumption of the storage server or the memory, and thereby solves the problem of wasted CPU resources of the storage server or the memory.
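  • The location-based check of step 401 can be illustrated with the 1/8-of-partition example given above. The sketch below makes several assumptions that the source does not state: addresses are plain integers, both boundary addresses are inclusive, the second preset address is the last address of the first eighth of the partition, and the helper names `first_eighth_bounds` and `needs_dedup_by_address` are hypothetical.

```python
def first_eighth_bounds(partition_start, partition_size):
    """Return (first, second) preset addresses bounding the first 1/8 of a partition."""
    first = partition_start
    second = partition_start + partition_size // 8 - 1  # inclusive end (assumption)
    return first, second


def needs_dedup_by_address(address, no_dedup_range):
    """Step 401: data stored inside the preset range is not deduplicated."""
    first, second = no_dedup_range
    return not (first <= address <= second)
```

  • For a partition of 8000 addressable units starting at 0, the range is (0, 999): a write to address 500 skips deduplication, while a write to address 1000 goes through it.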
  • Optionally, step 401 may further be: determining, according to the storage address corresponding to the data to be stored and a pre-stored location class rule, whether deduplication processing needs to be performed on the to-be-stored data block in the data to be stored.
  • The location class rule includes a rule determined according to the relative position relationship between the storage address corresponding to the data to be stored and a preset storage address; the preset storage address includes a first preset storage address and a second preset storage address; data between the first preset storage address and the second preset storage address is data that does not need to be deduplicated.
  • For example, the pre-stored location class rules may be as shown in Table 1:
  • N1, N2, N3, and N4 are preset storage addresses, where N1 is greater than N2, N3 is greater than N1, and N4 is less than N2.
  • In this embodiment, whether deduplication processing needs to be performed on the to-be-stored data block in the data to be stored is determined according to the storage address corresponding to the data to be stored and the pre-stored location class rule; if so, deduplication processing is performed on the to-be-stored data block; if not, compression processing is performed on it. The storage server or the memory therefore no longer performs deduplication processing on data blocks that cannot be deduplicated or that have a low deduplication rate. This avoids fingerprint calculation and duplicate checking for such data blocks, reduces the CPU resource consumption of the storage server or the memory, and thereby solves the problem of wasted CPU resources of the storage server or the memory.
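  • A pre-stored set of location class rules could be represented as a small lookup table. Because Table 1 itself is not reproduced in this text, the sketch below does not attempt to encode it: the `LocationRule` structure, the example address ranges, the inclusive bounds, and the default applied when no rule matches are all assumptions made purely for illustration.

```python
from typing import NamedTuple


class LocationRule(NamedTuple):
    low: int      # lower preset storage address (inclusive, assumed)
    high: int     # upper preset storage address (inclusive, assumed)
    dedup: bool   # True: deduplicate data in this range; False: compress only


def decide_by_location(address, rules, default=True):
    """Return whether data at `address` should be deduplicated."""
    for rule in rules:
        if rule.low <= address <= rule.high:
            return rule.dedup
    return default  # no rule matched: fall back to the assumed default


# Hypothetical rule set: one no-dedup range and one dedup range.
EXAMPLE_RULES = [
    LocationRule(low=0, high=999, dedup=False),
    LocationRule(low=1000, high=7999, dedup=True),
]
```

  • Keeping the rules as data rather than code means the preset addresses can be changed per storage medium without touching the decision logic.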
  • FIG. 5 is a flowchart of Embodiment 4 of a method for data reduction according to the present invention. As shown in FIG. 5, the method in this embodiment may include:
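  • Step 501: Determine, according to the content information of the to-be-stored data block in the data to be stored, whether deduplication processing needs to be performed on the to-be-stored data block. Specifically, the determination may be made according to the matching relationship between preset content and the content between a first preset offset position and a second preset offset position of the to-be-stored data block.
  • The preset content includes content contained in a to-be-stored data block that does not need to be deduplicated; the first preset offset position and the second preset offset position indicate the relative position of the preset content in the to-be-stored data block.
  • If deduplication processing is needed, step 502 is performed; otherwise, step 503 is performed.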
  • the content includes: a label.
  • For example, for a 1K data block, the first preset offset position can be set to 0 and the second preset offset position can be set to 3; whether the block needs to be deduplicated is then determined by checking whether the content between the first preset offset position and the second preset offset position is "FILE".
  • Optionally, the preset content may further include content contained in a to-be-stored data block that needs to be deduplicated. In that case, determining, according to the matching relationship between the preset content and the content between the first preset offset position and the second preset offset position of the to-be-stored data block, whether deduplication processing needs to be performed further includes determining the cases in which deduplication processing is performed on the to-be-stored data block.
  • Step 502: Perform deduplication processing on the to-be-stored data block.
  • Step 502 is the same as step 302, and details are not described herein again.
  • Step 503: Perform compression processing on the to-be-stored data block.
  • Step 503 is the same as step 303, and details are not described herein again.
  • In this embodiment, whether deduplication processing needs to be performed on the to-be-stored data block is determined according to the matching relationship between the preset content and the content between the first preset offset position and the second preset offset position of the to-be-stored data block; if so, deduplication processing is performed on the to-be-stored data block; if not, compression processing is performed on it. The storage server or the memory therefore no longer performs deduplication processing on data blocks that cannot be deduplicated or that have a low deduplication rate. This avoids fingerprint calculation and duplicate checking for such data blocks, reduces the CPU resource consumption of the storage server or the memory, and thereby solves the problem of wasted CPU resources of the storage server or the memory.
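  • The content-based check of step 501 can be sketched using the example values given in this document: a label "FILE" occupying offsets 0 through 3 of a block. The sketch assumes both offsets are inclusive byte positions (so 0 to 3 spans four bytes) and that a match marks a block that does not need deduplication; the function name `needs_dedup_by_content` is hypothetical.

```python
def needs_dedup_by_content(block, first_offset=0, second_offset=3, preset=b"FILE"):
    """Step 501: compare the bytes at [first_offset, second_offset] with the preset content.

    A match means the block carries the label of data that does not need
    deduplication, so the function returns False for it.
    """
    label = block[first_offset:second_offset + 1]  # inclusive range (assumption)
    return label != preset
```

  • Comparing four bytes is far cheaper than hashing a whole 1K block, so the check itself adds little overhead to blocks that do end up being deduplicated.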
  • Optionally, step 501 may further be: determining, according to the content information of the to-be-stored data block and a pre-stored content class rule, whether deduplication processing needs to be performed on the to-be-stored data block.
  • The content class rule includes a rule determined according to the matching relationship between the preset content and the content between a first preset offset position and a second preset offset position of the to-be-stored data block.
  • The preset content includes content contained in a to-be-stored data block that does not need to be deduplicated; the first preset offset position and the second preset offset position indicate the relative position of the preset content in the to-be-stored data block.
  • For example, the pre-stored content class rules may be as shown in Table 2:
  • strl1 and strl2 are preset contents; n1, n2, n3, and n4 are preset offset addresses, where n2 is greater than n1, n4 is greater than n3, and n3 is greater than n2.
  • In this embodiment, whether deduplication processing needs to be performed on the to-be-stored data block is determined according to the content information of the to-be-stored data block and the pre-stored content class rule; if so, deduplication processing is performed on the to-be-stored data block; if not, compression processing is performed on it. This avoids fingerprint calculation and duplicate checking for data blocks that cannot be deduplicated or that have a low deduplication rate, reduces the CPU resource consumption of the storage server or the memory, and thereby solves the problem of wasted CPU resources of the storage server or the memory.
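  • Content class rules can likewise be held as a table of (offset range, preset content, decision) entries. Table 2 is not reproduced in this text, so the sketch below does not claim to encode it: the `ContentRule` structure, the example labels and offsets, the inclusive offsets, and the default for unmatched blocks are illustrative assumptions.

```python
from typing import NamedTuple


class ContentRule(NamedTuple):
    first_offset: int   # first preset offset position (inclusive, assumed)
    second_offset: int  # second preset offset position (inclusive, assumed)
    preset: bytes       # preset content expected between the two offsets
    dedup: bool         # decision applied when the content matches


def decide_by_content(block, rules, default=True):
    """Return whether `block` should be deduplicated according to the rules."""
    for rule in rules:
        if block[rule.first_offset:rule.second_offset + 1] == rule.preset:
            return rule.dedup
    return default  # no rule matched: fall back to the assumed default


# Hypothetical rule set: one label that disables dedup, one that enables it.
EXAMPLE_CONTENT_RULES = [
    ContentRule(first_offset=0, second_offset=3, preset=b"FILE", dedup=False),
    ContentRule(first_offset=8, second_offset=11, preset=b"DATA", dedup=True),
]
```

  • Rules are evaluated in order and the first match wins, which is one simple way to resolve a block that happens to match more than one entry.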
  • FIG. 6 is a schematic structural diagram of Embodiment 1 of the data reduction device of the present invention.
  • the device may be a storage server or a memory including a control unit.
  • The data reduction device of this embodiment may include a determining module 601 and a processing module 602.
  • The determining module 601 is configured to determine, according to the feature information of the data to be stored, whether deduplication processing needs to be performed on the to-be-stored data block in the data to be stored. The processing module 602 is configured to perform deduplication processing on the to-be-stored data block when the determining module 601 determines that deduplication processing is needed, and otherwise to perform compression processing on the to-be-stored data block.
  • the determining module 601 is configured to: determine, according to the location information of the to-be-stored data, and/or the content information of the to-be-stored data block, whether to perform deduplication processing on the to-be-stored data block.
  • the data reduction device of this embodiment can be used to implement the technical solution of the method embodiment shown in FIG. 3, and the implementation principle and technical effects are similar, and details are not described herein again.
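  • The split between the determining module 601 and the processing module 602 can be sketched as two cooperating classes. This is only an illustration of the division of responsibility, not the device's actual implementation: the class and attribute names are hypothetical, the location check uses inclusive bounds, the content check is simplified to a label at offset 0, and SHA-256 and zlib are assumed stand-ins for the unspecified algorithms.

```python
import hashlib
import zlib


class DeterminingModule:
    """Decides, from location and/or content, whether a block needs deduplication."""

    def __init__(self, no_dedup_range=None, no_dedup_label=None):
        self.no_dedup_range = no_dedup_range  # (first, second) preset addresses or None
        self.no_dedup_label = no_dedup_label  # preset content at offset 0 or None

    def needs_dedup(self, address, block):
        if self.no_dedup_range is not None:
            first, second = self.no_dedup_range
            if first <= address <= second:
                return False
        if self.no_dedup_label is not None and block.startswith(self.no_dedup_label):
            return False
        return True


class ProcessingModule:
    """Deduplicates or only compresses a block, as directed by the determining module."""

    def __init__(self, determining_module):
        self.determining_module = determining_module
        self.fingerprint_table = set()
        self.stored_blocks = []

    def process(self, address, block):
        if self.determining_module.needs_dedup(address, block):
            fingerprint = hashlib.sha256(block).digest()
            if fingerprint in self.fingerprint_table:
                return "duplicate"
            self.fingerprint_table.add(fingerprint)
        self.stored_blocks.append(zlib.compress(block))
        return "stored"
```

  • Keeping the decision in its own class lets the location and content criteria be combined or swapped without changing how blocks are fingerprinted, compressed, and stored.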
  • The determining module 601 is specifically configured to determine, according to the relative position relationship between the storage address corresponding to the data to be stored and a preset storage address, whether deduplication processing needs to be performed on the to-be-stored data block.
  • The preset storage address includes a first preset storage address and a second preset storage address.
  • Data between the first preset storage address and the second preset storage address is data that does not need to be deduplicated.
  • The preset storage address further includes a third preset storage address and a fourth preset storage address; data between the third preset storage address and the fourth preset storage address is data that needs to be deduplicated.
  • the data reduction device of this embodiment can be used to implement the technical solution of the method embodiment shown in FIG. 4, and the implementation principle and technical effects are similar, and details are not described herein again.
  • The determining module 601 is specifically configured to determine, according to the storage address corresponding to the data to be stored and a pre-stored location class rule, whether deduplication processing needs to be performed on the to-be-stored data block.
  • The location class rule includes a rule determined according to the relative position relationship between the storage address corresponding to the data to be stored and a preset storage address; the preset storage address includes a first preset storage address and a second preset storage address; data between the first preset storage address and the second preset storage address is data that does not need to be deduplicated.
  • the data reduction device of this embodiment can be used to implement the technical solution of the third embodiment of the method for data reduction, and the implementation principle and technical effects are similar, and details are not described herein again.
  • The determining module 601 is specifically configured to determine, according to the matching relationship between preset content and the content between a first preset offset position and a second preset offset position of the to-be-stored data block, whether deduplication processing needs to be performed on the to-be-stored data block.
  • The preset content includes content contained in a to-be-stored data block that does not need to be deduplicated; the first preset offset position and the second preset offset position indicate the relative position of the preset content in the to-be-stored data block.
  • the content includes: a label.
  • the preset content further includes content that needs to be included in the data block to be stored that needs to be deduplicated.
  • the data reduction device of this embodiment can be used to implement the technical solution of the method embodiment shown in FIG. 5, and the implementation principle and technical effects are similar, and details are not described herein again.
  • The determining module 601 is specifically configured to determine, according to the content information of the to-be-stored data block and a pre-stored content class rule, whether deduplication processing needs to be performed on the to-be-stored data block.
  • The content class rule includes a rule determined according to the matching relationship between the preset content and the content between a first preset offset position and a second preset offset position of the to-be-stored data block.
  • The preset content includes content contained in a to-be-stored data block that does not need to be deduplicated; the first preset offset position and the second preset offset position indicate the relative position of the preset content in the to-be-stored data block.
  • the content includes: a label.
  • the preset content further includes content that needs to be included in the data block to be stored that needs to be deduplicated.
  • the data reduction device of this embodiment can be used to implement the technical solution of the fifth embodiment of the method for data reduction, and the implementation principle and technical effects are similar, and details are not described herein again.
  • FIG. 7 is a schematic structural diagram of Embodiment 6 of the data reduction apparatus of the present invention.
  • The data reduction apparatus of this embodiment may include a processor 701 and a memory 702.
  • The data reduction apparatus may further include a transmitter 703 and a receiver 704.
  • The transmitter 703 and the receiver 704 can be coupled to the processor 701.
  • The transmitter 703 is configured to send data or information, and the receiver 704 is configured to receive data or information.
  • The memory 702 stores execution instructions; when the data reduction apparatus runs, the processor 701 communicates with the memory 702 and calls the execution instructions in the memory 702 to perform the following operations:
  • Determining, according to the feature information of the data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated includes: determining, according to location information of the data to be stored and/or content information of the data block to be stored, whether the data block to be stored needs to be deduplicated.
  • Determining, according to the location information of the data to be stored, whether the data block to be stored needs to be deduplicated includes: determining, according to the relative positional relationship between the storage address corresponding to the data to be stored and a preset storage address, whether the data block to be stored needs to be deduplicated;
  • The preset storage address includes a first preset storage address and a second preset storage address; data between the first preset storage address and the second preset storage address is data that does not need deduplication.
  • The preset storage address further includes a third preset storage address and a fourth preset storage address; data between the third preset storage address and the fourth preset storage address is data that needs deduplication.
  • The data that does not need deduplication is metadata.
  • Determining, according to the content information of the data block to be stored, whether the data block to be stored needs to be deduplicated includes: determining, according to the matching relationship between the preset content and the content between the first preset offset position and the second preset offset position of the data block to be stored, whether the data block to be stored needs to be deduplicated;
  • The preset content includes content that a data block to be stored must contain in order to be exempt from deduplication; the first preset offset position and the second preset offset position indicate the relative position of the preset content within the data block to be stored.
  • The preset content further includes content that a data block to be stored must contain in order to require deduplication.
  • The content includes a tag.
  • The content that a data block to be stored must contain in order to be exempt from deduplication is FILE; if the size of the data block to be stored is 1 KB, the first preset offset position is 0 and the second preset offset position is 3.
  • The data reduction apparatus of this embodiment can be used to implement the technical solution of the data reduction method provided by any embodiment of the present invention; the implementation principle and technical effects are similar and are not described again here.
  • The aforementioned program can be stored in a computer-readable storage medium.
  • When executed, the program performs the steps of the foregoing method embodiments; the storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

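A method and apparatus for data reduction. The data reduction method includes: determining, according to feature information of data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated (301); if so, deduplicating the data block to be stored (302); if not, compressing the data block to be stored (303). The method avoids fingerprint calculation and duplicate lookup for data blocks that cannot be deduplicated or that have a low deduplication rate, which reduces CPU resource consumption in the storage server or storage device and thereby solves the problem of wasted CPU resources in the storage server or storage device.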

Description

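Method and apparatus for data reduction
This application claims priority to Chinese Patent Application No. 201410767371.2, filed with the Chinese Patent Office on December 12, 2014 and entitled "Method and apparatus for data reduction", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present invention relate to storage technologies, and in particular, to a method and apparatus for data reduction.
Background
As the amount of data that needs to be stored keeps growing, data reduction technology plays an increasingly important role in data storage.
In the prior art, data reduction mainly includes three processes: chunking, deduplication, and compression, where deduplication includes fingerprint calculation and duplicate lookup. First, the storage server receives a write request sent by a client; the write request includes the data to be stored. Second, the storage server chunks the data to be stored, dividing it into data blocks to be stored of a preset size. Third, for each data block to be stored, the storage server obtains the corresponding fingerprint by using a fingerprint algorithm and, through duplicate lookup, determines whether the obtained fingerprint is the same as a fingerprint already stored in the fingerprint table. If it is the same, the data block to be stored that corresponds to the fingerprint duplicates a data block already stored in the storage server and does not need to be stored. If it is different, the data block to be stored that corresponds to the fingerprint is compressed, the compressed data block is stored in the storage server, and the fingerprint is added to the fingerprint table.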
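The prior-art pipeline described above (chunking, fingerprint calculation, duplicate lookup, then compression of new blocks) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the block size of 4096 bytes, the use of SHA-256 as the fingerprint algorithm, zlib as the compressor, and the names `split_blocks` and `store_with_dedup` are all hypothetical choices that the source does not specify.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed preset block size; the source does not fix a value


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Chunking: split the data to be stored into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store_with_dedup(data: bytes, fingerprints: set[bytes], stored: list[bytes]) -> int:
    """Run every block through fingerprinting and lookup; return how many blocks were newly stored."""
    new_blocks = 0
    for block in split_blocks(data):
        fp = hashlib.sha256(block).digest()  # fingerprint calculation (SHA-256 is an assumption)
        if fp in fingerprints:               # duplicate lookup in the fingerprint table
            continue                         # duplicate block: nothing is stored
        stored.append(zlib.compress(block))  # new block: compress, then store
        fingerprints.add(fp)                 # record the new fingerprint
        new_blocks += 1
    return new_blocks
```

Note that every block pays for a fingerprint calculation and a lookup here, including blocks that can never match, which is the CPU cost the rest of the document sets out to avoid.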
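However, in the prior art, data reduction wastes central processing unit (CPU) resources of the storage server.
Summary
Embodiments of the present invention provide a method and apparatus for data reduction, to solve the problem of wasted CPU resources in a storage server or storage device.
According to a first aspect, an embodiment of the present invention provides a data reduction method, including:
determining, according to feature information of data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated;
if so, deduplicating the data block to be stored;
if not, compressing the data block to be stored.
With reference to the first aspect, in a first possible implementation of the first aspect, the determining, according to feature information of data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated includes: determining, according to location information of the data to be stored and/or content information of the data block to be stored, whether the data block to be stored needs to be deduplicated.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the determining, according to the location information of the data to be stored, whether the data block to be stored needs to be deduplicated includes: determining, according to the relative positional relationship between the storage address corresponding to the data to be stored and a preset storage address, whether the data block to be stored needs to be deduplicated;
where the preset storage address includes a first preset storage address and a second preset storage address, and data between the first preset storage address and the second preset storage address is data that does not need deduplication.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the preset storage address further includes a third preset storage address and a fourth preset storage address, and data between the third preset storage address and the fourth preset storage address is data that needs deduplication.
With reference to the second or third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the data that does not need deduplication is metadata.
With reference to the first possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the determining, according to the content information of the data block to be stored, whether the data block to be stored needs to be deduplicated includes: determining, according to the matching relationship between preset content and the content between a first preset offset position and a second preset offset position of the data block to be stored, whether the data block to be stored needs to be deduplicated;
where the preset content includes content that a data block to be stored must contain in order to be exempt from deduplication, and the first preset offset position and the second preset offset position indicate the relative position of the preset content within the data block to be stored.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the preset content further includes content that a data block to be stored must contain in order to require deduplication.
With reference to the fifth or sixth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, the content includes a tag.
With reference to any one of the fifth to seventh possible implementations of the first aspect, in an eighth possible implementation of the first aspect, the content that a data block to be stored must contain in order to be exempt from deduplication is FILE; if the size of the data block to be stored is 1 KB, the first preset offset position is 0 and the second preset offset position is 3.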
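According to a second aspect, an embodiment of the present invention provides a data reduction apparatus, where the apparatus is a storage server or a storage device that includes a control unit, and the apparatus includes:
a determining module, configured to determine, according to feature information of data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated;
a processing module, configured to deduplicate the data block to be stored when the determining module determines that the data block to be stored needs to be deduplicated, and otherwise to compress the data block to be stored.
With reference to the second aspect, in a first possible implementation of the second aspect, the determining module is specifically configured to determine, according to location information of the data to be stored and/or content information of the data block to be stored, whether the data block to be stored needs to be deduplicated.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the determining module is specifically configured to determine, according to the relative positional relationship between the storage address corresponding to the data to be stored and a preset storage address, whether the data block to be stored needs to be deduplicated;
where the preset storage address includes a first preset storage address and a second preset storage address, and data between the first preset storage address and the second preset storage address is data that does not need deduplication.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the preset storage address further includes a third preset storage address and a fourth preset storage address, and data between the third preset storage address and the fourth preset storage address is data that needs deduplication.
With reference to the second or third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the data that does not need deduplication is metadata.
With reference to the first possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the determining module is specifically configured to determine, according to the matching relationship between preset content and the content between a first preset offset position and a second preset offset position of the data block to be stored, whether the data block to be stored needs to be deduplicated;
where the preset content includes content that a data block to be stored must contain in order to be exempt from deduplication, and the first preset offset position and the second preset offset position indicate the relative position of the preset content within the data block to be stored.
With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the preset content further includes content that a data block to be stored must contain in order to require deduplication.
With reference to the fifth or sixth possible implementation of the second aspect, in a seventh possible implementation of the second aspect, the content includes a tag.
With reference to any one of the fifth to seventh possible implementations of the second aspect, in an eighth possible implementation of the second aspect, the content that a data block to be stored must contain in order to be exempt from deduplication is FILE; if the size of the data block to be stored is 1 KB, the first preset offset position is 0 and the second preset offset position is 3.
Embodiments of the present invention provide a method and apparatus for data reduction. Whether a data block to be stored in the data to be stored needs to be deduplicated is determined according to feature information of the data to be stored; if so, the data block to be stored is deduplicated; if not, the data block to be stored is compressed. As a result, the storage server no longer deduplicates data blocks that cannot be deduplicated or that have a low deduplication rate. This avoids fingerprint calculation and duplicate lookup for such data blocks, reduces CPU resource consumption in the storage server or storage device, and thereby solves the problem of wasted CPU resources in the storage server or storage device.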
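Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings needed for describing the embodiments or the prior art. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a first schematic diagram of an application scenario of the data reduction method of the present invention;
FIG. 2 is a second schematic diagram of an application scenario of the data reduction method of the present invention;
FIG. 3 is a flowchart of Embodiment 1 of the data reduction method of the present invention;
FIG. 4 is a flowchart of Embodiment 2 of the data reduction method of the present invention;
FIG. 5 is a flowchart of Embodiment 4 of the data reduction method of the present invention;
FIG. 6 is a schematic structural diagram of Embodiment 1 of the data reduction apparatus of the present invention;
FIG. 7 is a schematic structural diagram of Embodiment 6 of the data reduction apparatus of the present invention.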
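Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the following clearly and completely describes the technical solutions in the embodiments with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a first schematic diagram of an application scenario of the data reduction method of the present invention. As shown in FIG. 1, in a storage system, a storage server 11 receives a write request sent by a client 12. A CPU 111 in the storage server 11 chunks the data to be stored. For each data block to be stored, the CPU 111 obtains the corresponding fingerprint by using a fingerprint algorithm and, through duplicate lookup, determines whether the obtained fingerprint is the same as a fingerprint already stored in the fingerprint table. If it is the same, the data block to be stored that corresponds to the fingerprint duplicates a data block already stored in the storage server and does not need to be stored. If it is different, the data block to be stored that corresponds to the fingerprint is compressed, the compressed data block is stored in a storage device 112 in the storage server 11, and the fingerprint is added to the fingerprint table. In the prior art, when a data block to be stored cannot be deduplicated (that is, its fingerprint is certain to differ from every fingerprint already stored in the fingerprint table) or has a low deduplication rate (that is, the probability that its fingerprint repeats a fingerprint already stored in the fingerprint table is very small), the data block still goes through the fingerprint calculation and duplicate lookup of deduplication. CPU resources of the storage server 11 are therefore wasted.
The data reduction method of the present invention can also be applied to a scenario in which a first processing unit inside the storage server sends a data write request to a second processing unit, and the second processing unit chunks and deduplicates the data to be stored. Similarly, CPU resources of the storage server are wasted in this scenario.
FIG. 2 is a second schematic diagram of an application scenario of the data reduction method of the present invention. As shown in FIG. 2, in a storage system, a storage device 21 receives a write request sent by a storage server 22, and a CPU 211 in the storage device 21 chunks and deduplicates the data to be stored. Similarly, in the prior art, CPU resources of the storage device 21 are wasted.
Optionally, the storage device 21 is a storage device that includes a control unit; for example, it may be a solid state drive (SSD) or a magnetic disk.
It should be noted that any scenario that requires data reduction is an application scenario of the data reduction method of the present invention and falls within the protection scope of the present invention.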
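FIG. 3 is a flowchart of Embodiment 1 of the data reduction method of the present invention. As shown in FIG. 3, the method of this embodiment may include:
Step 301: Determine, according to feature information of data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated.
If so, perform step 302; otherwise, perform step 303.
The feature information of the data to be stored includes location information of the data to be stored and/or content information of the data block to be stored.
Step 302: Deduplicate the data block to be stored.
It should be noted that, after step 302 is performed, if the data block to be stored is determined to duplicate an already stored data block, the data block to be stored does not need to be stored; if the data block to be stored is determined not to duplicate any already stored data block, the data block to be stored is compressed and the compressed data block is stored.
Step 303: Compress the data block to be stored.
In the prior art, the storage server or storage device deduplicates all data blocks to be stored. In the present invention, the storage server or storage device determines, according to feature information of the data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated; if so, it deduplicates the data block to be stored; if not, it compresses the data block to be stored.
In the prior art, because the storage server or storage device deduplicates all data blocks to be stored, a data block that cannot be deduplicated or that has a low deduplication rate still goes through the fingerprint calculation and duplicate lookup of deduplication, so CPU resources of the storage server or storage device are wasted. In the present invention, whether a data block to be stored in the data to be stored needs to be deduplicated is determined according to feature information of the data to be stored; if so, the data block to be stored is deduplicated; if not, the data block to be stored is compressed. As a result, the storage server or storage device no longer deduplicates data blocks that cannot be deduplicated or that have a low deduplication rate. This avoids fingerprint calculation and duplicate lookup for such data blocks, reduces CPU resource consumption in the storage server or storage device, and thereby solves the problem of wasted CPU resources in the storage server or storage device.
In this embodiment, whether a data block to be stored in the data to be stored needs to be deduplicated is determined according to feature information of the data to be stored; if so, the data block to be stored is deduplicated; if not, the data block to be stored is compressed. As a result, the storage server or storage device no longer deduplicates data blocks that cannot be deduplicated or that have a low deduplication rate. This avoids fingerprint calculation and duplicate lookup for such data blocks, reduces CPU resource consumption in the storage server or storage device, and thereby solves the problem of wasted CPU resources in the storage server or storage device.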
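The flow of steps 301 to 303 can be sketched roughly as below. This is a minimal sketch, not the implementation of the embodiment: the class name `Reducer`, the `needs_dedup` callback, the returned status strings, and the use of SHA-256 and zlib are all hypothetical choices made for illustration. The `fingerprint_calls` counter exists only to make the saved work visible.

```python
import hashlib
import zlib
from typing import Callable


class Reducer:
    """Routes each block either through deduplication or straight to compression."""

    def __init__(self, needs_dedup: Callable[[bytes], bool]):
        self.needs_dedup = needs_dedup        # step 301 decision (assumed to be supplied by the caller)
        self.fingerprints: set[bytes] = set()
        self.stored: list[bytes] = []
        self.fingerprint_calls = 0            # counts fingerprint calculations actually performed

    def write_block(self, block: bytes) -> str:
        if self.needs_dedup(block):                   # step 301
            self.fingerprint_calls += 1
            fp = hashlib.sha256(block).digest()       # step 302: fingerprint calculation
            if fp in self.fingerprints:               # step 302: duplicate lookup
                return "duplicate"                    # already stored, nothing to do
            self.fingerprints.add(fp)
            self.stored.append(zlib.compress(block))  # non-duplicate: compress and store
            return "stored-after-dedup"
        self.stored.append(zlib.compress(block))      # step 303: compression only
        return "stored-compress-only"
```

For example, with a decision function that exempts blocks starting with a made-up `META` prefix, repeated writes of such a block never trigger a fingerprint calculation, while ordinary blocks are still deduplicated.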
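FIG. 4 is a flowchart of Embodiment 2 of the data reduction method of the present invention. As shown in FIG. 4, the method of this embodiment may include:
Step 401: Determine, according to location information of data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated.
Specifically, whether the data block to be stored needs to be deduplicated is determined according to the relative positional relationship between the storage address corresponding to the data to be stored and a preset storage address;
where the preset storage address includes a first preset storage address and a second preset storage address, and data between the first preset storage address and the second preset storage address is data that does not need deduplication.
If so, perform step 402; otherwise, perform step 403.
The first preset storage address and the second preset storage address are the boundary values of the storage addresses corresponding to a first segment of storage space; the first segment of storage space is a segment of the storage medium that stores data that does not need deduplication.
Optionally, the storage medium includes a magnetic disk, a USB flash drive, an optical disc, and the like.
For example, the content stored in a file system can be divided into data and metadata. Data refers to the actual data in ordinary files, and metadata refers to the system data that describes the characteristics of a file, such as access permissions, access time, modification time, and the person who modified it. Because any operation on a file changes its metadata, metadata is data that does not need deduplication.
Because metadata is usually stored in the first 1/8 of a disk partition's storage space, the first preset storage address can be set to the start address of the first 1/8 of the disk partition's storage space, and the second preset storage address can be set to the end address of that space. Whether a data block to be stored in the data to be stored needs to be deduplicated is then determined by checking whether the storage address corresponding to the data to be stored lies between the first preset storage address and the second preset storage address. When the storage address corresponding to the data to be stored lies between the first preset storage address and the second preset storage address, it is determined that the data block to be stored in the data to be stored does not need to be deduplicated.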
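The address check described above could look roughly like the following sketch. The function names, the treatment of both boundary addresses as inclusive, and the use of plain integer addresses are assumptions made for illustration; the source only states that addresses between the two preset addresses mark data that does not need deduplication.

```python
def no_dedup_range(partition_start: int, partition_size: int) -> tuple[int, int]:
    """Return the (first, second) preset addresses bounding the first 1/8 of a partition."""
    first = partition_start
    second = partition_start + partition_size // 8 - 1  # inclusive end address (assumption)
    return first, second


def needs_dedup_by_address(addr: int, first: int, second: int) -> bool:
    """Step 401: data whose address lies within [first, second] is exempt from deduplication."""
    return not (first <= addr <= second)
```

With a hypothetical partition that starts at address 0 and spans 8000 addresses, the exempt range is 0 to 999, so a write at address 500 goes straight to compression and a write at address 1000 is deduplicated.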
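Optionally, the preset storage address may further include a third preset storage address and a fourth preset storage address; data between the third preset storage address and the fourth preset storage address is data that needs deduplication.
Optionally, the third preset storage address and the fourth preset storage address are the boundary values of the storage addresses corresponding to a second segment of storage space; the second segment of storage space is a segment of the storage medium that stores data that needs deduplication.
Step 402: Deduplicate the data block to be stored.
It should be noted that step 402 is the same as step 302 and is not described again here.
Step 403: Compress the data block to be stored.
It should be noted that step 403 is the same as step 303 and is not described again here.
In this embodiment, whether the data block to be stored needs to be deduplicated is determined according to the relative positional relationship between the storage address corresponding to the data to be stored and the preset storage address; if so, the data block to be stored is deduplicated; if not, the data block to be stored is compressed. As a result, the storage server or storage device no longer deduplicates data blocks that cannot be deduplicated or that have a low deduplication rate. This avoids fingerprint calculation and duplicate lookup for such data blocks, reduces CPU resource consumption in the storage server or storage device, and thereby solves the problem of wasted CPU resources in the storage server or storage device.
Embodiment 3 of the data reduction method
Optionally, on the basis of Embodiment 2 of the data reduction method, step 401 may specifically be: determining, according to the storage address corresponding to the data to be stored and a pre-stored location-type rule, whether the data block to be stored needs to be deduplicated;
where the location-type rule includes a rule determined according to the relative positional relationship between the storage address corresponding to the data to be stored and the preset storage address; the preset storage address includes a first preset storage address and a second preset storage address; and data between the first preset storage address and the second preset storage address is data that does not need deduplication.
In this embodiment, whether the data block to be stored needs to be deduplicated is determined by comparing the storage address corresponding to the data to be stored with each rule in the pre-stored location-type rules.
For example, the pre-stored location-type rules are shown in Table 1:
Table 1
[Table 1 is provided as image PCTCN2015096568-appb-000001 in the original publication]
Here, loc is the storage location corresponding to the data to be stored; N1, N2, N3, and N4 are preset storage addresses, where N1 is greater than N2, N3 is greater than N1, and N4 is less than N2.
In this embodiment, whether a data block to be stored in the data to be stored needs to be deduplicated is determined according to the storage address corresponding to the data to be stored and the pre-stored location-type rule; if so, the data block to be stored is deduplicated; if not, the data block to be stored is compressed. As a result, the storage server or storage device no longer deduplicates data blocks that cannot be deduplicated or that have a low deduplication rate. This avoids fingerprint calculation and duplicate lookup for such data blocks, reduces CPU resource consumption in the storage server or storage device, and thereby solves the problem of wasted CPU resources in the storage server or storage device.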
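FIG. 5 is a flowchart of Embodiment 4 of the data reduction method of the present invention. As shown in FIG. 5, the method of this embodiment may include:
Step 501: Determine, according to content information of a data block to be stored in the data to be stored, whether the data block to be stored needs to be deduplicated.
Specifically, whether the data block to be stored needs to be deduplicated is determined according to the matching relationship between preset content and the content between a first preset offset position and a second preset offset position of the data block to be stored;
where the preset content includes content that a data block to be stored must contain in order to be exempt from deduplication, and the first preset offset position and the second preset offset position indicate the relative position of the preset content within the data block to be stored.
If so, perform step 502; otherwise, perform step 503.
Optionally, the content includes a tag.
For example, consider each 1 KB MFT record in the Master File Table (MFT) area of the Windows New Technology File System (NTFS). Because these 1 KB data blocks contain information such as dates and times, their deduplication rate is low. In addition, because the first four bytes of each of these 1 KB data blocks are the tag "FILE", for a 1 KB data block the first preset offset position can be set to 0 and the second preset offset position can be set to 3, and whether the block needs to be deduplicated is determined by checking whether the content between the first preset offset position and the second preset offset position is "FILE". When the content between the first preset offset position and the second preset offset position of the data block to be stored (that is, the first four bytes) is "FILE" (that is, it matches the preset content), it is determined that the data block to be stored does not need to be deduplicated.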
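The tag check in the MFT example can be sketched as follows. This is an illustrative sketch only: the function name `needs_dedup_by_content` and the constant names are hypothetical, and treating the second offset as inclusive follows the example's statement that offsets 0 and 3 cover the first four bytes.

```python
PRESET_CONTENT = b"FILE"  # tag carried by the 1 KB MFT records in the example
FIRST_OFFSET = 0          # first preset offset position
SECOND_OFFSET = 3         # second preset offset position (inclusive)


def needs_dedup_by_content(
    block: bytes,
    preset: bytes = PRESET_CONTENT,
    first: int = FIRST_OFFSET,
    second: int = SECOND_OFFSET,
) -> bool:
    """Step 501: a block whose bytes at [first, second] match the preset content skips deduplication."""
    return block[first:second + 1] != preset
```

A 1024-byte block that begins with `FILE` therefore returns `False` (compress only), while a block of the same size that begins with any other four bytes returns `True` and is deduplicated as usual.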
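Optionally, the preset content may further include content that a data block to be stored must contain in order to require deduplication.
Optionally, determining, according to the matching relationship between the preset content and the content between the first preset offset position and the second preset offset position of the data block to be stored, whether the data block to be stored needs to be deduplicated further includes:
when the content between the first preset offset position and the second preset offset position of the data block to be stored is content that a data block to be stored must contain in order to require deduplication, determining that the data block to be stored needs to be deduplicated.
Step 502: Deduplicate the data block to be stored.
It should be noted that step 502 is the same as step 302 and is not described again here.
Step 503: Compress the data block to be stored.
It should be noted that step 503 is the same as step 303 and is not described again here.
In this embodiment, whether the data block to be stored needs to be deduplicated is determined according to the matching relationship between the preset content and the content between the first preset offset position and the second preset offset position of the data block to be stored; if so, the data block to be stored is deduplicated; if not, the data block to be stored is compressed. As a result, the storage server or storage device no longer deduplicates data blocks that cannot be deduplicated or that have a low deduplication rate. This avoids fingerprint calculation and duplicate lookup for such data blocks, reduces CPU resource consumption in the storage server or storage device, and thereby solves the problem of wasted CPU resources in the storage server.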
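Embodiment 5 of the data reduction method
Optionally, on the basis of Embodiment 4 of the data reduction method of the present invention, step 501 may specifically be: determining, according to the content information of the data block to be stored and a pre-stored content-type rule, whether the data block to be stored needs to be deduplicated;
where the content-type rule includes a rule determined according to the matching relationship between preset content and the content between a first preset offset position and a second preset offset position of the data block to be stored;
where the preset content includes content that a data block to be stored must contain in order to be exempt from deduplication, and the first preset offset position and the second preset offset position indicate the relative position of the preset content within the data block to be stored.
In this embodiment, whether the data block to be stored needs to be deduplicated is determined by comparing the content information of the data block to be stored with each rule in the pre-stored content-type rules.
For example, the pre-stored content-type rules are shown in Table 2:
Table 2
[Table 2 is provided as image PCTCN2015096568-appb-000002 in the original publication]
Here, strl1 and strl2 are preset content; n1, n2, n3, and n4 are preset offset addresses, where n2 is greater than n1, n4 is greater than n3, and n3 is greater than n2.
In this embodiment, whether the data block to be stored needs to be deduplicated is determined according to the content information of the data block to be stored and the pre-stored content-type rule; if so, the data block to be stored is deduplicated; if not, the data block to be stored is compressed. As a result, the storage server or storage device no longer deduplicates data blocks that cannot be deduplicated or that have a low deduplication rate. This avoids fingerprint calculation and duplicate lookup for such data blocks, reduces CPU resource consumption in the storage server or storage device, and thereby solves the problem of wasted CPU resources in the storage server.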
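FIG. 6 is a schematic structural diagram of Embodiment 1 of the data reduction apparatus of the present invention. The apparatus may be a storage server or a storage device that includes a control unit. As shown in FIG. 6, the data reduction apparatus of this embodiment may include a determining module 601 and a processing module 602. The determining module 601 is configured to determine, according to feature information of data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated. The processing module 602 is configured to deduplicate the data block to be stored when the determining module 601 determines that the data block to be stored needs to be deduplicated, and otherwise to compress the data block to be stored.
Optionally, the determining module 601 is specifically configured to determine, according to location information of the data to be stored and/or content information of the data block to be stored, whether the data block to be stored needs to be deduplicated.
The data reduction apparatus of this embodiment can be used to execute the technical solution of the method embodiment shown in FIG. 3; the implementation principle and technical effects are similar and are not described again here.
Embodiment 2 of the data reduction apparatus
On the basis of Embodiment 1 of the data reduction apparatus of the present invention, optionally, the determining module 601 is specifically configured to determine, according to the relative positional relationship between the storage address corresponding to the data to be stored and a preset storage address, whether the data block to be stored needs to be deduplicated;
where the preset storage address includes a first preset storage address and a second preset storage address, and data between the first preset storage address and the second preset storage address is data that does not need deduplication.
Optionally, the preset storage address further includes a third preset storage address and a fourth preset storage address; data between the third preset storage address and the fourth preset storage address is data that needs deduplication.
The data reduction apparatus of this embodiment can be used to execute the technical solution of the method embodiment shown in FIG. 4; the implementation principle and technical effects are similar and are not described again here.
Embodiment 3 of the data reduction apparatus
On the basis of Embodiment 1 of the data reduction apparatus of the present invention, optionally, the determining module 601 is specifically configured to determine, according to the storage address corresponding to the data to be stored and a pre-stored location-type rule, whether the data block to be stored needs to be deduplicated;
where the location-type rule includes a rule determined according to the relative positional relationship between the storage address corresponding to the data to be stored and the preset storage address; the preset storage address includes a first preset storage address and a second preset storage address; and data between the first preset storage address and the second preset storage address is data that does not need deduplication.
The data reduction apparatus of this embodiment can be used to execute the technical solution of Embodiment 3 of the data reduction method; the implementation principle and technical effects are similar and are not described again here.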
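Embodiment 4 of the data reduction apparatus
On the basis of Embodiment 1 of the data reduction apparatus of the present invention, optionally, the determining module 601 is specifically configured to determine, according to the matching relationship between preset content and the content between a first preset offset position and a second preset offset position of the data block to be stored, whether the data block to be stored needs to be deduplicated;
where the preset content includes content that a data block to be stored must contain in order to be exempt from deduplication, and the first preset offset position and the second preset offset position indicate the relative position of the preset content within the data block to be stored.
The content includes a tag.
Optionally, the preset content further includes content that a data block to be stored must contain in order to require deduplication.
The data reduction apparatus of this embodiment can be used to execute the technical solution of the method embodiment shown in FIG. 5; the implementation principle and technical effects are similar and are not described again here.
Embodiment 5 of the data reduction apparatus
On the basis of Embodiment 1 of the data reduction apparatus of the present invention, optionally, the determining module 601 is specifically configured to determine, according to the content information of the data block to be stored and a pre-stored content-type rule, whether the data block to be stored needs to be deduplicated;
where the content-type rule includes a rule determined according to the matching relationship between preset content and the content between a first preset offset position and a second preset offset position of the data block to be stored;
where the preset content includes content that a data block to be stored must contain in order to be exempt from deduplication, and the first preset offset position and the second preset offset position indicate the relative position of the preset content within the data block to be stored.
The content includes a tag.
Optionally, the preset content further includes content that a data block to be stored must contain in order to require deduplication.
The data reduction apparatus of this embodiment can be used to execute the technical solution of Embodiment 5 of the data reduction method; the implementation principle and technical effects are similar and are not described again here.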
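FIG. 7 is a schematic structural diagram of Embodiment 6 of the data reduction apparatus of the present invention. As shown in FIG. 7, the data reduction apparatus of this embodiment may include a processor 701 and a memory 702. The data reduction apparatus may further include a transmitter 703 and a receiver 704. The transmitter 703 and the receiver 704 may be connected to the processor 701. The transmitter 703 is configured to send data or information, and the receiver 704 is configured to receive data or information. The memory 702 stores execution instructions; when the data reduction apparatus runs, the processor 701 communicates with the memory 702 and calls the execution instructions in the memory 702 to perform the following operations:
determining, according to feature information of data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated; if so, deduplicating the data block to be stored; if not, compressing the data block to be stored.
Optionally, the determining, according to feature information of data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated includes: determining, according to location information of the data to be stored and/or content information of the data block to be stored, whether the data block to be stored needs to be deduplicated.
Optionally, the determining, according to the location information of the data to be stored, whether the data block to be stored needs to be deduplicated includes: determining, according to the relative positional relationship between the storage address corresponding to the data to be stored and a preset storage address, whether the data block to be stored needs to be deduplicated;
where the preset storage address includes a first preset storage address and a second preset storage address, and data between the first preset storage address and the second preset storage address is data that does not need deduplication.
Optionally, the preset storage address further includes a third preset storage address and a fourth preset storage address; data between the third preset storage address and the fourth preset storage address is data that needs deduplication.
Optionally, the data that does not need deduplication is metadata.
Optionally, the determining, according to the content information of the data block to be stored, whether the data block to be stored needs to be deduplicated includes: determining, according to the matching relationship between preset content and the content between a first preset offset position and a second preset offset position of the data block to be stored, whether the data block to be stored needs to be deduplicated;
where the preset content includes content that a data block to be stored must contain in order to be exempt from deduplication, and the first preset offset position and the second preset offset position indicate the relative position of the preset content within the data block to be stored.
Optionally, the preset content further includes content that a data block to be stored must contain in order to require deduplication.
Optionally, the content includes a tag.
Optionally, the content that a data block to be stored must contain in order to be exempt from deduplication is FILE; if the size of the data block to be stored is 1 KB, the first preset offset position is 0 and the second preset offset position is 3.
The data reduction apparatus of this embodiment can be used to execute the technical solution of the data reduction method provided by any embodiment of the present invention; the implementation principle and technical effects are similar and are not described again here.
A person of ordinary skill in the art can understand that all or some of the steps of the foregoing method embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the foregoing method embodiments; the storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are intended only to describe the technical solutions of the present invention, not to limit them. Although the present invention is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some or all of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.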

Claims (18)

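  1. A data reduction method, characterized by comprising:
    determining, according to feature information of data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated;
    if so, deduplicating the data block to be stored;
    if not, compressing the data block to be stored.
  2. The method according to claim 1, characterized in that the determining, according to feature information of data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated comprises:
    determining, according to location information of the data to be stored and/or content information of the data block to be stored, whether the data block to be stored needs to be deduplicated.
  3. The method according to claim 2, characterized in that the determining, according to the location information of the data to be stored, whether the data block to be stored needs to be deduplicated comprises:
    determining, according to the relative positional relationship between the storage address corresponding to the data to be stored and a preset storage address, whether the data block to be stored needs to be deduplicated;
    wherein the preset storage address comprises a first preset storage address and a second preset storage address, and data between the first preset storage address and the second preset storage address is data that does not need deduplication.
  4. The method according to claim 3, characterized in that the preset storage address further comprises a third preset storage address and a fourth preset storage address, and data between the third preset storage address and the fourth preset storage address is data that needs deduplication.
  5. The method according to claim 3 or 4, characterized in that the data that does not need deduplication is metadata.
  6. The method according to claim 2, characterized in that the determining, according to the content information of the data block to be stored, whether the data block to be stored needs to be deduplicated comprises:
    determining, according to the matching relationship between preset content and the content between a first preset offset position and a second preset offset position of the data block to be stored, whether the data block to be stored needs to be deduplicated;
    wherein the preset content comprises content that a data block to be stored must contain in order to be exempt from deduplication, and the first preset offset position and the second preset offset position indicate the relative position of the preset content within the data block to be stored.
  7. The method according to claim 6, characterized in that the preset content further comprises content that a data block to be stored must contain in order to require deduplication.
  8. The method according to claim 6 or 7, characterized in that the content comprises a tag.
  9. The method according to any one of claims 6 to 8, characterized in that the content that a data block to be stored must contain in order to be exempt from deduplication is FILE; if the size of the data block to be stored is 1 KB, the first preset offset position is 0 and the second preset offset position is 3.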
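  10. A data reduction apparatus, wherein the apparatus is a storage server or a storage device that includes a control unit, the apparatus comprising:
    a determining module, configured to determine, according to feature information of data to be stored, whether a data block to be stored in the data to be stored needs to be deduplicated;
    a processing module, configured to deduplicate the data block to be stored when the determining module determines that the data block to be stored needs to be deduplicated, and otherwise to compress the data block to be stored.
  11. The apparatus according to claim 10, characterized in that the determining module is specifically configured to:
    determine, according to location information of the data to be stored and/or content information of the data block to be stored, whether the data block to be stored needs to be deduplicated.
  12. The apparatus according to claim 11, characterized in that the determining module is specifically configured to:
    determine, according to the relative positional relationship between the storage address corresponding to the data to be stored and a preset storage address, whether the data block to be stored needs to be deduplicated;
    wherein the preset storage address comprises a first preset storage address and a second preset storage address, and data between the first preset storage address and the second preset storage address is data that does not need deduplication.
  13. The apparatus according to claim 12, characterized in that the preset storage address further comprises a third preset storage address and a fourth preset storage address, and data between the third preset storage address and the fourth preset storage address is data that needs deduplication.
  14. The apparatus according to claim 12 or 13, characterized in that the data that does not need deduplication is metadata.
  15. The apparatus according to claim 11, characterized in that the determining module is specifically configured to:
    determine, according to the matching relationship between preset content and the content between a first preset offset position and a second preset offset position of the data block to be stored, whether the data block to be stored needs to be deduplicated;
    wherein the preset content comprises content that a data block to be stored must contain in order to be exempt from deduplication, and the first preset offset position and the second preset offset position indicate the relative position of the preset content within the data block to be stored.
  16. The apparatus according to claim 15, characterized in that the preset content further comprises content that a data block to be stored must contain in order to require deduplication.
  17. The apparatus according to claim 15 or 16, characterized in that the content comprises a tag.
  18. The apparatus according to any one of claims 15 to 17, characterized in that the content that a data block to be stored must contain in order to be exempt from deduplication is FILE; if the size of the data block to be stored is 1 KB, the first preset offset position is 0 and the second preset offset position is 3.
PCT/CN2015/096568 2014-12-12 2015-12-07 Method and apparatus for data reduction WO2016091138A1 (zh)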

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410767371.2 2014-12-12
CN201410767371.2A CN104484132B (zh) 2014-12-12 2014-12-12 Method and apparatus for data reduction

Publications (1)

Publication Number Publication Date
WO2016091138A1 true WO2016091138A1 (zh) 2016-06-16

Family

ID=52758680

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/096568 WO2016091138A1 (zh) 2014-12-12 2015-12-07 Method and apparatus for data reduction

Country Status (2)

Country Link
CN (1) CN104484132B (zh)
WO (1) WO2016091138A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10901951B2 (en) 2018-07-17 2021-01-26 International Business Machines Corporation Memory compaction for append-only formatted data in a distributed storage network

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104484132B (zh) * 2014-12-12 2017-11-17 华为技术有限公司 Method and apparatus for data reduction
US20160378352A1 (en) * 2015-06-26 2016-12-29 Intel Corporation Efficient solid state drive data compression scheme and layout
CN105302495B (zh) * 2015-11-20 2019-05-28 华为技术有限公司 Data storage method and apparatus
CN113296709B (zh) 2017-06-02 2024-03-08 伊姆西Ip控股有限责任公司 Method and device for deduplication

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080243769A1 (en) * 2007-03-30 2008-10-02 Symantec Corporation System and method for exporting data directly from deduplication storage to non-deduplication storage
CN101916171A (zh) * 2010-07-16 2010-12-15 中国科学院计算技术研究所 Concurrent hierarchical data deduplication method and system
US20110184908A1 (en) * 2010-01-28 2011-07-28 Alastair Slater Selective data deduplication
CN102591855A (zh) * 2012-01-13 2012-07-18 广州从兴电子开发有限公司 Data identification method and system
CN104063374A (zh) * 2013-03-18 2014-09-24 阿里巴巴集团控股有限公司 Method and device for deduplicating data
CN104484132A (zh) * 2014-12-12 2015-04-01 华为技术有限公司 Method and apparatus for data reduction

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0104227D0 (en) * 2001-02-21 2001-04-11 Ibm Information component based data storage and management


Also Published As

Publication number Publication date
CN104484132A (zh) 2015-04-01
CN104484132B (zh) 2017-11-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15868277

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15868277

Country of ref document: EP

Kind code of ref document: A1