CN106383670B - Data processing method and storage device - Google Patents

Info

Publication number
CN106383670B
Authority
CN
China
Prior art keywords
data block
characteristic value
data
storage device
mapping
Prior art date
Legal status
Active
Application number
CN201610839436.9A
Other languages
Chinese (zh)
Other versions
CN106383670A (en)
Inventor
袁冉胤
游俊
李伟
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201610839436.9A
Publication of CN106383670A
Application granted
Publication of CN106383670B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/061 Improving I/O performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a data processing scheme in which a storage device stores a first mapping relationship that maps a first characteristic value to specific-format data. The storage device computes the characteristic value of a first data block, and this characteristic value is the first characteristic value. The storage device queries the first mapping relationship with the first characteristic value of the first data block and determines that the first mapping relationship contains the first characteristic value; the first data block is therefore specific-format data, and the storage device does not perform data de-duplication on it.

Description

Data processing method and storage device
Technical Field
The present invention relates to the field of data storage technologies, and in particular, to a data processing method and a storage device.
Background
De-duplication is a widely used technique in data storage: it deletes duplicate copies of data and retains only one unique copy, thereby eliminating redundant data. Such techniques can greatly reduce the demand for physical storage space and thus help meet ever-growing data storage needs.
In the prior art, if the characteristic value of a data block to be de-duplicated already exists in a storage device, the data block is already stored in the storage device, and the storage device already stores a mapping relationship between the characteristic value and the storage address at which the data block is stored. In this case, the storage device updates the reference count of that storage address. To read the de-duplicated data block, the storage device must query the mapping relationship between the characteristic value and the storage address using the characteristic value of the data block, and then read the data from that storage address. As a result, multiple access operations to the storage address are unavoidable.
Disclosure of Invention
In a first aspect, an embodiment of the present invention provides a data processing scheme. A storage device stores a first mapping relationship, which includes a mapping between a first characteristic value and the specific-format data corresponding to the first characteristic value. The storage device computes the characteristic value of a first data block, and this characteristic value is the first characteristic value. The storage device queries the first mapping relationship with the first characteristic value of the first data block and determines that the first mapping relationship contains the first characteristic value; the first data block is therefore specific-format data, and the storage device does not perform data de-duplication on it. The specific-format data may be all-0 data or all-1 data of a specific length, a combination of 0s and 1s, or data with a relatively high repetition count (degree of repetition), where the repetition count may be determined from a reference count. The characteristic value may be a fingerprint of the data block obtained with a hash algorithm. When the first data block is specific-format data, the storage device does not need to perform any further de-duplication operations. That is, it does not need to update the reference count of the storage address corresponding to the first characteristic value (when the query finds that the first data block is already stored in the storage device), nor allocate a storage address for the first data block, store the first data block at that address, and establish a mapping between the first characteristic value and the storage address (when the first data block is not yet stored). This reduces access operations to the storage address.
More importantly, when the first data block is accessed, it can be obtained directly by querying the first mapping relationship with its characteristic value. There is no need to determine, from the first characteristic value, the storage address that holds the first data block and then access that address to obtain the block, which further reduces access operations to the storage address.
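The write-time check and the direct read described above can be sketched as follows. This is a minimal illustration, not the patented implementation: SHA-256 stands in for the unspecified hash algorithm, the 4096-byte block length is an assumption, and the names `first_mapping`, `needs_dedup`, and `read_by_value` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block length; the text leaves it open


def fingerprint(block: bytes) -> bytes:
    # Assumption: SHA-256 stands in for the unspecified hash algorithm.
    return hashlib.sha256(block).digest()


# First mapping relationship: characteristic value -> the specific-format data itself.
ZEROS = bytes(BLOCK_SIZE)
ONES = b"\xff" * BLOCK_SIZE
first_mapping = {fingerprint(ZEROS): ZEROS, fingerprint(ONES): ONES}


def needs_dedup(block: bytes) -> bool:
    """Write path: a hit in the first mapping means no de-duplication work at all."""
    return fingerprint(block) not in first_mapping


def read_by_value(value: bytes):
    """Read path: a hit returns the data without resolving or accessing a storage
    address. None means the caller falls back to the ordinary address lookup."""
    return first_mapping.get(value)


print(needs_dedup(ZEROS), needs_dedup(b"x" * BLOCK_SIZE))  # False True
```

Because the first mapping stores the data itself rather than an address, a hit on either path never touches a storage address, which is the saving the paragraph describes.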
Optionally, the storage device stores a second mapping relationship, which includes a mapping between a second characteristic value and a first storage address; the first storage address stores the data corresponding to the second characteristic value. The storage device computes the characteristic value of a second data block, which is the second characteristic value. The storage device queries the first mapping relationship with the second characteristic value of the second data block and determines that the first mapping relationship does not contain it; it then queries the second mapping relationship with the second characteristic value, determines that the second mapping relationship contains it, and updates the reference count of the first storage address. On the one hand, restricting specific-format data to a limited range keeps the first mapping relationship stored in the storage device within a bounded size, which prevents the storage device from occupying too much cache when it loads the first mapping relationship; on the other hand, data that is not specific-format data can still be processed by the existing de-duplication flow, which saves storage space in the storage device.
Optionally, the storage device stores a second mapping relationship, which includes a mapping between the second characteristic value and the first storage address; the first storage address stores the data corresponding to the second characteristic value. The storage device computes the characteristic value of a third data block, which is a third characteristic value. The storage device queries the first mapping relationship with the third characteristic value and determines that the first mapping relationship does not contain it; it then queries the second mapping relationship with the third characteristic value and determines that the second mapping relationship does not contain it either. The storage device stores the third data block at a second storage address and establishes a mapping between the third characteristic value and the second storage address in the second mapping relationship; that is, the second mapping relationship now includes a mapping between the third characteristic value and the second storage address. On the one hand, this keeps the first mapping relationship within a bounded size, so that the storage device does not occupy too much cache when loading it; on the other hand, data that is not specific-format data is processed by the existing de-duplication flow, which saves storage space in the storage device. Further, the storage device updates the reference count of the second storage address.
Optionally, the storage device divides a data segment to obtain the first data block and establishes a mapping relationship between the data segment and the first characteristic value of the first data block. When the data segment is accessed, the storage device determines the first characteristic value from this mapping relationship and queries the first mapping relationship with the first characteristic value to obtain the first data block. No storage address in the storage device needs to be accessed, which reduces access operations to the storage address.
Optionally, the second mapping relationship includes a mapping between the first characteristic value and a third storage address, where the third storage address stores data whose characteristic value is the first characteristic value. When the reference count of the third storage address is greater than a threshold R, where R is an integer greater than 0, the storage device establishes, in the first mapping relationship, a mapping between the first characteristic value and the data whose characteristic value is the first characteristic value; that is, that data becomes the specific-format data corresponding to the first characteristic value. Further, the storage device either deletes the mapping between the first characteristic value and the third storage address from the second mapping relationship together with the data at the third storage address, or marks both as invalid with an invalid identifier. This further reduces access operations to the storage address.
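The read path through the segment-to-characteristic-value mapping can be sketched as below. This is a minimal model under stated assumptions: the 4-byte block size is chosen only to keep the example readable, SHA-256 is an assumed hash, and `segment_map`, `stored_blocks`, and `address_accesses` are hypothetical stand-ins for device metadata and media.

```python
import hashlib

BLOCK_SIZE = 4  # tiny hypothetical block size so the example stays readable


def fingerprint(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()  # assumed hash algorithm


first_mapping = {fingerprint(bytes(BLOCK_SIZE)): bytes(BLOCK_SIZE)}  # all-0 block
segment_map = {}       # hypothetical: logical address -> block characteristic values
stored_blocks = {}     # stand-in for blocks reachable only through a storage address
address_accesses = []  # records every lookup that had to go to storage


def write_segment(lba: int, segment: bytes) -> None:
    blocks = [segment[i:i + BLOCK_SIZE] for i in range(0, len(segment), BLOCK_SIZE)]
    segment_map[lba] = [fingerprint(b) for b in blocks]
    for b in blocks:
        if fingerprint(b) not in first_mapping:
            stored_blocks[fingerprint(b)] = b  # ordinary de-duplicated storage


def read_segment(lba: int) -> bytes:
    parts = []
    for value in segment_map[lba]:
        data = first_mapping.get(value)
        if data is None:  # only non-specific-format blocks touch storage
            address_accesses.append(value)
            data = stored_blocks[value]
        parts.append(data)
    return b"".join(parts)


write_segment(0, bytes(4) + b"abcd" + bytes(4))
assert read_segment(0) == bytes(4) + b"abcd" + bytes(4)
print(len(address_accesses))  # 1
```

Of the three blocks, the two all-0 blocks are served from the first mapping, so only one storage access is recorded.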
In a second aspect, correspondingly, an embodiment of the present invention further provides a storage device for implementing the implementations of the first aspect. The storage device either includes structural units for implementing the implementations of the first aspect, or includes an interface and a processor for executing the implementations of the first aspect.
Correspondingly, the embodiments of the present invention further provide a non-volatile computer-readable storage medium and a computer program product. When the computer instructions contained in the non-volatile computer-readable storage medium or the computer program product are loaded into the memory of the storage device provided by the embodiments of the present invention and executed by the Central Processing Unit (CPU) of the storage device, the storage device performs the possible implementations of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below.
Fig. 1 is a schematic structural diagram of a storage device according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data processing method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of dividing a data segment into blocks according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a mapping relationship between the logical address of a data segment and the characteristic values of its data blocks according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a storage device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention.
As shown in Fig. 1, the storage device includes a Central Processing Unit (CPU) 101, a memory 102, and an interface 103. The memory 102 stores computer instructions, and the CPU 101 executes the computer instructions in the memory 102 to manage the storage system and perform de-duplication. In addition, to save computing resources of the CPU 101, a Field Programmable Gate Array (FPGA) or other hardware may perform all of the CPU operations described in the embodiments of the present invention, or the FPGA or other hardware and the CPU may each perform part of those operations, to implement the technical solutions described in the embodiments of the present invention. For convenience of description, these processing components are collectively referred to as the processor in the embodiments of the present invention. The interface 103 communicates with the processor and may specifically be a Host Bus Adapter (HBA) card, a Peripheral Component Interconnect Express (PCIe) interface card, or the like.
In the embodiment of the present invention shown in Fig. 2, the storage device stores a first mapping relationship, which includes a mapping between a first characteristic value and the specific-format data corresponding to the first characteristic value, and a second mapping relationship, which includes a mapping between a second characteristic value and a first storage address. The specific-format data may be all-0 data or all-1 data of a specific length, a combination of 0s and 1s, or data whose degree of repetition is greater than a threshold; the first characteristic value is the characteristic value of the specific-format data. The first storage address stores a unique data block in the storage device, and the second characteristic value is the characteristic value of that unique data block. A unique data block is one that differs from all other data blocks in the storage device. A degree of repetition greater than the threshold can be implemented as the reference count of the storage address holding the data block being greater than the threshold. The embodiment shown in Fig. 2 discloses the following data processing scheme:
Step 201: The storage device obtains a data segment.
The storage device obtains the data segment and its logical address through the interface 103 shown in Fig. 1.
Step 202: The storage device divides the data segment into data blocks.
The storage device may divide the data segment with a variable-length or fixed-length blocking algorithm to obtain one or more data blocks. As shown in Fig. 3, the embodiment of the present invention takes dividing the data segment into data block A, data block B, and data block C as an example.
Step 203: The storage device calculates the characteristic values of the data blocks.
The storage device calculates the characteristic value of each data block with a hash algorithm. In this embodiment, the storage device calculates the characteristic values of one or more data blocks. The characteristic values of data block A, data block B, and data block C shown in Fig. 3 are the first characteristic value, the second characteristic value, and the third characteristic value, respectively.
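Steps 202 and 203 can be sketched with fixed-length blocking and a hash per block. This sketch assumes 4-byte blocks and SHA-256 hex digests as characteristic values; the patent names neither, and the function names are hypothetical. Variable-length blocking is not shown.

```python
import hashlib


def split_fixed(segment: bytes, block_size: int) -> list:
    """Fixed-length blocking; the final block may be shorter than block_size."""
    return [segment[i:i + block_size] for i in range(0, len(segment), block_size)]


def characteristic_values(blocks: list) -> list:
    # Assumption: SHA-256 hex digests serve as the characteristic values.
    return [hashlib.sha256(b).hexdigest() for b in blocks]


segment = bytes(4) + b"BBBB" + b"CCCC"  # hypothetical 12-byte data segment
blocks = split_fixed(segment, 4)         # data blocks A, B, C
values = characteristic_values(blocks)  # first, second, third characteristic values
for name, v in zip("ABC", values):
    print(name, v[:16])
```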
Step 204: The storage device establishes a mapping relationship between the data segment and the characteristic values of the data blocks.
After the storage device stores the data segment, it must be able to serve a read request for that data segment. To do so, it establishes a mapping relationship between the data segment and the characteristic values of data block A, data block B, and data block C, and reads the corresponding data according to those characteristic values. For example, it establishes the mapping relationship shown in Fig. 4 between the logical address of the data segment and the characteristic values of data blocks A, B, and C, and reads the corresponding data according to those characteristic values. As another example, it calculates a characteristic value of the data segment itself and establishes two mapping relationships: one between the logical address of the data segment and the characteristic value of the data segment, and one between the characteristic value of the data segment and the characteristic values of data blocks A, B, and C. The logical address may be a logical block address (LBA).
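The two mapping options in step 204 can be compared side by side in a small sketch. SHA-256 as the characteristic value and all dictionary names are assumptions made for illustration; the text only requires that a read request can be resolved to the block characteristic values.

```python
import hashlib


def value_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()  # assumed hash algorithm


# Option 1 (Fig. 4): logical address -> characteristic values of the data blocks.
lba_to_block_values = {}
# Option 2: logical address -> segment value, then segment value -> block values.
lba_to_segment_value = {}
segment_value_to_block_values = {}


def record_segment(lba: int, segment: bytes, blocks: list) -> None:
    block_values = [value_of(b) for b in blocks]
    lba_to_block_values[lba] = block_values                  # option 1
    seg_value = value_of(segment)
    lba_to_segment_value[lba] = seg_value                    # option 2, first level
    segment_value_to_block_values[seg_value] = block_values  # option 2, second level


def block_values_for(lba: int) -> list:
    """Resolve a read request's logical address through option 2."""
    return segment_value_to_block_values[lba_to_segment_value[lba]]


segment = b"AAAABBBBCCCC"
blocks = [segment[i:i + 4] for i in range(0, 12, 4)]
record_segment(100, segment, blocks)
record_segment(200, segment, blocks)  # same content written at another logical address
print(len(lba_to_block_values), len(segment_value_to_block_values))  # 2 1
```

In this sketch, option 2 lets two logical addresses that hold identical segments share one list of block characteristic values; whether that is the motivation in the patent is not stated there.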
Step 205: The storage device queries the first mapping relationship with the characteristic value of each data block to determine whether the first mapping relationship contains it. If it does, the storage device determines that the data block is specific-format data, performs no further de-duplication on it, and the process ends for that block; otherwise, step 206 is performed.
The storage device queries the first mapping relationship with the characteristic values of data blocks A, B, and C to determine whether the first mapping relationship contains the first, second, and third characteristic values. The characteristic value of data block A is the first characteristic value, which the first mapping relationship contains, so data block A is specific-format data and no de-duplication is performed on it. The first mapping relationship does not contain the characteristic values of data blocks B and C, so step 206 is performed for data blocks B and C. Because data block A is specific-format data, the storage device does not need to perform any further de-duplication operations on it: it does not need to update the reference count of the storage address corresponding to the first characteristic value (when the query finds that data block A is already stored in the storage device), nor allocate a storage address for data block A, store data block A at that address, and establish a mapping between the first characteristic value and the storage address in the second mapping relationship (when data block A is not yet stored). This reduces access operations to the storage address. More importantly, when data block A is accessed, it can be obtained directly by querying the first mapping relationship with its characteristic value, without determining its storage address from the first characteristic value and then accessing that address, which further reduces access operations to the storage address.
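Step 205 amounts to partitioning the blocks by membership in the first mapping. The sketch below assumes SHA-256 characteristic values and a hypothetical first mapping holding a single all-0 block of 4 bytes.

```python
import hashlib


def value_of(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()  # assumed hash algorithm


# Hypothetical first mapping holding one specific-format block (four zero bytes).
first_mapping = {value_of(bytes(4)): bytes(4)}


def step_205(blocks: list):
    """Return (specific-format blocks, blocks that continue to step 206)."""
    settled, to_step_206 = [], []
    for block in blocks:
        if value_of(block) in first_mapping:
            settled.append(block)  # no reference count, allocation, or mapping update
        else:
            to_step_206.append(block)
    return settled, to_step_206


a, b, c = bytes(4), b"BBBB", b"CCCC"  # data blocks A, B, C from the example
settled, pending = step_205([a, b, c])
print(len(settled), len(pending))      # 1 2
```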
Step 206: The storage device queries the second mapping relationship with the characteristic value of the data block to determine whether the second mapping relationship contains it. If it does, step 207 is performed; otherwise, step 208 is performed.
The storage device queries the second mapping relationship with the second characteristic value of data block B, determines that the second mapping relationship contains it, and performs step 207. The storage device queries the second mapping relationship with the third characteristic value of data block C, determines that the second mapping relationship does not contain it, and performs step 208.
Step 207: the storage device updates a reference count for the first storage address.
The storage device determines that the second mapping relationship already contains the second characteristic value of data block B; that is, the data block stored at the first storage address, which corresponds to the second characteristic value in the second mapping relationship, is identical to data block B. Data block B therefore does not need to be stored again, and its second characteristic value also corresponds to the first storage address, so the reference count of the first storage address is incremented by 1. In this embodiment, the reference count is the number of times the data block stored at a storage address in the second mapping relationship is repeated: when a data block is first stored at a storage address in the second mapping relationship, the reference count of that address is 1, and each time another data block with the same characteristic value arrives, the reference count is incremented by 1.
Step 208: The storage device stores the data block at a second storage address and establishes a mapping between the characteristic value of the data block and the second storage address in the second mapping relationship.
The storage device queries the second mapping relationship with the third characteristic value of data block C and determines that the second mapping relationship does not contain it; that is, data block C is being written for the first time. The storage device therefore allocates a second storage address to data block C, stores data block C at the second storage address, and establishes a mapping between the third characteristic value and the second storage address in the second mapping relationship.
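Steps 206 to 208, together with the reference-count rule from step 207, can be sketched as one method on a hypothetical model of the second mapping relationship. The sequential address allocator, SHA-256 characteristic values, and the class name `SecondMapping` are assumptions for illustration only.

```python
import hashlib


def value_of(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()  # assumed hash algorithm


class SecondMapping:
    """Hypothetical model of the second mapping relationship plus reference counts."""

    def __init__(self):
        self.address_of = {}  # characteristic value -> storage address
        self.refcount = {}    # storage address -> reference count
        self.media = {}       # storage address -> stored data block
        self._next_address = 0

    def process(self, block: bytes) -> int:
        """Steps 206-208 for a block that missed the first mapping; returns its address."""
        value = value_of(block)
        address = self.address_of.get(value)
        if address is not None:           # step 206 hit -> step 207
            self.refcount[address] += 1
            return address
        address = self._next_address      # step 208: allocate a new storage address
        self._next_address += 1
        self.media[address] = block
        self.address_of[value] = address
        self.refcount[address] = 1        # first store of a block counts as 1
        return address


m = SecondMapping()
m.process(b"BBBB")  # an earlier write places a copy of B at address 0
print(m.process(b"BBBB"), m.process(b"CCCC"), m.refcount)  # 0 1 {0: 2, 1: 1}
```

Block B hits the existing entry and only bumps the count, while block C is written once and starts with a count of 1.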
In this embodiment, de-duplication is performed on data that is not specific-format data. On the one hand, restricting specific-format data to a limited range keeps the first mapping relationship stored in the storage device within a bounded size, which prevents the storage device from occupying too much cache when it loads the first mapping relationship; on the other hand, data that is not specific-format data can be processed by the existing de-duplication flow, which saves storage space in the storage device.
In this embodiment, as one implementation of specific-format data, data segments may be constructed using n-byte data as the basic unit, where n is an integer greater than 0 whose value may be determined according to the resource utilization of the storage device. Generally, the larger n is, the more resources of the storage device are consumed and the higher the resource utilization. For example, if n is 1, one byte (8 bits) is the basic unit, and data segments are constructed from that basic unit. With 8 bits there are 256 possible basic units, from 00000000 to 11111111. Each data segment is then divided into data blocks with a fixed-length or variable-length blocking algorithm. The data blocks obtained from one such data segment have identical contents, so one of them is selected as the specific-format data, its characteristic value (for example, a hash value) is determined, and a mapping between the characteristic value and the data block is established in the first mapping relationship.
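Enumerating the basic units can be sketched as below. As a simplifying assumption, the sketch repeats each n-byte unit directly to fill one fixed-length block, which matches what fixed-length blocking of a segment built from that unit would yield when the block length is a multiple of n. SHA-256 and the function name are assumptions.

```python
import hashlib


def build_first_mapping(n: int, block_size: int) -> dict:
    """Enumerate every n-byte basic unit, repeat it to fill one block, and map the
    block's characteristic value to the block itself."""
    if n < 1 or block_size % n:
        raise ValueError("n must be >= 1 and divide block_size in this sketch")
    mapping = {}
    for unit_value in range(256 ** n):  # 256 units for n = 1, 65,536 for n = 2
        unit = unit_value.to_bytes(n, "big")
        block = unit * (block_size // n)  # identical to every block of the segment
        mapping[hashlib.sha256(block).digest()] = block
    return mapping


first_mapping = build_first_mapping(n=1, block_size=4096)
print(len(first_mapping))  # 256
```

The table grows as 256 to the power n, which illustrates the resource trade-off the paragraph attaches to the choice of n.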
In another implementation, the storage device may determine specific-format data according to the reference count of a storage address recorded in the second mapping relationship, where the reference count characterizes how often the data stored at that address is repeated. For example, data block D at storage address M, whose reference count is greater than a threshold R (an integer greater than 0), is taken as specific-format data. A mapping between the characteristic value T of data block D and data block D is then established in the first mapping relationship. Because the second mapping relationship already contains the characteristic value T, the storage device can obtain T directly from the second mapping relationship. After this mapping is established, the storage device either deletes the mapping between T and storage address M together with the data at storage address M, or marks both as invalid with an invalid identifier. When the storage device later obtains a data block whose characteristic value is T, it no longer needs to perform de-duplication, which reduces access operations to the storage address.
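The promotion of frequently referenced data can be sketched as a pass over the second mapping. The dictionary layout, the use of `None` as the invalid identifier, and the function name `promote` are assumptions; the text specifies only the threshold rule and the delete-or-invalidate choice.

```python
import hashlib


def value_of(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()  # assumed hash algorithm


def promote(first_mapping, address_of, refcount, media, r, invalidate=False):
    """Move blocks whose reference count exceeds r into the first mapping.

    address_of: characteristic value -> storage address (second mapping)
    refcount:   storage address -> reference count
    media:      storage address -> stored block
    """
    if r < 1:
        raise ValueError("R must be an integer greater than 0")
    promoted = []
    for value, address in list(address_of.items()):
        if address is None or refcount.get(address, 0) <= r:
            continue
        first_mapping[value] = media[address]  # T -> data block D
        if invalidate:
            address_of[value] = None  # hypothetical invalid identifier
            media[address] = None
        else:
            del address_of[value]
            del media[address]
            del refcount[address]
        promoted.append(value)
    return promoted


D, E = b"DDDD", b"EEEE"
T = value_of(D)
first_mapping = {}
address_of = {T: 7, value_of(E): 8}  # storage address M = 7 holds data block D
refcount = {7: 5, 8: 1}
media = {7: D, 8: E}
print(promote(first_mapping, address_of, refcount, media, r=3) == [T])  # True
```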
In this embodiment of the present invention, the first mapping relationship may be implemented by using an array or a binary tree, which is not limited in this embodiment of the present invention.
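As one reading of the "array" option, the first mapping can be held as parallel sorted arrays searched by binary search with Python's `bisect` module. This layout and the class name are assumptions; the patent does not prescribe either structure in detail.

```python
import bisect
import hashlib


class SortedArrayFirstMapping:
    """First mapping relationship stored as parallel sorted arrays and searched by
    binary search; one possible array layout, chosen for illustration."""

    def __init__(self):
        self._keys = []    # characteristic values, kept sorted
        self._values = []  # specific-format data, aligned with _keys

    def _find(self, key):
        i = bisect.bisect_left(self._keys, key)
        return i, (i < len(self._keys) and self._keys[i] == key)

    def insert(self, key: bytes, data: bytes) -> None:
        i, found = self._find(key)
        if found:
            self._values[i] = data
        else:
            self._keys.insert(i, key)
            self._values.insert(i, data)

    def get(self, key: bytes):
        i, found = self._find(key)
        return self._values[i] if found else None

    def __contains__(self, key: bytes) -> bool:
        return self._find(key)[1]


table = SortedArrayFirstMapping()
for block in (bytes(16), b"\xff" * 16):
    table.insert(hashlib.sha256(block).digest(), block)
print(hashlib.sha256(bytes(16)).digest() in table)  # True
```

Lookups take O(log N) comparisons while inserts shift elements in O(N), which fits a table that changes rarely but is consulted on every write; a balanced binary tree would make inserts logarithmic as well.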
The embodiments of the present invention can be applied to online de-duplication scenarios such as data backup: for example, the storage device receives a data segment and performs the operations described in the embodiments. They can also be applied to offline de-duplication scenarios: for example, the storage device reads a stored data segment and performs the operations described in the embodiments. The embodiments of the present invention are not limited in this respect.
According to the described scheme of the embodiment of the present invention, another embodiment of the present invention provides a storage device as shown in fig. 5, where the storage device stores a first mapping relationship, the first mapping relationship includes a mapping of a first feature value and specific format data corresponding to the first feature value, and the storage device includes a calculating unit 501 and a determining unit 502; the calculating unit 501 is configured to calculate a first data block to obtain a feature value of the first data block, where the feature value of the first data block is a first feature value; the determining unit 502 is configured to query the first mapping relationship according to the first feature value of the first data block, determine that the first mapping relationship includes the first feature value, and if the first data block is data in a specific format, the storage device does not perform deduplication on the first data block.
Optionally, the storage device stores a second mapping relationship; the second mapping relation comprises the mapping of the second characteristic value and the first storage address; the first storage address stores data corresponding to the second characteristic value; the storage device further comprises an update unit 503; the calculating unit 501 is further configured to calculate a second data block to obtain a feature value of the second data block, where the feature value of the second data block is a second feature value; the determining unit 502 is further configured to query the first mapping relationship according to the second feature value of the second data block to determine that the first mapping relationship does not include the second feature value; the determining unit 502 is further configured to query the second mapping relationship according to the second feature value of the second data block to determine that the second mapping relationship includes the second feature value; the updating unit 503 is further configured to update the reference count of the first storage address. 
Optionally, the storage device further includes a storage unit 504 and an establishing unit 505. The calculating unit 501 is further configured to calculate a third data block to obtain the feature value of the third data block, where the feature value of the third data block is a third feature value. The determining unit 502 is further configured to query the first mapping relationship according to the third feature value to determine that the first mapping relationship does not include the third feature value, and to query the second mapping relationship according to the third feature value to determine that the second mapping relationship does not include the third feature value. The storage unit 504 is configured to store the third data block at a second storage address, and the establishing unit 505 is configured to establish a mapping between the third feature value and the second storage address in the second mapping relationship. Further, the updating unit 503 is configured to update the reference count of the second storage address.
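Putting the three cases together, a write path that consults the first mapping, then the second mapping, and stores the block only when both miss might look like the sketch below. The SHA-256 feature value, the in-memory list standing in for physical media, list indices as storage addresses, and the names `DedupStore` and `write` are assumptions for illustration only.

```python
import hashlib
from typing import Dict, List


def feature_value(block: bytes) -> bytes:
    """Assumed feature value: the SHA-256 digest of the block."""
    return hashlib.sha256(block).digest()


class DedupStore:
    """Hypothetical write path combining the first and second mappings."""

    def __init__(self, first_mapping: Dict[bytes, bytes]) -> None:
        self.first_mapping = first_mapping       # feature value -> specific format data
        self.address_of: Dict[bytes, int] = {}   # second mapping: feature value -> address
        self.ref_count: Dict[int, int] = {}      # reference count per storage address
        self.media: List[bytes] = []             # stand-in for physical storage

    def write(self, block: bytes) -> bytes:
        """Write one block and return its feature value (kept by the caller as metadata)."""
        fv = feature_value(block)
        if fv in self.first_mapping:
            return fv                            # specific format: no dedup, nothing stored
        address = self.address_of.get(fv)
        if address is not None:
            self.ref_count[address] += 1         # duplicate of data already stored
            return fv
        # Miss in both mappings: store the block at a new storage address,
        # map the feature value to it, and start its reference count at 1.
        address = len(self.media)
        self.media.append(block)
        self.address_of[fv] = address
        self.ref_count[address] = 1
        return fv
```

Usage: writing `b"abc"` twice and `b"xyz"` once leaves two stored blocks, with reference counts of 2 and 1 respectively, while a registered all-0 block never reaches the media.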
Optionally, the storage device further includes a dividing unit 506. The dividing unit 506 is configured to divide a data segment to obtain the first data block, and the establishing unit 505 is further configured to establish a mapping between the data segment and the first feature value of the first data block.
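The dividing step and the segment-to-feature-value mapping can be sketched as follows. The text does not say how a data segment is divided; fixed-size chunking, the 4 KiB default, the SHA-256 feature value, and the names `divide_segment`, `segment_map`, and `map_segment` are assumptions made here for illustration.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4096  # assumed fixed chunk size


def feature_value(block: bytes) -> bytes:
    """Assumed feature value: the SHA-256 digest of the block."""
    return hashlib.sha256(block).digest()


def divide_segment(segment: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split a data segment into fixed-size blocks; the last one may be shorter."""
    return [segment[i:i + block_size] for i in range(0, len(segment), block_size)]


# Hypothetical segment map: segment id -> ordered feature values of its blocks.
segment_map: Dict[str, List[bytes]] = {}


def map_segment(segment_id: str, segment: bytes) -> List[bytes]:
    """Divide the segment and record the feature value of each block, in order."""
    fvs = [feature_value(b) for b in divide_segment(segment)]
    segment_map[segment_id] = fvs
    return fvs
```

Keeping the feature values in order is what allows the segment to be reassembled later from the first or second mapping.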
For the effects and further implementations of the storage device shown in fig. 5, refer to the corresponding descriptions in the foregoing embodiments; details are not repeated here.
In one implementation of the storage device shown in fig. 5, the above units are installed on the storage device: they may be loaded into the memory of the storage device, and a CPU in the storage device executes the instructions in the memory to implement the functions in the corresponding embodiments of the present invention. In another implementation, the units of the storage device may be implemented by hardware, or by a combination of hardware and a CPU executing instructions in memory. The above units are also referred to as structural units.
An embodiment of the present invention further provides a non-volatile computer-readable storage medium and a computer program product. Both contain computer instructions, and a CPU executes the computer instructions loaded into memory to implement the functions of the storage device in the embodiments of the present invention.
The descriptions given in the embodiments of the present invention are exemplary. The terms "first", "second", "third", and the like in the embodiments of the present invention do not define a strict order. For example, when "first", "second", and "third" qualify data blocks, they only distinguish different data blocks; when they qualify feature values, they only indicate that the feature values are different. There may also be one or more data blocks between the first data block and the second data block.
In the embodiments of the present invention, the first mapping relationship and the second mapping relationship may each include a plurality of mappings. For example, the first mapping relationship may include a plurality of mappings, one of which is the mapping between the first feature value and the specific format data corresponding to the first feature value. "The specific format data corresponding to the first feature value" means that the feature value of that specific format data is the first feature value.
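A first mapping relationship holding several mappings, each keyed by the feature value of its own specific format data, might be built as below. This is an illustrative sketch: SHA-256 is an assumed feature value, the 0x55 (01010101...) block is only a guess at what "a combination of 0 and 1 data with a specific length" could look like, and `build_first_mapping` is a hypothetical helper.

```python
import hashlib
from typing import Dict, Iterable

BLOCK_SIZE = 4096  # assumed block size


def feature_value(block: bytes) -> bytes:
    """Assumed feature value: the SHA-256 digest of the block."""
    return hashlib.sha256(block).digest()


def build_first_mapping(patterns: Iterable[bytes]) -> Dict[bytes, bytes]:
    """Each entry maps a feature value to the specific format data whose
    feature value it is."""
    return {feature_value(p): p for p in patterns}


first_mapping = build_first_mapping([
    bytes(BLOCK_SIZE),     # all-0 block
    b"\xff" * BLOCK_SIZE,  # all-1 block
    b"\x55" * BLOCK_SIZE,  # illustrative alternating-bit pattern (01010101...)
])
```

The invariant stated in the text, that the key of each entry is the feature value of its data, holds by construction here.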
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the division of the units in the above-described apparatus embodiments is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

Claims (16)

1. A data processing method, characterized in that,
the storage device stores a first mapping relation, wherein the first mapping relation comprises a mapping of a first characteristic value and specific format data corresponding to the first characteristic value,
the method comprises the following steps:
the storage device calculates a first data block to obtain a characteristic value of the first data block, wherein the characteristic value of the first data block is the first characteristic value;
the storage device queries the first mapping relation according to the first characteristic value of the first data block, and determines that the first mapping relation includes the first characteristic value and that the first data block is the data in the specific format, wherein the data in the specific format is all-0 data, all-1 data, or a combination of 0 and 1 data of a specific length, or the data in the specific format is data whose repetition degree is greater than a threshold; the storage device then no longer performs data deduplication on the first data block;
when the storage device receives a request for reading the first data block, the storage device obtains the first data block according to the first mapping relation and the first characteristic value of the first data block without determining the storage address of the first data block according to the first characteristic value.
2. The method of claim 1, wherein the storage device stores a second mapping relationship; the second mapping relation comprises a mapping of a second characteristic value and a first storage address; the first storage address stores data corresponding to the second characteristic value; the method further comprises the following steps:
the storage device calculates a second data block to obtain a characteristic value of the second data block, wherein the characteristic value of the second data block is the second characteristic value;
the storage device queries the first mapping relation according to a second characteristic value of the second data block to determine that the first mapping relation does not contain the second characteristic value;
the storage device queries the second mapping relation according to a second characteristic value of the second data block to determine that the second mapping relation contains the second characteristic value;
the storage device updates a reference count for the first storage address.
3. The method of claim 1, wherein the storage device stores a second mapping relationship; the second mapping relation comprises a mapping of a second characteristic value and a first storage address; the first storage address stores data corresponding to the second characteristic value; the method further comprises the following steps:
the storage device calculates a third data block to obtain a characteristic value of the third data block, wherein the characteristic value of the third data block is a third characteristic value;
the storage device queries the first mapping relation according to the third characteristic value to determine that the first mapping relation does not contain the third characteristic value;
the storage device queries the second mapping relation according to the third characteristic value to determine that the second mapping relation does not contain the third characteristic value;
the storage device stores the third data block to a second storage address;
the storage device establishes a mapping between the third characteristic value and the second storage address in the second mapping relation.
4. The method of claim 3, further comprising:
the storage device updates a reference count for the second storage address.
5. The method of claim 1, further comprising:
the storage device divides a data segment to obtain the first data block;
the storage device establishes a mapping relationship between the data segment and the first characteristic value of the first data block.
6. A storage device, characterized in that the storage device stores a first mapping relation, the first mapping relation comprises a mapping of a first characteristic value and specific format data corresponding to the first characteristic value, and the storage device comprises a calculating unit and a determining unit; wherein
the calculation unit is used for calculating a first data block to obtain a characteristic value of the first data block, and the characteristic value of the first data block is the first characteristic value;
the determining unit is configured to query the first mapping relationship according to the first feature value of the first data block and determine that the first mapping relationship includes the first feature value; if the first data block is the data in the specific format, the storage device no longer performs a deduplication operation on the first data block; the specific format data is all-0 data, all-1 data, or a combination of 0 and 1 data of a specific length, or the specific format data is data whose repetition degree is greater than a threshold;
when the storage device receives a request for reading the first data block, the storage device obtains the first data block according to the first mapping relation and the first characteristic value of the first data block without determining the storage address of the first data block according to the first characteristic value.
7. The storage device of claim 6, wherein the storage device stores a second mapping relationship; the second mapping relation comprises a mapping of a second characteristic value and a first storage address; the first storage address stores data corresponding to the second characteristic value; the storage device further includes an update unit:
the calculating unit is further configured to calculate a second data block to obtain a feature value of the second data block, where the feature value of the second data block is the second feature value;
the determining unit is further configured to query the first mapping relationship according to a second feature value of the second data block to determine that the first mapping relationship does not include the second feature value;
the determining unit is further configured to query the second mapping relationship according to a second feature value of the second data block to determine that the second mapping relationship includes the second feature value;
the update unit is configured to update a reference count of the first storage address.
8. The storage device of claim 6, wherein the storage device stores a second mapping relationship; the second mapping relation comprises a mapping of a second characteristic value and a first storage address; the first storage address stores data corresponding to the second characteristic value; the storage device further comprises a storage unit and an establishing unit, wherein
the calculating unit is further configured to calculate a third data block to obtain a feature value of the third data block, where the feature value of the third data block is a third feature value;
the determining unit is further configured to query the first mapping relationship according to the third feature value to determine that the first mapping relationship does not include the third feature value;
the determining unit is further configured to query the second mapping relationship according to the third feature value to determine that the second mapping relationship does not include the third feature value;
the storage unit is used for storing the third data block to a second storage address;
the establishing unit is configured to establish a mapping between the third feature value and the second storage address in the second mapping relationship.
9. The storage device according to claim 8, further comprising an update unit,
the update unit is to update a reference count of the second storage address.
10. The storage device of claim 6,
the storage device also comprises a dividing unit and an establishing unit; the dividing unit is used for dividing a data segment to obtain the first data block;
the establishing unit is configured to establish a mapping between the data segment and the first characteristic value of the first data block.
11. A storage device, characterized in that the storage device stores a first mapping relation, the first mapping relation comprises a mapping of a first characteristic value and specific format data corresponding to the first characteristic value, and the storage device comprises a calculating unit and a determining unit; the storage device comprises a processing interface and a processor, wherein the processor is configured for:
calculating a first data block to obtain a characteristic value of the first data block, wherein the characteristic value of the first data block is the first characteristic value;
querying the first mapping relation according to the first characteristic value of the first data block to determine that the first mapping relation includes the first characteristic value; if the first data block is the data in the specific format, the processor no longer performs a deduplication operation on the first data block; the specific format data is all-0 data, all-1 data, or a combination of 0 and 1 data of a specific length, or the specific format data is data whose repetition degree is greater than a threshold;
when the storage device receives a request for reading the first data block, the storage device obtains the first data block according to the first mapping relation and the first characteristic value of the first data block without determining the storage address of the first data block according to the first characteristic value.
12. The storage device of claim 11, wherein the storage device stores a second mapping relationship; the second mapping relation comprises a mapping of a second characteristic value and a first storage address; the first storage address stores data corresponding to the second characteristic value; the processor is further configured to:
calculating a second data block to obtain a characteristic value of the second data block, wherein the characteristic value of the second data block is the second characteristic value;
querying the first mapping relation according to a second characteristic value of the second data block to determine that the first mapping relation does not contain the second characteristic value;
querying the second mapping relation according to a second characteristic value of the second data block to determine that the second mapping relation contains the second characteristic value;
updating a reference count for the first storage address.
13. The storage device of claim 11, wherein the storage device stores a second mapping relationship; the second mapping relation comprises a mapping of a second characteristic value and a first storage address; the first storage address stores data corresponding to the second characteristic value; the processor is further configured to:
calculating a third data block to obtain a characteristic value of the third data block, wherein the characteristic value of the third data block is a third characteristic value;
querying the first mapping relation according to the third characteristic value to determine that the first mapping relation does not contain the third characteristic value;
querying the second mapping relation according to the third feature value to determine that the second mapping relation does not contain the third feature value;
storing the third data block to a second storage address; and establishing the mapping between the third characteristic value and the second storage address in the second mapping relation.
14. The storage device of claim 13, wherein the processor is further configured to update a reference count for the second storage address.
15. The storage device of claim 11,
the processor is further configured to: dividing a data segment to obtain the first data block; and establishing a mapping relation between the data segment and the first characteristic value of the first data block.
16. A non-transitory computer readable storage medium containing computer instructions that are executed by a central processing unit of a storage device to cause the storage device to perform the method of any one of claims 1 to 5.
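The claims add two details not spelled out in the description: what counts as specific format data, and a read path that returns such data straight from the first mapping without resolving a storage address. The sketch below illustrates both under loud assumptions: the claims do not define "repetition degree", so `repetition_degree` (share of fixed-size chunks equal to the most common chunk), the 0.9 threshold, the 8-byte unit, SHA-256 as the feature value, and the names `is_specific_format` and `read_block` are all hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def feature_value(block: bytes) -> bytes:
    """Assumed feature value: the SHA-256 digest of the block."""
    return hashlib.sha256(block).digest()


def repetition_degree(block: bytes, unit: int = 8) -> float:
    """Hypothetical metric: share of unit-sized chunks equal to the most common chunk."""
    chunks = [block[i:i + unit] for i in range(0, len(block), unit)]
    if not chunks:
        return 0.0
    return Counter(chunks).most_common(1)[0][1] / len(chunks)


def is_specific_format(block: bytes, threshold: float = 0.9, unit: int = 8) -> bool:
    """All-0 or all-1 blocks, or blocks whose repetition degree exceeds the threshold.
    (All-0 and all-1 blocks also score 1.0; the explicit check mirrors the claims.)"""
    if block and (block.count(0) == len(block) or block.count(0xFF) == len(block)):
        return True
    return repetition_degree(block, unit) > threshold


def read_block(fv: bytes,
               first_mapping: Dict[bytes, bytes],
               address_of: Dict[bytes, int],
               media: List[bytes]) -> bytes:
    """Specific format data is served from the first mapping with no storage
    address lookup; any other block goes through the second mapping."""
    data = first_mapping.get(fv)
    if data is not None:
        return data
    return media[address_of[fv]]
```

In this sketch, reading a registered all-0 block succeeds even with an empty address table and empty media, which is the property the claims describe: no storage address is determined for specific format data.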
CN201610839436.9A 2016-09-21 2016-09-21 Data processing method and storage device Active CN106383670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610839436.9A CN106383670B (en) 2016-09-21 2016-09-21 Data processing method and storage device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610839436.9A CN106383670B (en) 2016-09-21 2016-09-21 Data processing method and storage device

Publications (2)

Publication Number Publication Date
CN106383670A CN106383670A (en) 2017-02-08
CN106383670B true CN106383670B (en) 2020-02-14

Family

ID=57935887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610839436.9A Active CN106383670B (en) 2016-09-21 2016-09-21 Data processing method and storage device

Country Status (1)

Country Link
CN (1) CN106383670B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112394874A (en) * 2019-08-13 2021-02-23 华为技术有限公司 Key value KV storage method and device and storage equipment
CN113467716B (en) * 2021-06-11 2023-05-23 苏州浪潮智能科技有限公司 Method, device, equipment and readable medium for data storage

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102591592A (en) * 2010-12-14 2012-07-18 微软公司 Data deduplication in a virtualization environment
CN103279502A (en) * 2013-05-06 2013-09-04 北京赛思信安技术有限公司 Framework and method of repeated data deleting file system combined with parallel file system
CN103514250A (en) * 2013-06-20 2014-01-15 易乐天 Method and system for deleting global repeating data and storage device
CN104376584A (en) * 2013-08-15 2015-02-25 华为技术有限公司 Data compression method, computer system and device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9858155B2 (en) * 2010-11-16 2018-01-02 Actifio, Inc. System and method for managing data with service level agreements that may specify non-uniform copying of data

Also Published As

Publication number Publication date
CN106383670A (en) 2017-02-08

Similar Documents

Publication Publication Date Title
CN108427538B (en) Storage data compression method and device of full flash memory array and readable storage medium
US11119668B1 (en) Managing incompressible data in a compression-enabled log-structured array storage system
US11082206B2 (en) Layout-independent cryptographic stamp of a distributed dataset
US10613976B2 (en) Method and storage device for reducing data duplication
CN108427539B (en) Offline de-duplication compression method and device for cache device data and readable storage medium
US9778881B2 (en) Techniques for automatically freeing space in a log-structured storage system based on segment fragmentation
US9946462B1 (en) Address mapping table compression
CN109074226B (en) Method for deleting repeated data in storage system, storage system and controller
KR102440370B1 (en) System and method for identifying hot data and stream in a solid-state drive
EP3296996A1 (en) Method for processing data, storage apparatus, solid state disk and storage system
CN108959117B (en) H2D write operation acceleration method and device, computer equipment and storage medium
EP3316150B1 (en) Method and apparatus for file compaction in key-value storage system
CA2896369C (en) Method for writing data into flash memory apparatus, flash memory apparatus, and storage system
CN111125033B (en) Space recycling method and system based on full flash memory array
CN110837479B (en) Data processing method, related equipment and computer storage medium
EP3352071A1 (en) Data check method and storage system
CN111625181A (en) Data processing method, redundant array controller of independent hard disk and data storage system
CN109086008B (en) Data processing method of solid state disk and solid state disk
US10402108B2 (en) Efficient control of data storage areas based on a size of compressed data to be written
CN106383670B (en) Data processing method and storage device
CN110737607B (en) Method and device for managing HMB memory, computer equipment and storage medium
US20220300180A1 (en) Data Deduplication Method and Apparatus, and Computer Program Product
US10055356B2 (en) Memory device and method for controlling memory device
CN110199270B (en) Management method and device for storage equipment in storage system
US11099985B2 (en) Storage controller, storage array device, data depositing method, and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant